sourceweave-web-search 0.2.0__tar.gz
- sourceweave_web_search-0.2.0/LICENSE +21 -0
- sourceweave_web_search-0.2.0/PKG-INFO +343 -0
- sourceweave_web_search-0.2.0/README.md +314 -0
- sourceweave_web_search-0.2.0/pyproject.toml +69 -0
- sourceweave_web_search-0.2.0/src/sourceweave_web_search/__init__.py +4 -0
- sourceweave_web_search-0.2.0/src/sourceweave_web_search/build_openwebui.py +64 -0
- sourceweave_web_search-0.2.0/src/sourceweave_web_search/cli.py +189 -0
- sourceweave_web_search-0.2.0/src/sourceweave_web_search/config.py +68 -0
- sourceweave_web_search-0.2.0/src/sourceweave_web_search/mcp_server.py +206 -0
- sourceweave_web_search-0.2.0/src/sourceweave_web_search/tool.py +2242 -0
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2026 Mohammad ElNaqa

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
@@ -0,0 +1,343 @@
Metadata-Version: 2.4
Name: sourceweave-web-search
Version: 0.2.0
Summary: MCP server and CLI for web search and page reading with SearXNG, Crawl4AI, and Redis
Keywords: crawl4ai,mcp,model-context-protocol,openwebui,search,web-search
Author: Mohammad ElNaqa
Author-email: Mohammad ElNaqa <55245971+MRNAQA@users.noreply.github.com>
License-Expression: MIT
License-File: LICENSE
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Internet :: WWW/HTTP :: Indexing/Search
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Dist: aiohttp
Requires-Dist: loguru
Requires-Dist: markitdown[docx,pdf,pptx,xlsx]
Requires-Dist: mcp>=1.8.0
Requires-Dist: pydantic
Requires-Dist: redis>=5.0
Requires-Python: >=3.12
Project-URL: Homepage, https://github.com/MRNAQA/sourceweave-web-search
Project-URL: Repository, https://github.com/MRNAQA/sourceweave-web-search
Project-URL: Issues, https://github.com/MRNAQA/sourceweave-web-search/issues
Description-Content-Type: text/markdown

# SourceWeave Web Search

<!-- mcp-name: io.github.mrnaqa/sourceweave-web-search -->

SourceWeave Web Search is an MCP server and CLI for web search plus follow-up page reading.

It uses SearXNG for search, Crawl4AI for HTML extraction, and Redis or Valkey for caching.

For most users, the setup is simple:

1. run the supporting services locally in containers, or point at existing external endpoints
2. start the MCP server with `uvx`
3. connect your MCP client to the running server over `stdio` or local HTTP

## Key Features

- MCP server with `stdio`, `sse`, and `streamable-http` transports
- lean search plus follow-up page reading for MCP clients
- explicit per-URL document conversion for PDFs and other supported documents
- focused reads, related-link limits, image metadata, and page-quality hints
- publishable Python package, container image, and generated OpenWebUI artifact
- compatible with OpenCode, VS Code Copilot, and other MCP clients

## Requirements

- Python `3.12+`
- a reachable SearXNG endpoint
- a reachable Crawl4AI endpoint
- a reachable Redis or Valkey instance

Optional:

- Docker and Docker Compose for the repo-local stack

## Recommended Local Deployment

Start the supporting services locally:

```bash
git clone https://github.com/MRNAQA/sourceweave-web-search.git
cd sourceweave-web-search
cp .env.example .env
docker compose up -d redis crawl4ai searxng
```
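
Before starting the MCP server, it can help to confirm that the three services are actually listening. The following is a minimal sketch, assuming the default repo-local ports from this README (`19080`, `19235`, `16379`); the helper names `host_port` and `is_listening` are illustrative and not part of the package.

```python
import socket
from urllib.parse import urlsplit

# Default local endpoints taken from this README; adjust if your .env overrides them.
ENDPOINTS = {
    "searxng": "http://127.0.0.1:19080",
    "crawl4ai": "http://127.0.0.1:19235",
    "redis": "redis://127.0.0.1:16379/2",
}


def host_port(url: str) -> tuple[str, int]:
    """Extract (host, port) from an endpoint URL (hypothetical helper)."""
    parts = urlsplit(url)
    if parts.hostname is None or parts.port is None:
        raise ValueError(f"endpoint needs an explicit host and port: {url}")
    return parts.hostname, parts.port


def is_listening(url: str, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to the endpoint succeeds."""
    try:
        with socket.create_connection(host_port(url), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling `is_listening(url)` for each entry in `ENDPOINTS` only shows that a port accepts TCP connections; it does not prove the service behind it is healthy.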

Then start the MCP server from the published package with `uvx` and point it at those local endpoints:

```bash
SOURCEWEAVE_SEARCH_SEARXNG_BASE_URL="http://127.0.0.1:19080/search?format=json&q=<query>" \
SOURCEWEAVE_SEARCH_CRAWL4AI_BASE_URL="http://127.0.0.1:19235" \
SOURCEWEAVE_SEARCH_CACHE_REDIS_URL="redis://127.0.0.1:16379/2" \
uvx --from sourceweave-web-search sourceweave-search-mcp
```

For a local HTTP MCP endpoint instead of `stdio`:

```bash
SOURCEWEAVE_SEARCH_SEARXNG_BASE_URL="http://127.0.0.1:19080/search?format=json&q=<query>" \
SOURCEWEAVE_SEARCH_CRAWL4AI_BASE_URL="http://127.0.0.1:19235" \
SOURCEWEAVE_SEARCH_CACHE_REDIS_URL="redis://127.0.0.1:16379/2" \
uvx --from sourceweave-web-search sourceweave-search-mcp \
  --transport streamable-http \
  --host 127.0.0.1 \
  --port 8000
```

You can also point the same `uvx` command at externally hosted SearXNG, Crawl4AI, and Redis or Valkey endpoints by changing the environment variables.

## Installation Options

### Python package

Published releases can be installed from PyPI:

```bash
pip install sourceweave-web-search
```

Or run directly without a global install:

```bash
uvx --from sourceweave-web-search sourceweave-search-mcp
uvx --from sourceweave-web-search sourceweave-search --query "python programming"
```

### Repo checkout

For local development or source-based runs:

```bash
git clone https://github.com/MRNAQA/sourceweave-web-search.git
cd sourceweave-web-search
uv sync --locked --group dev
uv run sourceweave-search-mcp
```

### Container image

The release workflow can publish a container image to:

- `ghcr.io/mrnaqa/sourceweave-web-search`
- optionally `docker.io/mrnaqa/sourceweave-web-search` when Docker Hub publishing is configured

Example runtime:

```bash
docker run --rm -p 8000:8000 \
  -e SOURCEWEAVE_SEARCH_SEARXNG_BASE_URL="http://host.docker.internal:19080/search?format=json&q=<query>" \
  -e SOURCEWEAVE_SEARCH_CRAWL4AI_BASE_URL="http://host.docker.internal:19235" \
  -e SOURCEWEAVE_SEARCH_CACHE_REDIS_URL="redis://host.docker.internal:16379/2" \
  ghcr.io/mrnaqa/sourceweave-web-search:latest
```

## Runtime Configuration

Set these environment variables:

| Variable | Purpose |
| --- | --- |
| `SOURCEWEAVE_SEARCH_SEARXNG_BASE_URL` | SearXNG URL template. Must contain `<query>`. |
| `SOURCEWEAVE_SEARCH_CRAWL4AI_BASE_URL` | Crawl4AI base URL. |
| `SOURCEWEAVE_SEARCH_CACHE_REDIS_URL` | Redis or Valkey URL used for caching. |
| `FASTMCP_HOST` | Host for `sse` or `streamable-http` transport. |
| `FASTMCP_PORT` | Port for `sse` or `streamable-http` transport. |

Example:

```bash
SOURCEWEAVE_SEARCH_SEARXNG_BASE_URL="http://127.0.0.1:19080/search?format=json&q=<query>" \
SOURCEWEAVE_SEARCH_CRAWL4AI_BASE_URL="http://127.0.0.1:19235" \
SOURCEWEAVE_SEARCH_CACHE_REDIS_URL="redis://127.0.0.1:16379/2" \
sourceweave-search --query "python programming" --read-first-pages 2
```
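
The `<query>` placeholder in `SOURCEWEAVE_SEARCH_SEARXNG_BASE_URL` marks where the search terms go. The sketch below shows how such a template could be expanded, assuming the query is percent-encoded before substitution. The function name `build_search_url` is hypothetical, and the package's actual logic may differ.

```python
from urllib.parse import quote_plus

PLACEHOLDER = "<query>"


def build_search_url(template: str, query: str) -> str:
    """Substitute a URL-encoded query into a SearXNG URL template."""
    if PLACEHOLDER not in template:
        raise ValueError(f"template must contain {PLACEHOLDER}")
    # quote_plus encodes spaces as '+' and escapes characters such as '&'.
    return template.replace(PLACEHOLDER, quote_plus(query))


template = "http://127.0.0.1:19080/search?format=json&q=<query>"
print(build_search_url(template, "python programming"))
# http://127.0.0.1:19080/search?format=json&q=python+programming
```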

## Quick Start

The CLI is useful for smoke testing the runtime outside an MCP client.

Search and immediately read the first results:

```bash
sourceweave-search --query "python programming" --read-first-pages 2
```

Read a discovered page and include stored related links:

```bash
sourceweave-search \
  --query "react useEffect cleanup example" \
  --read-first-page \
  --related-links-limit 3
```

Force document conversion for an explicit URL:

```bash
sourceweave-search \
  --query "guide pdf" \
  --url '{"url": "https://example.com/guide.pdf", "convert_document": true}'
```
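
The `--url` value in the last command is a JSON object, so quoting matters when the command is built by a script. The following sketch assembles the same command line from Python, assuming only the `url` and `convert_document` keys shown above. The helper name `url_argument` is illustrative.

```python
import json
import shlex


def url_argument(url: str, convert_document: bool = False) -> str:
    """Serialize one --url entry as a JSON object like the CLI example."""
    return json.dumps({"url": url, "convert_document": convert_document})


argv = [
    "sourceweave-search",
    "--query", "guide pdf",
    "--url", url_argument("https://example.com/guide.pdf", convert_document=True),
]
# shlex.join quotes each argument so the line can be pasted into a POSIX shell.
print(shlex.join(argv))
```

Passing `argv` as a list to `subprocess.run` avoids shell quoting altogether.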

## MCP Server

Run over stdio:

```bash
sourceweave-search-mcp
```

Run as a local HTTP endpoint:

```bash
sourceweave-search-mcp --transport streamable-http --host 127.0.0.1 --port 8000
```

## What MCP Clients Get

MCP clients receive a simple two-step flow:

- a search step that returns compact results plus `page_id` handles
- a follow-up page-read step that returns stored content, focused excerpts, related-link summaries, image metadata, and page-quality hints when relevant

Human operators usually only need to know how to run the server and where to point the runtime endpoints. MCP clients handle the exact tool parameters.

## MCP Client Setup

### OpenCode

Example `opencode.json` / `opencode.jsonc` / `~/.config/opencode/opencode.json`:

```jsonc
{
  "$schema": "https://opencode.ai/config.json",
  "mcp": {
    "sourceweave": {
      "type": "local",
      "command": [
        "uvx",
        "--from",
        "sourceweave-web-search",
        "sourceweave-search-mcp"
      ],
      "environment": {
        "SOURCEWEAVE_SEARCH_SEARXNG_BASE_URL": "http://127.0.0.1:19080/search?format=json&q=<query>",
        "SOURCEWEAVE_SEARCH_CRAWL4AI_BASE_URL": "http://127.0.0.1:19235",
        "SOURCEWEAVE_SEARCH_CACHE_REDIS_URL": "redis://127.0.0.1:16379/2"
      },
      "enabled": true,
      "timeout": 30000
    }
  }
}
```

For a shared HTTP endpoint instead:

```json
{
  "$schema": "https://opencode.ai/config.json",
  "mcp": {
    "sourceweave": {
      "type": "remote",
      "url": "http://127.0.0.1:18000/mcp",
      "enabled": true,
      "timeout": 30000
    }
  }
}
```

### VS Code Copilot

Example `.vscode/mcp.json`:

```json
{
  "servers": {
    "sourceweave": {
      "type": "stdio",
      "command": "uvx",
      "args": [
        "--from",
        "sourceweave-web-search",
        "sourceweave-search-mcp"
      ],
      "env": {
        "SOURCEWEAVE_SEARCH_SEARXNG_BASE_URL": "http://127.0.0.1:19080/search?format=json&q=<query>",
        "SOURCEWEAVE_SEARCH_CRAWL4AI_BASE_URL": "http://127.0.0.1:19235",
        "SOURCEWEAVE_SEARCH_CACHE_REDIS_URL": "redis://127.0.0.1:16379/2"
      }
    }
  }
}
```

For a shared HTTP endpoint instead:

```json
{
  "servers": {
    "sourceweave": {
      "type": "http",
      "url": "http://127.0.0.1:18000/mcp"
    }
  }
}
```
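
Both `stdio` client configurations repeat the same three endpoint variables. If several clients need to stay in sync, the files can be generated from one mapping. The sketch below uses only the keys shown in the examples above; the function names `vscode_config` and `opencode_config` are illustrative.

```python
import json

ENDPOINT_ENV = {
    "SOURCEWEAVE_SEARCH_SEARXNG_BASE_URL": "http://127.0.0.1:19080/search?format=json&q=<query>",
    "SOURCEWEAVE_SEARCH_CRAWL4AI_BASE_URL": "http://127.0.0.1:19235",
    "SOURCEWEAVE_SEARCH_CACHE_REDIS_URL": "redis://127.0.0.1:16379/2",
}
COMMAND = ["uvx", "--from", "sourceweave-web-search", "sourceweave-search-mcp"]


def vscode_config(env: dict[str, str]) -> dict:
    """Build the stdio entry in the shape of the .vscode/mcp.json example."""
    server = {
        "type": "stdio",
        "command": COMMAND[0],
        "args": COMMAND[1:],
        "env": dict(env),
    }
    return {"servers": {"sourceweave": server}}


def opencode_config(env: dict[str, str]) -> dict:
    """Build the local entry in the shape of the opencode.json example."""
    server = {
        "type": "local",
        "command": list(COMMAND),
        "environment": dict(env),
        "enabled": True,
        "timeout": 30000,
    }
    return {"$schema": "https://opencode.ai/config.json", "mcp": {"sourceweave": server}}


print(json.dumps(vscode_config(ENDPOINT_ENV), indent=2))
```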

## Publishing

The manual release workflow at `.github/workflows/release.yml` accepts a changelog and can optionally:

- publish the wheel and sdist to PyPI
- publish the container image to GHCR
- mirror the container image to Docker Hub when Docker Hub credentials are configured

Releases always attach the built distributions and `artifacts/sourceweave_web_search.py` to the GitHub release.

For contributor setup and publishing requirements, see [`CONTRIBUTING.md`](CONTRIBUTING.md).

## OpenWebUI

This repo also ships a generated standalone OpenWebUI tool file at `artifacts/sourceweave_web_search.py`.

From a repo checkout, verify it is in sync with the canonical implementation:

```bash
uv run sourceweave-build-openwebui --check
```

Paste that artifact into OpenWebUI when you want the standalone tool-file deployment path.

## Defaults

Default host-side endpoints used by the package:

- SearXNG: `http://127.0.0.1:19080/search?format=json&q=<query>`
- Crawl4AI: `http://127.0.0.1:19235`
- Redis: `redis://127.0.0.1:16379/2`

Default repo-local ports:

- SearXNG: `19080`
- Crawl4AI: `19235`
- Redis: `16379`
- MCP: `8000` when run directly with `uvx`; `18000` at `/mcp` when using the repo's `mcp` compose service
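
These defaults combine naturally with the environment variables from Runtime Configuration. The following sketch shows the lookup, assuming an unset variable falls back to the default listed above. The `Endpoints` dataclass and `load_endpoints` function are hypothetical names and do not describe the package's own `config` module.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass

# Defaults copied from the list above.
DEFAULTS = {
    "SOURCEWEAVE_SEARCH_SEARXNG_BASE_URL": "http://127.0.0.1:19080/search?format=json&q=<query>",
    "SOURCEWEAVE_SEARCH_CRAWL4AI_BASE_URL": "http://127.0.0.1:19235",
    "SOURCEWEAVE_SEARCH_CACHE_REDIS_URL": "redis://127.0.0.1:16379/2",
}


@dataclass(frozen=True)
class Endpoints:
    searxng_url: str
    crawl4ai_url: str
    redis_url: str


def load_endpoints(env: Mapping[str, str] = os.environ) -> Endpoints:
    """Read the three endpoint variables, using the README defaults when unset."""
    values = {name: env.get(name, default) for name, default in DEFAULTS.items()}
    searxng = values["SOURCEWEAVE_SEARCH_SEARXNG_BASE_URL"]
    if "<query>" not in searxng:
        # The README states the SearXNG template must contain <query>.
        raise ValueError("SOURCEWEAVE_SEARCH_SEARXNG_BASE_URL must contain <query>")
    return Endpoints(
        searxng_url=searxng,
        crawl4ai_url=values["SOURCEWEAVE_SEARCH_CRAWL4AI_BASE_URL"],
        redis_url=values["SOURCEWEAVE_SEARCH_CACHE_REDIS_URL"],
    )


print(load_endpoints({}))
```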

## Contributing

See [`CONTRIBUTING.md`](CONTRIBUTING.md) for local development, verification, packaging notes, and release workflow details.

## License

This project is licensed under the MIT License. See [`LICENSE`](LICENSE).
@@ -0,0 +1,314 @@
# SourceWeave Web Search

<!-- mcp-name: io.github.mrnaqa/sourceweave-web-search -->

SourceWeave Web Search is an MCP server and CLI for web search plus follow-up page reading.

It uses SearXNG for search, Crawl4AI for HTML extraction, and Redis or Valkey for caching.

For most users, the setup is simple:

1. run the supporting services locally in containers, or point at existing external endpoints
2. start the MCP server with `uvx`
3. connect your MCP client to the running server over `stdio` or local HTTP

## Key Features

- MCP server with `stdio`, `sse`, and `streamable-http` transports
- lean search plus follow-up page reading for MCP clients
- explicit per-URL document conversion for PDFs and other supported documents
- focused reads, related-link limits, image metadata, and page-quality hints
- publishable Python package, container image, and generated OpenWebUI artifact
- compatible with OpenCode, VS Code Copilot, and other MCP clients

## Requirements

- Python `3.12+`
- a reachable SearXNG endpoint
- a reachable Crawl4AI endpoint
- a reachable Redis or Valkey instance

Optional:

- Docker and Docker Compose for the repo-local stack

## Recommended Local Deployment

Start the supporting services locally:

```bash
git clone https://github.com/MRNAQA/sourceweave-web-search.git
cd sourceweave-web-search
cp .env.example .env
docker compose up -d redis crawl4ai searxng
```

Then start the MCP server from the published package with `uvx` and point it at those local endpoints:

```bash
SOURCEWEAVE_SEARCH_SEARXNG_BASE_URL="http://127.0.0.1:19080/search?format=json&q=<query>" \
SOURCEWEAVE_SEARCH_CRAWL4AI_BASE_URL="http://127.0.0.1:19235" \
SOURCEWEAVE_SEARCH_CACHE_REDIS_URL="redis://127.0.0.1:16379/2" \
uvx --from sourceweave-web-search sourceweave-search-mcp
```

For a local HTTP MCP endpoint instead of `stdio`:

```bash
SOURCEWEAVE_SEARCH_SEARXNG_BASE_URL="http://127.0.0.1:19080/search?format=json&q=<query>" \
SOURCEWEAVE_SEARCH_CRAWL4AI_BASE_URL="http://127.0.0.1:19235" \
SOURCEWEAVE_SEARCH_CACHE_REDIS_URL="redis://127.0.0.1:16379/2" \
uvx --from sourceweave-web-search sourceweave-search-mcp \
  --transport streamable-http \
  --host 127.0.0.1 \
  --port 8000
```

You can also point the same `uvx` command at externally hosted SearXNG, Crawl4AI, and Redis or Valkey endpoints by changing the environment variables.

## Installation Options

### Python package

Published releases can be installed from PyPI:

```bash
pip install sourceweave-web-search
```

Or run directly without a global install:

```bash
uvx --from sourceweave-web-search sourceweave-search-mcp
uvx --from sourceweave-web-search sourceweave-search --query "python programming"
```

### Repo checkout

For local development or source-based runs:

```bash
git clone https://github.com/MRNAQA/sourceweave-web-search.git
cd sourceweave-web-search
uv sync --locked --group dev
uv run sourceweave-search-mcp
```

### Container image

The release workflow can publish a container image to:

- `ghcr.io/mrnaqa/sourceweave-web-search`
- optionally `docker.io/mrnaqa/sourceweave-web-search` when Docker Hub publishing is configured

Example runtime:

```bash
docker run --rm -p 8000:8000 \
  -e SOURCEWEAVE_SEARCH_SEARXNG_BASE_URL="http://host.docker.internal:19080/search?format=json&q=<query>" \
  -e SOURCEWEAVE_SEARCH_CRAWL4AI_BASE_URL="http://host.docker.internal:19235" \
  -e SOURCEWEAVE_SEARCH_CACHE_REDIS_URL="redis://host.docker.internal:16379/2" \
  ghcr.io/mrnaqa/sourceweave-web-search:latest
```

## Runtime Configuration

Set these environment variables:

| Variable | Purpose |
| --- | --- |
| `SOURCEWEAVE_SEARCH_SEARXNG_BASE_URL` | SearXNG URL template. Must contain `<query>`. |
| `SOURCEWEAVE_SEARCH_CRAWL4AI_BASE_URL` | Crawl4AI base URL. |
| `SOURCEWEAVE_SEARCH_CACHE_REDIS_URL` | Redis or Valkey URL used for caching. |
| `FASTMCP_HOST` | Host for `sse` or `streamable-http` transport. |
| `FASTMCP_PORT` | Port for `sse` or `streamable-http` transport. |

Example:

```bash
SOURCEWEAVE_SEARCH_SEARXNG_BASE_URL="http://127.0.0.1:19080/search?format=json&q=<query>" \
SOURCEWEAVE_SEARCH_CRAWL4AI_BASE_URL="http://127.0.0.1:19235" \
SOURCEWEAVE_SEARCH_CACHE_REDIS_URL="redis://127.0.0.1:16379/2" \
sourceweave-search --query "python programming" --read-first-pages 2
```

## Quick Start

The CLI is useful for smoke testing the runtime outside an MCP client.

Search and immediately read the first results:

```bash
sourceweave-search --query "python programming" --read-first-pages 2
```

Read a discovered page and include stored related links:

```bash
sourceweave-search \
  --query "react useEffect cleanup example" \
  --read-first-page \
  --related-links-limit 3
```

Force document conversion for an explicit URL:

```bash
sourceweave-search \
  --query "guide pdf" \
  --url '{"url": "https://example.com/guide.pdf", "convert_document": true}'
```

## MCP Server

Run over stdio:

```bash
sourceweave-search-mcp
```

Run as a local HTTP endpoint:

```bash
sourceweave-search-mcp --transport streamable-http --host 127.0.0.1 --port 8000
```

## What MCP Clients Get

MCP clients receive a simple two-step flow:

- a search step that returns compact results plus `page_id` handles
- a follow-up page-read step that returns stored content, focused excerpts, related-link summaries, image metadata, and page-quality hints when relevant

Human operators usually only need to know how to run the server and where to point the runtime endpoints. MCP clients handle the exact tool parameters.

## MCP Client Setup

### OpenCode

Example `opencode.json` / `opencode.jsonc` / `~/.config/opencode/opencode.json`:

```jsonc
{
  "$schema": "https://opencode.ai/config.json",
  "mcp": {
    "sourceweave": {
      "type": "local",
      "command": [
        "uvx",
        "--from",
        "sourceweave-web-search",
        "sourceweave-search-mcp"
      ],
      "environment": {
        "SOURCEWEAVE_SEARCH_SEARXNG_BASE_URL": "http://127.0.0.1:19080/search?format=json&q=<query>",
        "SOURCEWEAVE_SEARCH_CRAWL4AI_BASE_URL": "http://127.0.0.1:19235",
        "SOURCEWEAVE_SEARCH_CACHE_REDIS_URL": "redis://127.0.0.1:16379/2"
      },
      "enabled": true,
      "timeout": 30000
    }
  }
}
```

For a shared HTTP endpoint instead:

```json
{
  "$schema": "https://opencode.ai/config.json",
  "mcp": {
    "sourceweave": {
      "type": "remote",
      "url": "http://127.0.0.1:18000/mcp",
      "enabled": true,
      "timeout": 30000
    }
  }
}
```

### VS Code Copilot

Example `.vscode/mcp.json`:

```json
{
  "servers": {
    "sourceweave": {
      "type": "stdio",
      "command": "uvx",
      "args": [
        "--from",
        "sourceweave-web-search",
        "sourceweave-search-mcp"
      ],
      "env": {
        "SOURCEWEAVE_SEARCH_SEARXNG_BASE_URL": "http://127.0.0.1:19080/search?format=json&q=<query>",
        "SOURCEWEAVE_SEARCH_CRAWL4AI_BASE_URL": "http://127.0.0.1:19235",
        "SOURCEWEAVE_SEARCH_CACHE_REDIS_URL": "redis://127.0.0.1:16379/2"
      }
    }
  }
}
```

For a shared HTTP endpoint instead:

```json
{
  "servers": {
    "sourceweave": {
      "type": "http",
      "url": "http://127.0.0.1:18000/mcp"
    }
  }
}
```

## Publishing

The manual release workflow at `.github/workflows/release.yml` accepts a changelog and can optionally:

- publish the wheel and sdist to PyPI
- publish the container image to GHCR
- mirror the container image to Docker Hub when Docker Hub credentials are configured

Releases always attach the built distributions and `artifacts/sourceweave_web_search.py` to the GitHub release.

For contributor setup and publishing requirements, see [`CONTRIBUTING.md`](CONTRIBUTING.md).

## OpenWebUI

This repo also ships a generated standalone OpenWebUI tool file at `artifacts/sourceweave_web_search.py`.

From a repo checkout, verify it is in sync with the canonical implementation:

```bash
uv run sourceweave-build-openwebui --check
```

Paste that artifact into OpenWebUI when you want the standalone tool-file deployment path.

## Defaults

Default host-side endpoints used by the package:

- SearXNG: `http://127.0.0.1:19080/search?format=json&q=<query>`
- Crawl4AI: `http://127.0.0.1:19235`
- Redis: `redis://127.0.0.1:16379/2`

Default repo-local ports:

- SearXNG: `19080`
- Crawl4AI: `19235`
- Redis: `16379`
- MCP: `8000` when run directly with `uvx`; `18000` at `/mcp` when using the repo's `mcp` compose service

## Contributing

See [`CONTRIBUTING.md`](CONTRIBUTING.md) for local development, verification, packaging notes, and release workflow details.

## License

This project is licensed under the MIT License. See [`LICENSE`](LICENSE).