mcp-researchpowerpack 3.6.18 → 3.7.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +110 -624
- package/dist/clients/reddit.js +7 -7
- package/dist/clients/reddit.js.map +1 -1
- package/dist/clients/research.js +2 -2
- package/dist/clients/research.js.map +1 -1
- package/dist/clients/scraper.js +1 -1
- package/dist/clients/scraper.js.map +1 -1
- package/dist/clients/search.js +1 -1
- package/dist/clients/search.js.map +1 -1
- package/dist/config/yaml/tools.yaml +6 -5
- package/dist/schemas/scrape-links.d.ts +3 -0
- package/dist/schemas/scrape-links.d.ts.map +1 -1
- package/dist/schemas/scrape-links.js +5 -0
- package/dist/schemas/scrape-links.js.map +1 -1
- package/dist/schemas/web-search.js +2 -2
- package/dist/schemas/web-search.js.map +1 -1
- package/dist/services/llm-processor.d.ts +1 -0
- package/dist/services/llm-processor.d.ts.map +1 -1
- package/dist/services/llm-processor.js +29 -2
- package/dist/services/llm-processor.js.map +1 -1
- package/dist/tools/reddit.d.ts.map +1 -1
- package/dist/tools/reddit.js +46 -27
- package/dist/tools/reddit.js.map +1 -1
- package/dist/tools/registry.d.ts.map +1 -1
- package/dist/tools/registry.js +23 -3
- package/dist/tools/registry.js.map +1 -1
- package/dist/tools/research.d.ts.map +1 -1
- package/dist/tools/research.js +35 -17
- package/dist/tools/research.js.map +1 -1
- package/dist/tools/scrape.d.ts.map +1 -1
- package/dist/tools/scrape.js +21 -13
- package/dist/tools/scrape.js.map +1 -1
- package/dist/tools/search.d.ts.map +1 -1
- package/dist/tools/search.js +14 -10
- package/dist/tools/search.js.map +1 -1
- package/dist/tools/utils.d.ts.map +1 -1
- package/dist/tools/utils.js +4 -2
- package/dist/tools/utils.js.map +1 -1
- package/dist/utils/errors.js +3 -3
- package/dist/utils/errors.js.map +1 -1
- package/dist/utils/response.d.ts +1 -0
- package/dist/utils/response.d.ts.map +1 -1
- package/dist/utils/response.js +21 -13
- package/dist/utils/response.js.map +1 -1
- package/dist/version.d.ts +1 -1
- package/dist/version.js +2 -2
- package/dist/version.js.map +1 -1
- package/package.json +4 -4
package/README.md
CHANGED
@@ -1,191 +1,34 @@
-
-<h3 align="center">Stop tab-hopping for research. Start getting structured context.</h3>
-
-<p align="center">
-<strong>
-<em>The ultimate research toolkit for your AI coding assistant. It searches the web, mines Reddit, scrapes any URL, and synthesizes everything into perfectly structured context your LLM actually understands.</em>
-</strong>
-</p>
-
-<p align="center">
-<!-- Package Info -->
-<a href="https://www.npmjs.com/package/mcp-researchpowerpack"><img alt="npm" src="https://img.shields.io/npm/v/mcp-researchpowerpack.svg?style=flat-square&color=4D87E6"></a>
-<a href="#"><img alt="node" src="https://img.shields.io/badge/node-18+-4D87E6.svg?style=flat-square"></a>
-•
-<!-- Features -->
-<a href="https://opensource.org/licenses/MIT"><img alt="license" src="https://img.shields.io/badge/License-MIT-F9A825.svg?style=flat-square"></a>
-<a href="#"><img alt="platform" src="https://img.shields.io/badge/platform-macOS_|_Linux_|_Windows-2ED573.svg?style=flat-square"></a>
-</p>
-
-<p align="center">
-<img alt="modular" src="https://img.shields.io/badge/🧩_modular-use_1_tool_or_all_5-2ED573.svg?style=for-the-badge">
-<img alt="zero crash" src="https://img.shields.io/badge/💪_zero_crash-missing_keys_=_helpful_errors-2ED573.svg?style=for-the-badge">
-</p>
-
-<div align="center">
-
-### 🧭 Quick Navigation
-
-[**⚡ Get Started**](#-get-started-in-60-seconds) •
-[**🎯 Why Research Powerpack**](#-why-research-powerpack) •
-[**🎮 Tools**](#-tool-reference) •
-[**⚙️ Configuration**](#%EF%B8%8F-environment-variables--tool-availability) •
-[**📚 Examples**](#-recommended-workflows)
-
-</div>
-
----
-
-**`research-powerpack-mcp`** is the research assistant your AI has been missing. Stop asking your LLM to guess about things it doesn't know. This MCP server acts like a senior researcher -- searching the web, mining Reddit discussions, scraping documentation, and synthesizing everything into structured context so your AI can give you answers you can actually trust.
-
-<div align="center">
-<table>
-<tr>
-<td align="center">
-<h3>🔍</h3>
-<b>Batch Web Search</b><br/>
-<sub>100 keywords in parallel</sub>
-</td>
-<td align="center">
-<h3>💬</h3>
-<b>Reddit Mining</b><br/>
-<sub>Real opinions, not marketing</sub>
-</td>
-<td align="center">
-<h3>🌐</h3>
-<b>Universal Scraping</b><br/>
-<sub>JS rendering + geo-targeting</sub>
-</td>
-<td align="center">
-<h3>🧠</h3>
-<b>Deep Research</b><br/>
-<sub>AI synthesis with citations</sub>
-</td>
-</tr>
-</table>
-</div>
-
-Here's how it works:
-- **You:** "What's the best database for my use case?"
-- **AI + Powerpack:** Searches Google, mines Reddit threads, scrapes docs, synthesizes findings.
-- **You:** Get an actually informed answer with real community opinions and citations.
-- **Result:** Better decisions, faster. No more juggling 47 browser tabs.
-
----
-
-## 🎯 Why Research Powerpack
-
-Manual research is tedious and error-prone. `research-powerpack-mcp` replaces that entire workflow with a single integrated pipeline.
-
-<table align="center">
-<tr>
-<td align="center"><b>❌ Without Research Powerpack</b></td>
-<td align="center"><b>✅ With Research Powerpack</b></td>
-</tr>
-<tr>
-<td>
-<ol>
-<li>Open 15 browser tabs.</li>
-<li>Skim Stack Overflow answers from 2019.</li>
-<li>Search Reddit, get distracted along the way.</li>
-<li>Copy-paste random snippets to your AI.</li>
-<li>Get a mediocre answer from confused context.</li>
-</ol>
-</td>
-<td>
-<ol>
-<li>Ask your AI to research it.</li>
-<li>AI searches, scrapes, mines Reddit automatically.</li>
-<li>Receive synthesized insights with sources.</li>
-<li>Make an informed decision.</li>
-<li>Move on to the work that matters. ☕</li>
-</ol>
-</td>
-</tr>
-</table>
-
-This isn't just fetching random pages. Research Powerpack builds **high-signal, low-noise context** with CTR-weighted ranking, smart comment allocation, and intelligent token distribution that prevents massive responses from breaking your LLM's context window.
-
----
-
-## 🚀 Get Started in 60 Seconds
-
-### 1. Install
+MCP server that gives your AI assistant research tools. Google search, Reddit deep-dives, web scraping with LLM extraction, and multi-model deep research — all as MCP tools that chain into each other.

 ```bash
-
+npx mcp-researchpowerpack
 ```

-
+five tools, zero config to start. each API key you add unlocks more capabilities.

-
+[](https://www.npmjs.com/package/mcp-researchpowerpack)
+[](https://nodejs.org/)
+[](https://opensource.org/licenses/MIT)

-
-|:------:|:-----------:|:----:|
-| 🖥️ **Claude Desktop** | `claude_desktop_config.json` | [Setup](#claude-desktop) |
-| ⌨️ **Claude Code** | `~/.claude.json` or CLI | [Setup](#claude-code-cli) |
-| 🎯 **Cursor** | `.cursor/mcp.json` | [Setup](#cursorwindsurf) |
-| 🏄 **Windsurf** | MCP settings | [Setup](#cursorwindsurf) |
-
-</div>
-
-#### Claude Desktop
-
-Add to your `claude_desktop_config.json`:
-
-```json
-{
-"mcpServers": {
-"research-powerpack": {
-"command": "npx",
-"args": ["mcp-researchpowerpack"],
-"env": {
-"SERPER_API_KEY": "your_key",
-"REDDIT_CLIENT_ID": "your_id",
-"REDDIT_CLIENT_SECRET": "your_secret",
-"SCRAPEDO_API_KEY": "your_key",
-"OPENROUTER_API_KEY": "your_key"
-}
-}
-}
-}
-```
+---

-
+## tools

-
-
-
-
-
-
-
-"REDDIT_CLIENT_ID": "xxx",
-"REDDIT_CLIENT_SECRET": "xxx",
-"RESEARCH_MODEL": "xxxx",
-"SCRAPEDO_API_KEY": "xxx",
-"SERPER_API_KEY": "xxxx"
-}
-}' | tee ~/Library/Application\ Support/Claude/claude_desktop_config.json
-```
+| tool | what it does | requires |
+|:---|:---|:---|
+| `web_search` | parallel Google search across 3-100 keywords, CTR-weighted ranking, consensus detection | `SERPER_API_KEY` |
+| `search_reddit` | same engine but filtered to reddit.com, 10-50 queries in parallel | `SERPER_API_KEY` |
+| `get_reddit_post` | fetches 2-50 Reddit posts with full comment trees, optional LLM extraction | `REDDIT_CLIENT_ID` + `REDDIT_CLIENT_SECRET` |
+| `scrape_links` | scrapes 1-50 URLs with JS rendering fallback, HTML-to-markdown, optional LLM extraction | `SCRAPEDO_API_KEY` |
+| `deep_research` | sends questions to research-capable models (Grok, Gemini) with web search enabled, supports local file attachments | `OPENROUTER_API_KEY` |

-
+tools are designed to chain: `web_search` suggests calling `scrape_links`, which suggests `search_reddit`, which suggests `get_reddit_post`, which suggests `deep_research` for synthesis.

-
+## install

-
-claude mcp add research-powerpack npx \
---scope user \
---env SERPER_API_KEY=your_key \
---env REDDIT_CLIENT_ID=your_id \
---env REDDIT_CLIENT_SECRET=your_secret \
---env OPENROUTER_API_KEY=your_key \
---env OPENROUTER_BASE_URL=https://openrouter.ai/api/v1 \
---env RESEARCH_MODEL=x-ai/grok-4.1-fast \
--- research-powerpack-mcp
-```
+### Claude Desktop / Claude Code

-
+add to your MCP config:

 ```json
 {
@@ -194,496 +37,139 @@ Or manually add to `~/.claude.json`:
 "command": "npx",
 "args": ["mcp-researchpowerpack"],
 "env": {
-"SERPER_API_KEY": "
-"
-"REDDIT_CLIENT_SECRET": "your_secret",
-"OPENROUTER_API_KEY": "your_key",
-"OPENROUTER_BASE_URL": "https://openrouter.ai/api/v1",
-"RESEARCH_MODEL": "x-ai/grok-4.1-fast"
+"SERPER_API_KEY": "...",
+"OPENROUTER_API_KEY": "..."
 }
 }
 }
 }
 ```

-
-
-Add to `.cursor/mcp.json` or equivalent:
-
-```json
-{
-"mcpServers": {
-"research-powerpack": {
-"command": "npx",
-"args": ["mcp-researchpowerpack"],
-"env": {
-"SERPER_API_KEY": "your_key"
-}
-}
-}
-}
-```
-
-> **Zero Crash Promise:** Missing API keys? No problem. The server always starts. Tools that require missing keys return helpful setup instructions instead of crashing.
-
----
-
-## 🌐 Transport Modes
-
-Research Powerpack supports three transport modes:
-
-| Mode | Use Case | How to Start |
-|------|----------|-------------|
-| **STDIO** (default) | Claude Desktop, Cursor, Windsurf | `npx mcp-researchpowerpack` |
-| **HTTP Streamable** | Self-hosted, Docker, LAN sharing | `MCP_TRANSPORT=http npx mcp-researchpowerpack` |
-| **Cloudflare Workers** | Serverless, globally distributed | Already deployed ↓ |
-
-### Remote MCP (Cloudflare Workers)
-
-A remote MCP endpoint is deployed and ready to use:
-
-```
-https://mcp-researchpowerpack.workers.yigitkonur.com/mcp
-```
-
-Connect from any MCP client that supports HTTP Streamable transport:
-
-```json
-{
-"mcpServers": {
-"research-powerpack-remote": {
-"type": "streamable-http",
-"url": "https://mcp-researchpowerpack.workers.yigitkonur.com/mcp"
-}
-}
-}
-```
-
-### Self-Hosted HTTP Streamable
+### from source

 ```bash
-
-
-
-
-MCP_TRANSPORT=http MCP_PORT=8080 npx mcp-researchpowerpack
-```
-
-```json
-{
-"mcpServers": {
-"research-powerpack-http": {
-"type": "streamable-http",
-"url": "http://localhost:3001/mcp"
-}
-}
-}
-```
-
----
-
-## 🎮 Tool Reference
-
-<div align="center">
-<table>
-<tr>
-<td align="center">
-<h3>🔍</h3>
-<b><code>web_search</code></b><br/>
-<sub>Batch Google search</sub>
-</td>
-<td align="center">
-<h3>💬</h3>
-<b><code>search_reddit</code></b><br/>
-<sub>Find Reddit discussions</sub>
-</td>
-<td align="center">
-<h3>📖</h3>
-<b><code>get_reddit_post</code></b><br/>
-<sub>Fetch posts + comments</sub>
-</td>
-<td align="center">
-<h3>🌐</h3>
-<b><code>scrape_links</code></b><br/>
-<sub>Extract any URL</sub>
-</td>
-<td align="center">
-<h3>🧠</h3>
-<b><code>deep_research</code></b><br/>
-<sub>AI synthesis</sub>
-</td>
-</tr>
-</table>
-</div>
-
-### `web_search`
-
-**Batch web search** using Google via Serper API. Search up to 100 keywords in parallel.
-
-| Parameter | Type | Required | Description |
-|-----------|------|----------|-------------|
-| `keywords` | `string[]` | Yes | Search queries (1-100). Use distinct keywords for maximum coverage. |
-
-**Supports Google operators:** `site:`, `-exclusion`, `"exact phrase"`, `filetype:`
-
-```json
-{
-"keywords": [
-"best IDE 2025",
-"VS Code alternatives",
-"Cursor vs Windsurf comparison"
-]
-}
-```
-
----
-
-### `search_reddit`
-
-**Search Reddit** via Google with automatic `site:reddit.com` filtering.
-
-| Parameter | Type | Required | Description |
-|-----------|------|----------|-------------|
-| `queries` | `string[]` | Yes | Search queries (max 10) |
-| `date_after` | `string` | No | Filter results after date (YYYY-MM-DD) |
-
-**Search operators:** `intitle:keyword`, `"exact phrase"`, `OR`, `-exclude`
-
-```json
-{
-"queries": [
-"best mechanical keyboard 2025",
-"intitle:keyboard recommendation"
-],
-"date_after": "2024-01-01"
-}
+git clone https://github.com/yigitkonur/mcp-research-powerpack.git
+cd mcp-research-powerpack
+pnpm install && pnpm build
+pnpm start
 ```

-
-
-### `get_reddit_post`
-
-**Fetch Reddit posts** with smart comment allocation (1,000 comment budget distributed automatically).
-
-| Parameter | Type | Required | Default | Description |
-|-----------|------|----------|---------|-------------|
-| `urls` | `string[]` | Yes | — | Reddit post URLs (2-50) |
-| `fetch_comments` | `boolean` | No | `true` | Whether to fetch comments |
-| `max_comments` | `number` | No | auto | Override comment allocation |
-
-**Smart Allocation:**
-- 2 posts → ~500 comments/post (deep dive)
-- 10 posts → ~100 comments/post
-- 50 posts → ~20 comments/post (quick scan)
-
-```json
-{
-"urls": [
-"https://reddit.com/r/programming/comments/abc123/post_title",
-"https://reddit.com/r/webdev/comments/def456/another_post"
-]
-}
-```
-
----
-
-### `scrape_links`
-
-**Universal URL content extraction** with automatic fallback modes.
-
-| Parameter | Type | Required | Default | Description |
-|-----------|------|----------|---------|-------------|
-| `urls` | `string[]` | Yes | — | URLs to scrape (3-50) |
-| `timeout` | `number` | No | `30` | Timeout per URL (seconds) |
-| `use_llm` | `boolean` | No | `false` | Enable AI extraction |
-| `what_to_extract` | `string` | No | — | Extraction instructions for AI |
-
-**Automatic Fallback:** Basic → JS rendering → JS + US geo-targeting
-
-```json
-{
-"urls": ["https://example.com/article1", "https://example.com/article2"],
-"use_llm": true,
-"what_to_extract": "Extract the main arguments and key statistics"
-}
-```
+### HTTP mode

-
-
-### `deep_research`
-
-**AI-powered batch research** with web search and citations.
-
-| Parameter | Type | Required | Description |
-|-----------|------|----------|-------------|
-| `questions` | `object[]` | Yes | Research questions (2-10) |
-| `questions[].question` | `string` | Yes | The research question |
-| `questions[].file_attachments` | `object[]` | No | Files to include as context |
-
-**Token Allocation:** 32,000 tokens distributed across questions:
-- 2 questions → 16,000 tokens/question (deep dive)
-- 10 questions → 3,200 tokens/question (rapid multi-topic)
-
-```json
-{
-"questions": [
-{ "question": "What are the current best practices for React Server Components in 2025?" },
-{ "question": "Compare Bun vs Node.js for production workloads with benchmarks." }
-]
-}
+```bash
+MCP_TRANSPORT=http MCP_PORT=3000 npx mcp-researchpowerpack
 ```

-
-
-## ⚙️ Environment Variables & Tool Availability
-
-Research Powerpack uses a **modular architecture**. Tools are automatically enabled based on which API keys you provide:
+exposes `/mcp` (POST/GET/DELETE with session headers) and `/health`.

-
+## API keys

-
-|:------------:|:-------------:|:---------:|
-| `SERPER_API_KEY` | `web_search`, `search_reddit` | 2,500 queries/mo |
-| `REDDIT_CLIENT_ID` + `SECRET` | `get_reddit_post` | Unlimited |
-| `SCRAPEDO_API_KEY` | `scrape_links` | 1,000 credits/mo |
-| `OPENROUTER_API_KEY` | `deep_research` + AI in `scrape_links` | Pay-as-you-go |
-| `RESEARCH_MODEL` | Model for `deep_research` | Default: `perplexity/sonar-deep-research` |
-| `LLM_EXTRACTION_MODEL` | Model for AI extraction in `scrape_links` | Default: `openrouter/gpt-oss-120b:nitro` |
+each key unlocks a capability. missing keys silently disable their tools — the server never crashes.

-
+| variable | enables | free tier |
+|:---|:---|:---|
+| `SERPER_API_KEY` | `web_search`, `search_reddit` | 2,500 searches/mo at serper.dev |
+| `REDDIT_CLIENT_ID` + `REDDIT_CLIENT_SECRET` | `get_reddit_post` | unlimited (reddit.com/prefs/apps, "script" type) |
+| `SCRAPEDO_API_KEY` | `scrape_links` | 1,000 credits/mo at scrape.do |
+| `OPENROUTER_API_KEY` | `deep_research`, LLM extraction in scrape/reddit | pay-per-token at openrouter.ai |

-
+## configuration

-
-# Search-only mode (just web_search and search_reddit)
-SERPER_API_KEY=xxx
-
-# Reddit research mode (search + fetch posts)
-SERPER_API_KEY=xxx
-REDDIT_CLIENT_ID=xxx
-REDDIT_CLIENT_SECRET=xxx
-
-# Full research mode (all 5 tools)
-SERPER_API_KEY=xxx
-REDDIT_CLIENT_ID=xxx
-REDDIT_CLIENT_SECRET=xxx
-SCRAPEDO_API_KEY=xxx
-OPENROUTER_API_KEY=xxx
-```
-
-### Full Power Mode
-
-For the best research experience, configure all four API keys:
-
-```bash
-SERPER_API_KEY=your_serper_key # Free: 2,500 queries/month
-REDDIT_CLIENT_ID=your_reddit_id # Free: Unlimited
-REDDIT_CLIENT_SECRET=your_reddit_secret
-SCRAPEDO_API_KEY=your_scrapedo_key # Free: 1,000 credits/month
-OPENROUTER_API_KEY=your_openrouter_key # Pay-as-you-go
-```
+optional tuning via environment variables:

-
-
-
--
--
-
-
-
-
-
-<details>
-<summary><b>🔍 Serper API (Google Search) — FREE: 2,500 queries/month</b></summary>
-
-#### What you get
-- Fast Google search results via API
-- Enables `web_search` and `search_reddit` tools
-
-#### Setup Steps
-1. Go to [serper.dev](https://serper.dev)
-2. Click **"Get API Key"** (top right)
-3. Sign up with email or Google
-4. Copy your API key from the dashboard
-5. Add to your config:
-```
-SERPER_API_KEY=your_key_here
-```
-
-#### Pricing
-- **Free**: 2,500 queries/month
-- **Paid**: $50/month for 50,000 queries
-
-</details>
-
-<details>
-<summary><b>🤖 Reddit OAuth — FREE: Unlimited access</b></summary>
-
-#### What you get
-- Full Reddit API access
-- Fetch posts and comments with upvote sorting
-- Enables `get_reddit_post` tool
-
-#### Setup Steps
-1. Go to [reddit.com/prefs/apps](https://www.reddit.com/prefs/apps)
-2. Scroll down and click **"create another app..."**
-3. Fill in:
-- **Name**: `research-powerpack` (or any name)
-- **App type**: Select **"script"** (important!)
-- **Redirect URI**: `http://localhost:8080`
-4. Click **"create app"**
-5. Copy your credentials:
-- **Client ID**: The string under your app name
-- **Client Secret**: The "secret" field
-6. Add to your config:
-```
-REDDIT_CLIENT_ID=your_client_id
-REDDIT_CLIENT_SECRET=your_client_secret
-```
-
-</details>
-
-<details>
-<summary><b>🌐 Scrape.do (Web Scraping) — FREE: 1,000 credits/month</b></summary>
-
-#### What you get
-- JavaScript rendering support
-- Geo-targeting and CAPTCHA handling
-- Enables `scrape_links` tool
-
-#### Setup Steps
-1. Go to [scrape.do](https://scrape.do)
-2. Click **"Start Free"**
-3. Sign up with email
-4. Copy your API key from the dashboard
-5. Add to your config:
-```
-SCRAPEDO_API_KEY=your_key_here
-```
-
-#### Credit Usage
-- **Basic scrape**: 1 credit
-- **JavaScript rendering**: 5 credits
-- **Geo-targeting**: +25 credits
-
-</details>
-
-<details>
-<summary><b>🧠 OpenRouter (AI Models) — Pay-as-you-go</b></summary>
-
-#### What you get
-- Access to 100+ AI models via one API
-- Enables `deep_research` tool
-- Enables AI extraction in `scrape_links`
-
-#### Setup Steps
-1. Go to [openrouter.ai](https://openrouter.ai)
-2. Sign up with Google/GitHub/email
-3. Go to [openrouter.ai/keys](https://openrouter.ai/keys)
-4. Click **"Create Key"**
-5. Copy the key (starts with `sk-or-...`)
-6. Add to your config:
-```
-OPENROUTER_API_KEY=sk-or-v1-xxxxx
-```
-
-#### Recommended Models for Deep Research
-```bash
-# Default (optimized for research)
-RESEARCH_MODEL=perplexity/sonar-deep-research
+| variable | default | description |
+|:---|:---|:---|
+| `RESEARCH_MODEL` | `x-ai/grok-4-fast` | primary deep research model |
+| `RESEARCH_FALLBACK_MODEL` | `google/gemini-2.5-flash` | fallback if primary fails |
+| `LLM_EXTRACTION_MODEL` | `openai/gpt-oss-120b:nitro` | default model for scrape/reddit LLM extraction (can be overridden per-request via the `model` parameter in `scrape_links`) |
+| `DEFAULT_REASONING_EFFORT` | `high` | research depth (`low`, `medium`, `high`) |
+| `DEFAULT_MAX_URLS` | `100` | max search results per research question (10-200) |
+| `API_TIMEOUT_MS` | `1800000` | request timeout in ms (default 30 min) |
+| `MCP_TRANSPORT` | `stdio` | `stdio` or `http` |
+| `MCP_PORT` | `3000` | port for HTTP mode |

-
-RESEARCH_MODEL=x-ai/grok-4.1-fast
+## how it works

-
-RESEARCH_MODEL=anthropic/claude-3.5-sonnet
+### search ranking

-
-RESEARCH_MODEL=openai/gpt-4o-mini
-```
+results from multiple queries are deduplicated by normalized URL and scored using CTR-weighted position values (position 1 = 100.0, position 10 = 12.56). URLs appearing across multiple queries get a consensus marker. threshold tries >= 3, falls back to >= 2, then >= 1.
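The added ranking paragraph can be sketched roughly as follows. Only the two stated CTR anchor values (100.0 at position 1, 12.56 at position 10) and the 3 → 2 → 1 threshold fallback come from the README; the intermediate CTR weights, the `aggregate`/`Hit` names, and the URL normalization are illustrative assumptions, not the package's actual code:

```typescript
// Illustrative sketch of CTR-weighted dedup + consensus detection.
type Hit = { url: string; position: number };

// Assumed CTR curve; only positions 1 and 10 are stated in the README.
const CTR = [100.0, 49.0, 33.0, 24.0, 19.0, 17.0, 15.0, 14.0, 13.0, 12.56];

function aggregate(queries: Hit[][], minConsensus = 3) {
  const scores = new Map<string, { score: number; seenIn: number }>();
  for (const hits of queries) {
    for (const { url, position } of hits) {
      const key = url.replace(/\/$/, "").toLowerCase(); // assumed normalization
      const prev = scores.get(key) ?? { score: 0, seenIn: 0 };
      scores.set(key, {
        score: prev.score + (CTR[position - 1] ?? 1),
        seenIn: prev.seenIn + 1, // how many queries surfaced this URL
      });
    }
  }
  // consensus threshold: try >= 3, fall back to >= 2, then >= 1
  for (const t of [minConsensus, 2, 1]) {
    const consensus = [...scores.entries()].filter(([, v]) => v.seenIn >= t);
    if (consensus.length > 0) return { threshold: t, consensus };
  }
  return { threshold: 1, consensus: [] as [string, { score: number; seenIn: number }][] };
}
```

A URL hit at position 1 in one query and position 10 in another would score 100.0 + 12.56 and pass the fallback threshold of 2.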

-
-```bash
-# Default (fast and cost-effective for extraction)
-LLM_EXTRACTION_MODEL=openrouter/gpt-oss-120b:nitro
+### Reddit comment budget

-
-LLM_EXTRACTION_MODEL=anthropic/claude-3.5-sonnet
+global budget of 1,000 comments, max 200 per post. after the first pass, surplus from posts with fewer comments is redistributed to truncated posts in a second fetch pass.
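The two-pass budget described in that paragraph can be sketched like this; the 1,000/200 numbers come from the README, while the function shape and names are assumptions about how such an allocator might look, not the actual source:

```typescript
// Illustrative two-pass comment-budget allocator.
const GLOBAL_BUDGET = 1000; // total comments across all posts (from README)
const PER_POST_CAP = 200;   // hard per-post ceiling (from README)

function allocate(commentCounts: number[]): number[] {
  const fair = Math.min(PER_POST_CAP, Math.floor(GLOBAL_BUDGET / commentCounts.length));
  // pass 1: each post gets up to its fair share
  const first = commentCounts.map((c) => Math.min(c, fair));
  // surplus comes from posts that had fewer comments than their share
  let surplus = GLOBAL_BUDGET - first.reduce((a, b) => a + b, 0);
  // pass 2: redistribute surplus to truncated posts, still respecting the cap
  return first.map((got, i) => {
    const want = Math.min(commentCounts[i], PER_POST_CAP) - got;
    const extra = Math.min(Math.max(want, 0), surplus);
    surplus -= extra;
    return got + extra;
  });
}
```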

-
-LLM_EXTRACTION_MODEL=openai/gpt-4o-mini
-```
+### scraping pipeline

-
+three-mode fallback per URL: basic → JS rendering → JS + US geo-targeting. results go through HTML-to-markdown conversion (turndown), then optional LLM extraction with a 100k char input cap and 8,000 token output per URL. the extraction model defaults to `openai/gpt-oss-120b:nitro` (configurable via `LLM_EXTRACTION_MODEL` env var) and can be overridden per-request using the `model` parameter.

-
+### deep research

-
+32,000 token budget divided across questions (1 question = 32k, 10 questions = 3.2k each). Gemini models get `google_search` tool access. Grok/Perplexity get `search_parameters` with citations. primary model fails → automatic fallback.
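The token split stated there is a straight division of the 32,000-token budget; a minimal sketch (the function name is an assumption):

```typescript
// Per-question output budget: total divided evenly across questions.
const TOTAL_BUDGET = 32_000;

function tokensPerQuestion(questionCount: number): number {
  return Math.floor(TOTAL_BUDGET / questionCount);
}
```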

-
+### file attachments

-
+`deep_research` can read local files and include them as context. files over 600 lines are smart-truncated (first 500 + last 100 lines). line numbers preserved.
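The truncation rule (over 600 lines → first 500 + last 100, original line numbers kept) can be sketched as below; the 500/100/600 thresholds are from the README, while the function name and the omitted-lines marker are illustrative assumptions:

```typescript
// Illustrative "smart truncation" keeping head and tail with line numbers.
function smartTruncate(lines: string[], head = 500, tail = 100, limit = 600): string[] {
  if (lines.length <= limit) {
    return lines.map((l, i) => `${i + 1}: ${l}`); // line numbers preserved
  }
  const omitted = lines.length - head - tail;
  return [
    ...lines.slice(0, head).map((l, i) => `${i + 1}: ${l}`),
    `... ${omitted} lines omitted ...`, // assumed marker format
    // tail keeps its original line numbers, not renumbered ones
    ...lines.slice(-tail).map((l, i) => `${lines.length - tail + i + 1}: ${l}`),
  ];
}
```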

-
-1. web_search → ["React vs Vue 2025", "Next.js vs Nuxt comparison"]
-2. search_reddit → ["best frontend framework 2025", "Next.js production experience"]
-3. get_reddit_post → [URLs from step 2]
-4. scrape_links → [Documentation and blog URLs from step 1]
-5. deep_research → [Synthesize findings into specific questions]
-```
+## concurrency

-
+| operation | parallel limit |
+|:---|:---|
+| web search keywords | 8 |
+| Reddit search queries | 8 |
+| Reddit post fetches per batch | 5 (batches of 10) |
+| URL scraping per batch | 10 (batches of 30) |
+| LLM extraction | 3 |
+| deep research questions | 3 |
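Limits like those in the table are typically enforced by a bounded parallel map, in the spirit of the `pMap` utility the project structure lists under `src/utils/concurrency.ts`. A minimal sketch (not the actual source):

```typescript
// Minimal bounded-parallel map: at most `limit` tasks in flight at once.
async function pMap<T, R>(
  items: T[],
  fn: (item: T) => Promise<R>,
  limit: number,
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0; // shared cursor; safe because JS is single-threaded between awaits
  async function worker() {
    while (next < items.length) {
      const i = next++;
      results[i] = await fn(items[i]);
    }
  }
  await Promise.all(Array.from({ length: Math.min(limit, items.length) }, worker));
  return results;
}
```

Results come back in input order regardless of which task finishes first.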

-
-1. web_search → ["competitor name review", "competitor vs alternatives"]
-2. scrape_links → [Competitor websites, review sites]
-3. search_reddit → ["competitor name experience", "switching from competitor"]
-4. get_reddit_post → [URLs from step 3]
-```
+all clients use manual retry with exponential backoff and jitter. the OpenAI SDK's built-in retry is disabled (`maxRetries: 0`).
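That retry behavior can be sketched like this; the backoff base, attempt count, and jitter range here are assumptions, not the package's actual values:

```typescript
// Illustrative manual retry with exponential backoff and jitter.
async function withRetry<T>(
  fn: () => Promise<T>,
  maxAttempts = 3,
  baseMs = 500,
): Promise<T> {
  let lastErr: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastErr = err;
      // exponential backoff: base * 2^attempt, scaled by random jitter
      const delay = baseMs * 2 ** attempt * (0.5 + Math.random() * 0.5);
      await new Promise((r) => setTimeout(r, delay));
    }
  }
  throw lastErr;
}
```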

-
+## project structure

 ```
-
-
-
-
+src/
+  index.ts — entry point, STDIO + HTTP transport, signal handling
+  worker.ts — Cloudflare Workers entry (Durable Objects)
+  config/
+    index.ts — env parsing (lazy Proxy objects), capability detection
+    loader.ts — YAML → Zod → JSON Schema pipeline, cached
+    yaml/tools.yaml — single source of truth for all tool definitions
+  schemas/
+    deep-research.ts — Zod validation for research questions + file attachments
+    scrape-links.ts — Zod validation for URLs, timeout, LLM options
+    web-search.ts — Zod validation for keyword arrays
+  tools/
+    registry.ts — tool lookup → capability check → validate → execute
+    search.ts — web_search handler
+    reddit.ts — search_reddit + get_reddit_post handlers
+    scrape.ts — scrape_links handler
+    research.ts — deep_research handler
+  clients/
+    search.ts — Serper API client
+    reddit.ts — Reddit OAuth + comment fetching
+    scraper.ts — scrape.do client with fallback modes
+    research.ts — OpenRouter client with model-specific handling
+  services/
+    llm-processor.ts — shared LLM extraction (singleton OpenAI client)
+    markdown-cleaner.ts — HTML → markdown via turndown
+    file-attachment.ts — local file reading with line ranges
+  utils/
+    concurrency.ts — bounded parallel execution (pMap, pMapSettled)
+    url-aggregator.ts — CTR-weighted scoring and consensus detection
+    errors.ts — error classification, fetchWithTimeout
+    logger.ts — MCP logging protocol
+    response.ts — standardized output formatting
 ```

-
+## deploy

-
+### Cloudflare Workers

 ```bash
-
-cd mcp-researchpowerpack
-npm install
-npm run dev
-npm run build
-npm run typecheck
+npx wrangler deploy
 ```

-
-
-## 🔧 Troubleshooting
-
-<details>
-<summary><b>Expand for troubleshooting tips</b></summary>
-
-| Problem | Solution |
-| :--- | :--- |
-| **Tool returns "API key not configured"** | Add the required ENV variable to your MCP config. The error message tells you exactly which key is missing. |
-| **Reddit posts returning empty** | Check your `REDDIT_CLIENT_ID` and `REDDIT_CLIENT_SECRET`. Make sure you created a "script" type app. |
-| **Scraping fails on JavaScript sites** | This is expected for the first attempt. The tool auto-retries with JS rendering. If still failing, the site may be blocking scrapers. |
-| **Deep research taking too long** | Use a faster model like `x-ai/grok-4.1-fast` instead of `perplexity/sonar-deep-research`. |
-| **Token limit errors** | Reduce the number of URLs/questions per request. The tool distributes a fixed token budget. |
-
-</details>
-
----
-
-<div align="center">
+uses Durable Objects with SQLite storage. YAML-based tool definitions are replaced with inline definitions in the worker entry since there's no filesystem.

-
+## license

-
+MIT