mcp-researchpowerpack 3.6.18 → 3.7.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (48)
  1. package/README.md +110 -624
  2. package/dist/clients/reddit.js +7 -7
  3. package/dist/clients/reddit.js.map +1 -1
  4. package/dist/clients/research.js +2 -2
  5. package/dist/clients/research.js.map +1 -1
  6. package/dist/clients/scraper.js +1 -1
  7. package/dist/clients/scraper.js.map +1 -1
  8. package/dist/clients/search.js +1 -1
  9. package/dist/clients/search.js.map +1 -1
  10. package/dist/config/yaml/tools.yaml +6 -5
  11. package/dist/schemas/scrape-links.d.ts +3 -0
  12. package/dist/schemas/scrape-links.d.ts.map +1 -1
  13. package/dist/schemas/scrape-links.js +5 -0
  14. package/dist/schemas/scrape-links.js.map +1 -1
  15. package/dist/schemas/web-search.js +2 -2
  16. package/dist/schemas/web-search.js.map +1 -1
  17. package/dist/services/llm-processor.d.ts +1 -0
  18. package/dist/services/llm-processor.d.ts.map +1 -1
  19. package/dist/services/llm-processor.js +29 -2
  20. package/dist/services/llm-processor.js.map +1 -1
  21. package/dist/tools/reddit.d.ts.map +1 -1
  22. package/dist/tools/reddit.js +46 -27
  23. package/dist/tools/reddit.js.map +1 -1
  24. package/dist/tools/registry.d.ts.map +1 -1
  25. package/dist/tools/registry.js +23 -3
  26. package/dist/tools/registry.js.map +1 -1
  27. package/dist/tools/research.d.ts.map +1 -1
  28. package/dist/tools/research.js +35 -17
  29. package/dist/tools/research.js.map +1 -1
  30. package/dist/tools/scrape.d.ts.map +1 -1
  31. package/dist/tools/scrape.js +21 -13
  32. package/dist/tools/scrape.js.map +1 -1
  33. package/dist/tools/search.d.ts.map +1 -1
  34. package/dist/tools/search.js +14 -10
  35. package/dist/tools/search.js.map +1 -1
  36. package/dist/tools/utils.d.ts.map +1 -1
  37. package/dist/tools/utils.js +4 -2
  38. package/dist/tools/utils.js.map +1 -1
  39. package/dist/utils/errors.js +3 -3
  40. package/dist/utils/errors.js.map +1 -1
  41. package/dist/utils/response.d.ts +1 -0
  42. package/dist/utils/response.d.ts.map +1 -1
  43. package/dist/utils/response.js +21 -13
  44. package/dist/utils/response.js.map +1 -1
  45. package/dist/version.d.ts +1 -1
  46. package/dist/version.js +2 -2
  47. package/dist/version.js.map +1 -1
  48. package/package.json +4 -4
package/README.md CHANGED
@@ -1,191 +1,34 @@
- <h1 align="center">🔬 Research Powerpack MCP 🔬</h1>
- <h3 align="center">Stop tab-hopping for research. Start getting structured context.</h3>
-
- <p align="center">
- <strong>
- <em>The ultimate research toolkit for your AI coding assistant. It searches the web, mines Reddit, scrapes any URL, and synthesizes everything into perfectly structured context your LLM actually understands.</em>
- </strong>
- </p>
-
- <p align="center">
- <!-- Package Info -->
- <a href="https://www.npmjs.com/package/mcp-researchpowerpack"><img alt="npm" src="https://img.shields.io/npm/v/mcp-researchpowerpack.svg?style=flat-square&color=4D87E6"></a>
- <a href="#"><img alt="node" src="https://img.shields.io/badge/node-18+-4D87E6.svg?style=flat-square"></a>
- &nbsp;&nbsp;•&nbsp;&nbsp;
- <!-- Features -->
- <a href="https://opensource.org/licenses/MIT"><img alt="license" src="https://img.shields.io/badge/License-MIT-F9A825.svg?style=flat-square"></a>
- <a href="#"><img alt="platform" src="https://img.shields.io/badge/platform-macOS_|_Linux_|_Windows-2ED573.svg?style=flat-square"></a>
- </p>
-
- <p align="center">
- <img alt="modular" src="https://img.shields.io/badge/🧩_modular-use_1_tool_or_all_5-2ED573.svg?style=for-the-badge">
- <img alt="zero crash" src="https://img.shields.io/badge/💪_zero_crash-missing_keys_=_helpful_errors-2ED573.svg?style=for-the-badge">
- </p>
-
- <div align="center">
-
- ### 🧭 Quick Navigation
-
- [**⚡ Get Started**](#-get-started-in-60-seconds) •
- [**🎯 Why Research Powerpack**](#-why-research-powerpack) •
- [**🎮 Tools**](#-tool-reference) •
- [**⚙️ Configuration**](#%EF%B8%8F-environment-variables--tool-availability) •
- [**📚 Examples**](#-recommended-workflows)
-
- </div>
-
- ---
-
- **`research-powerpack-mcp`** is the research assistant your AI has been missing. Stop asking your LLM to guess about things it doesn't know. This MCP server acts like a senior researcher -- searching the web, mining Reddit discussions, scraping documentation, and synthesizing everything into structured context so your AI can give you answers you can actually trust.
-
- <div align="center">
- <table>
- <tr>
- <td align="center">
- <h3>🔍</h3>
- <b>Batch Web Search</b><br/>
- <sub>100 keywords in parallel</sub>
- </td>
- <td align="center">
- <h3>💬</h3>
- <b>Reddit Mining</b><br/>
- <sub>Real opinions, not marketing</sub>
- </td>
- <td align="center">
- <h3>🌐</h3>
- <b>Universal Scraping</b><br/>
- <sub>JS rendering + geo-targeting</sub>
- </td>
- <td align="center">
- <h3>🧠</h3>
- <b>Deep Research</b><br/>
- <sub>AI synthesis with citations</sub>
- </td>
- </tr>
- </table>
- </div>
-
- Here's how it works:
- - **You:** "What's the best database for my use case?"
- - **AI + Powerpack:** Searches Google, mines Reddit threads, scrapes docs, synthesizes findings.
- - **You:** Get an actually informed answer with real community opinions and citations.
- - **Result:** Better decisions, faster. No more juggling 47 browser tabs.
-
- ---
-
- ## 🎯 Why Research Powerpack
-
- Manual research is tedious and error-prone. `research-powerpack-mcp` replaces that entire workflow with a single integrated pipeline.
-
- <table align="center">
- <tr>
- <td align="center"><b>❌ Without Research Powerpack</b></td>
- <td align="center"><b>✅ With Research Powerpack</b></td>
- </tr>
- <tr>
- <td>
- <ol>
- <li>Open 15 browser tabs.</li>
- <li>Skim Stack Overflow answers from 2019.</li>
- <li>Search Reddit, get distracted along the way.</li>
- <li>Copy-paste random snippets to your AI.</li>
- <li>Get a mediocre answer from confused context.</li>
- </ol>
- </td>
- <td>
- <ol>
- <li>Ask your AI to research it.</li>
- <li>AI searches, scrapes, mines Reddit automatically.</li>
- <li>Receive synthesized insights with sources.</li>
- <li>Make an informed decision.</li>
- <li>Move on to the work that matters. ☕</li>
- </ol>
- </td>
- </tr>
- </table>
-
- This isn't just fetching random pages. Research Powerpack builds **high-signal, low-noise context** with CTR-weighted ranking, smart comment allocation, and intelligent token distribution that prevents massive responses from breaking your LLM's context window.
-
- ---
-
- ## 🚀 Get Started in 60 Seconds
-
- ### 1. Install
+ MCP server that gives your AI assistant research tools. Google search, Reddit deep-dives, web scraping with LLM extraction, and multi-model deep research — all as MCP tools that chain into each other.

  ```bash
- npm install research-powerpack-mcp
+ npx mcp-researchpowerpack
  ```

- ### 2. Configure Your MCP Client
+ five tools, zero config to start. each API key you add unlocks more capabilities.

- <div align="center">
+ [![npm](https://img.shields.io/npm/v/mcp-researchpowerpack.svg?style=flat-square)](https://www.npmjs.com/package/mcp-researchpowerpack)
+ [![node](https://img.shields.io/badge/node-20+-93450a.svg?style=flat-square)](https://nodejs.org/)
+ [![license](https://img.shields.io/badge/license-MIT-grey.svg?style=flat-square)](https://opensource.org/licenses/MIT)

- | Client | Config File | Docs |
- |:------:|:-----------:|:----:|
- | 🖥️ **Claude Desktop** | `claude_desktop_config.json` | [Setup](#claude-desktop) |
- | ⌨️ **Claude Code** | `~/.claude.json` or CLI | [Setup](#claude-code-cli) |
- | 🎯 **Cursor** | `.cursor/mcp.json` | [Setup](#cursorwindsurf) |
- | 🏄 **Windsurf** | MCP settings | [Setup](#cursorwindsurf) |
-
- </div>
-
- #### Claude Desktop
-
- Add to your `claude_desktop_config.json`:
-
- ```json
- {
- "mcpServers": {
- "research-powerpack": {
- "command": "npx",
- "args": ["mcp-researchpowerpack"],
- "env": {
- "SERPER_API_KEY": "your_key",
- "REDDIT_CLIENT_ID": "your_id",
- "REDDIT_CLIENT_SECRET": "your_secret",
- "SCRAPEDO_API_KEY": "your_key",
- "OPENROUTER_API_KEY": "your_key"
- }
- }
- }
- }
- ```
+ ---

- or quick install (for macOS):
+ ## tools

- ```
- cat ~/Library/Application\ Support/Claude/claude_desktop_config.json | jq '.mcpServers["research-powerpack"] = {
- "command": "npx",
- "args": ["research-powerpack-mcp@latest"],
- "disabled": false,
- "env": {
- "OPENROUTER_API_KEY": "xxx",
- "REDDIT_CLIENT_ID": "xxx",
- "REDDIT_CLIENT_SECRET": "xxx",
- "RESEARCH_MODEL": "xxxx",
- "SCRAPEDO_API_KEY": "xxx",
- "SERPER_API_KEY": "xxxx"
- }
- }' | tee ~/Library/Application\ Support/Claude/claude_desktop_config.json
- ```
+ | tool | what it does | requires |
+ |:---|:---|:---|
+ | `web_search` | parallel Google search across 3-100 keywords, CTR-weighted ranking, consensus detection | `SERPER_API_KEY` |
+ | `search_reddit` | same engine but filtered to reddit.com, 10-50 queries in parallel | `SERPER_API_KEY` |
+ | `get_reddit_post` | fetches 2-50 Reddit posts with full comment trees, optional LLM extraction | `REDDIT_CLIENT_ID` + `REDDIT_CLIENT_SECRET` |
+ | `scrape_links` | scrapes 1-50 URLs with JS rendering fallback, HTML-to-markdown, optional LLM extraction | `SCRAPEDO_API_KEY` |
+ | `deep_research` | sends questions to research-capable models (Grok, Gemini) with web search enabled, supports local file attachments | `OPENROUTER_API_KEY` |

- #### Claude Code (CLI)
+ tools are designed to chain: `web_search` suggests calling `scrape_links`, which suggests `search_reddit`, which suggests `get_reddit_post`, which suggests `deep_research` for synthesis.

- One command to set everything up:
+ ## install

- ```bash
- claude mcp add research-powerpack npx \
- --scope user \
- --env SERPER_API_KEY=your_key \
- --env REDDIT_CLIENT_ID=your_id \
- --env REDDIT_CLIENT_SECRET=your_secret \
- --env OPENROUTER_API_KEY=your_key \
- --env OPENROUTER_BASE_URL=https://openrouter.ai/api/v1 \
- --env RESEARCH_MODEL=x-ai/grok-4.1-fast \
- -- research-powerpack-mcp
- ```
+ ### Claude Desktop / Claude Code

- Or manually add to `~/.claude.json`:
+ add to your MCP config:

  ```json
  {
@@ -194,496 +37,139 @@ Or manually add to `~/.claude.json`:
  "command": "npx",
  "args": ["mcp-researchpowerpack"],
  "env": {
- "SERPER_API_KEY": "your_key",
- "REDDIT_CLIENT_ID": "your_id",
- "REDDIT_CLIENT_SECRET": "your_secret",
- "OPENROUTER_API_KEY": "your_key",
- "OPENROUTER_BASE_URL": "https://openrouter.ai/api/v1",
- "RESEARCH_MODEL": "x-ai/grok-4.1-fast"
+ "SERPER_API_KEY": "...",
+ "OPENROUTER_API_KEY": "..."
  }
  }
  }
  }
  ```

- #### Cursor/Windsurf
-
- Add to `.cursor/mcp.json` or equivalent:
-
- ```json
- {
- "mcpServers": {
- "research-powerpack": {
- "command": "npx",
- "args": ["mcp-researchpowerpack"],
- "env": {
- "SERPER_API_KEY": "your_key"
- }
- }
- }
- }
- ```
-
- > **Zero Crash Promise:** Missing API keys? No problem. The server always starts. Tools that require missing keys return helpful setup instructions instead of crashing.
-
- ---
-
- ## 🌐 Transport Modes
-
- Research Powerpack supports three transport modes:
-
- | Mode | Use Case | How to Start |
- |------|----------|-------------|
- | **STDIO** (default) | Claude Desktop, Cursor, Windsurf | `npx mcp-researchpowerpack` |
- | **HTTP Streamable** | Self-hosted, Docker, LAN sharing | `MCP_TRANSPORT=http npx mcp-researchpowerpack` |
- | **Cloudflare Workers** | Serverless, globally distributed | Already deployed ↓ |
-
- ### Remote MCP (Cloudflare Workers)
-
- A remote MCP endpoint is deployed and ready to use:
-
- ```
- https://mcp-researchpowerpack.workers.yigitkonur.com/mcp
- ```
-
- Connect from any MCP client that supports HTTP Streamable transport:
-
- ```json
- {
- "mcpServers": {
- "research-powerpack-remote": {
- "type": "streamable-http",
- "url": "https://mcp-researchpowerpack.workers.yigitkonur.com/mcp"
- }
- }
- }
- ```
-
- ### Self-Hosted HTTP Streamable
+ ### from source

  ```bash
- # Start on default port 3001
- MCP_TRANSPORT=http npx mcp-researchpowerpack
-
- # Custom port
- MCP_TRANSPORT=http MCP_PORT=8080 npx mcp-researchpowerpack
- ```
-
- ```json
- {
- "mcpServers": {
- "research-powerpack-http": {
- "type": "streamable-http",
- "url": "http://localhost:3001/mcp"
- }
- }
- }
- ```
-
- ---
-
- ## 🎮 Tool Reference
-
- <div align="center">
- <table>
- <tr>
- <td align="center">
- <h3>🔍</h3>
- <b><code>web_search</code></b><br/>
- <sub>Batch Google search</sub>
- </td>
- <td align="center">
- <h3>💬</h3>
- <b><code>search_reddit</code></b><br/>
- <sub>Find Reddit discussions</sub>
- </td>
- <td align="center">
- <h3>📖</h3>
- <b><code>get_reddit_post</code></b><br/>
- <sub>Fetch posts + comments</sub>
- </td>
- <td align="center">
- <h3>🌐</h3>
- <b><code>scrape_links</code></b><br/>
- <sub>Extract any URL</sub>
- </td>
- <td align="center">
- <h3>🧠</h3>
- <b><code>deep_research</code></b><br/>
- <sub>AI synthesis</sub>
- </td>
- </tr>
- </table>
- </div>
-
- ### `web_search`
-
- **Batch web search** using Google via Serper API. Search up to 100 keywords in parallel.
-
- | Parameter | Type | Required | Description |
- |-----------|------|----------|-------------|
- | `keywords` | `string[]` | Yes | Search queries (1-100). Use distinct keywords for maximum coverage. |
-
- **Supports Google operators:** `site:`, `-exclusion`, `"exact phrase"`, `filetype:`
-
- ```json
- {
- "keywords": [
- "best IDE 2025",
- "VS Code alternatives",
- "Cursor vs Windsurf comparison"
- ]
- }
- ```
-
- ---
-
- ### `search_reddit`
-
- **Search Reddit** via Google with automatic `site:reddit.com` filtering.
-
- | Parameter | Type | Required | Description |
- |-----------|------|----------|-------------|
- | `queries` | `string[]` | Yes | Search queries (max 10) |
- | `date_after` | `string` | No | Filter results after date (YYYY-MM-DD) |
-
- **Search operators:** `intitle:keyword`, `"exact phrase"`, `OR`, `-exclude`
-
- ```json
- {
- "queries": [
- "best mechanical keyboard 2025",
- "intitle:keyboard recommendation"
- ],
- "date_after": "2024-01-01"
- }
+ git clone https://github.com/yigitkonur/mcp-research-powerpack.git
+ cd mcp-research-powerpack
+ pnpm install && pnpm build
+ pnpm start
  ```

- ---
-
- ### `get_reddit_post`
-
- **Fetch Reddit posts** with smart comment allocation (1,000 comment budget distributed automatically).
-
- | Parameter | Type | Required | Default | Description |
- |-----------|------|----------|---------|-------------|
- | `urls` | `string[]` | Yes | — | Reddit post URLs (2-50) |
- | `fetch_comments` | `boolean` | No | `true` | Whether to fetch comments |
- | `max_comments` | `number` | No | auto | Override comment allocation |
-
- **Smart Allocation:**
- - 2 posts → ~500 comments/post (deep dive)
- - 10 posts → ~100 comments/post
- - 50 posts → ~20 comments/post (quick scan)
-
- ```json
- {
- "urls": [
- "https://reddit.com/r/programming/comments/abc123/post_title",
- "https://reddit.com/r/webdev/comments/def456/another_post"
- ]
- }
- ```
-
- ---
-
- ### `scrape_links`
-
- **Universal URL content extraction** with automatic fallback modes.
-
- | Parameter | Type | Required | Default | Description |
- |-----------|------|----------|---------|-------------|
- | `urls` | `string[]` | Yes | — | URLs to scrape (3-50) |
- | `timeout` | `number` | No | `30` | Timeout per URL (seconds) |
- | `use_llm` | `boolean` | No | `false` | Enable AI extraction |
- | `what_to_extract` | `string` | No | — | Extraction instructions for AI |
-
- **Automatic Fallback:** Basic → JS rendering → JS + US geo-targeting
-
- ```json
- {
- "urls": ["https://example.com/article1", "https://example.com/article2"],
- "use_llm": true,
- "what_to_extract": "Extract the main arguments and key statistics"
- }
- ```
+ ### HTTP mode

- ---
-
- ### `deep_research`
-
- **AI-powered batch research** with web search and citations.
-
- | Parameter | Type | Required | Description |
- |-----------|------|----------|-------------|
- | `questions` | `object[]` | Yes | Research questions (2-10) |
- | `questions[].question` | `string` | Yes | The research question |
- | `questions[].file_attachments` | `object[]` | No | Files to include as context |
-
- **Token Allocation:** 32,000 tokens distributed across questions:
- - 2 questions → 16,000 tokens/question (deep dive)
- - 10 questions → 3,200 tokens/question (rapid multi-topic)
-
- ```json
- {
- "questions": [
- { "question": "What are the current best practices for React Server Components in 2025?" },
- { "question": "Compare Bun vs Node.js for production workloads with benchmarks." }
- ]
- }
+ ```bash
+ MCP_TRANSPORT=http MCP_PORT=3000 npx mcp-researchpowerpack
  ```

- ---
-
- ## ⚙️ Environment Variables & Tool Availability
-
- Research Powerpack uses a **modular architecture**. Tools are automatically enabled based on which API keys you provide:
+ exposes `/mcp` (POST/GET/DELETE with session headers) and `/health`.

- <div align="center">
+ ## API keys

- | ENV Variable | Tools Enabled | Free Tier |
- |:------------:|:-------------:|:---------:|
- | `SERPER_API_KEY` | `web_search`, `search_reddit` | 2,500 queries/mo |
- | `REDDIT_CLIENT_ID` + `SECRET` | `get_reddit_post` | Unlimited |
- | `SCRAPEDO_API_KEY` | `scrape_links` | 1,000 credits/mo |
- | `OPENROUTER_API_KEY` | `deep_research` + AI in `scrape_links` | Pay-as-you-go |
- | `RESEARCH_MODEL` | Model for `deep_research` | Default: `perplexity/sonar-deep-research` |
- | `LLM_EXTRACTION_MODEL` | Model for AI extraction in `scrape_links` | Default: `openrouter/gpt-oss-120b:nitro` |
+ each key unlocks a capability. missing keys silently disable their tools — the server never crashes.

- </div>
+ | variable | enables | free tier |
+ |:---|:---|:---|
+ | `SERPER_API_KEY` | `web_search`, `search_reddit` | 2,500 searches/mo at serper.dev |
+ | `REDDIT_CLIENT_ID` + `REDDIT_CLIENT_SECRET` | `get_reddit_post` | unlimited (reddit.com/prefs/apps, "script" type) |
+ | `SCRAPEDO_API_KEY` | `scrape_links` | 1,000 credits/mo at scrape.do |
+ | `OPENROUTER_API_KEY` | `deep_research`, LLM extraction in scrape/reddit | pay-per-token at openrouter.ai |

- ### Configuration Examples
+ ## configuration

- ```bash
- # Search-only mode (just web_search and search_reddit)
- SERPER_API_KEY=xxx
-
- # Reddit research mode (search + fetch posts)
- SERPER_API_KEY=xxx
- REDDIT_CLIENT_ID=xxx
- REDDIT_CLIENT_SECRET=xxx
-
- # Full research mode (all 5 tools)
- SERPER_API_KEY=xxx
- REDDIT_CLIENT_ID=xxx
- REDDIT_CLIENT_SECRET=xxx
- SCRAPEDO_API_KEY=xxx
- OPENROUTER_API_KEY=xxx
- ```
-
- ### Full Power Mode
-
- For the best research experience, configure all four API keys:
-
- ```bash
- SERPER_API_KEY=your_serper_key # Free: 2,500 queries/month
- REDDIT_CLIENT_ID=your_reddit_id # Free: Unlimited
- REDDIT_CLIENT_SECRET=your_reddit_secret
- SCRAPEDO_API_KEY=your_scrapedo_key # Free: 1,000 credits/month
- OPENROUTER_API_KEY=your_openrouter_key # Pay-as-you-go
- ```
+ optional tuning via environment variables:

- This unlocks:
- - **5 research tools** working together
- - **AI-powered content extraction** in scrape_links
- - **Deep research with web search** and citations
- - **Complete Reddit mining** (search → fetch → analyze)
-
- **Total setup time:** ~10 minutes. **Total free tier value:** ~$50/month equivalent.
-
- ### 🔑 API Key Setup Guides
-
- <details>
- <summary><b>🔍 Serper API (Google Search) — FREE: 2,500 queries/month</b></summary>
-
- #### What you get
- - Fast Google search results via API
- - Enables `web_search` and `search_reddit` tools
-
- #### Setup Steps
- 1. Go to [serper.dev](https://serper.dev)
- 2. Click **"Get API Key"** (top right)
- 3. Sign up with email or Google
- 4. Copy your API key from the dashboard
- 5. Add to your config:
- ```
- SERPER_API_KEY=your_key_here
- ```
-
- #### Pricing
- - **Free**: 2,500 queries/month
- - **Paid**: $50/month for 50,000 queries
-
- </details>
-
- <details>
- <summary><b>🤖 Reddit OAuth — FREE: Unlimited access</b></summary>
-
- #### What you get
- - Full Reddit API access
- - Fetch posts and comments with upvote sorting
- - Enables `get_reddit_post` tool
-
- #### Setup Steps
- 1. Go to [reddit.com/prefs/apps](https://www.reddit.com/prefs/apps)
- 2. Scroll down and click **"create another app..."**
- 3. Fill in:
- - **Name**: `research-powerpack` (or any name)
- - **App type**: Select **"script"** (important!)
- - **Redirect URI**: `http://localhost:8080`
- 4. Click **"create app"**
- 5. Copy your credentials:
- - **Client ID**: The string under your app name
- - **Client Secret**: The "secret" field
- 6. Add to your config:
- ```
- REDDIT_CLIENT_ID=your_client_id
- REDDIT_CLIENT_SECRET=your_client_secret
- ```
-
- </details>
-
- <details>
- <summary><b>🌐 Scrape.do (Web Scraping) — FREE: 1,000 credits/month</b></summary>
-
- #### What you get
- - JavaScript rendering support
- - Geo-targeting and CAPTCHA handling
- - Enables `scrape_links` tool
-
- #### Setup Steps
- 1. Go to [scrape.do](https://scrape.do)
- 2. Click **"Start Free"**
- 3. Sign up with email
- 4. Copy your API key from the dashboard
- 5. Add to your config:
- ```
- SCRAPEDO_API_KEY=your_key_here
- ```
-
- #### Credit Usage
- - **Basic scrape**: 1 credit
- - **JavaScript rendering**: 5 credits
- - **Geo-targeting**: +25 credits
-
- </details>
-
- <details>
- <summary><b>🧠 OpenRouter (AI Models) — Pay-as-you-go</b></summary>
-
- #### What you get
- - Access to 100+ AI models via one API
- - Enables `deep_research` tool
- - Enables AI extraction in `scrape_links`
-
- #### Setup Steps
- 1. Go to [openrouter.ai](https://openrouter.ai)
- 2. Sign up with Google/GitHub/email
- 3. Go to [openrouter.ai/keys](https://openrouter.ai/keys)
- 4. Click **"Create Key"**
- 5. Copy the key (starts with `sk-or-...`)
- 6. Add to your config:
- ```
- OPENROUTER_API_KEY=sk-or-v1-xxxxx
- ```
-
- #### Recommended Models for Deep Research
- ```bash
- # Default (optimized for research)
- RESEARCH_MODEL=perplexity/sonar-deep-research
+ | variable | default | description |
+ |:---|:---|:---|
+ | `RESEARCH_MODEL` | `x-ai/grok-4-fast` | primary deep research model |
+ | `RESEARCH_FALLBACK_MODEL` | `google/gemini-2.5-flash` | fallback if primary fails |
+ | `LLM_EXTRACTION_MODEL` | `openai/gpt-oss-120b:nitro` | default model for scrape/reddit LLM extraction (can be overridden per-request via the `model` parameter in `scrape_links`) |
+ | `DEFAULT_REASONING_EFFORT` | `high` | research depth (`low`, `medium`, `high`) |
+ | `DEFAULT_MAX_URLS` | `100` | max search results per research question (10-200) |
+ | `API_TIMEOUT_MS` | `1800000` | request timeout in ms (default 30 min) |
+ | `MCP_TRANSPORT` | `stdio` | `stdio` or `http` |
+ | `MCP_PORT` | `3000` | port for HTTP mode |

- # Fast and capable
- RESEARCH_MODEL=x-ai/grok-4.1-fast
+ ## how it works

- # High quality
- RESEARCH_MODEL=anthropic/claude-3.5-sonnet
+ ### search ranking

- # Budget-friendly
- RESEARCH_MODEL=openai/gpt-4o-mini
- ```
+ results from multiple queries are deduplicated by normalized URL and scored using CTR-weighted position values (position 1 = 100.0, position 10 = 12.56). URLs appearing across multiple queries get a consensus marker. threshold tries >= 3, falls back to >= 2, then >= 1.
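
as an illustration, a minimal TypeScript sketch of this kind of CTR-weighted aggregation: the helper names and the weights between positions 1 and 10 are made up here, and only the two endpoint weights plus the 3 → 2 → 1 threshold fallback come from the description above (this is not the package's actual `url-aggregator.ts` code).

```ts
// illustrative sketch, not the package's real code. only positions 1 and 10
// are documented (100.0 and 12.56); the values in between are invented here.
const CTR_WEIGHTS = [100.0, 62.0, 41.0, 29.0, 22.0, 18.0, 16.0, 14.5, 13.4, 12.56];

interface Hit { url: string; position: number; query: string }

function ctrWeight(position: number): number {
  // positions past 10 just get the position-10 weight in this sketch
  return CTR_WEIGHTS[Math.min(position, CTR_WEIGHTS.length) - 1];
}

function aggregate(hits: Hit[]) {
  const byUrl = new Map<string, { score: number; queries: Set<string> }>();
  for (const { url, position, query } of hits) {
    // dedupe by normalized URL (lowercase, trailing slash stripped)
    const key = url.toLowerCase().replace(/\/+$/, "");
    const entry = byUrl.get(key) ?? { score: 0, queries: new Set<string>() };
    entry.score += ctrWeight(position);
    entry.queries.add(query);
    byUrl.set(key, entry);
  }
  // consensus threshold: >= 3 distinct queries, falling back to >= 2, then >= 1
  const entries = [...byUrl.entries()];
  const threshold =
    [3, 2, 1].find((t) => entries.some(([, e]) => e.queries.size >= t)) ?? 1;
  return entries
    .map(([url, e]) => ({ url, score: e.score, consensus: e.queries.size >= threshold }))
    .sort((a, b) => b.score - a.score);
}
```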

- #### Recommended Models for AI Extraction (`use_llm` in `scrape_links`)
- ```bash
- # Default (fast and cost-effective for extraction)
- LLM_EXTRACTION_MODEL=openrouter/gpt-oss-120b:nitro
+ ### Reddit comment budget

- # High quality extraction
- LLM_EXTRACTION_MODEL=anthropic/claude-3.5-sonnet
+ global budget of 1,000 comments, max 200 per post. after the first pass, surplus from posts with fewer comments is redistributed to truncated posts in a second fetch pass.
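
a sketch of that two-pass allocation under the stated numbers (function and variable names are hypothetical, not the package's actual Reddit client code):

```ts
// hypothetical sketch of the documented budget: 1,000 comments global,
// 200 max per post, surplus redistributed in a second pass.
const GLOBAL_BUDGET = 1000;
const PER_POST_CAP = 200;

// available[i] = how many comments post i actually has
function allocateComments(available: number[]): number[] {
  const fairShare = Math.min(PER_POST_CAP, Math.floor(GLOBAL_BUDGET / available.length));
  // first pass: each post gets up to the fair share
  const granted = available.map((a) => Math.min(a, fairShare));
  // surplus = budget left over from posts with fewer comments than their share
  let surplus = GLOBAL_BUDGET - granted.reduce((sum, g) => sum + g, 0);
  // second pass: hand the surplus to truncated posts, up to the per-post cap
  for (let i = 0; i < granted.length && surplus > 0; i++) {
    const extra = Math.min(Math.min(available[i], PER_POST_CAP) - granted[i], surplus);
    granted[i] += extra;
    surplus -= extra;
  }
  return granted;
}

// e.g. 10 posts, one huge thread plus nine tiny ones: the first pass gives
// the big thread 100 (fair share), the second pass tops it up to the 200 cap.
```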

- # Budget-friendly
- LLM_EXTRACTION_MODEL=openai/gpt-4o-mini
- ```
+ ### scraping pipeline

- > **Note:** `RESEARCH_MODEL` and `LLM_EXTRACTION_MODEL` are independent. You can use a powerful model for deep research and a faster/cheaper model for content extraction, or vice versa.
+ three-mode fallback per URL: basic → JS rendering → JS + US geo-targeting. results go through HTML-to-markdown conversion (turndown), then optional LLM extraction with a 100k char input cap and 8,000 token output per URL. the extraction model defaults to `openai/gpt-oss-120b:nitro` (configurable via `LLM_EXTRACTION_MODEL` env var) and can be overridden per-request using the `model` parameter.
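
a sketch of that escalation ladder, with a placeholder `scrape` callback standing in for the scrape.do request (the mode parameter names are illustrative):

```ts
// illustrative fallback ladder: each step maps to a more expensive
// scrape.do request (the old README priced these at 1 / 5 / +25 credits).
type Mode = { render: boolean; geoCode?: string };

const MODES: Mode[] = [
  { render: false },               // basic fetch
  { render: true },                // JS rendering
  { render: true, geoCode: "us" }, // JS rendering + US geo-targeting
];

async function scrapeWithFallback(
  url: string,
  scrape: (url: string, mode: Mode) => Promise<string>,
): Promise<string> {
  let lastError: unknown;
  for (const mode of MODES) {
    try {
      return await scrape(url, mode); // success: stop escalating
    } catch (err) {
      lastError = err; // failure: try the next, heavier mode
    }
  }
  throw lastError;
}
```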

- </details>
+ ### deep research

- ---
+ 32,000 token budget divided across questions (1 question = 32k, 10 questions = 3.2k each). Gemini models get `google_search` tool access. Grok/Perplexity get `search_parameters` with citations. primary model fails → automatic fallback.
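
the budget split is plain division; a sketch with the model-specific search wiring reduced to one branch (the `tools` / `search_parameters` shapes below are placeholders, not verified against the package's actual OpenRouter requests):

```ts
// sketch only: request field shapes are placeholders.
const TOTAL_BUDGET = 32_000;

function buildRequest(model: string, question: string, questionCount: number) {
  const request: Record<string, unknown> = {
    model,
    messages: [{ role: "user", content: question }],
    max_tokens: Math.floor(TOTAL_BUDGET / questionCount), // 1 q → 32k, 10 qs → 3.2k
  };
  if (model.startsWith("google/")) {
    request.tools = [{ google_search: {} }];    // Gemini: search exposed as a tool
  } else {
    request.search_parameters = { mode: "on" }; // Grok/Perplexity: search + citations
  }
  return request;
}
```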

- ## 📚 Recommended Workflows
+ ### file attachments

- ### Research a Technology Decision
+ `deep_research` can read local files and include them as context. files over 600 lines are smart-truncated (first 500 + last 100 lines). line numbers preserved.
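
a sketch of that truncation rule (the helper name is invented; the real reader lives in `services/file-attachment.ts`):

```ts
// files over 600 lines keep the first 500 and last 100 lines,
// with original line numbers preserved on every kept line.
const MAX_LINES = 600;
const HEAD = 500;
const TAIL = 100;

function smartTruncate(content: string): string {
  const numbered = content.split("\n").map((text, i) => `${i + 1}: ${text}`);
  if (numbered.length <= MAX_LINES) return numbered.join("\n");
  return [
    ...numbered.slice(0, HEAD),
    `... ${numbered.length - HEAD - TAIL} lines truncated ...`,
    ...numbered.slice(-TAIL),
  ].join("\n");
}
```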

- ```
- 1. web_search → ["React vs Vue 2025", "Next.js vs Nuxt comparison"]
- 2. search_reddit → ["best frontend framework 2025", "Next.js production experience"]
- 3. get_reddit_post → [URLs from step 2]
- 4. scrape_links → [Documentation and blog URLs from step 1]
- 5. deep_research → [Synthesize findings into specific questions]
- ```
+ ## concurrency

- ### Competitive Analysis
+ | operation | parallel limit |
+ |:---|:---|
+ | web search keywords | 8 |
+ | Reddit search queries | 8 |
+ | Reddit post fetches per batch | 5 (batches of 10) |
+ | URL scraping per batch | 10 (batches of 30) |
+ | LLM extraction | 3 |
+ | deep research questions | 3 |
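
these limits are enforced by the bounded `pMap` in `utils/concurrency.ts`; a minimal sketch of the standard pattern such a helper follows (not the package's exact implementation):

```ts
// run at most `limit` tasks at once, preserving input order in the results.
async function pMap<T, R>(
  items: readonly T[],
  fn: (item: T, index: number) => Promise<R>,
  limit: number,
): Promise<R[]> {
  const results = new Array<R>(items.length);
  let next = 0;
  async function worker(): Promise<void> {
    while (next < items.length) {
      const i = next++; // index claims happen synchronously on the event loop
      results[i] = await fn(items[i], i);
    }
  }
  await Promise.all(Array.from({ length: Math.min(limit, items.length) }, worker));
  return results;
}

// e.g. scrape a batch of 30 URLs, 10 at a time:
// const pages = await pMap(urls, (u) => scrapeUrl(u), 10);
```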

- ```
- 1. web_search → ["competitor name review", "competitor vs alternatives"]
- 2. scrape_links → [Competitor websites, review sites]
- 3. search_reddit → ["competitor name experience", "switching from competitor"]
- 4. get_reddit_post → [URLs from step 3]
- ```
+ all clients use manual retry with exponential backoff and jitter. the OpenAI SDK's built-in retry is disabled (`maxRetries: 0`).
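
a sketch of that retry shape (attempt count and base delay are illustrative, since the README doesn't document the package's actual values):

```ts
// manual retry with exponential backoff and full jitter.
async function withRetry<T>(
  fn: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 500,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt >= maxAttempts - 1) throw err;
      const ceiling = baseDelayMs * 2 ** attempt;  // 500ms → 1s → 2s ...
      const delayMs = Math.random() * ceiling;     // full jitter
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
}

// pairs with `new OpenAI({ maxRetries: 0 })` so the SDK's own retry
// logic doesn't stack on top of this one.
```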

- ### Debug an Obscure Error
+ ## project structure

  ```
- 1. web_search → ["exact error message", "error + framework name"]
- 2. search_reddit → ["error message", "framework + error type"]
- 3. get_reddit_post → [URLs with solutions]
- 4. scrape_links → [Stack Overflow answers, GitHub issues]
+ src/
+   index.ts — entry point, STDIO + HTTP transport, signal handling
+   worker.ts — Cloudflare Workers entry (Durable Objects)
+   config/
+     index.ts — env parsing (lazy Proxy objects), capability detection
+     loader.ts — YAML → Zod → JSON Schema pipeline, cached
+     yaml/tools.yaml — single source of truth for all tool definitions
+   schemas/
+     deep-research.ts — Zod validation for research questions + file attachments
+     scrape-links.ts — Zod validation for URLs, timeout, LLM options
+     web-search.ts — Zod validation for keyword arrays
+   tools/
+     registry.ts — tool lookup → capability check → validate → execute
+     search.ts — web_search handler
+     reddit.ts — search_reddit + get_reddit_post handlers
+     scrape.ts — scrape_links handler
+     research.ts — deep_research handler
+   clients/
+     search.ts — Serper API client
+     reddit.ts — Reddit OAuth + comment fetching
+     scraper.ts — scrape.do client with fallback modes
+     research.ts — OpenRouter client with model-specific handling
+   services/
+     llm-processor.ts — shared LLM extraction (singleton OpenAI client)
+     markdown-cleaner.ts — HTML → markdown via turndown
+     file-attachment.ts — local file reading with line ranges
+   utils/
+     concurrency.ts — bounded parallel execution (pMap, pMapSettled)
+     url-aggregator.ts — CTR-weighted scoring and consensus detection
+     errors.ts — error classification, fetchWithTimeout
+     logger.ts — MCP logging protocol
+     response.ts — standardized output formatting
  ```

- ---
+ ## deploy

- ## 🛠️ Development
+ ### Cloudflare Workers

  ```bash
- git clone https://github.com/yigitkonur/mcp-researchpowerpack.git
- cd mcp-researchpowerpack
- npm install
- npm run dev
- npm run build
- npm run typecheck
+ npx wrangler deploy
  ```

- ---
-
- ## 🔧 Troubleshooting
-
- <details>
- <summary><b>Expand for troubleshooting tips</b></summary>
-
- | Problem | Solution |
- | :--- | :--- |
- | **Tool returns "API key not configured"** | Add the required ENV variable to your MCP config. The error message tells you exactly which key is missing. |
- | **Reddit posts returning empty** | Check your `REDDIT_CLIENT_ID` and `REDDIT_CLIENT_SECRET`. Make sure you created a "script" type app. |
- | **Scraping fails on JavaScript sites** | This is expected for the first attempt. The tool auto-retries with JS rendering. If still failing, the site may be blocking scrapers. |
- | **Deep research taking too long** | Use a faster model like `x-ai/grok-4.1-fast` instead of `perplexity/sonar-deep-research`. |
- | **Token limit errors** | Reduce the number of URLs/questions per request. The tool distributes a fixed token budget. |
-
- </details>
-
- ---
-
- <div align="center">
+ uses Durable Objects with SQLite storage. YAML-based tool definitions are replaced with inline definitions in the worker entry since there's no filesystem.

- MIT © [Yigit Konur](https://github.com/yigitkonur)
+ ## license

- </div>
+ MIT