research-powerpack-mcp 3.3.2 → 3.4.0
- package/README.md +43 -0
- package/dist/config/index.d.ts +1 -1
- package/dist/config/index.d.ts.map +1 -1
- package/dist/config/loader.d.ts +40 -0
- package/dist/config/loader.d.ts.map +1 -0
- package/dist/config/loader.js +300 -0
- package/dist/config/loader.js.map +1 -0
- package/dist/config/types.d.ts +80 -0
- package/dist/config/types.d.ts.map +1 -0
- package/dist/config/types.js +6 -0
- package/dist/config/types.js.map +1 -0
- package/dist/config/yaml/tools.yaml +617 -0
- package/dist/index.js +13 -122
- package/dist/index.js.map +1 -1
- package/dist/schemas/web-search.js +2 -2
- package/dist/schemas/web-search.js.map +1 -1
- package/dist/tools/definitions.d.ts +12 -62
- package/dist/tools/definitions.d.ts.map +1 -1
- package/dist/tools/definitions.js +13 -159
- package/dist/tools/definitions.js.map +1 -1
- package/dist/tools/registry.d.ts +71 -0
- package/dist/tools/registry.d.ts.map +1 -0
- package/dist/tools/registry.js +238 -0
- package/dist/tools/registry.js.map +1 -0
- package/dist/tools/utils.d.ts +92 -0
- package/dist/tools/utils.d.ts.map +1 -0
- package/dist/tools/utils.js +142 -0
- package/dist/tools/utils.js.map +1 -0
- package/package.json +3 -2
|
@@ -0,0 +1,617 @@
|
|
|
1
|
+
# Research Powerpack MCP Server - Tool Configuration (ENHANCED)
|
|
2
|
+
# Single source of truth for all tool metadata, descriptions, and parameter schemas
|
|
3
|
+
# Version: 1.0-enhanced
|
|
4
|
+
# Optimization: ~60% token reduction from original, 95%+ instructional context preserved
|
|
5
|
+
|
|
6
|
+
version: "1.0"
|
|
7
|
+
|
|
8
|
+
metadata:
|
|
9
|
+
name: "research-powerpack-mcp"
|
|
10
|
+
description: "Research tools for AI assistants"
|
|
11
|
+
|
|
12
|
+
# ============================================================================
|
|
13
|
+
# SHARED PRINCIPLES - Apply to ALL tools
|
|
14
|
+
# ============================================================================
|
|
15
|
+
shared:
|
|
16
|
+
core_philosophy: |
|
|
17
|
+
These tools are designed for COMPREHENSIVE research through parallel processing.
|
|
18
|
+
Using minimal inputs wastes the parallel capacity. Always maximize input diversity.
|
|
19
|
+
|
|
20
|
+
principles:
|
|
21
|
+
diversity: |
|
|
22
|
+
Maximize input diversity. More items = better coverage = higher quality output.
|
|
23
|
+
Each input should target a DIFFERENT angle - no overlap, no duplicates.
|
|
24
|
+
|
|
25
|
+
iteration: |
|
|
26
|
+
ALWAYS use sequentialthinking between tool calls to:
|
|
27
|
+
- Evaluate what you found
|
|
28
|
+
- Identify gaps in coverage
|
|
29
|
+
- Notice new angles from results
|
|
30
|
+
- Decide whether to iterate or proceed
|
|
31
|
+
Results are feedback - use them to ask better questions!
|
|
32
|
+
|
|
33
|
+
parallel: |
|
|
34
|
+
All items process in parallel - no time penalty for more items.
|
|
35
|
+
Use the maximum recommended count for comprehensive coverage.
|
|
36
|
+
|
|
37
|
+
workflow_pattern: |
|
|
38
|
+
MANDATORY for all research:
|
|
39
|
+
1. THINK FIRST → Plan your query/question strategy
|
|
40
|
+
2. EXECUTE TOOL → Run with diverse inputs
|
|
41
|
+
3. THINK AFTER → Evaluate results, identify gaps
|
|
42
|
+
4. ITERATE OR PROCEED → Refine and repeat if needed, or move to next tool
|
|
43
|
+
5. SYNTHESIZE → Combine insights into final output
|
|
44
|
+
|
|
45
|
+
Why this works: Initial queries often miss important perspectives.
|
|
46
|
+
Each iteration reveals what you SHOULD have asked!
|
|
47
|
+
|
|
48
|
+
scope_expansion_triggers: |
|
|
49
|
+
Iterate when:
|
|
50
|
+
- Results mention concepts you didn't research
|
|
51
|
+
- Answers raise new questions you should explore
|
|
52
|
+
- You realize initial scope was too narrow
|
|
53
|
+
- You discover related topics that matter
|
|
54
|
+
- You need deeper understanding of a specific aspect
|
|
55
|
+
|
|
56
|
+
Key Insight: First research reveals what you SHOULD have asked!
|
|
57
|
+
|
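The five-step `workflow_pattern` above can be read as a bounded loop. The sketch below is an illustration only: `execute` and `find_gaps` are hypothetical callables standing in for a tool call and a `sequentialthinking` step, and `max_rounds` is an assumed safety limit that this configuration does not define.

```python
from typing import Callable, List


def run_research(
    initial_inputs: List[str],
    execute: Callable[[List[str]], List[str]],
    find_gaps: Callable[[List[str]], List[str]],
    max_rounds: int = 3,  # assumed cap, not part of the configuration
) -> List[str]:
    """Think -> execute -> think -> iterate or proceed -> synthesize."""
    inputs = list(initial_inputs)          # 1. THINK FIRST: the planned inputs
    findings: List[str] = []
    for _ in range(max_rounds):
        findings.extend(execute(inputs))   # 2. EXECUTE TOOL
        gaps = find_gaps(findings)         # 3. THINK AFTER
        if not gaps:                       # 4. PROCEED when no gaps remain
            break
        inputs = gaps                      # 4. ITERATE with refined inputs
    return findings                        # 5. SYNTHESIZE from all findings


# Toy stand-ins so the sketch runs without any real tool.
def fake_execute(inputs: List[str]) -> List[str]:
    return [f"result for {q}" for q in inputs]


def fake_find_gaps(findings: List[str]) -> List[str]:
    # Pretend the first round reveals one missed angle, then nothing more.
    return ["follow-up angle"] if len(findings) == 2 else []


findings = run_research(["angle A", "angle B"], fake_execute, fake_find_gaps)
```

The second round runs only because the first evaluation reported a gap, which is the "iterate or proceed" decision the configuration describes.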
|
58
|
+
# ============================================================================
|
|
59
|
+
# TOOLS
|
|
60
|
+
# ============================================================================
|
|
61
|
+
tools:
|
|
62
|
+
# --------------------------------------------------------------------------
|
|
63
|
+
# REDDIT TOOLS
|
|
64
|
+
# --------------------------------------------------------------------------
|
|
65
|
+
- name: search_reddit
|
|
66
|
+
category: reddit
|
|
67
|
+
capability: search
|
|
68
|
+
limits:
|
|
69
|
+
min_queries: 10
|
|
70
|
+
max_queries: 50
|
|
71
|
+
recommended_queries: 20
|
|
72
|
+
|
|
73
|
+
description: |
|
|
74
|
+
**🔥 REDDIT SEARCH - MINIMUM 10 QUERIES, RECOMMENDED 20+**
|
|
75
|
+
|
|
76
|
+
This tool is designed for consensus analysis through MULTIPLE diverse queries.
|
|
77
|
+
Using 1-3 queries = wasting the tool's power. You MUST use 10+ queries minimum.
|
|
78
|
+
|
|
79
|
+
**Budget:** 10 results per query, all run in parallel.
|
|
80
|
+
- 10 queries = 100 results
|
|
81
|
+
- 20 queries = 200 results (RECOMMENDED)
|
|
82
|
+
- 50 queries = 500 results (comprehensive)
|
|
83
|
+
|
|
84
|
+
**10-Category Query Formula** - Each query targets a DIFFERENT angle. NO OVERLAP!
|
|
85
|
+
|
|
86
|
+
1. **Direct topic:** "[topic] [platform]"
|
|
87
|
+
Example: "YouTube Music Mac app"
|
|
88
|
+
2. **Recommendations:** "best/recommended [topic]"
|
|
89
|
+
Example: "best YouTube Music client Mac"
|
|
90
|
+
3. **Specific tools:** Project names, GitHub repos
|
|
91
|
+
Example: "YTMDesktop", "th-ch youtube-music"
|
|
92
|
+
4. **Comparisons:** "[A] vs [B]"
|
|
93
|
+
Example: "YouTube Music vs Spotify Mac desktop"
|
|
94
|
+
5. **Alternatives:** "[topic] alternative/replacement"
|
|
95
|
+
Example: "YouTube Music Mac alternative"
|
|
96
|
+
6. **Subreddits:** "r/[subreddit] [topic]" - different communities have different perspectives
|
|
97
|
+
Example: "r/macapps YouTube Music", "r/opensource YouTube Music"
|
|
98
|
+
7. **Problems/Issues:** "[topic] issues/crashes/problems"
|
|
99
|
+
Example: "YouTube Music Mac crashes", "YTM desktop performance problems"
|
|
100
|
+
8. **Year-specific:** Add "2024" or "2025" for recent discussions
|
|
101
|
+
Example: "best YouTube Music Mac 2024"
|
|
102
|
+
9. **Features:** "[topic] [specific feature]"
|
|
103
|
+
Example: "YouTube Music offline Mac", "YTM lyrics desktop"
|
|
104
|
+
10. **Developer/GitHub:** "[topic] GitHub/open source/electron"
|
|
105
|
+
Example: "youtube-music electron GitHub", "YTM desktop open source"
|
|
106
|
+
|
|
107
|
+
**Search Operators:**
|
|
108
|
+
- `intitle:` - Search in post titles only
|
|
109
|
+
- `"exact phrase"` - Match exact phrase
|
|
110
|
+
- `OR` - Match either term
|
|
111
|
+
- `-exclude` - Exclude term
|
|
112
|
+
- All queries auto-add `site:reddit.com`
|
|
113
|
+
|
|
114
|
+
**Example showing all 10 categories:**
|
|
115
|
+
❌ BAD: `{"queries": ["best YouTube Music app"]}` → 1 vague query, misses 90% of consensus
|
|
116
|
+
✅ GOOD: `{"queries": ["YouTube Music Mac app", "best YTM client Mac", "YTMDesktop Mac", "YouTube Music vs Spotify Mac", "YouTube Music Mac alternative", "r/macapps YouTube Music", "YTM Mac crashes", "YouTube Music Mac 2024", "YTM offline Mac", "youtube-music GitHub", ...expand to 20 queries]}` → comprehensive multi-angle coverage
|
|
117
|
+
|
|
118
|
+
**Pro Tips:**
|
|
119
|
+
1. **Use ALL 10 categories** - Each reveals different community perspectives
|
|
120
|
+
2. **Target specific subreddits** - Different communities have different expertise
|
|
121
|
+
3. **Include year numbers** - "2024", "2025" filters for recent discussions
|
|
122
|
+
4. **Add comparison keywords** - "vs", "versus" find decision threads
|
|
123
|
+
5. **Include problem keywords** - "issue", "bug", "crash" find real experiences
|
|
124
|
+
6. **Vary phrasing** - "best", "top", "recommended" capture different discussions
|
|
125
|
+
7. **Use technical terms** - "electron", "GitHub", "API" find developer perspectives
|
|
126
|
+
8. **NO DUPLICATES** - Each query must target a unique angle
|
|
127
|
+
|
|
128
|
+
**Workflow:**
|
|
129
|
+
search_reddit → sequentialthinking (evaluate results) → get_reddit_post OR search again → sequentialthinking → synthesize
|
|
130
|
+
|
|
131
|
+
**REMEMBER:** More queries = better consensus detection = higher quality results!
|
|
132
|
+
|
|
133
|
+
parameters:
|
|
134
|
+
queries:
|
|
135
|
+
type: array
|
|
136
|
+
required: true
|
|
137
|
+
items:
|
|
138
|
+
type: string
|
|
139
|
+
validation:
|
|
140
|
+
minItems: 10
|
|
141
|
+
maxItems: 50
|
|
142
|
+
description: |
|
|
143
|
+
**10-50 diverse queries** (Minimum 10 required, 20-30 recommended)
|
|
144
|
+
|
|
145
|
+
Each query MUST target a different angle using the 10-category formula.
|
|
146
|
+
No duplicates, no overlap - maximize diversity for comprehensive consensus.
|
|
147
|
+
|
|
148
|
+
**Quick Checklist:**
|
|
149
|
+
✓ Direct topic queries
|
|
150
|
+
✓ Recommendation queries ("best", "recommended")
|
|
151
|
+
✓ Specific tool/project names
|
|
152
|
+
✓ Comparisons ("vs", "versus")
|
|
153
|
+
✓ Subreddit targeting ("r/...")
|
|
154
|
+
✓ Year-specific (2024, 2025)
|
|
155
|
+
✓ Problem keywords (issue, bug, crash)
|
|
156
|
+
✓ Feature keywords
|
|
157
|
+
✓ Developer angle (GitHub, open source)
|
|
158
|
+
|
|
159
|
+
date_after:
|
|
160
|
+
type: string
|
|
161
|
+
required: false
|
|
162
|
+
description: "Filter results after date (YYYY-MM-DD). Example: '2024-01-01'"
|
|
163
|
+
|
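As a rough client-side illustration of the `search_reddit` limits above, the sketch below checks the 10-50 query range, rejects duplicates, and computes the result budget. The function names are hypothetical, and the way the real server normalizes queries or appends `site:reddit.com` is an assumption, not something this diff shows.

```python
from typing import List

MIN_QUERIES = 10        # limits.min_queries
MAX_QUERIES = 50        # limits.max_queries
RESULTS_PER_QUERY = 10  # "10 results per query"


def prepare_reddit_queries(queries: List[str]) -> List[str]:
    """Check count and uniqueness, then scope each query to reddit.com."""
    # Case and whitespace are ignored when looking for duplicates (assumption).
    normalized = [" ".join(q.lower().split()) for q in queries]
    if len(set(normalized)) != len(normalized):
        raise ValueError("duplicate queries: each must target a unique angle")
    if not MIN_QUERIES <= len(queries) <= MAX_QUERIES:
        raise ValueError(
            f"expected {MIN_QUERIES}-{MAX_QUERIES} queries, got {len(queries)}"
        )
    # Assumption: mirrors "All queries auto-add site:reddit.com".
    return [f"{q} site:reddit.com" for q in queries]


def result_budget(query_count: int) -> int:
    """Total results when every query returns its full quota."""
    return query_count * RESULTS_PER_QUERY


queries = [
    "YouTube Music Mac app", "best YTM client Mac", "YTMDesktop Mac",
    "YouTube Music vs Spotify Mac", "YouTube Music Mac alternative",
    "r/macapps YouTube Music", "YTM Mac crashes", "YouTube Music Mac 2024",
    "YTM offline Mac", "youtube-music GitHub",
]
prepared = prepare_reddit_queries(queries)
```

The ten queries are the ones from the GOOD example in the description, one per category of the formula.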
|
164
|
+
# --------------------------------------------------------------------------
|
|
165
|
+
- name: get_reddit_post
|
|
166
|
+
category: reddit
|
|
167
|
+
capability: reddit
|
|
168
|
+
limits:
|
|
169
|
+
min_urls: 2
|
|
170
|
+
max_urls: 50
|
|
171
|
+
recommended_urls: 20
|
|
172
|
+
default_max_comments: 1000
|
|
173
|
+
|
|
174
|
+
description: |
|
|
175
|
+
**🔥 FETCH REDDIT POSTS - 2-50 URLs, RECOMMENDED 10-20+**
|
|
176
|
+
|
|
177
|
+
This tool fetches Reddit posts with smart comment allocation.
|
|
178
|
+
Using 2-5 posts = missing community consensus. Use 10-20+ for broad perspective.
|
|
179
|
+
|
|
180
|
+
**Comment Budget:** 1,000 total comments distributed automatically across posts.
|
|
181
|
+
- 2 posts: ~500 comments/post (deep dive)
|
|
182
|
+
- 10 posts: ~100 comments/post (balanced)
|
|
183
|
+
- 20 posts: ~50 comments/post (RECOMMENDED: broad)
|
|
184
|
+
- 50 posts: ~20 comments/post (max coverage)
|
|
185
|
+
|
|
186
|
+
Comment allocation is AUTOMATIC - you don't need to calculate!
|
|
187
|
+
|
|
188
|
+
**When to use different post counts:**
|
|
189
|
+
|
|
190
|
+
**2-5 posts:** Deep dive on specific discussions
|
|
191
|
+
- Use when: You found THE perfect thread and want all comments
|
|
192
|
+
- Trade-off: Deep but narrow perspective
|
|
193
|
+
|
|
194
|
+
**10-15 posts:** Balanced depth + breadth (GOOD)
|
|
195
|
+
- Use when: Want good comment depth across multiple discussions
|
|
196
|
+
- Trade-off: Good balance of depth and coverage
|
|
197
|
+
|
|
198
|
+
**20-30 posts:** Broad community perspective (RECOMMENDED)
|
|
199
|
+
- Use when: Want to see consensus across many discussions
|
|
200
|
+
- Trade-off: Fewer comments per post but more diverse opinions
|
|
201
|
+
|
|
202
|
+
**40-50 posts:** Maximum coverage
|
|
203
|
+
- Use when: Researching controversial topic, need all perspectives
|
|
204
|
+
- Trade-off: Fewer comments per post but comprehensive coverage
|
|
205
|
+
|
|
206
|
+
**Example:**
|
|
207
|
+
❌ BAD: `{"urls": ["single_url"]}` → 1 perspective, could be biased/outdated
|
|
208
|
+
✅ GOOD: `{"urls": [20 URLs from diverse subreddits: programming, webdev, node, golang, devops, etc.]}` → comprehensive community perspective
|
|
209
|
+
|
|
210
|
+
**Pro Tips:**
|
|
211
|
+
1. **Use 10-20+ posts** - More posts = broader community perspective
|
|
212
|
+
2. **Mix subreddits** - Different communities have different expertise and perspectives
|
|
213
|
+
3. **Include various discussion types** - Best practices, comparisons, problems, solutions
|
|
214
|
+
4. **Let comment allocation auto-adjust** - Don't override max_comments unless needed
|
|
215
|
+
5. **Use after search_reddit** - Get URLs from search, then fetch full content here
|
|
216
|
+
|
|
217
|
+
**CRITICAL:** Comments often contain the BEST insights, solutions, and real-world experiences.
|
|
218
|
+
Always set fetch_comments=true unless you only need post titles.
|
|
219
|
+
|
|
220
|
+
**Workflow:** search_reddit (find posts) → get_reddit_post (fetch full content + comments)
|
|
221
|
+
|
|
222
|
+
parameters:
|
|
223
|
+
urls:
|
|
224
|
+
type: array
|
|
225
|
+
required: true
|
|
226
|
+
items:
|
|
227
|
+
type: string
|
|
228
|
+
validation:
|
|
229
|
+
minItems: 2
|
|
230
|
+
maxItems: 50
|
|
231
|
+
description: |
|
|
232
|
+
**2-50 Reddit post URLs** (Minimum 2, recommended 10-20)
|
|
233
|
+
|
|
234
|
+
More posts = broader community perspective and better consensus detection.
|
|
235
|
+
Get URLs from search_reddit results, then fetch full content here.
|
|
236
|
+
|
|
237
|
+
fetch_comments:
|
|
238
|
+
type: boolean
|
|
239
|
+
required: false
|
|
240
|
+
default: true
|
|
241
|
+
description: |
|
|
242
|
+
**Fetch comments from posts (RECOMMENDED: true)**
|
|
243
|
+
|
|
244
|
+
Comments often contain the BEST insights, solutions, and real-world experiences.
|
|
245
|
+
Set to true (default): Get post + comments
|
|
246
|
+
Set to false: Get post content only (faster but misses insights)
|
|
247
|
+
|
|
248
|
+
max_comments:
|
|
249
|
+
type: number
|
|
250
|
+
required: false
|
|
251
|
+
default: 100
|
|
252
|
+
description: |
|
|
253
|
+
**Override automatic comment allocation**
|
|
254
|
+
|
|
255
|
+
Leave empty for smart allocation based on post count:
|
|
256
|
+
- 2 posts: ~500 comments/post
|
|
257
|
+
- 10 posts: ~100 comments/post
|
|
258
|
+
- 20 posts: ~50 comments/post
|
|
259
|
+
|
|
260
|
+
Only override if you need specific comment depth.
|
|
261
|
+
|
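The comment budget figures quoted for `get_reddit_post` follow from an even split of 1,000 comments across the requested posts. A minimal sketch, assuming floor division (the description only gives approximate "~" values, so the server's exact rounding is unknown):

```python
COMMENT_BUDGET = 1000  # limits.default_max_comments
MIN_URLS = 2           # limits.min_urls
MAX_URLS = 50          # limits.max_urls


def comments_per_post(post_count: int, budget: int = COMMENT_BUDGET) -> int:
    """Evenly split the comment budget; floor division is an assumption."""
    if not MIN_URLS <= post_count <= MAX_URLS:
        raise ValueError(f"expected {MIN_URLS}-{MAX_URLS} URLs, got {post_count}")
    return budget // post_count


allocation = {n: comments_per_post(n) for n in (2, 10, 20, 50)}
```

This reproduces the 500 / 100 / 50 / 20 comments-per-post figures listed in the description.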
|
262
|
+
# --------------------------------------------------------------------------
|
|
263
|
+
# DEEP RESEARCH TOOL
|
|
264
|
+
# --------------------------------------------------------------------------
|
|
265
|
+
- name: deep_research
|
|
266
|
+
category: research
|
|
267
|
+
capability: deepResearch
|
|
268
|
+
useZodSchema: true
|
|
269
|
+
zodSchemaRef: "deepResearchParamsSchema"
|
|
270
|
+
limits:
|
|
271
|
+
min_questions: 1
|
|
272
|
+
max_questions: 10
|
|
273
|
+
recommended_questions: 5
|
|
274
|
+
min_question_length: 200
|
|
275
|
+
min_specific_questions: 2
|
|
276
|
+
|
|
277
|
+
description: |
|
|
278
|
+
**🔥 DEEP RESEARCH - 2-10 QUESTIONS, RECOMMENDED 5+**
|
|
279
|
+
|
|
280
|
+
This tool runs 2-10 questions IN PARALLEL with AI-powered research.
|
|
281
|
+
Using 1-2 questions = wasting the parallel research capability!
|
|
282
|
+
|
|
283
|
+
**Token Budget:** 32,000 tokens distributed across questions.
|
|
284
|
+
- 2 questions: 16,000 tokens each (deep dive)
|
|
285
|
+
- 5 questions: 6,400 tokens each (RECOMMENDED: balanced)
|
|
286
|
+
- 10 questions: 3,200 tokens each (comprehensive multi-topic)
|
|
287
|
+
|
|
288
|
+
All questions research in PARALLEL - no time penalty for more questions!
|
|
289
|
+
|
|
290
|
+
**When to use this tool:**
|
|
291
|
+
- Multi-perspective analysis on related topics
|
|
292
|
+
- Researching a domain from multiple angles
|
|
293
|
+
- Validating understanding across different aspects
|
|
294
|
+
- Comparing approaches/technologies side-by-side
|
|
295
|
+
- Deep technical questions requiring comprehensive research
|
|
296
|
+
|
|
297
|
+
**Question Template** - Each question MUST include these sections:
|
|
298
|
+
|
|
299
|
+
1. **🎯 WHAT I NEED:** Clearly state what you're trying to achieve or understand
|
|
300
|
+
2. **🤔 WHY I'M RESEARCHING:** What decision does this inform? What problem are you solving?
|
|
301
|
+
3. **📚 WHAT I ALREADY KNOW:** Share current understanding so research fills gaps, not repeats basics
|
|
302
|
+
4. **🔧 HOW I'LL USE THIS:** Practical application - implementation, debugging, architecture
|
|
303
|
+
5. **❓ SPECIFIC QUESTIONS (2-5):** Break down into specific, pointed sub-questions
|
|
304
|
+
6. **🌐 PRIORITY SOURCES:** (optional) Preferred docs/sites to prioritize
|
|
305
|
+
7. **⚡ FOCUS AREAS:** (optional) What matters most - performance, security, etc.
|
|
306
|
+
|
|
307
|
+
**ATTACH FILES when asking about code - THIS IS MANDATORY:**
|
|
308
|
+
- 🐛 Bugs/errors → Attach the failing code
|
|
309
|
+
- ⚡ Performance issues → Attach the slow code paths
|
|
310
|
+
- ♻️ Refactoring → Attach current implementation
|
|
311
|
+
- 🔍 Code review → Attach code to review
|
|
312
|
+
- 🏗️ Architecture → Attach relevant modules
|
|
313
|
+
|
|
314
|
+
Research without code context for code questions is generic and unhelpful!
|
|
315
|
+
|
|
316
|
+
**Example:**
|
|
317
|
+
❌ BAD: `{"questions": [{"question": "Research React hooks"}]}` → 1 vague question, no template, no context, wastes 90% capacity
|
|
318
|
+
|
|
319
|
+
✅ GOOD:
|
|
320
|
+
```json
|
|
321
|
+
{"questions": [{
|
|
322
|
+
"question": "🎯 WHAT I NEED: Understand when to use useCallback vs useMemo in React 18\n\n🤔 WHY: Optimizing a data-heavy dashboard with 50+ components, seeing performance issues\n\n📚 WHAT I KNOW: Both memoize values, useCallback for functions, useMemo for computed values. Unclear when each actually prevents re-renders.\n\n🔧 HOW I'LL USE THIS: Refactor Dashboard.tsx to eliminate unnecessary re-renders\n\n❓ SPECIFIC QUESTIONS:\n1. When does useCallback actually prevent re-renders vs when it doesn't?\n2. Performance benchmarks: useCallback vs useMemo vs neither in React 18?\n3. Common anti-patterns that negate their benefits?\n4. How to measure if they're actually helping?\n\n🌐 PRIORITY: Official React docs, React team blog posts\n⚡ FOCUS: Patterns for frequently updating state"
|
|
323
|
+
}, ...add 4 more questions for comprehensive coverage]}
|
|
324
|
+
```
|
|
325
|
+
|
|
326
|
+
**Pro Tips:**
|
|
327
|
+
1. **Use 5-10 questions** - Maximize parallel research capacity
|
|
328
|
+
2. **Follow the template** - Include the 5 required sections (plus the 2 optional ones when relevant) for each question
|
|
329
|
+
3. **Be specific** - Include version numbers, error codes, library names
|
|
330
|
+
4. **Add 2-5 sub-questions** - Break down what you need to know
|
|
331
|
+
5. **Attach files for code questions** - MANDATORY for bugs/performance/refactoring
|
|
332
|
+
6. **Describe files thoroughly** - Explain what the file is and what to focus on
|
|
333
|
+
7. **Specify focus areas** - "Focus on X, Y, Z" for prioritization
|
|
334
|
+
8. **Group related questions** - Research a domain from multiple angles
|
|
335
|
+
|
|
336
|
+
**Scope Expansion Triggers** - Iterate when:
|
|
337
|
+
- Results mention concepts you didn't research
|
|
338
|
+
- Answers raise new questions you should explore
|
|
339
|
+
- You realize initial scope was too narrow
|
|
340
|
+
- You discover related topics that matter
|
|
341
|
+
|
|
342
|
+
**Workflow:**
|
|
343
|
+
deep_research (3-5 questions) → sequentialthinking (evaluate, identify gaps) →
|
|
344
|
+
OPTIONAL: deep_research AGAIN with NEW questions based on learnings →
|
|
345
|
+
sequentialthinking (synthesize) → final decision
|
|
346
|
+
|
|
347
|
+
**REMEMBER:**
|
|
348
|
+
- ALWAYS think after getting results (digest and identify gaps!)
|
|
349
|
+
- DON'T assume first research is complete (iterate based on findings!)
|
|
350
|
+
- USE learnings to ask better questions (results = feedback!)
|
|
351
|
+
- EXPAND scope when results reveal new important areas!
|
|
352
|
+
|
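To make the `deep_research` token budget and question template concrete, the sketch below splits 32,000 tokens evenly and assembles a question from the template sections. `build_question` and its parameter names are hypothetical helpers; how the server itself enforces `min_question_length` and `min_specific_questions` is not visible in this diff.

```python
from typing import List

TOKEN_BUDGET = 32_000        # "32,000 tokens distributed across questions"
MIN_QUESTION_LENGTH = 200    # limits.min_question_length
MIN_SPECIFIC_QUESTIONS = 2   # limits.min_specific_questions


def tokens_per_question(question_count: int) -> int:
    """Evenly split the shared token budget across questions."""
    return TOKEN_BUDGET // question_count


def build_question(need: str, why: str, known: str, use: str,
                   specific: List[str], sources: str = "",
                   focus: str = "") -> str:
    """Assemble one question from the template sections."""
    if len(specific) < MIN_SPECIFIC_QUESTIONS:
        raise ValueError("at least 2 specific sub-questions are required")
    numbered = "\n".join(f"{i}. {q}" for i, q in enumerate(specific, start=1))
    parts = [
        f"🎯 WHAT I NEED: {need}",
        f"🤔 WHY I'M RESEARCHING: {why}",
        f"📚 WHAT I ALREADY KNOW: {known}",
        f"🔧 HOW I'LL USE THIS: {use}",
        f"❓ SPECIFIC QUESTIONS:\n{numbered}",
    ]
    if sources:  # optional section
        parts.append(f"🌐 PRIORITY SOURCES: {sources}")
    if focus:    # optional section
        parts.append(f"⚡ FOCUS AREAS: {focus}")
    text = "\n\n".join(parts)
    if len(text) < MIN_QUESTION_LENGTH:
        raise ValueError("question is shorter than 200 characters")
    return text


question = build_question(
    need="Understand when to use useCallback vs useMemo in React 18",
    why="Optimizing a data-heavy dashboard with 50+ components",
    known="Both memoize values; unclear when each prevents re-renders",
    use="Refactor Dashboard.tsx to eliminate unnecessary re-renders",
    specific=[
        "When does useCallback actually prevent re-renders?",
        "How to measure if they're actually helping?",
    ],
    focus="Patterns for frequently updating state",
)
```

The example values are taken from the GOOD example in the description, shortened to two sub-questions.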
|
353
|
+
schemaDescriptions:
|
|
354
|
+
questions: |
|
|
355
|
+
**2-10 structured questions following the template**
|
|
356
|
+
|
|
357
|
+
Each question should cover a different angle of your research topic.
|
|
358
|
+
Attach files for any code-related questions - this is mandatory!
|
|
359
|
+
|
|
360
|
+
file_attachments: |
|
|
361
|
+
**MANDATORY for code questions: bugs, performance, refactoring, code review, architecture**
|
|
362
|
+
|
|
363
|
+
Format: {path: "/absolute/path", description: "What this file is, why relevant, what to focus on", start_line?, end_line?}
|
|
364
|
+
|
|
365
|
+
Use absolute paths. Include thorough description explaining the file's relevance,
|
|
366
|
+
focus areas, and known issues.
|
|
367
|
+
|
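The `file_attachments` format above can be modeled as a small record with two required and two optional fields. This is a sketch under assumptions: the class name, the POSIX-style absolute-path check, and the example path are all hypothetical, and the package's actual Zod schema (`deepResearchParamsSchema`) may validate differently.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import Optional


@dataclass
class FileAttachment:
    path: str
    description: str
    start_line: Optional[int] = None
    end_line: Optional[int] = None

    def __post_init__(self) -> None:
        # Assumption: POSIX-style paths; "Use absolute paths" per the config.
        if not PurePosixPath(self.path).is_absolute():
            raise ValueError("path must be absolute")
        if not self.description.strip():
            raise ValueError("description must explain the file's relevance")
        if (self.start_line is not None and self.end_line is not None
                and self.start_line > self.end_line):
            raise ValueError("start_line must not exceed end_line")

    def to_payload(self) -> dict:
        """Emit only the fields that were set, matching the optional markers."""
        payload = {"path": self.path, "description": self.description}
        if self.start_line is not None:
            payload["start_line"] = self.start_line
        if self.end_line is not None:
            payload["end_line"] = self.end_line
        return payload


attachment = FileAttachment(
    path="/project/src/Dashboard.tsx",  # hypothetical path
    description="Dashboard root component; focus on the re-render hot path",
    start_line=40,
    end_line=120,
)
payload = attachment.to_payload()
```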
|
368
|
+
# --------------------------------------------------------------------------
|
|
369
|
+
# SCRAPE LINKS TOOL
|
|
370
|
+
# --------------------------------------------------------------------------
|
|
371
|
+
- name: scrape_links
|
|
372
|
+
category: scrape
|
|
373
|
+
capability: scraping
|
|
374
|
+
useZodSchema: true
|
|
375
|
+
zodSchemaRef: "scrapeLinksParamsSchema"
|
|
376
|
+
limits:
|
|
377
|
+
min_urls: 1
|
|
378
|
+
max_urls: 50
|
|
379
|
+
recommended_urls: 5
|
|
380
|
+
min_extraction_prompt_length: 50
|
|
381
|
+
min_extraction_targets: 3
|
|
382
|
+
|
|
383
|
+
description: |
|
|
384
|
+
**🔥 WEB SCRAPING - 1-50 URLs, RECOMMENDED 3-5. ALWAYS use_llm=true**
|
|
385
|
+
|
|
386
|
+
This tool has TWO modes:
|
|
387
|
+
1. **Basic scraping** (use_llm=false) - Gets raw HTML/text - messy, requires manual parsing
|
|
388
|
+
2. **AI-powered extraction** (use_llm=true) - Intelligently extracts what you need ⭐ **USE THIS!**
|
|
389
|
+
|
|
390
|
+
**⚡ ALWAYS SET use_llm=true FOR INTELLIGENT EXTRACTION ⚡**
|
|
391
|
+
|
|
392
|
+
**Why use AI extraction (use_llm=true):**
|
|
393
|
+
- Filters out navigation, ads, footers automatically
|
|
394
|
+
- Extracts ONLY what you specify in what_to_extract
|
|
395
|
+
- Handles complex page structures intelligently
|
|
396
|
+
- Returns clean, structured content ready to use
|
|
397
|
+
- Saves hours of manual HTML parsing
|
|
398
|
+
- Cost: pennies (~$0.01 per 10 pages)
|
|
399
|
+
|
|
400
|
+
**Token Budget:** 32,000 tokens distributed across URLs.
|
|
401
|
+
- 3 URLs: ~10,666 tokens each (deep extraction)
|
|
402
|
+
- 5 URLs: ~6,400 tokens each (RECOMMENDED: balanced)
|
|
403
|
+
- 10 URLs: ~3,200 tokens each (detailed)
|
|
404
|
+
- 50 URLs: ~640 tokens each (quick scan)
|
|
405
|
+
|
|
406
|
+
**Extraction Prompt Formula:**
|
|
407
|
+
```
|
|
408
|
+
Extract [target1] | [target2] | [target3] | [target4] | [target5]
|
|
409
|
+
with focus on [aspect1], [aspect2], [aspect3]
|
|
410
|
+
```
|
|
411
|
+
|
|
412
|
+
**Extraction Rules:**
|
|
413
|
+
- Use pipe `|` to separate extraction targets
|
|
414
|
+
- Minimum 3 targets required
|
|
415
|
+
- Be SPECIFIC about what you want ("pricing tiers" not "pricing")
|
|
416
|
+
- Include "with focus on" to prioritize certain aspects
|
|
417
|
+
- More targets = more comprehensive extraction
|
|
418
|
+
- Aim for 5-10 extraction targets
|
|
419
|
+
|
|
420
|
+
**Extraction Templates by Domain:**
|
|
421
|
+
|
|
422
|
+
**Product Research:**
|
|
423
|
+
```
|
|
424
|
+
Extract pricing details | feature comparisons | user reviews | technical specifications |
|
|
425
|
+
integration options | support channels | deployment models | security features
|
|
426
|
+
with focus on enterprise capabilities, pricing transparency, and integration complexity
|
|
427
|
+
```
|
|
428
|
+
|
|
429
|
+
**Technical Documentation:**
|
|
430
|
+
```
|
|
431
|
+
Extract API endpoints | authentication methods | rate limits | error codes |
|
|
432
|
+
request examples | response schemas | SDK availability | webhook support
|
|
433
|
+
with focus on authentication flow, rate limiting policies, and error handling patterns
|
|
434
|
+
```
|
|
435
|
+
|
|
436
|
+
**Competitive Analysis:**
|
|
437
|
+
```
|
|
438
|
+
Extract product features | pricing models | target customers | unique selling points |
|
|
439
|
+
technology stack | customer testimonials | case studies | market positioning
|
|
440
|
+
with focus on differentiators, pricing strategy, and customer satisfaction
|
|
441
|
+
```
|
|
442
|
+
|
|
443
|
+
**Example:**
|
|
444
|
+
❌ BAD: `{"urls": ["url"], "use_llm": false, "what_to_extract": "get pricing"}` → raw HTML, vague prompt, 1 target, no focus areas
|
|
445
|
+
|
|
446
|
+
✅ GOOD: `{"urls": [5 URLs], "use_llm": true, "what_to_extract": "Extract pricing tiers | plan features | API rate limits | enterprise options | integration capabilities | user testimonials with focus on enterprise features, API limitations, and real-world performance data"}` → clean structured extraction
|
|
447
|
+
|
|
448
|
+
**Pro Tips:**
|
|
449
|
+
1. **ALWAYS use use_llm=true** - The AI extraction is the tool's superpower
|
|
450
|
+
2. **Use 3-10 URLs** - Balance between depth and breadth
|
|
451
|
+
3. **Specify 5-10 extraction targets** - More targets = more comprehensive
|
|
452
|
+
4. **Use pipe `|` separators** - Clearly separate each target
|
|
453
|
+
5. **Add focus areas** - "with focus on X, Y, Z" for prioritization
|
|
454
|
+
6. **Be specific** - "pricing tiers" not "pricing", "API rate limits" not "API info"
|
|
455
|
+
7. **Cover multiple aspects** - Features, pricing, technical, social proof
|
|
456
|
+
|
|
457
|
+
**Automatic Fallback:** Basic → JavaScript rendering → JavaScript + US geo-targeting
|
|
458
|
+
**Batching:** Max 30 concurrent requests (50 URLs = [30] then [20] batches)
|
|
459
|
+
|
|
460
|
+
**REMEMBER:** AI extraction costs pennies but saves hours of manual parsing!
|
|
461
|
+
|
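The batching and token figures in the `scrape_links` description can be checked with a few lines of arithmetic. The sketch assumes simple sequential chunking and floor division; the helper names and the `example.com` URLs are placeholders, not part of the package.

```python
from typing import List

MAX_CONCURRENT = 30    # "Max 30 concurrent requests"
TOKEN_BUDGET = 32_000  # "32,000 tokens distributed across URLs"


def batch_urls(urls: List[str], size: int = MAX_CONCURRENT) -> List[List[str]]:
    """Split URLs into consecutive batches of at most `size` items."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def tokens_per_url(url_count: int) -> int:
    """Evenly split the token budget across URLs (floor division assumed)."""
    return TOKEN_BUDGET // url_count


urls = [f"https://example.com/page/{i}" for i in range(50)]
batches = batch_urls(urls)
batch_sizes = [len(batch) for batch in batches]
```

With 50 URLs this yields one batch of 30 followed by one of 20, matching the `[30] then [20]` note.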
|
462
|
+
schemaDescriptions:
|
|
463
|
+
urls: |
|
|
464
|
+
**1-50 URLs to scrape** (3-5 recommended for balanced depth/breadth)
|
|
465
|
+
|
|
466
|
+
More URLs = broader coverage but fewer tokens per URL.
|
|
467
|
+
- 3 URLs: ~10K tokens each (deep)
|
|
468
|
+
- 5 URLs: ~6K tokens each (balanced)
|
|
469
|
+
- 10 URLs: ~3K tokens each (detailed)
|
|
470
|
+
|
|
471
|
+
timeout: "Timeout per URL (5-120 seconds, default: 30)"
|
|
472
|
+
|
|
473
|
+
use_llm: |
|
|
474
|
+
**⚡ ALWAYS SET TO true FOR INTELLIGENT EXTRACTION ⚡**
|
|
475
|
+
|
|
476
|
+
Enables AI-powered content extraction that:
|
|
477
|
+
- Filters out navigation, ads, footers automatically
|
|
478
|
+
- Extracts ONLY what you specify in what_to_extract
|
|
479
|
+
- Handles complex page structures intelligently
|
|
480
|
+
- Returns clean, structured content
|
|
481
|
+
|
|
482
|
+
Cost: pennies (~$0.001 per page)
|
|
483
|
+
Default: false (but you should ALWAYS set it to true!)
|
|
484
|
+
|
|
485
|
+
what_to_extract: |
|
|
486
|
+
**Extraction prompt for AI (REQUIRED when use_llm=true)**
|
|
487
|
+
|
|
488
|
+
Formula: Extract [target1] | [target2] | [target3] with focus on [aspect1], [aspect2]
|
|
489
|
+
|
|
490
|
+
Requirements:
|
|
491
|
+
- Minimum 50 characters (be detailed!)
|
|
492
|
+
- Minimum 3 extraction targets separated by `|`
|
|
493
|
+
- Include "with focus on" for prioritization
|
|
494
|
+
- Be SPECIFIC about what you want
|
|
495
|
+
|
|
496
|
+
More specific targets = better extraction quality!
|
|
497
|
+
|
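The `what_to_extract` requirements (pipe-separated targets, at least 3 of them, at least 50 characters, a "with focus on" clause) lend themselves to a small builder. This is an illustrative sketch with hypothetical helper names; the server's real validation lives in `scrapeLinksParamsSchema` and may differ.

```python
from typing import List

MIN_PROMPT_LENGTH = 50  # limits.min_extraction_prompt_length
MIN_TARGETS = 3         # limits.min_extraction_targets


def build_extraction_prompt(targets: List[str], focus: List[str]) -> str:
    """Format: Extract [t1] | [t2] | [t3] with focus on [a1], [a2]."""
    if len(targets) < MIN_TARGETS:
        raise ValueError(f"need at least {MIN_TARGETS} extraction targets")
    prompt = "Extract " + " | ".join(targets)
    if focus:
        prompt += " with focus on " + ", ".join(focus)
    if len(prompt) < MIN_PROMPT_LENGTH:
        raise ValueError(f"prompt must be at least {MIN_PROMPT_LENGTH} characters")
    return prompt


def count_targets(prompt: str) -> int:
    """Count pipe-separated targets in the part before 'with focus on'."""
    head = prompt.split(" with focus on ")[0]
    return len([part for part in head.split("|") if part.strip()])


prompt = build_extraction_prompt(
    targets=["pricing tiers", "plan features", "API rate limits"],
    focus=["enterprise features", "API limitations"],
)
```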
|
498
|
+
# --------------------------------------------------------------------------
|
|
499
|
+
# WEB SEARCH TOOL
|
|
500
|
+
# --------------------------------------------------------------------------
|
|
501
|
+
- name: web_search
|
|
502
|
+
category: search
|
|
503
|
+
capability: search
|
|
504
|
+
useZodSchema: true
|
|
505
|
+
zodSchemaRef: "webSearchParamsSchema"
|
|
506
|
+
limits:
|
|
507
|
+
min_keywords: 3
|
|
508
|
+
max_keywords: 100
|
|
509
|
+
recommended_keywords: 7
|
|
510
|
+
|
|
511
|
+
description: |
|
|
512
|
+
**🔥 WEB SEARCH - MINIMUM 3 KEYWORDS, RECOMMENDED 5-7**
|
|
513
|
+
|
|
514
|
+
This tool searches up to 100 keywords IN PARALLEL via Google.
|
|
515
|
+
Using 1-2 keywords = wasting the tool's parallel search power!
|
|
516
|
+
|
|
517
|
+
**Results Budget:** 10 results per keyword, all searches run in parallel.
|
|
518
|
+
- 3 keywords = 30 results (minimum)
|
|
519
|
+
- 7 keywords = 70 results (RECOMMENDED)
|
|
520
|
+
- 100 keywords = 1000 results (comprehensive)
|
|
521
|
+
|
|
522
|
+
**7-Perspective Keyword Formula** - Each keyword targets a DIFFERENT angle:
|
|
523
|
+
|
|
524
|
+
1. **Direct/Broad:** "[topic]"
|
|
525
|
+
Example: "React state management"
|
|
526
|
+
2. **Specific/Technical:** "[topic] [technical term]"
|
|
527
|
+
Example: "React useReducer vs Redux"
|
|
528
|
+
3. **Problem-Focused:** "[topic] issues/debugging/problems"
|
|
529
|
+
Example: "React state management performance issues"
|
|
530
|
+
4. **Best Practices:** "[topic] best practices [year]"
|
|
531
|
+
Example: "React state management best practices 2024"
|
|
532
|
+
5. **Comparison:** "[A] vs [B]"
|
|
533
|
+
Example: "React state management libraries comparison"
|
|
534
|
+
6. **Tutorial/Guide:** "[topic] tutorial/guide"
|
|
535
|
+
Example: "React state management tutorial"
|
|
536
|
+
7. **Advanced:** "[topic] patterns/architecture large applications"
|
|
537
|
+
Example: "React state management patterns large applications"
|
|
538
|
+
|
|
539
|
+
**Search Operators with Examples:**
|
|
540
|
+
- `site:domain.com` - Search within specific site
|
|
541
|
+
Example: `"React hooks" site:github.com` → React hooks repos on GitHub
|
|
542
|
+
- `"exact phrase"` - Match exact phrase
|
|
543
|
+
Example: `"Docker OOM" site:stackoverflow.com` → exact error discussions
|
|
544
|
+
- `-exclude` - Exclude term from results
|
|
545
|
+
Example: `React state management -Redux` → find alternatives to Redux
|
|
546
|
+
- `filetype:pdf` - Find specific file types
|
|
547
|
+
Example: `React tutorial filetype:pdf` → downloadable guides
|
|
548
|
+
- `OR` - Match either term
|
|
549
|
+
Example: `React OR Vue state management` → compare frameworks
|
|
550
|
+
|
|
551
|
+
**Keyword Patterns by Use Case:**
|
|
552
|
+
|
|
553
|
+
**Technology Research:**
|
|
554
|
+
`["PostgreSQL vs MySQL performance 2024", "PostgreSQL best practices production", "\"PostgreSQL\" site:github.com", "PostgreSQL connection pooling", "PostgreSQL vs MongoDB use cases"]`
|
|
555
|
+
|
|
556
|
+
**Problem Solving:**
|
|
557
|
+
`["Docker container memory leak debugging", "Docker memory limit not working", "\"Docker OOM\" site:stackoverflow.com", "Docker memory optimization best practices"]`
|
|
558
|
+
|
|
559
|
+
**Comparison Research:**
|
|
560
|
+
`["Next.js vs Remix performance", "Next.js 14 vs Remix 2024", "\"Next.js\" OR \"Remix\" benchmarks", "Next.js vs Remix developer experience"]`
|
|
561
|
+
|
|
562
|
+
**Example:**
|
|
563
|
+
❌ BAD: `{"keywords": ["React"]}` → 1 vague keyword, no operators, no diversity
|
|
564
|
+
|
|
565
|
+
✅ GOOD: `{"keywords": ["React state management best practices", "React useReducer vs Redux 2024", "React Context API performance", "Zustand React state library", "\"React state\" site:github.com", "React state management large applications", "React global state alternatives -Redux"]}` → 7 diverse angles with operators
|
|
566
|
+
|
|
567
|
+
**Pro Tips:**
|
|
568
|
+
1. **Use at least 5-7 keywords** - Each reveals a different perspective
|
|
569
|
+
2. **Add year numbers** - "2024", "2025" for recent content
|
|
570
|
+
3. **Use search operators** - site:, "exact", -exclude, filetype:
|
|
571
|
+
4. **Vary specificity** - Mix broad + specific keywords
|
|
572
|
+
5. **Include comparisons** - "vs", "versus", "compared to", "OR"
|
|
573
|
+
6. **Target sources** - site:github.com, site:stackoverflow.com
|
|
574
|
+
7. **Add context** - "best practices", "tutorial", "production", "performance"
|
|
575
|
+
8. **Think parallel** - Each keyword searches independently
|
|
576
|
+
|
|
577
|
+
**Workflow:**
|
|
578
|
+
web_search → sequentialthinking (evaluate which URLs look promising) →
|
|
579
|
+
scrape_links (MUST scrape promising URLs - that's where real content is!) →
|
|
580
|
+
sequentialthinking (evaluate scraped content) →
|
|
581
|
+
OPTIONAL: web_search again if gaps found → synthesize
|
|
582
|
+
|
|
583
|
+
**Why this workflow works:**
|
|
584
|
+
- Search results reveal new keywords you didn't think of
|
|
585
|
+
- Scraped content shows what's actually useful vs what looked good
|
|
586
|
+
- Thinking between tool calls prevents tunnel vision
|
|
587
|
+
- Iterative refinement = comprehensive coverage
|
|
588
|
+
|
|
589
|
+
**CRITICAL:**
|
|
590
|
+
- ALWAYS scrape after web_search - that's where the real content is!
|
|
591
|
+
- ALWAYS think between tool calls - evaluate and refine!
|
|
592
|
+
- DON'T stop after one search - iterate based on learnings!
|
|
593
|
+
|
|
594
|
+
**FOLLOW-UP:** Use `scrape_links` to extract full content from promising URLs!
|
|
595
|
+
|
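The 7-perspective formula in the `web_search` description can be expanded mechanically from a topic. The sketch below is one possible reading: the function name and its parameters are hypothetical, and each slot uses just one of the alternative phrasings the formula allows.

```python
from typing import List

RESULTS_PER_KEYWORD = 10  # "10 results per keyword"


def seven_perspective_keywords(topic: str, technical_term: str,
                               rival: str, year: int) -> List[str]:
    """Return one keyword per perspective of the formula."""
    return [
        topic,                                    # 1. Direct/Broad
        f"{topic} {technical_term}",              # 2. Specific/Technical
        f"{topic} issues",                        # 3. Problem-Focused
        f"{topic} best practices {year}",         # 4. Best Practices
        f"{topic} vs {rival}",                    # 5. Comparison
        f"{topic} tutorial",                      # 6. Tutorial/Guide
        f"{topic} patterns large applications",   # 7. Advanced
    ]


keywords = seven_perspective_keywords(
    "React state management", "useReducer", "Vue state management", 2024
)
result_count = len(keywords) * RESULTS_PER_KEYWORD
```

Seven keywords give the recommended budget of 70 results; in practice each slot would be hand-tuned rather than templated.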
|
596
|
+
schemaDescriptions:
|
|
597
|
+
keywords: |
|
|
598
|
+
**3-100 diverse keywords** (Minimum 3 required, 5-7 recommended)
|
|
599
|
+
|
|
600
|
+
Each keyword runs as a separate Google search in parallel.
|
|
601
|
+
Each keyword should target a different angle using the 7-perspective formula.
|
|
602
|
+
|
|
603
|
+
**Diversity Checklist:**
|
|
604
|
+
✓ Includes broad keyword
|
|
605
|
+
✓ Includes specific/technical keyword
|
|
606
|
+
✓ Includes comparison keyword (vs, OR)
|
|
607
|
+
✓ Includes best practices keyword
|
|
608
|
+
✓ Includes year-specific keyword (2024, 2025)
|
|
609
|
+
✓ Uses search operators (site:, "exact", -exclude)
|
|
610
|
+
✓ Targets specific sources (GitHub, Stack Overflow, docs)
|
|
611
|
+
|
|
612
|
+
**Search Operators:**
|
|
613
|
+
- `site:domain.com` - Search within site
|
|
614
|
+
- `"exact phrase"` - Match exact phrase
|
|
615
|
+
- `-exclude` - Exclude term
|
|
616
|
+
- `filetype:pdf` - Find file type
|
|
617
|
+
- `OR` - Match either term
|
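The diversity checklist for `keywords` can be approximated with a few pattern checks. This is a heuristic sketch, not the package's validation: the regular expressions and the `diversity_report` name are assumptions, and simple string matching will miss phrasing the checklist would accept.

```python
import re
from typing import Dict, List

MIN_KEYWORDS = 3    # limits.min_keywords
MAX_KEYWORDS = 100  # limits.max_keywords


def diversity_report(keywords: List[str]) -> Dict[str, bool]:
    """Check a keyword list against part of the diversity checklist."""
    def any_match(pattern: str) -> bool:
        return any(re.search(pattern, k) for k in keywords)

    return {
        "count_in_range": MIN_KEYWORDS <= len(keywords) <= MAX_KEYWORDS,
        "comparison": any_match(r"\bvs\b|\bOR\b"),
        "best_practices": any("best practices" in k.lower() for k in keywords),
        "year": any_match(r"\b20\d{2}\b"),
        # site:, filetype:, "exact phrase", or a leading -exclude term
        "operators": any_match(r'site:|filetype:|"[^"]+"|(?:^|\s)-\w'),
        "sources": any_match(r"site:(github\.com|stackoverflow\.com)"),
    }


good = [
    "React state management best practices",
    "React useReducer vs Redux 2024",
    "React Context API performance",
    "Zustand React state library",
    '"React state" site:github.com',
    "React state management large applications",
    "React global state alternatives -Redux",
]
report = diversity_report(good)
bad_report = diversity_report(["React"])
```

The `good` list is the GOOD example from the description and passes every check, while the single-keyword BAD example fails all of them.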