research-powerpack-mcp 3.3.1 → 3.3.4

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (71)
  1. package/README.md +43 -0
  2. package/dist/clients/reddit.d.ts +4 -3
  3. package/dist/clients/reddit.d.ts.map +1 -1
  4. package/dist/clients/research.d.ts +12 -2
  5. package/dist/clients/research.d.ts.map +1 -1
  6. package/dist/clients/research.js +108 -36
  7. package/dist/clients/research.js.map +1 -1
  8. package/dist/clients/scraper.d.ts +4 -3
  9. package/dist/clients/scraper.d.ts.map +1 -1
  10. package/dist/clients/search.d.ts +3 -2
  11. package/dist/clients/search.d.ts.map +1 -1
  12. package/dist/config/index.d.ts +3 -2
  13. package/dist/config/index.d.ts.map +1 -1
  14. package/dist/config/index.js +3 -3
  15. package/dist/config/index.js.map +1 -1
  16. package/dist/config/loader.d.ts +40 -0
  17. package/dist/config/loader.d.ts.map +1 -0
  18. package/dist/config/loader.js +300 -0
  19. package/dist/config/loader.js.map +1 -0
  20. package/dist/config/types.d.ts +80 -0
  21. package/dist/config/types.d.ts.map +1 -0
  22. package/dist/config/types.js +6 -0
  23. package/dist/config/types.js.map +1 -0
  24. package/dist/config/yaml/tools.yaml +1308 -0
  25. package/dist/index.js +13 -122
  26. package/dist/index.js.map +1 -1
  27. package/dist/schemas/deep-research.d.ts +0 -37
  28. package/dist/schemas/deep-research.d.ts.map +1 -1
  29. package/dist/schemas/deep-research.js +2 -1
  30. package/dist/schemas/deep-research.js.map +1 -1
  31. package/dist/schemas/scrape-links.d.ts +0 -6
  32. package/dist/schemas/scrape-links.d.ts.map +1 -1
  33. package/dist/schemas/scrape-links.js +1 -1
  34. package/dist/schemas/scrape-links.js.map +1 -1
  35. package/dist/schemas/web-search.d.ts +0 -3
  36. package/dist/schemas/web-search.d.ts.map +1 -1
  37. package/dist/schemas/web-search.js +3 -3
  38. package/dist/schemas/web-search.js.map +1 -1
  39. package/dist/services/llm-processor.d.ts +0 -10
  40. package/dist/services/llm-processor.d.ts.map +1 -1
  41. package/dist/services/llm-processor.js +2 -1
  42. package/dist/services/llm-processor.js.map +1 -1
  43. package/dist/tools/definitions.d.ts +12 -62
  44. package/dist/tools/definitions.d.ts.map +1 -1
  45. package/dist/tools/definitions.js +13 -121
  46. package/dist/tools/definitions.js.map +1 -1
  47. package/dist/tools/reddit.d.ts +2 -1
  48. package/dist/tools/reddit.d.ts.map +1 -1
  49. package/dist/tools/reddit.js +3 -3
  50. package/dist/tools/reddit.js.map +1 -1
  51. package/dist/tools/registry.d.ts +71 -0
  52. package/dist/tools/registry.d.ts.map +1 -0
  53. package/dist/tools/registry.js +238 -0
  54. package/dist/tools/registry.js.map +1 -0
  55. package/dist/tools/utils.d.ts +92 -0
  56. package/dist/tools/utils.d.ts.map +1 -0
  57. package/dist/tools/utils.js +142 -0
  58. package/dist/tools/utils.js.map +1 -0
  59. package/dist/utils/errors.d.ts +3 -51
  60. package/dist/utils/errors.d.ts.map +1 -1
  61. package/dist/utils/errors.js +10 -7
  62. package/dist/utils/errors.js.map +1 -1
  63. package/dist/utils/url-aggregator.d.ts +5 -4
  64. package/dist/utils/url-aggregator.d.ts.map +1 -1
  65. package/dist/utils/url-aggregator.js +30 -1
  66. package/dist/utils/url-aggregator.js.map +1 -1
  67. package/dist/version.d.ts +0 -10
  68. package/dist/version.d.ts.map +1 -1
  69. package/dist/version.js +1 -10
  70. package/dist/version.js.map +1 -1
  71. package/package.json +3 -2
@@ -0,0 +1,1308 @@
+ # Research Powerpack MCP Server - Tool Configuration
+ # Single source of truth for all tool metadata, descriptions, and parameter schemas
+ # Version: 1.0
+
+ version: "1.0"
+
+ metadata:
+   name: "research-powerpack-mcp"
+   description: "Research tools for AI assistants"
+
+ tools:
+   # ==========================================================================
+   # REDDIT TOOLS
+   # ==========================================================================
+
+   - name: search_reddit
+     category: reddit
+     capability: search
+
+     # Configurable limits
+     limits:
+       min_queries: 10
+       max_queries: 50
+       recommended_queries: 20
+
+     description: |
+       **🔥 AGGRESSIVE REDDIT RESEARCH - MINIMUM 10 QUERIES REQUIRED 🔥**
+
+       **CRITICAL:** This tool is DESIGNED for consensus analysis through MULTIPLE diverse queries.
+       Using 1-3 queries = WASTING the tool's power. You MUST use 10+ queries minimum.
+
+       ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+       📊 **QUERY BUDGET ALLOCATION** (Use ALL your query slots!)
+       ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+
+       - **MINIMUM:** 10 queries (hard requirement)
+       - **RECOMMENDED:** 20-30 queries (optimal consensus detection)
+       - **MAXIMUM:** 50 queries (comprehensive deep research)
+
+       **TOKEN ALLOCATION:** 10 results per query, all queries run in parallel
+       - 10 queries = 100 total results
+       - 20 queries = 200 total results
+       - 50 queries = 500 total results
+
+       ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+       🎯 **10-CATEGORY QUERY FORMULA** (Cover ALL categories!)
+       ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+
+       Each query should target a DIFFERENT angle. NO OVERLAP!
+
+       1. **Direct Topic (3-5 queries):**
+          - "YouTube Music Mac app"
+          - "YTM desktop application"
+          - "YouTube Music client macOS"
+
+       2. **Recommendations (3-5 queries):**
+          - "best YouTube Music client Mac"
+          - "recommended YTM desktop app"
+          - "top YouTube Music Mac applications"
+
+       3. **Specific Tools/Projects (5-10 queries):**
+          - "YTMDesktop Mac"
+          - "th-ch youtube-music"
+          - "steve228uk YouTube Music"
+          - "youtube-music-desktop-app GitHub"
+
+       4. **Comparisons (3-5 queries):**
+          - "YouTube Music vs Spotify Mac desktop"
+          - "YTM vs Apple Music desktop app"
+          - "YouTube Music desktop vs web player"
+
+       5. **Alternatives (3-5 queries):**
+          - "YouTube Music Mac alternative"
+          - "YTM replacement desktop"
+          - "better than YouTube Music Mac"
+
+       6. **Subreddit-Specific (5-10 queries):**
+          - "r/YoutubeMusic desktop app"
+          - "r/macapps YouTube Music"
+          - "r/opensource YouTube Music client"
+          - "r/software YouTube Music Mac"
+
+       7. **Problems/Issues (3-5 queries):**
+          - "YouTube Music desktop app issues"
+          - "YTM Mac app crashes"
+          - "YouTube Music desktop performance problems"
+
+       8. **Year-Specific for Recency (2-3 queries):**
+          - "best YouTube Music Mac app 2024"
+          - "YouTube Music desktop 2025"
+          - "YTM client 2024 recommendations"
+
+       9. **Features (3-5 queries):**
+          - "YouTube Music offline Mac"
+          - "YTM lyrics desktop app"
+          - "YouTube Music desktop features"
+
+       10. **Developer/GitHub (3-5 queries):**
+           - "youtube-music electron app GitHub"
+           - "YTM desktop open source"
+           - "YouTube Music API desktop client"
+
+       ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+       ❌ **BAD EXAMPLE** (DON'T DO THIS - Wastes the tool!)
+       ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+
+       ```json
+       {
+         "queries": ["best YouTube Music app"]
+       }
+       ```
+
+       **Why this is BAD:**
+       - Only 1 query (minimum is 10!)
+       - No diversity (missing 9 other categories)
+       - No subreddit targeting
+       - No year-specific queries
+       - No comparison queries
+       - Misses 90% of available consensus data
+
+       ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+       ✅ **GOOD EXAMPLE** (DO THIS - Uses tool properly!)
+       ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+
+       ```json
+       {
+         "queries": [
+           "YouTube Music Mac app",
+           "YTM desktop application",
+           "best YouTube Music client Mac",
+           "recommended YTM desktop",
+           "YTMDesktop Mac",
+           "th-ch youtube-music",
+           "YouTube Music vs Spotify Mac desktop",
+           "YTM vs Apple Music desktop",
+           "YouTube Music Mac alternative",
+           "r/YoutubeMusic desktop app",
+           "r/macapps YouTube Music",
+           "r/opensource YouTube Music",
+           "YouTube Music desktop issues",
+           "YTM Mac crashes",
+           "best YouTube Music Mac 2024",
+           "YouTube Music desktop 2025",
+           "YouTube Music offline Mac",
+           "YTM lyrics desktop",
+           "youtube-music electron GitHub",
+           "YTM desktop open source"
+         ]
+       }
+       ```
+
+       **Why this is GOOD:**
+       - 20 queries (optimal range)
+       - Covers ALL 10 categories
+       - Includes subreddit targeting (r/YoutubeMusic, r/macapps, r/opensource)
+       - Includes year-specific (2024, 2025)
+       - Includes comparisons (vs Spotify, vs Apple Music)
+       - Includes problems (issues, crashes)
+       - Includes features (offline, lyrics)
+       - Includes developer angle (GitHub, open source)
+       - Each query targets a DIFFERENT angle
+       - Will find high-consensus posts across multiple perspectives
+
+       ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+       💡 **PRO TIPS FOR MAXIMUM EFFECTIVENESS**
+       ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+
+       1. **Use ALL 10 categories** - Each reveals different community perspectives
+       2. **Target specific subreddits** - r/YoutubeMusic, r/macapps, r/opensource, r/software
+       3. **Include year numbers** - "2024", "2025" for recent discussions
+       4. **Add comparison keywords** - "vs", "versus", "compared to", "better than"
+       5. **Include problem keywords** - "issue", "bug", "crash", "slow", "problem"
+       6. **Vary your phrasing** - "best", "top", "recommended", "popular"
+       7. **Use technical terms** - "electron", "GitHub", "API", "open source"
+       8. **NO DUPLICATES** - Each query must be unique
+
+       **REMEMBER:** More queries = better consensus detection = higher quality results!
+
+       **OPERATORS SUPPORTED:**
+       - `intitle:` - Search in post titles only
+       - `"exact phrase"` - Match exact phrase
+       - `OR` - Match either term
+       - `-exclude` - Exclude term
+       - Auto-adds `site:reddit.com` to all queries
+
+       ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+       🧠 **ITERATIVE WORKFLOW - THINK → SEARCH → THINK → REFINE → SEARCH AGAIN**
+       ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+
+       **CRITICAL:** Use sequential thinking BETWEEN tool calls to refine your approach!
+
+       **WORKFLOW PATTERN:**
+       ```
+       1. THINK FIRST (1-2 thoughts via sequentialthinking)
+          → Analyze what you need to research
+          → Plan your initial query strategy
+          → Identify which categories to cover
+
+       2. SEARCH (call search_reddit with 10-20 queries)
+          → Execute your planned queries
+          → Get initial results
+
+       3. THINK AFTER RESULTS (2-3 thoughts via sequentialthinking)
+          → Evaluate what you found
+          → Identify gaps in coverage
+          → Notice new angles from results
+          → Decide: get_reddit_post OR search_reddit again with refined queries
+
+       4. REFINE & ITERATE
+          Option A: Get full content
+          → Call get_reddit_post with promising URLs
+          → Think about insights
+          → Search again if gaps remain
+
+          Option B: Search again with new angles
+          → Call search_reddit with refined queries based on learnings
+          → Cover gaps you discovered
+          → Think and decide next step
+       ```
+
+       **WHY THIS WORKS:**
+       - Search results = feedback that reveals new angles
+       - Thinking between calls = space to evaluate and refine
+       - Humans don't search once and stop - neither should you!
+       - Initial queries might miss important perspectives
+       - Results often reveal better search terms
+
+       **EXAMPLE ITERATIVE FLOW:**
+       ```
+       Step 1: Think
+       "I need to research YouTube Music Mac apps. Let me start with direct,
+       recommendation, and comparison queries across 5 subreddits."
+
+       Step 2: search_reddit (10 queries)
+       [Direct: "YouTube Music Mac", Recommendations: "best YTM Mac",
+       Comparisons: "YTM vs Spotify Mac", etc.]
+
+       Step 3: Think (evaluate results)
+       "Results show people discussing 'th-ch/youtube-music' and 'YTMDesktop'
+       projects heavily. Also seeing complaints about performance. I should:
+       - Search specifically for these project names
+       - Add performance-focused queries
+       - Target r/opensource and r/electronjs subreddits"
+
+       Step 4: search_reddit AGAIN (10 refined queries)
+       [Specific: "th-ch youtube-music Mac", "YTMDesktop performance",
+       "r/opensource YouTube Music", "r/electronjs YTM", etc.]
+
+       Step 5: Think (evaluate combined results)
+       "Now I have comprehensive coverage. Top consensus posts are X, Y, Z.
+       Let me fetch full content from these 15 high-consensus posts."
+
+       Step 6: get_reddit_post (15 URLs from both searches)
+       [Fetch full content + comments from top posts]
+
+       Step 7: Think (final synthesis)
+       "Based on all results, the community consensus is..."
+       ```
+
+       **KEY INSIGHT:** Each search reveals new information that should inform your next search!
+
+       **MANDATORY WORKFLOW:**
+       ```
+       search_reddit → sequentialthinking (2-3 thoughts) →
+       EITHER get_reddit_post OR search_reddit again →
+       sequentialthinking → final decision
+       ```
+
+     parameters:
+       queries:
+         type: array
+         required: true
+         items:
+           type: string
+         validation:
+           minItems: 10 # HARD MINIMUM - enforced
+           maxItems: 50 # HARD MAXIMUM - enforced
+         description: |
+           **PROVIDE 10-50 DIVERSE QUERIES** (Minimum 10 required, 20-30 recommended)
+
+           Each query MUST target a different angle. Use the 10-category formula above.
+
+           **VALIDATION RULES:**
+           - Minimum 10 queries (you'll get an error with fewer)
+           - Maximum 50 queries (you'll get an error with more)
+           - Each query should be unique (avoid duplicates)
+           - Cover multiple categories from the 10-category formula
+
+           **QUICK CHECKLIST:**
+           ✓ At least 10 queries total
+           ✓ Includes direct topic queries
+           ✓ Includes recommendation queries
+           ✓ Includes specific tool names
+           ✓ Includes comparisons (vs, versus)
+           ✓ Includes subreddit targeting (r/...)
+           ✓ Includes year-specific (2024, 2025)
+           ✓ Includes problem keywords (issue, bug, crash)
+           ✓ Includes feature keywords (offline, lyrics, etc.)
+           ✓ Includes developer angle (GitHub, open source)
+
+       date_after:
+         type: string
+         required: false
+         description: "Filter results after date (YYYY-MM-DD format). Optional. Example: '2024-01-01'"
+
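As an aside (not part of the diff), the `queries` validation rules quoted above are simple enough to sketch. This is a hypothetical illustration, not the package's actual validator; the real enforcement lives in the Zod schemas and the new registry under `dist/`, and the function name `validateQueries` is invented here.

```typescript
// Hypothetical sketch of the documented `queries` rules: 10-50 items, each
// unique. Not code from research-powerpack-mcp itself.
function validateQueries(queries: string[]): string[] {
  const errors: string[] = [];
  if (queries.length < 10) {
    errors.push(`at least 10 queries required, got ${queries.length}`);
  }
  if (queries.length > 50) {
    errors.push(`at most 50 queries allowed, got ${queries.length}`);
  }
  // "Each query should be unique (avoid duplicates)"
  const unique = new Set(queries.map((q) => q.trim().toLowerCase()));
  if (unique.size !== queries.length) {
    errors.push("duplicate queries detected");
  }
  return errors;
}
```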
+   - name: get_reddit_post
+     category: reddit
+     capability: reddit
+
+     # Configurable limits
+     limits:
+       min_urls: 2
+       max_urls: 50
+       recommended_urls: 20
+       default_max_comments: 100
+
+     description: |
+       **🔥 FETCH REDDIT POSTS - MAXIMIZE POST COUNT FOR BROAD PERSPECTIVE 🔥**
+
+       **CRITICAL:** This tool fetches 2-50 Reddit posts with smart comment allocation.
+       Using 2-5 posts = MISSING community consensus. Use 10-20+ posts for comprehensive perspective!
+
+       ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+       📊 **SMART COMMENT BUDGET** (1,000 comments distributed automatically)
+       ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+
+       - 2 posts: ~500 comments/post (deep dive on specific posts)
+       - 10 posts: ~100 comments/post (balanced - GOOD)
+       - 20 posts: ~50 comments/post (broad perspective - RECOMMENDED)
+       - 50 posts: ~20 comments/post (maximum coverage)
+
+       **Comment allocation is AUTOMATIC** - you don't need to calculate!
+
+       ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+       🎯 **WHEN TO USE DIFFERENT POST COUNTS**
+       ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+
+       **2-5 posts:** Deep dive on specific discussions
+       - Use when: You found THE perfect thread and want all comments
+       - Trade-off: Deep but narrow perspective
+
+       **10-15 posts:** Balanced depth + breadth (GOOD)
+       - Use when: Want good comment depth across multiple discussions
+       - Trade-off: Good balance
+
+       **20-30 posts:** Broad community perspective (RECOMMENDED)
+       - Use when: Want to see consensus across many discussions
+       - Trade-off: Fewer comments per post but more diverse opinions
+
+       **40-50 posts:** Maximum coverage
+       - Use when: Researching controversial topic, need all perspectives
+       - Trade-off: Fewer comments per post but comprehensive coverage
+
+       ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+       ❌ **BAD EXAMPLE** (DON'T DO THIS - Misses community consensus!)
+       ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+
+       ```json
+       {
+         "urls": [
+           "https://reddit.com/r/programming/comments/abc123/best_database/"
+         ],
+         "fetch_comments": true
+       }
+       ```
+
+       **Why this is BAD:**
+       - Only 1 URL (minimum is 2, should use 10-20+)
+       - Misses other community discussions
+       - Single perspective (could be biased/outdated)
+       - Not using the tool's multi-post aggregation power
+
+       ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+       ✅ **GOOD EXAMPLE** (DO THIS - Gets broad community perspective!)
+       ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+
+       ```json
+       {
+         "urls": [
+           "https://reddit.com/r/programming/comments/abc123/best_database/",
+           "https://reddit.com/r/webdev/comments/def456/database_recommendations/",
+           "https://reddit.com/r/node/comments/ghi789/postgresql_vs_mysql/",
+           "https://reddit.com/r/golang/comments/jkl012/database_choice/",
+           "https://reddit.com/r/rails/comments/mno345/production_database/",
+           "https://reddit.com/r/django/comments/pqr678/database_setup/",
+           "https://reddit.com/r/dotnet/comments/stu901/database_performance/",
+           "https://reddit.com/r/java/comments/vwx234/database_scaling/",
+           "https://reddit.com/r/devops/comments/yza567/database_reliability/",
+           "https://reddit.com/r/aws/comments/bcd890/rds_vs_aurora/",
+           "https://reddit.com/r/selfhosted/comments/efg123/database_hosting/",
+           "https://reddit.com/r/sysadmin/comments/hij456/database_backup/",
+           "https://reddit.com/r/docker/comments/klm789/database_containers/",
+           "https://reddit.com/r/kubernetes/comments/nop012/database_k8s/",
+           "https://reddit.com/r/database/comments/qrs345/postgresql_tips/",
+           "https://reddit.com/r/sql/comments/tuv678/query_optimization/",
+           "https://reddit.com/r/PostgreSQL/comments/wxy901/production_setup/",
+           "https://reddit.com/r/mysql/comments/zab234/performance_tuning/",
+           "https://reddit.com/r/mongodb/comments/cde567/use_cases/",
+           "https://reddit.com/r/redis/comments/fgh890/caching_strategies/"
+         ],
+         "fetch_comments": true,
+         "max_comments": 100
+       }
+       ```
+
+       **Why this is GOOD:**
+       - 20 posts (optimal for broad perspective)
+       - Covers multiple subreddits (programming, webdev, node, golang, etc.)
+       - Different discussion angles (best practices, vs comparisons, production, performance)
+       - Will reveal community consensus across diverse communities
+       - Automatic comment allocation (~50 comments per post)
+       - Comprehensive coverage of the topic
+
+       ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+       🚀 **PRO TIPS FOR MAXIMUM EFFECTIVENESS**
+       ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+
+       1. **Use 10-20+ posts** - More posts = broader community perspective
+       2. **Mix subreddits** - Different communities have different perspectives
+       3. **Include various discussion types** - Best practices, comparisons, problems, solutions
+       4. **Let comment allocation auto-adjust** - Don't override max_comments unless needed
+       5. **Use after search_reddit** - Get URLs from search, then fetch full content
+       6. **fetch_comments=true** - Comments often have the best insights
+
+       **REMEMBER:** More posts = better consensus detection = higher confidence in findings!
+
+       **WORKFLOW:** search_reddit (find posts) → get_reddit_post (fetch full content + comments)
+
+     parameters:
+       urls:
+         type: array
+         required: true
+         items:
+           type: string
+         validation:
+           minItems: 2
+           maxItems: 50
+         description: |
+           **Reddit post URLs (MINIMUM 2, RECOMMENDED 10-20, MAX 50)**
+
+           More posts = broader community perspective and better consensus detection.
+
+           **VALIDATION:**
+           - Minimum 2 URLs required
+           - Maximum 50 URLs allowed
+           - Each URL must be a valid Reddit post URL
+
+           **RECOMMENDED USAGE:**
+           - 2-5 posts: Deep dive on specific discussions
+           - 10-15 posts: Balanced depth + breadth
+           - 20-30 posts: Broad community perspective (RECOMMENDED)
+           - 40-50 posts: Maximum coverage
+
+           **PRO TIP:** Get URLs from search_reddit results, then fetch full content here!
+
+       fetch_comments:
+         type: boolean
+         required: false
+         default: true
+         description: |
+           **Fetch comments from posts (RECOMMENDED: true)**
+
+           Comments often contain the BEST insights, solutions, and real-world experiences.
+
+           **Set to true (default):** Get post + comments
+           **Set to false:** Get post content only (faster but misses insights)
+
+           **PRO TIP:** Always keep this true unless you only need post titles/content!
+
+       max_comments:
+         type: number
+         required: false
+         default: 100
+         description: |
+           **Override automatic comment allocation (DEFAULT: 100, auto-adjusts based on post count)**
+
+           Leave empty for smart allocation:
+           - 2 posts: ~500 comments/post
+           - 10 posts: ~100 comments/post
+           - 20 posts: ~50 comments/post
+           - 50 posts: ~20 comments/post
+
+           **Only override if:** You need specific comment depth
+
+           **PRO TIP:** Let the tool auto-allocate for optimal results!
+
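As an aside (not part of the diff), the "smart allocation" figures quoted in this tool's description are consistent with an even split of a fixed 1,000-comment budget. A sketch of that inferred rule; the package's actual allocator may differ:

```typescript
// Assumed allocation rule: divide a fixed 1,000-comment budget evenly across
// the requested posts. Matches the quoted figures (2 -> ~500, 10 -> ~100,
// 20 -> ~50, 50 -> ~20). Inferred for illustration, not taken from dist/.
const COMMENT_BUDGET = 1000;

function commentsPerPost(postCount: number): number {
  return Math.floor(COMMENT_BUDGET / postCount);
}
```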
+   # ==========================================================================
+   # DEEP RESEARCH TOOL
+   # ==========================================================================
+
+   - name: deep_research
+     category: research
+     capability: deepResearch
+     # Complex schema - use existing Zod schema, descriptions injected from YAML
+     useZodSchema: true
+     zodSchemaRef: "deepResearchParamsSchema"
+
+     # Configurable limits
+     limits:
+       min_questions: 1
+       max_questions: 10
+       recommended_questions: 5
+       min_question_length: 200
+       min_specific_questions: 2
+
+     description: |
+       **🔥 BATCH DEEP RESEARCH - USE ALL 10 QUESTION SLOTS FOR COMPREHENSIVE COVERAGE 🔥**
+
+       **CRITICAL:** This tool runs 2-10 questions IN PARALLEL with AI-powered research.
+       Using 1-2 questions = WASTING the parallel research capability!
+
+       ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+       📊 **TOKEN BUDGET ALLOCATION** (32,000 tokens distributed across questions)
+       ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+
+       - 2 questions: 16,000 tokens/question (deep dive)
+       - 5 questions: 6,400 tokens/question (balanced - RECOMMENDED)
+       - 10 questions: 3,200 tokens/question (comprehensive multi-topic)
+
+       **All questions research in PARALLEL** - no time penalty for more questions!
+
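As with the comment budget above, the quoted token figures fit a plain even division of the 32,000-token budget. A sketch of that assumed split (inferred from the description, not the package's published code):

```typescript
// Assumed even split of the 32,000-token research budget across questions.
// Reproduces the quoted figures (2 -> 16,000; 5 -> 6,400; 10 -> 3,200).
const TOKEN_BUDGET = 32000;

function tokensPerQuestion(questionCount: number): number {
  return Math.floor(TOKEN_BUDGET / questionCount);
}
```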
+       ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+       🎯 **WHEN TO USE THIS TOOL**
+       ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+
+       ✅ **USE for:**
+       - Multi-perspective analysis on related topics
+       - Researching a domain from multiple angles
+       - Validating understanding across different aspects
+       - Comparing approaches/technologies side-by-side
+       - Deep technical questions requiring comprehensive research
+
+       ⚠️ **ALWAYS ATTACH FILES when asking about:**
+       - 🐛 Bugs/errors → Attach the failing code
+       - ⚡ Performance issues → Attach the slow code paths
+       - ♻️ Refactoring → Attach current implementation
+       - 🔍 Code review → Attach code to review
+       - 🏗️ Architecture questions about YOUR code → Attach relevant modules
+
+       ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+       📋 **REQUIRED QUESTION TEMPLATE** (Follow this structure!)
+       ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+
+       **Each question MUST include these sections:**
+
+       **1. 🎯 WHAT I NEED:**
+       [Clearly state what you're trying to achieve, solve, or understand]
+
+       **2. 🤔 WHY I'M RESEARCHING THIS:**
+       [Explain the context - what decision does this inform? What problem are you solving?]
+
+       **3. 📚 WHAT I ALREADY KNOW:**
+       [Share your current understanding so research fills gaps, not repeats basics]
+
+       **4. 🔧 HOW I PLAN TO USE THIS:**
+       [Describe the practical application - implementation, debugging, architecture, etc.]
+
+       **5. ❓ SPECIFIC QUESTIONS (2-5):**
+       - Question 1: [Specific, pointed question]
+       - Question 2: [Another specific question]
+       - Question 3: [etc.]
+
+       **6. 🌐 PRIORITY SOURCES (optional):**
+       [Sites/docs to prioritize: "Prefer official React docs, GitHub issues, Stack Overflow"]
+
+       **7. ⚡ PRIORITY INFO (optional):**
+       [What matters most: "Focus on performance implications" or "Prioritize security best practices"]
+
+       ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+       ❌ **BAD EXAMPLE** (DON'T DO THIS - Wastes the tool!)
+       ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+
+       ```json
+       {
+         "questions": [
+           {
+             "question": "Research React hooks"
+           }
+         ]
+       }
+       ```
+
+       **Why this is BAD:**
+       - Only 1 question (should use 5-10 for comprehensive coverage)
+       - Too vague ("Research React hooks")
+       - No template sections (missing WHY, WHAT I KNOW, etc.)
+       - No specific sub-questions
+       - No file attachments (if this is about YOUR code)
+       - Wastes 90% of available research capacity
+
+       ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+       ✅ **GOOD EXAMPLE** (DO THIS - Uses tool properly!)
+       ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+
+       ```json
+       {
+         "questions": [
+           {
+             "question": "🎯 WHAT I NEED: Understand when to use useCallback vs useMemo in React 18\n\n🤔 WHY: Optimizing a data-heavy dashboard with 50+ components, seeing performance issues\n\n📚 WHAT I KNOW: Both memoize values, useCallback for functions, useMemo for computed values. Unclear when each prevents re-renders.\n\n🔧 HOW I'LL USE THIS: Refactor Dashboard.tsx to eliminate unnecessary re-renders\n\n❓ SPECIFIC QUESTIONS:\n1. When does useCallback actually prevent re-renders vs when it doesn't?\n2. Performance benchmarks: useCallback vs useMemo vs neither in React 18?\n3. Common anti-patterns that negate their benefits?\n4. How to measure if they're actually helping?\n5. Best practices for large component trees?\n\n🌐 PRIORITY: Official React docs, React team blog posts, performance case studies\n⚡ FOCUS: Patterns for frequently updating state"
+           },
+           {
+             "question": "🎯 WHAT I NEED: Best practices for React Context API with frequent updates\n\n🤔 WHY: Dashboard uses Context for filter state, causing full re-renders\n\n📚 WHAT I KNOW: Context triggers re-render of all consumers. Can split contexts or use useMemo.\n\n🔧 HOW I'LL USE THIS: Redesign FilterContext to minimize re-renders\n\n❓ SPECIFIC QUESTIONS:\n1. How to structure Context to avoid unnecessary re-renders?\n2. When to split one Context into multiple?\n3. Context + useReducer vs external state library?\n4. Performance comparison: Context vs Zustand vs Redux?\n\n🌐 PRIORITY: React docs, Kent C. Dodds articles, real-world examples\n⚡ FOCUS: Patterns for frequently updating state"
+           },
+           {
+             "question": "🎯 WHAT I NEED: Virtualization strategy for rendering 10,000+ rows\n\n🤔 WHY: DataGrid component freezes when displaying large datasets\n\n📚 WHAT I KNOW: react-window and react-virtualized exist. Need to understand tradeoffs.\n\n🔧 HOW I'LL USE THIS: Implement virtualization in DataGrid.tsx\n\n❓ SPECIFIC QUESTIONS:\n1. react-window vs react-virtualized in 2024?\n2. How to handle dynamic row heights?\n3. Integration with React 18 concurrent features?\n4. Performance impact of virtualization overhead?\n\n🌐 PRIORITY: Library docs, performance benchmarks, production examples"
+           }
+         ]
+       }
+       ```
+
611
+ **Why this is GOOD:**
612
+ - 3 questions (good coverage, could use up to 10)
613
+ - Each follows the template structure
614
+ - Specific context and WHY for each
615
+ - 2-5 specific sub-questions per topic
616
+ - File attachments for code-related question
617
+ - Detailed file descriptions
618
+ - Focus areas specified
619
+ - Will get comprehensive, actionable research
620
+
621
+ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
622
+ 🚀 **PRO TIPS FOR MAXIMUM EFFECTIVENESS**
623
+ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
624
+
625
+ 1. **Use 5-10 questions** - Maximize parallel research capacity
626
+ 2. **Follow the template** - Include all 7 sections for each question
627
+ 3. **Be specific** - Include version numbers, error codes, library names
628
+ 4. **Add 2-5 sub-questions** - Break down what you need to know
629
+ 5. **Attach files for code questions** - MANDATORY for bugs/performance/refactoring
630
+ 6. **Describe files thoroughly** - Use numbered sections [1] [2] [3] [4] [5]
631
+ 7. **Specify focus areas** - "Focus on X, Y, Z" for prioritization
632
+ 8. **Group related questions** - Research a domain from multiple angles
633
+
634
+ **REMEMBER:** More questions = more comprehensive research = better decisions!
635
+
636
+ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+ 🧠 **ITERATIVE WORKFLOW - THINK → RESEARCH → THINK → EXPAND → RESEARCH AGAIN**
+ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+
+ **CRITICAL:** Use sequential thinking BETWEEN research calls to expand scope based on findings!
+
+ **WORKFLOW PATTERN:**
+ ```
+ 1. THINK FIRST (1-2 thoughts via sequentialthinking)
+    → Analyze what you need to research
+    → Plan initial 3-5 questions
+    → Identify knowledge gaps
+
+ 2. RESEARCH (call deep_research with 3-5 questions)
+    → Execute your planned questions
+    → Get comprehensive research results
+
+ 3. THINK AFTER RESULTS (2-3 thoughts via sequentialthinking)
+    → Evaluate what you learned
+    → Identify NEW questions that emerged
+    → Notice gaps or deeper areas to explore
+    → Decide: sufficient OR need more research
+
+ 4. EXPAND SCOPE & ITERATE (if gaps found)
+    → Call deep_research AGAIN with 3-5 NEW questions
+    → Based on learnings from first research
+    → Explore deeper or adjacent topics
+    → Think and synthesize
+ ```
+
+ **WHY THIS WORKS:**
+ - Research results reveal questions you didn't know to ask
+ - Thinking between calls = space to digest and identify gaps
+ - Initial questions might be too narrow or miss key aspects
+ - Follow-up questions are often more important than the ones you started with
+ - Iterative deepening = comprehensive understanding
+
673
+ **EXAMPLE ITERATIVE FLOW:**
+ ```
+ Step 1: Think
+ "Need to understand React performance optimization. Start with
+ memoization, Context API, and virtualization questions."
+
+ Step 2: deep_research (3 questions)
+ [Q1: useCallback vs useMemo, Q2: Context API patterns,
+ Q3: Virtualization libraries]
+
+ Step 3: Think (evaluate results)
+ "Research revealed:
+ - React 18 concurrent features are important for performance
+ - Suspense + transitions affect optimization strategy
+ - Server Components change the game
+
+ NEW questions emerged:
+ - How do concurrent features interact with memoization?
+ - When to use Server Components vs Client Components?
+ - How do Suspense boundaries impact performance?
+
+ Should research these deeper aspects now."
+
+ Step 4: deep_research AGAIN (3 NEW questions)
+ [Q1: React 18 concurrent features + memoization,
+ Q2: Server vs Client Components performance,
+ Q3: Suspense boundaries optimization]
+
+ Step 5: Think (evaluate combined results)
+ "Now have complete picture. Initial research gave fundamentals,
+ second research covered advanced patterns. Ready to implement."
+ ```
705
+
+ **KEY INSIGHT:** First research reveals what you SHOULD have asked!
+
+ **SCOPE EXPANSION TRIGGERS:**
+ - Results mention concepts you didn't research
+ - Answers raise new questions
+ - Realize initial scope was too narrow
+ - Discover related topics that matter
+ - Need deeper understanding of specific aspect
+
+ **MANDATORY WORKFLOW:**
+ ```
+ deep_research (3-5 questions) →
+ sequentialthinking (2-3 thoughts, evaluate results) →
+ OPTIONAL: deep_research again (3-5 NEW questions based on learnings) →
+ sequentialthinking (synthesize) →
+ final decision
+ ```
+
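+ As a sketch of the thinking step between calls (the parameter names below assume the common `sequentialthinking` MCP tool schema - verify them against your installed server):
+
+ ```json
+ {
+   "thought": "Research revealed connection pooling as a bottleneck I did not plan for. New question: how does pooling interact with our ORM? Scope should expand before deciding.",
+   "thoughtNumber": 2,
+   "totalThoughts": 3,
+   "nextThoughtNeeded": true
+ }
+ ```
+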
724
+ **REMEMBER:**
+ - ALWAYS think after getting results (digest and identify gaps!)
+ - DON'T assume the first research is complete (iterate based on findings!)
+ - USE learnings to ask better questions (results = feedback!)
+ - EXPAND scope when results reveal new important areas!
+
730
+ # Schema descriptions to inject into existing Zod schema
+ schemaDescriptions:
+   urls: |
+     **URLs to scrape (1-50 URLs; recommended: 3-5 for balanced depth/breadth)**
+
+     More URLs = broader coverage but fewer tokens per URL
+     - 3 URLs: ~10K tokens each (deep extraction)
+     - 5 URLs: ~6K tokens each (balanced - RECOMMENDED)
+     - 10 URLs: ~3K tokens each (detailed)
+     - 50 URLs: ~640 tokens each (quick scan)
+
+     **VALIDATION:**
+     - Minimum 1 URL required
+     - Maximum 50 URLs allowed
+     - Each URL must be valid HTTP/HTTPS
+
+   timeout: "Timeout in seconds for each URL (5-120 seconds, default: 30)"
+
+   use_llm: |
+     **Enable AI-powered content extraction (HIGHLY RECOMMENDED - set to true)**
+
+     ⚡ **ALWAYS SET THIS TO true FOR INTELLIGENT EXTRACTION** ⚡
+
+     **Benefits of use_llm=true:**
+     - Filters out navigation, ads, footers automatically
+     - Extracts ONLY what you specify in what_to_extract
+     - Handles complex page structures intelligently
+     - Returns clean, structured content
+     - Saves manual HTML parsing
+
+     **Cost:** ~$0.001 per page (pennies for quality extraction)
+     **When to use false:** Only for debugging or raw HTML needs
+
+     **Default:** false (but you should set it to true!)
+
+   what_to_extract: |
+     **Extraction prompt for AI (REQUIRED when use_llm=true)**
+
+     **FORMULA:** Extract [target1] | [target2] | [target3] with focus on [aspect1], [aspect2]
+
+     **REQUIREMENTS:**
+     - Minimum 50 characters (be detailed!)
+     - Minimum 3 extraction targets separated by `|`
+     - Include "with focus on" for prioritization
+     - Be SPECIFIC about what you want
+
+     **EXAMPLES:**
+
+     ✅ GOOD: "Extract pricing tiers | plan features | API rate limits | enterprise options |
+     integration capabilities | user testimonials | technical requirements | performance benchmarks
+     with focus on enterprise features, API limitations, and real-world performance data"
+
+     ❌ BAD: "get main content" (too vague, no targets, no focus)
+
+     **PRO TIP:** More specific targets = better extraction quality!
+
786
+ # ============================================================================
+ # SCRAPE LINKS TOOL
+ # ============================================================================
+
+ - name: scrape_links
+   category: scrape
+   capability: scraping
+   useZodSchema: true
+   zodSchemaRef: "scrapeLinksParamsSchema"
+
+   # Configurable limits
+   limits:
+     min_urls: 1
+     max_urls: 50
+     recommended_urls: 5
+     min_extraction_prompt_length: 50
+     min_extraction_targets: 3
+
+   description: |
+ **🔥 INTELLIGENT WEB SCRAPING - ALWAYS USE AI EXTRACTION 🔥**
+
+ **CRITICAL:** This tool has TWO modes:
+ 1. **Basic scraping** (use_llm=false) - Gets raw HTML/text
+ 2. **AI-powered extraction** (use_llm=true) - Intelligently extracts what you need ⭐ **USE THIS!**
+
+ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+ ⚡ **ALWAYS SET use_llm=true FOR INTELLIGENT EXTRACTION** ⚡
+ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+
+ **Why use AI extraction:**
+ - Filters out navigation, ads, footers automatically
+ - Extracts ONLY what you specify
+ - Handles complex page structures
+ - Returns clean, structured content
+ - Saves you from parsing HTML manually
+
+ **Cost:** Minimal (~$0.01 per 10 pages with quality extraction)
+ **Benefit:** 10x better results than raw scraping
+
+ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+ 📊 **TOKEN ALLOCATION** (32,000 tokens distributed across URLs)
+ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+
+ - 3 URLs: ~10,666 tokens/URL (deep extraction)
+ - 5 URLs: ~6,400 tokens/URL (balanced - RECOMMENDED)
+ - 10 URLs: ~3,200 tokens/URL (detailed)
+ - 50 URLs: ~640 tokens/URL (high-level scan)
+
+ **AUTOMATIC FALLBACK:** Basic → JavaScript rendering → JavaScript + US geo-targeting
+ **BATCHING:** Max 30 concurrent requests (50 URLs = [30] then [20] batches)
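+
+ For example, a deep-extraction pass that spends the full per-URL token budget on just 3 pages could look like this (URLs are placeholders):
+
+ ```json
+ {
+   "urls": [
+     "https://example.com/docs/architecture",
+     "https://example.com/docs/api",
+     "https://example.com/docs/deployment"
+   ],
+   "use_llm": true,
+   "what_to_extract": "Extract architecture components | API authentication methods | deployment models | scaling limits | configuration options with focus on production setup and operational constraints"
+ }
+ ```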
836
+
+ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+ 🎯 **EXTRACTION PROMPT FORMULA** (Use pipe-separated targets!)
+ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+
+ **Template:**
+ ```
+ Extract [target1] | [target2] | [target3] | [target4] | [target5]
+ with focus on [aspect1], [aspect2], [aspect3]
+ ```
+
+ **Rules:**
+ - Use pipe `|` to separate extraction targets (minimum 3 targets)
+ - Be SPECIFIC about what you want
+ - Include "with focus on" for prioritization
+ - More targets = more comprehensive extraction
+ - Aim for 5-10 extraction targets
853
+
+ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+ ❌ **BAD EXAMPLE** (DON'T DO THIS - Wastes AI extraction!)
+ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+
+ ```json
+ {
+   "urls": ["https://example.com/pricing"],
+   "use_llm": false,
+   "what_to_extract": "get pricing"
+ }
+ ```
+
+ **Why this is BAD:**
+ - use_llm=false (missing AI extraction!)
+ - Vague extraction prompt ("get pricing")
+ - Only 1 extraction target
+ - No focus areas specified
+ - Will return messy HTML instead of clean data
+
873
+ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+ ✅ **GOOD EXAMPLE** (DO THIS - Uses AI extraction properly!)
+ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+
+ ```json
+ {
+   "urls": [
+     "https://example.com/pricing",
+     "https://example.com/features",
+     "https://example.com/docs/api",
+     "https://example.com/enterprise",
+     "https://example.com/testimonials"
+   ],
+   "use_llm": true,
+   "what_to_extract": "Extract pricing tiers | plan features | API rate limits | enterprise options | integration capabilities | user testimonials | technical requirements | performance benchmarks with focus on enterprise features, API limitations, and real-world performance data"
+ }
+ ```
+
+ **Why this is GOOD:**
+ - use_llm=true ✓ (AI extraction enabled!)
+ - 5 URLs (balanced depth/breadth)
+ - 8 extraction targets separated by `|`
+ - Specific targets (pricing tiers, API rate limits, etc.)
+ - Focus areas specified (enterprise, API, performance)
+ - Will return clean, structured extraction
+ - Covers multiple aspects of the product
+
901
+ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+ 💡 **EXTRACTION PROMPT EXAMPLES BY USE CASE**
+ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+
+ **Product Research:**
+ ```
+ Extract pricing details | feature comparisons | user reviews | technical specifications |
+ integration options | support channels | deployment models | security features
+ with focus on enterprise capabilities, pricing transparency, and integration complexity
+ ```
+
+ **Technical Documentation:**
+ ```
+ Extract API endpoints | authentication methods | rate limits | error codes |
+ request examples | response schemas | SDK availability | webhook support
+ with focus on authentication flow, rate limiting policies, and error handling patterns
+ ```
+
+ **Competitive Analysis:**
+ ```
+ Extract product features | pricing models | target customers | unique selling points |
+ technology stack | customer testimonials | case studies | market positioning
+ with focus on differentiators, pricing strategy, and customer satisfaction
+ ```
+
+ **Content Research:**
+ ```
+ Extract main arguments | supporting evidence | data points | expert quotes |
+ methodology | conclusions | references | publication date
+ with focus on credibility, data quality, and actionable insights
+ ```
+
+ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+ 🚀 **PRO TIPS FOR MAXIMUM EFFECTIVENESS**
+ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+
+ 1. **ALWAYS use use_llm=true** - The AI extraction is the tool's superpower
+ 2. **Use 3-10 URLs** - Balance between depth and breadth
+ 3. **Specify 5-10 extraction targets** - More targets = more comprehensive
+ 4. **Use pipe `|` separators** - Clearly separate each target
+ 5. **Add focus areas** - "with focus on X, Y, Z" for prioritization
+ 6. **Be specific** - "pricing tiers" not "pricing", "API rate limits" not "API info"
+ 7. **Think parallel extraction** - Each target extracts independently
+ 8. **Cover multiple aspects** - Features, pricing, technical, social proof
+
+ **REMEMBER:** AI extraction costs pennies but saves hours of manual parsing!
+
948
+ schemaDescriptions:
+   urls: |
+     **URLs to scrape (1-50 URLs; recommended: 3-5 for balanced depth/breadth)**
+
+     More URLs = broader coverage but fewer tokens per URL
+     - 3 URLs: ~10K tokens each (deep extraction)
+     - 5 URLs: ~6K tokens each (balanced - RECOMMENDED)
+     - 10 URLs: ~3K tokens each (detailed)
+     - 50 URLs: ~640 tokens each (quick scan)
+
+     **VALIDATION:**
+     - Minimum 1 URL required
+     - Maximum 50 URLs allowed
+     - Each URL must be valid HTTP/HTTPS
+
+   timeout: "Timeout in seconds for each URL (5-120 seconds, default: 30)"
+
+   use_llm: |
+     **Enable AI-powered content extraction (HIGHLY RECOMMENDED - set to true)**
+
+     ⚡ **ALWAYS SET THIS TO true FOR INTELLIGENT EXTRACTION** ⚡
+
+     **Benefits of use_llm=true:**
+     - Filters out navigation, ads, footers automatically
+     - Extracts ONLY what you specify in what_to_extract
+     - Handles complex page structures intelligently
+     - Returns clean, structured content
+     - Saves manual HTML parsing
+
+     **Cost:** ~$0.001 per page (pennies for quality extraction)
+     **When to use false:** Only for debugging or raw HTML needs
+
+     **Default:** false (but you should set it to true!)
+
+   what_to_extract: |
+     **Extraction prompt for AI (REQUIRED when use_llm=true)**
+
+     **FORMULA:** Extract [target1] | [target2] | [target3] with focus on [aspect1], [aspect2]
+
+     **REQUIREMENTS:**
+     - Minimum 50 characters (be detailed!)
+     - Minimum 3 extraction targets separated by `|`
+     - Include "with focus on" for prioritization
+     - Be SPECIFIC about what you want
+
+     **EXAMPLES:**
+
+     ✅ GOOD: "Extract pricing tiers | plan features | API rate limits | enterprise options |
+     integration capabilities | user testimonials | technical requirements | performance benchmarks
+     with focus on enterprise features, API limitations, and real-world performance data"
+
+     ❌ BAD: "get main content" (too vague, no targets, no focus)
+
+     **PRO TIP:** More specific targets = better extraction quality!
+
1003
+ # ============================================================================
+ # WEB SEARCH TOOL
+ # ============================================================================
+
+ - name: web_search
+   category: search
+   capability: search
+   useZodSchema: true
+   zodSchemaRef: "webSearchParamsSchema"
+
+   # Configurable limits
+   limits:
+     min_keywords: 3
+     max_keywords: 100
+     recommended_keywords: 7
+
+   description: |
+ **🔥 BATCH WEB SEARCH - MINIMUM 3 KEYWORDS, RECOMMENDED 5-7 🔥**
+
+ **CRITICAL:** This tool searches up to 100 keywords IN PARALLEL via Google.
+ Using 1-2 keywords = WASTING the tool's parallel search power!
+
+ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+ 📊 **KEYWORD BUDGET** (Use multiple perspectives!)
+ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+
+ - **MINIMUM:** 3 keywords (hard requirement)
+ - **RECOMMENDED:** 5-7 keywords (optimal diversity)
+ - **MAXIMUM:** 100 keywords (comprehensive research)
+
+ **RESULTS:** 10 results per keyword, all searches run in parallel
+ - 3 keywords = 30 total results
+ - 7 keywords = 70 total results
+ - 100 keywords = 1000 total results
+
1038
+ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+ 🎯 **KEYWORD DIVERSITY FORMULA** (Cover different angles!)
+ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+
+ Each keyword should target a DIFFERENT perspective:
+
+ 1. **Direct/Broad:** "React state management"
+ 2. **Specific/Technical:** "React useReducer vs Redux"
+ 3. **Problem-Focused:** "React state management performance issues"
+ 4. **Best Practices:** "React state management best practices 2024"
+ 5. **Comparison:** "React state management libraries comparison"
+ 6. **Tutorial/Guide:** "React state management tutorial"
+ 7. **Advanced:** "React state management patterns large applications"
+
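+ Packed into one call, the seven perspectives above become a single `keywords` array:
+
+ ```json
+ {
+   "keywords": [
+     "React state management",
+     "React useReducer vs Redux",
+     "React state management performance issues",
+     "React state management best practices 2024",
+     "React state management libraries comparison",
+     "React state management tutorial",
+     "React state management patterns large applications"
+   ]
+ }
+ ```
+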
1052
+ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+ 🔧 **SEARCH OPERATORS** (Use these for precision!)
+ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+
+ - `site:domain.com` - Search within specific site
+ - `"exact phrase"` - Match exact phrase
+ - `-exclude` - Exclude term
+ - `filetype:pdf` - Find specific file types
+ - `OR` - Match either term
+
+ **Examples:**
+ - `"React hooks" site:github.com` - React hooks repos on GitHub
+ - `React state management -Redux` - Exclude Redux results
+ - `React tutorial filetype:pdf` - Find React PDF tutorials
+ - `React OR Vue state management` - Compare frameworks
+
1068
+ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+ ❌ **BAD EXAMPLE** (DON'T DO THIS - Wastes parallel search!)
+ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+
+ ```json
+ {
+   "keywords": ["React"]
+ }
+ ```
+
+ **Why this is BAD:**
+ - Only 1 keyword (minimum is 3!)
+ - Too broad/vague
+ - No search operators
+ - No diversity (missing 6 other perspectives)
+ - Misses specific, technical, and comparison results
+
1085
+ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+ ✅ **GOOD EXAMPLE** (DO THIS - Uses parallel search properly!)
+ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+
+ ```json
+ {
+   "keywords": [
+     "React state management best practices",
+     "React useReducer vs Redux 2024",
+     "React Context API performance",
+     "Zustand React state library",
+     "React state management large applications",
+     "\"React state\" site:github.com",
+     "React global state alternatives -Redux"
+   ]
+ }
+ ```
+
+ **Why this is GOOD:**
+ - 7 keywords (optimal range)
+ - Covers multiple perspectives (best practices, comparison, performance, specific library)
+ - Uses search operators (site:, "exact", -exclude)
+ - Includes year for recency (2024)
+ - Targets different aspects (performance, large apps, alternatives)
+ - Each keyword reveals different insights
+ - Will find diverse, high-quality results
+
1112
+ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+ 💡 **KEYWORD CRAFTING STRATEGIES BY USE CASE**
+ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+
+ **Technology Research:**
+ ```json
+ [
+   "PostgreSQL vs MySQL performance 2024",
+   "PostgreSQL best practices production",
+   "\"PostgreSQL\" site:github.com",
+   "PostgreSQL connection pooling",
+   "PostgreSQL performance tuning",
+   "PostgreSQL vs MongoDB use cases",
+   "PostgreSQL replication setup"
+ ]
+ ```
1128
+
+ **Problem Solving:**
+ ```json
+ [
+   "Docker container memory leak debugging",
+   "Docker memory limit not working",
+   "\"Docker OOM\" site:stackoverflow.com",
+   "Docker container resource monitoring",
+   "Docker memory optimization best practices"
+ ]
+ ```
+
+ **Comparison Research:**
+ ```json
+ [
+   "Next.js vs Remix performance",
+   "Next.js 14 vs Remix 2024",
+   "\"Next.js\" OR \"Remix\" benchmarks",
+   "Next.js Remix migration guide",
+   "Next.js vs Remix developer experience"
+ ]
+ ```
+
1151
+ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+ 🚀 **PRO TIPS FOR MAXIMUM EFFECTIVENESS**
+ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+
+ 1. **Use 5-7 keywords minimum** - Each reveals a different perspective
+ 2. **Add year numbers** - "2024", "2025" for recent content
+ 3. **Use search operators** - site:, "exact", -exclude, filetype:
+ 4. **Vary specificity** - Mix broad + specific keywords
+ 5. **Include comparisons** - "vs", "versus", "compared to", "OR"
+ 6. **Target sources** - site:github.com, site:stackoverflow.com
+ 7. **Add context** - "best practices", "tutorial", "production", "performance"
+ 8. **Think parallel** - Each keyword searches independently
+
+ **REMEMBER:** More diverse keywords = better coverage = higher quality results!
+
+ **FOLLOW-UP:** Use `scrape_links` to extract full content from promising URLs!
+
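+ For example, after a search surfaces promising pages, the follow-up scrape (URLs are placeholders - take them from the actual results) could look like:
+
+ ```json
+ {
+   "urls": [
+     "https://example.com/state-management-guide",
+     "https://example.com/zustand-vs-redux",
+     "https://example.com/context-performance"
+   ],
+   "use_llm": true,
+   "what_to_extract": "Extract recommended libraries | performance tradeoffs | migration steps | recommended patterns | common pitfalls with focus on large applications and render performance"
+ }
+ ```
+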
1168
+ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+ 🧠 **ITERATIVE WORKFLOW - THINK → SEARCH → THINK → SCRAPE → THINK → REFINE**
+ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+
+ **CRITICAL:** Use sequential thinking BETWEEN tool calls to refine based on results!
+
+ **WORKFLOW PATTERN:**
+ ```
+ 1. THINK FIRST (1-2 thoughts via sequentialthinking)
+    → Analyze what you need to find
+    → Plan initial keyword strategy
+    → Identify which perspectives to cover
+
+ 2. SEARCH (call web_search with 5-7 keywords)
+    → Execute your planned keywords
+    → Get search results with URLs
+
+ 3. THINK AFTER RESULTS (2-3 thoughts via sequentialthinking)
+    → Evaluate which URLs look most promising
+    → Identify gaps in search coverage
+    → Notice new angles from result snippets
+    → Decide: scrape_links OR search again OR both
+
+ 4. SCRAPE PROMISING URLs (call scrape_links)
+    → Extract full content from 3-10 best URLs
+    → Use AI extraction (use_llm=true)
+    → Get detailed information
+
+ 5. THINK AFTER SCRAPING (2-3 thoughts via sequentialthinking)
+    → Evaluate scraped content
+    → Identify what's still missing
+    → Decide if more searches needed
+
+ 6. REFINE & ITERATE (if gaps remain)
+    → Call web_search AGAIN with refined keywords
+    → Cover gaps discovered from initial results
+    → Scrape new URLs
+    → Think and synthesize
+ ```
+
+ **WHY THIS WORKS:**
+ - Search results reveal new keywords you didn't think of
+ - Scraped content shows what's actually useful vs what looked good
+ - Thinking between tool calls prevents tunnel vision
+ - Iterative refinement = comprehensive coverage
+ - Humans don't search once - neither should you!
+
1215
+ **EXAMPLE ITERATIVE FLOW:**
+ ```
+ Step 1: Think
+ "Need to research PostgreSQL vs MySQL. Start with performance,
+ scalability, and production use cases."
+
+ Step 2: web_search (5 keywords)
+ ["PostgreSQL vs MySQL performance 2024",
+ "PostgreSQL scalability production",
+ "PostgreSQL vs MySQL benchmarks",
+ "PostgreSQL production best practices",
+ "MySQL vs PostgreSQL use cases"]
+
+ Step 3: Think (evaluate results)
+ "Results mention connection pooling, replication, and specific
+ benchmarks. I see URLs from official docs, DigitalOcean, and
+ Percona. Should scrape these for details. Also noticed 'PgBouncer'
+ mentioned - should search for that specifically."
+
+ Step 4: scrape_links (5 URLs from results)
+ [Scrape PostgreSQL docs, MySQL docs, benchmark articles]
+
+ Step 5: Think (evaluate scraped content)
+ "Got good info on connection pooling and replication. But missing:
+ - Specific PgBouncer vs MySQL connection pooling comparison
+ - Real-world migration experiences
+ - Cost implications at scale
+ Need to search again with these angles."
+
+ Step 6: web_search AGAIN (5 refined keywords)
+ ["PgBouncer vs MySQL connection pooling",
+ "PostgreSQL vs MySQL migration experience",
+ "PostgreSQL vs MySQL cost at scale",
+ "PostgreSQL vs MySQL production stories",
+ "PostgreSQL MySQL real-world comparison"]
+
+ Step 7: scrape_links (new URLs)
+ [Scrape migration stories, cost analyses]
+
+ Step 8: Think (final synthesis)
+ "Now have complete picture: performance, scalability, migration
+ experiences, costs. Can make informed recommendation."
+ ```
+
1259
+ **KEY INSIGHT:** Results are feedback! Use them to discover better searches!
+
+ **MANDATORY WORKFLOW:**
+ ```
+ web_search → sequentialthinking (2-3 thoughts) →
+ scrape_links (MUST scrape promising URLs) →
+ sequentialthinking (evaluate) →
+ OPTIONAL: web_search again if gaps found →
+ sequentialthinking → final synthesis
+ ```
+
+ **REMEMBER:**
+ - ALWAYS scrape after web_search (that's where the real content is!)
+ - ALWAYS think between tool calls (evaluate and refine!)
+ - DON'T stop after one search (iterate based on learnings!)
+
1275
+ schemaDescriptions:
+   keywords: |
+     **Array of search keywords (MINIMUM 3, RECOMMENDED 5-7, MAX 100)**
+
+     Each keyword runs as a separate Google search in parallel.
+
+     **VALIDATION:**
+     - Minimum 3 keywords required
+     - Maximum 100 keywords allowed
+     - Each keyword should target a different angle
+
+     **DIVERSITY CHECKLIST:**
+     ✓ Includes broad keyword
+     ✓ Includes specific/technical keyword
+     ✓ Includes comparison keyword (vs, OR)
+     ✓ Includes best practices keyword
+     ✓ Includes year-specific keyword (2024, 2025)
+     ✓ Uses search operators (site:, "exact", -exclude)
+     ✓ Targets specific sources (GitHub, Stack Overflow, docs)
+
+     **SEARCH OPERATORS:**
+     - `site:domain.com` - Search within site
+     - `"exact phrase"` - Match exact phrase
+     - `-exclude` - Exclude term
+     - `filetype:pdf` - Find file type
+     - `OR` - Match either term
+
+     **EXAMPLES:**
+
+     ✅ GOOD: ["React hooks best practices", "React useEffect vs useLayoutEffect",
+     "\"React hooks\" site:github.com", "React hooks performance 2024",
+     "React custom hooks patterns", "React hooks -class components"]
+
+     ❌ BAD: ["React"] (too vague, only 1 keyword, no diversity)