mcp-sequential-research 1.0.0 → 1.1.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +7 -6
- package/docs/MCP_GUIDANCE.md +54 -58
- package/package.json +1 -1
package/README.md
CHANGED

@@ -229,12 +229,13 @@ This format is designed for downstream claim-mining tools.
 
 Works with other MCP servers:
 
-| Source Type | Recommended MCP |
-|-------------|-----------------|
-| Patents | Google Patents MCP |
-| Web | Google Search MCP |
-
-
+| Source Type | Recommended MCP | Tool |
+|-------------|-----------------|------|
+| Patents | Google Patents MCP | `search_patents` |
+| Web Search | Google Search MCP | `google_search` |
+| Web Scraping | Google Search MCP | `read_webpage` |
+| Memory | Memory MCP | `search_nodes` |
+| Academic | Semantic Scholar API | — |
 
 ## License
 
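To make the new table concrete, here is a minimal TypeScript sketch of routing each source type to its companion server and tool. This is illustrative only: `callTool` is a hypothetical stand-in for whatever MCP client API the host exposes, and the route keys and the `"memory"` server name are assumptions, not names this package defines.

```ts
// Hypothetical dispatch sketch for the companion MCPs in the table above.
// `callTool` stands in for the host's MCP client API (an assumption).
declare function callTool(
  server: string,
  tool: string,
  args: Record<string, unknown>
): Promise<unknown>;

// Server/tool pairs are taken from the table; the keys are illustrative.
const ROUTES = {
  patent:     { server: "google-patents", tool: "search_patents" },
  web_search: { server: "google-search",  tool: "google_search" },
  web_scrape: { server: "google-search",  tool: "read_webpage" },
  memory:     { server: "memory",         tool: "search_nodes" },
} as const;

async function route(kind: keyof typeof ROUTES, args: Record<string, unknown>) {
  const { server, tool } = ROUTES[kind];
  return callTool(server, tool, args);
}
```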
package/docs/MCP_GUIDANCE.md
CHANGED

@@ -32,14 +32,13 @@ This is the exact **operator loop** Claude Code follows for comprehensive research:
 │    └─→ sequential-research:sequential_research_plan(prompt, constraints)  │
 │                              ↓                                            │
 │ 2. WEB QUERIES (for each plan.queries where query_family == "web")        │
-│    └─→ google-search:
+│    └─→ google-search:google_search({query, num: 10})                      │
 │        Returns: {title, link, snippet}[]                                  │
 │                              ↓                                            │
 │ 3. SCRAPE WEB CONTENT                                                     │
 │    ├─→ Collect top URLs (dedupe)                                          │
-│
-│
-│        Returns: {markdown, metadata}                                      │
+│    └─→ For each URL: google-search:read_webpage({url})                    │
+│        Returns: {title, text, url}                                        │
 │                              ↓                                            │
 │ 4. PATENT QUERIES (for each plan.queries where query_family == "patent")  │
 │    └─→ google-patents:search_patents({query, num_results})                │
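Read as code, the loop in the diagram might look roughly like this. This is a sketch, not the package's implementation: `callTool` is a hypothetical MCP client helper, and the shape of `plan.queries` (a `query` string plus `query_family`) is inferred from the diagram.

```ts
// Sketch only; `callTool` stands in for the host's MCP client (an assumption).
// Response shapes follow the diagram: {title, link, snippet}[] and {title, text, url}.
declare function callTool(server: string, tool: string, args: object): Promise<any>;

async function runResearch(prompt: string, constraints: object) {
  // 1. Generate the structured query plan.
  const plan = await callTool("sequential-research", "sequential_research_plan", { prompt, constraints });

  const urls = new Set<string>();   // dedupe before scraping
  const patentResults: any[] = [];

  for (const q of plan.queries) {
    if (q.query_family === "web") {
      // 2. Web search: returns {title, link, snippet}[].
      const hits = await callTool("google-search", "google_search", { query: q.query, num: 10 });
      for (const hit of hits) urls.add(hit.link);
    } else if (q.query_family === "patent") {
      // 4. Patent search.
      patentResults.push(await callTool("google-patents", "search_patents", { query: q.query, num_results: 10 }));
    }
  }

  // 3. Scrape each unique URL: returns {title, text, url}.
  const pages = await Promise.all(
    [...urls].map((url) => callTool("google-search", "read_webpage", { url }))
  );

  return { plan, pages, patentResults }; // feeds steps 5-6 (normalize, compile)
}
```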
@@ -89,7 +88,7 @@ For each query where `query_family == "web"`:
 
 ```json
 {
-  "tool": "google-search:
+  "tool": "google-search:google_search",
   "arguments": {
     "query": "photonic computing silicon photonics site:.edu OR filetype:pdf",
     "num": 10
@@ -110,36 +109,16 @@ For each query where `query_family == "web"`:
 }
 ```
 
|
-
### Step 3: Scrape Web Content via
|
|
112
|
+
### Step 3: Scrape Web Content via Google Search MCP `read_webpage`
|
|
114
113
|
|
|
115
|
-
After collecting search results, extract full content using
|
|
114
|
+
After collecting search results, extract full content using `read_webpage`.
|
|
116
115
|
|
|
117
|
-
**For
|
|
116
|
+
**For each URL:**
|
|
118
117
|
```json
|
|
119
118
|
{
|
|
120
|
-
"tool": "
|
|
119
|
+
"tool": "google-search:read_webpage",
|
|
121
120
|
"arguments": {
|
|
122
|
-
"
|
|
123
|
-
"https://example.mit.edu/photonics.pdf",
|
|
124
|
-
"https://lightmatter.co/technology",
|
|
125
|
-
"https://ieee.org/article/photonic-computing"
|
|
126
|
-
],
|
|
127
|
-
"options": {
|
|
128
|
-
"formats": ["markdown"],
|
|
129
|
-
"onlyMainContent": true
|
|
130
|
-
}
|
|
131
|
-
}
|
|
132
|
-
}
|
|
133
|
-
```
|
|
134
|
-
|
|
135
|
-
**For a single URL:**
|
|
136
|
-
```json
|
|
137
|
-
{
|
|
138
|
-
"tool": "firecrawl:firecrawl_scrape",
|
|
139
|
-
"arguments": {
|
|
140
|
-
"url": "https://example.mit.edu/photonics.pdf",
|
|
141
|
-
"formats": ["markdown"],
|
|
142
|
-
"onlyMainContent": true
|
|
121
|
+
"url": "https://example.mit.edu/photonics.pdf"
|
|
143
122
|
}
|
|
144
123
|
}
|
|
145
124
|
```
|
|
@@ -147,17 +126,14 @@ After collecting search results, extract full content using Firecrawl.
 
 **Response format:**
 ```json
 {
-  "
-  "
-
-    "metadata": {
-      "title": "Silicon Photonics for AI",
-      "sourceURL": "https://example.mit.edu/photonics.pdf"
-    }
-  }
+  "title": "Silicon Photonics for AI - MIT",
+  "text": "# Silicon Photonics for AI\n\nRecent advances in silicon photonics have enabled...",
+  "url": "https://example.mit.edu/photonics.pdf"
 }
 ```
 
+**Note:** Call `read_webpage` for each URL sequentially or in parallel. The tool automatically converts HTML to readable text and handles most page types.
+
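As a sketch of the sequential variant mentioned in the note, with per-URL error capture so one blocked page does not abort the batch. `callTool` is again a hypothetical MCP client helper, and the fallback entry mirrors the "Handling Scrape Errors" section later in this guide.

```ts
// Sequential scrape sketch; `callTool` is a hypothetical MCP client helper.
declare function callTool(server: string, tool: string, args: object): Promise<any>;

async function scrapeSequentially(urls: string[]) {
  const pages: Array<{ title: string; text: string; url: string }> = [];
  for (const url of urls) {
    try {
      // Returns {title, text, url} on success.
      pages.push(await callTool("google-search", "read_webpage", { url }));
    } catch {
      // Fallback entry, per the error-handling section below.
      pages.push({
        title: `${url} (scrape failed)`,
        text: "[Content unavailable - page could not be fetched]",
        url,
      });
    }
  }
  return pages;
}
```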
 ### Step 4: Execute Patent Queries via Google Patents MCP
 
 For each query where `query_family == "patent"`:
@@ -234,7 +210,7 @@ Transform all responses into the standard schema with sequential source IDs:
 2. Deduplicate URLs before assigning IDs
 3. Patents get `source_type: "patent"` with extra fields
 4. Web content gets `source_type: "web"`
-5.
+5. The `text` from `read_webpage` goes in `excerpt` (truncated if needed)
 
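A sketch of what those rules might look like in code. The `"S1", "S2", ...` source-ID format and the 2000-character excerpt cap are illustrative assumptions; the field names follow the `raw_results` examples elsewhere in this guide.

```ts
// Sketch of step 5 normalization under the rules above.
interface Page { title: string; text: string; url: string; }

function normalizeWebPages(pages: Page[]) {
  const seen = new Set<string>();
  return pages
    .filter((p) => !seen.has(p.url) && !!seen.add(p.url)) // rule 2: dedupe before assigning IDs
    .map((p, i) => ({
      source_id: `S${i + 1}`,         // sequential IDs (format assumed)
      source_type: "web",             // rule 4
      title: p.title,
      url: p.url,
      excerpt: p.text.slice(0, 2000), // rule 5: read_webpage text, truncated (cap assumed)
    }));
}
```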
 ### Step 6: Call `sequential_research_compile`
 
@@ -280,8 +256,8 @@ await fs.writeFile(`research/${slug}/raw_results.json`, JSON.stringify(rawResult
 | Step | MCP Server | Tool | Purpose |
 |------|-----------|------|---------|
 | 1 | sequential-research | `sequential_research_plan` | Generate structured query plan |
-| 2 | google-search | `
-| 3 |
+| 2 | google-search | `google_search` | Get web search results |
+| 3 | google-search | `read_webpage` | Extract full page content |
 | 4 | google-patents | `search_patents` | Search patent database |
 | 5 | — | — | Normalize to raw_results[] |
 | 6 | sequential-research | `sequential_research_compile` | Generate report with citations |
@@ -289,31 +265,40 @@ await fs.writeFile(`research/${slug}/raw_results.json`, JSON.stringify(rawResult
 
 ---
 
-##
+## Web Scraping with `read_webpage`
+
+The `google-search:read_webpage` tool provides a simple, reliable way to fetch web content without additional MCP server dependencies.
 
-###
+### Usage
 
-
-
-
-
-
-
+```json
+{
+  "tool": "google-search:read_webpage",
+  "arguments": {
+    "url": "https://example.com/article"
+  }
+}
+```
 
-###
+### Response Format
 
 ```json
 {
-  "
-  "
-  "
-  "excludeTags": ["nav", "footer", "aside"],  // Skip navigation
-  "waitFor": 2000,   // Wait for JS rendering (ms)
-  "timeout": 30000   // Request timeout (ms)
+  "title": "Article Title",
+  "text": "The full text content of the page...",
+  "url": "https://example.com/article"
 }
 ```
 
-###
+### Features
+
+- **Automatic HTML to text conversion** — Clean, readable output
+- **No additional setup** — Uses the same google-search MCP server
+- **Handles most page types** — HTML, some PDFs, etc.
+
+### Handling Scrape Errors
+
+If a URL fails to scrape, include it in results with a note:
 
 ```json
 {
@@ -326,13 +311,24 @@ await fs.writeFile(`research/${slug}/raw_results.json`, JSON.stringify(rawResult
       "source_type": "web",
       "title": "Page Title (scrape failed)",
       "url": "https://example.com/blocked",
-      "excerpt": "[Content unavailable -
+      "excerpt": "[Content unavailable - page could not be fetched]"
     }
   ],
   "execution_notes": "1 of 5 URLs failed to scrape"
 }
 ```
 
+### Parallel Execution
+
+You can call `read_webpage` for multiple URLs in parallel to improve throughput:
+
+```
+// Execute these concurrently:
+google-search:read_webpage({url: "https://site1.com/page"})
+google-search:read_webpage({url: "https://site2.com/page"})
+google-search:read_webpage({url: "https://site3.com/page"})
+```
+
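A `Promise.allSettled`-based sketch of that parallel variant, combined with the fallback entry from "Handling Scrape Errors" so a failed URL degrades gracefully instead of rejecting the whole batch (`callTool` remains a hypothetical stand-in for the host's MCP client):

```ts
// Parallel scrape sketch; one rejected promise does not abort the batch.
declare function callTool(server: string, tool: string, args: object): Promise<any>;

async function scrapeInParallel(urls: string[]) {
  const settled = await Promise.allSettled(
    urls.map((url) => callTool("google-search", "read_webpage", { url }))
  );
  return settled.map((result, i) =>
    result.status === "fulfilled"
      ? result.value // {title, text, url}
      : {
          title: `${urls[i]} (scrape failed)`,
          text: "[Content unavailable - page could not be fetched]",
          url: urls[i],
        }
  );
}
```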
 ## Citation Format Requirement
 
 Citations must be **stable** and **machine-parseable** for downstream claim-mining.