@seanyao/roll 0.5.0
- package/README.md +201 -0
- package/bin/roll +1375 -0
- package/conventions/config.yaml +15 -0
- package/conventions/global/.cursor-rules +31 -0
- package/conventions/global/AGENTS.md +100 -0
- package/conventions/global/CLAUDE.md +32 -0
- package/conventions/global/GEMINI.md +28 -0
- package/conventions/templates/backend-service/.cursor-rules +17 -0
- package/conventions/templates/backend-service/AGENTS.md +88 -0
- package/conventions/templates/backend-service/CLAUDE.md +18 -0
- package/conventions/templates/backend-service/GEMINI.md +16 -0
- package/conventions/templates/cli/.cursor-rules +17 -0
- package/conventions/templates/cli/AGENTS.md +66 -0
- package/conventions/templates/cli/CLAUDE.md +18 -0
- package/conventions/templates/cli/GEMINI.md +16 -0
- package/conventions/templates/frontend-only/.cursor-rules +16 -0
- package/conventions/templates/frontend-only/AGENTS.md +71 -0
- package/conventions/templates/frontend-only/CLAUDE.md +16 -0
- package/conventions/templates/frontend-only/GEMINI.md +14 -0
- package/conventions/templates/fullstack/.cursor-rules +17 -0
- package/conventions/templates/fullstack/AGENTS.md +87 -0
- package/conventions/templates/fullstack/CLAUDE.md +17 -0
- package/conventions/templates/fullstack/GEMINI.md +15 -0
- package/package.json +33 -0
- package/skills/roll-.changelog/SKILL.md +79 -0
- package/skills/roll-.clarify/SKILL.md +59 -0
- package/skills/roll-.echo/SKILL.md +113 -0
- package/skills/roll-.qa/SKILL.md +204 -0
- package/skills/roll-.review/SKILL.md +105 -0
- package/skills/roll-build/SKILL.md +559 -0
- package/skills/roll-debug/SKILL.md +428 -0
- package/skills/roll-design/ENGINEERING_CHECKLIST.md +256 -0
- package/skills/roll-design/SKILL.md +276 -0
- package/skills/roll-fix/SKILL.md +442 -0
- package/skills/roll-jot/SKILL.md +50 -0
- package/skills/roll-research/SKILL.md +307 -0
- package/skills/roll-research/references/schema.json +162 -0
- package/skills/roll-research/scripts/md_to_pdf.py +289 -0
- package/skills/roll-sentinel/SKILL.md +355 -0
- package/skills/roll-spar/SKILL.md +287 -0
- package/template/.env.example +47 -0
- package/template/.github/workflows/ci.yml +32 -0
- package/template/.github/workflows/sentinel.yml +26 -0
- package/template/AGENTS.md +80 -0
- package/template/BACKLOG.md +42 -0
- package/template/package.json +43 -0
- package/tools/roll-fetch/SKILL.md +182 -0
- package/tools/roll-fetch/package.json +15 -0
- package/tools/roll-fetch/smart-web-fetch.js +558 -0
- package/tools/roll-probe/SKILL.md +84 -0
@@ -0,0 +1,182 @@
---
hidden: true
name: roll-fetch
description: Web page fetching and crawling for AI agents. Extract content from URLs for research, documentation, and competitive analysis.
---

# Roll Fetch - Web Content Extraction

Extract content from web pages for research and analysis.

## When to Use

- Product research (competitor analysis)
- Technical documentation gathering
- Code examples and best practices
- Full site crawling for backup/analysis

## Environment Setup

Configure API keys per machine:

```bash
# Required for Tavily
export TAVILY_API_KEY=tvly-dev-...

# Optional for cloud browser fallback
export BROWSER_USE_API_KEY=bu-...
```

Or create a `.env` file in the project root:
```
TAVILY_API_KEY=tvly-dev-...
BROWSER_USE_API_KEY=bu-...
```

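As a rough illustration of the two setup options, the Python sketch below looks for a key in the process environment first and falls back to a `.env` file. The helper names `parse_dotenv` and `resolve_key` are hypothetical, and this is not necessarily how `smart-web-fetch.js` loads its configuration; it only handles plain `KEY=value` lines.

```python
import os
from pathlib import Path


def parse_dotenv(text):
    """Parse simple KEY=value lines; skip blanks and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def resolve_key(name, environ=None, dotenv_path=".env"):
    """Return the key from the environment, else from .env, else None."""
    environ = os.environ if environ is None else environ
    if environ.get(name):
        return environ[name]
    path = Path(dotenv_path)
    if path.is_file():
        return parse_dotenv(path.read_text(encoding="utf-8")).get(name)
    return None
```

Calling `resolve_key("TAVILY_API_KEY")` before choosing a method lets a caller skip Tavily cleanly when no key is configured.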
## Methods

### 1. Tavily API (Recommended)

Best-quality extraction; requires `TAVILY_API_KEY`.

```bash
# Call the Tavily extract API with curl
curl -X POST https://api.tavily.com/extract \
  -H "Content-Type: application/json" \
  -d '{
    "urls": ["https://example.com"],
    "api_key": "your_tavily_api_key"
  }'
```

**Pros**: AI-optimized extraction, handles complex layouts
**Cons**: Requires API key, rate limited

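For callers who prefer Python over curl, the same request can be sketched with the standard library. The endpoint and body fields are copied from the curl example; the helper name `build_extract_request` is hypothetical, and whether the key belongs in the body or in a header may depend on the Tavily API version you target. The request is built but not sent, so it can be inspected first.

```python
import json
import urllib.request


def build_extract_request(urls, api_key):
    """Build (but do not send) the POST request shown in the curl example."""
    body = json.dumps({"urls": list(urls), "api_key": api_key}).encode("utf-8")
    return urllib.request.Request(
        "https://api.tavily.com/extract",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_extract_request(["https://example.com"], "your_tavily_api_key")
# To actually send it: urllib.request.urlopen(request, timeout=30)
```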
### 2. LLM Native Fetch (Default)

Use your built-in URL fetching capability directly.

**When to use**: When Tavily is unavailable or for quick checks.

**Note**: Most modern AI agents (Kimi, Codex, Claude) have native URL fetching. Use the `FetchURL` tool or an equivalent.

### 3. Browser Automation (Fallback)

Local browser automation for stubborn pages using **[browser-use](https://github.com/browser-use/browser-use)**.

**How to Choose:**

| If | Then Use | Why |
|----|---------|-----|
| `BROWSER_USE_API_KEY` in env | **Cloud** | Managed browsers, less setup |
| No API key, but `browser-use` installed | **Local** | Free, no external dependency |
| Neither | Skip to manual extraction | Tell user "Need browser automation setup" |

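The decision table can be expressed as a small function. This is a minimal sketch: `choose_browser_mode` is a hypothetical helper, and it assumes "installed" means the `browser_use` Python package is importable.

```python
import importlib.util
import os


def choose_browser_mode(environ=None, local_installed=None):
    """Return 'cloud', 'local', or 'manual' following the decision table."""
    environ = os.environ if environ is None else environ
    if local_installed is None:
        # Detect the package without importing it.
        local_installed = importlib.util.find_spec("browser_use") is not None
    if environ.get("BROWSER_USE_API_KEY"):
        return "cloud"
    if local_installed:
        return "local"
    # Caller should tell the user: "Need browser automation setup"
    return "manual"
```

Passing `environ` and `local_installed` explicitly keeps the function easy to test; with no arguments it inspects the real environment.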
**Option A: Local (Free, No API Key)**
```python
from browser_use import Agent, Browser, BrowserConfig
import asyncio

async def fetch_page(url):
    # Pure local, no API key needed
    browser = Browser(config=BrowserConfig(headless=True))
    await browser.start()
    page = await browser.get_current_page()
    await page.goto(url)
    content = await page.content()
    await browser.stop()
    return content

# Run
content = asyncio.run(fetch_page("https://example.com"))
```

**Option B: Cloud API**
```python
import asyncio

from browser_use import Agent

async def extract_markdown(url):
    agent = Agent(
        task=f"Extract the main content from {url} and return as markdown",
        # Placeholder from the original (or openai, anthropic); browser-use
        # typically expects a chat-model object here, so check your version.
        llm="moonshot",
    )
    return await agent.run()

result = asyncio.run(extract_markdown("https://example.com"))
```

**Setup** (Local):
```bash
pip install browser-use
playwright install chromium
```

## Usage

### CLI Usage (via smart-web-fetch.js)

```bash
# Auto mode (Tavily → Native → Browser)
node smart-web-fetch.js fetch https://example.com

# Explicit method
node smart-web-fetch.js fetch https://example.com tavily
node smart-web-fetch.js fetch https://example.com native
node smart-web-fetch.js fetch https://example.com browser

# Search
node smart-web-fetch.js search "Python async" 5
```

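The auto-mode order (Tavily → Native → Browser) amounts to a fallback chain. The Python sketch below shows one way such a chain could work; it is not taken from `smart-web-fetch.js`, and `fetch_with_fallback` plus the two stub fetchers are hypothetical stand-ins.

```python
def fetch_with_fallback(url, fetchers, is_acceptable=lambda text: bool(text)):
    """Try each (name, fetch) pair in order; return the first acceptable result."""
    errors = []
    for name, fetch in fetchers:
        try:
            content = fetch(url)
        except Exception as exc:  # a failing method should not stop the chain
            errors.append(f"{name}: {exc}")
            continue
        if is_acceptable(content):
            return {"method": name, "content": content}
        errors.append(f"{name}: content rejected")
    raise RuntimeError("all methods failed: " + "; ".join(errors))


# Stand-in fetchers for demonstration; real ones would call Tavily and so on.
def tavily_stub(url):
    raise ConnectionError("no API key")


def native_stub(url):
    return f"# Page from {url}"


result = fetch_with_fallback(
    "https://example.com",
    [("tavily", tavily_stub), ("native", native_stub)],
)
```

Here the Tavily stub fails, so `result["method"]` is `"native"`. Collecting the per-method errors makes the final failure message useful when every method is exhausted.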
### Programmatic Usage

```javascript
const { smartFetch, smartSearch } = require('./smart-web-fetch.js');

// Top-level await is not available in CommonJS, so wrap the calls
async function main() {
  // Fetch a page
  const result = await smartFetch('https://example.com');
  console.log(result.content);

  // Search
  const searchResult = await smartSearch('OpenAI GPT-5', 5);
  console.log(searchResult.results);
}

main().catch(console.error);
```

### Single Page Fetch

```
User: "Fetch https://docs.example.com/api"
→ Use smart-web-fetch.js with auto mode
→ Return clean markdown content
```

### Full Site Crawl

```
User: "Crawl https://docs.example.com"
→ Use smart-web-fetch.js recursively
→ Extract all internal links
→ Recursively fetch up to max depth (default: 2)
→ Save each page as separate markdown file
```

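The crawl steps can be sketched as a breadth-first traversal. This Python example is an illustration under stated assumptions: "internal" is taken to mean "same host", the start page counts as depth 0, and `crawl` with its injected `fetch_html` callable are hypothetical names. It returns pages in a dict rather than writing files.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.hrefs.extend(v for k, v in attrs if k == "href" and v)


def crawl(start_url, fetch_html, max_depth=2):
    """Breadth-first crawl of same-host pages, up to max_depth link hops."""
    host = urlparse(start_url).netloc
    pages = {}
    queue = deque([(start_url, 0)])
    seen = {start_url}
    while queue:
        url, depth = queue.popleft()
        html = fetch_html(url)
        pages[url] = html
        if depth >= max_depth:
            continue  # do not follow links beyond the depth limit
        collector = LinkCollector()
        collector.feed(html)
        for href in collector.hrefs:
            # Resolve relative links and drop #fragments before comparing.
            link = urldefrag(urljoin(url, href)).url
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return pages
```

The `seen` set prevents refetching pages that link to each other, and injecting `fetch_html` lets any of the fetch methods serve as the page source.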
## Output Format

Always return clean Markdown:
- Extract main content only (remove nav, ads, footers)
- Preserve code blocks and tables
- Include source URL as header

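To make the first and last rules concrete, here is a minimal Python sketch that drops text nested in navigation-like tags and prefixes the result with the source URL. The tag list, the `# Source:` header layout, and the names `MainTextExtractor` and `to_clean_output` are assumptions for illustration. It does not detect ads or convert code blocks and tables to Markdown.

```python
from html.parser import HTMLParser

# Assumed set of non-content containers; adjust per site.
SKIPPED_TAGS = {"nav", "footer", "aside", "script", "style"}


class MainTextExtractor(HTMLParser):
    """Collect text that is not nested inside any skipped tag."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIPPED_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def to_clean_output(url, html):
    """Return the kept text as paragraphs under a source-URL header."""
    parser = MainTextExtractor()
    parser.feed(html)
    parser.close()
    return f"# Source: {url}\n\n" + "\n\n".join(parser.chunks) + "\n"
```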
## Quality Check

Validate extracted content:
- Min length: 500 chars (reject if shorter)
- Check for captcha/error messages
- Verify main content structure (headings, paragraphs)

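The three checks could be implemented along these lines. This is a sketch, not the validation in `smart-web-fetch.js`: the 500-character minimum comes from the list, while the marker phrases and the function name `check_quality` are hypothetical.

```python
import re

MIN_LENGTH = 500  # from the checklist
# Hypothetical marker phrases; tune for the sites you fetch.
BLOCK_MARKERS = ("captcha", "access denied", "verify you are human")


def check_quality(markdown):
    """Return a list of problems; an empty list means the content passed."""
    problems = []
    text = markdown.strip()
    if len(text) < MIN_LENGTH:
        problems.append(f"too short: {len(text)} chars")
    lowered = text.lower()
    for marker in BLOCK_MARKERS:
        if marker in lowered:
            problems.append(f"possible block or error page: {marker!r}")
    # Structure check: at least one Markdown heading and one non-heading block.
    has_heading = re.search(r"^#{1,6}\s+\S", text, flags=re.MULTILINE) is not None
    paragraphs = [
        p for p in re.split(r"\n\s*\n", text) if not p.lstrip().startswith("#")
    ]
    if not has_heading or not paragraphs:
        problems.append("missing headings or paragraphs")
    return problems
```

Plain substring matching will also flag a legitimate article that discusses captchas, so a caller may want to treat marker hits as a reason to retry with another method rather than as a hard rejection.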
## Examples

| Task | Method | Command |
|------|--------|---------|
| Quick article | Auto | `node smart-web-fetch.js fetch https://blog.example.com` |
| API docs | Tavily | `node smart-web-fetch.js fetch https://docs.example.com tavily` |
| SPA site | Browser | `node smart-web-fetch.js fetch https://spa.example.com browser` |
| Search | Tavily | `node smart-web-fetch.js search "Python async" 5` |
| Fallback test | Native | `node smart-web-fetch.js fetch https://example.com native` |

@@ -0,0 +1,15 @@
{
  "name": "smart-web-fetch",
  "version": "1.0.0",
  "description": "Intelligent web fetching with automatic Tavily → Scrapling fallback",
  "main": "smart-web-fetch.js",
  "bin": {
    "smart-web-fetch": "./smart-web-fetch.js"
  },
  "scripts": {
    "test": "node smart-web-fetch.js fetch https://example.com"
  },
  "keywords": ["web-scraping", "tavily", "scrapling", "fallback"],
  "author": "R0_lobster",
  "license": "MIT"
}