doc-fetch-cli 2.0.7 → 2.0.9
- package/README.md +55 -15
- package/doc-fetch +0 -0
- package/package.json +1 -1
package/README.md
CHANGED
@@ -81,10 +81,53 @@ doc-fetch \
 | `--url` | `-u` | Base URL to fetch documentation from | **Required** |
 | `--output` | `-o` | Output file path | `docs.md` |
 | `--depth` | `-d` | Maximum crawl depth | `2` |
-| `--concurrent` | `-c` | Number of concurrent fetchers | `
+| `--concurrent` | `-c` | Number of concurrent fetchers | `5` |
 | `--llm-txt` | | Generate AI-friendly llm.txt index | `false` |
 | `--user-agent` | | Custom user agent string | `DocFetch/1.0` |
 
+## ⚡ Advanced Tips for Large Documentation Sites
+
+### Faster Scraping for Large Sites (1000+ pages)
+
+```bash
+# Increase concurrency for faster crawling
+doc-fetch --url https://docs.example.com --output docs.md --concurrent 15
+
+# Reduce depth if you only need top-level pages
+doc-fetch --url https://docs.example.com --output docs.md --depth 2 --concurrent 20
+
+# For massive sites, use multiple passes with different starting URLs
+doc-fetch --url https://docs.example.com/guide --output guide.md --depth 3 --concurrent 10
+doc-fetch --url https://docs.example.com/api --output api.md --depth 3 --concurrent 10
+```
+
+### Recommended Settings by Site Size
+
+| Site Size | Pages | Concurrency | Depth | Time Estimate |
+|-----------|-------|-------------|-------|---------------|
+| Small | <100 | 5 | 3 | ~30 seconds |
+| Medium | 100-500 | 10 | 3 | ~2 minutes |
+| Large | 500-2000 | 15 | 4 | ~5-10 minutes |
+| Very Large | 2000+ | 20 | 4 | ~15-30 minutes |
+
+### Troubleshooting
+
+**"Queue full" warnings**: Increase buffer size by using higher concurrency (`--concurrent 15`)
+
+**Slow initial crawl**: Normal - speed increases as more workers find pages
+
+**Missing pages**: Increase depth (`--depth 4`) or start from multiple entry points
+
+**Rate limiting**: Add delay between requests or reduce concurrency
+
+### Best Practices
+
+1. **Start with conservative settings** (`--concurrent 5`, `--depth 2`)
+2. **Monitor output** for missing sections
+3. **Adjust based on site structure** (some sites have deeper nav trees)
+4. **Use --llm-txt** for AI agent consumption (generates link index)
+5. **Respect robots.txt** - DocFetch honors it automatically
+
 ## 📁 Output Files
 
 When using `--llm-txt`, DocFetch generates two files:
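Reviewer note: the "Recommended Settings by Site Size" table added in this hunk can be read as a simple lookup. The Python sketch below is not part of doc-fetch; it only encodes the table and assembles a command from flags the README documents (`--url`, `--output`, `--depth`, `--concurrent`). The names `recommend_settings` and `build_command` are hypothetical, and sending the overlapping boundary values (500 and 2000 pages) to the larger tier is an assumption.

```python
import shlex

# (exclusive upper bound on pages, concurrency, depth), copied from the README table.
# The table lists 500 and 2000 in two rows each; this sketch assumes those
# boundary values belong to the larger tier.
_TIERS = [
    (100, 5, 3),    # Small: <100 pages
    (500, 10, 3),   # Medium: 100-500 pages
    (2000, 15, 4),  # Large: 500-2000 pages
]
_VERY_LARGE = (20, 4)  # Very Large: 2000+ pages


def recommend_settings(pages: int) -> tuple[int, int]:
    """Return (concurrency, depth) for an estimated page count."""
    if pages < 0:
        raise ValueError("pages must be non-negative")
    for upper, concurrency, depth in _TIERS:
        if pages < upper:
            return concurrency, depth
    return _VERY_LARGE


def build_command(url: str, output: str, pages: int) -> str:
    """Build a doc-fetch command line using only flags the README documents."""
    concurrency, depth = recommend_settings(pages)
    args = [
        "doc-fetch", "--url", url, "--output", output,
        "--depth", str(depth), "--concurrent", str(concurrency),
    ]
    return shlex.join(args)
```

Calling `build_command("https://docs.example.com", "docs.md", 1200)` selects the "Large" row. The README's own best-practice advice still applies: start conservatively and raise the values only if the site tolerates it.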
@@ -108,23 +151,20 @@ This guide covers installation, setup, and first program...
 Complete Go language specification and syntax...
 ```
 
-### `docs.llm.txt` -
+### `docs.llm.txt` - Link Index (v2.0.7+ Format)
 ```txt
-# llm.txt
-
-
-https://golang.org/doc/install
-
-
-
-https://golang.org/
-Complete Go language specification and syntax.
-
-[API] net/http
-https://pkg.go.dev/net/http
-HTTP client/server implementation.
+# llm.txt
+# Link index with descriptions
+
+Getting Started: https://golang.org/doc/install
+Language Specification: https://golang.org/ref/spec
+net/http Package: https://pkg.go.dev/net/http
+Installing Go: https://golang.org/doc/install
+Writing Your First Program: https://golang.org/doc/tutorial/create-module
 ```
 
+**NEW in v2.0.7**: Simplified format extracts link text + URL for easy AI parsing. No more verbose descriptions - just clean "Description: URL" format.
+
 ## 🌟 Real-World Examples
 
 ### Fetch Go Documentation
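Reviewer note: the new `docs.llm.txt` sample in this hunk uses one "Description: URL" entry per line, with `#` lines at the top. A consumer could read that format roughly as follows. This Python sketch assumes the file always matches the README sample (lines starting with `#` are comments, URLs begin with `http://` or `https://`); the function name `parse_llm_txt` is hypothetical, and the output of a real doc-fetch run may differ.

```python
import re

# One entry per line: "<link text>: <URL>". Lines starting with "#" are
# treated as comments, following the sample in the README (an assumption).
_ENTRY = re.compile(r"^(?P<title>.*?):\s+(?P<url>https?://\S+)$")


def parse_llm_txt(text: str) -> list[tuple[str, str]]:
    """Return (title, url) pairs, skipping blanks, comments, and unmatched lines."""
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        match = _ENTRY.match(line)
        if match:
            entries.append((match["title"].strip(), match["url"]))
    return entries


sample = """\
# llm.txt
# Link index with descriptions

Getting Started: https://golang.org/doc/install
Language Specification: https://golang.org/ref/spec
net/http Package: https://pkg.go.dev/net/http
"""

for title, url in parse_llm_txt(sample):
    print(f"{title} -> {url}")
```

The regular expression splits at the last colon-plus-whitespace that precedes the URL, so a link text that itself contains a colon stays intact.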
package/doc-fetch
CHANGED
Binary file
package/package.json
CHANGED