doc-fetch-cli 2.0.7 → 2.0.8

Files changed (3)
  1. package/README.md +55 -15
  2. package/doc-fetch +0 -0
  3. package/package.json +1 -1
package/README.md CHANGED
@@ -81,10 +81,53 @@ doc-fetch \
  | `--url` | `-u` | Base URL to fetch documentation from | **Required** |
  | `--output` | `-o` | Output file path | `docs.md` |
  | `--depth` | `-d` | Maximum crawl depth | `2` |
- | `--concurrent` | `-c` | Number of concurrent fetchers | `3` |
+ | `--concurrent` | `-c` | Number of concurrent fetchers | `5` |
  | `--llm-txt` | | Generate AI-friendly llm.txt index | `false` |
  | `--user-agent` | | Custom user agent string | `DocFetch/1.0` |
 
+ ## ⚡ Advanced Tips for Large Documentation Sites
+
+ ### Faster Scraping for Large Sites (1000+ pages)
+
+ ```bash
+ # Increase concurrency for faster crawling
+ doc-fetch --url https://docs.example.com --output docs.md --concurrent 15
+
+ # Reduce depth if you only need top-level pages
+ doc-fetch --url https://docs.example.com --output docs.md --depth 2 --concurrent 20
+
+ # For massive sites, use multiple passes with different starting URLs
+ doc-fetch --url https://docs.example.com/guide --output guide.md --depth 3 --concurrent 10
+ doc-fetch --url https://docs.example.com/api --output api.md --depth 3 --concurrent 10
+ ```
+
+ ### Recommended Settings by Site Size
+
+ | Site Size | Pages | Concurrency | Depth | Time Estimate |
+ |-----------|-------|-------------|-------|---------------|
+ | Small | <100 | 5 | 3 | ~30 seconds |
+ | Medium | 100-500 | 10 | 3 | ~2 minutes |
+ | Large | 500-2000 | 15 | 4 | ~5-10 minutes |
+ | Very Large | 2000+ | 20 | 4 | ~15-30 minutes |
+
+ ### Troubleshooting
+
+ **"Queue full" warnings**: Increase buffer size by using higher concurrency (`--concurrent 15`)
+
+ **Slow initial crawl**: Normal - speed increases as more workers find pages
+
+ **Missing pages**: Increase depth (`--depth 4`) or start from multiple entry points
+
+ **Rate limiting**: Add delay between requests or reduce concurrency
+
+ ### Best Practices
+
+ 1. **Start with conservative settings** (`--concurrent 5`, `--depth 2`)
+ 2. **Monitor output** for missing sections
+ 3. **Adjust based on site structure** (some sites have deeper nav trees)
+ 4. **Use --llm-txt** for AI agent consumption (generates link index)
+ 5. **Respect robots.txt** - DocFetch honors it automatically
+
  ## 📁 Output Files
 
  When using `--llm-txt`, DocFetch generates two files:
@@ -108,23 +151,20 @@ This guide covers installation, setup, and first program...
  Complete Go language specification and syntax...
  ```
 
- ### `docs.llm.txt` - AI-Friendly Index
+ ### `docs.llm.txt` - Link Index (v2.0.7+ Format)
  ```txt
- # llm.txt - AI-friendly documentation index
-
- [GUIDE] Getting Started
- https://golang.org/doc/install
- Covers installation, setup, and first program.
-
- [REFERENCE] Language Specification
- https://golang.org/ref/spec
- Complete Go language specification and syntax.
-
- [API] net/http
- https://pkg.go.dev/net/http
- HTTP client/server implementation.
+ # llm.txt
+ # Link index with descriptions
+
+ Getting Started: https://golang.org/doc/install
+ Language Specification: https://golang.org/ref/spec
+ net/http Package: https://pkg.go.dev/net/http
+ Installing Go: https://golang.org/doc/install
+ Writing Your First Program: https://golang.org/doc/tutorial/create-module
  ```
 
+ **NEW in v2.0.7**: Simplified format extracts link text + URL for easy AI parsing. No more verbose descriptions - just clean "Description: URL" format.
+
  ## 🌟 Real-World Examples
 
  ### Fetch Go Documentation
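
Note on the new `docs.llm.txt` format in the README hunk above: the added lines describe a one-entry-per-line "Description: URL" layout with `#` comment lines. The sketch below shows one way a consumer might read such a file. It is not part of doc-fetch; the names `LinkEntry` and `parse_llm_txt` are hypothetical, and the parsing rules (skip blank and `#` lines, split on the last `": "`) are assumptions inferred only from the sample shown in the diff.

```python
from typing import NamedTuple


class LinkEntry(NamedTuple):
    """One line of the index: link text plus target URL (hypothetical type)."""
    title: str
    url: str


def parse_llm_txt(text: str) -> list[LinkEntry]:
    """Parse 'Title: URL' lines, skipping blank lines and '#' comment lines.

    Assumes the layout shown in the README sample; not an official parser.
    """
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the last ": " so titles that contain colons survive.
        # The "://" inside the URL has no following space, so it is not matched.
        title, sep, url = line.rpartition(": ")
        if not sep or not url.startswith(("http://", "https://")):
            continue  # not a link line in the assumed format
        entries.append(LinkEntry(title.strip(), url.strip()))
    return entries


# Sample copied from the lines added to the README in this diff.
sample = """\
# llm.txt
# Link index with descriptions

Getting Started: https://golang.org/doc/install
Language Specification: https://golang.org/ref/spec
net/http Package: https://pkg.go.dev/net/http
"""

for entry in parse_llm_txt(sample):
    print(f"{entry.title} -> {entry.url}")
```

Splitting from the right keeps the parser tolerant of titles such as `Go: A Tour`, at the cost of assuming that URLs never contain a colon followed by a space.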
package/doc-fetch CHANGED
Binary file
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "doc-fetch-cli",
- "version": "2.0.7",
+ "version": "2.0.8",
  "description": "Dynamic documentation fetching CLI that converts entire documentation sites to single markdown files for AI/LLM consumption",
  "bin": {
  "doc-fetch": "./bin/doc-fetch.js"