md-fetch 1.0.0

Files changed (78)
  1. package/AGENTS.md +212 -0
  2. package/LICENSE +21 -0
  3. package/README.md +449 -0
  4. package/README.zh-CN.md +449 -0
  5. package/dist/cli.d.ts +27 -0
  6. package/dist/cli.d.ts.map +1 -0
  7. package/dist/cli.js +158 -0
  8. package/dist/cli.js.map +1 -0
  9. package/dist/constants.d.ts +9 -0
  10. package/dist/constants.d.ts.map +1 -0
  11. package/dist/constants.js +15 -0
  12. package/dist/constants.js.map +1 -0
  13. package/dist/core/browser.d.ts +23 -0
  14. package/dist/core/browser.d.ts.map +1 -0
  15. package/dist/core/browser.js +125 -0
  16. package/dist/core/browser.js.map +1 -0
  17. package/dist/core/converter.d.ts +18 -0
  18. package/dist/core/converter.d.ts.map +1 -0
  19. package/dist/core/converter.js +74 -0
  20. package/dist/core/converter.js.map +1 -0
  21. package/dist/core/extractor.d.ts +28 -0
  22. package/dist/core/extractor.d.ts.map +1 -0
  23. package/dist/core/extractor.js +151 -0
  24. package/dist/core/extractor.js.map +1 -0
  25. package/dist/core/fetcher.d.ts +24 -0
  26. package/dist/core/fetcher.d.ts.map +1 -0
  27. package/dist/core/fetcher.js +111 -0
  28. package/dist/core/fetcher.js.map +1 -0
  29. package/dist/core/processor.d.ts +22 -0
  30. package/dist/core/processor.d.ts.map +1 -0
  31. package/dist/core/processor.js +104 -0
  32. package/dist/core/processor.js.map +1 -0
  33. package/dist/core/screenshotter.d.ts +31 -0
  34. package/dist/core/screenshotter.d.ts.map +1 -0
  35. package/dist/core/screenshotter.js +222 -0
  36. package/dist/core/screenshotter.js.map +1 -0
  37. package/dist/index.d.ts +3 -0
  38. package/dist/index.d.ts.map +1 -0
  39. package/dist/index.js +14 -0
  40. package/dist/index.js.map +1 -0
  41. package/dist/screen-cli.d.ts +26 -0
  42. package/dist/screen-cli.d.ts.map +1 -0
  43. package/dist/screen-cli.js +196 -0
  44. package/dist/screen-cli.js.map +1 -0
  45. package/dist/screen.d.ts +3 -0
  46. package/dist/screen.d.ts.map +1 -0
  47. package/dist/screen.js +14 -0
  48. package/dist/screen.js.map +1 -0
  49. package/dist/types/index.d.ts +151 -0
  50. package/dist/types/index.d.ts.map +1 -0
  51. package/dist/types/index.js +42 -0
  52. package/dist/types/index.js.map +1 -0
  53. package/dist/utils/filename-sanitizer.d.ts +38 -0
  54. package/dist/utils/filename-sanitizer.d.ts.map +1 -0
  55. package/dist/utils/filename-sanitizer.js +79 -0
  56. package/dist/utils/filename-sanitizer.js.map +1 -0
  57. package/dist/utils/frontmatter.d.ts +6 -0
  58. package/dist/utils/frontmatter.d.ts.map +1 -0
  59. package/dist/utils/frontmatter.js +65 -0
  60. package/dist/utils/frontmatter.js.map +1 -0
  61. package/package.json +56 -0
  62. package/skills/md-fetch/SKILL.md +133 -0
  63. package/skills/md-fetch/references/cli-reference.md +257 -0
  64. package/src/cli.ts +169 -0
  65. package/src/constants.ts +17 -0
  66. package/src/core/browser.ts +161 -0
  67. package/src/core/converter.ts +82 -0
  68. package/src/core/extractor.ts +172 -0
  69. package/src/core/fetcher.ts +143 -0
  70. package/src/core/processor.ts +124 -0
  71. package/src/core/screenshotter.ts +289 -0
  72. package/src/index.ts +15 -0
  73. package/src/screen-cli.ts +216 -0
  74. package/src/screen.ts +15 -0
  75. package/src/types/index.ts +227 -0
  76. package/src/utils/filename-sanitizer.ts +88 -0
  77. package/src/utils/frontmatter.ts +81 -0
  78. package/tsconfig.json +20 -0
package/AGENTS.md ADDED
@@ -0,0 +1,212 @@
1
+ # md-fetch - Web Content Processing CLI Tools
2
+
3
+ ## Project Overview
4
+ A set of Node.js-based CLI tools, comprising:
5
+ 1. **md-fetch** - Converts URL content to clean Markdown
6
+ 2. **md-fetch-screen** - Takes high-quality screenshots of web pages
7
+
8
+ ## Tech Stack
9
+ - **Language**: TypeScript (ES modules)
10
+ - **Runtime**: Node.js ≥18
11
+ - **Package manager**: pnpm
12
+ - **Core dependencies**:
13
+ - `commander`: CLI argument parsing
14
+ - `@mozilla/readability`: content extraction
15
+ - `turndown`: HTML-to-Markdown conversion
16
+ - `puppeteer-core`: headless browser (does not bundle Chrome)
17
+ - `jsdom`: DOM parsing
18
+ - `undici`: proxy support
19
+
20
+ ## Architecture
21
+
22
+ ### md-fetch Core Processing Flow
23
+ ```
24
+ URL input → Processor
25
+
26
+ fetch/browser (fetch HTML)
27
+
28
+ Extractor (extract content + metadata)
29
+
30
+ Converter (convert to Markdown)
31
+
32
+ Generate Frontmatter (generate YAML)
33
+
34
+ Output (stdout/file)
35
+ ```
36
+
37
+ ### md-fetch-screen Screenshot Flow
38
+ ```
39
+ URL input → Screenshotter
40
+
41
+ Browser.launch (launch the browser)
42
+
43
+ Page.setViewport (set viewport + device pixel ratio)
44
+
45
+ Page.goto (navigate to the URL)
46
+
47
+ Hide elements (optional)
48
+
49
+ Delay (optional)
50
+
51
+ Screenshot (full page/viewport/element)
52
+
53
+ Save file (automatic naming)
54
+ ```
55
+
56
+ ### Key Modules
57
+ - **fetcher** (`src/core/fetcher.ts`): performs HTTP requests using native fetch
58
+ - **browser** (`src/core/browser.ts`): Puppeteer integration, used for SPA pages
59
+ - **extractor** (`src/core/extractor.ts`): extracts main content and metadata using readability
60
+ - **converter** (`src/core/converter.ts`): converts HTML to Markdown using turndown
61
+ - **processor** (`src/core/processor.ts`): orchestrates the Markdown conversion pipeline
62
+ - **screenshotter** (`src/core/screenshotter.ts`): core screenshot class; manages the browser and screenshot logic
63
+ - **filename-sanitizer** (`src/utils/filename-sanitizer.ts`): URL sanitization and timestamp generation
64
+ - **cli** (`src/cli.ts`): md-fetch CLI interface and argument parsing
65
+ - **screen-cli** (`src/screen-cli.ts`): md-fetch-screen CLI interface and argument parsing
66
+
67
+ ## Development Commands
68
+ ```bash
69
+ # Install dependencies
70
+ pnpm install
71
+
72
+ # Run in development mode
73
+ pnpm dev <url>
74
+
75
+ # Build
76
+ pnpm build
77
+
78
+ # Test
79
+ pnpm test
80
+ ```
81
+
82
+ ## Project Structure
83
+ ```
84
+ md-fetch/
85
+ ├── src/
86
+ │ ├── index.ts # md-fetch CLI entry point
87
+ │ ├── cli.ts # md-fetch CLI argument parsing
88
+ │ ├── screen.ts # md-fetch-screen CLI entry point
89
+ │ ├── screen-cli.ts # md-fetch-screen CLI argument parsing
90
+ │ ├── constants.ts # Constant definitions
91
+ │ ├── core/
92
+ │ │ ├── fetcher.ts # HTTP fetch logic
93
+ │ │ ├── browser.ts # Puppeteer browser management
94
+ │ │ ├── extractor.ts # Content extraction
95
+ │ │ ├── converter.ts # HTML-to-Markdown conversion
96
+ │ │ ├── processor.ts # Main Markdown processing orchestrator
97
+ │ │ └── screenshotter.ts # Core screenshot class
98
+ │ ├── utils/
99
+ │ │ ├── frontmatter.ts # YAML frontmatter generation
100
+ │ │ └── filename-sanitizer.ts # Filename sanitization
101
+ │ └── types/
102
+ │ └── index.ts # TypeScript type definitions
103
+ ├── dist/ # Build output
104
+ │ ├── index.js # md-fetch executable
105
+ │ └── screen.js # md-fetch-screen executable
106
+ └── package.json # Package configuration (defines the two executables)
107
+ ```
108
+
109
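+ ## Design Decisions
110
+ 1. **puppeteer-core**: does not bundle a browser, which keeps the package small; users must install Chrome themselves
111
+ 2. **ES modules**: uses the modern Node.js module system
112
+ 3. **TypeScript strict mode**: ensures type safety
113
+ 4. **Minimal dependencies**: keeps only the dependencies required for core functionality
114
+
115
+ ## CLI Usage Examples
116
+
117
+ ### md-fetch (Markdown conversion)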
118
+ ```bash
119
+ # Basic usage
120
+ md-fetch https://example.com
121
+
122
+ # Save to a file
123
+ md-fetch https://example.com -o article.md
124
+
125
+ # Browser mode (for SPAs)
126
+ md-fetch https://react-app.com -b
127
+
128
+ # Custom selector
129
+ md-fetch https://example.com -s "article.main-content"
130
+
131
+ # Multiple URLs
132
+ md-fetch url1.com url2.com url3.com
133
+
134
+ # Custom headers
135
+ md-fetch https://example.com -H "Authorization: Bearer token"
136
+
137
+ # Verbose logging
138
+ md-fetch https://example.com --verbose
139
+ ```
140
+
141
+ ### md-fetch-screen (web page screenshots)
142
+ ```bash
143
+ # Basic screenshot (full page, standard resolution)
144
+ md-fetch-screen https://example.com
145
+
146
+ # Viewport screenshot with custom size
147
+ md-fetch-screen https://example.com --viewport -W 1440 -H 900
148
+
149
+ # High-DPI screenshot (2x pixel ratio)
150
+ md-fetch-screen https://example.com --scale 2
151
+
152
+ # Capture a specific element
153
+ md-fetch-screen https://example.com --selector "#main-content"
154
+
155
+ # Hide ads and popups
156
+ md-fetch-screen https://example.com --hide ".ad,.popup"
157
+
158
+ # JPEG format with a specific output directory
159
+ md-fetch-screen https://example.com --format jpeg --quality 85 --output ./screenshots
160
+
161
+ # Wait for the page to load, then delay before the screenshot
162
+ md-fetch-screen https://example.com --wait-until networkidle0 --delay 2000
163
+
164
+ # Batch screenshots
165
+ md-fetch-screen https://site1.com https://site2.com https://site3.com
166
+
167
+ # Verbose logging
168
+ md-fetch-screen https://example.com --verbose
169
+ ```
170
+
171
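+ ## Error Handling
172
+ - **Network errors**: retried automatically 3 times with exponential backoff
173
+ - **Browser errors**: a clear Chrome installation hint is shown
174
+ - **Extraction errors**: if readability fails, falls back to the raw HTML
175
+ - **Batch processing**: a single failure does not affect the other URLs; a summary is reported at the end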
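
The retry behaviour described above (3 retries with exponential backoff) could look roughly like the following sketch. It is not taken from `src/core/fetcher.ts`; the names `backoffDelay` and `withRetry`, the 1000 ms base delay, and the reading of "3 times" as three retries after the first attempt are all assumptions made for illustration.

```typescript
// Hypothetical helpers; names and the default base delay are assumptions.

/** Delay before retry number `attempt` (0-based): base, 2*base, 4*base, ... */
function backoffDelay(attempt: number, baseMs: number = 1000): number {
  return baseMs * Math.pow(2, attempt);
}

/** Runs `task`, retrying up to `retries` more times when it rejects. */
async function withRetry<T>(
  task: () => Promise<T>,
  retries: number = 3,
  baseMs: number = 1000,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await task();
    } catch (error) {
      lastError = error;
      if (attempt < retries) {
        // Wait longer after each failure before trying again.
        const waitMs = backoffDelay(attempt, baseMs);
        await new Promise<void>((resolve) => setTimeout(resolve, waitMs));
      }
    }
  }
  // Every attempt failed: surface the last error to the caller.
  throw lastError;
}
```

With the assumed defaults, the waits between attempts would be 1 s, 2 s, and 4 s. A caller would wrap the network call, for example `withRetry(() => fetch(url))`.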
176
+
177
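+ ## Current Status
178
+
179
+ ### md-fetch Implemented Features
180
+ - ✅ Fetching web content via HTTP fetch
181
+ - ✅ Readability content extraction
182
+ - ✅ HTML-to-Markdown conversion
183
+ - ✅ Automatic YAML frontmatter generation
184
+ - ✅ Output to stdout or a file
185
+ - ✅ Custom selector extraction
186
+ - ✅ Option to disable readability
187
+ - ✅ Custom HTTP headers
188
+ - ✅ Timeout configuration
189
+ - ✅ Verbose logging mode
190
+ - ✅ Multiple URL processing
191
+ - ✅ Automatic retry (3 times, exponential backoff)
192
+ - ✅ Browser mode (Puppeteer headless browser)
193
+ - ✅ Proxy support (HTTP_PROXY/HTTPS_PROXY/NO_PROXY environment variables)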
194
+
195
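+ ### md-fetch-screen Implemented Features
196
+ - ✅ Full-page screenshot mode
197
+ - ✅ Viewport screenshot mode
198
+ - ✅ Custom viewport size (width/height)
199
+ - ✅ Device pixel ratio (scale) support for high-DPI screenshots
200
+ - ✅ Multiple image formats (PNG/JPEG/WebP)
201
+ - ✅ Quality control (JPEG/WebP)
202
+ - ✅ Capturing a specific element (CSS selector)
203
+ - ✅ Hiding elements
204
+ - ✅ Delay before screenshot
205
+ - ✅ Automatic file naming (URL + timestamp)
206
+ - ✅ Batch screenshots
207
+ - ✅ Proxy support
208
+ - ✅ Verbose logging mode
209
+ - ✅ Argument validation and error handling
210
+
211
+ ### Planned Features
212
+ - ⏳ Batch reading of URLs from a file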
package/LICENSE ADDED
@@ -0,0 +1,21 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2025
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.
package/README.md ADDED
@@ -0,0 +1,449 @@
1
+ # md-fetch
2
+
3
+ [中文文档](./README.zh-CN.md)
4
+
5
+ A suite of CLI tools for web content processing:
6
+ - **md-fetch** - Convert web pages to clean Markdown format
7
+ - **md-fetch-screen** - Take high-quality screenshots of web pages
8
+
9
+ ## Authors
10
+
11
+ Built by **Claude Code** & **Claude Sonnet**
12
+
13
+ ## Table of Contents
14
+
15
+ - [md-fetch - Markdown Converter](#md-fetch---markdown-converter)
16
+ - [Features](#features)
17
+ - [Installation](#installation)
18
+ - [Usage](#usage)
19
+ - [CLI Options](#cli-options)
20
+ - [md-fetch-screen - Screenshot Tool](#md-fetch-screen---screenshot-tool)
21
+ - [Features](#screenshot-features)
22
+ - [Usage](#screenshot-usage)
23
+ - [CLI Options](#screenshot-cli-options)
24
+ - [Tech Stack](#tech-stack)
25
+ - [Development](#development)
26
+
27
+ ---
28
+
29
+ # md-fetch - Markdown Converter
30
+
31
+ ## Features
32
+
33
+ - 🚀 Fetch web content using native fetch API
34
+ - 🌐 Headless browser mode (Puppeteer) for SPA pages
35
+ - 📄 Extract main content using Mozilla Readability
36
+ - ✨ Convert HTML to Markdown using Turndown
37
+ - 📋 **Auto-generate YAML frontmatter** (includes title, URL, author, publish date, and more metadata)
38
+ - 🎯 Custom CSS selector support for content extraction
39
+ - 🔒 Proxy support (HTTP_PROXY/HTTPS_PROXY environment variables)
40
+ - ⚙️ Configurable timeout, headers, and other options
41
+ - 🔄 Auto-retry (3 times with exponential backoff)
42
+ - 📦 Minimal dependencies
43
+
44
+ ## Installation
45
+
46
+ ### Development Setup
47
+
48
+ ```bash
49
+ # Clone the repository (if you haven't already)
50
+ git clone <repo-url>
51
+ cd md-fetch
52
+
53
+ # Install dependencies
54
+ pnpm install
55
+ ```
56
+
57
+ ### Global Installation
58
+
59
+ **Using pnpm:**
60
+
61
+ ```bash
62
+ # 1. Build the project
63
+ pnpm build
64
+
65
+ # 2. Setup pnpm (first time only)
66
+ pnpm setup
67
+
68
+ # 3. Link globally (recommended for development)
69
+ pnpm link --global
70
+
71
+ # 4. Now you can use md-fetch anywhere
72
+ md-fetch https://example.com
73
+ ```
74
+
75
+ **Using npm:**
76
+
77
+ ```bash
78
+ # 1. Build the project
79
+ pnpm build
80
+
81
+ # 2. Link globally
82
+ npm link
83
+
84
+ # 3. Now you can use md-fetch anywhere
85
+ md-fetch https://example.com
86
+ ```
87
+
88
+ ### Rebuild After Code Changes
89
+
90
+ ```bash
91
+ # 1. Rebuild
92
+ pnpm build
93
+
94
+ # 2. No need to re-link; changes take effect automatically
95
+ md-fetch https://example.com
96
+ ```
97
+
98
+ ### Uninstall
99
+
100
+ **Using pnpm:**
101
+
102
+ ```bash
103
+ # Unlink globally
104
+ pnpm unlink --global
105
+
106
+ # Optional: Clean up unused packages in pnpm global store
107
+ pnpm store prune
108
+ ```
109
+
110
+ **Using npm:**
111
+
112
+ ```bash
113
+ # Unlink globally
114
+ npm unlink -g md-fetch
115
+ ```
116
+
117
+ **Remove project:**
118
+
119
+ ```bash
120
+ # Simply delete the project directory
121
+ cd ..
122
+ rm -rf md-fetch # Or use rmdir /s md-fetch on Windows
123
+ ```
124
+
125
+ ## Usage
126
+
127
+ ### Development Mode
128
+
129
+ ```bash
130
+ # Basic usage - output to stdout
131
+ pnpm dev -- https://example.com
132
+
133
+ # Save to file
134
+ pnpm dev -- https://example.com -o output.md
135
+
136
+ # Browser mode (for SPA pages)
137
+ pnpm dev -- -b https://react-app.example.com
138
+
139
+ # Disable readability, keep full HTML content
140
+ pnpm dev -- https://example.com -R
141
+ # Or use the full option name
142
+ pnpm dev -- https://example.com --no-readability
143
+
144
+ # Use custom CSS selector
145
+ pnpm dev -- https://example.com -s "article.main-content"
146
+
147
+ # Process multiple URLs
148
+ pnpm dev -- https://example.com https://httpbin.org/html
149
+
150
+ # Custom HTTP headers
151
+ pnpm dev -- https://example.com -H "Authorization: Bearer token"
152
+
153
+ # Use proxy
154
+ pnpm dev -- https://example.com --proxy http://proxy.example.com:8080
155
+
156
+ # Verbose logging
157
+ pnpm dev -- https://example.com --verbose
158
+
159
+ # View all options
160
+ pnpm dev -- --help
161
+ ```
162
+
163
+ ### Production Usage (After Global Installation)
164
+
165
+ ```bash
166
+ # Basic usage
167
+ md-fetch https://example.com
168
+
169
+ # Save to file
170
+ md-fetch https://example.com -o article.md
171
+
172
+ # Browser mode
173
+ md-fetch -b https://react-app.example.com
174
+
175
+ # Use proxy (from environment variable)
176
+ export HTTPS_PROXY=http://proxy.example.com:8080
177
+ md-fetch https://example.com
178
+ ```
179
+
180
+ ## Output Example
181
+
182
+ md-fetch automatically adds YAML frontmatter at the beginning of the Markdown file with page metadata:
183
+
184
+ ```markdown
185
+ ---
186
+ title: "Example Domain"
187
+ url: https://example.com
188
+ description: "Example Domain description"
189
+ author: "John Doe"
190
+ siteName: "Example"
191
+ publishedTime: 2024-01-01T00:00:00Z
192
+ modifiedTime: 2024-01-15T10:30:00Z
193
+ keywords:
194
+ - example
195
+ - demo
196
+ - test
197
+ image: https://example.com/og-image.jpg
198
+ lang: en
199
+ ---
200
+
201
+ # Example Domain
202
+
203
+ This domain is for use in illustrative examples...
204
+ ```
205
+
206
+ ### Frontmatter Fields
207
+
208
+ - `title` - Page title (extracted from Readability, Open Graph, Twitter Cards, or `<title>` tag)
209
+ - `url` - Original URL
210
+ - `description` - Page description or excerpt
211
+ - `author` - Author information
212
+ - `siteName` - Site name
213
+ - `publishedTime` - Published date (ISO 8601 format)
214
+ - `modifiedTime` - Last modified date (ISO 8601 format)
215
+ - `keywords` - Keywords array
216
+ - `image` - Main page image (Open Graph or Twitter Cards)
217
+ - `lang` - Page language code
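
As an illustration of how the fields listed above might be serialized, here is a minimal sketch. It is not the contents of `src/utils/frontmatter.ts`: the `PageMetadata` shape, the `quote` and `buildFrontmatter` names, the two-space indentation of keyword items, and the choice of which fields are quoted (mirroring the output example) are assumptions. It is also not a general YAML serializer.

```typescript
// Hypothetical metadata shape; field names follow the frontmatter example.
interface PageMetadata {
  title?: string;
  url: string;
  description?: string;
  author?: string;
  siteName?: string;
  publishedTime?: string;
  modifiedTime?: string;
  keywords?: string[];
  image?: string;
  lang?: string;
}

/** Wraps a value in double quotes, escaping backslashes and double quotes. */
function quote(value: string): string {
  return '"' + value.replace(/\\/g, "\\\\").replace(/"/g, '\\"') + '"';
}

/** Builds a frontmatter block, skipping fields that are missing or empty. */
function buildFrontmatter(meta: PageMetadata): string {
  const lines: string[] = ["---"];
  if (meta.title) lines.push("title: " + quote(meta.title));
  lines.push("url: " + meta.url);
  if (meta.description) lines.push("description: " + quote(meta.description));
  if (meta.author) lines.push("author: " + quote(meta.author));
  if (meta.siteName) lines.push("siteName: " + quote(meta.siteName));
  if (meta.publishedTime) lines.push("publishedTime: " + meta.publishedTime);
  if (meta.modifiedTime) lines.push("modifiedTime: " + meta.modifiedTime);
  if (meta.keywords && meta.keywords.length > 0) {
    lines.push("keywords:");
    for (const keyword of meta.keywords) lines.push("  - " + keyword);
  }
  if (meta.image) lines.push("image: " + meta.image);
  if (meta.lang) lines.push("lang: " + meta.lang);
  lines.push("---");
  return lines.join("\n");
}
```

Omitting absent fields keeps the block short for pages that expose little metadata; whether the package does the same is not stated in the README.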
218
+
219
+ ## CLI Options
220
+
221
+ ```
222
+ Usage: md-fetch <urls...> [options]
223
+
224
+ Arguments:
225
+ urls URLs to convert to Markdown
226
+
227
+ Options:
228
+ -V, --version output the version number
229
+ -o, --output <file> Output to file instead of stdout
230
+ -b, --browser Use headless browser mode (for SPA pages)
231
+ --browser-path <path> Custom Chrome/Chromium executable path
232
+ -R, --no-readability Disable readability, keep full HTML content
233
+ -s, --selector <selector> Custom CSS selector to extract content
234
+ -H, --header <header> Custom HTTP header (can be repeated)
235
+ --proxy <url> Proxy server URL (also reads HTTP_PROXY/HTTPS_PROXY env vars)
236
+ -t, --timeout <ms> Request timeout in milliseconds (default: 30000)
237
+ --user-agent <string> Custom user agent (default: "md-fetch/1.0.0")
238
+ --wait-until <event> Browser wait condition (load|domcontentloaded|networkidle0|networkidle2)
239
+ --verbose Enable verbose logging
240
+ -h, --help display help for command
241
+ ```
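
The `-H, --header` option takes strings of the form `Name: value` and can be repeated. A minimal sketch of turning those strings into a header map follows; `parseHeaders` is a hypothetical name, and splitting on the first colon and rejecting entries without a name are assumptions, not confirmed behaviour of `src/cli.ts`.

```typescript
/** Parses repeated "Name: value" strings into a header map (hypothetical helper). */
function parseHeaders(rawHeaders: string[]): Record<string, string> {
  const headers: Record<string, string> = {};
  for (const entry of rawHeaders) {
    // Split on the first colon only, so values may themselves contain colons.
    const separator = entry.indexOf(":");
    if (separator <= 0) {
      throw new Error("Invalid header (expected 'Name: value'): " + entry);
    }
    const name = entry.slice(0, separator).trim();
    const value = entry.slice(separator + 1).trim();
    headers[name] = value;
  }
  return headers;
}
```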
242
+
243
+ ## Tech Stack
244
+
245
+ - **TypeScript** - Type safety
246
+ - **Node.js ≥18** - Native fetch API
247
+ - **ES Modules** - Modern JavaScript
248
+ - **Commander** - CLI argument parsing
249
+ - **Mozilla Readability** - Smart content extraction
250
+ - **Turndown** - HTML to Markdown conversion
251
+ - **JSDOM** - DOM parsing
252
+ - **Puppeteer-core** - Headless browser support
253
+ - **Undici** - Proxy support
254
+
255
+ ## Development
256
+
257
+ ```bash
258
+ # Install dependencies
259
+ pnpm install
260
+
261
+ # Development mode
262
+ pnpm dev -- <url>
263
+
264
+ # Build
265
+ pnpm build
266
+
267
+ # Run tests
268
+ pnpm test
269
+ ```
270
+
271
+ ## How It Works
272
+
273
+ 1. **Fetch** - Fetch HTML content using native fetch or Puppeteer headless browser
274
+ 2. **Extract** - Extract main content using Readability or a custom selector, and extract page metadata
275
+ 3. **Convert** - Convert to Markdown using Turndown
276
+ 4. **Generate Frontmatter** - Generate YAML frontmatter from extracted metadata
277
+ 5. **Output** - Output frontmatter and Markdown content to stdout or save to file
278
+
279
+ ## Proxy Support
280
+
281
+ md-fetch automatically reads proxy configuration from environment variables:
282
+
283
+ ```bash
284
+ # Set proxy
285
+ export HTTP_PROXY=http://proxy.example.com:8080
286
+ export HTTPS_PROXY=http://proxy.example.com:8080
287
+
288
+ # Exclude certain domains
289
+ export NO_PROXY=localhost,127.0.0.1,.example.com
290
+
291
+ # Or via command line argument
292
+ md-fetch https://example.com --proxy http://proxy.example.com:8080
293
+ ```
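
One way the proxy settings above could be resolved for a given URL is sketched below. The precedence (an explicit `--proxy` value first, then `NO_PROXY` exclusions, then `HTTPS_PROXY` or `HTTP_PROXY` by scheme) and the suffix matching for entries such as `.example.com` are assumptions; the README does not spell these rules out. `resolveProxy` and `ProxyEnv` are hypothetical names, and the environment is passed in as a plain object so the sketch stays self-contained.

```typescript
type ProxyEnv = Record<string, string | undefined>;

/** Returns the proxy URL to use for `targetUrl`, or undefined for a direct connection. */
function resolveProxy(
  targetUrl: string,
  env: ProxyEnv,
  cliProxy?: string,
): string | undefined {
  // Assumption: an explicit command-line proxy overrides the environment.
  if (cliProxy) return cliProxy;

  const { protocol, hostname } = new URL(targetUrl);

  // Assumption: NO_PROXY is a comma-separated list; ".domain" matches subdomains too.
  const excluded = (env.NO_PROXY || "")
    .split(",")
    .map((entry) => entry.trim())
    .filter((entry) => entry.length > 0);
  for (const entry of excluded) {
    if (entry.charAt(0) === ".") {
      const bare = entry.slice(1);
      const suffixMatch =
        hostname.length > entry.length &&
        hostname.slice(hostname.length - entry.length) === entry;
      if (hostname === bare || suffixMatch) return undefined;
    } else if (hostname === entry) {
      return undefined;
    }
  }

  return protocol === "https:" ? env.HTTPS_PROXY : env.HTTP_PROXY;
}
```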
294
+
295
+ ---
296
+
297
+ # md-fetch-screen - Screenshot Tool
298
+
299
+ ## Screenshot Features
300
+
301
+ - 📸 Take high-quality screenshots of web pages
302
+ - 🖥️ Full-page or viewport-only screenshot modes
303
+ - 📐 Customizable viewport size (width/height)
304
+ - ✨ Device scale factor support for high-DPI screenshots (Retina displays)
305
+ - 🎨 Multiple image formats (PNG, JPEG, WebP)
306
+ - 🎯 Screenshot specific elements using CSS selectors
307
+ - 🙈 Hide unwanted elements (ads, popups, etc.)
308
+ - ⏱️ Configurable delay before screenshot
309
+ - 🔒 Proxy support
310
+ - 🌐 Headless browser mode using Puppeteer
311
+ - 📁 Automatic filename generation from URL and timestamp
312
+ - 🔄 Batch screenshot multiple URLs
313
+
314
+ ## Screenshot Usage
315
+
316
+ ### Basic Usage
317
+
318
+ ```bash
319
+ # Basic screenshot (full page, standard resolution)
320
+ md-fetch-screen https://example.com
321
+
322
+ # Viewport-only screenshot with custom size
323
+ md-fetch-screen https://example.com --viewport -W 1440 -H 900
324
+
325
+ # High-DPI screenshot (2x scale for Retina displays)
326
+ md-fetch-screen https://example.com --scale 2
327
+
328
+ # Screenshot with verbose logging
329
+ md-fetch-screen https://example.com --verbose
330
+ ```
331
+
332
+ ### Advanced Usage
333
+
334
+ ```bash
335
+ # Screenshot specific element
336
+ md-fetch-screen https://example.com --selector "#main-content"
337
+
338
+ # Hide ads and popups
339
+ md-fetch-screen https://example.com --hide ".ad,.popup,.cookie-banner"
340
+
341
+ # JPEG format with custom quality
342
+ md-fetch-screen https://example.com --format jpeg --quality 85
343
+
344
+ # Save to specific directory
345
+ md-fetch-screen https://example.com --output ./screenshots
346
+
347
+ # Wait for page to load, then delay 2 seconds
348
+ md-fetch-screen https://example.com --wait-until networkidle0 --delay 2000
349
+
350
+ # Batch screenshot multiple URLs
351
+ md-fetch-screen https://site1.com https://site2.com https://site3.com
352
+ ```
353
+
354
+ ### Understanding Width, Height, and Scale
355
+
356
+ **Full-Page Mode (default):**
357
+ - Width/Height control the browser viewport size
358
+ - The screenshot captures the entire page content
359
+ - Final image dimensions depend on actual page height
360
+
361
+ ```bash
362
+ # Full page with 1920px viewport width
363
+ md-fetch-screen https://example.com -W 1920 -H 1080
364
+ ```
365
+
366
+ **Viewport Mode:**
367
+ - Width/Height directly control the screenshot size
368
+ - Only captures what's visible in the viewport
369
+
370
+ ```bash
371
+ # Exactly 1440x900 screenshot
372
+ md-fetch-screen https://example.com --viewport -W 1440 -H 900
373
+ ```
374
+
375
+ **Scale Factor (Device Pixel Ratio):**
376
+ - `--scale 1` (default): Standard resolution
377
+ - Viewport 1920x1080 → Image 1920x1080 pixels
378
+ - `--scale 2`: High-DPI (Retina)
379
+ - Viewport 1920x1080 → Image 3840x2160 pixels
380
+ - `--scale 3`: Ultra high-DPI
381
+ - Viewport 1920x1080 → Image 5760x3240 pixels
382
+
383
+ ```bash
384
+ # High-quality Retina screenshot
385
+ md-fetch-screen https://example.com --scale 2
386
+
387
+ # Viewport mode with 2x scale = 2880x1800 final image
388
+ md-fetch-screen https://example.com --viewport -W 1440 -H 900 --scale 2
389
+ ```
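
The arithmetic behind the scale factor is a plain multiplication of the viewport dimensions. The sketch below reproduces the figures quoted above; `ViewportSize` and `imageSize` are hypothetical names, and the calculation applies to viewport mode, since in full-page mode the height depends on the page content.

```typescript
interface ViewportSize {
  width: number;
  height: number;
}

/** Pixel dimensions of a viewport-mode screenshot at the given device scale factor. */
function imageSize(viewport: ViewportSize, scale: number = 1): ViewportSize {
  return {
    width: viewport.width * scale,
    height: viewport.height * scale,
  };
}

// Viewport 1440x900 at --scale 2 gives a 2880x1800 image.
const retina = imageSize({ width: 1440, height: 900 }, 2);
console.log(retina.width + "x" + retina.height);
```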
390
+
391
+ ## Screenshot CLI Options
392
+
393
+ ```
394
+ Usage: md-fetch-screen [options] <urls...>
395
+
396
+ Arguments:
397
+ urls URLs to screenshot
398
+
399
+ Options:
400
+ -V, --version output the version number
401
+
402
+ Viewport & Size:
403
+ -f, --full-page Full page screenshot (default)
404
+ --viewport Viewport-only screenshot
405
+ -W, --width <pixels> Viewport width in pixels (default: 1920)
406
+ -H, --height <pixels> Viewport height in pixels (default: 1080)
407
+ --scale <number> Device scale factor for high-DPI (1/2/3, default: 1)
408
+
409
+ Output:
410
+ --output <dir> Output directory (default: ".")
411
+ --format <type> Image format: png|jpeg|webp (default: "png")
412
+ --quality <number> JPEG/WebP quality 0-100 (default: 90)
413
+
414
+ Browser:
415
+ --browser-path <path> Custom Chrome/Chromium executable path
416
+ --wait-until <event> Wait condition: load|domcontentloaded|networkidle0|networkidle2
417
+ --timeout <ms> Timeout in milliseconds (default: 30000)
418
+ --user-agent <string> Custom user agent
419
+ --proxy <url> Proxy server URL
420
+
421
+ Content:
422
+ --delay <ms> Delay before screenshot in ms (default: 0)
423
+ --selector <css> CSS selector to screenshot specific element
424
+ --hide <selectors> CSS selectors to hide (comma-separated)
425
+
426
+ Other:
427
+ --verbose Enable verbose logging
428
+ -h, --help display help for command
429
+ ```
430
+
431
+ ### Filename Format
432
+
433
+ Screenshots are automatically named using the following format:
434
+ ```
435
+ <domain_path_50chars>_<timestamp>.png
436
+ ```
437
+
438
+ Examples:
439
+ - `example.com_20251229153045.png`
440
+ - `github.com_user_repo_issues_123_20251229153045.png`
441
+
442
+ The filename includes:
443
+ - Domain and path (up to 50 characters, sanitized for filesystem safety)
444
+ - Timestamp in format: `YYYYMMDDHHmmss`
445
+ - File extension based on format
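
A filename in this format could be produced along the following lines. This is a sketch, not the code in `src/utils/filename-sanitizer.ts`: the function names, the set of characters treated as unsafe, the use of local time for the timestamp, and using the format string directly as the extension are all assumptions.

```typescript
/** Formats a date as YYYYMMDDHHmmss (local time is an assumption). */
function timestamp(date: Date): string {
  const pad = (n: number): string => (n < 10 ? "0" : "") + n;
  return (
    String(date.getFullYear()) +
    pad(date.getMonth() + 1) +
    pad(date.getDate()) +
    pad(date.getHours()) +
    pad(date.getMinutes()) +
    pad(date.getSeconds())
  );
}

/** Builds "<domain_path up to 50 chars>_<timestamp>.<format>" from a URL. */
function screenshotFilename(url: string, date: Date, format: string = "png"): string {
  const { hostname, pathname } = new URL(url);
  const base = (hostname + pathname)
    .replace(/[^A-Za-z0-9.-]+/g, "_") // collapse unsafe characters into underscores
    .replace(/^_+|_+$/g, "") // drop leading and trailing underscores
    .slice(0, 50); // keep at most 50 characters
  return base + "_" + timestamp(date) + "." + format;
}
```

Passing the date in as a parameter, rather than calling `new Date()` inside, keeps the function deterministic and easy to test.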
446
+
447
+ ## License
448
+
449
+ MIT