cf-killer 0.1.0__tar.gz

@@ -0,0 +1,290 @@
Metadata-Version: 2.4
Name: cf-killer
Version: 0.1.0
Summary: Automatic Cloudflare 5-second challenge solver + batch page fetcher (built on CloakBrowser)
Author: cf-killer contributors
License: MIT
Project-URL: Homepage, https://github.com/CloakHQ/CloakBrowser
Project-URL: Repository, https://github.com/CloakHQ/CloakBrowser
Keywords: cloudflare,anti-bot,web-scraping,cloakbrowser,turnstile-solver,playwright
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Internet :: WWW/HTTP
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.9
Description-Content-Type: text/markdown
Requires-Dist: cloakbrowser>=0.3.0
Requires-Dist: playwright>=1.40

# CF Killer

An **automatic Cloudflare 5-second challenge solver + batch page fetcher** built on [CloakBrowser](https://github.com/erickirt/CloakBrowser), an anti-detection browser patched at the Chromium C++ source level.

---

## 1. Requirements

| Item | Details |
|------|------|
| OS | Windows 10+ / Linux / macOS |
| Python | 3.9+ (3.11 recommended) |
| Browser | CloakBrowser's custom Chromium build (downloaded automatically, ~200 MB) |

## 2. Installing dependencies

```bash
# 1. Install cloakbrowser (pulls in Playwright).
#    Alternatively, `pip install cf-killer` installs this package together with
#    its declared dependencies (cloakbrowser>=0.3.0, playwright>=1.40).
pip install cloakbrowser

# 2. Download the custom Chromium binary (run once before first use)
python -c "import cloakbrowser; cloakbrowser.ensure_binary()"
```

Core dependency chain:

```
cloakbrowser (Chromium with C++ source-level anti-detection patches)
├── playwright >= 1.40   # browser automation
├── httpx >= 0.24        # HTTP client
└── greenlet >= 3.1.1    # coroutine support
```

---

## 3. Feature overview

### 3.1 Automatic Cloudflare challenge solving (`CFSolver`)

Detects and solves Cloudflare Turnstile challenges automatically. Supported challenge types:

| Type | Strategy |
|------|------|
| `non-interactive` | Polls until Cloudflare lets the page through on its own |
| `managed` | Waits for the iframe → clicks the checkbox → polls until the challenge disappears |
| `interactive` | Same as `managed`, with a more elaborate click path |
| `embedded` | Solves an embedded Turnstile widget |

Clicking follows a four-stage fallback: a precise selector inside the iframe → a coordinate click on the iframe → coordinates of the container on the main page → Tab+Space as a last resort.

### 3.2 Batch page fetching (`CFPageFetcher`)

- Built on a CloakBrowser persistent context, so the browser fingerprint and cookies are reused across pages
- Built-in CF detection, including sites that set the page title late via JavaScript (e.g. ScienceDirect)
- Automatic context recycling: the browser context is rebuilt after every N pages to prevent memory leaks
- **Deferred recycling**: under concurrency, recycling waits until all active pages have finished, avoiding race-condition crashes
- Proxy support in three modes (single instance / multiple instances / callable)

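
The deferred-recycling rule above can be illustrated with a self-contained asyncio sketch. This is not cf-killer's code: `RecyclingPool`, `fetch_page` and `main` are hypothetical names, the "browser context" is only a generation counter, and page loads are simulated with `asyncio.sleep`. Once `max_pages` pages have started on the current context, newly arriving pages wait; the context is rebuilt only after the last in-flight page has released it, so no tab is torn down mid-load.

```python
import asyncio
from typing import List, Tuple


class RecyclingPool:
    """Hypothetical sketch: rebuild a shared context every `max_pages` pages,
    but only after every page still using the old context has finished."""

    def __init__(self, max_pages: int) -> None:
        self.max_pages = max_pages
        self.generation = 0        # stands in for the current browser context
        self.pages_in_context = 0  # pages started on the current context
        self.active = 0            # pages currently in flight
        self.rebuilds = 0
        self.cond = asyncio.Condition()

    async def acquire(self) -> int:
        async with self.cond:
            # Quota used up: new pages wait instead of forcing an early rebuild.
            while self.pages_in_context >= self.max_pages:
                if self.active == 0:
                    self._rebuild_context()
                else:
                    await self.cond.wait()
            self.pages_in_context += 1
            self.active += 1
            return self.generation

    async def release(self) -> None:
        async with self.cond:
            self.active -= 1
            self.cond.notify_all()  # let waiting pages re-check and recycle

    def _rebuild_context(self) -> None:
        # Real code would close the old context and open a fresh one here.
        assert self.active == 0, "never recycle while a page is still active"
        self.generation += 1
        self.pages_in_context = 0
        self.rebuilds += 1


async def fetch_page(pool: RecyclingPool, url: str, log: List[Tuple[str, int]]) -> None:
    generation = await pool.acquire()
    try:
        await asyncio.sleep(0.01)  # simulated page load
        log.append((url, generation))
    finally:
        await pool.release()


async def main(n_pages: int = 7, max_pages: int = 3,
               tabs: int = 2) -> Tuple[RecyclingPool, List[Tuple[str, int]]]:
    pool = RecyclingPool(max_pages)
    tab_slots = asyncio.Semaphore(tabs)  # concurrent tabs, like `concurrency`
    log: List[Tuple[str, int]] = []

    async def worker(url: str) -> None:
        async with tab_slots:
            await fetch_page(pool, url, log)

    await asyncio.gather(*(worker(f"page-{i}") for i in range(n_pages)))
    return pool, log


if __name__ == "__main__":
    pool, log = asyncio.run(main())
    print(f"rebuilds={pool.rebuilds}", sorted(g for _, g in log))
```

With 7 pages, 3 pages per context and 2 tabs, the sorted generations are `[0, 0, 0, 1, 1, 1, 2]` with two rebuilds. Waiting on an `asyncio.Condition`, rather than rebuilding the moment the page counter trips, is what keeps the recycle safe while other tabs are still loading.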
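### 3.3 File download (`download_file`)

Once the CF check has been passed, binary files (PDFs, images, etc.) are downloaded directly through an in-page `fetch()`, reusing the browser's cookies and TLS fingerprint to get past anti-scraping restrictions.

### 3.4 Multi-instance parallelism (`fetch_all`)

URLs are split evenly across several browser instances. Each instance runs its own event loop with its own proxy, and the instances execute in parallel on a `ThreadPoolExecutor` to maximize throughput.
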
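
The layout described above can be sketched as follows, assuming contiguous shards of near-equal size (the README does not say whether cf-killer splits contiguously or round-robin). `split_evenly`, `run_instance` and `fetch_all_sketch` are hypothetical names, and `run_instance` only simulates a browser instance. Each shard runs on its own `ThreadPoolExecutor` worker, and `asyncio.run()` gives that thread its own event loop.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor
from typing import List, Optional


def split_evenly(urls: List[str], n: int) -> List[List[str]]:
    """Split `urls` into at most n contiguous shards whose sizes differ by at most one."""
    n = max(1, min(n, len(urls)))  # no empty shards (except for an empty input)
    base, extra = divmod(len(urls), n)
    shards, start = [], 0
    for i in range(n):
        size = base + (1 if i < extra else 0)  # the first `extra` shards get one more URL
        shards.append(urls[start:start + size])
        start += size
    return shards


async def run_instance(shard: List[str], proxy: Optional[str]) -> List[dict]:
    """Stand-in for one browser instance; real code would launch a browser with `proxy`."""
    results = []
    for url in shard:
        await asyncio.sleep(0)  # simulated page fetch
        results.append({"url": url, "success": True, "proxy": proxy})
    return results


def fetch_all_sketch(urls: List[str], instances: int = 1,
                     proxies: Optional[List[str]] = None) -> List[dict]:
    shards = split_evenly(urls, instances)
    with ThreadPoolExecutor(max_workers=len(shards)) as executor:
        futures = []
        for i, shard in enumerate(shards):
            proxy = proxies[i % len(proxies)] if proxies else None
            # asyncio.run() creates a fresh event loop inside the worker thread.
            futures.append(executor.submit(asyncio.run, run_instance(shard, proxy)))
        # Collect in shard order so results line up with the input URL order.
        return [result for future in futures for result in future.result()]


if __name__ == "__main__":
    urls = [f"https://example.com/page/{i}" for i in range(31)]
    print([len(s) for s in split_evenly(urls, 2)])
    print(len(fetch_all_sketch(urls, instances=2,
                               proxies=["http://proxy-a:8080", "http://proxy-b:8080"])))
```

For 31 URLs on two instances (the setup used in Case A) this gives shards of 16 and 15 URLs. One loop per thread matters because Playwright's API is not thread-safe; its documentation recommends a separate Playwright instance per thread.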
### 3.5 Main entry points

| Function | Purpose |
|------|------|
| `fetch_url(url, ...)` | Fetch a single URL synchronously |
| `fetch_urls(urls, ...)` | Fetch a batch of URLs synchronously (alias of `fetch_all`) |
| `fetch_all(urls, ...)` | Multi-instance parallel fetching, with per-shard proxies |

---

## 4. Test cases

### Case A: Batch page fetching

Fetches 31 mixed URLs (the *Gut* medical journal, American Football Wiki and ScienceDirect) to check CF solving and page fetching.

```python
# -*- coding: utf-8 -*-
"""Test script: automatic CF challenge solving + page fetching."""
import os
import sys

if sys.platform == "win32":
    # Force UTF-8 console output on Windows so non-ASCII titles print cleanly
    import io
    sys.stdout = io.TextIOWrapper(sys.stdout.buffer, encoding="utf-8", errors="replace")
    sys.stderr = io.TextIOWrapper(sys.stderr.buffer, encoding="utf-8", errors="replace")

# Let the script import a local cf_killer checkout that sits next to it
sys.path.insert(0, os.path.dirname(os.path.abspath(__file__)))
from cf_killer import fetch_all

HEADLESS = True
PROXY = None                # proxy URL, list of proxies, or callable; None = direct
CONCURRENCY = 3             # concurrent tabs per instance
INSTANCES = 2               # parallel browser instances
MAX_PAGES_PER_CONTEXT = 10  # rebuild the browser context every N pages
RETURN_COOKIES = False      # include cookies in each result

URLS = [
    "https://gut.bmj.com/content/75/6/1085",
    "https://gut.bmj.com/content/75/6/1087",
    "https://gut.bmj.com/content/75/6/1090",
    "https://gut.bmj.com/content/75/6/1092",
    "https://gut.bmj.com/content/75/6/1094",
    "https://gut.bmj.com/content/75/6/1097",
    "https://gut.bmj.com/content/75/6/1110",
    "https://gut.bmj.com/content/75/6/1123",
    "https://gut.bmj.com/content/75/6/1136",
    "https://gut.bmj.com/content/75/6/1147",
    "https://gut.bmj.com/content/75/6/1160",
    "https://gut.bmj.com/content/75/6/1169",
    "https://gut.bmj.com/content/75/6/1186",
    "https://gut.bmj.com/content/75/6/1201",
    "https://gut.bmj.com/content/75/6/1211",
    "https://gut.bmj.com/content/75/6/1226",
    "https://gut.bmj.com/content/75/6/1237",
    "https://gut.bmj.com/content/75/6/1248",
    "https://gut.bmj.com/content/75/6/1264",
    "https://gut.bmj.com/content/75/6/1266.1",
    "https://gut.bmj.com/content/75/6/1266.2",
    "https://gut.bmj.com/content/75/6/1267",
    "https://gut.bmj.com/content/75/6/1109",
    "http://americanfootball.fandom.com/1993_Kentucky_vs._Mississippi",
    "http://americanfootball.fandom.com/Isaiah_Foskey",
    "http://americanfootball.fandom.com/wiki/2014_Susquehanna_Crusaders",
    "http://americanfootball.fandom.com/wiki/2015_Lake_Forest_Foresters",
    "http://americanfootball.fandom.com/wiki/2023_Colorado_State_Rams",
    "http://americanfootballdatabase.fandom.com/Paul_Hackett_(American_football)",
    "http://americanfootballdatabase.fandom.com/wiki/100th_Grey_Cup",
    "https://www.sciencedirect.com/science/article/pii/S0039606025002491",
]

if __name__ == "__main__":
    print(f"Testing {len(URLS)} URLs")

    results = fetch_all(
        URLS,
        instances=INSTANCES,
        concurrency=CONCURRENCY,
        max_pages_per_context=MAX_PAGES_PER_CONTEXT,
        headless=HEADLESS,
        solve_cf=True,
        proxy=PROXY,
        return_cookies=RETURN_COOKIES,
        verbose=False,
    )

    ok = sum(1 for r in results if r["success"])
    print(f"\n{'=' * 50}")
    for r in results:
        status = "✓" if r["success"] else "✗"
        # Titles are truncated to 60 characters; pages without a title show FAILED
        print(f" {status} {(r['title'] or 'FAILED')[:60]}")
    print("=" * 50)
    print(f"Result: {ok}/{len(results)} succeeded")
```

Run it with `python test.py`.

Expected output:

```
Testing 31 URLs

==================================================
 ✓ Gut-peritoneal-multisystem axis in endometriosis | Gut
 ✓ Hitting the mitotic spot of fibrolamellar carcinoma | Gut
...
 ✓ 100th Grey Cup | American Football Database | Fandom
 ✓ Guidelines for perioperative care in elective colorectal sur
==================================================
Result: 31/31 succeeded
```

---

### Case B: PDF download

Downloads a PDF by issuing `fetch()` from a browser page that has already passed the CF check, reusing that page's TLS fingerprint and cookies.

```python
# -*- coding: utf-8 -*-
"""Test the download_file method: PDF download."""
import asyncio
import os
import sys

if sys.platform == "win32":
    # Force UTF-8 console output on Windows so the status symbols print cleanly
    import io
    sys.stdout = io.TextIOWrapper(sys.stdout.buffer, encoding="utf-8", errors="replace")
    sys.stderr = io.TextIOWrapper(sys.stderr.buffer, encoding="utf-8", errors="replace")

# Let the script import a local cf_killer checkout that sits next to it
sys.path.insert(0, os.path.dirname(os.path.abspath(__file__)))
from cf_killer import CFPageFetcher

PDF_URL = "https://www.myavls.org/assets/pdf/SuperficialVenousDiseaseGuidelinesPMS313-02.03.16.pdf"
OUTPUT = os.path.join(os.path.dirname(os.path.abspath(__file__)), "SuperficialVenousDiseaseGuidelines.pdf")


async def main():
    print(f"Target: {PDF_URL}")
    print(f"Saving to: {OUTPUT}")

    async with CFPageFetcher(
        headless=True,
        verbose=True,
        solve_cf=True,
    ) as fetcher:
        ok = await fetcher.download_file(PDF_URL, OUTPUT)

    # The browser is closed at this point; only the saved file is inspected
    if ok:
        size_kb = os.path.getsize(OUTPUT) / 1024
        print(f"\n✅ Download succeeded! File: {OUTPUT} ({size_kb:.0f} KB)")
    else:
        print("\n❌ Download failed")


if __name__ == "__main__":
    asyncio.run(main())
```

Run it with `python test_download.py`.

Expected output. With `verbose=True` the fetcher also prints its own log lines, which are in Chinese and reproduced verbatim: `[上下文] 已创建` means "context created", `[下载] 预热` "download: warm-up visit", `非 CF` "not a CF page", and `[下载] 已保存` "download: saved".

```
Target: https://www.myavls.org/assets/pdf/SuperficialVenousDiseaseGuidelinesPMS313-02.03.16.pdf
Saving to: ...\SuperficialVenousDiseaseGuidelines.pdf
[上下文] 已创建
[下载] 预热: https://www.myavls.org/
非 CF url=https://www.myavls.org/
[下载] 已保存: ...\SuperficialVenousDiseaseGuidelines.pdf (121KB)

✅ Download succeeded! ... (121 KB)
```

---

## 5. Main API parameters

### `CFPageFetcher`

| Parameter | Type | Default | Description |
|------|------|--------|------|
| `headless` | bool | True | Run the browser headless |
| `humanize` | bool | False | Human-like mouse movement and keystroke timing |
| `solve_cf` | bool | True | Solve CF challenges automatically |
| `cf_max_retries` | int | 5 | Maximum number of CF solving attempts |
| `timeout` | int | 90000 | Page navigation timeout (ms) |
| `proxy` | str | None | Proxy URL |
| `max_pages_per_context` | int | 20 | Recycle the browser context every N pages |
| `return_cookies` | bool | False | Include cookies in the results |

### `fetch_all`

| Parameter | Type | Default | Description |
|------|------|--------|------|
| `urls` | list | - | List of URLs |
| `instances` | int | 1 | Number of parallel browser instances |
| `concurrency` | int | 3 | Concurrent tabs per instance |
| `max_pages_per_context` | int | 20 | Recycle the context automatically every N pages |
| `proxy` | str/list/callable | None | Single proxy / list of proxies / proxy factory function |
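
The three `proxy` forms accepted by `fetch_all` could be normalized into one proxy per instance along the lines of this sketch. `resolve_proxy` is a hypothetical helper, not part of cf-killer, and two behaviors are assumptions rather than documented facts: that a callable is called with the instance index, and that a list shorter than the number of instances is reused cyclically.

```python
from typing import Callable, List, Optional, Union

# Hypothetical type for fetch_all's `proxy` argument (callable signature assumed)
ProxySpec = Union[None, str, List[str], Callable[[int], Optional[str]]]


def resolve_proxy(proxy: ProxySpec, instance_index: int) -> Optional[str]:
    """Hypothetical: map a fetch_all-style `proxy` argument to one instance's proxy URL."""
    if proxy is None:
        return None                                  # direct connection
    if isinstance(proxy, str):
        return proxy                                 # same proxy for every instance
    if callable(proxy):
        return proxy(instance_index)                 # factory decides (assumed signature)
    if isinstance(proxy, (list, tuple)):
        if not proxy:
            return None
        return proxy[instance_index % len(proxy)]    # one per instance, cycling
    raise TypeError(f"unsupported proxy spec: {type(proxy).__name__}")


if __name__ == "__main__":
    pool = ["http://proxy-a:8080", "http://proxy-b:8080"]
    print([resolve_proxy(pool, i) for i in range(3)])
```

Resolving the proxy per instance index keeps each browser instance on a stable exit IP for its whole shard, which fits the "one proxy per instance" layout of section 3.4.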
@@ -0,0 +1,38 @@
# -*- coding: utf-8 -*-
"""
cf-killer: automatic Cloudflare 5-second challenge solver + batch page fetcher.

Built on CloakBrowser (an anti-detection browser patched at the Chromium C++ source level).

Usage:
    from cf_killer import fetch_all, fetch_url, CFPageFetcher

    # Batch fetching
    results = fetch_all(["https://example.com", ...], headless=True)

    # Single-page fetching
    result = fetch_url("https://example.com")

    # Async context manager
    async with CFPageFetcher(headless=True) as fetcher:
        results = await fetcher.fetch_batch(urls)
        await fetcher.download_file(pdf_url, "output.pdf")
"""

from .core import (
    CFSolver,
    CFPageFetcher,
    fetch_all,
    fetch_urls,
    fetch_url,
)

__all__ = [
    "CFSolver",
    "CFPageFetcher",
    "fetch_all",
    "fetch_urls",
    "fetch_url",
]

__version__ = "0.1.0"