never-primp 1.2.2__cp38-abi3-macosx_11_0_arm64.whl
- never_primp/__init__.py +653 -0
- never_primp/never_primp.abi3.so +0 -0
- never_primp/never_primp.pyi +591 -0
- never_primp/py.typed +0 -0
- never_primp-1.2.2.dist-info/METADATA +896 -0
- never_primp-1.2.2.dist-info/RECORD +8 -0
- never_primp-1.2.2.dist-info/WHEEL +4 -0
- never_primp-1.2.2.dist-info/licenses/LICENSE +21 -0
@@ -0,0 +1,896 @@
Metadata-Version: 2.4
Name: never_primp
Version: 1.2.2
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: Implementation :: CPython
Classifier: Programming Language :: Rust
Classifier: Topic :: Internet :: WWW/HTTP
Classifier: Topic :: Software Development :: Libraries :: Python Modules
License-File: LICENSE
Summary: A request library re-optimized and adjusted from the original primp - The fastest Python HTTP client that can impersonate web browsers
Keywords: requests,httpx,http,http-client,tls-fingerprint,ja3,ja4,impersonate,browser-impersonation,web-scraping,crawler,reverse-engineering
Author: Neverland
Requires-Python: >=3.8
Description-Content-Type: text/markdown; charset=UTF-8; variant=GFM
Project-URL: Homepage, https://github.com/Neverland/never_primp
Project-URL: Repository, https://github.com/Neverland/never_primp
Project-URL: Bug Tracker, https://github.com/Neverland/never_primp/issues

<div align="center">

# 🪞 NEVER_PRIMP

**The author of the original primp project has not maintained or updated it for a long time, so this is a refactored and maintained version based on primp**

**The ultimate Python HTTP client, built for web scraping and browser impersonation**

[PyPI](https://pypi.org/project/never-primp)
[License](LICENSE)
[Rust](https://www.rust-lang.org)

*A lightning-fast HTTP client built on Rust, designed for web scraping, anti-bot bypass, and perfect browser impersonation*

[简体中文](README_CN.md) | [English](README.md)

[Installation](#-installation) •
[Core Features](#-core-features) •
[Quick Start](#-quick-start) •
[Documentation](#-documentation) •
[Examples](#-examples)

</div>

---

## 🎯 What is NEVER_PRIMP?

**NEVER_PRIMP** (**P**ython **R**equests **IMP**ersonate) is a cutting-edge HTTP client library that combines:

- ⚡ **Extreme speed**: built on the Rust `wreq` library, with zero-copy parsing
- 🎭 **Perfect browser impersonation**: mimics the TLS/JA3/JA4 fingerprints of Chrome, Firefox, Safari, and Edge
- 🛡️ **Anti-bot bypass**: advanced features for getting past WAFs, Cloudflare, and bot detection
- 🔧 **Production ready**: connection pooling, retries, cookies, streaming, and more

### Why choose NEVER_PRIMP?

| Feature | NEVER_PRIMP | requests | httpx | curl-cffi |
|------|-------------|----------|-------|-----------|
| **Speed** | ⚡⚡⚡ | ⚡ | ⚡⚡ | ⚡⚡ |
| **Browser impersonation** | ✅ Full | ❌ | ❌ | ✅ Limited |
| **Header order control** | ✅ | ❌ | ❌ | ❌ |
| **Cookie splitting (HTTP/2)** | ✅ | ❌ | ❌ | ❌ |
| **Connection pooling** | ✅ | ✅ | ✅ | ❌ |
| **Async support** | ✅ | ❌ | ✅ | ❌ |
| **Native TLS** | ✅ | ❌ | ❌ | ✅ |


## 🚀 HTTP performance comparison (test URL: https://www.baidu.com)
Benchmark code: [benchmark.py](benchmark.py)

| | requests_go | curl_cffi | tls_client | requests | never_primp | primp | aiohttp | httpx |
|------|-------------|----------|-------|-----------|---|---|---|---|
| **Single request** | 347.49ms | 122.45ms | 162.29ms | 646.89ms | 85.91ms | 102.18ms | 74.90ms | 90.43ms |
| **10 requests in a for loop** | 315.79ms | 46.66ms | 21.81ms | 655.92ms | 19.45ms | 20.96ms | 21.42ms | 20.10ms |
| **TLS** (single minus loop) | 31.70ms | 75.78ms | 140.48ms | ≈0 (reused or cached) | 66.46ms | 81.23ms | 53.47ms | 70.33ms |
| **Response size** | 2443B | 628128B | 227B | 2443B | 28918B | 28918B | 29506B | 29506B |
| **Concurrent: 100 tasks, 4 workers** | 589.13ms | 56.46ms | 58.33ms | 696.74ms | 20.16ms | 20.66ms | 20.95ms | 23.18ms |

---

## 📦 Installation

```bash
pip install -U never-primp
```

### Platform support

Precompiled binary wheels are provided for:
- 🐧 **Linux**: x86_64, aarch64, armv7 (manylinux_2_34+)
- 🐧 **Linux (musl)**: x86_64, aarch64
- 🪟 **Windows**: x86_64
- 🍏 **macOS**: x86_64, ARM64 (Apple Silicon)

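To check which prebuilt wheel matches a platform from the list above, it can help to split a wheel filename into its tags. The sketch below is illustrative only: `parse_wheel_filename` and `WheelTags` are hypothetical helpers, and the filename is written in the normalized `name-version-python-abi-platform.whl` form, which is an assumption based on the macOS ARM64 wheel this diff describes.

```python
from typing import NamedTuple


class WheelTags(NamedTuple):
    distribution: str
    version: str
    python_tag: str
    abi_tag: str
    platform_tag: str


def parse_wheel_filename(filename: str) -> WheelTags:
    """Split a wheel filename of the form name-version-python-abi-platform.whl."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel filename: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) != 5:
        # Wheels with an optional build tag have six parts; this sketch skips them.
        raise ValueError(f"expected 5 dash-separated parts, got {len(parts)}")
    return WheelTags(*parts)


# Assumed normalized filename for the macOS ARM64 wheel.
tags = parse_wheel_filename("never_primp-1.2.2-cp38-abi3-macosx_11_0_arm64.whl")
print(tags.python_tag, tags.abi_tag, tags.platform_tag)
```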
---

## ✨ Core Features

### 🚀 Performance optimizations ⚡ New

<details>
<summary><b>Click to expand</b></summary>

#### Core performance optimizations (v1.2.0+)

**NEVER_PRIMP** has implemented multiple layers of performance optimization to deliver industry-leading performance:

##### 1. **Lazy client rebuild** 🆕
A dirty-flag mechanism rebuilds the client only when necessary:
- Configuration changes do not trigger an immediate rebuild (zero overhead)
- The rebuild happens only when the first request is made (lazy construction)
- **Performance gain**: configuration operations are **99.9%** faster, with a **30-40%** overall improvement

```python
import never_primp as primp

client = primp.Client()
# Fast configuration changes (no rebuild overhead)
for i in range(100):
    client.headers[f'X-Header-{i}'] = f'value-{i}'  # ~5ms in total
# Before the optimization: ~200ms (every change triggered a rebuild)
```

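The dirty-flag idea described above can be illustrated in pure Python. This is a minimal sketch under stated assumptions, not the library's Rust implementation: `LazyClient`, `set_header`, and `rebuild_count` are hypothetical names, and the "rebuild" is a stand-in for whatever costly construction the real client performs.

```python
from typing import Dict, Tuple


class LazyClient:
    """Toy client that defers an expensive rebuild until a request needs it."""

    def __init__(self) -> None:
        self.headers: Dict[str, str] = {}
        self._dirty = True  # nothing has been built yet
        self._built: Tuple[Tuple[str, str], ...] = ()
        self.rebuild_count = 0

    def set_header(self, name: str, value: str) -> None:
        self.headers[name] = value
        self._dirty = True  # cheap: only mark the configuration as stale

    def _rebuild(self) -> None:
        # Stand-in for the costly step (for example, building a connection pool).
        self._built = tuple(self.headers.items())
        self.rebuild_count += 1
        self._dirty = False

    def request(self, url: str) -> Tuple[str, int]:
        if self._dirty:
            self._rebuild()
        return url, len(self._built)


client = LazyClient()
for i in range(100):
    client.set_header(f"X-Header-{i}", f"value-{i}")
client.request("https://example.com")
client.request("https://example.com")
print(client.rebuild_count)  # 100 changes and 2 requests cause a single rebuild
```

The trade-off in this design is that the first request after a configuration change pays the rebuild cost, while the changes themselves stay cheap.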
##### 2. **Smart memory management** 🆕
Reduces unnecessary memory allocation and copying:
- Zero-copy body transfer
- Pre-allocated capacity to avoid reallocation
- A smarter header merging strategy
- **Performance gain**: **50%** fewer memory allocations, a **10-15%** improvement

##### 3. **RwLock concurrency optimization** 🆕
A read-write lock replaces the mutex to improve concurrent performance:
- Reads run concurrently (without blocking one another)
- Writes get exclusive access (to guarantee safety)
- **Performance gain**: **5-10%** single-threaded, **20-30%** multi-threaded

```python
from concurrent.futures import ThreadPoolExecutor

import never_primp as primp

urls = ["https://httpbin.org/get"] * 4  # example URLs

client = primp.Client()
with ThreadPoolExecutor(max_workers=4) as executor:
    # Concurrent configuration reads do not block one another
    futures = [executor.submit(client.get, url) for url in urls]
```

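For readers unfamiliar with read-write locks, the sketch below shows the semantics described above (many concurrent readers, one exclusive writer) using only the Python standard library. It is an illustration, not the library's Rust `RwLock`; `ReadWriteLock` and its attributes are hypothetical names, and this simple variant does not protect writers from starvation.

```python
import threading


class ReadWriteLock:
    """Minimal readers-writer lock: shared reads, exclusive writes."""

    def __init__(self) -> None:
        self._cond = threading.Condition()
        self.readers = 0
        self.writer_active = False

    def acquire_read(self) -> None:
        with self._cond:
            while self.writer_active:
                self._cond.wait()
            self.readers += 1

    def release_read(self) -> None:
        with self._cond:
            self.readers -= 1
            if self.readers == 0:
                self._cond.notify_all()  # wake a waiting writer, if any

    def acquire_write(self) -> None:
        with self._cond:
            while self.writer_active or self.readers > 0:
                self._cond.wait()
            self.writer_active = True

    def release_write(self) -> None:
        with self._cond:
            self.writer_active = False
            self._cond.notify_all()


lock = ReadWriteLock()
lock.acquire_read()
lock.acquire_read()    # a second reader is admitted immediately
print(lock.readers)    # prints 2
lock.release_read()
lock.release_read()
lock.acquire_write()   # succeeds because no readers remain
lock.release_write()
```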
##### 4. **Connection pooling and TCP optimization**
Efficient connection reuse and network tuning:
- **Connection pool**: connection reuse with a configurable idle timeout
- **TCP optimization**: TCP_NODELAY + TCP keepalive for lower latency
- **Zero-copy parsing**: efficient memory handling in Rust
- **HTTP/2 multiplexing**: multiple requests over a single connection

```python
import never_primp as primp

client = primp.Client(
    pool_idle_timeout=90.0,        # keep idle connections for 90 seconds
    pool_max_idle_per_host=10,     # at most 10 idle connections per host
    tcp_nodelay=True,              # disable Nagle's algorithm
    tcp_keepalive=60.0,            # TCP keepalive every 60 seconds
)
```

#### Overall performance gains

| Scenario | Improvement |
|------|---------|
| Frequent configuration changes | **+97.5%** |
| Single-threaded requests | **+45-65%** |
| Multi-threaded concurrency (4 threads) | **+60-85%** |
| Connection reuse | **+59%** vs requests |

</details>

### 🎭 Advanced browser impersonation

<details>
<summary><b>Click to expand</b></summary>

Perfect fingerprint emulation:

- **Chrome** (100-141): full TLS/HTTP2 fingerprints for the latest versions
- **Safari** (15.3-26): iOS, iPadOS, and macOS variants
- **Firefox** (109-143): desktop versions
- **Edge** (101-134): Chromium-based
- **OkHttp** (3.9-5.0): Android application library

```python
import never_primp as primp

client = primp.Client(
    impersonate="chrome_141",      # browser version
    impersonate_os="windows"       # operating system: windows, macos, linux, android, ios
)
```

What is emulated:
- ✅ TLS fingerprint (JA3/JA4)
- ✅ HTTP/2 fingerprint (AKAMAI)
- ✅ Header order and casing
- ✅ Cipher suites
- ✅ Extension order

</details>

### 🛡️ Anti-bot bypass features

<details>
<summary><b>Click to expand</b></summary>

#### 1. **Ordered headers** 🆕
Preserves the exact header order to get past detection systems that inspect the header sequence:

```python
import never_primp as primp

client = primp.Client(
    headers={
        "user-agent": "Mozilla/5.0...",
        "accept": "text/html,application/xhtml+xml",
        "accept-language": "en-US,en;q=0.9",
        "accept-encoding": "gzip, deflate, br",
        "sec-fetch-dest": "document",
        "sec-fetch-mode": "navigate",
    }
)
```

**Use case**: sites that check header order (Cloudflare, Akamai, and others)

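Because Python dicts keep insertion order, the order in which headers are written into the dict is the order a client can preserve. The helper below is a hedged sketch of arranging an existing header dict to follow a template order before passing it on; `order_headers` and `CHROME_LIKE_ORDER` are hypothetical names, and the template is an illustration rather than a verified Chrome header order.

```python
from typing import Dict, List


def order_headers(headers: Dict[str, str], order: List[str]) -> Dict[str, str]:
    """Return a new dict whose keys follow `order`; other keys are appended in their original order."""
    lowered = {name.lower(): (name, value) for name, value in headers.items()}
    ordered: Dict[str, str] = {}
    for name in order:
        if name.lower() in lowered:
            original_name, value = lowered.pop(name.lower())
            ordered[original_name] = value
    # Append headers that the template does not mention.
    for original_name, value in lowered.values():
        ordered[original_name] = value
    return ordered


# Hypothetical template order, for illustration only.
CHROME_LIKE_ORDER = ["user-agent", "accept", "accept-language", "accept-encoding"]

headers = order_headers(
    {"Accept-Encoding": "gzip", "X-Trace": "1", "User-Agent": "Mozilla/5.0...", "Accept": "text/html"},
    CHROME_LIKE_ORDER,
)
print(list(headers))
```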
#### 2. **Cookie splitting (HTTP/2)** 🆕
Sends cookies as separate headers, the way real browsers do:

```python
import never_primp as primp

client = primp.Client(
    split_cookies=True,  # send cookies HTTP/2-style
    http2_only=True
)

# Sent as:
# cookie: session_id=abc123
# cookie: user_token=xyz789
# cookie: preference=dark_mode

# Instead of:
# Cookie: session_id=abc123; user_token=xyz789; preference=dark_mode
```

**Use case**: precise HTTP/2 browser emulation for anti-bot bypass

📖 [Full documentation](SPLIT_COOKIES.md)

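The transformation itself is simple: HTTP/2 permits a single `Cookie` value to be split into one field per cookie pair. The function below is a minimal pure-Python sketch of that split, assuming the pairs are separated by `;`; `split_cookie_header` is a hypothetical helper and not part of the never_primp API.

```python
from typing import List, Tuple


def split_cookie_header(cookie_header: str) -> List[Tuple[str, str]]:
    """Turn one 'a=1; b=2' Cookie value into separate ('cookie', 'a=1') header fields."""
    fields = []
    for pair in cookie_header.split(";"):
        pair = pair.strip()
        if pair:  # skip empty segments such as a trailing ';'
            fields.append(("cookie", pair))
    return fields


fields = split_cookie_header("session_id=abc123; user_token=xyz789; preference=dark_mode")
for name, value in fields:
    print(f"{name}: {value}")
```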
#### 3. **Dynamic configuration**
Change client behavior without recreating the client:

```python
import never_primp as primp

client = primp.Client(impersonate="chrome_140")

# Switch impersonation on the fly
client.impersonate = "safari_18"
client.impersonate_os = "macos"

# Update headers
client.headers = {...}  # placeholder: supply your own header dict
client.headers_update({"Referer": "https://example.com"})

# Change the proxy
client.proxy = "socks5://127.0.0.1:1080"
```

</details>

### 🍪 Smart cookie management

<details>
<summary><b>Click to expand</b></summary>

#### Automatic cookie persistence
```python
import never_primp as primp

client = primp.Client(cookie_store=True)  # enabled by default

# Cookies are stored and sent automatically
resp1 = client.get("https://example.com/login")
resp2 = client.get("https://example.com/dashboard")  # cookies are included automatically
```

#### Dict-like cookie interface (requests style)
```python
# Access the cookie jar
cookies = client.cookies

# Set cookies (dict-style)
cookies["session_id"] = "abc123"
cookies.update({"user_token": "xyz789"})

# Get cookies
session_id = cookies.get("session_id")
all_cookies = dict(cookies)  # all cookies as a dict

# Delete cookies
del cookies["session_id"]
cookies.clear()  # remove everything
```

#### Manual cookie control
```python
# Set cookies for a specific URL
client.set_cookies(
    url="https://example.com",
    cookies={"session": "abc123", "user_id": "456"}
)

# Get all cookies for a specific URL
cookies = client.get_cookies(url="https://example.com")

# Per-request cookies (temporary, not stored)
resp = client.get("https://example.com", cookies={"temp": "value"})
```

</details>

### 🔒 Certificate management

<details>
<summary><b>Click to expand</b></summary>

- **System certificate store**: updated automatically with the operating system (no more expired-certificate problems!)
- **Custom CA bundle**: supports corporate proxies

```python
import never_primp as primp

# Use system certificates (default)
client = primp.Client(verify=True)

# Custom CA bundle
client = primp.Client(ca_cert_file="/path/to/cacert.pem")

# Environment variable (set in the shell, not in Python):
# export PRIMP_CA_BUNDLE="/path/to/cert.pem"
```

</details>

### 🔄 HTTP version control

<details>
<summary><b>Click to expand</b></summary>

Control which HTTP protocol version is used:

```python
import never_primp as primp

# Force HTTP/1.1
client = primp.Client(http1_only=True)

# Force HTTP/2
client = primp.Client(http2_only=True)

# Auto-negotiate (default)
client = primp.Client()  # picks the best available version

# Priority: http1_only > http2_only > auto
```

**Use cases**:
- `http1_only=True`: legacy servers, debugging, specific compatibility needs
- `http2_only=True`: modern APIs, performance optimization
- Default: best compatibility

</details>

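The stated priority (`http1_only` over `http2_only` over automatic negotiation) matters when both flags are set at once. The function below is a small sketch that mirrors that documented rule; `resolve_http_version` is a hypothetical helper written for illustration and is not how the library resolves the flags internally.

```python
def resolve_http_version(http1_only: bool = False, http2_only: bool = False) -> str:
    """Mirror the documented priority: http1_only > http2_only > auto."""
    if http1_only:
        return "HTTP/1.1"  # wins even if http2_only is also set
    if http2_only:
        return "HTTP/2"
    return "auto"  # let the connection negotiate


for flags in [(True, True), (False, True), (False, False)]:
    print(flags, "->", resolve_http_version(*flags))
```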
### 🌊 Streaming responses

<details>
<summary><b>Click to expand</b></summary>

Stream large responses efficiently:

```python
import never_primp as primp

client = primp.Client()
resp = client.get("https://example.com/large-file.zip")

for chunk in resp.stream():
    process_chunk(chunk)  # process_chunk is your own handler
```

</details>

### ⚡ Async support

<details>
<summary><b>Click to expand</b></summary>

Full async/await support through `AsyncClient`:

```python
import asyncio
import never_primp as primp

async def fetch(url):
    async with primp.AsyncClient(impersonate="chrome_141") as client:
        return await client.get(url)

async def main():
    urls = ["https://site1.com", "https://site2.com", "https://site3.com"]
    tasks = [fetch(url) for url in urls]
    results = await asyncio.gather(*tasks)

asyncio.run(main())
```

</details>

---

## 🚀 Quick Start

### Basic usage

```python
import never_primp as primp

# Simple GET request
client = primp.Client()
response = client.get("https://httpbin.org/get")
print(response.text)

# With browser impersonation
client = primp.Client(impersonate="chrome_141", impersonate_os="windows")
response = client.get("https://tls.peet.ws/api/all")
print(response.json())
```

### Perfect browser emulation

```python
import never_primp as primp

# Full browser emulation for anti-bot bypass
client = primp.Client(
    # Browser impersonation
    impersonate="chrome_141",
    impersonate_os="windows",

    # Advanced anti-detection
    headers={
        "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
        "sec-ch-ua": '"Chromium";v="141", "Not?A_Brand";v="8"',
        "sec-ch-ua-mobile": "?0",
        "sec-ch-ua-platform": '"Windows"',
        "accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
        "sec-fetch-site": "none",
        "sec-fetch-mode": "navigate",
        "sec-fetch-user": "?1",
        "sec-fetch-dest": "document",
        "accept-encoding": "gzip, deflate, br",
        "accept-language": "en-US,en;q=0.9",
    },
    split_cookies=True,  # HTTP/2-style cookies

    # Performance tuning
    pool_idle_timeout=90.0,
    pool_max_idle_per_host=10,
    tcp_nodelay=True,

    # HTTP version control
    http2_only=True,  # force HTTP/2 for better performance
    timeout=30,
)

# Use it like any other HTTP client
response = client.get("https://difficult-site.com")
```

---

## 📚 Documentation

### Core documentation

- [**Cookie splitting guide**](SPLIT_COOKIES.md) - handle HTTP/2 cookies the way real browsers do

### Quick reference

<details>
<summary><b>Client parameters</b></summary>

```python
Client(
    # Authentication
    auth: tuple[str, str | None] | None = None,
    auth_bearer: str | None = None,

    # Headers and cookies
    headers: dict[str, str] | None = None,  # 🆕 ordered headers
    cookie_store: bool = True,
    split_cookies: bool = False,  # 🆕 HTTP/2 cookie splitting

    # Browser impersonation
    impersonate: str | None = None,  # chrome_141, safari_18, etc.
    impersonate_os: str | None = None,  # windows, macos, linux, etc.

    # Network settings
    proxy: str | None = None,
    timeout: float = 30,
    verify: bool = True,
    ca_cert_file: str | None = None,

    # HTTP configuration
    http1_only: bool = False,  # 🆕 force HTTP/1.1
    http2_only: bool = False,  # force HTTP/2
    https_only: bool = False,
    follow_redirects: bool = True,
    max_redirects: int = 20,
    referer: bool = True,

    # Performance tuning
    pool_idle_timeout: float | None = None,
    pool_max_idle_per_host: int | None = None,
    tcp_nodelay: bool | None = None,
    tcp_keepalive: float | None = None,

    # Query parameters
    params: dict[str, str] | None = None,
)
```

</details>

<details>
<summary><b>Request methods</b></summary>

```python
# HTTP methods
client.get(url, **kwargs)
client.post(url, **kwargs)
client.put(url, **kwargs)
client.patch(url, **kwargs)
client.delete(url, **kwargs)
client.head(url, **kwargs)
client.options(url, **kwargs)

# Common parameters
params: dict[str, str] | None = None,
headers: dict[str, str] | None = None,  # 🆕
cookies: dict[str, str] | None = None,
auth: tuple[str, str | None] | None = None,
auth_bearer: str | None = None,
timeout: float | None = None,

# POST/PUT/PATCH-specific parameters
content: bytes | None = None,
data: dict[str, Any] | None = None,
json: Any | None = None,
files: dict[str, str] | None = None,
```

</details>

<details>
<summary><b>Response object</b></summary>

```python
response.status_code        # HTTP status code
response.headers            # response headers
response.cookies            # response cookies
response.url                # final URL (after redirects)
response.encoding           # content encoding

# Body access
response.text               # text content
response.content            # binary content
response.json()             # parse JSON
response.stream()           # stream the response body

# HTML conversion
response.text_markdown      # HTML → Markdown
response.text_plain         # HTML → plain text
response.text_rich          # HTML → rich text
```

</details>

<details>
<summary><b>Supported browsers</b></summary>

#### Chrome (100-141)
`chrome_100`, `chrome_101`, `chrome_104`, `chrome_105`, `chrome_106`, `chrome_107`, `chrome_108`, `chrome_109`, `chrome_114`, `chrome_116`, `chrome_117`, `chrome_118`, `chrome_119`, `chrome_120`, `chrome_123`, `chrome_124`, `chrome_126`, `chrome_127`, `chrome_128`, `chrome_129`, `chrome_130`, `chrome_131`, `chrome_133`, `chrome_134`, `chrome_135`, `chrome_136`, `chrome_137`, `chrome_138`, `chrome_139`, `chrome_140`, `chrome_141`

#### Safari (15.3-26)
`safari_15.3`, `safari_15.5`, `safari_15.6.1`, `safari_16`, `safari_16.5`, `safari_17.0`, `safari_17.2.1`, `safari_17.4.1`, `safari_17.5`, `safari_18`, `safari_18.2`, `safari_26`, `safari_ios_16.5`, `safari_ios_17.2`, `safari_ios_17.4.1`, `safari_ios_18.1.1`, `safari_ios_26`, `safari_ipad_18`, `safari_ipad_26`

#### Firefox (109-143)
`firefox_109`, `firefox_117`, `firefox_128`, `firefox_133`, `firefox_135`, `firefox_136`, `firefox_139`, `firefox_142`, `firefox_143`

#### Edge (101-134)
`edge_101`, `edge_122`, `edge_127`, `edge_131`, `edge_134`

#### OkHttp (3.9-5.0)
`okhttp_3.9`, `okhttp_3.11`, `okhttp_3.13`, `okhttp_3.14`, `okhttp_4.9`, `okhttp_4.10`, `okhttp_5`

#### Supported operating systems
`windows`, `macos`, `linux`, `android`, `ios`

</details>

---

## 💡 Examples

### Example 1: Web scraping with anti-bot bypass

```python
import never_primp as primp

# Perfect browser emulation
client = primp.Client(
    impersonate="chrome_141",
    impersonate_os="windows",
    headers={
        "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
        "accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
        "accept-language": "en-US,en;q=0.9",
        "accept-encoding": "gzip, deflate, br",
    },
    split_cookies=True,
)

response = client.get("https://difficult-site.com")
print(response.status_code)
```

### Example 2: API integration with authentication

```python
import never_primp as primp

client = primp.Client(
    headers={
        "Content-Type": "application/json",
        "X-API-Version": "v1",
    },
    auth_bearer="your-api-token",
    timeout=30,
)

# GET request
data = client.get("https://api.example.com/users").json()

# POST request
response = client.post(
    "https://api.example.com/users",
    json={"name": "John", "email": "john@example.com"}
)
```

### Example 3: File upload

```python
import never_primp as primp

client = primp.Client()

# Map each form field name to the path of the file to upload
files = {
    'document': '/path/to/document.pdf',
    'image': '/path/to/image.png'
}

response = client.post(
    "https://example.com/upload",
    files=files,
    data={"description": "My files"}
)
```

### Example 4: Session management

```python
import never_primp as primp

# Automatic cookie persistence
client = primp.Client(cookie_store=True)

# Log in
client.post(
    "https://example.com/login",
    data={"username": "user", "password": "pass"}
)

# Later requests include the session cookies automatically
profile = client.get("https://example.com/profile")
```

### Example 5: Using proxies

```python
import os

import never_primp as primp

# SOCKS5 proxy
client = primp.Client(proxy="socks5://127.0.0.1:1080")

# HTTP proxy with authentication
client = primp.Client(proxy="http://user:pass@proxy.example.com:8080")

# Environment variable
os.environ['PRIMP_PROXY'] = 'http://127.0.0.1:8080'
```

### Example 6: Concurrent async requests

```python
import asyncio
import never_primp as primp

async def fetch_all(urls):
    async with primp.AsyncClient(impersonate="chrome_141") as client:
        tasks = [client.get(url) for url in urls]
        responses = await asyncio.gather(*tasks)
        return [r.text for r in responses]

urls = ["https://site1.com", "https://site2.com", "https://site3.com"]
results = asyncio.run(fetch_all(urls))
```

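When the URL list is long, launching every request at once can overwhelm the target server or trip rate limits. A common pattern is to cap the number of in-flight requests with a semaphore. The sketch below assumes nothing about never_primp: `gather_limited` is a hypothetical helper, and `fake_fetch` is a stand-in for `client.get(url)` so the example runs without network access.

```python
import asyncio
from typing import Awaitable, Callable, List


async def gather_limited(
    fetch: Callable[[str], Awaitable[str]], urls: List[str], limit: int
) -> List[str]:
    """Run fetch(url) for every URL with at most `limit` calls in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def bounded(url: str) -> str:
        async with semaphore:
            return await fetch(url)

    # gather preserves input order regardless of completion order.
    return await asyncio.gather(*(bounded(url) for url in urls))


async def fake_fetch(url: str) -> str:
    """Hypothetical stand-in for `client.get(url)`; performs no network access."""
    await asyncio.sleep(0)
    return f"body of {url}"


urls = [f"https://example.com/page/{i}" for i in range(5)]
results = asyncio.run(gather_limited(fake_fetch, urls, limit=2))
print(results[0])
```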
### Example 7: Streaming a large file

```python
import never_primp as primp

client = primp.Client()

response = client.get("https://example.com/large-file.zip")

with open("output.zip", "wb") as f:
    for chunk in response.stream():
        f.write(chunk)
```

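The write loop above can be factored into a small reusable helper that also reports how many bytes were written, which is handy for progress logging or size checks. This is a sketch under stated assumptions: `save_stream` is a hypothetical helper, and `fake_chunks` stands in for the chunks yielded by `response.stream()` so the example runs offline.

```python
import io
from typing import BinaryIO, Iterable


def save_stream(chunks: Iterable[bytes], sink: BinaryIO) -> int:
    """Write chunks to `sink` one at a time and return the total byte count."""
    total = 0
    for chunk in chunks:
        if not chunk:  # skip empty chunks
            continue
        sink.write(chunk)
        total += len(chunk)
    return total


# Hypothetical chunks standing in for `response.stream()`.
fake_chunks = [b"PK\x03\x04", b"", b"payload"]
buffer = io.BytesIO()
written = save_stream(fake_chunks, buffer)
print(written)
```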
---

## 🎯 Use Cases

### ✅ Perfect for

- **Web scraping**: getting past anti-bot systems (Cloudflare, Akamai, PerimeterX)
- **API testing**: a high-performance API client with retries
- **Data collection**: concurrent requests with connection pooling
- **Security research**: TLS fingerprint analysis and testing
- **Browser automation alternative**: lighter than Selenium/Playwright

### ⚠️ Not suitable for

- **JavaScript rendering**: use Playwright/Selenium for dynamic content
- **Browser automation**: no DOM manipulation or JavaScript execution
- **Visual testing**: no screenshot or rendering capabilities

---

## 🔬 Benchmarks

### Performance optimization results (v1.2.0+)

| Scenario | Before | After (v1.2.0) | Improvement |
|------|--------|-----------------|------|
| **Frequent configuration changes** (100 header sets) | 200ms | 5ms | **+3900%** 🚀 |
| **Single-threaded sequential requests** | Baseline | Optimized | **+45-65%** |
| **Multi-threaded concurrency** (4 threads) | Baseline | Optimized | **+60-85%** |

### Comparison with other libraries

#### Sequential requests (connection reuse)

| Library | Time (10 requests) | Relative speed |
|---------|-------------------|----------------|
| **never_primp v1.2** | **0.85s** | **1.00x** (baseline) ⚡ |
| never_primp v1.1 | 1.24s | 0.69x (slower) |
| httpx | 1.89s | 0.45x (slower) |
| requests | 3.05s | 0.28x (slower) |

#### Concurrent requests (AsyncClient)

| Library | Time (100 requests) | Relative speed |
|---------|---------------------|----------------|
| **never_primp v1.2** | **1.30s** | **1.00x** (baseline) ⚡ |
| never_primp v1.1 | 2.15s | 0.60x (slower) |
| httpx | 2.83s | 0.46x (slower) |
| aiohttp | 2.45s | 0.53x (slower) |

#### Configuration change performance

| Operation | never_primp v1.2 | never_primp v1.1 | Improvement |
|------|------------------|------------------|------|
| 100 header sets | **5ms** | 200ms | **40x faster** ⚡ |
| Changing proxy settings | **<0.01ms** | ~2ms | **200x faster** |
| Switching browser impersonation | **<0.01ms** | ~2ms | **200x faster** |

*Benchmark environment: Python 3.11, Ubuntu 22.04, AMD Ryzen 9 5900X*
*All tests used the same network conditions and target server*

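The same 200ms to 5ms measurement is reported in three ways in this README ("40x faster", "+3900%", and "+97.5%"). One reading, which is an assumption rather than something the author states, is that these are a speedup factor, a throughput increase, and a time reduction respectively. The sketch below reproduces the figures under that reading, along with the "relative speed" column; all function names are hypothetical.

```python
def relative_speed(baseline_seconds: float, other_seconds: float) -> float:
    """Speed of `other` relative to the baseline (1.0 means equally fast)."""
    return baseline_seconds / other_seconds


def speedup_factor(before_ms: float, after_ms: float) -> float:
    """How many times faster the optimized run is."""
    return before_ms / after_ms


def throughput_gain_percent(before_ms: float, after_ms: float) -> float:
    """Extra work done per unit of time, as a percentage."""
    return (speedup_factor(before_ms, after_ms) - 1) * 100


def time_saved_percent(before_ms: float, after_ms: float) -> float:
    """Share of the original time that was eliminated."""
    return (before_ms - after_ms) * 100 / before_ms


print(round(relative_speed(0.85, 1.24), 2))  # never_primp v1.1 row, sequential table
print(speedup_factor(200, 5))                # the "40x faster" figure
print(throughput_gain_percent(200, 5))       # the "+3900%" figure
print(time_saved_percent(200, 5))            # the "+97.5%" figure
```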
---

## 🛠️ Development

### Building from source

```bash
# Clone the repository
git clone https://github.com/yourusername/never-primp.git
cd never-primp

# Create a virtual environment
python -m venv venv
source venv/bin/activate  # Linux/macOS
# or
venv\Scripts\activate  # Windows

# Install maturin (the Rust-Python build tool)
pip install maturin

# Build and install in development mode
maturin develop --release

# Run an example
python examples/example_headers.py
```

### Project structure

```
never-primp/
├── src/
│   ├── lib.rs              # main Rust implementation
│   ├── traits.rs           # header conversion traits
│   ├── response.rs         # response handling
│   ├── impersonate.rs      # browser impersonation
│   └── utils.rs            # certificate utilities
├── never_primp/
│   ├── __init__.py         # Python API wrapper
│   └── never_primp.pyi     # type hints
├── examples/
│   ├── example_headers.py
│   └── example_split_cookies.py
├── Cargo.toml              # Rust dependencies
└── pyproject.toml          # Python package configuration
```

---

## 🤝 Contributing

Contributions are welcome! Feel free to submit a Pull Request.

### Development guidelines

1. Follow Rust best practices (files under src/)
2. Maintain Python 3.8+ compatibility
3. Add tests for new features
4. Update the documentation

---

## 📄 License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

---

## ⚠️ Disclaimer

This tool is intended only for **educational purposes** and **legitimate use cases**, such as:
- Testing your own applications
- Academic research
- Security audits (with permission)
- Collecting data from public APIs

**Important**:
- Respect each website's `robots.txt` and terms of service
- Do not use it for malicious purposes or unauthorized access
- Be mindful of rate limits and server resources
- The author is not responsible for misuse of this tool

Please use it responsibly and ethically. 🙏

---

## 🙏 Acknowledgments

Built on:
- [wreq](https://github.com/0x676e67/wreq) - Rust HTTP client with browser impersonation
- [PyO3](https://github.com/PyO3/pyo3) - Rust bindings for Python
- [tokio](https://tokio.rs/) - Rust async runtime

Inspired by:
- [curl-impersonate](https://github.com/lwthiker/curl-impersonate)
- [httpx](https://github.com/encode/httpx)
- [requests](https://github.com/psf/requests)
- [primp](https://github.com/deedy5/primp)

---

<div align="center">

**Made with ❤️ and ⚙️ Rust**

If you find this project helpful, please give it a ⭐!

</div>
