NullGazeX 2.1.0__tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- nullgazex-2.1.0/LICENSE +21 -0
- nullgazex-2.1.0/NullGazeX.egg-info/PKG-INFO +325 -0
- nullgazex-2.1.0/NullGazeX.egg-info/SOURCES.txt +13 -0
- nullgazex-2.1.0/NullGazeX.egg-info/dependency_links.txt +1 -0
- nullgazex-2.1.0/NullGazeX.egg-info/requires.txt +13 -0
- nullgazex-2.1.0/NullGazeX.egg-info/top_level.txt +1 -0
- nullgazex-2.1.0/PKG-INFO +325 -0
- nullgazex-2.1.0/README.md +296 -0
- nullgazex-2.1.0/nullgaze/__init__.py +168 -0
- nullgazex-2.1.0/nullgaze/downloader.py +697 -0
- nullgazex-2.1.0/nullgaze/exceptions.py +32 -0
- nullgazex-2.1.0/nullgaze/scraper.py +375 -0
- nullgazex-2.1.0/nullgaze/utils.py +138 -0
- nullgazex-2.1.0/pyproject.toml +43 -0
- nullgazex-2.1.0/setup.cfg +4 -0
nullgazex-2.1.0/LICENSE
ADDED
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2026 Batal

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
@@ -0,0 +1,325 @@
Metadata-Version: 2.4
Name: NullGazeX
Version: 2.1.0
Summary: Research tool for studying DPI, TLS fingerprinting, and content-delivery protections via ClientHello splitting and parallel-strategy bypass. For educational and research purposes only.
Author-email: Batal <batal@proton.me>
License-Expression: MIT
Project-URL: Homepage, https://github.com/batal/NullGazeX
Classifier: Programming Language :: Python :: 3
Classifier: Operating System :: OS Independent
Classifier: Topic :: Internet :: WWW/HTTP
Classifier: Topic :: Multimedia :: Graphics
Classifier: Topic :: Scientific/Engineering :: Information Analysis
Classifier: Intended Audience :: Education
Classifier: Intended Audience :: Science/Research
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: requests>=2.25.0
Requires-Dist: curl_cffi>=0.5.0
Provides-Extra: browser
Requires-Dist: DrissionPage>=4.0.0; extra == "browser"
Provides-Extra: scrape
Requires-Dist: beautifulsoup4>=4.9.0; extra == "scrape"
Provides-Extra: full
Requires-Dist: curl_cffi>=0.5.0; extra == "full"
Requires-Dist: DrissionPage>=4.0.0; extra == "full"
Requires-Dist: beautifulsoup4>=4.9.0; extra == "full"
Dynamic: license-file

# NullGazeX — Research Tool for Network Filtering Analysis

> **Important**: NullGazeX is a **research and educational tool** designed
> to study network filtering, Deep Packet Inspection (DPI), TLS
> fingerprinting, and content-delivery protection mechanisms. It is
> **not** intended for unauthorized access to protected content. The
> author assumes **no liability** for any misuse or for any consequences
> arising from its use. By using this software you agree that you are
> solely responsible for complying with all applicable laws and the
> terms of service of any website you interact with.

NullGazeX implements a multi-tier pipeline of bypass strategies — raced
in parallel for **terrifying speed** — to retrieve content through ISP
SNI filters, Cloudflare challenges, TLS fingerprint checks, and
hotlinking protections.

---

## What NullGazeX Does

| Capability | Description |
| :--- | :--- |
| **Image Downloading** | Download images from hosts behind network filters or anti-scraping protections. |
| **Page Scraping** | Retrieve full page HTML, text, titles, or JSON from a URL while presenting browser-like traffic. |
| **DPI Bypass** | ClientHello packet splitting evades ISP SNI filters without external proxies or VPNs. |
| **TLS Impersonation** | Rotates curl\_cffi browser fingerprints (Chrome, Safari, Firefox, Edge) so traffic looks like a real browser. |
| **Parallel Strategy Racing** | All bypass strategies launch concurrently — the fastest wins in fractions of a second. |
| **Adaptive Anti-Blocking** | Per-domain cooldown, jittered delays, session cycling, and fingerprint rotation reduce the risk of server-side rate-limiting. |
| **Stealth Browser Fallback** | Headless Chromium (via DrissionPage) with anti-detection flags for extreme JS challenges. |

---

## How It Works

NullGazeX uses a **local loopback proxy** that intercepts the TLS
ClientHello handshake packet and splits it into two TCP segments with
an 8 ms gap between them. The SNI (Server Name Indication) field is
thus broken across two packets. A DPI firewall that inspects packets
individually, without reassembling the TCP stream, never sees the full SNI, so its check does not fire.

Combined with **rotating TLS fingerprints** (the `curl_cffi` library
impersonates Chrome 110–124, Safari 15–17, Firefox 116–117, and Edge
99–110), the traffic is designed to resemble that of a real browser.

For more details see **[DPI_BYPASS_DOC.md](nullgaze/DPI_BYPASS_DOC.md)**.

---

## Installation

### Standard (image downloading + scraping)
```bash
pip install NullGazeX
```

### With browser support (for extreme JS challenges)
```bash
pip install "NullGazeX[browser]"
```

### Full installation (everything)
```bash
pip install "NullGazeX[full]"
```

Dependencies are resolved automatically:

| Extra | Includes |
| :--- | :--- |
| *(none)* | `requests`, `curl_cffi` |
| `browser` | `DrissionPage` (headless Chromium) |
| `scrape` | `beautifulsoup4` (HTML parsing) |
| `full` | everything above |

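
To see which of the extras listed above are usable in the current environment, one option is to probe for their import names. This is a minimal sketch and not part of NullGazeX: the `EXTRA_MODULES` mapping and the `available_extras` helper are hypothetical, and the import names (`bs4` for `beautifulsoup4`, `DrissionPage` for DrissionPage) are assumptions based on how those packages are normally imported. Calling `available_extras()` with no arguments checks the mapping shown here.

```python
from importlib.util import find_spec

# Hypothetical mapping from extra name to the top-level modules it provides.
EXTRA_MODULES = {
    "core": ["requests", "curl_cffi"],
    "browser": ["DrissionPage"],
    "scrape": ["bs4"],
}


def available_extras(extra_modules=EXTRA_MODULES):
    """Return {extra: bool}; True when every module of that extra is importable."""
    return {
        extra: all(find_spec(name) is not None for name in modules)
        for extra, modules in extra_modules.items()
    }


if __name__ == "__main__":
    for extra, ok in available_extras().items():
        print(f"{extra}: {'installed' if ok else 'missing'}")
```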
---

## Quick Start — Image Downloading

### Single image
```python
from nullgaze import download_image

download_image(
    url="https://example.com/images/photo.jpg",
    output_path="outputs/photo.jpg",
    verbose=True
)
```

### Batch download (concurrent)
```python
from nullgaze import download_images

targets = [
    ("https://example.com/img1.jpg", "outputs/img1.jpg"),
    ("https://example.com/img2.jpg", "outputs/img2.jpg"),
    ("https://example.com/img3.jpg", "outputs/img3.jpg"),
]

results = download_images(targets, max_workers=10, verbose=True)

for i, res in enumerate(results):
    if isinstance(res, Exception):
        print(f"Failed {i+1}: {res}")
    else:
        print(f"Saved to: {res}")
```

---

## Quick Start — Page Scraping

### Scrape a page (raw HTML)
```python
from nullgaze import scrape_page

html = scrape_page("https://example.com/article", verbose=True)
print(html[:500])
```

### Scrape clean text
```python
from nullgaze import scrape_text

text = scrape_text("https://example.com/article")
print(text)
```

### Get page title
```python
from nullgaze import scrape_title

title = scrape_title("https://example.com/article")
print(f"Title: {title}")
```

### Scrape a JSON API
```python
from nullgaze import scrape_json

data = scrape_json("https://api.example.com/data")
print(data)
```

### Scrape multiple URLs in parallel
```python
from nullgaze import scrape_bulk

urls = [
    "https://example.com/page1",
    "https://example.com/page2",
    "https://example.com/page3",
]

results = scrape_bulk(urls, max_workers=10, verbose=True)
for i, res in enumerate(results):
    if isinstance(res, Exception):
        print(f"Failed {urls[i]}: {res}")
    else:
        print(f"Success: {len(res)} chars")
```

---

## Object-Oriented API

### Image Downloader
```python
from nullgaze import ImageDownloader

downloader = ImageDownloader(verbose=True)

custom_headers = {
    "Referer": "https://custom-referer.com/",
    "Cookie": "session=abc123",
}

downloader.download(
    url="https://example.com/protected/image.png",
    output_path="outputs/image.png",
    headers=custom_headers,
    race_timeout=3.0,  # give up after 3s
)
```

### Page Scraper
```python
from nullgaze import PageScraper

scraper = PageScraper(verbose=True)

# All strategies race in parallel
html = scraper.scrape("https://example.com", race_timeout=5.0)

# Get just the title
title = scraper.scrape_title("https://example.com")

# Get clean text
text = scraper.scrape_text("https://example.com")
```

---

## Strategy Pipeline

NullGazeX races these strategies **in parallel** — the first to succeed
wins, the rest are cancelled:

| Tier | Strategy | Speed | Best For |
| :---: | :--- | :---: | :--- |
| 1 | **Direct HTTP** | ⚡⚡⚡ ~0.3s | Unrestricted servers |
| 2 | **DPI-Bypass + TLS Impersonation** | ⚡⚡ ~0.5s | ISP SNI blocks, fingerprint checks |
| 3 | **Stealth Browser (Chromium)** | 🐢 ~3s | Cloudflare Turnstile, JS challenges |
| 4 | **Weserv Proxy** *(images only)* | ⚡⚡ ~1s | General fallback |

The library caches the successful strategy per domain so subsequent
requests to the same domain skip the race and go straight to the
winning strategy.

---

## Anti-Blocking Features

| Mechanism | How It Works |
| :--- | :--- |
| **Fingerprint rotation** | Each request picks a random TLS fingerprint (Chrome 110–124, Safari 15–17, Firefox 116–117, Edge 99–110). |
| **Header rotation** | User-Agent, Accept, Accept-Language, and Sec-Fetch-* headers are randomized per request. |
| **Jittered delays** | 50–200 ms random delay between bulk dispatches to avoid burst detection. |
| **Adaptive cooldown** | When a server returns 429/503, the domain is cooled down with exponential backoff (2s → 4s → 8s → capped at 30s). |
| **Session cycling** | curl\_cffi sessions are cycled every 25 requests to rotate TLS session state. |
| **Domain strategy cache** | Caches which strategy works per domain — avoids re-discovering on every request. |

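
The cooldown and jitter figures in the table can be worked through with a few lines of arithmetic. This is a sketch under stated assumptions rather than the library's own code: `cooldown_seconds` and `jitter_seconds` are hypothetical helpers, and the 16 s step between 8 s and the 30 s cap is assumed from plain doubling, since the table only lists 2 s, 4 s, 8 s, and the cap.

```python
import random


def cooldown_seconds(consecutive_blocks, base=2.0, cap=30.0):
    """Exponential backoff: base, 2*base, 4*base, ... never exceeding cap."""
    if consecutive_blocks < 1:
        return 0.0
    return min(cap, base * 2 ** (consecutive_blocks - 1))


def jitter_seconds(low_ms=50, high_ms=200, rng=random):
    """Random pause between bulk dispatches, returned in seconds."""
    return rng.uniform(low_ms, high_ms) / 1000.0


print([cooldown_seconds(n) for n in range(1, 7)])
# → [2.0, 4.0, 8.0, 16.0, 30.0, 30.0]
```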
---

## Error Handling

```python
from nullgaze import (
    download_image,
    DownloadFailedError,
    InvalidURLError,
    ScrapeError,
    BlockedError,
)

# Image errors
try:
    download_image("https://example.com/img.jpg", "output.jpg")
except InvalidURLError:
    print("Bad URL")
except DownloadFailedError as e:
    print(f"All strategies failed: {e}")

# Scraping errors
from nullgaze import scrape_page
try:
    html = scrape_page("https://example.com")
except BlockedError:
    print("Server explicitly blocked us (403/429/503)")
except ScrapeError as e:
    print(f"Scraping failed: {e}")
```

---

## Pre-Warming for Maximum Speed

Call `engine_prewarm()` at startup to pre-launch the DPI-bypass proxy
and open a trial connection. The first real request will hit a hot path
in under 100 ms:

```python
from nullgaze import engine_prewarm, download_image

engine_prewarm()  # proxy ready, connections hot

# Now every request starts at full speed
download_image("https://example.com/img.jpg", "output.jpg")
```

---

## License

MIT License. See [LICENSE](LICENSE).

---

## Disclaimer

This library is provided as a **research and educational tool** for
studying network filtering, DPI, TLS fingerprinting, and web scraping
techniques. The author is **not responsible** for how you use it.
You are solely responsible for ensuring your use complies with all
applicable laws, regulations, and the terms of service of any website
you interact with. By using this software you acknowledge these terms.
@@ -0,0 +1,13 @@
LICENSE
README.md
pyproject.toml
NullGazeX.egg-info/PKG-INFO
NullGazeX.egg-info/SOURCES.txt
NullGazeX.egg-info/dependency_links.txt
NullGazeX.egg-info/requires.txt
NullGazeX.egg-info/top_level.txt
nullgaze/__init__.py
nullgaze/downloader.py
nullgaze/exceptions.py
nullgaze/scraper.py
nullgaze/utils.py
@@ -0,0 +1 @@

@@ -0,0 +1 @@
nullgaze
nullgazex-2.1.0/PKG-INFO
ADDED
@@ -0,0 +1,325 @@
Metadata-Version: 2.4
Name: NullGazeX
Version: 2.1.0
Summary: Research tool for studying DPI, TLS fingerprinting, and content-delivery protections via ClientHello splitting and parallel-strategy bypass. For educational and research purposes only.
Author-email: Batal <batal@proton.me>
License-Expression: MIT
Project-URL: Homepage, https://github.com/batal/NullGazeX
Classifier: Programming Language :: Python :: 3
Classifier: Operating System :: OS Independent
Classifier: Topic :: Internet :: WWW/HTTP
Classifier: Topic :: Multimedia :: Graphics
Classifier: Topic :: Scientific/Engineering :: Information Analysis
Classifier: Intended Audience :: Education
Classifier: Intended Audience :: Science/Research
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: requests>=2.25.0
Requires-Dist: curl_cffi>=0.5.0
Provides-Extra: browser
Requires-Dist: DrissionPage>=4.0.0; extra == "browser"
Provides-Extra: scrape
Requires-Dist: beautifulsoup4>=4.9.0; extra == "scrape"
Provides-Extra: full
Requires-Dist: curl_cffi>=0.5.0; extra == "full"
Requires-Dist: DrissionPage>=4.0.0; extra == "full"
Requires-Dist: beautifulsoup4>=4.9.0; extra == "full"
Dynamic: license-file

# NullGazeX — Research Tool for Network Filtering Analysis

> **Important**: NullGazeX is a **research and educational tool** designed
> to study network filtering, Deep Packet Inspection (DPI), TLS
> fingerprinting, and content-delivery protection mechanisms. It is
> **not** intended for unauthorized access to protected content. The
> author assumes **no liability** for any misuse or for any consequences
> arising from its use. By using this software you agree that you are
> solely responsible for complying with all applicable laws and the
> terms of service of any website you interact with.

NullGazeX implements a multi-tier pipeline of bypass strategies — raced
in parallel for **terrifying speed** — to retrieve content through ISP
SNI filters, Cloudflare challenges, TLS fingerprint checks, and
hotlinking protections.

---

## What NullGazeX Does

| Capability | Description |
| :--- | :--- |
| **Image Downloading** | Download images from hosts behind network filters or anti-scraping protections. |
| **Page Scraping** | Retrieve full page HTML, text, titles, or JSON from a URL while presenting browser-like traffic. |
| **DPI Bypass** | ClientHello packet splitting evades ISP SNI filters without external proxies or VPNs. |
| **TLS Impersonation** | Rotates curl\_cffi browser fingerprints (Chrome, Safari, Firefox, Edge) so traffic looks like a real browser. |
| **Parallel Strategy Racing** | All bypass strategies launch concurrently — the fastest wins in fractions of a second. |
| **Adaptive Anti-Blocking** | Per-domain cooldown, jittered delays, session cycling, and fingerprint rotation reduce the risk of server-side rate-limiting. |
| **Stealth Browser Fallback** | Headless Chromium (via DrissionPage) with anti-detection flags for extreme JS challenges. |

---

## How It Works

NullGazeX uses a **local loopback proxy** that intercepts the TLS
ClientHello handshake packet and splits it into two TCP segments with
an 8 ms gap between them. The SNI (Server Name Indication) field is
thus broken across two packets. A DPI firewall that inspects packets
individually, without reassembling the TCP stream, never sees the full SNI, so its check does not fire.

Combined with **rotating TLS fingerprints** (the `curl_cffi` library
impersonates Chrome 110–124, Safari 15–17, Firefox 116–117, and Edge
99–110), the traffic is designed to resemble that of a real browser.

For more details see **[DPI_BYPASS_DOC.md](nullgaze/DPI_BYPASS_DOC.md)**.

---

## Installation

### Standard (image downloading + scraping)
```bash
pip install NullGazeX
```

### With browser support (for extreme JS challenges)
```bash
pip install "NullGazeX[browser]"
```

### Full installation (everything)
```bash
pip install "NullGazeX[full]"
```

Dependencies are resolved automatically:

| Extra | Includes |
| :--- | :--- |
| *(none)* | `requests`, `curl_cffi` |
| `browser` | `DrissionPage` (headless Chromium) |
| `scrape` | `beautifulsoup4` (HTML parsing) |
| `full` | everything above |

---

## Quick Start — Image Downloading

### Single image
```python
from nullgaze import download_image

download_image(
    url="https://example.com/images/photo.jpg",
    output_path="outputs/photo.jpg",
    verbose=True
)
```

### Batch download (concurrent)
```python
from nullgaze import download_images

targets = [
    ("https://example.com/img1.jpg", "outputs/img1.jpg"),
    ("https://example.com/img2.jpg", "outputs/img2.jpg"),
    ("https://example.com/img3.jpg", "outputs/img3.jpg"),
]

results = download_images(targets, max_workers=10, verbose=True)

for i, res in enumerate(results):
    if isinstance(res, Exception):
        print(f"Failed {i+1}: {res}")
    else:
        print(f"Saved to: {res}")
```

---

## Quick Start — Page Scraping

### Scrape a page (raw HTML)
```python
from nullgaze import scrape_page

html = scrape_page("https://example.com/article", verbose=True)
print(html[:500])
```

### Scrape clean text
```python
from nullgaze import scrape_text

text = scrape_text("https://example.com/article")
print(text)
```

### Get page title
```python
from nullgaze import scrape_title

title = scrape_title("https://example.com/article")
print(f"Title: {title}")
```

### Scrape a JSON API
```python
from nullgaze import scrape_json

data = scrape_json("https://api.example.com/data")
print(data)
```

### Scrape multiple URLs in parallel
```python
from nullgaze import scrape_bulk

urls = [
    "https://example.com/page1",
    "https://example.com/page2",
    "https://example.com/page3",
]

results = scrape_bulk(urls, max_workers=10, verbose=True)
for i, res in enumerate(results):
    if isinstance(res, Exception):
        print(f"Failed {urls[i]}: {res}")
    else:
        print(f"Success: {len(res)} chars")
```

---

## Object-Oriented API

### Image Downloader
```python
from nullgaze import ImageDownloader

downloader = ImageDownloader(verbose=True)

custom_headers = {
    "Referer": "https://custom-referer.com/",
    "Cookie": "session=abc123",
}

downloader.download(
    url="https://example.com/protected/image.png",
    output_path="outputs/image.png",
    headers=custom_headers,
    race_timeout=3.0,  # give up after 3s
)
```

### Page Scraper
```python
from nullgaze import PageScraper

scraper = PageScraper(verbose=True)

# All strategies race in parallel
html = scraper.scrape("https://example.com", race_timeout=5.0)

# Get just the title
title = scraper.scrape_title("https://example.com")

# Get clean text
text = scraper.scrape_text("https://example.com")
```

---

## Strategy Pipeline

NullGazeX races these strategies **in parallel** — the first to succeed
wins, the rest are cancelled:

| Tier | Strategy | Speed | Best For |
| :---: | :--- | :---: | :--- |
| 1 | **Direct HTTP** | ⚡⚡⚡ ~0.3s | Unrestricted servers |
| 2 | **DPI-Bypass + TLS Impersonation** | ⚡⚡ ~0.5s | ISP SNI blocks, fingerprint checks |
| 3 | **Stealth Browser (Chromium)** | 🐢 ~3s | Cloudflare Turnstile, JS challenges |
| 4 | **Weserv Proxy** *(images only)* | ⚡⚡ ~1s | General fallback |

The library caches the successful strategy per domain so subsequent
requests to the same domain skip the race and go straight to the
winning strategy.

---

## Anti-Blocking Features

| Mechanism | How It Works |
| :--- | :--- |
| **Fingerprint rotation** | Each request picks a random TLS fingerprint (Chrome 110–124, Safari 15–17, Firefox 116–117, Edge 99–110). |
| **Header rotation** | User-Agent, Accept, Accept-Language, and Sec-Fetch-* headers are randomized per request. |
| **Jittered delays** | 50–200 ms random delay between bulk dispatches to avoid burst detection. |
| **Adaptive cooldown** | When a server returns 429/503, the domain is cooled down with exponential backoff (2s → 4s → 8s → capped at 30s). |
| **Session cycling** | curl\_cffi sessions are cycled every 25 requests to rotate TLS session state. |
| **Domain strategy cache** | Caches which strategy works per domain — avoids re-discovering on every request. |

---

## Error Handling

```python
from nullgaze import (
    download_image,
    DownloadFailedError,
    InvalidURLError,
    ScrapeError,
    BlockedError,
)

# Image errors
try:
    download_image("https://example.com/img.jpg", "output.jpg")
except InvalidURLError:
    print("Bad URL")
except DownloadFailedError as e:
    print(f"All strategies failed: {e}")

# Scraping errors
from nullgaze import scrape_page
try:
    html = scrape_page("https://example.com")
except BlockedError:
    print("Server explicitly blocked us (403/429/503)")
except ScrapeError as e:
    print(f"Scraping failed: {e}")
```

---

## Pre-Warming for Maximum Speed

Call `engine_prewarm()` at startup to pre-launch the DPI-bypass proxy
and open a trial connection. The first real request will hit a hot path
in under 100 ms:

```python
from nullgaze import engine_prewarm, download_image

engine_prewarm()  # proxy ready, connections hot

# Now every request starts at full speed
download_image("https://example.com/img.jpg", "output.jpg")
```

---

## License

MIT License. See [LICENSE](LICENSE).

---

## Disclaimer

This library is provided as a **research and educational tool** for
studying network filtering, DPI, TLS fingerprinting, and web scraping
techniques. The author is **not responsible** for how you use it.
You are solely responsible for ensuring your use complies with all
applicable laws, regulations, and the terms of service of any website
you interact with. By using this software you acknowledge these terms.