scrapy-rotating-proxy-middleware 0.1.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -0,0 +1,21 @@
+ MIT License
+
+ Copyright (c) 2026 JiBao Proxy
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy
+ of this software and associated documentation files (the "Software"), to deal
+ in the Software without restriction, including without limitation the rights
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ copies of the Software, and to permit persons to whom the Software is
+ furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in all
+ copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ SOFTWARE.
@@ -0,0 +1,104 @@
+ Metadata-Version: 2.4
+ Name: scrapy-rotating-proxy-middleware
+ Version: 0.1.0
+ Summary: Scrapy downloader middleware that rotates proxies and retries on Cloudflare/DataDome/PerimeterX bans.
+ Author-email: JiBao Proxy <support@jibaoproxy.com>
+ License: MIT
+ Project-URL: Homepage, https://jibaoproxy.com
+ Project-URL: Source, https://github.com/jibaoproxyofficial-pixel/scrapy-rotating-proxy-middleware
+ Keywords: scrapy,proxy,rotating-proxy,web-scraping,cloudflare,datadome,anti-bot,residential-proxy
+ Classifier: Framework :: Scrapy
+ Classifier: Intended Audience :: Developers
+ Classifier: License :: OSI Approved :: MIT License
+ Classifier: Programming Language :: Python :: 3
+ Classifier: Topic :: Internet :: WWW/HTTP
+ Requires-Python: >=3.8
+ Description-Content-Type: text/markdown
+ License-File: LICENSE
+ Requires-Dist: Scrapy>=2.0
+ Requires-Dist: w3lib
+ Dynamic: license-file
+
+ # scrapy-rotating-proxy-middleware
+
+ [![PyPI version](https://img.shields.io/pypi/v/scrapy-rotating-proxy-middleware.svg)](https://pypi.org/project/scrapy-rotating-proxy-middleware/)
+ [![Python versions](https://img.shields.io/pypi/pyversions/scrapy-rotating-proxy-middleware.svg)](https://pypi.org/project/scrapy-rotating-proxy-middleware/)
+ [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
+
+ A drop-in [Scrapy](https://scrapy.org) downloader middleware that **rotates proxies and retries on bans** — `403`, `429`, Cloudflare "Just a moment", DataDome, and PerimeterX challenges. Point it at a static proxy list or a single rotating gateway and your spider stops dying on blocks.
+
+ ```bash
+ pip install scrapy-rotating-proxy-middleware
+ ```
+
+ ## Why
+
+ Scrapy's built-in `HttpProxyMiddleware` only applies the proxy you give it (via `request.meta["proxy"]` or the `http_proxy`/`https_proxy` environment variables) and never reacts when that exit IP gets blocked. In practice most anti-bot blocks aren't about your spider logic — they're about your exit IP's reputation and your client's [TLS fingerprint](https://jibaoproxy.com/blog/ja3-tls-fingerprint-detection-explained.html), both scored before your request reaches the page. Rotating proxies fixes the IP side only: through an ordinary `CONNECT` proxy, the TLS handshake still comes from Scrapy itself. This middleware:
+
+ - assigns a proxy per request (random from a list, or a rotating gateway),
+ - **detects bans** by status code *and* response-body signature (Cloudflare / DataDome / PerimeterX),
+ - transparently **rotates the proxy and retries** on bans and connection errors, with a per-request retry budget,
+ - moves inline `user:pass` credentials into the `Proxy-Authorization` header automatically.
+
+ ## Setup
+
+ Enable it in `settings.py` and disable Scrapy's default proxy middleware:
+
+ ```python
+ DOWNLOADER_MIDDLEWARES = {
+     "scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware": None,
+     "scrapy_rotating_proxy.middleware.RotatingProxyMiddleware": 585,
+ }
+ ```
+
+ Scrapy calls `process_response()` from the highest priority number down. At `585` the middleware runs after `HttpCompressionMiddleware` (590), so ban signatures are matched against the decompressed body, and before `RetryMiddleware` (550), so bans are handled here first. At a priority above 590 (such as 610), a gzip-encoded block page would still be compressed when the body check runs.
+
+ ### Option A — a rotating residential gateway (recommended)
+
+ A residential gateway gives you a **new exit IP** behind a single URL (per request or per connection, depending on the provider), so you don't manage a list at all:
+
+ ```python
+ # settings.py
+ ROTATING_PROXY_GATEWAY = "http://USERNAME:PASSWORD@us.jibaoproxy.com:913"
+ ```
+
+ ### Option B — a static proxy list
+
+ ```python
+ # settings.py
+ ROTATING_PROXY_LIST = [
+     "http://USERNAME:PASSWORD@proxy-a.example.com:8000",
+     "http://USERNAME:PASSWORD@proxy-b.example.com:8000",
+     "http://USERNAME:PASSWORD@proxy-c.example.com:8000",
+ ]
+ ```
+
+ Use HTTP proxy URLs: Scrapy's built-in download handler does not support SOCKS proxies, so `socks5://` entries will not work.
+
+ That's it — run your spider as usual.
+
+ ## Configuration
+
+ | Setting | Default | Description |
+ | --- | --- | --- |
+ | `ROTATING_PROXY_GATEWAY` | – | Single rotating-gateway URL; used only when `ROTATING_PROXY_LIST` is empty. |
+ | `ROTATING_PROXY_LIST` | – | List of proxy URLs; takes precedence over the gateway. |
+ | `ROTATING_PROXY_BAN_CODES` | `403, 407, 429, 503` | Status codes treated as bans (a list, or a comma-separated string). |
+ | `ROTATING_PROXY_MAX_RETRIES` | `5` | Proxy rotations per request before giving up, so a request is downloaded at most 1 + this many times. |
+
+ Set `meta["proxy"]` on a request explicitly and the middleware won't replace or rotate that proxy:
+
+ ```python
+ yield scrapy.Request(url, meta={"proxy": "http://USERNAME:PASSWORD@host:port"})
+ ```
+
+ ## Ban detection
+
+ A response counts as a ban when its status is in `ROTATING_PROXY_BAN_CODES`, **or** the first 4 KB of its body contains a known anti-bot signature, compared case-insensitively (`cf-chl`, `Just a moment`, `Attention Required`, `Access denied`, `captcha-delivery`/DataDome, `px-captcha`/PerimeterX). A `200` page that happens to contain one of these strings near the top is treated as a ban too. On a ban the request is re-scheduled with a freshly picked proxy and `dont_filter=True`, up to the retry budget; after that, the last response is passed on unchanged. Connection errors are rotated and retried against the same budget.
+
+ If you keep hitting bans after rotation, the exit IPs themselves are often the problem: datacenter ranges tend to be scored as bot traffic at the ASN level, while residential exits with clean ASN reputation are more likely to pass. We build [JiBao Proxy](https://jibaoproxy.com) for exactly this: 72M+ residential IPs across 200+ countries, sticky sessions, and SOCKS5/HTTP gateways. The middleware works with any provider, though.
+
+ ## Related
+
+ - [Scrapy proxy middleware: the complete guide](https://jibaoproxy.com/blog/scrapy-proxy-middleware-guide.html)
+ - [Why your JA3/TLS fingerprint gets you blocked](https://jibaoproxy.com/blog/ja3-tls-fingerprint-detection-explained.html)
+ - [Bypassing DataDome & PerimeterX in 2026](https://jibaoproxy.com/blog/datadome-perimeterx-bypass-2026.html)
+
+ ## License
+
+ MIT
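Why the priority number matters: Scrapy calls each downloader middleware's `process_response()` from the highest priority number down to the lowest. The sketch below is a small, stdlib-only model of that rule (it does not import Scrapy). The priorities are copied from Scrapy's default `DOWNLOADER_MIDDLEWARES_BASE`, and `response_order` / `sees_decompressed_body` are hypothetical helper names used only for illustration. It compares `610` with `585`, a slot just below `HttpCompressionMiddleware` (590).

```python
# A stdlib-only model of Scrapy's downloader-middleware ordering rule; it does
# not import Scrapy. Priorities are a subset of Scrapy's default
# DOWNLOADER_MIDDLEWARES_BASE; the helper names are hypothetical.
BASE_PRIORITIES = {
    "RetryMiddleware": 550,
    "MetaRefreshMiddleware": 580,
    "HttpCompressionMiddleware": 590,
    "RedirectMiddleware": 600,
    "CookiesMiddleware": 700,
}


def response_order(rotating_priority):
    """Names in the order their process_response() sees a response.

    Scrapy calls process_response() from the highest priority number
    (downloader side) down to the lowest (engine side).
    """
    priorities = dict(BASE_PRIORITIES, RotatingProxyMiddleware=rotating_priority)
    return sorted(priorities, key=priorities.get, reverse=True)


def sees_decompressed_body(rotating_priority):
    """True if HttpCompressionMiddleware decodes the body before the ban check."""
    order = response_order(rotating_priority)
    return order.index("HttpCompressionMiddleware") < order.index("RotatingProxyMiddleware")


if __name__ == "__main__":
    for priority in (610, 585):
        print(priority, response_order(priority), sees_decompressed_body(priority))
```

With `610` the ban check runs before decompression, so a gzip-encoded challenge page would be scanned as compressed bytes; with `585` it runs after decompression and still ahead of `RetryMiddleware`.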
@@ -0,0 +1,83 @@
+ # scrapy-rotating-proxy-middleware
+
+ [![PyPI version](https://img.shields.io/pypi/v/scrapy-rotating-proxy-middleware.svg)](https://pypi.org/project/scrapy-rotating-proxy-middleware/)
+ [![Python versions](https://img.shields.io/pypi/pyversions/scrapy-rotating-proxy-middleware.svg)](https://pypi.org/project/scrapy-rotating-proxy-middleware/)
+ [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
+
+ A drop-in [Scrapy](https://scrapy.org) downloader middleware that **rotates proxies and retries on bans** — `403`, `429`, Cloudflare "Just a moment", DataDome, and PerimeterX challenges. Point it at a static proxy list or a single rotating gateway and your spider stops dying on blocks.
+
+ ```bash
+ pip install scrapy-rotating-proxy-middleware
+ ```
+
+ ## Why
+
+ Scrapy's built-in `HttpProxyMiddleware` only applies the proxy you give it (via `request.meta["proxy"]` or the `http_proxy`/`https_proxy` environment variables) and never reacts when that exit IP gets blocked. In practice most anti-bot blocks aren't about your spider logic — they're about your exit IP's reputation and your client's [TLS fingerprint](https://jibaoproxy.com/blog/ja3-tls-fingerprint-detection-explained.html), both scored before your request reaches the page. Rotating proxies fixes the IP side only: through an ordinary `CONNECT` proxy, the TLS handshake still comes from Scrapy itself. This middleware:
+
+ - assigns a proxy per request (random from a list, or a rotating gateway),
+ - **detects bans** by status code *and* response-body signature (Cloudflare / DataDome / PerimeterX),
+ - transparently **rotates the proxy and retries** on bans and connection errors, with a per-request retry budget,
+ - moves inline `user:pass` credentials into the `Proxy-Authorization` header automatically.
+
+ ## Setup
+
+ Enable it in `settings.py` and disable Scrapy's default proxy middleware:
+
+ ```python
+ DOWNLOADER_MIDDLEWARES = {
+     "scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware": None,
+     "scrapy_rotating_proxy.middleware.RotatingProxyMiddleware": 585,
+ }
+ ```
+
+ Scrapy calls `process_response()` from the highest priority number down. At `585` the middleware runs after `HttpCompressionMiddleware` (590), so ban signatures are matched against the decompressed body, and before `RetryMiddleware` (550), so bans are handled here first. At a priority above 590 (such as 610), a gzip-encoded block page would still be compressed when the body check runs.
+
+ ### Option A — a rotating residential gateway (recommended)
+
+ A residential gateway gives you a **new exit IP** behind a single URL (per request or per connection, depending on the provider), so you don't manage a list at all:
+
+ ```python
+ # settings.py
+ ROTATING_PROXY_GATEWAY = "http://USERNAME:PASSWORD@us.jibaoproxy.com:913"
+ ```
+
+ ### Option B — a static proxy list
+
+ ```python
+ # settings.py
+ ROTATING_PROXY_LIST = [
+     "http://USERNAME:PASSWORD@proxy-a.example.com:8000",
+     "http://USERNAME:PASSWORD@proxy-b.example.com:8000",
+     "http://USERNAME:PASSWORD@proxy-c.example.com:8000",
+ ]
+ ```
+
+ Use HTTP proxy URLs: Scrapy's built-in download handler does not support SOCKS proxies, so `socks5://` entries will not work.
+
+ That's it — run your spider as usual.
+
+ ## Configuration
+
+ | Setting | Default | Description |
+ | --- | --- | --- |
+ | `ROTATING_PROXY_GATEWAY` | – | Single rotating-gateway URL; used only when `ROTATING_PROXY_LIST` is empty. |
+ | `ROTATING_PROXY_LIST` | – | List of proxy URLs; takes precedence over the gateway. |
+ | `ROTATING_PROXY_BAN_CODES` | `403, 407, 429, 503` | Status codes treated as bans (a list, or a comma-separated string). |
+ | `ROTATING_PROXY_MAX_RETRIES` | `5` | Proxy rotations per request before giving up, so a request is downloaded at most 1 + this many times. |
+
+ Set `meta["proxy"]` on a request explicitly and the middleware won't replace or rotate that proxy:
+
+ ```python
+ yield scrapy.Request(url, meta={"proxy": "http://USERNAME:PASSWORD@host:port"})
+ ```
+
+ ## Ban detection
+
+ A response counts as a ban when its status is in `ROTATING_PROXY_BAN_CODES`, **or** the first 4 KB of its body contains a known anti-bot signature, compared case-insensitively (`cf-chl`, `Just a moment`, `Attention Required`, `Access denied`, `captcha-delivery`/DataDome, `px-captcha`/PerimeterX). A `200` page that happens to contain one of these strings near the top is treated as a ban too. On a ban the request is re-scheduled with a freshly picked proxy and `dont_filter=True`, up to the retry budget; after that, the last response is passed on unchanged. Connection errors are rotated and retried against the same budget.
+
+ If you keep hitting bans after rotation, the exit IPs themselves are often the problem: datacenter ranges tend to be scored as bot traffic at the ASN level, while residential exits with clean ASN reputation are more likely to pass. We build [JiBao Proxy](https://jibaoproxy.com) for exactly this: 72M+ residential IPs across 200+ countries, sticky sessions, and SOCKS5/HTTP gateways. The middleware works with any provider, though.
+
+ ## Related
+
+ - [Scrapy proxy middleware: the complete guide](https://jibaoproxy.com/blog/scrapy-proxy-middleware-guide.html)
+ - [Why your JA3/TLS fingerprint gets you blocked](https://jibaoproxy.com/blog/ja3-tls-fingerprint-detection-explained.html)
+ - [Bypassing DataDome & PerimeterX in 2026](https://jibaoproxy.com/blog/datadome-perimeterx-bypass-2026.html)
+
+ ## License
+
+ MIT
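The ban check described in the README boils down to a status lookup plus a case-insensitive substring scan of the body's first 4 KB. Here is a standalone sketch of that logic, assuming the default ban codes. `is_ban` is a hypothetical free-function version of the middleware's `_is_ban` method, and the illustrative marker list leaves out a bare `datadome` marker, because that string can also show up on pages that merely load DataDome's client-side script.

```python
# Standalone sketch of the ban check: status lookup, then a case-insensitive
# substring scan of the first 4 KB of the body. is_ban() is hypothetical.
DEFAULT_BAN_CODES = frozenset({403, 407, 429, 503})
BAN_MARKERS = (
    b"cf-chl",              # Cloudflare challenge
    b"just a moment",       # Cloudflare interstitial
    b"attention required",  # Cloudflare block
    b"access denied",       # generic block-page wording
    b"captcha-delivery",    # DataDome challenge
    b"px-captcha",          # PerimeterX
)
SNIFF_BYTES = 4096


def is_ban(status, body, ban_codes=DEFAULT_BAN_CODES, markers=BAN_MARKERS):
    """Return True if the response looks like an anti-bot block."""
    if status in ban_codes:
        return True
    head = body[:SNIFF_BYTES].lower()  # only the first 4 KB is inspected
    return any(marker.lower() in head for marker in markers)


if __name__ == "__main__":
    print(is_ban(429, b""))
    print(is_ban(200, b"<title>Just a moment...</title>"))
    print(is_ban(200, b"<html><body>Products</body></html>"))
```

Note the trade-offs of a fixed window: a marker that only appears after byte 4096 goes unnoticed, while any marker inside it counts, even on a `200`.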
@@ -0,0 +1,28 @@
+ [build-system]
+ requires = ["setuptools>=61"]
+ build-backend = "setuptools.build_meta"
+
+ [project]
+ name = "scrapy-rotating-proxy-middleware"
+ version = "0.1.0"
+ description = "Scrapy downloader middleware that rotates proxies and retries on Cloudflare/DataDome/PerimeterX bans."
+ readme = "README.md"
+ requires-python = ">=3.8"
+ license = { text = "MIT" }
+ authors = [{ name = "JiBao Proxy", email = "support@jibaoproxy.com" }]
+ keywords = ["scrapy", "proxy", "rotating-proxy", "web-scraping", "cloudflare", "datadome", "anti-bot", "residential-proxy"]
+ classifiers = [
+     "Framework :: Scrapy",
+     "Intended Audience :: Developers",
+     "License :: OSI Approved :: MIT License",
+     "Programming Language :: Python :: 3",
+     "Topic :: Internet :: WWW/HTTP",
+ ]
+ dependencies = ["Scrapy>=2.0", "w3lib"]
+
+ [project.urls]
+ Homepage = "https://jibaoproxy.com"
+ Source = "https://github.com/jibaoproxyofficial-pixel/scrapy-rotating-proxy-middleware"
+
+ [tool.setuptools]
+ packages = ["scrapy_rotating_proxy"]
@@ -0,0 +1,4 @@
+ from .middleware import RotatingProxyMiddleware
+
+ __all__ = ["RotatingProxyMiddleware"]
+ __version__ = "0.1.0"
@@ -0,0 +1,123 @@
+ """Drop-in Scrapy downloader middleware for rotating proxies with ban detection.
+
+ Works with HTTP proxies: a static list, or a single rotating gateway
+ (residential providers typically rotate the exit IP for you). Scrapy's
+ download handler does not support SOCKS, so socks5:// URLs will not work.
+ On an anti-bot block (403/429/Cloudflare/DataDome/PerimeterX) it transparently
+ rotates the proxy and retries, so your spider keeps flowing instead of dying
+ on bans.
+
+ Enable it at a priority number below 590 (e.g. 585) so HttpCompressionMiddleware
+ has already decompressed the body when ban markers are matched.
+ """
+ import logging
+ import random
+ from urllib.parse import unquote, urlparse, urlunparse
+
+ from scrapy.exceptions import IgnoreRequest, NotConfigured
+ from w3lib.http import basic_auth_header
+
+ logger = logging.getLogger(__name__)
+
+ # Default response signals that mean "this exit IP got blocked, try another".
+ DEFAULT_BAN_CODES = {403, 407, 429, 503}
+ # Lowercase markers, searched for in the lowercased first BODY_SNIFF_BYTES of the body.
+ DEFAULT_BAN_MARKERS = (
+     b"cf-chl",              # Cloudflare challenge
+     b"just a moment",       # Cloudflare interstitial
+     b"attention required",  # Cloudflare block
+     b"access denied",       # generic block-page wording
+     b"captcha-delivery",    # DataDome challenge; a bare b"datadome" would also
+                             # match unblocked pages that load DataDome's script
+     b"px-captcha",          # PerimeterX
+ )
+ BODY_SNIFF_BYTES = 4096
+
+
+ class RotatingProxyMiddleware:
+     """Assign a proxy per request and retry on bans by rotating to a fresh one.
+
+     Settings:
+         ROTATING_PROXY_LIST         list of proxy URLs (http://user:pass@host:port);
+                                     takes precedence over the gateway
+         ROTATING_PROXY_GATEWAY      single rotating-gateway URL, used when LIST is empty
+         ROTATING_PROXY_BAN_CODES    status codes treated as bans (default 403/407/429/503)
+         ROTATING_PROXY_MAX_RETRIES  proxy rotations per request before giving up (default 5)
+
+     A proxy the spider sets itself in request.meta["proxy"] is never replaced
+     or rotated; only its inline credentials are moved into the header.
+     """
+
+     def __init__(self, proxies, gateway, ban_codes, max_retries):
+         self.proxies = proxies or []
+         self.gateway = gateway
+         self.ban_codes = ban_codes
+         self.max_retries = max_retries
+
+     @classmethod
+     def from_crawler(cls, crawler):
+         s = crawler.settings
+         proxies = s.getlist("ROTATING_PROXY_LIST")
+         gateway = s.get("ROTATING_PROXY_GATEWAY")
+         if not proxies and not gateway:
+             raise NotConfigured(
+                 "Set ROTATING_PROXY_LIST or ROTATING_PROXY_GATEWAY to use "
+                 "RotatingProxyMiddleware."
+             )
+         # getlist() accepts a list or a comma-separated string such as "403,429".
+         ban_codes = {int(c) for c in s.getlist("ROTATING_PROXY_BAN_CODES")}
+         max_retries = s.getint("ROTATING_PROXY_MAX_RETRIES", 5)
+         return cls(proxies, gateway, ban_codes or set(DEFAULT_BAN_CODES), max_retries)
+
+     def _pick(self):
+         """Return the next proxy URL: random from the list, else the gateway."""
+         if self.proxies:
+             return random.choice(self.proxies)
+         return self.gateway
+
+     @staticmethod
+     def _apply(request, proxy_url):
+         """Set the proxy on the request, moving any inline credentials into the
+         Proxy-Authorization header. With HttpProxyMiddleware disabled, nothing
+         else in Scrapy extracts user:pass from the proxy URL."""
+         parsed = urlparse(proxy_url)
+         if parsed.username:
+             # userinfo is percent-encoded in URLs; decode it before encoding.
+             request.headers[b"Proxy-Authorization"] = basic_auth_header(
+                 unquote(parsed.username), unquote(parsed.password or "")
+             )
+         else:
+             # Don't send the previous proxy's credentials to a different proxy.
+             request.headers.pop(b"Proxy-Authorization", None)
+         # Keep host:port exactly as written (this also preserves [IPv6] brackets).
+         netloc = parsed.netloc.rpartition("@")[2]
+         request.meta["proxy"] = urlunparse((parsed.scheme, netloc, "", "", "", ""))
+
+     @staticmethod
+     def _spider_set_proxy(request):
+         """True if meta["proxy"] was set by the spider rather than by this middleware."""
+         return "proxy" in request.meta and not request.meta.get("_rotating")
+
+     def process_request(self, request, spider):
+         if self._spider_set_proxy(request):
+             # Respect a deliberate proxy, but still move its inline credentials.
+             proxy = request.meta["proxy"]
+             if proxy and urlparse(proxy).username:
+                 self._apply(request, proxy)
+             return None
+         request.meta["_rotating"] = True  # mark the proxy as ours to rotate
+         self._apply(request, self._pick())
+         return None
+
+     def _is_ban(self, response):
+         if response.status in self.ban_codes:
+             return True
+         body = response.body[:BODY_SNIFF_BYTES].lower()
+         return any(marker in body for marker in DEFAULT_BAN_MARKERS)
+
+     def _rotate(self, request, retries):
+         """Copy the request with a freshly picked proxy and a bumped retry count."""
+         new = request.replace(dont_filter=True)
+         new.meta["_proxy_retries"] = retries + 1
+         new.meta["_rotating"] = True
+         self._apply(new, self._pick())
+         return new
+
+     def process_response(self, request, response, spider):
+         if self._spider_set_proxy(request) or not self._is_ban(response):
+             return response
+         retries = request.meta.get("_proxy_retries", 0)
+         if retries >= self.max_retries:
+             logger.warning(
+                 "Gave up on %s after %d proxy rotations (last status %d)",
+                 request.url, retries, response.status,
+             )
+             return response
+         logger.debug(
+             "Ban detected (%d) on %s, rotating proxy (retry %d/%d)",
+             response.status, request.url, retries + 1, self.max_retries,
+         )
+         return self._rotate(request, retries)
+
+     def process_exception(self, request, exception, spider):
+         # Connection errors (often a dead proxy): rotate and retry, same budget.
+         # IgnoreRequest comes from other middlewares and must not be retried.
+         if isinstance(exception, IgnoreRequest) or self._spider_set_proxy(request):
+             return None
+         retries = request.meta.get("_proxy_retries", 0)
+         if retries >= self.max_retries:
+             return None
+         logger.debug(
+             "%r on %s, rotating proxy (retry %d/%d)",
+             exception, request.url, retries + 1, self.max_retries,
+         )
+         return self._rotate(request, retries)
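What "moving inline credentials into `Proxy-Authorization`" amounts to, as a stdlib-only sketch: split the userinfo off the proxy URL, percent-decode it, and Base64-encode `user:password` as an HTTP Basic credential. `split_proxy_credentials` is a hypothetical helper, not part of the package; the package itself calls `w3lib.http.basic_auth_header`, which returns the same `Basic <base64>` value (as bytes rather than str) for ASCII credentials.

```python
import base64
from urllib.parse import unquote, urlsplit


def split_proxy_credentials(proxy_url):
    """Split inline user:pass off a proxy URL (hypothetical helper).

    Returns (proxy_url_without_userinfo, proxy_authorization_value_or_None).
    """
    parts = urlsplit(proxy_url)
    # Keep host:port exactly as written (this also preserves [IPv6] brackets).
    hostport = parts.netloc.rpartition("@")[2]
    bare_url = f"{parts.scheme}://{hostport}"
    if not parts.username:
        return bare_url, None
    # userinfo is percent-encoded in URLs, so decode it before building the header.
    user = unquote(parts.username)
    password = unquote(parts.password or "")
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return bare_url, f"Basic {token}"


if __name__ == "__main__":
    print(split_proxy_credentials("http://user:pass@gw.example.com:913"))
```

For non-ASCII credentials the encoded bytes depend on the character encoding chosen, so this sketch's UTF-8 choice may not match what w3lib produces by default.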
@@ -0,0 +1,11 @@
+ LICENSE
+ README.md
+ pyproject.toml
+ scrapy_rotating_proxy/__init__.py
+ scrapy_rotating_proxy/middleware.py
+ scrapy_rotating_proxy_middleware.egg-info/PKG-INFO
+ scrapy_rotating_proxy_middleware.egg-info/SOURCES.txt
+ scrapy_rotating_proxy_middleware.egg-info/dependency_links.txt
+ scrapy_rotating_proxy_middleware.egg-info/requires.txt
+ scrapy_rotating_proxy_middleware.egg-info/top_level.txt
+ tests/test_middleware.py
@@ -0,0 +1,4 @@
+ [egg_info]
+ tag_build =
+ tag_date = 0
+
@@ -0,0 +1,37 @@
+ """Lightweight tests that don't require live network."""
+ import scrapy
+ from scrapy_rotating_proxy.middleware import RotatingProxyMiddleware
+
+
+ def _mw(**settings):
+     from scrapy.settings import Settings
+     from scrapy.crawler import Crawler
+     crawler = Crawler(scrapy.Spider, Settings(settings))
+     return RotatingProxyMiddleware.from_crawler(crawler)
+
+
+ def test_credentials_move_to_header():
+     mw = _mw(ROTATING_PROXY_GATEWAY="http://user:pass@gw.example.com:913")
+     req = scrapy.Request("https://httpbin.org/ip")
+     mw.process_request(req, None)
+     assert req.meta["proxy"] == "http://gw.example.com:913"
+     assert b"Proxy-Authorization" in req.headers
+
+
+ def test_ban_status_triggers_rotation():
+     mw = _mw(ROTATING_PROXY_GATEWAY="http://gw.example.com:913")
+     req = scrapy.Request("https://example.com")
+     resp = scrapy.http.TextResponse("https://example.com", status=403, body=b"")
+     out = mw.process_response(req, resp, None)
+     assert isinstance(out, scrapy.Request)
+     assert out.meta["_proxy_retries"] == 1
+
+
+ def test_cloudflare_body_triggers_rotation():
+     mw = _mw(ROTATING_PROXY_GATEWAY="http://gw.example.com:913")
+     req = scrapy.Request("https://example.com")
+     resp = scrapy.http.TextResponse(
+         "https://example.com", status=200, body=b"<title>Just a moment...</title>"
+     )
+     out = mw.process_response(req, resp, None)
+     assert isinstance(out, scrapy.Request)
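The retry budget counts rotations, not downloads: `_proxy_retries` starts at 0 and the middleware gives up once it reaches `ROTATING_PROXY_MAX_RETRIES`, so with the default of 5 a request is downloaded at most six times. The simulation below walks that loop without Scrapy. One assumption differs from the package on purpose: `pick_next` (a hypothetical helper, as is `fetch_with_rotation`) avoids re-picking the proxy that was just banned, whereas the package's `_pick` uses plain `random.choice`, which can land on the same proxy twice in a row.

```python
import random


def pick_next(proxies, last=None, rng=random):
    """Pick a proxy at random, avoiding the one just banned when possible.

    Hypothetical variant: the package's _pick() uses plain random.choice.
    """
    candidates = [p for p in proxies if p != last] or list(proxies)
    return rng.choice(candidates)


def fetch_with_rotation(proxies, is_banned, max_retries=5, rng=random):
    """Simulate one request under a rotation budget.

    is_banned(proxy) stands in for a real download plus ban check.
    Returns (succeeded, proxies_tried); at most 1 + max_retries downloads.
    """
    tried = []
    last = None
    for _ in range(max_retries + 1):  # the first attempt plus max_retries rotations
        proxy = pick_next(proxies, last, rng)
        tried.append(proxy)
        if not is_banned(proxy):
            return True, tried
        last = proxy
    return False, tried


if __name__ == "__main__":
    ok, tried = fetch_with_rotation(["a", "b", "c"], lambda p: p != "c", rng=random.Random(0))
    print(ok, tried)
```

With a single gateway URL there is nothing else to pick, so every retry goes back through the same gateway and relies on the provider to hand out a new exit IP.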