recurl-cli 0.1.2__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -0,0 +1,226 @@
1
+ Metadata-Version: 2.4
2
+ Name: recurl-cli
3
+ Version: 0.1.2
4
+ Summary: Drop-in curl replacement with automatic anti-bot bypass
5
+ License: MIT
6
+ Project-URL: Homepage, https://github.com/neul-labs/recurl
7
+ Project-URL: Repository, https://github.com/neul-labs/recurl
8
+ Project-URL: Documentation, https://docs.neullabs.com/recurl
9
+ Project-URL: Issues, https://github.com/neul-labs/recurl/issues
10
+ Keywords: curl,http,anti-bot,scraping,cli
11
+ Classifier: Development Status :: 4 - Beta
12
+ Classifier: Intended Audience :: Developers
13
+ Classifier: License :: OSI Approved :: MIT License
14
+ Classifier: Operating System :: OS Independent
15
+ Classifier: Programming Language :: Python :: 3
16
+ Classifier: Programming Language :: Python :: 3.8
17
+ Classifier: Programming Language :: Python :: 3.9
18
+ Classifier: Programming Language :: Python :: 3.10
19
+ Classifier: Programming Language :: Python :: 3.11
20
+ Classifier: Programming Language :: Python :: 3.12
21
+ Classifier: Topic :: Internet :: WWW/HTTP
22
+ Classifier: Topic :: Software Development :: Libraries :: Python Modules
23
+ Classifier: Topic :: Utilities
24
+ Requires-Python: >=3.8
25
+ Description-Content-Type: text/markdown
26
+
27
+ # recurl-cli
28
+
29
+ **Python's missing curl.** Drop-in HTTP client with automatic anti-bot bypass for Python developers, data scientists, and web scrapers.
30
+
31
+ [![PyPI version](https://img.shields.io/pypi/v/recurl-cli.svg)](https://pypi.org/project/recurl-cli/)
32
+ [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
33
+ [![Python Version](https://img.shields.io/pypi/pyversions/recurl-cli.svg)](https://pypi.org/project/recurl-cli/)
34
+
35
+ ---
36
+
37
+ ## What is recurl?
38
+
39
+ recurl is a smart drop-in replacement for `curl` that transparently handles bot detection and anti-bot protections used by modern websites. It runs real curl under the hood, detects when a request is blocked (403, captcha, Cloudflare challenge), and automatically escalates through browser impersonation and headless Chromium rendering to get the response you need.
40
+
41
+ **Same curl syntax. No code changes. It just works.**
42
+
43
+ ```bash
44
+ # Works even on Cloudflare-protected sites
45
+ python -m recurl.cli https://protected-site.com/api/data
46
+ ```
47
+
48
+ ## Why Python developers need recurl
49
+
50
+ If you've ever written Python scripts for web scraping or API access, you've hit these walls:
51
+
52
+ - `requests.get()` returns **403 Forbidden** on protected sites
53
+ - `urllib` gets blocked by TLS fingerprinting
54
+ - You end up installing Selenium, Playwright, or Puppeteer just to fetch a single page
55
+ - Headless browser setup is heavy, slow, and overkill for simple requests
56
+
57
+ recurl solves this by being a **curl replacement with built-in escalation**:
58
+
59
+ 1. **First attempt**: Standard curl request (fast, low overhead)
60
+ 2. **If blocked**: Retries with browser TLS fingerprint impersonation
61
+ 3. **Still blocked**: Launches headless Chromium, solves JS challenges, extracts cookies, replays the request
62
+
63
+ No Python dependencies for browser automation. No heavy browser setup. Just install and use.
64
+
65
+ ## Installation
66
+
67
+ ### pip (recommended)
68
+
69
+ ```bash
70
+ pip install recurl-cli
71
+ ```
72
+
73
+ ### Other package managers
74
+
75
+ | Platform | Command |
76
+ |----------|---------|
77
+ | **npm** | `npm install -g recurl-cli` |
78
+ | **Homebrew** | `brew tap neul-labs/tap && brew install recurl` |
79
+ | **Cargo** | `cargo install recurl` |
80
+ | **Scoop** | `scoop install recurl` |
81
+
82
+ See the [full installation guide](https://github.com/neul-labs/recurl#installation) for platform-specific instructions.
83
+
84
+ ## Quick Start
85
+
86
+ ```bash
87
+ # Use as a Python module
88
+ python -m recurl.cli https://api.example.com/data
89
+
90
+ # Pass through all curl flags
91
+ python -m recurl.cli -X POST -H "Content-Type: application/json" -d '{"key":"value"}' https://api.example.com
92
+
93
+ # Force JS rendering for heavily protected sites
94
+ python -m recurl.cli --recurl-js https://cloudflare-protected-site.com
95
+
96
+ # Debug mode to see escalation steps
97
+ python -m recurl.cli --recurl-debug https://example.com
98
+ ```
99
+
100
+ ### Python API (coming soon)
101
+
102
+ ```python
103
+ from recurl import fetch
104
+
105
+ # Simple fetch that handles anti-bot protections automatically
106
+ response = fetch("https://protected-site.com")
107
+ print(response.text)
108
+ ```
109
+
110
+ ## Supported Anti-Bot Services
111
+
112
+ recurl automatically detects and bypasses protection from:
113
+
114
+ | Service | Detection | Bypass |
115
+ |---------|-----------|--------|
116
+ | Cloudflare | Bot Management, Turnstile, JS Challenge | ✓ |
117
+ | Akamai Bot Manager | Behavioral analysis | ✓ |
118
+ | PerimeterX / HUMAN | Client-side fingerprinting | ✓ |
119
+ | DataDome | Bot Protection | ✓ |
120
+ | Imperva / Incapsula | Challenge pages | ✓ |
121
+ | Kasada | Bot Mitigation | ✓ |
122
+ | AWS WAF Bot Control | Request analysis | ✓ |
123
+ | Shape / F5 | Bot Defense | ✓ |
124
+ | hCaptcha | Challenge widget | ✓ |
125
+ | reCAPTCHA | Challenge widget | ✓ |
126
+
127
+ ## Platform Support
128
+
129
+ | Platform | Architecture | Impersonation | JS Preflight |
130
+ |----------|-------------|:-------------:|:------------:|
131
+ | Linux | x86_64 | ✓ | ✓ |
132
+ | Linux | aarch64 | ✓ | ✓ |
133
+ | macOS | Apple Silicon | ✓ | ✓ |
134
+ | macOS | Intel | ✓ | ✓ |
135
+ | Windows | x86_64 | — | ✓ |
136
+
137
+ ## CLI Reference
138
+
139
+ ### recurl-specific flags
140
+
141
+ | Flag | Description |
142
+ |------|-------------|
143
+ | `--recurl-strict` | Disable fallback, pure curl passthrough |
144
+ | `--recurl-impersonate <profile>` | Force TLS fingerprint profile (chrome, firefox, safari) |
145
+ | `--recurl-js` | Force JS preflight (skip straight to Chromium) |
146
+ | `--recurl-js-rendered` | Return rendered DOM instead of raw response |
147
+ | `--recurl-js-wait <selector>` | Wait for CSS selector before capturing |
148
+ | `--recurl-js-timeout <ms>` | JS preflight timeout (default: 30000) |
149
+ | `--recurl-debug` | Show diagnostic output and escalation steps |
150
+
151
+ All standard curl flags work as expected.
152
+
153
+ ## Use Cases for Python Developers
154
+
155
+ - **Web scraping** - Extract data from protected sites without Selenium/Playwright overhead
156
+ - **Data pipelines** - Reliable HTTP requests in Airflow, Luigi, or cron jobs
157
+ - **API integration** - Test and call APIs behind bot protection
158
+ - **Research & analytics** - Fetch pricing, inventory, or public datasets
159
+ - **CI/CD** - Reliable HTTP calls in GitHub Actions, GitLab CI, Jenkins
160
+ - **Shell scripting from Python** - Call the installed CLI with `subprocess.run(["recurl", ...])` and check its exit code
161
+
162
+ ## How It Works
163
+
164
+ ```
165
+ recurl receives request
166
+ |
167
+ +---> curl_engine (real curl binary)
168
+ | |
169
+ | +---> Success? Return response immediately
170
+ | |
171
+ | +---> Blocked? (403, 429, captcha, challenge page)
172
+ | |
173
+ | +---> Retry with impersonation (browser TLS fingerprint)
174
+ | | |
175
+ | | +---> Success? Return response
176
+ | | |
177
+ | | +---> Still blocked?
178
+ | | |
179
+ | | +---> JS preflight (headless Chromium)
180
+ | | |
181
+ | | +---> Solve challenge, extract cookies
182
+ | | |
183
+ | | +---> Replay request with cookies
184
+ | | |
185
+ | | +---> Return final response
186
+ |
187
+ +---> Return result to user
188
+ ```
189
+
190
+ The user sees only the final successful response.
191
+
192
+ ## Configuration
193
+
194
+ ### Environment Variables
195
+
196
+ | Variable | Description |
197
+ |----------|-------------|
198
+ | `RECURL_STRICT=1` | Same as `--recurl-strict` |
199
+ | `RECURL_DEBUG=1` | Enable debug output |
200
+ | `RECURL_DAEMON_IDLE_MS` | Daemon idle timeout (default: 60000) |
201
+
202
+ ### Daemon Mode
203
+
204
+ The optional `recurld` daemon keeps Chromium warm for sub-second responses:
205
+
206
+ ```bash
207
+ # Start daemon
208
+ recurld start
209
+
210
+ # Check status
211
+ recurld status
212
+
213
+ # Stop daemon
214
+ recurld stop
215
+ ```
216
+
217
+ ## Links
218
+
219
+ - **Main Repository**: [github.com/neul-labs/recurl](https://github.com/neul-labs/recurl)
220
+ - **Documentation**: [docs.neullabs.com/recurl](https://docs.neullabs.com/recurl)
221
+ - **Issues**: [github.com/neul-labs/recurl/issues](https://github.com/neul-labs/recurl/issues)
222
+ - **License**: MIT
223
+
224
+ ## Keywords
225
+
226
+ Python HTTP client, curl replacement, web scraping Python, anti-bot bypass, Cloudflare bypass Python, headless browser Python, TLS fingerprint spoofing, bot detection evasion, requests alternative, urllib replacement, Python CLI tool, data extraction, API client Python, web crawler Python, Chromium automation Python
@@ -0,0 +1,200 @@
1
+ # recurl-cli
2
+
3
+ **Python's missing curl.** Drop-in HTTP client with automatic anti-bot bypass for Python developers, data scientists, and web scrapers.
4
+
5
+ [![PyPI version](https://img.shields.io/pypi/v/recurl-cli.svg)](https://pypi.org/project/recurl-cli/)
6
+ [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
7
+ [![Python Version](https://img.shields.io/pypi/pyversions/recurl-cli.svg)](https://pypi.org/project/recurl-cli/)
8
+
9
+ ---
10
+
11
+ ## What is recurl?
12
+
13
+ recurl is a smart drop-in replacement for `curl` that transparently handles bot detection and anti-bot protections used by modern websites. It runs real curl under the hood, detects when a request is blocked (403, captcha, Cloudflare challenge), and automatically escalates through browser impersonation and headless Chromium rendering to get the response you need.
14
+
15
+ **Same curl syntax. No code changes. It just works.**
16
+
17
+ ```bash
18
+ # Works even on Cloudflare-protected sites
19
+ python -m recurl.cli https://protected-site.com/api/data
20
+ ```
21
+
22
+ ## Why Python developers need recurl
23
+
24
+ If you've ever written Python scripts for web scraping or API access, you've hit these walls:
25
+
26
+ - `requests.get()` returns **403 Forbidden** on protected sites
27
+ - `urllib` gets blocked by TLS fingerprinting
28
+ - You end up installing Selenium, Playwright, or Puppeteer just to fetch a single page
29
+ - Headless browser setup is heavy, slow, and overkill for simple requests
30
+
31
+ recurl solves this by being a **curl replacement with built-in escalation**:
32
+
33
+ 1. **First attempt**: Standard curl request (fast, low overhead)
34
+ 2. **If blocked**: Retries with browser TLS fingerprint impersonation
35
+ 3. **Still blocked**: Launches headless Chromium, solves JS challenges, extracts cookies, replays the request
36
+
37
+ No Python dependencies for browser automation. No heavy browser setup. Just install and use.
38
+
39
+ ## Installation
40
+
41
+ ### pip (recommended)
42
+
43
+ ```bash
44
+ pip install recurl-cli
45
+ ```
46
+
47
+ ### Other package managers
48
+
49
+ | Platform | Command |
50
+ |----------|---------|
51
+ | **npm** | `npm install -g recurl-cli` |
52
+ | **Homebrew** | `brew tap neul-labs/tap && brew install recurl` |
53
+ | **Cargo** | `cargo install recurl` |
54
+ | **Scoop** | `scoop install recurl` |
55
+
56
+ See the [full installation guide](https://github.com/neul-labs/recurl#installation) for platform-specific instructions.
57
+
58
+ ## Quick Start
59
+
60
+ ```bash
61
+ # Use as a Python module
62
+ python -m recurl.cli https://api.example.com/data
63
+
64
+ # Pass through all curl flags
65
+ python -m recurl.cli -X POST -H "Content-Type: application/json" -d '{"key":"value"}' https://api.example.com
66
+
67
+ # Force JS rendering for heavily protected sites
68
+ python -m recurl.cli --recurl-js https://cloudflare-protected-site.com
69
+
70
+ # Debug mode to see escalation steps
71
+ python -m recurl.cli --recurl-debug https://example.com
72
+ ```
73
+
74
+ ### Python API (coming soon)
75
+
76
+ ```python
77
+ from recurl import fetch
78
+
79
+ # Simple fetch that handles anti-bot protections automatically
80
+ response = fetch("https://protected-site.com")
81
+ print(response.text)
82
+ ```
83
+
84
+ ## Supported Anti-Bot Services
85
+
86
+ recurl automatically detects and bypasses protection from:
87
+
88
+ | Service | Detection | Bypass |
89
+ |---------|-----------|--------|
90
+ | Cloudflare | Bot Management, Turnstile, JS Challenge | ✓ |
91
+ | Akamai Bot Manager | Behavioral analysis | ✓ |
92
+ | PerimeterX / HUMAN | Client-side fingerprinting | ✓ |
93
+ | DataDome | Bot Protection | ✓ |
94
+ | Imperva / Incapsula | Challenge pages | ✓ |
95
+ | Kasada | Bot Mitigation | ✓ |
96
+ | AWS WAF Bot Control | Request analysis | ✓ |
97
+ | Shape / F5 | Bot Defense | ✓ |
98
+ | hCaptcha | Challenge widget | ✓ |
99
+ | reCAPTCHA | Challenge widget | ✓ |
100
+
101
+ ## Platform Support
102
+
103
+ | Platform | Architecture | Impersonation | JS Preflight |
104
+ |----------|-------------|:-------------:|:------------:|
105
+ | Linux | x86_64 | ✓ | ✓ |
106
+ | Linux | aarch64 | ✓ | ✓ |
107
+ | macOS | Apple Silicon | ✓ | ✓ |
108
+ | macOS | Intel | ✓ | ✓ |
109
+ | Windows | x86_64 | — | ✓ |
110
+
111
+ ## CLI Reference
112
+
113
+ ### recurl-specific flags
114
+
115
+ | Flag | Description |
116
+ |------|-------------|
117
+ | `--recurl-strict` | Disable fallback, pure curl passthrough |
118
+ | `--recurl-impersonate <profile>` | Force TLS fingerprint profile (chrome, firefox, safari) |
119
+ | `--recurl-js` | Force JS preflight (skip straight to Chromium) |
120
+ | `--recurl-js-rendered` | Return rendered DOM instead of raw response |
121
+ | `--recurl-js-wait <selector>` | Wait for CSS selector before capturing |
122
+ | `--recurl-js-timeout <ms>` | JS preflight timeout (default: 30000) |
123
+ | `--recurl-debug` | Show diagnostic output and escalation steps |
124
+
125
+ All standard curl flags work as expected.
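As a rough illustration of how the flags in the table above might be combined from Python, the sketch below builds an argument list and leaves execution to the caller. The helper name `build_recurl_args` and its parameters are hypothetical and not part of recurl-cli; only the flag names come from the table, and passing each flag value as a separate argument is an assumption.

```python
from typing import List, Optional


def build_recurl_args(
    url: str,
    *,
    strict: bool = False,
    impersonate: Optional[str] = None,
    force_js: bool = False,
    js_wait: Optional[str] = None,
    js_timeout_ms: Optional[int] = None,
    debug: bool = False,
    curl_args: Optional[List[str]] = None,
) -> List[str]:
    """Hypothetical helper: assemble a recurl command line as a list."""
    args = ["recurl"]
    if strict:
        args.append("--recurl-strict")
    if impersonate is not None:
        args += ["--recurl-impersonate", impersonate]
    if force_js:
        args.append("--recurl-js")
    if js_wait is not None:
        args += ["--recurl-js-wait", js_wait]
    if js_timeout_ms is not None:
        args += ["--recurl-js-timeout", str(js_timeout_ms)]
    if debug:
        args.append("--recurl-debug")
    # Ordinary curl flags are appended unchanged, followed by the URL.
    args += list(curl_args or [])
    args.append(url)
    return args
```

The resulting list can be handed to `subprocess.run(args)`, whose `returncode` then carries the exit code of the recurl process.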
126
+
127
+ ## Use Cases for Python Developers
128
+
129
+ - **Web scraping** - Extract data from protected sites without Selenium/Playwright overhead
130
+ - **Data pipelines** - Reliable HTTP requests in Airflow, Luigi, or cron jobs
131
+ - **API integration** - Test and call APIs behind bot protection
132
+ - **Research & analytics** - Fetch pricing, inventory, or public datasets
133
+ - **CI/CD** - Reliable HTTP calls in GitHub Actions, GitLab CI, Jenkins
134
+ - **Shell scripting from Python** - Call the installed CLI with `subprocess.run(["recurl", ...])` and check its exit code
135
+
136
+ ## How It Works
137
+
138
+ ```
139
+ recurl receives request
140
+ |
141
+ +---> curl_engine (real curl binary)
142
+ | |
143
+ | +---> Success? Return response immediately
144
+ | |
145
+ | +---> Blocked? (403, 429, captcha, challenge page)
146
+ | |
147
+ | +---> Retry with impersonation (browser TLS fingerprint)
148
+ | | |
149
+ | | +---> Success? Return response
150
+ | | |
151
+ | | +---> Still blocked?
152
+ | | |
153
+ | | +---> JS preflight (headless Chromium)
154
+ | | |
155
+ | | +---> Solve challenge, extract cookies
156
+ | | |
157
+ | | +---> Replay request with cookies
158
+ | | |
159
+ | | +---> Return final response
160
+ |
161
+ +---> Return result to user
162
+ ```
163
+
164
+ The user sees only the final successful response.
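The flow in the diagram can be read as a fallback chain: try each stage in order and stop at the first response that does not look blocked. The sketch below models only that control flow. `Response`, `looks_blocked`, and `escalate` are illustrative names, the stages are caller-supplied stand-ins, and the body markers are guesses rather than recurl's actual detection rules.

```python
from typing import Callable, Iterable, NamedTuple, Optional


class Response(NamedTuple):
    status: int
    body: str


BLOCK_STATUSES = {403, 429}  # status codes named in the diagram
BLOCK_MARKERS = ("captcha", "challenge")  # assumed markers, for illustration only


def looks_blocked(resp: Response) -> bool:
    """Return True if the response resembles a block page."""
    if resp.status in BLOCK_STATUSES:
        return True
    body = resp.body.lower()
    return any(marker in body for marker in BLOCK_MARKERS)


def escalate(stages: Iterable[Callable[[], Response]]) -> Optional[Response]:
    """Run stages in order; stop at the first response that is not blocked.

    Returns the last response seen, or None if no stages were given.
    """
    last = None
    for stage in stages:
        last = stage()
        if not looks_blocked(last):
            break
    return last
```

Because later stages are only invoked when earlier ones look blocked, the expensive steps stay out of the common path, which matches the "fast first attempt" ordering described above.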
165
+
166
+ ## Configuration
167
+
168
+ ### Environment Variables
169
+
170
+ | Variable | Description |
171
+ |----------|-------------|
172
+ | `RECURL_STRICT=1` | Same as `--recurl-strict` |
173
+ | `RECURL_DEBUG=1` | Enable debug output |
174
+ | `RECURL_DAEMON_IDLE_MS` | Daemon idle timeout (default: 60000) |
175
+
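When recurl is launched from a Python script, the variables in the table can be set per call instead of globally. The sketch below assumes a hypothetical helper, `recurl_env`, that copies an environment mapping and adds only the variables listed above; the helper and its parameter names are not part of recurl-cli.

```python
import os
from typing import Dict, Mapping, Optional


def recurl_env(
    *,
    strict: bool = False,
    debug: bool = False,
    daemon_idle_ms: Optional[int] = None,
    base: Optional[Mapping[str, str]] = None,
) -> Dict[str, str]:
    """Hypothetical helper: return a copy of the environment with RECURL_* set."""
    env = dict(os.environ if base is None else base)
    if strict:
        env["RECURL_STRICT"] = "1"
    if debug:
        env["RECURL_DEBUG"] = "1"
    if daemon_idle_ms is not None:
        env["RECURL_DAEMON_IDLE_MS"] = str(daemon_idle_ms)
    return env
```

The returned dictionary can be passed as `env=` to `subprocess.run`, which leaves the parent process environment untouched.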
176
+ ### Daemon Mode
177
+
178
+ The optional `recurld` daemon keeps Chromium warm for sub-second responses:
179
+
180
+ ```bash
181
+ # Start daemon
182
+ recurld start
183
+
184
+ # Check status
185
+ recurld status
186
+
187
+ # Stop daemon
188
+ recurld stop
189
+ ```
190
+
191
+ ## Links
192
+
193
+ - **Main Repository**: [github.com/neul-labs/recurl](https://github.com/neul-labs/recurl)
194
+ - **Documentation**: [docs.neullabs.com/recurl](https://docs.neullabs.com/recurl)
195
+ - **Issues**: [github.com/neul-labs/recurl/issues](https://github.com/neul-labs/recurl/issues)
196
+ - **License**: MIT
197
+
198
+ ## Keywords
199
+
200
+ Python HTTP client, curl replacement, web scraping Python, anti-bot bypass, Cloudflare bypass Python, headless browser Python, TLS fingerprint spoofing, bot detection evasion, requests alternative, urllib replacement, Python CLI tool, data extraction, API client Python, web crawler Python, Chromium automation Python
@@ -0,0 +1,37 @@
+ [build-system]
+ requires = ["setuptools>=61.0", "wheel"]
+ build-backend = "setuptools.build_meta"
+
+ [project]
+ name = "recurl-cli"
+ version = "0.1.2"
+ description = "Drop-in curl replacement with automatic anti-bot bypass"
+ readme = "README.md"
+ license = {text = "MIT"}
+ requires-python = ">=3.8"
+ keywords = ["curl", "http", "anti-bot", "scraping", "cli"]
+ classifiers = [
+     "Development Status :: 4 - Beta",
+     "Intended Audience :: Developers",
+     "License :: OSI Approved :: MIT License",
+     "Operating System :: OS Independent",
+     "Programming Language :: Python :: 3",
+     "Programming Language :: Python :: 3.8",
+     "Programming Language :: Python :: 3.9",
+     "Programming Language :: Python :: 3.10",
+     "Programming Language :: Python :: 3.11",
+     "Programming Language :: Python :: 3.12",
+     "Topic :: Internet :: WWW/HTTP",
+     "Topic :: Software Development :: Libraries :: Python Modules",
+     "Topic :: Utilities",
+ ]
+
+ [project.urls]
+ Homepage = "https://github.com/neul-labs/recurl"
+ Repository = "https://github.com/neul-labs/recurl"
+ Documentation = "https://docs.neullabs.com/recurl"
+ Issues = "https://github.com/neul-labs/recurl/issues"
+
+ [project.scripts]
+ recurl = "recurl.cli:main"
+ recurld = "recurl.cli:main_daemon"
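The two `[project.scripts]` entries use the `module:attribute` form. Roughly speaking, an installer generates a small launcher that imports the module and calls the named attribute. The sketch below mimics that lookup with a hypothetical `resolve_entry_point` helper; it is demonstrated on standard-library targets so it runs without recurl-cli installed.

```python
import importlib
from typing import Any


def resolve_entry_point(spec: str) -> Any:
    """Hypothetical helper: resolve 'package.module:attr.path' to an object."""
    module_name, _, attr_path = spec.partition(":")
    obj = importlib.import_module(module_name)
    # Walk dotted attributes after the colon, if any were given.
    for part in (attr_path.split(".") if attr_path else []):
        obj = getattr(obj, part)
    return obj
```

With the package installed, `resolve_entry_point("recurl.cli:main")` would be expected to return the `main` function that the `recurl` command invokes.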
@@ -0,0 +1,56 @@
+ """
+ recurl - Drop-in curl replacement with automatic anti-bot bypass
+
+ This is a thin Python wrapper that delegates to the platform-specific
+ binary downloaded at install time.
+ """
+
+ import shutil
+ import subprocess
+ import sys
+ from pathlib import Path
+ from typing import List, Optional
+
+
+ def _find_binary(name: str) -> str:
+     """Locate the downloaded recurl binary."""
+     # 1. Check alongside this package
+     package_dir = Path(__file__).parent
+     bin_path = package_dir / "bin" / name
+     if bin_path.exists():
+         return str(bin_path)
+
+     # 2. Check in PATH (shutil is imported at module level so this lookup works)
+     found = shutil.which(name)
+     if found:
+         return found
+
+     raise FileNotFoundError(
+         f"Could not find {name} binary. "
+         "Try reinstalling: pip install --force-reinstall recurl-cli"
+     )
+
+
+ def run(args: Optional[List[str]] = None) -> int:
+     """
+     Run recurl with the given CLI arguments.
+
+     Args:
+         args: List of arguments (e.g., ["-s", "https://example.com"]).
+               If None, uses sys.argv[1:].
+
+     Returns:
+         Exit code from the recurl process.
+     """
+     binary = _find_binary("recurl")
+     cmd = [binary] + (args if args is not None else sys.argv[1:])
+     result = subprocess.run(cmd)
+     return result.returncode
+
+
+ def run_daemon(args: Optional[List[str]] = None) -> int:
+     """Run recurld with the given CLI arguments."""
+     binary = _find_binary("recurld")
+     cmd = [binary] + (args if args is not None else sys.argv[1:])
+     result = subprocess.run(cmd)
+     return result.returncode
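The lookup order in `_find_binary` (bundled `bin/` directory first, then `PATH`) can be tried in isolation with the sketch below. It is a standalone variant rather than the package's code: `find_binary` and its `package_dir` parameter are hypothetical, it returns `None` instead of raising, and the `.exe` candidate on Windows is an assumption about how the Windows archive names its binaries.

```python
import os
import shutil
from pathlib import Path
from typing import Optional


def find_binary(name: str, package_dir: Path) -> Optional[str]:
    """Hypothetical variant: bundled bin/ directory first, then PATH."""
    candidates = [package_dir / "bin" / name]
    if os.name == "nt":
        # Assumption: Windows builds ship the binary with an .exe suffix.
        candidates.append(package_dir / "bin" / (name + ".exe"))
    for candidate in candidates:
        if candidate.is_file():
            return str(candidate)
    # shutil.which returns None when nothing on PATH matches.
    return shutil.which(name)
```

Returning `None` keeps the sketch easy to test; a caller that prefers the wrapper's behaviour can raise `FileNotFoundError` when the result is `None`.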
@@ -0,0 +1,20 @@
+ #!/usr/bin/env python3
+ """
+ CLI entry points for the recurl Python wrapper.
+ """
+
+ import sys
+
+ from recurl import run, run_daemon
+
+
+ def main() -> None:
+     sys.exit(run())
+
+
+ def main_daemon() -> None:
+     sys.exit(run_daemon())
+
+
+ if __name__ == "__main__":
+     main()
@@ -0,0 +1,10 @@
+ README.md
+ pyproject.toml
+ setup.py
+ recurl/__init__.py
+ recurl/cli.py
+ recurl_cli.egg-info/PKG-INFO
+ recurl_cli.egg-info/SOURCES.txt
+ recurl_cli.egg-info/dependency_links.txt
+ recurl_cli.egg-info/entry_points.txt
+ recurl_cli.egg-info/top_level.txt
@@ -0,0 +1,3 @@
+ [console_scripts]
+ recurl = recurl.cli:main
+ recurld = recurl.cli:main_daemon
@@ -0,0 +1 @@
+ recurl
@@ -0,0 +1,4 @@
+ [egg_info]
+ tag_build =
+ tag_date = 0
+
@@ -0,0 +1,100 @@
+ """
+ Custom setuptools installer for recurl.
+ Downloads the correct platform binary from GitHub Releases during install.
+ """
+
+ import platform
+ import shutil
+ import tarfile
+ import urllib.request
+ import zipfile
+ from pathlib import Path
+
+ from setuptools import setup
+ from setuptools.command.install import install
+
+ VERSION = "0.1.2"
+ GITHUB_REPO = "neul-labs/recurl"
+
+
+ def detect_platform():
+     """Return the (platform, architecture) pair used in release asset names."""
+     system = platform.system().lower()
+     machine = platform.machine().lower()
+
+     plat_map = {
+         "darwin": "darwin",
+         "linux": "linux",
+         "windows": "windows",
+     }
+     arch_map = {
+         "x86_64": "x86_64",
+         "amd64": "x86_64",
+         "arm64": "aarch64",
+         "aarch64": "aarch64",
+     }
+
+     plat = plat_map.get(system)
+     arch = arch_map.get(machine)
+
+     if not plat or not arch:
+         raise RuntimeError(
+             f"Unsupported platform: {system}-{machine}. "
+             "recurl supports: darwin-x86_64, darwin-arm64, linux-x86_64, linux-arm64, windows-x86_64"
+         )
+
+     return plat, arch
+
+
+ def download_binary(package_dir: Path):
+     """Download the release archive and unpack its binaries into package_dir/bin."""
+     plat, arch = detect_platform()
+     ext = "zip" if plat == "windows" else "tar.gz"
+     asset_name = f"recurl-{plat}-{arch}.{ext}"
+     url = f"https://github.com/{GITHUB_REPO}/releases/download/v{VERSION}/{asset_name}"
+
+     bin_dir = package_dir / "bin"
+     bin_dir.mkdir(parents=True, exist_ok=True)
+
+     tmp_path = package_dir / asset_name
+     print(f"[recurl] Downloading {asset_name}...")
+     urllib.request.urlretrieve(url, tmp_path)
+
+     print("[recurl] Extracting...")
+     if ext == "tar.gz":
+         with tarfile.open(tmp_path, "r:gz") as tar:
+             tar.extractall(path=bin_dir)
+     else:
+         with zipfile.ZipFile(tmp_path, "r") as zf:
+             zf.extractall(path=bin_dir)
+
+     # The archive has a top-level folder; flatten it
+     entries = [e for e in bin_dir.iterdir() if e.is_dir()]
+     if entries:
+         top = entries[0]
+         for item in top.iterdir():
+             shutil.move(str(item), str(bin_dir / item.name))
+         top.rmdir()
+
+     tmp_path.unlink(missing_ok=True)
+
+     # Make executable on Unix
+     if plat != "windows":
+         for binary in ("recurl", "recurld"):
+             binary_path = bin_dir / binary
+             if binary_path.exists():
+                 binary_path.chmod(0o755)
+
+
+ class RecurlInstall(install):
+     """Standard install step followed by the platform binary download."""
+
+     def run(self):
+         install.run(self)
+         package_dir = Path(self.install_lib) / "recurl"
+         download_binary(package_dir)
+
+
+ setup(
+     cmdclass={"install": RecurlInstall},
+ )
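`tarfile.extractall` trusts the member paths stored in the archive, so an installer that unpacks a downloaded file may want to check them first. The sketch below is one possible guard, not what `setup.py` does: `safe_extract_tar` is a hypothetical helper that rejects members resolving outside the destination, rejects links, and uses the `data` extraction filter where the running Python provides it.

```python
import os
import tarfile
from pathlib import Path


def safe_extract_tar(archive: Path, dest: Path) -> None:
    """Hypothetical helper: extract a .tar.gz only if every member stays in dest."""
    dest = dest.resolve()
    with tarfile.open(archive, "r:gz") as tar:
        for member in tar.getmembers():
            target = (dest / member.name).resolve()
            # Reject absolute paths and ".." components that escape dest.
            if os.path.commonpath([str(dest), str(target)]) != str(dest):
                raise ValueError(f"unsafe path in archive: {member.name}")
            # Links could point outside dest, so this sketch refuses them.
            if member.issym() or member.islnk():
                raise ValueError(f"links are not allowed: {member.name}")
        if hasattr(tarfile, "data_filter"):
            # Extraction filters exist on newer (and patched older) Pythons.
            tar.extractall(path=dest, filter="data")
        else:
            tar.extractall(path=dest)
```

Checking every member before extracting anything means a rejected archive leaves the destination directory unchanged.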