webtools-cli 1.0.0__tar.gz → 1.0.3__tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- webtools_cli-1.0.3/PKG-INFO +162 -0
- webtools_cli-1.0.3/README.md +127 -0
- {webtools_cli-1.0.0 → webtools_cli-1.0.3}/pyproject.toml +3 -4
- {webtools_cli-1.0.0 → webtools_cli-1.0.3}/webtools/core.py +50 -28
- webtools_cli-1.0.3/webtools_cli.egg-info/PKG-INFO +162 -0
- webtools_cli-1.0.0/PKG-INFO +0 -110
- webtools_cli-1.0.0/README.md +0 -74
- webtools_cli-1.0.0/webtools_cli.egg-info/PKG-INFO +0 -110
- {webtools_cli-1.0.0 → webtools_cli-1.0.3}/LICENSE +0 -0
- {webtools_cli-1.0.0 → webtools_cli-1.0.3}/setup.cfg +0 -0
- {webtools_cli-1.0.0 → webtools_cli-1.0.3}/webtools/__init__.py +0 -0
- {webtools_cli-1.0.0 → webtools_cli-1.0.3}/webtools/__main__.py +0 -0
- {webtools_cli-1.0.0 → webtools_cli-1.0.3}/webtools/cli.py +0 -0
- {webtools_cli-1.0.0 → webtools_cli-1.0.3}/webtools/web/Web_Tools.png +0 -0
- {webtools_cli-1.0.0 → webtools_cli-1.0.3}/webtools/web/index.html +0 -0
- {webtools_cli-1.0.0 → webtools_cli-1.0.3}/webtools/web/script.js +0 -0
- {webtools_cli-1.0.0 → webtools_cli-1.0.3}/webtools/web/style.css +0 -0
- {webtools_cli-1.0.0 → webtools_cli-1.0.3}/webtools_cli.egg-info/SOURCES.txt +0 -0
- {webtools_cli-1.0.0 → webtools_cli-1.0.3}/webtools_cli.egg-info/dependency_links.txt +0 -0
- {webtools_cli-1.0.0 → webtools_cli-1.0.3}/webtools_cli.egg-info/entry_points.txt +0 -0
- {webtools_cli-1.0.0 → webtools_cli-1.0.3}/webtools_cli.egg-info/requires.txt +0 -0
- {webtools_cli-1.0.0 → webtools_cli-1.0.3}/webtools_cli.egg-info/top_level.txt +0 -0
webtools_cli-1.0.3/PKG-INFO
@@ -0,0 +1,162 @@
+Metadata-Version: 2.4
+Name: webtools-cli
+Version: 1.0.3
+Summary: Advanced Web Intelligence & Scraping Toolkit with CLI and Web UI
+Author: Abhinav Adarsh
+License-Expression: MIT
+Project-URL: Homepage, https://github.com/abhinavgautam08/webtools-cli
+Keywords: web-scraping,osint,seo,intelligence,cli
+Classifier: Development Status :: 4 - Beta
+Classifier: Environment :: Console
+Classifier: Intended Audience :: Developers
+Classifier: Programming Language :: Python :: 3
+Classifier: Programming Language :: Python :: 3.9
+Classifier: Programming Language :: Python :: 3.10
+Classifier: Programming Language :: Python :: 3.11
+Classifier: Programming Language :: Python :: 3.12
+Classifier: Topic :: Internet :: WWW/HTTP
+Requires-Python: >=3.9
+Description-Content-Type: text/markdown
+License-File: LICENSE
+Requires-Dist: flask
+Requires-Dist: requests
+Requires-Dist: beautifulsoup4
+Requires-Dist: qrcode
+Requires-Dist: opencv-python
+Requires-Dist: numpy
+Requires-Dist: textblob
+Requires-Dist: Pillow
+Requires-Dist: mtranslate
+Requires-Dist: colorama
+Requires-Dist: pyreadline3; platform_system == "Windows"
+Provides-Extra: playwright
+Requires-Dist: playwright; extra == "playwright"
+Dynamic: license-file
+
+# WebTools CLI
+
+[](https://pypi.org/project/webtools-cli/)
+[](https://github.com/abhinavgautam08/webtools-cli/blob/main/LICENSE)
+[](https://pypi.org/project/webtools-cli/)
+
+
+
+WebTools CLI is an advanced web intelligence suite for researchers, OSINT enthusiasts, and developers. It brings the power of deep web analysis and automated scraping directly into your terminal, bridging the gap between a high-speed **Terminal UI** and a feature-rich **Cyber-themed Dashboard**.
+
+---
+
+## 🚀 Why WebTools CLI?
+
+- **🎯 Stealth & Speed**: Smart proxy rotation and Turbo-Fetch logic for evasion and performance.
+- **🧠 AI-Powered**: Automated content summarization, sentiment analysis, and readability scoring.
+- **🔧 Security-Centric**: Built-in honeypot detection, threat leveling, and image forensic analysis.
+- **💻 Terminal-First**: Designed for power users who live in the command line.
+- **🛡️ Cross-Platform**: Works seamlessly on Windows, Linux, and macOS (with auto-download for Windows tunnels).
+- **🔌 SPA Ready**: Automatic Playwright fallback for JavaScript-heavy sites like LinkedIn/Instagram.
+
+---
+
+## 📦 Installation
+
+See the installation guide for recommended system specifications.
+
+### Quick Install
+
+Install globally via pip:
+
+```bash
+pip install webtools-cli
+```
+
+To upgrade to the latest version:
+
+```bash
+pip install webtools-cli --upgrade
+```
+
+### Optional Dependencies
+
+For Single Page Application (SPA) support:
+
+```bash
+playwright install chromium
+```
+
+---
+
+## 📋 Key Features
+
+### Advanced Scraping & Stealth
+- **Smart Proxy Rotation**: Automatically rotates User-Agents and Proxies to evade detection.
+- **Turbo-Fetch**: Parallel chunk downloads for large media (Videos/Images).
+- **Deep Crawl**: Recursive link mapping up to 3 levels deep.
+- **Headless Fallback**: Integrated Playwright support for auth-walled or SPA environments.
+
+### Intelligence & Security Analysis
+- **OSINT Toolkit**: Auto-extract emails, phones, locations, social media, and tech stacks.
+- **SEO Auditor**: Page score, heading hierarchy, link integrity, and image alt-text auditing.
+- **Image Forensics**: CLI-based Error Level Analysis (ELA) and AI-likelihood detection.
+- **Honeypot Detector**: Identifies hidden traps and anti-bot measures (Cloudflare/CAPTCHAs).
+
+### Modern Experience
+- **Matrix Background**: "Flickering Grid" animated dashboard (Canvas-based).
+- **Responsive Preview**: Live rendering scaling for desktop and mobile viewpoints.
+- **History & Stats**: Phase-by-phase performance tracking and historical session management.
+
+---
+
+## 🚀 Getting Started
+
+### Basic Usage
+
+#### Launch Interactive Menu
+```bash
+webtools
+```
+
+#### Non-Interactive Script Mode
+```bash
+python -m webtools
+```
+
+### Slash Commands Reference
+
+Navigate the suite using quick terminal commands:
+
+| Command | Alias | Description |
+|---------|-------|-------------|
+| `/web` | `/w` | Launch **Web UI** (Cloudflare Tunnel + QR) |
+| `/cli` | `/c` | Launch **CLI Intelligence** scan |
+| `/image` | `/i` | **Image Forensics** & AI Likelihood |
+| `/history`| `/hi`| View and manage scan history |
+| `/help` | `/h` | Show full command documentation |
+| `/clear` | - | Purge all locally scraped data |
+| `/quit` | `/q` | Exit the application |
+
+---
+
+## ☁️ Deployment Options
+
+- **Local Development**: Run on your machine with a generated QR code for mobile access.
+- **Cloud Tunnels**: Automatic `cloudflared` integration to expose your UI globally.
+- **Google Colab**: Compatible with Colab for cloud-based scraping (see badge above).
+
+---
+
+## 🤝 Resources & Support
+
+- **[GitHub Repository](https://github.com/abhinavgautam08/webtools-cli)** - Source code and updates.
+- **[Issue Tracker](https://github.com/abhinavgautam08/webtools-cli/issues)** - Report bugs or request features.
+- **[License](./LICENSE)** - MIT License.
+
+---
+
+## ⚖️ Legal
+
+This tool is for **educational and testing purposes only**. Always respect `robots.txt` and the Terms of Service of the websites you scrape. Neither the author nor the contributors are responsible for any misuse of this tool.
+
+---
+
+<p align="center">
+Built with ❤️ by <strong>Abhinav Adarsh</strong> and the open source community
+</p>
webtools_cli-1.0.3/README.md
@@ -0,0 +1,127 @@
+# WebTools CLI
+
+[](https://pypi.org/project/webtools-cli/)
+[](https://github.com/abhinavgautam08/webtools-cli/blob/main/LICENSE)
+[](https://pypi.org/project/webtools-cli/)
+
+
+
+WebTools CLI is an advanced web intelligence suite for researchers, OSINT enthusiasts, and developers. It brings the power of deep web analysis and automated scraping directly into your terminal, bridging the gap between a high-speed **Terminal UI** and a feature-rich **Cyber-themed Dashboard**.
+
+---
+
+## 🚀 Why WebTools CLI?
+
+- **🎯 Stealth & Speed**: Smart proxy rotation and Turbo-Fetch logic for evasion and performance.
+- **🧠 AI-Powered**: Automated content summarization, sentiment analysis, and readability scoring.
+- **🔧 Security-Centric**: Built-in honeypot detection, threat leveling, and image forensic analysis.
+- **💻 Terminal-First**: Designed for power users who live in the command line.
+- **🛡️ Cross-Platform**: Works seamlessly on Windows, Linux, and macOS (with auto-download for Windows tunnels).
+- **🔌 SPA Ready**: Automatic Playwright fallback for JavaScript-heavy sites like LinkedIn/Instagram.
+
+---
+
+## 📦 Installation
+
+See the installation guide for recommended system specifications.
+
+### Quick Install
+
+Install globally via pip:
+
+```bash
+pip install webtools-cli
+```
+
+To upgrade to the latest version:
+
+```bash
+pip install webtools-cli --upgrade
+```
+
+### Optional Dependencies
+
+For Single Page Application (SPA) support:
+
+```bash
+playwright install chromium
+```
+
+---
+
+## 📋 Key Features
+
+### Advanced Scraping & Stealth
+- **Smart Proxy Rotation**: Automatically rotates User-Agents and Proxies to evade detection.
+- **Turbo-Fetch**: Parallel chunk downloads for large media (Videos/Images).
+- **Deep Crawl**: Recursive link mapping up to 3 levels deep.
+- **Headless Fallback**: Integrated Playwright support for auth-walled or SPA environments.
+
+### Intelligence & Security Analysis
+- **OSINT Toolkit**: Auto-extract emails, phones, locations, social media, and tech stacks.
+- **SEO Auditor**: Page score, heading hierarchy, link integrity, and image alt-text auditing.
+- **Image Forensics**: CLI-based Error Level Analysis (ELA) and AI-likelihood detection.
+- **Honeypot Detector**: Identifies hidden traps and anti-bot measures (Cloudflare/CAPTCHAs).
+
+### Modern Experience
+- **Matrix Background**: "Flickering Grid" animated dashboard (Canvas-based).
+- **Responsive Preview**: Live rendering scaling for desktop and mobile viewpoints.
+- **History & Stats**: Phase-by-phase performance tracking and historical session management.
+
+---
+
+## 🚀 Getting Started
+
+### Basic Usage
+
+#### Launch Interactive Menu
+```bash
+webtools
+```
+
+#### Non-Interactive Script Mode
+```bash
+python -m webtools
+```
+
+### Slash Commands Reference
+
+Navigate the suite using quick terminal commands:
+
+| Command | Alias | Description |
+|---------|-------|-------------|
+| `/web` | `/w` | Launch **Web UI** (Cloudflare Tunnel + QR) |
+| `/cli` | `/c` | Launch **CLI Intelligence** scan |
+| `/image` | `/i` | **Image Forensics** & AI Likelihood |
+| `/history`| `/hi`| View and manage scan history |
+| `/help` | `/h` | Show full command documentation |
+| `/clear` | - | Purge all locally scraped data |
+| `/quit` | `/q` | Exit the application |
+
+---
+
+## ☁️ Deployment Options
+
+- **Local Development**: Run on your machine with a generated QR code for mobile access.
+- **Cloud Tunnels**: Automatic `cloudflared` integration to expose your UI globally.
+- **Google Colab**: Compatible with Colab for cloud-based scraping (see badge above).
+
+---
+
+## 🤝 Resources & Support
+
+- **[GitHub Repository](https://github.com/abhinavgautam08/webtools-cli)** - Source code and updates.
+- **[Issue Tracker](https://github.com/abhinavgautam08/webtools-cli/issues)** - Report bugs or request features.
+- **[License](./LICENSE)** - MIT License.
+
+---
+
+## ⚖️ Legal
+
+This tool is for **educational and testing purposes only**. Always respect `robots.txt` and the Terms of Service of the websites you scrape. Neither the author nor the contributors are responsible for any misuse of this tool.
+
+---
+
+<p align="center">
+Built with ❤️ by <strong>Abhinav Adarsh</strong> and the open source community
+</p>
{webtools_cli-1.0.0 → webtools_cli-1.0.3}/pyproject.toml
@@ -4,10 +4,10 @@ build-backend = "setuptools.build_meta"
 
 [project]
 name = "webtools-cli"
-version = "1.0.
+version = "1.0.3"
 description = "Advanced Web Intelligence & Scraping Toolkit with CLI and Web UI"
 readme = "README.md"
-license =
+license = "MIT"
 requires-python = ">=3.9"
 authors = [
     {name = "Abhinav Adarsh"},
@@ -17,7 +17,6 @@ classifiers = [
     "Development Status :: 4 - Beta",
     "Environment :: Console",
     "Intended Audience :: Developers",
-    "License :: OSI Approved :: MIT License",
     "Programming Language :: Python :: 3",
     "Programming Language :: Python :: 3.9",
     "Programming Language :: Python :: 3.10",
@@ -46,7 +45,7 @@ playwright = ["playwright"]
 webtools = "webtools.cli:main"
 
 [project.urls]
-Homepage = "https://github.com/abhinavgautam08/webtools"
+Homepage = "https://github.com/abhinavgautam08/webtools-cli"
 
 [tool.setuptools.packages.find]
 include = ["webtools*"]
{webtools_cli-1.0.0 → webtools_cli-1.0.3}/webtools/core.py
@@ -4,7 +4,8 @@ sys.dont_write_bytecode = True
 # --- PACKAGE PATHS ---
 PACKAGE_DIR = os.path.dirname(os.path.abspath(__file__))
 DATA_DIR = os.path.join(os.path.expanduser('~'), '.webtools')
-os.
+SCRAPED_DIR = os.path.join(DATA_DIR, 'scraped')
+os.makedirs(SCRAPED_DIR, exist_ok=True)
 try:
     from colorama import init, Fore, Style
     init(autoreset=True)
@@ -97,9 +98,8 @@ log.setLevel(logging.ERROR)
 urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)
 
 # Setting up directories
-os.makedirs('
-os.makedirs('
-os.makedirs('webfiles/scraped/videos', exist_ok=True)
+os.makedirs(os.path.join(SCRAPED_DIR, 'images'), exist_ok=True)
+os.makedirs(os.path.join(SCRAPED_DIR, 'videos'), exist_ok=True)
 
 # --- PERFORMANCE AUDITOR ---
 class PerformanceTracker:
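The two hunks above move scraped output from a path relative to the working directory (`webfiles/scraped/...`) to a per-user directory under `~/.webtools/scraped`. The following is a minimal sketch of that layout, not the package's code: the helper name `ensure_scraped_dirs` and its `base` parameter are assumptions introduced here so the example can run against a temporary directory.

```python
import os
import tempfile


def ensure_scraped_dirs(base=None):
    """Create <base>/.webtools/scraped/{images,videos} and return the scraped root.

    `base` defaults to the user's home directory, mirroring DATA_DIR in the diff.
    The parameter itself is hypothetical, added only to keep the sketch testable.
    """
    base = base or os.path.expanduser('~')
    scraped_dir = os.path.join(base, '.webtools', 'scraped')
    for sub in ('images', 'videos'):
        # exist_ok=True makes repeated calls harmless
        os.makedirs(os.path.join(scraped_dir, sub), exist_ok=True)
    return scraped_dir


if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as tmp:
        print(ensure_scraped_dirs(tmp))
```

Anchoring output under the home directory means the tool should write to the same place regardless of where `webtools` is launched from, which a relative path would not guarantee.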
@@ -385,7 +385,7 @@ def serve_favicon():
 
 @app.route('/download/<path:filename>')
 def serve_scraped_file(filename):
-    return send_from_directory(
+    return send_from_directory(SCRAPED_DIR, filename)
 
 def scrape_with_playwright(url, proxy=None):
     if not PLAYWRIGHT_AVAILABLE:
@@ -561,9 +561,24 @@ def detect_tech_stack(soup, response):
 
     return list(stack)
 
+def ensure_textblob_corpora():
+    """Ensure necessary NLTK corpora for TextBlob are downloaded"""
+    try:
+        import nltk
+        needed = ['punkt', 'brown', 'averaged_perceptron_tagger']
+        for corpus in needed:
+            try:
+                nltk.data.find(f'tokenizers/{corpus}' if corpus == 'punkt' else f'corpora/{corpus}')
+            except (LookupError, AttributeError):
+                print(f"Downloading required AI data: {corpus}...")
+                nltk.download(corpus, quiet=True)
+    except:
+        pass
+
 def analyze_ai_content(text):
     """Analyze text (sentiment, summary, readability, and keywords)"""
     try:
+        ensure_textblob_corpora()
         from textblob import TextBlob
         import re
 
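One detail in `ensure_textblob_corpora` is worth a closer look: the lookup path is `tokenizers/...` for `punkt` and `corpora/...` for everything else, so the tagger is searched for as `corpora/averaged_perceptron_tagger`. As far as I know, NLTK keeps taggers under `taggers/`, in which case that lookup would miss every time and `nltk.download` would be called on each run. The sketch below shows one way to map each resource to its category; `RESOURCE_CATEGORY`, `resource_path`, and `missing_resources` are hypothetical names, and the category table should be checked against the installed NLTK version.

```python
# Category prefixes as I understand NLTK's data layout (an assumption to verify).
RESOURCE_CATEGORY = {
    'punkt': 'tokenizers',
    'brown': 'corpora',
    'averaged_perceptron_tagger': 'taggers',
}


def resource_path(name):
    """Build the lookup path for a resource, defaulting to the corpora category."""
    return f"{RESOURCE_CATEGORY.get(name, 'corpora')}/{name}"


def missing_resources(names, find):
    """Return the names for which find(path) raises LookupError.

    `find` is injected so the sketch runs without NLTK installed;
    in real use it would be nltk.data.find.
    """
    missing = []
    for name in names:
        try:
            find(resource_path(name))
        except LookupError:
            missing.append(name)
    return missing
```

In real use, one would pass `nltk.data.find` as `find` and call `nltk.download(name, quiet=True)` only for the names returned.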
@@ -816,9 +831,9 @@ def execute_scrape_logic(url, fetch_images=False, fetch_videos=False, crawl_dept
                     self.headers = {}
             response = MockResponse(pw_html)
         else:
-            return
+            return {'success': False, 'error': f'Request failed and Playwright fallback failed: {str(e)}'}
     else:
-        return
+        return {'success': False, 'error': f'Request failed: {str(e)}'}
 
     soup = BeautifulSoup(response.text, 'html.parser')
     perf_tracker.record_phase("HTML Parsing")
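In 1.0.3 both failure branches return a plain dict with `success` and `error` keys (the 1.0.0 return statements are truncated in this view, so their old shape is unknown). A minimal sketch of how a caller might produce and consume such a result follows; `failure`, `run_scrape`, and the injected `fetch` callable are hypothetical and only illustrate the pattern.

```python
def failure(message, exc):
    """Build an error result in the same shape as the dicts returned in the diff."""
    return {'success': False, 'error': f'{message}: {exc}'}


def run_scrape(fetch, url):
    """Call fetch(url) and wrap the outcome in a result dict instead of raising."""
    try:
        html = fetch(url)
    except Exception as e:
        return failure('Request failed', e)
    return {'success': True, 'html': html}


if __name__ == '__main__':
    def broken_fetch(url):
        raise ConnectionError('host unreachable')

    result = run_scrape(broken_fetch, 'https://example.com')
    if not result['success']:
        print(result['error'])
```

Returning a dict rather than a framework response lets the CLI and the web routes share one code path and decide separately how to present the error.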
@@ -990,7 +1005,7 @@ def execute_scrape_logic(url, fetch_images=False, fetch_videos=False, crawl_dept
     else:
         filename = f"{base}_{uuid.uuid4().hex[:8]}{ext}"
 
-    filepath =
+    filepath = os.path.join(SCRAPED_DIR, 'videos', filename)
 
     # Try TURBO FETCH first
     if not download_file_turbo(v_url, filepath):
@@ -1161,7 +1176,7 @@ def execute_scrape_logic(url, fetch_images=False, fetch_videos=False, crawl_dept
 
     filename = f"{uuid.uuid4().hex[:8]}_{filename}"
 
-    filepath =
+    filepath = os.path.join(SCRAPED_DIR, 'images', filename)
     with open(filepath, 'wb') as f:
         f.write(content)
     return (img_src, f'images/{filename}', f'/download/images/{filename}', image_hash, filepath)
@@ -1212,7 +1227,7 @@ def execute_scrape_logic(url, fetch_images=False, fetch_videos=False, crawl_dept
     # Collect image tasks
     image_tasks = []
     if fetch_images:
-        os.makedirs('
+        os.makedirs(os.path.join(SCRAPED_DIR, 'images'), exist_ok=True)
 
         # Identify video posters to exclude
         poster_blacklist = set()
@@ -1277,7 +1292,7 @@ def execute_scrape_logic(url, fetch_images=False, fetch_videos=False, crawl_dept
     # Collect video tasks
     video_tasks = []
     if fetch_videos:
-        os.makedirs('
+        os.makedirs(os.path.join(SCRAPED_DIR, 'videos'), exist_ok=True)
         # -------------------------------------------------------------------------
         # RESOURCE SNIFFER (Deep Scan)
         # Scans raw HTML/JS for hidden video links (mp4, m3u8, etc)
@@ -1461,11 +1476,11 @@ def execute_scrape_logic(url, fetch_images=False, fetch_videos=False, crawl_dept
     body.append(js_script)
     # Save files
     html_content = str(soup)
-    with open('
+    with open(os.path.join(SCRAPED_DIR, 'index.html'), 'w', encoding='utf-8') as f:
         f.write(html_content)
-    with open('
+    with open(os.path.join(SCRAPED_DIR, 'style.css'), 'w', encoding='utf-8') as f:
         f.write('\n\n'.join(css_content) or '/* No CSS found */')
-    with open('
+    with open(os.path.join(SCRAPED_DIR, 'script.js'), 'w', encoding='utf-8') as f:
         f.write('\n\n'.join(js_content) or '// No JS found */')
     # Calculate stats
     def get_size(content):
@@ -1767,12 +1782,12 @@ def api_save():
 @app.route('/api/download-zip')
 def download_zip():
     try:
-        zip_path = '
+        zip_path = os.path.join(DATA_DIR, 'scraped_files.zip')
         with zipfile.ZipFile(zip_path, 'w', zipfile.ZIP_DEFLATED) as zipf:
-            for root, dirs, files in os.walk(
+            for root, dirs, files in os.walk(SCRAPED_DIR):
                 for file in files:
                     file_path = os.path.join(root, file)
-                    arcname = os.path.relpath(file_path,
+                    arcname = os.path.relpath(file_path, SCRAPED_DIR)
                     zipf.write(file_path, arcname)
 
         return send_file(zip_path, as_attachment=True, download_name='scraped_files.zip')
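The `download_zip` hunk walks `SCRAPED_DIR` and stores each file under a path relative to that directory, while writing the archive itself to `DATA_DIR`, which is outside the tree being walked. A self-contained sketch of the same technique using only the standard library is shown below; the function name `zip_directory` is an assumption, not something defined in the package.

```python
import os
import zipfile


def zip_directory(src_dir, zip_path):
    """Archive every file under src_dir, storing paths relative to src_dir."""
    with zipfile.ZipFile(zip_path, 'w', zipfile.ZIP_DEFLATED) as zipf:
        for root, _dirs, files in os.walk(src_dir):
            for name in files:
                file_path = os.path.join(root, name)
                # relpath strips the src_dir prefix so the archive has no absolute paths
                zipf.write(file_path, os.path.relpath(file_path, src_dir))
    return zip_path
```

Keeping `zip_path` outside `src_dir` avoids the archive picking itself up during the walk.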
@@ -1781,12 +1796,10 @@ def download_zip():
 
 def clear_scraped_data():
     try:
-
-
-
-        os.makedirs('
-        os.makedirs('webfiles/scraped/images', exist_ok=True)
-        os.makedirs('webfiles/scraped/videos', exist_ok=True)
+        if os.path.exists(SCRAPED_DIR):
+            shutil.rmtree(SCRAPED_DIR)
+        os.makedirs(os.path.join(SCRAPED_DIR, 'images'), exist_ok=True)
+        os.makedirs(os.path.join(SCRAPED_DIR, 'videos'), exist_ok=True)
         return True
     except Exception as e:
         print(f"Cleanup Error: {e}")
@@ -2156,11 +2169,20 @@ def start_cloudflare_tunnel(port):
     # Choose the executable based on the OS
     cf_executable = os.path.join(DATA_DIR, 'cloudflared.exe') if os.name == 'nt' else os.path.join(DATA_DIR, 'cloudflared')
 
-    # If missing
-    if not os.path.exists(cf_executable)
-        print("Downloading cloudflared...")
-
-
+    # If missing, download it
+    if not os.path.exists(cf_executable):
+        print(f"Downloading cloudflared for {os.name}...")
+        if os.name == 'nt':
+            # Windows binary download URL
+            win_url = "https://github.com/cloudflare/cloudflared/releases/latest/download/cloudflared-windows-amd64.exe"
+            resp = requests.get(win_url, stream=True)
+            with open(cf_executable, 'wb') as f:
+                for chunk in resp.iter_content(chunk_size=8192):
+                    f.write(chunk)
+        else:
+            # Linux binary download URL
+            subprocess.run(['wget', '-q', '-O', cf_executable, 'https://github.com/cloudflare/cloudflared/releases/latest/download/cloudflared-linux-amd64'])
+            subprocess.run(['chmod', '+x', cf_executable])
 
     process = subprocess.Popen(
         [cf_executable, 'tunnel', '--protocol', 'http2', '--url', f'http://127.0.0.1:{port}'],
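The download logic above branches on `os.name`, so every non-Windows system, macOS included, takes the `else` branch and fetches the Linux amd64 binary. A hedged sketch of a more explicit selection follows. It uses only the two asset URLs that appear in the diff; the names `RELEASE_BASE`, `ASSETS`, and `cloudflared_asset_url` are hypothetical, and failing on unlisted platforms is a design choice of this sketch rather than the package's behaviour.

```python
import platform

# Base URL and asset names copied from the diff; no other assets are assumed to exist.
RELEASE_BASE = "https://github.com/cloudflare/cloudflared/releases/latest/download/"
ASSETS = {
    "Windows": "cloudflared-windows-amd64.exe",
    "Linux": "cloudflared-linux-amd64",
}


def cloudflared_asset_url(system=None):
    """Return the download URL for the given platform.system() value.

    Raises RuntimeError for platforms without a configured asset instead of
    silently falling back to the Linux binary.
    """
    system = system or platform.system()
    try:
        return RELEASE_BASE + ASSETS[system]
    except KeyError:
        raise RuntimeError(f"No cloudflared asset configured for {system!r}") from None
```

Failing loudly on an unknown platform surfaces the problem at download time rather than when the tunnel process fails to start.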
webtools_cli-1.0.3/webtools_cli.egg-info/PKG-INFO
@@ -0,0 +1,162 @@
+Metadata-Version: 2.4
+Name: webtools-cli
+Version: 1.0.3
+Summary: Advanced Web Intelligence & Scraping Toolkit with CLI and Web UI
+Author: Abhinav Adarsh
+License-Expression: MIT
+Project-URL: Homepage, https://github.com/abhinavgautam08/webtools-cli
+Keywords: web-scraping,osint,seo,intelligence,cli
+Classifier: Development Status :: 4 - Beta
+Classifier: Environment :: Console
+Classifier: Intended Audience :: Developers
+Classifier: Programming Language :: Python :: 3
+Classifier: Programming Language :: Python :: 3.9
+Classifier: Programming Language :: Python :: 3.10
+Classifier: Programming Language :: Python :: 3.11
+Classifier: Programming Language :: Python :: 3.12
+Classifier: Topic :: Internet :: WWW/HTTP
+Requires-Python: >=3.9
+Description-Content-Type: text/markdown
+License-File: LICENSE
+Requires-Dist: flask
+Requires-Dist: requests
+Requires-Dist: beautifulsoup4
+Requires-Dist: qrcode
+Requires-Dist: opencv-python
+Requires-Dist: numpy
+Requires-Dist: textblob
+Requires-Dist: Pillow
+Requires-Dist: mtranslate
+Requires-Dist: colorama
+Requires-Dist: pyreadline3; platform_system == "Windows"
+Provides-Extra: playwright
+Requires-Dist: playwright; extra == "playwright"
+Dynamic: license-file
+
+# WebTools CLI
+
+[](https://pypi.org/project/webtools-cli/)
+[](https://github.com/abhinavgautam08/webtools-cli/blob/main/LICENSE)
+[](https://pypi.org/project/webtools-cli/)
+
+
+
+WebTools CLI is an advanced web intelligence suite for researchers, OSINT enthusiasts, and developers. It brings the power of deep web analysis and automated scraping directly into your terminal, bridging the gap between a high-speed **Terminal UI** and a feature-rich **Cyber-themed Dashboard**.
+
+---
+
+## 🚀 Why WebTools CLI?
+
+- **🎯 Stealth & Speed**: Smart proxy rotation and Turbo-Fetch logic for evasion and performance.
+- **🧠 AI-Powered**: Automated content summarization, sentiment analysis, and readability scoring.
+- **🔧 Security-Centric**: Built-in honeypot detection, threat leveling, and image forensic analysis.
+- **💻 Terminal-First**: Designed for power users who live in the command line.
+- **🛡️ Cross-Platform**: Works seamlessly on Windows, Linux, and macOS (with auto-download for Windows tunnels).
+- **🔌 SPA Ready**: Automatic Playwright fallback for JavaScript-heavy sites like LinkedIn/Instagram.
+
+---
+
+## 📦 Installation
+
+See the installation guide for recommended system specifications.
+
+### Quick Install
+
+Install globally via pip:
+
+```bash
+pip install webtools-cli
+```
+
+To upgrade to the latest version:
+
+```bash
+pip install webtools-cli --upgrade
+```
+
+### Optional Dependencies
+
+For Single Page Application (SPA) support:
+
+```bash
+playwright install chromium
+```
+
+---
+
+## 📋 Key Features
+
+### Advanced Scraping & Stealth
+- **Smart Proxy Rotation**: Automatically rotates User-Agents and Proxies to evade detection.
+- **Turbo-Fetch**: Parallel chunk downloads for large media (Videos/Images).
+- **Deep Crawl**: Recursive link mapping up to 3 levels deep.
+- **Headless Fallback**: Integrated Playwright support for auth-walled or SPA environments.
+
+### Intelligence & Security Analysis
+- **OSINT Toolkit**: Auto-extract emails, phones, locations, social media, and tech stacks.
+- **SEO Auditor**: Page score, heading hierarchy, link integrity, and image alt-text auditing.
+- **Image Forensics**: CLI-based Error Level Analysis (ELA) and AI-likelihood detection.
+- **Honeypot Detector**: Identifies hidden traps and anti-bot measures (Cloudflare/CAPTCHAs).
+
+### Modern Experience
+- **Matrix Background**: "Flickering Grid" animated dashboard (Canvas-based).
+- **Responsive Preview**: Live rendering scaling for desktop and mobile viewpoints.
+- **History & Stats**: Phase-by-phase performance tracking and historical session management.
+
+---
+
+## 🚀 Getting Started
+
+### Basic Usage
+
+#### Launch Interactive Menu
+```bash
+webtools
+```
+
+#### Non-Interactive Script Mode
+```bash
+python -m webtools
+```
+
+### Slash Commands Reference
+
+Navigate the suite using quick terminal commands:
+
+| Command | Alias | Description |
+|---------|-------|-------------|
+| `/web` | `/w` | Launch **Web UI** (Cloudflare Tunnel + QR) |
+| `/cli` | `/c` | Launch **CLI Intelligence** scan |
+| `/image` | `/i` | **Image Forensics** & AI Likelihood |
+| `/history`| `/hi`| View and manage scan history |
+| `/help` | `/h` | Show full command documentation |
+| `/clear` | - | Purge all locally scraped data |
+| `/quit` | `/q` | Exit the application |
+
+---
+
+## ☁️ Deployment Options
+
+- **Local Development**: Run on your machine with a generated QR code for mobile access.
+- **Cloud Tunnels**: Automatic `cloudflared` integration to expose your UI globally.
+- **Google Colab**: Compatible with Colab for cloud-based scraping (see badge above).
+
+---
+
+## 🤝 Resources & Support
+
+- **[GitHub Repository](https://github.com/abhinavgautam08/webtools-cli)** - Source code and updates.
+- **[Issue Tracker](https://github.com/abhinavgautam08/webtools-cli/issues)** - Report bugs or request features.
+- **[License](./LICENSE)** - MIT License.
+
+---
+
+## ⚖️ Legal
+
+This tool is for **educational and testing purposes only**. Always respect `robots.txt` and the Terms of Service of the websites you scrape. Neither the author nor the contributors are responsible for any misuse of this tool.
+
+---
+
+<p align="center">
+Built with ❤️ by <strong>Abhinav Adarsh</strong> and the open source community
+</p>
webtools_cli-1.0.0/PKG-INFO
DELETED
@@ -1,110 +0,0 @@
-Metadata-Version: 2.4
-Name: webtools-cli
-Version: 1.0.0
-Summary: Advanced Web Intelligence & Scraping Toolkit with CLI and Web UI
-Author: Abhinav Adarsh
-License: MIT
-Project-URL: Homepage, https://github.com/abhinavgautam08/webtools
-Keywords: web-scraping,osint,seo,intelligence,cli
-Classifier: Development Status :: 4 - Beta
-Classifier: Environment :: Console
-Classifier: Intended Audience :: Developers
-Classifier: License :: OSI Approved :: MIT License
-Classifier: Programming Language :: Python :: 3
-Classifier: Programming Language :: Python :: 3.9
-Classifier: Programming Language :: Python :: 3.10
-Classifier: Programming Language :: Python :: 3.11
-Classifier: Programming Language :: Python :: 3.12
-Classifier: Topic :: Internet :: WWW/HTTP
-Requires-Python: >=3.9
-Description-Content-Type: text/markdown
-License-File: LICENSE
-Requires-Dist: flask
-Requires-Dist: requests
-Requires-Dist: beautifulsoup4
-Requires-Dist: qrcode
-Requires-Dist: opencv-python
-Requires-Dist: numpy
-Requires-Dist: textblob
-Requires-Dist: Pillow
-Requires-Dist: mtranslate
-Requires-Dist: colorama
-Requires-Dist: pyreadline3; platform_system == "Windows"
-Provides-Extra: playwright
-Requires-Dist: playwright; extra == "playwright"
-Dynamic: license-file
-
-<p align="center">
-<img src="Web_Tools.png" alt="WebTools CLI" width="180">
-</p>
-
-<h1 align="center">WebTools CLI</h1>
-
-<p align="center">
-<strong>Advanced Web Intelligence & Scraping Toolkit</strong><br>
-<em>OSINT - SEO - AI Analysis - Security Scanner</em>
-</p>
-
-<p align="center">
-<img src="https://img.shields.io/badge/python-3.9+-blue?logo=python&logoColor=white" alt="Python">
-<img src="https://img.shields.io/badge/license-MIT-green" alt="License">
-<img src="https://img.shields.io/badge/version-1.0.0-cyan" alt="Version">
-</p>
-
----
-
-## Install
-
-```bash
-pip install webtools-cli
-```
-
-## Usage
-
-```bash
-# Launch CLI
-webtools
-
-# Or via Python module
-python -m webtools
-```
-
-## Features
-
-| Feature | Description |
-|---------|-------------|
-| **Web Mode** | Full web UI with Cloudflare tunnel + QR code sharing |
-| **CLI Intelligence** | Deep scan any URL from terminal |
-| **Security Scanner** | Threat detection, honeypot traps, CSRF checks |
-| **SEO Analyzer** | Score, headings, broken links, image audit |
-| **AI Analysis** | Sentiment, readability, keywords, summarization |
-| **OSINT** | Emails, phones, locations, social media, tech stack |
-| **Smart Media** | Image quality filter + video deep-scan with sniffer |
-| **Proxy Intelligence** | Smart proxy rotation with learning algorithm |
-| **Playwright Fallback** | Handles SPAs and auth walls automatically |
-| **Performance Tracker** | Phase-by-phase timing with historical stats |
-
-## CLI Commands
-
-| Command | Description |
-|---------|-------------|
-| `/web` or `/w` | Launch Web UI mode |
-| `/cli` or `/c` | Launch CLI Intelligence mode |
-| `/image` or `/i` | Image Forensics & AI Detection |
-| `/help` or `/h` | Show all commands |
-| `/history` or `/hi` | View scan history |
-| `/clear` | Purge scraped data |
-| `/quit` or `/q` | Exit |
-
-## Requirements
-
-- Python 3.9+
-- Optional: `playwright` for SPA/auth wall bypass
-
-## Author
-
-**Abhinav Adarsh**
-
-## License
-
-MIT
webtools_cli-1.0.0/README.md
DELETED
@@ -1,74 +0,0 @@
-<p align="center">
-<img src="Web_Tools.png" alt="WebTools CLI" width="180">
-</p>
-
-<h1 align="center">WebTools CLI</h1>
-
-<p align="center">
-<strong>Advanced Web Intelligence & Scraping Toolkit</strong><br>
-<em>OSINT - SEO - AI Analysis - Security Scanner</em>
-</p>
-
-<p align="center">
-<img src="https://img.shields.io/badge/python-3.9+-blue?logo=python&logoColor=white" alt="Python">
-<img src="https://img.shields.io/badge/license-MIT-green" alt="License">
-<img src="https://img.shields.io/badge/version-1.0.0-cyan" alt="Version">
-</p>
-
----
-
-## Install
-
-```bash
-pip install webtools-cli
-```
-
-## Usage
-
-```bash
-# Launch CLI
-webtools
-
-# Or via Python module
-python -m webtools
-```
-
-## Features
-
-| Feature | Description |
-|---------|-------------|
-| **Web Mode** | Full web UI with Cloudflare tunnel + QR code sharing |
-| **CLI Intelligence** | Deep scan any URL from terminal |
-| **Security Scanner** | Threat detection, honeypot traps, CSRF checks |
-| **SEO Analyzer** | Score, headings, broken links, image audit |
-| **AI Analysis** | Sentiment, readability, keywords, summarization |
-| **OSINT** | Emails, phones, locations, social media, tech stack |
-| **Smart Media** | Image quality filter + video deep-scan with sniffer |
-| **Proxy Intelligence** | Smart proxy rotation with learning algorithm |
-| **Playwright Fallback** | Handles SPAs and auth walls automatically |
-| **Performance Tracker** | Phase-by-phase timing with historical stats |
-
-## CLI Commands
-
-| Command | Description |
-|---------|-------------|
-| `/web` or `/w` | Launch Web UI mode |
-| `/cli` or `/c` | Launch CLI Intelligence mode |
-| `/image` or `/i` | Image Forensics & AI Detection |
-| `/help` or `/h` | Show all commands |
-| `/history` or `/hi` | View scan history |
-| `/clear` | Purge scraped data |
-| `/quit` or `/q` | Exit |
-
-## Requirements
-
-- Python 3.9+
-- Optional: `playwright` for SPA/auth wall bypass
-
-## Author
-
-**Abhinav Adarsh**
-
-## License
-
-MIT
webtools_cli-1.0.0/webtools_cli.egg-info/PKG-INFO
DELETED
@@ -1,110 +0,0 @@
-Metadata-Version: 2.4
-Name: webtools-cli
-Version: 1.0.0
-Summary: Advanced Web Intelligence & Scraping Toolkit with CLI and Web UI
-Author: Abhinav Adarsh
-License: MIT
-Project-URL: Homepage, https://github.com/abhinavgautam08/webtools
-Keywords: web-scraping,osint,seo,intelligence,cli
-Classifier: Development Status :: 4 - Beta
-Classifier: Environment :: Console
-Classifier: Intended Audience :: Developers
-Classifier: License :: OSI Approved :: MIT License
-Classifier: Programming Language :: Python :: 3
-Classifier: Programming Language :: Python :: 3.9
-Classifier: Programming Language :: Python :: 3.10
-Classifier: Programming Language :: Python :: 3.11
-Classifier: Programming Language :: Python :: 3.12
-Classifier: Topic :: Internet :: WWW/HTTP
-Requires-Python: >=3.9
-Description-Content-Type: text/markdown
-License-File: LICENSE
-Requires-Dist: flask
-Requires-Dist: requests
-Requires-Dist: beautifulsoup4
-Requires-Dist: qrcode
-Requires-Dist: opencv-python
-Requires-Dist: numpy
-Requires-Dist: textblob
-Requires-Dist: Pillow
-Requires-Dist: mtranslate
-Requires-Dist: colorama
-Requires-Dist: pyreadline3; platform_system == "Windows"
-Provides-Extra: playwright
-Requires-Dist: playwright; extra == "playwright"
-Dynamic: license-file
-
-<p align="center">
-<img src="Web_Tools.png" alt="WebTools CLI" width="180">
-</p>
-
-<h1 align="center">WebTools CLI</h1>
-
-<p align="center">
-<strong>Advanced Web Intelligence & Scraping Toolkit</strong><br>
-<em>OSINT - SEO - AI Analysis - Security Scanner</em>
-</p>
-
-<p align="center">
-<img src="https://img.shields.io/badge/python-3.9+-blue?logo=python&logoColor=white" alt="Python">
-<img src="https://img.shields.io/badge/license-MIT-green" alt="License">
-<img src="https://img.shields.io/badge/version-1.0.0-cyan" alt="Version">
-</p>
-
----
-
-## Install
-
-```bash
-pip install webtools-cli
-```
-
-## Usage
-
-```bash
-# Launch CLI
-webtools
-
-# Or via Python module
-python -m webtools
-```
-
-## Features
-
-| Feature | Description |
-|---------|-------------|
-| **Web Mode** | Full web UI with Cloudflare tunnel + QR code sharing |
-| **CLI Intelligence** | Deep scan any URL from terminal |
-| **Security Scanner** | Threat detection, honeypot traps, CSRF checks |
-| **SEO Analyzer** | Score, headings, broken links, image audit |
-| **AI Analysis** | Sentiment, readability, keywords, summarization |
-| **OSINT** | Emails, phones, locations, social media, tech stack |
-| **Smart Media** | Image quality filter + video deep-scan with sniffer |
-| **Proxy Intelligence** | Smart proxy rotation with learning algorithm |
-| **Playwright Fallback** | Handles SPAs and auth walls automatically |
-| **Performance Tracker** | Phase-by-phase timing with historical stats |
-
-## CLI Commands
-
-| Command | Description |
-|---------|-------------|
-| `/web` or `/w` | Launch Web UI mode |
-| `/cli` or `/c` | Launch CLI Intelligence mode |
-| `/image` or `/i` | Image Forensics & AI Detection |
-| `/help` or `/h` | Show all commands |
-| `/history` or `/hi` | View scan history |
-| `/clear` | Purge scraped data |
-| `/quit` or `/q` | Exit |
-
-## Requirements
-
-- Python 3.9+
-- Optional: `playwright` for SPA/auth wall bypass
-
-## Author
-
-**Abhinav Adarsh**
-
-## License
-
-MIT
File without changes