plethora-1.0.0.tar.gz

plethora-1.0.0/LICENSE ADDED
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2026 Soumyadip Karforma

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
@@ -0,0 +1,415 @@
Metadata-Version: 2.4
Name: plethora
Version: 1.0.0
Summary: Search the web, scrape sites, and generate reports — all from your terminal.
Author-email: Soumyadip Karforma <soumyadipkarforma@gmail.com>
License-Expression: MIT
Project-URL: Homepage, https://github.com/soumyadipkarforma/plethora
Project-URL: Repository, https://github.com/soumyadipkarforma/plethora
Project-URL: Issues, https://github.com/soumyadipkarforma/plethora/issues
Keywords: web-scraping,search,report,cli,beautifulsoup
Classifier: Development Status :: 5 - Production/Stable
Classifier: Environment :: Console
Classifier: Intended Audience :: Developers
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Internet :: WWW/HTTP
Classifier: Topic :: Utilities
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: requests>=2.28
Requires-Dist: beautifulsoup4>=4.11
Requires-Dist: fpdf2>=2.7
Provides-Extra: rich
Requires-Dist: rich>=13.0; extra == "rich"
Dynamic: license-file

<div align="center">

# 🔍 Plethora

### Search the web. Scrape the sites. Generate reports. All from your terminal.

I built this because I got tired of manually Googling stuff and copy-pasting content.
Now I just run a one-liner and get a clean report — low, medium, or high detail — in
plain text, Markdown, HTML, JSON, or PDF. No browser needed. No fluff.

[![PyPI](https://img.shields.io/pypi/v/plethora?logo=pypi&logoColor=white)](https://pypi.org/project/plethora/)
[![Python 3.10+](https://img.shields.io/badge/python-3.10+-3776AB?logo=python&logoColor=white)](#requirements)
[![License: MIT](https://img.shields.io/badge/license-MIT-green.svg)](#license)
[![Sponsor](https://img.shields.io/badge/sponsor-💖_Sponsor_Me-ea4aaa?logo=github-sponsors&logoColor=white)](https://github.com/sponsors/soumyadipkarforma)

[![Instagram](https://img.shields.io/badge/Instagram-%23E4405F.svg?logo=Instagram&logoColor=white)](https://instagram.com/soumyadip_karforma) [![X](https://img.shields.io/badge/X-black.svg?logo=X&logoColor=white)](https://x.com/soumyadip_k) [![YouTube](https://img.shields.io/badge/YouTube-%23FF0000.svg?logo=YouTube&logoColor=white)](https://youtube.com/@soumyadip_karforma) [![Email](https://img.shields.io/badge/Email-D14836?logo=gmail&logoColor=white)](mailto:soumyadipkarforma@gmail.com)

</div>

---

## 💡 Why I Made This

I wanted a fast way to research topics from the terminal — search for something,
pull down the actual content from each result, and save it all in one place.
So I wrote this: a set of scripts that does exactly that.

**The idea is simple:** pick a detail level, run the script, get your report.

---

## 🐚 The Scripts — The Fastest Way to Use This

These are the main thing. No flags to remember, no setup — just run them:

```bash
# Quick list of search results — titles, URLs, snippets
./scrape-low "best static site generators"

# Scrape the actual pages — headings, meta, content previews
./scrape-med "python web frameworks 2026"

# Full deep scrape — page content + sub-pages + everything
./scrape-high "machine learning research papers" 8 3
```

**That's it.** Each script takes a search query and, optionally, how many results you want.
`scrape-high` also takes a sub-page count as the third argument.

```
./scrape-low "query" [num_results]
./scrape-med "query" [num_results]
./scrape-high "query" [num_results] [max_subpages]
```

After the scrape finishes, it shows you where the report was saved and asks
if you want to view it right there in the terminal with `less`. Say `y` and read it,
or `n` and go grab it from the `reports/` folder later.

---

## 📋 What Each Level Gets You

```
┌──────────┬─────────────────────────────────────────────────────┐
│ Level    │ What You Get                                        │
├──────────┼─────────────────────────────────────────────────────┤
│ 🟢 LOW   │ Search results list — titles, URLs, snippets        │
│          │ ⚡ Instant — doesn't visit any pages                 │
├──────────┼─────────────────────────────────────────────────────┤
│ 🟡 MED   │ Visits each result page — pulls headings, meta,     │
│          │ lists, and a content preview (500 chars)            │
├──────────┼─────────────────────────────────────────────────────┤
│ 🔴 HIGH  │ Deep scrape — full page content + follows links     │
│          │ to sub-pages. Tables, images, 2000-char content     │
└──────────┴─────────────────────────────────────────────────────┘
```

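To make the HIGH level's "follows links to sub-pages" concrete, here is a minimal, self-contained sketch of that idea, not Plethora's actual code: it collects up to `max_subpages` same-domain links from a page's HTML using only the standard library. The function name `candidate_subpages` and the sample HTML are illustrative.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

class LinkCollector(HTMLParser):
    """Collects href targets from <a> tags as the parser walks the HTML."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def candidate_subpages(base_url, html, max_subpages=2):
    """Return up to max_subpages absolute, same-domain links found in html."""
    parser = LinkCollector()
    parser.feed(html)
    base_domain = urlparse(base_url).netloc
    seen, picked = set(), []
    for href in parser.links:
        absolute = urljoin(base_url, href)  # resolve relative hrefs
        if urlparse(absolute).netloc == base_domain and absolute not in seen:
            seen.add(absolute)
            picked.append(absolute)
        if len(picked) == max_subpages:
            break
    return picked

html = """
<a href="/tutorial">Tutorial</a>
<a href="https://other.example.net/x">External</a>
<a href="/faq">FAQ</a>
"""
print(candidate_subpages("https://example.com/start", html))
# → ['https://example.com/tutorial', 'https://example.com/faq']
```

Staying on the same domain is what keeps a deep scrape from wandering off across the whole web; the external link is filtered out above.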
---

## 🚀 Setup

### Install from PyPI (Recommended)

```bash
pip install plethora
```

That's it. Works everywhere — Linux, macOS, Windows, Termux, **Google Colab**.

After installing, use the CLI:

```bash
plethora "your search query" --level medium
```

Or use it as a Python library:

```python
from plethora import web_search, scrape_page, run

results = web_search("python tutorials", num_results=5)
report_paths = run("AI news 2026", level="high", out_format="json")
```

#### Google Colab

```python
!pip install plethora

from plethora import run
paths = run("machine learning trends", level="medium", out_format="md")
```

### One-Command Setup (from source)

I've included setup scripts for every major platform. Just run the one for your system
and everything gets installed — Python, pip, dependencies, permissions. Zero hassle.

| Platform | Command |
|----------|---------|
| **Termux (Android)** | `bash termux-setup` |
| **Linux (Debian/Fedora/Arch/openSUSE)** | `bash linux-setup` |
| **macOS** | `bash mac-setup` |
| **Windows** | Double-click `windows-setup.bat` or run it from CMD |

Each script handles the full chain: system packages → Python → pip dependencies → script permissions.
After running it, you're ready to go.

### Manual Setup

If you'd rather do it yourself:

- **Python 3.10+**
- `requests` + `beautifulsoup4` (required)
- `rich` (optional — gives you nice progress bars)
- `fpdf2` (required for PDF output)

```bash
pip install requests beautifulsoup4 rich fpdf2
```

Make the scripts executable:

```bash
chmod +x scrape-low scrape-med scrape-high
```

You're good to go.

---

## ⚙️ Advanced: The Python CLI

If you need more control, use `scrape.py` directly with flags:

```bash
# Basic usage
python scrape.py "your search query" --level medium

# Generate all formats at once (txt + md + html + json + pdf)
python scrape.py "AI research" --level high --format all

# Parallel scrape with 8 threads, skip cache
python scrape.py "web dev trends" --level medium --workers 8 --no-cache

# Quiet mode for piping
python scrape.py "data science" --level low --quiet --format json
```

### All Options

```
python scrape.py <query> [options]

  -l, --level LEVEL    low | medium | high (default: medium)
  -n, --results N      Number of search results (default: 5)
  -s, --subpages N     Max sub-pages per site (high only) (default: 2)
  -o, --output DIR     Output directory (default: reports/)
  -f, --format FMT     txt | md | html | json | pdf | all (default: txt)
  -w, --workers N      Concurrent scraping threads (default: 4)
  -q, --quiet          Suppress progress output
  --no-cache           Bypass URL cache
  --cache-ttl SECS     Cache TTL in seconds (default: 3600)
```

---

## 📝 Output Formats

| Format | Extension | Description |
|--------|-----------|-------------|
| **txt** | `.txt` | Clean plain text — great for terminal reading |
| **md** | `.md` | Markdown — perfect for pasting into notes or docs |
| **html** | `.html` | Self-contained HTML with dark theme — open in any browser |
| **json** | `.json` | Raw structured data — feed it into your own scripts |
| **pdf** | `.pdf` | Portable PDF with watermark — share or print anywhere |

All formats include the **Plethora** watermark. Use `--format all` to get everything.

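Since the JSON format is meant to be fed into your own scripts, here is a sketch of consuming a report downstream. The field names (`query`, `results`, `title`, `url`) are illustrative placeholders, not Plethora's documented schema; check an actual report in `reports/` for the real keys.

```python
import csv
import io
import json

# A stand-in for a generated report; the schema here is assumed, not official.
report_text = json.dumps({
    "query": "python web scraping",
    "results": [
        {"title": "Python Web Scraping Tutorial", "url": "https://example.com/a"},
        {"title": "Build a Web Scraper", "url": "https://example.com/b"},
    ],
})

# Downstream script: parse the report and pull out every result URL.
data = json.loads(report_text)
urls = [result["url"] for result in data["results"]]
print(urls)  # → ['https://example.com/a', 'https://example.com/b']

# Or re-shape it into CSV for a spreadsheet.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["title", "url"])
for result in data["results"]:
    writer.writerow([result["title"], result["url"]])
print(buffer.getvalue())
```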
---

## ✨ What's Under the Hood

- **Concurrent scraping** — pages are fetched in parallel with configurable threads
- **Smart caching** — already-fetched URLs are cached locally (1hr default TTL)
- **robots.txt respect** — checks before scraping, skips disallowed URLs
- **Auto-retries** — failed requests retry 3x with exponential backoff
- **Per-domain rate limiting** — won't hammer the same site
- **Rich extraction** — headings (h1–h6), paragraphs, lists, tables, image metadata
- **Progress bars** — live Rich progress when scraping (disable with `--quiet`)

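The retry bullet above can be sketched as follows. This is a generic illustration of exponential backoff built from the standard library, not Plethora's actual retry code; the names `backoff_delays` and `fetch_with_retries` are made up for the example.

```python
import time
import urllib.error
import urllib.request

def backoff_delays(retries=3, base=1.0, factor=2.0):
    """Delay schedule between attempts: 1s, 2s, 4s with the defaults."""
    return [base * factor ** attempt for attempt in range(retries)]

def fetch_with_retries(url, retries=3, sleep=time.sleep):
    """Fetch url, retrying transient failures with exponential backoff."""
    delays = backoff_delays(retries)
    for attempt, delay in enumerate(delays):
        try:
            with urllib.request.urlopen(url, timeout=10) as resp:
                return resp.read()
        except urllib.error.URLError:
            if attempt == len(delays) - 1:
                raise       # out of retries: surface the error to the caller
            sleep(delay)    # back off: wait 1s, then 2s, then 4s...
```

The doubling schedule means a struggling server gets progressively more breathing room, which pairs naturally with the per-domain rate limiting mentioned above.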
---

## 📂 Project Structure

```
plethora/
├── scrape-low          # ⭐ Shell shortcut → low detail report
├── scrape-med          # ⭐ Shell shortcut → medium detail report
├── scrape-high         # ⭐ Shell shortcut → high detail report
├── scrape.py           # Full CLI with all options
├── scraper.py          # Core engine — search, scrape, concurrency, caching
├── formatter.py        # Report generators — txt, md, html, json, pdf
├── common              # Shared shell helper (argument parsing)
├── termux-setup        # 📱 One-command Termux setup
├── linux-setup         # 🐧 One-command Linux setup
├── mac-setup           # 🍎 One-command macOS setup
├── windows-setup.bat   # 🪟 One-command Windows setup
├── .cache/             # URL cache (auto-created)
└── reports/            # All generated reports go here
```

---

## 📖 Example Output

<details>
<summary><b>🟢 Low Report</b> — search results at a glance</summary>

```
============================================================
LOW-DETAIL REPORT
Query: python web scraping
Results: 5
============================================================

1. Python Web Scraping Tutorial - GeeksforGeeks
   https://www.geeksforgeeks.org/python/python-web-scraping-tutorial/
   Web scraping is the process of extracting data from websites…

2. Beautiful Soup: Build a Web Scraper With Python
   https://realpython.com/beautiful-soup-web-scraper-python/
   Learn how to use Beautiful Soup and Requests to scrape…
```

</details>

<details>
<summary><b>🟡 Medium Report</b> — page content & structure</summary>

```
────────────────────────────────────────────────────────────
[1] Python Web Scraping Tutorial - GeeksforGeeks
    URL: https://www.geeksforgeeks.org/python/python-web-scraping-tutorial/
    Meta: Comprehensive guide to web scraping with Python…
    • Python Web Scraping Tutorial
    • Requests Module
    • Parsing HTML with BeautifulSoup
    • Selenium

    ── Content Preview ──
    Web scraping is the process of extracting data from websites
    automatically. Python is widely used for web scraping because…
```

</details>

<details>
<summary><b>🔴 High Report</b> — deep scrape with sub-pages</summary>

```
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
[1] Python Web Scraping Tutorial - GeeksforGeeks
    URL: https://www.geeksforgeeks.org/python/python-web-scraping-tutorial/

    ── Headings ──
    • Python Web Scraping Tutorial
    • Requests Module
    • Parsing HTML with BeautifulSoup
    • Selenium

    ── Content ──
    [Full extracted text up to 2000 characters…]

    🖼 Tutorial diagram — https://media.geeksforgeeks.org/…

    ── Sub-pages (2) ──
    ┌ Sub-page 1: Requests Tutorial
    │ URL: https://www.geeksforgeeks.org/python-requests-tutorial/
    │ [Sub-page content up to 800 characters…]
    └────────────────────────────────────────
```

</details>

---

## 🔧 Using as a Python Library

```python
from plethora import web_search, scrape_page, scrape_subpages, run

# Search only
results = web_search("your query", num_results=10)

# Scrape a single URL
page = scrape_page("https://example.com")
print(page["title"], page["headings"], page["lists"], page["tables"])

# Full pipeline — returns a list of report file paths
paths = run("AI news 2026", level="high", num_results=5, out_format="all")
```

---

## 📦 Publishing to PyPI

### Automatic (GitHub Actions)

A workflow is included that auto-publishes to PyPI when you create a GitHub release.

1. Get an API token from [pypi.org/manage/account](https://pypi.org/manage/account/)
2. Add it as a repo secret named `PYPI_API_TOKEN` in **Settings → Secrets → Actions**
3. Create a new release on GitHub (e.g., tag `v1.0.0`)
4. The workflow builds and uploads automatically

### Manual (Termux / any terminal)

```bash
pip install build twine
python -m build
twine upload dist/*
```

You'll be prompted for your PyPI username (`__token__`) and API token.

---

## ⚠️ Disclaimer

This tool is for **personal research and educational purposes only**.
It respects `robots.txt`, includes per-domain rate limiting, and plays nice
with servers. Please don't abuse it. Use responsibly.

---

## 💰 Support This Project

If you find this useful, consider supporting me — it keeps me building stuff like this.

[![Sponsor on GitHub](https://img.shields.io/badge/Sponsor_on_GitHub-💖-ea4aaa?style=for-the-badge&logo=github-sponsors&logoColor=white)](https://github.com/sponsors/soumyadipkarforma)
[![Buy Me a Coffee](https://img.shields.io/badge/Buy%20Me%20a%20Coffee-ffdd00?style=for-the-badge&logo=buy-me-a-coffee&logoColor=black)](https://buymeacoffee.com/soumyadipkarforma)
[![Patreon](https://img.shields.io/badge/Patreon-F96854?style=for-the-badge&logo=patreon&logoColor=white)](https://patreon.com/SoumyadipKarforma)

---

<div align="center">

**Built by [@soumyadipkarforma](https://github.com/soumyadipkarforma)** · MIT License

[![Instagram](https://img.shields.io/badge/Instagram-%23E4405F.svg?logo=Instagram&logoColor=white)](https://instagram.com/soumyadip_karforma) [![X](https://img.shields.io/badge/X-black.svg?logo=X&logoColor=white)](https://x.com/soumyadip_k) [![YouTube](https://img.shields.io/badge/YouTube-%23FF0000.svg?logo=YouTube&logoColor=white)](https://youtube.com/@soumyadip_karforma) [![Email](https://img.shields.io/badge/Email-D14836?logo=gmail&logoColor=white)](mailto:soumyadipkarforma@gmail.com)

---

## 🌿 Other Branches

| Branch | What's There |
|--------|-------------|
| [`website`](https://github.com/soumyadipkarforma/plethora/tree/website) | 🌐 React web app — use Plethora from your browser. [Live demo →](https://soumyadipkarforma.github.io/plethora/) |
| [`pypi-package`](https://github.com/soumyadipkarforma/plethora/tree/pypi-package) | 📦 Pip-installable Python library — `pip install plethora` for use in your own scripts |

> **This branch (`main`)** has the terminal scripts and CLI tool — clone it and start scraping.

</div>