plethora-1.0.0.tar.gz

plethora-1.0.0/LICENSE ADDED
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2026 Soumyadip Karforma

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
@@ -0,0 +1,415 @@
Metadata-Version: 2.4
Name: plethora
Version: 1.0.0
Summary: Search the web, scrape sites, and generate reports — all from your terminal.
Author-email: Soumyadip Karforma <soumyadipkarforma@gmail.com>
License-Expression: MIT
Project-URL: Homepage, https://github.com/soumyadipkarforma/plethora
Project-URL: Repository, https://github.com/soumyadipkarforma/plethora
Project-URL: Issues, https://github.com/soumyadipkarforma/plethora/issues
Keywords: web-scraping,search,report,cli,beautifulsoup
Classifier: Development Status :: 5 - Production/Stable
Classifier: Environment :: Console
Classifier: Intended Audience :: Developers
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Internet :: WWW/HTTP
Classifier: Topic :: Utilities
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: requests>=2.28
Requires-Dist: beautifulsoup4>=4.11
Requires-Dist: fpdf2>=2.7
Provides-Extra: rich
Requires-Dist: rich>=13.0; extra == "rich"
Dynamic: license-file

<div align="center">

# 🔍 Plethora

### Search the web. Scrape the sites. Generate reports. All from your terminal.

I built this because I got tired of manually Googling stuff and copy-pasting content.
Now I just run a one-liner and get a clean report — low, medium, or high detail — in
plain text, Markdown, HTML, JSON, or PDF. No browser needed. No fluff.

[![PyPI](https://img.shields.io/pypi/v/plethora?logo=pypi&logoColor=white)](https://pypi.org/project/plethora/)
[![Python 3.10+](https://img.shields.io/badge/python-3.10+-3776AB?logo=python&logoColor=white)](#requirements)
[![License: MIT](https://img.shields.io/badge/license-MIT-green.svg)](#license)
[![Sponsor](https://img.shields.io/badge/sponsor-💖_Sponsor_Me-ea4aaa?logo=github-sponsors&logoColor=white)](https://github.com/sponsors/soumyadipkarforma)

[![Instagram](https://img.shields.io/badge/Instagram-%23E4405F.svg?logo=Instagram&logoColor=white)](https://instagram.com/soumyadip_karforma) [![X](https://img.shields.io/badge/X-black.svg?logo=X&logoColor=white)](https://x.com/soumyadip_k) [![YouTube](https://img.shields.io/badge/YouTube-%23FF0000.svg?logo=YouTube&logoColor=white)](https://youtube.com/@soumyadip_karforma) [![Email](https://img.shields.io/badge/Email-D14836?logo=gmail&logoColor=white)](mailto:soumyadipkarforma@gmail.com)

</div>

---

## 💡 Why I Made This

I wanted a fast way to research topics from the terminal — search for something,
pull down the actual content from each result, and save it all in one place.
So I wrote this: a set of scripts that does exactly that.

**The idea is simple:** pick a detail level, run the script, get your report.

---

## 🐚 The Scripts — The Fastest Way to Use This

These are the main thing. No flags to remember, no setup — just run them:

```bash
# Quick list of search results — titles, URLs, snippets
./scrape-low "best static site generators"

# Scrape the actual pages — headings, meta, content previews
./scrape-med "python web frameworks 2026"

# Full deep scrape — page content + sub-pages + everything
./scrape-high "machine learning research papers" 8 3
```

**That's it.** Each script takes a search query and, optionally, how many results you want.
`scrape-high` also takes a sub-page count as the third argument.

```
./scrape-low "query" [num_results]
./scrape-med "query" [num_results]
./scrape-high "query" [num_results] [max_subpages]
```

After the scrape finishes, it shows you where the report was saved and asks
if you want to view it right there in the terminal with `less`. Say `y` and read it,
or `n` and go grab it from the `reports/` folder later.

---

## 📋 What Each Level Gets You

```
┌──────────┬─────────────────────────────────────────────────────┐
│ Level    │ What You Get                                        │
├──────────┼─────────────────────────────────────────────────────┤
│ 🟢 LOW   │ Search results list — titles, URLs, snippets        │
│          │ ⚡ Instant — doesn't visit any pages                 │
├──────────┼─────────────────────────────────────────────────────┤
│ 🟡 MED   │ Visits each result page — pulls headings, meta,     │
│          │ lists, and a content preview (500 chars)            │
├──────────┼─────────────────────────────────────────────────────┤
│ 🔴 HIGH  │ Deep scrape — full page content + follows links     │
│          │ to sub-pages. Tables, images, 2000-char content     │
└──────────┴─────────────────────────────────────────────────────┘
```

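To make the HIGH level's "follows links to sub-pages" concrete, here is a minimal, self-contained sketch of that idea, not Plethora's actual code: it collects up to `max_subpages` same-domain links from a page's HTML using only the standard library. The function name `candidate_subpages` and the sample HTML are illustrative.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

class LinkCollector(HTMLParser):
    """Collects href targets from <a> tags as the parser walks the HTML."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def candidate_subpages(base_url, html, max_subpages=2):
    """Return up to max_subpages absolute, same-domain links found in html."""
    parser = LinkCollector()
    parser.feed(html)
    base_domain = urlparse(base_url).netloc
    seen, picked = set(), []
    for href in parser.links:
        absolute = urljoin(base_url, href)  # resolve relative hrefs
        if urlparse(absolute).netloc == base_domain and absolute not in seen:
            seen.add(absolute)
            picked.append(absolute)
        if len(picked) == max_subpages:
            break
    return picked

html = """
<a href="/tutorial">Tutorial</a>
<a href="https://other.example.net/x">External</a>
<a href="/faq">FAQ</a>
"""
print(candidate_subpages("https://example.com/start", html))
# → ['https://example.com/tutorial', 'https://example.com/faq']
```

Staying on the same domain is what keeps a deep scrape from wandering off across the whole web; the external link is filtered out above.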
---

## 🚀 Setup

### Install from PyPI (Recommended)

```bash
pip install plethora
```

That's it. Works everywhere — Linux, macOS, Windows, Termux, **Google Colab**.

After installing, use the CLI:

```bash
plethora "your search query" --level medium
```

Or use it as a Python library:

```python
from plethora import web_search, scrape_page, run

results = web_search("python tutorials", num_results=5)
report_paths = run("AI news 2026", level="high", out_format="json")
```

#### Google Colab

```python
!pip install plethora

from plethora import run
paths = run("machine learning trends", level="medium", out_format="md")
```

### One-Command Setup (from source)

I've included setup scripts for every major platform. Just run the one for your system
and everything gets installed — Python, pip, dependencies, permissions. Zero hassle.

| Platform | Command |
|----------|---------|
| **Termux (Android)** | `bash termux-setup` |
| **Linux (Debian/Fedora/Arch/openSUSE)** | `bash linux-setup` |
| **macOS** | `bash mac-setup` |
| **Windows** | Double-click `windows-setup.bat` or run it from CMD |

Each script handles the full chain: system packages → Python → pip dependencies → script permissions.
After running it, you're ready to go.

### Manual Setup

If you'd rather do it yourself:

- **Python 3.10+**
- `requests` + `beautifulsoup4` (required)
- `rich` (optional — gives you nice progress bars)
- `fpdf2` (required for PDF output)

```bash
pip install requests beautifulsoup4 rich fpdf2
```

Make the scripts executable:

```bash
chmod +x scrape-low scrape-med scrape-high
```

You're good to go.

---

## ⚙️ Advanced: The Python CLI

If you need more control, use `scrape.py` directly with flags:

```bash
# Basic usage
python scrape.py "your search query" --level medium

# Generate all formats at once (txt + md + html + json + pdf)
python scrape.py "AI research" --level high --format all

# Parallel scrape with 8 threads, skip cache
python scrape.py "web dev trends" --level medium --workers 8 --no-cache

# Quiet mode for piping
python scrape.py "data science" --level low --quiet --format json
```

### All Options

```
python scrape.py <query> [options]

  -l, --level LEVEL    low | medium | high (default: medium)
  -n, --results N      Number of search results (default: 5)
  -s, --subpages N     Max sub-pages per site (high only) (default: 2)
  -o, --output DIR     Output directory (default: reports/)
  -f, --format FMT     txt | md | html | json | pdf | all (default: txt)
  -w, --workers N      Concurrent scraping threads (default: 4)
  -q, --quiet          Suppress progress output
  --no-cache           Bypass URL cache
  --cache-ttl SECS     Cache TTL in seconds (default: 3600)
```

---

## 📝 Output Formats

| Format | Extension | Description |
|--------|-----------|-------------|
| **txt** | `.txt` | Clean plain text — great for terminal reading |
| **md** | `.md` | Markdown — perfect for pasting into notes or docs |
| **html** | `.html` | Self-contained HTML with dark theme — open in any browser |
| **json** | `.json` | Raw structured data — feed it into your own scripts |
| **pdf** | `.pdf` | Portable PDF with watermark — share or print anywhere |

All formats include the **Plethora** watermark. Use `--format all` to get everything.

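Since the JSON format is meant to be fed into your own scripts, here is a sketch of consuming a report downstream. The field names (`query`, `results`, `title`, `url`) are illustrative placeholders, not Plethora's documented schema; check an actual report in `reports/` for the real keys.

```python
import csv
import io
import json

# A stand-in for a generated report; the schema here is assumed, not official.
report_text = json.dumps({
    "query": "python web scraping",
    "results": [
        {"title": "Python Web Scraping Tutorial", "url": "https://example.com/a"},
        {"title": "Build a Web Scraper", "url": "https://example.com/b"},
    ],
})

# Downstream script: parse the report and pull out every result URL.
data = json.loads(report_text)
urls = [result["url"] for result in data["results"]]
print(urls)  # → ['https://example.com/a', 'https://example.com/b']

# Or re-shape it into CSV for a spreadsheet.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["title", "url"])
for result in data["results"]:
    writer.writerow([result["title"], result["url"]])
print(buffer.getvalue())
```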
---

## ✨ What's Under the Hood

- **Concurrent scraping** — pages are fetched in parallel with configurable threads
- **Smart caching** — already-fetched URLs are cached locally (1hr default TTL)
- **robots.txt respect** — checks before scraping, skips disallowed URLs
- **Auto-retries** — failed requests retry 3x with exponential backoff
- **Per-domain rate limiting** — won't hammer the same site
- **Rich extraction** — headings (h1–h6), paragraphs, lists, tables, image metadata
- **Progress bars** — live Rich progress when scraping (disable with `--quiet`)

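The retry bullet above can be sketched as follows. This is a generic illustration of exponential backoff built from the standard library, not Plethora's actual retry code; the names `backoff_delays` and `fetch_with_retries` are made up for the example.

```python
import time
import urllib.error
import urllib.request

def backoff_delays(retries=3, base=1.0, factor=2.0):
    """Delay schedule between attempts: 1s, 2s, 4s with the defaults."""
    return [base * factor ** attempt for attempt in range(retries)]

def fetch_with_retries(url, retries=3, sleep=time.sleep):
    """Fetch url, retrying transient failures with exponential backoff."""
    delays = backoff_delays(retries)
    for attempt, delay in enumerate(delays):
        try:
            with urllib.request.urlopen(url, timeout=10) as resp:
                return resp.read()
        except urllib.error.URLError:
            if attempt == len(delays) - 1:
                raise       # out of retries: surface the error to the caller
            sleep(delay)    # back off: wait 1s, then 2s, then 4s...
```

The doubling schedule means a struggling server gets progressively more breathing room, which pairs naturally with the per-domain rate limiting mentioned above.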
---

## 📂 Project Structure

```
plethora/
├── scrape-low          # ⭐ Shell shortcut → low detail report
├── scrape-med          # ⭐ Shell shortcut → medium detail report
├── scrape-high         # ⭐ Shell shortcut → high detail report
├── scrape.py           # Full CLI with all options
├── scraper.py          # Core engine — search, scrape, concurrency, caching
├── formatter.py        # Report generators — txt, md, html, json, pdf
├── common              # Shared shell helper (argument parsing)
├── termux-setup        # 📱 One-command Termux setup
├── linux-setup         # 🐧 One-command Linux setup
├── mac-setup           # 🍎 One-command macOS setup
├── windows-setup.bat   # 🪟 One-command Windows setup
├── .cache/             # URL cache (auto-created)
└── reports/            # All generated reports go here
```

---

## 📖 Example Output

<details>
<summary><b>🟢 Low Report</b> — search results at a glance</summary>

```
============================================================
LOW-DETAIL REPORT
Query: python web scraping
Results: 5
============================================================

1. Python Web Scraping Tutorial - GeeksforGeeks
   https://www.geeksforgeeks.org/python/python-web-scraping-tutorial/
   Web scraping is the process of extracting data from websites…

2. Beautiful Soup: Build a Web Scraper With Python
   https://realpython.com/beautiful-soup-web-scraper-python/
   Learn how to use Beautiful Soup and Requests to scrape…
```

</details>

<details>
<summary><b>🟡 Medium Report</b> — page content & structure</summary>

```
────────────────────────────────────────────────────────────
[1] Python Web Scraping Tutorial - GeeksforGeeks
    URL: https://www.geeksforgeeks.org/python/python-web-scraping-tutorial/
    Meta: Comprehensive guide to web scraping with Python…
    • Python Web Scraping Tutorial
    • Requests Module
    • Parsing HTML with BeautifulSoup
    • Selenium

    ── Content Preview ──
    Web scraping is the process of extracting data from websites
    automatically. Python is widely used for web scraping because…
```

</details>

<details>
<summary><b>🔴 High Report</b> — deep scrape with sub-pages</summary>

```
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
[1] Python Web Scraping Tutorial - GeeksforGeeks
    URL: https://www.geeksforgeeks.org/python/python-web-scraping-tutorial/

    ── Headings ──
    • Python Web Scraping Tutorial
    • Requests Module
    • Parsing HTML with BeautifulSoup
    • Selenium

    ── Content ──
    [Full extracted text up to 2000 characters…]

    🖼 Tutorial diagram — https://media.geeksforgeeks.org/…

    ── Sub-pages (2) ──
    ┌ Sub-page 1: Requests Tutorial
    │ URL: https://www.geeksforgeeks.org/python-requests-tutorial/
    │ [Sub-page content up to 800 characters…]
    └────────────────────────────────────────
```

</details>

---

## 🔧 Using as a Python Library

```python
from plethora import web_search, scrape_page, scrape_subpages, run

# Search only
results = web_search("your query", num_results=10)

# Scrape a single URL
page = scrape_page("https://example.com")
print(page["title"], page["headings"], page["lists"], page["tables"])

# Full pipeline — returns a list of report file paths
paths = run("AI news 2026", level="high", num_results=5, out_format="all")
```

---

## 📦 Publishing to PyPI

### Automatic (GitHub Actions)

A workflow is included that auto-publishes to PyPI when you create a GitHub release.

1. Get an API token from [pypi.org/manage/account](https://pypi.org/manage/account/)
2. Add it as a repo secret named `PYPI_API_TOKEN` in **Settings → Secrets → Actions**
3. Create a new release on GitHub (e.g., tag `v1.0.0`)
4. The workflow builds and uploads automatically

### Manual (Termux / any terminal)

```bash
pip install build twine
python -m build
twine upload dist/*
```

You'll be prompted for your PyPI username (`__token__`) and API token.

---

## ⚠️ Disclaimer

This tool is for **personal research and educational purposes only**.
It respects `robots.txt`, includes per-domain rate limiting, and plays nice
with servers. Please don't abuse it. Use responsibly.

---

## 💰 Support This Project

If you find this useful, consider supporting me — it keeps me building stuff like this.

[![Sponsor on GitHub](https://img.shields.io/badge/Sponsor_on_GitHub-💖-ea4aaa?style=for-the-badge&logo=github-sponsors&logoColor=white)](https://github.com/sponsors/soumyadipkarforma)
[![Buy Me a Coffee](https://img.shields.io/badge/Buy%20Me%20a%20Coffee-ffdd00?style=for-the-badge&logo=buy-me-a-coffee&logoColor=black)](https://buymeacoffee.com/soumyadipkarforma)
[![Patreon](https://img.shields.io/badge/Patreon-F96854?style=for-the-badge&logo=patreon&logoColor=white)](https://patreon.com/SoumyadipKarforma)

---

<div align="center">

**Built by [@soumyadipkarforma](https://github.com/soumyadipkarforma)** · MIT License

[![Instagram](https://img.shields.io/badge/Instagram-%23E4405F.svg?logo=Instagram&logoColor=white)](https://instagram.com/soumyadip_karforma) [![X](https://img.shields.io/badge/X-black.svg?logo=X&logoColor=white)](https://x.com/soumyadip_k) [![YouTube](https://img.shields.io/badge/YouTube-%23FF0000.svg?logo=YouTube&logoColor=white)](https://youtube.com/@soumyadip_karforma) [![Email](https://img.shields.io/badge/Email-D14836?logo=gmail&logoColor=white)](mailto:soumyadipkarforma@gmail.com)

---

## 🌿 Other Branches

| Branch | What's There |
|--------|-------------|
| [`website`](https://github.com/soumyadipkarforma/plethora/tree/website) | 🌐 React web app — use Plethora from your browser. [Live demo →](https://soumyadipkarforma.github.io/plethora/) |
| [`pypi-package`](https://github.com/soumyadipkarforma/plethora/tree/pypi-package) | 📦 Pip-installable Python library — `pip install plethora` for use in your own scripts |

> **This branch (`main`)** has the terminal scripts and CLI tool — clone it and start scraping.

</div>