PyPI - python-job-scraper - Versions diffs - 0.3.0__tar.gz - Mend

python-job-scraper 0.3.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (32)

python_job_scraper-0.3.0/LICENSE +21 -0
python_job_scraper-0.3.0/PKG-INFO +221 -0
python_job_scraper-0.3.0/README.md +169 -0
python_job_scraper-0.3.0/jobscraper/__init__.py +302 -0
python_job_scraper-0.3.0/jobscraper/exception.py +32 -0
python_job_scraper-0.3.0/jobscraper/glassdoor/__init__.py +309 -0
python_job_scraper-0.3.0/jobscraper/glassdoor/constant.py +33 -0
python_job_scraper-0.3.0/jobscraper/glassdoor/util.py +215 -0
python_job_scraper-0.3.0/jobscraper/indeed/__init__.py +331 -0
python_job_scraper-0.3.0/jobscraper/indeed/constant.py +38 -0
python_job_scraper-0.3.0/jobscraper/indeed/util.py +157 -0
python_job_scraper-0.3.0/jobscraper/linkedin/__init__.py +5 -0
python_job_scraper-0.3.0/jobscraper/linkedin/_scraper.py +283 -0
python_job_scraper-0.3.0/jobscraper/linkedin/constant.py +60 -0
python_job_scraper-0.3.0/jobscraper/linkedin/util.py +331 -0
python_job_scraper-0.3.0/jobscraper/model.py +144 -0
python_job_scraper-0.3.0/jobscraper/util.py +500 -0
python_job_scraper-0.3.0/pyproject.toml +103 -0
python_job_scraper-0.3.0/python_job_scraper.egg-info/PKG-INFO +221 -0
python_job_scraper-0.3.0/python_job_scraper.egg-info/SOURCES.txt +30 -0
python_job_scraper-0.3.0/python_job_scraper.egg-info/dependency_links.txt +1 -0
python_job_scraper-0.3.0/python_job_scraper.egg-info/requires.txt +29 -0
python_job_scraper-0.3.0/python_job_scraper.egg-info/top_level.txt +1 -0
python_job_scraper-0.3.0/setup.cfg +4 -0
python_job_scraper-0.3.0/tests/test_glassdoor.py +323 -0
python_job_scraper-0.3.0/tests/test_glassdoor_integration.py +106 -0
python_job_scraper-0.3.0/tests/test_indeed.py +210 -0
python_job_scraper-0.3.0/tests/test_indeed_integration.py +141 -0
python_job_scraper-0.3.0/tests/test_linkedin.py +349 -0
python_job_scraper-0.3.0/tests/test_linkedin_integration.py +189 -0
python_job_scraper-0.3.0/tests/test_scrape_jobs.py +130 -0
python_job_scraper-0.3.0/tests/test_util.py +200 -0

python_job_scraper-0.3.0/LICENSE ADDED

@@ -0,0 +1,21 @@
+MIT License
+Copyright (c) 2026 seeedstack
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.

python_job_scraper-0.3.0/PKG-INFO ADDED

@@ -0,0 +1,221 @@
+Metadata-Version: 2.4
+Name: python-job-scraper
+Version: 0.3.0
+Summary: Multi-platform job scraping library supporting Indeed, LinkedIn, Glassdoor, Upwork, and Internshala.
+Author-email: sarankirthic <sarankirthic@gmail.com>
+License-Expression: MIT
+Project-URL: Homepage, https://github.com/seeedstack/job-scraper
+Project-URL: Documentation, https://github.com/seeedstack/job-scraper/blob/main/README.md
+Project-URL: Repository, https://github.com/seeedstack/job-scraper.git
+Project-URL: Issues, https://github.com/seeedstack/job-scraper/issues
+Project-URL: Changelog, https://github.com/seeedstack/job-scraper/releases
+Keywords: job-scraping,indeed,linkedin,glassdoor,upwork,internshala,web-scraping
+Classifier: Development Status :: 4 - Beta
+Classifier: Intended Audience :: Developers
+Classifier: Natural Language :: English
+Classifier: Operating System :: OS Independent
+Classifier: Programming Language :: Python :: 3
+Classifier: Programming Language :: Python :: 3.13
+Classifier: Programming Language :: Python :: 3.14
+Classifier: Topic :: Internet :: WWW/HTTP
+Classifier: Topic :: Office/Business
+Classifier: Topic :: Software Development :: Libraries
+Requires-Python: >=3.13
+Description-Content-Type: text/markdown
+License-File: LICENSE
+Requires-Dist: requests>=2.31
+Requires-Dist: tls-client>=1.0
+Requires-Dist: pydantic>=2.0
+Requires-Dist: pandas>=2.0
+Requires-Dist: markdownify>=0.11
+Requires-Dist: beautifulsoup4>=4.12
+Requires-Dist: lxml>=4.9
+Provides-Extra: test
+Requires-Dist: pytest>=7.4.0; extra == "test"
+Requires-Dist: pytest-cov>=4.1.0; extra == "test"
+Provides-Extra: lint
+Requires-Dist: black>=24.1.0; extra == "lint"
+Requires-Dist: ruff>=0.1.0; extra == "lint"
+Requires-Dist: mypy>=1.7.0; extra == "lint"
+Requires-Dist: pre-commit>=3.5.0; extra == "lint"
+Provides-Extra: scraping
+Requires-Dist: ddgs>=9.0.0; extra == "scraping"
+Provides-Extra: dev
+Requires-Dist: black>=24.1.0; extra == "dev"
+Requires-Dist: ruff>=0.1.0; extra == "dev"
+Requires-Dist: mypy>=1.7.0; extra == "dev"
+Requires-Dist: pre-commit>=3.5.0; extra == "dev"
+Requires-Dist: pytest>=7.4.0; extra == "dev"
+Requires-Dist: pytest-cov>=4.1.0; extra == "dev"
+Requires-Dist: ddgs>=9.0.0; extra == "dev"
+Dynamic: license-file
+# python-job-scraper
+Scrape job listings from multiple job sites with one function call. Results land in a single, normalized `pandas.DataFrame`.
+**Supported sites:**
+- [x] Indeed
+- [x] Glassdoor
+- [x] LinkedIn
+- [ ] Naukri
+- [ ] Foundit
+- [ ] Shine
+- [ ] Internshala
+- [ ] Upwork
+- [ ] Apna
+```python
+from jobscraper import scrape_jobs
+jobs = scrape_jobs(
+    site_name=["indeed", "glassdoor", "linkedin"],
+    search_term="software engineer",
+    location="Bangalore",
+    results_wanted=20,
+)
+```
+No API keys. No accounts. Chrome 120 TLS fingerprinting keeps requests looking like a real browser.
+---
+## Installation
+**Requirements:** Python 3.13+
+### With pip
+```bash
+pip install python-job-scraper
+```
+### With uv
+```bash
+uv pip install python-job-scraper
+```
+### From source
+```bash
+git clone https://github.com/seeedstack/job-scraper.git
+cd job-scraper
+pip install .
+```
+---
+## Usage
+### Single site
+```python
+jobs = scrape_jobs(
+    site_name="indeed",
+    search_term="data scientist",
+    location="Mumbai",
+    results_wanted=15,
+    hours_old=48,          # only jobs posted in the last 48 hours
+    job_type="fulltime",
+)
+print(jobs[["title", "company", "location", "date_posted", "min_amount"]].head())
+```
+### Multiple sites in parallel
+```python
+jobs = scrape_jobs(
+    site_name=["indeed", "glassdoor", "linkedin"],
+    search_term="product manager",
+    location="Delhi",
+    results_wanted=10,     # 10 per site → up to 30 total
+    description_format="markdown",
+)
+```
+### LinkedIn with authentication (richer data)
+Without a cookie, LinkedIn returns public job cards — title, company, location, date.
+With your `li_at` cookie, the Voyager API unlocks salary ranges, full descriptions, and direct apply URLs.
+```bash
+LI_AT=your_cookie python examples/test_linkedin.py
+```
+```python
+jobs = scrape_jobs(
+    site_name="linkedin",
+    search_term="machine learning engineer",
+    location="Hyderabad",
+    cookies={"li_at": "your_li_at_cookie_value"},
+    is_remote=True,
+)
+```
+---
+## Parameters
+| Parameter | Type | Default | Description |
+|---|---|---|---|
+| `site_name` | `str \| list[str]` | **required** | `"indeed"`, `"glassdoor"`, `"linkedin"` |
+| `search_term` | `str` | **required** | Job title or keyword |
+| `location` | `str` | `None` | City or region |
+| `results_wanted` | `int` | `20` | Max results **per site** |
+| `hours_old` | `int` | `None` | Exclude jobs older than N hours |
+| `job_type` | `str` | `None` | `"fulltime"` `"parttime"` `"contract"` `"internship"` |
+| `is_remote` | `bool` | `False` | Remote jobs only (LinkedIn) |
+| `distance` | `int` | `50` | Search radius in km |
+| `country_indeed` | `str` | `"india"` | Country for Indeed |
+| `description_format` | `str` | `"markdown"` | `"markdown"` or `"html"` |
+| `enforce_annual_salary` | `bool` | `False` | Normalize all pay to annual |
+| `offset` | `int` | `0` | Skip first N results (for pagination) |
+| `cookies` | `dict` | `None` | Pass `{"li_at": "..."}` for LinkedIn Voyager |
+| `proxies` | `str \| list` | `None` | Proxy URL(s) |
+| `verbose` | `int` | `0` | `0`=errors `1`=warnings `2`=info |
+---
+## Output columns
+| Column | Description |
+|---|---|
+| `site` | Source platform |
+| `title` | Job title |
+| `company` | Company name |
+| `location` | City / state / country |
+| `date_posted` | Posting date |
+| `job_type` | Employment type |
+| `is_remote` | Remote flag |
+| `min_amount` / `max_amount` | Salary range |
+| `interval` | Pay period: `hourly` `monthly` `yearly` |
+| `currency` | Currency code |
+| `description` | Full job description |
+| `job_url` | Link to the listing |
+| `job_url_direct` | Direct apply URL (when available) |
+| `company_url` | Company profile URL |
+| `emails` | Contact emails found in description |
+All-NA columns are dropped automatically. Use `enforce_annual_salary=True` to normalize hourly/monthly/daily rates to annual before comparing across sites.
+---
+## Running tests
+```bash
+# Unit tests only
+pytest tests/
+# Include live integration tests (hits real sites)
+pytest tests/ -m integration
+```
+---
+## License
+MIT © 2026 saran
+This library is intended for personal and research use. Scraping job sites may conflict with their Terms of Service — use responsibly and at your own risk. No warranty is provided for the accuracy or availability of scraped data.
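
Note on the PKG-INFO headers above: the `Requires-Dist` lines mix core dependencies with ones gated behind the `test`, `lint`, `scraping`, and `dev` extras. The sketch below is one way to group them for review using only the standard library. The embedded `PKG_INFO` string is an abridged copy of the headers in this diff, `group_requirements` is a hypothetical helper name, and the marker handling is deliberately naive: it assumes each marker is a single `extra == "..."` clause, as in this file, and is not a general environment-marker parser.

```python
from collections import defaultdict
from email.parser import HeaderParser

# Abridged copy of the PKG-INFO headers shown in the diff (not the full file).
PKG_INFO = """\
Metadata-Version: 2.4
Name: python-job-scraper
Version: 0.3.0
Requires-Python: >=3.13
Requires-Dist: requests>=2.31
Requires-Dist: pandas>=2.0
Provides-Extra: test
Requires-Dist: pytest>=7.4.0; extra == "test"
Provides-Extra: scraping
Requires-Dist: ddgs>=9.0.0; extra == "scraping"
"""


def group_requirements(pkg_info: str) -> dict:
    """Group Requires-Dist entries by extra; core dependencies use the key ''."""
    # Core metadata uses RFC 822 style headers, so the email parser can read it.
    headers = HeaderParser().parsestr(pkg_info)
    grouped = defaultdict(list)
    for entry in headers.get_all("Requires-Dist", []):
        requirement, _, marker = entry.partition(";")
        extra = ""
        # Naive assumption: the marker is exactly one `extra == "name"` clause.
        if "extra ==" in marker:
            extra = marker.split("extra ==", 1)[1].strip().strip("\"'")
        grouped[extra].append(requirement.strip())
    return dict(grouped)


requirements = group_requirements(PKG_INFO)
for extra, deps in requirements.items():
    print(extra or "(core)", deps)
```

Run against the full file, a grouping like this makes it easy to see that the `dev` extra repeats the `lint`, `test`, and `scraping` entries.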

python_job_scraper-0.3.0/README.md ADDED

@@ -0,0 +1,169 @@
+# python-job-scraper
+Scrape job listings from multiple job sites with one function call. Results land in a single, normalized `pandas.DataFrame`.
+**Supported sites:**
+- [x] Indeed
+- [x] Glassdoor
+- [x] LinkedIn
+- [ ] Naukri
+- [ ] Foundit
+- [ ] Shine
+- [ ] Internshala
+- [ ] Upwork
+- [ ] Apna
+```python
+from jobscraper import scrape_jobs
+jobs = scrape_jobs(
+    site_name=["indeed", "glassdoor", "linkedin"],
+    search_term="software engineer",
+    location="Bangalore",
+    results_wanted=20,
+)
+```
+No API keys. No accounts. Chrome 120 TLS fingerprinting keeps requests looking like a real browser.
+---
+## Installation
+**Requirements:** Python 3.13+
+### With pip
+```bash
+pip install python-job-scraper
+```
+### With uv
+```bash
+uv pip install python-job-scraper
+```
+### From source
+```bash
+git clone https://github.com/seeedstack/job-scraper.git
+cd job-scraper
+pip install .
+```
+---
+## Usage
+### Single site
+```python
+jobs = scrape_jobs(
+    site_name="indeed",
+    search_term="data scientist",
+    location="Mumbai",
+    results_wanted=15,
+    hours_old=48,          # only jobs posted in the last 48 hours
+    job_type="fulltime",
+)
+print(jobs[["title", "company", "location", "date_posted", "min_amount"]].head())
+```
+### Multiple sites in parallel
+```python
+jobs = scrape_jobs(
+    site_name=["indeed", "glassdoor", "linkedin"],
+    search_term="product manager",
+    location="Delhi",
+    results_wanted=10,     # 10 per site → up to 30 total
+    description_format="markdown",
+)
+```
+### LinkedIn with authentication (richer data)
+Without a cookie, LinkedIn returns public job cards — title, company, location, date.
+With your `li_at` cookie, the Voyager API unlocks salary ranges, full descriptions, and direct apply URLs.
+```bash
+LI_AT=your_cookie python examples/test_linkedin.py
+```
+```python
+jobs = scrape_jobs(
+    site_name="linkedin",
+    search_term="machine learning engineer",
+    location="Hyderabad",
+    cookies={"li_at": "your_li_at_cookie_value"},
+    is_remote=True,
+)
+```
+---
+## Parameters
+| Parameter | Type | Default | Description |
+|---|---|---|---|
+| `site_name` | `str \| list[str]` | **required** | `"indeed"`, `"glassdoor"`, `"linkedin"` |
+| `search_term` | `str` | **required** | Job title or keyword |
+| `location` | `str` | `None` | City or region |
+| `results_wanted` | `int` | `20` | Max results **per site** |
+| `hours_old` | `int` | `None` | Exclude jobs older than N hours |
+| `job_type` | `str` | `None` | `"fulltime"` `"parttime"` `"contract"` `"internship"` |
+| `is_remote` | `bool` | `False` | Remote jobs only (LinkedIn) |
+| `distance` | `int` | `50` | Search radius in km |
+| `country_indeed` | `str` | `"india"` | Country for Indeed |
+| `description_format` | `str` | `"markdown"` | `"markdown"` or `"html"` |
+| `enforce_annual_salary` | `bool` | `False` | Normalize all pay to annual |
+| `offset` | `int` | `0` | Skip first N results (for pagination) |
+| `cookies` | `dict` | `None` | Pass `{"li_at": "..."}` for LinkedIn Voyager |
+| `proxies` | `str \| list` | `None` | Proxy URL(s) |
+| `verbose` | `int` | `0` | `0`=errors `1`=warnings `2`=info |
+---
+## Output columns
+| Column | Description |
+|---|---|
+| `site` | Source platform |
+| `title` | Job title |
+| `company` | Company name |
+| `location` | City / state / country |
+| `date_posted` | Posting date |
+| `job_type` | Employment type |
+| `is_remote` | Remote flag |
+| `min_amount` / `max_amount` | Salary range |
+| `interval` | Pay period: `hourly` `monthly` `yearly` |
+| `currency` | Currency code |
+| `description` | Full job description |
+| `job_url` | Link to the listing |
+| `job_url_direct` | Direct apply URL (when available) |
+| `company_url` | Company profile URL |
+| `emails` | Contact emails found in description |
+All-NA columns are dropped automatically. Use `enforce_annual_salary=True` to normalize hourly/monthly/daily rates to annual before comparing across sites.
+---
+## Running tests
+```bash
+# Unit tests only
+pytest tests/
+# Include live integration tests (hits real sites)
+pytest tests/ -m integration
+```
+---
+## License
+MIT © 2026 saran
+This library is intended for personal and research use. Scraping job sites may conflict with their Terms of Service — use responsibly and at your own risk. No warranty is provided for the accuracy or availability of scraped data.
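
Note on `enforce_annual_salary`: the README says it normalizes hourly, monthly, and daily rates to annual, but the conversion itself is not visible in this part of the diff. The sketch below illustrates the general idea only. The multipliers (2080 working hours, 260 working days, 12 months per year), the `Pay` class, and the `to_annual` function are all assumptions made for illustration and may not match what `jobscraper` actually does.

```python
from dataclasses import dataclass, replace
from typing import Optional

# Assumed multipliers; the library's real factors are not shown in this diff.
PERIODS_PER_YEAR = {
    "hourly": 2080,  # 40 hours/week * 52 weeks
    "daily": 260,    # 5 days/week * 52 weeks
    "monthly": 12,
    "yearly": 1,
}


@dataclass(frozen=True)
class Pay:
    """Hypothetical container mirroring the min_amount/max_amount/interval columns."""
    min_amount: Optional[float]
    max_amount: Optional[float]
    interval: Optional[str]


def to_annual(pay: Pay) -> Pay:
    """Return a copy of `pay` scaled to a yearly interval when the interval is known."""
    factor = PERIODS_PER_YEAR.get(pay.interval or "")
    if factor is None:
        return pay  # unknown or missing interval: leave the record unchanged

    def scale(amount: Optional[float]) -> Optional[float]:
        return None if amount is None else amount * factor

    return replace(
        pay,
        min_amount=scale(pay.min_amount),
        max_amount=scale(pay.max_amount),
        interval="yearly",
    )


hourly = to_annual(Pay(50.0, None, "hourly"))
monthly = to_annual(Pay(60000.0, 90000.0, "monthly"))
unknown = to_annual(Pay(1000.0, 2000.0, "weekly"))
```

Records with a missing bound keep `None` for that bound, and an interval outside the table is returned untouched rather than guessed at.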