python-job-scraper 0.3.0__py3-none-any.whl

@@ -0,0 +1,221 @@
1
+ Metadata-Version: 2.4
2
+ Name: python-job-scraper
3
+ Version: 0.3.0
4
+ Summary: Multi-platform job scraping library supporting Indeed, LinkedIn, Glassdoor, Upwork, and Internshala.
5
+ Author-email: sarankirthic <sarankirthic@gmail.com>
6
+ License-Expression: MIT
7
+ Project-URL: Homepage, https://github.com/seeedstack/job-scraper
8
+ Project-URL: Documentation, https://github.com/seeedstack/job-scraper/blob/main/README.md
9
+ Project-URL: Repository, https://github.com/seeedstack/job-scraper.git
10
+ Project-URL: Issues, https://github.com/seeedstack/job-scraper/issues
11
+ Project-URL: Changelog, https://github.com/seeedstack/job-scraper/releases
12
+ Keywords: job-scraping,indeed,linkedin,glassdoor,upwork,internshala,web-scraping
13
+ Classifier: Development Status :: 4 - Beta
14
+ Classifier: Intended Audience :: Developers
15
+ Classifier: Natural Language :: English
16
+ Classifier: Operating System :: OS Independent
17
+ Classifier: Programming Language :: Python :: 3
18
+ Classifier: Programming Language :: Python :: 3.13
19
+ Classifier: Programming Language :: Python :: 3.14
20
+ Classifier: Topic :: Internet :: WWW/HTTP
21
+ Classifier: Topic :: Office/Business
22
+ Classifier: Topic :: Software Development :: Libraries
23
+ Requires-Python: >=3.13
24
+ Description-Content-Type: text/markdown
25
+ License-File: LICENSE
26
+ Requires-Dist: requests>=2.31
27
+ Requires-Dist: tls-client>=1.0
28
+ Requires-Dist: pydantic>=2.0
29
+ Requires-Dist: pandas>=2.0
30
+ Requires-Dist: markdownify>=0.11
31
+ Requires-Dist: beautifulsoup4>=4.12
32
+ Requires-Dist: lxml>=4.9
33
+ Provides-Extra: test
34
+ Requires-Dist: pytest>=7.4.0; extra == "test"
35
+ Requires-Dist: pytest-cov>=4.1.0; extra == "test"
36
+ Provides-Extra: lint
37
+ Requires-Dist: black>=24.1.0; extra == "lint"
38
+ Requires-Dist: ruff>=0.1.0; extra == "lint"
39
+ Requires-Dist: mypy>=1.7.0; extra == "lint"
40
+ Requires-Dist: pre-commit>=3.5.0; extra == "lint"
41
+ Provides-Extra: scraping
42
+ Requires-Dist: ddgs>=9.0.0; extra == "scraping"
43
+ Provides-Extra: dev
44
+ Requires-Dist: black>=24.1.0; extra == "dev"
45
+ Requires-Dist: ruff>=0.1.0; extra == "dev"
46
+ Requires-Dist: mypy>=1.7.0; extra == "dev"
47
+ Requires-Dist: pre-commit>=3.5.0; extra == "dev"
48
+ Requires-Dist: pytest>=7.4.0; extra == "dev"
49
+ Requires-Dist: pytest-cov>=4.1.0; extra == "dev"
50
+ Requires-Dist: ddgs>=9.0.0; extra == "dev"
51
+ Dynamic: license-file
52
+
53
+ # python-job-scraper
54
+
55
+ Scrape job listings from multiple job sites with one function call. Results land in a single, normalized `pandas.DataFrame`.
56
+
57
+ **Supported sites:**
58
+
59
+ - [x] Indeed
60
+ - [x] Glassdoor
61
+ - [x] LinkedIn
62
+ - [ ] Naukri
63
+ - [ ] Foundit
64
+ - [ ] Shine
65
+ - [ ] Internshala
66
+ - [ ] Upwork
67
+ - [ ] Apna
68
+
69
+ ```python
70
+ from jobscraper import scrape_jobs
71
+
72
+ jobs = scrape_jobs(
73
+ site_name=["indeed", "glassdoor", "linkedin"],
74
+ search_term="software engineer",
75
+ location="Bangalore",
76
+ results_wanted=20,
77
+ )
78
+ ```
79
+
80
+ No API keys. No accounts. Requests are sent with a Chrome 120 TLS fingerprint so they resemble real browser traffic.
81
+
82
+ ---
83
+
84
+ ## Installation
85
+
86
+ **Requirements:** Python 3.13+
87
+
88
+ ### With pip
89
+
90
+ ```bash
91
+ pip install python-job-scraper
92
+ ```
93
+
94
+ ### With uv
95
+ ```bash
96
+ uv pip install python-job-scraper
97
+ ```
98
+
99
+ ### From source
100
+
101
+ ```bash
102
+ git clone https://github.com/seeedstack/job-scraper.git
103
+ cd job-scraper
104
+
105
+ pip install .
106
+ ```
107
+
108
+ ---
109
+
110
+ ## Usage
111
+
112
+ ### Single site
113
+
114
+ ```python
115
+ jobs = scrape_jobs(
116
+ site_name="indeed",
117
+ search_term="data scientist",
118
+ location="Mumbai",
119
+ results_wanted=15,
120
+ hours_old=48, # only jobs posted in the last 48 hours
121
+ job_type="fulltime",
122
+ )
123
+ print(jobs[["title", "company", "location", "date_posted", "min_amount"]].head())
124
+ ```
125
+
126
+ ### Multiple sites in parallel
127
+
128
+ ```python
129
+ jobs = scrape_jobs(
130
+ site_name=["indeed", "glassdoor", "linkedin"],
131
+ search_term="product manager",
132
+ location="Delhi",
133
+ results_wanted=10, # 10 per site → up to 30 total
134
+ description_format="markdown",
135
+ )
136
+ ```
137
+
138
+ ### LinkedIn with authentication (richer data)
139
+
140
+ Without a cookie, LinkedIn returns public job cards — title, company, location, date.
141
+ With your `li_at` cookie, the Voyager API unlocks salary ranges, full descriptions, and direct apply URLs.
142
+
143
+ ```bash
144
+ LI_AT=your_cookie python examples/test_linkedin.py
145
+ ```
146
+
147
+ ```python
148
+ jobs = scrape_jobs(
149
+ site_name="linkedin",
150
+ search_term="machine learning engineer",
151
+ location="Hyderabad",
152
+ cookies={"li_at": "your_li_at_cookie_value"},
153
+ is_remote=True,
154
+ )
155
+ ```
156
+
157
+ ---
158
+
159
+ ## Parameters
160
+
161
+ | Parameter | Type | Default | Description |
162
+ |---|---|---|---|
163
+ | `site_name` | `str \| list[str]` | **required** | `"indeed"`, `"glassdoor"`, `"linkedin"` |
164
+ | `search_term` | `str` | **required** | Job title or keyword |
165
+ | `location` | `str` | `None` | City or region |
166
+ | `results_wanted` | `int` | `20` | Max results **per site** |
167
+ | `hours_old` | `int` | `None` | Exclude jobs older than N hours |
168
+ | `job_type` | `str` | `None` | `"fulltime"` `"parttime"` `"contract"` `"internship"` |
169
+ | `is_remote` | `bool` | `False` | Remote jobs only (LinkedIn) |
170
+ | `distance` | `int` | `50` | Search radius in km |
171
+ | `country_indeed` | `str` | `"india"` | Country for Indeed |
172
+ | `description_format` | `str` | `"markdown"` | `"markdown"` or `"html"` |
173
+ | `enforce_annual_salary` | `bool` | `False` | Normalize all pay to annual |
174
+ | `offset` | `int` | `0` | Skip first N results (for pagination) |
175
+ | `cookies` | `dict` | `None` | Pass `{"li_at": "..."}` for LinkedIn Voyager |
176
+ | `proxies` | `str \| list` | `None` | Proxy URL(s) |
177
+ | `verbose` | `int` | `0` | `0`=errors `1`=warnings `2`=info |
178
+
179
+ ---
180
+
181
+ ## Output columns
182
+
183
+ | Column | Description |
184
+ |---|---|
185
+ | `site` | Source platform |
186
+ | `title` | Job title |
187
+ | `company` | Company name |
188
+ | `location` | City / state / country |
189
+ | `date_posted` | Posting date |
190
+ | `job_type` | Employment type |
191
+ | `is_remote` | Remote flag |
192
+ | `min_amount` / `max_amount` | Salary range |
193
+ | `interval` | Pay period: `hourly` `monthly` `yearly` |
194
+ | `currency` | Currency code |
195
+ | `description` | Full job description |
196
+ | `job_url` | Link to the listing |
197
+ | `job_url_direct` | Direct apply URL (when available) |
198
+ | `company_url` | Company profile URL |
199
+ | `emails` | Contact emails found in description |
200
+
201
+ All-NA columns are dropped automatically. Use `enforce_annual_salary=True` to normalize hourly/monthly/daily rates to annual before comparing across sites.
202
+
203
+ ---
204
+
205
+ ## Running tests
206
+
207
+ ```bash
208
+ # Unit tests only
209
+ pytest tests/
210
+
211
+ # Live integration tests only (hits real sites)
212
+ pytest tests/ -m integration
213
+ ```
214
+
215
+ ---
216
+
217
+ ## License
218
+
219
+ MIT © 2026 saran
220
+
221
+ This library is intended for personal and research use. Scraping job sites may conflict with their Terms of Service — use responsibly and at your own risk. No warranty is provided for the accuracy or availability of scraped data.
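The METADATA file in the hunk above uses RFC 822 style headers followed by the README as the message body, so the standard library's `email.parser` can read it. The sketch below splits `Requires-Dist` entries into core requirements and per-extra requirements. The helper name `split_requirements` and the abbreviated `METADATA_TEXT` sample are illustrative assumptions, and the marker handling only covers the simple `extra == "name"` form seen in this file; the `packaging` library is the more robust option for real marker evaluation.

```python
from email.parser import Parser

# Abbreviated copy of header fields from the METADATA hunk (illustrative sample).
METADATA_TEXT = """\
Metadata-Version: 2.4
Name: python-job-scraper
Version: 0.3.0
Requires-Python: >=3.13
Requires-Dist: requests>=2.31
Requires-Dist: pandas>=2.0
Provides-Extra: test
Requires-Dist: pytest>=7.4.0; extra == "test"
Provides-Extra: scraping
Requires-Dist: ddgs>=9.0.0; extra == "scraping"

# python-job-scraper
"""


def split_requirements(metadata_text):
    """Return (name, version, core_requirements, requirements_by_extra)."""
    msg = Parser().parsestr(metadata_text)
    core = []
    # Start with every declared extra so extras without requirements still appear.
    extras = {extra: [] for extra in msg.get_all("Provides-Extra", [])}
    for req in msg.get_all("Requires-Dist", []):
        spec, _, marker = req.partition(";")
        marker = marker.strip()
        if marker.startswith("extra =="):
            # Naive parse of the simple `extra == "name"` marker form only.
            extra_name = marker.split("==", 1)[1].strip().strip("\"'")
            extras.setdefault(extra_name, []).append(spec.strip())
        else:
            core.append(spec.strip())
    return msg["Name"], msg["Version"], core, extras


name, version, core, extras = split_requirements(METADATA_TEXT)
print(name, version)
print("core:", core)
print("extras:", extras)
```

Run against the full file, the same function would show that the `dev` extra repeats the `test`, `lint`, and `scraping` requirements.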
@@ -0,0 +1,19 @@
1
+ jobscraper/__init__.py,sha256=ptbGKLaZMgiRHqjn00X_o4CncucUqgeB9yYfTjEAYTY,10678
2
+ jobscraper/exception.py,sha256=D_XkX-fzXI-oH9vJScuVSYDrJxdmNVmaKmw7e8LOIAE,1070
3
+ jobscraper/model.py,sha256=He2QUqg00HnSn-HGpvwbk61mP3Q27Ank8EbZ8TtpQww,3982
4
+ jobscraper/util.py,sha256=CJOM5-LfrXIzzc8k3d1u_WXRTgkaf6beVbvGf-7qlC4,17066
5
+ jobscraper/glassdoor/__init__.py,sha256=SkBquyfuOfC0fHxTNO7E1YOz_b9stQTZya7d_UQomNM,10630
6
+ jobscraper/glassdoor/constant.py,sha256=JLVMVs8_eORq-PBkQOzdbst9kcKCBaIouBFvMDwsXR0,933
7
+ jobscraper/glassdoor/util.py,sha256=S-TP9PGqm0M1H_-z4aTOv_Uz7jmwbTz6TTYexRcHViE,6507
8
+ jobscraper/indeed/__init__.py,sha256=Jza6LvhgGK8lT8L7bvI436tQBD38_S38GzyLOMuvs_c,12540
9
+ jobscraper/indeed/constant.py,sha256=wg2z55DZCT9WT9osGglwEw4L3kvdCavr89ufkpg1uLs,1168
10
+ jobscraper/indeed/util.py,sha256=z6rVLvOpArjbQU4vmNSYfJsHLTSUklLqFfI2mpZGQSw,5189
11
+ jobscraper/linkedin/__init__.py,sha256=WvZvbJyPCzxpePng6GMAlQ9zOi1Vw9zhUygCV94eKlg,121
12
+ jobscraper/linkedin/_scraper.py,sha256=FWv9kd07RbFvA6spxkLWnqI3orQqWtBuqDShUkOBrpM,10440
13
+ jobscraper/linkedin/constant.py,sha256=_y-9wB8M41iT6xVPNadqXbZ4VUnINNfOhoj99s4tfI8,2011
14
+ jobscraper/linkedin/util.py,sha256=9ndsxUOwtc9l_sG2Nk4DVLpm4eqZ_u_RL6qdaO1ae5M,10946
15
+ python_job_scraper-0.3.0.dist-info/licenses/LICENSE,sha256=Jh4wkHSZya7IlXUR-Df-WdW2pZiydHF_t1RKtUpY7t4,1067
16
+ python_job_scraper-0.3.0.dist-info/METADATA,sha256=BxNc3D6bu11BGF7EolSjAyQ4HQsZNp0aHor_PBXqMdY,6551
17
+ python_job_scraper-0.3.0.dist-info/WHEEL,sha256=aeYiig01lYGDzBgS8HxWXOg3uV61G9ijOsup-k9o1sk,91
18
+ python_job_scraper-0.3.0.dist-info/top_level.txt,sha256=C7zK92676MQ6ch90D4JIBqUsZjXK3zquJYwhBw5dpRw,11
19
+ python_job_scraper-0.3.0.dist-info/RECORD,,
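Each RECORD line above has the form `path,sha256=<digest>,<size>`, where the digest is the file's SHA-256 hash encoded as URL-safe base64 with the trailing `=` padding removed; the entry for RECORD itself leaves the hash and size empty. A minimal sketch of how such a line could be produced and checked follows. The helper names `record_entry` and `verify_record_line` are hypothetical, not part of any packaging tool.

```python
import base64
import hashlib


def record_entry(path, data):
    """Build a RECORD line for `path` from the file's raw bytes."""
    digest = hashlib.sha256(data).digest()
    # URL-safe base64 with the '=' padding stripped, as used in wheel RECORD files.
    encoded = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    return f"{path},sha256={encoded},{len(data)}"


def verify_record_line(line, data):
    """Return True/False for a hashed entry, or None when the entry has no hash."""
    # Split from the right so a comma inside the path does not break parsing.
    path, hash_field, _size_field = line.rsplit(",", 2)
    if not hash_field:
        return None  # e.g. the RECORD file's own entry
    return record_entry(path, data) == line


line = record_entry("example/module.py", b"x = 1\n")
print(line)
print(verify_record_line(line, b"x = 1\n"))   # matching bytes
print(verify_record_line(line, b"x = 2\n"))   # tampered bytes
```

To check a real wheel, one could read each member with `zipfile.ZipFile(...).read(path)` and pass the bytes to `verify_record_line` together with the corresponding RECORD line.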
@@ -0,0 +1,5 @@
1
+ Wheel-Version: 1.0
2
+ Generator: setuptools (82.0.1)
3
+ Root-Is-Purelib: true
4
+ Tag: py3-none-any
5
+
@@ -0,0 +1,21 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2026 seeedstack
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.
@@ -0,0 +1 @@
1
+ jobscraper