jobspy-js 1.5.0 → 1.7.0
- package/AGENTS.md +93 -0
- package/CHANGELOG.md +110 -0
- package/README.md +146 -3
- package/SDK.md +97 -1
- package/dist/cli/index.cjs +71 -7
- package/dist/cli/index.cjs.map +1 -1
- package/dist/cli/index.js +71 -7
- package/dist/cli/index.js.map +1 -1
- package/dist/credentials.d.ts +8 -0
- package/dist/index.cjs +2 -1
- package/dist/index.cjs.map +1 -1
- package/dist/index.d.ts +1 -1
- package/dist/index.js +3 -2
- package/dist/mcp/index.cjs +55 -3
- package/dist/mcp/index.cjs.map +1 -1
- package/dist/mcp/index.js +55 -3
- package/dist/mcp/index.js.map +1 -1
- package/dist/{scraper-CfpLfryL.cjs → scraper-975Qy6MB.cjs} +503 -5
- package/dist/scraper-975Qy6MB.cjs.map +1 -0
- package/dist/{scraper-7dVhBEoK.js → scraper-CC9KdZdU.js} +505 -7
- package/dist/scraper-CC9KdZdU.js.map +1 -0
- package/dist/scraper.d.ts +20 -1
- package/dist/scrapers/base.d.ts +14 -1
- package/dist/scrapers/bayt/index.d.ts +6 -1
- package/dist/scrapers/bdjobs/index.d.ts +5 -1
- package/dist/scrapers/glassdoor/index.d.ts +5 -1
- package/dist/scrapers/indeed/index.d.ts +6 -1
- package/dist/scrapers/linkedin/index.d.ts +15 -1
- package/dist/scrapers/naukri/index.d.ts +5 -1
- package/dist/scrapers/ziprecruiter/index.d.ts +5 -1
- package/dist/types.d.ts +45 -0
- package/package.json +1 -1
- package/src/cli/index.ts +82 -1
- package/src/credentials.ts +76 -0
- package/src/index.ts +1 -1
- package/src/mcp/index.ts +63 -2
- package/src/scraper.ts +43 -2
- package/src/scrapers/base.ts +25 -2
- package/src/scrapers/bayt/index.ts +40 -1
- package/src/scrapers/bdjobs/index.ts +36 -0
- package/src/scrapers/glassdoor/index.ts +34 -0
- package/src/scrapers/indeed/index.ts +164 -2
- package/src/scrapers/linkedin/index.ts +129 -2
- package/src/scrapers/naukri/index.ts +38 -0
- package/src/scrapers/ziprecruiter/index.ts +43 -1
- package/src/types.ts +51 -0
- package/tests/integration/indeed.test.ts +23 -0
- package/dist/scraper-7dVhBEoK.js.map +0 -1
- package/dist/scraper-CfpLfryL.cjs.map +0 -1
package/AGENTS.md
ADDED
|
@@ -0,0 +1,93 @@
|
|
|
1
|
+
# AGENTS.md — AI Agent Guidelines
|
|
2
|
+
|
|
3
|
+
This file instructs AI coding agents (GitHub Copilot, Cursor, Claude, etc.) on how to contribute to this project correctly.
|
|
4
|
+
|
|
5
|
+
---
|
|
6
|
+
|
|
7
|
+
## Mandatory: Update README and CHANGELOG on Every Meaningful Change
|
|
8
|
+
|
|
9
|
+
Whenever you make a change that affects user-visible behaviour — a new feature, a bug fix, a breaking change, a new CLI flag, a new API parameter, or a change to an existing interface — you **must** update both files as part of the same task:
|
|
10
|
+
|
|
11
|
+
1. **[CHANGELOG.md](CHANGELOG.md)** — record the change under `[Unreleased]`.
|
|
12
|
+
2. **[README.md](README.md)** — update any affected section (feature list, tables, examples, structure diagram).
|
|
13
|
+
|
|
14
|
+
Do not skip these updates. Do not leave them as a separate follow-up task.
|
|
15
|
+
|
|
16
|
+
---
|
|
17
|
+
|
|
18
|
+
## CHANGELOG Rules
|
|
19
|
+
|
|
20
|
+
Follow [Keep a Changelog](https://keepachangelog.com/en/1.1.0/) conventions:
|
|
21
|
+
|
|
22
|
+
- All unreleased work goes under `## [Unreleased]` at the top.
|
|
23
|
+
- Use these subsection headers: `Added`, `Changed`, `Fixed`, `Removed`, `Security`, `Deprecated`.
|
|
24
|
+
- Write entries from the **user's point of view** — describe what they can now do, not what lines changed.
|
|
25
|
+
- When a version is released, rename `[Unreleased]` to `## [x.y.z] — YYYY-MM-DD` and add a new empty `[Unreleased]` block above it.
|
|
26
|
+
- Update the comparison links at the bottom of the file whenever a new version is cut.
|
|
27
|
+
|
|
28
|
+
### Entry format
|
|
29
|
+
|
|
30
|
+
```markdown
|
|
31
|
+
## [Unreleased]
|
|
32
|
+
|
|
33
|
+
### Added
|
|
34
|
+
- **Short feature name** — one-sentence description of what was added and why it is useful.
|
|
35
|
+
|
|
36
|
+
### Fixed
|
|
37
|
+
- Brief description of the bug and what was corrected.
|
|
38
|
+
```
|
|
39
|
+
|
|
40
|
+
---
|
|
41
|
+
|
|
42
|
+
## README Rules
|
|
43
|
+
|
|
44
|
+
The README is the primary user-facing documentation. Keep it accurate and complete:
|
|
45
|
+
|
|
46
|
+
| Section to update | When |
|
|
47
|
+
|---|---|
|
|
48
|
+
| **Features** bullet list | new capability added |
|
|
49
|
+
| **SDK — Parameters table** (`scrapeJobs` params) | new/changed param |
|
|
50
|
+
| **Authentication / Credentials** section | credential-related changes |
|
|
51
|
+
| **CLI — Quick Start** examples | new common use-case |
|
|
52
|
+
| **CLI — All CLI Options** table | new/changed CLI flag |
|
|
53
|
+
| **Config File — Profile Options** table | new/changed profile key |
|
|
54
|
+
| **Project Structure** tree | new source file added |
|
|
55
|
+
| **Supported Sites** table | new scraper |
|
|
56
|
+
|
|
57
|
+
Do not add a new CLI flag or SDK parameter without adding it to the corresponding table in README.md.
|
|
58
|
+
|
|
59
|
+
---
|
|
60
|
+
|
|
61
|
+
## Code Change Guidelines
|
|
62
|
+
|
|
63
|
+
### Types first
|
|
64
|
+
New user-facing features start in `src/types.ts`. Add interfaces and fields there before implementing them elsewhere, so TypeScript catches all callsites.
|
|
65
|
+
|
|
66
|
+
### Credential / auth changes
|
|
67
|
+
- Load credentials via `src/credentials.ts` — do not read `process.env` directly in scrapers.
|
|
68
|
+
- Credentials should only be *used* when `useCreds === true`. Never auto-login without explicit opt-in.
|
|
69
|
+
- Always test anonymous scraping first; authenticated fallback is a last resort.
|
|
70
|
+
|
|
71
|
+
### Scraper changes
|
|
72
|
+
- The constructor signature for scrapers is `{ proxies?, credentials?, useCreds? }` — maintain this shape.
|
|
73
|
+
- `SCRAPER_MAP` in `src/scraper.ts` must reflect the current constructor signature type.
|
|
74
|
+
- New scrapers must implement `scrape(input)` and should implement `fetchJob(id, format)` if single-job fetching is possible.
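A minimal sketch of a scraper that follows the constructor shape and method names listed above. The `ExampleScraper` class, the `ScraperOptions`, `Creds`, and `JobRecord` types, and the in-memory fixture are assumptions for illustration; the real base class lives in `src/scrapers/base.ts` and may differ.

```ts
// Hypothetical types; the real ones live in src/types.ts and may differ.
interface Creds { username?: string; password?: string }
interface ScraperOptions {
  proxies?: string | string[];
  credentials?: Creds;
  useCreds?: boolean;
}
interface JobRecord { id: string; title: string; description: string }

class ExampleScraper {
  readonly proxies: string[];
  readonly credentials?: Creds;
  readonly useCreds: boolean;
  // In-memory fixture standing in for a remote job board.
  private readonly fixture: JobRecord[] = [
    { id: "1", title: "React Developer", description: "<p>Build UIs</p>" },
  ];

  constructor({ proxies, credentials, useCreds }: ScraperOptions = {}) {
    this.proxies = proxies === undefined ? [] : Array.isArray(proxies) ? proxies : [proxies];
    this.credentials = credentials;
    // Explicit opt-in only: anything other than `true` disables credential use.
    this.useCreds = useCreds === true;
  }

  /** Credentials are usable only when opted in AND both fields are present. */
  canLogin(): boolean {
    return this.useCreds && !!this.credentials?.username && !!this.credentials?.password;
  }

  async scrape(input: { search_term: string }): Promise<JobRecord[]> {
    const term = input.search_term.toLowerCase();
    return this.fixture.filter((job) => job.title.toLowerCase().includes(term));
  }

  async fetchJob(
    id: string,
    format: "markdown" | "html" | "plain" = "markdown",
  ): Promise<JobRecord | null> {
    const job = this.fixture.find((j) => j.id === id);
    if (!job) return null;
    // Only "plain" is handled here (tags stripped); Markdown conversion is omitted.
    const description =
      format === "plain" ? job.description.replace(/<[^>]+>/g, "") : job.description;
    return { ...job, description };
  }
}
```

Passing credentials without `useCreds: true` leaves `canLogin()` false, which mirrors the "never auto-login without explicit opt-in" rule.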
|
|
75
|
+
|
|
76
|
+
### CLI changes
|
|
77
|
+
- New options must be added in three places: the `program.option(...)` chain, the `o = { ... }` merge object, and the `scrapeJobs(...)` call.
|
|
78
|
+
- Profile config keys use `snake_case`; CLI flags use `--kebab-case`; the merge object uses `camelCase`.
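The naming rule above gives one option three spellings. A minimal sketch of the mapping, assuming the helper names `snakeToCamel` and `snakeToKebabFlag` (hypothetical, not part of jobspy-js):

```ts
// Hypothetical helpers illustrating the naming rule; not part of jobspy-js.

/** Convert a snake_case profile key to the camelCase key used in the merge object. */
function snakeToCamel(key: string): string {
  return key.replace(/_([a-z0-9])/g, (_match, ch: string) => ch.toUpperCase());
}

/** Convert a snake_case profile key to its --kebab-case CLI flag. */
function snakeToKebabFlag(key: string): string {
  return `--${key.replace(/_/g, "-")}`;
}
```

For example, the profile key `linkedin_fetch_description` maps to the flag `--linkedin-fetch-description` and to `linkedinFetchDescription` in the merge object.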
|
|
79
|
+
|
|
80
|
+
### Testing
|
|
81
|
+
- Run `pnpm build` after changes to confirm TypeScript compiles cleanly.
|
|
82
|
+
- Add / update tests in `tests/` for any logic in `src/state.ts`, `src/credentials.ts`, or `src/utils.ts`.
|
|
83
|
+
|
|
84
|
+
---
|
|
85
|
+
|
|
86
|
+
## Release Checklist
|
|
87
|
+
|
|
88
|
+
When cutting a new version:
|
|
89
|
+
|
|
90
|
+
1. Move `[Unreleased]` entries in `CHANGELOG.md` to a new dated version block.
|
|
91
|
+
2. Bump `version` in `package.json`.
|
|
92
|
+
3. Update CHANGELOG comparison links at the bottom.
|
|
93
|
+
4. Commit with message: `chore: release vX.Y.Z`.
|
package/CHANGELOG.md
ADDED
|
@@ -0,0 +1,110 @@
|
|
|
1
|
+
# Changelog
|
|
2
|
+
|
|
3
|
+
All notable changes to this project will be documented in this file.
|
|
4
|
+
|
|
5
|
+
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/) and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
|
|
6
|
+
|
|
7
|
+
---
|
|
8
|
+
|
|
9
|
+
## [Unreleased]
|
|
10
|
+
|
|
11
|
+
### Added
|
|
12
|
+
- **Provider credentials** — per-provider username/password support for all scrapers via env vars (`LINKEDIN_USERNAME` / `LINKEDIN_PASSWORD`, `INDEED_USERNAME` / `INDEED_PASSWORD`, `GLASSDOOR_USERNAME` / `GLASSDOOR_PASSWORD`, `ZIPRECRUITER_*`, `BAYT_*`, `NAUKRI_*`, `BDJOBS_*`), CLI flags (`--linkedin-username`, `--linkedin-password`, …), or `credentials` object in `ScrapeJobsParams`.
|
|
13
|
+
- **`--creds` CLI flag** — opt-in authenticated scraping fallback (also `JOBSPY_CREDS=1` env var). Credentials are loaded but only *used* when this flag is set.
|
|
14
|
+
- **LinkedIn login fallback** — on 429 throttle or auth-wall redirect, the LinkedIn scraper automatically attempts a form-based session login and retries the blocked request when `--creds` is active.
|
|
15
|
+
- `src/credentials.ts` — new module that merges credentials from three priority layers: env vars → per-parameter fields → explicit `credentials` object.
|
|
16
|
+
- `ProviderCredentials` and `ProviderCreds` types exported from `types.ts`.
|
|
17
|
+
- `use_creds`, `credentials`, and all `*_username`/`*_password` fields added to `ScrapeJobsParams` and `ScraperInput`.
|
|
18
|
+
- `--init` profiles now include commented-out credential examples.
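The LinkedIn login fallback entry above describes a "blocked, log in, retry" flow gated on `--creds`. A minimal sketch of that control flow follows. The `PageResponse`, `Fetcher`, and `FallbackOptions` shapes and the `fetchWithLoginFallback` name are assumptions; the real scraper's HTTP client and login code are not shown in this diff.

```ts
// Hypothetical shapes for illustration only.
interface PageResponse { status: number; body: string }
type Fetcher = (url: string) => Promise<PageResponse>;

interface FallbackOptions {
  useCreds: boolean;                          // mirrors --creds / JOBSPY_CREDS=1
  login: () => Promise<boolean>;              // resolves true when a session was established
  isBlocked: (res: PageResponse) => boolean;  // e.g. a 429 or an auth-wall page
}

async function fetchWithLoginFallback(
  url: string,
  fetchPage: Fetcher,
  { useCreds, login, isBlocked }: FallbackOptions,
): Promise<PageResponse> {
  const first = await fetchPage(url);
  // Anonymous access worked, or the caller did not opt in: return as-is.
  if (!isBlocked(first) || !useCreds) return first;
  // Opted in and blocked: try to log in once, then retry the same request once.
  const loggedIn = await login();
  return loggedIn ? fetchPage(url) : first;
}
```

The login and retry happen at most once per request in this sketch, so a persistent block still surfaces to the caller instead of looping.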
|
|
19
|
+
|
|
20
|
+
---
|
|
21
|
+
|
|
22
|
+
## [1.6.0] — 2026-03-02
|
|
23
|
+
|
|
24
|
+
### Added
|
|
25
|
+
- **`fetchJobDetails(site, id, options)`** — fetch full job details by provider-specific ID for *any* supported site (`indeed`, `linkedin`, `glassdoor`, `zip_recruiter`, `bayt`, `naukri`, `bdjobs`). Equivalent to `fetchLinkedInJob` but provider-agnostic.
|
|
26
|
+
- **`--id <jobId>` CLI flag** — fetch full job details from any provider by ID (requires `-s/--site`).
|
|
27
|
+
- Abstract `fetchJob(id, format)` method on the base `Scraper` class; subclasses override per provider.
|
|
28
|
+
|
|
29
|
+
### Changed
|
|
30
|
+
- LinkedIn scraper now distinguishes between a true auth-wall redirect (no job content) and a page that merely references the signup URL — prevents false-positive empty results.
|
|
31
|
+
- LinkedIn auth-wall check uses `show-more-less-html__markup` presence as a page-content signal.
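The auth-wall check described in these two entries can be sketched as below. The function name `isAuthWall` and the signup-URL substring are assumptions for illustration; only the `show-more-less-html__markup` marker comes from the changelog entry.

```ts
// Assumed marker for the signup URL; the real scraper may match it differently.
const SIGNUP_MARKER = "linkedin.com/signup";
// Class name the changelog cites as the page-content signal.
const CONTENT_MARKER = "show-more-less-html__markup";

/**
 * Treat a response as an auth wall only when it references the signup URL
 * AND carries no job content. A page that has job content and merely links
 * to signup is a normal result.
 */
function isAuthWall(html: string): boolean {
  const mentionsSignup = html.includes(SIGNUP_MARKER);
  const hasJobContent = html.includes(CONTENT_MARKER);
  return mentionsSignup && !hasJobContent;
}
```

Checking for content as well as the signup link is what avoids the false-positive empty results mentioned above.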
|
|
32
|
+
|
|
33
|
+
---
|
|
34
|
+
|
|
35
|
+
## [1.5.0] — 2026-02-27
|
|
36
|
+
|
|
37
|
+
### Added
|
|
38
|
+
- **`fetchLinkedInJob(idOrUrl, options)`** — fetch full details for a single LinkedIn job by numeric ID or full URL. Returns description, job level, job type, job function, company industry, company logo, and direct application URL.
|
|
39
|
+
- **`--describe <jobId>` CLI flag** — fetch and pretty-print LinkedIn job details from the command line.
|
|
40
|
+
- **Unified `jobspy.json` config file** — single file stores both search profiles (`config.profiles`) and dedup state (`state.profiles`). Replaces the previous separate state file.
|
|
41
|
+
- **`--init` CLI flag** — generates a `jobspy.json` with two sample profiles (`frontend`, `backend`).
|
|
42
|
+
- **`--list-profiles` CLI flag** — lists all profiles with last-run timestamp, sites, and search term.
|
|
43
|
+
- **`--profile <name>` CLI flag** — run a named search profile from `jobspy.json`; CLI flags override profile values.
|
|
44
|
+
- **`--all` CLI flag** — skip dedup filtering for one run while still updating state.
|
|
45
|
+
- **Dedup / incremental runs** — `scrapeJobs()` with a profile name automatically tracks seen URLs and date watermarks per provider. Only new jobs are returned on subsequent runs.
|
|
46
|
+
- `profile` and `skip_dedup` fields in `ScrapeJobsParams`.
|
|
47
|
+
- `ScrapeJobsResult` now includes `totalScraped`, `newCount`, and `profile` metadata.
|
|
48
|
+
- `src/state.ts` — state file I/O, `filterNewJobs()`, `updateProviderState()`, `mergeParams()`.
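A minimal sketch of the dedup idea in this release: remember the URLs already returned for a provider, track the newest posting date as a watermark, and return only unseen jobs. The `Job` and `ProviderState` shapes and the `selectNewJobs` name are assumptions. The real logic lives in `src/state.ts`, and how it uses the watermark is not shown in this diff, so this sketch filters on seen URLs only and merely records the watermark.

```ts
// Hypothetical shapes for illustration only.
interface Job { job_url: string; date_posted?: string } // ISO date, e.g. "2026-02-27"
interface ProviderState { seenUrls: string[]; watermark?: string }

function selectNewJobs(
  jobs: Job[],
  state: ProviderState,
  skipDedup = false,
): { fresh: Job[]; nextState: ProviderState } {
  const seen = new Set(state.seenUrls);
  // With skipDedup every job is returned, but state is still updated below.
  const fresh = skipDedup ? jobs : jobs.filter((job) => !seen.has(job.job_url));

  let watermark = state.watermark;
  for (const job of jobs) {
    seen.add(job.job_url);
    // ISO dates compare correctly as plain strings.
    if (job.date_posted && (!watermark || job.date_posted > watermark)) {
      watermark = job.date_posted;
    }
  }
  return { fresh, nextState: { seenUrls: Array.from(seen), watermark } };
}
```

The `skipDedup` branch corresponds to `--all` / `skip_dedup`: filtering is skipped for the run while the state still advances.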
|
|
49
|
+
|
|
50
|
+
### Changed
|
|
51
|
+
- CLI option merging: profile config values are used as defaults; CLI flags always take precedence.
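The precedence rule above can be sketched as a layered merge. The `Options` type and the `mergeOptions` and `dropUndefined` names are hypothetical; only the ordering (profile values as defaults, CLI flags on top) comes from the changelog entry.

```ts
// Hypothetical option bag; only the precedence rule comes from the changelog.
type Options = Record<string, unknown>;

/** Copy an object without its undefined values. */
function dropUndefined(o: Options): Options {
  const out: Options = {};
  for (const key of Object.keys(o)) {
    if (o[key] !== undefined) out[key] = o[key];
  }
  return out;
}

function mergeOptions(builtIn: Options, profile: Options, cli: Options): Options {
  // Later spreads win: built-in defaults < profile values < CLI flags.
  // Undefined values are dropped so an unset CLI flag does not mask a profile value.
  return { ...builtIn, ...dropUndefined(profile), ...dropUndefined(cli) };
}
```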
|
|
52
|
+
|
|
53
|
+
---
|
|
54
|
+
|
|
55
|
+
## [1.3.0] — 2026-02-21
|
|
56
|
+
|
|
57
|
+
### Changed
|
|
58
|
+
- **Dual ESM/CJS output** — Vite build now emits both `.js` (ESM) and `.cjs` (CommonJS) bundles for broad compatibility with Node.js consumers.
|
|
59
|
+
- Updated `package.json` exports map with `import`/`require` conditions.
|
|
60
|
+
|
|
61
|
+
---
|
|
62
|
+
|
|
63
|
+
## [1.2.0] — 2026-02-21
|
|
64
|
+
|
|
65
|
+
### Added
|
|
66
|
+
- **`SDK.md`** — comprehensive SDK reference covering all parameters, types, enums, output fields, proxy configuration, country support, and advanced examples.
|
|
67
|
+
|
|
68
|
+
---
|
|
69
|
+
|
|
70
|
+
## [1.1.0] — 2026-02-18
|
|
71
|
+
|
|
72
|
+
### Added
|
|
73
|
+
- **Google Careers scraper** (`google_careers`) — scrapes jobs posted at Google the company via plain HTTP; parses `AF_initDataCallback` JSON payload.
|
|
74
|
+
- **Playwright support for Google Jobs** (`google`) — headless Chromium execution via `@playwright/test` to handle JavaScript-rendered job listings.
|
|
75
|
+
|
|
76
|
+
### Fixed
|
|
77
|
+
- ZipRecruiter scraper — corrected JSON extraction and pagination.
|
|
78
|
+
- BDJobs scraper — fixed request parameters and result parsing.
|
|
79
|
+
|
|
80
|
+
---
|
|
81
|
+
|
|
82
|
+
## [1.0.1] — 2026-02-18
|
|
83
|
+
|
|
84
|
+
### Changed
|
|
85
|
+
- Refactored source structure for improved readability and maintainability (module split, consistent naming).
|
|
86
|
+
- Added `release` script to `package.json`.
|
|
87
|
+
|
|
88
|
+
---
|
|
89
|
+
|
|
90
|
+
## [1.0.0] — 2026-02-17
|
|
91
|
+
|
|
92
|
+
### Added
|
|
93
|
+
- Initial TypeScript port of [JobSpy](https://github.com/speedyapply/JobSpy) (Python).
|
|
94
|
+
- **9 scrapers**: LinkedIn (HTML), Indeed (GraphQL), Glassdoor (GraphQL), Google Jobs (Playwright), Google Careers (HTTP), ZipRecruiter, Bayt, Naukri (REST), BDJobs (REST).
|
|
95
|
+
- **Three interfaces**: SDK (`scrapeJobs()`), CLI (`jobspy`), MCP server.
|
|
96
|
+
- [wreq-js](https://github.com/nicehash/wreq-js) browser TLS fingerprint emulation (JA3/JA4, Chrome/Firefox/Safari).
|
|
97
|
+
- Concurrent multi-site scraping via `Promise.allSettled`.
|
|
98
|
+
- Proxy rotation support.
|
|
99
|
+
- Salary extraction from job descriptions.
|
|
100
|
+
- 60+ country support for Indeed and Glassdoor regional domains.
|
|
101
|
+
- `JobPost`, `ScraperInput`, `JobResponse`, and all supporting types.
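The concurrent scraping entry above relies on `Promise.allSettled`, which waits for every site and never rejects. A minimal sketch of that pattern, assuming a hypothetical `SiteScraper` function type and `scrapeAllSites` helper (not the package's actual orchestrator):

```ts
// Hypothetical per-site scraper: takes a search term, resolves to job titles.
type SiteScraper = (term: string) => Promise<string[]>;

async function scrapeAllSites(
  scrapers: Record<string, SiteScraper>,
  term: string,
): Promise<{ jobs: string[]; errors: Record<string, string> }> {
  const names = Object.keys(scrapers);
  // allSettled never rejects, so one failing site cannot discard the others' results.
  const settled = await Promise.allSettled(names.map((name) => scrapers[name](term)));

  const jobs: string[] = [];
  const errors: Record<string, string> = {};
  settled.forEach((result, i) => {
    if (result.status === "fulfilled") jobs.push(...result.value);
    else errors[names[i]] = String(result.reason);
  });
  return { jobs, errors };
}
```

With `Promise.all` instead, the first rejected site would reject the whole call and the results from the other sites would be lost.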
|
|
102
|
+
|
|
103
|
+
[Unreleased]: https://github.com/borgius/jobspy-js/compare/v1.6.0...HEAD
|
|
104
|
+
[1.6.0]: https://github.com/borgius/jobspy-js/compare/v1.5.0...v1.6.0
|
|
105
|
+
[1.5.0]: https://github.com/borgius/jobspy-js/compare/v1.3.0...v1.5.0
|
|
106
|
+
[1.3.0]: https://github.com/borgius/jobspy-js/compare/v1.2.0...v1.3.0
|
|
107
|
+
[1.2.0]: https://github.com/borgius/jobspy-js/compare/v1.1.0...v1.2.0
|
|
108
|
+
[1.1.0]: https://github.com/borgius/jobspy-js/compare/v1.0.1...v1.1.0
|
|
109
|
+
[1.0.1]: https://github.com/borgius/jobspy-js/compare/v1.0.0...v1.0.1
|
|
110
|
+
[1.0.0]: https://github.com/borgius/jobspy-js/releases/tag/v1.0.0
|
package/README.md
CHANGED
|
@@ -15,6 +15,7 @@ Uses [wreq-js](https://github.com/nicehash/wreq-js) for browser TLS fingerprint
|
|
|
15
15
|
- **Concurrent scraping** — all sites scraped in parallel
|
|
16
16
|
- **Salary extraction** — parses compensation from descriptions when not provided directly
|
|
17
17
|
- **60+ countries** — Indeed/Glassdoor regional domain support
|
|
18
|
+
- **Credential fallback** — optional per-provider login (env vars or CLI flags) when anonymous scraping is blocked
|
|
18
19
|
|
|
19
20
|
## Supported Sites
|
|
20
21
|
|
|
@@ -46,7 +47,7 @@ npm install jobspy-js
|
|
|
46
47
|
> **Full SDK reference:** See [SDK.md](https://github.com/borgius/jobspy-js/blob/master/SDK.md) for complete documentation — all parameters, types, enums, output fields, proxy configuration, country support, and advanced examples.
|
|
47
48
|
|
|
48
49
|
```ts
|
|
49
|
-
import { scrapeJobs, fetchLinkedInJob } from "jobspy-js";
|
|
50
|
+
import { scrapeJobs, fetchLinkedInJob, fetchJobDetails } from "jobspy-js";
|
|
50
51
|
|
|
51
52
|
// Scrape multiple job boards
|
|
52
53
|
const result = await scrapeJobs({
|
|
@@ -64,6 +65,10 @@ for (const job of result.jobs) {
|
|
|
64
65
|
// Fetch details for a single LinkedIn job
|
|
65
66
|
const details = await fetchLinkedInJob("4127292817");
|
|
66
67
|
console.log(details.description);
|
|
68
|
+
|
|
69
|
+
// Fetch full job details by ID for any provider
|
|
70
|
+
const job = await fetchJobDetails("indeed", "fdde406379455a1e");
|
|
71
|
+
console.log(job.description);
|
|
67
72
|
```
|
|
68
73
|
|
|
69
74
|
### Parameters
|
|
@@ -85,6 +90,14 @@ console.log(details.description);
|
|
|
85
90
|
| `enforce_annual_salary` | `boolean` | `false` | Convert all salaries to annual |
|
|
86
91
|
| `profile` | `string` | — | Named profile for dedup tracking |
|
|
87
92
|
| `skip_dedup` | `boolean` | `false` | Skip dedup filtering (still updates state) |
|
|
93
|
+
| `use_creds` | `boolean` | `false` | Enable credential fallback when anonymous scraping is blocked (also: `JOBSPY_CREDS=1`) |
|
|
94
|
+
| `credentials` | `ProviderCredentials` | — | Explicit credentials object (see [Authentication](#authentication--credentials)) |
|
|
95
|
+
| `linkedin_username` | `string` | — | LinkedIn username/email (also: `LINKEDIN_USERNAME`) |
|
|
96
|
+
| `linkedin_password` | `string` | — | LinkedIn password (also: `LINKEDIN_PASSWORD`) |
|
|
97
|
+
| `indeed_username` | `string` | — | Indeed username/email (also: `INDEED_USERNAME`) |
|
|
98
|
+
| `indeed_password` | `string` | — | Indeed password (also: `INDEED_PASSWORD`) |
|
|
99
|
+
| `glassdoor_username` | `string` | — | Glassdoor username/email (also: `GLASSDOOR_USERNAME`) |
|
|
100
|
+
| `glassdoor_password` | `string` | — | Glassdoor password (also: `GLASSDOOR_PASSWORD`) |
|
|
88
101
|
|
|
89
102
|
### fetchLinkedInJob()
|
|
90
103
|
|
|
@@ -107,7 +120,103 @@ Options: `{ format?: "markdown"|"html"|"plain", proxies?: string|string[] }`
|
|
|
107
120
|
|
|
108
121
|
> **Full reference:** See [SDK.md](https://github.com/borgius/jobspy-js/blob/master/SDK.md#fetchlinkedinjob) for all fields and examples.
|
|
109
122
|
|
|
110
|
-
|
|
123
|
+
### fetchJobDetails()
|
|
124
|
+
|
|
125
|
+
Fetch full details for a single job by ID on **any** provider:
|
|
126
|
+
|
|
127
|
+
```ts
|
|
128
|
+
import { fetchJobDetails } from "jobspy-js";
|
|
129
|
+
|
|
130
|
+
// Works with any supported site
|
|
131
|
+
const job = await fetchJobDetails("indeed", "fdde406379455a1e");
|
|
132
|
+
// also: fetchJobDetails("linkedin", "4127292817")
|
|
133
|
+
// also: fetchJobDetails("glassdoor", "123456789")
|
|
134
|
+
// also: fetchJobDetails("zip_recruiter", "some-listing-key")
|
|
135
|
+
// also: fetchJobDetails("bayt", "/en/job-title-1234567")
|
|
136
|
+
// also: fetchJobDetails("naukri", "123456789")
|
|
137
|
+
// also: fetchJobDetails("bdjobs", "123456")
|
|
138
|
+
|
|
139
|
+
console.log(job.description); // full job description
|
|
140
|
+
console.log(job.title); // job title
|
|
141
|
+
console.log(job.company); // company name
|
|
142
|
+
```
|
|
143
|
+
|
|
144
|
+
Options: `{ format?: "markdown"|"html"|"plain", proxies?: string|string[], country?: string }`
|
|
145
|
+
|
|
146
|
+
> **Full reference:** See [SDK.md](https://github.com/borgius/jobspy-js/blob/master/SDK.md#fetchjobdetails) for all fields and examples.
|
|
147
|
+
## Authentication / Credentials
|
|
148
|
+
|
|
149
|
+
All providers support optional authenticated scraping as a fallback for when anonymous access is blocked (e.g. LinkedIn 429s or auth-wall redirects). Credentials are **never used unless explicitly enabled**.
|
|
150
|
+
|
|
151
|
+
### Enable credential fallback
|
|
152
|
+
|
|
153
|
+
```bash
|
|
154
|
+
# Via env var
|
|
155
|
+
JOBSPY_CREDS=1 jobspy -s linkedin -q "engineer"
|
|
156
|
+
|
|
157
|
+
# Via CLI flag
|
|
158
|
+
jobspy -s linkedin -q "engineer" --creds
|
|
159
|
+
```
|
|
160
|
+
|
|
161
|
+
Or in the SDK:
|
|
162
|
+
|
|
163
|
+
```ts
|
|
164
|
+
const result = await scrapeJobs({
|
|
165
|
+
site_name: ["linkedin"],
|
|
166
|
+
search_term: "engineer",
|
|
167
|
+
use_creds: true,
|
|
168
|
+
linkedin_username: process.env.LINKEDIN_USERNAME,
|
|
169
|
+
linkedin_password: process.env.LINKEDIN_PASSWORD,
|
|
170
|
+
});
|
|
171
|
+
```
|
|
172
|
+
|
|
173
|
+
### Setting credentials
|
|
174
|
+
|
|
175
|
+
Credentials are resolved in this priority order (highest wins):
|
|
176
|
+
|
|
177
|
+
1. **Explicit `credentials` object** in `ScrapeJobsParams`
|
|
178
|
+
2. **Per-field params** (`linkedin_username`, `linkedin_password`, …)
|
|
179
|
+
3. **Environment variables**
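The priority order above can be sketched as a field-by-field fallback. The `Creds` and `CredsByProvider` types and the `resolveCreds` and `asString` names are assumptions for illustration and do not reflect the actual shape of `ProviderCredentials` or the code in `src/credentials.ts`. The sketch also assumes the provider key equals the env-var prefix in lower case (for example `linkedin` or `ziprecruiter`).

```ts
// Hypothetical shapes for illustration only.
type Creds = { username?: string; password?: string };
type CredsByProvider = Record<string, Creds>;

/** Accept only non-empty strings; everything else counts as "not set". */
function asString(value: unknown): string | undefined {
  return typeof value === "string" && value !== "" ? value : undefined;
}

function resolveCreds(
  provider: string,                         // e.g. "linkedin"
  env: Record<string, string | undefined>,  // e.g. process.env
  params: Record<string, unknown>,          // per-field params such as linkedin_username
  explicit: CredsByProvider = {},           // the explicit `credentials` object
): Creds {
  const prefix = provider.toUpperCase();
  const fromEnv: Creds = {
    username: asString(env[`${prefix}_USERNAME`]),
    password: asString(env[`${prefix}_PASSWORD`]),
  };
  const fromParams: Creds = {
    username: asString(params[`${provider}_username`]),
    password: asString(params[`${provider}_password`]),
  };
  const fromObject = explicit[provider] ?? {};
  // Highest priority first; each field falls through only when the layer above lacks it.
  return {
    username: fromObject.username ?? fromParams.username ?? fromEnv.username,
    password: fromObject.password ?? fromParams.password ?? fromEnv.password,
  };
}
```

Resolving each field separately means a username from a per-field param can be combined with a password from an environment variable; whether the real loader merges per field or per provider is not stated here.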
|
|
180
|
+
|
|
181
|
+
#### Environment variables
|
|
182
|
+
|
|
183
|
+
| Provider | Username env var | Password env var |
|
|
184
|
+
|----------|-----------------|------------------|
|
|
185
|
+
| LinkedIn | `LINKEDIN_USERNAME` | `LINKEDIN_PASSWORD` |
|
|
186
|
+
| Indeed | `INDEED_USERNAME` | `INDEED_PASSWORD` |
|
|
187
|
+
| Glassdoor | `GLASSDOOR_USERNAME` | `GLASSDOOR_PASSWORD` |
|
|
188
|
+
| ZipRecruiter | `ZIPRECRUITER_USERNAME` | `ZIPRECRUITER_PASSWORD` |
|
|
189
|
+
| Bayt | `BAYT_USERNAME` | `BAYT_PASSWORD` |
|
|
190
|
+
| Naukri | `NAUKRI_USERNAME` | `NAUKRI_PASSWORD` |
|
|
191
|
+
| BDJobs | `BDJOBS_USERNAME` | `BDJOBS_PASSWORD` |
|
|
192
|
+
|
|
193
|
+
#### CLI flags
|
|
194
|
+
|
|
195
|
+
```bash
|
|
196
|
+
jobspy -s linkedin -q "engineer" --creds \
|
|
197
|
+
--linkedin-username me@email.com \
|
|
198
|
+
--linkedin-password secret
|
|
199
|
+
```
|
|
200
|
+
|
|
201
|
+
#### `jobspy.json` profile
|
|
202
|
+
|
|
203
|
+
```json
|
|
204
|
+
{
|
|
205
|
+
"config": {
|
|
206
|
+
"profiles": {
|
|
207
|
+
"frontend": {
|
|
208
|
+
"site": ["linkedin", "indeed"],
|
|
209
|
+
"search_term": "react developer",
|
|
210
|
+
"creds": true,
|
|
211
|
+
"linkedin_username": "me@email.com",
|
|
212
|
+
"linkedin_password": "secret"
|
|
213
|
+
}
|
|
214
|
+
}
|
|
215
|
+
}
|
|
216
|
+
}
|
|
217
|
+
```
|
|
218
|
+
|
|
219
|
+
> **Security note:** Prefer environment variables over storing passwords in `jobspy.json`. That file also holds dedup state and some users commit it, so a stored password can end up in version control.
|
|
111
220
|
|
|
112
221
|
### Quick Start
|
|
113
222
|
|
|
@@ -127,6 +236,14 @@ jobspy -s google_careers -q "software engineer" -l "USA" -n 10
|
|
|
127
236
|
# Fetch full details for a single LinkedIn job
|
|
128
237
|
jobspy --describe 4127292817
|
|
129
238
|
jobspy --describe https://www.linkedin.com/jobs/view/4127292817
|
|
239
|
+
|
|
240
|
+
# Fetch full job details by ID for any provider
|
|
241
|
+
jobspy -s indeed --id fdde406379455a1e
|
|
242
|
+
jobspy -s glassdoor --id 123456789
|
|
243
|
+
|
|
244
|
+
# Use credential fallback when anonymous scraping is blocked
|
|
245
|
+
JOBSPY_CREDS=1 LINKEDIN_USERNAME=me@email.com LINKEDIN_PASSWORD=secret jobspy -s linkedin -q "engineer"
|
|
246
|
+
jobspy -s linkedin -q "engineer" --creds --linkedin-username me@email.com --linkedin-password secret
|
|
130
247
|
```
|
|
131
248
|
|
|
132
249
|
### All CLI Options
|
|
@@ -147,6 +264,7 @@ jobspy --describe https://www.linkedin.com/jobs/view/4127292817
|
|
|
147
264
|
| `-p, --proxies <proxies...>` | | — | Proxy servers |
|
|
148
265
|
| `--format <format>` | | `markdown` | Description format: `markdown`, `html`, `plain` |
|
|
149
266
|
| `--linkedin-fetch-description` | | `false` | Fetch full LinkedIn descriptions |
|
|
267
|
+
| `--indeed-fetch-description` | | `false` | Fetch full Indeed descriptions from the job page or direct link |
|
|
150
268
|
| `--linkedin-company-ids <ids...>` | | — | Filter by LinkedIn company IDs |
|
|
151
269
|
| `--offset <offset>` | | `0` | Pagination offset |
|
|
152
270
|
| `--hours-old <hours>` | | — | Only jobs posted within N hours |
|
|
@@ -158,6 +276,21 @@ jobspy --describe https://www.linkedin.com/jobs/view/4127292817
|
|
|
158
276
|
| `--list-profiles` | | — | List all saved profiles |
|
|
159
277
|
| `--init` | | — | Generate a `jobspy.json` with sample profiles |
|
|
160
278
|
| `--describe <jobId>` | | — | Fetch full LinkedIn job details by ID or URL |
|
|
279
|
+
| `--id <jobId>` | | — | Fetch full job details by ID (requires `-s/--site`) |
|
|
280
|
+
| **Credentials** | | | |
|
|
281
|
+
| `--creds` | | `false` | Enable credential fallback when anonymous scraping is blocked (also: `JOBSPY_CREDS=1`) |
|
|
282
|
+
| `--linkedin-username <user>` | | — | LinkedIn username/email (also: `LINKEDIN_USERNAME`) |
|
|
283
|
+
| `--linkedin-password <pass>` | | — | LinkedIn password (also: `LINKEDIN_PASSWORD`) |
|
|
284
|
+
| `--indeed-username <user>` | | — | Indeed username/email (also: `INDEED_USERNAME`) |
|
|
285
|
+
| `--indeed-password <pass>` | | — | Indeed password (also: `INDEED_PASSWORD`) |
|
|
286
|
+
| `--glassdoor-username <user>` | | — | Glassdoor username/email (also: `GLASSDOOR_USERNAME`) |
|
|
287
|
+
| `--glassdoor-password <pass>` | | — | Glassdoor password (also: `GLASSDOOR_PASSWORD`) |
|
|
288
|
+
| `--ziprecruiter-username <user>` | | — | ZipRecruiter username/email (also: `ZIPRECRUITER_USERNAME`) |
|
|
289
|
+
| `--ziprecruiter-password <pass>` | | — | ZipRecruiter password (also: `ZIPRECRUITER_PASSWORD`) |
|
|
290
|
+
| `--bayt-username <user>` | | — | Bayt username/email (also: `BAYT_USERNAME`) |
|
|
291
|
+
| `--bayt-password <pass>` | | — | Bayt password (also: `BAYT_PASSWORD`) |
|
|
292
|
+
| `--naukri-username <user>` | | — | Naukri username/email (also: `NAUKRI_USERNAME`) |
|
|
293
|
+
| `--naukri-password <pass>` | | — | Naukri password (also: `NAUKRI_PASSWORD`) |
|
|
161
294
|
|
|
162
295
|
## Config File (`jobspy.json`)
|
|
163
296
|
|
|
@@ -187,6 +320,7 @@ This creates a `jobspy.json` in the current directory with two sample profiles:
|
|
|
187
320
|
"country": "usa",
|
|
188
321
|
"format": "markdown",
|
|
189
322
|
"linkedin_fetch_description": true,
|
|
323
|
+
"indeed_fetch_description": false,
|
|
190
324
|
"hours_old": 72,
|
|
191
325
|
"enforce_annual_salary": true,
|
|
192
326
|
"verbose": 1,
|
|
@@ -202,6 +336,7 @@ This creates a `jobspy.json` in the current directory with two sample profiles:
|
|
|
202
336
|
"job_type": "fulltime",
|
|
203
337
|
"results": 50,
|
|
204
338
|
"hours_old": 48,
|
|
339
|
+
"indeed_fetch_description": false,
|
|
205
340
|
"output": "backend-jobs.json"
|
|
206
341
|
}
|
|
207
342
|
}
|
|
@@ -238,6 +373,13 @@ Each profile in `config.profiles` supports the following keys:
|
|
|
238
373
|
| `enforce_annual_salary` | `boolean` | Normalize salaries to annual |
|
|
239
374
|
| `verbose` | `number` | Log verbosity level |
|
|
240
375
|
| `output` | `string` | Output file path |
|
|
376
|
+
| `creds` | `boolean` | Enable credential fallback (also: `JOBSPY_CREDS=1`) |
|
|
377
|
+
| `linkedin_username` | `string` | LinkedIn username/email (also: `LINKEDIN_USERNAME`) |
|
|
378
|
+
| `linkedin_password` | `string` | LinkedIn password (also: `LINKEDIN_PASSWORD`) |
|
|
379
|
+
| `indeed_username` | `string` | Indeed username/email (also: `INDEED_USERNAME`) |
|
|
380
|
+
| `indeed_password` | `string` | Indeed password (also: `INDEED_PASSWORD`) |
|
|
381
|
+
| `glassdoor_username` | `string` | Glassdoor username/email (also: `GLASSDOOR_USERNAME`) |
|
|
382
|
+
| `glassdoor_password` | `string` | Glassdoor password (also: `GLASSDOOR_PASSWORD`) |
|
|
241
383
|
|
|
242
384
|
### Running Profiles
|
|
243
385
|
|
|
@@ -404,6 +546,7 @@ npm test
|
|
|
404
546
|
src/
|
|
405
547
|
├── index.ts # SDK entry point
|
|
406
548
|
├── scraper.ts # Main scrapeJobs() orchestrator
|
|
549
|
+
├── credentials.ts # Credential loader (env → params → object merge)
|
|
407
550
|
├── state.ts # Profile state, dedup logic, file I/O
|
|
408
551
|
├── types.ts # All types, enums, country config
|
|
409
552
|
├── utils.ts # Logger, proxy rotation, HTML helpers
|
|
@@ -412,7 +555,7 @@ src/
|
|
|
412
555
|
└── scrapers/
|
|
413
556
|
├── base.ts # Abstract Scraper base class
|
|
414
557
|
├── indeed/ # GraphQL API
|
|
415
|
-
├── linkedin/ # HTML scraping
|
|
558
|
+
├── linkedin/ # HTML scraping + optional login fallback
|
|
416
559
|
├── glassdoor/ # GraphQL API
|
|
417
560
|
├── google/ # Playwright headless Chrome
|
|
418
561
|
├── google-careers/ # Plain HTTP; AF_initDataCallback JSON parsing
|
package/SDK.md
CHANGED
|
@@ -11,6 +11,10 @@ Comprehensive SDK documentation for [jobspy-js](https://github.com/borgius/jobsp
|
|
|
11
11
|
- [fetchLinkedInJob()](#fetchlinkedinjob)
|
|
12
12
|
- [Parameters](#fetchlinkedinjob-parameters)
|
|
13
13
|
- [Return Value](#fetchlinkedinjob-return-value)
|
|
14
|
+
- [fetchJobDetails()](#fetchjobdetails)
|
|
15
|
+
- [Parameters](#fetchjobdetails-parameters)
|
|
16
|
+
- [Return Value](#fetchjobdetails-return-value)
|
|
17
|
+
- [Supported Sites](#fetchjobdetails-supported-sites)
|
|
14
18
|
- [Types & Enums](#types--enums)
|
|
15
19
|
- [Site](#site)
|
|
16
20
|
- [JobType](#jobtype)
|
|
@@ -35,6 +39,7 @@ Comprehensive SDK documentation for [jobspy-js](https://github.com/borgius/jobsp
|
|
|
35
39
|
- [Pagination with Offset](#pagination-with-offset)
|
|
36
40
|
- [Recent Jobs Only](#recent-jobs-only)
|
|
37
41
|
- [Fetch Single LinkedIn Job](#fetch-single-linkedin-job)
|
|
42
|
+
- [Fetch Job by ID (Any Provider)](#fetch-job-by-id-any-provider)
|
|
38
43
|
- [Error Handling](#error-handling)
|
|
39
44
|
- [Exports](#exports)
|
|
40
45
|
|
|
@@ -92,6 +97,7 @@ interface ScrapeJobsParams {
|
|
|
92
97
|
proxies?: string | string[];
|
|
93
98
|
description_format?: string;
|
|
94
99
|
linkedin_fetch_description?: boolean;
|
|
100
|
+
indeed_fetch_description?: boolean;
|
|
95
101
|
linkedin_company_ids?: number[];
|
|
96
102
|
offset?: number;
|
|
97
103
|
hours_old?: number;
|
|
@@ -115,6 +121,7 @@ interface ScrapeJobsParams {
|
|
|
115
121
|
| `proxies` | `string \| string[]` | — | Proxy server(s) for rotating requests. See [Proxy Configuration](#proxy-configuration). |
|
|
116
122
|
| `description_format` | `string` | `"markdown"` | Format for job descriptions: `"markdown"`, `"html"`, or `"plain"`. |
|
|
117
123
|
| `linkedin_fetch_description` | `boolean` | `false` | Fetch full job descriptions from LinkedIn (requires an extra HTTP request per job — slower). |
|
|
124
|
+
| `indeed_fetch_description` | `boolean` | `false` | Visit the Indeed job page or direct link to scrape the full description. |
|
|
118
125
|
| `linkedin_company_ids` | `number[]` | — | Filter LinkedIn results to specific company IDs. |
|
|
119
126
|
| `offset` | `number` | `0` | Skip the first N results (pagination offset). |
|
|
120
127
|
| `hours_old` | `number` | — | Only return jobs posted within the last N hours. |
|
|
@@ -189,6 +196,73 @@ interface LinkedInJobDetails {
|
|
|
189
196
|
|
|
190
197
|
---
|
|
191
198
|
|
|
199
|
+
## fetchJobDetails()
|
|
200
|
+
|
|
201
|
+
Fetch full details for a single job by ID on **any** supported provider. This is a universal alternative to `fetchLinkedInJob()` that works across all scrapers.
|
|
202
|
+
|
|
203
|
+
```ts
|
|
204
|
+
import { fetchJobDetails } from "jobspy-js";
|
|
205
|
+
|
|
206
|
+
const job = await fetchJobDetails("indeed", "fdde406379455a1e");
|
|
207
|
+
console.log(job?.title); // job title (job is null when not found)
|
|
208
|
+
console.log(job?.company); // company name
|
|
209
|
+
console.log(job?.description); // full description
|
|
210
|
+
```
|
|
211
|
+
|
|
212
|
+
### fetchJobDetails Parameters
|
|
213
|
+
|
|
214
|
+
```ts
|
|
215
|
+
fetchJobDetails(
|
|
216
|
+
site: string, // Site name (e.g. "indeed", "linkedin", "glassdoor")
|
|
217
|
+
jobId: string, // Provider-specific job ID
|
|
218
|
+
options?: {
|
|
219
|
+
format?: string; // "markdown" (default), "html", or "plain"
|
|
220
|
+
proxies?: string | string[]; // Proxy server(s)
|
|
221
|
+
country?: string; // Country code (default: "usa")
|
|
222
|
+
},
|
|
223
|
+
): Promise<FlatJobRecord | null>
|
|
224
|
+
```
|
|
225
|
+
|
|
226
|
+
| Parameter | Type | Default | Description |
|
|
227
|
+
|-----------|------|---------|-------------|
|
|
228
|
+
| `site` | `string` | — | Site name: `"indeed"`, `"linkedin"`, `"glassdoor"`, `"zip_recruiter"`, `"bayt"`, `"naukri"`, `"bdjobs"` |
|
|
229
|
+
| `jobId` | `string` | — | Provider-specific job ID (as returned in search results) |
|
|
230
|
+
| `options.format` | `string` | `"markdown"` | Description format: `"markdown"`, `"html"`, or `"plain"` |
|
|
231
|
+
| `options.proxies` | `string \| string[]` | — | Proxy server(s) for the request |
|
|
232
|
+
| `options.country` | `string` | `"usa"` | Country for Indeed/Glassdoor |
|
|
233
|
+
|
|
234
|
+
### fetchJobDetails Return Value
|
|
235
|
+
|
|
236
|
+
Returns a `FlatJobRecord` (same shape as search results) or `null` if the job wasn't found. The record includes all available fields for the job:
|
|
237
|
+
|
|
238
|
+
| Field | Type | Description |
|
|
239
|
+
|-------|------|-------------|
|
|
240
|
+
| `id` | `string` | Prefixed job ID (e.g. `"in-fdde406379455a1e"`) |
|
|
241
|
+
| `site` | `string` | Site name |
|
|
242
|
+
| `title` | `string` | Job title |
|
|
243
|
+
| `company` | `string` | Company name |
|
|
244
|
+
| `description` | `string` | Full job description |
|
|
245
|
+
| `job_url` | `string` | Job listing URL |
|
|
246
|
+
| `location` | `string` | Formatted location |
|
|
247
|
+
| `date_posted` | `string` | When the job was posted |
|
|
248
|
+
| ... | | All other standard `FlatJobRecord` fields |
|
|
249
|
+
|
|
250
|
+
### fetchJobDetails Supported Sites
|
|
251
|
+
|
|
252
|
+
| Site | Job ID Format | Notes |
|
|
253
|
+
|------|--------------|-------|
|
|
254
|
+
| `indeed` | Indeed job key (e.g. `"fdde406379455a1e"`) | Fetches full description from job page |
|
|
255
|
+
| `linkedin` | LinkedIn job ID (e.g. `"4127292817"`) | Uses existing LinkedIn detail fetcher |
|
|
256
|
+
| `glassdoor` | Glassdoor listing ID | Requires CSRF token initialization |
|
|
257
|
+
| `zip_recruiter` | ZipRecruiter listing key | Parses full HTML description |
|
|
258
|
+
| `bayt` | Bayt job path (e.g. `"/en/job-title-1234567"`) | Scrapes job detail page |
|
|
259
|
+
| `naukri` | Naukri job ID | Uses REST API |
|
|
260
|
+
| `bdjobs` | BDJobs job ID | Extracts from detail page |
|
|
261
|
+
| `google` | — | Not supported (returns error) |
|
|
262
|
+
| `google_careers` | — | Not supported (returns error) |
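Because `google` and `google_careers` are listed as not supported, a caller may want to validate the site name before calling `fetchJobDetails()`. A minimal sketch: `DETAIL_SITES` and `isDetailSite` are hypothetical names, and the list is copied from the table above.

```ts
// Sites the table above lists as supported for single-job fetching.
const DETAIL_SITES = [
  "indeed",
  "linkedin",
  "glassdoor",
  "zip_recruiter",
  "bayt",
  "naukri",
  "bdjobs",
] as const;

type DetailSite = (typeof DETAIL_SITES)[number];

/** Type guard: narrows an arbitrary string to a site that supports detail fetching. */
function isDetailSite(site: string): site is DetailSite {
  return (DETAIL_SITES as readonly string[]).indexOf(site) !== -1;
}
```

A caller could check `isDetailSite(site)` first and skip or report unsupported sites before making any request.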
|
|
263
|
+
|
|
264
|
+
---
|
|
265
|
+
|
|
192
266
|
## Types & Enums
|
|
193
267
|
|
|
194
268
|
All types and enums are exported from the package root:
|
|
@@ -636,6 +710,28 @@ const job2 = await fetchLinkedInJob(
|
|
|
636
710
|
);
|
|
637
711
|
```
|
|
638
712
|
|
|
713
|
+
### Fetch Job by ID (Any Provider)
|
|
714
|
+
|
|
715
|
+
```ts
|
|
716
|
+
import { fetchJobDetails } from "jobspy-js";
|
|
717
|
+
|
|
718
|
+
// Fetch an Indeed job by ID
|
|
719
|
+
const job = await fetchJobDetails("indeed", "fdde406379455a1e");
|
|
720
|
+
console.log(job?.title); // job title (job is null when not found)
|
|
721
|
+
console.log(job?.description); // full description
|
|
722
|
+
|
|
723
|
+
// Fetch a Glassdoor job with HTML format
|
|
724
|
+
const job2 = await fetchJobDetails("glassdoor", "123456789", {
|
|
725
|
+
format: "html",
|
|
726
|
+
country: "uk",
|
|
727
|
+
});
|
|
728
|
+
|
|
729
|
+
// Fetch a ZipRecruiter job via proxy
|
|
730
|
+
const job3 = await fetchJobDetails("zip_recruiter", "listing-key", {
|
|
731
|
+
proxies: "user:pass@proxy.example.com:8080",
|
|
732
|
+
});
|
|
733
|
+
```
|
|
734
|
+
|
|
639
735
|
---
|
|
640
736
|
|
|
641
737
|
## Error Handling
|
|
@@ -675,7 +771,7 @@ Everything exported from `"jobspy-js"`:
|
|
|
675
771
|
|
|
676
772
|
```ts
|
|
677
773
|
// Main functions
|
|
678
|
-
export { scrapeJobs, fetchLinkedInJob } from "./scraper";
|
|
774
|
+
export { scrapeJobs, fetchLinkedInJob, fetchJobDetails } from "./scraper";
|
|
679
775
|
|
|
680
776
|
// Enums
|
|
681
777
|
export { Site, JobType, CompensationInterval, DescriptionFormat, SalarySource } from "./types";
|