job-forge 2.0.3 → 2.1.0

@@ -300,6 +300,8 @@ When a form says "enter the code we sent to your email", you MUST retrieve the c
300
300
  | Workday | `from:myworkday newer_than:10m` |
301
301
  | Lever | `from:lever newer_than:10m` |
302
302
  | Ashby | `from:ashby newer_than:10m` |
303
+ | SmartRecruiters | `from:smartrecruiters newer_than:10m` |
304
+ | Aggregator redirect (WeWorkRemotely / RemoteOK) | Detect the underlying ATS from the post-redirect URL, then use that row's sender query |
303
305
  | Unknown | `newer_than:10m subject:(verify OR code OR confirm)` |
304
306
 
305
307
  **Rules:**
package/AGENTS.md CHANGED
@@ -295,6 +295,8 @@ When a form says "enter the code we sent to your email", you MUST retrieve the c
295
295
  | Workday | `from:myworkday newer_than:10m` |
296
296
  | Lever | `from:lever newer_than:10m` |
297
297
  | Ashby | `from:ashby newer_than:10m` |
298
+ | SmartRecruiters | `from:smartrecruiters newer_than:10m` |
299
+ | Aggregator redirect (WeWorkRemotely / RemoteOK) | Detect the underlying ATS from the post-redirect URL, then use that row's sender query |
298
300
  | Unknown | `newer_than:10m subject:(verify OR code OR confirm)` |
299
301
 
300
302
  **Rules:**
package/CLAUDE.md CHANGED
@@ -295,6 +295,8 @@ When a form says "enter the code we sent to your email", you MUST retrieve the c
295
295
  | Workday | `from:myworkday newer_than:10m` |
296
296
  | Lever | `from:lever newer_than:10m` |
297
297
  | Ashby | `from:ashby newer_than:10m` |
298
+ | SmartRecruiters | `from:smartrecruiters newer_than:10m` |
299
+ | Aggregator redirect (WeWorkRemotely / RemoteOK) | Detect the underlying ATS from the post-redirect URL, then use that row's sender query |
298
300
  | Unknown | `newer_than:10m subject:(verify OR code OR confirm)` |
299
301
 
300
302
  **Rules:**
@@ -53,8 +53,8 @@ The skill router (`.opencode/skills/job-forge.md`) loads mode and data files on
53
53
 
54
54
  ```
55
55
  ┌─────────────────────────────────┐
56
- opencode Agent
57
- │ (reads OPENCODE.md + modes/*.md) │
56
+ Agent
57
+ │ (reads AGENTS.md + modes/*.md) │
58
58
  └──────────┬──────────────────────┘
59
59
 
60
60
  ┌──────────────────┼──────────────────────┐
@@ -85,7 +85,7 @@ The skill router (`.opencode/skills/job-forge.md`) loads mode and data files on
85
85
 
86
86
  ## Modes (`modes/`)
87
87
 
88
- Markdown mode files in `modes/` define how the opencode workflow behaves together with the root `OPENCODE.md`. **`_shared.md`** is the shared layer (archetypes, scoring dimensions, negotiation scaffolding); the rest align with `/job-forge` command entry points listed in `OPENCODE.md`.
88
+ Markdown mode files in `modes/` define how the workflow behaves together with the root `AGENTS.md`. **`_shared.md`** is the shared layer (archetypes, scoring dimensions, negotiation scaffolding); the rest align with `/job-forge` command entry points listed in `AGENTS.md`.
89
89
 
90
90
  | File | Focus |
91
91
  |------|--------|
@@ -124,7 +124,7 @@ For customization (archetypes, weights, tone), start with `_shared.md` and [CUST
124
124
  5. **Score**: Weighted average across 10 dimensions (1-5)
125
125
  6. **Report**: Save as `reports/{num}-{company}-{date}.md`
126
126
  7. **PDF**: Generate ATS-optimized CV (`generate-pdf.mjs`)
127
- 8. **Track**: Write one TSV per evaluation under `batch/tracker-additions/` (see [OPENCODE.md](../OPENCODE.md) TSV layout); fold rows into `data/applications.md` with `npm run merge` / `merge-tracker.mjs` when you are ready (not automatic in every workflow)
127
+ 8. **Track**: Write one TSV per evaluation under `batch/tracker-additions/` (see [AGENTS.md](../AGENTS.md) TSV layout); fold rows into `data/applications.md` with `npm run merge` / `merge-tracker.mjs` when you are ready (not automatic in every workflow)
128
128
 
129
129
  ## Batch Processing
130
130
 
@@ -295,6 +295,8 @@ When a form says "enter the code we sent to your email", you MUST retrieve the c
295
295
  | Workday | `from:myworkday newer_than:10m` |
296
296
  | Lever | `from:lever newer_than:10m` |
297
297
  | Ashby | `from:ashby newer_than:10m` |
298
+ | SmartRecruiters | `from:smartrecruiters newer_than:10m` |
299
+ | Aggregator redirect (WeWorkRemotely / RemoteOK) | Detect the underlying ATS from the post-redirect URL, then use that row's sender query |
298
300
  | Unknown | `newer_than:10m subject:(verify OR code OR confirm)` |
299
301
 
300
302
  **Rules:**
package/modes/README.md CHANGED
@@ -1,9 +1,9 @@
1
1
  # Modes
2
2
 
3
- Markdown prompts used with opencode together with the root [`OPENCODE.md`](../OPENCODE.md). Each file aligns with a `/job-forge …` entry point or shared behavior described there.
3
+ Markdown prompts used together with the root [`AGENTS.md`](../AGENTS.md). Each file aligns with a `/job-forge …` entry point or shared behavior described there.
4
4
 
5
5
  - **`_shared.md`** — Archetypes, scoring dimensions, negotiation scaffolding. Edit this first when you change how offers are classified or weighted.
6
- - **Per-command files** — Each `*.md` here pairs with a `/job-forge …` entry in [`OPENCODE.md`](../OPENCODE.md). How modes connect to batch, tracker, and scripts is spelled out in [**Architecture — Modes**](../docs/ARCHITECTURE.md#modes-modes).
6
+ - **Per-command files** — Each `*.md` here pairs with a `/job-forge …` entry in [`AGENTS.md`](../AGENTS.md). How modes connect to batch, tracker, and scripts is spelled out in [**Architecture — Modes**](../docs/ARCHITECTURE.md#modes-modes).
7
7
 
8
8
  | File | Role |
9
9
  |------|------|
package/modes/apply.md CHANGED
@@ -276,10 +276,12 @@ Check for an OTP gate after the candidate (or Geometra) submits — the major po
276
276
  | `lever` | `from:lever newer_than:10m` |
277
277
  | `ashby` | `from:ashby newer_than:10m` |
278
278
  | `workable` | `from:workable newer_than:10m` |
279
+ | `smartrecruiters` | `from:smartrecruiters newer_than:10m` |
280
+ | `wwr` / `remoteok` | Follow the apply redirect to the underlying ATS, re-detect the host, then use that row's query. Aggregators do not send OTP emails themselves. |
279
281
  | `builtin` | `from:builtin newer_than:10m` |
280
282
  | `custom` / `unknown` / missing | `newer_than:10m subject:(verify OR code OR confirm)` |
281
283
 
282
- **Fallback when `ats` is missing** (legacy pipeline entries with no `| ats=` suffix, or scan-output without an `ats` column): infer from the URL host — `*.greenhouse.io` → `greenhouse`; `jobs.ashbyhq.com` → `ashby`; `jobs.lever.co` → `lever`; `*.myworkdayjobs.com` → `workday`; `apply.workable.com` / `jobs.workable.com` → `workable`; `builtin.com` → `builtin`; otherwise use the generic `verify OR code OR confirm` subject query.
284
+ **Fallback when `ats` is missing** (legacy pipeline entries with no `| ats=` suffix, or scan-output without an `ats` column): infer from the URL host — `*.greenhouse.io` → `greenhouse`; `jobs.ashbyhq.com` → `ashby`; `jobs.lever.co` → `lever`; `*.myworkdayjobs.com` → `workday`; `apply.workable.com` / `jobs.workable.com` → `workable`; `api.smartrecruiters.com` / `jobs.smartrecruiters.com` → `smartrecruiters`; `weworkremotely.com` → `wwr`; `remoteok.com` → `remoteok`; `builtin.com` → `builtin`; otherwise use the generic `verify OR code OR confirm` subject query.
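Reviewer note: the host-to-`ats` fallback in the added line can be sketched as a small lookup. This is a minimal illustration, not code shipped in the package. The function name `infer_ats`, the stripping of a leading `www.`, and returning `unknown` for every unmatched host are assumptions made for the example.

```python
from urllib.parse import urlparse

# Exact hosts listed in the fallback paragraph.
_EXACT_HOSTS = {
    "jobs.ashbyhq.com": "ashby",
    "jobs.lever.co": "lever",
    "apply.workable.com": "workable",
    "jobs.workable.com": "workable",
    "api.smartrecruiters.com": "smartrecruiters",
    "jobs.smartrecruiters.com": "smartrecruiters",
    "weworkremotely.com": "wwr",
    "remoteok.com": "remoteok",
    "builtin.com": "builtin",
}

# Wildcard hosts (`*.greenhouse.io`, `*.myworkdayjobs.com`) matched by suffix.
_SUFFIX_HOSTS = {
    ".greenhouse.io": "greenhouse",
    ".myworkdayjobs.com": "workday",
}


def infer_ats(url: str) -> str:
    """Map a posting URL to an ats value; hypothetical helper for illustration."""
    host = (urlparse(url).hostname or "").lower()
    if host.startswith("www."):  # assumption: treat www.<host> like <host>
        host = host[4:]
    if host in _EXACT_HOSTS:
        return _EXACT_HOSTS[host]
    for suffix, ats in _SUFFIX_HOSTS.items():
        if host.endswith(suffix):
            return ats
    # Unmatched hosts fall through to the generic subject query.
    return "unknown"
```

A caller that gets `wwr` or `remoteok` back would still need to follow the apply redirect and call the helper again on the final URL, since the table says aggregators do not send OTP emails themselves.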
283
285
 
284
286
  **Before reporting the submission as failed, always check Gmail.** A "submit did nothing" outcome usually means a silent OTP step — not a real failure.
285
287
 
package/modes/scan.md CHANGED
@@ -34,9 +34,88 @@ Read `portals.yml` which contains:
34
34
 
35
35
  **Every company MUST have a `careers_url` in portals.yml.** If it doesn't, search for it once, save it, and use it in future scans.
36
36
 
37
- ### Use Level 2 — Greenhouse API (COMPLEMENTARY)
38
-
39
- For companies using Greenhouse, the JSON API (`boards-api.greenhouse.io/v1/boards/{slug}/jobs`) returns clean structured data. Use as a quick complement to Level 1: it's faster than Geometra but only works with Greenhouse.
37
+ ### Use Level 2 — ATS / Aggregator APIs (COMPLEMENTARY)
38
+
39
+ For companies using an ATS or aggregator that exposes a public JSON/RSS API, fetch structured data directly. APIs are faster than Geometra and harder to hallucinate (the response is load-bearing: record IDs verbatim from the response, never reconstruct them). Use as a complement to Level 1.
40
+
41
+ Supported API shapes:
42
+
43
+ #### Greenhouse (JSON, per-company board)
44
+
45
+ - **Endpoint**: `https://boards-api.greenhouse.io/v1/boards/{slug}/jobs`
46
+ - **Method**: `GET` (plain, no auth)
47
+ - **Shape**: `{ jobs: [{ id, title, absolute_url, updated_at, location: { name } }, ...] }`
48
+ - **Canonical URL to record**: `https://job-boards.greenhouse.io/{slug}/jobs/{id}` — do NOT use `absolute_url` when it points to a customer-skinned front-end (see Verification section below).
49
+ - **ats**: `greenhouse`
50
+
51
+ #### Ashby (JSON, per-company board)
52
+
53
+ - **Endpoint**: `https://api.ashbyhq.com/posting-api/job-board/{slug}?includeCompensation=true`
54
+ - **Method**: `GET`
55
+ - **Shape**: `{ jobs: [{ id, title, jobUrl, publishedDate, locationName, employmentType, department, team, compensation }] }`
56
+ - **Canonical URL to record**: use the returned `jobUrl` (format `https://jobs.ashbyhq.com/{slug}/{uuid}`).
57
+ - **ats**: `ashby`
58
+
59
+ #### Lever (JSON, per-company board)
60
+
61
+ - **Endpoint**: `https://api.lever.co/v0/postings/{slug}?mode=json`
62
+ - **Method**: `GET`
63
+ - **Shape**: array of postings `[{ id, text, hostedUrl, createdAt, categories: { commitment, department, location, team } }, ...]`
64
+ - **Canonical URL to record**: `hostedUrl` (format `https://jobs.lever.co/{slug}/{uuid}`).
65
+ - **ats**: `lever`
66
+
67
+ #### Workday (JSON, per-tenant + site — FINICKY)
68
+
69
+ - **Endpoint**: `https://{subdomain}.{pod}.myworkdayjobs.com/wday/cxs/{tenant}/{site}/jobs`
70
+ - `subdomain` = the Workday tenant hostname prefix (e.g. `nvidia`, `salesforce`, `adobe`, `shopify`).
71
+ - `pod` = the Workday data-center pod segment (varies: `wd1`, `wd3`, `wd5`). The hostname in `careers_url` reveals which.
72
+ - `tenant` = repeats the company slug in the path (usually equal to `subdomain`, but not always).
73
+ - `site` = the public site name exposed by the tenant (e.g. `NVIDIAExternalCareerSite`, `External`, `ShopifyCareerSite`). Read it from the tenant's HTML landing page if unknown.
74
+ - **Method**: `POST` with JSON body:
75
+ ```json
76
+ {"appliedFacets": {}, "limit": 20, "offset": 0, "searchText": ""}
77
+ ```
78
+ - **Required headers**: `Content-Type: application/json`, `Accept: application/json`. Some tenants reject requests without a realistic `User-Agent` — set one if the response is 403.
79
+ - **Shape**: `{ jobPostings: [{ title, externalPath, postedOn, locationsText, bulletFields }, ...], total }`
80
+ - **Canonical URL to record**: `https://{subdomain}.{pod}.myworkdayjobs.com/{site}{externalPath}` (note: `externalPath` already starts with `/job/...` — do NOT prepend an extra `/`).
81
+ - **Pagination**: increment `offset` by `limit` (20) until `jobPostings.length < limit` or `offset >= total`.
82
+ - **ats**: `workday`
83
+ - **Fallback**: Workday APIs are brittle — tenants occasionally block POST from data-center IPs, change `site` names silently, or return empty `jobPostings` while the HTML page shows listings. If the POST fails or returns 0 jobs on a tenant that Level 1 confirmed has listings, fall back to Level 1 (Geometra scraping the `careers_url`). Treat Workday as Level 2 with a guaranteed Level 1 fallback.
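Reviewer note: the Workday URL construction and pagination rule described in this subsection could look roughly like the sketch below. It is an illustration under assumptions, not the package's implementation. The function names are hypothetical, and `post_page` is a caller-supplied callable that performs the POST and returns the decoded JSON, so the sketch itself makes no network calls.

```python
def workday_api_url(subdomain: str, pod: str, tenant: str, site: str) -> str:
    """Build the jobs endpoint from the four tenant-specific parts."""
    return f"https://{subdomain}.{pod}.myworkdayjobs.com/wday/cxs/{tenant}/{site}/jobs"


def workday_posting_url(subdomain: str, pod: str, site: str, external_path: str) -> str:
    """Canonical URL; externalPath already starts with '/job/...', so no '/' is added."""
    return f"https://{subdomain}.{pod}.myworkdayjobs.com/{site}{external_path}"


def workday_fetch_all(post_page, limit: int = 20) -> list:
    """Collect postings page by page.

    post_page(body) is a hypothetical callable that sends the JSON body
    and returns the parsed response dict.
    """
    postings = []
    offset = 0
    while True:
        body = {"appliedFacets": {}, "limit": limit, "offset": offset, "searchText": ""}
        page = post_page(body)
        batch = page.get("jobPostings", [])
        postings.extend(batch)
        offset += limit
        # Stop on a short page or once offset reaches the reported total.
        if len(batch) < limit or offset >= page.get("total", 0):
            return postings
```

If `post_page` raises or the result is empty for a tenant that Level 1 showed has listings, the documented behavior is to drop the API attempt and fall back to Level 1 rather than retry indefinitely.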
84
+
85
+ #### SmartRecruiters (JSON, per-company postings)
86
+
87
+ - **Endpoint**: `https://api.smartrecruiters.com/v1/companies/{company}/postings`
88
+ - **Method**: `GET` (plain, no auth)
89
+ - **Shape**: `{ content: [{ id, name, refNumber, jobAdUrl, releasedDate, location: { city, country, remote }, company: { identifier, name }, department }], totalFound, offset, limit }`
90
+ - **Canonical URL to record**: use `jobAdUrl` when present, otherwise `https://jobs.smartrecruiters.com/{company}/{id}`.
91
+ - **Pagination**: pass `?offset=N&limit=100` (max 100). Loop until `offset + content.length >= totalFound`.
92
+ - **ats**: `smartrecruiters`
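Reviewer note: a sketch of the SmartRecruiters pagination loop and URL fallback described above. The helper names are hypothetical, and `get_page(offset, limit)` stands in for whatever performs the GET and returns the decoded JSON. The early exit on an empty `content` array is an added safeguard, not something the diff specifies.

```python
def smartrecruiters_fetch_all(get_page, limit: int = 100) -> list:
    """Loop until offset + len(content) >= totalFound.

    get_page(offset, limit) is a hypothetical callable returning the
    parsed response dict for ?offset=N&limit=M.
    """
    postings = []
    offset = 0
    while True:
        page = get_page(offset, limit)
        content = page.get("content", [])
        postings.extend(content)
        # Empty page check is a safeguard against looping forever (assumption).
        if not content or offset + len(content) >= page.get("totalFound", 0):
            return postings
        offset += len(content)


def smartrecruiters_url(company: str, posting: dict) -> str:
    """Prefer jobAdUrl; otherwise build the jobs.smartrecruiters.com URL."""
    return posting.get("jobAdUrl") or f"https://jobs.smartrecruiters.com/{company}/{posting['id']}"
```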
93
+
94
+ #### WeWorkRemotely (RSS, cross-company aggregator)
95
+
96
+ - **Endpoints** (one per category — enable the ones matching your target roles):
97
+ - `https://weworkremotely.com/categories/remote-programming-jobs.rss`
98
+ - `https://weworkremotely.com/categories/remote-devops-sysadmin-jobs.rss`
99
+ - `https://weworkremotely.com/categories/remote-product-jobs.rss`
100
+ - `https://weworkremotely.com/categories/remote-design-jobs.rss`
101
+ - `https://weworkremotely.com/categories/all-other-remote-jobs.rss`
102
+ - **Method**: `GET` — returns RSS 2.0 XML.
103
+ - **Shape**: `<rss><channel><item><title>{company}: {role}</title><link>https://weworkremotely.com/remote-jobs/{slug}</link><pubDate>...</pubDate><region>...</region></item></channel></rss>`
104
+ - **Company/role extraction**: split `<title>` on the first `: ` — left side is company, right side is role. Fallback to the whole title as role if there is no `: `.
105
+ - **Canonical URL to record**: the `<link>` verbatim (format `https://weworkremotely.com/remote-jobs/{slug}`).
106
+ - **Cross-company note**: WeWorkRemotely is NOT per-company — it aggregates postings from hundreds of companies. Scan it via the `cross_company_feeds` section in `portals.yml`, not `tracked_companies`.
107
+ - **ats**: `wwr` (aggregator). The underlying company's ATS is unknown at scan time — downstream evaluators follow the link and re-detect.
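Reviewer note: the WeWorkRemotely extraction rules (split `<title>` on the first `: `, take `<link>` verbatim) might be sketched as follows using only the standard library. The function names are hypothetical. Using an empty string for the company when the title has no `: ` is an assumption, since the diff only says the whole title becomes the role. `pubDate` is kept as the raw feed string here, with no date conversion.

```python
import xml.etree.ElementTree as ET


def split_title(title: str) -> tuple:
    """Split '{company}: {role}' on the first ': '; whole title is the role otherwise."""
    company, sep, role = title.partition(": ")
    if not sep:
        return "", title  # assumption: unknown company is represented as ""
    return company, role


def parse_wwr(xml_text: str) -> list:
    """Turn an RSS 2.0 document into candidate rows tagged ats=wwr."""
    root = ET.fromstring(xml_text)
    rows = []
    for item in root.iter("item"):
        company, role = split_title(item.findtext("title", default="").strip())
        rows.append({
            "company": company,
            "title": role,
            "url": item.findtext("link", default="").strip(),  # recorded verbatim
            "updated_at": item.findtext("pubDate", default="").strip(),
            "ats": "wwr",
        })
    return rows
```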
108
+
109
+ #### RemoteOK (JSON, cross-company aggregator)
110
+
111
+ - **Endpoint**: `https://remoteok.com/api`
112
+ - **Method**: `GET` — returns a JSON array. The **first element is a legal/disclaimer object** (no `id`, has `legal`) — skip it. The remaining 100 entries are postings.
113
+ - **Required headers**: `User-Agent: Mozilla/5.0 ...` — RemoteOK returns 403 without a browser-like UA.
114
+ - **Shape** (per posting after skip): `{ id, slug, company, company_logo, position, description, tags: [string], date, epoch, url, apply_url, location, salary_min, salary_max }`
115
+ - **Canonical URL to record**: `url` (format `https://remoteok.com/remote-jobs/{id}-{slug}`).
116
+ - **Filtering**: RemoteOK feeds are broad — use `tags` for pre-filter (e.g. `tags` contains `"engineer"` or `"ai"`) before passing through `title_filter`.
117
+ - **Cross-company note**: same as WeWorkRemotely — configure via `cross_company_feeds`, not `tracked_companies`.
118
+ - **ats**: `remoteok` (aggregator).
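Reviewer note: a sketch of turning the RemoteOK array into candidate rows. The diff says to skip the first element. This example instead skips any element that matches the described disclaimer signature (has `legal`, or lacks `id`), which is a design choice of the sketch and not confirmed package behavior. The function name is hypothetical, and fetching (including the required `User-Agent` header) is left to the caller.

```python
def parse_remoteok(entries: list) -> list:
    """Convert the decoded JSON array into candidate rows tagged ats=remoteok."""
    rows = []
    for entry in entries:
        # Skip the legal/disclaimer object (no id, has legal).
        if "legal" in entry or "id" not in entry:
            continue
        rows.append({
            "company": entry.get("company", ""),
            "title": entry.get("position", ""),
            "url": entry.get("url", ""),       # canonical URL, recorded verbatim
            "updated_at": entry.get("date", ""),
            "tags": entry.get("tags", []),     # kept for the tag pre-filter
            "ats": "remoteok",
        })
    return rows
```

Matching on the disclaimer's shape rather than its position keeps the parser from dropping a real posting if the disclaimer ever moves or disappears.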
40
119
 
41
120
  ### Use Level 3 — WebSearch Queries (BROAD DISCOVERY)
42
121
 
@@ -44,7 +123,7 @@ The `search_queries` with `site:` filters cover portals broadly (all Ashby board
44
123
 
45
124
  **Execution priority:**
46
125
  1. Level 1: Geometra → all `tracked_companies` with `careers_url`
47
- 2. Level 2: API → all `tracked_companies` with `api:`
126
+ 2. Level 2: API → all `tracked_companies` with `api:` (Greenhouse / Ashby / Lever / Workday / SmartRecruiters) AND all `cross_company_feeds` with `enabled: true` (WeWorkRemotely / RemoteOK)
48
127
  3. Level 3: WebSearch → all `search_queries` with `enabled: true`
49
128
 
50
129
  The levels are additive — all are executed, results are merged and deduplicated.
@@ -65,15 +144,26 @@ The levels are additive — all are executed, results are merged and deduplicate
65
144
  f. Accumulate in candidates list
66
145
  g. If `careers_url` fails (404, redirect), try `scan_query` as fallback and note for URL update
67
146
 
68
- 5. **Level 2 — Greenhouse APIs** (WebFetch can batch freely — it's cheap and doesn't use Geometra sessions):
69
- For each company in `tracked_companies` with `api:` defined and `enabled: true`:
70
- a. WebFetch the API URL JSON with job list
71
- b. For each job extract: `{title, url, company, gh_slug, gh_id, updated_at}`
72
- - **`url`**: ALWAYS record the canonical Greenhouse URL: `https://job-boards.greenhouse.io/{gh_slug}/jobs/{gh_id}`. Do **NOT** use `absolute_url` when it points to a customer-skinned front-end (e.g. `pinterestcareers.com/jobs/?gh_jid=N`, `okta.com/company/careers/opportunity/N`, `samsara.com/company/careers/roles/N`, `zoominfo.com/careers?gh_jid=N`, `collibra.com/.../?gh_jid=N`, `careers.toasttab.com/jobs?gh_jid=N`, `careers.airbnb.com/positions/N`, `coinbase.com/careers/positions/N`, `instacart.careers/job/?gh_jid=N`, `pinterestcareers.com/jobs/?gh_jid=N`). These customer front-ends return shells or 403 to bots and cause downstream WebFetch-based verification to wrongly mark the role CLOSED.
73
- - **`gh_slug`**: the Greenhouse board slug (from the API URL that was fetched).
74
- - **`gh_id`**: `jobs[].id` from the API response.
75
- - **`updated_at`**: `jobs[].updated_at`; record for staleness detection (skip if older than 90 days, flag if older than 30).
76
- c. Accumulate in candidates list (dedup with Level 1). The pipeline.md entry MUST carry `| gh={gh_slug}/{gh_id}` at the end of the metadata so downstream evaluators can fall back to `https://boards-api.greenhouse.io/v1/boards/{gh_slug}/jobs/{gh_id}` when the canonical URL renders as a shell.
147
+ 5. **Level 2 — ATS / Aggregator APIs** (WebFetch can batch freely — it's cheap and doesn't use Geometra sessions):
148
+
149
+ **5a. Per-company APIs** for each company in `tracked_companies` with `api:` defined and `enabled: true`:
150
+ a. WebFetch (or `fetch` for Workday, which needs POST) the API URL per the endpoint shape documented above.
151
+ b. Extract per-posting `{title, url, company, updated_at, ats}` plus ATS-specific IDs:
152
+ - **Greenhouse** → also record `gh_slug`, `gh_id`. URL MUST be canonical `https://job-boards.greenhouse.io/{gh_slug}/jobs/{gh_id}` — do **NOT** use `absolute_url` when it points to a customer-skinned front-end (e.g. `pinterestcareers.com/jobs/?gh_jid=N`, `okta.com/company/careers/opportunity/N`, `samsara.com/company/careers/roles/N`, `zoominfo.com/careers?gh_jid=N`, `collibra.com/.../?gh_jid=N`, `careers.toasttab.com/jobs?gh_jid=N`, `careers.airbnb.com/positions/N`, `coinbase.com/careers/positions/N`, `instacart.careers/job/?gh_jid=N`). These customer front-ends return shells or 403 to bots and cause downstream WebFetch-based verification to wrongly mark the role CLOSED.
153
+ - **Ashby** → record the returned `jobUrl`.
154
+ - **Lever** → record the returned `hostedUrl`.
155
+ - **Workday** → build URL as `https://{subdomain}.{pod}.myworkdayjobs.com/{site}{externalPath}`. If the POST fails, DROP that tenant's API attempt and fall back to Level 1 for that company; do NOT fabricate postings.
156
+ - **SmartRecruiters** → record `jobAdUrl` (fallback: `https://jobs.smartrecruiters.com/{company}/{id}`).
157
+ - **`updated_at`**: use `updated_at` (Greenhouse) / `publishedDate` (Ashby) / `createdAt` (Lever) / `postedOn` (Workday) / `releasedDate` (SmartRecruiters) — record for staleness detection (skip if older than 90 days, flag if older than 30).
158
+ c. Accumulate in candidates list (dedup with Level 1). The pipeline.md entry MUST carry `| ats={type}` at the end, and for Greenhouse ALSO `| gh={gh_slug}/{gh_id}` so downstream evaluators can fall back to `https://boards-api.greenhouse.io/v1/boards/{gh_slug}/jobs/{gh_id}` when the canonical URL renders as a shell.
159
+
160
+ **5b. Cross-company aggregator feeds** — for each feed in `cross_company_feeds` with `enabled: true`:
161
+ a. WebFetch the RSS (WeWorkRemotely) or JSON (RemoteOK) endpoint per the shape documented above.
162
+ b. Parse each entry to `{title, url, company, ats, updated_at}`:
163
+ - **WeWorkRemotely** → split `<title>` on the first `: ` to separate company from role; `<link>` → url; `<pubDate>` → updated_at.
164
+ - **RemoteOK** → skip the first element (legal disclaimer); from each remaining entry take `company`, `position`, `url`, `date`.
165
+ c. Apply the feed's `tag_filter` / `category_filter` before the global `title_filter` — aggregators have much higher volume than per-company APIs.
166
+ d. Accumulate in candidates list (dedup with Level 1 + 5a).
77
167
 
78
168
  6. **Level 3 — WebSearch queries** (WebSearch is parallel-safe; batch freely):
79
169
  For each query in `search_queries` with `enabled: true`:
@@ -106,10 +196,14 @@ The levels are additive — all are executed, results are merged and deduplicate
106
196
  - When a fuzzy match is found but the URL is new, log it as `skipped_repost` (not `skipped_dup`) with a note referencing the original entry number.
107
197
 
108
198
  8. **For each new offer that passes filters**:
109
- a. Add to `pipeline.md` section "Pending": `- [ ] {url} | {company} | {title} | ats={ats}` — the `| ats={type}` suffix is REQUIRED for every entry (values: `greenhouse`, `ashby`, `workable`, `lever`, `workday`, `builtin`, `custom`, `unknown`). When the offer came from the Greenhouse API (Level 2), ALSO append `| gh={gh_slug}/{gh_id}` so downstream verification can hit the JSON endpoint. Example entries:
199
+ a. Add to `pipeline.md` section "Pending": `- [ ] {url} | {company} | {title} | ats={ats}` — the `| ats={type}` suffix is REQUIRED for every entry (values: `greenhouse`, `ashby`, `workable`, `lever`, `workday`, `smartrecruiters`, `wwr`, `remoteok`, `builtin`, `custom`, `unknown`). When the offer came from the Greenhouse API (Level 2), ALSO append `| gh={gh_slug}/{gh_id}` so downstream verification can hit the JSON endpoint. Example entries:
110
200
  - `- [ ] https://job-boards.greenhouse.io/webflow/jobs/7689676 | Webflow | Lead AI Engineer | ats=greenhouse | gh=webflow/7689676`
111
201
  - `- [ ] https://jobs.ashbyhq.com/everai/abc-123 | EverAI | Senior AI PM | ats=ashby`
112
202
  - `- [ ] https://jobs.lever.co/temporal/xyz | Temporal | Product Manager - AI | ats=lever`
203
+ - `- [ ] https://nvidia.wd5.myworkdayjobs.com/NVIDIAExternalCareerSite/job/US-CA-Santa-Clara/Senior-AI-Engineer_JR123456 | NVIDIA | Senior AI Engineer | ats=workday`
204
+ - `- [ ] https://jobs.smartrecruiters.com/Visa1/744000012345678 | Visa | Staff ML Engineer | ats=smartrecruiters`
205
+ - `- [ ] https://weworkremotely.com/remote-jobs/acme-senior-platform-engineer | Acme | Senior Platform Engineer | ats=wwr`
206
+ - `- [ ] https://remoteok.com/remote-jobs/12345-senior-ai-engineer-acme | Acme | Senior AI Engineer | ats=remoteok`
113
207
  b. Record in `scan-history.tsv`: `{url}\t{date}\t{query_name}\t{title}\t{company}\tadded`
114
208
 
115
209
  9. **Offers filtered by title**: record in `scan-history.tsv` with status `skipped_title`
@@ -158,10 +252,10 @@ Scan mode MUST write its ranked candidate list to a file, not just return it in
158
252
  | 2 | EverAI | ashby | Senior AI PM | - | - | https://jobs.ashbyhq.com/everai/abc-123 | 2026-04-15 |
159
253
  | ... | ... | ... | ... | ... | ... | ... | ... |
160
254
 
161
- **`ats` values** (one of): `greenhouse`, `ashby`, `workable`, `lever`, `workday`, `builtin`, `custom`, `unknown`. Every row MUST populate this column — it's what the apply subagent uses to pick the correct Gmail OTP sender query.
255
+ **`ats` values** (one of): `greenhouse`, `ashby`, `workable`, `lever`, `workday`, `smartrecruiters`, `wwr`, `remoteok`, `builtin`, `custom`, `unknown`. Every row MUST populate this column — it's what the apply subagent uses to pick the correct Gmail OTP sender query. The `wwr` and `remoteok` values identify aggregator postings whose real underlying ATS is only known after the redirect is followed — downstream evaluators re-detect and may rewrite to the underlying ATS.
162
256
 
163
257
  Every row MUST have:
164
- - `ats` — the ATS platform hosting the posting. Inferred from the canonical URL host (e.g. `boards-api.greenhouse.io` / `job-boards.greenhouse.io` → `greenhouse`; `jobs.ashbyhq.com` → `ashby`; `jobs.lever.co` → `lever`; `myworkdayjobs.com` / `.wd5.myworkdayjobs.com` → `workday`; `apply.workable.com` / `jobs.workable.com` → `workable`; `builtin.com/jobs/` → `builtin`; company-own domains → `custom`; anything indeterminate → `unknown`).
258
+ - `ats` — the ATS platform hosting the posting. Inferred from the canonical URL host (e.g. `boards-api.greenhouse.io` / `job-boards.greenhouse.io` → `greenhouse`; `jobs.ashbyhq.com` → `ashby`; `jobs.lever.co` → `lever`; `*.myworkdayjobs.com` (any `wd1`/`wd3`/`wd5` pod) → `workday`; `apply.workable.com` / `jobs.workable.com` → `workable`; `api.smartrecruiters.com` / `jobs.smartrecruiters.com` → `smartrecruiters`; `weworkremotely.com` → `wwr`; `remoteok.com` → `remoteok`; `builtin.com/jobs/` → `builtin`; company-own domains → `custom`; anything indeterminate → `unknown`).
165
259
  - `url` in canonical form. For Greenhouse use `https://job-boards.greenhouse.io/{gh_slug}/jobs/{gh_id}` (matching the suffix in `data/pipeline.md`). For other ATSes use the platform's native URL (do not rewrite).
166
260
  - `updated_at` in `YYYY-MM-DD` form (the most recent `updated_at` in the API response, or scan date when the source has no such field).
167
261
 
@@ -214,6 +308,8 @@ Each company in `tracked_companies` MUST have a `careers_url` — the direct URL
214
308
  - **Ashby:** `https://jobs.ashbyhq.com/{slug}`
215
309
  - **Greenhouse:** `https://job-boards.greenhouse.io/{slug}` or `https://job-boards.eu.greenhouse.io/{slug}`
216
310
  - **Lever:** `https://jobs.lever.co/{slug}`
311
+ - **Workday:** `https://{subdomain}.{pod}.myworkdayjobs.com/{site}` (pod = `wd1`/`wd3`/`wd5`/..., varies by tenant data center; site is tenant-defined, e.g. `External`, `NVIDIAExternalCareerSite`)
312
+ - **SmartRecruiters:** `https://careers.smartrecruiters.com/{company}` (human-facing) / `https://api.smartrecruiters.com/v1/companies/{company}/postings` (API)
217
313
  - **Custom:** The company's own URL (e.g., `https://openai.com/careers`)
218
314
 
219
315
  **If `careers_url` doesn't exist** for a company:
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "job-forge",
3
- "version": "2.0.3",
3
+ "version": "2.1.0",
4
4
  "description": "AI-powered job search pipeline built on opencode",
5
5
  "type": "module",
6
6
  "bin": {
@@ -406,6 +406,36 @@ search_queries:
406
406
  # Companies whose career pages are checked directly.
407
407
  # scan_method: geometra (default), websearch, greenhouse_api
408
408
  # For Greenhouse companies, add api: field for faster structured JSON access.
409
+ #
410
+ # Per-ATS api: field shapes (see modes/scan.md for full endpoint docs):
411
+ #
412
+ # Greenhouse: api: https://boards-api.greenhouse.io/v1/boards/{slug}/jobs
413
+ # Ashby: api: https://api.ashbyhq.com/posting-api/job-board/{slug}?includeCompensation=true
414
+ # Lever: api: https://api.lever.co/v0/postings/{slug}?mode=json
415
+ # SmartRecruiters: api: https://api.smartrecruiters.com/v1/companies/{company}/postings
416
+ # Workday: api_type: workday (requires workday_subdomain, workday_pod, workday_tenant, workday_site)
417
+ #
418
+ # Workday schema (finicky — POST with JSON body, tenant + site vary per company):
419
+ # - name: NVIDIA
420
+ # careers_url: https://nvidia.wd5.myworkdayjobs.com/NVIDIAExternalCareerSite
421
+ # api_type: workday
422
+ # workday_subdomain: nvidia # hostname prefix
423
+ # workday_pod: wd5 # data-center pod (wd1/wd3/wd5/...)
424
+ # workday_tenant: nvidia # usually same as subdomain
425
+ # workday_site: NVIDIAExternalCareerSite # public site name — read from careers_url path
426
+ # tags: ["chips", "ai-lab", "us"]
427
+ # enabled: true
428
+ #
429
+ # Built API URL: https://{subdomain}.{pod}.myworkdayjobs.com/wday/cxs/{tenant}/{site}/jobs
430
+ # POST body: {"appliedFacets": {}, "limit": 20, "offset": 0, "searchText": ""}
431
+ #
432
+ # SmartRecruiters schema (simple GET):
433
+ # - name: Visa
434
+ # careers_url: https://careers.smartrecruiters.com/Visa1
435
+ # api: https://api.smartrecruiters.com/v1/companies/Visa1/postings
436
+ # ats: smartrecruiters
437
+ # tags: ["fintech", "enterprise", "us"]
438
+ # enabled: true
409
439
 
410
440
  tracked_companies:
411
441
 
@@ -3138,3 +3168,80 @@ tracked_companies:
3138
3168
  notes: "TypeScript ORM. Remote."
3139
3169
  tags: ["developer-tools", "open-source", "remote-first"]
3140
3170
  enabled: true
3171
+
3172
+ # -- Cross-company aggregator feeds --
3173
+ # Aggregator boards that expose a single feed covering hundreds of companies.
3174
+ # Unlike tracked_companies, these are NOT per-company — the scanner pulls the
3175
+ # whole feed, applies a pre-filter (category / tags), then runs each posting
3176
+ # through the global title_filter above.
3177
+ #
3178
+ # Types:
3179
+ # - weworkremotely → RSS 2.0 XML per category
3180
+ # - remoteok → JSON array, first element is a legal disclaimer (skipped)
3181
+ #
3182
+ # See modes/scan.md → "Level 2 — ATS / Aggregator APIs" for full shape docs.
3183
+
3184
+ cross_company_feeds:
3185
+
3186
+ # -- We Work Remotely (RSS per category) --
3187
+ # Feed IDs map to https://weworkremotely.com/categories/{id}.rss
3188
+ - name: WeWorkRemotely — Programming
3189
+ type: weworkremotely
3190
+ feed: remote-programming-jobs
3191
+ url: https://weworkremotely.com/categories/remote-programming-jobs.rss
3192
+ # Optional pre-filter applied BEFORE title_filter. Drops obviously
3193
+ # off-target entries without cluttering scan-history.
3194
+ category_filter:
3195
+ positive: [] # empty = accept all from this feed
3196
+ negative: ["WordPress", "PHP", "Shopify Theme"]
3197
+ enabled: true
3198
+
3199
+ - name: WeWorkRemotely — DevOps & SysAdmin
3200
+ type: weworkremotely
3201
+ feed: remote-devops-sysadmin-jobs
3202
+ url: https://weworkremotely.com/categories/remote-devops-sysadmin-jobs.rss
3203
+ enabled: true
3204
+
3205
+ - name: WeWorkRemotely — Product
3206
+ type: weworkremotely
3207
+ feed: remote-product-jobs
3208
+ url: https://weworkremotely.com/categories/remote-product-jobs.rss
3209
+ enabled: true
3210
+
3211
+ - name: WeWorkRemotely — All Other
3212
+ type: weworkremotely
3213
+ feed: all-other-remote-jobs
3214
+ url: https://weworkremotely.com/categories/all-other-remote-jobs.rss
3215
+ # This category is very broad — start disabled, enable if signal is good.
3216
+ enabled: false
3217
+
3218
+ # -- RemoteOK (single JSON feed, filter by tags) --
3219
+ # One endpoint, 100 newest postings. Filter by tags BEFORE title_filter,
3220
+ # otherwise you burn the title_filter pass on ~80% irrelevant rows.
3221
+ - name: RemoteOK — AI & Engineering
3222
+ type: remoteok
3223
+ url: https://remoteok.com/api
3224
+ # Required — RemoteOK returns 403 without a browser-like UA.
3225
+ user_agent: "Mozilla/5.0 (compatible; JobForgeScanner/1.0; +https://github.com/razroo/JobForge)"
3226
+ # Pre-filter on entry.tags (case-insensitive substring match against the
3227
+ # array). Row passes if ANY positive matches AND NO negative matches.
3228
+ tag_filter:
3229
+ positive:
3230
+ - "engineer"
3231
+ - "engineering"
3232
+ - "ai"
3233
+ - "ml"
3234
+ - "llm"
3235
+ - "python"
3236
+ - "golang"
3237
+ - "typescript"
3238
+ - "devops"
3239
+ - "platform"
3240
+ - "product manager"
3241
+ negative:
3242
+ - "wordpress"
3243
+ - "php"
3244
+ - "marketing"
3245
+ - "sales"
3246
+ - "customer support"
3247
+ enabled: true