npm - job-forge - Versions diffs - 2.0.3 → 2.2.0 - Mend

job-forge 2.0.3 → 2.2.0


Files changed (12)

package/.cursor/rules/main.mdc +10 -2
package/AGENTS.md +10 -2
package/CLAUDE.md +10 -2
package/docs/ARCHITECTURE.md +4 -4
package/iso/instructions.md +10 -2
package/modes/README.md +2 -2
package/modes/apply.md +3 -1
package/modes/pipeline.md +8 -4
package/modes/scan.md +112 -16
package/package.json +1 -1
package/scripts/next-num.mjs +59 -4
package/templates/portals.example.yml +107 -0

package/.cursor/rules/main.mdc CHANGED

@@ -10,7 +10,13 @@ alwaysApply: true
 The Hard Limits below are non-negotiable numeric rules. If you catch yourself about to violate one, STOP and restructure.
 1. **Max parallel subagents: 2.** Never emit 3+ `task` tool calls in a single message. For N jobs, run `ceil(N/2)` sequential rounds of 2. No exceptions — not for "urgent", not for "the user asked for 10".
-2. **Max 1 application per company+role.** Before every `task` dispatch for `apply`, Grep `data/pipeline.md` and today's `data/applications/*.md` for the URL and for `company+role`. If already APPLIED, skip that job and do not dispatch.
+2. **Max 1 application per company+role.** Before every `task` dispatch for `apply`, Grep ALL of the following for the URL and for `company+role`:
+   - `data/pipeline.md`
+   - all `data/applications/*.md` day files (not just today's — prior-day Applies count too)
+   - `batch/tracker-additions/*.tsv` (pending outcomes not yet merged)
+   - `batch/tracker-additions/merged/*.tsv` (outcomes already consumed into day files — catches same-day earlier-batch Applies that merge collapsed into an existing row)
+   If any source shows an APPLIED / Applied outcome for this URL or company+role, skip that job and do not dispatch. **Why merged/ matters**: when two batches in the same day target the same role, `npx job-forge merge` updates the existing day-file row instead of creating a new one — so `grep data/applications/*.md` for the higher report number misses the earlier apply. The merged TSV is the only place the newer attempt's breadcrumb remains.
 3. **Always clean Geometra sessions before dispatching.** Before every round of `task` dispatches that will use Geometra, call `geometra_list_sessions` then `geometra_disconnect({closeBrowser: true})`. Every round. The disconnect is a no-op when the pool is empty.
 4. **Orchestrator does NOT fill forms.** This session MUST NOT call `geometra_fill_form`, `geometra_run_actions`, `geometra_pick_listbox_option`, or `geometra_fill_otp` when handling a multi-job request. If you need to, it means you MUST have delegated — `task` out the remaining work instead.
 5. **Re-dispatch only AFTER the previous subagent returns.** Never fire the same company's `task` twice while the first is still in-flight. Wait for the return value, then decide if a retry is warranted.
@@ -300,6 +306,8 @@ When a form says "enter the code we sent to your email", you MUST retrieve the c
 | Workday | `from:myworkday newer_than:10m` |
 | Lever | `from:lever newer_than:10m` |
 | Ashby | `from:ashby newer_than:10m` |
+| SmartRecruiters | `from:smartrecruiters newer_than:10m` |
+| Aggregator redirect (WeWorkRemotely / RemoteOK) | Detect the underlying ATS from the post-redirect URL, then use that row's sender query |
 | Unknown | `newer_than:10m subject:(verify OR code OR confirm)` |
 **Rules:**
@@ -473,7 +481,7 @@ To check or modify MCP settings, edit `opencode.json` in the project root.
 - Output in `output/` (gitignored), Reports in `reports/`
 - JDs in `jds/` (referenced as `local:jds/{file}` in pipeline.md)
 - Batch in `batch/` (gitignored except scripts and prompt)
-- Report numbering: sequential 3-digit zero-padded, max existing + 1
+- Report numbering: sequential 3-digit zero-padded. **Always use `npx job-forge next-num` to get the next number** — do NOT derive it yourself from `ls reports/`. The CLI scans all sources: `reports/*.md`, the `#` column of every `data/applications/*.md` day file, and pending + merged `batch/tracker-additions/*.tsv`. Deriving from `reports/` alone misses numbers assigned by prior-day tracker additions that were never written as report files (e.g., `SKIP` entries), which causes ID collisions downstream.
 - **RULE: After each batch of evaluations, run `npx job-forge merge`** to merge tracker additions and avoid duplications.
 - **RULE: NEVER create new entries in applications.md if company+role already exists.** Update the existing entry.
 - **RULE: NEVER attribute commits to opencode (no `Co-Authored-By: opencode` or similar).** All commits must be attributed solely to the person making the commit (e.g., CharlieGreenman).
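
Hard Limits 1 and 2 in this file can be sketched in a few lines. This is a minimal illustration, not code from the package: `planRounds`, `alreadyApplied`, and the `job` / `sources` shapes are hypothetical names, and the duplicate check is a plain substring match standing in for the Grep the rule describes.

```javascript
// Sketch only: function and field names are hypothetical, not part of job-forge.

// Hard Limit 1: split N jobs into ceil(N/2) sequential rounds of at most 2.
function planRounds(jobs, maxParallel = 2) {
  const rounds = [];
  for (let i = 0; i < jobs.length; i += maxParallel) {
    rounds.push(jobs.slice(i, i + maxParallel));
  }
  return rounds;
}

// Hard Limit 2: skip a job when any searched source already records an applied
// outcome for the same URL or the same company+role.
// `sources` maps a label (e.g. "merged-tsv") to the raw text read from that source.
function alreadyApplied(job, sources) {
  const url = job.url.toLowerCase();
  const company = job.company.toLowerCase();
  const role = job.role.toLowerCase();
  for (const text of Object.values(sources)) {
    for (const line of text.split("\n")) {
      const lower = line.toLowerCase();
      // Matches both "APPLIED" and "Applied" after lowercasing.
      if (!lower.includes("applied")) continue;
      const sameUrl = url !== "" && lower.includes(url);
      const sameRole = lower.includes(company) && lower.includes(role);
      if (sameUrl || sameRole) return true;
    }
  }
  return false;
}
```

Passing the merged TSV text as one of the `sources` is what lets the check see an earlier same-day apply that the day file no longer shows as a separate row.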

package/AGENTS.md CHANGED

@@ -5,7 +5,13 @@
 The Hard Limits below are non-negotiable numeric rules. If you catch yourself about to violate one, STOP and restructure.
 1. **Max parallel subagents: 2.** Never emit 3+ `task` tool calls in a single message. For N jobs, run `ceil(N/2)` sequential rounds of 2. No exceptions — not for "urgent", not for "the user asked for 10".
-2. **Max 1 application per company+role.** Before every `task` dispatch for `apply`, Grep `data/pipeline.md` and today's `data/applications/*.md` for the URL and for `company+role`. If already APPLIED, skip that job and do not dispatch.
+2. **Max 1 application per company+role.** Before every `task` dispatch for `apply`, Grep ALL of the following for the URL and for `company+role`:
+   - `data/pipeline.md`
+   - all `data/applications/*.md` day files (not just today's — prior-day Applies count too)
+   - `batch/tracker-additions/*.tsv` (pending outcomes not yet merged)
+   - `batch/tracker-additions/merged/*.tsv` (outcomes already consumed into day files — catches same-day earlier-batch Applies that merge collapsed into an existing row)
+   If any source shows an APPLIED / Applied outcome for this URL or company+role, skip that job and do not dispatch. **Why merged/ matters**: when two batches in the same day target the same role, `npx job-forge merge` updates the existing day-file row instead of creating a new one — so `grep data/applications/*.md` for the higher report number misses the earlier apply. The merged TSV is the only place the newer attempt's breadcrumb remains.
 3. **Always clean Geometra sessions before dispatching.** Before every round of `task` dispatches that will use Geometra, call `geometra_list_sessions` then `geometra_disconnect({closeBrowser: true})`. Every round. The disconnect is a no-op when the pool is empty.
 4. **Orchestrator does NOT fill forms.** This session MUST NOT call `geometra_fill_form`, `geometra_run_actions`, `geometra_pick_listbox_option`, or `geometra_fill_otp` when handling a multi-job request. If you need to, it means you MUST have delegated — `task` out the remaining work instead.
 5. **Re-dispatch only AFTER the previous subagent returns.** Never fire the same company's `task` twice while the first is still in-flight. Wait for the return value, then decide if a retry is warranted.
@@ -295,6 +301,8 @@ When a form says "enter the code we sent to your email", you MUST retrieve the c
 | Workday | `from:myworkday newer_than:10m` |
 | Lever | `from:lever newer_than:10m` |
 | Ashby | `from:ashby newer_than:10m` |
+| SmartRecruiters | `from:smartrecruiters newer_than:10m` |
+| Aggregator redirect (WeWorkRemotely / RemoteOK) | Detect the underlying ATS from the post-redirect URL, then use that row's sender query |
 | Unknown | `newer_than:10m subject:(verify OR code OR confirm)` |
 **Rules:**
@@ -468,7 +476,7 @@ To check or modify MCP settings, edit `opencode.json` in the project root.
 - Output in `output/` (gitignored), Reports in `reports/`
 - JDs in `jds/` (referenced as `local:jds/{file}` in pipeline.md)
 - Batch in `batch/` (gitignored except scripts and prompt)
-- Report numbering: sequential 3-digit zero-padded, max existing + 1
+- Report numbering: sequential 3-digit zero-padded. **Always use `npx job-forge next-num` to get the next number** — do NOT derive it yourself from `ls reports/`. The CLI scans all sources: `reports/*.md`, the `#` column of every `data/applications/*.md` day file, and pending + merged `batch/tracker-additions/*.tsv`. Deriving from `reports/` alone misses numbers assigned by prior-day tracker additions that were never written as report files (e.g., `SKIP` entries), which causes ID collisions downstream.
 - **RULE: After each batch of evaluations, run `npx job-forge merge`** to merge tracker additions and avoid duplications.
 - **RULE: NEVER create new entries in applications.md if company+role already exists.** Update the existing entry.
 - **RULE: NEVER attribute commits to opencode (no `Co-Authored-By: opencode` or similar).** All commits must be attributed solely to the person making the commit (e.g., CharlieGreenman).

package/CLAUDE.md CHANGED

@@ -5,7 +5,13 @@
 The Hard Limits below are non-negotiable numeric rules. If you catch yourself about to violate one, STOP and restructure.
 1. **Max parallel subagents: 2.** Never emit 3+ `task` tool calls in a single message. For N jobs, run `ceil(N/2)` sequential rounds of 2. No exceptions — not for "urgent", not for "the user asked for 10".
-2. **Max 1 application per company+role.** Before every `task` dispatch for `apply`, Grep `data/pipeline.md` and today's `data/applications/*.md` for the URL and for `company+role`. If already APPLIED, skip that job and do not dispatch.
+2. **Max 1 application per company+role.** Before every `task` dispatch for `apply`, Grep ALL of the following for the URL and for `company+role`:
+   - `data/pipeline.md`
+   - all `data/applications/*.md` day files (not just today's — prior-day Applies count too)
+   - `batch/tracker-additions/*.tsv` (pending outcomes not yet merged)
+   - `batch/tracker-additions/merged/*.tsv` (outcomes already consumed into day files — catches same-day earlier-batch Applies that merge collapsed into an existing row)
+   If any source shows an APPLIED / Applied outcome for this URL or company+role, skip that job and do not dispatch. **Why merged/ matters**: when two batches in the same day target the same role, `npx job-forge merge` updates the existing day-file row instead of creating a new one — so `grep data/applications/*.md` for the higher report number misses the earlier apply. The merged TSV is the only place the newer attempt's breadcrumb remains.
 3. **Always clean Geometra sessions before dispatching.** Before every round of `task` dispatches that will use Geometra, call `geometra_list_sessions` then `geometra_disconnect({closeBrowser: true})`. Every round. The disconnect is a no-op when the pool is empty.
 4. **Orchestrator does NOT fill forms.** This session MUST NOT call `geometra_fill_form`, `geometra_run_actions`, `geometra_pick_listbox_option`, or `geometra_fill_otp` when handling a multi-job request. If you need to, it means you MUST have delegated — `task` out the remaining work instead.
 5. **Re-dispatch only AFTER the previous subagent returns.** Never fire the same company's `task` twice while the first is still in-flight. Wait for the return value, then decide if a retry is warranted.
@@ -295,6 +301,8 @@ When a form says "enter the code we sent to your email", you MUST retrieve the c
 | Workday | `from:myworkday newer_than:10m` |
 | Lever | `from:lever newer_than:10m` |
 | Ashby | `from:ashby newer_than:10m` |
+| SmartRecruiters | `from:smartrecruiters newer_than:10m` |
+| Aggregator redirect (WeWorkRemotely / RemoteOK) | Detect the underlying ATS from the post-redirect URL, then use that row's sender query |
 | Unknown | `newer_than:10m subject:(verify OR code OR confirm)` |
 **Rules:**
@@ -468,7 +476,7 @@ To check or modify MCP settings, edit `opencode.json` in the project root.
 - Output in `output/` (gitignored), Reports in `reports/`
 - JDs in `jds/` (referenced as `local:jds/{file}` in pipeline.md)
 - Batch in `batch/` (gitignored except scripts and prompt)
-- Report numbering: sequential 3-digit zero-padded, max existing + 1
+- Report numbering: sequential 3-digit zero-padded. **Always use `npx job-forge next-num` to get the next number** — do NOT derive it yourself from `ls reports/`. The CLI scans all sources: `reports/*.md`, the `#` column of every `data/applications/*.md` day file, and pending + merged `batch/tracker-additions/*.tsv`. Deriving from `reports/` alone misses numbers assigned by prior-day tracker additions that were never written as report files (e.g., `SKIP` entries), which causes ID collisions downstream.
 - **RULE: After each batch of evaluations, run `npx job-forge merge`** to merge tracker additions and avoid duplications.
 - **RULE: NEVER create new entries in applications.md if company+role already exists.** Update the existing entry.
 - **RULE: NEVER attribute commits to opencode (no `Co-Authored-By: opencode` or similar).** All commits must be attributed solely to the person making the commit (e.g., CharlieGreenman).

package/docs/ARCHITECTURE.md CHANGED

@@ -53,8 +53,8 @@ The skill router (`.opencode/skills/job-forge.md`) loads mode and data files on
 ```
                     ┌─────────────────────────────────┐
-                    │         opencode Agent        │
-                    │   (reads OPENCODE.md + modes/*.md) │
+                    │            Agent                │
+                    │   (reads AGENTS.md + modes/*.md) │
                     └──────────┬──────────────────────┘
                                │
             ┌──────────────────┼──────────────────────┐
@@ -85,7 +85,7 @@ The skill router (`.opencode/skills/job-forge.md`) loads mode and data files on
 ## Modes (`modes/`)
-Markdown mode files in `modes/` define how the opencode workflow behaves together with the root `OPENCODE.md`. **`_shared.md`** is the shared layer (archetypes, scoring dimensions, negotiation scaffolding); the rest align with `/job-forge` command entry points listed in `OPENCODE.md`.
+Markdown mode files in `modes/` define how the workflow behaves together with the root `AGENTS.md`. **`_shared.md`** is the shared layer (archetypes, scoring dimensions, negotiation scaffolding); the rest align with `/job-forge` command entry points listed in `AGENTS.md`.
 | File | Focus |
 |------|--------|
@@ -124,7 +124,7 @@ For customization (archetypes, weights, tone), start with `_shared.md` and [CUST
 5. **Score**: Weighted average across 10 dimensions (1-5)
 6. **Report**: Save as `reports/{num}-{company}-{date}.md`
 7. **PDF**: Generate ATS-optimized CV (`generate-pdf.mjs`)
-8. **Track**: Write one TSV per evaluation under `batch/tracker-additions/` (see [OPENCODE.md](../OPENCODE.md) TSV layout); fold rows into `data/applications.md` with `npm run merge` / `merge-tracker.mjs` when you are ready (not automatic in every workflow)
+8. **Track**: Write one TSV per evaluation under `batch/tracker-additions/` (see [AGENTS.md](../AGENTS.md) TSV layout); fold rows into `data/applications.md` with `npm run merge` / `merge-tracker.mjs` when you are ready (not automatic in every workflow)
 ## Batch Processing

package/iso/instructions.md CHANGED

@@ -5,7 +5,13 @@
 The Hard Limits below are non-negotiable numeric rules. If you catch yourself about to violate one, STOP and restructure.
 1. **Max parallel subagents: 2.** Never emit 3+ `task` tool calls in a single message. For N jobs, run `ceil(N/2)` sequential rounds of 2. No exceptions — not for "urgent", not for "the user asked for 10".
-2. **Max 1 application per company+role.** Before every `task` dispatch for `apply`, Grep `data/pipeline.md` and today's `data/applications/*.md` for the URL and for `company+role`. If already APPLIED, skip that job and do not dispatch.
+2. **Max 1 application per company+role.** Before every `task` dispatch for `apply`, Grep ALL of the following for the URL and for `company+role`:
+   - `data/pipeline.md`
+   - all `data/applications/*.md` day files (not just today's — prior-day Applies count too)
+   - `batch/tracker-additions/*.tsv` (pending outcomes not yet merged)
+   - `batch/tracker-additions/merged/*.tsv` (outcomes already consumed into day files — catches same-day earlier-batch Applies that merge collapsed into an existing row)
+   If any source shows an APPLIED / Applied outcome for this URL or company+role, skip that job and do not dispatch. **Why merged/ matters**: when two batches in the same day target the same role, `npx job-forge merge` updates the existing day-file row instead of creating a new one — so `grep data/applications/*.md` for the higher report number misses the earlier apply. The merged TSV is the only place the newer attempt's breadcrumb remains.
 3. **Always clean Geometra sessions before dispatching.** Before every round of `task` dispatches that will use Geometra, call `geometra_list_sessions` then `geometra_disconnect({closeBrowser: true})`. Every round. The disconnect is a no-op when the pool is empty.
 4. **Orchestrator does NOT fill forms.** This session MUST NOT call `geometra_fill_form`, `geometra_run_actions`, `geometra_pick_listbox_option`, or `geometra_fill_otp` when handling a multi-job request. If you need to, it means you MUST have delegated — `task` out the remaining work instead.
 5. **Re-dispatch only AFTER the previous subagent returns.** Never fire the same company's `task` twice while the first is still in-flight. Wait for the return value, then decide if a retry is warranted.
@@ -295,6 +301,8 @@ When a form says "enter the code we sent to your email", you MUST retrieve the c
 | Workday | `from:myworkday newer_than:10m` |
 | Lever | `from:lever newer_than:10m` |
 | Ashby | `from:ashby newer_than:10m` |
+| SmartRecruiters | `from:smartrecruiters newer_than:10m` |
+| Aggregator redirect (WeWorkRemotely / RemoteOK) | Detect the underlying ATS from the post-redirect URL, then use that row's sender query |
 | Unknown | `newer_than:10m subject:(verify OR code OR confirm)` |
 **Rules:**
@@ -468,7 +476,7 @@ To check or modify MCP settings, edit `opencode.json` in the project root.
 - Output in `output/` (gitignored), Reports in `reports/`
 - JDs in `jds/` (referenced as `local:jds/{file}` in pipeline.md)
 - Batch in `batch/` (gitignored except scripts and prompt)
-- Report numbering: sequential 3-digit zero-padded, max existing + 1
+- Report numbering: sequential 3-digit zero-padded. **Always use `npx job-forge next-num` to get the next number** — do NOT derive it yourself from `ls reports/`. The CLI scans all sources: `reports/*.md`, the `#` column of every `data/applications/*.md` day file, and pending + merged `batch/tracker-additions/*.tsv`. Deriving from `reports/` alone misses numbers assigned by prior-day tracker additions that were never written as report files (e.g., `SKIP` entries), which causes ID collisions downstream.
 - **RULE: After each batch of evaluations, run `npx job-forge merge`** to merge tracker additions and avoid duplications.
 - **RULE: NEVER create new entries in applications.md if company+role already exists.** Update the existing entry.
 - **RULE: NEVER attribute commits to opencode (no `Co-Authored-By: opencode` or similar).** All commits must be attributed solely to the person making the commit (e.g., CharlieGreenman).

package/modes/README.md CHANGED

@@ -1,9 +1,9 @@
 # Modes
-Markdown prompts used with opencode together with the root [`OPENCODE.md`](../OPENCODE.md). Each file aligns with a `/job-forge …` entry point or shared behavior described there.
+Markdown prompts used together with the root [`AGENTS.md`](../AGENTS.md). Each file aligns with a `/job-forge …` entry point or shared behavior described there.
 - **`_shared.md`** — Archetypes, scoring dimensions, negotiation scaffolding. Edit this first when you change how offers are classified or weighted.
-- **Per-command files** — Each `*.md` here pairs with a `/job-forge …` entry in [`OPENCODE.md`](../OPENCODE.md). How modes connect to batch, tracker, and scripts is spelled out in [**Architecture — Modes**](../docs/ARCHITECTURE.md#modes-modes).
+- **Per-command files** — Each `*.md` here pairs with a `/job-forge …` entry in [`AGENTS.md`](../AGENTS.md). How modes connect to batch, tracker, and scripts is spelled out in [**Architecture — Modes**](../docs/ARCHITECTURE.md#modes-modes).
 | File | Role |
 |------|------|

package/modes/apply.md CHANGED

@@ -276,10 +276,12 @@ Check for an OTP gate after the candidate (or Geometra) submits — the major po
 | `lever`      | `from:lever newer_than:10m` |
 | `ashby`      | `from:ashby newer_than:10m` |
 | `workable`   | `from:workable newer_than:10m` |
+| `smartrecruiters` | `from:smartrecruiters newer_than:10m` |
+| `wwr` / `remoteok` | Follow the apply redirect to the underlying ATS, re-detect the host, then use that row's query. Aggregators do not send OTP emails themselves. |
 | `builtin`    | `from:builtin newer_than:10m` |
 | `custom` / `unknown` / missing | `newer_than:10m subject:(verify OR code OR confirm)` |
-**Fallback when `ats` is missing** (legacy pipeline entries with no `| ats=` suffix, or scan-output without an `ats` column): infer from the URL host — `*.greenhouse.io` → `greenhouse`; `jobs.ashbyhq.com` → `ashby`; `jobs.lever.co` → `lever`; `*.myworkdayjobs.com` → `workday`; `apply.workable.com` / `jobs.workable.com` → `workable`; `builtin.com` → `builtin`; otherwise use the generic `verify OR code OR confirm` subject query.
+**Fallback when `ats` is missing** (legacy pipeline entries with no `| ats=` suffix, or scan-output without an `ats` column): infer from the URL host — `*.greenhouse.io` → `greenhouse`; `jobs.ashbyhq.com` → `ashby`; `jobs.lever.co` → `lever`; `*.myworkdayjobs.com` → `workday`; `apply.workable.com` / `jobs.workable.com` → `workable`; `api.smartrecruiters.com` / `jobs.smartrecruiters.com` → `smartrecruiters`; `weworkremotely.com` → `wwr`; `remoteok.com` → `remoteok`; `builtin.com` → `builtin`; otherwise use the generic `verify OR code OR confirm` subject query.
 **Before reporting the submission as failed, always check Gmail.** A "submit did nothing" outcome usually means a silent OTP step — not a real failure.

package/modes/pipeline.md CHANGED

@@ -6,7 +6,7 @@ Processes accumulated job offer URLs from `data/pipeline.md`. The user adds URLs
 1. **Read** `data/pipeline.md` → find `- [ ]` items in the "Pending" section
 2. **For each pending URL**:
-   a. Calculate the next sequential `REPORT_NUM` (read `reports/`, take the highest number + 1)
+   a. Calculate the next sequential `REPORT_NUM` by running `npx job-forge next-num` (scans `reports/`, day file `#` columns, and `batch/tracker-additions/` — do NOT derive from `reports/` alone)
    b. **Extract JD** using Geometra MCP (geometra_connect + geometra_page_model) → WebFetch → WebSearch
    c. If the URL is not accessible → mark as `- [!]` with a note and continue
    d. **Run full auto-pipeline**: A-F Evaluation → Report .md → PDF (if score >= 3.0, per `_shared.md` thresholds) → Draft answers (if score >= 3.5) → Tracker
@@ -45,9 +45,13 @@ Processes accumulated job offer URLs from `data/pipeline.md`. The user adds URLs
 ## Automatic Numbering
-1. List all files in `reports/`
-2. Extract the number from the prefix (e.g., `142-medispend...` → 142)
-3. New number = highest found + 1
+Run `npx job-forge next-num` — returns the next 3-digit zero-padded report number. The CLI scans:
+1. `reports/*.md` filename prefixes
+2. The `#` column of every `data/applications/*.md` day file
+3. The `{num}` prefix of every `batch/tracker-additions/*.tsv` (pending + merged)
+Takes the max across all three sources and adds 1. Do NOT derive from any single source — prior-day SKIPs and other non-report tracker entries advance the counter but never write to `reports/`, so `ls reports/` alone misses them.
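
The max-across-sources rule can be illustrated as follows. This is not the contents of `scripts/next-num.mjs` (that file's diff is not shown in this section); it is a minimal sketch that takes the three sources as plain arrays, with hypothetical function names and made-up file names.

```javascript
// Sketch only: the real logic lives in scripts/next-num.mjs.
// Inputs are plain arrays so the example stays self-contained.

// Read the numeric prefix of a file name, or 0 when there is none.
function leadingNumber(name) {
  const m = /^(\d+)/.exec(name);
  return m ? Number(m[1]) : 0;
}

// reportFiles:    file names from reports/
// dayFileNumbers: values already parsed from the `#` column of day files
// tsvFiles:       file names from batch/tracker-additions/ (pending + merged)
function nextReportNum(reportFiles, dayFileNumbers, tsvFiles) {
  const candidates = [
    ...reportFiles.map(leadingNumber),
    ...dayFileNumbers,
    ...tsvFiles.map(leadingNumber),
  ];
  const max = candidates.reduce((a, b) => Math.max(a, b), 0);
  return String(max + 1).padStart(3, "0");
}
```

With a report file numbered 142, a day-file row numbered 143, and a tracker TSV numbered 145, the sketch returns `146`, whereas looking at `reports/` alone would have produced the colliding `143`.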
 ## Source Synchronization

package/modes/scan.md CHANGED

@@ -34,9 +34,88 @@ Read `portals.yml` which contains:
 **Every company MUST have a `careers_url` in portals.yml.** If it doesn't, search for it once, save it, and use it in future scans.
-### Use Level 2 — Greenhouse API (COMPLEMENTARY)
-For companies using Greenhouse, the JSON API (`boards-api.greenhouse.io/v1/boards/{slug}/jobs`) returns clean structured data. Use as a quick complement to Level 1 — it's faster than Geometra but only works with Greenhouse.
+### Use Level 2 — ATS / Aggregator APIs (COMPLEMENTARY)
+For companies using an ATS or aggregator that exposes a public JSON/RSS API, fetch structured data directly. APIs are faster than Geometra and harder to hallucinate (the response is load-bearing — record IDs verbatim from the response, never reconstruct them). Use as a complement to Level 1.
+Supported API shapes:
+#### Greenhouse (JSON, per-company board)
+- **Endpoint**: `https://boards-api.greenhouse.io/v1/boards/{slug}/jobs`
+- **Method**: `GET` (plain, no auth)
+- **Shape**: `{ jobs: [{ id, title, absolute_url, updated_at, location: { name } }, ...] }`
+- **Canonical URL to record**: `https://job-boards.greenhouse.io/{slug}/jobs/{id}` — do NOT use `absolute_url` when it points to a customer-skinned front-end (see Verification section below).
+- **ats**: `greenhouse`
+#### Ashby (JSON, per-company board)
+- **Endpoint**: `https://api.ashbyhq.com/posting-api/job-board/{slug}?includeCompensation=true`
+- **Method**: `GET`
+- **Shape**: `{ jobs: [{ id, title, jobUrl, publishedDate, locationName, employmentType, department, team, compensation }] }`
+- **Canonical URL to record**: use the returned `jobUrl` (format `https://jobs.ashbyhq.com/{slug}/{uuid}`).
+- **ats**: `ashby`
+#### Lever (JSON, per-company board)
+- **Endpoint**: `https://api.lever.co/v0/postings/{slug}?mode=json`
+- **Method**: `GET`
+- **Shape**: array of postings `[{ id, text, hostedUrl, createdAt, categories: { commitment, department, location, team } }, ...]`
+- **Canonical URL to record**: `hostedUrl` (format `https://jobs.lever.co/{slug}/{uuid}`).
+- **ats**: `lever`
+#### Workday (JSON, per-tenant + site — FINICKY)
+- **Endpoint**: `https://{subdomain}.{pod}.myworkdayjobs.com/wday/cxs/{tenant}/{site}/jobs`
+  - `subdomain` = the Workday tenant hostname prefix (e.g. `nvidia`, `salesforce`, `adobe`, `shopify`).
+  - `pod` = the Workday data-center pod segment (varies: `wd1`, `wd3`, `wd5`). The hostname in `careers_url` reveals which.
+  - `tenant` = repeats the company slug in the path (usually equal to `subdomain`, but not always).
+  - `site` = the public site name exposed by the tenant (e.g. `NVIDIAExternalCareerSite`, `External`, `ShopifyCareerSite`). Read it from the tenant's HTML landing page if unknown.
+- **Method**: `POST` with JSON body:
+  ```json
+  {"appliedFacets": {}, "limit": 20, "offset": 0, "searchText": ""}
+  ```
+- **Required headers**: `Content-Type: application/json`, `Accept: application/json`. Some tenants reject requests without a realistic `User-Agent` — set one if the response is 403.
+- **Shape**: `{ jobPostings: [{ title, externalPath, postedOn, locationsText, bulletFields }, ...], total }`
+- **Canonical URL to record**: `https://{subdomain}.{pod}.myworkdayjobs.com/{site}{externalPath}` (note: `externalPath` already starts with `/job/...` — do NOT prepend an extra `/`).
+- **Pagination**: increment `offset` by `limit` (20) until `jobPostings.length < limit` or `offset >= total`.
+- **ats**: `workday`
+- **Fallback**: Workday APIs are brittle — tenants occasionally block POST from data-center IPs, change `site` names silently, or return empty `jobPostings` while the HTML page shows listings. If the POST fails or returns 0 jobs on a tenant that Level 1 confirmed has listings, fall back to Level 1 (Geometra scraping the `careers_url`). Treat Workday as Level 2 with a guaranteed Level 1 fallback.
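
The Workday pagination and canonical-URL rules can be sketched as below. The HTTP call is left out on purpose: `fetchPage` is a hypothetical injected function that would perform the POST with the body shown and resolve to the parsed JSON, so the loop can be read and exercised without network access.

```javascript
// Sketch only. `fetchPage(body)` is a hypothetical callback that POSTs `body`
// to the tenant endpoint and resolves to { jobPostings, total }.
async function fetchAllWorkdayPostings(fetchPage, limit = 20) {
  const all = [];
  let offset = 0;
  while (true) {
    const body = { appliedFacets: {}, limit, offset, searchText: "" };
    const page = await fetchPage(body);
    const postings = page.jobPostings || [];
    all.push(...postings);
    offset += limit;
    // Stop on a short page or once the offset has reached the reported total.
    if (postings.length < limit || offset >= page.total) break;
  }
  return all;
}

// externalPath already starts with "/job/...", so it is appended as-is.
function workdayCanonicalUrl(subdomain, pod, site, externalPath) {
  return `https://${subdomain}.${pod}.myworkdayjobs.com/${site}${externalPath}`;
}
```

An empty result from this loop is not proof that a tenant has no openings; as the fallback bullet says, a zero-job response on a tenant with known listings should send the scan back to Level 1.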
+#### SmartRecruiters (JSON, per-company postings)
+- **Endpoint**: `https://api.smartrecruiters.com/v1/companies/{company}/postings`
+- **Method**: `GET` (plain, no auth)
+- **Shape**: `{ content: [{ id, name, refNumber, jobAdUrl, releasedDate, location: { city, country, remote }, company: { identifier, name }, department }], totalFound, offset, limit }`
+- **Canonical URL to record**: use `jobAdUrl` when present, otherwise `https://jobs.smartrecruiters.com/{company}/{id}`.
+- **Pagination**: pass `?offset=N&limit=100` (max 100). Loop until `offset + content.length >= totalFound`.
+- **ats**: `smartrecruiters`
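
For SmartRecruiters, the two rules that are easy to get subtly wrong are the URL fallback and the loop-termination test. A minimal sketch, with hypothetical helper names:

```javascript
// Sketch only: helper names are hypothetical.

// Prefer the API's jobAdUrl; otherwise build the public posting URL.
function smartRecruitersUrl(company, posting) {
  return posting.jobAdUrl || `https://jobs.smartrecruiters.com/${company}/${posting.id}`;
}

// True while another page should be requested, i.e. the inverse of
// "offset + content.length >= totalFound".
function hasMorePages(offset, contentLength, totalFound) {
  return offset + contentLength < totalFound;
}
```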
+#### WeWorkRemotely (RSS, cross-company aggregator)
+- **Endpoints** (one per category — enable the ones matching your target roles):
+  - `https://weworkremotely.com/categories/remote-programming-jobs.rss`
+  - `https://weworkremotely.com/categories/remote-devops-sysadmin-jobs.rss`
+  - `https://weworkremotely.com/categories/remote-product-jobs.rss`
+  - `https://weworkremotely.com/categories/remote-design-jobs.rss`
+  - `https://weworkremotely.com/categories/all-other-remote-jobs.rss`
+- **Method**: `GET` — returns RSS 2.0 XML.
+- **Shape**: `<rss><channel><item><title>{company}: {role}</title><link>https://weworkremotely.com/remote-jobs/{slug}</link><pubDate>...</pubDate><region>...</region></item></channel></rss>`
+- **Company/role extraction**: split `<title>` on the first `: ` — left side is company, right side is role. Fallback to the whole title as role if there is no `: `.
+- **Canonical URL to record**: the `<link>` verbatim (format `https://weworkremotely.com/remote-jobs/{slug}`).
+- **Cross-company note**: WeWorkRemotely is NOT per-company — it aggregates postings from hundreds of companies. Scan it via the `cross_company_feeds` section in `portals.yml`, not `tracked_companies`.
+- **ats**: `wwr` (aggregator). The underlying company's ATS is unknown at scan time — downstream evaluators follow the link and re-detect.
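
The company/role extraction rule for WeWorkRemotely titles is a split on the first `: ` only, so a role that itself contains a colon stays intact. A minimal sketch (`splitWwrTitle` is a hypothetical name, and returning `company: null` when there is no separator is an assumption; the source only says the whole title becomes the role):

```javascript
// Sketch only: split "<company>: <role>" on the FIRST ": ".
function splitWwrTitle(title) {
  const idx = title.indexOf(": ");
  if (idx === -1) {
    // No separator: the whole title is the role (company left unset by assumption).
    return { company: null, role: title.trim() };
  }
  return {
    company: title.slice(0, idx).trim(),
    role: title.slice(idx + 2).trim(),
  };
}
```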
+#### RemoteOK (JSON, cross-company aggregator)
+- **Endpoint**: `https://remoteok.com/api`
+- **Method**: `GET` — returns a JSON array. The **first element is a legal/disclaimer object** (no `id`, has `legal`) — skip it. The remaining 100 entries are postings.
+- **Required headers**: `User-Agent: Mozilla/5.0 ...` — RemoteOK returns 403 without a browser-like UA.
+- **Shape** (per posting after skip): `{ id, slug, company, company_logo, position, description, tags: [string], date, epoch, url, apply_url, location, salary_min, salary_max }`
+- **Canonical URL to record**: `url` (format `https://remoteok.com/remote-jobs/{id}-{slug}`).
+- **Filtering**: RemoteOK feeds are broad — use `tags` for pre-filter (e.g. `tags` contains `"engineer"` or `"ai"`) before passing through `title_filter`.
+- **Cross-company note**: same as WeWorkRemotely — configure via `cross_company_feeds`, not `tracked_companies`.
+- **ats**: `remoteok` (aggregator).
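
The RemoteOK handling (drop the legal element, then pre-filter on `tags`) might look like the sketch below. Two choices here are assumptions rather than anything the source prescribes: the legal object is recognised by its shape (has `legal`, no `id`) instead of by position, and tag matching is case-insensitive. `remoteOkPostings` is a hypothetical name.

```javascript
// Sketch only: `apiResponse` is the parsed JSON array from the RemoteOK API.
function remoteOkPostings(apiResponse, wantedTags) {
  const wanted = new Set(wantedTags.map((t) => t.toLowerCase()));
  return apiResponse
    // Drop the legal/disclaimer object: it has `legal` and no `id`.
    .filter((entry) => entry && entry.id !== undefined && entry.legal === undefined)
    // Keep postings that carry at least one wanted tag.
    .filter((p) => (p.tags || []).some((t) => wanted.has(String(t).toLowerCase())));
}
```

The surviving postings would still go through `title_filter` afterwards; the tag check only trims the broad feed first.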
 ### Use Level 3 — WebSearch Queries (BROAD DISCOVERY)
@@ -44,7 +123,7 @@ The `search_queries` with `site:` filters cover portals broadly (all Ashby board
 **Execution priority:**
 1. Level 1: Geometra → all `tracked_companies` with `careers_url`
-2. Level 2: API → all `tracked_companies` with `api:`
+2. Level 2: API → all `tracked_companies` with `api:` (Greenhouse / Ashby / Lever / Workday / SmartRecruiters) AND all `cross_company_feeds` with `enabled: true` (WeWorkRemotely / RemoteOK)
 3. Level 3: WebSearch → all `search_queries` with `enabled: true`
 The levels are additive — all are executed, results are merged and deduplicated.
@@ -65,15 +144,26 @@ The levels are additive — all are executed, results are merged and deduplicate
    f. Accumulate in candidates list
    g. If `careers_url` fails (404, redirect), try `scan_query` as fallback and note for URL update
-5. **Level 2 — Greenhouse APIs** (WebFetch can batch freely — it's cheap and doesn't use Geometra sessions):
-   For each company in `tracked_companies` with `api:` defined and `enabled: true`:
-   a. WebFetch the API URL → JSON with job list
-   b. For each job extract: `{title, url, company, gh_slug, gh_id, updated_at}`
-      - **`url`**: ALWAYS record the canonical Greenhouse URL: `https://job-boards.greenhouse.io/{gh_slug}/jobs/{gh_id}`. Do **NOT** use `absolute_url` when it points to a customer-skinned front-end (e.g. `pinterestcareers.com/jobs/?gh_jid=N`, `okta.com/company/careers/opportunity/N`, `samsara.com/company/careers/roles/N`, `zoominfo.com/careers?gh_jid=N`, `collibra.com/.../?gh_jid=N`, `careers.toasttab.com/jobs?gh_jid=N`, `careers.airbnb.com/positions/N`, `coinbase.com/careers/positions/N`, `instacart.careers/job/?gh_jid=N`, `pinterestcareers.com/jobs/?gh_jid=N`). These customer front-ends return shells or 403 to bots and cause downstream WebFetch-based verification to wrongly mark the role CLOSED.
-      - **`gh_slug`**: the Greenhouse board slug (from the API URL that was fetched).
-      - **`gh_id`**: `jobs[].id` from the API response.
-      - **`updated_at`**: `jobs[].updated_at` — record for staleness detection (skip if older than 90 days, flag if older than 30).
-   c. Accumulate in candidates list (dedup with Level 1). The pipeline.md entry MUST carry `| gh={gh_slug}/{gh_id}` at the end of the metadata so downstream evaluators can fall back to `https://boards-api.greenhouse.io/v1/boards/{gh_slug}/jobs/{gh_id}` when the canonical URL renders as a shell.
+5. **Level 2 — ATS / Aggregator APIs** (WebFetch can batch freely — it's cheap and doesn't use Geometra sessions):
+   **5a. Per-company APIs** — for each company in `tracked_companies` with `api:` defined and `enabled: true`:
+   a. WebFetch (or `fetch` for Workday, which needs POST) the API URL per the endpoint shape documented above.
+   b. Extract per-posting `{title, url, company, updated_at, ats}` plus ATS-specific IDs:
+      - **Greenhouse** → also record `gh_slug`, `gh_id`. URL MUST be canonical `https://job-boards.greenhouse.io/{gh_slug}/jobs/{gh_id}` — do **NOT** use `absolute_url` when it points to a customer-skinned front-end (e.g. `pinterestcareers.com/jobs/?gh_jid=N`, `okta.com/company/careers/opportunity/N`, `samsara.com/company/careers/roles/N`, `zoominfo.com/careers?gh_jid=N`, `collibra.com/.../?gh_jid=N`, `careers.toasttab.com/jobs?gh_jid=N`, `careers.airbnb.com/positions/N`, `coinbase.com/careers/positions/N`, `instacart.careers/job/?gh_jid=N`). These customer front-ends return shells or 403 to bots and cause downstream WebFetch-based verification to wrongly mark the role CLOSED.
+      - **Ashby** → record the returned `jobUrl`.
+      - **Lever** → record the returned `hostedUrl`.
+      - **Workday** → build URL as `https://{subdomain}.{pod}.myworkdayjobs.com/{site}{externalPath}`. If the POST fails, DROP that tenant's API attempt and fall back to Level 1 for that company — do NOT fabricate postings.
+      - **SmartRecruiters** → record `jobAdUrl` (fallback: `https://jobs.smartrecruiters.com/{company}/{id}`).
+      - **`updated_at`**: use `updated_at` (Greenhouse) / `publishedDate` (Ashby) / `createdAt` (Lever) / `postedOn` (Workday) / `releasedDate` (SmartRecruiters) — record for staleness detection (skip if older than 90 days, flag if older than 30).
+   c. Accumulate in candidates list (dedup with Level 1). The pipeline.md entry MUST carry `| ats={type}` at the end, and for Greenhouse ALSO `| gh={gh_slug}/{gh_id}` so downstream evaluators can fall back to `https://boards-api.greenhouse.io/v1/boards/{gh_slug}/jobs/{gh_id}` when the canonical URL renders as a shell.
+   **5b. Cross-company aggregator feeds** — for each feed in `cross_company_feeds` with `enabled: true`:
+   a. WebFetch the RSS (WeWorkRemotely) or JSON (RemoteOK) endpoint per the shape documented above.
+   b. Parse each entry to `{title, url, company, ats, updated_at}`:
+      - **WeWorkRemotely** → split `<title>` on the first `: ` to separate company from role; `<link>` → url; `<pubDate>` → updated_at.
+      - **RemoteOK** → skip the first element (legal disclaimer); from each remaining entry take `company`, `position`, `url`, `date`.
+   c. Apply the feed's `tag_filter` / `category_filter` before the global `title_filter` — aggregators have much higher volume than per-company APIs.
+   d. Accumulate in candidates list (dedup with Level 1 + 5a).
 6. **Level 3 — WebSearch queries** (WebSearch is parallel-safe; batch freely):
    For each query in `search_queries` with `enabled: true`:
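Two of the Level 2 rules above lend themselves to small helpers: the staleness thresholds (skip if older than 90 days, flag if older than 30) and the WeWorkRemotely title split on the first `: `. The sketch below is illustrative only. The names `classifyStaleness` and `splitWwrTitle`, the `'unknown'` result for unparseable dates, and the empty-company fallback are assumptions, not part of the package.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// Returns 'skip' (> 90 days old), 'flag' (> 30 days old), 'fresh' otherwise,
// or 'unknown' when the date cannot be parsed (assumption, not in the source).
function classifyStaleness(updatedAt, now = new Date()) {
  const ageDays = (now.getTime() - new Date(updatedAt).getTime()) / DAY_MS;
  if (Number.isNaN(ageDays)) return 'unknown';
  if (ageDays > 90) return 'skip';
  if (ageDays > 30) return 'flag';
  return 'fresh';
}

// Split a WeWorkRemotely <title> on the first ": " into company and role.
function splitWwrTitle(rawTitle) {
  const idx = rawTitle.indexOf(': ');
  // Assumed fallback: no separator means the whole string is the role.
  if (idx === -1) return { company: '', title: rawTitle.trim() };
  return { company: rawTitle.slice(0, idx).trim(), title: rawTitle.slice(idx + 2).trim() };
}

const scanDate = new Date('2026-04-18T00:00:00Z');
console.log(classifyStaleness('2026-04-10T00:00:00Z', scanDate)); // fresh
console.log(classifyStaleness('2026-03-01T00:00:00Z', scanDate)); // flag
console.log(classifyStaleness('2025-12-01T00:00:00Z', scanDate)); // skip
console.log(splitWwrTitle('Acme: Senior Platform Engineer: Infra'));
```

Splitting on the first separator only keeps role names that themselves contain a colon intact.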
@@ -106,10 +196,14 @@ The levels are additive — all are executed, results are merged and deduplicate
    - When a fuzzy match is found but the URL is new, log it as `skipped_repost` (not `skipped_dup`) with a note referencing the original entry number.
 8. **For each new offer that passes filters**:
-   a. Add to `pipeline.md` section "Pending": `- [ ] {url} | {company} | {title} | ats={ats}` — the `| ats={type}` suffix is REQUIRED for every entry (values: `greenhouse`, `ashby`, `workable`, `lever`, `workday`, `builtin`, `custom`, `unknown`). When the offer came from the Greenhouse API (Level 2), ALSO append `| gh={gh_slug}/{gh_id}` so downstream verification can hit the JSON endpoint. Example entries:
+   a. Add to `pipeline.md` section "Pending": `- [ ] {url} | {company} | {title} | ats={ats}` — the `| ats={type}` suffix is REQUIRED for every entry (values: `greenhouse`, `ashby`, `workable`, `lever`, `workday`, `smartrecruiters`, `wwr`, `remoteok`, `builtin`, `custom`, `unknown`). When the offer came from the Greenhouse API (Level 2), ALSO append `| gh={gh_slug}/{gh_id}` so downstream verification can hit the JSON endpoint. Example entries:
       - `- [ ] https://job-boards.greenhouse.io/webflow/jobs/7689676 | Webflow | Lead AI Engineer | ats=greenhouse | gh=webflow/7689676`
       - `- [ ] https://jobs.ashbyhq.com/everai/abc-123 | EverAI | Senior AI PM | ats=ashby`
       - `- [ ] https://jobs.lever.co/temporal/xyz | Temporal | Product Manager - AI | ats=lever`
+      - `- [ ] https://nvidia.wd5.myworkdayjobs.com/NVIDIAExternalCareerSite/job/US-CA-Santa-Clara/Senior-AI-Engineer_JR123456 | NVIDIA | Senior AI Engineer | ats=workday`
+      - `- [ ] https://jobs.smartrecruiters.com/Visa1/744000012345678 | Visa | Staff ML Engineer | ats=smartrecruiters`
+      - `- [ ] https://weworkremotely.com/remote-jobs/acme-senior-platform-engineer | Acme | Senior Platform Engineer | ats=wwr`
+      - `- [ ] https://remoteok.com/remote-jobs/12345-senior-ai-engineer-acme | Acme | Senior AI Engineer | ats=remoteok`
    b. Record in `scan-history.tsv`: `{url}\t{date}\t{query_name}\t{title}\t{company}\tadded`
 9. **Offers filtered by title**: record in `scan-history.tsv` with status `skipped_title`
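The entry formats in step 8 can be expressed as two small formatters. This is a sketch under stated assumptions: `formatPipelineEntry` and `formatHistoryRow` are hypothetical names, and the camelCase input fields (`ghSlug`, `ghId`, `queryName`) are chosen for the example rather than taken from the package.

```javascript
// Hypothetical formatter for a "Pending" line in pipeline.md.
function formatPipelineEntry({ url, company, title, ats, ghSlug, ghId }) {
  // The ats suffix is required on every entry; fall back to 'unknown'.
  let line = `- [ ] ${url} | ${company} | ${title} | ats=${ats || 'unknown'}`;
  // Greenhouse API results also carry the gh={slug}/{id} suffix.
  if (ats === 'greenhouse' && ghSlug && ghId) line += ` | gh=${ghSlug}/${ghId}`;
  return line;
}

// Hypothetical formatter for a scan-history.tsv row (tab-separated).
function formatHistoryRow({ url, date, queryName, title, company, status }) {
  return [url, date, queryName, title, company, status].join('\t');
}

const pendingLine = formatPipelineEntry({
  url: 'https://job-boards.greenhouse.io/webflow/jobs/7689676',
  company: 'Webflow',
  title: 'Lead AI Engineer',
  ats: 'greenhouse',
  ghSlug: 'webflow',
  ghId: 7689676,
});
console.log(pendingLine);
```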
@@ -158,10 +252,10 @@ Scan mode MUST write its ranked candidate list to a file, not just return it in
 | 2    | EverAI  | ashby      | Senior AI PM     | -       | -       | https://jobs.ashbyhq.com/everai/abc-123 | 2026-04-15 |
 | ... | ... | ... | ... | ... | ... | ... | ... |
-**`ats` values** (one of): `greenhouse`, `ashby`, `workable`, `lever`, `workday`, `builtin`, `custom`, `unknown`. Every row MUST populate this column — it's what the apply subagent uses to pick the correct Gmail OTP sender query.
+**`ats` values** (one of): `greenhouse`, `ashby`, `workable`, `lever`, `workday`, `smartrecruiters`, `wwr`, `remoteok`, `builtin`, `custom`, `unknown`. Every row MUST populate this column — it's what the apply subagent uses to pick the correct Gmail OTP sender query. The `wwr` and `remoteok` values identify aggregator postings whose real underlying ATS is only known after the redirect is followed — downstream evaluators re-detect and may rewrite to the underlying ATS.
 Every row MUST have:
-- `ats` — the ATS platform hosting the posting. Inferred from the canonical URL host (e.g. `boards-api.greenhouse.io` / `job-boards.greenhouse.io` → `greenhouse`; `jobs.ashbyhq.com` → `ashby`; `jobs.lever.co` → `lever`; `myworkdayjobs.com` / `.wd5.myworkdayjobs.com` → `workday`; `apply.workable.com` / `jobs.workable.com` → `workable`; `builtin.com/jobs/` → `builtin`; company-own domains → `custom`; anything indeterminate → `unknown`).
+- `ats` — the ATS platform hosting the posting. Inferred from the canonical URL host (e.g. `boards-api.greenhouse.io` / `job-boards.greenhouse.io` → `greenhouse`; `jobs.ashbyhq.com` → `ashby`; `jobs.lever.co` → `lever`; `*.myworkdayjobs.com` (any `wd1`/`wd3`/`wd5` pod) → `workday`; `apply.workable.com` / `jobs.workable.com` → `workable`; `api.smartrecruiters.com` / `jobs.smartrecruiters.com` → `smartrecruiters`; `weworkremotely.com` → `wwr`; `remoteok.com` → `remoteok`; `builtin.com/jobs/` → `builtin`; company-own domains → `custom`; anything indeterminate → `unknown`).
 - `url` in canonical form. For Greenhouse use `https://job-boards.greenhouse.io/{gh_slug}/jobs/{gh_id}` (matching the suffix in `data/pipeline.md`). For other ATSes use the platform's native URL (do not rewrite).
 - `updated_at` in `YYYY-MM-DD` form (the most recent `updated_at` in the API response, or scan date when the source has no such field).
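The host-to-`ats` inference listed above might look like the sketch below. The function name `inferAts` is hypothetical. Two choices here are assumptions rather than documented behavior: any `*.greenhouse.io` host is treated as Greenhouse (to cover the `eu` board host), and any other parseable URL is classified as `custom`, with `unknown` reserved for input that is not a valid URL.

```javascript
// Hypothetical host-to-ATS mapping following the rules listed above.
function inferAts(rawUrl) {
  let parsed;
  try {
    parsed = new URL(rawUrl);
  } catch {
    return 'unknown'; // not a parseable URL
  }
  const host = parsed.hostname.toLowerCase();
  const under = (domain) => host === domain || host.endsWith(`.${domain}`);

  if (under('greenhouse.io')) return 'greenhouse'; // assumption: all subdomains
  if (host === 'jobs.ashbyhq.com') return 'ashby';
  if (host === 'jobs.lever.co') return 'lever';
  if (under('myworkdayjobs.com')) return 'workday'; // any wd1/wd3/wd5 pod
  if (host === 'apply.workable.com' || host === 'jobs.workable.com') return 'workable';
  if (host === 'api.smartrecruiters.com' || host === 'jobs.smartrecruiters.com') return 'smartrecruiters';
  if (under('weworkremotely.com')) return 'wwr';
  if (under('remoteok.com')) return 'remoteok';
  if (under('builtin.com') && parsed.pathname.startsWith('/jobs/')) return 'builtin';
  return 'custom'; // assumption: every other valid URL is a company-own domain
}

console.log(inferAts('https://nvidia.wd5.myworkdayjobs.com/NVIDIAExternalCareerSite/job/x'));
console.log(inferAts('https://openai.com/careers'));
```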
@@ -214,6 +308,8 @@ Each company in `tracked_companies` MUST have a `careers_url` — the direct URL
 - **Ashby:** `https://jobs.ashbyhq.com/{slug}`
 - **Greenhouse:** `https://job-boards.greenhouse.io/{slug}` or `https://job-boards.eu.greenhouse.io/{slug}`
 - **Lever:** `https://jobs.lever.co/{slug}`
+- **Workday:** `https://{subdomain}.{pod}.myworkdayjobs.com/{site}` (pod = `wd1`/`wd3`/`wd5`/..., varies by tenant data center; site is tenant-defined, e.g. `External`, `NVIDIAExternalCareerSite`)
+- **SmartRecruiters:** `https://careers.smartrecruiters.com/{company}` (human-facing) / `https://api.smartrecruiters.com/v1/companies/{company}/postings` (API)
 - **Custom:** The company's own URL (e.g., `https://openai.com/careers`)
 **If `careers_url` doesn't exist** for a company:

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "job-forge",
-  "version": "2.0.3",
+  "version": "2.2.0",
   "description": "AI-powered job search pipeline built on opencode",
   "type": "module",
   "bin": {

package/scripts/next-num.mjs CHANGED

@@ -2,23 +2,36 @@
 /**
  * next-num — print the next sequential report number (3-digit zero-padded).
  *
- * Reads reports/ and returns max(existing) + 1. Used by agents instead of
- * having the model figure this out by listing + parsing filenames.
+ * Scans three sources to find the max and returns max + 1:
+ *   1. reports/*.md                       — filename prefix `{num}-`
+ *   2. data/applications/*.md             — `#` column of each table row
+ *   3. batch/tracker-additions/*.tsv      — first tab-separated column (pending)
+ *      batch/tracker-additions/merged/    — same, already consumed
+ *
+ * Why all three? Same-day batches can advance the counter without writing a
+ * report (e.g., SKIP entries skip PDF + report). Deriving from reports/ alone
+ * causes ID collisions when a later subagent picks a number already used in
+ * a tracker row or TSV. Scanning all three sources is O(N) on a small
+ * directory and eliminates the collision class.
  *
  * Usage:
  *   job-forge next-num              # prints e.g. "521"
- *   job-forge next-num --padded     # prints e.g. "521" (default, already padded)
  *   job-forge next-num --raw        # prints e.g. "521" without padding
  */
-import { readdirSync, existsSync } from 'fs';
+import { readdirSync, readFileSync, existsSync, statSync } from 'fs';
 import { join } from 'path';
 const PROJECT_DIR = process.env.JOB_FORGE_PROJECT || process.cwd();
 const REPORTS_DIR = join(PROJECT_DIR, 'reports');
+const APPS_DIR = join(PROJECT_DIR, 'data', 'applications');
+const TSV_DIR = join(PROJECT_DIR, 'batch', 'tracker-additions');
+const TSV_MERGED_DIR = join(TSV_DIR, 'merged');
 const RAW = process.argv.includes('--raw');
 let max = 0;
+// 1. reports/*.md
 if (existsSync(REPORTS_DIR)) {
   for (const f of readdirSync(REPORTS_DIR)) {
     if (!f.endsWith('.md')) continue;
@@ -29,5 +42,47 @@ if (existsSync(REPORTS_DIR)) {
   }
 }
+// 2. data/applications/*.md — first `|` column of each table row
+if (existsSync(APPS_DIR)) {
+  for (const f of readdirSync(APPS_DIR)) {
+    if (!f.endsWith('.md')) continue;
+    const full = join(APPS_DIR, f);
+    if (!statSync(full).isFile()) continue;
+    const content = readFileSync(full, 'utf-8');
+    for (const line of content.split('\n')) {
+      // Match: "| 756 | 2026-04-18 | ..." — integer in first cell
+      const m = line.match(/^\|\s*(\d+)\s*\|/);
+      if (!m) continue;
+      const n = parseInt(m[1], 10);
+      if (n > max) max = n;
+    }
+  }
+}
+// 3. batch/tracker-additions/*.tsv (pending) + merged/*.tsv
+for (const dir of [TSV_DIR, TSV_MERGED_DIR]) {
+  if (!existsSync(dir)) continue;
+  for (const f of readdirSync(dir)) {
+    if (!f.endsWith('.tsv')) continue;
+    const full = join(dir, f);
+    if (!statSync(full).isFile()) continue;
+    // Prefer the filename prefix (always present and canonical) over TSV
+    // contents — avoids reading the file for the common case.
+    const mName = f.match(/^(\d+)-/);
+    if (mName) {
+      const n = parseInt(mName[1], 10);
+      if (n > max) max = n;
+      continue;
+    }
+    // Fallback: parse first column of first non-empty line
+    const content = readFileSync(full, 'utf-8');
+    const firstLine = content.split('\n').find(l => l.trim().length > 0);
+    if (!firstLine) continue;
+    const cell = firstLine.split('\t')[0];
+    const n = parseInt(cell, 10);
+    if (!Number.isNaN(n) && n > max) max = n;
+  }
+}
 const next = max + 1;
 console.log(RAW ? String(next) : String(next).padStart(3, '0'));
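To see why scanning all sources matters, here is an in-memory sketch of the same max-plus-one logic. It is not the script itself: `nextNumber` is a hypothetical function that takes plain arrays instead of reading directories, and the `^(\d+)-` pattern for report filenames is an assumption based on the `{num}-` prefix described in the header comment.

```javascript
// Hypothetical in-memory version of the scan: inputs are plain arrays.
function nextNumber({ reportFiles = [], trackerLines = [], tsvFiles = [] } = {}) {
  let max = 0;
  const bump = (n) => {
    if (Number.isInteger(n) && n > max) max = n;
  };
  // Filenames such as "755-acme.md" or "757-acme.tsv": numeric prefix before "-".
  for (const name of [...reportFiles, ...tsvFiles]) {
    const m = name.match(/^(\d+)-/);
    if (m) bump(parseInt(m[1], 10));
  }
  // Tracker rows such as "| 756 | 2026-04-18 | ...": integer in the first cell.
  for (const line of trackerLines) {
    const m = line.match(/^\|\s*(\d+)\s*\|/);
    if (m) bump(parseInt(m[1], 10));
  }
  return String(max + 1).padStart(3, '0');
}

const reportsOnly = nextNumber({ reportFiles: ['755-acme-2026-04-18.md'] });
const allSources = nextNumber({
  reportFiles: ['755-acme-2026-04-18.md'],
  trackerLines: ['| # | Date | Company |', '| 756 | 2026-04-18 | Acme |'],
  tsvFiles: ['757-acme.tsv'],
});
console.log(reportsOnly, allSources); // 756 758
```

With reports alone the next number would be `756`, which collides with the existing tracker row; including the tracker and TSV sources yields `758`.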

package/templates/portals.example.yml CHANGED

@@ -406,6 +406,36 @@ search_queries:
 # Companies whose career pages are checked directly.
 # scan_method: geometra (default), websearch, greenhouse_api
 # For Greenhouse companies, add api: field for faster structured JSON access.
+#
+# Per-ATS api: field shapes (see modes/scan.md for full endpoint docs):
+#
+#   Greenhouse:     api: https://boards-api.greenhouse.io/v1/boards/{slug}/jobs
+#   Ashby:          api: https://api.ashbyhq.com/posting-api/job-board/{slug}?includeCompensation=true
+#   Lever:          api: https://api.lever.co/v0/postings/{slug}?mode=json
+#   SmartRecruiters: api: https://api.smartrecruiters.com/v1/companies/{company}/postings
+#   Workday:        api_type: workday  (requires workday_subdomain, workday_pod, workday_tenant, workday_site)
+#
+# Workday schema (finicky — POST with JSON body, tenant + site vary per company):
+#   - name: NVIDIA
+#     careers_url: https://nvidia.wd5.myworkdayjobs.com/NVIDIAExternalCareerSite
+#     api_type: workday
+#     workday_subdomain: nvidia                  # hostname prefix
+#     workday_pod: wd5                           # data-center pod (wd1/wd3/wd5/...)
+#     workday_tenant: nvidia                     # usually same as subdomain
+#     workday_site: NVIDIAExternalCareerSite     # public site name — read from careers_url path
+#     tags: ["chips", "ai-lab", "us"]
+#     enabled: true
+#
+#   Built API URL: https://{subdomain}.{pod}.myworkdayjobs.com/wday/cxs/{tenant}/{site}/jobs
+#   POST body:     {"appliedFacets": {}, "limit": 20, "offset": 0, "searchText": ""}
+#
+# SmartRecruiters schema (simple GET):
+#   - name: Visa
+#     careers_url: https://careers.smartrecruiters.com/Visa1
+#     api: https://api.smartrecruiters.com/v1/companies/Visa1/postings
+#     ats: smartrecruiters
+#     tags: ["fintech", "enterprise", "us"]
+#     enabled: true
 tracked_companies:
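The Workday schema above can be turned into a request description with a small builder. This is a sketch only: `buildWorkdayRequest` and `workdayPostingUrl` are hypothetical names, and the `Content-Type: application/json` header is an assumption (the template only states that the endpoint takes a POST with a JSON body). The URL patterns and body fields are the ones shown in the comments above and in `modes/scan.md`.

```javascript
// Hypothetical builder for the Workday jobs request from a tracked_companies entry.
function buildWorkdayRequest(company, offset = 0) {
  const {
    workday_subdomain: subdomain,
    workday_pod: pod,
    workday_tenant: tenant,
    workday_site: site,
  } = company;
  if (!subdomain || !pod || !tenant || !site) {
    throw new Error(`Workday entry "${company.name}" is missing a workday_* field`);
  }
  return {
    url: `https://${subdomain}.${pod}.myworkdayjobs.com/wday/cxs/${tenant}/${site}/jobs`,
    init: {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' }, // assumed header
      body: JSON.stringify({ appliedFacets: {}, limit: 20, offset, searchText: '' }),
    },
  };
}

// Public posting URL built from an externalPath returned by the API.
function workdayPostingUrl(company, externalPath) {
  const { workday_subdomain: subdomain, workday_pod: pod, workday_site: site } = company;
  return `https://${subdomain}.${pod}.myworkdayjobs.com/${site}${externalPath}`;
}

const nvidiaEntry = {
  name: 'NVIDIA',
  workday_subdomain: 'nvidia',
  workday_pod: 'wd5',
  workday_tenant: 'nvidia',
  workday_site: 'NVIDIAExternalCareerSite',
};
const workdayRequest = buildWorkdayRequest(nvidiaEntry);
console.log(workdayRequest.url);
```

The returned `{ url, init }` pair is shaped so it could be passed to `fetch(url, init)`. If that call fails, the documented behavior is to fall back to Level 1 for the company rather than retry with guessed values.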
@@ -3138,3 +3168,80 @@ tracked_companies:
     notes: "TypeScript ORM. Remote."
     tags: ["developer-tools", "open-source", "remote-first"]
     enabled: true
+# -- Cross-company aggregator feeds --
+# Aggregator boards that expose a single feed covering hundreds of companies.
+# Unlike tracked_companies, these are NOT per-company — the scanner pulls the
+# whole feed, applies a pre-filter (category / tags), then runs each posting
+# through the global title_filter above.
+#
+# Types:
+#   - weworkremotely  → RSS 2.0 XML per category
+#   - remoteok        → JSON array, first element is a legal disclaimer (skipped)
+#
+# See modes/scan.md → "Level 2 — ATS / Aggregator APIs" for full shape docs.
+cross_company_feeds:
+  # -- We Work Remotely (RSS per category) --
+  # Feed IDs map to https://weworkremotely.com/categories/{id}.rss
+  - name: WeWorkRemotely — Programming
+    type: weworkremotely
+    feed: remote-programming-jobs
+    url: https://weworkremotely.com/categories/remote-programming-jobs.rss
+    # Optional pre-filter applied BEFORE title_filter. Drops obviously
+    # off-target entries without cluttering scan-history.
+    category_filter:
+      positive: []   # empty = accept all from this feed
+      negative: ["WordPress", "PHP", "Shopify Theme"]
+    enabled: true
+  - name: WeWorkRemotely — DevOps & SysAdmin
+    type: weworkremotely
+    feed: remote-devops-sysadmin-jobs
+    url: https://weworkremotely.com/categories/remote-devops-sysadmin-jobs.rss
+    enabled: true
+  - name: WeWorkRemotely — Product
+    type: weworkremotely
+    feed: remote-product-jobs
+    url: https://weworkremotely.com/categories/remote-product-jobs.rss
+    enabled: true
+  - name: WeWorkRemotely — All Other
+    type: weworkremotely
+    feed: all-other-remote-jobs
+    url: https://weworkremotely.com/categories/all-other-remote-jobs.rss
+    # This category is very broad — start disabled, enable if signal is good.
+    enabled: false
+  # -- RemoteOK (single JSON feed, filter by tags) --
+  # One endpoint, 100 newest postings. Filter by tags BEFORE title_filter,
+  # otherwise you burn the title_filter pass on ~80% irrelevant rows.
+  - name: RemoteOK — AI & Engineering
+    type: remoteok
+    url: https://remoteok.com/api
+    # Required — RemoteOK returns 403 without a browser-like UA.
+    user_agent: "Mozilla/5.0 (compatible; JobForgeScanner/1.0; +https://github.com/razroo/JobForge)"
+    # Pre-filter on entry.tags (case-insensitive substring match against the
+    # array). Row passes if ANY positive matches AND NO negative matches.
+    tag_filter:
+      positive:
+        - "engineer"
+        - "engineering"
+        - "ai"
+        - "ml"
+        - "llm"
+        - "python"
+        - "golang"
+        - "typescript"
+        - "devops"
+        - "platform"
+        - "product manager"
+      negative:
+        - "wordpress"
+        - "php"
+        - "marketing"
+        - "sales"
+        - "customer support"
+    enabled: true
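The pre-filter semantics described in the comments (case-insensitive substring match, pass if ANY positive matches AND NO negative matches, empty positive list accepts all) can be sketched as below. The function name `passesFilter` is hypothetical, and what gets passed as `values` for a `category_filter` is left to the caller since the template does not specify it.

```javascript
// Hypothetical matcher for tag_filter / category_filter blocks.
function passesFilter(values, filter = {}) {
  const haystack = values.map((v) => String(v).toLowerCase());
  // True when any needle is a substring of any value (case-insensitive).
  const hits = (needles) =>
    (needles ?? []).some((needle) =>
      haystack.some((v) => v.includes(String(needle).toLowerCase())));
  const positive = filter.positive ?? [];
  const positiveOk = positive.length === 0 || hits(positive); // empty = accept all
  return positiveOk && !hits(filter.negative);
}

const tagFilter = {
  positive: ['engineer', 'ai', 'python'],
  negative: ['wordpress', 'php', 'sales'],
};

console.log(passesFilter(['AI', 'Engineer'], tagFilter)); // true
console.log(passesFilter(['php', 'engineer'], tagFilter)); // false
console.log(passesFilter(['design'], tagFilter)); // false
```

One consequence of substring matching is that short needles are permissive: `"ai"` also matches a tag such as `"email"`, so very short positives may let more rows through than intended.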