job-forge 2.0.2 → 2.1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -14,16 +14,17 @@ The Hard Limits below are non-negotiable numeric rules. If you catch yourself ab
14
14
  3. **Always clean Geometra sessions before dispatching.** Before every round of `task` dispatches that will use Geometra, call `geometra_list_sessions` then `geometra_disconnect({closeBrowser: true})`. Every round. The disconnect is a no-op when the pool is empty.
15
15
  4. **Orchestrator does NOT fill forms.** This session MUST NOT call `geometra_fill_form`, `geometra_run_actions`, `geometra_pick_listbox_option`, or `geometra_fill_otp` when handling a multi-job request. If you need to, it means you MUST have delegated — `task` out the remaining work instead.
16
16
  5. **Re-dispatch only AFTER the previous subagent returns.** Never fire the same company's `task` twice while the first is still in-flight. Wait for the return value, then decide if a retry is warranted.
17
- 6. **Application outcomes flow through TSVs, not `data/pipeline.md`.** When a subagent returns APPLIED / FAILED / SKIP, the outcome goes to `batch/tracker-additions/{num}-{slug}.tsv`. `node merge-tracker.mjs` then consumes the TSVs into the correct `data/applications/YYYY-MM-DD.md` day file. `data/pipeline.md` only tracks URL inbox state (`[ ]` pending → `[x]` processed). **NEVER append APPLIED / FAILED status lines to `pipeline.md`** — that's the day file's job, via the TSV pathway. After any multi-apply run, the orchestrator MUST run `node merge-tracker.mjs` followed by `node verify-pipeline.mjs` before ending the session.
18
- 7. **URLs passed to downstream subagents must come from a file, not from a prior subagent's prose.** When an orchestrator dispatches a subagent with a URL (for evaluation, apply, verification, etc.), the URL MUST originate from:
19
- - `data/pipeline.md`
20
- - `data/scan-history.tsv`
21
- - `batch/scan-output-*.md` or similar structured output file
22
- - A report file (`reports/{num}-*.md`) with an authoritative `**URL:**` header
17
+ 6. **Application outcomes flow through TSVs, not `data/pipeline.md`.** When a subagent returns APPLIED / FAILED / SKIP, the outcome goes to `batch/tracker-additions/{num}-{slug}.tsv`. `npx job-forge merge` then consumes the TSVs into the correct `data/applications/YYYY-MM-DD.md` day file. `data/pipeline.md` only tracks URL inbox state (`[ ]` pending → `[x]` processed). **NEVER append APPLIED / FAILED status lines to `pipeline.md`** — that's the day file's job, via the TSV pathway. After any multi-apply run, the orchestrator MUST run `npx job-forge merge` followed by `npx job-forge verify` before ending the session.
18
+ 7. **Load-bearing facts passed to downstream subagents must come from a file, not from a prior subagent's prose.** A URL, score, email ID, confirmation page snippet, JD salary range, exact answer submitted to a form question, or any other specific value that a downstream subagent will act on MUST originate from one of:
19
+ - `data/pipeline.md` (URL inbox state)
20
+ - `data/scan-history.tsv` (scan provenance)
21
+ - `batch/scan-output-*.md` (scan-ranked candidates)
22
+ - A report file (`reports/{num}-*.md`) with authoritative headers (`**URL:**`, `**Score:**`, etc.)
23
+ - A TSV in `batch/tracker-additions/` (per-apply outcomes)
23
24
 
24
- URLs mentioned in a subagent's return message are NOT trustworthy by default they may be hallucinated or reconstructed. Before passing any URL from a subagent report to another subagent, cross-check it exists in one of the authoritative files above, OR instruct the dispatching subagent to write its output to a structured file and re-read from that file.
25
+ **Not trustworthy by default**: anything quoted from a subagent's return message, any ID or score the orchestrator "remembers" from prose, any page-content snippet reproduced from a subagent's narrative. Subagents can hallucinate plausible-looking IDs, scores, and confirmation text. Before passing any such fact to another subagent, cross-check it exists in one of the authoritative files above, OR instruct the dispatching subagent to write its output to a structured file and re-read from that file.
25
26
 
26
- **Why**: a scan subagent once reported 30 plausible-looking Greenhouse IDs in its return message that did not exist in the Greenhouse API. The orchestrator dispatched 30 downstream subagents that all failed verification. Trusting prose-form URLs cost ~2 hours of wasted work and corrupted the tracker.
27
+ **Why**: on 2026-04-18, a scan subagent returned 30 fabricated Greenhouse IDs in prose (correct role titles, plausible-looking invented IDs that didn't exist in the API). The orchestrator dispatched 30 downstream subagents that all hit 404s. Verification rules downstream (Hard Limit #6, API-first verify) caught the symptom. This rule prevents the *shape* of the bug — hallucinations propagating through prose handoffs — across all quantitative / identifier / specific-fact claims, not just URLs.
27
28
 
28
29
  Everything below is context and rationale. These seven numbers are the rules.
29
30
 
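Hard Limit #7 can be checked mechanically before any dispatch. The sketch below is a minimal illustration under these assumptions:

- The orchestrator has already read the authoritative files (`data/pipeline.md`, `data/scan-history.tsv`, etc.) into strings.
- `findProvenance` and `partitionUrls` are hypothetical helper names, not job-forge APIs.
- Verbatim substring matching is a deliberately simple stand-in for whatever check the real workflow performs.

```javascript
// Hypothetical helpers, not part of job-forge.

// Returns the name of the first authoritative source whose text contains
// `value` verbatim, or null when the value exists only in subagent prose.
function findProvenance(value, sources) {
  const needle = String(value).trim();
  if (needle === "") return null;
  for (const { name, text } of sources) {
    if (text.includes(needle)) return name;
  }
  return null;
}

// Extracts URLs from a subagent's return message and splits them into
// grounded (safe to dispatch) and ungrounded (must be re-derived from a file).
function partitionUrls(prose, sources) {
  const found = (prose.match(/https?:\/\/[^\s)>\]"'`]+/g) || [])
    .map((url) => url.replace(/[.,;:!?]+$/, "")); // drop trailing sentence punctuation
  const grounded = [];
  const ungrounded = [];
  for (const url of new Set(found)) {
    (findProvenance(url, sources) ? grounded : ungrounded).push(url);
  }
  return { grounded, ungrounded };
}

// Illustrative file contents; the real pipeline.md line format may differ.
const sources = [
  { name: "data/pipeline.md", text: "- [ ] https://boards.greenhouse.io/acme/jobs/111\n" },
  { name: "data/scan-history.tsv", text: "" },
];
const prose =
  "Found https://boards.greenhouse.io/acme/jobs/111 and https://boards.greenhouse.io/acme/jobs/999.";
const { grounded, ungrounded } = partitionUrls(prose, sources);
console.log("dispatch:", grounded);
console.log("re-derive from a file first:", ungrounded);
```

The same `findProvenance` call works for scores, email IDs, or salary ranges, not only URLs. Substring matching is weak, though. A truncated ID such as `.../jobs/11` would still match `.../jobs/111`. A stricter version would parse the file's URL or ID column and compare whole values.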
@@ -45,6 +46,8 @@ Whenever the user says any variation of "apply to N jobs", "process the pipeline
45
46
 
46
47
  **Exception:** evaluation-only or tracker-only work (no Geometra, no repeated tool calls) can proceed in a single session. The rule targets tool-heavy multi-step loops.
47
48
 
49
+ **Before any batch-apply dispatch, run the Apply Preflight location filter from `modes/apply.md`** to exclude location-incompatible candidates. Catches the common case where an evaluated role has the right role-shape but a deal-breaking location that profile.yml already rules out.
50
+
48
51
  ---
49
52
 
50
53
  ## Subagent Routing — which agent for which task
@@ -297,6 +300,8 @@ When a form says "enter the code we sent to your email", you MUST retrieve the c
297
300
  | Workday | `from:myworkday newer_than:10m` |
298
301
  | Lever | `from:lever newer_than:10m` |
299
302
  | Ashby | `from:ashby newer_than:10m` |
303
+ | SmartRecruiters | `from:smartrecruiters newer_than:10m` |
304
+ | Aggregator redirect (WeWorkRemotely / RemoteOK) | Detect the underlying ATS from the post-redirect URL, then use that row's sender query |
300
305
  | Unknown | `newer_than:10m subject:(verify OR code OR confirm)` |
301
306
 
302
307
  **Rules:**
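The sender-query table, including the new aggregator-redirect row, amounts to a lookup keyed on the ATS behind the final URL. A minimal sketch follows, with these caveats:

- Only the rows visible in this hunk are included.
- The hostname patterns (`myworkdayjobs.com`, `lever.co`, `ashbyhq.com`, `smartrecruiters.com`) are assumptions to verify against real redirect targets.
- `otpQueryFor` is a hypothetical name.

```javascript
// Sender queries copied from the table above (only the rows shown here).
const OTP_QUERIES = {
  workday: "from:myworkday newer_than:10m",
  lever: "from:lever newer_than:10m",
  ashby: "from:ashby newer_than:10m",
  smartrecruiters: "from:smartrecruiters newer_than:10m",
};
const UNKNOWN_QUERY = "newer_than:10m subject:(verify OR code OR confirm)";

// Assumed hostname patterns per ATS; not taken from job-forge itself.
const HOST_PATTERNS = [
  ["workday", /(^|\.)myworkdayjobs\.com$/],
  ["lever", /(^|\.)lever\.co$/],
  ["ashby", /(^|\.)ashbyhq\.com$/],
  ["smartrecruiters", /(^|\.)smartrecruiters\.com$/],
];

// Expects the URL the browser landed on AFTER any aggregator redirect,
// so the underlying ATS (not WeWorkRemotely / RemoteOK) is detected.
function otpQueryFor(finalUrl) {
  let host;
  try {
    host = new URL(finalUrl).hostname.toLowerCase();
  } catch {
    return UNKNOWN_QUERY; // unparseable URL: use the table's fallback
  }
  for (const [ats, pattern] of HOST_PATTERNS) {
    if (pattern.test(host)) return OTP_QUERIES[ats];
  }
  return UNKNOWN_QUERY;
}
```

Passing the pre-redirect aggregator URL deliberately falls through to the Unknown query, which is the table's own fallback.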
@@ -471,7 +476,7 @@ To check or modify MCP settings, edit `opencode.json` in the project root.
471
476
  - JDs in `jds/` (referenced as `local:jds/{file}` in pipeline.md)
472
477
  - Batch in `batch/` (gitignored except scripts and prompt)
473
478
  - Report numbering: sequential 3-digit zero-padded, max existing + 1
474
- - **RULE: After each batch of evaluations, run `node merge-tracker.mjs`** to merge tracker additions and avoid duplications.
479
+ - **RULE: After each batch of evaluations, run `npx job-forge merge`** to merge tracker additions and avoid duplications.
475
480
  - **RULE: NEVER create new entries in applications.md if company+role already exists.** Update the existing entry.
476
481
  - **RULE: NEVER attribute commits to opencode (no `Co-Authored-By: opencode` or similar).** All commits must be attributed solely to the person making the commit (e.g., CharlieGreenman).
477
482
 
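Two conventions can be combined into a single filename sketch:

- The numbering rule: max existing + 1, 3-digit zero-padded.
- The `{num}-{slug}.tsv` naming used by Hard Limit #6.

The slug rule follows the batch README's wording (lowercase, no spaces, hyphens). Stripping other punctuation is an added assumption, and all three function names are hypothetical.

```javascript
// Hypothetical helpers illustrating the conventions; job-forge's own
// scripts may implement them differently.

// Next number = highest numeric prefix among existing files + 1, zero-padded.
function nextReportNumber(filenames) {
  let max = 0;
  for (const name of filenames) {
    const m = /^(\d+)-/.exec(name);
    if (m) max = Math.max(max, Number(m[1]));
  }
  return String(max + 1).padStart(3, "0");
}

// Lowercase, whitespace/hyphen runs collapsed to one hyphen.
// Removing other punctuation is an assumption, not a documented rule.
function companySlug(company) {
  return company
    .toLowerCase()
    .replace(/[^a-z0-9\s-]/g, "")
    .trim()
    .replace(/[\s-]+/g, "-");
}

function trackerAdditionPath(num, company) {
  return `batch/tracker-additions/${num}-${companySlug(company)}.tsv`;
}
```

Parsing `\d+` rather than exactly three digits keeps the count correct if numbering ever passes 999, where `padStart` stops adding zeros.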
@@ -498,13 +503,13 @@ Write one TSV file per evaluation to `batch/tracker-additions/{num}-{company-slu
498
503
 
499
504
  ### Pipeline Integrity
500
505
 
501
- 1. **NEVER edit day files in `data/applications/` to ADD new entries** -- Write TSV in `batch/tracker-additions/` and `merge-tracker.mjs` handles the merge.
506
+ 1. **NEVER edit day files in `data/applications/` to ADD new entries** -- Write TSV in `batch/tracker-additions/` and `npx job-forge merge` handles the merge.
502
507
  2. **YES you can edit day files in `data/applications/` to UPDATE status/notes of existing entries.**
503
508
  3. All reports MUST include `**URL:**` in the header (between Score and PDF).
504
509
  4. All statuses MUST be canonical (see `templates/states.yml`).
505
- 5. Health check: `node verify-pipeline.mjs`
506
- 6. Normalize statuses: `node normalize-statuses.mjs`
507
- 7. Dedup: `node dedup-tracker.mjs`
510
+ 5. Health check: `npx job-forge verify`
511
+ 6. Normalize statuses: `npx job-forge normalize`
512
+ 7. Dedup: `npx job-forge dedup`
508
513
 
509
514
  ### Canonical States (applications day files)
510
515
 
@@ -520,6 +525,7 @@ Write one TSV file per evaluation to `batch/tracker-additions/{num}-{company-slu
520
525
  | `Offer` | Offer received |
521
526
  | `Rejected` | Rejected by company |
522
527
  | `Discarded` | Discarded by candidate or offer closed |
528
+ | `Failed` | Submission attempted but blocked by portal (spam-filter, anti-bot, broken form). May be recoverable via manual retry. |
523
529
  | `SKIP` | Doesn't fit, don't apply |
524
530
 
525
531
  **RULES:**
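Subagents report outcomes as `APPLIED / FAILED / SKIP` (Hard Limit #6). The day files, however, use the canonical spellings in this table, such as the new `Failed` state. A hypothetical canonicalizer is sketched below. Its state list is only the excerpt visible here; real code should load `templates/states.yml` instead.

```javascript
// Only the rows visible in this excerpt; load the full list from
// templates/states.yml in real use.
const EXCERPT_STATES = ["Offer", "Rejected", "Discarded", "Failed", "SKIP"];

// Maps a raw status (e.g. a subagent's "FAILED") to its canonical spelling,
// or returns null so the caller must fix the row instead of guessing.
function canonicalStatus(raw, states = EXCERPT_STATES) {
  const key = String(raw).trim().toLowerCase();
  const match = states.find((s) => s.toLowerCase() === key);
  return match === undefined ? null : match;
}

// Per the table, `Failed` (portal blocked the submission) is the state
// described as possibly recoverable via a manual retry.
function mayRetryManually(status) {
  return canonicalStatus(status) === "Failed";
}
```

Returning null instead of a best guess makes an unrecognized status surface as an error rather than landing silently in the tracker. `npx job-forge normalize` remains the supported tool for existing rows.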
@@ -163,8 +163,8 @@ Step 5 — Between rounds: clean sessions again
163
163
  - geometra_disconnect({ closeBrowser: true })
164
164
 
165
165
  Step 6 — After all rounds: reconcile outcomes (Hard Limit #6)
166
- - bash: node merge-tracker.mjs # consumes batch/tracker-additions/*.tsv into the day file
167
- - bash: node verify-pipeline.mjs # validates URL/status consistency
166
+ - bash: npx job-forge merge # consumes batch/tracker-additions/*.tsv into the day file
167
+ - bash: npx job-forge verify # validates URL/status consistency
168
168
  - Review output; if verify-pipeline reports issues, fix them before ending.
169
169
 
170
170
  Step 7 — Aggregate and report
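Step 6 is a strict sequence: merge first, then verify, and only end the session once verify is clean. A minimal Node sketch follows, with these caveats:

- It assumes `npx job-forge verify` exits with a non-zero status when it finds issues. This diff does not confirm that; if verify only prints warnings, its output would need parsing instead.
- `reconcile` is a hypothetical name.
- The runner is injectable so the ordering can be tested without invoking npx.

```javascript
const { spawnSync } = require("node:child_process");

// Runs merge, then verify, stopping at the first non-zero exit so verify
// never runs against a half-merged tracker.
function reconcile(run = defaultRun) {
  const steps = [
    ["npx", ["job-forge", "merge"]],
    ["npx", ["job-forge", "verify"]],
  ];
  for (const [cmd, args] of steps) {
    const status = run(cmd, args);
    if (status !== 0) {
      return { ok: false, failedAt: `${cmd} ${args.join(" ")}`, status };
    }
  }
  return { ok: true };
}

// Streams the command's output to the terminal and returns its exit code.
function defaultRun(cmd, args) {
  const result = spawnSync(cmd, args, { stdio: "inherit" });
  return result.status === null ? 1 : result.status; // signal or spawn error counts as failure
}
```

On Windows, spawning `npx` directly may require `shell: true` or `npx.cmd`.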
package/AGENTS.md CHANGED
@@ -9,16 +9,17 @@ The Hard Limits below are non-negotiable numeric rules. If you catch yourself ab
9
9
  3. **Always clean Geometra sessions before dispatching.** Before every round of `task` dispatches that will use Geometra, call `geometra_list_sessions` then `geometra_disconnect({closeBrowser: true})`. Every round. The disconnect is a no-op when the pool is empty.
10
10
  4. **Orchestrator does NOT fill forms.** This session MUST NOT call `geometra_fill_form`, `geometra_run_actions`, `geometra_pick_listbox_option`, or `geometra_fill_otp` when handling a multi-job request. If you need to, it means you MUST have delegated — `task` out the remaining work instead.
11
11
  5. **Re-dispatch only AFTER the previous subagent returns.** Never fire the same company's `task` twice while the first is still in-flight. Wait for the return value, then decide if a retry is warranted.
12
- 6. **Application outcomes flow through TSVs, not `data/pipeline.md`.** When a subagent returns APPLIED / FAILED / SKIP, the outcome goes to `batch/tracker-additions/{num}-{slug}.tsv`. `node merge-tracker.mjs` then consumes the TSVs into the correct `data/applications/YYYY-MM-DD.md` day file. `data/pipeline.md` only tracks URL inbox state (`[ ]` pending → `[x]` processed). **NEVER append APPLIED / FAILED status lines to `pipeline.md`** — that's the day file's job, via the TSV pathway. After any multi-apply run, the orchestrator MUST run `node merge-tracker.mjs` followed by `node verify-pipeline.mjs` before ending the session.
13
- 7. **URLs passed to downstream subagents must come from a file, not from a prior subagent's prose.** When an orchestrator dispatches a subagent with a URL (for evaluation, apply, verification, etc.), the URL MUST originate from:
14
- - `data/pipeline.md`
15
- - `data/scan-history.tsv`
16
- - `batch/scan-output-*.md` or similar structured output file
17
- - A report file (`reports/{num}-*.md`) with an authoritative `**URL:**` header
12
+ 6. **Application outcomes flow through TSVs, not `data/pipeline.md`.** When a subagent returns APPLIED / FAILED / SKIP, the outcome goes to `batch/tracker-additions/{num}-{slug}.tsv`. `npx job-forge merge` then consumes the TSVs into the correct `data/applications/YYYY-MM-DD.md` day file. `data/pipeline.md` only tracks URL inbox state (`[ ]` pending → `[x]` processed). **NEVER append APPLIED / FAILED status lines to `pipeline.md`** — that's the day file's job, via the TSV pathway. After any multi-apply run, the orchestrator MUST run `npx job-forge merge` followed by `npx job-forge verify` before ending the session.
13
+ 7. **Load-bearing facts passed to downstream subagents must come from a file, not from a prior subagent's prose.** A URL, score, email ID, confirmation page snippet, JD salary range, exact answer submitted to a form question, or any other specific value that a downstream subagent will act on MUST originate from one of:
14
+ - `data/pipeline.md` (URL inbox state)
15
+ - `data/scan-history.tsv` (scan provenance)
16
+ - `batch/scan-output-*.md` (scan-ranked candidates)
17
+ - A report file (`reports/{num}-*.md`) with authoritative headers (`**URL:**`, `**Score:**`, etc.)
18
+ - A TSV in `batch/tracker-additions/` (per-apply outcomes)
18
19
 
19
- URLs mentioned in a subagent's return message are NOT trustworthy by default they may be hallucinated or reconstructed. Before passing any URL from a subagent report to another subagent, cross-check it exists in one of the authoritative files above, OR instruct the dispatching subagent to write its output to a structured file and re-read from that file.
20
+ **Not trustworthy by default**: anything quoted from a subagent's return message, any ID or score the orchestrator "remembers" from prose, any page-content snippet reproduced from a subagent's narrative. Subagents can hallucinate plausible-looking IDs, scores, and confirmation text. Before passing any such fact to another subagent, cross-check it exists in one of the authoritative files above, OR instruct the dispatching subagent to write its output to a structured file and re-read from that file.
20
21
 
21
- **Why**: a scan subagent once reported 30 plausible-looking Greenhouse IDs in its return message that did not exist in the Greenhouse API. The orchestrator dispatched 30 downstream subagents that all failed verification. Trusting prose-form URLs cost ~2 hours of wasted work and corrupted the tracker.
22
+ **Why**: on 2026-04-18, a scan subagent returned 30 fabricated Greenhouse IDs in prose (correct role titles, plausible-looking invented IDs that didn't exist in the API). The orchestrator dispatched 30 downstream subagents that all hit 404s. Verification rules downstream (Hard Limit #6, API-first verify) caught the symptom. This rule prevents the *shape* of the bug — hallucinations propagating through prose handoffs — across all quantitative / identifier / specific-fact claims, not just URLs.
22
23
 
23
24
  Everything below is context and rationale. These seven numbers are the rules.
24
25
 
@@ -40,6 +41,8 @@ Whenever the user says any variation of "apply to N jobs", "process the pipeline
40
41
 
41
42
  **Exception:** evaluation-only or tracker-only work (no Geometra, no repeated tool calls) can proceed in a single session. The rule targets tool-heavy multi-step loops.
42
43
 
44
+ **Before any batch-apply dispatch, run the Apply Preflight location filter from `modes/apply.md`** to exclude location-incompatible candidates. Catches the common case where an evaluated role has the right role-shape but a deal-breaking location that profile.yml already rules out.
45
+
43
46
  ---
44
47
 
45
48
  ## Subagent Routing — which agent for which task
@@ -292,6 +295,8 @@ When a form says "enter the code we sent to your email", you MUST retrieve the c
292
295
  | Workday | `from:myworkday newer_than:10m` |
293
296
  | Lever | `from:lever newer_than:10m` |
294
297
  | Ashby | `from:ashby newer_than:10m` |
298
+ | SmartRecruiters | `from:smartrecruiters newer_than:10m` |
299
+ | Aggregator redirect (WeWorkRemotely / RemoteOK) | Detect the underlying ATS from the post-redirect URL, then use that row's sender query |
295
300
  | Unknown | `newer_than:10m subject:(verify OR code OR confirm)` |
296
301
 
297
302
  **Rules:**
@@ -466,7 +471,7 @@ To check or modify MCP settings, edit `opencode.json` in the project root.
466
471
  - JDs in `jds/` (referenced as `local:jds/{file}` in pipeline.md)
467
472
  - Batch in `batch/` (gitignored except scripts and prompt)
468
473
  - Report numbering: sequential 3-digit zero-padded, max existing + 1
469
- - **RULE: After each batch of evaluations, run `node merge-tracker.mjs`** to merge tracker additions and avoid duplications.
474
+ - **RULE: After each batch of evaluations, run `npx job-forge merge`** to merge tracker additions and avoid duplications.
470
475
  - **RULE: NEVER create new entries in applications.md if company+role already exists.** Update the existing entry.
471
476
  - **RULE: NEVER attribute commits to opencode (no `Co-Authored-By: opencode` or similar).** All commits must be attributed solely to the person making the commit (e.g., CharlieGreenman).
472
477
 
@@ -493,13 +498,13 @@ Write one TSV file per evaluation to `batch/tracker-additions/{num}-{company-slu
493
498
 
494
499
  ### Pipeline Integrity
495
500
 
496
- 1. **NEVER edit day files in `data/applications/` to ADD new entries** -- Write TSV in `batch/tracker-additions/` and `merge-tracker.mjs` handles the merge.
501
+ 1. **NEVER edit day files in `data/applications/` to ADD new entries** -- Write TSV in `batch/tracker-additions/` and `npx job-forge merge` handles the merge.
497
502
  2. **YES you can edit day files in `data/applications/` to UPDATE status/notes of existing entries.**
498
503
  3. All reports MUST include `**URL:**` in the header (between Score and PDF).
499
504
  4. All statuses MUST be canonical (see `templates/states.yml`).
500
- 5. Health check: `node verify-pipeline.mjs`
501
- 6. Normalize statuses: `node normalize-statuses.mjs`
502
- 7. Dedup: `node dedup-tracker.mjs`
505
+ 5. Health check: `npx job-forge verify`
506
+ 6. Normalize statuses: `npx job-forge normalize`
507
+ 7. Dedup: `npx job-forge dedup`
503
508
 
504
509
  ### Canonical States (applications day files)
505
510
 
@@ -515,6 +520,7 @@ Write one TSV file per evaluation to `batch/tracker-additions/{num}-{company-slu
515
520
  | `Offer` | Offer received |
516
521
  | `Rejected` | Rejected by company |
517
522
  | `Discarded` | Discarded by candidate or offer closed |
523
+ | `Failed` | Submission attempted but blocked by portal (spam-filter, anti-bot, broken form). May be recoverable via manual retry. |
518
524
  | `SKIP` | Doesn't fit, don't apply |
519
525
 
520
526
  **RULES:**
package/CLAUDE.md CHANGED
@@ -9,16 +9,17 @@ The Hard Limits below are non-negotiable numeric rules. If you catch yourself ab
9
9
  3. **Always clean Geometra sessions before dispatching.** Before every round of `task` dispatches that will use Geometra, call `geometra_list_sessions` then `geometra_disconnect({closeBrowser: true})`. Every round. The disconnect is a no-op when the pool is empty.
10
10
  4. **Orchestrator does NOT fill forms.** This session MUST NOT call `geometra_fill_form`, `geometra_run_actions`, `geometra_pick_listbox_option`, or `geometra_fill_otp` when handling a multi-job request. If you need to, it means you MUST have delegated — `task` out the remaining work instead.
11
11
  5. **Re-dispatch only AFTER the previous subagent returns.** Never fire the same company's `task` twice while the first is still in-flight. Wait for the return value, then decide if a retry is warranted.
12
- 6. **Application outcomes flow through TSVs, not `data/pipeline.md`.** When a subagent returns APPLIED / FAILED / SKIP, the outcome goes to `batch/tracker-additions/{num}-{slug}.tsv`. `node merge-tracker.mjs` then consumes the TSVs into the correct `data/applications/YYYY-MM-DD.md` day file. `data/pipeline.md` only tracks URL inbox state (`[ ]` pending → `[x]` processed). **NEVER append APPLIED / FAILED status lines to `pipeline.md`** — that's the day file's job, via the TSV pathway. After any multi-apply run, the orchestrator MUST run `node merge-tracker.mjs` followed by `node verify-pipeline.mjs` before ending the session.
13
- 7. **URLs passed to downstream subagents must come from a file, not from a prior subagent's prose.** When an orchestrator dispatches a subagent with a URL (for evaluation, apply, verification, etc.), the URL MUST originate from:
14
- - `data/pipeline.md`
15
- - `data/scan-history.tsv`
16
- - `batch/scan-output-*.md` or similar structured output file
17
- - A report file (`reports/{num}-*.md`) with an authoritative `**URL:**` header
12
+ 6. **Application outcomes flow through TSVs, not `data/pipeline.md`.** When a subagent returns APPLIED / FAILED / SKIP, the outcome goes to `batch/tracker-additions/{num}-{slug}.tsv`. `npx job-forge merge` then consumes the TSVs into the correct `data/applications/YYYY-MM-DD.md` day file. `data/pipeline.md` only tracks URL inbox state (`[ ]` pending → `[x]` processed). **NEVER append APPLIED / FAILED status lines to `pipeline.md`** — that's the day file's job, via the TSV pathway. After any multi-apply run, the orchestrator MUST run `npx job-forge merge` followed by `npx job-forge verify` before ending the session.
13
+ 7. **Load-bearing facts passed to downstream subagents must come from a file, not from a prior subagent's prose.** A URL, score, email ID, confirmation page snippet, JD salary range, exact answer submitted to a form question, or any other specific value that a downstream subagent will act on MUST originate from one of:
14
+ - `data/pipeline.md` (URL inbox state)
15
+ - `data/scan-history.tsv` (scan provenance)
16
+ - `batch/scan-output-*.md` (scan-ranked candidates)
17
+ - A report file (`reports/{num}-*.md`) with authoritative headers (`**URL:**`, `**Score:**`, etc.)
18
+ - A TSV in `batch/tracker-additions/` (per-apply outcomes)
18
19
 
19
- URLs mentioned in a subagent's return message are NOT trustworthy by default they may be hallucinated or reconstructed. Before passing any URL from a subagent report to another subagent, cross-check it exists in one of the authoritative files above, OR instruct the dispatching subagent to write its output to a structured file and re-read from that file.
20
+ **Not trustworthy by default**: anything quoted from a subagent's return message, any ID or score the orchestrator "remembers" from prose, any page-content snippet reproduced from a subagent's narrative. Subagents can hallucinate plausible-looking IDs, scores, and confirmation text. Before passing any such fact to another subagent, cross-check it exists in one of the authoritative files above, OR instruct the dispatching subagent to write its output to a structured file and re-read from that file.
20
21
 
21
- **Why**: a scan subagent once reported 30 plausible-looking Greenhouse IDs in its return message that did not exist in the Greenhouse API. The orchestrator dispatched 30 downstream subagents that all failed verification. Trusting prose-form URLs cost ~2 hours of wasted work and corrupted the tracker.
22
+ **Why**: on 2026-04-18, a scan subagent returned 30 fabricated Greenhouse IDs in prose (correct role titles, plausible-looking invented IDs that didn't exist in the API). The orchestrator dispatched 30 downstream subagents that all hit 404s. Verification rules downstream (Hard Limit #6, API-first verify) caught the symptom. This rule prevents the *shape* of the bug — hallucinations propagating through prose handoffs — across all quantitative / identifier / specific-fact claims, not just URLs.
22
23
 
23
24
  Everything below is context and rationale. These seven numbers are the rules.
24
25
 
@@ -40,6 +41,8 @@ Whenever the user says any variation of "apply to N jobs", "process the pipeline
40
41
 
41
42
  **Exception:** evaluation-only or tracker-only work (no Geometra, no repeated tool calls) can proceed in a single session. The rule targets tool-heavy multi-step loops.
42
43
 
44
+ **Before any batch-apply dispatch, run the Apply Preflight location filter from `modes/apply.md`** to exclude location-incompatible candidates. Catches the common case where an evaluated role has the right role-shape but a deal-breaking location that profile.yml already rules out.
45
+
43
46
  ---
44
47
 
45
48
  ## Subagent Routing — which agent for which task
@@ -292,6 +295,8 @@ When a form says "enter the code we sent to your email", you MUST retrieve the c
292
295
  | Workday | `from:myworkday newer_than:10m` |
293
296
  | Lever | `from:lever newer_than:10m` |
294
297
  | Ashby | `from:ashby newer_than:10m` |
298
+ | SmartRecruiters | `from:smartrecruiters newer_than:10m` |
299
+ | Aggregator redirect (WeWorkRemotely / RemoteOK) | Detect the underlying ATS from the post-redirect URL, then use that row's sender query |
295
300
  | Unknown | `newer_than:10m subject:(verify OR code OR confirm)` |
296
301
 
297
302
  **Rules:**
@@ -466,7 +471,7 @@ To check or modify MCP settings, edit `opencode.json` in the project root.
466
471
  - JDs in `jds/` (referenced as `local:jds/{file}` in pipeline.md)
467
472
  - Batch in `batch/` (gitignored except scripts and prompt)
468
473
  - Report numbering: sequential 3-digit zero-padded, max existing + 1
469
- - **RULE: After each batch of evaluations, run `node merge-tracker.mjs`** to merge tracker additions and avoid duplications.
474
+ - **RULE: After each batch of evaluations, run `npx job-forge merge`** to merge tracker additions and avoid duplications.
470
475
  - **RULE: NEVER create new entries in applications.md if company+role already exists.** Update the existing entry.
471
476
  - **RULE: NEVER attribute commits to opencode (no `Co-Authored-By: opencode` or similar).** All commits must be attributed solely to the person making the commit (e.g., CharlieGreenman).
472
477
 
@@ -493,13 +498,13 @@ Write one TSV file per evaluation to `batch/tracker-additions/{num}-{company-slu
493
498
 
494
499
  ### Pipeline Integrity
495
500
 
496
- 1. **NEVER edit day files in `data/applications/` to ADD new entries** -- Write TSV in `batch/tracker-additions/` and `merge-tracker.mjs` handles the merge.
501
+ 1. **NEVER edit day files in `data/applications/` to ADD new entries** -- Write TSV in `batch/tracker-additions/` and `npx job-forge merge` handles the merge.
497
502
  2. **YES you can edit day files in `data/applications/` to UPDATE status/notes of existing entries.**
498
503
  3. All reports MUST include `**URL:**` in the header (between Score and PDF).
499
504
  4. All statuses MUST be canonical (see `templates/states.yml`).
500
- 5. Health check: `node verify-pipeline.mjs`
501
- 6. Normalize statuses: `node normalize-statuses.mjs`
502
- 7. Dedup: `node dedup-tracker.mjs`
505
+ 5. Health check: `npx job-forge verify`
506
+ 6. Normalize statuses: `npx job-forge normalize`
507
+ 7. Dedup: `npx job-forge dedup`
503
508
 
504
509
  ### Canonical States (applications day files)
505
510
 
@@ -515,6 +520,7 @@ Write one TSV file per evaluation to `batch/tracker-additions/{num}-{company-slu
515
520
  | `Offer` | Offer received |
516
521
  | `Rejected` | Rejected by company |
517
522
  | `Discarded` | Discarded by candidate or offer closed |
523
+ | `Failed` | Submission attempted but blocked by portal (spam-filter, anti-bot, broken form). May be recoverable via manual retry. |
518
524
  | `SKIP` | Doesn't fit, don't apply |
519
525
 
520
526
  **RULES:**
package/batch/README.md CHANGED
@@ -51,7 +51,7 @@ npm run merge
51
51
  npm run verify # optional: pipeline health after merge (report links, statuses, pending TSVs)
52
52
  ```
53
53
 
54
- (`node merge-tracker.mjs` — same as `npm run merge`; see [CONTRIBUTING.md](../CONTRIBUTING.md#development).)
54
+ (`npx job-forge merge` — same as `npm run merge`; see [CONTRIBUTING.md](../CONTRIBUTING.md#development).)
55
55
 
56
56
  After a successful merge, each processed file is moved to **`batch/tracker-additions/merged/`** (created on first merge when the directory does not yet exist). `npm run verify` only looks for `*.tsv` files in the **top level** of `batch/tracker-additions/`, so rows already merged and archived under `merged/` do not trigger the “pending TSVs” warning.
57
57
 
@@ -243,7 +243,7 @@ Where `{company-slug}` is the company name in lowercase, no spaces, with hyphens
243
243
  12. Write HTML to `/tmp/cv-candidate-{company-slug}.html`
244
244
  13. Run:
245
245
  ```bash
246
- node generate-pdf.mjs \
246
+ npx job-forge pdf \
247
247
  /tmp/cv-candidate-{company-slug}.html \
248
248
  output/cv-candidate-{company-slug}-{{DATE}}.pdf \
249
249
  --format={letter|a4}
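The README describes an archive behavior: merged files move to `batch/tracker-additions/merged/`, and verify only counts top-level `*.tsv` files. A non-recursive directory scan can mirror this. This is an illustrative sketch; `pendingTrackerTsvs` is a hypothetical helper, not the code behind `npm run verify`.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Lists TSVs still waiting to be merged. The scan is not recursive, so
// files already archived under merged/ are ignored.
function pendingTrackerTsvs(dir = "batch/tracker-additions") {
  if (!fs.existsSync(dir)) return [];
  return fs
    .readdirSync(dir, { withFileTypes: true })
    .filter((entry) => entry.isFile() && entry.name.endsWith(".tsv"))
    .map((entry) => path.join(dir, entry.name))
    .sort();
}
```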
@@ -65,3 +65,24 @@ location:
65
65
  visa_status: "No sponsorship needed"
66
66
  # For remote roles outside your country:
67
67
  # onsite_availability: "1 week/month in any city"
68
+
69
+ # Structured location constraints — consumed by the Apply Preflight location
70
+ # filter in modes/apply.md. The prose fields above (compensation.location_flexibility,
71
+ # location.*) remain for human readability and LLM narrative context; the fields
72
+ # below govern automated, deterministic compatibility checks before dispatching
73
+ # an apply subagent.
74
+ #
75
+ # City names are lowercase, hyphenated (e.g. "san-francisco", "new-york").
76
+ # Country codes are ISO-3166 alpha-2 uppercase (e.g. "US", "CA", "GB").
77
+ location_constraints:
78
+ remote_us: true # open to US-remote roles
79
+ remote_global: false # open to non-US remote (visa / timezone permitting)
80
+ hybrid_cities: # cities where hybrid N-days-in-office is acceptable
81
+ - san-francisco
82
+ blocked_cities: # cities that are a hard No for relocation (even if hybrid)
83
+ - new-york
84
+ - london
85
+ authorized_countries: # countries where the candidate has right-to-work
86
+ - US
87
+ requires_visa_sponsorship: false # true → roles in non-authorized countries are blocked unless
88
+ # the JD explicitly mentions visa sponsorship
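A hypothetical sketch of how the `location_constraints` fields could drive a deterministic check. Two parts are assumptions:

- The job shape (`workplace`, `city`, `country`, `jdMentionsSponsorship`).
- Treating onsite roles like hybrid ones.

The authoritative filter is the Apply Preflight in `modes/apply.md`.

```javascript
// Hypothetical location preflight. Field names mirror location_constraints;
// the job object's shape is an assumption.
function locationPreflight(c, job) {
  const city = (job.city || "").trim().toLowerCase().replace(/\s+/g, "-");
  const country = (job.country || "").trim().toUpperCase();

  // Blocked cities are a hard No, even for hybrid roles.
  if (city && c.blocked_cities.includes(city)) {
    return { ok: false, reason: `blocked city: ${city}` };
  }

  // Literal reading of requires_visa_sponsorship: only when true does a
  // non-authorized country block the role, unless the JD offers sponsorship.
  if (
    c.requires_visa_sponsorship &&
    !c.authorized_countries.includes(country) &&
    !job.jdMentionsSponsorship
  ) {
    return { ok: false, reason: `no right-to-work in ${country || "unknown country"}` };
  }

  switch (job.workplace) {
    case "remote": {
      const accepted = country === "US" ? c.remote_us : c.remote_global;
      return accepted
        ? { ok: true, reason: "remote accepted" }
        : { ok: false, reason: `remote (${country || "unspecified"}) not accepted` };
    }
    case "hybrid":
    case "onsite": // assumption: onsite is held to the hybrid city list
      return c.hybrid_cities.includes(city)
        ? { ok: true, reason: `${job.workplace} in ${city}` }
        : { ok: false, reason: `${job.workplace} outside accepted cities` };
    default:
      return { ok: false, reason: "unknown workplace type, review manually" };
  }
}

// Values from the example profile above.
const profile = {
  remote_us: true,
  remote_global: false,
  hybrid_cities: ["san-francisco"],
  blocked_cities: ["new-york", "london"],
  authorized_countries: ["US"],
  requires_visa_sponsorship: false,
};

console.log(locationPreflight(profile, { workplace: "hybrid", city: "New York", country: "US" }));
```

Checks run from hardest to softest constraint (blocked city, right-to-work, then workplace type). As a result, the returned `reason` names the first rule that rejected the role.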
@@ -53,8 +53,8 @@ The skill router (`.opencode/skills/job-forge.md`) loads mode and data files on
53
53
 
54
54
  ```
55
55
  ┌─────────────────────────────────┐
56
- opencode Agent
57
- │ (reads OPENCODE.md + modes/*.md) │
56
+ Agent
57
+ │ (reads AGENTS.md + modes/*.md) │
58
58
  └──────────┬──────────────────────┘
59
59
 
60
60
  ┌──────────────────┼──────────────────────┐
@@ -85,7 +85,7 @@ The skill router (`.opencode/skills/job-forge.md`) loads mode and data files on
85
85
 
86
86
  ## Modes (`modes/`)
87
87
 
88
- Markdown mode files in `modes/` define how the opencode workflow behaves together with the root `OPENCODE.md`. **`_shared.md`** is the shared layer (archetypes, scoring dimensions, negotiation scaffolding); the rest align with `/job-forge` command entry points listed in `OPENCODE.md`.
88
+ Markdown mode files in `modes/` define how the workflow behaves together with the root `AGENTS.md`. **`_shared.md`** is the shared layer (archetypes, scoring dimensions, negotiation scaffolding); the rest align with `/job-forge` command entry points listed in `AGENTS.md`.
89
89
 
90
90
  | File | Focus |
91
91
  |------|--------|
@@ -124,7 +124,7 @@ For customization (archetypes, weights, tone), start with `_shared.md` and [CUST
124
124
  5. **Score**: Weighted average across 10 dimensions (1-5)
125
125
  6. **Report**: Save as `reports/{num}-{company}-{date}.md`
126
126
  7. **PDF**: Generate ATS-optimized CV (`generate-pdf.mjs`)
127
- 8. **Track**: Write one TSV per evaluation under `batch/tracker-additions/` (see [OPENCODE.md](../OPENCODE.md) TSV layout); fold rows into `data/applications.md` with `npm run merge` / `merge-tracker.mjs` when you are ready (not automatic in every workflow)
127
+ 8. **Track**: Write one TSV per evaluation under `batch/tracker-additions/` (see [AGENTS.md](../AGENTS.md) TSV layout); fold rows into `data/applications.md` with `npm run merge` / `merge-tracker.mjs` when you are ready (not automatic in every workflow)
128
128
 
129
129
  ## Batch Processing
130
130
 
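The weighted average in step 5 can be sketched as follows. Everything here is a placeholder or assumption:

- The dimension names and weights are placeholders; the real ten dimensions and their weights live in `modes/_shared.md`.
- Rounding to one decimal is an assumption.
- `weightedScore` is a hypothetical name.

```javascript
// Hypothetical weighted-average scorer over 1-5 dimension scores.
function weightedScore(scores, weights) {
  let total = 0;
  let weightSum = 0;
  for (const [dim, w] of Object.entries(weights)) {
    const s = scores[dim];
    if (typeof s !== "number" || s < 1 || s > 5) {
      throw new RangeError(`score for "${dim}" must be a number in 1-5, got ${s}`);
    }
    if (w < 0) throw new RangeError(`negative weight for "${dim}"`);
    total += s * w;
    weightSum += w;
  }
  if (weightSum === 0) throw new RangeError("weights sum to zero");
  return Math.round((total / weightSum) * 10) / 10; // one decimal place (assumed)
}

// Placeholder dimensions and weights, for illustration only.
const example = weightedScore(
  { roleFit: 4, compensation: 3, growth: 5 },
  { roleFit: 2, compensation: 1, growth: 1 },
);
console.log(example);
```

The function iterates over the weights rather than the scores. A dimension that is missing a score therefore fails loudly instead of being skipped.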
@@ -166,8 +166,8 @@ Step 5 — Between rounds: clean sessions again
166
166
  - geometra_disconnect({ closeBrowser: true })
167
167
 
168
168
  Step 6 — After all rounds: reconcile outcomes (Hard Limit #6)
169
- - bash: node merge-tracker.mjs # consumes batch/tracker-additions/*.tsv into the day file
170
- - bash: node verify-pipeline.mjs # validates URL/status consistency
169
+ - bash: npx job-forge merge # consumes batch/tracker-additions/*.tsv into the day file
170
+ - bash: npx job-forge verify # validates URL/status consistency
171
171
  - Review output; if verify-pipeline reports issues, fix them before ending.
172
172
 
173
173
  Step 7 — Aggregate and report
@@ -9,16 +9,17 @@ The Hard Limits below are non-negotiable numeric rules. If you catch yourself ab
9
9
  3. **Always clean Geometra sessions before dispatching.** Before every round of `task` dispatches that will use Geometra, call `geometra_list_sessions` then `geometra_disconnect({closeBrowser: true})`. Every round. The disconnect is a no-op when the pool is empty.
10
10
  4. **Orchestrator does NOT fill forms.** This session MUST NOT call `geometra_fill_form`, `geometra_run_actions`, `geometra_pick_listbox_option`, or `geometra_fill_otp` when handling a multi-job request. If you need to, it means you MUST have delegated — `task` out the remaining work instead.
11
11
  5. **Re-dispatch only AFTER the previous subagent returns.** Never fire the same company's `task` twice while the first is still in-flight. Wait for the return value, then decide if a retry is warranted.
12
- 6. **Application outcomes flow through TSVs, not `data/pipeline.md`.** When a subagent returns APPLIED / FAILED / SKIP, the outcome goes to `batch/tracker-additions/{num}-{slug}.tsv`. `node merge-tracker.mjs` then consumes the TSVs into the correct `data/applications/YYYY-MM-DD.md` day file. `data/pipeline.md` only tracks URL inbox state (`[ ]` pending → `[x]` processed). **NEVER append APPLIED / FAILED status lines to `pipeline.md`** — that's the day file's job, via the TSV pathway. After any multi-apply run, the orchestrator MUST run `node merge-tracker.mjs` followed by `node verify-pipeline.mjs` before ending the session.
13
- 7. **URLs passed to downstream subagents must come from a file, not from a prior subagent's prose.** When an orchestrator dispatches a subagent with a URL (for evaluation, apply, verification, etc.), the URL MUST originate from:
14
- - `data/pipeline.md`
15
- - `data/scan-history.tsv`
16
- - `batch/scan-output-*.md` or similar structured output file
17
- - A report file (`reports/{num}-*.md`) with an authoritative `**URL:**` header
12
+ 6. **Application outcomes flow through TSVs, not `data/pipeline.md`.** When a subagent returns APPLIED / FAILED / SKIP, the outcome goes to `batch/tracker-additions/{num}-{slug}.tsv`. `npx job-forge merge` then consumes the TSVs into the correct `data/applications/YYYY-MM-DD.md` day file. `data/pipeline.md` only tracks URL inbox state (`[ ]` pending → `[x]` processed). **NEVER append APPLIED / FAILED status lines to `pipeline.md`** — that's the day file's job, via the TSV pathway. After any multi-apply run, the orchestrator MUST run `npx job-forge merge` followed by `npx job-forge verify` before ending the session.
13
+ 7. **Load-bearing facts passed to downstream subagents must come from a file, not from a prior subagent's prose.** A URL, score, email ID, confirmation page snippet, JD salary range, exact answer submitted to a form question, or any other specific value that a downstream subagent will act on MUST originate from one of:
14
+ - `data/pipeline.md` (URL inbox state)
15
+ - `data/scan-history.tsv` (scan provenance)
16
+ - `batch/scan-output-*.md` (scan-ranked candidates)
17
+ - A report file (`reports/{num}-*.md`) with authoritative headers (`**URL:**`, `**Score:**`, etc.)
18
+ - A TSV in `batch/tracker-additions/` (per-apply outcomes)
18
19
 
19
- URLs mentioned in a subagent's return message are NOT trustworthy by default: they may be hallucinated or reconstructed. Before passing any URL from a subagent report to another subagent, cross-check it exists in one of the authoritative files above, OR instruct the dispatching subagent to write its output to a structured file and re-read from that file.
20
+ **Not trustworthy by default**: anything quoted from a subagent's return message, any ID or score the orchestrator "remembers" from prose, any page-content snippet reproduced from a subagent's narrative. Subagents can hallucinate plausible-looking IDs, scores, and confirmation text. Before passing any such fact to another subagent, cross-check it exists in one of the authoritative files above, OR instruct the dispatching subagent to write its output to a structured file and re-read from that file.
20
21
 
21
- **Why**: a scan subagent once reported 30 plausible-looking Greenhouse IDs in its return message that did not exist in the Greenhouse API. The orchestrator dispatched 30 downstream subagents that all failed verification. Trusting prose-form URLs cost ~2 hours of wasted work and corrupted the tracker.
22
+ **Why**: on 2026-04-18, a scan subagent returned 30 fabricated Greenhouse IDs in prose (correct role titles, plausible-looking invented IDs that didn't exist in the API). The orchestrator dispatched 30 downstream subagents that all hit 404s. Verification rules downstream (Hard Limit #6, API-first verify) caught the symptom. This rule prevents the *shape* of the bug — hallucinations propagating through prose handoffs — across all quantitative / identifier / specific-fact claims, not just URLs.
22
23
 
23
24
  Everything below is context and rationale. These seven numbers are the rules.
24
25
 
@@ -40,6 +41,8 @@ Whenever the user says any variation of "apply to N jobs", "process the pipeline
40
41
 
41
42
  **Exception:** evaluation-only or tracker-only work (no Geometra, no repeated tool calls) can proceed in a single session. The rule targets tool-heavy multi-step loops.
42
43
 
44
+ **Before any batch-apply dispatch, run the Apply Preflight location filter from `modes/apply.md`** to exclude location-incompatible candidates. Catches the common case where an evaluated role has the right role-shape but a deal-breaking location that profile.yml already rules out.
45
+
43
46
  ---
44
47
 
45
48
  ## Subagent Routing — which agent for which task
@@ -292,6 +295,8 @@ When a form says "enter the code we sent to your email", you MUST retrieve the c
292
295
  | Workday | `from:myworkday newer_than:10m` |
293
296
  | Lever | `from:lever newer_than:10m` |
294
297
  | Ashby | `from:ashby newer_than:10m` |
298
+ | SmartRecruiters | `from:smartrecruiters newer_than:10m` |
299
+ | Aggregator redirect (WeWorkRemotely / RemoteOK) | Detect the underlying ATS from the post-redirect URL, then use that row's sender query |
295
300
  | Unknown | `newer_than:10m subject:(verify OR code OR confirm)` |
296
301
 
297
302
  **Rules:**
@@ -466,7 +471,7 @@ To check or modify MCP settings, edit `opencode.json` in the project root.
466
471
  - JDs in `jds/` (referenced as `local:jds/{file}` in pipeline.md)
467
472
  - Batch in `batch/` (gitignored except scripts and prompt)
468
473
  - Report numbering: sequential 3-digit zero-padded, max existing + 1
469
- - **RULE: After each batch of evaluations, run `node merge-tracker.mjs`** to merge tracker additions and avoid duplications.
474
+ - **RULE: After each batch of evaluations, run `npx job-forge merge`** to merge tracker additions and avoid duplications.
470
475
  - **RULE: NEVER create new entries in applications.md if company+role already exists.** Update the existing entry.
471
476
  - **RULE: NEVER attribute commits to opencode (no `Co-Authored-By: opencode` or similar).** All commits must be attributed solely to the person making the commit (e.g., CharlieGreenman).
472
477
 
@@ -493,13 +498,13 @@ Write one TSV file per evaluation to `batch/tracker-additions/{num}-{company-slu
493
498
 
494
499
  ### Pipeline Integrity
495
500
 
496
- 1. **NEVER edit day files in `data/applications/` to ADD new entries** -- Write TSV in `batch/tracker-additions/` and `merge-tracker.mjs` handles the merge.
501
+ 1. **NEVER edit day files in `data/applications/` to ADD new entries** -- Write TSV in `batch/tracker-additions/` and `npx job-forge merge` handles the merge.
497
502
  2. **YES you can edit day files in `data/applications/` to UPDATE status/notes of existing entries.**
498
503
  3. All reports MUST include `**URL:**` in the header (between Score and PDF).
499
504
  4. All statuses MUST be canonical (see `templates/states.yml`).
500
- 5. Health check: `node verify-pipeline.mjs`
501
- 6. Normalize statuses: `node normalize-statuses.mjs`
502
- 7. Dedup: `node dedup-tracker.mjs`
505
+ 5. Health check: `npx job-forge verify`
506
+ 6. Normalize statuses: `npx job-forge normalize`
507
+ 7. Dedup: `npx job-forge dedup`
503
508
 
504
509
  ### Canonical States (applications day files)
505
510
 
@@ -515,6 +520,7 @@ Write one TSV file per evaluation to `batch/tracker-additions/{num}-{company-slu
515
520
  | `Offer` | Offer received |
516
521
  | `Rejected` | Rejected by company |
517
522
  | `Discarded` | Discarded by candidate or offer closed |
523
+ | `Failed` | Submission attempted but blocked by portal (spam-filter, anti-bot, broken form). May be recoverable via manual retry. |
518
524
  | `SKIP` | Doesn't fit, don't apply |
519
525
 
520
526
  **RULES:**
@@ -0,0 +1,116 @@
1
+ /**
2
+ * canonical-states.mjs — single source of truth for JobForge canonical states.
3
+ *
4
+ * `templates/states.yml` is the authoritative list. This module reads it
5
+ * (when available) and provides a hardcoded fallback that MUST stay in sync
6
+ * with the YAML for the belt-and-suspenders case where the file is missing.
7
+ *
8
+ * Consumers:
9
+ * - merge-tracker.mjs — validation + TSV column-swap heuristic
10
+ * - normalize-statuses.mjs — canonical list for direct matching
11
+ *
12
+ * The dashboard (Go) currently duplicates this list in
13
+ * dashboard/internal/ui/screens/pipeline.go (statusOptions, statusGroupOrder, statusLabel)
14
+ * dashboard/internal/data/career.go (NormalizeStatus, StatusPriority)
15
+ * Full codegen from YAML on the Go side is a follow-up; for now those
16
+ * copies carry KEEP IN SYNC comments.
17
+ */
18
+
19
+ import { readFileSync, existsSync } from 'fs';
20
+ import { join } from 'path';
21
+
22
+ /**
23
+ * Fallback canonical labels, in display order matching templates/states.yml.
24
+ * Used when the YAML file can't be read. Keep in sync with the YAML.
25
+ */
26
+ export const DEFAULT_STATES = [
27
+ 'Evaluated',
28
+ 'Applied',
29
+ 'Responded',
30
+ 'Contacted',
31
+ 'Interview',
32
+ 'Offer',
33
+ 'Rejected',
34
+ 'Discarded',
35
+ 'Failed',
36
+ 'SKIP',
37
+ ];
38
+
39
+ /**
40
+ * Extra tokens the column-swap heuristic recognises as "this column looks
41
+ * like a status". Canonical labels plus historical aliases the tracker has
42
+ * been known to emit (duplicate/repost/hold). Kept here so that both
43
+ * merge-tracker.mjs and any future consumer see the same alias set.
44
+ */
45
+ const STATUS_DETECT_EXTRAS = ['duplicate', 'repost', 'hold'];
46
+
47
+ /**
48
+ * Parse `templates/states.yml` and return the ordered list of canonical
49
+ * labels. Returns null when the file is missing or contains no labels,
50
+ * so callers can fall back to DEFAULT_STATES.
51
+ *
52
+ * The parser intentionally uses a line-regex rather than pulling in a
53
+ * YAML dependency — job-forge has no runtime YAML parser and we don't
54
+ * want to add one just for this.
55
+ *
56
+ * @param {string} repoRoot - repo root where `templates/states.yml` lives.
57
+ * Also checks `states.yml` at the root as a legacy fallback.
58
+ * @returns {string[] | null}
59
+ */
60
+ export function loadCanonicalStates(repoRoot) {
61
+ const candidates = [
62
+ join(repoRoot, 'templates/states.yml'),
63
+ join(repoRoot, 'states.yml'),
64
+ ];
65
+
66
+ for (const filePath of candidates) {
67
+ if (!existsSync(filePath)) continue;
68
+ let text;
69
+ try {
70
+ text = readFileSync(filePath, 'utf-8');
71
+ } catch {
72
+ continue;
73
+ }
74
+ const labels = [];
75
+ for (const line of text.split('\n')) {
76
+ const m = line.match(/^\s+label:\s*(.+)$/);
77
+ if (!m) continue;
78
+ const v = m[1].trim().replace(/^['"]|['"]$/g, '');
79
+ if (v) labels.push(v);
80
+ }
81
+ if (labels.length > 0) return labels;
82
+ }
83
+ return null;
84
+ }
85
+
86
+ /**
87
+ * Build the case-insensitive "does this column look like a status?" regex
88
+ * used by merge-tracker.mjs to detect swapped status/score columns in
89
+ * legacy TSVs.
90
+ *
91
+ * Matches at the start of the column text, case-insensitive. Includes the
92
+ * canonical labels plus alias tokens (duplicate/repost/hold) that have
93
+ * historically appeared in the status column.
94
+ *
95
+ * @param {string[]} states - canonical labels (typically the output of
96
+ * loadCanonicalStates, or DEFAULT_STATES).
97
+ * @returns {RegExp}
98
+ */
99
+ export function buildStatusDetectionRegex(states) {
100
+ const tokens = [
101
+ ...states.map((s) => s.toLowerCase()),
102
+ ...STATUS_DETECT_EXTRAS,
103
+ ];
104
+ // Dedupe while preserving order.
105
+ const seen = new Set();
106
+ const unique = [];
107
+ for (const t of tokens) {
108
+ if (!seen.has(t)) {
109
+ seen.add(t);
110
+ unique.push(t);
111
+ }
112
+ }
113
+ // Escape regex-special chars just in case a label ever contains one.
114
+ const escaped = unique.map((t) => t.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'));
115
+ return new RegExp(`^(${escaped.join('|')})`, 'i');
116
+ }
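The label-extraction loop in `loadCanonicalStates` is easy to exercise on an in-memory string. The sketch below is a minimal standalone copy for illustration, not part of the package. `parseStateLabels` is a hypothetical name that takes YAML text instead of a repo root. The sample `states.yml` layout (list entries with an indented `label:` key) is an assumption inferred from the regex, not copied from the template.

```javascript
// Hypothetical standalone copy of the per-file loop in loadCanonicalStates():
// it accepts YAML text rather than a repo root, so no files are needed.
const DEFAULT_STATES = [
  'Evaluated', 'Applied', 'Responded', 'Contacted', 'Interview',
  'Offer', 'Rejected', 'Discarded', 'Failed', 'SKIP',
];

function parseStateLabels(text) {
  const labels = [];
  for (const line of text.split('\n')) {
    // Same pattern as the module: an indented `label:` key, value to end of line.
    const m = line.match(/^\s+label:\s*(.+)$/);
    if (!m) continue;
    // Strip a leading and/or trailing quote character.
    const v = m[1].trim().replace(/^['"]|['"]$/g, '');
    if (v) labels.push(v);
  }
  return labels.length > 0 ? labels : null;
}

// Assumed shape of templates/states.yml (illustrative, not copied from the package).
const sampleYaml = [
  'states:',
  '  - id: applied',
  '    label: Applied',
  '  - id: failed',
  "    label: 'Failed'",
].join('\n');

const fromYaml = parseStateLabels(sampleYaml) || DEFAULT_STATES;
const fromEmpty = parseStateLabels('') || DEFAULT_STATES;
console.log(fromYaml); // [ 'Applied', 'Failed' ]
console.log(fromEmpty.length); // 10
```

Because the parser is line-based, two edge cases follow directly from the regex. A list item written as `- label: Applied` (with `label` as the first key) does not match, since only whitespace may precede `label:`. JavaScript's `.` also does not match `\r`. As a result, a `states.yml` saved with CRLF line endings yields no labels, and the loader silently falls back to `DEFAULT_STATES`. Splitting on `/\r?\n/` would avoid this if Windows-edited files are a concern.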
package/merge-tracker.mjs CHANGED
@@ -26,6 +26,9 @@ import {
26
26
  usesDayFiles, ensureDayDir, getHeader, formatAppLine, parseAppLine,
27
27
  readAllEntries, writeToDayFiles, listDayFiles, dayFilePath,
28
28
  } from './tracker-lib.mjs';
29
+ import {
30
+ DEFAULT_STATES, loadCanonicalStates, buildStatusDetectionRegex,
31
+ } from './lib/canonical-states.mjs';
29
32
 
30
33
  const ADDITIONS_DIR = join(PROJECT_DIR, 'batch/tracker-additions');
31
34
  const MERGED_DIR = join(ADDITIONS_DIR, 'merged');
@@ -54,26 +57,8 @@ Run from the repository root.`);
54
57
  process.exit(0);
55
58
  }
56
59
 
57
- const STATES_FILE = existsSync(join(PROJECT_DIR, 'templates/states.yml'))
58
- ? join(PROJECT_DIR, 'templates/states.yml')
59
- : join(PROJECT_DIR, 'states.yml');
60
-
61
- function loadCanonicalLabelsFromStatesYaml(filePath) {
62
- if (!existsSync(filePath)) return null;
63
- const text = readFileSync(filePath, 'utf-8');
64
- const labels = [];
65
- for (const line of text.split('\n')) {
66
- const m = line.match(/^\s+label:\s*(.+)$/);
67
- if (!m) continue;
68
- let v = m[1].trim().replace(/^['"]|['"]$/g, '');
69
- if (v) labels.push(v);
70
- }
71
- return labels.length > 0 ? labels : null;
72
- }
73
-
74
- const CANONICAL_STATES = loadCanonicalLabelsFromStatesYaml(STATES_FILE) || [
75
- 'Evaluated', 'Applied', 'Contacted', 'Responded', 'Interview', 'Offer', 'Rejected', 'Discarded', 'SKIP',
76
- ];
60
+ const CANONICAL_STATES = loadCanonicalStates(PROJECT_DIR) || DEFAULT_STATES;
61
+ const STATUS_DETECT_RE = buildStatusDetectionRegex(CANONICAL_STATES);
77
62
 
78
63
  function validateStatus(status) {
79
64
  const clean = status.replace(/\*\*/g, '').replace(/\s+\d{4}-\d{2}-\d{2}.*$/, '').trim();
@@ -156,8 +141,8 @@ function parseTsvContent(content, filename) {
156
141
  const col5 = parts[5].trim();
157
142
  const col4LooksLikeScore = /^\d+\.?\d*\/5$/.test(col4) || col4 === 'N/A' || col4 === 'DUP';
158
143
  const col5LooksLikeScore = /^\d+\.?\d*\/5$/.test(col5) || col5 === 'N/A' || col5 === 'DUP';
159
- const col4LooksLikeStatus = /^(evaluated|applied|contacted|responded|interview|offer|rejected|discarded|skip|duplicate|repost|hold)/i.test(col4);
160
- const col5LooksLikeStatus = /^(evaluated|applied|contacted|responded|interview|offer|rejected|discarded|skip|duplicate|repost|hold)/i.test(col5);
144
+ const col4LooksLikeStatus = STATUS_DETECT_RE.test(col4);
145
+ const col5LooksLikeStatus = STATUS_DETECT_RE.test(col5);
161
146
 
162
147
  let statusCol, scoreCol;
163
148
  if (col4LooksLikeStatus && !col4LooksLikeScore) {
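Routing the status check through `STATUS_DETECT_RE` matters most for the new `Failed` label, which the removed hardcoded alternation did not list. The sketch below is a self-contained approximation that assumes Node. It inlines a copy of `buildStatusDetectionRegex` so it runs without the package. `classifyColumns` is a hypothetical helper, and because the real `parseTsvContent` branching continues past the lines shown in this hunk, its return shape is illustrative only.

```javascript
// Inlined mirror of buildStatusDetectionRegex() so the sketch runs standalone.
const STATES = [
  'Evaluated', 'Applied', 'Responded', 'Contacted', 'Interview',
  'Offer', 'Rejected', 'Discarded', 'Failed', 'SKIP',
];
const EXTRAS = ['duplicate', 'repost', 'hold'];

function buildStatusRegex(states) {
  // A Set keeps insertion order, so this dedupes while preserving order.
  const unique = [...new Set([...states.map((s) => s.toLowerCase()), ...EXTRAS])];
  const escaped = unique.map((t) => t.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'));
  return new RegExp(`^(${escaped.join('|')})`, 'i');
}

const STATUS_RE = buildStatusRegex(STATES);
// Same score test as merge-tracker.mjs: "4.2/5", "N/A" or "DUP".
const looksLikeScore = (s) => /^\d+\.?\d*\/5$/.test(s) || s === 'N/A' || s === 'DUP';

// Hypothetical helper: decide which of columns 4 and 5 holds the status.
function classifyColumns(col4, col5) {
  const col4IsStatus = STATUS_RE.test(col4) && !looksLikeScore(col4);
  const col5IsStatus = STATUS_RE.test(col5) && !looksLikeScore(col5);
  if (col4IsStatus && !col5IsStatus) return { status: col4, score: col5 };
  if (col5IsStatus && !col4IsStatus) return { status: col5, score: col4 };
  return null; // ambiguous: neither or both columns look like a status
}

// The pre-2.1.0 hardcoded pattern, kept for comparison.
const OLD_RE = /^(evaluated|applied|contacted|responded|interview|offer|rejected|discarded|skip|duplicate|repost|hold)/i;

console.log(OLD_RE.test('Failed')); // false
console.log(classifyColumns('4.2/5', 'Failed')); // { status: 'Failed', score: '4.2/5' }
```

With the old pattern, neither column of a `4.2/5` / `Failed` row would have matched as a status. Deriving the regex from the canonical list means a future state added to `states.yml` is recognised without another edit here.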
package/modes/README.md CHANGED
@@ -1,9 +1,9 @@
1
1
  # Modes
2
2
 
3
- Markdown prompts used with opencode together with the root [`OPENCODE.md`](../OPENCODE.md). Each file aligns with a `/job-forge …` entry point or shared behavior described there.
3
+ Markdown prompts used together with the root [`AGENTS.md`](../AGENTS.md). Each file aligns with a `/job-forge …` entry point or shared behavior described there.
4
4
 
5
5
  - **`_shared.md`** — Archetypes, scoring dimensions, negotiation scaffolding. Edit this first when you change how offers are classified or weighted.
6
- - **Per-command files** — Each `*.md` here pairs with a `/job-forge …` entry in [`OPENCODE.md`](../OPENCODE.md). How modes connect to batch, tracker, and scripts is spelled out in [**Architecture — Modes**](../docs/ARCHITECTURE.md#modes-modes).
6
+ - **Per-command files** — Each `*.md` here pairs with a `/job-forge …` entry in [`AGENTS.md`](../AGENTS.md). How modes connect to batch, tracker, and scripts is spelled out in [**Architecture — Modes**](../docs/ARCHITECTURE.md#modes-modes).
7
7
 
8
8
  | File | Role |
9
9
  |------|------|
package/modes/_shared.md CHANGED
@@ -247,7 +247,7 @@ If the candidate has a live demo/dashboard (check profile.yml), offer access in
247
247
 
248
248
  0. **Cover letter:** If the form has an option to attach or write a cover letter, ALWAYS include one. Generate PDF with the same visual design as the CV. Content: JD quotes mapped to proof points, links to relevant case studies. 1 page max.
249
249
  1. Read cv.md and article-digest.md (if exists) before evaluating any offer
250
- 1b. **First evaluation of each session:** Run `node cv-sync-check.mjs` with Bash. If it reports warnings, notify the candidate before continuing
250
+ 1b. **First evaluation of each session:** Run `npx job-forge sync-check` with Bash. If it reports warnings, notify the candidate before continuing
251
251
  2. Detect the role archetype and adapt framing
252
252
  3. Cite exact lines from CV when matching
253
253
  4. Use WebSearch for comp and company data
@@ -269,4 +269,4 @@ If the candidate has a live demo/dashboard (check profile.yml), offer access in
269
269
  | Read | cv.md, article-digest.md, cv-template.html |
270
270
  | Write | Temporary HTML for PDF, day files in `data/applications/YYYY-MM-DD.md`, reports .md |
271
271
  | Edit | Update tracker |
272
- | Bash | `node generate-pdf.mjs` |
272
+ | Bash | `npx job-forge pdf` |
package/modes/apply.md CHANGED
@@ -10,6 +10,54 @@ Interactive mode for when the candidate is filling out an application form in Ch
10
10
 
11
11
  For a single application interactively, carry on in the current session — the rule targets multi-job loops.
12
12
 
13
+ ## Apply Preflight — Location Filter (orchestrator runs before dispatch)
14
+
15
+ Before dispatching any batch of apply subagents, cross-check each candidate's location against `config/profile.yml`. **Prefer the structured `location_constraints` block** (deterministic match). Fall back to the prose `location.*` / `compensation.location_flexibility` fields only when `location_constraints` is absent (legacy profiles).
16
+
17
+ ### Preferred path — structured `location_constraints` (deterministic)
18
+
19
+ 1. Read `config/profile.yml → location_constraints`. If present, use the structured fields:
20
+
21
+ ```yaml
22
+ location_constraints:
23
+ remote_us: true | false
24
+ remote_global: true | false
25
+ hybrid_cities: [san-francisco, ...]
26
+ blocked_cities: [new-york, ...]
27
+ authorized_countries: [US, ...] # ISO-3166 alpha-2
28
+ requires_visa_sponsorship: true | false
29
+ ```
30
+
31
+ 2. For each candidate, open its evaluation report (`reports/{num}-*.md`) and read the Location / Block A content. Extract: `mode ∈ {remote, hybrid, onsite}`, `city` (lowercase hyphenated), `country` (ISO-3166 alpha-2 when derivable).
32
+
33
+ 3. Apply the filter (decision table):
34
+
35
+ | Role shape | Rule | Outcome |
36
+ |---|---|---|
37
+ | Remote, country ∈ authorized_countries (typically US) | `remote_us == true` → COMPATIBLE | dispatch |
38
+ | Remote, country ∉ authorized_countries | `remote_global == true` AND (`requires_visa_sponsorship == false` OR JD mentions sponsorship) → COMPATIBLE | dispatch / else skip |
39
+ | Hybrid, `city ∈ hybrid_cities` | COMPATIBLE | dispatch |
40
+ | Hybrid or Onsite, `city ∈ blocked_cities` | INCOMPATIBLE | mark `Discarded`, note `location mismatch: blocked_city=X` |
41
+ | Hybrid or Onsite, `city` not in `hybrid_cities` and not in `blocked_cities` | INCOMPATIBLE by default (hybrid is opt-in per city) | mark `Discarded`, note `location mismatch: city=X not in hybrid_cities` |
42
+ | Location unclear / ambiguous | dispatch with a prompt flag instructing the apply subagent to verify the JD location first and Discard early if confirmed incompatible | dispatch-with-flag |
43
+
44
+ 4. Country/visa: if `requires_visa_sponsorship == false` AND `country ∉ authorized_countries` AND the JD does NOT explicitly offer sponsorship → INCOMPATIBLE, do NOT dispatch.
45
+
46
+ ### Fallback path — prose fields (legacy profiles with no `location_constraints`)
47
+
48
+ When `location_constraints` is absent, use the prose fields:
49
+
50
+ 1. Read `config/profile.yml` for `location` (country, city), `compensation.location_flexibility`, and `visa_status`.
51
+ 2. For each candidate, open its evaluation report (`reports/{num}-*.md`) and read the Location / Block A content.
52
+ 3. Apply the filter:
53
+ - If the report says "Remote (US)" / "Remote" / "fully remote" — COMPATIBLE, dispatch.
54
+ - If the report says "Hybrid N days in {city}" AND {city} matches `location.city` OR `location_flexibility` says "open to hybrid in {city}" — COMPATIBLE, dispatch.
55
+ - If the report says "Hybrid" or "Onsite" at a city NOT in the profile's location set AND `location_flexibility` says Remote-preferred — INCOMPATIBLE, do NOT dispatch. Mark the tracker entry `Discarded` directly with note `location mismatch: profile=X, role=Y`.
56
+ - If unclear or ambiguous — dispatch with a prompt flag telling the apply subagent to verify the JD location first and Discard early if confirmed incompatible.
57
+ 4. Country/visa: if `visa_status: "No sponsorship needed"` and the role is outside the authorized country — INCOMPATIBLE, do NOT dispatch.
58
+
59
+ **Why**: on 2026-04-18, 5 of 7 candidates dispatched for apply turned out location-incompatible. Each burned an apply-subagent round. The prose-field path reached the right call but cost interpretation cycles per dispatch; the structured path is O(1) field lookup and removes LLM-interpretation risk.
60
+
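The structured-path decision table above is deterministic enough to express as a pure function. This is a hedged sketch, not code that ships with job-forge. `classifyLocation` and the `role` fields (`mode`, `city`, `country`, `jdOffersSponsorship`, `unclear`) are hypothetical stand-ins for what the orchestrator extracts from a report. The sketch covers only the table rows; the separate country/visa rule in step 4 is not included. The cases the table leaves open are filled by assumptions labelled in the comments.

```javascript
// Shape mirrors the location_constraints block above; the values are examples.
const constraints = {
  remote_us: true,
  remote_global: false,
  hybrid_cities: ['san-francisco'],
  blocked_cities: ['new-york'],
  authorized_countries: ['US'],
  requires_visa_sponsorship: false,
};

const MODES = ['remote', 'hybrid', 'onsite'];

// Hypothetical helper. `role` holds what the orchestrator extracted from the report:
// { mode, city, country, jdOffersSponsorship, unclear }.
function classifyLocation(c, role) {
  const { mode, city, country } = role;
  if (role.unclear || !MODES.includes(mode)) {
    return { outcome: 'dispatch-with-flag' };
  }
  if (mode === 'remote') {
    if (c.authorized_countries.includes(country)) {
      // Row 1. The remote_us == false case is not in the table; skipping is an assumption.
      return { outcome: c.remote_us ? 'dispatch' : 'skip' };
    }
    // Row 2, implemented as written in the table.
    const ok = c.remote_global && (!c.requires_visa_sponsorship || Boolean(role.jdOffersSponsorship));
    return { outcome: ok ? 'dispatch' : 'skip' };
  }
  // Hybrid or onsite. Checking blocked_cities first is an assumed precedence.
  if (c.blocked_cities.includes(city)) {
    return { outcome: 'discard', note: `location mismatch: blocked_city=${city}` };
  }
  if (!c.hybrid_cities.includes(city)) {
    return { outcome: 'discard', note: `location mismatch: city=${city} not in hybrid_cities` };
  }
  // City is in hybrid_cities: the table covers hybrid; onsite there is uncovered, so flag it.
  return { outcome: mode === 'hybrid' ? 'dispatch' : 'dispatch-with-flag' };
}

console.log(classifyLocation(constraints, { mode: 'hybrid', city: 'new-york', country: 'US' }).note);
// location mismatch: blocked_city=new-york
```

Returning the note string together with `discard` keeps the tracker note identical to the text the table prescribes, so the orchestrator can copy it verbatim.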
13
61
  ### Run this multi-job apply runbook literally when N > 1
14
62
 
15
63
  ```
@@ -24,8 +72,8 @@ Step 4 — For round in ceil(N/2):
24
72
  # WAIT for both returns. Do not proceed until both done.
25
73
  Step 5 — Between rounds: geometra_list_sessions() + geometra_disconnect({closeBrowser: true})
26
74
  Step 6 — Reconcile outcomes (Hard Limit #6):
27
- bash: node merge-tracker.mjs # TSVs → day file
28
- bash: node verify-pipeline.mjs # validate
75
+ bash: npx job-forge merge # TSVs → day file
76
+ bash: npx job-forge verify # validate
29
77
  Step 7 — Summarize outcomes; do NOT auto-retry failures.
30
78
  ```
31
79
 
@@ -33,7 +81,7 @@ If a subagent fails, report it in the summary and let the user decide whether to
33
81
 
34
82
  **Outcome routing (Hard Limit #6 in `AGENTS.md`):**
35
83
  - Subagents write `batch/tracker-additions/{num}-{slug}.tsv` — one TSV per job.
36
- - Orchestrator runs `node merge-tracker.mjs` once at the end to consume TSVs into the right day file.
84
+ - Orchestrator runs `npx job-forge merge` once at the end to consume TSVs into the right day file.
37
85
  - **Do NOT** append APPLIED / FAILED / SKIP lines to `data/pipeline.md` — that file is the URL inbox only.
38
86
 
39
87
  ## Verify these requirements
@@ -214,13 +262,30 @@ Specific portals — Workday "parse my resume", iCIMS multi-step, SAP SuccessFac
214
262
  Check for an OTP gate after the candidate (or Geometra) submits — the major portals (Greenhouse, Workday, Lever, Ashby) gate submission behind an email verification code. When an OTP step appears, do this.
215
263
 
216
264
  1. **Do NOT stop and ask the candidate to paste the code manually.** Use the Gmail MCP.
217
- 2. Wait ~5-10 seconds for the email, then `gmail_list_messages` with a sender-scoped recency query (e.g. `from:greenhouse newer_than:10m`).
218
- 3. `gmail_get_message` on the most recent match, extract the code from the body.
219
- 4. `geometra_fill_otp` to enter it, then submit.
265
+ 2. **Pick the Gmail sender query from the ATS recorded at scan time.** The scan subagent records the ATS type in `batch/scan-output-{YYYY-MM-DD}.md` (`ats` column) and in `data/pipeline.md` (`| ats={type}` suffix). Read that value first — do NOT re-infer the ATS from the URL host when it's already recorded.
266
+ 3. Map the `ats` value to the Gmail sender query (table below). Wait ~5-10 seconds for the email, then call `gmail_list_messages` with the matching query.
267
+ 4. `gmail_get_message` on the most recent match, extract the code from the body.
268
+ 5. `geometra_fill_otp` to enter it, then submit.
269
+
270
+ **ATS → Gmail sender query lookup** (use the `ats` value recorded at scan time):
271
+
272
+ | `ats` value | `q` for `gmail_list_messages` |
273
+ |-------------|-------------------------------|
274
+ | `greenhouse` | `from:greenhouse newer_than:10m` |
275
+ | `workday` | `from:myworkday newer_than:10m` |
276
+ | `lever` | `from:lever newer_than:10m` |
277
+ | `ashby` | `from:ashby newer_than:10m` |
278
+ | `workable` | `from:workable newer_than:10m` |
279
+ | `smartrecruiters` | `from:smartrecruiters newer_than:10m` |
280
+ | `wwr` / `remoteok` | Follow the apply redirect to the underlying ATS, re-detect the host, then use that row's query. Aggregators do not send OTP emails themselves. |
281
+ | `builtin` | `from:builtin newer_than:10m` |
282
+ | `custom` / `unknown` / missing | `newer_than:10m subject:(verify OR code OR confirm)` |
283
+
284
+ **Fallback when `ats` is missing** (legacy pipeline entries with no `| ats=` suffix, or scan-output without an `ats` column): infer from the URL host — `*.greenhouse.io` → `greenhouse`; `jobs.ashbyhq.com` → `ashby`; `jobs.lever.co` → `lever`; `*.myworkdayjobs.com` → `workday`; `apply.workable.com` / `jobs.workable.com` → `workable`; `api.smartrecruiters.com` / `jobs.smartrecruiters.com` → `smartrecruiters`; `weworkremotely.com` → `wwr`; `remoteok.com` → `remoteok`; `builtin.com` → `builtin`; otherwise use the generic `verify OR code OR confirm` subject query.
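Steps 2 and 3 of the OTP recipe above, together with the host fallback, reduce to a lookup with a fixed precedence: the recorded `ats` first, the URL host second, and the generic subject query last. This is a minimal sketch that assumes the table and host list are transcribed as written. `otpQueryFor` and `atsFromHost` are hypothetical helper names. For `wwr` / `remoteok`, the function returns `null` to signal that the apply redirect must be followed before a sender query can be chosen.

```javascript
// Sender queries transcribed from the "ATS to Gmail sender query" lookup table above.
const GENERIC_QUERY = 'newer_than:10m subject:(verify OR code OR confirm)';
const SENDER_QUERIES = new Map([
  ['greenhouse', 'from:greenhouse newer_than:10m'],
  ['workday', 'from:myworkday newer_than:10m'],
  ['lever', 'from:lever newer_than:10m'],
  ['ashby', 'from:ashby newer_than:10m'],
  ['workable', 'from:workable newer_than:10m'],
  ['smartrecruiters', 'from:smartrecruiters newer_than:10m'],
  ['builtin', 'from:builtin newer_than:10m'],
]);
const AGGREGATORS = new Set(['wwr', 'remoteok']);

// Host fallback for legacy entries that carry no recorded `ats` value.
function atsFromHost(url) {
  let host;
  try {
    host = new URL(url).hostname.toLowerCase();
  } catch {
    return 'unknown'; // missing or malformed URL
  }
  if (host.endsWith('.greenhouse.io')) return 'greenhouse';
  if (host === 'jobs.ashbyhq.com') return 'ashby';
  if (host === 'jobs.lever.co') return 'lever';
  if (host.endsWith('.myworkdayjobs.com')) return 'workday';
  if (host === 'apply.workable.com' || host === 'jobs.workable.com') return 'workable';
  if (host === 'api.smartrecruiters.com' || host === 'jobs.smartrecruiters.com') return 'smartrecruiters';
  if (host === 'weworkremotely.com') return 'wwr';
  if (host === 'remoteok.com') return 'remoteok';
  if (host === 'builtin.com') return 'builtin';
  return 'unknown';
}

// Hypothetical helper: the recorded `ats` wins; the host is only a fallback.
// Returns null for aggregators, whose apply redirect must be followed first.
function otpQueryFor(recordedAts, url) {
  const ats = (recordedAts || atsFromHost(url)).toLowerCase();
  if (AGGREGATORS.has(ats)) return null;
  return SENDER_QUERIES.get(ats) ?? GENERIC_QUERY;
}

console.log(otpQueryFor(undefined, 'https://acme.wd5.myworkdayjobs.com/External/job/X'));
// from:myworkday newer_than:10m
```

After an aggregator redirect has been followed, calling `otpQueryFor(undefined, finalUrl)` with the post-redirect URL selects the underlying ATS's row. `*.greenhouse.io` is matched with a leading dot, mirroring the fallback list, so only subdomains such as `job-boards.greenhouse.io` qualify.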
220
285
 
221
286
  **Before reporting the submission as failed, always check Gmail.** A "submit did nothing" outcome usually means a silent OTP step — not a real failure.
222
287
 
223
- Full sender-to-query table and fallback patterns: see "OTP Handling via Gmail MCP" in `AGENTS.md`.
288
+ Full OTP recipe and fallback patterns: see "OTP Handling via Gmail MCP" in `AGENTS.md`.
224
289
 
225
290
  ## Step 7 — Update outcomes after submission
226
291
 
@@ -240,7 +305,7 @@ The row exists. You are UPDATING an existing entry, which is allowed (Pipeline I
240
305
  The row does NOT exist yet. You MUST go through the TSV pathway (Hard Limit #6 + Pipeline Integrity rule #1):
241
306
 
242
307
  1. Write `batch/tracker-additions/{num}-{slug}.tsv` with the canonical 9-column format (see "TSV Format for Tracker Additions" in `AGENTS.md`)
243
- 2. At the end of the apply run, the orchestrator calls `node merge-tracker.mjs`, which inserts the row into today's day file
308
+ 2. At the end of the apply run, the orchestrator calls `npx job-forge merge`, which inserts the row into today's day file
244
309
  3. Do NOT manually add a row to the day file. Do NOT append an `APPLIED` line to `data/pipeline.md`.
245
310
 
246
311
  ### Apply to both cases
package/modes/pdf.md CHANGED
@@ -17,7 +17,7 @@
17
17
  11. Inject keywords naturally into existing achievements (NEVER fabricate)
18
18
  12. Generate complete HTML from template + personalized content
19
19
  13. Write HTML to `/tmp/cv-candidate-{company}.html`
20
- 14. Run: `node generate-pdf.mjs /tmp/cv-candidate-{company}.html output/cv-candidate-{company}-{YYYY-MM-DD}.pdf --format={letter|a4}`
20
+ 14. Run: `npx job-forge pdf /tmp/cv-candidate-{company}.html output/cv-candidate-{company}-{YYYY-MM-DD}.pdf --format={letter|a4}`
21
21
  15. Report: PDF path, page count, keyword coverage %
22
22
 
23
23
  ## Apply these ATS rules for clean parsing
package/modes/pipeline.md CHANGED
@@ -53,7 +53,7 @@ Processes accumulated job offer URLs from `data/pipeline.md`. The user adds URLs
53
53
 
54
54
  Before processing any URL, verify sync:
55
55
  ```bash
56
- node cv-sync-check.mjs
56
+ npx job-forge sync-check
57
57
  ```
58
58
  If there is a desynchronization, warn the user before continuing.
59
59
 
@@ -72,8 +72,8 @@ Step 3 — For round in ceil(N/2):
72
72
  # WAIT for both returns before the next round.
73
73
  Step 4 — Between rounds: geometra_list_sessions() + geometra_disconnect({closeBrowser: true})
74
74
  Step 5 — Reconcile outcomes (Hard Limit #6):
75
- bash: node merge-tracker.mjs # TSVs → correct day file
76
- bash: node verify-pipeline.mjs # validate URL/status consistency
75
+ bash: npx job-forge merge # TSVs → correct day file
76
+ bash: npx job-forge verify # validate URL/status consistency
77
77
  Step 6 — Display summary table; flag any verify-pipeline errors.
78
78
  ```
79
79
 
package/modes/scan.md CHANGED
@@ -34,9 +34,88 @@ Read `portals.yml` which contains:
34
34
 
35
35
  **Every company MUST have a `careers_url` in portals.yml.** If it doesn't, search for it once, save it, and use it in future scans.
36
36
 
37
- ### Use Level 2 — Greenhouse API (COMPLEMENTARY)
38
-
39
- For companies using Greenhouse, the JSON API (`boards-api.greenhouse.io/v1/boards/{slug}/jobs`) returns clean structured data. Use as a quick complement to Level 1: it's faster than Geometra but only works with Greenhouse.
37
+ ### Use Level 2 — ATS / Aggregator APIs (COMPLEMENTARY)
38
+
39
+ For companies using an ATS or aggregator that exposes a public JSON/RSS API, fetch structured data directly. APIs are faster than Geometra and harder to hallucinate because the response is load-bearing: copy record IDs verbatim from the response and never reconstruct them. Use as a complement to Level 1.
40
+
41
+ Supported API shapes:
42
+
43
+ #### Greenhouse (JSON, per-company board)
44
+
45
+ - **Endpoint**: `https://boards-api.greenhouse.io/v1/boards/{slug}/jobs`
46
+ - **Method**: `GET` (plain, no auth)
47
+ - **Shape**: `{ jobs: [{ id, title, absolute_url, updated_at, location: { name } }, ...] }`
48
+ - **Canonical URL to record**: `https://job-boards.greenhouse.io/{slug}/jobs/{id}` — do NOT use `absolute_url` when it points to a customer-skinned front-end (see Verification section below).
49
+ - **ats**: `greenhouse`
50
+
51
+ #### Ashby (JSON, per-company board)
52
+
53
+ - **Endpoint**: `https://api.ashbyhq.com/posting-api/job-board/{slug}?includeCompensation=true`
54
+ - **Method**: `GET`
55
+ - **Shape**: `{ jobs: [{ id, title, jobUrl, publishedDate, locationName, employmentType, department, team, compensation }] }`
56
+ - **Canonical URL to record**: use the returned `jobUrl` (format `https://jobs.ashbyhq.com/{slug}/{uuid}`).
57
+ - **ats**: `ashby`
58
+
59
+ #### Lever (JSON, per-company board)
60
+
61
+ - **Endpoint**: `https://api.lever.co/v0/postings/{slug}?mode=json`
62
+ - **Method**: `GET`
63
+ - **Shape**: array of postings `[{ id, text, hostedUrl, createdAt, categories: { commitment, department, location, team } }, ...]`
64
+ - **Canonical URL to record**: `hostedUrl` (format `https://jobs.lever.co/{slug}/{uuid}`).
65
+ - **ats**: `lever`
66
+
67
+ #### Workday (JSON, per-tenant + site — FINICKY)
68
+
69
+ - **Endpoint**: `https://{subdomain}.{pod}.myworkdayjobs.com/wday/cxs/{tenant}/{site}/jobs`
70
+ - `subdomain` = the Workday tenant hostname prefix (e.g. `nvidia`, `salesforce`, `adobe`, `shopify`).
71
+ - `pod` = the Workday data-center pod segment (varies: `wd1`, `wd3`, `wd5`). The hostname in `careers_url` reveals which.
72
+ - `tenant` = the tenant segment of the path, which repeats the company slug (usually equal to `subdomain`, but not always).
73
+ - `site` = the public site name exposed by the tenant (e.g. `NVIDIAExternalCareerSite`, `External`, `ShopifyCareerSite`). Read it from the tenant's HTML landing page if unknown.
74
+ - **Method**: `POST` with JSON body:
75
+ ```json
76
+ {"appliedFacets": {}, "limit": 20, "offset": 0, "searchText": ""}
77
+ ```
78
+ - **Required headers**: `Content-Type: application/json`, `Accept: application/json`. Some tenants reject requests without a realistic `User-Agent` — set one if the response is 403.
79
+ - **Shape**: `{ jobPostings: [{ title, externalPath, postedOn, locationsText, bulletFields }, ...], total }`
80
+ - **Canonical URL to record**: `https://{subdomain}.{pod}.myworkdayjobs.com/{site}{externalPath}` (note: `externalPath` already starts with `/job/...` — do NOT prepend an extra `/`).
81
+ - **Pagination**: increment `offset` by `limit` (20) until `jobPostings.length < limit` or `offset >= total`.
82
+ - **ats**: `workday`
83
+ - **Fallback**: Workday APIs are brittle — tenants occasionally block POST from data-center IPs, change `site` names silently, or return empty `jobPostings` while the HTML page shows listings. If the POST fails or returns 0 jobs on a tenant that Level 1 confirmed has listings, fall back to Level 1 (Geometra scraping the `careers_url`). Treat Workday as Level 2 with a guaranteed Level 1 fallback.
84
+
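The endpoint, request body, URL assembly, and pagination rule above fit in one small client. This is a hedged sketch: the function names are hypothetical, the `cfg` keys mirror the `workday_subdomain` / `workday_pod` / `workday_tenant` / `workday_site` fields from `portals.yml` without the prefix, and it assumes Node 18+ for the global `fetch`. It reads `total` from the first page only, a defensive choice rather than documented Workday behavior.

```javascript
// Hypothetical helpers; cfg = { subdomain, pod, tenant, site }.
function workdayApiUrl({ subdomain, pod, tenant, site }) {
  return `https://${subdomain}.${pod}.myworkdayjobs.com/wday/cxs/${tenant}/${site}/jobs`;
}

// externalPath already starts with "/job/...", so it is appended as-is.
function workdayPostingUrl({ subdomain, pod, site }, externalPath) {
  return `https://${subdomain}.${pod}.myworkdayjobs.com/${site}${externalPath}`;
}

// Pages through the CXS endpoint with POST requests.
async function fetchWorkdayPostings(cfg, { limit = 20, userAgent } = {}) {
  const headers = { "Content-Type": "application/json", Accept: "application/json" };
  if (userAgent) headers["User-Agent"] = userAgent; // some tenants return 403 without one
  const postings = [];
  let total = Infinity; // taken from the first page only
  for (let offset = 0; ; offset += limit) {
    const res = await fetch(workdayApiUrl(cfg), {
      method: "POST",
      headers,
      body: JSON.stringify({ appliedFacets: {}, limit, offset, searchText: "" }),
    });
    // Any non-2xx aborts the API attempt; the caller falls back to Level 1.
    if (!res.ok) throw new Error(`Workday API ${res.status} for ${cfg.tenant}/${cfg.site}`);
    const body = await res.json();
    if (offset === 0 && typeof body.total === "number") total = body.total;
    const page = body.jobPostings ?? [];
    for (const p of page) {
      postings.push({
        title: p.title,
        url: workdayPostingUrl(cfg, p.externalPath),
        postedOn: p.postedOn,
      });
    }
    // Stop on a short page or once the next offset would reach the total.
    if (page.length < limit || offset + limit >= total) break;
  }
  return postings;
}
```

Callers can treat a thrown error, or an empty result for a tenant whose `careers_url` shows listings in Level 1, as the cue to fall back to Geometra instead of retrying blindly.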
85
+ #### SmartRecruiters (JSON, per-company postings)
86
+
87
+ - **Endpoint**: `https://api.smartrecruiters.com/v1/companies/{company}/postings`
88
+ - **Method**: `GET` (plain, no auth)
89
+ - **Shape**: `{ content: [{ id, name, refNumber, jobAdUrl, releasedDate, location: { city, country, remote }, company: { identifier, name }, department }], totalFound, offset, limit }`
90
+ - **Canonical URL to record**: use `jobAdUrl` when present, otherwise `https://jobs.smartrecruiters.com/{company}/{id}`.
91
+ - **Pagination**: pass `?offset=N&limit=100` (max 100). Loop until `offset + content.length >= totalFound`.
92
+ - **ats**: `smartrecruiters`
93
+
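The offset/limit loop above can be sketched as follows, assuming Node 18+ for the global `fetch`. `fetchSmartRecruitersPostings` is a hypothetical name, and the `releasedDate` slice assumes an ISO 8601 string. The stop condition is the documented `offset + content.length >= totalFound`, plus an empty-page guard so an inconsistent `totalFound` cannot cause an endless loop.

```javascript
// Hypothetical SmartRecruiters pager using offset/limit query parameters.
async function fetchSmartRecruitersPostings(company, limit = 100) {
  const base = `https://api.smartrecruiters.com/v1/companies/${encodeURIComponent(company)}/postings`;
  const rows = [];
  for (let offset = 0; ; offset += limit) {
    const res = await fetch(`${base}?offset=${offset}&limit=${limit}`);
    if (!res.ok) throw new Error(`SmartRecruiters ${res.status} for ${company}`);
    const body = await res.json();
    const content = body.content ?? [];
    for (const p of content) {
      rows.push({
        title: p.name,
        company: p.company?.name ?? company,
        ats: "smartrecruiters",
        // Prefer jobAdUrl; otherwise build the documented fallback URL.
        url: p.jobAdUrl || `https://jobs.smartrecruiters.com/${company}/${p.id}`,
        updated_at: p.releasedDate ? String(p.releasedDate).slice(0, 10) : null,
      });
    }
    if (content.length === 0 || offset + content.length >= (body.totalFound ?? 0)) break;
  }
  return rows;
}
```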
94
+ #### WeWorkRemotely (RSS, cross-company aggregator)
95
+
96
+ - **Endpoints** (one per category — enable the ones matching your target roles):
97
+ - `https://weworkremotely.com/categories/remote-programming-jobs.rss`
98
+ - `https://weworkremotely.com/categories/remote-devops-sysadmin-jobs.rss`
99
+ - `https://weworkremotely.com/categories/remote-product-jobs.rss`
100
+ - `https://weworkremotely.com/categories/remote-design-jobs.rss`
101
+ - `https://weworkremotely.com/categories/all-other-remote-jobs.rss`
102
+ - **Method**: `GET` — returns RSS 2.0 XML.
103
+ - **Shape**: `<rss><channel><item><title>{company}: {role}</title><link>https://weworkremotely.com/remote-jobs/{slug}</link><pubDate>...</pubDate><region>...</region></item></channel></rss>`
104
+ - **Company/role extraction**: split `<title>` on the first `: ` — left side is company, right side is role. Fall back to the whole title as role if there is no `: `.
105
+ - **Canonical URL to record**: the `<link>` verbatim (format `https://weworkremotely.com/remote-jobs/{slug}`).
106
+ - **Cross-company note**: WeWorkRemotely is NOT per-company — it aggregates postings from hundreds of companies. Scan it via the `cross_company_feeds` section in `portals.yml`, not `tracked_companies`.
107
+ - **ats**: `wwr` (aggregator). The underlying company's ATS is unknown at scan time — downstream evaluators follow the link and re-detect.
108
+
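The title split and field mapping for a WeWorkRemotely feed can be sketched without dependencies. This is a deliberately naive, hypothetical parser (`parseWwrRss`, `splitWwrTitle`, `readTag`, `decodeXmlText` are illustrative names): regex parsing handles plain text and CDATA-wrapped fields, but a real XML parser is the safer choice for production use.

```javascript
// Unwrap CDATA, or decode the five predefined XML entities.
function decodeXmlText(s) {
  const cdata = s.match(/^\s*<!\[CDATA\[([\s\S]*?)\]\]>\s*$/);
  if (cdata) return cdata[1].trim();
  return s
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&quot;/g, '"')
    .replace(/&apos;/g, "'")
    .replace(/&amp;/g, "&") // decoded last so "&amp;lt;" becomes "&lt;", not "<"
    .trim();
}

function readTag(itemXml, name) {
  const m = itemXml.match(new RegExp(`<${name}>([\\s\\S]*?)</${name}>`));
  return m ? decodeXmlText(m[1]) : null;
}

// "{company}: {role}" split on the FIRST ": "; without one, the whole title is the role.
function splitWwrTitle(title) {
  const i = title.indexOf(": ");
  if (i === -1) return { company: null, role: title };
  return { company: title.slice(0, i), role: title.slice(i + 2) };
}

function parseWwrRss(xml) {
  const items = xml.match(/<item>[\s\S]*?<\/item>/g) ?? [];
  return items.map((item) => {
    const { company, role } = splitWwrTitle(readTag(item, "title") ?? "");
    const pub = readTag(item, "pubDate");
    const when = pub ? new Date(pub) : null;
    return {
      title: role,
      company,
      ats: "wwr",
      url: readTag(item, "link"), // recorded verbatim
      updated_at: when && !Number.isNaN(when.getTime()) ? when.toISOString().slice(0, 10) : null,
    };
  });
}
```

Splitting on the first `: ` only keeps role titles that themselves contain a colon intact.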
109
+ #### RemoteOK (JSON, cross-company aggregator)
110
+
111
+ - **Endpoint**: `https://remoteok.com/api`
112
+ - **Method**: `GET` — returns a JSON array. The **first element is a legal/disclaimer object** (no `id`, has `legal`) — skip it. The remaining 100 entries are postings.
113
+ - **Required headers**: `User-Agent: Mozilla/5.0 ...` — RemoteOK returns 403 without a browser-like UA.
114
+ - **Shape** (per posting after skip): `{ id, slug, company, company_logo, position, description, tags: [string], date, epoch, url, apply_url, location, salary_min, salary_max }`
115
+ - **Canonical URL to record**: `url` (format `https://remoteok.com/remote-jobs/{id}-{slug}`).
116
+ - **Filtering**: RemoteOK feeds are broad — use `tags` for pre-filter (e.g. `tags` contains `"engineer"` or `"ai"`) before passing through `title_filter`.
117
+ - **Cross-company note**: same as WeWorkRemotely — configure via `cross_company_feeds`, not `tracked_companies`.
118
+ - **ats**: `remoteok` (aggregator).
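The RemoteOK rules (browser-like `User-Agent`, skip the disclaimer, record `url`) can be sketched as a fetch plus a normalizer. The names are hypothetical, Node 18+ is assumed for `fetch`, and the `date` slice assumes an ISO 8601 string. As a design choice, the disclaimer is dropped by shape (has `legal`, no `id`) rather than by position, which still covers the documented "skip the first element" rule.

```javascript
// Hypothetical normalizer for the RemoteOK JSON array.
function remoteOkCandidates(feed) {
  return feed
    .filter((e) => e && e.id != null && !("legal" in e))
    .map((e) => ({
      title: e.position,
      company: e.company,
      ats: "remoteok",
      url: e.url, // the remoteok.com listing URL, not apply_url
      tags: Array.isArray(e.tags) ? e.tags : [],
      updated_at: e.date ? String(e.date).slice(0, 10) : null,
    }));
}

async function fetchRemoteOk(userAgent) {
  const res = await fetch("https://remoteok.com/api", {
    headers: { "User-Agent": userAgent, Accept: "application/json" },
  });
  // A 403 here usually means the User-Agent was missing or looked like a bot.
  if (!res.ok) throw new Error(`RemoteOK ${res.status}`);
  return remoteOkCandidates(await res.json());
}
```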
40
119
 
41
120
  ### Use Level 3 — WebSearch Queries (BROAD DISCOVERY)
42
121
 
@@ -44,7 +123,7 @@ The `search_queries` with `site:` filters cover portals broadly (all Ashby board
44
123
 
45
124
  **Execution priority:**
46
125
  1. Level 1: Geometra → all `tracked_companies` with `careers_url`
47
- 2. Level 2: API → all `tracked_companies` with `api:`
126
+ 2. Level 2: API → all `tracked_companies` with `api:` (Greenhouse / Ashby / Lever / Workday / SmartRecruiters) AND all `cross_company_feeds` with `enabled: true` (WeWorkRemotely / RemoteOK)
48
127
  3. Level 3: WebSearch → all `search_queries` with `enabled: true`
49
128
 
50
129
  The levels are additive — all are executed, results are merged and deduplicated.
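The merge step can be sketched as URL-keyed deduplication in execution order, so a posting found by Level 1 keeps its Level 1 record when Level 2 or 3 returns it again. This is an illustrative sketch (`mergeLevels` and `normalizeUrl` are hypothetical) that covers exact URL matches only; the fuzzy company/title matching behind `skipped_repost` is a separate step.

```javascript
// Normalize just enough to catch trivial variants: the URL parser lowercases
// the host; the fragment and a trailing slash on the path are dropped.
function normalizeUrl(raw) {
  const u = new URL(raw);
  u.hash = "";
  if (u.pathname.length > 1 && u.pathname.endsWith("/")) {
    u.pathname = u.pathname.slice(0, -1);
  }
  return u.toString();
}

// Levels are passed in execution order; the first occurrence of a URL wins.
function mergeLevels(...levels) {
  const seen = new Map();
  for (const level of levels) {
    for (const candidate of level) {
      const key = normalizeUrl(candidate.url);
      if (!seen.has(key)) seen.set(key, candidate);
    }
  }
  return [...seen.values()];
}
```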
@@ -65,15 +144,26 @@ The levels are additive — all are executed, results are merged and deduplicate
65
144
  f. Accumulate in candidates list
66
145
  g. If `careers_url` fails (404, redirect), try `scan_query` as fallback and note for URL update
67
146
 
68
- 5. **Level 2 — Greenhouse APIs** (WebFetch can batch freely — it's cheap and doesn't use Geometra sessions):
69
- For each company in `tracked_companies` with `api:` defined and `enabled: true`:
70
- a. WebFetch the API URL JSON with job list
71
- b. For each job extract: `{title, url, company, gh_slug, gh_id, updated_at}`
72
- - **`url`**: ALWAYS record the canonical Greenhouse URL: `https://job-boards.greenhouse.io/{gh_slug}/jobs/{gh_id}`. Do **NOT** use `absolute_url` when it points to a customer-skinned front-end (e.g. `pinterestcareers.com/jobs/?gh_jid=N`, `okta.com/company/careers/opportunity/N`, `samsara.com/company/careers/roles/N`, `zoominfo.com/careers?gh_jid=N`, `collibra.com/.../?gh_jid=N`, `careers.toasttab.com/jobs?gh_jid=N`, `careers.airbnb.com/positions/N`, `coinbase.com/careers/positions/N`, `instacart.careers/job/?gh_jid=N`, `pinterestcareers.com/jobs/?gh_jid=N`). These customer front-ends return shells or 403 to bots and cause downstream WebFetch-based verification to wrongly mark the role CLOSED.
73
- - **`gh_slug`**: the Greenhouse board slug (from the API URL that was fetched).
74
- - **`gh_id`**: `jobs[].id` from the API response.
75
- - **`updated_at`**: `jobs[].updated_at` record for staleness detection (skip if older than 90 days, flag if older than 30).
76
- c. Accumulate in candidates list (dedup with Level 1). The pipeline.md entry MUST carry `| gh={gh_slug}/{gh_id}` at the end of the metadata so downstream evaluators can fall back to `https://boards-api.greenhouse.io/v1/boards/{gh_slug}/jobs/{gh_id}` when the canonical URL renders as a shell.
147
+ 5. **Level 2 — ATS / Aggregator APIs** (WebFetch can batch freely — it's cheap and doesn't use Geometra sessions):
148
+
149
+ **5a. Per-company APIs** for each company in `tracked_companies` with `api:` defined and `enabled: true`:
150
+ a. WebFetch (or `fetch` for Workday, which needs POST) the API URL per the endpoint shape documented above.
151
+ b. Extract per-posting `{title, url, company, updated_at, ats}` plus ATS-specific IDs:
152
+ - **Greenhouse** → also record `gh_slug`, `gh_id`. URL MUST be canonical `https://job-boards.greenhouse.io/{gh_slug}/jobs/{gh_id}` — do **NOT** use `absolute_url` when it points to a customer-skinned front-end (e.g. `pinterestcareers.com/jobs/?gh_jid=N`, `okta.com/company/careers/opportunity/N`, `samsara.com/company/careers/roles/N`, `zoominfo.com/careers?gh_jid=N`, `collibra.com/.../?gh_jid=N`, `careers.toasttab.com/jobs?gh_jid=N`, `careers.airbnb.com/positions/N`, `coinbase.com/careers/positions/N`, `instacart.careers/job/?gh_jid=N`). These customer front-ends return shells or 403 to bots and cause downstream WebFetch-based verification to wrongly mark the role CLOSED.
153
+ - **Ashby** → record the returned `jobUrl`.
154
+ - **Lever** → record the returned `hostedUrl`.
155
+ - **Workday** → build URL as `https://{subdomain}.{pod}.myworkdayjobs.com/{site}{externalPath}`. If the POST fails, DROP that tenant's API attempt and fall back to Level 1 for that company; do NOT fabricate postings.
156
+ - **SmartRecruiters** → record `jobAdUrl` (fallback: `https://jobs.smartrecruiters.com/{company}/{id}`).
157
+ - **`updated_at`**: use `updated_at` (Greenhouse) / `publishedDate` (Ashby) / `createdAt` (Lever) / `postedOn` (Workday) / `releasedDate` (SmartRecruiters) — record for staleness detection (skip if older than 90 days, flag if older than 30).
158
+ c. Accumulate in candidates list (dedup with Level 1). The pipeline.md entry MUST carry `| ats={type}` at the end, and for Greenhouse ALSO `| gh={gh_slug}/{gh_id}` so downstream evaluators can fall back to `https://boards-api.greenhouse.io/v1/boards/{gh_slug}/jobs/{gh_id}` when the canonical URL renders as a shell.
159
+
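The staleness rule in step 5a can be sketched per `ats` value. Names are hypothetical, and one assumption deserves a loud label: Workday's `postedOn` is commonly a relative phrase such as "Posted 3 Days Ago" or "Posted 30+ Days Ago" rather than a timestamp, so the sketch parses that wording and treats "30+" as at least 31 days. As a further design choice, undated postings are kept rather than dropped.

```javascript
// Which posting field carries the date for each per-company ATS.
const DATE_FIELD = {
  greenhouse: "updated_at",
  ashby: "publishedDate",
  lever: "createdAt",
  workday: "postedOn",
  smartrecruiters: "releasedDate",
};

const DAY_MS = 24 * 60 * 60 * 1000;

// Age in whole days, or null when the value cannot be interpreted.
function ageInDays(ats, posting, now = new Date()) {
  const raw = posting[DATE_FIELD[ats]];
  if (raw == null) return null;
  if (ats === "workday") {
    // Assumed wording: "Posted Today", "Posted Yesterday", "Posted N Days Ago", "Posted 30+ Days Ago".
    const s = String(raw).toLowerCase();
    if (s.includes("today")) return 0;
    if (s.includes("yesterday")) return 1;
    const m = s.match(/(\d+)(\+?)\s*days?\s+ago/);
    if (!m) return null;
    return m[2] === "+" ? Number(m[1]) + 1 : Number(m[1]);
  }
  const t = new Date(raw).getTime(); // ISO string, or epoch ms for Lever
  return Number.isNaN(t) ? null : Math.floor((now.getTime() - t) / DAY_MS);
}

// Skip if older than 90 days, flag if older than 30, otherwise keep.
function staleness(days) {
  if (days == null) return "keep";
  if (days > 90) return "skip";
  if (days > 30) return "flag";
  return "keep";
}
```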
160
+ **5b. Cross-company aggregator feeds** — for each feed in `cross_company_feeds` with `enabled: true`:
161
+ a. WebFetch the RSS (WeWorkRemotely) or JSON (RemoteOK) endpoint per the shape documented above.
162
+ b. Parse each entry to `{title, url, company, ats, updated_at}`:
163
+ - **WeWorkRemotely** → split `<title>` on the first `: ` to separate company from role; `<link>` → url; `<pubDate>` → updated_at.
164
+ - **RemoteOK** → skip the first element (legal disclaimer); from each remaining entry take `company`, `position`, `url`, `date`.
165
+ c. Apply the feed's `tag_filter` / `category_filter` before the global `title_filter` — aggregators have much higher volume than per-company APIs.
166
+ d. Accumulate in candidates list (dedup with Level 1 + 5a).
77
167
 
78
168
  6. **Level 3 — WebSearch queries** (WebSearch is parallel-safe; batch freely):
79
169
  For each query in `search_queries` with `enabled: true`:
@@ -106,7 +196,14 @@ The levels are additive — all are executed, results are merged and deduplicate
106
196
  - When a fuzzy match is found but the URL is new, log it as `skipped_repost` (not `skipped_dup`) with a note referencing the original entry number.
107
197
 
108
198
  8. **For each new offer that passes filters**:
109
- a. Add to `pipeline.md` section "Pending": `- [ ] {url} | {company} | {title}` — append `| gh={gh_slug}/{gh_id}` when the offer came from the Greenhouse API (Level 2) so downstream verification can hit the JSON endpoint.
199
+ a. Add to `pipeline.md` section "Pending": `- [ ] {url} | {company} | {title} | ats={ats}` — the `| ats={type}` suffix is REQUIRED for every entry (values: `greenhouse`, `ashby`, `workable`, `lever`, `workday`, `smartrecruiters`, `wwr`, `remoteok`, `builtin`, `custom`, `unknown`). When the offer came from the Greenhouse API (Level 2), ALSO append `| gh={gh_slug}/{gh_id}` so downstream verification can hit the JSON endpoint. Example entries:
200
+ - `- [ ] https://job-boards.greenhouse.io/webflow/jobs/7689676 | Webflow | Lead AI Engineer | ats=greenhouse | gh=webflow/7689676`
201
+ - `- [ ] https://jobs.ashbyhq.com/everai/abc-123 | EverAI | Senior AI PM | ats=ashby`
202
+ - `- [ ] https://jobs.lever.co/temporal/xyz | Temporal | Product Manager - AI | ats=lever`
203
+ - `- [ ] https://nvidia.wd5.myworkdayjobs.com/NVIDIAExternalCareerSite/job/US-CA-Santa-Clara/Senior-AI-Engineer_JR123456 | NVIDIA | Senior AI Engineer | ats=workday`
204
+ - `- [ ] https://jobs.smartrecruiters.com/Visa1/744000012345678 | Visa | Staff ML Engineer | ats=smartrecruiters`
205
+ - `- [ ] https://weworkremotely.com/remote-jobs/acme-senior-platform-engineer | Acme | Senior Platform Engineer | ats=wwr`
206
+ - `- [ ] https://remoteok.com/remote-jobs/12345-senior-ai-engineer-acme | Acme | Senior AI Engineer | ats=remoteok`
110
207
  b. Record in `scan-history.tsv`: `{url}\t{date}\t{query_name}\t{title}\t{company}\tadded`
111
208
 
112
209
  9. **Offers filtered by title**: record in `scan-history.tsv` with status `skipped_title`
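The line formats in step 8 can be sketched as two small formatters. `pendingLine`, `scanHistoryRow`, and `ALLOWED_ATS` are hypothetical names; rejecting unknown `ats` values up front keeps the REQUIRED suffix from ever carrying a typo, and the `gh=` suffix is emitted only for Greenhouse rows that carry both IDs.

```javascript
const ALLOWED_ATS = new Set([
  "greenhouse", "ashby", "workable", "lever", "workday", "smartrecruiters",
  "wwr", "remoteok", "builtin", "custom", "unknown",
]);

// Step 8a: "- [ ] {url} | {company} | {title} | ats={ats}" (+ gh suffix for Greenhouse).
function pendingLine({ url, company, title, ats, gh_slug, gh_id }) {
  if (!ALLOWED_ATS.has(ats)) throw new Error(`unsupported ats value: ${ats}`);
  let line = `- [ ] ${url} | ${company} | ${title} | ats=${ats}`;
  if (ats === "greenhouse" && gh_slug && gh_id) line += ` | gh=${gh_slug}/${gh_id}`;
  return line;
}

// Step 8b / 9: tab-separated scan-history.tsv row.
function scanHistoryRow({ url, date, queryName, title, company, status = "added" }) {
  return [url, date, queryName, title, company, status].join("\t");
}
```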
@@ -149,21 +246,27 @@ Scan mode MUST write its ranked candidate list to a file, not just return it in
149
246
 
150
247
  **Format**: one markdown table per scan run, ordered by archetype-fit rank:
151
248
 
152
- | rank | company | role | gh_slug | gh_id | url | updated_at |
153
- |------|---------|------|---------|-------|-----|------------|
154
- | 1 | Webflow | Lead AI Engineer | webflow | 7689676 | https://job-boards.greenhouse.io/webflow/jobs/7689676 | 2026-04-14 |
155
- | ... | ... | ... | ... | ... | ... | ... |
249
+ | rank | company | ats | role | gh_slug | gh_id | url | updated_at |
250
+ |------|---------|-----|------|---------|-------|-----|------------|
251
+ | 1 | Webflow | greenhouse | Lead AI Engineer | webflow | 7689676 | https://job-boards.greenhouse.io/webflow/jobs/7689676 | 2026-04-14 |
252
+ | 2 | EverAI | ashby | Senior AI PM | - | - | https://jobs.ashbyhq.com/everai/abc-123 | 2026-04-15 |
253
+ | ... | ... | ... | ... | ... | ... | ... | ... |
254
+
255
+ **`ats` values** (one of): `greenhouse`, `ashby`, `workable`, `lever`, `workday`, `smartrecruiters`, `wwr`, `remoteok`, `builtin`, `custom`, `unknown`. Every row MUST populate this column — it's what the apply subagent uses to pick the correct Gmail OTP sender query. The `wwr` and `remoteok` values identify aggregator postings whose real underlying ATS is only known after the redirect is followed — downstream evaluators re-detect and may rewrite to the underlying ATS.
156
256
 
157
257
  Every row MUST have:
158
- - `gh_slug` and `gh_id` copied verbatim from the Greenhouse API response (not reconstructed)
159
- - `url` in the canonical form `https://job-boards.greenhouse.io/{gh_slug}/jobs/{gh_id}` (matching the suffix in `data/pipeline.md`)
160
- - `updated_at` in `YYYY-MM-DD` form (the most recent `updated_at` in the API response)
258
+ - `ats`: the ATS platform hosting the posting, inferred from the canonical URL host (e.g. `boards-api.greenhouse.io` / `job-boards.greenhouse.io` / `job-boards.eu.greenhouse.io` → `greenhouse`; `jobs.ashbyhq.com` → `ashby`; `jobs.lever.co` → `lever`; `*.myworkdayjobs.com` (any `wd1`/`wd3`/`wd5` pod) → `workday`; `apply.workable.com` / `jobs.workable.com` → `workable`; `api.smartrecruiters.com` / `jobs.smartrecruiters.com` → `smartrecruiters`; `weworkremotely.com` → `wwr`; `remoteok.com` → `remoteok`; `builtin.com/jobs/` → `builtin`; company-own domains → `custom`; anything indeterminate → `unknown`).
259
+ - `url` in canonical form. For Greenhouse use `https://job-boards.greenhouse.io/{gh_slug}/jobs/{gh_id}` (matching the suffix in `data/pipeline.md`). For other ATSes use the platform's native URL (do not rewrite).
260
+ - `updated_at` in `YYYY-MM-DD` form (the posting's date field from the API response, i.e. the per-ATS field used for staleness detection, or the scan date when the source has no such field).
261
+
262
+ Additional columns — REQUIRED when available, `-` (dash) when not applicable:
263
+ - `gh_slug`, `gh_id` — Greenhouse-only. Copied verbatim from the Greenhouse API response (not reconstructed). For non-Greenhouse rows, emit `-` in both columns; `ats` + `url` are sufficient.
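The host-to-`ats` mapping above is mechanical enough to sketch. `inferAts` and its `companyHosts` parameter are hypothetical: the rules only say company-own domains map to `custom`, so this sketch assumes the caller supplies the set of known company hosts and returns `unknown` otherwise. Stripping a leading `www.` is an extra assumption.

```javascript
function inferAts(rawUrl, companyHosts = new Set()) {
  let u;
  try {
    u = new URL(rawUrl);
  } catch {
    return "unknown"; // not a parseable absolute URL
  }
  const host = u.hostname.replace(/^www\./, ""); // the URL parser already lowercases the host
  if (host === "boards-api.greenhouse.io" || /^job-boards(\.eu)?\.greenhouse\.io$/.test(host)) return "greenhouse";
  if (host === "jobs.ashbyhq.com") return "ashby";
  if (host === "jobs.lever.co") return "lever";
  if (host.endsWith(".myworkdayjobs.com")) return "workday"; // any wdN pod
  if (host === "apply.workable.com" || host === "jobs.workable.com") return "workable";
  if (host === "api.smartrecruiters.com" || host === "jobs.smartrecruiters.com") return "smartrecruiters";
  if (host === "weworkremotely.com") return "wwr";
  if (host === "remoteok.com") return "remoteok";
  if (host === "builtin.com" && u.pathname.startsWith("/jobs/")) return "builtin";
  if (companyHosts.has(host)) return "custom";
  return "unknown";
}
```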
161
264
 
162
265
  The scan subagent's return message MUST:
163
266
  - Reference the file path (so orchestrators know where to read)
164
267
  - Omit the ranked URL list from prose entirely (summary counts only)
165
268
 
166
- **Rationale**: in a prior run, a scan subagent returned correct IDs in `scan-history.tsv` but hallucinated plausible-looking fake IDs in its prose-form top-30 list. The orchestrator trusted prose and dispatched 30 downstream subagents against fake URLs. File-based handoff prevents this class of error.
269
+ **Rationale**: in a prior run, a scan subagent returned correct IDs in `scan-history.tsv` but hallucinated plausible-looking fake IDs in its prose-form top-30 list. The orchestrator trusted prose and dispatched 30 downstream subagents against fake URLs. File-based handoff prevents this class of error. Recording `ats` at scan time (rather than having the apply subagent infer it from the URL host) saves downstream re-parsing and keeps the OTP sender lookup deterministic.
167
270
 
168
271
  ## Output Summary
169
272
 
@@ -205,6 +308,8 @@ Each company in `tracked_companies` MUST have a `careers_url` — the direct URL
205
308
  - **Ashby:** `https://jobs.ashbyhq.com/{slug}`
206
309
  - **Greenhouse:** `https://job-boards.greenhouse.io/{slug}` or `https://job-boards.eu.greenhouse.io/{slug}`
207
310
  - **Lever:** `https://jobs.lever.co/{slug}`
311
+ - **Workday:** `https://{subdomain}.{pod}.myworkdayjobs.com/{site}` (pod = `wd1`/`wd3`/`wd5`/..., varies by tenant data center; site is tenant-defined, e.g. `External`, `NVIDIAExternalCareerSite`)
312
+ - **SmartRecruiters:** `https://careers.smartrecruiters.com/{company}` (human-facing) / `https://api.smartrecruiters.com/v1/companies/{company}/postings` (API)
208
313
  - **Custom:** The company's own URL (e.g., `https://openai.com/careers`)
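For Workday, most of the `workday_*` fields in `portals.yml` can be derived from the `careers_url`. A hypothetical sketch (`parseWorkdayCareersUrl` is not part of JobForge): it assumes the pod segment looks like `wd` plus digits, skips a leading locale segment such as `en-US` if present, and defaults the tenant to the subdomain, which the config comments describe as usual but not guaranteed, so verify it.

```javascript
function parseWorkdayCareersUrl(careersUrl) {
  const u = new URL(careersUrl);
  const m = u.hostname.match(/^([^.]+)\.(wd\d+)\.myworkdayjobs\.com$/);
  if (!m) return null; // not a Workday careers host
  const segments = u.pathname.split("/").filter(Boolean);
  // Assumed: some careers URLs start with a locale segment, e.g. /en-US/External.
  if (segments.length && /^[a-z]{2}-[A-Z]{2}$/.test(segments[0])) segments.shift();
  const site = segments[0];
  if (!site) return null;
  return {
    workday_subdomain: m[1],
    workday_pod: m[2],
    workday_tenant: m[1], // default only; confirm per tenant
    workday_site: site,
  };
}
```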
209
314
 
210
315
  **If `careers_url` doesn't exist** for a company:
@@ -7,7 +7,7 @@
7
7
  * - Single-file: data/applications.md or applications.md (legacy)
8
8
  *
9
9
  * Maps all non-canonical statuses to canonical ones per templates/states.yml:
10
- * Evaluated, Applied, Responded, Contacted, Interview, Offer, Rejected, Discarded, SKIP
10
+ * Evaluated, Applied, Responded, Contacted, Interview, Offer, Rejected, Discarded, Failed, SKIP
11
11
  *
12
12
  * Also strips markdown bold (**) and dates from the status field,
13
13
  * moving DUPLICADO info to the notes column.
@@ -23,6 +23,9 @@ import {
23
23
  usesDayFiles, ensureDayDir, parseAppLine, formatAppLine,
24
24
  readAllEntries, writeToDayFiles, listDayFiles,
25
25
  } from './tracker-lib.mjs';
26
+ import { DEFAULT_STATES, loadCanonicalStates } from './lib/canonical-states.mjs';
27
+
28
+ const CANONICAL_STATES = loadCanonicalStates(PROJECT_DIR) || DEFAULT_STATES;
26
29
 
27
30
  const DRY_RUN = process.argv.includes('--dry-run');
28
31
 
@@ -61,11 +64,7 @@ function normalizeStatus(raw) {
61
64
 
62
65
  if (s === '—' || s === '-' || s === '') return { status: 'Discarded' };
63
66
 
64
- const canonical = [
65
- 'Evaluated', 'Applied', 'Contacted', 'Responded', 'Interview',
66
- 'Offer', 'Rejected', 'Discarded', 'SKIP',
67
- ];
68
- for (const c of canonical) {
67
+ for (const c of CANONICAL_STATES) {
69
68
  if (lower === c.toLowerCase()) return { status: c };
70
69
  }
71
70
 
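`normalizeStatus` now iterates `CANONICAL_STATES`, presumably loaded from `templates/states.yml` with `DEFAULT_STATES` as the fallback, instead of a hard-coded array, so the new `Failed` state is recognized. A minimal sketch of that exact-match step in isolation; `matchCanonical` and `STATES` are hypothetical names, and the list mirrors the header comment rather than the actual `DEFAULT_STATES` export.

```javascript
// Hypothetical stand-in for the canonical labels.
const STATES = [
  "Evaluated", "Applied", "Responded", "Contacted", "Interview",
  "Offer", "Rejected", "Discarded", "Failed", "SKIP",
];

function matchCanonical(raw, states = STATES) {
  const s = String(raw).replace(/\*\*/g, "").trim(); // strip markdown bold
  if (s === "" || s === "-" || s === "\u2014") return "Discarded";
  const lower = s.toLowerCase();
  return states.find((c) => c.toLowerCase() === lower) ?? null;
}
```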
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "job-forge",
3
- "version": "2.0.2",
3
+ "version": "2.1.0",
4
4
  "description": "AI-powered job search pipeline built on opencode",
5
5
  "type": "module",
6
6
  "bin": {
@@ -43,6 +43,7 @@
43
43
  "batch/batch-runner.sh",
44
44
  "batch/README.md",
45
45
  "docs/",
46
+ "lib/",
46
47
  "tracker-lib.mjs",
47
48
  "merge-tracker.mjs",
48
49
  "dedup-tracker.mjs",
@@ -406,6 +406,36 @@ search_queries:
406
406
  # Companies whose career pages are checked directly.
407
407
  # scan_method: geometra (default), websearch, greenhouse_api
408
408
  # For Greenhouse companies, add api: field for faster structured JSON access.
409
+ #
410
+ # Per-ATS api: field shapes (see modes/scan.md for full endpoint docs):
411
+ #
412
+ # Greenhouse: api: https://boards-api.greenhouse.io/v1/boards/{slug}/jobs
413
+ # Ashby: api: https://api.ashbyhq.com/posting-api/job-board/{slug}?includeCompensation=true
414
+ # Lever: api: https://api.lever.co/v0/postings/{slug}?mode=json
415
+ # SmartRecruiters: api: https://api.smartrecruiters.com/v1/companies/{company}/postings
416
+ # Workday: api_type: workday (requires workday_subdomain, workday_pod, workday_tenant, workday_site)
417
+ #
418
+ # Workday schema (finicky — POST with JSON body, tenant + site vary per company):
419
+ # - name: NVIDIA
420
+ # careers_url: https://nvidia.wd5.myworkdayjobs.com/NVIDIAExternalCareerSite
421
+ # api_type: workday
422
+ # workday_subdomain: nvidia # hostname prefix
423
+ # workday_pod: wd5 # data-center pod (wd1/wd3/wd5/...)
424
+ # workday_tenant: nvidia # usually same as subdomain
425
+ # workday_site: NVIDIAExternalCareerSite # public site name — read from careers_url path
426
+ # tags: ["chips", "ai-lab", "us"]
427
+ # enabled: true
428
+ #
429
+ # Built API URL: https://{subdomain}.{pod}.myworkdayjobs.com/wday/cxs/{tenant}/{site}/jobs
430
+ # POST body: {"appliedFacets": {}, "limit": 20, "offset": 0, "searchText": ""}
431
+ #
432
+ # SmartRecruiters schema (simple GET):
433
+ # - name: Visa
434
+ # careers_url: https://careers.smartrecruiters.com/Visa1
435
+ # api: https://api.smartrecruiters.com/v1/companies/Visa1/postings
436
+ # ats: smartrecruiters
437
+ # tags: ["fintech", "enterprise", "us"]
438
+ # enabled: true
409
439
 
410
440
  tracked_companies:
411
441
 
@@ -3138,3 +3168,80 @@ tracked_companies:
3138
3168
  notes: "TypeScript ORM. Remote."
3139
3169
  tags: ["developer-tools", "open-source", "remote-first"]
3140
3170
  enabled: true
3171
+
3172
+ # -- Cross-company aggregator feeds --
3173
+ # Aggregator boards that expose a single feed covering hundreds of companies.
3174
+ # Unlike tracked_companies, these are NOT per-company — the scanner pulls the
3175
+ # whole feed, applies a pre-filter (category / tags), then runs each posting
3176
+ # through the global title_filter above.
3177
+ #
3178
+ # Types:
3179
+ # - weworkremotely → RSS 2.0 XML per category
3180
+ # - remoteok → JSON array, first element is a legal disclaimer (skipped)
3181
+ #
3182
+ # See modes/scan.md → "Level 2 — ATS / Aggregator APIs" for full shape docs.
3183
+
3184
+ cross_company_feeds:
3185
+
3186
+ # -- We Work Remotely (RSS per category) --
3187
+ # Feed IDs map to https://weworkremotely.com/categories/{id}.rss
3188
+ - name: WeWorkRemotely — Programming
3189
+ type: weworkremotely
3190
+ feed: remote-programming-jobs
3191
+ url: https://weworkremotely.com/categories/remote-programming-jobs.rss
3192
+ # Optional pre-filter applied BEFORE title_filter. Drops obviously
3193
+ # off-target entries without cluttering scan-history.
3194
+ category_filter:
3195
+ positive: [] # empty = accept all from this feed
3196
+ negative: ["WordPress", "PHP", "Shopify Theme"]
3197
+ enabled: true
3198
+
3199
+ - name: WeWorkRemotely — DevOps & SysAdmin
3200
+ type: weworkremotely
3201
+ feed: remote-devops-sysadmin-jobs
3202
+ url: https://weworkremotely.com/categories/remote-devops-sysadmin-jobs.rss
3203
+ enabled: true
3204
+
3205
+ - name: WeWorkRemotely — Product
3206
+ type: weworkremotely
3207
+ feed: remote-product-jobs
3208
+ url: https://weworkremotely.com/categories/remote-product-jobs.rss
3209
+ enabled: true
3210
+
3211
+ - name: WeWorkRemotely — All Other
3212
+ type: weworkremotely
3213
+ feed: all-other-remote-jobs
3214
+ url: https://weworkremotely.com/categories/all-other-remote-jobs.rss
3215
+ # This category is very broad — start disabled, enable if signal is good.
3216
+ enabled: false
3217
+
3218
+ # -- RemoteOK (single JSON feed, filter by tags) --
3219
+ # One endpoint, 100 newest postings. Filter by tags BEFORE title_filter,
3220
+ # otherwise you burn the title_filter pass on ~80% irrelevant rows.
3221
+ - name: RemoteOK — AI & Engineering
3222
+ type: remoteok
3223
+ url: https://remoteok.com/api
3224
+ # Required — RemoteOK returns 403 without a browser-like UA.
3225
+ user_agent: "Mozilla/5.0 (compatible; JobForgeScanner/1.0; +https://github.com/razroo/JobForge)"
3226
+ # Pre-filter on entry.tags (case-insensitive substring match against the
3227
+ # array). Row passes if ANY positive matches AND NO negative matches.
3228
+ tag_filter:
3229
+ positive:
3230
+ - "engineer"
3231
+ - "engineering"
3232
+ - "ai"
3233
+ - "ml"
3234
+ - "llm"
3235
+ - "python"
3236
+ - "golang"
3237
+ - "typescript"
3238
+ - "devops"
3239
+ - "platform"
3240
+ - "product manager"
3241
+ negative:
3242
+ - "wordpress"
3243
+ - "php"
3244
+ - "marketing"
3245
+ - "sales"
3246
+ - "customer support"
3247
+ enabled: true
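The `tag_filter` / `category_filter` rule described in these comments (case-insensitive substring match; pass if ANY positive matches AND NO negative matches; an empty `positive` list accepts everything) can be sketched as one predicate. `passesPreFilter` is a hypothetical name; for RemoteOK the values would be `entry.tags`, while which WeWorkRemotely field `category_filter` inspects is an assumption left to the caller (the title is a natural choice).

```javascript
function passesPreFilter(values, { positive = [], negative = [] } = {}) {
  const hay = values.map((v) => String(v).toLowerCase());
  const hits = (term) => hay.some((v) => v.includes(String(term).toLowerCase()));
  const anyPositive = positive.length === 0 || positive.some(hits);
  const noNegative = !negative.some(hits);
  return anyPositive && noNegative;
}
```

Substring semantics are broad: a short term like `"ai"` also matches `"email"` or `"maintenance"`, and `"ml"` matches `"html"`, so the global `title_filter` still matters after this pass.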
@@ -55,6 +55,12 @@ states:
55
55
  description: Discarded by candidate or offer closed
56
56
  dashboard_group: discarded
57
57
 
58
+ - id: failed
59
+ label: Failed
60
+ aliases: []
61
+ description: Submission attempted but blocked by portal (spam-filter, anti-bot, broken form). May be recoverable via manual retry.
62
+ dashboard_group: failed
63
+
58
64
  - id: skip
59
65
  label: SKIP
60
66
  aliases: [skip]