job-forge 2.14.36 → 2.14.38
- package/.claude/agents/general-free.md +3 -2
- package/.codex/config.toml +1 -1
- package/.cursor/mcp.json +1 -1
- package/.cursor/rules/agent-general-free.mdc +3 -2
- package/.cursor/rules/main.mdc +3 -3
- package/.mcp.json +1 -1
- package/.opencode/agents/general-free.md +3 -2
- package/.opencode/instructions.md +8 -0
- package/.opencode/skills/job-forge.md +1 -1
- package/AGENTS.md +3 -3
- package/CLAUDE.md +3 -3
- package/README.md +2 -2
- package/batch/README.md +7 -6
- package/batch/batch-runner.sh +2 -2
- package/bin/create-job-forge.mjs +6 -3
- package/bin/sync.mjs +35 -1
- package/config/profile.example.yml +8 -5
- package/docs/ARCHITECTURE.md +9 -8
- package/docs/CUSTOMIZATION.md +6 -6
- package/docs/SETUP.md +1 -1
- package/iso/agents/general-free.md +3 -2
- package/iso/commands/job-forge.md +1 -1
- package/iso/instructions.md +3 -3
- package/iso/instructions.opencode.md +8 -0
- package/iso/mcp.json +1 -1
- package/lib/jobforge-observability.mjs +847 -0
- package/modes/apply.md +4 -3
- package/modes/auto-pipeline.md +2 -2
- package/modes/pipeline.md +2 -2
- package/modes/reference-geometra.md +5 -5
- package/modes/reference-portals.md +10 -10
- package/modes/scan.md +3 -3
- package/opencode.json +3 -2
- package/package.json +8 -5
- package/scripts/batch-orchestrator.mjs +158 -9
- package/scripts/check-iso-smoke.mjs +3 -0
- package/scripts/guard.mjs +114 -190
- package/scripts/telemetry.mjs +214 -450
- package/scripts/trace.mjs +103 -232
package/modes/apply.md
CHANGED
|
@@ -42,8 +42,8 @@ Live application assistant. Reads the active application form in Chrome (via Geo
|
|
|
42
42
|
- [D6] Use `fieldLabel` over `fieldId` everywhere it works.
|
|
43
43
|
why: labels are stable across DOM refreshes; IDs are regenerated
|
|
44
44
|
|
|
45
|
-
- [D7] If the orchestrator says a proxy is configured, read the top-level `proxy:` block from `config/profile.yml` and pass that object into every `geometra_connect` call — including Call 3 of the recovery sequence. If the task prompt includes a legacy inline `proxy` object, pass it through
|
|
46
|
-
why: class-B Ashby / Cloudflare-fronted portals need a residential outbound IP
|
|
45
|
+
- [D7] If the orchestrator says a proxy is configured, read the top-level `proxy:` block from `config/profile.yml` and pass that object plus `stealth: true` into every `geometra_connect` call — including Call 3 of the recovery sequence. If the task prompt includes a legacy inline `proxy` object, pass it through and still set `stealth: true`, but do not echo credentials in status text. If absent, run with `stealth: true` and no proxy; never invent a proxy URL.
|
|
46
|
+
why: class-B Ashby / Cloudflare-fronted portals need a residential outbound IP plus a stealth Chromium fingerprint. Geometra MCP v1.59.0 added proxy plumbing, and v1.61.3 added CloakBrowser stealth Chromium via `stealth: true`; the orchestrator owns the config pipe. See "BYO Residential Proxy" in modes/reference-portals.md.
|
|
47
47
|
|
|
48
48
|
- [D8] Upgrade application routing to `@general-paid` when the offer score is ≥ 4.0/5, the user flags "top-tier", "dream job", or "high-stakes", or the candidate is late-stage/post-screen.
|
|
49
49
|
why: high-stakes applications need the quality-sensitive prompt and medium reasoning budget even though OpenCode now routes both application tiers through DeepSeek V4 Flash by default
|
|
@@ -53,7 +53,7 @@ Live application assistant. Reads the active application form in Chrome (via Geo
|
|
|
53
53
|
|
|
54
54
|
## Procedure
|
|
55
55
|
|
|
56
|
-
1. `geometra_connect` + `geometra_page_model`; thread `proxy` if present [D7]; no WebFetch [D5].
|
|
56
|
+
1. `geometra_connect` with `stealth: true` + `geometra_page_model`; thread `proxy` if present [D7]; no WebFetch [D5].
|
|
57
57
|
2. If Geometra is unavailable, ask for screenshot or pasted text [D2].
|
|
58
58
|
3. Extract company + role; Grep `reports/` for a matching evaluation.
|
|
59
59
|
4. Load full report + Section G if present.
|
|
@@ -350,6 +350,7 @@ Call 3: geometra_connect({
|
|
|
350
350
|
isolated: true,
|
|
351
351
|
headless: true,
|
|
352
352
|
slowMo: 350,
|
|
353
|
+
stealth: true,
|
|
353
354
|
proxy: <pass through from task prompt if present; omit otherwise>
|
|
354
355
|
})
|
|
355
356
|
Call 4: geometra_run_actions({
|
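Rule [D7] in the apply.md hunks above reads as a small decision table: `stealth: true` on every call, `proxy` only when `config/profile.yml` or a legacy inline object supplies one, and no credentials in status text. A minimal sketch of that logic follows. The helpers `buildConnectArgs` and `redactForStatus` and the parsed `profile` object are hypothetical and not part of job-forge or Geometra; only the option names (`isolated`, `headless`, `slowMo`, `stealth`, `proxy`) come from the diff.

```javascript
// Hypothetical helper: assembles the options object for a geometra_connect call.
// Option names are taken from the diff; the function and the `profile` shape are assumptions.
function buildConnectArgs(pageUrl, profile = {}, inlineProxy = undefined) {
  const args = { pageUrl, isolated: true, headless: true, slowMo: 350, stealth: true };
  // Prefer the top-level proxy block from profile.yml; fall back to a legacy inline object.
  const proxy = profile.proxy ?? inlineProxy;
  // Never invent a proxy: the key is omitted entirely when nothing is configured.
  if (proxy && typeof proxy.server === 'string' && proxy.server !== '') {
    args.proxy = { ...proxy };
  }
  return args;
}

// Hypothetical helper: returns a copy that is safe to echo in status text.
function redactForStatus(args) {
  if (!args.proxy) return { ...args };
  const { username, password, ...rest } = args.proxy;
  const credentials = username || password ? '[redacted]' : 'none';
  return { ...args, proxy: { ...rest, credentials } };
}

const direct = buildConnectArgs('https://example.com/apply');
const proxied = buildConnectArgs('https://example.com/apply', {
  proxy: { server: 'http://proxy.example:8080', username: 'user', password: 's3cret' },
});
console.log(JSON.stringify(redactForStatus(direct)));
console.log(JSON.stringify(redactForStatus(proxied)));
```

The object handed to `geometra_connect` keeps the proxy block unchanged. Only the copy used for logging drops the username and password.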
package/modes/auto-pipeline.md
CHANGED
|
@@ -9,7 +9,7 @@ Fetch the JD content once. If the input is a **URL** (not pasted JD text), fetch
|
|
|
9
9
|
**Pick exactly one method, in this priority order:**
|
|
10
10
|
|
|
11
11
|
1. **Greenhouse JSON API (first try, if the URL is Greenhouse-backed):** If the pipeline.md entry carries `| gh={slug}/{id}` OR the URL host matches `*.greenhouse.io` / a known Greenhouse customer front-end (`*.pinterestcareers.com`, `okta.com/company/careers/opportunity/*`, `samsara.com/company/careers/roles/*`, `zoominfo.com/careers?gh_jid=*`, `collibra.com/.../?gh_jid=*`, `careers.toasttab.com/jobs?gh_jid=*`, `careers.airbnb.com/positions/*?gh_jid=*`, `coinbase.com/careers/positions/*?gh_jid=*`, `instacart.careers/job/?gh_jid=*`), extract `slug` and `id` and WebFetch `https://boards-api.greenhouse.io/v1/boards/{slug}/jobs/{id}`. 200 + JSON with `content` is the authoritative JD. 404 = genuinely closed (mark CLOSED and stop). **OpenCode WebFetch compatibility:** do not pass `format: "json"`; omit `format` or use `format: "text"` and parse the returned JSON text. **If 200, STOP — do not fall back to Geometra or WebFetch of the front-end.** The API is faster, cheaper (no Geometra session), and never returns a bot-shell.
|
|
12
|
-
2. **Geometra MCP:** Most non-Greenhouse job portals (Lever, Ashby, Workday) are SPAs. Use `geometra_connect` + `geometra_page_model` to render and read the JD. **If this returns non-empty JD text, STOP — do not WebFetch the same URL.**
|
|
12
|
+
2. **Geometra MCP:** Most non-Greenhouse job portals (Lever, Ashby, Workday) are SPAs. Use `geometra_connect({ ..., stealth: true })` + `geometra_page_model` to render and read the JD. **If this returns non-empty JD text, STOP — do not WebFetch the same URL.**
|
|
13
13
|
3. **WebFetch (only if Geometra is unavailable OR returned only a shell with no JD text):** For static pages (ZipRecruiter, WeLoveProduct, company career pages).
|
|
14
14
|
4. **WebSearch (only if methods 1–3 all failed):** Search for the role title + company on secondary portals that index the JD in static HTML.
|
|
15
15
|
|
|
@@ -38,7 +38,7 @@ Execute the full `pdf` pipeline (read `modes/pdf.md`).
|
|
|
38
38
|
|
|
39
39
|
Generate draft answers for the application form when the final score is >= 3.5. If the final score is >= 3.5 (per Canonical Scoring Model thresholds in `_shared.md`), generate draft answers for the application form:
|
|
40
40
|
|
|
41
|
-
1. **Extract form questions**: Use Geometra MCP (`geometra_connect` + `geometra_form_schema`) to discover all form fields. **Reuse the same `sessionId` from Step 0** when the apply URL is the same rendered page; only connect again if the prior session ended or the URL changed. If questions cannot be extracted, use the generic questions.
|
|
41
|
+
1. **Extract form questions**: Use Geometra MCP (`geometra_connect({ ..., stealth: true })` + `geometra_form_schema`) to discover all form fields. **Reuse the same `sessionId` from Step 0** when the apply URL is the same rendered page; only connect again if the prior session ended or the URL changed. If questions cannot be extracted, use the generic questions.
|
|
42
42
|
2. **Generate answers** following the tone guidelines (see below).
|
|
43
43
|
3. **Save in the report** as a `## G) Draft Application Answers` section.
|
|
44
44
|
|
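The first method in the priority list above (Greenhouse JSON API) can be sketched for entries that carry a `| gh={slug}/{id}` suffix. The URL template and the 200/404 interpretation come from the diff. The function names, the slug character set in the regex, and the `FALLBACK` label are assumptions made for illustration, and the host-matching path for customer front-ends is not covered here.

```javascript
// Hypothetical helper: derives the Greenhouse boards API URL from a pipeline.md entry.
function greenhouseApiUrl(entry) {
  // Assumed slug alphabet: letters, digits, underscore, hyphen.
  const match = /\|\s*gh=([A-Za-z0-9_-]+)\/(\d+)/.exec(entry);
  if (!match) return null; // not Greenhouse-tagged: move on to the next method
  const [, slug, id] = match;
  return `https://boards-api.greenhouse.io/v1/boards/${slug}/jobs/${id}`;
}

// Hypothetical helper: interprets a response fetched as text (no `format: "json"`).
function interpretGreenhouse(status, bodyText) {
  if (status === 404) return { state: 'CLOSED' };
  if (status === 200) {
    try {
      const job = JSON.parse(bodyText);
      if (typeof job.content === 'string' && job.content !== '') {
        return { state: 'LIVE', jd: job.content };
      }
    } catch {
      // Unparseable or unexpected body: treat as inconclusive, not as closed.
    }
  }
  return { state: 'FALLBACK' };
}

const entry = '- [ ] https://example.com/careers/123 | gh=acme/4567890';
console.log(greenhouseApiUrl(entry));
console.log(interpretGreenhouse(200, '{"content":"<p>Role description</p>"}').state);
console.log(interpretGreenhouse(404, '').state);
```

Only a 404 maps to `CLOSED`. Any other non-200 or malformed response falls through to the next method, which avoids the false CLOSED marks the diff warns about.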
package/modes/pipeline.md
CHANGED
|
@@ -7,7 +7,7 @@ Processes accumulated job offer URLs from `data/pipeline.md`. The user adds URLs
|
|
|
7
7
|
1. **Read** `data/pipeline.md` → find `- [ ]` items in the "Pending" section
|
|
8
8
|
2. **For each pending URL**:
|
|
9
9
|
a. Calculate the next sequential `REPORT_NUM` by running `npx job-forge next-num` (scans `reports/`, day file `#` columns, and `batch/tracker-additions/` — do NOT derive from `reports/` alone)
|
|
10
|
-
b. **Extract JD** using Geometra MCP (geometra_connect + geometra_page_model) → WebFetch → WebSearch
|
|
10
|
+
b. **Extract JD** using Geometra MCP (`geometra_connect({ ..., stealth: true })` + geometra_page_model) → WebFetch → WebSearch
|
|
11
11
|
c. If the URL is not accessible → mark as `- [!]` with a note and continue
|
|
12
12
|
d. **Run full auto-pipeline**: A-F Evaluation → Report .md → PDF (if score >= 3.0, per `_shared.md` thresholds) → Draft answers (if score >= 3.5) → Tracker
|
|
13
13
|
e. **Move from "Pending" to "Processed"**: `- [x] #NNN | URL | Company | Role | Score/5 | PDF ✅/❌`
|
|
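Steps d and e above combine two score thresholds with a fixed line format for processed entries. A minimal sketch, assuming the thresholds quoted in this hunk (PDF at score >= 3.0, draft answers at score >= 3.5; `_shared.md` remains the authoritative source) and assuming `#NNN` means a zero-padded three-digit number. Both function names are hypothetical.

```javascript
// Hypothetical helper: which pipeline stages run for a given final score.
function stepsForScore(score) {
  const steps = ['evaluation', 'report'];
  if (score >= 3.0) steps.push('pdf');
  if (score >= 3.5) steps.push('draft-answers');
  steps.push('tracker');
  return steps;
}

// Hypothetical helper: formats the "Processed" line shown in step e.
// Zero-padding to three digits is an assumption based on the `#NNN` placeholder.
function processedLine({ num, url, company, role, score, pdf }) {
  const nnn = String(num).padStart(3, '0');
  return `- [x] #${nnn} | ${url} | ${company} | ${role} | ${score}/5 | PDF ${pdf ? '✅' : '❌'}`;
}

console.log(stepsForScore(3.2).join(' > '));
console.log(processedLine({
  num: 7, url: 'https://example.com/jobs/1', company: 'Acme', role: 'Engineer', score: 4.2, pdf: true,
}));
```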
@@ -34,7 +34,7 @@ Processes accumulated job offer URLs from `data/pipeline.md`. The user adds URLs
|
|
|
34
34
|
## Detect JD From URL
|
|
35
35
|
|
|
36
36
|
1. **Greenhouse JSON API (FIRST, when the entry has `| gh={slug}/{id}` OR the host looks Greenhouse-backed):** WebFetch `https://boards-api.greenhouse.io/v1/boards/{slug}/jobs/{id}`. 200 + JSON with `content` = LIVE, use it as the JD; 404 = genuinely CLOSED (mark `- [!]` and continue). **OpenCode WebFetch compatibility:** do not pass `format: "json"`; omit `format` or use `format: "text"` and parse the returned JSON text. Bot-hostile customer fronts (`pinterestcareers.com`, `okta.com`, `samsara.com`, `zoominfo.com`, `collibra.com`, `careers.toasttab.com`, `careers.airbnb.com`, `coinbase.com`, `instacart.careers`, `careers.toasttab.com`) MUST be verified via this API first — WebFetch/Geometra of those domains returns a shell or 403 and causes false CLOSED marks.
|
|
37
|
-
2. **Geometra MCP:** `geometra_connect` + `geometra_page_model`. Works with non-Greenhouse SPAs (Lever, Ashby, Workday), uses fewer tokens than raw DOM snapshots.
|
|
37
|
+
2. **Geometra MCP:** `geometra_connect({ ..., stealth: true })` + `geometra_page_model`. Works with non-Greenhouse SPAs (Lever, Ashby, Workday), uses fewer tokens than raw DOM snapshots.
|
|
38
38
|
3. **WebFetch (fallback):** For static pages or when Geometra is not available.
|
|
39
39
|
4. **WebSearch (last resort):** Search on secondary portals that index the JD.
|
|
40
40
|
|
|
package/modes/reference-geometra.md
CHANGED
|
@@ -50,7 +50,7 @@ These blocks come from two distinct root causes and require different responses:
|
|
|
50
50
|
|
|
51
51
|
**Rule — do NOT loop retrying a class B block.** One retry with `imeFriendly: true` is the correct test for class A. If the same spam message fires after a clean `imeFriendly` refill, stop, mark Failed, move on. Repeated retries waste subagent time and do not change the outcome.
|
|
52
52
|
|
|
53
|
-
**Class B fix — BYO residential proxy
|
|
53
|
+
**Class B fix — BYO residential proxy + stealth Chromium.** When the candidate has configured `proxy:` in `config/profile.yml`, every `geometra_connect` call threads that proxy through to Chromium, which flips the outbound IP from datacenter to residential/mobile. JobForge also passes `stealth: true` so Geometra MCP >=1.61.3 launches CloakBrowser's patched Chromium instead of stock Playwright Chromium. See the "BYO Residential Proxy" reference section below. Without a configured proxy, stealth still helps browser fingerprinting, but the outbound IP remains datacenter.
|
|
54
54
|
|
|
55
55
|
**Known-block Ashby tenants (2026-04-19 empirical observations).** These tenants fired class B on every attempted submit from a headless datacenter-IP proxy. Orchestrators planning apply dispatches should assume these tenants will Fail in headless — prioritize other portals, or skip same-tenant siblings after a confirmed class B to avoid burning subagent slots:
|
|
56
56
|
|
|
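The retry rule in this hunk (one `imeFriendly: true` retry to test for class A, then stop) can be expressed as a small decision function over the attempt history. This is a sketch under assumptions: the `history` entry shape, the action names, and the choice to make the first attempt without `imeFriendly` are all hypothetical and not taken from job-forge.

```javascript
// Hypothetical decision function. `history` is an array of past submit attempts,
// each shaped like { imeFriendly: boolean, ok: boolean }.
function decideNext(history) {
  const last = history[history.length - 1];
  if (!last) return { action: 'submit', imeFriendly: false };
  if (last.ok) return { action: 'done' };
  const triedImeFriendly = history.some((attempt) => attempt.imeFriendly);
  // Exactly one imeFriendly retry: it distinguishes class A (validation lag) from class B.
  if (!triedImeFriendly) return { action: 'submit', imeFriendly: true };
  // The block survived a clean imeFriendly refill: treat as class B and do not loop.
  return { action: 'mark-failed', reason: 'block persisted after imeFriendly refill' };
}

console.log(decideNext([]));
console.log(decideNext([{ imeFriendly: false, ok: false }]));
console.log(decideNext([{ imeFriendly: false, ok: false }, { imeFriendly: true, ok: false }]));
```

Because the function looks at the whole history rather than a counter, it never proposes a second `imeFriendly` retry, however many times it is called.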
@@ -138,12 +138,12 @@ When running multiple application forms in parallel, each `geometra_connect` MUS
|
|
|
138
138
|
|
|
139
139
|
**Correct parallel pattern:**
|
|
140
140
|
```javascript
|
|
141
|
-
geometra_connect({ pageUrl: "https://...", isolated: true, headless: true, slowMo: 350 })
|
|
141
|
+
geometra_connect({ pageUrl: "https://...", isolated: true, headless: true, slowMo: 350, stealth: true })
|
|
142
142
|
```
|
|
143
143
|
|
|
144
144
|
**Wrong:** running `geometra_connect` without `isolated: true` when submitting multiple forms concurrently. The forms may share state and produce incorrect submissions.
|
|
145
145
|
|
|
146
|
-
**With a configured proxy,** add `proxy: { server, username?, password?, bypass? }` to the same call — see "BYO Residential Proxy" below. The reusable-proxy pool is partitioned by proxy identity, so mixing direct and proxied sessions across parallel rounds is safe.
|
|
146
|
+
**With a configured proxy,** add `proxy: { server, username?, password?, bypass? }` to the same call — see "BYO Residential Proxy" below. The reusable-proxy pool is partitioned by proxy identity, so mixing direct and proxied sessions across parallel rounds is safe. Keep `stealth: true` either way so JobForge uses Geometra's CloakBrowser Chromium path for portal sessions.
|
|
147
147
|
|
|
148
148
|
### Session Reuse — When Subagents Cannot Reach Existing Sessions
|
|
149
149
|
|
|
@@ -185,7 +185,7 @@ Every subagent that uses Geometra must run these THREE tool calls as its FIRST t
|
|
|
185
185
|
```
|
|
186
186
|
Step 1: geometra_list_sessions()
|
|
187
187
|
Step 2: geometra_disconnect({ closeBrowser: true })
|
|
188
|
-
Step 3: geometra_connect({ pageUrl: "<the URL the orchestrator gave you>", isolated: true, headless: true, slowMo: 350 })
|
|
188
|
+
Step 3: geometra_connect({ pageUrl: "<the URL the orchestrator gave you>", isolated: true, headless: true, slowMo: 350, stealth: true })
|
|
189
189
|
```
|
|
190
190
|
|
|
191
191
|
**If the orchestrator says proxy is configured,** read the top-level
|
|
@@ -193,7 +193,7 @@ Step 3: geometra_connect({ pageUrl: "<the URL the orchestrator gave you>", isol
|
|
|
193
193
|
|
|
194
194
|
```
|
|
195
195
|
Step 3: geometra_connect({
|
|
196
|
-
pageUrl: "<URL>", isolated: true, headless: true, slowMo: 350,
|
|
196
|
+
pageUrl: "<URL>", isolated: true, headless: true, slowMo: 350, stealth: true,
|
|
197
197
|
proxy: { server: "...", username: "...", password: "...", bypass: "..." }
|
|
198
198
|
})
|
|
199
199
|
```
|
|
package/modes/reference-portals.md
CHANGED
|
@@ -49,13 +49,13 @@ When a form says "enter the code we sent to your email", you MUST retrieve the c
|
|
|
49
49
|
|
|
50
50
|
---
|
|
51
51
|
|
|
52
|
-
## BYO Residential Proxy
|
|
52
|
+
## BYO Residential Proxy + Stealth Chromium
|
|
53
53
|
|
|
54
54
|
**Problem:** on 2026-04-19 cycle 4, 5/5 untested Ashby tenants and 100% of Dropbox-class Cloudflare-fronted portals fingerprint-blocked headless Chromium from datacenter IPs. `imeFriendly: true` fixes class A (React validation lag) but has zero effect on class B (environment fingerprint). There is no in-session software-only fix for class B: the server decided the session is a bot before the form response was rendered.
|
|
55
55
|
|
|
56
|
-
**Fix:** route the spawned Chromium through a residential or mobile proxy the candidate already pays for
|
|
56
|
+
**Fix:** route the spawned Chromium through a residential or mobile proxy the candidate already pays for, and launch Geometra's CloakBrowser stealth Chromium with `stealth: true`. Geometra MCP v1.59.0 added a `proxy: { server, username?, password?, bypass? }` parameter on `geometra_connect` and `geometra_prepare_browser`; v1.61.3 added `stealth: true` for CloakBrowser. The outbound IP becomes residential/mobile, the browser fingerprint moves off stock Playwright Chromium, and the class-B checks have fewer signals to trip.
|
|
57
57
|
|
|
58
|
-
**
|
|
58
|
+
**Proxy is opt-in, stealth is default.** JobForge does NOT bundle or resell proxy bandwidth — the candidate brings their own provider (Bright Data, Oxylabs, SOAX, Smartproxy, mobile hotspot, self-hosted SOCKS). Without a configured proxy, JobForge still passes `stealth: true`, but the outbound IP remains the machine or hosting environment running Chromium.
|
|
59
59
|
|
|
60
60
|
### Where the proxy config lives
|
|
61
61
|
|
|
@@ -76,15 +76,15 @@ See `config/profile.example.yml` for the commented-out template.
|
|
|
76
76
|
**Orchestrator responsibilities:**
|
|
77
77
|
|
|
78
78
|
1. On session start, read `config/profile.yml` once. If a `proxy:` block is present, remember that a proxy is configured, but do not paste username/password values into task prompts or user-visible status.
|
|
79
|
-
2. When dispatching any subagent whose work involves a `geometra_connect` call, tell it to read `config/profile.yml` and pass the top-level `proxy:` block to every `geometra_connect` call. Example dispatch prompt line: "Proxy is configured; read `config/profile.yml` and pass its top-level `proxy:` object to every `geometra_connect` call."
|
|
80
|
-
3. When the orchestrator itself opens a Chromium session (single-application interactive flow), include the same `proxy` object from `config/profile.yml` in its own `geometra_connect` call.
|
|
79
|
+
2. When dispatching any subagent whose work involves a `geometra_connect` call, tell it to read `config/profile.yml` and pass the top-level `proxy:` block plus `stealth: true` to every `geometra_connect` call. Example dispatch prompt line: "Proxy is configured; read `config/profile.yml` and pass its top-level `proxy:` object plus `stealth: true` to every `geometra_connect` call."
|
|
80
|
+
3. When the orchestrator itself opens a Chromium session (single-application interactive flow), include the same `proxy` object from `config/profile.yml` and `stealth: true` in its own `geometra_connect` call.
|
|
81
81
|
4. If `proxy:` is absent from `profile.yml`, skip the param entirely. Do NOT invent a proxy URL or leave a stale placeholder.
|
|
82
82
|
|
|
83
83
|
**Subagent responsibilities:**
|
|
84
84
|
|
|
85
|
-
1. If the task prompt says proxy is configured, read `config/profile.yml` and pass the top-level `proxy:` object through to `geometra_connect` and any `geometra_prepare_browser` calls unchanged.
|
|
86
|
-
2. If the task prompt includes a legacy inline `proxy` object, pass it through unchanged
|
|
87
|
-
3. If the task prompt does NOT mention a proxy and `config/profile.yml` has no `proxy:` block, run
|
|
85
|
+
1. If the task prompt says proxy is configured, read `config/profile.yml` and pass the top-level `proxy:` object plus `stealth: true` through to `geometra_connect` and any `geometra_prepare_browser` calls unchanged.
|
|
86
|
+
2. If the task prompt includes a legacy inline `proxy` object, pass it through unchanged and still set `stealth: true`, but never print the credentials back in status text.
|
|
87
|
+
3. If the task prompt does NOT mention a proxy and `config/profile.yml` has no `proxy:` block, run with `stealth: true` and no proxy.
|
|
88
88
|
4. Never second-guess the proxy field — if it comes from `profile.yml`, it's authoritative.
|
|
89
89
|
|
|
90
90
|
### When proxy use is load-bearing
|
|
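The orchestrator responsibilities above amount to passing a flag, not the credentials themselves: the dispatch prompt says that a proxy is configured and tells the subagent where to read it. A minimal sketch, assuming a hypothetical `dispatchProxyLine` helper and an already-parsed `profile` object. The first returned sentence is the example line quoted in the diff; the wording of the no-proxy sentence is an assumption.

```javascript
// Hypothetical helper: builds the proxy-related line of a subagent dispatch prompt.
// It inspects only whether a proxy block exists and never copies its values.
function dispatchProxyLine(profile) {
  if (profile && profile.proxy) {
    return 'Proxy is configured; read `config/profile.yml` and pass its top-level `proxy:` '
      + 'object plus `stealth: true` to every `geometra_connect` call.';
  }
  // Assumed wording for the no-proxy case; the diff only states the behavior.
  return 'No proxy is configured; pass `stealth: true` and omit `proxy` on every '
    + '`geometra_connect` call.';
}

const withProxy = dispatchProxyLine({
  proxy: { server: 'http://proxy.example:8080', username: 'user', password: 's3cret' },
});
const withoutProxy = dispatchProxyLine({});
console.log(withProxy);
console.log(withoutProxy);
```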
@@ -98,7 +98,7 @@ Apply these rules when deciding whether the proxy is worth waiting for:
|
|
|
98
98
|
|
|
99
99
|
### Pool partitioning — why mixed runs are safe
|
|
100
100
|
|
|
101
|
-
The Geometra MCP partitions its reusable-proxy pool by
|
|
101
|
+
The Geometra MCP partitions its reusable-proxy pool by proxy identity and browser flavor — proxy partitioning landed in `@geometra/mcp@1.59.0`, and stealth partitioning is available in `@geometra/mcp@1.61.3`. A direct session and a proxied session NEVER share a Chromium instance, and stock and stealth sessions do not pool together. Practical consequence: flipping `proxy:` on or off in `profile.yml` mid-session is safe — the next `geometra_connect` just opens a fresh Chromium in its own pool partition.
|
|
102
102
|
|
|
103
103
|
### Troubleshooting
|
|
104
104
|
|
|
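The partitioning described above can be pictured as a key computed per session, with sessions sharing a pooled Chromium only when their keys match. The sketch below is not Geometra's implementation. It assumes the proxy identity consists of `server`, `username`, and `bypass`, and the `poolKey` function and key format are hypothetical.

```javascript
// Hypothetical pool key: proxy identity plus browser flavor (stock vs stealth).
function poolKey({ proxy, stealth = false } = {}) {
  const flavor = stealth ? 'stealth' : 'stock';
  if (!proxy) return `direct|${flavor}`;
  // Assumed identity fields; each is encoded so separators cannot collide.
  const identity = [proxy.server, proxy.username ?? '', proxy.bypass ?? '']
    .map(encodeURIComponent)
    .join(',');
  return `proxy:${identity}|${flavor}`;
}

const directStealth = poolKey({ stealth: true });
const proxiedStealth = poolKey({ stealth: true, proxy: { server: 'http://proxy.example:8080' } });
const proxiedStock = poolKey({ stealth: false, proxy: { server: 'http://proxy.example:8080' } });
console.log(directStealth);
console.log(proxiedStealth);
console.log(proxiedStock);
```

Under this model, toggling `proxy:` between two `geometra_connect` calls changes the key, so the second call cannot land on the first call's browser instance.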
@@ -128,7 +128,7 @@ The Geometra MCP partitions its reusable-proxy pool by `(server, username, bypas
|
|
|
128
128
|
"geometra": {
|
|
129
129
|
"type": "stdio",
|
|
130
130
|
"command": "npx",
|
|
131
|
-
"args": ["-y", "@geometra/mcp"]
|
|
131
|
+
"args": ["-y", "@geometra/mcp@1.61.3"]
|
|
132
132
|
},
|
|
133
133
|
"gmail": {
|
|
134
134
|
"type": "stdio",
|
package/modes/scan.md
CHANGED
|
@@ -25,7 +25,7 @@ Read `portals.yml` which contains:
|
|
|
25
25
|
|
|
26
26
|
### Use Level 1 — Direct Geometra (PRIMARY)
|
|
27
27
|
|
|
28
|
-
**For each company in `tracked_companies`:** Connect to its `careers_url` with Geometra MCP (`geometra_connect` + `geometra_page_model` / `geometra_list_items`), read ALL visible job listings, and extract the title + URL of each one. Direct Geometra is the most reliable method because:
|
|
28
|
+
**For each company in `tracked_companies`:** Connect to its `careers_url` with Geometra MCP (`geometra_connect({ ..., stealth: true })` + `geometra_page_model` / `geometra_list_items`), read ALL visible job listings, and extract the title + URL of each one. Direct Geometra is the most reliable method because:
|
|
29
29
|
|
|
30
30
|
- It sees the page in real time (not cached Google results).
|
|
31
31
|
- It works with SPAs (Ashby, Lever, Workday).
|
|
@@ -138,7 +138,7 @@ The levels are additive — all are executed, results are merged and deduplicate
|
|
|
138
138
|
|
|
139
139
|
4. **Level 1 — Geometra scan** (sequential, or ≤2 parallel via `task` subagents per Hard Limit #1 in `AGENTS.md`):
|
|
140
140
|
For each company in `tracked_companies` with `enabled: true` and `careers_url` defined:
|
|
141
|
-
a. `geometra_connect` to the `careers_url`
|
|
141
|
+
a. `geometra_connect` to the `careers_url` with `stealth: true`
|
|
142
142
|
b. `geometra_page_model` or `geometra_list_items` to read all job listings
|
|
143
143
|
c. If the page has filters/departments, navigate the relevant sections
|
|
144
144
|
d. For each job listing extract: `{title, url, company}`
|
|
@@ -317,7 +317,7 @@ Each company in `tracked_companies` MUST have a `careers_url` — the direct URL
|
|
|
317
317
|
**If `careers_url` doesn't exist** for a company:
|
|
318
318
|
1. Try the pattern for its known platform
|
|
319
319
|
2. If that fails, do a quick WebSearch: `"{company}" careers jobs`
|
|
320
|
-
3. Navigate with Geometra (`geometra_connect`) to confirm it works
|
|
320
|
+
3. Navigate with Geometra (`geometra_connect` with `stealth: true`) to confirm it works
|
|
321
321
|
4. **Save the found URL in portals.yml** for future scans
|
|
322
322
|
|
|
323
323
|
**If `careers_url` returns 404 or redirect:**
|
package/opencode.json
CHANGED
|
@@ -18,7 +18,7 @@
|
|
|
18
18
|
"command": [
|
|
19
19
|
"npx",
|
|
20
20
|
"-y",
|
|
21
|
-
"@geometra/mcp"
|
|
21
|
+
"@geometra/mcp@1.61.3"
|
|
22
22
|
],
|
|
23
23
|
"environment": {}
|
|
24
24
|
},
|
|
@@ -46,7 +46,8 @@
|
|
|
46
46
|
}
|
|
47
47
|
},
|
|
48
48
|
"instructions": [
|
|
49
|
-
"templates/states.yml"
|
|
49
|
+
"templates/states.yml",
|
|
50
|
+
".opencode/instructions.md"
|
|
50
51
|
],
|
|
51
52
|
"small_model": "opencode-go/deepseek-v4-flash"
|
|
52
53
|
}
|
package/package.json
CHANGED
|
@@ -1,6 +1,6 @@
|
|
|
1
1
|
{
|
|
2
2
|
"name": "job-forge",
|
|
3
|
-
"version": "2.14.
|
|
3
|
+
"version": "2.14.38",
|
|
4
4
|
"description": "AI-powered job search pipeline built on opencode",
|
|
5
5
|
"type": "module",
|
|
6
6
|
"bin": {
|
|
@@ -173,6 +173,9 @@
|
|
|
173
173
|
"engines": {
|
|
174
174
|
"node": ">=20.6.0"
|
|
175
175
|
},
|
|
176
|
+
"overrides": {
|
|
177
|
+
"fast-uri": "3.1.2"
|
|
178
|
+
},
|
|
176
179
|
"dependencies": {
|
|
177
180
|
"@razroo/iso-cache": "^0.1.0",
|
|
178
181
|
"@razroo/iso-canon": "^0.1.0",
|
|
@@ -185,21 +188,21 @@
|
|
|
185
188
|
"@razroo/iso-ledger": "^0.1.0",
|
|
186
189
|
"@razroo/iso-lineage": "^0.1.0",
|
|
187
190
|
"@razroo/iso-migrate": "^0.1.0",
|
|
188
|
-
"@razroo/iso-orchestrator": "^0.
|
|
191
|
+
"@razroo/iso-orchestrator": "^0.2.0",
|
|
189
192
|
"@razroo/iso-postflight": "^0.1.0",
|
|
190
193
|
"@razroo/iso-preflight": "^0.1.0",
|
|
191
194
|
"@razroo/iso-prioritize": "^0.1.0",
|
|
192
195
|
"@razroo/iso-redact": "^0.1.0",
|
|
193
196
|
"@razroo/iso-score": "^0.1.0",
|
|
194
197
|
"@razroo/iso-timeline": "^0.1.0",
|
|
195
|
-
"@razroo/iso-trace": "^0.
|
|
198
|
+
"@razroo/iso-trace": "^0.5.0",
|
|
196
199
|
"playwright": "^1.58.1"
|
|
197
200
|
},
|
|
198
201
|
"devDependencies": {
|
|
199
202
|
"@razroo/agentmd": "^0.3.0",
|
|
200
|
-
"@razroo/iso": "^0.
|
|
203
|
+
"@razroo/iso": "^0.3.1",
|
|
201
204
|
"@razroo/iso-eval": "^0.4.0",
|
|
202
|
-
"@razroo/iso-harness": "^0.
|
|
205
|
+
"@razroo/iso-harness": "^0.8.0",
|
|
203
206
|
"@razroo/iso-route": "^0.5.3"
|
|
204
207
|
}
|
|
205
208
|
}
|
|
package/scripts/batch-orchestrator.mjs
CHANGED
|
@@ -8,6 +8,7 @@
|
|
|
8
8
|
* - idempotent bundle execution keyed by URL + retry count
|
|
9
9
|
* - bounded fan-out through workflow.forEach(..., { maxParallel })
|
|
10
10
|
* - mutexed state/report-number writes across parallel workers
|
|
11
|
+
* - renewable leases + heartbeats for worker liveness inspection
|
|
11
12
|
*/
|
|
12
13
|
|
|
13
14
|
import { spawn, spawnSync } from 'node:child_process';
|
|
@@ -46,12 +47,13 @@ const DEFAULT_WORKFLOW_ID = 'jobforge-batch';
|
|
|
46
47
|
const STATE_HEADER = 'id\turl\tstatus\tstarted_at\tcompleted_at\treport_num\tscore\terror\tretries';
|
|
47
48
|
|
|
48
49
|
function usage() {
|
|
49
|
-
console.log(`job-forge batch runner - process job offers in batch via
|
|
50
|
-
Uses
|
|
50
|
+
console.log(`job-forge batch runner - process job offers in batch via AI CLI workers
|
|
51
|
+
Uses the selected runner's default project configuration.
|
|
51
52
|
|
|
52
53
|
Usage: batch-runner.sh [OPTIONS]
|
|
53
54
|
|
|
54
55
|
Options:
|
|
56
|
+
--runner NAME Worker CLI: opencode or codex (default: ${process.env.JOBFORGE_BATCH_RUNNER || 'opencode'})
|
|
55
57
|
--parallel N Number of parallel workers (default: 1)
|
|
56
58
|
--bundle-size N Offers per worker invocation (default: 5, use 1 for
|
|
57
59
|
legacy per-offer mode). Each worker processes N
|
|
@@ -74,6 +76,7 @@ Files:
|
|
|
74
76
|
|
|
75
77
|
function parseArgs(argv) {
|
|
76
78
|
const options = {
|
|
79
|
+
runner: parseRunner(process.env.JOBFORGE_BATCH_RUNNER || 'opencode'),
|
|
77
80
|
parallel: 1,
|
|
78
81
|
dryRun: false,
|
|
79
82
|
retryFailed: false,
|
|
@@ -94,6 +97,9 @@ function parseArgs(argv) {
|
|
|
94
97
|
};
|
|
95
98
|
|
|
96
99
|
switch (arg) {
|
|
100
|
+
case '--runner':
|
|
101
|
+
options.runner = parseRunner(next());
|
|
102
|
+
break;
|
|
97
103
|
case '--parallel':
|
|
98
104
|
options.parallel = parsePositiveInt(next(), '--parallel');
|
|
99
105
|
break;
|
|
@@ -128,6 +134,12 @@ function parseArgs(argv) {
|
|
|
128
134
|
return options;
|
|
129
135
|
}
|
|
130
136
|
|
|
137
|
+
function parseRunner(value) {
|
|
138
|
+
const runner = String(value || '').trim().toLowerCase();
|
|
139
|
+
if (runner === 'opencode' || runner === 'codex') return runner;
|
|
140
|
+
throw new Error(`--runner must be one of: opencode, codex`);
|
|
141
|
+
}
|
|
142
|
+
|
|
131
143
|
function parsePositiveInt(value, label) {
|
|
132
144
|
const n = Number.parseInt(value, 10);
|
|
133
145
|
if (!Number.isInteger(n) || n < 1) {
|
|
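The hunks above give the runner three sources in increasing priority: the built-in default `opencode`, the `JOBFORGE_BATCH_RUNNER` environment variable, and the `--runner` flag. The sketch below copies `parseRunner` from the diff and adds a hypothetical, reduced `resolveRunner` that handles only `--runner`. The real `parseArgs` handles more options, and its `next()` helper is not shown in the diff.

```javascript
// Copied from the diff: normalizes and validates a runner name.
function parseRunner(value) {
  const runner = String(value || '').trim().toLowerCase();
  if (runner === 'opencode' || runner === 'codex') return runner;
  throw new Error('--runner must be one of: opencode, codex');
}

// Hypothetical reduced parser that mirrors the precedence in parseArgs.
function resolveRunner(argv, env = {}) {
  // The default is validated first, as in the diff's options initializer.
  let runner = parseRunner(env.JOBFORGE_BATCH_RUNNER || 'opencode');
  for (let i = 0; i < argv.length; i += 1) {
    if (argv[i] === '--runner') {
      runner = parseRunner(argv[i + 1]);
      i += 1; // skip the consumed value
    }
  }
  return runner;
}

console.log(resolveRunner([], {}));
console.log(resolveRunner([], { JOBFORGE_BATCH_RUNNER: 'codex' }));
console.log(resolveRunner(['--runner', 'OpenCode'], { JOBFORGE_BATCH_RUNNER: 'codex' }));
```

One consequence of validating the default eagerly: an invalid `JOBFORGE_BATCH_RUNNER` value raises an error even when a valid `--runner` flag is also given.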
@@ -180,7 +192,7 @@ async function readTextIfExists(path) {
|
|
|
180
192
|
return readFile(path, 'utf8');
|
|
181
193
|
}
|
|
182
194
|
|
|
183
|
-
async function checkPrerequisites({ dryRun }) {
|
|
195
|
+
async function checkPrerequisites({ dryRun, runner }) {
|
|
184
196
|
if (!existsSync(INPUT_FILE)) {
|
|
185
197
|
throw new Error(`${INPUT_FILE} not found. Add offers first.`);
|
|
186
198
|
}
|
|
@@ -188,9 +200,10 @@ async function checkPrerequisites({ dryRun }) {
|
|
|
188
200
|
throw new Error(`${PROMPT_FILE} not found.`);
|
|
189
201
|
}
|
|
190
202
|
if (!dryRun) {
|
|
191
|
-
const
|
|
203
|
+
const command = workerCommandName(runner);
|
|
204
|
+
const result = spawnSync(command, ['--help'], { stdio: 'ignore' });
|
|
192
205
|
if (result.error?.code === 'ENOENT') {
|
|
193
|
-
throw new Error(
|
|
206
|
+
throw new Error(`'${command}' CLI not found in PATH.`);
|
|
194
207
|
}
|
|
195
208
|
}
|
|
196
209
|
|
|
@@ -531,6 +544,70 @@ async function runOpencode(prompt, logFile) {
|
|
|
531
544
|
});
|
|
532
545
|
}
|
|
533
546
|
|
|
547
|
+
let batchTemplateCache;
|
|
548
|
+
|
|
549
|
+
async function batchTemplateText() {
|
|
550
|
+
if (batchTemplateCache === undefined) {
|
|
551
|
+
batchTemplateCache = await readFile(PROMPT_FILE, 'utf8');
|
|
552
|
+
}
|
|
553
|
+
return batchTemplateCache;
|
|
554
|
+
}
|
|
555
|
+
|
|
556
|
+
function workerCommandName(runner) {
|
|
557
|
+
return runner === 'codex' ? 'codex' : 'opencode';
|
|
558
|
+
}
|
|
559
|
+
|
|
560
|
+
async function runCodex(prompt, logFile) {
|
|
561
|
+
await ensureDir(dirname(logFile));
|
|
562
|
+
const finalMessageFile = `${logFile}.last-message.txt`;
|
|
563
|
+
const combinedPrompt = `${(await batchTemplateText()).trim()}\n\n${prompt}`;
|
|
564
|
+
|
|
565
|
+
return new Promise((resolveRun) => {
|
|
566
|
+
const child = spawn('codex', [
|
|
567
|
+
'exec',
|
|
568
|
+
'--dangerously-bypass-approvals-and-sandbox',
|
|
569
|
+
'-C',
|
|
570
|
+
PROJECT_DIR,
|
|
571
|
+
'--output-last-message',
|
|
572
|
+
finalMessageFile,
|
|
573
|
+
combinedPrompt,
|
|
574
|
+
], {
|
|
575
|
+
cwd: PROJECT_DIR,
|
|
576
|
+
env: {
|
|
577
|
+
...process.env,
|
|
578
|
+
JOB_FORGE_PROJECT: PROJECT_DIR,
|
|
579
|
+
},
|
|
580
|
+
stdio: ['ignore', 'pipe', 'pipe'],
|
|
581
|
+
});
|
|
582
|
+
|
|
583
|
+
const chunks = [];
|
|
584
|
+
child.stdout.on('data', (chunk) => chunks.push(chunk));
|
|
585
|
+
child.stderr.on('data', (chunk) => chunks.push(chunk));
|
|
586
|
+
|
|
587
|
+
child.on('error', async (error) => {
|
|
588
|
+
chunks.push(Buffer.from(`\n${error.stack || error.message}\n`));
|
|
589
|
+
});
|
|
590
|
+
|
|
591
|
+
child.on('close', async (code) => {
|
|
592
|
+
const output = Buffer.concat(chunks).toString('utf8');
|
|
593
|
+
const finalMessage = await readTextIfExists(finalMessageFile);
|
|
594
|
+
const logOutput = finalMessage
|
|
595
|
+
? `${output}\n\n--- FINAL MESSAGE ---\n${finalMessage}`
|
|
596
|
+
: output;
|
|
597
|
+
await writeFile(logFile, logOutput, 'utf8');
|
|
598
|
+
resolveRun({
|
|
599
|
+
exitCode: code ?? 1,
|
|
600
|
+
output: finalMessage || output,
|
|
601
|
+
});
|
|
602
|
+
});
|
|
603
|
+
});
|
|
604
|
+
}
|
|
605
|
+
|
|
606
|
+
async function runWorker(runner, prompt, logFile) {
|
|
607
|
+
if (runner === 'codex') return runCodex(prompt, logFile);
|
|
608
|
+
return runOpencode(prompt, logFile);
|
|
609
|
+
}
|
|
610
|
+
|
|
534
611
|
function parseStatusLines(output) {
|
|
535
612
|
const seen = new Map();
|
|
536
613
|
for (const line of output.split('\n')) {
|
|
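The `runCodex` function added above mixes process spawning with two pieces of pure logic: assembling the argument list and choosing what to log and return. Pulling those out makes the behavior easier to inspect. This is a sketch; `buildCodexArgs` and `composeResult` are hypothetical names, and the CLI flags are copied from the diff rather than checked against the Codex CLI documentation.

```javascript
// Hypothetical helper: the argv that the diff passes to spawn('codex', ...).
function buildCodexArgs({ projectDir, finalMessageFile, template, prompt }) {
  // The batch prompt template is prepended to the per-bundle prompt.
  const combinedPrompt = `${template.trim()}\n\n${prompt}`;
  return [
    'exec',
    '--dangerously-bypass-approvals-and-sandbox',
    '-C', projectDir,
    '--output-last-message', finalMessageFile,
    combinedPrompt,
  ];
}

// Hypothetical helper: mirrors the 'close' handler's choices in the diff.
function composeResult(code, output, finalMessage) {
  const logOutput = finalMessage
    ? `${output}\n\n--- FINAL MESSAGE ---\n${finalMessage}`
    : output;
  // A null exit code (for example, the child was killed by a signal) is reported as 1.
  return { exitCode: code ?? 1, output: finalMessage || output, logOutput };
}

const args = buildCodexArgs({
  projectDir: '/tmp/project',
  finalMessageFile: '/tmp/project/logs/b1.log.last-message.txt',
  template: '  Batch instructions\n',
  prompt: 'Process offers 1-5',
});
console.log(args.slice(0, 6).join(' '));
console.log(composeResult(0, 'raw stream', 'final answer').output);
```

When a final message exists, status parsing downstream sees only that message, while the log file keeps both the raw stream and the final message.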
@@ -550,7 +627,57 @@ function parseStatusLines(output) {
   return seen;
 }
 
-async function
+async function withWorkerLiveness(workflow, { runner, tag, ids, logFile }, run) {
+  const leaseKey = `worker:${tag}`;
+  const holder = `${runner}:${process.pid}:${tag}`;
+  const detail = {
+    runner,
+    ids,
+    log: relativeProjectPath(logFile),
+  };
+
+  await workflow.touchLease(leaseKey, {
+    holder,
+    ttlMs: 120_000,
+    detail: {
+      ...detail,
+      phase: 'starting',
+    },
+  });
+  await workflow.heartbeat(leaseKey, {
+    ...detail,
+    phase: 'starting',
+  });
+
+  const timer = setInterval(() => {
+    workflow.touchLease(leaseKey, {
+      holder,
+      ttlMs: 120_000,
+      detail: {
+        ...detail,
+        phase: 'running',
+      },
+    }).catch(() => {});
+    workflow.heartbeat(leaseKey, {
+      ...detail,
+      phase: 'running',
+    }).catch(() => {});
+  }, 30_000);
+  timer.unref?.();
+
+  try {
+    return await run();
+  } finally {
+    clearInterval(timer);
+    await workflow.heartbeat(leaseKey, {
+      ...detail,
+      phase: 'finished',
+    }).catch(() => {});
+    await workflow.releaseLease(leaseKey, holder).catch(() => {});
+  }
+}
+
+async function processBundle(workflow, bundle, options) {
   const startedAt = nowIso();
   const specs = await reserveBundle(workflow, bundle, startedAt);
   const tag = bundleTag(bundle);
@@ -561,11 +688,17 @@ async function processBundle(workflow, bundle) {
     type: 'batch.bundle.started',
     detail: {
       ids: bundle.map((offer) => offer.id),
+      runner: options.runner,
       log: relativeProjectPath(logFile),
     },
   });
 
-  const { exitCode, output } = await
+  const { exitCode, output } = await withWorkerLiveness(workflow, {
+    runner: options.runner,
+    tag,
+    ids: bundle.map((offer) => offer.id),
+    logFile,
+  }, () => runWorker(options.runner, buildBundlePrompt(specs), logFile));
   const completedAt = nowIso();
   const statuses = parseStatusLines(output);
   const outcomes = [];
@@ -598,6 +731,13 @@ async function processBundle(workflow, bundle) {
         score: sanitizeCell(score),
         report_num: sanitizeCell(parsed.report_num, spec.report_num),
       });
+      await workflow.heartbeat(`worker:${tag}`, {
+        runner: options.runner,
+        ids: bundle.map((candidate) => candidate.id),
+        offerId: spec.id,
+        phase: 'settling',
+        status,
+      }).catch(() => {});
       console.log(`  ${status === 'completed' ? 'OK' : 'FAIL'} #${spec.id} (status=${status}, score=${sanitizeCell(score)}, report=${sanitizeCell(parsed.report_num, spec.report_num)})`);
       continue;
     }
@@ -622,6 +762,13 @@ async function processBundle(workflow, bundle) {
       score: '-',
       report_num: spec.report_num,
     });
+    await workflow.heartbeat(`worker:${tag}`, {
+      runner: options.runner,
+      ids: bundle.map((candidate) => candidate.id),
+      offerId: spec.id,
+      phase: 'settling',
+      status: 'failed',
+    }).catch(() => {});
     console.log(`  FAIL #${spec.id} (no status emitted; see ${relativeProjectPath(logFile)})`);
   }
 
@@ -633,6 +780,7 @@ async function processBundle(workflow, bundle) {
     type: 'batch.bundle.completed',
     detail: {
       ids: bundle.map((offer) => offer.id),
+      runner: options.runner,
       exitCode,
       log: relativeProjectPath(logFile),
       outcomes,
@@ -766,7 +914,7 @@ async function run(options) {
   const pending = selectPendingOffers(offers, stateRows, options);
 
   console.log('=== job-forge batch runner ===');
-  console.log(`Parallel: ${options.parallel} | Bundle size: ${options.bundleSize} | Max retries: ${options.maxRetries}`);
+  console.log(`Runner: ${options.runner} | Parallel: ${options.parallel} | Bundle size: ${options.bundleSize} | Max retries: ${options.maxRetries}`);
   console.log(`Workflow: ${options.workflowId} (${relativeProjectPath(WORKFLOW_DIR)})`);
   console.log(`Input: ${totalInput} offers`);
   console.log('');
@@ -812,6 +960,7 @@ async function run(options) {
       totalInput,
       pending: pending.length,
       bundles: bundles.length,
+      runner: options.runner,
       parallel: options.parallel,
       bundleSize: options.bundleSize,
     },
@@ -824,7 +973,7 @@ async function run(options) {
     const stepName = bundleStepName(bundle, rowsBeforeRun);
     return workflow.step(
       stepName,
-      async () => processBundle(workflow, bundle),
+      async () => processBundle(workflow, bundle, options),
       {
         idempotencyKey: stepName,
       },
@@ -5,6 +5,7 @@ import { resolve } from "node:path";
 const root = resolve(process.argv[2] ?? ".");
 const files = {
   instructions: readFileSync(resolve(root, "iso/instructions.md"), "utf8"),
+  instructionsOpencode: readFileSync(resolve(root, "iso/instructions.opencode.md"), "utf8"),
   helpers: readFileSync(resolve(root, "modes/reference-local-helpers.md"), "utf8"),
   apply: readFileSync(resolve(root, "modes/apply.md"), "utf8"),
   models: readFileSync(resolve(root, "models.yaml"), "utf8"),
@@ -20,6 +21,8 @@ const checks = [
   ["H5 blocks same-company concurrent retry", () => every(files.instructions, ["Re-dispatch the same company only AFTER", "previous subagent returns"])],
   ["H6 requires merge and verify", () => every(files.instructions, ["batch/tracker-additions/*.tsv", "npx job-forge merge", "npx job-forge verify"])],
   ["H7 distrusts subagent prose", () => every(files.instructions, ["must originate from a file", "not from prior subagent prose"])],
+  ["H8 keeps proxy secret and requires stealth", () => every(files.instructions, ["[H8]", "Do not transcribe `server`, `username`, `password`, or `bypass`", "`stealth: true`"])],
+  ["OpenCode addendum exists for task semantics", () => every(files.instructionsOpencode, ["OpenCode", "`task`", "launch acknowledgement", "Do not use `task` to poll status"])],
   ["root points to consolidated helper reference", () => every(files.instructions, ["[D8]", "modes/reference-local-helpers.md", "deterministic local helpers"])],
   ["helper reference covers score/timeline/prioritize/lineage", () => every(files.helpers, ["templates/score.json", "npx job-forge score:*", "templates/timeline.json", "npx job-forge timeline:*", "templates/prioritize.json", "npx job-forge prioritize:*", ".jobforge-lineage.json", "npx job-forge lineage:*"])],
   ["root helper defaults are consolidated", () => !/\[D(?:9|1\d|2[0-9])\]/.test(files.instructions)],
|