job-forge 2.0.0 → 2.0.2
- package/.cursor/rules/main.mdc +10 -1
- package/AGENTS.md +10 -1
- package/CLAUDE.md +10 -1
- package/README.md +58 -45
- package/bin/create-job-forge.mjs +1 -1
- package/docs/ARCHITECTURE.md +32 -19
- package/docs/README.md +3 -3
- package/docs/SETUP.md +21 -8
- package/iso/instructions.md +10 -1
- package/modes/auto-pipeline.md +6 -3
- package/modes/pipeline.md +4 -3
- package/modes/scan.md +50 -7
- package/package.json +1 -1
package/.cursor/rules/main.mdc
CHANGED
@@ -15,8 +15,17 @@ The Hard Limits below are non-negotiable numeric rules. If you catch yourself ab
 4. **Orchestrator does NOT fill forms.** This session MUST NOT call `geometra_fill_form`, `geometra_run_actions`, `geometra_pick_listbox_option`, or `geometra_fill_otp` when handling a multi-job request. If you need to, it means you MUST have delegated — `task` out the remaining work instead.
 5. **Re-dispatch only AFTER the previous subagent returns.** Never fire the same company's `task` twice while the first is still in-flight. Wait for the return value, then decide if a retry is warranted.
 6. **Application outcomes flow through TSVs, not `data/pipeline.md`.** When a subagent returns APPLIED / FAILED / SKIP, the outcome goes to `batch/tracker-additions/{num}-{slug}.tsv`. `node merge-tracker.mjs` then consumes the TSVs into the correct `data/applications/YYYY-MM-DD.md` day file. `data/pipeline.md` only tracks URL inbox state (`[ ]` pending → `[x]` processed). **NEVER append APPLIED / FAILED status lines to `pipeline.md`** — that's the day file's job, via the TSV pathway. After any multi-apply run, the orchestrator MUST run `node merge-tracker.mjs` followed by `node verify-pipeline.mjs` before ending the session.
+7. **URLs passed to downstream subagents must come from a file, not from a prior subagent's prose.** When an orchestrator dispatches a subagent with a URL (for evaluation, apply, verification, etc.), the URL MUST originate from:
+- `data/pipeline.md`
+- `data/scan-history.tsv`
+- `batch/scan-output-*.md` or similar structured output file
+- A report file (`reports/{num}-*.md`) with an authoritative `**URL:**` header
 
-
+URLs mentioned in a subagent's return message are NOT trustworthy by default — they may be hallucinated or reconstructed. Before passing any URL from a subagent report to another subagent, cross-check it exists in one of the authoritative files above, OR instruct the dispatching subagent to write its output to a structured file and re-read from that file.
+
+**Why**: a scan subagent once reported 30 plausible-looking Greenhouse IDs in its return message that did not exist in the Greenhouse API. The orchestrator dispatched 30 downstream subagents that all failed verification. Trusting prose-form URLs cost ~2 hours of wasted work and corrupted the tracker.
+
+Everything below is context and rationale. These seven numbers are the rules.
 
 ---
 
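The cross-check that rule 7 asks for can be sketched as a small pure function. This is only an illustration under stated assumptions: `findUrlSource` is a hypothetical name, the caller is assumed to have already read the authoritative files into strings, and the sample file contents are invented.

```javascript
// Hypothetical helper; the harness does not necessarily ship anything like it.
// `sources` maps an authoritative file path to that file's text.
function findUrlSource(url, sources) {
  for (const [path, text] of Object.entries(sources)) {
    // Split on whitespace and common Markdown/TSV delimiters so the URL must
    // match as a whole token, never as a prefix of a longer URL.
    const tokens = text.split(/[\s|()<>\[\]`*]+/);
    if (tokens.includes(url)) return path;
  }
  return null; // not found: treat the URL as untrusted prose
}

// Invented sample contents, for illustration only.
const sources = {
  "data/pipeline.md": "- [ ] https://boards.greenhouse.io/acme/jobs/123 | Acme | Engineer\n",
  "data/scan-history.tsv": "https://jobs.lever.co/acme/abc\t2025-01-01\n",
};
```

A URL taken from a subagent's return message would be dispatched only when `findUrlSource` returns a path; a `null` result means the URL should be re-read from a structured file first.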
package/AGENTS.md
CHANGED
@@ -10,8 +10,17 @@ The Hard Limits below are non-negotiable numeric rules. If you catch yourself ab
 4. **Orchestrator does NOT fill forms.** This session MUST NOT call `geometra_fill_form`, `geometra_run_actions`, `geometra_pick_listbox_option`, or `geometra_fill_otp` when handling a multi-job request. If you need to, it means you MUST have delegated — `task` out the remaining work instead.
 5. **Re-dispatch only AFTER the previous subagent returns.** Never fire the same company's `task` twice while the first is still in-flight. Wait for the return value, then decide if a retry is warranted.
 6. **Application outcomes flow through TSVs, not `data/pipeline.md`.** When a subagent returns APPLIED / FAILED / SKIP, the outcome goes to `batch/tracker-additions/{num}-{slug}.tsv`. `node merge-tracker.mjs` then consumes the TSVs into the correct `data/applications/YYYY-MM-DD.md` day file. `data/pipeline.md` only tracks URL inbox state (`[ ]` pending → `[x]` processed). **NEVER append APPLIED / FAILED status lines to `pipeline.md`** — that's the day file's job, via the TSV pathway. After any multi-apply run, the orchestrator MUST run `node merge-tracker.mjs` followed by `node verify-pipeline.mjs` before ending the session.
+7. **URLs passed to downstream subagents must come from a file, not from a prior subagent's prose.** When an orchestrator dispatches a subagent with a URL (for evaluation, apply, verification, etc.), the URL MUST originate from:
+- `data/pipeline.md`
+- `data/scan-history.tsv`
+- `batch/scan-output-*.md` or similar structured output file
+- A report file (`reports/{num}-*.md`) with an authoritative `**URL:**` header
 
-
+URLs mentioned in a subagent's return message are NOT trustworthy by default — they may be hallucinated or reconstructed. Before passing any URL from a subagent report to another subagent, cross-check it exists in one of the authoritative files above, OR instruct the dispatching subagent to write its output to a structured file and re-read from that file.
+
+**Why**: a scan subagent once reported 30 plausible-looking Greenhouse IDs in its return message that did not exist in the Greenhouse API. The orchestrator dispatched 30 downstream subagents that all failed verification. Trusting prose-form URLs cost ~2 hours of wasted work and corrupted the tracker.
+
+Everything below is context and rationale. These seven numbers are the rules.
 
 ---
 
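Rule 5 in the context lines above (re-dispatch only after the previous subagent returns) amounts to a per-company in-flight check. A minimal sketch follows, assuming the orchestrator keeps its own bookkeeping; `DispatchGuard` and its method names are hypothetical and not part of the harness.

```javascript
// Hypothetical bookkeeping for Hard Limit 5; not part of the harness.
class DispatchGuard {
  constructor() {
    this.inFlight = new Set(); // companies whose task has not returned yet
  }

  // Returns true and records the company if no task is pending for it.
  tryDispatch(company) {
    if (this.inFlight.has(company)) return false;
    this.inFlight.add(company);
    return true;
  }

  // Call once the subagent's return value has arrived.
  markReturned(company) {
    this.inFlight.delete(company);
  }
}
```

A retry decision would then happen only after `markReturned`, which matches the "wait for the return value, then decide" wording of the rule.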
package/CLAUDE.md
CHANGED
@@ -10,8 +10,17 @@ The Hard Limits below are non-negotiable numeric rules. If you catch yourself ab
 4. **Orchestrator does NOT fill forms.** This session MUST NOT call `geometra_fill_form`, `geometra_run_actions`, `geometra_pick_listbox_option`, or `geometra_fill_otp` when handling a multi-job request. If you need to, it means you MUST have delegated — `task` out the remaining work instead.
 5. **Re-dispatch only AFTER the previous subagent returns.** Never fire the same company's `task` twice while the first is still in-flight. Wait for the return value, then decide if a retry is warranted.
 6. **Application outcomes flow through TSVs, not `data/pipeline.md`.** When a subagent returns APPLIED / FAILED / SKIP, the outcome goes to `batch/tracker-additions/{num}-{slug}.tsv`. `node merge-tracker.mjs` then consumes the TSVs into the correct `data/applications/YYYY-MM-DD.md` day file. `data/pipeline.md` only tracks URL inbox state (`[ ]` pending → `[x]` processed). **NEVER append APPLIED / FAILED status lines to `pipeline.md`** — that's the day file's job, via the TSV pathway. After any multi-apply run, the orchestrator MUST run `node merge-tracker.mjs` followed by `node verify-pipeline.mjs` before ending the session.
+7. **URLs passed to downstream subagents must come from a file, not from a prior subagent's prose.** When an orchestrator dispatches a subagent with a URL (for evaluation, apply, verification, etc.), the URL MUST originate from:
+- `data/pipeline.md`
+- `data/scan-history.tsv`
+- `batch/scan-output-*.md` or similar structured output file
+- A report file (`reports/{num}-*.md`) with an authoritative `**URL:**` header
 
-
+URLs mentioned in a subagent's return message are NOT trustworthy by default — they may be hallucinated or reconstructed. Before passing any URL from a subagent report to another subagent, cross-check it exists in one of the authoritative files above, OR instruct the dispatching subagent to write its output to a structured file and re-read from that file.
+
+**Why**: a scan subagent once reported 30 plausible-looking Greenhouse IDs in its return message that did not exist in the Greenhouse API. The orchestrator dispatched 30 downstream subagents that all failed verification. Trusting prose-form URLs cost ~2 hours of wasted work and corrupted the tracker.
+
+Everything below is context and rationale. These seven numbers are the rules.
 
 ---
 
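Rule 6 in the context lines above limits `data/pipeline.md` to inbox state (`[ ]` pending, `[x]` processed). A rough sketch of that single permitted transition is below. It assumes one `- [ ] <url> ...` entry per line, which the diff does not confirm, and `markProcessed` is a hypothetical name.

```javascript
// Hypothetical helper; assumes one "- [ ] <url> ..." entry per line.
function markProcessed(pipelineText, url) {
  return pipelineText
    .split("\n")
    .map((line) => {
      const isPending = line.startsWith("- [ ] ");
      const rest = line.slice(6); // text after the checkbox prefix
      const firstField = rest.split(/\s+/)[0];
      // Only the checkbox flips; no APPLIED / FAILED text is ever appended.
      return isPending && firstField === url ? "- [x] " + rest : line;
    })
    .join("\n");
}
```

Outcome text such as APPLIED or FAILED never passes through this function; under rule 6 it belongs in the TSV pathway instead.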
package/README.md
CHANGED
@@ -19,17 +19,17 @@
 ## Quick Start
 
 ```bash
-npx
+npx --package=job-forge create-job-forge my-job-search
 cd my-job-search
 npm install
 opencode
 ```
 
-The scaffolded `opencode.json` already has the Geometra MCP (browser automation + PDF) and Gmail MCP (reading replies) wired up — they launch automatically the first time opencode starts.
+The scaffolded `opencode.json` already has the Geometra MCP (browser automation + PDF) and Gmail MCP (reading replies) wired up — they launch automatically the first time opencode starts. `npm install` also materializes symlinks for every supported agent harness — OpenCode, Cursor, Claude Code, and Codex — so you can run `opencode`, `cursor`, `claude`, or `codex` in the same project and each picks up the shared MCP config and instructions.
 
 Then fill in `cv.md`, `config/profile.yml`, and `portals.yml` with your personal data, paste a job URL into opencode, and JobForge evaluates + tracks it.
 
-**Upgrade later:** `npm run update-harness` (pulls latest
+**Upgrade later:** `npm run update-harness` (pulls latest `job-forge` from npm, re-syncs symlinks, prints the resolved version)
 
 Full setup guide and alternative install paths (including contributing to the harness itself): **[docs/SETUP.md](docs/SETUP.md)**.
 
@@ -125,61 +125,74 @@ You paste a job URL or description
 
 ## Project Structure
 
-**Your personal project** (after `npx create-job-forge my-search`):
+**Your personal project** (after `npx --package=job-forge create-job-forge my-search`):
 
 ```
 my-search/
-├── package.json
-├── opencode.json
-├── cv.md
-├── article-digest.md
-├── portals.yml
-├── config/profile.yml
-├── data/
-├── reports/
-├── batch/
-
-
-│
-│
-
-
-├──
-├──
-├── .
-
+├── package.json # depends on "job-forge": "^2.0.0" (npm registry)
+├── opencode.json # thin config — enables MCPs + states.yml
+├── cv.md # your CV (personal)
+├── article-digest.md # your proof points (optional, personal)
+├── portals.yml # companies to scan (personal)
+├── config/profile.yml # your identity, target roles (personal)
+├── data/ # applications, pipeline, scan history (personal, gitignored)
+├── reports/ # generated evaluation reports (personal, gitignored)
+├── batch/{batch-input,batch-state}.tsv, tracker-additions/, logs/ # personal
+├── AGENTS.md # personal overrides (opencode + codex)
+├── CLAUDE.md # personal overrides (Claude Code), @-imports CLAUDE.harness.md
+│
+│ # ↓ symlinks into node_modules/job-forge/, regenerated by postinstall sync.mjs
+├── AGENTS.harness.md # → harness instructions (loaded via opencode.json)
+├── CLAUDE.harness.md # → harness instructions (imported from personal CLAUDE.md)
+├── .mcp.json # → Claude Code MCP config
+├── .codex/config.toml # → Codex MCP config
+├── .cursor/mcp.json # → Cursor MCP config
+├── .cursor/rules/main.mdc # → Cursor always-apply rule
+├── .opencode/skills/job-forge.md # → skill router
+├── .opencode/agents/ # → @general-free, @general-paid, @glm-minimal
+├── modes/ # → _shared.md + skill modes
+├── templates/ # → states.yml, portals.example.yml, cv-template.html
+├── batch/batch-prompt.md # → batch worker prompt
+├── batch/batch-runner.sh # → parallel orchestrator
+│
+└── node_modules/job-forge/ # the harness (from npm: `job-forge@2.x`)
 ```
 
 Symlinks are regenerated on every `npm install` via the package's `postinstall` hook. You never have to know about harness internals — just edit `cv.md`, `portals.yml`, and `config/profile.yml`.
 
-**The harness itself** (this repo, what gets
+**The harness itself** (this repo, what gets published as `job-forge` on npm):
 
 ```
 JobForge/
-├──
+├── iso/ # ← SOURCE OF TRUTH for harness configuration
+│ ├── instructions.md # → AGENTS.md + CLAUDE.md (Claude Code / Codex / Cursor)
+│ ├── mcp.json # → .mcp.json + .cursor/mcp.json + .codex/config.toml + opencode.json
+│ ├── agents/*.md # → .opencode/agents/*.md (general-free, general-paid, glm-minimal)
+│ ├── commands/job-forge.md # → .opencode/skills/job-forge.md
+│ └── config.json # per-harness top-level extras (e.g. opencode `instructions` array)
+│
+├── package.json # bin: job-forge, create-job-forge; prepack runs iso-harness
 ├── bin/
-│ ├── job-forge.mjs
-│ ├── sync.mjs
-│ └── create-job-forge.mjs
-├──
-├──
-├──
-├──
-├──
-├──
-
-├── tracker-lib.mjs
-├──
-├──
-├──
-├──
-
-├── cv-sync-check.mjs # setup lint
-├── dashboard/ # optional Go TUI
-├── fonts/ # Space Grotesk + DM Sans (for PDF)
-└── docs/ # architecture, setup, customization
+│ ├── job-forge.mjs # CLI dispatcher (merge/verify/pdf/tokens/sync/...)
+│ ├── sync.mjs # postinstall: creates symlinks in consumer project
+│ └── create-job-forge.mjs # scaffolder
+├── modes/ # _shared.md + 16 skill modes
+├── templates/ # cv-template.html, portals.example.yml, states.yml
+├── config/profile.example.yml # template for consumer's profile.yml
+├── batch/{batch-prompt.md,batch-runner.sh} # batch orchestrator
+├── scripts/
+│ ├── token-usage-report.mjs # opencode cost analyzer
+│ └── release/check-source.mjs # version gate for npm publish
+├── tracker-lib.mjs / merge-tracker.mjs / dedup-tracker.mjs / verify-pipeline.mjs
+├── normalize-statuses.mjs / generate-pdf.mjs / cv-sync-check.mjs
+├── dashboard/ # optional Go TUI
+├── fonts/ # Space Grotesk + DM Sans (for PDF)
+├── docs/ # architecture, setup, customization
+└── .github/workflows/ # quality.yml + release.yml (CI publish to npm)
 ```
 
+All per-harness config trees (`.opencode/`, `.cursor/`, `.claude/`, `.codex/`, `CLAUDE.md`, `AGENTS.md`, `.mcp.json`, `opencode.json`) are **generated** from `iso/` by [`@razroo/iso-harness`](https://www.npmjs.com/package/@razroo/iso-harness) and gitignored in this repo. `npm run build:config` regenerates them locally; `prepack` regenerates them into the tarball at publish time so consumers get everything pre-baked.
+
 ## Documentation
 
 Index and cross-links: [docs/README.md](docs/README.md).
package/bin/create-job-forge.mjs
CHANGED
@@ -117,7 +117,7 @@ const consumerPkg = {
 'update-harness': 'npm update job-forge @razroo/opencode-model-fallback @razroo/gmail-mcp @geometra/mcp && job-forge sync && node -e "console.log(\'✅ harness at\', require(\'./package-lock.json\').packages[\'node_modules/job-forge\'].resolved)"',
 },
 dependencies: {
-'job-forge': '
+'job-forge': '^2.0.0',
 // Model-fallback plugin: rotates agents through their fallback_models
 // chain on rate-limit / 5xx errors so a rate-limited free-tier model
 // doesn't wedge the whole flow. The chains live upstream in each
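The `update-harness` script in the context lines above ends by printing the `resolved` field of the `node_modules/job-forge` entry in `package-lock.json`. The same lookup can be sketched as a function over an already-parsed lockfile object. The name `resolvedHarness` and the sample lockfile are illustrative assumptions; a missing entry returns `undefined` here, whereas the inline one-liner would throw.

```javascript
// Sketch of the lookup performed by the inline `node -e` snippet, written as
// a function over a parsed package-lock.json object.
function resolvedHarness(lock) {
  const entry = lock.packages && lock.packages["node_modules/job-forge"];
  return entry ? entry.resolved : undefined;
}

// Invented sample lockfile fragment, for illustration only.
const sampleLock = {
  packages: {
    "node_modules/job-forge": {
      version: "2.0.2",
      resolved: "https://registry.npmjs.org/job-forge/-/job-forge-2.0.2.tgz",
    },
  },
};
```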
package/docs/ARCHITECTURE.md
CHANGED
@@ -2,31 +2,42 @@
 
 ## Package architecture (v2.0.0+)
 
-JobForge ships as an npm package. There are two kinds of repo involved:
+JobForge ships as an npm package at [`job-forge`](https://www.npmjs.com/package/job-forge). There are two kinds of repo involved:
 
-- **Harness** — this repo, `razroo/JobForge`.
-- **Consumer project** — what users interact with day-to-day. Scaffolded via `npx create-job-forge <dir>`, or hand-authored with `job-forge` listed in `package.json` dependencies.
+- **Harness** — this repo, `razroo/JobForge`. Published to npm. Contains `iso/` (single source of truth), modes, scripts, skill router, templates, fonts, dashboard, and bin entries. Per-harness config trees are **generated** from `iso/` by [`@razroo/iso-harness`](https://www.npmjs.com/package/@razroo/iso-harness) — gitignored here, baked into the tarball by `prepack` at publish time, and landed in consumer projects via symlinks.
+- **Consumer project** — what users interact with day-to-day. Scaffolded via `npx --package=job-forge create-job-forge <dir>`, or hand-authored with `job-forge` listed in `package.json` dependencies.
 
-The consumer's project root contains
+The consumer's project root contains personal data plus symlinks into `node_modules/job-forge/`:
 
 ```
 my-search/
-├── package.json
-├── opencode.json
-├── cv.md
-├── config/profile.yml
-├── portals.yml
-├── data/
-├── reports/
-├──
-├──
-
-
-├──
-
+├── package.json # depends on "job-forge": "^2.0.0"
+├── opencode.json # instructions: ["templates/states.yml"]
+├── cv.md # personal
+├── config/profile.yml # personal
+├── portals.yml # personal
+├── data/ # personal (gitignored)
+├── reports/ # personal (gitignored)
+├── AGENTS.md # personal overrides (opencode + codex)
+├── CLAUDE.md # personal overrides (Claude Code); @-imports CLAUDE.harness.md
+│
+│ # ↓ symlinks regenerated on every `npm install` by bin/sync.mjs
+├── AGENTS.harness.md # → node_modules/job-forge/AGENTS.md
+├── CLAUDE.harness.md # → node_modules/job-forge/CLAUDE.md
+├── .mcp.json # → Claude Code MCP config
+├── .codex/config.toml # → Codex MCP config
+├── .cursor/mcp.json # → Cursor MCP config
+├── .cursor/rules/main.mdc # → Cursor always-apply rule
+├── .opencode/skills/job-forge.md # → skill router
+├── .opencode/agents/ # → @general-free, @general-paid, @glm-minimal
+├── modes/ # → mode files
+├── templates/ # → states.yml, portals.example.yml, cv-template.html
+├── batch/batch-prompt.md # → batch worker prompt
+├── batch/batch-runner.sh # → parallel orchestrator
+└── node_modules/job-forge/ # harness, installed from npm
 ```
 
-Symlinks are created by the harness's `postinstall` hook (`bin/sync.mjs`) on every `npm install`.
+Symlinks are created by the harness's `postinstall` hook (`bin/sync.mjs`) on every `npm install`. Real files at those paths are preserved — if a user locally customizes a mode file, the sync skips that symlink and warns.
 
 The consumer's `opencode.json` loads a small set of stable files as always-present instructions: `AGENTS.harness.md` (harness operational rules), `templates/states.yml` (canonical application states), `modes/_shared.md` (scoring model), and `cv.md` (the candidate's CV). Caching these in the prefix means agents never Read them as tool calls. Churning content (score calibration anchors, specific mode files) stays out of `instructions` and is Read on demand.
 
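The preserve-real-files behavior described in the hunk above (the sync skips a path where a real file exists and warns) can be illustrated with a pure planning function. This is a guess at the decision logic only, not the actual contents of `bin/sync.mjs`; `planSync` and the `"missing"` / `"symlink"` / `"file"` state labels are hypothetical.

```javascript
// Hypothetical planner; bin/sync.mjs may be implemented differently.
// `existing` describes what currently sits at each target path:
// "missing", "symlink", or "file" (a real file or directory).
function planSync(targets, existing) {
  const plan = { link: [], skip: [] };
  for (const path of targets) {
    const state = existing[path] || "missing";
    if (state === "file") {
      plan.skip.push(path); // user customization: keep it and warn
    } else {
      plan.link.push(path); // create the symlink, or refresh a stale one
    }
  }
  return plan;
}
```

Separating the decision from the filesystem calls keeps the rule easy to check: only paths in `plan.link` would ever be touched.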
@@ -34,7 +45,9 @@ The skill router (`.opencode/skills/job-forge.md`) loads mode and data files on
 
 **Cost-tiered subagents** live in `.opencode/agents/` (`general-free`, `general-paid`, `glm-minimal`) — the orchestrator delegates procedural work to free-tier models and reserves paid models for quality-sensitive writing. See [MODEL-ROUTING.md](MODEL-ROUTING.md) for the routing architecture, why it exists, and how to customize.
 
-**
+**Multi-harness support.** Because `iso/` is the single source of truth, publishing ships config for OpenCode, Cursor, Claude Code, and Codex in one tarball. Consumers run any of `opencode`, `cursor`, `claude`, or `codex` in the project and each picks up the shared MCP config + instructions via the symlinks above.
+
+**Upgrading** the harness in a consumer project is `npm run update-harness` — pulls the latest `job-forge` from npm, refreshes the fallback plugin + pinned MCPs, re-runs symlink sync, and prints the resolved version.
 
 ## System Overview
 
package/docs/README.md
CHANGED
@@ -4,12 +4,12 @@ Guides for installing JobForge, understanding how pieces fit together, and tailo
 
 ## Install paths
 
-JobForge ships
+JobForge ships on npm as [`job-forge`](https://www.npmjs.com/package/job-forge) (v2.0.0+). Pick the path that matches your goal:
 
 | Path | Who it's for | How |
 |------|--------------|-----|
-| **A — Scaffold a personal project** | Most users. You want a job search project with the harness in `node_modules`, updatable via `npm update job-forge`. | `npx
-| **B — Clone the harness directly** | Contributors and hackers working on modes, scripts, or the scoring model. Personal files are gitignored. | `git clone https://github.com/razroo/JobForge.git && cd JobForge && npm install` |
+| **A — Scaffold a personal project** | Most users. You want a job search project with the harness in `node_modules`, updatable via `npm update job-forge`. | `npx --package=job-forge create-job-forge my-search && cd my-search && npm install` |
+| **B — Clone the harness directly** | Contributors and hackers working on `iso/`, modes, scripts, or the scoring model. Personal files are gitignored. | `git clone https://github.com/razroo/JobForge.git && cd JobForge && npm install && npm run build:config` |
 
 See [SETUP.md](SETUP.md) for both paths.
 
package/docs/SETUP.md
CHANGED
@@ -10,16 +10,22 @@
 
 ### Path A — Scaffold a personal project (recommended)
 
-JobForge is
+JobForge is published on npm as [`job-forge`](https://www.npmjs.com/package/job-forge). Use the scaffolder to create a new project that keeps only your personal data (CV, profile, portals, tracker) while the harness (modes, skills, scripts, per-harness configs) lives in `node_modules/job-forge` and updates with one command.
 
 ```bash
 # 1. Scaffold
-npx
+npx --package=job-forge create-job-forge my-job-search
 cd my-job-search
 
-# 2. Install the harness
-# creates symlinks
-#
+# 2. Install the harness. `npm install` fetches job-forge@^2.0.0 from npm;
+# its postinstall hook creates symlinks into your project root for:
+# .opencode/{skills/job-forge.md, agents/}
+# .cursor/mcp.json, .cursor/rules/main.mdc
+# .mcp.json (Claude Code MCP config)
+# .codex/config.toml (Codex MCP config)
+# AGENTS.harness.md, CLAUDE.harness.md
+# modes/, templates/
+# batch/{batch-prompt.md, batch-runner.sh, README.md}
 npm install
 
 # 3. Fill in personal files
@@ -41,18 +47,25 @@ Paste a job URL or run `/job-forge` to see the command menu.
 To **upgrade the harness** later:
 
 ```bash
-npm update job-forge # pulls latest
+npm update job-forge # pulls latest job-forge from the npm registry
 npx job-forge sync # refresh symlinks if anything drifted
 ```
 
+Or simpler, via the scaffolded script: `npm run update-harness` (also refreshes the fallback plugin + pinned MCPs, reprints the resolved version).
+
 ### Path B — Clone the harness directly
 
-Use this if you want to hack on the harness itself (
+Use this if you want to hack on the harness itself (edit `iso/`, tune the scoring model, add modes, contribute back). Personal files are gitignored.
 
 ```bash
 git clone https://github.com/razroo/JobForge.git
 cd JobForge
 npm install
+npm run build:config # regenerate per-harness trees from iso/ (CLAUDE.md,
+# AGENTS.md, .mcp.json, .codex/, .cursor/, .opencode/,
+# opencode.json) — these are gitignored but materialized
+# locally so OpenCode/Cursor/Claude Code/Codex can read
+# them while you develop
 
 # Add personal files the same way as Path A
 cp config/profile.example.yml config/profile.yml
@@ -60,7 +73,7 @@ cp templates/portals.example.yml portals.yml
 # Create cv.md in the project root
 ```
 
-When you're inside this repo, the `postinstall` symlink step is a no-op (detected and skipped). All npm scripts run the harness code directly. The repo's `opencode.json` at the project root registers the same Geometra + Gmail MCPs as the scaffolder ships to consumers.
+When you're inside this repo, the `postinstall` symlink step is a no-op (detected and skipped). All npm scripts run the harness code directly. The repo's generated `opencode.json` at the project root registers the same Geometra + Gmail MCPs as the scaffolder ships to consumers. Re-run `npm run build:config` any time you edit something under `iso/`; `prepack` runs the same build automatically at publish time so tarballs always match `iso/`.
 
 ## Personalization
 
package/iso/instructions.md
CHANGED
@@ -10,8 +10,17 @@ The Hard Limits below are non-negotiable numeric rules. If you catch yourself ab
 4. **Orchestrator does NOT fill forms.** This session MUST NOT call `geometra_fill_form`, `geometra_run_actions`, `geometra_pick_listbox_option`, or `geometra_fill_otp` when handling a multi-job request. If you need to, it means you MUST have delegated — `task` out the remaining work instead.
 5. **Re-dispatch only AFTER the previous subagent returns.** Never fire the same company's `task` twice while the first is still in-flight. Wait for the return value, then decide if a retry is warranted.
 6. **Application outcomes flow through TSVs, not `data/pipeline.md`.** When a subagent returns APPLIED / FAILED / SKIP, the outcome goes to `batch/tracker-additions/{num}-{slug}.tsv`. `node merge-tracker.mjs` then consumes the TSVs into the correct `data/applications/YYYY-MM-DD.md` day file. `data/pipeline.md` only tracks URL inbox state (`[ ]` pending → `[x]` processed). **NEVER append APPLIED / FAILED status lines to `pipeline.md`** — that's the day file's job, via the TSV pathway. After any multi-apply run, the orchestrator MUST run `node merge-tracker.mjs` followed by `node verify-pipeline.mjs` before ending the session.
+7. **URLs passed to downstream subagents must come from a file, not from a prior subagent's prose.** When an orchestrator dispatches a subagent with a URL (for evaluation, apply, verification, etc.), the URL MUST originate from:
+- `data/pipeline.md`
+- `data/scan-history.tsv`
+- `batch/scan-output-*.md` or similar structured output file
+- A report file (`reports/{num}-*.md`) with an authoritative `**URL:**` header
 
-
+URLs mentioned in a subagent's return message are NOT trustworthy by default — they may be hallucinated or reconstructed. Before passing any URL from a subagent report to another subagent, cross-check it exists in one of the authoritative files above, OR instruct the dispatching subagent to write its output to a structured file and re-read from that file.
+
+**Why**: a scan subagent once reported 30 plausible-looking Greenhouse IDs in its return message that did not exist in the Greenhouse API. The orchestrator dispatched 30 downstream subagents that all failed verification. Trusting prose-form URLs cost ~2 hours of wasted work and corrupted the tracker.
+
+Everything below is context and rationale. These seven numbers are the rules.
 
 ---
 
package/modes/auto-pipeline.md
CHANGED
@@ -8,9 +8,12 @@ Fetch the JD content once. If the input is a **URL** (not pasted JD text), fetch
 
 **Pick exactly one method, in this priority order:**
 
-1. **
-2. **
-3. **
+1. **Greenhouse JSON API (first try, if the URL is Greenhouse-backed):** If the pipeline.md entry carries `| gh={slug}/{id}` OR the URL host matches `*.greenhouse.io` / a known Greenhouse customer front-end (`*.pinterestcareers.com`, `okta.com/company/careers/opportunity/*`, `samsara.com/company/careers/roles/*`, `zoominfo.com/careers?gh_jid=*`, `collibra.com/.../?gh_jid=*`, `careers.toasttab.com/jobs?gh_jid=*`, `careers.airbnb.com/positions/*?gh_jid=*`, `coinbase.com/careers/positions/*?gh_jid=*`, `instacart.careers/job/?gh_jid=*`), extract `slug` and `id` and WebFetch `https://boards-api.greenhouse.io/v1/boards/{slug}/jobs/{id}`. 200 + JSON with `content` is the authoritative JD. 404 = genuinely closed (mark CLOSED and stop). **If 200, STOP — do not fall back to Geometra or WebFetch of the front-end.** The API is faster, cheaper (no Geometra session), and never returns a bot-shell.
+2. **Geometra MCP:** Most non-Greenhouse job portals (Lever, Ashby, Workday) are SPAs. Use `geometra_connect` + `geometra_page_model` to render and read the JD. **If this returns non-empty JD text, STOP — do not WebFetch the same URL.**
+3. **WebFetch (only if Geometra is unavailable OR returned only a shell with no JD text):** For static pages (ZipRecruiter, WeLoveProduct, company career pages).
+4. **WebSearch (only if methods 1–3 all failed):** Search for the role title + company on secondary portals that index the JD in static HTML.
+
+**Do NOT mark a Greenhouse-sourced offer CLOSED based on a WebFetch shell or a 403 from a customer-skinned careers domain.** Pinterest, Okta, Samsara, ZoomInfo, Collibra, Toast, Airbnb, Coinbase, Instacart all serve bot-hostile fronts. The Greenhouse JSON API (step 1) is the ground truth for their offer state. A previous scan run fed 60 live Greenhouse URLs through WebFetch-only verification and 100% of them were wrongly marked CLOSED; if you see a high stale rate, you are skipping step 1.
 
 **Rule:** Each URL gets fetched at most once per session. If you already have the JD text in context — from Geometra, a previous WebFetch, or pasted by the candidate — do not fetch again.
 
package/modes/pipeline.md
CHANGED
|
@@ -33,9 +33,10 @@ Processes accumulated job offer URLs from `data/pipeline.md`. The user adds URLs
|
|
|
33
33
|
|
|
34
34
|
## Detect JD From URL
|
|
35
35
|
|
|
36
|
-
1. **
|
|
37
|
-
2. **
|
|
38
|
-
3. **
|
|
36
|
+
1. **Greenhouse JSON API (FIRST, when the entry has `| gh={slug}/{id}` OR the host looks Greenhouse-backed):** WebFetch `https://boards-api.greenhouse.io/v1/boards/{slug}/jobs/{id}`. 200 + JSON with `content` = LIVE, use it as the JD; 404 = genuinely CLOSED (mark `- [!]` and continue). Bot-hostile customer fronts (`pinterestcareers.com`, `okta.com`, `samsara.com`, `zoominfo.com`, `collibra.com`, `careers.toasttab.com`, `careers.airbnb.com`, `coinbase.com`, `instacart.careers`) MUST be verified via this API first. WebFetch/Geometra of those domains returns a shell or 403 and causes false CLOSED marks.
|
|
37
|
+
2. **Geometra MCP:** `geometra_connect` + `geometra_page_model`. Works with non-Greenhouse SPAs (Lever, Ashby, Workday) and uses fewer tokens than raw DOM snapshots.
|
|
38
|
+
3. **WebFetch (fallback):** For static pages or when Geometra is not available.
|
|
39
|
+
4. **WebSearch (last resort):** Search on secondary portals that index the JD.
|
|
39
40
|
|
|
40
41
|
**Special cases:**
|
|
41
42
|
- **LinkedIn**: May require login → mark `[!]` and ask the user to paste the text
|
package/modes/scan.md
CHANGED
|
@@ -68,8 +68,12 @@ The levels are additive — all are executed, results are merged and deduplicate
|
|
|
68
68
|
5. **Level 2 — Greenhouse APIs** (WebFetch can batch freely — it's cheap and doesn't use Geometra sessions):
|
|
69
69
|
For each company in `tracked_companies` with `api:` defined and `enabled: true`:
|
|
70
70
|
a. WebFetch the API URL → JSON with job list
|
|
71
|
-
b. For each job extract: `{title, url, company}`
|
|
72
|
-
|
|
71
|
+
b. For each job extract: `{title, url, company, gh_slug, gh_id, updated_at}`
|
|
72
|
+
- **`url`**: ALWAYS record the canonical Greenhouse URL: `https://job-boards.greenhouse.io/{gh_slug}/jobs/{gh_id}`. Do **NOT** use `absolute_url` when it points to a customer-skinned front-end (e.g. `pinterestcareers.com/jobs/?gh_jid=N`, `okta.com/company/careers/opportunity/N`, `samsara.com/company/careers/roles/N`, `zoominfo.com/careers?gh_jid=N`, `collibra.com/.../?gh_jid=N`, `careers.toasttab.com/jobs?gh_jid=N`, `careers.airbnb.com/positions/N`, `coinbase.com/careers/positions/N`, `instacart.careers/job/?gh_jid=N`). These customer front-ends return shells or 403 to bots and cause downstream WebFetch-based verification to wrongly mark the role CLOSED.
|
|
73
|
+
- **`gh_slug`**: the Greenhouse board slug (from the API URL that was fetched).
|
|
74
|
+
- **`gh_id`**: `jobs[].id` from the API response.
|
|
75
|
+
- **`updated_at`**: `jobs[].updated_at` — record for staleness detection (skip if older than 90 days, flag if older than 30).
|
|
76
|
+
c. Accumulate in candidates list (dedup with Level 1). The pipeline.md entry MUST carry `| gh={gh_slug}/{gh_id}` at the end of the metadata so downstream evaluators can fall back to `https://boards-api.greenhouse.io/v1/boards/{gh_slug}/jobs/{gh_id}` when the canonical URL renders as a shell.
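
A minimal sketch of step b, assuming each element of `jobs[]` is a dict with `id`, `title`, and an `updated_at` string that starts with `YYYY-MM-DD`. The function name `to_candidate` and the `staleness` labels (`fresh`, `flag`, `skip`) are hypothetical. The 30- and 90-day thresholds and the canonical URL shape come from the text above.

```python
from datetime import date


def to_candidate(job: dict, gh_slug: str, company: str, today: date) -> dict:
    """Normalise one Greenhouse API job into a candidate record."""
    gh_id = job["id"]
    # Assumption: the timestamp begins with an ISO date; keep only that part.
    updated = date.fromisoformat(str(job["updated_at"])[:10])
    age_days = (today - updated).days
    if age_days > 90:
        staleness = "skip"
    elif age_days > 30:
        staleness = "flag"
    else:
        staleness = "fresh"
    return {
        "title": job["title"],
        "company": company,
        "gh_slug": gh_slug,
        "gh_id": gh_id,
        "updated_at": updated.isoformat(),
        # Always the canonical URL; absolute_url is deliberately ignored.
        "url": f"https://job-boards.greenhouse.io/{gh_slug}/jobs/{gh_id}",
        "staleness": staleness,
    }
```

Passing `today` in explicitly keeps the staleness check deterministic, which makes it easier to re-run a scan against a fixed date.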
|
|
73
77
|
|
|
74
78
|
6. **Level 3 — WebSearch queries** (WebSearch is parallel-safe; batch freely):
|
|
75
79
|
For each query in `search_queries` with `enabled: true`:
|
|
@@ -102,7 +106,7 @@ The levels are additive — all are executed, results are merged and deduplicate
|
|
|
102
106
|
- When a fuzzy match is found but the URL is new, log it as `skipped_repost` (not `skipped_dup`) with a note referencing the original entry number.
|
|
103
107
|
|
|
104
108
|
8. **For each new offer that passes filters**:
|
|
105
|
-
a. Add to `pipeline.md` section "Pending": `- [ ] {url} | {company} | {title}`
|
|
109
|
+
a. Add to `pipeline.md` section "Pending": `- [ ] {url} | {company} | {title}` — append `| gh={gh_slug}/{gh_id}` when the offer came from the Greenhouse API (Level 2) so downstream verification can hit the JSON endpoint.
|
|
106
110
|
b. Record in `scan-history.tsv`: `{url}\t{date}\t{query_name}\t{title}\t{company}\tadded`
|
|
107
111
|
|
|
108
112
|
9. **Offers filtered by title**: record in `scan-history.tsv` with status `skipped_title`
|
|
@@ -137,6 +141,30 @@ https://... 2026-02-10 Greenhouse — SA Junior Dev BigCo skipped_title
|
|
|
137
141
|
https://... 2026-02-10 Ashby — AI PM SA AI OldCo skipped_dup
|
|
138
142
|
```
|
|
139
143
|
|
|
144
|
+
## Structured Output — Required for Downstream Dispatch
|
|
145
|
+
|
|
146
|
+
Scan mode MUST write its ranked candidate list to a file, not just return it in prose. Downstream subagents (evaluators, applyers) must read URLs from this file, not from the scan subagent's return message. This prevents any hallucinated URL or ID from propagating.
|
|
147
|
+
|
|
148
|
+
**File location**: `batch/scan-output-{YYYY-MM-DD}.md`
|
|
149
|
+
|
|
150
|
+
**Format**: one markdown table per scan run, ordered by archetype-fit rank:
|
|
151
|
+
|
|
152
|
+
| rank | company | role | gh_slug | gh_id | url | updated_at |
|
|
153
|
+
|------|---------|------|---------|-------|-----|------------|
|
|
154
|
+
| 1 | Webflow | Lead AI Engineer | webflow | 7689676 | https://job-boards.greenhouse.io/webflow/jobs/7689676 | 2026-04-14 |
|
|
155
|
+
| ... | ... | ... | ... | ... | ... | ... |
|
|
156
|
+
|
|
157
|
+
Every row MUST have:
|
|
158
|
+
- `gh_slug` and `gh_id` copied verbatim from the Greenhouse API response (not reconstructed)
|
|
159
|
+
- `url` in the canonical form `https://job-boards.greenhouse.io/{gh_slug}/jobs/{gh_id}` (matching the suffix in `data/pipeline.md`)
|
|
160
|
+
- `updated_at` in `YYYY-MM-DD` form (the most recent `updated_at` in the API response)
|
|
161
|
+
|
|
162
|
+
The scan subagent's return message MUST:
|
|
163
|
+
- Reference the file path (so orchestrators know where to read)
|
|
164
|
+
- Omit the ranked URL list from prose entirely (summary counts only)
|
|
165
|
+
|
|
166
|
+
**Rationale**: in a prior run, a scan subagent returned correct IDs in `scan-history.tsv` but hallucinated plausible-looking fake IDs in its prose-form top-30 list. The orchestrator trusted prose and dispatched 30 downstream subagents against fake URLs. File-based handoff prevents this class of error.
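
One way the table could be produced, as a sketch only: `render_scan_table` is a hypothetical helper, and it assumes the candidates are already sorted by archetype-fit rank and carry `company`, `role`, `gh_slug`, `gh_id`, and `updated_at` keys. The URL is derived from `gh_slug` and `gh_id` rather than accepted as input, so a row cannot carry a URL that disagrees with its IDs.

```python
def render_scan_table(candidates: list) -> str:
    """Render ranked candidates as the markdown table described above."""
    header = "| rank | company | role | gh_slug | gh_id | url | updated_at |"
    divider = "|------|---------|------|---------|-------|-----|------------|"
    rows = []
    for rank, c in enumerate(candidates, start=1):
        # Build the canonical URL from the IDs instead of trusting a stored one.
        url = f"https://job-boards.greenhouse.io/{c['gh_slug']}/jobs/{c['gh_id']}"
        rows.append(
            f"| {rank} | {c['company']} | {c['role']} | {c['gh_slug']} "
            f"| {c['gh_id']} | {url} | {c['updated_at']} |"
        )
    return "\n".join([header, divider, *rows]) + "\n"
```

The sketch does not escape `|` characters inside role titles; a real writer would need to handle that before the file is parsed downstream.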
|
|
167
|
+
|
|
140
168
|
## Output Summary
|
|
141
169
|
|
|
142
170
|
```
|
|
@@ -148,12 +176,27 @@ Filtered by title: N relevant
|
|
|
148
176
|
Duplicates: N (already evaluated or in pipeline)
|
|
149
177
|
New added to pipeline.md: N
|
|
150
178
|
|
|
151
|
-
|
|
152
|
-
|
|
153
|
-
|
|
154
|
-
|
|
179
|
+
NEXT STEP RECOMMENDATION:
|
|
180
|
+
- Structured candidate list written to: batch/scan-output-{YYYY-MM-DD}.md
|
|
181
|
+
- Downstream subagents MUST read URLs from that file, not from this return message
|
|
182
|
+
- Run /job-forge pipeline to evaluate the new offers.
|
|
155
183
|
```
|
|
156
184
|
|
|
185
|
+
## Verify Before Marking CLOSED (downstream rule)
|
|
186
|
+
|
|
187
|
+
**DO NOT mark a Greenhouse offer CLOSED based on a WebFetch/Geometra result alone.** Customer-skinned careers pages (`pinterestcareers.com`, `okta.com`, `samsara.com`, `zoominfo.com`, `collibra.com`, `careers.toasttab.com`, `careers.airbnb.com`, `coinbase.com`, `instacart.careers`, etc.) serve bot-hostile shells: a 403, a navbar-only response, or a client-side-only render. WebFetch sees "no JD" and misclassifies the role as CLOSED.
|
|
188
|
+
|
|
189
|
+
**Correct verification order for any Greenhouse-sourced URL** (identified by a `| gh={slug}/{id}` suffix in `pipeline.md` or a `boards-api.greenhouse.io` / `job-boards.greenhouse.io` / `boards.greenhouse.io` host):
|
|
190
|
+
|
|
191
|
+
1. Try `https://boards-api.greenhouse.io/v1/boards/{slug}/jobs/{id}`. This is the authoritative source.
|
|
192
|
+
- **200 + JSON with `title` and `content`** → offer is LIVE. Use the JSON content as the JD. Do not mark CLOSED.
|
|
193
|
+
- **404** → offer is genuinely closed. Mark CLOSED.
|
|
194
|
+
- **Other non-2xx** → treat as transient (network/rate-limit); retry once. If still failing, mark `**Verification: unconfirmed**` and continue evaluation from whatever text is available. Do NOT mark CLOSED.
|
|
195
|
+
2. Only then fall back to WebFetch of the canonical `job-boards.greenhouse.io/{slug}/jobs/{id}` URL.
|
|
196
|
+
3. Only then fall back to Geometra on the same canonical URL.
|
|
197
|
+
|
|
198
|
+
**Rule of thumb:** Greenhouse postings with valid `gh_slug`/`gh_id` should be verified via the API first. A WebFetch failure on a customer-skinned domain is NOT evidence the role is closed.
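
The decision table in step 1 can be sketched as a small pure function. The name `classify_api_result` and the returned labels are hypothetical. The handling of a 200 response that lacks `title` or `content` is an assumption, since the text above does not specify it; here it is treated like a transient failure and never as CLOSED.

```python
from typing import Optional


def classify_api_result(status: int, body: Optional[dict],
                        already_retried: bool = False) -> str:
    """Map a boards-api response to LIVE, CLOSED, RETRY, or UNCONFIRMED."""
    if status == 200 and body and body.get("title") and body.get("content"):
        return "LIVE"  # use body["content"] as the JD
    if status == 404:
        return "CLOSED"  # the only outcome that justifies a CLOSED mark
    # Anything else (403, 429, 5xx, or an incomplete 200) is treated as
    # transient: retry once, then record the offer as unconfirmed.
    return "UNCONFIRMED" if already_retried else "RETRY"
```

Keeping the network call outside this function means the same logic can be exercised without fetching anything.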
|
|
199
|
+
|
|
157
200
|
## Update careers_url
|
|
158
201
|
|
|
159
202
|
Each company in `tracked_companies` MUST have a `careers_url` — the direct URL to its job listings page. The stored URL avoids searching for it every time.
|