job-forge 2.14.21 → 2.14.23
- package/.cursor/rules/main.mdc +11 -5
- package/.opencode/skills/job-forge.md +10 -0
- package/AGENTS.md +11 -5
- package/CLAUDE.md +11 -5
- package/README.md +7 -3
- package/bin/create-job-forge.mjs +7 -0
- package/bin/job-forge.mjs +70 -0
- package/docs/ARCHITECTURE.md +6 -2
- package/docs/CUSTOMIZATION.md +4 -0
- package/docs/SETUP.md +4 -0
- package/iso/commands/job-forge.md +10 -0
- package/iso/instructions.md +11 -5
- package/lib/jobforge-cache.mjs +105 -0
- package/lib/jobforge-index.mjs +92 -0
- package/modes/auto-pipeline.md +3 -1
- package/package.json +20 -2
- package/scripts/cache.mjs +313 -0
- package/scripts/index.mjs +210 -0
- package/templates/capabilities.json +5 -1
- package/templates/index.json +144 -0
- package/verify-pipeline.mjs +20 -0
package/.cursor/rules/main.mdc
CHANGED
@@ -12,7 +12,7 @@ AI-powered job search pipeline: scans portals, evaluates offers, generates CVs v
 - [H1] Max 2 parallel `task` dispatches per message. For N jobs, run `ceil(N/2)` sequential rounds of 2. A round is not complete until both subagents return a final outcome (`APPLIED`, `APPLY FAILED`, `SKIP`, `Discarded`, or a written TSV path). A `task` tool result that only gives a session id / title is a launch acknowledgement, not completion. Applies in all modes, for all user phrasings ("urgent", "apply to 10 jobs now").
 why: each subagent requires post-cleanup and racing more than 2 reliably loses at least one result. On 2026-04-25 the orchestrator launched round 2 while round 1 had only returned task ids, leaving four application subagents in flight and losing two provider recoveries
 
-- [H2] Max 1 application per company+role. Before every `apply` dispatch, grep all four sources for the URL and for `company+role`: `data/pipeline.md`, all `data/applications/*.md` day files, `batch/tracker-additions/*.tsv`, `batch/tracker-additions/merged/*.tsv`. If `.jobforge-ledger/events.jsonl` exists, `npx job-forge ledger:has --company "..." --role "..." --status Applied` may be used as a fast prefilter; a match is enough to drop that duplicate before dispatch. For candidates not rejected by the ledger, the four-source grep is still mandatory. If any source shows APPLIED / Applied, skip the dispatch and pick a replacement from the remaining candidate list. Do not count duplicates toward a requested "apply to N jobs" total, and do not delegate obvious duplicates just so a subagent can return SKIP.
+- [H2] Max 1 application per company+role. Before every `apply` dispatch, grep all four sources for the URL and for `company+role`: `data/pipeline.md`, all `data/applications/*.md` day files, `batch/tracker-additions/*.tsv`, `batch/tracker-additions/merged/*.tsv`. `npx job-forge index:has --key "company-role:..."` may be used first as a fast local artifact prefilter; it rebuilds `.jobforge-index.json` on demand from `templates/index.json`, and a company+role hit is enough to drop an obvious duplicate before dispatch. If `.jobforge-ledger/events.jsonl` exists, `npx job-forge ledger:has --company "..." --role "..." --status Applied` may also be used as a fast prefilter; a match is enough to drop that duplicate before dispatch. For candidates not rejected by the index or ledger, the four-source grep is still mandatory. If any source shows APPLIED / Applied, skip the dispatch and pick a replacement from the remaining candidate list. Do not count duplicates toward a requested "apply to N jobs" total, and do not delegate obvious duplicates just so a subagent can return SKIP.
 why: 2026-04 same-day batch collision — when two batches target the same role, `npx job-forge merge` updates the existing day-file row rather than appending, so grepping day files alone misses earlier-batch applies; merged/*.tsv is the only place the breadcrumb remains
 
 - [H3] Before every batch of `task` dispatches that will use Geometra, call `geometra_list_sessions` then `geometra_disconnect({closeBrowser: true})`. Every round, no exceptions. Name this cleanup as an explicit "step 0" in your first-response plan for any multi-apply request — it is the most frequently skipped guardrail in practice, and skipping it produces cascade "Not connected" failures on the next dispatch.
@@ -24,13 +24,13 @@ AI-powered job search pipeline: scans portals, evaluates offers, generates CVs v
 - [H5] Re-dispatch the same company only AFTER the previous subagent returns. Never fire the same `task` twice while the first is still in flight.
 why: two in-flight subagents for the same URL race on Geometra sessions and on tracker TSV writes, corrupting state and sometimes double-submitting
 
-- [H5b] Do not use `task` to poll task status. If OpenCode returns a task/session id without a final result, record the id, stop dispatching new rounds, and tell the user the round is still in flight. When the user asks to check later, inspect authoritative files (`batch/tracker-additions/*.tsv`, `batch/tracker-additions/merged/*.tsv`, day files, `.jobforge-ledger/events.jsonl`, or `iso-trace`) rather than spawning a "check task status" subagent.
+- [H5b] Do not use `task` to poll task status. If OpenCode returns a task/session id without a final result, record the id, stop dispatching new rounds, and tell the user the round is still in flight. When the user asks to check later, inspect authoritative files (`batch/tracker-additions/*.tsv`, `batch/tracker-additions/merged/*.tsv`, day files, `.jobforge-ledger/events.jsonl`, `.jobforge-index.json`, or `iso-trace`) rather than spawning a "check task status" subagent.
 why: OpenCode status prompts can be delivered into the target subagent as a new user message; a 2026-04-25 trace caused a subagent to call `task` recursively instead of finishing the application
 
 - [H6] Application outcomes flow through `batch/tracker-additions/*.tsv`, not `data/pipeline.md`. After any multi-apply run, the orchestrator MUST run `npx job-forge merge` then `npx job-forge verify` before ending the session.
 why: `pipeline.md` is the URL inbox (`[ ]` pending → `[x]` processed); `data/applications/YYYY-MM-DD.md` is the outcome log; the TSV pathway is the only safe bridge because `merge` handles column order and duplicate detection
 
-- [H7] Load-bearing facts passed to downstream subagents must originate from a file, not from prior subagent prose. Authoritative sources: `data/pipeline.md`, `data/scan-history.tsv`, `batch/scan-output-*.md`, `reports/{num}-*.md` with `**URL:**` / `**Score:**` headers, `batch/tracker-additions/*.tsv
+- [H7] Load-bearing facts passed to downstream subagents must originate from a file, not from prior subagent prose. Authoritative sources: `data/pipeline.md`, `data/scan-history.tsv`, `batch/scan-output-*.md`, `reports/{num}-*.md` with `**URL:**` / `**Score:**` headers, `batch/tracker-additions/*.tsv`, cached JD content returned by `npx job-forge cache:get --url ...`, and source path/line pointers returned by `npx job-forge index:query ...`.
 why: 2026-04-18 scan subagent returned 30 fabricated Greenhouse IDs in prose (plausible-looking, non-existent); orchestrator dispatched 30 downstream subagents that all 404'd. Subagents can hallucinate IDs, scores, and confirmation text — round-trip through a file or don't trust the value
 
 - [H8] Never paste proxy values from `config/profile.yml` into `task` prompts, status text, or summaries. If a proxy is configured, tell the subagent exactly: "Proxy is configured; read `config/profile.yml` and pass its top-level `proxy:` object to every `geometra_connect` call." Do not transcribe `server`, `username`, `password`, or `bypass`, even if you just read them from disk.
@@ -71,12 +71,18 @@ AI-powered job search pipeline: scans portals, evaluates offers, generates CVs v
 - [D11] Treat `templates/context.json` as the source of truth for mode/reference context bundles. Use `npx job-forge context:plan <mode>` or `npx job-forge context:check <mode>` when changing or validating what a mode loads; do not paste the full context matrix into prompts.
 why: deterministic context bundles prevent reference-file drift and accidental token bloat without adding MCP/tool-schema tokens
 
+- [D12] Use `job-forge cache:*` for deterministic local artifact reuse when available. For URL inputs, check `npx job-forge cache:has --url "..."` / `cache:get` before browser or network JD fetches; after a successful fetch, store the exact JD text with `npx job-forge cache:put --url "..." --ttl 14d --input @file` when it is already on disk.
+why: `iso-cache` is not an MCP and adds no prompt/tool-schema tokens; it avoids repeated JD fetch/render passes and lets future sessions reuse stable content from `.jobforge-cache/`
+
+- [D13] Use `job-forge index:*` for deterministic artifact lookup when available. `index:has` and `index:query` rebuild `.jobforge-index.json` from `templates/index.json` on demand, covering reports, tracker day files, tracker TSVs, pipeline URLs, scan history, and ledger events without loading those growing files into prompt context.
+why: `iso-index` is not an MCP and adds no prompt/tool-schema tokens; it gives agents compact file/line pointers and duplicate prefilters before expensive reads or browser dispatches
+
 ## Procedure
 
 1. Check `cv.md`, `profile.yml`, and `portals.yml`; onboard if any file is missing.
 2. Pick and name the mode from **Routing** [D6]. No match → ask; do not guess.
-3. Read the active mode file [D3]
-4. Prepare Geometra dispatches: cleanup [H3], ledger prefilter when
+3. Read the active mode file [D3]. Use context bundle checks when changing context loads [D11]. Check cached artifacts before URL/JD refetches [D12]. Use artifact index lookups before broad file reads when they can answer the question [D13]. Decide inline vs delegated work [D1].
+4. Prepare Geometra dispatches: cleanup [H3], index/ledger prefilter when useful [D8, D13], dedupe [H2], location filter [D5], routing [D2, D10], proxy prompt hygiene [H8].
 5. Dispatch at most 2 tasks per round [H1]; wait for final outcomes, not just task ids [H5b].
 6. Keep multi-job form-filling out of the orchestrator [H4].
 7. Cross-check subagent facts against authoritative files [H7].
@@ -73,6 +73,11 @@ Local workflow ledger (terminal, outside opencode):
 npx job-forge ledger:status # .jobforge-ledger/events.jsonl summary
 npx job-forge ledger:has --company "Acme" --role "Staff Engineer" --status Applied
 
+Local artifact index (terminal, outside opencode):
+npx job-forge index:status # .jobforge-index.json summary
+npx job-forge index:has --key "company-role:acme:staff-engineer"
+npx job-forge index:query "acme"
+
 Artifact contracts (terminal, outside opencode):
 npx iso-contract explain jobforge.tracker-row --contracts templates/contracts.json
 npx job-forge tracker-line ... --write # renders + validates tracker TSV locally
@@ -158,6 +163,11 @@ Step 1 — Enumerate candidates
 - Build ordered list: candidates = [job_1, job_2, ..., job_N]
 
 Step 2 — Dedup against already-applied
+- Run npx job-forge index:has --key "company-role:<company-slug>:<role-slug>"
+as a fast local artifact prefilter when company+role is known. It rebuilds
+.jobforge-index.json on demand from templates/index.json. A hit means the
+role has already appeared in tracker files or tracker TSVs and can be
+dropped before dispatch.
 - If .jobforge-ledger/events.jsonl exists, use npx job-forge ledger:has as a
 fast prefilter for obvious company+role Applied duplicates. A ledger match
 can be dropped before dispatch without loading tracker files into context.
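The dedup order these hunks describe (index prefilter, then ledger prefilter, then the mandatory four-source grep) can be sketched as follows. This is a minimal illustration and not JobForge's implementation: `slugify`, `companyRoleKey`, and `filterCandidates` are hypothetical names, the slug rule (lowercase, hyphen-separated) is only inferred from the `company-role:acme:staff-engineer` example, and the three checks are injected as plain functions standing in for `index:has`, `ledger:has`, and the grep.

```javascript
// Hypothetical slug rule inferred from "company-role:acme:staff-engineer";
// the normalization job-forge actually applies may differ.
function slugify(text) {
  return text
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // collapse runs of non-alphanumerics into one hyphen
    .replace(/^-+|-+$/g, "");    // trim leading and trailing hyphens
}

function companyRoleKey(company, role) {
  return `company-role:${slugify(company)}:${slugify(role)}`;
}

// checks.indexHas(key), checks.ledgerHas(company, role), and
// checks.grepSources(candidate) are stand-ins for `index:has`, `ledger:has`,
// and the four-source grep. Each returns true for an already-applied role.
function filterCandidates(candidates, checks) {
  const kept = [];
  const dropped = [];
  for (const c of candidates) {
    const key = companyRoleKey(c.company, c.role);
    if (checks.indexHas(key)) {
      dropped.push({ ...c, reason: "index" });
    } else if (checks.ledgerHas(c.company, c.role)) {
      dropped.push({ ...c, reason: "ledger" });
    } else if (checks.grepSources(c)) {
      // The grep still runs for every candidate the prefilters did not reject.
      dropped.push({ ...c, reason: "grep" });
    } else {
      kept.push(c);
    }
  }
  return { kept, dropped };
}
```

Dropped candidates are returned rather than discarded so a caller can pick replacements from the remaining list and avoid counting duplicates toward a requested total.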
package/AGENTS.md
CHANGED
@@ -7,7 +7,7 @@ AI-powered job search pipeline: scans portals, evaluates offers, generates CVs v
 - [H1] Max 2 parallel `task` dispatches per message. For N jobs, run `ceil(N/2)` sequential rounds of 2. A round is not complete until both subagents return a final outcome (`APPLIED`, `APPLY FAILED`, `SKIP`, `Discarded`, or a written TSV path). A `task` tool result that only gives a session id / title is a launch acknowledgement, not completion. Applies in all modes, for all user phrasings ("urgent", "apply to 10 jobs now").
 why: each subagent requires post-cleanup and racing more than 2 reliably loses at least one result. On 2026-04-25 the orchestrator launched round 2 while round 1 had only returned task ids, leaving four application subagents in flight and losing two provider recoveries
 
-- [H2] Max 1 application per company+role. Before every `apply` dispatch, grep all four sources for the URL and for `company+role`: `data/pipeline.md`, all `data/applications/*.md` day files, `batch/tracker-additions/*.tsv`, `batch/tracker-additions/merged/*.tsv`. If `.jobforge-ledger/events.jsonl` exists, `npx job-forge ledger:has --company "..." --role "..." --status Applied` may be used as a fast prefilter; a match is enough to drop that duplicate before dispatch. For candidates not rejected by the ledger, the four-source grep is still mandatory. If any source shows APPLIED / Applied, skip the dispatch and pick a replacement from the remaining candidate list. Do not count duplicates toward a requested "apply to N jobs" total, and do not delegate obvious duplicates just so a subagent can return SKIP.
+- [H2] Max 1 application per company+role. Before every `apply` dispatch, grep all four sources for the URL and for `company+role`: `data/pipeline.md`, all `data/applications/*.md` day files, `batch/tracker-additions/*.tsv`, `batch/tracker-additions/merged/*.tsv`. `npx job-forge index:has --key "company-role:..."` may be used first as a fast local artifact prefilter; it rebuilds `.jobforge-index.json` on demand from `templates/index.json`, and a company+role hit is enough to drop an obvious duplicate before dispatch. If `.jobforge-ledger/events.jsonl` exists, `npx job-forge ledger:has --company "..." --role "..." --status Applied` may also be used as a fast prefilter; a match is enough to drop that duplicate before dispatch. For candidates not rejected by the index or ledger, the four-source grep is still mandatory. If any source shows APPLIED / Applied, skip the dispatch and pick a replacement from the remaining candidate list. Do not count duplicates toward a requested "apply to N jobs" total, and do not delegate obvious duplicates just so a subagent can return SKIP.
 why: 2026-04 same-day batch collision — when two batches target the same role, `npx job-forge merge` updates the existing day-file row rather than appending, so grepping day files alone misses earlier-batch applies; merged/*.tsv is the only place the breadcrumb remains
 
 - [H3] Before every batch of `task` dispatches that will use Geometra, call `geometra_list_sessions` then `geometra_disconnect({closeBrowser: true})`. Every round, no exceptions. Name this cleanup as an explicit "step 0" in your first-response plan for any multi-apply request — it is the most frequently skipped guardrail in practice, and skipping it produces cascade "Not connected" failures on the next dispatch.
@@ -19,13 +19,13 @@ AI-powered job search pipeline: scans portals, evaluates offers, generates CVs v
 - [H5] Re-dispatch the same company only AFTER the previous subagent returns. Never fire the same `task` twice while the first is still in flight.
 why: two in-flight subagents for the same URL race on Geometra sessions and on tracker TSV writes, corrupting state and sometimes double-submitting
 
-- [H5b] Do not use `task` to poll task status. If OpenCode returns a task/session id without a final result, record the id, stop dispatching new rounds, and tell the user the round is still in flight. When the user asks to check later, inspect authoritative files (`batch/tracker-additions/*.tsv`, `batch/tracker-additions/merged/*.tsv`, day files, `.jobforge-ledger/events.jsonl`, or `iso-trace`) rather than spawning a "check task status" subagent.
+- [H5b] Do not use `task` to poll task status. If OpenCode returns a task/session id without a final result, record the id, stop dispatching new rounds, and tell the user the round is still in flight. When the user asks to check later, inspect authoritative files (`batch/tracker-additions/*.tsv`, `batch/tracker-additions/merged/*.tsv`, day files, `.jobforge-ledger/events.jsonl`, `.jobforge-index.json`, or `iso-trace`) rather than spawning a "check task status" subagent.
 why: OpenCode status prompts can be delivered into the target subagent as a new user message; a 2026-04-25 trace caused a subagent to call `task` recursively instead of finishing the application
 
 - [H6] Application outcomes flow through `batch/tracker-additions/*.tsv`, not `data/pipeline.md`. After any multi-apply run, the orchestrator MUST run `npx job-forge merge` then `npx job-forge verify` before ending the session.
 why: `pipeline.md` is the URL inbox (`[ ]` pending → `[x]` processed); `data/applications/YYYY-MM-DD.md` is the outcome log; the TSV pathway is the only safe bridge because `merge` handles column order and duplicate detection
 
-- [H7] Load-bearing facts passed to downstream subagents must originate from a file, not from prior subagent prose. Authoritative sources: `data/pipeline.md`, `data/scan-history.tsv`, `batch/scan-output-*.md`, `reports/{num}-*.md` with `**URL:**` / `**Score:**` headers, `batch/tracker-additions/*.tsv
+- [H7] Load-bearing facts passed to downstream subagents must originate from a file, not from prior subagent prose. Authoritative sources: `data/pipeline.md`, `data/scan-history.tsv`, `batch/scan-output-*.md`, `reports/{num}-*.md` with `**URL:**` / `**Score:**` headers, `batch/tracker-additions/*.tsv`, cached JD content returned by `npx job-forge cache:get --url ...`, and source path/line pointers returned by `npx job-forge index:query ...`.
 why: 2026-04-18 scan subagent returned 30 fabricated Greenhouse IDs in prose (plausible-looking, non-existent); orchestrator dispatched 30 downstream subagents that all 404'd. Subagents can hallucinate IDs, scores, and confirmation text — round-trip through a file or don't trust the value
 
 - [H8] Never paste proxy values from `config/profile.yml` into `task` prompts, status text, or summaries. If a proxy is configured, tell the subagent exactly: "Proxy is configured; read `config/profile.yml` and pass its top-level `proxy:` object to every `geometra_connect` call." Do not transcribe `server`, `username`, `password`, or `bypass`, even if you just read them from disk.
@@ -66,12 +66,18 @@ AI-powered job search pipeline: scans portals, evaluates offers, generates CVs v
 - [D11] Treat `templates/context.json` as the source of truth for mode/reference context bundles. Use `npx job-forge context:plan <mode>` or `npx job-forge context:check <mode>` when changing or validating what a mode loads; do not paste the full context matrix into prompts.
 why: deterministic context bundles prevent reference-file drift and accidental token bloat without adding MCP/tool-schema tokens
 
+- [D12] Use `job-forge cache:*` for deterministic local artifact reuse when available. For URL inputs, check `npx job-forge cache:has --url "..."` / `cache:get` before browser or network JD fetches; after a successful fetch, store the exact JD text with `npx job-forge cache:put --url "..." --ttl 14d --input @file` when it is already on disk.
+why: `iso-cache` is not an MCP and adds no prompt/tool-schema tokens; it avoids repeated JD fetch/render passes and lets future sessions reuse stable content from `.jobforge-cache/`
+
+- [D13] Use `job-forge index:*` for deterministic artifact lookup when available. `index:has` and `index:query` rebuild `.jobforge-index.json` from `templates/index.json` on demand, covering reports, tracker day files, tracker TSVs, pipeline URLs, scan history, and ledger events without loading those growing files into prompt context.
+why: `iso-index` is not an MCP and adds no prompt/tool-schema tokens; it gives agents compact file/line pointers and duplicate prefilters before expensive reads or browser dispatches
+
 ## Procedure
 
 1. Check `cv.md`, `profile.yml`, and `portals.yml`; onboard if any file is missing.
 2. Pick and name the mode from **Routing** [D6]. No match → ask; do not guess.
-3. Read the active mode file [D3]
-4. Prepare Geometra dispatches: cleanup [H3], ledger prefilter when
+3. Read the active mode file [D3]. Use context bundle checks when changing context loads [D11]. Check cached artifacts before URL/JD refetches [D12]. Use artifact index lookups before broad file reads when they can answer the question [D13]. Decide inline vs delegated work [D1].
+4. Prepare Geometra dispatches: cleanup [H3], index/ledger prefilter when useful [D8, D13], dedupe [H2], location filter [D5], routing [D2, D10], proxy prompt hygiene [H8].
 5. Dispatch at most 2 tasks per round [H1]; wait for final outcomes, not just task ids [H5b].
 6. Keep multi-job form-filling out of the orchestrator [H4].
 7. Cross-check subagent facts against authoritative files [H7].
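Rule [H1] and procedure step 5 amount to chunking the candidate list into rounds of at most two and finishing each round before starting the next. A minimal sketch, assuming hypothetical helpers `planRounds`, `isFinalOutcome`, and `roundComplete` (none of them is part of job-forge); the "written TSV path" outcome is approximated here by a `.tsv` suffix test, which is an assumption of this sketch:

```javascript
const MAX_PARALLEL = 2; // [H1]: at most 2 `task` dispatches per round

// Split N jobs into ceil(N / 2) sequential rounds of at most 2.
function planRounds(jobs, size = MAX_PARALLEL) {
  const rounds = [];
  for (let i = 0; i < jobs.length; i += size) {
    rounds.push(jobs.slice(i, i + size));
  }
  return rounds;
}

const FINAL_OUTCOMES = new Set(["APPLIED", "APPLY FAILED", "SKIP", "Discarded"]);

// A bare session id or title is only a launch acknowledgement, not completion.
function isFinalOutcome(result) {
  if (typeof result !== "string") return false;
  return FINAL_OUTCOMES.has(result) || result.endsWith(".tsv");
}

// A round is complete only when every subagent in it reported a final outcome.
function roundComplete(results) {
  return results.length > 0 && results.every(isFinalOutcome);
}
```

An orchestrator following [H5b] would stop dispatching as soon as `roundComplete` is false for the current round, record the returned ids, and report the round as still in flight.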
package/CLAUDE.md
CHANGED
@@ -7,7 +7,7 @@ AI-powered job search pipeline: scans portals, evaluates offers, generates CVs v
 - [H1] Max 2 parallel `task` dispatches per message. For N jobs, run `ceil(N/2)` sequential rounds of 2. A round is not complete until both subagents return a final outcome (`APPLIED`, `APPLY FAILED`, `SKIP`, `Discarded`, or a written TSV path). A `task` tool result that only gives a session id / title is a launch acknowledgement, not completion. Applies in all modes, for all user phrasings ("urgent", "apply to 10 jobs now").
 why: each subagent requires post-cleanup and racing more than 2 reliably loses at least one result. On 2026-04-25 the orchestrator launched round 2 while round 1 had only returned task ids, leaving four application subagents in flight and losing two provider recoveries
 
-- [H2] Max 1 application per company+role. Before every `apply` dispatch, grep all four sources for the URL and for `company+role`: `data/pipeline.md`, all `data/applications/*.md` day files, `batch/tracker-additions/*.tsv`, `batch/tracker-additions/merged/*.tsv`. If `.jobforge-ledger/events.jsonl` exists, `npx job-forge ledger:has --company "..." --role "..." --status Applied` may be used as a fast prefilter; a match is enough to drop that duplicate before dispatch. For candidates not rejected by the ledger, the four-source grep is still mandatory. If any source shows APPLIED / Applied, skip the dispatch and pick a replacement from the remaining candidate list. Do not count duplicates toward a requested "apply to N jobs" total, and do not delegate obvious duplicates just so a subagent can return SKIP.
+- [H2] Max 1 application per company+role. Before every `apply` dispatch, grep all four sources for the URL and for `company+role`: `data/pipeline.md`, all `data/applications/*.md` day files, `batch/tracker-additions/*.tsv`, `batch/tracker-additions/merged/*.tsv`. `npx job-forge index:has --key "company-role:..."` may be used first as a fast local artifact prefilter; it rebuilds `.jobforge-index.json` on demand from `templates/index.json`, and a company+role hit is enough to drop an obvious duplicate before dispatch. If `.jobforge-ledger/events.jsonl` exists, `npx job-forge ledger:has --company "..." --role "..." --status Applied` may also be used as a fast prefilter; a match is enough to drop that duplicate before dispatch. For candidates not rejected by the index or ledger, the four-source grep is still mandatory. If any source shows APPLIED / Applied, skip the dispatch and pick a replacement from the remaining candidate list. Do not count duplicates toward a requested "apply to N jobs" total, and do not delegate obvious duplicates just so a subagent can return SKIP.
 why: 2026-04 same-day batch collision — when two batches target the same role, `npx job-forge merge` updates the existing day-file row rather than appending, so grepping day files alone misses earlier-batch applies; merged/*.tsv is the only place the breadcrumb remains
 
 - [H3] Before every batch of `task` dispatches that will use Geometra, call `geometra_list_sessions` then `geometra_disconnect({closeBrowser: true})`. Every round, no exceptions. Name this cleanup as an explicit "step 0" in your first-response plan for any multi-apply request — it is the most frequently skipped guardrail in practice, and skipping it produces cascade "Not connected" failures on the next dispatch.
@@ -19,13 +19,13 @@ AI-powered job search pipeline: scans portals, evaluates offers, generates CVs v
 - [H5] Re-dispatch the same company only AFTER the previous subagent returns. Never fire the same `task` twice while the first is still in flight.
 why: two in-flight subagents for the same URL race on Geometra sessions and on tracker TSV writes, corrupting state and sometimes double-submitting
 
-- [H5b] Do not use `task` to poll task status. If OpenCode returns a task/session id without a final result, record the id, stop dispatching new rounds, and tell the user the round is still in flight. When the user asks to check later, inspect authoritative files (`batch/tracker-additions/*.tsv`, `batch/tracker-additions/merged/*.tsv`, day files, `.jobforge-ledger/events.jsonl`, or `iso-trace`) rather than spawning a "check task status" subagent.
+- [H5b] Do not use `task` to poll task status. If OpenCode returns a task/session id without a final result, record the id, stop dispatching new rounds, and tell the user the round is still in flight. When the user asks to check later, inspect authoritative files (`batch/tracker-additions/*.tsv`, `batch/tracker-additions/merged/*.tsv`, day files, `.jobforge-ledger/events.jsonl`, `.jobforge-index.json`, or `iso-trace`) rather than spawning a "check task status" subagent.
 why: OpenCode status prompts can be delivered into the target subagent as a new user message; a 2026-04-25 trace caused a subagent to call `task` recursively instead of finishing the application
 
 - [H6] Application outcomes flow through `batch/tracker-additions/*.tsv`, not `data/pipeline.md`. After any multi-apply run, the orchestrator MUST run `npx job-forge merge` then `npx job-forge verify` before ending the session.
 why: `pipeline.md` is the URL inbox (`[ ]` pending → `[x]` processed); `data/applications/YYYY-MM-DD.md` is the outcome log; the TSV pathway is the only safe bridge because `merge` handles column order and duplicate detection
 
-- [H7] Load-bearing facts passed to downstream subagents must originate from a file, not from prior subagent prose. Authoritative sources: `data/pipeline.md`, `data/scan-history.tsv`, `batch/scan-output-*.md`, `reports/{num}-*.md` with `**URL:**` / `**Score:**` headers, `batch/tracker-additions/*.tsv
+- [H7] Load-bearing facts passed to downstream subagents must originate from a file, not from prior subagent prose. Authoritative sources: `data/pipeline.md`, `data/scan-history.tsv`, `batch/scan-output-*.md`, `reports/{num}-*.md` with `**URL:**` / `**Score:**` headers, `batch/tracker-additions/*.tsv`, cached JD content returned by `npx job-forge cache:get --url ...`, and source path/line pointers returned by `npx job-forge index:query ...`.
 why: 2026-04-18 scan subagent returned 30 fabricated Greenhouse IDs in prose (plausible-looking, non-existent); orchestrator dispatched 30 downstream subagents that all 404'd. Subagents can hallucinate IDs, scores, and confirmation text — round-trip through a file or don't trust the value
 
 - [H8] Never paste proxy values from `config/profile.yml` into `task` prompts, status text, or summaries. If a proxy is configured, tell the subagent exactly: "Proxy is configured; read `config/profile.yml` and pass its top-level `proxy:` object to every `geometra_connect` call." Do not transcribe `server`, `username`, `password`, or `bypass`, even if you just read them from disk.
@@ -66,12 +66,18 @@ AI-powered job search pipeline: scans portals, evaluates offers, generates CVs v
|
|
|
66
66
|
- [D11] Treat `templates/context.json` as the source of truth for mode/reference context bundles. Use `npx job-forge context:plan <mode>` or `npx job-forge context:check <mode>` when changing or validating what a mode loads; do not paste the full context matrix into prompts.
|
|
67
67
|
why: deterministic context bundles prevent reference-file drift and accidental token bloat without adding MCP/tool-schema tokens
|
|
68
68
|
|
|
69
|
+
- [D12] Use `job-forge cache:*` for deterministic local artifact reuse when available. For URL inputs, check `npx job-forge cache:has --url "..."` / `cache:get` before browser or network JD fetches; after a successful fetch, store the exact JD text with `npx job-forge cache:put --url "..." --ttl 14d --input @file` when it is already on disk.
|
|
70
|
+
why: `iso-cache` is not an MCP and adds no prompt/tool-schema tokens; it avoids repeated JD fetch/render passes and lets future sessions reuse stable content from `.jobforge-cache/`
|
|
71
|
+
|
|
72
|
+
- [D13] Use `job-forge index:*` for deterministic artifact lookup when available. `index:has` and `index:query` rebuild `.jobforge-index.json` from `templates/index.json` on demand, covering reports, tracker day files, tracker TSVs, pipeline URLs, scan history, and ledger events without loading those growing files into prompt context.
|
|
73
|
+
why: `iso-index` is not an MCP and adds no prompt/tool-schema tokens; it gives agents compact file/line pointers and duplicate prefilters before expensive reads or browser dispatches
|
|
74
|
+
|
|
69
75
|
## Procedure
|
|
70
76
|
|
|
71
77
|
1. Check `cv.md`, `profile.yml`, and `portals.yml`; onboard if any file is missing.
|
|
72
78
|
2. Pick and name the mode from **Routing** [D6]. No match → ask; do not guess.
|
|
73
|
-
3. Read the active mode file [D3]
|
|
74
|
-
4. Prepare Geometra dispatches: cleanup [H3], ledger prefilter when
|
|
79
|
+
3. Read the active mode file [D3]. Use context bundle checks when changing context loads [D11]. Check cached artifacts before URL/JD refetches [D12]. Use artifact index lookups before broad file reads when they can answer the question [D13]. Decide inline vs delegated work [D1].
|
|
80
|
+
4. Prepare Geometra dispatches: cleanup [H3], index/ledger prefilter when useful [D8, D13], dedupe [H2], location filter [D5], routing [D2, D10], proxy prompt hygiene [H8].
|
|
75
81
|
5. Dispatch at most 2 tasks per round [H1]; wait for final outcomes, not just task ids [H5b].
|
|
76
82
|
6. Keep multi-job form-filling out of the orchestrator [H4].
|
|
77
83
|
7. Cross-check subagent facts against authoritative files [H7].
|
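Reviewer sketch for [D12] and procedure step 3: the rule describes a check-cache-then-fetch order. The snippet below illustrates only that control flow. `runCli` and `fetchJd` are hypothetical injected functions, and the sketch assumes that `cache:has` signals a hit with exit status 0 and that `cache:get` prints the cached text to stdout; the diff does not confirm either behavior.

```javascript
// Sketch only, not part of job-forge.
// `runCli(args)` is a hypothetical wrapper that runs `npx job-forge <args>`
// and returns { status, stdout }. `fetchJd(url)` is a hypothetical
// browser/network fetch.
// Assumptions (unconfirmed by the diff): `cache:has` exits 0 on a hit and
// `cache:get` writes the cached JD text to stdout.
function getJobDescription(url, { runCli, fetchJd }) {
  const has = runCli(['cache:has', '--url', url]);
  if (has.status === 0) {
    const got = runCli(['cache:get', '--url', url]);
    if (got.status === 0) return got.stdout;
  }
  // Cache miss or unreadable entry: fall back to the expensive fetch.
  // The return value may be a Promise if `fetchJd` is async.
  return fetchJd(url);
}
```

Injecting the runner keeps the ordering testable without the CLI installed. Storing the fetched text afterwards would go through `cache:put --url ... --input @file`, as [D12] states, once the text is on disk.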
package/README.md
CHANGED
|
@@ -31,7 +31,7 @@ The scaffolded `opencode.json` already has three MCPs wired up — they launch a
|
|
|
31
31
|
- **Gmail** — reads replies from recruiters
|
|
32
32
|
- **state-trace** — typed working memory for cross-session context (resumed batches, recent decisions, repeated portal quirks). Install once with `python3 -m pip install "state-trace[mcp]"`; the MCP command is `state-trace-mcp`.
|
|
33
33
|
|
|
34
|
-
JobForge also keeps MCP-free local workflow state: `templates/contracts.json` defines tracker/apply artifact shapes via `@razroo/iso-contract`, `templates/capabilities.json` defines role capability boundaries via `@razroo/iso-capabilities`, `templates/context.json` defines deterministic mode/reference bundles via `@razroo/iso-context`,
|
|
34
|
+
JobForge also keeps MCP-free local workflow state: `templates/contracts.json` defines tracker/apply artifact shapes via `@razroo/iso-contract`, `templates/capabilities.json` defines role capability boundaries via `@razroo/iso-capabilities`, `templates/context.json` defines deterministic mode/reference bundles via `@razroo/iso-context`, `.jobforge-ledger/events.jsonl` records duplicate/status events via `@razroo/iso-ledger`, `.jobforge-cache/` stores reusable JD/artifact content via `@razroo/iso-cache`, and `.jobforge-index.json` indexes artifact source pointers via `@razroo/iso-index`. None of these add always-on prompt or tool-schema tokens.
|
|
35
35
|
|
|
36
36
|
`npm install` also materializes symlinks for every supported agent harness — OpenCode, Cursor, Claude Code, and Codex — so you can run `opencode`, `cursor`, `claude`, or `codex` in the same project and each picks up the shared MCP config and instructions.
|
|
37
37
|
|
|
@@ -78,7 +78,7 @@ JobForge turns opencode into a full job search command center. Instead of manual
|
|
|
78
78
|
| **Durable Batch Orchestration** | `batch-runner.sh` uses `@razroo/iso-orchestrator` for resumable bundle execution, bounded fan-out, mutexed state writes, and workflow records in `.jobforge-runs/`. |
|
|
79
79
|
| **Pipeline Integrity** | Automated merge, dedup, status normalization, health checks |
|
|
80
80
|
| **Cost-Aware Agent Routing** | Three subagents (`@general-free`, `@general-paid`, `@glm-minimal`) with per-task tool surfaces. On OpenCode, JobForge pins all tiers to `opencode-go/deepseek-v4-flash` so application runs avoid overloaded free-model pools. See [Subagent Routing in AGENTS.md](AGENTS.md) for the task-to-agent mapping. |
|
|
81
|
-
| **Trace + Telemetry + Guard + Contract + Ledger + Capabilities + Context** | `job-forge trace:*` exposes local OpenCode transcripts, `job-forge telemetry:*` summarizes runs, `job-forge guard:*` audits deterministic policy rules, `templates/contracts.json` enforces artifact shape with `iso-contract`, `job-forge ledger:*` queries append-only workflow state, `job-forge capabilities:*` checks role boundaries,
|
|
81
|
+
| **Trace + Telemetry + Guard + Contract + Ledger + Capabilities + Context + Cache + Index** | `job-forge trace:*` exposes local OpenCode transcripts, `job-forge telemetry:*` summarizes runs, `job-forge guard:*` audits deterministic policy rules, `templates/contracts.json` enforces artifact shape with `iso-contract`, `job-forge ledger:*` queries append-only workflow state, `job-forge capabilities:*` checks role boundaries, `job-forge context:*` plans mode/reference context bundles, `job-forge cache:*` reuses fetched JD/artifact content, and `job-forge index:*` queries compact source pointers without MCP/tool-schema overhead. |
|
|
82
82
|
| **Token Cost Visibility** | `job-forge tokens --days 1` for per-session breakdown; `job-forge session-report --since-minutes 60 --log` to flag sessions over budget and append history to `data/token-usage.tsv`. Auto-logged after every batch run. |
|
|
83
83
|
|
|
84
84
|
## Usage
|
|
@@ -146,6 +146,8 @@ my-search/
|
|
|
146
146
|
├── config/profile.yml # your identity, target roles (personal)
|
|
147
147
|
├── data/ # applications, pipeline, scan history (personal, gitignored)
|
|
148
148
|
├── .jobforge-ledger/ # append-only local workflow events (personal, gitignored)
|
|
149
|
+
├── .jobforge-cache/ # content-addressed local JD/artifact cache (personal, gitignored)
|
|
150
|
+
├── .jobforge-index.json # deterministic artifact lookup index (generated, gitignored)
|
|
149
151
|
├── reports/ # generated evaluation reports (personal, gitignored)
|
|
150
152
|
├── batch/{batch-input,batch-state}.tsv, tracker-additions/, logs/ # personal
|
|
151
153
|
├── .jobforge-runs/ # durable batch workflow records (generated)
|
|
@@ -162,7 +164,7 @@ my-search/
|
|
|
162
164
|
├── .opencode/skills/job-forge.md # → skill router
|
|
163
165
|
├── .opencode/agents/ # → @general-free, @general-paid, @glm-minimal
|
|
164
166
|
├── modes/ # → _shared.md + skill modes
|
|
165
|
-
├── templates/ # → states.yml, portals.example.yml, cv-template.html, capabilities.json, context.json
|
|
167
|
+
├── templates/ # → states.yml, portals.example.yml, cv-template.html, capabilities.json, context.json, index.json
|
|
166
168
|
├── batch/batch-prompt.md # → batch worker prompt
|
|
167
169
|
├── batch/batch-runner.sh # → parallel orchestrator
|
|
168
170
|
│
|
|
@@ -197,6 +199,8 @@ JobForge/
|
|
|
197
199
|
│ ├── ledger.mjs # iso-ledger-backed workflow-state CLI
|
|
198
200
|
│ ├── capabilities.mjs # iso-capabilities-backed role policy CLI
|
|
199
201
|
│ ├── context.mjs # iso-context-backed context bundle CLI
|
|
202
|
+
│ ├── cache.mjs # iso-cache-backed local artifact cache CLI
|
|
203
|
+
│ ├── index.mjs # iso-index-backed artifact lookup CLI
|
|
200
204
|
│ ├── token-usage-report.mjs # opencode cost analyzer
|
|
201
205
|
│ └── release/check-source.mjs # version gate for npm publish
|
|
202
206
|
├── tracker-lib.mjs / merge-tracker.mjs / dedup-tracker.mjs / verify-pipeline.mjs
|
package/bin/create-job-forge.mjs
CHANGED
|
@@ -124,6 +124,11 @@ const consumerPkg = {
|
|
|
124
124
|
'ledger:verify': 'job-forge ledger:verify',
|
|
125
125
|
'ledger:has': 'job-forge ledger:has',
|
|
126
126
|
'ledger:query': 'job-forge ledger:query',
|
|
127
|
+
'index:build': 'job-forge index:build',
|
|
128
|
+
'index:status': 'job-forge index:status',
|
|
129
|
+
'index:verify': 'job-forge index:verify',
|
|
130
|
+
'index:has': 'job-forge index:has',
|
|
131
|
+
'index:query': 'job-forge index:query',
|
|
127
132
|
// One command to pull the latest harness and any locally-pinned MCP
|
|
128
133
|
// packages. npm update is a no-op on packages not in package.json, so
|
|
129
134
|
// listing @razroo/gmail-mcp + @geometra/mcp is safe for consumers that
|
|
@@ -224,6 +229,7 @@ Before doing any work, remember where things live in *this* project:
|
|
|
224
229
|
| Inbox of pending URLs | \`data/pipeline.md\` | The queue for \`/job-forge pipeline\` |
|
|
225
230
|
| Scanner dedup history | \`data/scan-history.tsv\` | Only touch in \`/job-forge scan\` |
|
|
226
231
|
| Local workflow ledger | \`.jobforge-ledger/events.jsonl\` | Deterministic append-only state; use \`job-forge ledger:*\` |
|
|
232
|
+
| Local artifact index | \`.jobforge-index.json\` | Deterministic file/line lookup; use \`job-forge index:*\` |
|
|
227
233
|
| Scanner config | \`portals.yml\` (project root) | Company configs |
|
|
228
234
|
| Profile / identity | \`config/profile.yml\` | Candidate name, email, target roles |
|
|
229
235
|
| CV | \`cv.md\` (project root) | Markdown, source of truth |
|
|
@@ -369,6 +375,7 @@ job-forge sync # re-run if symlinks drift
|
|
|
369
375
|
job-forge merge # merge batch/tracker-additions/*.tsv into the tracker
|
|
370
376
|
job-forge verify # verify pipeline integrity
|
|
371
377
|
job-forge ledger:status # local deterministic workflow ledger status
|
|
378
|
+
job-forge index:status # local artifact index status
|
|
372
379
|
job-forge pdf cv.md out.pdf
|
|
373
380
|
job-forge tokens --days 1 # per-session opencode token usage
|
|
374
381
|
\`\`\`
|
package/bin/job-forge.mjs
CHANGED
|
@@ -23,6 +23,8 @@
|
|
|
23
23
|
* ledger:* Query local deterministic workflow state via iso-ledger
|
|
24
24
|
* capabilities:* Query role capability policy via iso-capabilities
|
|
25
25
|
* context:* Query/render deterministic context bundles via iso-context
|
|
26
|
+
* cache:* Reuse local deterministic artifacts via iso-cache
|
|
27
|
+
* index:* Query local artifacts via iso-index
|
|
26
28
|
* sync Re-run the harness symlink sync (bin/sync.mjs)
|
|
27
29
|
* help, --help Show this message
|
|
28
30
|
*/
|
|
@@ -103,6 +105,28 @@ const contextAliases = {
|
|
|
103
105
|
'context:path': 'path',
|
|
104
106
|
};
|
|
105
107
|
|
|
108
|
+
const cacheAliases = {
|
|
109
|
+
'cache:key': 'key',
|
|
110
|
+
'cache:status': 'status',
|
|
111
|
+
'cache:has': 'has',
|
|
112
|
+
'cache:get': 'get',
|
|
113
|
+
'cache:put': 'put',
|
|
114
|
+
'cache:list': 'list',
|
|
115
|
+
'cache:verify': 'verify',
|
|
116
|
+
'cache:prune': 'prune',
|
|
117
|
+
'cache:path': 'path',
|
|
118
|
+
};
|
|
119
|
+
|
|
120
|
+
const indexAliases = {
|
|
121
|
+
'index:build': 'build',
|
|
122
|
+
'index:status': 'status',
|
|
123
|
+
'index:query': 'query',
|
|
124
|
+
'index:has': 'has',
|
|
125
|
+
'index:verify': 'verify',
|
|
126
|
+
'index:explain': 'explain',
|
|
127
|
+
'index:path': 'path',
|
|
128
|
+
};
|
|
129
|
+
|
|
106
130
|
const [, , cmd, ...rest] = process.argv;
|
|
107
131
|
|
|
108
132
|
function printHelp() {
|
|
@@ -142,6 +166,17 @@ Commands:
|
|
|
142
166
|
context:plan Estimate files/tokens for one context bundle
|
|
143
167
|
context:check Fail if a context bundle exceeds its budget
|
|
144
168
|
context:render Render context bundle content as markdown/json
|
|
169
|
+
cache:status Show local artifact cache status
|
|
170
|
+
cache:key Print deterministic cache key for a job URL
|
|
171
|
+
cache:has Check whether a job URL or cache key is cached
|
|
172
|
+
cache:get Read cached JD/artifact content
|
|
173
|
+
cache:put Store JD/artifact content
|
|
174
|
+
cache:verify Validate local artifact cache integrity
|
|
175
|
+
index:status Show local artifact index status
|
|
176
|
+
index:build Rebuild .jobforge-index.json from templates/index.json
|
|
177
|
+
index:has Check indexed URL/company-role/report facts without loading source files
|
|
178
|
+
index:query Query indexed reports, tracker rows, TSVs, scan history, pipeline, and ledger
|
|
179
|
+
index:verify Validate local artifact index integrity
|
|
145
180
|
sync Re-create harness symlinks in the current project
|
|
146
181
|
|
|
147
182
|
Deterministic helpers (prefer these over LLM-derived values):
|
|
@@ -175,6 +210,11 @@ Pass --help after a command to see its own flags, e.g.:
|
|
|
175
210
|
job-forge capabilities:check general-free --tool browser --mcp geometra --command "npx job-forge merge" --filesystem write
|
|
176
211
|
job-forge context:plan apply
|
|
177
212
|
job-forge context:check apply --budget 23000
|
|
213
|
+
job-forge cache:has --url https://example.test/jobs/123
|
|
214
|
+
job-forge cache:get --url https://example.test/jobs/123
|
|
215
|
+
job-forge cache:put --url https://example.test/jobs/123 --input @jds/example.md
|
|
216
|
+
job-forge index:has --key "company-role:acme:staff-engineer"
|
|
217
|
+
job-forge index:query "acme"
|
|
178
218
|
|
|
179
219
|
Project directory resolves to $JOB_FORGE_PROJECT or cwd.`);
|
|
180
220
|
}
|
|
@@ -274,6 +314,36 @@ if (cmd === 'context' || contextAliases[cmd]) {
|
|
|
274
314
|
process.exit(result.status ?? 1);
|
|
275
315
|
}
|
|
276
316
|
|
|
317
|
+
if (cmd === 'cache' || cacheAliases[cmd]) {
|
|
318
|
+
const cacheArgs = cmd === 'cache'
|
|
319
|
+
? (rest.length === 0 ? ['help'] : rest)
|
|
320
|
+
: [cacheAliases[cmd], ...rest];
|
|
321
|
+
|
|
322
|
+
const scriptPath = join(PKG_ROOT, 'scripts/cache.mjs');
|
|
323
|
+
const result = spawnSync(process.execPath, [scriptPath, ...cacheArgs], {
|
|
324
|
+
stdio: 'inherit',
|
|
325
|
+
cwd: PROJECT_DIR,
|
|
326
|
+
env: process.env,
|
|
327
|
+
});
|
|
328
|
+
|
|
329
|
+
process.exit(result.status ?? 1);
|
|
330
|
+
}
|
|
331
|
+
|
|
332
|
+
if (cmd === 'index' || indexAliases[cmd]) {
|
|
333
|
+
const indexArgs = cmd === 'index'
|
|
334
|
+
? (rest.length === 0 ? ['help'] : rest)
|
|
335
|
+
: [indexAliases[cmd], ...rest];
|
|
336
|
+
|
|
337
|
+
const scriptPath = join(PKG_ROOT, 'scripts/index.mjs');
|
|
338
|
+
const result = spawnSync(process.execPath, [scriptPath, ...indexArgs], {
|
|
339
|
+
stdio: 'inherit',
|
|
340
|
+
cwd: PROJECT_DIR,
|
|
341
|
+
env: process.env,
|
|
342
|
+
});
|
|
343
|
+
|
|
344
|
+
process.exit(result.status ?? 1);
|
|
345
|
+
}
|
|
346
|
+
|
|
277
347
|
const rel = commands[cmd];
|
|
278
348
|
if (!rel) {
|
|
279
349
|
console.error(`Unknown command: ${cmd}\n`);
|
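Reviewer sketch: the new `cache` and `index` branches above repeat the alias-to-subcommand resolution already used for `context`. The snippet below isolates just that argument resolution in a hypothetical helper, `resolveArgs`, which is not part of the package, so the three cases (bare family name, alias, no match) can be checked without spawning a process.

```javascript
// Hypothetical helper, not in job-forge: mirrors the argument resolution
// performed by the cache/index branches in bin/job-forge.mjs.
const indexAliases = {
  'index:build': 'build',
  'index:status': 'status',
  'index:query': 'query',
  'index:has': 'has',
  'index:verify': 'verify',
  'index:explain': 'explain',
  'index:path': 'path',
};

function resolveArgs(family, aliases, cmd, rest) {
  if (cmd === family) {
    // Bare family name: forward remaining args, or show help if there are none.
    return rest.length === 0 ? ['help'] : rest;
  }
  // Own-property check so inherited names such as "toString" are not treated
  // as aliases. (The shipped code indexes the alias object directly.)
  if (Object.prototype.hasOwnProperty.call(aliases, cmd)) {
    return [aliases[cmd], ...rest];
  }
  // Not part of this family: the caller falls through to the next branch.
  return null;
}
```

For example, `resolveArgs('index', indexAliases, 'index:has', ['--key', 'k'])` yields the argument list that would follow `scripts/index.mjs` in the `spawnSync` call.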
package/docs/ARCHITECTURE.md
CHANGED
|
@@ -161,6 +161,7 @@ config/profile.yml → Candidate identity
|
|
|
161
161
|
portals.yml → Scanner configuration
|
|
162
162
|
data/pipeline.md → Pending URLs and `local:jds/...` inbox (see modes/pipeline.md)
|
|
163
163
|
.jobforge-ledger/events.jsonl → Append-only workflow events for cheap local duplicate/status checks
|
|
164
|
+
.jobforge-index.json → Deterministic artifact lookup index built from templates/index.json
|
|
164
165
|
jds/*.md → Saved job descriptions referenced from the pipeline (`local:jds/{file}`)
|
|
165
166
|
templates/states.yml → Canonical status values
|
|
166
167
|
templates/context.json → Deterministic mode/reference context bundle policy
|
|
@@ -176,12 +177,13 @@ Create `data/pipeline.md` when you start using the URL inbox (`/job-forge pipeli
|
|
|
176
177
|
- PDFs: `cv-candidate-{company-slug}-{YYYY-MM-DD}.pdf`
|
|
177
178
|
- Tracker TSVs: `batch/tracker-additions/{num}-{company-slug}.tsv` (one file per evaluation; merged files move under `batch/tracker-additions/merged/`; shape enforced by `templates/contracts.json`)
|
|
178
179
|
- Ledger: `.jobforge-ledger/events.jsonl` (created by `job-forge ledger:rebuild`, `tracker-line --write`, or `merge`; gitignored personal state)
|
|
180
|
+
- Index: `.jobforge-index.json` (created on demand by `job-forge index:*`; gitignored local lookup state)
|
|
179
181
|
- Capabilities: `templates/capabilities.json` (role boundary policy inspected with `job-forge capabilities:*`)
|
|
180
182
|
- Context: `templates/context.json` (mode/reference file bundles inspected with `job-forge context:*`)
|
|
181
183
|
|
|
182
184
|
## Pipeline Integrity
|
|
183
185
|
|
|
184
|
-
From the project root, `npx job-forge verify` (or `npm run verify`) runs `verify-pipeline.mjs`. When a tracker file exists, it validates canonical statuses (using `templates/states.yml` when that file is present and parseable), validates every tracker row against `templates/contracts.json`, warns on probable duplicate company/role rows, checks that report column markdown links resolve to files in the repo, validates score column format (`X.X/5`, `N/A`, or `DUP`), rejects table rows with too few columns, flags markdown bold inside the score column, and warns if any `batch/tracker-additions/*.tsv` files are still waiting to be merged. If `.jobforge-ledger/events.jsonl` exists, verify also validates the append-only ledger. It also compares state ids from `templates/states.yml` to an internal fallback list and warns when the two sets drift. **Fresh clone:** the command exits successfully when neither `data/applications.md` nor root `applications.md` exists yet; pending-TSV and states-drift checks still run so contributors see unmerged batch output early. Optional setup validation after you add `cv.md` and `config/profile.yml`: `npm run sync-check` (`cv-sync-check.mjs`).
|
|
186
|
+
From the project root, `npx job-forge verify` (or `npm run verify`) runs `verify-pipeline.mjs`. When a tracker file exists, it validates canonical statuses (using `templates/states.yml` when that file is present and parseable), validates every tracker row against `templates/contracts.json`, warns on probable duplicate company/role rows, checks that report column markdown links resolve to files in the repo, validates score column format (`X.X/5`, `N/A`, or `DUP`), rejects table rows with too few columns, flags markdown bold inside the score column, and warns if any `batch/tracker-additions/*.tsv` files are still waiting to be merged. If `.jobforge-ledger/events.jsonl` exists, verify also validates the append-only ledger. If `.jobforge-index.json` exists, verify validates the artifact index. It also compares state ids from `templates/states.yml` to an internal fallback list and warns when the two sets drift. **Fresh clone:** the command exits successfully when neither `data/applications.md` nor root `applications.md` exists yet; pending-TSV and states-drift checks still run so contributors see unmerged batch output early. Optional setup validation after you add `cv.md` and `config/profile.yml`: `npm run sync-check` (`cv-sync-check.mjs`).
|
|
185
187
|
|
|
186
188
|
**`verify-pipeline.mjs` checks (same order as the script header):**
|
|
187
189
|
|
|
@@ -195,8 +197,9 @@ From the project root, `npx job-forge verify` (or `npm run verify`) runs `verify
|
|
|
195
197
|
8. Score column has no markdown bold.
|
|
196
198
|
9. Warn when state ids in `templates/states.yml` drift from the script’s built-in fallback list (or when the file exists but ids failed to parse).
|
|
197
199
|
10. Validate `.jobforge-ledger/events.jsonl` when present.
|
|
200
|
+
11. Validate `.jobforge-index.json` when present.
|
|
198
201
|
|
|
199
|
-
When the tracker file is missing, checks 1-6 and 8 are skipped; checks 7, 9, and
|
|
202
|
+
When the tracker file is missing, checks 1-6 and 8 are skipped; checks 7, 9, 10, and 11 still run when applicable.
|
|
200
203
|
|
|
201
204
|
## Contributing touchpoints
|
|
202
205
|
|
|
@@ -219,6 +222,7 @@ Scripts maintain data consistency. In a consumer project they're invoked via the
|
|
|
219
222
|
| `scripts/telemetry.mjs` | `npx job-forge telemetry:status` / `telemetry:show` | JobForge operational telemetry derived from OpenCode traces plus tracker TSV state |
|
|
220
223
|
| `scripts/guard.mjs` | `npx job-forge guard:audit` / `guard:explain` | Deterministic `@razroo/iso-guard` policy audits over local OpenCode traces |
|
|
221
224
|
| `scripts/ledger.mjs` | `npx job-forge ledger:status` / `ledger:has` / `ledger:rebuild` | Deterministic `@razroo/iso-ledger` state over tracker, TSV, and pipeline files |
|
|
225
|
+
| `scripts/index.mjs` | `npx job-forge index:status` / `index:has` / `index:query` | Deterministic `@razroo/iso-index` lookup over reports, tracker rows, TSVs, pipeline, scan history, and ledger events |
|
|
222
226
|
| `scripts/context.mjs` | `npx job-forge context:list` / `context:plan` / `context:check` / `context:render` | Deterministic `@razroo/iso-context` mode/reference context bundle planning and rendering |
|
|
223
227
|
| `tracker-lib.mjs` | _(library)_ | Shared helpers for reading/writing day-based tracker files — imported by merge/dedup/verify/normalize |
|
|
224
228
|
| `bin/sync.mjs` | `npx job-forge sync` | Creates the harness symlinks in a consumer project (also runs as `postinstall`) |
|
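Reviewer sketch: the updated ARCHITECTURE.md text says checks 1-6 and 8 depend on the tracker file, while checks 7 and 9 always run and checks 10 and 11 run only when their files are present. The hypothetical function below (`checksToRun`, not taken from `verify-pipeline.mjs`) encodes that selection rule as described in the prose; the real script's control flow may be organized differently.

```javascript
// Hypothetical illustration of the documented rule, not the shipped script.
// Returns the numbered verify checks that would run for a given project state.
function checksToRun({ trackerExists, ledgerExists, indexExists }) {
  const checks = [];
  if (trackerExists) checks.push(1, 2, 3, 4, 5, 6); // tracker-dependent checks
  checks.push(7);                                   // still runs without a tracker
  if (trackerExists) checks.push(8);                // score-column bold check
  checks.push(9);                                   // states.yml drift warning
  if (ledgerExists) checks.push(10);                // .jobforge-ledger/events.jsonl
  if (indexExists) checks.push(11);                 // .jobforge-index.json
  return checks;
}
```

On a fresh clone with no tracker, ledger, or index, this reduces to checks 7 and 9, in line with the documented behavior that pending-TSV and states-drift checks still run.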
package/docs/CUSTOMIZATION.md
CHANGED
|
@@ -150,6 +150,10 @@ Role capability boundaries live in `templates/capabilities.json` and are enforce
|
|
|
150
150
|
|
|
151
151
|
Mode/reference context bundles live in `templates/context.json` and are planned locally by `@razroo/iso-context`. Use `job-forge context:plan <mode>` to see the files and estimated tokens, `job-forge context:check <mode>` to fail on budget drift, and `job-forge context:render <mode>` when you intentionally need a compact markdown or JSON context bundle. This is not an MCP and does not add tool-schema tokens; rendered context only consumes prompt tokens when a workflow deliberately asks for it.
|
|
152
152
|
|
|
153
|
+
## JobForge artifact index
|
|
154
|
+
|
|
155
|
+
Artifact lookup policy lives in `templates/index.json` and is built locally by `@razroo/iso-index`. Use `job-forge index:has --key "company-role:acme:staff-engineer"` as a cheap duplicate/source prefilter, `job-forge index:query "acme"` to get compact source path/line pointers, and `job-forge index:verify` to validate `.jobforge-index.json`. Query, has, and verify rebuild the index on demand, so scaffolded projects need no setup. This is not an MCP and does not add tool-schema tokens.
|
|
156
|
+
|
|
153
157
|
## JobForge guard audits
|
|
154
158
|
|
|
155
159
|
Guard audits run deterministic `@razroo/iso-guard` policies over the same local OpenCode traces. The default policy lives at `templates/guards/jobforge-baseline.yaml` and checks rules that are reliable from transcript data, including max two task dispatches per assistant message, no task-status polling via `task`, no raw proxy configuration in task prompts, and no child session task recursion.
|
package/docs/SETUP.md
CHANGED
|
@@ -128,6 +128,8 @@ From your project root, these commands maintain the tracker and pipeline checks.
|
|
|
128
128
|
| Inspect tracker row contract | `npx iso-contract explain jobforge.tracker-row --contracts templates/contracts.json` | _(none)_ |
|
|
129
129
|
| Inspect role capabilities | `npx job-forge capabilities:explain general-free` | `npm run capabilities:explain -- general-free` |
|
|
130
130
|
| Inspect context bundle budget | `npx job-forge context:plan apply` | `npm run context:plan -- apply` |
|
|
131
|
+
| Inspect local JD/artifact cache | `npx job-forge cache:status` | `npm run cache:status` |
|
|
132
|
+
| Inspect local artifact index | `npx job-forge index:status` | `npm run index:status` |
|
|
131
133
|
| Map status column to canonical labels | `npx job-forge normalize` | `npm run normalize` |
|
|
132
134
|
| Merge duplicate company/role rows | `npx job-forge dedup` | `npm run dedup` |
|
|
133
135
|
| Generate ATS-optimized CV PDF | `npx job-forge pdf` | `npm run pdf` |
|
|
@@ -144,6 +146,8 @@ From your project root, these commands maintain the tracker and pipeline checks.
|
|
|
144
146
|
| Show local workflow ledger status | `npx job-forge ledger:status` | `npm run ledger:status` |
|
|
145
147
|
| Rebuild local workflow ledger from tracker/pipeline files | `npx job-forge ledger:rebuild` | `npm run ledger:rebuild` |
|
|
146
148
|
| Check duplicate/status event without loading tracker files | `npx job-forge ledger:has --company "Acme" --role "Staff Engineer" --status Applied` | `npm run ledger:has -- --company ...` |
|
|
149
|
+
| Check/reuse cached JD content | `npx job-forge cache:has --url <url>` / `npx job-forge cache:get --url <url>` | `npm run cache:has -- --url ...` |
|
|
150
|
+
| Query local artifact pointers | `npx job-forge index:query "Acme"` / `npx job-forge index:has --key company-role:acme:staff-engineer` | `npm run index:query -- Acme` |
|
|
147
151
|
| Re-create harness symlinks | `npx job-forge sync` | `npm run sync` |
|
|
148
152
|
| Build optional dashboard TUI (Go on `PATH`) | `(cd node_modules/job-forge/dashboard && go build .)` | `npm run build:dashboard` (harness repo only) |
|
|
149
153
|
|
|
@@ -76,6 +76,11 @@ Local workflow ledger (terminal, outside opencode):
|
|
|
76
76
|
npx job-forge ledger:status # .jobforge-ledger/events.jsonl summary
|
|
77
77
|
npx job-forge ledger:has --company "Acme" --role "Staff Engineer" --status Applied
|
|
78
78
|
|
|
79
|
+
Local artifact index (terminal, outside opencode):
|
|
80
|
+
npx job-forge index:status # .jobforge-index.json summary
|
|
81
|
+
npx job-forge index:has --key "company-role:acme:staff-engineer"
|
|
82
|
+
npx job-forge index:query "acme"
|
|
83
|
+
|
|
79
84
|
Artifact contracts (terminal, outside opencode):
|
|
80
85
|
npx iso-contract explain jobforge.tracker-row --contracts templates/contracts.json
|
|
81
86
|
npx job-forge tracker-line ... --write # renders + validates tracker TSV locally
|
|
@@ -161,6 +166,11 @@ Step 1 — Enumerate candidates
|
|
|
161
166
|
- Build ordered list: candidates = [job_1, job_2, ..., job_N]
|
|
162
167
|
|
|
163
168
|
Step 2 — Dedup against already-applied
|
|
169
|
+
- Run npx job-forge index:has --key "company-role:<company-slug>:<role-slug>"
|
|
170
|
+
as a fast local artifact prefilter when company+role is known. It rebuilds
|
|
171
|
+
.jobforge-index.json on demand from templates/index.json. A hit means the
|
|
172
|
+
role has already appeared in tracker files or tracker TSVs and can be
|
|
173
|
+
dropped before dispatch.
|
|
164
174
|
- If .jobforge-ledger/events.jsonl exists, use npx job-forge ledger:has as a
|
|
165
175
|
fast prefilter for obvious company+role Applied duplicates. A ledger match
|
|
166
176
|
can be dropped before dispatch without loading tracker files into context.
|
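Reviewer sketch: the dedup step above keys the index lookup as `company-role:<company-slug>:<role-slug>`, but the diff does not show how the slugs are derived. The snippet below assumes lowercase text with runs of non-alphanumeric characters collapsed to a single hyphen. That matches the `acme` / `staff-engineer` example in the docs but may differ from what `iso-index` actually does. `slugify` and `companyRoleKey` are hypothetical helpers.

```javascript
// Assumed slug rule (not confirmed by the diff): lowercase, collapse every
// run of non-alphanumeric characters to "-", trim hyphens at both ends.
function slugify(text) {
  return text
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, '-')
    .replace(/^-+|-+$/g, '');
}

// Builds the key passed to `job-forge index:has --key "..."`.
function companyRoleKey(company, role) {
  return `company-role:${slugify(company)}:${slugify(role)}`;
}
```

If a lookup built this way misses a role that is known to be in the tracker, compare the key against `index:query` output before relying on the prefilter; the four-source grep in [H2] remains the authoritative check either way.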
package/iso/instructions.md
CHANGED
|
@@ -7,7 +7,7 @@ AI-powered job search pipeline: scans portals, evaluates offers, generates CVs v
|
|
|
7
7
|
- [H1] Max 2 parallel `task` dispatches per message. For N jobs, run `ceil(N/2)` sequential rounds of 2. A round is not complete until both subagents return a final outcome (`APPLIED`, `APPLY FAILED`, `SKIP`, `Discarded`, or a written TSV path). A `task` tool result that only gives a session id / title is a launch acknowledgement, not completion. Applies in all modes, for all user phrasings ("urgent", "apply to 10 jobs now").
|
|
8
8
|
why: each subagent requires post-cleanup and racing more than 2 reliably loses at least one result. On 2026-04-25 the orchestrator launched round 2 while round 1 had only returned task ids, leaving four application subagents in flight and losing two provider recoveries
|
|
9
9
|
|
|
10
|
-
- [H2] Max 1 application per company+role. Before every `apply` dispatch, grep all four sources for the URL and for `company+role`: `data/pipeline.md`, all `data/applications/*.md` day files, `batch/tracker-additions/*.tsv`, `batch/tracker-additions/merged/*.tsv`. If `.jobforge-ledger/events.jsonl` exists, `npx job-forge ledger:has --company "..." --role "..." --status Applied` may be used as a fast prefilter; a match is enough to drop that duplicate before dispatch. For candidates not rejected by the ledger, the four-source grep is still mandatory. If any source shows APPLIED / Applied, skip the dispatch and pick a replacement from the remaining candidate list. Do not count duplicates toward a requested "apply to N jobs" total, and do not delegate obvious duplicates just so a subagent can return SKIP.
|
|
10
|
+
- [H2] Max 1 application per company+role. Before every `apply` dispatch, grep all four sources for the URL and for `company+role`: `data/pipeline.md`, all `data/applications/*.md` day files, `batch/tracker-additions/*.tsv`, `batch/tracker-additions/merged/*.tsv`. `npx job-forge index:has --key "company-role:..."` may be used first as a fast local artifact prefilter; it rebuilds `.jobforge-index.json` on demand from `templates/index.json`, and a company+role hit is enough to drop an obvious duplicate before dispatch. If `.jobforge-ledger/events.jsonl` exists, `npx job-forge ledger:has --company "..." --role "..." --status Applied` may also be used as a fast prefilter; a match is enough to drop that duplicate before dispatch. For candidates not rejected by the index or ledger, the four-source grep is still mandatory. If any source shows APPLIED / Applied, skip the dispatch and pick a replacement from the remaining candidate list. Do not count duplicates toward a requested "apply to N jobs" total, and do not delegate obvious duplicates just so a subagent can return SKIP.
|
|
11
11
|
why: 2026-04 same-day batch collision — when two batches target the same role, `npx job-forge merge` updates the existing day-file row rather than appending, so grepping day files alone misses earlier-batch applies; merged/*.tsv is the only place the breadcrumb remains
|
|
12
12
|
|
|
13
13
|
- [H3] Before every batch of `task` dispatches that will use Geometra, call `geometra_list_sessions` then `geometra_disconnect({closeBrowser: true})`. Every round, no exceptions. Name this cleanup as an explicit "step 0" in your first-response plan for any multi-apply request — it is the most frequently skipped guardrail in practice, and skipping it produces cascade "Not connected" failures on the next dispatch.
|
|
@@ -19,13 +19,13 @@ AI-powered job search pipeline: scans portals, evaluates offers, generates CVs v
|
|
|
19
19
|
- [H5] Re-dispatch the same company only AFTER the previous subagent returns. Never fire the same `task` twice while the first is still in flight.
|
|
20
20
|
why: two in-flight subagents for the same URL race on Geometra sessions and on tracker TSV writes, corrupting state and sometimes double-submitting
|
|
21
21
|
|
|
22
|
-
- [H5b] Do not use `task` to poll task status. If OpenCode returns a task/session id without a final result, record the id, stop dispatching new rounds, and tell the user the round is still in flight. When the user asks to check later, inspect authoritative files (`batch/tracker-additions/*.tsv`, `batch/tracker-additions/merged/*.tsv`, day files, `.jobforge-ledger/events.jsonl`, or `iso-trace`) rather than spawning a "check task status" subagent.
|
|
22
|
+
- [H5b] Do not use `task` to poll task status. If OpenCode returns a task/session id without a final result, record the id, stop dispatching new rounds, and tell the user the round is still in flight. When the user asks to check later, inspect authoritative files (`batch/tracker-additions/*.tsv`, `batch/tracker-additions/merged/*.tsv`, day files, `.jobforge-ledger/events.jsonl`, `.jobforge-index.json`, or `iso-trace`) rather than spawning a "check task status" subagent.
|
|
23
23
|
why: OpenCode status prompts can be delivered into the target subagent as a new user message; a 2026-04-25 trace caused a subagent to call `task` recursively instead of finishing the application
|
|
24
24
|
|
|
25
25
|
- [H6] Application outcomes flow through `batch/tracker-additions/*.tsv`, not `data/pipeline.md`. After any multi-apply run, the orchestrator MUST run `npx job-forge merge` then `npx job-forge verify` before ending the session.
|
|
26
26
|
why: `pipeline.md` is the URL inbox (`[ ]` pending → `[x]` processed); `data/applications/YYYY-MM-DD.md` is the outcome log; the TSV pathway is the only safe bridge because `merge` handles column order and duplicate detection
|
|
27
27
|
|
|
28
|
-
- [H7] Load-bearing facts passed to downstream subagents must originate from a file, not from prior subagent prose. Authoritative sources: `data/pipeline.md`, `data/scan-history.tsv`, `batch/scan-output-*.md`, `reports/{num}-*.md` with `**URL:**` / `**Score:**` headers, `batch/tracker-additions/*.tsv
|
|
28
|
+
- [H7] Load-bearing facts passed to downstream subagents must originate from a file, not from prior subagent prose. Authoritative sources: `data/pipeline.md`, `data/scan-history.tsv`, `batch/scan-output-*.md`, `reports/{num}-*.md` with `**URL:**` / `**Score:**` headers, `batch/tracker-additions/*.tsv`, cached JD content returned by `npx job-forge cache:get --url ...`, and source path/line pointers returned by `npx job-forge index:query ...`.
|
|
29
29
|
why: 2026-04-18 scan subagent returned 30 fabricated Greenhouse IDs in prose (plausible-looking, non-existent); orchestrator dispatched 30 downstream subagents that all 404'd. Subagents can hallucinate IDs, scores, and confirmation text — round-trip through a file or don't trust the value
|
|
30
30
|
|
|
31
31
|
- [H8] Never paste proxy values from `config/profile.yml` into `task` prompts, status text, or summaries. If a proxy is configured, tell the subagent exactly: "Proxy is configured; read `config/profile.yml` and pass its top-level `proxy:` object to every `geometra_connect` call." Do not transcribe `server`, `username`, `password`, or `bypass`, even if you just read them from disk.
|
|
@@ -66,12 +66,18 @@ AI-powered job search pipeline: scans portals, evaluates offers, generates CVs v
|
|
|
66
66
|
- [D11] Treat `templates/context.json` as the source of truth for mode/reference context bundles. Use `npx job-forge context:plan <mode>` or `npx job-forge context:check <mode>` when changing or validating what a mode loads; do not paste the full context matrix into prompts.
|
|
67
67
|
why: deterministic context bundles prevent reference-file drift and accidental token bloat without adding MCP/tool-schema tokens
|
|
68
68
|
|
|
69
|
+
- [D12] Use `job-forge cache:*` for deterministic local artifact reuse when available. For URL inputs, check `npx job-forge cache:has --url "..."` / `cache:get` before browser or network JD fetches; after a successful fetch, store the exact JD text with `npx job-forge cache:put --url "..." --ttl 14d --input @file` when it is already on disk.
|
|
70
|
+
why: `iso-cache` is not an MCP and adds no prompt/tool-schema tokens; it avoids repeated JD fetch/render passes and lets future sessions reuse stable content from `.jobforge-cache/`
|
|
71
|
+
|
|
72
|
+
- [D13] Use `job-forge index:*` for deterministic artifact lookup when available. `index:has` and `index:query` rebuild `.jobforge-index.json` from `templates/index.json` on demand, covering reports, tracker day files, tracker TSVs, pipeline URLs, scan history, and ledger events without loading those growing files into prompt context.
|
|
73
|
+
why: `iso-index` is not an MCP and adds no prompt/tool-schema tokens; it gives agents compact file/line pointers and duplicate prefilters before expensive reads or browser dispatches
|
|
74
|
+
|
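The cache-first flow described in [D12] can be sketched roughly as follows. This is a minimal in-memory model, not the code in `lib/jobforge-cache.mjs`: the `JdCache` class, `parseTtl`, `getJd`, and the `fetchJd` callback are hypothetical names, and the accepted TTL units (`s`, `m`, `h`, `d`) are an assumption extrapolated from the `14d` value shown in the rule.

```javascript
// Hypothetical sketch: names, TTL units, and storage layout are assumptions.

// Convert a TTL such as "14d" or "12h" into milliseconds.
function parseTtl(ttl) {
  const match = /^(\d+)([smhd])$/.exec(ttl);
  if (!match) throw new Error(`Unsupported TTL: ${ttl}`);
  const unitMs = { s: 1000, m: 60000, h: 3600000, d: 86400000 };
  return Number(match[1]) * unitMs[match[2]];
}

// In-memory stand-in for an on-disk cache keyed by URL.
class JdCache {
  constructor(now = () => Date.now()) {
    this.entries = new Map();
    this.now = now; // injectable clock so expiry can be tested
  }
  put(url, text, ttl) {
    this.entries.set(url, { text, expiresAt: this.now() + parseTtl(ttl) });
  }
  has(url) {
    const entry = this.entries.get(url);
    return entry !== undefined && entry.expiresAt > this.now();
  }
  get(url) {
    return this.has(url) ? this.entries.get(url).text : null;
  }
}

// Check the cache first; call fetchJd only on a miss, then store the result.
// Synchronous for brevity; a real browser or network fetch would be async.
function getJd(cache, url, fetchJd) {
  const cached = cache.get(url);
  if (cached !== null) return { text: cached, source: "cache" };
  const text = fetchJd(url);
  cache.put(url, text, "14d");
  return { text, source: "fetch" };
}
```

Injecting the clock keeps expiry deterministic in tests, in the same spirit as the rule's emphasis on deterministic reuse.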
|
69
75
|
## Procedure
|
|
70
76
|
|
|
71
77
|
1. Check `cv.md`, `profile.yml`, and `portals.yml`; onboard if any file is missing.
|
|
72
78
|
2. Pick and name the mode from **Routing** [D6]. No match → ask; do not guess.
|
|
73
|
-
3. Read the active mode file [D3]
|
|
74
|
-
4. Prepare Geometra dispatches: cleanup [H3], ledger prefilter when
|
|
79
|
+
3. Read the active mode file [D3]. Use context bundle checks when changing context loads [D11]. Check cached artifacts before URL/JD refetches [D12]. Use artifact index lookups before broad file reads when they can answer the question [D13]. Decide inline vs delegated work [D1].
|
|
80
|
+
4. Prepare Geometra dispatches: cleanup [H3], index/ledger prefilter when useful [D8, D13], dedupe [H2], location filter [D5], routing [D2, D10], proxy prompt hygiene [H8].
|
|
75
81
|
5. Dispatch at most 2 tasks per round [H1]; wait for final outcomes, not just task ids [H5b].
|
|
76
82
|
6. Keep multi-job form-filling out of the orchestrator [H4].
|
|
77
83
|
7. Cross-check subagent facts against authoritative files [H7].
|