job-forge 2.14.22 → 2.14.24

@@ -12,7 +12,7 @@ AI-powered job search pipeline: scans portals, evaluates offers, generates CVs v
  - [H1] Max 2 parallel `task` dispatches per message. For N jobs, run `ceil(N/2)` sequential rounds of 2. A round is not complete until both subagents return a final outcome (`APPLIED`, `APPLY FAILED`, `SKIP`, `Discarded`, or a written TSV path). A `task` tool result that only gives a session id / title is a launch acknowledgement, not completion. Applies in all modes, for all user phrasings ("urgent", "apply to 10 jobs now").
  why: each subagent requires post-cleanup and racing more than 2 reliably loses at least one result. On 2026-04-25 the orchestrator launched round 2 while round 1 had only returned task ids, leaving four application subagents in flight and losing two provider recoveries
 
- - [H2] Max 1 application per company+role. Before every `apply` dispatch, grep all four sources for the URL and for `company+role`: `data/pipeline.md`, all `data/applications/*.md` day files, `batch/tracker-additions/*.tsv`, `batch/tracker-additions/merged/*.tsv`. If `.jobforge-ledger/events.jsonl` exists, `npx job-forge ledger:has --company "..." --role "..." --status Applied` may be used as a fast prefilter; a match is enough to drop that duplicate before dispatch. For candidates not rejected by the ledger, the four-source grep is still mandatory. If any source shows APPLIED / Applied, skip the dispatch and pick a replacement from the remaining candidate list. Do not count duplicates toward a requested "apply to N jobs" total, and do not delegate obvious duplicates just so a subagent can return SKIP.
+ - [H2] Max 1 application per company+role. Before every `apply` dispatch, grep all four sources for the URL and for `company+role`: `data/pipeline.md`, all `data/applications/*.md` day files, `batch/tracker-additions/*.tsv`, `batch/tracker-additions/merged/*.tsv`. `npx job-forge index:has --key "company-role:..."` may be used first as a fast local artifact prefilter; it rebuilds `.jobforge-index.json` on demand from `templates/index.json`, and a company+role hit is enough to drop an obvious duplicate before dispatch. If `.jobforge-ledger/events.jsonl` exists, `npx job-forge ledger:has --company "..." --role "..." --status Applied` may also be used as a fast prefilter; a match is enough to drop that duplicate before dispatch. For candidates not rejected by the index or ledger, the four-source grep is still mandatory. If any source shows APPLIED / Applied, skip the dispatch and pick a replacement from the remaining candidate list. Do not count duplicates toward a requested "apply to N jobs" total, and do not delegate obvious duplicates just so a subagent can return SKIP.
  why: 2026-04 same-day batch collision — when two batches target the same role, `npx job-forge merge` updates the existing day-file row rather than appending, so grepping day files alone misses earlier-batch applies; merged/*.tsv is the only place the breadcrumb remains
 
  - [H3] Before every batch of `task` dispatches that will use Geometra, call `geometra_list_sessions` then `geometra_disconnect({closeBrowser: true})`. Every round, no exceptions. Name this cleanup as an explicit "step 0" in your first-response plan for any multi-apply request — it is the most frequently skipped guardrail in practice, and skipping it produces cascade "Not connected" failures on the next dispatch.
@@ -24,13 +24,13 @@ AI-powered job search pipeline: scans portals, evaluates offers, generates CVs v
  - [H5] Re-dispatch the same company only AFTER the previous subagent returns. Never fire the same `task` twice while the first is still in flight.
  why: two in-flight subagents for the same URL race on Geometra sessions and on tracker TSV writes, corrupting state and sometimes double-submitting
 
- - [H5b] Do not use `task` to poll task status. If OpenCode returns a task/session id without a final result, record the id, stop dispatching new rounds, and tell the user the round is still in flight. When the user asks to check later, inspect authoritative files (`batch/tracker-additions/*.tsv`, `batch/tracker-additions/merged/*.tsv`, day files, `.jobforge-ledger/events.jsonl`, or `iso-trace`) rather than spawning a "check task status" subagent.
+ - [H5b] Do not use `task` to poll task status. If OpenCode returns a task/session id without a final result, record the id, stop dispatching new rounds, and tell the user the round is still in flight. When the user asks to check later, inspect authoritative files (`batch/tracker-additions/*.tsv`, `batch/tracker-additions/merged/*.tsv`, day files, `.jobforge-ledger/events.jsonl`, `.jobforge-index.json`, or `iso-trace`) rather than spawning a "check task status" subagent.
  why: OpenCode status prompts can be delivered into the target subagent as a new user message; a 2026-04-25 trace caused a subagent to call `task` recursively instead of finishing the application
 
  - [H6] Application outcomes flow through `batch/tracker-additions/*.tsv`, not `data/pipeline.md`. After any multi-apply run, the orchestrator MUST run `npx job-forge merge` then `npx job-forge verify` before ending the session.
  why: `pipeline.md` is the URL inbox (`[ ]` pending → `[x]` processed); `data/applications/YYYY-MM-DD.md` is the outcome log; the TSV pathway is the only safe bridge because `merge` handles column order and duplicate detection
 
- - [H7] Load-bearing facts passed to downstream subagents must originate from a file, not from prior subagent prose. Authoritative sources: `data/pipeline.md`, `data/scan-history.tsv`, `batch/scan-output-*.md`, `reports/{num}-*.md` with `**URL:**` / `**Score:**` headers, `batch/tracker-additions/*.tsv`, cached JD content returned by `npx job-forge cache:get --url ...`.
+ - [H7] Load-bearing facts passed to downstream subagents must originate from a file, not from prior subagent prose. Authoritative sources: `data/pipeline.md`, `data/scan-history.tsv`, `batch/scan-output-*.md`, `reports/{num}-*.md` with `**URL:**` / `**Score:**` headers, `batch/tracker-additions/*.tsv`, cached JD content returned by `npx job-forge cache:get --url ...`, and source path/line pointers returned by `npx job-forge index:query ...`.
  why: 2026-04-18 scan subagent returned 30 fabricated Greenhouse IDs in prose (plausible-looking, non-existent); orchestrator dispatched 30 downstream subagents that all 404'd. Subagents can hallucinate IDs, scores, and confirmation text — round-trip through a file or don't trust the value
 
  - [H8] Never paste proxy values from `config/profile.yml` into `task` prompts, status text, or summaries. If a proxy is configured, tell the subagent exactly: "Proxy is configured; read `config/profile.yml` and pass its top-level `proxy:` object to every `geometra_connect` call." Do not transcribe `server`, `username`, `password`, or `bypass`, even if you just read them from disk.
@@ -74,12 +74,15 @@ AI-powered job search pipeline: scans portals, evaluates offers, generates CVs v
  - [D12] Use `job-forge cache:*` for deterministic local artifact reuse when available. For URL inputs, check `npx job-forge cache:has --url "..."` / `cache:get` before browser or network JD fetches; after a successful fetch, store the exact JD text with `npx job-forge cache:put --url "..." --ttl 14d --input @file` when it is already on disk.
  why: `iso-cache` is not an MCP and adds no prompt/tool-schema tokens; it avoids repeated JD fetch/render passes and lets future sessions reuse stable content from `.jobforge-cache/`
 
+ - [D13] Use `job-forge index:*` for deterministic artifact lookup when available. `index:has` and `index:query` rebuild `.jobforge-index.json` from `templates/index.json` on demand, covering reports, tracker day files, tracker TSVs, pipeline URLs, scan history, and ledger events without loading those growing files into prompt context.
+ why: `iso-index` is not an MCP and adds no prompt/tool-schema tokens; it gives agents compact file/line pointers and duplicate prefilters before expensive reads or browser dispatches
+
  ## Procedure
 
  1. Check `cv.md`, `profile.yml`, and `portals.yml`; onboard if any file is missing.
  2. Pick and name the mode from **Routing** [D6]. No match → ask; do not guess.
- 3. Read the active mode file [D3]; use context bundle checks when changing context loads [D11]; check cached artifacts before URL/JD refetches [D12]; decide inline vs delegated work [D1].
- 4. Prepare Geometra dispatches: cleanup [H3], ledger prefilter when present [D8], dedupe [H2], location filter [D5], routing [D2, D10], proxy prompt hygiene [H8].
+ 3. Read the active mode file [D3]. Use context bundle checks when changing context loads [D11]. Check cached artifacts before URL/JD refetches [D12]. Use artifact index lookups before broad file reads when they can answer the question [D13]. Decide inline vs delegated work [D1].
+ 4. Prepare Geometra dispatches: cleanup [H3], index/ledger prefilter when useful [D8, D13], dedupe [H2], location filter [D5], routing [D2, D10], proxy prompt hygiene [H8].
  5. Dispatch at most 2 tasks per round [H1]; wait for final outcomes, not just task ids [H5b].
  6. Keep multi-job form-filling out of the orchestrator [H4].
  7. Cross-check subagent facts against authoritative files [H7].
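As a minimal illustration of the round arithmetic in [H1] and procedure step 5 above, the sketch below splits N candidates into `ceil(N/2)` sequential rounds and treats a round as finished only when every result is a final outcome. The names `plan_rounds` and `round_is_complete`, and the convention that a written TSV path is recognized by its `.tsv` suffix, are hypothetical and chosen for illustration; they are not part of job-forge.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")

# [H1]: at most 2 parallel `task` dispatches per round.
MAX_PARALLEL = 2

# Final outcomes named in [H1]; a written TSV path also counts as final.
FINAL_OUTCOMES = {"APPLIED", "APPLY FAILED", "SKIP", "Discarded"}


def plan_rounds(candidates: Sequence[T], size: int = MAX_PARALLEL) -> List[List[T]]:
    """Split candidates into sequential rounds of at most `size` jobs each."""
    return [list(candidates[i:i + size]) for i in range(0, len(candidates), size)]


def round_is_complete(results: Sequence[str]) -> bool:
    """Return True only when every subagent reported a final outcome.

    A bare task/session id is a launch acknowledgement, not completion,
    so it keeps the round open. The `.tsv` suffix test is an assumption.
    """
    if not results:
        return False
    return all(r in FINAL_OUTCOMES or r.endswith(".tsv") for r in results)
```

For five candidates this produces three rounds of sizes 2, 2, and 1, which matches `ceil(5/2)`. A result list that still contains a session id leaves the round open, so no further round should be dispatched.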
@@ -73,6 +73,11 @@ Local workflow ledger (terminal, outside opencode):
  npx job-forge ledger:status # .jobforge-ledger/events.jsonl summary
  npx job-forge ledger:has --company "Acme" --role "Staff Engineer" --status Applied
 
+ Local artifact index (terminal, outside opencode):
+ npx job-forge index:status # .jobforge-index.json summary
+ npx job-forge index:has --key "company-role:acme:staff-engineer"
+ npx job-forge index:query "acme"
+
  Artifact contracts (terminal, outside opencode):
  npx iso-contract explain jobforge.tracker-row --contracts templates/contracts.json
  npx job-forge tracker-line ... --write # renders + validates tracker TSV locally
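The `index:has` example above uses the key `company-role:acme:staff-engineer`, while the ledger example passes `--company "Acme" --role "Staff Engineer"`. One plausible way to derive such a key is sketched below. The slug rule (lowercase, with runs of non-alphanumeric characters collapsed to single hyphens) is an assumption that reproduces this one example; the diff does not show how job-forge actually normalizes names, and `slugify` and `company_role_key` are hypothetical helpers.

```python
import re


def slugify(text: str) -> str:
    """Lowercase the text and collapse non-alphanumeric runs into hyphens.

    Assumed rule for illustration only; job-forge's own normalization
    is not shown in this diff and may differ.
    """
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def company_role_key(company: str, role: str) -> str:
    """Build a key shaped like the `company-role:<company-slug>:<role-slug>` example."""
    return f"company-role:{slugify(company)}:{slugify(role)}"
```

With these assumptions, `company_role_key("Acme", "Staff Engineer")` gives the key used in the example command.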
@@ -158,6 +163,11 @@ Step 1 — Enumerate candidates
  - Build ordered list: candidates = [job_1, job_2, ..., job_N]
 
  Step 2 — Dedup against already-applied
+ - Run npx job-forge index:has --key "company-role:<company-slug>:<role-slug>"
+ as a fast local artifact prefilter when company+role is known. It rebuilds
+ .jobforge-index.json on demand from templates/index.json. A hit means the
+ role has already appeared in tracker files or tracker TSVs and can be
+ dropped before dispatch.
  - If .jobforge-ledger/events.jsonl exists, use npx job-forge ledger:has as a
  fast prefilter for obvious company+role Applied duplicates. A ledger match
  can be dropped before dispatch without loading tracker files into context.
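The Step 2 ordering above (index prefilter, then ledger prefilter, then the four-source grep that [H2] keeps mandatory) can be sketched as a small filter. This is an illustration under assumptions, not job-forge code: `Candidate`, `dedupe`, and the three injected check functions are hypothetical stand-ins for `index:has`, `ledger:has`, and the grep, each returning `True` when the role already shows as applied.

```python
from typing import Callable, Iterable, List, NamedTuple, Tuple


class Candidate(NamedTuple):
    company: str
    role: str
    url: str


# A check returns True when the candidate is already applied (a duplicate).
Check = Callable[[Candidate], bool]


def dedupe(
    candidates: Iterable[Candidate],
    index_hit: Check,
    ledger_hit: Check,
    grep_hit: Check,
    wanted: int,
) -> Tuple[List[Candidate], List[Candidate]]:
    """Return up to `wanted` candidates to dispatch, plus dropped duplicates."""
    keep: List[Candidate] = []
    dropped: List[Candidate] = []
    for cand in candidates:
        if len(keep) == wanted:
            break
        # Cheap prefilters first: an index or ledger hit drops the
        # candidate before dispatch, without a broad file read.
        if index_hit(cand) or ledger_hit(cand):
            dropped.append(cand)
            continue
        # Survivors of the prefilters still go through the full grep.
        if grep_hit(cand):
            dropped.append(cand)
            continue
        keep.append(cand)
    return keep, dropped
```

Because dropped duplicates never enter `keep`, they do not count toward the requested total, and the loop keeps pulling replacements from the remaining candidate list until `wanted` is reached or the list runs out.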
package/AGENTS.md CHANGED
@@ -7,7 +7,7 @@ AI-powered job search pipeline: scans portals, evaluates offers, generates CVs v
  - [H1] Max 2 parallel `task` dispatches per message. For N jobs, run `ceil(N/2)` sequential rounds of 2. A round is not complete until both subagents return a final outcome (`APPLIED`, `APPLY FAILED`, `SKIP`, `Discarded`, or a written TSV path). A `task` tool result that only gives a session id / title is a launch acknowledgement, not completion. Applies in all modes, for all user phrasings ("urgent", "apply to 10 jobs now").
  why: each subagent requires post-cleanup and racing more than 2 reliably loses at least one result. On 2026-04-25 the orchestrator launched round 2 while round 1 had only returned task ids, leaving four application subagents in flight and losing two provider recoveries
 
- - [H2] Max 1 application per company+role. Before every `apply` dispatch, grep all four sources for the URL and for `company+role`: `data/pipeline.md`, all `data/applications/*.md` day files, `batch/tracker-additions/*.tsv`, `batch/tracker-additions/merged/*.tsv`. If `.jobforge-ledger/events.jsonl` exists, `npx job-forge ledger:has --company "..." --role "..." --status Applied` may be used as a fast prefilter; a match is enough to drop that duplicate before dispatch. For candidates not rejected by the ledger, the four-source grep is still mandatory. If any source shows APPLIED / Applied, skip the dispatch and pick a replacement from the remaining candidate list. Do not count duplicates toward a requested "apply to N jobs" total, and do not delegate obvious duplicates just so a subagent can return SKIP.
+ - [H2] Max 1 application per company+role. Before every `apply` dispatch, grep all four sources for the URL and for `company+role`: `data/pipeline.md`, all `data/applications/*.md` day files, `batch/tracker-additions/*.tsv`, `batch/tracker-additions/merged/*.tsv`. `npx job-forge index:has --key "company-role:..."` may be used first as a fast local artifact prefilter; it rebuilds `.jobforge-index.json` on demand from `templates/index.json`, and a company+role hit is enough to drop an obvious duplicate before dispatch. If `.jobforge-ledger/events.jsonl` exists, `npx job-forge ledger:has --company "..." --role "..." --status Applied` may also be used as a fast prefilter; a match is enough to drop that duplicate before dispatch. For candidates not rejected by the index or ledger, the four-source grep is still mandatory. If any source shows APPLIED / Applied, skip the dispatch and pick a replacement from the remaining candidate list. Do not count duplicates toward a requested "apply to N jobs" total, and do not delegate obvious duplicates just so a subagent can return SKIP.
  why: 2026-04 same-day batch collision — when two batches target the same role, `npx job-forge merge` updates the existing day-file row rather than appending, so grepping day files alone misses earlier-batch applies; merged/*.tsv is the only place the breadcrumb remains
 
  - [H3] Before every batch of `task` dispatches that will use Geometra, call `geometra_list_sessions` then `geometra_disconnect({closeBrowser: true})`. Every round, no exceptions. Name this cleanup as an explicit "step 0" in your first-response plan for any multi-apply request — it is the most frequently skipped guardrail in practice, and skipping it produces cascade "Not connected" failures on the next dispatch.
@@ -19,13 +19,13 @@ AI-powered job search pipeline: scans portals, evaluates offers, generates CVs v
  - [H5] Re-dispatch the same company only AFTER the previous subagent returns. Never fire the same `task` twice while the first is still in flight.
  why: two in-flight subagents for the same URL race on Geometra sessions and on tracker TSV writes, corrupting state and sometimes double-submitting
 
- - [H5b] Do not use `task` to poll task status. If OpenCode returns a task/session id without a final result, record the id, stop dispatching new rounds, and tell the user the round is still in flight. When the user asks to check later, inspect authoritative files (`batch/tracker-additions/*.tsv`, `batch/tracker-additions/merged/*.tsv`, day files, `.jobforge-ledger/events.jsonl`, or `iso-trace`) rather than spawning a "check task status" subagent.
+ - [H5b] Do not use `task` to poll task status. If OpenCode returns a task/session id without a final result, record the id, stop dispatching new rounds, and tell the user the round is still in flight. When the user asks to check later, inspect authoritative files (`batch/tracker-additions/*.tsv`, `batch/tracker-additions/merged/*.tsv`, day files, `.jobforge-ledger/events.jsonl`, `.jobforge-index.json`, or `iso-trace`) rather than spawning a "check task status" subagent.
  why: OpenCode status prompts can be delivered into the target subagent as a new user message; a 2026-04-25 trace caused a subagent to call `task` recursively instead of finishing the application
 
  - [H6] Application outcomes flow through `batch/tracker-additions/*.tsv`, not `data/pipeline.md`. After any multi-apply run, the orchestrator MUST run `npx job-forge merge` then `npx job-forge verify` before ending the session.
  why: `pipeline.md` is the URL inbox (`[ ]` pending → `[x]` processed); `data/applications/YYYY-MM-DD.md` is the outcome log; the TSV pathway is the only safe bridge because `merge` handles column order and duplicate detection
 
- - [H7] Load-bearing facts passed to downstream subagents must originate from a file, not from prior subagent prose. Authoritative sources: `data/pipeline.md`, `data/scan-history.tsv`, `batch/scan-output-*.md`, `reports/{num}-*.md` with `**URL:**` / `**Score:**` headers, `batch/tracker-additions/*.tsv`, cached JD content returned by `npx job-forge cache:get --url ...`.
+ - [H7] Load-bearing facts passed to downstream subagents must originate from a file, not from prior subagent prose. Authoritative sources: `data/pipeline.md`, `data/scan-history.tsv`, `batch/scan-output-*.md`, `reports/{num}-*.md` with `**URL:**` / `**Score:**` headers, `batch/tracker-additions/*.tsv`, cached JD content returned by `npx job-forge cache:get --url ...`, and source path/line pointers returned by `npx job-forge index:query ...`.
  why: 2026-04-18 scan subagent returned 30 fabricated Greenhouse IDs in prose (plausible-looking, non-existent); orchestrator dispatched 30 downstream subagents that all 404'd. Subagents can hallucinate IDs, scores, and confirmation text — round-trip through a file or don't trust the value
 
  - [H8] Never paste proxy values from `config/profile.yml` into `task` prompts, status text, or summaries. If a proxy is configured, tell the subagent exactly: "Proxy is configured; read `config/profile.yml` and pass its top-level `proxy:` object to every `geometra_connect` call." Do not transcribe `server`, `username`, `password`, or `bypass`, even if you just read them from disk.
@@ -69,12 +69,15 @@ AI-powered job search pipeline: scans portals, evaluates offers, generates CVs v
  - [D12] Use `job-forge cache:*` for deterministic local artifact reuse when available. For URL inputs, check `npx job-forge cache:has --url "..."` / `cache:get` before browser or network JD fetches; after a successful fetch, store the exact JD text with `npx job-forge cache:put --url "..." --ttl 14d --input @file` when it is already on disk.
  why: `iso-cache` is not an MCP and adds no prompt/tool-schema tokens; it avoids repeated JD fetch/render passes and lets future sessions reuse stable content from `.jobforge-cache/`
 
+ - [D13] Use `job-forge index:*` for deterministic artifact lookup when available. `index:has` and `index:query` rebuild `.jobforge-index.json` from `templates/index.json` on demand, covering reports, tracker day files, tracker TSVs, pipeline URLs, scan history, and ledger events without loading those growing files into prompt context.
+ why: `iso-index` is not an MCP and adds no prompt/tool-schema tokens; it gives agents compact file/line pointers and duplicate prefilters before expensive reads or browser dispatches
+
  ## Procedure
 
  1. Check `cv.md`, `profile.yml`, and `portals.yml`; onboard if any file is missing.
  2. Pick and name the mode from **Routing** [D6]. No match → ask; do not guess.
- 3. Read the active mode file [D3]; use context bundle checks when changing context loads [D11]; check cached artifacts before URL/JD refetches [D12]; decide inline vs delegated work [D1].
- 4. Prepare Geometra dispatches: cleanup [H3], ledger prefilter when present [D8], dedupe [H2], location filter [D5], routing [D2, D10], proxy prompt hygiene [H8].
+ 3. Read the active mode file [D3]. Use context bundle checks when changing context loads [D11]. Check cached artifacts before URL/JD refetches [D12]. Use artifact index lookups before broad file reads when they can answer the question [D13]. Decide inline vs delegated work [D1].
+ 4. Prepare Geometra dispatches: cleanup [H3], index/ledger prefilter when useful [D8, D13], dedupe [H2], location filter [D5], routing [D2, D10], proxy prompt hygiene [H8].
  5. Dispatch at most 2 tasks per round [H1]; wait for final outcomes, not just task ids [H5b].
  6. Keep multi-job form-filling out of the orchestrator [H4].
  7. Cross-check subagent facts against authoritative files [H7].
package/CLAUDE.md CHANGED
@@ -7,7 +7,7 @@ AI-powered job search pipeline: scans portals, evaluates offers, generates CVs v
  - [H1] Max 2 parallel `task` dispatches per message. For N jobs, run `ceil(N/2)` sequential rounds of 2. A round is not complete until both subagents return a final outcome (`APPLIED`, `APPLY FAILED`, `SKIP`, `Discarded`, or a written TSV path). A `task` tool result that only gives a session id / title is a launch acknowledgement, not completion. Applies in all modes, for all user phrasings ("urgent", "apply to 10 jobs now").
  why: each subagent requires post-cleanup and racing more than 2 reliably loses at least one result. On 2026-04-25 the orchestrator launched round 2 while round 1 had only returned task ids, leaving four application subagents in flight and losing two provider recoveries
 
- - [H2] Max 1 application per company+role. Before every `apply` dispatch, grep all four sources for the URL and for `company+role`: `data/pipeline.md`, all `data/applications/*.md` day files, `batch/tracker-additions/*.tsv`, `batch/tracker-additions/merged/*.tsv`. If `.jobforge-ledger/events.jsonl` exists, `npx job-forge ledger:has --company "..." --role "..." --status Applied` may be used as a fast prefilter; a match is enough to drop that duplicate before dispatch. For candidates not rejected by the ledger, the four-source grep is still mandatory. If any source shows APPLIED / Applied, skip the dispatch and pick a replacement from the remaining candidate list. Do not count duplicates toward a requested "apply to N jobs" total, and do not delegate obvious duplicates just so a subagent can return SKIP.
+ - [H2] Max 1 application per company+role. Before every `apply` dispatch, grep all four sources for the URL and for `company+role`: `data/pipeline.md`, all `data/applications/*.md` day files, `batch/tracker-additions/*.tsv`, `batch/tracker-additions/merged/*.tsv`. `npx job-forge index:has --key "company-role:..."` may be used first as a fast local artifact prefilter; it rebuilds `.jobforge-index.json` on demand from `templates/index.json`, and a company+role hit is enough to drop an obvious duplicate before dispatch. If `.jobforge-ledger/events.jsonl` exists, `npx job-forge ledger:has --company "..." --role "..." --status Applied` may also be used as a fast prefilter; a match is enough to drop that duplicate before dispatch. For candidates not rejected by the index or ledger, the four-source grep is still mandatory. If any source shows APPLIED / Applied, skip the dispatch and pick a replacement from the remaining candidate list. Do not count duplicates toward a requested "apply to N jobs" total, and do not delegate obvious duplicates just so a subagent can return SKIP.
  why: 2026-04 same-day batch collision — when two batches target the same role, `npx job-forge merge` updates the existing day-file row rather than appending, so grepping day files alone misses earlier-batch applies; merged/*.tsv is the only place the breadcrumb remains
 
  - [H3] Before every batch of `task` dispatches that will use Geometra, call `geometra_list_sessions` then `geometra_disconnect({closeBrowser: true})`. Every round, no exceptions. Name this cleanup as an explicit "step 0" in your first-response plan for any multi-apply request — it is the most frequently skipped guardrail in practice, and skipping it produces cascade "Not connected" failures on the next dispatch.
@@ -19,13 +19,13 @@ AI-powered job search pipeline: scans portals, evaluates offers, generates CVs v
  - [H5] Re-dispatch the same company only AFTER the previous subagent returns. Never fire the same `task` twice while the first is still in flight.
  why: two in-flight subagents for the same URL race on Geometra sessions and on tracker TSV writes, corrupting state and sometimes double-submitting
 
- - [H5b] Do not use `task` to poll task status. If OpenCode returns a task/session id without a final result, record the id, stop dispatching new rounds, and tell the user the round is still in flight. When the user asks to check later, inspect authoritative files (`batch/tracker-additions/*.tsv`, `batch/tracker-additions/merged/*.tsv`, day files, `.jobforge-ledger/events.jsonl`, or `iso-trace`) rather than spawning a "check task status" subagent.
+ - [H5b] Do not use `task` to poll task status. If OpenCode returns a task/session id without a final result, record the id, stop dispatching new rounds, and tell the user the round is still in flight. When the user asks to check later, inspect authoritative files (`batch/tracker-additions/*.tsv`, `batch/tracker-additions/merged/*.tsv`, day files, `.jobforge-ledger/events.jsonl`, `.jobforge-index.json`, or `iso-trace`) rather than spawning a "check task status" subagent.
  why: OpenCode status prompts can be delivered into the target subagent as a new user message; a 2026-04-25 trace caused a subagent to call `task` recursively instead of finishing the application
 
  - [H6] Application outcomes flow through `batch/tracker-additions/*.tsv`, not `data/pipeline.md`. After any multi-apply run, the orchestrator MUST run `npx job-forge merge` then `npx job-forge verify` before ending the session.
  why: `pipeline.md` is the URL inbox (`[ ]` pending → `[x]` processed); `data/applications/YYYY-MM-DD.md` is the outcome log; the TSV pathway is the only safe bridge because `merge` handles column order and duplicate detection
 
- - [H7] Load-bearing facts passed to downstream subagents must originate from a file, not from prior subagent prose. Authoritative sources: `data/pipeline.md`, `data/scan-history.tsv`, `batch/scan-output-*.md`, `reports/{num}-*.md` with `**URL:**` / `**Score:**` headers, `batch/tracker-additions/*.tsv`, cached JD content returned by `npx job-forge cache:get --url ...`.
+ - [H7] Load-bearing facts passed to downstream subagents must originate from a file, not from prior subagent prose. Authoritative sources: `data/pipeline.md`, `data/scan-history.tsv`, `batch/scan-output-*.md`, `reports/{num}-*.md` with `**URL:**` / `**Score:**` headers, `batch/tracker-additions/*.tsv`, cached JD content returned by `npx job-forge cache:get --url ...`, and source path/line pointers returned by `npx job-forge index:query ...`.
  why: 2026-04-18 scan subagent returned 30 fabricated Greenhouse IDs in prose (plausible-looking, non-existent); orchestrator dispatched 30 downstream subagents that all 404'd. Subagents can hallucinate IDs, scores, and confirmation text — round-trip through a file or don't trust the value
 
  - [H8] Never paste proxy values from `config/profile.yml` into `task` prompts, status text, or summaries. If a proxy is configured, tell the subagent exactly: "Proxy is configured; read `config/profile.yml` and pass its top-level `proxy:` object to every `geometra_connect` call." Do not transcribe `server`, `username`, `password`, or `bypass`, even if you just read them from disk.
@@ -69,12 +69,15 @@ AI-powered job search pipeline: scans portals, evaluates offers, generates CVs v
  - [D12] Use `job-forge cache:*` for deterministic local artifact reuse when available. For URL inputs, check `npx job-forge cache:has --url "..."` / `cache:get` before browser or network JD fetches; after a successful fetch, store the exact JD text with `npx job-forge cache:put --url "..." --ttl 14d --input @file` when it is already on disk.
  why: `iso-cache` is not an MCP and adds no prompt/tool-schema tokens; it avoids repeated JD fetch/render passes and lets future sessions reuse stable content from `.jobforge-cache/`
 
+ - [D13] Use `job-forge index:*` for deterministic artifact lookup when available. `index:has` and `index:query` rebuild `.jobforge-index.json` from `templates/index.json` on demand, covering reports, tracker day files, tracker TSVs, pipeline URLs, scan history, and ledger events without loading those growing files into prompt context.
+ why: `iso-index` is not an MCP and adds no prompt/tool-schema tokens; it gives agents compact file/line pointers and duplicate prefilters before expensive reads or browser dispatches
+
  ## Procedure
 
  1. Check `cv.md`, `profile.yml`, and `portals.yml`; onboard if any file is missing.
  2. Pick and name the mode from **Routing** [D6]. No match → ask; do not guess.
- 3. Read the active mode file [D3]; use context bundle checks when changing context loads [D11]; check cached artifacts before URL/JD refetches [D12]; decide inline vs delegated work [D1].
- 4. Prepare Geometra dispatches: cleanup [H3], ledger prefilter when present [D8], dedupe [H2], location filter [D5], routing [D2, D10], proxy prompt hygiene [H8].
+ 3. Read the active mode file [D3]. Use context bundle checks when changing context loads [D11]. Check cached artifacts before URL/JD refetches [D12]. Use artifact index lookups before broad file reads when they can answer the question [D13]. Decide inline vs delegated work [D1].
+ 4. Prepare Geometra dispatches: cleanup [H3], index/ledger prefilter when useful [D8, D13], dedupe [H2], location filter [D5], routing [D2, D10], proxy prompt hygiene [H8].
  5. Dispatch at most 2 tasks per round [H1]; wait for final outcomes, not just task ids [H5b].
  6. Keep multi-job form-filling out of the orchestrator [H4].
  7. Cross-check subagent facts against authoritative files [H7].
package/README.md CHANGED
@@ -31,7 +31,7 @@ The scaffolded `opencode.json` already has three MCPs wired up — they launch a
  - **Gmail** — reads replies from recruiters
  - **state-trace** — typed working memory for cross-session context (resumed batches, recent decisions, repeated portal quirks). Install once with `python3 -m pip install "state-trace[mcp]"`; the MCP command is `state-trace-mcp`.
 
- JobForge also keeps MCP-free local workflow state: `templates/contracts.json` defines tracker/apply artifact shapes via `@razroo/iso-contract`, `templates/capabilities.json` defines role capability boundaries via `@razroo/iso-capabilities`, `templates/context.json` defines deterministic mode/reference bundles via `@razroo/iso-context`, `.jobforge-ledger/events.jsonl` records duplicate/status events via `@razroo/iso-ledger`, and `.jobforge-cache/` stores reusable JD/artifact content via `@razroo/iso-cache`. None of these add always-on prompt or tool-schema tokens.
+ JobForge also keeps MCP-free local workflow state: `templates/contracts.json` defines tracker/apply artifact shapes via `@razroo/iso-contract`, `templates/capabilities.json` defines role capability boundaries via `@razroo/iso-capabilities`, `templates/context.json` defines deterministic mode/reference bundles via `@razroo/iso-context`, `.jobforge-ledger/events.jsonl` records duplicate/status events via `@razroo/iso-ledger`, `.jobforge-cache/` stores reusable JD/artifact content via `@razroo/iso-cache`, and `.jobforge-index.json` indexes artifact source pointers via `@razroo/iso-index`. None of these add always-on prompt or tool-schema tokens.
 
  `npm install` also materializes symlinks for every supported agent harness — OpenCode, Cursor, Claude Code, and Codex — so you can run `opencode`, `cursor`, `claude`, or `codex` in the same project and each picks up the shared MCP config and instructions.
37
37
 
@@ -78,7 +78,7 @@ JobForge turns opencode into a full job search command center. Instead of manual
78
78
  | **Durable Batch Orchestration** | `batch-runner.sh` uses `@razroo/iso-orchestrator` for resumable bundle execution, bounded fan-out, mutexed state writes, and workflow records in `.jobforge-runs/`. |
79
79
  | **Pipeline Integrity** | Automated merge, dedup, status normalization, health checks |
80
80
  | **Cost-Aware Agent Routing** | Three subagents (`@general-free`, `@general-paid`, `@glm-minimal`) with per-task tool surfaces. On OpenCode, JobForge pins all tiers to `opencode-go/deepseek-v4-flash` so application runs avoid overloaded free-model pools. See [Subagent Routing in AGENTS.md](AGENTS.md) for the task-to-agent mapping. |
81
- | **Trace + Telemetry + Guard + Contract + Ledger + Capabilities + Context + Cache** | `job-forge trace:*` exposes local OpenCode transcripts, `job-forge telemetry:*` summarizes runs, `job-forge guard:*` audits deterministic policy rules, `templates/contracts.json` enforces artifact shape with `iso-contract`, `job-forge ledger:*` queries append-only workflow state, `job-forge capabilities:*` checks role boundaries, `job-forge context:*` plans mode/reference context bundles, and `job-forge cache:*` reuses fetched JD/artifact content without MCP/tool-schema overhead. |
81
+ | **Trace + Telemetry + Guard + Contract + Ledger + Capabilities + Context + Cache + Index** | `job-forge trace:*` exposes local OpenCode transcripts, `job-forge telemetry:*` summarizes runs, `job-forge guard:*` audits deterministic policy rules, `templates/contracts.json` enforces artifact shape with `iso-contract`, `job-forge ledger:*` queries append-only workflow state, `job-forge capabilities:*` checks role boundaries, `job-forge context:*` plans mode/reference context bundles, `job-forge cache:*` reuses fetched JD/artifact content, and `job-forge index:*` queries compact source pointers without MCP/tool-schema overhead. |
82
82
  | **Token Cost Visibility** | `job-forge tokens --days 1` for per-session breakdown; `job-forge session-report --since-minutes 60 --log` to flag sessions over budget and append history to `data/token-usage.tsv`. Auto-logged after every batch run. |
83
83
 
84
84
  ## Usage
@@ -147,6 +147,7 @@ my-search/
147
147
  ├── data/ # applications, pipeline, scan history (personal, gitignored)
148
148
  ├── .jobforge-ledger/ # append-only local workflow events (personal, gitignored)
149
149
  ├── .jobforge-cache/ # content-addressed local JD/artifact cache (personal, gitignored)
150
+ ├── .jobforge-index.json # deterministic artifact lookup index (generated, gitignored)
150
151
  ├── reports/ # generated evaluation reports (personal, gitignored)
151
152
  ├── batch/{batch-input,batch-state}.tsv, tracker-additions/, logs/ # personal
152
153
  ├── .jobforge-runs/ # durable batch workflow records (generated)
@@ -163,7 +164,7 @@ my-search/
163
164
  ├── .opencode/skills/job-forge.md # → skill router
164
165
  ├── .opencode/agents/ # → @general-free, @general-paid, @glm-minimal
165
166
  ├── modes/ # → _shared.md + skill modes
166
- ├── templates/ # → states.yml, portals.example.yml, cv-template.html, capabilities.json, context.json
167
+ ├── templates/ # → states.yml, portals.example.yml, cv-template.html, capabilities.json, context.json, index.json
167
168
  ├── batch/batch-prompt.md # → batch worker prompt
168
169
  ├── batch/batch-runner.sh # → parallel orchestrator
169
170
 
@@ -199,6 +200,7 @@ JobForge/
199
200
  │ ├── capabilities.mjs # iso-capabilities-backed role policy CLI
200
201
  │ ├── context.mjs # iso-context-backed context bundle CLI
201
202
  │ ├── cache.mjs # iso-cache-backed local artifact cache CLI
203
+ │ ├── index.mjs # iso-index-backed artifact lookup CLI
202
204
  │ ├── token-usage-report.mjs # opencode cost analyzer
203
205
  │ └── release/check-source.mjs # version gate for npm publish
204
206
  ├── tracker-lib.mjs / merge-tracker.mjs / dedup-tracker.mjs / verify-pipeline.mjs
@@ -124,6 +124,11 @@ const consumerPkg = {
124
124
  'ledger:verify': 'job-forge ledger:verify',
125
125
  'ledger:has': 'job-forge ledger:has',
126
126
  'ledger:query': 'job-forge ledger:query',
127
+ 'index:build': 'job-forge index:build',
128
+ 'index:status': 'job-forge index:status',
129
+ 'index:verify': 'job-forge index:verify',
130
+ 'index:has': 'job-forge index:has',
131
+ 'index:query': 'job-forge index:query',
127
132
  // One command to pull the latest harness and any locally-pinned MCP
128
133
  // packages. npm update is a no-op on packages not in package.json, so
129
134
  // listing @razroo/gmail-mcp + @geometra/mcp is safe for consumers that
@@ -224,6 +229,7 @@ Before doing any work, remember where things live in *this* project:
224
229
  | Inbox of pending URLs | \`data/pipeline.md\` | The queue for \`/job-forge pipeline\` |
225
230
  | Scanner dedup history | \`data/scan-history.tsv\` | Only touch in \`/job-forge scan\` |
226
231
  | Local workflow ledger | \`.jobforge-ledger/events.jsonl\` | Deterministic append-only state; use \`job-forge ledger:*\` |
232
+ | Local artifact index | \`.jobforge-index.json\` | Deterministic file/line lookup; use \`job-forge index:*\` |
227
233
  | Scanner config | \`portals.yml\` (project root) | Company configs |
228
234
  | Profile / identity | \`config/profile.yml\` | Candidate name, email, target roles |
229
235
  | CV | \`cv.md\` (project root) | Markdown, source of truth |
@@ -312,6 +318,9 @@ data/pipeline.md
312
318
  data/scan-history.tsv
313
319
  data/token-usage.tsv
314
320
  .jobforge-ledger/
321
+ .jobforge-cache/
322
+ .jobforge-index.json
323
+ .jobforge-runs/
315
324
  reports/
316
325
  !reports/.gitkeep
317
326
  batch/batch-state.tsv
@@ -369,6 +378,7 @@ job-forge sync # re-run if symlinks drift
369
378
  job-forge merge # merge batch/tracker-additions/*.tsv into the tracker
370
379
  job-forge verify # verify pipeline integrity
371
380
  job-forge ledger:status # local deterministic workflow ledger status
381
+ job-forge index:status # local artifact index status
372
382
  job-forge pdf cv.md out.pdf
373
383
  job-forge tokens --days 1 # per-session opencode token usage
374
384
  \`\`\`
package/bin/job-forge.mjs CHANGED
@@ -24,6 +24,7 @@
24
24
  * capabilities:* Query role capability policy via iso-capabilities
25
25
  * context:* Query/render deterministic context bundles via iso-context
26
26
  * cache:* Reuse local deterministic artifacts via iso-cache
27
+ * index:* Query local artifacts via iso-index
27
28
  * sync Re-run the harness symlink sync (bin/sync.mjs)
28
29
  * help, --help Show this message
29
30
  */
@@ -116,6 +117,16 @@ const cacheAliases = {
116
117
  'cache:path': 'path',
117
118
  };
118
119
 
120
+ const indexAliases = {
121
+ 'index:build': 'build',
122
+ 'index:status': 'status',
123
+ 'index:query': 'query',
124
+ 'index:has': 'has',
125
+ 'index:verify': 'verify',
126
+ 'index:explain': 'explain',
127
+ 'index:path': 'path',
128
+ };
129
+
119
130
  const [, , cmd, ...rest] = process.argv;
120
131
 
121
132
  function printHelp() {
@@ -161,6 +172,11 @@ Commands:
161
172
  cache:get Read cached JD/artifact content
162
173
  cache:put Store JD/artifact content
163
174
  cache:verify Validate local artifact cache integrity
175
+ index:status Show local artifact index status
176
+ index:build Rebuild .jobforge-index.json from templates/index.json
177
+ index:has Check indexed URL/company-role/report facts without loading source files
178
+ index:query Query indexed reports, tracker rows, TSVs, scan history, pipeline, and ledger
179
+ index:verify Validate local artifact index integrity
164
180
  sync Re-create harness symlinks in the current project
165
181
 
166
182
  Deterministic helpers (prefer these over LLM-derived values):
@@ -197,6 +213,8 @@ Pass --help after a command to see its own flags, e.g.:
197
213
  job-forge cache:has --url https://example.test/jobs/123
198
214
  job-forge cache:get --url https://example.test/jobs/123
199
215
  job-forge cache:put --url https://example.test/jobs/123 --input @jds/example.md
216
+ job-forge index:has --key "company-role:acme:staff-engineer"
217
+ job-forge index:query "acme"
200
218
 
201
219
  Project directory resolves to $JOB_FORGE_PROJECT or cwd.`);
202
220
  }
@@ -311,6 +329,21 @@ if (cmd === 'cache' || cacheAliases[cmd]) {
311
329
  process.exit(result.status ?? 1);
312
330
  }
313
331
 
332
+ if (cmd === 'index' || indexAliases[cmd]) {
333
+ const indexArgs = cmd === 'index'
334
+ ? (rest.length === 0 ? ['help'] : rest)
335
+ : [indexAliases[cmd], ...rest];
336
+
337
+ const scriptPath = join(PKG_ROOT, 'scripts/index.mjs');
338
+ const result = spawnSync(process.execPath, [scriptPath, ...indexArgs], {
339
+ stdio: 'inherit',
340
+ cwd: PROJECT_DIR,
341
+ env: process.env,
342
+ });
343
+
344
+ process.exit(result.status ?? 1);
345
+ }
346
+
314
347
  const rel = commands[cmd];
315
348
  if (!rel) {
316
349
  console.error(`Unknown command: ${cmd}\n`);
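The `index` branch added in this hunk resolves its arguments the same way the `cache` branch does: a bare `index` forwards the remaining arguments (or `help` when there are none), and a colon alias such as `index:has` is translated to its subcommand. Below is a minimal sketch of that argument resolution only. The helper name `resolveIndexArgs` is hypothetical and not part of the package, the `spawnSync` call is omitted, and the own-property check is a deliberate deviation from the original `indexAliases[cmd]` lookup.

```javascript
// Alias table copied from the hunk above.
const indexAliases = {
  'index:build': 'build',
  'index:status': 'status',
  'index:query': 'query',
  'index:has': 'has',
  'index:verify': 'verify',
  'index:explain': 'explain',
  'index:path': 'path',
};

// Hypothetical helper (not in the package): returns the argument list that
// would be passed to scripts/index.mjs, or null when cmd is not an index command.
function resolveIndexArgs(cmd, rest = []) {
  if (cmd === 'index') {
    // Bare `job-forge index` with no arguments falls back to help.
    return rest.length === 0 ? ['help'] : rest;
  }
  // Own-property check so inherited keys such as "constructor" are not
  // mistaken for aliases (the original uses a plain truthiness lookup).
  if (Object.prototype.hasOwnProperty.call(indexAliases, cmd)) {
    return [indexAliases[cmd], ...rest];
  }
  return null;
}

console.log(resolveIndexArgs('index:has', ['--key', 'company-role:acme:staff-engineer']));
console.log(resolveIndexArgs('index', []));
```

Keeping the resolution in a pure function like this makes the alias table easy to test without spawning a child process.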
@@ -161,6 +161,7 @@ config/profile.yml → Candidate identity
161
161
  portals.yml → Scanner configuration
162
162
  data/pipeline.md → Pending URLs and `local:jds/...` inbox (see modes/pipeline.md)
163
163
  .jobforge-ledger/events.jsonl → Append-only workflow events for cheap local duplicate/status checks
164
+ .jobforge-index.json → Deterministic artifact lookup index built from templates/index.json
164
165
  jds/*.md → Saved job descriptions referenced from the pipeline (`local:jds/{file}`)
165
166
  templates/states.yml → Canonical status values
166
167
  templates/context.json → Deterministic mode/reference context bundle policy
@@ -176,12 +177,13 @@ Create `data/pipeline.md` when you start using the URL inbox (`/job-forge pipeli
176
177
  - PDFs: `cv-candidate-{company-slug}-{YYYY-MM-DD}.pdf`
177
178
  - Tracker TSVs: `batch/tracker-additions/{num}-{company-slug}.tsv` (one file per evaluation; merged files move under `batch/tracker-additions/merged/`; shape enforced by `templates/contracts.json`)
178
179
  - Ledger: `.jobforge-ledger/events.jsonl` (created by `job-forge ledger:rebuild`, `tracker-line --write`, or `merge`; gitignored personal state)
180
+ - Index: `.jobforge-index.json` (created on demand by `job-forge index:*`; gitignored local lookup state)
179
181
  - Capabilities: `templates/capabilities.json` (role boundary policy inspected with `job-forge capabilities:*`)
180
182
  - Context: `templates/context.json` (mode/reference file bundles inspected with `job-forge context:*`)
181
183
 
182
184
  ## Pipeline Integrity
183
185
 
184
- From the project root, `npx job-forge verify` (or `npm run verify`) runs `verify-pipeline.mjs`. When a tracker file exists, it validates canonical statuses (using `templates/states.yml` when that file is present and parseable), validates every tracker row against `templates/contracts.json`, warns on probable duplicate company/role rows, checks that report column markdown links resolve to files in the repo, validates score column format (`X.X/5`, `N/A`, or `DUP`), rejects table rows with too few columns, flags markdown bold inside the score column, and warns if any `batch/tracker-additions/*.tsv` files are still waiting to be merged. If `.jobforge-ledger/events.jsonl` exists, verify also validates the append-only ledger. It also compares state ids from `templates/states.yml` to an internal fallback list and warns when the two sets drift. **Fresh clone:** the command exits successfully when neither `data/applications.md` nor root `applications.md` exists yet; pending-TSV and states-drift checks still run so contributors see unmerged batch output early. Optional setup validation after you add `cv.md` and `config/profile.yml`: `npm run sync-check` (`cv-sync-check.mjs`).
186
+ From the project root, `npx job-forge verify` (or `npm run verify`) runs `verify-pipeline.mjs`. When a tracker file exists, it validates canonical statuses (using `templates/states.yml` when that file is present and parseable), validates every tracker row against `templates/contracts.json`, warns on probable duplicate company/role rows, checks that report column markdown links resolve to files in the repo, validates score column format (`X.X/5`, `N/A`, or `DUP`), rejects table rows with too few columns, flags markdown bold inside the score column, and warns if any `batch/tracker-additions/*.tsv` files are still waiting to be merged. If `.jobforge-ledger/events.jsonl` exists, verify also validates the append-only ledger. If `.jobforge-index.json` exists, verify validates the artifact index. It also compares state ids from `templates/states.yml` to an internal fallback list and warns when the two sets drift. **Fresh clone:** the command exits successfully when neither `data/applications.md` nor root `applications.md` exists yet; pending-TSV and states-drift checks still run so contributors see unmerged batch output early. Optional setup validation after you add `cv.md` and `config/profile.yml`: `npm run sync-check` (`cv-sync-check.mjs`).
185
187
 
186
188
  **`verify-pipeline.mjs` checks (same order as the script header):**
187
189
 
@@ -195,8 +197,9 @@ From the project root, `npx job-forge verify` (or `npm run verify`) runs `verify
195
197
  8. Score column has no markdown bold.
196
198
  9. Warn when state ids in `templates/states.yml` drift from the script’s built-in fallback list (or when the file exists but ids failed to parse).
197
199
  10. Validate `.jobforge-ledger/events.jsonl` when present.
200
+ 11. Validate `.jobforge-index.json` when present.
198
201
 
199
- When the tracker file is missing, checks 1-6 and 8 are skipped; checks 7, 9, and 10 still run when applicable.
202
+ When the tracker file is missing, checks 1-6 and 8 are skipped; checks 7, 9, 10, and 11 still run when applicable.
200
203
 
201
204
  ## Contributing touchpoints
202
205
 
@@ -219,6 +222,7 @@ Scripts maintain data consistency. In a consumer project they're invoked via the
219
222
  | `scripts/telemetry.mjs` | `npx job-forge telemetry:status` / `telemetry:show` | JobForge operational telemetry derived from OpenCode traces plus tracker TSV state |
220
223
  | `scripts/guard.mjs` | `npx job-forge guard:audit` / `guard:explain` | Deterministic `@razroo/iso-guard` policy audits over local OpenCode traces |
221
224
  | `scripts/ledger.mjs` | `npx job-forge ledger:status` / `ledger:has` / `ledger:rebuild` | Deterministic `@razroo/iso-ledger` state over tracker, TSV, and pipeline files |
225
+ | `scripts/index.mjs` | `npx job-forge index:status` / `index:has` / `index:query` | Deterministic `@razroo/iso-index` lookup over reports, tracker rows, TSVs, pipeline, scan history, and ledger events |
222
226
  | `scripts/context.mjs` | `npx job-forge context:list` / `context:plan` / `context:check` / `context:render` | Deterministic `@razroo/iso-context` mode/reference context bundle planning and rendering |
223
227
  | `tracker-lib.mjs` | _(library)_ | Shared helpers for reading/writing day-based tracker files — imported by merge/dedup/verify/normalize |
224
228
  | `bin/sync.mjs` | `npx job-forge sync` | Creates the harness symlinks in a consumer project (also runs as `postinstall`) |
@@ -150,6 +150,10 @@ Role capability boundaries live in `templates/capabilities.json` and are enforce
150
150
 
151
151
  Mode/reference context bundles live in `templates/context.json` and are planned locally by `@razroo/iso-context`. Use `job-forge context:plan <mode>` to see the files and estimated tokens, `job-forge context:check <mode>` to fail on budget drift, and `job-forge context:render <mode>` when you intentionally need a compact markdown or JSON context bundle. This is not an MCP and does not add tool-schema tokens; rendered context only consumes prompt tokens when a workflow deliberately asks for it.
152
152
 
153
+ ## JobForge artifact index
154
+
155
+ Artifact lookup policy lives in `templates/index.json` and is built locally by `@razroo/iso-index`. Use `job-forge index:has --key "company-role:acme:staff-engineer"` as a cheap duplicate/source prefilter, `job-forge index:query "acme"` to get compact source path/line pointers, and `job-forge index:verify` to validate `.jobforge-index.json`. Query, has, and verify rebuild the index on demand, so scaffolded projects need no setup. This is not an MCP and does not add tool-schema tokens.
156
+
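The `--key` examples in this diff all follow the shape `company-role:<company-slug>:<role-slug>`. The sketch below shows one way such a key could be assembled from display names. The `slugify` rule here (lowercase, runs of non-alphanumerics collapsed to `-`) is an assumption made for illustration; the real normalization is whatever `templates/index.json` and `@razroo/iso-index` define, and it may differ.

```javascript
// Hypothetical slug rule (assumption, not taken from the package):
// lowercase the text and collapse every run of non-alphanumerics into "-".
function slugify(text) {
  return String(text)
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, '-')
    .replace(/^-+|-+$/g, '');
}

// Hypothetical helper that builds a key in the shape shown in the docs.
function companyRoleKey(company, role) {
  return `company-role:${slugify(company)}:${slugify(role)}`;
}

console.log(companyRoleKey('Acme', 'Staff Engineer'));
// prints "company-role:acme:staff-engineer"
```

If a key built this way never produces a hit for a role you know is tracked, compare it against the keys that `job-forge index:query` returns before relying on it as a prefilter.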
153
157
  ## JobForge guard audits
154
158
 
155
159
  Guard audits run deterministic `@razroo/iso-guard` policies over the same local OpenCode traces. The default policy lives at `templates/guards/jobforge-baseline.yaml` and checks rules that are reliable from transcript data, including max two task dispatches per assistant message, no task-status polling via `task`, no raw proxy configuration in task prompts, and no child session task recursion.
package/docs/SETUP.md CHANGED
@@ -129,6 +129,7 @@ From your project root, these commands maintain the tracker and pipeline checks.
129
129
  | Inspect role capabilities | `npx job-forge capabilities:explain general-free` | `npm run capabilities:explain -- general-free` |
130
130
  | Inspect context bundle budget | `npx job-forge context:plan apply` | `npm run context:plan -- apply` |
131
131
  | Inspect local JD/artifact cache | `npx job-forge cache:status` | `npm run cache:status` |
132
+ | Inspect local artifact index | `npx job-forge index:status` | `npm run index:status` |
132
133
  | Map status column to canonical labels | `npx job-forge normalize` | `npm run normalize` |
133
134
  | Merge duplicate company/role rows | `npx job-forge dedup` | `npm run dedup` |
134
135
  | Generate ATS-optimized CV PDF | `npx job-forge pdf` | `npm run pdf` |
@@ -146,6 +147,7 @@ From your project root, these commands maintain the tracker and pipeline checks.
146
147
  | Rebuild local workflow ledger from tracker/pipeline files | `npx job-forge ledger:rebuild` | `npm run ledger:rebuild` |
147
148
  | Check duplicate/status event without loading tracker files | `npx job-forge ledger:has --company "Acme" --role "Staff Engineer" --status Applied` | `npm run ledger:has -- --company ...` |
148
149
  | Check/reuse cached JD content | `npx job-forge cache:has --url <url>` / `npx job-forge cache:get --url <url>` | `npm run cache:has -- --url ...` |
150
+ | Query local artifact pointers | `npx job-forge index:query "Acme"` / `npx job-forge index:has --key company-role:acme:staff-engineer` | `npm run index:query -- Acme` |
149
151
  | Re-create harness symlinks | `npx job-forge sync` | `npm run sync` |
150
152
  | Build optional dashboard TUI (Go on `PATH`) | `(cd node_modules/job-forge/dashboard && go build .)` | `npm run build:dashboard` (harness repo only) |
151
153
 
@@ -76,6 +76,11 @@ Local workflow ledger (terminal, outside opencode):
76
76
  npx job-forge ledger:status # .jobforge-ledger/events.jsonl summary
77
77
  npx job-forge ledger:has --company "Acme" --role "Staff Engineer" --status Applied
78
78
 
79
+ Local artifact index (terminal, outside opencode):
80
+ npx job-forge index:status # .jobforge-index.json summary
81
+ npx job-forge index:has --key "company-role:acme:staff-engineer"
82
+ npx job-forge index:query "acme"
83
+
79
84
  Artifact contracts (terminal, outside opencode):
80
85
  npx iso-contract explain jobforge.tracker-row --contracts templates/contracts.json
81
86
  npx job-forge tracker-line ... --write # renders + validates tracker TSV locally
@@ -161,6 +166,11 @@ Step 1 — Enumerate candidates
161
166
  - Build ordered list: candidates = [job_1, job_2, ..., job_N]
162
167
 
163
168
  Step 2 — Dedup against already-applied
169
+ - Run npx job-forge index:has --key "company-role:<company-slug>:<role-slug>"
170
+ as a fast local artifact prefilter when company+role is known. It rebuilds
171
+ .jobforge-index.json on demand from templates/index.json. A hit means the
172
+ role has already appeared in tracker files or tracker TSVs and can be
173
+ dropped before dispatch.
164
174
  - If .jobforge-ledger/events.jsonl exists, use npx job-forge ledger:has as a
165
175
  fast prefilter for obvious company+role Applied duplicates. A ledger match
166
176
  can be dropped before dispatch without loading tracker files into context.
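The ordering in Step 2 (index prefilter, then ledger prefilter, then the mandatory four-source grep for anything not already dropped) can be sketched as a small filter. Everything here is hypothetical: `dedupCandidates` and the three injected check functions stand in for `job-forge index:has`, `job-forge ledger:has`, and the grep. The sketch only illustrates the order of checks, not the package's implementation.

```javascript
// Hypothetical sketch of the Step 2 ordering. The check functions are injected
// so the logic stays pure; real callers would shell out to the CLI or grep.
function dedupCandidates(candidates, { indexHas, ledgerHas, grepShowsApplied }) {
  const keep = [];
  const dropped = [];
  for (const job of candidates) {
    if (indexHas(job)) {
      dropped.push({ job, reason: 'index' }); // cheap prefilter hit
      continue;
    }
    if (ledgerHas(job)) {
      dropped.push({ job, reason: 'ledger' }); // cheap prefilter hit
      continue;
    }
    // Not rejected by either prefilter: the four-source grep is still mandatory.
    if (grepShowsApplied(job)) {
      dropped.push({ job, reason: 'grep' });
      continue;
    }
    keep.push(job);
  }
  return { keep, dropped };
}

// Usage with in-memory stand-ins for the three sources.
const indexed = new Set(['acme']);
const ledgered = new Set(['globex']);
const grepped = new Set(['initech']);
const result = dedupCandidates(['acme', 'globex', 'initech', 'umbrella'], {
  indexHas: (job) => indexed.has(job),
  ledgerHas: (job) => ledgered.has(job),
  grepShowsApplied: (job) => grepped.has(job),
});
console.log(result.keep);
```

Dropped duplicates are returned separately so the caller can pick replacements from the remaining candidate list instead of counting them toward the requested total.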
@@ -7,7 +7,7 @@ AI-powered job search pipeline: scans portals, evaluates offers, generates CVs v
7
7
  - [H1] Max 2 parallel `task` dispatches per message. For N jobs, run `ceil(N/2)` sequential rounds of 2. A round is not complete until both subagents return a final outcome (`APPLIED`, `APPLY FAILED`, `SKIP`, `Discarded`, or a written TSV path). A `task` tool result that only gives a session id / title is a launch acknowledgement, not completion. Applies in all modes, for all user phrasings ("urgent", "apply to 10 jobs now").
8
8
  why: each subagent requires post-cleanup and racing more than 2 reliably loses at least one result. On 2026-04-25 the orchestrator launched round 2 while round 1 had only returned task ids, leaving four application subagents in flight and losing two provider recoveries
9
9
 
10
- - [H2] Max 1 application per company+role. Before every `apply` dispatch, grep all four sources for the URL and for `company+role`: `data/pipeline.md`, all `data/applications/*.md` day files, `batch/tracker-additions/*.tsv`, `batch/tracker-additions/merged/*.tsv`. If `.jobforge-ledger/events.jsonl` exists, `npx job-forge ledger:has --company "..." --role "..." --status Applied` may be used as a fast prefilter; a match is enough to drop that duplicate before dispatch. For candidates not rejected by the ledger, the four-source grep is still mandatory. If any source shows APPLIED / Applied, skip the dispatch and pick a replacement from the remaining candidate list. Do not count duplicates toward a requested "apply to N jobs" total, and do not delegate obvious duplicates just so a subagent can return SKIP.
10
+ - [H2] Max 1 application per company+role. Before every `apply` dispatch, grep all four sources for the URL and for `company+role`: `data/pipeline.md`, all `data/applications/*.md` day files, `batch/tracker-additions/*.tsv`, `batch/tracker-additions/merged/*.tsv`. `npx job-forge index:has --key "company-role:..."` may be used first as a fast local artifact prefilter; it rebuilds `.jobforge-index.json` on demand from `templates/index.json`, and a company+role hit is enough to drop an obvious duplicate before dispatch. If `.jobforge-ledger/events.jsonl` exists, `npx job-forge ledger:has --company "..." --role "..." --status Applied` may also be used as a fast prefilter; a match is enough to drop that duplicate before dispatch. For candidates not rejected by the index or ledger, the four-source grep is still mandatory. If any source shows APPLIED / Applied, skip the dispatch and pick a replacement from the remaining candidate list. Do not count duplicates toward a requested "apply to N jobs" total, and do not delegate obvious duplicates just so a subagent can return SKIP.
11
11
  why: 2026-04 same-day batch collision — when two batches target the same role, `npx job-forge merge` updates the existing day-file row rather than appending, so grepping day files alone misses earlier-batch applies; merged/*.tsv is the only place the breadcrumb remains
12
12
 
13
13
  - [H3] Before every batch of `task` dispatches that will use Geometra, call `geometra_list_sessions` then `geometra_disconnect({closeBrowser: true})`. Every round, no exceptions. Name this cleanup as an explicit "step 0" in your first-response plan for any multi-apply request — it is the most frequently skipped guardrail in practice, and skipping it produces cascade "Not connected" failures on the next dispatch.
@@ -19,13 +19,13 @@ AI-powered job search pipeline: scans portals, evaluates offers, generates CVs v
19
19
  - [H5] Re-dispatch the same company only AFTER the previous subagent returns. Never fire the same `task` twice while the first is still in flight.
20
20
  why: two in-flight subagents for the same URL race on Geometra sessions and on tracker TSV writes, corrupting state and sometimes double-submitting
21
21
 
22
- - [H5b] Do not use `task` to poll task status. If OpenCode returns a task/session id without a final result, record the id, stop dispatching new rounds, and tell the user the round is still in flight. When the user asks to check later, inspect authoritative files (`batch/tracker-additions/*.tsv`, `batch/tracker-additions/merged/*.tsv`, day files, `.jobforge-ledger/events.jsonl`, or `iso-trace`) rather than spawning a "check task status" subagent.
22
+ - [H5b] Do not use `task` to poll task status. If OpenCode returns a task/session id without a final result, record the id, stop dispatching new rounds, and tell the user the round is still in flight. When the user asks to check later, inspect authoritative files (`batch/tracker-additions/*.tsv`, `batch/tracker-additions/merged/*.tsv`, day files, `.jobforge-ledger/events.jsonl`, `.jobforge-index.json`, or `iso-trace`) rather than spawning a "check task status" subagent.
23
23
  why: OpenCode status prompts can be delivered into the target subagent as a new user message; a 2026-04-25 trace caused a subagent to call `task` recursively instead of finishing the application
24
24
 
25
25
  - [H6] Application outcomes flow through `batch/tracker-additions/*.tsv`, not `data/pipeline.md`. After any multi-apply run, the orchestrator MUST run `npx job-forge merge` then `npx job-forge verify` before ending the session.
26
26
  why: `pipeline.md` is the URL inbox (`[ ]` pending → `[x]` processed); `data/applications/YYYY-MM-DD.md` is the outcome log; the TSV pathway is the only safe bridge because `merge` handles column order and duplicate detection
27
27
 
28
- - [H7] Load-bearing facts passed to downstream subagents must originate from a file, not from prior subagent prose. Authoritative sources: `data/pipeline.md`, `data/scan-history.tsv`, `batch/scan-output-*.md`, `reports/{num}-*.md` with `**URL:**` / `**Score:**` headers, `batch/tracker-additions/*.tsv`, cached JD content returned by `npx job-forge cache:get --url ...`.
28
+ - [H7] Load-bearing facts passed to downstream subagents must originate from a file, not from prior subagent prose. Authoritative sources: `data/pipeline.md`, `data/scan-history.tsv`, `batch/scan-output-*.md`, `reports/{num}-*.md` with `**URL:**` / `**Score:**` headers, `batch/tracker-additions/*.tsv`, cached JD content returned by `npx job-forge cache:get --url ...`, and source path/line pointers returned by `npx job-forge index:query ...`.
29
29
  why: 2026-04-18 scan subagent returned 30 fabricated Greenhouse IDs in prose (plausible-looking, non-existent); orchestrator dispatched 30 downstream subagents that all 404'd. Subagents can hallucinate IDs, scores, and confirmation text — round-trip through a file or don't trust the value
30
30
 
31
31
  - [H8] Never paste proxy values from `config/profile.yml` into `task` prompts, status text, or summaries. If a proxy is configured, tell the subagent exactly: "Proxy is configured; read `config/profile.yml` and pass its top-level `proxy:` object to every `geometra_connect` call." Do not transcribe `server`, `username`, `password`, or `bypass`, even if you just read them from disk.
@@ -69,12 +69,15 @@ AI-powered job search pipeline: scans portals, evaluates offers, generates CVs v
69
69
  - [D12] Use `job-forge cache:*` for deterministic local artifact reuse when available. For URL inputs, check `npx job-forge cache:has --url "..."` / `cache:get` before browser or network JD fetches; after a successful fetch, store the exact JD text with `npx job-forge cache:put --url "..." --ttl 14d --input @file` when it is already on disk.
70
70
  why: `iso-cache` is not an MCP and adds no prompt/tool-schema tokens; it avoids repeated JD fetch/render passes and lets future sessions reuse stable content from `.jobforge-cache/`
71
71
 
72
+ - [D13] Use `job-forge index:*` for deterministic artifact lookup when available. `index:has` and `index:query` rebuild `.jobforge-index.json` from `templates/index.json` on demand, covering reports, tracker day files, tracker TSVs, pipeline URLs, scan history, and ledger events without loading those growing files into prompt context.
73
+ why: `iso-index` is not an MCP and adds no prompt/tool-schema tokens; it gives agents compact file/line pointers and duplicate prefilters before expensive reads or browser dispatches
74
+
72
75
  ## Procedure
73
76
 
74
77
  1. Check `cv.md`, `profile.yml`, and `portals.yml`; onboard if any file is missing.
75
78
  2. Pick and name the mode from **Routing** [D6]. No match → ask; do not guess.
76
- 3. Read the active mode file [D3]; use context bundle checks when changing context loads [D11]; check cached artifacts before URL/JD refetches [D12]; decide inline vs delegated work [D1].
77
- 4. Prepare Geometra dispatches: cleanup [H3], ledger prefilter when present [D8], dedupe [H2], location filter [D5], routing [D2, D10], proxy prompt hygiene [H8].
79
+ 3. Read the active mode file [D3]. Use context bundle checks when changing context loads [D11]. Check cached artifacts before URL/JD refetches [D12]. Use artifact index lookups before broad file reads when they can answer the question [D13]. Decide inline vs delegated work [D1].
80
+ 4. Prepare Geometra dispatches: cleanup [H3], index/ledger prefilter when useful [D8, D13], dedupe [H2], location filter [D5], routing [D2, D10], proxy prompt hygiene [H8].
78
81
  5. Dispatch at most 2 tasks per round [H1]; wait for final outcomes, not just task ids [H5b].
79
82
  6. Keep multi-job form-filling out of the orchestrator [H4].
80
83
  7. Cross-check subagent facts against authoritative files [H7].
@@ -0,0 +1,92 @@
+ import { existsSync, readFileSync, writeFileSync } from 'fs';
+ import { join } from 'path';
+ import {
+   buildIndex,
+   hasIndexRecord,
+   loadIndexConfig,
+   parseJson,
+   queryIndex,
+   verifyIndex,
+ } from '@razroo/iso-index';
+
+ export const INDEX_FILE = '.jobforge-index.json';
+ export const INDEX_CONFIG_FILE = 'templates/index.json';
+
+ export function resolveProjectDir(projectDir = process.env.JOB_FORGE_PROJECT || process.cwd()) {
+   return projectDir;
+ }
+
+ export function jobForgeIndexPath(projectDir = resolveProjectDir()) {
+   return process.env.JOB_FORGE_INDEX || join(projectDir, INDEX_FILE);
+ }
+
+ export function jobForgeIndexConfigPath(projectDir = resolveProjectDir()) {
+   return process.env.JOB_FORGE_INDEX_CONFIG || join(projectDir, INDEX_CONFIG_FILE);
+ }
+
+ export function indexExists(projectDir = resolveProjectDir()) {
+   return existsSync(jobForgeIndexPath(projectDir));
+ }
+
+ export function readJobForgeIndexConfig(projectDir = resolveProjectDir()) {
+   const path = jobForgeIndexConfigPath(projectDir);
+   return loadIndexConfig(parseJson(readFileSync(path, 'utf8'), path));
+ }
+
+ export function buildJobForgeIndex(options = {}, projectDir = resolveProjectDir()) {
+   const config = readJobForgeIndexConfig(projectDir);
+   const index = buildIndex(config, { root: projectDir });
+   const out = options.out || jobForgeIndexPath(projectDir);
+   if (options.write !== false) {
+     writeFileSync(out, `${JSON.stringify(index, null, 2)}\n`, 'utf8');
+   }
+   return { index, out };
+ }
+
+ export function readJobForgeIndex(projectDir = resolveProjectDir()) {
+   const path = jobForgeIndexPath(projectDir);
+   return parseJson(readFileSync(path, 'utf8'), path);
+ }
+
+ export function ensureJobForgeIndex(options = {}, projectDir = resolveProjectDir()) {
+   if (options.rebuild !== false || !indexExists(projectDir)) {
+     return buildJobForgeIndex({ out: options.out }, projectDir).index;
+   }
+   return readJobForgeIndex(projectDir);
+ }
+
+ export function queryJobForgeIndex(query = {}, options = {}, projectDir = resolveProjectDir()) {
+   return queryIndex(ensureJobForgeIndex(options, projectDir), query);
+ }
+
+ export function hasJobForgeIndexRecord(query = {}, options = {}, projectDir = resolveProjectDir()) {
+   return hasIndexRecord(ensureJobForgeIndex(options, projectDir), query);
+ }
+
+ export function verifyJobForgeIndex(options = {}, projectDir = resolveProjectDir()) {
+   const index = options.index || ensureJobForgeIndex(options, projectDir);
+   return verifyIndex(index);
+ }
+
+ export function jobForgeIndexSummary(projectDir = resolveProjectDir()) {
+   if (!indexExists(projectDir)) {
+     return {
+       path: jobForgeIndexPath(projectDir),
+       config: jobForgeIndexConfigPath(projectDir),
+       exists: false,
+       records: 0,
+       files: 0,
+       sources: 0,
+     };
+   }
+   const index = readJobForgeIndex(projectDir);
+   return {
+     path: jobForgeIndexPath(projectDir),
+     config: jobForgeIndexConfigPath(projectDir),
+     exists: true,
+     records: index.stats?.records || 0,
+     files: index.stats?.files || 0,
+     sources: index.stats?.sources || 0,
+     configHash: index.configHash,
+   };
+ }
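
The rebuild condition in `ensureJobForgeIndex` is compact enough to misread, so here is a minimal standalone sketch of its truth table. The helper name `shouldRebuild` is hypothetical and is not part of the package; it only restates the expression `options.rebuild !== false || !indexExists(projectDir)` with the file check passed in as a boolean.

```javascript
// Hypothetical helper (not in job-forge) mirroring the condition used by
// ensureJobForgeIndex: options.rebuild !== false || !indexExists(projectDir)
function shouldRebuild(options = {}, exists = false) {
  return options.rebuild !== false || !exists;
}

// Each case is [options, whether the index file already exists].
const rebuildCases = [
  [{}, true],
  [{ rebuild: true }, true],
  [{ rebuild: false }, true],
  [{ rebuild: false }, false],
];

for (const [options, exists] of rebuildCases) {
  console.log(JSON.stringify(options), `exists=${exists}`, '->', shouldRebuild(options, exists));
}
```

Only `{ rebuild: false }` with an existing file reads the index as-is. If the file is missing, the library helper builds it even when `rebuild` is `false`.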
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
    "name": "job-forge",
-   "version": "2.14.22",
+   "version": "2.14.24",
    "description": "AI-powered job search pipeline built on opencode",
    "type": "module",
    "bin": {
@@ -49,6 +49,12 @@
      "cache:list": "node bin/job-forge.mjs cache:list",
      "cache:verify": "node bin/job-forge.mjs cache:verify",
      "cache:prune": "node bin/job-forge.mjs cache:prune",
+     "index:build": "node bin/job-forge.mjs index:build",
+     "index:status": "node bin/job-forge.mjs index:status",
+     "index:query": "node bin/job-forge.mjs index:query",
+     "index:has": "node bin/job-forge.mjs index:has",
+     "index:verify": "node bin/job-forge.mjs index:verify",
+     "index:explain": "node bin/job-forge.mjs index:explain",
      "plan": "iso plan .",
      "lint:agentmd": "agentmd lint iso/instructions.md",
      "lint:modes": "isolint lint modes/",
@@ -120,6 +126,7 @@
      "@razroo/iso-context": "^0.1.0",
      "@razroo/iso-contract": "^0.1.0",
      "@razroo/iso-guard": "^0.1.0",
+     "@razroo/iso-index": "^0.1.0",
      "@razroo/iso-ledger": "^0.1.0",
      "@razroo/iso-orchestrator": "^0.1.0",
      "@razroo/iso-trace": "^0.4.0",
@@ -0,0 +1,210 @@
+ #!/usr/bin/env node
+
+ import { relative } from 'path';
+ import {
+   formatBuildResult,
+   formatConfigSummary,
+   formatIndexRecords,
+   formatVerifyResult,
+ } from '@razroo/iso-index';
+ import { PROJECT_DIR } from '../tracker-lib.mjs';
+ import {
+   buildJobForgeIndex,
+   hasJobForgeIndexRecord,
+   indexExists,
+   jobForgeIndexConfigPath,
+   jobForgeIndexPath,
+   jobForgeIndexSummary,
+   queryJobForgeIndex,
+   readJobForgeIndexConfig,
+   verifyJobForgeIndex,
+ } from '../lib/jobforge-index.mjs';
+
+ const USAGE = `job-forge index - local deterministic artifact lookup
+
+ Usage:
+   job-forge index:status [--json]
+   job-forge index:build [--json]
+   job-forge index:query [text] [--kind <kind>] [--key <key>] [--value <value>] [--source <path>] [--limit N] [--no-rebuild] [--json]
+   job-forge index:has [text] [--kind <kind>] [--key <key>] [--value <value>] [--source <path>] [--no-rebuild] [--json]
+   job-forge index:verify [--no-rebuild] [--json]
+   job-forge index:explain [--json]
+   job-forge index:path
+
+ Default config is templates/index.json. Default output is .jobforge-index.json.
+ Query, has, and verify rebuild the index by default so consumer projects need no
+ manual setup. Use --no-rebuild to inspect the existing index file only.`;
+
+ const [cmd = 'help', ...rawArgs] = process.argv.slice(2);
+ const opts = parseArgs(rawArgs);
+
+ if (opts.help || cmd === 'help' || cmd === '--help' || cmd === '-h') {
+   console.log(USAGE);
+   process.exit(0);
+ }
+
+ try {
+   if (cmd === 'path') {
+     console.log(jobForgeIndexPath(PROJECT_DIR));
+   } else if (cmd === 'status') {
+     status(opts);
+   } else if (cmd === 'build') {
+     build(opts);
+   } else if (cmd === 'query') {
+     query(opts);
+   } else if (cmd === 'has') {
+     has(opts);
+   } else if (cmd === 'verify') {
+     verify(opts);
+   } else if (cmd === 'explain') {
+     explain(opts);
+   } else {
+     console.error(`unknown index command "${cmd}"\n`);
+     console.error(USAGE);
+     process.exit(2);
+   }
+ } catch (error) {
+   console.error(error instanceof Error ? error.message : String(error));
+   process.exit(1);
+ }
+
+ function parseArgs(args) {
+   const opts = {
+     json: false,
+     help: false,
+     rebuild: true,
+     query: {},
+     text: [],
+   };
+
+   for (let i = 0; i < args.length; i++) {
+     const arg = args[i];
+     if (arg === '--json') {
+       opts.json = true;
+     } else if (arg === '--no-rebuild') {
+       opts.rebuild = false;
+     } else if (arg === '--rebuild') {
+       opts.rebuild = true;
+     } else if (arg === '--kind') {
+       opts.query.kind = valueAfter(args, ++i, '--kind');
+     } else if (arg.startsWith('--kind=')) {
+       opts.query.kind = arg.slice('--kind='.length);
+     } else if (arg === '--key') {
+       opts.query.key = valueAfter(args, ++i, '--key');
+     } else if (arg.startsWith('--key=')) {
+       opts.query.key = arg.slice('--key='.length);
+     } else if (arg === '--value') {
+       opts.query.value = valueAfter(args, ++i, '--value');
+     } else if (arg.startsWith('--value=')) {
+       opts.query.value = arg.slice('--value='.length);
+     } else if (arg === '--source') {
+       opts.query.source = valueAfter(args, ++i, '--source');
+     } else if (arg.startsWith('--source=')) {
+       opts.query.source = arg.slice('--source='.length);
+     } else if (arg === '--limit') {
+       opts.query.limit = parsePositiveInteger(valueAfter(args, ++i, '--limit'), '--limit');
+     } else if (arg.startsWith('--limit=')) {
+       opts.query.limit = parsePositiveInteger(arg.slice('--limit='.length), '--limit');
+     } else if (arg === '--help' || arg === '-h') {
+       opts.help = true;
+     } else if (arg.startsWith('--')) {
+       throw new Error(`unknown flag "${arg}"`);
+     } else {
+       opts.text.push(arg);
+     }
+   }
+
+   if (opts.text.length > 0) opts.query.text = opts.text.join(' ');
+   return opts;
+ }
+
+ function valueAfter(values, index, flag) {
+   const value = values[index];
+   if (!value || value.startsWith('--')) throw new Error(`${flag} requires a value`);
+   return value;
+ }
+
+ function parsePositiveInteger(value, flag) {
+   const parsed = Number(value);
+   if (!Number.isInteger(parsed) || parsed <= 0) throw new Error(`${flag} must be a positive integer`);
+   return parsed;
+ }
+
+ function status(opts) {
+   const summary = jobForgeIndexSummary(PROJECT_DIR);
+   if (opts.json) {
+     console.log(JSON.stringify(summary, null, 2));
+     return;
+   }
+   if (!summary.exists) {
+     console.log(`index: missing (${relativePath(summary.path)})`);
+     console.log('run: job-forge index:build');
+     return;
+   }
+   const result = verifyJobForgeIndex({ rebuild: false }, PROJECT_DIR);
+   console.log(`index: ${relativePath(summary.path)}`);
+   console.log(`config: ${relativePath(summary.config)}`);
+   console.log(`sources: ${summary.sources}`);
+   console.log(`files: ${summary.files}`);
+   console.log(`records: ${summary.records}`);
+   console.log(`verify: ${result.ok ? 'PASS' : 'FAIL'} (${result.issues.length} issue(s))`);
+ }
+
+ function build(opts) {
+   const { index, out } = buildJobForgeIndex({}, PROJECT_DIR);
+   if (opts.json) {
+     console.log(JSON.stringify({ out, stats: index.stats }, null, 2));
+     return;
+   }
+   console.log(formatBuildResult(index, out));
+ }
+
+ function query(opts) {
+   const records = queryJobForgeIndex(opts.query, { rebuild: opts.rebuild }, PROJECT_DIR);
+   if (opts.json) {
+     console.log(JSON.stringify(records, null, 2));
+     return;
+   }
+   console.log(formatIndexRecords(records));
+ }
+
+ function has(opts) {
+   const hit = hasJobForgeIndexRecord(opts.query, { rebuild: opts.rebuild }, PROJECT_DIR);
+   if (opts.json) {
+     console.log(JSON.stringify({ hit, query: opts.query }, null, 2));
+   } else {
+     console.log(hit ? 'MATCH' : 'MISS');
+   }
+   process.exit(hit ? 0 : 1);
+ }
+
+ function verify(opts) {
+   if (!opts.rebuild && !indexExists(PROJECT_DIR)) {
+     if (opts.json) {
+       console.log(JSON.stringify({ ok: true, missing: true, path: jobForgeIndexPath(PROJECT_DIR) }, null, 2));
+     } else {
+       console.log(`index: missing (${relativePath(jobForgeIndexPath(PROJECT_DIR))})`);
+     }
+     return;
+   }
+   const result = verifyJobForgeIndex({ rebuild: opts.rebuild }, PROJECT_DIR);
+   if (opts.json) {
+     console.log(JSON.stringify(result, null, 2));
+   } else {
+     console.log(formatVerifyResult(result));
+   }
+   process.exit(result.ok ? 0 : 1);
+ }
+
+ function explain(opts) {
+   const config = readJobForgeIndexConfig(PROJECT_DIR);
+   if (opts.json) {
+     console.log(JSON.stringify(config, null, 2));
+     return;
+   }
+   console.log(formatConfigSummary(config));
+ }
+
+ function relativePath(path) {
+   return relative(PROJECT_DIR, path) || '.';
+ }
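
One consequence of the exit codes in this CLI file is easy to miss. The `has` command exits 1 on a miss, and the `catch` block also exits 1 on any thrown error, so exit code 1 alone does not prove the record is absent. The sketch below shows one way a caller might combine the exit code with the text-mode stdout (`MATCH` / `MISS`). The function name `classifyHasResult` is hypothetical and not part of job-forge.

```javascript
// Hypothetical caller-side helper; not part of job-forge.
// Exit codes and stdout strings are taken from the CLI source in this diff:
//   has():           prints MATCH or MISS, exits 0 on a hit and 1 on a miss
//   catch block:     prints the error message to stderr, exits 1
//   unknown command: exits 2
function classifyHasResult(exitCode, stdout) {
  const text = String(stdout).trim();
  if (exitCode === 0 && text === 'MATCH') return 'match';
  if (exitCode === 1 && text === 'MISS') return 'miss';
  return 'error';
}

console.log(classifyHasResult(0, 'MATCH\n')); // match
console.log(classifyHasResult(1, 'MISS\n'));  // miss
console.log(classifyHasResult(1, ''));        // error
console.log(classifyHasResult(2, ''));        // error
```

Treating anything other than a confirmed `MATCH` or `MISS` as an error keeps a failed lookup from being mistaken for "no duplicate found".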
@@ -12,6 +12,7 @@
    "npx job-forge capabilities:*",
    "npx job-forge context:*",
    "npx job-forge cache:*",
+   "npx job-forge index:*",
    "rg *"
  ],
  "deny": [
@@ -58,7 +59,8 @@
    "npx job-forge verify",
    "npx job-forge ledger:*",
    "npx job-forge capabilities:*",
-   "npx job-forge cache:*"
+   "npx job-forge cache:*",
+   "npx job-forge index:*"
  ],
  "deny": [
    "task *"
@@ -0,0 +1,144 @@
+ {
+   "version": 1,
+   "sources": [
+     {
+       "name": "reports",
+       "include": ["reports/*.md"],
+       "format": "text",
+       "rules": [
+         {
+           "kind": "jobforge.report.url",
+           "pattern": "^\\*\\*URL:\\*\\*\\s*(?<url>https?://\\S+)",
+           "flags": "i",
+           "key": "url:{url}",
+           "value": "{source}",
+           "fields": {
+             "url": "{url}",
+             "report": "{source}"
+           },
+           "tags": ["report", "url"]
+         },
+         {
+           "kind": "jobforge.report.score",
+           "pattern": "^\\*\\*Score:\\*\\*\\s*(?<score>[0-9.]+/5)",
+           "flags": "i",
+           "key": "report:{source}:score",
+           "value": "{score}",
+           "fields": {
+             "score": "{score}",
+             "report": "{source}"
+           },
+           "tags": ["report", "score"]
+         }
+       ]
+     },
+     {
+       "name": "application-day-files",
+       "include": ["data/applications/*.md"],
+       "format": "markdown-table",
+       "records": [
+         {
+           "kind": "jobforge.application",
+           "key": "company-role:{Company|slug}:{Role|slug}",
+           "value": "{Status}",
+           "fields": ["#", "Date", "Company", "Role", "Score", "Status", "PDF", "Report", "Notes"],
+           "tags": ["tracker", "application"]
+         },
+         {
+           "kind": "jobforge.report-ref",
+           "key": "{Report}",
+           "value": "{Company} - {Role}",
+           "fields": {
+             "company": "{Company}",
+             "role": "{Role}",
+             "status": "{Status}",
+             "report": "{Report}"
+           },
+           "tags": ["tracker", "report"]
+         }
+       ]
+     },
+     {
+       "name": "application-single-file",
+       "include": ["data/applications.md", "applications.md"],
+       "format": "markdown-table",
+       "records": [
+         {
+           "kind": "jobforge.application",
+           "key": "company-role:{Company|slug}:{Role|slug}",
+           "value": "{Status}",
+           "fields": ["#", "Date", "Company", "Role", "Score", "Status", "PDF", "Report", "Notes"],
+           "tags": ["tracker", "application"]
+         }
+       ]
+     },
+     {
+       "name": "tracker-additions",
+       "include": ["batch/tracker-additions/*.tsv", "batch/tracker-additions/merged/*.tsv"],
+       "format": "tsv",
+       "header": false,
+       "columns": ["num", "date", "company", "role", "statusOrScore", "scoreOrStatus", "pdf", "report", "notes"],
+       "records": [
+         {
+           "kind": "jobforge.tracker-addition",
+           "key": "company-role:{company|slug}:{role|slug}",
+           "value": "{source}",
+           "fields": ["num", "date", "company", "role", "statusOrScore", "scoreOrStatus", "pdf", "report", "notes"],
+           "tags": ["tracker", "tsv"]
+         }
+       ]
+     },
+     {
+       "name": "pipeline",
+       "include": ["data/pipeline.md"],
+       "format": "text",
+       "rules": [
+         {
+           "kind": "jobforge.pipeline.url",
+           "pattern": "^\\s*-\\s*\\[(?<state>[ xX])\\]\\s+(?<url>https?://\\S+)",
+           "key": "url:{url}",
+           "value": "{state}",
+           "fields": {
+             "url": "{url}",
+             "state": "{state}",
+             "pipeline": "{source}"
+           },
+           "tags": ["pipeline", "url"]
+         }
+       ]
+     },
+     {
+       "name": "scan-history",
+       "include": ["data/scan-history.tsv"],
+       "format": "tsv",
+       "records": [
+         {
+           "kind": "jobforge.scan.url",
+           "key": "url:{url}",
+           "value": "{company} - {role}",
+           "fields": ["date", "company", "role", "url", "ats"],
+           "tags": ["scan", "url"]
+         }
+       ]
+     },
+     {
+       "name": "ledger",
+       "include": [".jobforge-ledger/*.jsonl"],
+       "format": "jsonl",
+       "records": [
+         {
+           "kind": "jobforge.ledger.event",
+           "key": "{key}",
+           "value": "{type}",
+           "fields": {
+             "type": "{type}",
+             "key": "{key}",
+             "status": "{data.status}",
+             "source": "{source}"
+           },
+           "tags": ["ledger"]
+         }
+       ]
+     }
+   ]
+ }
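
To make the `text` rules in this config more concrete, here is a rough sketch of how the `pipeline` rule could turn checklist lines into index records. It is not the `@razroo/iso-index` implementation. Two things are assumptions: that the pattern is tested against each line separately, and that `{name}` placeholders are filled from the named capture groups. The helper names `fill` and `applyRule` and the sample pipeline text are invented for illustration.

```javascript
// Rule fields copied from the "pipeline" source in the config above.
const rule = {
  kind: 'jobforge.pipeline.url',
  pattern: '^\\s*-\\s*\\[(?<state>[ xX])\\]\\s+(?<url>https?://\\S+)',
  key: 'url:{url}',
  value: '{state}',
};

// Assumption: {name} placeholders are replaced by the named capture groups.
// Unknown placeholders are left untouched.
function fill(template, groups) {
  return template.replace(/\{(\w+)\}/g, (whole, name) => groups[name] ?? whole);
}

// Assumption: the rule is applied to each line of the file on its own.
function applyRule(rule, text) {
  const re = new RegExp(rule.pattern, rule.flags || '');
  const records = [];
  for (const line of text.split('\n')) {
    const match = re.exec(line);
    if (!match) continue;
    records.push({
      kind: rule.kind,
      key: fill(rule.key, match.groups),
      value: fill(rule.value, match.groups),
    });
  }
  return records;
}

// Invented sample content; real pipeline files may carry other columns.
const sample = [
  '# Pipeline',
  '- [ ] https://example.com/jobs/1',
  '- [x] https://example.com/jobs/2 | Example Co | Engineer',
  'plain text line',
].join('\n');

console.log(applyRule(rule, sample));
```

Under these assumptions an unchecked item yields a record whose value is a single space and a checked item yields `x`, both keyed as `url:<the URL>`.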
@@ -17,6 +17,7 @@
   * 8. No markdown bold in score column
   * 9. Drift warning if states.yml ids differ from the built-in fallback list
   * 10. Ledger file verifies if .jobforge-ledger/events.jsonl exists
+  * 11. Artifact index verifies if .jobforge-index.json exists
   *
   * Run: node verify-pipeline.mjs (from repo root; same as npm run verify)
   */
@@ -29,6 +30,7 @@ import {
    usesDayFiles, readAllEntries, listDayFiles, dayFilePath,
  } from './tracker-lib.mjs';
  import { jobForgeLedgerPath, ledgerExists, verifyJobForgeLedger } from './lib/jobforge-ledger.mjs';
+ import { indexExists, jobForgeIndexPath, verifyJobForgeIndex } from './lib/jobforge-index.mjs';
  import {
    canonicalStatusValues,
    formatContractIssues,
@@ -153,6 +155,22 @@ function verifyLedgerIfPresent() {
    }
  }

+ function verifyIndexIfPresent() {
+   if (!indexExists(PROJECT_DIR)) {
+     ok('Artifact index not initialized');
+     return;
+   }
+   const result = verifyJobForgeIndex({ rebuild: false }, PROJECT_DIR);
+   for (const issue of result.issues) {
+     const msg = `index: ${issue.kind}: ${issue.message}`;
+     if (issue.severity === 'error') error(msg);
+     else warn(msg);
+   }
+   if (result.ok) {
+     ok(`Artifact index valid (${result.records} records at ${relative(PROJECT_DIR, jobForgeIndexPath(PROJECT_DIR))})`);
+   }
+ }
+
  // --- Read entries ---
  const { entries, source } = readAllEntries();

@@ -162,6 +180,7 @@ if (entries.length === 0) {
    checkPendingTrackerAdditions();
    verifyStatesYamlDrift();
    verifyLedgerIfPresent();
+   verifyIndexIfPresent();
    console.log('\n' + '='.repeat(50));
    console.log(`📊 Pipeline Health: ${errors} errors, ${warnings} warnings`);
    if (errors === 0 && warnings === 0) console.log('🟢 Pipeline is clean!');
@@ -297,6 +316,7 @@ if (boldScores === 0) ok('No bold in scores');

  verifyStatesYamlDrift();
  verifyLedgerIfPresent();
+ verifyIndexIfPresent();

  console.log('\n' + '='.repeat(50));
  console.log(`📊 Pipeline Health: ${errors} errors, ${warnings} warnings`);