job-forge 2.14.21 → 2.14.23

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (21)

package/.cursor/rules/main.mdc +11 -5
package/.opencode/skills/job-forge.md +10 -0
package/AGENTS.md +11 -5
package/CLAUDE.md +11 -5
package/README.md +7 -3
package/bin/create-job-forge.mjs +7 -0
package/bin/job-forge.mjs +70 -0
package/docs/ARCHITECTURE.md +6 -2
package/docs/CUSTOMIZATION.md +4 -0
package/docs/SETUP.md +4 -0
package/iso/commands/job-forge.md +10 -0
package/iso/instructions.md +11 -5
package/lib/jobforge-cache.mjs +105 -0
package/lib/jobforge-index.mjs +92 -0
package/modes/auto-pipeline.md +3 -1
package/package.json +20 -2
package/scripts/cache.mjs +313 -0
package/scripts/index.mjs +210 -0
package/templates/capabilities.json +5 -1
package/templates/index.json +144 -0
package/verify-pipeline.mjs +20 -0

package/.cursor/rules/main.mdc CHANGED

@@ -12,7 +12,7 @@ AI-powered job search pipeline: scans portals, evaluates offers, generates CVs v
 - [H1] Max 2 parallel `task` dispatches per message. For N jobs, run `ceil(N/2)` sequential rounds of 2. A round is not complete until both subagents return a final outcome (`APPLIED`, `APPLY FAILED`, `SKIP`, `Discarded`, or a written TSV path). A `task` tool result that only gives a session id / title is a launch acknowledgement, not completion. Applies in all modes, for all user phrasings ("urgent", "apply to 10 jobs now").
   why: each subagent requires post-cleanup and racing more than 2 reliably loses at least one result. On 2026-04-25 the orchestrator launched round 2 while round 1 had only returned task ids, leaving four application subagents in flight and losing two provider recoveries
-- [H2] Max 1 application per company+role. Before every `apply` dispatch, grep all four sources for the URL and for `company+role`: `data/pipeline.md`, all `data/applications/*.md` day files, `batch/tracker-additions/*.tsv`, `batch/tracker-additions/merged/*.tsv`. If `.jobforge-ledger/events.jsonl` exists, `npx job-forge ledger:has --company "..." --role "..." --status Applied` may be used as a fast prefilter; a match is enough to drop that duplicate before dispatch. For candidates not rejected by the ledger, the four-source grep is still mandatory. If any source shows APPLIED / Applied, skip the dispatch and pick a replacement from the remaining candidate list. Do not count duplicates toward a requested "apply to N jobs" total, and do not delegate obvious duplicates just so a subagent can return SKIP.
+- [H2] Max 1 application per company+role. Before every `apply` dispatch, grep all four sources for the URL and for `company+role`: `data/pipeline.md`, all `data/applications/*.md` day files, `batch/tracker-additions/*.tsv`, `batch/tracker-additions/merged/*.tsv`. `npx job-forge index:has --key "company-role:..."` may be used first as a fast local artifact prefilter; it rebuilds `.jobforge-index.json` on demand from `templates/index.json`, and a company+role hit is enough to drop an obvious duplicate before dispatch. If `.jobforge-ledger/events.jsonl` exists, `npx job-forge ledger:has --company "..." --role "..." --status Applied` may also be used as a fast prefilter; a match is enough to drop that duplicate before dispatch. For candidates not rejected by the index or ledger, the four-source grep is still mandatory. If any source shows APPLIED / Applied, skip the dispatch and pick a replacement from the remaining candidate list. Do not count duplicates toward a requested "apply to N jobs" total, and do not delegate obvious duplicates just so a subagent can return SKIP.
   why: 2026-04 same-day batch collision — when two batches target the same role, `npx job-forge merge` updates the existing day-file row rather than appending, so grepping day files alone misses earlier-batch applies; merged/*.tsv is the only place the breadcrumb remains
 - [H3] Before every batch of `task` dispatches that will use Geometra, call `geometra_list_sessions` then `geometra_disconnect({closeBrowser: true})`. Every round, no exceptions. Name this cleanup as an explicit "step 0" in your first-response plan for any multi-apply request — it is the most frequently skipped guardrail in practice, and skipping it produces cascade "Not connected" failures on the next dispatch.
@@ -24,13 +24,13 @@ AI-powered job search pipeline: scans portals, evaluates offers, generates CVs v
 - [H5] Re-dispatch the same company only AFTER the previous subagent returns. Never fire the same `task` twice while the first is still in flight.
   why: two in-flight subagents for the same URL race on Geometra sessions and on tracker TSV writes, corrupting state and sometimes double-submitting
-- [H5b] Do not use `task` to poll task status. If OpenCode returns a task/session id without a final result, record the id, stop dispatching new rounds, and tell the user the round is still in flight. When the user asks to check later, inspect authoritative files (`batch/tracker-additions/*.tsv`, `batch/tracker-additions/merged/*.tsv`, day files, `.jobforge-ledger/events.jsonl`, or `iso-trace`) rather than spawning a "check task status" subagent.
+- [H5b] Do not use `task` to poll task status. If OpenCode returns a task/session id without a final result, record the id, stop dispatching new rounds, and tell the user the round is still in flight. When the user asks to check later, inspect authoritative files (`batch/tracker-additions/*.tsv`, `batch/tracker-additions/merged/*.tsv`, day files, `.jobforge-ledger/events.jsonl`, `.jobforge-index.json`, or `iso-trace`) rather than spawning a "check task status" subagent.
   why: OpenCode status prompts can be delivered into the target subagent as a new user message; a 2026-04-25 trace caused a subagent to call `task` recursively instead of finishing the application
 - [H6] Application outcomes flow through `batch/tracker-additions/*.tsv`, not `data/pipeline.md`. After any multi-apply run, the orchestrator MUST run `npx job-forge merge` then `npx job-forge verify` before ending the session.
   why: `pipeline.md` is the URL inbox (`[ ]` pending → `[x]` processed); `data/applications/YYYY-MM-DD.md` is the outcome log; the TSV pathway is the only safe bridge because `merge` handles column order and duplicate detection
-- [H7] Load-bearing facts passed to downstream subagents must originate from a file, not from prior subagent prose. Authoritative sources: `data/pipeline.md`, `data/scan-history.tsv`, `batch/scan-output-*.md`, `reports/{num}-*.md` with `**URL:**` / `**Score:**` headers, `batch/tracker-additions/*.tsv`.
+- [H7] Load-bearing facts passed to downstream subagents must originate from a file, not from prior subagent prose. Authoritative sources: `data/pipeline.md`, `data/scan-history.tsv`, `batch/scan-output-*.md`, `reports/{num}-*.md` with `**URL:**` / `**Score:**` headers, `batch/tracker-additions/*.tsv`, cached JD content returned by `npx job-forge cache:get --url ...`, and source path/line pointers returned by `npx job-forge index:query ...`.
   why: 2026-04-18 scan subagent returned 30 fabricated Greenhouse IDs in prose (plausible-looking, non-existent); orchestrator dispatched 30 downstream subagents that all 404'd. Subagents can hallucinate IDs, scores, and confirmation text — round-trip through a file or don't trust the value
 - [H8] Never paste proxy values from `config/profile.yml` into `task` prompts, status text, or summaries. If a proxy is configured, tell the subagent exactly: "Proxy is configured; read `config/profile.yml` and pass its top-level `proxy:` object to every `geometra_connect` call." Do not transcribe `server`, `username`, `password`, or `bypass`, even if you just read them from disk.
@@ -71,12 +71,18 @@ AI-powered job search pipeline: scans portals, evaluates offers, generates CVs v
 - [D11] Treat `templates/context.json` as the source of truth for mode/reference context bundles. Use `npx job-forge context:plan <mode>` or `npx job-forge context:check <mode>` when changing or validating what a mode loads; do not paste the full context matrix into prompts.
   why: deterministic context bundles prevent reference-file drift and accidental token bloat without adding MCP/tool-schema tokens
+- [D12] Use `job-forge cache:*` for deterministic local artifact reuse when available. For URL inputs, check `npx job-forge cache:has --url "..."` / `cache:get` before browser or network JD fetches; after a successful fetch, store the exact JD text with `npx job-forge cache:put --url "..." --ttl 14d --input @file` when it is already on disk.
+  why: `iso-cache` is not an MCP and adds no prompt/tool-schema tokens; it avoids repeated JD fetch/render passes and lets future sessions reuse stable content from `.jobforge-cache/`
+- [D13] Use `job-forge index:*` for deterministic artifact lookup when available. `index:has` and `index:query` rebuild `.jobforge-index.json` from `templates/index.json` on demand, covering reports, tracker day files, tracker TSVs, pipeline URLs, scan history, and ledger events without loading those growing files into prompt context.
+  why: `iso-index` is not an MCP and adds no prompt/tool-schema tokens; it gives agents compact file/line pointers and duplicate prefilters before expensive reads or browser dispatches
 ## Procedure
 1. Check `cv.md`, `profile.yml`, and `portals.yml`; onboard if any file is missing.
 2. Pick and name the mode from **Routing** [D6]. No match → ask; do not guess.
-3. Read the active mode file [D3]; use context bundle checks when changing context loads [D11]; decide inline vs delegated work [D1].
-4. Prepare Geometra dispatches: cleanup [H3], ledger prefilter when present [D8], dedupe [H2], location filter [D5], routing [D2, D10], proxy prompt hygiene [H8].
+3. Read the active mode file [D3]. Use context bundle checks when changing context loads [D11]. Check cached artifacts before URL/JD refetches [D12]. Use artifact index lookups before broad file reads when they can answer the question [D13]. Decide inline vs delegated work [D1].
+4. Prepare Geometra dispatches: cleanup [H3], index/ledger prefilter when useful [D8, D13], dedupe [H2], location filter [D5], routing [D2, D10], proxy prompt hygiene [H8].
 5. Dispatch at most 2 tasks per round [H1]; wait for final outcomes, not just task ids [H5b].
 6. Keep multi-job form-filling out of the orchestrator [H4].
 7. Cross-check subagent facts against authoritative files [H7].
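
The round rule in [H1] (at most 2 dispatches per round, `ceil(N/2)` rounds, and a round only completes on final outcomes) can be illustrated with a minimal sketch. This is not code from the package: `planRounds`, `isRoundComplete`, and the `{ outcome, tsvPath }` result shape are hypothetical names chosen for illustration.

```javascript
// Hypothetical helpers illustrating [H1]; not part of job-forge.

// Final outcomes listed in [H1]. A bare session id / title is NOT one of them.
const FINAL_OUTCOMES = new Set(["APPLIED", "APPLY FAILED", "SKIP", "Discarded"]);

// Split N jobs into sequential rounds of at most `maxParallel` (2 per [H1]).
function planRounds(jobs, maxParallel = 2) {
  const rounds = [];
  for (let i = 0; i < jobs.length; i += maxParallel) {
    rounds.push(jobs.slice(i, i + maxParallel));
  }
  return rounds;
}

// A round is complete only when every subagent result carries a final
// outcome or a written TSV path (assumed result shape: { outcome, tsvPath }).
function isRoundComplete(results) {
  return (
    results.length > 0 &&
    results.every(
      (r) =>
        FINAL_OUTCOMES.has(r.outcome) ||
        (typeof r.tsvPath === "string" && r.tsvPath.length > 0)
    )
  );
}

// Five jobs produce ceil(5/2) = 3 rounds: two full rounds and one single.
console.log(planRounds(["job_1", "job_2", "job_3", "job_4", "job_5"]).length);
```

Under this sketch, a result that only holds a session id fails `isRoundComplete`, so the next round would not start until the in-flight subagents report back.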

package/.opencode/skills/job-forge.md CHANGED

@@ -73,6 +73,11 @@ Local workflow ledger (terminal, outside opencode):
   npx job-forge ledger:status          # .jobforge-ledger/events.jsonl summary
   npx job-forge ledger:has --company "Acme" --role "Staff Engineer" --status Applied
+Local artifact index (terminal, outside opencode):
+  npx job-forge index:status           # .jobforge-index.json summary
+  npx job-forge index:has --key "company-role:acme:staff-engineer"
+  npx job-forge index:query "acme"
 Artifact contracts (terminal, outside opencode):
   npx iso-contract explain jobforge.tracker-row --contracts templates/contracts.json
   npx job-forge tracker-line ... --write   # renders + validates tracker TSV locally
@@ -158,6 +163,11 @@ Step 1  — Enumerate candidates
   - Build ordered list: candidates = [job_1, job_2, ..., job_N]
 Step 2  — Dedup against already-applied
+  - Run npx job-forge index:has --key "company-role:<company-slug>:<role-slug>"
+    as a fast local artifact prefilter when company+role is known. It rebuilds
+    .jobforge-index.json on demand from templates/index.json. A hit means the
+    role has already appeared in tracker files or tracker TSVs and can be
+    dropped before dispatch.
   - If .jobforge-ledger/events.jsonl exists, use npx job-forge ledger:has as a
     fast prefilter for obvious company+role Applied duplicates. A ledger match
     can be dropped before dispatch without loading tracker files into context.
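
The `index:has` key in the hunks above has the shape `company-role:<company-slug>:<role-slug>`, with `company-role:acme:staff-engineer` as the only concrete example. The sketch below shows one way such a key could be built. The slug rule (lowercase, runs of non-alphanumerics collapsed to a hyphen) is an assumption inferred from that single example, and `slugify` and `companyRoleKey` are hypothetical names; the package's actual normalization may differ.

```javascript
// Hypothetical key builder; the real slug rules in job-forge are not shown in this diff.

// Assumed rule: lowercase, collapse non-alphanumeric runs to "-", trim edge hyphens.
function slugify(text) {
  return text
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
}

// Build a key in the documented shape company-role:<company-slug>:<role-slug>.
function companyRoleKey(company, role) {
  return `company-role:${slugify(company)}:${slugify(role)}`;
}

console.log(companyRoleKey("Acme", "Staff Engineer"));
// prints "company-role:acme:staff-engineer"
```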

package/AGENTS.md CHANGED

@@ -7,7 +7,7 @@ AI-powered job search pipeline: scans portals, evaluates offers, generates CVs v
 - [H1] Max 2 parallel `task` dispatches per message. For N jobs, run `ceil(N/2)` sequential rounds of 2. A round is not complete until both subagents return a final outcome (`APPLIED`, `APPLY FAILED`, `SKIP`, `Discarded`, or a written TSV path). A `task` tool result that only gives a session id / title is a launch acknowledgement, not completion. Applies in all modes, for all user phrasings ("urgent", "apply to 10 jobs now").
   why: each subagent requires post-cleanup and racing more than 2 reliably loses at least one result. On 2026-04-25 the orchestrator launched round 2 while round 1 had only returned task ids, leaving four application subagents in flight and losing two provider recoveries
-- [H2] Max 1 application per company+role. Before every `apply` dispatch, grep all four sources for the URL and for `company+role`: `data/pipeline.md`, all `data/applications/*.md` day files, `batch/tracker-additions/*.tsv`, `batch/tracker-additions/merged/*.tsv`. If `.jobforge-ledger/events.jsonl` exists, `npx job-forge ledger:has --company "..." --role "..." --status Applied` may be used as a fast prefilter; a match is enough to drop that duplicate before dispatch. For candidates not rejected by the ledger, the four-source grep is still mandatory. If any source shows APPLIED / Applied, skip the dispatch and pick a replacement from the remaining candidate list. Do not count duplicates toward a requested "apply to N jobs" total, and do not delegate obvious duplicates just so a subagent can return SKIP.
+- [H2] Max 1 application per company+role. Before every `apply` dispatch, grep all four sources for the URL and for `company+role`: `data/pipeline.md`, all `data/applications/*.md` day files, `batch/tracker-additions/*.tsv`, `batch/tracker-additions/merged/*.tsv`. `npx job-forge index:has --key "company-role:..."` may be used first as a fast local artifact prefilter; it rebuilds `.jobforge-index.json` on demand from `templates/index.json`, and a company+role hit is enough to drop an obvious duplicate before dispatch. If `.jobforge-ledger/events.jsonl` exists, `npx job-forge ledger:has --company "..." --role "..." --status Applied` may also be used as a fast prefilter; a match is enough to drop that duplicate before dispatch. For candidates not rejected by the index or ledger, the four-source grep is still mandatory. If any source shows APPLIED / Applied, skip the dispatch and pick a replacement from the remaining candidate list. Do not count duplicates toward a requested "apply to N jobs" total, and do not delegate obvious duplicates just so a subagent can return SKIP.
   why: 2026-04 same-day batch collision — when two batches target the same role, `npx job-forge merge` updates the existing day-file row rather than appending, so grepping day files alone misses earlier-batch applies; merged/*.tsv is the only place the breadcrumb remains
 - [H3] Before every batch of `task` dispatches that will use Geometra, call `geometra_list_sessions` then `geometra_disconnect({closeBrowser: true})`. Every round, no exceptions. Name this cleanup as an explicit "step 0" in your first-response plan for any multi-apply request — it is the most frequently skipped guardrail in practice, and skipping it produces cascade "Not connected" failures on the next dispatch.
@@ -19,13 +19,13 @@ AI-powered job search pipeline: scans portals, evaluates offers, generates CVs v
 - [H5] Re-dispatch the same company only AFTER the previous subagent returns. Never fire the same `task` twice while the first is still in flight.
   why: two in-flight subagents for the same URL race on Geometra sessions and on tracker TSV writes, corrupting state and sometimes double-submitting
-- [H5b] Do not use `task` to poll task status. If OpenCode returns a task/session id without a final result, record the id, stop dispatching new rounds, and tell the user the round is still in flight. When the user asks to check later, inspect authoritative files (`batch/tracker-additions/*.tsv`, `batch/tracker-additions/merged/*.tsv`, day files, `.jobforge-ledger/events.jsonl`, or `iso-trace`) rather than spawning a "check task status" subagent.
+- [H5b] Do not use `task` to poll task status. If OpenCode returns a task/session id without a final result, record the id, stop dispatching new rounds, and tell the user the round is still in flight. When the user asks to check later, inspect authoritative files (`batch/tracker-additions/*.tsv`, `batch/tracker-additions/merged/*.tsv`, day files, `.jobforge-ledger/events.jsonl`, `.jobforge-index.json`, or `iso-trace`) rather than spawning a "check task status" subagent.
   why: OpenCode status prompts can be delivered into the target subagent as a new user message; a 2026-04-25 trace caused a subagent to call `task` recursively instead of finishing the application
 - [H6] Application outcomes flow through `batch/tracker-additions/*.tsv`, not `data/pipeline.md`. After any multi-apply run, the orchestrator MUST run `npx job-forge merge` then `npx job-forge verify` before ending the session.
   why: `pipeline.md` is the URL inbox (`[ ]` pending → `[x]` processed); `data/applications/YYYY-MM-DD.md` is the outcome log; the TSV pathway is the only safe bridge because `merge` handles column order and duplicate detection
-- [H7] Load-bearing facts passed to downstream subagents must originate from a file, not from prior subagent prose. Authoritative sources: `data/pipeline.md`, `data/scan-history.tsv`, `batch/scan-output-*.md`, `reports/{num}-*.md` with `**URL:**` / `**Score:**` headers, `batch/tracker-additions/*.tsv`.
+- [H7] Load-bearing facts passed to downstream subagents must originate from a file, not from prior subagent prose. Authoritative sources: `data/pipeline.md`, `data/scan-history.tsv`, `batch/scan-output-*.md`, `reports/{num}-*.md` with `**URL:**` / `**Score:**` headers, `batch/tracker-additions/*.tsv`, cached JD content returned by `npx job-forge cache:get --url ...`, and source path/line pointers returned by `npx job-forge index:query ...`.
   why: 2026-04-18 scan subagent returned 30 fabricated Greenhouse IDs in prose (plausible-looking, non-existent); orchestrator dispatched 30 downstream subagents that all 404'd. Subagents can hallucinate IDs, scores, and confirmation text — round-trip through a file or don't trust the value
 - [H8] Never paste proxy values from `config/profile.yml` into `task` prompts, status text, or summaries. If a proxy is configured, tell the subagent exactly: "Proxy is configured; read `config/profile.yml` and pass its top-level `proxy:` object to every `geometra_connect` call." Do not transcribe `server`, `username`, `password`, or `bypass`, even if you just read them from disk.
@@ -66,12 +66,18 @@ AI-powered job search pipeline: scans portals, evaluates offers, generates CVs v
 - [D11] Treat `templates/context.json` as the source of truth for mode/reference context bundles. Use `npx job-forge context:plan <mode>` or `npx job-forge context:check <mode>` when changing or validating what a mode loads; do not paste the full context matrix into prompts.
   why: deterministic context bundles prevent reference-file drift and accidental token bloat without adding MCP/tool-schema tokens
+- [D12] Use `job-forge cache:*` for deterministic local artifact reuse when available. For URL inputs, check `npx job-forge cache:has --url "..."` / `cache:get` before browser or network JD fetches; after a successful fetch, store the exact JD text with `npx job-forge cache:put --url "..." --ttl 14d --input @file` when it is already on disk.
+  why: `iso-cache` is not an MCP and adds no prompt/tool-schema tokens; it avoids repeated JD fetch/render passes and lets future sessions reuse stable content from `.jobforge-cache/`
+- [D13] Use `job-forge index:*` for deterministic artifact lookup when available. `index:has` and `index:query` rebuild `.jobforge-index.json` from `templates/index.json` on demand, covering reports, tracker day files, tracker TSVs, pipeline URLs, scan history, and ledger events without loading those growing files into prompt context.
+  why: `iso-index` is not an MCP and adds no prompt/tool-schema tokens; it gives agents compact file/line pointers and duplicate prefilters before expensive reads or browser dispatches
 ## Procedure
 1. Check `cv.md`, `profile.yml`, and `portals.yml`; onboard if any file is missing.
 2. Pick and name the mode from **Routing** [D6]. No match → ask; do not guess.
-3. Read the active mode file [D3]; use context bundle checks when changing context loads [D11]; decide inline vs delegated work [D1].
-4. Prepare Geometra dispatches: cleanup [H3], ledger prefilter when present [D8], dedupe [H2], location filter [D5], routing [D2, D10], proxy prompt hygiene [H8].
+3. Read the active mode file [D3]. Use context bundle checks when changing context loads [D11]. Check cached artifacts before URL/JD refetches [D12]. Use artifact index lookups before broad file reads when they can answer the question [D13]. Decide inline vs delegated work [D1].
+4. Prepare Geometra dispatches: cleanup [H3], index/ledger prefilter when useful [D8, D13], dedupe [H2], location filter [D5], routing [D2, D10], proxy prompt hygiene [H8].
 5. Dispatch at most 2 tasks per round [H1]; wait for final outcomes, not just task ids [H5b].
 6. Keep multi-job form-filling out of the orchestrator [H4].
 7. Cross-check subagent facts against authoritative files [H7].
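
The dedupe order described in the updated [H2] (index prefilter, then ledger prefilter, then the mandatory four-source grep, with duplicates not counting toward the requested total) can be sketched as follows. The check functions are injected stand-ins for `index:has`, `ledger:has`, and the grep step; `shouldDispatch`, `pickCandidates`, and the `checks` object are hypothetical names, not the package's API.

```javascript
// Hypothetical sketch of the [H2] dedupe order; not code from job-forge.

// `checks` holds injected predicates:
//   indexHas(candidate)    -> true on a company+role index hit (optional)
//   ledgerHas(candidate)   -> true on an Applied ledger match (optional)
//   grepApplied(candidate) -> true if any of the four sources shows APPLIED / Applied
function shouldDispatch(candidate, checks) {
  if (checks.indexHas && checks.indexHas(candidate)) {
    return { dispatch: false, reason: "index" };
  }
  if (checks.ledgerHas && checks.ledgerHas(candidate)) {
    return { dispatch: false, reason: "ledger" };
  }
  // A prefilter miss is not a pass: the four-source grep stays mandatory.
  if (checks.grepApplied(candidate)) {
    return { dispatch: false, reason: "grep" };
  }
  return { dispatch: true, reason: "clean" };
}

// Pick up to `n` dispatchable candidates in order. Duplicates are skipped and
// replaced from the remaining list, so they never count toward `n`.
function pickCandidates(candidates, n, checks) {
  const picked = [];
  for (const candidate of candidates) {
    if (picked.length === n) break;
    if (shouldDispatch(candidate, checks).dispatch) picked.push(candidate);
  }
  return picked;
}
```

Keeping the prefilters optional mirrors the "may be used" wording in [H2]: they only ever drop candidates early, and nothing is dispatched without passing the grep.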

package/CLAUDE.md CHANGED

@@ -7,7 +7,7 @@ AI-powered job search pipeline: scans portals, evaluates offers, generates CVs v
 - [H1] Max 2 parallel `task` dispatches per message. For N jobs, run `ceil(N/2)` sequential rounds of 2. A round is not complete until both subagents return a final outcome (`APPLIED`, `APPLY FAILED`, `SKIP`, `Discarded`, or a written TSV path). A `task` tool result that only gives a session id / title is a launch acknowledgement, not completion. Applies in all modes, for all user phrasings ("urgent", "apply to 10 jobs now").
   why: each subagent requires post-cleanup and racing more than 2 reliably loses at least one result. On 2026-04-25 the orchestrator launched round 2 while round 1 had only returned task ids, leaving four application subagents in flight and losing two provider recoveries
-- [H2] Max 1 application per company+role. Before every `apply` dispatch, grep all four sources for the URL and for `company+role`: `data/pipeline.md`, all `data/applications/*.md` day files, `batch/tracker-additions/*.tsv`, `batch/tracker-additions/merged/*.tsv`. If `.jobforge-ledger/events.jsonl` exists, `npx job-forge ledger:has --company "..." --role "..." --status Applied` may be used as a fast prefilter; a match is enough to drop that duplicate before dispatch. For candidates not rejected by the ledger, the four-source grep is still mandatory. If any source shows APPLIED / Applied, skip the dispatch and pick a replacement from the remaining candidate list. Do not count duplicates toward a requested "apply to N jobs" total, and do not delegate obvious duplicates just so a subagent can return SKIP.
+- [H2] Max 1 application per company+role. Before every `apply` dispatch, grep all four sources for the URL and for `company+role`: `data/pipeline.md`, all `data/applications/*.md` day files, `batch/tracker-additions/*.tsv`, `batch/tracker-additions/merged/*.tsv`. `npx job-forge index:has --key "company-role:..."` may be used first as a fast local artifact prefilter; it rebuilds `.jobforge-index.json` on demand from `templates/index.json`, and a company+role hit is enough to drop an obvious duplicate before dispatch. If `.jobforge-ledger/events.jsonl` exists, `npx job-forge ledger:has --company "..." --role "..." --status Applied` may also be used as a fast prefilter; a match is enough to drop that duplicate before dispatch. For candidates not rejected by the index or ledger, the four-source grep is still mandatory. If any source shows APPLIED / Applied, skip the dispatch and pick a replacement from the remaining candidate list. Do not count duplicates toward a requested "apply to N jobs" total, and do not delegate obvious duplicates just so a subagent can return SKIP.
   why: 2026-04 same-day batch collision — when two batches target the same role, `npx job-forge merge` updates the existing day-file row rather than appending, so grepping day files alone misses earlier-batch applies; merged/*.tsv is the only place the breadcrumb remains
 - [H3] Before every batch of `task` dispatches that will use Geometra, call `geometra_list_sessions` then `geometra_disconnect({closeBrowser: true})`. Every round, no exceptions. Name this cleanup as an explicit "step 0" in your first-response plan for any multi-apply request — it is the most frequently skipped guardrail in practice, and skipping it produces cascade "Not connected" failures on the next dispatch.
@@ -19,13 +19,13 @@ AI-powered job search pipeline: scans portals, evaluates offers, generates CVs v
 - [H5] Re-dispatch the same company only AFTER the previous subagent returns. Never fire the same `task` twice while the first is still in flight.
   why: two in-flight subagents for the same URL race on Geometra sessions and on tracker TSV writes, corrupting state and sometimes double-submitting
-- [H5b] Do not use `task` to poll task status. If OpenCode returns a task/session id without a final result, record the id, stop dispatching new rounds, and tell the user the round is still in flight. When the user asks to check later, inspect authoritative files (`batch/tracker-additions/*.tsv`, `batch/tracker-additions/merged/*.tsv`, day files, `.jobforge-ledger/events.jsonl`, or `iso-trace`) rather than spawning a "check task status" subagent.
+- [H5b] Do not use `task` to poll task status. If OpenCode returns a task/session id without a final result, record the id, stop dispatching new rounds, and tell the user the round is still in flight. When the user asks to check later, inspect authoritative files (`batch/tracker-additions/*.tsv`, `batch/tracker-additions/merged/*.tsv`, day files, `.jobforge-ledger/events.jsonl`, `.jobforge-index.json`, or `iso-trace`) rather than spawning a "check task status" subagent.
   why: OpenCode status prompts can be delivered into the target subagent as a new user message; a 2026-04-25 trace caused a subagent to call `task` recursively instead of finishing the application
 - [H6] Application outcomes flow through `batch/tracker-additions/*.tsv`, not `data/pipeline.md`. After any multi-apply run, the orchestrator MUST run `npx job-forge merge` then `npx job-forge verify` before ending the session.
   why: `pipeline.md` is the URL inbox (`[ ]` pending → `[x]` processed); `data/applications/YYYY-MM-DD.md` is the outcome log; the TSV pathway is the only safe bridge because `merge` handles column order and duplicate detection
-- [H7] Load-bearing facts passed to downstream subagents must originate from a file, not from prior subagent prose. Authoritative sources: `data/pipeline.md`, `data/scan-history.tsv`, `batch/scan-output-*.md`, `reports/{num}-*.md` with `**URL:**` / `**Score:**` headers, `batch/tracker-additions/*.tsv`.
+- [H7] Load-bearing facts passed to downstream subagents must originate from a file, not from prior subagent prose. Authoritative sources: `data/pipeline.md`, `data/scan-history.tsv`, `batch/scan-output-*.md`, `reports/{num}-*.md` with `**URL:**` / `**Score:**` headers, `batch/tracker-additions/*.tsv`, cached JD content returned by `npx job-forge cache:get --url ...`, and source path/line pointers returned by `npx job-forge index:query ...`.
   why: 2026-04-18 scan subagent returned 30 fabricated Greenhouse IDs in prose (plausible-looking, non-existent); orchestrator dispatched 30 downstream subagents that all 404'd. Subagents can hallucinate IDs, scores, and confirmation text — round-trip through a file or don't trust the value
 - [H8] Never paste proxy values from `config/profile.yml` into `task` prompts, status text, or summaries. If a proxy is configured, tell the subagent exactly: "Proxy is configured; read `config/profile.yml` and pass its top-level `proxy:` object to every `geometra_connect` call." Do not transcribe `server`, `username`, `password`, or `bypass`, even if you just read them from disk.
@@ -66,12 +66,18 @@ AI-powered job search pipeline: scans portals, evaluates offers, generates CVs v
 - [D11] Treat `templates/context.json` as the source of truth for mode/reference context bundles. Use `npx job-forge context:plan <mode>` or `npx job-forge context:check <mode>` when changing or validating what a mode loads; do not paste the full context matrix into prompts.
   why: deterministic context bundles prevent reference-file drift and accidental token bloat without adding MCP/tool-schema tokens
+- [D12] Use `job-forge cache:*` for deterministic local artifact reuse when available. For URL inputs, check `npx job-forge cache:has --url "..."` / `cache:get` before browser or network JD fetches; after a successful fetch, store the exact JD text with `npx job-forge cache:put --url "..." --ttl 14d --input @file` when it is already on disk.
+  why: `iso-cache` is not an MCP and adds no prompt/tool-schema tokens; it avoids repeated JD fetch/render passes and lets future sessions reuse stable content from `.jobforge-cache/`
+- [D13] Use `job-forge index:*` for deterministic artifact lookup when available. `index:has` and `index:query` rebuild `.jobforge-index.json` from `templates/index.json` on demand, covering reports, tracker day files, tracker TSVs, pipeline URLs, scan history, and ledger events without loading those growing files into prompt context.
+  why: `iso-index` is not an MCP and adds no prompt/tool-schema tokens; it gives agents compact file/line pointers and duplicate prefilters before expensive reads or browser dispatches
 ## Procedure
 1. Check `cv.md`, `profile.yml`, and `portals.yml`; onboard if any file is missing.
 2. Pick and name the mode from **Routing** [D6]. No match → ask; do not guess.
-3. Read the active mode file [D3]; use context bundle checks when changing context loads [D11]; decide inline vs delegated work [D1].
-4. Prepare Geometra dispatches: cleanup [H3], ledger prefilter when present [D8], dedupe [H2], location filter [D5], routing [D2, D10], proxy prompt hygiene [H8].
+3. Read the active mode file [D3]. Use context bundle checks when changing context loads [D11]. Check cached artifacts before URL/JD refetches [D12]. Use artifact index lookups before broad file reads when they can answer the question [D13]. Decide inline vs delegated work [D1].
+4. Prepare Geometra dispatches: cleanup [H3], index/ledger prefilter when useful [D8, D13], dedupe [H2], location filter [D5], routing [D2, D10], proxy prompt hygiene [H8].
 5. Dispatch at most 2 tasks per round [H1]; wait for final outcomes, not just task ids [H5b].
 6. Keep multi-job form-filling out of the orchestrator [H4].
 7. Cross-check subagent facts against authoritative files [H7].
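
The cache-first flow in [D12] (check `cache:has` / `cache:get` before fetching, then `cache:put` with a TTL after a successful fetch) can be sketched with an in-memory stand-in. Everything here is an assumption for illustration: `parseTtl`, `createCache`, and `getJobDescription` are hypothetical names, the TTL grammar is guessed from the single `14d` example, and the real `.jobforge-cache/` storage in `iso-cache` is not shown in this diff.

```javascript
// Hypothetical in-memory sketch of the [D12] cache-first flow; not iso-cache itself.

const UNIT_MS = { s: 1000, m: 60 * 1000, h: 60 * 60 * 1000, d: 24 * 60 * 60 * 1000 };

// Assumed TTL grammar: an integer followed by s, m, h, or d (e.g. "14d").
function parseTtl(ttl) {
  const match = /^(\d+)([smhd])$/.exec(ttl);
  if (!match) throw new Error(`Unsupported TTL: ${ttl}`);
  return Number(match[1]) * UNIT_MS[match[2]];
}

// Minimal URL-keyed cache with expiry; `now` is injectable for testing.
function createCache(now = () => Date.now()) {
  const entries = new Map();
  return {
    has(url) {
      const entry = entries.get(url);
      return entry !== undefined && entry.expiresAt > now();
    },
    get(url) {
      return this.has(url) ? entries.get(url).text : undefined;
    },
    put(url, text, ttl) {
      entries.set(url, { text, expiresAt: now() + parseTtl(ttl) });
    },
  };
}

// Check the cache before fetching; store the exact JD text after a fetch.
function getJobDescription(url, cache, fetchJd) {
  if (cache.has(url)) return cache.get(url);
  const text = fetchJd(url);
  cache.put(url, text, "14d");
  return text;
}
```

Passing `fetchJd` in as a parameter keeps the sketch independent of any browser or network layer, so the only behavior it demonstrates is the ordering: lookup first, fetch on a miss, store afterwards.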

package/README.md CHANGED

@@ -31,7 +31,7 @@ The scaffolded `opencode.json` already has three MCPs wired up — they launch a
 - **Gmail** — reads replies from recruiters
 - **state-trace** — typed working memory for cross-session context (resumed batches, recent decisions, repeated portal quirks). Install once with `python3 -m pip install "state-trace[mcp]"`; the MCP command is `state-trace-mcp`.
-JobForge also keeps MCP-free local workflow state: `templates/contracts.json` defines tracker/apply artifact shapes via `@razroo/iso-contract`, `templates/capabilities.json` defines role capability boundaries via `@razroo/iso-capabilities`, `templates/context.json` defines deterministic mode/reference bundles via `@razroo/iso-context`, and `.jobforge-ledger/events.jsonl` records deterministic duplicate/status events via `@razroo/iso-ledger`. None of these add always-on prompt or tool-schema tokens.
+JobForge also keeps MCP-free local workflow state: `templates/contracts.json` defines tracker/apply artifact shapes via `@razroo/iso-contract`, `templates/capabilities.json` defines role capability boundaries via `@razroo/iso-capabilities`, `templates/context.json` defines deterministic mode/reference bundles via `@razroo/iso-context`, `.jobforge-ledger/events.jsonl` records duplicate/status events via `@razroo/iso-ledger`, `.jobforge-cache/` stores reusable JD/artifact content via `@razroo/iso-cache`, and `.jobforge-index.json` indexes artifact source pointers via `@razroo/iso-index`. None of these add always-on prompt or tool-schema tokens.
 `npm install` also materializes symlinks for every supported agent harness — OpenCode, Cursor, Claude Code, and Codex — so you can run `opencode`, `cursor`, `claude`, or `codex` in the same project and each picks up the shared MCP config and instructions.
@@ -78,7 +78,7 @@ JobForge turns opencode into a full job search command center. Instead of manual
 | **Durable Batch Orchestration** | `batch-runner.sh` uses `@razroo/iso-orchestrator` for resumable bundle execution, bounded fan-out, mutexed state writes, and workflow records in `.jobforge-runs/`. |
 | **Pipeline Integrity** | Automated merge, dedup, status normalization, health checks |
 | **Cost-Aware Agent Routing** | Three subagents (`@general-free`, `@general-paid`, `@glm-minimal`) with per-task tool surfaces. On OpenCode, JobForge pins all tiers to `opencode-go/deepseek-v4-flash` so application runs avoid overloaded free-model pools. See [Subagent Routing in AGENTS.md](AGENTS.md) for the task-to-agent mapping. |
-| **Trace + Telemetry + Guard + Contract + Ledger + Capabilities + Context** | `job-forge trace:*` exposes local OpenCode transcripts, `job-forge telemetry:*` summarizes runs, `job-forge guard:*` audits deterministic policy rules, `templates/contracts.json` enforces artifact shape with `iso-contract`, `job-forge ledger:*` queries append-only workflow state, `job-forge capabilities:*` checks role boundaries, and `job-forge context:*` plans mode/reference context bundles without MCP/tool-schema overhead. |
+| **Trace + Telemetry + Guard + Contract + Ledger + Capabilities + Context + Cache + Index** | `job-forge trace:*` exposes local OpenCode transcripts, `job-forge telemetry:*` summarizes runs, `job-forge guard:*` audits deterministic policy rules, `templates/contracts.json` enforces artifact shape with `iso-contract`, `job-forge ledger:*` queries append-only workflow state, `job-forge capabilities:*` checks role boundaries, `job-forge context:*` plans mode/reference context bundles, `job-forge cache:*` reuses fetched JD/artifact content, and `job-forge index:*` queries compact source pointers without MCP/tool-schema overhead. |
 | **Token Cost Visibility** | `job-forge tokens --days 1` for per-session breakdown; `job-forge session-report --since-minutes 60 --log` to flag sessions over budget and append history to `data/token-usage.tsv`. Auto-logged after every batch run. |
 ## Usage
@@ -146,6 +146,8 @@ my-search/
 ├── config/profile.yml            # your identity, target roles (personal)
 ├── data/                         # applications, pipeline, scan history (personal, gitignored)
 ├── .jobforge-ledger/              # append-only local workflow events (personal, gitignored)
+├── .jobforge-cache/               # content-addressed local JD/artifact cache (personal, gitignored)
+├── .jobforge-index.json           # deterministic artifact lookup index (generated, gitignored)
 ├── reports/                      # generated evaluation reports (personal, gitignored)
 ├── batch/{batch-input,batch-state}.tsv, tracker-additions/, logs/   # personal
 ├── .jobforge-runs/                # durable batch workflow records (generated)
@@ -162,7 +164,7 @@ my-search/
 ├── .opencode/skills/job-forge.md # → skill router
 ├── .opencode/agents/             # → @general-free, @general-paid, @glm-minimal
 ├── modes/                        # → _shared.md + skill modes
-├── templates/                    # → states.yml, portals.example.yml, cv-template.html, capabilities.json, context.json
+├── templates/                    # → states.yml, portals.example.yml, cv-template.html, capabilities.json, context.json, index.json
 ├── batch/batch-prompt.md         # → batch worker prompt
 ├── batch/batch-runner.sh         # → parallel orchestrator
 │
@@ -197,6 +199,8 @@ JobForge/
 │   ├── ledger.mjs                # iso-ledger-backed workflow-state CLI
 │   ├── capabilities.mjs          # iso-capabilities-backed role policy CLI
 │   ├── context.mjs               # iso-context-backed context bundle CLI
+│   ├── cache.mjs                 # iso-cache-backed local artifact cache CLI
+│   ├── index.mjs                 # iso-index-backed artifact lookup CLI
 │   ├── token-usage-report.mjs    # opencode cost analyzer
 │   └── release/check-source.mjs  # version gate for npm publish
 ├── tracker-lib.mjs / merge-tracker.mjs / dedup-tracker.mjs / verify-pipeline.mjs

package/bin/create-job-forge.mjs CHANGED

@@ -124,6 +124,11 @@ const consumerPkg = {
     'ledger:verify': 'job-forge ledger:verify',
     'ledger:has': 'job-forge ledger:has',
     'ledger:query': 'job-forge ledger:query',
+    'index:build': 'job-forge index:build',
+    'index:status': 'job-forge index:status',
+    'index:verify': 'job-forge index:verify',
+    'index:has': 'job-forge index:has',
+    'index:query': 'job-forge index:query',
     // One command to pull the latest harness and any locally-pinned MCP
     // packages. npm update is a no-op on packages not in package.json, so
     // listing @razroo/gmail-mcp + @geometra/mcp is safe for consumers that
@@ -224,6 +229,7 @@ Before doing any work, remember where things live in *this* project:
 | Inbox of pending URLs | \`data/pipeline.md\` | The queue for \`/job-forge pipeline\` |
 | Scanner dedup history | \`data/scan-history.tsv\` | Only touch in \`/job-forge scan\` |
 | Local workflow ledger | \`.jobforge-ledger/events.jsonl\` | Deterministic append-only state; use \`job-forge ledger:*\` |
+| Local artifact index | \`.jobforge-index.json\` | Deterministic file/line lookup; use \`job-forge index:*\` |
 | Scanner config | \`portals.yml\` (project root) | Company configs |
 | Profile / identity | \`config/profile.yml\` | Candidate name, email, target roles |
 | CV | \`cv.md\` (project root) | Markdown, source of truth |
@@ -369,6 +375,7 @@ job-forge sync             # re-run if symlinks drift
 job-forge merge            # merge batch/tracker-additions/*.tsv into the tracker
 job-forge verify           # verify pipeline integrity
 job-forge ledger:status    # local deterministic workflow ledger status
+job-forge index:status     # local artifact index status
 job-forge pdf cv.md out.pdf
 job-forge tokens --days 1  # per-session opencode token usage
 \`\`\`

package/bin/job-forge.mjs CHANGED

@@ -23,6 +23,8 @@
  *   ledger:*       Query local deterministic workflow state via iso-ledger
  *   capabilities:* Query role capability policy via iso-capabilities
  *   context:*      Query/render deterministic context bundles via iso-context
+ *   cache:*        Reuse local deterministic artifacts via iso-cache
+ *   index:*        Query local artifacts via iso-index
  *   sync           Re-run the harness symlink sync (bin/sync.mjs)
  *   help, --help   Show this message
  */
@@ -103,6 +105,28 @@ const contextAliases = {
   'context:path': 'path',
 };
+const cacheAliases = {
+  'cache:key': 'key',
+  'cache:status': 'status',
+  'cache:has': 'has',
+  'cache:get': 'get',
+  'cache:put': 'put',
+  'cache:list': 'list',
+  'cache:verify': 'verify',
+  'cache:prune': 'prune',
+  'cache:path': 'path',
+};
+const indexAliases = {
+  'index:build': 'build',
+  'index:status': 'status',
+  'index:query': 'query',
+  'index:has': 'has',
+  'index:verify': 'verify',
+  'index:explain': 'explain',
+  'index:path': 'path',
+};
 const [, , cmd, ...rest] = process.argv;
 function printHelp() {
@@ -142,6 +166,17 @@ Commands:
   context:plan            Estimate files/tokens for one context bundle
   context:check           Fail if a context bundle exceeds its budget
   context:render          Render context bundle content as markdown/json
+  cache:status            Show local artifact cache status
+  cache:key               Print deterministic cache key for a job URL
+  cache:has               Check whether a job URL or cache key is cached
+  cache:get               Read cached JD/artifact content
+  cache:put               Store JD/artifact content
+  cache:verify            Validate local artifact cache integrity
+  index:status            Show local artifact index status
+  index:build             Rebuild .jobforge-index.json from templates/index.json
+  index:has               Check indexed URL/company-role/report facts without loading source files
+  index:query             Query indexed reports, tracker rows, TSVs, scan history, pipeline, and ledger
+  index:verify            Validate local artifact index integrity
   sync           Re-create harness symlinks in the current project
 Deterministic helpers (prefer these over LLM-derived values):
@@ -175,6 +210,11 @@ Pass --help after a command to see its own flags, e.g.:
   job-forge capabilities:check general-free --tool browser --mcp geometra --command "npx job-forge merge" --filesystem write
   job-forge context:plan apply
   job-forge context:check apply --budget 23000
+  job-forge cache:has --url https://example.test/jobs/123
+  job-forge cache:get --url https://example.test/jobs/123
+  job-forge cache:put --url https://example.test/jobs/123 --input @jds/example.md
+  job-forge index:has --key "company-role:acme:staff-engineer"
+  job-forge index:query "acme"
 Project directory resolves to $JOB_FORGE_PROJECT or cwd.`);
 }
@@ -274,6 +314,36 @@ if (cmd === 'context' || contextAliases[cmd]) {
   process.exit(result.status ?? 1);
 }
+if (cmd === 'cache' || cacheAliases[cmd]) {
+  const cacheArgs = cmd === 'cache'
+    ? (rest.length === 0 ? ['help'] : rest)
+    : [cacheAliases[cmd], ...rest];
+  const scriptPath = join(PKG_ROOT, 'scripts/cache.mjs');
+  const result = spawnSync(process.execPath, [scriptPath, ...cacheArgs], {
+    stdio: 'inherit',
+    cwd: PROJECT_DIR,
+    env: process.env,
+  });
+  process.exit(result.status ?? 1);
+}
+if (cmd === 'index' || indexAliases[cmd]) {
+  const indexArgs = cmd === 'index'
+    ? (rest.length === 0 ? ['help'] : rest)
+    : [indexAliases[cmd], ...rest];
+  const scriptPath = join(PKG_ROOT, 'scripts/index.mjs');
+  const result = spawnSync(process.execPath, [scriptPath, ...indexArgs], {
+    stdio: 'inherit',
+    cwd: PROJECT_DIR,
+    env: process.env,
+  });
+  process.exit(result.status ?? 1);
+}
 const rel = commands[cmd];
 if (!rel) {
   console.error(`Unknown command: ${cmd}\n`);
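
Reviewer sketch (not part of the package): the `cache` and `index` branches added above share one argument-resolution rule. The bare namespace with no arguments falls back to `help`, and a `namespace:verb` alias is rewritten to its verb before being passed to the script. The function below isolates that rule so it can be read on its own. The name `resolveNamespaceArgs` is hypothetical, and unlike the diff's `aliases[cmd]` lookup it uses an own-property check.

```javascript
// Sketch of the alias resolution used by the cache/index branches in the diff.
// Returns the argv to pass to the script, or null when cmd is not in this namespace.
function resolveNamespaceArgs(namespace, aliases, cmd, rest) {
  if (cmd === namespace) {
    // `job-forge cache` with no further args shows help; otherwise pass args through.
    return rest.length === 0 ? ['help'] : rest;
  }
  // Own-property check (an assumption of this sketch; the diff tests `aliases[cmd]`).
  if (Object.prototype.hasOwnProperty.call(aliases, cmd)) {
    // `job-forge cache:has --url X` becomes ['has', '--url', 'X'].
    return [aliases[cmd], ...rest];
  }
  return null;
}
```

For example, `resolveNamespaceArgs('index', { 'index:query': 'query' }, 'index:query', ['acme'])` gives `['query', 'acme']`, matching what the diff passes to `scripts/index.mjs`.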

package/docs/ARCHITECTURE.md CHANGED

@@ -161,6 +161,7 @@ config/profile.yml       →  Candidate identity
 portals.yml              →  Scanner configuration
 data/pipeline.md        →  Pending URLs and `local:jds/...` inbox (see modes/pipeline.md)
 .jobforge-ledger/events.jsonl → Append-only workflow events for cheap local duplicate/status checks
+.jobforge-index.json     →  Deterministic artifact lookup index built from templates/index.json
 jds/*.md                 →  Saved job descriptions referenced from the pipeline (`local:jds/{file}`)
 templates/states.yml     →  Canonical status values
 templates/context.json    →  Deterministic mode/reference context bundle policy
@@ -176,12 +177,13 @@ Create `data/pipeline.md` when you start using the URL inbox (`/job-forge pipeli
 - PDFs: `cv-candidate-{company-slug}-{YYYY-MM-DD}.pdf`
 - Tracker TSVs: `batch/tracker-additions/{num}-{company-slug}.tsv` (one file per evaluation; merged files move under `batch/tracker-additions/merged/`; shape enforced by `templates/contracts.json`)
 - Ledger: `.jobforge-ledger/events.jsonl` (created by `job-forge ledger:rebuild`, `tracker-line --write`, or `merge`; gitignored personal state)
+- Index: `.jobforge-index.json` (created on demand by `job-forge index:*`; gitignored local lookup state)
 - Capabilities: `templates/capabilities.json` (role boundary policy inspected with `job-forge capabilities:*`)
 - Context: `templates/context.json` (mode/reference file bundles inspected with `job-forge context:*`)
 ## Pipeline Integrity
-From the project root, `npx job-forge verify` (or `npm run verify`) runs `verify-pipeline.mjs`. When a tracker file exists, it validates canonical statuses (using `templates/states.yml` when that file is present and parseable), validates every tracker row against `templates/contracts.json`, warns on probable duplicate company/role rows, checks that report column markdown links resolve to files in the repo, validates score column format (`X.X/5`, `N/A`, or `DUP`), rejects table rows with too few columns, flags markdown bold inside the score column, and warns if any `batch/tracker-additions/*.tsv` files are still waiting to be merged. If `.jobforge-ledger/events.jsonl` exists, verify also validates the append-only ledger. It also compares state ids from `templates/states.yml` to an internal fallback list and warns when the two sets drift. **Fresh clone:** the command exits successfully when neither `data/applications.md` nor root `applications.md` exists yet; pending-TSV and states-drift checks still run so contributors see unmerged batch output early. Optional setup validation after you add `cv.md` and `config/profile.yml`: `npm run sync-check` (`cv-sync-check.mjs`).
+From the project root, `npx job-forge verify` (or `npm run verify`) runs `verify-pipeline.mjs`. When a tracker file exists, it validates canonical statuses (using `templates/states.yml` when that file is present and parseable), validates every tracker row against `templates/contracts.json`, warns on probable duplicate company/role rows, checks that report column markdown links resolve to files in the repo, validates score column format (`X.X/5`, `N/A`, or `DUP`), rejects table rows with too few columns, flags markdown bold inside the score column, and warns if any `batch/tracker-additions/*.tsv` files are still waiting to be merged. If `.jobforge-ledger/events.jsonl` exists, verify also validates the append-only ledger. If `.jobforge-index.json` exists, verify validates the artifact index. It also compares state ids from `templates/states.yml` to an internal fallback list and warns when the two sets drift. **Fresh clone:** the command exits successfully when neither `data/applications.md` nor root `applications.md` exists yet; pending-TSV and states-drift checks still run so contributors see unmerged batch output early. Optional setup validation after you add `cv.md` and `config/profile.yml`: `npm run sync-check` (`cv-sync-check.mjs`).
 **`verify-pipeline.mjs` checks (same order as the script header):**
@@ -195,8 +197,9 @@ From the project root, `npx job-forge verify` (or `npm run verify`) runs `verify
 8. Score column has no markdown bold.
 9. Warn when state ids in `templates/states.yml` drift from the script’s built-in fallback list (or when the file exists but ids failed to parse).
 10. Validate `.jobforge-ledger/events.jsonl` when present.
+11. Validate `.jobforge-index.json` when present.
-When the tracker file is missing, checks 1-6 and 8 are skipped; checks 7, 9, and 10 still run when applicable.
+When the tracker file is missing, checks 1-6 and 8 are skipped; checks 7, 9, 10, and 11 still run when applicable.
 ## Contributing touchpoints
@@ -219,6 +222,7 @@ Scripts maintain data consistency. In a consumer project they're invoked via the
 | `scripts/telemetry.mjs` | `npx job-forge telemetry:status` / `telemetry:show` | JobForge operational telemetry derived from OpenCode traces plus tracker TSV state |
 | `scripts/guard.mjs` | `npx job-forge guard:audit` / `guard:explain` | Deterministic `@razroo/iso-guard` policy audits over local OpenCode traces |
 | `scripts/ledger.mjs` | `npx job-forge ledger:status` / `ledger:has` / `ledger:rebuild` | Deterministic `@razroo/iso-ledger` state over tracker, TSV, and pipeline files |
+| `scripts/index.mjs` | `npx job-forge index:status` / `index:has` / `index:query` | Deterministic `@razroo/iso-index` lookup over reports, tracker rows, TSVs, pipeline, scan history, and ledger events |
 | `scripts/context.mjs` | `npx job-forge context:list` / `context:plan` / `context:check` / `context:render` | Deterministic `@razroo/iso-context` mode/reference context bundle planning and rendering |
 | `tracker-lib.mjs` | _(library)_ | Shared helpers for reading/writing day-based tracker files — imported by merge/dedup/verify/normalize |
 | `bin/sync.mjs` | `npx job-forge sync` | Creates the harness symlinks in a consumer project (also runs as `postinstall`) |
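
Reviewer sketch (not part of the package): the ARCHITECTURE.md hunk above states that checks 1-6 and 8 are skipped when the tracker file is missing, while checks 7, 9, 10, and 11 still run when applicable. The selection logic can be pictured as below. This is an assumption-laden illustration, not the code in `verify-pipeline.mjs`: the function name `plannedChecks` and the three boolean flags are hypothetical, and treating checks 7 and 9 as always attempted is an assumption about what "when applicable" means.

```javascript
// Hypothetical selection of verify checks by number, following the prose in the diff.
function plannedChecks({ trackerExists, ledgerExists, indexExists }) {
  const checks = [7, 9]; // assumed to be attempted even on a fresh clone
  if (trackerExists) {
    checks.push(1, 2, 3, 4, 5, 6, 8); // tracker-dependent checks
  }
  if (ledgerExists) {
    checks.push(10); // validate .jobforge-ledger/events.jsonl when present
  }
  if (indexExists) {
    checks.push(11); // validate .jobforge-index.json when present
  }
  return checks.sort((a, b) => a - b);
}
```

Under these assumptions a fresh clone with no tracker, ledger, or index runs only checks 7 and 9.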

package/docs/CUSTOMIZATION.md CHANGED

@@ -150,6 +150,10 @@ Role capability boundaries live in `templates/capabilities.json` and are enforce
 Mode/reference context bundles live in `templates/context.json` and are planned locally by `@razroo/iso-context`. Use `job-forge context:plan <mode>` to see the files and estimated tokens, `job-forge context:check <mode>` to fail on budget drift, and `job-forge context:render <mode>` when you intentionally need a compact markdown or JSON context bundle. This is not an MCP and does not add tool-schema tokens; rendered context only consumes prompt tokens when a workflow deliberately asks for it.
+## JobForge artifact index
+Artifact lookup policy lives in `templates/index.json` and is built locally by `@razroo/iso-index`. Use `job-forge index:has --key "company-role:acme:staff-engineer"` as a cheap duplicate/source prefilter, `job-forge index:query "acme"` to get compact source path/line pointers, and `job-forge index:verify` to validate `.jobforge-index.json`. Query, has, and verify rebuild the index on demand, so scaffolded projects need no setup. This is not an MCP and does not add tool-schema tokens.
 ## JobForge guard audits
 Guard audits run deterministic `@razroo/iso-guard` policies over the same local OpenCode traces. The default policy lives at `templates/guards/jobforge-baseline.yaml` and checks rules that are reliable from transcript data, including max two task dispatches per assistant message, no task-status polling via `task`, no raw proxy configuration in task prompts, and no child session task recursion.

package/docs/SETUP.md CHANGED

@@ -128,6 +128,8 @@ From your project root, these commands maintain the tracker and pipeline checks.
 | Inspect tracker row contract | `npx iso-contract explain jobforge.tracker-row --contracts templates/contracts.json` | _(none)_ |
 | Inspect role capabilities | `npx job-forge capabilities:explain general-free` | `npm run capabilities:explain -- general-free` |
 | Inspect context bundle budget | `npx job-forge context:plan apply` | `npm run context:plan -- apply` |
+| Inspect local JD/artifact cache | `npx job-forge cache:status` | `npm run cache:status` |
+| Inspect local artifact index | `npx job-forge index:status` | `npm run index:status` |
 | Map status column to canonical labels | `npx job-forge normalize` | `npm run normalize` |
 | Merge duplicate company/role rows | `npx job-forge dedup` | `npm run dedup` |
 | Generate ATS-optimized CV PDF | `npx job-forge pdf` | `npm run pdf` |
@@ -144,6 +146,8 @@ From your project root, these commands maintain the tracker and pipeline checks.
 | Show local workflow ledger status | `npx job-forge ledger:status` | `npm run ledger:status` |
 | Rebuild local workflow ledger from tracker/pipeline files | `npx job-forge ledger:rebuild` | `npm run ledger:rebuild` |
 | Check duplicate/status event without loading tracker files | `npx job-forge ledger:has --company "Acme" --role "Staff Engineer" --status Applied` | `npm run ledger:has -- --company ...` |
+| Check/reuse cached JD content | `npx job-forge cache:has --url <url>` / `npx job-forge cache:get --url <url>` | `npm run cache:has -- --url ...` |
+| Query local artifact pointers | `npx job-forge index:query "Acme"` / `npx job-forge index:has --key company-role:acme:staff-engineer` | `npm run index:query -- Acme` |
 | Re-create harness symlinks | `npx job-forge sync` | `npm run sync` |
 | Build optional dashboard TUI (Go on `PATH`) | `(cd node_modules/job-forge/dashboard && go build .)` | `npm run build:dashboard` (harness repo only) |
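
Reviewer sketch (not part of the package): the `cache:has` / `cache:get` / `cache:put` rows above describe a check-before-fetch pattern with a time-to-live such as `--ttl 14d`. The sketch below shows that pattern with an in-memory `Map` standing in for `.jobforge-cache/`. It does not call `@razroo/iso-cache`. The helper names, the `m`/`h`/`d` TTL grammar (only `14d` appears in the diff), and the synchronous `fetchJd` callback are all assumptions made for illustration.

```javascript
// Assumed TTL grammar: <integer><m|h|d>. Only "14d" appears in the diff.
const UNIT_MS = { m: 60 * 1000, h: 60 * 60 * 1000, d: 24 * 60 * 60 * 1000 };

function parseTtl(ttl) {
  const match = /^(\d+)([mhd])$/.exec(ttl);
  if (!match) throw new Error(`Unsupported TTL: ${ttl}`);
  return Number(match[1]) * UNIT_MS[match[2]];
}

// Check the cache first; fetch and store only on a miss or an expired entry.
// `store` is a Map keyed by URL; `fetchJd` is a caller-supplied function.
function getOrFetch(url, store, fetchJd, { ttl = '14d', now = Date.now() } = {}) {
  const entry = store.get(url);
  if (entry && entry.expiresAt > now) {
    return { text: entry.text, source: 'cache' };
  }
  const text = fetchJd(url);
  store.set(url, { text, expiresAt: now + parseTtl(ttl) });
  return { text, source: 'fetch' };
}
```

Passing `now` explicitly keeps the expiry behaviour easy to exercise without waiting for real time to pass.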

package/iso/commands/job-forge.md CHANGED

@@ -76,6 +76,11 @@ Local workflow ledger (terminal, outside opencode):
   npx job-forge ledger:status          # .jobforge-ledger/events.jsonl summary
   npx job-forge ledger:has --company "Acme" --role "Staff Engineer" --status Applied
+Local artifact index (terminal, outside opencode):
+  npx job-forge index:status           # .jobforge-index.json summary
+  npx job-forge index:has --key "company-role:acme:staff-engineer"
+  npx job-forge index:query "acme"
 Artifact contracts (terminal, outside opencode):
   npx iso-contract explain jobforge.tracker-row --contracts templates/contracts.json
   npx job-forge tracker-line ... --write   # renders + validates tracker TSV locally
@@ -161,6 +166,11 @@ Step 1  — Enumerate candidates
   - Build ordered list: candidates = [job_1, job_2, ..., job_N]
 Step 2  — Dedup against already-applied
+  - Run npx job-forge index:has --key "company-role:<company-slug>:<role-slug>"
+    as a fast local artifact prefilter when company+role is known. It rebuilds
+    .jobforge-index.json on demand from templates/index.json. A hit means the
+    role has already appeared in tracker files or tracker TSVs and can be
+    dropped before dispatch.
   - If .jobforge-ledger/events.jsonl exists, use npx job-forge ledger:has as a
     fast prefilter for obvious company+role Applied duplicates. A ledger match
     can be dropped before dispatch without loading tracker files into context.
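
Reviewer sketch (not part of the package): Step 2 above looks up a key of the form `company-role:<company-slug>:<role-slug>`. The diff shows one concrete instance, `company-role:acme:staff-engineer`, but not the slug rule itself. The helper below builds keys under an assumed rule (strip accents, lowercase, collapse non-alphanumerics to single hyphens). The names `slugify` and `companyRoleKey` are hypothetical, and the actual normalization in `@razroo/iso-index` may differ.

```javascript
// Assumed slug rule; it reproduces the one example shown in the diff.
function slugify(text) {
  return String(text)
    .normalize('NFKD')
    .replace(/[\u0300-\u036f]/g, '') // drop combining accents
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, '-') // collapse runs of other characters
    .replace(/^-+|-+$/g, ''); // trim leading/trailing hyphens
}

// Build the lookup key passed to `job-forge index:has --key ...`.
function companyRoleKey(company, role) {
  const companySlug = slugify(company);
  const roleSlug = slugify(role);
  if (!companySlug || !roleSlug) {
    throw new Error('company and role must both contain letters or digits');
  }
  return `company-role:${companySlug}:${roleSlug}`;
}
```

Rejecting empty slugs avoids producing a key like `company-role::` that could match nothing or, depending on the index, too much.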

package/iso/instructions.md CHANGED

@@ -7,7 +7,7 @@ AI-powered job search pipeline: scans portals, evaluates offers, generates CVs v
 - [H1] Max 2 parallel `task` dispatches per message. For N jobs, run `ceil(N/2)` sequential rounds of 2. A round is not complete until both subagents return a final outcome (`APPLIED`, `APPLY FAILED`, `SKIP`, `Discarded`, or a written TSV path). A `task` tool result that only gives a session id / title is a launch acknowledgement, not completion. Applies in all modes, for all user phrasings ("urgent", "apply to 10 jobs now").
   why: each subagent requires post-cleanup and racing more than 2 reliably loses at least one result. On 2026-04-25 the orchestrator launched round 2 while round 1 had only returned task ids, leaving four application subagents in flight and losing two provider recoveries
-- [H2] Max 1 application per company+role. Before every `apply` dispatch, grep all four sources for the URL and for `company+role`: `data/pipeline.md`, all `data/applications/*.md` day files, `batch/tracker-additions/*.tsv`, `batch/tracker-additions/merged/*.tsv`. If `.jobforge-ledger/events.jsonl` exists, `npx job-forge ledger:has --company "..." --role "..." --status Applied` may be used as a fast prefilter; a match is enough to drop that duplicate before dispatch. For candidates not rejected by the ledger, the four-source grep is still mandatory. If any source shows APPLIED / Applied, skip the dispatch and pick a replacement from the remaining candidate list. Do not count duplicates toward a requested "apply to N jobs" total, and do not delegate obvious duplicates just so a subagent can return SKIP.
+- [H2] Max 1 application per company+role. Before every `apply` dispatch, grep all four sources for the URL and for `company+role`: `data/pipeline.md`, all `data/applications/*.md` day files, `batch/tracker-additions/*.tsv`, `batch/tracker-additions/merged/*.tsv`. `npx job-forge index:has --key "company-role:..."` may be used first as a fast local artifact prefilter; it rebuilds `.jobforge-index.json` on demand from `templates/index.json`, and a company+role hit is enough to drop an obvious duplicate before dispatch. If `.jobforge-ledger/events.jsonl` exists, `npx job-forge ledger:has --company "..." --role "..." --status Applied` may also be used as a fast prefilter; a match is enough to drop that duplicate before dispatch. For candidates not rejected by the index or ledger, the four-source grep is still mandatory. If any source shows APPLIED / Applied, skip the dispatch and pick a replacement from the remaining candidate list. Do not count duplicates toward a requested "apply to N jobs" total, and do not delegate obvious duplicates just so a subagent can return SKIP.
   why: 2026-04 same-day batch collision — when two batches target the same role, `npx job-forge merge` updates the existing day-file row rather than appending, so grepping day files alone misses earlier-batch applies; merged/*.tsv is the only place the breadcrumb remains
 - [H3] Before every batch of `task` dispatches that will use Geometra, call `geometra_list_sessions` then `geometra_disconnect({closeBrowser: true})`. Every round, no exceptions. Name this cleanup as an explicit "step 0" in your first-response plan for any multi-apply request — it is the most frequently skipped guardrail in practice, and skipping it produces cascade "Not connected" failures on the next dispatch.
@@ -19,13 +19,13 @@ AI-powered job search pipeline: scans portals, evaluates offers, generates CVs v
 - [H5] Re-dispatch the same company only AFTER the previous subagent returns. Never fire the same `task` twice while the first is still in flight.
   why: two in-flight subagents for the same URL race on Geometra sessions and on tracker TSV writes, corrupting state and sometimes double-submitting
-- [H5b] Do not use `task` to poll task status. If OpenCode returns a task/session id without a final result, record the id, stop dispatching new rounds, and tell the user the round is still in flight. When the user asks to check later, inspect authoritative files (`batch/tracker-additions/*.tsv`, `batch/tracker-additions/merged/*.tsv`, day files, `.jobforge-ledger/events.jsonl`, or `iso-trace`) rather than spawning a "check task status" subagent.
+- [H5b] Do not use `task` to poll task status. If OpenCode returns a task/session id without a final result, record the id, stop dispatching new rounds, and tell the user the round is still in flight. When the user asks to check later, inspect authoritative files (`batch/tracker-additions/*.tsv`, `batch/tracker-additions/merged/*.tsv`, day files, `.jobforge-ledger/events.jsonl`, `.jobforge-index.json`, or `iso-trace`) rather than spawning a "check task status" subagent.
   why: OpenCode status prompts can be delivered into the target subagent as a new user message; a 2026-04-25 trace caused a subagent to call `task` recursively instead of finishing the application
 - [H6] Application outcomes flow through `batch/tracker-additions/*.tsv`, not `data/pipeline.md`. After any multi-apply run, the orchestrator MUST run `npx job-forge merge` then `npx job-forge verify` before ending the session.
   why: `pipeline.md` is the URL inbox (`[ ]` pending → `[x]` processed); `data/applications/YYYY-MM-DD.md` is the outcome log; the TSV pathway is the only safe bridge because `merge` handles column order and duplicate detection
-- [H7] Load-bearing facts passed to downstream subagents must originate from a file, not from prior subagent prose. Authoritative sources: `data/pipeline.md`, `data/scan-history.tsv`, `batch/scan-output-*.md`, `reports/{num}-*.md` with `**URL:**` / `**Score:**` headers, `batch/tracker-additions/*.tsv`.
+- [H7] Load-bearing facts passed to downstream subagents must originate from a file, not from prior subagent prose. Authoritative sources: `data/pipeline.md`, `data/scan-history.tsv`, `batch/scan-output-*.md`, `reports/{num}-*.md` with `**URL:**` / `**Score:**` headers, `batch/tracker-additions/*.tsv`, cached JD content returned by `npx job-forge cache:get --url ...`, and source path/line pointers returned by `npx job-forge index:query ...`.
   why: 2026-04-18 scan subagent returned 30 fabricated Greenhouse IDs in prose (plausible-looking, non-existent); orchestrator dispatched 30 downstream subagents that all 404'd. Subagents can hallucinate IDs, scores, and confirmation text — round-trip through a file or don't trust the value
 - [H8] Never paste proxy values from `config/profile.yml` into `task` prompts, status text, or summaries. If a proxy is configured, tell the subagent exactly: "Proxy is configured; read `config/profile.yml` and pass its top-level `proxy:` object to every `geometra_connect` call." Do not transcribe `server`, `username`, `password`, or `bypass`, even if you just read them from disk.
@@ -66,12 +66,18 @@ AI-powered job search pipeline: scans portals, evaluates offers, generates CVs v
 - [D11] Treat `templates/context.json` as the source of truth for mode/reference context bundles. Use `npx job-forge context:plan <mode>` or `npx job-forge context:check <mode>` when changing or validating what a mode loads; do not paste the full context matrix into prompts.
   why: deterministic context bundles prevent reference-file drift and accidental token bloat without adding MCP/tool-schema tokens
+- [D12] Use `job-forge cache:*` for deterministic local artifact reuse when available. For URL inputs, check `npx job-forge cache:has --url "..."` / `cache:get` before browser or network JD fetches; after a successful fetch, store the exact JD text with `npx job-forge cache:put --url "..." --ttl 14d --input @file` when it is already on disk.
+  why: `iso-cache` is not an MCP and adds no prompt/tool-schema tokens; it avoids repeated JD fetch/render passes and lets future sessions reuse stable content from `.jobforge-cache/`
+- [D13] Use `job-forge index:*` for deterministic artifact lookup when available. `index:has` and `index:query` rebuild `.jobforge-index.json` from `templates/index.json` on demand, covering reports, tracker day files, tracker TSVs, pipeline URLs, scan history, and ledger events without loading those growing files into prompt context.
+  why: `iso-index` is not an MCP and adds no prompt/tool-schema tokens; it gives agents compact file/line pointers and duplicate prefilters before expensive reads or browser dispatches
 ## Procedure
 1. Check `cv.md`, `profile.yml`, and `portals.yml`; onboard if any file is missing.
 2. Pick and name the mode from **Routing** [D6]. No match → ask; do not guess.
-3. Read the active mode file [D3]; use context bundle checks when changing context loads [D11]; decide inline vs delegated work [D1].
-4. Prepare Geometra dispatches: cleanup [H3], ledger prefilter when present [D8], dedupe [H2], location filter [D5], routing [D2, D10], proxy prompt hygiene [H8].
+3. Read the active mode file [D3]. Use context bundle checks when changing context loads [D11]. Check cached artifacts before URL/JD refetches [D12]. Use artifact index lookups before broad file reads when they can answer the question [D13]. Decide inline vs delegated work [D1].
+4. Prepare Geometra dispatches: cleanup [H3], index/ledger prefilter when useful [D8, D13], dedupe [H2], location filter [D5], routing [D2, D10], proxy prompt hygiene [H8].
 5. Dispatch at most 2 tasks per round [H1]; wait for final outcomes, not just task ids [H5b].
 6. Keep multi-job form-filling out of the orchestrator [H4].
 7. Cross-check subagent facts against authoritative files [H7].
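
Reviewer sketch (not part of the package): rule [H2] in this file layers two optional prefilters (index, then ledger) in front of a mandatory four-source grep, and says duplicates must not count toward a requested "apply to N jobs" total. The decision order can be illustrated as below. The function names and the injected `checks` callbacks are hypothetical stand-ins for `index:has`, `ledger:has`, and the grep over the four sources. Nothing here invokes the real CLI.

```javascript
// checks.indexHas / checks.ledgerHas are optional prefilters returning booleans.
// checks.grepSources must return [{ source, applied }] for the four sources in [H2].
function decideDispatch(candidate, checks) {
  if (checks.indexHas && checks.indexHas(candidate)) {
    return { action: 'drop', reason: 'index' };
  }
  if (checks.ledgerHas && checks.ledgerHas(candidate)) {
    return { action: 'drop', reason: 'ledger' };
  }
  // Not rejected by a prefilter: the four-source grep is still mandatory.
  const hit = checks.grepSources(candidate).find((s) => s.applied);
  if (hit) {
    return { action: 'drop', reason: hit.source };
  }
  return { action: 'dispatch', reason: 'no duplicate found' };
}

// Pick up to n non-duplicate candidates; dropped duplicates are replaced
// from the remaining list instead of counting toward n.
function selectApplyTargets(candidates, n, checks) {
  const targets = [];
  for (const candidate of candidates) {
    if (targets.length >= n) break;
    if (decideDispatch(candidate, checks).action === 'dispatch') {
      targets.push(candidate);
    }
  }
  return targets;
}
```

A prefilter miss never authorizes a dispatch on its own in this sketch; only a clean grep result does, which mirrors the "still mandatory" wording of [H2].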