npm - job-forge - Versions diffs - 2.14.26 → 2.14.28 - Mend

job-forge 2.14.26 → 2.14.28


Files changed (26)

package/.cursor/rules/main.mdc +9 -3
package/.opencode/skills/job-forge.md +28 -6
package/AGENTS.md +9 -3
package/CLAUDE.md +9 -3
package/README.md +6 -4
package/bin/create-job-forge.mjs +11 -0
package/bin/job-forge.mjs +57 -0
package/docs/ARCHITECTURE.md +7 -1
package/docs/CUSTOMIZATION.md +9 -1
package/docs/README.md +1 -1
package/docs/SETUP.md +4 -0
package/iso/commands/job-forge.md +28 -6
package/iso/instructions.md +9 -3
package/lib/jobforge-cache.mjs +9 -4
package/lib/jobforge-canon.mjs +88 -0
package/lib/jobforge-index.mjs +77 -1
package/lib/jobforge-ledger.mjs +33 -15
package/lib/jobforge-preflight.mjs +29 -0
package/package.json +10 -1
package/scripts/canon.mjs +178 -0
package/scripts/ledger.mjs +27 -2
package/scripts/preflight.mjs +142 -0
package/templates/canon.json +65 -0
package/templates/capabilities.json +8 -2
package/templates/migrations.json +7 -0
package/templates/preflight.json +59 -0

package/.cursor/rules/main.mdc CHANGED

@@ -12,7 +12,7 @@ AI-powered job search pipeline: scans portals, evaluates offers, generates CVs v
 - [H1] Max 2 parallel `task` dispatches per message. For N jobs, run `ceil(N/2)` sequential rounds of 2. A round is not complete until both subagents return a final outcome (`APPLIED`, `APPLY FAILED`, `SKIP`, `Discarded`, or a written TSV path). A `task` tool result that only gives a session id / title is a launch acknowledgement, not completion. Applies in all modes, for all user phrasings ("urgent", "apply to 10 jobs now").
   why: each subagent requires post-cleanup and racing more than 2 reliably loses at least one result. On 2026-04-25 the orchestrator launched round 2 while round 1 had only returned task ids, leaving four application subagents in flight and losing two provider recoveries
-- [H2] Max 1 application per company+role. Before every `apply` dispatch, grep all four sources for the URL and for `company+role`: `data/pipeline.md`, all `data/applications/*.md` day files, `batch/tracker-additions/*.tsv`, `batch/tracker-additions/merged/*.tsv`. `npx job-forge index:has --key "company-role:..."` may be used first as a fast local artifact prefilter; it rebuilds `.jobforge-index.json` on demand from `templates/index.json`, and a company+role hit is enough to drop an obvious duplicate before dispatch. If `.jobforge-ledger/events.jsonl` exists, `npx job-forge ledger:has --company "..." --role "..." --status Applied` may also be used as a fast prefilter; a match is enough to drop that duplicate before dispatch. For candidates not rejected by the index or ledger, the four-source grep is still mandatory. If any source shows APPLIED / Applied, skip the dispatch and pick a replacement from the remaining candidate list. Do not count duplicates toward a requested "apply to N jobs" total, and do not delegate obvious duplicates just so a subagent can return SKIP.
+- [H2] Max 1 application per company+role. Before every `apply` dispatch, grep all four sources for the URL and for `company+role`: `data/pipeline.md`, all `data/applications/*.md` day files, `batch/tracker-additions/*.tsv`, `batch/tracker-additions/merged/*.tsv`. Use `npx job-forge canon:key company-role --company "..." --role "..."` when deriving a stable duplicate key; do not invent slugs in prose. `npx job-forge index:has --key "company-role:..."` may be used first as a fast local artifact prefilter; it rebuilds `.jobforge-index.json` on demand from `templates/index.json`, and a company+role hit is enough to drop an obvious duplicate before dispatch. If `.jobforge-ledger/events.jsonl` exists, `npx job-forge ledger:has --company "..." --role "..." --status Applied` may also be used as a fast prefilter; a match is enough to drop that duplicate before dispatch. For candidates not rejected by the index or ledger, the four-source grep is still mandatory. If any source shows APPLIED / Applied, skip the dispatch and pick a replacement from the remaining candidate list. Do not count duplicates toward a requested "apply to N jobs" total, and do not delegate obvious duplicates just so a subagent can return SKIP.
   why: 2026-04 same-day batch collision — when two batches target the same role, `npx job-forge merge` updates the existing day-file row rather than appending, so grepping day files alone misses earlier-batch applies; merged/*.tsv is the only place the breadcrumb remains
 - [H3] Before every batch of `task` dispatches that will use Geometra, call `geometra_list_sessions` then `geometra_disconnect({closeBrowser: true})`. Every round, no exceptions. Name this cleanup as an explicit "step 0" in your first-response plan for any multi-apply request — it is the most frequently skipped guardrail in practice, and skipping it produces cascade "Not connected" failures on the next dispatch.
@@ -80,12 +80,18 @@ AI-powered job search pipeline: scans portals, evaluates offers, generates CVs v
 - [D14] Treat `templates/migrations.json` as the source of truth for consumer-project upgrades. Use `npx job-forge migrate:plan` or `npx job-forge migrate:check` when diagnosing harness drift; `job-forge sync` applies safe migrations automatically unless `JOB_FORGE_SKIP_MIGRATIONS=1` is set.
   why: `iso-migrate` is not an MCP and adds no prompt/tool-schema tokens; it prevents stale consumer scripts and generated-artifact ignores without asking agents to hand-edit package.json
+- [D15] Treat `templates/canon.json` as the source of truth for URL/company/role identity keys. Use `npx job-forge canon:key ...` or `npx job-forge canon:compare ...` before broad duplicate checks when a stable key or same/possible/different decision is useful.
+  why: `iso-canon` is not an MCP and adds no prompt/tool-schema tokens; it centralizes duplicate-key rules so agents do not repeatedly derive inconsistent slugs for aliases, suffixes, remote/location noise, or tracking URLs
+- [D16] Treat `templates/preflight.json` as the source of truth for multi-apply dispatch safety. After candidate facts and gates are materialized from authoritative files, run `npx job-forge preflight:plan --candidates <file>` or `npx job-forge preflight:check --candidates <file>` before task dispatch; follow the emitted rounds and pre/post steps. This does not replace H2 four-source grep until those facts are materialized into the candidate JSON.
+  why: `iso-preflight` is not an MCP and adds no prompt/tool-schema tokens; it turns file-backed facts, duplicate/location gates, max-two rounds, and cleanup/merge/verify steps into an executable local plan instead of repeated prose
 ## Procedure
 1. Check `cv.md`, `profile.yml`, and `portals.yml`; onboard if any file is missing.
 2. Pick and name the mode from **Routing** [D6]. No match → ask; do not guess.
-3. Read the active mode file [D3]. Use context bundle checks when changing context loads [D11]. Check cached artifacts before URL/JD refetches [D12]. Use artifact index lookups before broad file reads when they can answer the question [D13]. Use migration checks for harness drift [D14]. Decide inline vs delegated work [D1].
-4. Prepare Geometra dispatches: cleanup [H3], index/ledger prefilter when useful [D8, D13], dedupe [H2], location filter [D5], routing [D2, D10], proxy prompt hygiene [H8].
+3. Read the active mode file [D3]. Use context bundle checks when changing context loads [D11]. Check cached artifacts before URL/JD refetches [D12]. Use artifact index lookups before broad file reads when they can answer the question [D13]. Use canonical identity keys for duplicate checks [D15]. Use migration checks for harness drift [D14]. Decide inline vs delegated work [D1].
+4. Prepare Geometra dispatches: cleanup [H3], canon/index/ledger prefilter when useful [D8, D13, D15], dedupe [H2], location filter [D5], materialize candidate facts/gates and run preflight plan/check [D16], routing [D2, D10], proxy prompt hygiene [H8].
 5. Dispatch at most 2 tasks per round [H1]; wait for final outcomes, not just task ids [H5b].
 6. Keep multi-job form-filling out of the orchestrator [H4].
 7. Cross-check subagent facts against authoritative files [H7].
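
The ordering that H2 describes (fast index and ledger prefilters first, then the mandatory four-source grep, with duplicates not counted toward the requested total) can be sketched roughly as below. This is only an illustration: `dedupeDecision`, `pickForDispatch`, and the injected `checks` object are hypothetical names that do not appear in job-forge, and the real lookups would be the `index:has`, `ledger:has`, and grep steps named in the rule.

```javascript
// Hypothetical sketch of the H2 ordering; none of these helper names come
// from job-forge. `checks` injects the three lookups so the decision logic
// can be exercised without touching real files.
function dedupeDecision(candidate, checks) {
  // Fast prefilters: a hit in either one is enough to drop the candidate.
  if (checks.indexHas(candidate.key)) {
    return { dispatch: false, reason: "index hit" };
  }
  if (checks.ledgerHas(candidate.company, candidate.role)) {
    return { dispatch: false, reason: "ledger hit" };
  }
  // Candidates that survive the prefilters still need the four-source grep.
  const hits = checks.grepSources(candidate);
  if (hits.length > 0) {
    return { dispatch: false, reason: `grep hit in ${hits[0]}` };
  }
  return { dispatch: true, reason: "no prior application found" };
}

// Walk the ordered candidate list and keep going past duplicates until
// `wanted` dispatchable candidates are found (or the list runs out), so
// duplicates never count toward the requested total.
function pickForDispatch(candidates, wanted, checks) {
  const picked = [];
  for (const candidate of candidates) {
    if (picked.length === wanted) break;
    if (dedupeDecision(candidate, checks).dispatch) picked.push(candidate);
  }
  return picked;
}
```

Injecting the lookups keeps the sketch independent of how the index, ledger, and grep are actually invoked; only the order of checks and the replacement behavior are meant to mirror the rule text.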

package/.opencode/skills/job-forge.md CHANGED

@@ -78,6 +78,14 @@ Local artifact index (terminal, outside opencode):
   npx job-forge index:has --key "company-role:acme:staff-engineer"
   npx job-forge index:query "acme"
+Identity keys (terminal, outside opencode):
+  npx job-forge canon:key company-role --company "Acme" --role "Staff Engineer"
+  npx job-forge canon:compare company "OpenAI, Inc." "Open AI"
+Preflight dispatch plans (terminal, outside opencode):
+  npx job-forge preflight:plan --candidates batch/preflight-candidates.json
+  npx job-forge preflight:check --candidates batch/preflight-candidates.json
 Consumer migrations (terminal, outside opencode):
   npx job-forge migrate:plan           # preview package.json/.gitignore drift
   npx job-forge migrate:apply          # apply safe harness upgrade migrations
@@ -168,11 +176,13 @@ Step 1  — Enumerate candidates
   - Build ordered list: candidates = [job_1, job_2, ..., job_N]
 Step 2  — Dedup against already-applied
-  - Run npx job-forge index:has --key "company-role:<company-slug>:<role-slug>"
-    as a fast local artifact prefilter when company+role is known. It rebuilds
-    .jobforge-index.json on demand from templates/index.json. A hit means the
-    role has already appeared in tracker files or tracker TSVs and can be
-    dropped before dispatch.
+  - Derive the stable key with npx job-forge canon:key company-role --company
+    "<company>" --role "<role>" when company+role is known.
+  - Run npx job-forge index:has --key "<canon-key>" as a fast local artifact
+    prefilter. It rebuilds .jobforge-index.json on demand from
+    templates/index.json and canonicalizes indexed company/role records through
+    templates/canon.json. A hit means the role has already appeared in tracker
+    files or tracker TSVs and can be dropped before dispatch.
   - If .jobforge-ledger/events.jsonl exists, use npx job-forge ledger:has as a
     fast prefilter for obvious company+role Applied duplicates. A ledger match
     can be dropped before dispatch without loading tracker files into context.
@@ -188,7 +198,19 @@ Step 3  — Pre-flight cleanup (once, before the loop)
   - geometra_list_sessions()
   - geometra_disconnect({ closeBrowser: true })
-Step 4  — Loop in rounds of 2 (Hard Limit #1)
+Step 4  — Materialize and check the dispatch plan
+  - Write file-backed candidate facts/gates to batch/preflight-candidates.json
+    (or another explicit JSON file). Include source paths for company, role,
+    companyRoleKey, URL, score, duplicate/location gates, and any skip/block
+    decision.
+  - Run npx job-forge preflight:check --candidates <file> to fail on missing
+    sources or blocked gates, then npx job-forge preflight:plan --candidates
+    <file> to get the bounded round list.
+  - Follow the emitted rounds. Do not dispatch blocked candidates, and do not
+    replace H2's four-source grep with preflight unless those grep results are
+    present in the candidate JSON.
+Step 5  — Loop in rounds of 2 (Hard Limit #1)
   for round in ceil(len(candidates) / 2):
     pair = candidates[round*2 : round*2 + 2]
     # If proxy is configured, do not paste proxy values into prompts.
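
The round loop in this hunk is pseudocode. A minimal runnable sketch of the same "rounds of 2" idea is shown below, assuming a hypothetical async `dispatch` function that resolves only once a subagent has returned a final outcome; neither `planRounds` nor `runRounds` is part of job-forge.

```javascript
// Split an ordered candidate list into sequential rounds of at most `size`.
// For N candidates and size 2 this yields ceil(N / 2) rounds.
function planRounds(candidates, size = 2) {
  const rounds = [];
  for (let i = 0; i < candidates.length; i += size) {
    rounds.push(candidates.slice(i, i + size));
  }
  return rounds;
}

// Run the rounds one at a time. `dispatch` is a hypothetical async function
// (an assumption of this sketch) that settles with a final outcome.
async function runRounds(candidates, dispatch) {
  const outcomes = [];
  for (const pair of planRounds(candidates)) {
    // allSettled waits for every dispatch in the round to resolve or reject,
    // so the next round never starts while an earlier one is still in flight.
    const settled = await Promise.allSettled(pair.map((c) => dispatch(c)));
    outcomes.push(...settled);
  }
  return outcomes;
}
```

Using `Promise.allSettled` rather than `Promise.all` is a deliberate choice in this sketch: a failed dispatch still counts as a finished one, which matches the rule that a round is complete only when both subagents have returned.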

package/AGENTS.md CHANGED

@@ -7,7 +7,7 @@ AI-powered job search pipeline: scans portals, evaluates offers, generates CVs v
 - [H1] Max 2 parallel `task` dispatches per message. For N jobs, run `ceil(N/2)` sequential rounds of 2. A round is not complete until both subagents return a final outcome (`APPLIED`, `APPLY FAILED`, `SKIP`, `Discarded`, or a written TSV path). A `task` tool result that only gives a session id / title is a launch acknowledgement, not completion. Applies in all modes, for all user phrasings ("urgent", "apply to 10 jobs now").
   why: each subagent requires post-cleanup and racing more than 2 reliably loses at least one result. On 2026-04-25 the orchestrator launched round 2 while round 1 had only returned task ids, leaving four application subagents in flight and losing two provider recoveries
-- [H2] Max 1 application per company+role. Before every `apply` dispatch, grep all four sources for the URL and for `company+role`: `data/pipeline.md`, all `data/applications/*.md` day files, `batch/tracker-additions/*.tsv`, `batch/tracker-additions/merged/*.tsv`. `npx job-forge index:has --key "company-role:..."` may be used first as a fast local artifact prefilter; it rebuilds `.jobforge-index.json` on demand from `templates/index.json`, and a company+role hit is enough to drop an obvious duplicate before dispatch. If `.jobforge-ledger/events.jsonl` exists, `npx job-forge ledger:has --company "..." --role "..." --status Applied` may also be used as a fast prefilter; a match is enough to drop that duplicate before dispatch. For candidates not rejected by the index or ledger, the four-source grep is still mandatory. If any source shows APPLIED / Applied, skip the dispatch and pick a replacement from the remaining candidate list. Do not count duplicates toward a requested "apply to N jobs" total, and do not delegate obvious duplicates just so a subagent can return SKIP.
+- [H2] Max 1 application per company+role. Before every `apply` dispatch, grep all four sources for the URL and for `company+role`: `data/pipeline.md`, all `data/applications/*.md` day files, `batch/tracker-additions/*.tsv`, `batch/tracker-additions/merged/*.tsv`. Use `npx job-forge canon:key company-role --company "..." --role "..."` when deriving a stable duplicate key; do not invent slugs in prose. `npx job-forge index:has --key "company-role:..."` may be used first as a fast local artifact prefilter; it rebuilds `.jobforge-index.json` on demand from `templates/index.json`, and a company+role hit is enough to drop an obvious duplicate before dispatch. If `.jobforge-ledger/events.jsonl` exists, `npx job-forge ledger:has --company "..." --role "..." --status Applied` may also be used as a fast prefilter; a match is enough to drop that duplicate before dispatch. For candidates not rejected by the index or ledger, the four-source grep is still mandatory. If any source shows APPLIED / Applied, skip the dispatch and pick a replacement from the remaining candidate list. Do not count duplicates toward a requested "apply to N jobs" total, and do not delegate obvious duplicates just so a subagent can return SKIP.
   why: 2026-04 same-day batch collision — when two batches target the same role, `npx job-forge merge` updates the existing day-file row rather than appending, so grepping day files alone misses earlier-batch applies; merged/*.tsv is the only place the breadcrumb remains
 - [H3] Before every batch of `task` dispatches that will use Geometra, call `geometra_list_sessions` then `geometra_disconnect({closeBrowser: true})`. Every round, no exceptions. Name this cleanup as an explicit "step 0" in your first-response plan for any multi-apply request — it is the most frequently skipped guardrail in practice, and skipping it produces cascade "Not connected" failures on the next dispatch.
@@ -75,12 +75,18 @@ AI-powered job search pipeline: scans portals, evaluates offers, generates CVs v
 - [D14] Treat `templates/migrations.json` as the source of truth for consumer-project upgrades. Use `npx job-forge migrate:plan` or `npx job-forge migrate:check` when diagnosing harness drift; `job-forge sync` applies safe migrations automatically unless `JOB_FORGE_SKIP_MIGRATIONS=1` is set.
   why: `iso-migrate` is not an MCP and adds no prompt/tool-schema tokens; it prevents stale consumer scripts and generated-artifact ignores without asking agents to hand-edit package.json
+- [D15] Treat `templates/canon.json` as the source of truth for URL/company/role identity keys. Use `npx job-forge canon:key ...` or `npx job-forge canon:compare ...` before broad duplicate checks when a stable key or same/possible/different decision is useful.
+  why: `iso-canon` is not an MCP and adds no prompt/tool-schema tokens; it centralizes duplicate-key rules so agents do not repeatedly derive inconsistent slugs for aliases, suffixes, remote/location noise, or tracking URLs
+- [D16] Treat `templates/preflight.json` as the source of truth for multi-apply dispatch safety. After candidate facts and gates are materialized from authoritative files, run `npx job-forge preflight:plan --candidates <file>` or `npx job-forge preflight:check --candidates <file>` before task dispatch; follow the emitted rounds and pre/post steps. This does not replace H2 four-source grep until those facts are materialized into the candidate JSON.
+  why: `iso-preflight` is not an MCP and adds no prompt/tool-schema tokens; it turns file-backed facts, duplicate/location gates, max-two rounds, and cleanup/merge/verify steps into an executable local plan instead of repeated prose
 ## Procedure
 1. Check `cv.md`, `profile.yml`, and `portals.yml`; onboard if any file is missing.
 2. Pick and name the mode from **Routing** [D6]. No match → ask; do not guess.
-3. Read the active mode file [D3]. Use context bundle checks when changing context loads [D11]. Check cached artifacts before URL/JD refetches [D12]. Use artifact index lookups before broad file reads when they can answer the question [D13]. Use migration checks for harness drift [D14]. Decide inline vs delegated work [D1].
-4. Prepare Geometra dispatches: cleanup [H3], index/ledger prefilter when useful [D8, D13], dedupe [H2], location filter [D5], routing [D2, D10], proxy prompt hygiene [H8].
+3. Read the active mode file [D3]. Use context bundle checks when changing context loads [D11]. Check cached artifacts before URL/JD refetches [D12]. Use artifact index lookups before broad file reads when they can answer the question [D13]. Use canonical identity keys for duplicate checks [D15]. Use migration checks for harness drift [D14]. Decide inline vs delegated work [D1].
+4. Prepare Geometra dispatches: cleanup [H3], canon/index/ledger prefilter when useful [D8, D13, D15], dedupe [H2], location filter [D5], materialize candidate facts/gates and run preflight plan/check [D16], routing [D2, D10], proxy prompt hygiene [H8].
 5. Dispatch at most 2 tasks per round [H1]; wait for final outcomes, not just task ids [H5b].
 6. Keep multi-job form-filling out of the orchestrator [H4].
 7. Cross-check subagent facts against authoritative files [H7].
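
D16 asks for candidate facts and gates to be materialized with their sources before any preflight run. The diff does not show the candidate JSON schema, so the sketch below invents one purely for illustration: the `facts`/`gates` shape, the `status` values, and the `checkCandidate` helper are all assumptions, not the `iso-preflight` format. It only shows the kind of check that "fail on missing sources or blocked gates" implies.

```javascript
// Hypothetical candidate shape, NOT the real iso-preflight schema:
//   { facts: { company: { value, source }, ... },
//     gates: { duplicate: { status, source }, ... } }
const REQUIRED_FACTS = ["company", "role", "companyRoleKey", "url", "score"];

// Return a list of human-readable problems; an empty list means the
// candidate has a source for every required fact and no blocked gate.
function checkCandidate(candidate) {
  const problems = [];
  for (const name of REQUIRED_FACTS) {
    const fact = candidate.facts?.[name];
    if (!fact || !fact.source) {
      problems.push(`missing source for fact "${name}"`);
    }
  }
  for (const [name, gate] of Object.entries(candidate.gates ?? {})) {
    if (!gate.source) problems.push(`missing source for gate "${name}"`);
    if (gate.status === "blocked") problems.push(`gate "${name}" is blocked`);
  }
  return problems;
}
```

The point of the sketch is the pairing of every value with a file path: a fact or gate without a recorded source is treated as a failure, which is what keeps the plan tied to authoritative files rather than to agent recollection.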

package/CLAUDE.md CHANGED

@@ -7,7 +7,7 @@ AI-powered job search pipeline: scans portals, evaluates offers, generates CVs v
 - [H1] Max 2 parallel `task` dispatches per message. For N jobs, run `ceil(N/2)` sequential rounds of 2. A round is not complete until both subagents return a final outcome (`APPLIED`, `APPLY FAILED`, `SKIP`, `Discarded`, or a written TSV path). A `task` tool result that only gives a session id / title is a launch acknowledgement, not completion. Applies in all modes, for all user phrasings ("urgent", "apply to 10 jobs now").
   why: each subagent requires post-cleanup and racing more than 2 reliably loses at least one result. On 2026-04-25 the orchestrator launched round 2 while round 1 had only returned task ids, leaving four application subagents in flight and losing two provider recoveries
-- [H2] Max 1 application per company+role. Before every `apply` dispatch, grep all four sources for the URL and for `company+role`: `data/pipeline.md`, all `data/applications/*.md` day files, `batch/tracker-additions/*.tsv`, `batch/tracker-additions/merged/*.tsv`. `npx job-forge index:has --key "company-role:..."` may be used first as a fast local artifact prefilter; it rebuilds `.jobforge-index.json` on demand from `templates/index.json`, and a company+role hit is enough to drop an obvious duplicate before dispatch. If `.jobforge-ledger/events.jsonl` exists, `npx job-forge ledger:has --company "..." --role "..." --status Applied` may also be used as a fast prefilter; a match is enough to drop that duplicate before dispatch. For candidates not rejected by the index or ledger, the four-source grep is still mandatory. If any source shows APPLIED / Applied, skip the dispatch and pick a replacement from the remaining candidate list. Do not count duplicates toward a requested "apply to N jobs" total, and do not delegate obvious duplicates just so a subagent can return SKIP.
+- [H2] Max 1 application per company+role. Before every `apply` dispatch, grep all four sources for the URL and for `company+role`: `data/pipeline.md`, all `data/applications/*.md` day files, `batch/tracker-additions/*.tsv`, `batch/tracker-additions/merged/*.tsv`. Use `npx job-forge canon:key company-role --company "..." --role "..."` when deriving a stable duplicate key; do not invent slugs in prose. `npx job-forge index:has --key "company-role:..."` may be used first as a fast local artifact prefilter; it rebuilds `.jobforge-index.json` on demand from `templates/index.json`, and a company+role hit is enough to drop an obvious duplicate before dispatch. If `.jobforge-ledger/events.jsonl` exists, `npx job-forge ledger:has --company "..." --role "..." --status Applied` may also be used as a fast prefilter; a match is enough to drop that duplicate before dispatch. For candidates not rejected by the index or ledger, the four-source grep is still mandatory. If any source shows APPLIED / Applied, skip the dispatch and pick a replacement from the remaining candidate list. Do not count duplicates toward a requested "apply to N jobs" total, and do not delegate obvious duplicates just so a subagent can return SKIP.
   why: 2026-04 same-day batch collision — when two batches target the same role, `npx job-forge merge` updates the existing day-file row rather than appending, so grepping day files alone misses earlier-batch applies; merged/*.tsv is the only place the breadcrumb remains
 - [H3] Before every batch of `task` dispatches that will use Geometra, call `geometra_list_sessions` then `geometra_disconnect({closeBrowser: true})`. Every round, no exceptions. Name this cleanup as an explicit "step 0" in your first-response plan for any multi-apply request — it is the most frequently skipped guardrail in practice, and skipping it produces cascade "Not connected" failures on the next dispatch.
@@ -75,12 +75,18 @@ AI-powered job search pipeline: scans portals, evaluates offers, generates CVs v
 - [D14] Treat `templates/migrations.json` as the source of truth for consumer-project upgrades. Use `npx job-forge migrate:plan` or `npx job-forge migrate:check` when diagnosing harness drift; `job-forge sync` applies safe migrations automatically unless `JOB_FORGE_SKIP_MIGRATIONS=1` is set.
   why: `iso-migrate` is not an MCP and adds no prompt/tool-schema tokens; it prevents stale consumer scripts and generated-artifact ignores without asking agents to hand-edit package.json
+- [D15] Treat `templates/canon.json` as the source of truth for URL/company/role identity keys. Use `npx job-forge canon:key ...` or `npx job-forge canon:compare ...` before broad duplicate checks when a stable key or same/possible/different decision is useful.
+  why: `iso-canon` is not an MCP and adds no prompt/tool-schema tokens; it centralizes duplicate-key rules so agents do not repeatedly derive inconsistent slugs for aliases, suffixes, remote/location noise, or tracking URLs
+- [D16] Treat `templates/preflight.json` as the source of truth for multi-apply dispatch safety. After candidate facts and gates are materialized from authoritative files, run `npx job-forge preflight:plan --candidates <file>` or `npx job-forge preflight:check --candidates <file>` before task dispatch; follow the emitted rounds and pre/post steps. This does not replace H2 four-source grep until those facts are materialized into the candidate JSON.
+  why: `iso-preflight` is not an MCP and adds no prompt/tool-schema tokens; it turns file-backed facts, duplicate/location gates, max-two rounds, and cleanup/merge/verify steps into an executable local plan instead of repeated prose
 ## Procedure
 1. Check `cv.md`, `profile.yml`, and `portals.yml`; onboard if any file is missing.
 2. Pick and name the mode from **Routing** [D6]. No match → ask; do not guess.
-3. Read the active mode file [D3]. Use context bundle checks when changing context loads [D11]. Check cached artifacts before URL/JD refetches [D12]. Use artifact index lookups before broad file reads when they can answer the question [D13]. Use migration checks for harness drift [D14]. Decide inline vs delegated work [D1].
-4. Prepare Geometra dispatches: cleanup [H3], index/ledger prefilter when useful [D8, D13], dedupe [H2], location filter [D5], routing [D2, D10], proxy prompt hygiene [H8].
+3. Read the active mode file [D3]. Use context bundle checks when changing context loads [D11]. Check cached artifacts before URL/JD refetches [D12]. Use artifact index lookups before broad file reads when they can answer the question [D13]. Use canonical identity keys for duplicate checks [D15]. Use migration checks for harness drift [D14]. Decide inline vs delegated work [D1].
+4. Prepare Geometra dispatches: cleanup [H3], canon/index/ledger prefilter when useful [D8, D13, D15], dedupe [H2], location filter [D5], materialize candidate facts/gates and run preflight plan/check [D16], routing [D2, D10], proxy prompt hygiene [H8].
 5. Dispatch at most 2 tasks per round [H1]; wait for final outcomes, not just task ids [H5b].
 6. Keep multi-job form-filling out of the orchestrator [H4].
 7. Cross-check subagent facts against authoritative files [H7].

package/README.md CHANGED

@@ -31,7 +31,7 @@ The scaffolded `opencode.json` already has three MCPs wired up — they launch a
 - **Gmail** — reads replies from recruiters
 - **state-trace** — typed working memory for cross-session context (resumed batches, recent decisions, repeated portal quirks). Install once with `python3 -m pip install "state-trace[mcp]"`; the MCP command is `state-trace-mcp`.
-JobForge also keeps MCP-free local workflow state: `templates/contracts.json` defines tracker/apply artifact shapes via `@razroo/iso-contract`, `templates/capabilities.json` defines role capability boundaries via `@razroo/iso-capabilities`, `templates/context.json` defines deterministic mode/reference bundles via `@razroo/iso-context`, `templates/migrations.json` defines safe consumer-project upgrades via `@razroo/iso-migrate`, `.jobforge-ledger/events.jsonl` records duplicate/status events via `@razroo/iso-ledger`, `.jobforge-cache/` stores reusable JD/artifact content via `@razroo/iso-cache`, and `.jobforge-index.json` indexes artifact source pointers via `@razroo/iso-index`. None of these add always-on prompt or tool-schema tokens.
+JobForge also keeps MCP-free local workflow state: `templates/canon.json` defines URL/company/role identity keys via `@razroo/iso-canon`, `templates/contracts.json` defines tracker/apply artifact shapes via `@razroo/iso-contract`, `templates/capabilities.json` defines role capability boundaries via `@razroo/iso-capabilities`, `templates/context.json` defines deterministic mode/reference bundles via `@razroo/iso-context`, `templates/preflight.json` defines safe dispatch rounds/gates via `@razroo/iso-preflight`, `templates/migrations.json` defines safe consumer-project upgrades via `@razroo/iso-migrate`, `.jobforge-ledger/events.jsonl` records duplicate/status events via `@razroo/iso-ledger`, `.jobforge-cache/` stores reusable JD/artifact content via `@razroo/iso-cache`, and `.jobforge-index.json` indexes artifact source pointers via `@razroo/iso-index`. None of these add always-on prompt or tool-schema tokens.
 `npm install` also materializes symlinks for every supported agent harness — OpenCode, Cursor, Claude Code, and Codex — so you can run `opencode`, `cursor`, `claude`, or `codex` in the same project and each picks up the shared MCP config and instructions.
@@ -78,7 +78,7 @@ JobForge turns opencode into a full job search command center. Instead of manual
 | **Durable Batch Orchestration** | `batch-runner.sh` uses `@razroo/iso-orchestrator` for resumable bundle execution, bounded fan-out, mutexed state writes, and workflow records in `.jobforge-runs/`. |
 | **Pipeline Integrity** | Automated merge, dedup, status normalization, health checks |
 | **Cost-Aware Agent Routing** | Three subagents (`@general-free`, `@general-paid`, `@glm-minimal`) with per-task tool surfaces. On OpenCode, JobForge pins all tiers to `opencode-go/deepseek-v4-flash` so application runs avoid overloaded free-model pools. See [Subagent Routing in AGENTS.md](AGENTS.md) for the task-to-agent mapping. |
-| **Trace + Telemetry + Guard + Contract + Ledger + Capabilities + Context + Cache + Index + Migrate** | `job-forge trace:*` exposes local OpenCode transcripts, `job-forge telemetry:*` summarizes runs, `job-forge guard:*` audits deterministic policy rules, `templates/contracts.json` enforces artifact shape with `iso-contract`, `job-forge ledger:*` queries append-only workflow state, `job-forge capabilities:*` checks role boundaries, `job-forge context:*` plans mode/reference context bundles, `job-forge cache:*` reuses fetched JD/artifact content, `job-forge index:*` queries compact source pointers, and `job-forge migrate:*` applies safe consumer-project upgrades without MCP/tool-schema overhead. |
+| **Trace + Telemetry + Guard + Contract + Canon + Ledger + Capabilities + Context + Cache + Index + Preflight + Migrate** | `job-forge trace:*` exposes local OpenCode transcripts, `job-forge telemetry:*` summarizes runs, `job-forge guard:*` audits deterministic policy rules, `templates/contracts.json` enforces artifact shape with `iso-contract`, `job-forge canon:*` derives stable URL/company/role identity keys, `job-forge ledger:*` queries append-only workflow state, `job-forge capabilities:*` checks role boundaries, `job-forge context:*` plans mode/reference context bundles, `job-forge cache:*` reuses fetched JD/artifact content, `job-forge index:*` queries compact source pointers, `job-forge preflight:*` plans bounded apply dispatch rounds from file-backed candidate facts, and `job-forge migrate:*` applies safe consumer-project upgrades without MCP/tool-schema overhead. |
 | **Token Cost Visibility** | `job-forge tokens --days 1` for per-session breakdown; `job-forge session-report --since-minutes 60 --log` to flag sessions over budget and append history to `data/token-usage.tsv`. Auto-logged after every batch run. |
 ## Usage
@@ -164,7 +164,7 @@ my-search/
 ├── .opencode/skills/job-forge.md # → skill router
 ├── .opencode/agents/             # → @general-free, @general-paid, @glm-minimal
 ├── modes/                        # → _shared.md + skill modes
-├── templates/                    # → states.yml, portals.example.yml, cv-template.html, capabilities.json, context.json, index.json, migrations.json
+├── templates/                    # → states.yml, portals.example.yml, cv-template.html, canon.json, capabilities.json, context.json, index.json, preflight.json, migrations.json
 ├── batch/batch-prompt.md         # → batch worker prompt
 ├── batch/batch-runner.sh         # → parallel orchestrator
 │
@@ -190,7 +190,7 @@ JobForge/
 │   ├── sync.mjs                  # postinstall: creates symlinks in consumer project
 │   └── create-job-forge.mjs      # scaffolder
 ├── modes/                        # _shared.md + 16 skill modes
-├── templates/                    # cv-template.html, portals.example.yml, states.yml, capabilities.json, context.json, migrations.json
+├── templates/                    # cv-template.html, portals.example.yml, states.yml, canon.json, capabilities.json, context.json, preflight.json, migrations.json
 ├── config/profile.example.yml    # template for consumer's profile.yml
 ├── batch/{batch-prompt.md,batch-runner.sh}   # batch orchestrator
 ├── scripts/
@@ -201,6 +201,8 @@ JobForge/
 │   ├── context.mjs               # iso-context-backed context bundle CLI
 │   ├── cache.mjs                 # iso-cache-backed local artifact cache CLI
 │   ├── index.mjs                 # iso-index-backed artifact lookup CLI
+│   ├── canon.mjs                 # iso-canon-backed identity normalization CLI
+│   ├── preflight.mjs             # iso-preflight-backed dispatch planning CLI
 │   ├── migrate.mjs               # iso-migrate-backed consumer-project migrations
 │   ├── token-usage-report.mjs    # opencode cost analyzer
 │   └── release/check-source.mjs  # version gate for npm publish

package/bin/create-job-forge.mjs CHANGED

@@ -147,6 +147,13 @@ const consumerPkg = {
     'index:has': 'job-forge index:has',
     'index:query': 'job-forge index:query',
     'index:explain': 'job-forge index:explain',
+    'canon:normalize': 'job-forge canon:normalize',
+    'canon:key': 'job-forge canon:key',
+    'canon:compare': 'job-forge canon:compare',
+    'canon:explain': 'job-forge canon:explain',
+    'preflight:plan': 'job-forge preflight:plan',
+    'preflight:check': 'job-forge preflight:check',
+    'preflight:explain': 'job-forge preflight:explain',
     'migrate:plan': 'job-forge migrate:plan',
     'migrate:apply': 'job-forge migrate:apply',
     'migrate:check': 'job-forge migrate:check',
@@ -252,6 +259,8 @@ Before doing any work, remember where things live in *this* project:
 | Scanner dedup history | \`data/scan-history.tsv\` | Only touch in \`/job-forge scan\` |
 | Local workflow ledger | \`.jobforge-ledger/events.jsonl\` | Deterministic append-only state; use \`job-forge ledger:*\` |
 | Local artifact index | \`.jobforge-index.json\` | Deterministic file/line lookup; use \`job-forge index:*\` |
+| Identity canonicalization | \`templates/canon.json\` | Stable URL/company/role keys; use \`job-forge canon:*\` |
+| Dispatch preflight policy | \`templates/preflight.json\` | Safe apply rounds/gates; use \`job-forge preflight:*\` |
 | Consumer migrations | \`templates/migrations.json\` | Safe script/gitignore upgrades; use \`job-forge migrate:*\` |
 | Scanner config | \`portals.yml\` (project root) | Company configs |
 | Profile / identity | \`config/profile.yml\` | Candidate name, email, target roles |
@@ -402,6 +411,8 @@ job-forge merge            # merge batch/tracker-additions/*.tsv into the tracke
 job-forge verify           # verify pipeline integrity
 job-forge ledger:status    # local deterministic workflow ledger status
 job-forge index:status     # local artifact index status
+job-forge canon:key company-role --company "Acme, Inc." --role "Senior SWE"
+job-forge preflight:plan --candidates batch/preflight-candidates.json
 job-forge migrate:check    # verify consumer package scripts/gitignore are current
 job-forge pdf cv.md out.pdf
 job-forge tokens --days 1  # per-session opencode token usage

package/bin/job-forge.mjs CHANGED

@@ -25,6 +25,8 @@
  *   context:*      Query/render deterministic context bundles via iso-context
  *   cache:*        Reuse local deterministic artifacts via iso-cache
  *   index:*        Query local artifacts via iso-index
+ *   canon:*        Compute deterministic identity keys via iso-canon
+ *   preflight:*    Plan safe dispatch rounds via iso-preflight
  *   migrate:*      Apply deterministic consumer-project migrations via iso-migrate
  *   sync           Re-run the harness symlink sync (bin/sync.mjs)
  *   help, --help   Show this message
@@ -128,6 +130,21 @@ const indexAliases = {
   'index:path': 'path',
 };
+const canonAliases = {
+  'canon:normalize': 'normalize',
+  'canon:key': 'key',
+  'canon:compare': 'compare',
+  'canon:explain': 'explain',
+  'canon:path': 'path',
+};
+const preflightAliases = {
+  'preflight:plan': 'plan',
+  'preflight:check': 'check',
+  'preflight:explain': 'explain',
+  'preflight:path': 'path',
+};
 const migrateAliases = {
   'migrate:plan': 'plan',
   'migrate:apply': 'apply',
@@ -186,6 +203,12 @@ Commands:
   index:has               Check indexed URL/company-role/report facts without loading source files
   index:query             Query indexed reports, tracker rows, TSVs, scan history, pipeline, and ledger
   index:verify            Validate local artifact index integrity
+  canon:key               Print stable URL/company/role/company-role keys
+  canon:compare           Compare two identifiers as same/possible/different
+  canon:explain           Show the active identity canonicalization policy
+  preflight:plan          Build bounded dispatch plan from candidate JSON
+  preflight:check         Fail if preflight candidates are blocked
+  preflight:explain       Show the active preflight workflow policy
   migrate:plan            Preview deterministic consumer-project migrations
   migrate:apply           Apply deterministic consumer-project migrations
   migrate:check           Fail if migrations are pending
@@ -228,6 +251,10 @@ Pass --help after a command to see its own flags, e.g.:
   job-forge cache:put --url https://example.test/jobs/123 --input @jds/example.md
   job-forge index:has --key "company-role:acme:staff-engineer"
   job-forge index:query "acme"
+  job-forge canon:key company-role --company "Acme, Inc." --role "Senior SWE - Remote US"
+  job-forge canon:compare company "OpenAI, Inc." "Open AI"
+  job-forge preflight:plan --candidates batch/preflight-candidates.json
+  job-forge preflight:check --candidates batch/preflight-candidates.json
   job-forge migrate:check
   job-forge migrate:apply
@@ -359,6 +386,36 @@ if (cmd === 'index' || indexAliases[cmd]) {
   process.exit(result.status ?? 1);
 }
+if (cmd === 'canon' || canonAliases[cmd]) {
+  const canonArgs = cmd === 'canon'
+    ? (rest.length === 0 ? ['help'] : rest)
+    : [canonAliases[cmd], ...rest];
+  const scriptPath = join(PKG_ROOT, 'scripts/canon.mjs');
+  const result = spawnSync(process.execPath, [scriptPath, ...canonArgs], {
+    stdio: 'inherit',
+    cwd: PROJECT_DIR,
+    env: process.env,
+  });
+  process.exit(result.status ?? 1);
+}
+if (cmd === 'preflight' || preflightAliases[cmd]) {
+  const preflightArgs = cmd === 'preflight'
+    ? (rest.length === 0 ? ['help'] : rest)
+    : [preflightAliases[cmd], ...rest];
+  const scriptPath = join(PKG_ROOT, 'scripts/preflight.mjs');
+  const result = spawnSync(process.execPath, [scriptPath, ...preflightArgs], {
+    stdio: 'inherit',
+    cwd: PROJECT_DIR,
+    env: process.env,
+  });
+  process.exit(result.status ?? 1);
+}
 if (cmd === 'migrate' || migrateAliases[cmd]) {
   const migrateArgs = cmd === 'migrate'
     ? (rest.length === 0 ? ['help'] : rest)
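
The `canon` and `preflight` branches above share one shape: a bare group name forwards the remaining arguments (or `help` when there are none), while a colon alias such as `canon:key` is rewritten to its subcommand. A minimal sketch of that resolution as a pure function, assuming a hypothetical helper named `resolveSubcommand` that does not exist in `bin/job-forge.mjs`:

```javascript
// Sketch only: resolveSubcommand is a hypothetical helper for illustration.
// The alias table is copied from the canonAliases hunk in this diff.
const canonAliases = {
  'canon:normalize': 'normalize',
  'canon:key': 'key',
  'canon:compare': 'compare',
  'canon:explain': 'explain',
  'canon:path': 'path',
};

function resolveSubcommand(cmd, rest, group, aliases) {
  // Bare group ("canon"): forward the args, or show help if none were given.
  if (cmd === group) {
    return rest.length === 0 ? ['help'] : rest;
  }
  // Colon alias ("canon:key"): prepend the mapped subcommand.
  // The own-property check avoids matching inherited keys such as "toString".
  if (Object.prototype.hasOwnProperty.call(aliases, cmd)) {
    return [aliases[cmd], ...rest];
  }
  // Not handled by this command group.
  return null;
}

const viaAlias = resolveSubcommand(
  'canon:key', ['company-role', '--company', 'Acme'], 'canon', canonAliases);
const bare = resolveSubcommand('canon', [], 'canon', canonAliases);
const other = resolveSubcommand('migrate:plan', [], 'canon', canonAliases);
console.log(viaAlias, bare, other);
```

In the diff itself, the resolved array is appended to the script path and passed to `spawnSync(process.execPath, ...)`, and the child's exit status is propagated with `process.exit(result.status ?? 1)`.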

package/docs/ARCHITECTURE.md CHANGED

@@ -32,7 +32,7 @@ my-search/
 ├── .opencode/skills/job-forge.md     # → skill router
 ├── .opencode/agents/                 # → @general-free, @general-paid, @glm-minimal
 ├── modes/                            # → mode files
-├── templates/                        # → states.yml, portals.example.yml, cv-template.html
+├── templates/                        # → states.yml, portals.example.yml, cv-template.html, preflight.json
 ├── batch/batch-prompt.md             # → batch worker prompt
 ├── batch/batch-runner.sh             # → parallel orchestrator
 └── node_modules/job-forge/           # harness, installed from npm
@@ -164,7 +164,9 @@ data/pipeline.md        →  Pending URLs and `local:jds/...` inbox (see modes/p
 .jobforge-index.json     →  Deterministic artifact lookup index built from templates/index.json
 jds/*.md                 →  Saved job descriptions referenced from the pipeline (`local:jds/{file}`)
 templates/states.yml     →  Canonical status values
+templates/canon.json      →  Canonical URL/company/role identity keys
 templates/context.json    →  Deterministic mode/reference context bundle policy
+templates/preflight.json  →  Safe apply dispatch rounds/gates policy
 templates/migrations.json → Safe consumer-project upgrade policy
 templates/cv-template.html → PDF generation template
 examples/*.md            →  Fictional layouts only (not read by scripts; see examples/README.md)
@@ -179,6 +181,8 @@ Create `data/pipeline.md` when you start using the URL inbox (`/job-forge pipeli
 - Tracker TSVs: `batch/tracker-additions/{num}-{company-slug}.tsv` (one file per evaluation; merged files move under `batch/tracker-additions/merged/`; shape enforced by `templates/contracts.json`)
 - Ledger: `.jobforge-ledger/events.jsonl` (created by `job-forge ledger:rebuild`, `tracker-line --write`, or `merge`; gitignored personal state)
 - Index: `.jobforge-index.json` (created on demand by `job-forge index:*`; gitignored local lookup state)
+- Canon: `templates/canon.json` (identity rules inspected with `job-forge canon:*`)
+- Preflight: `templates/preflight.json` (dispatch rounds/gates inspected with `job-forge preflight:*`)
 - Migrations: `templates/migrations.json` (applied by `job-forge sync` and inspectable with `job-forge migrate:*`)
 - Capabilities: `templates/capabilities.json` (role boundary policy inspected with `job-forge capabilities:*`)
 - Context: `templates/context.json` (mode/reference file bundles inspected with `job-forge context:*`)
@@ -225,7 +229,9 @@ Scripts maintain data consistency. In a consumer project they're invoked via the
 | `scripts/guard.mjs` | `npx job-forge guard:audit` / `guard:explain` | Deterministic `@razroo/iso-guard` policy audits over local OpenCode traces |
 | `scripts/ledger.mjs` | `npx job-forge ledger:status` / `ledger:has` / `ledger:rebuild` | Deterministic `@razroo/iso-ledger` state over tracker, TSV, and pipeline files |
 | `scripts/index.mjs` | `npx job-forge index:status` / `index:has` / `index:query` | Deterministic `@razroo/iso-index` lookup over reports, tracker rows, TSVs, pipeline, scan history, and ledger events |
+| `scripts/canon.mjs` | `npx job-forge canon:normalize` / `canon:key` / `canon:compare` | Deterministic `@razroo/iso-canon` identity normalization for URLs, companies, roles, and company+role pairs |
 | `scripts/context.mjs` | `npx job-forge context:list` / `context:plan` / `context:check` / `context:render` | Deterministic `@razroo/iso-context` mode/reference context bundle planning and rendering |
+| `scripts/preflight.mjs` | `npx job-forge preflight:plan` / `preflight:check` / `preflight:explain` | Deterministic `@razroo/iso-preflight` dispatch planning for file-backed candidate facts and gates |
 | `scripts/migrate.mjs` | `npx job-forge migrate:plan` / `migrate:apply` / `migrate:check` | Deterministic `@razroo/iso-migrate` consumer-project upgrades for scripts and generated-artifact ignores |
 | `tracker-lib.mjs` | _(library)_ | Shared helpers for reading/writing day-based tracker files — imported by merge/dedup/verify/normalize |
 | `bin/sync.mjs` | `npx job-forge sync` | Creates the harness symlinks in a consumer project and applies safe migrations (also runs as `postinstall`) |

package/docs/CUSTOMIZATION.md CHANGED

@@ -152,7 +152,15 @@ Mode/reference context bundles live in `templates/context.json` and are planned
 ## JobForge artifact index
-Artifact lookup policy lives in `templates/index.json` and is built locally by `@razroo/iso-index`. Use `job-forge index:has --key "company-role:acme:staff-engineer"` as a cheap duplicate/source prefilter, `job-forge index:query "acme"` to get compact source path/line pointers, and `job-forge index:verify` to validate `.jobforge-index.json`. Query, has, and verify rebuild the index on demand, so scaffolded projects need no setup. This is not an MCP and does not add tool-schema tokens.
+Artifact lookup policy lives in `templates/index.json` and is built locally by `@razroo/iso-index`. Use `job-forge index:has --key "company-role:acme:staff-engineer"` as a cheap duplicate/source prefilter, `job-forge index:query "acme"` to get compact source path/line pointers, and `job-forge index:verify` to validate `.jobforge-index.json`. Query, has, and verify rebuild the index on demand, so scaffolded projects need no setup. JobForge canonicalizes company/role and URL records through `templates/canon.json` before writing the index. This is not an MCP and does not add tool-schema tokens.
+## JobForge identity canonicalization
+URL, company, role, and company+role identity rules live in `templates/canon.json` and are enforced locally by `@razroo/iso-canon`. Use `job-forge canon:key company-role --company "OpenAI, Inc." --role "Senior SWE, AI Platform"` to derive the same duplicate key used by ledger/index helpers, and `job-forge canon:compare company "OpenAI, Inc." "Open AI"` to explain whether two values resolve to the same entity. Custom forks can extend aliases, suffixes, stop words, and match thresholds in `templates/canon.json`. This is not an MCP and does not add prompt or tool-schema tokens.
+## JobForge preflight plans
+Application dispatch policy lives in `templates/preflight.json` and is planned locally by `@razroo/iso-preflight`. After candidate facts and gate results have been materialized into JSON, use `job-forge preflight:plan --candidates <file>` to get bounded rounds and required pre/post steps, or `job-forge preflight:check --candidates <file>` to fail on missing source facts or blocked gates. This is not an MCP and does not add prompt or tool-schema tokens; it consumes only the candidate JSON you deliberately pass to it.
 ## JobForge consumer migrations

package/docs/README.md CHANGED

@@ -31,7 +31,7 @@ The harness exposes a single CLI (`job-forge`) installed as a `bin` entry. In a
 | What you need | Where to read |
 |---------------|---------------|
-| Full command list (`verify`, `merge`, `dedup`, `normalize`, `pdf`, `sync-check`, `tokens`, `trace`, `telemetry`, `guard`, `ledger`, `context`, `sync`). | [SETUP.md — Tracker and scripts (terminal)](SETUP.md#tracker-and-scripts-terminal). |
+| Full command list (`verify`, `merge`, `dedup`, `normalize`, `pdf`, `sync-check`, `tokens`, `trace`, `telemetry`, `guard`, `ledger`, `canon`, `context`, `preflight`, `sync`). | [SETUP.md — Tracker and scripts (terminal)](SETUP.md#tracker-and-scripts-terminal). |
 | What each harness `.mjs` script does. | [ARCHITECTURE.md — Pipeline integrity](ARCHITECTURE.md#pipeline-integrity) and the scripts table underneath. |
 | Batch runner, TSV layout, and `batch/tracker-additions/` merge flow. | [batch/README.md](../batch/README.md). |
 | PR gate for harness contributions (`npm run verify` + `npm run build:dashboard`). | [CONTRIBUTING.md — Development](../CONTRIBUTING.md#development). |

package/docs/SETUP.md CHANGED

@@ -126,10 +126,14 @@ From your project root, these commands maintain the tracker and pipeline checks.
 | Pipeline health check | `npx job-forge verify` | `npm run verify` |
 | Merge `batch/tracker-additions/*.tsv` into the tracker | `npx job-forge merge` | `npm run merge` |
 | Inspect tracker row contract | `npx iso-contract explain jobforge.tracker-row --contracts templates/contracts.json` | _(none)_ |
+| Derive canonical company/role key | `npx job-forge canon:key company-role --company "Acme" --role "Staff Engineer"` | `npm run canon:key -- company-role --company ...` |
+| Compare identity values | `npx job-forge canon:compare company "OpenAI, Inc." "Open AI"` | `npm run canon:compare -- company ...` |
 | Inspect role capabilities | `npx job-forge capabilities:explain general-free` | `npm run capabilities:explain -- general-free` |
 | Inspect context bundle budget | `npx job-forge context:plan apply` | `npm run context:plan -- apply` |
 | Inspect local JD/artifact cache | `npx job-forge cache:status` | `npm run cache:status` |
 | Inspect local artifact index | `npx job-forge index:status` | `npm run index:status` |
+| Plan safe application dispatch rounds | `npx job-forge preflight:plan --candidates batch/preflight-candidates.json` | `npm run preflight:plan -- --candidates ...` |
+| Fail on blocked preflight candidates | `npx job-forge preflight:check --candidates batch/preflight-candidates.json` | `npm run preflight:check -- --candidates ...` |
 | Inspect pending consumer migrations | `npx job-forge migrate:plan` | `npm run migrate:plan` |
 | Map status column to canonical labels | `npx job-forge normalize` | `npm run normalize` |
 | Merge duplicate company/role rows | `npx job-forge dedup` | `npm run dedup` |

package/iso/commands/job-forge.md CHANGED

@@ -81,6 +81,14 @@ Local artifact index (terminal, outside opencode):
   npx job-forge index:has --key "company-role:acme:staff-engineer"
   npx job-forge index:query "acme"
+Identity keys (terminal, outside opencode):
+  npx job-forge canon:key company-role --company "Acme" --role "Staff Engineer"
+  npx job-forge canon:compare company "OpenAI, Inc." "Open AI"
+Preflight dispatch plans (terminal, outside opencode):
+  npx job-forge preflight:plan --candidates batch/preflight-candidates.json
+  npx job-forge preflight:check --candidates batch/preflight-candidates.json
 Consumer migrations (terminal, outside opencode):
   npx job-forge migrate:plan           # preview package.json/.gitignore drift
   npx job-forge migrate:apply          # apply safe harness upgrade migrations
@@ -171,11 +179,13 @@ Step 1  — Enumerate candidates
   - Build ordered list: candidates = [job_1, job_2, ..., job_N]
 Step 2  — Dedup against already-applied
-  - Run npx job-forge index:has --key "company-role:<company-slug>:<role-slug>"
-    as a fast local artifact prefilter when company+role is known. It rebuilds
-    .jobforge-index.json on demand from templates/index.json. A hit means the
-    role has already appeared in tracker files or tracker TSVs and can be
-    dropped before dispatch.
+  - Derive the stable key with npx job-forge canon:key company-role --company
+    "<company>" --role "<role>" when company+role is known.
+  - Run npx job-forge index:has --key "<canon-key>" as a fast local artifact
+    prefilter. It rebuilds .jobforge-index.json on demand from
+    templates/index.json and canonicalizes indexed company/role records through
+    templates/canon.json. A hit means the role has already appeared in tracker
+    files or tracker TSVs and can be dropped before dispatch.
   - If .jobforge-ledger/events.jsonl exists, use npx job-forge ledger:has as a
     fast prefilter for obvious company+role Applied duplicates. A ledger match
     can be dropped before dispatch without loading tracker files into context.
@@ -191,7 +201,19 @@ Step 3  — Pre-flight cleanup (once, before the loop)
   - geometra_list_sessions()
   - geometra_disconnect({ closeBrowser: true })
-Step 4  — Loop in rounds of 2 (Hard Limit #1)
+Step 4  — Materialize and check the dispatch plan
+  - Write file-backed candidate facts/gates to batch/preflight-candidates.json
+    (or another explicit JSON file). Include source paths for company, role,
+    companyRoleKey, URL, score, duplicate/location gates, and any skip/block
+    decision.
+  - Run npx job-forge preflight:check --candidates <file> to fail on missing
+    sources or blocked gates, then npx job-forge preflight:plan --candidates
+    <file> to get the bounded round list.
+  - Follow the emitted rounds. Do not dispatch blocked candidates, and do not
+    replace H2's four-source grep with preflight unless those grep results are
+    present in the candidate JSON.
+Step 5  — Loop in rounds of 2 (Hard Limit #1)
   for round in ceil(len(candidates) / 2):
     pair = candidates[round*2 : round*2 + 2]
     # If proxy is configured, do not paste proxy values into prompts.
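
The round loop in Step 5 slices the candidate list into consecutive pairs, so N candidates yield `ceil(N/2)` rounds and the last round may hold a single job. A minimal sketch of that slicing, assuming a hypothetical `planRounds` helper; the real bounded round list comes from `job-forge preflight:plan`, whose output format is not shown in this diff:

```javascript
// Sketch only: planRounds is a hypothetical helper illustrating the
// "rounds of 2" slicing from Step 5; it is not the preflight planner.
function planRounds(candidates, maxPerRound = 2) {
  const rounds = [];
  for (let i = 0; i < candidates.length; i += maxPerRound) {
    // Each round holds at most maxPerRound candidates, in original order.
    rounds.push(candidates.slice(i, i + maxPerRound));
  }
  return rounds;
}

const rounds = planRounds(['job_1', 'job_2', 'job_3', 'job_4', 'job_5']);
console.log(rounds.length, rounds);
```

Per H1, the next round should start only after both subagents in the current round have returned a final outcome, not merely a task id.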

package/iso/instructions.md CHANGED

@@ -7,7 +7,7 @@ AI-powered job search pipeline: scans portals, evaluates offers, generates CVs v
 - [H1] Max 2 parallel `task` dispatches per message. For N jobs, run `ceil(N/2)` sequential rounds of 2. A round is not complete until both subagents return a final outcome (`APPLIED`, `APPLY FAILED`, `SKIP`, `Discarded`, or a written TSV path). A `task` tool result that only gives a session id / title is a launch acknowledgement, not completion. Applies in all modes, for all user phrasings ("urgent", "apply to 10 jobs now").
   why: each subagent requires post-cleanup and racing more than 2 reliably loses at least one result. On 2026-04-25 the orchestrator launched round 2 while round 1 had only returned task ids, leaving four application subagents in flight and losing two provider recoveries
-- [H2] Max 1 application per company+role. Before every `apply` dispatch, grep all four sources for the URL and for `company+role`: `data/pipeline.md`, all `data/applications/*.md` day files, `batch/tracker-additions/*.tsv`, `batch/tracker-additions/merged/*.tsv`. `npx job-forge index:has --key "company-role:..."` may be used first as a fast local artifact prefilter; it rebuilds `.jobforge-index.json` on demand from `templates/index.json`, and a company+role hit is enough to drop an obvious duplicate before dispatch. If `.jobforge-ledger/events.jsonl` exists, `npx job-forge ledger:has --company "..." --role "..." --status Applied` may also be used as a fast prefilter; a match is enough to drop that duplicate before dispatch. For candidates not rejected by the index or ledger, the four-source grep is still mandatory. If any source shows APPLIED / Applied, skip the dispatch and pick a replacement from the remaining candidate list. Do not count duplicates toward a requested "apply to N jobs" total, and do not delegate obvious duplicates just so a subagent can return SKIP.
+- [H2] Max 1 application per company+role. Before every `apply` dispatch, grep all four sources for the URL and for `company+role`: `data/pipeline.md`, all `data/applications/*.md` day files, `batch/tracker-additions/*.tsv`, `batch/tracker-additions/merged/*.tsv`. Use `npx job-forge canon:key company-role --company "..." --role "..."` when deriving a stable duplicate key; do not invent slugs in prose. `npx job-forge index:has --key "company-role:..."` may be used first as a fast local artifact prefilter; it rebuilds `.jobforge-index.json` on demand from `templates/index.json`, and a company+role hit is enough to drop an obvious duplicate before dispatch. If `.jobforge-ledger/events.jsonl` exists, `npx job-forge ledger:has --company "..." --role "..." --status Applied` may also be used as a fast prefilter; a match is enough to drop that duplicate before dispatch. For candidates not rejected by the index or ledger, the four-source grep is still mandatory. If any source shows APPLIED / Applied, skip the dispatch and pick a replacement from the remaining candidate list. Do not count duplicates toward a requested "apply to N jobs" total, and do not delegate obvious duplicates just so a subagent can return SKIP.
   why: 2026-04 same-day batch collision — when two batches target the same role, `npx job-forge merge` updates the existing day-file row rather than appending, so grepping day files alone misses earlier-batch applies; merged/*.tsv is the only place the breadcrumb remains
 - [H3] Before every batch of `task` dispatches that will use Geometra, call `geometra_list_sessions` then `geometra_disconnect({closeBrowser: true})`. Every round, no exceptions. Name this cleanup as an explicit "step 0" in your first-response plan for any multi-apply request — it is the most frequently skipped guardrail in practice, and skipping it produces cascade "Not connected" failures on the next dispatch.
@@ -75,12 +75,18 @@ AI-powered job search pipeline: scans portals, evaluates offers, generates CVs v
 - [D14] Treat `templates/migrations.json` as the source of truth for consumer-project upgrades. Use `npx job-forge migrate:plan` or `npx job-forge migrate:check` when diagnosing harness drift; `job-forge sync` applies safe migrations automatically unless `JOB_FORGE_SKIP_MIGRATIONS=1` is set.
   why: `iso-migrate` is not an MCP and adds no prompt/tool-schema tokens; it prevents stale consumer scripts and generated-artifact ignores without asking agents to hand-edit package.json
+- [D15] Treat `templates/canon.json` as the source of truth for URL/company/role identity keys. Use `npx job-forge canon:key ...` or `npx job-forge canon:compare ...` before broad duplicate checks when a stable key or same/possible/different decision is useful.
+  why: `iso-canon` is not an MCP and adds no prompt/tool-schema tokens; it centralizes duplicate-key rules so agents do not repeatedly derive inconsistent slugs for aliases, suffixes, remote/location noise, or tracking URLs
+- [D16] Treat `templates/preflight.json` as the source of truth for multi-apply dispatch safety. After candidate facts and gates are materialized from authoritative files, run `npx job-forge preflight:plan --candidates <file>` or `npx job-forge preflight:check --candidates <file>` before task dispatch; follow the emitted rounds and pre/post steps. This does not replace H2 four-source grep until those facts are materialized into the candidate JSON.
+  why: `iso-preflight` is not an MCP and adds no prompt/tool-schema tokens; it turns file-backed facts, duplicate/location gates, max-two rounds, and cleanup/merge/verify steps into an executable local plan instead of repeated prose
 ## Procedure
 1. Check `cv.md`, `profile.yml`, and `portals.yml`; onboard if any file is missing.
 2. Pick and name the mode from **Routing** [D6]. No match → ask; do not guess.
-3. Read the active mode file [D3]. Use context bundle checks when changing context loads [D11]. Check cached artifacts before URL/JD refetches [D12]. Use artifact index lookups before broad file reads when they can answer the question [D13]. Use migration checks for harness drift [D14]. Decide inline vs delegated work [D1].
-4. Prepare Geometra dispatches: cleanup [H3], index/ledger prefilter when useful [D8, D13], dedupe [H2], location filter [D5], routing [D2, D10], proxy prompt hygiene [H8].
+3. Read the active mode file [D3]. Use context bundle checks when changing context loads [D11]. Check cached artifacts before URL/JD refetches [D12]. Use artifact index lookups before broad file reads when they can answer the question [D13]. Use canonical identity keys for duplicate checks [D15]. Use migration checks for harness drift [D14]. Decide inline vs delegated work [D1].
+4. Prepare Geometra dispatches: cleanup [H3], canon/index/ledger prefilter when useful [D8, D13, D15], dedupe [H2], location filter [D5], materialize candidate facts/gates and run preflight plan/check [D16], routing [D2, D10], proxy prompt hygiene [H8].
 5. Dispatch at most 2 tasks per round [H1]; wait for final outcomes, not just task ids [H5b].
 6. Keep multi-job form-filling out of the orchestrator [H4].
 7. Cross-check subagent facts against authoritative files [H7].

package/lib/jobforge-cache.mjs CHANGED

@@ -10,6 +10,7 @@ import {
   resolveCacheDir,
   verifyCache,
 } from '@razroo/iso-cache';
+import { canonicalizeJobForgeUrl } from './jobforge-canon.mjs';
 export const CACHE_DIR = '.jobforge-cache';
 export const JD_CACHE_NAMESPACE = 'jobforge.jd';
@@ -96,10 +97,14 @@ export function normalizeJobUrl(url) {
   const text = String(url || '').trim();
   if (!text) throw new Error('url is required');
   try {
-    const parsed = new URL(text);
-    parsed.hash = '';
-    return parsed.toString();
+    return canonicalizeJobForgeUrl(text).canonical;
   } catch {
-    return text;
+    try {
+      const parsed = new URL(text);
+      parsed.hash = '';
+      return parsed.toString();
+    } catch {
+      return text;
+    }
   }
 }
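
The new `normalizeJobUrl` body can be read as three tiers: try the canon policy first, fall back to the previous fragment-stripping behavior, and finally return the trimmed input unchanged. A minimal standalone sketch of that fallback chain, assuming a hypothetical `canonicalize` callback in place of `canonicalizeJobForgeUrl` (its real rules live in `lib/jobforge-canon.mjs` and are not shown in this hunk):

```javascript
// Sketch only: `canonicalize` is a hypothetical stand-in for
// canonicalizeJobForgeUrl; it must return an object with a `canonical` string.
function normalizeJobUrlSketch(url, canonicalize) {
  const text = String(url || '').trim();
  if (!text) throw new Error('url is required');
  try {
    // Tier 1: policy-driven canonical form.
    return canonicalize(text).canonical;
  } catch {
    try {
      // Tier 2: the previous behavior, which drops only the #fragment.
      const parsed = new URL(text);
      parsed.hash = '';
      return parsed.toString();
    } catch {
      // Tier 3: not parseable as a URL, so return the trimmed input.
      return text;
    }
  }
}

// A canonicalizer that always fails, to exercise tiers 2 and 3.
const failing = () => { throw new Error('policy unavailable'); };
// A toy canonicalizer (assumption, not the real policy) that drops the query string.
const dropQuery = (t) => ({ canonical: t.split('?')[0] });

const stripped = normalizeJobUrlSketch('https://example.test/jobs/123#apply', failing);
const raw = normalizeJobUrlSketch('  not a url  ', failing);
const viaCanon = normalizeJobUrlSketch('https://example.test/jobs/123?utm_source=x', dropQuery);
console.log(stripped, raw, viaCanon);
```

Keeping the old logic as the second tier means a canonicalization failure degrades to the earlier cache-key behavior instead of throwing.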