
job-forge 2.14.7 → 2.14.8

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (14)

package/.codex/config.toml +5 -0
package/.cursor/mcp.json +13 -0
package/.cursor/rules/main.mdc +20 -142
package/.mcp.json +13 -0
package/AGENTS.md +20 -142
package/CLAUDE.md +20 -142
package/README.md +7 -1
package/docs/SETUP.md +1 -0
package/iso/instructions.md +20 -142
package/iso/mcp.json +9 -0
package/modes/apply.md +17 -9
package/opencode.json +14 -0
package/package.json +6 -3
package/scripts/check-iso-smoke.mjs +43 -0

package/.codex/config.toml CHANGED

@@ -23,3 +23,8 @@ args = ["-y", "@geometra/mcp"]
 command = "npx"
 args = ["-y", "@razroo/gmail-mcp"]
 env = { DISABLE_HTTP = "true" }
+[mcp_servers.state-trace]
+command = "uvx"
+args = ["--from", "state-trace[mcp]", "state-trace-mcp"]
+env = { STATE_TRACE_STORAGE_PATH = ".state-trace/memory.db", STATE_TRACE_NAMESPACE = "job-forge", STATE_TRACE_CAPACITY_LIMIT = "256" }

package/.cursor/mcp.json CHANGED

@@ -16,6 +16,19 @@
       "env": {
         "DISABLE_HTTP": "true"
       }
+    },
+    "state-trace": {
+      "command": "uvx",
+      "args": [
+        "--from",
+        "state-trace[mcp]",
+        "state-trace-mcp"
+      ],
+      "env": {
+        "STATE_TRACE_STORAGE_PATH": ".state-trace/memory.db",
+        "STATE_TRACE_NAMESPACE": "job-forge",
+        "STATE_TRACE_CAPACITY_LIMIT": "256"
+      }
     }
   }
 }
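
Note on the hunk above: the added `state-trace` entry can be sanity-checked before a harness tries to launch it. The following is a minimal sketch, not part of the package. The helpers `check_server` and `command_line` are hypothetical names, and only the entry itself is reproduced because the enclosing keys are not visible in the diff. It assumes environment values should be strings, which matches how `STATE_TRACE_CAPACITY_LIMIT` is written as `"256"`.

```python
import json
import shlex

# The "state-trace" entry as added in this hunk. The enclosing keys are not
# visible in the diff, so only the entry itself is reproduced here.
ENTRY = json.loads("""
{
  "command": "uvx",
  "args": ["--from", "state-trace[mcp]", "state-trace-mcp"],
  "env": {
    "STATE_TRACE_STORAGE_PATH": ".state-trace/memory.db",
    "STATE_TRACE_NAMESPACE": "job-forge",
    "STATE_TRACE_CAPACITY_LIMIT": "256"
  }
}
""")


def check_server(entry):
    """Hypothetical validator: return a list of problems in one server entry."""
    problems = []
    if not isinstance(entry.get("command"), str):
        problems.append("command must be a string")
    if not all(isinstance(arg, str) for arg in entry.get("args", [])):
        problems.append("args must all be strings")
    for key, value in entry.get("env", {}).items():
        if not isinstance(value, str):
            problems.append(f"env {key} must be a string")
    return problems


def command_line(entry):
    """Render the launch command with shell-safe quoting."""
    return shlex.join([entry["command"], *entry["args"]])


print(check_server(ENTRY))
print(command_line(ENTRY))
```

The rendered command quotes `state-trace[mcp]`, which matters when the command is pasted into a shell that treats square brackets as a glob pattern (zsh, for example).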

package/.cursor/rules/main.mdc CHANGED

@@ -16,7 +16,7 @@ AI-powered job search pipeline: scans portals, evaluates offers, generates CVs v
   why: 2026-04 same-day batch collision — when two batches target the same role, `npx job-forge merge` updates the existing day-file row rather than appending, so grepping day files alone misses earlier-batch applies; merged/*.tsv is the only place the breadcrumb remains
 - [H3] Before every batch of `task` dispatches that will use Geometra, call `geometra_list_sessions` then `geometra_disconnect({closeBrowser: true})`. Every round, no exceptions. Name this cleanup as an explicit "step 0" in your first-response plan for any multi-apply request — it is the most frequently skipped guardrail in practice, and skipping it produces cascade "Not connected" failures on the next dispatch.
-  why: if any prior subagent aborted mid-flow, its Chromium session stays stuck in the MCP pool and the next `geometra_connect` fails with "Not connected"; the disconnect is a no-op when the pool is empty but a poison-cure when it isn't; vocalizing it up-front doubles the odds it actually runs
+  why: aborted subagents can leave Chromium sessions stuck in the MCP pool. Forced disconnect is a safe no-op on an empty pool and prevents the next connect from failing. Naming it up front improves compliance
 - [H4] In multi-job mode, the orchestrator session MUST NOT call `geometra_fill_form`, `geometra_run_actions`, `geometra_pick_listbox_option`, or `geometra_fill_otp` directly. Your first-response plan must name the `task` dispatches explicitly ("dispatch subagent for job 1, subagent for job 2, …") — do not describe the work in first person ("I'll visit each job, fill each form") when it will be delegated.
   why: repeated Geometra calls in the orchestrator bloat the cache prefix — this is the 2026-04 "apply to 20 jobs" 341-msg incident where each turn re-processed 100K+ fresh tokens instead of reading from cache; first-person narration is a leading indicator that the agent is mentally queueing work for itself rather than a subagent
@@ -38,13 +38,10 @@ AI-powered job search pipeline: scans portals, evaluates offers, generates CVs v
 - [D2] Route subagent work by cost tier. `@general-free`: procedural — form-fill, TSV merge, verify, OTP retrieval, portal scan metadata extraction, one-shot structured-field transforms. `@general-paid`: quality-sensitive — offer evaluation narrative Blocks A-F, cover letters, "Why X?" answers, STAR interview stories, LinkedIn outreach. `@glm-minimal`: narrow ≤5K-input one-shot extract/classify jobs that do not need context.
   why: GLM 5.1 doesn't discount cache reads so procedural work there costs ~10×; free-tier models handle procedural work fine empirically (`opencode/big-pickle` processed 1000+ messages at $0)
-- [D3] Upgrade `apply` routing to `@general-paid` when offer score ≥ 4.0/5, when user flags "top-tier / dream job / high-stakes", or when late-stage pipeline (post-screens).
-  why: form-fill flows are 6+ steps; free-tier sometimes aborts mid-flow on large Greenhouse/Workday schemas; paid tier has more headroom
+- [D3] Read the active mode file before dispatch. Mode files own score gates, provider fallback, portal runbooks, and output shape.
+  why: mode-specific rules change faster than global orchestration rules; keeping them out of the shared prefix preserves cache efficiency and prevents stale branches
-- [D3f] **Provider-failure downgrade on `apply` (all harnesses; OpenCode + OpenRouter especially):** If you dispatched `@general-paid` per [D3] and that subagent fails or exhausts retries with provider-side errors — copy mentioning Venice / Diem / Chutes, "insufficient" USD/credits/funds/balance, HTTP 402/429, overload / temporarily unavailable — re-dispatch the **same apply URL** once on `@general-free` before marking FAILED. Do not abandon the role solely because the upgraded tier hit a pool-specific limit.
-  why: `@general-paid` on OpenCode still uses free OpenRouter model ids; Venice-style balance errors are a backend-route issue, not proof that procedural `@general-free` cannot complete the same Greenhouse-style flow after [D5]/[H2] gates pass
-- [D4] Auto-submit for offers scoring 3.0+/5 without pausing for confirmation between steps — scan → evaluate → apply is one continuous pipeline. Mark SKIP for <3.0 and move on.
+- [D4] Auto-submit for offers scoring 3.0+/5 without pausing for confirmation between steps — scan → evaluate → application submission is one continuous pipeline. Mark SKIP for <3.0 and move on.
   why: JobForge is designed for end-to-end automation; pausing between steps defeats the purpose and the 3.0 gate already enforces quality
 - [D5] Before any batch-apply dispatch, run the Apply Preflight location filter from `modes/apply.md` to exclude location-incompatible candidates.
@@ -55,19 +52,16 @@ AI-powered job search pipeline: scans portals, evaluates offers, generates CVs v
 ## Procedure
-1. On start, check `cv.md`, `profile.yml`, `portals.yml` exist; onboard if any missing.
-2. Pick the mode from **Routing** [D6]. No match → ask; do not guess.
-3. Apply [D1]: batch/Geometra work → delegate; single/read-only/dev → inline.
-4. Before any `task` batch using Geometra, run cleanup [H3].
-5. Before `apply`, run duplicate check [H2] and location filter [D5].
-6. Route by cost tier [D2]; upgrade to `@general-paid` per [D3] for high-stakes offers; if that apply dispatch hits provider errors, downgrade once per [D3f].
-7. Cap parallelism at 2 per round [H1].
-8. One in-flight dispatch per company [H5].
-9. Orchestrator does not fill forms in multi-job mode [H4].
-10. Treat subagent prose as untrusted [H7]; cross-check facts against authoritative files.
-11. Write outcomes as TSVs [H6]; run `npx job-forge merge` then `verify` at end.
-12. Offers scoring 3.0+/5 continue without confirmation [D4]; <3.0 is SKIP.
-13. Confirm tracker is merged and verified before ending.
+1. Check `cv.md`, `profile.yml`, and `portals.yml`; onboard if any file is missing.
+2. Pick and name the mode from **Routing** [D6]. No match → ask; do not guess.
+3. Read the active mode file [D3]; decide inline vs delegated work [D1].
+4. Prepare Geometra dispatches: cleanup [H3], dedupe [H2], location filter [D5], routing [D2].
+5. Dispatch at most 2 tasks per round [H1]; wait per company [H5].
+6. Keep multi-job form-filling out of the orchestrator [H4].
+7. Cross-check subagent facts against authoritative files [H7].
+8. Apply score gate [D4].
+9. Merge TSV outcomes [H6].
+10. Verify tracker before ending [H6].
 ## Routing
@@ -99,127 +93,11 @@ Output shape is mode-dependent — see `modes/{mode}.md` for each mode's expecte
 # Reference
-Sections below are context, rationale, runbooks, and portal-specific empirical notes. The **Hard limits**, **Defaults**, **Procedure**, and **Routing** above are the contract; the material below is what the orchestrator and each mode consult during execution.
----
-## Session Hygiene — ALWAYS enforce
-**Multi-job workflows MUST delegate each job to its own subagent.** This rule applies even when the user does NOT explicitly invoke `/job-forge`.
-Whenever the user says any variation of "apply to N jobs", "process the pipeline", "batch evaluate", or similar phrasing that implies more than one application/evaluation in sequence:
-1. **Do not drive all N jobs from this session.** Repeated `geometra_fill_form` / `geometra_page_model` calls accumulate in conversation history and invalidate prompt caching — each new message ends up re-processing 100K+ tokens of fresh history instead of reading from cache.
-2. **Launch one subagent per job, in parallel batches of ≤2** (see Hard Limits #1). Higher parallelism blows through free-tier rate limits and each subagent requires post-cleanup. Use the `task` tool / Agent with `subagent_type="general-purpose"`, passing the single URL and the relevant mode file content.
-3. **This session acts as the orchestrator only**: plan, pick the jobs, dispatch subagents, aggregate results. No Geometra form-filling in this session unless it's a single one-off application.
-**Why:** observed on a real run — a 341-msg "apply to 20 jobs" session had `cache_read ~1.8K` on 5 messages where input ballooned to 100K-144K tokens. A 40-msg orchestrator session that delegates instead stays under 40K input max with cache reads at full 100K+. Same work, ~5× fewer effective tokens.
-**Verify after running:** `npx job-forge tokens --session <id>` — any message with `cache_read < 5K` and `input > 50K` is a cache-bust; next time split that work across subagents.
-**Exception:** evaluation-only or tracker-only work (no Geometra, no repeated tool calls) can proceed in a single session. The rule targets tool-heavy multi-step loops.
-**Before any batch-apply dispatch, run the Apply Preflight location filter from `modes/apply.md`** to exclude location-incompatible candidates. Catches the common case where an evaluated role has the right role-shape but a deal-breaking location that profile.yml already rules out.
----
-## Subagent Routing — which agent for which task
-The harness ships three subagents (see `.opencode/agents/`). The orchestrator MUST route work by cost tier, not pick the default for everything. **GLM 5.1 does not discount cache reads**, so running procedural work on it costs ~10× what it would on a cache-discounting model. Free-tier models handle procedural work fine (confirmed empirically: `opencode/big-pickle` processed 1000+ messages at $0 in prior runs).
-| Task type | Subagent | Why |
-|-----------|----------|-----|
-| Drive Geometra form-fill / submit (atomic `run_actions`) | `@general-free` | Procedural; label-driven; deterministic |
-| Merge TSVs, run `verify-pipeline.mjs`, dedup | `@general-free` | Script-driven; no writing quality needed |
-| OTP retrieval via Gmail MCP + `geometra_fill_otp` | `@general-free` | Fixed-shape lookup + input |
-| Scan portals, extract offer metadata, return structured records (see schema below) | `@general-free` | Structured output; no judgment |
-| Evaluation narrative — Blocks A-F per `modes/offer.md` | `@general-paid` | Judgment + writing quality |
-| Cover letter, "Why X?" answers, Section G drafts | `@general-paid` | Tone and specificity matter |
-| STAR+R interview stories, story-bank curation | `@general-paid` | Quality signals seniority |
-| LinkedIn outreach messages (`modes/contact.md`) | `@general-paid` | First impression |
-| "Extract N fields from this text → JSON" (≤5K input) | `@glm-minimal` | One-shot transform; no context needed |
-| "Classify this JD as archetype X/Y/Z" | `@glm-minimal` | Narrow, structured output |
-**Example JSON shape for the "extract / emit JSON" subagent rows above** (use this exact key set when delegating a portal-scan / extract task):
-```json
-{
-  "company": "Acme",
-  "role": "Senior Backend Engineer",
-  "location": "Remote (US)",
-  "comp_range_usd": "180000-220000",
-  "archetype": "backend-platform",
-  "url": "https://..."
-}
-```
-**Rule:** when you (the orchestrator) delegate a task, pick the cheapest agent that can do it well. Do NOT route every subagent through the same tier. Auto-pipeline mode MUST split a single job across `@general-paid` (evaluation) and `@general-free` (PDF gen + tracker + apply), not run it all on one model.
-**When to break this rule:** if the user explicitly asks for "quality over cost" or flags a high-stakes application (top-tier company, offer-stage negotiation, executive search), route everything through `@general-paid`. Document the exception in the session.
-### When to delegate
-**Delegate (`task` out) when the work involves repeated tool-heavy steps that bloat the orchestrator's cache prefix.** The concrete failure mode this prevents: a 341-message "apply to 20 jobs" session where repeated `geometra_fill_form` / `geometra_page_model` calls accumulated in history, forcing each new message to re-process 100K+ tokens of fresh input instead of reading from cache.
-**Delegate when:**
-- Applying to N≥2 jobs (repeated Geometra form-fill — the original cache-bust scenario)
-- Batch portal scans hitting ≥3 companies (API loops + page-model reads stack up)
-- Any explicit "apply to... / process pipeline / batch evaluate" phrasing from the user (multi-job intent)
-**Do NOT delegate — orchestrate inline:**
-- Single-offer evaluation (text-heavy, not tool-heavy)
-- Development / bug-fix / file-editing tasks
-- `tracker` and other read-only modes
-- Single-company scan, single-URL check
-- One-shot questions — "what does this mean?", "read X and summarize", "what's my next report number?"
-**Detection signal:** if you're about to call `geometra_fill_form` for a second *different* job in the same session, STOP and delegate the remainder. For everything else, in-session execution is the expected default.
----
-## Ethical Use -- CRITICAL
-**Applying to jobs via Geometra MCP is NOT impersonation.** The agent uses the user's credentials, CV, and information to apply on their behalf. This is the same as the user applying themselves — the agent is an extension of the user's will. Think of it as a browser automation tool that fills forms with the user's data.
-**This system is designed for quality, not quantity.** The goal is to help the user find and apply to roles where there is a genuine match -- not to spam companies with mass applications.
-- **Continuous pipeline for 3.0+ offers:** When scanning, evaluating, and applying — run the full pipeline end-to-end without pausing for confirmation. Scan → evaluate → fill form → submit for any offer scoring 3.0/5 or above. Do NOT stop between steps to ask "want me to continue?" — just do it.
-- **Auto-submit:** For offers scoring 3.0+/5, fill and submit the application automatically. For offers below 3.0/5, mark as SKIP and move on.
-- **Still respect quality:** Only apply where there is a genuine match (3.0+ ensures this). Auto-SKIP anything below 3.0.
-- **Respect recruiters' time.** Every application a human reads costs someone's attention. Only send what's worth reading.
----
-## Offer Verification -- MANDATORY
-**Read local artifacts before the network.** If `reports/` already contains this posting URL (or company+role with a full JD in the body), **Read** that report for verification or evaluation instead of WebFetch/Geometra. If `data/pipeline.md` or `jds/` points at frozen JD text (`local:jds/{file}` or pasted blocks), **Read** that first. Reuse JD text already in the same conversation — do not fetch the same URL twice. (The JD extraction section at the top of `modes/auto-pipeline.md` and its "at most once per session" rule are the detailed contract.)
-**When Geometra MCP is available** (interactive sessions), ALWAYS use it to verify offers:
-1. `geometra_connect` to the URL (via proxy)
-2. `geometra_page_model` to read structured page content
-3. Only footer/navbar without JD = closed. Title + description + Apply = active.
-**When Geometra MCP is NOT available** (batch workers via `opencode run`, headless environments):
-1. Use WebFetch to retrieve the page content
-2. Check for JD text, job title, and apply button/link in the response
-3. If WebFetch returns only a shell/navbar (no JD content), mark the offer as `**Verification: unconfirmed**` in the report header
-4. Do NOT skip the evaluation — proceed but flag the uncertainty so the user can verify manually before applying
-The goal is to never waste time on closed offers, but also never silently assume a role is active when verification was incomplete.
-### Canonical MCP tools (quick reference)
-Pick tools by name directly — reduces unnecessary tool discovery:
-| Task | Preferred tools |
-|------|------------------|
-| JD from URL | Greenhouse boards API when the URL matches (see JD extraction in `modes/auto-pipeline.md`) → else `geometra_connect` + `geometra_page_model` → else WebFetch → WebSearch last |
-| Offer still live? | Same as JD when Geometra is available; else WebFetch per above |
-| One apply subagent (single job) | One `geometra_connect` per job URL; reuse `sessionId` through schema + fill; submit via atomic `geometra_run_actions` per `modes/apply.md` [H1]. Do **not** `geometra_disconnect` between `geometra_form_schema` and submit on the same form unless recovery requires it |
-| Chromium pool between orchestrator dispatch rounds | `geometra_list_sessions` + `geometra_disconnect({ closeBrowser: true })` per Hard limit [H3] — orchestrator-only; not a substitute for finishing the in-subagent form flow |
-@modes/reference-setup.md
+The sections above are the shared contract. Load detailed context on demand:
-@modes/reference-portals.md
+- `modes/{mode}.md` for the active mode procedure, output shape, and mode-specific routing.
+- `modes/reference-setup.md` for onboarding, tracker layout, states, and profile/CV setup.
+- `modes/reference-portals.md` for OTP, residential proxy, and MCP configuration.
+- `modes/reference-geometra.md` for form-fill patterns, portal failures, cleanup runbooks, and session recovery.
-@modes/reference-geometra.md
+Do not pre-load all reference files. Read only the active mode file and the reference file needed for the current blocker.
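
Note on the condensed Procedure in the hunk above: steps 5 and 8 combine three constraints, namely at most 2 tasks per round [H1], one in-flight dispatch per company [H5], and the 3.0 score gate [D4]. The sketch below shows one way these could interact when planning rounds. It is an illustration under stated assumptions, not the package's implementation: `plan_rounds` and the offer keys `company`, `url`, and `score` are hypothetical, and rounds are assumed to run one after another, so an offer deferred to a later round never overlaps with its company's earlier dispatch.

```python
from collections import deque

MAX_PER_ROUND = 2   # [H1]: dispatch at most 2 tasks per round
SCORE_GATE = 3.0    # [D4]: offers scoring below this are marked SKIP


def plan_rounds(offers):
    """Group offers into sequential dispatch rounds.

    offers: list of dicts with hypothetical keys "company", "url", "score".
    Returns (rounds, skipped), where rounds is a list of lists of offers.
    """
    skipped = [o for o in offers if o["score"] < SCORE_GATE]
    pending = deque(o for o in offers if o["score"] >= SCORE_GATE)
    rounds = []
    while pending:
        current, companies, deferred = [], set(), deque()
        while pending and len(current) < MAX_PER_ROUND:
            offer = pending.popleft()
            if offer["company"] in companies:
                # [H5]: a second offer for the same company waits for a later round
                deferred.append(offer)
            else:
                current.append(offer)
                companies.add(offer["company"])
        # Deferred offers go to the front so they are not starved.
        pending = deferred + pending
        rounds.append(current)
    return rounds, skipped


offers = [
    {"company": "acme", "url": "https://example.com/a", "score": 4.2},
    {"company": "acme", "url": "https://example.com/b", "score": 3.5},
    {"company": "globex", "url": "https://example.com/c", "score": 2.9},
    {"company": "initech", "url": "https://example.com/d", "score": 3.0},
]
rounds, skipped = plan_rounds(offers)
for number, batch in enumerate(rounds, start=1):
    print(number, [o["url"] for o in batch])
print("SKIP", [o["url"] for o in skipped])
```

In a real run, the cleanup from [H3] would happen before each round is dispatched; that step is left out here because it depends on the Geometra MCP tools.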

package/.mcp.json CHANGED

@@ -16,6 +16,19 @@
       "env": {
         "DISABLE_HTTP": "true"
       }
+    },
+    "state-trace": {
+      "command": "uvx",
+      "args": [
+        "--from",
+        "state-trace[mcp]",
+        "state-trace-mcp"
+      ],
+      "env": {
+        "STATE_TRACE_STORAGE_PATH": ".state-trace/memory.db",
+        "STATE_TRACE_NAMESPACE": "job-forge",
+        "STATE_TRACE_CAPACITY_LIMIT": "256"
+      }
     }
   }
 }

package/AGENTS.md CHANGED

@@ -11,7 +11,7 @@ AI-powered job search pipeline: scans portals, evaluates offers, generates CVs v
   why: 2026-04 same-day batch collision — when two batches target the same role, `npx job-forge merge` updates the existing day-file row rather than appending, so grepping day files alone misses earlier-batch applies; merged/*.tsv is the only place the breadcrumb remains
 - [H3] Before every batch of `task` dispatches that will use Geometra, call `geometra_list_sessions` then `geometra_disconnect({closeBrowser: true})`. Every round, no exceptions. Name this cleanup as an explicit "step 0" in your first-response plan for any multi-apply request — it is the most frequently skipped guardrail in practice, and skipping it produces cascade "Not connected" failures on the next dispatch.
-  why: if any prior subagent aborted mid-flow, its Chromium session stays stuck in the MCP pool and the next `geometra_connect` fails with "Not connected"; the disconnect is a no-op when the pool is empty but a poison-cure when it isn't; vocalizing it up-front doubles the odds it actually runs
+  why: aborted subagents can leave Chromium sessions stuck in the MCP pool. Forced disconnect is a safe no-op on an empty pool and prevents the next connect from failing. Naming it up front improves compliance
 - [H4] In multi-job mode, the orchestrator session MUST NOT call `geometra_fill_form`, `geometra_run_actions`, `geometra_pick_listbox_option`, or `geometra_fill_otp` directly. Your first-response plan must name the `task` dispatches explicitly ("dispatch subagent for job 1, subagent for job 2, …") — do not describe the work in first person ("I'll visit each job, fill each form") when it will be delegated.
   why: repeated Geometra calls in the orchestrator bloat the cache prefix — this is the 2026-04 "apply to 20 jobs" 341-msg incident where each turn re-processed 100K+ fresh tokens instead of reading from cache; first-person narration is a leading indicator that the agent is mentally queueing work for itself rather than a subagent
@@ -33,13 +33,10 @@ AI-powered job search pipeline: scans portals, evaluates offers, generates CVs v
 - [D2] Route subagent work by cost tier. `@general-free`: procedural — form-fill, TSV merge, verify, OTP retrieval, portal scan metadata extraction, one-shot structured-field transforms. `@general-paid`: quality-sensitive — offer evaluation narrative Blocks A-F, cover letters, "Why X?" answers, STAR interview stories, LinkedIn outreach. `@glm-minimal`: narrow ≤5K-input one-shot extract/classify jobs that do not need context.
   why: GLM 5.1 doesn't discount cache reads so procedural work there costs ~10×; free-tier models handle procedural work fine empirically (`opencode/big-pickle` processed 1000+ messages at $0)
-- [D3] Upgrade `apply` routing to `@general-paid` when offer score ≥ 4.0/5, when user flags "top-tier / dream job / high-stakes", or when late-stage pipeline (post-screens).
-  why: form-fill flows are 6+ steps; free-tier sometimes aborts mid-flow on large Greenhouse/Workday schemas; paid tier has more headroom
+- [D3] Read the active mode file before dispatch. Mode files own score gates, provider fallback, portal runbooks, and output shape.
+  why: mode-specific rules change faster than global orchestration rules; keeping them out of the shared prefix preserves cache efficiency and prevents stale branches
-- [D3f] **Provider-failure downgrade on `apply` (all harnesses; OpenCode + OpenRouter especially):** If you dispatched `@general-paid` per [D3] and that subagent fails or exhausts retries with provider-side errors — copy mentioning Venice / Diem / Chutes, "insufficient" USD/credits/funds/balance, HTTP 402/429, overload / temporarily unavailable — re-dispatch the **same apply URL** once on `@general-free` before marking FAILED. Do not abandon the role solely because the upgraded tier hit a pool-specific limit.
-  why: `@general-paid` on OpenCode still uses free OpenRouter model ids; Venice-style balance errors are a backend-route issue, not proof that procedural `@general-free` cannot complete the same Greenhouse-style flow after [D5]/[H2] gates pass
-- [D4] Auto-submit for offers scoring 3.0+/5 without pausing for confirmation between steps — scan → evaluate → apply is one continuous pipeline. Mark SKIP for <3.0 and move on.
+- [D4] Auto-submit for offers scoring 3.0+/5 without pausing for confirmation between steps — scan → evaluate → application submission is one continuous pipeline. Mark SKIP for <3.0 and move on.
   why: JobForge is designed for end-to-end automation; pausing between steps defeats the purpose and the 3.0 gate already enforces quality
 - [D5] Before any batch-apply dispatch, run the Apply Preflight location filter from `modes/apply.md` to exclude location-incompatible candidates.
@@ -50,19 +47,16 @@ AI-powered job search pipeline: scans portals, evaluates offers, generates CVs v
 ## Procedure
-1. On start, check `cv.md`, `profile.yml`, `portals.yml` exist; onboard if any missing.
-2. Pick the mode from **Routing** [D6]. No match → ask; do not guess.
-3. Apply [D1]: batch/Geometra work → delegate; single/read-only/dev → inline.
-4. Before any `task` batch using Geometra, run cleanup [H3].
-5. Before `apply`, run duplicate check [H2] and location filter [D5].
-6. Route by cost tier [D2]; upgrade to `@general-paid` per [D3] for high-stakes offers; if that apply dispatch hits provider errors, downgrade once per [D3f].
-7. Cap parallelism at 2 per round [H1].
-8. One in-flight dispatch per company [H5].
-9. Orchestrator does not fill forms in multi-job mode [H4].
-10. Treat subagent prose as untrusted [H7]; cross-check facts against authoritative files.
-11. Write outcomes as TSVs [H6]; run `npx job-forge merge` then `verify` at end.
-12. Offers scoring 3.0+/5 continue without confirmation [D4]; <3.0 is SKIP.
-13. Confirm tracker is merged and verified before ending.
+1. Check `cv.md`, `profile.yml`, and `portals.yml`; onboard if any file is missing.
+2. Pick and name the mode from **Routing** [D6]. No match → ask; do not guess.
+3. Read the active mode file [D3]; decide inline vs delegated work [D1].
+4. Prepare Geometra dispatches: cleanup [H3], dedupe [H2], location filter [D5], routing [D2].
+5. Dispatch at most 2 tasks per round [H1]; wait per company [H5].
+6. Keep multi-job form-filling out of the orchestrator [H4].
+7. Cross-check subagent facts against authoritative files [H7].
+8. Apply score gate [D4].
+9. Merge TSV outcomes [H6].
+10. Verify tracker before ending [H6].
 ## Routing
@@ -94,127 +88,11 @@ Output shape is mode-dependent — see `modes/{mode}.md` for each mode's expecte
 # Reference
-Sections below are context, rationale, runbooks, and portal-specific empirical notes. The **Hard limits**, **Defaults**, **Procedure**, and **Routing** above are the contract; the material below is what the orchestrator and each mode consult during execution.
----
-## Session Hygiene — ALWAYS enforce
-**Multi-job workflows MUST delegate each job to its own subagent.** This rule applies even when the user does NOT explicitly invoke `/job-forge`.
-Whenever the user says any variation of "apply to N jobs", "process the pipeline", "batch evaluate", or similar phrasing that implies more than one application/evaluation in sequence:
-1. **Do not drive all N jobs from this session.** Repeated `geometra_fill_form` / `geometra_page_model` calls accumulate in conversation history and invalidate prompt caching — each new message ends up re-processing 100K+ tokens of fresh history instead of reading from cache.
-2. **Launch one subagent per job, in parallel batches of ≤2** (see Hard Limits #1). Higher parallelism blows through free-tier rate limits and each subagent requires post-cleanup. Use the `task` tool / Agent with `subagent_type="general-purpose"`, passing the single URL and the relevant mode file content.
-3. **This session acts as the orchestrator only**: plan, pick the jobs, dispatch subagents, aggregate results. No Geometra form-filling in this session unless it's a single one-off application.
-**Why:** observed on a real run — a 341-msg "apply to 20 jobs" session had `cache_read ~1.8K` on 5 messages where input ballooned to 100K-144K tokens. A 40-msg orchestrator session that delegates instead stays under 40K input max with cache reads at full 100K+. Same work, ~5× fewer effective tokens.
-**Verify after running:** `npx job-forge tokens --session <id>` — any message with `cache_read < 5K` and `input > 50K` is a cache-bust; next time split that work across subagents.
-**Exception:** evaluation-only or tracker-only work (no Geometra, no repeated tool calls) can proceed in a single session. The rule targets tool-heavy multi-step loops.
-**Before any batch-apply dispatch, run the Apply Preflight location filter from `modes/apply.md`** to exclude location-incompatible candidates. Catches the common case where an evaluated role has the right role-shape but a deal-breaking location that profile.yml already rules out.
----
-## Subagent Routing — which agent for which task
-The harness ships three subagents (see `.opencode/agents/`). The orchestrator MUST route work by cost tier, not pick the default for everything. **GLM 5.1 does not discount cache reads**, so running procedural work on it costs ~10× what it would on a cache-discounting model. Free-tier models handle procedural work fine (confirmed empirically: `opencode/big-pickle` processed 1000+ messages at $0 in prior runs).
-| Task type | Subagent | Why |
-|-----------|----------|-----|
-| Drive Geometra form-fill / submit (atomic `run_actions`) | `@general-free` | Procedural; label-driven; deterministic |
-| Merge TSVs, run `verify-pipeline.mjs`, dedup | `@general-free` | Script-driven; no writing quality needed |
-| OTP retrieval via Gmail MCP + `geometra_fill_otp` | `@general-free` | Fixed-shape lookup + input |
-| Scan portals, extract offer metadata, return structured records (see schema below) | `@general-free` | Structured output; no judgment |
-| Evaluation narrative — Blocks A-F per `modes/offer.md` | `@general-paid` | Judgment + writing quality |
-| Cover letter, "Why X?" answers, Section G drafts | `@general-paid` | Tone and specificity matter |
-| STAR+R interview stories, story-bank curation | `@general-paid` | Quality signals seniority |
-| LinkedIn outreach messages (`modes/contact.md`) | `@general-paid` | First impression |
-| "Extract N fields from this text → JSON" (≤5K input) | `@glm-minimal` | One-shot transform; no context needed |
-| "Classify this JD as archetype X/Y/Z" | `@glm-minimal` | Narrow, structured output |
-**Example JSON shape for the "extract / emit JSON" subagent rows above** (use this exact key set when delegating a portal-scan / extract task):
-```json
-{
-  "company": "Acme",
-  "role": "Senior Backend Engineer",
-  "location": "Remote (US)",
-  "comp_range_usd": "180000-220000",
-  "archetype": "backend-platform",
-  "url": "https://..."
-}
-```
-**Rule:** when you (the orchestrator) delegate a task, pick the cheapest agent that can do it well. Do NOT route every subagent through the same tier. Auto-pipeline mode MUST split a single job across `@general-paid` (evaluation) and `@general-free` (PDF gen + tracker + apply), not run it all on one model.
-**When to break this rule:** if the user explicitly asks for "quality over cost" or flags a high-stakes application (top-tier company, offer-stage negotiation, executive search), route everything through `@general-paid`. Document the exception in the session.
-### When to delegate
-**Delegate (`task` out) when the work involves repeated tool-heavy steps that bloat the orchestrator's cache prefix.** The concrete failure mode this prevents: a 341-message "apply to 20 jobs" session where repeated `geometra_fill_form` / `geometra_page_model` calls accumulated in history, forcing each new message to re-process 100K+ tokens of fresh input instead of reading from cache.
-**Delegate when:**
-- Applying to N≥2 jobs (repeated Geometra form-fill — the original cache-bust scenario)
-- Batch portal scans hitting ≥3 companies (API loops + page-model reads stack up)
-- Any explicit "apply to... / process pipeline / batch evaluate" phrasing from the user (multi-job intent)
-**Do NOT delegate — orchestrate inline:**
-- Single-offer evaluation (text-heavy, not tool-heavy)
-- Development / bug-fix / file-editing tasks
-- `tracker` and other read-only modes
-- Single-company scan, single-URL check
-- One-shot questions — "what does this mean?", "read X and summarize", "what's my next report number?"
-**Detection signal:** if you're about to call `geometra_fill_form` for a second *different* job in the same session, STOP and delegate the remainder. For everything else, in-session execution is the expected default.
----
-## Ethical Use -- CRITICAL
-**Applying to jobs via Geometra MCP is NOT impersonation.** The agent uses the user's credentials, CV, and information to apply on their behalf. This is the same as the user applying themselves — the agent is an extension of the user's will. Think of it as a browser automation tool that fills forms with the user's data.
-**This system is designed for quality, not quantity.** The goal is to help the user find and apply to roles where there is a genuine match -- not to spam companies with mass applications.
-- **Continuous pipeline for 3.0+ offers:** When scanning, evaluating, and applying — run the full pipeline end-to-end without pausing for confirmation. Scan → evaluate → fill form → submit for any offer scoring 3.0/5 or above. Do NOT stop between steps to ask "want me to continue?" — just do it.
-- **Auto-submit:** For offers scoring 3.0+/5, fill and submit the application automatically. For offers below 3.0/5, mark as SKIP and move on.
-- **Still respect quality:** Only apply where there is a genuine match (3.0+ ensures this). Auto-SKIP anything below 3.0.
-- **Respect recruiters' time.** Every application a human reads costs someone's attention. Only send what's worth reading.
----
-## Offer Verification -- MANDATORY
-**Read local artifacts before the network.** If `reports/` already contains this posting URL (or company+role with a full JD in the body), **Read** that report for verification or evaluation instead of WebFetch/Geometra. If `data/pipeline.md` or `jds/` points at frozen JD text (`local:jds/{file}` or pasted blocks), **Read** that first. Reuse JD text already in the same conversation — do not fetch the same URL twice. (The JD extraction section at the top of `modes/auto-pipeline.md` and its "at most once per session" rule are the detailed contract.)
-**When Geometra MCP is available** (interactive sessions), ALWAYS use it to verify offers:
-1. `geometra_connect` to the URL (via proxy)
-2. `geometra_page_model` to read structured page content
-3. Only footer/navbar without JD = closed. Title + description + Apply = active.
-**When Geometra MCP is NOT available** (batch workers via `opencode run`, headless environments):
-1. Use WebFetch to retrieve the page content
-2. Check for JD text, job title, and apply button/link in the response
-3. If WebFetch returns only a shell/navbar (no JD content), mark the offer as `**Verification: unconfirmed**` in the report header
-4. Do NOT skip the evaluation — proceed but flag the uncertainty so the user can verify manually before applying
-The goal is to never waste time on closed offers, but also never silently assume a role is active when verification was incomplete.
-### Canonical MCP tools (quick reference)
-Pick tools by name directly — reduces unnecessary tool discovery:
-| Task | Preferred tools |
-|------|------------------|
-| JD from URL | Greenhouse boards API when the URL matches (see JD extraction in `modes/auto-pipeline.md`) → else `geometra_connect` + `geometra_page_model` → else WebFetch → WebSearch last |
-| Offer still live? | Same as JD when Geometra is available; else WebFetch per above |
-| One apply subagent (single job) | One `geometra_connect` per job URL; reuse `sessionId` through schema + fill; submit via atomic `geometra_run_actions` per `modes/apply.md` [H1]. Do **not** `geometra_disconnect` between `geometra_form_schema` and submit on the same form unless recovery requires it |
-| Chromium pool between orchestrator dispatch rounds | `geometra_list_sessions` + `geometra_disconnect({ closeBrowser: true })` per Hard limit [H3] — orchestrator-only; not a substitute for finishing the in-subagent form flow |
-@modes/reference-setup.md
+The sections above are the shared contract. Load detailed context on demand:
-@modes/reference-portals.md
+- `modes/{mode}.md` for the active mode procedure, output shape, and mode-specific routing.
+- `modes/reference-setup.md` for onboarding, tracker layout, states, and profile/CV setup.
+- `modes/reference-portals.md` for OTP, residential proxy, and MCP configuration.
+- `modes/reference-geometra.md` for form-fill patterns, portal failures, cleanup runbooks, and session recovery.
-@modes/reference-geometra.md
+Do not pre-load all reference files. Read only the active mode file and the reference file needed for the current blocker.
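
The on-demand loading rule in the added lines above can be pictured as a small lookup. This is a minimal sketch, not code shipped in the package: the function name `contextFilesFor` and the blocker keys (`setup`, `portals`, `geometra`) are hypothetical labels chosen for illustration; only the file paths and their described contents come from the diff.

```javascript
// Hypothetical sketch of the on-demand context rule; not part of job-forge.
// Blocker keys are illustrative assumptions; file paths are taken from the diff.
const REFERENCE_BY_BLOCKER = new Map([
  ["setup", "modes/reference-setup.md"],       // onboarding, tracker layout, states, profile/CV setup
  ["portals", "modes/reference-portals.md"],   // OTP, residential proxy, MCP configuration
  ["geometra", "modes/reference-geometra.md"], // form-fill patterns, portal failures, cleanup, session recovery
]);

function contextFilesFor(mode, blocker) {
  // The active mode file is always read.
  const files = [`modes/${mode}.md`];
  // At most one reference file is added, and only when a blocker calls for it.
  if (blocker !== undefined) {
    const ref = REFERENCE_BY_BLOCKER.get(blocker);
    if (ref === undefined) throw new Error(`unknown blocker: ${blocker}`);
    files.push(ref);
  }
  return files;
}

console.log(contextFilesFor("apply", "geometra"));
```

Returning a list of at most two paths mirrors the instruction to read only the active mode file plus the one reference file needed for the current blocker.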

package/CLAUDE.md CHANGED

@@ -11,7 +11,7 @@ AI-powered job search pipeline: scans portals, evaluates offers, generates CVs v
   why: 2026-04 same-day batch collision — when two batches target the same role, `npx job-forge merge` updates the existing day-file row rather than appending, so grepping day files alone misses earlier-batch applies; merged/*.tsv is the only place the breadcrumb remains
 - [H3] Before every batch of `task` dispatches that will use Geometra, call `geometra_list_sessions` then `geometra_disconnect({closeBrowser: true})`. Every round, no exceptions. Name this cleanup as an explicit "step 0" in your first-response plan for any multi-apply request — it is the most frequently skipped guardrail in practice, and skipping it produces cascade "Not connected" failures on the next dispatch.
-  why: if any prior subagent aborted mid-flow, its Chromium session stays stuck in the MCP pool and the next `geometra_connect` fails with "Not connected"; the disconnect is a no-op when the pool is empty but a poison-cure when it isn't; vocalizing it up-front doubles the odds it actually runs
+  why: aborted subagents can leave Chromium sessions stuck in the MCP pool. Forced disconnect is a safe no-op on an empty pool and prevents the next connect from failing. Naming it up front improves compliance
 - [H4] In multi-job mode, the orchestrator session MUST NOT call `geometra_fill_form`, `geometra_run_actions`, `geometra_pick_listbox_option`, or `geometra_fill_otp` directly. Your first-response plan must name the `task` dispatches explicitly ("dispatch subagent for job 1, subagent for job 2, …") — do not describe the work in first person ("I'll visit each job, fill each form") when it will be delegated.
   why: repeated Geometra calls in the orchestrator bloat the cache prefix — this is the 2026-04 "apply to 20 jobs" 341-msg incident where each turn re-processed 100K+ fresh tokens instead of reading from cache; first-person narration is a leading indicator that the agent is mentally queueing work for itself rather than a subagent
@@ -33,13 +33,10 @@ AI-powered job search pipeline: scans portals, evaluates offers, generates CVs v
 - [D2] Route subagent work by cost tier. `@general-free`: procedural — form-fill, TSV merge, verify, OTP retrieval, portal scan metadata extraction, one-shot structured-field transforms. `@general-paid`: quality-sensitive — offer evaluation narrative Blocks A-F, cover letters, "Why X?" answers, STAR interview stories, LinkedIn outreach. `@glm-minimal`: narrow ≤5K-input one-shot extract/classify jobs that do not need context.
   why: GLM 5.1 doesn't discount cache reads so procedural work there costs ~10×; free-tier models handle procedural work fine empirically (`opencode/big-pickle` processed 1000+ messages at $0)
-- [D3] Upgrade `apply` routing to `@general-paid` when offer score ≥ 4.0/5, when user flags "top-tier / dream job / high-stakes", or when late-stage pipeline (post-screens).
-  why: form-fill flows are 6+ steps; free-tier sometimes aborts mid-flow on large Greenhouse/Workday schemas; paid tier has more headroom
+- [D3] Read the active mode file before dispatch. Mode files own score gates, provider fallback, portal runbooks, and output shape.
+  why: mode-specific rules change faster than global orchestration rules; keeping them out of the shared prefix preserves cache efficiency and prevents stale branches
-- [D3f] **Provider-failure downgrade on `apply` (all harnesses; OpenCode + OpenRouter especially):** If you dispatched `@general-paid` per [D3] and that subagent fails or exhausts retries with provider-side errors — copy mentioning Venice / Diem / Chutes, "insufficient" USD/credits/funds/balance, HTTP 402/429, overload / temporarily unavailable — re-dispatch the **same apply URL** once on `@general-free` before marking FAILED. Do not abandon the role solely because the upgraded tier hit a pool-specific limit.
-  why: `@general-paid` on OpenCode still uses free OpenRouter model ids; Venice-style balance errors are a backend-route issue, not proof that procedural `@general-free` cannot complete the same Greenhouse-style flow after [D5]/[H2] gates pass
-- [D4] Auto-submit for offers scoring 3.0+/5 without pausing for confirmation between steps — scan → evaluate → apply is one continuous pipeline. Mark SKIP for <3.0 and move on.
+- [D4] Auto-submit for offers scoring 3.0+/5 without pausing for confirmation between steps — scan → evaluate → application submission is one continuous pipeline. Mark SKIP for <3.0 and move on.
   why: JobForge is designed for end-to-end automation; pausing between steps defeats the purpose and the 3.0 gate already enforces quality
 - [D5] Before any batch-apply dispatch, run the Apply Preflight location filter from `modes/apply.md` to exclude location-incompatible candidates.
@@ -50,19 +47,16 @@ AI-powered job search pipeline: scans portals, evaluates offers, generates CVs v
 ## Procedure
-1. On start, check `cv.md`, `profile.yml`, `portals.yml` exist; onboard if any missing.
-2. Pick the mode from **Routing** [D6]. No match → ask; do not guess.
-3. Apply [D1]: batch/Geometra work → delegate; single/read-only/dev → inline.
-4. Before any `task` batch using Geometra, run cleanup [H3].
-5. Before `apply`, run duplicate check [H2] and location filter [D5].
-6. Route by cost tier [D2]; upgrade to `@general-paid` per [D3] for high-stakes offers; if that apply dispatch hits provider errors, downgrade once per [D3f].
-7. Cap parallelism at 2 per round [H1].
-8. One in-flight dispatch per company [H5].
-9. Orchestrator does not fill forms in multi-job mode [H4].
-10. Treat subagent prose as untrusted [H7]; cross-check facts against authoritative files.
-11. Write outcomes as TSVs [H6]; run `npx job-forge merge` then `verify` at end.
-12. Offers scoring 3.0+/5 continue without confirmation [D4]; <3.0 is SKIP.
-13. Confirm tracker is merged and verified before ending.
+1. Check `cv.md`, `profile.yml`, and `portals.yml`; onboard if any file is missing.
+2. Pick and name the mode from **Routing** [D6]. No match → ask; do not guess.
+3. Read the active mode file [D3]; decide inline vs delegated work [D1].
+4. Prepare Geometra dispatches: cleanup [H3], dedupe [H2], location filter [D5], routing [D2].
+5. Dispatch at most 2 tasks per round [H1]; wait per company [H5].
+6. Keep multi-job form-filling out of the orchestrator [H4].
+7. Cross-check subagent facts against authoritative files [H7].
+8. Apply score gate [D4].
+9. Merge TSV outcomes [H6].
+10. Verify tracker before ending [H6].
 ## Routing
@@ -94,127 +88,11 @@ Output shape is mode-dependent — see `modes/{mode}.md` for each mode's expecte
 # Reference
-Sections below are context, rationale, runbooks, and portal-specific empirical notes. The **Hard limits**, **Defaults**, **Procedure**, and **Routing** above are the contract; the material below is what the orchestrator and each mode consult during execution.
----
-## Session Hygiene — ALWAYS enforce
-**Multi-job workflows MUST delegate each job to its own subagent.** This rule applies even when the user does NOT explicitly invoke `/job-forge`.
-Whenever the user says any variation of "apply to N jobs", "process the pipeline", "batch evaluate", or similar phrasing that implies more than one application/evaluation in sequence:
-1. **Do not drive all N jobs from this session.** Repeated `geometra_fill_form` / `geometra_page_model` calls accumulate in conversation history and invalidate prompt caching — each new message ends up re-processing 100K+ tokens of fresh history instead of reading from cache.
-2. **Launch one subagent per job, in parallel batches of ≤2** (see Hard Limits #1). Higher parallelism blows through free-tier rate limits and each subagent requires post-cleanup. Use the `task` tool / Agent with `subagent_type="general-purpose"`, passing the single URL and the relevant mode file content.
-3. **This session acts as the orchestrator only**: plan, pick the jobs, dispatch subagents, aggregate results. No Geometra form-filling in this session unless it's a single one-off application.
-**Why:** observed on a real run — a 341-msg "apply to 20 jobs" session had `cache_read ~1.8K` on 5 messages where input ballooned to 100K-144K tokens. A 40-msg orchestrator session that delegates instead stays under 40K input max with cache reads at full 100K+. Same work, ~5× fewer effective tokens.
-**Verify after running:** `npx job-forge tokens --session <id>` — any message with `cache_read < 5K` and `input > 50K` is a cache-bust; next time split that work across subagents.
-**Exception:** evaluation-only or tracker-only work (no Geometra, no repeated tool calls) can proceed in a single session. The rule targets tool-heavy multi-step loops.
-**Before any batch-apply dispatch, run the Apply Preflight location filter from `modes/apply.md`** to exclude location-incompatible candidates. Catches the common case where an evaluated role has the right role-shape but a deal-breaking location that profile.yml already rules out.
----
-## Subagent Routing — which agent for which task
-The harness ships three subagents (see `.opencode/agents/`). The orchestrator MUST route work by cost tier, not pick the default for everything. **GLM 5.1 does not discount cache reads**, so running procedural work on it costs ~10× what it would on a cache-discounting model. Free-tier models handle procedural work fine (confirmed empirically: `opencode/big-pickle` processed 1000+ messages at $0 in prior runs).
-| Task type | Subagent | Why |
-|-----------|----------|-----|
-| Drive Geometra form-fill / submit (atomic `run_actions`) | `@general-free` | Procedural; label-driven; deterministic |
-| Merge TSVs, run `verify-pipeline.mjs`, dedup | `@general-free` | Script-driven; no writing quality needed |
-| OTP retrieval via Gmail MCP + `geometra_fill_otp` | `@general-free` | Fixed-shape lookup + input |
-| Scan portals, extract offer metadata, return structured records (see schema below) | `@general-free` | Structured output; no judgment |
-| Evaluation narrative — Blocks A-F per `modes/offer.md` | `@general-paid` | Judgment + writing quality |
-| Cover letter, "Why X?" answers, Section G drafts | `@general-paid` | Tone and specificity matter |
-| STAR+R interview stories, story-bank curation | `@general-paid` | Quality signals seniority |
-| LinkedIn outreach messages (`modes/contact.md`) | `@general-paid` | First impression |
-| "Extract N fields from this text → JSON" (≤5K input) | `@glm-minimal` | One-shot transform; no context needed |
-| "Classify this JD as archetype X/Y/Z" | `@glm-minimal` | Narrow, structured output |
-**Example JSON shape for the "extract / emit JSON" subagent rows above** (use this exact key set when delegating a portal-scan / extract task):
-```json
-{
-  "company": "Acme",
-  "role": "Senior Backend Engineer",
-  "location": "Remote (US)",
-  "comp_range_usd": "180000-220000",
-  "archetype": "backend-platform",
-  "url": "https://..."
-}
-```
-**Rule:** when you (the orchestrator) delegate a task, pick the cheapest agent that can do it well. Do NOT route every subagent through the same tier. Auto-pipeline mode MUST split a single job across `@general-paid` (evaluation) and `@general-free` (PDF gen + tracker + apply), not run it all on one model.
-**When to break this rule:** if the user explicitly asks for "quality over cost" or flags a high-stakes application (top-tier company, offer-stage negotiation, executive search), route everything through `@general-paid`. Document the exception in the session.
-### When to delegate
-**Delegate (`task` out) when the work involves repeated tool-heavy steps that bloat the orchestrator's cache prefix.** The concrete failure mode this prevents: a 341-message "apply to 20 jobs" session where repeated `geometra_fill_form` / `geometra_page_model` calls accumulated in history, forcing each new message to re-process 100K+ tokens of fresh input instead of reading from cache.
-**Delegate when:**
-- Applying to N≥2 jobs (repeated Geometra form-fill — the original cache-bust scenario)
-- Batch portal scans hitting ≥3 companies (API loops + page-model reads stack up)
-- Any explicit "apply to... / process pipeline / batch evaluate" phrasing from the user (multi-job intent)
-**Do NOT delegate — orchestrate inline:**
-- Single-offer evaluation (text-heavy, not tool-heavy)
-- Development / bug-fix / file-editing tasks
-- `tracker` and other read-only modes
-- Single-company scan, single-URL check
-- One-shot questions — "what does this mean?", "read X and summarize", "what's my next report number?"
-**Detection signal:** if you're about to call `geometra_fill_form` for a second *different* job in the same session, STOP and delegate the remainder. For everything else, in-session execution is the expected default.
----
-## Ethical Use -- CRITICAL
-**Applying to jobs via Geometra MCP is NOT impersonation.** The agent uses the user's credentials, CV, and information to apply on their behalf. This is the same as the user applying themselves — the agent is an extension of the user's will. Think of it as a browser automation tool that fills forms with the user's data.
-**This system is designed for quality, not quantity.** The goal is to help the user find and apply to roles where there is a genuine match -- not to spam companies with mass applications.
-- **Continuous pipeline for 3.0+ offers:** When scanning, evaluating, and applying — run the full pipeline end-to-end without pausing for confirmation. Scan → evaluate → fill form → submit for any offer scoring 3.0/5 or above. Do NOT stop between steps to ask "want me to continue?" — just do it.
-- **Auto-submit:** For offers scoring 3.0+/5, fill and submit the application automatically. For offers below 3.0/5, mark as SKIP and move on.
-- **Still respect quality:** Only apply where there is a genuine match (3.0+ ensures this). Auto-SKIP anything below 3.0.
-- **Respect recruiters' time.** Every application a human reads costs someone's attention. Only send what's worth reading.
----
-## Offer Verification -- MANDATORY
-**Read local artifacts before the network.** If `reports/` already contains this posting URL (or company+role with a full JD in the body), **Read** that report for verification or evaluation instead of WebFetch/Geometra. If `data/pipeline.md` or `jds/` points at frozen JD text (`local:jds/{file}` or pasted blocks), **Read** that first. Reuse JD text already in the same conversation — do not fetch the same URL twice. (The JD extraction section at the top of `modes/auto-pipeline.md` and its "at most once per session" rule are the detailed contract.)
-**When Geometra MCP is available** (interactive sessions), ALWAYS use it to verify offers:
-1. `geometra_connect` to the URL (via proxy)
-2. `geometra_page_model` to read structured page content
-3. Only footer/navbar without JD = closed. Title + description + Apply = active.
-**When Geometra MCP is NOT available** (batch workers via `opencode run`, headless environments):
-1. Use WebFetch to retrieve the page content
-2. Check for JD text, job title, and apply button/link in the response
-3. If WebFetch returns only a shell/navbar (no JD content), mark the offer as `**Verification: unconfirmed**` in the report header
-4. Do NOT skip the evaluation — proceed but flag the uncertainty so the user can verify manually before applying
-The goal is to never waste time on closed offers, but also never silently assume a role is active when verification was incomplete.
-### Canonical MCP tools (quick reference)
-Pick tools by name directly — reduces unnecessary tool discovery:
-| Task | Preferred tools |
-|------|------------------|
-| JD from URL | Greenhouse boards API when the URL matches (see JD extraction in `modes/auto-pipeline.md`) → else `geometra_connect` + `geometra_page_model` → else WebFetch → WebSearch last |
-| Offer still live? | Same as JD when Geometra is available; else WebFetch per above |
-| One apply subagent (single job) | One `geometra_connect` per job URL; reuse `sessionId` through schema + fill; submit via atomic `geometra_run_actions` per `modes/apply.md` [H1]. Do **not** `geometra_disconnect` between `geometra_form_schema` and submit on the same form unless recovery requires it |
-| Chromium pool between orchestrator dispatch rounds | `geometra_list_sessions` + `geometra_disconnect({ closeBrowser: true })` per Hard limit [H3] — orchestrator-only; not a substitute for finishing the in-subagent form flow |
-@modes/reference-setup.md
+The sections above are the shared contract. Load detailed context on demand:
-@modes/reference-portals.md
+- `modes/{mode}.md` for the active mode procedure, output shape, and mode-specific routing.
+- `modes/reference-setup.md` for onboarding, tracker layout, states, and profile/CV setup.
+- `modes/reference-portals.md` for OTP, residential proxy, and MCP configuration.
+- `modes/reference-geometra.md` for form-fill patterns, portal failures, cleanup runbooks, and session recovery.
-@modes/reference-geometra.md
+Do not pre-load all reference files. Read only the active mode file and the reference file needed for the current blocker.
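
Step 5 of the new Procedure (at most 2 tasks per round [H1], one in-flight dispatch per company [H5]) can be sketched as a round planner. This is an illustration under stated assumptions, not the package's implementation: `planRounds`, the job object shape, and the greedy first-fit ordering are all hypothetical. The sketch only groups jobs; it does not call any Geometra tool, and the [H3] cleanup would still run before each round.

```javascript
// Hypothetical sketch; `planRounds` and the job shape are assumptions, not job-forge code.
const MAX_PER_ROUND = 2; // [H1]: dispatch at most 2 tasks per round

function planRounds(jobs) {
  const pending = [...jobs];
  const rounds = [];
  while (pending.length > 0) {
    const round = [];
    const companies = new Set();
    // Walk pending jobs in order; skip any whose company is already in this round ([H5]).
    for (let i = 0; i < pending.length && round.length < MAX_PER_ROUND; ) {
      const job = pending[i];
      if (companies.has(job.company)) {
        i += 1; // same company already in flight: leave it for a later round
      } else {
        companies.add(job.company);
        round.push(job);
        pending.splice(i, 1); // removed from pending, so i already points at the next job
      }
    }
    rounds.push(round);
  }
  return rounds;
}

// Example input with placeholder companies and URLs.
const exampleJobs = [
  { company: "Acme", url: "https://example.com/acme-1" },
  { company: "Acme", url: "https://example.com/acme-2" },
  { company: "Beta", url: "https://example.com/beta-1" },
];
console.log(planRounds(exampleJobs).map((round) => round.map((job) => job.url)));
```

Each round always takes the first pending job, so the loop terminates; a second job from the same company is deferred rather than dropped.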

package/README.md CHANGED

@@ -25,7 +25,13 @@ npm install
 opencode
 ```
-The scaffolded `opencode.json` already has the Geometra MCP (browser automation + PDF) and Gmail MCP (reading replies) wired up — they launch automatically the first time opencode starts. `npm install` also materializes symlinks for every supported agent harness — OpenCode, Cursor, Claude Code, and Codex — so you can run `opencode`, `cursor`, `claude`, or `codex` in the same project and each picks up the shared MCP config and instructions.
+The scaffolded `opencode.json` already has three MCPs wired up — they launch automatically the first time opencode starts:
+- **Geometra** — browser automation + PDF generation
+- **Gmail** — reads replies from recruiters
+- **state-trace** — typed working memory for cross-session context (resumed batches, recent decisions, repeated portal quirks). Spawned via `uvx`; install once with `brew install uv` (or `pipx install uv`) — no other setup.
+`npm install` also materializes symlinks for every supported agent harness — OpenCode, Cursor, Claude Code, and Codex — so you can run `opencode`, `cursor`, `claude`, or `codex` in the same project and each picks up the shared MCP config and instructions.
 Then fill in `cv.md`, `config/profile.yml`, and `portals.yml` with your personal data, paste a job URL into opencode, and JobForge evaluates + tracks it.

package/docs/SETUP.md CHANGED

@@ -4,6 +4,7 @@
 - [opencode](https://opencode.ai) installed and configured
 - Node.js 18+ (for the CLI, PDF generation, and tracker scripts)
+- [`uv`](https://docs.astral.sh/uv/) installed (`brew install uv` on macOS, or `pipx install uv`). Used by the state-trace MCP to spawn its Python entry point on demand via `uvx`. Without `uv`, the state-trace MCP fails to start; the rest of JobForge keeps working.
 - (Optional) Go (for the dashboard TUI) — use a toolchain that satisfies the `go` directive in [`dashboard/go.mod`](../dashboard/go.mod)
 ## Quick Start (two paths)

package/iso/instructions.md CHANGED

@@ -11,7 +11,7 @@ AI-powered job search pipeline: scans portals, evaluates offers, generates CVs v
   why: 2026-04 same-day batch collision — when two batches target the same role, `npx job-forge merge` updates the existing day-file row rather than appending, so grepping day files alone misses earlier-batch applies; merged/*.tsv is the only place the breadcrumb remains
 - [H3] Before every batch of `task` dispatches that will use Geometra, call `geometra_list_sessions` then `geometra_disconnect({closeBrowser: true})`. Every round, no exceptions. Name this cleanup as an explicit "step 0" in your first-response plan for any multi-apply request — it is the most frequently skipped guardrail in practice, and skipping it produces cascade "Not connected" failures on the next dispatch.
-  why: if any prior subagent aborted mid-flow, its Chromium session stays stuck in the MCP pool and the next `geometra_connect` fails with "Not connected"; the disconnect is a no-op when the pool is empty but a poison-cure when it isn't; vocalizing it up-front doubles the odds it actually runs
+  why: aborted subagents can leave Chromium sessions stuck in the MCP pool. Forced disconnect is a safe no-op on an empty pool and prevents the next connect from failing. Naming it up front improves compliance
 - [H4] In multi-job mode, the orchestrator session MUST NOT call `geometra_fill_form`, `geometra_run_actions`, `geometra_pick_listbox_option`, or `geometra_fill_otp` directly. Your first-response plan must name the `task` dispatches explicitly ("dispatch subagent for job 1, subagent for job 2, …") — do not describe the work in first person ("I'll visit each job, fill each form") when it will be delegated.
   why: repeated Geometra calls in the orchestrator bloat the cache prefix — this is the 2026-04 "apply to 20 jobs" 341-msg incident where each turn re-processed 100K+ fresh tokens instead of reading from cache; first-person narration is a leading indicator that the agent is mentally queueing work for itself rather than a subagent
@@ -33,13 +33,10 @@ AI-powered job search pipeline: scans portals, evaluates offers, generates CVs v
 - [D2] Route subagent work by cost tier. `@general-free`: procedural — form-fill, TSV merge, verify, OTP retrieval, portal scan metadata extraction, one-shot structured-field transforms. `@general-paid`: quality-sensitive — offer evaluation narrative Blocks A-F, cover letters, "Why X?" answers, STAR interview stories, LinkedIn outreach. `@glm-minimal`: narrow ≤5K-input one-shot extract/classify jobs that do not need context.
   why: GLM 5.1 doesn't discount cache reads so procedural work there costs ~10×; free-tier models handle procedural work fine empirically (`opencode/big-pickle` processed 1000+ messages at $0)
-- [D3] Upgrade `apply` routing to `@general-paid` when offer score ≥ 4.0/5, when user flags "top-tier / dream job / high-stakes", or when late-stage pipeline (post-screens).
-  why: form-fill flows are 6+ steps; free-tier sometimes aborts mid-flow on large Greenhouse/Workday schemas; paid tier has more headroom
+- [D3] Read the active mode file before dispatch. Mode files own score gates, provider fallback, portal runbooks, and output shape.
+  why: mode-specific rules change faster than global orchestration rules; keeping them out of the shared prefix preserves cache efficiency and prevents stale branches
-- [D3f] **Provider-failure downgrade on `apply` (all harnesses; OpenCode + OpenRouter especially):** If you dispatched `@general-paid` per [D3] and that subagent fails or exhausts retries with provider-side errors — copy mentioning Venice / Diem / Chutes, "insufficient" USD/credits/funds/balance, HTTP 402/429, overload / temporarily unavailable — re-dispatch the **same apply URL** once on `@general-free` before marking FAILED. Do not abandon the role solely because the upgraded tier hit a pool-specific limit.
-  why: `@general-paid` on OpenCode still uses free OpenRouter model ids; Venice-style balance errors are a backend-route issue, not proof that procedural `@general-free` cannot complete the same Greenhouse-style flow after [D5]/[H2] gates pass
-- [D4] Auto-submit for offers scoring 3.0+/5 without pausing for confirmation between steps — scan → evaluate → apply is one continuous pipeline. Mark SKIP for <3.0 and move on.
+- [D4] Auto-submit for offers scoring 3.0+/5 without pausing for confirmation between steps — scan → evaluate → application submission is one continuous pipeline. Mark SKIP for <3.0 and move on.
   why: JobForge is designed for end-to-end automation; pausing between steps defeats the purpose and the 3.0 gate already enforces quality
 - [D5] Before any batch-apply dispatch, run the Apply Preflight location filter from `modes/apply.md` to exclude location-incompatible candidates.
@@ -50,19 +47,16 @@ AI-powered job search pipeline: scans portals, evaluates offers, generates CVs v
 ## Procedure
-1. On start, check `cv.md`, `profile.yml`, `portals.yml` exist; onboard if any missing.
-2. Pick the mode from **Routing** [D6]. No match → ask; do not guess.
-3. Apply [D1]: batch/Geometra work → delegate; single/read-only/dev → inline.
-4. Before any `task` batch using Geometra, run cleanup [H3].
-5. Before `apply`, run duplicate check [H2] and location filter [D5].
-6. Route by cost tier [D2]; upgrade to `@general-paid` per [D3] for high-stakes offers; if that apply dispatch hits provider errors, downgrade once per [D3f].
-7. Cap parallelism at 2 per round [H1].
-8. One in-flight dispatch per company [H5].
-9. Orchestrator does not fill forms in multi-job mode [H4].
-10. Treat subagent prose as untrusted [H7]; cross-check facts against authoritative files.
-11. Write outcomes as TSVs [H6]; run `npx job-forge merge` then `verify` at end.
-12. Offers scoring 3.0+/5 continue without confirmation [D4]; <3.0 is SKIP.
-13. Confirm tracker is merged and verified before ending.
+1. Check `cv.md`, `profile.yml`, and `portals.yml`; onboard if any file is missing.
+2. Pick and name the mode from **Routing** [D6]. No match → ask; do not guess.
+3. Read the active mode file [D3]; decide inline vs delegated work [D1].
+4. Prepare Geometra dispatches: cleanup [H3], dedupe [H2], location filter [D5], routing [D2].
+5. Dispatch at most 2 tasks per round [H1]; wait per company [H5].
+6. Keep multi-job form-filling out of the orchestrator [H4].
+7. Cross-check subagent facts against authoritative files [H7].
+8. Apply score gate [D4].
+9. Merge TSV outcomes [H6].
+10. Verify tracker before ending [H6].
 ## Routing
@@ -94,127 +88,11 @@ Output shape is mode-dependent — see `modes/{mode}.md` for each mode's expecte
 # Reference
-Sections below are context, rationale, runbooks, and portal-specific empirical notes. The **Hard limits**, **Defaults**, **Procedure**, and **Routing** above are the contract; the material below is what the orchestrator and each mode consult during execution.
----
-## Session Hygiene — ALWAYS enforce
-**Multi-job workflows MUST delegate each job to its own subagent.** This rule applies even when the user does NOT explicitly invoke `/job-forge`.
-Whenever the user says any variation of "apply to N jobs", "process the pipeline", "batch evaluate", or similar phrasing that implies more than one application/evaluation in sequence:
-1. **Do not drive all N jobs from this session.** Repeated `geometra_fill_form` / `geometra_page_model` calls accumulate in conversation history and invalidate prompt caching — each new message ends up re-processing 100K+ tokens of fresh history instead of reading from cache.
-2. **Launch one subagent per job, in parallel batches of ≤2** (see Hard Limits #1). Higher parallelism blows through free-tier rate limits and each subagent requires post-cleanup. Use the `task` tool / Agent with `subagent_type="general-purpose"`, passing the single URL and the relevant mode file content.
-3. **This session acts as the orchestrator only**: plan, pick the jobs, dispatch subagents, aggregate results. No Geometra form-filling in this session unless it's a single one-off application.
-**Why:** observed on a real run — a 341-msg "apply to 20 jobs" session had `cache_read ~1.8K` on 5 messages where input ballooned to 100K-144K tokens. A 40-msg orchestrator session that delegates instead stays under 40K input max with cache reads at full 100K+. Same work, ~5× fewer effective tokens.
-**Verify after running:** `npx job-forge tokens --session <id>` — any message with `cache_read < 5K` and `input > 50K` is a cache-bust; next time split that work across subagents.
-**Exception:** evaluation-only or tracker-only work (no Geometra, no repeated tool calls) can proceed in a single session. The rule targets tool-heavy multi-step loops.
-**Before any batch-apply dispatch, run the Apply Preflight location filter from `modes/apply.md`** to exclude location-incompatible candidates. Catches the common case where an evaluated role has the right role-shape but a deal-breaking location that profile.yml already rules out.
----
-## Subagent Routing — which agent for which task
-The harness ships three subagents (see `.opencode/agents/`). The orchestrator MUST route work by cost tier, not pick the default for everything. **GLM 5.1 does not discount cache reads**, so running procedural work on it costs ~10× what it would on a cache-discounting model. Free-tier models handle procedural work fine (confirmed empirically: `opencode/big-pickle` processed 1000+ messages at $0 in prior runs).
-| Task type | Subagent | Why |
-|-----------|----------|-----|
-| Drive Geometra form-fill / submit (atomic `run_actions`) | `@general-free` | Procedural; label-driven; deterministic |
-| Merge TSVs, run `verify-pipeline.mjs`, dedup | `@general-free` | Script-driven; no writing quality needed |
-| OTP retrieval via Gmail MCP + `geometra_fill_otp` | `@general-free` | Fixed-shape lookup + input |
-| Scan portals, extract offer metadata, return structured records (see schema below) | `@general-free` | Structured output; no judgment |
-| Evaluation narrative — Blocks A-F per `modes/offer.md` | `@general-paid` | Judgment + writing quality |
-| Cover letter, "Why X?" answers, Section G drafts | `@general-paid` | Tone and specificity matter |
-| STAR+R interview stories, story-bank curation | `@general-paid` | Quality signals seniority |
-| LinkedIn outreach messages (`modes/contact.md`) | `@general-paid` | First impression |
-| "Extract N fields from this text → JSON" (≤5K input) | `@glm-minimal` | One-shot transform; no context needed |
-| "Classify this JD as archetype X/Y/Z" | `@glm-minimal` | Narrow, structured output |
-**Example JSON shape for the "extract / emit JSON" subagent rows above** (use this exact key set when delegating a portal-scan / extract task):
-```json
-{
-  "company": "Acme",
-  "role": "Senior Backend Engineer",
-  "location": "Remote (US)",
-  "comp_range_usd": "180000-220000",
-  "archetype": "backend-platform",
-  "url": "https://..."
-}
-```
-**Rule:** when you (the orchestrator) delegate a task, pick the cheapest agent that can do it well. Do NOT route every subagent through the same tier. Auto-pipeline mode MUST split a single job across `@general-paid` (evaluation) and `@general-free` (PDF gen + tracker + apply), not run it all on one model.
-**When to break this rule:** if the user explicitly asks for "quality over cost" or flags a high-stakes application (top-tier company, offer-stage negotiation, executive search), route everything through `@general-paid`. Document the exception in the session.
-### When to delegate
-**Delegate (`task` out) when the work involves repeated tool-heavy steps that bloat the orchestrator's cache prefix.** The concrete failure mode this prevents: a 341-message "apply to 20 jobs" session where repeated `geometra_fill_form` / `geometra_page_model` calls accumulated in history, forcing each new message to re-process 100K+ tokens of fresh input instead of reading from cache.
-**Delegate when:**
-- Applying to N≥2 jobs (repeated Geometra form-fill — the original cache-bust scenario)
-- Batch portal scans hitting ≥3 companies (API loops + page-model reads stack up)
-- Any explicit "apply to... / process pipeline / batch evaluate" phrasing from the user (multi-job intent)
-**Do NOT delegate — orchestrate inline:**
-- Single-offer evaluation (text-heavy, not tool-heavy)
-- Development / bug-fix / file-editing tasks
-- `tracker` and other read-only modes
-- Single-company scan, single-URL check
-- One-shot questions — "what does this mean?", "read X and summarize", "what's my next report number?"
-**Detection signal:** if you're about to call `geometra_fill_form` for a second *different* job in the same session, STOP and delegate the remainder. For everything else, in-session execution is the expected default.
----
-## Ethical Use -- CRITICAL
-**Applying to jobs via Geometra MCP is NOT impersonation.** The agent uses the user's credentials, CV, and information to apply on their behalf. This is the same as the user applying themselves — the agent is an extension of the user's will. Think of it as a browser automation tool that fills forms with the user's data.
-**This system is designed for quality, not quantity.** The goal is to help the user find and apply to roles where there is a genuine match -- not to spam companies with mass applications.
-- **Continuous pipeline for 3.0+ offers:** When scanning, evaluating, and applying — run the full pipeline end-to-end without pausing for confirmation. Scan → evaluate → fill form → submit for any offer scoring 3.0/5 or above. Do NOT stop between steps to ask "want me to continue?" — just do it.
-- **Auto-submit:** For offers scoring 3.0+/5, fill and submit the application automatically. For offers below 3.0/5, mark as SKIP and move on.
-- **Still respect quality:** Only apply where there is a genuine match (3.0+ ensures this). Auto-SKIP anything below 3.0.
-- **Respect recruiters' time.** Every application a human reads costs someone's attention. Only send what's worth reading.
----
-## Offer Verification -- MANDATORY
-**Read local artifacts before the network.** If `reports/` already contains this posting URL (or company+role with a full JD in the body), **Read** that report for verification or evaluation instead of WebFetch/Geometra. If `data/pipeline.md` or `jds/` points at frozen JD text (`local:jds/{file}` or pasted blocks), **Read** that first. Reuse JD text already in the same conversation — do not fetch the same URL twice. (The JD extraction section at the top of `modes/auto-pipeline.md` and its "at most once per session" rule are the detailed contract.)
-**When Geometra MCP is available** (interactive sessions), ALWAYS use it to verify offers:
-1. `geometra_connect` to the URL (via proxy)
-2. `geometra_page_model` to read structured page content
-3. Only footer/navbar without JD = closed. Title + description + Apply = active.
-**When Geometra MCP is NOT available** (batch workers via `opencode run`, headless environments):
-1. Use WebFetch to retrieve the page content
-2. Check for JD text, job title, and apply button/link in the response
-3. If WebFetch returns only a shell/navbar (no JD content), mark the offer as `**Verification: unconfirmed**` in the report header
-4. Do NOT skip the evaluation — proceed but flag the uncertainty so the user can verify manually before applying
-The goal is to never waste time on closed offers, but also never silently assume a role is active when verification was incomplete.
-### Canonical MCP tools (quick reference)
-Pick tools by name directly — reduces unnecessary tool discovery:
-| Task | Preferred tools |
-|------|------------------|
-| JD from URL | Greenhouse boards API when the URL matches (see JD extraction in `modes/auto-pipeline.md`) → else `geometra_connect` + `geometra_page_model` → else WebFetch → WebSearch last |
-| Offer still live? | Same as JD when Geometra is available; else WebFetch per above |
-| One apply subagent (single job) | One `geometra_connect` per job URL; reuse `sessionId` through schema + fill; submit via atomic `geometra_run_actions` per `modes/apply.md` [H1]. Do **not** `geometra_disconnect` between `geometra_form_schema` and submit on the same form unless recovery requires it |
-| Chromium pool between orchestrator dispatch rounds | `geometra_list_sessions` + `geometra_disconnect({ closeBrowser: true })` per Hard limit [H3] — orchestrator-only; not a substitute for finishing the in-subagent form flow |
-@modes/reference-setup.md
+The sections above are the shared contract. Load detailed context on demand:
-@modes/reference-portals.md
+- `modes/{mode}.md` for the active mode procedure, output shape, and mode-specific routing.
+- `modes/reference-setup.md` for onboarding, tracker layout, states, and profile/CV setup.
+- `modes/reference-portals.md` for OTP, residential proxy, and MCP configuration.
+- `modes/reference-geometra.md` for form-fill patterns, portal failures, cleanup runbooks, and session recovery.
-@modes/reference-geometra.md
+Do not pre-load all reference files. Read only the active mode file and the reference file needed for the current blocker.
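
The added lines above replace eager `@modes/reference-*.md` includes with on-demand loading: the active mode file plus the one reference file that covers the current blocker. A minimal sketch of that lookup, assuming the topic strings are taken literally from the list in the diff; `REFERENCE_TOPICS` and `filesToLoad` are hypothetical names for illustration, not part of job-forge:

```javascript
// Topics per reference file, transcribed from the bullet list in the diff above.
const REFERENCE_TOPICS = {
  "modes/reference-setup.md": ["onboarding", "tracker layout", "states", "profile/CV setup"],
  "modes/reference-portals.md": ["OTP", "residential proxy", "MCP configuration"],
  "modes/reference-geometra.md": ["form-fill patterns", "portal failures", "cleanup runbooks", "session recovery"],
};

// Hypothetical helper: the active mode file, plus at most one reference file for the blocker.
function filesToLoad(mode, blockerTopic) {
  const files = [`modes/${mode}.md`];
  const match = Object.entries(REFERENCE_TOPICS).find(([, topics]) => topics.includes(blockerTopic));
  if (match) files.push(match[0]);
  return files;
}

console.log(filesToLoad("apply", "OTP"));
```

With no recognised blocker the function returns only the mode file, which matches the "do not pre-load" instruction.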

package/iso/mcp.json CHANGED

@@ -10,6 +10,15 @@
       "env": {
         "DISABLE_HTTP": "true"
       }
+    },
+    "state-trace": {
+      "command": "uvx",
+      "args": ["--from", "state-trace[mcp]", "state-trace-mcp"],
+      "env": {
+        "STATE_TRACE_STORAGE_PATH": ".state-trace/memory.db",
+        "STATE_TRACE_NAMESPACE": "job-forge",
+        "STATE_TRACE_CAPACITY_LIMIT": "256"
+      }
     }
   }
 }

package/modes/apply.md CHANGED

@@ -43,7 +43,13 @@ Live application assistant. Reads the active application form in Chrome (via Geo
   why: labels are stable across DOM refreshes; IDs are regenerated
 - [D7] If the orchestrator's task prompt includes a `proxy` object (sourced from `config/profile.yml`), pass it verbatim into every `geometra_connect` call — including Call 3 of the recovery sequence. If absent, run without one; never invent a proxy URL.
-  why: class-B Ashby / Cloudflare-fronted portals need a residential outbound IP; the fix is wired in Geometra MCP v1.59.0 but the orchestrator owns the config pipe. See "BYO Residential Proxy" in iso/instructions.md.
+  why: class-B Ashby / Cloudflare-fronted portals need a residential outbound IP; the fix is wired in Geometra MCP v1.59.0 but the orchestrator owns the config pipe. See "BYO Residential Proxy" in modes/reference-portals.md.
+- [D8] Upgrade application routing to `@general-paid` when the offer score is ≥ 4.0/5, the user flags "top-tier", "dream job", or "high-stakes", or the candidate is late-stage/post-screen.
+  why: form-fill flows are 6+ steps; free-tier sometimes aborts mid-flow on large Greenhouse/Workday schemas; paid tier has more headroom
+- [D9] If an upgraded `@general-paid` subagent fails with provider-side errors, re-dispatch the same URL once on `@general-free` before marking FAILED. Provider-side errors include Venice, Diem, Chutes, HTTP 402/429, insufficient credits/funds/balance, overload, and temporarily unavailable.
+  why: OpenCode paid-tier routing can still use free OpenRouter model IDs; backend pool limits are not evidence that a procedural free-tier worker cannot complete the same form after preflight gates pass
 ## Procedure
@@ -54,14 +60,16 @@ Live application assistant. Reads the active application form in Chrome (via Geo
 5. Compare role on screen vs evaluated role [D3].
 6. If different, pause for the candidate's decision [D3].
 7. Before dispatch, run Geometra cleanup [H4] and location filter [D1].
-8. Extract form questions; classify each Section-G vs new.
-9. Generate answers from Block B + Block F + Section G + JD.
-10. Submit as ONE `run_actions` call [H1] using labels [D6] with `imeFriendly: true` [D4].
-11. On session error, run the 4-step recovery; only one retry [H2].
-12. On OTP prompt, fetch the code from Gmail via `gmail_get_message`.
-13. Submit the OTP with `geometra_fill_otp` and click Submit.
-14. Write outcome as `batch/tracker-additions/*.tsv` [H3].
-15. Cap parallelism at 2 per round [H5]; one in-flight per company.
+8. Route high-stakes applications through `@general-paid` [D8].
+9. Extract form questions; classify each Section-G vs new.
+10. Generate answers from Block B + Block F + Section G + JD.
+11. Submit as ONE `run_actions` call [H1] using labels [D6] with `imeFriendly: true` [D4].
+12. On session error, run the 4-step recovery; only one retry [H2].
+13. On upgraded-provider failure, downgrade once to `@general-free` [D9].
+14. On OTP prompt, fetch the code from Gmail via `gmail_get_message`.
+15. Submit the OTP with `geometra_fill_otp` and click Submit.
+16. Write outcome as `batch/tracker-additions/*.tsv` [H3].
+17. Cap parallelism at 2 per round [H5]; one in-flight per company.
 ## Routing
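
The [D8] and [D9] defaults added in this hunk amount to a small routing decision: upgrade on score or flags, then fall back once on provider-side errors. A minimal sketch, assuming a hypothetical `offer` object (`score`, `flags`, `lateStage`) and a plain error-message string; none of these names come from the package, and the error pattern is only a literal transcription of the terms [D9] lists:

```javascript
// Hypothetical sketch of [D8]/[D9]; field names are illustrative, not job-forge APIs.
const HIGH_STAKES_FLAGS = ["top-tier", "dream job", "high-stakes"];

// Terms [D9] names as provider-side errors, matched case-insensitively.
const PROVIDER_ERROR = /venice|diem|chutes|\b(402|429)\b|insufficient (credits|funds|balance)|overload|temporarily unavailable/i;

// [D8]: upgrade when score >= 4.0, the user flagged the offer, or the candidate is late-stage.
function pickApplyAgent(offer) {
  const flagged = (offer.flags ?? []).some((flag) => HIGH_STAKES_FLAGS.includes(flag));
  const upgrade = offer.score >= 4.0 || flagged || offer.lateStage === true;
  return upgrade ? "@general-paid" : "@general-free";
}

// [D9]: after a failed attempt, return the agent to retry on, or null to mark FAILED.
function nextAgentAfterFailure(agent, errorMessage, alreadyDowngraded) {
  const providerSide = PROVIDER_ERROR.test(errorMessage);
  if (agent === "@general-paid" && providerSide && !alreadyDowngraded) {
    return "@general-free"; // re-dispatch the same URL once
  }
  return null;
}

console.log(pickApplyAgent({ score: 4.2 }));
console.log(nextAgentAfterFailure("@general-paid", "HTTP 429 from provider", false));
```

The `alreadyDowngraded` argument is what keeps the fallback to a single attempt; a failure that is not provider-side (for example a form error) returns `null` straight away.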

package/opencode.json CHANGED

@@ -50,6 +50,20 @@
       "environment": {
         "DISABLE_HTTP": "true"
       }
+    },
+    "state-trace": {
+      "type": "local",
+      "command": [
+        "uvx",
+        "--from",
+        "state-trace[mcp]",
+        "state-trace-mcp"
+      ],
+      "environment": {
+        "STATE_TRACE_STORAGE_PATH": ".state-trace/memory.db",
+        "STATE_TRACE_NAMESPACE": "job-forge",
+        "STATE_TRACE_CAPACITY_LIMIT": "256"
+      }
     }
   },
   "plugin": [
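
The same `state-trace` server appears in two shapes in this diff: the `mcp.json` files use `command` plus `args` plus `env`, while `opencode.json` uses `"type": "local"`, a single `command` array, and `environment`. A minimal sketch of that mapping, inferred only from the entries shown here; `toOpencodeEntry` is a hypothetical helper and is not claimed to be how `iso build` generates the file:

```javascript
// Hypothetical helper: maps an mcp.json-style entry to the opencode.json shape seen in this diff.
function toOpencodeEntry(entry) {
  return {
    type: "local",
    command: [entry.command, ...(entry.args ?? [])],
    environment: { ...(entry.env ?? {}) },
  };
}

// The state-trace entry as it appears in package/iso/mcp.json.
const stateTrace = {
  command: "uvx",
  args: ["--from", "state-trace[mcp]", "state-trace-mcp"],
  env: {
    STATE_TRACE_STORAGE_PATH: ".state-trace/memory.db",
    STATE_TRACE_NAMESPACE: "job-forge",
    STATE_TRACE_CAPACITY_LIMIT: "256",
  },
};

const opencodeEntry = toOpencodeEntry(stateTrace);
console.log(JSON.stringify(opencodeEntry, null, 2));
```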

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "job-forge",
-  "version": "2.14.7",
+  "version": "2.14.8",
   "description": "AI-powered job search pipeline built on opencode",
   "type": "module",
   "bin": {
@@ -28,6 +28,7 @@
     "test:agentmd:baseline": "agentmd test iso/instructions.md --fixtures fixtures/instructions.yml --via claude-code --model claude-haiku-4-5 --concurrency 2 --trials 3 --format json --out fixtures/baseline.json",
     "test:agentmd:apply": "agentmd test modes/apply.md --fixtures fixtures/modes/apply.yml --via claude-code --model claude-haiku-4-5 --concurrency 2 --trials 3",
     "lint:agentmd:modes": "agentmd lint modes/apply.md",
+    "smoke:iso": "iso plan . && iso build . --dry-run && iso-route verify models.yaml && node scripts/check-iso-smoke.mjs . && JOBFORGE_ROOT=$PWD iso-eval run fixtures/iso-smoke/eval.yml",
     "build:config": "iso build .",
     "prepack": "iso build .",
     "release:check-source": "node ./scripts/release/check-source.mjs",
@@ -87,10 +88,12 @@
     "playwright": "^1.58.1"
   },
   "devDependencies": {
+    "@razroo/agentmd": "^0.3.0",
     "@razroo/iso": "^0.2.5",
+    "@razroo/iso-eval": "^0.4.0",
     "@razroo/iso-harness": "^0.6.1",
-    "@razroo/iso-route": "^0.5.2",
-    "@razroo/iso-trace": "^0.3.1",
+    "@razroo/iso-route": "^0.5.3",
+    "@razroo/iso-trace": "^0.4.0",
     "@razroo/opencode-model-fallback": "^0.3.1"
   }
 }
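
One of the bumps above crosses a `0.x` minor boundary (`@razroo/iso-trace` from `^0.3.1` to `^0.4.0`). Under npm's semver rules a caret range on a `0.x` version pins the minor, so `^0.3.1` never resolves to `0.4.0` and the range itself has to change. A simplified sketch of that rule for plain `x.y.z` versions (no prerelease tags or build metadata); `caretAllows` is a hypothetical helper written for illustration, not the `semver` package's API:

```javascript
function parseVersion(version) {
  return version.split(".").map(Number);
}

function compareVersions(a, b) {
  for (let i = 0; i < 3; i += 1) {
    if (a[i] !== b[i]) return a[i] < b[i] ? -1 : 1;
  }
  return 0;
}

// Caret ranges allow changes that do not modify the leftmost non-zero component.
function caretAllows(range, version) {
  const base = parseVersion(range.replace(/^\^/, ""));
  const candidate = parseVersion(version);
  const [major, minor, patch] = base;
  let upper;
  if (major > 0) upper = [major + 1, 0, 0];
  else if (minor > 0) upper = [0, minor + 1, 0];
  else upper = [0, 0, patch + 1];
  return compareVersions(candidate, base) >= 0 && compareVersions(candidate, upper) < 0;
}

console.log(caretAllows("^0.3.1", "0.3.9")); // true
console.log(caretAllows("^0.3.1", "0.4.0")); // false
console.log(caretAllows("^0.5.2", "0.5.3")); // true
```

By contrast, `^0.5.2` already admitted `0.5.3`, so the `@razroo/iso-route` change only raises the minimum version.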

package/scripts/check-iso-smoke.mjs ADDED

@@ -0,0 +1,43 @@
+#!/usr/bin/env node
+import { readFileSync } from "node:fs";
+import { resolve } from "node:path";
+const root = resolve(process.argv[2] ?? ".");
+const files = {
+  instructions: readFileSync(resolve(root, "iso/instructions.md"), "utf8"),
+  apply: readFileSync(resolve(root, "modes/apply.md"), "utf8"),
+  models: readFileSync(resolve(root, "models.yaml"), "utf8"),
+  config: readFileSync(resolve(root, "iso/config.json"), "utf8"),
+};
+const checks = [
+  ["root defines H1-H7", () => every(files.instructions, ["[H1]", "[H2]", "[H3]", "[H4]", "[H5]", "[H6]", "[H7]"])],
+  ["H1 caps dispatches at 2", () => /Max 2 parallel `task` dispatches/.test(files.instructions)],
+  ["H2 checks all duplicate sources", () => every(files.instructions, ["data/pipeline.md", "data/applications/*.md", "batch/tracker-additions/*.tsv", "batch/tracker-additions/merged/*.tsv"])],
+  ["H3 names Geometra cleanup calls", () => every(files.instructions, ["geometra_list_sessions", "geometra_disconnect({closeBrowser: true})"])],
+  ["H4 blocks orchestrator form filling", () => every(files.instructions, ["MUST NOT call `geometra_fill_form`", "`geometra_run_actions`", "`geometra_fill_otp`"])],
+  ["H5 blocks same-company concurrent retry", () => every(files.instructions, ["Re-dispatch the same company only AFTER", "previous subagent returns"])],
+  ["H6 requires merge and verify", () => every(files.instructions, ["batch/tracker-additions/*.tsv", "npx job-forge merge", "npx job-forge verify"])],
+  ["H7 distrusts subagent prose", () => every(files.instructions, ["must originate from a file", "not from prior subagent prose"])],
+  ["shared prompt points to on-demand references", () => every(files.instructions, ["modes/{mode}.md", "modes/reference-setup.md", "modes/reference-portals.md", "modes/reference-geometra.md"])],
+  ["apply mode owns high-stakes upgrade", () => every(files.apply, ["[D8]", "@general-paid", "4.0/5", "high-stakes"])],
+  ["apply mode owns provider downgrade", () => every(files.apply, ["[D9]", "@general-free", "HTTP 402/429", "insufficient credits/funds/balance"])],
+  ["models policy extends free OpenRouter preset", () => /extends:\s*openrouter-free/.test(files.models)],
+  ["OpenCode fallback plugin is configured", () => every(files.config, ["opencodeModelFallback", "@razroo/opencode-model-fallback"])],
+];
+const failures = checks
+  .filter(([, check]) => !check())
+  .map(([name]) => name);
+if (failures.length > 0) {
+  console.error("JobForge iso smoke failed:");
+  for (const failure of failures) console.error(`- ${failure}`);
+  process.exit(1);
+}
+console.log(`JobForge iso smoke passed (${checks.length} checks).`);
+function every(source, needles) {
+  return needles.every((needle) => source.includes(needle));
+}
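
The script above is a table of `[name, predicate]` pairs filtered down to the names whose predicate fails. A minimal sketch of the same pattern run against an in-memory string instead of files, so it can be tried without a JobForge checkout; the sample text and the `runChecks` wrapper are illustrative assumptions, and only `every` mirrors the helper in the diff:

```javascript
// Mirrors the `every` helper from check-iso-smoke.mjs: all needles must appear in source.
function every(source, needles) {
  return needles.every((needle) => source.includes(needle));
}

// Hypothetical wrapper: returns the names of failing checks instead of exiting the process.
function runChecks(checks) {
  return checks.filter(([, check]) => !check()).map(([name]) => name);
}

// Illustrative stand-in for modes/apply.md; not the real file contents.
const sampleApply = "- [D8] Upgrade application routing to `@general-paid` when the offer score is >= 4.0/5 or the user flags high-stakes.";

const failures = runChecks([
  ["apply mode owns high-stakes upgrade", () => every(sampleApply, ["[D8]", "@general-paid", "4.0/5", "high-stakes"])],
  ["apply mode owns provider downgrade", () => every(sampleApply, ["[D9]", "@general-free"])],
]);

console.log(failures);
```

Because `includes` is an exact substring match, spacing matters: `geometra_disconnect({closeBrowser: true})` and `geometra_disconnect({ closeBrowser: true })` are different needles, so a reformatted instruction file can fail a check without changing meaning.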