npm - job-forge - Versions diffs - 2.14.7 → 2.14.9 - Mend

job-forge 2.14.7 → 2.14.9

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (26)

package/.claude/agents/general-free.md +1 -0
package/.claude/agents/general-paid.md +3 -1
package/.codex/config.toml +5 -0
package/.cursor/mcp.json +13 -0
package/.cursor/rules/agent-general-free.mdc +1 -0
package/.cursor/rules/agent-general-paid.mdc +3 -1
package/.cursor/rules/main.mdc +25 -144
package/.mcp.json +13 -0
package/.opencode/agents/general-free.md +2 -0
package/.opencode/agents/general-paid.md +15 -3
package/.opencode/skills/job-forge.md +5 -1
package/AGENTS.md +25 -144
package/CLAUDE.md +25 -144
package/README.md +7 -1
package/docs/SETUP.md +1 -0
package/iso/agents/general-free.md +2 -0
package/iso/agents/general-paid.md +15 -3
package/iso/commands/job-forge.md +5 -1
package/iso/instructions.md +25 -144
package/iso/mcp.json +9 -0
package/modes/apply.md +29 -13
package/modes/reference-geometra.md +3 -2
package/modes/reference-portals.md +7 -6
package/opencode.json +14 -0
package/package.json +6 -3
package/scripts/check-iso-smoke.mjs +43 -0

package/CLAUDE.md

@@ -4,14 +4,14 @@ AI-powered job search pipeline: scans portals, evaluates offers, generates CVs v
 ## Hard limits
-- [H1] Max 2 parallel `task` dispatches per message. For N jobs, run `ceil(N/2)` sequential rounds of 2. Applies in all modes, for all user phrasings ("urgent", "apply to 10 jobs now").
-  why: higher parallelism blows through free-tier rate limits; each subagent requires post-cleanup and racing more than 2 reliably loses at least one result
+- [H1] Max 2 parallel `task` dispatches per message. For N jobs, run `ceil(N/2)` sequential rounds of 2. A round is not complete until both subagents return a final outcome (`APPLIED`, `APPLY FAILED`, `SKIP`, `Discarded`, or a written TSV path). A `task` tool result that only gives a session id / title is a launch acknowledgement, not completion. Applies in all modes, for all user phrasings ("urgent", "apply to 10 jobs now").
+  why: higher parallelism blows through free-tier rate limits; each subagent requires post-cleanup and racing more than 2 reliably loses at least one result. On 2026-04-25 the orchestrator launched round 2 while round 1 had only returned task ids, leaving four application subagents in flight and losing two provider-fallback recoveries
 - [H2] Max 1 application per company+role. Before every `apply` dispatch, grep all four sources for the URL and for `company+role`: `data/pipeline.md`, all `data/applications/*.md` day files, `batch/tracker-additions/*.tsv`, `batch/tracker-additions/merged/*.tsv`. If any source shows APPLIED / Applied, skip the dispatch.
   why: 2026-04 same-day batch collision — when two batches target the same role, `npx job-forge merge` updates the existing day-file row rather than appending, so grepping day files alone misses earlier-batch applies; merged/*.tsv is the only place the breadcrumb remains
 - [H3] Before every batch of `task` dispatches that will use Geometra, call `geometra_list_sessions` then `geometra_disconnect({closeBrowser: true})`. Every round, no exceptions. Name this cleanup as an explicit "step 0" in your first-response plan for any multi-apply request — it is the most frequently skipped guardrail in practice, and skipping it produces cascade "Not connected" failures on the next dispatch.
-  why: if any prior subagent aborted mid-flow, its Chromium session stays stuck in the MCP pool and the next `geometra_connect` fails with "Not connected"; the disconnect is a no-op when the pool is empty but a poison-cure when it isn't; vocalizing it up-front doubles the odds it actually runs
+  why: aborted subagents can leave Chromium sessions stuck in the MCP pool. Forced disconnect is a safe no-op on an empty pool and prevents the next connect from failing. Naming it up front improves compliance
 - [H4] In multi-job mode, the orchestrator session MUST NOT call `geometra_fill_form`, `geometra_run_actions`, `geometra_pick_listbox_option`, or `geometra_fill_otp` directly. Your first-response plan must name the `task` dispatches explicitly ("dispatch subagent for job 1, subagent for job 2, …") — do not describe the work in first person ("I'll visit each job, fill each form") when it will be delegated.
   why: repeated Geometra calls in the orchestrator bloat the cache prefix — this is the 2026-04 "apply to 20 jobs" 341-msg incident where each turn re-processed 100K+ fresh tokens instead of reading from cache; first-person narration is a leading indicator that the agent is mentally queueing work for itself rather than a subagent
@@ -19,6 +19,9 @@ AI-powered job search pipeline: scans portals, evaluates offers, generates CVs v
 - [H5] Re-dispatch the same company only AFTER the previous subagent returns. Never fire the same `task` twice while the first is still in flight.
   why: two in-flight subagents for the same URL race on Geometra sessions and on tracker TSV writes, corrupting state and sometimes double-submitting
+- [H5b] Do not use `task` to poll task status. If OpenCode returns a task/session id without a final result, record the id, stop dispatching new rounds, and tell the user the round is still in flight. When the user asks to check later, inspect authoritative files (`batch/tracker-additions/*.tsv`, `batch/tracker-additions/merged/*.tsv`, day files, or `iso-trace`) rather than spawning a "check task status" subagent.
+  why: OpenCode status prompts can be delivered into the target subagent as a new user message; a 2026-04-25 trace caused a subagent to call `task` recursively instead of finishing the application
 - [H6] Application outcomes flow through `batch/tracker-additions/*.tsv`, not `data/pipeline.md`. After any multi-apply run, the orchestrator MUST run `npx job-forge merge` then `npx job-forge verify` before ending the session.
   why: `pipeline.md` is the URL inbox (`[ ]` pending → `[x]` processed); `data/applications/YYYY-MM-DD.md` is the outcome log; the TSV pathway is the only safe bridge because `merge` handles column order and duplicate detection
@@ -33,13 +36,10 @@ AI-powered job search pipeline: scans portals, evaluates offers, generates CVs v
 - [D2] Route subagent work by cost tier. `@general-free`: procedural — form-fill, TSV merge, verify, OTP retrieval, portal scan metadata extraction, one-shot structured-field transforms. `@general-paid`: quality-sensitive — offer evaluation narrative Blocks A-F, cover letters, "Why X?" answers, STAR interview stories, LinkedIn outreach. `@glm-minimal`: narrow ≤5K-input one-shot extract/classify jobs that do not need context.
   why: GLM 5.1 doesn't discount cache reads so procedural work there costs ~10×; free-tier models handle procedural work fine empirically (`opencode/big-pickle` processed 1000+ messages at $0)
-- [D3] Upgrade `apply` routing to `@general-paid` when offer score ≥ 4.0/5, when user flags "top-tier / dream job / high-stakes", or when late-stage pipeline (post-screens).
-  why: form-fill flows are 6+ steps; free-tier sometimes aborts mid-flow on large Greenhouse/Workday schemas; paid tier has more headroom
-- [D3f] **Provider-failure downgrade on `apply` (all harnesses; OpenCode + OpenRouter especially):** If you dispatched `@general-paid` per [D3] and that subagent fails or exhausts retries with provider-side errors — copy mentioning Venice / Diem / Chutes, "insufficient" USD/credits/funds/balance, HTTP 402/429, overload / temporarily unavailable — re-dispatch the **same apply URL** once on `@general-free` before marking FAILED. Do not abandon the role solely because the upgraded tier hit a pool-specific limit.
-  why: `@general-paid` on OpenCode still uses free OpenRouter model ids; Venice-style balance errors are a backend-route issue, not proof that procedural `@general-free` cannot complete the same Greenhouse-style flow after [D5]/[H2] gates pass
+- [D3] Read the active mode file before dispatch. Mode files own score gates, provider fallback, portal runbooks, and output shape.
+  why: mode-specific rules change faster than global orchestration rules; keeping them out of the shared prefix preserves cache efficiency and prevents stale branches
-- [D4] Auto-submit for offers scoring 3.0+/5 without pausing for confirmation between steps — scan → evaluate → apply is one continuous pipeline. Mark SKIP for <3.0 and move on.
+- [D4] Auto-submit for offers scoring 3.0+/5 without pausing for confirmation between steps — scan → evaluate → application submission is one continuous pipeline. Mark SKIP for <3.0 and move on.
   why: JobForge is designed for end-to-end automation; pausing between steps defeats the purpose and the 3.0 gate already enforces quality
 - [D5] Before any batch-apply dispatch, run the Apply Preflight location filter from `modes/apply.md` to exclude location-incompatible candidates.
@@ -50,19 +50,16 @@ AI-powered job search pipeline: scans portals, evaluates offers, generates CVs v
 ## Procedure
-1. On start, check `cv.md`, `profile.yml`, `portals.yml` exist; onboard if any missing.
-2. Pick the mode from **Routing** [D6]. No match → ask; do not guess.
-3. Apply [D1]: batch/Geometra work → delegate; single/read-only/dev → inline.
-4. Before any `task` batch using Geometra, run cleanup [H3].
-5. Before `apply`, run duplicate check [H2] and location filter [D5].
-6. Route by cost tier [D2]; upgrade to `@general-paid` per [D3] for high-stakes offers; if that apply dispatch hits provider errors, downgrade once per [D3f].
-7. Cap parallelism at 2 per round [H1].
-8. One in-flight dispatch per company [H5].
-9. Orchestrator does not fill forms in multi-job mode [H4].
-10. Treat subagent prose as untrusted [H7]; cross-check facts against authoritative files.
-11. Write outcomes as TSVs [H6]; run `npx job-forge merge` then `verify` at end.
-12. Offers scoring 3.0+/5 continue without confirmation [D4]; <3.0 is SKIP.
-13. Confirm tracker is merged and verified before ending.
+1. Check `cv.md`, `profile.yml`, and `portals.yml`; onboard if any file is missing.
+2. Pick and name the mode from **Routing** [D6]. No match → ask; do not guess.
+3. Read the active mode file [D3]; decide inline vs delegated work [D1].
+4. Prepare Geometra dispatches: cleanup [H3], dedupe [H2], location filter [D5], routing [D2].
+5. Dispatch at most 2 tasks per round [H1]; wait for final outcomes, not just task ids [H5b].
+6. Keep multi-job form-filling out of the orchestrator [H4].
+7. Cross-check subagent facts against authoritative files [H7].
+8. Apply score gate [D4].
+9. Merge TSV outcomes [H6].
+10. Verify tracker before ending [H6].
 ## Routing
@@ -94,127 +91,11 @@ Output shape is mode-dependent — see `modes/{mode}.md` for each mode's expecte
 # Reference
-Sections below are context, rationale, runbooks, and portal-specific empirical notes. The **Hard limits**, **Defaults**, **Procedure**, and **Routing** above are the contract; the material below is what the orchestrator and each mode consult during execution.
----
-## Session Hygiene — ALWAYS enforce
-**Multi-job workflows MUST delegate each job to its own subagent.** This rule applies even when the user does NOT explicitly invoke `/job-forge`.
-Whenever the user says any variation of "apply to N jobs", "process the pipeline", "batch evaluate", or similar phrasing that implies more than one application/evaluation in sequence:
-1. **Do not drive all N jobs from this session.** Repeated `geometra_fill_form` / `geometra_page_model` calls accumulate in conversation history and invalidate prompt caching — each new message ends up re-processing 100K+ tokens of fresh history instead of reading from cache.
-2. **Launch one subagent per job, in parallel batches of ≤2** (see Hard Limits #1). Higher parallelism blows through free-tier rate limits and each subagent requires post-cleanup. Use the `task` tool / Agent with `subagent_type="general-purpose"`, passing the single URL and the relevant mode file content.
-3. **This session acts as the orchestrator only**: plan, pick the jobs, dispatch subagents, aggregate results. No Geometra form-filling in this session unless it's a single one-off application.
-**Why:** observed on a real run — a 341-msg "apply to 20 jobs" session had `cache_read ~1.8K` on 5 messages where input ballooned to 100K-144K tokens. A 40-msg orchestrator session that delegates instead stays under 40K input max with cache reads at full 100K+. Same work, ~5× fewer effective tokens.
-**Verify after running:** `npx job-forge tokens --session <id>` — any message with `cache_read < 5K` and `input > 50K` is a cache-bust; next time split that work across subagents.
-**Exception:** evaluation-only or tracker-only work (no Geometra, no repeated tool calls) can proceed in a single session. The rule targets tool-heavy multi-step loops.
-**Before any batch-apply dispatch, run the Apply Preflight location filter from `modes/apply.md`** to exclude location-incompatible candidates. Catches the common case where an evaluated role has the right role-shape but a deal-breaking location that profile.yml already rules out.
----
-## Subagent Routing — which agent for which task
-The harness ships three subagents (see `.opencode/agents/`). The orchestrator MUST route work by cost tier, not pick the default for everything. **GLM 5.1 does not discount cache reads**, so running procedural work on it costs ~10× what it would on a cache-discounting model. Free-tier models handle procedural work fine (confirmed empirically: `opencode/big-pickle` processed 1000+ messages at $0 in prior runs).
-| Task type | Subagent | Why |
-|-----------|----------|-----|
-| Drive Geometra form-fill / submit (atomic `run_actions`) | `@general-free` | Procedural; label-driven; deterministic |
-| Merge TSVs, run `verify-pipeline.mjs`, dedup | `@general-free` | Script-driven; no writing quality needed |
-| OTP retrieval via Gmail MCP + `geometra_fill_otp` | `@general-free` | Fixed-shape lookup + input |
-| Scan portals, extract offer metadata, return structured records (see schema below) | `@general-free` | Structured output; no judgment |
-| Evaluation narrative — Blocks A-F per `modes/offer.md` | `@general-paid` | Judgment + writing quality |
-| Cover letter, "Why X?" answers, Section G drafts | `@general-paid` | Tone and specificity matter |
-| STAR+R interview stories, story-bank curation | `@general-paid` | Quality signals seniority |
-| LinkedIn outreach messages (`modes/contact.md`) | `@general-paid` | First impression |
-| "Extract N fields from this text → JSON" (≤5K input) | `@glm-minimal` | One-shot transform; no context needed |
-| "Classify this JD as archetype X/Y/Z" | `@glm-minimal` | Narrow, structured output |
-**Example JSON shape for the "extract / emit JSON" subagent rows above** (use this exact key set when delegating a portal-scan / extract task):
-```json
-{
-  "company": "Acme",
-  "role": "Senior Backend Engineer",
-  "location": "Remote (US)",
-  "comp_range_usd": "180000-220000",
-  "archetype": "backend-platform",
-  "url": "https://..."
-}
-```
-**Rule:** when you (the orchestrator) delegate a task, pick the cheapest agent that can do it well. Do NOT route every subagent through the same tier. Auto-pipeline mode MUST split a single job across `@general-paid` (evaluation) and `@general-free` (PDF gen + tracker + apply), not run it all on one model.
-**When to break this rule:** if the user explicitly asks for "quality over cost" or flags a high-stakes application (top-tier company, offer-stage negotiation, executive search), route everything through `@general-paid`. Document the exception in the session.
-### When to delegate
-**Delegate (`task` out) when the work involves repeated tool-heavy steps that bloat the orchestrator's cache prefix.** The concrete failure mode this prevents: a 341-message "apply to 20 jobs" session where repeated `geometra_fill_form` / `geometra_page_model` calls accumulated in history, forcing each new message to re-process 100K+ tokens of fresh input instead of reading from cache.
-**Delegate when:**
-- Applying to N≥2 jobs (repeated Geometra form-fill — the original cache-bust scenario)
-- Batch portal scans hitting ≥3 companies (API loops + page-model reads stack up)
-- Any explicit "apply to... / process pipeline / batch evaluate" phrasing from the user (multi-job intent)
-**Do NOT delegate — orchestrate inline:**
-- Single-offer evaluation (text-heavy, not tool-heavy)
-- Development / bug-fix / file-editing tasks
-- `tracker` and other read-only modes
-- Single-company scan, single-URL check
-- One-shot questions — "what does this mean?", "read X and summarize", "what's my next report number?"
-**Detection signal:** if you're about to call `geometra_fill_form` for a second *different* job in the same session, STOP and delegate the remainder. For everything else, in-session execution is the expected default.
----
-## Ethical Use -- CRITICAL
-**Applying to jobs via Geometra MCP is NOT impersonation.** The agent uses the user's credentials, CV, and information to apply on their behalf. This is the same as the user applying themselves — the agent is an extension of the user's will. Think of it as a browser automation tool that fills forms with the user's data.
-**This system is designed for quality, not quantity.** The goal is to help the user find and apply to roles where there is a genuine match -- not to spam companies with mass applications.
-- **Continuous pipeline for 3.0+ offers:** When scanning, evaluating, and applying — run the full pipeline end-to-end without pausing for confirmation. Scan → evaluate → fill form → submit for any offer scoring 3.0/5 or above. Do NOT stop between steps to ask "want me to continue?" — just do it.
-- **Auto-submit:** For offers scoring 3.0+/5, fill and submit the application automatically. For offers below 3.0/5, mark as SKIP and move on.
-- **Still respect quality:** Only apply where there is a genuine match (3.0+ ensures this). Auto-SKIP anything below 3.0.
-- **Respect recruiters' time.** Every application a human reads costs someone's attention. Only send what's worth reading.
----
-## Offer Verification -- MANDATORY
-**Read local artifacts before the network.** If `reports/` already contains this posting URL (or company+role with a full JD in the body), **Read** that report for verification or evaluation instead of WebFetch/Geometra. If `data/pipeline.md` or `jds/` points at frozen JD text (`local:jds/{file}` or pasted blocks), **Read** that first. Reuse JD text already in the same conversation — do not fetch the same URL twice. (The JD extraction section at the top of `modes/auto-pipeline.md` and its "at most once per session" rule are the detailed contract.)
-**When Geometra MCP is available** (interactive sessions), ALWAYS use it to verify offers:
-1. `geometra_connect` to the URL (via proxy)
-2. `geometra_page_model` to read structured page content
-3. Only footer/navbar without JD = closed. Title + description + Apply = active.
-**When Geometra MCP is NOT available** (batch workers via `opencode run`, headless environments):
-1. Use WebFetch to retrieve the page content
-2. Check for JD text, job title, and apply button/link in the response
-3. If WebFetch returns only a shell/navbar (no JD content), mark the offer as `**Verification: unconfirmed**` in the report header
-4. Do NOT skip the evaluation — proceed but flag the uncertainty so the user can verify manually before applying
-The goal is to never waste time on closed offers, but also never silently assume a role is active when verification was incomplete.
-### Canonical MCP tools (quick reference)
-Pick tools by name directly — reduces unnecessary tool discovery:
-| Task | Preferred tools |
-|------|------------------|
-| JD from URL | Greenhouse boards API when the URL matches (see JD extraction in `modes/auto-pipeline.md`) → else `geometra_connect` + `geometra_page_model` → else WebFetch → WebSearch last |
-| Offer still live? | Same as JD when Geometra is available; else WebFetch per above |
-| One apply subagent (single job) | One `geometra_connect` per job URL; reuse `sessionId` through schema + fill; submit via atomic `geometra_run_actions` per `modes/apply.md` [H1]. Do **not** `geometra_disconnect` between `geometra_form_schema` and submit on the same form unless recovery requires it |
-| Chromium pool between orchestrator dispatch rounds | `geometra_list_sessions` + `geometra_disconnect({ closeBrowser: true })` per Hard limit [H3] — orchestrator-only; not a substitute for finishing the in-subagent form flow |
-@modes/reference-setup.md
+The sections above are the shared contract. Load detailed context on demand:
-@modes/reference-portals.md
+- `modes/{mode}.md` for the active mode procedure, output shape, and mode-specific routing.
+- `modes/reference-setup.md` for onboarding, tracker layout, states, and profile/CV setup.
+- `modes/reference-portals.md` for OTP, residential proxy, and MCP configuration.
+- `modes/reference-geometra.md` for form-fill patterns, portal failures, cleanup runbooks, and session recovery.
-@modes/reference-geometra.md
+Do not pre-load all reference files. Read only the active mode file and the reference file needed for the current blocker.
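
The duplicate check in [H2] can be sketched as a scan over the four sources it names. This is a minimal illustration, not code from the package: the function name `already_applied` is hypothetical, and the line-based matching (a line containing "applied" plus either the URL or both the company and role) is an assumption about how the tracker files are laid out.

```python
from pathlib import Path

# The four sources named in [H2], as glob patterns relative to the project root.
H2_SOURCES = (
    "data/pipeline.md",
    "data/applications/*.md",
    "batch/tracker-additions/*.tsv",
    "batch/tracker-additions/merged/*.tsv",
)


def already_applied(root: Path, url: str, company: str, role: str) -> bool:
    """Return True if any [H2] source has a line marking this job as applied.

    Hypothetical helper: a line counts as a hit when it contains "applied"
    (case-insensitive) and either the URL or both the company and the role.
    """
    company_l, role_l = company.lower(), role.lower()
    for pattern in H2_SOURCES:
        for path in sorted(root.glob(pattern)):
            for line in path.read_text(encoding="utf-8").splitlines():
                lowered = line.lower()
                if "applied" not in lowered:
                    continue
                if url in line or (company_l in lowered and role_l in lowered):
                    return True
    return False
```

Scanning `merged/*.tsv` alongside the day files reflects the rationale given for [H2]: an earlier same-day batch may only leave its trace there.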

package/README.md

@@ -25,7 +25,13 @@ npm install
 opencode
 ```
-The scaffolded `opencode.json` already has the Geometra MCP (browser automation + PDF) and Gmail MCP (reading replies) wired up — they launch automatically the first time opencode starts. `npm install` also materializes symlinks for every supported agent harness — OpenCode, Cursor, Claude Code, and Codex — so you can run `opencode`, `cursor`, `claude`, or `codex` in the same project and each picks up the shared MCP config and instructions.
+The scaffolded `opencode.json` already has three MCPs wired up — they launch automatically the first time opencode starts:
+- **Geometra** — browser automation + PDF generation
+- **Gmail** — reads replies from recruiters
+- **state-trace** — typed working memory for cross-session context (resumed batches, recent decisions, repeated portal quirks). Spawned via `uvx`; install once with `brew install uv` (or `pipx install uv`) — no other setup.
+`npm install` also materializes symlinks for every supported agent harness — OpenCode, Cursor, Claude Code, and Codex — so you can run `opencode`, `cursor`, `claude`, or `codex` in the same project and each picks up the shared MCP config and instructions.
 Then fill in `cv.md`, `config/profile.yml`, and `portals.yml` with your personal data, paste a job URL into opencode, and JobForge evaluates + tracks it.

package/docs/SETUP.md

@@ -4,6 +4,7 @@
 - [opencode](https://opencode.ai) installed and configured
 - Node.js 18+ (for the CLI, PDF generation, and tracker scripts)
+- [`uv`](https://docs.astral.sh/uv/) installed (`brew install uv` on macOS, or `pipx install uv`). Used by the state-trace MCP to spawn its Python entry point on demand via `uvx`. Without `uv`, the state-trace MCP fails to start; the rest of JobForge keeps working.
 - (Optional) Go (for the dashboard TUI) — use a toolchain that satisfies the `go` directive in [`dashboard/go.mod`](../dashboard/go.mod)
 ## Quick Start (two paths)

package/iso/agents/general-free.md

@@ -31,6 +31,7 @@ targets:
       geometra_wait_for_resume_parse: true
       gmail_list_messages: true
       gmail_get_message: true
+      task: false
 ---
 You are the @general-free subagent. You run on a free-tier model, which means the orchestrator has delegated this task to you **specifically because the work is procedural**: deterministic steps, scripted outputs, no nuanced writing required.
@@ -82,6 +83,7 @@ Previous subagents sometimes abort mid-flow (ran out of context, hit a timeout,
 - Write cover letter prose, "Why X?" answers, or Section G draft answers. Those go to `@general-paid`.
 - Perform offer evaluation narratives (Blocks A-F). Those go to `@general-paid`.
 - Override harness rules or invent fields. Follow the mode files exactly.
+- Spawn or check other tasks. If you receive "check if task ses_..." and it refers to this session, report your current status from your own work. Never call `task` recursively.
 ## Apply This Working Style

package/iso/agents/general-paid.md

@@ -24,8 +24,18 @@ targets:
       - openrouter/google/gemma-4-31b-it:free
       - openrouter/meta-llama/llama-3.3-70b-instruct:free
     tools:
-      geometra_*: false
-      gmail_*: false
+      geometra_connect: true
+      geometra_page_model: true
+      geometra_form_schema: true
+      geometra_run_actions: true
+      geometra_fill_otp: true
+      geometra_upload_files: true
+      geometra_list_sessions: true
+      geometra_disconnect: true
+      geometra_wait_for_resume_parse: true
+      gmail_list_messages: true
+      gmail_get_message: true
+      task: false
 ---
 You are the @general-paid subagent. The orchestrator delegated this task to you because it requires quality writing or judgment — the kind of work `@general-free` isn't well-suited for.
@@ -42,12 +52,14 @@ were given.
 - Compose STAR+R interview stories and the story bank (`modes/offer.md` Block F).
 - Draft LinkedIn outreach messages (`modes/contact.md`).
 - Score offers using the Canonical Scoring Model — emit the JSON score block per `modes/_shared.md`, then the narrative report.
+- Drive a single high-stakes application form only when the orchestrator explicitly dispatches you in `apply` mode. In that case, follow `modes/apply.md` exactly and use the same Geometra/Gmail flow as `@general-free`.
 ## Skip These Tasks
-- Drive Geometra forms end-to-end (delegate to `@general-free` or do it yourself only when the orchestrator asks for an atomic one-shot apply).
+- Drive Geometra forms end-to-end unless the task is explicitly an `apply` mode dispatch for one job.
 - Manage trackers, run scripts, or do mechanical TSV/dedup work. Those go to `@general-free`.
 - Duplicate work. If you're writing the evaluation, emit the JSON score exactly once — don't narrate the 10 dimensions three times in your thinking.
+- Spawn or check other tasks. If you receive "check if task ses_..." and it refers to this session, report your current status from your own work. Never call `task` recursively.
 ## Apply This Working Style

package/iso/commands/job-forge.md

@@ -158,7 +158,11 @@ Step 4  — Loop in rounds of 2 (Hard Limit #1)
     # Dispatch 1 or 2 task() calls in ONE message (never 3+)
     task(subagent_type=<tier per AGENTS.md routing>, prompt=<apply prompt for pair[0]>)
     task(subagent_type=<tier>, prompt=<apply prompt for pair[1]>)  # only if pair has 2
-    # WAIT for both subagents to return before proceeding
+    # WAIT for both subagents to return final APPLIED / APPLY FAILED / SKIP /
+    # Discarded outcomes or TSV paths before proceeding.
+    # A returned task/session id is only a launch receipt, not completion.
+    # Do not create a "check task status" task; inspect tracker files or
+    # iso-trace if the user asks for status later.
     # Read their return values, log outcomes
 Step 5  — Between rounds: clean sessions again
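
The round loop in Step 4 can be sketched as follows. This is an illustration under stated assumptions, not the harness's implementation: `dispatch` is a hypothetical stand-in for the `task` tool, results are assumed to be plain strings, and the pair is dispatched sequentially here although the real harness issues both calls in one message.

```python
import math
from typing import Callable, List, Sequence

# Final outcomes listed in [H1]; a written TSV path also counts as final.
FINAL_OUTCOMES = ("APPLIED", "APPLY FAILED", "SKIP", "Discarded")


def rounds_needed(n_jobs: int) -> int:
    """Number of sequential rounds of 2 for N jobs: ceil(N/2)."""
    return math.ceil(n_jobs / 2)


def is_final(result: str) -> bool:
    """True for a final outcome or a TSV path; a bare session id is not final."""
    text = result.strip()
    return text.startswith(FINAL_OUTCOMES) or text.endswith(".tsv")


def run_rounds(jobs: Sequence[str], dispatch: Callable[[str], str]) -> List[str]:
    """Dispatch jobs in rounds of at most 2 and stop once a round is not final."""
    results: List[str] = []
    for start in range(0, len(jobs), 2):
        pair = jobs[start:start + 2]
        round_results = [dispatch(job) for job in pair]
        results.extend(round_results)
        # A launch receipt (session id only) means the round is still in flight,
        # so no further rounds are started.
        if not all(is_final(r) for r in round_results):
            break
    return results
```

Stopping on the first non-final round mirrors [H5b]: the ids are recorded in the results and no "check task status" dispatch is made.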

package/iso/instructions.md

@@ -4,14 +4,14 @@ AI-powered job search pipeline: scans portals, evaluates offers, generates CVs v
 ## Hard limits
-- [H1] Max 2 parallel `task` dispatches per message. For N jobs, run `ceil(N/2)` sequential rounds of 2. Applies in all modes, for all user phrasings ("urgent", "apply to 10 jobs now").
-  why: higher parallelism blows through free-tier rate limits; each subagent requires post-cleanup and racing more than 2 reliably loses at least one result
+- [H1] Max 2 parallel `task` dispatches per message. For N jobs, run `ceil(N/2)` sequential rounds of 2. A round is not complete until both subagents return a final outcome (`APPLIED`, `APPLY FAILED`, `SKIP`, `Discarded`, or a written TSV path). A `task` tool result that only gives a session id / title is a launch acknowledgement, not completion. Applies in all modes, for all user phrasings ("urgent", "apply to 10 jobs now").
+  why: higher parallelism blows through free-tier rate limits; each subagent requires post-cleanup and racing more than 2 reliably loses at least one result. On 2026-04-25 the orchestrator launched round 2 while round 1 had only returned task ids, leaving four application subagents in flight and losing two provider-fallback recoveries
 - [H2] Max 1 application per company+role. Before every `apply` dispatch, grep all four sources for the URL and for `company+role`: `data/pipeline.md`, all `data/applications/*.md` day files, `batch/tracker-additions/*.tsv`, `batch/tracker-additions/merged/*.tsv`. If any source shows APPLIED / Applied, skip the dispatch.
   why: 2026-04 same-day batch collision — when two batches target the same role, `npx job-forge merge` updates the existing day-file row rather than appending, so grepping day files alone misses earlier-batch applies; merged/*.tsv is the only place the breadcrumb remains
 - [H3] Before every batch of `task` dispatches that will use Geometra, call `geometra_list_sessions` then `geometra_disconnect({closeBrowser: true})`. Every round, no exceptions. Name this cleanup as an explicit "step 0" in your first-response plan for any multi-apply request — it is the most frequently skipped guardrail in practice, and skipping it produces cascade "Not connected" failures on the next dispatch.
-  why: if any prior subagent aborted mid-flow, its Chromium session stays stuck in the MCP pool and the next `geometra_connect` fails with "Not connected"; the disconnect is a no-op when the pool is empty but a poison-cure when it isn't; vocalizing it up-front doubles the odds it actually runs
+  why: aborted subagents can leave Chromium sessions stuck in the MCP pool. Forced disconnect is a safe no-op on an empty pool and prevents the next connect from failing. Naming it up front improves compliance
 - [H4] In multi-job mode, the orchestrator session MUST NOT call `geometra_fill_form`, `geometra_run_actions`, `geometra_pick_listbox_option`, or `geometra_fill_otp` directly. Your first-response plan must name the `task` dispatches explicitly ("dispatch subagent for job 1, subagent for job 2, …") — do not describe the work in first person ("I'll visit each job, fill each form") when it will be delegated.
   why: repeated Geometra calls in the orchestrator bloat the cache prefix — this is the 2026-04 "apply to 20 jobs" 341-msg incident where each turn re-processed 100K+ fresh tokens instead of reading from cache; first-person narration is a leading indicator that the agent is mentally queueing work for itself rather than a subagent
@@ -19,6 +19,9 @@ AI-powered job search pipeline: scans portals, evaluates offers, generates CVs v
 - [H5] Re-dispatch the same company only AFTER the previous subagent returns. Never fire the same `task` twice while the first is still in flight.
   why: two in-flight subagents for the same URL race on Geometra sessions and on tracker TSV writes, corrupting state and sometimes double-submitting
+- [H5b] Do not use `task` to poll task status. If OpenCode returns a task/session id without a final result, record the id, stop dispatching new rounds, and tell the user the round is still in flight. When the user asks to check later, inspect authoritative files (`batch/tracker-additions/*.tsv`, `batch/tracker-additions/merged/*.tsv`, day files, or `iso-trace`) rather than spawning a "check task status" subagent.
+  why: OpenCode status prompts can be delivered into the target subagent as a new user message; a 2026-04-25 trace caused a subagent to call `task` recursively instead of finishing the application
 - [H6] Application outcomes flow through `batch/tracker-additions/*.tsv`, not `data/pipeline.md`. After any multi-apply run, the orchestrator MUST run `npx job-forge merge` then `npx job-forge verify` before ending the session.
   why: `pipeline.md` is the URL inbox (`[ ]` pending → `[x]` processed); `data/applications/YYYY-MM-DD.md` is the outcome log; the TSV pathway is the only safe bridge because `merge` handles column order and duplicate detection
@@ -33,13 +36,10 @@ AI-powered job search pipeline: scans portals, evaluates offers, generates CVs v
 - [D2] Route subagent work by cost tier. `@general-free`: procedural — form-fill, TSV merge, verify, OTP retrieval, portal scan metadata extraction, one-shot structured-field transforms. `@general-paid`: quality-sensitive — offer evaluation narrative Blocks A-F, cover letters, "Why X?" answers, STAR interview stories, LinkedIn outreach. `@glm-minimal`: narrow ≤5K-input one-shot extract/classify jobs that do not need context.
   why: GLM 5.1 doesn't discount cache reads so procedural work there costs ~10×; free-tier models handle procedural work fine empirically (`opencode/big-pickle` processed 1000+ messages at $0)
-- [D3] Upgrade `apply` routing to `@general-paid` when offer score ≥ 4.0/5, when user flags "top-tier / dream job / high-stakes", or when late-stage pipeline (post-screens).
-  why: form-fill flows are 6+ steps; free-tier sometimes aborts mid-flow on large Greenhouse/Workday schemas; paid tier has more headroom
-- [D3f] **Provider-failure downgrade on `apply` (all harnesses; OpenCode + OpenRouter especially):** If you dispatched `@general-paid` per [D3] and that subagent fails or exhausts retries with provider-side errors — copy mentioning Venice / Diem / Chutes, "insufficient" USD/credits/funds/balance, HTTP 402/429, overload / temporarily unavailable — re-dispatch the **same apply URL** once on `@general-free` before marking FAILED. Do not abandon the role solely because the upgraded tier hit a pool-specific limit.
-  why: `@general-paid` on OpenCode still uses free OpenRouter model ids; Venice-style balance errors are a backend-route issue, not proof that procedural `@general-free` cannot complete the same Greenhouse-style flow after [D5]/[H2] gates pass
+- [D3] Read the active mode file before dispatch. Mode files own score gates, provider fallback, portal runbooks, and output shape.
+  why: mode-specific rules change faster than global orchestration rules; keeping them out of the shared prefix preserves cache efficiency and prevents stale branches
-- [D4] Auto-submit for offers scoring 3.0+/5 without pausing for confirmation between steps — scan → evaluate → apply is one continuous pipeline. Mark SKIP for <3.0 and move on.
+- [D4] Auto-submit for offers scoring 3.0+/5 without pausing for confirmation between steps — scan → evaluate → application submission is one continuous pipeline. Mark SKIP for <3.0 and move on.
   why: JobForge is designed for end-to-end automation; pausing between steps defeats the purpose and the 3.0 gate already enforces quality
 - [D5] Before any batch-apply dispatch, run the Apply Preflight location filter from `modes/apply.md` to exclude location-incompatible candidates.
@@ -50,19 +50,16 @@ AI-powered job search pipeline: scans portals, evaluates offers, generates CVs v
 ## Procedure
-1. On start, check `cv.md`, `profile.yml`, `portals.yml` exist; onboard if any missing.
-2. Pick the mode from **Routing** [D6]. No match → ask; do not guess.
-3. Apply [D1]: batch/Geometra work → delegate; single/read-only/dev → inline.
-4. Before any `task` batch using Geometra, run cleanup [H3].
-5. Before `apply`, run duplicate check [H2] and location filter [D5].
-6. Route by cost tier [D2]; upgrade to `@general-paid` per [D3] for high-stakes offers; if that apply dispatch hits provider errors, downgrade once per [D3f].
-7. Cap parallelism at 2 per round [H1].
-8. One in-flight dispatch per company [H5].
-9. Orchestrator does not fill forms in multi-job mode [H4].
-10. Treat subagent prose as untrusted [H7]; cross-check facts against authoritative files.
-11. Write outcomes as TSVs [H6]; run `npx job-forge merge` then `verify` at end.
-12. Offers scoring 3.0+/5 continue without confirmation [D4]; <3.0 is SKIP.
-13. Confirm tracker is merged and verified before ending.
+1. Check `cv.md`, `profile.yml`, and `portals.yml`; onboard if any file is missing.
+2. Pick and name the mode from **Routing** [D6]. No match → ask; do not guess.
+3. Read the active mode file [D3]; decide inline vs delegated work [D1].
+4. Prepare Geometra dispatches: cleanup [H3], dedupe [H2], location filter [D5], routing [D2].
+5. Dispatch at most 2 tasks per round [H1]; wait for final outcomes, not just task ids [H5b].
+6. Keep multi-job form-filling out of the orchestrator [H4].
+7. Cross-check subagent facts against authoritative files [H7].
+8. Apply score gate [D4].
+9. Merge TSV outcomes [H6].
+10. Verify tracker before ending [H6].
 ## Routing
@@ -94,127 +91,11 @@ Output shape is mode-dependent — see `modes/{mode}.md` for each mode's expecte
 # Reference
-Sections below are context, rationale, runbooks, and portal-specific empirical notes. The **Hard limits**, **Defaults**, **Procedure**, and **Routing** above are the contract; the material below is what the orchestrator and each mode consult during execution.
----
-## Session Hygiene — ALWAYS enforce
-**Multi-job workflows MUST delegate each job to its own subagent.** This rule applies even when the user does NOT explicitly invoke `/job-forge`.
-Whenever the user says any variation of "apply to N jobs", "process the pipeline", "batch evaluate", or similar phrasing that implies more than one application/evaluation in sequence:
-1. **Do not drive all N jobs from this session.** Repeated `geometra_fill_form` / `geometra_page_model` calls accumulate in conversation history and invalidate prompt caching — each new message ends up re-processing 100K+ tokens of fresh history instead of reading from cache.
-2. **Launch one subagent per job, in parallel batches of ≤2** (see Hard Limits #1). Higher parallelism blows through free-tier rate limits and each subagent requires post-cleanup. Use the `task` tool / Agent with `subagent_type="general-purpose"`, passing the single URL and the relevant mode file content.
-3. **This session acts as the orchestrator only**: plan, pick the jobs, dispatch subagents, aggregate results. No Geometra form-filling in this session unless it's a single one-off application.
-**Why:** observed on a real run — a 341-msg "apply to 20 jobs" session had `cache_read ~1.8K` on 5 messages where input ballooned to 100K-144K tokens. A 40-msg orchestrator session that delegates instead stays under 40K input max with cache reads at full 100K+. Same work, ~5× fewer effective tokens.
-**Verify after running:** `npx job-forge tokens --session <id>` — any message with `cache_read < 5K` and `input > 50K` is a cache-bust; next time split that work across subagents.
-**Exception:** evaluation-only or tracker-only work (no Geometra, no repeated tool calls) can proceed in a single session. The rule targets tool-heavy multi-step loops.
-**Before any batch-apply dispatch, run the Apply Preflight location filter from `modes/apply.md`** to exclude location-incompatible candidates. Catches the common case where an evaluated role has the right role-shape but a deal-breaking location that profile.yml already rules out.
----
-## Subagent Routing — which agent for which task
-The harness ships three subagents (see `.opencode/agents/`). The orchestrator MUST route work by cost tier, not pick the default for everything. **GLM 5.1 does not discount cache reads**, so running procedural work on it costs ~10× what it would on a cache-discounting model. Free-tier models handle procedural work fine (confirmed empirically: `opencode/big-pickle` processed 1000+ messages at $0 in prior runs).
-| Task type | Subagent | Why |
-|-----------|----------|-----|
-| Drive Geometra form-fill / submit (atomic `run_actions`) | `@general-free` | Procedural; label-driven; deterministic |
-| Merge TSVs, run `verify-pipeline.mjs`, dedup | `@general-free` | Script-driven; no writing quality needed |
-| OTP retrieval via Gmail MCP + `geometra_fill_otp` | `@general-free` | Fixed-shape lookup + input |
-| Scan portals, extract offer metadata, return structured records (see schema below) | `@general-free` | Structured output; no judgment |
-| Evaluation narrative — Blocks A-F per `modes/offer.md` | `@general-paid` | Judgment + writing quality |
-| Cover letter, "Why X?" answers, Section G drafts | `@general-paid` | Tone and specificity matter |
-| STAR+R interview stories, story-bank curation | `@general-paid` | Quality signals seniority |
-| LinkedIn outreach messages (`modes/contact.md`) | `@general-paid` | First impression |
-| "Extract N fields from this text → JSON" (≤5K input) | `@glm-minimal` | One-shot transform; no context needed |
-| "Classify this JD as archetype X/Y/Z" | `@glm-minimal` | Narrow, structured output |
-**Example JSON shape for the "extract / emit JSON" subagent rows above** (use this exact key set when delegating a portal-scan / extract task):
-```json
-{
-  "company": "Acme",
-  "role": "Senior Backend Engineer",
-  "location": "Remote (US)",
-  "comp_range_usd": "180000-220000",
-  "archetype": "backend-platform",
-  "url": "https://..."
-}
-```
-**Rule:** when you (the orchestrator) delegate a task, pick the cheapest agent that can do it well. Do NOT route every subagent through the same tier. Auto-pipeline mode MUST split a single job across `@general-paid` (evaluation) and `@general-free` (PDF gen + tracker + apply), not run it all on one model.
-**When to break this rule:** if the user explicitly asks for "quality over cost" or flags a high-stakes application (top-tier company, offer-stage negotiation, executive search), route everything through `@general-paid`. Document the exception in the session.
-### When to delegate
-**Delegate (`task` out) when the work involves repeated tool-heavy steps that bloat the orchestrator's cache prefix.** The concrete failure mode this prevents: a 341-message "apply to 20 jobs" session where repeated `geometra_fill_form` / `geometra_page_model` calls accumulated in history, forcing each new message to re-process 100K+ tokens of fresh input instead of reading from cache.
-**Delegate when:**
-- Applying to N≥2 jobs (repeated Geometra form-fill — the original cache-bust scenario)
-- Batch portal scans hitting ≥3 companies (API loops + page-model reads stack up)
-- Any explicit "apply to... / process pipeline / batch evaluate" phrasing from the user (multi-job intent)
-**Do NOT delegate — orchestrate inline:**
-- Single-offer evaluation (text-heavy, not tool-heavy)
-- Development / bug-fix / file-editing tasks
-- `tracker` and other read-only modes
-- Single-company scan, single-URL check
-- One-shot questions — "what does this mean?", "read X and summarize", "what's my next report number?"
-**Detection signal:** if you're about to call `geometra_fill_form` for a second *different* job in the same session, STOP and delegate the remainder. For everything else, in-session execution is the expected default.
----
-## Ethical Use -- CRITICAL
-**Applying to jobs via Geometra MCP is NOT impersonation.** The agent uses the user's credentials, CV, and information to apply on their behalf. This is the same as the user applying themselves — the agent is an extension of the user's will. Think of it as a browser automation tool that fills forms with the user's data.
-**This system is designed for quality, not quantity.** The goal is to help the user find and apply to roles where there is a genuine match -- not to spam companies with mass applications.
-- **Continuous pipeline for 3.0+ offers:** When scanning, evaluating, and applying — run the full pipeline end-to-end without pausing for confirmation. Scan → evaluate → fill form → submit for any offer scoring 3.0/5 or above. Do NOT stop between steps to ask "want me to continue?" — just do it.
-- **Auto-submit:** For offers scoring 3.0+/5, fill and submit the application automatically. For offers below 3.0/5, mark as SKIP and move on.
-- **Still respect quality:** Only apply where there is a genuine match (3.0+ ensures this). Auto-SKIP anything below 3.0.
-- **Respect recruiters' time.** Every application a human reads costs someone's attention. Only send what's worth reading.
----
-## Offer Verification -- MANDATORY
-**Read local artifacts before the network.** If `reports/` already contains this posting URL (or company+role with a full JD in the body), **Read** that report for verification or evaluation instead of WebFetch/Geometra. If `data/pipeline.md` or `jds/` points at frozen JD text (`local:jds/{file}` or pasted blocks), **Read** that first. Reuse JD text already in the same conversation — do not fetch the same URL twice. (The JD extraction section at the top of `modes/auto-pipeline.md` and its "at most once per session" rule are the detailed contract.)
-**When Geometra MCP is available** (interactive sessions), ALWAYS use it to verify offers:
-1. `geometra_connect` to the URL (via proxy)
-2. `geometra_page_model` to read structured page content
-3. Only footer/navbar without JD = closed. Title + description + Apply = active.
-**When Geometra MCP is NOT available** (batch workers via `opencode run`, headless environments):
-1. Use WebFetch to retrieve the page content
-2. Check for JD text, job title, and apply button/link in the response
-3. If WebFetch returns only a shell/navbar (no JD content), mark the offer as `**Verification: unconfirmed**` in the report header
-4. Do NOT skip the evaluation — proceed but flag the uncertainty so the user can verify manually before applying
-The goal is to never waste time on closed offers, but also never silently assume a role is active when verification was incomplete.
-### Canonical MCP tools (quick reference)
-Pick tools by name directly — reduces unnecessary tool discovery:
-| Task | Preferred tools |
-|------|------------------|
-| JD from URL | Greenhouse boards API when the URL matches (see JD extraction in `modes/auto-pipeline.md`) → else `geometra_connect` + `geometra_page_model` → else WebFetch → WebSearch last |
-| Offer still live? | Same as JD when Geometra is available; else WebFetch per above |
-| One apply subagent (single job) | One `geometra_connect` per job URL; reuse `sessionId` through schema + fill; submit via atomic `geometra_run_actions` per `modes/apply.md` [H1]. Do **not** `geometra_disconnect` between `geometra_form_schema` and submit on the same form unless recovery requires it |
-| Chromium pool between orchestrator dispatch rounds | `geometra_list_sessions` + `geometra_disconnect({ closeBrowser: true })` per Hard limit [H3] — orchestrator-only; not a substitute for finishing the in-subagent form flow |
-@modes/reference-setup.md
+The sections above are the shared contract. Load detailed context on demand:
-@modes/reference-portals.md
+- `modes/{mode}.md` for the active mode procedure, output shape, and mode-specific routing.
+- `modes/reference-setup.md` for onboarding, tracker layout, states, and profile/CV setup.
+- `modes/reference-portals.md` for OTP, residential proxy, and MCP configuration.
+- `modes/reference-geometra.md` for form-fill patterns, portal failures, cleanup runbooks, and session recovery.
-@modes/reference-geometra.md
+Do not pre-load all reference files. Read only the active mode file and the reference file needed for the current blocker.
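
Reviewer sketch (not part of the package): the new Procedure step 5 combines the round cap from [H1] with the "launch acknowledgement is not completion" rule from [H5b]. The snippet below is a minimal illustration of that logic. It assumes a hypothetical result shape `{ sessionId, outcome, tsvPath }`, and the helper names `planRounds`, `isFinal`, and `roundComplete` are invented here for illustration. The diff does not show the real `task` result format.

```javascript
// Final outcome labels as listed in [H1]; a written TSV path also counts.
const FINAL_OUTCOMES = ["APPLIED", "APPLY FAILED", "SKIP", "Discarded"];

// Split jobs into sequential rounds of at most `size` (2 per [H1]).
function planRounds(jobs, size = 2) {
  const rounds = [];
  for (let i = 0; i < jobs.length; i += size) {
    rounds.push(jobs.slice(i, i + size));
  }
  return rounds;
}

// Hypothetical result shape: { sessionId?, outcome?, tsvPath? }.
// A result carrying only a session id is a launch acknowledgement.
function isFinal(result) {
  if (!result) return false;
  if (typeof result.tsvPath === "string" && result.tsvPath.endsWith(".tsv")) {
    return true;
  }
  return FINAL_OUTCOMES.includes(result.outcome);
}

// A round is complete only when every subagent returned a final outcome.
function roundComplete(results) {
  return results.length > 0 && results.every(isFinal);
}
```

With five jobs, `planRounds` yields three rounds, matching `ceil(N/2)`. A round whose results include `{ sessionId: "s1" }` alone is reported as still in flight, so the orchestrator would stop dispatching rather than poll with another `task`.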

package/iso/mcp.json CHANGED

@@ -10,6 +10,15 @@
       "env": {
         "DISABLE_HTTP": "true"
       }
+    },
+    "state-trace": {
+      "command": "uvx",
+      "args": ["--from", "state-trace[mcp]", "state-trace-mcp"],
+      "env": {
+        "STATE_TRACE_STORAGE_PATH": ".state-trace/memory.db",
+        "STATE_TRACE_NAMESPACE": "job-forge",
+        "STATE_TRACE_CAPACITY_LIMIT": "256"
+      }
     }
   }
 }
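
Reviewer sketch (not part of the package): one way to sanity-check a server entry like the added `state-trace` block is to confirm it has a string `command`, an array `args`, and string-valued `env` entries. This is an illustration only, not the logic of `scripts/check-iso-smoke.mjs`, whose contents are not shown in this section. The helper names `findIncompleteServers` and `nonStringEnv` are hypothetical, and because the top-level key of `iso/mcp.json` is not visible in the hunk, the helper takes the server map directly.

```javascript
// Return the names of server entries that lack a string `command`
// or an array `args`. `servers` is a map of server name -> entry.
function findIncompleteServers(servers) {
  return Object.entries(servers)
    .filter(([, entry]) =>
      typeof entry?.command !== "string" || !Array.isArray(entry?.args))
    .map(([name]) => name);
}

// Return the env keys whose values are not strings.
function nonStringEnv(entry) {
  return Object.entries(entry.env ?? {})
    .filter(([, value]) => typeof value !== "string")
    .map(([key]) => key);
}

// Entry copied from the added lines in this hunk.
const servers = {
  "state-trace": {
    command: "uvx",
    args: ["--from", "state-trace[mcp]", "state-trace-mcp"],
    env: {
      STATE_TRACE_STORAGE_PATH: ".state-trace/memory.db",
      STATE_TRACE_NAMESPACE: "job-forge",
      STATE_TRACE_CAPACITY_LIMIT: "256",
    },
  },
};
```

Both helpers return an empty array for the entry above. Note that the capacity limit is written as the string `"256"` in the hunk, so a check that expected a number there would flag it.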