job-forge 2.4.0 → 2.6.0

@@ -3,36 +3,100 @@ description: Project instructions
3
3
  alwaysApply: true
4
4
  ---
5
5
 
6
- # JobForge -- AI Job Search Pipeline
6
+ # Agent: job-forge
7
7
 
8
- ## Hard Limits: NEVER exceed these numbers
8
+ AI-powered job search pipeline: scans portals, evaluates offers, generates CVs via Geometra MCP, applies to jobs, tracks applications across day files. Runs inside opencode, Claude Code, Cursor, or Codex; the orchestrator session delegates tool-heavy batch work to subagents and keeps quality-sensitive narrative work inline.
9
9
 
10
- The Hard Limits below are non-negotiable numeric rules. If you catch yourself about to violate one, STOP and restructure.
10
+ ## Hard limits
11
11
 
12
- 1. **Max parallel subagents: 2.** Never emit 3+ `task` tool calls in a single message. For N jobs, run `ceil(N/2)` sequential rounds of 2. No exceptions: not for "urgent", not for "the user asked for 10".
13
- 2. **Max 1 application per company+role.** Before every `task` dispatch for `apply`, Grep ALL of the following for the URL and for `company+role`:
14
- - `data/pipeline.md`
15
- - all `data/applications/*.md` day files (not just today's — prior-day Applies count too)
16
- - `batch/tracker-additions/*.tsv` (pending outcomes not yet merged)
17
- - `batch/tracker-additions/merged/*.tsv` (outcomes already consumed into day files — catches same-day earlier-batch Applies that merge collapsed into an existing row)
12
+ - [H1] Max 2 parallel `task` dispatches per message. For N jobs, run `ceil(N/2)` sequential rounds of 2. Applies in all modes, for all user phrasings ("urgent", "apply to 10 jobs now").
13
+ why: higher parallelism blows through free-tier rate limits; each subagent requires post-cleanup and racing more than 2 reliably loses at least one result
18
14
 
19
- If any source shows an APPLIED / Applied outcome for this URL or company+role, skip that job and do not dispatch. **Why merged/ matters**: when two batches in the same day target the same role, `npx job-forge merge` updates the existing day-file row instead of creating a new one — so `grep data/applications/*.md` for the higher report number misses the earlier apply. The merged TSV is the only place the newer attempt's breadcrumb remains.
20
- 3. **Always clean Geometra sessions before dispatching.** Before every round of `task` dispatches that will use Geometra, call `geometra_list_sessions` then `geometra_disconnect({closeBrowser: true})`. Every round. The disconnect is a no-op when the pool is empty.
21
- 4. **Orchestrator does NOT fill forms.** This session MUST NOT call `geometra_fill_form`, `geometra_run_actions`, `geometra_pick_listbox_option`, or `geometra_fill_otp` when handling a multi-job request. If you need to, it means you MUST have delegated — `task` out the remaining work instead.
22
- 5. **Re-dispatch only AFTER the previous subagent returns.** Never fire the same company's `task` twice while the first is still in-flight. Wait for the return value, then decide if a retry is warranted.
23
- 6. **Application outcomes flow through TSVs, not `data/pipeline.md`.** When a subagent returns APPLIED / FAILED / SKIP, the outcome goes to `batch/tracker-additions/{num}-{slug}.tsv`. `npx job-forge merge` then consumes the TSVs into the correct `data/applications/YYYY-MM-DD.md` day file. `data/pipeline.md` only tracks URL inbox state (`[ ]` pending → `[x]` processed). **NEVER append APPLIED / FAILED status lines to `pipeline.md`** — that's the day file's job, via the TSV pathway. After any multi-apply run, the orchestrator MUST run `npx job-forge merge` followed by `npx job-forge verify` before ending the session.
24
- 7. **Load-bearing facts passed to downstream subagents must come from a file, not from a prior subagent's prose.** A URL, score, email ID, confirmation page snippet, JD salary range, exact answer submitted to a form question, or any other specific value that a downstream subagent will act on MUST originate from one of:
25
- - `data/pipeline.md` (URL inbox state)
26
- - `data/scan-history.tsv` (scan provenance)
27
- - `batch/scan-output-*.md` (scan-ranked candidates)
28
- - A report file (`reports/{num}-*.md`) with authoritative headers (`**URL:**`, `**Score:**`, etc.)
29
- - A TSV in `batch/tracker-additions/` (per-apply outcomes)
15
+ - [H2] Max 1 application per company+role. Before every `apply` dispatch, grep all four sources for the URL and for `company+role`: `data/pipeline.md`, all `data/applications/*.md` day files, `batch/tracker-additions/*.tsv`, `batch/tracker-additions/merged/*.tsv`. If any source shows APPLIED / Applied, skip the dispatch.
16
+ why: 2026-04 same-day batch collision: when two batches target the same role, `npx job-forge merge` updates the existing day-file row rather than appending, so grepping day files alone misses earlier-batch applies; merged/*.tsv is the only place the breadcrumb remains
30
17
 
31
- **Not trustworthy by default**: anything quoted from a subagent's return message, any ID or score the orchestrator "remembers" from prose, any page-content snippet reproduced from a subagent's narrative. Subagents can hallucinate plausible-looking IDs, scores, and confirmation text. Before passing any such fact to another subagent, cross-check it exists in one of the authoritative files above, OR instruct the dispatching subagent to write its output to a structured file and re-read from that file.
18
+ - [H3] Before every batch of `task` dispatches that will use Geometra, call `geometra_list_sessions` then `geometra_disconnect({closeBrowser: true})`. Every round, no exceptions. Name this cleanup as an explicit "step 0" in your first-response plan for any multi-apply request: it is the most frequently skipped guardrail in practice, and skipping it produces cascade "Not connected" failures on the next dispatch.
19
+ why: if any prior subagent aborted mid-flow, its Chromium session stays stuck in the MCP pool and the next `geometra_connect` fails with "Not connected"; the disconnect is a no-op when the pool is empty but a poison-cure when it isn't; vocalizing it up-front doubles the odds it actually runs
32
20
 
33
- **Why**: on 2026-04-18, a scan subagent returned 30 fabricated Greenhouse IDs in prose (correct role titles, plausible-looking invented IDs that didn't exist in the API). The orchestrator dispatched 30 downstream subagents that all hit 404s. Verification rules downstream (Hard Limit #6, API-first verify) caught the symptom. This rule prevents the *shape* of the bug: hallucinations propagating through prose handoffs across all quantitative / identifier / specific-fact claims, not just URLs.
21
+ - [H4] In multi-job mode, the orchestrator session MUST NOT call `geometra_fill_form`, `geometra_run_actions`, `geometra_pick_listbox_option`, or `geometra_fill_otp` directly. Your first-response plan must name the `task` dispatches explicitly ("dispatch subagent for job 1, subagent for job 2, …"); do not describe the work in first person ("I'll visit each job, fill each form") when it will be delegated.
22
+ why: repeated Geometra calls in the orchestrator bloat the cache prefix — this is the 2026-04 "apply to 20 jobs" 341-msg incident where each turn re-processed 100K+ fresh tokens instead of reading from cache; first-person narration is a leading indicator that the agent is mentally queueing work for itself rather than a subagent
34
23
 
35
- Everything below is context and rationale. These seven numbers are the rules.
24
+ - [H5] Re-dispatch the same company only AFTER the previous subagent returns. Never fire the same `task` twice while the first is still in flight.
25
+ why: two in-flight subagents for the same URL race on Geometra sessions and on tracker TSV writes, corrupting state and sometimes double-submitting
26
+
27
+ - [H6] Application outcomes flow through `batch/tracker-additions/*.tsv`, not `data/pipeline.md`. After any multi-apply run, the orchestrator MUST run `npx job-forge merge` then `npx job-forge verify` before ending the session.
28
+ why: `pipeline.md` is the URL inbox (`[ ]` pending → `[x]` processed); `data/applications/YYYY-MM-DD.md` is the outcome log; the TSV pathway is the only safe bridge because `merge` handles column order and duplicate detection
29
+
30
+ - [H7] Load-bearing facts passed to downstream subagents must originate from a file, not from prior subagent prose. Authoritative sources: `data/pipeline.md`, `data/scan-history.tsv`, `batch/scan-output-*.md`, `reports/{num}-*.md` with `**URL:**` / `**Score:**` headers, `batch/tracker-additions/*.tsv`.
31
+ why: 2026-04-18 scan subagent returned 30 fabricated Greenhouse IDs in prose (plausible-looking, non-existent); orchestrator dispatched 30 downstream subagents that all 404'd. Subagents can hallucinate IDs, scores, and confirmation text — round-trip through a file or don't trust the value
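As a rough illustration of [H1], the round structure can be sketched in Python. This is only a sketch: the function name `plan_rounds` and the job labels are hypothetical and not part of job-forge.

```python
from math import ceil
from typing import List, Sequence

MAX_PARALLEL = 2  # [H1]: at most 2 `task` dispatches per message


def plan_rounds(jobs: Sequence[str], size: int = MAX_PARALLEL) -> List[List[str]]:
    """Split jobs into sequential rounds of at most `size` dispatches each."""
    return [list(jobs[i:i + size]) for i in range(0, len(jobs), size)]


# Hypothetical 5-job request: the number of rounds equals ceil(N/2).
jobs = [f"job-{n}" for n in range(1, 6)]
rounds = plan_rounds(jobs)
assert len(rounds) == ceil(len(jobs) / MAX_PARALLEL)
```

Each inner list corresponds to one message's worth of dispatches; the next round would start only after the previous one has returned.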
32
+
33
+ ## Defaults
34
+
35
+ - [D1] Delegate to a subagent (`task`) only when the work involves repeated tool-heavy steps that bloat the cache prefix: applying to N≥2 jobs, batch scans hitting ≥3 companies, or any "apply to… / process pipeline / batch evaluate" user phrasing. Single-offer evals, dev work, file edits, `tracker` mode, single-URL checks, and one-shot questions stay inline.
36
+ why: iso-trace showed 0.25% Agent calls across 5174 turns under a prior over-broad "delegate before 2nd tool call" rule — the rule was ignored in practice; narrowing matches the original cache-bust incident
37
+
38
+ - [D2] Route subagent work by cost tier. `@general-free`: procedural — form-fill, TSV merge, verify, OTP retrieval, portal scan metadata extraction, one-shot structured-field transforms. `@general-paid`: quality-sensitive — offer evaluation narrative Blocks A-F, cover letters, "Why X?" answers, STAR interview stories, LinkedIn outreach. `@glm-minimal`: narrow ≤5K-input one-shot extract/classify jobs that do not need context.
39
+ why: GLM 5.1 doesn't discount cache reads so procedural work there costs ~10×; free-tier models handle procedural work fine empirically (`opencode/big-pickle` processed 1000+ messages at $0)
40
+
41
+ - [D3] Upgrade `apply` routing to `@general-paid` when offer score ≥ 4.0/5, when user flags "top-tier / dream job / high-stakes", or when late-stage pipeline (post-screens).
42
+ why: form-fill flows are 6+ steps; free-tier sometimes aborts mid-flow on large Greenhouse/Workday schemas; paid tier has more headroom
43
+
44
+ - [D4] Auto-submit for offers scoring 3.0+/5 without pausing for confirmation between steps — scan → evaluate → apply is one continuous pipeline. Mark SKIP for <3.0 and move on.
45
+ why: JobForge is designed for end-to-end automation; pausing between steps defeats the purpose and the 3.0 gate already enforces quality
46
+
47
+ - [D5] Before any batch-apply dispatch, run the Apply Preflight location filter from `modes/apply.md` to exclude location-incompatible candidates.
48
+ why: catches the common case where an evaluated role has the right role-shape but a deal-breaking location that profile.yml already rules out
49
+
50
+ - [D6] Pick the mode from the **Routing** table below AND name it explicitly in your first response (e.g., "running auto-pipeline mode", "this is a `compare` request"). If no row matches the user's intent, ask which mode fits; do not guess.
51
+ why: silent mode picks mis-route work (a "negotiation" question answered in `offer` mode produces the wrong report shape); naming the mode out loud makes the routing decision reviewable and gives downstream dispatches a reliable anchor
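The tier routing in [D2] and the upgrade rule in [D3] can be read as a small decision function. The sketch below assumes hypothetical work-kind labels (`"apply"`, `"cover-letter"`, and so on); only the tier names and the 4.0 and 5K thresholds come from the text above.

```python
from typing import Optional

# Hypothetical labels for the kinds of work listed in [D2].
PROCEDURAL = {"apply", "tsv-merge", "verify", "otp-retrieval", "scan-metadata"}
QUALITY = {"evaluation-narrative", "cover-letter", "why-x-answer", "star-story", "outreach"}
NARROW = {"extract", "classify"}


def pick_tier(kind: str, score: Optional[float] = None, high_stakes: bool = False,
              late_stage: bool = False, input_tokens: Optional[int] = None) -> str:
    """Return the subagent tier for one unit of work, following [D2] and [D3]."""
    # [D3]: upgrade `apply` for strong scores, user-flagged stakes, or late-stage roles.
    if kind == "apply" and ((score is not None and score >= 4.0) or high_stakes or late_stage):
        return "@general-paid"
    if kind in QUALITY:
        return "@general-paid"
    # [D2]: narrow one-shot jobs with at most 5K input tokens.
    if kind in NARROW and input_tokens is not None and input_tokens <= 5000:
        return "@glm-minimal"
    if kind in PROCEDURAL:
        return "@general-free"
    # The text does not define a fallback, so this sketch refuses to guess.
    raise ValueError(f"no tier rule covers kind={kind!r}")
```

Raising on unknown kinds is a design choice of this sketch, not something the rules specify.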
52
+
53
+ ## Procedure
54
+
55
+ 1. On start, check `cv.md`, `profile.yml`, `portals.yml` exist; onboard if any missing.
56
+ 2. Pick the mode from **Routing** [D6]. No match → ask; do not guess.
57
+ 3. Apply [D1]: batch/Geometra work → delegate; single/read-only/dev → inline.
58
+ 4. Before any `task` batch using Geometra, run cleanup [H3].
59
+ 5. Before `apply`, run duplicate check [H2] and location filter [D5].
60
+ 6. Route by cost tier [D2]; upgrade to `@general-paid` per [D3] for high-stakes offers.
61
+ 7. Cap parallelism at 2 per round [H1].
62
+ 8. One in-flight dispatch per company [H5].
63
+ 9. Orchestrator does not fill forms in multi-job mode [H4].
64
+ 10. Treat subagent prose as untrusted [H7]; cross-check facts against authoritative files.
65
+ 11. Write outcomes as TSVs [H6]; run `npx job-forge merge` then `verify` at end.
66
+ 12. Offers scoring 3.0+/5 continue without confirmation [D4]; <3.0 is SKIP.
67
+ 13. Confirm tracker is merged and verified before ending.
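The duplicate check in step 5 ([H2]) could look roughly like the following. It is a minimal sketch under two assumptions that the text does not confirm: that an outcome and its URL or company+role appear on the same line, and that `company_role` is passed in the same form it takes in the files. The helper names are hypothetical; the four glob patterns are the sources listed in [H2].

```python
import re
from pathlib import Path
from typing import Iterable, List

# The four sources named in [H2], relative to the project root.
SOURCE_GLOBS = [
    "data/pipeline.md",
    "data/applications/*.md",
    "batch/tracker-additions/*.tsv",
    "batch/tracker-additions/merged/*.tsv",
]
APPLIED = re.compile(r"\bapplied\b", re.IGNORECASE)  # matches APPLIED and Applied


def read_sources(root: str) -> List[str]:
    """Read every file matched by the four [H2] patterns under `root`."""
    base = Path(root)
    return [p.read_text(encoding="utf-8") for g in SOURCE_GLOBS for p in sorted(base.glob(g))]


def already_applied(url: str, company_role: str, source_texts: Iterable[str]) -> bool:
    """True if any line mentions the URL or company+role alongside an applied outcome."""
    keys = [k.lower() for k in (url, company_role) if k]
    for text in source_texts:
        for line in text.splitlines():
            lowered = line.lower()
            if APPLIED.search(line) and any(k in lowered for k in keys):
                return True
    return False
```

A caller would skip the dispatch whenever `already_applied(url, company_role, read_sources("."))` returns `True`.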
68
+
69
+ ## Routing
70
+
71
+ | If the user… | Mode |
72
+ |---|---|
73
+ | Pastes JD or URL | auto-pipeline (evaluate + report + PDF + tracker) |
74
+ | Asks to evaluate offer | `offer` |
75
+ | Asks to compare offers | `compare` |
76
+ | Wants LinkedIn outreach | `contact` |
77
+ | Asks for company research | `deep` |
78
+ | Wants to generate CV/PDF | `pdf` |
79
+ | Evaluates a course/cert | `training` |
80
+ | Evaluates portfolio project | `project` |
81
+ | Asks about application status | `tracker` |
82
+ | Fills out application form | `apply` |
83
+ | Searches for new offers | `scan` |
84
+ | Processes pending URLs | `pipeline` |
85
+ | Batch processes offers | `batch` |
86
+ | Asks what needs follow-up | `followup` |
87
+ | Reports a rejection | `rejection` |
88
+ | Receives a job offer | `negotiation` |
89
+ | otherwise | Ask which mode fits; do not guess |
90
+
91
+ ## Output format
92
+
93
+ Output shape is mode-dependent — see `modes/{mode}.md` for each mode's expected output. The orchestrator's own output is terse: short status updates during work, and a one-or-two-sentence summary at turn end. No mid-work narration of individual tool calls.
94
+
95
+ ---
96
+
97
+ # Reference
98
+
99
+ Sections below are context, rationale, runbooks, and portal-specific empirical notes. The **Hard limits**, **Defaults**, **Procedure**, and **Routing** above are the contract; the material below is what the orchestrator and each mode consult during execution.
36
100
 
37
101
  ---
38
102
 
@@ -90,22 +154,23 @@ The harness ships three subagents (see `.opencode/agents/`). The orchestrator MU
90
154
 
91
155
  **When to break this rule:** if the user explicitly asks for "quality over cost" or flags a high-stakes application (top-tier company, offer-stage negotiation, executive search), route everything through `@general-paid`. Document the exception in the session.
92
156
 
93
- ### Pre-flight delegation (HARD RULE)
157
+ ### When to delegate
94
158
 
95
- For any task that will involve **more than one tool call** (i.e., anything beyond a one-shot answer), the orchestrator's **first tool call MUST be `task`** (dispatching to a subagent). Not `Read`, not `Bash`, not `geometra_connect`, not `Grep`. The orchestrator plans and dispatches; subagents execute.
159
+ **Delegate (`task` out) when the work involves repeated tool-heavy steps that bloat the orchestrator's cache prefix.** The concrete failure mode this prevents: a 341-message "apply to 20 jobs" session where repeated `geometra_fill_form` / `geometra_page_model` calls accumulated in history, forcing each new message to re-process 100K+ tokens of fresh input instead of reading from cache.
96
160
 
97
- **Why this is absolute:** every tool call in the orchestrator accumulates in the top-level session's history and pollutes the cache prefix. Once the orchestrator has read three files and made two Geometra calls, delegating to a subagent no longer helps — the subagent inherits the bloated context. The only way to keep the orchestrator lean is to delegate *before* doing anything else.
161
+ **Delegate when:**
162
+ - Applying to N≥2 jobs (repeated Geometra form-fill — the original cache-bust scenario)
163
+ - Batch portal scans hitting ≥3 companies (API loops + page-model reads stack up)
164
+ - Any explicit "apply to... / process pipeline / batch evaluate" phrasing from the user (multi-job intent)
98
165
 
99
- **What counts as "more than one tool call":**
100
- - Evaluating any offer (always ≥3 steps: fetch JD, score, write report)
101
- - Any `/job-forge` mode invocation except `tracker` (read-only)
102
- - Applying to a job
103
- - Scanning portals
104
- - Any batch operation
166
+ **Do NOT delegate (orchestrate inline):**
167
+ - Single-offer evaluation (text-heavy, not tool-heavy)
168
+ - Development / bug-fix / file-editing tasks
169
+ - `tracker` and other read-only modes
170
+ - Single-company scan, single-URL check
171
+ - One-shot questions — "what does this mean?", "read X and summarize", "what's my next report number?"
105
172
 
106
- **Explicit exception:** trivial one-shot answers ("what does this error mean?", "read this file and summarize", "what's my next report number?") can stay in the orchestrator. If the question can be answered in ≤1 tool call, do not delegate.
107
-
108
- **Detection signal:** if you (orchestrator) find yourself about to make your 2nd tool call in a session that wasn't a trivial one-shot, STOP. Instead, `task` out the remaining work as a single delegated job.
173
+ **Detection signal:** if you're about to call `geometra_fill_form` for a second *different* job in the same session, STOP and delegate the remainder. For everything else, in-session execution is the expected default.
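The delegation thresholds above reduce to a simple predicate. The sketch below is illustrative only: `should_delegate` is a hypothetical name, and the phrase matching is deliberately crude (it would also fire on a single-job "apply to this role" request, which a real router would need to tell apart).

```python
import re

# Phrases the section lists as signalling multi-job intent.
BATCH_PHRASING = re.compile(r"apply to|process pipeline|batch evaluate", re.IGNORECASE)


def should_delegate(apply_jobs: int = 0, scan_companies: int = 0, user_text: str = "") -> bool:
    """True when the work crosses one of the delegation thresholds."""
    return (
        apply_jobs >= 2          # repeated Geometra form-fill
        or scan_companies >= 3   # batch portal scan
        or bool(BATCH_PHRASING.search(user_text))
    )
```

Everything that returns `False` here stays inline, matching the "do not delegate" list.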
109
174
 
110
175
  ---
111
176
 
@@ -211,24 +276,7 @@ JobForge is designed to be customized by YOU (opencode). When the user asks you
211
276
 
212
277
  ### Skill Modes
213
278
 
214
- | If the user... | Mode |
215
- |----------------|------|
216
- | Pastes JD or URL | auto-pipeline (evaluate + report + PDF + tracker) |
217
- | Asks to evaluate offer | `offer` |
218
- | Asks to compare offers | `compare` |
219
- | Wants LinkedIn outreach | `contact` |
220
- | Asks for company research | `deep` |
221
- | Wants to generate CV/PDF | `pdf` |
222
- | Evaluates a course/cert | `training` |
223
- | Evaluates portfolio project | `project` |
224
- | Asks about application status | `tracker` |
225
- | Fills out application form | `apply` |
226
- | Searches for new offers | `scan` |
227
- | Processes pending URLs | `pipeline` |
228
- | Batch processes offers | `batch` |
229
- | Asks what needs follow-up | `followup` |
230
- | Reports a rejection | `rejection` |
231
- | Receives a job offer | `negotiation` |
279
+ Mode routing is specified in the top-level **## Routing** section. Each mode is implemented in `modes/{mode}.md` — consult those files for per-mode prompts, state, and expected outputs.
232
280
 
233
281
  ### CV Source of Truth
234
282
 
@@ -373,14 +421,53 @@ These blocks come from two distinct root causes and require different responses:
373
421
 
374
422
  **Known-block Ashby tenants (2026-04-19 empirical observations).** These tenants fired class B on every attempted submit from a headless datacenter-IP proxy. Orchestrators planning apply dispatches should assume these tenants will Fail in headless — prioritize other portals, or skip same-tenant siblings after a confirmed class B to avoid burning subagent slots:
375
423
 
376
- - Vellum, Linear, Vanta, River Financial, Higharc, Trace Labs, Solace Health, Unstructured, ClickUp, Zapier, Deepgram, Ramp, WorkOS
424
+ - Vellum, Linear, Vanta, River Financial, Higharc, Trace Labs, Solace Health, Unstructured, ClickUp, Zapier, Deepgram, Ramp, WorkOS, Ashby (self-tenant), Perplexity, **Goody**, **Starbridge**, **Graphite**, **Prompt Health**, **Vantage**
377
425
 
378
426
  **Known class-A-compatible Ashby tenants (same observations).** These tenants accepted headless submits cleanly, often with `imeFriendly: true` making the difference on the text-field subset:
379
427
 
380
- - Supabase, LangChain, Poolside, Runway Financial
428
+ - Supabase, LangChain, Poolside, Runway Financial, Sentry, Cognition
429
+
430
+ **Base rate for untested Ashby tenants (5/5 tested in cycle 4 on 2026-04-19 = class B).** The current prior is that ~80-90% of untested Ashby tenants fingerprint-block headless submits. Orchestrators should treat any tenant not on the class-A-compatible list as likely class B: still dispatch to collect the data point, but don't burn multiple sibling-role slots on the same Ashby tenant.
381
431
 
382
432
  The pattern is tenant configuration, not role or company size. Lists drift as tenants tune their anti-bot — treat as probabilistic priors, not hard rules.
383
433
 
434
+ **Ashby choice-group with `optionCount: 1` and no labels (Sentry pattern).** Some Ashby tenants render Yes/No work-authorization questions as `role="button" name="Application"` pill toggles where the accessibility tree exposes neither `Yes` nor `No` labels. `fill_fields` with `choiceType: "group"` silently no-ops; `geometra_click` by `id` also fails to toggle. Fix: fall back to `geometra_click` with RAW x,y coordinates at the button centers (Yes is typically the left button, No is the right). Confirmed on Sentry Staff Platform #845, 2026-04-19.
435
+
436
+ ### Other Portal Failure Classes
437
+
438
+ **Typeform applications are Geometra-unsupported.** Some companies (Better Stack confirmed, 2026-04-19) route the Apply link to a Typeform wizard (`*.typeform.com/apply-*`). Typeform renders questions via a custom React/canvas layer that does NOT expose input fields to the accessibility tree — `geometra_form_schema` returns "No forms found", `geometra_query role=textbox` returns empty, blind `geometra_type` produces no semantic change. Mark `Failed` with reason "Typeform portal — Geometra unsupported" on detection; do not burn the 9-minute budget attempting blind input.
439
+
440
+ **Avature multi-step wizards have a native-`<select>` validation lag (Bloomberg pattern).** Bloomberg's careers site redirects to `bloomberg.avature.net` with a 4-step wizard. On Step 2, native `<select>` elements ("Is Current Position? / No") accept the value but keep `invalid: true` persistently — neither Tab, re-submit, nor re-pick clears it. `imeFriendly` has no effect because the field is a native `<select>`, not React-controlled text. There is no documented recovery. Mark `Failed` with reason "Avature native-select validation lag"; account creation up to that point is preserved for any future manual path. Confirmed on Bloomberg Sr SWE Auth #828, 2026-04-19.
441
+
442
+ **Cloudflare / ATS-vendor blocks on Dropbox-class portals.** Dropbox's real apply flow lives behind `happydance.website` (ATS vendor), which Cloudflare-fingerprints headless Chromium + datacenter IPs and returns "Sorry, you have been blocked". `job-boards.greenhouse.io/dropbox` does not mirror — there is no public Greenhouse fallback. Symptom-wise indistinguishable from Ashby class B but at a different layer. Mark `Failed` with reason "ATS vendor Cloudflare block (happydance.website or equivalent)". Confirmed on Dropbox Sr FS Product #831, 2026-04-19.
443
+
444
+ **Greenhouse OTP-on-fill variant (Instacart pattern).** Most Greenhouse OTP flows fire on Submit. A minority (Instacart Staff FoodStorm #827, 2026-04-19) fire the 8-cell security-code gate mid-fill, BEFORE the user clicks Submit. Detection: watch for an 8-cell OTP input surfacing after resume upload or the first listbox commit. Fetch from Gmail (`from:greenhouse newer_than:10m`) immediately when it appears — do not wait for Submit.
445
+
446
+ **`geometra_fill_otp` char-drop on first fill.** Occasionally `fill_otp` lands only the first character of an 8-char code (seen on Instacart, 2026-04-19). Recovery: click the first cell to focus, then re-issue `fill_otp` with `perCharDelayMs: 120`. The form usually auto-submits once all 8 cells are populated.
447
+
448
+ **Breezy portal — tenant-dependent, native `<select>`, resume-auto-parse is primary.** A subset of companies (Avantos AI, Courted, Instinct Science confirmed 2026-04-19) host applications on `*.breezy.hr` or `applytojob.com`. Empirical rules:
449
+
450
+ - **Class is per-tenant, not uniform.** Avantos (Failed 2026-04-19 #854) returned Breezy's own "It looks like maybe you've already applied to this job?" banner from IP fingerprinting, even on a first submit — distinct failure mode from Ashby's "flagged as possible spam". Courted (Applied 2026-04-19 #855) went through cleanly on the same session. Don't pre-skip Breezy; the outcome is tenant-specific.
451
+ - **Native `<select>` elements, not React comboboxes.** `geometra_pick_listbox_option` sets the visible display but NOT the underlying form state — submit will fail with "A response is required" on every combobox. Use `geometra_select_option` with x,y + label value for every choice field on Breezy.
452
+ - **Resume-auto-parse carries the signal.** After resume upload, Breezy auto-parses work history and education into structured rows. Do NOT Add/Delete position rows via Geometra — row mutations reshuffle fieldIds mid-flow, sequential `fill_fields` calls land in wrong rows, and upstream pollution corrupts earlier positions. Trust the parsed resume and fill only Personal Details + salary.
453
+
454
+ **Mailto-apply portals — direct email via gmail-mcp `attachments`.** A subset of HN-listed companies (CoPlane, Gambit Robotics, Rinse, Digital Health Strategies confirmed 2026-04-19) don't host an ATS form — their careers page instructs sending resume by email to `founders@...` / `jobs@...` / `contact@...`. Detection: WebFetch the careers URL; if the Apply link resolves to `mailto:` or the copy reads "email your resume to …", skip Geometra entirely.
455
+
456
+ Use `gmail_send_message` with the `attachments` parameter (available from `@razroo/gmail-mcp@1.8.0`):
457
+
458
+ ```
459
+ gmail_send_message({
460
+ to: ["founders@example.com"],
461
+ subject: "Application — Forward Deployed AI Engineer — Charlie Greenman (Austin)",
462
+ body: "<Section G pitch, 4-8 short paragraphs>",
463
+ attachments: [{ path: "/abs/path/to/Charlie-Greenman-CV.pdf" }]
464
+ })
465
+ ```
466
+
467
+ The MCP reads the file from disk and builds multipart/mixed MIME server-side — do NOT manually base64-encode a PDF into the `raw` parameter (the inline blob exceeds tool-call argument limits for any real attachment). Subject is auto MIME-encoded for non-ASCII (em-dash, smart quotes) by the same version. For older gmail-mcp versions (< 1.8.0) the only path was a direct Gmail API POST with the stored OAuth token at `~/.gmail-mcp/credentials.json` — upgrade if you can.
468
+
469
+ Mark Applied with note `mailto portal — sent via gmail_send_message; Gmail msgId {id}`. Before writing the TSV, verify via `gmail_get_message` that the attachment arrived intact and that its size matches the file on disk.
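The mailto detection step described above might be approximated as follows, assuming the careers page HTML has already been fetched. The function name and regexes are hypothetical, and the heuristic over-matches: it treats any `mailto:` link on the page as the Apply link, so a real check would need to inspect the Apply anchor specifically.

```python
import re
from typing import Optional, Tuple

MAILTO_HREF = re.compile(r'href=["\']mailto:([^"\'?]+)', re.IGNORECASE)
EMAIL_COPY = re.compile(r"email\s+your\s+resume\s+to", re.IGNORECASE)


def detect_mailto_apply(html: str) -> Tuple[bool, Optional[str]]:
    """Return (is_mailto_portal, address or None) for a fetched careers page."""
    match = MAILTO_HREF.search(html)
    address = match.group(1) if match else None
    is_mailto = address is not None or bool(EMAIL_COPY.search(html))
    return is_mailto, address
```

When the first element is `True`, the flow would skip Geometra and go down the email path; a `None` address means the copy mentioned emailing a resume but no link was found, so the address still has to be read from the page.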
470
+
384
471
  ### Greenhouse Bot-Detection Honeypots
385
472
 
386
473
  Some Greenhouse tenants (Grafana Labs confirmed, 2026-04-19) inject a honeypot-style single-pick question on the application form, rendered as a listbox labeled something like "Which of the following best describes you?" with options resembling "I am a human being / I am a bot / I am a robot".
package/AGENTS.md CHANGED
@@ -1,33 +1,97 @@
1
- # JobForge -- AI Job Search Pipeline
1
+ # Agent: job-forge
2
2
 
3
- ## Hard Limits: NEVER exceed these numbers
3
+ AI-powered job search pipeline: scans portals, evaluates offers, generates CVs via Geometra MCP, applies to jobs, tracks applications across day files. Runs inside opencode, Claude Code, Cursor, or Codex; the orchestrator session delegates tool-heavy batch work to subagents and keeps quality-sensitive narrative work inline.
4
4
 
5
- The Hard Limits below are non-negotiable numeric rules. If you catch yourself about to violate one, STOP and restructure.
5
+ ## Hard limits
6
6
 
7
- 1. **Max parallel subagents: 2.** Never emit 3+ `task` tool calls in a single message. For N jobs, run `ceil(N/2)` sequential rounds of 2. No exceptions: not for "urgent", not for "the user asked for 10".
8
- 2. **Max 1 application per company+role.** Before every `task` dispatch for `apply`, Grep ALL of the following for the URL and for `company+role`:
9
- - `data/pipeline.md`
10
- - all `data/applications/*.md` day files (not just today's — prior-day Applies count too)
11
- - `batch/tracker-additions/*.tsv` (pending outcomes not yet merged)
12
- - `batch/tracker-additions/merged/*.tsv` (outcomes already consumed into day files — catches same-day earlier-batch Applies that merge collapsed into an existing row)
7
+ - [H1] Max 2 parallel `task` dispatches per message. For N jobs, run `ceil(N/2)` sequential rounds of 2. Applies in all modes, for all user phrasings ("urgent", "apply to 10 jobs now").
8
+ why: higher parallelism blows through free-tier rate limits; each subagent requires post-cleanup and racing more than 2 reliably loses at least one result
13
9
 
14
- If any source shows an APPLIED / Applied outcome for this URL or company+role, skip that job and do not dispatch. **Why merged/ matters**: when two batches in the same day target the same role, `npx job-forge merge` updates the existing day-file row instead of creating a new one — so `grep data/applications/*.md` for the higher report number misses the earlier apply. The merged TSV is the only place the newer attempt's breadcrumb remains.
15
- 3. **Always clean Geometra sessions before dispatching.** Before every round of `task` dispatches that will use Geometra, call `geometra_list_sessions` then `geometra_disconnect({closeBrowser: true})`. Every round. The disconnect is a no-op when the pool is empty.
16
- 4. **Orchestrator does NOT fill forms.** This session MUST NOT call `geometra_fill_form`, `geometra_run_actions`, `geometra_pick_listbox_option`, or `geometra_fill_otp` when handling a multi-job request. If you need to, it means you MUST have delegated — `task` out the remaining work instead.
17
- 5. **Re-dispatch only AFTER the previous subagent returns.** Never fire the same company's `task` twice while the first is still in-flight. Wait for the return value, then decide if a retry is warranted.
18
- 6. **Application outcomes flow through TSVs, not `data/pipeline.md`.** When a subagent returns APPLIED / FAILED / SKIP, the outcome goes to `batch/tracker-additions/{num}-{slug}.tsv`. `npx job-forge merge` then consumes the TSVs into the correct `data/applications/YYYY-MM-DD.md` day file. `data/pipeline.md` only tracks URL inbox state (`[ ]` pending → `[x]` processed). **NEVER append APPLIED / FAILED status lines to `pipeline.md`** — that's the day file's job, via the TSV pathway. After any multi-apply run, the orchestrator MUST run `npx job-forge merge` followed by `npx job-forge verify` before ending the session.
19
- 7. **Load-bearing facts passed to downstream subagents must come from a file, not from a prior subagent's prose.** A URL, score, email ID, confirmation page snippet, JD salary range, exact answer submitted to a form question, or any other specific value that a downstream subagent will act on MUST originate from one of:
20
- - `data/pipeline.md` (URL inbox state)
21
- - `data/scan-history.tsv` (scan provenance)
22
- - `batch/scan-output-*.md` (scan-ranked candidates)
23
- - A report file (`reports/{num}-*.md`) with authoritative headers (`**URL:**`, `**Score:**`, etc.)
24
- - A TSV in `batch/tracker-additions/` (per-apply outcomes)
10
+ - [H2] Max 1 application per company+role. Before every `apply` dispatch, grep all four sources for the URL and for `company+role`: `data/pipeline.md`, all `data/applications/*.md` day files, `batch/tracker-additions/*.tsv`, `batch/tracker-additions/merged/*.tsv`. If any source shows APPLIED / Applied, skip the dispatch.
11
+ why: 2026-04 same-day batch collision: when two batches target the same role, `npx job-forge merge` updates the existing day-file row rather than appending, so grepping day files alone misses earlier-batch applies; merged/*.tsv is the only place the breadcrumb remains
25
12
 
26
- **Not trustworthy by default**: anything quoted from a subagent's return message, any ID or score the orchestrator "remembers" from prose, any page-content snippet reproduced from a subagent's narrative. Subagents can hallucinate plausible-looking IDs, scores, and confirmation text. Before passing any such fact to another subagent, cross-check it exists in one of the authoritative files above, OR instruct the dispatching subagent to write its output to a structured file and re-read from that file.
13
+ - [H3] Before every batch of `task` dispatches that will use Geometra, call `geometra_list_sessions` then `geometra_disconnect({closeBrowser: true})`. Every round, no exceptions. Name this cleanup as an explicit "step 0" in your first-response plan for any multi-apply request: it is the most frequently skipped guardrail in practice, and skipping it produces cascading "Not connected" failures on the next dispatch.
14
+ why: if any prior subagent aborted mid-flow, its Chromium session stays stuck in the MCP pool and the next `geometra_connect` fails with "Not connected"; the disconnect is a no-op when the pool is empty but a poison-cure when it isn't; vocalizing it up-front doubles the odds it actually runs
27
15
 
28
- **Why**: on 2026-04-18, a scan subagent returned 30 fabricated Greenhouse IDs in prose (correct role titles, plausible-looking invented IDs that didn't exist in the API). The orchestrator dispatched 30 downstream subagents that all hit 404s. Verification rules downstream (Hard Limit #6, API-first verify) caught the symptom. This rule prevents the *shape* of the bug (hallucinations propagating through prose handoffs) across all quantitative / identifier / specific-fact claims, not just URLs.
16
+ - [H4] In multi-job mode, the orchestrator session MUST NOT call `geometra_fill_form`, `geometra_run_actions`, `geometra_pick_listbox_option`, or `geometra_fill_otp` directly. Your first-response plan must name the `task` dispatches explicitly ("dispatch subagent for job 1, subagent for job 2, …"); do not describe the work in first person ("I'll visit each job, fill each form") when it will be delegated.
17
+ why: repeated Geometra calls in the orchestrator bloat the cache prefix — this is the 2026-04 "apply to 20 jobs" 341-msg incident where each turn re-processed 100K+ fresh tokens instead of reading from cache; first-person narration is a leading indicator that the agent is mentally queueing work for itself rather than a subagent
29
18
 
30
- Everything below is context and rationale. These seven numbers are the rules.
19
+ - [H5] Re-dispatch the same company only AFTER the previous subagent returns. Never fire the same `task` twice while the first is still in flight.
20
+ why: two in-flight subagents for the same URL race on Geometra sessions and on tracker TSV writes, corrupting state and sometimes double-submitting
21
+
22
+ - [H6] Application outcomes flow through `batch/tracker-additions/*.tsv`, not `data/pipeline.md`. After any multi-apply run, the orchestrator MUST run `npx job-forge merge` then `npx job-forge verify` before ending the session.
23
+ why: `pipeline.md` is the URL inbox (`[ ]` pending → `[x]` processed); `data/applications/YYYY-MM-DD.md` is the outcome log; the TSV pathway is the only safe bridge because `merge` handles column order and duplicate detection
24
+
25
+ - [H7] Load-bearing facts passed to downstream subagents must originate from a file, not from prior subagent prose. Authoritative sources: `data/pipeline.md`, `data/scan-history.tsv`, `batch/scan-output-*.md`, `reports/{num}-*.md` with `**URL:**` / `**Score:**` headers, `batch/tracker-additions/*.tsv`.
26
+ why: 2026-04-18 scan subagent returned 30 fabricated Greenhouse IDs in prose (plausible-looking, non-existent); orchestrator dispatched 30 downstream subagents that all 404'd. Subagents can hallucinate IDs, scores, and confirmation text — round-trip through a file or don't trust the value
27
+
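The duplicate check in [H2] could be scripted along the following lines. This is a minimal sketch, not part of job-forge: the function name `already_applied` is hypothetical, and it assumes that an outcome row contains the word "Applied" (any case) on the same line as the URL, or as both the company and the role. The real day-file and TSV layouts may differ.

```python
from pathlib import Path

# Glob patterns for the four sources named in [H2], relative to the project root.
SOURCES = [
    "data/pipeline.md",
    "data/applications/*.md",
    "batch/tracker-additions/*.tsv",
    "batch/tracker-additions/merged/*.tsv",
]


def already_applied(root, url, company, role):
    """Return True if any source line shows an Applied outcome for this job.

    Assumption (hypothetical row format): the line mentions "applied" together
    with either the URL or both the company and the role.
    """
    url, company, role = url.lower(), company.lower(), role.lower()
    for pattern in SOURCES:
        for path in Path(root).glob(pattern):
            for line in path.read_text(encoding="utf-8").splitlines():
                low = line.lower()
                if "applied" not in low:
                    continue
                if url in low or (company in low and role in low):
                    return True
    return False
```

A dispatch loop would call `already_applied(".", url, company, role)` before each `apply` dispatch and skip the job when it returns `True`. Note that `batch/tracker-additions/*.tsv` is not recursive, which is why the `merged/` pattern is listed separately.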
28
+ ## Defaults
29
+
30
+ - [D1] Delegate to a subagent (`task`) only when the work involves repeated tool-heavy steps that bloat the cache prefix: applying to N≥2 jobs, batch scans hitting ≥3 companies, or any "apply to… / process pipeline / batch evaluate" user phrasing. Single-offer evals, dev work, file edits, `tracker` mode, single-URL checks, and one-shot questions stay inline.
31
+ why: iso-trace showed 0.25% Agent calls across 5174 turns under a prior over-broad "delegate before 2nd tool call" rule — the rule was ignored in practice; narrowing matches the original cache-bust incident
32
+
33
+ - [D2] Route subagent work by cost tier. `@general-free`: procedural — form-fill, TSV merge, verify, OTP retrieval, portal scan metadata extraction, one-shot structured-field transforms. `@general-paid`: quality-sensitive — offer evaluation narrative Blocks A-F, cover letters, "Why X?" answers, STAR interview stories, LinkedIn outreach. `@glm-minimal`: narrow ≤5K-input one-shot extract/classify jobs that do not need context.
34
+ why: GLM 5.1 doesn't discount cache reads so procedural work there costs ~10×; free-tier models handle procedural work fine empirically (`opencode/big-pickle` processed 1000+ messages at $0)
35
+
36
+ - [D3] Upgrade `apply` routing to `@general-paid` when the offer score is ≥ 4.0/5, when the user flags "top-tier / dream job / high-stakes", or when the application is at a late pipeline stage (post-screens).
37
+ why: form-fill flows are 6+ steps; free-tier sometimes aborts mid-flow on large Greenhouse/Workday schemas; paid tier has more headroom
38
+
39
+ - [D4] Auto-submit for offers scoring 3.0+/5 without pausing for confirmation between steps — scan → evaluate → apply is one continuous pipeline. Mark SKIP for <3.0 and move on.
40
+ why: JobForge is designed for end-to-end automation; pausing between steps defeats the purpose and the 3.0 gate already enforces quality
41
+
42
+ - [D5] Before any batch-apply dispatch, run the Apply Preflight location filter from `modes/apply.md` to exclude location-incompatible candidates.
43
+ why: catches the common case where an evaluated role has the right role-shape but a deal-breaking location that profile.yml already rules out
44
+
45
+ - [D6] Pick the mode from the **Routing** table below AND name it explicitly in your first response (e.g., "running auto-pipeline mode", "this is a `compare` request"). If no row matches the user's intent, ask which mode fits; do not guess.
46
+ why: silent mode picks mis-route work (a "negotiation" question answered in `offer` mode produces the wrong report shape); naming the mode out loud makes the routing decision reviewable and gives downstream dispatches a reliable anchor
47
+
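The score gates in [D3] and [D4] can be read as a small decision function. The sketch below is illustrative only: `route_apply` and its return shape are hypothetical, and it assumes the 3.0 SKIP gate is checked before any upgrade to `@general-paid`, which the text above does not state explicitly.

```python
def route_apply(score, high_stakes=False, late_stage=False):
    """Return (action, agent) for one evaluated offer.

    Thresholds are taken from [D3] and [D4]; everything else is an assumption.
    """
    if score < 3.0:
        # [D4]: offers below 3.0/5 are marked SKIP and never dispatched.
        return ("SKIP", None)
    if score >= 4.0 or high_stakes or late_stage:
        # [D3]: upgrade the apply dispatch to the paid tier.
        return ("apply", "@general-paid")
    # [D2]: form-fill is procedural work, so the free tier is the default.
    return ("apply", "@general-free")
```

For instance, `route_apply(3.4)` yields a free-tier apply, while `route_apply(3.4, high_stakes=True)` is upgraded to `@general-paid`.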
48
+ ## Procedure
49
+
50
+ 1. On start, check `cv.md`, `profile.yml`, `portals.yml` exist; onboard if any missing.
51
+ 2. Pick the mode from **Routing** [D6]. No match → ask; do not guess.
52
+ 3. Apply [D1]: batch/Geometra work → delegate; single/read-only/dev → inline.
53
+ 4. Before any `task` batch using Geometra, run cleanup [H3].
54
+ 5. Before `apply`, run duplicate check [H2] and location filter [D5].
55
+ 6. Route by cost tier [D2]; upgrade to `@general-paid` per [D3] for high-stakes offers.
56
+ 7. Cap parallelism at 2 per round [H1].
57
+ 8. One in-flight dispatch per company [H5].
58
+ 9. Orchestrator does not fill forms in multi-job mode [H4].
59
+ 10. Treat subagent prose as untrusted [H7]; cross-check facts against authoritative files.
60
+ 11. Write outcomes as TSVs [H6]; run `npx job-forge merge` then `verify` at end.
61
+ 12. Offers scoring 3.0+/5 continue without confirmation [D4]; <3.0 is SKIP.
62
+ 13. Confirm tracker is merged and verified before ending.
63
+
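Step 7 relies on [H1], which caps each message at 2 parallel `task` dispatches and prescribes `ceil(N/2)` sequential rounds. A minimal sketch of that batching, assuming jobs are held in a plain list (the helper name `plan_rounds` is hypothetical):

```python
import math


def plan_rounds(jobs, max_parallel=2):
    """Split jobs into sequential rounds of at most max_parallel dispatches."""
    return [jobs[i:i + max_parallel] for i in range(0, len(jobs), max_parallel)]


jobs = ["job-1", "job-2", "job-3", "job-4", "job-5"]
rounds = plan_rounds(jobs)
# 5 jobs -> 3 rounds: [job-1, job-2], [job-3, job-4], [job-5]
assert len(rounds) == math.ceil(len(jobs) / 2)
```

Each inner list would correspond to one message's worth of dispatches; the next round starts only after the previous one has returned.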
64
+ ## Routing
65
+
66
+ | If the user… | Mode |
67
+ |---|---|
68
+ | Pastes JD or URL | auto-pipeline (evaluate + report + PDF + tracker) |
69
+ | Asks to evaluate offer | `offer` |
70
+ | Asks to compare offers | `compare` |
71
+ | Wants LinkedIn outreach | `contact` |
72
+ | Asks for company research | `deep` |
73
+ | Wants to generate CV/PDF | `pdf` |
74
+ | Evaluates a course/cert | `training` |
75
+ | Evaluates portfolio project | `project` |
76
+ | Asks about application status | `tracker` |
77
+ | Fills out application form | `apply` |
78
+ | Searches for new offers | `scan` |
79
+ | Processes pending URLs | `pipeline` |
80
+ | Batch processes offers | `batch` |
81
+ | Asks what needs follow-up | `followup` |
82
+ | Reports a rejection | `rejection` |
83
+ | Receives a job offer | `negotiation` |
84
+ | otherwise | Ask which mode fits; do not guess |
85
+
86
+ ## Output format
87
+
88
+ Output shape is mode-dependent — see `modes/{mode}.md` for each mode's expected output. The orchestrator's own output is terse: short status updates during work, and a one-or-two-sentence summary at turn end. No mid-work narration of individual tool calls.
89
+
90
+ ---
91
+
92
+ # Reference
93
+
94
+ Sections below are context, rationale, runbooks, and portal-specific empirical notes. The **Hard limits**, **Defaults**, **Procedure**, and **Routing** above are the contract; the material below is what the orchestrator and each mode consult during execution.
31
95
 
32
96
  ---
33
97
 
@@ -85,22 +149,23 @@ The harness ships three subagents (see `.opencode/agents/`). The orchestrator MU
85
149
 
86
150
  **When to break this rule:** if the user explicitly asks for "quality over cost" or flags a high-stakes application (top-tier company, offer-stage negotiation, executive search), route everything through `@general-paid`. Document the exception in the session.
87
151
 
88
- ### Pre-flight delegation (HARD RULE)
152
+ ### When to delegate
89
153
 
90
- For any task that will involve **more than one tool call** (i.e., anything beyond a one-shot answer), the orchestrator's **first tool call MUST be `task`** (dispatching to a subagent). Not `Read`, not `Bash`, not `geometra_connect`, not `Grep`. The orchestrator plans and dispatches; subagents execute.
154
+ **Delegate (`task` out) when the work involves repeated tool-heavy steps that bloat the orchestrator's cache prefix.** The concrete failure mode this prevents: a 341-message "apply to 20 jobs" session where repeated `geometra_fill_form` / `geometra_page_model` calls accumulated in history, forcing each new message to re-process 100K+ tokens of fresh input instead of reading from cache.
91
155
 
92
- **Why this is absolute:** every tool call in the orchestrator accumulates in the top-level session's history and pollutes the cache prefix. Once the orchestrator has read three files and made two Geometra calls, delegating to a subagent no longer helps — the subagent inherits the bloated context. The only way to keep the orchestrator lean is to delegate *before* doing anything else.
156
+ **Delegate when:**
157
+ - Applying to N≥2 jobs (repeated Geometra form-fill — the original cache-bust scenario)
158
+ - Batch portal scans hitting ≥3 companies (API loops + page-model reads stack up)
159
+ - Any explicit "apply to... / process pipeline / batch evaluate" phrasing from the user (multi-job intent)
93
160
 
94
- **What counts as "more than one tool call":**
95
- - Evaluating any offer (always ≥3 steps: fetch JD, score, write report)
96
- - Any `/job-forge` mode invocation except `tracker` (read-only)
97
- - Applying to a job
98
- - Scanning portals
99
- - Any batch operation
161
+ **Do NOT delegate (orchestrate inline):**
162
+ - Single-offer evaluation (text-heavy, not tool-heavy)
163
+ - Development / bug-fix / file-editing tasks
164
+ - `tracker` and other read-only modes
165
+ - Single-company scan, single-URL check
166
+ - One-shot questions — "what does this mean?", "read X and summarize", "what's my next report number?"
100
167
 
101
- **Explicit exception:** trivial one-shot answers ("what does this error mean?", "read this file and summarize", "what's my next report number?") can stay in the orchestrator. If the question can be answered in ≤1 tool call, do not delegate.
102
-
103
- **Detection signal:** if you (orchestrator) find yourself about to make your 2nd tool call in a session that wasn't a trivial one-shot, STOP. Instead, `task` out the remaining work as a single delegated job.
168
+ **Detection signal:** if you're about to call `geometra_fill_form` for a second *different* job in the same session, STOP and delegate the remainder. For everything else, in-session execution is the expected default.
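The delegation thresholds listed under "Delegate when" reduce to a simple predicate. This is a sketch under stated assumptions: the function name and its three inputs are hypothetical, and how the orchestrator detects batch phrasing in a user request is left out entirely.

```python
def should_delegate(apply_jobs=0, scan_companies=0, batch_phrasing=False):
    """Return True when the work should be sent to a subagent via `task`.

    apply_jobs      -- number of jobs to apply to in this request
    scan_companies  -- number of companies a portal scan will hit
    batch_phrasing  -- user said "apply to...", "process pipeline", "batch evaluate"
    """
    return apply_jobs >= 2 or scan_companies >= 3 or batch_phrasing
```

Everything that returns `False` here (a single-offer evaluation, a single-company scan, a one-shot question) stays inline, matching the narrowed rule.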
104
169
 
105
170
  ---
106
171
 
@@ -206,24 +271,7 @@ JobForge is designed to be customized by YOU (opencode). When the user asks you
206
271
 
207
272
  ### Skill Modes
208
273
 
209
- | If the user... | Mode |
210
- |----------------|------|
211
- | Pastes JD or URL | auto-pipeline (evaluate + report + PDF + tracker) |
212
- | Asks to evaluate offer | `offer` |
213
- | Asks to compare offers | `compare` |
214
- | Wants LinkedIn outreach | `contact` |
215
- | Asks for company research | `deep` |
216
- | Wants to generate CV/PDF | `pdf` |
217
- | Evaluates a course/cert | `training` |
218
- | Evaluates portfolio project | `project` |
219
- | Asks about application status | `tracker` |
220
- | Fills out application form | `apply` |
221
- | Searches for new offers | `scan` |
222
- | Processes pending URLs | `pipeline` |
223
- | Batch processes offers | `batch` |
224
- | Asks what needs follow-up | `followup` |
225
- | Reports a rejection | `rejection` |
226
- | Receives a job offer | `negotiation` |
274
+ Mode routing is specified in the top-level **## Routing** section. Each mode is implemented in `modes/{mode}.md` — consult those files for per-mode prompts, state, and expected outputs.
227
275
 
228
276
  ### CV Source of Truth
229
277
 
@@ -368,14 +416,53 @@ These blocks come from two distinct root causes and require different responses:
368
416
 
369
417
  **Known-block Ashby tenants (2026-04-19 empirical observations).** These tenants fired class B on every attempted submit from a headless datacenter-IP proxy. Orchestrators planning apply dispatches should assume these tenants will Fail in headless — prioritize other portals, or skip same-tenant siblings after a confirmed class B to avoid burning subagent slots:
370
418
 
371
- - Vellum, Linear, Vanta, River Financial, Higharc, Trace Labs, Solace Health, Unstructured, ClickUp, Zapier, Deepgram, Ramp, WorkOS
419
+ - Vellum, Linear, Vanta, River Financial, Higharc, Trace Labs, Solace Health, Unstructured, ClickUp, Zapier, Deepgram, Ramp, WorkOS, Ashby (self-tenant), Perplexity, **Goody**, **Starbridge**, **Graphite**, **Prompt Health**, **Vantage**
372
420
 
373
421
  **Known class-A-compatible Ashby tenants (same observations).** These tenants accepted headless submits cleanly, often with `imeFriendly: true` making the difference on the text-field subset:
374
422
 
375
- - Supabase, LangChain, Poolside, Runway Financial
423
+ - Supabase, LangChain, Poolside, Runway Financial, Sentry, Cognition
424
+
425
+ **Base rate for untested Ashby tenants (5/5 tested 2026-04-19 cycle 4 = class B).** The current prior is that ~80-90% of untested Ashby tenants fingerprint-block headless submits. Orchestrators should treat any tenant not on the class-A-compatible list as likely class B: still dispatch to collect the data point, but don't burn multiple sibling-role slots on the same Ashby tenant.
376
426
 
377
427
  The pattern is tenant configuration, not role or company size. Lists drift as tenants tune their anti-bot — treat as probabilistic priors, not hard rules.
378
428
 
429
+ **Ashby choice-group with `optionCount: 1` and no labels (Sentry pattern).** Some Ashby tenants render Yes/No work-authorization questions as `role="button" name="Application"` pill toggles where the accessibility tree exposes neither `Yes` nor `No` labels. `fill_fields` with `choiceType: "group"` silently no-ops; `geometra_click` by `id` also fails to toggle. Fix: fall back to `geometra_click` with RAW x,y coordinates at the button centers (Yes is typically the left button, No is the right). Confirmed on Sentry Staff Platform #845, 2026-04-19.
430
+
431
+ ### Other Portal Failure Classes
432
+
433
+ **Typeform applications are Geometra-unsupported.** Some companies (Better Stack confirmed, 2026-04-19) route the Apply link to a Typeform wizard (`*.typeform.com/apply-*`). Typeform renders questions via a custom React/canvas layer that does NOT expose input fields to the accessibility tree — `geometra_form_schema` returns "No forms found", `geometra_query role=textbox` returns empty, blind `geometra_type` produces no semantic change. Mark `Failed` with reason "Typeform portal — Geometra unsupported" on detection; do not burn the 9-minute budget attempting blind input.
434
+
435
+ **Avature multi-step wizards have a native-`<select>` validation lag (Bloomberg pattern).** Bloomberg's careers site redirects to `bloomberg.avature.net` with a 4-step wizard. On Step 2, native `<select>` elements ("Is Current Position? / No") accept the value but keep `invalid: true` persistently — neither Tab, re-submit, nor re-pick clears it. `imeFriendly` has no effect because the field is a native `<select>`, not React-controlled text. There is no documented recovery. Mark `Failed` with reason "Avature native-select validation lag"; account creation up to that point is preserved for any future manual path. Confirmed on Bloomberg Sr SWE Auth #828, 2026-04-19.
436
+
437
+ **Cloudflare / ATS-vendor blocks on Dropbox-class portals.** Dropbox's real apply flow lives behind `happydance.website` (ATS vendor), which Cloudflare-fingerprints headless Chromium + datacenter IPs and returns "Sorry, you have been blocked". `job-boards.greenhouse.io/dropbox` does not mirror — there is no public Greenhouse fallback. Symptom-wise indistinguishable from Ashby class B but at a different layer. Mark `Failed` with reason "ATS vendor Cloudflare block (happydance.website or equivalent)". Confirmed on Dropbox Sr FS Product #831, 2026-04-19.
438
+
439
+ **Greenhouse OTP-on-fill variant (Instacart pattern).** Most Greenhouse OTP flows fire on Submit. A minority (Instacart Staff FoodStorm #827, 2026-04-19) fire the 8-cell security-code gate mid-fill, BEFORE the user clicks Submit. Detection: watch for an 8-cell OTP input surfacing after resume upload or the first listbox commit. Fetch from Gmail (`from:greenhouse newer_than:10m`) immediately when it appears — do not wait for Submit.
440
+
441
+ **`geometra_fill_otp` char-drop on first fill.** Occasionally `fill_otp` lands only the first character of an 8-char code (seen on Instacart, 2026-04-19). Recovery: click the first cell to focus, then re-issue `fill_otp` with `perCharDelayMs: 120`. The form usually auto-submits once all 8 cells are populated.
442
+
443
+ **Breezy portal — tenant-dependent, native `<select>`, resume-auto-parse is primary.** A subset of companies (Avantos AI, Courted, Instinct Science confirmed 2026-04-19) host applications on `*.breezy.hr` or `applytojob.com`. Empirical rules:
444
+
445
+ - **Class is per-tenant, not uniform.** Avantos (Failed 2026-04-19 #854) returned Breezy's own "It looks like maybe you've already applied to this job?" banner from IP fingerprinting, even on a first submit — distinct failure mode from Ashby's "flagged as possible spam". Courted (Applied 2026-04-19 #855) went through cleanly on the same session. Don't pre-skip Breezy; the outcome is tenant-specific.
446
+ - **Native `<select>` elements, not React comboboxes.** `geometra_pick_listbox_option` sets the visible display but NOT the underlying form state — submit will fail with "A response is required" on every combobox. Use `geometra_select_option` with x,y + label value for every choice field on Breezy.
447
+ - **Resume-auto-parse carries the signal.** After resume upload, Breezy auto-parses work history and education into structured rows. Do NOT Add/Delete position rows via Geometra — row mutations reshuffle fieldIds mid-flow, sequential `fill_fields` calls land in wrong rows, and upstream pollution corrupts earlier positions. Trust the parsed resume and fill only Personal Details + salary.
448
+
449
+ **Mailto-apply portals — direct email via gmail-mcp `attachments`.** A subset of HN-listed companies (CoPlane, Gambit Robotics, Rinse, Digital Health Strategies confirmed 2026-04-19) don't host an ATS form — their careers page instructs sending resume by email to `founders@...` / `jobs@...` / `contact@...`. Detection: WebFetch the careers URL; if the Apply link resolves to `mailto:` or the copy reads "email your resume to …", skip Geometra entirely.
450
+
451
+ Use `gmail_send_message` with the `attachments` parameter (available from `@razroo/gmail-mcp@1.8.0`):
452
+
453
+ ```
454
+ gmail_send_message({
455
+ to: ["founders@example.com"],
456
+ subject: "Application — Forward Deployed AI Engineer — Charlie Greenman (Austin)",
457
+ body: "<Section G pitch, 4-8 short paragraphs>",
458
+ attachments: [{ path: "/abs/path/to/Charlie-Greenman-CV.pdf" }]
459
+ })
460
+ ```
461
+
462
+ The MCP reads the file from disk and builds multipart/mixed MIME server-side — do NOT manually base64-encode a PDF into the `raw` parameter (the inline blob exceeds tool-call argument limits for any real attachment). Subject is auto MIME-encoded for non-ASCII (em-dash, smart quotes) by the same version. For older gmail-mcp versions (< 1.8.0) the only path was a direct Gmail API POST with the stored OAuth token at `~/.gmail-mcp/credentials.json` — upgrade if you can.
463
+
464
+ Mark Applied with note `mailto portal — sent via gmail_send_message; Gmail msgId {id}`. Before writing the TSV, verify via `gmail_get_message` that the attachment arrived intact and that its size matches the file on disk.
465
+
379
466
  ### Greenhouse Bot-Detection Honeypots
380
467
 
381
468
  Some Greenhouse tenants (Grafana Labs confirmed, 2026-04-19) inject a honeypot-style single-pick question on the application form, rendered as a listbox labeled something like "Which of the following best describes you?" with options resembling "I am a human being / I am a bot / I am a robot".
package/CLAUDE.md CHANGED
@@ -1,33 +1,97 @@
1
- # JobForge -- AI Job Search Pipeline
1
+ # Agent: job-forge
2
2
 
3
- ## Hard Limits NEVER exceed these numbers
3
+ AI-powered job search pipeline: scans portals, evaluates offers, generates CVs via Geometra MCP, applies to jobs, tracks applications across day files. Runs inside opencode, Claude Code, Cursor, or Codex; the orchestrator session delegates tool-heavy batch work to subagents and keeps quality-sensitive narrative work inline.
4
4
 
5
- The Hard Limits below are non-negotiable numeric rules. If you catch yourself about to violate one, STOP and restructure.
5
+ ## Hard limits
6
6
 
7
- 1. **Max parallel subagents: 2.** Never emit 3+ `task` tool calls in a single message. For N jobs, run `ceil(N/2)` sequential rounds of 2. No exceptions: not for "urgent", not for "the user asked for 10".
8
- 2. **Max 1 application per company+role.** Before every `task` dispatch for `apply`, Grep ALL of the following for the URL and for `company+role`:
9
- - `data/pipeline.md`
10
- - all `data/applications/*.md` day files (not just today's — prior-day Applies count too)
11
- - `batch/tracker-additions/*.tsv` (pending outcomes not yet merged)
12
- - `batch/tracker-additions/merged/*.tsv` (outcomes already consumed into day files — catches same-day earlier-batch Applies that merge collapsed into an existing row)
7
+ - [H1] Max 2 parallel `task` dispatches per message. For N jobs, run `ceil(N/2)` sequential rounds of 2. Applies in all modes, for all user phrasings ("urgent", "apply to 10 jobs now").
8
+ why: higher parallelism blows through free-tier rate limits; each subagent requires post-cleanup and racing more than 2 reliably loses at least one result
13
9
 
14
- If any source shows an APPLIED / Applied outcome for this URL or company+role, skip that job and do not dispatch. **Why merged/ matters**: when two batches in the same day target the same role, `npx job-forge merge` updates the existing day-file row instead of creating a new one — so `grep data/applications/*.md` for the higher report number misses the earlier apply. The merged TSV is the only place the newer attempt's breadcrumb remains.
15
- 3. **Always clean Geometra sessions before dispatching.** Before every round of `task` dispatches that will use Geometra, call `geometra_list_sessions` then `geometra_disconnect({closeBrowser: true})`. Every round. The disconnect is a no-op when the pool is empty.
16
- 4. **Orchestrator does NOT fill forms.** This session MUST NOT call `geometra_fill_form`, `geometra_run_actions`, `geometra_pick_listbox_option`, or `geometra_fill_otp` when handling a multi-job request. If you need to, it means you MUST have delegated — `task` out the remaining work instead.
17
- 5. **Re-dispatch only AFTER the previous subagent returns.** Never fire the same company's `task` twice while the first is still in-flight. Wait for the return value, then decide if a retry is warranted.
18
- 6. **Application outcomes flow through TSVs, not `data/pipeline.md`.** When a subagent returns APPLIED / FAILED / SKIP, the outcome goes to `batch/tracker-additions/{num}-{slug}.tsv`. `npx job-forge merge` then consumes the TSVs into the correct `data/applications/YYYY-MM-DD.md` day file. `data/pipeline.md` only tracks URL inbox state (`[ ]` pending → `[x]` processed). **NEVER append APPLIED / FAILED status lines to `pipeline.md`** — that's the day file's job, via the TSV pathway. After any multi-apply run, the orchestrator MUST run `npx job-forge merge` followed by `npx job-forge verify` before ending the session.
19
- 7. **Load-bearing facts passed to downstream subagents must come from a file, not from a prior subagent's prose.** A URL, score, email ID, confirmation page snippet, JD salary range, exact answer submitted to a form question, or any other specific value that a downstream subagent will act on MUST originate from one of:
20
- - `data/pipeline.md` (URL inbox state)
21
- - `data/scan-history.tsv` (scan provenance)
22
- - `batch/scan-output-*.md` (scan-ranked candidates)
23
- - A report file (`reports/{num}-*.md`) with authoritative headers (`**URL:**`, `**Score:**`, etc.)
24
- - A TSV in `batch/tracker-additions/` (per-apply outcomes)
10
+ - [H2] Max 1 application per company+role. Before every `apply` dispatch, grep all four sources for the URL and for `company+role`: `data/pipeline.md`, all `data/applications/*.md` day files, `batch/tracker-additions/*.tsv`, `batch/tracker-additions/merged/*.tsv`. If any source shows APPLIED / Applied, skip the dispatch.
11
+ why: 2026-04 same-day batch collision: when two batches target the same role, `npx job-forge merge` updates the existing day-file row rather than appending, so grepping day files alone misses earlier-batch applies; merged/*.tsv is the only place the breadcrumb remains
25
12
 
26
- **Not trustworthy by default**: anything quoted from a subagent's return message, any ID or score the orchestrator "remembers" from prose, any page-content snippet reproduced from a subagent's narrative. Subagents can hallucinate plausible-looking IDs, scores, and confirmation text. Before passing any such fact to another subagent, cross-check it exists in one of the authoritative files above, OR instruct the dispatching subagent to write its output to a structured file and re-read from that file.
13
+ - [H3] Before every batch of `task` dispatches that will use Geometra, call `geometra_list_sessions` then `geometra_disconnect({closeBrowser: true})`. Every round, no exceptions. Name this cleanup as an explicit "step 0" in your first-response plan for any multi-apply request: it is the most frequently skipped guardrail in practice, and skipping it produces cascading "Not connected" failures on the next dispatch.
14
+ why: if any prior subagent aborted mid-flow, its Chromium session stays stuck in the MCP pool and the next `geometra_connect` fails with "Not connected"; the disconnect is a no-op when the pool is empty but a poison-cure when it isn't; vocalizing it up-front doubles the odds it actually runs
27
15
 
28
- **Why**: on 2026-04-18, a scan subagent returned 30 fabricated Greenhouse IDs in prose (correct role titles, plausible-looking invented IDs that didn't exist in the API). The orchestrator dispatched 30 downstream subagents that all hit 404s. Verification rules downstream (Hard Limit #6, API-first verify) caught the symptom. This rule prevents the *shape* of the bug (hallucinations propagating through prose handoffs) across all quantitative / identifier / specific-fact claims, not just URLs.
16
+ - [H4] In multi-job mode, the orchestrator session MUST NOT call `geometra_fill_form`, `geometra_run_actions`, `geometra_pick_listbox_option`, or `geometra_fill_otp` directly. Your first-response plan must name the `task` dispatches explicitly ("dispatch subagent for job 1, subagent for job 2, …"); do not describe the work in first person ("I'll visit each job, fill each form") when it will be delegated.
17
+ why: repeated Geometra calls in the orchestrator bloat the cache prefix — this is the 2026-04 "apply to 20 jobs" 341-msg incident where each turn re-processed 100K+ fresh tokens instead of reading from cache; first-person narration is a leading indicator that the agent is mentally queueing work for itself rather than a subagent
29
18
 
30
- Everything below is context and rationale. These seven numbers are the rules.
19
+ - [H5] Re-dispatch the same company only AFTER the previous subagent returns. Never fire the same `task` twice while the first is still in flight.
20
+ why: two in-flight subagents for the same URL race on Geometra sessions and on tracker TSV writes, corrupting state and sometimes double-submitting
21
+
22
+ - [H6] Application outcomes flow through `batch/tracker-additions/*.tsv`, not `data/pipeline.md`. After any multi-apply run, the orchestrator MUST run `npx job-forge merge` then `npx job-forge verify` before ending the session.
23
+ why: `pipeline.md` is the URL inbox (`[ ]` pending → `[x]` processed); `data/applications/YYYY-MM-DD.md` is the outcome log; the TSV pathway is the only safe bridge because `merge` handles column order and duplicate detection
24
+
25
+ - [H7] Load-bearing facts passed to downstream subagents must originate from a file, not from prior subagent prose. Authoritative sources: `data/pipeline.md`, `data/scan-history.tsv`, `batch/scan-output-*.md`, `reports/{num}-*.md` with `**URL:**` / `**Score:**` headers, `batch/tracker-additions/*.tsv`.
26
+ why: 2026-04-18 scan subagent returned 30 fabricated Greenhouse IDs in prose (plausible-looking, non-existent); orchestrator dispatched 30 downstream subagents that all 404'd. Subagents can hallucinate IDs, scores, and confirmation text — round-trip through a file or don't trust the value
27
+
28
+ ## Defaults
29
+
30
+ - [D1] Delegate to a subagent (`task`) only when the work involves repeated tool-heavy steps that bloat the cache prefix: applying to N≥2 jobs, batch scans hitting ≥3 companies, or any "apply to… / process pipeline / batch evaluate" user phrasing. Single-offer evals, dev work, file edits, `tracker` mode, single-URL checks, and one-shot questions stay inline.
31
+ why: iso-trace showed 0.25% Agent calls across 5174 turns under a prior over-broad "delegate before 2nd tool call" rule — the rule was ignored in practice; narrowing matches the original cache-bust incident
32
+
33
+ - [D2] Route subagent work by cost tier. `@general-free`: procedural — form-fill, TSV merge, verify, OTP retrieval, portal scan metadata extraction, one-shot structured-field transforms. `@general-paid`: quality-sensitive — offer evaluation narrative Blocks A-F, cover letters, "Why X?" answers, STAR interview stories, LinkedIn outreach. `@glm-minimal`: narrow ≤5K-input one-shot extract/classify jobs that do not need context.
34
+ why: GLM 5.1 doesn't discount cache reads so procedural work there costs ~10×; free-tier models handle procedural work fine empirically (`opencode/big-pickle` processed 1000+ messages at $0)
35
+
36
+ - [D3] Upgrade `apply` routing to `@general-paid` when the offer score is ≥ 4.0/5, when the user flags "top-tier / dream job / high-stakes", or when the application is at a late pipeline stage (post-screens).
37
+ why: form-fill flows are 6+ steps; free-tier sometimes aborts mid-flow on large Greenhouse/Workday schemas; paid tier has more headroom
38
+
39
+ - [D4] Auto-submit for offers scoring 3.0+/5 without pausing for confirmation between steps — scan → evaluate → apply is one continuous pipeline. Mark SKIP for <3.0 and move on.
40
+ why: JobForge is designed for end-to-end automation; pausing between steps defeats the purpose and the 3.0 gate already enforces quality
41
+
42
+ - [D5] Before any batch-apply dispatch, run the Apply Preflight location filter from `modes/apply.md` to exclude location-incompatible candidates.
43
+ why: catches the common case where an evaluated role has the right role-shape but a deal-breaking location that profile.yml already rules out
44
+
45
+ - [D6] Pick the mode from the **Routing** table below AND name it explicitly in your first response (e.g., "running auto-pipeline mode", "this is a `compare` request"). If no row matches the user's intent, ask which mode fits; do not guess.
46
+ why: silent mode picks mis-route work (a "negotiation" question answered in `offer` mode produces the wrong report shape); naming the mode out loud makes the routing decision reviewable and gives downstream dispatches a reliable anchor
47
+
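The cost-tier routing in [D2] and the upgrade rule in [D3] can be read as a small decision function. The sketch below assumes hypothetical work-kind labels (`"apply"`, `"evaluation"`, and so on) that do not appear in job-forge; it also leaves out `@glm-minimal`, which [D2] reserves for narrow one-shot extract/classify jobs.

```python
# Hypothetical labels for the quality-sensitive work listed in [D2].
PAID_KINDS = {"evaluation", "cover_letter", "why_x", "star_story", "outreach"}


def choose_tier(kind, score=None, high_stakes=False, late_stage=False):
    """Pick a subagent tier for one unit of work, following [D2] and [D3]."""
    if kind == "apply":
        # [D3]: form-fill is procedural, but upgrade for strong or high-stakes offers.
        if (score is not None and score >= 4.0) or high_stakes or late_stage:
            return "@general-paid"
        return "@general-free"
    if kind in PAID_KINDS:
        return "@general-paid"
    # Everything else is treated as procedural work in this sketch.
    return "@general-free"
```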
48
+ ## Procedure
49
+
50
+ 1. On start, check `cv.md`, `profile.yml`, `portals.yml` exist; onboard if any missing.
51
+ 2. Pick the mode from **Routing** [D6]. No match → ask; do not guess.
52
+ 3. Apply [D1]: batch/Geometra work → delegate; single/read-only/dev → inline.
53
+ 4. Before any `task` batch using Geometra, run cleanup [H3].
54
+ 5. Before `apply`, run duplicate check [H2] and location filter [D5].
55
+ 6. Route by cost tier [D2]; upgrade to `@general-paid` per [D3] for high-stakes offers.
56
+ 7. Cap parallelism at 2 per round [H1].
57
+ 8. One in-flight dispatch per company [H5].
58
+ 9. Orchestrator does not fill forms in multi-job mode [H4].
59
+ 10. Treat subagent prose as untrusted [H7]; cross-check facts against authoritative files.
60
+ 11. Write outcomes as TSVs [H6]; run `npx job-forge merge` then `verify` at end.
61
+ 12. Offers scoring 3.0+/5 continue without confirmation [D4]; <3.0 is SKIP.
62
+ 13. Confirm tracker is merged and verified before ending.
63
+
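Step 7 of the procedure (cap parallelism at 2 per round, [H1]) amounts to chunking the job list into `ceil(N/2)` sequential rounds. A minimal sketch, with `plan_rounds` as a hypothetical helper name:

```python
import math


def plan_rounds(jobs, max_parallel=2):
    """Split `jobs` into sequential rounds of at most `max_parallel` dispatches."""
    rounds = [jobs[i:i + max_parallel] for i in range(0, len(jobs), max_parallel)]
    # Sanity check: the number of rounds matches ceil(N / max_parallel).
    assert len(rounds) == math.ceil(len(jobs) / max_parallel)
    return rounds
```

For five jobs this yields three rounds (2, 2, 1); each round would be dispatched only after the previous one has returned.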
64
+ ## Routing
65
+
66
+ | If the user… | Mode |
67
+ |---|---|
68
+ | Pastes JD or URL | auto-pipeline (evaluate + report + PDF + tracker) |
69
+ | Asks to evaluate offer | `offer` |
70
+ | Asks to compare offers | `compare` |
71
+ | Wants LinkedIn outreach | `contact` |
72
+ | Asks for company research | `deep` |
73
+ | Wants to generate CV/PDF | `pdf` |
74
+ | Evaluates a course/cert | `training` |
75
+ | Evaluates portfolio project | `project` |
76
+ | Asks about application status | `tracker` |
77
+ | Fills out application form | `apply` |
78
+ | Searches for new offers | `scan` |
79
+ | Processes pending URLs | `pipeline` |
80
+ | Batch processes offers | `batch` |
81
+ | Asks what needs follow-up | `followup` |
82
+ | Reports a rejection | `rejection` |
83
+ | Receives a job offer | `negotiation` |
84
+ | otherwise | Ask which mode fits; do not guess |
85
+
86
+ ## Output format
87
+
88
+ Output shape is mode-dependent — see `modes/{mode}.md` for each mode's expected output. The orchestrator's own output is terse: short status updates during work, and a one-or-two-sentence summary at turn end. No mid-work narration of individual tool calls.
89
+
90
+ ---
91
+
92
+ # Reference
93
+
94
+ Sections below are context, rationale, runbooks, and portal-specific empirical notes. The **Hard limits**, **Defaults**, **Procedure**, and **Routing** above are the contract; the material below is what the orchestrator and each mode consult during execution.
31
95
 
32
96
  ---
33
97
 
@@ -85,22 +149,23 @@ The harness ships three subagents (see `.opencode/agents/`). The orchestrator MU
85
149
 
86
150
  **When to break this rule:** if the user explicitly asks for "quality over cost" or flags a high-stakes application (top-tier company, offer-stage negotiation, executive search), route everything through `@general-paid`. Document the exception in the session.
87
151
 
88
- ### Pre-flight delegation (HARD RULE)
152
+ ### When to delegate
89
153
 
90
- For any task that will involve **more than one tool call** (i.e., anything beyond a one-shot answer), the orchestrator's **first tool call MUST be `task`** (dispatching to a subagent). Not `Read`, not `Bash`, not `geometra_connect`, not `Grep`. The orchestrator plans and dispatches; subagents execute.
154
+ **Delegate (`task` out) when the work involves repeated tool-heavy steps that bloat the orchestrator's cache prefix.** The concrete failure mode this prevents: a 341-message "apply to 20 jobs" session where repeated `geometra_fill_form` / `geometra_page_model` calls accumulated in history, forcing each new message to re-process 100K+ tokens of fresh input instead of reading from cache.
91
155
 
92
- **Why this is absolute:** every tool call in the orchestrator accumulates in the top-level session's history and pollutes the cache prefix. Once the orchestrator has read three files and made two Geometra calls, delegating to a subagent no longer helps — the subagent inherits the bloated context. The only way to keep the orchestrator lean is to delegate *before* doing anything else.
156
+ **Delegate when:**
157
+ - Applying to N≥2 jobs (repeated Geometra form-fill — the original cache-bust scenario)
158
+ - Batch portal scans hitting ≥3 companies (API loops + page-model reads stack up)
159
+ - Any explicit "apply to... / process pipeline / batch evaluate" phrasing from the user (multi-job intent)
93
160
 
94
- **What counts as "more than one tool call":**
95
- - Evaluating any offer (always ≥3 steps: fetch JD, score, write report)
96
- - Any `/job-forge` mode invocation except `tracker` (read-only)
97
- - Applying to a job
98
- - Scanning portals
99
- - Any batch operation
161
+ **Do NOT delegate (orchestrate inline):**
162
+ - Single-offer evaluation (text-heavy, not tool-heavy)
163
+ - Development / bug-fix / file-editing tasks
164
+ - `tracker` and other read-only modes
165
+ - Single-company scan, single-URL check
166
+ - One-shot questions — "what does this mean?", "read X and summarize", "what's my next report number?"
100
167
 
101
- **Explicit exception:** trivial one-shot answers ("what does this error mean?", "read this file and summarize", "what's my next report number?") can stay in the orchestrator. If the question can be answered in ≤1 tool call, do not delegate.
102
-
103
- **Detection signal:** if you (orchestrator) find yourself about to make your 2nd tool call in a session that wasn't a trivial one-shot, STOP. Instead, `task` out the remaining work as a single delegated job.
168
+ **Detection signal:** if you're about to call `geometra_fill_form` for a second *different* job in the same session, STOP and delegate the remainder. For everything else, in-session execution is the expected default.
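
The delegation criteria above reduce to three thresholds. The following is a minimal sketch under the assumption that the orchestrator can count the jobs and companies involved; the function and parameter names are hypothetical.

```python
def should_delegate(apply_jobs=0, scan_companies=0, batch_phrasing=False):
    """Return True when the work matches one of the 'Delegate when' criteria."""
    return (
        apply_jobs >= 2          # repeated Geometra form-fill
        or scan_companies >= 3   # batch portal scan
        or batch_phrasing        # explicit multi-job intent from the user
    )
```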
104
169
 
105
170
  ---
106
171
 
@@ -206,24 +271,7 @@ JobForge is designed to be customized by YOU (opencode). When the user asks you
206
271
 
207
272
  ### Skill Modes
208
273
 
209
- | If the user... | Mode |
210
- |----------------|------|
211
- | Pastes JD or URL | auto-pipeline (evaluate + report + PDF + tracker) |
212
- | Asks to evaluate offer | `offer` |
213
- | Asks to compare offers | `compare` |
214
- | Wants LinkedIn outreach | `contact` |
215
- | Asks for company research | `deep` |
216
- | Wants to generate CV/PDF | `pdf` |
217
- | Evaluates a course/cert | `training` |
218
- | Evaluates portfolio project | `project` |
219
- | Asks about application status | `tracker` |
220
- | Fills out application form | `apply` |
221
- | Searches for new offers | `scan` |
222
- | Processes pending URLs | `pipeline` |
223
- | Batch processes offers | `batch` |
224
- | Asks what needs follow-up | `followup` |
225
- | Reports a rejection | `rejection` |
226
- | Receives a job offer | `negotiation` |
274
+ Mode routing is specified in the top-level **## Routing** section. Each mode is implemented in `modes/{mode}.md` — consult those files for per-mode prompts, state, and expected outputs.
227
275
 
228
276
  ### CV Source of Truth
229
277
 
@@ -368,14 +416,53 @@ These blocks come from two distinct root causes and require different responses:
368
416
 
369
417
  **Known-block Ashby tenants (2026-04-19 empirical observations).** These tenants fired class B on every attempted submit from a headless datacenter-IP proxy. Orchestrators planning apply dispatches should assume these tenants will Fail in headless — prioritize other portals, or skip same-tenant siblings after a confirmed class B to avoid burning subagent slots:
370
418
 
371
- - Vellum, Linear, Vanta, River Financial, Higharc, Trace Labs, Solace Health, Unstructured, ClickUp, Zapier, Deepgram, Ramp, WorkOS
419
+ - Vellum, Linear, Vanta, River Financial, Higharc, Trace Labs, Solace Health, Unstructured, ClickUp, Zapier, Deepgram, Ramp, WorkOS, Ashby (self-tenant), Perplexity, **Goody**, **Starbridge**, **Graphite**, **Prompt Health**, **Vantage**
372
420
 
373
421
  **Known class-A-compatible Ashby tenants (same observations).** These tenants accepted headless submits cleanly, often with `imeFriendly: true` making the difference on the text-field subset:
374
422
 
375
- - Supabase, LangChain, Poolside, Runway Financial
423
+ - Supabase, LangChain, Poolside, Runway Financial, Sentry, Cognition
424
+
425
+ **Base rate for untested Ashby tenants (5/5 tested 2026-04-19 cycle 4 = class B).** The current working prior is that ~80-90% of untested Ashby tenants fingerprint-block headless submits. Orchestrators should treat any tenant not on the class-A-compatible list as likely class B: still dispatch to collect the data point, but don't burn multiple sibling-role slots on the same Ashby tenant.
376
426
 
377
427
  The pattern is tenant configuration, not role or company size. Lists drift as tenants tune their anti-bot — treat as probabilistic priors, not hard rules.
378
428
 
429
+ **Ashby choice-group with `optionCount: 1` and no labels (Sentry pattern).** Some Ashby tenants render Yes/No work-authorization questions as `role="button" name="Application"` pill toggles where the accessibility tree exposes neither `Yes` nor `No` labels. `fill_fields` with `choiceType: "group"` silently no-ops; `geometra_click` by `id` also fails to toggle. Fix: fall back to `geometra_click` with RAW x,y coordinates at the button centers (Yes is typically the left button, No is the right). Confirmed on Sentry Staff Platform #845, 2026-04-19.
430
+
431
+ ### Other Portal Failure Classes
432
+
433
+ **Typeform applications are Geometra-unsupported.** Some companies (Better Stack confirmed, 2026-04-19) route the Apply link to a Typeform wizard (`*.typeform.com/apply-*`). Typeform renders questions via a custom React/canvas layer that does NOT expose input fields to the accessibility tree — `geometra_form_schema` returns "No forms found", `geometra_query role=textbox` returns empty, blind `geometra_type` produces no semantic change. Mark `Failed` with reason "Typeform portal — Geometra unsupported" on detection; do not burn the 9-minute budget attempting blind input.
434
+
435
+ **Avature multi-step wizards have a native-`<select>` validation lag (Bloomberg pattern).** Bloomberg's careers site redirects to `bloomberg.avature.net` with a 4-step wizard. On Step 2, native `<select>` elements ("Is Current Position? / No") accept the value but keep `invalid: true` persistently — neither Tab, re-submit, nor re-pick clears it. `imeFriendly` has no effect because the field is a native `<select>`, not React-controlled text. There is no documented recovery. Mark `Failed` with reason "Avature native-select validation lag"; account creation up to that point is preserved for any future manual path. Confirmed on Bloomberg Sr SWE Auth #828, 2026-04-19.
436
+
437
+ **Cloudflare / ATS-vendor blocks on Dropbox-class portals.** Dropbox's real apply flow lives behind `happydance.website` (ATS vendor), which Cloudflare-fingerprints headless Chromium + datacenter IPs and returns "Sorry, you have been blocked". `job-boards.greenhouse.io/dropbox` does not mirror — there is no public Greenhouse fallback. Symptom-wise indistinguishable from Ashby class B but at a different layer. Mark `Failed` with reason "ATS vendor Cloudflare block (happydance.website or equivalent)". Confirmed on Dropbox Sr FS Product #831, 2026-04-19.
438
+
439
+ **Greenhouse OTP-on-fill variant (Instacart pattern).** Most Greenhouse OTP flows fire on Submit. A minority (Instacart Staff FoodStorm #827, 2026-04-19) fire the 8-cell security-code gate mid-fill, BEFORE the user clicks Submit. Detection: watch for an 8-cell OTP input surfacing after resume upload or the first listbox commit. Fetch from Gmail (`from:greenhouse newer_than:10m`) immediately when it appears — do not wait for Submit.
440
+
441
+ **`geometra_fill_otp` char-drop on first fill.** Occasionally `fill_otp` lands only the first character of an 8-char code (seen on Instacart, 2026-04-19). Recovery: click the first cell to focus, then re-issue `fill_otp` with `perCharDelayMs: 120`. The form usually auto-submits once all 8 cells are populated.
442
+
443
+ **Breezy portal — tenant-dependent, native `<select>`, resume-auto-parse is primary.** A subset of companies (Avantos AI, Courted, Instinct Science confirmed 2026-04-19) host applications on `*.breezy.hr` or `applytojob.com`. Empirical rules:
444
+
445
+ - **Class is per-tenant, not uniform.** Avantos (Failed 2026-04-19 #854) returned Breezy's own "It looks like maybe you've already applied to this job?" banner from IP fingerprinting, even on a first submit — distinct failure mode from Ashby's "flagged as possible spam". Courted (Applied 2026-04-19 #855) went through cleanly on the same session. Don't pre-skip Breezy; the outcome is tenant-specific.
446
+ - **Native `<select>` elements, not React comboboxes.** `geometra_pick_listbox_option` sets the visible display but NOT the underlying form state — submit will fail with "A response is required" on every combobox. Use `geometra_select_option` with x,y + label value for every choice field on Breezy.
447
+ - **Resume-auto-parse carries the signal.** After resume upload, Breezy auto-parses work history and education into structured rows. Do NOT Add/Delete position rows via Geometra — row mutations reshuffle fieldIds mid-flow, sequential `fill_fields` calls land in wrong rows, and upstream pollution corrupts earlier positions. Trust the parsed resume and fill only Personal Details + salary.
448
+
449
+ **Mailto-apply portals — direct email via gmail-mcp `attachments`.** A subset of HN-listed companies (CoPlane, Gambit Robotics, Rinse, Digital Health Strategies confirmed 2026-04-19) don't host an ATS form — their careers page instructs sending resume by email to `founders@...` / `jobs@...` / `contact@...`. Detection: WebFetch the careers URL; if the Apply link resolves to `mailto:` or the copy reads "email your resume to …", skip Geometra entirely.
450
+
451
+ Use `gmail_send_message` with the `attachments` parameter (available from `@razroo/gmail-mcp@1.8.0`):
452
+
453
+ ```
454
+ gmail_send_message({
455
+ to: ["founders@example.com"],
456
+ subject: "Application — Forward Deployed AI Engineer — Charlie Greenman (Austin)",
457
+ body: "<Section G pitch, 4-8 short paragraphs>",
458
+ attachments: [{ path: "/abs/path/to/Charlie-Greenman-CV.pdf" }]
459
+ })
460
+ ```
461
+
462
+ The MCP reads the file from disk and builds multipart/mixed MIME server-side — do NOT manually base64-encode a PDF into the `raw` parameter (the inline blob exceeds tool-call argument limits for any real attachment). Subject is auto MIME-encoded for non-ASCII (em-dash, smart quotes) by the same version. For older gmail-mcp versions (< 1.8.0) the only path was a direct Gmail API POST with the stored OAuth token at `~/.gmail-mcp/credentials.json` — upgrade if you can.
463
+
464
+ Mark Applied with note `mailto portal — sent via gmail_send_message; Gmail msgId {id}`. Before writing the TSV, verify via `gmail_get_message` that the attachment arrived intact and that its size matches the file on disk.
465
+
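The mailto detection step described above (an Apply link that resolves to `mailto:`) can be sketched with the standard-library HTML parser. This is an illustrative helper, not part of job-forge; it covers only the link case and does not detect "email your resume to …" copy in the page text.

```python
from html.parser import HTMLParser


class MailtoFinder(HTMLParser):
    """Collect addresses from <a href="mailto:..."> links."""

    def __init__(self):
        super().__init__()
        self.addresses = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        if href.lower().startswith("mailto:"):
            # Drop the scheme and any ?subject=... query part.
            self.addresses.append(href[len("mailto:"):].split("?", 1)[0])


def find_mailto_addresses(html):
    """Return all mailto addresses linked from the given HTML string."""
    finder = MailtoFinder()
    finder.feed(html)
    return finder.addresses
```

A non-empty result would suggest skipping Geometra for that posting and using the email path instead.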
379
466
  ### Greenhouse Bot-Detection Honeypots
380
467
 
381
468
  Some Greenhouse tenants (Grafana Labs confirmed, 2026-04-19) inject a honeypot-style single-pick question on the application form, rendered as a listbox labeled something like "Which of the following best describes you?" with options resembling "I am a human being / I am a bot / I am a robot".
@@ -1,33 +1,97 @@
1
- # JobForge -- AI Job Search Pipeline
1
+ # Agent: job-forge
2
2
 
3
- ## Hard Limits: NEVER exceed these numbers
3
+ AI-powered job search pipeline: scans portals, evaluates offers, generates CVs via Geometra MCP, applies to jobs, tracks applications across day files. Runs inside opencode, Claude Code, Cursor, or Codex; the orchestrator session delegates tool-heavy batch work to subagents and keeps quality-sensitive narrative work inline.
4
4
 
5
- The Hard Limits below are non-negotiable numeric rules. If you catch yourself about to violate one, STOP and restructure.
5
+ ## Hard limits
6
6
 
7
- 1. **Max parallel subagents: 2.** Never emit 3+ `task` tool calls in a single message. For N jobs, run `ceil(N/2)` sequential rounds of 2. No exceptions: not for "urgent", not for "the user asked for 10".
8
- 2. **Max 1 application per company+role.** Before every `task` dispatch for `apply`, Grep ALL of the following for the URL and for `company+role`:
9
- - `data/pipeline.md`
10
- - all `data/applications/*.md` day files (not just today's — prior-day Applies count too)
11
- - `batch/tracker-additions/*.tsv` (pending outcomes not yet merged)
12
- - `batch/tracker-additions/merged/*.tsv` (outcomes already consumed into day files — catches same-day earlier-batch Applies that merge collapsed into an existing row)
7
+ - [H1] Max 2 parallel `task` dispatches per message. For N jobs, run `ceil(N/2)` sequential rounds of 2. Applies in all modes, for all user phrasings ("urgent", "apply to 10 jobs now").
8
+ why: higher parallelism blows through free-tier rate limits; each subagent requires post-cleanup and racing more than 2 reliably loses at least one result
13
9
 
14
- If any source shows an APPLIED / Applied outcome for this URL or company+role, skip that job and do not dispatch. **Why merged/ matters**: when two batches in the same day target the same role, `npx job-forge merge` updates the existing day-file row instead of creating a new one — so `grep data/applications/*.md` for the higher report number misses the earlier apply. The merged TSV is the only place the newer attempt's breadcrumb remains.
15
- 3. **Always clean Geometra sessions before dispatching.** Before every round of `task` dispatches that will use Geometra, call `geometra_list_sessions` then `geometra_disconnect({closeBrowser: true})`. Every round. The disconnect is a no-op when the pool is empty.
16
- 4. **Orchestrator does NOT fill forms.** This session MUST NOT call `geometra_fill_form`, `geometra_run_actions`, `geometra_pick_listbox_option`, or `geometra_fill_otp` when handling a multi-job request. If you find you need to, you should already have delegated: `task` out the remaining work instead.
17
- 5. **Re-dispatch only AFTER the previous subagent returns.** Never fire the same company's `task` twice while the first is still in-flight. Wait for the return value, then decide if a retry is warranted.
18
- 6. **Application outcomes flow through TSVs, not `data/pipeline.md`.** When a subagent returns APPLIED / FAILED / SKIP, the outcome goes to `batch/tracker-additions/{num}-{slug}.tsv`. `npx job-forge merge` then consumes the TSVs into the correct `data/applications/YYYY-MM-DD.md` day file. `data/pipeline.md` only tracks URL inbox state (`[ ]` pending → `[x]` processed). **NEVER append APPLIED / FAILED status lines to `pipeline.md`** — that's the day file's job, via the TSV pathway. After any multi-apply run, the orchestrator MUST run `npx job-forge merge` followed by `npx job-forge verify` before ending the session.
19
- 7. **Load-bearing facts passed to downstream subagents must come from a file, not from a prior subagent's prose.** A URL, score, email ID, confirmation page snippet, JD salary range, exact answer submitted to a form question, or any other specific value that a downstream subagent will act on MUST originate from one of:
20
- - `data/pipeline.md` (URL inbox state)
21
- - `data/scan-history.tsv` (scan provenance)
22
- - `batch/scan-output-*.md` (scan-ranked candidates)
23
- - A report file (`reports/{num}-*.md`) with authoritative headers (`**URL:**`, `**Score:**`, etc.)
24
- - A TSV in `batch/tracker-additions/` (per-apply outcomes)
10
+ - [H2] Max 1 application per company+role. Before every `apply` dispatch, grep all four sources for the URL and for `company+role`: `data/pipeline.md`, all `data/applications/*.md` day files, `batch/tracker-additions/*.tsv`, `batch/tracker-additions/merged/*.tsv`. If any source shows APPLIED / Applied, skip the dispatch.
11
+ why: 2026-04 same-day batch collision: when two batches target the same role, `npx job-forge merge` updates the existing day-file row rather than appending, so grepping day files alone misses earlier-batch applies; merged/*.tsv is the only place the breadcrumb remains
25
12
 
26
- **Not trustworthy by default**: anything quoted from a subagent's return message, any ID or score the orchestrator "remembers" from prose, any page-content snippet reproduced from a subagent's narrative. Subagents can hallucinate plausible-looking IDs, scores, and confirmation text. Before passing any such fact to another subagent, cross-check it exists in one of the authoritative files above, OR instruct the dispatching subagent to write its output to a structured file and re-read from that file.
13
+ - [H3] Before every batch of `task` dispatches that will use Geometra, call `geometra_list_sessions` then `geometra_disconnect({closeBrowser: true})`. Every round, no exceptions. Name this cleanup as an explicit "step 0" in your first-response plan for any multi-apply request: it is the most frequently skipped guardrail in practice, and skipping it produces cascade "Not connected" failures on the next dispatch.
14
+ why: if any prior subagent aborted mid-flow, its Chromium session stays stuck in the MCP pool and the next `geometra_connect` fails with "Not connected"; the disconnect is a no-op when the pool is empty but a poison-cure when it isn't; vocalizing it up-front doubles the odds it actually runs
27
15
 
28
- **Why**: on 2026-04-18, a scan subagent returned 30 fabricated Greenhouse IDs in prose (correct role titles, plausible-looking invented IDs that didn't exist in the API). The orchestrator dispatched 30 downstream subagents that all hit 404s. Verification rules downstream (Hard Limit #6, API-first verify) caught the symptom. This rule prevents the *shape* of the bug (hallucinations propagating through prose handoffs) across all quantitative / identifier / specific-fact claims, not just URLs.
16
+ - [H4] In multi-job mode, the orchestrator session MUST NOT call `geometra_fill_form`, `geometra_run_actions`, `geometra_pick_listbox_option`, or `geometra_fill_otp` directly. Your first-response plan must name the `task` dispatches explicitly ("dispatch subagent for job 1, subagent for job 2, …"); do not describe the work in first person ("I'll visit each job, fill each form") when it will be delegated.
17
+ why: repeated Geometra calls in the orchestrator bloat the cache prefix — this is the 2026-04 "apply to 20 jobs" 341-msg incident where each turn re-processed 100K+ fresh tokens instead of reading from cache; first-person narration is a leading indicator that the agent is mentally queueing work for itself rather than a subagent
29
18
 
30
- Everything below is context and rationale. These seven numbers are the rules.
19
+ - [H5] Re-dispatch the same company only AFTER the previous subagent returns. Never fire the same `task` twice while the first is still in flight.
20
+ why: two in-flight subagents for the same URL race on Geometra sessions and on tracker TSV writes, corrupting state and sometimes double-submitting
21
+
22
+ - [H6] Application outcomes flow through `batch/tracker-additions/*.tsv`, not `data/pipeline.md`. After any multi-apply run, the orchestrator MUST run `npx job-forge merge` then `npx job-forge verify` before ending the session.
23
+ why: `pipeline.md` is the URL inbox (`[ ]` pending → `[x]` processed); `data/applications/YYYY-MM-DD.md` is the outcome log; the TSV pathway is the only safe bridge because `merge` handles column order and duplicate detection
24
+
25
+ - [H7] Load-bearing facts passed to downstream subagents must originate from a file, not from prior subagent prose. Authoritative sources: `data/pipeline.md`, `data/scan-history.tsv`, `batch/scan-output-*.md`, `reports/{num}-*.md` with `**URL:**` / `**Score:**` headers, `batch/tracker-additions/*.tsv`.
26
+ why: 2026-04-18 scan subagent returned 30 fabricated Greenhouse IDs in prose (plausible-looking, non-existent); orchestrator dispatched 30 downstream subagents that all 404'd. Subagents can hallucinate IDs, scores, and confirmation text — round-trip through a file or don't trust the value
27
+
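The duplicate check in [H2] greps four sources before every `apply` dispatch. The sketch below is one possible reading of that rule, assuming each outcome is recorded on a single line that contains both the status and the URL or company and role; that line format, and the helper name `already_applied`, are assumptions rather than documented job-forge behavior.

```python
from pathlib import Path

# The four sources named in [H2]; the merged/ directory is listed separately
# because "*.tsv" does not descend into subdirectories.
DUPLICATE_GLOBS = [
    "data/pipeline.md",
    "data/applications/*.md",
    "batch/tracker-additions/*.tsv",
    "batch/tracker-additions/merged/*.tsv",
]


def already_applied(url, company, role, root):
    """Return True if any source has an Applied line for this URL or company+role."""
    for pattern in DUPLICATE_GLOBS:
        for path in root.glob(pattern):
            text = path.read_text(encoding="utf-8", errors="ignore")
            for line in text.splitlines():
                lowered = line.lower()
                if "applied" not in lowered:
                    continue
                same_url = url.lower() in lowered
                same_role = company.lower() in lowered and role.lower() in lowered
                if same_url or same_role:
                    return True
    return False
```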
28
+ ## Defaults
29
+
30
+ - [D1] Delegate to a subagent (`task`) only when the work involves repeated tool-heavy steps that bloat the cache prefix: applying to N≥2 jobs, batch scans hitting ≥3 companies, or any "apply to… / process pipeline / batch evaluate" user phrasing. Single-offer evals, dev work, file edits, `tracker` mode, single-URL checks, and one-shot questions stay inline.
31
+ why: iso-trace showed 0.25% Agent calls across 5174 turns under a prior over-broad "delegate before 2nd tool call" rule — the rule was ignored in practice; narrowing matches the original cache-bust incident
32
+
33
+ - [D2] Route subagent work by cost tier. `@general-free`: procedural — form-fill, TSV merge, verify, OTP retrieval, portal scan metadata extraction, one-shot structured-field transforms. `@general-paid`: quality-sensitive — offer evaluation narrative Blocks A-F, cover letters, "Why X?" answers, STAR interview stories, LinkedIn outreach. `@glm-minimal`: narrow ≤5K-input one-shot extract/classify jobs that do not need context.
34
+ why: GLM 5.1 doesn't discount cache reads so procedural work there costs ~10×; free-tier models handle procedural work fine empirically (`opencode/big-pickle` processed 1000+ messages at $0)
35
+
36
+ - [D3] Upgrade `apply` routing to `@general-paid` when the offer score is ≥ 4.0/5, when the user flags "top-tier / dream job / high-stakes", or when the application is at a late pipeline stage (post-screens).
37
+ why: form-fill flows are 6+ steps; free-tier sometimes aborts mid-flow on large Greenhouse/Workday schemas; paid tier has more headroom
38
+
39
+ - [D4] Auto-submit for offers scoring 3.0+/5 without pausing for confirmation between steps — scan → evaluate → apply is one continuous pipeline. Mark SKIP for <3.0 and move on.
40
+ why: JobForge is designed for end-to-end automation; pausing between steps defeats the purpose and the 3.0 gate already enforces quality
41
+
42
+ - [D5] Before any batch-apply dispatch, run the Apply Preflight location filter from `modes/apply.md` to exclude location-incompatible candidates.
43
+ why: catches the common case where an evaluated role has the right role-shape but a deal-breaking location that profile.yml already rules out
44
+
45
+ - [D6] Pick the mode from the **Routing** table below AND name it explicitly in your first response (e.g., "running auto-pipeline mode", "this is a `compare` request"). If no row matches the user's intent, ask which mode fits; do not guess.
46
+ why: silent mode picks mis-route work (a "negotiation" question answered in `offer` mode produces the wrong report shape); naming the mode out loud makes the routing decision reviewable and gives downstream dispatches a reliable anchor
47
+
48
+ ## Procedure
49
+
50
+ 1. On start, check `cv.md`, `profile.yml`, `portals.yml` exist; onboard if any missing.
51
+ 2. Pick the mode from **Routing** [D6]. No match → ask; do not guess.
52
+ 3. Apply [D1]: batch/Geometra work → delegate; single/read-only/dev → inline.
53
+ 4. Before any `task` batch using Geometra, run cleanup [H3].
54
+ 5. Before `apply`, run duplicate check [H2] and location filter [D5].
55
+ 6. Route by cost tier [D2]; upgrade to `@general-paid` per [D3] for high-stakes offers.
56
+ 7. Cap parallelism at 2 per round [H1].
57
+ 8. One in-flight dispatch per company [H5].
58
+ 9. Orchestrator does not fill forms in multi-job mode [H4].
59
+ 10. Treat subagent prose as untrusted [H7]; cross-check facts against authoritative files.
60
+ 11. Write outcomes as TSVs [H6]; run `npx job-forge merge` then `verify` at end.
61
+ 12. Offers scoring 3.0+/5 continue without confirmation [D4]; <3.0 is SKIP.
62
+ 13. Confirm tracker is merged and verified before ending.
63
+
64
+ ## Routing
65
+
66
+ | If the user… | Mode |
67
+ |---|---|
68
+ | Pastes JD or URL | auto-pipeline (evaluate + report + PDF + tracker) |
69
+ | Asks to evaluate offer | `offer` |
70
+ | Asks to compare offers | `compare` |
71
+ | Wants LinkedIn outreach | `contact` |
72
+ | Asks for company research | `deep` |
73
+ | Wants to generate CV/PDF | `pdf` |
74
+ | Evaluates a course/cert | `training` |
75
+ | Evaluates portfolio project | `project` |
76
+ | Asks about application status | `tracker` |
77
+ | Fills out application form | `apply` |
78
+ | Searches for new offers | `scan` |
79
+ | Processes pending URLs | `pipeline` |
80
+ | Batch processes offers | `batch` |
81
+ | Asks what needs follow-up | `followup` |
82
+ | Reports a rejection | `rejection` |
83
+ | Receives a job offer | `negotiation` |
84
+ | otherwise | Ask which mode fits; do not guess |
85
+
86
+ ## Output format
87
+
88
+ Output shape is mode-dependent — see `modes/{mode}.md` for each mode's expected output. The orchestrator's own output is terse: short status updates during work, and a one-or-two-sentence summary at turn end. No mid-work narration of individual tool calls.
89
+
90
+ ---
91
+
92
+ # Reference
93
+
94
+ Sections below are context, rationale, runbooks, and portal-specific empirical notes. The **Hard limits**, **Defaults**, **Procedure**, and **Routing** above are the contract; the material below is what the orchestrator and each mode consult during execution.
31
95
 
32
96
  ---
33
97
 
@@ -85,22 +149,23 @@ The harness ships three subagents (see `.opencode/agents/`). The orchestrator MU
85
149
 
86
150
  **When to break this rule:** if the user explicitly asks for "quality over cost" or flags a high-stakes application (top-tier company, offer-stage negotiation, executive search), route everything through `@general-paid`. Document the exception in the session.
87
151
 
88
- ### Pre-flight delegation (HARD RULE)
152
+ ### When to delegate
89
153
 
90
- For any task that will involve **more than one tool call** (i.e., anything beyond a one-shot answer), the orchestrator's **first tool call MUST be `task`** (dispatching to a subagent). Not `Read`, not `Bash`, not `geometra_connect`, not `Grep`. The orchestrator plans and dispatches; subagents execute.
154
+ **Delegate (`task` out) when the work involves repeated tool-heavy steps that bloat the orchestrator's cache prefix.** The concrete failure mode this prevents: a 341-message "apply to 20 jobs" session where repeated `geometra_fill_form` / `geometra_page_model` calls accumulated in history, forcing each new message to re-process 100K+ tokens of fresh input instead of reading from cache.
91
155
 
92
- **Why this is absolute:** every tool call in the orchestrator accumulates in the top-level session's history and pollutes the cache prefix. Once the orchestrator has read three files and made two Geometra calls, delegating to a subagent no longer helps — the subagent inherits the bloated context. The only way to keep the orchestrator lean is to delegate *before* doing anything else.
156
+ **Delegate when:**
157
+ - Applying to N≥2 jobs (repeated Geometra form-fill — the original cache-bust scenario)
158
+ - Batch portal scans hitting ≥3 companies (API loops + page-model reads stack up)
159
+ - Any explicit "apply to... / process pipeline / batch evaluate" phrasing from the user (multi-job intent)
93
160
 
94
- **What counts as "more than one tool call":**
95
- - Evaluating any offer (always ≥3 steps: fetch JD, score, write report)
96
- - Any `/job-forge` mode invocation except `tracker` (read-only)
97
- - Applying to a job
98
- - Scanning portals
99
- - Any batch operation
161
+ **Do NOT delegate (orchestrate inline):**
162
+ - Single-offer evaluation (text-heavy, not tool-heavy)
163
+ - Development / bug-fix / file-editing tasks
164
+ - `tracker` and other read-only modes
165
+ - Single-company scan, single-URL check
166
+ - One-shot questions — "what does this mean?", "read X and summarize", "what's my next report number?"
100
167
 
101
- **Explicit exception:** trivial one-shot answers ("what does this error mean?", "read this file and summarize", "what's my next report number?") can stay in the orchestrator. If the question can be answered in ≤1 tool call, do not delegate.
102
-
103
- **Detection signal:** if you (orchestrator) find yourself about to make your 2nd tool call in a session that wasn't a trivial one-shot, STOP. Instead, `task` out the remaining work as a single delegated job.
168
+ **Detection signal:** if you're about to call `geometra_fill_form` for a second *different* job in the same session, STOP and delegate the remainder. For everything else, in-session execution is the expected default.
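The delegation rules above can be read as a small decision function. The following is a minimal sketch, not code from job-forge: the function name `shouldDelegate` and the request fields `mode`, `jobCount`, `companyCount`, and `userText` are hypothetical and exist only for illustration.

```javascript
// Hypothetical helper: none of these names exist in job-forge itself.
// Encodes the "When to delegate" thresholds as a pure function.
const BATCH_PHRASES = [/\bprocess pipeline\b/i, /\bbatch evaluate\b/i];

function shouldDelegate({ mode, jobCount = 0, companyCount = 0, userText = '' } = {}) {
  // Applying to N>=2 jobs: repeated form-fill calls bloat the cache prefix.
  if (mode === 'apply' && jobCount >= 2) return true;
  // Batch portal scans hitting >=3 companies.
  if (mode === 'scan' && companyCount >= 3) return true;
  // Explicit multi-job phrasing from the user.
  if (BATCH_PHRASES.some((re) => re.test(userText))) return true;
  // Everything else (single evaluation, tracker, one-shot questions) stays inline.
  return false;
}
```

For example, `shouldDelegate({ mode: 'apply', jobCount: 1 })` stays inline, while the same call with `jobCount: 2` delegates.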
104
169
 
105
170
  ---
106
171
 
@@ -206,24 +271,7 @@ JobForge is designed to be customized by YOU (opencode). When the user asks you
206
271
 
207
272
  ### Skill Modes
208
273
 
209
- | If the user... | Mode |
210
- |----------------|------|
211
- | Pastes JD or URL | auto-pipeline (evaluate + report + PDF + tracker) |
212
- | Asks to evaluate offer | `offer` |
213
- | Asks to compare offers | `compare` |
214
- | Wants LinkedIn outreach | `contact` |
215
- | Asks for company research | `deep` |
216
- | Wants to generate CV/PDF | `pdf` |
217
- | Evaluates a course/cert | `training` |
218
- | Evaluates portfolio project | `project` |
219
- | Asks about application status | `tracker` |
220
- | Fills out application form | `apply` |
221
- | Searches for new offers | `scan` |
222
- | Processes pending URLs | `pipeline` |
223
- | Batch processes offers | `batch` |
224
- | Asks what needs follow-up | `followup` |
225
- | Reports a rejection | `rejection` |
226
- | Receives a job offer | `negotiation` |
274
+ Mode routing is specified in the top-level **## Routing** section. Each mode is implemented in `modes/{mode}.md` — consult those files for per-mode prompts, state, and expected outputs.
227
275
 
228
276
  ### CV Source of Truth
229
277
 
@@ -368,14 +416,53 @@ These blocks come from two distinct root causes and require different responses:
368
416
 
369
417
  **Known-block Ashby tenants (2026-04-19 empirical observations).** These tenants fired class B on every attempted submit from a headless datacenter-IP proxy. Orchestrators planning apply dispatches should assume these tenants will Fail in headless — prioritize other portals, or skip same-tenant siblings after a confirmed class B to avoid burning subagent slots:
370
418
 
371
- - Vellum, Linear, Vanta, River Financial, Higharc, Trace Labs, Solace Health, Unstructured, ClickUp, Zapier, Deepgram, Ramp, WorkOS
419
+ - Vellum, Linear, Vanta, River Financial, Higharc, Trace Labs, Solace Health, Unstructured, ClickUp, Zapier, Deepgram, Ramp, WorkOS, Ashby (self-tenant), Perplexity, **Goody**, **Starbridge**, **Graphite**, **Prompt Health**, **Vantage**
372
420
 
373
421
  **Known class-A-compatible Ashby tenants (same observations).** These tenants accepted headless submits cleanly, often with `imeFriendly: true` making the difference on the text-field subset:
374
422
 
375
- - Supabase, LangChain, Poolside, Runway Financial
423
+ - Supabase, LangChain, Poolside, Runway Financial, Sentry, Cognition
424
+
425
+ **Base rate for untested Ashby tenants (5/5 tested 2026-04-19 cycle 4 = class B).** The current prior is that ~80-90% of untested Ashby tenants fingerprint-block headless submits. Orchestrators should treat any tenant not on the class-A-compatible list as likely class B: still dispatch to collect the data point, but don't burn multiple sibling-role slots on the same Ashby tenant.
376
426
 
377
427
  The pattern is tenant configuration, not role or company size. Lists drift as tenants tune their anti-bot — treat as probabilistic priors, not hard rules.
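One way to apply these lists as priors is a simple lookup. This is an illustrative sketch only: the tenant names are copied from the lists above, but the helper names (`normalizeTenant`, `ashbyPrior`) and the returned labels are assumptions, not part of job-forge.

```javascript
// Hypothetical lookup; lists mirror the 2026-04-19 observations and will drift.
function normalizeTenant(name) {
  return name.toLowerCase().replace(/[^a-z0-9]/g, '');
}

const KNOWN_CLASS_B = new Set([
  'Vellum', 'Linear', 'Vanta', 'River Financial', 'Higharc', 'Trace Labs',
  'Solace Health', 'Unstructured', 'ClickUp', 'Zapier', 'Deepgram', 'Ramp',
  'WorkOS', 'Ashby', 'Perplexity', 'Goody', 'Starbridge', 'Graphite',
  'Prompt Health', 'Vantage',
].map(normalizeTenant));

const KNOWN_CLASS_A = new Set([
  'Supabase', 'LangChain', 'Poolside', 'Runway Financial', 'Sentry', 'Cognition',
].map(normalizeTenant));

function ashbyPrior(tenant) {
  const key = normalizeTenant(tenant);
  if (KNOWN_CLASS_A.has(key)) return 'class-A-compatible';
  if (KNOWN_CLASS_B.has(key)) return 'known-class-B';
  // Not on either list: treat as likely class B, but still worth one dispatch.
  return 'untested-likely-class-B';
}
```

The labels are priors, not verdicts; an orchestrator would still dispatch one attempt for an untested tenant to collect the data point.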
378
428
 
429
+ **Ashby choice-group with `optionCount: 1` and no labels (Sentry pattern).** Some Ashby tenants render Yes/No work-authorization questions as `role="button" name="Application"` pill toggles where the accessibility tree exposes neither `Yes` nor `No` labels. `fill_fields` with `choiceType: "group"` silently no-ops; `geometra_click` by `id` also fails to toggle. Fix: fall back to `geometra_click` with RAW x,y coordinates at the button centers (Yes is typically the left button, No is the right). Confirmed on Sentry Staff Platform #845, 2026-04-19.
430
+
431
+ ### Other Portal Failure Classes
432
+
433
+ **Typeform applications are Geometra-unsupported.** Some companies (Better Stack confirmed, 2026-04-19) route the Apply link to a Typeform wizard (`*.typeform.com/apply-*`). Typeform renders questions via a custom React/canvas layer that does NOT expose input fields to the accessibility tree — `geometra_form_schema` returns "No forms found", `geometra_query role=textbox` returns empty, blind `geometra_type` produces no semantic change. Mark `Failed` with reason "Typeform portal — Geometra unsupported" on detection; do not burn the 9-minute budget attempting blind input.
434
+
435
+ **Avature multi-step wizards have a native-`<select>` validation lag (Bloomberg pattern).** Bloomberg's careers site redirects to `bloomberg.avature.net` with a 4-step wizard. On Step 2, native `<select>` elements ("Is Current Position? / No") accept the value but keep `invalid: true` persistently — neither Tab, re-submit, nor re-pick clears it. `imeFriendly` has no effect because the field is a native `<select>`, not React-controlled text. There is no documented recovery. Mark `Failed` with reason "Avature native-select validation lag"; account creation up to that point is preserved for any future manual path. Confirmed on Bloomberg Sr SWE Auth #828, 2026-04-19.
436
+
437
+ **Cloudflare / ATS-vendor blocks on Dropbox-class portals.** Dropbox's real apply flow lives behind `happydance.website` (ATS vendor), which Cloudflare-fingerprints headless Chromium + datacenter IPs and returns "Sorry, you have been blocked". `job-boards.greenhouse.io/dropbox` does not mirror — there is no public Greenhouse fallback. Symptom-wise indistinguishable from Ashby class B but at a different layer. Mark `Failed` with reason "ATS vendor Cloudflare block (happydance.website or equivalent)". Confirmed on Dropbox Sr FS Product #831, 2026-04-19.
438
+
439
+ **Greenhouse OTP-on-fill variant (Instacart pattern).** Most Greenhouse OTP flows fire on Submit. A minority (Instacart Staff FoodStorm #827, 2026-04-19) fire the 8-cell security-code gate mid-fill, BEFORE the user clicks Submit. Detection: watch for an 8-cell OTP input surfacing after resume upload or the first listbox commit. Fetch from Gmail (`from:greenhouse newer_than:10m`) immediately when it appears — do not wait for Submit.
440
+
441
+ **`geometra_fill_otp` char-drop on first fill.** Occasionally `fill_otp` lands only the first character of an 8-char code (seen on Instacart, 2026-04-19). Recovery: click the first cell to focus, then re-issue `fill_otp` with `perCharDelayMs: 120`. The form usually auto-submits once all 8 cells are populated.
442
+
443
+ **Breezy portal — tenant-dependent, native `<select>`, resume-auto-parse is primary.** A subset of companies (Avantos AI, Courted, Instinct Science confirmed 2026-04-19) host applications on `*.breezy.hr` or `applytojob.com`. Empirical rules:
444
+
445
+ - **Class is per-tenant, not uniform.** Avantos (Failed 2026-04-19 #854) returned Breezy's own "It looks like maybe you've already applied to this job?" banner from IP fingerprinting, even on a first submit — distinct failure mode from Ashby's "flagged as possible spam". Courted (Applied 2026-04-19 #855) went through cleanly on the same session. Don't pre-skip Breezy; the outcome is tenant-specific.
446
+ - **Native `<select>` elements, not React comboboxes.** `geometra_pick_listbox_option` sets the visible display but NOT the underlying form state — submit will fail with "A response is required" on every combobox. Use `geometra_select_option` with x,y + label value for every choice field on Breezy.
447
+ - **Resume-auto-parse carries the signal.** After resume upload, Breezy auto-parses work history and education into structured rows. Do NOT Add/Delete position rows via Geometra — row mutations reshuffle fieldIds mid-flow, sequential `fill_fields` calls land in wrong rows, and upstream pollution corrupts earlier positions. Trust the parsed resume and fill only Personal Details + salary.
448
+
449
+ **Mailto-apply portals — direct email via gmail-mcp `attachments`.** A subset of HN-listed companies (CoPlane, Gambit Robotics, Rinse, Digital Health Strategies confirmed 2026-04-19) don't host an ATS form — their careers page instructs sending resume by email to `founders@...` / `jobs@...` / `contact@...`. Detection: WebFetch the careers URL; if the Apply link resolves to `mailto:` or the copy reads "email your resume to …", skip Geometra entirely.
450
+
451
+ Use `gmail_send_message` with the `attachments` parameter (available from `@razroo/gmail-mcp@1.8.0`):
452
+
453
+ ```
454
+ gmail_send_message({
455
+ to: ["founders@example.com"],
456
+ subject: "Application — Forward Deployed AI Engineer — Charlie Greenman (Austin)",
457
+ body: "<Section G pitch, 4-8 short paragraphs>",
458
+ attachments: [{ path: "/abs/path/to/Charlie-Greenman-CV.pdf" }]
459
+ })
460
+ ```
461
+
462
+ The MCP reads the file from disk and builds multipart/mixed MIME server-side — do NOT manually base64-encode a PDF into the `raw` parameter (the inline blob exceeds tool-call argument limits for any real attachment). Subject is auto MIME-encoded for non-ASCII (em-dash, smart quotes) by the same version. For older gmail-mcp versions (< 1.8.0) the only path was a direct Gmail API POST with the stored OAuth token at `~/.gmail-mcp/credentials.json` — upgrade if you can.
463
+
464
+ Mark Applied with note `mailto portal — sent via gmail_send_message; Gmail msgId {id}`. Before writing the TSV, verify via `gmail_get_message` that the attachment arrived intact and that its size matches the file on disk.
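The detection step for mailto portals described above could be sketched as a pure function over the Apply link and the page copy. This is a hedged illustration: `detectMailtoApply` and its return shape are hypothetical, and how the link and copy are obtained (e.g. via WebFetch) is left out.

```javascript
// Hypothetical helper, not part of job-forge: classifies a careers page as a
// mailto-apply portal from the Apply link href and the visible page copy.
function detectMailtoApply(applyHref = '', pageCopy = '') {
  const href = applyHref.trim();
  if (/^mailto:/i.test(href)) {
    // Drop the scheme and any ?subject=... query to keep only the address.
    const address = href.slice('mailto:'.length).split('?')[0];
    return { mailto: true, address };
  }
  if (/email\s+your\s+resume\s+to/i.test(pageCopy)) {
    // Copy-based match: the address still has to be read from the page.
    return { mailto: true, address: null };
  }
  return { mailto: false, address: null };
}
```

When `mailto` is true, the flow would skip Geometra and go straight to the email path described above.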
465
+
379
466
  ### Greenhouse Bot-Detection Honeypots
380
467
 
381
468
  Some Greenhouse tenants (Grafana Labs confirmed, 2026-04-19) inject a honeypot-style single-pick question on the application form, rendered as a listbox labeled something like "Which of the following best describes you?" with options resembling "I am a human being / I am a bot / I am a robot".
package/merge-tracker.mjs CHANGED
@@ -109,9 +109,31 @@ function normalizeCompany(name) {
109
109
  return name.toLowerCase().replace(/[^a-z0-9]/g, '');
110
110
  }
111
111
 
112
+ // Generic seniority + engineering words that appear across most SWE roles
113
+ // and carry no role-specialty signal. A "discriminator" is any remaining
114
+ // word longer than 3 chars (e.g. "Observability", "Telemetry", "Platform").
115
+ const ROLE_STOPWORDS = new Set([
116
+ 'staff', 'senior', 'principal', 'lead', 'junior',
117
+ 'software', 'engineer', 'engineering', 'developer',
118
+ 'backend', 'frontend', 'fullstack', 'full-stack', 'full', 'stack',
119
+ 'technical', 'applied',
120
+ ]);
121
+
112
122
  function roleFuzzyMatch(a, b) {
113
- const wordsA = a.toLowerCase().split(/\s+/).filter(w => w.length > 3);
114
- const wordsB = b.toLowerCase().split(/\s+/).filter(w => w.length > 3);
123
+ // Split on whitespace AND role punctuation (commas, colons, dashes, parens, slashes)
124
+ // so "Staff SWE, Observability K8s" tokenizes past the comma.
125
+ const split = (s) => s.toLowerCase()
126
+ .split(/[\s,:\-()\/]+/)
127
+ .map(w => w.trim())
128
+ .filter(w => w.length > 3 && !ROLE_STOPWORDS.has(w));
129
+
130
+ const wordsA = split(a);
131
+ const wordsB = split(b);
132
+
133
+ // Match on discriminator-word overlap only. Prevents "Staff Software
134
+ // Engineer, ML Observability" and "Staff Backend Engineer, Adaptive
135
+ // Telemetry" from colliding (same company, different specialty) while
136
+ // still collapsing re-evaluations of the same role (same discriminators).
115
137
  const overlap = wordsA.filter(w => wordsB.some(wb => wb.includes(w) || w.includes(wb)));
116
138
  return overlap.length >= 2;
117
139
  }
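To see the effect of the change, the new matcher can be run standalone against the titles mentioned in its comment. The sketch below reproduces the 2.6.0 function as shown in the hunk above; the sample role titles in the calls (other than the two from the comment) are made up for illustration.

```javascript
// Standalone copy of the 2.6.0 logic from the hunk above, for experimentation.
const ROLE_STOPWORDS = new Set([
  'staff', 'senior', 'principal', 'lead', 'junior',
  'software', 'engineer', 'engineering', 'developer',
  'backend', 'frontend', 'fullstack', 'full-stack', 'full', 'stack',
  'technical', 'applied',
]);

function roleFuzzyMatch(a, b) {
  // Tokenize on whitespace and role punctuation, then keep only
  // discriminator words: longer than 3 chars and not a stopword.
  const split = (s) => s.toLowerCase()
    .split(/[\s,:\-()\/]+/)
    .map((w) => w.trim())
    .filter((w) => w.length > 3 && !ROLE_STOPWORDS.has(w));

  const wordsA = split(a);
  const wordsB = split(b);
  const overlap = wordsA.filter((w) => wordsB.some((wb) => wb.includes(w) || w.includes(wb)));
  return overlap.length >= 2;
}

// Different specialties at the same company no longer collide.
const different = roleFuzzyMatch(
  'Staff Software Engineer, ML Observability',
  'Staff Backend Engineer, Adaptive Telemetry',
);

// Re-evaluation of the same role (two shared discriminators) still collapses.
const same = roleFuzzyMatch(
  'Senior Engineer, Platform Observability',
  'Staff Software Engineer - Observability Platform',
);
```

Here `different` is `false` and `same` is `true`. Because the threshold stays at two overlapping words, a title that reduces to a single discriminator (for example "Staff SWE, Observability K8s", where only "observability" survives filtering) cannot match anything, including itself.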
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "job-forge",
3
- "version": "2.4.0",
3
+ "version": "2.6.0",
4
4
  "description": "AI-powered job search pipeline built on opencode",
5
5
  "type": "module",
6
6
  "bin": {
@@ -18,6 +18,11 @@
18
18
  "tokens": "node scripts/token-usage-report.mjs",
19
19
  "tokens:today": "node scripts/token-usage-report.mjs --days 1",
20
20
  "tokens:log": "node scripts/token-usage-report.mjs --days 1 --append",
21
+ "trace:list": "iso-trace list --since 7d --cwd .",
22
+ "trace:stats": "iso-trace stats --since 7d --cwd .",
23
+ "lint:agentmd": "agentmd lint iso/instructions.md",
24
+ "test:agentmd": "agentmd test iso/instructions.md --fixtures fixtures/instructions.yml --via claude-code --model claude-haiku-4-5 --concurrency 2",
25
+ "test:agentmd:baseline": "agentmd test iso/instructions.md --fixtures fixtures/instructions.yml --via claude-code --model claude-haiku-4-5 --concurrency 2 --trials 3 --format json --out fixtures/baseline.json",
21
26
  "build:config": "iso build .",
22
27
  "prepack": "iso build .",
23
28
  "release:check-source": "node ./scripts/release/check-source.mjs",
@@ -75,6 +80,7 @@
75
80
  },
76
81
  "devDependencies": {
77
82
  "@razroo/iso": "^0.1.1",
78
- "@razroo/iso-harness": "^0.1.3"
83
+ "@razroo/iso-harness": "^0.1.3",
84
+ "@razroo/iso-trace": "^0.1.0"
79
85
  }
80
86
  }