npm - @chrono-meta/fh-gate - Versions diffs - 1.4.21 → 1.4.23 - Mend

@chrono-meta/fh-gate 1.4.21 → 1.4.23

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (7) hide show

package/AGENTS.md +10 -1
package/CATALOG.md +12 -0
package/CHEATSHEET.md +2 -2
package/CLAUDE.md +14 -2
package/package.json +1 -1
package/plugins/fh-meta/skills/steel-quench/SKILL.md +41 -1
package/plugins/fh-meta/skills/steel-quench/SKILL_detail.md +117 -0

package/AGENTS.md CHANGED Viewed

@@ -59,7 +59,16 @@ Agents in this registry belong to the **Automation layer**. Skills (in `plugins/
 > **Codex-compatible beta**: The Methodology layer (`tracks/`, `knowledge/`, skill documentation) is designated Codex-compatible beta. Gemini, Codex, and other AI users can apply FH methodology without the Automation layer — manual invocation replaces hook/agent dispatch.
-> **Multi-model sidecar (validated)**: Any FH user can delegate to other models via sidecar — Gemini CLI, OpenAI/Codex CLI, or Copilot CLI's model catalog — invoked with `Bash` from within the Claude Code session. FH is the orchestrating harness; the sidecar is a routing/access layer (not a second harness — different layer entirely). Validated empirically: `echo "prompt" | gemini` works inside a CC session and produces usable output. Sidecar calls are Bash invocations, not agent dispatches — they bypass this registry and are coordinated inline by the skill. See `knowledge/shared/harness-core/multi_model_sidecar_strategy.md` for the full pattern.
+> **Directory → destination routing (where your outputs belong)**: not everything in the methodology layer is public-shareable. `knowledge/` and `SKILL.md` docs are the **public, reusable** methodology. `tracks/` is **local / private by convention** — work history, session records, `fh_signal_*`, audit logs — and is gitignored on the public mirror. An AI working in a local workspace that pairs the public mirror with a private companion store (the `*-be` pattern) must **not** infer "same folder ⇒ same repository"; route by content type:
+>
+> | Content | Default destination |
+> |---|---|
+> | reusable methodology · docs · skills · public guidance · polished external-facing conclusions | public mirror (`knowledge/`, `plugins/`, `docs/`) |
+> | raw signal · operator observation · private validation · handoff · paper draft · PR-background reasoning log | private companion store (`*-be` pattern) — or keep local; do **not** commit to the public mirror |
+>
+> When unsure, treat raw / observational / operator-specific material as **private-first** and promote only the polished result to public. (Concrete per-operator bindings — exact companion-store path, sync mechanism — live in the operator's local config, not here.)
+> **Multi-model sidecar (validated)**: Any FH user can delegate to other models via sidecar — Gemini CLI, OpenAI/Codex CLI, or Copilot CLI's model catalog — invoked with `Bash` from within the Claude Code session. FH is the orchestrating harness; the sidecar is a routing/access layer (not a second harness — different layer entirely). Validated empirically: `echo "prompt" | gemini` works inside a CC session and produces usable output. Sidecar calls are Bash invocations, not agent dispatches — they bypass this registry and are coordinated inline by the skill. Capability routing matters too: Gemini/Antigravity is the natural multimodal sidecar, while a Codex app/runtime session with Browser/Chrome connectors is the preferred handoff for live web-flow automation. In a local FH workspace that pairs the public methodology mirror with a private companion store (the `*-be` pattern), route by workspace capability while preserving each repository's ownership boundary. See `knowledge/shared/harness-core/multi_model_sidecar_strategy.md` for the full pattern.
 ---

package/CATALOG.md CHANGED Viewed

@@ -8,6 +8,18 @@ AI reads this file first when searching past work. Open individual files for det
 <!-- Add entries in reverse date order (newest at top) -->
+### 2026-06-14 | forge-harness | #crucible-mode, #total-immersion-absorption, #design-decision-lens, #completion-claim-discipline, #self-forge, #sister-asset
+**File:** knowledge/shared/harness-core/crucible_mode.md + harness_design_decision_lens.md + harness_6axis_framework.md (Completion-claim discipline) + tracks/_audit/session_2026_06_14_wikidocs-deep-sweep.md
+Content-level deep cross-audit of two wikidocs sister books (19689 백과사전 / 19736 Allen 멀티에이전트) via live-surface Playwright ingest + Gemini/Codex debate-loop + governor source-close, then **absorbed every candidate that passed the identity gate** (FH-identity-preserving + positively-expandable). Three assets: (1) `harness_design_decision_lens.md` — the 7 architectural-bet decisions as an orthogonal companion to the 6-axis lifecycle (only net-new = the framing; rest ALREADY-HAVE, honestly marked); (2) 6-axis **Completion-claim discipline** — a "done" claim must carry evidence + failure-checks-run + residual risk, non-vacuous; (3) **`crucible_mode.md`** — names the total-immersion absorption *stance* (throw the whole corpus in, melt under adversarial heat, keep only what bonds to an **unmeltable adamantium core**; rejections are boundary-defining). Each absorption was itself put through the crucible (quench-challenger + persona-auditor + Sonnet blind sim) — the crucible doc's own quench caught 3 of its defects (incl. a phantom worked-instance claim) before commit.
+- Decision: unmeltable-by-absorption ≠ unchangeable-by-operator (anvil signal = N≥3 recurrence → verify-bidirectional); Redis multi-agent bus rejected as out-of-identity (speed/ops); productivity/ROI over-claims quarantined. Operator insight seed: "포지하네스는 자기자신도 용광로에 빠뜨려 단련하지만 그 심지는 아다만티움처럼 결코 녹지 않는다."
+- Open: HITL commit decision (this batch); adamantium core list (5) is operator-owned.
+### 2026-06-14 | forge-harness | #debate-circulation-loop, #sister-asset, #sidecar-mining, #governor-source-verification, #import-list-first
+**File:** knowledge/shared/harness-core/multi_model_sidecar_strategy.md (§Debate Circulation Loop) + tracks/_audit/session_2026_06_14_fh-reinforcement-mining.md
+Promoted the **Debate Circulation Loop + governor source-verification** methodology from memory to the public sidecar doc — relay a multi-runtime question, mutual peer critique (a runtime's blind spot is invisible to itself, visible to a peer), CC/Opus governor closes against *source* (a debate is also judged → bind to a mechanical anchor), promote only the source-verified residue. Demonstrated live by mining two operator-designated FH-reinforcement sources: **oh-my-claudecode** (Codex governance critique → mostly ALREADY-HAVE, 1 gated increment = Ralph Done-When hardening; speed-first substrate rejected per governance-over-speed axis) and **wikidocs book/19736 = 「하네스 엔지니어링 백과사전」** (Gemini ingest past a governor 403; a same-topic "Agent = Model + Harness" encyclopedia, mostly independent convergence + 2 gated follow-ups: Ch12 7-decisions/3-contrarian, Ch11 12-patterns).
+- Decision: import-list-first held on both; no clone-and-own. Source B content beyond TOC is Gemini-one-surface, governor-unverified (403) → import candidates are explicit follow-ups, not confirmed.
+- Open: (a) Ralph Done-When phrasing → goal-quench/pipeline-conductor PR (HITL); (b) Ch11/Ch12 content access → cross-audit vs 6-axis + pattern library.
 ### 2026-06-13 | forge-harness | #judge-robustness, #mechanical-anchor, #hardening-batch-2, #sycophancy-gate, #verification
 **File:** plugins/fh-meta/skills/{verify-bidirectional,steel-quench,asset-placement-gate}/SKILL.md (commit f80bc99)
 Batch-2 of the judge-robustness hardening (after #1-#2 in be2d5dc): three more judge-only verdict paths bound to anchors. #3 verify-bidirectional evidence gate — a persistent-baseline overwrite needs a supporting cited source (read, not existence) or a grep contradiction, else ESCALATE+block; closes the bare-pushback sycophancy vector without restoring AI stubbornness. #4 steel-quench Wave-P3 PASS-framing redaction (mktemp glyph+verdict-phrase strip — the challenger caught a naive bare-PASS global corrupting "status==PASS", an S fixed pre-commit). #5 asset-placement Step 0.5 mechanical pre-grep grounds criterion ④. challenger-verify round 2 was load-bearing again (FAIL→fixed: 1S+2A+4B); sonnet blind sim PASS (evidence gate ESCALATEs on bare overwrite).

package/CHEATSHEET.md CHANGED Viewed

@@ -252,9 +252,9 @@ Use when conversation has grown long and past context becomes noise. Clears cont
 > Using Opus continuously for the same task multiplies token cost 3–5x. Switching down strategically is a strategy.
-### Lever 4 — Keyword trigger load (Context Isolate)
+### Lever 4 — Intent-based recall load (Context Isolate)
-Only read necessary documents at the necessary time. MEMORY.md keyword trigger method.
+Only read necessary documents at the necessary time. MEMORY.md intent-based + associative recall (index→body; `knowledge/shared/dialogue/memory_intent_recall.md`).
 ```
 ✅ Only read CATALOG.md first at session start

package/CLAUDE.md CHANGED Viewed

@@ -25,7 +25,7 @@ The forge-harness hub is not just a repository — it is the **command center fo
 |---|---|---|
 | **① Control Tower** | Coordinates all connected projects and **drives harness-ification across them** — decides *which* projects to harness and *when*, propagates harness assets to each, and feeds their synced learnings into the hub's compounding loop. The *how* (rules · gates · 6-axis) is executed via the Core Axis. Command HQ, not a passive registry. | `.claude/rules/auto_project_mapping.md` (mapping + **Full-Harness Mode**) · `harvest-loop` (compounding loop) · `templates/` (project-harness bundle) · `CATALOG.md` |
 | **② Frontier → Org Propagation** | Proactively applies global AI/harness frontier thinking and **translates it for your organization**. | `knowledge/shared/harness-core/harness_frontier_diagnosis_*.md` · `knowledge/{your-org}/` |
-| **③ AI Collaboration Guide** | Accumulates and distributes best practices for token efficiency and dialogue methodology — "how to ask, delegate, and record". | `CHEATSHEET.md` · `knowledge/shared/dialogue/ai_dialogue_playbook.md` · `MEMORY.md` keyword-triggered loading |
+| **③ AI Collaboration Guide** | Accumulates and distributes best practices for token efficiency and dialogue methodology — "how to ask, delegate, and record". | `CHEATSHEET.md` · `knowledge/shared/dialogue/ai_dialogue_playbook.md` · `MEMORY.md` intent-based + associative recall (`knowledge/shared/dialogue/memory_intent_recall.md`) |
 | **Core Axis** | **Harness Engineering (How)** — the methodology and practice axis that realizes the three layers above. The 6-axis framework is the operating unit. **A harness is a means, not an end** — Field harness: "simpler over time" (complexity = warning signal). Meta-harness: *optimize*, not necessarily simplify — complexity earns its scope; red flags are orphaned, redundant, and decorative units, not complexity itself. | `harness_6axis_framework.md` · `hub_compounding_loop.md` · `claude_code_runtime_flow.md` · `.claude/agents/` (sub-agents) |
 ## Core Reference Documents (Consult First)
@@ -136,7 +136,7 @@ All 6 items below must pass before committing a new SKILL.md. If any fails, fix
 | **Description diet** | Plain text / 0 self-marketing expressions / 0 emphasis words (⭐, "critical", "groundbreaking") |
 | **Done When defined** | At least 1 explicit completion condition |
 | **Check-class declared** | Each Done When condition states its check class — mandatory-pass / measured / judged (`harness_6axis_framework.md` §Axis 5). Any judged condition names its adversarial pairing — no judge-only path |
-| **Natural language triggers** | At least 3 examples that work without internal vocabulary |
+| **Natural language triggers** | At least 3 examples that work without internal vocabulary. This is a **form** check (judged — do the examples avoid internal jargon). For a load-bearing gate/router skill it can be upgraded **judged → measured** with steel-quench's `Step 0.5 — Trigger-Accuracy Probe` (a dispatched should-fire / near-miss-should-not-fire fire-count), turning "do these triggers collide?" from a guess into a number. Optional for ordinary skills; recommended when the skill is a routing/gate surface |
 | **Independently executable** | Confirmed to work without other FH skills (or dependencies are explicitly documented) |
 Skills without a Done When definition automatically qualify as harness-doctor L2 M-tier.
@@ -306,6 +306,7 @@ Proposal format: `"I see [X]. Want me to run /[skill] to [one-line description]?
 | "context is getting long", "token limit", "/clear", "slow", "context" (burden) | `/context-doctor` |
 | "wrap up this week", "review", "audit", "weekly", "retrospective" | `/harvest-loop` |
 | "pull this into FH", "reverse-harvest", "worth keeping", "harvest pattern", "field pattern" | `/field-harvest` |
+| "용광로모드", "crucible mode", "absorb this whole corpus", "throw everything in", "re-forge FH identity", "melt this down" (total-immersion absorption, not cherry-pick — esp. a whole corpus on a core FH axis, or a frontier showcase risking FOMO) | `knowledge/shared/harness-core/crucible_mode.md` (read it, run the chain: total-ingest → steel-quench/phantom-quench melt → governor identity-bonding → sim/persona reforge → field-harvest rebirth; the core invariants stay unmeltable) |
 | "harness is complex", "too many skills", "check structure", "harness" | `/harness-doctor` |
 | "review this PR", "check diff", "code review" | code diff → built-in `/code-review`·`/review` · FH-asset coherence → `/hub-cc-pr-reviewer` (role split) |
 | "keep watching X", "poll this", "check every N minutes", recurring WATCH item | built-in `/loop` (interval runner) — pair with the WATCH list, don't hand-poll |
@@ -351,6 +352,17 @@ At session start, determine the last run time from history files and auto-propos
 > A cadence reminder the user has repeatedly declined is **muted** per the UAP (see the loop below) — don't re-nag.
+#### Event-bound proposals (context-entry, not time)
+Some proposals are not *time*-overdue — they fire **once when a specific work context is entered**. `persona-innovator` (ideation/naming + external-frontier absorption) is most valuable in exactly two contexts and friction-noise everywhere else, so it is proposed on context-entry rather than every session or every N days:
+| Context entered | Proposal | innovator mode | Guard |
+|---|---|---|---|
+| **Mapped-project acceleration** (door ③ — field harness work begins) | gap/naming scan | Mode I (internal) | once/session · UAP-suppressible |
+| **Mode D FH self-dev** (an FH asset is about to change — the 4-axis gate's own trigger) | gap + external-frontier scan | Mode F (full) | once/session · UAP-suppressible |
+**Not always-on** (cost + simplicity guard): innovator runs WebSearch/WebFetch, so a per-turn fire would tax tokens and risk decorative-unit over-generation — the very thing steel-quench's Wave-1 angle #1 ("is there no simpler alternative?") attacks. One proposal per context-entry; the user accepts or declines. In a Mode D session this runs *before* the change (design-time ideation), distinct from the post-change 4-axis verification gate. **Promotion is measured, not assumed**: log each outcome to `knowledge/shared/learnings/subagent_invocations_log.yaml`; escalate to a stronger cadence only after the `operations.md` gate clears (`accepted ≥ 60%`) — innovator is v0.2 with no pilot data yet. A 3×-declined proposal is UAP-muted like any cadence nag. (innovator also rides `frontier-digest --chain`; that 7-day path is unchanged and complementary.)
 ## Operational Adaptation Loop — User-Tuned Self-Optimization
 > Detail: `.claude/rules/operational_adaptation.md`

package/package.json CHANGED Viewed

@@ -1,6 +1,6 @@
 {
   "name": "@chrono-meta/fh-gate",
-  "version": "1.4.21",
+  "version": "1.4.23",
   "description": "FH runtime adapters — run FH governance, skills, and agents via Claude or Codex with machine-parseable gates.",
   "license": "MIT",
   "keywords": [

package/plugins/fh-meta/skills/steel-quench/SKILL.md CHANGED Viewed

@@ -99,6 +99,33 @@ Treat the adapter output as the isolated challenger result for Wave 1. This pres
 ---
+## Step 0.5 — Trigger-Accuracy Probe (SKILL.md artifacts only · measured)
+> **Import origin** (sister-asset cross-audit 2026-06-14, `tracks/_audit/session_2026_06_14_official-plugins-cross-audit.md`): skill-creator + plugin-dev/skill-reviewer measure trigger accuracy **empirically**; FH's skill gate ("3+ NL triggers") and steel-quench's trigger-collision attack are **judged**, not measured. This probe converts that one verdict to **measured** — the mechanical-anchor discipline (a terminal trigger verdict should rest on a count, not an inference: the W4-4 question applied to the skill's own description).
+**Fires only when** `artifact_type = skill_md` (Step 0.3 canonical enum, `tpa_schema.md` — i.e. a SKILL.md) AND the **trigger surface** changed. *Trigger surface* = exactly the `description:` YAML field **plus** the `## Triggers` / `## Trigger Phrases` section (and nothing else — a body-wave or procedure edit with both of those untouched does **not** fire it). Any other artifact has no trigger surface — note `Step 0.5: skipped (not a skill_md trigger change)` and proceed.
+**Procedure** (prose-scale — FH routes + governs, it does **not** rebuild skill-creator's eval engine; no Python harness, no vector store):
+1. From the skill's `description`, author **8–10 should-trigger** phrases (varied, realistic user utterances; include some that omit the skill's internal vocabulary) and **8–10 near-miss should-NOT-trigger** phrases (share keywords/domain but actually need a different tool — the discriminating cases, not obvious irrelevants).
+2. Dispatch the probe set **in isolation** (`Agent` / `fh-run` — same isolation rule as Wave 1; the author session must not judge its own triggers) against the live skill description only.
+3. **Count** and report as measured: `trigger-probe: <fired>/<should> fire · <false>/<should-not> false-fire (model: <tier>)`.
+**Verdict mapping** (measured → severity; feeds Wave-T temper + Done When):
+| Measured | Verdict | Severity |
+|---|---|---|
+| should-fire < 70% | **undertrigger** — description too narrow, or diet stripped a real trigger | S if load-bearing gate skill, else A |
+| should-not false-fire > 20% | **overtrigger / collision** — bleeds into an adjacent skill's territory (the collision class steel-quench owns, now measured) | A |
+| both within bound | trigger surface PASS (measured, not guessed) | — |
+**Threshold granularity** (the bounds are guidelines, not a knife-edge): at the probe size (N=8–10 → ~10–12.5% steps) the percentages are coarse and 20%/70% may not be exactly reachable. Report the **count** (`2/8 false-fire`), not just a percentage; on a boundary result take the **stricter** verdict and re-probe with more phrases before finalizing. The bound is a trigger-collision heuristic, not a derived constant.
+**Honesty caveat**: the probe measures *trigger-description* accuracy on the session model — not field behavior across all tiers. A near-floor model may under-fire regardless of description quality; record the probe model and, on a below-floor run, treat the result as provisional (Step 0.3 below-floor rule).
+> **Detail**: See `SKILL_detail.md §TriggerProbe` — worked probe set + fire-count table + before/after description fix.
+---
 ## Wave 1 — 5 Mandatory Attack Angles
 **Execution principles**: Attacks must be based on real code/files/configs — abstract criticism prohibited.
@@ -117,7 +144,18 @@ Isolation can be achieved by Claude Code `Agent(...)` or by `fh-run --agent fh-c
 **S-grade Immediate Human Gate**: If Wave 1 contains 1+ S-grade blocker → pause, surface options (a) proceed to Wave 2 / (b) human review first / (c) abort. Do not silently enter Wave 2 with unreviewed S-grade items.
-> **Detail**: See `SKILL_detail.md §Wave1` — Wave 1 output format, optional numeric score, quench-challenger invocation.
+**Code-artifact supplementary lens — silent-failure scan** (conditional · fires only when `artifact_type ∈ {bash_script, code}` — Step 0.3 canonical enum, `tpa_schema.md`; non-code artifact → note `code-lens: n/a (non-code artifact)` and skip). `artifact_type` is derived from **file path / extension** (`tpa_schema.md` classification rule), **not** interior content — a `skill_md` that embeds a ```` ```bash ```` fence stays `skill_md` and does **not** fire this lens (do not conflate with the CLAUDE.md Substantive carve-out, which keys Axes 2–3 of the *commit gate* off code-fence presence — a different mechanism). Import origin: pr-review-toolkit/silent-failure-hunter (sister-asset cross-audit 2026-06-14, Import #2). Wave 1's 5 angles attack *structure*; this adds the named *error-suppression* vector the general angles miss. Grep the diff/file for each named pattern:
+| Pattern | What to flag | Severity guide |
+|---|---|---|
+| **Empty catch / `\|\| true` / `2>/dev/null` swallow** | An error path that discards the error with no log, no re-raise, no user surface | S if it hides a gate/verification failure, else A |
+| **Broad catch** (`except:` / `catch (e)` with no type) | Catches more than intended; masks unrelated failures | A |
+| **Unjustified fallback** | Falls back to a default on error without recording *that* it fell back | A — silent degradation is the worst class (cf. P6 graceful-degradation must be *documented*) |
+| **Exit-code ignored** | A piped/chained command whose non-zero exit can't propagate (`cmd1 \| cmd2`, missing `set -e`/`pipefail`) | A if it gates a downstream destructive/publish step |
+A finding here is a real-code attack (Wave 1 execution principle) — cite the exact line. The lens is a *checklist supplement*, not a 6th mandatory angle: it carries no weight on non-code artifacts.
+> **Detail**: See `SKILL_detail.md §Wave1` — Wave 1 output format, optional numeric score, quench-challenger invocation; and `§CodeLens` — silent-failure worked examples (bash + python).
 ---
@@ -148,6 +186,8 @@ The challenger (quench-challenger in Wave 4 mode) knows it's running in an isola
 Wave 4 convergence = Wave 3 criteria + 3 AI-specific vectors actually reviewed + hallucination defense based on original file references.
+> **W4-4 ↔ Step 0.5**: W4-4 is the *general* measurement-vs-inference question; **Step 0.5 (Trigger-Accuracy Probe)** is its *measured instance* for one surface — the skill's own trigger description. When the target is a `skill_md` with a changed trigger surface, satisfy W4-4 by running Step 0.5 rather than answering it by inference.
 > **Detail**: See `SKILL_detail.md §Wave4` — Wave 4 output format, defense principles, convergence criteria, activation declaration format.
 ---

package/plugins/fh-meta/skills/steel-quench/SKILL_detail.md CHANGED Viewed

@@ -558,3 +558,120 @@ External CLIs available: check at runtime
 ```
 **Degraded coverage note**: Wave 3 without `/phantom-quench` available → flag as "Axis 3 skipped (skill unavailable)" and note in residual risk card.
+---
+## §TriggerProbe — Trigger-Accuracy Probe Worked Example
+Worked instance for SKILL.md §Step 0.5 (Trigger-Accuracy Probe). Imported from skill-creator's
+eval-driven trigger loop + plugin-dev/skill-reviewer's description-trigger check (sister-asset
+cross-audit `tracks/_audit/session_2026_06_14_official-plugins-cross-audit.md`, Import #1). The import
+is **prose-scale**: a dispatched fire-count, not skill-creator's `run_loop.py` / `benchmark.json` eval
+engine (no-reinvention — FH adds the *measured number* to its trigger-collision axis, it does not
+rebuild the engine).
+Target: a hypothetical `pdf-extract` skill whose description reads *"Extract text and tables from PDF
+files. Use when the user wants to pull data out of a PDF."*
+**Probe set authored from the description** (16 phrases: 8 should-fire + 8 near-miss should-NOT-fire).
+The near-misses are deliberately adjacent — same keyword (`extract`, `pdf`) but a different *task verb* —
+because that is where trigger collisions actually live:
+| # | Phrase | should-fire? | why |
+|---|---|---|---|
+| 1 | "grab the line items out of this invoice.pdf" | ✅ | extract from pdf |
+| 2 | "I need the clauses of this scanned contract as text" | ✅ | extract text |
+| 3 | "pull the table on page 3 of the Q4 report (it's a pdf)" | ✅ | extract table |
+| 4 | "convert this PDF to a spreadsheet" | ✅ | extract → structured |
+| 5 | "read what's in attached.pdf and summarize" | ✅ | read content |
+| 6 | "extract the figures from this research-paper pdf" | ✅ | extract |
+| 7 | "I have a folder of pdfs, get the totals from each" | ✅ | batch extract |
+| 8 | "what does this pdf say" | ✅ | content request |
+| 9 | "make a PDF from this markdown" | ❌ | *generate*, not extract |
+| 10 | "merge these three pdfs into one" | ❌ | *manipulate*, not extract |
+| 11 | "fill out this PDF form for me" | ❌ | *write*, not read |
+| 12 | "extract the frames from this video" | ❌ | keyword 'extract', wrong medium |
+| 13 | "OCR this scanned image.png" | ❌ | image, not pdf — discriminating near-miss |
+| 14 | "compress this pdf so it's smaller" | ❌ | *transform*, not extract |
+| 15 | "redact the SSNs in this pdf" | ❌ | *edit*, not extract |
+| 16 | "split this pdf at the bookmarks" | ❌ | *manipulate* |
+**Dispatch + measured result** (isolation-run via Agent / `fh-run`, not judged inline):
+```
+trigger-probe: 8/8 fire · 2/8 false-fire (model: sonnet)
+```
+should-not #9 ("make a PDF") and #11 ("fill out form") false-fired — the description's loose
+"pull data out of a PDF" let *generation* and *form-fill* leak in.
+**Verdict mapping** (per §Step 0.5 table): should-fire 8/8 PASS · false-fire **2/8** (= 25%, above the
+~20% guideline; at N=8 the reachable values straddle the bound as 1/8=12.5% / 2/8=25%, so report the
+count and take the stricter verdict) → **overtrigger / collision (A-grade)**.
+**Before → after description fix** (the *output* of the probe — narrow the verb, name the near-miss
+boundary explicitly):
+> *Before:* "Extract text and tables from PDF files. Use when the user wants to pull data out of a PDF."
+> *After:* "Extract existing text and tables **from** PDF files (read-only). Use when the user wants the
+> **content** of a PDF as text/data — not to create, fill, merge, redact, compress, or split a PDF
+> (those are separate tasks)."
+Re-probe after the fix: `8/8 fire · 0/8 false-fire` → trigger surface PASS (measured). This is the
+predict → measure → fix loop at prose scale: the probe converts steel-quench's previously *judged*
+"could this trigger collide?" into a *measured* fire-count, closing the same judge-only gap the
+mechanical-anchor doctrine targets (`[[feedback_judge_robustness_mechanical_anchor]]`).
+**Honesty caveat** (carried from §Step 0.5): the probe measures trigger-*description* accuracy on the
+**session model**, not on every field tier — a description that fires cleanly on Opus may undertrigger
+on Haiku. Record the probe model in the result line; a below-floor probe model makes the PASS
+provisional (re-probe at floor tier, `[[feedback_verify_before_downgrade]]`).
+---
+## §CodeLens — Silent-Failure Scan Worked Examples
+Worked instances for the SKILL.md Wave 1 **code-artifact supplementary lens** (Import #2 — imported from
+pr-review-toolkit/silent-failure-hunter, sister-asset cross-audit 2026-06-14). The lens fires **only** on
+`artifact_type ∈ {bash_script, code}` (canonical enum, `tpa_schema.md`); the FH-original increment over silent-failure-hunter is the
+**severity-by-blast-radius rule**: a swallowed error is S (not just "high") when it hides a *gate /
+verification / destructive-or-publish* failure — the same blast-radius logic FH's irreversibility gates use.
+### Example 1 — bash (a gate script swallowing its own failure)
+```bash
+# regression check before publish
+bash templates/predelete_check.sh "$REPO" 2>/dev/null || true   # ← finding
+gh repo edit --visibility public
+```
+**Finding** (cite the line): `2>/dev/null || true` on a *gate* command — `predelete_check.sh`'s REVIEW
+exit (which is supposed to block) is discarded, and the next line publishes regardless.
+- Pattern: **empty-catch / `|| true` swallow** + **exit-code ignored**.
+- Severity: **S** — it hides a gate failure on a *publish* (irreversible) path. Not "A": the blast radius
+  is an un-recoverable public exposure.
+- Fix: `bash templates/predelete_check.sh "$REPO" || { echo "predelete REVIEW — aborting publish"; exit 1; }`
+### Example 2 — python (broad catch + unjustified silent fallback)
+```python
+try:
+    cfg = load_config(path)
+except Exception:          # ← finding 1: broad catch
+    cfg = DEFAULT_CONFIG   # ← finding 2: unjustified silent fallback
+```
+**Findings**:
+- **Broad catch** (`except Exception`): a `KeyboardInterrupt`-adjacent or unrelated `OSError` is masked as
+  "config missing." Severity **A** — narrow to `except FileNotFoundError`.
+- **Unjustified fallback**: falls back to `DEFAULT_CONFIG` with **no log that it did so** — the operator
+  later debugs "why is prod using defaults?" blind. Severity **A** (silent degradation, the worst class:
+  cf. P6 — graceful degradation must be *documented*, not silent).
+- Fix: catch the specific error **and** `log.warning("config %s missing — using DEFAULT_CONFIG", path)`
+  before the fallback. The fallback is fine; the *silence* is the defect.
+### Boundary note (over-engineering guard)
+On a SKILL.md / design-doc / README target the lens emits exactly one line — `code-lens: n/a (non-code
+artifact)` — and adds **zero** attack weight. This is deliberate: making silent-failure a 6th *mandatory*
+Wave 1 angle would force an "N/A" on every methodology artifact (the majority of FH targets), which
+steel-quench's own Wave-1 angle #1 ("is there no simpler alternative?") would correctly attack as
+over-engineering. Conditional-by-artifact-type keeps the lens sharp where it applies and invisible where
+it doesn't.