@chrono-meta/fh-gate 1.4.21 → 1.4.23

package/AGENTS.md CHANGED
@@ -59,7 +59,16 @@ Agents in this registry belong to the **Automation layer**. Skills (in `plugins/
59
59
 
60
60
  > **Codex-compatible beta**: The Methodology layer (`tracks/`, `knowledge/`, skill documentation) is designated Codex-compatible beta. Gemini, Codex, and other AI users can apply FH methodology without the Automation layer — manual invocation replaces hook/agent dispatch.
61
61
 
62
- > **Multi-model sidecar (validated)**: Any FH user can delegate to other models via sidecar (Gemini CLI, OpenAI/Codex CLI, or Copilot CLI's model catalog), invoked with `Bash` from within the Claude Code session. FH is the orchestrating harness; the sidecar is a routing/access layer (not a second harness; a different layer entirely). Validated empirically: `echo "prompt" | gemini` works inside a CC session and produces usable output. Sidecar calls are Bash invocations, not agent dispatches; they bypass this registry and are coordinated inline by the skill. See `knowledge/shared/harness-core/multi_model_sidecar_strategy.md` for the full pattern.
62
+ > **Directory destination routing (where your outputs belong)**: not everything in the methodology layer is public-shareable. `knowledge/` and `SKILL.md` docs are the **public, reusable** methodology. `tracks/` is **local / private by convention** (work history, session records, `fh_signal_*`, audit logs) and is gitignored on the public mirror. An AI working in a local workspace that pairs the public mirror with a private companion store (the `*-be` pattern) must **not** infer "same folder = same repository"; route by content type:
63
+ >
64
+ > | Content | Default destination |
+ > |---|---|
+ > | reusable methodology · docs · skills · public guidance · polished external-facing conclusions | public mirror (`knowledge/`, `plugins/`, `docs/`) |
+ > | raw signal · operator observation · private validation · handoff · paper draft · PR-background reasoning log | private companion store (`*-be` pattern), or keep local; do **not** commit to the public mirror |
68
+ >
69
+ > When unsure, treat raw / observational / operator-specific material as **private-first** and promote only the polished result to public. (Concrete per-operator bindings — exact companion-store path, sync mechanism — live in the operator's local config, not here.)
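The routing table above can be read as a small lookup with a private-first default. The sketch below is illustrative only: the content-type labels and the `route` helper are hypothetical names paraphrased from the two table rows, not identifiers defined by FH, and sending unknown content to the private side is an assumption drawn from the "when unsure" sentence.

```python
from enum import Enum


class Destination(Enum):
    PUBLIC_MIRROR = "public mirror (knowledge/, plugins/, docs/)"
    PRIVATE_STORE = "private companion store (*-be pattern) or local only"


# Hypothetical labels paraphrasing the first table row.
PUBLIC_TYPES = {
    "reusable_methodology",
    "docs",
    "skills",
    "public_guidance",
    "polished_conclusion",
}


def route(content_type: str) -> Destination:
    """Return the default destination for a content type.

    Anything not explicitly listed as public is treated as private-first,
    which covers both the second table row and unrecognised content.
    """
    if content_type in PUBLIC_TYPES:
        return Destination.PUBLIC_MIRROR
    return Destination.PRIVATE_STORE


if __name__ == "__main__":
    for kind in ("docs", "raw_signal", "something_unlisted"):
        print(f"{kind} -> {route(kind).value}")
```

The one design choice worth noting is the default branch: only an explicit public match goes to the mirror, so a misclassified item stays private rather than being published.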
70
+
71
+ > **Multi-model sidecar (validated)**: Any FH user can delegate to other models via sidecar — Gemini CLI, OpenAI/Codex CLI, or Copilot CLI's model catalog — invoked with `Bash` from within the Claude Code session. FH is the orchestrating harness; the sidecar is a routing/access layer (not a second harness — different layer entirely). Validated empirically: `echo "prompt" | gemini` works inside a CC session and produces usable output. Sidecar calls are Bash invocations, not agent dispatches — they bypass this registry and are coordinated inline by the skill. Capability routing matters too: Gemini/Antigravity is the natural multimodal sidecar, while a Codex app/runtime session with Browser/Chrome connectors is the preferred handoff for live web-flow automation. In a local FH workspace that pairs the public methodology mirror with a private companion store (the `*-be` pattern), route by workspace capability while preserving each repository's ownership boundary. See `knowledge/shared/harness-core/multi_model_sidecar_strategy.md` for the full pattern.
63
72
 
64
73
  ---
65
74
 
package/CATALOG.md CHANGED
@@ -8,6 +8,18 @@ AI reads this file first when searching past work. Open individual files for det
8
8
 
9
9
  <!-- Add entries in reverse date order (newest at top) -->
10
10
 
11
+ ### 2026-06-14 | forge-harness | #crucible-mode, #total-immersion-absorption, #design-decision-lens, #completion-claim-discipline, #self-forge, #sister-asset
12
+ **File:** knowledge/shared/harness-core/crucible_mode.md + harness_design_decision_lens.md + harness_6axis_framework.md (Completion-claim discipline) + tracks/_audit/session_2026_06_14_wikidocs-deep-sweep.md
13
+ Content-level deep cross-audit of two wikidocs sister books (19689 백과사전 / 19736 Allen 멀티에이전트) via live-surface Playwright ingest + Gemini/Codex debate-loop + governor source-close, then **absorbed every candidate that passed the identity gate** (FH-identity-preserving + positively-expandable). Three assets: (1) `harness_design_decision_lens.md` — the 7 architectural-bet decisions as an orthogonal companion to the 6-axis lifecycle (only net-new = the framing; rest ALREADY-HAVE, honestly marked); (2) 6-axis **Completion-claim discipline** — a "done" claim must carry evidence + failure-checks-run + residual risk, non-vacuous; (3) **`crucible_mode.md`** — names the total-immersion absorption *stance* (throw the whole corpus in, melt under adversarial heat, keep only what bonds to an **unmeltable adamantium core**; rejections are boundary-defining). Each absorption was itself put through the crucible (quench-challenger + persona-auditor + Sonnet blind sim) — the crucible doc's own quench caught 3 of its defects (incl. a phantom worked-instance claim) before commit.
14
+ - Decision: unmeltable-by-absorption ≠ unchangeable-by-operator (anvil signal = N≥3 recurrence → verify-bidirectional); Redis multi-agent bus rejected as out-of-identity (speed/ops); productivity/ROI over-claims quarantined. Operator insight seed: "Forge-harness throws even itself into the crucible to be tempered, but its core, like adamantium, never melts."
15
+ - Open: HITL commit decision (this batch); adamantium core list (5) is operator-owned.
16
+
17
+ ### 2026-06-14 | forge-harness | #debate-circulation-loop, #sister-asset, #sidecar-mining, #governor-source-verification, #import-list-first
18
+ **File:** knowledge/shared/harness-core/multi_model_sidecar_strategy.md (§Debate Circulation Loop) + tracks/_audit/session_2026_06_14_fh-reinforcement-mining.md
19
+ Promoted the **Debate Circulation Loop + governor source-verification** methodology from memory to the public sidecar doc — relay a multi-runtime question, mutual peer critique (a runtime's blind spot is invisible to itself, visible to a peer), CC/Opus governor closes against *source* (a debate is also judged → bind to a mechanical anchor), promote only the source-verified residue. Demonstrated live by mining two operator-designated FH-reinforcement sources: **oh-my-claudecode** (Codex governance critique → mostly ALREADY-HAVE, 1 gated increment = Ralph Done-When hardening; speed-first substrate rejected per governance-over-speed axis) and **wikidocs book/19736 = 「하네스 엔지니어링 백과사전」** (Gemini ingest past a governor 403; a same-topic "Agent = Model + Harness" encyclopedia, mostly independent convergence + 2 gated follow-ups: Ch12 7-decisions/3-contrarian, Ch11 12-patterns).
20
+ - Decision: import-list-first held on both; no clone-and-own. Source B content beyond TOC is Gemini-one-surface, governor-unverified (403) → import candidates are explicit follow-ups, not confirmed.
21
+ - Open: (a) Ralph Done-When phrasing → goal-quench/pipeline-conductor PR (HITL); (b) Ch11/Ch12 content access → cross-audit vs 6-axis + pattern library.
22
+
11
23
  ### 2026-06-13 | forge-harness | #judge-robustness, #mechanical-anchor, #hardening-batch-2, #sycophancy-gate, #verification
12
24
  **File:** plugins/fh-meta/skills/{verify-bidirectional,steel-quench,asset-placement-gate}/SKILL.md (commit f80bc99)
13
25
  Batch-2 of the judge-robustness hardening (after #1-#2 in be2d5dc): three more judge-only verdict paths bound to anchors. #3 verify-bidirectional evidence gate — a persistent-baseline overwrite needs a supporting cited source (read, not existence) or a grep contradiction, else ESCALATE+block; closes the bare-pushback sycophancy vector without restoring AI stubbornness. #4 steel-quench Wave-P3 PASS-framing redaction (mktemp glyph+verdict-phrase strip — the challenger caught a naive bare-PASS global corrupting "status==PASS", an S fixed pre-commit). #5 asset-placement Step 0.5 mechanical pre-grep grounds criterion ④. challenger-verify round 2 was load-bearing again (FAIL→fixed: 1S+2A+4B); sonnet blind sim PASS (evidence gate ESCALATEs on bare overwrite).
package/CHEATSHEET.md CHANGED
@@ -252,9 +252,9 @@ Use when conversation has grown long and past context becomes noise. Clears cont
252
252
 
253
253
  > Using Opus continuously for the same task multiplies token cost 3–5x. Switching down strategically is a strategy.
254
254
 
255
- ### Lever 4 — Keyword trigger load (Context Isolate)
255
+ ### Lever 4 — Intent-based recall load (Context Isolate)
256
256
 
257
- Only read necessary documents at the necessary time. MEMORY.md keyword trigger method.
257
+ Only read necessary documents at the necessary time. MEMORY.md intent-based + associative recall (index→body; `knowledge/shared/dialogue/memory_intent_recall.md`).
258
258
 
259
259
  ```
260
260
  ✅ Only read CATALOG.md first at session start
package/CLAUDE.md CHANGED
@@ -25,7 +25,7 @@ The forge-harness hub is not just a repository — it is the **command center fo
25
25
  |---|---|---|
26
26
  | **① Control Tower** | Coordinates all connected projects and **drives harness-ification across them** — decides *which* projects to harness and *when*, propagates harness assets to each, and feeds their synced learnings into the hub's compounding loop. The *how* (rules · gates · 6-axis) is executed via the Core Axis. Command HQ, not a passive registry. | `.claude/rules/auto_project_mapping.md` (mapping + **Full-Harness Mode**) · `harvest-loop` (compounding loop) · `templates/` (project-harness bundle) · `CATALOG.md` |
27
27
  | **② Frontier → Org Propagation** | Proactively applies global AI/harness frontier thinking and **translates it for your organization**. | `knowledge/shared/harness-core/harness_frontier_diagnosis_*.md` · `knowledge/{your-org}/` |
28
- | **③ AI Collaboration Guide** | Accumulates and distributes best practices for token efficiency and dialogue methodology — "how to ask, delegate, and record". | `CHEATSHEET.md` · `knowledge/shared/dialogue/ai_dialogue_playbook.md` · `MEMORY.md` keyword-triggered loading |
28
+ | **③ AI Collaboration Guide** | Accumulates and distributes best practices for token efficiency and dialogue methodology — "how to ask, delegate, and record". | `CHEATSHEET.md` · `knowledge/shared/dialogue/ai_dialogue_playbook.md` · `MEMORY.md` intent-based + associative recall (`knowledge/shared/dialogue/memory_intent_recall.md`) |
29
29
  | **Core Axis** | **Harness Engineering (How)** — the methodology and practice axis that realizes the three layers above. The 6-axis framework is the operating unit. **A harness is a means, not an end** — Field harness: "simpler over time" (complexity = warning signal). Meta-harness: *optimize*, not necessarily simplify — complexity earns its scope; red flags are orphaned, redundant, and decorative units, not complexity itself. | `harness_6axis_framework.md` · `hub_compounding_loop.md` · `claude_code_runtime_flow.md` · `.claude/agents/` (sub-agents) |
30
30
 
31
31
  ## Core Reference Documents (Consult First)
@@ -136,7 +136,7 @@ All 6 items below must pass before committing a new SKILL.md. If any fails, fix
136
136
  | **Description diet** | Plain text / 0 self-marketing expressions / 0 emphasis words (⭐, "critical", "groundbreaking") |
137
137
  | **Done When defined** | At least 1 explicit completion condition |
138
138
  | **Check-class declared** | Each Done When condition states its check class — mandatory-pass / measured / judged (`harness_6axis_framework.md` §Axis 5). Any judged condition names its adversarial pairing — no judge-only path |
139
- | **Natural language triggers** | At least 3 examples that work without internal vocabulary |
139
+ | **Natural language triggers** | At least 3 examples that work without internal vocabulary. This is a **form** check (judged — do the examples avoid internal jargon). For a load-bearing gate/router skill it can be upgraded **judged → measured** with steel-quench's `Step 0.5 — Trigger-Accuracy Probe` (a dispatched should-fire / near-miss-should-not-fire fire-count), turning "do these triggers collide?" from a guess into a number. Optional for ordinary skills; recommended when the skill is a routing/gate surface |
140
140
  | **Independently executable** | Confirmed to work without other FH skills (or dependencies are explicitly documented) |
141
141
 
142
142
  Skills without a Done When definition automatically qualify as harness-doctor L2 M-tier.
@@ -306,6 +306,7 @@ Proposal format: `"I see [X]. Want me to run /[skill] to [one-line description]?
306
306
  | "context is getting long", "token limit", "/clear", "slow", "context" (burden) | `/context-doctor` |
307
307
  | "wrap up this week", "review", "audit", "weekly", "retrospective" | `/harvest-loop` |
308
308
  | "pull this into FH", "reverse-harvest", "worth keeping", "harvest pattern", "field pattern" | `/field-harvest` |
309
+ | "용광로모드", "crucible mode", "absorb this whole corpus", "throw everything in", "re-forge FH identity", "melt this down" (total-immersion absorption, not cherry-pick — esp. a whole corpus on a core FH axis, or a frontier showcase risking FOMO) | `knowledge/shared/harness-core/crucible_mode.md` (read it, run the chain: total-ingest → steel-quench/phantom-quench melt → governor identity-bonding → sim/persona reforge → field-harvest rebirth; the core invariants stay unmeltable) |
309
310
  | "harness is complex", "too many skills", "check structure", "harness" | `/harness-doctor` |
310
311
  | "review this PR", "check diff", "code review" | code diff → built-in `/code-review`·`/review` · FH-asset coherence → `/hub-cc-pr-reviewer` (role split) |
311
312
  | "keep watching X", "poll this", "check every N minutes", recurring WATCH item | built-in `/loop` (interval runner) — pair with the WATCH list, don't hand-poll |
@@ -351,6 +352,17 @@ At session start, determine the last run time from history files and auto-propos
351
352
 
352
353
  > A cadence reminder the user has repeatedly declined is **muted** per the UAP (see the loop below) — don't re-nag.
353
354
 
355
+ #### Event-bound proposals (context-entry, not time)
356
+
357
+ Some proposals are not *time*-overdue — they fire **once when a specific work context is entered**. `persona-innovator` (ideation/naming + external-frontier absorption) is most valuable in exactly two contexts and friction-noise everywhere else, so it is proposed on context-entry rather than every session or every N days:
358
+
359
+ | Context entered | Proposal | innovator mode | Guard |
+ |---|---|---|---|
+ | **Mapped-project acceleration** (door ③: field harness work begins) | gap/naming scan | Mode I (internal) | once/session · UAP-suppressible |
+ | **Mode D FH self-dev** (an FH asset is about to change: the 4-axis gate's own trigger) | gap + external-frontier scan | Mode F (full) | once/session · UAP-suppressible |
363
+
364
+ **Not always-on** (cost + simplicity guard): innovator runs WebSearch/WebFetch, so a per-turn fire would tax tokens and risk decorative-unit over-generation — the very thing steel-quench's Wave-1 angle #1 ("is there no simpler alternative?") attacks. One proposal per context-entry; the user accepts or declines. In a Mode D session this runs *before* the change (design-time ideation), distinct from the post-change 4-axis verification gate. **Promotion is measured, not assumed**: log each outcome to `knowledge/shared/learnings/subagent_invocations_log.yaml`; escalate to a stronger cadence only after the `operations.md` gate clears (`accepted ≥ 60%`) — innovator is v0.2 with no pilot data yet. A 3×-declined proposal is UAP-muted like any cadence nag. (innovator also rides `frontier-digest --chain`; that 7-day path is unchanged and complementary.)
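The guards in this paragraph (one proposal per context-entry, mute after three declines, promotion only once acceptance reaches 60%) can be sketched as a small in-memory tracker. This is a sketch under assumptions: `ProposalLog` and its method names are hypothetical, the real log lives in a YAML file whose schema is not shown here, and counting three declines in total (not three in a row) is a guess at what "3×-declined" means.

```python
from dataclasses import dataclass, field
from typing import List, Optional

ACCEPT_GATE = 0.60        # "accepted >= 60%" gate named in the text
MUTE_AFTER_DECLINES = 3   # "3x-declined" threshold; total count is an assumption


@dataclass
class ProposalLog:
    """Hypothetical in-memory stand-in for the invocation log."""

    outcomes: List[str] = field(default_factory=list)
    proposed_this_session: bool = False

    def record(self, outcome: str) -> None:
        if outcome not in ("accepted", "declined"):
            raise ValueError(f"unknown outcome: {outcome!r}")
        self.outcomes.append(outcome)
        self.proposed_this_session = True

    def acceptance_rate(self) -> Optional[float]:
        # No pilot data yet means no rate, so promotion cannot be claimed.
        if not self.outcomes:
            return None
        return self.outcomes.count("accepted") / len(self.outcomes)

    def is_muted(self) -> bool:
        return self.outcomes.count("declined") >= MUTE_AFTER_DECLINES

    def may_propose(self) -> bool:
        # Once per session, and never after the user has muted it.
        return not self.proposed_this_session and not self.is_muted()

    def may_promote(self) -> bool:
        rate = self.acceptance_rate()
        return rate is not None and rate >= ACCEPT_GATE
```

With no recorded outcomes `may_promote()` returns `False`, which matches the "measured, not assumed" stance: an empty log is not evidence of acceptance.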
365
+
354
366
  ## Operational Adaptation Loop — User-Tuned Self-Optimization
355
367
 
356
368
  > Detail: `.claude/rules/operational_adaptation.md`
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "@chrono-meta/fh-gate",
3
- "version": "1.4.21",
3
+ "version": "1.4.23",
4
4
  "description": "FH runtime adapters — run FH governance, skills, and agents via Claude or Codex with machine-parseable gates.",
5
5
  "license": "MIT",
6
6
  "keywords": [
@@ -99,6 +99,33 @@ Treat the adapter output as the isolated challenger result for Wave 1. This pres
99
99
 
100
100
  ---
101
101
 
102
+ ## Step 0.5 — Trigger-Accuracy Probe (SKILL.md artifacts only · measured)
103
+
104
+ > **Import origin** (sister-asset cross-audit 2026-06-14, `tracks/_audit/session_2026_06_14_official-plugins-cross-audit.md`): skill-creator + plugin-dev/skill-reviewer measure trigger accuracy **empirically**; FH's skill gate ("3+ NL triggers") and steel-quench's trigger-collision attack are **judged**, not measured. This probe converts that one verdict to **measured** — the mechanical-anchor discipline (a terminal trigger verdict should rest on a count, not an inference: the W4-4 question applied to the skill's own description).
105
+
106
+ **Fires only when** `artifact_type = skill_md` (Step 0.3 canonical enum, `tpa_schema.md` — i.e. a SKILL.md) AND the **trigger surface** changed. *Trigger surface* = exactly the `description:` YAML field **plus** the `## Triggers` / `## Trigger Phrases` section (and nothing else — a body-wave or procedure edit with both of those untouched does **not** fire it). Any other artifact has no trigger surface — note `Step 0.5: skipped (not a skill_md trigger change)` and proceed.
107
+
108
+ **Procedure** (prose-scale — FH routes + governs, it does **not** rebuild skill-creator's eval engine; no Python harness, no vector store):
109
+ 1. From the skill's `description`, author **8–10 should-trigger** phrases (varied, realistic user utterances; include some that omit the skill's internal vocabulary) and **8–10 near-miss should-NOT-trigger** phrases (share keywords/domain but actually need a different tool — the discriminating cases, not obvious irrelevants).
110
+ 2. Dispatch the probe set **in isolation** (`Agent` / `fh-run` — same isolation rule as Wave 1; the author session must not judge its own triggers) against the live skill description only.
111
+ 3. **Count** and report as measured: `trigger-probe: <fired>/<should> fire · <false>/<should-not> false-fire (model: <tier>)`.
112
+
113
+ **Verdict mapping** (measured → severity; feeds Wave-T temper + Done When):
114
+
115
+ | Measured | Verdict | Severity |
+ |---|---|---|
+ | should-fire < 70% | **undertrigger**: description too narrow, or diet stripped a real trigger | S if load-bearing gate skill, else A |
+ | should-not false-fire > 20% | **overtrigger / collision**: bleeds into an adjacent skill's territory (the collision class steel-quench owns, now measured) | A |
+ | both within bound | trigger surface PASS (measured, not guessed) | n/a |
120
+
121
+ **Threshold granularity** (the bounds are guidelines, not a knife-edge): at the probe size (N=8–10 → ~10–12.5% steps) the percentages are coarse and 20%/70% may not be exactly reachable. Report the **count** (`2/8 false-fire`), not just a percentage; on a boundary result take the **stricter** verdict and re-probe with more phrases before finalizing. The bound is a trigger-collision heuristic, not a derived constant.
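The count-then-map step above can be sketched as one function that builds the result line and applies the two bounds. This is a minimal illustration, not FH tooling: `classify_probe` and its parameters are hypothetical names, and flagging an exact hit on 70% or 20% as a boundary case that needs a re-probe is an assumption about what "boundary result" means.

```python
from typing import List, Tuple


def classify_probe(
    fired: int,
    should: int,
    false_fired: int,
    should_not: int,
    load_bearing: bool = False,
    model: str = "unknown",
) -> Tuple[str, List[Tuple[str, str]], bool]:
    """Return (result line, [(verdict, severity), ...], on_boundary)."""
    if should <= 0 or should_not <= 0:
        raise ValueError("probe sets must be non-empty")

    line = (
        f"trigger-probe: {fired}/{should} fire · "
        f"{false_fired}/{should_not} false-fire (model: {model})"
    )

    findings: List[Tuple[str, str]] = []
    # Integer comparisons avoid float rounding right at the bound.
    if fired * 100 < 70 * should:
        findings.append(("undertrigger", "S" if load_bearing else "A"))
    if false_fired * 100 > 20 * should_not:
        findings.append(("overtrigger / collision", "A"))

    # Assumption: landing exactly on a bound is the "boundary result"
    # that should be re-probed with more phrases before finalizing.
    on_boundary = (
        fired * 100 == 70 * should or false_fired * 100 == 20 * should_not
    )
    return line, findings, on_boundary


if __name__ == "__main__":
    line, findings, on_boundary = classify_probe(8, 8, 2, 8, model="sonnet")
    print(line)
    print(findings, on_boundary)
```

An empty findings list corresponds to the "both within bound" row; the function reports counts in the line itself, so the percentage never replaces the raw count.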
122
+
123
+ **Honesty caveat**: the probe measures *trigger-description* accuracy on the session model — not field behavior across all tiers. A near-floor model may under-fire regardless of description quality; record the probe model and, on a below-floor run, treat the result as provisional (Step 0.3 below-floor rule).
124
+
125
+ > **Detail**: See `SKILL_detail.md §TriggerProbe` — worked probe set + fire-count table + before/after description fix.
126
+
127
+ ---
128
+
102
129
  ## Wave 1 — 5 Mandatory Attack Angles
103
130
 
104
131
  **Execution principles**: Attacks must be based on real code/files/configs — abstract criticism prohibited.
@@ -117,7 +144,18 @@ Isolation can be achieved by Claude Code `Agent(...)` or by `fh-run --agent fh-c
117
144
 
118
145
  **S-grade Immediate Human Gate**: If Wave 1 contains 1+ S-grade blocker → pause, surface options (a) proceed to Wave 2 / (b) human review first / (c) abort. Do not silently enter Wave 2 with unreviewed S-grade items.
119
146
 
120
- > **Detail**: See `SKILL_detail.md §Wave1` — Wave 1 output format, optional numeric score, quench-challenger invocation.
147
+ **Code-artifact supplementary lens: silent-failure scan** (conditional · fires only when `artifact_type ∈ {bash_script, code}`, per the Step 0.3 canonical enum, `tpa_schema.md`; non-code artifact → note `code-lens: n/a (non-code artifact)` and skip). `artifact_type` is derived from **file path / extension** (`tpa_schema.md` classification rule), **not** interior content: a `skill_md` that embeds a ```` ```bash ```` fence stays `skill_md` and does **not** fire this lens (do not conflate with the CLAUDE.md Substantive carve-out, which keys Axes 2–3 of the *commit gate* off code-fence presence; that is a different mechanism). Import origin: pr-review-toolkit/silent-failure-hunter (sister-asset cross-audit 2026-06-14, Import #2). Wave 1's 5 angles attack *structure*; this adds the named *error-suppression* vector the general angles miss. Grep the diff/file for each named pattern:
148
+
149
+ | Pattern | What to flag | Severity guide |
+ |---|---|---|
+ | **Empty catch / `\|\| true` / `2>/dev/null` swallow** | An error path that discards the error with no log, no re-raise, no user surface | S if it hides a gate/verification failure, else A |
+ | **Broad catch** (`except:` / `catch (e)` with no type) | Catches more than intended; masks unrelated failures | A |
+ | **Unjustified fallback** | Falls back to a default on error without recording *that* it fell back | A: silent degradation is the worst class (cf. P6 graceful-degradation must be *documented*) |
+ | **Exit-code ignored** | A piped/chained command whose non-zero exit can't propagate (`cmd1 \| cmd2`, missing `set -e`/`pipefail`) | A if it gates a downstream destructive/publish step |
+
156
+ A finding here is a real-code attack (Wave 1 execution principle) — cite the exact line. The lens is a *checklist supplement*, not a 6th mandatory angle: it carries no weight on non-code artifacts.
157
+
158
+ > **Detail**: See `SKILL_detail.md §Wave1` — Wave 1 output format, optional numeric score, quench-challenger invocation; and `§CodeLens` — silent-failure worked examples (bash + python).
121
159
 
122
160
  ---
123
161
 
@@ -148,6 +186,8 @@ The challenger (quench-challenger in Wave 4 mode) knows it's running in an isola
148
186
 
149
187
  Wave 4 convergence = Wave 3 criteria + 3 AI-specific vectors actually reviewed + hallucination defense based on original file references.
150
188
 
189
+ > **W4-4 ↔ Step 0.5**: W4-4 is the *general* measurement-vs-inference question; **Step 0.5 (Trigger-Accuracy Probe)** is its *measured instance* for one surface — the skill's own trigger description. When the target is a `skill_md` with a changed trigger surface, satisfy W4-4 by running Step 0.5 rather than answering it by inference.
190
+
151
191
  > **Detail**: See `SKILL_detail.md §Wave4` — Wave 4 output format, defense principles, convergence criteria, activation declaration format.
152
192
 
153
193
  ---
@@ -558,3 +558,120 @@ External CLIs available: check at runtime
558
558
  ```
559
559
 
560
560
  **Degraded coverage note**: Wave 3 without `/phantom-quench` available → flag as "Axis 3 skipped (skill unavailable)" and note in residual risk card.
561
+
562
+ ---
563
+
564
+ ## §TriggerProbe — Trigger-Accuracy Probe Worked Example
565
+
566
+ Worked instance for SKILL.md §Step 0.5 (Trigger-Accuracy Probe). Imported from skill-creator's
567
+ eval-driven trigger loop + plugin-dev/skill-reviewer's description-trigger check (sister-asset
568
+ cross-audit `tracks/_audit/session_2026_06_14_official-plugins-cross-audit.md`, Import #1). The import
569
+ is **prose-scale**: a dispatched fire-count, not skill-creator's `run_loop.py` / `benchmark.json` eval
570
+ engine (no-reinvention — FH adds the *measured number* to its trigger-collision axis, it does not
571
+ rebuild the engine).
572
+
573
+ Target: a hypothetical `pdf-extract` skill whose description reads *"Extract text and tables from PDF
574
+ files. Use when the user wants to pull data out of a PDF."*
575
+
576
+ **Probe set authored from the description** (16 phrases: 8 should-fire + 8 near-miss should-NOT-fire).
577
+ The near-misses are deliberately adjacent — same keyword (`extract`, `pdf`) but a different *task verb* —
578
+ because that is where trigger collisions actually live:
579
+
580
+ | # | Phrase | should-fire? | why |
+ |---|---|---|---|
+ | 1 | "grab the line items out of this invoice.pdf" | ✅ | extract from pdf |
+ | 2 | "I need the clauses of this scanned contract as text" | ✅ | extract text |
+ | 3 | "pull the table on page 3 of the Q4 report (it's a pdf)" | ✅ | extract table |
+ | 4 | "convert this PDF to a spreadsheet" | ✅ | extract → structured |
+ | 5 | "read what's in attached.pdf and summarize" | ✅ | read content |
+ | 6 | "extract the figures from this research-paper pdf" | ✅ | extract |
+ | 7 | "I have a folder of pdfs, get the totals from each" | ✅ | batch extract |
+ | 8 | "what does this pdf say" | ✅ | content request |
+ | 9 | "make a PDF from this markdown" | ❌ | *generate*, not extract |
+ | 10 | "merge these three pdfs into one" | ❌ | *manipulate*, not extract |
+ | 11 | "fill out this PDF form for me" | ❌ | *write*, not read |
+ | 12 | "extract the frames from this video" | ❌ | keyword 'extract', wrong medium |
+ | 13 | "OCR this scanned image.png" | ❌ | image, not pdf; discriminating near-miss |
+ | 14 | "compress this pdf so it's smaller" | ❌ | *transform*, not extract |
+ | 15 | "redact the SSNs in this pdf" | ❌ | *edit*, not extract |
+ | 16 | "split this pdf at the bookmarks" | ❌ | *manipulate* |
+
599
+ **Dispatch + measured result** (isolation-run via Agent / `fh-run`, not judged inline):
600
+ ```
+ trigger-probe: 8/8 fire · 2/8 false-fire (model: sonnet)
+ ```
603
+ should-not #9 ("make a PDF") and #11 ("fill out form") false-fired — the description's loose
604
+ "pull data out of a PDF" let *generation* and *form-fill* leak in.
605
+
606
+ **Verdict mapping** (per §Step 0.5 table): should-fire 8/8 PASS · false-fire **2/8** (= 25%, above the
607
+ ~20% guideline; at N=8 the reachable values straddle the bound as 1/8=12.5% / 2/8=25%, so report the
608
+ count and take the stricter verdict) → **overtrigger / collision (A-grade)**.
609
+
610
+ **Before → after description fix** (the *output* of the probe — narrow the verb, name the near-miss
611
+ boundary explicitly):
612
+ > *Before:* "Extract text and tables from PDF files. Use when the user wants to pull data out of a PDF."
613
+ > *After:* "Extract existing text and tables **from** PDF files (read-only). Use when the user wants the
614
+ > **content** of a PDF as text/data — not to create, fill, merge, redact, compress, or split a PDF
615
+ > (those are separate tasks)."
616
+
617
+ Re-probe after the fix: `8/8 fire · 0/8 false-fire` → trigger surface PASS (measured). This is the
618
+ predict → measure → fix loop at prose scale: the probe converts steel-quench's previously *judged*
619
+ "could this trigger collide?" into a *measured* fire-count, closing the same judge-only gap the
620
+ mechanical-anchor doctrine targets (`[[feedback_judge_robustness_mechanical_anchor]]`).
621
+
622
+ **Honesty caveat** (carried from §Step 0.5): the probe measures trigger-*description* accuracy on the
623
+ **session model**, not on every field tier — a description that fires cleanly on Opus may undertrigger
624
+ on Haiku. Record the probe model in the result line; a below-floor probe model makes the PASS
625
+ provisional (re-probe at floor tier, `[[feedback_verify_before_downgrade]]`).
626
+
627
+ ---
628
+
629
+ ## §CodeLens — Silent-Failure Scan Worked Examples
630
+
631
+ Worked instances for the SKILL.md Wave 1 **code-artifact supplementary lens** (Import #2 — imported from
632
+ pr-review-toolkit/silent-failure-hunter, sister-asset cross-audit 2026-06-14). The lens fires **only** on
633
+ `artifact_type ∈ {bash_script, code}` (canonical enum, `tpa_schema.md`); the FH-original increment over silent-failure-hunter is the
634
+ **severity-by-blast-radius rule**: a swallowed error is S (not just "high") when it hides a *gate /
635
+ verification / destructive-or-publish* failure — the same blast-radius logic FH's irreversibility gates use.
636
+
637
+ ### Example 1 — bash (a gate script swallowing its own failure)
638
+
639
+ ```bash
640
+ # regression check before publish
641
+ bash templates/predelete_check.sh "$REPO" 2>/dev/null || true # ← finding
642
+ gh repo edit --visibility public
643
+ ```
644
+
645
+ **Finding** (cite the line): `2>/dev/null || true` on a *gate* command — `predelete_check.sh`'s REVIEW
646
+ exit (which is supposed to block) is discarded, and the next line publishes regardless.
647
+ - Pattern: **empty-catch / `|| true` swallow** + **exit-code ignored**.
648
+ - Severity: **S** — it hides a gate failure on a *publish* (irreversible) path. Not "A": the blast radius
649
+ is an un-recoverable public exposure.
650
+ - Fix: `bash templates/predelete_check.sh "$REPO" || { echo "predelete REVIEW — aborting publish"; exit 1; }`
651
+
652
+ ### Example 2 — python (broad catch + unjustified silent fallback)
653
+
654
+ ```python
655
+ try:
656
+ cfg = load_config(path)
657
+ except Exception: # ← finding 1: broad catch
658
+ cfg = DEFAULT_CONFIG # ← finding 2: unjustified silent fallback
659
+ ```
660
+
661
+ **Findings**:
662
+ - **Broad catch** (`except Exception`): a `KeyboardInterrupt`-adjacent or unrelated `OSError` is masked as
663
+ "config missing." Severity **A** — narrow to `except FileNotFoundError`.
664
+ - **Unjustified fallback**: falls back to `DEFAULT_CONFIG` with **no log that it did so** — the operator
665
+ later debugs "why is prod using defaults?" blind. Severity **A** (silent degradation, the worst class:
666
+ cf. P6 — graceful degradation must be *documented*, not silent).
667
+ - Fix: catch the specific error **and** `log.warning("config %s missing — using DEFAULT_CONFIG", path)`
668
+ before the fallback. The fallback is fine; the *silence* is the defect.
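The fix described in the last bullet might look like the following. This is a sketch under stated assumptions: `load_config` here is a hypothetical JSON loader standing in for whatever the real one does, `DEFAULT_CONFIG` holds placeholder values, and `load_config_or_default` is an illustrative wrapper name.

```python
import json
import logging

log = logging.getLogger(__name__)

# Placeholder defaults; the real DEFAULT_CONFIG is not shown in the source.
DEFAULT_CONFIG = {"mode": "default"}


def load_config(path):
    """Hypothetical loader: reads a JSON config file."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def load_config_or_default(path):
    """Fall back to defaults only for a missing file, and say so."""
    try:
        return load_config(path)
    except FileNotFoundError:
        # Narrow catch: a malformed file or permission error still raises.
        log.warning("config %s missing, using DEFAULT_CONFIG", path)
        return dict(DEFAULT_CONFIG)
```

The fallback itself is unchanged; what differs is that only the expected failure is caught and the degradation leaves a log line, so "why is prod using defaults?" has an answer.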
669
+
670
+ ### Boundary note (over-engineering guard)
671
+
672
+ On a SKILL.md / design-doc / README target the lens emits exactly one line — `code-lens: n/a (non-code
673
+ artifact)` — and adds **zero** attack weight. This is deliberate: making silent-failure a 6th *mandatory*
674
+ Wave 1 angle would force an "N/A" on every methodology artifact (the majority of FH targets), which
675
+ steel-quench's own Wave-1 angle #1 ("is there no simpler alternative?") would correctly attack as
676
+ over-engineering. Conditional-by-artifact-type keeps the lens sharp where it applies and invisible where
677
+ it doesn't.