@chrono-meta/fh-gate 1.4.28 → 1.4.29

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "@chrono-meta/fh-gate",
3
- "version": "1.4.28",
3
+ "version": "1.4.29",
4
4
  "description": "FH runtime adapters — run FH governance, skills, and agents via Claude or Codex with machine-parseable gates.",
5
5
  "license": "MIT",
6
6
  "keywords": [
@@ -106,12 +106,42 @@ literal span — never on a model's agreement.
106
106
 
107
107
  | Citation form | Resolved URL |
108
108
  |---|---|
109
- | `arXiv:NNNN.NNNNN` / `arXiv:NNNN.NNNNNvK` | `https://arxiv.org/abs/NNNN.NNNNN` |
109
+ | `arXiv:NNNN.NNNNN` / `arXiv:NNNN.NNNNNvK` | `https://arxiv.org/abs/NNNN.NNNNN` (abstract — the **canonical identifier** surface). For a section-/body-cited claim, prefer full text **first**, falling back in order `https://arxiv.org/html/NNNN.NNNNN` → `https://arxiv.org/pdf/NNNN.NNNNN` → `/abs/` (HTML is a *partial* backfill — see *Surface selection* below) |
110
110
  | bare DOI `10.xxxx/...` | `https://doi.org/10.xxxx/...` |
111
111
  | `http(s)://...` | use as-is (HTTPS-upgrade handled by WebFetch) |
112
112
  | Named source, no URL ("paper X shows Y") | **one** `WebSearch` to locate the canonical URL → then WebFetch it. Do **not** verify against the search snippet alone — the snippet is not the source. |
113
113
  | version token `pkg x.y.z` | the package registry/release page (npm/PyPI/GitHub releases) for that exact version |
114
114
 
115
+ **Surface selection — the abstract is the wrong surface for a §-cited mechanism.** The `/abs/` page
116
+ carries only the abstract; a claim that cites a section or attributes a *named body-level mechanism* to
117
+ the paper (e.g. "§4.2 stale-but-confident", "the paper's Change Manifest") returns NONE / MENTIONS on
118
+ `/abs/` even when fully supported in the body — an **artificial Unsupported** (measured 2026-06-17
119
+ in-the-wild dogfood: a `/abs/`-only fetch falsely flagged a body-supported claim; it resolved only after
120
+ escalating to full text). Two mechanical, judgment-free rules cover this:
121
+
122
+ 1. **Mechanical first-surface trigger** — if the claim text carries a section/figure marker (`§N`,
123
+ "section N", "Table N", "Figure N") → fetch full text **first**. Otherwise fetch `/abs/` first. (Do
124
+ *not* pre-decide "is this a body-level mechanism?" — that is a salience-dependent judgment; the
125
+ `§`-marker is the only mechanical signal, and the NONE-escalation below catches the rest.)
126
+ 2. **Abstract-NONE escalates** (the self-correcting backstop) — *any* `/abs/` fetch that returns
127
+ NONE/MENTIONS escalates once to full text before an Unsupported verdict (see Decision rules). A true
128
+ headline claim already returned ASSERTS on `/abs/`, so it never escalates; only a NONE does, and the
129
+ escalation is cheap insurance. This subsumes rule 1's misses — a body claim that *didn't* carry a `§`
130
+ marker still gets full text via its abstract-NONE.
131
+
132
+ **HTML is a partial backfill — fall back, never Phantom.** arXiv HTML exists only for LaTeX-source
133
+ papers (roughly post-2023) and some conversions fail, so `/html/{id}` can 404 for a perfectly real
134
+ paper. A 404 on `/html/` (or `/pdf/`) is **surface-absence, not identifier-absence**: fall back
135
+ `/html/` → `/pdf/` → `/abs/`. Only a 404 on **`/abs/`** (the canonical identifier surface) is Phantom.
136
+
137
+ **Anchor the full-text fetch.** Full text is large; pass the cited section number / mechanism term *into*
138
+ the WebFetch prompt (`"…the span in §4.2 that states…"`) so the span search is anchored, not whole-paper
139
+ — this makes the `§`-citation the retrieval anchor, not merely a routing flag, and guards against a long
140
+ paper's relevant span falling outside the fetch model's attention (a full-text false-NONE).
141
+
142
+ Reserve `/abs/` for claims the abstract itself states (headline numbers, the paper's stated contribution
143
+ — e.g. a top-line stat like "39–77% factual support" is abstract-level and `/abs/` is correct).
144
+
115
145
  **WebFetch prompt template** (ask for a *span*, not a *verdict* — this keeps the anchor non-model):
116
146
 
117
147
  ```
@@ -126,6 +156,12 @@ Polarity is load-bearing: a page saying "X does NOT hold" or "earlier work claim
126
156
  contains a span lexically matching the claim. Asking for the label keeps the *retriever* mechanical while
127
157
  surfacing the stance the *judge* needs — a NEGATES/MENTIONS span is **not** grounding.
128
158
 
159
+ **Coinage at fetch time:** when the claim's key term is a known FH coinage (a name FH invented, not the
160
+ paper's — e.g. "Change Manifest", "Validation Gate"), pass the *mechanism description* — not the coinage
161
+ string — into the `<exact claim>` slot, so the retriever searches for the **relation**, not a label that
162
+ isn't on the page. The Coinage-vs-source-term decision rule (below) then only adjudicates the returned
163
+ span; it never has to overturn a NONE the prompt itself caused.
164
+
129
165
  **Span-evidence format** (a Grounded external verdict is invalid without this):
130
166
 
131
167
  ```
@@ -152,7 +188,16 @@ Grounded: N / Unsupported: N / Unreachable: N / Phantom: N
152
188
  - Span labelled **NEGATES or MENTIONS** → **Unsupported 🟠** (a negating or merely-mentioning span is not
153
189
  grounding — the false-Grounded trap).
154
190
  - Fetch succeeded, content readable, span = NONE or off-claim → **Unsupported 🟠** (cited-but-not-verified).
155
- - Identifier does not resolve at all (404 on a specific arXiv id, dead DOI, fabricated) **Phantom ❌**.
191
+ **Surface-escalation first abstract-NONE is the mechanical trigger (no judgment):** if the only
192
+ surface fetched was the abstract (`/abs/`), re-fetch full text (fall back `/html/{id}` → `/pdf/{id}`)
193
+ before classifying — *any* abstract-NONE escalates once, you do **not** first decide whether the claim
194
+ "looks body-level" (that judgment is the salience leak this rule removes). A body-supported claim must
195
+ not be failed on an abstract-only NONE. This is the second-surface escalation, **extended from
196
+ Unreachable to Unsupported-on-abstract** (2026-06-17 dogfood finding #2). Only after a full-text surface
197
+ *also* returns NONE/MENTIONS is the verdict Unsupported. (Single-pass: escalate once, then classify — no loop.)
198
+ - Identifier does not resolve on the **canonical** surface (`/abs/` 404, dead DOI, fabricated id) →
199
+ **Phantom ❌**. A 404 on `/html/` or `/pdf/` while `/abs/` resolves is **surface-absence, not
200
+ identifier-absence** — fall back per *Surface selection*, never Phantom.
156
201
  - Identifier plausibly real but fetch blocked in this environment (paywall/403/timeout/cross-host) →
157
202
  **Unreachable ⏳** — provisional; note the limit, route to a second surface or the human gate; never auto-Phantom.
158
203
  - `WebFetch`/`WebSearch` **absent/disabled in the environment** (not one source — the capability) → Step 2-E
@@ -160,6 +205,40 @@ Grounded: N / Unsupported: N / Unreachable: N / Phantom: N
160
205
  (do not mark every claim Unreachable + CONDITIONAL_PASS — that falsely implies grounding was attempted).
161
206
  - **Never** upgrade NONE to Grounded because a second model "thinks it's probably right" (agreement bias).
162
207
 
208
+ **Coinage-vs-source-term (the external analogue of Step 2's format normalization — but *judged*, not
209
+ mechanical).** FH sometimes attributes a *coined* name to an external source ("the paper's Change
210
+ Manifest", "its Validation Gate") when the paper uses different words for the same mechanism. A name
211
+ mismatch alone is not automatically Unsupported — but the path to rescuing it is **deliberately narrow**,
212
+ because this is one rationalization away from the agreement-bias trap above. The failure direction must
213
+ stay conservative (over-flag, never false-Grounded):
214
+
215
+ - **Possessive/attributive phrasing is presumed a terminology claim.** "the paper's X", "its X",
216
+ "X, per [paper]" → literal absence of the coinage on the source = **Unsupported 🟠**, *unless the FH
217
+ artifact separately states the underlying relation in non-coinage words* (so the mechanism is
218
+ independently legible without the label). A runner may **not** reclassify a bare "the paper's X" as a
219
+ mechanism claim on its own reading — the mechanism claim must already be spelled out **in the artifact**
220
+ before the span-judged path opens. (Closes the laundering move: relabeling a terminology claim as a
221
+ mechanism claim to save it.)
222
+ - **When the mechanism IS spelled out, ground on the relation, not the string — mechanical de-label
223
+ test:** blank out *both* labels (FH's coinage and the source's term); the span must still literally
224
+ state the operative relation the claim asserts — the same inputs→outputs / the same conditional / the
225
+ same action. If, with both names blanked, the span no longer states the claim, the equivalence was
226
+ reader-supplied → **Unsupported 🟠**. (2026-06-17 dogfood: SkillOpt's "validation score" /
227
+ "rejected-edit buffer" survives the de-label test for FH's "Validation Gate" claim — the span states
228
+ the accept-only-on-improvement relation in its own words.)
229
+ - **No self-granted Grounded on a contested match.** If the de-label test is contested rather than clean,
230
+ the runner may **not** write Grounded — the *only* path to Grounded for a disputed term-match is
231
+ **escalate to `/steel-quench` Wave 1** and let the isolated adversary surface the span or reject it.
232
+ Default otherwise = Unsupported. (Removes the runner's discretion to self-select the lenient branch.)
233
+ - **Recommended conservative resolution (publish-facing).** For a publish-facing citation the safer fix
234
+ is to require the artifact to **quote the paper's own term** ("what SkillOpt calls a 'validation
235
+ score'") instead of asserting FH's coinage as the paper's — removing the judgment entirely. Reserve the
236
+ judged de-label path for low-stakes internal claims; match effort to stakes.
237
+
238
+ Unlike Step 2's value-format normalization (mechanical: `300s` ≡ `300 seconds`), coinage-vs-term is
239
+ **judged** — Grounded only via a clean de-label span or a passed steel-quench escalation, never a
240
+ reader's semantic equivalence assertion.
241
+
163
242
  ---
164
243
 
165
244
  ## §Step3-Detail