@chrono-meta/fh-gate 1.4.28 → 1.4.29
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/package.json
CHANGED
|
@@ -106,12 +106,42 @@ literal span — never on a model's agreement.
|
|
|
106
106
|
|
|
107
107
|
| Citation form | Resolved URL |
|
|
108
108
|
|---|---|
|
|
109
|
-
| `arXiv:NNNN.NNNNN` / `arXiv:NNNN.NNNNNvK` | `https://arxiv.org/abs/NNNN.NNNNN` |
|
|
109
|
+
| `arXiv:NNNN.NNNNN` / `arXiv:NNNN.NNNNNvK` | `https://arxiv.org/abs/NNNN.NNNNN` (abstract — the **canonical identifier** surface). For a section-/body-cited claim, prefer full text **first**, falling back in order `https://arxiv.org/html/NNNN.NNNNN` → `https://arxiv.org/pdf/NNNN.NNNNN` → `/abs/` (HTML is a *partial* backfill — see *Surface selection* below) |
|
|
110
110
|
| bare DOI `10.xxxx/...` | `https://doi.org/10.xxxx/...` |
|
|
111
111
|
| `http(s)://...` | use as-is (HTTPS-upgrade handled by WebFetch) |
|
|
112
112
|
| Named source, no URL ("paper X shows Y") | **one** `WebSearch` to locate the canonical URL → then WebFetch it. Do **not** verify against the search snippet alone — the snippet is not the source. |
|
|
113
113
|
| version token `pkg x.y.z` | the package registry/release page (npm/PyPI/GitHub releases) for that exact version |
|
|
114
114
|
|
|
115
|
+
**Surface selection — the abstract is the wrong surface for a §-cited mechanism.** The `/abs/` page
|
|
116
|
+
carries only the abstract; a claim that cites a section or attributes a *named body-level mechanism* to
|
|
117
|
+
the paper (e.g. "§4.2 stale-but-confident", "the paper's Change Manifest") returns NONE / MENTIONS on
|
|
118
|
+
`/abs/` even when fully supported in the body — an **artificial Unsupported** (measured 2026-06-17
|
|
119
|
+
in-the-wild dogfood: a `/abs/`-only fetch falsely flagged a body-supported claim; it resolved only after
|
|
120
|
+
escalating to full text). Two mechanical, judgment-free rules cover this:
|
|
121
|
+
|
|
122
|
+
1. **Mechanical first-surface trigger** — if the claim text carries a section/figure marker (`§N`,
|
|
123
|
+
"section N", "Table N", "Figure N") → fetch full text **first**. Otherwise fetch `/abs/` first. (Do
|
|
124
|
+
*not* pre-decide "is this a body-level mechanism?" — that is a salience-dependent judgment; the
|
|
125
|
+
`§`-marker is the only mechanical signal, and the NONE-escalation below catches the rest.)
|
|
126
|
+
2. **Abstract-NONE escalates** (the self-correcting backstop) — *any* `/abs/` fetch that returns
|
|
127
|
+
NONE/MENTIONS escalates once to full text before an Unsupported verdict (see Decision rules). A true
|
|
128
|
+
headline claim already returned ASSERTS on `/abs/`, so it never escalates; only a NONE does, and the
|
|
129
|
+
escalation is cheap insurance. This subsumes rule 1's misses — a body claim that *didn't* carry a `§`
|
|
130
|
+
marker still gets full text via its abstract-NONE.
|
|
131
|
+
|
|
132
|
+
**HTML is a partial backfill — fall back, never Phantom.** arXiv HTML exists only for LaTeX-source
|
|
133
|
+
papers (roughly post-2023) and some conversions fail, so `/html/{id}` can 404 for a perfectly real
|
|
134
|
+
paper. A 404 on `/html/` (or `/pdf/`) is **surface-absence, not identifier-absence**: fall back
|
|
135
|
+
`/html/` → `/pdf/` → `/abs/`. Only a 404 on **`/abs/`** (the canonical identifier surface) is Phantom.
|
|
136
|
+
|
|
137
|
+
**Anchor the full-text fetch.** Full text is large; pass the cited section number / mechanism term *into*
|
|
138
|
+
the WebFetch prompt (`"…the span in §4.2 that states…"`) so the span search is anchored, not whole-paper
|
|
139
|
+
— this makes the `§`-citation the retrieval anchor, not merely a routing flag, and guards against a long
|
|
140
|
+
paper's relevant span falling outside the fetch model's attention (a full-text false-NONE).
|
|
141
|
+
|
|
142
|
+
Reserve `/abs/` for claims the abstract itself states (headline numbers, the paper's stated contribution
|
|
143
|
+
— e.g. a top-line stat like "39–77% factual support" is abstract-level and `/abs/` is correct).
|
|
144
|
+
|
|
115
145
|
**WebFetch prompt template** (ask for a *span*, not a *verdict* — this keeps the anchor non-model):
|
|
116
146
|
|
|
117
147
|
```
|
|
@@ -126,6 +156,12 @@ Polarity is load-bearing: a page saying "X does NOT hold" or "earlier work claim
|
|
|
126
156
|
contains a span lexically matching the claim. Asking for the label keeps the *retriever* mechanical while
|
|
127
157
|
surfacing the stance the *judge* needs — a NEGATES/MENTIONS span is **not** grounding.
|
|
128
158
|
|
|
159
|
+
**Coinage at fetch time:** when the claim's key term is a known FH coinage (a name FH invented, not the
|
|
160
|
+
paper's — e.g. "Change Manifest", "Validation Gate"), pass the *mechanism description* — not the coinage
|
|
161
|
+
string — into the `<exact claim>` slot, so the retriever searches for the **relation**, not a label that
|
|
162
|
+
isn't on the page. The Coinage-vs-source-term decision rule (below) then only adjudicates the returned
|
|
163
|
+
span; it never has to overturn a NONE the prompt itself caused.
|
|
164
|
+
|
|
129
165
|
**Span-evidence format** (a Grounded external verdict is invalid without this):
|
|
130
166
|
|
|
131
167
|
```
|
|
@@ -152,7 +188,16 @@ Grounded: N / Unsupported: N / Unreachable: N / Phantom: N
|
|
|
152
188
|
- Span labelled **NEGATES or MENTIONS** → **Unsupported 🟠** (a negating or merely-mentioning span is not
|
|
153
189
|
grounding — the false-Grounded trap).
|
|
154
190
|
- Fetch succeeded, content readable, span = NONE or off-claim → **Unsupported 🟠** (cited-but-not-verified).
|
|
155
|
-
-
|
|
191
|
+
**Surface-escalation first — abstract-NONE is the mechanical trigger (no judgment):** if the only
|
|
192
|
+
surface fetched was the abstract (`/abs/`), re-fetch full text (fall back `/html/{id}` → `/pdf/{id}`)
|
|
193
|
+
before classifying — *any* abstract-NONE escalates once, you do **not** first decide whether the claim
|
|
194
|
+
"looks body-level" (that judgment is the salience leak this rule removes). A body-supported claim must
|
|
195
|
+
not be failed on an abstract-only NONE. This is the second-surface escalation, **extended from
|
|
196
|
+
Unreachable to Unsupported-on-abstract** (2026-06-17 dogfood finding #2). Only after a full-text surface
|
|
197
|
+
*also* returns NONE/MENTIONS is the verdict Unsupported. (Single-pass: escalate once, then classify — no loop.)
|
|
198
|
+
- Identifier does not resolve on the **canonical** surface (`/abs/` 404, dead DOI, fabricated id) →
|
|
199
|
+
**Phantom ❌**. A 404 on `/html/` or `/pdf/` while `/abs/` resolves is **surface-absence, not
|
|
200
|
+
identifier-absence** — fall back per *Surface selection*, never Phantom.
|
|
156
201
|
- Identifier plausibly real but fetch blocked in this environment (paywall/403/timeout/cross-host) →
|
|
157
202
|
**Unreachable ⏳** — provisional; note the limit, route to a second surface or the human gate; never auto-Phantom.
|
|
158
203
|
- `WebFetch`/`WebSearch` **absent/disabled in the environment** (not one source — the capability) → Step 2-E
|
|
@@ -160,6 +205,40 @@ Grounded: N / Unsupported: N / Unreachable: N / Phantom: N
|
|
|
160
205
|
(do not mark every claim Unreachable + CONDITIONAL_PASS — that falsely implies grounding was attempted).
|
|
161
206
|
- **Never** upgrade NONE to Grounded because a second model "thinks it's probably right" (agreement bias).
|
|
162
207
|
|
|
208
|
+
**Coinage-vs-source-term (the external analogue of Step 2's format normalization — but *judged*, not
|
|
209
|
+
mechanical).** FH sometimes attributes a *coined* name to an external source ("the paper's Change
|
|
210
|
+
Manifest", "its Validation Gate") when the paper uses different words for the same mechanism. A name
|
|
211
|
+
mismatch alone is not automatically Unsupported — but the path to rescuing it is **deliberately narrow**,
|
|
212
|
+
because this is one rationalization away from the agreement-bias trap above. The failure direction must
|
|
213
|
+
stay conservative (over-flag, never false-Grounded):
|
|
214
|
+
|
|
215
|
+
- **Possessive/attributive phrasing is presumed a terminology claim.** "the paper's X", "its X",
|
|
216
|
+
"X, per [paper]" → literal absence of the coinage on the source = **Unsupported 🟠**, *unless the FH
|
|
217
|
+
artifact separately states the underlying relation in non-coinage words* (so the mechanism is
|
|
218
|
+
independently legible without the label). A runner may **not** reclassify a bare "the paper's X" as a
|
|
219
|
+
mechanism claim on its own reading — the mechanism claim must already be spelled out **in the artifact**
|
|
220
|
+
before the span-judged path opens. (Closes the laundering move: relabeling a terminology claim as a
|
|
221
|
+
mechanism claim to save it.)
|
|
222
|
+
- **When the mechanism IS spelled out, ground on the relation, not the string — mechanical de-label
|
|
223
|
+
test:** blank out *both* labels (FH's coinage and the source's term); the span must still literally
|
|
224
|
+
state the operative relation the claim asserts — the same inputs→outputs / the same conditional / the
|
|
225
|
+
same action. If, with both names blanked, the span no longer states the claim, the equivalence was
|
|
226
|
+
reader-supplied → **Unsupported 🟠**. (2026-06-17 dogfood: SkillOpt's "validation score" /
|
|
227
|
+
"rejected-edit buffer" survives the de-label test for FH's "Validation Gate" claim — the span states
|
|
228
|
+
the accept-only-on-improvement relation in its own words.)
|
|
229
|
+
- **No self-granted Grounded on a contested match.** If the de-label test is contested rather than clean,
|
|
230
|
+
the runner may **not** write Grounded — the *only* path to Grounded for a disputed term-match is
|
|
231
|
+
**escalate to `/steel-quench` Wave 1** and let the isolated adversary surface the span or reject it.
|
|
232
|
+
Default otherwise = Unsupported. (Removes the runner's discretion to self-select the lenient branch.)
|
|
233
|
+
- **Recommended conservative resolution (publish-facing).** For a publish-facing citation the safer fix
|
|
234
|
+
is to require the artifact to **quote the paper's own term** ("what SkillOpt calls a 'validation
|
|
235
|
+
score'") instead of asserting FH's coinage as the paper's — removing the judgment entirely. Reserve the
|
|
236
|
+
judged de-label path for low-stakes internal claims; match effort to stakes.
|
|
237
|
+
|
|
238
|
+
Unlike Step 2's value-format normalization (mechanical: `300s` ≡ `300 seconds`), coinage-vs-term is
|
|
239
|
+
**judged** — Grounded only via a clean de-label span or a passed steel-quench escalation, never a
|
|
240
|
+
reader's semantic equivalence assertion.
|
|
241
|
+
|
|
163
242
|
---
|
|
164
243
|
|
|
165
244
|
## §Step3-Detail
|