@chrono-meta/fh-gate 1.4.29 → 1.4.30
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/package.json
CHANGED
|
@@ -8,12 +8,16 @@ model: sonnet
|
|
|
8
8
|
|
|
9
9
|
# edit-manifest — Predict-Verify Loop for Harness Edits
|
|
10
10
|
|
|
11
|
-
> Implements two mechanisms from frontier harness research
|
|
12
|
-
>
|
|
13
|
-
>
|
|
14
|
-
>
|
|
15
|
-
>
|
|
16
|
-
>
|
|
11
|
+
> Implements two mechanisms from frontier harness research, named as in their source papers (both terms
|
|
12
|
+
> are the papers' own — verified 2026-06-17 against arXiv full text):
|
|
13
|
+
> - **Change Manifest** (AHE, arXiv:2604.25850): each edit attaches a manifest entry declaring a
|
|
14
|
+
> falsifiable predicted impact (expected fixes + at-risk regressions); the next round intersects it
|
|
15
|
+
> with observed task-level deltas to produce a per-edit verdict. (Paper: "the Evolve Agent produces a
|
|
16
|
+
> change manifest…".)
|
|
17
|
+
> - **Validation Gate** (SkillOpt, arXiv:2605.23904): a candidate is accepted only when its
|
|
18
|
+
> selection-split score is strictly greater than the current selection score (ties rejected); rejected
|
|
19
|
+
> edits are kept in an epoch-local buffer, not silently discarded. (Paper: "validation gate" /
|
|
20
|
+
> "held-out gate".)
|
|
17
21
|
|
|
18
22
|
FH's regression_guard.sh catches backward regressions. edit-manifest closes the forward
|
|
19
23
|
loop: did the edit actually improve what it claimed to improve?
|
|
@@ -201,7 +205,7 @@ Apply `verified_at`, `verification_note`, `gate_decision` to each resolved entry
|
|
|
201
205
|
|
|
202
206
|
## Validation Gate Logic
|
|
203
207
|
|
|
204
|
-
Mirrors SkillOpt's selection-split
|
|
208
|
+
Mirrors SkillOpt's validation gate (the held-out gate over its selection-split score):
|
|
205
209
|
|
|
206
210
|
```
|
|
207
211
|
IF verified evidence shows improvement:
|
|
@@ -253,6 +257,6 @@ non-vacuous by the **mandatory cited observation** (no citation → stays pendin
|
|
|
253
257
|
## References
|
|
254
258
|
|
|
255
259
|
- Theoretical basis: AHE (arXiv:2604.25850) §4 change manifest + prediction falsifiability
|
|
256
|
-
- Validation gate: SkillOpt (arXiv:2605.23904) §3.3 selection-split
|
|
260
|
+
- Validation gate: SkillOpt (arXiv:2605.23904) §3.3 validation/held-out gate over the selection-split score + rejected-edit buffer
|
|
257
261
|
- Integrates with: `harvest-loop` Step 0-c · `verify-bidirectional` · 3-axis auto-gate (CLAUDE.md)
|
|
258
262
|
- Manifest file: `tracks/_meta/edit_manifest.yaml`
|
|
@@ -123,11 +123,16 @@ escalating to full text). Two mechanical, judgment-free rules cover this:
|
|
|
123
123
|
"section N", "Table N", "Figure N") → fetch full text **first**. Otherwise fetch `/abs/` first. (Do
|
|
124
124
|
*not* pre-decide "is this a body-level mechanism?" — that is a salience-dependent judgment; the
|
|
125
125
|
`§`-marker is the only mechanical signal, and the NONE-escalation below catches the rest.)
|
|
126
|
-
2. **Abstract
|
|
127
|
-
NONE
|
|
128
|
-
headline claim already returned ASSERTS on `/abs/`, so it never escalates; only a
|
|
129
|
-
escalation is cheap insurance.
|
|
130
|
-
|
|
126
|
+
2. **Abstract NONE/MENTIONS escalates** (the self-correcting backstop) — *any* `/abs/` fetch that
|
|
127
|
+
returns **NONE or MENTIONS** escalates once to full text before an Unsupported verdict (see Decision
|
|
128
|
+
rules). A true headline claim already returned ASSERTS on `/abs/`, so it never escalates; only a
|
|
129
|
+
NONE/MENTIONS does, and the escalation is cheap insurance. (A MENTIONS on the abstract is the common
|
|
130
|
+
case — the body may ASSERT what the abstract only named; measured 2026-06-17 wild case 2604.25850 was
|
|
131
|
+
a /abs MENTIONS that became a full-text ASSERTS. A **NEGATES** does *not* escalate — it is
|
|
132
|
+
polarity-final: a body span asserting the same relation cannot coexist with an abstract that refutes
|
|
133
|
+
it; a *different, narrower* body claim is a citation-granularity error, still Unsupported, not an
|
|
134
|
+
escalation target.) This subsumes rule 1's misses — a body claim that *didn't* carry a `§` marker
|
|
135
|
+
still gets full text via its abstract NONE/MENTIONS.
|
|
131
136
|
|
|
132
137
|
**HTML is a partial backfill — fall back, never Phantom.** arXiv HTML exists only for LaTeX-source
|
|
133
138
|
papers (roughly post-2023) and some conversions fail, so `/html/{id}` can 404 for a perfectly real
|
|
@@ -156,10 +161,10 @@ Polarity is load-bearing: a page saying "X does NOT hold" or "earlier work claim
|
|
|
156
161
|
contains a span lexically matching the claim. Asking for the label keeps the *retriever* mechanical while
|
|
157
162
|
surfacing the stance the *judge* needs — a NEGATES/MENTIONS span is **not** grounding.
|
|
158
163
|
|
|
159
|
-
**Coinage at fetch time:** when the claim's key term is a known FH coinage (a name FH invented
|
|
160
|
-
|
|
161
|
-
|
|
162
|
-
isn't on the page. The Coinage-vs-source-term decision rule (below) then only adjudicates the returned
|
|
164
|
+
**Coinage at fetch time:** when the claim's key term is a known FH coinage (a name FH invented that the
|
|
165
|
+
source does *not* use — e.g. a label like "Drift Gate" for a mechanism the paper describes only in its
|
|
166
|
+
own words), pass the *mechanism description* — not the coinage string — into the `<exact claim>` slot, so
|
|
167
|
+
the retriever searches for the **relation**, not a label that isn't on the page. The Coinage-vs-source-term decision rule (below) then only adjudicates the returned
|
|
163
168
|
span; it never has to overturn a NONE the prompt itself caused.
|
|
164
169
|
|
|
165
170
|
**Span-evidence format** (a Grounded external verdict is invalid without this):
|
|
@@ -186,15 +191,19 @@ Grounded: N / Unsupported: N / Unreachable: N / Phantom: N
|
|
|
186
191
|
**Decision rules**:
|
|
187
192
|
- Span labelled **ASSERTS** that expresses the claimed relation → **Grounded ✅** (record span).
|
|
188
193
|
- Span labelled **NEGATES or MENTIONS** → **Unsupported 🟠** (a negating or merely-mentioning span is not
|
|
189
|
-
grounding — the false-Grounded trap).
|
|
194
|
+
grounding — the false-Grounded trap). **One exception — a MENTIONS from the abstract (`/abs/`) surface
|
|
195
|
+
escalates first** (per the NONE/MENTIONS escalation below) before being finalized: the body may ASSERT
|
|
196
|
+
what the abstract only mentioned. A **NEGATES** is final (the source refutes the claim — the body will
|
|
197
|
+
not undo it); a MENTIONS from a full-text surface is also final.
|
|
190
198
|
- Fetch succeeded, content readable, span = NONE or off-claim → **Unsupported 🟠** (cited-but-not-verified).
|
|
191
|
-
**Surface-escalation first — abstract
|
|
192
|
-
surface fetched was the abstract (`/abs/`), re-fetch full text (fall back `/html/{id}` → `/pdf/{id}`)
|
|
193
|
-
before classifying — *any* abstract
|
|
194
|
-
"looks body-level" (that judgment is the salience leak this rule removes). A body-supported claim
|
|
195
|
-
not be failed on an abstract-only NONE. This is the second-surface escalation, **extended
|
|
196
|
-
Unreachable to Unsupported-on-abstract** (2026-06-17 dogfood finding #2). Only after a full-text
|
|
197
|
-
*also* returns NONE/MENTIONS is the verdict Unsupported. (Single-pass: escalate once, then
|
|
199
|
+
**Surface-escalation first — an abstract NONE/MENTIONS is the mechanical trigger (no judgment):** if the
|
|
200
|
+
only surface fetched was the abstract (`/abs/`), re-fetch full text (fall back `/html/{id}` → `/pdf/{id}`)
|
|
201
|
+
before classifying — *any* abstract NONE/MENTIONS escalates once, you do **not** first decide whether the
|
|
202
|
+
claim "looks body-level" (that judgment is the salience leak this rule removes). A body-supported claim
|
|
203
|
+
must not be failed on an abstract-only NONE/MENTIONS. This is the second-surface escalation, **extended
|
|
204
|
+
from Unreachable to Unsupported-on-abstract** (2026-06-17 dogfood finding #2). Only after a full-text
|
|
205
|
+
surface *also* returns NONE/MENTIONS is the verdict Unsupported. (Single-pass: escalate once, then
|
|
206
|
+
classify — no loop.)
|
|
198
207
|
- Identifier does not resolve on the **canonical** surface (`/abs/` 404, dead DOI, fabricated id) →
|
|
199
208
|
**Phantom ❌**. A 404 on `/html/` or `/pdf/` while `/abs/` resolves is **surface-absence, not
|
|
200
209
|
identifier-absence** — fall back per *Surface selection*, never Phantom.
|
|
@@ -206,33 +215,39 @@ Grounded: N / Unsupported: N / Unreachable: N / Phantom: N
|
|
|
206
215
|
- **Never** upgrade NONE to Grounded because a second model "thinks it's probably right" (agreement bias).
|
|
207
216
|
|
|
208
217
|
**Coinage-vs-source-term (the external analogue of Step 2's format normalization — but *judged*, not
|
|
209
|
-
mechanical).** FH sometimes attributes a *coined* name to an external source ("the paper's
|
|
210
|
-
|
|
218
|
+
mechanical).** FH sometimes attributes a *coined* name to an external source (e.g. "the paper's Drift
|
|
219
|
+
Gate") when the paper uses different words for the same mechanism. A name
|
|
211
220
|
mismatch alone is not automatically Unsupported — but the path to rescuing it is **deliberately narrow**,
|
|
212
221
|
because this is one rationalization away from the agreement-bias trap above. The failure direction must
|
|
213
222
|
stay conservative (over-flag, never false-Grounded):
|
|
214
223
|
|
|
215
224
|
- **Possessive/attributive phrasing is presumed a terminology claim.** "the paper's X", "its X",
|
|
216
225
|
"X, per [paper]" → literal absence of the coinage on the source = **Unsupported 🟠**, *unless the FH
|
|
217
|
-
artifact separately states the underlying relation in non-coinage words*
|
|
218
|
-
|
|
219
|
-
|
|
220
|
-
|
|
221
|
-
mechanism claim
|
|
226
|
+
artifact separately states the underlying relation in non-coinage words* — "separately" means
|
|
227
|
+
**anywhere in the artifact, including an inline gloss of the coined term** (e.g. "**X** (the spec's
|
|
228
|
+
accept-only-on-improvement rule)"); proximity does not matter, only that the relation is stated in
|
|
229
|
+
non-coinage words, so the mechanism is independently legible without the label. A runner may **not**
|
|
230
|
+
reclassify a bare "the paper's X" (a coinage with *no* such gloss anywhere) as a mechanism claim on its
|
|
231
|
+
own reading — the mechanism claim must already be spelled out **in the artifact** before the span-judged
|
|
232
|
+
path opens. (Closes the laundering move: relabeling a terminology claim as a mechanism claim to save it.)
|
|
233
|
+
An inline gloss grants only *eligibility* for that path — Grounded still comes **only from the fetched
|
|
234
|
+
source span** (de-label test below), never from the gloss itself; a gloss the source does not support
|
|
235
|
+
is artifact-internal, fails de-label, and cannot launder a coinage into Grounded.
|
|
222
236
|
- **When the mechanism IS spelled out, ground on the relation, not the string — mechanical de-label
|
|
223
237
|
test:** blank out *both* labels (FH's coinage and the source's term); the span must still literally
|
|
224
238
|
state the operative relation the claim asserts — the same inputs→outputs / the same conditional / the
|
|
225
239
|
same action. If, with both names blanked, the span no longer states the claim, the equivalence was
|
|
226
|
-
reader-supplied → **Unsupported 🟠**. (
|
|
227
|
-
|
|
228
|
-
|
|
240
|
+
reader-supplied → **Unsupported 🟠**. (Worked shape: blank FH's coined label *and* the source's own
|
|
241
|
+
term; a fetched span that still literally states the operative relation — e.g. "accepted only when the
|
|
242
|
+
score strictly improves" — in the source's own words is Grounded; if the equivalence only holds once
|
|
243
|
+
you read FH's label back in, it was reader-supplied → Unsupported.)
|
|
229
244
|
- **No self-granted Grounded on a contested match.** If the de-label test is contested rather than clean,
|
|
230
245
|
the runner may **not** write Grounded — the *only* path to Grounded for a disputed term-match is
|
|
231
246
|
**escalate to `/steel-quench` Wave 1** and let the isolated adversary surface the span or reject it.
|
|
232
247
|
Default otherwise = Unsupported. (Removes the runner's discretion to self-select the lenient branch.)
|
|
233
248
|
- **Recommended conservative resolution (publish-facing).** For a publish-facing citation the safer fix
|
|
234
|
-
is to require the artifact to **quote the
|
|
235
|
-
|
|
249
|
+
is to require the artifact to **quote the source's own term** (e.g. "what the source itself calls X")
|
|
250
|
+
instead of asserting an unverified label as the source's — removing the judgment entirely. Reserve the
|
|
236
251
|
judged de-label path for low-stakes internal claims; match effort to stakes.
|
|
237
252
|
|
|
238
253
|
Unlike Step 2's value-format normalization (mechanical: `300s` ≡ `300 seconds`), coinage-vs-term is
|