@chrono-meta/fh-gate 1.4.27 → 1.4.29
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/package.json
CHANGED
|
@@ -85,7 +85,7 @@ Execution method · agent selection · recording location → infer and proceed
|
|
|
85
85
|
| "Do X" — target and completion criteria are clear | Clear | Immediately enter Step 0-b |
|
|
86
86
|
| "Improve X" — improvement dimension unclear | Direction unclear | Ask for clarification |
|
|
87
87
|
| "Something feels off" — diagnosis target unclear | Execution unclear → infer | Auto-dispatch harness-doctor |
|
|
88
|
-
| "
|
|
88
|
+
| "Wrap up / harvest this session" — trigger is clear | Clear | Run harvest-loop based on recent work ("run the pipeline" itself is ceded to pipeline-conductor — route end-to-end sweeps there, not here) |
|
|
89
89
|
| No request — session context exists | Execution unclear → infer | Auto-compose based on CATALOG/MEMORY, then confirm |
|
|
90
90
|
|
|
91
91
|
> **Detail**: See `SKILL_detail.md §Clarification` — 2-question format, 4-question structured confirmation, meta-prompting intervention — read when direction ambiguity requires a clarification block.
|
|
@@ -106,12 +106,42 @@ literal span — never on a model's agreement.
|
|
|
106
106
|
|
|
107
107
|
| Citation form | Resolved URL |
|
|
108
108
|
|---|---|
|
|
109
|
-
| `arXiv:NNNN.NNNNN` / `arXiv:NNNN.NNNNNvK` | `https://arxiv.org/abs/NNNN.NNNNN` |
|
|
109
|
+
| `arXiv:NNNN.NNNNN` / `arXiv:NNNN.NNNNNvK` | `https://arxiv.org/abs/NNNN.NNNNN` (abstract — the **canonical identifier** surface). For a section-/body-cited claim, prefer full text **first**, falling back in order `https://arxiv.org/html/NNNN.NNNNN` → `https://arxiv.org/pdf/NNNN.NNNNN` → `/abs/` (HTML is a *partial* backfill — see *Surface selection* below) |
|
|
110
110
|
| bare DOI `10.xxxx/...` | `https://doi.org/10.xxxx/...` |
|
|
111
111
|
| `http(s)://...` | use as-is (HTTPS-upgrade handled by WebFetch) |
|
|
112
112
|
| Named source, no URL ("paper X shows Y") | **one** `WebSearch` to locate the canonical URL → then WebFetch it. Do **not** verify against the search snippet alone — the snippet is not the source. |
|
|
113
113
|
| version token `pkg x.y.z` | the package registry/release page (npm/PyPI/GitHub releases) for that exact version |
|
|
114
114
|
|
|
115
|
+
**Surface selection — the abstract is the wrong surface for a §-cited mechanism.** The `/abs/` page
|
|
116
|
+
carries only the abstract; a claim that cites a section or attributes a *named body-level mechanism* to
|
|
117
|
+
the paper (e.g. "§4.2 stale-but-confident", "the paper's Change Manifest") returns NONE / MENTIONS on
|
|
118
|
+
`/abs/` even when fully supported in the body — an **artificial Unsupported** (measured 2026-06-17
|
|
119
|
+
in-the-wild dogfood: a `/abs/`-only fetch falsely flagged a body-supported claim; it resolved only after
|
|
120
|
+
escalating to full text). Two mechanical, judgment-free rules cover this:
|
|
121
|
+
|
|
122
|
+
1. **Mechanical first-surface trigger** — if the claim text carries a section/figure marker (`§N`,
|
|
123
|
+
"section N", "Table N", "Figure N") → fetch full text **first**. Otherwise fetch `/abs/` first. (Do
|
|
124
|
+
*not* pre-decide "is this a body-level mechanism?" — that is a salience-dependent judgment; the
|
|
125
|
+
`§`-marker is the only mechanical signal, and the NONE-escalation below catches the rest.)
|
|
126
|
+
2. **Abstract-NONE escalates** (the self-correcting backstop) — *any* `/abs/` fetch that returns
|
|
127
|
+
NONE/MENTIONS escalates once to full text before an Unsupported verdict (see Decision rules). A true
|
|
128
|
+
headline claim already returned ASSERTS on `/abs/`, so it never escalates; only a NONE does, and the
|
|
129
|
+
escalation is cheap insurance. This subsumes rule 1's misses — a body claim that *didn't* carry a `§`
|
|
130
|
+
marker still gets full text via its abstract-NONE.
|
|
131
|
+
|
|
132
|
+
**HTML is a partial backfill — fall back, never Phantom.** arXiv HTML exists only for LaTeX-source
|
|
133
|
+
papers (roughly post-2023) and some conversions fail, so `/html/{id}` can 404 for a perfectly real
|
|
134
|
+
paper. A 404 on `/html/` (or `/pdf/`) is **surface-absence, not identifier-absence**: fall back
|
|
135
|
+
`/html/` → `/pdf/` → `/abs/`. Only a 404 on **`/abs/`** (the canonical identifier surface) is Phantom.
|
|
136
|
+
|
|
137
|
+
**Anchor the full-text fetch.** Full text is large; pass the cited section number / mechanism term *into*
|
|
138
|
+
the WebFetch prompt (`"…the span in §4.2 that states…"`) so the span search is anchored, not whole-paper
|
|
139
|
+
— this makes the `§`-citation the retrieval anchor, not merely a routing flag, and guards against a long
|
|
140
|
+
paper's relevant span falling outside the fetch model's attention (a full-text false-NONE).
|
|
141
|
+
|
|
142
|
+
Reserve `/abs/` for claims the abstract itself states (headline numbers, the paper's stated contribution
|
|
143
|
+
— e.g. a top-line stat like "39–77% factual support" is abstract-level and `/abs/` is correct).
|
|
144
|
+
|
|
115
145
|
**WebFetch prompt template** (ask for a *span*, not a *verdict* — this keeps the anchor non-model):
|
|
116
146
|
|
|
117
147
|
```
|
|
@@ -126,6 +156,12 @@ Polarity is load-bearing: a page saying "X does NOT hold" or "earlier work claim
|
|
|
126
156
|
contains a span lexically matching the claim. Asking for the label keeps the *retriever* mechanical while
|
|
127
157
|
surfacing the stance the *judge* needs — a NEGATES/MENTIONS span is **not** grounding.
|
|
128
158
|
|
|
159
|
+
**Coinage at fetch time:** when the claim's key term is a known FH coinage (a name FH invented, not the
|
|
160
|
+
paper's — e.g. "Change Manifest", "Validation Gate"), pass the *mechanism description* — not the coinage
|
|
161
|
+
string — into the `<exact claim>` slot, so the retriever searches for the **relation**, not a label that
|
|
162
|
+
isn't on the page. The Coinage-vs-source-term decision rule (below) then only adjudicates the returned
|
|
163
|
+
span; it never has to overturn a NONE the prompt itself caused.
|
|
164
|
+
|
|
129
165
|
**Span-evidence format** (a Grounded external verdict is invalid without this):
|
|
130
166
|
|
|
131
167
|
```
|
|
@@ -152,7 +188,16 @@ Grounded: N / Unsupported: N / Unreachable: N / Phantom: N
|
|
|
152
188
|
- Span labelled **NEGATES or MENTIONS** → **Unsupported 🟠** (a negating or merely-mentioning span is not
|
|
153
189
|
grounding — the false-Grounded trap).
|
|
154
190
|
- Fetch succeeded, content readable, span = NONE or off-claim → **Unsupported 🟠** (cited-but-not-verified).
|
|
155
|
-
-
|
|
191
|
+
**Surface-escalation first — abstract-NONE is the mechanical trigger (no judgment):** if the only
|
|
192
|
+
surface fetched was the abstract (`/abs/`), re-fetch full text (fall back `/html/{id}` → `/pdf/{id}`)
|
|
193
|
+
before classifying — *any* abstract-NONE escalates once, you do **not** first decide whether the claim
|
|
194
|
+
"looks body-level" (that judgment is the salience leak this rule removes). A body-supported claim must
|
|
195
|
+
not be failed on an abstract-only NONE. This is the second-surface escalation, **extended from
|
|
196
|
+
Unreachable to Unsupported-on-abstract** (2026-06-17 dogfood finding #2). Only after a full-text surface
|
|
197
|
+
*also* returns NONE/MENTIONS is the verdict Unsupported. (Single-pass: escalate once, then classify — no loop.)
|
|
198
|
+
- Identifier does not resolve on the **canonical** surface (`/abs/` 404, dead DOI, fabricated id) →
|
|
199
|
+
**Phantom ❌**. A 404 on `/html/` or `/pdf/` while `/abs/` resolves is **surface-absence, not
|
|
200
|
+
identifier-absence** — fall back per *Surface selection*, never Phantom.
|
|
156
201
|
- Identifier plausibly real but fetch blocked in this environment (paywall/403/timeout/cross-host) →
|
|
157
202
|
**Unreachable ⏳** — provisional; note the limit, route to a second surface or the human gate; never auto-Phantom.
|
|
158
203
|
- `WebFetch`/`WebSearch` **absent/disabled in the environment** (not one source — the capability) → Step 2-E
|
|
@@ -160,6 +205,40 @@ Grounded: N / Unsupported: N / Unreachable: N / Phantom: N
|
|
|
160
205
|
(do not mark every claim Unreachable + CONDITIONAL_PASS — that falsely implies grounding was attempted).
|
|
161
206
|
- **Never** upgrade NONE to Grounded because a second model "thinks it's probably right" (agreement bias).
|
|
162
207
|
|
|
208
|
+
**Coinage-vs-source-term (the external analogue of Step 2's format normalization — but *judged*, not
|
|
209
|
+
mechanical).** FH sometimes attributes a *coined* name to an external source ("the paper's Change
|
|
210
|
+
Manifest", "its Validation Gate") when the paper uses different words for the same mechanism. A name
|
|
211
|
+
mismatch alone is not automatically Unsupported — but the path to rescuing it is **deliberately narrow**,
|
|
212
|
+
because this is one rationalization away from the agreement-bias trap above. The failure direction must
|
|
213
|
+
stay conservative (over-flag, never false-Grounded):
|
|
214
|
+
|
|
215
|
+
- **Possessive/attributive phrasing is presumed a terminology claim.** "the paper's X", "its X",
|
|
216
|
+
"X, per [paper]" → literal absence of the coinage on the source = **Unsupported 🟠**, *unless the FH
|
|
217
|
+
artifact separately states the underlying relation in non-coinage words* (so the mechanism is
|
|
218
|
+
independently legible without the label). A runner may **not** reclassify a bare "the paper's X" as a
|
|
219
|
+
mechanism claim on its own reading — the mechanism claim must already be spelled out **in the artifact**
|
|
220
|
+
before the span-judged path opens. (Closes the laundering move: relabeling a terminology claim as a
|
|
221
|
+
mechanism claim to save it.)
|
|
222
|
+
- **When the mechanism IS spelled out, ground on the relation, not the string — mechanical de-label
|
|
223
|
+
test:** blank out *both* labels (FH's coinage and the source's term); the span must still literally
|
|
224
|
+
state the operative relation the claim asserts — the same inputs→outputs / the same conditional / the
|
|
225
|
+
same action. If, with both names blanked, the span no longer states the claim, the equivalence was
|
|
226
|
+
reader-supplied → **Unsupported 🟠**. (2026-06-17 dogfood: SkillOpt's "validation score" /
|
|
227
|
+
"rejected-edit buffer" survives the de-label test for FH's "Validation Gate" claim — the span states
|
|
228
|
+
the accept-only-on-improvement relation in its own words.)
|
|
229
|
+
- **No self-granted Grounded on a contested match.** If the de-label test is contested rather than clean,
|
|
230
|
+
the runner may **not** write Grounded — the *only* path to Grounded for a disputed term-match is
|
|
231
|
+
**escalate to `/steel-quench` Wave 1** and let the isolated adversary surface the span or reject it.
|
|
232
|
+
Default otherwise = Unsupported. (Removes the runner's discretion to self-select the lenient branch.)
|
|
233
|
+
- **Recommended conservative resolution (publish-facing).** For a publish-facing citation the safer fix
|
|
234
|
+
is to require the artifact to **quote the paper's own term** ("what SkillOpt calls a 'validation
|
|
235
|
+
score'") instead of asserting FH's coinage as the paper's — removing the judgment entirely. Reserve the
|
|
236
|
+
judged de-label path for low-stakes internal claims; match effort to stakes.
|
|
237
|
+
|
|
238
|
+
Unlike Step 2's value-format normalization (mechanical: `300s` ≡ `300 seconds`), coinage-vs-term is
|
|
239
|
+
**judged** — Grounded only via a clean de-label span or a passed steel-quench escalation, never a
|
|
240
|
+
reader's semantic equivalence assertion.
|
|
241
|
+
|
|
163
242
|
---
|
|
164
243
|
|
|
165
244
|
## §Step3-Detail
|
|
@@ -13,117 +13,18 @@ deprecated_date: 2026-06-02
|
|
|
13
13
|
successor: harness-doctor
|
|
14
14
|
---
|
|
15
15
|
|
|
16
|
-
# self-marketing-lint —
|
|
16
|
+
# self-marketing-lint — DEPRECATED
|
|
17
17
|
|
|
18
|
-
|
|
19
|
-
|
|
18
|
+
This skill was absorbed into **harness-doctor** on 2026-06-02. Its self-marketing /
|
|
19
|
+
cushion-word / version-brag detection patterns now live in **harness-doctor Step 3-L
|
|
20
|
+
(`--lint` mode)**, which carries the canonical baseline.
|
|
20
21
|
|
|
21
|
-
|
|
22
|
-
|
|
23
|
-
- `/self-marketing-lint`
|
|
24
|
-
- "Check FH files", "remove marketing language", "description diet"
|
|
25
|
-
- "Catch self-marketing language", "run the lint"
|
|
26
|
-
- When surfaced as a HIGH item in harvest-loop (P10 series)
|
|
27
|
-
|
|
28
|
-
## Detection Patterns (Inline Baseline)
|
|
29
|
-
|
|
30
|
-
The following criteria are self-contained inside this skill. No external personal-memory file is required.
|
|
31
|
-
|
|
32
|
-
### Self-Marketing Anti-Patterns
|
|
33
|
-
|
|
34
|
-
1. **Self-declarations of quality**: "industry-leading", "world-class", "production-ready", "the world's first" without external evidence
|
|
35
|
-
2. **Round-counters and version brags**: "after 26 iterations", "v0.5 maturity", "22nd naming round", "prototype 9th iteration" — irrelevant to function
|
|
36
|
-
3. **Cushion language**: "perfectly", "easily", "seamlessly", "instantly", "without burden" — content-free intensifiers
|
|
37
|
-
4. **Owner self-reference loops**: "validated by our team", "proven internally" — circular evidence with no external check
|
|
38
|
-
5. **Marketing word stacking**: 3+ adjectives in one sentence
|
|
39
|
-
6. **Spec-only artifacts**: skills/agents missing a Done When section — declaration without observable output
|
|
40
|
-
|
|
41
|
-
| Pattern type | Examples (detection targets) | Replacement direction |
|
|
42
|
-
|---|---|---|
|
|
43
|
-
| Version labels | "v0.3", "latest version", "(added 2026-05-24)" | Remove (git history manages versioning) |
|
|
44
|
-
| Emphasis words | "fully automated", "instantly", "automatically", "full coverage" | Replace with functional description |
|
|
45
|
-
| Iteration counts | "22nd naming", "26th round", "prototype 9th iteration" | Remove or move to external document |
|
|
46
|
-
| Self-declarations | "This skill is the world's first", "Core FH skill" | Delete |
|
|
47
|
-
| Cushion language | "easily", "conveniently", "without burden" | Replace with behavior-based description |
|
|
48
|
-
| Spec-only (no outputs) | Skills missing Done When | Require adding an outputs section |
|
|
49
|
-
|
|
50
|
-
## Execution Steps
|
|
51
|
-
|
|
52
|
-
### Step 1. Scan Target Files
|
|
53
|
-
|
|
54
|
-
```bash
|
|
55
|
-
# full list of skill files
|
|
56
|
-
find {FH_ROOT}/plugins -name "SKILL.md" | sort
|
|
57
|
-
|
|
58
|
-
# full list of agent files
|
|
59
|
-
find {FH_ROOT}/.claude/agents -name "*.md" | sort
|
|
60
|
-
|
|
61
|
-
# docs/ public documents
|
|
62
|
-
find {FH_ROOT}/docs -name "*.md" | sort
|
|
63
|
-
```
|
|
64
|
-
|
|
65
|
-
Scope: specified files only if target is given / full scan if unspecified.
|
|
66
|
-
|
|
67
|
-
### Step 2. Confirm Inline Baseline
|
|
68
|
-
|
|
69
|
-
Re-read the "Detection Patterns (Inline Baseline)" section above. No external file load is required — the skill is self-contained.
|
|
70
|
-
|
|
71
|
-
### Step 3. Pattern Detection
|
|
72
|
-
|
|
73
|
-
Grep detection patterns in each file's description field + body.
|
|
74
|
-
|
|
75
|
-
```bash
|
|
76
|
-
# version / iteration count patterns
|
|
77
|
-
grep -n "v[0-9]\+\.[0-9]\|[0-9]\+th\|[0-9]\+th iteration\|prototype [0-9]\+" {file}
|
|
78
|
-
|
|
79
|
-
# emphasis word patterns
|
|
80
|
-
grep -n "fully\|instantly\|full coverage\|automatically\|conveniently\|easily\|without burden" {file}
|
|
81
|
-
|
|
82
|
-
# self-declaration patterns
|
|
83
|
-
grep -n "core skill\|world's first\|best\|most\|top" {file}
|
|
84
|
-
```
|
|
85
|
-
|
|
86
|
-
### Step 4. Output Replacement Suggestions
|
|
87
|
-
|
|
88
|
-
Organize and output detection results per file.
|
|
89
|
-
|
|
90
|
-
```
|
|
91
|
-
File: {path}
|
|
92
|
-
L{line}: "{original text}"
|
|
93
|
-
→ Suggested: "{replacement text}" (reason: {pattern type})
|
|
94
|
-
→ Or: recommend removal (reason: {pattern type})
|
|
95
|
-
```
|
|
96
|
-
|
|
97
|
-
**No automatic edits**: Output suggestions only. Actual edits proceed after user confirmation.
|
|
98
|
-
|
|
99
|
-
### Step 5. Summary Report
|
|
22
|
+
**Use instead:**
|
|
100
23
|
|
|
101
24
|
```
|
|
102
|
-
|
|
103
|
-
|
|
104
|
-
Target: {N} files
|
|
105
|
-
Detected: {N} items
|
|
106
|
-
|
|
107
|
-
| File | Detection count | Most common pattern |
|
|
108
|
-
|---|---|---|
|
|
109
|
-
...
|
|
110
|
-
|
|
111
|
-
Recommendation: prioritize files with highest detection counts for review
|
|
25
|
+
/harness-doctor --lint
|
|
112
26
|
```
|
|
113
27
|
|
|
114
|
-
|
|
115
|
-
|
|
116
|
-
|
|
117
|
-
1. Full scan of target files complete
|
|
118
|
-
2. Inline baseline criteria applied (no external file dependency)
|
|
119
|
-
3. Detection results organized and output per file
|
|
120
|
-
4. No automatic edits — user confirmation gate maintained
|
|
121
|
-
```
|
|
122
|
-
|
|
123
|
-
## Connections
|
|
124
|
-
|
|
125
|
-
| Situation | Connection |
|
|
126
|
-
|---|---|
|
|
127
|
-
| harvest-loop HIGH P10 series triggered | auto-suggest `/self-marketing-lint` |
|
|
128
|
-
| During cold audit | `/steel-quench` + self-marketing-lint in parallel |
|
|
129
|
-
| Before external distribution | `fh-meta:hub-persona-auditor` followed by self-marketing-lint |
|
|
28
|
+
Triggers that previously reached this skill (`check FH files`, `remove marketing
|
|
29
|
+
language`, `description diet`) now route to harness-doctor Step 3-L. This stub remains
|
|
30
|
+
only so old references resolve.
|