@chrono-meta/fh-gate 1.4.27 → 1.4.29

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "@chrono-meta/fh-gate",
3
- "version": "1.4.27",
3
+ "version": "1.4.29",
4
4
  "description": "FH runtime adapters — run FH governance, skills, and agents via Claude or Codex with machine-parseable gates.",
5
5
  "license": "MIT",
6
6
  "keywords": [
@@ -85,7 +85,7 @@ Execution method · agent selection · recording location → infer and proceed
85
85
  | "Do X" — target and completion criteria are clear | Clear | Immediately enter Step 0-b |
86
86
  | "Improve X" — improvement dimension unclear | Direction unclear | Ask for clarification |
87
87
  | "Something feels off" — diagnosis target unclear | Execution unclear → infer | Auto-dispatch harness-doctor |
88
- | "Run the pipeline" — trigger is clear | Clear | Run harvest-loop based on recent work |
88
+ | "Wrap up / harvest this session" — trigger is clear | Clear | Run harvest-loop based on recent work ("run the pipeline" itself is ceded to pipeline-conductor — route end-to-end sweeps there, not here) |
89
89
  | No request — session context exists | Execution unclear → infer | Auto-compose based on CATALOG/MEMORY, then confirm |
90
90
 
91
91
  > **Detail**: See `SKILL_detail.md §Clarification` — 2-question format, 4-question structured confirmation, meta-prompting intervention — read when direction ambiguity requires a clarification block.
@@ -106,12 +106,42 @@ literal span — never on a model's agreement.
106
106
 
107
107
  | Citation form | Resolved URL |
108
108
  |---|---|
109
- | `arXiv:NNNN.NNNNN` / `arXiv:NNNN.NNNNNvK` | `https://arxiv.org/abs/NNNN.NNNNN` |
109
+ | `arXiv:NNNN.NNNNN` / `arXiv:NNNN.NNNNNvK` | `https://arxiv.org/abs/NNNN.NNNNN` (abstract — the **canonical identifier** surface). For a section-/body-cited claim, prefer full text **first**, falling back in order `https://arxiv.org/html/NNNN.NNNNN` → `https://arxiv.org/pdf/NNNN.NNNNN` → `/abs/` (HTML is a *partial* backfill — see *Surface selection* below) |
110
110
  | bare DOI `10.xxxx/...` | `https://doi.org/10.xxxx/...` |
111
111
  | `http(s)://...` | use as-is (HTTPS-upgrade handled by WebFetch) |
112
112
  | Named source, no URL ("paper X shows Y") | **one** `WebSearch` to locate the canonical URL → then WebFetch it. Do **not** verify against the search snippet alone — the snippet is not the source. |
113
113
  | version token `pkg x.y.z` | the package registry/release page (npm/PyPI/GitHub releases) for that exact version |
114
114
 
115
+ **Surface selection — the abstract is the wrong surface for a §-cited mechanism.** The `/abs/` page
116
+ carries only the abstract; a claim that cites a section or attributes a *named body-level mechanism* to
117
+ the paper (e.g. "§4.2 stale-but-confident", "the paper's Change Manifest") returns NONE / MENTIONS on
118
+ `/abs/` even when fully supported in the body — an **artificial Unsupported** (measured 2026-06-17
119
+ in-the-wild dogfood: a `/abs/`-only fetch falsely flagged a body-supported claim; it resolved only after
120
+ escalating to full text). Two mechanical, judgment-free rules cover this:
121
+
122
+ 1. **Mechanical first-surface trigger** — if the claim text carries a section/figure marker (`§N`,
123
+ "section N", "Table N", "Figure N") → fetch full text **first**. Otherwise fetch `/abs/` first. (Do
124
+ *not* pre-decide "is this a body-level mechanism?" — that is a salience-dependent judgment; the
125
+ `§`-marker is the only mechanical signal, and the NONE-escalation below catches the rest.)
126
+ 2. **Abstract-NONE escalates** (the self-correcting backstop) — *any* `/abs/` fetch that returns
127
+ NONE/MENTIONS escalates once to full text before an Unsupported verdict (see Decision rules). A true
128
+ headline claim already returned ASSERTS on `/abs/`, so it never escalates; only a NONE does, and the
129
+ escalation is cheap insurance. This subsumes rule 1's misses — a body claim that *didn't* carry a `§`
130
+ marker still gets full text via its abstract-NONE.
131
+
132
+ **HTML is a partial backfill — fall back, never Phantom.** arXiv HTML exists only for LaTeX-source
133
+ papers (roughly post-2023) and some conversions fail, so `/html/{id}` can 404 for a perfectly real
134
+ paper. A 404 on `/html/` (or `/pdf/`) is **surface-absence, not identifier-absence**: fall back
135
+ `/html/` → `/pdf/` → `/abs/`. Only a 404 on **`/abs/`** (the canonical identifier surface) is Phantom.
136
+
137
+ **Anchor the full-text fetch.** Full text is large; pass the cited section number / mechanism term *into*
138
+ the WebFetch prompt (`"…the span in §4.2 that states…"`) so the span search is anchored, not whole-paper
139
+ — this makes the `§`-citation the retrieval anchor, not merely a routing flag, and guards against a long
140
+ paper's relevant span falling outside the fetch model's attention (a full-text false-NONE).
141
+
142
+ Reserve `/abs/` for claims the abstract itself states (headline numbers, the paper's stated contribution
143
+ — e.g. a top-line stat like "39–77% factual support" is abstract-level and `/abs/` is correct).
144
+
115
145
  **WebFetch prompt template** (ask for a *span*, not a *verdict* — this keeps the anchor non-model):
116
146
 
117
147
  ```
@@ -126,6 +156,12 @@ Polarity is load-bearing: a page saying "X does NOT hold" or "earlier work claim
126
156
  contains a span lexically matching the claim. Asking for the label keeps the *retriever* mechanical while
127
157
  surfacing the stance the *judge* needs — a NEGATES/MENTIONS span is **not** grounding.
128
158
 
159
+ **Coinage at fetch time:** when the claim's key term is a known FH coinage (a name FH invented, not the
160
+ paper's — e.g. "Change Manifest", "Validation Gate"), pass the *mechanism description* — not the coinage
161
+ string — into the `<exact claim>` slot, so the retriever searches for the **relation**, not a label that
162
+ isn't on the page. The Coinage-vs-source-term decision rule (below) then only adjudicates the returned
163
+ span; it never has to overturn a NONE the prompt itself caused.
164
+
129
165
  **Span-evidence format** (a Grounded external verdict is invalid without this):
130
166
 
131
167
  ```
@@ -152,7 +188,16 @@ Grounded: N / Unsupported: N / Unreachable: N / Phantom: N
152
188
  - Span labelled **NEGATES or MENTIONS** → **Unsupported 🟠** (a negating or merely-mentioning span is not
153
189
  grounding — the false-Grounded trap).
154
190
  - Fetch succeeded, content readable, span = NONE or off-claim → **Unsupported 🟠** (cited-but-not-verified).
155
- - Identifier does not resolve at all (404 on a specific arXiv id, dead DOI, fabricated) **Phantom ❌**.
191
+ **Surface-escalation first abstract-NONE is the mechanical trigger (no judgment):** if the only
192
+ surface fetched was the abstract (`/abs/`), re-fetch full text (fall back `/html/{id}` → `/pdf/{id}`)
193
+ before classifying — *any* abstract-NONE escalates once, you do **not** first decide whether the claim
194
+ "looks body-level" (that judgment is the salience leak this rule removes). A body-supported claim must
195
+ not be failed on an abstract-only NONE. This is the second-surface escalation, **extended from
196
+ Unreachable to Unsupported-on-abstract** (2026-06-17 dogfood finding #2). Only after a full-text surface
197
+ *also* returns NONE/MENTIONS is the verdict Unsupported. (Single-pass: escalate once, then classify — no loop.)
198
+ - Identifier does not resolve on the **canonical** surface (`/abs/` 404, dead DOI, fabricated id) →
199
+ **Phantom ❌**. A 404 on `/html/` or `/pdf/` while `/abs/` resolves is **surface-absence, not
200
+ identifier-absence** — fall back per *Surface selection*, never Phantom.
156
201
  - Identifier plausibly real but fetch blocked in this environment (paywall/403/timeout/cross-host) →
157
202
  **Unreachable ⏳** — provisional; note the limit, route to a second surface or the human gate; never auto-Phantom.
158
203
  - `WebFetch`/`WebSearch` **absent/disabled in the environment** (not one source — the capability) → Step 2-E
@@ -160,6 +205,40 @@ Grounded: N / Unsupported: N / Unreachable: N / Phantom: N
160
205
  (do not mark every claim Unreachable + CONDITIONAL_PASS — that falsely implies grounding was attempted).
161
206
  - **Never** upgrade NONE to Grounded because a second model "thinks it's probably right" (agreement bias).
162
207
 
208
+ **Coinage-vs-source-term (the external analogue of Step 2's format normalization — but *judged*, not
209
+ mechanical).** FH sometimes attributes a *coined* name to an external source ("the paper's Change
210
+ Manifest", "its Validation Gate") when the paper uses different words for the same mechanism. A name
211
+ mismatch alone is not automatically Unsupported — but the path to rescuing it is **deliberately narrow**,
212
+ because this is one rationalization away from the agreement-bias trap above. The failure direction must
213
+ stay conservative (over-flag, never false-Grounded):
214
+
215
+ - **Possessive/attributive phrasing is presumed a terminology claim.** "the paper's X", "its X",
216
+ "X, per [paper]" → literal absence of the coinage on the source = **Unsupported 🟠**, *unless the FH
217
+ artifact separately states the underlying relation in non-coinage words* (so the mechanism is
218
+ independently legible without the label). A runner may **not** reclassify a bare "the paper's X" as a
219
+ mechanism claim on its own reading — the mechanism claim must already be spelled out **in the artifact**
220
+ before the span-judged path opens. (Closes the laundering move: relabeling a terminology claim as a
221
+ mechanism claim to save it.)
222
+ - **When the mechanism IS spelled out, ground on the relation, not the string — mechanical de-label
223
+ test:** blank out *both* labels (FH's coinage and the source's term); the span must still literally
224
+ state the operative relation the claim asserts — the same inputs→outputs / the same conditional / the
225
+ same action. If, with both names blanked, the span no longer states the claim, the equivalence was
226
+ reader-supplied → **Unsupported 🟠**. (2026-06-17 dogfood: SkillOpt's "validation score" /
227
+ "rejected-edit buffer" survives the de-label test for FH's "Validation Gate" claim — the span states
228
+ the accept-only-on-improvement relation in its own words.)
229
+ - **No self-granted Grounded on a contested match.** If the de-label test is contested rather than clean,
230
+ the runner may **not** write Grounded — the *only* path to Grounded for a disputed term-match is
231
+ **escalate to `/steel-quench` Wave 1** and let the isolated adversary surface the span or reject it.
232
+ Default otherwise = Unsupported. (Removes the runner's discretion to self-select the lenient branch.)
233
+ - **Recommended conservative resolution (publish-facing).** For a publish-facing citation the safer fix
234
+ is to require the artifact to **quote the paper's own term** ("what SkillOpt calls a 'validation
235
+ score'") instead of asserting FH's coinage as the paper's — removing the judgment entirely. Reserve the
236
+ judged de-label path for low-stakes internal claims; match effort to stakes.
237
+
238
+ Unlike Step 2's value-format normalization (mechanical: `300s` ≡ `300 seconds`), coinage-vs-term is
239
+ **judged** — Grounded only via a clean de-label span or a passed steel-quench escalation, never a
240
+ reader's semantic equivalence assertion.
241
+
163
242
  ---
164
243
 
165
244
  ## §Step3-Detail
@@ -13,117 +13,18 @@ deprecated_date: 2026-06-02
13
13
  successor: harness-doctor
14
14
  ---
15
15
 
16
- # self-marketing-lint — FH Self-Marketing Language Detection + Replacement Suggestions
16
+ # self-marketing-lint — DEPRECATED
17
17
 
18
- > Detection criteria are defined inline in this skill (see "Inline Baseline" below).
19
- > The skill is self-contained and does not depend on external personal-memory documents.
18
+ This skill was absorbed into **harness-doctor** on 2026-06-02. Its self-marketing /
19
+ cushion-word / version-brag detection patterns now live in **harness-doctor Step 3-L
20
+ (`--lint` mode)**, which carries the canonical baseline.
20
21
 
21
- ## Triggers
22
-
23
- - `/self-marketing-lint`
24
- - "Check FH files", "remove marketing language", "description diet"
25
- - "Catch self-marketing language", "run the lint"
26
- - When surfaced as a HIGH item in harvest-loop (P10 series)
27
-
28
- ## Detection Patterns (Inline Baseline)
29
-
30
- The following criteria are self-contained inside this skill. No external personal-memory file is required.
31
-
32
- ### Self-Marketing Anti-Patterns
33
-
34
- 1. **Self-declarations of quality**: "industry-leading", "world-class", "production-ready", "the world's first" without external evidence
35
- 2. **Round-counters and version brags**: "after 26 iterations", "v0.5 maturity", "22nd naming round", "prototype 9th iteration" — irrelevant to function
36
- 3. **Cushion language**: "perfectly", "easily", "seamlessly", "instantly", "without burden" — content-free intensifiers
37
- 4. **Owner self-reference loops**: "validated by our team", "proven internally" — circular evidence with no external check
38
- 5. **Marketing word stacking**: 3+ adjectives in one sentence
39
- 6. **Spec-only artifacts**: skills/agents missing a Done When section — declaration without observable output
40
-
41
- | Pattern type | Examples (detection targets) | Replacement direction |
42
- |---|---|---|
43
- | Version labels | "v0.3", "latest version", "(added 2026-05-24)" | Remove (git history manages versioning) |
44
- | Emphasis words | "fully automated", "instantly", "automatically", "full coverage" | Replace with functional description |
45
- | Iteration counts | "22nd naming", "26th round", "prototype 9th iteration" | Remove or move to external document |
46
- | Self-declarations | "This skill is the world's first", "Core FH skill" | Delete |
47
- | Cushion language | "easily", "conveniently", "without burden" | Replace with behavior-based description |
48
- | Spec-only (no outputs) | Skills missing Done When | Require adding an outputs section |
49
-
50
- ## Execution Steps
51
-
52
- ### Step 1. Scan Target Files
53
-
54
- ```bash
55
- # full list of skill files
56
- find {FH_ROOT}/plugins -name "SKILL.md" | sort
57
-
58
- # full list of agent files
59
- find {FH_ROOT}/.claude/agents -name "*.md" | sort
60
-
61
- # docs/ public documents
62
- find {FH_ROOT}/docs -name "*.md" | sort
63
- ```
64
-
65
- Scope: specified files only if target is given / full scan if unspecified.
66
-
67
- ### Step 2. Confirm Inline Baseline
68
-
69
- Re-read the "Detection Patterns (Inline Baseline)" section above. No external file load is required — the skill is self-contained.
70
-
71
- ### Step 3. Pattern Detection
72
-
73
- Grep detection patterns in each file's description field + body.
74
-
75
- ```bash
76
- # version / iteration count patterns
77
- grep -n "v[0-9]\+\.[0-9]\|[0-9]\+th\|[0-9]\+th iteration\|prototype [0-9]\+" {file}
78
-
79
- # emphasis word patterns
80
- grep -n "fully\|instantly\|full coverage\|automatically\|conveniently\|easily\|without burden" {file}
81
-
82
- # self-declaration patterns
83
- grep -n "core skill\|world's first\|best\|most\|top" {file}
84
- ```
85
-
86
- ### Step 4. Output Replacement Suggestions
87
-
88
- Organize and output detection results per file.
89
-
90
- ```
91
- File: {path}
92
- L{line}: "{original text}"
93
- → Suggested: "{replacement text}" (reason: {pattern type})
94
- → Or: recommend removal (reason: {pattern type})
95
- ```
96
-
97
- **No automatic edits**: Output suggestions only. Actual edits proceed after user confirmation.
98
-
99
- ### Step 5. Summary Report
22
+ **Use instead:**
100
23
 
101
24
  ```
102
- ## self-marketing-lint results
103
-
104
- Target: {N} files
105
- Detected: {N} items
106
-
107
- | File | Detection count | Most common pattern |
108
- |---|---|---|
109
- ...
110
-
111
- Recommendation: prioritize files with highest detection counts for review
25
+ /harness-doctor --lint
112
26
  ```
113
27
 
114
- ## Done When
115
-
116
- ```
117
- 1. Full scan of target files complete
118
- 2. Inline baseline criteria applied (no external file dependency)
119
- 3. Detection results organized and output per file
120
- 4. No automatic edits — user confirmation gate maintained
121
- ```
122
-
123
- ## Connections
124
-
125
- | Situation | Connection |
126
- |---|---|
127
- | harvest-loop HIGH P10 series triggered | auto-suggest `/self-marketing-lint` |
128
- | During cold audit | `/steel-quench` + self-marketing-lint in parallel |
129
- | Before external distribution | `fh-meta:hub-persona-auditor` followed by self-marketing-lint |
28
+ Triggers that previously reached this skill (`check FH files`, `remove marketing
29
+ language`, `description diet`) now route to harness-doctor Step 3-L. This stub remains
30
+ only so old references resolve.