@chrono-meta/fh-gate 1.4.19 → 1.4.21

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/CATALOG.md CHANGED
@@ -8,19 +8,36 @@ AI reads this file first when searching past work. Open individual files for det
8
8
 
9
9
  <!-- Add entries in reverse date order (newest at top) -->
10
10
 
11
+ ### 2026-06-13 | forge-harness | #judge-robustness, #mechanical-anchor, #hardening-batch-2, #sycophancy-gate, #verification
12
+ **File:** plugins/fh-meta/skills/{verify-bidirectional,steel-quench,asset-placement-gate}/SKILL.md (commit f80bc99)
13
+ Batch-2 of the judge-robustness hardening (after #1-#2 in be2d5dc): three more judge-only verdict paths bound to anchors. #3 verify-bidirectional evidence gate — a persistent-baseline overwrite needs a supporting cited source (read, not existence) or a grep contradiction, else ESCALATE+block; closes the bare-pushback sycophancy vector without restoring AI stubbornness. #4 steel-quench Wave-P3 PASS-framing redaction (mktemp glyph+verdict-phrase strip — the challenger caught a naive bare-PASS global corrupting "status==PASS", an S fixed pre-commit). #5 asset-placement Step 0.5 mechanical pre-grep grounds criterion ④. challenger-verify round 2 was load-bearing again (FAIL→fixed: 1S+2A+4B); sonnet blind sim PASS (evidence gate ESCALATEs on bare overwrite).
14
+ - Decision: #6 sim-conductor (A) deferred — its cross-model hard-gate needs graceful degradation for CC-only environments; not rushed into the batch.
15
+ - Open: #6 sim-conductor staged; npm republish (1.4.21) bundling be2d5dc + f80bc99 pending operator.
16
+
17
+ ### 2026-06-13 | forge-harness | #judge-robustness, #mechanical-anchor, #verification-hardening, #self-audit, #scaled-dispatch
18
+ **File:** plugins/fh-meta/skills/phantom-quench/SKILL.md + templates/.git-hooks/pre-commit + CLAUDE.md/CHEATSHEET (commit be2d5dc)
19
+ Deep-research + a 6-agent parallel swarm audit turned FH's own adversarial method on FH: armed with arXiv 2507.08794 ("One Token to Fool LLM-as-a-Judge"), cold agents found 5/6 of FH's judged-check skills S-exploitable. Common root cause: every terminal verdict is judge-only with no mechanical checksum. Shipped the two highest-leverage fixes — (#1) the pre-commit gate marker now requires a non-vacuous, auditable axis2-evidence field, with the residual (provenance against a self-deceiving runner) documented honestly as the weekly-audit's+operator's, not pretended-closed by a false-security HMAC; (#2) phantom-quench GROUNDED is now gated on a typed mechanical anchor (proper-noun grep-in-asserting-slot / numeric value-after-normalization / branching decompose-or-declare-judged; universal rule: a hit counts only if the line expresses the claimed relation), closing out-of-context grounding. challenger-verify caught a real over-block regression (format-variant false-flag) before it shipped; sonnet blind sim PASS.
20
+ - Decision: bind judged verdicts to mechanical anchors where one exists (verifiability constraint); ship #1-#2 first, stage #3-#6 (verify-bidirectional evidence gate, steel-quench redaction, asset-placement grep, sim-conductor cross-model) — rushing all six would be the over-verification the same research warned against.
21
+ - Open: hardenings #3-#6 staged; scaled-dispatch meta-lesson = value is real-target coverage (~1 reviewer per asset), not agent count.
22
+
23
+ ### 2026-06-13 | forge-harness | #deep-research, #capability-ladder, #no-reinvention, #routing, #goal-quench-max
24
+ **File:** knowledge/shared/harness-core/deep_research_capability_ladder.md (+ CLAUDE.md initiative row, goal-quench/frontier-digest SKILL.md) (commit 55fa3da)
25
+ Deep-research as an FH default — lifts the /deep-research engine ladder that was locked inside frontier-digest into a general routing default. 3 rungs: built-in /deep-research if present → Claude WebSearch+WebFetch synthesis (always available, tier-sensitive) → frontier-digest for AI/harness trend-scan only. No-reinvention: FH routes to the best capability present, builds no research engine. Wired as a CLAUDE.md Autonomous Initiative Layer row (default invocation) + goal-quench max-mode capability-gap fill (flexes in when budget RED), with rung-2 research run in an isolated sub-agent to preserve max's context budget. 4-axis: challenger PASS no-S (4B applied incl. the isolation invariant) + sonnet blind sim PASS (correct rung, trend-scan boundary held).
26
+ - Decision: single-source the ladder (frontier-digest Step-0 becomes a consumer/rung-3, not a parallel definition); /deep-research stays conditional-detect everywhere (phantom-safe).
27
+
11
28
  ### 2026-06-13 | forge-harness | #sidecar-eol-proofing, #agy, #liveness-probe, #rubber-stamp-guard, #upstream-report
12
29
  **File:** plugins/fh-meta/skills/{steel-quench,sim-conductor}/SKILL_detail.md + templates/.git-hooks/pre-commit (commit 4693d00)
13
30
  Completion sweep of all currently-unblocked carries. FP3: agy joins the sidecar panel as T5 (argument form `agy -p` only — stdin pipe prints help, measured; 60s timebox+1 retry hard rule; trusted-artifacts caution since -p auto-approves tools) and gemini detection becomes a dispatch-form stdin liveness probe (EOL 2026-06-18 leaves the binary alive, backend dead — bare `command -v` goes silently stale). Ack hardening: `below-floor-ack:` now requires a verbatim-quoted operator utterance (unquoted reason = agent-self-writable = blocked; residuals — quote fabrication, out-of-context quoting — named as weekly-audit targets). Challenger round: 1S/4A fixed (cross-fence tb() self-containment, probe-form/dispatch-form mismatch, T1~T5, empty-team synthesis gate, comment overclaim); B1 curly-quote locale block refuted by live test. FP1 closed upstream: increment comment (auto-compact non-recovery + 76% non-repro control) posted to anthropics/claude-code#65359 with operator approval. Knowledge orphans resolved: 3 gitignored paper files de-orphaned (unique framework rescued to companion store, stale dupes deleted).
14
31
  - Decision: probe must exercise the same invocation form the dispatch uses (forms diverge on one binary — agy proved it); hook enforces ack form, weekly audit owns genuineness.
15
32
 
16
33
  ### 2026-06-11 | forge-harness | #readme-dedup, #commoditization-defense, #b1-boundary, #seed-vetting, #field-routing
17
- **File:** README.md (PR #93) + fh-be signals/handoffs (PR #24+) + field project v0.14 (private track)
18
- Cloud session #4 (Mode D, remote-doable batch): README stale duplicate "Measured, not asserted" block removed (pre-tier-floor copy contradicted default-Sonnet stance) + "Where this sits (2026)" positioning para (gate+loop are the asset — plumbing commoditizes). fh-be: B1 scope boundary (AW2 simple-vs-complex import), SC1–4 seed vetting (SC1 phantom arXiv fixed → 2509.19349; SC3 commoditization threat to bet ID), gstack positioning-triangle line. Field: private-track companion handoff verified+executed — DefectPatternMatcher+Bug Mode implemented UTF-8-clean in the field repo (25 tests PASS, 0 regressions); its OpenCode(sLLM) lane's downgrade triage hardened to 2-tier (keep/block) with free-tier 80/20 split — β/SC2 field case logged.
19
- - Decision: README action PARTIAL (merge done; About refresh + npm 1.4.14 laptop-bound); full-TC locked on qwen build per operator's CaseCraft limit measurement — structure-transform survives (P6/P6.5 deterministic), meaning-fill routes to frontier.
34
+ **File:** README.md (PR #93) + companion-store signals/handoffs (PR #24+) + field project v0.14 (private track)
35
+ Cloud session #4 (Mode D, remote-doable batch): README stale duplicate "Measured, not asserted" block removed (pre-tier-floor copy contradicted default-Sonnet stance) + "Where this sits (2026)" positioning para (gate+loop are the asset — plumbing commoditizes). Companion store: B1 scope boundary (AW2 simple-vs-complex import), SC1–4 seed vetting (SC1 phantom arXiv fixed → 2509.19349; SC3 commoditization threat to bet ID), gstack positioning-triangle line. Field: private-track companion handoff verified+executed — DefectPatternMatcher+Bug Mode implemented UTF-8-clean in the field repo (25 tests PASS, 0 regressions); its OpenCode(sLLM) lane's downgrade triage hardened to 2-tier (keep/block) with free-tier 80/20 split — β/SC2 field case logged.
36
+ - Decision: README action PARTIAL (merge done; About refresh + npm 1.4.14 laptop-bound); full-TC locked on local-LLM build per operator's CaseCraft limit measurement — structure-transform survives (P6/P6.5 deterministic), meaning-fill routes to frontier.
20
37
 
21
38
  ### 2026-06-11 | forge-harness | #mcp-gating, #external-mcp, #name-keyed-policy, #measured-origin, #field-template
22
39
  **File:** templates/.claude/rules/mcp_tool_gating.md (+ auto_project_mapping.md §6 row 4, CLAUDE.md mount-intent trigger)
23
- Cloud session (Mode D, ext): new field template — external-MCP tool gating with three tiers (ask / ask-meta-write / allow-untrusted-read), name-keyed because server-supplied annotations are unreliable (measured same-day: live messaging-class MCP shipped all-None hints incl. irreversible send + approval-resolution tools — fh-be `signal_2026-06-11_hermes-mcp-cloud-boot.md`). Opus challenger caught the name-spoofing hole (server controls names too → behavior-confirmation required for non-ask tiers, fixed inline); sonnet blind sim PASS on unfilled-§3 scenario (per-item ask on send, batch approval-grant refused).
40
+ Cloud session (Mode D, ext): new field template — external-MCP tool gating with three tiers (ask / ask-meta-write / allow-untrusted-read), name-keyed because server-supplied annotations are unreliable (measured same-day: live messaging-class MCP shipped all-None hints incl. irreversible send + approval-resolution tools — companion store `signal_2026-06-11_hermes-mcp-cloud-boot.md`). Opus challenger caught the name-spoofing hole (server controls names too → behavior-confirmation required for non-ask tiers, fixed inline); sonnet blind sim PASS on unfilled-§3 scenario (per-item ask on send, batch approval-grant refused).
24
41
  - Decision: prefer host-native per-tool permission config as enforcement; this template = what-to-gate + portable fallback. §6 install row is conditional (MCP present); the proactive mount-intent trigger is the load-bearing path.
25
42
 
26
43
  ### 2026-06-11 | forge-harness | #identity-marker, #door-skeleton, #target-tier-sim, #below-floor-consumer, #false-control-kill
@@ -67,7 +84,7 @@ Tier-floor resolution ships — the model dimension of the Sidecar Engine Resolu
67
84
  **File:** docs/OUTPUT_EVIDENCE.md (+ README.md §Model setup)
68
85
  Model-tier flattening measured and published: 30-point blind battery (rule-application + meta-dev fixtures, pre-registered rubric) on four Claude tiers — operation 100/100/97/94 (anchor/Opus 4.8/Sonnet 4.6/Haiku 4.5), tier separation only on above-rubric design increments (3/3·1/3·0.5/3·0/3). Public claim scoped honestly: single trial, self-graded, worked example not benchmark. README §Model setup gains the evidence note grounding the existing Opus recommendation.
69
86
  - Decision: operating FH ≈ model-flat (the harness is the score); developing FH is where tier matters — recommendation unchanged (opus for harness-editing/gates), now evidence-backed. **[superseded same-day by the tier-floor entry above: default flipped to sonnet + floored dispatch; opus pin now Mode-D-only]**
70
- - Open: real Qwen-class measurement on laptop (batteries are a portable fixture pack, fh-be record).
87
+ - Open: real local-LLM-class measurement on laptop (batteries are a portable fixture pack, companion-store record).
71
88
 
72
89
  ### 2026-06-10 | forge-harness | #fc, #consent-lane, #federated-compounding, #starved-center, #v3
73
90
  **File:** tracks/_contrib/README.md (+ .gitignore, templates/contrib_session.md, docs/CONTRIBUTING.md, README.md)
@@ -153,7 +170,7 @@ Axis 5 check-class taxonomy added: every verify check classified as mandatory-pa
153
170
  **File:** knowledge/shared/harness-core/multi_model_sidecar_strategy.md (+ hybrid_orchestration_architecture_roadmap.md)
154
171
  Added canonical §Sidecar Engine Resolution Protocol — Tier1 subscription-CLI → Tier2 API-key → Tier3 Claude-subagent guaranteed fallback. Principle: discovery automatic/free, invocation value-gated (intelligent default multi-AI, no hard-fail for Mode C). Wired pointers into goal-quench Step D / steel-quench runtime-adapter / harvest-loop Step 3.5-X; sim-conductor/pipeline-conductor/agent-composer inherit by reference. Source hybrid-orchestration design archived as proposed roadmap (versions→placeholders, Python pseudo-code→illustrative, non-shipped tagged Proposed). PR #80.
155
172
  - Decision: single-source resolution protocol — skills cite it instead of re-inventing "if available" probes.
156
- - Open: npm republish (machine-bound) — 3 npm-shipped SKILL.md changed; handed off to laptop via fh-be.
173
+ - Open: npm republish (machine-bound) — 3 npm-shipped SKILL.md changed; handed off to laptop via the companion store.
157
174
 
158
175
  ### 2026-06-09 | forge-harness | #onboarding, #greeting, #3-axis-scaffold, #returning-user
159
176
  **File:** knowledge/shared/harness-core/fh_detail_protocols.md
package/CHEATSHEET.md CHANGED
@@ -98,13 +98,14 @@ git config core.hooksPath templates/.git-hooks
98
98
  chmod +x templates/.git-hooks/pre-commit
99
99
  ```
100
100
 
101
- After running `/steel-quench` and `/phantom-quench` in your session, Claude creates the Axes 2+3 pass marker automatically. The marker must carry machine-readable floor fields — the hook validates them (a bare `touch` marker no longer passes; below-floor passes block unless an explicit `below-floor-ack:` line records operator acceptance, **quoting the operator's approval utterance verbatim** — an unquoted reason is rejected as agent-self-writable). If Claude doesn't create it (e.g., session interrupted), create it manually:
101
+ After running `/steel-quench` and `/phantom-quench` in your session, Claude creates the Axes 2+3 pass marker automatically. The marker must carry machine-readable floor fields — the hook validates them (a bare `touch` marker no longer passes; below-floor passes block unless an explicit `below-floor-ack:` line records operator acceptance, **quoting the operator's approval utterance verbatim** — an unquoted reason is rejected as agent-self-writable). It also requires an **`axis2-evidence:`** line recording what the pass actually found (a finding count or verdict token — `PASS no-S` / `1S/4A fixed` / `clean — 0 findings`); a vacuous "it ran" line is rejected. *Honest scope: this enforces the marker is non-vacuous + auditable, not that the pass truly ran — a fabricated attestation is the weekly audit's + operator's residual, by design (judge-robustness swarm, 2026-06-13).* If Claude doesn't create it (e.g., session interrupted), create it manually:
102
102
 
103
103
  ```bash
104
104
  cat > "tracks/_meta/.axes_23_passed_$(git rev-parse --abbrev-ref HEAD | tr '/' '_')_$(date +%Y-%m-%d).marker" <<'EOF'
105
105
  axis2-engine: quench-challenger
106
106
  axis2-model: opus
107
107
  floor-status: at-floor
108
+ axis2-evidence: PASS no-S
108
109
  <scope / findings prose>
109
110
  EOF
110
111
  ```
package/CLAUDE.md CHANGED
@@ -157,8 +157,11 @@ No user request is needed — this is a mandatory autonomous step, not a proposa
157
157
  FH asset modified → Axis 1 (regression_guard.sh --pr {BRANCH})
158
158
  → Axis 2 (/steel-quench) → Axis 3 (/phantom-quench)
159
159
  → marker: tracks/_meta/.axes_23_passed_{branch}_{date}.marker
160
- (structured — required fields: axis2-engine / axis2-model / floor-status;
161
- hook validates mechanically: below-floor blocks without below-floor-ack)
160
+ (structured — required fields: axis2-engine / axis2-model / floor-status / axis2-evidence;
161
+ hook validates mechanically: below-floor blocks without below-floor-ack, and
162
+ axis2-evidence must be non-vacuous — a recorded verdict/count, not "it ran". Honest
163
+ scope: form + non-vacuity + auditability, NOT provenance — a fabricated marker is the
164
+ weekly audit's + operator's residual by design, judge-robustness swarm 2026-06-13)
162
165
  → Axis 4 (/edit-manifest RECORD, today's entry in edit_manifest.yaml)
163
166
  → All 4 PASS → git commit allowed | Any FAIL → fix inline, re-run
164
167
  ```
@@ -308,6 +311,7 @@ Proposal format: `"I see [X]. Want me to run /[skill] to [one-line description]?
308
311
  | "keep watching X", "poll this", "check every N minutes", recurring WATCH item | built-in `/loop` (interval runner) — pair with the WATCH list, don't hand-poll |
309
312
  | "are these in sync", "synergy", "can these integrate", "any overlap" | `/cross-ecosystem-synergy-detection` |
310
313
  | "latest trends", "frontier", "external resources" | `/frontier-digest` |
314
+ | "research this deeply", "survey the literature", "comprehensive analysis", "deep research", "look this up thoroughly", "조사해줘", "리서치" (general topic research, not trend-scan) | **Deep-Research Capability Ladder** (`knowledge/shared/harness-core/deep_research_capability_ladder.md`) — route to the highest available rung: built-in `/deep-research` if present → else Claude `WebSearch`+`WebFetch` synthesis (tier-sensitive) → `/frontier-digest` only if it's AI/harness trend-scan. No-reinvention: FH routes, does not build a research engine. |
311
315
  | "orchestrate agents", "parallel dispatch", "combine skills", "multiple agents" | `/agent-composer` |
312
316
  | "run a simulation", "external user perspective", "internal audit", "quality check" | `/sim-conductor` |
313
317
  | "first install", "FH setup", "wizard", "install-wizard" | `/install-wizard` |
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "@chrono-meta/fh-gate",
3
- "version": "1.4.19",
3
+ "version": "1.4.21",
4
4
  "description": "FH runtime adapters — run FH governance, skills, and agents via Claude or Codex with machine-parseable gates.",
5
5
  "license": "MIT",
6
6
  "keywords": [
@@ -37,7 +37,8 @@ When unsure where to place a new asset or skill:
37
37
 
38
38
  1. Request full file path from user (or accept natural language description)
39
39
  2. Load asset content via `Read` (if path provided)
40
- 3. Evaluate Step 1 4-criteria in order (LLM makes the judgment directly)
40
+ 2.5. Step 0.5 mechanical overlap pre-scan (grounds criterion ④ before the judged pass)
41
+ 3. Evaluate Step 1 4-criteria in order (LLM makes the judgment, ④ gated on the Step 0.5 scan)
41
42
  4. ① + ④ both pass + at least one of ②③ passes → output **"FH suitable"**
42
43
  Otherwise, proceed to Step 2 local assessment → if fails, output **"Project-local agent or no asset needed"**
43
44
 
@@ -64,7 +65,32 @@ Immediately after trigger, acquire asset content in the following order.
64
65
  > **Which asset should I evaluate?**
65
66
  > Enter a file path (e.g., `.claude/agents/jira-create.md`) or a description.
66
67
 
67
- After acquiring the asset content, **Claude directly** applies Step 1 4-criteria (no external calls).
68
+ After acquiring the asset content, run Step 0.5 (mechanical overlap pre-scan) **before** the judged Step 1.
69
+
70
+ ## Step 0.5. Mechanical Overlap Pre-Scan (grounds criterion ④)
71
+
72
+ Criterion ④ ("no overlap with existing FH skills") is otherwise an LLM **recall** judgment with no
73
+ ground truth — a duplicate skill with a novel name passes because the judge has no enumerated list to
74
+ check against (judge-robustness swarm, 2026-06-13). Ground it mechanically first:
75
+
76
+ ```bash
77
+ # enumerate existing skill names + descriptions (grounds the judged comparison)
78
+ grep -riE "name:|description:" plugins/fh-meta/skills/*/SKILL.md plugins/fh-commons/skills/*/SKILL.md
79
+ # hard-collision check: WHOLE proposed name or a WHOLE trigger phrase reused verbatim.
80
+ # grep -wF (whole-word, fixed-string) on the full strings — NOT -E on tokens (a shared common
81
+ # word like "review" is not a collision). Exclude the asset's own file (self-match = false hit).
82
+ SELF="plugins/fh-meta/skills/<proposed name>/SKILL.md"
83
+ grep -rwF -e "<proposed full name>" -e "<full trigger phrase 1>" -e "<full trigger phrase 2>" \
84
+ plugins/*/skills/*/SKILL.md | grep -v "$SELF" | grep -c .
85
+ ```
86
+
87
+ Surface **collision count + nearest existing skill(s)**. Criterion ④ then passes only if **0 whole-name/
88
+ whole-trigger collision AND the judged ≤90%-overlap check agrees**. A verbatim whole-name or
89
+ whole-trigger reuse is a hard ④ fail regardless of the LLM judgment. **Honest scope**: the grep grounds
90
+ *literal* name/trigger reuse only — a post-cutoff duplicate with a *paraphrased* trigger is invisible to
91
+ both the judge (cutoff) and the grep (literal); that residual leans on the judged half **fed the
92
+ enumerated descriptions above** (grounded comparison, not pure memory), not on full mechanization. A
93
+ shared common word is a judged-review flag, **not** a hard fail.
68
94
 
69
95
  ## Done When
70
96
 
@@ -83,7 +109,7 @@ All steps 0–3 completed
83
109
  | ① | Cross-project value | Is this asset equally useful in other projects without depending on a specific project? |
84
110
  | ② | Orchestration / judgment layer | Is it just a list of MCP/Bash calls, or a judgment layer that synthesizes multiple signals? |
85
111
  | ③ | Not replaceable by built-ins | Can this be equally achieved with direct MCP calls or basic bash? (If yes, fails this criterion) |
86
- | ④ | No overlap with existing FH skills | Does it not overlap 90%+ with existing FH skills? |
112
+ | ④ | No overlap with existing FH skills | Step 0.5 mechanical scan = 0 name/trigger collision **AND** judged ≤90% overlap. Non-zero collision hard fail. |
87
113
 
88
114
  **FH suitable** → ① + ④ both pass + at least one of ②③ passes.
89
115
  **Fail** → ① or ④ fails → immediate fail. Or both ②③ fail → proceed to Step 2.
@@ -27,6 +27,12 @@ model: sonnet
27
27
 
28
28
  ## Step 0. API Environment Detection
29
29
 
30
+ This is the trend-scan specialization (rung 3) of the **Deep-Research Capability Ladder**
31
+ (`../../../../knowledge/shared/harness-core/deep_research_capability_ladder.md`). The ladder owns
32
+ the **cross-rung routing** (when a task is trend-scan at all vs general research); the `Priority:`
33
+ block below is frontier-digest's own **internal engine resolution** for the HN/arxiv case (API-key
34
+ vs WebSearch) — a narrower detection, not a re-definition of the cross-rung ladder.
35
+
30
36
  ```
31
37
  Priority:
32
38
  0. /deep-research built-in available (check live session skill list)
@@ -43,6 +43,8 @@ goal-quench is a ladder, not a fixed shape. The default (**core**) is the narrow
43
43
 
44
44
  Each mode is a **superset** of the one before it — pro does everything core does, plus more. Nothing in core is removed by escalating.
45
45
 
46
+ **Max-mode deep-research routing**: capability-gap fill recognizes a **research-heavy goal** (the goal needs to survey/gather/reconcile external sources before building — e.g. "implement X" where X needs domain grounding) and routes it through the **Deep-Research Capability Ladder** (`knowledge/shared/harness-core/deep_research_capability_ladder.md`): take the highest available rung (built-in `/deep-research` → Claude `WebSearch`+`WebFetch` synthesis → `frontier-digest` only for trend-scan). `plugin-recommender` is proposed **only if no rung is available** (rung 2 always is, for a Claude session) — so this is routing, not a new install by default. **Isolation invariant**: rung-2 research (WebSearch/WebFetch) runs in an **isolated sub-agent that returns only the synthesis** — fetched source content must not load into the orchestrator context, preserving the context-isolation/budget property max mode depends on (see the Token-honesty guard above). Honesty caveat carries from the ladder: research quality is bounded by source access + session model tier, not by invoking it.
47
+
46
48
  **Selection**:
47
49
  - Explicit flag: `/goal-quench --core` (default) · `--pro` · `--max`
48
50
  - Auto: Phase 1's budget verdict proposes the mode (see Phase 1 Step 2). The user can always override **down** to core.
@@ -123,14 +123,44 @@ Extract claims from the artifact that require source back-tracing. Claim types:
123
123
 
124
124
  Back-trace each claim to the declared source files using Read + Grep directly — no inference judgment. Partial match is not treated as match.
125
125
 
126
+ **Mechanical anchor (GROUNDED is gated on a literal grep hit *in the right slot*, not a bare
127
+ judgment)** — this closes *out-of-context grounding*: citing a real, readable file for a false claim,
128
+ where the judge biases GROUNDED merely because the source *exists* and contains domain-adjacent text
129
+ (measured 2026-06-13, judge-robustness swarm — the cheapest S-exploit of this skill). The anchor is
130
+ **typed** — applying one byte-literal rule to every claim type both under-blocks (a token that hits an
131
+ irrelevant line) and over-blocks (a correct value formatted differently); each type gets the right check:
132
+
133
+ - **Proper noun / exact identifier** (skill name, file path, API name, flag): run
134
+ `grep -n "<exact token>" <cited file>` — must return a non-empty line, surfaced **literally**
135
+ (`file:line: matched text`), and that line must be where the identifier is *used as the claim
136
+ asserts* (a bare occurrence elsewhere is **not** grounding — e.g. claim "X is the default model" needs
137
+ the line that sets X as default, not any line mentioning X). No qualifying hit → Phantom.
138
+ - **Numerical / range value**: grep the value, but **normalize format/unit before judging** —
139
+ `300s` ≡ `300 seconds`, `≥5` ≡ `>= 5` (ASCII/Unicode), `5 minutes` ≡ `300 seconds`. The *value* must
140
+ sit in the slot the claim asserts. A correct value in a different format = Grounded (note the
141
+ normalization in evidence); a **different** value in that slot = Partial (re-confirm) or Phantom.
142
+ Never auto-Phantom a format variant.
143
+ - **Branching / multi-clause condition**: no single greppable token. Either **decompose to atomic
144
+ sub-conditions** and grep each identifier, or — if it stays a compound judgment — keep it
145
+ **judged-class with the adversarial pairing declared** (do not fake a single-token grep). State which.
146
+
147
+ **Universal rule**: a grep hit counts only if the surfaced line *expresses the claimed relation*.
148
+ "The token appears somewhere in a real file" is precisely the out-of-context trap, not evidence.
149
+
150
+ A claim that is **cited to a specific source but the source cannot be Read** (path doesn't resolve,
151
+ line beyond EOF) is **S-grade Phantom**, *not* the softer Source-Missing 🔴 — a citation that does not
152
+ resolve means the citation was invented. Source-Missing 🔴 is reserved for *undeclared* sources only.
153
+ Declared "sources" that are not resolvable file paths (e.g. `source: "the codebase"`) count as **0
154
+ sources** for the Step 0.5 blocker.
155
+
126
156
  Back-tracing classification:
127
157
 
128
158
  | Classification | Criteria | Marking |
129
159
  |---|---|:---:|
130
- | **Grounded** | Claim directly confirmed in source | ✅ |
131
- | **Partial** | Similar content in source but not exact match — needs re-confirmation | ⚠️ |
132
- | **Phantom** | Cannot be found in source | ❌ |
133
- | **Source-Missing** | Source itself cannot be Read or was not declared | 🔴 |
160
+ | **Grounded** | Typed anchor passes: identifier grep-hits *in the asserting slot* / value matches after normalization / branching sub-conditions trace (line surfaced) | ✅ |
161
+ | **Partial** | A *different* value sits in the claimed slot, or a compound condition partially traces — needs re-confirmation | ⚠️ |
162
+ | **Phantom** | Exact token not found in source, **or** cited to a named source that cannot be Read | ❌ |
163
+ | **Source-Missing** | Source was **not declared** (undeclared only a failed *cited* source is Phantom) | 🔴 |
134
164
 
135
165
  > **Detail**: See `SKILL_detail.md §Step2-Detail` — back-tracing execution procedure, classification decision rules, and Step 2 output format template — read when handling edge cases or formatting results.
136
166
 
@@ -232,7 +262,7 @@ This skill can be used independently without the full meta-harness structure.
232
262
 
233
263
  ```
234
264
  Step 1 claim extraction complete
235
- + Step 2 all claims back-traced (using Read toolno inference judgment)
265
+ + Step 2 all claims back-traced (Read + Grep highest-priority GROUNDED requires a typed literal grep hit in the asserting slot, not inference)
236
266
  + Step 3 Phantom severity classification + prescription output
237
267
  + Step 4 process pattern diagnosis complete (skip if 0 Phantoms)
238
268
  + "phantom-quench Complete" declaration output
@@ -244,7 +274,7 @@ Verdict: PASS (0 Phantom claims) | CONDITIONAL_PASS (LOW-severity Phantoms only,
244
274
 
245
275
  ## Operating Notes
246
276
 
247
- - **Never back-trace by inference**: Judging "this value is probably in the source" treats it as Partial not Phantom. Always directly confirm with Read + Grep.
277
+ - **Never back-trace by inference**: Judging "this value is probably in the source" treats it as Partial not Phantom. Always directly confirm with Read + Grep. **GROUNDED on a highest-priority claim is gated on a literal grep hit of the exact token (Step 2 mechanical anchor) — "the file exists and looks right" is the out-of-context-grounding trap, not evidence.**
248
278
  - **Partial is not Grounded**: Processing similar-value-in-source as Grounded misses the reconstruction modification pattern.
249
279
  - **Source not declared itself is S-grade**: If source is not declared when making an artifact, no claim can subsequently be verified. Recommend mandating source declaration in the process design stage.
250
280
  - **Recommended to use with steel-quench**: steel-quench quenches structural flaws, phantom-quench ensures source consistency. The two skills are orthogonal and artifact quality assurance is strengthened when used together.
@@ -78,7 +78,7 @@ Read target artifact(s) → classify on 5 dimensions → output recommendation
78
78
  | `artifact_type` | SKILL.md / design-doc → Area B + D-skill↑ · README / CHEATSHEET → Area A↑ · code / config → Area D-code↑ |
79
79
  | `audience` | external installer / first-time user → beginner↑ · internal team only → challenger↑ |
80
80
  | `claim_density` | 3+ stated benefits or superlatives → challenger↑ |
81
- | `risk_level` | external publish / marketplace listing → steel-quench prerequisite triggered |
81
+ | `risk_level` | external publish / marketplace listing → steel-quench prerequisite triggered. **Mechanical floor (not judge-only)**: any of — publish/marketplace target · public-surface or visibility change · auth/secret-handling or executable code · an FH asset under the 4-axis gate — **forces `risk_level ≥ medium`** regardless of profiler judgment. The floor closes the "fool the profiler into `low` to skip Step 0.6" seam; the judge may only raise above the floor, never below it. |
82
82
  | `novelty` | first-of-its-kind / no prior session evidence → phantom-quench recommended |
83
83
 
84
84
  ```
@@ -149,6 +149,48 @@ Concern format: `"One thing to check before [Area X]: [concern]. Proceed?"`
149
149
 
150
150
  ---
151
151
 
152
+ ## Step 0.6 — Cross-Model Coverage Gate (risk≥medium — hard)
153
+
154
+ Closes the homogeneous-blind-spot + formatting-flattery vector (judge-robustness swarm, 2026-06-13):
155
+ a panel of same-session Claude sub-agents shares one model's blind spots, so a clean verdict can be
156
+ flattery the **whole panel** is blind to — and with no quotable-rule violation, no persona escalates.
157
+ For `risk_level ≥ medium` targets (from Step 0.3), at least one persona MUST come from a source
158
+ **outside the orchestrator's own session context**. This **promotes the former advisory "dual
159
+ validation principle"** (detail §AreaB-Baseline #4) to a hard gate — the mechanical-anchor pattern of
160
+ hardening #1–#5: a judged verdict binds to a fact the judge cannot fake.
161
+
162
+ **Graceful-degradation ladder** — take the highest available rung. The gate **never breaks
163
+ sim-conductor**; at the bottom it only withdraws the *unsafe autonomy* (self-certifying a blind verdict),
164
+ not the run:
165
+
166
+ | Rung | Source | `cross_model_coverage` | Closes | When |
167
+ |---|---|:---:|---|---|
168
+ | 1 | External CLI team (Multi-Team Mode — §MultiTeam) | `external` | model-level blind spot (genuine cross-model) | 1+ external CLI live + probe non-empty |
169
+ | 2 | Cross-session Claude — `claude -p` headless, or an Agent with **zero inherited context** | `cross-session` | **session-contamination only** — a fresh Claude shares the same weights/RLHF gradient, so it does **not** close the model-level blind spot; it only removes the orchestrator's working-memory bias. Honest partial mitigation, labeled as such | no external CLI; dispatch probe returns non-empty |
170
+ | 3 | Same-session sub-agents only | `NONE` | nothing — homogeneous panel | neither rung's probe succeeded |
171
+
172
+ **Mechanical anchor** — `cross_model_coverage` is valid **only if backed by a quoted dispatch artifact**,
173
+ not a self-assessment (the self-signing hole hardening #1 closed for the marker — same fix here). To
174
+ record `external` or `cross-session`, the Step 3 report must **quote a non-empty excerpt of the actual
175
+ dispatch output** (external CLI stdout, or the dispatched Agent's returned verdict text); a label with no
176
+ quoted excerpt is invalid and falls to `NONE`. **Liveness, not mere availability**: probe the rung before
177
+ claiming it — attempt the dispatch with a timeout; if it errors or returns empty (plan-gate closed,
178
+ context saturated, CLI present-but-dead), record `NONE`, never assume the rung succeeded. This is the
179
+ same honest scope as #1: the artifact makes the claim **auditable**, not cryptographically unforgeable —
180
+ a fabricated excerpt remains the operator's + weekly-audit's residual by design. On rung 3 (`NONE`) for a
181
+ risk≥medium target:
182
+ - the report flags `⚠️ cross-model coverage: NONE — homogeneous same-session panel; verdict provisional`, **and**
183
+ - **Step 4 auto-commit privilege is withdrawn** (see Step 4): M-tier fixes may be *prepared* but the
184
+ commit waits for the operator's explicit go. Auto-committing a structurally self-blind verdict is
185
+ exactly what the exploit targeted — so that single privilege is what degrades, not the simulation.
186
+
187
+ `risk_level = low` targets are exempt (a homogeneous panel is acceptable); the gate fires only at
188
+ medium+. The rung-2 fallback is what makes this CC-only-safe: a Claude-only environment still gets a
189
+ real cross-context read (a fresh isolated dispatch shares no working memory with the orchestrator), so
190
+ `NONE` is reached only when *both* external CLIs and a second Claude context are unavailable.
191
+
192
+ ---
193
+
152
194
  ## Step 1 — Area-Specific Simulation
153
195
 
154
196
  ### Area A — External User Perspective
@@ -315,6 +357,29 @@ positives erode reviewer trust.
315
357
  - No forced consensus or forced conflict — report Common opinions (2+ personas agree) and Conflicts
316
358
  (position A vs B, each with rationale) as-is. Feeds Step 2 M/S/R triage (M ← Critical or 2+ personas).
317
359
 
360
+ **Zero-coverage map (mandatory synthesizer output)** — the synthesizer must emit, mechanically, the set
361
+ of standpoints that produced **no** finding, not only the ones that did (judge-robustness swarm,
362
+ 2026-06-13). Enumerate the persona **standpoints** in play — those the Step 0.3 profile recommended,
363
+ **plus the standpoints its dimensions imply** (risk_level=high → a security/publish standpoint ·
364
+ audience=mixed → a non-native-reader standpoint · claim_density=high → a claim-evidence standpoint).
365
+ List the *standpoints*, not the bare dimension names (a `risk_level (low) → ZERO` row is noise that
366
+ trains operators to ignore ⚠️). Mark each `covered` (≥1 persona addressed it) or `ZERO` (no persona
367
+ touched it):
368
+
369
+ ```
370
+ Coverage map:
371
+ beginner (onboarding friction) → covered (A-1: 2 findings)
372
+ challenger (claim-evidence) → covered (A-3: 1 finding)
373
+ security surface (risk_level=high) → ZERO ⚠️
374
+ non-native reader (audience=mixed) → ZERO ⚠️
375
+ ```
376
+
377
+ A clean report with `ZERO` standpoints is **not** a pass — it is an uncovered surface, reported as such.
378
+ This converts the formatting-flattery failure (everything reads fine → nothing escalates) into a visible
379
+ gap: **silence on a standpoint is reported as `ZERO`, never inferred as approval.** It is a checklist
380
+ derived from the profile and the dispatch outputs — a mechanical anchor, not a judgment. Carry the map
381
+ verbatim into the Step 3 report.
382
+
318
383
  The two severity vocabularies are layered, not redundant: a persona running **in isolation** assigns only
319
384
  its own Critical/Important/Suggestion — it cannot assign M/S/R, since `S = found by 3+ personas` depends on
320
385
  cross-persona agreement the isolated persona never sees. The synthesizer is the only context that can triage
@@ -365,6 +430,13 @@ File: `$REPORT_DIR/sim_YYYY_MM_DD_area_[X].md`
365
430
  1+ M-tier → fix immediately → commit. PR creation requires explicit user request.
366
431
  0 M-tier → commit report only + report S/R backlog.
367
432
 
433
+ **Cross-model gate on auto-commit (risk≥medium)** — when Step 0.6 recorded `cross_model_coverage: NONE`
434
+ on a risk≥medium target, the auto-commit privilege is **withdrawn**: prepare the M-tier fixes and write
435
+ the report, but do **not** self-commit — surface *"cross-model coverage NONE on a risk≥medium target;
436
+ the verdict is from a homogeneous same-session panel. Commit the fixes, or add a cross-context read
437
+ first?"* and wait for the operator's go. `external`/`cross-session` coverage, or risk_level=low, commits
438
+ as normal. (This withdraws one privilege, not the run — the report and fixes still exist.)
439
+
368
440
  > **Detail**: See `SKILL_detail.md §PR-Bash` — branch creation bash, commit + push, gh pr create template — read when creating a PR.
369
441
 
370
442
  ---
@@ -390,8 +462,10 @@ Convergence within an AI-AI loop is **provisional**. Elevated to final only afte
390
462
  | 1+ M-tier → fixed + committed (or "none") | ✅ Prescription complete |
391
463
  | Report `tracks/_meta/sim_YYYY_MM_DD_*.md` saved | ✅ Persistence complete |
392
464
  | 0 M-tier → report committed + S/R backlog reported | ✅ Health check complete |
465
+ | risk≥medium → `cross_model_coverage` recorded in report (external/cross-session/NONE) | ✅ Coverage gate ran *(check class: measured — the recorded value reflects the dispatch path that ran, not a self-grade; pair: NONE withdraws auto-commit per Step 4)* |
466
+ | Synthesizer emitted the zero-coverage map (every profile standpoint marked covered/ZERO) | ✅ Blind-spot surface reported *(check class: mechanical — a checklist over the profile, not a judgment)* |
393
467
 
394
- Verdicts: PASS · CONDITIONAL_PASS (S/R only, or Area B cadence skip) · FAIL (M-tier unresolved) · ESCALATE (persona conflict requiring human judgment).
468
+ Verdicts: PASS · CONDITIONAL_PASS (S/R only, or Area B cadence skip) · FAIL (M-tier unresolved) · ESCALATE (persona conflict requiring human judgment, **or** `cross_model_coverage: NONE` on risk≥medium → auto-commit withdrawn pending operator go).
395
469
 
396
470
  **Mandatory for Area A (external publish)**: steel-quench must complete in same session before Area A is marked complete.
397
471
 
@@ -214,7 +214,7 @@ Structural methods to reduce self-reference risk in Area B:
214
214
  1. **Regular adversarial attacks**: Area B once/month + `challenger` attack once/quarter. Route challenger → defense results directly into SKILL.md via steel-quench handoff after Area B ends.
215
215
  2. **Direct external user validation**: Non-owner attempts install + invocation → collect reactions. (cascade β validated: first autonomous external run confirmed.)
216
216
  3. **steel-quench integration**: After Area B ends, hand off challenger findings to `/steel-quench` for deeper adversarial review + SKILL.md inscription.
217
- 4. **Dual validation principle**: Internal validation (Area B) alone is insufficient — minimized only when combined with external install reaction collection or cross-model validation.
217
+ 4. **Dual validation principle**: Internal validation (Area B) alone is insufficient — minimized only when combined with external install reaction collection or cross-model validation. **For risk≥medium targets this is no longer advisory: it is the hard Cross-Model Coverage Gate** (SKILL.md Step 0.6) — at least one persona from outside the orchestrator's session context (external CLI → cross-session Claude → else `cross_model_coverage: NONE` withdraws auto-commit). Promoted from advisory to gate by the judge-robustness swarm (2026-06-13): a homogeneous same-session panel shares blind spots, so its clean verdict cannot self-certify.
218
218
 
219
219
  **Dispatch template for Area B parallel**:
220
220
  ```
@@ -291,6 +291,7 @@ type: simulation-report
291
291
  date: YYYY-MM-DD
292
292
  areas: [A|B|C|D|E|all]
293
293
  target_profile: [artifact_type | audience | risk_level]
294
+ cross_model_coverage: [external | cross-session | NONE | n/a-low-risk] # Step 0.6 — recorded from dispatch path
294
295
  m_count: N
295
296
  s_count: N
296
297
  r_count: N
@@ -299,6 +300,10 @@ r_count: N
299
300
  ## Target Profile
300
301
  artifact_type: [type] · audience: [internal|external|mixed] · risk_level: [low|medium|high]
301
302
 
303
+ ## Coverage map (Step 1.5 — every profile standpoint marked covered/ZERO)
304
+ [verbatim zero-coverage map; ⚠️ on each ZERO]
305
+ cross_model_coverage: [external | cross-session | NONE] # NONE on risk≥medium → auto-commit withdrawn (Step 4)
306
+
302
307
  ## M-tier ([N] items)
303
308
  | # | Issue | Location | Prescription |
304
309
  ...
@@ -164,8 +164,30 @@ No gate-PASS in scope → skip Wave-P3 entirely.
164
164
  > dimensions the gate's own pass criteria structurally could not check. Only when all three Attack Failed can
165
165
  > a **"Real PASS"** be declared.
166
166
 
167
+ **PASS-framing redaction (mandatory pre-step)** — the artifact reaches Wave-P3 *carrying its own
168
+ PASS declaration* (a `✅`, a "Verified" header, the gate marker), and a re-judge that reads that
169
+ framing is biased toward "Attack Failed" — the exact bias Wave-P3 exists to defeat (judge-robustness
170
+ swarm, 2026-06-13). Before feeding the artifact to the dimensions/challenger, **strip the pass-framing
171
+ at the bash layer, not the AI layer**:
172
+
173
+ ```bash
174
+ # Strip framing glyphs + canonical FH verdict PHRASES only — never the bare word PASS
175
+ # (a global s/PASS//g corrupts substance: "status==PASS" → "status==", manufacturing findings).
176
+ P3R=$(mktemp)
177
+ sed -E -e 's/[✅🟩✓]//g' \
178
+ -e 's/(ALL AXES PASSED|FH_GATE_VERDICT:[[:space:]]*PASS|Real PASS|VERIFIED|[Gg]ate[^.]{0,24}declared PASS)//g' \
179
+ "{ARTIFACT}" > "$P3R"
180
+ # feed "$P3R" to Wave-P3; clean up after: rm -f "$P3R"
181
+ ```
182
+
183
+ Feed `$P3R` to Wave-P3, then `rm -f "$P3R"`. The redaction is mechanical, so it cannot itself be
184
+ placated. **Honest scope**: it strips framing glyphs and *canonical FH verdict phrases* — bare-word or
185
+ lowercase prose self-congratulation ("this passed review", "looks green") is out of scope for `sed`
186
+ and is covered by a prose backstop instead: the Wave-P3 persona is instructed to **disregard any
187
+ residual self-pass language in the artifact** and attack as if no verdict were stated.
188
+
167
189
  **Agent utilization**:
168
- - `fh-commons:quench-challenger` (optional) — adds 6-axis structural attack to each dimension. If absent, run the 3 dimensions directly.
190
+ - `fh-commons:quench-challenger` (optional) — adds 6-axis structural attack to each dimension, fed the **redacted** artifact. If absent, run the 3 dimensions directly on the redacted copy.
169
191
  - `fh-meta:persona-innovator` (after convergence) — error/gap patterns found during Wave-P3 → auto-propose new Cross-Project Pattern rows or skill-candidate signals.
170
192
 
171
193
  The three dimensions generalize the gate's three blind spots:
@@ -10,6 +10,7 @@ complexity_routing:
10
10
  escalate_when:
11
11
  - full_revalidation
12
12
  - high_stakes
13
+ - fail_verdict # AI recommendation was wrong → baseline overwrite is high-stakes, never stay at sonnet
13
14
  ---
14
15
 
15
16
  # verify-bidirectional — Bidirectional Self-Validation Automation
@@ -50,6 +51,28 @@ Treat user's statement as **external refinement material**. **Do NOT attempt to
50
51
 
51
52
  Core proposition: "refinement challenge ≠ fundamental negation". Priority is identifying where the initial recommendation is weakened.
52
53
 
54
+ **Evidence gate (overwrite ≠ soften)** — closes the sycophancy/steering vector where a bare assertion
55
+ ("that's wrong, re-examine") flips a baseline with zero evidence (judge-robustness swarm, 2026-06-13).
56
+ "Do NOT defend" still holds for *this conversation's proposition* (anti-stubbornness is the point), but a
57
+ **persistent-baseline overwrite** (a rule, asset, memory, or `knowledge/` claim — anything that outlives
58
+ this session) requires a **supporting basis**, not mere pushback:
59
+
60
+ - **(a)** the user cited a file / line / commit / URL / past decision **and the cited content actually
61
+ supports the challenge** — verified by *reading it*, not by its mere existence (an irrelevant-but-real
62
+ citation is the out-of-context-grounding trap, the same vector phantom-quench guards), **or**
63
+ - **(b)** the Step 2 grep returns an actual contradiction that *supports the challenge* (not just any
64
+ conflict with the original) — surfaced literally.
65
+
66
+ If a baseline overwrite is implied but **neither holds**, do not silently rewrite: verdict is **ESCALATE**
67
+ — surface *"this would overwrite baseline {X} with no cited evidence; confirm override, or provide a
68
+ source?"* and block the Step 4 cascade until the operator answers. Softening a local in-conversation
69
+ proposition (no persistent asset changed) proceeds as before — the gate fires only on persistent-baseline
70
+ writes. **Sequencing**: this gate is written in Step 1, but Step 4 is what enumerates affected persistent
71
+ assets — so if Step 4 later identifies *any* persistent asset to write, **re-apply this gate before Step
72
+ 4.5** even if the original challenge first looked like a mere soften. This is **not** restored AI defensiveness: the AI still does not argue the user is wrong; it only
73
+ declines to *fabricate a baseline change* the evidence does not support (mirror of the steel-quench/phantom
74
+ mechanical-anchor principle — judged verdicts bind to evidence).
75
+
53
76
  ### Step 2. Consistency Area Grep (3-step mandatory)
54
77
 
55
78
  Grep to find which rules, assets, or propositions conflict with the initial recommendation: