npm - @chrono-meta/fh-gate - Versions diffs - 1.4.7 → 1.4.9 - Mend

@chrono-meta/fh-gate 1.4.7 → 1.4.9

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (14) hide show

package/CATALOG.md +150 -0
package/CHEATSHEET.md +1 -0
package/CLAUDE.md +58 -8
package/README.md +43 -12
package/docs/CONTRIBUTING.md +16 -0
package/package.json +5 -2
package/plugins/fh-commons/agents/quench-challenger.md +8 -1
package/plugins/fh-meta/skills/frontier-digest/SKILL.md +38 -1
package/plugins/fh-meta/skills/goal-quench/SKILL.md +4 -4
package/plugins/fh-meta/skills/harness-doctor/SKILL.md +18 -0
package/plugins/fh-meta/skills/plugin-recommender/SKILL.md +4 -1
package/plugins/fh-meta/skills/prompt-regression/SKILL.md +11 -8
package/plugins/fh-meta/skills/steel-quench/SKILL.md +33 -1
package/scripts/selfcheck.sh +72 -0

package/CATALOG.md CHANGED Viewed

@@ -8,6 +8,120 @@ AI reads this file first when searching past work. Open individual files for det
 <!-- Add entries in reverse date order (newest at top) -->
+### 2026-06-10 | forge-harness | #destructive-op-gate, #irreversibility, #silent-loss, #branch-cleanup-incident
+**File:** CLAUDE.md §Destructive-Op Gate (+ templates/predelete_check.sh, scripts/selfcheck.sh)
+Third irreversibility gate (sibling of Pre-Publish): enumerate → recover → destroy, never destroy-then-check. predelete_check.sh classifies branches SAFE/CHECK/REVIEW (CHECK = 0 unique paths but commits off base — shared files may hold newer content, the silent-loss class); REVIEW blocks scripted deletes (exit 1); the recovery step is judged/depth-sensitive (strongest-tier floor semantics). Signal-table row fires on destructive intents proactively. Origin: same-day incident — a parallel session's card (weekly-audit done + #88) lived only on an unmerged 0-unique-path branch; pre-deletion enumeration recovered it. Dogfood replay: the script lands that exact branch in CHECK.
+- Decision: safety mechanized rather than tier-escalated — Sonnet-default stays valid because the gate carries the depth, with the judged step floor-routed for the residual.
+- Decision: scope split vs Pre-Publish kept explicit (publish = exposure irreversibility, destroy = silent-loss irreversibility).
+### 2026-06-10 | forge-harness | #mode-d-notice, #model-guidance, #self-dev-entry
+**File:** CLAUDE.md §Mode D Model Notice
+Conditional model-pin guidance at self-dev entry (operator request): fires once at the 4-axis gate's own activation trigger — session model opus+ = silent · below-opus = one-line pin recommendation with measured rationale · identity-withheld runtime = static fallback. Advisory only, never auto-switches (pin-is-not-a-cap); field-operation sessions never see it.
+- Decision: zero new triggers — reuses the gate activation condition; the notice and the gate share one detection point.
+### 2026-06-10 | forge-harness | #tier-floor-quench, #below-floor-live, #floor-governance, #dual-challenger
+**File:** knowledge/shared/harness-core/multi_model_sidecar_strategy.md §Floor governance (+ quench-challenger.md)
+Tier-floor design quenched same-day by dual challenger dispatch — F1 opus (floor met) + F2 sonnet (below-floor; floor=opus): first live firing of the below-floor flag, on the design that defines it. Both tiers found the same 2 S-grades: cross-provider "floor-equivalent" undecidable (vibes equivalence) and re-quench tags with no consumer (permanent silent degradation). §Floor governance ships the fixes: external tiers below-floor-by-default until measured equivalence entries (the backend×tier ladder is the evidence source); below-floor judged verdicts provisional until floor re-run or operator acceptance, weekly audit = standing consumer; floor: hard for depth-critical roles (floor outranks diversity); diversity_rationale tie-break; claims age; pin-is-not-a-cap resolution. Wave-T runs #11/#12 both τ-PASS (fix-traceable complexity only).
+- Decision: below-floor challenger empirically buys value (sonnet 6/6 real findings, 2 unique) but misses the deepest (opus-only hard-floor insight) — F2 doctrine and the floor both validated by the same experiment.
+- Open: first weekly-audit below-floor consumption + first measured equivalence entry (laptop ladder).
+### 2026-06-10 | forge-harness | #tier-floor, #model-resolution, #default-sonnet, #sidecar-protocol
+**File:** knowledge/shared/harness-core/multi_model_sidecar_strategy.md (+ quench-challenger.md, steel-quench SKILL.md, README.md, templates/CLAUDE.md)
+Tier-floor resolution ships — the model dimension of the Sidecar Engine Resolution Protocol: assets declare measured-or-justified tier floors (quench-challenger=opus, Wave-T/harness-doctor=sonnet measured, mechanical=none); environment resolves R1 native dispatch / R2 cross-provider / R3 below-floor-with-flag (never hard-fail). Public guidance flips to default `/model sonnet` + floored dispatch; human-set session defaults are inviolable (FH dispatches sub-agents, never switches the session model); pinning the strongest available model recommended only for harness-editing (Mode D).
+- Decision: floor-not-pin semantics; below-floor judged verdicts auto-tagged re-quench candidates (Degraded coverage rule extension); no specific top-anchor model or subscription window named publicly (anti-stale).
+- Decision: grounded in same-day measurements (operation tier-flat 100/100/97/94; depth differential on design increments only) — the guidance flip and the mechanism ship together, neither alone.
+- Open: first organic below-floor dispatch + first Sonnet-default external install feedback (verify_next em-2026-06-10v).
+### 2026-06-10 | forge-harness | #model-tier, #tier-flattening, #worked-example, #output-evidence
+**File:** docs/OUTPUT_EVIDENCE.md (+ README.md §Model setup)
+Model-tier flattening measured and published: 30-point blind battery (rule-application + meta-dev fixtures, pre-registered rubric) on four Claude tiers — operation 100/100/97/94 (anchor/Opus 4.8/Sonnet 4.6/Haiku 4.5), tier separation only on above-rubric design increments (3/3·1/3·0.5/3·0/3). Public claim scoped honestly: single trial, self-graded, worked example not benchmark. README §Model setup gains the evidence note grounding the existing Opus recommendation.
+- Decision: operating FH ≈ model-flat (the harness is the score); developing FH is where tier matters — recommendation unchanged (opus for harness-editing/gates), now evidence-backed. **[superseded same-day by the tier-floor entry above: default flipped to sonnet + floored dispatch; opus pin now Mode-D-only]**
+- Open: real Qwen-class measurement on laptop (batteries are a portable fixture pack, fh-be record).
+### 2026-06-10 | forge-harness | #fc, #consent-lane, #federated-compounding, #starved-center, #v3
+**File:** tracks/_contrib/README.md (+ .gitignore, templates/contrib_session.md, docs/CONTRIBUTING.md, README.md)
+FC F1 pour — the consent lane opens (operator override: "실제증명으로 찍어누른다" — real-proof-first supersedes the v2-submission latch). `tracks/_contrib/` becomes the only tracked subtree under tracks/: surgical 2-line un-ignore, placement = consent, PR-time lane gate (PSA + marketplace-gate C5 + ingest contradiction scan + reviewer). Lane charter + session template + CONTRIBUTING §Consent Lane + README row shipped — exactly the don't-overbuild list, no new skill/verifier.
+- Decision: operator lifted the v2 latch (2026-06-10) — paper timing no longer gates feature publicization; proof-by-shipping doctrine recorded in the lab ledger.
+- Decision: privacy default unchanged — the lane is consent-by-placement; the firewall (PSA/Pre-Publish/4-axis) is the lane's load-bearing precondition, not weakened by it.
+- Open: F2 begins on the first external lane PR (live gate run + friction signals); F3 verdict 2 quarters post-open (PAID vs WRITTEN-OFF per falsifiability criterion).
+### 2026-06-10 | forge-harness | #wave-t, #temper, #steel-quench, #forge-fourth-movement, #v3
+**File:** plugins/fh-meta/skills/steel-quench/SKILL.md (+ templates/temper_check.sh, docs/ETHOS.md, README.md)
+Wave-T (Temper) ships — the fourth forge movement, poured from lab validation (operator-approved): after Wave 3+ convergence, measure the complexity the quench itself added (T-1 temper_check.sh delta, fence-excluding · T-2 harness-doctor absolute tier · T-3 τ-verdict, judged-paired with the quench's own findings). Validation: 9 runs across 4 independent convergences (0 false flags, simplification un-punished) + synthetic positive control (sensitivity) + same-commit dogfood run #10 on the pour itself (τ-PASS). ETHOS/README "direction ahead" IOU converted to delivery.
+- Decision: organic τ-FAIL redefined from promotion-blocker to standing watch (operator-approved) — synthetic control covers sensitivity; first organic flag → human verdict → recorded.
+- Decision: Wave-T stays one script + one section reusing harness-doctor (don't-overbuild guard is part of the shipped spec).
+- Open: npm 1.4.8 ships Wave-T + temper_check.sh + selfcheck 21 (folded into the open laptop handoff).
+### 2026-06-10 | forge-harness | #selfcheck, #count-consistency, #drift-class-kill
+**File:** scripts/selfcheck.sh
+Count-consistency probes (mandatory-pass): plugin.json ×2 / marketplace.json / README header / local_fh_context stated counts vs actual dirs (active = non-deprecated). Motivated by 4 same-day drift instances — the class now fails npm test + prepublishOnly instead of waiting for a doctor run.
+- Decision: deprecation detected mechanically (frontmatter `deprecated: true` or DEPRECATED marker in head -20).
+### 2026-06-10 | forge-harness | #backlog-cleanup, #phantom-fix, #count-drift, #goal-quench
+**File:** plugins/fh-meta/skills/goal-quench/SKILL.md (+ templates/local_fh_context.md, plugin.json, README.md)
+Backlog cleanup (cloud session 2): goal-quench phantom vocab fixed — "steel-quench C3 config" → "cross-provider challenger" (anchored to steel-quench:83 vocabulary + §Sidecar Engine Resolution Protocol; token `steel-quench-crossprovider`, 4 spots; resolves `fh_signal_2026-06-10_fh-direct`). Skill/agent count drift fixed: local_fh_context 26→29 active (−2 deprecated, +5 missing), plugin.json agents 3→7, README agents 5→8 (skills 33 verified correct = 36 dirs − 3 deprecated). Resolves the 06-04 "skill-count drift" Open item.
+- Decision: counting policy is mechanical — active = non-deprecated skill dirs, agents = files in agents/; descriptive sidecar labels confirmed over C-numbering (per 06-10 check-class decision).
+- Verified-no-action: 05-26 cross-ref recommendations already implemented — arXiv 2605.18747 (README:282 + definition doc:39) and Sylph 2604.21003 (definition doc:71/:105) — L4 open NOT raised.
+- Open: npm 1.4.8 republish ships plugin.json/README/SKILL.md count+phantom fixes (folded into the open laptop handoff).
+### 2026-06-10 | forge-harness | #credit-economy, #operator-intake, #lightweight-triage, #frontier-digest
+**File:** plugins/fh-meta/skills/frontier-digest/SKILL.md (+ .claude/rules/sister_asset_protocol.md)
+Credit-economy engine run #2 formalization (operator-approved; this session = manual run #2: 7 walled sources → 8 audits → 15 same-day gate-passed imports, 2 human corrections): R1 — frontier-digest Step 0.5 operator-intake asks for walled-channel sources (YouTube/LinkedIn/X, machine-403) on cadence runs only, skippable, try wall-bypass (WebSearch + secondary) first. R2 — sister protocol lightweight path: dedup-hit/no-increment sources get a one-paragraph entry, full cross-audits reserved for A/B-tier.
+- Decision: the operator stays the wide-net sensor for walled channels by design (human = scarce oracle), the machine owns endurance (triage→gate cycles); intake cost discipline keeps the wide net wide.
+- Follow-up (same day): Step 0.5 video-harvest ladder — Tier 1 sidecar (codex / Gemini-route, agentic = approval-mode first) → Tier 3 Claude+yt-dlp transcript → operator floor; laptop verification trio handed off. First applied instance of the capability-probe principle.
+### 2026-06-10 | forge-harness | #comprehension-debt, #loop-engineering, #osmani, #harness-doctor
+**File:** plugins/fh-meta/skills/harness-doctor/SKILL.md
+Taxonomy 6→7: Comprehension Debt (runtime/operator-side, S) — loop output outpacing operator understanding, named from Osmani's Loop Engineering (canonical upstream of the supervisor-loops lineage; Cherny/Steinberger quotes — paper citation upgraded from the Korean video to this primary source, convergence count unchanged at n=5). Signals mechanical: merged change without CATALOG entry, manifest pending backlog, zero card delta. Countermeasures pre-existing (CATALOG summaries, predict-verify, card protocol).
+- Decision: taxonomy now covers both loop sides — agent behavior (rows 4–6) and operator comprehension (row 7); FH's HITL PR principle recorded as a deliberate L2 cap on asset-changing loops (Osmani L3 rejected for asset mutation).
+### 2026-06-10 | forge-harness | #failure-modes, #agentic-laziness, #self-preferential-bias, #goal-drift, #harness-doctor
+**File:** plugins/fh-meta/skills/harness-doctor/SKILL.md
+Harness-Defect Taxonomy 3→6 classes: runtime behavioral failure modes named from dynamic-workflows discourse (LinkedIn full-text triage) — Agentic Laziness (completion claims without per-item evidence, S), Self-Preferential Bias (judged check without adversarial pairing, M), Goal Drift (no pre-compaction completion log, S). FH already had the countermeasures (golden probes/coverage, judged-pairing rule, fh_completed + card-last guard) — the import is the *naming*, making diagnosis explicit.
+- Decision: signals kept mechanically checkable (list/pairing/log presence) — no judge-only signal in the taxonomy itself; sources not counted as new convergence (discourse extension, anti-inflation).
+### 2026-06-10 | forge-harness | #no-reinvention, #official-first, #claude-plugins-official, #full-harness-mode
+**File:** knowledge/shared/plugin-catalog/recommended_plugins.md (+ CLAUDE.md, auto_project_mapping.md)
+Operator-declared meta-harness 철칙 — no-reinvention / official-first: (a) plugin catalog gains Category 0.5, the claude-plugins-official 36-plugin inventory (12 LSP + workflow + authoring + setup groups; name-based, anti-stale "re-enumerate when recommending", role-split warnings for claude-md-management overlap); (b) New Skill gate Role-duplication criterion now also checks Tier 0 built-ins + official plugins — reinventing an official capability requires explicit justification; (c) Full-Harness Mode item 4: official-plugin scan for the mapped project's stack (recommend-only, never auto-install) — mapped-project acceleration with zero FH build.
+- Decision: FH builds only what adds governance on top of official capabilities; drafting tools (skill-creator etc.) never exempt their output from the FH gate.
+### 2026-06-10 | forge-harness | #tier-0, #builtins, #security-review, #deep-research, #role-split
+**File:** plugins/fh-meta/skills/plugin-recommender/SKILL.md (+ frontier-digest, CLAUDE.md, probes.md)
+CC built-ins utilization imports (operator-approved; video claims verified 9/13 real — agent initially ruled skill-creator nonexistent, operator-corrected: it is an official plugin in claude-plugins-official): Tier 0 = platform built-ins added to plugin-recommender (discovery order 0, "enumerate from live session" anti-stale, governance-add guard for FH-native precedence — goal-quench pattern named). /deep-research as frontier-digest Tier-0 engine. /security-review as Pre-Publish chain item 3 (code-security axis, skip-note path). Permission-Denial Option C, code-review role split, /loop WATCH row. G-TRIG-05 probe synced — anti-stale maintenance rule's first live use.
+- Decision: built-in beats plugin install at ~80% coverage; FH native beats built-in only when it adds governance.
+### 2026-06-10 | forge-harness | #ingest-gate, #contradiction-scan, #crossref-lint, #llm-wiki, #karpathy
+**File:** .claude/rules/sync_push_protocols.md (+ harness-doctor SKILL.md, probes.md)
+Karpathy LLM-Wiki sister-audit imports (operator-approved; convergence case n=5, citable primary source): I1 — contradiction scan as Sync step 3 (ingest gate, judged + verify-bidirectional pair): new knowledge grepped against existing claims before indexing, conflicts flagged in both files, old-claim removal is HITL. I2 — harness-doctor L4 knowledge cross-ref lint: no CATALOG entry = S-tier index orphan, no inbound ref = R-tier orphan page. Probes G-SYNC-01/G-LINT-01 added (30 total).
+- Decision: scale escape (W1) deliberately NOT built — watch-item with trigger (CATALOG hundreds of entries / repeated search misses); operator-preferred first remedy = skill-splitter-style CATALOG split-mapping, RAG hybrid only after that.
+- Open: npm republish (harness-doctor SKILL.md shipped) — folded into the open 1.4.8 handoff.
+### 2026-06-10 | forge-harness | #golden-probes, #offline-eval, #doc-code-coupling, #anthropic-4layer
+**File:** .claude/regression/probes.md (+ templates/.git-hooks/pre-commit, prompt-regression SKILL.md)
+Anthropic 4-layer sister-audit imports (operator-approved): I1 — 28-probe known-answer golden set with check classes, the standing offline eval prompt-regression auto-loads (its P-GATE-01 had already gone stale 5→6 the same day the gate grew — fixed, plus an explicit anti-stale maintenance rule). I2 — pre-commit doc-code coupling WARN (measured class, never blocks) when executables are staged without any doc asset.
+- Decision: probes.md canonical-when-present, SKILL.md default matrix = Mode C fallback (single-source preserved); coupling check warns rather than blocks — the decision is made conscious, doc-neutral script fixes stay friction-free.
+- Open: npm republish (prompt-regression SKILL.md is shipped) — folded into the open 1.4.8 handoff.
+### 2026-06-10 | forge-harness | #new-skill-gate, #check-class, #done-when, #g2
+**File:** CLAUDE.md
+G2 from the supervisor-loops audit: New Skill Creation Pre-Commit Gate extended 5→6 items — "Check-class declared": each Done When condition states its class (mandatory-pass / measured / judged per 6axis §Axis 5), and judged conditions must name their adversarial pairing (no judge-only path). The taxonomy now lives in the operating loop, not just the knowledge doc.
+- Decision: applies to new skills only; existing 28 backfill opportunistically when next edited — no retroactive block.
+### 2026-06-10 | forge-harness | #selfcheck, #mandatory-pass, #npm-test, #self-application
+**File:** scripts/selfcheck.sh (+ package.json)
+G1 fix from the supervisor-loops 100%+ audit: FH's own shipped code (bin/fh-*.js, scripts/fh-*.sh) had zero deterministic checks. New selfcheck.sh runs node --check + bash -n over the npm-shipped executable surface and the gate-chain bash infra (15 checks), wired as `npm test` and `prepublishOnly` — a publish can no longer ship a syntactically broken executable.
+- Decision: syntax-only scope (zero side effects, runs anywhere); blocking wired at publish, not commit — doc commits stay unaffected.
+- Open: npm republish to ship selfcheck.sh in the tarball (machine-bound, laptop).
+### 2026-06-10 | forge-harness | #verify-axis, #check-classes, #supervisor-loops, #sister-asset
+**File:** knowledge/shared/harness-core/harness_6axis_framework.md
+Axis 5 check-class taxonomy added: every verify check classified as mandatory-pass (deterministic, blocking) / measured (quantitative, tracked) / judged (LLM-judge + cited evidence + corrective action — self-judges grade leniently). Judged rule: a judge verdict never passes alone — paired adversarial re-verification + evidence itself phantom-checked. Import from supervisor-loops sister-audit (operator-approved; cross-audit + proposal in companion store; corrective-action clause added after full-transcript delta check).
+- Decision: descriptive labels over C1/C2/C3 numbering — avoids collision with unanchored "steel-quench C3 config" vocab (goal-quench:283, logged as fh_signal for separate fix).
 ### 2026-06-09 | forge-harness | #sidecar, #zero-config, #engine-resolution, #roadmap, #mode-c
 **File:** knowledge/shared/harness-core/multi_model_sidecar_strategy.md (+ hybrid_orchestration_architecture_roadmap.md)
 Added canonical §Sidecar Engine Resolution Protocol — Tier1 subscription-CLI → Tier2 API-key → Tier3 Claude-subagent guaranteed fallback. Principle: discovery automatic/free, invocation value-gated (intelligent default multi-AI, no hard-fail for Mode C). Wired pointers into goal-quench Step D / steel-quench runtime-adapter / harvest-loop Step 3.5-X; sim-conductor/pipeline-conductor/agent-composer inherit by reference. Source hybrid-orchestration design archived as proposed roadmap (versions→placeholders, Python pseudo-code→illustrative, non-shipped tagged Proposed). PR #80.
@@ -207,6 +321,42 @@ v1.2 release complete (PR #1–#5): harvest-loop Step 0, agent-composer worktree
 <!-- Time-independent reference documents -->
+### 2026-06-06 | pattern | parallax, multi-persona-review, synthesizer, standpoint-coverage
+**File:** `knowledge/shared/patterns/multi-persona-review.md`
+Generalized architecture for multi-persona parallel artifact review ("parallax") — parallel isolated personas + shared output protocol + neutral synthesizer. Domain-agnostic, IP-stripped; embodied as sim-conductor Step 1.5.
+### 2026-05-31 | reference | ecosystem-positioning, opencode, hermes, openhuman, readiness
+**File:** `knowledge/shared/harness-core/fh_ecosystem_positioning.md`
+FH's structural position in the AI agent framework ecosystem vs Hermes/OpenCode/OpenHuman — gap analysis, synergy map, layered readiness verdict from 3-model adversarial audit. Companion: `fh_synergy_playbook.md` (concrete workflow specs).
+### 2026-06-04 | schema | tpa, target-profile-analysis, routing, sim-conductor, steel-quench
+**File:** `knowledge/shared/harness-core/tpa_schema.md`
+Canonical Target Profile Analysis schema — all TPA-running skills (sim-conductor, steel-quench, phantom-quench, agent-composer) derive routing decisions from this schema. Single source for profile fields.
+### 2026-06-04 | draft | goal-quench, anthropic-issue, native-goal, token-budget
+**File:** `knowledge/shared/harness-core/goal_quench_anthropic_issue.md`
+Draft Anthropic GitHub issue — native /goal token budget + quality verification hook proposal. Held until arXiv number confirmation.
+### 2026-05-26 | measurement | skill-quality-rubric, maturity-score, verifiable-numbers
+**File:** `knowledge/shared/harness-core/skill_quality_rubric.md`
+Skill maturity score formula definition. Declaring verifiable/evolution numbers without this file violates the cold-audit "self-declaration = delete if no basis" rule.
+### 2026-06-04 | core-reference | compounding-loop, weekly-cycle, axis-6, automation
+**File:** `knowledge/shared/harness-core/hub_compounding_loop.md`
+Core reference doc (CLAUDE.md Consult-First table): weekly/monthly/quarterly feedback cycles + Axis-6 Compounding automation roadmap.
+### 2026-06-04 | core-reference | runtime-flow, session-chronology, subagent-delegation
+**File:** `knowledge/shared/dialogue/claude_code_runtime_flow.md`
+Core reference doc (CLAUDE.md Consult-First table): chronological flow of a Claude Code session (does) + sub-agent delegation flowchart.
+### 2026-06-04 | core-reference | dialogue-playbook, token-efficiency, rule-hierarchy
+**File:** `knowledge/shared/dialogue/ai_dialogue_playbook.md`
+Core reference doc (CLAUDE.md Consult-First table): session-start principles, token efficiency, rule hierarchy, amplifier/coach dual mode (should).
+### 2026-06-04 | glossary | terminology, meta-harness, launch-pad, transit-acceleration
+**File:** `knowledge/shared/GLOSSARY.md`
+Key term definitions — meta-harness, meta hub, launch pad effect, transit acceleration value, shared skill pool, operating modes. Entry point for FH-internal vocabulary; linked from CHEATSHEET.
 ### 2026-04-28 | template | maturity-roadmap, 3-phase-frame, frontier-tracking, simplification-gate
 **File:** `knowledge/shared/harness-core/hub_maturity_roadmap.md`
 Hub long-term evolution path frame. Phase I (entering maturity) → Phase II (frontier following (b)cadence) → Phase III (leading) 3-stage model + 5-criteria gate (audit automation·operations guide·external propagation·sub-agent judgment·self-diagnosis warning) + 6 indicators (seed repo·blog·citations·external adoption·self-evolving·industry original) + simplification gate (self-diagnosis + within 200 lines + unreferenced archive at each transition). General-purpose template derived from first verified operating instance.

package/CHEATSHEET.md CHANGED Viewed

@@ -3,6 +3,7 @@
 Frequently used commands and phrases.
 > **`<harness-root>`** = the path where you cloned this repo. Example: `~/projects/forge-harness`. Replace `<harness-root>` in the commands below with your actual path.
+> Unfamiliar with a term (meta-harness, launch pad effect, transit acceleration…)? See `knowledge/shared/GLOSSARY.md`.
 ---

package/CLAUDE.md CHANGED Viewed

@@ -87,7 +87,7 @@ The forge-harness hub has a dual identity: **(a) a seed for others** + **(b) you
 When blocked by auto-mode permission denial, **do not stop at the bare denial** — turn the block into a decision the user can act on in one step:
 1. **State what was blocked** and why
-2. **Option A — Approval mode**: show exact commands to run after switching; **Option B — Manual review**: list specific files/sections
+2. **Option A — Approval mode**: show exact commands to run after switching; **Option B — Manual review**: list specific files/sections; **Option C — Reduce future prompts**: propose built-in `/fewer-permission-prompts` when the same read-only call class keeps getting prompted
 3. **Ask which option** — one line, then wait
 **Sub-agent variant**: report (what was blocked + ready-to-apply content + exact unblock step) back to orchestrator — never silently fail. Switching modes lifts permission block, not FH gates — the 4-axis gate still applies.
@@ -99,27 +99,31 @@ Simplification guard: trivial denials with one obvious fix → state block + sin
 > **Full 4-step detail**: `knowledge/shared/harness-core/fh_detail_protocols.md`
 > **Read this file before Step 1 begins** — duplicate-install detection (Step 1-b) and registry scan (Step 1-c) are only defined there, not in this summary.
-**Triggers**: greetings (`hi`/`hello`/`hey`) · start intent (`resume`, `continue`, `where were we`) · new task (`new project`, `new task`) · discovery (`what is this`, `what can you do`, `first time here`)
+**Triggers**: greetings (`hi`/`hello`/`hey`/`안녕` — and the same word in any language; FH is English-based but language-agnostic, a bare greeting in any tongue fires this) · start intent (`resume`, `continue`, `where were we`) · new task (`new project`, `new task`) · discovery (`what is this`, `what can you do`, `first time here`)
 **4-step summary**: ① Auto-read CLAUDE.md + CATALOG + session card + registry scan → ② One-line proposal (new user / exploratory / returning branches) → ③ 5-skill cascade (plugin-recommender → synergy → .claudeignore → model → verify) → ④ Approval + setup
 **Identity marker**: every greeting response (Step ②) opens with 🐿️ on its own line. FH's session-start signal — see `fh_detail_protocols.md` Step 2 for full greeting templates.
 **Guards**: explicit task-entry utterance → skip onboarding · once per session · code/debug requests → start working directly · project routing is a suggestion, mention at most once
+**Metadata-is-not-intent guard**: the trigger is the user's **typed message only**. Session metadata — branch name (auto-derived from the first message, e.g. `claude/korean-greeting-*`), repo name, file paths — is **never** a task spec and never suppresses or redirects the greeting trigger. A bare greeting fires onboarding even when the branch name looks like a feature request; if the only "task" signal lives in metadata and not in what the user typed, treat the message as a greeting and run the 3-axis scaffold.
 ## New Skill Creation Pre-Commit Gate
-All 5 items below must pass before committing a new SKILL.md. If any fails, fix and re-commit.
+All 6 items below must pass before committing a new SKILL.md. If any fails, fix and re-commit.
 | Item | Criterion |
 |---|---|
-| **Role duplication check** | Pass `/asset-placement-gate` — no overlap with existing role clusters |
+| **Role duplication check** | Pass `/asset-placement-gate` — no overlap with existing role clusters, **platform built-ins (Tier 0), or `claude-plugins-official` (Tier 1 official)**. Reinventing an official capability requires explicit justification in the SKILL.md (no-reinvention rule — FH builds only what adds governance) |
 | **Description diet** | Plain text / 0 self-marketing expressions / 0 emphasis words (⭐, "critical", "groundbreaking") |
 | **Done When defined** | At least 1 explicit completion condition |
+| **Check-class declared** | Each Done When condition states its check class — mandatory-pass / measured / judged (`harness_6axis_framework.md` §Axis 5). Any judged condition names its adversarial pairing — no judge-only path |
 | **Natural language triggers** | At least 3 examples that work without internal vocabulary |
 | **Independently executable** | Confirmed to work without other FH skills (or dependencies are explicitly documented) |
 Skills without a Done When definition automatically qualify as harness-doctor L2 M-tier.
+Check-class declaration applies to **new** skills; existing skills backfill opportunistically
+(when next edited), not retroactively.
 ---
@@ -156,7 +160,22 @@ FH asset modified → Axis 1 (regression_guard.sh --pr {BRANCH})
 | Forward | `phantom-quench` | Phantom references, paths that don't exist, stale external links |
 | Record | `edit-manifest` RECORD | Logs predicted impact — closes the predict-verify loop for future harvest-loop |
----
+### Mode D Model Notice (fires once, at the same trigger as this gate)
+The moment FH self-development work begins (= the gate's own activation trigger: an FH asset is about
+to be modified), check the **session model** (self-identity; if the runtime withholds it, treat as
+unknown) and surface **one line** — then proceed, never block:
+- Model known and opus-tier or above → no notice (already optimal).
+- Model known and below opus-tier → *"이 작업은 FH 자체개발(Mode D)입니다 — 가용 최강 모델 핀을
+  권장합니다 (`/model opus` 이상; 측정 근거: README §Model setup). 그대로 진행해도 floored
+  디스패치가 깊이 턴을 커버하지만, 세션-레벨 설계 깊이는 핀이 좌우합니다."*
+- Model unknown (runtime withholds identity) → static fallback: *"FH 자체개발 작업입니다 — 세션
+  모델이 opus 이상이 아니라면 핀 전환을 권장합니다 (`/model opus`+)."*
+**Guards**: once per session · advisory only — **never switch the session model** (human override is
+inviolable; a pin is not a cap — tier-floor resolution §Floor governance) · field-project operation
+sessions (no FH asset modification) never see this notice — the Sonnet default stays friction-free.
 ## Pre-Publish Surface Gate (Irreversibility Gate — Publish, not Commit)
@@ -169,11 +188,14 @@ first time**, especially one **derived from internal/company assets** (operator-
 private harness): `gh repo create --public`, `gh repo edit --visibility public`, a first push to a new
 public remote, `npm publish`, `twine upload`, a private→public visibility flip.
-**Required before the public action** (both must be non-LEAK) — this gate is the **umbrella that invokes
-both**, not a competitor to them; when publish intent is detected, fire *this* gate (it then runs both),
+**Required before the public action** (all must be non-LEAK/non-FAIL) — this gate is the **umbrella that
+invokes them**, not a competitor; when publish intent is detected, fire *this* gate (it then runs the chain),
 not marketplace-gate alone:
 1. `/public-surface-audit` — operator-private token scan (real username, corp asset names, home paths)
 2. `/marketplace-gate` Check 5 — broad public safety (API keys, internal domains, license)
+3. `/security-review` (built-in, when the repo ships executable code) — code-security pass on the
+   publishable surface; complements 1–2 which scan tokens/metadata, not code behavior. Skip note
+   (`skipped: docs-only repo` or `skipped: built-in unavailable`) if not applicable
 > Routing vs the rows below: `/marketplace-gate` alone = "is this ready to **list on a marketplace**?";
 > `/public-surface-audit` alone = reactive "did I leak a token?"; **this gate** = the *act of going
@@ -193,6 +215,32 @@ checklist** (`templates/PRE-PUBLISH-CHECKLIST.md`) the operator runs on any repo
 ---
+## Destructive-Op Gate (Irreversibility Gate — Delete/Rewrite, not Commit)
+**Order invariant: enumerate → recover → destroy, never destroy-then-check.** Deletion and history
+rewrite are irreversible in the way publish is — except the loss is *silent* (nobody sees what a
+deleted branch was carrying).
+**When this gate fires** — *before* any of: branch deletion (local or remote), history rewrite /
+force-push, scrub of tracked history, bulk deletion of session records / tracks content.
+1. **Enumerate (measured)**: `bash templates/predelete_check.sh <repo> [base]` — per branch: commits
+   off base + unique paths. Verdicts: SAFE (fully merged) · CHECK (0 unique paths but commits off
+   base — shared files may hold *newer* content, e.g. an unmerged session card) · REVIEW (unique
+   paths — recovery mandatory).
+2. **Recover (judged — depth-sensitive)**: every CHECK/REVIEW item gets a content-direction look;
+   live un-integrated state (cards · handoffs · signals · session records) is integrated to main
+   **before** anything is deleted. This step exists because the loss class is silent — run it at the
+   strongest available tier (floor semantics, §Tier-floor); a below-floor pass is provisional.
+3. **Destroy** only what passed — REVIEW blocks a scripted delete chain (script exits 1).
+> Origin: 2026-06-10 branch cleanup — pre-deletion enumeration recovered a parallel session's card
+> (weekly-audit completion + #88 merge state) that existed **only on an unmerged branch** with zero
+> unique paths: exactly the CHECK class, invisible to "is it merged?" intuition. Deletion without the
+> gate destroys live state without anyone noticing.
+---
 ## Autonomous Initiative Layer — Context-Triggered Skill Proposals (Active Throughout Session)
 At any point during a session, when the following signals are detected, propose the relevant skill in one line.
@@ -205,7 +253,8 @@ Proposal format: `"I see [X]. Want me to run /[skill] to [one-line description]?
 | "wrap up this week", "review", "audit", "weekly", "retrospective" | `/harvest-loop` |
 | "pull this into FH", "reverse-harvest", "worth keeping", "harvest pattern", "field pattern" | `/field-harvest` |
 | "harness is complex", "too many skills", "check structure", "harness" | `/harness-doctor` |
-| "review this PR", "check diff", "code review" | `/hub-cc-pr-reviewer` |
+| "review this PR", "check diff", "code review" | code diff → built-in `/code-review`·`/review` · FH-asset coherence → `/hub-cc-pr-reviewer` (role split) |
+| "keep watching X", "poll this", "check every N minutes", recurring WATCH item | built-in `/loop` (interval runner) — pair with the WATCH list, don't hand-poll |
 | "are these in sync", "synergy", "can these integrate", "any overlap" | `/cross-ecosystem-synergy-detection` |
 | "latest trends", "frontier", "external resources" | `/frontier-digest` |
 | "orchestrate agents", "parallel dispatch", "combine skills", "multiple agents" | `/agent-composer` |
@@ -218,6 +267,7 @@ Proposal format: `"I see [X]. Want me to run /[skill] to [one-line description]?
 | "add to marketplace", "OK to publish", "pre-publish check" | `/marketplace-gate` |
 | "did I leak anything", "public surface audit", "private token scan", "is my split clean", "check tracked files for private tokens" | `/public-surface-audit` |
 | "publish", "make public", "make this repo public", "go public", "gh repo create --public", "flip to public", "first public push", "publish the package", "npm publish", "twine upload" (publish intent — **proactive**, fire *before* the action) | **Pre-Publish Surface Gate** (see above → `/public-surface-audit` + `/marketplace-gate` Check 5 must PASS first) |
+| "delete the branch", "브랜치 삭제", "브랜치 정리", "clean up branches", "force-push", "rewrite history", "지워도 돼?" (destructive intent — **proactive**, fire *before* the action) | **Destructive-Op Gate** (see above → enumerate → recover → destroy; `templates/predelete_check.sh`) |
 | "look at this again", "is this right", "counterargument", "re-validate" | `/verify-bidirectional` |
 | "MCP failing", "tool keeps erroring", "circuit-breaker", "same error looping" | `/mcp-circuit-breaker` |
 | "token budget", "how expensive", "estimate tokens", "will this cost a lot" | `/token-budget-gate` |

package/README.md CHANGED Viewed

@@ -166,15 +166,15 @@ hardened by attack, and only then does it ship faster, for having survived.
 |---|---|---|
 | **Forge** | shape the raw project into a harness — raise its floor | `install-wizard`, "harness-ify this project" |
 | **Quench** | harden it by attack — the cold pass leaves standing only what is sound | `steel-quench` · `phantom-quench` |
-| **Temper** | take the brittleness back out of the hardened asset | *the movement being forged next* |
+| **Temper** | take the brittleness back out of the hardened asset | `steel-quench` Wave-T · `templates/temper_check.sh` |
 | → **Accelerate** | a blade that survived the forge cuts faster | `goal-quench` — *Pass → Accelerate* |
-Three movements are shipped; **temper** is the direction ahead — and naming the movement we have *not*
-finished is the point (see [`ETHOS.md`](docs/ETHOS.md#the-forge)). Around the forge, two more signatures keep
-it running: `harvest-loop` (each session's lessons become permanent skills) and `agent-composer`
-(orchestrate the dispatch). The other skills wait until you need them — full list below.
+All four movements ship. Temper was named before it was built — deliberately (see
+[`ETHOS.md`](docs/ETHOS.md#the-forge)) — and shipped once measurement runs validated it. Around the forge,
+two more signatures keep it running: `harvest-loop` (each session's lessons become permanent skills) and
+`agent-composer` (orchestrate the dispatch). The other skills wait until you need them — full list below.
-## 33 skills · 5 agents
+## 33 skills · 8 agents
 <details>
 <summary>Full asset activation check</summary>
@@ -237,18 +237,48 @@ it running: `harvest-loop` (each session's lessons become permanent skills) and
 Claude Code does not auto-select models by task complexity — you configure this once.
 ```bash
-/model opus   # recommended for forge-harness — pin Opus so gate/verification turns never silently drop
+/model sonnet   # recommended default — FH dispatches stronger models itself where they matter
 ```
 | Command | Who runs what | Best for |
 |---|---|---|
-| `/model opus` | Opus handles everything | **FH default** — verification, governance, agent reasoning |
+| `/model sonnet` | Sonnet session; FH dispatches higher-tier sub-agents on declared floors | **FH default** — operation + routine dev |
+| `/model opus` | Opus handles everything | Harness-editing sessions (Mode D) · maximum depth on every turn |
 | `/model opusplan` | Opus *plans* · Sonnet executes *(when Opus engages)* | Cost-conscious routine coding — see caveat |
-| `/model sonnet` | Sonnet handles everything | Fast, simple tasks (no FH gates) |
-**Why `/model opus` for FH**: FH is a *quality* harness — its value lives in verification turns (steel-quench, phantom-quench, the 4-axis gate, agent reasoning) that must not silently run on a weaker model. `opusplan`'s Opus engagement is **not guaranteed**: in a measured 10-turn run it used Opus on **0** turns (CC classifies few turns as "plan-mode"), so the reasoning FH leans on quietly ran on Sonnet — pinning `/model opus` removed that variance (22/22 turns Opus in the follow-up). **Sub-agent dispatch** model is set by the dispatch's own `model` parameter; the session model/plan-mode does **not** propagate Opus to sub-agents, so pin Opus there too for adversarial/verification agents. Use `opusplan` only when the task is mostly routine edits and you accept that Opus may not engage.
-> **By role**: editing the harness itself → `/model opus` (full). Running FH on a field project → `/model opus` for any gated/verification work; `opusplan` is an acceptable cost tradeoff for routine coding, with the caveat above. Sub-agent token costs are CC-visible in the session jsonl under `message.model`.
+**Why default Sonnet now works**: measured (see §Model setup evidence note below), *operating* FH is
+nearly model-flat — the rules in context do most of the work. What still needs a stronger model is a
+small set of depth-sensitive turns, and FH handles those itself: **some skills and agents declare a
+model-tier floor** (e.g. `quench-challenger` floors at opus) and are dispatched as sub-agents at the
+floor tier when your environment can reach it — your session model stays untouched. **FH never switches
+your session model**: a default you set by hand is followed; floors apply only to FH's own sub-agent
+dispatches. If your environment tops out below a floor (e.g. Sonnet-only API routing), the floored
+asset still runs at the best available tier with an explicit `below-floor` flag in its output — degraded
+delivery is visible, never silent (tier-floor resolution: `knowledge/shared/harness-core/multi_model_sidecar_strategy.md §Tier-floor`).
+**`opusplan` caveat (measured)**: its Opus engagement is **not guaranteed** — in a measured 10-turn run
+it used Opus on **0** turns (CC classifies few turns as "plan-mode"). If you want Opus on every turn,
+pin `/model opus` (22/22 turns Opus in the follow-up run). **Sub-agent dispatch** model is set by the
+dispatch's own `model` parameter; the session model/plan-mode does **not** propagate to sub-agents.
+> **By role**: running FH (field projects, gates, routine dev) → `/model sonnet` + let the floors
+> escalate. Editing the harness itself (Mode D) → pin the strongest model you have — harness
+> *self-development* is where tier depth measurably pays (design-increment finding), while operation
+> does not. Sub-agent token costs are CC-visible in the session jsonl under `message.model`.
+**Measured, not asserted** (2026-06-10, worked example): on a 30-point blind rule-application battery,
+*operating* FH was nearly model-flat — Opus 4.8 / Sonnet 4.6 / Haiku 4.5 scored **100 / 97 / 94** against
+a top-tier anchor at 100, with the rules in context doing most of the work. The tiers separated only on
+above-rubric *design* increments (developing the harness, not running it) — which is why the default is
+Sonnet with **tier-floored dispatch** covering the depth-sensitive turns, and a pinned stronger model is
+recommended only for harness-editing sessions. Details: `docs/OUTPUT_EVIDENCE.md` §Validation signals.
+**Measured, not asserted** (2026-06-10, worked example): on a 30-point blind rule-application battery,
+*operating* FH was nearly model-flat — Opus 4.8 / Sonnet 4.6 / Haiku 4.5 scored **100 / 97 / 94** against
+a top-tier anchor at 100, with the rules in context doing most of the work. The tiers separated only on
+above-rubric *design* increments (developing the harness, not running it) — which is exactly why the
+recommendation stays Opus for harness-editing and gate turns, while field operation tolerates lower tiers.
+Details: `docs/OUTPUT_EVIDENCE.md` §Validation signals.
 If you use external CLIs (Gemini, Codex, `gh copilot`) as sidecars, their costs are billed to their own quota and not visible in CC's token display.
@@ -293,4 +323,5 @@ External convergence:
 | [`AGENTS.md`](AGENTS.md) | Runtime agent specs |
 | [`CATALOG.md`](CATALOG.md) | Past work search index |
 | [`CONTRIBUTING.md`](docs/CONTRIBUTING.md) | How to contribute skills and patterns |
+| [`tracks/_contrib/`](tracks/_contrib/README.md) | **Consent lane** — share a de-identified work session; the repo compounds across operators, not just locally |
 | [`fh_integration_contract.md`](knowledge/shared/harness-core/fh_integration_contract.md) | Governance gate spec |

package/docs/CONTRIBUTING.md CHANGED Viewed

@@ -12,6 +12,7 @@ If you'd like to make forge-harness better, pull requests are welcome.
 | **Templates** | Common files to add under `templates/` |
 | **Documentation** | README, skill description refinement, typo fixes |
 | **Field pattern harvest** | Proposing a pattern discovered in real use as a skill (see `/field-harvest`) |
+| **Session contribution (consent lane)** | Share a de-identified work session in `tracks/_contrib/` — see §Consent Lane below |
 ## Choosing a Contribution Path — Check First
@@ -24,6 +25,21 @@ If you'd like to make forge-harness better, pull requests are welcome.
 > **Adding a new skill = Full path required**. Improving an existing skill = Lightweight path available.
+## Consent Lane — share a session (`tracks/_contrib/`)
+Everything under `tracks/` is private by design **except** `tracks/_contrib/` — the consent lane.
+Placing a session file there is your explicit consent to publish it; it lands via PR through the lane
+gate (`/public-surface-audit` + `/marketplace-gate` Check 5 + ingest contradiction scan + reviewer pass).
+- Start from `templates/contrib_session.md` → `tracks/_contrib/{your-handle}/{topic}/session_YYYY_MM_DD_{slug}.md`
+- **De-identify first** (no employer/project/colleague names, paths, domains, credentials) — the gate
+  re-checks, but scrubbing is yours first
+- Full charter, gate table, and what-happens-after-merge: [`tracks/_contrib/README.md`](../tracks/_contrib/README.md)
+- Overhead: minimal — the skill-PR rules don't apply (it's a session, not a skill); only the lane gate + frontmatter floor
+Merged sessions get CATALOG credit under your handle, and `harvest-loop` distills repeating patterns
+into shared knowledge/skills — your session becomes compound interest for every cloner.
 ## PR Rules (Short Version)
 1. **New skill** → Create `plugins/fh-meta/skills/{name}/SKILL.md` + add version line to `plugins/fh-meta/CHANGELOG.md`

package/package.json CHANGED Viewed

@@ -1,6 +1,6 @@
 {
   "name": "@chrono-meta/fh-gate",
-  "version": "1.4.7",
+  "version": "1.4.9",
   "description": "FH runtime adapters — run FH governance, skills, and agents via Claude or Codex with machine-parseable gates.",
   "license": "MIT",
   "keywords": [
@@ -23,7 +23,9 @@
     "fh-goal": "bin/fh-goal.js"
   },
   "scripts": {
-    "prepare": "chmod +x bin/fh-gate.js bin/fh-run.js bin/fh-goal.js scripts/fh-gate.sh scripts/fh-run.sh scripts/fh-goal.sh"
+    "prepare": "chmod +x bin/fh-gate.js bin/fh-run.js bin/fh-goal.js scripts/fh-gate.sh scripts/fh-run.sh scripts/fh-goal.sh",
+    "test": "bash scripts/selfcheck.sh",
+    "prepublishOnly": "bash scripts/selfcheck.sh"
   },
   "engines": {
     "node": ">=16"
@@ -54,6 +56,7 @@
     "scripts/fh-gate.sh",
     "scripts/fh-run.sh",
     "scripts/fh-goal.sh",
+    "scripts/selfcheck.sh",
     "plugins/fh-meta/skills",
     "plugins/fh-meta/agents",
     "plugins/fh-commons/skills",

package/plugins/fh-commons/agents/quench-challenger.md CHANGED Viewed

@@ -19,7 +19,14 @@ description: Dedicated quench attack-prescription synthesis agent — Devil (6-a
   If even one S-tier finding exists, registration is blocked. A-tier and below allow registration after fix recommendations.
   </commentary>
   </example>
-model: sonnet
+model: opus
+# model is a HARD FLOOR (tier-floor resolution — multi_model_sidecar_strategy.md §Tier-floor):
+# adversarial increment-finding is the depth-sensitive class. floor: hard semantics — the floor
+# outranks diversity: prefer any floor-meeting engine (incl. native Tier-3 at opus) over a below-floor
+# diversity engine. Only when NO engine reaches the floor, dispatch at best available with the
+# below-floor header (e.g. "challenger: sonnet (below-floor; floor=opus)") — never hard-fail.
+# Below-floor judged verdicts are PROVISIONAL (not gate-PASS evidence) until floor-tier re-run or
+# explicit operator acceptance; the weekly audit is the standing re-quench consumer.
 color: red
 tools: Read, Grep, Glob
 version: 0.2

package/plugins/fh-meta/skills/frontier-digest/SKILL.md CHANGED Viewed

@@ -29,11 +29,48 @@ model: sonnet
 ```
 Priority:
+0. /deep-research built-in available (check live session skill list)
+   → use it as the collection+verification engine (staged source gathering,
+     cross-checking, cited synthesis) — Tier-0 route, no API key needed
 1. ANTHROPIC_API_KEY environment variable → Claude Sonnet
 2. Neither → WebSearch mode (raw data only, no synthesis)
 ```
-Report detection result in one line. Example: `🔑 Using Claude Sonnet`
+Report detection result in one line. Example: `🔑 Using Claude Sonnet` / `🔑 Engine: /deep-research (built-in)`
+---
+## Step 0.5. Operator Intake (speculative-interview arm — walled channels)
+On **cadence-triggered** runs (7d), ask the operator one line before collecting:
+> *"이번 주에 본 벽 뒤 소스(YouTube·LinkedIn·X 등 기계가 못 닿는 링크/요약)가 있으면 던져 주세요 — triage해서 기록합니다. 없으면 그냥 진행할게요."*
+- Operator may skip — zero pressure; the autonomous arms (Step 1) run regardless.
+- Why: walled channels return 403 to machine fetch — **the operator is the only wide-net sensor
+  for them**. This arm turns ad-hoc link-throwing into a scheduled intake (manual-validated n=2).
+- Received sources route to the sister-asset/triage flow with its **lightweight path**: C-tier
+  (territory already covered) = one-paragraph entry only; full cross-audit reserved for A/B-tier.
+  Partial wall-bypass is allowed first: try WebSearch + secondary sources before declaring unfetchable.
+- **Video sources (local/laptop only — cloud VMs typically 403 on video hosts)**: resolve a
+  *video-harvest* capability via the Sidecar Engine Resolution Protocol
+  (`multi_model_sidecar_strategy.md`) — probe by **capability, not engine name**. A CLI that is a
+  valid sidecar for other tasks is not automatically a video-harvest engine.
+  **Tier 1 — a natively multimodal CLI that ingests the URL directly**: probe for one, then route
+  to it (pre-EOL example invocation: `gemini --skip-trust -p "{URL}"` → timestamped summary). ⚠️
+  **the direct `gemini` CLI is being sunset (vendor EOL 2026-06-18)** — after that probe `agy` (the
+  Antigravity router-shell successor, same class) or the Gemini API; see
+  `multi_model_sidecar_strategy.md §Binary names churn`. A coding-agent CLI with **no native
+  video/transcript access** (e.g. `codex`) cannot do this — it burns tokens, recovers only
+  metadata (title), then asks for a pasted transcript; do not route video to it. Agentic
+  router-shells (`agy`) get approval-mode first.
+  **Tier 3** (Tier 2 = router-shell / Gemini API, see the protocol) **— conditional fallback (not
+  guaranteed)**: `yt-dlp --write-auto-subs --skip-download` then summarize the transcript — fine
+  for talk-style content. **Probe the environment first**:
+  `yt-dlp --version && python3 -c "import curl_cffi" && command -v ffmpeg` — Tier 3 needs all three
+  (`yt-dlp`, `curl_cffi` impersonation target, `ffmpeg`), and the timedtext endpoint may return
+  HTTP 429; if the probe fails, fall through rather than assuming it works. Unresolvable (cloud, no sidecar; or all tiers blocked) →
+  operator summary remains the path, as today.
 ---

package/plugins/fh-meta/skills/goal-quench/SKILL.md CHANGED Viewed

@@ -280,14 +280,14 @@ Runs after Step C (or after Step B if no capability gap). Selects an adversarial
 | Task scope signal | Sidecar | Invocation point |
 |---|---|---|
-| Code quality review / new SKILL.md / governance gate change | `steel-quench` C3 config (Gemini sidecar if available) | Post-/goal, before pipeline-conductor |
+| Code quality review / new SKILL.md / governance gate change | `steel-quench` cross-provider challenger (Gemini sidecar if available; Tier 3 Claude sub-agent fallback per the Sidecar Engine Resolution Protocol) | Post-/goal, before pipeline-conductor |
 | Architecture design / cross-project dependency | `agent-composer` multi-model panel (if external CLIs available; otherwise single-Claude sub-agent) | Parallel to /goal as separate Agent |
 | External publication / marketplace-gate / skill release | `sim-conductor` + `steel-quench` Wave 5 | Post-/goal quality gate |
 | No signal match (default) | None — pipeline-conductor handles quality alone | — |
 Write resolved sidecar to `.active`:
 ```
-sidecar: none | steel-quench-c3 | agent-composer-panel | sim-conductor | {cli-name}
+sidecar: none | steel-quench-crossprovider | agent-composer-panel | sim-conductor | {cli-name}
 sidecar_rationale: {one-line reason — which scope signal triggered this}
 ```
@@ -410,7 +410,7 @@ After pipeline-conductor completes, read `sidecar:` from `.pending`. If non-none
 | Sidecar | Wait for | Failure action |
 |---|---|---|
-| `steel-quench-c3` | Wave 2 convergence: PASS or CONDITIONAL_PASS | FAIL → reopen Phase 2, surface blocking findings to user |
+| `steel-quench-crossprovider` | Wave 2 convergence: PASS or CONDITIONAL_PASS | FAIL → reopen Phase 2, surface blocking findings to user |
 | `agent-composer-panel` | Panel findings file written + incorporated into pipeline-conductor context | Not yet written → re-run pipeline-conductor with panel findings |
 | `sim-conductor` + `steel-quench-w5` | Both: sim-conductor Area A PASS **and** steel-quench W5 convergence PASS | Either FAIL → block Done When; surface failing verdict |
@@ -438,7 +438,7 @@ After each goal-quench run, append to `tracks/_meta/goal_quench_{YYYY-MM-DD}.md`
   actual_vs_estimate_ratio: N.N  # actual / estimated (e.g., 4.7 means actual was 4.7× estimate)
   budget_verdict: GREEN/YELLOW/ORANGE/RED
   pipeline_verdict: CLEAN/PENDING/BLOCKED/ESCALATE
-  sidecar: none | steel-quench-c3 | agent-composer-panel | sim-conductor | {cli-name}
+  sidecar: none | steel-quench-crossprovider | agent-composer-panel | sim-conductor | {cli-name}
   sidecar_type: none | cc-subagent | external-cli  # cc-subagent = isolated Sonnet context; external-cli = untracked
   sidecar_model: none | sonnet | {model-name}  # standard baseline: sonnet; external CLIs vary by tier
   orchestrator_tokens: N | unknown  # Opus orchestrator CC tokens (pro/max only; visible in CC)

package/plugins/fh-meta/skills/harness-doctor/SKILL.md CHANGED Viewed

@@ -107,6 +107,7 @@ File: {path}
 | .claude/rules files unmodified 90+ days | R-tier |
 | settings.json JSON syntax error | M-tier |
 | Hooks in settings.json (PostToolUse/Stop) that don't fire in Agent View | S-tier or M-tier |
+| Public-surface FH recommendation or default changed — no N-shot measurement evidence traceable in session records or PR body | S-tier |
 Hook divergence verdict: 0 hooks = Normal · 1+ hooks (session-end/Stop) = S-tier · 1+ hooks (PostToolUse file writes or external API) = M-tier (data loss risk in Agent View).
@@ -115,6 +116,10 @@ Hook divergence verdict: 0 hooks = Normal · 1+ hooks (session-end/Stop) = S-tie
 - Tracks with no sync in 30+ days → R-tier
 - CATALOG.md: 5+ open items in recent 5 sessions → S-tier
 - Field project CLAUDE.md missing → S-tier
+- **Knowledge cross-ref lint**: `knowledge/**/*.md` with no CATALOG.md entry → S-tier
+  (index orphan — unfindable via CATALOG-first search); with no inbound reference from
+  any CLAUDE.md / rules / SKILL.md / knowledge doc → R-tier (orphan page).
+  Mechanical: `grep -L` filenames against CATALOG.md, then `grep -rl` for inbound refs.
 ### Step 6. L5 — Pattern Analysis *(FH only)*
@@ -154,8 +159,19 @@ Hook divergence verdict: 0 hooks = Normal · 1+ hooks (session-end/Stop) = S-tie
 | **Context Drift** | Stale paths in CLAUDE.md (L3) · rules unmodified 90+ days (L3) · CLAUDE.md over threshold (L2) | S |
 | **Schema Misalignment** | Plugin count drift · SKILL.md missing `Done When` (undocumented contract) | M |
 | **State Degradation** | tracks/ no sync 30+ days (L4) · INACTIVE_90D skills (L5-A) · orphaned memory entries | M |
+| **Agentic Laziness** *(runtime)* | Completion claims without per-item evidence — "all N done" with no enumerable list in session records / PR bodies · Done When lacking any mandatory-pass condition | S |
+| **Self-Preferential Bias** *(runtime)* | judged-class check without a named adversarial pairing (gate-rejectable for new skills; scan existing for backfill) · a self-graded verdict cited as sole completion evidence | M |
+| **Goal Drift** *(runtime)* | Long session with S/A-tier work but no pre-compaction completion log (`fh_completed_*` missing) · early-stated constraints absent from card/manifest at close | S |
+| **Comprehension Debt** *(runtime/operator)* | Merged FH-asset change with no CATALOG entry · edit_manifest `validation_status: pending` backlog piling up unverified · session closed with work done but zero card delta | S |
+| **Evidence Gap** *(FH self-dev)* | Public-surface FH recommendation or default changed with no N-shot measurement evidence traceable in session records or PR history (L3 check) | S |
 Signal in 2+ classes → escalate one tier.
+Rows 1–3 = structural (2026-06-02 frontier diagnosis). Rows 4–6 = runtime behavioral, agent-side —
+named from dynamic-workflows discourse (2026-06). Row 7 = runtime, operator-side — named from
+loop-engineering discourse (Osmani, 2026-06): loop output outpacing operator understanding. FH
+countermeasures already exist and are what the signals check for: golden probes + multi-persona
+coverage (laziness) · judged-pairing rule (bias) · pre-compaction completion log + card-last guard
+(drift) · CATALOG 3-line summaries + manifest predict-verify + card protocol (comprehension debt).
 ---
@@ -247,6 +263,8 @@ Verdict: ✅ CONSISTENT · 🟧 INCONSISTENT (fix before push) · 🟩 REVIEW (i
 **This skill Done When = "prescription report output complete".** Actual resolution of M/S/R items belongs to user or follow-up work.
+**Check classes** (`harness_6axis_framework.md` §Axis 5): report-output completions above = mandatory-pass (report exists or not). M/S/R tier *assignments* = judged — paired with verify-bidirectional when the user challenges a tier.
 Verdict: PASS (M-tier 0, "Structure healthy") | CONDITIONAL_PASS (S/R remain, no M) | FAIL (1+ M-tier found) | ESCALATE (structural ambiguity before prescription can be issued)
 **Three-Doctor Loop chain** (auto-propose after prescription report):

package/plugins/fh-meta/skills/plugin-recommender/SKILL.md CHANGED Viewed

@@ -49,6 +49,7 @@ Tier is **independent of platform origin** (Anthropic / OpenAI / community). A w
 | Tier | Criteria | Sources |
 |---|---|---|
+| **Tier 0** | Platform built-in — ships with the runtime, zero install, zero token cost to discover | Claude Code built-in skills/commands (e.g. `/deep-research`, `/code-review`, `/review`, `/security-review`, `/batch`, `/loop`, `/fewer-permission-prompts`, `/goal`, `/rewind`, `/team-onboarding`) — inventory varies by version/environment: enumerate from the live session, do not assume |
 | **Tier 1** | Marketplace-listed + performance-validated (benchmark data or production usage evidence) | Anthropic official · Codex marketplace verified · CC marketplace verified · FH community reviewed (steel-quench + sim-conductor validated) |
 | **Tier 2** | Marketplace-listed, no explicit performance data | Any marketplace source (CC marketplace · Codex marketplace · npm · GHE) |
 | **Tier 3** | Not marketplace-listed, source-available (GitHub/npm) | Repo-only agents/plugins |
@@ -87,13 +88,15 @@ When queried for a specific capability (e.g., "adversarial reviewer for bash cod
 **Discovery order (stop when sufficient Tier 1 candidates found):**
+0. **Platform built-ins (Tier 0)** — does a built-in skill/command already cover the capability? Check the live session's available-skills list before any plugin search. A built-in that covers ~80% beats installing a plugin for the rest
 1. **Installed locally** — `.claude/agents/`, `plugins/` in current cwd
 2. **FH native skills** — always-loaded knowledge in `plugins/fh-meta/` and `plugins/fh-commons/`
 3. **Claude Code marketplace** — `claude mcp search [capability]` or known CC registry (see verified targets above)
 4. **Codex marketplace** — `npx @openai/codex list-agents [capability]` or known Codex registry
 5. **npm ecosystem** — `@chrono-meta/`, `@anthropic/`, and other known-quality scoped packages
-**Discovery priority**: installed > FH native > Tier 1 (any platform) > Tier 2 > Tier 3 > Tier 4
+**Discovery priority**: built-in (Tier 0) > installed > FH native > Tier 1 (any platform) > Tier 2 > Tier 3 > Tier 4
+**Tier 0 guard**: FH native wins over a built-in only when the FH skill adds governance the built-in lacks (e.g. `/goal` → `goal-quench` adds budget+quality gates; code diff review stays with built-in `/code-review`, FH-asset coherence with `hub-cc-pr-reviewer`)
 **When sim-conductor chains here for persona discovery**: apply the same platform-aware search scoped to persona/simulation/review capability tags. Return discovered agents with their Tier rating so sim-conductor can decide whether to install or use a built-in brief.

package/plugins/fh-meta/skills/prompt-regression/SKILL.md CHANGED Viewed

@@ -57,9 +57,12 @@ Check for custom probes:
 ls .claude/regression/probes.md 2>/dev/null || echo "NO_CUSTOM_PROBES"
 ```
-**If custom probes exist**: load and use them.
+**If custom probes exist**: load and use them. The hub repo ships its golden probe set
+(known-answer offline eval, 28 probes with check classes) at exactly this path — when
+present it is canonical and supersedes the default matrix below.
-**If no custom probes**: use the default probe matrix below.
+**If no custom probes** (e.g. Mode C install without the hub repo): use the default
+probe matrix below.
 #### Default Probe Matrix
@@ -71,7 +74,7 @@ ls .claude/regression/probes.md 2>/dev/null || echo "NO_CUSTOM_PROBES"
 | `P-TRIGGER-03` | `harness is complex` | harness-doctor proposed | CLAUDE.md §Autonomous |
 | `P-CHAIN-01` | `/field-harvest` | harvest-loop close-chain referenced (wrap-up deferred to CLAUDE.md session-close chain — field-harvest has no inline sim-conductor gate) | field-harvest SKILL.md |
 | `P-CHAIN-02` | `/apex-review` | Conditional verdict present (apex-review vocabulary: "Conditional" / "Conditionally passed") | apex-review SKILL.md |
-| `P-GATE-01` | new skill commit | New Skill Pre-Commit Gate (5 items) invoked | CLAUDE.md §Gate |
+| `P-GATE-01` | new skill commit | New Skill Pre-Commit Gate (6 items, incl. check-class declared) invoked | CLAUDE.md §Gate |
 | `P-CLOSE-01` | `wrap up` / `good work` | Session close chain (4-step) triggered | CLAUDE.md §Wrap-up |
 ---
@@ -104,7 +107,7 @@ For each affected probe, evaluate:
 - For session close chain: confirm all 4 steps in CLAUDE.md §Wrap-up
 **4-c. Gate presence check** — Are gate conditions still enforced?
-- Pre-commit gate: confirm all 5 items present in CLAUDE.md
+- Pre-commit gate: confirm all 6 items present in CLAUDE.md (incl. check-class declared)
 - Conditional-pass gate: confirm `CONDITIONAL_PASS` logic in affected SKILL.md
 Output per probe:
@@ -161,10 +164,10 @@ Only update on explicit `y` — never auto-update.
 ## Done When
-- All affected probes are evaluated (PASS / FAIL / SKIP)
-- Regression report is output with clear PASS/FAIL verdict
-- If FAIL: specific file + line fix is recommended
-- Baseline updated only on explicit user approval
+- All affected probes are evaluated (PASS / FAIL / SKIP) — class: mandatory-pass
+- Regression report is output with clear PASS/FAIL verdict — class: mandatory-pass
+- If FAIL: specific file + line fix is recommended — class: judged, paired with verify-bidirectional (the fix recommendation is re-checked, not trusted as-is)
+- Baseline updated only on explicit user approval — class: mandatory-pass (HITL)
 ---

package/plugins/fh-meta/skills/steel-quench/SKILL.md CHANGED Viewed

@@ -41,6 +41,7 @@ A designer's anxiety is most dangerous when vague. steel-quench breaks that anxi
 | **Wave 4** (optional) | Meta-Aware Adversary — AI uses its own nature as attack vector | Zero new S-grade + AI-specific criteria |
 | **Wave-P3** (optional) | Gate-passage re-attack — when an upstream gate declares PASS, re-attack the just-passed artifact on Coverage / Narrative / False-confidence | All 3 dimensions Attack Failed |
 | **Wave 5** (optional) | Multi-Team Adversarial Panel — external CLIs or cross-session Claude | Zero new S-grade cross-team |
+| **Wave-T** (after convergence) | Temper — measure complexity the quench *added*; flag over-hardening | τ-PASS or named τ-FAIL |
 ---
@@ -68,7 +69,7 @@ Skip: [list of skipped waves with reason]
 External CLIs available: [yes/no → Wave 5 available]
 ```
-**Degraded coverage rule**: if a high-weight wave or capability is skipped (user choice, unavailable tool, or scope=internal), flag explicitly in the output header — do not silently proceed.
+**Degraded coverage rule**: if a high-weight wave or capability is skipped (user choice, unavailable tool, or scope=internal) **or runs below its declared model-tier floor** (tier-floor resolution, `multi_model_sidecar_strategy.md §Tier-floor`), flag explicitly in the output header — do not silently proceed. Below-floor example: `challenger: sonnet (below-floor; floor=opus)`; a judged verdict produced below floor is a re-quench candidate once a floor-tier is available.
 ---
@@ -194,6 +195,37 @@ domain-coupled (a spec→test-case gate) form to a gate-agnostic boundary hook.
 ---
+## Wave-T — Temper (post-convergence)
+Quench hardens, but quenched steel is brittle — no smith ships it un-tempered. steel-quench attacks
+until zero new S-grade; nothing in that loop asks whether the hardening itself **introduced complexity
+beyond what the fixes required** (defense scaffolding, decorative wiring). Wave-T is that inverse
+corrective. It runs **after Wave 3+ convergence, before Done When**. It does not attack; it measures
+the cost of the convergence just achieved.
+| Step | Class | What it does |
+|---|---|---|
+| **T-1 complexity delta** | measured | `bash templates/temper_check.sh <repo> <file> <pre-quench-ref>` — Δlines/sections/steps/tables/fences/cross-refs, baseline (pre-Wave-1) → post-convergence. Prose-only counts: code-fence interiors are excluded (bash comments are not sections) |
+| **T-2 absolute context** | measured | `harness-doctor` L1–L3 on the post-quench asset — absolute complexity tier (reuse, don't reimplement) |
+| **T-3 τ verdict** | judged — paired with the quench's own Wave findings (each flagged construct must trace to a specific finding it allegedly fixes; no judge-only path) | **τ-PASS**: added complexity ⊆ what the fixes required. **τ-FAIL**: the quench introduced a construct that *defends against an attack rather than fixing the flaw* — name it, propose the simpler form, hand back for de-brittling. τ-FAIL is the temper step working, not a quench failure |
+**T-3 heuristic flags** (any → review, never auto-reject): a new section/table/step whose only referent
+is a Wave-N finding · Δcross-refs ≫ Δsteps (wiring, not function) · the asset crosses a harness-doctor
+complexity tier it was below pre-quench.
+**Don't-overbuild guard (τ applied to τ)**: Wave-T is one script + this section, reusing harness-doctor
+for the absolute read. If Wave-T grows its own detection engine, it has failed its own test — a temper
+step that adds complexity is self-refuting. Known limits (honest): `temper_check.sh` takes one path —
+renamed files need a manual pre/post measurement; the wiring flag uses strict `>`, so Δxrefs = Δsteps
+does not fire it (the section flag usually carries those cases).
+**Model note**: T-1 is bash (no model), T-2 reuses harness-doctor (sonnet-rated). T-3 adjudication was
+validated blind on both Opus and Sonnet (3-fixture ground-truth test, 3/3 each, 2026-06-10) — Wave-T
+end-to-end does not require the largest model tier. Opus remains the recommendation for full
+steel-quench runs (challenger waves are broader than T-3).
+---
 ## External-GT Adjudication (when the target has a public ground truth)
 When quenching a **public artifact that has its own ground truth** — a repo's open issues, test suite, or

package/scripts/selfcheck.sh ADDED Viewed

@@ -0,0 +1,72 @@
+#!/usr/bin/env bash
+# selfcheck.sh — mandatory-pass (deterministic) checks on FH's own executable surface.
+# Class: mandatory-pass (harness_6axis_framework.md §Axis 5 check classes) — blocks on fail.
+# Scope: executables shipped via npm files[] + the bash infra driving the FH gate chain.
+# Syntax-only (node --check / bash -n): zero side effects, no network, runs anywhere.
+# Wiring: `npm test` for any session; `prepublishOnly` so a publish cannot ship a
+# syntactically broken executable.
+set -u
+cd "$(dirname "${BASH_SOURCE[0]}")/.."
+fail=0
+check() { # check <label> <cmd...>
+  local label="$1"; shift
+  if "$@" 2>/dev/null; then
+    echo "PASS  $label"
+  else
+    echo "FAIL  $label"
+    "$@" || true
+    fail=1
+  fi
+}
+# Node executables (npm-shipped)
+for f in bin/*.js; do
+  check "node --check $f" node --check "$f"
+done
+# Bash surface: npm-shipped scripts + local bin wrappers + gate-chain infra
+for f in scripts/*.sh bin/fh-gate bin/fh-run bin/fh-goal \
+         templates/regression_guard.sh templates/temper_check.sh templates/predelete_check.sh templates/.git-hooks/pre-commit; do
+  [ -f "$f" ] || continue
+  check "bash -n $f" bash -n "$f"
+done
+# Count consistency: stated skill/agent counts vs actual directories.
+# Drift class recurred 4x on 2026-06-10 alone (local_fh_context 26, plugin.json "3 agents",
+# README "5 agents", marketplace.json "3 agents") — this makes the check mechanical and permanent.
+# Active skill = SKILL.md without a deprecation marker (frontmatter `deprecated: true` or
+# "DEPRECATED" in the description block).
+count_active() { # count_active <plugin>
+  local n=0 s
+  for s in plugins/"$1"/skills/*/SKILL.md; do
+    [ -f "$s" ] || continue
+    head -20 "$s" | grep -qE 'deprecated: true|DEPRECATED' || n=$((n+1))
+  done
+  echo "$n"
+}
+count_agents() { ls plugins/"$1"/agents/*.md 2>/dev/null | wc -l | tr -d ' '; }
+meta_sk=$(count_active fh-meta); meta_ag=$(count_agents fh-meta)
+com_sk=$(count_active fh-commons); com_ag=$(count_agents fh-commons)
+total_sk=$((meta_sk + com_sk)); total_ag=$((meta_ag + com_ag))
+count_check() { # count_check <label> <file> <expected-string>
+  if grep -q "$3" "$2"; then
+    echo "PASS  count: $1"
+  else
+    echo "FAIL  count: $1 — expected \"$3\" in $2 (actual: fh-meta ${meta_sk}sk/${meta_ag}ag, fh-commons ${com_sk}sk/${com_ag}ag)"
+    fail=1
+  fi
+}
+count_check "fh-meta plugin.json"      plugins/fh-meta/.claude-plugin/plugin.json    "${meta_sk} skills + ${meta_ag} agents"
+count_check "fh-commons plugin.json"   plugins/fh-commons/.claude-plugin/plugin.json "${com_sk} skills"
+count_check "marketplace.json fh-meta" .claude-plugin/marketplace.json               "${meta_sk} skills + ${meta_ag} agents"
+count_check "README header"            README.md                                     "${total_sk} skills · ${total_ag} agents"
+count_check "local_fh_context fh-meta" templates/local_fh_context.md                 "(fh-meta, ${meta_sk})"
+if [ "$fail" -ne 0 ]; then
+  echo "SELFCHECK: FAIL"
+  exit 1
+fi
+echo "SELFCHECK: PASS"