@chrono-meta/fh-gate 1.4.7 → 1.4.9

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/CATALOG.md CHANGED
@@ -8,6 +8,120 @@ AI reads this file first when searching past work. Open individual files for det
8
8
 
9
9
  <!-- Add entries in reverse date order (newest at top) -->
10
10
 
11
+ ### 2026-06-10 | forge-harness | #destructive-op-gate, #irreversibility, #silent-loss, #branch-cleanup-incident
12
+ **File:** CLAUDE.md §Destructive-Op Gate (+ templates/predelete_check.sh, scripts/selfcheck.sh)
13
+ Third irreversibility gate (sibling of Pre-Publish): enumerate → recover → destroy, never destroy-then-check. predelete_check.sh classifies branches SAFE/CHECK/REVIEW (CHECK = 0 unique paths but commits off base — shared files may hold newer content, the silent-loss class); REVIEW blocks scripted deletes (exit 1); the recovery step is judged/depth-sensitive (strongest-tier floor semantics). Signal-table row fires on destructive intents proactively. Origin: same-day incident — a parallel session's card (weekly-audit done + #88) lived only on an unmerged 0-unique-path branch; pre-deletion enumeration recovered it. Dogfood replay: the script lands that exact branch in CHECK.
14
+ - Decision: safety mechanized rather than tier-escalated — Sonnet-default stays valid because the gate carries the depth, with the judged step floor-routed for the residual.
15
+ - Decision: scope split vs Pre-Publish kept explicit (publish = exposure irreversibility, destroy = silent-loss irreversibility).
16
+
17
+
18
+ ### 2026-06-10 | forge-harness | #mode-d-notice, #model-guidance, #self-dev-entry
19
+ **File:** CLAUDE.md §Mode D Model Notice
20
+ Conditional model-pin guidance at self-dev entry (operator request): fires once at the 4-axis gate's own activation trigger — session model opus+ = silent · below-opus = one-line pin recommendation with measured rationale · identity-withheld runtime = static fallback. Advisory only, never auto-switches (pin-is-not-a-cap); field-operation sessions never see it.
21
+ - Decision: zero new triggers — reuses the gate activation condition; the notice and the gate share one detection point.
22
+
23
+
24
+ ### 2026-06-10 | forge-harness | #tier-floor-quench, #below-floor-live, #floor-governance, #dual-challenger
25
+ **File:** knowledge/shared/harness-core/multi_model_sidecar_strategy.md §Floor governance (+ quench-challenger.md)
26
+ Tier-floor design quenched same-day by dual challenger dispatch — F1 opus (floor met) + F2 sonnet (below-floor; floor=opus): first live firing of the below-floor flag, on the design that defines it. Both tiers found the same 2 S-grades: cross-provider "floor-equivalent" undecidable (vibes equivalence) and re-quench tags with no consumer (permanent silent degradation). §Floor governance ships the fixes: external tiers below-floor-by-default until measured equivalence entries (the backend×tier ladder is the evidence source); below-floor judged verdicts provisional until floor re-run or operator acceptance, weekly audit = standing consumer; floor: hard for depth-critical roles (floor outranks diversity); diversity_rationale tie-break; claims age; pin-is-not-a-cap resolution. Wave-T runs #11/#12 both τ-PASS (fix-traceable complexity only).
27
+ - Decision: below-floor challenger empirically buys value (sonnet 6/6 real findings, 2 unique) but misses the deepest (opus-only hard-floor insight) — F2 doctrine and the floor both validated by the same experiment.
28
+ - Open: first weekly-audit below-floor consumption + first measured equivalence entry (laptop ladder).
29
+
30
+
31
+ ### 2026-06-10 | forge-harness | #tier-floor, #model-resolution, #default-sonnet, #sidecar-protocol
32
+ **File:** knowledge/shared/harness-core/multi_model_sidecar_strategy.md (+ quench-challenger.md, steel-quench SKILL.md, README.md, templates/CLAUDE.md)
33
+ Tier-floor resolution ships — the model dimension of the Sidecar Engine Resolution Protocol: assets declare measured-or-justified tier floors (quench-challenger=opus, Wave-T/harness-doctor=sonnet measured, mechanical=none); environment resolves R1 native dispatch / R2 cross-provider / R3 below-floor-with-flag (never hard-fail). Public guidance flips to default `/model sonnet` + floored dispatch; human-set session defaults are inviolable (FH dispatches sub-agents, never switches the session model); pinning the strongest available model recommended only for harness-editing (Mode D).
34
+ - Decision: floor-not-pin semantics; below-floor judged verdicts auto-tagged re-quench candidates (Degraded coverage rule extension); no specific top-anchor model or subscription window named publicly (anti-stale).
35
+ - Decision: grounded in same-day measurements (operation tier-flat 100/100/97/94; depth differential on design increments only) — the guidance flip and the mechanism ship together, neither alone.
36
+ - Open: first organic below-floor dispatch + first Sonnet-default external install feedback (verify_next em-2026-06-10v).
37
+
38
+
39
+ ### 2026-06-10 | forge-harness | #model-tier, #tier-flattening, #worked-example, #output-evidence
40
+ **File:** docs/OUTPUT_EVIDENCE.md (+ README.md §Model setup)
41
+ Model-tier flattening measured and published: 30-point blind battery (rule-application + meta-dev fixtures, pre-registered rubric) on four Claude tiers — operation 100/100/97/94 (anchor/Opus 4.8/Sonnet 4.6/Haiku 4.5), tier separation only on above-rubric design increments (3/3·1/3·0.5/3·0/3). Public claim scoped honestly: single trial, self-graded, worked example not benchmark. README §Model setup gains the evidence note grounding the existing Opus recommendation.
42
+ - Decision: operating FH ≈ model-flat (the harness is the score); developing FH is where tier matters — recommendation unchanged (opus for harness-editing/gates), now evidence-backed. **[superseded same-day by the tier-floor entry above: default flipped to sonnet + floored dispatch; opus pin now Mode-D-only]**
43
+ - Open: real Qwen-class measurement on laptop (batteries are a portable fixture pack, fh-be record).
44
+
45
+ ### 2026-06-10 | forge-harness | #fc, #consent-lane, #federated-compounding, #starved-center, #v3
46
+ **File:** tracks/_contrib/README.md (+ .gitignore, templates/contrib_session.md, docs/CONTRIBUTING.md, README.md)
47
+ FC F1 pour — the consent lane opens (operator override: "실제증명으로 찍어누른다" — real-proof-first supersedes the v2-submission latch). `tracks/_contrib/` becomes the only tracked subtree under tracks/: surgical 2-line un-ignore, placement = consent, PR-time lane gate (PSA + marketplace-gate C5 + ingest contradiction scan + reviewer). Lane charter + session template + CONTRIBUTING §Consent Lane + README row shipped — exactly the don't-overbuild list, no new skill/verifier.
48
+ - Decision: operator lifted the v2 latch (2026-06-10) — paper timing no longer gates feature publicization; proof-by-shipping doctrine recorded in the lab ledger.
49
+ - Decision: privacy default unchanged — the lane is consent-by-placement; the firewall (PSA/Pre-Publish/4-axis) is the lane's load-bearing precondition, not weakened by it.
50
+ - Open: F2 begins on the first external lane PR (live gate run + friction signals); F3 verdict 2 quarters post-open (PAID vs WRITTEN-OFF per falsifiability criterion).
51
+
52
+ ### 2026-06-10 | forge-harness | #wave-t, #temper, #steel-quench, #forge-fourth-movement, #v3
53
+ **File:** plugins/fh-meta/skills/steel-quench/SKILL.md (+ templates/temper_check.sh, docs/ETHOS.md, README.md)
54
+ Wave-T (Temper) ships — the fourth forge movement, poured from lab validation (operator-approved): after Wave 3+ convergence, measure the complexity the quench itself added (T-1 temper_check.sh delta, fence-excluding · T-2 harness-doctor absolute tier · T-3 τ-verdict, judged-paired with the quench's own findings). Validation: 9 runs across 4 independent convergences (0 false flags, simplification un-punished) + synthetic positive control (sensitivity) + same-commit dogfood run #10 on the pour itself (τ-PASS). ETHOS/README "direction ahead" IOU converted to delivery.
55
+ - Decision: organic τ-FAIL redefined from promotion-blocker to standing watch (operator-approved) — synthetic control covers sensitivity; first organic flag → human verdict → recorded.
56
+ - Decision: Wave-T stays one script + one section reusing harness-doctor (don't-overbuild guard is part of the shipped spec).
57
+ - Open: npm 1.4.8 ships Wave-T + temper_check.sh + selfcheck 21 (folded into the open laptop handoff).
58
+
59
+ ### 2026-06-10 | forge-harness | #selfcheck, #count-consistency, #drift-class-kill
60
+ **File:** scripts/selfcheck.sh
61
+ Count-consistency probes (mandatory-pass): plugin.json ×2 / marketplace.json / README header / local_fh_context stated counts vs actual dirs (active = non-deprecated). Motivated by 4 same-day drift instances — the class now fails npm test + prepublishOnly instead of waiting for a doctor run.
62
+ - Decision: deprecation detected mechanically (frontmatter `deprecated: true` or DEPRECATED marker in head -20).
63
+
64
+ ### 2026-06-10 | forge-harness | #backlog-cleanup, #phantom-fix, #count-drift, #goal-quench
65
+ **File:** plugins/fh-meta/skills/goal-quench/SKILL.md (+ templates/local_fh_context.md, plugin.json, README.md)
66
+ Backlog cleanup (cloud session 2): goal-quench phantom vocab fixed — "steel-quench C3 config" → "cross-provider challenger" (anchored to steel-quench:83 vocabulary + §Sidecar Engine Resolution Protocol; token `steel-quench-crossprovider`, 4 spots; resolves `fh_signal_2026-06-10_fh-direct`). Skill/agent count drift fixed: local_fh_context 26→29 active (−2 deprecated, +5 missing), plugin.json agents 3→7, README agents 5→8 (skills 33 verified correct = 36 dirs − 3 deprecated). Resolves the 06-04 "skill-count drift" Open item.
67
+ - Decision: counting policy is mechanical — active = non-deprecated skill dirs, agents = files in agents/; descriptive sidecar labels confirmed over C-numbering (per 06-10 check-class decision).
68
+ - Verified-no-action: 05-26 cross-ref recommendations already implemented — arXiv 2605.18747 (README:282 + definition doc:39) and Sylph 2604.21003 (definition doc:71/:105) — L4 open NOT raised.
69
+ - Open: npm 1.4.8 republish ships plugin.json/README/SKILL.md count+phantom fixes (folded into the open laptop handoff).
70
+
71
+ ### 2026-06-10 | forge-harness | #credit-economy, #operator-intake, #lightweight-triage, #frontier-digest
72
+ **File:** plugins/fh-meta/skills/frontier-digest/SKILL.md (+ .claude/rules/sister_asset_protocol.md)
73
+ Credit-economy engine run #2 formalization (operator-approved; this session = manual run #2: 7 walled sources → 8 audits → 15 same-day gate-passed imports, 2 human corrections): R1 — frontier-digest Step 0.5 operator-intake asks for walled-channel sources (YouTube/LinkedIn/X, machine-403) on cadence runs only, skippable, try wall-bypass (WebSearch + secondary) first. R2 — sister protocol lightweight path: dedup-hit/no-increment sources get a one-paragraph entry, full cross-audits reserved for A/B-tier.
74
+ - Decision: the operator stays the wide-net sensor for walled channels by design (human = scarce oracle), the machine owns endurance (triage→gate cycles); intake cost discipline keeps the wide net wide.
75
+ - Follow-up (same day): Step 0.5 video-harvest ladder — Tier 1 sidecar (codex / Gemini-route, agentic = approval-mode first) → Tier 3 Claude+yt-dlp transcript → operator floor; laptop verification trio handed off. First applied instance of the capability-probe principle.
76
+
77
+ ### 2026-06-10 | forge-harness | #comprehension-debt, #loop-engineering, #osmani, #harness-doctor
78
+ **File:** plugins/fh-meta/skills/harness-doctor/SKILL.md
79
+ Taxonomy 6→7: Comprehension Debt (runtime/operator-side, S) — loop output outpacing operator understanding, named from Osmani's Loop Engineering (canonical upstream of the supervisor-loops lineage; Cherny/Steinberger quotes — paper citation upgraded from the Korean video to this primary source, convergence count unchanged at n=5). Signals mechanical: merged change without CATALOG entry, manifest pending backlog, zero card delta. Countermeasures pre-existing (CATALOG summaries, predict-verify, card protocol).
80
+ - Decision: taxonomy now covers both loop sides — agent behavior (rows 4–6) and operator comprehension (row 7); FH's HITL PR principle recorded as a deliberate L2 cap on asset-changing loops (Osmani L3 rejected for asset mutation).
81
+
82
+ ### 2026-06-10 | forge-harness | #failure-modes, #agentic-laziness, #self-preferential-bias, #goal-drift, #harness-doctor
83
+ **File:** plugins/fh-meta/skills/harness-doctor/SKILL.md
84
+ Harness-Defect Taxonomy 3→6 classes: runtime behavioral failure modes named from dynamic-workflows discourse (LinkedIn full-text triage) — Agentic Laziness (completion claims without per-item evidence, S), Self-Preferential Bias (judged check without adversarial pairing, M), Goal Drift (no pre-compaction completion log, S). FH already had the countermeasures (golden probes/coverage, judged-pairing rule, fh_completed + card-last guard) — the import is the *naming*, making diagnosis explicit.
85
+ - Decision: signals kept mechanically checkable (list/pairing/log presence) — no judge-only signal in the taxonomy itself; sources not counted as new convergence (discourse extension, anti-inflation).
86
+
87
+ ### 2026-06-10 | forge-harness | #no-reinvention, #official-first, #claude-plugins-official, #full-harness-mode
88
+ **File:** knowledge/shared/plugin-catalog/recommended_plugins.md (+ CLAUDE.md, auto_project_mapping.md)
89
+ Operator-declared meta-harness 철칙 — no-reinvention / official-first: (a) plugin catalog gains Category 0.5, the claude-plugins-official 36-plugin inventory (12 LSP + workflow + authoring + setup groups; name-based, anti-stale "re-enumerate when recommending", role-split warnings for claude-md-management overlap); (b) New Skill gate Role-duplication criterion now also checks Tier 0 built-ins + official plugins — reinventing an official capability requires explicit justification; (c) Full-Harness Mode item 4: official-plugin scan for the mapped project's stack (recommend-only, never auto-install) — mapped-project acceleration with zero FH build.
90
+ - Decision: FH builds only what adds governance on top of official capabilities; drafting tools (skill-creator etc.) never exempt their output from the FH gate.
91
+
92
+ ### 2026-06-10 | forge-harness | #tier-0, #builtins, #security-review, #deep-research, #role-split
93
+ **File:** plugins/fh-meta/skills/plugin-recommender/SKILL.md (+ frontier-digest, CLAUDE.md, probes.md)
94
+ CC built-ins utilization imports (operator-approved; video claims verified 9/13 real — agent initially ruled skill-creator nonexistent, operator-corrected: it is an official plugin in claude-plugins-official): Tier 0 = platform built-ins added to plugin-recommender (discovery order 0, "enumerate from live session" anti-stale, governance-add guard for FH-native precedence — goal-quench pattern named). /deep-research as frontier-digest Tier-0 engine. /security-review as Pre-Publish chain item 3 (code-security axis, skip-note path). Permission-Denial Option C, code-review role split, /loop WATCH row. G-TRIG-05 probe synced — anti-stale maintenance rule's first live use.
95
+ - Decision: built-in beats plugin install at ~80% coverage; FH native beats built-in only when it adds governance.
96
+
97
+ ### 2026-06-10 | forge-harness | #ingest-gate, #contradiction-scan, #crossref-lint, #llm-wiki, #karpathy
98
+ **File:** .claude/rules/sync_push_protocols.md (+ harness-doctor SKILL.md, probes.md)
99
+ Karpathy LLM-Wiki sister-audit imports (operator-approved; convergence case n=5, citable primary source): I1 — contradiction scan as Sync step 3 (ingest gate, judged + verify-bidirectional pair): new knowledge grepped against existing claims before indexing, conflicts flagged in both files, old-claim removal is HITL. I2 — harness-doctor L4 knowledge cross-ref lint: no CATALOG entry = S-tier index orphan, no inbound ref = R-tier orphan page. Probes G-SYNC-01/G-LINT-01 added (30 total).
100
+ - Decision: scale escape (W1) deliberately NOT built — watch-item with trigger (CATALOG hundreds of entries / repeated search misses); operator-preferred first remedy = skill-splitter-style CATALOG split-mapping, RAG hybrid only after that.
101
+ - Open: npm republish (harness-doctor SKILL.md shipped) — folded into the open 1.4.8 handoff.
102
+
103
+ ### 2026-06-10 | forge-harness | #golden-probes, #offline-eval, #doc-code-coupling, #anthropic-4layer
104
+ **File:** .claude/regression/probes.md (+ templates/.git-hooks/pre-commit, prompt-regression SKILL.md)
105
+ Anthropic 4-layer sister-audit imports (operator-approved): I1 — 28-probe known-answer golden set with check classes, the standing offline eval prompt-regression auto-loads (its P-GATE-01 had already gone stale 5→6 the same day the gate grew — fixed, plus an explicit anti-stale maintenance rule). I2 — pre-commit doc-code coupling WARN (measured class, never blocks) when executables are staged without any doc asset.
106
+ - Decision: probes.md canonical-when-present, SKILL.md default matrix = Mode C fallback (single-source preserved); coupling check warns rather than blocks — the decision is made conscious, doc-neutral script fixes stay friction-free.
107
+ - Open: npm republish (prompt-regression SKILL.md is shipped) — folded into the open 1.4.8 handoff.
108
+
109
+ ### 2026-06-10 | forge-harness | #new-skill-gate, #check-class, #done-when, #g2
110
+ **File:** CLAUDE.md
111
+ G2 from the supervisor-loops audit: New Skill Creation Pre-Commit Gate extended 5→6 items — "Check-class declared": each Done When condition states its class (mandatory-pass / measured / judged per 6axis §Axis 5), and judged conditions must name their adversarial pairing (no judge-only path). The taxonomy now lives in the operating loop, not just the knowledge doc.
112
+ - Decision: applies to new skills only; existing 28 backfill opportunistically when next edited — no retroactive block.
113
+
114
+ ### 2026-06-10 | forge-harness | #selfcheck, #mandatory-pass, #npm-test, #self-application
115
+ **File:** scripts/selfcheck.sh (+ package.json)
116
+ G1 fix from the supervisor-loops 100%+ audit: FH's own shipped code (bin/fh-*.js, scripts/fh-*.sh) had zero deterministic checks. New selfcheck.sh runs node --check + bash -n over the npm-shipped executable surface and the gate-chain bash infra (15 checks), wired as `npm test` and `prepublishOnly` — a publish can no longer ship a syntactically broken executable.
117
+ - Decision: syntax-only scope (zero side effects, runs anywhere); blocking wired at publish, not commit — doc commits stay unaffected.
118
+ - Open: npm republish to ship selfcheck.sh in the tarball (machine-bound, laptop).
119
+
120
+ ### 2026-06-10 | forge-harness | #verify-axis, #check-classes, #supervisor-loops, #sister-asset
121
+ **File:** knowledge/shared/harness-core/harness_6axis_framework.md
122
+ Axis 5 check-class taxonomy added: every verify check classified as mandatory-pass (deterministic, blocking) / measured (quantitative, tracked) / judged (LLM-judge + cited evidence + corrective action — self-judges grade leniently). Judged rule: a judge verdict never passes alone — paired adversarial re-verification + evidence itself phantom-checked. Import from supervisor-loops sister-audit (operator-approved; cross-audit + proposal in companion store; corrective-action clause added after full-transcript delta check).
123
+ - Decision: descriptive labels over C1/C2/C3 numbering — avoids collision with unanchored "steel-quench C3 config" vocab (goal-quench:283, logged as fh_signal for separate fix).
124
+
11
125
  ### 2026-06-09 | forge-harness | #sidecar, #zero-config, #engine-resolution, #roadmap, #mode-c
12
126
  **File:** knowledge/shared/harness-core/multi_model_sidecar_strategy.md (+ hybrid_orchestration_architecture_roadmap.md)
13
127
  Added canonical §Sidecar Engine Resolution Protocol — Tier1 subscription-CLI → Tier2 API-key → Tier3 Claude-subagent guaranteed fallback. Principle: discovery automatic/free, invocation value-gated (intelligent default multi-AI, no hard-fail for Mode C). Wired pointers into goal-quench Step D / steel-quench runtime-adapter / harvest-loop Step 3.5-X; sim-conductor/pipeline-conductor/agent-composer inherit by reference. Source hybrid-orchestration design archived as proposed roadmap (versions→placeholders, Python pseudo-code→illustrative, non-shipped tagged Proposed). PR #80.
@@ -207,6 +321,42 @@ v1.2 release complete (PR #1–#5): harvest-loop Step 0, agent-composer worktree
207
321
 
208
322
  <!-- Time-independent reference documents -->
209
323
 
324
+ ### 2026-06-06 | pattern | parallax, multi-persona-review, synthesizer, standpoint-coverage
325
+ **File:** `knowledge/shared/patterns/multi-persona-review.md`
326
+ Generalized architecture for multi-persona parallel artifact review ("parallax") — parallel isolated personas + shared output protocol + neutral synthesizer. Domain-agnostic, IP-stripped; embodied as sim-conductor Step 1.5.
327
+
328
+ ### 2026-05-31 | reference | ecosystem-positioning, opencode, hermes, openhuman, readiness
329
+ **File:** `knowledge/shared/harness-core/fh_ecosystem_positioning.md`
330
+ FH's structural position in the AI agent framework ecosystem vs Hermes/OpenCode/OpenHuman — gap analysis, synergy map, layered readiness verdict from 3-model adversarial audit. Companion: `fh_synergy_playbook.md` (concrete workflow specs).
331
+
332
+ ### 2026-06-04 | schema | tpa, target-profile-analysis, routing, sim-conductor, steel-quench
333
+ **File:** `knowledge/shared/harness-core/tpa_schema.md`
334
+ Canonical Target Profile Analysis schema — all TPA-running skills (sim-conductor, steel-quench, phantom-quench, agent-composer) derive routing decisions from this schema. Single source for profile fields.
335
+
336
+ ### 2026-06-04 | draft | goal-quench, anthropic-issue, native-goal, token-budget
337
+ **File:** `knowledge/shared/harness-core/goal_quench_anthropic_issue.md`
338
+ Draft Anthropic GitHub issue — native /goal token budget + quality verification hook proposal. Held until arXiv number confirmation.
339
+
340
+ ### 2026-05-26 | measurement | skill-quality-rubric, maturity-score, verifiable-numbers
341
+ **File:** `knowledge/shared/harness-core/skill_quality_rubric.md`
342
+ Skill maturity score formula definition. Declaring verifiable/evolution numbers without this file violates the cold-audit "self-declaration = delete if no basis" rule.
343
+
344
+ ### 2026-06-04 | core-reference | compounding-loop, weekly-cycle, axis-6, automation
345
+ **File:** `knowledge/shared/harness-core/hub_compounding_loop.md`
346
+ Core reference doc (CLAUDE.md Consult-First table): weekly/monthly/quarterly feedback cycles + Axis-6 Compounding automation roadmap.
347
+
348
+ ### 2026-06-04 | core-reference | runtime-flow, session-chronology, subagent-delegation
349
+ **File:** `knowledge/shared/dialogue/claude_code_runtime_flow.md`
350
+ Core reference doc (CLAUDE.md Consult-First table): chronological flow of a Claude Code session (does) + sub-agent delegation flowchart.
351
+
352
+ ### 2026-06-04 | core-reference | dialogue-playbook, token-efficiency, rule-hierarchy
353
+ **File:** `knowledge/shared/dialogue/ai_dialogue_playbook.md`
354
+ Core reference doc (CLAUDE.md Consult-First table): session-start principles, token efficiency, rule hierarchy, amplifier/coach dual mode (should).
355
+
356
+ ### 2026-06-04 | glossary | terminology, meta-harness, launch-pad, transit-acceleration
357
+ **File:** `knowledge/shared/GLOSSARY.md`
358
+ Key term definitions — meta-harness, meta hub, launch pad effect, transit acceleration value, shared skill pool, operating modes. Entry point for FH-internal vocabulary; linked from CHEATSHEET.
359
+
210
360
  ### 2026-04-28 | template | maturity-roadmap, 3-phase-frame, frontier-tracking, simplification-gate
211
361
  **File:** `knowledge/shared/harness-core/hub_maturity_roadmap.md`
212
362
  Hub long-term evolution path frame. Phase I (entering maturity) → Phase II (frontier following (b)cadence) → Phase III (leading) 3-stage model + 5-criteria gate (audit automation·operations guide·external propagation·sub-agent judgment·self-diagnosis warning) + 6 indicators (seed repo·blog·citations·external adoption·self-evolving·industry original) + simplification gate (self-diagnosis + within 200 lines + unreferenced archive at each transition). General-purpose template derived from first verified operating instance.
package/CHEATSHEET.md CHANGED
@@ -3,6 +3,7 @@
3
3
  Frequently used commands and phrases.
4
4
 
5
5
  > **`<harness-root>`** = the path where you cloned this repo. Example: `~/projects/forge-harness`. Replace `<harness-root>` in the commands below with your actual path.
6
+ > Unfamiliar with a term (meta-harness, launch pad effect, transit acceleration…)? See `knowledge/shared/GLOSSARY.md`.
6
7
 
7
8
  ---
8
9
 
package/CLAUDE.md CHANGED
@@ -87,7 +87,7 @@ The forge-harness hub has a dual identity: **(a) a seed for others** + **(b) you
87
87
  When blocked by auto-mode permission denial, **do not stop at the bare denial** — turn the block into a decision the user can act on in one step:
88
88
 
89
89
  1. **State what was blocked** and why
90
- 2. **Option A — Approval mode**: show exact commands to run after switching; **Option B — Manual review**: list specific files/sections
90
+ 2. **Option A — Approval mode**: show exact commands to run after switching; **Option B — Manual review**: list specific files/sections; **Option C — Reduce future prompts**: propose built-in `/fewer-permission-prompts` when the same read-only call class keeps getting prompted
91
91
  3. **Ask which option** — one line, then wait
92
92
 
93
93
  **Sub-agent variant**: report (what was blocked + ready-to-apply content + exact unblock step) back to orchestrator — never silently fail. Switching modes lifts permission block, not FH gates — the 4-axis gate still applies.
@@ -99,27 +99,31 @@ Simplification guard: trivial denials with one obvious fix → state block + sin
99
99
  > **Full 4-step detail**: `knowledge/shared/harness-core/fh_detail_protocols.md`
100
100
  > **Read this file before Step 1 begins** — duplicate-install detection (Step 1-b) and registry scan (Step 1-c) are only defined there, not in this summary.
101
101
 
102
- **Triggers**: greetings (`hi`/`hello`/`hey`) · start intent (`resume`, `continue`, `where were we`) · new task (`new project`, `new task`) · discovery (`what is this`, `what can you do`, `first time here`)
102
+ **Triggers**: greetings (`hi`/`hello`/`hey`/`안녕` — and the same word in any language; FH is English-based but language-agnostic, a bare greeting in any tongue fires this) · start intent (`resume`, `continue`, `where were we`) · new task (`new project`, `new task`) · discovery (`what is this`, `what can you do`, `first time here`)
103
103
 
104
104
  **4-step summary**: ① Auto-read CLAUDE.md + CATALOG + session card + registry scan → ② One-line proposal (new user / exploratory / returning branches) → ③ 5-skill cascade (plugin-recommender → synergy → .claudeignore → model → verify) → ④ Approval + setup
105
105
 
106
106
  **Identity marker**: every greeting response (Step ②) opens with 🐿️ on its own line. FH's session-start signal — see `fh_detail_protocols.md` Step 2 for full greeting templates.
107
107
 
108
108
  **Guards**: explicit task-entry utterance → skip onboarding · once per session · code/debug requests → start working directly · project routing is a suggestion, mention at most once
109
+ **Metadata-is-not-intent guard**: the trigger is the user's **typed message only**. Session metadata — branch name (auto-derived from the first message, e.g. `claude/korean-greeting-*`), repo name, file paths — is **never** a task spec and never suppresses or redirects the greeting trigger. A bare greeting fires onboarding even when the branch name looks like a feature request; if the only "task" signal lives in metadata and not in what the user typed, treat the message as a greeting and run the 3-axis scaffold.
109
110
 
110
111
  ## New Skill Creation Pre-Commit Gate
111
112
 
112
- All 5 items below must pass before committing a new SKILL.md. If any fails, fix and re-commit.
113
+ All 6 items below must pass before committing a new SKILL.md. If any fails, fix and re-commit.
113
114
 
114
115
  | Item | Criterion |
115
116
  |---|---|
116
- | **Role duplication check** | Pass `/asset-placement-gate` — no overlap with existing role clusters |
117
+ | **Role duplication check** | Pass `/asset-placement-gate` — no overlap with existing role clusters, **platform built-ins (Tier 0), or `claude-plugins-official` (Tier 1 official)**. Reinventing an official capability requires explicit justification in the SKILL.md (no-reinvention rule — FH builds only what adds governance) |
117
118
  | **Description diet** | Plain text / 0 self-marketing expressions / 0 emphasis words (⭐, "critical", "groundbreaking") |
118
119
  | **Done When defined** | At least 1 explicit completion condition |
120
+ | **Check-class declared** | Each Done When condition states its check class — mandatory-pass / measured / judged (`harness_6axis_framework.md` §Axis 5). Any judged condition names its adversarial pairing — no judge-only path |
119
121
  | **Natural language triggers** | At least 3 examples that work without internal vocabulary |
120
122
  | **Independently executable** | Confirmed to work without other FH skills (or dependencies are explicitly documented) |
121
123
 
122
124
  Skills without a Done When definition automatically qualify as harness-doctor L2 M-tier.
125
+ Check-class declaration applies to **new** skills; existing skills backfill opportunistically
126
+ (when next edited), not retroactively.
123
127
 
124
128
  ---
125
129
 
@@ -156,7 +160,22 @@ FH asset modified → Axis 1 (regression_guard.sh --pr {BRANCH})
156
160
  | Forward | `phantom-quench` | Phantom references, paths that don't exist, stale external links |
157
161
  | Record | `edit-manifest` RECORD | Logs predicted impact — closes the predict-verify loop for future harvest-loop |
158
162
 
159
- ---
163
+ ### Mode D Model Notice (fires once, at the same trigger as this gate)
164
+
165
+ The moment FH self-development work begins (= the gate's own activation trigger: an FH asset is about
166
+ to be modified), check the **session model** (self-identity; if the runtime withholds it, treat as
167
+ unknown) and surface **one line** — then proceed, never block:
168
+
169
+ - Model known and opus-tier or above → no notice (already optimal).
170
+ - Model known and below opus-tier → *"이 작업은 FH 자체개발(Mode D)입니다 — 가용 최강 모델 핀을
171
+ 권장합니다 (`/model opus` 이상; 측정 근거: README §Model setup). 그대로 진행해도 floored
172
+ 디스패치가 깊이 턴을 커버하지만, 세션-레벨 설계 깊이는 핀이 좌우합니다."*
173
+ - Model unknown (runtime withholds identity) → static fallback: *"FH 자체개발 작업입니다 — 세션
174
+ 모델이 opus 이상이 아니라면 핀 전환을 권장합니다 (`/model opus`+)."*
175
+
176
+ **Guards**: once per session · advisory only — **never switch the session model** (human override is
177
+ inviolable; a pin is not a cap — tier-floor resolution §Floor governance) · field-project operation
178
+ sessions (no FH asset modification) never see this notice — the Sonnet default stays friction-free.
160
179
 
161
180
  ## Pre-Publish Surface Gate (Irreversibility Gate — Publish, not Commit)
162
181
 
@@ -169,11 +188,14 @@ first time**, especially one **derived from internal/company assets** (operator-
169
188
  private harness): `gh repo create --public`, `gh repo edit --visibility public`, a first push to a new
170
189
  public remote, `npm publish`, `twine upload`, a private→public visibility flip.
171
190
 
172
- **Required before the public action** (both must be non-LEAK) — this gate is the **umbrella that invokes
173
- both**, not a competitor to them; when publish intent is detected, fire *this* gate (it then runs both),
191
+ **Required before the public action** (all must be non-LEAK/non-FAIL) — this gate is the **umbrella that
192
+ invokes them**, not a competitor; when publish intent is detected, fire *this* gate (it then runs the chain),
174
193
  not marketplace-gate alone:
175
194
  1. `/public-surface-audit` — operator-private token scan (real username, corp asset names, home paths)
176
195
  2. `/marketplace-gate` Check 5 — broad public safety (API keys, internal domains, license)
196
+ 3. `/security-review` (built-in, when the repo ships executable code) — code-security pass on the
197
+ publishable surface; complements 1–2 which scan tokens/metadata, not code behavior. Skip note
198
+ (`skipped: docs-only repo` or `skipped: built-in unavailable`) if not applicable
177
199
 
178
200
  > Routing vs the rows below: `/marketplace-gate` alone = "is this ready to **list on a marketplace**?";
179
201
  > `/public-surface-audit` alone = reactive "did I leak a token?"; **this gate** = the *act of going
@@ -193,6 +215,32 @@ checklist** (`templates/PRE-PUBLISH-CHECKLIST.md`) the operator runs on any repo
193
215
 
194
216
  ---
195
217
 
218
+ ## Destructive-Op Gate (Irreversibility Gate — Delete/Rewrite, not Commit)
219
+
220
+ **Order invariant: enumerate → recover → destroy, never destroy-then-check.** Deletion and history
221
+ rewrite are irreversible in the way publish is — except the loss is *silent* (nobody sees what a
222
+ deleted branch was carrying).
223
+
224
+ **When this gate fires** — *before* any of: branch deletion (local or remote), history rewrite /
225
+ force-push, scrub of tracked history, bulk deletion of session records / tracks content.
226
+
227
+ 1. **Enumerate (measured)**: `bash templates/predelete_check.sh <repo> [base]` — per branch: commits
228
+ off base + unique paths. Verdicts: SAFE (fully merged) · CHECK (0 unique paths but commits off
229
+ base — shared files may hold *newer* content, e.g. an unmerged session card) · REVIEW (unique
230
+ paths — recovery mandatory).
231
+ 2. **Recover (judged — depth-sensitive)**: every CHECK/REVIEW item gets a content-direction look;
232
+ live un-integrated state (cards · handoffs · signals · session records) is integrated to main
233
+ **before** anything is deleted. This step exists because the loss class is silent — run it at the
234
+ strongest available tier (floor semantics, §Tier-floor); a below-floor pass is provisional.
235
+ 3. **Destroy** only what passed — REVIEW blocks a scripted delete chain (script exits 1).
236
+
237
+ > Origin: 2026-06-10 branch cleanup — pre-deletion enumeration recovered a parallel session's card
238
+ > (weekly-audit completion + #88 merge state) that existed **only on an unmerged branch** with zero
239
+ > unique paths: exactly the CHECK class, invisible to "is it merged?" intuition. Deletion without the
240
+ > gate destroys live state without anyone noticing.
241
+
242
+ ---
243
+
196
244
  ## Autonomous Initiative Layer — Context-Triggered Skill Proposals (Active Throughout Session)
197
245
 
198
246
  At any point during a session, when the following signals are detected, propose the relevant skill in one line.
@@ -205,7 +253,8 @@ Proposal format: `"I see [X]. Want me to run /[skill] to [one-line description]?
205
253
  | "wrap up this week", "review", "audit", "weekly", "retrospective" | `/harvest-loop` |
206
254
  | "pull this into FH", "reverse-harvest", "worth keeping", "harvest pattern", "field pattern" | `/field-harvest` |
207
255
  | "harness is complex", "too many skills", "check structure", "harness" | `/harness-doctor` |
208
- | "review this PR", "check diff", "code review" | `/hub-cc-pr-reviewer` |
256
+ | "review this PR", "check diff", "code review" | code diff → built-in `/code-review`·`/review` · FH-asset coherence → `/hub-cc-pr-reviewer` (role split) |
257
+ | "keep watching X", "poll this", "check every N minutes", recurring WATCH item | built-in `/loop` (interval runner) — pair with the WATCH list, don't hand-poll |
209
258
  | "are these in sync", "synergy", "can these integrate", "any overlap" | `/cross-ecosystem-synergy-detection` |
210
259
  | "latest trends", "frontier", "external resources" | `/frontier-digest` |
211
260
  | "orchestrate agents", "parallel dispatch", "combine skills", "multiple agents" | `/agent-composer` |
@@ -218,6 +267,7 @@ Proposal format: `"I see [X]. Want me to run /[skill] to [one-line description]?
218
267
  | "add to marketplace", "OK to publish", "pre-publish check" | `/marketplace-gate` |
219
268
  | "did I leak anything", "public surface audit", "private token scan", "is my split clean", "check tracked files for private tokens" | `/public-surface-audit` |
220
269
  | "publish", "make public", "make this repo public", "go public", "gh repo create --public", "flip to public", "first public push", "publish the package", "npm publish", "twine upload" (publish intent — **proactive**, fire *before* the action) | **Pre-Publish Surface Gate** (see above → `/public-surface-audit` + `/marketplace-gate` Check 5 must PASS first) |
270
+ | "delete the branch", "브랜치 삭제", "브랜치 정리", "clean up branches", "force-push", "rewrite history", "지워도 돼?" (destructive intent — **proactive**, fire *before* the action) | **Destructive-Op Gate** (see above → enumerate → recover → destroy; `templates/predelete_check.sh`) |
221
271
  | "look at this again", "is this right", "counterargument", "re-validate" | `/verify-bidirectional` |
222
272
  | "MCP failing", "tool keeps erroring", "circuit-breaker", "same error looping" | `/mcp-circuit-breaker` |
223
273
  | "token budget", "how expensive", "estimate tokens", "will this cost a lot" | `/token-budget-gate` |
package/README.md CHANGED
@@ -166,15 +166,15 @@ hardened by attack, and only then does it ship faster, for having survived.
166
166
  |---|---|---|
167
167
  | **Forge** | shape the raw project into a harness — raise its floor | `install-wizard`, "harness-ify this project" |
168
168
  | **Quench** | harden it by attack — the cold pass leaves standing only what is sound | `steel-quench` · `phantom-quench` |
169
- | **Temper** | take the brittleness back out of the hardened asset | *the movement being forged next* |
169
+ | **Temper** | take the brittleness back out of the hardened asset | `steel-quench` Wave-T · `templates/temper_check.sh` |
170
170
  | → **Accelerate** | a blade that survived the forge cuts faster | `goal-quench` — *Pass → Accelerate* |
171
171
 
172
- Three movements are shipped; **temper** is the direction aheadand naming the movement we have *not*
173
- finished is the point (see [`ETHOS.md`](docs/ETHOS.md#the-forge)). Around the forge, two more signatures keep
174
- it running: `harvest-loop` (each session's lessons become permanent skills) and `agent-composer`
175
- (orchestrate the dispatch). The other skills wait until you need them — full list below.
172
+ All four movements ship. Temper was named before it was built deliberately (see
173
+ [`ETHOS.md`](docs/ETHOS.md#the-forge)) — and shipped once measurement runs validated it. Around the forge,
174
+ two more signatures keep it running: `harvest-loop` (each session's lessons become permanent skills) and
175
+ `agent-composer` (orchestrate the dispatch). The other skills wait until you need them — full list below.
176
176
 
177
- ## 33 skills · 5 agents
177
+ ## 33 skills · 8 agents
178
178
 
179
179
  <details>
180
180
  <summary>Full asset activation check</summary>
@@ -237,18 +237,48 @@ it running: `harvest-loop` (each session's lessons become permanent skills) and
237
237
  Claude Code does not auto-select models by task complexity — you configure this once.
238
238
 
239
239
  ```bash
240
- /model opus # recommended for forge-harness pin Opus so gate/verification turns never silently drop
240
+ /model sonnet # recommended defaultFH dispatches stronger models itself where they matter
241
241
  ```
242
242
 
243
243
  | Command | Who runs what | Best for |
244
244
  |---|---|---|
245
- | `/model opus` | Opus handles everything | **FH default** — verification, governance, agent reasoning |
245
+ | `/model sonnet` | Sonnet session; FH dispatches higher-tier sub-agents on declared floors | **FH default** — operation + routine dev |
246
+ | `/model opus` | Opus handles everything | Harness-editing sessions (Mode D) · maximum depth on every turn |
246
247
  | `/model opusplan` | Opus *plans* · Sonnet executes *(when Opus engages)* | Cost-conscious routine coding — see caveat |
247
- | `/model sonnet` | Sonnet handles everything | Fast, simple tasks (no FH gates) |
248
248
 
249
- **Why `/model opus` for FH**: FH is a *quality* harness — its value lives in verification turns (steel-quench, phantom-quench, the 4-axis gate, agent reasoning) that must not silently run on a weaker model. `opusplan`'s Opus engagement is **not guaranteed**: in a measured 10-turn run it used Opus on **0** turns (CC classifies few turns as "plan-mode"), so the reasoning FH leans on quietly ran on Sonnet — pinning `/model opus` removed that variance (22/22 turns Opus in the follow-up). **Sub-agent dispatch** model is set by the dispatch's own `model` parameter; the session model/plan-mode does **not** propagate Opus to sub-agents, so pin Opus there too for adversarial/verification agents. Use `opusplan` only when the task is mostly routine edits and you accept that Opus may not engage.
250
-
251
- > **By role**: editing the harness itself → `/model opus` (full). Running FH on a field project `/model opus` for any gated/verification work; `opusplan` is an acceptable cost tradeoff for routine coding, with the caveat above. Sub-agent token costs are CC-visible in the session jsonl under `message.model`.
249
+ **Why default Sonnet now works**: measured (see §Model setup evidence note below), *operating* FH is
250
+ nearly model-flat — the rules in context do most of the work. What still needs a stronger model is a
251
+ small set of depth-sensitive turns, and FH handles those itself: **some skills and agents declare a
252
+ model-tier floor** (e.g. `quench-challenger` floors at opus) and are dispatched as sub-agents at the
253
+ floor tier when your environment can reach it — your session model stays untouched. **FH never switches
254
+ your session model**: a default you set by hand is followed; floors apply only to FH's own sub-agent
255
+ dispatches. If your environment tops out below a floor (e.g. Sonnet-only API routing), the floored
256
+ asset still runs at the best available tier with an explicit `below-floor` flag in its output — degraded
257
+ delivery is visible, never silent (tier-floor resolution: `knowledge/shared/harness-core/multi_model_sidecar_strategy.md §Tier-floor`).
258
+
259
+ **`opusplan` caveat (measured)**: its Opus engagement is **not guaranteed** — in a measured 10-turn run
260
+ it used Opus on **0** turns (CC classifies few turns as "plan-mode"). If you want Opus on every turn,
261
+ pin `/model opus` (22/22 turns Opus in the follow-up run). **Sub-agent dispatch** model is set by the
262
+ dispatch's own `model` parameter; the session model/plan-mode does **not** propagate to sub-agents.
263
+
264
+ > **By role**: running FH (field projects, gates, routine dev) → `/model sonnet` + let the floors
265
+ > escalate. Editing the harness itself (Mode D) → pin the strongest model you have — harness
266
+ > *self-development* is where tier depth measurably pays (design-increment finding), while operation
267
+ > does not. Sub-agent token costs are CC-visible in the session jsonl under `message.model`.
268
+
269
+ **Measured, not asserted** (2026-06-10, worked example): on a 30-point blind rule-application battery,
270
+ *operating* FH was nearly model-flat — Opus 4.8 / Sonnet 4.6 / Haiku 4.5 scored **100 / 97 / 94** against
271
+ a top-tier anchor at 100, with the rules in context doing most of the work. The tiers separated only on
272
+ above-rubric *design* increments (developing the harness, not running it) — which is why the default is
273
+ Sonnet with **tier-floored dispatch** covering the depth-sensitive turns, and a pinned stronger model is
274
+ recommended only for harness-editing sessions. Details: `docs/OUTPUT_EVIDENCE.md` §Validation signals.
275
+
276
+ **Measured, not asserted** (2026-06-10, worked example): on a 30-point blind rule-application battery,
277
+ *operating* FH was nearly model-flat — Opus 4.8 / Sonnet 4.6 / Haiku 4.5 scored **100 / 97 / 94** against
278
+ a top-tier anchor at 100, with the rules in context doing most of the work. The tiers separated only on
279
+ above-rubric *design* increments (developing the harness, not running it) — which is exactly why the
280
+ recommendation stays Opus for harness-editing and gate turns, while field operation tolerates lower tiers.
281
+ Details: `docs/OUTPUT_EVIDENCE.md` §Validation signals.
252
282
 
253
283
  If you use external CLIs (Gemini, Codex, `gh copilot`) as sidecars, their costs are billed to their own quota and not visible in CC's token display.
254
284
 
@@ -293,4 +323,5 @@ External convergence:
293
323
  | [`AGENTS.md`](AGENTS.md) | Runtime agent specs |
294
324
  | [`CATALOG.md`](CATALOG.md) | Past work search index |
295
325
  | [`CONTRIBUTING.md`](docs/CONTRIBUTING.md) | How to contribute skills and patterns |
326
+ | [`tracks/_contrib/`](tracks/_contrib/README.md) | **Consent lane** — share a de-identified work session; the repo compounds across operators, not just locally |
296
327
  | [`fh_integration_contract.md`](knowledge/shared/harness-core/fh_integration_contract.md) | Governance gate spec |
@@ -12,6 +12,7 @@ If you'd like to make forge-harness better, pull requests are welcome.
12
12
  | **Templates** | Common files to add under `templates/` |
13
13
  | **Documentation** | README, skill description refinement, typo fixes |
14
14
  | **Field pattern harvest** | Proposing a pattern discovered in real use as a skill (see `/field-harvest`) |
15
+ | **Session contribution (consent lane)** | Share a de-identified work session in `tracks/_contrib/` — see §Consent Lane below |
15
16
 
16
17
  ## Choosing a Contribution Path — Check First
17
18
 
@@ -24,6 +25,21 @@ If you'd like to make forge-harness better, pull requests are welcome.
24
25
 
25
26
  > **Adding a new skill = Full path required**. Improving an existing skill = Lightweight path available.
26
27
 
28
+ ## Consent Lane — share a session (`tracks/_contrib/`)
29
+
30
+ Everything under `tracks/` is private by design **except** `tracks/_contrib/` — the consent lane.
31
+ Placing a session file there is your explicit consent to publish it; it lands via PR through the lane
32
+ gate (`/public-surface-audit` + `/marketplace-gate` Check 5 + ingest contradiction scan + reviewer pass).
33
+
34
+ - Start from `templates/contrib_session.md` → `tracks/_contrib/{your-handle}/{topic}/session_YYYY_MM_DD_{slug}.md`
35
+ - **De-identify first** (no employer/project/colleague names, paths, domains, credentials) — the gate
36
+ re-checks, but scrubbing is yours first
37
+ - Full charter, gate table, and what-happens-after-merge: [`tracks/_contrib/README.md`](../tracks/_contrib/README.md)
38
+ - Overhead: minimal — the skill-PR rules don't apply (it's a session, not a skill); only the lane gate + frontmatter floor
39
+
40
+ Merged sessions get CATALOG credit under your handle, and `harvest-loop` distills repeating patterns
41
+ into shared knowledge/skills — your session becomes compound interest for every cloner.
42
+
27
43
  ## PR Rules (Short Version)
28
44
 
29
45
  1. **New skill** → Create `plugins/fh-meta/skills/{name}/SKILL.md` + add version line to `plugins/fh-meta/CHANGELOG.md`
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "@chrono-meta/fh-gate",
3
- "version": "1.4.7",
3
+ "version": "1.4.9",
4
4
  "description": "FH runtime adapters — run FH governance, skills, and agents via Claude or Codex with machine-parseable gates.",
5
5
  "license": "MIT",
6
6
  "keywords": [
@@ -23,7 +23,9 @@
23
23
  "fh-goal": "bin/fh-goal.js"
24
24
  },
25
25
  "scripts": {
26
- "prepare": "chmod +x bin/fh-gate.js bin/fh-run.js bin/fh-goal.js scripts/fh-gate.sh scripts/fh-run.sh scripts/fh-goal.sh"
26
+ "prepare": "chmod +x bin/fh-gate.js bin/fh-run.js bin/fh-goal.js scripts/fh-gate.sh scripts/fh-run.sh scripts/fh-goal.sh",
27
+ "test": "bash scripts/selfcheck.sh",
28
+ "prepublishOnly": "bash scripts/selfcheck.sh"
27
29
  },
28
30
  "engines": {
29
31
  "node": ">=16"
@@ -54,6 +56,7 @@
54
56
  "scripts/fh-gate.sh",
55
57
  "scripts/fh-run.sh",
56
58
  "scripts/fh-goal.sh",
59
+ "scripts/selfcheck.sh",
57
60
  "plugins/fh-meta/skills",
58
61
  "plugins/fh-meta/agents",
59
62
  "plugins/fh-commons/skills",
@@ -19,7 +19,14 @@ description: Dedicated quench attack-prescription synthesis agent — Devil (6-a
19
19
  If even one S-tier finding exists, registration is blocked. A-tier and below allow registration after fix recommendations.
20
20
  </commentary>
21
21
  </example>
22
- model: sonnet
22
+ model: opus
23
+ # model is a HARD FLOOR (tier-floor resolution — multi_model_sidecar_strategy.md §Tier-floor):
24
+ # adversarial increment-finding is the depth-sensitive class. floor: hard semantics — the floor
25
+ # outranks diversity: prefer any floor-meeting engine (incl. native Tier-3 at opus) over a below-floor
26
+ # diversity engine. Only when NO engine reaches the floor, dispatch at best available with the
27
+ # below-floor header (e.g. "challenger: sonnet (below-floor; floor=opus)") — never hard-fail.
28
+ # Below-floor judged verdicts are PROVISIONAL (not gate-PASS evidence) until floor-tier re-run or
29
+ # explicit operator acceptance; the weekly audit is the standing re-quench consumer.
23
30
  color: red
24
31
  tools: Read, Grep, Glob
25
32
  version: 0.2
@@ -29,11 +29,48 @@ model: sonnet
29
29
 
30
30
  ```
31
31
  Priority:
32
+ 0. /deep-research built-in available (check live session skill list)
33
+ → use it as the collection+verification engine (staged source gathering,
34
+ cross-checking, cited synthesis) — Tier-0 route, no API key needed
32
35
  1. ANTHROPIC_API_KEY environment variable → Claude Sonnet
33
36
  2. Neither → WebSearch mode (raw data only, no synthesis)
34
37
  ```
35
38
 
36
- Report detection result in one line. Example: `🔑 Using Claude Sonnet`
39
+ Report detection result in one line. Example: `🔑 Using Claude Sonnet` / `🔑 Engine: /deep-research (built-in)`
40
+
41
+ ---
42
+
43
+ ## Step 0.5. Operator Intake (speculative-interview arm — walled channels)
44
+
45
+ On **cadence-triggered** runs (7d), ask the operator one line before collecting:
46
+
47
+ > *"이번 주에 본 벽 뒤 소스(YouTube·LinkedIn·X 등 기계가 못 닿는 링크/요약)가 있으면 던져 주세요 — triage해서 기록합니다. 없으면 그냥 진행할게요."*
48
+
49
+ - Operator may skip — zero pressure; the autonomous arms (Step 1) run regardless.
50
+ - Why: walled channels return 403 to machine fetch — **the operator is the only wide-net sensor
51
+ for them**. This arm turns ad-hoc link-throwing into a scheduled intake (manual-validated n=2).
52
+ - Received sources route to the sister-asset/triage flow with its **lightweight path**: C-tier
53
+ (territory already covered) = one-paragraph entry only; full cross-audit reserved for A/B-tier.
54
+ Partial wall-bypass is allowed first: try WebSearch + secondary sources before declaring unfetchable.
55
+ - **Video sources (local/laptop only — cloud VMs typically 403 on video hosts)**: resolve a
56
+ *video-harvest* capability via the Sidecar Engine Resolution Protocol
57
+ (`multi_model_sidecar_strategy.md`) — probe by **capability, not engine name**. A CLI that is a
58
+ valid sidecar for other tasks is not automatically a video-harvest engine.
59
+ **Tier 1 — a natively multimodal CLI that ingests the URL directly**: probe for one, then route
60
+ to it (pre-EOL example invocation: `gemini --skip-trust -p "{URL}"` → timestamped summary). ⚠️
61
+ **the direct `gemini` CLI is being sunset (vendor EOL 2026-06-18)** — after that probe `agy` (the
62
+ Antigravity router-shell successor, same class) or the Gemini API; see
63
+ `multi_model_sidecar_strategy.md §Binary names churn`. A coding-agent CLI with **no native
64
+ video/transcript access** (e.g. `codex`) cannot do this — it burns tokens, recovers only
65
+ metadata (title), then asks for a pasted transcript; do not route video to it. Agentic
66
+ router-shells (`agy`) get approval-mode first.
67
+ **Tier 3** (Tier 2 = router-shell / Gemini API, see the protocol) **— conditional fallback (not
68
+ guaranteed)**: `yt-dlp --write-auto-subs --skip-download` then summarize the transcript — fine
69
+ for talk-style content. **Probe the environment first**:
70
+ `yt-dlp --version && python3 -c "import curl_cffi" && command -v ffmpeg` — Tier 3 needs all three
71
+ (`yt-dlp`, `curl_cffi` impersonation target, `ffmpeg`), and the timedtext endpoint may return
72
+ HTTP 429; if the probe fails, fall through rather than assuming it works. Unresolvable (cloud, no sidecar; or all tiers blocked) →
73
+ operator summary remains the path, as today.
37
74
 
38
75
  ---
39
76
 
@@ -280,14 +280,14 @@ Runs after Step C (or after Step B if no capability gap). Selects an adversarial
280
280
 
281
281
  | Task scope signal | Sidecar | Invocation point |
282
282
  |---|---|---|
283
- | Code quality review / new SKILL.md / governance gate change | `steel-quench` C3 config (Gemini sidecar if available) | Post-/goal, before pipeline-conductor |
283
+ | Code quality review / new SKILL.md / governance gate change | `steel-quench` cross-provider challenger (Gemini sidecar if available; Tier 3 Claude sub-agent fallback per the Sidecar Engine Resolution Protocol) | Post-/goal, before pipeline-conductor |
284
284
  | Architecture design / cross-project dependency | `agent-composer` multi-model panel (if external CLIs available; otherwise single-Claude sub-agent) | Parallel to /goal as separate Agent |
285
285
  | External publication / marketplace-gate / skill release | `sim-conductor` + `steel-quench` Wave 5 | Post-/goal quality gate |
286
286
  | No signal match (default) | None — pipeline-conductor handles quality alone | — |
287
287
 
288
288
  Write resolved sidecar to `.active`:
289
289
  ```
290
- sidecar: none | steel-quench-c3 | agent-composer-panel | sim-conductor | {cli-name}
290
+ sidecar: none | steel-quench-crossprovider | agent-composer-panel | sim-conductor | {cli-name}
291
291
  sidecar_rationale: {one-line reason — which scope signal triggered this}
292
292
  ```
293
293
 
@@ -410,7 +410,7 @@ After pipeline-conductor completes, read `sidecar:` from `.pending`. If non-none
410
410
 
411
411
  | Sidecar | Wait for | Failure action |
412
412
  |---|---|---|
413
- | `steel-quench-c3` | Wave 2 convergence: PASS or CONDITIONAL_PASS | FAIL → reopen Phase 2, surface blocking findings to user |
413
+ | `steel-quench-crossprovider` | Wave 2 convergence: PASS or CONDITIONAL_PASS | FAIL → reopen Phase 2, surface blocking findings to user |
414
414
  | `agent-composer-panel` | Panel findings file written + incorporated into pipeline-conductor context | Not yet written → re-run pipeline-conductor with panel findings |
415
415
  | `sim-conductor` + `steel-quench-w5` | Both: sim-conductor Area A PASS **and** steel-quench W5 convergence PASS | Either FAIL → block Done When; surface failing verdict |
416
416
 
@@ -438,7 +438,7 @@ After each goal-quench run, append to `tracks/_meta/goal_quench_{YYYY-MM-DD}.md`
438
438
  actual_vs_estimate_ratio: N.N # actual / estimated (e.g., 4.7 means actual was 4.7× estimate)
439
439
  budget_verdict: GREEN/YELLOW/ORANGE/RED
440
440
  pipeline_verdict: CLEAN/PENDING/BLOCKED/ESCALATE
441
- sidecar: none | steel-quench-c3 | agent-composer-panel | sim-conductor | {cli-name}
441
+ sidecar: none | steel-quench-crossprovider | agent-composer-panel | sim-conductor | {cli-name}
442
442
  sidecar_type: none | cc-subagent | external-cli # cc-subagent = isolated Sonnet context; external-cli = untracked
443
443
  sidecar_model: none | sonnet | {model-name} # standard baseline: sonnet; external CLIs vary by tier
444
444
  orchestrator_tokens: N | unknown # Opus orchestrator CC tokens (pro/max only; visible in CC)
@@ -107,6 +107,7 @@ File: {path}
107
107
  | .claude/rules files unmodified 90+ days | R-tier |
108
108
  | settings.json JSON syntax error | M-tier |
109
109
  | Hooks in settings.json (PostToolUse/Stop) that don't fire in Agent View | S-tier or M-tier |
110
+ | Public-surface FH recommendation or default changed — no N-shot measurement evidence traceable in session records or PR body | S-tier |
110
111
 
111
112
  Hook divergence verdict: 0 hooks = Normal · 1+ hooks (session-end/Stop) = S-tier · 1+ hooks (PostToolUse file writes or external API) = M-tier (data loss risk in Agent View).
112
113
 
@@ -115,6 +116,10 @@ Hook divergence verdict: 0 hooks = Normal · 1+ hooks (session-end/Stop) = S-tie
115
116
  - Tracks with no sync in 30+ days → R-tier
116
117
  - CATALOG.md: 5+ open items in recent 5 sessions → S-tier
117
118
  - Field project CLAUDE.md missing → S-tier
119
+ - **Knowledge cross-ref lint**: `knowledge/**/*.md` with no CATALOG.md entry → S-tier
120
+ (index orphan — unfindable via CATALOG-first search); with no inbound reference from
121
+ any CLAUDE.md / rules / SKILL.md / knowledge doc → R-tier (orphan page).
122
+ Mechanical: `grep -L` filenames against CATALOG.md, then `grep -rl` for inbound refs.
118
123
 
119
124
  ### Step 6. L5 — Pattern Analysis *(FH only)*
120
125
 
@@ -154,8 +159,19 @@ Hook divergence verdict: 0 hooks = Normal · 1+ hooks (session-end/Stop) = S-tie
154
159
  | **Context Drift** | Stale paths in CLAUDE.md (L3) · rules unmodified 90+ days (L3) · CLAUDE.md over threshold (L2) | S |
155
160
  | **Schema Misalignment** | Plugin count drift · SKILL.md missing `Done When` (undocumented contract) | M |
156
161
  | **State Degradation** | tracks/ no sync 30+ days (L4) · INACTIVE_90D skills (L5-A) · orphaned memory entries | M |
162
+ | **Agentic Laziness** *(runtime)* | Completion claims without per-item evidence — "all N done" with no enumerable list in session records / PR bodies · Done When lacking any mandatory-pass condition | S |
163
+ | **Self-Preferential Bias** *(runtime)* | judged-class check without a named adversarial pairing (gate-rejectable for new skills; scan existing for backfill) · a self-graded verdict cited as sole completion evidence | M |
164
+ | **Goal Drift** *(runtime)* | Long session with S/A-tier work but no pre-compaction completion log (`fh_completed_*` missing) · early-stated constraints absent from card/manifest at close | S |
165
+ | **Comprehension Debt** *(runtime/operator)* | Merged FH-asset change with no CATALOG entry · edit_manifest `validation_status: pending` backlog piling up unverified · session closed with work done but zero card delta | S |
166
+ | **Evidence Gap** *(FH self-dev)* | Public-surface FH recommendation or default changed with no N-shot measurement evidence traceable in session records or PR history (L3 check) | S |
157
167
 
158
168
  Signal in 2+ classes → escalate one tier.
169
+ Rows 1–3 = structural (2026-06-02 frontier diagnosis). Rows 4–6 = runtime behavioral, agent-side —
170
+ named from dynamic-workflows discourse (2026-06). Row 7 = runtime, operator-side — named from
171
+ loop-engineering discourse (Osmani, 2026-06): loop output outpacing operator understanding. FH
172
+ countermeasures already exist and are what the signals check for: golden probes + multi-persona
173
+ coverage (laziness) · judged-pairing rule (bias) · pre-compaction completion log + card-last guard
174
+ (drift) · CATALOG 3-line summaries + manifest predict-verify + card protocol (comprehension debt).
159
175
 
160
176
  ---
161
177
 
@@ -247,6 +263,8 @@ Verdict: ✅ CONSISTENT · 🟧 INCONSISTENT (fix before push) · 🟩 REVIEW (i
247
263
 
248
264
  **This skill Done When = "prescription report output complete".** Actual resolution of M/S/R items belongs to user or follow-up work.
249
265
 
266
+ **Check classes** (`harness_6axis_framework.md` §Axis 5): report-output completions above = mandatory-pass (report exists or not). M/S/R tier *assignments* = judged — paired with verify-bidirectional when the user challenges a tier.
267
+
250
268
  Verdict: PASS (M-tier 0, "Structure healthy") | CONDITIONAL_PASS (S/R remain, no M) | FAIL (1+ M-tier found) | ESCALATE (structural ambiguity before prescription can be issued)
251
269
 
252
270
  **Three-Doctor Loop chain** (auto-propose after prescription report):
@@ -49,6 +49,7 @@ Tier is **independent of platform origin** (Anthropic / OpenAI / community). A w
49
49
 
50
50
  | Tier | Criteria | Sources |
51
51
  |---|---|---|
52
+ | **Tier 0** | Platform built-in — ships with the runtime, zero install, zero token cost to discover | Claude Code built-in skills/commands (e.g. `/deep-research`, `/code-review`, `/review`, `/security-review`, `/batch`, `/loop`, `/fewer-permission-prompts`, `/goal`, `/rewind`, `/team-onboarding`) — inventory varies by version/environment: enumerate from the live session, do not assume |
52
53
  | **Tier 1** | Marketplace-listed + performance-validated (benchmark data or production usage evidence) | Anthropic official · Codex marketplace verified · CC marketplace verified · FH community reviewed (steel-quench + sim-conductor validated) |
53
54
  | **Tier 2** | Marketplace-listed, no explicit performance data | Any marketplace source (CC marketplace · Codex marketplace · npm · GHE) |
54
55
  | **Tier 3** | Not marketplace-listed, source-available (GitHub/npm) | Repo-only agents/plugins |
@@ -87,13 +88,15 @@ When queried for a specific capability (e.g., "adversarial reviewer for bash cod
87
88
 
88
89
  **Discovery order (stop when sufficient Tier 1 candidates found):**
89
90
 
91
+ 0. **Platform built-ins (Tier 0)** — does a built-in skill/command already cover the capability? Check the live session's available-skills list before any plugin search. A built-in that covers ~80% beats installing a plugin for the rest
90
92
  1. **Installed locally** — `.claude/agents/`, `plugins/` in current cwd
91
93
  2. **FH native skills** — always-loaded knowledge in `plugins/fh-meta/` and `plugins/fh-commons/`
92
94
  3. **Claude Code marketplace** — `claude mcp search [capability]` or known CC registry (see verified targets above)
93
95
  4. **Codex marketplace** — `npx @openai/codex list-agents [capability]` or known Codex registry
94
96
  5. **npm ecosystem** — `@chrono-meta/`, `@anthropic/`, and other known-quality scoped packages
95
97
 
96
- **Discovery priority**: installed > FH native > Tier 1 (any platform) > Tier 2 > Tier 3 > Tier 4
98
+ **Discovery priority**: built-in (Tier 0) > installed > FH native > Tier 1 (any platform) > Tier 2 > Tier 3 > Tier 4
99
+ **Tier 0 guard**: FH native wins over a built-in only when the FH skill adds governance the built-in lacks (e.g. `/goal` → `goal-quench` adds budget+quality gates; code diff review stays with built-in `/code-review`, FH-asset coherence with `hub-cc-pr-reviewer`)
97
100
 
98
101
  **When sim-conductor chains here for persona discovery**: apply the same platform-aware search scoped to persona/simulation/review capability tags. Return discovered agents with their Tier rating so sim-conductor can decide whether to install or use a built-in brief.
99
102
 
@@ -57,9 +57,12 @@ Check for custom probes:
57
57
  ls .claude/regression/probes.md 2>/dev/null || echo "NO_CUSTOM_PROBES"
58
58
  ```
59
59
 
60
- **If custom probes exist**: load and use them.
60
+ **If custom probes exist**: load and use them. The hub repo ships its golden probe set
61
+ (known-answer offline eval, 28 probes with check classes) at exactly this path — when
62
+ present it is canonical and supersedes the default matrix below.
61
63
 
62
- **If no custom probes**: use the default probe matrix below.
64
+ **If no custom probes** (e.g. Mode C install without the hub repo): use the default
65
+ probe matrix below.
63
66
 
64
67
  #### Default Probe Matrix
65
68
 
@@ -71,7 +74,7 @@ ls .claude/regression/probes.md 2>/dev/null || echo "NO_CUSTOM_PROBES"
71
74
  | `P-TRIGGER-03` | `harness is complex` | harness-doctor proposed | CLAUDE.md §Autonomous |
72
75
  | `P-CHAIN-01` | `/field-harvest` | harvest-loop close-chain referenced (wrap-up deferred to CLAUDE.md session-close chain — field-harvest has no inline sim-conductor gate) | field-harvest SKILL.md |
73
76
  | `P-CHAIN-02` | `/apex-review` | Conditional verdict present (apex-review vocabulary: "Conditional" / "Conditionally passed") | apex-review SKILL.md |
74
- | `P-GATE-01` | new skill commit | New Skill Pre-Commit Gate (5 items) invoked | CLAUDE.md §Gate |
77
+ | `P-GATE-01` | new skill commit | New Skill Pre-Commit Gate (6 items, incl. check-class declared) invoked | CLAUDE.md §Gate |
75
78
  | `P-CLOSE-01` | `wrap up` / `good work` | Session close chain (4-step) triggered | CLAUDE.md §Wrap-up |
76
79
 
77
80
  ---
@@ -104,7 +107,7 @@ For each affected probe, evaluate:
104
107
  - For session close chain: confirm all 4 steps in CLAUDE.md §Wrap-up
105
108
 
106
109
  **4-c. Gate presence check** — Are gate conditions still enforced?
107
- - Pre-commit gate: confirm all 5 items present in CLAUDE.md
110
+ - Pre-commit gate: confirm all 6 items present in CLAUDE.md (incl. check-class declared)
108
111
  - Conditional-pass gate: confirm `CONDITIONAL_PASS` logic in affected SKILL.md
109
112
 
110
113
  Output per probe:
@@ -161,10 +164,10 @@ Only update on explicit `y` — never auto-update.
161
164
 
162
165
  ## Done When
163
166
 
164
- - All affected probes are evaluated (PASS / FAIL / SKIP)
165
- - Regression report is output with clear PASS/FAIL verdict
166
- - If FAIL: specific file + line fix is recommended
167
- - Baseline updated only on explicit user approval
167
+ - All affected probes are evaluated (PASS / FAIL / SKIP) — class: mandatory-pass
168
+ - Regression report is output with clear PASS/FAIL verdict — class: mandatory-pass
169
+ - If FAIL: specific file + line fix is recommended — class: judged, paired with verify-bidirectional (the fix recommendation is re-checked, not trusted as-is)
170
+ - Baseline updated only on explicit user approval — class: mandatory-pass (HITL)
168
171
 
169
172
  ---
170
173
 
@@ -41,6 +41,7 @@ A designer's anxiety is most dangerous when vague. steel-quench breaks that anxi
41
41
  | **Wave 4** (optional) | Meta-Aware Adversary — AI uses its own nature as attack vector | Zero new S-grade + AI-specific criteria |
42
42
  | **Wave-P3** (optional) | Gate-passage re-attack — when an upstream gate declares PASS, re-attack the just-passed artifact on Coverage / Narrative / False-confidence | All 3 dimensions Attack Failed |
43
43
  | **Wave 5** (optional) | Multi-Team Adversarial Panel — external CLIs or cross-session Claude | Zero new S-grade cross-team |
44
+ | **Wave-T** (after convergence) | Temper — measure complexity the quench *added*; flag over-hardening | τ-PASS or named τ-FAIL |
44
45
 
45
46
  ---
46
47
 
@@ -68,7 +69,7 @@ Skip: [list of skipped waves with reason]
68
69
  External CLIs available: [yes/no → Wave 5 available]
69
70
  ```
70
71
 
71
- **Degraded coverage rule**: if a high-weight wave or capability is skipped (user choice, unavailable tool, or scope=internal), flag explicitly in the output header — do not silently proceed.
72
+ **Degraded coverage rule**: if a high-weight wave or capability is skipped (user choice, unavailable tool, or scope=internal) **or runs below its declared model-tier floor** (tier-floor resolution, `multi_model_sidecar_strategy.md §Tier-floor`), flag explicitly in the output header — do not silently proceed. Below-floor example: `challenger: sonnet (below-floor; floor=opus)`; a judged verdict produced below floor is a re-quench candidate once a floor-tier is available.
72
73
 
73
74
  ---
74
75
 
@@ -194,6 +195,37 @@ domain-coupled (a spec→test-case gate) form to a gate-agnostic boundary hook.
194
195
 
195
196
  ---
196
197
 
198
+ ## Wave-T — Temper (post-convergence)
199
+
200
+ Quench hardens, but quenched steel is brittle — no smith ships it un-tempered. steel-quench attacks
201
+ until zero new S-grade; nothing in that loop asks whether the hardening itself **introduced complexity
202
+ beyond what the fixes required** (defense scaffolding, decorative wiring). Wave-T is that inverse
203
+ corrective. It runs **after Wave 3+ convergence, before Done When**. It does not attack; it measures
204
+ the cost of the convergence just achieved.
205
+
206
+ | Step | Class | What it does |
207
+ |---|---|---|
208
+ | **T-1 complexity delta** | measured | `bash templates/temper_check.sh <repo> <file> <pre-quench-ref>` — Δlines/sections/steps/tables/fences/cross-refs, baseline (pre-Wave-1) → post-convergence. Prose-only counts: code-fence interiors are excluded (bash comments are not sections) |
209
+ | **T-2 absolute context** | measured | `harness-doctor` L1–L3 on the post-quench asset — absolute complexity tier (reuse, don't reimplement) |
210
+ | **T-3 τ verdict** | judged — paired with the quench's own Wave findings (each flagged construct must trace to a specific finding it allegedly fixes; no judge-only path) | **τ-PASS**: added complexity ⊆ what the fixes required. **τ-FAIL**: the quench introduced a construct that *defends against an attack rather than fixing the flaw* — name it, propose the simpler form, hand back for de-brittling. τ-FAIL is the temper step working, not a quench failure |
211
+
212
+ **T-3 heuristic flags** (any → review, never auto-reject): a new section/table/step whose only referent
213
+ is a Wave-N finding · Δcross-refs ≫ Δsteps (wiring, not function) · the asset crosses a harness-doctor
214
+ complexity tier it was below pre-quench.
215
+
216
+ **Don't-overbuild guard (τ applied to τ)**: Wave-T is one script + this section, reusing harness-doctor
217
+ for the absolute read. If Wave-T grows its own detection engine, it has failed its own test — a temper
218
+ step that adds complexity is self-refuting. Known limits (honest): `temper_check.sh` takes one path —
219
+ renamed files need a manual pre/post measurement; the wiring flag uses strict `>`, so Δxrefs = Δsteps
220
+ does not fire it (the section flag usually carries those cases).
221
+
222
+ **Model note**: T-1 is bash (no model), T-2 reuses harness-doctor (sonnet-rated). T-3 adjudication was
223
+ validated blind on both Opus and Sonnet (3-fixture ground-truth test, 3/3 each, 2026-06-10) — Wave-T
224
+ end-to-end does not require the largest model tier. Opus remains the recommendation for full
225
+ steel-quench runs (challenger waves are broader than T-3).
226
+
227
+ ---
228
+
197
229
  ## External-GT Adjudication (when the target has a public ground truth)
198
230
 
199
231
  When quenching a **public artifact that has its own ground truth** — a repo's open issues, test suite, or
@@ -0,0 +1,72 @@
1
+ #!/usr/bin/env bash
2
+ # selfcheck.sh — mandatory-pass (deterministic) checks on FH's own executable surface.
3
+ # Class: mandatory-pass (harness_6axis_framework.md §Axis 5 check classes) — blocks on fail.
4
+ # Scope: executables shipped via npm files[] + the bash infra driving the FH gate chain.
5
+ # Syntax-only (node --check / bash -n): zero side effects, no network, runs anywhere.
6
+ # Wiring: `npm test` for any session; `prepublishOnly` so a publish cannot ship a
7
+ # syntactically broken executable.
8
+ set -u
9
+ cd "$(dirname "${BASH_SOURCE[0]}")/.."
10
+ fail=0
11
+
12
+ check() { # check <label> <cmd...>
13
+ local label="$1"; shift
14
+ if "$@" 2>/dev/null; then
15
+ echo "PASS $label"
16
+ else
17
+ echo "FAIL $label"
18
+ "$@" || true
19
+ fail=1
20
+ fi
21
+ }
22
+
23
+ # Node executables (npm-shipped)
24
+ for f in bin/*.js; do
25
+ check "node --check $f" node --check "$f"
26
+ done
27
+
28
+ # Bash surface: npm-shipped scripts + local bin wrappers + gate-chain infra
29
+ for f in scripts/*.sh bin/fh-gate bin/fh-run bin/fh-goal \
30
+ templates/regression_guard.sh templates/temper_check.sh templates/predelete_check.sh templates/.git-hooks/pre-commit; do
31
+ [ -f "$f" ] || continue
32
+ check "bash -n $f" bash -n "$f"
33
+ done
34
+
35
+ # Count consistency: stated skill/agent counts vs actual directories.
36
+ # Drift class recurred 4x on 2026-06-10 alone (local_fh_context 26, plugin.json "3 agents",
37
+ # README "5 agents", marketplace.json "3 agents") — this makes the check mechanical and permanent.
38
+ # Active skill = SKILL.md without a deprecation marker (frontmatter `deprecated: true` or
39
+ # "DEPRECATED" in the description block).
40
+ count_active() { # count_active <plugin>
41
+ local n=0 s
42
+ for s in plugins/"$1"/skills/*/SKILL.md; do
43
+ [ -f "$s" ] || continue
44
+ head -20 "$s" | grep -qE 'deprecated: true|DEPRECATED' || n=$((n+1))
45
+ done
46
+ echo "$n"
47
+ }
48
+ count_agents() { ls plugins/"$1"/agents/*.md 2>/dev/null | wc -l | tr -d ' '; }
49
+
50
+ meta_sk=$(count_active fh-meta); meta_ag=$(count_agents fh-meta)
51
+ com_sk=$(count_active fh-commons); com_ag=$(count_agents fh-commons)
52
+ total_sk=$((meta_sk + com_sk)); total_ag=$((meta_ag + com_ag))
53
+
54
+ count_check() { # count_check <label> <file> <expected-string>
55
+ if grep -q "$3" "$2"; then
56
+ echo "PASS count: $1"
57
+ else
58
+ echo "FAIL count: $1 — expected \"$3\" in $2 (actual: fh-meta ${meta_sk}sk/${meta_ag}ag, fh-commons ${com_sk}sk/${com_ag}ag)"
59
+ fail=1
60
+ fi
61
+ }
62
+ count_check "fh-meta plugin.json" plugins/fh-meta/.claude-plugin/plugin.json "${meta_sk} skills + ${meta_ag} agents"
63
+ count_check "fh-commons plugin.json" plugins/fh-commons/.claude-plugin/plugin.json "${com_sk} skills"
64
+ count_check "marketplace.json fh-meta" .claude-plugin/marketplace.json "${meta_sk} skills + ${meta_ag} agents"
65
+ count_check "README header" README.md "${total_sk} skills · ${total_ag} agents"
66
+ count_check "local_fh_context fh-meta" templates/local_fh_context.md "(fh-meta, ${meta_sk})"
67
+
68
+ if [ "$fail" -ne 0 ]; then
69
+ echo "SELFCHECK: FAIL"
70
+ exit 1
71
+ fi
72
+ echo "SELFCHECK: PASS"