@chrono-meta/fh-gate 1.4.7 → 1.4.8
- package/CATALOG.md +150 -0
- package/CHEATSHEET.md +1 -0
- package/CLAUDE.md +58 -8
- package/README.md +43 -12
- package/docs/CONTRIBUTING.md +16 -0
- package/package.json +5 -2
- package/plugins/fh-commons/agents/quench-challenger.md +8 -1
- package/plugins/fh-meta/skills/frontier-digest/SKILL.md +37 -1
- package/plugins/fh-meta/skills/goal-quench/SKILL.md +4 -4
- package/plugins/fh-meta/skills/harness-doctor/SKILL.md +18 -0
- package/plugins/fh-meta/skills/plugin-recommender/SKILL.md +4 -1
- package/plugins/fh-meta/skills/prompt-regression/SKILL.md +11 -8
- package/plugins/fh-meta/skills/steel-quench/SKILL.md +33 -1
- package/scripts/selfcheck.sh +72 -0
package/CATALOG.md
CHANGED
|
@@ -8,6 +8,120 @@ AI reads this file first when searching past work. Open individual files for det
|
|
|
8
8
|
|
|
9
9
|
<!-- Add entries in reverse date order (newest at top) -->
|
|
10
10
|
|
|
11
|
+
### 2026-06-10 | forge-harness | #destructive-op-gate, #irreversibility, #silent-loss, #branch-cleanup-incident
|
|
12
|
+
**File:** CLAUDE.md §Destructive-Op Gate (+ templates/predelete_check.sh, scripts/selfcheck.sh)
|
|
13
|
+
Third irreversibility gate (sibling of Pre-Publish): enumerate → recover → destroy, never destroy-then-check. predelete_check.sh classifies branches SAFE/CHECK/REVIEW (CHECK = 0 unique paths but commits off base — shared files may hold newer content, the silent-loss class); REVIEW blocks scripted deletes (exit 1); the recovery step is judged/depth-sensitive (strongest-tier floor semantics). Signal-table row fires on destructive intents proactively. Origin: same-day incident — a parallel session's card (weekly-audit done + #88) lived only on an unmerged 0-unique-path branch; pre-deletion enumeration recovered it. Dogfood replay: the script lands that exact branch in CHECK.
|
|
14
|
+
- Decision: safety mechanized rather than tier-escalated — Sonnet-default stays valid because the gate carries the depth, with the judged step floor-routed for the residual.
|
|
15
|
+
- Decision: scope split vs Pre-Publish kept explicit (publish = exposure irreversibility, destroy = silent-loss irreversibility).
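The SAFE/CHECK/REVIEW split described in this entry can be sketched as follows. This is a minimal illustration and not the logic of `predelete_check.sh`: only the CHECK definition (0 unique paths but commits off base) and the REVIEW exit code of 1 come from the entry. The SAFE and REVIEW conditions, and the names `classify_branch` and `exit_code`, are assumptions.

```python
from enum import Enum


class Verdict(Enum):
    SAFE = "SAFE"
    CHECK = "CHECK"
    REVIEW = "REVIEW"


def classify_branch(unique_paths: int, commits_off_base: int) -> Verdict:
    """Classify a branch before deletion (hypothetical helper).

    unique_paths: files that exist only on this branch.
    commits_off_base: commits on this branch that the base does not have.
    """
    if unique_paths > 0:
        # Assumption: branch-only files always need a human look.
        return Verdict.REVIEW
    if commits_off_base > 0:
        # From the entry: 0 unique paths but commits off base, so shared
        # files may hold newer content (the silent-loss class).
        return Verdict.CHECK
    # Assumption: nothing unique and nothing unmerged means safe to delete.
    return Verdict.SAFE


def exit_code(verdicts: list[Verdict]) -> int:
    """Return 1 if any branch is REVIEW, so a scripted delete stops."""
    return 1 if Verdict.REVIEW in verdicts else 0
```

Under these assumptions, the incident branch from the entry (0 unique paths, commits off base) lands in CHECK rather than SAFE, which is the case enumerate-before-destroy is meant to catch.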
|
|
16
|
+
|
|
17
|
+
|
|
18
|
+
### 2026-06-10 | forge-harness | #mode-d-notice, #model-guidance, #self-dev-entry
|
|
19
|
+
**File:** CLAUDE.md §Mode D Model Notice
|
|
20
|
+
Conditional model-pin guidance at self-dev entry (operator request): fires once at the 4-axis gate's own activation trigger — session model opus+ = silent · below-opus = one-line pin recommendation with measured rationale · identity-withheld runtime = static fallback. Advisory only, never auto-switches (pin-is-not-a-cap); field-operation sessions never see it.
|
|
21
|
+
- Decision: zero new triggers — reuses the gate activation condition; the notice and the gate share one detection point.
|
|
22
|
+
|
|
23
|
+
|
|
24
|
+
### 2026-06-10 | forge-harness | #tier-floor-quench, #below-floor-live, #floor-governance, #dual-challenger
|
|
25
|
+
**File:** knowledge/shared/harness-core/multi_model_sidecar_strategy.md §Floor governance (+ quench-challenger.md)
|
|
26
|
+
Tier-floor design quenched same-day by dual challenger dispatch — F1 opus (floor met) + F2 sonnet (below-floor; floor=opus): first live firing of the below-floor flag, on the design that defines it. Both tiers found the same 2 S-grades: cross-provider "floor-equivalent" undecidable (vibes equivalence) and re-quench tags with no consumer (permanent silent degradation). §Floor governance ships the fixes: external tiers below-floor-by-default until measured equivalence entries (the backend×tier ladder is the evidence source); below-floor judged verdicts provisional until floor re-run or operator acceptance, weekly audit = standing consumer; floor: hard for depth-critical roles (floor outranks diversity); diversity_rationale tie-break; claims age; pin-is-not-a-cap resolution. Wave-T runs #11/#12 both τ-PASS (fix-traceable complexity only).
|
|
27
|
+
- Decision: below-floor challenger empirically buys value (sonnet 6/6 real findings, 2 unique) but misses the deepest (opus-only hard-floor insight) — F2 doctrine and the floor both validated by the same experiment.
|
|
28
|
+
- Open: first weekly-audit below-floor consumption + first measured equivalence entry (laptop ladder).
|
|
29
|
+
|
|
30
|
+
|
|
31
|
+
### 2026-06-10 | forge-harness | #tier-floor, #model-resolution, #default-sonnet, #sidecar-protocol
|
|
32
|
+
**File:** knowledge/shared/harness-core/multi_model_sidecar_strategy.md (+ quench-challenger.md, steel-quench SKILL.md, README.md, templates/CLAUDE.md)
|
|
33
|
+
Tier-floor resolution ships — the model dimension of the Sidecar Engine Resolution Protocol: assets declare measured-or-justified tier floors (quench-challenger=opus, Wave-T/harness-doctor=sonnet measured, mechanical=none); environment resolves R1 native dispatch / R2 cross-provider / R3 below-floor-with-flag (never hard-fail). Public guidance flips to default `/model sonnet` + floored dispatch; human-set session defaults are inviolable (FH dispatches sub-agents, never switches the session model); pinning the strongest available model recommended only for harness-editing (Mode D).
|
|
34
|
+
- Decision: floor-not-pin semantics; below-floor judged verdicts auto-tagged re-quench candidates (Degraded coverage rule extension); no specific top-anchor model or subscription window named publicly (anti-stale).
|
|
35
|
+
- Decision: grounded in same-day measurements (operation tier-flat 100/100/97/94; depth differential on design increments only) — the guidance flip and the mechanism ship together, neither alone.
|
|
36
|
+
- Open: first organic below-floor dispatch + first Sonnet-default external install feedback (verify_next em-2026-06-10v).
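The R1/R2/R3 resolution order in this entry can be sketched as below. It is a hedged illustration, not the shipped protocol: the tier ranking, the `Dispatch` record, and the `resolve` signature are assumptions. External models count as floor-equivalent only when a measured equivalence entry exists, and the caller is assumed to pass at least one native tier.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed ordering of native tiers, weakest to strongest.
TIER_RANK = {"haiku": 0, "sonnet": 1, "opus": 2}


@dataclass
class Dispatch:
    route: str         # "R1", "R2" or "R3"
    model: str
    below_floor: bool  # True means the verdict carries the below-floor flag


def resolve(floor: Optional[str], native: list[str],
            external_equiv: dict[str, str]) -> Dispatch:
    """Pick a dispatch target for an asset with the given tier floor.

    external_equiv maps an external model name to its measured equivalent
    tier; models without an entry are treated as below floor.
    """
    need = TIER_RANK[floor] if floor else -1
    # R1: native dispatch, using the weakest native tier that meets the floor.
    meeting = sorted((t for t in native if TIER_RANK[t] >= need),
                     key=TIER_RANK.get)
    if meeting:
        return Dispatch("R1", meeting[0], False)
    # R2: cross-provider, only with a measured equivalence entry.
    for model, tier in external_equiv.items():
        if TIER_RANK[tier] >= need:
            return Dispatch("R2", model, False)
    # R3: never hard-fail; run below floor on the strongest native tier
    # and flag the result.
    return Dispatch("R3", max(native, key=TIER_RANK.get), True)
```

For example, `resolve("opus", ["haiku", "sonnet"], {})` falls through to R3 and returns a sonnet dispatch with `below_floor=True`, which is the case the entry describes as a re-quench candidate.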
|
|
37
|
+
|
|
38
|
+
|
|
39
|
+
### 2026-06-10 | forge-harness | #model-tier, #tier-flattening, #worked-example, #output-evidence
|
|
40
|
+
**File:** docs/OUTPUT_EVIDENCE.md (+ README.md §Model setup)
|
|
41
|
+
Model-tier flattening measured and published: 30-point blind battery (rule-application + meta-dev fixtures, pre-registered rubric) on four Claude tiers — operation 100/100/97/94 (anchor/Opus 4.8/Sonnet 4.6/Haiku 4.5), tier separation only on above-rubric design increments (3/3·1/3·0.5/3·0/3). Public claim scoped honestly: single trial, self-graded, worked example not benchmark. README §Model setup gains the evidence note grounding the existing Opus recommendation.
|
|
42
|
+
- Decision: operating FH ≈ model-flat (the harness is the score); developing FH is where tier matters — recommendation unchanged (opus for harness-editing/gates), now evidence-backed. **[superseded same-day by the tier-floor entry above: default flipped to sonnet + floored dispatch; opus pin now Mode-D-only]**
|
|
43
|
+
- Open: real Qwen-class measurement on laptop (batteries are a portable fixture pack, fh-be record).
|
|
44
|
+
|
|
45
|
+
### 2026-06-10 | forge-harness | #fc, #consent-lane, #federated-compounding, #starved-center, #v3
|
|
46
|
+
**File:** tracks/_contrib/README.md (+ .gitignore, templates/contrib_session.md, docs/CONTRIBUTING.md, README.md)
|
|
47
|
+
FC F1 pour — the consent lane opens (operator override: "press it home with real proof" — real-proof-first supersedes the v2-submission latch). `tracks/_contrib/` becomes the only tracked subtree under tracks/: surgical 2-line un-ignore, placement = consent, PR-time lane gate (PSA + marketplace-gate C5 + ingest contradiction scan + reviewer). Lane charter + session template + CONTRIBUTING §Consent Lane + README row shipped — exactly the don't-overbuild list, no new skill/verifier.
|
|
48
|
+
- Decision: operator lifted the v2 latch (2026-06-10) — paper timing no longer gates feature publicization; proof-by-shipping doctrine recorded in the lab ledger.
|
|
49
|
+
- Decision: privacy default unchanged — the lane is consent-by-placement; the firewall (PSA/Pre-Publish/4-axis) is the lane's load-bearing precondition, not weakened by it.
|
|
50
|
+
- Open: F2 begins on the first external lane PR (live gate run + friction signals); F3 verdict 2 quarters post-open (PAID vs WRITTEN-OFF per falsifiability criterion).
|
|
51
|
+
|
|
52
|
+
### 2026-06-10 | forge-harness | #wave-t, #temper, #steel-quench, #forge-fourth-movement, #v3
|
|
53
|
+
**File:** plugins/fh-meta/skills/steel-quench/SKILL.md (+ templates/temper_check.sh, docs/ETHOS.md, README.md)
|
|
54
|
+
Wave-T (Temper) ships — the fourth forge movement, poured from lab validation (operator-approved): after Wave 3+ convergence, measure the complexity the quench itself added (T-1 temper_check.sh delta, fence-excluding · T-2 harness-doctor absolute tier · T-3 τ-verdict, judged-paired with the quench's own findings). Validation: 9 runs across 4 independent convergences (0 false flags, simplification un-punished) + synthetic positive control (sensitivity) + same-commit dogfood run #10 on the pour itself (τ-PASS). ETHOS/README "direction ahead" IOU converted to delivery.
|
|
55
|
+
- Decision: organic τ-FAIL redefined from promotion-blocker to standing watch (operator-approved) — synthetic control covers sensitivity; first organic flag → human verdict → recorded.
|
|
56
|
+
- Decision: Wave-T stays one script + one section reusing harness-doctor (don't-overbuild guard is part of the shipped spec).
|
|
57
|
+
- Open: npm 1.4.8 ships Wave-T + temper_check.sh + selfcheck 21 (folded into the open laptop handoff).
|
|
58
|
+
|
|
59
|
+
### 2026-06-10 | forge-harness | #selfcheck, #count-consistency, #drift-class-kill
|
|
60
|
+
**File:** scripts/selfcheck.sh
|
|
61
|
+
Count-consistency probes (mandatory-pass): plugin.json ×2 / marketplace.json / README header / local_fh_context stated counts vs actual dirs (active = non-deprecated). Motivated by 4 same-day drift instances — the class now fails npm test + prepublishOnly instead of waiting for a doctor run.
|
|
62
|
+
- Decision: deprecation detected mechanically (frontmatter `deprecated: true` or DEPRECATED marker in head -20).
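A count-consistency probe of this kind can be sketched in a few lines. This is an illustration under assumptions, not the contents of `scripts/selfcheck.sh`: the directory layout (one `SKILL.md` per skill directory) and the helper names are hypothetical, and the frontmatter check is simplified to a line match within the first 20 lines.

```python
from pathlib import Path


def is_deprecated(skill_md: Path) -> bool:
    """True if the first 20 lines carry a deprecation signal."""
    head = skill_md.read_text(encoding="utf-8").splitlines()[:20]
    return any(
        line.strip() == "deprecated: true" or "DEPRECATED" in line
        for line in head
    )


def active_skill_count(skills_root: Path) -> int:
    """Count skill directories that are not marked deprecated."""
    count = 0
    for skill_dir in sorted(p for p in skills_root.iterdir() if p.is_dir()):
        skill_md = skill_dir / "SKILL.md"
        if skill_md.exists() and is_deprecated(skill_md):
            continue
        count += 1
    return count


def count_drift(stated: int, skills_root: Path) -> int:
    """Difference between a stated count and the actual active count."""
    return stated - active_skill_count(skills_root)
```

A non-zero `count_drift` is what a mandatory-pass probe would turn into a failing exit status.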
|
|
63
|
+
|
|
64
|
+
### 2026-06-10 | forge-harness | #backlog-cleanup, #phantom-fix, #count-drift, #goal-quench
|
|
65
|
+
**File:** plugins/fh-meta/skills/goal-quench/SKILL.md (+ templates/local_fh_context.md, plugin.json, README.md)
|
|
66
|
+
Backlog cleanup (cloud session 2): goal-quench phantom vocab fixed — "steel-quench C3 config" → "cross-provider challenger" (anchored to steel-quench:83 vocabulary + §Sidecar Engine Resolution Protocol; token `steel-quench-crossprovider`, 4 spots; resolves `fh_signal_2026-06-10_fh-direct`). Skill/agent count drift fixed: local_fh_context 26→29 active (−2 deprecated, +5 missing), plugin.json agents 3→7, README agents 5→8 (skills 33 verified correct = 36 dirs − 3 deprecated). Resolves the 06-04 "skill-count drift" Open item.
|
|
67
|
+
- Decision: counting policy is mechanical — active = non-deprecated skill dirs, agents = files in agents/; descriptive sidecar labels confirmed over C-numbering (per 06-10 check-class decision).
|
|
68
|
+
- Verified-no-action: 05-26 cross-ref recommendations already implemented — arXiv 2605.18747 (README:282 + definition doc:39) and Sylph 2604.21003 (definition doc:71/:105) — L4 open NOT raised.
|
|
69
|
+
- Open: npm 1.4.8 republish ships plugin.json/README/SKILL.md count+phantom fixes (folded into the open laptop handoff).
|
|
70
|
+
|
|
71
|
+
### 2026-06-10 | forge-harness | #credit-economy, #operator-intake, #lightweight-triage, #frontier-digest
|
|
72
|
+
**File:** plugins/fh-meta/skills/frontier-digest/SKILL.md (+ .claude/rules/sister_asset_protocol.md)
|
|
73
|
+
Credit-economy engine run #2 formalization (operator-approved; this session = manual run #2: 7 walled sources → 8 audits → 15 same-day gate-passed imports, 2 human corrections): R1 — frontier-digest Step 0.5 operator-intake asks for walled-channel sources (YouTube/LinkedIn/X, machine-403) on cadence runs only, skippable, try wall-bypass (WebSearch + secondary) first. R2 — sister protocol lightweight path: dedup-hit/no-increment sources get a one-paragraph entry, full cross-audits reserved for A/B-tier.
|
|
74
|
+
- Decision: the operator stays the wide-net sensor for walled channels by design (human = scarce oracle), the machine owns endurance (triage→gate cycles); intake cost discipline keeps the wide net wide.
|
|
75
|
+
- Follow-up (same day): Step 0.5 video-harvest ladder — Tier 1 sidecar (codex / Gemini-route, agentic = approval-mode first) → Tier 3 Claude+yt-dlp transcript → operator floor; laptop verification trio handed off. First applied instance of the capability-probe principle.
|
|
76
|
+
|
|
77
|
+
### 2026-06-10 | forge-harness | #comprehension-debt, #loop-engineering, #osmani, #harness-doctor
|
|
78
|
+
**File:** plugins/fh-meta/skills/harness-doctor/SKILL.md
|
|
79
|
+
Taxonomy 6→7: Comprehension Debt (runtime/operator-side, S) — loop output outpacing operator understanding, named from Osmani's Loop Engineering (canonical upstream of the supervisor-loops lineage; Cherny/Steinberger quotes — paper citation upgraded from the Korean video to this primary source, convergence count unchanged at n=5). Signals mechanical: merged change without CATALOG entry, manifest pending backlog, zero card delta. Countermeasures pre-existing (CATALOG summaries, predict-verify, card protocol).
|
|
80
|
+
- Decision: taxonomy now covers both loop sides — agent behavior (rows 4–6) and operator comprehension (row 7); FH's HITL PR principle recorded as a deliberate L2 cap on asset-changing loops (Osmani L3 rejected for asset mutation).
|
|
81
|
+
|
|
82
|
+
### 2026-06-10 | forge-harness | #failure-modes, #agentic-laziness, #self-preferential-bias, #goal-drift, #harness-doctor
|
|
83
|
+
**File:** plugins/fh-meta/skills/harness-doctor/SKILL.md
|
|
84
|
+
Harness-Defect Taxonomy 3→6 classes: runtime behavioral failure modes named from dynamic-workflows discourse (LinkedIn full-text triage) — Agentic Laziness (completion claims without per-item evidence, S), Self-Preferential Bias (judged check without adversarial pairing, M), Goal Drift (no pre-compaction completion log, S). FH already had the countermeasures (golden probes/coverage, judged-pairing rule, fh_completed + card-last guard) — the import is the *naming*, making diagnosis explicit.
|
|
85
|
+
- Decision: signals kept mechanically checkable (list/pairing/log presence) — no judge-only signal in the taxonomy itself; sources not counted as new convergence (discourse extension, anti-inflation).
|
|
86
|
+
|
|
87
|
+
### 2026-06-10 | forge-harness | #no-reinvention, #official-first, #claude-plugins-official, #full-harness-mode
|
|
88
|
+
**File:** knowledge/shared/plugin-catalog/recommended_plugins.md (+ CLAUDE.md, auto_project_mapping.md)
|
|
89
|
+
Operator-declared meta-harness iron rule — no-reinvention / official-first: (a) plugin catalog gains Category 0.5, the claude-plugins-official 36-plugin inventory (12 LSP + workflow + authoring + setup groups; name-based, anti-stale "re-enumerate when recommending", role-split warnings for claude-md-management overlap); (b) New Skill gate Role-duplication criterion now also checks Tier 0 built-ins + official plugins — reinventing an official capability requires explicit justification; (c) Full-Harness Mode item 4: official-plugin scan for the mapped project's stack (recommend-only, never auto-install) — mapped-project acceleration with zero FH build.
|
|
90
|
+
- Decision: FH builds only what adds governance on top of official capabilities; drafting tools (skill-creator etc.) never exempt their output from the FH gate.
|
|
91
|
+
|
|
92
|
+
### 2026-06-10 | forge-harness | #tier-0, #builtins, #security-review, #deep-research, #role-split
|
|
93
|
+
**File:** plugins/fh-meta/skills/plugin-recommender/SKILL.md (+ frontier-digest, CLAUDE.md, probes.md)
|
|
94
|
+
CC built-ins utilization imports (operator-approved; video claims verified 9/13 real — agent initially ruled skill-creator nonexistent, operator-corrected: it is an official plugin in claude-plugins-official): Tier 0 = platform built-ins added to plugin-recommender (discovery order 0, "enumerate from live session" anti-stale, governance-add guard for FH-native precedence — goal-quench pattern named). /deep-research as frontier-digest Tier-0 engine. /security-review as Pre-Publish chain item 3 (code-security axis, skip-note path). Permission-Denial Option C, code-review role split, /loop WATCH row. G-TRIG-05 probe synced — anti-stale maintenance rule's first live use.
|
|
95
|
+
- Decision: built-in beats plugin install at ~80% coverage; FH native beats built-in only when it adds governance.
|
|
96
|
+
|
|
97
|
+
### 2026-06-10 | forge-harness | #ingest-gate, #contradiction-scan, #crossref-lint, #llm-wiki, #karpathy
|
|
98
|
+
**File:** .claude/rules/sync_push_protocols.md (+ harness-doctor SKILL.md, probes.md)
|
|
99
|
+
Karpathy LLM-Wiki sister-audit imports (operator-approved; convergence case n=5, citable primary source): I1 — contradiction scan as Sync step 3 (ingest gate, judged + verify-bidirectional pair): new knowledge grepped against existing claims before indexing, conflicts flagged in both files, old-claim removal is HITL. I2 — harness-doctor L4 knowledge cross-ref lint: no CATALOG entry = S-tier index orphan, no inbound ref = R-tier orphan page. Probes G-SYNC-01/G-LINT-01 added (30 total).
|
|
100
|
+
- Decision: scale escape (W1) deliberately NOT built — watch-item with trigger (CATALOG hundreds of entries / repeated search misses); operator-preferred first remedy = skill-splitter-style CATALOG split-mapping, RAG hybrid only after that.
|
|
101
|
+
- Open: npm republish (harness-doctor SKILL.md shipped) — folded into the open 1.4.8 handoff.
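The I2 cross-ref lint reduces to two set checks per page. The sketch below is an assumption-laden illustration, not harness-doctor's implementation: the input shapes (a list of page paths, a set of paths listed in CATALOG, and a map of inbound references) and the function name are hypothetical. Only the two findings and their tiers come from the entry.

```python
def lint_crossrefs(pages: list[str], catalog_paths: set[str],
                   inbound_refs: dict[str, list[str]]) -> list[tuple[str, str, str]]:
    """Return (page, tier, finding) tuples for knowledge pages."""
    findings = []
    for page in sorted(pages):
        if page not in catalog_paths:
            # From the entry: no CATALOG entry = S-tier index orphan.
            findings.append((page, "S", "index orphan"))
        if not inbound_refs.get(page):
            # From the entry: no inbound ref = R-tier orphan page.
            findings.append((page, "R", "orphan page"))
    return findings
```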
|
|
102
|
+
|
|
103
|
+
### 2026-06-10 | forge-harness | #golden-probes, #offline-eval, #doc-code-coupling, #anthropic-4layer
|
|
104
|
+
**File:** .claude/regression/probes.md (+ templates/.git-hooks/pre-commit, prompt-regression SKILL.md)
|
|
105
|
+
Anthropic 4-layer sister-audit imports (operator-approved): I1 — 28-probe known-answer golden set with check classes, the standing offline eval prompt-regression auto-loads (its P-GATE-01 had already gone stale 5→6 the same day the gate grew — fixed, plus an explicit anti-stale maintenance rule). I2 — pre-commit doc-code coupling WARN (measured class, never blocks) when executables are staged without any doc asset.
|
|
106
|
+
- Decision: probes.md canonical-when-present, SKILL.md default matrix = Mode C fallback (single-source preserved); coupling check warns rather than blocks — the decision is made conscious, doc-neutral script fixes stay friction-free.
|
|
107
|
+
- Open: npm republish (prompt-regression SKILL.md is shipped) — folded into the open 1.4.8 handoff.
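The doc-code coupling check in I2 can be sketched as a pure function over the staged file list. This is a hedged illustration: what counts as an "executable" or a "doc asset" is not defined in this entry, so the suffix lists and the function name below are assumptions.

```python
from typing import Optional

EXEC_SUFFIXES = (".sh", ".js", ".py")  # assumed definition of "executable"
DOC_SUFFIXES = (".md",)                # assumed definition of "doc asset"


def coupling_warning(staged: list[str]) -> Optional[str]:
    """Return a warning string, or None when nothing needs flagging.

    The result is advisory only: a caller would print it and still
    let the commit proceed.
    """
    has_exec = any(path.endswith(EXEC_SUFFIXES) for path in staged)
    has_doc = any(path.endswith(DOC_SUFFIXES) for path in staged)
    if has_exec and not has_doc:
        return "WARN: executables staged without any doc asset"
    return None
```

Returning a message instead of raising keeps the check in the measured class: the decision is surfaced, and doc-neutral script fixes are not blocked.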
|
|
108
|
+
|
|
109
|
+
### 2026-06-10 | forge-harness | #new-skill-gate, #check-class, #done-when, #g2
|
|
110
|
+
**File:** CLAUDE.md
|
|
111
|
+
G2 from the supervisor-loops audit: New Skill Creation Pre-Commit Gate extended 5→6 items — "Check-class declared": each Done When condition states its class (mandatory-pass / measured / judged per 6axis §Axis 5), and judged conditions must name their adversarial pairing (no judge-only path). The taxonomy now lives in the operating loop, not just the knowledge doc.
|
|
112
|
+
- Decision: applies to new skills only; existing 28 backfill opportunistically when next edited — no retroactive block.
|
|
113
|
+
|
|
114
|
+
### 2026-06-10 | forge-harness | #selfcheck, #mandatory-pass, #npm-test, #self-application
|
|
115
|
+
**File:** scripts/selfcheck.sh (+ package.json)
|
|
116
|
+
G1 fix from the supervisor-loops 100%+ audit: FH's own shipped code (bin/fh-*.js, scripts/fh-*.sh) had zero deterministic checks. New selfcheck.sh runs node --check + bash -n over the npm-shipped executable surface and the gate-chain bash infra (15 checks), wired as `npm test` and `prepublishOnly` — a publish can no longer ship a syntactically broken executable.
|
|
117
|
+
- Decision: syntax-only scope (zero side effects, runs anywhere); blocking wired at publish, not commit — doc commits stay unaffected.
|
|
118
|
+
- Open: npm republish to ship selfcheck.sh in the tarball (machine-bound, laptop).
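The syntax-only scope of this check can be illustrated as follows. The sketch is not `selfcheck.sh` itself: it assumes files are selected by suffix and that `node` and `bash` are on the PATH, and the function names are hypothetical. `node --check` and `bash -n` both parse a file without executing it, which is what keeps the check free of side effects.

```python
import subprocess
from pathlib import Path
from typing import Optional


def syntax_command(path: Path) -> Optional[list[str]]:
    """Return the syntax-only check command for a file, if any."""
    if path.suffix == ".js":
        return ["node", "--check", str(path)]
    if path.suffix == ".sh":
        return ["bash", "-n", str(path)]
    return None  # not part of the checked executable surface


def selfcheck(paths: list[Path]) -> int:
    """Run every applicable syntax check and return the failure count."""
    failures = 0
    for path in paths:
        cmd = syntax_command(path)
        if cmd is None:
            continue
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode != 0:
            failures += 1
    return failures
```

A wrapper that exits non-zero when `selfcheck` returns a positive count is enough to make a publish script stop on a broken executable.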
|
|
119
|
+
|
|
120
|
+
### 2026-06-10 | forge-harness | #verify-axis, #check-classes, #supervisor-loops, #sister-asset
|
|
121
|
+
**File:** knowledge/shared/harness-core/harness_6axis_framework.md
|
|
122
|
+
Axis 5 check-class taxonomy added: every verify check classified as mandatory-pass (deterministic, blocking) / measured (quantitative, tracked) / judged (LLM-judge + cited evidence + corrective action — self-judges grade leniently). Judged rule: a judge verdict never passes alone — paired adversarial re-verification + evidence itself phantom-checked. Import from supervisor-loops sister-audit (operator-approved; cross-audit + proposal in companion store; corrective-action clause added after full-transcript delta check).
|
|
123
|
+
- Decision: descriptive labels over C1/C2/C3 numbering — avoids collision with unanchored "steel-quench C3 config" vocab (goal-quench:283, logged as fh_signal for separate fix).
|
|
124
|
+
|
|
11
125
|
### 2026-06-09 | forge-harness | #sidecar, #zero-config, #engine-resolution, #roadmap, #mode-c
|
|
12
126
|
**File:** knowledge/shared/harness-core/multi_model_sidecar_strategy.md (+ hybrid_orchestration_architecture_roadmap.md)
|
|
13
127
|
Added canonical §Sidecar Engine Resolution Protocol — Tier1 subscription-CLI → Tier2 API-key → Tier3 Claude-subagent guaranteed fallback. Principle: discovery automatic/free, invocation value-gated (intelligent default multi-AI, no hard-fail for Mode C). Wired pointers into goal-quench Step D / steel-quench runtime-adapter / harvest-loop Step 3.5-X; sim-conductor/pipeline-conductor/agent-composer inherit by reference. Source hybrid-orchestration design archived as proposed roadmap (versions→placeholders, Python pseudo-code→illustrative, non-shipped tagged Proposed). PR #80.
|
|
@@ -207,6 +321,42 @@ v1.2 release complete (PR #1–#5): harvest-loop Step 0, agent-composer worktree
|
|
|
207
321
|
|
|
208
322
|
<!-- Time-independent reference documents -->
|
|
209
323
|
|
|
324
|
+
### 2026-06-06 | pattern | parallax, multi-persona-review, synthesizer, standpoint-coverage
|
|
325
|
+
**File:** `knowledge/shared/patterns/multi-persona-review.md`
|
|
326
|
+
Generalized architecture for multi-persona parallel artifact review ("parallax") — parallel isolated personas + shared output protocol + neutral synthesizer. Domain-agnostic, IP-stripped; embodied as sim-conductor Step 1.5.
|
|
327
|
+
|
|
328
|
+
### 2026-05-31 | reference | ecosystem-positioning, opencode, hermes, openhuman, readiness
|
|
329
|
+
**File:** `knowledge/shared/harness-core/fh_ecosystem_positioning.md`
|
|
330
|
+
FH's structural position in the AI agent framework ecosystem vs Hermes/OpenCode/OpenHuman — gap analysis, synergy map, layered readiness verdict from 3-model adversarial audit. Companion: `fh_synergy_playbook.md` (concrete workflow specs).
|
|
331
|
+
|
|
332
|
+
### 2026-06-04 | schema | tpa, target-profile-analysis, routing, sim-conductor, steel-quench
|
|
333
|
+
**File:** `knowledge/shared/harness-core/tpa_schema.md`
|
|
334
|
+
Canonical Target Profile Analysis schema — all TPA-running skills (sim-conductor, steel-quench, phantom-quench, agent-composer) derive routing decisions from this schema. Single source for profile fields.
|
|
335
|
+
|
|
336
|
+
### 2026-06-04 | draft | goal-quench, anthropic-issue, native-goal, token-budget
|
|
337
|
+
**File:** `knowledge/shared/harness-core/goal_quench_anthropic_issue.md`
|
|
338
|
+
Draft Anthropic GitHub issue — native /goal token budget + quality verification hook proposal. Held until arXiv number confirmation.
|
|
339
|
+
|
|
340
|
+
### 2026-05-26 | measurement | skill-quality-rubric, maturity-score, verifiable-numbers
|
|
341
|
+
**File:** `knowledge/shared/harness-core/skill_quality_rubric.md`
|
|
342
|
+
Skill maturity score formula definition. Declaring verifiable/evolution numbers without this file violates the cold-audit "self-declaration = delete if no basis" rule.
|
|
343
|
+
|
|
344
|
+
### 2026-06-04 | core-reference | compounding-loop, weekly-cycle, axis-6, automation
|
|
345
|
+
**File:** `knowledge/shared/harness-core/hub_compounding_loop.md`
|
|
346
|
+
Core reference doc (CLAUDE.md Consult-First table): weekly/monthly/quarterly feedback cycles + Axis-6 Compounding automation roadmap.
|
|
347
|
+
|
|
348
|
+
### 2026-06-04 | core-reference | runtime-flow, session-chronology, subagent-delegation
|
|
349
|
+
**File:** `knowledge/shared/dialogue/claude_code_runtime_flow.md`
|
|
350
|
+
Core reference doc (CLAUDE.md Consult-First table): chronological flow of a Claude Code session (does) + sub-agent delegation flowchart.
|
|
351
|
+
|
|
352
|
+
### 2026-06-04 | core-reference | dialogue-playbook, token-efficiency, rule-hierarchy
|
|
353
|
+
**File:** `knowledge/shared/dialogue/ai_dialogue_playbook.md`
|
|
354
|
+
Core reference doc (CLAUDE.md Consult-First table): session-start principles, token efficiency, rule hierarchy, amplifier/coach dual mode (should).
|
|
355
|
+
|
|
356
|
+
### 2026-06-04 | glossary | terminology, meta-harness, launch-pad, transit-acceleration
|
|
357
|
+
**File:** `knowledge/shared/GLOSSARY.md`
|
|
358
|
+
Key term definitions — meta-harness, meta hub, launch pad effect, transit acceleration value, shared skill pool, operating modes. Entry point for FH-internal vocabulary; linked from CHEATSHEET.
|
|
359
|
+
|
|
210
360
|
### 2026-04-28 | template | maturity-roadmap, 3-phase-frame, frontier-tracking, simplification-gate
|
|
211
361
|
**File:** `knowledge/shared/harness-core/hub_maturity_roadmap.md`
|
|
212
362
|
Hub long-term evolution path frame. Phase I (entering maturity) → Phase II (frontier following (b)cadence) → Phase III (leading) 3-stage model + 5-criteria gate (audit automation·operations guide·external propagation·sub-agent judgment·self-diagnosis warning) + 6 indicators (seed repo·blog·citations·external adoption·self-evolving·industry original) + simplification gate (self-diagnosis + within 200 lines + unreferenced archive at each transition). General-purpose template derived from first verified operating instance.
|
package/CHEATSHEET.md
CHANGED
|
@@ -3,6 +3,7 @@
|
|
|
3
3
|
Frequently used commands and phrases.
|
|
4
4
|
|
|
5
5
|
> **`<harness-root>`** = the path where you cloned this repo. Example: `~/projects/forge-harness`. Replace `<harness-root>` in the commands below with your actual path.
|
|
6
|
+
> Unfamiliar with a term (meta-harness, launch pad effect, transit acceleration…)? See `knowledge/shared/GLOSSARY.md`.
|
|
6
7
|
|
|
7
8
|
---
|
|
8
9
|
|
package/CLAUDE.md
CHANGED
|
@@ -87,7 +87,7 @@ The forge-harness hub has a dual identity: **(a) a seed for others** + **(b) you
|
|
|
87
87
|
When blocked by auto-mode permission denial, **do not stop at the bare denial** — turn the block into a decision the user can act on in one step:
|
|
88
88
|
|
|
89
89
|
1. **State what was blocked** and why
|
|
90
|
-
2. **Option A — Approval mode**: show exact commands to run after switching; **Option B — Manual review**: list specific files/sections
|
|
90
|
+
2. **Option A — Approval mode**: show exact commands to run after switching; **Option B — Manual review**: list specific files/sections; **Option C — Reduce future prompts**: propose built-in `/fewer-permission-prompts` when the same read-only call class keeps getting prompted
|
|
91
91
|
3. **Ask which option** — one line, then wait
|
|
92
92
|
|
|
93
93
|
**Sub-agent variant**: report (what was blocked + ready-to-apply content + exact unblock step) back to orchestrator — never silently fail. Switching modes lifts permission block, not FH gates — the 4-axis gate still applies.
|
|
@@ -99,27 +99,31 @@ Simplification guard: trivial denials with one obvious fix → state block + sin
|
|
|
99
99
|
> **Full 4-step detail**: `knowledge/shared/harness-core/fh_detail_protocols.md`
|
|
100
100
|
> **Read this file before Step 1 begins** — duplicate-install detection (Step 1-b) and registry scan (Step 1-c) are only defined there, not in this summary.
|
|
101
101
|
|
|
102
|
-
**Triggers**: greetings (`hi`/`hello`/`hey
|
|
102
|
+
**Triggers**: greetings (`hi`/`hello`/`hey`/`안녕` — and the same word in any language; FH is English-based but language-agnostic, a bare greeting in any tongue fires this) · start intent (`resume`, `continue`, `where were we`) · new task (`new project`, `new task`) · discovery (`what is this`, `what can you do`, `first time here`)
|
|
103
103
|
|
|
104
104
|
**4-step summary**: ① Auto-read CLAUDE.md + CATALOG + session card + registry scan → ② One-line proposal (new user / exploratory / returning branches) → ③ 5-skill cascade (plugin-recommender → synergy → .claudeignore → model → verify) → ④ Approval + setup
|
|
105
105
|
|
|
106
106
|
**Identity marker**: every greeting response (Step ②) opens with 🐿️ on its own line. FH's session-start signal — see `fh_detail_protocols.md` Step 2 for full greeting templates.
|
|
107
107
|
|
|
108
108
|
**Guards**: explicit task-entry utterance → skip onboarding · once per session · code/debug requests → start working directly · project routing is a suggestion, mention at most once
|
|
109
|
+
**Metadata-is-not-intent guard**: the trigger is the user's **typed message only**. Session metadata — branch name (auto-derived from the first message, e.g. `claude/korean-greeting-*`), repo name, file paths — is **never** a task spec and never suppresses or redirects the greeting trigger. A bare greeting fires onboarding even when the branch name looks like a feature request; if the only "task" signal lives in metadata and not in what the user typed, treat the message as a greeting and run the 3-axis scaffold.
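A minimal sketch of this guard, assuming a small illustrative greeting set (the rule itself covers greetings in any language, which a fixed set cannot). The function name and signature are hypothetical. The branch name is accepted as a parameter only to make explicit that it is discarded, and start-intent, new-task, and discovery triggers are out of scope here.

```python
GREETINGS = {"hi", "hello", "hey", "안녕"}  # illustrative, not exhaustive


def fires_onboarding(typed_message: str, branch_name: str = "") -> bool:
    """Decide from the typed message alone whether a bare greeting fired."""
    del branch_name  # session metadata is never a task spec
    words = typed_message.strip().lower().rstrip("!.?").split()
    return len(words) == 1 and words[0] in GREETINGS
```

With this shape, `fires_onboarding("Hello!", branch_name="claude/korean-greeting-abc")` is true even though the branch name looks like a feature request.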
|
|
109
110
|
|
|
110
111
|
## New Skill Creation Pre-Commit Gate
|
|
111
112
|
|
|
112
|
-
All
|
|
113
|
+
All 6 items below must pass before committing a new SKILL.md. If any fails, fix and re-commit.
|
|
113
114
|
|
|
114
115
|
| Item | Criterion |
|
|
115
116
|
|---|---|
|
|
116
|
-
| **Role duplication check** | Pass `/asset-placement-gate` — no overlap with existing role clusters |
|
|
117
|
+
| **Role duplication check** | Pass `/asset-placement-gate` — no overlap with existing role clusters, **platform built-ins (Tier 0), or `claude-plugins-official` (Tier 1 official)**. Reinventing an official capability requires explicit justification in the SKILL.md (no-reinvention rule — FH builds only what adds governance) |
|
|
117
118
|
| **Description diet** | Plain text / 0 self-marketing expressions / 0 emphasis words (⭐, "critical", "groundbreaking") |
|
|
118
119
|
| **Done When defined** | At least 1 explicit completion condition |
|
|
120
|
+
| **Check-class declared** | Each Done When condition states its check class — mandatory-pass / measured / judged (`harness_6axis_framework.md` §Axis 5). Any judged condition names its adversarial pairing — no judge-only path |
|
|
119
121
|
| **Natural language triggers** | At least 3 examples that work without internal vocabulary |
|
|
120
122
|
| **Independently executable** | Confirmed to work without other FH skills (or dependencies are explicitly documented) |
|
|
121
123
|
|
|
122
124
|
Skills without a Done When definition automatically qualify as harness-doctor L2 M-tier.
|
|
125
|
+
Check-class declaration applies to **new** skills; existing skills backfill opportunistically
|
|
126
|
+
(when next edited), not retroactively.
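The "Done When defined" and "Check-class declared" items can be checked mechanically. The sketch below is an illustration under assumptions: the dictionary shape for a Done When condition (`check_class`, `adversarial_pairing`) and the function name are hypothetical, while the three class names and the no-judge-only rule come from the table above.

```python
CHECK_CLASSES = {"mandatory-pass", "measured", "judged"}


def check_class_errors(done_when: list[dict]) -> list[str]:
    """Return gate violations for a skill's Done When conditions."""
    errors = []
    if not done_when:
        errors.append("no Done When condition defined")
    for i, cond in enumerate(done_when, start=1):
        cls = cond.get("check_class")
        if cls not in CHECK_CLASSES:
            errors.append(f"condition {i}: check class missing or unknown")
        elif cls == "judged" and not cond.get("adversarial_pairing"):
            # A judged condition must name its adversarial pairing.
            errors.append(f"condition {i}: judged without adversarial pairing")
    return errors
```

An empty result means both items pass; any entry in the list is a reason to fix and re-commit.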
|
|
123
127
|
|
|
124
128
|
---
|
|
125
129
|
|
|
@@ -156,7 +160,22 @@ FH asset modified → Axis 1 (regression_guard.sh --pr {BRANCH})
|
|
|
156
160
|
| Forward | `phantom-quench` | Phantom references, paths that don't exist, stale external links |
|
|
157
161
|
| Record | `edit-manifest` RECORD | Logs predicted impact — closes the predict-verify loop for future harvest-loop |
|
|
158
162
|
|
|
159
|
-
|
|
163
|
+
### Mode D Model Notice (fires once, at the same trigger as this gate)
|
|
164
|
+
|
|
165
|
+
The moment FH self-development work begins (= the gate's own activation trigger: an FH asset is about
|
|
166
|
+
to be modified), check the **session model** (self-identity; if the runtime withholds it, treat as
|
|
167
|
+
unknown) and surface **one line** — then proceed, never block:
|
|
168
|
+
|
|
169
|
+
- Model known and opus-tier or above → no notice (already optimal).
|
|
170
|
+
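- Model known and below opus-tier → *"This work is FH self-development (Mode D); pinning the strongest
|
|
171
|
+
available model is recommended (`/model opus` or higher; measurement basis: README §Model setup).
|
|
172
|
+
If you proceed as is, floored dispatch covers the depth turns, but session-level design depth depends on the pin."*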
|
|
173
|
+
- Model unknown (runtime withholds identity) → static fallback: *"This is FH self-development work. If the
|
|
174
|
+
session model is not opus or higher, switching the pin is recommended (`/model opus`+)."*
|
|
175
|
+
|
|
176
|
+
**Guards**: once per session · advisory only — **never switch the session model** (human override is
|
|
177
|
+
inviolable; a pin is not a cap — tier-floor resolution §Floor governance) · field-project operation
|
|
178
|
+
sessions (no FH asset modification) never see this notice — the Sonnet default stays friction-free.
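
The notice rules above reduce to a small decision function. The following is a minimal sketch under stated assumptions, not the harness's implementation: the function name `mode_d_notice`, the marker file used for the once-per-session guard, the three output keywords, and the "an `opus` substring means opus-tier" test are all hypothetical.

```bash
# Hypothetical sketch of the notice decision; names and the marker file are illustrative.
# mode_d_notice <session-model-or-empty> <marker-file>
# Prints one of: none | recommend-pin | static-fallback. It never blocks and never switches models.
mode_d_notice() {
  local model="$1" marker="$2"
  if [ -e "$marker" ]; then
    echo "none"                          # the one-line notice already fired this session
    return 0
  fi
  case "$model" in
    "")     echo "static-fallback" ;;    # runtime withholds identity: treat as unknown
    *opus*) echo "none"; return 0 ;;     # assumption: an "opus" substring means opus-tier
    *)      echo "recommend-pin" ;;      # known and below opus-tier
  esac
  : > "$marker"                          # record that the notice has been surfaced
}
```

The caller would map `recommend-pin` and `static-fallback` to the two message texts above and then proceed with the work regardless of the result, since the notice is advisory only.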
|
|
160
179
|
|
|
161
180
|
## Pre-Publish Surface Gate (Irreversibility Gate — Publish, not Commit)
|
|
162
181
|
|
|
@@ -169,11 +188,14 @@ first time**, especially one **derived from internal/company assets** (operator-
|
|
|
169
188
|
private harness): `gh repo create --public`, `gh repo edit --visibility public`, a first push to a new
|
|
170
189
|
public remote, `npm publish`, `twine upload`, a private→public visibility flip.
|
|
171
190
|
|
|
172
|
-
**Required before the public action** (
|
|
173
|
-
|
|
191
|
+
**Required before the public action** (all must be non-LEAK/non-FAIL) — this gate is the **umbrella that
|
|
192
|
+
invokes them**, not a competitor; when publish intent is detected, fire *this* gate (it then runs the chain),
|
|
174
193
|
not marketplace-gate alone:
|
|
175
194
|
1. `/public-surface-audit` — operator-private token scan (real username, corp asset names, home paths)
|
|
176
195
|
2. `/marketplace-gate` Check 5 — broad public safety (API keys, internal domains, license)
|
|
196
|
+
3. `/security-review` (built-in, when the repo ships executable code) — code-security pass on the
|
|
197
|
+
publishable surface; complements 1–2 which scan tokens/metadata, not code behavior. Skip note
|
|
198
|
+
(`skipped: docs-only repo` or `skipped: built-in unavailable`) if not applicable
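
The "all must be non-LEAK/non-FAIL" rule for the chain can be sketched as a verdict aggregator. This is an illustration only: `publish_allowed` is a hypothetical name, and every verdict string other than `LEAK` and `FAIL` (including treating a skip note as non-blocking) is an assumption.

```bash
# Hypothetical sketch: aggregate the verdicts returned by the chained checks.
# publish_allowed <verdict>...   returns 0 only if no check reported LEAK or FAIL.
publish_allowed() {
  local verdict
  for verdict in "$@"; do
    case "$verdict" in
      LEAK|FAIL)
        echo "BLOCKED ($verdict)"   # one blocking verdict stops the public action
        return 1
        ;;
    esac
  done
  echo "OK"                         # skip notes and passes fall through to here
}
```

For example, `publish_allowed PASS PASS "skipped: docs-only repo"` prints `OK`, while any `LEAK` or `FAIL` in the list returns a non-zero status that a wrapper script could use to stop before `npm publish`.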
|
|
177
199
|
|
|
178
200
|
> Routing vs the rows below: `/marketplace-gate` alone = "is this ready to **list on a marketplace**?";
|
|
179
201
|
> `/public-surface-audit` alone = reactive "did I leak a token?"; **this gate** = the *act of going
|
|
@@ -193,6 +215,32 @@ checklist** (`templates/PRE-PUBLISH-CHECKLIST.md`) the operator runs on any repo
|
|
|
193
215
|
|
|
194
216
|
---
|
|
195
217
|
|
|
218
|
+
## Destructive-Op Gate (Irreversibility Gate — Delete/Rewrite, not Commit)
|
|
219
|
+
|
|
220
|
+
**Order invariant: enumerate → recover → destroy, never destroy-then-check.** Deletion and history
|
|
221
|
+
rewrite are irreversible in the way publish is — except the loss is *silent* (nobody sees what a
|
|
222
|
+
deleted branch was carrying).
|
|
223
|
+
|
|
224
|
+
**When this gate fires** — *before* any of: branch deletion (local or remote), history rewrite /
|
|
225
|
+
force-push, scrub of tracked history, bulk deletion of session records / tracks content.
|
|
226
|
+
|
|
227
|
+
1. **Enumerate (measured)**: `bash templates/predelete_check.sh <repo> [base]` — per branch: commits
|
|
228
|
+
off base + unique paths. Verdicts: SAFE (fully merged) · CHECK (0 unique paths but commits off
|
|
229
|
+
base — shared files may hold *newer* content, e.g. an unmerged session card) · REVIEW (unique
|
|
230
|
+
paths — recovery mandatory).
|
|
231
|
+
2. **Recover (judged — depth-sensitive)**: every CHECK/REVIEW item gets a content-direction look;
|
|
232
|
+
live un-integrated state (cards · handoffs · signals · session records) is integrated to main
|
|
233
|
+
**before** anything is deleted. This step exists because the loss class is silent — run it at the
|
|
234
|
+
strongest available tier (floor semantics, §Tier-floor); a below-floor pass is provisional.
|
|
235
|
+
3. **Destroy** only what passed — REVIEW blocks a scripted delete chain (script exits 1).
|
|
236
|
+
|
|
237
|
+
> Origin: 2026-06-10 branch cleanup — pre-deletion enumeration recovered a parallel session's card
|
|
238
|
+
> (weekly-audit completion + #88 merge state) that existed **only on an unmerged branch** with zero
|
|
239
|
+
> unique paths: exactly the CHECK class, invisible to "is it merged?" intuition. Deletion without the
|
|
240
|
+
> gate destroys live state without anyone noticing.
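
The enumerate step and its three verdicts can be sketched as follows. This is not the shipped `templates/predelete_check.sh`, whose contents are not shown here; it is a minimal illustration, assuming that "unique path" means a path present on the branch but absent on base, and that the base branch defaults to `main`. The function names are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the enumerate step; NOT the shipped templates/predelete_check.sh.

# classify_branch <commits_off_base> <unique_paths>
# Maps the two measurements onto the SAFE / CHECK / REVIEW verdicts.
classify_branch() {
  local commits="$1" unique="$2"
  if [ "$unique" -gt 0 ]; then
    echo "REVIEW"   # unique paths: recovery is mandatory
  elif [ "$commits" -gt 0 ]; then
    echo "CHECK"    # no unique paths, but shared files may hold newer content
  else
    echo "SAFE"     # nothing off base: fully merged
  fi
}

# enumerate_branches <repo> [base]
# Prints "<verdict> <branch>" per local branch; returns 1 if any branch is REVIEW.
enumerate_branches() {
  local repo="$1" base="${2:-main}" branch commits unique verdict status=0
  while IFS= read -r branch; do
    [ "$branch" = "$base" ] && continue
    commits="$(git -C "$repo" rev-list --count "$base..$branch")"
    # Assumption: a "unique path" is a file added on the branch relative to base.
    unique="$(git -C "$repo" diff --name-only --diff-filter=A "$base" "$branch" | wc -l | tr -d ' ')"
    verdict="$(classify_branch "$commits" "$unique")"
    echo "$verdict $branch"
    if [ "$verdict" = "REVIEW" ]; then status=1; fi
  done < <(git -C "$repo" for-each-ref --format='%(refname:short)' refs/heads)
  return "$status"
}
```

A scripted cleanup could call `enumerate_branches . main || exit 1` before any `git branch -D`, so a REVIEW verdict stops the delete chain. CHECK branches would still need the judged recovery look described in step 2; the sketch only measures.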
|
|
241
|
+
|
|
242
|
+
---
|
|
243
|
+
|
|
196
244
|
## Autonomous Initiative Layer — Context-Triggered Skill Proposals (Active Throughout Session)
|
|
197
245
|
|
|
198
246
|
At any point during a session, when the following signals are detected, propose the relevant skill in one line.
|
|
@@ -205,7 +253,8 @@ Proposal format: `"I see [X]. Want me to run /[skill] to [one-line description]?
|
|
|
205
253
|
| "wrap up this week", "review", "audit", "weekly", "retrospective" | `/harvest-loop` |
|
|
206
254
|
| "pull this into FH", "reverse-harvest", "worth keeping", "harvest pattern", "field pattern" | `/field-harvest` |
|
|
207
255
|
| "harness is complex", "too many skills", "check structure", "harness" | `/harness-doctor` |
|
|
208
|
-
| "review this PR", "check diff", "code review" | `/hub-cc-pr-reviewer` |
|
|
256
|
+
| "review this PR", "check diff", "code review" | code diff → built-in `/code-review`·`/review` · FH-asset coherence → `/hub-cc-pr-reviewer` (role split) |
|
|
257
|
+
| "keep watching X", "poll this", "check every N minutes", recurring WATCH item | built-in `/loop` (interval runner) — pair with the WATCH list, don't hand-poll |
|
|
209
258
|
| "are these in sync", "synergy", "can these integrate", "any overlap" | `/cross-ecosystem-synergy-detection` |
|
|
210
259
|
| "latest trends", "frontier", "external resources" | `/frontier-digest` |
|
|
211
260
|
| "orchestrate agents", "parallel dispatch", "combine skills", "multiple agents" | `/agent-composer` |
|
|
@@ -218,6 +267,7 @@ Proposal format: `"I see [X]. Want me to run /[skill] to [one-line description]?
|
|
|
218
267
|
| "add to marketplace", "OK to publish", "pre-publish check" | `/marketplace-gate` |
|
|
219
268
|
| "did I leak anything", "public surface audit", "private token scan", "is my split clean", "check tracked files for private tokens" | `/public-surface-audit` |
|
|
220
269
|
| "publish", "make public", "make this repo public", "go public", "gh repo create --public", "flip to public", "first public push", "publish the package", "npm publish", "twine upload" (publish intent — **proactive**, fire *before* the action) | **Pre-Publish Surface Gate** (see above → `/public-surface-audit` + `/marketplace-gate` Check 5 must PASS first) |
|
|
270
|
+
| "delete the branch", "브랜치 삭제", "브랜치 정리", "clean up branches", "force-push", "rewrite history", "지워도 돼?" (destructive intent — **proactive**, fire *before* the action) | **Destructive-Op Gate** (see above → enumerate → recover → destroy; `templates/predelete_check.sh`) |
|
|
221
271
|
| "look at this again", "is this right", "counterargument", "re-validate" | `/verify-bidirectional` |
|
|
222
272
|
| "MCP failing", "tool keeps erroring", "circuit-breaker", "same error looping" | `/mcp-circuit-breaker` |
|
|
223
273
|
| "token budget", "how expensive", "estimate tokens", "will this cost a lot" | `/token-budget-gate` |
|
package/README.md
CHANGED
|
@@ -166,15 +166,15 @@ hardened by attack, and only then does it ship faster, for having survived.
|
|
|
166
166
|
|---|---|---|
|
|
167
167
|
| **Forge** | shape the raw project into a harness — raise its floor | `install-wizard`, "harness-ify this project" |
|
|
168
168
|
| **Quench** | harden it by attack — the cold pass leaves standing only what is sound | `steel-quench` · `phantom-quench` |
|
|
169
|
-
| **Temper** | take the brittleness back out of the hardened asset |
|
|
169
|
+
| **Temper** | take the brittleness back out of the hardened asset | `steel-quench` Wave-T · `templates/temper_check.sh` |
|
|
170
170
|
| → **Accelerate** | a blade that survived the forge cuts faster | `goal-quench` — *Pass → Accelerate* |
|
|
171
171
|
|
|
172
|
-
|
|
173
|
-
|
|
174
|
-
it running: `harvest-loop` (each session's lessons become permanent skills) and
|
|
175
|
-
(orchestrate the dispatch). The other skills wait until you need them — full list below.
|
|
172
|
+
All four movements ship. Temper was named before it was built — deliberately (see
|
|
173
|
+
[`ETHOS.md`](docs/ETHOS.md#the-forge)) — and shipped once measurement runs validated it. Around the forge,
|
|
174
|
+
two more signatures keep it running: `harvest-loop` (each session's lessons become permanent skills) and
|
|
175
|
+
`agent-composer` (orchestrate the dispatch). The other skills wait until you need them — full list below.
|
|
176
176
|
|
|
177
|
-
## 33 skills ·
|
|
177
|
+
## 33 skills · 8 agents
|
|
178
178
|
|
|
179
179
|
<details>
|
|
180
180
|
<summary>Full asset activation check</summary>
|
|
@@ -237,18 +237,48 @@ it running: `harvest-loop` (each session's lessons become permanent skills) and
|
|
|
237
237
|
Claude Code does not auto-select models by task complexity — you configure this once.
|
|
238
238
|
|
|
239
239
|
```bash
|
|
240
|
-
/model
|
|
240
|
+
/model sonnet # recommended default — FH dispatches stronger models itself where they matter
|
|
241
241
|
```
|
|
242
242
|
|
|
243
243
|
| Command | Who runs what | Best for |
|
|
244
244
|
|---|---|---|
|
|
245
|
-
| `/model
|
|
245
|
+
| `/model sonnet` | Sonnet session; FH dispatches higher-tier sub-agents on declared floors | **FH default** — operation + routine dev |
|
|
246
|
+
| `/model opus` | Opus handles everything | Harness-editing sessions (Mode D) · maximum depth on every turn |
|
|
246
247
|
| `/model opusplan` | Opus *plans* · Sonnet executes *(when Opus engages)* | Cost-conscious routine coding — see caveat |
|
|
247
|
-
| `/model sonnet` | Sonnet handles everything | Fast, simple tasks (no FH gates) |
|
|
248
248
|
|
|
249
|
-
**Why
|
|
250
|
-
|
|
251
|
-
|
|
249
|
+
**Why default Sonnet now works**: measured (see §Model setup evidence note below), *operating* FH is
|
|
250
|
+
nearly model-flat — the rules in context do most of the work. What still needs a stronger model is a
|
|
251
|
+
small set of depth-sensitive turns, and FH handles those itself: **some skills and agents declare a
|
|
252
|
+
model-tier floor** (e.g. `quench-challenger` floors at opus) and are dispatched as sub-agents at the
|
|
253
|
+
floor tier when your environment can reach it — your session model stays untouched. **FH never switches
|
|
254
|
+
your session model**: a default you set by hand is followed; floors apply only to FH's own sub-agent
|
|
255
|
+
dispatches. If your environment tops out below a floor (e.g. Sonnet-only API routing), the floored
|
|
256
|
+
asset still runs at the best available tier with an explicit `below-floor` flag in its output — degraded
|
|
257
|
+
delivery is visible, never silent (tier-floor resolution: `knowledge/shared/harness-core/multi_model_sidecar_strategy.md §Tier-floor`).
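
The floor behaviour described above (dispatch at a floor-meeting tier when one is reachable, otherwise run at the best available tier with a visible flag) can be sketched like this. It is an illustration under assumptions: the tier names, their ordering, and the function names are hypothetical, and the flag text mirrors the `below-floor` wording rather than a confirmed output format.

```bash
# Hypothetical sketch of tier-floor resolution; tier names and their ordering are assumptions.
tier_rank() {
  case "$1" in
    haiku)  echo 1 ;;
    sonnet) echo 2 ;;
    opus)   echo 3 ;;
    *)      echo 0 ;;
  esac
}

# resolve_dispatch <floor> <reachable-tier>...   (expects at least one reachable tier)
# Prints the tier to dispatch at; appends a below-floor flag when no reachable tier meets the floor.
resolve_dispatch() {
  local floor="$1" tier best=""
  shift
  for tier in "$@"; do
    if [ "$(tier_rank "$tier")" -ge "$(tier_rank "$floor")" ]; then
      echo "$tier"                             # first floor-meeting tier wins
      return 0
    fi
    if [ -z "$best" ] || [ "$(tier_rank "$tier")" -gt "$(tier_rank "$best")" ]; then
      best="$tier"                             # remember the strongest below-floor tier
    fi
  done
  echo "$best (below-floor; floor=$floor)"     # degraded delivery stays visible
}
```

With a Sonnet-only environment, `resolve_dispatch opus haiku sonnet` prints `sonnet (below-floor; floor=opus)`; the sketch never fails outright, matching the rule that a floored asset still runs.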
|
|
258
|
+
|
|
259
|
+
**`opusplan` caveat (measured)**: its Opus engagement is **not guaranteed** — in a measured 10-turn run
|
|
260
|
+
it used Opus on **0** turns (CC classifies few turns as "plan-mode"). If you want Opus on every turn,
|
|
261
|
+
pin `/model opus` (22/22 turns Opus in the follow-up run). **Sub-agent dispatch** model is set by the
|
|
262
|
+
dispatch's own `model` parameter; the session model/plan-mode does **not** propagate to sub-agents.
|
|
263
|
+
|
|
264
|
+
> **By role**: running FH (field projects, gates, routine dev) → `/model sonnet` + let the floors
|
|
265
|
+
> escalate. Editing the harness itself (Mode D) → pin the strongest model you have — harness
|
|
266
|
+
> *self-development* is where tier depth measurably pays (design-increment finding), while operation
|
|
267
|
+
> does not. Sub-agent token costs are CC-visible in the session jsonl under `message.model`.
|
|
268
|
+
|
|
269
|
+
**Measured, not asserted** (2026-06-10, worked example): on a 30-point blind rule-application battery,
|
|
270
|
+
*operating* FH was nearly model-flat — Opus 4.8 / Sonnet 4.6 / Haiku 4.5 scored **100 / 97 / 94** against
|
|
271
|
+
a top-tier anchor at 100, with the rules in context doing most of the work. The tiers separated only on
|
|
272
|
+
above-rubric *design* increments (developing the harness, not running it) — which is why the default is
|
|
273
|
+
Sonnet with **tier-floored dispatch** covering the depth-sensitive turns, and a pinned stronger model is
|
|
274
|
+
recommended only for harness-editing sessions. Details: `docs/OUTPUT_EVIDENCE.md` §Validation signals.
|
|
275
|
+
|
|
276
|
+
**Measured, not asserted** (2026-06-10, worked example): on a 30-point blind rule-application battery,
|
|
277
|
+
*operating* FH was nearly model-flat — Opus 4.8 / Sonnet 4.6 / Haiku 4.5 scored **100 / 97 / 94** against
|
|
278
|
+
a top-tier anchor at 100, with the rules in context doing most of the work. The tiers separated only on
|
|
279
|
+
above-rubric *design* increments (developing the harness, not running it) — which is exactly why the
|
|
280
|
+
recommendation stays Opus for harness-editing and gate turns, while field operation tolerates lower tiers.
|
|
281
|
+
Details: `docs/OUTPUT_EVIDENCE.md` §Validation signals.
|
|
252
282
|
|
|
253
283
|
If you use external CLIs (Gemini, Codex, `gh copilot`) as sidecars, their costs are billed to their own quota and not visible in CC's token display.
|
|
254
284
|
|
|
@@ -293,4 +323,5 @@ External convergence:
|
|
|
293
323
|
| [`AGENTS.md`](AGENTS.md) | Runtime agent specs |
|
|
294
324
|
| [`CATALOG.md`](CATALOG.md) | Past work search index |
|
|
295
325
|
| [`CONTRIBUTING.md`](docs/CONTRIBUTING.md) | How to contribute skills and patterns |
|
|
326
|
+
| [`tracks/_contrib/`](tracks/_contrib/README.md) | **Consent lane** — share a de-identified work session; the repo compounds across operators, not just locally |
|
|
296
327
|
| [`fh_integration_contract.md`](knowledge/shared/harness-core/fh_integration_contract.md) | Governance gate spec |
|
package/docs/CONTRIBUTING.md
CHANGED
|
@@ -12,6 +12,7 @@ If you'd like to make forge-harness better, pull requests are welcome.
|
|
|
12
12
|
| **Templates** | Common files to add under `templates/` |
|
|
13
13
|
| **Documentation** | README, skill description refinement, typo fixes |
|
|
14
14
|
| **Field pattern harvest** | Proposing a pattern discovered in real use as a skill (see `/field-harvest`) |
|
|
15
|
+
| **Session contribution (consent lane)** | Share a de-identified work session in `tracks/_contrib/` — see §Consent Lane below |
|
|
15
16
|
|
|
16
17
|
## Choosing a Contribution Path — Check First
|
|
17
18
|
|
|
@@ -24,6 +25,21 @@ If you'd like to make forge-harness better, pull requests are welcome.
|
|
|
24
25
|
|
|
25
26
|
> **Adding a new skill = Full path required**. Improving an existing skill = Lightweight path available.
|
|
26
27
|
|
|
28
|
+
## Consent Lane — share a session (`tracks/_contrib/`)
|
|
29
|
+
|
|
30
|
+
Everything under `tracks/` is private by design **except** `tracks/_contrib/` — the consent lane.
|
|
31
|
+
Placing a session file there is your explicit consent to publish it; it lands via PR through the lane
|
|
32
|
+
gate (`/public-surface-audit` + `/marketplace-gate` Check 5 + ingest contradiction scan + reviewer pass).
|
|
33
|
+
|
|
34
|
+
- Start from `templates/contrib_session.md` → `tracks/_contrib/{your-handle}/{topic}/session_YYYY_MM_DD_{slug}.md`
|
|
35
|
+
- **De-identify first** (no employer/project/colleague names, paths, domains, credentials) — the gate
|
|
36
|
+
re-checks, but scrubbing is yours first
|
|
37
|
+
- Full charter, gate table, and what-happens-after-merge: [`tracks/_contrib/README.md`](../tracks/_contrib/README.md)
|
|
38
|
+
- Overhead: minimal — the skill-PR rules don't apply (it's a session, not a skill); only the lane gate + frontmatter floor
|
|
39
|
+
|
|
40
|
+
Merged sessions get CATALOG credit under your handle, and `harvest-loop` distills repeating patterns
|
|
41
|
+
into shared knowledge/skills — your session becomes compound interest for every cloner.
|
|
42
|
+
|
|
27
43
|
## PR Rules (Short Version)
|
|
28
44
|
|
|
29
45
|
1. **New skill** → Create `plugins/fh-meta/skills/{name}/SKILL.md` + add version line to `plugins/fh-meta/CHANGELOG.md`
|
package/package.json
CHANGED
|
@@ -1,6 +1,6 @@
|
|
|
1
1
|
{
|
|
2
2
|
"name": "@chrono-meta/fh-gate",
|
|
3
|
-
"version": "1.4.
|
|
3
|
+
"version": "1.4.8",
|
|
4
4
|
"description": "FH runtime adapters — run FH governance, skills, and agents via Claude or Codex with machine-parseable gates.",
|
|
5
5
|
"license": "MIT",
|
|
6
6
|
"keywords": [
|
|
@@ -23,7 +23,9 @@
|
|
|
23
23
|
"fh-goal": "bin/fh-goal.js"
|
|
24
24
|
},
|
|
25
25
|
"scripts": {
|
|
26
|
-
"prepare": "chmod +x bin/fh-gate.js bin/fh-run.js bin/fh-goal.js scripts/fh-gate.sh scripts/fh-run.sh scripts/fh-goal.sh"
|
|
26
|
+
"prepare": "chmod +x bin/fh-gate.js bin/fh-run.js bin/fh-goal.js scripts/fh-gate.sh scripts/fh-run.sh scripts/fh-goal.sh",
|
|
27
|
+
"test": "bash scripts/selfcheck.sh",
|
|
28
|
+
"prepublishOnly": "bash scripts/selfcheck.sh"
|
|
27
29
|
},
|
|
28
30
|
"engines": {
|
|
29
31
|
"node": ">=16"
|
|
@@ -54,6 +56,7 @@
|
|
|
54
56
|
"scripts/fh-gate.sh",
|
|
55
57
|
"scripts/fh-run.sh",
|
|
56
58
|
"scripts/fh-goal.sh",
|
|
59
|
+
"scripts/selfcheck.sh",
|
|
57
60
|
"plugins/fh-meta/skills",
|
|
58
61
|
"plugins/fh-meta/agents",
|
|
59
62
|
"plugins/fh-commons/skills",
|
|
@@ -19,7 +19,14 @@ description: Dedicated quench attack-prescription synthesis agent — Devil (6-a
|
|
|
19
19
|
If even one S-tier finding exists, registration is blocked. A-tier and below allow registration after fix recommendations.
|
|
20
20
|
</commentary>
|
|
21
21
|
</example>
|
|
22
|
-
model:
|
|
22
|
+
model: opus
|
|
23
|
+
# model is a HARD FLOOR (tier-floor resolution — multi_model_sidecar_strategy.md §Tier-floor):
|
|
24
|
+
# adversarial increment-finding is the depth-sensitive class. floor: hard semantics — the floor
|
|
25
|
+
# outranks diversity: prefer any floor-meeting engine (incl. native Tier-3 at opus) over a below-floor
|
|
26
|
+
# diversity engine. Only when NO engine reaches the floor, dispatch at best available with the
|
|
27
|
+
# below-floor header (e.g. "challenger: sonnet (below-floor; floor=opus)") — never hard-fail.
|
|
28
|
+
# Below-floor judged verdicts are PROVISIONAL (not gate-PASS evidence) until floor-tier re-run or
|
|
29
|
+
# explicit operator acceptance; the weekly audit is the standing re-quench consumer.
|
|
23
30
|
color: red
|
|
24
31
|
tools: Read, Grep, Glob
|
|
25
32
|
version: 0.2
|
|
@@ -29,11 +29,47 @@ model: sonnet
|
|
|
29
29
|
|
|
30
30
|
```
|
|
31
31
|
Priority:
|
|
32
|
+
0. /deep-research built-in available (check live session skill list)
|
|
33
|
+
→ use it as the collection+verification engine (staged source gathering,
|
|
34
|
+
cross-checking, cited synthesis) — Tier-0 route, no API key needed
|
|
32
35
|
1. ANTHROPIC_API_KEY environment variable → Claude Sonnet
|
|
33
36
|
2. Neither → WebSearch mode (raw data only, no synthesis)
|
|
34
37
|
```
|
|
35
38
|
|
|
36
|
-
Report detection result in one line. Example: `🔑 Using Claude Sonnet`
|
|
39
|
+
Report detection result in one line. Example: `🔑 Using Claude Sonnet` / `🔑 Engine: /deep-research (built-in)`
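
The priority order above can be sketched as a small resolver. This is a hedged illustration: `resolve_engine` and its output keywords are hypothetical, and whether `/deep-research` is available is passed in by the caller because checking the live session skill list is not something a shell script can do.

```bash
# Hypothetical sketch of the engine priority; the caller supplies built-in availability.
# resolve_engine <yes|no>   prints: deep-research | claude-sonnet | websearch
resolve_engine() {
  local builtin_available="${1:-no}"
  if [ "$builtin_available" = "yes" ]; then
    echo "deep-research"       # priority 0: built-in engine, no API key needed
  elif [ -n "${ANTHROPIC_API_KEY:-}" ]; then
    echo "claude-sonnet"       # priority 1: API key present
  else
    echo "websearch"           # priority 2: raw data only, no synthesis
  fi
}
```

The printed keyword would then feed the one-line detection report.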
|
|
40
|
+
|
|
41
|
+
---
|
|
42
|
+
|
|
43
|
+
## Step 0.5. Operator Intake (speculative-interview arm — walled channels)
|
|
44
|
+
|
|
45
|
+
On **cadence-triggered** runs (7d), ask the operator one line before collecting:
|
|
46
|
+
|
|
47
|
+
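> *"If you saw any behind-the-wall sources this week (links or summaries from YouTube, LinkedIn, X, and other places machines cannot reach), send them over and I will triage and record them. If not, I'll just proceed."*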
|
|
48
|
+
|
|
49
|
+
- Operator may skip — zero pressure; the autonomous arms (Step 1) run regardless.
|
|
50
|
+
- Why: walled channels return 403 to machine fetch — **the operator is the only wide-net sensor
|
|
51
|
+
for them**. This arm turns ad-hoc link-throwing into a scheduled intake (manual-validated n=2).
|
|
52
|
+
- Received sources route to the sister-asset/triage flow with its **lightweight path**: C-tier
|
|
53
|
+
(territory already covered) = one-paragraph entry only; full cross-audit reserved for A/B-tier.
|
|
54
|
+
Partial wall-bypass is allowed first: try WebSearch + secondary sources before declaring unfetchable.
|
|
55
|
+
- **Video sources (local/laptop only — cloud VMs typically 403 on video hosts)**: resolve a
|
|
56
|
+
*video-harvest* capability via the Sidecar Engine Resolution Protocol
|
|
57
|
+
(`multi_model_sidecar_strategy.md`) — probe by **capability, not engine name**. A CLI that is a
|
|
58
|
+
valid sidecar for other tasks is not automatically a video-harvest engine.
|
|
59
|
+
**Tier 1 — a natively multimodal CLI that ingests the URL directly** (verified 2026-06-10,
|
|
60
|
+
laptop, video `oZUeRib1Xec`): the current verified invocation is `gemini --skip-trust -p "{URL}"`
|
|
61
|
+
→ grounded timestamped summary ✅. ⚠️ **the direct `gemini` CLI is being sunset (vendor EOL
|
|
62
|
+
2026-06-18)** — after that date probe `agy` (the Antigravity router-shell successor, same class)
|
|
63
|
+
or the Gemini API; see `multi_model_sidecar_strategy.md §Binary names churn`. A coding-agent CLI
|
|
64
|
+
with no native video/transcript access (`codex`) **cannot** — it spent ~67K tokens, recovered
|
|
65
|
+
only the title, then asked for a pasted transcript ❌; do not route video to it. Agentic
|
|
66
|
+
router-shells (`agy`) get approval-mode first.
|
|
67
|
+
**Tier 3 — conditional fallback (not guaranteed)**: Claude harvests the transcript via
|
|
68
|
+
`yt-dlp --write-auto-subs --skip-download` and summarizes — fine for talk-style content, but
|
|
69
|
+
needs `ffmpeg` + a `curl_cffi` impersonation target; the timedtext endpoint may return HTTP 429,
|
|
70
|
+
and that dep has no wheel on brew's Python 3.14 (blocked on this machine 2026-06-10).
|
|
71
|
+
Unresolvable (cloud, no sidecar; or all tiers blocked) → operator summary remains the path, as
|
|
72
|
+
today.
|
|
37
73
|
|
|
38
74
|
---
|
|
39
75
|
|
|
@@ -280,14 +280,14 @@ Runs after Step C (or after Step B if no capability gap). Selects an adversarial
|
|
|
280
280
|
|
|
281
281
|
| Task scope signal | Sidecar | Invocation point |
|
|
282
282
|
|---|---|---|
|
|
283
|
-
| Code quality review / new SKILL.md / governance gate change | `steel-quench`
|
|
283
|
+
| Code quality review / new SKILL.md / governance gate change | `steel-quench` cross-provider challenger (Gemini sidecar if available; Tier 3 Claude sub-agent fallback per the Sidecar Engine Resolution Protocol) | Post-/goal, before pipeline-conductor |
|
|
284
284
|
| Architecture design / cross-project dependency | `agent-composer` multi-model panel (if external CLIs available; otherwise single-Claude sub-agent) | Parallel to /goal as separate Agent |
|
|
285
285
|
| External publication / marketplace-gate / skill release | `sim-conductor` + `steel-quench` Wave 5 | Post-/goal quality gate |
|
|
286
286
|
| No signal match (default) | None — pipeline-conductor handles quality alone | — |
|
|
287
287
|
|
|
288
288
|
Write resolved sidecar to `.active`:
|
|
289
289
|
```
|
|
290
|
-
sidecar: none | steel-quench-
|
|
290
|
+
sidecar: none | steel-quench-crossprovider | agent-composer-panel | sim-conductor | {cli-name}
|
|
291
291
|
sidecar_rationale: {one-line reason — which scope signal triggered this}
|
|
292
292
|
```
|
|
293
293
|
|
|
@@ -410,7 +410,7 @@ After pipeline-conductor completes, read `sidecar:` from `.pending`. If non-none
|
|
|
410
410
|
|
|
411
411
|
| Sidecar | Wait for | Failure action |
|
|
412
412
|
|---|---|---|
|
|
413
|
-
| `steel-quench-
|
|
413
|
+
| `steel-quench-crossprovider` | Wave 2 convergence: PASS or CONDITIONAL_PASS | FAIL → reopen Phase 2, surface blocking findings to user |
|
|
414
414
|
| `agent-composer-panel` | Panel findings file written + incorporated into pipeline-conductor context | Not yet written → re-run pipeline-conductor with panel findings |
|
|
415
415
|
| `sim-conductor` + `steel-quench-w5` | Both: sim-conductor Area A PASS **and** steel-quench W5 convergence PASS | Either FAIL → block Done When; surface failing verdict |
|
|
416
416
|
|
|
@@ -438,7 +438,7 @@ After each goal-quench run, append to `tracks/_meta/goal_quench_{YYYY-MM-DD}.md`
|
|
|
438
438
|
actual_vs_estimate_ratio: N.N # actual / estimated (e.g., 4.7 means actual was 4.7× estimate)
|
|
439
439
|
budget_verdict: GREEN/YELLOW/ORANGE/RED
|
|
440
440
|
pipeline_verdict: CLEAN/PENDING/BLOCKED/ESCALATE
|
|
441
|
-
sidecar: none | steel-quench-
|
|
441
|
+
sidecar: none | steel-quench-crossprovider | agent-composer-panel | sim-conductor | {cli-name}
|
|
442
442
|
sidecar_type: none | cc-subagent | external-cli # cc-subagent = isolated Sonnet context; external-cli = untracked
|
|
443
443
|
sidecar_model: none | sonnet | {model-name} # standard baseline: sonnet; external CLIs vary by tier
|
|
444
444
|
orchestrator_tokens: N | unknown # Opus orchestrator CC tokens (pro/max only; visible in CC)
|
|
@@ -107,6 +107,7 @@ File: {path}
|
|
|
107
107
|
| .claude/rules files unmodified 90+ days | R-tier |
|
|
108
108
|
| settings.json JSON syntax error | M-tier |
|
|
109
109
|
| Hooks in settings.json (PostToolUse/Stop) that don't fire in Agent View | S-tier or M-tier |
|
|
110
|
+
| Public-surface FH recommendation or default changed — no N-shot measurement evidence traceable in session records or PR body | S-tier |
|
|
110
111
|
|
|
111
112
|
Hook divergence verdict: 0 hooks = Normal · 1+ hooks (session-end/Stop) = S-tier · 1+ hooks (PostToolUse file writes or external API) = M-tier (data loss risk in Agent View).
|
|
112
113
|
|
|
@@ -115,6 +116,10 @@ Hook divergence verdict: 0 hooks = Normal · 1+ hooks (session-end/Stop) = S-tie
|
|
|
115
116
|
- Tracks with no sync in 30+ days → R-tier
|
|
116
117
|
- CATALOG.md: 5+ open items in recent 5 sessions → S-tier
|
|
117
118
|
- Field project CLAUDE.md missing → S-tier
|
|
119
|
+
- **Knowledge cross-ref lint**: `knowledge/**/*.md` with no CATALOG.md entry → S-tier
|
|
120
|
+
(index orphan — unfindable via CATALOG-first search); with no inbound reference from
|
|
121
|
+
any CLAUDE.md / rules / SKILL.md / knowledge doc → R-tier (orphan page).
|
|
122
|
+
Mechanical: `grep -L` filenames against CATALOG.md, then `grep -rl` for inbound refs.
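
The mechanical lint can be sketched as one function. This is a minimal sketch, not the skill's own procedure: `crossref_lint` is a hypothetical name, it checks each file with `grep -q` rather than a single `grep -L` pass, and it treats any other Markdown file under the repo root as a possible inbound reference, which is broader than the CLAUDE.md / rules / SKILL.md / knowledge set named above.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the knowledge cross-ref lint; not the shipped harness-doctor logic.
# crossref_lint <repo-root>   prints one finding per line (root without a trailing slash).
crossref_lint() {
  local root="$1" doc name inbound
  find "$root/knowledge" -type f -name '*.md' | sort | while IFS= read -r doc; do
    name="$(basename "$doc")"
    # Index orphan: the filename is never mentioned in CATALOG.md -> S-tier
    if ! grep -qF -- "$name" "$root/CATALOG.md" 2>/dev/null; then
      echo "S-tier index-orphan ${doc#"$root"/}"
    fi
    # Orphan page: no other Markdown file (CATALOG.md excluded) mentions the filename -> R-tier
    inbound="$(grep -rlF --include='*.md' -- "$name" "$root" 2>/dev/null \
      | grep -vxF -- "$doc" | grep -vxF -- "$root/CATALOG.md" || true)"
    if [ -z "$inbound" ]; then
      echo "R-tier orphan-page ${doc#"$root"/}"
    fi
  done
}
```

Matching on the bare filename keeps the sketch short but can over-match when two knowledge docs share a basename; matching on the repo-relative path would be stricter.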
|
|
118
123
|
|
|
119
124
|
### Step 6. L5 — Pattern Analysis *(FH only)*
|
|
120
125
|
|
|
@@ -154,8 +159,19 @@ Hook divergence verdict: 0 hooks = Normal · 1+ hooks (session-end/Stop) = S-tie
|
|
|
154
159
|
| **Context Drift** | Stale paths in CLAUDE.md (L3) · rules unmodified 90+ days (L3) · CLAUDE.md over threshold (L2) | S |
|
|
155
160
|
| **Schema Misalignment** | Plugin count drift · SKILL.md missing `Done When` (undocumented contract) | M |
|
|
156
161
|
| **State Degradation** | tracks/ no sync 30+ days (L4) · INACTIVE_90D skills (L5-A) · orphaned memory entries | M |
|
|
162
|
+
| **Agentic Laziness** *(runtime)* | Completion claims without per-item evidence — "all N done" with no enumerable list in session records / PR bodies · Done When lacking any mandatory-pass condition | S |
|
|
163
|
+
| **Self-Preferential Bias** *(runtime)* | judged-class check without a named adversarial pairing (gate-rejectable for new skills; scan existing for backfill) · a self-graded verdict cited as sole completion evidence | M |
|
|
164
|
+
| **Goal Drift** *(runtime)* | Long session with S/A-tier work but no pre-compaction completion log (`fh_completed_*` missing) · early-stated constraints absent from card/manifest at close | S |
|
|
165
|
+
| **Comprehension Debt** *(runtime/operator)* | Merged FH-asset change with no CATALOG entry · edit_manifest `validation_status: pending` backlog piling up unverified · session closed with work done but zero card delta | S |
|
|
166
|
+
| **Evidence Gap** *(FH self-dev)* | Public-surface FH recommendation or default changed with no N-shot measurement evidence traceable in session records or PR history (L3 check) | S |
|
|
157
167
|
|
|
158
168
|
Signal in 2+ classes → escalate one tier.
|
|
169
|
+
Rows 1–3 = structural (2026-06-02 frontier diagnosis). Rows 4–6 = runtime behavioral, agent-side —
|
|
170
|
+
named from dynamic-workflows discourse (2026-06). Row 7 = runtime, operator-side — named from
|
|
171
|
+
loop-engineering discourse (Osmani, 2026-06): loop output outpacing operator understanding. FH
|
|
172
|
+
countermeasures already exist and are what the signals check for: golden probes + multi-persona
|
|
173
|
+
coverage (laziness) · judged-pairing rule (bias) · pre-compaction completion log + card-last guard
|
|
174
|
+
(drift) · CATALOG 3-line summaries + manifest predict-verify + card protocol (comprehension debt).
|
|
159
175
|
|
|
160
176
|
---
|
|
161
177
|
|
|
@@ -247,6 +263,8 @@ Verdict: ✅ CONSISTENT · 🟧 INCONSISTENT (fix before push) · 🟩 REVIEW (i
|
|
|
247
263
|
|
|
248
264
|
**This skill Done When = "prescription report output complete".** Actual resolution of M/S/R items belongs to user or follow-up work.
|
|
249
265
|
|
|
266
|
+
**Check classes** (`harness_6axis_framework.md` §Axis 5): report-output completions above = mandatory-pass (report exists or not). M/S/R tier *assignments* = judged — paired with verify-bidirectional when the user challenges a tier.
|
|
267
|
+
|
|
250
268
|
Verdict: PASS (M-tier 0, "Structure healthy") | CONDITIONAL_PASS (S/R remain, no M) | FAIL (1+ M-tier found) | ESCALATE (structural ambiguity before prescription can be issued)
|
|
251
269
|
|
|
252
270
|
**Three-Doctor Loop chain** (auto-propose after prescription report):
|
|
@@ -49,6 +49,7 @@ Tier is **independent of platform origin** (Anthropic / OpenAI / community). A w
|
|
|
49
49
|
|
|
50
50
|
| Tier | Criteria | Sources |
|
|
51
51
|
|---|---|---|
|
|
52
|
+
| **Tier 0** | Platform built-in — ships with the runtime, zero install, zero token cost to discover | Claude Code built-in skills/commands (e.g. `/deep-research`, `/code-review`, `/review`, `/security-review`, `/batch`, `/loop`, `/fewer-permission-prompts`, `/goal`, `/rewind`, `/team-onboarding`) — inventory varies by version/environment: enumerate from the live session, do not assume |
|
|
52
53
|
| **Tier 1** | Marketplace-listed + performance-validated (benchmark data or production usage evidence) | Anthropic official · Codex marketplace verified · CC marketplace verified · FH community reviewed (steel-quench + sim-conductor validated) |
|
|
53
54
|
| **Tier 2** | Marketplace-listed, no explicit performance data | Any marketplace source (CC marketplace · Codex marketplace · npm · GHE) |
|
|
54
55
|
| **Tier 3** | Not marketplace-listed, source-available (GitHub/npm) | Repo-only agents/plugins |
|
|
@@ -87,13 +88,15 @@ When queried for a specific capability (e.g., "adversarial reviewer for bash cod
 
 **Discovery order (stop when sufficient Tier 1 candidates found):**
 
+0. **Platform built-ins (Tier 0)** — does a built-in skill/command already cover the capability? Check the live session's available-skills list before any plugin search. A built-in that covers ~80% beats installing a plugin for the rest
 1. **Installed locally** — `.claude/agents/`, `plugins/` in current cwd
 2. **FH native skills** — always-loaded knowledge in `plugins/fh-meta/` and `plugins/fh-commons/`
 3. **Claude Code marketplace** — `claude mcp search [capability]` or known CC registry (see verified targets above)
 4. **Codex marketplace** — `npx @openai/codex list-agents [capability]` or known Codex registry
 5. **npm ecosystem** — `@chrono-meta/`, `@anthropic/`, and other known-quality scoped packages
 
-**Discovery priority**: installed > FH native > Tier 1 (any platform) > Tier 2 > Tier 3 > Tier 4
+**Discovery priority**: built-in (Tier 0) > installed > FH native > Tier 1 (any platform) > Tier 2 > Tier 3 > Tier 4
+**Tier 0 guard**: FH native wins over a built-in only when the FH skill adds governance the built-in lacks (e.g. `/goal` → `goal-quench` adds budget+quality gates; code diff review stays with built-in `/code-review`, FH-asset coherence with `hub-cc-pr-reviewer`)
 
 **When sim-conductor chains here for persona discovery**: apply the same platform-aware search scoped to persona/simulation/review capability tags. Return discovered agents with their Tier rating so sim-conductor can decide whether to install or use a built-in brief.
 
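
The new discovery priority in the hunk above is an ordered chain, so it can be pictured as a rank lookup followed by a stable sort. The following is a minimal sketch under stated assumptions: the source labels (`built-in`, `installed`, `fh-native`, `tier1` … `tier4`) and both function names are hypothetical and do not come from plugin-recommender. It also does not model the Tier 0 guard, which the skill text treats as a judged exception rather than a sort rule.

```bash
#!/usr/bin/env bash
# Hypothetical helper: map a discovery-source label to its rank
# (lower rank = consulted first). Labels are illustrative only.
priority_rank() {
  case "$1" in
    built-in)  echo 0 ;;
    installed) echo 1 ;;
    fh-native) echo 2 ;;
    tier1)     echo 3 ;;
    tier2)     echo 4 ;;
    tier3)     echo 5 ;;
    tier4)     echo 6 ;;
    *)         echo 99 ;;  # unknown sources sort last
  esac
}

# Read "source<TAB>name" lines on stdin and print them in discovery order.
# The sort is stable (-s), so candidates from the same source keep input order.
rank_candidates() {
  local src name
  while IFS=$'\t' read -r src name; do
    printf '%s\t%s\t%s\n' "$(priority_rank "$src")" "$src" "$name"
  done | sort -t $'\t' -s -n -k1,1 | cut -f2-
}
```

Feeding it a mixed candidate list (for example a Tier 2 plugin, a built-in command, and an FH native skill) would print the built-in first, which is the behavioural change this hunk introduces.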
@@ -57,9 +57,12 @@ Check for custom probes:
 ls .claude/regression/probes.md 2>/dev/null || echo "NO_CUSTOM_PROBES"
 ```
 
-**If custom probes exist**: load and use them.
+**If custom probes exist**: load and use them. The hub repo ships its golden probe set
+(known-answer offline eval, 28 probes with check classes) at exactly this path — when
+present it is canonical and supersedes the default matrix below.
 
-**If no custom probes
+**If no custom probes** (e.g. Mode C install without the hub repo): use the default
+probe matrix below.
 
 #### Default Probe Matrix
 
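
The selection rule in the hunk above (custom probes at `.claude/regression/probes.md` are canonical when present, otherwise fall back to the default matrix) could be sketched as a small function. This is an illustration only: `probe_source` and its two output labels are hypothetical names, not part of prompt-regression.

```bash
#!/usr/bin/env bash
# Hypothetical helper: report which probe set a run would use.
# $1 = repo root to inspect (defaults to the current directory).
probe_source() {
  local root="${1:-.}"
  local custom="$root/.claude/regression/probes.md"
  if [ -f "$custom" ]; then
    # Custom (golden) probes supersede the default matrix when present.
    echo "custom:$custom"
  else
    # Same situation the skill's own check reports as NO_CUSTOM_PROBES.
    echo "default-matrix"
  fi
}
```

Testing for the file with `[ -f … ]` rather than parsing `ls` output keeps the decision independent of locale and error-message wording.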
@@ -71,7 +74,7 @@ ls .claude/regression/probes.md 2>/dev/null || echo "NO_CUSTOM_PROBES"
 | `P-TRIGGER-03` | `harness is complex` | harness-doctor proposed | CLAUDE.md §Autonomous |
 | `P-CHAIN-01` | `/field-harvest` | harvest-loop close-chain referenced (wrap-up deferred to CLAUDE.md session-close chain — field-harvest has no inline sim-conductor gate) | field-harvest SKILL.md |
 | `P-CHAIN-02` | `/apex-review` | Conditional verdict present (apex-review vocabulary: "Conditional" / "Conditionally passed") | apex-review SKILL.md |
-| `P-GATE-01` | new skill commit | New Skill Pre-Commit Gate (
+| `P-GATE-01` | new skill commit | New Skill Pre-Commit Gate (6 items, incl. check-class declared) invoked | CLAUDE.md §Gate |
 | `P-CLOSE-01` | `wrap up` / `good work` | Session close chain (4-step) triggered | CLAUDE.md §Wrap-up |
 
 ---
@@ -104,7 +107,7 @@ For each affected probe, evaluate:
 - For session close chain: confirm all 4 steps in CLAUDE.md §Wrap-up
 
 **4-c. Gate presence check** — Are gate conditions still enforced?
-- Pre-commit gate: confirm all
+- Pre-commit gate: confirm all 6 items present in CLAUDE.md (incl. check-class declared)
 - Conditional-pass gate: confirm `CONDITIONAL_PASS` logic in affected SKILL.md
 
 Output per probe:
@@ -161,10 +164,10 @@ Only update on explicit `y` — never auto-update.
 
 ## Done When
 
-- All affected probes are evaluated (PASS / FAIL / SKIP)
-- Regression report is output with clear PASS/FAIL verdict
-- If FAIL: specific file + line fix is recommended
-- Baseline updated only on explicit user approval
+- All affected probes are evaluated (PASS / FAIL / SKIP) — class: mandatory-pass
+- Regression report is output with clear PASS/FAIL verdict — class: mandatory-pass
+- If FAIL: specific file + line fix is recommended — class: judged, paired with verify-bidirectional (the fix recommendation is re-checked, not trusted as-is)
+- Baseline updated only on explicit user approval — class: mandatory-pass (HITL)
 
 ---
 
@@ -41,6 +41,7 @@ A designer's anxiety is most dangerous when vague. steel-quench breaks that anxi
 | **Wave 4** (optional) | Meta-Aware Adversary — AI uses its own nature as attack vector | Zero new S-grade + AI-specific criteria |
 | **Wave-P3** (optional) | Gate-passage re-attack — when an upstream gate declares PASS, re-attack the just-passed artifact on Coverage / Narrative / False-confidence | All 3 dimensions Attack Failed |
 | **Wave 5** (optional) | Multi-Team Adversarial Panel — external CLIs or cross-session Claude | Zero new S-grade cross-team |
+| **Wave-T** (after convergence) | Temper — measure complexity the quench *added*; flag over-hardening | τ-PASS or named τ-FAIL |
 
 ---
 
@@ -68,7 +69,7 @@ Skip: [list of skipped waves with reason]
 External CLIs available: [yes/no → Wave 5 available]
 ```
 
-**Degraded coverage rule**: if a high-weight wave or capability is skipped (user choice, unavailable tool, or scope=internal), flag explicitly in the output header — do not silently proceed.
+**Degraded coverage rule**: if a high-weight wave or capability is skipped (user choice, unavailable tool, or scope=internal) **or runs below its declared model-tier floor** (tier-floor resolution, `multi_model_sidecar_strategy.md §Tier-floor`), flag explicitly in the output header — do not silently proceed. Below-floor example: `challenger: sonnet (below-floor; floor=opus)`; a judged verdict produced below floor is a re-quench candidate once a floor-tier is available.
 
 ---
 
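
The below-floor clause added to the degraded coverage rule amounts to a comparison between the model a role actually ran on and that role's declared floor. A minimal sketch follows, assuming a simple tier ordering of `haiku < sonnet < opus`. Only `sonnet` below `opus` is confirmed by the example in the hunk; the rest of the ordering, and the function names `tier_rank` and `coverage_line`, are hypothetical.

```bash
#!/usr/bin/env bash
# Assumed tier ordering, for illustration only: haiku < sonnet < opus.
tier_rank() {
  case "$1" in
    haiku)  echo 1 ;;
    sonnet) echo 2 ;;
    opus)   echo 3 ;;
    *)      echo 0 ;;  # unknown tiers rank lowest, so they are always flagged
  esac
}

# coverage_line <role> <model> <floor>
# Prints one output-header line, flagged when the model is below the floor.
coverage_line() {
  local role="$1" model="$2" floor="$3"
  if [ "$(tier_rank "$model")" -lt "$(tier_rank "$floor")" ]; then
    echo "$role: $model (below-floor; floor=$floor)"
  else
    echo "$role: $model"
  fi
}
```

Called as `coverage_line challenger sonnet opus`, the sketch reproduces the header line quoted in the rule. An unflagged line means the floor was met, so a reader of the header never has to infer degraded coverage from silence.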
@@ -194,6 +195,37 @@ domain-coupled (a spec→test-case gate) form to a gate-agnostic boundary hook.
 
 ---
 
+## Wave-T — Temper (post-convergence)
+
+Quench hardens, but quenched steel is brittle — no smith ships it un-tempered. steel-quench attacks
+until zero new S-grade; nothing in that loop asks whether the hardening itself **introduced complexity
+beyond what the fixes required** (defense scaffolding, decorative wiring). Wave-T is that inverse
+corrective. It runs **after Wave 3+ convergence, before Done When**. It does not attack; it measures
+the cost of the convergence just achieved.
+
+| Step | Class | What it does |
+|---|---|---|
+| **T-1 complexity delta** | measured | `bash templates/temper_check.sh <repo> <file> <pre-quench-ref>` — Δlines/sections/steps/tables/fences/cross-refs, baseline (pre-Wave-1) → post-convergence. Prose-only counts: code-fence interiors are excluded (bash comments are not sections) |
+| **T-2 absolute context** | measured | `harness-doctor` L1–L3 on the post-quench asset — absolute complexity tier (reuse, don't reimplement) |
+| **T-3 τ verdict** | judged — paired with the quench's own Wave findings (each flagged construct must trace to a specific finding it allegedly fixes; no judge-only path) | **τ-PASS**: added complexity ⊆ what the fixes required. **τ-FAIL**: the quench introduced a construct that *defends against an attack rather than fixing the flaw* — name it, propose the simpler form, hand back for de-brittling. τ-FAIL is the temper step working, not a quench failure |
+
+**T-3 heuristic flags** (any → review, never auto-reject): a new section/table/step whose only referent
+is a Wave-N finding · Δcross-refs ≫ Δsteps (wiring, not function) · the asset crosses a harness-doctor
+complexity tier it was below pre-quench.
+
+**Don't-overbuild guard (τ applied to τ)**: Wave-T is one script + this section, reusing harness-doctor
+for the absolute read. If Wave-T grows its own detection engine, it has failed its own test — a temper
+step that adds complexity is self-refuting. Known limits (honest): `temper_check.sh` takes one path —
+renamed files need a manual pre/post measurement; the wiring flag uses strict `>`, so Δxrefs = Δsteps
+does not fire it (the section flag usually carries those cases).
+
+**Model note**: T-1 is bash (no model), T-2 reuses harness-doctor (sonnet-rated). T-3 adjudication was
+validated blind on both Opus and Sonnet (3-fixture ground-truth test, 3/3 each, 2026-06-10) — Wave-T
+end-to-end does not require the largest model tier. Opus remains the recommendation for full
+steel-quench runs (challenger waves are broader than T-3).
+
+---
+
 ## External-GT Adjudication (when the target has a public ground truth)
 
 When quenching a **public artifact that has its own ground truth** — a repo's open issues, test suite, or
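
The "Known limits" note in the Wave-T section says the wiring flag uses a strict `>` comparison, so equal deltas do not fire it. A minimal sketch of that comparison is given here, assuming both deltas have already been computed as integers. The function name `wiring_flag` and the `REVIEW` / `OK` labels are hypothetical; the actual output format of `templates/temper_check.sh` is not shown in this diff.

```bash
#!/usr/bin/env bash
# wiring_flag <delta_xrefs> <delta_steps>
# Prints REVIEW when cross-references grew strictly more than steps
# (wiring rather than function); prints OK otherwise, including on a tie.
wiring_flag() {
  local dxrefs="$1" dsteps="$2"
  if [ "$dxrefs" -gt "$dsteps" ]; then
    echo "REVIEW"
  else
    echo "OK"
  fi
}
```

With equal deltas, such as `wiring_flag 3 3`, the sketch prints `OK`, which is the tie case the section leaves to the separate section flag. A flag only routes the construct to review; by the section's own rule it never rejects automatically.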
@@ -0,0 +1,72 @@
+#!/usr/bin/env bash
+# selfcheck.sh — mandatory-pass (deterministic) checks on FH's own executable surface.
+# Class: mandatory-pass (harness_6axis_framework.md §Axis 5 check classes) — blocks on fail.
+# Scope: executables shipped via npm files[] + the bash infra driving the FH gate chain.
+# Syntax-only (node --check / bash -n): zero side effects, no network, runs anywhere.
+# Wiring: `npm test` for any session; `prepublishOnly` so a publish cannot ship a
+# syntactically broken executable.
+set -u
+cd "$(dirname "${BASH_SOURCE[0]}")/.."
+fail=0
+
+check() { # check <label> <cmd...>
+  local label="$1"; shift
+  if "$@" 2>/dev/null; then
+    echo "PASS $label"
+  else
+    echo "FAIL $label"
+    "$@" || true
+    fail=1
+  fi
+}
+
+# Node executables (npm-shipped)
+for f in bin/*.js; do
+  check "node --check $f" node --check "$f"
+done
+
+# Bash surface: npm-shipped scripts + local bin wrappers + gate-chain infra
+for f in scripts/*.sh bin/fh-gate bin/fh-run bin/fh-goal \
+    templates/regression_guard.sh templates/temper_check.sh templates/predelete_check.sh templates/.git-hooks/pre-commit; do
+  [ -f "$f" ] || continue
+  check "bash -n $f" bash -n "$f"
+done
+
+# Count consistency: stated skill/agent counts vs actual directories.
+# Drift class recurred 4x on 2026-06-10 alone (local_fh_context 26, plugin.json "3 agents",
+# README "5 agents", marketplace.json "3 agents") — this makes the check mechanical and permanent.
+# Active skill = SKILL.md without a deprecation marker (frontmatter `deprecated: true` or
+# "DEPRECATED" in the description block).
+count_active() { # count_active <plugin>
+  local n=0 s
+  for s in plugins/"$1"/skills/*/SKILL.md; do
+    [ -f "$s" ] || continue
+    head -20 "$s" | grep -qE 'deprecated: true|DEPRECATED' || n=$((n+1))
+  done
+  echo "$n"
+}
+count_agents() { ls plugins/"$1"/agents/*.md 2>/dev/null | wc -l | tr -d ' '; }
+
+meta_sk=$(count_active fh-meta); meta_ag=$(count_agents fh-meta)
+com_sk=$(count_active fh-commons); com_ag=$(count_agents fh-commons)
+total_sk=$((meta_sk + com_sk)); total_ag=$((meta_ag + com_ag))
+
+count_check() { # count_check <label> <file> <expected-string>
+  if grep -q "$3" "$2"; then
+    echo "PASS count: $1"
+  else
+    echo "FAIL count: $1 — expected \"$3\" in $2 (actual: fh-meta ${meta_sk}sk/${meta_ag}ag, fh-commons ${com_sk}sk/${com_ag}ag)"
+    fail=1
+  fi
+}
+count_check "fh-meta plugin.json" plugins/fh-meta/.claude-plugin/plugin.json "${meta_sk} skills + ${meta_ag} agents"
+count_check "fh-commons plugin.json" plugins/fh-commons/.claude-plugin/plugin.json "${com_sk} skills"
+count_check "marketplace.json fh-meta" .claude-plugin/marketplace.json "${meta_sk} skills + ${meta_ag} agents"
+count_check "README header" README.md "${total_sk} skills · ${total_ag} agents"
+count_check "local_fh_context fh-meta" templates/local_fh_context.md "(fh-meta, ${meta_sk})"
+
+if [ "$fail" -ne 0 ]; then
+  echo "SELFCHECK: FAIL"
+  exit 1
+fi
+echo "SELFCHECK: PASS"
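
The count-consistency check in the script greps each file for an expected string built from the live counts. `grep -q` treats that string as a basic regular expression, so one possible variant matches it literally with `grep -F`. The sketch below is not part of the released script: `count_check_fixed` is a hypothetical name, and it returns a status instead of setting the script's global `fail` variable so it can be tried in isolation.

```bash
#!/usr/bin/env bash
# count_check_fixed <label> <file> <expected-string>
# Literal (fixed-string) variant of the count check: the expected text is
# matched as-is, so characters such as '.', '*' or '[' carry no regex meaning.
count_check_fixed() {
  local label="$1" file="$2" expected="$3"
  if grep -qF -- "$expected" "$file"; then
    echo "PASS count: $label"
  else
    echo "FAIL count: $label"
    return 1
  fi
}
```

The `--` guards against an expected string that starts with a dash being read as an option. Whether the released strings ever need this is an open question; the ones visible in the diff consist of digits, words, `+`, `·`, and parentheses.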