@chrono-meta/fh-gate 1.4.13 → 1.4.14

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/CATALOG.md CHANGED
@@ -8,6 +8,28 @@ AI reads this file first when searching past work. Open individual files for det
8
8
 
9
9
  <!-- Add entries in reverse date order (newest at top) -->
10
10
 
11
+ ### 2026-06-11 | forge-harness | #readme-dedup, #commoditization-defense, #b1-boundary, #seed-vetting, #field-routing
12
+ **File:** README.md (PR #93) + fh-be signals/handoffs (PR #24+) + field project v0.14 (private track)
13
+ Cloud session #4 (Mode D, remote-doable batch): README stale duplicate "Measured, not asserted" block removed (pre-tier-floor copy contradicted default-Sonnet stance) + "Where this sits (2026)" positioning para (gate+loop are the asset — plumbing commoditizes). fh-be: B1 scope boundary (AW2 simple-vs-complex import), SC1–4 seed vetting (SC1 phantom arXiv fixed → 2509.19349; SC3 commoditization threat to bet ID), gstack positioning-triangle line. Field: private-track companion handoff verified+executed — DefectPatternMatcher+Bug Mode implemented UTF-8-clean in the field repo (25 tests PASS, 0 regressions); its OpenCode(sLLM) lane's downgrade triage hardened to 2-tier (keep/block) with free-tier 80/20 split — β/SC2 field case logged.
14
+ - Decision: README action PARTIAL (merge done; About refresh + npm 1.4.14 laptop-bound); full-TC locked on qwen build per operator's CaseCraft limit measurement — structure-transform survives (P6/P6.5 deterministic), meaning-fill routes to frontier.
15
+
16
+ ### 2026-06-11 | forge-harness | #mcp-gating, #external-mcp, #name-keyed-policy, #measured-origin, #field-template
17
+ **File:** templates/.claude/rules/mcp_tool_gating.md (+ auto_project_mapping.md §6 row 4, CLAUDE.md mount-intent trigger)
18
+ Cloud session (Mode D, ext): new field template — external-MCP tool gating with three tiers (ask / ask-meta-write / allow-untrusted-read), name-keyed because server-supplied annotations are unreliable (measured same-day: live messaging-class MCP shipped all-None hints incl. irreversible send + approval-resolution tools — fh-be `signal_2026-06-11_hermes-mcp-cloud-boot.md`). Opus challenger caught the name-spoofing hole (server controls names too → behavior-confirmation required for non-ask tiers, fixed inline); sonnet blind sim PASS on unfilled-§3 scenario (per-item ask on send, batch approval-grant refused).
19
+ - Decision: prefer host-native per-tool permission config as enforcement; this template = what-to-gate + portable fallback. §6 install row is conditional (MCP present); the proactive mount-intent trigger is the load-bearing path.
20
+
21
+ ### 2026-06-11 | forge-harness | #identity-marker, #door-skeleton, #target-tier-sim, #below-floor-consumer, #false-control-kill
22
+ **File:** CLAUDE.md §Active Onboarding (+ fh_detail_protocols.md, scripts/below_floor_scan.sh, .claude/rules/operations.md)
23
+ Cloud session (Mode D): 🐿️ marker folded into the returning-user door skeleton — one salience unit with the menu, closing the sonnet-sim marker-drop residual (`fh_signal_2026-06-11`); verified by blind sonnet Agent sim PASS (verification tier = failure tier; in-session model-pinned dispatch available in cloud, no headless fallback needed). 6/15 billing one-line amendment rode the same CLAUDE.md edit (no churn commit). below_floor_scan.sh ships as the standing consumer the pre-commit hook promised ("weekly audit re-queues below-floor markers") but never had — resolution via `floor-rerun:`/`floor-writeoff:` marker appends, wired as weekly Phase 1.5 step 2.
24
+ - Decision: ack rubber-stamp (card item 4) closed via route-around — build the re-run consumer, leave the ack untouched (per opus-challenger verdict on the reverted regex attempt).
25
+ - Decision: opus challenger PASS 0S+3B — marker-append/hook-collision replicated CLEAN; P9 check: "builds the control, not paper-over".
26
+
27
+ ### 2026-06-11 | forge-harness | #p9, #harness-bulk, #check-class, #model-portability, #steel-quench
28
+ **File:** plugins/fh-meta/skills/steel-quench/SKILL.md (+ knowledge/shared/harness-core/multi_model_sidecar_strategy.md)
29
+ Two field-validated generalizations promoted to the public mirror (origin: 2026-06-08 field reversal — a weak open-weight model's domain ceiling proved iteration-proof while the pipeline thickened to compensate): (a) steel-quench Cross-Project Patterns gains **P9 harness-bulk-as-model-compensation** (route the task class to a stronger model; never paper over a capability ceiling with more harness — signals are measured: steps added for one model's weakness, quality flat while step count rises); (b) sidecar strategy gains the **check-class = model-portability map** principle (mandatory-pass + mechanical-measured port by construction; judged is where model choice matters, bounded by judged-pairing and §Floor governance re-runs).
30
+ - Decision: 4-axis gate ran with opus challenger (CONDITIONAL_PASS → A-grade measured-class over-claim narrowed inline + 4 B fixes) + blind sonnet target-tier sim (P9 fixture correctly classified + routed) — the salience row survives the field tier.
31
+ - Decision: P9 scope-tagged to the field axis ("simpler over time"); meta-harness complexity that earns its scope is explicitly distinguished.
32
+
11
33
  ### 2026-06-10 | forge-harness | #destructive-op-gate, #irreversibility, #silent-loss, #branch-cleanup-incident
12
34
  **File:** CLAUDE.md §Destructive-Op Gate (+ templates/predelete_check.sh, scripts/selfcheck.sh)
13
35
  Third irreversibility gate (sibling of Pre-Publish): enumerate → recover → destroy, never destroy-then-check. predelete_check.sh classifies branches SAFE/CHECK/REVIEW (CHECK = 0 unique paths but commits off base — shared files may hold newer content, the silent-loss class); REVIEW blocks scripted deletes (exit 1); the recovery step is judged/depth-sensitive (strongest-tier floor semantics). Signal-table row fires on destructive intents proactively. Origin: same-day incident — a parallel session's card (weekly-audit done + #88) lived only on an unmerged 0-unique-path branch; pre-deletion enumeration recovered it. Dogfood replay: the script lands that exact branch in CHECK.
package/CLAUDE.md CHANGED
@@ -103,9 +103,14 @@ Simplification guard: trivial denials with one obvious fix → state block + sin
103
103
 
104
104
  **4-step summary**: ① Auto-read CLAUDE.md + CATALOG + session card + registry scan → ② One-line proposal (new user / exploratory / returning branches) → ③ 5-skill cascade (plugin-recommender → synergy → .claudeignore → model → verify) → ④ Approval + setup
105
105
 
106
- **Returning-user door skeleton (summary-level — applies even if the detail file read is skipped; returning users only, new/exploratory branches stay in the detail file)**: open with the fixed 3-door menu *"① Connect/map a project · ② Work on a mapped project — {field candidates} · ③ FH self-development — {FH worklist}"* — composing session-card candidates **into doors ②/③**, never as a raw priority dump that replaces the menu. An urgent open item (time-windowed handoff · blocking external deadline) outranks the menu; an explicit task utterance skips it entirely (see Guards below); cadence reminders (§Cadence Rules) ride below it, they don't displace it. Canonical source: `fh_detail_protocols.md` Step 2 §Returning user keep door labels in sync.
106
+ **Returning-user door skeleton (summary-level — applies even if the detail file read is skipped; returning users only, new/exploratory branches stay in the detail file)**: open with the fixed 3-door menu, **🐿️ on its own line as the skeleton's first line**
107
107
 
108
- **Identity marker**: every greeting response (Step ②) opens with 🐿️ on its own line. FH's session-start signal — see `fh_detail_protocols.md` Step 2 for full greeting templates.
108
+ > 🐿️
109
+ > *"① Connect/map a project · ② Work on a mapped project — {field candidates} · ③ FH self-development — {FH worklist}"*
110
+
111
+ The marker is part of this skeleton — one salience unit with the menu, not a separate rule (origin: a sonnet-tier sim emitted the menu but dropped the standalone marker rule, `fh_signal_2026-06-11`). Compose session-card candidates **into doors ②/③**, never as a raw priority dump that replaces the menu. An urgent open item (time-windowed handoff · blocking external deadline) outranks the menu; an explicit task utterance skips it entirely (see Guards below); cadence reminders (§Cadence Rules) ride below it, they don't displace it. Canonical source: `fh_detail_protocols.md` Step 2 §Returning user — keep door labels in sync.
112
+
113
+ **Identity marker**: every greeting response (Step ②) opens with 🐿️ on its own line. For returning users it is embedded in the door skeleton above (do not strip it when composing doors); new/exploratory branch templates carry it in `fh_detail_protocols.md` Step 2.
109
114
 
110
115
  **Guards**: explicit task-entry utterance → skip onboarding · once per session · code/debug requests → start working directly · project routing is a suggestion, mention at most once
111
116
  **Metadata-is-not-intent guard**: the trigger is the user's **typed message only**. Session metadata — branch name (auto-derived from the first message, e.g. `claude/korean-greeting-*`), repo name, file paths — is **never** a task spec and never suppresses or redirects the greeting trigger. A bare greeting fires onboarding even when the branch name looks like a feature request; if the only "task" signal lives in metadata and not in what the user typed, treat the message as a greeting and run the 3-axis scaffold.
@@ -178,7 +183,9 @@ the operator (one line), mirroring §Floor governance.
178
183
 
179
184
  If `model:`-pinned dispatch is unavailable (plan/billing gate), fall back to a cross-session headless
180
185
  run (`claude -p "<trigger>" --model <tier>` in the target cwd) — stronger isolation, zero instruction
181
- contamination. Record sim results in the Axes 2–3 marker + sub-agent invocation log.
186
+ contamination. 2026-06-15+: headless `claude -p` draws from the hard-capped credit pool, not the
187
+ subscription — prefer in-session Agent dispatch when the plan gate allows; take the headless fallback
188
+ knowingly. Record sim results in the Axes 2–3 marker + sub-agent invocation log.
182
189
 
183
190
  **Axis ownership** (each skill is already complete — orchestrator only coordinates):
184
191
 
@@ -299,6 +306,7 @@ Proposal format: `"I see [X]. Want me to run /[skill] to [one-line description]?
299
306
  | "delete the branch", "브랜치 삭제", "브랜치 정리", "clean up branches", "force-push", "rewrite history", "지워도 돼?" (destructive intent — **proactive**, fire *before* the action) | **Destructive-Op Gate** (see above → enumerate → recover → destroy; `templates/predelete_check.sh`) |
300
307
  | "look at this again", "is this right", "counterargument", "re-validate" | `/verify-bidirectional` |
301
308
  | "MCP failing", "tool keeps erroring", "circuit-breaker", "same error looping" | `/mcp-circuit-breaker` |
309
+ | "add this MCP server", "mount this MCP", "mcp.json에 추가", "connect this tool server" (external-MCP mount intent — **proactive**, fire *before* first tool call; mount intent only — a failing/erroring mounted server is `/mcp-circuit-breaker`'s row above) | `templates/.claude/rules/mcp_tool_gating.md` (name-keyed ask/allow table — never trust server annotations or names; fill §3 at mount time) |
302
310
  | "token budget", "how expensive", "estimate tokens", "will this cost a lot" | `/token-budget-gate` |
303
311
  | "did my rule change break anything", "regression check", "test harness changes" | `/prompt-regression` |
304
312
  | "SKILL.md too large", "split this skill", "skill is bloated", "skill file too long" | `/skill-splitter` |
package/README.md CHANGED
@@ -99,6 +99,12 @@ forge-harness is structured as **two distinct layers**:
99
99
 
100
100
  The methodology layer is the portable core — persistent hub, accumulating learnings, curating cross-project knowledge. The automation layer makes it frictionless when running Claude Code.
101
101
 
102
+ **Where this sits (2026):** "harness engineering" is now a public paradigm — and basic agent
103
+ orchestration is rapidly commoditizing into standard infrastructure. FH deliberately stakes nothing on
104
+ that plumbing. Its durable layer is what does *not* commoditize: the governance gates (adversarial ·
105
+ phantom · regression), drift control, and the cross-project compounding loop. Routing and dispatch are
106
+ means; **the gate and the loop are the asset.**
107
+
102
108
  ```
103
109
  forge-harness/ ← the hub (persistent brain)
104
110
  ├── knowledge/ → shared across all projects
@@ -273,13 +279,6 @@ above-rubric *design* increments (developing the harness, not running it) — wh
273
279
  Sonnet with **tier-floored dispatch** covering the depth-sensitive turns, and a pinned stronger model is
274
280
  recommended only for harness-editing sessions. Details: `docs/OUTPUT_EVIDENCE.md` §Validation signals.
275
281
 
276
- **Measured, not asserted** (2026-06-10, worked example): on a 30-point blind rule-application battery,
277
- *operating* FH was nearly model-flat — Opus 4.8 / Sonnet 4.6 / Haiku 4.5 scored **100 / 97 / 94** against
278
- a top-tier anchor at 100, with the rules in context doing most of the work. The tiers separated only on
279
- above-rubric *design* increments (developing the harness, not running it) — which is exactly why the
280
- recommendation stays Opus for harness-editing and gate turns, while field operation tolerates lower tiers.
281
- Details: `docs/OUTPUT_EVIDENCE.md` §Validation signals.
282
-
283
282
  If you use external CLIs (Gemini, Codex, `gh copilot`) as sidecars, their costs are billed to their own quota and not visible in CC's token display.
284
283
 
285
284
  ---
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "@chrono-meta/fh-gate",
3
- "version": "1.4.13",
3
+ "version": "1.4.14",
4
4
  "description": "FH runtime adapters — run FH governance, skills, and agents via Claude or Codex with machine-parseable gates.",
5
5
  "license": "MIT",
6
6
  "keywords": [
@@ -259,6 +259,7 @@ a single-family pass repeated still misses what cross-family catches, and a targ
259
259
  | P6 | **AI dependency single point of failure** | Claude API/MCP removal causes collapse | Document graceful degradation + fallback |
260
260
  | P7 | **Hallucination-contaminated defense** | Defense relies on LLM inference, not measurement | Mandate citing original file/commit/value |
261
261
  | P8 | **Context Collapse unguarded** | Key instructions lost to compression → harness silent | Review CLAUDE.md compact repeated insertion |
262
+ | P9 | **Harness-bulk as model compensation** | Pipeline thickened to substitute for a model capability ceiling (a gap no iteration count closes — e.g. domain understanding) — complexity replaces missing capability, violating the field axis "simpler over time" (meta-harness: distinguish from complexity that earns its scope) | Route the task class to a stronger model; never paper over the ceiling with more harness. Signals: steps added for one model's weakness; step count rising while class quality stays flat across iterations |
262
263
 
263
264
  Add new rows as new patterns are discovered.
264
265