npm - @chrono-meta/fh-gate - Versions diffs - 1.4.26 → 1.4.27 - Mend

@chrono-meta/fh-gate 1.4.26 → 1.4.27

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (5)

package/CLAUDE.md +24 -2
package/package.json +1 -1
package/plugins/fh-meta/skills/context-bridge-dispatch/SKILL.md +4 -2
package/plugins/fh-meta/skills/edit-manifest/SKILL.md +60 -12
package/plugins/fh-meta/skills/context-bridge-dispatch/SKILL_detail.md +0 -144

package/CLAUDE.md CHANGED

@@ -141,7 +141,19 @@ All 6 items below must pass before committing a new SKILL.md. If any fails, fix
 Skills without a Done When definition automatically qualify as harness-doctor L2 M-tier.
 Check-class declaration applies to **new** skills; existing skills backfill opportunistically
-(when next edited), not retroactively.
+(when next edited), not retroactively. **Trigger-accuracy probe backfill** follows the same
+opportunistic rule but scoped to **routing/gate skills only**: when an existing router/gate skill is
+edited (especially when its trigger phrases change), run steel-quench `Step 0.5 — Trigger-Accuracy
+Probe` on the changed trigger surface and record the fire-count — turning "do these triggers collide?"
+from a guess into a number. Not a retroactive sweep of all routers (that would be decorative
+over-work); it rides the edit that touches the router. **One-time baseline floor** (closes the
+never-edited-router gap — a stable router accumulates the most un-probed traffic): existing
+routing/gate skills get **one** baseline Step-0.5 probe at the next `harness-doctor` run (the 30-day
+cadence already enumerates skills), then opportunistic-on-edit thereafter — a single baseline pass,
+not a recurring sweep. **"routing/gate skill" (mechanical test)**: a skill whose *primary output is a
+dispatch decision or a pass/block verdict* — e.g. `agent-composer`, `goal-quench`, `asset-placement-gate`,
+`return-path-gate`, `phantom-quench` — NOT a skill that merely calls others as sub-steps (e.g.
+`harvest-loop`).
 ---
@@ -161,7 +173,17 @@ FH asset modified → Axis 1 (regression_guard.sh --pr {BRANCH})
       hook validates mechanically: below-floor blocks without below-floor-ack, and
       axis2-evidence must be non-vacuous — a recorded verdict/count, not "it ran". Honest
       scope: form + non-vacuity + auditability, NOT provenance — a fabricated marker is the
-      weekly audit's + operator's residual by design, judge-robustness swarm 2026-06-13)
+      weekly audit's + operator's residual by design, judge-robustness swarm 2026-06-13.
+      The below-floor-ack rubber-stamp is **structurally IRREDUCIBLE for an autonomous runner**: the
+      runner writes both the marker AND any transcript the hook could cross-check, so any in-boundary
+      check it can satisfy it can also forge (a runner-computed signature is false security). The one
+      genuine close needs a secret the runner does NOT hold — an **operator-present hard-close**: a
+      GPG-signed trailer **whose key requires a fresh interactive passphrase** (NOT an agent-cached
+      gpg-agent key, and NOT operator-git-identity alone — `user.email` is config the runner already
+      writes, forgeable, not a secret). The real guarantee is *uncached private-key access*, not commit
+      identity; optional, breaks full autonomy, use only when the operator is at the keyboard. Autonomous mode keeps the honest
+      residual + weekly-audit backstop — do NOT fake-close it. Gemini cross-analysis 2026-06-16 reached
+      this verdict independently, converging with the existing FH stance)
   → Axis 4 (/edit-manifest RECORD, today's entry in edit_manifest.yaml)
   → All 4 PASS → git commit allowed   |   Any FAIL → fix inline, re-run
 ```

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "@chrono-meta/fh-gate",
-  "version": "1.4.26",
+  "version": "1.4.27",
   "description": "FH runtime adapters — run FH governance, skills, and agents via Claude or Codex with machine-parseable gates.",
   "license": "MIT",
   "keywords": [

package/plugins/fh-meta/skills/context-bridge-dispatch/SKILL.md CHANGED

@@ -21,9 +21,11 @@ successor: agent-composer
 ## Content preserved at
-`plugins/fh-meta/skills/agent-composer/SKILL.md §Step 3-a`
+`plugins/fh-meta/skills/agent-composer/SKILL.md §Step 3-a` — the live successor.
-> **Detail**: See `SKILL_detail.md §Archive` — full original content preserved — read only for historical reference.
+> The original pre-merge content is preserved in git history (this stub's earlier revisions); the
+> shipped `SKILL_detail.md` archive was removed in the curator shrink (2026-06-16) as redundant with
+> git history. No live content lost — the functional content lives in agent-composer Step 3-a.
 ## Done When

package/plugins/fh-meta/skills/edit-manifest/SKILL.md CHANGED

@@ -43,11 +43,26 @@ edit history and negative-feedback buffer.
   predicted_impact: "users entering via phrase X will increase — estimate +1 session/week"
   predicted_measurable_by: "session start logs or user utterance pattern in next 2 sessions"
   validation_status: pending   # pending | verified | falsified | untestable
+  validation_type: judged      # mechanical (grep/count/git) | judged (cited observation) | untestable
+  baseline_value: null         # number, when a metric exists (mechanical)
+  measured_value: null         # filled at VERIFY
+  delta: null                  # measured_value - baseline_value, or null
+  match_score: null            # at VERIFY: 1.0 confirmed | 0.5 partial | 0.0 contradicted
   verified_at: null
-  verification_note: null
-  gate_decision: null   # accepted | rejected
+  verification_note: null      # one-line MEASURED outcome + cited evidence (never bare "seems better")
+  gate_decision: null   # accepted | redefine | rejected
 ```
+> **Status vocabulary is canonical and machine-greppable** — `validation_status` MUST be one of
+> `pending | verified | falsified | untestable`, never freeform prose. The verify pass (Step V1)
+> greps these literals; a freeform status like `"predicted — verify next session"` is **invisible to
+> the grep and silently never closes the loop** (the format-reconciliation bug fixed 2026-06-16 — put
+> the prose in `predicted_measurable_by`, keep `validation_status: pending`).
+>
+> `baseline_value` / `measured_value` / `delta` apply to **mechanical** entries only; judged and
+> untestable entries leave them `null`. **`match_score` is the gate input (Step V3) for all types** —
+> the numeric fields are a mechanical-entry audit detail, not a second gate signal.
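
Reviewer sketch (not part of the package): the field rules added in this hunk can be expressed as a small consistency check. This is a minimal illustration, assuming a manifest entry has already been parsed into a plain dict; the names `check_entry` and `compute_delta` are hypothetical and do not appear in the skill.

```python
from typing import List, Optional

# Literals taken from the schema comments in the hunk above.
CANONICAL_STATUSES = {"pending", "verified", "falsified", "untestable"}
VALIDATION_TYPES = {"mechanical", "judged", "untestable"}
NUMERIC_FIELDS = ("baseline_value", "measured_value", "delta")


def check_entry(entry: dict) -> List[str]:
    """Return a list of schema problems for one parsed manifest entry (hypothetical helper)."""
    problems = []
    if entry.get("validation_status") not in CANONICAL_STATUSES:
        problems.append("validation_status is not a canonical literal")
    if entry.get("validation_type") not in VALIDATION_TYPES:
        problems.append("validation_type is missing or unknown")
    # Numeric fields apply to mechanical entries only; others leave them null.
    if entry.get("validation_type") != "mechanical":
        for field in NUMERIC_FIELDS:
            if entry.get(field) is not None:
                problems.append(f"{field} must be null for non-mechanical entries")
    return problems


def compute_delta(baseline: Optional[float], measured: Optional[float]) -> Optional[float]:
    """delta = measured_value - baseline_value, or None when either side is missing."""
    if baseline is None or measured is None:
        return None
    return measured - baseline
```

A freeform status such as `"predicted, verify next session"` would be reported by `check_entry`, which is the failure mode the note above describes.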
 ## Trigger Conditions
 ### Automatic — Record Phase (on every FH asset edit)
@@ -105,14 +120,37 @@ that the edit rationale needs sharpening.
 **Step V1 — Load Pending Entries**
 ```bash
-grep -A20 "validation_status: pending" tracks/_meta/edit_manifest.yaml
+# canonical pending + legacy freeform "predicted ..." entries (transition: reconcile legacy to pending)
+# \b anchors the alternation so 'pending_review' / 'predicted_outcome' (non-canonical) still surface as legacy, not swept as pending
+grep -nA22 -E 'validation_status: *"?(predicted|pending)\b' tracks/_meta/edit_manifest.yaml
 ```
-Skip entries where `predicted_measurable_by` date has not yet passed.
+Skip entries where `predicted_measurable_by` date has not yet passed. **Reconcile any legacy
+freeform `validation_status: "predicted — ..."` entry to `pending` (move the prose into
+`predicted_measurable_by`) as you touch it — otherwise it stays invisible to future passes.**
+**Reconciliation completeness** (a half-reconciled entry stays unverifiable): when you touch a legacy
+entry, also backfill the fields a canonical entry needs — generate a missing `id`
+(`em-{date}-{slug}`), set `file:` from the edit's target, and set `validation_type` explicitly (default
+**mechanical** when `predicted_measurable_by` names a grep/count/git check; **judged** when it names a
+reviewer observation). A missing `validation_type` is not a silent default — name it, or the entry
+can't be scored consistently across passes.
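
Reviewer sketch (not part of the package): to see which `validation_status` values the new Step V1 pattern picks up, the same expression can be exercised with Python's `re` module as a stand-in for `grep -E`. Both treat `_` as a word character, so `\b` does not match between `pending` and `_review`. The sample lines and the helper name `is_swept` are hypothetical.

```python
import re

# Same pattern as the grep -E call in Step V1.
PENDING_RE = re.compile(r'validation_status: *"?(predicted|pending)\b')


def is_swept(line: str) -> bool:
    """True when the Step V1 pattern would return this line (hypothetical helper)."""
    return PENDING_RE.search(line) is not None


samples = {
    "  validation_status: pending   # canonical": True,
    '  validation_status: "predicted, verify next session"': True,
    "  validation_status: pending_review": False,   # '_' is a word char, so \b fails
    "  validation_status: predicted_outcome": False,
    "  validation_status: verified": False,
}

for line, expected in samples.items():
    assert is_swept(line) is expected
```

Under this reading, non-canonical values such as `pending_review` are not returned by the pattern at all, so they would need a separate check to be found.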
 **Step V2 — Verify Each Entry**
-For each pending entry, check the evidence source specified in `predicted_measurable_by`:
+First classify the entry by `validation_type`, then collect evidence accordingly:
+- **mechanical** — prediction is a count/presence checkable by grep/git. Record `baseline_value` →
+  `measured_value` → `delta`. The check IS the evidence (non-vacuous by construction).
+- **judged** — prediction needs reviewer judgment. Requires **one concrete cited observation**
+  (file:line / a quoted signal), never a bare "seems better". No citation → stays `pending`, not verified.
+  **Non-Model Ground (the citation must be tool-confirmed, not asserted)**: the cited file:line MUST be
+  confirmed by an actual Grep/Read **in this verify pass** and the tool output (the matched line) pasted
+  into `verification_note` — a citation-shaped string asserted from memory is NOT evidence. An
+  unverifiable / un-pasted citation caps `match_score` at **0.5** (never 1.0). This is the same anchor
+  discipline as phantom-quench: a verdict rests on a surfaced span, not a claim that one exists.
+- **untestable** — no observable evidence source. Mark `untestable`, do not score.
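
Reviewer sketch (not part of the package): the judged-entry rules above combine two conditions, a missing citation keeps the entry pending, and a citation without pasted tool output caps the score at 0.5. A minimal illustration, assuming the reviewer's raw score, the citation string, and the pasted tool output are available as plain values; `score_judged` and its parameters are hypothetical names.

```python
from typing import Optional


def score_judged(raw_score: float,
                 citation: Optional[str],
                 tool_output: Optional[str]) -> Optional[float]:
    """Return the match_score for a judged entry, or None to keep it pending."""
    if not citation:
        # No citation: the entry stays pending and is not scored.
        return None
    if not tool_output:
        # Citation asserted but not confirmed by pasted Grep/Read output: cap at 0.5.
        return min(raw_score, 0.5)
    return raw_score
```

For example, `score_judged(1.0, "SKILL.md:42", None)` yields `0.5`, while the same call with the matched line pasted in keeps `1.0`.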
+For each entry, check the evidence source specified in `predicted_measurable_by`:
 | Evidence Source | Check Method |
 |---|---|
@@ -121,6 +159,9 @@ For each pending entry, check the evidence source specified in `predicted_measur
 | User friction signals | Grep `tracks/_meta/fh_signal_*.md` for related friction |
 | Git commit frequency | `git log --oneline --since={date} -- {file}` |
+Then score `match_score`: **1.0** = evidence clearly confirms the prediction · **0.5** = partial/ambiguous
+· **0.0** = contradicted or no-occurrence. Record the score + the cited evidence in `verification_note`.
 > **Circularity guard**: edit-manifest is invoked *by* harvest-loop (Step 0-c). To avoid a
 > circular evidence loop, edit-manifest must NOT use harvest-loop's own synthesis outputs
 > (proposal lists, curator decisions) as verification evidence. Evidence sources are limited
@@ -129,12 +170,13 @@ For each pending entry, check the evidence source specified in `predicted_measur
 **Step V3 — Apply Validation Gate**
-| Outcome | Gate Decision | Next Action |
+| `match_score` | status → Gate Decision | Next Action |
 |---|---|---|
-| Evidence confirms prediction | `verified` → `accepted` | No action needed |
-| Evidence contradicts prediction | `falsified` → `rejected` | Add to rejected-edits buffer; propose revert if regression |
-| No evidence yet | Keep `pending` | Re-check next session |
-| Untestable | `untestable` | Flag for human judgment |
+| ≥ 0.75 | `verified` → `accepted` | No action needed |
+| 0.25–0.75 | `verified`(partial) → `redefine` | Sharpen the prediction/edit; note what partially held |
+| ≤ 0.25 | `falsified` → `rejected` | Add to rejected-edits buffer; propose revert if regression |
+| no evidence / window not matured | keep `pending` | Re-check next session |
+| no evidence source | `untestable` | Flag for human judgment |
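
Reviewer sketch (not part of the package): the revised Step V3 table maps `match_score` to a status and a gate decision. The table lists 0.25 and 0.75 in two rows each; the sketch below assumes 0.75 counts as accepted and 0.25 as rejected. Because Step V2 only emits 1.0, 0.5, or 0.0, that assumption is never exercised in practice. `apply_gate` and `has_evidence_source` are hypothetical names.

```python
from typing import Optional, Tuple


def apply_gate(match_score: Optional[float],
               has_evidence_source: bool = True) -> Tuple[str, Optional[str]]:
    """Map a match_score to (validation_status, gate_decision) per the Step V3 table."""
    if not has_evidence_source:
        return ("untestable", None)      # flag for human judgment
    if match_score is None:
        return ("pending", None)         # no evidence yet, or window not matured
    if match_score >= 0.75:
        return ("verified", "accepted")
    if match_score > 0.25:
        return ("verified", "redefine")  # partial: sharpen the prediction/edit
    return ("falsified", "rejected")     # goes to the rejected-edits buffer


assert apply_gate(1.0) == ("verified", "accepted")
assert apply_gate(0.5) == ("verified", "redefine")
assert apply_gate(0.0) == ("falsified", "rejected")
```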
 **Step V4 — Rejected-Edits Buffer Report**
@@ -195,13 +237,19 @@ RECORD mode:
   + Untestable flag applied if vague prediction
 VERIFY mode:
-  All pending entries checked
-  + Gate decisions applied (accepted / rejected / pending)
+  All pending entries checked (canonical + legacy freeform reconciled to pending)
+  + validation_type classified (mechanical / judged / untestable)
+  + match_score recorded with cited evidence (mechanical: delta; judged: one cited observation)
+  + Gate decisions applied (accepted / redefine / rejected / pending)
   + Rejected-edits buffer reported
   + Manifest file updated via Edit
   + Human gate presented for any proposed reverts
 ```
+**Check class** (per `harness_6axis_framework.md §Axis 5`): the verify pass itself is *measured* for
+mechanical entries (delta is a number) and *judged* for judged entries — the judged path is kept
+non-vacuous by the **mandatory cited observation** (no citation → stays pending, never auto-verified).
 ## References
 - Theoretical basis: AHE (arXiv:2604.25850) §4 change manifest + prediction falsifiability

package/plugins/fh-meta/skills/context-bridge-dispatch/SKILL_detail.md DELETED Viewed

@@ -1,144 +0,0 @@
----
-name: context-bridge-dispatch-detail
-description: Archived original body of the deprecated context-bridge-dispatch skill
-load: on-demand
----
-## §Archive
-# context-bridge-dispatch — Parallel Agent Context Bridge (archived)
-In agent dispatch, sub-agents can read files but do not have access to the live conversation context of the main session. This skill generates a session context card before dispatch and injects it into each agent prompt.
-## Triggers
-| Phrase pattern | Situation |
-|---|---|
-| "do it in parallel" / "agent view" + 2+ tasks | Auto-triggered |
-| "create a context bridge" | Explicit call |
-| `/context-bridge-dispatch` | Explicit call |
-| Immediately before dispatching 2+ agents | Auto-injected |
-## Context Card Format
-**N≤2 (standard)**:
-```
-[Session Context Card]
-Purpose: {the goal of this session / task}
-Completed: {what has already been decided or implemented — risk of duplication if agent doesn't know}
-This agent's task: {the specific task for this agent}
-Note: {constraints, directions, or history the agent must know before acting}
-```
-**N≥3 (Registry mode — DACS-inspired)**:
-```
-[Session Context Card]
-Purpose: {session goal}
-Completed: {done items + file paths}
-This agent's task: {specific task}
-Note: {constraints}
-[Agent Registry]
-Agent-1 ({role}): {≤1 sentence — what it's doing, key files}
-Agent-2 ({role}): {≤1 sentence}
-... (all agents except this one)
-```
-Registry entries keep other agents visible (≤200 tokens total) without flooding context.
-Each agent gets its own full card + compressed view of the parallel picture.
-## Step 1. Extract Session Context
-Summarize the 3 key items from the current conversation:
-- **Purpose**: Core goal of this session / request
-- **Completed**: What has already been built or decided (include file paths and commits)
-- **Note**: Constraints that could lead an agent in the wrong direction if unknown
-## Step 2. Identify Agent List + Generate Individual Cards
-For each of the N agents to dispatch:
-- Common Context Card (Step 1 summary)
-- Agent-specific item (`This agent's task` field customized per agent)
-**N≥3 — Registry mode**: additionally generate one Registry entry per agent:
-```
-Agent-X ({role}): {what it's doing in ≤1 sentence} | files: {key paths}
-```
-Each agent's card includes the Registry entries for all *other* agents (omit its own).
-Keep total Registry section ≤200 tokens. If an agent's task is simple (read-only lookup), its registry entry can be a single phrase.
-## Step 3. Execute Parallel Dispatch
-Prepend the Context Card to each agent's prompt and dispatch as a single message.
-```
-[Session Context Card]
-...
-{Agent's original task instruction}
-```
-## Focus Mode (on-demand, N≥3)
-When an agent's result is incomplete and it signals it needs another agent's full output:
-1. Orchestrator identifies the target agent (a_i) whose full context is needed
-2. Re-dispatch the requesting agent with: full Context Card of a_i + Registry-compressed entries for all others
-3. Use only when genuinely needed — adds one round-trip latency
-Trigger signal from agent: `"Need full context from Agent-X to proceed"` or equivalent explicit statement.
-## Coordination-Overhead Budget
-Centralized multi-agent coordination is not free: external reporting cites orchestrator-worker coordination adding ~+285% token overhead (see the digest Provenance), and coordination cost dominates once a wave exceeds ~4 agents. Apply the following before each dispatch wave.
-| Rule | Constraint |
-|---|---|
-| **Parallel fan-out cap** | 3–4 agents per dispatch wave. This is the upper bound for the 2+ parallel dispatch in the Simplification Guard — do not flat-fan-out past 4. |
-| **Capability-aware routing** | Route each subtask to the agent whose declared capability fits, reading `.claude/registry/agent_cards.json` as the routing source (`role` + `allowed_tools` + `writes`). Do not dispatch a `writes: false` audit agent (e.g. `fact-checker`, `hub-persona-auditor`) for a task needing edits. |
-| **Escalation** | If a task genuinely needs >4 parallel agents, decompose hierarchically (supervisor → sub-waves) rather than flat fan-out. |
-Source: `../../../../knowledge/shared/harness-core/harness_frontier_diagnosis_2026-06-02.md`
-## Step 4. Aggregate Results
-After all agents complete, consolidate results in the main session and report to the user.
-## Simplification Guard
-- Simple file lookup agents unrelated to context (e.g., "read file A") → card may be omitted
-- Single agent dispatch → card injection optional
-- 2+ parallel dispatch → card injection required
-## Why This Is Necessary
-Agents are spawned in an isolated environment (sub-agent sandbox). They can read what is recorded in files, but decisions made during the current main session conversation — direction changes, completed implementations, design intent — do not exist for the agent unless saved to a file.
-Problems this disconnection causes:
-- Attempting to redo already completed work
-- Working in the old direction without knowing the current session's direction change
-- Making wrong decisions without knowing the constraints
-Context Bridge corrects this asymmetry.
-## Done When
-```
-All steps 1–4 completed
-+ Context Card injected at the front of each agent prompt
-+ Results aggregated and reported after all agents complete
-```
-## Connected Skills
-| Situation | Connection |
-|---|---|
-| Context collapse risk after a long session | `/context-doctor` |
-| Task of promoting field patterns to FH | `/field-harvest` |
-| When agent orchestration itself is complex | `agent-composer` |
-| N≥3 agents / long-running orchestration (context drifts post-dispatch) | See sister asset: DACS (arXiv:2604.07911) — Registry+Focus dynamic isolation |
-## Design Basis
-Registry mode and Focus mode patterns absorbed from **DACS** (arXiv:2604.07911, Nickson Patel, 2026-04-09).
-DACS validated: steering accuracy 98.4% vs 21% baseline at N=10; context efficiency 3.53×.
-Cross-audit + import/propagate analysis: `tracks/_audit/session_2026-06-02_dacs-sister.md`