@chrono-meta/fh-gate 1.4.26 → 1.4.27
package/CLAUDE.md
CHANGED
|
@@ -141,7 +141,19 @@ All 6 items below must pass before committing a new SKILL.md. If any fails, fix
|
|
|
141
141
|
|
|
142
142
|
Skills without a Done When definition automatically qualify as harness-doctor L2 M-tier.
|
|
143
143
|
Check-class declaration applies to **new** skills; existing skills backfill opportunistically
|
|
144
|
-
(when next edited), not retroactively.
|
|
144
|
+
(when next edited), not retroactively. **Trigger-accuracy probe backfill** follows the same
|
|
145
|
+
opportunistic rule but scoped to **routing/gate skills only**: when an existing router/gate skill is
|
|
146
|
+
edited (especially when its trigger phrases change), run steel-quench `Step 0.5 — Trigger-Accuracy
|
|
147
|
+
Probe` on the changed trigger surface and record the fire-count — turning "do these triggers collide?"
|
|
148
|
+
from a guess into a number. Not a retroactive sweep of all routers (that would be decorative
|
|
149
|
+
over-work); it rides the edit that touches the router. **One-time baseline floor** (closes the
|
|
150
|
+
never-edited-router gap — a stable router accumulates the most un-probed traffic): existing
|
|
151
|
+
routing/gate skills get **one** baseline Step-0.5 probe at the next `harness-doctor` run (the 30-day
|
|
152
|
+
cadence already enumerates skills), then opportunistic-on-edit thereafter — a single baseline pass,
|
|
153
|
+
not a recurring sweep. **"routing/gate skill" (mechanical test)**: a skill whose *primary output is a
|
|
154
|
+
dispatch decision or a pass/block verdict* — e.g. `agent-composer`, `goal-quench`, `asset-placement-gate`,
|
|
155
|
+
`return-path-gate`, `phantom-quench` — NOT a skill that merely calls others as sub-steps (e.g.
|
|
156
|
+
`harvest-loop`).
|
|
145
157
|
|
|
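The backfill rule added in this hunk can be read as a small decision procedure. The sketch below is a hypothetical illustration, not code from the package: the `Skill` record, its fields (`primary_output`, `baseline_probed`) and the function names are all assumed, and the diff does not say how a skill's primary output is actually determined.

```python
from dataclasses import dataclass

# Hypothetical record; the field names are assumptions made for this sketch.
@dataclass
class Skill:
    name: str
    primary_output: str      # assumed labels: "dispatch", "verdict", or "other"
    baseline_probed: bool    # has the one-time baseline probe already run?

def is_routing_or_gate(skill: Skill) -> bool:
    """Mechanical test: primary output is a dispatch decision or a pass/block verdict."""
    return skill.primary_output in {"dispatch", "verdict"}

def probe_due(skill: Skill, edited: bool, harness_doctor_run: bool) -> bool:
    """Return True when a Step 0.5 trigger-accuracy probe should run."""
    if not is_routing_or_gate(skill):
        return False              # skills that only call others as sub-steps are out of scope
    if edited:
        return True               # opportunistic: the probe rides the edit
    # One-time baseline floor: a single pass at the next harness-doctor run.
    return harness_doctor_run and not skill.baseline_probed
```

Under these assumptions, a never-edited router is probed exactly once (at the next `harness-doctor` run) and afterwards only when it is edited.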
146
158
|
---
|
|
147
159
|
|
|
@@ -161,7 +173,17 @@ FH asset modified → Axis 1 (regression_guard.sh --pr {BRANCH})
|
|
|
161
173
|
hook validates mechanically: below-floor blocks without below-floor-ack, and
|
|
162
174
|
axis2-evidence must be non-vacuous — a recorded verdict/count, not "it ran". Honest
|
|
163
175
|
scope: form + non-vacuity + auditability, NOT provenance — a fabricated marker is the
|
|
164
|
-
weekly audit's + operator's residual by design, judge-robustness swarm 2026-06-13
|
|
176
|
+
weekly audit's + operator's residual by design, judge-robustness swarm 2026-06-13.
|
|
177
|
+
The below-floor-ack rubber-stamp is **structurally IRREDUCIBLE for an autonomous runner**: the
|
|
178
|
+
runner writes both the marker AND any transcript the hook could cross-check, so any in-boundary
|
|
179
|
+
check it can satisfy it can also forge (a runner-computed signature is false security). The one
|
|
180
|
+
genuine close needs a secret the runner does NOT hold — an **operator-present hard-close**: a
|
|
181
|
+
GPG-signed trailer **whose key requires a fresh interactive passphrase** (NOT an agent-cached
|
|
182
|
+
gpg-agent key, and NOT operator-git-identity alone — `user.email` is config the runner already
|
|
183
|
+
writes, forgeable, not a secret). The real guarantee is *uncached private-key access*, not commit
|
|
184
|
+
identity; optional, breaks full autonomy, use only when the operator is at the keyboard. Autonomous mode keeps the honest
|
|
185
|
+
residual + weekly-audit backstop — do NOT fake-close it. Gemini cross-analysis 2026-06-16 reached
|
|
186
|
+
this verdict independently, converging with the existing FH stance)
|
|
165
187
|
→ Axis 4 (/edit-manifest RECORD, today's entry in edit_manifest.yaml)
|
|
166
188
|
→ All 4 PASS → git commit allowed | Any FAIL → fix inline, re-run
|
|
167
189
|
```
|
package/package.json
CHANGED
|
@@ -21,9 +21,11 @@ successor: agent-composer
|
|
|
21
21
|
|
|
22
22
|
## Content preserved at
|
|
23
23
|
|
|
24
|
-
`plugins/fh-meta/skills/agent-composer/SKILL.md §Step 3-a`
|
|
24
|
+
`plugins/fh-meta/skills/agent-composer/SKILL.md §Step 3-a` — the live successor.
|
|
25
25
|
|
|
26
|
-
>
|
|
26
|
+
> The original pre-merge content is preserved in git history (this stub's earlier revisions); the
|
|
27
|
+
> shipped `SKILL_detail.md` archive was removed in the curator shrink (2026-06-16) as redundant with
|
|
28
|
+
> git history. No live content lost — the functional content lives in agent-composer Step 3-a.
|
|
27
29
|
|
|
28
30
|
## Done When
|
|
29
31
|
|
|
@@ -43,11 +43,26 @@ edit history and negative-feedback buffer.
|
|
|
43
43
|
predicted_impact: "users entering via phrase X will increase — estimate +1 session/week"
|
|
44
44
|
predicted_measurable_by: "session start logs or user utterance pattern in next 2 sessions"
|
|
45
45
|
validation_status: pending # pending | verified | falsified | untestable
|
|
46
|
+
validation_type: judged # mechanical (grep/count/git) | judged (cited observation) | untestable
|
|
47
|
+
baseline_value: null # number, when a metric exists (mechanical)
|
|
48
|
+
measured_value: null # filled at VERIFY
|
|
49
|
+
delta: null # measured_value - baseline_value, or null
|
|
50
|
+
match_score: null # at VERIFY: 1.0 confirmed | 0.5 partial | 0.0 contradicted
|
|
46
51
|
verified_at: null
|
|
47
|
-
verification_note: null
|
|
48
|
-
gate_decision: null # accepted | rejected
|
|
52
|
+
verification_note: null # one-line MEASURED outcome + cited evidence (never bare "seems better")
|
|
53
|
+
gate_decision: null # accepted | redefine | rejected
|
|
49
54
|
```
|
|
50
55
|
|
|
56
|
+
> **Status vocabulary is canonical and machine-greppable** — `validation_status` MUST be one of
|
|
57
|
+
> `pending | verified | falsified | untestable`, never freeform prose. The verify pass (Step V1)
|
|
58
|
+
> greps these literals; a freeform status like `"predicted — verify next session"` is **invisible to
|
|
59
|
+
> the grep and silently never closes the loop** (the format-reconciliation bug fixed 2026-06-16 — put
|
|
60
|
+
> the prose in `predicted_measurable_by`, keep `validation_status: pending`).
|
|
61
|
+
>
|
|
62
|
+
> `baseline_value` / `measured_value` / `delta` apply to **mechanical** entries only; judged and
|
|
63
|
+
> untestable entries leave them `null`. **`match_score` is the gate input (Step V3) for all types** —
|
|
64
|
+
> the numeric fields are a mechanical-entry audit detail, not a second gate signal.
|
|
65
|
+
|
|
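The vocabulary and null-field rules in the note above lend themselves to a mechanical lint. The following is a minimal sketch under stated assumptions: `lint_entry` is a hypothetical helper (the package does not ship it as far as this diff shows), and each manifest entry is assumed to be already parsed into a plain `dict`.

```python
CANONICAL_STATUSES = {"pending", "verified", "falsified", "untestable"}
VALIDATION_TYPES = {"mechanical", "judged", "untestable"}
NUMERIC_FIELDS = ("baseline_value", "measured_value", "delta")

def lint_entry(entry: dict) -> list:
    """Return a list of problems found in one manifest entry (empty list = clean)."""
    problems = []
    status = entry.get("validation_status")
    if status not in CANONICAL_STATUSES:
        # Freeform prose here would be invisible to the verify-pass grep.
        problems.append(f"non-canonical validation_status: {status!r}")
    vtype = entry.get("validation_type")
    if vtype not in VALIDATION_TYPES:
        problems.append(f"missing or unknown validation_type: {vtype!r}")
    elif vtype != "mechanical":
        # Numeric audit fields apply to mechanical entries only.
        for field in NUMERIC_FIELDS:
            if entry.get(field) is not None:
                problems.append(f"{field} must be null for {vtype} entries")
    return problems
```

A check like this would flag a freeform status at record time, rather than leaving it to be discovered when the verify pass fails to find the entry.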
51
66
|
## Trigger Conditions
|
|
52
67
|
|
|
53
68
|
### Automatic — Record Phase (on every FH asset edit)
|
|
@@ -105,14 +120,37 @@ that the edit rationale needs sharpening.
|
|
|
105
120
|
**Step V1 — Load Pending Entries**
|
|
106
121
|
|
|
107
122
|
```bash
|
|
108
|
-
|
|
123
|
+
# canonical pending + legacy freeform "predicted ..." entries (transition: reconcile legacy to pending)
|
|
124
|
+
# \b anchors the alternation so 'pending_review' / 'predicted_outcome' (non-canonical) still surface as legacy, not swept as pending
|
|
125
|
+
grep -nA22 -E 'validation_status: *"?(predicted|pending)\b' tracks/_meta/edit_manifest.yaml
|
|
109
126
|
```
|
|
110
127
|
|
|
111
|
-
Skip entries where `predicted_measurable_by` date has not yet passed.
|
|
128
|
+
Skip entries where `predicted_measurable_by` date has not yet passed. **Reconcile any legacy
|
|
129
|
+
freeform `validation_status: "predicted — ..."` entry to `pending` (move the prose into
|
|
130
|
+
`predicted_measurable_by`) as you touch it — otherwise it stays invisible to future passes.**
|
|
131
|
+
**Reconciliation completeness** (a half-reconciled entry stays unverifiable): when you touch a legacy
|
|
132
|
+
entry, also backfill the fields a canonical entry needs — generate a missing `id`
|
|
133
|
+
(`em-{date}-{slug}`), set `file:` from the edit's target, and set `validation_type` explicitly (default
|
|
134
|
+
**mechanical** when `predicted_measurable_by` names a grep/count/git check; **judged** when it names a
|
|
135
|
+
reviewer observation). A missing `validation_type` is not a silent default — name it, or the entry
|
|
136
|
+
can't be scored consistently across passes.
|
|
112
137
|
|
|
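To make the Step V1 pattern and the reconciliation rule concrete, here is a hedged sketch. Python's `re` module stands in for `grep -E`; the two are assumed to agree on the simple ASCII cases shown. `is_open` and `reconcile` are hypothetical helpers, and `date` and `slug` are passed in because the diff does not say where they come from. The `file:` backfill is left out, since its source (the edit's target) is not available to the sketch.

```python
import re

# Same pattern as the Step V1 grep command.
PENDING_RE = re.compile(r'validation_status: *"?(predicted|pending)\b')

def is_open(line: str) -> bool:
    """True when a manifest line would be picked up by the Step V1 pattern."""
    return PENDING_RE.search(line) is not None

def reconcile(entry: dict, date: str, slug: str, validation_type: str) -> dict:
    """Return a canonical copy of a legacy entry; the input dict is left untouched."""
    fixed = dict(entry)
    status = fixed.get("validation_status") or ""
    if status.startswith("predicted"):
        # Move the freeform prose out of the status field.
        prior = fixed.get("predicted_measurable_by")
        fixed["predicted_measurable_by"] = f"{prior}; {status}" if prior else status
        fixed["validation_status"] = "pending"
    if not fixed.get("id"):
        fixed["id"] = f"em-{date}-{slug}"
    fixed["validation_type"] = validation_type   # always named explicitly, never defaulted
    return fixed
```

In this sketch the word boundary means a status such as `pending_review` is not matched by the pattern at all, because `_` counts as a word character.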
113
138
|
**Step V2 — Verify Each Entry**
|
|
114
139
|
|
|
115
|
-
|
|
140
|
+
First classify the entry by `validation_type`, then collect evidence accordingly:
|
|
141
|
+
|
|
142
|
+
- **mechanical** — prediction is a count/presence checkable by grep/git. Record `baseline_value` →
|
|
143
|
+
`measured_value` → `delta`. The check IS the evidence (non-vacuous by construction).
|
|
144
|
+
- **judged** — prediction needs reviewer judgment. Requires **one concrete cited observation**
|
|
145
|
+
(file:line / a quoted signal), never a bare "seems better". No citation → stays `pending`, not verified.
|
|
146
|
+
**Non-Model Ground (the citation must be tool-confirmed, not asserted)**: the cited file:line MUST be
|
|
147
|
+
confirmed by an actual Grep/Read **in this verify pass** and the tool output (the matched line) pasted
|
|
148
|
+
into `verification_note` — a citation-shaped string asserted from memory is NOT evidence. An
|
|
149
|
+
unverifiable / un-pasted citation caps `match_score` at **0.5** (never 1.0). This is the same anchor
|
|
150
|
+
discipline as phantom-quench: a verdict rests on a surfaced span, not a claim that one exists.
|
|
151
|
+
- **untestable** — no observable evidence source. Mark `untestable`, do not score.
|
|
152
|
+
|
|
153
|
+
For each entry, check the evidence source specified in `predicted_measurable_by`:
|
|
116
154
|
|
|
117
155
|
| Evidence Source | Check Method |
|
|
118
156
|
|---|---|
|
|
@@ -121,6 +159,9 @@ For each pending entry, check the evidence source specified in `predicted_measur
|
|
|
121
159
|
| User friction signals | Grep `tracks/_meta/fh_signal_*.md` for related friction |
|
|
122
160
|
| Git commit frequency | `git log --oneline --since={date} -- {file}` |
|
|
123
161
|
|
|
162
|
+
Then score `match_score`: **1.0** = evidence clearly confirms the prediction · **0.5** = partial/ambiguous
|
|
163
|
+
· **0.0** = contradicted or no-occurrence. Record the score + the cited evidence in `verification_note`.
|
|
164
|
+
|
|
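The scoring rules of Step V2 can be sketched as one function. This is an illustrative reading, not the package's implementation: `score_entry`, its parameters, and the `(outcome, score)` return shape are assumptions. `citation` stands for the cited file:line or quoted signal, and `tool_output` for the matched line pasted from a Grep/Read run in the same verify pass.

```python
from typing import Optional, Tuple

ALLOWED_SCORES = (0.0, 0.5, 1.0)

def score_entry(validation_type: str, raw_score: float,
                citation: Optional[str] = None,
                tool_output: Optional[str] = None) -> Tuple[str, Optional[float]]:
    """Return (outcome, match_score) following the Step V2 rules."""
    if validation_type == "untestable":
        return ("untestable", None)              # marked, never scored
    if raw_score not in ALLOWED_SCORES:
        raise ValueError(f"match_score must be one of {ALLOWED_SCORES}")
    if validation_type == "judged":
        if not citation:
            return ("pending", None)             # no cited observation: stays pending
        if not tool_output:
            return ("scored", min(raw_score, 0.5))   # citation not tool-confirmed: cap at 0.5
    return ("scored", raw_score)
```

Mechanical entries pass through uncapped here, on the reading that the check itself is the evidence.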
124
165
|
> **Circularity guard**: edit-manifest is invoked *by* harvest-loop (Step 0-c). To avoid a
|
|
125
166
|
> circular evidence loop, edit-manifest must NOT use harvest-loop's own synthesis outputs
|
|
126
167
|
> (proposal lists, curator decisions) as verification evidence. Evidence sources are limited
|
|
@@ -129,12 +170,13 @@ For each pending entry, check the evidence source specified in `predicted_measur
|
|
|
129
170
|
|
|
130
171
|
**Step V3 — Apply Validation Gate**
|
|
131
172
|
|
|
132
|
-
|
|
|
173
|
+
| `match_score` | status → Gate Decision | Next Action |
|
|
133
174
|
|---|---|---|
|
|
134
|
-
|
|
|
135
|
-
|
|
|
136
|
-
|
|
|
137
|
-
|
|
|
175
|
+
| ≥ 0.75 | `verified` → `accepted` | No action needed |
|
|
176
|
+
| 0.25–0.75 | `verified`(partial) → `redefine` | Sharpen the prediction/edit; note what partially held |
|
|
177
|
+
| ≤ 0.25 | `falsified` → `rejected` | Add to rejected-edits buffer; propose revert if regression |
|
|
178
|
+
| no evidence / window not matured | keep `pending` | Re-check next session |
|
|
179
|
+
| no evidence source | `untestable` | Flag for human judgment |
|
|
138
180
|
|
|
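The gate table above maps directly onto a small function. The sketch below is hypothetical (`apply_gate` and `has_evidence_source` are assumed names). The table's bands share the endpoints 0.75 and 0.25; this sketch resolves those endpoints to the outer bands, which is an assumption. Step V2 only produces 1.0, 0.5 or 0.0, so the shared endpoints should not arise in practice.

```python
from typing import Optional, Tuple

def apply_gate(match_score: Optional[float],
               has_evidence_source: bool = True) -> Tuple[str, Optional[str]]:
    """Map a match_score to (validation_status, gate_decision) per the Step V3 table."""
    if not has_evidence_source:
        return ("untestable", None)       # flag for human judgment
    if match_score is None:
        return ("pending", None)          # no evidence yet, or window not matured
    if match_score >= 0.75:
        return ("verified", "accepted")
    if match_score <= 0.25:
        return ("falsified", "rejected")  # goes to the rejected-edits buffer
    return ("verified", "redefine")       # partial: sharpen the prediction or edit
```

Keeping `match_score` as the only numeric input mirrors the note that the baseline/measured/delta fields are an audit detail, not a second gate signal.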
139
181
|
**Step V4 — Rejected-Edits Buffer Report**
|
|
140
182
|
|
|
@@ -195,13 +237,19 @@ RECORD mode:
|
|
|
195
237
|
+ Untestable flag applied if vague prediction
|
|
196
238
|
|
|
197
239
|
VERIFY mode:
|
|
198
|
-
All pending entries checked
|
|
199
|
-
+
|
|
240
|
+
All pending entries checked (canonical + legacy freeform reconciled to pending)
|
|
241
|
+
+ validation_type classified (mechanical / judged / untestable)
|
|
242
|
+
+ match_score recorded with cited evidence (mechanical: delta; judged: one cited observation)
|
|
243
|
+
+ Gate decisions applied (accepted / redefine / rejected / pending)
|
|
200
244
|
+ Rejected-edits buffer reported
|
|
201
245
|
+ Manifest file updated via Edit
|
|
202
246
|
+ Human gate presented for any proposed reverts
|
|
203
247
|
```
|
|
204
248
|
|
|
249
|
+
**Check class** (per `harness_6axis_framework.md §Axis 5`): the verify pass itself is *measured* for
|
|
250
|
+
mechanical entries (delta is a number) and *judged* for judged entries — the judged path is kept
|
|
251
|
+
non-vacuous by the **mandatory cited observation** (no citation → stays pending, never auto-verified).
|
|
252
|
+
|
|
205
253
|
## References
|
|
206
254
|
|
|
207
255
|
- Theoretical basis: AHE (arXiv:2604.25850) §4 change manifest + prediction falsifiability
|
|
@@ -1,144 +0,0 @@
|
|
|
1
|
-
---
|
|
2
|
-
name: context-bridge-dispatch-detail
|
|
3
|
-
description: Archived original body of the deprecated context-bridge-dispatch skill
|
|
4
|
-
load: on-demand
|
|
5
|
-
---
|
|
6
|
-
|
|
7
|
-
## §Archive
|
|
8
|
-
|
|
9
|
-
# context-bridge-dispatch — Parallel Agent Context Bridge (archived)
|
|
10
|
-
|
|
11
|
-
In agent dispatch, sub-agents can read files but do not have access to the live conversation context of the main session. This skill generates a session context card before dispatch and injects it into each agent prompt.
|
|
12
|
-
|
|
13
|
-
## Triggers
|
|
14
|
-
|
|
15
|
-
| Phrase pattern | Situation |
|
|
16
|
-
|---|---|
|
|
17
|
-
| "do it in parallel" / "agent view" + 2+ tasks | Auto-triggered |
|
|
18
|
-
| "create a context bridge" | Explicit call |
|
|
19
|
-
| `/context-bridge-dispatch` | Explicit call |
|
|
20
|
-
| Immediately before dispatching 2+ agents | Auto-injected |
|
|
21
|
-
|
|
22
|
-
## Context Card Format
|
|
23
|
-
|
|
24
|
-
**N≤2 (standard)**:
|
|
25
|
-
```
|
|
26
|
-
[Session Context Card]
|
|
27
|
-
Purpose: {the goal of this session / task}
|
|
28
|
-
Completed: {what has already been decided or implemented — risk of duplication if agent doesn't know}
|
|
29
|
-
This agent's task: {the specific task for this agent}
|
|
30
|
-
Note: {constraints, directions, or history the agent must know before acting}
|
|
31
|
-
```
|
|
32
|
-
|
|
33
|
-
**N≥3 (Registry mode — DACS-inspired)**:
|
|
34
|
-
```
|
|
35
|
-
[Session Context Card]
|
|
36
|
-
Purpose: {session goal}
|
|
37
|
-
Completed: {done items + file paths}
|
|
38
|
-
This agent's task: {specific task}
|
|
39
|
-
Note: {constraints}
|
|
40
|
-
|
|
41
|
-
[Agent Registry]
|
|
42
|
-
Agent-1 ({role}): {≤1 sentence — what it's doing, key files}
|
|
43
|
-
Agent-2 ({role}): {≤1 sentence}
|
|
44
|
-
... (all agents except this one)
|
|
45
|
-
```
|
|
46
|
-
|
|
47
|
-
Registry entries keep other agents visible (≤200 tokens total) without flooding context.
|
|
48
|
-
Each agent gets its own full card + compressed view of the parallel picture.
|
|
49
|
-
|
|
50
|
-
## Step 1. Extract Session Context
|
|
51
|
-
|
|
52
|
-
Summarize the 3 key items from the current conversation:
|
|
53
|
-
- **Purpose**: Core goal of this session / request
|
|
54
|
-
- **Completed**: What has already been built or decided (include file paths and commits)
|
|
55
|
-
- **Note**: Constraints that could lead an agent in the wrong direction if unknown
|
|
56
|
-
|
|
57
|
-
## Step 2. Identify Agent List + Generate Individual Cards
|
|
58
|
-
|
|
59
|
-
For each of the N agents to dispatch:
|
|
60
|
-
- Common Context Card (Step 1 summary)
|
|
61
|
-
- Agent-specific item (`This agent's task` field customized per agent)
|
|
62
|
-
|
|
63
|
-
**N≥3 — Registry mode**: additionally generate one Registry entry per agent:
|
|
64
|
-
```
|
|
65
|
-
Agent-X ({role}): {what it's doing in ≤1 sentence} | files: {key paths}
|
|
66
|
-
```
|
|
67
|
-
Each agent's card includes the Registry entries for all *other* agents (omit its own).
|
|
68
|
-
Keep total Registry section ≤200 tokens. If an agent's task is simple (read-only lookup), its registry entry can be a single phrase.
|
|
69
|
-
|
|
70
|
-
## Step 3. Execute Parallel Dispatch
|
|
71
|
-
|
|
72
|
-
Prepend the Context Card to each agent's prompt and dispatch as a single message.
|
|
73
|
-
|
|
74
|
-
```
|
|
75
|
-
[Session Context Card]
|
|
76
|
-
...
|
|
77
|
-
|
|
78
|
-
{Agent's original task instruction}
|
|
79
|
-
```
|
|
80
|
-
|
|
81
|
-
## Focus Mode (on-demand, N≥3)
|
|
82
|
-
|
|
83
|
-
When an agent's result is incomplete and it signals it needs another agent's full output:
|
|
84
|
-
1. Orchestrator identifies the target agent (a_i) whose full context is needed
|
|
85
|
-
2. Re-dispatch the requesting agent with: full Context Card of a_i + Registry-compressed entries for all others
|
|
86
|
-
3. Use only when genuinely needed — adds one round-trip latency
|
|
87
|
-
|
|
88
|
-
Trigger signal from agent: `"Need full context from Agent-X to proceed"` or equivalent explicit statement.
|
|
89
|
-
|
|
90
|
-
## Coordination-Overhead Budget
|
|
91
|
-
|
|
92
|
-
Centralized multi-agent coordination is not free: external reporting cites orchestrator-worker coordination adding ~+285% token overhead (see the digest Provenance), and coordination cost dominates once a wave exceeds ~4 agents. Apply the following before each dispatch wave.
|
|
93
|
-
|
|
94
|
-
| Rule | Constraint |
|
|
95
|
-
|---|---|
|
|
96
|
-
| **Parallel fan-out cap** | 3–4 agents per dispatch wave. This is the upper bound for the 2+ parallel dispatch in the Simplification Guard — do not flat-fan-out past 4. |
|
|
97
|
-
| **Capability-aware routing** | Route each subtask to the agent whose declared capability fits, reading `.claude/registry/agent_cards.json` as the routing source (`role` + `allowed_tools` + `writes`). Do not dispatch a `writes: false` audit agent (e.g. `fact-checker`, `hub-persona-auditor`) for a task needing edits. |
|
|
98
|
-
| **Escalation** | If a task genuinely needs >4 parallel agents, decompose hierarchically (supervisor → sub-waves) rather than flat fan-out. |
|
|
99
|
-
|
|
100
|
-
Source: `../../../../knowledge/shared/harness-core/harness_frontier_diagnosis_2026-06-02.md`
|
|
101
|
-
|
|
102
|
-
## Step 4. Aggregate Results
|
|
103
|
-
|
|
104
|
-
After all agents complete, consolidate results in the main session and report to the user.
|
|
105
|
-
|
|
106
|
-
## Simplification Guard
|
|
107
|
-
|
|
108
|
-
- Simple file lookup agents unrelated to context (e.g., "read file A") → card may be omitted
|
|
109
|
-
- Single agent dispatch → card injection optional
|
|
110
|
-
- 2+ parallel dispatch → card injection required
|
|
111
|
-
|
|
112
|
-
## Why This Is Necessary
|
|
113
|
-
|
|
114
|
-
Agents are spawned in an isolated environment (sub-agent sandbox). They can read what is recorded in files, but decisions made during the current main session conversation — direction changes, completed implementations, design intent — do not exist for the agent unless saved to a file.
|
|
115
|
-
|
|
116
|
-
Problems this disconnection causes:
|
|
117
|
-
- Attempting to redo already completed work
|
|
118
|
-
- Working in the old direction without knowing the current session's direction change
|
|
119
|
-
- Making wrong decisions without knowing the constraints
|
|
120
|
-
|
|
121
|
-
Context Bridge corrects this asymmetry.
|
|
122
|
-
|
|
123
|
-
## Done When
|
|
124
|
-
|
|
125
|
-
```
|
|
126
|
-
All steps 1–4 completed
|
|
127
|
-
+ Context Card injected at the front of each agent prompt
|
|
128
|
-
+ Results aggregated and reported after all agents complete
|
|
129
|
-
```
|
|
130
|
-
|
|
131
|
-
## Connected Skills
|
|
132
|
-
|
|
133
|
-
| Situation | Connection |
|
|
134
|
-
|---|---|
|
|
135
|
-
| Context collapse risk after a long session | `/context-doctor` |
|
|
136
|
-
| Task of promoting field patterns to FH | `/field-harvest` |
|
|
137
|
-
| When agent orchestration itself is complex | `agent-composer` |
|
|
138
|
-
| N≥3 agents / long-running orchestration (context drifts post-dispatch) | See sister asset: DACS (arXiv:2604.07911) — Registry+Focus dynamic isolation |
|
|
139
|
-
|
|
140
|
-
## Design Basis
|
|
141
|
-
|
|
142
|
-
Registry mode and Focus mode patterns absorbed from **DACS** (arXiv:2604.07911, Nickson Patel, 2026-04-09).
|
|
143
|
-
DACS validated: steering accuracy 98.4% vs 21% baseline at N=10; context efficiency 3.53×.
|
|
144
|
-
Cross-audit + import/propagate analysis: `tracks/_audit/session_2026-06-02_dacs-sister.md`
|