@chrono-meta/fh-gate 1.0.3 → 1.2.0
- package/.claude/agents/challenger.md +169 -0
- package/AGENTS.md +160 -0
- package/CATALOG.md +256 -0
- package/CHEATSHEET.md +367 -0
- package/CLAUDE.md +331 -0
- package/CONTRIBUTING.md +198 -0
- package/LICENSE +21 -0
- package/README.md +131 -418
- package/bin/fh-goal.js +9 -0
- package/bin/fh-run.js +9 -0
- package/docs/banner.png +0 -0
- package/docs/codex-compat.md +123 -0
- package/docs/pillars.svg +70 -0
- package/knowledge/shared/harness-core/fh_integration_contract.md +48 -29
- package/package.json +31 -6
- package/plugins/fh-commons/README.md +37 -0
- package/plugins/fh-commons/agents/quench-challenger.md +373 -0
- package/plugins/fh-commons/skills/convergence-loop/SKILL.md +155 -0
- package/plugins/fh-commons/skills/deliberation/SKILL.md +288 -0
- package/plugins/fh-commons/skills/mcp-circuit-breaker/SKILL.md +196 -0
- package/plugins/fh-commons/skills/token-budget-gate/SKILL.md +175 -0
- package/plugins/fh-meta/agents/fact-checker.md +121 -0
- package/plugins/fh-meta/agents/hub-persona-auditor.md +109 -0
- package/plugins/fh-meta/agents/persona-innovator.md +195 -0
- package/plugins/fh-meta/skills/agent-composer/SKILL.md +461 -0
- package/plugins/fh-meta/skills/agent-composer/SKILL_detail.md +464 -0
- package/plugins/fh-meta/skills/apex-review/SKILL.md +185 -0
- package/plugins/fh-meta/skills/asset-placement-gate/SKILL.md +135 -0
- package/plugins/fh-meta/skills/contention-layer/SKILL.md +127 -0
- package/plugins/fh-meta/skills/context-bridge-dispatch/SKILL.md +30 -0
- package/plugins/fh-meta/skills/context-bridge-dispatch/SKILL_detail.md +144 -0
- package/plugins/fh-meta/skills/context-doctor/SKILL.md +341 -0
- package/plugins/fh-meta/skills/cross-ecosystem-synergy-detection/SKILL.md +202 -0
- package/plugins/fh-meta/skills/deep-clarify/SKILL.md +144 -0
- package/plugins/fh-meta/skills/edit-manifest/SKILL.md +210 -0
- package/plugins/fh-meta/skills/field-harvest/SKILL.md +384 -0
- package/plugins/fh-meta/skills/frontier-digest/SKILL.md +272 -0
- package/plugins/fh-meta/skills/goal-quench/SKILL.md +509 -0
- package/plugins/fh-meta/skills/harness-doctor/SKILL.md +277 -0
- package/plugins/fh-meta/skills/harness-doctor/SKILL_detail.md +484 -0
- package/plugins/fh-meta/skills/harvest-loop/SKILL.md +231 -0
- package/plugins/fh-meta/skills/harvest-loop/SKILL_detail.md +201 -0
- package/plugins/fh-meta/skills/hub-cc-pr-reviewer/SKILL.md +129 -0
- package/plugins/fh-meta/skills/hub-cc-pr-reviewer/SKILL_detail.md +158 -0
- package/plugins/fh-meta/skills/install-doctor/SKILL.md +207 -0
- package/plugins/fh-meta/skills/install-wizard/SKILL.md +613 -0
- package/plugins/fh-meta/skills/marketplace-gate/SKILL.md +193 -0
- package/plugins/fh-meta/skills/memory-hygiene/SKILL.md +143 -0
- package/plugins/fh-meta/skills/meta-prompt-builder/SKILL.md +167 -0
- package/plugins/fh-meta/skills/meta-prompt-builder/SKILL_detail.md +37 -0
- package/plugins/fh-meta/skills/pipeline-conductor/SKILL.md +430 -0
- package/plugins/fh-meta/skills/plugin-recommender/SKILL.md +221 -0
- package/plugins/fh-meta/skills/plugin-recommender/SKILL_detail.md +220 -0
- package/plugins/fh-meta/skills/prompt-regression/SKILL.md +178 -0
- package/plugins/fh-meta/skills/public-surface-audit/SKILL.md +224 -0
- package/plugins/fh-meta/skills/return-path-gate/SKILL.md +257 -0
- package/plugins/fh-meta/skills/self-marketing-lint/SKILL.md +129 -0
- package/plugins/fh-meta/skills/sim-conductor/SKILL.md +364 -0
- package/plugins/fh-meta/skills/sim-conductor/SKILL_detail.md +337 -0
- package/plugins/fh-meta/skills/skill-splitter/SKILL.md +126 -0
- package/plugins/fh-meta/skills/skill-splitter/SKILL_detail.md +185 -0
- package/plugins/fh-meta/skills/source-grounding-audit/SKILL.md +230 -0
- package/plugins/fh-meta/skills/source-grounding-audit/SKILL_detail.md +182 -0
- package/plugins/fh-meta/skills/steel-quench/SKILL.md +226 -0
- package/plugins/fh-meta/skills/steel-quench/SKILL_detail.md +453 -0
- package/plugins/fh-meta/skills/verify-bidirectional/SKILL.md +238 -0
- package/scripts/fh-gate.sh +175 -40
- package/scripts/fh-goal.sh +182 -0
- package/scripts/fh-run.sh +269 -0
@@ -0,0 +1,231 @@
---
name: harvest-loop
description: A self-evolution pipeline that runs automatically after field sessions end. field-harvest (pattern extraction) → contention-layer (collision signals) → [Agent(subagent_type="challenger") + persona-innovator parallel] → synthesizer (challenger/innovator collision harvest) → Critic isolated Agent (SAGE automated critique) → harness-doctor (health check) → verify-bidirectional (consistency validation) → curator (skill lifecycle management) — 8 steps. Session learnings are automatically absorbed back into the FH ecosystem so the harness evolves on its own. In the main development environment, runs automatically at session end. For external FH users, proposes execution first. Triggered by "session harvest", "learning absorption", "fh evolution", or "harvest-loop". (The phrase "run the pipeline" is ceded to pipeline-conductor to avoid a trigger collision — for end-to-end verification sweeps use pipeline-conductor.)
user-invocable: true
allowed-tools: ["Read", "Write", "Bash", "Grep", "Glob", "Agent"]
model: opus
---

# harvest-loop — Field Session → FH Self-Evolution Pipeline

> Automatically absorbs patterns/conflicts/discoveries from field sessions back into the FH ecosystem.
> Turns the return loop from field projects to the harness, previously done manually, into a pipeline.
> One of the core functions is real-time detection and blocking of **Semantic Drift**, where agent terminology gradually diverges in meaning as sessions grow longer.

## Operation Modes

| Mode | Description | Trigger |
|---|---|---|
| **Forced mode** | Auto-runs at end of local development session. Executes without approval, only confirms final suggestions | Session wrap-up rules in hub CLAUDE.md |
| **Lightweight mode** | Immediate harvest after Wave completion. Skip Steps 3/3.5/4 — prioritize fast recording | agent-composer Step 4-c (2+ new files or 3+ existing files changed, **or M-tier resolved**) |
| **Proposal mode** | External FH users — confirms "run harvest-loop?" before executing | User utterance or `/harvest-loop` |

**Simplification guard**: Sessions that only browsed/explored (no code changes or outputs) auto-skip even in forced mode.

**Lightweight mode Done When**:
```
Step 0 (Regression Guard) + Step 1 (field-harvest) + Step 2 (contention-layer) + Step 5 (verify-bidirectional) complete
+ harvested pattern summary 1~3 lines output
+ "run full harvest-loop?" proposed (if patterns found)
+ [Card update prohibited] Do NOT update reference_next_session_starter.md in lightweight mode alone
```

**Early Trigger** (mid-session): Same pattern 3+ times · same skill fails 2+ consecutive times · session 2+ hours elapsed → "Early harvest condition detected. Run mid-session harvest?" If Y → field-harvest → contention-layer → verify-bidirectional only.
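
The Early Trigger thresholds can be read as a single predicate. The sketch below assumes the three conditions are alternatives (any one is enough) and that the caller already tracks the three counters; the function name `early_harvest_due` and its argument order are hypothetical, not part of the package.

```bash
# Sketch of the Early Trigger check (hypothetical helper, not shipped with FH).
# Thresholds are taken from the text above; treating them as OR is an assumption.
# Args: <max repeats of one pattern> <consecutive failures of one skill> <session minutes>
early_harvest_due() {
  local repeats=$1 failures=$2 minutes=$3
  [ "$repeats" -ge 3 ] || [ "$failures" -ge 2 ] || [ "$minutes" -ge 120 ]
}
```

A caller could then gate the prompt with `if early_harvest_due 1 2 45; then echo "Early harvest condition detected. Run mid-session harvest?"; fi`.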

---

## Pipeline Structure

```
Session end
    │
[Step 0-a] FH asset change detection → auto-quench
    │  git diff --name-only HEAD | grep -E "SKILL\.md|\.claude/rules/|templates/|CLAUDE\.md"
    │  → 1+ FH assets changed: run full 3-axis gate
    │  → No changes: proceed to Step 0-b immediately
    │
[Step 0-b] Card cross-check — reconstruct completed items (no memory dependency)
    │  Read reference_next_session_starter.md + fh_completed_{today}.md + git log
    │  → Generate removal candidate list from 3-source cross-check
    │
[Step 0-c] Edit Manifest Verification + Memory Hygiene
    │  edit-manifest VERIFY: check pending predictions in edit_manifest.yaml
    │  memory-hygiene scan: staleness check on memory/*.md entries (skip if < 7 days)
    │
[Step 0] Regression Guard
    │  Check: does anything from this session conflict with or regress a validated skill?
    │  → Regression detected: flag, route to contention-layer
    │  → No regression: proceed
    │
[Step 1] field-harvest
    │  Scan field git diff / outputs → extract patterns (proceed if 3+, skip if fewer)
    │
[Step 2] contention-layer
    │  Compare patterns ↔ existing FH skills → collision = new skill candidate signal
    │
[Step 3a] challenger (Agent)        [Step 3b] persona-innovator   ← parallel
    │  Attack existing skills           Propose new skill candidates
    │
[Step 3.5] synthesizer
    │  Cross-synthesize attack ↔ proposal → readjust grades (HIGH/MED/LOW)
    │
[Step 3.75] Critic (isolated Agent — SAGE pattern)
    │  Independent critique of synthesizer proposals → PASS / CONDITIONAL PASS / FAIL
    │
[Step 4] harness-doctor
    │  Health check when adding candidates (Done When exists? ≥70% overlap?)
    │
[Step 5] verify-bidirectional
    │  Bidirectional consistency check on candidate skill
    │
Output final proposal list → Y: PR creation / N: persist to tracks/_meta/fh_signal
    │
[Step 6] Curator lifecycle review (auto-run after Y)
    │  SKILL.md STALE/merge candidates + Memory self-correction
```
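
The Step 0-a branch can be tried without a repository by feeding file names to the filter directly. The sketch below wraps the diagram's `grep -E` pattern in two functions; the names `fh_changed_assets` and `fh_step0a_decision` and the two result strings are illustrative assumptions, not identifiers from the package.

```bash
# Sketch of the Step 0-a filter. The regex is the one shown in the diagram;
# function names and result strings are hypothetical.

# Reads changed paths on stdin and prints only those that count as FH assets.
fh_changed_assets() {
  grep -E 'SKILL\.md|\.claude/rules/|templates/|CLAUDE\.md' || true
}

# Reads changed paths on stdin and prints which Step 0-a branch applies.
fh_step0a_decision() {
  local hits
  hits=$(fh_changed_assets | wc -l | tr -d ' ')
  if [ "$hits" -ge 1 ]; then
    echo "run-3-axis-gate"
  else
    echo "proceed-to-0b"
  fi
}

# In a real session the input would come from: git diff --name-only HEAD
```

One thing this makes visible: as written, the pattern matches `SKILL.md` but not `SKILL_detail.md`, so a session that only edits detail files would take the "no changes" branch.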

---

## Execution Instructions

### Step 1 — field-harvest
`/field-harvest --since 1d` — Fewer than 3 patterns → auto-skip + output "no session harvest targets".

### Step 2 — contention-layer
`/contention-layer [field-harvest output patterns]`

| Collision type | Routing |
|---|---|
| Overlaps with existing skill | Existing skill enhancement candidate |
| New area not covered | New skill candidate |
| Two skills conflict | Mediation skill candidate |

### Step 3 — Parallel challenger + innovator
**3a challenger**: "Does this discovery overturn existing skill X?" / "Doesn't existing skill already handle this?" / "Does adding this simplify or complicate FH?"
**3b innovator**: Field pattern → abstraction → naming candidates + Done When draft required.

### Step 3.5 — synthesizer

| devil attack | innovator proposal | synthesizer verdict |
|:---:|:---:|---|
| S-tier attack | Proposal for that area | **HIGH** — immediate reflection candidate |
| S-tier attack | No proposal | **HIGH** — fix existing skill weakness immediately |
| No attack | Proposal exists | **MED** — re-review in next wave |
| Attack overturns proposal | — | Proposal **rejected** — persist as fh_signal on hold |
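
The judgment matrix above is small enough to express as a lookup. This is a sketch only: the function name `synth_verdict`, the argument vocabulary (`s-tier`, `overturns`, `none`, `yes`, `no`), and the `unclassified` result for combinations the table does not list are all assumptions made for illustration.

```bash
# Sketch of the Step 3.5 judgment matrix (hypothetical helper).
# Args: <attack> <proposal>
#   attack:   s-tier | overturns | none
#   proposal: yes | no
synth_verdict() {
  local attack=$1 proposal=$2
  case "$attack:$proposal" in
    overturns:*) echo "rejected" ;;      # attack overturns the proposal
    s-tier:*)    echo "HIGH" ;;          # S-tier attack, with or without a proposal
    none:yes)    echo "MED" ;;           # proposal with no attack
    *)           echo "unclassified" ;;  # not covered by the table
  esac
}
```

Returning `unclassified` instead of guessing keeps rows the table does not define (for example a non-S-tier attack) visible to whoever reads the output.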

**Fallback** (deep-insight not installed): Inline synthesis. Apply same judgment matrix. If quality low → Step 3.75 Critic processes as CONDITIONAL PASS.

**Step 3.5-X** (optional): Cross-session 2nd validation when 2+ HIGH-grade items exist. External CLI (gemini/codex) or cross-session Claude. Items flagged as over-promoted → downgrade HIGH → MED.

> **Detail**: See `SKILL_detail.md §Step3-5X` — bash execution scripts for external CLI and cross-session Claude fallback — read when running Step 3.5-X validation.

### Step 3.75 — Critic (Isolated Agent)

> Source: SAGE (arXiv 2603.15255). Isolation = Critic does not inherit synthesizer reasoning chain → resolves Cost of Consensus.

Critic evaluation: Done When logic validation · failure mode exploration (2+ edge cases) · claim vs. implementation alignment · scope appropriateness (Too Narrow / Too Broad).

FAIL routing: First FAIL → 1 re-synthesis allowed. FAIL after re-synthesis → auto-persist as `fh_signal` on hold. Maximum retries: **1**.

> **Detail**: See `SKILL_detail.md §Step3-75` — Critic isolated Agent() call format, evaluation items table, FAIL routing, Post-Core-Skill Critic connection — read when executing Step 3.75.

### Step 4 — harness-doctor
`/harness-doctor --scope new-candidates` — Check: Done When exists · ≥70% overlap with existing skills · self-reference structure.

### Step 5 — verify-bidirectional
`/verify-bidirectional [new skill draft]` — If A references B, does B back-reference A?

### Step 6 — Curator Lifecycle Review

**6-1 SKILL.md Lifecycle**: 30+ day unused → [STALE] candidate. `pinned: true` → never touch. ≥70% overlap → merge candidate suggestion. **> 300 lines AND no `SKILL_detail.md`** → propose `/skill-splitter` (governance-semantic split — not compression; the grew-through-harvest pattern is a natural split trigger).

**6-1-a Archive-candidate auto-tag**: When 0 invocations in 30 days detected (cross-check `tracks/_meta/skill_usage.md`), auto-append `#archive-candidate` tag to that skill's CATALOG.md entry. No file deletion — tag only. User reviews tagged entries at next session start.

**6-2 Memory Self-Correction**: INDEX-ORPHAN (in MEMORY.md but file missing → auto-remove) · FILE-ORPHAN (file exists, not indexed → confirm with user) · MEM-STALE (30+ day unmodified → confirm with user).

**Memory curator safety**: Only INDEX-ORPHAN removal is auto-allowed. Actual file deletion absolutely prohibited without explicit approval. `type: reference` items with 🔑 keywords excluded from STALE detection.

**6-a Skill Usage Leaderboard**: Record skills called this session in `tracks/_meta/skill_usage.md`. Flag 4+ weeks no-call → deprecation candidate.

**6-b Harness Evolution Cadence** (4-week cycle): Scan skills with `complexity_routing`. Aggregate escalation records from `fh_signal_*.md`. Valid conditions = keep; never activated in 4 weeks = removal candidate; pattern in fh_signal = addition candidate. No auto-modification — output candidates then require user approval.
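
The 6-1 split trigger (more than 300 lines and no `SKILL_detail.md`) is easy to check mechanically. The sketch below is one way to do it under stated assumptions: the function name `propose_split` and the output wording are hypothetical, and it only prints a proposal, consistent with the rule that the curator suggests and never edits on its own.

```bash
# Sketch of the 6-1 split trigger (hypothetical helper, proposal only).
# Args: <skill directory>
propose_split() {
  local dir=$1 lines
  # Nothing to check if the directory has no SKILL.md.
  [ -f "$dir/SKILL.md" ] || return 0
  lines=$(wc -l < "$dir/SKILL.md" | tr -d ' ')
  # Strictly more than 300 lines AND no detail file next to it.
  if [ "$lines" -gt 300 ] && [ ! -f "$dir/SKILL_detail.md" ]; then
    echo "[SPLIT candidate] $dir/SKILL.md ($lines lines): propose /skill-splitter"
  fi
}
```

It could be run over every skill with `for d in plugins/*/skills/*/; do propose_split "${d%/}"; done`, assuming the directory layout shown in the file list.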

> **Detail**: See `SKILL_detail.md §Step6-Detail` — bash scripts for STALE detection, memory scan, skill usage leaderboard, evolution cadence aggregation — read when executing Step 6.

---

## Observability Hook (glass-box self-improvement)

Every evolution decision must leave a 3-part trace in `tracks/_meta/edit_manifest.yaml`:
- **(a) what changed** — file + diff summary
- **(b) predicted effect** — `predicted_impact` + `predicted_measurable_by`
- **(c) verify checkpoint** — `validation_status` flipped at next Step 0-c VERIFY

A proposal accepted without a recorded prediction is a black-box edit — flag, do not silently apply.

> **Detail**: See `SKILL_detail.md §Observability` — full observability hook spec and trace format.

---

## Output Format

```
## harvest-loop Execution Results

Session: [date] [project name]
field-harvest: [N patterns extracted]
contention-layer: [N collision signals]
synthesizer: [HIGH N / MED N / rejected N]

### Final Proposals (sorted by synthesizer grade)
| # | Type | Target | Grade | devil | innovator | synthesizer verdict |
|:---:|---|---|:---:|---|---|---|

→ Y: Create PR / draft skill file
→ N: Persist to tracks/_meta/fh_signal_YYYY_MM_DD_{slug}.md

### [Required final step] Session card update (proof gate)
Read reference_next_session_starter.md → apply Step 0-b removal list → add new priorities
→ output "BEFORE N items → AFTER M items (removed: [list])" — required
→ No diff (N=M) = warning + Step 0-b re-check obligation

**Natural-language close (4th source)**: Even without git log match, items with these patterns stated in session → treated as closed, remove immediately:
- "not possible / confirmed impossible" · "no response + N weeks elapsed" → abandoned
- "mutual citation confirmed" · "merged" · "cancelled" · "no longer needed"
- User says "stop monitoring" · "close this" · "remove it"
```
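
The proof gate at the end of the output format boils down to printing one line and failing loudly when nothing changed. A minimal sketch, assuming item counts are already known; the function name `card_proof_line` and the warning text are hypothetical.

```bash
# Sketch of the session-card proof gate (hypothetical helper).
# Args: <items before> <items after> <comma-separated removed items>
card_proof_line() {
  local before=$1 after=$2 removed=$3
  # The required proof line, in the format given in the output template.
  echo "BEFORE $before items → AFTER $after items (removed: [$removed])"
  # N = M means no delta: warn and signal that Step 0-b must be re-checked.
  if [ "$before" -eq "$after" ]; then
    echo "WARNING: no diff (N=M), re-check Step 0-b" >&2
    return 1
  fi
}
```

The non-zero return on N=M lets a calling script stop before it declares the card update complete.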

> **Detail**: See `SKILL_detail.md §Output-Detail` — 2-source mode (when fh_completed absent), exact match criteria, natural-language close edge cases — read when reconstructing session card without fh_completed file.

---

## Linked Skills

| Situation | Linked skill |
|---|---|
| 3+ new skill candidates | `/agent-composer` for dispatch plan |
| Design existing skill enhancement direction | `/meta-prompt-builder` |
| Validate candidates from external user perspective | `fh-meta:hub-persona-auditor` |
| Review before sharing with team | `/apex-review` |
| Self-marketing pattern discovered as HIGH P10 | `/harness-doctor --lint` auto-propose |
| Edit predictions to verify / rejected buffer | `fh-meta:edit-manifest` (Step 0-c) |
| Stale memory entries to re-verify | `fh-meta:memory-hygiene` (Step 0-c) |

---

## Done When

```
All stages Step 0-c → 0 → 1 → 2 → 3 (parallel) → 3.5 → 3.75 → 4 → 5 complete
+ Step 0-c: edit-manifest pending entries verified + memory-hygiene scan run
+ Step 3.75 Critic verdict received (PASS/CONDITIONAL PASS/FAIL stated) before Step 4
+ synthesizer grade readjustment complete (rejected candidates separated)
+ Final proposal list output (sorted by HIGH/MED)
+ User Y/N approval gate complete
+ (If Y) Step 6 Curator complete
  → 6-1: STALE candidate list + merge candidates
  → 6-2: INDEX-ORPHAN/FILE-ORPHAN/MEM-STALE detection results
+ [Required] reference_next_session_starter.md delta update complete
  → BEFORE N items → AFTER M items diff output required (proof gate)
  → No diff (N=M) = warning + Step 0-b re-check
  → Completed items remaining = bug (Done When not met)
```
@@ -0,0 +1,201 @@
---
name: harvest-loop-detail
description: Detail file for harvest-loop — bash scripts for Step 6 curator, observability hook spec, Step 3.5-X bash, Critic Agent call format. Load when executing a specific step.
load: on-demand
---

# harvest-loop — Detail Reference

> Load when executing a specific step. SKILL.md contains operation modes, pipeline structure diagram, execution instructions overview, and Done When.

---

## §Step3-75 — Critic Isolated Agent Call

```
Agent(
  prompt="Independently evaluate the following skill proposals.\n[synthesizer output passed]",
  # synthesizer reasoning chain not inherited — blind evaluation
)
```

Meaning of isolation: The Critic reads the synthesizer conclusion but does not inherit the reasoning chain that reached it (devil attack reflection/grade adjustment process). Independent reasoning path = resolves Cost of Consensus.

**Evaluation items**:

| Item | Judgment criteria |
|---|---|
| Done When logic validation | When condition is met, is the goal actually achieved? Is it measurable? |
| Failure mode exploration | 2+ edge cases where this skill could fail |
| Claim vs. implementation alignment | Does description promise contradict execution guide? |
| Scope appropriateness | Too Narrow / Too Broad verdict |

**FAIL routing** (infinite loop prevention):
- First FAIL → pass Critic findings to synthesizer for **1 re-synthesis** allowed
- FAIL after re-synthesis → auto-persist as `fh_signal` on hold (no additional retries)
- Maximum retries: **1**
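
The FAIL routing above is a two-state retry rule. The sketch below encodes it as a function; the name `critic_route`, the result strings, and the choice to send CONDITIONAL PASS on to Step 4 are assumptions for illustration (the source only says the verdict must be stated before Step 4).

```bash
# Sketch of the Step 3.75 routing rule (hypothetical helper).
# Args: <verdict> <re-syntheses already performed>
critic_route() {
  local verdict=$1 retries=$2
  case "$verdict" in
    PASS|"CONDITIONAL PASS")
      echo "proceed-to-step-4" ;;       # assumption: both continue to Step 4
    FAIL)
      if [ "$retries" -lt 1 ]; then
        echo "re-synthesize"            # first FAIL: one re-synthesis allowed
      else
        echo "hold-as-fh_signal"        # FAIL after re-synthesis: no more retries
      fi ;;
    *)
      echo "unknown-verdict"
      return 1 ;;
  esac
}
```

Because the retry count is an explicit argument, the maximum of one retry is enforced by the caller's counter and cannot loop.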

**Post-Core-Skill Critic Verdict Connection**: The following core skills can call the Critic inline after they complete: harness-doctor · verify-bidirectional · hub-cc-pr-reviewer · context-doctor · sim-conductor. Trigger: immediately after completion announcement + "steel-quench" / "re-validate" / "run Critic" utterance.

---

## §Step3-5X — Cross-Session 2nd Validation Bash

```bash
# Option A: External CLI team (if available — same detection as steel-quench Wave 5)
# Detecting the CLI with `command -v` is an assumption; without the guard,
# Option B would always overwrite Option A's result.
if command -v gemini >/dev/null 2>&1; then
  SYNTH_CHALLENGE=$(printf \
    'You are an adversarial reviewer with zero prior context.\nEvaluate these skill proposals and find flaws in the synthesis logic.\nFlag any HIGH-grade items that are over-promoted.\nFormat: [item · flaw · downgrade-to]\n---\n%s' \
    "${SYNTHESIZER_OUTPUT}" | gemini 2>/dev/null)
else
  # Option B: Cross-session Claude (fallback)
  SYNTH_CHALLENGE=$(claude --print \
    "Adversarial reviewer, zero context. Evaluate these skill proposals.
Flag over-promoted HIGH-grade items. Format: [item · flaw · downgrade-to]
---
${SYNTHESIZER_OUTPUT}" 2>/dev/null)
fi
```

**Outcome**:
- Items flagged as over-promoted → downgrade HIGH → MED, proceed with caution
- Confirmed across both → HIGH-confirmed → pass to Step 3.75 with elevated confidence
- Zero new issues → synthesis confirmed, proceed normally

Token cost: External CLI ~1K–2K tokens (billed to that CLI). Cross-session Claude ~2K–3K. Propose once — user may skip.

---

## §Step6-Detail — Bash Scripts for Curator + Memory + Leaderboard

### 6-1. SKILL.md Lifecycle (bash)

```bash
# Detect skills not modified for 30+ days (git log is used as a proxy for use).
# Run from the repository root so git and find print the same relative paths.
git log --since="30 days ago" --name-only --pretty=format: \
  -- plugins/*/skills/*/SKILL.md | sort -u > /tmp/recently_touched.txt

find plugins -name "SKILL.md" | while IFS= read -r f; do
  grep -qxF "$f" /tmp/recently_touched.txt || echo "[STALE candidate] $f"
done
```

| Status | Judgment criteria | Action |
|---|---|---|
| **STALE** | 30+ day git no-modify + no recent session mention | Confirm with user, then mark `status: stale` in frontmatter |
| **Pin protected** | `pinned: true` in frontmatter | Never touch under any circumstances |
| **Merge candidate** | Two skills with ≥70% functional overlap | Suggest merge draft (no auto-execution) |
| **Normal** | None of the above | Keep |

### 6-2. Memory Self-Correction (bash)

**Theoretical basis — Agent Aging 4 Mechanisms** (arXiv:2605.26302, AgingBench):

| Aging type | Definition | 6-2 defense |
|---|---|---|
| **Compression aging** | Information loss during history compression | harvest-loop Step 0-a/b real-time recording obligation |
| **Interference aging** | New knowledge corrupts existing | FILE-ORPHAN detection |
| **Revision aging** | Stale facts after updates create inconsistency | MEM-STALE detection |
| **Maintenance aging** | Side effects from routine cleanup | Curator safety principles — no auto-delete |

```bash
# MEMORY.md index vs actual files consistency check
# (grep -P requires GNU grep; BSD/macOS grep does not support it)
grep -oP '\[.*?\]\(\K[^)]+' memory/MEMORY.md | while IFS= read -r f; do
  [ -f "memory/$f" ] || echo "[INDEX-ORPHAN] memory/$f — in index but file missing"
done

# Detect orphan files not in index (-F: match the file name literally, not as a regex)
find memory -name "*.md" ! -name "MEMORY.md" | while IFS= read -r f; do
  fname=$(basename "$f")
  grep -qF "$fname" memory/MEMORY.md || echo "[FILE-ORPHAN] $f — file exists but not indexed"
done

# Detect memory files unmodified for 30+ days
git log --since="30 days ago" --name-only --pretty=format: -- memory/ \
  | sort -u > /tmp/recently_touched_mem.txt

find memory -name "*.md" ! -name "MEMORY.md" | while IFS= read -r f; do
  grep -qxF "$f" /tmp/recently_touched_mem.txt || echo "[MEM-STALE candidate] $f"
done
```

| Status | Judgment | Action |
|---|---|---|
| **INDEX-ORPHAN** | In MEMORY.md but file missing | Remove from MEMORY.md immediately (auto-allowed) |
| **FILE-ORPHAN** | File exists but not indexed | Confirm: "add to index or delete?" |
| **MEM-STALE** | 30+ day git no-modify | Confirm: "archive or delete?" |
| **PROJECT type priority** | `type: project` file | Suggest moving to CLOSED section if completed |

Memory curator safety: Only INDEX-ORPHAN removal is auto-allowed. `type: reference` items with 🔑 keywords excluded from STALE detection.

### 6-a. Skill Usage Leaderboard (bash)

```bash
# Check whether skill_usage.md exists
ls tracks/_meta/skill_usage.md 2>/dev/null || echo "MISSING"
```

**If absent**: `cp {FH_DIR}/templates/skill_usage_template.md tracks/_meta/skill_usage.md`
**If present**: Add row at bottom of "Recent session records" table:
```markdown
| {YYYY-MM-DD} | {comma-separated list of skills called this session} |
```

Update `Last used` date for called skills in Leaderboard table.
Flag skills with 4+ weeks no call: status `⚠️ Under observation` → if 28+ days → `❌ Deprecation candidate`.

### 6-b. Harness Evolution Cadence (bash — run when 4+ weeks of data accumulated)

```bash
# 1. Scan skills with complexity_routing
grep -rl "complexity_routing" plugins/*/skills/*/SKILL.md

# 2. Aggregate escalation records from fh_signal files
#    (counts how often each skill name is mentioned)
grep -ohE "(harness-doctor|verify-bidirectional|hub-cc-pr-reviewer|context-doctor|sim-conductor|agent-composer|harvest-loop|steel-quench)" \
  tracks/_meta/fh_signal_*.md 2>/dev/null | \
  sort | uniq -c | sort -rn
```

| Status | Criteria | Action |
|---|---|---|
| **Valid** | 1+ actual activations within last cycle | Keep |
| **Removal candidate** | Never activated + 4+ weeks | Suggest removal |
| **New candidate** | Pattern appearing repeatedly in fh_signal | Suggest addition |

Output update candidate list → modify relevant SKILL.md after user approval. No auto-modification.

---

## §Observability — Full Observability Hook Spec

> Frontier basis: `harness_frontier_diagnosis_2026-06-02.md` §Frontier Highlights 3 (AHE — agents cannot reliably improve a black-box harness).

Every evolution decision must leave a 3-part trace in `tracks/_meta/edit_manifest.yaml`:

| Trace part | Where | When written |
|---|---|---|
| **(a) what changed** | `edit_manifest.yaml` entry (file + diff summary) | On accepting a proposal (Y gate) |
| **(b) predicted effect** | same entry's `predicted_impact` + `predicted_measurable_by` | Same moment — decision with no prediction is blind |
| **(c) verify checkpoint** | same entry's `validation_status` (accepted/rejected) | Next harvest-loop Step 0-c VERIFY |

**Decision-log obligation**: When Y gate accepts a proposal, append (a)+(b) pair to `edit_manifest.yaml` in the same step — do not defer. A proposal accepted without a recorded prediction is a black-box edit: flag it, do not silently apply it.

**Glass-box Done When**: harvest-loop should be able to answer "for each change, what did we predict and did it hold?" purely from `edit_manifest.yaml` — zero reliance on session memory.
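
To make the decision-log obligation concrete, here is a sketch of a helper that appends the (a)+(b) pair in one step and refuses to write an entry without a prediction. Only `predicted_impact`, `predicted_measurable_by`, and `validation_status` come from the spec above; the function name `log_evolution_decision`, the `file` and `diff_summary` keys, the `pending` value, and the list-style layout are assumptions, since the actual manifest schema is not shown here.

```bash
# Sketch of the decision-log obligation (hypothetical helper and entry layout).
# Args: <manifest path> <changed file> <diff summary> <predicted impact> <measurable by>
log_evolution_decision() {
  local manifest=$1 file=$2 summary=$3 impact=$4 measurable=$5
  # A proposal without a recorded prediction is a black-box edit: flag, do not write.
  if [ -z "$impact" ] || [ -z "$measurable" ]; then
    echo "black-box edit: prediction missing for $file" >&2
    return 1
  fi
  mkdir -p "$(dirname "$manifest")"
  # Append (a) what changed and (b) predicted effect in the same step.
  cat >> "$manifest" <<EOF
- file: "$file"
  diff_summary: "$summary"
  predicted_impact: "$impact"
  predicted_measurable_by: "$measurable"
  validation_status: pending
EOF
}
```

The sketch does no YAML escaping, so values containing double quotes would need handling before this could be used on real entries.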

---

## §Output-Detail — Session Card Update (Natural Language Close Judgment)

**Natural-language close judgment (4th source — conversation context)**:

Even without git log match, items with the following patterns stated in session are treated as "natural-language closed" and removed immediately:
- "not possible / confirmed impossible" (endorsement not possible, cannot proceed)
- "no response + N weeks elapsed" → abandoned
- "mutual citation confirmed" · "merged" · "cancelled" · "no longer needed"
- User directly says "stop monitoring" · "close this" · "remove it"

Natural-language closed items → remove from card immediately (leave only "✅ closed" one-liner in reference table). Do not re-mention.

**When fh_completed file is absent (2-source mode)**:
- Source: starter card + git log only
- Items confirmed by git log alone → "confirmed removal"
- Card item name ↔ commit message mismatch → "removal candidate (uncertain)" + "real-time log missing — manual check needed: [item list]"
- Exact match required — no LLM semantic judgment for "confirmed" status
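
The 2-source rule above can be sketched as a literal, whole-line comparison. The assumption here is that "exact match" means the card item name equals a commit subject line character for character; the function name `classify_card_item` is hypothetical.

```bash
# Sketch of the 2-source exact-match rule (hypothetical helper).
# Args: <card item name>. Reads commit subjects on stdin, one per line.
classify_card_item() {
  local item=$1
  # -x: whole line must match, -F: literal string, no regex or semantic judgment.
  if grep -qxF -- "$item"; then
    echo "confirmed removal"
  else
    echo "removal candidate (uncertain)"
  fi
}
```

In a real session the input could come from `git log --pretty=format:%s`; near matches such as a subject with an extra suffix fall into the uncertain bucket for manual checking.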
@@ -0,0 +1,129 @@
---
name: hub-cc-pr-reviewer
description: Checks a submitted PR against the environment's baseline assets (CLAUDE.md, memory, naming, asset classification) and attaches a review comment with a merge recommendation. 5 steps — diff read, 8-area consistency check, self-catch, comment, merge recommendation.
user-invocable: true
allowed-tools: ["Bash", "Read", "Grep", "Glob"]
model: sonnet
complexity_routing:
  base: sonnet
  high: opus
  escalate_when:
    - adversarial
    - cross_project
    - high_stakes
---

> **Note:** "Original developer" here means the original developer of forge-harness (the development source and meta-monitoring home). In external install environments, the installing user acts as the baseline integrity gate operator (following the path B generalization baseline; see `SKILL_detail.md §External User Environment Adaptation Path`).

# hub-cc-pr-reviewer — Hub Gate Operation Rule Automation

When a PR is submitted, checks consistency against the user environment's baseline assets (CLAUDE.md · memory · naming · asset classification) and attaches a review comment. 5-step: diff read → 8-matrix check → self-catch → comment attachment → merge recommendation.
|
|
21
|
+
|
|
22
|
+
## Activation Triggers
|
|
23
|
+
|
|
24
|
+
1. **PR #N input**: *"Review PR #N"* / *"Check PR #N"* / *"hub review"* / *"baseline consistency check"*
|
|
25
|
+
2. **Action leader cc → hub sync point**: Large decision area PR catch (following Option C Hybrid policy — memory creation / CLAUDE.md change / CATALOG round / skill v0.x evolution / policy change / asset synergy branch judgment)
|
|
26
|
+
3. **Hub cc session entry**: Layer A auto-read recent external commit catch (auto-discover new PRs)

### Natural Language Triggers (General user phrasing — activates without internal vocabulary)

| Example phrasing | Intent |
|---|---|
| "Is it okay to submit this PR?" | PR review request |
| "This change seems inconsistent with existing rules" | Baseline consistency check |
| "Please review before merging" | PR review gate |
| "Does this change affect other parts?" | Consistency check |

**Activation criteria**: "Review PR #N" / "Add review comment" / "Baseline check" → Run this skill directly
*(pr-review-watcher deprecated as of v0.2.0 — recommend using `gh pr view --json reviews` directly)*

**Exceptions** (this skill does NOT apply):
- **Small patches** (typo / 1-line cross-ref addition / sync / minor adjustments) → Follow Option C Hybrid policy (direct push allowed area / review skip)
- **Original developer simple correction commands** ("This is wrong, redo it") → Immediate correction (no review / direct handling)

## Processing Steps (5-step)

### Step 1. PR Diff Read

Read the PR diff + metadata. If this cc authored the change, the diff read can be skipped (directly state changed areas in PR body).

> **Detail**: See `SKILL_detail.md §Step 1 Diff Read` — `gh pr diff` + `gh pr view` commands — read when executing the diff read.
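As a rough sketch of the diff read, the two `gh` invocations can be built as argument lists and run with `subprocess`. The exact commands live in `SKILL_detail.md`, which is not shown here, so the `--json` field selection below is an assumption; `diff_read_commands` and `read_pr` are hypothetical helper names.

```python
import json
import subprocess

def diff_read_commands(pr_number):
    """Return the two gh invocations for Step 1 as argv lists."""
    n = str(pr_number)
    return [
        ["gh", "pr", "diff", n],
        # Assumption: these metadata fields are the ones worth reading.
        ["gh", "pr", "view", n, "--json", "title,body,author,files"],
    ]

def read_pr(pr_number):
    """Run both commands; needs an authenticated gh CLI inside the repository."""
    diff_cmd, view_cmd = diff_read_commands(pr_number)
    diff = subprocess.run(diff_cmd, check=True, capture_output=True,
                          text=True).stdout
    meta = json.loads(subprocess.run(view_cmd, check=True, capture_output=True,
                                     text=True).stdout)
    return diff, meta
```

Keeping command construction separate from execution makes the commands easy to inspect or log before anything touches the network.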

### Step 2. Baseline Consistency Check — 8-Matrix Auto-Generation

| # | Area | Check path |
|:---:|---|---|
| 1 | CLAUDE.md (hub identity + asset ownership + sync policy) | Grep PR diff vs CLAUDE.md baseline areas |
| 2 | Memory accumulation (accumulated naming/decision baseline + asset synergy branch judgment + active onboarding + bidirectional self-validation, etc.) | Grep PR diff vs `memory feedback_*.md` key areas (**External environment**: skip this item if memory files absent → `SKILL_detail.md §External User Environment Adaptation Path`) |
| 3 | Naming baseline (accumulated naming baseline + new naming candidate area) | Catch new naming candidates from PR diff / check adherence to existing naming |
| 4 | Asset synergy branch judgment (meta/hub seed vs action leader persistent location) | Check PR changed asset location consistency |
| 5 | Simplification guard (P15 asymmetry catch + R7 over-engineering) | New asset creation vs existing asset reinforcement judgment / body length check |
| 6 | Dimension separation baseline (## Plugins / ## Skills / ## Agents) | Check dimension separation consistency on CATALOG changes |
| 7 | Branch criteria (large decision PR mandatory vs small patch direct push) | Check if PR is a large decision area (Option C Hybrid) |
| 8 | Hub gate operation consistency | Check if PR itself is a hub gate operation proof path |

Matrix result = Consistent ✅ / Partially Consistent ⚠️ / Inconsistent ❌.
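The 8-area matrix lends itself to a small data structure. The sketch below is illustrative only: the short area labels are abbreviations of the table rows, and the "worst status wins" aggregation in `overall` is an assumption, since the skill text does not say how the eight results combine.

```python
from enum import IntEnum

class Status(IntEnum):
    CONSISTENT = 0
    PARTIAL = 1
    INCONSISTENT = 2

ICON = {Status.CONSISTENT: "✅", Status.PARTIAL: "⚠️", Status.INCONSISTENT: "❌"}

# Abbreviated labels for the eight areas, in table order.
AREAS = [
    "CLAUDE.md", "Memory accumulation", "Naming baseline",
    "Asset synergy branch judgment", "Simplification guard",
    "Dimension separation baseline", "Branch criteria",
    "Hub gate operation consistency",
]

def overall(results):
    """Assumption: the least consistent area decides the overall result."""
    return max(results.values(), default=Status.CONSISTENT)

def render(results):
    """Render checked areas as a Markdown table; skipped areas are omitted."""
    lines = ["| # | Area | Result |", "|:---:|---|:---:|"]
    for i, area in enumerate(AREAS, start=1):
        if area in results:
            lines.append(f"| {i} | {area} | {ICON[results[area]]} |")
    return "\n".join(lines)
```

Leaving an area out of `results` models the external-environment case where area 2 is skipped because no memory files exist.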

### Step 3. Layer 5 Self-Catch Matrix

Self-precision catch areas after first cc review (following previous PR self-catch patterns):
- Check adherence to frontmatter description plain text only baseline (project baseline)
- Check honest documentation of generalization effect weakening areas
- Check explicit statement of gap between accumulated history (original developer environment) vs external user starting point (0 instances)
- Check explicit statement that audience-specific guides are limited to original developer environment
- Check explicit statement of organization-specific areas

Self-catch areas 0 items = skip this entire catch matrix (no token-filling / following `feedback_simplification_evidence`).

### Step 4. Review Comment Attachment

Attach the review comment (8-matrix results + self-catch + refinement suggestions + merge recommendation) via `gh pr comment`. Within this skill's execution authority (automatic).

> **Detail**: See `SKILL_detail.md §Step 4 Comment Template` — `gh pr comment` heredoc template — read when attaching the comment.
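A minimal sketch of comment assembly follows, assuming the body is piped to `gh pr comment` on standard input rather than through the heredoc template in `SKILL_detail.md` (not shown here). The section headings and the names `build_comment` and `comment_command` are hypothetical. The self-catch section is dropped when it has zero items, matching the Step 3 skip rule.

```python
def build_comment(matrix_md, self_catch, suggestions, recommendation):
    """Assemble the review comment body as Markdown."""
    parts = ["## Baseline consistency review", matrix_md]
    if self_catch:  # zero items: omit the section entirely, no filler
        parts.append("### Self-catch\n" + "\n".join(f"- {s}" for s in self_catch))
    if suggestions:
        parts.append("### Refinement suggestions\n"
                     + "\n".join(f"- {s}" for s in suggestions))
    parts.append(f"**Merge recommendation:** {recommendation}")
    return "\n\n".join(parts)

def comment_command(pr_number):
    """argv for posting the comment; '--body-file -' reads the body from stdin."""
    return ["gh", "pr", "comment", str(pr_number), "--body-file", "-"]
```

A caller would then run something like `subprocess.run(comment_command(n), input=body, text=True, check=True)` with an authenticated `gh` CLI.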

### Step 5. Admin Override Merge Recommendation

**User decision delegation** (this skill = review/recording automation / no merge authority):
- Beta stage policy (`enforce_admins: false`) adherence → admin override possible
- Self-approve blocked (GHE policy) → admin override path adherence
- When this cc authored the change, admin override path is mandatory
- N+1th operation proof = baseline stabilization acceleration path

> **Detail**: See `SKILL_detail.md §Step 5 Merge Command` — `gh pr merge` command (executed after user decision, not by this skill) — read when the user authorizes merge.

## User Approval Gate

| Stage | Approval |
|---|---|
| Steps 1-3 check auto-activation | **Automatic** (editable afterward) |
| Step 4 review comment attachment | **Automatic** (`gh pr comment` is within this skill's execution authority) |
| Step 5 admin override merge execution | **User decision** (this skill = recommendation only / no merge authority) |

## Constraints

- **This skill = review/recording automation / no merge authority** — user admin override or other reviewer merge decision
- **No single-person decision application** — following `fact-checker` rule (narrow 1 / broad N+1 / this cc self-catch joins fact-checker count)
- **Simplification guard consistency** (`feedback_simplification_evidence`) — when creating/modifying this skill, update SKILL.md only. No new auxiliary files
- **Markdown editing discipline mandatory** (`feedback_markdown_edit_discipline`) — Edit first. No Write
- **Frontmatter description plain text only baseline** (`feedback_skill_frontmatter_description_plain_text`) — avoid markdown bold

> **Detail**: See `SKILL_detail.md §Sister Asset Utilization Path`, `§External User Environment Adaptation Path`, `§Disable Path`, `§Persona Synergy Catch` — cross-ecosystem utilization, external-environment fallback, own-PRS disable resolution, and deep-insight simultaneous-activation handling — read when operating in an external user environment, resolving own-PRS conflict, or coordinating with deep-insight.

## Done When

```
All 5 Steps completed
+ Baseline consistency check 8-matrix results output (✅/⚠️/❌ each item)
+ Review comment attached via gh pr comment command
+ Admin override merge recommendation output (merge execution is user's decision)
+ External verification path: harvest-loop Step 3.75 Critic isolation Agent can independently judge based on above criteria (skill_quality_rubric.md verifiable criteria)
```

**→ Mandatory when PR contains SKILL.md / rules / templates changes: `bash templates/regression_guard.sh`** — run Axis 1 (backward check) before merge recommendation is issued. If regression_guard exits with M-tier block, merge recommendation must change to ❌ regardless of other checks.
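The override rule above can be sketched as follows. Two things are assumptions here: the path patterns used to detect "SKILL.md / rules / templates changes", and the `guard_blocked` flag, which stands in for however `regression_guard.sh` reports an M-tier block (the exit-code convention is not stated in this section). Both function names are hypothetical.

```python
def needs_regression_guard(changed_paths):
    """True if any changed path looks like a SKILL.md, rules, or templates change."""
    for path in changed_paths:
        parts = path.split("/")
        # Assumption: match the file name SKILL.md or a rules/ or templates/ directory.
        if parts[-1] == "SKILL.md" or "rules" in parts[:-1] or "templates" in parts[:-1]:
            return True
    return False

def final_recommendation(matrix_recommendation, changed_paths, guard_blocked):
    """Force the recommendation to ❌ when the guard applies and reports a block."""
    if needs_regression_guard(changed_paths) and guard_blocked:
        return "❌"
    return matrix_recommendation
```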

## References

- Rule body: `memory feedback_command_tower_gate.md` (hub gate accumulated naming baseline) + `memory feedback_qasp_to_hub_sync_protocol.md` (Option C Hybrid sync policy)
- Consistency rules: `feedback_simplification_evidence` · `feedback_markdown_edit_discipline` · `feedback_skill_frontmatter_description_plain_text` · `feedback_bidirectional_self_validation` · `feedback_reference_own_hub_assets_first`
- Sister skills: `cross-ecosystem-synergy-detection` (sister asset cluster baseline) · `verify-bidirectional` (bidirectional self-validation automation / self-catch auxiliary axis) · `harvest-loop` (weekly audit automation / operation proof accumulation cross-link)
- Autonomous commit proposal §2.19 baseline: `memory feedback_autonomous_commit_proposal.md` (① development source automation + PR proposal under human approval)