@chrono-meta/fh-gate 1.1.0 → 1.2.0
- package/.claude/agents/challenger.md +169 -0
- package/AGENTS.md +160 -0
- package/CATALOG.md +256 -0
- package/CHEATSHEET.md +367 -0
- package/CLAUDE.md +331 -0
- package/CONTRIBUTING.md +198 -0
- package/LICENSE +21 -0
- package/README.md +60 -7
- package/bin/fh-goal.js +9 -0
- package/bin/fh-run.js +9 -0
- package/docs/banner.png +0 -0
- package/docs/codex-compat.md +123 -0
- package/docs/pillars.svg +70 -0
- package/knowledge/shared/harness-core/fh_integration_contract.md +45 -28
- package/package.json +31 -6
- package/plugins/fh-commons/README.md +37 -0
- package/plugins/fh-commons/agents/quench-challenger.md +373 -0
- package/plugins/fh-commons/skills/convergence-loop/SKILL.md +155 -0
- package/plugins/fh-commons/skills/deliberation/SKILL.md +288 -0
- package/plugins/fh-commons/skills/mcp-circuit-breaker/SKILL.md +196 -0
- package/plugins/fh-commons/skills/token-budget-gate/SKILL.md +175 -0
- package/plugins/fh-meta/agents/fact-checker.md +121 -0
- package/plugins/fh-meta/agents/hub-persona-auditor.md +109 -0
- package/plugins/fh-meta/agents/persona-innovator.md +195 -0
- package/plugins/fh-meta/skills/agent-composer/SKILL.md +461 -0
- package/plugins/fh-meta/skills/agent-composer/SKILL_detail.md +464 -0
- package/plugins/fh-meta/skills/apex-review/SKILL.md +185 -0
- package/plugins/fh-meta/skills/asset-placement-gate/SKILL.md +135 -0
- package/plugins/fh-meta/skills/contention-layer/SKILL.md +127 -0
- package/plugins/fh-meta/skills/context-bridge-dispatch/SKILL.md +30 -0
- package/plugins/fh-meta/skills/context-bridge-dispatch/SKILL_detail.md +144 -0
- package/plugins/fh-meta/skills/context-doctor/SKILL.md +341 -0
- package/plugins/fh-meta/skills/cross-ecosystem-synergy-detection/SKILL.md +202 -0
- package/plugins/fh-meta/skills/deep-clarify/SKILL.md +144 -0
- package/plugins/fh-meta/skills/edit-manifest/SKILL.md +210 -0
- package/plugins/fh-meta/skills/field-harvest/SKILL.md +384 -0
- package/plugins/fh-meta/skills/frontier-digest/SKILL.md +272 -0
- package/plugins/fh-meta/skills/goal-quench/SKILL.md +509 -0
- package/plugins/fh-meta/skills/harness-doctor/SKILL.md +277 -0
- package/plugins/fh-meta/skills/harness-doctor/SKILL_detail.md +484 -0
- package/plugins/fh-meta/skills/harvest-loop/SKILL.md +231 -0
- package/plugins/fh-meta/skills/harvest-loop/SKILL_detail.md +201 -0
- package/plugins/fh-meta/skills/hub-cc-pr-reviewer/SKILL.md +129 -0
- package/plugins/fh-meta/skills/hub-cc-pr-reviewer/SKILL_detail.md +158 -0
- package/plugins/fh-meta/skills/install-doctor/SKILL.md +207 -0
- package/plugins/fh-meta/skills/install-wizard/SKILL.md +613 -0
- package/plugins/fh-meta/skills/marketplace-gate/SKILL.md +193 -0
- package/plugins/fh-meta/skills/memory-hygiene/SKILL.md +143 -0
- package/plugins/fh-meta/skills/meta-prompt-builder/SKILL.md +167 -0
- package/plugins/fh-meta/skills/meta-prompt-builder/SKILL_detail.md +37 -0
- package/plugins/fh-meta/skills/pipeline-conductor/SKILL.md +430 -0
- package/plugins/fh-meta/skills/plugin-recommender/SKILL.md +221 -0
- package/plugins/fh-meta/skills/plugin-recommender/SKILL_detail.md +220 -0
- package/plugins/fh-meta/skills/prompt-regression/SKILL.md +178 -0
- package/plugins/fh-meta/skills/public-surface-audit/SKILL.md +224 -0
- package/plugins/fh-meta/skills/return-path-gate/SKILL.md +257 -0
- package/plugins/fh-meta/skills/self-marketing-lint/SKILL.md +129 -0
- package/plugins/fh-meta/skills/sim-conductor/SKILL.md +364 -0
- package/plugins/fh-meta/skills/sim-conductor/SKILL_detail.md +337 -0
- package/plugins/fh-meta/skills/skill-splitter/SKILL.md +126 -0
- package/plugins/fh-meta/skills/skill-splitter/SKILL_detail.md +185 -0
- package/plugins/fh-meta/skills/source-grounding-audit/SKILL.md +230 -0
- package/plugins/fh-meta/skills/source-grounding-audit/SKILL_detail.md +182 -0
- package/plugins/fh-meta/skills/steel-quench/SKILL.md +226 -0
- package/plugins/fh-meta/skills/steel-quench/SKILL_detail.md +453 -0
- package/plugins/fh-meta/skills/verify-bidirectional/SKILL.md +238 -0
- package/scripts/fh-gate.sh +175 -40
- package/scripts/fh-goal.sh +182 -0
- package/scripts/fh-run.sh +269 -0
package/.claude/agents/challenger.md
ADDED
@@ -0,0 +1,169 @@
---
name: challenger
description: Frontier-grade adversarial evaluator for harness assets, papers, designs, and code. Goes beyond fixed-angle critique — adapts attack vectors to artifact type, enforces evidence citation on every attack, models its own information asymmetry (Sandboxed Adversary), and tracks convergence across rounds. Returns structured [issue · location · severity] output consumable by steel-quench, harvest-loop, and sim-conductor. Use when you need adversarial pressure that a self-reviewing author cannot generate.
---

# challenger — Frontier Adversarial Evaluator

> The original devil-advocate asks "what's wrong?" The challenger asks "what's wrong, where exactly, what evidence supports that claim, and what can I NOT see that might invalidate my attack?" Adversarial pressure with epistemic discipline.

## Core Principle — Sandboxed Adversary Awareness

The challenger operates from an explicitly constrained information position:

```
What challenger CAN see:
- Static artifact content (files, text, diagrams, claims)
- Internal consistency (do parts contradict each other?)
- Structural patterns (missing sections, undefined terms, circular references)

What challenger CANNOT see:
- Runtime behavior (does this actually execute as described?)
- External adoption evidence (are others using this successfully?)
- Author intent beyond what's written
- Future ecosystem state
```

**Implication**: Attacks based on "I can't verify this" are valid attack signals and must not be dismissed. If the challenger cannot verify a claim and the author cannot demonstrate it either, the claim is a phantom candidate.

**Convergence signal**: If a new attack round produces only attacks that the challenger itself flags as "low confidence due to information gap," convergence is approaching. Attack potency is declining, so declare the convergence direction.
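The phantom-candidate rule can be partly automated for one gap the challenger *can* close statically: whether referenced files exist. This is a minimal sketch. It assumes referenced paths appear in backticks, contain a `/`, and have no spaces. `find_phantom_paths` is a hypothetical helper, not part of the package. A missing path is only a phantom *candidate* until the author fails to demonstrate it.

```bash
#!/usr/bin/env bash
# Hypothetical helper: list backtick-quoted paths in a markdown artifact
# that do not exist under a root directory (U5 phantom detection, files only).
# Assumption: path-like references contain "/" and no spaces.
find_phantom_paths() {
  local artifact="$1" root="${2:-.}"
  # Extract `...` spans that look like paths, strip backticks, dedupe.
  grep -o '`[^` ]*/[^` ]*`' "$artifact" | tr -d '`' | sort -u |
    while IFS= read -r ref; do
      # URLs and glob patterns are not concrete file claims; skip them.
      case "$ref" in
        http*|*'*'*) continue ;;
      esac
      [ -e "$root/$ref" ] || printf 'PHANTOM-CANDIDATE %s\n' "$ref"
    done
}
```

For example, `find_phantom_paths .claude/agents/challenger.md .` run from the package root lists every referenced path that is not shipped. The challenger still has to ask the author before raising one as a U5 attack.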

---

## Artifact-Adaptive Attack Matrix

Before attacking, the challenger identifies the artifact type and loads the corresponding attack angles. Universal angles always apply.

### Universal Angles (all artifact types)

| # | Angle | Core question |
|:---:|---|---|
| U1 | **Existence justification** | Why does this exist? Is there a simpler alternative? |
| U2 | **Self-referential closure** | Does this evaluate itself by its own criteria? |
| U3 | **Evidence grounding** | Every quantitative claim: is there a measurement artifact? |
| U4 | **Bus factor** | If the author is unavailable, does this still function? |
| U5 | **Phantom detection** | Does every referenced capability, file, or behavior actually exist? |

### Type-Specific Angles

**SKILL.md / Agent definition**:
- S1: Done-When defined and binary-evaluable?
- S2: Trigger phrases reachable without internal vocabulary?
- S3: Allowed-tools lists only tools that are actually called?
- S4: Dependencies explicitly documented or eliminated?
- S5: Step sequence executable cold-start without author context?

**Paper / Research document**:
- P1: Every "X achieves Y" claim has a measurement artifact (not an LLM reconstruction)?
- P2: Related work covers systems that contradict the paper's claims?
- P3: Self-application scenario (does the method validate itself)?
- P4: Scope boundary clearly defined — what is NOT addressed?
- P5: Citation chain: does citing a paper imply its claims are verified?

**Design / Architecture document**:
- D1: Implementation feasibility under stated constraints?
- D2: Platform lock-in — what breaks if a dependency is removed?
- D3: Simplification test — does this get simpler over time, or more complex?
- D4: Failure mode coverage — what happens when each component fails?

**Code / Prompt**:
- C1: Edge cases and boundary conditions covered?
- C2: Implicit assumptions that could fail silently?
- C3: Security surface — injection, override, or bypass paths?
- C4: Test coverage — is behavior verified, or assumed?
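Loading angles by artifact type is a simple lookup. This is a minimal sketch. It assumes the artifact type arrives as a lowercase keyword (`skill`, `paper`, `design`, `code`). `active_angles` is a hypothetical name, not part of the package. An unknown type falls back to the Universal angles only, which matches "Universal angles always apply."

```bash
#!/usr/bin/env bash
# Hypothetical lookup: Universal angles always apply; type-specific angles
# are appended according to the artifact type keyword.
active_angles() {
  local universal="U1 U2 U3 U4 U5" specific=""
  case "$1" in
    skill|agent) specific="S1 S2 S3 S4 S5" ;;
    paper)       specific="P1 P2 P3 P4 P5" ;;
    design)      specific="D1 D2 D3 D4" ;;
    code|prompt) specific="C1 C2 C3 C4" ;;
    *)
      # Unknown type: warn on stderr and keep Universal angles only.
      echo "unknown artifact type: $1" >&2 ;;
  esac
  echo "$universal${specific:+ $specific}"
}
```

The output can fill the `Attack angles activated:` line of the Phase 1 context map directly.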

---

## Execution Protocol

### Phase 1 — Context Mapping (before any attacks)

State explicitly:
```
Artifact type: [SKILL / Paper / Design / Code]
Information I have: [list what was provided]
Information I lack: [list what I cannot see — runtime state, external evidence, etc.]
Attack angles activated: Universal (U1-U5) + [type-specific list]
```

### Phase 2 — Attack Round

For each attack:
```
Attack: [angle code + description]
Location: [file:line / section / paragraph / claim — REQUIRED. Abstract attacks rejected.]
Severity: S (blocks deployment) / A (required before sharing) / B (improvement recommended)
Evidence: [what in the artifact supports this attack]
Confidence: HIGH (artifact clearly shows this) / MED (inferred) / LOW (information gap)
```

LOW confidence attacks are still valid — they surface blind spots.
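The Phase 2 rules (Location is required; Severity and Confidence use fixed vocabularies) are mechanical enough to lint. This is a minimal sketch. It assumes each attack is flattened to one line of `angle|location|severity|evidence|confidence`. That layout and the `check_attack` / `trim` names are illustrative assumptions, not a format the package defines.

```bash
#!/usr/bin/env bash
# Hypothetical Phase 2 lint for one attack record in the assumed
# "angle|location|severity|evidence|confidence" layout.

# Strip leading and trailing whitespace from a string.
trim() {
  local s="$1"
  s="${s#"${s%%[![:space:]]*}"}"
  s="${s%"${s##*[![:space:]]}"}"
  printf '%s' "$s"
}

check_attack() {
  local angle location severity evidence confidence
  IFS='|' read -r angle location severity evidence confidence <<< "$1"
  location="$(trim "$location")"
  severity="$(trim "$severity")"
  confidence="$(trim "$confidence")"
  # Location is REQUIRED: abstract attacks are rejected.
  if [ -z "$location" ]; then
    echo "rejected: abstract attack (no Location) for $(trim "$angle")"
    return 1
  fi
  case "$severity" in
    S|A|B) ;;
    *) echo "rejected: unknown Severity '$severity'"; return 1 ;;
  esac
  case "$confidence" in
    HIGH|MED|LOW) ;;   # LOW is still valid: it surfaces a blind spot
    *) echo "rejected: unknown Confidence '$confidence'"; return 1 ;;
  esac
  echo "ok"
}
```

The lint deliberately accepts LOW-confidence records. It rejects only structural violations, so the judgment about evidence quality stays with the challenger.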

### Phase 3 — Convergence Assessment

After completing all attack angles:
```
New S-grade attacks this round: N
Attack potency trend: [Increasing / Stable / Declining]
Convergence signal: [Not yet / Approaching (LOW-confidence dominant) / Achieved (zero new S)]
Residual risk: [List A/B items that remain unresolved]
```
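The Phase 3 signal can be written as a small decision function. This is a minimal sketch. It assumes the caller already counts new S-grade attacks, total attacks, and LOW-confidence attacks for the round. `convergence_signal` is hypothetical. Two choices are one reading of the template rather than stated rules: "LOW-confidence dominant" is read as a strict majority, and "zero new S" is checked first.

```bash
#!/usr/bin/env bash
# Hypothetical Phase 3 classifier.
# Args: new_s_count total_attacks low_confidence_count
# Assumption: "LOW-confidence dominant" means LOW attacks are a strict majority.
convergence_signal() {
  local new_s="$1" total="$2" low="$3"
  if [ "$new_s" -eq 0 ]; then
    echo "Achieved"
  elif [ "$total" -gt 0 ] && [ $((low * 2)) -gt "$total" ]; then
    echo "Approaching"
  else
    echo "Not yet"
  fi
}
```

For example, `convergence_signal 2 6 4` reports `Approaching`: two new S-grade attacks remain, but four of the six attacks are LOW confidence.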

---

## Output Format

```
## challenger evaluation — [artifact name] ([date])

### Context Map
- Artifact type: ...
- Angles active: U1-U5 + [type-specific]
- Information gaps: [list]

### Attacks

| # | Angle | Location | Severity | Evidence | Confidence |
|:---:|---|---|:---:|---|:---:|
| 1 | U3: Evidence grounding | §4.6, "reduced token cost" | A | No measurement artifact; design property only | HIGH |
| 2 | S2: Trigger reachability | SKILL.md trigger list | S | "run quench" not reachable via natural phrasing | MED |
...

### Convergence Assessment
New S-grade: N | Trend: [Increasing/Stable/Declining]
Signal: [Not yet / Approaching / Achieved]
Residual risks: [list]
```
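The Attacks table has a fixed column order, so the `New S-grade: N` count can be derived from the report instead of typed by hand. This is a minimal `awk` sketch. It assumes the report is saved as a markdown file holding a single round, with Severity in the fourth column as in the template. `count_severity` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: count Attacks-table rows with a given severity.
# Assumes the template's column order: | # | Angle | Location | Severity | ...
count_severity() {
  local report="$1" grade="$2"
  awk -F'|' -v g="$grade" '
    # Data rows start with "| <number> |"; header and separator rows do not.
    $2 ~ /^[[:space:]]*[0-9]+[[:space:]]*$/ {
      sev = $5
      gsub(/^[[:space:]]+|[[:space:]]+$/, "", sev)
      if (sev == g) n++
    }
    END { print n + 0 }
  ' "$report"
}
```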

---

## Invocation Modes

| Mode | Command | When to use |
|---|---|---|
| **Sub-agent** (default) | Agent tool with `subagent_type="challenger"` | Standard — same session, prompt isolation |
| **Cross-session CLI** | `claude --print "{prompt}"` | When same-session confirmation bias is a concern; spawns a fresh process with zero history |
| **External sidecar** | `gemini`, `gh copilot`, `ollama` pipe | Wave 5 — when a different model's blind-spot profile is needed |

Cross-session mode eliminates accumulated session context. Use it when the main session has handled the artifact extensively and you want a truly cold-read adversary. Output uses the same format as sub-agent mode.
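The Cross-session CLI mode can be wrapped so the agent spec and the artifact travel together into a fresh process. This is a minimal sketch. It assumes the spec lives at `.claude/agents/challenger.md` and that piping context into `claude --print "<query>"` fits your setup. `build_cold_prompt`, `run_cold_challenger`, and the query wording are hypothetical, not part of the package.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper for the Cross-session CLI mode.
# build_cold_prompt only assembles text: spec first, then artifact.
build_cold_prompt() {
  local spec="$1" artifact="$2"
  printf '=== AGENT SPEC (%s) ===\n' "$spec"
  cat "$spec"
  printf '\n=== ARTIFACT (%s) ===\n' "$artifact"
  cat "$artifact"
}

# run_cold_challenger pipes the assembled text into a new `claude --print`
# process, so no history from the calling session is shared.
run_cold_challenger() {
  command -v claude >/dev/null 2>&1 || { echo "claude CLI not found" >&2; return 127; }
  build_cold_prompt "$1" "$2" |
    claude --print "Act as the agent defined in the piped AGENT SPEC and evaluate the piped ARTIFACT using its Output Format."
}
```

Usage: `run_cold_challenger .claude/agents/challenger.md path/to/artifact.md`. Keeping prompt assembly separate from the CLI call lets you inspect exactly what the cold-read adversary receives.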

## Integration Hooks

**steel-quench Wave 1** *(planned)*: challenger is designed to replace or supplement the devil agent. When wired, its S-grade output feeds into the Wave 2 defense round. Currently steel-quench calls `fh-commons:quench-challenger` for Wave 1; explicit challenger wiring is a future integration step.

**harvest-loop Step 3a**: challenger runs against existing skills using session findings. S-grade attacks on existing skills → HIGH synthesizer grade. LOW-confidence attacks on new proposals → MED grade (defer pending verification).

**sim-conductor Area A/B**: challenger takes the role of the adversarial persona. It focuses on U5 (phantom detection) and S2 (trigger reachability) for Area A, and on D3 (simplification test) and U2 (self-referential closure) for Area B.
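The harvest-loop Step 3a mapping can be sketched as a function that encodes only the two documented rules. This is a minimal sketch. It assumes the caller knows whether the target is an existing skill or a new proposal. `synth_grade` and the `UNMAPPED` fallback are hypothetical. Combinations the hook does not describe are left to the caller rather than guessed.

```bash
#!/usr/bin/env bash
# Hypothetical mapping of a challenger attack to a harvest-loop synthesizer grade.
# Args: target (existing|new) severity (S|A|B) confidence (HIGH|MED|LOW)
synth_grade() {
  local target="$1" severity="$2" confidence="$3"
  if [ "$target" = existing ] && [ "$severity" = S ]; then
    echo "HIGH"
  elif [ "$target" = new ] && [ "$confidence" = LOW ]; then
    echo "MED (defer pending verification)"
  else
    # Not covered by the documented rules.
    echo "UNMAPPED"
  fi
}
```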

---

## Done When

```
All applicable angles attempted (Universal + type-specific)
+ Every attack has a Location citation (no abstract attacks)
+ Convergence assessment output
+ Residual risk list
```

Challenger does NOT defend — defense is the caller's responsibility (steel-quench Wave 2, human review gate).
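A saved report can be checked against the structural parts of the Done When list. This is a minimal sketch. It assumes the report follows the Output Format template (`- Angles active:`, `### Convergence Assessment`, `Residual risks:` lines, and Location as the third table column). `done_when_check` is hypothetical, and it cannot tell whether every applicable angle was genuinely attempted.

```bash
#!/usr/bin/env bash
# Hypothetical structural Done When check for a saved challenger report.
# Returns 0 when all checks pass, 1 otherwise (reasons printed to stdout).
done_when_check() {
  local report="$1" missing=0 empty_loc
  grep -q '^- Angles active:' "$report"           || { echo "missing: Angles active"; missing=1; }
  grep -q '^### Convergence Assessment' "$report" || { echo "missing: Convergence Assessment"; missing=1; }
  grep -q '^Residual risks:' "$report"            || { echo "missing: Residual risks"; missing=1; }
  # Attack rows look like "| <n> | angle | location | ..."; split on "|",
  # Location is field 4. Count rows whose Location cell is blank.
  empty_loc=$(awk -F'|' '
    $2 ~ /^[[:space:]]*[0-9]+[[:space:]]*$/ {
      loc = $4; gsub(/[[:space:]]/, "", loc)
      if (loc == "") c++
    }
    END { print c + 0 }' "$report")
  [ "$empty_loc" -eq 0 ] || { echo "abstract attacks: $empty_loc"; missing=1; }
  return "$missing"
}
```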

package/AGENTS.md
ADDED
@@ -0,0 +1,160 @@
# AGENTS.md — forge-harness Sub-Agent Specs

> **This file is the runtime agent specification registry for forge-harness.**
> For session rules and orchestration protocol, see `CLAUDE.md`.
> For skill descriptions and natural language triggers, see `plugins/fh-meta/` and `plugins/fh-commons/`.

---

## Relationship to CLAUDE.md

| File | Scope | Audience |
|---|---|---|
| `CLAUDE.md` | Session rules, protocols, orchestration flow | AI (Claude Code) — operational ruleset |
| `AGENTS.md` | Runtime agent specs — roles, tools, invocation | AI + humans — agent registry |

CLAUDE.md governs *how* the session runs. AGENTS.md defines *what each agent does* when dispatched.

---

## Agent Registry

forge-harness ships 5 tracked agents across `.claude/agents/` and plugin `agents/` directories. Four serve general harness operations; one (`quench-challenger`) is steel-quench-dedicated.

| Agent | File | Role | Invoked by |
|---|---|---|---|
| `challenger` | `.claude/agents/challenger.md` | Frontier-grade adversarial evaluator — adapts attack vectors to artifact type, enforces evidence citation, models its own information asymmetry | `steel-quench`, `harvest-loop`, `sim-conductor`, or direct dispatch |
| `fact-checker` | `plugins/fh-meta/agents/fact-checker.md` | Pre-recommendation deduplication — greps hub assets for existing skills/agents/patterns before main agent commits to a new recommendation; catches stale facts and duplicate work | Main agent before any new asset creation or recommendation |
| `hub-persona-auditor` | `plugins/fh-meta/agents/hub-persona-auditor.md` | Pre-publication audit of external-facing assets — 3+ persona simulation, 4-axis review (resonance/confusion/resistance/supplement), 3-tier revision proposals | `hub-cc-pr-reviewer`, `sim-conductor`, or direct dispatch |
| `quench-challenger` | `plugins/fh-commons/agents/quench-challenger.md` | Steel-quench dedicated adversary — 3-DNA synthesis of Devil + Innovator + Prescriber; every attack paired with a concrete fix direction | `steel-quench` Wave 1 (primary), `install-doctor`, `marketplace-gate` |
| `persona-innovator` | `plugins/fh-meta/agents/persona-innovator.md` | Naming gap detection + frame proposals + external frontier absorption signals | `sim-conductor` Area A, `harvest-loop`, or direct dispatch |

> Machine-readable mirror: `.claude/registry/agent_cards.json` (canonical capability cards, count-synced to tracked agent files — A2A Agent Card pattern).

### Tool restrictions per agent

| Agent | Allowed tools | Rationale |
|---|---|---|
| `challenger` | Read, Grep, Glob, WebSearch, WebFetch | Needs external evidence; no writes |
| `fact-checker` | Read, Grep, Glob | Deduplication grep only — no modification |
| `hub-persona-auditor` | Read, Grep, Glob | Audit only — no modification |
| `quench-challenger` | Read, Grep, Glob | Attack+prescription only — no modification |
| `persona-innovator` | Read, Grep, Glob, WebSearch, WebFetch | Frontier scanning requires web access |
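The tool table can be mirrored in a small allowlist, for example to lint agent definitions or to guard a dispatch wrapper. This is a minimal sketch. `allowed_tools` and `tool_allowed` are hypothetical names. The list has to be kept in sync with the table by hand, and unknown agents get no tools.

```bash
#!/usr/bin/env bash
# Hypothetical allowlist mirroring the "Tool restrictions per agent" table.
allowed_tools() {
  case "$1" in
    challenger|persona-innovator) echo "Read Grep Glob WebSearch WebFetch" ;;
    fact-checker|hub-persona-auditor|quench-challenger) echo "Read Grep Glob" ;;
    *) echo "" ;;   # unknown agent: no tools
  esac
}

# Returns 0 if the agent may use the tool, 1 otherwise.
tool_allowed() {
  local agent="$1" tool="$2" t
  for t in $(allowed_tools "$agent"); do
    [ "$t" = "$tool" ] && return 0
  done
  return 1
}
```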

---

## 2-Layer Architecture Context

forge-harness is structured as two distinct layers:

| Layer | Contents | AI compatibility |
|---|---|---|
| **Methodology layer** (model-agnostic) | `tracks/`, `knowledge/`, `SKILL.md` documents, session protocols | Any AI model |
| **Automation layer** (Claude-native) | `.claude/agents/`, hooks, slash commands, `CLAUDE.md` rules | Claude Code only |

Agents in this registry belong to the **Automation layer**. Skills (in `plugins/`) straddle both layers — their methodology is model-agnostic, but their invocation mechanism is Claude Code-native.

> **Codex-compatible beta**: The Methodology layer (`tracks/`, `knowledge/`, skill documentation) is designated Codex-compatible beta. Gemini, Codex, and other AI users can apply FH methodology without the Automation layer — manual invocation replaces hook/agent dispatch.

> **Multi-model sidecar (validated)**: Any FH user can delegate to other models via sidecar — Gemini CLI, OpenAI/Codex CLI, or Copilot CLI's model catalog — invoked with `Bash` from within the Claude Code session. FH is the orchestrating harness; the sidecar is a routing/access layer (not a second harness — different layer entirely). Validated empirically: `echo "prompt" | gemini` works inside a CC session and produces usable output. Sidecar calls are Bash invocations, not agent dispatches — they bypass this registry and are coordinated inline by the skill. See `knowledge/shared/harness-core/multi_model_sidecar_strategy.md` for the full pattern.

---

## Invocation patterns

### Single agent dispatch

```
Analyze this SKILL.md for structural flaws before I commit it.
```

→ Claude dispatches `quench-challenger` automatically (description-triggered).

### Parallel dispatch (2+ independent tasks)

```
Run fact-checker and persona-innovator in parallel.
First: check [asset path] for duplicates
Second: scan for naming gaps in the current harness
```

→ Both agents run concurrently in Agent View; results are integrated by the orchestrator.

### Wave-based composition (via agent-composer)

For complex multi-step tasks, run `/agent-composer` first to plan which agents to dispatch in which order (Wave 0 reconnaissance → Wave 1 execution → Wave 2 synthesis).

---

## Codex Compatibility (beta)

The methodology layer (`tracks/`, `knowledge/`, `SKILL.md` docs) is in Codex-compatible beta. Any AI model can follow skill workflows by reading SKILL.md files directly; the automation layer (hooks, `.claude/agents/`, `/model`) is Claude Code-native and requires manual adaptation.

### Entry point for Codex users

AGENTS.md is your starting point. Navigate from here to skill workflows:

```bash
# Read a skill's full workflow
cat plugins/fh-meta/skills/steel-quench/SKILL.md

# Apply via codex exec (validated pattern — codex-cli ≥ 0.135.0)
cat plugins/fh-meta/skills/steel-quench/SKILL.md path/to/artifact.md \
  | codex exec -m gpt-5.5 -

# Or use FH's runtime adapter (preferred for Codex-primary workflows)
FH_BACKEND=codex npx --package @chrono-meta/fh-gate fh-run \
  --skill steel-quench \
  --file path/to/artifact.md

# Agent substitution for Claude Code Agent(...) calls
FH_BACKEND=codex npx --package @chrono-meta/fh-gate fh-run \
  --agent fh-commons:quench-challenger \
  --file path/to/artifact.md

# Or pipe explicitly ({skill} is a placeholder for the skill directory name)
echo "Apply the following skill to the artifact below." \
  | cat - plugins/fh-meta/skills/{skill}/SKILL.md target.md \
  | codex exec -m gpt-5.5 -
```

`codex exec -m gpt-5.5 -` reads the prompt from stdin in headless mode. Interactive `npx @openai/codex` is not suitable here because it requires a TTY.

### Skill compatibility tiers

| Tier | Definition | Examples |
|---|---|---|
| **M1 — Full** | All phases run without CC-native dependencies — no Stop hook, no `.claude/agents/` dispatch, no `/model` | `token-budget-gate`, `asset-placement-gate`, `source-grounding-audit`, `deep-clarify`, `deliberation`, `convergence-loop` |
| **M2 — Partial** | Core workflow runs; CC-native phases require manual adaptation or skip | `steel-quench` (Wave 1–3 ✅; quench-challenger agent = manual), `harness-doctor`, `context-doctor`, `sim-conductor`, `harvest-loop` (git scan phase ✅; PR auto-proposal = manual) |
| **M3 — CC-only** | Requires CC Stop hook or session-scoped agent dispatch; methodology reference only | `goal-quench` (Phase 3 Stop hook), `hub-cc-pr-reviewer` (CC session context), `install-wizard` (settings.json write) |
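Scripts that need to decide between running a skill directly under Codex and flagging it for manual adaptation can use a lookup that mirrors the tier table. This is a minimal sketch. `compat_tier` is a hypothetical name. Skills the table does not list return `unlisted` instead of a guessed tier.

```bash
#!/usr/bin/env bash
# Hypothetical lookup mirroring the "Skill compatibility tiers" table.
compat_tier() {
  case "$1" in
    token-budget-gate|asset-placement-gate|source-grounding-audit|deep-clarify|deliberation|convergence-loop)
      echo "M1" ;;
    steel-quench|harness-doctor|context-doctor|sim-conductor|harvest-loop)
      echo "M2" ;;
    goal-quench|hub-cc-pr-reviewer|install-wizard)
      echo "M3" ;;
    *)
      echo "unlisted" ;;   # not in the table: do not guess
  esac
}
```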

**M2 adaptation pattern**: when a step references `Agent(subagent_type=...)` or a slash command, substitute with `fh-run` (preferred) or a direct `codex exec` call reading the sub-agent's SKILL.md — same workflow, different runtime.
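The substitution for `Agent(subagent_type=...)` steps can be scripted as a text rewrite that prints the equivalent `fh-run` command line. This is a minimal sketch. It reuses only the `--agent` and `--file` flags shown in the Codex entry-point block. `to_fh_run` is a hypothetical helper that prints the command rather than running it.

```bash
#!/usr/bin/env bash
# Hypothetical helper: turn a step that calls Agent(subagent_type="<name>")
# into the equivalent fh-run command line (printed, not executed).
to_fh_run() {
  local call="$1" artifact="$2" name
  # Extract the value between subagent_type=" and the next double quote.
  name=$(printf '%s\n' "$call" | sed -n 's/.*subagent_type="\([^"]*\)".*/\1/p')
  if [ -z "$name" ]; then
    echo "no subagent_type found in: $call" >&2
    return 1
  fi
  printf 'FH_BACKEND=codex npx --package @chrono-meta/fh-gate fh-run --agent %s --file %s\n' \
    "$name" "$artifact"
}
```

Printing instead of executing keeps the rewrite reviewable, which fits the "manual adaptation" framing of the M2 tier.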

**Goal handling under Codex**: use Codex's native goal/session feature when available. FH's portable role is the quality gate after the goal completes: `FH_BACKEND=codex npx --package @chrono-meta/fh-gate fh-gate ...`. `fh-goal` exists only for non-interactive one-shot runs that should be followed automatically by `fh-gate`; it is not a replacement for Codex-native goal control.

### Beta removal conditions

| Condition | Status |
|---|---|
| Known limitation list published (`docs/codex-compat.md`) | ✅ done (2026-06-04) |
| 5+ externally validated M1 skill runs (not FH author) | ⬜ pending — needs external users |
| At least 1 external Codex user confirms methodology reproduces | ⬜ pending — needs external users |
| README badge updated (`Codex-compatible` without `beta`) | ⬜ blocked on above |

**Author M1 validation (2026-06-04, internal — does not satisfy the external conditions above):** `source-grounding-audit` (4/4 on a phantom-seeded fixture) and `asset-placement-gate` (correct Drop routing on a duplicate-skill proposal) ran end-to-end via `codex exec -m gpt-5.5 -` with no CC-native dependency, confirming the M1 tier assignments. Limitations observed (CC-native hook noise, no token accounting, etc.) are documented in `docs/codex-compat.md`.

Tracking: open an issue at `chrono-meta/forge-harness` with label `codex-validation` to report a validated run.

---

## Adding new agents

New agents must pass the **New Skill Creation Pre-Commit Gate** defined in `CLAUDE.md` before committing. Key requirements:

1. Role duplication check via `/asset-placement-gate`
2. Plain description — no self-marketing language
3. At least 1 explicit `Done When` condition
4. At least 3 natural language trigger examples
5. Independently executable (or dependencies explicitly documented)

After 2+ weeks of use: if `accepted ≥ 60%` of invocations → strengthen; if `rejected ≥ 40%` → redefine scope or deprecate. See `CLAUDE.md > Sub-agent Operations`.
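The two-week review rule reduces to two threshold comparisons. This is a minimal sketch. It assumes the caller has already collected accepted, rejected, and total invocation counts for at least two weeks. `review_agent` and the `keep observing` result for the case the rule does not cover are hypothetical. The rule does not say which threshold wins when both hold, so this sketch checks the accepted threshold first. Integer math avoids floating point: `accepted/total ≥ 0.60` is the same as `100·accepted ≥ 60·total`.

```bash
#!/usr/bin/env bash
# Hypothetical evaluator for the 2+ week review rule.
# Args: accepted rejected total (invocation counts)
review_agent() {
  local accepted="$1" rejected="$2" total="$3"
  if [ "$total" -le 0 ]; then
    echo "no data"
  elif [ $((100 * accepted)) -ge $((60 * total)) ]; then
    echo "strengthen"
  elif [ $((100 * rejected)) -ge $((40 * total)) ]; then
    echo "redefine scope or deprecate"
  else
    echo "keep observing"
  fi
}
```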

package/CATALOG.md
ADDED
@@ -0,0 +1,256 @@
# Knowledge Catalog

AI reads this file first when searching past work. Open individual files for detailed content.
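Every entry pairs a tagged `###` heading with a `**File:**` line, so a search by tag can return just the pointers to open next. This is a minimal `awk` sketch. It assumes the heading-then-File layout used by the entries below. `catalog_find` is a hypothetical name, and the tag match is a plain substring match.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the heading and **File:** line of every
# CATALOG.md entry whose heading contains the given tag (substring match).
catalog_find() {
  local catalog="$1" tag="$2"
  awk -v tag="$tag" '
    /^### / { hit = (index($0, tag) > 0); if (hit) print; next }
    hit && /^\*\*File:\*\*/ { print; hit = 0 }
  ' "$catalog"
}
```

Usage: `catalog_find CATALOG.md '#goal-quench'`.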

---

## Sessions

<!-- Add entries in reverse date order (newest at top) -->

### 2026-06-04 | forge-harness | #skill, #public-private-split, #leak-prevention, #composability-gate
**File:** plugins/fh-meta/skills/public-surface-audit/SKILL.md
New meta-skill: scans git-tracked files for operator-private tokens (real username, absolute home paths, companion-store name, company asset names) that belong only in gitignored files. Closes the surface-sweep gap from the Gap-1 public/private split. Dogfoods its own rule — patterns live in a gitignored source, SKILL.md carries only placeholders.
- Decision: tracked-only scan (git ls-files); configurable pattern table + tight allowlist; severity HIGH/MED/LOW.
- Open: skill-count drift across plugin.json / local_fh_context (separate cleanup).

### 2026-06-04 | forge-harness | #issue-69, #field-harvest, #mode-b, #detection-skip
**File:** plugins/fh-meta/skills/field-harvest/SKILL.md
Mode B extension (Proposal A): session-end auto-proposal for un-logged field-cwd commits (item1), detection-skip for already-logged commits (item2), templates/ hub-link footer (item6).
- Decision: auto-trigger is proposal-only (no auto-run); harvest-loop keeps hub-cwd wrap-up ownership (collision guard).
- Open: item2 bash detection-skip style pending sister-hub reply (inline-grep vs persisted-ledger).

### 2026-06-04 | forge-harness | #4-axis-gate, #pre-commit-hook, #docs-scope
**File:** templates/.git-hooks/pre-commit
4-axis gate scope extension: docs/*.md + AGENTS.md added to the substantive carve-out (Axes 2-3 when diff adds fence/citation/version). Also implemented the previously-documented-but-missing knowledge/ carve-out in the hook. Gate hook activated (core.hooksPath → templates/.git-hooks; was physically inactive).
- Decision: path-regex carve-out bucket + diff_is_substantive() helper.
- Open: none.

### 2026-06-03 | forge-harness | #goal-quench, #native-validation, #calibration, #stop-hook
**File:** tracks/_meta/goal_quench_2026-06-03.md
Run #1 native goal-quench validation (core mode): Stop hook `.active` → `.pending` confirmed working. pipeline-conductor --quick CLEAN (Step 3 ESCALATE resolved via option(a) inline fix).
- Decision: GREEN estimate missed pipeline-conductor overhead (~6K); timestamp field used literal "now" — fixed to `date +"%Y-%m-%d %H:%M"` going forward.

### 2026-06-03 | forge-harness | #context-bridge-dispatch, #done-when, #deprecated, #source-grounding
**File:** plugins/fh-meta/skills/context-bridge-dispatch/SKILL.md
Added Done When section (satisfies harness-doctor L2 M-tier for deprecated skill). Replaced informal "completeness gate" term with "L2 M-tier (CLAUDE.md §New Skill Creation Pre-Commit Gate)". All 22 fh-meta SKILL.md files now have Done When.
- Decision: deprecated skills still require Done When — `deprecated: true` is not an exemption from the pre-commit gate.

### 2026-06-03 | forge-harness | #goal-quench, #calibration, #run2, #catalog
**File:** CATALOG.md
Run #2 native goal-quench validation (core mode): added 3 CATALOG entries for today's session work. Established pipeline-conductor overhead correction (~6K) for GREEN-tier estimates.

### 2026-06-03 | forge-harness | #goal-quench, #install-wizard, #calibration-schema, #done-when, #skill-evolution
**File:** plugins/fh-meta/skills/goal-quench/SKILL.md, plugins/fh-meta/skills/install-wizard/SKILL.md
PR #67: Calibration schema extended with `run_id`, `session_id`, `scope_hint` fields (R10 schema gaps). Phase 1.5 hand-off clarified: "update not re-create" `.active` file to preserve `start_commit`. install-wizard: Done When section added (6-step gate + `--dry-run` variant) — the only active skill of 28 missing it (R07 M-tier).
- Decision: schema fields added prospectively — backfilling retrospective N=10 rows with `mode`/`session_type` is a local companion-store task, not a SKILL.md change

### 2026-06-03 | forge-harness | #catalog, #goal-quench, #skill-evolution
**File:** CATALOG.md
PR #66: Added CATALOG entries for PRs #61–64 (goal-quench evolution arc): mode-ladder refactor, micro R-tier cleanup, non-coercive guidance formalization, scope-driven sidecar routing + 4.7× overhead calibration.
- Decision: tracks/** gitignored by design — calibration YAML records stay local; only CATALOG.md committed

### 2026-06-03 | forge-harness | #goal-quench, #sidecar-routing, #step-d, #deprecated-refs, #return-path
**File:** plugins/fh-meta/skills/goal-quench/SKILL.md, plugins/fh-meta/skills/install-wizard/SKILL.md, plugins/fh-meta/skills/frontier-digest/SKILL.md, plugins/fh-meta/skills/harvest-loop/SKILL.md
PR #65: Fixed Step D return-path bug (all 3 sidecar chains were fire-and-forget → added Step 3-c blocking verdict gate). Added `estimation_error_pct` field to calibration schema. Removed 4 deprecated routing refs: `context-bridge-dispatch` (install-wizard Cluster C + frontier-digest core skills list) and `/self-marketing-lint` (harvest-loop P10 → `/harness-doctor --lint`). ESCALATE added to Done When conditions.
- Decision: Step 3-c gate is blocking (not advisory) — sidecar verdict must resolve before Done When
|
|
57
|
+
|
|
58
|
+
### 2026-06-03 | forge-harness | #goal-quench, #sidecar-routing, #token-calibration, #steel-quench, #skill-evolution
|
|
59
|
+
**File:** plugins/fh-meta/skills/goal-quench/SKILL.md
|
|
60
|
+
PR #64: Added scope-driven sidecar routing (Step D) to goal-quench Phase 1.5: task-type signals auto-route to steel-quench C3 (code review), agent-composer panel (architecture), or sim-conductor+steel-quench Wave 5 (external publish). Formalized session overhead calibration at 4.7× factor (N=10), adding `session_type`, `actual_vs_estimate_ratio`, and `sidecar` fields to the calibration schema. Resolved steel-quench meta-audit S+A+B findings: sub-goal loop rewritten as user-driven queue, pre-flight check added, Opus escalation cost disclosed.
|
|
61
|
+
- Decision: overhead multiplier documented as empirical calibration constant; sidecar routing is scope-signal-driven, not mode-locked
### 2026-06-03 | forge-harness | #goal-quench, #non-coercive, #companion-store, #ephemeral-handoff
**File:** .claude/rules/modes_and_value.md
Formalized two non-coercive guidances: the companion-store recommendation now applies only when a user accumulates context into the meta-harness without a local fork, and the ephemeral-environment handoff rule is now mode-agnostic (Mode D → companion-store handoff/; all other modes → a committed note or PR comment in the working repo).
- Decision: single-source preserved — rule lives in public mirror; companion store holds only outputs, never a rule copy
### 2026-06-03 | forge-harness | #goal-quench, #skill-evolution, #meta-audit, #prompt-regression
**File:** plugins/fh-meta/skills/goal-quench/SKILL.md
Post-merge micro R-tier cleanup: corrected two prompt-regression probe expectations (P-CHAIN-01/02) that were wrongly FAILing correct skills, and made the LOCAL_SKILL_REGISTRY trigger in cross-ecosystem-synergy-detection honestly optional. Closes the full-harness refactor backlog.
- Decision: the leftover placeholder bash blocks and the challenger wiring gap were verified as non-defects
### 2026-06-03 | forge-harness | #goal-quench, #skill-evolution, #mode-ladder, #sidecar-routing
**File:** plugins/fh-meta/skills/goal-quench/SKILL.md
Evolved goal-quench into a fluid core→pro→max mode ladder. Core: token-budget-gate + pipeline-conductor --quick; pro: core + context-doctor + agent-composer; max: pro + plugin-recommender + cross-ecosystem-synergy-detection. The Phase-1 budget verdict auto-recommends a mode. Ran a full-harness dogfood sweep (33 skills): fixed phantom refs, dead blocks, stale agent forks (4 deleted), and a trigger collision, and performed 3 skill-splitter splits.
- Decision: RED tier reframed as max-mode decomposition on-ramp, not hard block
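
A minimal sketch of the mode ladder as data, assuming (as the "+" notation suggests) that each mode inherits every skill from the modes below it. `MODE_LADDER`, `LADDER_ORDER`, and `skills_for` are hypothetical names, not goal-quench's interface.

```python
# Skills each rung adds; a mode runs its own rung plus every rung below it.
MODE_LADDER = {
    "core": ["token-budget-gate", "pipeline-conductor --quick"],
    "pro": ["context-doctor", "agent-composer"],
    "max": ["plugin-recommender", "cross-ecosystem-synergy-detection"],
}
LADDER_ORDER = ["core", "pro", "max"]


def skills_for(mode: str) -> list[str]:
    """Return the cumulative skill list for a mode on the core -> pro -> max ladder."""
    if mode not in MODE_LADDER:
        raise ValueError(f"unknown mode: {mode!r}")
    skills: list[str] = []
    for rung in LADDER_ORDER[: LADDER_ORDER.index(mode) + 1]:
        skills.extend(MODE_LADDER[rung])
    return skills
```

Because the ladder is cumulative, moving up a rung (for example when a RED budget verdict points toward max-mode decomposition) only adds skills and never drops the core gate.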
### 2026-06-02 | _audit | sister-asset, token-efficiency, compression, headroom
**File:** tracks/_audit/session_2026_06_02_headroom_context_doctor.md
Cross-audited Headroom (Netflix engineer's OSS token-compression tool, vendor-reported 60–95% reduction) against context-doctor's Compression Pass per sister_asset_protocol. Same goal, different layer: Headroom is the runtime executor FH lacks; context-doctor is the judgment Headroom lacks — they compose.
- Decision: import redundancy-category targeting heuristic (MCP outputs ~70% → logs ~90% → DB → file trees) into context-doctor + name Headroom as the production-proven local option; no clone-and-own (reference + record only)
- Open: actual proxy/agent-wrap routing is a local runtime setup (outside the FH repo); v0.22 maturity — pilot first
### 2026-06-02 | frontier-digest, identity, propagation | harness-engineering, a2a, mcp, observability, context
**File:** knowledge/shared/harness-core/harness_frontier_diagnosis_2026-06-02.md
Frontier digest anchored on FH's 3-layer identity + Core Axis (WebSearch engine; curl blocked, no API key). 2026-06 signal: "Harness Engineering" named 4th AI-engineering paradigm (65% of AI failures = harness defects); A2A Agent Cards + MCP registry standardize agent discovery; AHE thesis — observability is the self-improvement bottleneck.
- Decision: 6 strengthening candidates mapped per identity (agent-card registry + dispatch overhead budget → ①; harness-defect taxonomy + observability/eval hooks → ②; L1/L2/L3 context hierarchy + compression pass → ③)
- Open: candidates are proposals only — none implemented yet. Raw signal + processing checklist held in a private companion store, not this public repo.
### 2026-05-31 | multi-model, sidecar, adversarial, validated | orchestrator-swap, perspective-diversity, cross-cli
**File:** knowledge/shared/harness-core/multi_model_sidecar_strategy.md
A validated pattern for invoking external models (Gemini/Codex/Copilot CLI) as sidecars through the Bash tool from inside a Claude Code/FH session. Demonstrated by Experiment 1 (direct sidecar invocation) and Experiment 2 (3-round orchestrator-swap): each orchestrator surfaced non-overlapping findings, and the cross-wave delta improved convergence. Priorities: perspective diversity (1st), model-access fallback (2nd), token economy (3rd). §Mechanism: a stateless subprocess, not an agent dispatch.
- Decision: the harness is the activation condition for multi-model synthesis; a sidecar is a quality-compounding mechanism, not just a coverage check
### 2026-05-31 | anti-bias, multi-team, adversarial, token-coverage | steel-quench, sim-conductor, experiment, v2-paper
**File:** tracks/_meta/fh_multiteam_token_coverage_2026_05_31.md
Experiment 5 — Multi-Team Adversarial Panel measured on source-grounding-audit SKILL.md. 4 conditions (C1 single / C2 cross-session / C3 +gemini / C4 codex-TTY-fail=C3). Key results: C1→25% coverage, C2→75%, C3→100%. Claude blind spots: 3 findings (25% of total), 1 S-grade. Claude-side cost C2→C3: +0 tokens (H3 validated — Gemini billed to separate quota). Codex CLI present but headless-inoperable. Updated steel-quench/sim-conductor/source-grounding-audit/harness-doctor/harvest-loop with Multi-Team Panel design + human gates + synthesizer cross-session. v2 paper Experiment 5 section drafted with full metrics table.
- Decision: confirmed the routing rule (routine → C2, cross-session; pre-publish → C3+, zero Claude-side overhead)
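
The confirmed rule, plus the arithmetic behind "Claude blind spots: 3 findings (25% of total)", which implies 12 findings across the panel, can be sketched as below. The stage labels and the names `panel_for` and `total_findings` are hypothetical.

```python
# Coverage reported per panel condition in Experiment 5 (C4 degraded to C3).
COVERAGE = {"C1": 0.25, "C2": 0.75, "C3": 1.00}


def panel_for(stage: str) -> str:
    """Apply the confirmed rule: routine work -> C2, pre-publish -> C3 (or higher)."""
    if stage == "routine":
        return "C2"
    if stage == "pre-publish":
        return "C3"
    raise ValueError(f"unknown stage: {stage!r}")


def total_findings(blind_spot_findings: int, blind_spot_share: float) -> int:
    """Back out the total finding count from the blind-spot count and its share."""
    return round(blind_spot_findings / blind_spot_share)
```

Routing pre-publish work to C3 costs nothing extra on the Claude side, because the added Gemini pass is billed to a separate quota.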
### 2026-05-31 | synergy, integration, playbook | opencode, hermes, openhuman, governance, marketing
**File:** knowledge/shared/harness-core/fh_synergy_playbook.md
Concrete workflow specifications for using FH with OpenCode, Hermes, and OpenHuman, grounded only in recorded experiments. Three patterns: (1) OpenCode: run fh-gate.sh after code generation → DONE→PENDING flip, 2 A-grade findings on arity.ts; (2) Hermes: skill audit before dispatch → 2 A-grade findings (pre-execution and credential gaps); (3) OpenHuman: Memory Tree staleness audit → GROUNDED/STALE/BROKEN verdict. Includes honest finding-rate estimates, the "no integration required" value proposition, and an explanation of the compounding effect.
- Decision: no unverified claims — every stated outcome traces to a specific experiment or structural guarantee
### 2026-05-31 | integration-contract, bridge-layer, governance-interface | opencode, hermes, openhuman, v2-paper
**File:** knowledge/shared/harness-core/fh_integration_contract.md
Formal v0.1 specification for how callers (OpenCode, Hermes, OpenHuman, CI) invoke FH governance gates and receive structured verdicts. Defines: input format (newline-separated files), FH_STATUS+verdict format, findings in YAML block, parse recipe. Includes caller-specific guidance + Stop hook pattern + record spec. Gemini adversarial review found 2 A-grade issues (space-separated paths, non-parseable multi-line fields) — both fixed in v0.1. Also includes `scripts/fh-gate.sh` prompt-generator wrapper.
- Decision: findings use YAML block (not flat key-value) to prevent delimiter ambiguity; FH_STATUS mandatory for fail-safe parsing
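
A minimal caller-side sketch of two contract rules: files are passed one path per line (so paths containing spaces survive), and a missing FH_STATUS is treated as a failure. The `FH_STATUS: <value>` line shape, the `FAIL` fallback, and both function names are assumptions for illustration; the authoritative format and parse recipe live in `fh_integration_contract.md`. Parsing the YAML findings block is omitted to keep the sketch stdlib-only.

```python
from typing import Iterable


def build_gate_input(paths: Iterable[str]) -> str:
    """Join file paths one per line; a path containing a newline cannot be encoded."""
    lines = []
    for path in paths:
        if "\n" in path:
            raise ValueError(f"path contains a newline: {path!r}")
        lines.append(path)
    return "\n".join(lines)


def parse_fh_status(output: str) -> str:
    """Return the FH_STATUS value, failing safe to 'FAIL' when it is absent or empty."""
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "FH_STATUS":
            return value.strip() or "FAIL"
    return "FAIL"
```

Defaulting to `FAIL` means a truncated or malformed response can never be misread as a pass.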
### 2026-05-31 | opencode, governance, usage-guide, synergy | pipeline-conductor, steel-quench, v2-paper
**File:** knowledge/shared/harness-core/fh_opencode_governance_wrapper.md
Step-by-step usage guide for FH + OpenCode governance integration. 3-step protocol (diff capture → steel-quench → pipeline-conductor). No API adapter required. Includes empirical baseline from arity.ts trial: CI=DONE, FH governance=PENDING, 2 A-grade security-adjacent findings caught. Stop hook automation pattern included.
- Decision: governance wrapper documented as protocol (not API) — FH reads files OpenCode writes
### 2026-05-31 | v2-paper, opencode, governance, controlled-experiment | steel-quench, pipeline-conductor, synergy
**File:** tracks/_meta/fh_opencode_governance_experiment_2026_05_31.md
FH governance (steel-quench + pipeline-conductor --quick) applied to OpenCode's AI-generated `arity.ts`. Baseline: CI green, DONE. FH governance: PENDING, with 2 A-grade findings (short-token overflow in the permission allowlist; npx/opencode missing from the arity table) and 1 B-grade finding. The delta is attributable to the methodology layer, not the model. v2 paper prototype: controlled-experiment evidence for the N-fold synergy claim.
- Decision: governance layer catches issues CI misses; arity.ts short-token overflow is permission-critical and untested
### 2026-05-30 | fh-meta | edit-manifest, predict-verify, validation-gate, skill-evolution, SkillOpt, AHE
**File:** plugins/fh-meta/skills/edit-manifest/SKILL.md
New skill: predict-verify loop for harness edits. Every SKILL.md/rules/CLAUDE.md edit records a falsifiable prediction; next session verifies against actual outcomes. Validation gate (SkillOpt pattern) accepts only edits with measurable improvement; rejected edits retained as negative-feedback buffer. Integrated as harvest-loop Step 0-c.
- Decision: based on AHE (arXiv:2604.25850) change manifest + SkillOpt (arXiv:2605.23904) selection-split gate
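
A minimal sketch of the validation gate, assuming each edit's outcome reduces to one scalar metric where higher is better. `Edit`, `ValidationGate`, and `min_gain` are hypothetical names, not edit-manifest's actual schema.

```python
from dataclasses import dataclass, field


@dataclass
class Edit:
    target: str        # e.g. a SKILL.md path
    prediction: str    # falsifiable prediction recorded at edit time
    baseline: float    # metric value before the edit
    observed: float    # metric value measured in the next session


@dataclass
class ValidationGate:
    accepted: list[Edit] = field(default_factory=list)
    negative_buffer: list[Edit] = field(default_factory=list)  # rejected edits kept as feedback

    def review(self, edit: Edit, min_gain: float = 0.0) -> bool:
        """Accept only edits whose metric improved by more than `min_gain`."""
        if edit.observed - edit.baseline > min_gain:
            self.accepted.append(edit)
            return True
        self.negative_buffer.append(edit)
        return False
```

Keeping rejected edits instead of discarding them is what lets later sessions learn from predictions that did not hold.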
### 2026-05-30 | fh-meta | memory-hygiene, stale-memory, staleness-detection, verification
**File:** plugins/fh-meta/skills/memory-hygiene/SKILL.md
New skill: detects "stale-but-confident" memory entries (facts verified once but silently drifted). Classifies by type (project 14d / reference 30d / feedback 90d / user 180d), re-verifies live via gh CLI or WebFetch (FH online advantage), proposes archival. Integrated as harvest-loop Step 0-c.
- Decision: based on Scaling the Harness (arXiv:2605.26112) §3.2 stale-but-confident failure mode
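
A minimal sketch of the per-type staleness check. The windows come from the entry above; treating an entry as stale only once it is strictly older than its window is an assumption, and `is_stale` is a hypothetical name.

```python
from datetime import date, timedelta

# Re-verification windows per memory-entry type, in days.
STALE_AFTER_DAYS = {"project": 14, "reference": 30, "feedback": 90, "user": 180}


def is_stale(entry_type: str, last_verified: date, today: date) -> bool:
    """True once an entry is strictly older than its type's window (boundary is assumed)."""
    window = timedelta(days=STALE_AFTER_DAYS[entry_type])
    return today - last_verified > window
```

The check only selects candidates; each flagged entry is still re-verified live before archival is proposed.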
### 2026-05-29 | harness-core | return-path-gate, skill-chain, conditional-pass, closed-loop
**File:** knowledge/shared/harness-core/return_path_gate.md
Pattern: downstream skill returns structured verdict (PASS/CONDITIONAL_PASS/FAIL/ESCALATE) back to caller, which gates next step on it. Verified in apex-review→sim-conductor and agent-composer↔deliberation.
- Decision: promoted to knowledge/shared/ — same pattern appeared independently in 2 skill pairs
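
A minimal sketch of a caller gating its next step on the returned verdict. The four verdicts come from the pattern; the action strings returned by the hypothetical `next_step` function are illustrative only.

```python
from enum import Enum


class Verdict(Enum):
    PASS = "PASS"
    CONDITIONAL_PASS = "CONDITIONAL_PASS"
    FAIL = "FAIL"
    ESCALATE = "ESCALATE"


def next_step(verdict: Verdict, conditions_resolved: bool = False) -> str:
    """Gate the caller's next step on the downstream skill's returned verdict."""
    if verdict is Verdict.PASS:
        return "proceed"
    if verdict is Verdict.CONDITIONAL_PASS:
        # Proceed only after the attached conditions have been addressed.
        return "proceed" if conditions_resolved else "resolve-conditions"
    if verdict is Verdict.FAIL:
        return "revise"
    return "escalate-to-human"  # ESCALATE
```

The point of the pattern is that the caller blocks on this value; a fire-and-forget call has no verdict to branch on.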
### 2026-05-29 | harness-core | meta-harness-engineering, definition, frontier, academic-convergence
**File:** knowledge/shared/harness-core/meta_harness_engineering_definition.md
Formal definition of meta harness engineering + FH positioning vs. academic convergence (arXiv 2605.18747 "Code as Agent Harness", arXiv 2604.14228 98.4% finding). Maps FH 6-axis to 3-layer taxonomy; distinguishes human-in-loop (FH) vs automation-first vs automation-maximalist approaches.
- Decision: FH differentiator = human judgment gate on all PRs, not automation maximization
### 2026-05-29 | fh-commons | token-budget-gate, token-estimation, cost-guard, multi-agent
**File:** plugins/fh-commons/skills/token-budget-gate/SKILL.md
New skill: pre-task token cost estimation with Green/Yellow/Orange/Red gate verdict. Post-task calibration loop improves future estimates. Auto-proposed before agent-composer, sim-conductor, steel-quench, harvest-loop.
- Decision: placed in fh-commons (project-agnostic, useful before any expensive multi-agent task)
- Thresholds: <10K green / 10-30K yellow / 30-60K orange / >60K red (user-configurable)
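
A minimal sketch of the default tier mapping. The published ranges share their endpoints (10-30K and 30-60K), so this sketch assumes exactly 30K counts as Yellow and exactly 60K as Orange. `budget_tier` is a hypothetical name, and its keyword arguments stand in for the user-configurable thresholds.

```python
def budget_tier(
    estimated_tokens: int,
    green_below: int = 10_000,
    yellow_max: int = 30_000,
    orange_max: int = 60_000,
) -> str:
    """Map a pre-task token estimate to a gate verdict using the default thresholds."""
    if estimated_tokens < green_below:
        return "GREEN"
    if estimated_tokens <= yellow_max:
        return "YELLOW"
    if estimated_tokens <= orange_max:
        return "ORANGE"
    return "RED"
```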
### 2026-05-29 | fh-commons | mcp-circuit-breaker, mcp-reliability, tool-failure, fallback
**File:** plugins/fh-commons/skills/mcp-circuit-breaker/SKILL.md
New skill: detects MCP tool failure patterns (3 consecutive fails = trip), blocks further calls, proposes 3-tier fallbacks, and resets via HALF-OPEN probe after cooldown.
- Decision: placed in fh-commons (project-agnostic MCP guard, useful in any Claude Code project)
- Circuit states: CLOSED → OPEN → HALF-OPEN → CLOSED
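
A minimal sketch of the CLOSED → OPEN → HALF-OPEN → CLOSED cycle with a 3-consecutive-failure trip. The 60-second default cooldown, the injectable clock, and the `CircuitBreaker` class itself are assumptions for illustration; the 3-tier fallback proposals are out of scope here.

```python
import time
from typing import Callable


class CircuitBreaker:
    """CLOSED -> OPEN after N consecutive failures; OPEN -> HALF-OPEN after a cooldown."""

    def __init__(
        self,
        failure_threshold: int = 3,
        cooldown_s: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.failure_threshold = failure_threshold
        self.cooldown_s = cooldown_s
        self.clock = clock
        self.state = "CLOSED"
        self.failures = 0
        self.opened_at = 0.0

    def allow_call(self) -> bool:
        """Block calls while OPEN; once the cooldown elapses, move to HALF-OPEN to probe."""
        if self.state == "OPEN":
            if self.clock() - self.opened_at < self.cooldown_s:
                return False
            self.state = "HALF-OPEN"
        return True

    def record_success(self) -> None:
        # A successful call (including the HALF-OPEN probe) closes the circuit.
        self.state = "CLOSED"
        self.failures = 0

    def record_failure(self) -> None:
        # A failed HALF-OPEN probe re-opens at once; otherwise count consecutive failures.
        self.failures += 1
        if self.state == "HALF-OPEN" or self.failures >= self.failure_threshold:
            self._trip()

    def _trip(self) -> None:
        self.state = "OPEN"
        self.opened_at = self.clock()
        self.failures = 0
```

Because `record_success` resets the counter, only an unbroken run of failures trips the breaker, which matches "3 consecutive fails".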
### 2026-05-29 | fh-meta | prompt-regression, regression-detection, harness-quality
**File:** plugins/fh-meta/skills/prompt-regression/SKILL.md
New skill: detects harness behavioral regressions after CLAUDE.md / rule / skill edits by running a standard probe suite and comparing the results against baselines.
- Decision: placed in fh-meta (harness-specific behavioral testing, not general-purpose)
- Chain: FAIL verdict → harness-doctor → verify-bidirectional
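
A minimal sketch of the baseline comparison, assuming each probe reduces to pass/fail. `find_regressions`, `suite_verdict`, and the probe IDs used in the test are hypothetical; only the FAIL chain comes from the entry above.

```python
# Skills invoked after a FAIL verdict, in order.
FAIL_CHAIN = ["harness-doctor", "verify-bidirectional"]


def find_regressions(baseline: dict[str, bool], current: dict[str, bool]) -> list[str]:
    """Probe IDs that passed at baseline but now fail or did not run."""
    return sorted(
        probe for probe, passed in baseline.items()
        if passed and not current.get(probe, False)
    )


def suite_verdict(
    baseline: dict[str, bool], current: dict[str, bool]
) -> tuple[str, list[str]]:
    """Return (verdict, follow-up skills); any regression makes the suite FAIL."""
    if find_regressions(baseline, current):
        return "FAIL", list(FAIL_CHAIN)
    return "PASS", []
```

A probe that was already failing at baseline is not counted as a regression, which keeps known gaps from masking new breakage.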
### 2026-05-28 | _meta | install-wizard, plugin-autoinstall, deprecated-cleanup, fh-ops, external-validation
**File:** tracks/_meta/session_2026_05_28_fh-external-ops.md
First external run of FH — install-wizard (score 57/100) revealed 2 friction points, both fixed immediately. PR #7 (plugin auto-install) + PR #8 (deprecated refs cleanup) merged.
- Decision: install-wizard FH plugin MISS → AI auto-runs via Bash (eliminates 3-turn manual flow) — "friction only visible when the builder actually uses it" pattern
- Decision: deprecated refs must be updated alongside CHANGELOG — separate cleanup PR required
### 2026-05-27 | _meta | two-layer-storage, memory-vs-tracks, cross-session-state
**File:** tracks/_meta/fh_signal_2026_05_27_session-starter.md
Two-layer storage principle formalized: `tracks/` = local work history (machine-bound), `memory/` = critical cross-session state (durable, survives re-clone). Gap: session starter files in tracks/ are lost on machine change.
- Decision: Critical cross-session state must also be written to `~/.claude/projects/.../memory/` — tracks/ alone is insufficient for durability
### 2026-05-26 | _audit | sister-asset, harness-evolver, meta-harness, stanford, arxiv
**File:** tracks/_audit/session_2026_05_26_harness_evolver.md
Sister asset cross-audit — harness-evolver (raphaelchristi) + Meta-Harness (Lee et al., arXiv:2603.28052, Stanford IRIS Lab). Independent architectural convergence confirmed: outer-loop field observation → adversarial critique → synthesis → integration → verification. Cross-reference links proposed; Issue #26 filed.
- Decision: harness-evolver = direct complement (automation-first) vs FH (knowledge-accumulation-first) — mutual citation proposed
### 2026-05-26 | _audit | sister-asset, sylph-ai, arxiv, simplification-principle
**File:** tracks/_audit/session_2026_05_26_sylph_sister.md
Sister asset cross-audit — "The Last Harness You'll Ever Build" (Sylph.AI, arXiv:2604.21003). Title resonates with FH simplification principle but approach diverges: Sylph = fully automated adversarial agent loops, FH = human-in-the-loop curated knowledge evolution.
- Decision: Sylph.AI = automation-maximalist counterpart; FH distinction is human judgment layer
### 2026-05-26 | _meta | external-network, wave3-validation, sase, install-wizard, frontier-digest
**File:** tracks/_meta/external_network_verification_2026_05_26.md
Wave 3 meta-validation: confirmed all previously restricted-network-blocked capabilities work on a standard network — install-wizard dry-run, plugin-recommender live GitHub search, frontier-digest fetch, git push. Baseline for external user environment parity.
- Decision: All core FH capabilities verified functional on a standard (unrestricted) network
### 2026-05-26 | forge-harness | v1.2, public-release, harness-evolver-absorb
**File:** tracks/_meta/reference_next_session_starter.md
v1.2 release complete (PR #1–#5): harvest-loop Step 0, agent-composer worktree isolation, steel-quench numeric scoring, README positioning. Repo made public.
- Decision: absorbed harness-evolver's three key innovations: regression guard, worktree isolation, numeric scoring
---
## Reference Documents
<!-- Time-independent reference documents -->
### 2026-04-28 | template | maturity-roadmap, 3-phase-frame, frontier-tracking, simplification-gate
**File:** `knowledge/shared/harness-core/hub_maturity_roadmap.md`
Frame for the hub's long-term evolution path: a 3-stage model, Phase I (entering maturity) → Phase II (following the frontier at cadence (b)) → Phase III (leading), plus a 5-criteria gate (audit automation · operations guide · external propagation · sub-agent judgment · self-diagnosis warning), 6 indicators (seed repo · blog · citations · external adoption · self-evolving · industry original), and a simplification gate applied at each transition (self-diagnosis + staying within 200 lines + archiving unreferenced material). General-purpose template derived from the first verified operating instance.
- Decision: cadence (b), quarterly + monthly, is recommended; adopting a different option requires proof that the simplification gate was passed
- Decision: Phase III has no completion point (it is an ongoing state); it is maintained as long as 3 or more of the 6 indicators keep rising
- Decision: phase regression is allowed; progression need not be linear (a partial Phase I redo is allowed if the frontier-following routine is missed)
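
The Phase III maintenance condition can be expressed as a check over the six indicators' recorded values. This is a sketch under assumptions: "continuously rising" is read as strictly increasing across every recorded checkpoint, and `is_rising` and `phase_iii_maintained` are hypothetical names.

```python
def is_rising(series: list[float]) -> bool:
    """Strictly increasing across every recorded checkpoint (needs at least two points)."""
    return len(series) >= 2 and all(b > a for a, b in zip(series, series[1:]))


def phase_iii_maintained(indicators: dict[str, list[float]], required: int = 3) -> bool:
    """Phase III holds while at least `required` indicators keep rising."""
    return sum(is_rising(series) for series in indicators.values()) >= required
```

Since the check is re-evaluated at every checkpoint, a dip below the threshold simply signals a phase regression rather than an error.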
---
## Plugins
### 2026-05-08 | fh-meta v0.5.0 | six-skills-operation, path-b-generalization, command-tower-gate, mode-c-user, beta-release
**File:** plugins/fh-meta/.claude-plugin/plugin.json + .claude-plugin/marketplace.json
Hub meta-operations tool bundle running 6 skills: harvest-loop, verify-bidirectional, and frontier-digest (each generalized via path B), cross-ecosystem-synergy-detection, plugin-recommender, and the new **hub-cc-pr-reviewer**, which automates the Command Tower Gate operations rule. Also ships 2 agents (hub-persona-auditor + fact-checker). Released as beta, following the harness core principle *"beta + public release = practical capability obligation"*.
- Decision: created the hub-cc-pr-reviewer skill to automate the Command Tower Gate operations rule, after 4 accumulated PR-lifecycle runs plus an explicit decision trigger
- Decision: promoted the plugin from v0.4.3 to v0.5.0 on the 6-skill operation baseline and the path B generalization baseline
- Decision: applied path B generalization to 3 skills (harvest-loop, verify-bidirectional, frontier-digest), strengthening adaptation to external user environments
- Note: audit-learnings deprecated from plugin (2026-05-xx) → transferred to hub-internal deprecated/; replaced by harvest-loop
- Note: frontier-status-summary deprecated (2026-05-xx) → replaced by frontier-digest
---
## Skills
### 2026-05-08 | fh-meta | harvest-loop, weekly-audit, self-evolution-pipeline, session-harvest, phase-2-plus
**File:** plugins/fh-meta/skills/harvest-loop/SKILL.md
Self-evolution pipeline — field-harvest → contention-layer → devil/innovator parallel → synthesizer → Critic Agent → harness-doctor → verify-bidirectional → curator (8 steps). Lightweight mode for weekly audit. Replaces deprecated audit-learnings.
- Decision: harvest-loop succeeds audit-learnings and integrates the full self-evolution pipeline
### 2026-05-08 | fh-meta | verify-bidirectional, layer-5-cross-verification, conscious-self-activation, diff-gate
**File:** plugins/fh-meta/skills/verify-bidirectional/SKILL.md
Automates the bidirectional self-verification pattern: when a user raises a precise counter-argument after the AI has persisted a recommendation or agreement, a 6-step process runs through the baseline-update channel.
- Decision: officially released in v0.5, with accumulated runs and mode C correction catches fully persisted
### 2026-05-08 | fh-meta | hub-cc-pr-reviewer, command-tower-gate-automation, baseline-coherence-check, layer-5-self-catch
**File:** plugins/fh-meta/skills/hub-cc-pr-reviewer/SKILL.md
Automates the Command Tower Gate operations rule: given a PR, it auto-generates the 8-matrix baseline coherence check, the Layer 5 self-catch matrix, a review comment attachment, and an admin-override merge recommendation.
- Decision: created as v0.1 once 4 PR-lifecycle runs had accumulated and the explicit decision trigger was met
### 2026-05-20 | fh-meta | context-bridge-dispatch, agent-view-context-blindness, parallel-dispatch, session-context-card
**File:** plugins/fh-meta/skills/context-bridge-dispatch/SKILL.md
Automates context card injection before parallel agent dispatch. Sub-agents can read files but cannot access the main session's live context, so before any dispatch of 2+ parallel agents a Context Card (purpose · completed · this agent's task · caution) is generated and injected into each prompt. Agents that only do simple file lookups can skip the card.
- Decision: created as v0.1 after the agent-view blind spot was captured in the field
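
A minimal sketch of card injection. The four card fields mirror the list above; `ContextCard`, `inject_cards`, and the rendered Markdown layout are hypothetical, and a `None` card slot stands for a simple file-lookup agent that skips the card.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ContextCard:
    purpose: str
    completed: str
    task: str
    caution: str

    def render(self) -> str:
        return (
            "## Context Card\n"
            f"- Purpose: {self.purpose}\n"
            f"- Completed: {self.completed}\n"
            f"- This agent's task: {self.task}\n"
            f"- Caution: {self.caution}\n"
        )


def inject_cards(prompts: list[str], cards: list[Optional[ContextCard]]) -> list[str]:
    """Prepend each agent's card when 2+ agents are dispatched in parallel."""
    if len(prompts) != len(cards):
        raise ValueError("need exactly one card slot per prompt")
    if len(prompts) < 2:
        return list(prompts)  # single dispatch: no card injection
    return [
        prompt if card is None else f"{card.render()}\n{prompt}"
        for prompt, card in zip(prompts, cards)
    ]
```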
---
## Agents
### 2026-05-08 | fh-meta | hub-persona-auditor, persona-simulation, three-tier-revision, external-asset-pre-publication
**File:** plugins/fh-meta/agents/hub-persona-auditor.md
Pre-publication persona audit for external-facing assets (briefings, cards, public guides): simulates 3+ virtual reader personas, reviews along 4 axes (resonance, confusion, resistance, supplement), and proposes revisions in 3 tiers (mandatory, strong, recommended).
- Decision: runs as a separate fh-meta agent rather than a skill, activated on the external-facing publication cadence
### 2026-05-08 | fh-meta | fact-checker, hub-asset-grep-verification, duplicate-detection, stale-fact-detection
**File:** plugins/fh-meta/agents/fact-checker.md
Grep-based verification of hub assets, used (1) to check for duplicates before recommending a new asset, skill, or agent, (2) to detect stale data in memory/docs, and (3) whenever duplicate work is suspected.
- Decision: hub self-review circuit baseline established
---
## Learnings
<!-- Accumulated feedback/lessons -->