@chrono-meta/fh-gate 1.0.2 → 1.1.0

package/README.md CHANGED
@@ -1,18 +1,22 @@
1
1
  <p align="center">
2
- <img src="docs/banner.png" alt="forge-harness — A forkable Claude Code meta-harness for multi-project teams" width="640">
2
+ <img src="docs/banner.png" alt="forge-harness — A forkable Claude Code meta-harness for multi-project teams" width="680">
3
3
  </p>
4
4
 
5
5
  <p align="center">
6
6
  <a href="LICENSE"><img src="https://img.shields.io/badge/license-MIT-22c55e.svg" alt="MIT License"></a>
7
7
  <img src="https://img.shields.io/badge/version-v1.3-3b82f6.svg" alt="v1.3">
8
- <a href="https://zenodo.org/records/20397566"><img src="https://zenodo.org/badge/DOI/10.5281/zenodo.20397566.svg" alt="DOI 10.5281/zenodo.20397566"></a>
9
- <img src="https://img.shields.io/badge/Claude_Code-compatible-a855f7.svg" alt="Claude Code compatible">
10
- <img src="https://img.shields.io/badge/Codex-beta-f59e0b.svg" alt="Codex-compatible beta">
8
+ <a href="https://zenodo.org/records/20397566"><img src="https://img.shields.io/badge/DOI-10.5281%2Fzenodo.20397566-blue.svg" alt="DOI"></a>
9
+ <img src="https://img.shields.io/badge/Claude_Code-compatible-a855f7.svg" alt="Claude Code">
10
+ <a href="https://www.npmjs.com/package/@chrono-meta/fh-gate"><img src="https://img.shields.io/npm/v/@chrono-meta/fh-gate.svg?color=cb3837" alt="npm"></a>
11
11
  </p>
12
12
 
13
13
  <p align="center">
14
14
  <b>Fork it. Rename it. Make it yours.</b><br>
15
- A persistent knowledge hub that connects all your Claude Code projects: shared skills, accumulated context, and a compounding improvement loop.
15
+ A persistent knowledge hub that connects all your Claude Code projects —<br>shared skills, accumulated context, and a compounding improvement loop.
16
+ </p>
17
+
18
+ <p align="center">
19
+ <img src="docs/pillars.svg" alt="FORK · ADAPT · COLLABORATE · EMPOWER" width="680">
16
20
  </p>
17
21
 
18
22
  ---
@@ -20,500 +24,156 @@
20
24
  | If you're here because… | forge-harness solves it |
21
25
  |---|---|
22
26
  | Context disappears when a session ends | Persistent `tracks/` — resumable from anywhere |
23
- | You repeat the same setup for every project | Connect once to the hub, share across all projects |
24
- | Your team's AI know-how lives only in people's heads | Codify it so everyone shares it |
27
+ | You repeat the same setup across projects | Connect once to the hub, share across all projects |
28
+ | Team AI know-how lives only in people's heads | Codify it so everyone shares it |
25
29
  | You want AI to get *better* as work accumulates | Skills and patterns compound session over session |
26
- | You're evaluating AI-generated code with no governance layer | `fh-gate` wraps any coding agent as a post-generation gate |
27
-
28
- > **Worried about token costs?** New install footprint ≈ 14.5% of 200K context. `context-doctor` diagnoses and reduces this further. → [Token optimization](#token-cost-optimization)
29
-
30
- | Where you are now | Jump to |
31
- |---|---|
32
- | Starting from scratch | [Get started in 2 minutes](#get-started-in-2-minutes) |
33
- | Already using it, want more | [33 asset activation check](#already-using-it----33-asset-activation-check) |
34
- | Wrapping an external coding agent | [Governance layer](#governance-layer-for-ai-generated-code) |
35
- | Want to spread it to your team | [Operating model Phase 3](#operating-model----3-phase-essence) |
36
-
37
- ---
30
+ | You need a governance layer for AI-generated code | `fh-gate` wraps any coding agent as a post-generation gate |
38
31
 
39
32
  > **This document is for humans.** AI operating rules → `CLAUDE.md` · Command reference → `CHEATSHEET.md`
40
33
 
41
34
  ---
42
35
 
43
- ## What is this?
44
-
45
- An **acceleration hub** for teams already using Claude Code. Connect N projects to one hub — work, learnings, and patterns from each project **mutually reinforce** each other. Build skills and agents once in the hub; share them across every project.
46
-
47
- > **Goal**: Get into orbit in the age of AI acceleration without burning out. Minimize setup friction, optimize context, distribute expertise by task complexity, and raise the success rate of every session.
48
-
49
- forge-harness is structured as two distinct layers:
50
-
51
- | Layer | Contents | AI compatibility |
52
- |---|---|---|
53
- | **Methodology layer** (model-agnostic) | `tracks/`, `knowledge/`, `SKILL.md` documents, session protocols | Any AI model |
54
- | **Automation layer** (Claude-native) | `.claude/agents/`, hooks, slash commands, `CLAUDE.md` rules | Claude Code only |
55
-
56
- The **methodology layer** is the portable core — connecting projects to a persistent hub, accumulating learnings in `tracks/`, curating cross-project knowledge in `knowledge/shared/`. Works regardless of which AI you use.
57
-
58
- The **automation layer** is what makes the methodology frictionless when running Claude Code: agents dispatch automatically, hooks fire at session boundaries, and slash commands invoke skills without manual prompting.
59
-
60
- > **Codex-compatible beta**: Gemini, Codex, and other AI users can apply the methodology layer manually. Automation layer features require Claude Code as host.
61
-
62
- ---
63
-
64
- ## Finding your entry path
65
-
66
- Teams using AI collaboration tools systematically are already doing **harness engineering**: QA protocols, verification pipelines, structures that make AI behave more consistently. forge-harness is the **OS one layer above** — a system that measures, improves, and evolves the harness itself across multiple projects.
67
-
68
- | Layer | What it does | Examples |
69
- |---|---|---|
70
- | Harness engineering | Per-project rules, gates, context management | QA protocols, CLAUDE.md rulesets, TC verification pipelines |
71
- | **Meta harness engineering** | Cross-project system to measure, improve, evolve harnesses | FH skill bus, harness-doctor, steel-quench, field-harvest |
72
-
73
- > **FH v1.0 paper** — published 2026-05-30 on [Zenodo](https://zenodo.org/records/20397566) (DOI: 10.5281/zenodo.20397566) · arXiv submission in review. Documents the 2-layer design, 6-axis framework, 4-agent orchestration, and compounding loop with controlled empirical evidence.
74
-
75
- > **External validation (2026)** — three independent research findings converge:
76
- > - VILA-Lab analysis of Claude Code v2.1.88 (512K lines): [98.4% is harness infrastructure, 1.6% AI logic](https://arxiv.org/abs/2604.14228)
77
- > - "[Code as Agent Harness](https://arxiv.org/abs/2605.18747)" (arXiv, May 2026, 43 authors)
78
- > - Stanford IRIS Lab: "[Meta-Harness](https://arxiv.org/abs/2603.28052)" (Lee et al., Mar 2026) — outer-loop harness optimization; +7.7pts at 4× fewer tokens
79
-
80
- #### FH vs. automated harness tools
81
-
82
- The Stanford paper inspired [harness-evolver](https://github.com/raphaelchristi/harness-evolver) — a fully automated 7-stage CODE optimizer. FH independently converged on the same loop architecture from the opposite direction:
83
-
84
- | Axis | harness-evolver | forge-harness |
85
- |---|---|---|
86
- | **Optimization target** | Harness code (prompts, routing) | Harness knowledge (context, patterns, expertise) |
87
- | **Evolution** | Auto-merge winners to git | Human-approved at every stage |
88
- | **Infrastructure** | LangSmith + Python 3.10+ | CLAUDE.md + skills only, zero extra |
89
- | **Scope** | Single-harness optimization | Multi-project federation, shared skill bus |
90
- | **Knowledge layer** | No persistent curation | `tracks/` + `knowledge/` grow over time |
91
-
92
- They're complementary — FH's approval gates and knowledge layer fill exactly the gaps automated CODE search leaves open.
93
-
94
- Count how many apply to you:
95
-
96
- - [ ] You have 2 or more Claude Code projects
97
- - [ ] You lose context when a session ends
98
- - [ ] You repeat the same patterns and rules across multiple projects
99
- - [ ] You want to spread AI methodology to your team
100
- - [ ] You want AI to improve as work accumulates
101
-
102
- | Count | Recommended path |
103
- |:---:|---|
104
- | **3+** | Standard entry → [Get started in 2 minutes](#get-started-in-2-minutes) |
105
- | **1–2** | Plugin first → `claude plugin install -s user fh-meta@forge-harness` |
106
- | **0** | Single-project stage — check back when you reach 2+ projects. `context-doctor` is available standalone now |
107
-
108
- ---
109
-
110
36
  ## Get started in 2 minutes
111
37
 
112
- > **Prerequisite**: Claude Code CLI installed. Verify: `claude --version`
113
-
114
- ### Step 0. Register the plugin
38
+ **Prerequisite**: Claude Code CLI (verify with `claude --version`)
115
39
 
116
40
  ```bash
41
+ # 1. Install the plugin
117
42
  claude plugin marketplace add https://github.com/chrono-meta/forge-harness.git
118
43
  claude plugin install -s user fh-meta@forge-harness
119
- ```
120
-
121
- > If Step 0 fails: run `claude plugin update fh-meta@forge-harness`, or check that your network can reach `github.com`.
122
-
123
- Verify: type `/skills` in the CC chat → if `install-wizard` appears, you're done.
124
44
 
125
- ### Step 1. Clone the hub
126
-
127
- ```bash
45
+ # 2. Clone the hub
128
46
  git clone https://github.com/chrono-meta/forge-harness.git ~/forge-harness
129
47
  cd ~/forge-harness
130
- ```
131
-
132
- > **Standard path**: Fork on GitHub first → clone your fork → accumulate `tracks/` and `knowledge/` there → periodically pull upstream updates from forge-harness. Rename it to make it yours.
133
-
134
- ### Step 2. Say something
135
48
 
136
- ```bash
49
+ # 3. Start a session
137
50
  claude
138
51
  ```
139
52
 
140
- > ✅ **Expected**: Claude reads `CLAUDE.md`, asks what project to connect or what task to start.
141
- >
142
- > ❌ **Generic response?** → Run `pwd` to confirm you're in the `forge-harness` root. If not: `cd ~/forge-harness && claude`
143
-
144
- From here:
145
- - **"Connect a project"** → hub scans `../`, lists projects with `.git`, creates `tracks/{project}/` on confirmation
146
- - **"My projects are in `~/work/`"** → specify a different root
147
-
148
- ---
149
-
150
- ## Governance layer for AI-generated code
151
-
152
- FH wraps any coding agent (OpenCode, Codex, etc.) as a **post-generation governance layer** — no runtime adapter needed. FH reads files the agent writes; the protocol is the interface.
53
+ > ✅ Claude reads `CLAUDE.md` and asks what project to connect or what task to start.
54
+ > Say **"Connect a project"** → hub scans `../`, finds `.git` directories, creates `tracks/{project}/`.
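
A minimal sketch of what the "Connect a project" scan amounts to, assuming a project is any directory under the scan root that contains a `.git` entry. In forge-harness this step is driven by Claude in conversation; `connect_projects` is a hypothetical helper written for illustration, not a script shipped with the package.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of forge-harness): mirrors the described scan.
# Usage: connect_projects <scan-root> <hub-root>
connect_projects() {
  local scan_root="$1" hub_root="$2" hub_real dir name
  hub_real="$(cd "$hub_root" && pwd -P)" || return 1
  for dir in "$scan_root"/*/; do
    # An unmatched glob stays literal, so skip anything that is not a directory
    [ -d "$dir" ] || continue
    # A project is any directory with a .git entry (a file in worktrees)
    [ -e "${dir}.git" ] || continue
    # Skip the hub itself so it does not create a track for its own repo
    if [ "$(cd "$dir" && pwd -P)" = "$hub_real" ]; then
      continue
    fi
    name="$(basename "$dir")"
    mkdir -p "$hub_root/tracks/$name"
    printf 'connected: %s\n' "$name"
  done
}
```

Run from the hub root, `connect_projects .. .` would cover the default `../` scan. Unlike the hub, this sketch does not ask for confirmation before creating directories.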
153
55
 
56
+ **Plugin only (no clone):**
154
57
  ```bash
155
- # After a coding agent completes a task:
156
- ./scripts/fh-gate.sh # auto-detects changed files from git diff
157
- # → steel-quench adversarial pass # behavioral edges, untested contracts, security
158
- # → pipeline-conductor --quick # 4-axis: regression / adversarial / grounding / record
159
- # → FH_GATE_VERDICT # PASS | PENDING | BLOCKED | ESCALATE
58
+ claude plugin install -s user fh-meta@forge-harness
59
+ cd ~/projects/{your-project} && claude
160
60
  ```
161
61
 
162
- **Empirical result (2026-05-31)**: Applied to OpenCode's own AI-generated `permission/arity.ts` (163 lines, 6 tests passing, CI green). Governance verdict: PENDING — 2 A-grade findings CI didn't cover (short-token overflow in allowlist, executor tools absent from arity table). Delta attributable to methodology layer, not the model.
163
-
164
- Full spec: `knowledge/shared/harness-core/fh_integration_contract.md` · Usage: `knowledge/shared/harness-core/fh_opencode_governance_wrapper.md`
165
-
166
- > **One-line install (coming soon)**: `npx @chrono-meta/fh-gate "src/foo.ts" quick ci` — npm publish in progress.
167
-
168
62
  ---
169
63
 
170
- ## Real-world case — AI TC generation prompt hardening
171
-
172
- > **Context**: An AI-powered test case generation tool was merging TC outputs without quality validation. Prompts contained cushion language, phantom claims, and no priority guardrails.
64
+ ## What it is
173
65
 
174
- **Applied**: `steel-quench` (W1–W8 adversarial hardening) + `source-grounding-audit` (phantom claim detection)
66
+ forge-harness is structured as **two distinct layers**:
175
67
 
176
- | Wave | What was attacked | Result |
68
+ | Layer | Contents | AI compatibility |
177
69
  |---|---|---|
178
- | W1–W2 | Cushion language ("it would be good to…") → forced conditions | Ambiguity eliminated |
179
- | W4–W5 | No self-check step → Self-Check quality gate added | Quality bypass path closed |
180
- | W6 | Soft review → Hard gate ("no next step until fix complete") | Incomplete TC merge blocked |
181
- | W7 | P0 ratio inflation → forced re-review above 30% | Priority inflation prevented |
182
- | W8 | Phantom Claim Guard — unspecified values/button names banned | Fabricated expected results blocked |
183
-
184
- **Outcome**: 4 bugs found and fixed · 8-layer quality gate complete · output noise eliminated
185
-
186
- > The self-healing loop: steel-quench attacks the prompt → execution catches bugs the review missed → fixes are verified in the same pass.
187
-
188
- ---
70
+ | **Methodology layer** | `tracks/`, `knowledge/`, `SKILL.md` docs, session protocols | Any AI model |
71
+ | **Automation layer** | `.claude/agents/`, hooks, slash commands, `CLAUDE.md` rules | Claude Code only |
189
72
 
190
- ## Already using it: 33 asset activation check
191
-
192
- <details>
193
- <summary>Expand full asset table (33 skills + 5 agents)</summary>
194
-
195
- Check which of the following are **regularly activating** for you:
196
-
197
- | Asset | Role | Natural language triggers | Active |
198
- |---|---|---|:---:|
199
- | `agent-composer` | Plans optimal agent dispatch | "How should I split this across agents?", "Run in parallel" | □ |
200
- | `apex-review` | Final quality review from executive perspective | "Will this hold up with decision-makers?" | □ |
201
- | `verify-bidirectional` | Reverse-verify decisions | "Is that right?", "Double-check this" | □ |
202
- | `deliberation` *(fh-commons)* | Structured multi-angle argument | "Battle it out", "Review this from multiple angles" | □ |
203
- | `cross-ecosystem-synergy-detection` | Detect cross-tool synergies | "Are my installed tools working together?" | □ |
204
- | `plugin-recommender` | Plugin recommendations | "Is there a good tool for this?" | □ |
205
- | `hub-cc-pr-reviewer` | Automated PR review | "Review this PR", "Is it okay to merge?" | □ |
206
- | `context-doctor` | Token efficiency + `.claudeignore` | "Session is slow", "Clean up context" | □ |
207
- | `sim-conductor` | Meta-simulation orchestrator | "External user perspective", "Internal audit" | □ |
208
- | `steel-quench` | Full-spectrum adversarial verification — attacks output patterns (self-declarations, cushion language, structural flaws) | "Run the quench", "Attack from the root" | □ |
209
- | `source-grounding-audit` | Source back-tracing — detects Phantom Claims (no source found). Attacks input tracing (where did this come from?) | "Verify the source", "Grounding audit" | □ |
210
- | `harness-doctor` | Harness structure diagnosis | "Something seems wrong with my Claude setup" | □ |
211
- | `deep-clarify` | Socratic requirements clarification | "I'm not sure what I need to build", "Clarify this" | □ |
212
- | `meta-prompt-builder` | Meta prompt design | "Write a prompt for each Wave", "What should I tell the agent?" | □ |
213
- | `install-doctor` | Diagnose conflicts before/after plugin install | "Is it okay to add this plugin?" | □ |
214
- | `install-wizard` | Initial environment diagnosis + onboarding | "First-time setup", "Just installed this" | □ |
215
- | `asset-placement-gate` | New asset belongs in FH or project? | "Should this be shared?", "Hub vs project" | □ |
216
- | `marketplace-gate` | 5-point fitness gate before listing | "Is it okay to list this?" | □ |
217
- | `field-harvest` | Back-propagate field patterns to hub | "I could reuse this in other projects" | □ |
218
- | `hub-persona-auditor` | Pre-publish 4-axis audit | "How will this look to others?" | □ |
219
- | `fact-checker` | Asset deduplication check | "Isn't there something similar already?" | □ |
220
- | `persona-innovator` | Naming gap detection + ideation | "What would be a good name for this?" | □ |
221
- | `contention-layer` | Treat skill conflicts as harvest signals | "These two skills conflict" | □ |
222
- | `context-bridge-dispatch` | Inject session context cards before parallel dispatch | "Brief the agents first", "Parallel dispatch" | □ |
223
- | `frontier-digest` | Frontier signals (HN, arXiv) → actionable insights | "AI trend digest", "What's new this week" | □ |
224
- | `harvest-loop` | End-of-session learning → evolution pipeline | "Harvest the session", "Run the pipeline" | □ |
225
- | `self-marketing-lint` | Remove self-marketing language from skill descriptions | "Description diet", "Strip the marketing tone" | □ |
226
- | `pipeline-conductor` | 4-axis quality gate (backward/adversarial/forward/record) | "Run the quality gate", "4-axis check" | □ |
227
- | `goal-quench` | `/goal` wrapper with token budget gate + pipeline-conductor verification | "Safe goal run", "Goal with budget control" | □ |
228
- | `edit-manifest` | Predict-verify loop for harness edits | "Log this edit", "Predict what this changes" | □ |
229
- | `memory-hygiene` | Detect stale memory entries + re-verify live | "Check stale memory", "Memory drift" | □ |
230
- | `prompt-regression` | Detect behavioral regressions after rule edits | "Did my rule change break anything?" | □ |
231
- | `convergence-loop` *(fh-commons)* | N-round convergence loops — only "truly passed" after convergence | "Suspicious of single-pass", "Convergence loop" | □ |
232
- | `token-budget-gate` *(fh-commons)* | Pre-task token cost estimate (GREEN/YELLOW/ORANGE/RED) | "How expensive is this?", "Token budget estimate" | □ |
233
- | `mcp-circuit-breaker` *(fh-commons)* | Detects MCP tool failure patterns, blocks further calls | "MCP keeps failing", "Tool error loop" | □ |
234
- | `quench-challenger` *(fh-commons)* | Pressure-tests near-final artifacts from adversarial angles | "Challenge this with a devil", "Quench challenger" | □ |
235
-
236
- | Count | Diagnosis |
237
- |:---:|---|
238
- | **28–36** | Advanced — focus on `agent-composer` + `sim-conductor` + `steel-quench` + `pipeline-conductor` chained |
239
- | **10–27** | Activation stage — gradually activate unchecked assets |
240
- | **0–9** | Early stage — go back to self-diagnosis above |
241
-
242
- </details>
243
-
244
- ---
245
-
246
- ## How it works
73
+ The methodology layer is the portable core: a persistent hub that accumulates learnings and curates cross-project knowledge. The automation layer makes it frictionless when running Claude Code.
247
74
 
248
75
  ```
249
- forge-harness (the brain: persistent hub)
250
- ├── knowledge/ → referenced from all projects
251
- └── tracks/ → work records per project
252
-
253
- Project A (the execution site)
254
- → connect hub in CLAUDE.md → auto-referenced
76
+ forge-harness/ ← the hub (persistent brain)
77
+ ├── knowledge/ → shared across all projects
78
+ └── tracks/ → work records per project
255
79
 
256
- Project B (the execution site)
257
- → connect hub in CLAUDE.md → auto-referenced
80
+ Project A ──→ connect hub in CLAUDE.md
81
+ Project B ──→ connect hub in CLAUDE.md
258
82
  ```
259
83
 
260
- - **From the hub**: invoke Claude Code → cross-project judgment with integrated context
261
- - **From each project**: project-specific work + hub reference
262
- - **"Hello"** → Claude automatically pulls recent context and today's tasks from the hub *(when running `claude` from the FH cwd)*
263
-
264
- ```
265
- Search: CATALOG.md (tags + summary) → open that file directly
266
- Store: End of session → save to tracks/{project}/ → update CATALOG.md
267
- Return: New pattern found → save to tracks/{project}/learnings/
268
- Share: Common to 2+ projects → write to knowledge/shared/
269
- ```
270
-
271
- ---
272
-
273
- ## Core usage
274
-
275
- | What you want | What to say |
276
- |---|---|
277
- | Start a session | "Hello" → reads hub, guides today's tasks |
278
- | Save session | "Sync this session to forge-harness" |
279
- | Search past work | "What did I do around April 13th?" |
280
- | Connect a new project | "Connect a project" |
281
- | Run adversarial review | "Run the quench on this" |
282
- | Run end-of-session harvest | "Harvest the session" |
283
-
284
84
  ---
285
85
 
286
- ## Agent dispatch
86
+ ## Governance layer for AI-generated code
287
87
 
288
- forge-harness includes specialized agents and `agent-composer` to plan their optimal combination.
88
+ FH wraps any coding agent (OpenCode, Codex, etc.) as a **post-generation governance gate**.
289
89
 
290
- ```
291
- /agent-composer
90
+ ```bash
91
+ npx @chrono-meta/fh-gate # auto-detects changed files
92
+ npx @chrono-meta/fh-gate "src/foo.ts" full # explicit file + full pass
93
+ # → FH_GATE_VERDICT: PASS | PENDING | BLOCKED | ESCALATE
292
94
  ```
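
One way a CI job could act on the verdict, sketched under two assumptions the README does not confirm: that the gate prints a line of the form `FH_GATE_VERDICT: <value>` to stdout, and that only `PASS` should let the pipeline continue. Both function names and the exit-code policy are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical CI wrapper: the output format and exit policy are assumptions.

# Read captured gate output on stdin and print the last verdict word found.
extract_verdict() {
  grep -oE 'FH_GATE_VERDICT:[[:space:]]*(PASS|PENDING|BLOCKED|ESCALATE)' \
    | tail -n 1 \
    | sed 's/.*:[[:space:]]*//'
}

# Map a verdict to an exit code; anything other than PASS fails the job.
verdict_exit_code() {
  case "$1" in
    PASS)             echo 0 ;;
    PENDING)          echo 1 ;;  # findings still need human review
    BLOCKED|ESCALATE) echo 2 ;;
    *)                echo 3 ;;  # missing or unrecognized verdict
  esac
}
```

A pipeline step might then run `verdict="$(npx @chrono-meta/fh-gate | extract_verdict)"` followed by `exit "$(verdict_exit_code "$verdict")"`. Treating `PENDING` as a failure is a conservative choice; a team could just as well map it to a warning.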
293
95
 
294
- Analyzes the current task and proposes which agents to dispatch in what order.
96
+ **Empirical result (2026-05-31)**: Applied to OpenCode's AI-generated `permission/arity.ts` (163 lines, CI green). Verdict: PENDING — 2 A-grade findings CI didn't catch (short-token overflow in allowlist, executor tools absent from arity table).
295
97
 
296
- ### FH agents
98
+ Full spec: [`fh_integration_contract.md`](knowledge/shared/harness-core/fh_integration_contract.md)
297
99
 
298
- | Agent | Role | Tool restrictions |
299
- |---|---|---|
300
- | `plan` | Read-only design agent — analyzes files, maps impact, plans before implementation | Read·Bash·Glob·Grep only |
301
- | `fact-checker` | Asset deduplication and staleness check | Read·Grep·Glob |
302
- | `hub-persona-auditor` | 3+ persona audit of externally published assets | Read·Grep·Glob |
303
- | `persona-innovator` | Naming exploration + frame proposals | Read·Grep·Glob·WebSearch·WebFetch |
304
- | `quench-challenger` | Steel-quench adversary — pressure-tests near-final artifacts | Read·Grep·Glob |
100
+ ---
305
101
 
306
- ### Parallel dispatch
102
+ ## 36 skills, 5 agents
307
103
 
308
- Request two agents in a single message to run in parallel:
104
+ <details>
105
+ <summary>Full asset activation check</summary>
309
106
 
310
- ```
311
- "Run fact-checker and persona-innovator in parallel.
312
- First: check [asset path] for duplicates
313
- Second: scan current harness for naming gaps"
314
- ```
107
+ | Asset | Role | Triggers |
108
+ |---|---|---|
109
+ | `steel-quench` | Full-spectrum adversarial verification | "Run the quench", "Attack from the root" |
110
+ | `source-grounding-audit` | Phantom claim detection + source back-tracing | "Verify the source", "Grounding audit" |
111
+ | `harvest-loop` | End-of-session learning → evolution pipeline | "Harvest the session" |
112
+ | `agent-composer` | Plans optimal agent dispatch | "Run in parallel", "Which agents?" |
113
+ | `sim-conductor` | Meta-simulation orchestrator | "External user perspective" |
114
+ | `context-doctor` | Token efficiency + `.claudeignore` | "Session is slow", "Clean up context" |
115
+ | `harness-doctor` | Harness structure diagnosis | "Check my Claude setup" |
116
+ | `pipeline-conductor` | 4-axis quality gate (backward/adversarial/forward/record) | "Run the quality gate" |
117
+ | `field-harvest` | Back-propagate field patterns to hub | "I could reuse this" |
118
+ | `frontier-digest` | HN + arXiv → actionable insights | "AI trend digest" |
119
+ | `hub-cc-pr-reviewer` | Automated PR review | "Review this PR" |
120
+ | `verify-bidirectional` | Reverse-verify decisions | "Is that right?", "Double-check" |
121
+ | `deep-clarify` | Socratic requirements clarification | "I'm not sure what to build" |
122
+ | `install-wizard` | Initial onboarding | "First-time setup" |
123
+ | `plugin-recommender` | Plugin recommendations | "Is there a good tool for this?" |
124
+ | `apex-review` | Executive-perspective quality review | "Will this hold up?" |
125
+ | `meta-prompt-builder` | Meta prompt design | "Write a prompt for the agent" |
126
+ | `asset-placement-gate` | Hub vs project asset routing | "Should this be shared?" |
127
+ | `cross-ecosystem-synergy-detection` | Cross-tool synergy finder | "Are my tools working together?" |
128
+ | `convergence-loop` *(fh-commons)* | N-round convergence loops | "Single-pass seems suspicious" |
129
+ | `token-budget-gate` *(fh-commons)* | Pre-task token cost estimate | "How expensive is this?" |
130
+ | `mcp-circuit-breaker` *(fh-commons)* | MCP tool failure pattern detection | "MCP keeps failing" |
131
+ | `quench-challenger` *(fh-commons)* | Adversarial pressure-test agent | "Challenge this with a devil" |
132
+ | *(+ 13 more)* | marketplace-gate · contention-layer · context-bridge-dispatch · edit-manifest · fact-checker · goal-quench · hub-persona-auditor · install-doctor · memory-hygiene · persona-innovator · prompt-regression · self-marketing-lint · skill-splitter | |
133
+
134
+ | Active count | Diagnosis |
135
+ |:---:|---|
136
+ | **28+** | Advanced — chain agent-composer + sim-conductor + steel-quench + pipeline-conductor |
137
+ | **10–27** | Activation stage — gradually enable unchecked assets |
138
+ | **0–9** | Early stage — start with `install-wizard` |
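
The thresholds in the table can be expressed as a small lookup. This is an illustrative sketch only: `activation_stage` is a hypothetical name, and the count is assumed to be a non-negative integer you tallied yourself.

```bash
#!/usr/bin/env bash
# Hypothetical helper: maps an active-asset count to the stage in the table.
activation_stage() {
  local count="$1"
  if [ "$count" -ge 28 ]; then
    echo "Advanced"
  elif [ "$count" -ge 10 ]; then
    echo "Activation stage"
  else
    echo "Early stage"
  fi
}
```

For example, `activation_stage 12` falls in the 10–27 band and prints `Activation stage`.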
315
139
 
316
- > **Validated**: 6 background agents dispatched in parallel from meta-harness cwd → completed in ~3 minutes (~5× faster than sequential).
140
+ </details>
317
141
 
318
142
  ---
319
143
 
320
144
  ## Multi-Model Sidecar (v1.3)
321
145
 
322
- Each available AI CLI (Gemini, Codex, `gh copilot`) forms an independent review team alongside Claude. Cross-team synthesis surfaces Claude blind spots — issues external teams catch that single-model review misses. The sidecars act as **peer reviewers**, not primary orchestrators; skill invocation and harness automation remain Claude Code-native.
146
+ Run Gemini, Codex, or `gh copilot` as independent peer reviewers alongside Claude.
323
147
 
324
- **Coverage tiers (measured on `source-grounding-audit/SKILL.md`):**
325
148
  | Tier | Setup | Defects found |
326
149
  |---|---|---|
327
- | **C1** Single Claude persona | Default | 25% |
150
+ | **C1** Single Claude | Default | 25% |
328
151
  | **C2** 3 cross-session Claude personas | No extra tools | 75% |
329
- | **C3** C2 + external CLI (Gemini/Codex/gh copilot) | External CLI installed | 100% +3 Claude blind spots |
330
-
331
- Claude-side token cost: **zero increase** C2→C3. External CLI billed to its own quota.
332
-
333
- Decision rule: routine → C2, pre-publish → C3+.
334
-
335
- > **Corporate path**: `gh copilot` as sidecar (GitHub Copilot CLI, separate enterprise license). Requires headless operability — use `gh copilot -- -p "..." --allow-all-tools`. Note: CLI presence ≠ headless capable; verify with `--allow-all-tools` before adding to CI.
336
-
337
- ---
338
-
339
- ## Runtime requirements
340
-
341
- | Environment | Support | Notes |
342
- |---|---|---|
343
- | Claude Code + Anthropic API Key | ✅ Recommended | 200K context · officially supported |
344
- | claude.ai Pro / Team Plan | ✅ Recommended | 200K context · officially supported |
345
- | AWS Bedrock (direct API) | ⚠️ Conditional | Possible with sufficient account quota |
346
- | Bedrock + LiteLLM proxy | ⚠️ Unofficial | Frequent `Input is too long` errors |
347
- | Internal AI API proxy | ⚠️ Conditional | Depends on `max_input_tokens` config |
348
-
349
- ---
350
-
351
- ## Plugin install
352
-
353
- ```bash
354
- claude plugin marketplace add https://github.com/chrono-meta/forge-harness.git
355
- claude plugin install -s user fh-meta@forge-harness
356
- ```
357
-
358
- Verify: `/skills` or `/agents` in Claude Code chat. Updates aren't automatic — run `claude plugin update fh-meta@forge-harness` periodically.
359
-
360
- #### Plugin catalog
361
-
362
- | Plugin | Skills | Agents |
363
- |---|---|---|
364
- | **fh-meta** (v1.3) | 29 skills — agent-composer · apex-review · asset-placement-gate · contention-layer · context-bridge-dispatch · context-doctor · cross-ecosystem-synergy-detection · deep-clarify · edit-manifest · field-harvest · frontier-digest · goal-quench · harness-doctor · harvest-loop · hub-cc-pr-reviewer · install-doctor · install-wizard · marketplace-gate · memory-hygiene · meta-prompt-builder · pipeline-conductor · plugin-recommender · prompt-regression · self-marketing-lint · sim-conductor · source-grounding-audit · steel-quench · verify-bidirectional · and more | 3 (hub-persona-auditor · fact-checker · persona-innovator) |
365
- | **fh-commons** (v0.2.0) | 4 skills — convergence-loop · deliberation · mcp-circuit-breaker · token-budget-gate | 1 (quench-challenger) |
366
-
367
- #### Mode C (plugin only — no clone)
368
-
369
- ```bash
370
- claude plugin marketplace add https://github.com/chrono-meta/forge-harness.git
371
- claude plugin install fh-meta@forge-harness
372
- cd ~/projects/{your-project} && claude
373
- ```
374
-
375
- | Skill / area | Mode A (clone + plugin) | Mode C (plugin only) |
376
- |---|:---:|:---:|
377
- | `verify-bidirectional` · `apex-review` | ✅ hub baseline | ⚠️ no `knowledge/` |
378
- | `cross-ecosystem-synergy-detection` · `plugin-recommender` | ✅ hub cross-ref | ⚠️ your project only |
379
- | Meta/hub seed accumulation | ✅ `knowledge/shared/` | ❌ |
380
-
381
- #### Mode D — agent file copy only
382
-
383
- The lightest entry. Copy a single agent file to use immediately:
384
-
385
- ```bash
386
- mkdir -p <your-project>/.claude/agents/
387
- cp <harness-root>/.claude/agents/fact-checker.md <your-project>/.claude/agents/
388
- ```
389
-
390
- #### Connecting FH context to existing project CC
391
-
392
- ```bash
393
- cp {FH_ROOT}/templates/local_fh_context.md .claude/rules/local_fh_context.md
394
- echo ".claude/rules/local_fh_context.md" >> .git/info/exclude
395
- ```
396
-
397
- After this, `claude` in that project recognizes FH skills, session locations, and how to reference them. Token footprint: ~200 tokens (pointer file only).
398
-
399
- ---
400
-
401
- ## Token cost optimization
402
-
403
- **Native overhead** — measured: new install standalone ≈ 29K tokens (14.5% of 200K). Top 2 heaviest files: `.claude/rules/*.md` (~20K) and `CLAUDE.md` (~8.7K). `context-doctor` diagnoses and recommends keyword-trigger deferral for infrequently-used rules (saves 5–8K).
404
-
405
- **1. `.claudeignore` standard** — copy `templates/.claudeignore` to your project root. Defaults: `node_modules/` · `dist/` · `.next/` · `*.lock` · `*.min.js` · `.env`
406
-
407
- **2. Model switching** — `/model sonnet` (coding) · `/model opus` (reasoning) · `/model opusplan` (hybrid)
408
-
409
- **3. Agent view parallel execution** — `context-bridge-dispatch` auto-injects session context cards. 2+ independent tasks → parallel by default; 5–6× acceleration.
410
-
411
- **4. Automated audits** — terminal-start zshrc hook:
412
-
413
- ```bash
414
- export FH_DIR="$HOME/path/to/forge-harness"
415
- source "$FH_DIR/templates/fh_audit_check.zsh"
416
- ```
417
-
418
- ---
419
-
420
- ## Operating model — 3 Phase essence
421
-
422
- ### Phase 1 — Initial setup (active onboarding)
423
-
424
- Greeting from the FH cwd → AI proactively proposes → asks about task → runs 5 skills → setup → hands off to project cwd.
425
-
426
- ### Phase 2 — Backstage optimization
427
-
428
- User works from the **field project cwd**. The hub is not directly invoked but performs lateral optimization: `.claudeignore` applied, model switching active, fh-meta skills naturally activate from description triggers.
152
+ | **C3** C2 + external CLI | External CLI installed | 100% (+3 Claude blind spots) |
429
153
 
430
- ### Phase 3: Threshold return (autonomous proposals)
431
-
432
- When work matures and new skills or upgrades become possible, the AI **proactively proposes** returning from the field cwd to the meta-harness.
433
-
434
- | Trigger | Signal |
435
- |---|---|
436
- | New generalizable pattern emerges | First discovery of a pattern worth promoting |
437
- | 3+ accumulated upgrades | Stabilization signal from the same asset evolving |
438
- | Sister asset absorption | External PR audit gate passed |
439
-
440
- ### Command tower pattern (advanced)
441
-
442
- | Task type | Recommended location |
443
- |---|---|
444
- | Single project coding/debugging | That project's cwd |
445
- | Meta/audit/simulation | **Meta-harness cwd + Agent** |
446
- | 2+ projects simultaneously | **Meta-harness cwd + parallel Agent** |
447
- | field-harvest · PR audit · CATALOG updates | **Meta-harness cwd + Agent** |
448
-
449
- ---
450
-
451
- ## Steel-quench convergence — multi-layer defense
452
-
453
- | Layer | Mechanism |
454
- |:---:|---|
455
- | **L1** | harness-doctor + context-doctor + sim-conductor Area B — isolated third-person evaluation |
456
- | **L2** | Real user feedback + external PR review — evidence generated outside owner environment |
457
- | **L3** | steel-quench pre-runs attack angles internally; flaws patched before external devils run |
458
- | **L4** | Meta-aware adversary — remaining attack surface shrinks per wave |
154
+ Claude-side token cost: **zero increase** from C2 to C3.
459
155
 
460
156
  ---
461
157
 
462
- ## Research & external validation
158
+ ## Research
463
159
 
464
- > **FH v1.0 paper** — published 2026-05-30 on [Zenodo](https://zenodo.org/records/20397566) (DOI: 10.5281/zenodo.20397566) · arXiv submission in review. Documents the 2-layer design, 6-axis framework, 4-agent orchestration, and compounding loop with controlled empirical evidence.
160
+ > **FH v1.0 paper** — [Zenodo](https://zenodo.org/records/20397566) (DOI: 10.5281/zenodo.20397566) · arXiv in review.
161
+ > Documents 2-layer design, 6-axis framework, 4-agent orchestration, and compounding loop with empirical evidence.
465
162
 
466
- Three independent research findings converge on the same layer:
467
- - VILA-Lab analysis of Claude Code v2.1.88 (512K lines): [98.4% is harness infrastructure, 1.6% AI logic](https://arxiv.org/abs/2604.14228)
468
- - "[Code as Agent Harness](https://arxiv.org/abs/2605.18747)" (arXiv, May 2026, 43 authors)
469
- - Stanford IRIS Lab: "[Meta-Harness](https://arxiv.org/abs/2603.28052)" (Lee et al., Mar 2026): outer-loop harness optimization; +7.7pts at 4× fewer tokens
470
-
471
- The Stanford paper also inspired [harness-evolver](https://github.com/raphaelchristi/harness-evolver) (fully automated CODE optimizer). FH converged on the same loop architecture from the opposite direction — complementary, not competing. See `knowledge/shared/harness-core/fh_ecosystem_positioning.md`.
163
+ External convergence:
164
+ - VILA-Lab: [Claude Code v2.1.88 is 98.4% harness infrastructure](https://arxiv.org/abs/2604.14228)
165
+ - ["Code as Agent Harness"](https://arxiv.org/abs/2605.18747) (arXiv, May 2026)
166
+ - Stanford IRIS Lab: ["Meta-Harness"](https://arxiv.org/abs/2603.28052) — +7.7pts at 4× fewer tokens
472
167
 
473
168
  ---
474
169
 
475
170
  ## Learn more
476
171
 
477
- - `CLAUDE.md`: Sync/Push protocol · AI operating rules
478
- - `AGENTS.md` — Runtime agent specs
479
- - `CATALOG.md` — Search index
480
- - `CHEATSHEET.md` — Full command reference
481
- - `CONTRIBUTING.md` — How to contribute skills and patterns
482
- - `knowledge/shared/harness-core/fh_integration_contract.md` — Governance layer spec
483
-
484
- ---
485
-
486
- ## Appendix
487
-
488
- ### Directory structure
489
-
490
- ```
491
- forge-harness/
492
- ├── knowledge/ # Pure knowledge — time-independent, for reference
493
- │ ├── domain/ # Domain-specific knowledge
494
- │ └── shared/ # Cross-project patterns
495
-
496
- ├── tracks/ # Work records per project — time-accumulated
497
- │ └── {project_name}/
498
- │ ├── session_*.md # Session history
499
- │ └── learnings/ # Accumulated feedback
500
-
501
- ├── plugins/ # fh-meta + fh-commons plugins
502
- ├── templates/ # Skeletons to copy for new projects
503
- ├── scripts/ # fh-gate.sh and automation scripts
504
- ├── docs/ # Diagrams and reference assets
505
- ├── CATALOG.md # Full search index
506
- ├── CLAUDE.md # AI operating rules + Sync/Push protocol
507
- └── CHEATSHEET.md # Command cheat sheet
508
- ```
509
-
510
- ### Key terms
511
-
512
- | Term | Definition |
172
+ | Resource | Purpose |
513
173
  |---|---|
514
- | **Meta-harness** | A persistent hub connecting work, learnings, and patterns of N Claude Code projects for mutual reinforcement |
515
- | **Launch pad effect** | Meta-harness as launch pad, not destination — passing through accelerates the starting line |
516
- | **Shared skill pool** | Common skill/agent pool eliminating reinvention cost across teams and projects |
517
- | **Environment engineering** | Not making the agent smarter, but making the environment easier for the agent to work in |
518
- | **Harness engineering** | Per-project structures (rules, gates, context management) that make AI behave more consistently |
519
- | **Meta harness engineering** | Cross-project system to measure, improve, and evolve harnesses — FH's core layer |
174
+ | [`CLAUDE.md`](CLAUDE.md) | AI operating rules + sync/push protocol |
175
+ | [`CHEATSHEET.md`](CHEATSHEET.md) | Full command reference |
176
+ | [`AGENTS.md`](AGENTS.md) | Runtime agent specs |
177
+ | [`CATALOG.md`](CATALOG.md) | Past work search index |
178
+ | [`CONTRIBUTING.md`](CONTRIBUTING.md) | How to contribute skills and patterns |
179
+ | [`fh_integration_contract.md`](knowledge/shared/harness-core/fh_integration_contract.md) | Governance gate spec |
@@ -301,7 +301,7 @@ The following require the bridge layer and are out of scope for v0.1:
301
301
 
302
302
  | Feature | Why deferred |
303
303
  |---|---|
304
- | Binary / installable package | FH is methodology layer; no runtime distribution yet |
304
+ | REST API or webhook | Would require a server process; FH is file-based |
305
305
  | REST API or webhook | Would require a server process — FH is file-based |
306
306
  | Streaming verdict updates | Requires runtime; methodology layer is synchronous |
307
307
  | Multi-file parallel governance | Possible via agent dispatch today; not formalized here |
@@ -316,6 +316,8 @@ The bridge layer (v1.0) will implement these. This contract is the specification
316
316
  | Version | Date | Change |
317
317
  |---|---|---|
318
318
  | v0.1 | 2026-05-31 | Initial specification. Bash invocation patterns + structured verdict format. Empirical basis: arity.ts controlled trial. |
319
+ | v1.0 | 2026-06-01 | Binary available as `@chrono-meta/fh-gate` on npm. JS wrapper + fh-gate.sh CI-ready binary. |
320
+ | v1.1 | 2026-06-03 | Large-scale harness improvements. Banner update. Version alignment. |
319
321
 
320
322
  ---
321
323
 
@@ -323,6 +325,6 @@ The bridge layer (v1.0) will implement these. This contract is the specification
323
325
 
324
326
  - `fh_opencode_governance_wrapper.md` — step-by-step usage guide (less formal, more tutorial)
325
327
  - `fh_ecosystem_positioning.md` — ecosystem context + synergy map + v2 paper connection
326
- - `tracks/_meta/fh_opencode_governance_experiment_2026_05_31.md`: empirical basis for verdict format
328
+ - `tracks/_meta/`: governance logs written here on each gate run
327
329
  - `multi_model_sidecar_strategy.md` — multi-model orchestration (related pattern)
328
330
  - FH paper (Zenodo: 10.5281/zenodo.20397566) — harness-as-durable-layer thesis this contract operationalizes
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "@chrono-meta/fh-gate",
3
- "version": "1.0.2",
3
+ "version": "1.1.0",
4
4
  "description": "FH governance gate — runs structured AI code review via claude --print and returns machine-parseable verdicts (PASS/PENDING/BLOCKED/ESCALATE).",
5
5
  "license": "MIT",
6
6
  "keywords": [
@@ -13,10 +13,10 @@
13
13
  ],
14
14
  "repository": {
15
15
  "type": "git",
16
- "url": "https://github.com/chrono-meta/forge-harness.git"
16
+ "url": "git+https://github.com/chrono-meta/forge-harness.git"
17
17
  },
18
18
  "bin": {
19
- "fh-gate": "./bin/fh-gate.js"
19
+ "fh-gate": "bin/fh-gate.js"
20
20
  },
21
21
  "scripts": {
22
22
  "prepare": "chmod +x bin/fh-gate.js scripts/fh-gate.sh"
@@ -1,5 +1,5 @@
1
1
  #!/usr/bin/env bash
2
- # fh-gate.sh — FH governance gate v1.0
2
+ # fh-gate.sh — FH governance gate v1.1
3
3
  #
4
4
  # Executes governance review end-to-end via claude --print.
5
5
  # CI-ready: machine-parseable verdict + exit codes.
@@ -27,7 +27,7 @@
27
27
 
28
28
  set -euo pipefail
29
29
 
30
- VERSION="1.0.0"
30
+ VERSION="1.1.0"
31
31
  FH_ROOT="$(cd "$(dirname "$0")/.." && pwd)"
32
32
  _TMPDIR="${TMPDIR:-/tmp}"
33
33