@chrono-meta/fh-gate 1.0.0
- package/README.md +519 -0
- package/bin/fh-gate +3 -0
- package/knowledge/shared/harness-core/fh_integration_contract.md +328 -0
- package/package.json +41 -0
- package/scripts/fh-gate.sh +239 -0
package/README.md
ADDED
@@ -0,0 +1,519 @@
<p align="center">
<img src="docs/banner.png" alt="forge-harness — A forkable Claude Code meta-harness for multi-project teams" width="640">
</p>

<p align="center">
<a href="LICENSE"><img src="https://img.shields.io/badge/license-MIT-22c55e.svg" alt="MIT License"></a>
<img src="https://img.shields.io/badge/version-v1.3-3b82f6.svg" alt="v1.3">
<a href="https://zenodo.org/records/20397566"><img src="https://zenodo.org/badge/DOI/10.5281/zenodo.20397566.svg" alt="DOI 10.5281/zenodo.20397566"></a>
<img src="https://img.shields.io/badge/Claude_Code-compatible-a855f7.svg" alt="Claude Code compatible">
<img src="https://img.shields.io/badge/Codex-beta-f59e0b.svg" alt="Codex-compatible beta">
</p>

<p align="center">
<b>Fork it. Rename it. Make it yours.</b><br>
A persistent knowledge hub that connects all your Claude Code projects — shared skills, accumulated context, and a compounding improvement loop.
</p>

---

| If you're here because… | forge-harness solves it |
|---|---|
| Context disappears when a session ends | Persistent `tracks/` — resumable from anywhere |
| You repeat the same setup for every project | Connect once to the hub, share across all projects |
| Your team's AI know-how lives only in people's heads | Codify it so everyone shares it |
| You want AI to get *better* as work accumulates | Skills and patterns compound session over session |
| You're evaluating AI-generated code with no governance layer | `fh-gate` wraps any coding agent as a post-generation gate |

> **Worried about token costs?** New install footprint ≈ 14.5% of 200K context. `context-doctor` diagnoses and reduces this further. → [Token optimization](#token-cost-optimization)

| Where you are now | Jump to |
|---|---|
| Starting from scratch | [Get started in 2 minutes](#get-started-in-2-minutes) |
| Already using it, want more | [33 asset activation check](#already-using-it----33-asset-activation-check) |
| Wrapping an external coding agent | [Governance layer](#governance-layer-for-ai-generated-code) |
| Want to spread it to your team | [Operating model Phase 3](#operating-model----3-phase-essence) |

---

> **This document is for humans.** AI operating rules → `CLAUDE.md` · Command reference → `CHEATSHEET.md`

---

## What is this?

An **acceleration hub** for teams already using Claude Code. Connect N projects to one hub — work, learnings, and patterns from each project **mutually reinforce** each other. Build skills and agents once in the hub; share them across every project.

> **Goal**: Get into orbit in the age of AI acceleration without burning out. Minimize setup friction, optimize context, distribute expertise by task complexity, and raise the success rate of every session.

forge-harness is structured as two distinct layers:

| Layer | Contents | AI compatibility |
|---|---|---|
| **Methodology layer** (model-agnostic) | `tracks/`, `knowledge/`, `SKILL.md` documents, session protocols | Any AI model |
| **Automation layer** (Claude-native) | `.claude/agents/`, hooks, slash commands, `CLAUDE.md` rules | Claude Code only |

The **methodology layer** is the portable core — connecting projects to a persistent hub, accumulating learnings in `tracks/`, curating cross-project knowledge in `knowledge/shared/`. Works regardless of which AI you use.

The **automation layer** is what makes the methodology frictionless when running Claude Code: agents dispatch automatically, hooks fire at session boundaries, and slash commands invoke skills without manual prompting.

> **Codex-compatible beta**: Gemini, Codex, and other AI users can apply the methodology layer manually. Automation layer features require Claude Code as host.

---

## Finding your entry path

Teams using AI collaboration tools systematically are already doing **harness engineering**: QA protocols, verification pipelines, structures that make AI behave more consistently. forge-harness is the **OS one layer above** — a system that measures, improves, and evolves the harness itself across multiple projects.

| Layer | What it does | Examples |
|---|---|---|
| Harness engineering | Per-project rules, gates, context management | QA protocols, CLAUDE.md rulesets, TC verification pipelines |
| **Meta harness engineering** | Cross-project system to measure, improve, evolve harnesses | FH skill bus, harness-doctor, steel-quench, field-harvest |

> **FH v1.0 paper** — published 2026-05-30 on [Zenodo](https://zenodo.org/records/20397566) (DOI: 10.5281/zenodo.20397566) · arXiv submission in review. Documents the 2-layer design, 6-axis framework, 4-agent orchestration, and compounding loop with controlled empirical evidence.

> **External validation (2026)** — three independent research findings converge:
> - VILA-Lab analysis of Claude Code v2.1.88 (512K lines): [98.4% is harness infrastructure, 1.6% AI logic](https://arxiv.org/abs/2604.14228)
> - "[Code as Agent Harness](https://arxiv.org/abs/2605.18747)" (arXiv, May 2026, 43 authors)
> - Stanford IRIS Lab: "[Meta-Harness](https://arxiv.org/abs/2603.28052)" (Lee et al., Mar 2026) — outer-loop harness optimization; +7.7pts at 4× fewer tokens

#### FH vs. automated harness tools

The Stanford paper inspired [harness-evolver](https://github.com/raphaelchristi/harness-evolver) — a fully automated 7-stage CODE optimizer. FH independently converged on the same loop architecture from the opposite direction:

| Axis | harness-evolver | forge-harness |
|---|---|---|
| **Optimization target** | Harness code (prompts, routing) | Harness knowledge (context, patterns, expertise) |
| **Evolution** | Auto-merge winners to git | Human-approved at every stage |
| **Infrastructure** | LangSmith + Python 3.10+ | CLAUDE.md + skills only, zero extra |
| **Scope** | Single-harness optimization | Multi-project federation, shared skill bus |
| **Knowledge layer** | No persistent curation | `tracks/` + `knowledge/` grow over time |

They're complementary — FH's approval gates and knowledge layer fill exactly the gaps automated CODE search leaves open.

Count how many apply to you:

- [ ] You have 2 or more Claude Code projects
- [ ] You lose context when a session ends
- [ ] You repeat the same patterns and rules across multiple projects
- [ ] You want to spread AI methodology to your team
- [ ] You want AI to improve as work accumulates

| Count | Recommended path |
|:---:|---|
| **3+** | Standard entry → [Get started in 2 minutes](#get-started-in-2-minutes) |
| **1–2** | Plugin first → `claude plugin install -s user fh-meta@forge-harness` |
| **0** | Single-project stage — check back when you reach 2+ projects. `context-doctor` is available standalone now |
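
The count-to-path table can be read as a simple threshold rule. The sketch below is illustrative only: `recommend_path` is a hypothetical function name that does not ship with forge-harness, while the thresholds and the plugin command are taken from the table.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not shipped with forge-harness): maps the number of
# checked boxes (0-5) to the recommended entry path from the table.
recommend_path() {
  local count="$1"
  # Reject anything that is not a single digit between 0 and 5.
  case "$count" in
    [0-5]) ;;
    *) echo "usage: recommend_path <0-5>" >&2; return 2 ;;
  esac
  if [ "$count" -ge 3 ]; then
    echo "standard: clone the hub (Get started in 2 minutes)"
  elif [ "$count" -ge 1 ]; then
    echo "plugin-first: claude plugin install -s user fh-meta@forge-harness"
  else
    echo "single-project: revisit at 2+ projects; context-doctor works standalone"
  fi
}
```

For example, `recommend_path 4` prints the standard-entry line, and any argument outside 0 to 5 returns a non-zero status.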

---

## Get started in 2 minutes

> **Prerequisite**: Claude Code CLI installed. Verify: `claude --version`

### Step 0. Register the plugin

```bash
claude plugin marketplace add https://github.com/chrono-meta/forge-harness.git
claude plugin install -s user fh-meta@forge-harness
```

> If Step 0 fails: run `claude plugin update fh-meta@forge-harness`, or check that your network can reach `github.com`.

Verify: type `/skills` in the CC chat → if `install-wizard` appears, you're done.

### Step 1. Clone the hub

```bash
git clone https://github.com/chrono-meta/forge-harness.git ~/forge-harness
cd ~/forge-harness
```

> **Standard path**: Fork on GitHub first → clone your fork → accumulate `tracks/` and `knowledge/` there → periodically pull upstream updates from forge-harness. Rename it to make it yours.
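
The "periodically pull upstream updates" step of the standard path could look roughly like the sketch below. It is a sketch under stated assumptions, not a script from the package: `sync_upstream` is a hypothetical name, the remote is assumed to be called `upstream`, and the default branch is assumed to be `main`.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run inside a clone of your fork.
# Assumptions: the remote is named "upstream" and the branch is "main".
sync_upstream() {
  local upstream_url="$1" branch="${2:-main}"
  # Register the upstream remote once; refresh its URL on later runs.
  if git remote get-url upstream >/dev/null 2>&1; then
    git remote set-url upstream "$upstream_url"
  else
    git remote add upstream "$upstream_url"
  fi
  git fetch --quiet upstream
  # Merge rather than rebase so commits holding tracks/ and knowledge/
  # keep their original hashes.
  git merge --no-edit --quiet "upstream/$branch"
}

# Usage (from the root of your fork's clone):
#   sync_upstream https://github.com/chrono-meta/forge-harness.git
```

A merge is used here because the fork accumulates its own commits; whether a rebase workflow suits your team better is a separate choice.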

### Step 2. Say something

```bash
claude
```

> ✅ **Expected**: Claude reads `CLAUDE.md`, asks what project to connect or what task to start.
>
> ❌ **Generic response?** → Run `pwd` to confirm you're in the `forge-harness` root. If not: `cd ~/forge-harness && claude`

From here:
- **"Connect a project"** → hub scans `../`, lists projects with `.git`, creates `tracks/{project}/` on confirmation
- **"My projects are in `~/work/`"** → specify a different root
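
The "Connect a project" scan described in the list can be approximated in plain shell. This is a minimal sketch, assuming the scan only looks at immediate subdirectories of the root; `list_git_projects` is a hypothetical name, and the actual behavior is defined by the hub's `CLAUDE.md` rules rather than by this function.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the "Connect a project" scan: list the immediate
# subdirectories of a root (default: the hub's parent) that contain .git.
list_git_projects() {
  local root="${1:-..}" dir
  for dir in "$root"/*/; do
    # -e rather than -d: worktrees and submodules store .git as a file.
    if [ -e "${dir}.git" ]; then
      basename "$dir"
    fi
  done
}

# Usage: list_git_projects ~/work
```

Each printed name corresponds to a candidate `tracks/{project}/` directory that the hub would create after you confirm.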

---

## Governance layer for AI-generated code

FH wraps any coding agent (OpenCode, Codex, etc.) as a **post-generation governance layer** — no runtime adapter needed. FH reads files the agent writes; the protocol is the interface.

```bash
# After a coding agent completes a task:
./scripts/fh-gate.sh  # auto-detects changed files from git diff
# → steel-quench adversarial pass  # behavioral edges, untested contracts, security
# → pipeline-conductor --quick  # 4-axis: regression / adversarial / grounding / record
# → FH_GATE_VERDICT  # PASS | PENDING | BLOCKED | ESCALATE
```

**Empirical result (2026-05-31)**: Applied to OpenCode's own AI-generated `permission/arity.ts` (163 lines, 6 tests passing, CI green). Governance verdict: PENDING — 2 A-grade findings CI didn't cover (short-token overflow in allowlist, executor tools absent from arity table). Delta attributable to methodology layer, not the model.

Full spec: `knowledge/shared/harness-core/fh_integration_contract.md` · Usage: `knowledge/shared/harness-core/fh_opencode_governance_wrapper.md`
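
In a CI job, the four verdicts usually have to become an exit status. The sketch below shows one way that might be wired up. Everything except the four verdict names is an assumption: the function names are hypothetical, the numeric codes and the choice to fail on PENDING are illustrative, and the real output format is whatever `fh_integration_contract.md` specifies.

```bash
#!/usr/bin/env bash
# Hypothetical CI glue, not part of fh-gate: turn a verdict string into an
# exit code. The four verdict names come from the README; the numeric codes
# and the policy of failing on PENDING are assumptions.
verdict_exit_code() {
  case "$1" in
    PASS)     return 0 ;;
    PENDING)  return 1 ;;  # findings need review before merge
    BLOCKED)  return 2 ;;
    ESCALATE) return 3 ;;
    *)        echo "unknown verdict: $1" >&2; return 64 ;;
  esac
}

# Pull the verdict out of captured gate output read from stdin. Assumes the
# output contains a line mentioning FH_GATE_VERDICT followed by one of the
# four names.
extract_verdict() {
  grep 'FH_GATE_VERDICT' | grep -oE 'PASS|PENDING|BLOCKED|ESCALATE' | head -n 1
}
```

With the gate output saved to a log file, `verdict_exit_code "$(extract_verdict < gate.log)"` would then fail the job on anything other than PASS.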

> **One-line install (coming soon)**: `npx @chrono-meta/fh-gate "src/foo.ts" quick ci` — npm publish in progress.

---

## Real-world case — AI TC generation prompt hardening

> **Context**: An AI-powered test case generation tool was merging TC outputs without quality validation. Prompts contained cushion language, phantom claims, and no priority guardrails.

**Applied**: `steel-quench` (W1–W8 adversarial hardening) + `source-grounding-audit` (phantom claim detection)

| Wave | What was attacked | Result |
|---|---|---|
| W1–W2 | Cushion language ("it would be good to…") → forced conditions | Ambiguity eliminated |
| W4–W5 | No self-check step → Self-Check quality gate added | Quality bypass path closed |
| W6 | Soft review → Hard gate ("no next step until fix complete") | Incomplete TC merge blocked |
| W7 | P0 ratio inflation → forced re-review above 30% | Priority inflation prevented |
| W8 | Phantom Claim Guard — unspecified values/button names banned | Fabricated expected results blocked |

**Outcome**: 4 bugs found and fixed · 8-layer quality gate complete · output noise eliminated

> The self-healing loop: steel-quench attacks the prompt → execution catches bugs the review missed → fixes are verified in the same pass.

---

## Already using it — 33 asset activation check

<details>
<summary>Expand full asset table (33 skills + 5 agents)</summary>

Check which of the following are **regularly activating** for you:

| Asset | Role | Natural language triggers | Active |
|---|---|---|:---:|
| `agent-composer` | Plans optimal agent dispatch | "How should I split this across agents?", "Run in parallel" | □ |
| `apex-review` | Final quality review from executive perspective | "Will this hold up with decision-makers?" | □ |
| `verify-bidirectional` | Reverse-verify decisions | "Is that right?", "Double-check this" | □ |
| `deliberation` *(fh-commons)* | Structured multi-angle argument | "Battle it out", "Review this from multiple angles" | □ |
| `cross-ecosystem-synergy-detection` | Detect cross-tool synergies | "Are my installed tools working together?" | □ |
| `plugin-recommender` | Plugin recommendations | "Is there a good tool for this?" | □ |
| `hub-cc-pr-reviewer` | Automated PR review | "Review this PR", "Is it okay to merge?" | □ |
| `context-doctor` | Token efficiency + `.claudeignore` | "Session is slow", "Clean up context" | □ |
| `sim-conductor` | Meta-simulation orchestrator | "External user perspective", "Internal audit" | □ |
| `steel-quench` | Full-spectrum adversarial verification — attacks output patterns (self-declarations, cushion language, structural flaws) | "Run the quench", "Attack from the root" | □ |
| `source-grounding-audit` | Source back-tracing — detects Phantom Claims (no source found). Attacks input tracing (where did this come from?) | "Verify the source", "Grounding audit" | □ |
| `harness-doctor` | Harness structure diagnosis | "Something seems wrong with my Claude setup" | □ |
| `deep-clarify` | Socratic requirements clarification | "I'm not sure what I need to build", "Clarify this" | □ |
| `meta-prompt-builder` | Meta prompt design | "Write a prompt for each Wave", "What should I tell the agent?" | □ |
| `install-doctor` | Diagnose conflicts before/after plugin install | "Is it okay to add this plugin?" | □ |
| `install-wizard` | Initial environment diagnosis + onboarding | "First-time setup", "Just installed this" | □ |
| `asset-placement-gate` | New asset belongs in FH or project? | "Should this be shared?", "Hub vs project" | □ |
| `marketplace-gate` | 5-point fitness gate before listing | "Is it okay to list this?" | □ |
| `field-harvest` | Back-propagate field patterns to hub | "I could reuse this in other projects" | □ |
| `hub-persona-auditor` | Pre-publish 4-axis audit | "How will this look to others?" | □ |
| `fact-checker` | Asset deduplication check | "Isn't there something similar already?" | □ |
| `persona-innovator` | Naming gap detection + ideation | "What would be a good name for this?" | □ |
| `contention-layer` | Treat skill conflicts as harvest signals | "These two skills conflict" | □ |
| `context-bridge-dispatch` | Inject session context cards before parallel dispatch | "Brief the agents first", "Parallel dispatch" | □ |
| `frontier-digest` | Frontier signals (HN, arXiv) → actionable insights | "AI trend digest", "What's new this week" | □ |
| `harvest-loop` | End-of-session learning → evolution pipeline | "Harvest the session", "Run the pipeline" | □ |
| `self-marketing-lint` | Remove self-marketing language from skill descriptions | "Description diet", "Strip the marketing tone" | □ |
| `pipeline-conductor` | 4-axis quality gate (backward/adversarial/forward/record) | "Run the quality gate", "4-axis check" | □ |
| `goal-quench` | `/goal` wrapper with token budget gate + pipeline-conductor verification | "Safe goal run", "Goal with budget control" | □ |
| `edit-manifest` | Predict-verify loop for harness edits | "Log this edit", "Predict what this changes" | □ |
| `memory-hygiene` | Detect stale memory entries + re-verify live | "Check stale memory", "Memory drift" | □ |
| `prompt-regression` | Detect behavioral regressions after rule edits | "Did my rule change break anything?" | □ |
| `convergence-loop` *(fh-commons)* | N-round convergence loops — only "truly passed" after convergence | "Suspicious of single-pass", "Convergence loop" | □ |
| `token-budget-gate` *(fh-commons)* | Pre-task token cost estimate (GREEN/YELLOW/ORANGE/RED) | "How expensive is this?", "Token budget estimate" | □ |
| `mcp-circuit-breaker` *(fh-commons)* | Detects MCP tool failure patterns, blocks further calls | "MCP keeps failing", "Tool error loop" | □ |
| `quench-challenger` *(fh-commons)* | Pressure-tests near-final artifacts from adversarial angles | "Challenge this with a devil", "Quench challenger" | □ |

| Count | Diagnosis |
|:---:|---|
| **28–36** | Advanced — focus on `agent-composer` + `sim-conductor` + `steel-quench` + `pipeline-conductor` chained |
| **10–27** | Activation stage — gradually activate unchecked assets |
| **0–9** | Early stage — go back to self-diagnosis above |

</details>

---

## How it works

```
forge-harness (the brain — persistent hub)
├── knowledge/  → referenced from all projects
└── tracks/     → work records per project

Project A (the execution site)
  → connect hub in CLAUDE.md → auto-referenced

Project B (the execution site)
  → connect hub in CLAUDE.md → auto-referenced
```

- **From the hub**: invoke Claude Code → cross-project judgment with integrated context
- **From each project**: project-specific work + hub reference
- **"Hello"** → Claude automatically pulls recent context and today's tasks from the hub *(when running `claude` from the FH cwd)*

```
Search: CATALOG.md (tags + summary) → open that file directly
Store:  End of session → save to tracks/{project}/ → update CATALOG.md
Return: New pattern found → save to tracks/{project}/learnings/
Share:  Common to 2+ projects → write to knowledge/shared/
```
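
The "Share" rule in the loop (a pattern common to 2+ projects moves to `knowledge/shared/`) can be approximated mechanically. The sketch below only surfaces candidates. It is hypothetical on two counts: the function name is invented, and treating an identical file name under `learnings/` as "the same pattern" is a simplifying assumption; in FH the promotion decision is made by the AI and the human reviewer.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the "Share" step: print learning file names that
# occur in two or more projects under tracks/*/learnings/. Treating an equal
# file name as "the same pattern" is an assumption made for illustration.
shared_learning_candidates() {
  local hub="${1:-.}" file
  for file in "$hub"/tracks/*/learnings/*.md; do
    # Skip the unexpanded glob when no learnings exist yet.
    [ -e "$file" ] && basename "$file"
  done | sort | uniq -d
}

# Usage (from the hub root): shared_learning_candidates .
```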

---

## Core usage

| What you want | What to say |
|---|---|
| Start a session | "Hello" → reads hub, guides today's tasks |
| Save session | "Sync this session to forge-harness" |
| Search past work | "What did I do around April 13th?" |
| Connect a new project | "Connect a project" |
| Run adversarial review | "Run the quench on this" |
| Run end-of-session harvest | "Harvest the session" |

---

## Agent dispatch

forge-harness includes specialized agents and `agent-composer` to plan their optimal combination.

```
/agent-composer
```

Analyzes the current task and proposes which agents to dispatch in what order.

### FH agents

| Agent | Role | Tool restrictions |
|---|---|---|
| `plan` | Read-only design agent — analyzes files, maps impact, plans before implementation | Read·Bash·Glob·Grep only |
| `fact-checker` | Asset deduplication and staleness check | Read·Grep·Glob |
| `hub-persona-auditor` | 3+ persona audit of externally published assets | Read·Grep·Glob |
| `persona-innovator` | Naming exploration + frame proposals | Read·Grep·Glob·WebSearch·WebFetch |
| `quench-challenger` | Steel-quench adversary — pressure-tests near-final artifacts | Read·Grep·Glob |

### Parallel dispatch

Request two agents in a single message to run in parallel:

```
"Run fact-checker and persona-innovator in parallel.
First: check [asset path] for duplicates
Second: scan current harness for naming gaps"
```

> **Validated**: 6 background agents dispatched in parallel from meta-harness cwd → completed in ~3 minutes (~5× faster than sequential).

---

## Multi-Model Sidecar (v1.3)

Each available AI CLI (Gemini, Codex, `gh copilot`) forms an independent review team alongside Claude. Cross-team synthesis surfaces Claude blind spots — issues external teams catch that single-model review misses. The sidecars act as **peer reviewers**, not primary orchestrators; skill invocation and harness automation remain Claude Code-native.

**Coverage tiers (measured on `source-grounding-audit/SKILL.md`):**
| Tier | Setup | Defects found |
|---|---|---|
| **C1** Single Claude persona | Default | 25% |
| **C2** 3 cross-session Claude personas | No extra tools | 75% |
| **C3** C2 + external CLI (Gemini/Codex/gh copilot) | External CLI installed | 100% — +3 Claude blind spots |

Claude-side token cost: **zero increase** C2→C3. External CLI billed to its own quota.

Decision rule: routine → C2, pre-publish → C3+.

> **Corporate path**: `gh copilot` as sidecar (GitHub Copilot CLI, separate enterprise license). Requires headless operability — use `gh copilot -- -p "..." --allow-all-tools`. Note: CLI presence ≠ headless capable; verify with `--allow-all-tools` before adding to CI.

---

## Runtime requirements

| Environment | Support | Notes |
|---|---|---|
| Claude Code + Anthropic API Key | ✅ Recommended | 200K context · officially supported |
| claude.ai Pro / Team Plan | ✅ Recommended | 200K context · officially supported |
| AWS Bedrock (direct API) | ⚠️ Conditional | Possible with sufficient account quota |
| Bedrock + LiteLLM proxy | ⚠️ Unofficial | Frequent `Input is too long` errors |
| Internal AI API proxy | ⚠️ Conditional | Depends on `max_input_tokens` config |

---

## Plugin install

```bash
claude plugin marketplace add https://github.com/chrono-meta/forge-harness.git
claude plugin install -s user fh-meta@forge-harness
```

Verify: `/skills` or `/agents` in Claude Code chat. Updates aren't automatic — run `claude plugin update fh-meta@forge-harness` periodically.

#### Plugin catalog

| Plugin | Skills | Agents |
|---|---|---|
| **fh-meta** (v1.3) | 29 skills — agent-composer · apex-review · asset-placement-gate · contention-layer · context-bridge-dispatch · context-doctor · cross-ecosystem-synergy-detection · deep-clarify · edit-manifest · field-harvest · frontier-digest · goal-quench · harness-doctor · harvest-loop · hub-cc-pr-reviewer · install-doctor · install-wizard · marketplace-gate · memory-hygiene · meta-prompt-builder · pipeline-conductor · plugin-recommender · prompt-regression · self-marketing-lint · sim-conductor · source-grounding-audit · steel-quench · verify-bidirectional · and more | 3 (hub-persona-auditor · fact-checker · persona-innovator) |
| **fh-commons** (v0.2.0) | 4 skills — convergence-loop · deliberation · mcp-circuit-breaker · token-budget-gate | 1 (quench-challenger) |

#### Mode C (plugin only — no clone)

```bash
claude plugin marketplace add https://github.com/chrono-meta/forge-harness.git
claude plugin install fh-meta@forge-harness
cd ~/projects/{your-project} && claude
```

| Skill / area | Mode A (clone + plugin) | Mode C (plugin only) |
|---|:---:|:---:|
| `verify-bidirectional` · `apex-review` | ✅ hub baseline | ⚠️ no `knowledge/` |
| `cross-ecosystem-synergy-detection` · `plugin-recommender` | ✅ hub cross-ref | ⚠️ your project only |
| Meta/hub seed accumulation | ✅ `knowledge/shared/` | ❌ |

#### Mode D — agent file copy only

The lightest entry. Copy a single agent file to use immediately:

```bash
mkdir -p <your-project>/.claude/agents/
cp <harness-root>/.claude/agents/fact-checker.md <your-project>/.claude/agents/
```

#### Connecting FH context to existing project CC

```bash
cp {FH_ROOT}/templates/local_fh_context.md .claude/rules/local_fh_context.md
echo ".claude/rules/local_fh_context.md" >> .git/info/exclude
```

After this, `claude` in that project recognizes FH skills, session locations, and how to reference them. Token footprint: ~200 tokens (pointer file only).
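
One detail of the `.git/info/exclude` step is worth noting: a plain `echo … >>` appends a new line on every run, so repeating the setup leaves duplicate entries. A minimal sketch of an idempotent variant follows; `exclude_once` is a hypothetical helper name and is not part of the FH templates.

```bash
#!/usr/bin/env bash
# Hypothetical helper: append a path to .git/info/exclude only if that exact
# line is not already present, so repeated setup runs do not add duplicates.
exclude_once() {
  local entry="$1" file="${2:-.git/info/exclude}"
  mkdir -p "$(dirname "$file")"
  touch "$file"
  # -x: match the whole line, -F: treat the entry as a literal string.
  grep -qxF -- "$entry" "$file" || printf '%s\n' "$entry" >> "$file"
}

# Usage (from the project root):
#   exclude_once ".claude/rules/local_fh_context.md"
```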

---

## Token cost optimization

**Native overhead** — measured: new install standalone ≈ 29K tokens (14.5% of 200K). Top 2 heaviest files: `.claude/rules/*.md` (~20K) and `CLAUDE.md` (~8.7K). `context-doctor` diagnoses and recommends keyword-trigger deferral for infrequently-used rules (saves 5–8K).
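
The 14.5% figure is simply 29K divided by the 200K window, and the quoted 5–8K saving would bring the footprint down to roughly 10.5–12.0%. The sketch below reproduces that arithmetic; `footprint_pct` is a hypothetical helper name, and the 200000 default is taken from the README's stated context size.

```bash
#!/usr/bin/env bash
# Hypothetical helper: express a token count as a percentage of the context
# window, rounded to one decimal place.
footprint_pct() {
  local tokens="$1" window="${2:-200000}"
  # LC_ALL=C keeps the decimal separator a dot regardless of locale.
  LC_ALL=C awk -v t="$tokens" -v w="$window" 'BEGIN { printf "%.1f\n", 100 * t / w }'
}

footprint_pct 29000   # fresh install: 14.5
footprint_pct 24000   # after a 5K saving: 12.0
footprint_pct 21000   # after an 8K saving: 10.5
```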

**1. `.claudeignore` standard** — copy `templates/.claudeignore` to your project root. Defaults: `node_modules/` · `dist/` · `.next/` · `*.lock` · `*.min.js` · `.env`

**2. Model switching** — `/model sonnet` (coding) · `/model opus` (reasoning) · `/model opusplan` (hybrid)

**3. Agent view parallel execution** — `context-bridge-dispatch` auto-injects session context cards. 2+ independent tasks → parallel by default; 5–6× acceleration.

**4. Automated audits** — terminal-start zshrc hook:

```bash
export FH_DIR="$HOME/path/to/forge-harness"
source "$FH_DIR/templates/fh_audit_check.zsh"
```

---

## Operating model — 3 Phase essence

### Phase 1 — Initial setup (active onboarding)

Greeting from the FH cwd → AI proactively proposes → asks about task → runs 5 skills → setup → hands off to project cwd.

### Phase 2 — Backstage optimization

User works from the **field project cwd**. The hub is not directly invoked but performs lateral optimization: `.claudeignore` applied, model switching active, fh-meta skills naturally activate from description triggers.

### Phase 3 — Threshold return (autonomous proposals)

When work matures and new skills or upgrades are possible, this AI **proactively proposes** returning to meta-harness from the field cwd.

| Trigger | Signal |
|---|---|
| New generalizable pattern emerges | First discovery of a pattern worth promoting |
| 3+ accumulated upgrades | Stabilization signal from the same asset evolving |
| Sister asset absorption | External PR audit gate passed |

### Command tower pattern (advanced)

| Task type | Recommended location |
|---|---|
| Single project coding/debugging | That project's cwd |
| Meta/audit/simulation | **Meta-harness cwd + Agent** |
| 2+ projects simultaneously | **Meta-harness cwd + parallel Agent** |
| field-harvest · PR audit · CATALOG updates | **Meta-harness cwd + Agent** |

---

## Steel-quench convergence — multi-layer defense

| Layer | Mechanism |
|:---:|---|
| **L1** | harness-doctor + context-doctor + sim-conductor Area B — isolated third-person evaluation |
| **L2** | Real user feedback + external PR review — evidence generated outside owner environment |
| **L3** | steel-quench pre-runs attack angles internally; flaws patched before external devils run |
| **L4** | Meta-aware adversary — remaining attack surface shrinks per wave |

---

## Research & external validation

> **FH v1.0 paper** — published 2026-05-30 on [Zenodo](https://zenodo.org/records/20397566) (DOI: 10.5281/zenodo.20397566) · arXiv submission in review. Documents the 2-layer design, 6-axis framework, 4-agent orchestration, and compounding loop with controlled empirical evidence.

Three independent research findings converge on the same layer:
- VILA-Lab analysis of Claude Code v2.1.88 (512K lines): [98.4% is harness infrastructure, 1.6% AI logic](https://arxiv.org/abs/2604.14228)
- "[Code as Agent Harness](https://arxiv.org/abs/2605.18747)" (arXiv, May 2026, 43 authors)
- Stanford IRIS Lab: "[Meta-Harness](https://arxiv.org/abs/2603.28052)" (Lee et al., Mar 2026) — outer-loop harness optimization; +7.7pts at 4× fewer tokens

The Stanford paper also inspired [harness-evolver](https://github.com/raphaelchristi/harness-evolver) (fully automated CODE optimizer). FH converged on the same loop architecture from the opposite direction — complementary, not competing. See `knowledge/shared/harness-core/fh_ecosystem_positioning.md`.

---

## Learn more

- `CLAUDE.md` — Sync/Push protocol · AI operating rules
- `AGENTS.md` — Runtime agent specs
- `CATALOG.md` — Search index
- `CHEATSHEET.md` — Full command reference
- `CONTRIBUTING.md` — How to contribute skills and patterns
- `knowledge/shared/harness-core/fh_integration_contract.md` — Governance layer spec

---

## Appendix

### Directory structure

```
forge-harness/
├── knowledge/              # Pure knowledge — time-independent, for reference
│   ├── domain/             # Domain-specific knowledge
│   └── shared/             # Cross-project patterns
│
├── tracks/                 # Work records per project — time-accumulated
│   └── {project_name}/
│       ├── session_*.md    # Session history
│       └── learnings/      # Accumulated feedback
│
├── plugins/                # fh-meta + fh-commons plugins
├── templates/              # Skeletons to copy for new projects
├── scripts/                # fh-gate.sh and automation scripts
├── docs/                   # Diagrams and reference assets
├── CATALOG.md              # Full search index
├── CLAUDE.md               # AI operating rules + Sync/Push protocol
└── CHEATSHEET.md           # Command cheat sheet
```

### Key terms

| Term | Definition |
|---|---|
| **Meta-harness** | A persistent hub connecting work, learnings, and patterns of N Claude Code projects for mutual reinforcement |
| **Launch pad effect** | Meta-harness as launch pad, not destination — passing through accelerates the starting line |
| **Shared skill pool** | Common skill/agent pool eliminating reinvention cost across teams and projects |
| **Environment engineering** | Not making the agent smarter, but making the environment easier for the agent to work in |
| **Harness engineering** | Per-project structures (rules, gates, context management) that make AI behave more consistently |
| **Meta harness engineering** | Cross-project system to measure, improve, and evolve harnesses — FH's core layer |
|
package/bin/fh-gate
ADDED
package/knowledge/shared/harness-core/fh_integration_contract.md
ADDED
|
@@ -0,0 +1,328 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: fh-integration-contract
|
|
3
|
+
description: FH governance gate interface specification — inputs, verdict format, findings schema, invocation patterns. Bridge layer item 3. Defines how OpenCode, OpenHuman, Hermes, and CI systems call FH gates and receive structured verdicts.
|
|
4
|
+
date: 2026-05-31
|
|
5
|
+
tags: [integration-contract, governance, opencode, hermes, openhuman, bridge-layer, v2-paper]
|
|
6
|
+
---
|
|
7
|
+
|
|
8
|
+
# FH Integration Contract
|
|
9
|
+
|
|
10
|
+
## Status
|
|
11
|
+
|
|
12
|
+
**v1.0 — Binary available.** `scripts/fh-gate.sh` executes governance review end-to-end via `claude --print`.
|
|
13
|
+
CI-ready: machine-parseable verdict + exit codes (0=PASS / 1=PENDING / 2=BLOCKED / 3=ESCALATE / 10=harness error / 11=argument error).
|
|
14
|
+
Backward-compatible: `FH_DRY_RUN=1` restores prompt-only (v0.1) behavior.
|
|
15
|
+
|
|
16
|
+
---
|
|
17
|
+
|
|
18
|
+
## The Interface in One Diagram
|
|
19
|
+
|
|
20
|
+
```
|
|
21
|
+
Caller (OpenCode / OpenHuman / Hermes / CI)
|
|
22
|
+
│
|
|
23
|
+
│ provides: file list + diff path + gate level + caller ID
|
|
24
|
+
▼
|
|
25
|
+
FH governance gate (steel-quench + pipeline-conductor)
|
|
26
|
+
│
|
|
27
|
+
│ returns: verdict + findings list + record path
|
|
28
|
+
▼
|
|
29
|
+
Caller reads verdict, decides: merge / hold / escalate
|
|
30
|
+
```
|
|
31
|
+
|
|
32
|
+
---
|
|
33
|
+
|
|
34
|
+
## Input Specification
|
|
35
|
+
|
|
36
|
+
### Required
|
|
37
|
+
|
|
38
|
+
| Input | Form | Description |
|
|
39
|
+
|---|---|---|
|
|
40
|
+
| `FH_TARGET_FILES` | newline-separated file paths (one per line) | Files to review (changed by caller). Use newlines, not spaces — space-separation breaks on paths with spaces. |
|
|
41
|
+
| `FH_CALLER` | string | Identifier of calling system (`opencode` · `hermes` · `openhuman` · `ci`) |
|
|
42
|
+
| `FH_GATE_LEVEL` | `quick` or `full` | `quick` = Axes 2+3 only; `full` = all 4 axes |
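
The newline requirement for `FH_TARGET_FILES` can be illustrated with a minimal sketch. The helper name `fh_list_files` is hypothetical and is not part of `fh-gate.sh`; it only shows one way a caller might walk the list line by line so that a path containing a space stays a single entry.

```bash
# Hypothetical helper (not part of fh-gate.sh): print one "file:" line per
# entry of a newline-separated list, skipping empty lines.
fh_list_files() {
  while IFS= read -r _fh_path; do
    [ -z "$_fh_path" ] || printf 'file: %s\n' "$_fh_path"
  done <<EOF
$1
EOF
}

# Example input: two paths, the second one contains a space.
FH_TARGET_FILES='src/a.ts
docs/my notes.md'
fh_list_files "$FH_TARGET_FILES"
```

Reading with `IFS= read -r` keeps `docs/my notes.md` intact, whereas splitting the same value on whitespace would produce three entries.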
|
|
43
|
+
|
|
44
|
+
### Optional
|
|
45
|
+
|
|
46
|
+
| Input | Form | Description |
|
|
47
|
+
|---|---|---|
|
|
48
|
+
| `FH_DIFF_PATH` | file path | Pre-generated diff file (skips Step 1 if provided) |
|
|
49
|
+
| `FH_TASK_DESCRIPTION` | string | What the caller was trying to accomplish (context for adversarial pass) |
|
|
50
|
+
| `FH_SECURITY_LENS` | `on` or `off` (default `off`) | Force security-adjacent focus in steel-quench |
|
|
51
|
+
|
|
52
|
+
### Capture pattern (caller's responsibility)
|
|
53
|
+
|
|
54
|
+
```bash
|
|
55
|
+
# Caller generates these before invoking FH:
|
|
56
|
+
# Note: newline-separated — do NOT use tr '\n' ' ' (breaks on paths with spaces)
|
|
57
|
+
FH_TARGET_FILES=$(git diff main..HEAD --name-only)
|
|
58
|
+
FH_DIFF_PATH=/tmp/fh_input_${FH_CALLER}_$(date +%Y%m%d_%H%M%S).diff
|
|
59
|
+
git diff main..HEAD > "$FH_DIFF_PATH"
|
|
60
|
+
FH_GATE_LEVEL=quick
|
|
61
|
+
FH_CALLER=opencode
|
|
62
|
+
```
|
|
63
|
+
|
|
64
|
+
---
|
|
65
|
+
|
|
66
|
+
## Verdict Specification
|
|
67
|
+
|
|
68
|
+
### Verdict values
|
|
69
|
+
|
|
70
|
+
| Verdict | Meaning | Caller action |
|
|
71
|
+
|---|---|---|
|
|
72
|
+
| `PASS` | No findings. Axes all green. | Proceed to merge / ship |
|
|
73
|
+
| `PENDING` | 1+ B-grade findings, or Axis 3/4 weak. No A-grade. | Proceed with awareness; log findings |
|
|
74
|
+
| `BLOCKED` | 1+ A-grade findings. Structural or security-critical. | Do not merge. Surface to developer. Re-run governance after fix. |
|
|
75
|
+
| `ESCALATE` | Ambiguous A-grade or out-of-scope for automated verdict. | Human decision required before merge. |
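
As a sketch of how a caller might turn a verdict string into the exit codes listed under Status, the mapping can be written as a small function. `verdict_to_exit` is a hypothetical name chosen for illustration; the choice to map anything unrecognized to 10 is an assumption that mirrors the fail-safe rule, not a statement about the shipped script.

```bash
# Hypothetical helper: map an FH verdict to the exit code from the Status section.
verdict_to_exit() {
  case "$1" in
    PASS)     echo 0 ;;
    PENDING)  echo 1 ;;
    BLOCKED)  echo 2 ;;
    ESCALATE) echo 3 ;;
    *)        echo 10 ;;  # empty or unknown verdict: treat as harness error
  esac
}

verdict_to_exit BLOCKED
```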
|
|
76
|
+
|
|
77
|
+
### Output format
|
|
78
|
+
|
|
79
|
+
FH writes verdict to stdout as structured text (human-readable + machine-parseable).
|
|
80
|
+
`FH_STATUS` is MANDATORY — it MUST precede the verdict so callers detect harness failures (LLM timeout, token exhaustion) that would otherwise silently appear as PASS.
|
|
81
|
+
|
|
82
|
+
```
|
|
83
|
+
FH_STATUS: SUCCESS
|
|
84
|
+
FH_GATE_VERDICT: BLOCKED
|
|
85
|
+
FH_CALLER: opencode
|
|
86
|
+
FH_TIMESTAMP: 2026-05-31T12:00:00Z
|
|
87
|
+
FH_FINDINGS_COUNT: 3
|
|
88
|
+
FH_FINDINGS_A: 2
|
|
89
|
+
FH_FINDINGS_B: 1
|
|
90
|
+
FH_RECORD_PATH: tracks/_meta/governance_log_2026-05-31.yaml
|
|
91
|
+
---
|
|
92
|
+
findings:
|
|
93
|
+
- grade: A
|
|
94
|
+
location: "prefix() lines 1-9"
|
|
95
|
+
title: "Short-token overflow — allowlist pattern may not cover bare commands"
|
|
96
|
+
evidence: "tokens.slice(0, arity) with arity=3 and len=2 returns 2 tokens; pattern 'git stash *' may not match bare 'git stash'"
|
|
97
|
+
fix: "Add test: expect(BashArity.prefix(['git', 'stash'])).toEqual(['git', 'stash']); add explicit comment that slice handles overflow"
|
|
98
|
+
- grade: A
|
|
99
|
+
location: "ARITY table lines 24-161"
|
|
100
|
+
title: "npx/opencode/claude absent — overly broad permission patterns"
|
|
101
|
+
evidence: "All npx <package> commands receive same 'npx *' pattern; security model weakened"
|
|
102
|
+
fix: "Add: npx: 2, opencode: 2, claude: 2, bunx: 2, uvx: 2"
|
|
103
|
+
```
|
|
104
|
+
|
|
105
|
+
**Key design decisions** (addressing Gemini Finding 1+2):
|
|
106
|
+
- Findings use YAML block (under `findings:` key) — NOT flat `FINDING_N_KEY: value`. This prevents delimiter ambiguity when `evidence` or `fix` spans multiple lines.
|
|
107
|
+
- `FH_STATUS: SUCCESS|ERROR` is mandatory first field — a missing or `ERROR` status means the harness itself failed (do NOT interpret as PASS).
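
Because the header fields and the YAML findings are separated by a `---` line, a caller can split them with standard tools. The sketch below is one possible approach; `fh_field` and `fh_findings_yaml` are hypothetical helper names, and the sample text is a shortened stand-in for real gate output.

```bash
# Hypothetical helper: print the value of the first "KEY: value" header line.
# Scanning stops at the '---' separator so keys inside findings never match.
fh_field() {
  awk -v key="$1" '
    /^---$/ { exit }
    index($0, key ": ") == 1 { print substr($0, length(key) + 3); exit }
  '
}

# Hypothetical helper: print everything after the first '---' line (the YAML block).
fh_findings_yaml() {
  awk 'found { print } /^---$/ { found = 1 }'
}

sample='FH_STATUS: SUCCESS
FH_GATE_VERDICT: BLOCKED
---
findings:
  - grade: A'

printf '%s\n' "$sample" | fh_field FH_GATE_VERDICT
printf '%s\n' "$sample" | fh_findings_yaml
```

The findings block can then be handed to any YAML parser, which is what keeps multi-line `evidence` or `fix` values unambiguous.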
|
|
108
|
+
|
|
109
|
+
### Parse recipe (for CI/CD integration)
|
|
110
|
+
|
|
111
|
+
```bash
|
|
112
|
+
# Parse verdict from FH output (file or stdin)
|
|
113
|
+
FH_OUT=$(cat /tmp/fh_verdict.txt)
|
|
114
|
+
STATUS=$(echo "$FH_OUT" | grep "^FH_STATUS:" | awk '{print $2}')
|
|
115
|
+
VERDICT=$(echo "$FH_OUT" | grep "^FH_GATE_VERDICT:" | awk '{print $2}')
|
|
116
|
+
A_COUNT=$(echo "$FH_OUT" | grep "^FH_FINDINGS_A:" | awk '{print $2}')
|
|
117
|
+
|
|
118
|
+
# Guard: treat harness failure as BLOCKED (fail-safe)
|
|
119
|
+
if [ "$STATUS" != "SUCCESS" ]; then
|
|
120
|
+
echo "FH harness error — treating as BLOCKED (fail-safe)"
|
|
121
|
+
exit 10  # harness error, matching the documented exit codes
|
|
122
|
+
fi
|
|
123
|
+
|
|
124
|
+
if [ "$VERDICT" = "BLOCKED" ] || [ "${A_COUNT:-0}" -gt "0" ]; then
|
|
125
|
+
echo "FH governance: BLOCKED — do not merge"
|
|
126
|
+
exit 2  # BLOCKED, matching the documented exit codes
|
|
127
|
+
fi
|
|
128
|
+
```
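
Since the verdict is produced by a model, a caller may also want to check that the reported verdict agrees with the reported counts before acting on it. The following is a minimal sketch of such a check, based on the verdict table above; `fh_verdict_consistent` is a hypothetical helper, and treating `ESCALATE` as always acceptable is an assumption.

```bash
# Hypothetical helper: succeed only if the verdict agrees with the finding counts.
# $1 = verdict, $2 = A-grade count, $3 = B-grade count (empty counts count as 0).
fh_verdict_consistent() {
  case "$1" in
    PASS)     [ "${2:-0}" -eq 0 ] && [ "${3:-0}" -eq 0 ] ;;  # no A or B findings
    PENDING)  [ "${2:-0}" -eq 0 ] ;;                          # no A-grade allowed
    BLOCKED)  [ "${2:-0}" -gt 0 ] ;;                          # needs at least one A
    ESCALATE) return 0 ;;                                     # human decides
    *)        return 1 ;;                                     # unknown verdict
  esac
}

if fh_verdict_consistent PENDING 2 1; then
  echo "consistent"
else
  echo "inconsistent: treat as BLOCKED"
fi
```

A mismatch is best handled the same way as a harness failure: hold the merge rather than trust either value.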
|
|
129
|
+
|
|
130
|
+
---
|
|
131
|
+
|
|
132
|
+
## Invocation Patterns
|
|
133
|
+
|
|
134
|
+
### Pattern 1 — Manual invocation from Claude Code session
|
|
135
|
+
|
|
136
|
+
The original (v0.1) form, which predates the `scripts/fh-gate.sh` binary. Run inside a Claude Code session:
|
|
137
|
+
|
|
138
|
+
```
|
|
139
|
+
Run FH governance pass (gate level: quick):
|
|
140
|
+
Target files: $FH_TARGET_FILES
|
|
141
|
+
Caller: opencode
|
|
142
|
+
Security lens: on (arity.ts is permission-adjacent)
|
|
143
|
+
|
|
144
|
+
Steps:
|
|
145
|
+
1. Read each file in target list
|
|
146
|
+
2. Run steel-quench adversarial pass — behavioral edge cases, untested contracts, security assumptions
|
|
147
|
+
3. Run pipeline-conductor --quick (Axes 2+3 only; full runs all 4 axes)
|
|
148
|
+
4. Output structured verdict in FH_GATE_VERDICT format
|
|
149
|
+
5. Log to tracks/_meta/governance_log_{date}.yaml
|
|
150
|
+
```
|
|
151
|
+
|
|
152
|
+
### Pattern 2 — Bash wrapper (approximation, no binary)
|
|
153
|
+
|
|
154
|
+
```bash
|
|
155
|
+
#!/usr/bin/env bash
|
|
156
|
+
# scripts/fh-gate.sh — FH governance wrapper (methodology layer only)
|
|
157
|
+
# Outputs a governance prompt to stdout for Claude Code to execute
|
|
158
|
+
|
|
159
|
+
set -euo pipefail
|
|
160
|
+
|
|
161
|
+
FH_TARGET_FILES="${1:-$(git diff main..HEAD --name-only)}"  # newline-separated, per the Input Specification
|
|
162
|
+
FH_GATE_LEVEL="${2:-quick}"
|
|
163
|
+
FH_CALLER="${3:-unknown}"
|
|
164
|
+
|
|
165
|
+
cat <<EOF
|
|
166
|
+
Run FH governance pass on: $FH_TARGET_FILES
|
|
167
|
+
Gate level: $FH_GATE_LEVEL | Caller: $FH_CALLER
|
|
168
|
+
|
|
169
|
+
Step 1: Read all target files
|
|
170
|
+
Step 2: steel-quench adversarial pass (behavioral edge cases, untested contracts, security assumptions)
|
|
171
|
+
Step 3: pipeline-conductor --$FH_GATE_LEVEL (4-axis: backward / adversarial / forward / record)
|
|
172
|
+
Step 4: Output in FH_GATE_VERDICT format (PASS / PENDING / BLOCKED / ESCALATE)
|
|
173
|
+
Step 5: Write findings to tracks/_meta/governance_log_$(date +%Y-%m-%d).yaml
|
|
174
|
+
EOF
|
|
175
|
+
```
|
|
176
|
+
|
|
177
|
+
Usage: `./scripts/fh-gate.sh "src/permission/arity.ts" quick opencode`
|
|
178
|
+
|
|
179
|
+
### Pattern 3 — Stop hook (automated post-session)
|
|
180
|
+
|
|
181
|
+
Add to project's `.claude/settings.json`:
|
|
182
|
+
|
|
183
|
+
```json
|
|
184
|
+
{
|
|
185
|
+
"hooks": {
|
|
186
|
+
"Stop": [{
|
|
187
|
+
"matcher": "",
|
|
188
|
+
"hooks": [{
|
|
189
|
+
"type": "command",
|
|
190
|
+
"command": "bash ~/projects/forge-harness/scripts/fh-gate.sh \"$(git diff main..HEAD --name-only | tr '\\n' ' ')\" quick auto >> /tmp/fh-governance-queue.txt"
|
|
191
|
+
}]
|
|
192
|
+
}]
|
|
193
|
+
}
|
|
194
|
+
}
|
|
195
|
+
```
|
|
196
|
+
|
|
197
|
+
Check queue on next session start: `cat /tmp/fh-governance-queue.txt`
|
|
198
|
+
|
|
199
|
+
---
|
|
200
|
+
|
|
201
|
+
## Caller-Specific Guidance
|
|
202
|
+
|
|
203
|
+
### OpenCode → FH
|
|
204
|
+
|
|
205
|
+
OpenCode generates code fast. FH governance runs after generation, before review.
|
|
206
|
+
|
|
207
|
+
```bash
|
|
208
|
+
# After opencode run completes:
|
|
209
|
+
FH_TARGET_FILES=$(git diff main..HEAD --name-only)  # newline-separated, per the Input Specification
|
|
210
|
+
FH_SECURITY_LENS=on # OpenCode touches broad surfaces; security lens default on
|
|
211
|
+
FH_GATE_LEVEL=quick
|
|
212
|
+
```
|
|
213
|
+
|
|
214
|
+
Full chain: `hermes.opencode` → code generated → `fh-gate.sh` → verdict → `hermes.github-code-review` → PR with governance findings inline.
|
|
215
|
+
|
|
216
|
+
### Hermes → FH
|
|
217
|
+
|
|
218
|
+
Hermes dispatches FH governance as a step in multi-agent pipelines:
|
|
219
|
+
|
|
220
|
+
```
|
|
221
|
+
# In hermes skill workflow:
|
|
222
|
+
After code generation step:
|
|
223
|
+
→ Dispatch FH governance: Read fh-gate.sh, execute with target_files=$CHANGED
|
|
224
|
+
→ If verdict=BLOCKED: halt pipeline, surface findings to user
|
|
225
|
+
→ If verdict=PENDING: attach findings to PR body, continue
|
|
226
|
+
→ If verdict=PASS: proceed to github-code-review step
|
|
227
|
+
```
|
|
228
|
+
|
|
229
|
+
See `hermes-agent/skills/autonomous-ai-agents/opencode/SKILL.md` for the full Hermes→OpenCode→FH chain.
|
|
230
|
+
|
|
231
|
+
### OpenHuman → FH
|
|
232
|
+
|
|
233
|
+
OpenHuman's Memory Tree stores conversation history. FH audit target: memory entries that reference code paths or technical decisions.
|
|
234
|
+
|
|
235
|
+
```bash
|
|
236
|
+
# FH harvest-loop on OpenHuman memory:
|
|
237
|
+
FH_TARGET_FILES=$(find ~/.openhuman/memory -name "*.md" -newer tracks/_meta/last_harvest.marker)
|
|
238
|
+
FH_GATE_LEVEL=quick
|
|
239
|
+
FH_CALLER=openhuman
|
|
240
|
+
```
|
|
241
|
+
|
|
242
|
+
FH validates: are memory entries grounded? Do referenced paths still exist? Are technical claims still accurate?
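
The "do referenced paths still exist?" check can be sketched in a few lines of shell. This is an illustration only: it assumes memory entries quote file paths in backticks, which the contract does not state, and `fh_missing_paths` is a hypothetical helper name.

```bash
# Hypothetical helper: list backtick-quoted paths in a Markdown entry that no
# longer exist under a given root directory.
# $1 = markdown entry, $2 = root directory to resolve paths against.
fh_missing_paths() {
  grep -o '`[^`]*/[^`]*`' "$1" | tr -d '`' | sort -u |
  while IFS= read -r _fh_path; do
    [ -e "$2/$_fh_path" ] || printf '%s\n' "$_fh_path"
  done
}

# Demo with a throwaway directory: one referenced path exists, one does not.
demo=$(mktemp -d)
mkdir -p "$demo/src"
touch "$demo/src/ok.ts"
printf 'Decision touches `src/ok.ts` and `src/gone.ts`.\n' > "$demo/entry.md"
fh_missing_paths "$demo/entry.md" "$demo"
rm -rf "$demo"
```

Only strings containing a `/` are treated as paths here, which keeps inline identifiers such as `FH_CALLER` out of the result.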
|
|
243
|
+
|
|
244
|
+
### CI/CD → FH
|
|
245
|
+
|
|
246
|
+
Pre-merge governance for AI-generated modules, run on pull requests:
|
|
247
|
+
|
|
248
|
+
```yaml
|
|
249
|
+
# .github/workflows/fh-governance.yml (future)
|
|
250
|
+
on:
|
|
251
|
+
pull_request:
|
|
252
|
+
paths: ['**/*.ts', '**/*.py']
|
|
253
|
+
|
|
254
|
+
jobs:
|
|
255
|
+
fh-governance:
|
|
256
|
+
steps:
|
|
257
|
+
- uses: actions/checkout@v4
|
|
258
|
+
- name: FH governance gate
|
|
259
|
+
run: |
|
|
260
|
+
CHANGED=$(git diff origin/main..HEAD --name-only | tr '\n' ' ')
|
|
261
|
+
bash scripts/fh-gate.sh "$CHANGED" quick ci
|
|
262
|
+
```
|
|
263
|
+
|
|
264
|
+
---
|
|
265
|
+
|
|
266
|
+
## Record Specification
|
|
267
|
+
|
|
268
|
+
Every governance pass writes a record entry:
|
|
269
|
+
|
|
270
|
+
```yaml
|
|
271
|
+
# tracks/_meta/governance_log_YYYY-MM-DD.yaml
|
|
272
|
+
- timestamp: 2026-05-31T12:00:00Z
|
|
273
|
+
caller: opencode
|
|
274
|
+
gate_level: quick
|
|
275
|
+
target_files:
|
|
276
|
+
- packages/opencode/src/permission/arity.ts
|
|
277
|
+
verdict: BLOCKED
|
|
278
|
+
findings:
|
|
279
|
+
- grade: A
|
|
280
|
+
location: "prefix() lines 1-9"
|
|
281
|
+
title: "Short-token overflow"
|
|
282
|
+
- grade: A
|
|
283
|
+
location: "ARITY table lines 24-161"
|
|
284
|
+
title: "npx/opencode/claude absent"
|
|
285
|
+
- grade: B
|
|
286
|
+
location: "ARITY table + generation comment"
|
|
287
|
+
title: "No maintenance protocol"
|
|
288
|
+
calibration:
|
|
289
|
+
predicted_findings: 2
|
|
290
|
+
actual_findings: 3
|
|
291
|
+
delta: +1
|
|
292
|
+
```
|
|
293
|
+
|
|
294
|
+
Record path is included in every verdict output as `FH_RECORD_PATH`. This feeds `harvest-loop` calibration.
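
Because each record carries a `verdict:` line, a daily log can be summarized without a YAML parser. The sketch below counts records per verdict; `fh_count_verdict` is a hypothetical helper, and the sample log is an abbreviated stand-in for a real `governance_log_YYYY-MM-DD.yaml`.

```bash
# Hypothetical helper: count records in a governance log with a given verdict.
# $1 = log file, $2 = verdict (PASS, PENDING, BLOCKED, or ESCALATE).
fh_count_verdict() {
  # grep -c prints 0 but exits non-zero when nothing matches; "|| true" keeps
  # the function usable under "set -e".
  grep -c "^[[:space:]]*verdict:[[:space:]]*$2[[:space:]]*\$" "$1" || true
}

log=$(mktemp)
printf -- '- timestamp: t1\n  verdict: PASS\n- timestamp: t2\n  verdict: BLOCKED\n- timestamp: t3\n  verdict: PASS\n' > "$log"
fh_count_verdict "$log" PASS
fh_count_verdict "$log" BLOCKED
rm -f "$log"
```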
|
|
295
|
+
|
|
296
|
+
---
|
|
297
|
+
|
|
298
|
+
## What This Contract Does NOT Specify (Bridge Layer v1.0)
|
|
299
|
+
|
|
300
|
+
The following require the bridge layer and are out of scope for v0.1:
|
|
301
|
+
|
|
302
|
+
| Feature | Why deferred |
|
|
303
|
+
|---|---|
|
|
304
|
+
| Binary / installable package | FH is methodology layer; no runtime distribution yet |
|
|
305
|
+
| REST API or webhook | Would require a server process — FH is file-based |
|
|
306
|
+
| Streaming verdict updates | Requires runtime; methodology layer is synchronous |
|
|
307
|
+
| Multi-file parallel governance | Possible via agent dispatch today; not formalized here |
|
|
308
|
+
| Verdict caching | No state store beyond `tracks/`; governance runs fresh each time |
|
|
309
|
+
|
|
310
|
+
The bridge layer (v1.0) will implement these. This contract is the specification they implement against.
|
|
311
|
+
|
|
312
|
+
---
|
|
313
|
+
|
|
314
|
+
## Version History
|
|
315
|
+
|
|
316
|
+
| Version | Date | Change |
|
|
317
|
+
|---|---|---|
|
|
318
|
+
| v0.1 | 2026-05-31 | Initial specification. Bash invocation patterns + structured verdict format. Empirical basis: arity.ts controlled trial. |
|
|
319
|
+
|
|
320
|
+
---
|
|
321
|
+
|
|
322
|
+
## References
|
|
323
|
+
|
|
324
|
+
- `fh_opencode_governance_wrapper.md` — step-by-step usage guide (less formal, more tutorial)
|
|
325
|
+
- `fh_ecosystem_positioning.md` — ecosystem context + synergy map + v2 paper connection
|
|
326
|
+
- `tracks/_meta/fh_opencode_governance_experiment_2026_05_31.md` — empirical basis for verdict format
|
|
327
|
+
- `multi_model_sidecar_strategy.md` — multi-model orchestration (related pattern)
|
|
328
|
+
- FH paper (Zenodo: 10.5281/zenodo.20397566) — harness-as-durable-layer thesis this contract operationalizes
|
package/package.json
ADDED
|
@@ -0,0 +1,41 @@
|
|
|
1
|
+
{
|
|
2
|
+
"name": "@chrono-meta/fh-gate",
|
|
3
|
+
"version": "1.0.0",
|
|
4
|
+
"description": "FH governance gate — runs structured AI code review via claude --print and returns machine-parseable verdicts (PASS/PENDING/BLOCKED/ESCALATE).",
|
|
5
|
+
"license": "MIT",
|
|
6
|
+
"keywords": [
|
|
7
|
+
"ai-governance",
|
|
8
|
+
"code-review",
|
|
9
|
+
"claude",
|
|
10
|
+
"claude-code",
|
|
11
|
+
"ci",
|
|
12
|
+
"harness"
|
|
13
|
+
],
|
|
14
|
+
"repository": {
|
|
15
|
+
"type": "git",
|
|
16
|
+
"url": "https://github.com/chrono-meta/forge-harness.git"
|
|
17
|
+
},
|
|
18
|
+
"bin": {
|
|
19
|
+
"fh-gate": "./bin/fh-gate"
|
|
20
|
+
},
|
|
21
|
+
"scripts": {
|
|
22
|
+
"prepare": "chmod +x bin/fh-gate scripts/fh-gate.sh"
|
|
23
|
+
},
|
|
24
|
+
"engines": {
|
|
25
|
+
"node": ">=16"
|
|
26
|
+
},
|
|
27
|
+
"peerDependencies": {
|
|
28
|
+
"@anthropic-ai/claude-code": "*"
|
|
29
|
+
},
|
|
30
|
+
"peerDependenciesMeta": {
|
|
31
|
+
"@anthropic-ai/claude-code": {
|
|
32
|
+
"optional": false
|
|
33
|
+
}
|
|
34
|
+
},
|
|
35
|
+
"files": [
|
|
36
|
+
"bin/fh-gate",
|
|
37
|
+
"scripts/fh-gate.sh",
|
|
38
|
+
"knowledge/shared/harness-core/fh_integration_contract.md",
|
|
39
|
+
"README.md"
|
|
40
|
+
]
|
|
41
|
+
}
|
package/scripts/fh-gate.sh
ADDED
|
@@ -0,0 +1,239 @@
|
|
|
1
|
+
#!/usr/bin/env bash
|
|
2
|
+
# fh-gate.sh — FH governance gate v1.0
|
|
3
|
+
#
|
|
4
|
+
# Executes governance review end-to-end via claude --print.
|
|
5
|
+
# CI-ready: machine-parseable verdict + exit codes.
|
|
6
|
+
#
|
|
7
|
+
# Usage:
|
|
8
|
+
# ./scripts/fh-gate.sh [FILES] [LEVEL] [CALLER]
|
|
9
|
+
# ./scripts/fh-gate.sh # auto-detect from git diff
|
|
10
|
+
# ./scripts/fh-gate.sh "src/foo.ts src/bar.ts" # explicit files
|
|
11
|
+
# ./scripts/fh-gate.sh "src/foo.ts" full opencode # explicit level + caller
|
|
12
|
+
#
|
|
13
|
+
# Exit codes:
|
|
14
|
+
# 0 — PASS (no findings)
|
|
15
|
+
# 1 — PENDING (B-grade findings; proceed with awareness)
|
|
16
|
+
# 2 — BLOCKED (A-grade findings; do not merge)
|
|
17
|
+
# 3 — ESCALATE (human decision required)
|
|
18
|
+
# 10 — Harness error (claude unavailable, timeout, or FH_STATUS != SUCCESS)
|
|
19
|
+
# 11 — Argument error (invalid level, no files)
|
|
20
|
+
#
|
|
21
|
+
# Environment:
|
|
22
|
+
# FH_DRY_RUN=1 generate prompt only, skip claude invocation (v0.1 behavior)
|
|
23
|
+
# FH_MODEL=<model> claude model to use (default: claude-sonnet-4-6)
|
|
24
|
+
# FH_TIMEOUT=120 seconds before claude --print is killed (default: 120)
|
|
25
|
+
# FH_VERBOSE=1 print full claude output to stderr
|
|
26
|
+
# FH_RECORD_BASE=<p> directory for governance_log YAML (default: FH_ROOT/tracks/_meta)
|
|
27
|
+
|
|
28
|
+
set -euo pipefail
|
|
29
|
+
|
|
30
|
+
VERSION="1.0.0"
|
|
31
|
+
FH_ROOT="$(cd "$(dirname "$0")/.." && pwd)"
|
|
32
|
+
_TMPDIR="${TMPDIR:-/tmp}"
|
|
33
|
+
|
|
34
|
+
EXIT_PASS=0
|
|
35
|
+
EXIT_PENDING=1
|
|
36
|
+
EXIT_BLOCKED=2
|
|
37
|
+
EXIT_ESCALATE=3
|
|
38
|
+
EXIT_HARNESS_ERROR=10
|
|
39
|
+
EXIT_ARG_ERROR=11
|
|
40
|
+
|
|
41
|
+
TARGET_FILES="${1:-}"
|
|
42
|
+
GATE_LEVEL="${2:-quick}"
|
|
43
|
+
FH_CALLER="${3:-ci}"
|
|
44
|
+
|
|
45
|
+
FH_DRY_RUN="${FH_DRY_RUN:-0}"
|
|
46
|
+
FH_MODEL="${FH_MODEL:-claude-sonnet-4-6}"
|
|
47
|
+
FH_TIMEOUT="${FH_TIMEOUT:-120}"
|
|
48
|
+
FH_VERBOSE="${FH_VERBOSE:-0}"
|
|
49
|
+
|
|
50
|
+
# Smart record base: FH repo → tracks/_meta/; standalone npm install → ~/.fh/logs/
|
|
51
|
+
if [[ -z "${FH_RECORD_BASE:-}" ]]; then
|
|
52
|
+
if [[ -d "${FH_ROOT}/tracks/_meta" ]]; then
|
|
53
|
+
FH_RECORD_BASE="${FH_ROOT}/tracks/_meta"
|
|
54
|
+
else
|
|
55
|
+
FH_RECORD_BASE="${HOME}/.fh/logs"
|
|
56
|
+
mkdir -p "$FH_RECORD_BASE"
|
|
57
|
+
fi
|
|
58
|
+
fi
|
|
59
|
+
|
|
60
|
+
# --- Validation ---
|
|
61
|
+
if [[ "$GATE_LEVEL" != "quick" && "$GATE_LEVEL" != "full" ]]; then
|
|
62
|
+
echo "ERROR: gate level must be 'quick' or 'full' (got: $GATE_LEVEL)" >&2
|
|
63
|
+
exit $EXIT_ARG_ERROR
|
|
64
|
+
fi
|
|
65
|
+
|
|
66
|
+
# Auto-detect files from git diff (B1: configurable base branch)
|
|
67
|
+
FH_BASE_BRANCH="${FH_BASE_BRANCH:-main}"
|
|
68
|
+
if [[ -z "$TARGET_FILES" ]]; then
|
|
69
|
+
TARGET_FILES=$(git -C "$FH_ROOT" diff "${FH_BASE_BRANCH}..HEAD" --name-only 2>/dev/null | tr '\n' ' ' | xargs || true)
|
|
70
|
+
if [[ -z "$TARGET_FILES" ]]; then
|
|
71
|
+
TARGET_FILES=$(git -C "$FH_ROOT" status --short 2>/dev/null | awk '{print $2}' | tr '\n' ' ' | xargs || true)
|
|
72
|
+
fi
|
|
73
|
+
fi
|
|
74
|
+
|
|
75
|
+
if [[ -z "$TARGET_FILES" ]]; then
|
|
76
|
+
echo "ERROR: no files found (git diff returned empty; pass files explicitly)" >&2
|
|
77
|
+
exit $EXIT_ARG_ERROR
|
|
78
|
+
fi
|
|
79
|
+
|
|
80
|
+
# Security lens auto-detect
|
|
81
|
+
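SECURITY_LENS="${FH_SECURITY_LENS:-off}"  # honor the documented FH_SECURITY_LENS input; auto-detect below can still switch it on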
|
|
82
|
+
if echo "$TARGET_FILES" | grep -qiE "(permission|auth|token|secret|key|cred|security|vulnerability|csrf|inject|sanitize)"; then
|
|
83
|
+
SECURITY_LENS="on"
|
|
84
|
+
fi
|
|
85
|
+
|
|
86
|
+
TIMESTAMP=$(date -u +%Y-%m-%dT%H:%M:%SZ)  # -u: the trailing Z denotes UTC
|
|
87
|
+
RECORD_PATH="${FH_RECORD_BASE}/governance_log_$(date +%Y-%m-%d).yaml"
|
|
88
|
+
PROMPT_FILE=$(mktemp "${_TMPDIR}/fh_gate_prompt_XXXXXX")  # Xs must be trailing for BSD/macOS mktemp
|
|
89
|
+
OUTPUT_FILE=$(mktemp "${_TMPDIR}/fh_gate_output_XXXXXX")  # Xs must be trailing for BSD/macOS mktemp
|
|
90
|
+
ERR_FILE=$(mktemp "${_TMPDIR}/fh_gate_err_XXXXXX")  # Xs must be trailing for BSD/macOS mktemp
|
|
91
|
+
|
|
92
|
+
# Pre-compute values that need transformation (bash 3.2 compat — no ${VAR^^})
|
|
93
|
+
GATE_LEVEL_UPPER=$(echo "$GATE_LEVEL" | tr '[:lower:]' '[:upper:]')
|
|
94
|
+
FILES_LIST=$(echo "$TARGET_FILES" | tr ' ' '\n' | grep -v '^$' | sed 's/^/ - /')
|
|
95
|
+
SECURITY_EXTRA=""
|
|
96
|
+
[ "$SECURITY_LENS" = "on" ] && SECURITY_EXTRA=", permission model gaps"
|
|
97
|
+
|
|
98
|
+
if [ "$GATE_LEVEL" = "quick" ]; then
|
|
99
|
+
AXES_BLOCK=" - Axis 2 (Adversarial): findings from Step 2
|
|
100
|
+
- Axis 3 (Forward): phantom references, broken paths, stale claims"
|
|
101
|
+
else
|
|
102
|
+
AXES_BLOCK=" - Axis 1 (Backward): regression risk vs prior version
|
|
103
|
+
- Axis 2 (Adversarial): findings from Step 2
|
|
104
|
+
- Axis 3 (Forward): phantom references, broken paths, stale claims
|
|
105
|
+
- Axis 4 (Record): calibration log entry"
|
|
106
|
+
fi
|
|
107
|
+
|
|
108
|
+
cleanup() { rm -f "$PROMPT_FILE" "$OUTPUT_FILE" "$ERR_FILE"; }
|
|
109
|
+
trap cleanup EXIT
|
|
110
|
+
|
|
111
|
+
# --- Build prompt ---
|
|
112
|
+
cat > "$PROMPT_FILE" <<PROMPT
|
|
113
|
+
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
|
|
114
|
+
FH GOVERNANCE GATE v${VERSION} — ${GATE_LEVEL_UPPER} PASS
|
|
115
|
+
Caller: ${FH_CALLER} | Timestamp: ${TIMESTAMP}
|
|
116
|
+
Security lens: ${SECURITY_LENS}
|
|
117
|
+
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
|
|
118
|
+
|
|
119
|
+
Target files:
|
|
120
|
+
${FILES_LIST}
|
|
121
|
+
|
|
122
|
+
Execute these steps in order:
|
|
123
|
+
|
|
124
|
+
Step 1 — Read all target files listed above.
|
|
125
|
+
|
|
126
|
+
Step 2 — Adversarial pass (steel-quench angles):
|
|
127
|
+
Focus: behavioral edge cases, untested contracts, security assumptions${SECURITY_EXTRA}
|
|
128
|
+
Find findings and grade each: A=blocking / B=warning / C=note.
|
|
129
|
+
|
|
130
|
+
Step 3 — pipeline-conductor --${GATE_LEVEL}:
|
|
131
|
+
${AXES_BLOCK}
|
|
132
|
+
|
|
133
|
+
Step 4 — Output structured verdict. EXACT FORMAT REQUIRED (machine-parsed):
|
|
134
|
+
|
|
135
|
+
FH_STATUS: SUCCESS
|
|
136
|
+
FH_GATE_VERDICT: [PASS|PENDING|BLOCKED|ESCALATE]
|
|
137
|
+
FH_CALLER: ${FH_CALLER}
|
|
138
|
+
FH_TIMESTAMP: ${TIMESTAMP}
|
|
139
|
+
FH_FINDINGS_COUNT: [N]
|
|
140
|
+
FH_FINDINGS_A: [N]
|
|
141
|
+
FH_FINDINGS_B: [N]
|
|
142
|
+
FH_RECORD_PATH: ${RECORD_PATH}
|
|
143
|
+
---
|
|
144
|
+
findings:
|
|
145
|
+
- grade: [A|B|C]
|
|
146
|
+
location: "[file:line or function name]"
|
|
147
|
+
title: "[one-line description]"
|
|
148
|
+
evidence: "[what was observed in the file]"
|
|
149
|
+
fix: "[concrete suggestion]"
|
|
150
|
+
|
|
151
|
+
Verdict rules:
|
|
152
|
+
A-grade present → BLOCKED
|
|
153
|
+
B-grade only → PENDING
|
|
154
|
+
No findings → PASS
|
|
155
|
+
Ambiguous A → ESCALATE
|
|
156
|
+
|
|
157
|
+
FH_STATUS MUST appear first. Missing or ERROR status = harness failure.
|
|
158
|
+
|
|
159
|
+
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
|
|
160
|
+
PASS=ship | PENDING=proceed with awareness | BLOCKED=fix first | ESCALATE=human decision
|
|
161
|
+
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
|
|
162
|
+
PROMPT
|
|
163
|
+
|
|
164
|
+
# --- Dry-run: prompt to stdout only (v0.1 behavior) ---
|
|
165
|
+
if [[ "$FH_DRY_RUN" == "1" ]]; then
|
|
166
|
+
cat "$PROMPT_FILE"
|
|
167
|
+
exit $EXIT_PASS
|
|
168
|
+
fi
|
|
169
|
+
|
|
170
|
+
# --- Require claude CLI ---
|
|
171
|
+
if ! command -v claude &>/dev/null; then
|
|
172
|
+
echo "ERROR: 'claude' CLI not found." >&2
|
|
173
|
+
echo " Install: https://claude.ai/code" >&2
|
|
174
|
+
echo " Prompt-only mode: FH_DRY_RUN=1 $0 $*" >&2
|
|
175
|
+
exit $EXIT_HARNESS_ERROR
|
|
176
|
+
fi
|
|
177
|
+
|
|
178
|
+
# --- Invoke ---
|
|
179
|
+
echo "→ fh-gate v${VERSION} [${GATE_LEVEL_UPPER}] caller=${FH_CALLER} security=${SECURITY_LENS}" >&2
|
|
180
|
+
echo " files: ${TARGET_FILES}" >&2
|
|
181
|
+
|
|
182
|
+
# Use gtimeout (macOS coreutils) if available, fall back to timeout, then bare invoke
|
|
183
|
+
_TIMEOUT_CMD=""
|
|
184
|
+
if command -v gtimeout &>/dev/null; then
|
|
185
|
+
_TIMEOUT_CMD="gtimeout ${FH_TIMEOUT}"
|
|
186
|
+
elif command -v timeout &>/dev/null; then
|
|
187
|
+
_TIMEOUT_CMD="timeout ${FH_TIMEOUT}"
|
|
188
|
+
fi
|
|
189
|
+
|
|
190
|
+
if ! ${_TIMEOUT_CMD} claude --print --model "$FH_MODEL" < "$PROMPT_FILE" > "$OUTPUT_FILE" 2>"$ERR_FILE"; then
|
|
191
|
+
echo "ERROR: claude --print failed or timed out (${FH_TIMEOUT}s)" >&2
|
|
192
|
+
cat "$ERR_FILE" >&2
|
|
193
|
+
exit $EXIT_HARNESS_ERROR
|
|
194
|
+
fi
|
|
195
|
+
|
|
196
|
+
[[ "$FH_VERBOSE" == "1" ]] && cat "$ERR_FILE" >&2
|
|
197
|
+
|
|
198
|
+
# --- Parse verdict (B3: -m 1 prevents concatenation on repeated header lines) ---
|
|
199
|
+
FH_STATUS=$(grep -m 1 "^FH_STATUS:" "$OUTPUT_FILE" 2>/dev/null | awk '{print $2}' | tr -d '[:space:]' || true)
|
|
200
|
+
VERDICT=$(grep -m 1 "^FH_GATE_VERDICT:" "$OUTPUT_FILE" 2>/dev/null | awk '{print $2}' | tr -d '[:space:]' || true)
|
|
201
|
+
|
|
202
|
+
# Harness failure guard (fail-safe: a missing or non-SUCCESS status exits with the harness-error code, never 0)
|
|
203
|
+
if [[ "$FH_STATUS" != "SUCCESS" ]]; then
|
|
204
|
+
echo "ERROR: FH_STATUS=${FH_STATUS:-MISSING} — harness failure (fail-safe: BLOCKED)" >&2
|
|
205
|
+
cat "$OUTPUT_FILE" >&2
|
|
206
|
+
exit $EXIT_HARNESS_ERROR
|
|
207
|
+
fi
|
|
208
|
+
|
|
209
|
+
# Emit structured output to stdout
|
|
210
|
+
cat "$OUTPUT_FILE"
|
|
211
|
+
|
|
212
|
+
# B4: Write governance log — structured header only (clean YAML, no raw markdown)
|
|
213
|
+
FINDINGS_A_LOG=$(grep -m 1 "^FH_FINDINGS_A:" "$OUTPUT_FILE" 2>/dev/null | awk '{print $2}' | tr -d '[:space:]' || echo "0")
|
|
214
|
+
FINDINGS_B_LOG=$(grep -m 1 "^FH_FINDINGS_B:" "$OUTPUT_FILE" 2>/dev/null | awk '{print $2}' | tr -d '[:space:]' || echo "0")
|
|
215
|
+
FINDINGS_N_LOG=$(grep -m 1 "^FH_FINDINGS_COUNT:" "$OUTPUT_FILE" 2>/dev/null | awk '{print $2}' | tr -d '[:space:]' || echo "0")
|
|
216
|
+
{
|
|
217
|
+
printf -- "- timestamp: %s\n" "$TIMESTAMP"
|
|
218
|
+
printf " caller: %s\n" "$FH_CALLER"
|
|
219
|
+
printf " gate_level: %s\n" "$GATE_LEVEL"
|
|
220
|
+
printf " verdict: %s\n" "$VERDICT"
|
|
221
|
+
printf " findings_total: %s\n" "$FINDINGS_N_LOG"
|
|
222
|
+
printf " findings_a: %s\n" "$FINDINGS_A_LOG"
|
|
223
|
+
printf " findings_b: %s\n" "$FINDINGS_B_LOG"
|
|
224
|
+
printf " files:\n"
|
|
225
|
+
echo "$TARGET_FILES" | tr ' ' '\n' | grep -v '^$' | sed 's/^/ - /'
|
|
226
|
+
printf "\n"
|
|
227
|
+
} >> "$RECORD_PATH" || echo "WARN: governance log write failed: $RECORD_PATH" >&2
|
|
228
|
+
|
|
229
|
+
# Exit code
|
|
230
|
+
case "$VERDICT" in
|
|
231
|
+
PASS) echo "→ verdict: PASS" >&2; exit $EXIT_PASS ;;
|
|
232
|
+
PENDING) echo "→ verdict: PENDING" >&2; exit $EXIT_PENDING ;;
|
|
233
|
+
BLOCKED) echo "→ verdict: BLOCKED" >&2; exit $EXIT_BLOCKED ;;
|
|
234
|
+
ESCALATE) echo "→ verdict: ESCALATE" >&2; exit $EXIT_ESCALATE ;;
|
|
235
|
+
*)
|
|
236
|
+
echo "ERROR: unrecognized verdict '${VERDICT:-EMPTY}' — fail-safe BLOCKED" >&2
|
|
237
|
+
exit $EXIT_HARNESS_ERROR
|
|
238
|
+
;;
|
|
239
|
+
esac
|