@chrono-meta/fh-gate 1.2.2 → 1.4.0
- package/AGENTS.md +7 -4
- package/CATALOG.md +6 -1
- package/CHEATSHEET.md +125 -1
- package/CLAUDE.md +49 -6
- package/README.md +79 -20
- package/docs/codex-compat.md +4 -4
- package/docs/pillars.svg +26 -29
- package/knowledge/shared/harness-core/fh_integration_contract.md +1 -1
- package/package.json +1 -2
- package/plugins/fh-commons/skills/deliberation/SKILL.md +1 -1
- package/plugins/fh-meta/agents/beginner.md +104 -0
- package/{.claude → plugins/fh-meta}/agents/challenger.md +3 -1
- package/plugins/fh-meta/agents/expert.md +114 -0
- package/plugins/fh-meta/agents/main-player.md +106 -0
- package/plugins/fh-meta/skills/agent-composer/SKILL.md +2 -2
- package/plugins/fh-meta/skills/agent-composer/SKILL_detail.md +2 -2
- package/plugins/fh-meta/skills/apex-review/SKILL.md +1 -1
- package/plugins/fh-meta/skills/edit-manifest/SKILL.md +1 -1
- package/plugins/fh-meta/skills/harness-doctor/SKILL_detail.md +1 -1
- package/plugins/fh-meta/skills/install-wizard/SKILL.md +54 -30
- package/plugins/fh-meta/skills/marketplace-gate/SKILL.md +1 -1
- package/plugins/fh-meta/skills/phantom-quench/SKILL.md +248 -0
- package/plugins/fh-meta/skills/{source-grounding-audit → phantom-quench}/SKILL_detail.md +3 -3
- package/plugins/fh-meta/skills/pipeline-conductor/SKILL.md +10 -10
- package/plugins/fh-meta/skills/public-surface-audit/SKILL.md +77 -1
- package/plugins/fh-meta/skills/return-path-gate/SKILL.md +2 -2
- package/plugins/fh-meta/skills/sim-conductor/SKILL.md +91 -24
- package/plugins/fh-meta/skills/sim-conductor/SKILL_detail.md +18 -18
- package/plugins/fh-meta/skills/skill-splitter/SKILL.md +4 -4
- package/plugins/fh-meta/skills/skill-splitter/SKILL_detail.md +2 -2
- package/plugins/fh-meta/skills/source-grounding-audit/SKILL.md +27 -215
- package/plugins/fh-meta/skills/steel-quench/SKILL.md +24 -2
- package/plugins/fh-meta/skills/steel-quench/SKILL_detail.md +8 -8
- package/scripts/fh-gate.sh +3 -9
- package/scripts/fh-run.sh +1 -1
package/plugins/fh-meta/skills/source-grounding-audit/SKILL.md
CHANGED
|
@@ -1,230 +1,42 @@
|
|
|
1
1
|
---
|
|
2
2
|
name: source-grounding-audit
|
|
3
|
-
description:
|
|
4
|
-
|
|
5
|
-
|
|
3
|
+
description: >-
|
|
4
|
+
RENAMED to phantom-quench (2026-06-06, quench-series rebrand). Same skill, same ruleset — only the
|
|
5
|
+
label changed to fit the quench family (steel-quench · phantom-quench · goal-quench). Use
|
|
6
|
+
/phantom-quench. This alias is retained so old references and the v1 paper's name still resolve.
|
|
7
|
+
user-invocable: false
|
|
8
|
+
allowed-tools: []
|
|
6
9
|
model: sonnet
|
|
10
|
+
deprecated: true
|
|
11
|
+
deprecated_reason: renamed to phantom-quench (label-only; not a merge — same skill)
|
|
12
|
+
deprecated_date: 2026-06-06
|
|
13
|
+
successor: phantom-quench
|
|
7
14
|
---
|
|
8
15
|
|
|
9
|
-
# source-grounding-audit —
|
|
16
|
+
# source-grounding-audit — RENAMED to `phantom-quench`
|
|
10
17
|
|
|
11
|
-
>
|
|
18
|
+
> **Renamed to `phantom-quench` (2026-06-06).** This is a **label rename, not a deprecation-by-merge** —
|
|
19
|
+
> the skill is unchanged and fully active under its new name. Invoke **`/phantom-quench`**.
|
|
12
20
|
|
|
13
|
-
|
|
21
|
+
## Why the rename
|
|
14
22
|
|
|
15
|
-
|
|
23
|
+
phantom-quench is the **grounding member of the quench series** (steel-quench attacks output patterns ·
|
|
24
|
+
phantom-quench traces inputs for Phantom Claims · goal-quench gates autonomous runs). The old descriptive
|
|
25
|
+
name did not signal that family membership; the function is identical.
|
|
16
26
|
|
|
17
|
-
|
|
18
|
-
|---|---|---|
|
|
19
|
-
| **Attack target** | Output patterns (self-declarations, cushion language, reason for existence) | Input tracing (is the claim in the source?) |
|
|
20
|
-
| **Core question** | "Is this structure flawed?" | "Where did this content come from?" |
|
|
21
|
-
| **Activation timing** | All-angle quench just before completion | Immediately after source-based artifact generation or at point of suspicion |
|
|
22
|
-
| **Primary attack vector** | Bus factor, self-reference, platform obsolescence | Phantom Claim, source not read, fabricated branching conditions |
|
|
23
|
-
| **Representative pattern** | "Declaration only, no evidence" | "Number in TC that doesn't exist in source" |
|
|
27
|
+
## Where the skill lives now
|
|
24
28
|
|
|
25
|
-
|
|
29
|
+
`plugins/fh-meta/skills/phantom-quench/SKILL.md` (+ `SKILL_detail.md`) — full ruleset preserved
|
|
30
|
+
(S-grade blocker, Human Gate, Pattern Diagnosis, etc.).
|
|
26
31
|
|
|
27
|
-
|
|
28
|
-
|
|
29
|
-
## Trigger Phrases
|
|
30
|
-
|
|
31
|
-
| Phrase | Situation |
|
|
32
|
-
|---|---|
|
|
33
|
-
| "phantom detection", "phantom claim", "false claim detection" | Full artifact Phantom scan (primary trigger) |
|
|
34
|
-
| "source back-trace", "source audit" | Analysis report, design document verification |
|
|
35
|
-
| "verify source", "where did this come from" | Suspecting origin of a specific claim |
|
|
36
|
-
| "TC evidence tracing", "TC source verification" | Post-TC-generation source consistency check |
|
|
37
|
-
| "grounding audit", "source grounding audit" | Full artifact Phantom scan |
|
|
38
|
-
| "verify evidence files" | Analysis report, design document verification |
|
|
39
|
-
| `/source-grounding-audit` | Explicit call |
|
|
40
|
-
|
|
41
|
-
---
|
|
42
|
-
|
|
43
|
-
## Core Concept — Phantom Claim
|
|
44
|
-
|
|
45
|
-
**Phantom Claim**: A claim that appears in the artifact but cannot be found in the declared source files.
|
|
46
|
-
|
|
47
|
-
3 paths through which Phantoms are produced:
|
|
48
|
-
|
|
49
|
-
| Path | Description | Risk |
|
|
50
|
-
|---|---|:---:|
|
|
51
|
-
| **Source not read** | AI generates artifact using domain knowledge without Read-ing source | S |
|
|
52
|
-
| **Partial reading** | Source partially read, rest filled in with inference | A |
|
|
53
|
-
| **Reconstruction contamination** | Source was read but LLM modified values/conditions during paraphrase | A |
|
|
54
|
-
|
|
55
|
-
---
|
|
56
|
-
|
|
57
|
-
## Execution Steps
|
|
58
|
-
|
|
59
|
-
### Step 0. Confirm Audit Target
|
|
60
|
-
|
|
61
|
-
If not provided by user, explicitly confirm: artifact file path, declared source files, and audit scope. Source not declared = S-grade blocker registered immediately.
|
|
62
|
-
|
|
63
|
-
> **Detail**: See `SKILL_detail.md §Step0-Detail` — confirmation output format and simplification guard — read when audit target or source list is ambiguous.
|
|
64
|
-
|
|
65
|
-
---
|
|
66
|
-
|
|
67
|
-
### Step 0.5. Claim Distribution Profile
|
|
68
|
-
|
|
69
|
-
> **Schema**: `knowledge/shared/harness-core/tpa_schema.md` — `phantom_risk` derivation rule, gate trigger conditions, §Gate Routing Table.
|
|
70
|
-
|
|
71
|
-
Runs after Step 0 (target + source confirmed). Skip if user specifies scope explicitly.
|
|
72
|
-
|
|
73
|
-
Scan artifact quickly to classify claim distribution:
|
|
74
|
-
|
|
75
|
-
| Dimension | Signal → Audit depth shift |
|
|
76
|
-
|---|---|
|
|
77
|
-
| `claim_density` | > 10 claims → full Step 1-4 audit; ≤ 3 claims → light (S+A only) |
|
|
78
|
-
| `artifact_type` | SKILL.md/design-doc → prioritize Branch/State-transition claims; code → prioritize Proper-noun/API claims |
|
|
79
|
-
| `risk_level` | external publish / arXiv citations → all claim types, max depth |
|
|
80
|
-
| `source_count` | 0 declared sources → S-grade blocker immediately (skip to Step 3 prescription) |
|
|
81
|
-
| `quantitative_density` | > 3 numerical claims → focus numerical+range types first |
|
|
82
|
-
|
|
83
|
-
Scope recommendation output:
|
|
84
|
-
```
|
|
85
|
-
Claim types to prioritize: [list]
|
|
86
|
-
Audit depth: [full | prioritized | light]
|
|
87
|
-
Immediate blockers detected: [yes/no — 0 sources = immediate S-grade]
|
|
88
|
-
```
|
|
89
|
-
|
|
90
|
-
**0-source behavioral rule**: When artifact has 0 declared sources, skip Steps 1-2 entirely and go directly to Step 3 with S-grade blocker: "Source not declared — all claims unverifiable."
|
|
91
|
-
|
|
92
|
-
---
|
|
93
|
-
|
|
94
|
-
### Step 1. Claim Extraction (Artifact Scan)
|
|
95
|
-
|
|
96
|
-
Extract claims from the artifact that require source back-tracing. Claim types: Proper nouns (highest), Numerical/range values (highest), Branching conditions (highest), State transitions (high), Preconditions (high), Actors (medium). Exclude generic test methodology descriptions and generic UI patterns.
|
|
97
|
-
|
|
98
|
-
> **Detail**: See `SKILL_detail.md §Step1-Detail` — full claim types table with examples, exclude list, and Step 1 output format template — read when deciding which claims to include or format the extraction results.
|
|
99
|
-
|
|
100
|
-
---
|
|
101
|
-
|
|
102
|
-
### Step 2. Source Read + Back-Trace
|
|
103
|
-
|
|
104
|
-
Back-trace each claim to the declared source files using Read + Grep directly — no inference judgment. Partial match is not treated as match.
|
|
105
|
-
|
|
106
|
-
Back-tracing classification:
|
|
107
|
-
|
|
108
|
-
| Classification | Criteria | Marking |
|
|
109
|
-
|---|---|:---:|
|
|
110
|
-
| **Grounded** | Claim directly confirmed in source | ✅ |
|
|
111
|
-
| **Partial** | Similar content in source but not exact match — needs re-confirmation | ⚠️ |
|
|
112
|
-
| **Phantom** | Cannot be found in source | ❌ |
|
|
113
|
-
| **Source-Missing** | Source itself cannot be Read or was not declared | 🔴 |
|
|
114
|
-
|
|
115
|
-
> **Detail**: See `SKILL_detail.md §Step2-Detail` — back-tracing execution procedure, classification decision rules, and Step 2 output format template — read when handling edge cases or formatting results.
|
|
116
|
-
|
|
117
|
-
---
|
|
118
|
-
|
|
119
|
-
### Step 3. Phantom Classification + Prescription
|
|
120
|
-
|
|
121
|
-
Classify Phantom and Partial claims by severity and provide prescriptions.
|
|
122
|
-
|
|
123
|
-
**Severity classification criteria**:
|
|
32
|
+
## Record note (do not "fix")
|
|
124
33
|
|
|
125
|
-
|
|
126
|
-
|
|
127
|
-
|
|
128
|
-
| **A** (Must fix) | If this claim is wrong, TC cannot execute or runs wrong path | API endpoint names, field names, preconditions |
|
|
129
|
-
| **B** (Improvement recommended) | If this claim is wrong, TC can execute but intent may differ | Descriptive text, non-critical names |
|
|
130
|
-
|
|
131
|
-
Prescriptions: (1) Source Re-read — precisely re-read the relevant source section and fix; (2) Request source specification — when source doesn't exist or wasn't declared; (3) Delete/rewrite — delete claims without source grounding and rewrite from source.
|
|
132
|
-
|
|
133
|
-
> **Detail**: See `SKILL_detail.md §Step3-Detail` — prescription procedures and Step 3 output format template — read when writing the classification table or applying a prescription.
|
|
134
|
-
|
|
135
|
-
**S-grade Immediate Human Gate** — if 1+ S-grade Phantoms found, pause before Step 4/5 and surface:
|
|
136
|
-
|
|
137
|
-
```
|
|
138
|
-
⚠️ source-grounding-audit: N S-grade Phantom(s) found:
|
|
139
|
-
- [claim 1 — one-line summary, location]
|
|
140
|
-
- [claim 2 — one-line summary, location]
|
|
141
|
-
|
|
142
|
-
Options:
|
|
143
|
-
(a) Continue — AI proceeds to Step 4 pattern diagnosis + Step 5 re-audit
|
|
144
|
-
(b) Human review first — inspect Phantoms directly, then proceed
|
|
145
|
-
(c) Abort — fix sources manually and re-run audit
|
|
146
|
-
|
|
147
|
-
Waiting for input. (Default: a)
|
|
148
|
-
```
|
|
149
|
-
|
|
150
|
-
Rationale: S-grade Phantoms that enter Step 5 re-audit without human review risk LLM reconstruction contamination — the same pattern that originally produced the Phantoms can "verify" its own fixes. Human review at this threshold breaks the loop.
|
|
151
|
-
|
|
152
|
-
---
|
|
153
|
-
|
|
154
|
-
### Step 4. Source Not-Read Pattern Detection (Meta Diagnosis)
|
|
155
|
-
|
|
156
|
-
Analyze Phantom distribution to diagnose structural problems in the artifact generation process. Reveal "why were these Phantoms produced", not just "this TC is wrong".
|
|
157
|
-
|
|
158
|
-
**Pattern detection criteria**:
|
|
159
|
-
|
|
160
|
-
| Pattern | Detection Condition | Meaning |
|
|
161
|
-
|---|---|---|
|
|
162
|
-
| **Source not read** | 3+ Phantoms and no or partial source Read history | AI generated using domain knowledge without reading source |
|
|
163
|
-
| **Partial reading contamination** | Partial items exceed 30% of total | AI read source partially and filled rest with inference |
|
|
164
|
-
| **Reconstruction modification** | Source value exists but unit/format/range modified in TC | LLM paraphrase process contamination |
|
|
165
|
-
| **Source declaration absent** | Source file not specified when generating artifact | Process design stage problem |
|
|
166
|
-
|
|
167
|
-
**Simplification guard**: If 0 Phantoms, skip Step 4 entirely. Replace with one line: "Source grounding adequate."
|
|
168
|
-
|
|
169
|
-
> **Detail**: See `SKILL_detail.md §Step4-Detail` — Step 4 output format template — read when writing the pattern diagnosis section.
|
|
170
|
-
|
|
171
|
-
---
|
|
172
|
-
|
|
173
|
-
### Step 5. Post-Fix Re-audit (Optional)
|
|
174
|
-
|
|
175
|
-
Re-run back-trace for S-grade blocker claims after fixes are complete. Activate when 1+ S-grade blockers exist and fix is immediately possible.
|
|
176
|
-
|
|
177
|
-
**Done When (re-audit)**: Back-trace results for fixed claims all show Grounded (✅) status.
|
|
178
|
-
|
|
179
|
-
---
|
|
180
|
-
|
|
181
|
-
## Completion Declaration Format
|
|
182
|
-
|
|
183
|
-
> **Template**: See `SKILL_detail.md §Report-Template` — full completion declaration format — read when producing the final audit summary.
|
|
184
|
-
|
|
185
|
-
---
|
|
186
|
-
|
|
187
|
-
## Connected Skills
|
|
188
|
-
|
|
189
|
-
| Situation | Connected Skill |
|
|
190
|
-
|---|---|
|
|
191
|
-
| Simultaneously verify output patterns (self-declarations, cushion language) | `/steel-quench` Wave 1 "real-use verification" angle |
|
|
192
|
-
| Re-verify Phantom patterns from external user perspective | `/sim-conductor Area A` |
|
|
193
|
-
| Source not-read is a harness structure problem | `/harness-doctor` |
|
|
194
|
-
| Phantom pattern is a candidate for new rule items | `fh-meta:persona-innovator` |
|
|
195
|
-
| Redesign the artifact generation prompt itself | `/meta-prompt-builder` |
|
|
196
|
-
|
|
197
|
-
---
|
|
198
|
-
|
|
199
|
-
## External User Environment Adaptation
|
|
200
|
-
|
|
201
|
-
This skill can be used independently without the full meta-harness structure.
|
|
202
|
-
|
|
203
|
-
**How to declare source files**: When generating artifacts, specify "source: [file path list]", or provide source files when invoking this skill.
|
|
204
|
-
|
|
205
|
-
**External environment fallback**:
|
|
206
|
-
- If no `tracks/_meta/` → skip persistence step
|
|
207
|
-
- If no project-specific rules (like PFD) → output Phantom pattern summary only
|
|
208
|
-
|
|
209
|
-
---
|
|
34
|
+
The **v1 paper** (Zenodo 10.5281/zenodo.20397566; arXiv submission) cites `source-grounding-audit`.
|
|
35
|
+
That is the **immutable historical name**, not a phantom — `paper/forge_harness_v1.0.html` is left
|
|
36
|
+
unchanged by design. Future readers map: *source-grounding-audit (v1 paper) = phantom-quench (current)*.
|
|
210
37
|
|
|
211
38
|
## Done When
|
|
212
39
|
|
|
213
|
-
|
|
214
|
-
|
|
215
|
-
|
|
216
|
-
+ Step 3 Phantom severity classification + prescription output
|
|
217
|
-
+ Step 4 process pattern diagnosis complete (skip if 0 Phantoms)
|
|
218
|
-
+ "source-grounding-audit Complete" declaration output
|
|
219
|
-
```
|
|
220
|
-
|
|
221
|
-
Verdict: PASS (0 Phantom claims) | CONDITIONAL_PASS (LOW-severity Phantoms only, prescriptions noted) | FAIL (1+ HIGH/MEDIUM Phantom — broken path, phantom file, or stale external link) | ESCALATE (scope unclear or claim extraction impossible)
|
|
222
|
-
|
|
223
|
-
---
|
|
224
|
-
|
|
225
|
-
## Operating Notes
|
|
226
|
-
|
|
227
|
-
- **Never back-trace by inference**: Judging "this value is probably in the source" treats it as Partial not Phantom. Always directly confirm with Read + Grep.
|
|
228
|
-
- **Partial is not Grounded**: Processing similar-value-in-source as Grounded misses the reconstruction modification pattern.
|
|
229
|
-
- **Source not declared itself is S-grade**: If source is not declared when making an artifact, no claim can subsequently be verified. Recommend mandating source declaration in the process design stage.
|
|
230
|
-
- **Recommended to use with steel-quench**: steel-quench quenches structural flaws, source-grounding-audit ensures source consistency. The two skills are orthogonal and artifact quality assurance is strengthened when used together.
|
|
40
|
+
Deprecated alias — no active execution path of its own. Done When: all invocation routes through
|
|
41
|
+
`/phantom-quench` (the successor); this entry exists only so old names resolve. Satisfies the
|
|
42
|
+
harness-doctor L2 M-tier Done-When requirement (CLAUDE.md §New Skill Creation Pre-Commit Gate).
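The alias frontmatter in this hunk carries `deprecated: true` and `successor: phantom-quench`. As a rough illustration of how a caller could follow that pointer, here is a minimal sketch. It assumes the frontmatter is a flat `key: value` block between the first two `---` lines. The function name `resolve_skill` is hypothetical and is not part of the package.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped in the package): print the successor named in
# a SKILL.md frontmatter, or the skill's own name when it is not deprecated.
resolve_skill() {
  local skill_file="$1"
  awk '
    /^---[[:space:]]*$/ { fence++; next }   # count frontmatter delimiters
    fence != 1 { next }                     # only read inside the first block
    $1 == "name:"       { name = $2 }
    $1 == "deprecated:" { deprecated = $2 }
    $1 == "successor:"  { successor = $2 }
    END {
      if (deprecated == "true" && successor != "") print successor
      else print name
    }
  ' "$skill_file"
}
```

Called on the alias file shown above, `resolve_skill path/to/SKILL.md` would print `phantom-quench`. Keys such as `deprecated_reason:` are ignored because the match is on the exact key token.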
|
package/plugins/fh-meta/skills/steel-quench/SKILL.md
CHANGED
|
@@ -148,6 +148,27 @@ Wave 4 convergence = Wave 3 criteria + 3 AI-specific vectors actually reviewed +
|
|
|
148
148
|
|
|
149
149
|
---
|
|
150
150
|
|
|
151
|
+
## External-GT Adjudication (when the target has a public ground truth)
|
|
152
|
+
|
|
153
|
+
When quenching a **public artifact that has its own ground truth** — a repo's open issues, test suite, or
|
|
154
|
+
stated policy/threat-model (a frontier codebase, a sister project — *not* your own in-progress draft) — add
|
|
155
|
+
an adjudication pass after the panel produces findings. The panel (Wave 5 cross-family) gives decorrelated
|
|
156
|
+
detection; this pass adds the *external check* the panel cannot self-supply. For each finding, classify:
|
|
157
|
+
|
|
158
|
+
| Class | Test | Meaning |
|
|
159
|
+
|---|---|---|
|
|
160
|
+
| **Corroborated** | matches an OPEN issue / a failing test | independent rediscovery — strongest |
|
|
161
|
+
| **Novel** | no matching issue, but confirmed by logic or a written test | caught what the target missed |
|
|
162
|
+
| **Reframe / reject** | the target's own docs/policy/threat-model marks it intentional or out-of-scope | NOT a confident catch — a false positive |
|
|
163
|
+
|
|
164
|
+
The GT (not a cross-family vote) resolves contention objectively, and it catches the panel's own
|
|
165
|
+
**shared training-prior** false positives. Report only Corroborated + Novel as confident catches; a null
|
|
166
|
+
result on sound code is the correct answer, not a failure. **Basis**: 2026-06-06 frontier-quench sweep —
|
|
167
|
+
a single-family pass repeated still misses what cross-family catches, and a target's `SECURITY.md` reframed
|
|
168
|
+
"security" findings to "correctness" (its permission layer was UX, not a boundary).
|
|
169
|
+
|
|
170
|
+
---
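The three-class table in the External-GT Adjudication section can be read as a small decision procedure. The sketch below is one possible encoding, not the skill's implementation. It makes two assumptions the table does not state: the target's own policy takes precedence when a finding also matches an open issue, and a finding that fits no row is labeled `Unconfirmed` and never reported. The function names and the `yes`/`no` flags are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical classifier for a single finding. Arguments are "yes"/"no" flags:
#   $1 marked_intentional  target docs/policy/threat-model mark it intentional or out-of-scope
#   $2 matches_gt          matches an OPEN issue or a failing test
#   $3 confirmed           confirmed by logic or a written test
classify_finding() {
  local marked_intentional="$1" matches_gt="$2" confirmed="$3"
  if [ "$marked_intentional" = "yes" ]; then
    echo "Reframe/reject"   # assumption: the target's stated policy wins
  elif [ "$matches_gt" = "yes" ]; then
    echo "Corroborated"
  elif [ "$confirmed" = "yes" ]; then
    echo "Novel"
  else
    echo "Unconfirmed"      # assumption: outside the table, never a confident catch
  fi
}

# Only Corroborated and Novel are reported as confident catches.
is_confident_catch() {
  case "$1" in
    Corroborated|Novel) return 0 ;;
    *) return 1 ;;
  esac
}
```

A run in which every finding lands in `Reframe/reject` or `Unconfirmed` reports nothing, which matches the section's point that a null result on sound code is the correct answer.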
|
|
171
|
+
|
|
151
172
|
## Cross-Project Common Patterns (initial seed)
|
|
152
173
|
|
|
153
174
|
| # | Pattern Name | Description | Response Direction |
|
|
@@ -194,7 +215,7 @@ Verdict: PASS (zero S-grade, convergence reached) | CONDITIONAL_PASS (A/B-grade
|
|
|
194
215
|
| Attack angle is a harness structure problem | `/harness-doctor` | optional |
|
|
195
216
|
| After Wave convergence, propose new pattern rules | `fh-meta:persona-innovator` | optional |
|
|
196
217
|
| Wave 1 structure-specific attack (6-axis) | `fh-commons:quench-challenger` | priority |
|
|
197
|
-
| Back-trace whether claims exist in source files | `/
|
|
218
|
+
| Back-trace whether claims exist in source files | `/phantom-quench` | **mandatory** when `phantom_risk=true` OR `scope=external` (see tpa_schema.md §Gate Routing Table) |
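The new routing condition on the `/phantom-quench` row is a plain disjunction. A minimal sketch of that check follows, assuming `phantom_risk` and `scope` arrive as strings. The function name and the `internal` scope value used in the usage note are hypothetical, and the authoritative rule remains the one in `tpa_schema.md`.

```shell
#!/usr/bin/env bash
# Hypothetical check: succeed when /phantom-quench is mandatory for this run,
# i.e. when phantom_risk=true OR scope=external.
phantom_quench_required() {
  local phantom_risk="$1" scope="$2"
  [ "$phantom_risk" = "true" ] || [ "$scope" = "external" ]
}
```

For example, `phantom_quench_required false external` succeeds, while `phantom_quench_required false internal` fails and leaves the skill optional.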
|
|
198
219
|
|
|
199
220
|
**steel-quench → sim-conductor gate**: After Wave convergence in external-publish context, `/sim-conductor Area A` is the mandatory next step.
|
|
200
221
|
|
|
@@ -219,7 +240,8 @@ sim-conductor Area A (external user perspective)
|
|
|
219
240
|
- **Attacks without real code are invalid.** Abstract criticism is not included in Wave 1 results.
|
|
220
241
|
- **quench-challenger first.** Call fh-commons:quench-challenger in isolation in Wave 1 if available.
|
|
221
242
|
- **Always check self-referential pattern (P3).** Cross-validate Wave results with external criteria.
|
|
222
|
-
- **
|
|
243
|
+
- **Public target → adjudicate against external GT before claiming.** A finding the target's own docs/policy/threat-model marks intentional or out-of-scope is a false positive, not a catch. See §External-GT Adjudication.
|
|
244
|
+
- **Attack surface limit**: steel-quench attacks output content patterns. Phantom Claim detection → `phantom-quench`.
|
|
223
245
|
|
|
224
246
|
## Failure Fallback
|
|
225
247
|
|
package/plugins/fh-meta/skills/steel-quench/SKILL_detail.md
CHANGED
|
@@ -214,9 +214,9 @@ Default team-persona assignments:
|
|
|
214
214
|
|
|
215
215
|
| Team | CLI | Personas deployed |
|
|
216
216
|
|---|---|---|
|
|
217
|
-
| **T0 Claude** | Agent sub-agent (always present) | challenger · quench-challenger ·
|
|
218
|
-
| **T1 Gemini** | `gemini` pipe | devil ·
|
|
219
|
-
| **T2 Copilot** | `gh copilot suggest` | devil ·
|
|
217
|
+
| **T0 Claude** | Agent sub-agent (always present) | challenger · quench-challenger · expert |
|
|
218
|
+
| **T1 Gemini** | `gemini` pipe | devil · beginner · alternatives (challenger U1 lens) |
|
|
219
|
+
| **T2 Copilot** | `gh copilot suggest` | devil · expert |
|
|
220
220
|
| **T3 Ollama** | `ollama run {model}` | devil |
|
|
221
221
|
| **T4 Codex** | `npx @openai/codex exec` | devil · edge-case-hunter |
|
|
222
222
|
|
|
@@ -230,9 +230,9 @@ declare -A TEAM_RESULTS
|
|
|
230
230
|
if [[ " ${TEAMS[*]} " =~ " gemini " ]]; then
|
|
231
231
|
G_DEVIL=$(printf '[Devil] Adversarial reviewer, no prior context.\nFind 3 critical structural flaws — especially whether Done When criteria are binary and achievable.\nFormat: [issue · location · severity S/A/B]\n---\n%s' \
|
|
232
232
|
"$ARTIFACT_TAIL" | gemini 2>/dev/null) &
|
|
233
|
-
G_NEW=$(printf '[
|
|
233
|
+
G_NEW=$(printf '[Beginner] First-time user, zero background.\nFind 3 unclear or jargon-heavy points.\nFormat: [issue · location · severity]\n---\n%s' \
|
|
234
234
|
"$ARTIFACT_TAIL" | gemini 2>/dev/null) &
|
|
235
|
-
G_SKEP=$(printf '[
|
|
235
|
+
G_SKEP=$(printf '[Alternatives — challenger U1 lens] Pragmatic outsider.\nFind 3 "why not just X?" challenges.\nFormat: [issue · location · severity]\n---\n%s' \
|
|
236
236
|
"$ARTIFACT_TAIL" | gemini 2>/dev/null) &
|
|
237
237
|
wait
|
|
238
238
|
TEAM_RESULTS["gemini"]="$G_DEVIL
|
|
@@ -244,7 +244,7 @@ fi
|
|
|
244
244
|
if [[ " ${TEAMS[*]} " =~ " gh-copilot " ]]; then
|
|
245
245
|
GH_D=$(echo "[Devil] Find 3 critical flaws. Format: [issue · location · severity S/A/B]. Artifact: $ARTIFACT_TAIL" \
|
|
246
246
|
| gh copilot suggest -t shell 2>/dev/null) &
|
|
247
|
-
GH_E=$(echo "[
|
|
247
|
+
GH_E=$(echo "[Expert] Find 3 technical depth gaps. Format: [issue · location · severity]. Artifact: $ARTIFACT_TAIL" \
|
|
248
248
|
| gh copilot suggest -t shell 2>/dev/null) &
|
|
249
249
|
wait
|
|
250
250
|
TEAM_RESULTS["gh-copilot"]="$GH_D
|
|
@@ -443,11 +443,11 @@ External CLIs available: check at runtime via Step 0-pre bash detection
|
|
|
443
443
|
**Wave selection**:
|
|
444
444
|
```
|
|
445
445
|
Run: Wave 1 (claim density), Wave 2 (structural defense, weight↑),
|
|
446
|
-
Wave 3 (weight↑ — arXiv/DOI phantom risk; pair with /
|
|
446
|
+
Wave 3 (weight↑ — arXiv/DOI phantom risk; pair with /phantom-quench),
|
|
447
447
|
Wave 4 (novelty: new architecture)
|
|
448
448
|
Wave 5 (cross-team scope — activate if risk_level=high or user requests)
|
|
449
449
|
Skip: Phase 0 (unless user supplies an external bad-case doc)
|
|
450
450
|
External CLIs available: check at runtime
|
|
451
451
|
```
|
|
452
452
|
|
|
453
|
-
**Degraded coverage note**: Wave 3 without `/
|
|
453
|
+
**Degraded coverage note**: Wave 3 without `/phantom-quench` available → flag as "Axis 3 skipped (skill unavailable)" and note in residual risk card.
|
package/scripts/fh-gate.sh
CHANGED
|
@@ -329,16 +329,10 @@ if [[ "$FIRST_OUTPUT_LINE" != "FH_STATUS: SUCCESS" ]]; then
|
|
|
329
329
|
exit $EXIT_HARNESS_ERROR
|
|
330
330
|
fi
|
|
331
331
|
|
|
332
|
-
|
|
332
|
+
# Harness-failure guard is already enforced above: the first non-empty output line
|
|
333
|
+
# must be "FH_STATUS: SUCCESS" (see check at top of this block) or we exit HARNESS_ERROR.
|
|
333
334
|
VERDICT=$(grep -m 1 "^FH_GATE_VERDICT:" "$PARSE_FILE" 2>/dev/null | awk '{print $2}' | tr -d '[:space:]' || true)
|
|
334
335
|
|
|
335
|
-
# Harness failure guard (fail-safe: missing status → BLOCKED)
|
|
336
|
-
if [[ "$FH_STATUS" != "SUCCESS" ]]; then
|
|
337
|
-
echo "ERROR: FH_STATUS=${FH_STATUS:-MISSING} — harness failure (fail-safe: BLOCKED)" >&2
|
|
338
|
-
cat "$OUTPUT_FILE" >&2
|
|
339
|
-
exit $EXIT_HARNESS_ERROR
|
|
340
|
-
fi
|
|
341
|
-
|
|
342
336
|
# Emit structured output to stdout
|
|
343
337
|
cat "$PARSE_FILE"
|
|
344
338
|
|
|
@@ -368,7 +362,7 @@ case "$VERDICT" in
|
|
|
368
362
|
BLOCKED) echo "→ verdict: BLOCKED" >&2; exit $EXIT_BLOCKED ;;
|
|
369
363
|
ESCALATE) echo "→ verdict: ESCALATE" >&2; exit $EXIT_ESCALATE ;;
|
|
370
364
|
*)
|
|
371
|
-
echo "ERROR: unrecognized verdict '${VERDICT:-EMPTY}' —
|
|
365
|
+
echo "ERROR: unrecognized verdict '${VERDICT:-EMPTY}' — harness error, failing safe (commit not allowed)" >&2
|
|
372
366
|
exit $EXIT_HARNESS_ERROR
|
|
373
367
|
;;
|
|
374
368
|
esac
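To make the fail-safe behavior of this hunk easier to follow, the sketch below wraps the same `grep | awk | tr` extraction in a function and models only the verdict branches visible in the diff. The numeric exit codes are placeholders, since the real `EXIT_*` values are defined elsewhere in `fh-gate.sh` and are not shown here. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Same extraction pipeline as the hunk, wrapped in a function for illustration.
extract_verdict() {
  grep -m 1 "^FH_GATE_VERDICT:" "$1" 2>/dev/null | awk '{print $2}' | tr -d '[:space:]' || true
}

# Placeholder exit codes (assumption): the real values live in fh-gate.sh.
EXIT_BLOCKED=1
EXIT_ESCALATE=2
EXIT_HARNESS_ERROR=3

# Map a verdict to an exit code. Only the branches visible in the hunk are
# modeled here; anything else, including an empty verdict, fails safe.
verdict_to_exit() {
  case "$1" in
    BLOCKED)  echo "$EXIT_BLOCKED" ;;
    ESCALATE) echo "$EXIT_ESCALATE" ;;
    *)        echo "$EXIT_HARNESS_ERROR" ;;
  esac
}
```

A missing or malformed `FH_GATE_VERDICT:` line yields an empty string from `extract_verdict`, which falls through to the harness-error branch, so an unparseable result never allows the commit.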
|
package/scripts/fh-run.sh
CHANGED
|
@@ -35,7 +35,7 @@ Environment:
|
|
|
35
35
|
FH_DRY_RUN=1 Print assembled prompt only
|
|
36
36
|
|
|
37
37
|
Examples:
|
|
38
|
-
FH_BACKEND=codex fh-run --skill
|
|
38
|
+
FH_BACKEND=codex fh-run --skill phantom-quench --file docs/foo.md
|
|
39
39
|
FH_BACKEND=codex fh-run --agent fh-commons:quench-challenger --file plugins/fh-meta/skills/foo/SKILL.md
|
|
40
40
|
USAGE
|
|
41
41
|
}
|