@chrono-meta/fh-gate 1.2.2 → 1.4.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/AGENTS.md +7 -4
- package/CATALOG.md +6 -1
- package/CHEATSHEET.md +125 -1
- package/CLAUDE.md +49 -6
- package/README.md +79 -20
- package/docs/codex-compat.md +4 -4
- package/docs/pillars.svg +26 -29
- package/knowledge/shared/harness-core/fh_integration_contract.md +1 -1
- package/package.json +1 -2
- package/plugins/fh-commons/skills/deliberation/SKILL.md +1 -1
- package/plugins/fh-meta/agents/beginner.md +104 -0
- package/{.claude → plugins/fh-meta}/agents/challenger.md +3 -1
- package/plugins/fh-meta/agents/expert.md +114 -0
- package/plugins/fh-meta/agents/main-player.md +106 -0
- package/plugins/fh-meta/skills/agent-composer/SKILL.md +2 -2
- package/plugins/fh-meta/skills/agent-composer/SKILL_detail.md +2 -2
- package/plugins/fh-meta/skills/apex-review/SKILL.md +1 -1
- package/plugins/fh-meta/skills/edit-manifest/SKILL.md +1 -1
- package/plugins/fh-meta/skills/harness-doctor/SKILL_detail.md +1 -1
- package/plugins/fh-meta/skills/install-wizard/SKILL.md +54 -30
- package/plugins/fh-meta/skills/marketplace-gate/SKILL.md +1 -1
- package/plugins/fh-meta/skills/phantom-quench/SKILL.md +248 -0
- package/plugins/fh-meta/skills/{source-grounding-audit → phantom-quench}/SKILL_detail.md +3 -3
- package/plugins/fh-meta/skills/pipeline-conductor/SKILL.md +10 -10
- package/plugins/fh-meta/skills/public-surface-audit/SKILL.md +77 -1
- package/plugins/fh-meta/skills/return-path-gate/SKILL.md +2 -2
- package/plugins/fh-meta/skills/sim-conductor/SKILL.md +91 -24
- package/plugins/fh-meta/skills/sim-conductor/SKILL_detail.md +18 -18
- package/plugins/fh-meta/skills/skill-splitter/SKILL.md +4 -4
- package/plugins/fh-meta/skills/skill-splitter/SKILL_detail.md +2 -2
- package/plugins/fh-meta/skills/source-grounding-audit/SKILL.md +27 -215
- package/plugins/fh-meta/skills/steel-quench/SKILL.md +24 -2
- package/plugins/fh-meta/skills/steel-quench/SKILL_detail.md +8 -8
- package/scripts/fh-gate.sh +3 -9
- package/scripts/fh-run.sh +1 -1
|
@@ -26,15 +26,15 @@ category: Composability Gate
|
|
|
26
26
|
> See `README.md > Advanced Settings > Plugin Install` for detailed guide.
|
|
27
27
|
|
|
28
28
|
Run immediately after cloning forge-harness (FH), or when setting up a new project for the first time.
|
|
29
|
-
Sets up periodic notification structure
|
|
29
|
+
Sets up the periodic-audit notification structure: a permanent zshrc hook (`fh_audit_check.zsh`, runs on terminal start) plus FH's session-start mtime detection. Both surface a weekly-audit reminder when 7+ days have elapsed since the last `weekly_audit` — no persistent cron is used (a session-scoped scheduler cannot survive to fire on a later day).
|
|
30
30
|
|
|
31
31
|
## Key Terms
|
|
32
32
|
|
|
33
33
|
| Term | Definition |
|
|
34
34
|
|---|---|
|
|
35
35
|
| **sentinel** | An empty file that records whether a specific event (audit complete, install complete, etc.) has occurred. Created in `~/.cc_sentinels/`. |
|
|
36
|
-
| **CronCreate** | Claude Code built-in command — schedules periodic tasks valid for the current session. Disappears when session ends. |
|
|
37
36
|
| **zshrc hook** | Shell function added to `~/.zshrc`. Automatically runs on terminal start and applies permanently. |
|
|
37
|
+
| **session-start detection** | FH's durable weekly-audit cadence — at session start the mtime of the latest `weekly_audit_*` is checked and `/harvest-loop` is proposed if 7+ days elapsed (see CLAUDE.md Cadence Rules). No persistent scheduler required. |
|
|
38
38
|
|
|
39
39
|
## Execution Modes
|
|
40
40
|
|
|
@@ -51,7 +51,7 @@ Sets up periodic notification structure (zshrc hook) and weekly audit notificati
|
|
|
51
51
|
- **Per-item approval**: Select each item individually (Y approve / N skip / L later)
|
|
52
52
|
- **Double-confirm irreversible changes**: Preview before file writes and zshrc modifications
|
|
53
53
|
- **User review before PR creation**: Output PR parameters (title, base branch, included files, body) and get approval before execution. No automatic submission.
|
|
54
|
-
- **Periodic audit structure setup**: zshrc hook (permanently applied on terminal start) + sentinel initialization +
|
|
54
|
+
- **Periodic audit structure setup**: zshrc hook (permanently applied on terminal start) + sentinel initialization + session-start mtime detection (7-day threshold)
|
|
55
55
|
|
|
56
56
|
## Execution Steps
|
|
57
57
|
|
|
@@ -138,9 +138,13 @@ echo 'source ~/.cc_secrets/tokens.env' >> ~/.zshrc
|
|
|
138
138
|
**The following are environment detection procedures that CC executes automatically. No need for users to run manually.**
|
|
139
139
|
|
|
140
140
|
```bash
|
|
141
|
-
# Prompt injection pre-flight:
|
|
142
|
-
|
|
143
|
-
|
|
141
|
+
# Prompt injection pre-flight: scan config AND the project's AI-instruction surfaces — CLAUDE.md,
|
|
142
|
+
# AGENTS.md, .claude/rules/* — which are the higher-risk vectors in an unknown repo (not just shell/settings).
|
|
143
|
+
# Injection-SPECIFIC patterns only (override/exfil), since instruction files legitimately carry directives;
|
|
144
|
+
# advisory (recommend manual review), never an auto-block.
|
|
145
|
+
if grep -rIE "ignore (all )?previous|disregard (the )?above|exfiltrat|^# CLAUDE:|^# AI:|<instructions>" \
|
|
146
|
+
~/.zshrc .claude/settings.json CLAUDE.md AGENTS.md .claude/rules/ 2>/dev/null | grep -q .; then
|
|
147
|
+
echo "⚠️ AI-instruction / override pattern detected in config or instruction files — injection risk in an unknown repo. Review the listed files manually before proceeding."; fi
|
|
144
148
|
|
|
145
149
|
# FH location
|
|
146
150
|
echo "FH_DIR=${FH_DIR:-not set}"
|
|
@@ -164,13 +168,13 @@ python3 -c "import json,os; d=json.load(open(os.path.expanduser('~/.claude.json'
|
|
|
164
168
|
# zshrc hook status
|
|
165
169
|
grep -q "fh_audit_check.zsh" ~/.zshrc 2>/dev/null && echo "zshrc hook: present" || echo "zshrc hook: absent"
|
|
166
170
|
|
|
167
|
-
# Framework detection (
|
|
168
|
-
|
|
169
|
-
|
|
170
|
-
|
|
171
|
-
|
|
172
|
-
echo "Framework:
|
|
173
|
-
|
|
171
|
+
# Framework detection (optional) — only used to look for a matching OPTIONAL domain pattern pack.
|
|
172
|
+
# Generic: capture the framework name; the pattern-pack path is derived as {framework}_patterns.md.
|
|
173
|
+
# No pattern pack ships by default — this is a user-supplied extension point, absence is the normal state.
|
|
174
|
+
FRAMEWORK=""
|
|
175
|
+
for fw in streamlit django fastapi flask; do
|
|
176
|
+
if grep -qi "$fw" requirements.txt pyproject.toml 2>/dev/null; then FRAMEWORK="$fw"; echo "Framework: $fw detected"; break; fi
|
|
177
|
+
done
|
|
174
178
|
```
|
|
175
179
|
|
|
176
180
|
**Bootstrap guidance when FH_DIR is not set (stop immediately in Step 0):**
|
|
@@ -180,8 +184,10 @@ fi
|
|
|
180
184
|
1. Clone FH repo:
|
|
181
185
|
git clone https://github.com/chrono-meta/forge-harness ~/forge-harness
|
|
182
186
|
|
|
183
|
-
2. Set environment
|
|
187
|
+
2. Set environment variables:
|
|
184
188
|
export FH_DIR=~/forge-harness
|
|
189
|
+
export CC_HUB_DIR=$FH_DIR # FH hub dir (holds tracks/_audit for the weekly-audit mtime check);
|
|
190
|
+
# equals FH_DIR unless you run a separate hub clone
|
|
185
191
|
|
|
186
192
|
3. Install FH plugin in CC:
|
|
187
193
|
Settings → Plugins → Add → {FH_DIR}/plugins/fh-meta
|
|
@@ -194,11 +200,12 @@ fi
|
|
|
194
200
|
|
|
195
201
|
*(Run after Step 0-A·B pre-checks. Output results as environment card, then continue to Step 0-C.)*
|
|
196
202
|
|
|
197
|
-
Output detection results as **environment card**.
|
|
203
|
+
Output detection results as **environment card**. If a framework was detected AND you maintain a matching
|
|
204
|
+
optional domain pattern pack, reference it (none ship by default — absence is normal, never a gap):
|
|
198
205
|
```
|
|
199
|
-
📌
|
|
200
|
-
{CC_HUB_DIR}/knowledge/shared/
|
|
201
|
-
|
|
206
|
+
📌 {FRAMEWORK} project detected → optional domain pattern pack check
|
|
207
|
+
{CC_HUB_DIR}/knowledge/shared/{FRAMEWORK}_patterns.md loaded (only if you supplied it; not shipped by default)
|
|
208
|
+
If absent: skip silently — no pack is the expected default state.
|
|
202
209
|
```
|
|
203
210
|
|
|
204
211
|
```
|
|
@@ -219,7 +226,7 @@ install-wizard — Environment Detection
|
|
|
219
226
|
> **Core message**: FH is not something placed on top of an existing harness.
|
|
220
227
|
> It analyzes existing rules to remove duplicates — making things lighter.
|
|
221
228
|
>
|
|
222
|
-
> **
|
|
229
|
+
> **Illustrative single-run measurements** (n=1 per project, `--dry-run` verified — not benchmarks; your numbers will differ):
|
|
223
230
|
>
|
|
224
231
|
> | Project type | Example | Total volume | Reduction | Main cause |
|
|
225
232
|
> |---|---|---|---|---|
|
|
@@ -323,9 +330,10 @@ Auto-check the following items based on detected environment. Each item classifi
|
|
|
323
330
|
| MCP plugin | ~/.claude.json mcpServers contains entry | `python3 -c "import json,os; d=json.load(open(os.path.expanduser('~/.claude.json'))); print(list(d.get('mcpServers',{}).keys()))"` |
|
|
324
331
|
| `deep-insight plugin` | settings.json plugins contains deep-insight | `grep -r "deep-insight" .claude/settings.json 2>/dev/null` |
|
|
325
332
|
| `fh_env_context.jsonc` | `.claude/rules/fh_env_context.jsonc` exists | `ls .claude/rules/fh_env_context.jsonc` |
|
|
326
|
-
| `
|
|
333
|
+
| `phantom-gate` | **(Python + AI-output projects only)** `phantom-gate` present in `requirements.txt` / `pyproject.toml` | `grep "phantom.gate" requirements.txt pyproject.toml 2>/dev/null` |
|
|
334
|
+
| `Domain pattern pack applied` | (optional — only when a `{framework}_patterns.md` pack is present; none ship by default) framework-specific pattern checks | `knowledge/shared/{framework}_patterns.md` check (skip if file absent — the normal default) |
|
|
327
335
|
|
|
328
|
-
**Score calculation**: PASS = 1
|
|
336
|
+
**Score calculation**: PASS = 1 / MISS = 0.5 / FAIL = 0. Formula: `score = round( Σ(points) ÷ (applicable item count) × 100 )`. Conditional items (domain pattern pack / phantom-gate / MCP / deep-insight) are excluded from the denominator when not relevant, so always print the denominator next to the score (e.g. `{score}/100 over {n} applicable items`) — the percentage is reproducible only when the item count is shown.
|
|
329
337
|
|
|
330
338
|
### Step 2. Diagnosis Report + Proposal List
|
|
331
339
|
|
|
@@ -356,13 +364,21 @@ install-wizard — Diagnosis Results ({score}/100)
|
|
|
356
364
|
[6] Add MCP plugin — activate integrations (if MCP plugin MISS)
|
|
357
365
|
Run: claude mcp add <your-mcp-plugin> -- npx -y <your-mcp-plugin>
|
|
358
366
|
CC restart required after completion
|
|
359
|
-
[7] Install deep-insight
|
|
360
|
-
|
|
361
|
-
|
|
367
|
+
[7] (Optional — field plugin, NOT required) Install deep-insight — adds the field's domain personas to sim-conductor
|
|
368
|
+
deep-insight is a private/field marketplace plugin. sim-conductor already ships the built-in
|
|
369
|
+
user-mastery spectrum (beginner · main-player · expert · challenger), so multi-persona simulation
|
|
370
|
+
works WITHOUT it. If you have access: Settings → Plugins → Add → <your deep-insight path>.
|
|
371
|
+
If not: skip — sim-conductor falls back to the built-in spectrum agents (no capability lost).
|
|
362
372
|
[8] Create fh_env_context.jsonc — org/network/Git environment context file (if fh_env_context.jsonc MISS)
|
|
363
373
|
Copy: {FH_DIR}/templates/fh_env_context.jsonc → .claude/rules/fh_env_context.jsonc
|
|
364
374
|
Then manually update with actual values for org name, Jira URL, environment status, etc.
|
|
365
375
|
Effect: Each skill references common environment context → eliminate individual setting duplication
|
|
376
|
+
[9] Install phantom-gate — AI output hallucination detection (Python + AI-output projects only, if MISS)
|
|
377
|
+
Run: pip install git+https://github.com/chrono-meta/phantom-gate.git
|
|
378
|
+
Usage: phantom-gate scan output.txt / phantom-gate scan . --project
|
|
379
|
+
Detectors: M1 (phantom claims) · M2 (self-reference loops) · M3 (unvalidated external-dep claims) · M4 (temporal) · M5 (cross-file version mismatch)
|
|
380
|
+
Skip condition: non-Python project OR no AI-generated output in pipeline
|
|
381
|
+
|
|
366
382
|
|
|
367
383
|
Each item: Y (approve) / N (skip) / L (later) / A (approve all)
|
|
368
384
|
```
|
|
@@ -470,9 +486,16 @@ source "$FH_DIR/templates/fh_audit_check.zsh"
|
|
|
470
486
|
EOF
|
|
471
487
|
fi
|
|
472
488
|
|
|
473
|
-
# 4-axis verification gate
|
|
474
|
-
#
|
|
489
|
+
# 4-axis verification gate (Mode D / FH-self-development only — OPT-IN, double-confirm required)
|
|
490
|
+
# SCOPE (state this before asking): this gates commits IN YOUR FH CLONE ($FH_DIR) — git commit there is
|
|
491
|
+
# blocked until the 4-axis markers pass. It is FH-internal infra (hardcodes hub paths/markers) and is
|
|
492
|
+
# NEVER installed into field projects (see auto_project_mapping.md §6). Skip unless you develop FH itself.
|
|
493
|
+
# Per Core Principles (Per-item approval + Double-confirm irreversible changes): this is NOT auto-run —
|
|
494
|
+
# it is a separate explicit Y/N, not folded into the baseline-setup batch.
|
|
475
495
|
if [ -d "$FH_DIR/templates/.git-hooks" ]; then
|
|
496
|
+
echo "Enable the 4-axis pre-commit gate on your FH clone ($FH_DIR)? It will block commits there until"
|
|
497
|
+
echo "markers pass (Mode D / FH-development only). Skip if you are not developing FH itself. (Y/N)"
|
|
498
|
+
# → On explicit Y only:
|
|
476
499
|
git -C "$FH_DIR" config core.hooksPath templates/.git-hooks
|
|
477
500
|
chmod +x "$FH_DIR/templates/.git-hooks/pre-commit" 2>/dev/null
|
|
478
501
|
echo "4-axis pre-commit gate: installed (core.hooksPath -> templates/.git-hooks)"
|
|
@@ -482,8 +505,9 @@ fi
|
|
|
482
505
|
mkdir -p ~/.cc_sentinels
|
|
483
506
|
touch ~/.cc_sentinels/$(basename "$(pwd)")_wizard_done
|
|
484
507
|
|
|
485
|
-
# Weekly audit
|
|
486
|
-
#
|
|
508
|
+
# Weekly audit cadence — NO cron needed (a session-scoped scheduler cannot fire on a later day).
|
|
509
|
+
# Durable mechanism = the zshrc hook above (fh_audit_check.zsh warns on terminal start when 7+ days
|
|
510
|
+
# since last weekly_audit) + FH session-start detection (proposes /harvest-loop lightweight when overdue).
|
|
487
511
|
```
|
|
488
512
|
|
|
489
513
|
### Step 5. Completion Report + Contribution Guidance
|
|
@@ -496,7 +520,7 @@ install-wizard — Complete
|
|
|
496
520
|
From now on:
|
|
497
521
|
· Periodic audit auto-check on terminal start
|
|
498
522
|
· Yellow warning output when weekly_audit exceeds 7 days
|
|
499
|
-
·
|
|
523
|
+
· /harvest-loop (lightweight) proposed at session start when 7+ days since last weekly_audit
|
|
500
524
|
|
|
501
525
|
Next step skills:
|
|
502
526
|
· Not sure which plugin you need → /plugin-recommender
|
|
@@ -559,7 +583,7 @@ ls ~/.cc_sentinels/${PROJECT_NAME}_wizard_done 2>/dev/null && echo "Inspection m
|
|
|
559
583
|
|---|---|
|
|
560
584
|
| Structural anomaly detected | `/harness-doctor` |
|
|
561
585
|
| Token waste pattern detected | `/context-doctor` |
|
|
562
|
-
| External user simulation needed | `/sim-conductor
|
|
586
|
+
| External user simulation needed | `/sim-conductor` |
|
|
563
587
|
| Install conflict suspected | `/install-doctor` |
|
|
564
588
|
|
|
565
589
|
## Per-Cluster Deferred Loading (Progressive Disclosure)
|
|
@@ -187,7 +187,7 @@ All steps 0–2 completed
|
|
|
187
187
|
+ Overall verdict output (🟢 Recommended / 🟡 Conditional / 🔴 On hold)
|
|
188
188
|
```
|
|
189
189
|
|
|
190
|
-
**→ Mandatory before 🟢 Recommended verdict: `
|
|
190
|
+
**→ Mandatory before 🟢 Recommended verdict: `phantom-quench`** — forward axis check on all citations, external URLs, and file path references in the asset being reviewed. A 🟢 verdict without phantom-quench is incomplete. If phantom-quench finds phantom refs → verdict downgrades to 🟡 Conditional automatically.
|
|
191
191
|
|
|
192
192
|
> When `agent-composer` receives a "comprehensive marketplace listing audit" request,
|
|
193
193
|
> recommend: Wave 0 `fact-checker` → Wave 1 `marketplace-gate` + `hub-persona-auditor` in parallel.
|
|
@@ -0,0 +1,248 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: phantom-quench
|
|
3
|
+
description: The grounding member of the quench series — extracts proper nouns, numerical values, and branching conditions from artifacts (TCs, analysis reports, design documents), back-traces them to declared source files, and marks anything not found as a Phantom Claim (ungrounded — present in the artifact but not traceable to a declared source; not a claim that it is necessarily false). If steel-quench attacks output patterns (self-declarations, cushion language), phantom-quench attacks input tracing (where did this come from?). Renamed from source-grounding-audit (2026-06-06, quench-series); `/source-grounding-audit` still resolves as an alias. Triggered by "phantom detection", "phantom-quench", "phantom claim", "hallucinated claim detection", "source back-trace", "source audit", "verify source", "TC evidence tracing", "where did this come from", "grounding audit", "source grounding audit", "false claim detection".
|
|
4
|
+
user-invocable: true
|
|
5
|
+
allowed-tools: ["Read", "Write", "Edit", "Bash", "Grep", "Glob"]
|
|
6
|
+
model: sonnet
|
|
7
|
+
---
|
|
8
|
+
|
|
9
|
+
# phantom-quench — Input Tracing Grounding Audit
|
|
10
|
+
|
|
11
|
+
> Just because an artifact looks plausible doesn't mean it's grounded in source. plausible ≠ grounded.
|
|
12
|
+
|
|
13
|
+
> **Renamed from `source-grounding-audit` (2026-06-06)** — the grounding member of the quench series
|
|
14
|
+
> (steel-quench · phantom-quench · goal-quench). Same skill, same ruleset; only the label changed to fit
|
|
15
|
+
> the family. The **v1 paper (Zenodo 10.5281/zenodo.20397566) cites the old name** — that is the
|
|
16
|
+
> historical record, not a phantom. `/source-grounding-audit` still resolves via the deprecated redirect
|
|
17
|
+
> stub at `plugins/fh-meta/skills/source-grounding-audit/SKILL.md` (`successor: phantom-quench`).
|
|
18
|
+
> This is a **label rename, not a capability change** — phantom-quench does not fuse steel-quench or
|
|
19
|
+
> inject faults; those remain separate (orthogonality is deliberate — see Role Separation below).
|
|
20
|
+
>
|
|
21
|
+
> **Quench-series semantics** (resolves the "quench *what*?" question): each member subjects a different
|
|
22
|
+
> thing to the forge — steel-quench hardens an **existing output**; phantom-quench hardens the system
|
|
23
|
+
> against **mistaking the absent for present** (the phantom illusion — *not* the phantom as a material to
|
|
24
|
+
> harden); goal-quench hardens the **goal itself** into an advanced version. Same verb, consistent grammar.
|
|
25
|
+
>
|
|
26
|
+
> **Not the same as `phantom-gate`.** `phantom-gate` is the *productized standalone* phantom detector — a
|
|
27
|
+
> PyPI package run against any repo from the shell. `phantom-quench` is the *in-harness skill* — the same
|
|
28
|
+
> detection lineage as a method invoked inside a Claude session against a declared source set. Tool vs
|
|
29
|
+
> skill; different delivery, shared idea.
|
|
30
|
+
|
|
31
|
+
When AI generates artifacts without reading the source, those artifacts look like domain knowledge but are actually **Phantom Claims** coming from LLM weights. This skill back-traces each claim in the artifact to the declared source to explicitly mark Phantoms.
|
|
32
|
+
|
|
33
|
+
## Role Separation from steel-quench
|
|
34
|
+
|
|
35
|
+
| Dimension | steel-quench | phantom-quench |
|
|
36
|
+
|---|---|---|
|
|
37
|
+
| **Attack target** | Output patterns (self-declarations, cushion language, reason for existence) | Input tracing (is the claim in the source?) |
|
|
38
|
+
| **Core question** | "Is this structure flawed?" | "Where did this content come from?" |
|
|
39
|
+
| **Activation timing** | All-angle quench just before completion | Immediately after source-based artifact generation or at point of suspicion |
|
|
40
|
+
| **Primary attack vector** | Bus factor, self-reference, platform obsolescence | Phantom Claim, source not read, fabricated branching conditions |
|
|
41
|
+
| **Representative pattern** | "Declaration only, no evidence" | "Number in TC that doesn't exist in source" |
|
|
42
|
+
|
|
43
|
+
**Can be used together**: steel-quench Wave 1 real-code-based attack + phantom-quench Phantom marking can be run sequentially in the same session. But do not mix the roles of the two skills.
|
|
44
|
+
|
|
45
|
+
---
|
|
46
|
+
|
|
47
|
+
## Trigger Phrases
|
|
48
|
+
|
|
49
|
+
| Phrase | Situation |
|
|
50
|
+
|---|---|
|
|
51
|
+
| "phantom detection", "phantom claim", "false claim detection" | Full artifact Phantom scan (primary trigger) |
|
|
52
|
+
| "source back-trace", "source audit" | Analysis report, design document verification |
|
|
53
|
+
| "verify source", "where did this come from" | Suspecting origin of a specific claim |
|
|
54
|
+
| "TC evidence tracing", "TC source verification" | Post-TC-generation source consistency check |
|
|
55
|
+
| "grounding audit", "source grounding audit" | Full artifact Phantom scan |
|
|
56
|
+
| "verify evidence files" | Analysis report, design document verification |
|
|
57
|
+
| `/phantom-quench` | Explicit call |
|
|
58
|
+
|
|
59
|
+
---
|
|
60
|
+
|
|
61
|
+
## Core Concept — Phantom Claim
|
|
62
|
+
|
|
63
|
+
**Phantom Claim**: A claim that appears in the artifact but cannot be found in the declared source files.
|
|
64
|
+
|
|
65
|
+
3 paths through which Phantoms are produced:
|
|
66
|
+
|
|
67
|
+
| Path | Description | Risk |
|
|
68
|
+
|---|---|:---:|
|
|
69
|
+
| **Source not read** | AI generates artifact using domain knowledge without Read-ing source | S |
|
|
70
|
+
| **Partial reading** | Source partially read, rest filled in with inference | A |
|
|
71
|
+
| **Reconstruction contamination** | Source was read but LLM modified values/conditions during paraphrase | A |
|
|
72
|
+
|
|
73
|
+
---
|
|
74
|
+
|
|
75
|
+
## Execution Steps
|
|
76
|
+
|
|
77
|
+
### Step 0. Confirm Audit Target
|
|
78
|
+
|
|
79
|
+
If not provided by user, explicitly confirm: artifact file path, declared source files, and audit scope. Source not declared = S-grade blocker registered immediately.
|
|
80
|
+
|
|
81
|
+
> **Detail**: See `SKILL_detail.md §Step0-Detail` — confirmation output format and simplification guard — read when audit target or source list is ambiguous.
|
|
82
|
+
|
|
83
|
+
---
|
|
84
|
+
|
|
85
|
+
### Step 0.5. Claim Distribution Profile
|
|
86
|
+
|
|
87
|
+
> **Schema**: `knowledge/shared/harness-core/tpa_schema.md` — `phantom_risk` derivation rule, gate trigger conditions, §Gate Routing Table.
|
|
88
|
+
|
|
89
|
+
Runs after Step 0 (target + source confirmed). Skip if user specifies scope explicitly.
|
|
90
|
+
|
|
91
|
+
Scan artifact quickly to classify claim distribution:
|
|
92
|
+
|
|
93
|
+
| Dimension | Signal → Audit depth shift |
|
|
94
|
+
|---|---|
|
|
95
|
+
| `claim_density` | > 10 claims → full Step 1-4 audit; ≤ 3 claims → light (S+A only) |
|
|
96
|
+
| `artifact_type` | SKILL.md/design-doc → prioritize Branch/State-transition claims; code → prioritize Proper-noun/API claims |
|
|
97
|
+
| `risk_level` | external publish / arXiv citations → all claim types, max depth |
|
|
98
|
+
| `source_count` | 0 declared sources → S-grade blocker immediately (skip to Step 3 prescription) |
|
|
99
|
+
| `quantitative_density` | > 3 numerical claims → focus numerical+range types first |
|
|
100
|
+
|
|
101
|
+
Scope recommendation output:
|
|
102
|
+
```
|
|
103
|
+
Claim types to prioritize: [list]
|
|
104
|
+
Audit depth: [full | prioritized | light]
|
|
105
|
+
Immediate blockers detected: [yes/no — 0 sources = immediate S-grade]
|
|
106
|
+
```
|
|
107
|
+
|
|
108
|
+
**0-source behavioral rule**: When artifact has 0 declared sources, skip Steps 1-2 entirely and go directly to Step 3 with S-grade blocker: "Source not declared — all claims unverifiable."
|
|
109
|
+
|
|
110
|
+
---
|
|
111
|
+
|
|
112
|
+
### Step 1. Claim Extraction (Artifact Scan)
|
|
113
|
+
|
|
114
|
+
Extract claims from the artifact that require source back-tracing. Claim types: Proper nouns (highest), Numerical/range values (highest), Branching conditions (highest), State transitions (high), Preconditions (high), Actors (medium). Exclude generic test methodology descriptions and generic UI patterns.
|
|
115
|
+
|
|
116
|
+
> **Detail**: See `SKILL_detail.md §Step1-Detail` — full claim types table with examples, exclude list, and Step 1 output format template — read when deciding which claims to include or format the extraction results.
|
|
117
|
+
|
|
118
|
+
---
|
|
119
|
+
|
|
120
|
+
### Step 2. Source Read + Back-Trace
|
|
121
|
+
|
|
122
|
+
Back-trace each claim to the declared source files using Read + Grep directly — no inference judgment. Partial match is not treated as match.
|
|
123
|
+
|
|
124
|
+
Back-tracing classification:
|
|
125
|
+
|
|
126
|
+
| Classification | Criteria | Marking |
|
|
127
|
+
|---|---|:---:|
|
|
128
|
+
| **Grounded** | Claim directly confirmed in source | ✅ |
|
|
129
|
+
| **Partial** | Similar content in source but not exact match — needs re-confirmation | ⚠️ |
|
|
130
|
+
| **Phantom** | Cannot be found in source | ❌ |
|
|
131
|
+
| **Source-Missing** | Source itself cannot be Read or was not declared | 🔴 |
|
|
132
|
+
|
|
133
|
+
> **Detail**: See `SKILL_detail.md §Step2-Detail` — back-tracing execution procedure, classification decision rules, and Step 2 output format template — read when handling edge cases or formatting results.
|
|
134
|
+
|
|
135
|
+
---
|
|
136
|
+
|
|
137
|
+
### Step 3. Phantom Classification + Prescription
|
|
138
|
+
|
|
139
|
+
Classify Phantom and Partial claims by severity and provide prescriptions.
|
|
140
|
+
|
|
141
|
+
**Severity classification criteria**:
|
|
142
|
+
|
|
143
|
+
| Severity | Criteria | Examples |
|
|
144
|
+
|:---:|---|---|
|
|
145
|
+
| **S** (Immediate blocker) | If this claim is wrong, TC could Pass-judge incorrect behavior | Monetary boundary values, branching conditions, status values |
|
|
146
|
+
| **A** (Must fix) | If this claim is wrong, TC cannot execute or runs wrong path | API endpoint names, field names, preconditions |
|
|
147
|
+
| **B** (Improvement recommended) | If this claim is wrong, TC can execute but intent may differ | Descriptive text, non-critical names |
|
|
148
|
+
|
|
149
|
+
Prescriptions: (1) Source Re-read — precisely re-read the relevant source section and fix; (2) Request source specification — when source doesn't exist or wasn't declared; (3) Delete/rewrite — delete claims without source grounding and rewrite from source.
|
|
150
|
+
|
|
151
|
+
> **Detail**: See `SKILL_detail.md §Step3-Detail` — prescription procedures and Step 3 output format template — read when writing the classification table or applying a prescription.
|
|
152
|
+
|
|
153
|
+
**S-grade Immediate Human Gate** — if 1+ S-grade Phantoms found, pause before Step 4/5 and surface:
|
|
154
|
+
|
|
155
|
+
```
|
|
156
|
+
⚠️ phantom-quench: N S-grade Phantom(s) found:
|
|
157
|
+
- [claim 1 — one-line summary, location]
|
|
158
|
+
- [claim 2 — one-line summary, location]
|
|
159
|
+
|
|
160
|
+
Options:
|
|
161
|
+
(a) Continue — AI proceeds to Step 4 pattern diagnosis + Step 5 re-audit
|
|
162
|
+
(b) Human review first — inspect Phantoms directly, then proceed
|
|
163
|
+
(c) Abort — fix sources manually and re-run audit
|
|
164
|
+
|
|
165
|
+
Waiting for input. (Default: a)
|
|
166
|
+
```
|
|
167
|
+
|
|
168
|
+
Rationale: S-grade Phantoms that enter Step 5 re-audit without human review risk LLM reconstruction contamination — the same pattern that originally produced the Phantoms can "verify" its own fixes. Human review at this threshold breaks the loop.
|
|
169
|
+
|
|
170
|
+
---
|
|
171
|
+
|
|
172
|
+
### Step 4. Source Not-Read Pattern Detection (Meta Diagnosis)
|
|
173
|
+
|
|
174
|
+
Analyze Phantom distribution to diagnose structural problems in the artifact generation process. Reveal "why were these Phantoms produced", not just "this TC is wrong".
|
|
175
|
+
|
|
176
|
+
**Pattern detection criteria**:
|
|
177
|
+
|
|
178
|
+
| Pattern | Detection Condition | Meaning |
|
|
179
|
+
|---|---|---|
|
|
180
|
+
| **Source not read** | 3+ Phantoms and no or partial source Read history | AI generated using domain knowledge without reading source |
|
|
181
|
+
| **Partial reading contamination** | Partial items exceed 30% of total | AI read source partially and filled rest with inference |
|
|
182
|
+
| **Reconstruction modification** | Source value exists but unit/format/range modified in TC | LLM paraphrase process contamination |
|
|
183
|
+
| **Source declaration absent** | Source file not specified when generating artifact | Process design stage problem |
|
|
184
|
+
|
|
185
|
+
**Simplification guard**: If 0 Phantoms, skip Step 4 entirely. Replace with one line: "Source grounding adequate."
|
|
186
|
+
|
|
187
|
+
> **Detail**: See `SKILL_detail.md §Step4-Detail` — Step 4 output format template — read when writing the pattern diagnosis section.
|
|
188
|
+
|
|
189
|
+
---
|
|
190
|
+
|
|
191
|
+
### Step 5. Post-Fix Re-audit (Optional)
|
|
192
|
+
|
|
193
|
+
Re-run back-trace for S-grade blocker claims after fixes are complete. Activate when 1+ S-grade blockers exist and fix is immediately possible.
|
|
194
|
+
|
|
195
|
+
**Done When (re-audit)**: Back-trace results for fixed claims all show Grounded (✅) status.
|
|
196
|
+
|
|
197
|
+
---
|
|
198
|
+
|
|
199
|
+
## Completion Declaration Format
|
|
200
|
+
|
|
201
|
+
> **Template**: See `SKILL_detail.md §Report-Template` — full completion declaration format — read when producing the final audit summary.
|
|
202
|
+
|
|
203
|
+
---
|
|
204
|
+
|
|
205
|
+
## Connected Skills
|
|
206
|
+
|
|
207
|
+
| Situation | Connected Skill |
|
|
208
|
+
|---|---|
|
|
209
|
+
| Simultaneously verify output patterns (self-declarations, cushion language) | `/steel-quench` Wave 1 "real-use verification" angle |
|
|
210
|
+
| Re-verify Phantom patterns from external user perspective | `/sim-conductor Area A` |
|
|
211
|
+
| Source not-read is a harness structure problem | `/harness-doctor` |
|
|
212
|
+
| Phantom pattern is a candidate for new rule items | `fh-meta:persona-innovator` |
|
|
213
|
+
| Redesign the artifact generation prompt itself | `/meta-prompt-builder` |
|
|
214
|
+
|
|
215
|
+
---
|
|
216
|
+
|
|
217
|
+
## External User Environment Adaptation
|
|
218
|
+
|
|
219
|
+
This skill can be used independently without the full meta-harness structure.
|
|
220
|
+
|
|
221
|
+
**How to declare source files**: When generating artifacts, specify "source: [file path list]", or provide source files when invoking this skill.
|
|
222
|
+
|
|
223
|
+
**External environment fallback**:
|
|
224
|
+
- If no `tracks/_meta/` → skip persistence step
|
|
225
|
+
- If no project-specific rules (like PFD) → output Phantom pattern summary only
|
|
226
|
+
|
|
227
|
+
---
|
|
228
|
+
|
|
229
|
+
## Done When
|
|
230
|
+
|
|
231
|
+
```
|
|
232
|
+
Step 1 claim extraction complete
|
|
233
|
+
+ Step 2 all claims back-traced (using Read tool — no inference judgment)
|
|
234
|
+
+ Step 3 Phantom severity classification + prescription output
|
|
235
|
+
+ Step 4 process pattern diagnosis complete (skip if 0 Phantoms)
|
|
236
|
+
+ "phantom-quench Complete" declaration output
|
|
237
|
+
```
|
|
238
|
+
|
|
239
|
+
Verdict: PASS (0 Phantom claims) | CONDITIONAL_PASS (LOW-severity Phantoms only, prescriptions noted) | FAIL (1+ HIGH/MEDIUM Phantom — broken path, phantom file, or stale external link) | ESCALATE (scope unclear or claim extraction impossible)
|
|
240
|
+
|
|
241
|
+
---
|
|
242
|
+
|
|
243
|
+
## Operating Notes
|
|
244
|
+
|
|
245
|
+
- **Never back-trace by inference**: Judging "this value is probably in the source" treats it as Partial not Phantom. Always directly confirm with Read + Grep.
|
|
246
|
+
- **Partial is not Grounded**: Processing similar-value-in-source as Grounded misses the reconstruction modification pattern.
|
|
247
|
+
- **Source not declared itself is S-grade**: If source is not declared when making an artifact, no claim can subsequently be verified. Recommend mandating source declaration in the process design stage.
|
|
248
|
+
- **Recommended to use with steel-quench**: steel-quench quenches structural flaws, phantom-quench ensures source consistency. The two skills are orthogonal and artifact quality assurance is strengthened when used together.
|
|
@@ -1,4 +1,4 @@
|
|
|
1
|
-
#
|
|
1
|
+
# phantom-quench — Execution Detail
|
|
2
2
|
|
|
3
3
|
On-demand reference. Load the section indicated by the pointer in SKILL.md.
|
|
4
4
|
|
|
@@ -153,7 +153,7 @@ Process prescription:
|
|
|
153
153
|
**Completion Declaration Format**
|
|
154
154
|
|
|
155
155
|
```
|
|
156
|
-
##
|
|
156
|
+
## phantom-quench Complete
|
|
157
157
|
|
|
158
158
|
Audit scope: {artifact file} / source {N files}
|
|
159
159
|
{N} total claims audited
|
|
@@ -179,4 +179,4 @@ Next actions:
|
|
|
179
179
|
|
|
180
180
|
**Evidence Record**
|
|
181
181
|
|
|
182
|
-
- **Verified in practice**: TC generation without reading source files → steel-quench passes →
|
|
182
|
+
- **Verified in practice**: TC generation without reading source files → steel-quench passes → phantom-quench back-trace detects numerous Phantoms (notifications vs. push notifications, version names vs. non-enrolled, bottom sheet vs. screen navigation). **Procedure**: Read sources in order then regenerate → replace with source-based TCs. **Recurrence prevention**: Source gate implementation — FileNotFoundError if required source files absent. steel-quench misses this because: outputs look logically sound so pattern attacks cannot identify Phantoms — only source back-tracing can detect them.
|
|
@@ -1,6 +1,6 @@
|
|
|
1
1
|
---
|
|
2
2
|
name: pipeline-conductor
|
|
3
|
-
description: Chains the four core FH verification pipelines (harvest-loop → steel-quench →
|
|
3
|
+
description: Chains the four core FH verification pipelines (harvest-loop → steel-quench → phantom-quench → sim-conductor) into a single gated sweep. Accepts a scope (single skill, specific asset, full harness) and aggregates results into one structured report. Supports --quick mode (steps 2+3 only) and --full mode (all four steps). Triggered by "run the full pipeline", "chain all verifications", "end-to-end sweep", "pipeline-conductor", or "verify everything".
|
|
4
4
|
user-invocable: true
|
|
5
5
|
allowed-tools: ["Read", "Write", "Bash", "Grep", "Glob", "Agent"]
|
|
6
6
|
model: sonnet
|
|
@@ -10,7 +10,7 @@ model: sonnet
|
|
|
10
10
|
|
|
11
11
|
Chains the four standalone FH verification pipelines into a gated sequence. Each step receives the previous step's verdict before proceeding. Aggregates all findings into a single structured report at the end.
|
|
12
12
|
|
|
13
|
-
The gap this closes: harvest-loop, steel-quench,
|
|
13
|
+
The gap this closes: harvest-loop, steel-quench, phantom-quench, and sim-conductor are each invocable independently but have no automatic hand-off between them. Running them sequentially by hand loses inter-step signal — a FAIL in step 2 should block step 3 rather than silently continuing. pipeline-conductor enforces that ordering.
|
|
14
14
|
|
|
15
15
|
## Triggers
|
|
16
16
|
|
|
@@ -92,7 +92,7 @@ Do not infer scope — a wrong scope produces misleading verdicts.
|
|
|
92
92
|
|
|
93
93
|
The four constituent skills use heterogeneous scope models. Translate the pipeline scope to each skill's invocation form before running any step:
|
|
94
94
|
|
|
95
|
-
| Pipeline scope | harvest-loop (Step 1) | steel-quench (Step 2) |
|
|
95
|
+
| Pipeline scope | harvest-loop (Step 1) | steel-quench (Step 2) | phantom-quench (Step 3) | sim-conductor (Step 4) |
|
|
96
96
|
|---|---|---|---|---|
|
|
97
97
|
| Single SKILL.md | Check session findings relevant to this skill; propose mode only | Adversarial attack on this SKILL.md | Back-trace claims in this SKILL.md to declared sources | Area D (artifact review) on this SKILL.md |
|
|
98
98
|
| Specific directory | Check session findings in this domain | Attack all SKILL.md files in directory | Back-trace all claims in directory | Area A + Area D on the domain |
|
|
@@ -219,13 +219,13 @@ Run steel-quench against the target scope.
|
|
|
219
219
|
|
|
220
220
|
---
|
|
221
221
|
|
|
222
|
-
## Step 3.
|
|
222
|
+
## Step 3. phantom-quench — Phantom Claim Detection
|
|
223
223
|
|
|
224
|
-
Run
|
|
224
|
+
Run phantom-quench against the target scope.
|
|
225
225
|
|
|
226
226
|
**What it checks**: Proper nouns, numerical values, file paths, and branching conditions back-traced to declared source files. Claims not found in source are marked Phantom.
|
|
227
227
|
|
|
228
|
-
**Invocation**: Run
|
|
228
|
+
**Invocation**: Run phantom-quench scoped to the same target as Steps 1 and 2.
|
|
229
229
|
|
|
230
230
|
**Load-bearing Phantom** (binary test — apply mechanically):
|
|
231
231
|
|
|
@@ -238,7 +238,7 @@ All other locations (§Triggers, advisory §Chains language, frontmatter descrip
|
|
|
238
238
|
|
|
239
239
|
**Verdict criteria**:
|
|
240
240
|
|
|
241
|
-
|
|
|
241
|
+
| phantom-quench result | pipeline-conductor verdict |
|
|
242
242
|
|---|---|
|
|
243
243
|
| 0 Phantoms, all claims grounded | `PASS` |
|
|
244
244
|
| Phantom claims found, none load-bearing (binary test) | `CONDITIONAL_PASS` — list Phantoms |
|
|
@@ -246,12 +246,12 @@ All other locations (§Triggers, advisory §Chains language, frontmatter descrip
|
|
|
246
246
|
| Grounding ambiguous (source file exists but content unclear) | `ESCALATE` |
|
|
247
247
|
|
|
248
248
|
**On FAIL**: Output the load-bearing Phantom(s). Ask:
|
|
249
|
-
> "
|
|
249
|
+
> "phantom-quench found a load-bearing Phantom claim. Fix and re-run Step 3, or abort the sweep?"
|
|
250
250
|
|
|
251
251
|
**On CONDITIONAL_PASS**: Capture non-load-bearing Phantoms. Continue to Step 4.
|
|
252
252
|
|
|
253
253
|
```
|
|
254
|
-
[Step 3 —
|
|
254
|
+
[Step 3 — phantom-quench]
|
|
255
255
|
Verdict: {verdict}
|
|
256
256
|
Basis: {one-line}
|
|
257
257
|
Phantoms: {count} — {load-bearing: Y/N} — {top item or "none"}
|
|
@@ -320,7 +320,7 @@ pipeline-conductor — Sweep Report
|
|
|
320
320
|
Step 0.5 — return-path-gate: {PASS / CONDITIONAL_PASS / FAIL / SKIPPED / degraded}
|
|
321
321
|
Step 1 — harvest-loop: {PASS / CONDITIONAL_PASS / FAIL / ESCALATE / SKIPPED}
|
|
322
322
|
Step 2 — steel-quench: {verdict}
|
|
323
|
-
Step 3 —
|
|
323
|
+
Step 3 — phantom-quench: {verdict}
|
|
324
324
|
Step 4 — sim-conductor: {verdict}
|
|
325
325
|
|
|
326
326
|
Overall: {CLEAN (--full) / CLEAN (--quick) / CLEAN (--no-sim) / PENDING / BLOCKED}
|