@chrono-meta/fh-gate 1.0.3 → 1.2.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/.claude/agents/challenger.md +169 -0
- package/AGENTS.md +160 -0
- package/CATALOG.md +256 -0
- package/CHEATSHEET.md +367 -0
- package/CLAUDE.md +331 -0
- package/CONTRIBUTING.md +198 -0
- package/LICENSE +21 -0
- package/README.md +131 -418
- package/bin/fh-goal.js +9 -0
- package/bin/fh-run.js +9 -0
- package/docs/banner.png +0 -0
- package/docs/codex-compat.md +123 -0
- package/docs/pillars.svg +70 -0
- package/knowledge/shared/harness-core/fh_integration_contract.md +48 -29
- package/package.json +31 -6
- package/plugins/fh-commons/README.md +37 -0
- package/plugins/fh-commons/agents/quench-challenger.md +373 -0
- package/plugins/fh-commons/skills/convergence-loop/SKILL.md +155 -0
- package/plugins/fh-commons/skills/deliberation/SKILL.md +288 -0
- package/plugins/fh-commons/skills/mcp-circuit-breaker/SKILL.md +196 -0
- package/plugins/fh-commons/skills/token-budget-gate/SKILL.md +175 -0
- package/plugins/fh-meta/agents/fact-checker.md +121 -0
- package/plugins/fh-meta/agents/hub-persona-auditor.md +109 -0
- package/plugins/fh-meta/agents/persona-innovator.md +195 -0
- package/plugins/fh-meta/skills/agent-composer/SKILL.md +461 -0
- package/plugins/fh-meta/skills/agent-composer/SKILL_detail.md +464 -0
- package/plugins/fh-meta/skills/apex-review/SKILL.md +185 -0
- package/plugins/fh-meta/skills/asset-placement-gate/SKILL.md +135 -0
- package/plugins/fh-meta/skills/contention-layer/SKILL.md +127 -0
- package/plugins/fh-meta/skills/context-bridge-dispatch/SKILL.md +30 -0
- package/plugins/fh-meta/skills/context-bridge-dispatch/SKILL_detail.md +144 -0
- package/plugins/fh-meta/skills/context-doctor/SKILL.md +341 -0
- package/plugins/fh-meta/skills/cross-ecosystem-synergy-detection/SKILL.md +202 -0
- package/plugins/fh-meta/skills/deep-clarify/SKILL.md +144 -0
- package/plugins/fh-meta/skills/edit-manifest/SKILL.md +210 -0
- package/plugins/fh-meta/skills/field-harvest/SKILL.md +384 -0
- package/plugins/fh-meta/skills/frontier-digest/SKILL.md +272 -0
- package/plugins/fh-meta/skills/goal-quench/SKILL.md +509 -0
- package/plugins/fh-meta/skills/harness-doctor/SKILL.md +277 -0
- package/plugins/fh-meta/skills/harness-doctor/SKILL_detail.md +484 -0
- package/plugins/fh-meta/skills/harvest-loop/SKILL.md +231 -0
- package/plugins/fh-meta/skills/harvest-loop/SKILL_detail.md +201 -0
- package/plugins/fh-meta/skills/hub-cc-pr-reviewer/SKILL.md +129 -0
- package/plugins/fh-meta/skills/hub-cc-pr-reviewer/SKILL_detail.md +158 -0
- package/plugins/fh-meta/skills/install-doctor/SKILL.md +207 -0
- package/plugins/fh-meta/skills/install-wizard/SKILL.md +613 -0
- package/plugins/fh-meta/skills/marketplace-gate/SKILL.md +193 -0
- package/plugins/fh-meta/skills/memory-hygiene/SKILL.md +143 -0
- package/plugins/fh-meta/skills/meta-prompt-builder/SKILL.md +167 -0
- package/plugins/fh-meta/skills/meta-prompt-builder/SKILL_detail.md +37 -0
- package/plugins/fh-meta/skills/pipeline-conductor/SKILL.md +430 -0
- package/plugins/fh-meta/skills/plugin-recommender/SKILL.md +221 -0
- package/plugins/fh-meta/skills/plugin-recommender/SKILL_detail.md +220 -0
- package/plugins/fh-meta/skills/prompt-regression/SKILL.md +178 -0
- package/plugins/fh-meta/skills/public-surface-audit/SKILL.md +224 -0
- package/plugins/fh-meta/skills/return-path-gate/SKILL.md +257 -0
- package/plugins/fh-meta/skills/self-marketing-lint/SKILL.md +129 -0
- package/plugins/fh-meta/skills/sim-conductor/SKILL.md +364 -0
- package/plugins/fh-meta/skills/sim-conductor/SKILL_detail.md +337 -0
- package/plugins/fh-meta/skills/skill-splitter/SKILL.md +126 -0
- package/plugins/fh-meta/skills/skill-splitter/SKILL_detail.md +185 -0
- package/plugins/fh-meta/skills/source-grounding-audit/SKILL.md +230 -0
- package/plugins/fh-meta/skills/source-grounding-audit/SKILL_detail.md +182 -0
- package/plugins/fh-meta/skills/steel-quench/SKILL.md +226 -0
- package/plugins/fh-meta/skills/steel-quench/SKILL_detail.md +453 -0
- package/plugins/fh-meta/skills/verify-bidirectional/SKILL.md +238 -0
- package/scripts/fh-gate.sh +175 -40
- package/scripts/fh-goal.sh +182 -0
- package/scripts/fh-run.sh +269 -0
|
@@ -0,0 +1,509 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: goal-quench
|
|
3
|
+
description: >-
|
|
4
|
+
Wraps /goal with a tiered safety + orchestration ladder. core (default): a token budget gate (pre-run estimate), mid-run budget thresholds, and an automatic post-run quality verification via pipeline-conductor — closing /goal's two gaps (Haiku evaluates completion, pipeline-conductor evaluates correctness). pro: adds context-doctor token reduction and agent-composer goal decomposition. max: adds plugin-recommender capability-gap fill and cross-ecosystem-synergy-detection pre-validation. The Phase-1 budget verdict auto-recommends the mode. Triggered by "goal with quality gate", "safe goal", "goal-quench", "orchestrate this goal", or before running /goal on high-stakes tasks.
|
|
5
|
+
user-invocable: true
|
|
6
|
+
allowed-tools: ["Read", "Write", "Bash", "Grep"]
|
|
7
|
+
model: sonnet
|
|
8
|
+
complexity_routing:
|
|
9
|
+
base: sonnet
|
|
10
|
+
escalate_when:
|
|
11
|
+
- pro_mode # orchestration: context-doctor + agent-composer decomposition
|
|
12
|
+
- max_mode # + external discovery: plugin-recommender + synergy pre-validation
|
|
13
|
+
high: opus
|
|
14
|
+
# NOTE: CC switches models per-turn by task weight, not per-skill invocation.
|
|
15
|
+
# With `/model opusplan`: Opus activates on plan-mode turns (reasoning, decomposition);
|
|
16
|
+
# Sonnet handles execution turns (tool calls, edits). Both appear in session jsonl.
|
|
17
|
+
# Without opusplan: all turns use the session model regardless of mode.
|
|
18
|
+
---
|
|
19
|
+
|
|
20
|
+
# goal-quench — /goal with Token Budget + Quality Gate
|
|
21
|
+
|
|
22
|
+
`/goal` runs until Haiku says "done" — but Haiku only checks completion, not quality. Without a budget ceiling, sessions can exhaust tokens silently. goal-quench adds three things that /goal currently lacks:
|
|
23
|
+
|
|
24
|
+
1. **Pre-run**: token-budget-gate estimate — know the cost before committing
|
|
25
|
+
2. **Mid-run**: budget threshold awareness — signal before exhaustion (instructional; not mechanically enforced)
|
|
26
|
+
3. **Post-run**: pipeline-conductor — verify quality before accepting "done"
|
|
27
|
+
|
|
28
|
+
The evaluator principle: Haiku judges completion (every turn, cheap). pipeline-conductor judges quality (once at the end, structured). Separating the two closes the self-evaluation bias that a single evaluator cannot avoid.
|
|
29
|
+
|
|
30
|
+
> **Scope by mode**: core = budget gate + stop-hook verification (v1 behavior, unchanged). pro/max add token reduction, goal decomposition, and external discovery (see Modes below). (Tier names mirror Claude Code's subscription units — core / pro / max — and avoid colliding with pipeline-conductor's `--full` flag.) Mid-run Sonnet quality signals remain deferred (requires empirical calibration).
|
|
31
|
+
|
|
32
|
+
---
|
|
33
|
+
|
|
34
|
+
## Modes — core → pro → max (fluid)
|
|
35
|
+
|
|
36
|
+
goal-quench is a ladder, not a fixed shape. The tier names mirror Claude Code's subscription units (core / pro / max). The default (**core**) is the narrow safety belt — for users who only want /goal's two structural gaps closed. **pro** and **max** add optimization and orchestration on top, and are **auto-recommended by the Phase-1 budget verdict** so the orchestration cost is only paid when the task is large enough to justify it.
|
|
37
|
+
|
|
38
|
+
| Mode | Adds over previous | Chained skills | Auto-recommended when |
|
|
39
|
+
|---|---|---|---|
|
|
40
|
+
| **core** (default) | budget gate + mid-run thresholds + post-run quality gate | token-budget-gate, pipeline-conductor --quick | budget GREEN / YELLOW |
|
|
41
|
+
| **pro** | token-reduction pre-pass + goal decomposition into Waves | + context-doctor, agent-composer | budget ORANGE |
|
|
42
|
+
| **max** | capability-gap fill + synergy pre-validation before the run | + plugin-recommender, cross-ecosystem-synergy-detection | budget RED |
|
|
43
|
+
|
|
44
|
+
Each mode is a **superset** of the one before it — pro does everything core does, plus more. Nothing in core is removed by escalating.
|
|
45
|
+
|
|
46
|
+
**Selection**:
|
|
47
|
+
- Explicit flag: `/goal-quench --core` (default) · `--pro` · `--max`
|
|
48
|
+
- Auto: Phase 1's budget verdict proposes the mode (see Phase 1 Step 2). The user can always override **down** to core.
|
|
49
|
+
|
|
50
|
+
**Token-honesty guard** (what empirical calibration showed): The cost structure of pro/max depends on which sidecar pattern is in use:
|
|
51
|
+
|
|
52
|
+
| Sidecar type | Who uses it | Cost visibility | Estimation |
|
|
53
|
+
|---|---|---|---|
|
|
54
|
+
| External CLI (Gemini, Codex, Copilot) | Multi-LLM environment | **Not visible in CC** | Low — external billing only |
|
|
55
|
+
| CC sub-agent (isolated context, steel-quench pattern) | CC-only environment | **Visible in CC** | Estimable: Sonnet × sub-agent scope |
|
|
56
|
+
|
|
57
|
+
For **CC-only users**, pro/max sidecars run as isolated sub-agents (Sonnet for execution), with the orchestrator running at Opus (`complexity_routing`). Both are CC-visible. Cost estimate: `(Opus orchestrator turns × ~3×) + (Sonnet sub-agent tokens)`. The sub-agent context isolation means sidecar work does not load the main context — the context-preservation property holds regardless of sidecar type.
|
|
58
|
+
|
|
59
|
+
Therefore:
|
|
60
|
+
- **Never auto-escalate a GREEN/YELLOW task to pro/max.** For external CLI users, sidecar costs are untracked and invisible in CC's budget display. For CC sub-agent users, the overhead is estimable but still real — orchestrator Opus × 3 applies. Note the appropriate caveat when disclosing cost.
|
|
61
|
+
- **Model escalation cost**: With `/model opusplan`, Opus activates on plan-mode turns (reasoning, decomposition decisions) and Sonnet handles execution turns — both CC-visible in session jsonl under `message.model`. External CLI sidecar fees are additional and invisible to CC; state both when present.
|
|
62
|
+
- **RED is no longer a dead-end.** v1 hard-blocked RED ("split manually"). v2 turns RED into the **on-ramp to max** — agent-composer decomposes the over-budget goal into sequential sub-goals automatically.
|
|
63
|
+
|
|
64
|
+
---
|
|
65
|
+
|
|
66
|
+
## Triggers
|
|
67
|
+
|
|
68
|
+
- `/goal-quench`
|
|
69
|
+
- "run /goal with a quality gate", "safe goal run", "goal with verification"
|
|
70
|
+
- "I want to use /goal but worried about tokens", "goal with budget control"
|
|
71
|
+
- "goal-quench", before any long /goal session
|
|
72
|
+
- `/goal-quench --pro`, `/goal-quench --max`, "orchestrate this goal", "decompose this goal", "optimize then run this goal"
|
|
73
|
+
- "this goal is too big for one run", "find a tool for this goal if FH lacks one" (→ max mode)
|
|
74
|
+
- Automatically proposed when user mentions `/goal` on tasks estimated > 15K tokens
|
|
75
|
+
- Mode is auto-recommended by the Phase-1 budget verdict (GREEN/YELLOW → core, ORANGE → pro, RED → max)
|
|
76
|
+
|
|
77
|
+
---
|
|
78
|
+
|
|
79
|
+
## One-Time Setup (required)
|
|
80
|
+
|
|
81
|
+
### Claude Code Runtime
|
|
82
|
+
|
|
83
|
+
Copy the hook snippet from `templates/goal-quench-hook-setup.md` and merge into your `.claude/settings.json`.
|
|
84
|
+
|
|
85
|
+
Add to `.gitignore`:
|
|
86
|
+
```
|
|
87
|
+
.claude/goal-quench.active
|
|
88
|
+
.claude/goal-quench.pending
|
|
89
|
+
```
|
|
90
|
+
|
|
91
|
+
**Hook failure fallback**: The Stop hook may fire silently without triggering verification (hook failure, session reset, or CC version differences). If Phase 3 verification has not run after /goal completes, manually trigger it:
|
|
92
|
+
> `/goal-quench --verify` — skips Phase 1+2, runs pipeline-conductor --quick using current session scope.
|
|
93
|
+
|
|
94
|
+
**Interrupted /goal recovery**: If /goal is interrupted (error, user abort, CC crash), the Stop hook may not fire — leaving `.claude/goal-quench.active` without a `.pending`. Detect and recover:
|
|
95
|
+
```bash
|
|
96
|
+
[ -f .claude/goal-quench.active ] && ! [ -f .claude/goal-quench.pending ] && echo "Interrupted — run /goal-quench --recover"
|
|
97
|
+
```
|
|
98
|
+
> `/goal-quench --recover` — promotes `.active` → `.pending` and runs Phase 3 verification on partially-completed work.
|
|
99
|
+
|
|
100
|
+
### Codex Runtime
|
|
101
|
+
|
|
102
|
+
Codex has its own goal/session capability. Do not replace it with goal-quench state files. In Codex-primary sessions:
|
|
103
|
+
|
|
104
|
+
1. Use Codex's native goal/session feature for goal control.
|
|
105
|
+
2. Use `fh-run` for any FH skill/agent sub-step that would otherwise require Claude Code `Agent(...)`.
|
|
106
|
+
3. After the Codex goal completes, run FH governance on changed files:
|
|
107
|
+
|
|
108
|
+
```bash
|
|
109
|
+
FH_BACKEND=codex npx @chrono-meta/fh-gate "{changed-files}" quick codex-goal
|
|
110
|
+
```
|
|
111
|
+
|
|
112
|
+
For non-interactive one-shot runs only, `fh-goal` can run a backend task and then invoke `fh-gate` automatically:
|
|
113
|
+
|
|
114
|
+
```bash
|
|
115
|
+
FH_BACKEND=codex npx --package @chrono-meta/fh-gate fh-goal \
|
|
116
|
+
--prompt "{task}" \
|
|
117
|
+
--gate quick
|
|
118
|
+
```
|
|
119
|
+
|
|
120
|
+
`fh-goal` is not a Codex goal replacement; it is a post-run governance wrapper.
|
|
121
|
+
|
|
122
|
+
---
|
|
123
|
+
|
|
124
|
+
## Phase 1 — Pre-run Gate
|
|
125
|
+
|
|
126
|
+
### Step 1. Collect task description
|
|
127
|
+
|
|
128
|
+
Ask:
|
|
129
|
+
> "What will you run with /goal? Describe the task and expected scope."
|
|
130
|
+
|
|
131
|
+
Collect: task description, target files or directories, expected output.
|
|
132
|
+
|
|
133
|
+
**Pre-flight check (pro/max mode only)**: before proposing pro/max, verify required chained skills are available:
|
|
134
|
+
```bash
|
|
135
|
+
for skill in context-doctor agent-composer; do
|
|
136
|
+
[ -f ".claude/plugins/cache/forge-harness/fh-meta/"*"/skills/${skill}/SKILL.md" ] 2>/dev/null \
|
|
137
|
+
|| find ~/.claude/plugins -name "${skill}" -type d 2>/dev/null | grep -q . \
|
|
138
|
+
|| echo "WARNING: ${skill} not found — pro/max mode requires it. Falling back to core."
|
|
139
|
+
done
|
|
140
|
+
```
|
|
141
|
+
If any required skill is missing, **fall back to core and warn**: "Pro/max mode requires `{skill}` — not installed. Running in core mode instead."
|
|
142
|
+
|
|
143
|
+
### Step 2. token-budget-gate estimate
|
|
144
|
+
|
|
145
|
+
**Invocation contract**:
|
|
146
|
+
```
|
|
147
|
+
Input: task description (one paragraph), target file count (approximate)
|
|
148
|
+
Trigger phrase: "estimate token budget for: {task description}"
|
|
149
|
+
Expected output fields:
|
|
150
|
+
- estimated_tokens: N
|
|
151
|
+
- verdict: GREEN | YELLOW | ORANGE | RED
|
|
152
|
+
- reasoning: one-line basis for estimate
|
|
153
|
+
```
|
|
154
|
+
|
|
155
|
+
If `token-budget-gate` skill is not installed, use this fallback estimator:
|
|
156
|
+
```
|
|
157
|
+
< 5 files changed, no new architecture → GREEN (< 10K)
|
|
158
|
+
5–20 files or new module/feature → YELLOW (10K–30K)
|
|
159
|
+
20+ files or cross-system refactor → ORANGE (30K–60K)
|
|
160
|
+
Full-project migration or rewrite → RED (> 60K)
|
|
161
|
+
```
|
|
162
|
+
**Session overhead factor (empirical calibration, N=10, 2026-06-01–06-03)**: The tiers above estimate *task* tokens only. Actual full CC session tokens average 4.7× the task estimate (range: 1.3×–10.5×) for harness-heavy projects (dense CLAUDE.md + multi-rule auto-load). Multiply task estimate by 4× for FH-density projects (rough inference only — not measured for non-FH projects), to get expected session total. This multiplier informs the mode recommendation — it does not change the go/no-go gate thresholds.
|
|
163
|
+
|
|
164
|
+
Note the fallback in `.active` as `budget_source: fallback-heuristic` instead of `budget_source: token-budget-gate`.
|
|
165
|
+
|
|
166
|
+
Run token-budget-gate against the task description. Map the result to a go/no-go decision:
|
|
167
|
+
|
|
168
|
+
| token-budget-gate verdict | goal-quench action |
|
|
169
|
+
|---|---|
|
|
170
|
+
| GREEN (< 10K) | Proceed without comment |
|
|
171
|
+
| YELLOW (10K–30K) | Proceed — note estimated cost, suggest /goal scope if possible |
|
|
172
|
+
| ORANGE (30K–60K) | Propose **pro mode** as the cheaper path: "This is expensive. Run as a single /goal, or let pro mode optimize (context-doctor) + decompose (agent-composer) it?" Proceed in core only if the user declines orchestration. |
|
|
173
|
+
| RED (> 60K) | Propose **max mode** instead of a hard block: "Too big for one /goal run. Route through max mode — agent-composer decomposes into sequential sub-goals, plugin-recommender fills any capability gaps." Hard-block only if the user declines orchestration. |
|
|
174
|
+
|
|
175
|
+
On RED, the user has two paths:
|
|
176
|
+
- **Accept max mode** (recommended) → goal-quench enters Phase 1.5 and lets agent-composer decompose the over-budget goal into sequential sub-goals, each small enough to run under its own budget. This is the v2 recovery path that replaces v1's dead-end.
|
|
177
|
+
- **Decline orchestration** → goal-quench halts and does not write `.active` or inject thresholds. The user may still run `/goal` manually — but goal-quench will not gate or verify a session started without its active file.
|
|
178
|
+
|
|
179
|
+
### Step 3. Write state file
|
|
180
|
+
|
|
181
|
+
Create `.claude/goal-quench.active`:
|
|
182
|
+
|
|
183
|
+
```
|
|
184
|
+
scope: {task description — one line}
|
|
185
|
+
mode: core | pro | max
|
|
186
|
+
target_files: {comma-separated file paths or directory; "inferred from git diff" if not known}
|
|
187
|
+
budget_estimate: {N} tokens
|
|
188
|
+
budget_source: token-budget-gate | fallback-heuristic
|
|
189
|
+
budget_verdict: {GREEN/YELLOW/ORANGE/RED}
|
|
190
|
+
pipeline_mode: {--quick for core/pro · --full for max}
|
|
191
|
+
composed_plan: {sub-goal list from Phase 1.5 agent-composer; "n/a" in core mode}
|
|
192
|
+
timestamp: {YYYY-MM-DD HH:MM}
|
|
193
|
+
session_pid: {$$}
|
|
194
|
+
start_commit: {git rev-parse HEAD}
|
|
195
|
+
```
|
|
196
|
+
|
|
197
|
+
`target_files` is the **verification artifact** — what pipeline-conductor --quick will evaluate in Phase 3. Specify files/directories explicitly if known; otherwise write `"inferred from git diff"` and Phase 3 will resolve using `git diff {start_commit}..HEAD` to capture all changes including interim commits made during the /goal session.
|
|
198
|
+
|
|
199
|
+
> **Control flow (core vs pro/max)**: in **core** mode, proceed directly Step 3 → Step 4. In **pro/max** mode, run **Phase 1.5 (orchestration) here — after Step 3, before Step 4** — then return to Step 4 to inject thresholds and hand off. The `composed_plan` field in `.active` stays `n/a` until Phase 1.5 fills it.
|
|
200
|
+
|
|
201
|
+
### Step 4. Inject mid-run budget thresholds
|
|
202
|
+
|
|
203
|
+
Before the user runs /goal, output the following instruction block so Claude carries it into the /goal session:
|
|
204
|
+
|
|
205
|
+
```
|
|
206
|
+
─── goal-quench budget thresholds (active for this /goal run) ───
|
|
207
|
+
50% of estimated budget consumed:
|
|
208
|
+
→ Output a one-line progress summary. No action required.
|
|
209
|
+
|
|
210
|
+
70% consumed:
|
|
211
|
+
→ YELLOW: re-prioritize remaining tasks. Drop lowest-priority items
|
|
212
|
+
if the goal can still be met without them.
|
|
213
|
+
|
|
214
|
+
85% consumed:
|
|
215
|
+
→ ORANGE: stop current task, commit completed work, surface to user:
|
|
216
|
+
"Budget at 85%. Completed: [X]. Remaining: [Y]. Continue / stop?"
|
|
217
|
+
|
|
218
|
+
95% consumed:
|
|
219
|
+
→ RED: stop immediately. Commit everything completed so far.
|
|
220
|
+
Output: "Budget exhausted. Completed tasks: [list]. Incomplete: [list]."
|
|
221
|
+
─────────────────────────────────────────────────────────────────
|
|
222
|
+
```
|
|
223
|
+
|
|
224
|
+
### Step 5. Hand off to /goal
|
|
225
|
+
|
|
226
|
+
Output:
|
|
227
|
+
> "goal-quench ready. Budget: {verdict} (~{N} tokens). Now run: `/goal {your condition}`
|
|
228
|
+
> When /goal finishes, the Stop hook will trigger pipeline-conductor --quick automatically."
|
|
229
|
+
|
|
230
|
+
---
|
|
231
|
+
|
|
232
|
+
## Phase 1.5 — Pro / Max Orchestration Layer
|
|
233
|
+
|
|
234
|
+
> Runs only in `--pro` or `--max` mode (auto-proposed on ORANGE/RED budget, or user-selected). **Skipped entirely in core mode** — core hands off straight to /goal after Step 5.
|
|
235
|
+
|
|
236
|
+
The ordering is deliberate: optimize the context first (cheapest win), then decompose, then — only if a capability gap remains — reach outside FH.
|
|
237
|
+
|
|
238
|
+
### Step A — context-doctor (token-reduction pre-pass) · pro + max
|
|
239
|
+
|
|
240
|
+
Invoke context-doctor on the target scope before /goal runs. It generates/updates `.claudeignore`, flags over-read files, and recommends `/clear` timing. This lowers the per-turn token floor /goal will consume — actual reduction, not just the estimate the budget gate produced.
|
|
241
|
+
|
|
242
|
+
**Re-estimate** the budget after the pre-pass. If the verdict drops (e.g., ORANGE → YELLOW), offer to step back down to core: "Context trimmed; estimate is now YELLOW. Continue in pro, or drop to core?" (See Simplification Guards — post-optimization step-down.)
|
|
243
|
+
|
|
244
|
+
### Step B — agent-composer (goal decomposition) · pro + max
|
|
245
|
+
|
|
246
|
+
Hand the task description to agent-composer in compose-only mode. It returns a Wave plan: the goal split into independent/sequential sub-tasks with capability-fit scoring (agent-composer Step 0.2). For RED-origin runs this decomposition is the recovery path v1 lacked — each sub-goal is small enough to run under its own budget.
|
|
247
|
+
|
|
248
|
+
**Sub-goal execution (user-driven, not automatic)**: `/goal` is a user-invoked command — goal-quench cannot programmatically drive a loop over sub-goals. After agent-composer produces the Wave plan, goal-quench writes it to `.claude/goal-quench.queue` and outputs:
|
|
249
|
+
|
|
250
|
+
> "Sub-goal plan ready. Run each sub-goal in order by invoking `/goal-quench` again with the next sub-goal description. goal-quench will gate each sub-goal independently."
|
|
251
|
+
|
|
252
|
+
Sub-goal queue format (`.claude/goal-quench.queue`):
|
|
253
|
+
```
|
|
254
|
+
remaining:
|
|
255
|
+
- sub-goal-1: {description}
|
|
256
|
+
- sub-goal-2: {description}
|
|
257
|
+
completed: []
|
|
258
|
+
```
|
|
259
|
+
|
|
260
|
+
Each `/goal-quench` invocation pops the first `remaining` item, runs it as a core-mode run (Phase 1 GREEN/YELLOW expected → /goal → Phase 3 verify → commit), then moves it to `completed`. The outer pro/max run owns the decomposition; each inner invocation owns its sub-goal's gate. When `remaining` is empty, delete `.claude/goal-quench.queue`. (Parallel sub-goal execution is deferred — sequential is the v2 contract.)
|
|
261
|
+
|
|
262
|
+
goal-quench does **not** re-implement agent-composer's gates — its destructive-action gate (Step 2.7) and per-wave fan-out cap apply as-is.
|
|
263
|
+
|
|
264
|
+
### Step C — plugin-recommender + synergy pre-validation · max only
|
|
265
|
+
|
|
266
|
+
Triggered only when agent-composer Step 0.2 reports a capability **GAP** (`fit_score < 0.5` on a required-weight sub-task):
|
|
267
|
+
1. `plugin-recommender` searches FH + Codex + Claude Code marketplaces for a fitting skill/agent.
|
|
268
|
+
2. For each candidate, `cross-ecosystem-synergy-detection` pre-validates fit + overlap **before anything is installed**.
|
|
269
|
+
3. User decides: install / skip / general-purpose fallback (agent-composer's degraded-composition rule applies — `⚠️ degraded: [role]`).
|
|
270
|
+
|
|
271
|
+
max mode never installs anything silently — discovery and synergy-check are surfaced for approval first.
|
|
272
|
+
|
|
273
|
+
### Step D — scope-driven sidecar configuration · pro + max
|
|
274
|
+
|
|
275
|
+
Runs after Step C (or after Step B if no capability gap). Selects an adversarial sidecar based on the task's quality-risk profile — distinct from Step C's capability-gap sidecar, which addresses missing tools. Step D's sidecar addresses **blind-spot risk**: the generator and reviewer sharing the same reasoning distribution.
|
|
276
|
+
|
|
277
|
+
**Scope → sidecar routing table**:
|
|
278
|
+
|
|
279
|
+
| Task scope signal | Sidecar | Invocation point |
|
|
280
|
+
|---|---|---|
|
|
281
|
+
| Code quality review / new SKILL.md / governance gate change | `steel-quench` C3 config (Gemini sidecar if available) | Post-/goal, before pipeline-conductor |
|
|
282
|
+
| Architecture design / cross-project dependency | `agent-composer` multi-model panel (if external CLIs available; otherwise single-Claude sub-agent) | Parallel to /goal as separate Agent |
|
|
283
|
+
| External publication / marketplace-gate / skill release | `sim-conductor` + `steel-quench` Wave 5 | Post-/goal quality gate |
|
|
284
|
+
| No signal match (default) | None — pipeline-conductor handles quality alone | — |
|
|
285
|
+
|
|
286
|
+
Write resolved sidecar to `.active`:
|
|
287
|
+
```
|
|
288
|
+
sidecar: none | steel-quench-c3 | agent-composer-panel | sim-conductor | {cli-name}
|
|
289
|
+
sidecar_rationale: {one-line reason — which scope signal triggered this}
|
|
290
|
+
```
|
|
291
|
+
|
|
292
|
+
Output to user (one line only):
|
|
293
|
+
> "Sidecar: {config} — {rationale}"
|
|
294
|
+
|
|
295
|
+
Do not ask for confirmation. The user may override by re-running `/goal-quench --sidecar none`.
|
|
296
|
+
|
|
297
|
+
### Hand-off
|
|
298
|
+
|
|
299
|
+
After orchestration, **update** (not re-create) `.claude/goal-quench.active` — add `mode:` and `composed_plan:` lines alongside existing budget fields. Re-creating the file loses the `start_commit` field written in Phase 1 Step 3. Then proceed to threshold injection (Phase 1 Step 4) and hand off to /goal. Phase 3 verification then runs as in core — `pipeline-conductor --quick` for pro, `--full` for max (max implies external-facing stakes).
|
|
300
|
+
|
|
301
|
+
---
|
|
302
|
+
|
|
303
|
+
## Phase 2 — Mid-run (during /goal execution)
|
|
304
|
+
|
|
305
|
+
goal-quench does not directly control /goal execution. The budget thresholds injected in Phase 1 are instructions Claude follows during the /goal session.
|
|
306
|
+
|
|
307
|
+
**What Claude should do mid-run**:
|
|
308
|
+
|
|
309
|
+
- Track approximate token consumption against the Phase 1 estimate
|
|
310
|
+
- At each threshold (50/70/85/95%), execute the corresponding action
|
|
311
|
+
- Do not wait for goal-quench to intervene — the thresholds are self-enforced
|
|
312
|
+
|
|
313
|
+
**Threshold enforcement caveat**: Claude cannot reliably read its own real-time token consumption during a /goal session. The 50/70/85/95% thresholds are instructional — Claude approximates consumption based on task complexity and turn count. They are not mechanically enforced. For hard enforcement, see the Anthropic feature request (--budget flag).
|
|
314
|
+
|
|
315
|
+
**Queue mode (per-sub-goal threshold scope)**: When running sequential sub-goals from `.claude/goal-quench.queue`, each `/goal-quench` invocation is an independent core-mode run — thresholds are re-injected fresh for each sub-goal's Phase 1 estimate, not inherited from the outer pro/max run. The outer run owns the decomposition plan; each inner invocation owns its sub-goal's budget gate. Concretely: if sub-goal-1 estimate is 12K (YELLOW), thresholds are set against 12K — not against the outer RED estimate that triggered the queue in the first place.
|
|
316
|
+
|
|
317
|
+
**What goal-quench cannot do in v1** (requires native Anthropic support):
|
|
318
|
+
- Hard-enforce token ceiling mid-run (--budget flag, not yet available)
|
|
319
|
+
- Auto-checkpoint commit on sub-goal completion (--checkpoint flag, not yet available)
|
|
320
|
+
- Mid-run Sonnet quality signal (requires structured evaluator hooks)
|
|
321
|
+
|
|
322
|
+
These gaps are the basis for the Anthropic feature request (see `knowledge/shared/harness-core/goal_quench_anthropic_issue.md`).
|
|
323
|
+
|
|
324
|
+
---
|
|
325
|
+
|
|
326
|
+
## Phase 3 — Post-run Verification
|
|
327
|
+
|
|
328
|
+
### Claude Code path
|
|
329
|
+
|
|
330
|
+
When /goal stops, the Stop hook fires and:
|
|
331
|
+
1. Detects `.claude/goal-quench.active`
|
|
332
|
+
2. Copies it to `.claude/goal-quench.pending`
|
|
333
|
+
3. Removes `.active`
|
|
334
|
+
4. Prints: "[goal-quench] /goal finished. Verification pending — starting pipeline-conductor --quick."
|
|
335
|
+
|
|
336
|
+
On the next Claude response (after /goal), check for `.claude/goal-quench.pending` **with a freshness guard** — only act if the timestamp in the file is within the current session (< 4 hours old):
|
|
337
|
+
|
|
338
|
+
```bash
|
|
339
|
+
if [ -f .claude/goal-quench.pending ]; then
|
|
340
|
+
PENDING_TIME=$(grep "^timestamp:" .claude/goal-quench.pending | sed 's/^timestamp: //')
|
|
341
|
+
NOW=$(date +%s)
|
|
342
|
+
PENDING_EPOCH=$(date -d "$PENDING_TIME" +%s 2>/dev/null || date -j -f "%Y-%m-%d %H:%M" "$PENDING_TIME" +%s 2>/dev/null)
|
|
343
|
+
AGE_HOURS=$(( (NOW - PENDING_EPOCH) / 3600 ))
|
|
344
|
+
if [ "$AGE_HOURS" -lt 4 ]; then
|
|
345
|
+
echo "goal-quench verification pending (created: $PENDING_TIME — ${AGE_HOURS}h ago)"
|
|
346
|
+
else
|
|
347
|
+
echo "STALE: goal-quench.pending is ${AGE_HOURS}h old. Run /goal-quench --verify to re-evaluate or delete to clear."
|
|
348
|
+
exit 0
|
|
349
|
+
fi
|
|
350
|
+
fi
|
|
351
|
+
```
|
|
352
|
+
|
|
353
|
+
If found and fresh: **automatically run pipeline-conductor** before responding to any other request — `--quick` for core/pro, `--full` for max (the `mode:` field in `.pending` selects which). This is not optional — pending verification takes priority.
|
|
354
|
+
|
|
355
|
+
If found but stale (> 4 hours): warn the user — "A stale goal-quench.pending exists from {timestamp}. Run `/goal-quench --verify` to re-evaluate, or delete it to clear." Do not auto-trigger verification on stale state.
|
|
356
|
+
|
|
357
|
+
### Codex path
|
|
358
|
+
|
|
359
|
+
No Stop hook is required. After the Codex goal/session completes, resolve changed files with git and run:
|
|
360
|
+
|
|
361
|
+
```bash
|
|
362
|
+
FH_BACKEND=codex npx @chrono-meta/fh-gate "{changed-files}" quick codex-goal
|
|
363
|
+
```
|
|
364
|
+
|
|
365
|
+
For high-stakes or external-facing work, use `full` instead of `quick`. Treat `BLOCKED` or `ESCALATE` the same as the Claude path: fix and re-run the gate, or surface the decision to the user.
|
|
366
|
+
|
|
367
|
+
### Verification flow
|
|
368
|
+
|
|
369
|
+
```bash
|
|
370
|
+
# Read scope and target from pending file
|
|
371
|
+
SCOPE=$(grep "^scope:" .claude/goal-quench.pending | sed 's/^scope: //')
|
|
372
|
+
TARGET=$(grep "^target_files:" .claude/goal-quench.pending | sed 's/^target_files: //')
|
|
373
|
+
|
|
374
|
+
# Resolve artifact: explicit target or git diff
|
|
375
|
+
# Use git diff against the commit recorded at Phase 1 start, NOT HEAD,
|
|
376
|
+
# to capture files changed during the entire /goal session including interim commits.
|
|
377
|
+
GOAL_START_COMMIT=$(grep "^start_commit:" .claude/goal-quench.pending | sed 's/^start_commit: //')
|
|
378
|
+
if [ "$TARGET" = "inferred from git diff" ] || [ -z "$TARGET" ]; then
|
|
379
|
+
if [ -n "$GOAL_START_COMMIT" ]; then
|
|
380
|
+
TARGET=$(git diff "$GOAL_START_COMMIT"..HEAD --name-only 2>/dev/null | tr '\n' ' ')
|
|
381
|
+
else
|
|
382
|
+
# Fallback: staged + unstaged changes (does not capture interim commits)
|
|
383
|
+
TARGET=$(git status --short 2>/dev/null | awk '{print $2}' | tr '\n' ' ')
|
|
384
|
+
fi
|
|
385
|
+
fi
|
|
386
|
+
# Guard: reject empty target
|
|
387
|
+
[ -z "$TARGET" ] && echo "ERROR: cannot resolve verification artifact — no files changed or start_commit missing" && exit 1
|
|
388
|
+
```
|
|
389
|
+
|
|
390
|
+
Run `pipeline-conductor --quick` on `$TARGET` (the artifact changed during this /goal session). Gate on result:
|
|
391
|
+
|
|
392
|
+
| pipeline-conductor verdict | goal-quench action | Delete `.pending`? |
|
|
393
|
+
|---|---|---|
|
|
394
|
+
| `CLEAN (--quick)` | Output: "Quality gate passed (--quick). Safe for local iteration and internal commits. Run pipeline-conductor --full before external release or PR." Log tokens. | **Yes — immediately** |
|
|
395
|
+
| `PENDING` | Proceed with caution. Output pending items. Recommend: "Resolve before any external release." | **Yes — immediately** |
|
|
396
|
+
| `BLOCKED` | Block commit. Output blocking items. Ask: "Fix and re-run /goal, or accept partial completion?" | **Only after user acknowledges** |
|
|
397
|
+
| `ESCALATE` | Surface to user for decision. Do not auto-delete — preserve state until user explicitly decides. | **Only after user decision** |
|
|
398
|
+
|
|
399
|
+
**Deletion rule**: On BLOCKED or ESCALATE, `.pending` must survive until the user makes an explicit decision. Deleting before that decision loses the recovery anchor.
|
|
400
|
+
|
|
401
|
+
**Deadlock prevention**: While `.pending` exists (BLOCKED/ESCALATE), the next-turn verification check runs ONCE per turn only — not on every subsequent turn. After the first verification output, suppress further auto-trigger until the user explicitly acts (fix + re-run, accept, or delete). If the user starts a new `/goal-quench` run while `.pending` exists, warn: "A previous verification is pending. Resolve it first (`/goal-quench --verify`) or clear it (`rm .claude/goal-quench.pending`)." Do not silently overwrite with a new `.active` file.
|
|
402
|
+
|
|
403
|
+
Record actual token usage in `tracks/_meta/goal_quench_{YYYY-MM-DD}.md` for calibration (regardless of verdict).
|
|
404
|
+
|
|
405
|
+
### Step 3-c — Sidecar verdict gate (pro + max, when sidecar ≠ none)
|
|
406
|
+
|
|
407
|
+
After pipeline-conductor completes, read `sidecar:` from `.pending`. If non-none, gate on the sidecar's verdict before advancing to Done When:
|
|
408
|
+
|
|
409
|
+
| Sidecar | Wait for | Failure action |
|
|
410
|
+
|---|---|---|
|
|
411
|
+
| `steel-quench-c3` | Wave 2 convergence: PASS or CONDITIONAL_PASS | FAIL → reopen Phase 2, surface blocking findings to user |
|
|
412
|
+
| `agent-composer-panel` | Panel findings file written + incorporated into pipeline-conductor context | Not yet written → re-run pipeline-conductor with panel findings |
|
|
413
|
+
| `sim-conductor` + `steel-quench-w5` | Both: sim-conductor Area A PASS **and** steel-quench W5 convergence PASS | Either FAIL → block Done When; surface failing verdict |
|
|
414
|
+
|
|
415
|
+
This gate is **blocking** — Done When cannot be reached until the sidecar verdict is resolved. Record `sidecar_findings_count` in the calibration record before closing.
|
|
416
|
+
|
|
417
|
+
---
|
|
418
|
+
|
|
419
|
+
## Calibration Record
|
|
420
|
+
|
|
421
|
+
After each goal-quench run, append to `tracks/_meta/goal_quench_{YYYY-MM-DD}.md`:
|
|
422
|
+
|
|
423
|
+
```yaml
|
|
424
|
+
- run_id: GQ-{YYYYMMDD}-{N} # e.g. GQ-20260603-01 — unique per run, links to session JSONL
|
|
425
|
+
session_id: "{first-8-chars-of-jsonl-filename}" # for JSONL back-trace
|
|
426
|
+
date: YYYY-MM-DD
|
|
427
|
+
task: {one-line description}
|
|
428
|
+
scope_hint: "{N files affected, task type}" # e.g. "3 new files, signal doc"
|
|
429
|
+
mode: core | pro | max
|
|
430
|
+
session_type: minor | normal | heavy | continuation # for within-type comparison
|
|
431
|
+
estimated_tokens: N
|
|
432
|
+
budget_source: token-budget-gate | fallback-heuristic
|
|
433
|
+
actual_tokens: N # from ~/.claude/projects/*/conversation*.jsonl or user-reported; "unknown" if unavailable
|
|
434
|
+
estimation_error: over | under | accurate
|
|
435
|
+
estimation_error_pct: "N%" # e.g. "717%" — magnitude of under/over; omit if accurate
|
|
436
|
+
actual_vs_estimate_ratio: N.N # actual / estimated (e.g., 4.7 means actual was 4.7× estimate)
|
|
437
|
+
budget_verdict: GREEN/YELLOW/ORANGE/RED
|
|
438
|
+
pipeline_verdict: CLEAN/PENDING/BLOCKED/ESCALATE
|
|
439
|
+
sidecar: none | steel-quench-c3 | agent-composer-panel | sim-conductor | {cli-name}
|
|
440
|
+
sidecar_type: none | cc-subagent | external-cli # cc-subagent = isolated Sonnet context; external-cli = untracked
|
|
441
|
+
sidecar_model: none | sonnet | {model-name} # standard baseline: sonnet; external CLIs vary by tier
|
|
442
|
+
orchestrator_tokens: N | unknown # Opus orchestrator CC tokens (pro/max only; visible in CC)
|
|
443
|
+
sidecar_tokens: N | unknown # Sonnet sub-agent CC tokens if cc-subagent; "untracked" if external-cli
|
|
444
|
+
sidecar_findings_count: N # 0 if sidecar=none
|
|
445
|
+
threshold_triggered: none/50/70/85/95
|
|
446
|
+
notes: {optional — why estimate was off, what scope changed}
|
|
447
|
+
```
|
|
448
|
+
|
|
449
|
+
**`actual_tokens` collection**: Claude cannot read session token counts directly. Preferred method:
|
|
450
|
+
```bash
|
|
451
|
+
python3 -c "
|
|
452
|
+
import json, glob
|
|
453
|
+
f = sorted(glob.glob('~/.claude/projects/*/conversation*.jsonl'.replace('~', __import__('os').path.expanduser('~'))))[-1]
|
|
454
|
+
lines = [json.loads(l) for l in open(f)]
|
|
455
|
+
total = sum(m.get('message',{}).get('usage',{}).get('input_tokens',0) + m.get('message',{}).get('usage',{}).get('output_tokens',0) for m in lines)
|
|
456
|
+
print(f'Session tokens: {total:,}')
|
|
457
|
+
"
|
|
458
|
+
```
|
|
459
|
+
Fallback: turn count × estimated tokens/turn (~2K for short turns, ~8K for long file edits). Write `"unknown"` if no estimate is possible.
|
|
460
|
+
|
|
461
|
+
**Retrospective calibration baseline (N=10, 2026-06-01–06-03, Sonnet)**: mean actual/estimate ratio = 4.7× (range 1.3×–10.5×). Systematic underestimation due to session overhead. Full data held in the private companion store (`paper-signals/`).
|
|
462
|
+
|
|
463
|
+
After 10 additional prospective runs, compute mean estimation error per mode — calibrate the session_overhead_factor if systematic over/under persists.
|
|
464
|
+
|
|
465
|
+
`pipeline_verdict` enum includes `ESCALATE` — record it when Phase 3 required user decision before proceeding.
|
|
466
|
+
|
|
467
|
+
This data calibrates future estimates. Target: 10 prospective runs per mode before treating mode comparison as reliable. Runs with `budget_source: fallback-heuristic` calibrate the fallback tiers; runs with `token-budget-gate` calibrate the skill itself.
|
|
468
|
+
|
|
469
|
+
---
|
|
470
|
+
|
|
471
|
+
## Simplification Guards
|
|
472
|
+
|
|
473
|
+
- Do not activate goal-quench for exploratory or single-turn tasks — only for /goal sessions.
|
|
474
|
+
- If the user runs /goal without going through goal-quench setup, propose retroactively: "Run /goal-quench --verify to quality-check the output."
|
|
475
|
+
- `--verify` mode: skip Phase 1+2, go directly to Phase 3 verification using current session scope.
|
|
476
|
+
- **core is the default and the floor** — never auto-escalate a GREEN/YELLOW task to pro/max. Sidecar CLI costs (external, untracked in CC) apply in pro/max — auto-escalation requires ORANGE/RED budget or explicit user flag.
|
|
477
|
+
- **pro/max are supersets of core, not replacements** — a user who wants only the v1 safety belt stays in core and sees no orchestration.
|
|
478
|
+
- **Post-optimization step-down**: if a pro/max run's re-estimated budget drops to GREEN/YELLOW after context-doctor trims (Phase 1.5 Step A), offer to drop back to core — don't run orchestration the trimmed scope no longer needs.
|
|
479
|
+
- Phase 1.5 does not re-implement the gates of the skills it chains (agent-composer's destructive/fan-out gates, plugin-recommender's install approval) — it defers to them.
|
|
480
|
+
|
|
481
|
+
---
|
|
482
|
+
|
|
483
|
+
## Chains
|
|
484
|
+
|
|
485
|
+
- `token-budget-gate` — Phase 1 cost estimation (all modes)
|
|
486
|
+
- `context-doctor` — Phase 1.5 Step A token-reduction pre-pass (pro + max)
|
|
487
|
+
- `agent-composer` — Phase 1.5 Step B goal decomposition into Waves (pro + max)
|
|
488
|
+
- `plugin-recommender` — Phase 1.5 Step C capability-gap fill (max only, GAP-triggered)
|
|
489
|
+
- `cross-ecosystem-synergy-detection` — Phase 1.5 Step C pre-validation of discovered candidates (max only)
|
|
490
|
+
- `pipeline-conductor` — Phase 3 quality gate (`--quick` for core/pro, `--full` for max; called by Stop hook)
|
|
491
|
+
- `field-harvest` — capture calibration data as reusable pattern after 10 runs
|
|
492
|
+
|
|
493
|
+
---
|
|
494
|
+
|
|
495
|
+
## Done When
|
|
496
|
+
|
|
497
|
+
```
|
|
498
|
+
Phase 1: token-budget-gate verdict output + mode resolved (core default, or pro/max via budget verdict / explicit flag)
|
|
499
|
+
+ .claude/goal-quench.active written (with mode: field) + thresholds injected
|
|
500
|
+
+ If pro/max: Phase 1.5 ran — context-doctor pre-pass + agent-composer plan;
|
|
501
|
+
max additionally: GAP-triggered plugin-recommender + cross-ecosystem-synergy-detection pre-validation
|
|
502
|
+
+ Phase 3 (on next response after /goal): .pending file detected + pipeline-conductor run
|
|
503
|
+
(--quick for core/pro, --full for max)
|
|
504
|
+
+ Verification verdict output (CLEAN/PENDING/BLOCKED/ESCALATE)
|
|
505
|
+
+ If sidecar invoked (pro/max Step D): Step 3-c sidecar verdict resolved (PASS/CONDITIONAL_PASS) before closing
|
|
506
|
+
+ .pending file deleted + calibration record appended
|
|
507
|
+
```
|
|
508
|
+
|
|
509
|
+
Verdict: PASS (CLEAN verification) | CONDITIONAL_PASS (PENDING verification) | FAIL (BLOCKED verification) | ESCALATE (user decision required on partial completion)
|