ultimate-pi 0.7.0 → 0.8.0
- package/.agents/skills/harness-decisions/SKILL.md +20 -1
- package/.agents/skills/harness-eval/SKILL.md +11 -13
- package/.agents/skills/harness-orchestration/SKILL.md +36 -30
- package/.agents/skills/harness-plan/SKILL.md +13 -18
- package/.pi/PACKAGING.md +1 -1
- package/.pi/agents/harness/adversary.md +20 -12
- package/.pi/agents/harness/evaluator.md +25 -14
- package/.pi/agents/harness/executor.md +27 -16
- package/.pi/agents/harness/incident-recorder.md +37 -0
- package/.pi/agents/harness/meta-optimizer.md +18 -15
- package/.pi/agents/harness/planner.md +27 -30
- package/.pi/agents/harness/tie-breaker.md +4 -2
- package/.pi/agents/harness/trace-librarian.md +18 -11
- package/.pi/agents/pi-pi/ext-expert.md +1 -1
- package/.pi/agents/pi-pi/keybinding-expert.md +1 -1
- package/.pi/agents/pi-pi/tui-expert.md +3 -3
- package/.pi/extensions/00-ultimate-pi-system-prompt.ts +2 -2
- package/.pi/extensions/budget-guard.ts +1 -1
- package/.pi/extensions/custom-footer.ts +8 -3
- package/.pi/extensions/custom-header.ts +2 -2
- package/.pi/extensions/debate-orchestrator.ts +1 -1
- package/.pi/extensions/dotenv-loader.ts +1 -1
- package/.pi/extensions/drift-monitor.ts +1 -1
- package/.pi/extensions/harness-ask-user.ts +1 -1
- package/.pi/extensions/harness-live-widget.ts +1 -1
- package/.pi/extensions/harness-run-context.ts +52 -10
- package/.pi/extensions/harness-telemetry.ts +1 -1
- package/.pi/extensions/harness-web-guard.ts +1 -1
- package/.pi/extensions/harness-web-tools.ts +1 -1
- package/.pi/extensions/lib/ask-user/dialog.ts +2 -2
- package/.pi/extensions/lib/ask-user/fallback.ts +1 -1
- package/.pi/extensions/lib/ask-user/render.ts +3 -3
- package/.pi/extensions/lib/harness-subagents/agent-loader.ts +1 -1
- package/.pi/extensions/lib/harness-subagents/agent-parser.ts +1 -1
- package/.pi/extensions/lib/harness-subagents/blackboard-tool.ts +1 -1
- package/.pi/extensions/lib/harness-subagents/harness-subagent-policy.ts +134 -0
- package/.pi/extensions/lib/harness-subagents/vendored/agent-manager.ts +2 -2
- package/.pi/extensions/lib/harness-subagents/vendored/agent-runner.ts +9 -5
- package/.pi/extensions/lib/harness-subagents/vendored/context.ts +1 -1
- package/.pi/extensions/lib/harness-subagents/vendored/env.ts +1 -1
- package/.pi/extensions/lib/harness-subagents/vendored/index.ts +2 -2
- package/.pi/extensions/lib/harness-subagents/vendored/output-file.ts +1 -1
- package/.pi/extensions/lib/harness-subagents/vendored/schedule.ts +1 -1
- package/.pi/extensions/lib/harness-subagents/vendored/settings.ts +1 -1
- package/.pi/extensions/lib/harness-subagents/vendored/skill-loader.ts +1 -1
- package/.pi/extensions/lib/harness-subagents/vendored/types.ts +2 -2
- package/.pi/extensions/lib/harness-subagents/vendored/ui/agent-widget.ts +1 -1
- package/.pi/extensions/lib/harness-subagents/vendored/ui/conversation-viewer.ts +2 -2
- package/.pi/extensions/lib/harness-subagents/vendored/ui/schedule-menu.ts +1 -1
- package/.pi/extensions/observation-bus.ts +1 -1
- package/.pi/extensions/pi-model-router-harness.ts +1 -1
- package/.pi/extensions/policy-gate.ts +86 -16
- package/.pi/extensions/provider-payload-sanitize.ts +1 -1
- package/.pi/extensions/review-integrity.ts +76 -22
- package/.pi/extensions/sentrux-rules-sync.ts +1 -1
- package/.pi/extensions/soundboard.ts +1 -1
- package/.pi/extensions/test-diff-integrity.ts +1 -1
- package/.pi/extensions/trace-recorder.ts +1 -1
- package/.pi/extensions/ultimate-pi-vcc.ts +1 -1
- package/.pi/harness/agents.manifest.json +16 -12
- package/.pi/harness/docs/adrs/0031-harness-run-context.md +5 -2
- package/.pi/harness/docs/adrs/0032-harness-command-orchestration.md +37 -0
- package/.pi/harness/docs/adrs/README.md +1 -0
- package/.pi/harness/specs/harness-spawn-context.schema.json +65 -0
- package/.pi/lib/harness-agent-output.ts +41 -0
- package/.pi/lib/harness-run-context.ts +352 -7
- package/.pi/lib/harness-ui-state.ts +1 -1
- package/.pi/prompts/harness-auto.md +36 -61
- package/.pi/prompts/harness-critic.md +15 -28
- package/.pi/prompts/harness-eval.md +19 -27
- package/.pi/prompts/harness-incident.md +15 -34
- package/.pi/prompts/harness-plan.md +31 -50
- package/.pi/prompts/harness-review.md +16 -30
- package/.pi/prompts/harness-router-tune.md +16 -38
- package/.pi/prompts/harness-run.md +21 -38
- package/.pi/prompts/harness-setup.md +2 -0
- package/.pi/prompts/harness-trace.md +13 -30
- package/.pi/scripts/harness-generate-model-router.mjs +16 -13
- package/.pi/scripts/harness-verify.mjs +16 -0
- package/.pi/scripts/vendor-sync-pi-model-router.sh +10 -10
- package/CHANGELOG.md +19 -1
- package/README.md +4 -5
- package/THIRD_PARTY_NOTICES.md +1 -1
- package/package.json +13 -8
- package/vendor/pi-model-router/UPSTREAM_PIN.md +1 -1
- package/vendor/pi-model-router/extensions/commands.ts +2 -2
- package/vendor/pi-model-router/extensions/config.ts +2 -2
- package/vendor/pi-model-router/extensions/index.ts +1 -1
- package/vendor/pi-model-router/extensions/provider.ts +2 -2
- package/vendor/pi-model-router/extensions/routing.ts +2 -2
- package/vendor/pi-model-router/extensions/types.ts +1 -1
- package/vendor/pi-model-router/extensions/ui.ts +1 -1
- package/vendor/pi-model-router/package.json +4 -4
- package/vendor/pi-vcc/index.ts +1 -1
- package/vendor/pi-vcc/package.json +1 -1
- package/vendor/pi-vcc/src/commands/pi-vcc.ts +1 -1
- package/vendor/pi-vcc/src/commands/vcc-recall.ts +1 -1
- package/vendor/pi-vcc/src/core/content.ts +1 -1
- package/vendor/pi-vcc/src/core/load-messages.ts +1 -1
- package/vendor/pi-vcc/src/core/normalize.ts +1 -1
- package/vendor/pi-vcc/src/core/render-entries.ts +1 -1
- package/vendor/pi-vcc/src/core/report.ts +1 -1
- package/vendor/pi-vcc/src/core/search-entries.ts +1 -1
- package/vendor/pi-vcc/src/core/summarize.ts +1 -1
- package/vendor/pi-vcc/src/hooks/before-compact.ts +2 -2
- package/vendor/pi-vcc/src/tools/recall.ts +1 -1
- package/vendor/pi-vcc/src/types.ts +1 -1
- package/vendor/pi-vcc/tests/fixtures.ts +1 -1
- package/vendor/pi-vcc/tests/render-entries.test.ts +1 -1
- package/vendor/pi-vcc/tests/search-entries.test.ts +1 -1
- package/vendor/pi-vcc/tests/support/load-session.ts +2 -2
package/.pi/lib/harness-agent-output.ts +41 -0
@@ -0,0 +1,41 @@
+/**
+ * Parse structured JSON blocks from harness subagent assistant output.
+ */
+
+const JSON_FENCE_RE = /```(?:json)?\s*([\s\S]*?)```/i;
+
+export function extractJsonBlock(text: string): string | null {
+  const trimmed = text.trim();
+  if (trimmed.startsWith("{")) {
+    return trimmed;
+  }
+  const match = JSON_FENCE_RE.exec(text);
+  if (match?.[1]) {
+    return match[1].trim();
+  }
+  const lastBrace = trimmed.lastIndexOf("{");
+  const lastClose = trimmed.lastIndexOf("}");
+  if (lastBrace >= 0 && lastClose > lastBrace) {
+    return trimmed.slice(lastBrace, lastClose + 1);
+  }
+  return null;
+}
+
+export function parseHarnessAgentJson<T extends Record<string, unknown>>(
+  text: string,
+): { ok: true; value: T } | { ok: false; error: string } {
+  const block = extractJsonBlock(text);
+  if (!block) {
+    return { ok: false, error: "no JSON block found in subagent output" };
+  }
+  try {
+    const value = JSON.parse(block) as T;
+    if (!value || typeof value !== "object") {
+      return { ok: false, error: "parsed value is not an object" };
+    }
+    return { ok: true, value };
+  } catch (err) {
+    const message = err instanceof Error ? err.message : String(err);
+    return { ok: false, error: message };
+  }
+}
|
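A minimal sketch of how the three extraction strategies in `extractJsonBlock` behave on sample inputs. The helper bodies are re-typed from the 41-line hunk above so the block runs on its own; `FENCE`, `ParseResult`, and the sample strings are illustrative assumptions and not part of the package.

```typescript
// Standalone copy of the two helpers from harness-agent-output.ts, for illustration only.
// FENCE is a hypothetical helper that builds the three-backtick marker without
// writing it literally; the resulting regex matches the same text as JSON_FENCE_RE.
const FENCE = "`".repeat(3);
const JSON_FENCE_RE = new RegExp(FENCE + "(?:json)?\\s*([\\s\\S]*?)" + FENCE, "i");

function extractJsonBlock(text: string): string | null {
  const trimmed = text.trim();
  if (trimmed.startsWith("{")) return trimmed; // 1. whole output is the object
  const match = JSON_FENCE_RE.exec(text);
  if (match?.[1]) return match[1].trim(); // 2. first fenced block
  const lastBrace = trimmed.lastIndexOf("{");
  const lastClose = trimmed.lastIndexOf("}");
  if (lastBrace >= 0 && lastClose > lastBrace) {
    return trimmed.slice(lastBrace, lastClose + 1); // 3. last "{" to last "}"
  }
  return null;
}

type ParseResult<T> = { ok: true; value: T } | { ok: false; error: string };

function parseHarnessAgentJson<T extends Record<string, unknown>>(
  text: string,
): ParseResult<T> {
  const block = extractJsonBlock(text);
  if (!block) return { ok: false, error: "no JSON block found in subagent output" };
  try {
    const value = JSON.parse(block) as T;
    if (!value || typeof value !== "object") {
      return { ok: false, error: "parsed value is not an object" };
    }
    return { ok: true, value };
  } catch (err) {
    return { ok: false, error: err instanceof Error ? err.message : String(err) };
  }
}

// Hypothetical subagent outputs, one per code path.
const bare = parseHarnessAgentJson('{"status":"pass"}');
const fenced = parseHarnessAgentJson(
  `Verdict:\n${FENCE}json\n{"status":"fail"}\n${FENCE}\nDone.`,
);
const flatTrailing = parseHarnessAgentJson('Result: {"status":"pass"} end');
const nestedTrailing = parseHarnessAgentJson('Result: {"outer":{"inner":1}}');
const none = parseHarnessAgentJson("no structured output");

console.log(bare.ok, fenced.ok, flatTrailing.ok, nestedTrailing.ok, none.ok);
```

Under these assumptions the first three inputs parse and the last two do not. The nested case fails because the fallback slices from the last `{`, which sits inside the object, so prose-wrapped nested JSON appears to need a fenced block to be picked up.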
package/.pi/lib/harness-run-context.ts +352 -7
@@ -6,7 +6,7 @@
  * - `.pi/harness/active-run.json` (cross-session pointer)
  */
 
-import { mkdir, readFile, writeFile } from "node:fs/promises";
+import { mkdir, readFile, realpath, writeFile } from "node:fs/promises";
 import { isAbsolute, join, relative, resolve } from "node:path";
 
 export type HarnessPhase =
@@ -114,6 +114,353 @@ export function canonicalPlanPath(runId: string, projectRoot: string): string {
   return join(harnessRunsRoot(projectRoot), runId, "plan-packet.json");
 }
 
+const PLAN_PACKET_BASENAME = "plan-packet.json";
+
+const MUTATING_FILE_TOOLS = new Set(["write", "edit"]);
+
+const PLAN_APPROVE_OPTION =
+  /^(approve(d)?(\s+plan)?|yes,?\s+proceed|looks\s+good)$/i;
+const PLAN_CANCEL_OPTION =
+  /^(cancel(led)?|revise|request\s+changes|needs?\s+clarification)$/i;
+
+export interface PlanUserApproval {
+  plan_id: string | null;
+  approved_at: string;
+  source: "ask_user" | "harness-plan-approval" | "noninteractive";
+}
+
+export interface PlanPhaseMutationDecision {
+  allowed: boolean;
+  reason?: string;
+  isScopedPlanWrite?: boolean;
+}
+
+/** Resolve path relative to project root when not absolute. */
+export function normalizeHarnessPath(
+  path: string,
+  projectRoot: string,
+): string {
+  const trimmed = path.trim();
+  if (!trimmed) return resolve(projectRoot);
+  if (isAbsolute(trimmed)) return resolve(trimmed);
+  return resolve(projectRoot, trimmed);
+}
+
+export function isCanonicalPlanPacketPath(
+  absPath: string,
+  projectRoot: string,
+  runId: string,
+): boolean {
+  const expected = resolve(canonicalPlanPath(runId, projectRoot));
+  return resolve(absPath) === expected;
+}
+
+export function extractWritePathFromToolInput(
+  input: Record<string, unknown>,
+): string {
+  const raw =
+    (typeof input.path === "string" && input.path) ||
+    (typeof input.filePath === "string" && input.filePath) ||
+    "";
+  return raw.trim();
+}
+
+/** True when absPath is the canonical plan-packet.json for the active run. */
+export async function isPlanPhaseScopedWrite(
+  absPath: string,
+  runCtx: HarnessRunContext | null,
+  projectRoot: string,
+): Promise<boolean> {
+  if (!runCtx?.run_id) return false;
+  let resolved: string;
+  try {
+    resolved = await realpath(normalizeHarnessPath(absPath, projectRoot));
+  } catch {
+    resolved = normalizeHarnessPath(absPath, projectRoot);
+  }
+  const runsRoot = resolve(harnessRunsRoot(projectRoot));
+  let runsReal: string;
+  try {
+    runsReal = await realpath(runsRoot);
+  } catch {
+    runsReal = runsRoot;
+  }
+  const rel = relative(runsReal, resolved);
+  if (rel.startsWith("..") || isAbsolute(rel)) return false;
+  const parts = rel.split(/[/\\]/);
+  if (parts.length !== 2 || parts[1] !== PLAN_PACKET_BASENAME) return false;
+  if (parts[0] !== runCtx.run_id) return false;
+  return isCanonicalPlanPacketPath(resolved, projectRoot, runCtx.run_id);
+}
+
+export function indexOfLastPlanCommand(entries: unknown[]): number {
+  for (let i = entries.length - 1; i >= 0; i--) {
+    const entry = entries[i] as SessionEntryLike & {
+      message?: { role?: string; content?: string | unknown[] };
+    };
+    if (
+      entry.type === "custom" &&
+      entry.customType === "harness-plan-attempt"
+    ) {
+      return i;
+    }
+    if (entry.type !== "message" || entry.message?.role !== "user") continue;
+    const content = entry.message.content;
+    const text =
+      typeof content === "string"
+        ? content
+        : Array.isArray(content)
+          ? content
+              .filter(
+                (c): c is { type: string; text?: string } =>
+                  typeof c === "object" &&
+                  c !== null &&
+                  (c as { type?: string }).type === "text",
+              )
+              .map((c) => c.text ?? "")
+              .join("\n")
+          : "";
+    const visible = userVisiblePromptSlice(text);
+    const parsed = parseHarnessSlashCommand(visible);
+    if (
+      parsed?.command === "harness-plan" ||
+      parsed?.command === "harness-auto"
+    ) {
+      return i;
+    }
+  }
+  return -1;
+}
+
+export function parseAskUserApprovalFromMessage(msg: {
+  toolName?: string;
+  details?: unknown;
+  content?: { type?: string; text?: string }[];
+}): PlanUserApproval | null {
+  if (msg.toolName !== "ask_user") return null;
+  const details = msg.details as
+    | {
+        cancelled?: boolean;
+        response?: {
+          kind?: string;
+          text?: string;
+          selections?: string[];
+        };
+      }
+    | undefined;
+  if (details?.cancelled) return null;
+  const response = details?.response;
+  if (!response) return null;
+  if (response.kind === "freeform") {
+    const text = (response.text ?? "").trim();
+    if (/^approve(d)?\b/i.test(text)) {
+      return {
+        plan_id: null,
+        approved_at: nowIso(),
+        source: "ask_user",
+      };
+    }
+    return null;
+  }
+  const selection = (response.selections?.[0] ?? "").trim();
+  if (!selection || PLAN_CANCEL_OPTION.test(selection)) return null;
+  if (PLAN_APPROVE_OPTION.test(selection)) {
+    return {
+      plan_id: null,
+      approved_at: nowIso(),
+      source: "ask_user",
+    };
+  }
+  return null;
+}
+
+export function getLatestPlanUserApproval(
+  entries: unknown[],
+  sinceIndex = 0,
+): PlanUserApproval | null {
+  for (let i = entries.length - 1; i >= sinceIndex; i--) {
+    const entry = entries[i] as SessionEntryLike & {
+      message?: {
+        role?: string;
+        toolName?: string;
+        details?: unknown;
+        content?: { type?: string; text?: string }[];
+      };
+    };
+    if (
+      entry.type === "custom" &&
+      entry.customType === "harness-plan-approval"
+    ) {
+      const data = entry.data as Partial<PlanUserApproval> | undefined;
+      if (data?.approved_at) {
+        return {
+          plan_id: typeof data.plan_id === "string" ? data.plan_id : null,
+          approved_at: data.approved_at,
+          source:
+            data.source === "noninteractive"
+              ? "noninteractive"
+              : "harness-plan-approval",
+        };
+      }
+    }
+    if (entry.type !== "message" || entry.message?.role !== "toolResult") {
+      continue;
+    }
+    const fromAsk = parseAskUserApprovalFromMessage(entry.message);
+    if (fromAsk) return fromAsk;
+  }
+  return null;
+}
+
+export function hasPlanUserApproval(
+  entries: unknown[],
+  opts?: { planId?: string | null; sincePlanCommand?: boolean },
+): boolean {
+  if (process.env.HARNESS_PLAN_NONINTERACTIVE === "1") {
+    return true;
+  }
+  const since = opts?.sincePlanCommand
+    ? Math.max(0, indexOfLastPlanCommand(entries))
+    : 0;
+  const approval = getLatestPlanUserApproval(entries, since);
+  if (!approval) return false;
+  if (opts?.planId && approval.plan_id && approval.plan_id !== opts.planId) {
+    return false;
+  }
+  return true;
+}
+
+export function isHarnessAutoSession(entries: unknown[]): boolean {
+  const since = indexOfLastPlanCommand(entries);
+  if (since < 0) return false;
+  for (let i = since; i < entries.length; i++) {
+    const entry = entries[i] as SessionEntryLike & {
+      message?: { role?: string; content?: string };
+    };
+    if (entry.type !== "message" || entry.message?.role !== "user") continue;
+    const text =
+      typeof entry.message.content === "string"
+        ? userVisiblePromptSlice(entry.message.content)
+        : "";
+    const parsed = parseHarnessSlashCommand(text);
+    if (parsed?.command === "harness-auto") return true;
+  }
+  return false;
+}
+
+export async function isPlanPhaseAllowedMutation(
+  toolName: string,
+  input: Record<string, unknown>,
+  phase: HarnessPhase,
+  runCtx: HarnessRunContext | null,
+  projectRoot: string,
+  opts: {
+    aborted: boolean;
+    entries: unknown[];
+    ownerSessionId?: string;
+    currentSessionId?: string;
+  },
+): Promise<PlanPhaseMutationDecision> {
+  if (!MUTATING_FILE_TOOLS.has(toolName)) {
+    if (phase === "execute" || phase === "merge") {
+      return { allowed: true };
+    }
+    return {
+      allowed: false,
+      reason: `policy-gate: ${toolName} blocked in phase '${phase}'.`,
+    };
+  }
+
+  if (
+    runCtx?.owner_pi_session_id &&
+    opts.currentSessionId &&
+    runCtx.owner_pi_session_id !== opts.currentSessionId
+  ) {
+    return {
+      allowed: false,
+      reason:
+        "harness-run-context: this session does not own the active run; plan writes are read-only here.",
+    };
+  }
+
+  const target = extractWritePathFromToolInput(input);
+  if (!target) {
+    return {
+      allowed: false,
+      reason: "policy-gate: write/edit requires a path.",
+    };
+  }
+
+  const scoped = runCtx
+    ? await isPlanPhaseScopedWrite(target, runCtx, projectRoot)
+    : false;
+
+  if (scoped) {
+    if (!runCtx) {
+      return {
+        allowed: false,
+        reason:
+          'policy-gate: no active harness run. Run /harness-plan "<task>" first.',
+      };
+    }
+    if (
+      !hasPlanUserApproval(opts.entries, {
+        sincePlanCommand: true,
+        planId: runCtx.plan_id,
+      })
+    ) {
+      return {
+        allowed: false,
+        isScopedPlanWrite: true,
+        reason:
+          "policy-gate: plan-packet.json write blocked until the user approves via ask_user (present the full plan, then Approve).",
+      };
+    }
+    if (opts.aborted) {
+      return { allowed: true, isScopedPlanWrite: true };
+    }
+    if (phase === "plan") {
+      return { allowed: true, isScopedPlanWrite: true };
+    }
+    if (phase === "execute" || phase === "merge") {
+      return { allowed: true, isScopedPlanWrite: true };
+    }
+    return {
+      allowed: false,
+      isScopedPlanWrite: true,
+      reason: `harness-run-context: plan-packet.json is read-only in phase '${phase}'.`,
+    };
+  }
+
+  if (opts.aborted) {
+    return {
+      allowed: false,
+      reason:
+        "policy-gate: mutating tool blocked because harness-abort lock is active. Attach a new approved plan via plan-packet.json first.",
+    };
+  }
+
+  if (phase === "execute" || phase === "merge") {
+    return { allowed: true };
+  }
+
+  if (phase === "plan" && !runCtx) {
+    return {
+      allowed: false,
+      reason:
+        'policy-gate: no active harness run. Run /harness-plan "<task>" first.',
+    };
+  }
+
+  const allowedPath = runCtx?.run_id
+    ? canonicalPlanPath(runCtx.run_id, projectRoot)
+    : ".pi/harness/runs/<run_id>/plan-packet.json";
+  return {
+    allowed: false,
+    reason: `policy-gate: ${toolName} blocked in phase '${phase}'. In plan phase only ${allowedPath} is writable after ask_user approval.`,
+  };
+}
+
 export function allocateRunId(sessionId: string): string {
   return `${sessionId}-${Date.now()}`;
 }
|
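The approval matching added in this hunk treats list selections and freeform text differently. The sketch below copies the two option regexes and the freeform pattern from `parseAskUserApprovalFromMessage`; the `AskUserResponse` type, the `"select"` kind, and `isApproval` are hypothetical simplifications (the function in the diff returns a `PlanUserApproval` or `null` rather than a boolean).

```typescript
// Regexes copied from the hunk above; everything else is a hypothetical simplification.
const PLAN_APPROVE_OPTION =
  /^(approve(d)?(\s+plan)?|yes,?\s+proceed|looks\s+good)$/i;
const PLAN_CANCEL_OPTION =
  /^(cancel(led)?|revise|request\s+changes|needs?\s+clarification)$/i;
const FREEFORM_APPROVE = /^approve(d)?\b/i;

// Assumed response shape: the diff only checks kind === "freeform" and otherwise
// reads the first entry of `selections`.
type AskUserResponse =
  | { kind: "freeform"; text: string }
  | { kind: "select"; selections: string[] };

// True when the response would count as plan approval under the rules above.
function isApproval(response: AskUserResponse): boolean {
  if (response.kind === "freeform") {
    // Freeform text only needs to start with "approve" or "approved".
    return FREEFORM_APPROVE.test(response.text.trim());
  }
  const selection = (response.selections[0] ?? "").trim();
  // Cancel-style options are rejected before the approve pattern is tried.
  if (!selection || PLAN_CANCEL_OPTION.test(selection)) return false;
  // Selections must match an approve option in full (anchored with ^ and $).
  return PLAN_APPROVE_OPTION.test(selection);
}

const samples: [AskUserResponse, boolean][] = [
  [{ kind: "select", selections: ["Approve"] }, true],
  [{ kind: "select", selections: ["Approve plan"] }, true],
  [{ kind: "select", selections: ["Yes, proceed"] }, true],
  [{ kind: "select", selections: ["Request changes"] }, false],
  [{ kind: "select", selections: ["Approve with changes"] }, false],
  [{ kind: "select", selections: [] }, false],
  [{ kind: "freeform", text: "approved, ship it" }, true],
  [{ kind: "freeform", text: "approve with changes" }, true],
  [{ kind: "freeform", text: "not yet" }, false],
];

for (const [response, expected] of samples) {
  console.log(JSON.stringify(response), isApproval(response) === expected);
}
```

One consequence worth noticing: a selection of "Approve with changes" is not treated as approval, while the same words typed as freeform text are, because the freeform pattern is a prefix match. Separately, `hasPlanUserApproval` in the diff returns `true` whenever `HARNESS_PLAN_NONINTERACTIVE` is `"1"`, which this sketch does not model.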
@@ -471,13 +818,11 @@ export function validatePlanOverridePath(
   runId: string,
   projectRoot: string,
 ): { ok: boolean; reason?: string } {
-  const absPlan =
-
-  const rel = relative(runsDir, absPlan);
-  if (rel.startsWith("..") || isAbsolute(rel)) {
+  const absPlan = normalizeHarnessPath(planPath, projectRoot);
+  if (!isCanonicalPlanPacketPath(absPlan, projectRoot, runId)) {
     return {
       ok: false,
-      reason: `--plan must be
+      reason: `--plan must be runs/${runId}/plan-packet.json (canonical plan packet only)`,
     };
   }
   return { ok: true };
|
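The change to `validatePlanOverridePath` tightens the check from "somewhere under the runs directory" to "exactly the canonical plan packet for this run". A rough standalone sketch of that behaviour follows. It assumes `harnessRunsRoot` resolves to `<projectRoot>/.pi/harness/runs` (inferred from the policy-gate error text, the real helper is not shown in this diff) and that `planPath` is the first parameter; the `/repo` root and run IDs are made up.

```typescript
import { isAbsolute, join, resolve } from "node:path";

// Assumption: the runs root lives at <projectRoot>/.pi/harness/runs.
function harnessRunsRoot(projectRoot: string): string {
  return join(projectRoot, ".pi", "harness", "runs");
}

function canonicalPlanPath(runId: string, projectRoot: string): string {
  return join(harnessRunsRoot(projectRoot), runId, "plan-packet.json");
}

// Same logic as normalizeHarnessPath in the diff.
function normalizeHarnessPath(path: string, projectRoot: string): string {
  const trimmed = path.trim();
  if (!trimmed) return resolve(projectRoot);
  if (isAbsolute(trimmed)) return resolve(trimmed);
  return resolve(projectRoot, trimmed);
}

// Same logic as isCanonicalPlanPacketPath in the diff: exact match after resolve().
function isCanonicalPlanPacketPath(
  absPath: string,
  projectRoot: string,
  runId: string,
): boolean {
  return resolve(absPath) === resolve(canonicalPlanPath(runId, projectRoot));
}

// Parameter order for planPath is an assumption; the hunk only shows the last two.
function validatePlanOverridePath(
  planPath: string,
  runId: string,
  projectRoot: string,
): { ok: boolean; reason?: string } {
  const absPlan = normalizeHarnessPath(planPath, projectRoot);
  if (!isCanonicalPlanPacketPath(absPlan, projectRoot, runId)) {
    return {
      ok: false,
      reason: `--plan must be runs/${runId}/plan-packet.json (canonical plan packet only)`,
    };
  }
  return { ok: true };
}

const root = "/repo"; // hypothetical project root
const canonical = validatePlanOverridePath(".pi/harness/runs/run-1/plan-packet.json", "run-1", root);
const dotted = validatePlanOverridePath(".pi/harness/runs/run-2/../run-1/plan-packet.json", "run-1", root);
const otherRun = validatePlanOverridePath(".pi/harness/runs/run-2/plan-packet.json", "run-1", root);
const stray = validatePlanOverridePath("plan-packet.json", "run-1", root);

console.log(canonical.ok, dotted.ok, otherRun.ok, stray.ok);
```

Because both sides go through `resolve`, a path with `..` segments that lands on the canonical file is accepted, while another run's packet or a file outside the runs directory is rejected. Symlinks are not followed in this function; in the diff only `isPlanPhaseScopedWrite` calls `realpath`.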
@@ -701,7 +1046,7 @@ export function nextStepAfterOutcome(input: {
       return "/harness-plan or /harness-abort";
     }
     if (exec === "completed") {
-      return "
+      return "/harness-eval";
     }
   }
   if (input.phase === "evaluate") {
package/.pi/prompts/harness-auto.md +36 -61
@@ -5,79 +5,54 @@ argument-hint: "\"<task>\" [--quick] [--risk low|med|high] [--budget <amount>]"
 
 # harness-auto
 
-
-
-`plan -> execute -> evaluate -> adversary -> severity-policy decision -> commit+PR (no auto-merge)`
+Pipeline orchestrator — one session, sequential `Agent` spawns. Invoke **harness-orchestration** skill for agent IDs. Do **not** implement or review inline.
 
 ## Step 0 — Parse arguments
 
-
-
-- required task: quoted or unquoted first value
-- optional flags: `--quick`, `--risk low|med|high`, `--budget <amount>`
+- required task (quoted or first token)
+- optional: `--quick`, `--risk`, `--budget`
 
-If task
+If task missing:
 
 `Usage: /harness-auto "<task>" [--quick] [--risk low|med|high] [--budget <amount>]`
 
-##
-
-1. Build and approve plan packet at the canonical active-run path before any mutation (extension allocates one `run_id` for the auto pipeline).
-2. Execute only approved scope with rollback artifacts.
-3. Run independent evaluator then adversarial reviewer.
-4. Apply severity policy + strict pre-PR gates.
-5. If gates pass, auto-commit and open PR; never auto-merge.
-
-## Locked decisions (must not be changed)
-
-- Always produce a plan packet before mutation.
-- Adversarial review is always required.
-- Merge blocking authority is severity-policy-engine.
-- Router tuning is propose-and-approve only.
-- Plan ambiguity must use `ask_user` (harness-decisions skill) — no silent guessing.
-- Rollback artifact must be revert-commit-ready and include:
-  - revert command
-  - prepared revert branch
-  - patch bundle
-- Debate profile is aggressive with locked confidence weights:
-  - claim_quality=0.20
-  - reproducibility=0.40
-  - agreement=0.40
-- Strict pre-PR gate is mandatory.
-- Post-pass behavior is auto-commit and auto-open-PR.
-- Never auto-merge PR.
-
-## Guardrails
-
-- Do not overthink straightforward gate outcomes; enforce gates deterministically.
-- Only follow the locked pipeline and governance decisions listed here.
-- Never bypass mandatory safety gates, even in `--quick` mode.
+## Orchestration (required) — same session
 
-
+1. **Plan** — spawn `harness/planner` → parse JSON → present full plan → `ask_user` Approve/Changes/Cancel → write `plan-packet.json` only on Approve (advances phase via policy-gate).
+2. **Execute** — spawn `harness/executor` with `HarnessSpawnContext` (`mode: execute`). Summarize handoff bullets for next spawn (do not paste full subagent log).
+3. **Eval** — spawn `harness/evaluator` (`mode: benchmark`) after parent scripts if needed.
+4. **Review** — spawn `harness/evaluator` (`mode: verdict`) OR rely on eval verdict if policy allows — prefer both when strict gates require.
+5. **Adversary** — spawn `harness/adversary` with artifact paths.
+6. **Tie-breaker** — spawn `harness/tie-breaker` only if debate unresolved.
+7. **Parent** — apply locked strict gates below; commit/PR only if all pass.
 
-
+No new Pi session for review — subagents use isolated context (`inherit_context: false`).
 
-
-2. Execution completed within approved scope.
-3. Independent evaluator passed.
-4. Adversarial review completed with consensus packet.
-5. Severity-policy-engine output is `pass` or `conditional_pass`.
-6. Benchmark delta checks passed.
-7. Rollback artifacts generated.
+## Locked decisions (do not change)
 
-
+- Always produce and approve plan before mutation.
+- Adversarial review always required.
+- Severity-policy-engine blocks merge.
+- Router tuning propose-and-approve only.
+- Plan ambiguity → parent `ask_user` (harness-decisions).
+- Rollback artifacts: revert command, revert branch, patch bundle.
+- Debate weights: claim_quality=0.20, reproducibility=0.40, agreement=0.40.
+- Strict pre-PR gate mandatory; auto-commit + open PR; never auto-merge.
 
-
-
-
-
-
+## Strict gates
+
+Block commit/PR if any fails: plan gate, execution in scope, evaluator pass, adversary complete, severity-policy pass/conditional_pass, benchmark deltas, rollback artifacts.
+
+## Notes
 
-
+- `--quick` reduces breadth, never safety gates.
+- High risk/ambiguity → stop and recommend manual `/harness-plan` with `ask_user`.
+- Interrupt: `/harness-abort [reason]` then `/harness-plan`.
+- Artifact refs under active run dir; `/harness-run-status` or `/harness-trace-last` for handoff.
 
-
+## Completion
 
-1.
-2.
-3.
-4.
+1. Pipeline status per gate
+2. Artifact references
+3. Policy outcome: `pass`, `conditional_pass`, `block`, or `human_required`
+4. Next action (PR, replan, rollback, override)
|
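The "Strict gates" rule in the new harness-auto prompt (block commit/PR if any gate fails) can be pictured as a plain all-must-pass check. The sketch below is only an illustration of that rule: `GateInputs`, its field names, and both functions are hypothetical, since the diff describes the gates in prose and does not show a data model for them.

```typescript
// Policy outcomes listed in the prompt's Completion section.
type PolicyOutcome = "pass" | "conditional_pass" | "block" | "human_required";

// Hypothetical record of gate results, one field per gate named in "Strict gates".
interface GateInputs {
  planApproved: boolean;
  executionInScope: boolean;
  evaluatorPassed: boolean;
  adversaryComplete: boolean;
  severityPolicy: PolicyOutcome;
  benchmarkDeltasOk: boolean;
  rollbackArtifactsReady: boolean;
}

// Returns the names of all failing gates, in the order the prompt lists them.
function failedGates(g: GateInputs): string[] {
  const checks: [string, boolean][] = [
    ["plan gate", g.planApproved],
    ["execution in scope", g.executionInScope],
    ["evaluator pass", g.evaluatorPassed],
    ["adversary complete", g.adversaryComplete],
    [
      "severity-policy pass/conditional_pass",
      g.severityPolicy === "pass" || g.severityPolicy === "conditional_pass",
    ],
    ["benchmark deltas", g.benchmarkDeltasOk],
    ["rollback artifacts", g.rollbackArtifactsReady],
  ];
  return checks.filter(([, ok]) => !ok).map(([name]) => name);
}

// Commit and PR are allowed only when no gate fails; merging is never automatic.
function mayCommitAndOpenPr(g: GateInputs): boolean {
  return failedGates(g).length === 0;
}

const allGreen: GateInputs = {
  planApproved: true,
  executionInScope: true,
  evaluatorPassed: true,
  adversaryComplete: true,
  severityPolicy: "conditional_pass",
  benchmarkDeltasOk: true,
  rollbackArtifactsReady: true,
};
const blocked: GateInputs = {
  ...allGreen,
  severityPolicy: "human_required",
  rollbackArtifactsReady: false,
};

console.log(mayCommitAndOpenPr(allGreen), failedGates(blocked));
```

Per the Notes section, `--quick` would not remove any entry from this list; it only narrows how much work each stage does.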
package/.pi/prompts/harness-critic.md +15 -28
@@ -5,46 +5,33 @@ argument-hint: "[--run <run-id>] [--trace <trace-ref>] [--risk low|med|high]"
 
 # harness-critic
 
-
+Orchestrator — spawn `harness/adversary`.
 
 ## Step 0 — Parse arguments
 
-Read `$ARGUMENTS` and parse:
-
 - optional: `--run <run-id>` (recovery only)
 - optional: `--trace <trace-ref>`, `--risk low|med|high`
 
-
-
-## Process
-
-1. Assume hidden regressions exist and identify likely fault surfaces.
-2. Challenge evaluator/executor assumptions with reproducible probes.
-3. Emit structured adversarial findings for severity policy consumption.
-
-## Requirements
+Happy path: omit `--run`.
 
-
-- Attempt to invalidate evaluator assumptions with concrete evidence.
-- Emit `AdversaryReport` matching `.pi/harness/specs/adversary-report.schema.json`.
-- Flag `block_merge=true` for high-confidence correctness/security/test-integrity risks.
+## Orchestration (required)
 
-
+1. Build `HarnessSpawnContext` with `mode: adversary`, run artifacts, plan path, trace refs.
+2. Spawn:
 
-
-
-
+```
+Agent({ subagent_type: "harness/adversary", prompt: "…" })
+```
 
-
+3. `get_subagent_result` — parse `AdversaryReport` JSON; parent persists for severity policy.
 
-
-- Structured `AdversaryReport` JSON.
-- Clear merge-block recommendation.
+## Parent rules
 
-
+- Assume hidden regressions until disproven (in subagent).
+- No new Pi session required.
 
-
+## Completion
 
 - `block_merge` decision
--
--
+- Top findings with repro pointers
+- `recommendation`: `proceed`, `conditional_pass`, or `block`
|
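Step 3 of the harness-critic orchestration has the parent parse the adversary's JSON before handing it to the severity policy. A minimal sketch of checking the two fields named in the Completion list is shown here. The real `AdversaryReport` shape is defined by `.pi/harness/specs/adversary-report.schema.json`, which is not part of this diff, so `AdversarySummary` and `readAdversarySummary` are hypothetical and cover only `block_merge` and `recommendation`.

```typescript
type Recommendation = "proceed" | "conditional_pass" | "block";

// Hypothetical subset of the report: only the fields the prompt names explicitly.
interface AdversarySummary {
  block_merge: boolean;
  recommendation: Recommendation;
}

const RECOMMENDATIONS: readonly string[] = ["proceed", "conditional_pass", "block"];

// Returns the summary when both fields are present and well-typed, otherwise null.
function readAdversarySummary(raw: unknown): AdversarySummary | null {
  if (typeof raw !== "object" || raw === null) return null;
  const record = raw as Record<string, unknown>;
  if (typeof record.block_merge !== "boolean") return null;
  const rec = record.recommendation;
  if (typeof rec !== "string" || !RECOMMENDATIONS.includes(rec)) return null;
  return { block_merge: record.block_merge, recommendation: rec as Recommendation };
}

// Made-up subagent payloads.
const good = readAdversarySummary(
  JSON.parse('{"block_merge":true,"recommendation":"block","findings":[]}'),
);
const badEnum = readAdversarySummary(
  JSON.parse('{"block_merge":false,"recommendation":"ship"}'),
);
const missing = readAdversarySummary(JSON.parse('{"recommendation":"proceed"}'));

console.log(good, badEnum, missing);
```

A parent following the "treat subagent output as untrusted" stance would probably validate against the full schema instead; this only shows the shape of such a check.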
package/.pi/prompts/harness-eval.md +19 -27
@@ -5,47 +5,39 @@ argument-hint: "[--run <run-id>] [--baseline <ref>] [--suite <name>]"
 
 # harness-eval
 
-
+Orchestrator — run deterministic scripts in parent if needed, then spawn `harness/evaluator` with `mode: benchmark`.
 
 ## Step 0 — Parse arguments
 
-
-
-- optional: `--run <run-id>` (recovery only — active run is used when omitted)
+- optional: `--run <run-id>` (recovery only)
 - optional: `--baseline <ref>`, `--suite <name>`
 
-
+Happy path: omit `--run`; use active run from `[HarnessRunContext]`.
 
-If no active run
+If no active run:
 
 `No active run. Finish /harness-plan and /harness-run first, or use /harness-run-status.`
 
-
-
-## Process
+## Orchestration (required)
 
 1. Load plan scope from `[HarnessActivePlan]` (read-only).
-2.
-3.
-4.
-
-## Requirements
-
-- Validate against accepted plan checks plus focused regression checks.
-- Emit evaluator-compatible metrics for downstream policy and router-tuning decisions.
-- Include success rate, cost-per-task, and regression guard outcomes when available.
+2. Parent may run: project tests, `node "$UP_PKG/.pi/scripts/harness-verify.mjs"` — capture output paths.
+3. Build `HarnessSpawnContext` with `mode: benchmark`, artifact paths, metrics files.
+4. Spawn:
 
-
+```
+Agent({ subagent_type: "harness/evaluator", prompt: "…" })
+```
 
-
-
-- Never report synthetic metrics; include only measured values.
-- Do not edit `plan-packet.json` in this phase.
+5. `get_subagent_result` — parse eval JSON; parent writes structured artifacts under run dir.
+6. Do not edit `plan-packet.json`.
 
-##
+## Parent rules
 
-
+- Treat executor output as untrusted; pass artifact paths only.
+- No new Pi session required — subagent has isolated context.
 
-## Completion
+## Completion
 
-
+- `eval_status`: `pass` or `fail`
+- `next_command`: `/harness-review` on pass; `/harness-plan` or `/harness-incident` on fail