ultimate-pi 0.14.0 → 0.16.0

Files changed (92)
  1. package/.agents/skills/harness-debate-plan/SKILL.md +41 -61
  2. package/.agents/skills/harness-governor/SKILL.md +11 -0
  3. package/.agents/skills/harness-orchestration/SKILL.md +5 -3
  4. package/.agents/skills/harness-plan/SKILL.md +11 -9
  5. package/.pi/agents/harness/adversary.md +1 -1
  6. package/.pi/agents/harness/evaluator.md +1 -1
  7. package/.pi/agents/harness/executor.md +1 -1
  8. package/.pi/agents/harness/incident-recorder.md +1 -1
  9. package/.pi/agents/harness/meta-optimizer.md +1 -1
  10. package/.pi/agents/harness/planning/decompose.md +8 -35
  11. package/.pi/agents/harness/planning/execution-plan-author.md +27 -15
  12. package/.pi/agents/harness/planning/hypothesis-validator.md +23 -6
  13. package/.pi/agents/harness/planning/hypothesis.md +4 -27
  14. package/.pi/agents/harness/planning/implementation-researcher.md +43 -0
  15. package/.pi/agents/harness/planning/plan-adversary.md +20 -5
  16. package/.pi/agents/harness/planning/plan-evaluator.md +28 -6
  17. package/.pi/agents/harness/planning/review-integrator.md +23 -10
  18. package/.pi/agents/harness/planning/scout-graphify.md +4 -23
  19. package/.pi/agents/harness/planning/scout-semantic.md +3 -18
  20. package/.pi/agents/harness/planning/scout-structure.md +3 -18
  21. package/.pi/agents/harness/planning/sprint-contract-auditor.md +22 -6
  22. package/.pi/agents/harness/planning/stack-researcher.md +21 -11
  23. package/.pi/agents/harness/tie-breaker.md +1 -1
  24. package/.pi/agents/harness/trace-librarian.md +1 -1
  25. package/.pi/extensions/budget-guard.ts +33 -19
  26. package/.pi/extensions/harness-debate-tools.ts +280 -19
  27. package/.pi/extensions/harness-live-widget.ts +39 -159
  28. package/.pi/extensions/harness-plan-approval.ts +47 -5
  29. package/.pi/extensions/harness-run-context.ts +96 -2
  30. package/.pi/extensions/harness-subagent-submit.ts +195 -0
  31. package/.pi/extensions/lib/debate-bus-core.ts +108 -17
  32. package/.pi/extensions/lib/debate-bus-state.ts +6 -0
  33. package/.pi/extensions/lib/harness-subagent-policy.ts +45 -0
  34. package/.pi/extensions/lib/harness-subagent-submit-pipeline.ts +82 -0
  35. package/.pi/extensions/lib/harness-subagent-submit-registry.ts +172 -0
  36. package/.pi/extensions/lib/harness-subagents-bridge.ts +42 -0
  37. package/.pi/extensions/lib/plan-approval/plan-review.ts +56 -0
  38. package/.pi/extensions/lib/plan-approval/types.ts +1 -0
  39. package/.pi/extensions/lib/plan-debate-eligibility.ts +214 -0
  40. package/.pi/extensions/lib/plan-debate-focus.ts +151 -0
  41. package/.pi/extensions/lib/plan-debate-gate.ts +88 -34
  42. package/.pi/extensions/lib/plan-debate-lane.ts +15 -0
  43. package/.pi/extensions/lib/plan-debate-lanes.ts +44 -0
  44. package/.pi/extensions/lib/plan-debate-round-status.ts +63 -20
  45. package/.pi/extensions/lib/plan-messenger.ts +93 -17
  46. package/.pi/extensions/policy-gate.ts +1 -1
  47. package/.pi/harness/README.md +1 -1
  48. package/.pi/harness/agents.manifest.json +25 -21
  49. package/.pi/harness/docs/adrs/0034-darwin-plan-research-pipeline.md +1 -3
  50. package/.pi/harness/docs/adrs/0035-plan-phase-review-gate.md +13 -5
  51. package/.pi/harness/docs/adrs/0036-implementation-research-and-selective-debate.md +51 -0
  52. package/.pi/harness/docs/adrs/0037-subagent-submit-tools.md +31 -0
  53. package/.pi/harness/docs/adrs/0038-budget-telemetry-only.md +23 -0
  54. package/.pi/harness/docs/adrs/README.md +4 -0
  55. package/.pi/harness/evals/smoke/fixtures/plan-phase/minimal-low-light/artifacts/implementation-research.yaml +28 -0
  56. package/.pi/harness/evals/smoke/fixtures/plan-phase/minimal-low-light/artifacts/review-round-r1.yaml +24 -0
  57. package/.pi/harness/evals/smoke/fixtures/plan-phase/minimal-low-light/artifacts/review-round-r2.yaml +25 -0
  58. package/.pi/harness/evals/smoke/fixtures/plan-phase/minimal-low-light/plan-packet.yaml +196 -0
  59. package/.pi/harness/evals/smoke/fixtures/plan-phase/minimal-low-light/plan-review.md +14 -0
  60. package/.pi/harness/evals/smoke/fixtures/plan-phase/minimal-low-light/research-brief.yaml +62 -0
  61. package/.pi/harness/evals/smoke/fixtures/plan-phase/minimal-med/artifacts/implementation-research.yaml +28 -0
  62. package/.pi/harness/evals/smoke/fixtures/plan-phase/minimal-med/artifacts/review-round-r2.yaml +24 -0
  63. package/.pi/harness/evals/smoke/fixtures/plan-phase/minimal-med/artifacts/review-round-r3.yaml +24 -0
  64. package/.pi/harness/evals/smoke/fixtures/plan-phase/minimal-med/research-brief.yaml +29 -0
  65. package/.pi/harness/evals/smoke/smoke-harness-plan.mjs +97 -16
  66. package/.pi/harness/specs/harness-executor-handoff.schema.json +19 -0
  67. package/.pi/harness/specs/harness-human-required.schema.json +16 -0
  68. package/.pi/harness/specs/plan-implementation-research-brief.schema.json +128 -0
  69. package/.pi/harness/specs/plan-review-round-draft.schema.json +1 -1
  70. package/.pi/harness/specs/plan-scout-findings.schema.json +19 -0
  71. package/.pi/harness/specs/round-result.schema.json +15 -2
  72. package/.pi/lib/harness-agent-output.ts +45 -0
  73. package/.pi/lib/harness-budget-enforce.ts +18 -0
  74. package/.pi/lib/harness-schema-validate.ts +89 -0
  75. package/.pi/lib/harness-spawn-parse.ts +86 -0
  76. package/.pi/lib/harness-subagent-submit-path.ts +41 -0
  77. package/.pi/lib/harness-ui-state.ts +107 -2
  78. package/.pi/prompts/harness-auto.md +2 -2
  79. package/.pi/prompts/harness-plan.md +94 -42
  80. package/.pi/prompts/harness-run.md +2 -2
  81. package/.pi/prompts/planning-rubrics.md +31 -0
  82. package/.pi/scripts/harness-verify.mjs +2 -0
  83. package/.pi/scripts/harness_web/__pycache__/__init__.cpython-314.pyc +0 -0
  84. package/.pi/scripts/harness_web/__pycache__/config.cpython-314.pyc +0 -0
  85. package/.pi/scripts/harness_web/__pycache__/output.cpython-314.pyc +0 -0
  86. package/.pi/scripts/harness_web/__pycache__/scrape.cpython-314.pyc +0 -0
  87. package/.pi/scripts/harness_web/__pycache__/search.cpython-314.pyc +0 -0
  88. package/.pi/scripts/harness_web/__pycache__/search_ddg.cpython-314.pyc +0 -0
  89. package/.pi/scripts/harness_web/__pycache__/search_searxng.cpython-314.pyc +0 -0
  90. package/CHANGELOG.md +21 -0
  91. package/package.json +4 -2
  92. package/vendor/pi-subagents/src/subagents.ts +29 -3
@@ -5,10 +5,18 @@
  import { constants } from "node:fs";
  import { access } from "node:fs/promises";
  import { join } from "node:path";
- import { type DebateLaneKind, laneArtifactPath } from "./plan-debate-lane.js";
+ import { capsForDebate } from "./debate-bus-core.js";
+ import {
+   type PlanDebateFocus,
+   readDebateRoundFocus,
+ } from "./plan-debate-focus.js";
+ import { planDebateIdForRun } from "./plan-debate-id.js";
+ import { laneArtifactPath } from "./plan-debate-lane.js";
+ import { lanesForRound } from "./plan-debate-lanes.js";
  import {
    getMessengerRoundState,
-   messengerRoundDebateReady,
+   loadMessengerState,
+   messengerRoundDialogueReady,
  } from "./plan-messenger.js";

  async function exists(path: string): Promise<boolean> {
@@ -20,39 +28,50 @@ async function exists(path: string): Promise<boolean> {
    }
  }

- function lanesForRound(roundIndex: number): DebateLaneKind[] {
-   const lanes: DebateLaneKind[] = ["validation-turn", "adversary-brief"];
-   if (roundIndex === 1) lanes.unshift("hypothesis-validation");
-   if (roundIndex === 4) lanes.push("sprint-audit");
-   return lanes;
- }
-
  export interface RoundStatusResult {
    round_index: number;
-   /** Lane YAML + messenger thread complete; spawn integrator next. */
+   /** Lane YAML + messenger dialogue complete; spawn integrator next. */
    ready_for_integrator: boolean;
    /** review-round-rN.yaml on disk (call harness_debate_submit_round if bus not updated). */
    review_round_on_disk: boolean;
    missing: string[];
    next_tool?: string;
    messenger: { ok: boolean; errors: string[] };
+   dialogue: { ok: boolean; errors: string[] };
+   unresolved_claim_ids: string[];
+   exchange_count: number;
+   debate_round_focus?: PlanDebateFocus | null;
  }

  export async function getPlanDebateRoundStatus(
    runDir: string,
    roundIndex: number,
+   runId?: string,
+   opts?: { debate_round_focus?: PlanDebateFocus },
  ): Promise<RoundStatusResult> {
+   const focus =
+     opts?.debate_round_focus ??
+     (await readDebateRoundFocus(runDir, roundIndex));
    const missing: string[] = [];
-   for (const lane of lanesForRound(roundIndex)) {
+   for (const lane of lanesForRound(roundIndex, focus)) {
      const rel = laneArtifactPath(lane, roundIndex);
      if (!(await exists(join(runDir, rel)))) {
        missing.push(rel);
      }
    }
+   const messengerState = await loadMessengerState(runDir);
+   const profile = messengerState?.debate_profile;
+   const caps = capsForDebate(
+     runId ? planDebateIdForRun(runId) : `plan-${runId ?? "unknown"}`,
+     profile,
+   );
    const roundState = await getMessengerRoundState(runDir, roundIndex);
-   const messenger = messengerRoundDebateReady(roundState, roundIndex === 4);
-   if (!messenger.ok) {
-     missing.push(...messenger.errors.map((e) => `messenger: ${e}`));
+   const dialogueOpts = {
+     max_exchanges_per_round: caps.max_exchanges_per_round,
+   };
+   const dialogue = messengerRoundDialogueReady(roundState, dialogueOpts);
+   if (!dialogue.ok) {
+     missing.push(...dialogue.errors.map((e) => `messenger: ${e}`));
    }
    const reviewRound = `artifacts/review-round-r${roundIndex}.yaml`;
    const reviewRoundOnDisk = await exists(join(runDir, reviewRound));
@@ -62,14 +81,35 @@ export async function getPlanDebateRoundStatus(
      next_tool = "subagent harness/planning/hypothesis-validator";
    } else if (missing.some((m) => m.includes("validation-turn"))) {
      next_tool = "subagent harness/planning/plan-evaluator";
+   } else if (
+     missing.some((m) => m.includes("adversary-brief")) &&
+     !roundState?.evaluator_posted
+   ) {
+     next_tool = "subagent harness/planning/plan-evaluator";
    } else if (missing.some((m) => m.includes("adversary-brief"))) {
      next_tool =
        "harness_messenger_read_round then subagent harness/planning/plan-adversary";
    } else if (missing.some((m) => m.includes("sprint-audit"))) {
      next_tool = "subagent harness/planning/sprint-contract-auditor";
-   } else if (!messenger.ok) {
+   } else if (
+     roundState &&
+     roundState.evaluator_posted &&
+     !roundState.adversary_posted
+   ) {
+     next_tool =
+       "harness_messenger_read_round then subagent harness/planning/plan-adversary";
+   } else if (
+     roundState &&
+     roundState.unresolved_claim_ids.length > 0 &&
+     roundState.exchange_count < caps.max_exchanges_per_round
+   ) {
+     const spawnEvaluator = roundState.exchange_count % 2 === 1;
+     next_tool = spawnEvaluator
+       ? "harness_debate_advance_thread → harness_messenger_read_round → subagent harness/planning/plan-evaluator (clarification; address unresolved claim_ids)"
+       : "harness_debate_advance_thread → harness_messenger_read_round → subagent harness/planning/plan-adversary (counter or concede)";
+   } else if (!dialogue.ok) {
      next_tool =
-       "harness_debate_apply_lane (evaluator/adversary) or re-spawn lane agent";
+       "harness_debate_advance_thread or harness_debate_apply_lane (evaluator/adversary)";
    } else if (!reviewRoundOnDisk) {
      next_tool =
        "subagent harness/planning/review-integrator then harness_debate_submit_round";
@@ -78,10 +118,9 @@ export async function getPlanDebateRoundStatus(
        "harness_debate_submit_round with integrator draft from review-round file";
    }

+   const laneMissing = missing.filter((m) => !m.startsWith("messenger"));
    const readyForIntegrator =
-     messenger.ok &&
-     missing.filter((m) => !m.startsWith("messenger")).length === 0 &&
-     !reviewRoundOnDisk;
+     dialogue.ok && laneMissing.length === 0 && !reviewRoundOnDisk;

    return {
      round_index: roundIndex,
@@ -89,6 +128,10 @@ export async function getPlanDebateRoundStatus(
      review_round_on_disk: reviewRoundOnDisk,
      missing,
      next_tool,
-     messenger,
+     messenger: dialogue,
+     dialogue,
+     unresolved_claim_ids: roundState?.unresolved_claim_ids ?? [],
+     exchange_count: roundState?.exchange_count ?? 0,
+     debate_round_focus: focus,
    };
  }
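The hunks above drop the local `lanesForRound(roundIndex)` helper in favour of an import from `./plan-debate-lanes.js`, whose body does not appear in this part of the diff. The following is only a sketch of what the shared helper could look like, assuming the rule stated in ADR 0036 (sprint lane when the focus is `quality` or the round index is at least 4) and assuming round 1 still opens with the hypothesis-validation lane as in 0.14.0. The two type aliases are re-declared here for self-containment and are assumptions, not the package's definitions.

```typescript
// Hypothetical re-declarations; the real types live in plan-debate-lane.ts
// and plan-debate-focus.ts, which are not shown here.
type DebateLaneKind =
  | "hypothesis-validation"
  | "validation-turn"
  | "adversary-brief"
  | "sprint-audit";

type PlanDebateFocus = "spec" | "wbs" | "schedule" | "quality";

// Sketch of a focus-aware lane selector (assumed behaviour, see lead-in).
function lanesForRound(
  roundIndex: number,
  focus?: PlanDebateFocus | null,
): DebateLaneKind[] {
  const lanes: DebateLaneKind[] = ["validation-turn", "adversary-brief"];
  // Assumption: round 1 keeps the blind hypothesis validation lane first.
  if (roundIndex === 1) lanes.unshift("hypothesis-validation");
  // ADR 0036 item 5: sprint lane when focus is quality OR roundIndex >= 4.
  if (focus === "quality" || roundIndex >= 4) lanes.push("sprint-audit");
  return lanes;
}
```

Compared with the removed helper, which added `sprint-audit` only when `roundIndex === 4`, a rule of this shape lets a quality-focused round before round 4 (for example in a two-round light debate) still require the sprint audit artifact.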
@@ -17,11 +17,15 @@ import {
  } from "node:fs/promises";
  import { join } from "node:path";
  import type { DebateParticipant } from "../../lib/debate-orchestrator-types.js";
+ import type { DebateProfile } from "./plan-debate-eligibility.js";
+ import type { PlanDebateFocus } from "./plan-debate-focus.js";

  export type MessengerMessageKind =
    | "system"
    | "claim"
    | "rebuttal"
+   | "clarification"
+   | "counter"
    | "integrate"
    | "audit";

@@ -47,6 +51,8 @@ export interface MessengerRoundState {
    integrator_posted: boolean;
    claim_count: number;
    rebuttal_count: number;
+   exchange_count: number;
+   unresolved_claim_ids: string[];
  }

  export interface MessengerState {
@@ -55,6 +61,8 @@ export interface MessengerState {
    debate_id: string;
    opened_at: string;
    rounds: Record<string, MessengerRoundState>;
+   debate_profile?: DebateProfile;
+   required_focuses?: PlanDebateFocus[];
  }

  function messengerRoot(runDir: string): string {
@@ -71,7 +79,12 @@ function roundKey(roundIndex: number): string {

  export async function initPlanMessenger(
    runDir: string,
-   opts: { runId: string; debateId: string },
+   opts: {
+     runId: string;
+     debateId: string;
+     debate_profile?: DebateProfile;
+     required_focuses?: PlanDebateFocus[];
+   },
  ): Promise<string> {
    const root = messengerRoot(runDir);
    await mkdir(join(root, "inbox"), { recursive: true });
@@ -82,6 +95,8 @@ export async function initPlanMessenger(
      debate_id: opts.debateId,
      opened_at: nowIso(),
      rounds: {},
+     debate_profile: opts.debate_profile,
+     required_focuses: opts.required_focuses,
    };
    await writeFile(
      join(root, "state.json"),
@@ -122,9 +137,51 @@ function defaultRoundState(roundIndex: number): MessengerRoundState {
      integrator_posted: false,
      claim_count: 0,
      rebuttal_count: 0,
+     exchange_count: 0,
+     unresolved_claim_ids: [],
    };
  }

+ /** Recompute exchange + unresolved claim ids from a round transcript. */
+ export function syncRoundStateFromTranscript(
+   round: MessengerRoundState,
+   messages: MessengerMessage[],
+ ): MessengerRoundState {
+   const claimed = new Set<string>();
+   const resolved = new Set<string>();
+   let exchange_count = 0;
+
+   for (const m of messages) {
+     if (m.from === "PlanEvaluatorAgent" && m.kind === "claim") {
+       round.evaluator_posted = true;
+       round.claim_count += m.claim_ids.length || 1;
+       for (const id of m.claim_ids) claimed.add(id);
+     }
+     if (m.from === "PlanAdversaryAgent" && m.kind === "rebuttal") {
+       round.adversary_posted = true;
+       round.rebuttal_count += m.in_reply_to.length || 1;
+       exchange_count += 1;
+     }
+     if (m.from === "PlanEvaluatorAgent" && m.kind === "clarification") {
+       exchange_count += 1;
+       for (const id of m.claim_ids) resolved.add(id);
+       for (const id of m.in_reply_to) resolved.add(id);
+     }
+     if (m.from === "PlanAdversaryAgent" && m.kind === "counter") {
+       exchange_count += 1;
+       for (const id of m.claim_ids) resolved.add(id);
+       for (const id of m.in_reply_to) resolved.add(id);
+     }
+     if (m.from === "ReviewIntegratorAgent" && m.kind === "integrate") {
+       round.integrator_posted = true;
+     }
+   }
+
+   round.exchange_count = exchange_count;
+   round.unresolved_claim_ids = [...claimed].filter((id) => !resolved.has(id));
+   return round;
+ }
+
  export async function postMessengerMessage(
    runDir: string,
    msg: Omit<MessengerMessage, "schema_version" | "id" | "ts"> & {
@@ -172,19 +229,10 @@ export async function postMessengerMessage(
      rounds: {},
    };
    const key = roundKey(full.round_index);
+   const messages = await readRoundTranscript(runDir, full.round_index);
+   messages.push(full);
    const round = state.rounds[key] ?? defaultRoundState(full.round_index);
-   if (full.from === "PlanEvaluatorAgent" && full.kind === "claim") {
-     round.evaluator_posted = true;
-     round.claim_count += full.claim_ids.length || 1;
-   }
-   if (full.from === "PlanAdversaryAgent" && full.kind === "rebuttal") {
-     round.adversary_posted = true;
-     round.rebuttal_count += full.in_reply_to.length || 1;
-   }
-   if (full.from === "ReviewIntegratorAgent" && full.kind === "integrate") {
-     round.integrator_posted = true;
-   }
-   state.rounds[key] = round;
+   state.rounds[key] = syncRoundStateFromTranscript(round, messages);
    await saveMessengerState(runDir, state);
    return full;
  }
@@ -233,13 +281,22 @@ export async function getMessengerRoundState(
  ): Promise<MessengerRoundState | null> {
    const state = await loadMessengerState(runDir);
    if (!state) return null;
-   return state.rounds[roundKey(roundIndex)] ?? null;
+   const round = state.rounds[roundKey(roundIndex)];
+   if (!round) return null;
+   const transcript = await readRoundTranscript(runDir, roundIndex);
+   return syncRoundStateFromTranscript({ ...round }, transcript);
  }

- export function messengerRoundDebateReady(
+ export interface MessengerDialogueOptions {
+   max_exchanges_per_round?: number;
+ }
+
+ /** Evaluator + adversary dialogue settled; safe to spawn integrator. */
+ export function messengerRoundDialogueReady(
    round: MessengerRoundState | null,
-   _requireSprintAudit: boolean,
+   opts: MessengerDialogueOptions = {},
  ): { ok: boolean; errors: string[] } {
+   const maxExchanges = opts.max_exchanges_per_round ?? 3;
    const errors: string[] = [];
    if (!round) {
      errors.push("no messenger activity for this round");
257
314
  if (round.rebuttal_count < 1) {
258
315
  errors.push("adversary must rebut at least one claim (in_reply_to)");
259
316
  }
260
- if (!round.integrator_posted) {
317
+ const dialogueSettled =
318
+ round.unresolved_claim_ids.length === 0 ||
319
+ round.exchange_count >= maxExchanges;
320
+ if (!dialogueSettled) {
321
+ errors.push(
322
+ `unresolved claims remain (${round.unresolved_claim_ids.join(", ")}) and exchange_count ${round.exchange_count} < ${maxExchanges}`,
323
+ );
324
+ }
325
+ return { ok: errors.length === 0, errors };
326
+ }
327
+
328
+ /** Full round ready for harness_debate_submit_round (includes integrator). */
329
+ export function messengerRoundDebateReady(
330
+ round: MessengerRoundState | null,
331
+ _requireSprintAudit: boolean,
332
+ opts: MessengerDialogueOptions = {},
333
+ ): { ok: boolean; errors: string[] } {
334
+ const dialogue = messengerRoundDialogueReady(round, opts);
335
+ const errors = [...dialogue.errors];
336
+ if (!round?.integrator_posted) {
261
337
  errors.push(
262
338
  "ReviewIntegratorAgent must post integrate message before bus submit",
263
339
  );
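In the hunks above, `syncRoundStateFromTranscript` is called with the full round transcript on every post and every read, while `claim_count` and `rebuttal_count` are incremented on the state object passed in. A variant that derives all counters from the transcript alone returns the same totals however often it runs. The sketch below illustrates that idea together with the settle rule and the parity rule from `getPlanDebateRoundStatus`; the names `Msg`, `RoundCounters`, `foldTranscript`, and `nextSpeaker` are hypothetical, and `nextSpeaker` deliberately ignores the lane-artifact and `evaluator_posted` / `adversary_posted` checks that the real function performs first.

```typescript
// Hypothetical, simplified message shape (the real MessengerMessage has more fields).
type Kind = "claim" | "rebuttal" | "clarification" | "counter" | "integrate";

interface Msg {
  from: string;
  kind: Kind;
  claim_ids: string[];
  in_reply_to: string[];
}

interface RoundCounters {
  claim_count: number;
  rebuttal_count: number;
  exchange_count: number;
  unresolved_claim_ids: string[];
}

// Pure fold: every counter starts at zero, so repeated calls are idempotent.
function foldTranscript(messages: Msg[]): RoundCounters {
  const claimed = new Set<string>();
  const resolved = new Set<string>();
  let claim_count = 0;
  let rebuttal_count = 0;
  let exchange_count = 0;
  for (const m of messages) {
    if (m.from === "PlanEvaluatorAgent" && m.kind === "claim") {
      claim_count += m.claim_ids.length || 1;
      for (const id of m.claim_ids) claimed.add(id);
    } else if (m.from === "PlanAdversaryAgent" && m.kind === "rebuttal") {
      rebuttal_count += m.in_reply_to.length || 1;
      exchange_count += 1;
    } else if (
      (m.from === "PlanEvaluatorAgent" && m.kind === "clarification") ||
      (m.from === "PlanAdversaryAgent" && m.kind === "counter")
    ) {
      // Clarifications and counters resolve every claim id they mention.
      exchange_count += 1;
      for (const id of m.claim_ids) resolved.add(id);
      for (const id of m.in_reply_to) resolved.add(id);
    }
  }
  const unresolved_claim_ids = Array.from(claimed).filter(
    (id) => !resolved.has(id),
  );
  return { claim_count, rebuttal_count, exchange_count, unresolved_claim_ids };
}

// Settled when nothing is unresolved or the exchange cap is reached.
function dialogueSettled(c: RoundCounters, maxExchanges = 3): boolean {
  return c.unresolved_claim_ids.length === 0 || c.exchange_count >= maxExchanges;
}

// Odd exchange count: evaluator clarifies; even: adversary counters.
function nextSpeaker(
  c: RoundCounters,
  maxExchanges = 3,
): "evaluator" | "adversary" | "integrator" {
  if (dialogueSettled(c, maxExchanges)) return "integrator";
  return c.exchange_count % 2 === 1 ? "evaluator" : "adversary";
}
```

The default cap of 3 mirrors the `opts.max_exchanges_per_round ?? 3` fallback in `messengerRoundDialogueReady`.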
@@ -243,7 +243,7 @@ export default function policyGate(pi: ExtensionAPI) {

    const planPhaseHint =
      state.phase === "plan"
-       ? "\nPlan phase: scouts → decompose → hypothesis → stack-researcher → execution-plan-author → validate-plan-dag → 4-round plan debate → approve_plan → create_plan (YAML plan-packet.yaml). Post-execute: /harness-critic."
+       ? "\nPlan phase: scouts → decompose → hypothesis → implementation-researcher + stack-researcher → execution-plan-author → validate-plan-dag → debate eligibility + Review Gate → approve_plan → create_plan (YAML plan-packet.yaml). Post-execute: /harness-critic."
        : "";

    return {
@@ -30,7 +30,7 @@ under `.pi/extensions/` and auto-loaded through the package `pi.extensions`
  manifest (`package.json`).

  - `harness-run-context.ts` - active run + plan injection; short commands without run/plan args
- - `harness-live-widget.ts` - footer status (phase, plan ready, next command; no run id in UI)
+ - `harness-live-widget.ts` - footer status (current/next phase + plain-language status hint; no run id in UI)
  - `policy-gate.ts` - phase state machine + plan-before-mutate enforcement
  - `budget-guard.ts` - hard-stop token budget checks + budget exhausted artifacts
  - `trace-recorder.ts` - append-only run traces + HarnessRunRecord + compact index
@@ -1,8 +1,8 @@
  {
    "schema_version": "1.0.0",
    "package": "ultimate-pi",
-   "package_version": "0.13.1",
-   "generated_at": "2026-05-18T17:22:10.311Z",
+   "package_version": "0.15.0",
+   "generated_at": "2026-05-19T12:56:13.369Z",
    "agents": {
      "pi-pi/agent-expert": {
        "path": ".pi/agents/pi-pi/agent-expert.md",
@@ -46,23 +46,23 @@
      },
      "harness/adversary": {
        "path": ".pi/agents/harness/adversary.md",
-       "sha256": "dd2ef87139cb175d795f4d7bde78dca1a181d2e42c3c3bd0d48832cf5069aa29"
+       "sha256": "560c7571ab91478bde1271e9ae6c3a112c3e1d28e1a261c5450fd1d00f9f89af"
      },
      "harness/evaluator": {
        "path": ".pi/agents/harness/evaluator.md",
-       "sha256": "2b8039fd79f9177fdafd5319a53a96812719d4f1f68e2de70632030142649cfe"
+       "sha256": "a4667d3efb305ba2fe79118e3d7d2b0de5e0369637af040d1238161d75cd28ac"
      },
      "harness/executor": {
        "path": ".pi/agents/harness/executor.md",
-       "sha256": "b549e9fc802ba23857a1bc6b2ff36f3c169e708fe5ec13857b3bcfe841384f1f"
+       "sha256": "6baffcc3d89954494ce3ae439175686a39928b6a543a0a451da27475094b1712"
      },
      "harness/incident-recorder": {
        "path": ".pi/agents/harness/incident-recorder.md",
-       "sha256": "d7577c911a9e6c9607eb64f76337aab85c4eb9a92e7cd917eb8d989ef3cd1de5"
+       "sha256": "d42fa45de1a2fe3842d075c6f319315266588942e314f1b650caabac39bdc29a"
      },
      "harness/meta-optimizer": {
        "path": ".pi/agents/harness/meta-optimizer.md",
-       "sha256": "a4eed88084c7cfb5ace3edc72b72d7ead4134b3eae0d444b391decfe2640a632"
+       "sha256": "cbaab35367126796b7136389a02ab41b4fd1fe7098cf83be562d7b7493ccc297"
      },
      "harness/sentrux-bootstrap": {
        "path": ".pi/agents/harness/sentrux-bootstrap.md",
@@ -70,59 +70,63 @@
      },
      "harness/tie-breaker": {
        "path": ".pi/agents/harness/tie-breaker.md",
-       "sha256": "68f02b86e95927f06d7f963e1f61f193159bbef1ba4558d90c84d5457d62b3f7"
+       "sha256": "1c54c1c3274291dea1ea8826563a7ad4fe1d9c4302984e907bfcd22cfc4f5eba"
      },
      "harness/trace-librarian": {
        "path": ".pi/agents/harness/trace-librarian.md",
-       "sha256": "03b499a948b8467f1cfe2b4e63190feb7b8b9d96461055638e774253b9b6b2d4"
+       "sha256": "336b3f3f6141cef8750ab18d29bbe454caf26973830a86afe099d9e4ad8b0abe"
      },
      "harness/planning/decompose": {
        "path": ".pi/agents/harness/planning/decompose.md",
-       "sha256": "1b3f85d956d2e203ec87045a731c47f8b40f75b63fce8916fda91cefc39244a8"
+       "sha256": "0919dafa1d1cd008d513c28524c1e7218867586a138982dccf01db5270c42c73"
      },
      "harness/planning/execution-plan-author": {
        "path": ".pi/agents/harness/planning/execution-plan-author.md",
-       "sha256": "a69fb2e8bda9336e71ce9536071f9c8a2f4abd9d9d88930c6a8be29bdc9c5f62"
+       "sha256": "55ece0f1ee14abd17fe7b3e478b548240f637eacbfc2a34758e98d3878dc82fd"
      },
      "harness/planning/hypothesis-validator": {
        "path": ".pi/agents/harness/planning/hypothesis-validator.md",
-       "sha256": "f75312439c441ccee72692d41f44b6e733df08e06c89e930740fc256bed3ba02"
+       "sha256": "36f0baa7796229f21bd02faf5e70402c7bf054289eab557a25bfbe3cb7781de7"
      },
      "harness/planning/hypothesis": {
        "path": ".pi/agents/harness/planning/hypothesis.md",
-       "sha256": "b20c527d15c2243cd5d3a8f16cea6d44bdfd16e01915d42f3b830bf9938e5f8b"
+       "sha256": "e83d5c4faaee8d32af4a5f22c9917b70a173f3e22d7c0f182b361706f2309171"
+     },
+     "harness/planning/implementation-researcher": {
+       "path": ".pi/agents/harness/planning/implementation-researcher.md",
+       "sha256": "653f320b5d51bb331774246687f24a75347b406bba4e6dfd2968d6e5d4cc8bb3"
      },
      "harness/planning/plan-adversary": {
        "path": ".pi/agents/harness/planning/plan-adversary.md",
-       "sha256": "84c7fa63d38c39e32000c90093688a45bc2b96a2c6209037342222eae0c854f9"
+       "sha256": "3241d7ec939dc29e0af64690b99e9f74b209f40b0daa4a2a1f9ff86f99f94a8d"
      },
      "harness/planning/plan-evaluator": {
        "path": ".pi/agents/harness/planning/plan-evaluator.md",
-       "sha256": "580d8c7a31f7a6ecd9e627460459d600650580b5df63d129278beefd3f3e347c"
+       "sha256": "71660ab58bfcfdfae56c873140d4ea5946ae30cd5719c96afeabfd02b1d1f81d"
      },
      "harness/planning/review-integrator": {
        "path": ".pi/agents/harness/planning/review-integrator.md",
-       "sha256": "cd1e5d10f0cb8b7a4197d2e92489023c285e90e250f1badc371470165aeb8cfd"
+       "sha256": "cf3f0dbe81274ec9ef0ff2e0c170e8dc929b20be65492d0ee9a80d985acf6d71"
      },
      "harness/planning/scout-graphify": {
        "path": ".pi/agents/harness/planning/scout-graphify.md",
-       "sha256": "8a5ff68306a5eedf1a62067ac8812eac4ac1fe2016cba63337ef4e90b5136e00"
+       "sha256": "6e2bda8ad38311810c9916d9dab311873bc776e4b8832bb0e574136e45e1255e"
      },
      "harness/planning/scout-semantic": {
        "path": ".pi/agents/harness/planning/scout-semantic.md",
-       "sha256": "36bd424ebd422bda82bd447b22f591f99f32ec897ea43f385586119da5c26caa"
+       "sha256": "416e518d8204a55b26dc53da1f750865c6f09ee2c7f343b41e7c08da3230c089"
      },
      "harness/planning/scout-structure": {
        "path": ".pi/agents/harness/planning/scout-structure.md",
-       "sha256": "e67b7cd75519e5ae36e1bb5f49ca158888c28d365465863aee50a9b2e8e5b7d7"
+       "sha256": "76c42a15cc74cf1de2cf861cb0146c865c205f69cce7b9605d41893b19600029"
      },
      "harness/planning/sprint-contract-auditor": {
        "path": ".pi/agents/harness/planning/sprint-contract-auditor.md",
-       "sha256": "f613a4fa937d76936fa01155d4e7956a81878f300100f99f6a78915b0af6f7c7"
+       "sha256": "12cb5e6b53dcc19ace62e8e4c152d96440717df53a182e76216dd2327410df4d"
      },
      "harness/planning/stack-researcher": {
        "path": ".pi/agents/harness/planning/stack-researcher.md",
-       "sha256": "90e2ff1348f54bebc8c0392407bf1bb4d794c942fd8d6f342d80b191c945b34e"
+       "sha256": "ce546ef3aca19da7f334f07cef8f510b79068bffeb7f276c428f3e6236bbe96b"
      }
    }
  }
@@ -13,9 +13,7 @@
     - `harness/planning/decompose` — DeepMind-style problem decomposition (`PlanDecompositionBrief`)
     - `harness/planning/hypothesis` — DARWIN hypothesis generation (`PlanHypothesisBrief`)
  2. **Parent maps hypothesis → PlanPacket** — `plan-packet.schema.json` unchanged; execution gating stable.
- 3. **Parallel pre-approval reviews:**
-    - `harness/planning/plan-adversary` — execution risk on PlanPacket
-    - `harness/planning/hypothesis-eval` — blind self-eval (task + hypothesis only)
+ 3. **Review Gate (ADR 0035):** outcome-based debate with `hypothesis-validator` on R1 (blind — task + hypothesis only). Retired `hypothesis-eval` as a separate pre-approval agent.
  4. **`approve_plan` optional `research_brief`** — rendered in `plan-review.md`; not written to `plan-packet.json`.
  5. **`--quick`** still skips semantic scout only; never skips decompose/hypothesis.

@@ -2,26 +2,34 @@

  ## Status

- Accepted (2026-05-18)
+ Accepted (2026-05-18); amended 2026-05-19 (outcome-based debate + ping-pong dialogue)

  ## Context

  `/harness-plan` produced thin PlanPackets (scope + bullets). Post-execute adversarial review (`/harness-critic`) ran too late. Graphify corpus (Structured Planning, ADR-020, Generator–Evaluator) defines WBS, validation, and review gate before baseline.

+ Early implementation treated debate as a fixed four-round checklist with single evaluator→adversary exchange per round, which ended debate on round count rather than focus coverage and quality.
+
  ## Decision

  1. **PlanPacket 1.1.0** — required `execution_plan` (phases, work_items, sprint_contract, dag_validation).
  2. **YAML on disk** — `plan-packet.yaml`, `research-brief.yaml`, `run-context.yaml`, `artifacts/*.yaml`. JSON Schema unchanged; instances validated after YAML parse.
  3. **Review Gate agents** — `stack-researcher`, `execution-plan-author`, debate: `hypothesis-validator`, `plan-evaluator`, `plan-adversary`, `sprint-contract-auditor`, `review-integrator`.
- 4. **Debate bus** — `debate_id=plan-<run_id>`, plan budget profile (4 rounds, 12k cap), plan-phase consensus prerequisites.
- 5. **No legacy JSON** plan paths; no pre-debate standalone `hypothesis-eval`.
+ 4. **Debate bus** — `debate_id=plan-<run_id>`, plan budget profile:
+    - `min_focus_rounds=4`, `max_rounds=12`, `max_exchanges_per_round=3`
+    - `round_token_cap=8000`, `debate_global_cap=80000`
+ 5. **Outcome-based completion** — consensus `adversarial_debate_completed` when all focuses `spec|wbs|schedule|quality` are covered in submitted review rounds, last `review_gate_ready: true`, and parent DAG validation passes (not `round_count >= 4` alone).
+ 6. **Within-round dialogue** — pi-messenger kinds: `claim`, `rebuttal`, `clarification`, `counter`; parent orchestrates ping-pong via `harness_debate_round_status` / `harness_debate_advance_thread` before integrator.
+ 7. **Sequential debate spawns** — parent must not parallelize debate lane subagents in one batch.
+ 8. **No legacy JSON** plan paths; no pre-debate standalone `hypothesis-eval`.

  ## Consequences

- - Positive: PM-grade plans, deterministic DAG gate, blind hypothesis eval in debate R1.
- - Negative: Higher spawn/token cost; `harness-verify` and smoke fixtures must use `.yaml`.
+ - Positive: PM-grade plans, deterministic DAG gate, blind hypothesis eval in debate R1, richer evaluator↔adversary threads, extendable round index for partial re-debate.
+ - Negative: Higher token cost (80k debate cap vs 12k); parent orchestration more stateful; smoke fixtures must include four `debate_round_focus` values.

  ## References

  - [ADR-0033](0033-parent-orchestrated-planning.md), [ADR-0034](0034-darwin-plan-research-pipeline.md)
  - `raw/decisions/adr-020.md`, `raw/modules/structured-planning.md`
+ - `.pi/prompts/planning-rubrics.md`, `.pi/prompts/harness-plan.md` Phase 5
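Decision item 5 of the amended ADR 0035 states three conditions for `adversarial_debate_completed`: every required focus is covered by a submitted review round, the last round reports `review_gate_ready: true`, and the parent DAG validation passes. A minimal sketch of that predicate follows. The `SubmittedRound` shape and the function name are hypothetical; only the field names `debate_round_focus` and `review_gate_ready` and the four focus values come from the text above, and the actual gate in `plan-debate-gate.ts` is not shown here.

```typescript
type PlanDebateFocus = "spec" | "wbs" | "schedule" | "quality";

// Hypothetical summary of one submitted review round.
interface SubmittedRound {
  debate_round_focus: PlanDebateFocus;
  review_gate_ready: boolean;
}

// Outcome-based completion: focus coverage, final readiness, and DAG validity.
// Round count alone is never sufficient.
function adversarialDebateCompleted(
  rounds: SubmittedRound[],
  requiredFocuses: PlanDebateFocus[],
  dagValidationPassed: boolean,
): boolean {
  const covered = new Set<PlanDebateFocus>(
    rounds.map((r) => r.debate_round_focus),
  );
  const allCovered = requiredFocuses.every((f) => covered.has(f));
  const last = rounds[rounds.length - 1];
  return allCovered && last?.review_gate_ready === true && dagValidationPassed;
}
```

Under this rule, four rounds that all debate `spec` would not complete the debate, whereas the earlier `round_count >= 4` check would have accepted them.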
@@ -0,0 +1,51 @@
1
+ # ADR 0036: Implementation research and selective debate
2
+
3
+ - **Status:** Accepted
4
+ - **Date:** 2026-05-19
5
+
6
+ ## Context
7
+
8
+ ADR 0034–0035 established Darwin research and outcome-based Review Gate debate. Gaps remained:
9
+
10
+ - No dedicated pass for external solution patterns vs in-repo stack selection.
11
+ - Debate always required all four focuses with `min_focus_rounds=4`, even for low-risk tasks.
12
+ - The sprint-contract-auditor spawn logic in code did not match its prompt, which targets the quality focus.
13
+
14
+ ## Decision
15
+
16
+ 1. **Phase 3.5** — After decompose/hypothesis, parent spawns in parallel:
17
+ - `harness/planning/implementation-researcher` → `PlanImplementationResearchBrief` → `artifacts/implementation-research.yaml`
18
+ - `harness/planning/stack-researcher` → `PlanStackBrief` → `artifacts/stack.yaml`
19
+ 2. Research stays **outside** debate; debate agents cite the research artifacts and have no web tools.
20
+ 3. **Phase 4d** — `harness_plan_debate_eligibility` (pre-debate only) selects `full | standard | light` and `required_focuses`; persisted on messenger + bus at `harness_debate_open`.
21
+ 4. **Light profile** — `spec` + `quality` only, `min_focus_rounds=2`, reduced global cap; gate uses stored `required_focuses` (not hardcoded four).
22
+ 5. **Sprint auditor** — shared `lanesForRound(roundIndex, focus)` spawns sprint lane when `focus === quality` OR `roundIndex >= 4`.
23
+ 6. **`--quick`** still skips semantic scout only; never skips Phase 3.5 or debate.
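The spawn rule in item 5 is small enough to sketch directly. The signature `lanesForRound(roundIndex, focus)` and the condition come from the ADR. The lane names, the two base lanes, and the 1-based round index are assumptions made for illustration.

```typescript
type Focus = "spec" | "wbs" | "schedule" | "quality";

// Lane identifiers are assumed; the ADR names the agents but not these literals.
type Lane = "plan-evaluator" | "plan-adversary" | "sprint-contract-auditor";

// Single shared source of truth for which lanes a round spawns.
// The sprint lane joins on a quality round, or on any round at index 4 or later.
function lanesForRound(roundIndex: number, focus: Focus): Lane[] {
  const lanes: Lane[] = ["plan-evaluator", "plan-adversary"]; // assumed base lanes
  if (focus === "quality" || roundIndex >= 4) {
    lanes.push("sprint-contract-auditor");
  }
  return lanes;
}
```

Keeping the rule in one shared function is what lets the spawn code and the prompt stay aligned, which was the mismatch noted in the Context section.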
24
+
25
+ ## Profiles
26
+
27
+ | Profile | When | Focuses | min_focus_rounds |
28
+ |---------|------|---------|-------------------|
29
+ | full | high risk, material fork, open implementation questions, DAG manual patch, many tensions | all four | 4 |
30
+ | standard | default (ambiguous → standard) | all four | 4 |
31
+ | light | low risk, no fork, high-confidence implementation + clear stack primary | spec, quality | 2 |
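The profile table can be read as a deterministic selection function. The sketch below is an assumption-heavy illustration: the `Signals` fields are hypothetical names for the conditions in the "When" column, any single "full" condition is assumed to be sufficient, and "many tensions" is reduced to a boolean because the ADR gives no threshold. Only the profile names, focuses, and `min_focus_rounds` values come from the table.

```typescript
type Focus = "spec" | "wbs" | "schedule" | "quality";
type Profile = "full" | "standard" | "light";

// Hypothetical input signals mirroring the "When" column.
interface Signals {
  risk: "low" | "medium" | "high";
  materialFork: boolean;
  openImplementationQuestions: boolean;
  dagManualPatch: boolean;
  manyTensions: boolean;
  highConfidenceImplementation: boolean;
  clearStackPrimary: boolean;
}

interface Eligibility {
  profile: Profile;
  required_focuses: Focus[];
  min_focus_rounds: number;
}

const ALL_FOCUSES: Focus[] = ["spec", "wbs", "schedule", "quality"];

function selectProfile(s: Signals): Eligibility {
  // Assumption: any one "full" trigger is enough.
  const full =
    s.risk === "high" || s.materialFork || s.openImplementationQuestions ||
    s.dagManualPatch || s.manyTensions;
  if (full) return { profile: "full", required_focuses: ALL_FOCUSES, min_focus_rounds: 4 };

  // Light requires every low-risk condition to hold.
  const light = s.risk === "low" && s.highConfidenceImplementation && s.clearStackPrimary;
  if (light) return { profile: "light", required_focuses: ["spec", "quality"], min_focus_rounds: 2 };

  // Anything ambiguous falls back to standard.
  return { profile: "standard", required_focuses: ALL_FOCUSES, min_focus_rounds: 4 };
}
```

The returned `required_focuses` is what the gate would later compare against, instead of a hardcoded list of four.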
32
+
33
+ ## Consequences
34
+
35
+ ### Positive
36
+
37
+ - Better plans on hard tasks (external patterns before WBS).
38
+ - Cheaper low-risk plans (light debate).
39
+ - Deterministic eligibility and gate alignment.
40
+
41
+ ### Negative
42
+
43
+ - Extra subagent per plan (implementation-researcher).
44
+ - Parents must run eligibility before `harness_debate_open`.
45
+
46
+ ## References
47
+
48
+ - `.pi/prompts/harness-plan.md`
49
+ - `.pi/harness/specs/plan-implementation-research-brief.schema.json`
50
+ - `.pi/extensions/lib/plan-debate-eligibility.ts`
51
+ - ADR 0034, ADR 0035
@@ -0,0 +1,31 @@
1
+ # ADR 0037: Subagent submit tools (replace JSON prose contracts)
2
+
3
+ **Status:** Accepted
4
+ **Date:** 2026-05-19
5
+
6
+ ## Context
7
+
8
+ Harness plan/execute agents used fenced JSON in `finalOutput`, requiring the parent orchestrator to parse prose and call `write_harness_yaml`. This was fragile (truncated parallel summaries, invalid JSON, double-hop writes).
9
+
10
+ Planning agents set `extensions: false` and subprocess spawn used `--no-extensions`, so harness tools were unavailable in children.
11
+
12
+ ## Decision
13
+
14
+ 1. **Option A — subprocess-only extension bundle:** vendored spawn passes `--no-extensions -e .pi/extensions/harness-subagent-submit.ts` for `harness/*` agents with `extensions: false`.
15
+ 2. **Scoped `submit_*` tools** per agent, validated against `.pi/harness/specs/*.schema.json` (Ajv) and written deterministically under `HARNESS_RUN_DIR`.
16
+ 3. **Parent gates** via `harness_artifact_ready` (file existence) instead of parsing subprocess JSON.
17
+ 4. **Debate lanes:** `tool_result` hook prefers last `submit_*` in `details.results[].messages`; skips `finalOutput` auto-apply when submit present (`HARNESS_SUBMIT_TOOLS` default on).
18
+ 5. **Parent** blocks all `submit_*`; keeps `write_harness_yaml` for merges and debate round submission only.
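The preference rule in item 4 can be sketched as a small selector over a lane's messages. The `LaneMessage` shape and the function name `selectLaneResult` are hypothetical. The ADR only says that the last `submit_*` call wins and that `finalOutput` auto-apply is skipped when a submit is present. The fallback to `finalOutput` when submit tools are disabled is an assumption.

```typescript
// Hypothetical message shape; the real `details.results[].messages` entries may differ.
interface LaneMessage {
  toolName?: string;
  args?: unknown;
}

type LaneResult =
  | { source: "submit"; tool: string; args: unknown }
  | { source: "finalOutput"; text: string };

// Prefer the last `submit_*` tool call; use prose output only when none exists
// or when submit tools are switched off (assumed fallback).
function selectLaneResult(
  messages: LaneMessage[],
  finalOutput: string,
  submitToolsEnabled = true, // mirrors "HARNESS_SUBMIT_TOOLS default on"
): LaneResult {
  if (submitToolsEnabled) {
    for (let i = messages.length - 1; i >= 0; i--) {
      const m = messages[i];
      if (m !== undefined && m.toolName !== undefined && m.toolName.startsWith("submit_")) {
        return { source: "submit", tool: m.toolName, args: m.args };
      }
    }
  }
  return { source: "finalOutput", text: finalOutput };
}
```

Scanning from the end means a corrected resubmission within the same lane would replace an earlier one.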
19
+
20
+ ## Consequences
21
+
22
+ - Agent frontmatter lists one terminal `submit_*` tool per role.
23
+ - `HarnessSpawnContext` must include `run_id` / `run_dir`; bridge sets `HARNESS_RUN_ID`, `HARNESS_RUN_DIR`, `HARNESS_AGENT_ID` on spawn.
24
+ - `parseHarnessAgentJson` retained for migration/tests; hot path is tool args.
25
+ - See ADR 0038 for budget telemetry-only default.
26
+
27
+ ## References
28
+
29
+ - `.pi/extensions/harness-subagent-submit.ts`
30
+ - `.pi/extensions/lib/harness-subagent-submit-registry.ts`
31
+ - `.pi/harness/specs/plan-scout-findings.schema.json`
@@ -0,0 +1,23 @@
1
+ # ADR 0038: Budget enforcement telemetry-only (default)
2
+
3
+ **Status:** Accepted
4
+ **Date:** 2026-05-19
5
+
6
+ ## Context
7
+
8
+ Token and debate caps emitted `harness-budget-exhausted`, which set `budgetExhausted` in the live widget and blocked flows even when `HARNESS_BUDGET_HARD_STOP` was false. `max_rounds` and messenger exchange limits in `validatePlanDebateGate` also hard-failed approval.
9
+
10
+ ## Decision
11
+
12
+ - **`HARNESS_BUDGET_ENFORCE` default `off`:** phase/debate caps log `harness-budget-soft-limit` and `harness-budget-telemetry` only; `harness-budget-exhausted` is emitted only when enforce is on **and** hard-stop flags are set.
13
+ - **UI:** `budgetExhausted` / blocked substate only when blocking exhaustion events qualify.
14
+ - **Debate:** `capsForDebate` uses sentinel caps when enforce is off; `max_rounds` gate errors become warnings.
15
+ - **CLI:** `--budget` on harness prompts is reserved/no-op until a real budget story ships.
16
+
17
+ Re-enable: `HARNESS_BUDGET_ENFORCE=1` plus `HARNESS_BUDGET_HARD_STOP` / `HARNESS_DEBATE_HARD_STOP` as needed.
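The flag combination described above can be sketched as a pure function over an environment map. The event names and environment variable names come from the ADR. The function name, the `"phase" | "debate"` split between the two hard-stop flags, and the choice to treat only the literal value `"1"` as "on" are assumptions.

```typescript
type BudgetEvent =
  | "harness-budget-exhausted"
  | "harness-budget-soft-limit"
  | "harness-budget-telemetry";

type Env = Record<string, string | undefined>;

// Assumption: only the literal "1" counts as enabled.
const isOn = (v: string | undefined): boolean => v === "1";

// Decide which events a cap hit should emit.
// Blocking exhaustion requires enforce AND the matching hard-stop flag.
function eventsForCapHit(env: Env, kind: "phase" | "debate"): BudgetEvent[] {
  const enforce = isOn(env["HARNESS_BUDGET_ENFORCE"]);
  const hardStop =
    kind === "debate"
      ? isOn(env["HARNESS_DEBATE_HARD_STOP"]) // assumed mapping to debate caps
      : isOn(env["HARNESS_BUDGET_HARD_STOP"]); // assumed mapping to phase caps
  if (enforce && hardStop) return ["harness-budget-exhausted"];
  return ["harness-budget-soft-limit", "harness-budget-telemetry"];
}
```

With an empty environment the function returns only the soft-limit and telemetry events, matching the telemetry-only default.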
18
+
19
+ ## Consequences
20
+
21
+ - Long debates and large plans are not blocked by soft token telemetry.
22
+ - Quality gates (`min_focus_rounds`, required focuses, `review_gate_ready`) remain enforced.
23
+ - PostHog dashboards should prefer `harness_budget_telemetry` over the exhausted event until enforcement returns.
@@ -20,6 +20,10 @@ Team-shared ADRs for the ultimate-pi harness live under `.pi/harness/docs/adrs/`
20
20
  | [0032](0032-harness-command-orchestration.md) | Harness commands as agent orchestrators | Accepted |
21
21
  | [0033](0033-parent-orchestrated-planning.md) | Parent-orchestrated harness planning | Accepted |
22
22
  | [0034](0034-darwin-plan-research-pipeline.md) | Darwin plan research pipeline | Accepted |
23
+ | [0035](0035-plan-phase-review-gate.md) | Plan-phase Review Gate | Accepted |
24
+ | [0036](0036-implementation-research-and-selective-debate.md) | Implementation research and selective debate | Accepted |
25
+ | [0037](0037-subagent-submit-tools.md) | Subagent submit tools (subprocess extension) | Accepted |
26
+ | [0038](0038-budget-telemetry-only.md) | Budget caps telemetry-only by default | Accepted |
23
27
 
24
28
  ## Template
25
29