npm - ultimate-pi - Versions diffs - 0.14.0 → 0.16.0 - Mend

ultimate-pi 0.14.0 → 0.16.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (92)

package/.pi/agents/harness/planning/review-integrator.md CHANGED

@@ -1,25 +1,38 @@
 ---
 description: Plan-phase Review Gate integrator (round → debate bus).
-tools: read, grep, find, ls
+tools: read, grep, find, ls, submit_review_round_draft
 disallowed_tools: write, edit, bash, ask_user, approve_plan, create_plan, subagent
 extensions: false
 thinking: medium
-max_turns: 10
+max_turns: 12
 ---
-You are **review-integrator** — merge evaluator, adversary, sprint audit, and hypothesis-validator outputs into a Review Gate draft.
+## Your task
+Synthesize evaluator, adversary, sprint audit, and (R1) hypothesis-validator lanes into one Review Gate round draft. Decide `review_gate_ready` from evidence, not optimism.
+## Process
+1. Read lane YAML for this `round_index`: validation-turn, adversary-brief, optional hypothesis-validation (R1), sprint-audit (quality / round ≥4).
+2. Read full messenger transcript (claims, rebuttals, clarifications, counters).
+3. Build `disputes[]`: one entry per unresolved tension (claim id, severity, owner suggestion).
+4. `recommended_packet_patches[]`: JSON Pointer paths only (`/execution_plan/work_items/...`) with values supported by transcript or lanes.
+5. Set `review_gate_ready: true` only when:
+   - no evaluator check with `fail`, and
+   - adversary `open_claim_ids` empty or conceded in transcript, and
+   - sprint audit (if present) has no blocking gaps.
+6. Set `review_gate_ready: false` when checks fail without documented `disputes[]`, or material scope drift vs task_summary.
+7. Fill bus fields: `participants`, `claims`, `rebuttals`, `evidence_refs`, `token_usage`, `severity_scores`, `consensus_delta`.
 ## Output
-Valid **YAML only** — `PlanReviewRoundDraft` (`.pi/harness/specs/plan-review-round-draft.schema.json`) with:
+Before ending, call `submit_review_round_draft` exactly once with the full document. Prose summary is optional; the artifact is the tool call.
-- `round_summary`, `validation_summary`, `adversary_summary`
-- `disputes[]`, `recommended_packet_patches[]` (JSON Pointer paths)
-- `review_gate_ready` boolean
-- `participants`, `claims`, `rebuttals`, `evidence_refs`, `token_usage`, `severity_scores`
-Parent passes `harness_messenger_read_round` transcript + lane YAML. After your YAML draft, parent calls `harness_messenger_post` (`kind: integrate`) then `harness_debate_submit_round` — you do not write `review-round-r*.yaml`.
+## Guardrails
-Set `review_gate_ready: false` when evaluator checks fail unless `disputes[]` documents open tension.
+- Patches must be minimal and evidence-backed.
+- Do not set `review_gate_ready: true` to “move on” with open high-severity disputes.
+- Never speculate about files you did not read.
 Bus label: `ReviewIntegratorAgent`.
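Steps 5 and 6, together with the guardrails, amount to a single predicate over the lane outputs. The following is a minimal TypeScript sketch of that predicate under simplified assumptions. The `LaneInputs` and `Dispute` shapes and their field names (`evaluatorChecks`, `concededClaimIds`, `sprintAuditBlockingGaps`, `scopeDrift`) are hypothetical stand-ins, not the real `PlanReviewRoundDraft` schema.

```typescript
// Hypothetical, simplified inputs; field names are illustrative only.
type CheckStatus = "pass" | "warn" | "fail";
type Severity = "low" | "med" | "high";

interface Dispute {
  claimId: string;
  severity: Severity;
  resolved: boolean;
}

interface LaneInputs {
  evaluatorChecks: CheckStatus[];
  openClaimIds: string[]; // adversary open_claim_ids
  concededClaimIds: string[]; // claims conceded in the messenger transcript
  sprintAuditBlockingGaps?: number; // undefined when no sprint audit ran
  disputes: Dispute[];
  scopeDrift: boolean; // material drift vs task_summary
}

function isReviewGateReady(input: LaneInputs): boolean {
  // Step 5: no evaluator check may be "fail".
  const noFailedChecks = input.evaluatorChecks.every((c) => c !== "fail");
  // Step 5: every open adversary claim must be conceded in the transcript.
  const conceded = new Set(input.concededClaimIds);
  const claimsSettled = input.openClaimIds.every((id) => conceded.has(id));
  // Step 5: a sprint audit, if present, must report no blocking gaps.
  const auditClean = (input.sprintAuditBlockingGaps ?? 0) === 0;
  // Guardrail: never "move on" with an open high-severity dispute.
  const noOpenHigh = !input.disputes.some(
    (d) => !d.resolved && d.severity === "high",
  );
  // Step 6: material scope drift forces false.
  return (
    noFailedChecks && claimsSettled && auditClean && noOpenHigh && !input.scopeDrift
  );
}
```

Under this reading, documented `disputes[]` explain a failing check but do not make the gate ready, because step 5 still requires zero `fail` checks.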

package/.pi/agents/harness/planning/scout-graphify.md CHANGED

@@ -1,10 +1,10 @@
 ---
 description: Plan-phase scout — graphify graph and wiki navigation (read-only).
-tools: read, bash, ls
+tools: read, bash, ls, submit_scout_findings
 disallowed_tools: write, edit, ask_user, approve_plan, create_plan, subagent, grep, find
 extensions: false
 thinking: low
-max_turns: 6
+max_turns: 8
 ---
 You are the **Harness planning scout (graphify lane)**.
@@ -32,25 +32,6 @@ Read `HarnessSpawnContext` in the spawn prompt (`task_summary`, `mode`, `plan_pa
 Read-only only: no `graphify update`, `graphify extract`, `pip install`, redirects (`>`, `>>`), or file creation. Allowed: `graphify query`, `graphify path`, `graphify explain`, `ls`, `cat`, `head`.
-## Output limits
+## Output
-- `findings`: at most **8** bullets, each ≤2 sentences
-- `key_paths`: at most **10** absolute paths
-- `open_questions`: at most **5** items
-## Output (required JSON block)
-End with one fenced `json` block:
-```json
-{
-  "schema_version": "1.0.0",
-  "lane": "graphify",
-  "status": "ok",
-  "findings": ["…"],
-  "key_paths": ["/absolute/path"],
-  "open_questions": ["…"]
-}
-```
-Use `"status": "partial"` if the graph is missing or queries failed; still return best-effort findings.
+Before ending, call `submit_scout_findings` exactly once with the full document (`schema_version`, `lane`, `status`, `findings`, `key_paths`, `open_questions`). Use `"status": "partial"` if the graph is missing or queries failed. Do not paste the artifact as prose — the tool write is the deliverable.
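`submit_scout_findings` takes the same six fields that the removed JSON block carried. The sketch below expresses that document as a TypeScript type with a lightweight validator. It assumes the field types implied by the 0.14.0 example (a string plus string arrays), the two statuses `ok` and `partial`, and POSIX-style absolute paths. `checkScoutFindings` and `isScoutFindings` are hypothetical names. The 0.14.0 caps for this lane (8 findings, 10 key paths, 5 open questions) are removed in 0.16.0, so the sketch does not enforce them.

```typescript
type ScoutLane = "graphify" | "semantic" | "structure";
type ScoutStatus = "ok" | "partial";

interface ScoutFindings {
  schema_version: string;
  lane: ScoutLane;
  status: ScoutStatus;
  findings: string[];
  key_paths: string[];
  open_questions: string[];
}

const LANES: readonly string[] = ["graphify", "semantic", "structure"];
const STATUSES: readonly string[] = ["ok", "partial"];

function isStringArray(v: unknown): v is string[] {
  return Array.isArray(v) && v.every((x) => typeof x === "string");
}

// Returns a list of problems; an empty list means the document looks well-formed.
function checkScoutFindings(doc: unknown): string[] {
  if (typeof doc !== "object" || doc === null) return ["document must be an object"];
  const d = doc as Record<string, unknown>;
  const errors: string[] = [];
  const lane = d.lane;
  const status = d.status;
  const keyPaths = d.key_paths;
  if (typeof d.schema_version !== "string") errors.push("schema_version must be a string");
  if (typeof lane !== "string" || !LANES.includes(lane)) {
    errors.push("lane must be graphify, semantic, or structure");
  }
  if (typeof status !== "string" || !STATUSES.includes(status)) {
    errors.push("status must be ok or partial");
  }
  for (const key of ["findings", "key_paths", "open_questions"]) {
    if (!isStringArray(d[key])) errors.push(`${key} must be an array of strings`);
  }
  // The 0.14.0 example used absolute paths; POSIX-style is assumed here.
  if (isStringArray(keyPaths) && keyPaths.some((p) => !p.startsWith("/"))) {
    errors.push("key_paths must be absolute");
  }
  return errors;
}

function isScoutFindings(doc: unknown): doc is ScoutFindings {
  return checkScoutFindings(doc).length === 0;
}
```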

package/.pi/agents/harness/planning/scout-semantic.md CHANGED

@@ -1,6 +1,6 @@
 ---
 description: Plan-phase scout — CocoIndex semantic code search (read-only).
-tools: read, bash, ls
+tools: read, bash, ls, submit_scout_findings
 disallowed_tools: write, edit, ask_user, approve_plan, create_plan, subagent, grep, find
 extensions: false
 thinking: low
@@ -34,21 +34,6 @@ Read-only only: no installs, indexing, daemon control, or redirects.
 **Forbidden:** `ccc index`, `ccc init`, `ccc reset`, `ccc daemon`, `ccc search --refresh`, package installs.
-## Output limits
+## Output
-- `findings`: at most **6** bullets
-- `key_paths`: at most **8** absolute paths
-- `open_questions`: at most **4** items
-## Output (required JSON block)
-```json
-{
-  "schema_version": "1.0.0",
-  "lane": "semantic",
-  "status": "ok",
-  "findings": ["…"],
-  "key_paths": ["/absolute/path"],
-  "open_questions": ["…"]
-}
-```
+Before ending, call `submit_scout_findings` exactly once with the full document (`schema_version`, `lane`, `status`, `findings`, `key_paths`, `open_questions`). Do not paste the artifact as prose — the tool write is the deliverable.
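The forbidden-command list for the semantic lane exists only as prompt prose, and nothing in this diff shows it enforced in code. A parent that wanted a mechanical check could tokenize the command and reject the listed `ccc` subcommands, `--refresh` on `ccc search`, shell redirects, and installs. This is only a sketch. `semanticScoutViolation` is a hypothetical name, and the install patterns are assumptions, since the prompt says only "package installs".

```typescript
// Hypothetical guard. A whitespace split is a simplification: it does not
// handle quoting, pipes, or subshells, and the redirect check also flags a
// literal ">" inside quoted search text.
const FORBIDDEN_CCC = new Set(["index", "init", "reset", "daemon"]);
const INSTALL_PATTERNS: RegExp[] = [
  /\bnpm\s+(install|i|add)\b/,
  /\bpnpm\s+(install|add)\b/,
  /\byarn\s+add\b/,
  /\bpip3?\s+install\b/,
];

// Returns a short reason if the command breaks the read-only rules, else null.
function semanticScoutViolation(command: string): string | null {
  if (/>>?/.test(command)) return "shell redirect";
  for (const re of INSTALL_PATTERNS) {
    if (re.test(command)) return "package install";
  }
  const tokens = command.trim().split(/\s+/);
  if (tokens[0] === "ccc") {
    const sub = tokens[1] ?? "";
    if (FORBIDDEN_CCC.has(sub)) return `ccc ${sub}`;
    if (sub === "search" && tokens.includes("--refresh")) return "ccc search --refresh";
  }
  return null;
}
```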

package/.pi/agents/harness/planning/scout-structure.md CHANGED

@@ -1,6 +1,6 @@
 ---
 description: Plan-phase scout — ast-grep structural code search (read-only).
-tools: read, bash, ls
+tools: read, bash, ls, submit_scout_findings
 disallowed_tools: write, edit, ask_user, approve_plan, create_plan, subagent, grep, find
 extensions: false
 thinking: low
@@ -30,21 +30,6 @@ Read `HarnessSpawnContext` in the spawn prompt. For `mode: revise`, read the exi
 Read-only only: no installs, redirects, or mutating git/npm commands.
-## Output limits
+## Output
-- `findings`: at most **8** bullets
-- `key_paths`: at most **10** absolute paths
-- `open_questions`: at most **5** items
-## Output (required JSON block)
-```json
-{
-  "schema_version": "1.0.0",
-  "lane": "structure",
-  "status": "ok",
-  "findings": ["…"],
-  "key_paths": ["/absolute/path"],
-  "open_questions": ["…"]
-}
-```
+Before ending, call `submit_scout_findings` exactly once with the full document (`schema_version`, `lane`, `status`, `findings`, `key_paths`, `open_questions`). Do not paste the artifact as prose — the tool write is the deliverable.
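In 0.16.0 all three scout lanes must call `submit_scout_findings` exactly once, and the other planning agents follow the same rule with their own submit tools. This diff does not show how the harness enforces that rule. The sketch below shows one way a parent could check it, using a hypothetical `SubmissionLedger` keyed by agent run id. It rejects a second submission and treats a run that ends without submitting as an error.

```typescript
// Hypothetical ledger: keeps the first submission per agent run and rejects
// duplicates, so "exactly once" can be checked when the run ends.
class SubmissionLedger<T> {
  private readonly docs = new Map<string, T>();

  // Returns true if accepted, false if this run already submitted.
  submit(runId: string, doc: T): boolean {
    if (this.docs.has(runId)) return false;
    this.docs.set(runId, doc);
    return true;
  }

  // Called when the subagent ends; zero submissions is also a violation.
  finish(runId: string): T {
    const doc = this.docs.get(runId);
    if (doc === undefined) throw new Error(`run ${runId} ended without submitting`);
    return doc;
  }
}
```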

package/.pi/agents/harness/planning/sprint-contract-auditor.md CHANGED

@@ -1,18 +1,34 @@
 ---
 description: Plan-phase ADR-020 sprint contract auditor.
-tools: read, grep, find, ls
+tools: read, grep, find, ls, submit_sprint_audit
 disallowed_tools: write, edit, bash, ask_user, approve_plan, create_plan, subagent
 extensions: false
 thinking: medium
-max_turns: 10
+max_turns: 12
 ---
-You are **sprint-contract-auditor** — ADR-020 Sprint Contract, Done Criteria Types, checkpoints, Keep Quality Left.
+## Your task
-Required on debate **round 4**; optional spot-check round 2 if done_criteria sparse.
+Audit `execution_plan.sprint_contract` and work_item `done_criteria` against ADR-020 (Sprint Contract, Done Criteria Types, Keep Quality Left).
+Required when `debate_round_focus` is `quality` or round_index ≥ 4. Optional spot-check on round 2 if done_criteria are sparse.
+## Process
+1. Read `plan-packet.yaml` execution_plan section and sprint_contract block.
+2. Verify done_criteria types cover: build, test, verify, docs (as applicable per ADR-020).
+3. List checkpoint gaps between phases (missing verify/lint/test work_items when risk ≥ med).
+4. Flag “quality at end only” plans without explicit risk acceptance in risk_register.
+5. Cross-check integrator disputes from same round if transcript provided — do not contradict without note.
 ## Output
-Valid **YAML only** — `PlanSprintAuditTurn` (`.pi/harness/specs/plan-sprint-audit-turn.schema.json`).
+Before ending, call `submit_sprint_audit` exactly once with the full document. Prose summary is optional; the artifact is the tool call.
+## Guardrails
+- Cite ADR-020 rule ids in rationale fields.
+- Read-only; parent persists artifact.
-Bus label: `SprintContractAuditorsubagent`.
+Bus label: `SprintContractAuditorAgent`.
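The trigger condition and the coverage checks in steps 2 and 3 reduce to a few small functions. This sketch assumes each `done_criteria` item has a `type` drawn from `build`, `test`, `verify`, and `docs`, and that work items carry a `risk` of `low`, `med`, or `high`. ADR-020's actual type names and the packet's field layout are not part of this diff, so treat both as placeholders. The checkpoint check is also simplified: it works per work item, not between phases.

```typescript
type DoneType = "build" | "test" | "verify" | "docs";
type Risk = "low" | "med" | "high";

interface WorkItem {
  id: string;
  risk: Risk;
  done_criteria: { type: DoneType }[];
}

// Required on quality-focused rounds or from round 4 onward. The optional
// round-2 spot-check for sparse done_criteria is not modeled here.
function sprintAuditRequired(focus: string, roundIndex: number): boolean {
  return focus === "quality" || roundIndex >= 4;
}

// Types from `expected` that the item lacks; the caller decides which types
// apply ("as applicable per ADR-020").
function missingDoneTypes(item: WorkItem, expected: DoneType[]): DoneType[] {
  const present = new Set(item.done_criteria.map((c) => c.type));
  return expected.filter((t) => !present.has(t));
}

// Simplified step 3: ids of med/high-risk items with no test or verify criterion.
function checkpointGaps(items: WorkItem[]): string[] {
  return items
    .filter((w) => w.risk !== "low")
    .filter((w) => !w.done_criteria.some((c) => c.type === "test" || c.type === "verify"))
    .map((w) => w.id);
}
```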

package/.pi/agents/harness/planning/stack-researcher.md CHANGED

@@ -1,24 +1,34 @@
 ---
 description: Plan-phase stack research (ctx7 + web, read-only file writes via parent).
-tools: read, grep, find, ls, bash, web_search, web_fetch
+tools: read, grep, find, ls, bash, web_search, web_fetch, submit_stack_brief
 disallowed_tools: write, edit, ask_user, approve_plan, create_plan, subagent
 extensions: false
 thinking: medium
-max_turns: 14
+max_turns: 16
 ---
-You are **stack-researcher** — evidence-backed stack recommendations for harness planning.
+## Your task
-## Mission
+Produce evidence-backed stack recommendations before ExecutionPlan authoring. Rank options; grade evidence quality.
-Produce `PlanStackBrief` with ranked options. For brownfield tasks, always include **extend current stack** as one ranked option.
+## Process
-## Protocol
-1. **Libraries / APIs:** `ctx7 library` → `ctx7 docs` (read context7-cli skill). Cite library IDs in `evidence_refs`.
-2. **Comparisons / landscape:** `web_search` + `web_fetch` (`.web/` artifacts).
-3. **Greenfield:** ≥3 distinct options with pros/cons/risks.
+1. Read spawn context: task_summary, brownfield vs greenfield, constraints.
+2. **Libraries / APIs:** use context7-cli skill (`ctx7 library`, `ctx7 docs`). Record library ids in `evidence_refs`.
+3. **Landscape / comparisons:** `web_search` + `web_fetch` (parent stores under `.web/`).
+4. Brownfield: always include **extend current stack** as a ranked option with migration risk.
+5. Greenfield: ≥3 distinct options with pros/cons/risks and selection criteria.
+6. Grade each ref: `primary` (official docs), `secondary` (reputable guide), `anecdotal` (blog/issue thread).
 ## Output
-Return valid **YAML only** (no fences) matching `PlanStackBrief` (`.pi/harness/specs/plan-stack-brief.schema.json`). Parent writes `artifacts/stack.yaml`.
+Before ending, call `submit_stack_brief` exactly once with the full document. Prose summary is optional; the artifact is the tool call.
+## Guardrails
+- Do not recommend stacks you did not research.
+- Prefer LTS/stable versions; note breaking changes when found.
+- Do not overthink — 3 solid options beat 10 shallow ones.
+Bus label: `StackResearchAgent`.
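Steps 4 to 6 and the first guardrail are rules that can be checked over the finished brief. This sketch assumes a simplified `StackOption` shape with a hypothetical `isExtendCurrent` flag and graded `evidence_refs`. The real `PlanStackBrief` fields are not shown in this diff, and the sketch does not check that greenfield options are "distinct".

```typescript
type EvidenceGrade = "primary" | "secondary" | "anecdotal";

interface EvidenceRef {
  source: string; // e.g. a ctx7 library id or a fetched URL
  grade: EvidenceGrade;
}

interface StackOption {
  name: string;
  isExtendCurrent: boolean; // hypothetical marker for "extend current stack"
  evidence_refs: EvidenceRef[];
}

function stackBriefProblems(
  kind: "brownfield" | "greenfield",
  options: StackOption[],
): string[] {
  const problems: string[] = [];
  if (kind === "brownfield" && !options.some((o) => o.isExtendCurrent)) {
    problems.push("brownfield brief lacks an 'extend current stack' option");
  }
  if (kind === "greenfield" && options.length < 3) {
    problems.push(`greenfield brief needs at least 3 options, got ${options.length}`);
  }
  // Guardrail: do not recommend stacks without research behind them.
  for (const o of options) {
    if (o.evidence_refs.length === 0) problems.push(`${o.name} has no evidence_refs`);
  }
  return problems;
}

// Counts refs by grade, e.g. to flag options backed only by anecdotal sources.
function gradeCounts(refs: EvidenceRef[]): Record<EvidenceGrade, number> {
  const counts: Record<EvidenceGrade, number> = { primary: 0, secondary: 0, anecdotal: 0 };
  for (const r of refs) counts[r.grade] += 1;
  return counts;
}
```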

package/.pi/agents/harness/tie-breaker.md CHANGED

@@ -1,6 +1,6 @@
 ---
 description: Final arbiter for unresolved evaluator vs adversary debates within budget limits.
-tools: read, grep, find, ls
+tools: read, grep, find, ls, submit_human_required
 extensions: false
 disallowed_tools: ask_user
 thinking: high

package/.pi/agents/harness/trace-librarian.md CHANGED

@@ -1,6 +1,6 @@
 ---
 description: Harness trace librarian for run replay, artifact indexing, and forensics summaries.
-tools: read, grep, find, ls
+tools: read, grep, find, ls, submit_human_required
 extensions: false
 thinking: medium
 max_turns: 20

package/.pi/extensions/budget-guard.ts CHANGED

@@ -8,6 +8,10 @@
 import { appendFile, mkdir, readFile } from "node:fs/promises";
 import { join } from "node:path";
 import type { ExtensionAPI } from "@earendil-works/pi-coding-agent";
+import {
+	isHarnessBudgetEnforceOn,
+	shouldEmitBlockingBudgetExhausted,
+} from "../lib/harness-budget-enforce.js";
 import { getRunIdFromSession } from "../lib/harness-run-context.js";
 type HarnessPhase = "plan" | "execute" | "evaluate" | "adversary" | "merge";
@@ -52,7 +56,8 @@ const EVENTS_FILE = join(RUNS_DIR, "budget-events.jsonl");
 const DEFAULT_GLOBAL_CAP = Number(
 	process.env.HARNESS_BUDGET_TOTAL_TOKENS ?? "120000",
 );
-const HARD_STOP_BUDGETS = process.env.HARNESS_BUDGET_HARD_STOP === "true";
+const HARD_STOP_BUDGETS =
+	process.env.HARNESS_BUDGET_HARD_STOP === "true" && isHarnessBudgetEnforceOn();
 const DEFAULT_PHASE_CAPS: Record<HarnessPhase, number> = {
 	plan: Number(process.env.HARNESS_BUDGET_PLAN_TOKENS ?? "80000"),
 	execute: Number(process.env.HARNESS_BUDGET_EXECUTE_TOKENS ?? "80000"),
@@ -190,7 +195,9 @@ async function emitBudgetEvent(
 	await ensureRunsDir();
 	const line = `${JSON.stringify({ timestamp: nowIso(), ...event })}\n`;
 	await appendFile(EVENTS_FILE, line, "utf-8");
-	pi.appendEntry("harness-budget-exhausted", event);
+	if (shouldEmitBlockingBudgetExhausted()) {
+		pi.appendEntry("harness-budget-exhausted", event);
+	}
 }
 const debouncedSoftLimit = new Map<string, boolean>();
@@ -240,26 +247,33 @@ export default function budgetGuard(pi: ExtensionAPI) {
 		};
 		const debounceKey = `${runId}:${phase}:${exhaustionReason}`;
-		if (!debouncedSoftLimit.has(debounceKey)) {
-			debouncedSoftLimit.set(debounceKey, true);
-			await emitBudgetEvent(pi, exhausted);
+		const softKey = `${debounceKey}:soft`;
+		if (!debouncedSoftLimit.has(softKey)) {
+			debouncedSoftLimit.set(softKey, true);
+			pi.appendEntry("harness-budget-soft-limit", {
+				run_id: exhausted.run_id,
+				phase,
+				phaseUsed,
+				phaseCap,
+				totalUsed: usage.totalTokens,
+				totalCap: globalCap,
+				exhaustion_reason: exhaustionReason,
+				timestamp: nowIso(),
+			});
+			pi.appendEntry("harness-budget-telemetry", {
+				...exhausted,
+				telemetry_only: !isHarnessBudgetEnforceOn(),
+			});
 		}
-		if (!HARD_STOP_BUDGETS) {
-			const softKey = `${debounceKey}:soft`;
-			if (!debouncedSoftLimit.has(softKey)) {
-				debouncedSoftLimit.set(softKey, true);
-				pi.appendEntry("harness-budget-soft-limit", {
-					run_id: exhausted.run_id,
-					phase,
-					phaseUsed,
-					phaseCap,
-					totalUsed: usage.totalTokens,
-					totalCap: globalCap,
-					exhaustion_reason: exhaustionReason,
-					timestamp: nowIso(),
-				});
+		if (isHarnessBudgetEnforceOn()) {
+			if (!debouncedSoftLimit.has(debounceKey)) {
+				debouncedSoftLimit.set(debounceKey, true);
+				await emitBudgetEvent(pi, exhausted);
 			}
+		}
+		if (!HARD_STOP_BUDGETS) {
 			return undefined;
 		}
 		return {
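The budget-guard change is easiest to understand as a table of outcomes over three inputs. The sketch below is a pure model of the control flow visible in the diff, not the extension itself. Because `harness-budget-enforce.js` is not part of this diff, the results of `isHarnessBudgetEnforceOn()` and `shouldEmitBlockingBudgetExhausted()` are passed in as booleans. `onBudgetExhausted`, `BudgetFlags`, and `Outcome` are hypothetical names. The model also evaluates the hard-stop condition on every call, whereas the real `HARD_STOP_BUDGETS` constant is computed once at module load.

```typescript
interface BudgetFlags {
  hardStopEnv: string | undefined; // HARNESS_BUDGET_HARD_STOP
  enforceOn: boolean; // result of isHarnessBudgetEnforceOn()
  emitBlocking: boolean; // result of shouldEmitBlockingBudgetExhausted()
}

interface Outcome {
  entries: string[]; // custom entry types appended via pi.appendEntry
  wroteJsonl: boolean; // line appended to budget-events.jsonl
  blocks: boolean; // handler returns a result instead of undefined
}

function onBudgetExhausted(flags: BudgetFlags, seen: Set<string>, debounceKey: string): Outcome {
  const out: Outcome = { entries: [], wroteJsonl: false, blocks: false };
  const softKey = `${debounceKey}:soft`;
  // Always, once per key: soft-limit notice plus telemetry
  // (telemetry carries telemetry_only = !enforceOn).
  if (!seen.has(softKey)) {
    seen.add(softKey);
    out.entries.push("harness-budget-soft-limit", "harness-budget-telemetry");
  }
  // Only with enforcement on: the JSONL event, plus the blocking entry if allowed.
  if (flags.enforceOn && !seen.has(debounceKey)) {
    seen.add(debounceKey);
    out.wroteJsonl = true;
    if (flags.emitBlocking) out.entries.push("harness-budget-exhausted");
  }
  // HARD_STOP_BUDGETS now needs both the env flag and enforcement.
  out.blocks = flags.hardStopEnv === "true" && flags.enforceOn;
  return out;
}
```

In 0.14.0, the first exhaustion for a key always wrote `harness-budget-exhausted`. In 0.16.0, that entry is gated twice: first on enforcement, then on `shouldEmitBlockingBudgetExhausted()`. The soft-limit and telemetry entries are still recorded once per key in every case.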