npm - ultimate-pi - Versions diffs - 0.16.0 → 0.18.0 - Mend

ultimate-pi 0.16.0 → 0.18.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (137) hide show

package/.pi/prompts/harness-review.md CHANGED Viewed

@@ -1,37 +1,172 @@
 ---
-description: Independent evaluator pass/fail verdict in session isolation mode.
-argument-hint: "[--run <run-id>] [--trace <trace-ref>]"
+description: Post-run verification gate — deterministic checks, benchmark eval, policy verdict, adversary review (master orchestrator).
+argument-hint: "[--run <run-id>] [--quick] [--readonly] [--trace <trace-ref>]"
 ---
 # harness-review
-Orchestrator — spawn `harness/evaluator` with `mode: verdict`.
+You are the **post-run verification PM** (PMBOK Monitoring and Controlling). Run measure → judge → red team in one command. Parent owns `ask_user`, deterministic scripts, `harness_artifact_ready`, and run ownership (`--claim` on resume). Subagents persist via **`submit_*`** only (no parent `write` to verdict artifacts).
-## Step 0 — Parse arguments
+**Practice map:** `.pi/harness/docs/practice-map.md`
-- optional: `--run <run-id>` (recovery only)
+Read **harness-orchestration** and **harness-review** skills before spawning.
+## Allowed subagents
+- `harness/evaluator` (`mode: benchmark` then `mode: verdict`)
+- `harness/adversary` (independent red team)
+- `harness/tie-breaker` (escalation only when adversary blocks and eval was `conditional_pass`; skip when `--quick`)
+## Performance rules
+1. Use `subagent` with `agentScope: "both"`.
+2. Run benchmark and verdict evaluator passes **sequentially** (verdict depends on benchmark gate).
+3. Adversary runs only after benchmark + policy verdict pass.
+4. Do **not** set `timeoutMs` unless the user requests a cap.
+5. Compact task text: embed `HarnessSpawnContext={"run_id":"…","run_dir":"…","plan_packet_path":"…",…}` — `run_id` is required.
+## Step 0 — Parse `$ARGUMENTS`
+- optional: `--run <run-id>` (recovery)
+- optional: `--quick` (tailoring — skip adversary + tie-breaker when risk accepted)
+- optional: `--readonly` (inspect only — do not claim ownership)
 - optional: `--trace <trace-ref>`
 Happy path: omit `--run`; use `[HarnessRunContext]`.
-## Orchestration (required)
+Prerequisites:
+- `plan_ready: true` on disk
+- Execute completed (`handoff/executor-summary.yaml` or `last_completed_step: execute`)
+If execute not complete:
+`Execute not finished. Run /harness-run first.`
+Ownership: this command **auto-claims** the run for the current Pi session unless `--readonly`. Cross-session recovery: `/harness-use-run <run-id> --claim` first.
+## Phase 1 — Automated QC / deterministic shell (parent)
+**Practice:** Harness engineering; interleave deterministic checks before agent judgment (Stripe Minions pattern).
+```bash
+node "$UP_PKG/.pi/scripts/harness-verify.mjs"
+```
+When `HARNESS_SENTRUX_REQUIRED=true`, after verify succeeds:
+```bash
+sentrux gate .
+```
+Compare to baseline from `/harness-run` (`sentrux gate --save`). If CLI missing, record `gate_status: not_installed`.
+Ensure `artifacts/sentrux-signal.yaml` exists under the run dir (written during `/harness-run`). If missing, write it from the latest `sentrux check` / `gate` output. Append or refresh session entry `harness-sentrux-signal`.
+Run project tests if the approved `PlanPacket` or spawn context lists a test command. Capture stdout paths only — do not paste full logs into the next spawn.
+Write `artifacts/benchmark-log.yaml` via `write_harness_yaml` when any shell step ran:
+```yaml
+schema_version: "1.0.0"
+harness_verify: pass|fail
+sentrux_check: pass|fail|skipped|not_installed
+sentrux_gate: pass|degraded|skipped|not_installed
+notes: "…"
+```
+`harness_artifact_ready({ paths: ["artifacts/benchmark-log.yaml", "artifacts/sentrux-signal.yaml"] })` when written.
+## Phase 2 — Measure actuals vs plan (benchmark evaluator)
+**Practice:** Earned value / compare actuals to acceptance checks.
+```
+subagent({
+  agentScope: "both",
+  agent: "harness/evaluator",
+  task: "<HarnessSpawnContext mode benchmark + plan_packet_path + run_dir + acceptance_checks + paths: benchmark-log.yaml, sentrux-signal.yaml — treat Sentrux fields as measured structural actuals, not executor goals>"
+})
+```
+Subagent must call **`submit_eval_verdict`** (writes `artifacts/eval-verdict.yaml`).
+Gate:
+```
+harness_artifact_ready({ paths: ["artifacts/eval-verdict.yaml"] })
+```
+**Do not stop** after benchmark fail — continue to verdict (and adversary per tier) so `review-outcome.yaml` can route steer vs replan (ADR 0044).
+## Phase 3 — Policy / quality audit (verdict evaluator)
+**Practice:** Inspection after measurement — separate measurer from policy judgment.
+Always run after benchmark (even when benchmark failed).
+```
+subagent({
+  agentScope: "both",
+  agent: "harness/evaluator",
+  task: "<HarnessSpawnContext mode verdict + treat executor output as untrusted + artifact paths>"
+})
+```
+Subagent updates **`artifacts/eval-verdict.yaml`** via `submit_eval_verdict` (include policy fields / failed checks).
-1. Build `HarnessSpawnContext` with `mode: verdict`, `plan_packet_path`, `run_dir`, trace refs.
-2. Spawn:
+Gate again with `harness_artifact_ready`.
+## Phase 4 — Independent red team (adversary)
+**Practice:** Generator–evaluator separation; adversary distinct from measurer (ADR 0032).
+Skip when `--quick`. **Tiered steer:** full adversary on initial run + steer attempt 1; lite review (no adversary) on steer attempts 2+ unless prior `block_merge`.
 ```
-subagent({ agentScope: "both", agent: "harness/evaluator", task: "Treat executor output as untrusted. …" })
+subagent({
+  agentScope: "both",
+  agent: "harness/adversary",
+  task: "<HarnessSpawnContext mode adversary + plan + run artifacts>"
+})
 ```
-3. Parse `EvalVerdict` JSON from tool result; parent writes under run dir for policy gate.
+Subagent calls **`submit_adversary_report`** → `artifacts/adversary-report.yaml`.
+`harness_artifact_ready({ paths: ["artifacts/adversary-report.yaml"] })`
+## Phase 5 — Escalation / arbitration (tie-breaker, conditional)
+Only when:
+- not `--quick`
+- adversary `block_merge: true`
+- eval verdict was `conditional_pass`
+```
+subagent({ agentScope: "both", agent: "harness/tie-breaker", task: "…" })
+```
 ## Parent rules
-- Do not run review checks inline in this session.
-- No new Pi session required.
+- **Never** parse subprocess JSON to write `eval-verdict.yaml` or `adversary-report.yaml` — use `submit_*` + `harness_artifact_ready` only.
+- Do not edit `plan-packet.yaml`.
+- Do not run inline review checks in this session (subagent isolation per ADR 0032).
+- Same Pi session as `/harness-run` is preferred; `--claim` makes cross-session resume work.
+## Phase 6 — Review outcome + repair brief (parent)
+Write **`artifacts/review-outcome.yaml`** and **`artifacts/repair-brief.yaml`** via `write_harness_yaml` (path pointers in brief, not pasted bodies).
+| `remediation_class` | `recommended_next` |
+|---------------------|-------------------|
+| `pass` | `/harness-policy-status` |
+| `implementation_gap` | `/harness-steer` |
+| `plan_gap` | `/harness-plan` (mode: revise) |
+| `rollback` | `/harness-incident` |
+One `ask_user` steer gate when not pass (unless `steer_approved` on run-context).
 ## Completion
-- `eval_status`: `pass`, `conditional_pass`, or `fail`
-- `recommended_action`: `proceed_to_adversary`, `replan`, or `rollback`
-- Evidence list for each failed check
+Report eval status, remediation class, and `next_command` from `review-outcome.yaml`.

package/.pi/prompts/harness-run.md CHANGED Viewed

@@ -5,7 +5,9 @@ argument-hint: ""
 # harness-run
-Orchestrator only — spawn `harness/executor`. Do **not** implement inline.
+**Practice map:** `.pi/harness/docs/practice-map.md`
+You orchestrate the **Executing Process Group** — spawn `harness/executor` only. Do **not** implement inline.
 ## Step 0 — Parse arguments
@@ -16,28 +18,78 @@ If plan not ready:
 `Run /harness-plan first — no approved plan in active run context.`
-## Orchestration (required)
+## Gate — No execution without baseline (change control)
+**Practice:** PMBOK integrated change control — refuse work without an approved baseline.
+Refuse if `plan_ready` is false.
+## Pre-work — Architectural fitness baseline (parent)
+**Practice:** Fitness functions (architecture governance) — save structural baseline before the executor mutates the tree.
+When `HARNESS_SENTRUX_REQUIRED=true` (see `.env.example`), from **project root**:
+```bash
+sentrux gate --save .
+```
+If `sentrux` is not installed, note `gate_baseline: skipped` in run notes and continue (harness-verify may still pass rules-sync checks).
+Do **not** ask the executor to optimize Sentrux metrics — observation is for `/harness-review` only.
+## Orchestration — Single jelled implementer
+**Practice:** Peopleware — one accountable team owns delivery; generator–evaluator separation (executor does not self-certify).
 1. Confirm `[HarnessActivePlan]` / extension reports plan ready.
 2. Build `HarnessSpawnContext` with `mode: execute`, `plan_packet_path`, `run_dir`, `acceptance_checks` from plan file.
-3. Spawn:
+3. Include **`critical_path_work_item_ids`** from `execution_plan.schedule_metadata` in spawn task when present — executor should tackle limiting-step items first (Grove).
+4. Spawn (max **1** agent per call):
 ```
-subagent({ agentScope: "both", agent: "harness/executor", task: "<HarnessSpawnContext + handoff>" })
+subagent({ agentScope: "both", agent: "harness/executor", task: "<HarnessSpawnContext + handoff + critical path hint>" })
 ```
-4. Parse subprocess output JSON (`execution_status`, validations, rollback refs) from tool result text.
-5. Parent persists trace/handoff artifacts under run dir if needed; do not self-review.
+5. Parse subprocess output JSON (`execution_status`, validations, rollback refs) from tool result text.
+6. Parent persists trace/handoff artifacts under run dir if needed; do not self-review.
+## Post-work — Structural observation (parent)
+**Practice:** Monitoring actuals vs baseline — in-process fitness functions after generator work.
+After executor subprocess completes:
+```bash
+sentrux check .
+sentrux gate .
+```
+- If `sentrux check` exits non-zero or `gate` reports degradation → set `execution_status: scope_drift` (or `blocked` if unrecoverable); parent runs **`/harness-review`** next (not immediate replan).
+- Write `artifacts/sentrux-signal.yaml` via `write_harness_yaml`:
+```yaml
+schema_version: "1.0.0"
+run_id: "<run_id>"
+check_pass: true|false
+gate_status: pass|degraded|skipped|not_installed
+quality_signal_summary: "<one line from CLI output>"
+recorded_at: "<ISO8601>"
+phase: execute
+```
+- Append session custom entry `harness-sentrux-signal` with the same fields (observation bus / telemetry).
+`harness_artifact_ready({ paths: ["artifacts/sentrux-signal.yaml"] })` when written.
 ## Parent rules
-- Refuse if plan not approved.
-- On `scope_drift`, stop and recommend `/harness-plan`.
+- On `scope_drift`, finish handoff and recommend **`/harness-review`** (review classifies `plan_gap` vs `implementation_gap` — ADR 0044).
 - Do not call `ask_user` for plan-level ambiguity — return to plan command.
 ## Completion
 - `execution_status`: `completed`, `blocked`, or `scope_drift`
 - `validation_summary` with command evidence
-- `handoff_ready` for evaluator/adversary
-- `next_command`: `/harness-eval` (same session — spawn isolated review agents; no new Pi session)
+- `handoff_ready` for post-run review
+- `next_command`: `/harness-review` (Monitoring and Controlling — measure then judge; same session preferred)

package/.pi/prompts/harness-sentrux-steward.md ADDED Viewed

@@ -0,0 +1,55 @@
+---
+description: Ad-hoc architectural intent review — spawn harness/sentrux-steward with graphify evidence.
+argument-hint: "[--run <run-id>]"
+---
+# harness-sentrux-steward
+You are the **chair** for Sentrux **intent** evolution (manifest → rules.toml). Spawn **`harness/sentrux-steward`** only — do not edit the manifest inline without a proposal artifact.
+**Skill:** `harness-sentrux-setup` — bootstrap vs steward vs sync.
+## When to use
+- User requests manifest / rules refresh
+- After `/harness-plan` when execution plan adds top-level paths not covered by manifest layer globs
+- Debate `quality` focus flags structural risk
+- Post-run `sentrux check` failures suggesting missing boundaries (before replan)
+Do **not** spawn on every `/harness-review`.
+## Step 0 — Context
+Use `[HarnessRunContext]` / `[HarnessActivePlan]`. Optional `--run <run-id>` for recovery.
+## Step 1 — Spawn steward
+```
+subagent({
+  agentScope: "both",
+  agent: "harness/sentrux-steward",
+  task: "<HarnessSpawnContext + plan_packet_path + planning-context.yaml + execution-plan paths + scope hint>"
+})
+```
+Gate: `harness_artifact_ready({ paths: ["artifacts/sentrux-manifest-proposal.yaml"] })`
+## Step 2 — Chair decision
+Read `artifacts/sentrux-manifest-proposal.yaml`.
+- `change_class: none` → report no manifest change; stop.
+- Otherwise → `ask_user` with summary, evidence bullets, and `adr_draft` if `adr_required`.
+On approval:
+1. Apply `manifest_patch` to `.pi/harness/sentrux/architecture.manifest.json` (parent `write` or manual edit).
+2. `node "$UP_PKG/.pi/scripts/harness-sentrux-bootstrap.mjs" --force`
+3. Append session custom entry `harness-architecture-changed` (triggers rules sync extension).
+4. If `adr_required`, file harness ADR snippet or `docs/adr/` entry per team convention.
+On reject: keep manifest unchanged; document decision in run notes.
+## Completion
+Report `change_class`, whether manifest was updated, and `sentrux check` outcome if run after sync.

package/.pi/prompts/harness-setup.md CHANGED Viewed

@@ -327,7 +327,7 @@ sentrux plugin add-standard 2>/dev/null || echo "Plugins already installed or fa
 ## Step 3 — Pi Extension Packages
-Bundled extensions load from the installed `ultimate-pi` package. **Per-turn model routing** comes from a **vendored** fork of [`yeliu84/pi-model-router`](https://github.com/yeliu84/pi-model-router) in `vendor/pi-model-router/`, wired through [`.pi/extensions/pi-model-router-harness.ts`](.pi/extensions/pi-model-router-harness.ts). The harness **gates** activation on `.pi/model-router.json` (Step **3.5** below) so `router/auto` and built-in tiers such as `openai/gpt-5.4-pro` cannot load prematurely. Attribution: see [THIRD_PARTY_NOTICES.md](THIRD_PARTY_NOTICES.md) and `vendor/pi-model-router/UPSTREAM_PIN.md`. Maintainer refresh: `npm run vendor:sync-router`.
+Bundled extensions load from the installed `ultimate-pi` package. **Session-locked model routing** comes from a **vendored** fork of [`yeliu84/pi-model-router`](https://github.com/yeliu84/pi-model-router) in `vendor/pi-model-router/`, wired through [`.pi/extensions/pi-model-router-harness.ts`](.pi/extensions/pi-model-router-harness.ts). The router picks **one concrete model** when the session starts (from the first user prompt + system prompt complexity), then changes **thinking level only** each turn. The harness **gates** activation on `.pi/model-router.json` (Step **3.5** below) so `router/auto` cannot load prematurely. Attribution: see [THIRD_PARTY_NOTICES.md](THIRD_PARTY_NOTICES.md) and `vendor/pi-model-router/UPSTREAM_PIN.md`. Maintainer refresh: `npm run vendor:sync-router`.
 Optionally install the companion lockfile used in development:
@@ -381,9 +381,9 @@ If generation prints "No authenticated Pi providers": warn in report — user sh
 Do NOT block setup. If no config is written, `harness-sync-model-router.mjs` clears a premature `defaultProvider: "router"` in `.pi/settings.json`.
-**Router onboarding** — The vendored extension starts only after `.pi/model-router.json` appears. Running the script above prepares that file plus optional Pi defaults (**`router` / `auto`**) via `harness-sync-model-router.mjs` when `defaultProvider` was unset—then **`/reload`**.
+**Router onboarding** — The vendored extension starts only after `.pi/model-router.json` appears. Running the script above prepares that file plus optional Pi defaults (**`router` / `auto`**, or whatever `defaultProfile` is) via `harness-sync-model-router.mjs` when `defaultProvider` was unset—then **`/reload`**. Generated profiles use **one model SKU per profile**; high/medium/low tiers differ in **thinking** only. Subagents resolve their subprocess model from the **agent system prompt** complexity (same lock rules).
-Manual override: **`/router profile auto`** anytime after reload if they changed defaults.
+Manual override: **`/router profile auto`** or **`/router profile opencode-go`** anytime after reload if they changed defaults.
 ## Step 3.6 — Harness agents (package-resolved)
@@ -677,7 +677,7 @@ Output summary table:
 | sentrux | ✓/✗ | CLI + plugins; rules via Step 4.2 bootstrap |
 | Sentrux rules.toml | ✓/✗ | `.sentrux/rules.toml` synced from manifest |
 | pi extensions | ✓/✗ | 4 packages |
-| model router | ✓/✗ | Package + config verified, activation via `/router profile auto` |
+| model router | ✓/✗ | Package + config verified, activation via `/router profile auto` (or `opencode-go`) |
 | `.env` | ✓/✗/ask | Created / keys appended / user declined |
 | .gitignore | ✓/✗ | entries added (incl. `.env`) |

package/.pi/prompts/harness-steer.md ADDED Viewed

@@ -0,0 +1,30 @@
+---
+description: Post-review repair pass — executor reads repair-brief.yaml, then re-verify via /harness-review.
+argument-hint: "[--attempt N]"
+---
+# harness-steer
+Thin orchestrator for the **steer loop** (ADR 0044). Run only after `/harness-review` produced `artifacts/review-outcome.yaml` and `artifacts/repair-brief.yaml` with `remediation_class: implementation_gap`.
+## Preconditions
+- Active run with `plan_ready` and `plan_packet_path`
+- `review-outcome.remediation_class` is `implementation_gap` (review outcome wins over executor `scope_drift` for routing)
+- `steer_attempt < HARNESS_STEER_MAX_ATTEMPTS` (default 3)
+## Steps
+1. Read `artifacts/review-outcome.yaml`, `artifacts/repair-brief.yaml`, `plan_packet_path` (paths only — do not paste bodies into tool args).
+2. Update `artifacts/steer-state.yaml` (`attempt`, `max_attempts`, `active: true`).
+3. Set policy phase to **execute** before spawning executor (required for mutating tools).
+4. One `ask_user` steer gate unless `run-context.steer_approved` is already true.
+5. Spawn **`harness/executor`** with `HarnessSpawnContext.mode: repair` and `repair_brief_path: artifacts/repair-brief.yaml`.
+6. Optional: `sentrux gate --save .` after repair to refresh baseline (ADR 0044).
+7. `next_command`: **`/harness-review`** (always re-verify; tiered adversary on attempts 2+ per practice-map).
+## Forbidden
+- Re-call `approve_plan` unless `plan-packet.yaml` structure changed
+- Widen scope beyond approved packet
+- Skip review after repair