npm - ultimate-pi - Versions diffs - 0.16.0 → 0.18.0 - Mend

ultimate-pi 0.16.0 → 0.18.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (137)

package/.agents/skills/harness-context/SKILL.md CHANGED

@@ -15,13 +15,19 @@ description: Compile task-specific harness context using context-mode and graphi
 - Use the **context-mode** npm package / pi integration for compression.
 - **Do not** use lean-ctx (`ctx_read`, `ctx_search`, etc.) on harness paths — locked by Phase 2 plan.
-## Workflow
+## Tool menu (pick what the task needs)
-1. Read `graphify-out/GRAPH_REPORT.md` or `graphify-out/wiki/index.md` when available.
-2. Run `graphify query "<task>"` for god nodes and communities.
-3. Use `sg` (ast-grep) for structural code search in `.pi/extensions/` and harness specs.
-4. Use context-mode to load maps/signatures for files not being edited.
-5. Read ADR index: `.pi/harness/docs/adrs/README.md`.
+Use these in rough priority order — not every tool on every task:
+| Need | Tool |
+|------|------|
+| Architecture, god nodes, cross-file relationships | `graphify-out/GRAPH_REPORT.md`, `graphify query`, `graphify explain`, `graphify path` |
+| Structural code patterns | `sg -p '…'` (ast-grep) |
+| Semantic implementation search | `ccc search` (harness pre-indexes before subprocess spawns) |
+| File detail | context-mode maps/signatures, then targeted reads |
+| Harness governance | `.pi/harness/docs/adrs/README.md` |
+For `/harness-plan` Phase 1, parent compiles findings into `artifacts/planning-context.yaml` — see **harness-plan** skill.
 ## Outputs
@@ -34,3 +40,4 @@ Compact context block:
 ## Rules
 - `./raw/` is graphify source storage; run `graphify update .` after significant harness code changes.
+- Subprocesses are optional; prefer parent tool use when reconnaissance fits the parent context window.

package/.agents/skills/harness-debate-plan/SKILL.md CHANGED

@@ -5,7 +5,32 @@ description: Plan-phase Review Gate debate — pi-messenger threads, lane YAML,
 # harness-debate-plan
-Use when running **Phase 5** of `/harness-plan` — outcome-based Review Gate with **within-round dialogue** (claims → rebuttals → clarifications → counters → integrate), then bus submission.
+**Practice map:** `.pi/harness/docs/practice-map.md` (Review Gate RACI).
+Use when running **Phase 5** of `/harness-plan` — **Fagan-style structured inspection** per focus (`spec` | `wbs` | `schedule` | `quality`). Parent is **chair**; within-round dialogue (claims → rebuttals → clarifications → counters → integrate).
+## Inspection roles
+| Agent | Role |
+|-------|------|
+| `hypothesis-validator` | Blind verifier (R1 only) |
+| `plan-evaluator` | Inspector (checklist) |
+| `plan-adversary` | Red team |
+| `sprint-contract-auditor` | DoD auditor (`quality` or round ≥4) |
+| `review-integrator` | Recorder / integration PM |
+Do **not** add agents for `fast` profile — reduce focuses/rounds only.
+## Debate profiles (team size)
+| Profile | Mode | Focuses | When |
+|---------|------|---------|------|
+| `full` | threaded | all four | High risk, fork, open questions |
+| `standard` | threaded | all four | Default med risk |
+| `light` | threaded | spec, quality | Low risk, high-confidence research |
+| `fast` | **consolidated** | spec, quality (one round) | Clear stack, no open questions; escalate to threaded on blockers |
+Eligibility: `harness_plan_debate_eligibility` then `harness_debate_open({ debate_profile, required_focuses })`.
 ## Open
@@ -16,30 +41,22 @@ harness_debate_open({})
 - Debate id is always `plan-<run_id>` (tool normalizes wrong ids).
 - Creates `.pi/harness/runs/<run_id>/debate-messenger/`.
-Budget profile **plan**:
+Budget caps vary by profile (see `plan-debate-eligibility.ts`); standard plan profile uses `min_focus_rounds=4`, `debate_global_cap=80000`.
-| Field | Value |
-|-------|-------|
-| min_focus_rounds | 4 |
-| max_rounds | 12 |
-| max_exchanges_per_round | 3 |
-| round_token_cap | 8000 |
-| debate_global_cap | 80000 |
+## Focus coverage
-## Focus coverage (not “exactly 4 rounds”)
-Call `harness_debate_focus_coverage` until all of `spec | wbs | schedule | quality` appear in submitted `review-round-r*.yaml` and last `review_gate_ready: true`.
+Call `harness_debate_focus_coverage` until all **required** focuses (from eligibility) appear in submitted review rounds and last `review_gate_ready: true`.
 ## Per-round spawn order (sequential only — no parallel debate subagents)
-1. R1: `hypothesis-validator` (blind) before evaluator.
-2. `plan-evaluator` → lane + messenger `claim`.
-3. `harness_messenger_read_round` → `plan-adversary` → `rebuttal`.
-4. Ping-pong while `unresolved_claim_ids` and `exchange_count < 3`:
-   - `harness_debate_advance_thread({ round_index })` for next spawn hint.
-   - Evaluator `clarification` / adversary `counter`.
-5. `sprint-contract-auditor` when focus is `quality` or round ≥ 4.
-6. `review-integrator` → `harness_debate_submit_round`.
+1. R1: `hypothesis-validator` (blind verifier) before inspector.
+2. `plan-evaluator` (inspector) → lane + messenger `claim`.
+3. `harness_messenger_read_round` → `plan-adversary` (red team) → `rebuttal`.
+4. Ping-pong while `unresolved_claim_ids` and `exchange_count < max` for profile.
+5. `sprint-contract-auditor` (DoD) when focus is `quality` or round ≥ 4.
+6. `review-integrator` (recorder) → `harness_debate_submit_round`.
+**One subagent per `subagent` call** — never batch debate lanes.
 Lane YAML + messenger messages **auto-apply** on subagent complete (`harness-debate-next-step`). Fallback: `harness_debate_apply_lane`.
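
The profile table and focus-coverage rule added in this file can be read as a small lookup plus two predicates. The following is a minimal sketch under stated assumptions: the type names, `PROFILES`, `missingFocuses`, and `needsAuditor` are hypothetical and only mirror the tables in the diff; they are not the package's actual exports or the logic in `plan-debate-eligibility.ts`.

```typescript
// Illustrative only: names mirror the SKILL.md tables, not real package exports.
type Focus = "spec" | "wbs" | "schedule" | "quality";
type DebateProfile = "full" | "standard" | "light" | "fast";

interface ProfileSpec {
  mode: "threaded" | "consolidated";
  focuses: Focus[];
}

const ALL_FOCUSES: Focus[] = ["spec", "wbs", "schedule", "quality"];

// One row per profile in the "Debate profiles (team size)" table.
const PROFILES: Record<DebateProfile, ProfileSpec> = {
  full: { mode: "threaded", focuses: ALL_FOCUSES },
  standard: { mode: "threaded", focuses: ALL_FOCUSES },
  light: { mode: "threaded", focuses: ["spec", "quality"] },
  fast: { mode: "consolidated", focuses: ["spec", "quality"] },
};

// Required focuses that no submitted review round has covered yet.
function missingFocuses(profile: DebateProfile, covered: Focus[]): Focus[] {
  return PROFILES[profile].focuses.filter((f) => covered.indexOf(f) === -1);
}

// Spawn-order step 5: the DoD auditor runs for the quality focus or from round 4 on.
function needsAuditor(focus: Focus, round: number): boolean {
  return focus === "quality" || round >= 4;
}
```

Under this reading, a `light` debate that has only covered `spec` still owes a `quality` round before `review_gate_ready` can be true.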

package/.agents/skills/harness-eval/SKILL.md CHANGED

@@ -1,27 +1,12 @@
 ---
 name: harness-eval
-description: Run harness evaluation phase and emit EvalVerdict artifacts. Use with /harness-eval, evaluate phase, or before merge promotion.
+description: >-
+  Deprecated — use harness-review skill and /harness-review for the full post-run
+  gate. This file remains as a pointer for older prompts.
 ---
-# harness-eval
+# harness-eval (deprecated)
-## When to use
+Use **`harness-review`** skill and **`/harness-review`** instead.
-- `/harness-eval` after execute
-- Before merge / release readiness
-## Workflow (orchestrator)
-1. Parent may run deterministic scripts (`harness-verify`, project tests).
-2. Spawn `harness/evaluator` with `mode: benchmark` and artifact paths in `HarnessSpawnContext`.
-3. Parse JSON from `get_subagent_result`; parent writes run artifacts.
-## Rules
-- No new Pi session — subagent isolation via `Agent` spawn (ADR 0032).
-- Do not edit `plan-packet.json` in eval phase.
-- `/harness-review` uses same agent with `mode: verdict` for policy EvalVerdict.
-## Verdict values
-`pass`, `conditional_pass`, `fail`, `human_required` (parent handles `ask_user`).
+The master command runs benchmark + policy verdict (+ adversary unless `--quick`) with `submit_eval_verdict` / `submit_adversary_report` and parent `harness_artifact_ready` gates (ADR 0037, ADR 0039).

package/.agents/skills/harness-governor/SKILL.md CHANGED

@@ -15,8 +15,9 @@ description: Enforce harness governance phases, policy gates, budgets, and promo
 1. Read current phase from `/harness-policy-status` or session `harness-policy-state`.
 2. Check ADRs: constitution (0001), eval promotion (0003), Sentrux (0006), drift (0007), rules lifecycle (0009).
-3. For promotion: require eval pass, no abort lock, debate consensus if escalated, Sentrux when `HARNESS_SENTRUX_REQUIRED=true`.
-4. After architecture changes: edit `.pi/harness/sentrux/architecture.manifest.json`, then `node "$UP_PKG/.pi/scripts/sentrux-rules-sync.mjs" --force` (see `.pi/scripts/README.md` for `UP_PKG`) or `/harness-sentrux-sync`.
+3. For promotion: require eval pass, no abort lock, debate consensus if escalated, Sentrux when `HARNESS_SENTRUX_REQUIRED=true` (`artifacts/sentrux-signal.yaml` from `/harness-run`, not executor self-report).
+4. **Intent vs observation:** Manifest/layer/boundary changes → `/harness-sentrux-steward` proposal + chair approval + ADR when material, then `sentrux-rules-sync --force`. `sentrux check`/`gate` degradation after execute → replan or fix code — do not tune manifest on a single noisy gate.
+5. After approved manifest edits: `node "$UP_PKG/.pi/scripts/harness-sentrux-bootstrap.mjs" --force` or `/harness-sentrux-sync`; emit `harness-architecture-changed` for the extension.
 5. Run `node "$UP_PKG/.pi/scripts/harness-verify.mjs"` before claiming release readiness.
 ## Spec Distiller integration
@@ -31,7 +32,7 @@ When refining plans from noisy requirements:
 ## Budgets (ADR 0038)
 - Default: **`HARNESS_BUDGET_ENFORCE` off** — token/debate caps are telemetry-only (`harness-budget-telemetry`, `harness-budget-soft-limit`). They do **not** block phases or debate lanes.
-- Do **not** skip scouts, debate rounds, or `approve_plan` because of soft budget hints in the widget.
+- Do **not** skip reconnaissance artifacts (`planning-context.yaml`), debate rounds, or `approve_plan` because of soft budget hints in the widget.
 - Re-enable hard caps only with `HARNESS_BUDGET_ENFORCE=1` and `HARNESS_BUDGET_HARD_STOP` / `HARNESS_DEBATE_HARD_STOP`.
 ## Subagent artifacts (ADR 0037)

package/.agents/skills/harness-orchestration/SKILL.md CHANGED

@@ -3,94 +3,82 @@ name: harness-orchestration
 description: >-
   Orchestrate ultimate-pi harness phases with the native `subagent` tool
   (isolated `pi --mode json` subprocesses). Use for plan/execute/evaluate
-  pipelines, L4 verification, parallel scouts, and debate prep.
+  pipelines, L4 verification, optional planning-context, and debate prep.
 ---
 # Harness orchestration
+**Practice map:** `.pi/harness/docs/practice-map.md` · **ADR 0040** · **ADR 0041**.
+## Team management rules
+1. **Parallelism law** — Parallel `tasks` only when outputs are independent inputs to a later merge (implementation ∥ stack). Never parallelize debate lanes or decompose ∥ hypothesis.
+2. **Two-pizza cap per batch** — Max 2 research lanes, 1 optional `planning-context` subagent, 1 executor, 1 debate agent per `subagent` call.
+3. **No redundant thinkers** — Downstream agents read artifacts; do not re-derive.
+4. **Sequential dependency chain** — planning context → decompose → hypothesis → research → author → DAG → debate → approve → execute → **/harness-review** → optional **/harness-steer** loop (ADR 0044).
+5. **Path-first parent tools** — `approve_plan`, `create_plan`, `submit_*` via `source_path`, `merge_harness_yaml`, `harness_synthesize_repair_brief`.
+6. **Debate = meeting** — Parent is chair; parallel_probes allows evaluator ∥ adversary per batch.
+7. **Tool intelligence** — Parent uses graphify, sg, ccc, and reads by task need; subprocesses optional.
 ## Slash commands = orchestrators
 `/harness-*` prompts parse args, call `subagent`, run `ask_user`, write policy-gated artifacts. Phase logic lives in `.pi/agents/harness/*.md` and `.pi/agents/harness/planning/*.md`.
 Every spawn includes **HarnessSpawnContext** JSON in the task text (subprocess agents do not get `[HarnessActivePlan]` injection). Use `agentScope: "both"` so package agents under `$UP_PKG/.pi/agents/**` resolve.
-Harness subprocesses load **`harness-subagent-submit`** (`PI_HARNESS_SUBPROCESS=1`, `HARNESS_RUN_ID`, `HARNESS_RUN_DIR`). Agents must call their scoped **`submit_*`** tool before exit; parent gates use **`harness_artifact_ready`** and debate reads submit from `tool_result` (set `HARNESS_SUBMIT_TOOLS=0` only to fall back to `finalOutput` parsing).
-## Subprocess telemetry
-Harness bridge emits `harness_subagent_spawned` / `harness_subagent_completed` (replaces in-process setup/blackboard events).
-```sql
-SELECT
-  properties.agent as agent,
-  count() as n,
-  round(avg(toFloat(properties.duration_ms)), 0) as avg_ms
-FROM events
-WHERE event = 'harness_subagent_completed'
-  AND timestamp >= now() - INTERVAL 7 DAY
-GROUP BY agent
-ORDER BY avg_ms DESC
-LIMIT 30
-```
+Harness subprocesses load **`harness-subagent-submit`** (`PI_HARNESS_SUBPROCESS=1`, `HARNESS_RUN_ID`, `HARNESS_RUN_DIR`). Agents must call their scoped **`submit_*`** tool before exit; parent gates use **`harness_artifact_ready`**.
 ## Latency rules
-1. **Parallel `tasks`** — one `subagent({ tasks: [...] })` for scouts, decompose+hypothesis, or review fan-in; subprocesses run in parallel upstream.
-2. **Blocking calls** — each `subagent` returns when the subprocess exits; no `get_subagent_result` polling.
-3. **Compact handoffs** — read artifacts written by submit tools (or `harness_artifact_ready`); never paste full subprocess message logs into the next spawn.
-4. **No spawn cap** — harness subagent spawns are unlimited per session (active count is telemetry only). Do **not** pass `timeoutMs` unless the user wants a cap — subprocesses wait for natural exit (`PI_SUBAGENT_TIMEOUT_MS` optional env backstop only).
+1. **Parallel `tasks`** — Phase 3.5 research only (when using subprocesses).
+2. **Sequential** — decompose, hypothesis, debate lanes, review evaluator passes.
+3. **Compact handoffs** — read artifact paths; never paste full subprocess logs into next spawn.
+4. **No spawn cap** — do not pass `timeoutMs` unless the user requests a cap.
 ## Command → agent
 | Command | `agent` |
 |---------|---------|
-| `/harness-plan` | Parent: scouts → `decompose`+`hypothesis` → Phase 3.5 `implementation-researcher`+`stack-researcher` → PlanPacket → eligibility + Review Gate → `approve_plan` + `create_plan` |
-| `/harness-run` | `harness/executor` |
-| `/harness-eval` | `harness/evaluator` (`mode: benchmark`) |
-| `/harness-review` | `harness/evaluator` (`mode: verdict`) |
-| `/harness-critic` | `harness/adversary` (post-run) |
-| `/harness-trace` | `harness/trace-librarian` |
-| `/harness-incident` | `harness/incident-recorder` |
-| `/harness-router-tune` | `harness/meta-optimizer` (optional) |
-| `/harness-auto` | plan per `/harness-plan`; `--quick` skips adversary + tie-breaker |
+| `/harness-plan` | Parent: planning context (tools) → decompose → hypothesis → Phase 3.5 artifacts → PlanPacket → eligibility + Review Gate → `approve_plan` + `create_plan` |
+| `/harness-run` | `harness/executor` (single worker) |
+| `/harness-review` | Parent verify → `evaluator` benchmark → `evaluator` verdict → `adversary` → optional `tie-breaker` (ADR 0039) |
+| `/harness-eval` | **Deprecated** → `/harness-review` |
+| `/harness-critic` | **Deprecated** → `/harness-review` |
+| `/harness-auto` | plan per `/harness-plan`; `--quick` skips adversary + tie-breaker in review |
 ## Review isolation
-Spawn `harness/evaluator` / `harness/adversary` via `subagent` in the **same** parent session. `review-integrity` allows `subagent` when `agent` is in the review set; blocks executor from spawning review agents during evaluate.
+Spawn `harness/evaluator` / `harness/adversary` via `subagent` in the **same** parent session. `review-integrity` allows `subagent` when `agent` is in the review set.
 ## ask_user policy
 | Role | `ask_user` |
 |------|------------|
 | Parent orchestrator | Yes (plan clarification, `approve_plan`, router tune) |
-| `harness/planning/*` | No — JSON only (`human_required` in output if stuck) |
+| `harness/planning/*` | No — `human_required` in output if stuck |
 | `harness/evaluator`, `harness/adversary`, `harness/tie-breaker` | `human_required` in subprocess JSON |
 | `harness/executor` | No — parent handles governance |
 ## Spawn pattern (`/harness-plan`)
-```json
-{
-  "agentScope": "both",
-  "tasks": [
-    { "agent": "harness/planning/scout-graphify", "task": "…" },
-    { "agent": "harness/planning/scout-structure", "task": "…" },
-    { "agent": "harness/planning/scout-semantic", "task": "…" }
-  ]
-}
-```
+**Phase 1 — planning context (parent default):**
+- Use `graphify query`, `sg -p`, `ccc search`, and reads as needed.
+- Write `artifacts/planning-context.yaml` via `write_harness_yaml`.
+- Optional: single `planning-context` subagent when isolation helps.
-Then parallel decompose + hypothesis, Phase 3.5 implementation + stack research, parent PlanPacket + `ask_user` (after 3.5), execution-plan-author, DAG gate, `harness_plan_debate_eligibility` + debate rounds, then `approve_plan` + `create_plan`.
+**Phase 2 — sequential:**
-Scouts use **Haiku**, `thinking: low`, **8** max turns (see agent frontmatter). Effective `--tools` omits `grep`/`find`/`subagent` per `disallowed_tools`.
+```
+subagent decompose → gate decomposition.yaml
+subagent hypothesis → gate hypothesis.yaml
+```
-## Tools
+**Phase 3.5 — research artifacts required:** parent inline and/or parallel `implementation-researcher` + `stack-researcher` (≤2).
-- `subagent` — harness subprocess spawns (modes: `single`, `tasks`, `chain`, `aggregator`)
-- `approve_plan`, `create_plan` — parent orchestrator only
-- Subprocess agents cannot nest `subagent` (`subagent` stripped from child `--tools`)
+Then execution-plan-author, DAG gate, debate eligibility, sequential debate rounds, `approve_plan` + `create_plan`.
 ## References
-- ADR 0032, ADR 0033, `.pi/harness/specs/harness-spawn-context.schema.json`
+- ADR 0032, ADR 0033, ADR 0040, ADR 0041, `.pi/harness/specs/harness-spawn-context.schema.json`
 - `node "$UP_PKG/.pi/scripts/harness-agents-manifest.mjs" --check`
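
The "sequential dependency chain" rule added in this file amounts to an ordering constraint on phases. A minimal sketch of that constraint follows, assuming a flat list of phase labels; `CHAIN`, `Phase`, and `canStart` are hypothetical names chosen for illustration and do not appear in the package. The optional `/harness-steer` loop is left out.

```typescript
// Illustrative only: phase labels paraphrase team management rule 4 in the diff.
const CHAIN = [
  "planning-context",
  "decompose",
  "hypothesis",
  "research",
  "author",
  "dag",
  "debate",
  "approve",
  "execute",
  "review",
] as const;

type Phase = (typeof CHAIN)[number];

// A phase may start only when every phase before it in the chain is done.
function canStart(phase: Phase, done: Phase[]): boolean {
  const idx = CHAIN.indexOf(phase);
  return CHAIN.slice(0, idx).every((p) => done.indexOf(p) !== -1);
}
```

For example, `hypothesis` is not startable until `decompose` has completed, which matches the diff's removal of the parallel decompose + hypothesis step.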

package/.agents/skills/harness-plan/SKILL.md CHANGED

@@ -1,33 +1,44 @@
 ---
 name: harness-plan
-description: PM-grade harness plans — scouts, Phase 3.5 implementation research, ExecutionPlan, DAG validation, selective Review Gate debate, then approve/create_plan.
+description: Agent-native harness plans — lakes/context bundles, planning context, parallel_probes debate profile, plan-synthesizer on low/med risk, path-first approve_plan/create_plan, then DAG + debate.
 ---
 # harness-plan
+**Practice map:** `.pi/harness/docs/practice-map.md` · **ADR 0040** · **ADR 0042** · **ADR 0043**.
 ## When to use
 - `/harness-plan`, harness-auto plan phase, drift replan, policy-gate without approved plan
+## Team topology (spawn laws)
+1. **Parallelism law** — Parallel `tasks` only for independent lanes (implementation ∥ stack ≤2). Never parallelize debate or decompose ∥ hypothesis.
+2. **Two-pizza cap** — Max 1 debate agent, 1 optional planning-context subagent, per `subagent` call.
+3. **No redundant thinkers** — Read upstream YAML; do not re-run graphify in decompose when `planning-context` architecture coverage is ok.
+4. **Sequential chain** — planning context → decompose → hypothesis → research → author → DAG → debate → approve.
+5. **Tool intelligence** — Parent picks graphify, sg, ccc by task; no mandatory tool-tied scout subprocesses.
 ## Workflow (parent orchestrator)
-1. Parallel scouts (graphify + structure; semantic unless `--quick`) — each scout ends with **`submit_scout_findings`** (not JSON in final message).
-2. Parallel decompose + hypothesis — **`submit_decomposition`** / **`submit_hypothesis`**.
-3. **Phase 3.5 (required):** parallel `implementation-researcher` + `stack-researcher` — **`submit_implementation_research`** / **`submit_stack`**; parent merges into `research-brief.yaml` via `write_harness_yaml`.
-4. Draft `PlanPacket` shell; `ask_user` on material fork **after** Phase 3.5.
-5. `execution-plan-author` → merge `execution_plan`.
-6. **`validate-plan-dag.mjs`** (must pass).
-7. **`harness_plan_debate_eligibility`** → **`harness_debate_open`** with profile → Review Gate (debate agents use lane **`submit_*`** tools; parent reads submit from `tool_result`, not `finalOutput` JSON).
-8. **`harness_artifact_ready`** on required paths → apply patches, re-validate DAG, `approve_plan`, `create_plan`.
+1. **Phase 1:** Compile `artifacts/planning-context.yaml` with tools (default) or optional `planning-context` subagent.
+2. **Sequential** decompose → gate `artifacts/decomposition.yaml`.
+3. **Sequential** hypothesis (requires decomposition).
+4. **Phase 3.5:** `implementation-research.yaml` + `stack.yaml` (parent inline and/or parallel researchers).
+5. Draft `PlanPacket` shell; `ask_user` on material fork **after** Phase 3.5.
+6. `execution-plan-author` → merge `execution_plan`.
+7. **`validate-plan-dag.mjs`** (must pass).
+8. **`harness_plan_debate_eligibility`** — `parallel_probes` spawns plan-evaluator ∥ plan-adversary, then integrator round.
+9. **`approve_plan({ human_summary? })`** / **`create_plan()`** — packet from `plan_packet_path` on disk (path-first).
-`--quick` skips semantic scout and post-run adversary only — **not** implementation research or plan debate.
+`--quick` skips semantic coverage in planning context and post-run adversary only — **not** adequate reconnaissance, implementation/stack artifacts (med/high risk), or plan debate.
 ## Rules
-- On-disk plan artifacts are **YAML** (`plan-packet.yaml`, `research-brief.yaml`).
+- On-disk plan artifacts are **YAML** (`plan-packet.yaml`, `research-brief.yaml`, `planning-context.yaml`).
 - Subagents read-only; parent writes run artifacts and calls `approve_plan` / `create_plan`.
 - context-mode only on harness paths.
-- Phase 3.5 required unless documented waiver; high risk requires implementation artifact for approval.
+- Phase 3.5 artifacts required for med/high risk unless documented waiver.
 ## Output

package/.agents/skills/harness-review/SKILL.md ADDED

@@ -0,0 +1,52 @@
+---
+name: harness-review
+description: >-
+  Post-run verification gate (/harness-review): harness-verify, Sentrux fitness
+  functions, benchmark + verdict evaluator, adversary, optional tie-breaker.
+  Subagents use submit_*; parent uses harness_artifact_ready. Use after
+  /harness-run; claim cross-session runs with /harness-use-run --claim.
+---
+# harness-review
+**Practice map:** `.pi/harness/docs/practice-map.md` (Monitoring and Controlling: measure → judge → red team).
+## When to use
+- After `/harness-run` completes (same session preferred)
+- Resuming with `/harness-use-run <run-id> --claim` then `/harness-review`
+- Instead of separate `/harness-eval`, `/harness-critic` (aliases forward here)
+## Orchestration summary
+| Phase | Practice | Actor | Artifact |
+|-------|----------|-------|----------|
+| 1 | Automated QC + Sentrux fitness functions | Parent | `harness-verify.mjs`, `sentrux gate .`, `benchmark-log.yaml`, `sentrux-signal.yaml` |
+| 2 | Measure actuals (EVM) | `harness/evaluator` benchmark | `eval-verdict.yaml` |
+| 2b | Controlling | Parent | Write `review-outcome.yaml`; route via `remediation_class` (not fail-fast abort) |
+| 6 | Outcome | Parent | `review-outcome.yaml` → `/harness-steer` or replan |
+| 3 | Policy audit | `harness/evaluator` verdict | same YAML |
+| 4 | Red team | `harness/adversary` | `adversary-report.yaml` |
+| 5 | Arbitration | `harness/tie-breaker` | only if block + conditional_pass |
+## Phase 1 — Sentrux (structural actuals)
+When `HARNESS_SENTRUX_REQUIRED=true` (default in `.env.example`):
+1. `node "$UP_PKG/.pi/scripts/harness-verify.mjs"` — rules drift + `sentrux check` when CLI installed.
+2. `sentrux gate .` — compare to baseline saved during `/harness-run`.
+3. Write `artifacts/sentrux-signal.yaml` and append session entry `harness-sentrux-signal` (observation bus / PostHog).
+4. Optional `artifacts/benchmark-log.yaml` fields: `sentrux_check`, `sentrux_gate`, `harness_verify`.
+Pass `sentrux-signal.yaml` path to evaluator `mode: benchmark` spawn context. Evaluator treats metrics as measured facts, not goals for the executor.
+## Rules
+- Parent never writes eval/adversary YAML — subprocess `submit_*` only (ADR 0037).
+- Auto-claim run ownership unless `--readonly`.
+- Disk verdict drives `next_recommended_command` (`resolveCompletionStatuses`).
+## Aliases
+- `/harness-eval` → use `/harness-review`
+- `/harness-critic` → use `/harness-review` (or `--quick` to skip adversary)
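
The orchestration table in this new file says the tie-breaker runs "only if block + conditional_pass". One plausible reading, sketched below, is that the adversary recommends `block` while the evaluator verdict is `conditional_pass`; that pairing is an assumption, as are the names `needsTieBreaker` and `quick`. The enum values themselves come from the evaluator and adversary agent files in this diff.

```typescript
// Illustrative only: the block/conditional_pass pairing is an assumed reading.
type EvalStatus = "pass" | "conditional_pass" | "fail";
type AdversaryRecommendation = "proceed" | "conditional_pass" | "block";

// `adversary` is null when the adversary phase did not run (for example with --quick).
function needsTieBreaker(
  evalStatus: EvalStatus,
  adversary: AdversaryRecommendation | null,
  quick: boolean,
): boolean {
  if (quick || adversary === null) {
    return false; // --quick skips adversary and tie-breaker
  }
  return evalStatus === "conditional_pass" && adversary === "block";
}
```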

package/.agents/skills/harness-sentrux-setup/SKILL.md CHANGED

@@ -11,6 +11,17 @@ description: Bootstrap Sentrux architectural rules for harness projects — seed
 - Target repo has no `.sentrux/rules.toml` or `harness-verify` reports rules out of date
 - User edited `.pi/harness/sentrux/architecture.manifest.json` (layers, boundaries, constraints)
+## Roles (do not conflate)
+| Role | Agent / command | Layer |
+|------|-----------------|-------|
+| **Bootstrap** | `harness/sentrux-bootstrap`, `harness-sentrux-bootstrap.mjs` | Greenfield seed + first sync |
+| **Steward** | `harness/sentrux-steward`, `/harness-sentrux-steward` | Proposes manifest changes (`artifacts/sentrux-manifest-proposal.yaml`); chair applies |
+| **Sync** | `sentrux-rules-sync.mjs`, `/harness-sentrux-sync` | Regenerates `rules.toml` from manifest after intent change |
+| **Observation** | `/harness-run`, `/harness-review` | `sentrux gate --save` / `check` / `gate` → `artifacts/sentrux-signal.yaml` |
+Never auto-sync manifest from directory trees. Material manifest edits need steward evidence + chair approval (ADR 0009).
 ## Canonical layout
 | Path | Role |
@@ -53,4 +64,5 @@ Do **not** copy ultimate-pi's layer paths blindly into unrelated layouts — edi
 - ADR 0009 — `.pi/harness/docs/adrs/0009-sentrux-rules-lifecycle.md`
 - Scripts — `.pi/scripts/sentrux-rules-sync.mjs`, `harness-sentrux-bootstrap.mjs`
-- Agent — `harness/sentrux-bootstrap` (optional delegate for setup-only runs)
+- Agents — `harness/sentrux-bootstrap` (setup), `harness/sentrux-steward` (intent proposals)
+- Specs — `sentrux-manifest-proposal.schema.json`, `sentrux-signal.schema.json`

package/.agents/skills/harness-steer/SKILL.md ADDED

@@ -0,0 +1,14 @@
+---
+name: harness-steer
+description: Post-review repair loop via harness-steer and executor repair mode (ADR 0044).
+---
+# harness-steer
+Use after `/harness-review` when `artifacts/review-outcome.yaml` has `remediation_class: implementation_gap`.
+1. Read `repair-brief.yaml` and `plan_packet_path` (paths only).
+2. Set policy phase `execute`; spawn `harness/executor` with `mode: repair`.
+3. Always follow with `/harness-review`.
+See `.pi/prompts/harness-steer.md` and `.pi/harness/docs/adrs/0044-harness-steer-loop.md`.

package/.pi/agents/harness/adversary.md CHANGED

@@ -30,13 +30,6 @@ Pressure-test the candidate with adversarial reasoning and reproducible attacks.
 ## Output
-```json
-{
-  "block_merge": false,
-  "adversary_report": { },
-  "human_summary": "…",
-  "recommendation": "proceed"
-}
-```
-Use `recommendation`: `proceed`, `conditional_pass`, or `block`.
+Call **`submit_adversary_report`** before exit (writes `artifacts/adversary-report.yaml`). Do not emit prose-only JSON for the parent to copy onto disk.
+Use `recommendation`: `proceed`, `conditional_pass`, or `block`. Set `block_merge: true` when merge must halt.

package/.pi/agents/harness/evaluator.md CHANGED

@@ -17,7 +17,7 @@ Independently validate execution outcomes and emit structured verdicts. Spawn co
 1. Read `HarnessSpawnContext` and artifact paths (`plan_packet_path`, `run_dir`, trace refs).
 2. Reconstruct validation scope from the plan and on-disk run artifacts.
-3. For `benchmark` mode: run or summarize deterministic checks (project tests, harness-verify if instructed in spawn prompt); collect metrics only you measured.
+3. For `benchmark` mode: run or summarize deterministic checks (project tests, harness-verify if instructed in spawn prompt); read `artifacts/sentrux-signal.yaml` and `artifacts/benchmark-log.yaml` when present — cite `check_pass`, `gate_status`, and `quality_signal_summary` as measured structural actuals (do not treat as optimization targets for the executor).
 4. For `verdict` mode: emit `EvalVerdict` matching `.pi/harness/specs/eval-verdict.schema.json`.
 5. Recommend only: `proceed_to_adversary`, `replan`, or `rollback`.
 6. Set `human_required` in structured output when blocked; never call `ask_user`.
@@ -31,15 +31,6 @@ Independently validate execution outcomes and emit structured verdicts. Spawn co
 ## Output
-End with a fenced `json` block:
+Call **`submit_eval_verdict`** before exit with a document matching `eval-verdict.schema.json` (writes `artifacts/eval-verdict.yaml` under the run dir). Do not ask the parent to parse JSON or write verdict files.
-```json
-{
-  "eval_status": "pass",
-  "eval_verdict": { },
-  "human_summary": "…",
-  "recommended_action": "proceed_to_adversary"
-}
-```
-Use `eval_status`: `pass`, `conditional_pass`, or `fail`.
+Use `status`: `pass`, `conditional_pass`, or `fail`. `recommended_action`: `proceed_to_adversary`, `replan`, or `rollback`.

package/.pi/agents/harness/executor.md CHANGED

@@ -13,12 +13,17 @@ You are the Harness Executor.
 Implement the approved plan with surgical diffs and strict scope control. The parent orchestrator spawned you with a `HarnessSpawnContext` appendix — use `plan_packet_path`, `run_dir`, and acceptance checks from that JSON.
+## Repair mode (`mode: repair`)
+When spawn context sets `mode: repair`, read `repair_brief_path` (typically `artifacts/repair-brief.yaml`). Fix only what the brief lists — failed acceptance checks, `fix_directives`, and `priority_lake_ids`. Do **not** widen scope beyond `plan_packet_path`. Set `repair_attempt` in handoff metadata when the schema allows.
 ## Process
-1. Read the approved `PlanPacket` at `plan_packet_path` from spawn context; extract allowed scope before any mutation.
-2. Implement only approved scope with minimal, reversible diffs.
+1. Read the approved `PlanPacket` at `plan_packet_path` from spawn context; extract allowed scope before any mutation. Approval is recorded in `run-context.yaml` (`plan_ready: true`) and subprocess policy bootstrap — not as a field inside `plan-packet.yaml`.
+2. When spawn context lists `critical_path_work_item_ids` (from `schedule_metadata`), implement those work items before non-critical items when practical (limiting-step / Grove).
+3. Implement only approved scope with minimal, reversible diffs.
 3. Run focused validations mapped to `acceptance_checks`.
-4. Prepare rollback artifacts: revert command, prepared revert branch name, patch bundle path under the run directory.
+4. Prepare rollback metadata in `rollback_refs` (revert command, revert branch, patch bundle path under the run directory). **`submit_executor_handoff`** writes `handoff/executor-summary.yaml` and mirrors `rollback_refs` to `artifacts/executor-rollback.yaml` (YAML only — no `artifacts/*.json`).
 5. For plan-level ambiguity (wrong scope, missing acceptance), stop and return structured `scope_drift` — do not widen scope.
 6. Do not self-certify final quality; hand off evidence paths for evaluator/adversary.
@@ -32,16 +37,9 @@ Implement the approved plan with surgical diffs and strict scope control. The pa
 ## Output
-End with a JSON block:
+Call **`submit_executor_handoff`** with a document matching `harness-executor-handoff.schema.json` before exit:
-```json
-{
-  "execution_status": "completed",
-  "files_changed": [],
-  "validation_summary": "…",
-  "rollback_refs": {},
-  "handoff_ready": { "evaluator": true, "adversary": true }
-}
-```
+- `execution_status`: `completed`, `blocked`, or `scope_drift`
+- `files_changed`, `validation_summary`, `rollback_refs`, `handoff_ready`
-Use `execution_status` values: `completed`, `blocked`, or `scope_drift`.
+Do not write `artifacts/executor-rollback.json` — rollback is emitted as YAML by the submit pipeline.
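
The handoff fields listed in this hunk suggest a document shape like the one sketched below. This is an assumption-laden sketch, not the contents of `harness-executor-handoff.schema.json`: the interface, the `Record<string, string>` type for `rollback_refs`, and the helper names are hypothetical. The field names and status values are taken from the diff, and the `handoff_ready` shape follows the JSON example this hunk removes.

```typescript
// Illustrative only: shape inferred from the diff, not from the real schema file.
type ExecutionStatus = "completed" | "blocked" | "scope_drift";

interface ExecutorHandoff {
  execution_status: ExecutionStatus;
  files_changed: string[];
  validation_summary: string;
  rollback_refs: Record<string, string>; // key names are not specified in the diff
  handoff_ready: { evaluator: boolean; adversary: boolean };
}

const STATUSES: string[] = ["completed", "blocked", "scope_drift"];

// Type guard for the three status values named in the Output section.
function isExecutionStatus(value: unknown): value is ExecutionStatus {
  return typeof value === "string" && STATUSES.indexOf(value) !== -1;
}

// Flags paths that break the "YAML only, no artifacts/*.json" rule.
function jsonArtifacts(paths: string[]): string[] {
  return paths.filter((p) => /^artifacts\/[^/]+\.json$/.test(p));
}
```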

package/.pi/agents/harness/planning/decompose.md CHANGED

@@ -7,7 +7,9 @@ thinking: medium
 max_turns: 12
 ---
-You are the **Harness planning decomposer (Phase 1)**.
+You are the **Harness problem-framing agent (Phase 2a — lakes / scope)**.
+**Inspection role:** Outcome author (lake-sized units, not ticket WBS). See `.pi/harness/docs/practice-map.md` and ADR 0042.
 ## Mission
@@ -19,9 +21,10 @@ Read `HarnessSpawnContext` and the merged **scout lane JSON** in the spawn promp
 ## Process
-1. Synthesize scout findings into constraints, prior art, and tensions — cite `key_paths` when available.
-2. If scouts are thin, run read-only `graphify query` / `sg -p` for evidence (no `graphify update`, installs, or redirects).
-3. Do not read `.pi/harness/specs/*.schema.json` from disk.
+1. Read Phase 1 reconnaissance from spawn context paths — prefer `artifacts/planning-context.yaml`; legacy `artifacts/scout-*.yaml` lanes are accepted when present.
+2. Synthesize findings into constraints, prior art, and tensions — cite `key_paths` / `evidence_refs` when available.
+3. **Graphify dedup:** If `planning-context.yaml` has `coverage.architecture.status` of `ok`, or legacy `scout-graphify.yaml` has `status: ok`, do **not** run `graphify query` / `graphify explain` / `graphify path`. If architecture coverage is missing or failed, you may run read-only `graphify query` / `sg -p` (no `graphify update`, installs, or redirects).
+4. Do not read `.pi/harness/specs/*.schema.json` from disk.
 ## Phase 1 — DeepMind-style decomposition
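The graphify dedup rule added to the Process list in this hunk amounts to a small decision function. The sketch below is a hedged illustration, not the harness's implementation: `may_run_graphify` is a hypothetical name, both inputs are assumed to be the already-parsed contents of `artifacts/planning-context.yaml` and the legacy `scout-graphify.yaml`, and any status other than `ok` is treated as "missing or failed".

```python
from typing import Optional


def may_run_graphify(
    planning_context: Optional[dict],
    scout_graphify: Optional[dict],
) -> bool:
    """True when read-only graphify queries are still allowed under the dedup rule."""
    # coverage.architecture.status from planning-context.yaml, tolerating absent keys.
    coverage = (planning_context or {}).get("coverage") or {}
    architecture = coverage.get("architecture") or {}
    if architecture.get("status") == "ok":
        return False  # Phase 1 reconnaissance already covered architecture
    if (scout_graphify or {}).get("status") == "ok":
        return False  # the legacy scout lane already covered it
    return True  # coverage missing or failed: read-only queries are permitted


# Usage: architecture coverage is ok, so graphify should not be re-run.
skip = may_run_graphify({"coverage": {"architecture": {"status": "ok"}}}, None)
```

Even when the function returns `True`, the prompt still restricts the agent to read-only `graphify query` / `sg -p`, with no `graphify update`, installs, or redirects.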

package/.pi/agents/harness/planning/hypothesis-validator.md CHANGED

@@ -7,6 +7,8 @@ thinking: medium
 max_turns: 10
 ---
+**Inspection role:** Blind verifier (independent verification; debate R1 only). See `.pi/harness/docs/practice-map.md`.
 ## Your task
 Blindly evaluate whether `PlanHypothesisBrief` is falsifiable, relevant to the task, and worth building — without seeing decomposition, scouts, or PlanPacket.

package/.pi/agents/harness/planning/hypothesis.md CHANGED

@@ -7,7 +7,9 @@ thinking: medium
 max_turns: 14
 ---
-You are the **Harness planning hypothesis generator (Phase 2 — DARWIN)**.
+You are the **Harness planning hypothesis generator (Phase 2b — DARWIN)**.
+**Role:** Approach author after WBS (Lean hypothesis-driven planning). Requires `artifacts/decomposition.yaml`. See `.pi/harness/docs/practice-map.md`.
 ## Mission
@@ -63,4 +65,4 @@ Do **not** include self-evaluation scores — a separate agent handles that.
 ## Output
-Before ending, call `submit_hypothesis_brief` exactly once with the full `PlanHypothesisBrief` document. Do not paste the artifact as prose or a fenced JSON block — the tool write is the deliverable.
+Before ending, call `submit_hypothesis_brief` exactly once with the full `PlanHypothesisBrief` document. The harness writes **`artifacts/hypothesis.yaml`** (YAML on disk). Do not use bash or any `*.json` path under `artifacts/`; do not paste the artifact as prose or a fenced JSON block — the submit tool is the deliverable.

package/.pi/agents/harness/planning/implementation-researcher.md CHANGED

@@ -31,7 +31,7 @@ Read `HarnessSpawnContext` plus paths to `artifacts/decomposition.yaml`, `artifa
 ## Output
-Before ending, call `submit_implementation_research` exactly once with the full document. Prose summary is optional; the artifact is the tool call.
+Before ending, call `submit_implementation_research` exactly once with the full document. The harness writes **`artifacts/implementation-research.yaml`** (YAML on disk). Do not use bash or `implementation-research.json`; prose summary is optional — the submit tool is the deliverable.
 ## Guardrails

package/.pi/agents/harness/planning/plan-adversary.md CHANGED

@@ -7,6 +7,8 @@ thinking: medium
 max_turns: 14
 ---
+**Inspection role:** Red team (adversarial review). See `.pi/harness/docs/practice-map.md`.
 ## Your task
 Stress-test the ExecutionPlan with reproducible counterexamples. Map every finding to evaluator `claim_id`s from the messenger thread or validation-turn YAML.

package/.pi/agents/harness/planning/plan-evaluator.md CHANGED

@@ -7,6 +7,8 @@ thinking: medium
 max_turns: 14
 ---
+**Inspection role:** Inspector (neutral Fagan-style checklist). See `.pi/harness/docs/practice-map.md`.
 ## Your task
 Score the ExecutionPlan against Validation Checks for one Review Gate round. Emit stable `checks[]` with ids and messenger-ready `claim_ids`. You are not an advocate for the plan.