npm - ultimate-pi - Versions diffs - 0.17.0 → 0.18.1 - Mend

ultimate-pi 0.17.0 → 0.18.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (137)

package/.agents/skills/harness-context/SKILL.md +13 -6
package/.agents/skills/harness-debate-plan/SKILL.md +37 -20
package/.agents/skills/harness-decisions/SKILL.md +1 -1
package/.agents/skills/harness-eval/SKILL.md +6 -21
package/.agents/skills/harness-governor/SKILL.md +4 -3
package/.agents/skills/harness-orchestration/SKILL.md +41 -53
package/.agents/skills/harness-plan/SKILL.md +23 -12
package/.agents/skills/harness-review/SKILL.md +52 -0
package/.agents/skills/harness-sentrux-setup/SKILL.md +16 -3
package/.agents/skills/harness-steer/SKILL.md +14 -0
package/.agents/skills/sentrux/SKILL.md +9 -9
package/.pi/agents/harness/planning/decompose.md +7 -4
package/.pi/agents/harness/planning/hypothesis-validator.md +2 -0
package/.pi/agents/harness/planning/hypothesis.md +3 -1
package/.pi/agents/harness/planning/plan-adversary.md +2 -0
package/.pi/agents/harness/planning/plan-evaluator.md +2 -0
package/.pi/agents/harness/planning/plan-synthesizer.md +25 -0
package/.pi/agents/harness/planning/planning-context.md +48 -0
package/.pi/agents/harness/planning/review-integrator.md +2 -0
package/.pi/agents/harness/planning/sprint-contract-auditor.md +2 -0
package/.pi/agents/harness/{adversary.md → reviewing/adversary.md} +3 -10
package/.pi/agents/harness/{evaluator.md → reviewing/evaluator.md} +3 -12
package/.pi/agents/harness/running/executor.md +45 -0
package/.pi/agents/harness/sentrux-steward.md +51 -0
package/.pi/extensions/00-harness-project-control.ts +133 -0
package/.pi/extensions/00-posthog-network-bootstrap.ts +11 -0
package/.pi/extensions/budget-guard.ts +2 -0
package/.pi/extensions/debate-orchestrator.ts +2 -0
package/.pi/extensions/harness-ask-user.ts +2 -2
package/.pi/extensions/harness-debate-tools.ts +2 -2
package/.pi/extensions/harness-live-widget.ts +60 -3
package/.pi/extensions/harness-plan-approval.ts +64 -58
package/.pi/extensions/harness-run-context.ts +715 -90
package/.pi/extensions/harness-subagent-submit.ts +46 -12
package/.pi/extensions/harness-subagents.ts +2 -2
package/.pi/extensions/harness-telemetry.ts +2 -0
package/.pi/extensions/harness-web-tools.ts +2 -2
package/.pi/extensions/lib/extension-load-guard.ts +10 -0
package/.pi/extensions/lib/harness-artifact-gate.ts +172 -0
package/.pi/extensions/lib/harness-posthog.ts +9 -5
package/.pi/extensions/lib/harness-spawn-topology.ts +165 -0
package/.pi/extensions/lib/harness-subagent-auth.ts +1 -2
package/.pi/extensions/lib/harness-subagent-policy.ts +28 -24
package/.pi/extensions/lib/harness-subagent-precheck.ts +36 -10
package/.pi/extensions/lib/harness-subagent-submit-pipeline.ts +66 -2
package/.pi/extensions/lib/harness-subagent-submit-registry.ts +22 -22
package/.pi/extensions/lib/harness-subagents-bridge.ts +7 -29
package/.pi/extensions/lib/harness-subprocess-bootstrap.ts +73 -0
package/.pi/extensions/lib/plan-approval/create-plan.ts +2 -3
package/.pi/extensions/lib/plan-approval/resolve-disk.ts +102 -0
package/.pi/extensions/lib/plan-approval/schema.ts +22 -8
package/.pi/extensions/lib/plan-approval/types.ts +1 -1
package/.pi/extensions/lib/plan-approval/validate.ts +2 -2
package/.pi/extensions/lib/plan-approval-readiness.ts +192 -0
package/.pi/extensions/lib/plan-debate-eligibility.ts +12 -5
package/.pi/extensions/lib/plan-debate-gate.ts +22 -1
package/.pi/extensions/lib/plan-debate-lanes.ts +32 -2
package/.pi/extensions/lib/plan-review-gate.ts +8 -0
package/.pi/extensions/lib/posthog-client.ts +76 -0
package/.pi/extensions/lib/spawn-policy.ts +3 -3
package/.pi/extensions/observation-bus.ts +2 -0
package/.pi/extensions/policy-gate.ts +26 -19
package/.pi/extensions/review-integrity.ts +91 -10
package/.pi/extensions/sentrux-rules-sync.ts +2 -0
package/.pi/extensions/test-diff-integrity.ts +1 -0
package/.pi/extensions/trace-recorder.ts +2 -0
package/.pi/harness/agents.manifest.json +37 -37
package/.pi/harness/corpus/cron.example +8 -0
package/.pi/harness/corpus/graphify-kb-updater.config.json +214 -0
package/.pi/harness/corpus/systemd/graphify-kb-updater.env.template +4 -0
package/.pi/harness/corpus/systemd/graphify-kb-updater.service +17 -0
package/.pi/harness/corpus/systemd/graphify-kb-updater.timer +11 -0
package/.pi/harness/docs/adrs/0001-harness-constitution.md +2 -1
package/.pi/harness/docs/adrs/0006-sentrux-dual-layer.md +8 -6
package/.pi/harness/docs/adrs/0009-sentrux-rules-lifecycle.md +6 -1
package/.pi/harness/docs/adrs/0031-harness-run-context.md +1 -1
package/.pi/harness/docs/adrs/0032-harness-command-orchestration.md +7 -0
package/.pi/harness/docs/adrs/0034-darwin-plan-research-pipeline.md +3 -3
package/.pi/harness/docs/adrs/0036-implementation-research-and-selective-debate.md +8 -5
package/.pi/harness/docs/adrs/0039-harness-post-run-review-gate.md +47 -0
package/.pi/harness/docs/adrs/0040-practice-grounded-orchestration.md +40 -0
package/.pi/harness/docs/adrs/0041-intelligent-planning-reconnaissance.md +39 -0
package/.pi/harness/docs/adrs/0042-agent-native-orchestration.md +35 -0
package/.pi/harness/docs/adrs/0043-path-first-harness-tools.md +38 -0
package/.pi/harness/docs/adrs/0044-harness-steer-loop.md +37 -0
package/.pi/harness/docs/adrs/0045-phase-scoped-agent-directories.md +33 -0
package/.pi/harness/docs/adrs/README.md +11 -0
package/.pi/harness/docs/graphify-kb-updater-runbook.md +163 -0
package/.pi/harness/docs/practice-map.md +110 -0
package/.pi/harness/env.harness.template +5 -3
package/.pi/harness/evals/smoke/sentrux-stub.json +1 -1
package/.pi/harness/evals/smoke/smoke-harness-plan.mjs +5 -2
package/.pi/harness/specs/README.md +1 -1
package/.pi/harness/specs/harness-run-context.schema.json +11 -0
package/.pi/harness/specs/harness-spawn-context.schema.json +15 -1
package/.pi/harness/specs/plan-execution-plan.schema.json +39 -1
package/.pi/harness/specs/plan-packet.schema.json +4 -0
package/.pi/harness/specs/plan-phase-status.schema.json +17 -0
package/.pi/harness/specs/plan-phase-waiver.schema.json +25 -0
package/.pi/harness/specs/plan-planning-context.schema.json +50 -0
package/.pi/harness/specs/repair-brief.schema.json +45 -0
package/.pi/harness/specs/review-outcome.schema.json +46 -0
package/.pi/harness/specs/sentrux-manifest-proposal.schema.json +80 -0
package/.pi/harness/specs/sentrux-signal.schema.json +43 -0
package/.pi/harness/specs/steer-state.schema.json +20 -0
package/.pi/lib/harness-context-mode-policy.ts +256 -0
package/.pi/lib/harness-project-config.ts +91 -0
package/.pi/lib/harness-repair-brief.ts +145 -0
package/.pi/lib/harness-run-context.ts +591 -32
package/.pi/lib/harness-ui-state.ts +114 -21
package/.pi/prompts/harness-auto.md +10 -10
package/.pi/prompts/harness-critic.md +3 -30
package/.pi/prompts/harness-eval.md +4 -37
package/.pi/prompts/harness-plan.md +116 -54
package/.pi/prompts/harness-review.md +150 -15
package/.pi/prompts/harness-run.md +62 -10
package/.pi/prompts/harness-sentrux-steward.md +55 -0
package/.pi/prompts/harness-setup.md +5 -4
package/.pi/prompts/harness-steer.md +30 -0
package/.pi/scripts/README.md +1 -0
package/.pi/scripts/graphify-kb-updater.mjs +398 -0
package/.pi/scripts/harness-agents-manifest.mjs +1 -1
package/.pi/scripts/harness-project-toggle.mjs +129 -0
package/.pi/scripts/harness-sentrux-cli.mjs +142 -0
package/.pi/scripts/harness-verify.mjs +22 -6
package/.pi/scripts/harness-web-policy-guard.mjs +68 -0
package/.pi/scripts/validate-plan-dag.mjs +3 -3
package/AGENTS.md +1 -0
package/CHANGELOG.md +23 -0
package/README.md +94 -58
package/package.json +5 -4
package/.pi/agents/harness/executor.md +0 -47
package/.pi/agents/harness/planning/scout-graphify.md +0 -37
package/.pi/agents/harness/planning/scout-semantic.md +0 -39
package/.pi/agents/harness/planning/scout-structure.md +0 -35
package/.pi/prompts/git-sync.md +0 -124
package/.pi/agents/harness/{tie-breaker.md → reviewing/tie-breaker.md} +0 -0

package/.agents/skills/harness-context/SKILL.md CHANGED

@@ -15,13 +15,19 @@ description: Compile task-specific harness context using context-mode and graphi
 - Use the **context-mode** npm package / pi integration for compression.
 - **Do not** use lean-ctx (`ctx_read`, `ctx_search`, etc.) on harness paths — locked by Phase 2 plan.
-## Workflow
+## Tool menu (pick what the task needs)
-1. Read `graphify-out/GRAPH_REPORT.md` or `graphify-out/wiki/index.md` when available.
-2. Run `graphify query "<task>"` for god nodes and communities.
-3. Use `sg` (ast-grep) for structural code search in `.pi/extensions/` and harness specs.
-4. Use context-mode to load maps/signatures for files not being edited.
-5. Read ADR index: `.pi/harness/docs/adrs/README.md`.
+Use these in rough priority order — not every tool on every task:
+| Need | Tool |
+|------|------|
+| Architecture, god nodes, cross-file relationships | `graphify-out/GRAPH_REPORT.md`, `graphify query`, `graphify explain`, `graphify path` |
+| Structural code patterns | `sg -p '…'` (ast-grep) |
+| Semantic implementation search | `ccc search` (harness pre-indexes before subprocess spawns) |
+| File detail | context-mode maps/signatures, then targeted reads |
+| Harness governance | `.pi/harness/docs/adrs/README.md` |
+For `/harness-plan` Phase 1, parent compiles findings into `artifacts/planning-context.yaml` — see **harness-plan** skill.
 ## Outputs
@@ -34,3 +40,4 @@ Compact context block:
 ## Rules
 - `./raw/` is graphify source storage; run `graphify update .` after significant harness code changes.
+- Subprocesses are optional; prefer parent tool use when reconnaissance fits the parent context window.

package/.agents/skills/harness-debate-plan/SKILL.md CHANGED

@@ -5,7 +5,32 @@ description: Plan-phase Review Gate debate — pi-messenger threads, lane YAML,
 # harness-debate-plan
-Use when running **Phase 5** of `/harness-plan` — outcome-based Review Gate with **within-round dialogue** (claims → rebuttals → clarifications → counters → integrate), then bus submission.
+**Practice map:** `.pi/harness/docs/practice-map.md` (Review Gate RACI).
+Use when running **Phase 5** of `/harness-plan` — **Fagan-style structured inspection** per focus (`spec` | `wbs` | `schedule` | `quality`). Parent is **chair**; within-round dialogue (claims → rebuttals → clarifications → counters → integrate).
+## Inspection roles
+| Agent | Role |
+|-------|------|
+| `hypothesis-validator` | Blind verifier (R1 only) |
+| `plan-evaluator` | Inspector (checklist) |
+| `plan-adversary` | Red team |
+| `sprint-contract-auditor` | DoD auditor (`quality` or round ≥4) |
+| `review-integrator` | Recorder / integration PM |
+Do **not** add agents for `fast` profile — reduce focuses/rounds only.
+## Debate profiles (team size)
+| Profile | Mode | Focuses | When |
+|---------|------|---------|------|
+| `full` | threaded | all four | High risk, fork, open questions |
+| `standard` | threaded | all four | Default med risk |
+| `light` | threaded | spec, quality | Low risk, high-confidence research |
+| `fast` | **consolidated** | spec, quality (one round) | Clear stack, no open questions; escalate to threaded on blockers |
+Eligibility: `harness_plan_debate_eligibility` then `harness_debate_open({ debate_profile, required_focuses })`.
 ## Open
@@ -16,30 +41,22 @@ harness_debate_open({})
 - Debate id is always `plan-<run_id>` (tool normalizes wrong ids).
 - Creates `.pi/harness/runs/<run_id>/debate-messenger/`.
-Budget profile **plan**:
+Budget caps vary by profile (see `plan-debate-eligibility.ts`); standard plan profile uses `min_focus_rounds=4`, `debate_global_cap=80000`.
-| Field | Value |
-|-------|-------|
-| min_focus_rounds | 4 |
-| max_rounds | 12 |
-| max_exchanges_per_round | 3 |
-| round_token_cap | 8000 |
-| debate_global_cap | 80000 |
+## Focus coverage
-## Focus coverage (not “exactly 4 rounds”)
-Call `harness_debate_focus_coverage` until all of `spec | wbs | schedule | quality` appear in submitted `review-round-r*.yaml` and last `review_gate_ready: true`.
+Call `harness_debate_focus_coverage` until all **required** focuses (from eligibility) appear in submitted review rounds and last `review_gate_ready: true`.
 ## Per-round spawn order (sequential only — no parallel debate subagents)
-1. R1: `hypothesis-validator` (blind) before evaluator.
-2. `plan-evaluator` → lane + messenger `claim`.
-3. `harness_messenger_read_round` → `plan-adversary` → `rebuttal`.
-4. Ping-pong while `unresolved_claim_ids` and `exchange_count < 3`:
-   - `harness_debate_advance_thread({ round_index })` for next spawn hint.
-   - Evaluator `clarification` / adversary `counter`.
-5. `sprint-contract-auditor` when focus is `quality` or round ≥ 4.
-6. `review-integrator` → `harness_debate_submit_round`.
+1. R1: `hypothesis-validator` (blind verifier) before inspector.
+2. `plan-evaluator` (inspector) → lane + messenger `claim`.
+3. `harness_messenger_read_round` → `plan-adversary` (red team) → `rebuttal`.
+4. Ping-pong while `unresolved_claim_ids` and `exchange_count < max` for profile.
+5. `sprint-contract-auditor` (DoD) when focus is `quality` or round ≥ 4.
+6. `review-integrator` (recorder) → `harness_debate_submit_round`.
+**One subagent per `subagent` call** — never batch debate lanes.
 Lane YAML + messenger messages **auto-apply** on subagent complete (`harness-debate-next-step`). Fallback: `harness_debate_apply_lane`.
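
Reading aid for the hunk above: a minimal TypeScript sketch of how the debate profiles could map to required focuses, and how focus coverage could be checked. The names `DebateProfile`, `PROFILE_FOCUSES`, `ReviewRound`, `missingFocuses`, and `coverageComplete` are hypothetical, and the `focus` field on a round is an assumption; none of them are taken from `plan-debate-eligibility.ts` or from the `harness_debate_focus_coverage` tool.

```typescript
// Illustrative only: these names are hypothetical and do not come from the package.
type Focus = "spec" | "wbs" | "schedule" | "quality";
type DebateProfile = "full" | "standard" | "light" | "fast";

// Required focuses per profile, copied from the "Debate profiles" table.
const PROFILE_FOCUSES: Record<DebateProfile, Focus[]> = {
  full: ["spec", "wbs", "schedule", "quality"],
  standard: ["spec", "wbs", "schedule", "quality"],
  light: ["spec", "quality"],
  fast: ["spec", "quality"],
};

// A submitted review round, reduced to two fields. The `focus` field name is assumed.
interface ReviewRound {
  focus: Focus;
  review_gate_ready: boolean;
}

// Focuses that the profile requires but that no submitted round has covered yet.
function missingFocuses(profile: DebateProfile, rounds: ReviewRound[]): Focus[] {
  const seen = rounds.map((r) => r.focus);
  return PROFILE_FOCUSES[profile].filter((f) => seen.indexOf(f) === -1);
}

// Coverage is complete when every required focus appears in some round
// and the last submitted round reports review_gate_ready: true.
function coverageComplete(profile: DebateProfile, rounds: ReviewRound[]): boolean {
  const last = rounds[rounds.length - 1];
  if (last === undefined) return false;
  return missingFocuses(profile, rounds).length === 0 && last.review_gate_ready;
}
```

Under this sketch, two rounds covering `spec` and `quality` would satisfy a `light` profile but leave `wbs` and `schedule` outstanding for `standard`, which matches the "required focuses (from eligibility)" wording in the added lines.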

package/.agents/skills/harness-decisions/SKILL.md CHANGED

@@ -72,4 +72,4 @@ Parent orchestrator calls **`approve_plan`** with the full `plan_packet` (scroll
 - **Parent orchestrator** during `/harness-plan` — `ask_user` for clarification; **`approve_plan`** then **`create_plan`** for the plan file.
 - `harness/planning/*` (scouts, decompose, hypothesis, hypothesis-eval) — JSON only; no `ask_user` / `approve_plan` / `create_plan`.
-- `harness/evaluator`, `harness/adversary`, and `harness/tie-breaker` — emit `human_required`; the **parent orchestrator** calls `ask_user`.
+- `harness/reviewing/evaluator`, `harness/reviewing/adversary`, and `harness/reviewing/tie-breaker` — emit `human_required`; the **parent orchestrator** calls `ask_user`.

package/.agents/skills/harness-eval/SKILL.md CHANGED

@@ -1,27 +1,12 @@
 ---
 name: harness-eval
-description: Run harness evaluation phase and emit EvalVerdict artifacts. Use with /harness-eval, evaluate phase, or before merge promotion.
+description: >-
+  Deprecated — use harness-review skill and /harness-review for the full post-run
+  gate. This file remains as a pointer for older prompts.
 ---
-# harness-eval
+# harness-eval (deprecated)
-## When to use
+Use **`harness-review`** skill and **`/harness-review`** instead.
-- `/harness-eval` after execute
-- Before merge / release readiness
-## Workflow (orchestrator)
-1. Parent may run deterministic scripts (`harness-verify`, project tests).
-2. Spawn `harness/evaluator` with `mode: benchmark` and artifact paths in `HarnessSpawnContext`.
-3. Parse JSON from `get_subagent_result`; parent writes run artifacts.
-## Rules
-- No new Pi session — subagent isolation via `Agent` spawn (ADR 0032).
-- Do not edit `plan-packet.json` in eval phase.
-- `/harness-review` uses same agent with `mode: verdict` for policy EvalVerdict.
-## Verdict values
-`pass`, `conditional_pass`, `fail`, `human_required` (parent handles `ask_user`).
+The master command runs benchmark + policy verdict (+ adversary unless `--quick`) with `submit_eval_verdict` / `submit_adversary_report` and parent `harness_artifact_ready` gates (ADR 0037, ADR 0039).

package/.agents/skills/harness-governor/SKILL.md CHANGED

@@ -15,8 +15,9 @@ description: Enforce harness governance phases, policy gates, budgets, and promo
 1. Read current phase from `/harness-policy-status` or session `harness-policy-state`.
 2. Check ADRs: constitution (0001), eval promotion (0003), Sentrux (0006), drift (0007), rules lifecycle (0009).
-3. For promotion: require eval pass, no abort lock, debate consensus if escalated, Sentrux when `HARNESS_SENTRUX_REQUIRED=true`.
-4. After architecture changes: edit `.pi/harness/sentrux/architecture.manifest.json`, then `node "$UP_PKG/.pi/scripts/sentrux-rules-sync.mjs" --force` (see `.pi/scripts/README.md` for `UP_PKG`) or `/harness-sentrux-sync`.
+3. For promotion: require eval pass, no abort lock, debate consensus if escalated, Sentrux when `HARNESS_SENTRUX_REQUIRED=true` (`artifacts/sentrux-signal.yaml` from `/harness-run`, not executor self-report).
+4. **Intent vs observation:** Manifest/layer/boundary changes → `/harness-sentrux-steward` proposal + chair approval + ADR when material, then `sentrux-rules-sync --force`. `sentrux check`/`gate` degradation after execute → replan or fix code — do not tune manifest on a single noisy gate.
+5. After approved manifest edits: `node "$UP_PKG/.pi/scripts/harness-sentrux-bootstrap.mjs" --force` or `/harness-sentrux-sync`; emit `harness-architecture-changed` for the extension.
 5. Run `node "$UP_PKG/.pi/scripts/harness-verify.mjs"` before claiming release readiness.
 ## Spec Distiller integration
@@ -31,7 +32,7 @@ When refining plans from noisy requirements:
 ## Budgets (ADR 0038)
 - Default: **`HARNESS_BUDGET_ENFORCE` off** — token/debate caps are telemetry-only (`harness-budget-telemetry`, `harness-budget-soft-limit`). They do **not** block phases or debate lanes.
-- Do **not** skip scouts, debate rounds, or `approve_plan` because of soft budget hints in the widget.
+- Do **not** skip reconnaissance artifacts (`planning-context.yaml`), debate rounds, or `approve_plan` because of soft budget hints in the widget.
 - Re-enable hard caps only with `HARNESS_BUDGET_ENFORCE=1` and `HARNESS_BUDGET_HARD_STOP` / `HARNESS_DEBATE_HARD_STOP`.
 ## Subagent artifacts (ADR 0037)

package/.agents/skills/harness-orchestration/SKILL.md CHANGED

@@ -3,94 +3,82 @@ name: harness-orchestration
 description: >-
   Orchestrate ultimate-pi harness phases with the native `subagent` tool
   (isolated `pi --mode json` subprocesses). Use for plan/execute/evaluate
-  pipelines, L4 verification, parallel scouts, and debate prep.
+  pipelines, L4 verification, optional planning-context, and debate prep.
 ---
 # Harness orchestration
+**Practice map:** `.pi/harness/docs/practice-map.md` · **ADR 0040** · **ADR 0041**.
+## Team management rules
+1. **Parallelism law** — Parallel `tasks` only when outputs are independent inputs to a later merge (implementation ∥ stack). Never parallelize debate lanes or decompose ∥ hypothesis.
+2. **Two-pizza cap per batch** — Max 2 research lanes, 1 optional `planning-context` subagent, 1 executor, 1 debate agent per `subagent` call.
+3. **No redundant thinkers** — Downstream agents read artifacts; do not re-derive.
+4. **Sequential dependency chain** — planning context → decompose → hypothesis → research → author → DAG → debate → approve → execute → **/harness-review** → optional **/harness-steer** loop (ADR 0044).
+5. **Path-first parent tools** — `approve_plan`, `create_plan`, `submit_*` via `source_path`, `merge_harness_yaml`, `harness_synthesize_repair_brief`.
+6. **Debate = meeting** — Parent is chair; parallel_probes allows evaluator ∥ adversary per batch.
+7. **Tool intelligence** — Parent uses graphify, sg, ccc, and reads by task need; subprocesses optional.
 ## Slash commands = orchestrators
 `/harness-*` prompts parse args, call `subagent`, run `ask_user`, write policy-gated artifacts. Phase logic lives in `.pi/agents/harness/*.md` and `.pi/agents/harness/planning/*.md`.
 Every spawn includes **HarnessSpawnContext** JSON in the task text (subprocess agents do not get `[HarnessActivePlan]` injection). Use `agentScope: "both"` so package agents under `$UP_PKG/.pi/agents/**` resolve.
-Harness subprocesses load **`harness-subagent-submit`** (`PI_HARNESS_SUBPROCESS=1`, `HARNESS_RUN_ID`, `HARNESS_RUN_DIR`). Agents must call their scoped **`submit_*`** tool before exit; parent gates use **`harness_artifact_ready`** and debate reads submit from `tool_result` (set `HARNESS_SUBMIT_TOOLS=0` only to fall back to `finalOutput` parsing).
-## Subprocess telemetry
-Harness bridge emits `harness_subagent_spawned` / `harness_subagent_completed` (replaces in-process setup/blackboard events).
-```sql
-SELECT
-  properties.agent as agent,
-  count() as n,
-  round(avg(toFloat(properties.duration_ms)), 0) as avg_ms
-FROM events
-WHERE event = 'harness_subagent_completed'
-  AND timestamp >= now() - INTERVAL 7 DAY
-GROUP BY agent
-ORDER BY avg_ms DESC
-LIMIT 30
-```
+Harness subprocesses load **`harness-subagent-submit`** (`PI_HARNESS_SUBPROCESS=1`, `HARNESS_RUN_ID`, `HARNESS_RUN_DIR`). Agents must call their scoped **`submit_*`** tool before exit; parent gates use **`harness_artifact_ready`**.
 ## Latency rules
-1. **Parallel `tasks`** — one `subagent({ tasks: [...] })` for scouts, decompose+hypothesis, or review fan-in; subprocesses run in parallel upstream.
-2. **Blocking calls** — each `subagent` returns when the subprocess exits; no `get_subagent_result` polling.
-3. **Compact handoffs** — read artifacts written by submit tools (or `harness_artifact_ready`); never paste full subprocess message logs into the next spawn.
-4. **No spawn cap** — harness subagent spawns are unlimited per session (active count is telemetry only). Do **not** pass `timeoutMs` unless the user wants a cap — subprocesses wait for natural exit (`PI_SUBAGENT_TIMEOUT_MS` optional env backstop only).
+1. **Parallel `tasks`** — Phase 3.5 research only (when using subprocesses).
+2. **Sequential** — decompose, hypothesis, debate lanes, review evaluator passes.
+3. **Compact handoffs** — read artifact paths; never paste full subprocess logs into next spawn.
+4. **No spawn cap** — do not pass `timeoutMs` unless the user requests a cap.
 ## Command → agent
 | Command | `agent` |
 |---------|---------|
-| `/harness-plan` | Parent: scouts → `decompose`+`hypothesis` → Phase 3.5 `implementation-researcher`+`stack-researcher` → PlanPacket → eligibility + Review Gate → `approve_plan` + `create_plan` |
-| `/harness-run` | `harness/executor` |
-| `/harness-eval` | `harness/evaluator` (`mode: benchmark`) |
-| `/harness-review` | `harness/evaluator` (`mode: verdict`) |
-| `/harness-critic` | `harness/adversary` (post-run) |
-| `/harness-trace` | `harness/trace-librarian` |
-| `/harness-incident` | `harness/incident-recorder` |
-| `/harness-router-tune` | `harness/meta-optimizer` (optional) |
-| `/harness-auto` | plan per `/harness-plan`; `--quick` skips adversary + tie-breaker |
+| `/harness-plan` | Parent: planning context (tools) → decompose → hypothesis → Phase 3.5 artifacts → PlanPacket → eligibility + Review Gate → `approve_plan` + `create_plan` |
+| `/harness-run` | `harness/running/executor` (single worker) |
+| `/harness-review` | Parent verify → `evaluator` benchmark → `evaluator` verdict → `adversary` → optional `tie-breaker` (ADR 0039) |
+| `/harness-eval` | **Deprecated** → `/harness-review` |
+| `/harness-critic` | **Deprecated** → `/harness-review` |
+| `/harness-auto` | plan per `/harness-plan`; `--quick` skips adversary + tie-breaker in review |
 ## Review isolation
-Spawn `harness/evaluator` / `harness/adversary` via `subagent` in the **same** parent session. `review-integrity` allows `subagent` when `agent` is in the review set; blocks executor from spawning review agents during evaluate.
+Spawn `harness/reviewing/evaluator` / `harness/reviewing/adversary` via `subagent` in the **same** parent session. `review-integrity` allows `subagent` when `agent` is in the review set.
 ## ask_user policy
 | Role | `ask_user` |
 |------|------------|
 | Parent orchestrator | Yes (plan clarification, `approve_plan`, router tune) |
-| `harness/planning/*` | No — JSON only (`human_required` in output if stuck) |
-| `harness/evaluator`, `harness/adversary`, `harness/tie-breaker` | `human_required` in subprocess JSON |
-| `harness/executor` | No — parent handles governance |
+| `harness/planning/*` | No — `human_required` in output if stuck |
+| `harness/reviewing/evaluator`, `harness/reviewing/adversary`, `harness/reviewing/tie-breaker` | `human_required` in subprocess JSON |
+| `harness/running/executor` | No — parent handles governance |
 ## Spawn pattern (`/harness-plan`)
-```json
-{
-  "agentScope": "both",
-  "tasks": [
-    { "agent": "harness/planning/scout-graphify", "task": "…" },
-    { "agent": "harness/planning/scout-structure", "task": "…" },
-    { "agent": "harness/planning/scout-semantic", "task": "…" }
-  ]
-}
-```
+**Phase 1 — planning context (parent default):**
+- Use `graphify query`, `sg -p`, `ccc search`, and reads as needed.
+- Write `artifacts/planning-context.yaml` via `write_harness_yaml`.
+- Optional: single `planning-context` subagent when isolation helps.
-Then parallel decompose + hypothesis, Phase 3.5 implementation + stack research, parent PlanPacket + `ask_user` (after 3.5), execution-plan-author, DAG gate, `harness_plan_debate_eligibility` + debate rounds, then `approve_plan` + `create_plan`.
+**Phase 2 — sequential:**
-Scouts use **Haiku**, `thinking: low`, **8** max turns (see agent frontmatter). Effective `--tools` omits `grep`/`find`/`subagent` per `disallowed_tools`.
+```
+subagent decompose → gate decomposition.yaml
+subagent hypothesis → gate hypothesis.yaml
+```
-## Tools
+**Phase 3.5 — research artifacts required:** parent inline and/or parallel `implementation-researcher` + `stack-researcher` (≤2).
-- `subagent` — harness subprocess spawns (modes: `single`, `tasks`, `chain`, `aggregator`)
-- `approve_plan`, `create_plan` — parent orchestrator only
-- Subprocess agents cannot nest `subagent` (`subagent` stripped from child `--tools`)
+Then execution-plan-author, DAG gate, debate eligibility, sequential debate rounds, `approve_plan` + `create_plan`.
 ## References
-- ADR 0032, ADR 0033, `.pi/harness/specs/harness-spawn-context.schema.json`
+- ADR 0032, ADR 0033, ADR 0040, ADR 0041, `.pi/harness/specs/harness-spawn-context.schema.json`
 - `node "$UP_PKG/.pi/scripts/harness-agents-manifest.mjs" --check`
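
Reading aid for the hunk above: the "Sequential dependency chain" rule can be sketched as an ordered list with a simple precondition check. The phase identifiers and the helpers `canStart` and `nextPhase` are hypothetical; the package may name and enforce these phases differently. The optional `/harness-steer` loop is left out because the rule marks it as optional.

```typescript
// Hypothetical phase identifiers for the chain in rule 4 of "Team management rules".
const CHAIN = [
  "planning-context",
  "decompose",
  "hypothesis",
  "research",
  "author",
  "dag",
  "debate",
  "approve",
  "execute",
  "review",
] as const;
type Phase = (typeof CHAIN)[number];

// A phase may start only when every earlier phase in the chain has completed.
function canStart(phase: Phase, completed: Phase[]): boolean {
  const idx = CHAIN.indexOf(phase);
  return CHAIN.slice(0, idx).every((p) => completed.indexOf(p) !== -1);
}

// First phase in chain order that has not completed, or undefined when all are done.
function nextPhase(completed: Phase[]): Phase | undefined {
  for (const p of CHAIN) {
    if (completed.indexOf(p) === -1) return p;
  }
  return undefined;
}
```

A strict linear check like this is deliberately simpler than the skill text, which still allows the two Phase 3.5 research lanes to run in parallel inside the single `research` step.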

package/.agents/skills/harness-plan/SKILL.md CHANGED

@@ -1,33 +1,44 @@
 ---
 name: harness-plan
-description: PM-grade harness plans — scouts, Phase 3.5 implementation research, ExecutionPlan, DAG validation, selective Review Gate debate, then approve/create_plan.
+description: Agent-native harness plans — lakes/context bundles, planning context, parallel_probes debate profile, plan-synthesizer on low/med risk, path-first approve_plan/create_plan, then DAG + debate.
 ---
 # harness-plan
+**Practice map:** `.pi/harness/docs/practice-map.md` · **ADR 0040** · **ADR 0042** · **ADR 0043**.
 ## When to use
 - `/harness-plan`, harness-auto plan phase, drift replan, policy-gate without approved plan
+## Team topology (spawn laws)
+1. **Parallelism law** — Parallel `tasks` only for independent lanes (implementation ∥ stack ≤2). Never parallelize debate or decompose ∥ hypothesis.
+2. **Two-pizza cap** — Max 1 debate agent, 1 optional planning-context subagent, per `subagent` call.
+3. **No redundant thinkers** — Read upstream YAML; do not re-run graphify in decompose when `planning-context` architecture coverage is ok.
+4. **Sequential chain** — planning context → decompose → hypothesis → research → author → DAG → debate → approve.
+5. **Tool intelligence** — Parent picks graphify, sg, ccc by task; no mandatory tool-tied scout subprocesses.
 ## Workflow (parent orchestrator)
-1. Parallel scouts (graphify + structure; semantic unless `--quick`) — each scout ends with **`submit_scout_findings`** (not JSON in final message).
-2. Parallel decompose + hypothesis — **`submit_decomposition`** / **`submit_hypothesis`**.
-3. **Phase 3.5 (required):** parallel `implementation-researcher` + `stack-researcher` — **`submit_implementation_research`** / **`submit_stack`**; parent merges into `research-brief.yaml` via `write_harness_yaml`.
-4. Draft `PlanPacket` shell; `ask_user` on material fork **after** Phase 3.5.
-5. `execution-plan-author` → merge `execution_plan`.
-6. **`validate-plan-dag.mjs`** (must pass).
-7. **`harness_plan_debate_eligibility`** → **`harness_debate_open`** with profile → Review Gate (debate agents use lane **`submit_*`** tools; parent reads submit from `tool_result`, not `finalOutput` JSON).
-8. **`harness_artifact_ready`** on required paths → apply patches, re-validate DAG, `approve_plan`, `create_plan`.
+1. **Phase 1:** Compile `artifacts/planning-context.yaml` with tools (default) or optional `planning-context` subagent.
+2. **Sequential** decompose → gate `artifacts/decomposition.yaml`.
+3. **Sequential** hypothesis (requires decomposition).
+4. **Phase 3.5:** `implementation-research.yaml` + `stack.yaml` (parent inline and/or parallel researchers).
+5. Draft `PlanPacket` shell; `ask_user` on material fork **after** Phase 3.5.
+6. `execution-plan-author` → merge `execution_plan`.
+7. **`validate-plan-dag.mjs`** (must pass).
+8. **`harness_plan_debate_eligibility`** — `parallel_probes` spawns plan-evaluator ∥ plan-adversary, then integrator round.
+9. **`approve_plan({ human_summary? })`** / **`create_plan()`** — packet from `plan_packet_path` on disk (path-first).
-`--quick` skips semantic scout and post-run adversary only — **not** implementation research or plan debate.
+`--quick` skips semantic coverage in planning context and post-run adversary only — **not** adequate reconnaissance, implementation/stack artifacts (med/high risk), or plan debate.
 ## Rules
-- On-disk plan artifacts are **YAML** (`plan-packet.yaml`, `research-brief.yaml`).
+- On-disk plan artifacts are **YAML** (`plan-packet.yaml`, `research-brief.yaml`, `planning-context.yaml`).
 - Subagents read-only; parent writes run artifacts and calls `approve_plan` / `create_plan`.
 - context-mode only on harness paths.
-- Phase 3.5 required unless documented waiver; high risk requires implementation artifact for approval.
+- Phase 3.5 artifacts required for med/high risk unless documented waiver.
 ## Output

package/.agents/skills/harness-review/SKILL.md ADDED

@@ -0,0 +1,52 @@
+---
+name: harness-review
+description: >-
+  Post-run verification gate (/harness-review): harness-verify, Sentrux fitness
+  functions, benchmark + verdict evaluator, adversary, optional tie-breaker.
+  Subagents use submit_*; parent uses harness_artifact_ready. Use after
+  /harness-run; claim cross-session runs with /harness-use-run --claim.
+---
+# harness-review
+**Practice map:** `.pi/harness/docs/practice-map.md` (Monitoring and Controlling: measure → judge → red team).
+## When to use
+- After `/harness-run` completes (same session preferred)
+- Resuming with `/harness-use-run <run-id> --claim` then `/harness-review`
+- Instead of separate `/harness-eval`, `/harness-critic` (aliases forward here)
+## Orchestration summary
+| Phase | Practice | Actor | Artifact |
+|-------|----------|-------|----------|
+| 1 | Automated QC + Sentrux fitness functions | Parent | `harness-verify.mjs`, `harness-sentrux-cli.mjs gate`, `benchmark-log.yaml`, `sentrux-signal.yaml` |
+| 2 | Measure actuals (EVM) | `harness/reviewing/evaluator` benchmark | `eval-verdict.yaml` |
+| 2b | Controlling | Parent | Write `review-outcome.yaml`; route via `remediation_class` (not fail-fast abort) |
+| 6 | Outcome | Parent | `review-outcome.yaml` → `/harness-steer` or replan |
+| 3 | Policy audit | `harness/reviewing/evaluator` verdict | same YAML |
+| 4 | Red team | `harness/reviewing/adversary` | `adversary-report.yaml` |
+| 5 | Arbitration | `harness/reviewing/tie-breaker` | only if block + conditional_pass |
+## Phase 1 — Sentrux (structural actuals)
+When `HARNESS_SENTRUX_REQUIRED=true` (default in `.env.example`):
+1. `node "$UP_PKG/.pi/scripts/harness-verify.mjs"` — rules drift + Sentrux check when CLI installed.
+2. `node "$UP_PKG/.pi/scripts/harness-sentrux-cli.mjs" gate` — compare to baseline saved during `/harness-run`.
+3. Write `artifacts/sentrux-signal.yaml` and append session entry `harness-sentrux-signal` (observation bus / PostHog).
+4. Optional `artifacts/benchmark-log.yaml` fields: `sentrux_check`, `sentrux_gate`, `harness_verify`.
+Pass the `sentrux-signal.yaml` path in the evaluator's `mode: benchmark` spawn context. The evaluator treats the metrics as measured facts, not as goals for the executor.
+## Rules
+- Parent never writes eval/adversary YAML — subprocess `submit_*` only (ADR 0037).
+- Auto-claim run ownership unless `--readonly`.
+- Disk verdict drives `next_recommended_command` (`resolveCompletionStatuses`).
+## Aliases
+- `/harness-eval` → use `/harness-review`
+- `/harness-critic` → use `/harness-review` (or `--quick` to skip adversary)
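The routing described in the harness-review skill can be sketched as two small decisions. This is a minimal illustration, not the package's implementation: the function names, the `ReviewOutcome` shape, and the assumption that "block" comes from the adversary while "conditional_pass" comes from the evaluator are all hypothetical. Only `remediation_class`, `implementation_gap`, `/harness-steer`, and "replan" come from the skill text.

```typescript
// Hypothetical shape: only `remediation_class` is named in the skill text.
interface ReviewOutcome {
  remediation_class?: string;
}

// Phase 5 runs "only if block + conditional_pass".
// ASSUMPTION: the adversary blocks and the evaluator conditionally passes.
function needsTieBreaker(evaluatorVerdict: string, adversaryVerdict: string): boolean {
  return adversaryVerdict === "block" && evaluatorVerdict === "conditional_pass";
}

// Phase 6: route from review-outcome.yaml instead of aborting on first failure.
// ASSUMPTION: any remediation class other than `implementation_gap` means replan,
// and a missing class means there is nothing to remediate.
function nextStep(outcome: ReviewOutcome): string | null {
  if (outcome.remediation_class === undefined) {
    return null;
  }
  if (outcome.remediation_class === "implementation_gap") {
    return "/harness-steer";
  }
  return "replan";
}
```

Keeping the two decisions separate mirrors the table: arbitration depends on the pair of verdicts, while the final route depends only on what was written to `review-outcome.yaml`.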

package/.agents/skills/harness-sentrux-setup/SKILL.md CHANGED

@@ -11,6 +11,17 @@ description: Bootstrap Sentrux architectural rules for harness projects — seed
 - Target repo has no `.sentrux/rules.toml` or `harness-verify` reports rules out of date
 - User edited `.pi/harness/sentrux/architecture.manifest.json` (layers, boundaries, constraints)
+## Roles (do not conflate)
+| Role | Agent / command | Layer |
+|------|-----------------|-------|
+| **Bootstrap** | `harness/sentrux-bootstrap`, `harness-sentrux-bootstrap.mjs` | Greenfield seed + first sync |
+| **Steward** | `harness/sentrux-steward`, `/harness-sentrux-steward` | Proposes manifest changes (`artifacts/sentrux-manifest-proposal.yaml`); chair applies |
+| **Sync** | `sentrux-rules-sync.mjs`, `/harness-sentrux-sync` | Regenerates `rules.toml` from manifest after intent change |
+| **Observation** | `/harness-run`, `/harness-review` | `harness-sentrux-cli.mjs gate --save` / `check` / `gate` → `artifacts/sentrux-signal.yaml` |
+Never auto-sync manifest from directory trees. Material manifest edits need steward evidence + chair approval (ADR 0009).
 ## Canonical layout
 | Path | Role |
@@ -28,6 +39,7 @@ Custom TOML **outside** `# --- harness:managed:start/end ---` is preserved on ev
 | First-time / harness-setup (idempotent) | `node "$UP_PKG/.pi/scripts/harness-sentrux-bootstrap.mjs"` |
 | After manifest edits | `node "$UP_PKG/.pi/scripts/harness-sentrux-bootstrap.mjs" --force` |
 | CI / verify only | `node "$UP_PKG/.pi/scripts/sentrux-rules-sync.mjs" --check` |
+| Run/review Sentrux observation | `node "$UP_PKG/.pi/scripts/harness-sentrux-cli.mjs" check` / `gate [--save]` |
 | In pi session | `/harness-sentrux-sync` (extension; uses `--force`) |
 **Bootstrap vs `--force`:** Default bootstrap/sync skips rewriting `rules.toml` when the manifest hash is unchanged. Use `--force` (or `/harness-sentrux-sync`) after changing `architecture.manifest.json` or when verify reports drift.
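The skip-unless-changed behaviour described above can be sketched as a hash comparison. This is an illustration under assumptions: the source does not say which hash the scripts use or where the previous hash is stored, so `fnv1a`, `shouldRewriteRules`, and the `lastSyncedHash` parameter are hypothetical stand-ins.

```typescript
// Stand-in hash (FNV-1a, 32-bit). ASSUMPTION: the real scripts may use a
// different algorithm; only "manifest hash" is mentioned in the source.
function fnv1a(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash.toString(16);
}

// Rewrite rules.toml when forced, or when the manifest content no longer
// matches the hash recorded at the last sync (hypothetical storage).
function shouldRewriteRules(
  manifestText: string,
  lastSyncedHash: string | null,
  force: boolean
): boolean {
  if (force) {
    return true;
  }
  return fnv1a(manifestText) !== lastSyncedHash;
}
```

With this shape, a default run is a no-op when nothing changed, and `--force` bypasses the comparison entirely.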
@@ -40,7 +52,7 @@ Custom TOML **outside** `# --- harness:managed:start/end ---` is preserved on ev
    node "$UP_PKG/.pi/scripts/harness-sentrux-bootstrap.mjs"
    ```
 3. Optional: `sentrux plugin add-standard` (language plugins; harness-setup Step 2.8).
-4. `sentrux check .` — fix violations or tune manifest `max_cc` / layers.
+4. `node "$UP_PKG/.pi/scripts/harness-sentrux-cli.mjs" check` — fix violations or tune manifest `max_cc` / layers.
 5. Commit `.sentrux/rules.toml` and project-specific `architecture.manifest.json`.
 ## External repos
@@ -52,5 +64,6 @@ Do **not** copy ultimate-pi's layer paths blindly into unrelated layouts — edi
 ## References
 - ADR 0009 — `.pi/harness/docs/adrs/0009-sentrux-rules-lifecycle.md`
-- Scripts — `.pi/scripts/sentrux-rules-sync.mjs`, `harness-sentrux-bootstrap.mjs`
-- Agent — `harness/sentrux-bootstrap` (optional delegate for setup-only runs)
+- Scripts — `.pi/scripts/sentrux-rules-sync.mjs`, `harness-sentrux-bootstrap.mjs`, `harness-sentrux-cli.mjs`
+- Agents — `harness/sentrux-bootstrap` (setup), `harness/sentrux-steward` (intent proposals)
+- Specs — `sentrux-manifest-proposal.schema.json`, `sentrux-signal.schema.json`

package/.agents/skills/harness-steer/SKILL.md ADDED

@@ -0,0 +1,14 @@
+---
+name: harness-steer
+description: Post-review repair loop via harness-steer and executor repair mode (ADR 0044).
+---
+# harness-steer
+Use after `/harness-review` when `artifacts/review-outcome.yaml` has `remediation_class: implementation_gap`.
+1. Read `repair-brief.yaml` and `plan_packet_path` (paths only).
+2. Set policy phase `execute`; spawn `harness/running/executor` with `mode: repair`.
+3. Always follow with `/harness-review`.
+See `.pi/prompts/harness-steer.md` and `.pi/harness/docs/adrs/0044-harness-steer-loop.md`.

package/.agents/skills/sentrux/SKILL.md CHANGED

@@ -35,22 +35,22 @@ sentrux plugin add-standard
 ## Core workflows (project root)
-Run from the **target repo root** (where `.sentrux/rules.toml` lives).
+Run from the **target repo root** (where `.sentrux/rules.toml` lives), or prefer the bundled wrapper when invoked by harness commands from run directories.
 | When | Command | Notes |
 |------|---------|-------|
-| CI / pre-commit | `sentrux check .` | Exit 0 = pass, 1 = violations |
-| Before agent work | `sentrux gate --save .` | Save session baseline |
-| After agent work | `sentrux gate .` | Detect degradation vs baseline |
+| CI / pre-commit | `node "$UP_PKG/.pi/scripts/harness-sentrux-cli.mjs" check` | Exit 0 = pass, 1 = violations |
+| Before agent work | `node "$UP_PKG/.pi/scripts/harness-sentrux-cli.mjs" gate --save` | Save session baseline |
+| After agent work | `node "$UP_PKG/.pi/scripts/harness-sentrux-cli.mjs" gate` | Detect degradation vs baseline |
 | Explore structure | `sentrux` or `sentrux .` | GUI treemap (optional) |
 Typical agent loop:
 ```bash
-sentrux gate --save .
+node "$UP_PKG/.pi/scripts/harness-sentrux-cli.mjs" gate --save
 # … agent edits …
-sentrux check .          # rules still pass?
-sentrux gate .           # structural regression?
+node "$UP_PKG/.pi/scripts/harness-sentrux-cli.mjs" check  # rules still pass?
+node "$UP_PKG/.pi/scripts/harness-sentrux-cli.mjs" gate   # structural regression?
 ```
 If `check` fails, fix violations or tune manifest constraints (see **Rules** below). If `gate` reports degradation, inspect changed modules before merging.
@@ -73,7 +73,7 @@ Custom TOML outside `# --- harness:managed:start/end ---` is preserved on sync.
 |-------|------|
 | `sentrux-rules-sync` extension | Session start: warns if `rules.toml` drifts; auto-sync after plan/merge phases |
 | `/harness-sentrux-sync` | Force-regenerate rules from manifest (pi command) |
-| `harness-verify.mjs` | Runs `sentrux check .` when rules present |
+| `harness-verify.mjs` | Runs rules sync and Sentrux checks when rules are present |
 | **observation-bus** | Maps `harness-sentrux-signal` custom entries → evaluator observations |
 | **harness-eval** | Evaluate phase may require a Sentrux quality signal (stub or future MCP) per ADR 0006 |
@@ -90,7 +90,7 @@ High level: **execute** uses CLI gate/check around edits; **evaluate** consumes
 - Assume Sentrux **MCP** tools (`scan`, `session_start`, `health`, etc.) exist in **Pi** — they do not; use CLI only
 - Edit or rely on `.pi/mcp.json` for Pi sessions
 - Duplicate bootstrap/sync steps from **harness-sentrux-setup**
-- Skip `sentrux check .` after large refactors when `.sentrux/rules.toml` exists
+- Skip the root-resolving Sentrux check after large refactors when `.sentrux/rules.toml` exists
 ## References

package/.pi/agents/harness/planning/decompose.md CHANGED

@@ -7,7 +7,9 @@ thinking: medium
 max_turns: 12
 ---
-You are the **Harness planning decomposer (Phase 1)**.
+You are the **Harness problem-framing agent (Phase 2a — lakes / scope)**.
+**Inspection role:** Outcome author (lake-sized units, not ticket WBS). See `.pi/harness/docs/practice-map.md` and ADR 0042.
 ## Mission
@@ -19,9 +21,10 @@ Read `HarnessSpawnContext` and the merged **scout lane JSON** in the spawn promp
 ## Process
-1. Synthesize scout findings into constraints, prior art, and tensions — cite `key_paths` when available.
-2. If scouts are thin, run read-only `graphify query` / `sg -p` for evidence (no `graphify update`, installs, or redirects).
-3. Do not read `.pi/harness/specs/*.schema.json` from disk.
+1. Read Phase 1 reconnaissance from spawn context paths — prefer `artifacts/planning-context.yaml`; legacy `artifacts/scout-*.yaml` lanes are accepted when present.
+2. Synthesize findings into constraints, prior art, and tensions — cite `key_paths` / `evidence_refs` when available.
+3. **Graphify dedup:** If `planning-context.yaml` has `coverage.architecture.status` of `ok`, do **not** run `graphify query` / `graphify explain` / `graphify path`. If architecture coverage is missing or failed, you may run read-only `graphify query` / `sg -p` (no `graphify update`, installs, or redirects).
+4. Do not read `.pi/harness/specs/*.schema.json` from disk.
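The Graphify dedup rule in step 3 reduces to one predicate over `planning-context.yaml`. A minimal sketch, assuming the YAML has already been parsed into an object; the `PlanningContext` interface and the function name are hypothetical, and only the `coverage.architecture.status` path and the `ok` value come from the text.

```typescript
// Hypothetical parsed view of planning-context.yaml; only this path is
// named in the agent prompt.
interface PlanningContext {
  coverage?: {
    architecture?: {
      status?: string;
    };
  };
}

// Read-only graphify queries are allowed only when architecture coverage is
// missing or not "ok" (the prompt says "missing or failed").
function mayRunGraphifyQuery(ctx: PlanningContext | null): boolean {
  const status = ctx?.coverage?.architecture?.status;
  return status !== "ok";
}
```

Treating a missing file, a missing key, and a failed status the same way keeps the fallback conservative: the agent only skips its own queries when coverage is explicitly confirmed.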
 ## Phase 1 — DeepMind-style decomposition

package/.pi/agents/harness/planning/hypothesis-validator.md CHANGED

@@ -7,6 +7,8 @@ thinking: medium
 max_turns: 10
 ---
+**Inspection role:** Blind verifier (independent verification; debate R1 only). See `.pi/harness/docs/practice-map.md`.
 ## Your task
 Blindly evaluate whether `PlanHypothesisBrief` is falsifiable, relevant to the task, and worth building — without seeing decomposition, scouts, or PlanPacket.

package/.pi/agents/harness/planning/hypothesis.md CHANGED

@@ -7,7 +7,9 @@ thinking: medium
 max_turns: 14
 ---
-You are the **Harness planning hypothesis generator (Phase 2 — DARWIN)**.
+You are the **Harness planning hypothesis generator (Phase 2b — DARWIN)**.
+**Role:** Approach author after WBS (Lean hypothesis-driven planning). Requires `artifacts/decomposition.yaml`. See `.pi/harness/docs/practice-map.md`.
 ## Mission

package/.pi/agents/harness/planning/plan-adversary.md CHANGED

@@ -7,6 +7,8 @@ thinking: medium
 max_turns: 14
 ---
+**Inspection role:** Red team (adversarial review). See `.pi/harness/docs/practice-map.md`.
 ## Your task
 Stress-test the ExecutionPlan with reproducible counterexamples. Map every finding to evaluator `claim_id`s from the messenger thread or validation-turn YAML.

package/.pi/agents/harness/planning/plan-evaluator.md CHANGED

@@ -7,6 +7,8 @@ thinking: medium
 max_turns: 14
 ---
+**Inspection role:** Inspector (neutral Fagan-style checklist). See `.pi/harness/docs/practice-map.md`.
 ## Your task
 Score the ExecutionPlan against Validation Checks for one Review Gate round. Emit stable `checks[]` with ids and messenger-ready `claim_ids`. You are not an advocate for the plan.

package/.pi/agents/harness/planning/plan-synthesizer.md ADDED

@@ -0,0 +1,25 @@
+---
+name: harness/planning/plan-synthesizer
+description: Lake-first plan synthesis for low/med risk — problem framing, hypothesis, and execution_plan draft in one pass.
+---
+# Plan synthesizer
+You produce **lake-sized** outcomes (ADR 0042), not ticket-granularity WBS. Read `artifacts/planning-context.yaml`, research briefs, and prior artifacts from disk paths in `HarnessSpawnContext` — do not re-run graphify when coverage is already ok.
+## Outputs (all required on disk)
+1. **`submit_decomposition_brief`** → `artifacts/decomposition.yaml` — `core_tension`, `lakes[]` (outcome, scope boundary, verification intent), not a deep task tree.
+2. **`submit_hypothesis_brief`** → `artifacts/hypothesis.yaml` — falsifiable claim grounded in decomposition.
+3. **`submit_execution_plan_brief`** → `artifacts/execution-plan-draft.yaml` — lake-first `execution_plan` with `work_items` (each with `lake_id`, rich `description`, optional `context_bundle_path`), `executor_strategy` (`single_pass` for low, `per_lake` for med unless user dictates otherwise).
+## Rules
+- Use **`submit_*({ source_path })`** when drafts exist on disk (ADR 0043); otherwise `document`.
+- Do not spawn subprocesses; you are the subprocess.
+- Match schemas under `.pi/harness/specs/`.
+- Parent runs `validate-plan-dag.mjs` after merge into `plan-packet.yaml`.
+## High risk
+If `--risk high` or material fork, stop and tell parent to use sequential `decompose` → `hypothesis` → `execution-plan-author` instead.
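The risk-based routing in the plan-synthesizer prompt can be sketched as a single function. This is an illustration only: `PlanRoute`, `routePlan`, and the `userStrategy` and `materialFork` parameters are hypothetical names. The strategy values and the sequential agent order are taken from the prompt text.

```typescript
type Risk = "low" | "med" | "high";
type ExecutorStrategy = "single_pass" | "per_lake";

// Hypothetical result type: either synthesize in one pass, or hand back to
// the parent for the sequential pipeline.
type PlanRoute =
  | { kind: "synthesize"; executor_strategy: ExecutorStrategy }
  | { kind: "sequential"; steps: string[] };

function routePlan(
  risk: Risk,
  materialFork: boolean = false,
  userStrategy?: ExecutorStrategy
): PlanRoute {
  // High risk or a material fork: stop and defer to the sequential agents.
  if (risk === "high" || materialFork) {
    return {
      kind: "sequential",
      steps: ["decompose", "hypothesis", "execution-plan-author"],
    };
  }
  // Defaults from the prompt: single_pass for low, per_lake for med,
  // unless the user dictates otherwise.
  const fallback: ExecutorStrategy = risk === "low" ? "single_pass" : "per_lake";
  return { kind: "synthesize", executor_strategy: userStrategy ?? fallback };
}
```

Checking the high-risk case first means a user-supplied strategy never overrides the stop condition.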