ultimate-pi 0.16.0 → 0.18.0
- package/.agents/skills/harness-context/SKILL.md +13 -6
- package/.agents/skills/harness-debate-plan/SKILL.md +37 -20
- package/.agents/skills/harness-eval/SKILL.md +6 -21
- package/.agents/skills/harness-governor/SKILL.md +4 -3
- package/.agents/skills/harness-orchestration/SKILL.md +39 -51
- package/.agents/skills/harness-plan/SKILL.md +23 -12
- package/.agents/skills/harness-review/SKILL.md +52 -0
- package/.agents/skills/harness-sentrux-setup/SKILL.md +13 -1
- package/.agents/skills/harness-steer/SKILL.md +14 -0
- package/.pi/agents/harness/adversary.md +3 -10
- package/.pi/agents/harness/evaluator.md +3 -12
- package/.pi/agents/harness/executor.md +12 -14
- package/.pi/agents/harness/planning/decompose.md +7 -4
- package/.pi/agents/harness/planning/hypothesis-validator.md +2 -0
- package/.pi/agents/harness/planning/hypothesis.md +4 -2
- package/.pi/agents/harness/planning/implementation-researcher.md +1 -1
- package/.pi/agents/harness/planning/plan-adversary.md +2 -0
- package/.pi/agents/harness/planning/plan-evaluator.md +2 -0
- package/.pi/agents/harness/planning/plan-synthesizer.md +25 -0
- package/.pi/agents/harness/planning/planning-context.md +48 -0
- package/.pi/agents/harness/planning/review-integrator.md +2 -0
- package/.pi/agents/harness/planning/scout-graphify.md +3 -1
- package/.pi/agents/harness/planning/scout-semantic.md +3 -1
- package/.pi/agents/harness/planning/scout-structure.md +3 -1
- package/.pi/agents/harness/planning/sprint-contract-auditor.md +2 -0
- package/.pi/agents/harness/sentrux-steward.md +51 -0
- package/.pi/extensions/00-posthog-network-bootstrap.ts +11 -0
- package/.pi/extensions/harness-debate-tools.ts +12 -3
- package/.pi/extensions/harness-live-widget.ts +27 -1
- package/.pi/extensions/harness-plan-approval.ts +62 -56
- package/.pi/extensions/harness-run-context.ts +553 -84
- package/.pi/extensions/harness-subagent-submit.ts +43 -33
- package/.pi/extensions/harness-telemetry.ts +29 -4
- package/.pi/extensions/lib/debate-bus-core.ts +15 -9
- package/.pi/extensions/lib/harness-artifact-gate.ts +182 -0
- package/.pi/extensions/lib/harness-posthog.ts +9 -5
- package/.pi/extensions/lib/harness-spawn-topology.ts +188 -0
- package/.pi/extensions/lib/harness-subagent-auth.ts +105 -19
- package/.pi/extensions/lib/harness-subagent-policy.ts +37 -19
- package/.pi/extensions/lib/harness-subagent-precheck.ts +35 -9
- package/.pi/extensions/lib/harness-subagent-submit-pipeline.ts +66 -2
- package/.pi/extensions/lib/harness-subagent-submit-registry.ts +21 -3
- package/.pi/extensions/lib/harness-subagents-bridge.ts +91 -28
- package/.pi/extensions/lib/harness-subprocess-bootstrap.ts +73 -0
- package/.pi/extensions/lib/plan-approval/create-plan.ts +2 -3
- package/.pi/extensions/lib/plan-approval/resolve-disk.ts +102 -0
- package/.pi/extensions/lib/plan-approval/schema.ts +22 -8
- package/.pi/extensions/lib/plan-approval/types.ts +1 -1
- package/.pi/extensions/lib/plan-approval/validate.ts +2 -2
- package/.pi/extensions/lib/plan-approval-readiness.ts +241 -0
- package/.pi/extensions/lib/plan-debate-eligibility.ts +67 -7
- package/.pi/extensions/lib/plan-debate-focus.ts +21 -9
- package/.pi/extensions/lib/plan-debate-gate.ts +101 -17
- package/.pi/extensions/lib/plan-debate-lanes.ts +57 -3
- package/.pi/extensions/lib/plan-debate-round-status.ts +18 -7
- package/.pi/extensions/lib/plan-messenger.ts +4 -0
- package/.pi/extensions/lib/plan-review-gate.ts +59 -0
- package/.pi/extensions/lib/posthog-client.ts +76 -0
- package/.pi/extensions/policy-gate.ts +24 -19
- package/.pi/extensions/trace-recorder.ts +1 -0
- package/.pi/harness/agents.manifest.json +24 -16
- package/.pi/harness/corpus/cron.example +8 -0
- package/.pi/harness/corpus/graphify-kb-updater.config.json +159 -0
- package/.pi/harness/corpus/systemd/graphify-kb-updater.env.template +4 -0
- package/.pi/harness/corpus/systemd/graphify-kb-updater.service +17 -0
- package/.pi/harness/corpus/systemd/graphify-kb-updater.timer +11 -0
- package/.pi/harness/docs/adrs/0001-harness-constitution.md +2 -1
- package/.pi/harness/docs/adrs/0006-sentrux-dual-layer.md +7 -6
- package/.pi/harness/docs/adrs/0009-sentrux-rules-lifecycle.md +6 -1
- package/.pi/harness/docs/adrs/0031-harness-run-context.md +1 -1
- package/.pi/harness/docs/adrs/0032-harness-command-orchestration.md +7 -0
- package/.pi/harness/docs/adrs/0034-darwin-plan-research-pipeline.md +3 -3
- package/.pi/harness/docs/adrs/0036-implementation-research-and-selective-debate.md +8 -5
- package/.pi/harness/docs/adrs/0039-harness-post-run-review-gate.md +47 -0
- package/.pi/harness/docs/adrs/0040-practice-grounded-orchestration.md +40 -0
- package/.pi/harness/docs/adrs/0041-intelligent-planning-reconnaissance.md +39 -0
- package/.pi/harness/docs/adrs/0042-agent-native-orchestration.md +35 -0
- package/.pi/harness/docs/adrs/0043-path-first-harness-tools.md +38 -0
- package/.pi/harness/docs/adrs/0044-harness-steer-loop.md +36 -0
- package/.pi/harness/docs/adrs/README.md +10 -0
- package/.pi/harness/docs/graphify-kb-updater-runbook.md +157 -0
- package/.pi/harness/docs/practice-map.md +110 -0
- package/.pi/harness/env.harness.template +5 -3
- package/.pi/harness/evals/smoke/fixtures/plan-phase/minimal-med-fast/artifacts/implementation-research.yaml +28 -0
- package/.pi/harness/evals/smoke/fixtures/plan-phase/minimal-med-fast/artifacts/review-round-consolidated.yaml +25 -0
- package/.pi/harness/evals/smoke/fixtures/plan-phase/minimal-med-fast/plan-packet.yaml +196 -0
- package/.pi/harness/evals/smoke/fixtures/plan-phase/minimal-med-fast/plan-review.md +14 -0
- package/.pi/harness/evals/smoke/fixtures/plan-phase/minimal-med-fast/research-brief.yaml +62 -0
- package/.pi/harness/evals/smoke/sentrux-stub.json +1 -1
- package/.pi/harness/evals/smoke/smoke-harness-plan.mjs +43 -17
- package/.pi/harness/specs/README.md +1 -1
- package/.pi/harness/specs/harness-run-context.schema.json +11 -0
- package/.pi/harness/specs/harness-spawn-context.schema.json +14 -0
- package/.pi/harness/specs/plan-execution-plan.schema.json +39 -1
- package/.pi/harness/specs/plan-packet.schema.json +4 -0
- package/.pi/harness/specs/plan-phase-status.schema.json +17 -0
- package/.pi/harness/specs/plan-phase-waiver.schema.json +25 -0
- package/.pi/harness/specs/plan-planning-context.schema.json +50 -0
- package/.pi/harness/specs/plan-review-round-draft.schema.json +1 -1
- package/.pi/harness/specs/repair-brief.schema.json +45 -0
- package/.pi/harness/specs/review-outcome.schema.json +46 -0
- package/.pi/harness/specs/sentrux-manifest-proposal.schema.json +80 -0
- package/.pi/harness/specs/sentrux-signal.schema.json +43 -0
- package/.pi/harness/specs/steer-state.schema.json +20 -0
- package/.pi/lib/harness-context-mode-policy.ts +256 -0
- package/.pi/lib/harness-repair-brief.ts +145 -0
- package/.pi/lib/harness-run-context.ts +591 -32
- package/.pi/lib/harness-ui-state.ts +87 -9
- package/.pi/model-router.example.json +13 -4
- package/.pi/prompts/harness-auto.md +9 -9
- package/.pi/prompts/harness-critic.md +3 -30
- package/.pi/prompts/harness-eval.md +4 -37
- package/.pi/prompts/harness-plan.md +139 -57
- package/.pi/prompts/harness-review.md +150 -15
- package/.pi/prompts/harness-run.md +62 -10
- package/.pi/prompts/harness-sentrux-steward.md +55 -0
- package/.pi/prompts/harness-setup.md +4 -4
- package/.pi/prompts/harness-steer.md +30 -0
- package/.pi/scripts/graphify-kb-updater.mjs +358 -0
- package/.pi/scripts/harness-generate-model-router.mjs +118 -36
- package/.pi/scripts/harness-model-router-routing.test.mjs +97 -0
- package/.pi/scripts/harness-sync-model-router.mjs +15 -2
- package/.pi/scripts/harness-verify.mjs +51 -6
- package/.pi/scripts/harness-web-policy-guard.mjs +68 -0
- package/.pi/scripts/validate-plan-dag.mjs +3 -3
- package/AGENTS.md +1 -0
- package/CHANGELOG.md +22 -0
- package/package.json +5 -4
- package/vendor/pi-model-router/UPSTREAM_PIN.md +3 -1
- package/vendor/pi-model-router/extensions/commands.ts +4 -4
- package/vendor/pi-model-router/extensions/index.ts +21 -0
- package/vendor/pi-model-router/extensions/provider.ts +130 -79
- package/vendor/pi-model-router/extensions/routing.ts +148 -0
- package/vendor/pi-model-router/extensions/state.ts +3 -0
- package/vendor/pi-model-router/extensions/types.ts +9 -0
- package/vendor/pi-model-router/extensions/ui.ts +16 -2
- package/.pi/prompts/git-sync.md +0 -124
@@ -1,37 +1,172 @@
 ---
-description:
-argument-hint: "[--run <run-id>] [--trace <trace-ref>]"
+description: Post-run verification gate — deterministic checks, benchmark eval, policy verdict, adversary review (master orchestrator).
+argument-hint: "[--run <run-id>] [--quick] [--readonly] [--trace <trace-ref>]"
 ---
 
 # harness-review
 
-
+You are the **post-run verification PM** (PMBOK Monitoring and Controlling). Run measure → judge → red team in one command. Parent owns `ask_user`, deterministic scripts, `harness_artifact_ready`, and run ownership (`--claim` on resume). Subagents persist via **`submit_*`** only (no parent `write` to verdict artifacts).
 
-
+**Practice map:** `.pi/harness/docs/practice-map.md`
 
--
+Read **harness-orchestration** and **harness-review** skills before spawning.
+
+## Allowed subagents
+
+- `harness/evaluator` (`mode: benchmark` then `mode: verdict`)
+- `harness/adversary` (independent red team)
+- `harness/tie-breaker` (escalation only when adversary blocks and eval was `conditional_pass`; skip when `--quick`)
+
+## Performance rules
+
+1. Use `subagent` with `agentScope: "both"`.
+2. Run benchmark and verdict evaluator passes **sequentially** (verdict depends on benchmark gate).
+3. Adversary runs only after benchmark + policy verdict pass.
+4. Do **not** set `timeoutMs` unless the user requests a cap.
+5. Compact task text: embed `HarnessSpawnContext={"run_id":"…","run_dir":"…","plan_packet_path":"…",…}` — `run_id` is required.
+
+## Step 0 — Parse `$ARGUMENTS`
+
+- optional: `--run <run-id>` (recovery)
+- optional: `--quick` (tailoring — skip adversary + tie-breaker when risk accepted)
+- optional: `--readonly` (inspect only — do not claim ownership)
 - optional: `--trace <trace-ref>`
 
 Happy path: omit `--run`; use `[HarnessRunContext]`.
 
-
+Prerequisites:
+
+- `plan_ready: true` on disk
+- Execute completed (`handoff/executor-summary.yaml` or `last_completed_step: execute`)
+
+If execute not complete:
+
+`Execute not finished. Run /harness-run first.`
+
+Ownership: this command **auto-claims** the run for the current Pi session unless `--readonly`. Cross-session recovery: `/harness-use-run <run-id> --claim` first.
+
+## Phase 1 — Automated QC / deterministic shell (parent)
+
+**Practice:** Harness engineering; interleave deterministic checks before agent judgment (Stripe Minions pattern).
+
+```bash
+node "$UP_PKG/.pi/scripts/harness-verify.mjs"
+```
+
+When `HARNESS_SENTRUX_REQUIRED=true`, after verify succeeds:
+
+```bash
+sentrux gate .
+```
+
+Compare to baseline from `/harness-run` (`sentrux gate --save`). If CLI missing, record `gate_status: not_installed`.
+
+Ensure `artifacts/sentrux-signal.yaml` exists under the run dir (written during `/harness-run`). If missing, write it from the latest `sentrux check` / `gate` output. Append or refresh session entry `harness-sentrux-signal`.
+
+Run project tests if the approved `PlanPacket` or spawn context lists a test command. Capture stdout paths only — do not paste full logs into the next spawn.
+
+Write `artifacts/benchmark-log.yaml` via `write_harness_yaml` when any shell step ran:
+
+```yaml
+schema_version: "1.0.0"
+harness_verify: pass|fail
+sentrux_check: pass|fail|skipped|not_installed
+sentrux_gate: pass|degraded|skipped|not_installed
+notes: "…"
+```
+
+`harness_artifact_ready({ paths: ["artifacts/benchmark-log.yaml", "artifacts/sentrux-signal.yaml"] })` when written.
+
+## Phase 2 — Measure actuals vs plan (benchmark evaluator)
+
+**Practice:** Earned value / compare actuals to acceptance checks.
+
+```
+subagent({
+  agentScope: "both",
+  agent: "harness/evaluator",
+  task: "<HarnessSpawnContext mode benchmark + plan_packet_path + run_dir + acceptance_checks + paths: benchmark-log.yaml, sentrux-signal.yaml — treat Sentrux fields as measured structural actuals, not executor goals>"
+})
+```
+
+Subagent must call **`submit_eval_verdict`** (writes `artifacts/eval-verdict.yaml`).
+
+Gate:
+
+```
+harness_artifact_ready({ paths: ["artifacts/eval-verdict.yaml"] })
+```
+
+**Do not stop** after benchmark fail — continue to verdict (and adversary per tier) so `review-outcome.yaml` can route steer vs replan (ADR 0044).
+
+## Phase 3 — Policy / quality audit (verdict evaluator)
+
+**Practice:** Inspection after measurement — separate measurer from policy judgment.
+
+Always run after benchmark (even when benchmark failed).
+
+```
+subagent({
+  agentScope: "both",
+  agent: "harness/evaluator",
+  task: "<HarnessSpawnContext mode verdict + treat executor output as untrusted + artifact paths>"
+})
+```
+
+Subagent updates **`artifacts/eval-verdict.yaml`** via `submit_eval_verdict` (include policy fields / failed checks).
 
-
-
+Gate again with `harness_artifact_ready`.
+
+## Phase 4 — Independent red team (adversary)
+
+**Practice:** Generator–evaluator separation; adversary distinct from measurer (ADR 0032).
+
+Skip when `--quick`. **Tiered steer:** full adversary on initial run + steer attempt 1; lite review (no adversary) on steer attempts 2+ unless prior `block_merge`.
 
 ```
-subagent({
+subagent({
+  agentScope: "both",
+  agent: "harness/adversary",
+  task: "<HarnessSpawnContext mode adversary + plan + run artifacts>"
+})
 ```
 
-
+Subagent calls **`submit_adversary_report`** → `artifacts/adversary-report.yaml`.
+
+`harness_artifact_ready({ paths: ["artifacts/adversary-report.yaml"] })`
+
+## Phase 5 — Escalation / arbitration (tie-breaker, conditional)
+
+Only when:
+
+- not `--quick`
+- adversary `block_merge: true`
+- eval verdict was `conditional_pass`
+
+```
+subagent({ agentScope: "both", agent: "harness/tie-breaker", task: "…" })
+```
 
 ## Parent rules
 
--
--
+- **Never** parse subprocess JSON to write `eval-verdict.yaml` or `adversary-report.yaml` — use `submit_*` + `harness_artifact_ready` only.
+- Do not edit `plan-packet.yaml`.
+- Do not run inline review checks in this session (subagent isolation per ADR 0032).
+- Same Pi session as `/harness-run` is preferred; `--claim` makes cross-session resume work.
+
+## Phase 6 — Review outcome + repair brief (parent)
+
+Write **`artifacts/review-outcome.yaml`** and **`artifacts/repair-brief.yaml`** via `write_harness_yaml` (path pointers in brief, not pasted bodies).
+
+| `remediation_class` | `recommended_next` |
+|---------------------|-------------------|
+| `pass` | `/harness-policy-status` |
+| `implementation_gap` | `/harness-steer` |
+| `plan_gap` | `/harness-plan` (mode: revise) |
+| `rollback` | `/harness-incident` |
+
+One `ask_user` steer gate when not pass (unless `steer_approved` on run-context).
 
 ## Completion
 
-
-- `recommended_action`: `proceed_to_adversary`, `replan`, or `rollback`
-- Evidence list for each failed check
+Report eval status, remediation class, and `next_command` from `review-outcome.yaml`.
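The phase gating (Phases 4 and 5) and the Phase 6 routing table added in this hunk can be read as a few small decision functions. The sketch below is only an illustration of that reading. Every type and function name in it (`ReviewState`, `shouldRunAdversary`, `shouldRunTieBreaker`, `recommendedNext`) is hypothetical and does not come from the package, and the numbering of steer attempts (0 for the initial review) is an assumption.

```typescript
// Illustrative only: these names are hypothetical, not exports of ultimate-pi.
type RemediationClass = "pass" | "implementation_gap" | "plan_gap" | "rollback";

interface ReviewState {
  quick: boolean;           // --quick was passed
  steerAttempt: number;     // assumed: 0 = initial review, 1+ = review after a steer pass
  priorBlockMerge: boolean; // an earlier adversary report set block_merge: true
}

// Phase 4: full adversary on the initial run and steer attempt 1;
// later attempts skip it unless a prior report blocked the merge.
function shouldRunAdversary(s: ReviewState): boolean {
  if (s.quick) return false;
  return s.steerAttempt <= 1 || s.priorBlockMerge;
}

// Phase 5: tie-breaker only when all three listed conditions hold.
function shouldRunTieBreaker(
  quick: boolean,
  blockMerge: boolean,
  evalVerdict: string,
): boolean {
  return !quick && blockMerge && evalVerdict === "conditional_pass";
}

// Phase 6: the remediation_class -> recommended_next table, copied from the prompt.
const RECOMMENDED_NEXT: Record<RemediationClass, string> = {
  pass: "/harness-policy-status",
  implementation_gap: "/harness-steer",
  plan_gap: "/harness-plan (mode: revise)",
  rollback: "/harness-incident",
};

function recommendedNext(c: RemediationClass): string {
  return RECOMMENDED_NEXT[c];
}
```

Keeping the table as a typed record means a new remediation class cannot be added without also choosing its next command, which matches the prompt's intent that every review outcome routes somewhere.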
@@ -5,7 +5,9 @@ argument-hint: ""
 
 # harness-run
 
-
+**Practice map:** `.pi/harness/docs/practice-map.md`
+
+You orchestrate the **Executing Process Group** — spawn `harness/executor` only. Do **not** implement inline.
 
 ## Step 0 — Parse arguments
 
@@ -16,28 +18,78 @@ If plan not ready:
 
 `Run /harness-plan first — no approved plan in active run context.`
 
-##
+## Gate — No execution without baseline (change control)
+
+**Practice:** PMBOK integrated change control — refuse work without an approved baseline.
+
+Refuse if `plan_ready` is false.
+
+## Pre-work — Architectural fitness baseline (parent)
+
+**Practice:** Fitness functions (architecture governance) — save structural baseline before the executor mutates the tree.
+
+When `HARNESS_SENTRUX_REQUIRED=true` (see `.env.example`), from **project root**:
+
+```bash
+sentrux gate --save .
+```
+
+If `sentrux` is not installed, note `gate_baseline: skipped` in run notes and continue (harness-verify may still pass rules-sync checks).
+
+Do **not** ask the executor to optimize Sentrux metrics — observation is for `/harness-review` only.
+
+## Orchestration — Single jelled implementer
+
+**Practice:** Peopleware — one accountable team owns delivery; generator–evaluator separation (executor does not self-certify).
 
 1. Confirm `[HarnessActivePlan]` / extension reports plan ready.
 2. Build `HarnessSpawnContext` with `mode: execute`, `plan_packet_path`, `run_dir`, `acceptance_checks` from plan file.
-3.
+3. Include **`critical_path_work_item_ids`** from `execution_plan.schedule_metadata` in spawn task when present — executor should tackle limiting-step items first (Grove).
+4. Spawn (max **1** agent per call):
 
 ```
-subagent({ agentScope: "both", agent: "harness/executor", task: "<HarnessSpawnContext + handoff>" })
+subagent({ agentScope: "both", agent: "harness/executor", task: "<HarnessSpawnContext + handoff + critical path hint>" })
 ```
 
-
-
+5. Parse subprocess output JSON (`execution_status`, validations, rollback refs) from tool result text.
+6. Parent persists trace/handoff artifacts under run dir if needed; do not self-review.
+
+## Post-work — Structural observation (parent)
+
+**Practice:** Monitoring actuals vs baseline — in-process fitness functions after generator work.
+
+After executor subprocess completes:
+
+```bash
+sentrux check .
+sentrux gate .
+```
+
+- If `sentrux check` exits non-zero or `gate` reports degradation → set `execution_status: scope_drift` (or `blocked` if unrecoverable); parent runs **`/harness-review`** next (not immediate replan).
+- Write `artifacts/sentrux-signal.yaml` via `write_harness_yaml`:
+
+```yaml
+schema_version: "1.0.0"
+run_id: "<run_id>"
+check_pass: true|false
+gate_status: pass|degraded|skipped|not_installed
+quality_signal_summary: "<one line from CLI output>"
+recorded_at: "<ISO8601>"
+phase: execute
+```
+
+- Append session custom entry `harness-sentrux-signal` with the same fields (observation bus / telemetry).
+
+`harness_artifact_ready({ paths: ["artifacts/sentrux-signal.yaml"] })` when written.
 
 ## Parent rules
 
--
-- On `scope_drift`, stop and recommend `/harness-plan`.
+- On `scope_drift`, finish handoff and recommend **`/harness-review`** (review classifies `plan_gap` vs `implementation_gap` — ADR 0044).
 - Do not call `ask_user` for plan-level ambiguity — return to plan command.
 
 ## Completion
 
 - `execution_status`: `completed`, `blocked`, or `scope_drift`
 - `validation_summary` with command evidence
-- `handoff_ready` for
-- `next_command`: `/harness-
+- `handoff_ready` for post-run review
+- `next_command`: `/harness-review` (Monitoring and Controlling — measure then judge; same session preferred)
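The post-work step in this hunk turns two CLI results into a `sentrux-signal.yaml` record and, when the structure degraded, into a changed `execution_status`. A minimal sketch of that mapping follows. The field names and allowed values are taken from the YAML in the hunk; the function names, the `unrecoverable` flag, and the choice to take the exit code and gate status as plain inputs (rather than invoking or parsing `sentrux`) are assumptions made for illustration.

```typescript
// Field names follow the sentrux-signal.yaml shape shown in the prompt.
// Function names and the `unrecoverable` flag are hypothetical.
type GateStatus = "pass" | "degraded" | "skipped" | "not_installed";
type ExecutionStatus = "completed" | "blocked" | "scope_drift";

interface SentruxSignal {
  schema_version: "1.0.0";
  run_id: string;
  check_pass: boolean;
  gate_status: GateStatus;
  quality_signal_summary: string;
  recorded_at: string; // ISO 8601
  phase: "execute";
}

// Build the record from values the parent has already collected.
function buildSentruxSignal(
  runId: string,
  checkExitCode: number,
  gateStatus: GateStatus,
  summary: string,
  now: Date = new Date(),
): SentruxSignal {
  return {
    schema_version: "1.0.0",
    run_id: runId,
    check_pass: checkExitCode === 0,
    gate_status: gateStatus,
    quality_signal_summary: summary,
    recorded_at: now.toISOString(),
    phase: "execute",
  };
}

// A failed check or a degraded gate overrides the executor's own status.
function statusAfterObservation(
  current: ExecutionStatus,
  signal: SentruxSignal,
  unrecoverable: boolean = false,
): ExecutionStatus {
  const drifted = !signal.check_pass || signal.gate_status === "degraded";
  if (!drifted) return current;
  return unrecoverable ? "blocked" : "scope_drift";
}
```

In this reading a `skipped` or `not_installed` gate does not by itself change the status, which is consistent with the pre-work note that a missing CLI is recorded and execution continues.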
@@ -0,0 +1,55 @@
+---
+description: Ad-hoc architectural intent review — spawn harness/sentrux-steward with graphify evidence.
+argument-hint: "[--run <run-id>]"
+---
+
+# harness-sentrux-steward
+
+You are the **chair** for Sentrux **intent** evolution (manifest → rules.toml). Spawn **`harness/sentrux-steward`** only — do not edit the manifest inline without a proposal artifact.
+
+**Skill:** `harness-sentrux-setup` — bootstrap vs steward vs sync.
+
+## When to use
+
+- User requests manifest / rules refresh
+- After `/harness-plan` when execution plan adds top-level paths not covered by manifest layer globs
+- Debate `quality` focus flags structural risk
+- Post-run `sentrux check` failures suggesting missing boundaries (before replan)
+
+Do **not** spawn on every `/harness-review`.
+
+## Step 0 — Context
+
+Use `[HarnessRunContext]` / `[HarnessActivePlan]`. Optional `--run <run-id>` for recovery.
+
+## Step 1 — Spawn steward
+
+```
+subagent({
+  agentScope: "both",
+  agent: "harness/sentrux-steward",
+  task: "<HarnessSpawnContext + plan_packet_path + planning-context.yaml + execution-plan paths + scope hint>"
+})
+```
+
+Gate: `harness_artifact_ready({ paths: ["artifacts/sentrux-manifest-proposal.yaml"] })`
+
+## Step 2 — Chair decision
+
+Read `artifacts/sentrux-manifest-proposal.yaml`.
+
+- `change_class: none` → report no manifest change; stop.
+- Otherwise → `ask_user` with summary, evidence bullets, and `adr_draft` if `adr_required`.
+
+On approval:
+
+1. Apply `manifest_patch` to `.pi/harness/sentrux/architecture.manifest.json` (parent `write` or manual edit).
+2. `node "$UP_PKG/.pi/scripts/harness-sentrux-bootstrap.mjs" --force`
+3. Append session custom entry `harness-architecture-changed` (triggers rules sync extension).
+4. If `adr_required`, file harness ADR snippet or `docs/adr/` entry per team convention.
+
+On reject: keep manifest unchanged; document decision in run notes.
+
+## Completion
+
+Report `change_class`, whether manifest was updated, and `sentrux check` outcome if run after sync.
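Step 2 of the new steward prompt is a two-way branch on the proposal artifact. The sketch below shows one way to express it, assuming the proposal has already been parsed into an object. Only the field names `change_class`, `adr_required`, and `adr_draft` come from the prompt; the `ManifestProposal` and `ChairStep` types, the `chairDecision` function, and the example `change_class` value `add_layer` used in testing are hypothetical.

```typescript
// Hypothetical shapes; only the three field names come from the prompt text.
interface ManifestProposal {
  change_class: string;
  adr_required?: boolean;
  adr_draft?: string;
}

type ChairStep =
  | { kind: "stop"; report: string }
  | { kind: "ask_user"; includeAdrDraft: boolean };

// change_class "none" ends the command; anything else goes to the user,
// with the ADR draft attached only when the proposal requires one.
function chairDecision(p: ManifestProposal): ChairStep {
  if (p.change_class === "none") {
    return { kind: "stop", report: "no manifest change" };
  }
  return { kind: "ask_user", includeAdrDraft: p.adr_required === true };
}
```

The function deliberately stops at the `ask_user` boundary: applying `manifest_patch` and rerunning the bootstrap script happen only after approval, so they stay outside the decision itself.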
@@ -327,7 +327,7 @@ sentrux plugin add-standard 2>/dev/null || echo "Plugins already installed or fa
 
 ## Step 3 — Pi Extension Packages
 
-Bundled extensions load from the installed `ultimate-pi` package. **
+Bundled extensions load from the installed `ultimate-pi` package. **Session-locked model routing** comes from a **vendored** fork of [`yeliu84/pi-model-router`](https://github.com/yeliu84/pi-model-router) in `vendor/pi-model-router/`, wired through [`.pi/extensions/pi-model-router-harness.ts`](.pi/extensions/pi-model-router-harness.ts). The router picks **one concrete model** when the session starts (from the first user prompt + system prompt complexity), then changes **thinking level only** each turn. The harness **gates** activation on `.pi/model-router.json` (Step **3.5** below) so `router/auto` cannot load prematurely. Attribution: see [THIRD_PARTY_NOTICES.md](THIRD_PARTY_NOTICES.md) and `vendor/pi-model-router/UPSTREAM_PIN.md`. Maintainer refresh: `npm run vendor:sync-router`.
 
 Optionally install the companion lockfile used in development:
 
@@ -381,9 +381,9 @@ If generation prints "No authenticated Pi providers": warn in report — user sh
 
 Do NOT block setup. If no config is written, `harness-sync-model-router.mjs` clears a premature `defaultProvider: "router"` in `.pi/settings.json`.
 
-**Router onboarding** — The vendored extension starts only after `.pi/model-router.json` appears. Running the script above prepares that file plus optional Pi defaults (**`router` / `auto
+**Router onboarding** — The vendored extension starts only after `.pi/model-router.json` appears. Running the script above prepares that file plus optional Pi defaults (**`router` / `auto`**, or whatever `defaultProfile` is) via `harness-sync-model-router.mjs` when `defaultProvider` was unset—then **`/reload`**. Generated profiles use **one model SKU per profile**; high/medium/low tiers differ in **thinking** only. Subagents resolve their subprocess model from the **agent system prompt** complexity (same lock rules).
 
-Manual override: **`/router profile auto`** anytime after reload if they changed defaults.
+Manual override: **`/router profile auto`** or **`/router profile opencode-go`** anytime after reload if they changed defaults.
 
 ## Step 3.6 — Harness agents (package-resolved)
 
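The two setup hunks above describe session-locked routing: the model is fixed on the first turn and only the thinking level varies afterwards. The sketch below illustrates that behaviour in isolation. It is not the vendored router's code. The `Tier` and `Profile` shapes, the `createSessionRouter` function, and the injected `classify` callback are all hypothetical, and how the real router scores prompt complexity is not shown in this diff.

```typescript
// Hypothetical model of session locking; not the pi-model-router API.
type Thinking = "low" | "medium" | "high";

interface Tier {
  model: string;
  thinking: Thinking;
}

type Profile = Record<Thinking, Tier>;

// Returns a per-session routing function. The first call locks the model;
// later calls reuse it and only the thinking level follows the new prompt.
function createSessionRouter(
  profile: Profile,
  classify: (prompt: string) => Thinking, // assumed complexity classifier
): (prompt: string) => Tier {
  let lockedModel: string | undefined;
  return (prompt: string): Tier => {
    const tier = profile[classify(prompt)];
    if (lockedModel === undefined) {
      lockedModel = tier.model;
    }
    return { model: lockedModel, thinking: tier.thinking };
  };
}
```

With generated profiles that use one model SKU per profile, every tier would name the same model, so the lock only matters for hand-written profiles whose tiers point at different models.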
@@ -677,7 +677,7 @@ Output summary table:
 | sentrux | ✓/✗ | CLI + plugins; rules via Step 4.2 bootstrap |
 | Sentrux rules.toml | ✓/✗ | `.sentrux/rules.toml` synced from manifest |
 | pi extensions | ✓/✗ | 4 packages |
-| model router | ✓/✗ | Package + config verified, activation via `/router profile auto` |
+| model router | ✓/✗ | Package + config verified, activation via `/router profile auto` (or `opencode-go`) |
 | `.env` | ✓/✗/ask | Created / keys appended / user declined |
 
 | .gitignore | ✓/✗ | entries added (incl. `.env`) |
@@ -0,0 +1,30 @@
+---
+description: Post-review repair pass — executor reads repair-brief.yaml, then re-verify via /harness-review.
+argument-hint: "[--attempt N]"
+---
+
+# harness-steer
+
+Thin orchestrator for the **steer loop** (ADR 0044). Run only after `/harness-review` produced `artifacts/review-outcome.yaml` and `artifacts/repair-brief.yaml` with `remediation_class: implementation_gap`.
+
+## Preconditions
+
+- Active run with `plan_ready` and `plan_packet_path`
+- `review-outcome.remediation_class` is `implementation_gap` (review outcome wins over executor `scope_drift` for routing)
+- `steer_attempt < HARNESS_STEER_MAX_ATTEMPTS` (default 3)
+
+## Steps
+
+1. Read `artifacts/review-outcome.yaml`, `artifacts/repair-brief.yaml`, `plan_packet_path` (paths only — do not paste bodies into tool args).
+2. Update `artifacts/steer-state.yaml` (`attempt`, `max_attempts`, `active: true`).
+3. Set policy phase to **execute** before spawning executor (required for mutating tools).
+4. One `ask_user` steer gate unless `run-context.steer_approved` is already true.
+5. Spawn **`harness/executor`** with `HarnessSpawnContext.mode: repair` and `repair_brief_path: artifacts/repair-brief.yaml`.
+6. Optional: `sentrux gate --save .` after repair to refresh baseline (ADR 0044).
+7. `next_command`: **`/harness-review`** (always re-verify; tiered adversary on attempts 2+ per practice-map).
+
+## Forbidden
+
+- Re-call `approve_plan` unless `plan-packet.yaml` structure changed
+- Widen scope beyond approved packet
+- Skip review after repair
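The three preconditions of the new steer prompt can be checked before anything is spawned. The sketch below collects the unmet ones as readable messages. The default of 3 and the variable name `HARNESS_STEER_MAX_ATTEMPTS` come from the prompt; the `SteerInputs` shape, the function names, and the fallback to the default when the variable is missing or not a positive integer are assumptions, since the diff does not show how the harness parses it.

```typescript
// Hypothetical helper; only the env var name and default (3) come from the prompt.
interface SteerInputs {
  planReady: boolean;
  planPacketPath?: string;
  remediationClass: string;
  steerAttempt: number;
}

type Env = Record<string, string | undefined>;

// Assumed parsing rule: fall back to 3 unless the value is a positive integer.
function maxSteerAttempts(env: Env): number {
  const parsed = Number.parseInt(env.HARNESS_STEER_MAX_ATTEMPTS ?? "", 10);
  return Number.isInteger(parsed) && parsed > 0 ? parsed : 3;
}

// Returns an empty list when steering may proceed.
function steerBlockers(input: SteerInputs, env: Env = {}): string[] {
  const blockers: string[] = [];
  if (!input.planReady || !input.planPacketPath) {
    blockers.push("no active run with plan_ready and plan_packet_path");
  }
  if (input.remediationClass !== "implementation_gap") {
    blockers.push("remediation_class is not implementation_gap");
  }
  const max = maxSteerAttempts(env);
  if (input.steerAttempt >= max) {
    blockers.push("steer_attempt reached the limit of " + max);
  }
  return blockers;
}
```

Returning every blocker at once, rather than failing on the first, lets the parent report the full reason a steer pass was refused in a single message.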