ultimate-pi 0.7.0 → 0.9.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (115)
  1. package/.agents/skills/harness-decisions/SKILL.md +20 -1
  2. package/.agents/skills/harness-eval/SKILL.md +11 -13
  3. package/.agents/skills/harness-orchestration/SKILL.md +36 -30
  4. package/.agents/skills/harness-plan/SKILL.md +13 -18
  5. package/.pi/PACKAGING.md +1 -1
  6. package/.pi/agents/harness/adversary.md +20 -12
  7. package/.pi/agents/harness/evaluator.md +25 -14
  8. package/.pi/agents/harness/executor.md +27 -16
  9. package/.pi/agents/harness/incident-recorder.md +37 -0
  10. package/.pi/agents/harness/meta-optimizer.md +18 -15
  11. package/.pi/agents/harness/planner.md +26 -30
  12. package/.pi/agents/harness/tie-breaker.md +4 -2
  13. package/.pi/agents/harness/trace-librarian.md +18 -11
  14. package/.pi/agents/pi-pi/ext-expert.md +1 -1
  15. package/.pi/agents/pi-pi/keybinding-expert.md +1 -1
  16. package/.pi/agents/pi-pi/tui-expert.md +3 -3
  17. package/.pi/extensions/00-ultimate-pi-system-prompt.ts +2 -2
  18. package/.pi/extensions/budget-guard.ts +47 -18
  19. package/.pi/extensions/custom-footer.ts +8 -3
  20. package/.pi/extensions/custom-header.ts +2 -2
  21. package/.pi/extensions/debate-orchestrator.ts +1 -1
  22. package/.pi/extensions/dotenv-loader.ts +1 -1
  23. package/.pi/extensions/drift-monitor.ts +1 -1
  24. package/.pi/extensions/harness-ask-user.ts +1 -1
  25. package/.pi/extensions/harness-live-widget.ts +1 -1
  26. package/.pi/extensions/harness-run-context.ts +197 -33
  27. package/.pi/extensions/harness-telemetry.ts +1 -1
  28. package/.pi/extensions/harness-web-guard.ts +1 -1
  29. package/.pi/extensions/harness-web-tools.ts +1 -1
  30. package/.pi/extensions/lib/ask-user/dialog.ts +2 -2
  31. package/.pi/extensions/lib/ask-user/fallback.ts +1 -1
  32. package/.pi/extensions/lib/ask-user/render.ts +3 -3
  33. package/.pi/extensions/lib/harness-subagents/agent-loader.ts +1 -1
  34. package/.pi/extensions/lib/harness-subagents/agent-parser.ts +1 -1
  35. package/.pi/extensions/lib/harness-subagents/blackboard-tool.ts +1 -1
  36. package/.pi/extensions/lib/harness-subagents/harness-subagent-policy.ts +134 -0
  37. package/.pi/extensions/lib/harness-subagents/parent-ask-user-bridge.ts +89 -0
  38. package/.pi/extensions/lib/harness-subagents/spawn-policy.ts +20 -2
  39. package/.pi/extensions/lib/harness-subagents/vendored/agent-manager.ts +3 -2
  40. package/.pi/extensions/lib/harness-subagents/vendored/agent-runner.ts +44 -24
  41. package/.pi/extensions/lib/harness-subagents/vendored/context.ts +1 -1
  42. package/.pi/extensions/lib/harness-subagents/vendored/env.ts +1 -1
  43. package/.pi/extensions/lib/harness-subagents/vendored/index.ts +23 -2
  44. package/.pi/extensions/lib/harness-subagents/vendored/output-file.ts +1 -1
  45. package/.pi/extensions/lib/harness-subagents/vendored/schedule.ts +1 -1
  46. package/.pi/extensions/lib/harness-subagents/vendored/settings.ts +1 -1
  47. package/.pi/extensions/lib/harness-subagents/vendored/skill-loader.ts +1 -1
  48. package/.pi/extensions/lib/harness-subagents/vendored/types.ts +2 -2
  49. package/.pi/extensions/lib/harness-subagents/vendored/ui/agent-widget.ts +1 -1
  50. package/.pi/extensions/lib/harness-subagents/vendored/ui/conversation-viewer.ts +2 -2
  51. package/.pi/extensions/lib/harness-subagents/vendored/ui/schedule-menu.ts +1 -1
  52. package/.pi/extensions/observation-bus.ts +1 -1
  53. package/.pi/extensions/pi-model-router-harness.ts +1 -1
  54. package/.pi/extensions/policy-gate.ts +90 -20
  55. package/.pi/extensions/provider-payload-sanitize.ts +1 -1
  56. package/.pi/extensions/review-integrity.ts +76 -22
  57. package/.pi/extensions/sentrux-rules-sync.ts +1 -1
  58. package/.pi/extensions/soundboard.ts +1 -1
  59. package/.pi/extensions/test-diff-integrity.ts +1 -1
  60. package/.pi/extensions/trace-recorder.ts +1 -1
  61. package/.pi/extensions/ultimate-pi-vcc.ts +1 -1
  62. package/.pi/harness/agents.manifest.json +82 -78
  63. package/.pi/harness/docs/adrs/0031-harness-run-context.md +6 -3
  64. package/.pi/harness/docs/adrs/0032-harness-command-orchestration.md +37 -0
  65. package/.pi/harness/docs/adrs/README.md +1 -0
  66. package/.pi/harness/specs/budget-exhausted-event.schema.json +3 -1
  67. package/.pi/harness/specs/harness-spawn-context.schema.json +65 -0
  68. package/.pi/harness/specs/harness-turn.schema.json +18 -0
  69. package/.pi/lib/harness-agent-output.ts +41 -0
  70. package/.pi/lib/harness-run-context.ts +516 -37
  71. package/.pi/lib/harness-ui-state.ts +1 -1
  72. package/.pi/prompts/harness-auto.md +36 -61
  73. package/.pi/prompts/harness-critic.md +15 -28
  74. package/.pi/prompts/harness-eval.md +19 -27
  75. package/.pi/prompts/harness-incident.md +15 -34
  76. package/.pi/prompts/harness-plan.md +28 -49
  77. package/.pi/prompts/harness-review.md +16 -30
  78. package/.pi/prompts/harness-router-tune.md +16 -38
  79. package/.pi/prompts/harness-run.md +21 -38
  80. package/.pi/prompts/harness-setup.md +2 -0
  81. package/.pi/prompts/harness-trace.md +13 -30
  82. package/.pi/scripts/harness-generate-model-router.mjs +16 -13
  83. package/.pi/scripts/harness-verify.mjs +17 -0
  84. package/.pi/scripts/vendor-sync-pi-model-router.sh +10 -10
  85. package/CHANGELOG.md +25 -1
  86. package/README.md +4 -5
  87. package/THIRD_PARTY_NOTICES.md +1 -1
  88. package/package.json +13 -8
  89. package/vendor/pi-model-router/UPSTREAM_PIN.md +1 -1
  90. package/vendor/pi-model-router/extensions/commands.ts +2 -2
  91. package/vendor/pi-model-router/extensions/config.ts +2 -2
  92. package/vendor/pi-model-router/extensions/index.ts +1 -1
  93. package/vendor/pi-model-router/extensions/provider.ts +2 -2
  94. package/vendor/pi-model-router/extensions/routing.ts +2 -2
  95. package/vendor/pi-model-router/extensions/types.ts +1 -1
  96. package/vendor/pi-model-router/extensions/ui.ts +1 -1
  97. package/vendor/pi-model-router/package.json +4 -4
  98. package/vendor/pi-vcc/index.ts +1 -1
  99. package/vendor/pi-vcc/package.json +1 -1
  100. package/vendor/pi-vcc/src/commands/pi-vcc.ts +1 -1
  101. package/vendor/pi-vcc/src/commands/vcc-recall.ts +1 -1
  102. package/vendor/pi-vcc/src/core/content.ts +1 -1
  103. package/vendor/pi-vcc/src/core/load-messages.ts +1 -1
  104. package/vendor/pi-vcc/src/core/normalize.ts +1 -1
  105. package/vendor/pi-vcc/src/core/render-entries.ts +1 -1
  106. package/vendor/pi-vcc/src/core/report.ts +1 -1
  107. package/vendor/pi-vcc/src/core/search-entries.ts +1 -1
  108. package/vendor/pi-vcc/src/core/summarize.ts +1 -1
  109. package/vendor/pi-vcc/src/hooks/before-compact.ts +2 -2
  110. package/vendor/pi-vcc/src/tools/recall.ts +1 -1
  111. package/vendor/pi-vcc/src/types.ts +1 -1
  112. package/vendor/pi-vcc/tests/fixtures.ts +1 -1
  113. package/vendor/pi-vcc/tests/render-entries.test.ts +1 -1
  114. package/vendor/pi-vcc/tests/search-entries.test.ts +1 -1
  115. package/vendor/pi-vcc/tests/support/load-session.ts +2 -2
@@ -5,79 +5,54 @@ argument-hint: "\"<task>\" [--quick] [--risk low|med|high] [--budget <amount>]"
5
5
 
6
6
  # harness-auto
7
7
 
8
- Run full harness flow in one command:
9
-
10
- `plan -> execute -> evaluate -> adversary -> severity-policy decision -> commit+PR (no auto-merge)`
8
+ Pipeline orchestrator — one session, sequential `Agent` spawns. Invoke **harness-orchestration** skill for agent IDs. Do **not** implement or review inline.
11
9
 
12
10
  ## Step 0 — Parse arguments
13
11
 
14
- Read `$ARGUMENTS` and normalize:
15
-
16
- - required task: quoted or unquoted first value
17
- - optional flags: `--quick`, `--risk low|med|high`, `--budget <amount>`
12
+ - required task (quoted or first token)
13
+ - optional: `--quick`, `--risk`, `--budget`
18
14
 
19
- If task is missing, stop and return:
15
+ If task missing:
20
16
 
21
17
  `Usage: /harness-auto "<task>" [--quick] [--risk low|med|high] [--budget <amount>]`
22
18
 
23
- ## Process contract
24
-
25
- 1. Build and approve plan packet at the canonical active-run path before any mutation (extension allocates one `run_id` for the auto pipeline).
26
- 2. Execute only approved scope with rollback artifacts.
27
- 3. Run independent evaluator then adversarial reviewer.
28
- 4. Apply severity policy + strict pre-PR gates.
29
- 5. If gates pass, auto-commit and open PR; never auto-merge.
30
-
31
- ## Locked decisions (must not be changed)
32
-
33
- - Always produce a plan packet before mutation.
34
- - Adversarial review is always required.
35
- - Merge blocking authority is severity-policy-engine.
36
- - Router tuning is propose-and-approve only.
37
- - Plan ambiguity must use `ask_user` (harness-decisions skill) — no silent guessing.
38
- - Rollback artifact must be revert-commit-ready and include:
39
- - revert command
40
- - prepared revert branch
41
- - patch bundle
42
- - Debate profile is aggressive with locked confidence weights:
43
- - claim_quality=0.20
44
- - reproducibility=0.40
45
- - agreement=0.40
46
- - Strict pre-PR gate is mandatory.
47
- - Post-pass behavior is auto-commit and auto-open-PR.
48
- - Never auto-merge PR.
49
-
50
- ## Guardrails
51
-
52
- - Do not overthink straightforward gate outcomes; enforce gates deterministically.
53
- - Only follow the locked pipeline and governance decisions listed here.
54
- - Never bypass mandatory safety gates, even in `--quick` mode.
19
+ ## Orchestration (required) — same session
55
20
 
56
- ## Strict gates
21
+ 1. **Plan** — spawn `harness/planner` → parse JSON → present full plan → `ask_user` Approve/Changes/Cancel → write `plan-packet.json` only on Approve (advances phase via policy-gate).
22
+ 2. **Execute** — spawn `harness/executor` with `HarnessSpawnContext` (`mode: execute`). Summarize handoff bullets for next spawn (do not paste full subagent log).
23
+ 3. **Eval** — spawn `harness/evaluator` (`mode: benchmark`) after parent scripts if needed.
24
+ 4. **Review** — spawn `harness/evaluator` (`mode: verdict`) OR rely on eval verdict if policy allows — prefer both when strict gates require.
25
+ 5. **Adversary** — spawn `harness/adversary` with artifact paths.
26
+ 6. **Tie-breaker** — spawn `harness/tie-breaker` only if debate unresolved.
27
+ 7. **Parent** — apply locked strict gates below; commit/PR only if all pass.
57
28
 
58
- Block commit/PR if any gate fails:
29
+ No new Pi session for review — subagents use isolated context (`inherit_context: false`).
59
30
 
60
- 1. Plan gate passed.
61
- 2. Execution completed within approved scope.
62
- 3. Independent evaluator passed.
63
- 4. Adversarial review completed with consensus packet.
64
- 5. Severity-policy-engine output is `pass` or `conditional_pass`.
65
- 6. Benchmark delta checks passed.
66
- 7. Rollback artifacts generated.
31
+ ## Locked decisions (do not change)
67
32
 
68
- ## Notes
33
+ - Always produce and approve plan before mutation.
34
+ - Adversarial review always required.
35
+ - Severity-policy-engine blocks merge.
36
+ - Router tuning propose-and-approve only.
37
+ - Plan ambiguity → parent `ask_user` (harness-decisions).
38
+ - Rollback artifacts: revert command, revert branch, patch bundle.
39
+ - Debate weights: claim_quality=0.20, reproducibility=0.40, agreement=0.40.
40
+ - Strict pre-PR gate mandatory; auto-commit + open PR; never auto-merge.
69
41
 
70
- - `--quick` may reduce breadth, never safety gates.
71
- - `--risk` can tighten behavior, never disable adversary.
72
- - If risk/ambiguity is high, auto-fallback to manual `harness-plan` and use `ask_user` for blocking forks.
73
- - If execution must be interrupted safely, run `/harness-abort [reason]`, then restart with `/harness-plan "<task>"`.
74
- - Always output artifact references (`plan`, `eval`, `adversary`, `consensus`, `rollback`) and incident paths when applicable — do not ask the user to copy a run id; point to `/harness-run-status` or `/harness-trace-last` for phase handoff.
42
+ ## Strict gates
43
+
44
+ Block commit/PR if any fails: plan gate, execution in scope, evaluator pass, adversary complete, severity-policy pass/conditional_pass, benchmark deltas, rollback artifacts.
45
+
46
+ ## Notes
75
47
 
76
- ## Completion behavior
48
+ - `--quick` reduces breadth, never safety gates.
49
+ - High risk/ambiguity → stop and recommend manual `/harness-plan` with `ask_user`.
50
+ - Interrupt: `/harness-abort [reason]` then `/harness-plan`.
51
+ - Artifact refs under active run dir; `/harness-run-status` or `/harness-trace-last` for handoff.
77
52
 
78
- End with a deterministic handoff block:
53
+ ## Completion
79
54
 
80
- 1. `Pipeline status` (pass/fail per strict gate).
81
- 2. Phase trace summary and artifact references (`plan`, `eval`, `adversary`, `consensus`, `rollback`) under the active run directory.
82
- 3. `Policy outcome` (`pass`, `conditional_pass`, `block`, or `human_required`) with one-line rationale.
83
- 4. `Next action` (open PR, replan, rollback, or human override path).
55
+ 1. Pipeline status per gate
56
+ 2. Artifact references
57
+ 3. Policy outcome: `pass`, `conditional_pass`, `block`, or `human_required`
58
+ 4. Next action (PR, replan, rollback, override)
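The strict-gate rule in the harness-auto hunk above (block commit/PR if any gate fails) can be sketched as a small pure function. This is a minimal illustration under stated assumptions, not the package's implementation: the `GateInputs` shape and both function names are hypothetical, and only the gate list and the four policy outcomes come from the prompt text.

```typescript
// Hypothetical sketch: type and function names are illustrative assumptions,
// not identifiers from the package. Gate names follow the "Strict gates" line.
type PolicyOutcome = "pass" | "conditional_pass" | "block" | "human_required";

interface GateInputs {
  planGatePassed: boolean;
  executionInScope: boolean;
  evaluatorPassed: boolean;
  adversaryComplete: boolean;
  severityPolicy: PolicyOutcome;
  benchmarkDeltasPassed: boolean;
  rollbackArtifactsGenerated: boolean;
}

// Returns the names of all failed gates; an empty array means every gate passed.
function failedGates(g: GateInputs): string[] {
  const severityOk =
    g.severityPolicy === "pass" || g.severityPolicy === "conditional_pass";
  const checks: Array<[string, boolean]> = [
    ["plan gate", g.planGatePassed],
    ["execution in scope", g.executionInScope],
    ["evaluator pass", g.evaluatorPassed],
    ["adversary complete", g.adversaryComplete],
    ["severity policy", severityOk],
    ["benchmark deltas", g.benchmarkDeltasPassed],
    ["rollback artifacts", g.rollbackArtifactsGenerated],
  ];
  return checks.filter(([, ok]) => !ok).map(([name]) => name);
}

// Commit and PR are allowed only when no gate failed; merging is never automated.
function mayCommitAndOpenPr(g: GateInputs): boolean {
  return failedGates(g).length === 0;
}
```

Returning the list of failed gates, rather than a bare boolean, lets the parent report pipeline status per gate in the completion block.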
@@ -5,46 +5,33 @@ argument-hint: "[--run <run-id>] [--trace <trace-ref>] [--risk low|med|high]"
5
5
 
6
6
  # harness-critic
7
7
 
8
- Run adversarial review against the candidate result.
8
+ Orchestrator: spawn `harness/adversary`.
9
9
 
10
10
  ## Step 0 — Parse arguments
11
11
 
12
- Read `$ARGUMENTS` and parse:
13
-
14
12
  - optional: `--run <run-id>` (recovery only)
15
13
  - optional: `--trace <trace-ref>`, `--risk low|med|high`
16
14
 
17
- On the happy path, **omit `--run`**. Use active run context. Prefer a session isolated from execute.
18
-
19
- ## Process
20
-
21
- 1. Assume hidden regressions exist and identify likely fault surfaces.
22
- 2. Challenge evaluator/executor assumptions with reproducible probes.
23
- 3. Emit structured adversarial findings for severity policy consumption.
24
-
25
- ## Requirements
15
+ Happy path: omit `--run`.
26
16
 
27
- - Assume hidden regressions exist until disproven.
28
- - Attempt to invalidate evaluator assumptions with concrete evidence.
29
- - Emit `AdversaryReport` matching `.pi/harness/specs/adversary-report.schema.json`.
30
- - Flag `block_merge=true` for high-confidence correctness/security/test-integrity risks.
17
+ ## Orchestration (required)
31
18
 
32
- ## Guardrails
19
+ 1. Build `HarnessSpawnContext` with `mode: adversary`, run artifacts, plan path, trace refs.
20
+ 2. Spawn:
33
21
 
34
- - Do not overthink speculative attacks; prioritize reproducible findings.
35
- - Only report risks tied to candidate behavior and gate policy.
36
- - Never claim a defect without evidence and repro steps.
22
+ ```
23
+ Agent({ subagent_type: "harness/adversary", prompt: "…" })
24
+ ```
37
25
 
38
- ## Output
26
+ 3. `get_subagent_result` — parse `AdversaryReport` JSON; parent persists for severity policy.
39
27
 
40
- - Prioritized findings with repro steps.
41
- - Structured `AdversaryReport` JSON.
42
- - Clear merge-block recommendation.
28
+ ## Parent rules
43
29
 
44
- ## Completion behavior
30
+ - Assume hidden regressions until disproven (in subagent).
31
+ - No new Pi session required.
45
32
 
46
- Always end with:
33
+ ## Completion
47
34
 
48
35
  - `block_merge` decision
49
- - top 1-3 high-confidence findings with repro pointers
50
- - explicit recommendation (`proceed`, `conditional_pass`, or `block`)
36
+ - Top findings with repro pointers
37
+ - `recommendation`: `proceed`, `conditional_pass`, or `block`
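The completion fields listed in the harness-critic hunk above could be validated by the parent before they are handed to the severity policy. The sketch below assumes the subagent result is available as JSON text. The interface and function names are hypothetical, and the real `AdversaryReport` schema is not shown in this diff, so only `block_merge` and `recommendation` are checked.

```typescript
// Hypothetical sketch: only `block_merge` and `recommendation` (with its three
// values) come from the prompt; the interface and function names are assumed.
type Recommendation = "proceed" | "conditional_pass" | "block";

interface AdversaryCompletion {
  block_merge: boolean;
  recommendation: Recommendation;
}

const RECOMMENDATIONS: readonly string[] = ["proceed", "conditional_pass", "block"];

// Parses JSON text and returns the completion fields, or null when the text is
// not valid JSON or either field is missing or malformed.
function parseAdversaryCompletion(jsonText: string): AdversaryCompletion | null {
  let value: unknown;
  try {
    value = JSON.parse(jsonText);
  } catch {
    return null;
  }
  if (typeof value !== "object" || value === null) return null;
  const record = value as Record<string, unknown>;
  const blockMerge = record["block_merge"];
  const recommendation = record["recommendation"];
  if (typeof blockMerge !== "boolean") return null;
  if (typeof recommendation !== "string") return null;
  if (RECOMMENDATIONS.indexOf(recommendation) === -1) return null;
  return { block_merge: blockMerge, recommendation: recommendation as Recommendation };
}
```

Returning `null` instead of throwing keeps the parent's handling of a malformed report explicit.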
@@ -5,47 +5,39 @@ argument-hint: "[--run <run-id>] [--baseline <ref>] [--suite <name>]"
5
5
 
6
6
  # harness-eval
7
7
 
8
- Run focused evaluations for the active harness run and produce structured artifacts.
8
+ Orchestrator: run deterministic scripts in the parent if needed, then spawn `harness/evaluator` with `mode: benchmark`.
9
9
 
10
10
  ## Step 0 — Parse arguments
11
11
 
12
- Read `$ARGUMENTS` and parse:
13
-
14
- - optional: `--run <run-id>` (recovery only — active run is used when omitted)
12
+ - optional: `--run <run-id>` (recovery only)
15
13
  - optional: `--baseline <ref>`, `--suite <name>`
16
14
 
17
- On the happy path, **omit `--run`**. The extension injects the active run from session + project `active-run.json`.
15
+ Happy path: omit `--run`; use active run from `[HarnessRunContext]`.
18
16
 
19
- If no active run exists, stop and return:
17
+ If no active run:
20
18
 
21
19
  `No active run. Finish /harness-plan and /harness-run first, or use /harness-run-status.`
22
20
 
23
- Run in a **new Pi session** after execute (review-integrity isolation).
24
-
25
- ## Process
21
+ ## Orchestration (required)
26
22
 
27
23
  1. Load plan scope from `[HarnessActivePlan]` (read-only).
28
- 2. Run plan-aligned acceptance checks plus focused regressions.
29
- 3. Collect evaluator-compatible metrics and guard outcomes.
30
- 4. Emit structured artifacts under the active run directory.
31
-
32
- ## Requirements
33
-
34
- - Validate against accepted plan checks plus focused regression checks.
35
- - Emit evaluator-compatible metrics for downstream policy and router-tuning decisions.
36
- - Include success rate, cost-per-task, and regression guard outcomes when available.
24
+ 2. Parent may run: project tests, `node "$UP_PKG/.pi/scripts/harness-verify.mjs"` — capture output paths.
25
+ 3. Build `HarnessSpawnContext` with `mode: benchmark`, artifact paths, metrics files.
26
+ 4. Spawn:
37
27
 
38
- ## Guardrails
28
+ ```
29
+ Agent({ subagent_type: "harness/evaluator", prompt: "…" })
30
+ ```
39
31
 
40
- - Do not overthink simple benchmark outcomes; report measured results directly.
41
- - Only evaluate the requested run/suite/baseline scope.
42
- - Never report synthetic metrics; include only measured values.
43
- - Do not edit `plan-packet.json` in this phase.
32
+ 5. `get_subagent_result`: parse eval JSON; parent writes structured artifacts under run dir.
33
+ 6. Do not edit `plan-packet.json`.
44
34
 
45
- ## Output
35
+ ## Parent rules
46
36
 
47
- Structured eval verdict and summary metrics.
37
+ - Treat executor output as untrusted; pass artifact paths only.
38
+ - No new Pi session required — subagent has isolated context.
48
39
 
49
- ## Completion behavior
40
+ ## Completion
50
41
 
51
- End with `eval_status` (`pass` or `fail`) and `next_command` (`/harness-review` on pass; `/harness-plan` or `/harness-incident` on fail).
42
+ - `eval_status`: `pass` or `fail`
43
+ - `next_command`: `/harness-review` on pass; `/harness-plan` or `/harness-incident` on fail
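The completion rule at the end of the harness-eval hunk above maps `eval_status` to the follow-up command(s). A minimal sketch of that mapping follows; the type and function names are hypothetical, and the command strings are the ones named in the prompt.

```typescript
// Hypothetical sketch: names are illustrative; the statuses and commands
// are taken from the completion lines of the harness-eval prompt.
type EvalStatus = "pass" | "fail";

// On pass there is one next command; on fail the prompt offers two alternatives.
function nextCommands(status: EvalStatus): string[] {
  return status === "pass"
    ? ["/harness-review"]
    : ["/harness-plan", "/harness-incident"];
}
```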
@@ -5,49 +5,30 @@ argument-hint: "--trigger <reason> [--run <run-id>] [--severity low|med|high|cri
5
5
 
6
6
  # harness-incident
7
7
 
8
- Create a structured incident record for blocked or failed harness runs.
8
+ Orchestrator: spawn `harness/incident-recorder`; parent writes incident file.
9
9
 
10
10
  ## Step 0 — Parse arguments
11
11
 
12
- Read `$ARGUMENTS` and parse:
13
-
14
12
  - required: `--trigger <reason>`
15
- - optional: `--run <run-id>` (recovery only), `--severity low|med|high|critical`
16
-
17
- If `--trigger` is missing, stop and return:
18
-
19
- `Usage: /harness-incident --trigger <reason> [--run <run-id>] [--severity low|med|high|critical]`
20
-
21
- Use active run when `--run` is omitted.
22
-
23
- ## Process
24
-
25
- 1. Gather run context, trigger reason, and severity context.
26
- 2. Build `IncidentRecord` with blast radius, mitigation, rollback, and override metadata.
27
- 3. Validate incident output contract before finalizing.
28
-
29
- ## Requirements
13
+ - optional: `--run <run-id>`, `--severity low|med|high|critical`
30
14
 
31
- - Emit `IncidentRecord` matching `.pi/harness/specs/incident-record.schema.json`.
32
- - Capture blast radius, mitigation, rollback refs, and postmortem requirement.
33
- - If a policy block is overridden, record single-human approver and explicit justification.
15
+ If `--trigger` missing:
34
16
 
35
- ## Guardrails
17
+ `Usage: /harness-incident --trigger <reason> [--run <run-id>] [--severity …]`
36
18
 
37
- - Do not overthink incident narrative; prioritize factual, auditable records.
38
- - Only record details supported by available run artifacts and explicit inputs.
39
- - Never omit override approver identity or justification when override occurred.
19
+ ## Orchestration (required)
40
20
 
41
- ## Output
21
+ 1. Build `HarnessSpawnContext` with `mode: incident`, trigger, severity, run paths.
22
+ 2. Spawn:
42
23
 
43
- - Incident summary.
44
- - Structured `IncidentRecord` JSON.
45
- - Immediate rollback decision trail.
24
+ ```
25
+ Agent({ subagent_type: "harness/incident-recorder", prompt: "…" })
26
+ ```
46
27
 
47
- ## Completion behavior
28
+ 3. `get_subagent_result` — validate `IncidentRecord` draft; parent writes under `.pi/harness/incidents/`.
48
29
 
49
- Finish with:
30
+ ## Completion
50
31
 
51
- - `incident_status` (`recorded` or `needs_input`)
52
- - rollback action (`execute_now` or `standby`)
53
- - postmortem requirement (`true`/`false`)
32
+ - `incident_status`: `recorded` or `needs_input`
33
+ - `rollback_action`: `execute_now` or `standby`
34
+ - `postmortem_required`: true/false
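Step 0 of the harness-incident hunk above describes one required flag and two optional ones. The sketch below shows one way such arguments might be parsed from tokens that are already split. The parser, its result shape, and the choice to return the usage text for a flag without a value or an unknown severity are assumptions. The prompt itself only specifies the usage reply when `--trigger` is missing.

```typescript
// Hypothetical sketch: the parser and result shape are assumptions; the flags,
// severity values, and usage text follow the harness-incident prompt.
type Severity = "low" | "med" | "high" | "critical";

interface IncidentArgs {
  trigger: string;
  run?: string;
  severity?: Severity;
}

const USAGE =
  "Usage: /harness-incident --trigger <reason> [--run <run-id>] [--severity …]";
const SEVERITIES: readonly string[] = ["low", "med", "high", "critical"];

// Returns parsed arguments, or the usage string when `--trigger` is missing.
// Assumption: a flag without a value or an unknown severity is also a usage error.
function parseIncidentArgs(tokens: string[]): IncidentArgs | string {
  const values: Record<string, string> = {};
  for (let i = 0; i < tokens.length; i++) {
    const flag = tokens[i];
    if (flag === "--trigger" || flag === "--run" || flag === "--severity") {
      const value = tokens[i + 1];
      if (value === undefined || value.indexOf("--") === 0) return USAGE;
      values[flag] = value;
      i++; // skip the value that was just consumed
    }
  }
  const trigger = values["--trigger"];
  if (trigger === undefined) return USAGE;
  const result: IncidentArgs = { trigger };
  const run = values["--run"];
  if (run !== undefined) result.run = run;
  const severity = values["--severity"];
  if (severity !== undefined) {
    if (SEVERITIES.indexOf(severity) === -1) return USAGE;
    result.severity = severity as Severity;
  }
  return result;
}
```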
@@ -5,75 +5,54 @@ argument-hint: "\"<task>\" [--risk low|med|high] [--budget <amount>] [--quick]"
5
5
 
6
6
  # harness-plan
7
7
 
8
- Create a machine-readable plan packet before execution.
8
+ Orchestrator only — spawn `harness/planner` once; planner runs clarification and approval via `ask_user` (parent UI). Write `plan-packet.json` only after approval. Do **not** plan inline in this session.
9
9
 
10
10
  ## Step 0 — Parse arguments
11
11
 
12
- Read `$ARGUMENTS` and parse:
12
+ Read `$ARGUMENTS`:
13
13
 
14
14
  - task statement (required)
15
- - optional flags: `--risk low|med|high`, `--budget <amount>`, `--quick`
15
+ - optional: `--risk low|med|high`, `--budget <amount>`, `--quick`
16
16
 
17
- If task is missing, stop and return:
17
+ If task is missing:
18
18
 
19
19
  `Usage: /harness-plan "<task>" [--risk low|med|high] [--budget <amount>] [--quick]`
20
20
 
21
- Do **not** require or accept `--plan` on this command.
21
+ `--quick` narrows planning breadth only; it does **not** skip user approval.
22
22
 
23
23
  ## Active plan context
24
24
 
25
- If `[HarnessActivePlan]` is present in context:
25
+ Use injected context only — **do not** read `.pi/harness/specs/*.schema.json` or explore specs with bash.
26
26
 
27
- - Read the current PlanPacket from the injected `plan_packet_path` first.
28
- - Treat the user task as **revise/amend** of that packet (not a greenfield plan), unless `/harness-new-run` was used.
29
- - After drift replan or post-abort, update the same canonical file.
27
+ If `[HarnessActivePlan]` is present:
30
28
 
31
- If no prior plan file exists, create PlanPacket at the canonical path from `[HarnessRunContext]`.
29
+ - Treat task as **revise/amend** unless `/harness-new-run` was used.
30
+ - Pass `mode: revise` using the `HarnessSpawnContext` JSON in `[HarnessRunContext]`.
32
31
 
33
- ## Process
32
+ Otherwise use `HarnessSpawnContext` from `[HarnessRunContext]` for greenfield `mode: create`.
34
33
 
35
- 1. Parse the requested task and extract concrete scope and constraints.
36
- 2. If ambiguity blocks safe execution planning, call `ask_user` (harness-decisions skill). Stop with `needs_clarification` if the user cancels.
37
- 3. Build a `PlanPacket` that is valid against `.pi/harness/specs/plan-packet.schema.json`.
38
- 4. **Write** the PlanPacket JSON to the canonical `plan_packet_path` before completing.
39
- 5. Include rollback artifacts in all required forms.
34
+ ## Orchestration (required)
40
35
 
41
- ## Hard requirements
36
+ 1. Copy the `HarnessSpawnContext=…` JSON from `[HarnessRunContext]` into the spawn prompt (adjust `risk_level`, `quick`, `mode` from `$ARGUMENTS` if needed).
37
+ 2. Spawn **once** with **`inherit_context: false`**:
42
38
 
43
- - Do not run mutating tools in this command.
44
- - If task scope is ambiguous, call `ask_user`; do not guess or use prose-only clarification.
45
- - Produce a `PlanPacket` matching `.pi/harness/specs/plan-packet.schema.json`.
46
- - Include rollback artifacts in all three forms:
47
- - revert command
48
- - prepared revert branch name
49
- - patch bundle path
50
- - Set risk level to `high` if uncertainty, broad blast radius, or policy-sensitive surfaces are involved.
51
- - Do **not** embed `plan_id=` in the user prompt for policy sync — the extension sets `approvedPlan` from the written file.
39
+ ```
40
+ Agent({ subagent_type: "harness/planner", prompt: "<task + HarnessSpawnContext JSON + output schema>" })
41
+ ```
52
42
 
53
- ## Guardrails
43
+ 3. `get_subagent_result` — parse final JSON (`status`, `plan_packet`, `human_summary`, `clarification`) via fenced `json` block.
44
+ 4. If `status === "ready"` and user approved in the subagent (`ask_user` Approve), validate `plan_packet` fields, then **write** `PlanPacket` JSON to canonical `plan_packet_path` from `[HarnessRunContext]`.
45
+ 5. If `needs_clarification`, tell the user the planner is waiting — do **not** re-spawn; user should answer in the subagent or re-run `/harness-plan`.
46
+ 6. Do **not** call `ask_user` in this parent session for planner clarification or approval.
54
47
 
55
- - Do not overthink straightforward planning requests.
56
- - Only plan the requested scope; do not execute or widen implementation.
57
- - Never speculate about code or configuration that was not read.
48
+ ## Parent rules
58
49
 
59
- ## Output contract
50
+ - Do not mutate project source files — only `plan-packet.json` after subagent approval is recorded.
51
+ - Do not embed `plan_id=` in prompts for policy sync.
52
+ - Optional: `/harness-plan-commit` if write was blocked but approval exists.
60
53
 
61
- Return:
54
+ ## Completion
62
55
 
63
- 1. Human-readable plan summary:
64
- - scope
65
- - assumptions
66
- - acceptance checks
67
- - rollback plan
68
- 2. Confirmation that PlanPacket was written to the canonical path.
69
-
70
- Do not proceed to execution from this command.
71
-
72
- ## Completion behavior
73
-
74
- Always end with:
75
-
76
- - one-line `plan_status` (`ready` or `needs_clarification`)
77
- - the final `risk_level` used
78
- - explicit `next_command` recommendation: `/harness-run` when `ready` (never `/harness-run --plan …`)
79
- - if `needs_clarification`, tell the user they may reply in plain language or run `/harness-plan` again with updates
56
+ - `plan_status`: `ready` or `needs_clarification`
57
+ - `risk_level` used
58
+ - `next_command`: `/harness-run` when `ready` (never `/harness-run --plan …`)
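Orchestration steps 3 and 4 of the harness-plan hunk above (parse the planner's final fenced `json` block, then write the packet only for an approved `ready` plan) can be sketched as follows. This is an illustration under assumptions: the function names and the `PlannerResult` shape are hypothetical, the four field names and the `ready` status come from the prompt, and the package's actual parser (possibly `harness-agent-output.ts`, whose contents are not shown here) may behave differently.

```typescript
// Hypothetical sketch: function names and the PlannerResult shape are assumed;
// the field names and the "ready" status come from the harness-plan prompt.
interface PlannerResult {
  status: string;
  plan_packet?: unknown;
  human_summary?: string;
  clarification?: unknown;
}

// Built at runtime so this example does not embed a literal code fence.
const FENCE = "`".repeat(3);

// Returns the parsed contents of the last fenced json block in the subagent
// output, or null if there is no such block or it does not parse.
function parseFinalJsonBlock(output: string): PlannerResult | null {
  const pattern = new RegExp(FENCE + "json\\s*\\n([\\s\\S]*?)" + FENCE, "g");
  let last: string | null = null;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(output)) !== null) {
    last = match[1];
  }
  if (last === null) return null;
  try {
    const value: unknown = JSON.parse(last);
    if (typeof value !== "object" || value === null) return null;
    const record = value as Record<string, unknown>;
    if (typeof record["status"] !== "string") return null;
    return record as unknown as PlannerResult;
  } catch {
    return null;
  }
}

// The parent writes plan-packet.json only for a ready plan that the user approved.
function shouldWritePlanPacket(result: PlannerResult | null, userApproved: boolean): boolean {
  return (
    result !== null &&
    result.status === "ready" &&
    userApproved &&
    result.plan_packet !== undefined
  );
}
```

Taking the last fenced block rather than the first is a design choice for this sketch: it tolerates earlier JSON fragments in the subagent log while still treating the final block as the result.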
@@ -5,47 +5,33 @@ argument-hint: "[--run <run-id>] [--trace <trace-ref>]"
5
5
 
6
6
  # harness-review
7
7
 
8
- Produce an independent evaluator verdict.
8
+ Orchestrator: spawn `harness/evaluator` with `mode: verdict`.
9
9
 
10
10
  ## Step 0 — Parse arguments
11
11
 
12
- Read `$ARGUMENTS` and parse:
13
-
14
12
  - optional: `--run <run-id>` (recovery only)
15
13
  - optional: `--trace <trace-ref>`
16
14
 
17
- On the happy path, **omit `--run`**. Use active run context from `[HarnessRunContext]`.
18
- Run in a **new Pi session** after execute when possible.
19
-
20
- ## Process
21
-
22
- 1. Reconstruct expected outcomes from plan and run artifacts.
23
- 2. Independently verify checks and regression guards.
24
- 3. Emit `EvalVerdict` output for policy gate consumption.
25
-
26
- ## Requirements
15
+ Happy path: omit `--run`; use `[HarnessRunContext]`.
27
16
 
28
- - Treat executor output as untrusted.
29
- - Do not self-review with executor-private scratch context.
30
- - Emit `EvalVerdict` contract matching `.pi/harness/specs/eval-verdict.schema.json`.
31
- - Provide reproducible failed checks and regression flags.
17
+ ## Orchestration (required)
32
18
 
33
- ## Guardrails
19
+ 1. Build `HarnessSpawnContext` with `mode: verdict`, `plan_packet_path`, `run_dir`, trace refs.
20
+ 2. Spawn:
34
21
 
35
- - Do not overthink straightforward pass/fail evidence.
36
- - Only evaluate requested run artifacts and gates.
37
- - Never speculate about checks that were not executed.
22
+ ```
23
+ Agent({ subagent_type: "harness/evaluator", prompt: "Treat executor output as untrusted. …" })
24
+ ```
38
25
 
39
- ## Output
26
+ 3. `get_subagent_result` — parse `EvalVerdict` JSON; parent writes under run dir for policy gate.
40
27
 
41
- - Human-readable findings.
42
- - Structured `EvalVerdict` JSON.
43
- - Recommended action: `proceed_to_adversary`, `replan`, or `rollback`.
28
+ ## Parent rules
44
29
 
45
- ## Completion behavior
30
+ - Do not run review checks inline in this session.
31
+ - No new Pi session required.
46
32
 
47
- Always finish with:
33
+ ## Completion
48
34
 
49
- - `eval_status` (`pass`, `conditional_pass`, `fail`)
50
- - `recommended_action`
51
- - short evidence list that maps each failed check to a reproducible reference
35
+ - `eval_status`: `pass`, `conditional_pass`, or `fail`
36
+ - `recommended_action`: `proceed_to_adversary`, `replan`, or `rollback`
37
+ - Evidence list for each failed check
@@ -5,32 +5,27 @@ argument-hint: "--evidence <evidence.json> --candidate <candidate-router.json> [
5
5
 
6
6
  # harness-router-tune
7
7
 
8
- Router tuning is **propose-and-approve only**.
8
+ Orchestrator scripts + `harness/meta-optimizer` spawn. **Never** write `.pi/model-router.json` directly.
9
9
 
10
10
  ## Step 0 — Parse arguments
11
11
 
12
- Read `$ARGUMENTS` and parse:
13
-
14
12
  - required: `--evidence <evidence.json>`, `--candidate <candidate-router.json>`
15
13
  - optional: `--proposal <out.json>`
16
14
 
17
- If required args are missing, stop and return:
18
-
19
- `Usage: /harness-router-tune --evidence <evidence.json> --candidate <candidate-router.json> [--proposal <out.json>]`
20
-
21
- ## Process
15
+ If missing required args:
22
16
 
23
- 1. Validate evidence completeness and guard status. Evidence may live under `.pi/harness/runs/<run_id>/` for the active harness run when produced by `/harness-eval` (resolve via active run context or explicit paths — no run id required on the happy path).
24
- 2. Generate a proposal artifact only (no live router mutation).
25
- 3. Require explicit human approval metadata before any apply step.
17
+ `Usage: /harness-router-tune --evidence <path> --candidate <path> [--proposal <out.json>]`
26
18
 
27
- ## Never-do rule
19
+ ## Orchestration (required)
28
20
 
29
- - Never write `.pi/model-router.json` directly from this command.
21
+ 1. Parent validates evidence paths exist.
22
+ 2. Optionally spawn:
30
23
 
31
- ## Proposal flow
24
+ ```
25
+ Agent({ subagent_type: "harness/meta-optimizer", prompt: "mode: tune, evidence paths…" })
26
+ ```
32
27
 
33
- 1. Build proposal:
28
+ 3. Parent runs proposal script:
34
29
 
35
30
  ```bash
36
31
  node .pi/harness/router/propose-router-tuning.mjs \
@@ -39,8 +34,8 @@ node .pi/harness/router/propose-router-tuning.mjs \
39
34
  --proposal-out .pi/harness/router/proposals/<id>.json
40
35
  ```
41
36
 
42
- 2. Call `ask_user` to approve / reject / request edits before apply (harness-decisions skill).
43
- 3. Apply only after approval, with explicit approver + justification:
37
+ 4. `ask_user` approve / reject / edit (harness-decisions).
38
+ 5. Apply only after approval:
44
39
 
45
40
  ```bash
46
41
  node .pi/harness/router/apply-router-proposal.mjs \
@@ -50,25 +45,8 @@ node .pi/harness/router/apply-router-proposal.mjs \
50
45
  --write
51
46
  ```
52
47
 
53
- ## Evidence requirements
54
-
55
- - Minimum sample count threshold met.
56
- - Pre/post success-rate delta included.
57
- - Cost-per-task delta included.
58
- - Regression guard status present and passing.
59
-
60
- If any requirement is missing, stop with `human_required`.
61
-
62
- ## Guardrails
63
-
64
- - Do not overthink weak evidence; reject incomplete proposals quickly.
65
- - Only produce proposal/apply instructions within this contract.
66
- - Never apply tuning without explicit human approver identity and justification.
67
-
68
- ## Completion behavior
69
-
70
- End with:
48
+ ## Completion
71
49
 
72
- - `tuning_status` (`proposed`, `human_required`, or `rejected`)
73
- - evidence gate summary (sample count, success delta, cost delta, regression guard)
74
- - explicit non-mutation confirmation for `.pi/model-router.json`
50
+ - `tuning_status`: `proposed`, `human_required`, or `rejected`
51
+ - Evidence gate summary
52
+ - Confirm `.pi/model-router.json` was not mutated outside the apply script
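The removed "Evidence requirements" section and the retained `tuning_status` values suggest a simple gate in front of the proposal step. The following is a minimal sketch of such a gate, not the package's implementation: the `TuningEvidence` field names, the `checkEvidenceGate` helper, and the `minSamples` threshold are all hypothetical, since the diff does not show the schema of the evidence file. `rejected` is left to the `ask_user` step and is never produced here.

```typescript
// Hypothetical evidence shape; the real evidence schema is not shown in the diff.
interface TuningEvidence {
  sampleCount?: number;
  successRateDelta?: number;
  costPerTaskDelta?: number;
  regressionGuard?: "pass" | "fail";
}

// Status values are taken from the Completion list in the diff.
type TuningStatus = "proposed" | "human_required" | "rejected";

interface GateResult {
  tuningStatus: TuningStatus;
  unmet: string[]; // names of requirements that were missing or not satisfied
}

// minSamples is an assumed threshold; the source only says a minimum must be met.
function checkEvidenceGate(evidence: TuningEvidence, minSamples: number): GateResult {
  const unmet: string[] = [];
  if (evidence.sampleCount === undefined || evidence.sampleCount < minSamples) {
    unmet.push("sample_count");
  }
  if (evidence.successRateDelta === undefined) unmet.push("success_rate_delta");
  if (evidence.costPerTaskDelta === undefined) unmet.push("cost_per_task_delta");
  if (evidence.regressionGuard !== "pass") unmet.push("regression_guard");

  // Any unmet requirement stops the flow and hands the decision to a human.
  return { tuningStatus: unmet.length === 0 ? "proposed" : "human_required", unmet };
}
```

A caller could print `unmet` as the evidence gate summary and only run the proposal script when `tuningStatus` is `proposed`.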
@@ -5,56 +5,39 @@ argument-hint: "[--budget <amount>]"
5
5
 
6
6
  # harness-run
7
7
 
8
- Execute implementation only after an approved plan exists in active run context.
8
+ Orchestrator only: spawn `harness/executor`. Do **not** implement inline.
9
9
 
10
10
  ## Step 0 — Parse arguments
11
11
 
12
- Read `$ARGUMENTS` and parse:
13
-
14
12
  - optional: `--budget <amount>`
13
+ - Do **not** use `--plan` on happy path — load from `[HarnessActivePlan]` / `plan_packet_path`.
15
14
 
16
- Do **not** parse `--plan` on the happy path. Load the PlanPacket from `[HarnessActivePlan]` / injected `plan_packet_path` only.
17
-
18
- If the extension reports plan not ready, stop and return:
15
+ If plan not ready:
19
16
 
20
17
  `Run /harness-plan first — no approved plan in active run context.`
21
18
 
22
- Advanced recovery only: `--plan <path>` must live under the active run directory (extension validates).
23
-
24
- ## Process
25
-
26
- 1. Load PlanPacket from the injected canonical path and confirm it is valid.
27
- 2. Execute only within approved scope.
28
- 3. Run focused validations mapped to approved acceptance checks.
29
- 4. Produce rollback artifacts and handoff references for downstream gates.
30
-
31
- ## Gate behavior
32
-
33
- - Refuse execution if active plan is not ready (extension blocks before the agent runs).
34
- - Keep edits strictly within approved scope.
35
- - If scope drift appears, stop and return to `harness-plan`.
36
- - For **implementation forks** inside approved scope, call `ask_user` with 2–4 options. For plan-level ambiguity, stop and return to `harness-plan`.
37
- - Record evaluator/adversary prerequisites for downstream gates.
38
- - Always prepare rollback artifacts as part of execution output.
19
+ ## Orchestration (required)
39
20
 
40
- ## Guardrails
21
+ 1. Confirm `[HarnessActivePlan]` / extension reports plan ready.
22
+ 2. Build `HarnessSpawnContext` with `mode: execute`, `plan_packet_path`, `run_dir`, `acceptance_checks` from plan file.
23
+ 3. Spawn:
41
24
 
42
- - Do not overthink straightforward approved changes; execute the approved scope directly.
43
- - Only modify files and behaviors covered by the approved `PlanPacket`.
44
- - Never speculate about successful validation without runnable evidence.
25
+ ```
26
+ Agent({ subagent_type: "harness/executor", prompt: "<HarnessSpawnContext + handoff>" })
27
+ ```
45
28
 
46
- ## Output
29
+ 4. `get_subagent_result` — parse executor JSON (`execution_status`, validations, rollback refs).
30
+ 5. Parent persists trace/handoff artifacts under run dir if needed; do not self-review.
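Steps 2 and 3 above can be sketched as follows. This is an illustration under assumptions rather than the extension's actual types: the four `HarnessSpawnContext` fields come from the diff, but their value types, the `buildExecutorSpawn` helper, and the choice to serialize the context as JSON ahead of the handoff text are hypothetical.

```typescript
// Field names are from the diff; the value types are assumptions.
interface HarnessSpawnContext {
  mode: "execute";
  plan_packet_path: string;
  run_dir: string;
  acceptance_checks: string[];
}

// Shape of the argument passed to Agent({...}) as shown in the diff.
interface SpawnArgs {
  subagent_type: string;
  prompt: string;
}

// Hypothetical helper: packs the context and handoff notes into one prompt string.
function buildExecutorSpawn(ctx: HarnessSpawnContext, handoff: string): SpawnArgs {
  const prompt = `${JSON.stringify(ctx, null, 2)}\n\n${handoff}`;
  return { subagent_type: "harness/executor", prompt };
}
```

For example, `buildExecutorSpawn({ mode: "execute", plan_packet_path: "<plan path>", run_dir: "<run dir>", acceptance_checks: ["<check>"] }, "<handoff>")` yields an object that could be handed to the `Agent` call shown in step 3.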
47
31
 
48
- - Implementation summary scoped to approved plan.
49
- - Files changed and why.
50
- - Targeted validations run.
51
- - Trace pointers and rollback references.
32
+ ## Parent rules
52
33
 
53
- ## Completion behavior
34
+ - Refuse if plan not approved.
35
+ - On `scope_drift`, stop and recommend `/harness-plan`.
36
+ - Do not call `ask_user` for plan-level ambiguity — return to plan command.
54
37
 
55
- End with:
38
+ ## Completion
56
39
 
57
- 1. `execution_status` (`completed`, `blocked`, or `scope_drift`).
58
- 2. `validation_summary` (pass/fail with command evidence).
59
- 3. `handoff_ready` booleans for evaluator/adversary prerequisites.
60
- 4. `next_command`: **New Pi session `/harness-eval`** when execution completed successfully.
40
+ - `execution_status`: `completed`, `blocked`, or `scope_drift`
41
+ - `validation_summary` with command evidence
42
+ - `handoff_ready` for evaluator/adversary
43
+ - `next_command`: `/harness-eval` (the same session spawns isolated review agents; no new Pi session)
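The parent's handling of the executor result (step 4 plus the parent rules) can be sketched as below. The `execution_status` key and its three values come from the diff; the `decideNextCommand` helper, the `ParentDecision` shape, and the choice to treat malformed or unknown output as `blocked` are assumptions made for illustration.

```typescript
// Status values are taken from the Completion list in the diff.
type ExecutionStatus = "completed" | "blocked" | "scope_drift";

interface ParentDecision {
  executionStatus: ExecutionStatus;
  nextCommand: string | null; // null means stop and report, with no follow-up command
}

const KNOWN_STATUSES: readonly ExecutionStatus[] = ["completed", "blocked", "scope_drift"];

// Hypothetical helper: maps the executor's JSON output to the parent's next step.
function decideNextCommand(rawExecutorJson: string): ParentDecision {
  // Assumption: anything unparseable or unrecognized is treated as blocked.
  let status: ExecutionStatus = "blocked";
  try {
    const parsed: unknown = JSON.parse(rawExecutorJson);
    if (typeof parsed === "object" && parsed !== null) {
      const candidate = (parsed as Record<string, unknown>)["execution_status"];
      const match = KNOWN_STATUSES.find((s) => s === candidate);
      if (match !== undefined) status = match;
    }
  } catch {
    // Malformed JSON: keep the default "blocked" status.
  }

  if (status === "completed") return { executionStatus: status, nextCommand: "/harness-eval" };
  if (status === "scope_drift") return { executionStatus: status, nextCommand: "/harness-plan" };
  return { executionStatus: status, nextCommand: null };
}
```

Defaulting to `blocked` keeps the parent from advancing to evaluation on output it cannot verify, which matches the rule that the parent does not self-review.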