ultimate-pi 0.6.1 → 0.8.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (117)
  1. package/.agents/skills/harness-decisions/SKILL.md +20 -1
  2. package/.agents/skills/harness-eval/SKILL.md +11 -13
  3. package/.agents/skills/harness-orchestration/SKILL.md +36 -30
  4. package/.agents/skills/harness-plan/SKILL.md +13 -14
  5. package/.agents/skills/harness-sentrux-setup/SKILL.md +3 -4
  6. package/.pi/PACKAGING.md +1 -1
  7. package/.pi/agents/harness/adversary.md +20 -12
  8. package/.pi/agents/harness/evaluator.md +25 -14
  9. package/.pi/agents/harness/executor.md +27 -16
  10. package/.pi/agents/harness/incident-recorder.md +37 -0
  11. package/.pi/agents/harness/meta-optimizer.md +18 -15
  12. package/.pi/agents/harness/planner.md +27 -30
  13. package/.pi/agents/harness/tie-breaker.md +4 -2
  14. package/.pi/agents/harness/trace-librarian.md +18 -11
  15. package/.pi/agents/pi-pi/ext-expert.md +1 -1
  16. package/.pi/agents/pi-pi/keybinding-expert.md +1 -1
  17. package/.pi/agents/pi-pi/tui-expert.md +3 -3
  18. package/.pi/extensions/00-ultimate-pi-system-prompt.ts +194 -0
  19. package/.pi/extensions/budget-guard.ts +11 -3
  20. package/.pi/extensions/custom-footer.ts +8 -3
  21. package/.pi/extensions/custom-header.ts +2 -2
  22. package/.pi/extensions/debate-orchestrator.ts +11 -3
  23. package/.pi/extensions/dotenv-loader.ts +1 -1
  24. package/.pi/extensions/drift-monitor.ts +1 -1
  25. package/.pi/extensions/harness-ask-user.ts +1 -1
  26. package/.pi/extensions/harness-live-widget.ts +11 -4
  27. package/.pi/extensions/harness-run-context.ts +745 -0
  28. package/.pi/extensions/harness-telemetry.ts +1 -1
  29. package/.pi/extensions/harness-web-guard.ts +1 -1
  30. package/.pi/extensions/harness-web-tools.ts +1 -1
  31. package/.pi/extensions/lib/ask-user/dialog.ts +2 -2
  32. package/.pi/extensions/lib/ask-user/fallback.ts +1 -1
  33. package/.pi/extensions/lib/ask-user/render.ts +3 -3
  34. package/.pi/extensions/lib/harness-subagents/agent-loader.ts +1 -1
  35. package/.pi/extensions/lib/harness-subagents/agent-parser.ts +1 -1
  36. package/.pi/extensions/lib/harness-subagents/blackboard-tool.ts +1 -1
  37. package/.pi/extensions/lib/harness-subagents/harness-subagent-policy.ts +134 -0
  38. package/.pi/extensions/lib/harness-subagents/vendored/agent-manager.ts +2 -2
  39. package/.pi/extensions/lib/harness-subagents/vendored/agent-runner.ts +9 -5
  40. package/.pi/extensions/lib/harness-subagents/vendored/context.ts +1 -1
  41. package/.pi/extensions/lib/harness-subagents/vendored/env.ts +1 -1
  42. package/.pi/extensions/lib/harness-subagents/vendored/index.ts +2 -2
  43. package/.pi/extensions/lib/harness-subagents/vendored/output-file.ts +1 -1
  44. package/.pi/extensions/lib/harness-subagents/vendored/schedule.ts +1 -1
  45. package/.pi/extensions/lib/harness-subagents/vendored/settings.ts +1 -1
  46. package/.pi/extensions/lib/harness-subagents/vendored/skill-loader.ts +1 -1
  47. package/.pi/extensions/lib/harness-subagents/vendored/types.ts +2 -2
  48. package/.pi/extensions/lib/harness-subagents/vendored/ui/agent-widget.ts +1 -1
  49. package/.pi/extensions/lib/harness-subagents/vendored/ui/conversation-viewer.ts +2 -2
  50. package/.pi/extensions/lib/harness-subagents/vendored/ui/schedule-menu.ts +1 -1
  51. package/.pi/extensions/observation-bus.ts +8 -10
  52. package/.pi/extensions/pi-model-router-harness.ts +1 -1
  53. package/.pi/extensions/policy-gate.ts +136 -84
  54. package/.pi/extensions/provider-payload-sanitize.ts +1 -1
  55. package/.pi/extensions/review-integrity.ts +76 -22
  56. package/.pi/extensions/sentrux-rules-sync.ts +1 -1
  57. package/.pi/extensions/soundboard.ts +1 -1
  58. package/.pi/extensions/test-diff-integrity.ts +1 -1
  59. package/.pi/extensions/trace-recorder.ts +81 -21
  60. package/.pi/extensions/ultimate-pi-vcc.ts +1 -1
  61. package/.pi/harness/README.md +2 -0
  62. package/.pi/harness/agents.manifest.json +17 -13
  63. package/.pi/harness/docs/adrs/0009-sentrux-rules-lifecycle.md +1 -1
  64. package/.pi/harness/docs/adrs/0031-harness-run-context.md +41 -0
  65. package/.pi/harness/docs/adrs/0032-harness-command-orchestration.md +37 -0
  66. package/.pi/harness/docs/adrs/README.md +2 -0
  67. package/.pi/harness/evals/smoke/run-context.fixture.json +17 -0
  68. package/.pi/harness/specs/harness-run-context.schema.json +80 -0
  69. package/.pi/harness/specs/harness-spawn-context.schema.json +65 -0
  70. package/.pi/lib/harness-agent-output.ts +41 -0
  71. package/.pi/lib/harness-run-context.ts +1139 -0
  72. package/.pi/lib/harness-ui-state.ts +12 -1
  73. package/.pi/prompts/harness-abort.md +9 -6
  74. package/.pi/prompts/harness-auto.md +36 -61
  75. package/.pi/prompts/harness-critic.md +17 -32
  76. package/.pi/prompts/harness-eval.md +22 -30
  77. package/.pi/prompts/harness-incident.md +17 -34
  78. package/.pi/prompts/harness-plan.md +32 -36
  79. package/.pi/prompts/harness-review.md +18 -33
  80. package/.pi/prompts/harness-router-tune.md +16 -38
  81. package/.pi/prompts/harness-run.md +23 -40
  82. package/.pi/prompts/harness-setup.md +7 -27
  83. package/.pi/prompts/harness-trace.md +15 -34
  84. package/.pi/scripts/harness-generate-model-router.mjs +16 -13
  85. package/.pi/scripts/harness-verify.mjs +34 -0
  86. package/.pi/scripts/vendor-sync-pi-model-router.sh +10 -10
  87. package/CHANGELOG.md +34 -1
  88. package/README.md +31 -15
  89. package/THIRD_PARTY_NOTICES.md +1 -1
  90. package/package.json +14 -9
  91. package/vendor/pi-model-router/UPSTREAM_PIN.md +1 -1
  92. package/vendor/pi-model-router/extensions/commands.ts +2 -2
  93. package/vendor/pi-model-router/extensions/config.ts +2 -2
  94. package/vendor/pi-model-router/extensions/index.ts +1 -1
  95. package/vendor/pi-model-router/extensions/provider.ts +2 -2
  96. package/vendor/pi-model-router/extensions/routing.ts +2 -2
  97. package/vendor/pi-model-router/extensions/types.ts +1 -1
  98. package/vendor/pi-model-router/extensions/ui.ts +1 -1
  99. package/vendor/pi-model-router/package.json +4 -4
  100. package/vendor/pi-vcc/index.ts +1 -1
  101. package/vendor/pi-vcc/package.json +1 -1
  102. package/vendor/pi-vcc/src/commands/pi-vcc.ts +1 -1
  103. package/vendor/pi-vcc/src/commands/vcc-recall.ts +1 -1
  104. package/vendor/pi-vcc/src/core/content.ts +1 -1
  105. package/vendor/pi-vcc/src/core/load-messages.ts +1 -1
  106. package/vendor/pi-vcc/src/core/normalize.ts +1 -1
  107. package/vendor/pi-vcc/src/core/render-entries.ts +1 -1
  108. package/vendor/pi-vcc/src/core/report.ts +1 -1
  109. package/vendor/pi-vcc/src/core/search-entries.ts +1 -1
  110. package/vendor/pi-vcc/src/core/summarize.ts +1 -1
  111. package/vendor/pi-vcc/src/hooks/before-compact.ts +2 -2
  112. package/vendor/pi-vcc/src/tools/recall.ts +1 -1
  113. package/vendor/pi-vcc/src/types.ts +1 -1
  114. package/vendor/pi-vcc/tests/fixtures.ts +1 -1
  115. package/vendor/pi-vcc/tests/render-entries.test.ts +1 -1
  116. package/vendor/pi-vcc/tests/search-entries.test.ts +1 -1
  117. package/vendor/pi-vcc/tests/support/load-session.ts +2 -2
package/.pi/lib/harness-ui-state.ts
@@ -1,4 +1,4 @@
1
- import type { ExtensionContext } from "@mariozechner/pi-coding-agent";
1
+ import type { ExtensionContext } from "@earendil-works/pi-coding-agent";
2
2
 
3
3
  export type HarnessPhase =
4
4
  | "plan"
@@ -97,6 +97,7 @@ export interface HarnessUiState {
97
97
  testIntegrity: number | null;
98
98
  };
99
99
  traceRunId: string | null;
100
+ nextRecommendedCommand: string | null;
100
101
  }
101
102
 
102
103
  const DEFAULT_STATE: HarnessUiState = {
@@ -123,6 +124,7 @@ const DEFAULT_STATE: HarnessUiState = {
123
124
  testIntegrity: null,
124
125
  },
125
126
  traceRunId: null,
127
+ nextRecommendedCommand: null,
126
128
  };
127
129
 
128
130
  const RELEVANT_CUSTOM_TYPES = new Set([
@@ -135,6 +137,7 @@ const RELEVANT_CUSTOM_TYPES = new Set([
135
137
  "harness-test-integrity-flag",
136
138
  "harness-run-trace",
137
139
  "harness-trace-state",
140
+ "harness-run-context",
138
141
  ]);
139
142
 
140
143
  function asNumber(value: unknown): number | null {
@@ -284,6 +287,14 @@ function createStateFromEntries(entries: unknown[]): HarnessUiState {
284
287
  ? traceState.run_id
285
288
  : null;
286
289
 
290
+ const runCtx = latest.get("harness-run-context") as
291
+ | { next_recommended_command?: string }
292
+ | undefined;
293
+ state.nextRecommendedCommand =
294
+ typeof runCtx?.next_recommended_command === "string"
295
+ ? runCtx.next_recommended_command
296
+ : null;
297
+
287
298
  state.flowSubstate = deriveFlowSubstate(state);
288
299
  return state;
289
300
  }
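
The last hunk above reads `next_recommended_command` from the most recent `harness-run-context` entry and falls back to `null` when the field is missing or is not a string. Below is a minimal standalone sketch of that guard, assuming `latest` behaves like a `Map` keyed by custom entry type. The helper name `readNextRecommendedCommand` and the `RunContextEntry` type are hypothetical and are not part of the package.

```typescript
// Hypothetical shape: only the one field the UI state cares about.
type RunContextEntry = { next_recommended_command?: unknown };

// Hypothetical helper mirroring the guard added to createStateFromEntries.
function readNextRecommendedCommand(
  latest: Map<string, unknown>,
): string | null {
  const runCtx = latest.get("harness-run-context") as
    | RunContextEntry
    | undefined;
  // Anything other than a string (missing entry, missing field, wrong type)
  // collapses to null, matching the DEFAULT_STATE value.
  return typeof runCtx?.next_recommended_command === "string"
    ? runCtx.next_recommended_command
    : null;
}
```

Collapsing every malformed case to `null` keeps consumers of `HarnessUiState` from having to distinguish "no run context yet" from "run context without a recommendation".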
package/.pi/prompts/harness-abort.md
@@ -13,8 +13,9 @@ Safely abort the current harness run in this session.
13
13
  - `phase: plan`
14
14
  - `approvedPlan: false`
15
15
  - `planId: null`
16
- - records abort metadata for observability.
17
- - enables a hard safety lock that blocks mutating tools until a new approved plan is attached.
16
+ - clears active run `plan_ready` (plan files may remain on disk for forensics)
17
+ - records abort metadata for observability
18
+ - enables a hard safety lock that blocks mutating tools until a new approved plan is attached
18
19
 
19
20
  ## Usage
20
21
 
@@ -27,8 +28,8 @@ Examples:
27
28
 
28
29
  ## Safety guarantees
29
30
 
30
- - no mutating work should continue under the previous run context.
31
- - a fresh approved plan is required before mutation can resume.
31
+ - no mutating work should continue under the previous run context
32
+ - a fresh approved plan is required before mutation can resume
32
33
 
33
34
  ## Next step
34
35
 
@@ -36,6 +37,8 @@ Run:
36
37
 
37
38
  `/harness-plan "<task>"`
38
39
 
39
- Then proceed with:
40
+ Then:
40
41
 
41
- `/harness-run --plan <path-to-plan-packet.json>`
42
+ `/harness-run`
43
+
44
+ (No `--plan` or run id required — the harness restores active context after replan.)
package/.pi/prompts/harness-auto.md
@@ -5,79 +5,54 @@ argument-hint: "\"<task>\" [--quick] [--risk low|med|high] [--budget <amount>]"
5
5
 
6
6
  # harness-auto
7
7
 
8
- Run full harness flow in one command:
9
-
10
- `plan -> execute -> evaluate -> adversary -> severity-policy decision -> commit+PR (no auto-merge)`
8
+ Pipeline orchestrator — one session, sequential `Agent` spawns. Invoke **harness-orchestration** skill for agent IDs. Do **not** implement or review inline.
11
9
 
12
10
  ## Step 0 — Parse arguments
13
11
 
14
- Read `$ARGUMENTS` and normalize:
15
-
16
- - required task: quoted or unquoted first value
17
- - optional flags: `--quick`, `--risk low|med|high`, `--budget <amount>`
12
+ - required task (quoted or first token)
13
+ - optional: `--quick`, `--risk`, `--budget`
18
14
 
19
- If task is missing, stop and return:
15
+ If task missing:
20
16
 
21
17
  `Usage: /harness-auto "<task>" [--quick] [--risk low|med|high] [--budget <amount>]`
22
18
 
23
- ## Process contract
24
-
25
- 1. Build and approve plan packet before any mutation.
26
- 2. Execute only approved scope with rollback artifacts.
27
- 3. Run independent evaluator then adversarial reviewer.
28
- 4. Apply severity policy + strict pre-PR gates.
29
- 5. If gates pass, auto-commit and open PR; never auto-merge.
30
-
31
- ## Locked decisions (must not be changed)
32
-
33
- - Always produce a plan packet before mutation.
34
- - Adversarial review is always required.
35
- - Merge blocking authority is severity-policy-engine.
36
- - Router tuning is propose-and-approve only.
37
- - Plan ambiguity must use `ask_user` (harness-decisions skill) — no silent guessing.
38
- - Rollback artifact must be revert-commit-ready and include:
39
- - revert command
40
- - prepared revert branch
41
- - patch bundle
42
- - Debate profile is aggressive with locked confidence weights:
43
- - claim_quality=0.20
44
- - reproducibility=0.40
45
- - agreement=0.40
46
- - Strict pre-PR gate is mandatory.
47
- - Post-pass behavior is auto-commit and auto-open-PR.
48
- - Never auto-merge PR.
49
-
50
- ## Guardrails
51
-
52
- - Do not overthink straightforward gate outcomes; enforce gates deterministically.
53
- - Only follow the locked pipeline and governance decisions listed here.
54
- - Never bypass mandatory safety gates, even in `--quick` mode.
19
+ ## Orchestration (required) — same session
55
20
 
56
- ## Strict gates
21
+ 1. **Plan** — spawn `harness/planner` → parse JSON → present full plan → `ask_user` Approve/Changes/Cancel → write `plan-packet.json` only on Approve (advances phase via policy-gate).
22
+ 2. **Execute** — spawn `harness/executor` with `HarnessSpawnContext` (`mode: execute`). Summarize handoff bullets for next spawn (do not paste full subagent log).
23
+ 3. **Eval** — spawn `harness/evaluator` (`mode: benchmark`) after parent scripts if needed.
24
+ 4. **Review** — spawn `harness/evaluator` (`mode: verdict`) OR rely on eval verdict if policy allows — prefer both when strict gates require.
25
+ 5. **Adversary** — spawn `harness/adversary` with artifact paths.
26
+ 6. **Tie-breaker** — spawn `harness/tie-breaker` only if debate unresolved.
27
+ 7. **Parent** — apply locked strict gates below; commit/PR only if all pass.
57
28
 
58
- Block commit/PR if any gate fails:
29
+ No new Pi session for review — subagents use isolated context (`inherit_context: false`).
59
30
 
60
- 1. Plan gate passed.
61
- 2. Execution completed within approved scope.
62
- 3. Independent evaluator passed.
63
- 4. Adversarial review completed with consensus packet.
64
- 5. Severity-policy-engine output is `pass` or `conditional_pass`.
65
- 6. Benchmark delta checks passed.
66
- 7. Rollback artifacts generated.
31
+ ## Locked decisions (do not change)
67
32
 
68
- ## Notes
33
+ - Always produce and approve plan before mutation.
34
+ - Adversarial review always required.
35
+ - Severity-policy-engine blocks merge.
36
+ - Router tuning propose-and-approve only.
37
+ - Plan ambiguity → parent `ask_user` (harness-decisions).
38
+ - Rollback artifacts: revert command, revert branch, patch bundle.
39
+ - Debate weights: claim_quality=0.20, reproducibility=0.40, agreement=0.40.
40
+ - Strict pre-PR gate mandatory; auto-commit + open PR; never auto-merge.
69
41
 
70
- - `--quick` may reduce breadth, never safety gates.
71
- - `--risk` can tighten behavior, never disable adversary.
72
- - If risk/ambiguity is high, auto-fallback to manual `harness-plan` and use `ask_user` for blocking forks.
73
- - If execution must be interrupted safely, run `/harness-abort [reason]`, then restart with `/harness-plan "<task>"`.
74
- - Always output trace bundle ID and incident/rollback references.
42
+ ## Strict gates
43
+
44
+ Block commit/PR if any fails: plan gate, execution in scope, evaluator pass, adversary complete, severity-policy pass/conditional_pass, benchmark deltas, rollback artifacts.
45
+
46
+ ## Notes
75
47
 
76
- ## Completion behavior
48
+ - `--quick` reduces breadth, never safety gates.
49
+ - High risk/ambiguity → stop and recommend manual `/harness-plan` with `ask_user`.
50
+ - Interrupt: `/harness-abort [reason]` then `/harness-plan`.
51
+ - Artifact refs under active run dir; `/harness-run-status` or `/harness-trace-last` for handoff.
77
52
 
78
- End with a deterministic handoff block:
53
+ ## Completion
79
54
 
80
- 1. `Pipeline status` (pass/fail per strict gate).
81
- 2. `Trace bundle` and artifact references (`plan`, `eval`, `adversary`, `consensus`, `rollback`).
82
- 3. `Policy outcome` (`pass`, `conditional_pass`, `block`, or `human_required`) with one-line rationale.
83
- 4. `Next action` (open PR, replan, rollback, or human override path).
55
+ 1. Pipeline status per gate
56
+ 2. Artifact references
57
+ 3. Policy outcome: `pass`, `conditional_pass`, `block`, or `human_required`
58
+ 4. Next action (PR, replan, rollback, override)
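
The new "Strict gates" line in the hunk above lists seven conditions, any one of which blocks commit/PR. Below is a minimal sketch of that check as a pure function. All type and field names are hypothetical; only the gate list and the `pass`/`conditional_pass` rule come from the prompt text.

```typescript
type PolicyOutcome = "pass" | "conditional_pass" | "block" | "human_required";

// Hypothetical input shape: one field per gate named in the prompt.
interface GateInputs {
  planGatePassed: boolean;
  executionInScope: boolean;
  evaluatorPassed: boolean;
  adversaryComplete: boolean;
  severityPolicy: PolicyOutcome;
  benchmarkDeltasPassed: boolean;
  rollbackArtifactsGenerated: boolean;
}

// Returns the names of failed gates; an empty array means commit/PR may proceed.
function failedGates(g: GateInputs): string[] {
  const checks: Array<[string, boolean]> = [
    ["plan gate", g.planGatePassed],
    ["execution in scope", g.executionInScope],
    ["evaluator pass", g.evaluatorPassed],
    ["adversary complete", g.adversaryComplete],
    // Only pass and conditional_pass satisfy the severity-policy gate.
    [
      "severity-policy",
      g.severityPolicy === "pass" || g.severityPolicy === "conditional_pass",
    ],
    ["benchmark deltas", g.benchmarkDeltasPassed],
    ["rollback artifacts", g.rollbackArtifactsGenerated],
  ];
  return checks.filter(([, ok]) => !ok).map(([name]) => name);
}
```

Returning the failed gate names rather than a single boolean lines up with the "Pipeline status per gate" item in the Completion list.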
package/.pi/prompts/harness-critic.md
@@ -1,52 +1,37 @@
1
1
  ---
2
2
  description: Adversarial reviewer command with reproducible, merge-blocking findings.
3
- argument-hint: "--run <run-id> [--trace <trace-ref>] [--risk low|med|high]"
3
+ argument-hint: "[--run <run-id>] [--trace <trace-ref>] [--risk low|med|high]"
4
4
  ---
5
5
 
6
6
  # harness-critic
7
7
 
8
- Run adversarial review against the candidate result.
8
+ Orchestrator spawn `harness/adversary`.
9
9
 
10
10
  ## Step 0 — Parse arguments
11
11
 
12
- Read `$ARGUMENTS` and parse:
13
-
14
- - required: `--run <run-id>`
12
+ - optional: `--run <run-id>` (recovery only)
15
13
  - optional: `--trace <trace-ref>`, `--risk low|med|high`
16
14
 
17
- If `--run` is missing, stop and return:
18
-
19
- `Usage: /harness-critic --run <run-id> [--trace <trace-ref>] [--risk low|med|high]`
20
-
21
- ## Process
22
-
23
- 1. Assume hidden regressions exist and identify likely fault surfaces.
24
- 2. Challenge evaluator/executor assumptions with reproducible probes.
25
- 3. Emit structured adversarial findings for severity policy consumption.
26
-
27
- ## Requirements
15
+ Happy path: omit `--run`.
28
16
 
29
- - Assume hidden regressions exist until disproven.
30
- - Attempt to invalidate evaluator assumptions with concrete evidence.
31
- - Emit `AdversaryReport` matching `.pi/harness/specs/adversary-report.schema.json`.
32
- - Flag `block_merge=true` for high-confidence correctness/security/test-integrity risks.
17
+ ## Orchestration (required)
33
18
 
34
- ## Guardrails
19
+ 1. Build `HarnessSpawnContext` with `mode: adversary`, run artifacts, plan path, trace refs.
20
+ 2. Spawn:
35
21
 
36
- - Do not overthink speculative attacks; prioritize reproducible findings.
37
- - Only report risks tied to candidate behavior and gate policy.
38
- - Never claim a defect without evidence and repro steps.
22
+ ```
23
+ Agent({ subagent_type: "harness/adversary", prompt: "…" })
24
+ ```
39
25
 
40
- ## Output
26
+ 3. `get_subagent_result` — parse `AdversaryReport` JSON; parent persists for severity policy.
41
27
 
42
- - Prioritized findings with repro steps.
43
- - Structured `AdversaryReport` JSON.
44
- - Clear merge-block recommendation.
28
+ ## Parent rules
45
29
 
46
- ## Completion behavior
30
+ - Assume hidden regressions until disproven (in subagent).
31
+ - No new Pi session required.
47
32
 
48
- Always end with:
33
+ ## Completion
49
34
 
50
35
  - `block_merge` decision
51
- - top 1-3 high-confidence findings with repro pointers
52
- - explicit recommendation (`proceed`, `conditional_pass`, or `block`)
36
+ - Top findings with repro pointers
37
+ - `recommendation`: `proceed`, `conditional_pass`, or `block`
package/.pi/prompts/harness-eval.md
@@ -1,51 +1,43 @@
1
1
  ---
2
2
  description: Run focused benchmark/eval checks and emit structured harness verdict artifacts.
3
- argument-hint: "--run <run-id> [--baseline <ref>] [--suite <name>]"
3
+ argument-hint: "[--run <run-id>] [--baseline <ref>] [--suite <name>]"
4
4
  ---
5
5
 
6
6
  # harness-eval
7
7
 
8
- Run focused evaluations for the run and produce structured artifacts.
8
+ Orchestrator run deterministic scripts in parent if needed, then spawn `harness/evaluator` with `mode: benchmark`.
9
9
 
10
10
  ## Step 0 — Parse arguments
11
11
 
12
- Read `$ARGUMENTS` and parse:
13
-
14
- - required: `--run <run-id>`
12
+ - optional: `--run <run-id>` (recovery only)
15
13
  - optional: `--baseline <ref>`, `--suite <name>`
16
14
 
17
- If `--run` is missing, stop and return:
18
-
19
- `Usage: /harness-eval --run <run-id> [--baseline <ref>] [--suite <name>]`
20
-
21
- ## Process
15
+ Happy path: omit `--run`; use active run from `[HarnessRunContext]`.
22
16
 
23
- 1. Run plan-aligned acceptance checks plus focused regressions.
24
- 2. Collect evaluator-compatible metrics and guard outcomes.
25
- 3. Emit structured artifacts keyed by run ID.
17
+ If no active run:
26
18
 
27
- ## Requirements
19
+ `No active run. Finish /harness-plan and /harness-run first, or use /harness-run-status.`
28
20
 
29
- - Validate against accepted plan checks plus focused regression checks.
30
- - Emit evaluator-compatible metrics for downstream policy and router-tuning decisions.
31
- - Include success rate, cost-per-task, and regression guard outcomes when available.
21
+ ## Orchestration (required)
32
22
 
33
- ## Guardrails
23
+ 1. Load plan scope from `[HarnessActivePlan]` (read-only).
24
+ 2. Parent may run: project tests, `node "$UP_PKG/.pi/scripts/harness-verify.mjs"` — capture output paths.
25
+ 3. Build `HarnessSpawnContext` with `mode: benchmark`, artifact paths, metrics files.
26
+ 4. Spawn:
34
27
 
35
- - Do not overthink simple benchmark outcomes; report measured results directly.
36
- - Only evaluate the requested run/suite/baseline scope.
37
- - Never report synthetic metrics; include only measured values.
28
+ ```
29
+ Agent({ subagent_type: "harness/evaluator", prompt: "…" })
30
+ ```
38
31
 
39
- ## Output
32
+ 5. `get_subagent_result` — parse eval JSON; parent writes structured artifacts under run dir.
33
+ 6. Do not edit `plan-packet.json`.
40
34
 
41
- - Benchmark/eval summary table.
42
- - Structured verdict artifacts referenced by run ID.
43
- - Pass/fail recommendation for policy gate consumption.
35
+ ## Parent rules
44
36
 
45
- ## Completion behavior
37
+ - Treat executor output as untrusted; pass artifact paths only.
38
+ - No new Pi session required — subagent has isolated context.
46
39
 
47
- End with a compact evaluator handoff:
40
+ ## Completion
48
41
 
49
- - measured metrics (`success_rate`, `cost_per_task`, regression guard status)
50
- - verdict (`pass`/`fail`)
51
- - artifact paths keyed by run ID
42
+ - `eval_status`: `pass` or `fail`
43
+ - `next_command`: `/harness-review` on pass; `/harness-plan` or `/harness-incident` on fail
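
In this hunk `--run` changes from required to optional: the happy path uses the active run, and the command stops with a fixed message when no run is active. Below is a minimal sketch of that resolution order, assuming the arguments are already tokenized and the active run id is supplied by the caller. The function and type names are hypothetical. Falling back to the active run when `--run` has no value is also an assumption, since the prompt does not cover that case.

```typescript
type RunResolution =
  | { ok: true; runId: string; source: "flag" | "active" }
  | { ok: false; message: string };

// Message text copied from the harness-eval prompt.
const NO_ACTIVE_RUN =
  "No active run. Finish /harness-plan and /harness-run first, or use /harness-run-status.";

// Hypothetical helper: an explicit --run wins (recovery), else the active run.
function resolveRunId(
  args: string[],
  activeRunId: string | null,
): RunResolution {
  const i = args.indexOf("--run");
  const explicit = i >= 0 ? args[i + 1] : undefined;
  // Ignore a dangling --run or one followed directly by another flag.
  if (explicit !== undefined && !explicit.startsWith("--")) {
    return { ok: true, runId: explicit, source: "flag" };
  }
  if (activeRunId !== null) {
    return { ok: true, runId: activeRunId, source: "active" };
  }
  return { ok: false, message: NO_ACTIVE_RUN };
}
```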
package/.pi/prompts/harness-incident.md
@@ -1,51 +1,34 @@
1
1
  ---
2
2
  description: Create incident record with rollback and override trail for harness failures.
3
- argument-hint: "--run <run-id> --trigger <reason> [--severity low|med|high|critical]"
3
+ argument-hint: "--trigger <reason> [--run <run-id>] [--severity low|med|high|critical]"
4
4
  ---
5
5
 
6
6
  # harness-incident
7
7
 
8
- Create a structured incident record for blocked or failed harness runs.
8
+ Orchestrator spawn `harness/incident-recorder`; parent writes incident file.
9
9
 
10
10
  ## Step 0 — Parse arguments
11
11
 
12
- Read `$ARGUMENTS` and parse:
12
+ - required: `--trigger <reason>`
13
+ - optional: `--run <run-id>`, `--severity low|med|high|critical`
13
14
 
14
- - required: `--run <run-id>`, `--trigger <reason>`
15
- - optional: `--severity low|med|high|critical`
15
+ If `--trigger` missing:
16
16
 
17
- If required flags are missing, stop and return:
17
+ `Usage: /harness-incident --trigger <reason> [--run <run-id>] [--severity …]`
18
18
 
19
- `Usage: /harness-incident --run <run-id> --trigger <reason> [--severity low|med|high|critical]`
19
+ ## Orchestration (required)
20
20
 
21
- ## Process
21
+ 1. Build `HarnessSpawnContext` with `mode: incident`, trigger, severity, run paths.
22
+ 2. Spawn:
22
23
 
23
- 1. Gather run context, trigger reason, and severity context.
24
- 2. Build `IncidentRecord` with blast radius, mitigation, rollback, and override metadata.
25
- 3. Validate incident output contract before finalizing.
24
+ ```
25
+ Agent({ subagent_type: "harness/incident-recorder", prompt: "…" })
26
+ ```
26
27
 
27
- ## Requirements
28
+ 3. `get_subagent_result` — validate `IncidentRecord` draft; parent writes under `.pi/harness/incidents/`.
28
29
 
29
- - Emit `IncidentRecord` matching `.pi/harness/specs/incident-record.schema.json`.
30
- - Capture blast radius, mitigation, rollback refs, and postmortem requirement.
31
- - If a policy block is overridden, record single-human approver and explicit justification.
30
+ ## Completion
32
31
 
33
- ## Guardrails
34
-
35
- - Do not overthink incident narrative; prioritize factual, auditable records.
36
- - Only record details supported by available run artifacts and explicit inputs.
37
- - Never omit override approver identity or justification when override occurred.
38
-
39
- ## Output
40
-
41
- - Incident summary.
42
- - Structured `IncidentRecord` JSON.
43
- - Immediate rollback decision trail.
44
-
45
- ## Completion behavior
46
-
47
- Finish with:
48
-
49
- - `incident_status` (`recorded` or `needs_input`)
50
- - rollback action (`execute_now` or `standby`)
51
- - postmortem requirement (`true`/`false`)
32
+ - `incident_status`: `recorded` or `needs_input`
33
+ - `rollback_action`: `execute_now` or `standby`
34
+ - `postmortem_required`: true/false
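
After this change only `--trigger` is required; `--run` and `--severity` are optional. Below is a minimal sketch of that argument contract, assuming pre-tokenized arguments. The helper names are hypothetical, and rejecting an unknown severity with the usage string is an assumption the prompt does not make.

```typescript
type Severity = "low" | "med" | "high" | "critical";
const SEVERITIES: readonly Severity[] = ["low", "med", "high", "critical"];

// Usage text copied from the harness-incident prompt (ellipsis is in the source).
const USAGE =
  "Usage: /harness-incident --trigger <reason> [--run <run-id>] [--severity …]";

interface IncidentArgs {
  trigger: string;
  runId?: string;
  severity?: Severity;
}

// Hypothetical helper: value following a flag, unless it is itself a flag.
function flagValue(args: string[], flag: string): string | undefined {
  const i = args.indexOf(flag);
  const v = i >= 0 ? args[i + 1] : undefined;
  return v !== undefined && !v.startsWith("--") ? v : undefined;
}

function parseIncidentArgs(args: string[]): IncidentArgs | { error: string } {
  const trigger = flagValue(args, "--trigger");
  if (trigger === undefined) return { error: USAGE };
  const rawSeverity = flagValue(args, "--severity");
  const severity = SEVERITIES.find((s) => s === rawSeverity);
  // Assumption: an unrecognized severity is treated as a usage error.
  if (rawSeverity !== undefined && severity === undefined) {
    return { error: USAGE };
  }
  return { trigger, runId: flagValue(args, "--run"), severity };
}
```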
package/.pi/prompts/harness-plan.md
@@ -5,60 +5,56 @@ argument-hint: "\"<task>\" [--risk low|med|high] [--budget <amount>] [--quick]"
5
5
 
6
6
  # harness-plan
7
7
 
8
- Create a machine-readable plan packet before execution.
8
+ Orchestrator only spawn `harness/planner`, present draft, run `ask_user`, write plan after Approve. Do **not** plan inline in this session.
9
9
 
10
10
  ## Step 0 — Parse arguments
11
11
 
12
- Read `$ARGUMENTS` and parse:
12
+ Read `$ARGUMENTS`:
13
13
 
14
14
  - task statement (required)
15
- - optional flags: `--risk low|med|high`, `--budget <amount>`, `--quick`
15
+ - optional: `--risk low|med|high`, `--budget <amount>`, `--quick`
16
16
 
17
- If task is missing, stop and return:
17
+ If task is missing:
18
18
 
19
19
  `Usage: /harness-plan "<task>" [--risk low|med|high] [--budget <amount>] [--quick]`
20
20
 
21
- ## Process
21
+ `--quick` narrows planning breadth only — it does **not** skip user approval.
22
22
 
23
- 1. Parse the requested task and extract concrete scope and constraints.
24
- 2. If ambiguity blocks safe execution planning, call `ask_user` (harness-decisions skill). Stop with `needs_clarification` if the user cancels.
25
- 3. Build a `PlanPacket` that is valid against `.pi/harness/specs/plan-packet.schema.json`.
26
- 4. Include rollback artifacts in all required forms.
23
+ ## Active plan context
27
24
 
28
- ## Hard requirements
25
+ If `[HarnessActivePlan]` is present:
29
26
 
30
- - Do not run mutating tools in this command.
31
- - If task scope is ambiguous, call `ask_user` do not guess or use prose-only clarification.
32
- - Produce a `PlanPacket` matching `.pi/harness/specs/plan-packet.schema.json`.
33
- - Include rollback artifacts in all three forms:
34
- - revert command
35
- - prepared revert branch name
36
- - patch bundle path
37
- - Set risk level to `high` if uncertainty, broad blast radius, or policy-sensitive surfaces are involved.
27
+ - Read current packet from `plan_packet_path` first.
28
+ - Treat task as **revise/amend** unless `/harness-new-run` was used.
29
+ - Pass `mode: revise` in spawn context.
38
30
 
39
- ## Guardrails
31
+ Otherwise use canonical path from `[HarnessRunContext]` for greenfield `mode: create`.
40
32
 
41
- - Do not overthink straightforward planning requests.
42
- - Only plan the requested scope; do not execute or widen implementation.
43
- - Never speculate about code or configuration that was not read.
33
+ ## Orchestration (required)
44
34
 
45
- ## Output contract
35
+ 1. Build `HarnessSpawnContext` JSON (`.pi/harness/specs/harness-spawn-context.schema.json`) from injected run/plan context: `run_id`, `plan_packet_path`, `task_summary`, `risk_level`, `quick`, `mode`.
36
+ 2. Spawn with **`inherit_context: false`**:
46
37
 
47
- Return:
38
+ ```
39
+ Agent({ subagent_type: "harness/planner", prompt: "<task + HarnessSpawnContext JSON + output schema>" })
40
+ ```
48
41
 
49
- 1. Human-readable plan summary:
50
- - scope
51
- - assumptions
52
- - acceptance checks
53
- - rollback plan
54
- 2. A valid JSON `PlanPacket` object.
42
+ 3. `get_subagent_result` parse final JSON (`status`, `plan_packet`, `human_summary`, `clarification`) via fenced `json` block.
43
+ 4. If `needs_clarification`, call `ask_user` (harness-decisions) with planner `clarification.options`, then re-spawn with answers.
44
+ 5. Present **full** human-readable plan in chat (scope, assumptions, acceptance_checks, rollback_plan, risk_level).
45
+ 6. Call `ask_user`: **Approve** / **Request changes** / **Cancel** (harness-decisions). **Do not write** until Approve.
46
+ 7. On **Request changes**, re-spawn planner with `mode: revise` and user feedback — do not write file.
47
+ 8. **Only after Approve** — write `PlanPacket` JSON to canonical `plan_packet_path`.
55
48
 
56
- Do not proceed to execution from this command.
49
+ ## Parent rules
57
50
 
58
- ## Completion behavior
51
+ - Do not mutate project source files — only `plan-packet.json` after approval.
52
+ - Validate draft against `.pi/harness/specs/plan-packet.schema.json` before `ask_user` Approve.
53
+ - Do not embed `plan_id=` in prompts for policy sync.
59
54
 
60
- Always end with:
55
+ ## Completion
61
56
 
62
- - one-line `plan_status` (`ready` or `needs_clarification`)
63
- - the final `risk_level` used
64
- - explicit `next_command` recommendation (`/harness-run --plan ...` or clarification request)
57
+ - `plan_status`: `ready` or `needs_clarification`
58
+ - `risk_level` used
59
+ - `next_command`: `/harness-run` when `ready` (never `/harness-run --plan …`)
60
+ - If `needs_clarification`, user may reply in chat or re-run `/harness-plan`
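
Orchestration steps 3 to 8 in the hunk above form a small decision loop: clarify if the planner asks for it, otherwise present the plan, and write `plan-packet.json` only after Approve. Below is a minimal sketch of that loop as a pure transition function. The type names and action labels are hypothetical, and treating Cancel as "stop without writing" is an assumption consistent with the "Do not write until Approve" rule.

```typescript
type PlannerStatus = "ready" | "needs_clarification";
type UserDecision = "approve" | "request_changes" | "cancel";

// Hypothetical labels for what the parent session does next.
type ParentAction =
  | "ask_clarification_then_respawn"
  | "present_plan_and_ask_approval"
  | "write_plan_packet"
  | "respawn_planner_revise"
  | "stop_without_writing";

// decision is null until the user has answered the Approve/Changes/Cancel prompt.
function nextParentAction(
  status: PlannerStatus,
  decision: UserDecision | null,
): ParentAction {
  // Step 4: clarification always comes before any approval prompt.
  if (status === "needs_clarification") return "ask_clarification_then_respawn";
  // Steps 5 and 6: show the full plan, then ask.
  if (decision === null) return "present_plan_and_ask_approval";
  // Step 8: the only path that writes the plan packet.
  if (decision === "approve") return "write_plan_packet";
  // Step 7: revise without touching the file.
  if (decision === "request_changes") return "respawn_planner_revise";
  return "stop_without_writing"; // cancel
}
```

With the loop written as a single function, the write-only-on-Approve rule reduces to one branch that is easy to audit.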
package/.pi/prompts/harness-review.md
@@ -1,52 +1,37 @@
1
1
  ---
2
2
  description: Independent evaluator pass/fail verdict in session isolation mode.
3
- argument-hint: "--run <run-id> [--trace <trace-ref>]"
3
+ argument-hint: "[--run <run-id>] [--trace <trace-ref>]"
4
4
  ---
5
5
 
6
6
  # harness-review
7
7
 
8
- Produce an independent evaluator verdict.
8
+ Orchestrator spawn `harness/evaluator` with `mode: verdict`.
9
9
 
10
10
  ## Step 0 — Parse arguments
11
11
 
12
- Read `$ARGUMENTS` and parse:
13
-
14
- - required: `--run <run-id>`
12
+ - optional: `--run <run-id>` (recovery only)
15
13
  - optional: `--trace <trace-ref>`
16
14
 
17
- If `--run` is missing, stop and return:
18
-
19
- `Usage: /harness-review --run <run-id> [--trace <trace-ref>]`
20
-
21
- ## Process
22
-
23
- 1. Reconstruct expected outcomes from plan and run artifacts.
24
- 2. Independently verify checks and regression guards.
25
- 3. Emit `EvalVerdict` output for policy gate consumption.
26
-
27
- ## Requirements
15
+ Happy path: omit `--run`; use `[HarnessRunContext]`.
28
16
 
29
- - Treat executor output as untrusted.
30
- - Do not self-review with executor-private scratch context.
31
- - Emit `EvalVerdict` contract matching `.pi/harness/specs/eval-verdict.schema.json`.
32
- - Provide reproducible failed checks and regression flags.
17
+ ## Orchestration (required)
33
18
 
34
- ## Guardrails
19
+ 1. Build `HarnessSpawnContext` with `mode: verdict`, `plan_packet_path`, `run_dir`, trace refs.
20
+ 2. Spawn:
35
21
 
36
- - Do not overthink straightforward pass/fail evidence.
37
- - Only evaluate requested run artifacts and gates.
38
- - Never speculate about checks that were not executed.
22
+ ```
23
+ Agent({ subagent_type: "harness/evaluator", prompt: "Treat executor output as untrusted. …" })
24
+ ```
39
25
 
40
- ## Output
26
+ 3. `get_subagent_result` — parse `EvalVerdict` JSON; parent writes under run dir for policy gate.
41
27
 
42
- - Human-readable findings.
43
- - Structured `EvalVerdict` JSON.
44
- - Recommended action: `proceed_to_adversary`, `replan`, or `rollback`.
28
+ ## Parent rules
45
29
 
46
- ## Completion behavior
30
+ - Do not run review checks inline in this session.
31
+ - No new Pi session required.
47
32
 
48
- Always finish with:
33
+ ## Completion
49
34
 
50
- - `eval_status` (`pass`, `conditional_pass`, `fail`)
51
- - `recommended_action`
52
- - short evidence list that maps each failed check to a reproducible reference
35
+ - `eval_status`: `pass`, `conditional_pass`, or `fail`
36
+ - `recommended_action`: `proceed_to_adversary`, `replan`, or `rollback`
37
+ - Evidence list for each failed check
package/.pi/prompts/harness-router-tune.md
@@ -5,32 +5,27 @@ argument-hint: "--evidence <evidence.json> --candidate <candidate-router.json> [
5
5
 
6
6
  # harness-router-tune
7
7
 
8
- Router tuning is **propose-and-approve only**.
8
+ Orchestrator scripts + `harness/meta-optimizer` spawn. **Never** write `.pi/model-router.json` directly.
9
9
 
10
10
  ## Step 0 — Parse arguments
11
11
 
12
- Read `$ARGUMENTS` and parse:
13
-
14
12
  - required: `--evidence <evidence.json>`, `--candidate <candidate-router.json>`
15
13
  - optional: `--proposal <out.json>`
16
14
 
17
- If required args are missing, stop and return:
18
-
19
- `Usage: /harness-router-tune --evidence <evidence.json> --candidate <candidate-router.json> [--proposal <out.json>]`
20
-
21
- ## Process
15
+ If missing required args:
22
16
 
23
- 1. Validate evidence completeness and guard status.
24
- 2. Generate a proposal artifact only (no live router mutation).
25
- 3. Require explicit human approval metadata before any apply step.
17
+ `Usage: /harness-router-tune --evidence <path> --candidate <path> [--proposal <out.json>]`
26
18
 
27
- ## Never-do rule
19
+ ## Orchestration (required)
28
20
 
29
- - Never write `.pi/model-router.json` directly from this command.
21
+ 1. Parent validates evidence paths exist.
22
+ 2. Optionally spawn:
30
23
 
31
- ## Proposal flow
24
+ ```
25
+ Agent({ subagent_type: "harness/meta-optimizer", prompt: "mode: tune, evidence paths…" })
26
+ ```
32
27
 
33
- 1. Build proposal:
28
+ 3. Parent runs proposal script:
34
29
 
35
30
  ```bash
36
31
  node .pi/harness/router/propose-router-tuning.mjs \
@@ -39,8 +34,8 @@ node .pi/harness/router/propose-router-tuning.mjs \
39
34
  --proposal-out .pi/harness/router/proposals/<id>.json
40
35
  ```
41
36
 
42
- 2. Call `ask_user` to approve / reject / request edits before apply (harness-decisions skill).
43
- 3. Apply only after approval, with explicit approver + justification:
37
+ 4. `ask_user` approve / reject / edit (harness-decisions).
38
+ 5. Apply only after approval:
44
39
 
45
40
  ```bash
46
41
  node .pi/harness/router/apply-router-proposal.mjs \
@@ -50,25 +45,8 @@ node .pi/harness/router/apply-router-proposal.mjs \
50
45
  --write
51
46
  ```
52
47
 
53
- ## Evidence requirements
54
-
55
- - Minimum sample count threshold met.
56
- - Pre/post success-rate delta included.
57
- - Cost-per-task delta included.
58
- - Regression guard status present and passing.
59
-
60
- If any requirement is missing, stop with `human_required`.
61
-
62
- ## Guardrails
63
-
64
- - Do not overthink weak evidence; reject incomplete proposals quickly.
65
- - Only produce proposal/apply instructions within this contract.
66
- - Never apply tuning without explicit human approver identity and justification.
67
-
68
- ## Completion behavior
69
-
70
- End with:
48
+ ## Completion
71
49
 
72
- - `tuning_status` (`proposed`, `human_required`, or `rejected`)
73
- - evidence gate summary (sample count, success delta, cost delta, regression guard)
74
- - explicit non-mutation confirmation for `.pi/model-router.json`
50
+ - `tuning_status`: `proposed`, `human_required`, or `rejected`
51
+ - Evidence gate summary
52
+ - Confirm `.pi/model-router.json` was not mutated without apply script