ultimate-pi 0.7.0 → 0.9.0
- package/.agents/skills/harness-decisions/SKILL.md +20 -1
- package/.agents/skills/harness-eval/SKILL.md +11 -13
- package/.agents/skills/harness-orchestration/SKILL.md +36 -30
- package/.agents/skills/harness-plan/SKILL.md +13 -18
- package/.pi/PACKAGING.md +1 -1
- package/.pi/agents/harness/adversary.md +20 -12
- package/.pi/agents/harness/evaluator.md +25 -14
- package/.pi/agents/harness/executor.md +27 -16
- package/.pi/agents/harness/incident-recorder.md +37 -0
- package/.pi/agents/harness/meta-optimizer.md +18 -15
- package/.pi/agents/harness/planner.md +26 -30
- package/.pi/agents/harness/tie-breaker.md +4 -2
- package/.pi/agents/harness/trace-librarian.md +18 -11
- package/.pi/agents/pi-pi/ext-expert.md +1 -1
- package/.pi/agents/pi-pi/keybinding-expert.md +1 -1
- package/.pi/agents/pi-pi/tui-expert.md +3 -3
- package/.pi/extensions/00-ultimate-pi-system-prompt.ts +2 -2
- package/.pi/extensions/budget-guard.ts +47 -18
- package/.pi/extensions/custom-footer.ts +8 -3
- package/.pi/extensions/custom-header.ts +2 -2
- package/.pi/extensions/debate-orchestrator.ts +1 -1
- package/.pi/extensions/dotenv-loader.ts +1 -1
- package/.pi/extensions/drift-monitor.ts +1 -1
- package/.pi/extensions/harness-ask-user.ts +1 -1
- package/.pi/extensions/harness-live-widget.ts +1 -1
- package/.pi/extensions/harness-run-context.ts +197 -33
- package/.pi/extensions/harness-telemetry.ts +1 -1
- package/.pi/extensions/harness-web-guard.ts +1 -1
- package/.pi/extensions/harness-web-tools.ts +1 -1
- package/.pi/extensions/lib/ask-user/dialog.ts +2 -2
- package/.pi/extensions/lib/ask-user/fallback.ts +1 -1
- package/.pi/extensions/lib/ask-user/render.ts +3 -3
- package/.pi/extensions/lib/harness-subagents/agent-loader.ts +1 -1
- package/.pi/extensions/lib/harness-subagents/agent-parser.ts +1 -1
- package/.pi/extensions/lib/harness-subagents/blackboard-tool.ts +1 -1
- package/.pi/extensions/lib/harness-subagents/harness-subagent-policy.ts +134 -0
- package/.pi/extensions/lib/harness-subagents/parent-ask-user-bridge.ts +89 -0
- package/.pi/extensions/lib/harness-subagents/spawn-policy.ts +20 -2
- package/.pi/extensions/lib/harness-subagents/vendored/agent-manager.ts +3 -2
- package/.pi/extensions/lib/harness-subagents/vendored/agent-runner.ts +44 -24
- package/.pi/extensions/lib/harness-subagents/vendored/context.ts +1 -1
- package/.pi/extensions/lib/harness-subagents/vendored/env.ts +1 -1
- package/.pi/extensions/lib/harness-subagents/vendored/index.ts +23 -2
- package/.pi/extensions/lib/harness-subagents/vendored/output-file.ts +1 -1
- package/.pi/extensions/lib/harness-subagents/vendored/schedule.ts +1 -1
- package/.pi/extensions/lib/harness-subagents/vendored/settings.ts +1 -1
- package/.pi/extensions/lib/harness-subagents/vendored/skill-loader.ts +1 -1
- package/.pi/extensions/lib/harness-subagents/vendored/types.ts +2 -2
- package/.pi/extensions/lib/harness-subagents/vendored/ui/agent-widget.ts +1 -1
- package/.pi/extensions/lib/harness-subagents/vendored/ui/conversation-viewer.ts +2 -2
- package/.pi/extensions/lib/harness-subagents/vendored/ui/schedule-menu.ts +1 -1
- package/.pi/extensions/observation-bus.ts +1 -1
- package/.pi/extensions/pi-model-router-harness.ts +1 -1
- package/.pi/extensions/policy-gate.ts +90 -20
- package/.pi/extensions/provider-payload-sanitize.ts +1 -1
- package/.pi/extensions/review-integrity.ts +76 -22
- package/.pi/extensions/sentrux-rules-sync.ts +1 -1
- package/.pi/extensions/soundboard.ts +1 -1
- package/.pi/extensions/test-diff-integrity.ts +1 -1
- package/.pi/extensions/trace-recorder.ts +1 -1
- package/.pi/extensions/ultimate-pi-vcc.ts +1 -1
- package/.pi/harness/agents.manifest.json +82 -78
- package/.pi/harness/docs/adrs/0031-harness-run-context.md +6 -3
- package/.pi/harness/docs/adrs/0032-harness-command-orchestration.md +37 -0
- package/.pi/harness/docs/adrs/README.md +1 -0
- package/.pi/harness/specs/budget-exhausted-event.schema.json +3 -1
- package/.pi/harness/specs/harness-spawn-context.schema.json +65 -0
- package/.pi/harness/specs/harness-turn.schema.json +18 -0
- package/.pi/lib/harness-agent-output.ts +41 -0
- package/.pi/lib/harness-run-context.ts +516 -37
- package/.pi/lib/harness-ui-state.ts +1 -1
- package/.pi/prompts/harness-auto.md +36 -61
- package/.pi/prompts/harness-critic.md +15 -28
- package/.pi/prompts/harness-eval.md +19 -27
- package/.pi/prompts/harness-incident.md +15 -34
- package/.pi/prompts/harness-plan.md +28 -49
- package/.pi/prompts/harness-review.md +16 -30
- package/.pi/prompts/harness-router-tune.md +16 -38
- package/.pi/prompts/harness-run.md +21 -38
- package/.pi/prompts/harness-setup.md +2 -0
- package/.pi/prompts/harness-trace.md +13 -30
- package/.pi/scripts/harness-generate-model-router.mjs +16 -13
- package/.pi/scripts/harness-verify.mjs +17 -0
- package/.pi/scripts/vendor-sync-pi-model-router.sh +10 -10
- package/CHANGELOG.md +25 -1
- package/README.md +4 -5
- package/THIRD_PARTY_NOTICES.md +1 -1
- package/package.json +13 -8
- package/vendor/pi-model-router/UPSTREAM_PIN.md +1 -1
- package/vendor/pi-model-router/extensions/commands.ts +2 -2
- package/vendor/pi-model-router/extensions/config.ts +2 -2
- package/vendor/pi-model-router/extensions/index.ts +1 -1
- package/vendor/pi-model-router/extensions/provider.ts +2 -2
- package/vendor/pi-model-router/extensions/routing.ts +2 -2
- package/vendor/pi-model-router/extensions/types.ts +1 -1
- package/vendor/pi-model-router/extensions/ui.ts +1 -1
- package/vendor/pi-model-router/package.json +4 -4
- package/vendor/pi-vcc/index.ts +1 -1
- package/vendor/pi-vcc/package.json +1 -1
- package/vendor/pi-vcc/src/commands/pi-vcc.ts +1 -1
- package/vendor/pi-vcc/src/commands/vcc-recall.ts +1 -1
- package/vendor/pi-vcc/src/core/content.ts +1 -1
- package/vendor/pi-vcc/src/core/load-messages.ts +1 -1
- package/vendor/pi-vcc/src/core/normalize.ts +1 -1
- package/vendor/pi-vcc/src/core/render-entries.ts +1 -1
- package/vendor/pi-vcc/src/core/report.ts +1 -1
- package/vendor/pi-vcc/src/core/search-entries.ts +1 -1
- package/vendor/pi-vcc/src/core/summarize.ts +1 -1
- package/vendor/pi-vcc/src/hooks/before-compact.ts +2 -2
- package/vendor/pi-vcc/src/tools/recall.ts +1 -1
- package/vendor/pi-vcc/src/types.ts +1 -1
- package/vendor/pi-vcc/tests/fixtures.ts +1 -1
- package/vendor/pi-vcc/tests/render-entries.test.ts +1 -1
- package/vendor/pi-vcc/tests/search-entries.test.ts +1 -1
- package/vendor/pi-vcc/tests/support/load-session.ts +2 -2
@@ -5,79 +5,54 @@ argument-hint: "\"<task>\" [--quick] [--risk low|med|high] [--budget <amount>]"
 
 # harness-auto
 
-
-
-`plan -> execute -> evaluate -> adversary -> severity-policy decision -> commit+PR (no auto-merge)`
+Pipeline orchestrator — one session, sequential `Agent` spawns. Invoke **harness-orchestration** skill for agent IDs. Do **not** implement or review inline.
 
 ## Step 0 — Parse arguments
 
-
-
-- required task: quoted or unquoted first value
-- optional flags: `--quick`, `--risk low|med|high`, `--budget <amount>`
+- required task (quoted or first token)
+- optional: `--quick`, `--risk`, `--budget`
 
-If task
+If task missing:
 
 `Usage: /harness-auto "<task>" [--quick] [--risk low|med|high] [--budget <amount>]`
 
-##
-
-1. Build and approve plan packet at the canonical active-run path before any mutation (extension allocates one `run_id` for the auto pipeline).
-2. Execute only approved scope with rollback artifacts.
-3. Run independent evaluator then adversarial reviewer.
-4. Apply severity policy + strict pre-PR gates.
-5. If gates pass, auto-commit and open PR; never auto-merge.
-
-## Locked decisions (must not be changed)
-
-- Always produce a plan packet before mutation.
-- Adversarial review is always required.
-- Merge blocking authority is severity-policy-engine.
-- Router tuning is propose-and-approve only.
-- Plan ambiguity must use `ask_user` (harness-decisions skill) — no silent guessing.
-- Rollback artifact must be revert-commit-ready and include:
-- revert command
-- prepared revert branch
-- patch bundle
-- Debate profile is aggressive with locked confidence weights:
-- claim_quality=0.20
-- reproducibility=0.40
-- agreement=0.40
-- Strict pre-PR gate is mandatory.
-- Post-pass behavior is auto-commit and auto-open-PR.
-- Never auto-merge PR.
-
-## Guardrails
-
-- Do not overthink straightforward gate outcomes; enforce gates deterministically.
-- Only follow the locked pipeline and governance decisions listed here.
-- Never bypass mandatory safety gates, even in `--quick` mode.
+## Orchestration (required) — same session
 
-
+1. **Plan** — spawn `harness/planner` → parse JSON → present full plan → `ask_user` Approve/Changes/Cancel → write `plan-packet.json` only on Approve (advances phase via policy-gate).
+2. **Execute** — spawn `harness/executor` with `HarnessSpawnContext` (`mode: execute`). Summarize handoff bullets for next spawn (do not paste full subagent log).
+3. **Eval** — spawn `harness/evaluator` (`mode: benchmark`) after parent scripts if needed.
+4. **Review** — spawn `harness/evaluator` (`mode: verdict`) OR rely on eval verdict if policy allows — prefer both when strict gates require.
+5. **Adversary** — spawn `harness/adversary` with artifact paths.
+6. **Tie-breaker** — spawn `harness/tie-breaker` only if debate unresolved.
+7. **Parent** — apply locked strict gates below; commit/PR only if all pass.
 
-
+No new Pi session for review — subagents use isolated context (`inherit_context: false`).
 
-
-2. Execution completed within approved scope.
-3. Independent evaluator passed.
-4. Adversarial review completed with consensus packet.
-5. Severity-policy-engine output is `pass` or `conditional_pass`.
-6. Benchmark delta checks passed.
-7. Rollback artifacts generated.
+## Locked decisions (do not change)
 
-
+- Always produce and approve plan before mutation.
+- Adversarial review always required.
+- Severity-policy-engine blocks merge.
+- Router tuning propose-and-approve only.
+- Plan ambiguity → parent `ask_user` (harness-decisions).
+- Rollback artifacts: revert command, revert branch, patch bundle.
+- Debate weights: claim_quality=0.20, reproducibility=0.40, agreement=0.40.
+- Strict pre-PR gate mandatory; auto-commit + open PR; never auto-merge.
 
-
-
-
-
-
+## Strict gates
+
+Block commit/PR if any fails: plan gate, execution in scope, evaluator pass, adversary complete, severity-policy pass/conditional_pass, benchmark deltas, rollback artifacts.
+
+## Notes
 
-
+- `--quick` reduces breadth, never safety gates.
+- High risk/ambiguity → stop and recommend manual `/harness-plan` with `ask_user`.
+- Interrupt: `/harness-abort [reason]` then `/harness-plan`.
+- Artifact refs under active run dir; `/harness-run-status` or `/harness-trace-last` for handoff.
 
-
+## Completion
 
-1.
-2.
-3.
-4.
+1. Pipeline status per gate
+2. Artifact references
+3. Policy outcome: `pass`, `conditional_pass`, `block`, or `human_required`
+4. Next action (PR, replan, rollback, override)
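The strict-gate rule and the locked debate weights in the harness-auto prompt can be illustrated with a small sketch. This is not code from the package: the `GateResults` shape, the function names, and the assumption that debate scores lie in [0, 1] are all invented for the example. Only the gate list, the policy outcomes, and the three weights come from the prompt text.

```typescript
// Hypothetical sketch: type and function names are illustrative, not from ultimate-pi.
type PolicyOutcome = "pass" | "conditional_pass" | "block" | "human_required";

interface GateResults {
  planApproved: boolean;
  executionInScope: boolean;
  evaluatorPassed: boolean;
  adversaryComplete: boolean;
  severityPolicy: PolicyOutcome;
  benchmarkDeltasOk: boolean;
  rollbackArtifactsReady: boolean;
}

interface DebateScores {
  claim_quality: number;
  reproducibility: number;
  agreement: number;
}

// Locked debate weights as listed in the prompt.
const DEBATE_WEIGHTS: DebateScores = {
  claim_quality: 0.2,
  reproducibility: 0.4,
  agreement: 0.4,
};

// Weighted confidence; assumes every input score lies in [0, 1].
function debateConfidence(scores: DebateScores): number {
  return (
    scores.claim_quality * DEBATE_WEIGHTS.claim_quality +
    scores.reproducibility * DEBATE_WEIGHTS.reproducibility +
    scores.agreement * DEBATE_WEIGHTS.agreement
  );
}

// Returns the names of failing gates; an empty array means commit/PR may proceed.
function failingGates(g: GateResults): string[] {
  const failures: string[] = [];
  if (!g.planApproved) failures.push("plan gate");
  if (!g.executionInScope) failures.push("execution in scope");
  if (!g.evaluatorPassed) failures.push("evaluator pass");
  if (!g.adversaryComplete) failures.push("adversary complete");
  if (g.severityPolicy !== "pass" && g.severityPolicy !== "conditional_pass") {
    failures.push("severity-policy");
  }
  if (!g.benchmarkDeltasOk) failures.push("benchmark deltas");
  if (!g.rollbackArtifactsReady) failures.push("rollback artifacts");
  return failures;
}
```

Returning the list of failing gates, rather than a single boolean, would let the parent report pipeline status per gate as the Completion section asks.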
@@ -5,46 +5,33 @@ argument-hint: "[--run <run-id>] [--trace <trace-ref>] [--risk low|med|high]"
 
 # harness-critic
 
-
+Orchestrator — spawn `harness/adversary`.
 
 ## Step 0 — Parse arguments
 
-Read `$ARGUMENTS` and parse:
-
 - optional: `--run <run-id>` (recovery only)
 - optional: `--trace <trace-ref>`, `--risk low|med|high`
 
-
-
-## Process
-
-1. Assume hidden regressions exist and identify likely fault surfaces.
-2. Challenge evaluator/executor assumptions with reproducible probes.
-3. Emit structured adversarial findings for severity policy consumption.
-
-## Requirements
+Happy path: omit `--run`.
 
-
-- Attempt to invalidate evaluator assumptions with concrete evidence.
-- Emit `AdversaryReport` matching `.pi/harness/specs/adversary-report.schema.json`.
-- Flag `block_merge=true` for high-confidence correctness/security/test-integrity risks.
+## Orchestration (required)
 
-
+1. Build `HarnessSpawnContext` with `mode: adversary`, run artifacts, plan path, trace refs.
+2. Spawn:
 
-
-
-
+```
+Agent({ subagent_type: "harness/adversary", prompt: "…" })
+```
 
-
+3. `get_subagent_result` — parse `AdversaryReport` JSON; parent persists for severity policy.
 
-
-- Structured `AdversaryReport` JSON.
-- Clear merge-block recommendation.
+## Parent rules
 
-
+- Assume hidden regressions until disproven (in subagent).
+- No new Pi session required.
 
-
+## Completion
 
 - `block_merge` decision
--
--
+- Top findings with repro pointers
+- `recommendation`: `proceed`, `conditional_pass`, or `block`
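Step 3 of the harness-critic orchestration has the parent parse the subagent's `AdversaryReport` JSON. A minimal sketch of that parse step follows. It assumes the report carries a boolean `block_merge` and a string `recommendation` at the top level. The real contract lives in `adversary-report.schema.json`, which this diff does not show, so the field placement and the `parseAdversarySummary` name are guesses for illustration.

```typescript
// Hypothetical sketch: the actual AdversaryReport schema is not shown in this diff.
type Recommendation = "proceed" | "conditional_pass" | "block";

interface AdversarySummary {
  block_merge: boolean;
  recommendation: Recommendation;
}

function isRecommendation(value: unknown): value is Recommendation {
  return value === "proceed" || value === "conditional_pass" || value === "block";
}

// Parses the subagent's JSON text and keeps only the two fields the parent reports.
// Throws when the text is not JSON or a field has the wrong type.
function parseAdversarySummary(json: string): AdversarySummary {
  const data: unknown = JSON.parse(json);
  if (typeof data !== "object" || data === null) {
    throw new Error("AdversaryReport must be a JSON object");
  }
  const record = data as Record<string, unknown>;
  const blockMerge = record["block_merge"];
  const recommendation = record["recommendation"];
  if (typeof blockMerge !== "boolean") {
    throw new Error("block_merge must be a boolean");
  }
  if (!isRecommendation(recommendation)) {
    throw new Error("recommendation must be proceed, conditional_pass, or block");
  }
  return { block_merge: blockMerge, recommendation: recommendation };
}
```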
@@ -5,47 +5,39 @@ argument-hint: "[--run <run-id>] [--baseline <ref>] [--suite <name>]"
 
 # harness-eval
 
-
+Orchestrator — run deterministic scripts in parent if needed, then spawn `harness/evaluator` with `mode: benchmark`.
 
 ## Step 0 — Parse arguments
 
-
-
-- optional: `--run <run-id>` (recovery only — active run is used when omitted)
+- optional: `--run <run-id>` (recovery only)
 - optional: `--baseline <ref>`, `--suite <name>`
 
-
+Happy path: omit `--run`; use active run from `[HarnessRunContext]`.
 
-If no active run
+If no active run:
 
 `No active run. Finish /harness-plan and /harness-run first, or use /harness-run-status.`
 
-
-
-## Process
+## Orchestration (required)
 
 1. Load plan scope from `[HarnessActivePlan]` (read-only).
-2.
-3.
-4.
-
-## Requirements
-
-- Validate against accepted plan checks plus focused regression checks.
-- Emit evaluator-compatible metrics for downstream policy and router-tuning decisions.
-- Include success rate, cost-per-task, and regression guard outcomes when available.
+2. Parent may run: project tests, `node "$UP_PKG/.pi/scripts/harness-verify.mjs"` — capture output paths.
+3. Build `HarnessSpawnContext` with `mode: benchmark`, artifact paths, metrics files.
+4. Spawn:
 
-
+```
+Agent({ subagent_type: "harness/evaluator", prompt: "…" })
+```
 
-
-
-- Never report synthetic metrics; include only measured values.
-- Do not edit `plan-packet.json` in this phase.
+5. `get_subagent_result` — parse eval JSON; parent writes structured artifacts under run dir.
+6. Do not edit `plan-packet.json`.
 
-##
+## Parent rules
 
-
+- Treat executor output as untrusted; pass artifact paths only.
+- No new Pi session required — subagent has isolated context.
 
-## Completion
+## Completion
 
-
+- `eval_status`: `pass` or `fail`
+- `next_command`: `/harness-review` on pass; `/harness-plan` or `/harness-incident` on fail
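The Completion contract of the new harness-eval prompt maps an evaluation outcome to one or more follow-up commands, with an early exit when there is no active run. The sketch below shows one way to express that mapping. The function name and the `activeRunId` parameter are assumptions. The message text and command names are copied from the prompt.

```typescript
// Hypothetical sketch: names are illustrative, not from ultimate-pi.
type EvalStatus = "pass" | "fail";

interface EvalCompletion {
  eval_status: EvalStatus;
  next_command: string[];
}

const NO_ACTIVE_RUN =
  "No active run. Finish /harness-plan and /harness-run first, or use /harness-run-status.";

// Returns the completion summary, or the "no active run" message when no run is active.
function evalCompletion(activeRunId: string | null, passed: boolean): EvalCompletion | string {
  if (activeRunId === null) return NO_ACTIVE_RUN;
  if (passed) {
    return { eval_status: "pass", next_command: ["/harness-review"] };
  }
  // On failure the prompt offers two options; both are returned for the caller to choose from.
  return { eval_status: "fail", next_command: ["/harness-plan", "/harness-incident"] };
}
```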
@@ -5,49 +5,30 @@ argument-hint: "--trigger <reason> [--run <run-id>] [--severity low|med|high|cri
 
 # harness-incident
 
-
+Orchestrator — spawn `harness/incident-recorder`; parent writes incident file.
 
 ## Step 0 — Parse arguments
 
-Read `$ARGUMENTS` and parse:
-
 - required: `--trigger <reason>`
-- optional: `--run <run-id
-
-If `--trigger` is missing, stop and return:
-
-`Usage: /harness-incident --trigger <reason> [--run <run-id>] [--severity low|med|high|critical]`
-
-Use active run when `--run` is omitted.
-
-## Process
-
-1. Gather run context, trigger reason, and severity context.
-2. Build `IncidentRecord` with blast radius, mitigation, rollback, and override metadata.
-3. Validate incident output contract before finalizing.
-
-## Requirements
+- optional: `--run <run-id>`, `--severity low|med|high|critical`
 
-
-- Capture blast radius, mitigation, rollback refs, and postmortem requirement.
-- If a policy block is overridden, record single-human approver and explicit justification.
+If `--trigger` missing:
 
-
+`Usage: /harness-incident --trigger <reason> [--run <run-id>] [--severity …]`
 
-
-- Only record details supported by available run artifacts and explicit inputs.
-- Never omit override approver identity or justification when override occurred.
+## Orchestration (required)
 
-
+1. Build `HarnessSpawnContext` with `mode: incident`, trigger, severity, run paths.
+2. Spawn:
 
-
--
-
+```
+Agent({ subagent_type: "harness/incident-recorder", prompt: "…" })
+```
 
-
+3. `get_subagent_result` — validate `IncidentRecord` draft; parent writes under `.pi/harness/incidents/`.
 
-
+## Completion
 
-- `incident_status
--
--
+- `incident_status`: `recorded` or `needs_input`
+- `rollback_action`: `execute_now` or `standby`
+- `postmortem_required`: true/false
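Step 0 of harness-incident describes a small argument grammar: a required `--trigger`, plus optional `--run` and `--severity`. The prompt leaves the parsing to the model, so the parser below is only a sketch of the rules as stated. It assumes the arguments arrive already tokenized (a quoted reason is a single token), and the `parseIncidentArgs` name and result types are invented for the example.

```typescript
// Hypothetical sketch: the prompt is interpreted by the agent; no such parser is shown in the diff.
type Severity = "low" | "med" | "high" | "critical";

interface IncidentArgs {
  trigger: string;
  run?: string;
  severity?: Severity;
}

interface UsageError {
  error: string;
}

// Usage line as it appears in the new prompt.
const USAGE = "Usage: /harness-incident --trigger <reason> [--run <run-id>] [--severity …]";

function isSeverity(value: string): value is Severity {
  return value === "low" || value === "med" || value === "high" || value === "critical";
}

// Assumes argv is already tokenized, so a quoted reason is one element.
function parseIncidentArgs(argv: string[]): IncidentArgs | UsageError {
  const found: { trigger?: string; run?: string; severity?: Severity } = {};
  for (let i = 0; i < argv.length; i++) {
    const flag = argv[i];
    if (flag !== "--trigger" && flag !== "--run" && flag !== "--severity") continue;
    const value: string | undefined = argv[i + 1];
    if (value === undefined || value.indexOf("--") === 0) return { error: USAGE };
    if (flag === "--trigger") {
      found.trigger = value;
    } else if (flag === "--run") {
      found.run = value;
    } else if (isSeverity(value)) {
      found.severity = value;
    } else {
      return { error: USAGE };
    }
    i++; // skip the value that was just consumed
  }
  if (found.trigger === undefined) return { error: USAGE };
  const parsed: IncidentArgs = { trigger: found.trigger };
  if (found.run !== undefined) parsed.run = found.run;
  if (found.severity !== undefined) parsed.severity = found.severity;
  return parsed;
}
```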
@@ -5,75 +5,54 @@ argument-hint: "\"<task>\" [--risk low|med|high] [--budget <amount>] [--quick]"
 
 # harness-plan
 
-
+Orchestrator only — spawn `harness/planner` once; planner runs clarification and approval via `ask_user` (parent UI). Write `plan-packet.json` only after approval. Do **not** plan inline in this session.
 
 ## Step 0 — Parse arguments
 
-Read `$ARGUMENTS
+Read `$ARGUMENTS`:
 
 - task statement (required)
-- optional
+- optional: `--risk low|med|high`, `--budget <amount>`, `--quick`
 
-If task is missing
+If task is missing:
 
 `Usage: /harness-plan "<task>" [--risk low|med|high] [--budget <amount>] [--quick]`
 
-
+`--quick` narrows planning breadth only — it does **not** skip user approval.
 
 ## Active plan context
 
-
+Use injected context only — **do not** read `.pi/harness/specs/*.schema.json` or explore specs with bash.
 
-
-- Treat the user task as **revise/amend** of that packet (not a greenfield plan), unless `/harness-new-run` was used.
-- After drift replan or post-abort, update the same canonical file.
+If `[HarnessActivePlan]` is present:
 
-
+- Treat task as **revise/amend** unless `/harness-new-run` was used.
+- Pass `mode: revise` using the `HarnessSpawnContext` JSON in `[HarnessRunContext]`.
 
-
+Otherwise use `HarnessSpawnContext` from `[HarnessRunContext]` for greenfield `mode: create`.
 
-
-2. If ambiguity blocks safe execution planning, call `ask_user` (harness-decisions skill). Stop with `needs_clarification` if the user cancels.
-3. Build a `PlanPacket` that is valid against `.pi/harness/specs/plan-packet.schema.json`.
-4. **Write** the PlanPacket JSON to the canonical `plan_packet_path` before completing.
-5. Include rollback artifacts in all required forms.
+## Orchestration (required)
 
-
+1. Copy the `HarnessSpawnContext=…` JSON from `[HarnessRunContext]` into the spawn prompt (adjust `risk_level`, `quick`, `mode` from `$ARGUMENTS` if needed).
+2. Spawn **once** with **`inherit_context: false`**:
 
-
-
-
-- Include rollback artifacts in all three forms:
-- revert command
-- prepared revert branch name
-- patch bundle path
-- Set risk level to `high` if uncertainty, broad blast radius, or policy-sensitive surfaces are involved.
-- Do **not** embed `plan_id=` in the user prompt for policy sync — the extension sets `approvedPlan` from the written file.
+```
+Agent({ subagent_type: "harness/planner", prompt: "<task + HarnessSpawnContext JSON + output schema>" })
+```
 
-
+3. `get_subagent_result` — parse final JSON (`status`, `plan_packet`, `human_summary`, `clarification`) via fenced `json` block.
+4. If `status === "ready"` and user approved in the subagent (`ask_user` Approve), validate `plan_packet` fields, then **write** `PlanPacket` JSON to canonical `plan_packet_path` from `[HarnessRunContext]`.
+5. If `needs_clarification`, tell the user the planner is waiting — do **not** re-spawn; user should answer in the subagent or re-run `/harness-plan`.
+6. Do **not** call `ask_user` in this parent session for planner clarification or approval.
 
-
-- Only plan the requested scope; do not execute or widen implementation.
-- Never speculate about code or configuration that was not read.
+## Parent rules
 
-
+- Do not mutate project source files — only `plan-packet.json` after subagent approval is recorded.
+- Do not embed `plan_id=` in prompts for policy sync.
+- Optional: `/harness-plan-commit` if write was blocked but approval exists.
 
-
+## Completion
 
-
-
-
-- acceptance checks
-- rollback plan
-2. Confirmation that PlanPacket was written to the canonical path.
-
-Do not proceed to execution from this command.
-
-## Completion behavior
-
-Always end with:
-
-- one-line `plan_status` (`ready` or `needs_clarification`)
-- the final `risk_level` used
-- explicit `next_command` recommendation: `/harness-run` when `ready` (never `/harness-run --plan …`)
-- if `needs_clarification`, tell the user they may reply in plain language or run `/harness-plan` again with updates
+- `plan_status`: `ready` or `needs_clarification`
+- `risk_level` used
+- `next_command`: `/harness-run` when `ready` (never `/harness-run --plan …`)
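Steps 3 and 4 of the harness-plan orchestration ask the parent to pull the planner's final JSON out of a fenced `json` block and to write the plan packet only when the status is `ready` and the user approved. The sketch below illustrates that flow under stated assumptions: the helper names are invented, the approval signal is passed in as a plain boolean because the diff does not show how approval is recorded, and the field validation is reduced to a single object check.

```typescript
// Hypothetical sketch: helper names and the approval flag are assumptions.
type PlannerStatus = "ready" | "needs_clarification";

interface PlannerResult {
  status: PlannerStatus;
  plan_packet?: unknown;
  human_summary?: unknown;
  clarification?: unknown;
}

// Three backticks, written as escapes so this sample can sit inside a Markdown fence.
const FENCE = "\u0060\u0060\u0060";

// Returns the body of the last fenced json block, or null when none is found.
function extractLastJsonBlock(text: string): string | null {
  const open = text.lastIndexOf(FENCE + "json");
  if (open === -1) return null;
  const bodyStart = text.indexOf("\n", open);
  if (bodyStart === -1) return null;
  const close = text.indexOf(FENCE, bodyStart);
  if (close === -1) return null;
  return text.slice(bodyStart + 1, close).trim();
}

// Parses the planner's final message; returns null for missing or malformed output.
function parsePlannerOutput(text: string): PlannerResult | null {
  const body = extractLastJsonBlock(text);
  if (body === null) return null;
  let data: unknown;
  try {
    data = JSON.parse(body);
  } catch {
    return null;
  }
  if (typeof data !== "object" || data === null) return null;
  const record = data as Record<string, unknown>;
  const status = record["status"];
  if (status === "ready" || status === "needs_clarification") {
    return {
      status: status,
      plan_packet: record["plan_packet"],
      human_summary: record["human_summary"],
      clarification: record["clarification"],
    };
  }
  return null;
}

// The parent writes plan-packet.json only for an approved, ready plan that has a packet object.
function shouldWritePlanPacket(result: PlannerResult, userApproved: boolean): boolean {
  return (
    result.status === "ready" &&
    userApproved &&
    typeof result.plan_packet === "object" &&
    result.plan_packet !== null
  );
}
```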
@@ -5,47 +5,33 @@ argument-hint: "[--run <run-id>] [--trace <trace-ref>]"
 
 # harness-review
 
-
+Orchestrator — spawn `harness/evaluator` with `mode: verdict`.
 
 ## Step 0 — Parse arguments
 
-Read `$ARGUMENTS` and parse:
-
 - optional: `--run <run-id>` (recovery only)
 - optional: `--trace <trace-ref>`
 
-
-Run in a **new Pi session** after execute when possible.
-
-## Process
-
-1. Reconstruct expected outcomes from plan and run artifacts.
-2. Independently verify checks and regression guards.
-3. Emit `EvalVerdict` output for policy gate consumption.
-
-## Requirements
+Happy path: omit `--run`; use `[HarnessRunContext]`.
 
-
-- Do not self-review with executor-private scratch context.
-- Emit `EvalVerdict` contract matching `.pi/harness/specs/eval-verdict.schema.json`.
-- Provide reproducible failed checks and regression flags.
+## Orchestration (required)
 
-
+1. Build `HarnessSpawnContext` with `mode: verdict`, `plan_packet_path`, `run_dir`, trace refs.
+2. Spawn:
 
-
-
-
+```
+Agent({ subagent_type: "harness/evaluator", prompt: "Treat executor output as untrusted. …" })
+```
 
-
+3. `get_subagent_result` — parse `EvalVerdict` JSON; parent writes under run dir for policy gate.
 
-
-- Structured `EvalVerdict` JSON.
-- Recommended action: `proceed_to_adversary`, `replan`, or `rollback`.
+## Parent rules
 
-
+- Do not run review checks inline in this session.
+- No new Pi session required.
 
-
+## Completion
 
-- `eval_status
-- `recommended_action`
--
+- `eval_status`: `pass`, `conditional_pass`, or `fail`
+- `recommended_action`: `proceed_to_adversary`, `replan`, or `rollback`
+- Evidence list for each failed check
@@ -5,32 +5,27 @@ argument-hint: "--evidence <evidence.json> --candidate <candidate-router.json> [
 
 # harness-router-tune
 
-
+Orchestrator — scripts + `harness/meta-optimizer` spawn. **Never** write `.pi/model-router.json` directly.
 
 ## Step 0 — Parse arguments
 
-Read `$ARGUMENTS` and parse:
-
 - required: `--evidence <evidence.json>`, `--candidate <candidate-router.json>`
 - optional: `--proposal <out.json>`
 
-If required args
-
-`Usage: /harness-router-tune --evidence <evidence.json> --candidate <candidate-router.json> [--proposal <out.json>]`
-
-## Process
+If missing required args:
 
-
-2. Generate a proposal artifact only (no live router mutation).
-3. Require explicit human approval metadata before any apply step.
+`Usage: /harness-router-tune --evidence <path> --candidate <path> [--proposal <out.json>]`
 
-##
+## Orchestration (required)
 
-
+1. Parent validates evidence paths exist.
+2. Optionally spawn:
 
-
+```
+Agent({ subagent_type: "harness/meta-optimizer", prompt: "mode: tune, evidence paths…" })
+```
 
-
+3. Parent runs proposal script:
 
 ```bash
 node .pi/harness/router/propose-router-tuning.mjs \
@@ -39,8 +34,8 @@ node .pi/harness/router/propose-router-tuning.mjs \
 --proposal-out .pi/harness/router/proposals/<id>.json
 ```
 
-
-
+4. `ask_user` approve / reject / edit (harness-decisions).
+5. Apply only after approval:
 
 ```bash
 node .pi/harness/router/apply-router-proposal.mjs \
@@ -50,25 +45,8 @@ node .pi/harness/router/apply-router-proposal.mjs \
 --write
 ```
 
-##
-
-- Minimum sample count threshold met.
-- Pre/post success-rate delta included.
-- Cost-per-task delta included.
-- Regression guard status present and passing.
-
-If any requirement is missing, stop with `human_required`.
-
-## Guardrails
-
-- Do not overthink weak evidence; reject incomplete proposals quickly.
-- Only produce proposal/apply instructions within this contract.
-- Never apply tuning without explicit human approver identity and justification.
-
-## Completion behavior
-
-End with:
+## Completion
 
-- `tuning_status
--
--
+- `tuning_status`: `proposed`, `human_required`, or `rejected`
+- Evidence gate summary
+- Confirm `.pi/model-router.json` was not mutated without apply script
@@ -5,56 +5,39 @@ argument-hint: "[--budget <amount>]"
 
 # harness-run
 
-
+Orchestrator only — spawn `harness/executor`. Do **not** implement inline.
 
 ## Step 0 — Parse arguments
 
-Read `$ARGUMENTS` and parse:
-
 - optional: `--budget <amount>`
+- Do **not** use `--plan` on happy path — load from `[HarnessActivePlan]` / `plan_packet_path`.
 
-
-
-If the extension reports plan not ready, stop and return:
+If plan not ready:
 
 `Run /harness-plan first — no approved plan in active run context.`
 
-
-
-## Process
-
-1. Load PlanPacket from the injected canonical path and confirm it is valid.
-2. Execute only within approved scope.
-3. Run focused validations mapped to approved acceptance checks.
-4. Produce rollback artifacts and handoff references for downstream gates.
-
-## Gate behavior
-
-- Refuse execution if active plan is not ready (extension blocks before the agent runs).
-- Keep edits strictly within approved scope.
-- If scope drift appears, stop and return to `harness-plan`.
-- For **implementation forks** inside approved scope, call `ask_user` with 2–4 options. For plan-level ambiguity, stop and return to `harness-plan`.
-- Record evaluator/adversary prerequisites for downstream gates.
-- Always prepare rollback artifacts as part of execution output.
+## Orchestration (required)
 
-
+1. Confirm `[HarnessActivePlan]` / extension reports plan ready.
+2. Build `HarnessSpawnContext` with `mode: execute`, `plan_packet_path`, `run_dir`, `acceptance_checks` from plan file.
+3. Spawn:
 
-
-
-
+```
+Agent({ subagent_type: "harness/executor", prompt: "<HarnessSpawnContext + handoff>" })
+```
 
-
+4. `get_subagent_result` — parse executor JSON (`execution_status`, validations, rollback refs).
+5. Parent persists trace/handoff artifacts under run dir if needed; do not self-review.
 
-
-- Files changed and why.
-- Targeted validations run.
-- Trace pointers and rollback references.
+## Parent rules
 
-
+- Refuse if plan not approved.
+- On `scope_drift`, stop and recommend `/harness-plan`.
+- Do not call `ask_user` for plan-level ambiguity — return to plan command.
 
-
+## Completion
 
-
-
-
-
+- `execution_status`: `completed`, `blocked`, or `scope_drift`
+- `validation_summary` with command evidence
+- `handoff_ready` for evaluator/adversary
+- `next_command`: `/harness-eval` (same session — spawn isolated review agents; no new Pi session)
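The harness-run orchestration builds a `HarnessSpawnContext`, hands it to the executor in the spawn prompt, and then picks the next command from `execution_status`. A rough sketch of those two pieces follows. The interface lists only the four fields the prompt names. The full schema in `harness-spawn-context.schema.json` is not shown in this diff, and the prompt layout and function names here are assumptions.

```typescript
// Hypothetical sketch: only the fields named in the prompt are modeled here.
interface HarnessSpawnContext {
  mode: "execute";
  plan_packet_path: string;
  run_dir: string;
  acceptance_checks: string[];
}

type ExecutionStatus = "completed" | "blocked" | "scope_drift";

// Builds the text passed as the `prompt` of the executor spawn.
// Handoff items are short summary bullets, not a full subagent log.
function buildExecutorPrompt(ctx: HarnessSpawnContext, handoff: string[]): string {
  const bullets = handoff.map((item) => "- " + item).join("\n");
  return "HarnessSpawnContext=" + JSON.stringify(ctx) + "\n\nHandoff:\n" + bullets;
}

// Maps the executor's status to the follow-up command named in the prompt.
function nextCommand(status: ExecutionStatus): string | null {
  if (status === "completed") return "/harness-eval";
  if (status === "scope_drift") return "/harness-plan";
  return null; // "blocked": the prompt names no follow-up command
}
```

Passing the context as a single JSON line keeps the spawn prompt easy for the subagent to parse, which matters because the subagent runs with isolated context and sees nothing else from the parent session.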