npm - ultimate-pi - Versions diffs - 0.1.7 → 0.2.2 - Mend

ultimate-pi 0.1.7 → 0.2.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (524) hide show

package/.pi/prompts/harness-critic.md ADDED Viewed

@@ -0,0 +1,52 @@
+---
+description: Adversarial reviewer command with reproducible, merge-blocking findings.
+argument-hint: "--run <run-id> [--trace <trace-ref>] [--risk low|med|high]"
+---
+# harness-critic
+Run adversarial review against the candidate result.
+## Step 0 — Parse arguments
+Read `$ARGUMENTS` and parse:
+- required: `--run <run-id>`
+- optional: `--trace <trace-ref>`, `--risk low|med|high`
+If `--run` is missing, stop and return:
+`Usage: /harness-critic --run <run-id> [--trace <trace-ref>] [--risk low|med|high]`
+## Process
+1. Assume hidden regressions exist and identify likely fault surfaces.
+2. Challenge evaluator/executor assumptions with reproducible probes.
+3. Emit structured adversarial findings for severity policy consumption.
+## Requirements
+- Assume hidden regressions exist until disproven.
+- Attempt to invalidate evaluator assumptions with concrete evidence.
+- Emit `AdversaryReport` matching `.pi/harness/specs/adversary-report.schema.json`.
+- Flag `block_merge=true` for high-confidence correctness/security/test-integrity risks.
+## Guardrails
+- Do not overthink speculative attacks; prioritize reproducible findings.
+- Only report risks tied to candidate behavior and gate policy.
+- Never claim a defect without evidence and repro steps.
+## Output
+- Prioritized findings with repro steps.
+- Structured `AdversaryReport` JSON.
+- Clear merge-block recommendation.
+## Completion behavior
+Always end with:
+- `block_merge` decision
+- top 1-3 high-confidence findings with repro pointers
+- explicit recommendation (`proceed`, `conditional_pass`, or `block`)

package/.pi/prompts/harness-eval.md ADDED Viewed

@@ -0,0 +1,51 @@
+---
+description: Run focused benchmark/eval checks and emit structured harness verdict artifacts.
+argument-hint: "--run <run-id> [--baseline <ref>] [--suite <name>]"
+---
+# harness-eval
+Run focused evaluations for the run and produce structured artifacts.
+## Step 0 — Parse arguments
+Read `$ARGUMENTS` and parse:
+- required: `--run <run-id>`
+- optional: `--baseline <ref>`, `--suite <name>`
+If `--run` is missing, stop and return:
+`Usage: /harness-eval --run <run-id> [--baseline <ref>] [--suite <name>]`
+## Process
+1. Run plan-aligned acceptance checks plus focused regressions.
+2. Collect evaluator-compatible metrics and guard outcomes.
+3. Emit structured artifacts keyed by run ID.
+## Requirements
+- Validate against accepted plan checks plus focused regression checks.
+- Emit evaluator-compatible metrics for downstream policy and router-tuning decisions.
+- Include success rate, cost-per-task, and regression guard outcomes when available.
+## Guardrails
+- Do not overthink simple benchmark outcomes; report measured results directly.
+- Only evaluate the requested run/suite/baseline scope.
+- Never report synthetic metrics; include only measured values.
+## Output
+- Benchmark/eval summary table.
+- Structured verdict artifacts referenced by run ID.
+- Pass/fail recommendation for policy gate consumption.
+## Completion behavior
+End with a compact evaluator handoff:
+- measured metrics (`success_rate`, `cost_per_task`, regression guard status)
+- verdict (`pass`/`fail`)
+- artifact paths keyed by run ID

package/.pi/prompts/harness-incident.md ADDED Viewed

@@ -0,0 +1,51 @@
+---
+description: Create incident record with rollback and override trail for harness failures.
+argument-hint: "--run <run-id> --trigger <reason> [--severity low|med|high|critical]"
+---
+# harness-incident
+Create a structured incident record for blocked or failed harness runs.
+## Step 0 — Parse arguments
+Read `$ARGUMENTS` and parse:
+- required: `--run <run-id>`, `--trigger <reason>`
+- optional: `--severity low|med|high|critical`
+If required flags are missing, stop and return:
+`Usage: /harness-incident --run <run-id> --trigger <reason> [--severity low|med|high|critical]`
+## Process
+1. Gather run context, trigger reason, and severity context.
+2. Build `IncidentRecord` with blast radius, mitigation, rollback, and override metadata.
+3. Validate incident output contract before finalizing.
+## Requirements
+- Emit `IncidentRecord` matching `.pi/harness/specs/incident-record.schema.json`.
+- Capture blast radius, mitigation, rollback refs, and postmortem requirement.
+- If a policy block is overridden, record single-human approver and explicit justification.
+## Guardrails
+- Do not overthink incident narrative; prioritize factual, auditable records.
+- Only record details supported by available run artifacts and explicit inputs.
+- Never omit override approver identity or justification when override occurred.
+## Output
+- Incident summary.
+- Structured `IncidentRecord` JSON.
+- Immediate rollback decision trail.
+## Completion behavior
+Finish with:
+- `incident_status` (`recorded` or `needs_input`)
+- rollback action (`execute_now` or `standby`)
+- postmortem requirement (`true`/`false`)

package/.pi/prompts/harness-plan.md ADDED Viewed

@@ -0,0 +1,64 @@
+---
+description: Build a strict read-only PlanPacket before any mutating work.
+argument-hint: "\"<task>\" [--risk low|med|high] [--budget <amount>] [--quick]"
+---
+# harness-plan
+Create a machine-readable plan packet before execution.
+## Step 0 — Parse arguments
+Read `$ARGUMENTS` and parse:
+- task statement (required)
+- optional flags: `--risk low|med|high`, `--budget <amount>`, `--quick`
+If task is missing, stop and return:
+`Usage: /harness-plan "<task>" [--risk low|med|high] [--budget <amount>] [--quick]`
+## Process
+1. Parse the requested task and extract concrete scope and constraints.
+2. If ambiguity blocks safe execution planning, request clarification and stop.
+3. Build a `PlanPacket` that is valid against `.pi/harness/specs/plan-packet.schema.json`.
+4. Include rollback artifacts in all required forms.
+## Hard requirements
+- Do not run mutating tools in this command.
+- If task scope is ambiguous, request clarification and stop.
+- Produce a `PlanPacket` matching `.pi/harness/specs/plan-packet.schema.json`.
+- Include rollback artifacts in all three forms:
+  - revert command
+  - prepared revert branch name
+  - patch bundle path
+- Set risk level to `high` if uncertainty, broad blast radius, or policy-sensitive surfaces are involved.
+## Guardrails
+- Do not overthink straightforward planning requests.
+- Only plan the requested scope; do not execute or widen implementation.
+- Never speculate about code or configuration that was not read.
+## Output contract
+Return:
+1. Human-readable plan summary:
+   - scope
+   - assumptions
+   - acceptance checks
+   - rollback plan
+2. A valid JSON `PlanPacket` object.
+Do not proceed to execution from this command.
+## Completion behavior
+Always end with:
+- one-line `plan_status` (`ready` or `needs_clarification`)
+- the final `risk_level` used
+- explicit `next_command` recommendation (`/harness-run --plan ...` or clarification request)

package/.pi/prompts/harness-review.md ADDED Viewed

@@ -0,0 +1,52 @@
+---
+description: Independent evaluator pass/fail verdict in session isolation mode.
+argument-hint: "--run <run-id> [--trace <trace-ref>]"
+---
+# harness-review
+Produce an independent evaluator verdict.
+## Step 0 — Parse arguments
+Read `$ARGUMENTS` and parse:
+- required: `--run <run-id>`
+- optional: `--trace <trace-ref>`
+If `--run` is missing, stop and return:
+`Usage: /harness-review --run <run-id> [--trace <trace-ref>]`
+## Process
+1. Reconstruct expected outcomes from plan and run artifacts.
+2. Independently verify checks and regression guards.
+3. Emit `EvalVerdict` output for policy gate consumption.
+## Requirements
+- Treat executor output as untrusted.
+- Do not self-review with executor-private scratch context.
+- Emit `EvalVerdict` contract matching `.pi/harness/specs/eval-verdict.schema.json`.
+- Provide reproducible failed checks and regression flags.
+## Guardrails
+- Do not overthink straightforward pass/fail evidence.
+- Only evaluate requested run artifacts and gates.
+- Never speculate about checks that were not executed.
+## Output
+- Human-readable findings.
+- Structured `EvalVerdict` JSON.
+- Recommended action: `proceed_to_adversary`, `replan`, or `rollback`.
+## Completion behavior
+Always finish with:
+- `eval_status` (`pass`, `conditional_pass`, `fail`)
+- `recommended_action`
+- short evidence list that maps each failed check to a reproducible reference

package/.pi/prompts/harness-router-tune.md ADDED Viewed

@@ -0,0 +1,74 @@
+---
+description: Propose model-router updates from eval evidence; apply only with explicit approval.
+argument-hint: "--evidence <evidence.json> --candidate <candidate-router.json> [--proposal <out.json>]"
+---
+# harness-router-tune
+Router tuning is **propose-and-approve only**.
+## Step 0 — Parse arguments
+Read `$ARGUMENTS` and parse:
+- required: `--evidence <evidence.json>`, `--candidate <candidate-router.json>`
+- optional: `--proposal <out.json>`
+If required args are missing, stop and return:
+`Usage: /harness-router-tune --evidence <evidence.json> --candidate <candidate-router.json> [--proposal <out.json>]`
+## Process
+1. Validate evidence completeness and guard status.
+2. Generate a proposal artifact only (no live router mutation).
+3. Require explicit human approval metadata before any apply step.
+## Never-do rule
+- Never write `.pi/model-router.json` directly from this command.
+## Proposal flow
+1. Build proposal:
+```bash
+node .pi/harness/router/propose-router-tuning.mjs \
+  --evidence <evidence.json> \
+  --candidate <candidate-router.json> \
+  --proposal-out .pi/harness/router/proposals/<id>.json
+```
+2. Review proposal (human approval required).
+3. Apply only with explicit approver + justification:
+```bash
+node .pi/harness/router/apply-router-proposal.mjs \
+  --proposal .pi/harness/router/proposals/<id>.json \
+  --approve-by "<human>" \
+  --justification "<reason>" \
+  --write
+```
+## Evidence requirements
+- Minimum sample count threshold met.
+- Pre/post success-rate delta included.
+- Cost-per-task delta included.
+- Regression guard status present and passing.
+If any requirement is missing, stop with `human_required`.
+## Guardrails
+- Do not overthink weak evidence; reject incomplete proposals quickly.
+- Only produce proposal/apply instructions within this contract.
+- Never apply tuning without explicit human approver identity and justification.
+## Completion behavior
+End with:
+- `tuning_status` (`proposed`, `human_required`, or `rejected`)
+- evidence gate summary (sample count, success delta, cost delta, regression guard)
+- explicit non-mutation confirmation for `.pi/model-router.json`

package/.pi/prompts/harness-run.md ADDED Viewed

@@ -0,0 +1,59 @@
+---
+description: Execute only against an approved PlanPacket with strict phase gates.
+argument-hint: "--plan <path-to-plan-packet.json> [--budget <amount>]"
+---
+# harness-run
+Execute implementation only after an approved plan exists.
+## Step 0 — Parse arguments
+Read `$ARGUMENTS` and parse:
+- required: `--plan <path-to-plan-packet.json>`
+- optional: `--budget <amount>`
+If `--plan` is missing, stop and return:
+`Usage: /harness-run --plan <path-to-plan-packet.json> [--budget <amount>]`
+## Process
+1. Validate `--plan` input and confirm it is a valid approved `PlanPacket`.
+2. Execute only within approved scope.
+3. Run focused validations mapped to approved acceptance checks.
+4. Produce rollback artifacts and handoff references for downstream gates.
+## Required input
+- `--plan` must point to a valid `PlanPacket`.
+## Gate behavior
+- Refuse execution if no valid plan packet is provided.
+- Keep edits strictly within approved scope.
+- If scope drift appears, stop and return to `harness-plan`.
+- Record evaluator/adversary prerequisites for downstream gates.
+- Always prepare rollback artifacts as part of execution output.
+## Guardrails
+- Do not overthink straightforward approved changes; execute the approved scope directly.
+- Only modify files and behaviors covered by the approved `PlanPacket`.
+- Never speculate about successful validation without runnable evidence.
+## Output
+- Implementation summary scoped to approved plan.
+- Files changed and why.
+- Targeted validations run.
+- Trace pointers and rollback references.
+## Completion behavior
+End with:
+1. `execution_status` (`completed`, `blocked`, or `scope_drift`).
+2. `validation_summary` (pass/fail with command evidence).
+3. `handoff_ready` booleans for evaluator/adversary prerequisites.