npm - ultimate-pi - Versions diffs - 0.2.3 → 0.2.4 - Mend

ultimate-pi 0.2.3 → 0.2.4

This diff shows the contents of publicly available package versions released to one of the supported registries. It is provided for informational purposes only and reflects the changes between package versions as they appear in their respective public registries.

Files changed (18)

package/.codex/hooks.json +15 -0
package/.pi/extensions/custom-header.ts +26 -2
package/.pi/extensions/lib/harness-paths.ts +47 -0
package/.pi/extensions/model-router-bootstrap.ts +174 -0
package/.pi/extensions/sentrux-rules-sync.ts +33 -2
package/.pi/harness/browser.json +1 -0
package/.pi/model-router.example.json +27 -0
package/.pi/prompts/graphify.md +4 -8
package/.pi/prompts/harness-setup.md +143 -92
package/.pi/settings.json +0 -2
package/.sentrux/.harness-rules-meta.json +1 -1
package/AGENTS.md +12 -0
package/CHANGELOG.md +13 -0
package/README.md +39 -350
package/package.json +4 -2
package/scripts/harness-cli-verify.sh +294 -0
package/scripts/harness-graphify-bootstrap.sh +151 -0
package/.pi/model-router.json +0 -95

package/README.md CHANGED

@@ -2,397 +2,86 @@
 > The **ultimate AI coding harness** on top of [**pi.dev**](https://pi.dev).
-## What this project is
+`ultimate-pi` is a pi package that adds a governed coding workflow: plan first, then implement, then independent review—so agents cannot silently skip planning or merge unsafe changes.
-`ultimate-pi` is a production-oriented harness for AI-assisted coding with strict safety and governance built in.
+## Quick start
-It gives you:
+**Requirements:** Node 18+, npm 9+, git.
-- A phase-based workflow (`plan -> execute -> evaluate -> adversary -> merge`)
-- Enforcement that blocks unsafe behavior (for example, mutating code before planning)
-- Structured artifacts in `.pi/harness/` for auditability and replay
-- Canonical contracts (`HarnessRunRecord`, observations, harness PostHog events) and team ADRs
-- Dual PostHog analytics: LLM spans (`$ai_*`) plus harness domain events (`harness_*`)
-- A practical bootstrap command that sets up tools, graph, and runtime integrations
-If you are new: start with the **Quick Start** section and run one task through the full pipeline.
-## 5-minute quickstart
-If you just want to get started fast:
-1. Install into your current project:
+1. **Install** (from your project directory):
 ```bash
 pi install npm:ultimate-pi
 /reload
 ```
-2. Bootstrap the harness:
+2. **Bootstrap** (once per project):
 ```text
 /harness-setup
 ```
-3. Run your first task:
+3. **Run a task** (full pipeline in one command):
 ```text
 /harness-auto "implement feature X safely"
 ```
-That command runs the strict pipeline:
-`plan -> execute -> evaluate -> adversary -> policy decision`.
+That runs: plan → execute → evaluate → adversary → policy decision. It does **not** auto-merge.
-If it blocks, inspect with:
+If something blocks, inspect the last run:
 ```text
 /harness-trace-last
 /harness-policy-status
 ```
-## Table of Contents
-- [5-minute quickstart](#5-minute-quickstart)
-- [How the harness works](#how-the-harness-works)
-- [Harness Phase 2 (developers)](#harness-phase-2-developers)
-- [PostHog and harness telemetry](#posthog-and-harness-telemetry)
-- [Verify your harness install](#verify-your-harness-install)
-- [Prerequisites](#prerequisites)
-- [Quick Start (new users)](#quick-start-new-users)
-- [Run your first harness task](#run-your-first-harness-task)
-- [Command reference](#command-reference)
-- [Harness artifacts and file layout](#harness-artifacts-and-file-layout)
-- [Safety and governance defaults](#safety-and-governance-defaults)
-- [Router tuning flow](#router-tuning-flow)
-- [Troubleshooting](#troubleshooting)
-- [Contributing](#contributing)
-## How the harness works
-The harness enforces a deterministic execution lifecycle:
-1. **Plan**
-   Create a `PlanPacket` before any mutating work.
-2. **Execute**
-   Implement only within the approved plan scope.
-3. **Evaluate**
-   Run independent evaluation and produce an `EvalVerdict`.
-4. **Adversary**
-   Run adversarial review and produce an `AdversaryReport`.
-5. **Policy / Merge decision**
-   Debate consensus + severity policy decides `pass`, `conditional_pass`, `block`, or `human_required`.
-### Why this matters
-- You get fewer silent mistakes.
-- Reviews are reproducible, not opinion-only.
-- Incidents and overrides are recorded in structured, machine-readable artifacts.
-## Harness Phase 2 (developers)
-Phase 2 adds machine-readable contracts, observability, and deterministic checks on top of the phase workflow above. You do not need to read every ADR to use the harness; run `/harness-auto` and `npm run harness:verify` first, then drill down when you are changing behavior.
-**What shipped**
-- **Contracts** in `.pi/harness/specs/` — including `HarnessRunRecord`, `HarnessPostHogEvent`, and `HarnessObservation` (see [specs README](.pi/harness/specs/README.md))
-- **Extensions** (auto-loaded from `.pi/extensions/`) — `trace-recorder`, `harness-telemetry`, `observation-bus`, `drift-monitor`, plus existing governance extensions
-- **ADRs** — team-shared decisions in [`.pi/harness/docs/adrs/`](.pi/harness/docs/adrs/README.md) (0001–0009)
-- **Skills** — `harness-spec`, `harness-plan`, `harness-governor`, `harness-eval`, `harness-context` (context-mode only)
-- **Smoke evals** — `.pi/harness/evals/smoke/` (fixtures only; no CI LLM)
-- **Evolution** — `.pi/harness/evolution/` (self-healing rules, meta-optimizer)
-**Typical flows**
-| Goal | Command |
-|------|---------|
-| End-to-end task (strict pipeline) | `/harness-auto "<task>"` |
-| Check schemas, fixtures, and extension wiring | `npm run harness:verify` |
-| Last run trace summary | `/harness-trace-last` |
-| Telemetry config | `/harness-telemetry-status` |
-| Sync Sentrux rules from architecture manifest | `npm run harness:sentrux-sync` or `/harness-sentrux-sync` |
-For extension internals, env vars, and verification details, see [`.pi/harness/README.md`](.pi/harness/README.md) and [CONTRIBUTING.md](./CONTRIBUTING.md#harness-governance-extensions).
-## Sentrux architectural rules
-[Sentrux](https://sentrux.dev/docs/rules-engine/) enforces layers and boundaries via `.sentrux/rules.toml`. The harness keeps that file in sync with the repo layout:
-| Artifact | Role |
-|----------|------|
-| [`.pi/harness/sentrux/architecture.manifest.json`](.pi/harness/sentrux/architecture.manifest.json) | Canonical layers, boundaries, constraints (edit when architecture changes) |
-| [`.sentrux/rules.toml`](.sentrux/rules.toml) | Generated rules for `sentrux check` and MCP `check_rules` (committed; custom TOML outside managed markers is preserved) |
-**When to sync**
-- During `/harness-setup` (Step 2.8)
-- After changing `architecture.manifest.json`
-- Automatically on harness `plan` / `merge` phases (`sentrux-rules-sync` extension)
-- Before release: `npm run harness:verify` fails if manifest and rules are out of date
-```bash
-npm run harness:sentrux-sync    # write/merge rules.toml
-sentrux check .                 # enforce rules (CI-friendly exit codes)
-```
-Details: [ADR 0009](.pi/harness/docs/adrs/0009-sentrux-rules-lifecycle.md).
-## PostHog and harness telemetry
-ultimate-pi uses **two PostHog layers** on the same project key (`POSTHOG_API_KEY`, project `ultimate-pi`):
-| Layer | Source | Events | Purpose |
-|-------|--------|--------|---------|
-| LLM analytics | `@posthog/pi` | `$ai_generation`, `$ai_span`, `$ai_trace` | Model/tool usage and latency |
-| Harness domain | `harness-telemetry.ts` | `harness_run_started`, `harness_run_completed`, `harness_policy_violation`, … | Governance KPIs and run correlation |
-Copy [`.env.example`](.env.example) to `.env` and set at minimum:
-- `POSTHOG_API_KEY` — project API key
-- `POSTHOG_PROJECT_NAME=ultimate-pi`
-- `HARNESS_TELEMETRY_ENABLED=true` — set `false` to disable **only** `harness_*` captures (LLM layer unchanged)
-- `POSTHOG_PRIVACY_MODE` — when `true`, harness properties strip paths (counts/enums only)
-**Verify `harness_*` events**
-1. Ensure env vars above are set; run `/harness-telemetry-status` in a pi session.
-2. Run `/harness-auto "smoke task"` (or any harness run that completes).
-3. In PostHog → **Live events**, filter `event` contains `harness_`.
-4. Confirm `harness_run_started` and `harness_run_completed` share the same `harness_run_id`.
-Event catalog and dashboard seed queries: [ADR 0008](.pi/harness/docs/adrs/0008-harness-posthog-telemetry.md).
-## Verify your harness install
-After `/harness-setup` or when changing harness specs/extensions:
-```bash
-npm run harness:verify
-```
-This runs deterministic checks (schemas, smoke fixtures, extension registration) without calling an LLM. Fix any reported errors before relying on `/harness-auto` in production workflows.
-Optional: set `HARNESS_SENTRUX_REQUIRED=true` in `.env` if your environment must assert Sentrux stub wiring (see `.env.example`).
-## Prerequisites
-Minimum recommended environment:
+## Commands
-- `node >= 18`
-- `npm >= 9`
-- `git`
-- `python >= 3.10` (for Graphify workflow)
+| Command | What it does |
+|---------|----------------|
+| `/harness-setup` | One-time project bootstrap (tools, harness dirs, extensions) |
+| `/harness-auto "<task>"` | End-to-end pipeline (recommended) |
+| `/harness-plan "<task>"` | Plan only (no code changes) |
+| `/harness-run --plan <file>` | Execute an approved plan |
+| `/harness-eval --run <run-id>` | Evaluation summary |
+| `/harness-review --run <run-id>` | Independent review verdict |
+| `/harness-critic --run <run-id>` | Adversarial review |
+| `/harness-trace --run <run-id>` | Full trace for a run |
+| `/harness-trace-last` | Summary of the most recent run |
+| `/harness-policy-status` | Current policy / block reasons |
+| `/harness-abort [reason]` | Stop and return to plan-only mode |
-Optional but commonly used:
+## Manual workflow
-- `gh` CLI for GitHub workflow
-- Docker (only if you want self-hosted Firecrawl)
-## Quick Start (new users)
-From your project folder:
-```bash
-pi install npm:ultimate-pi
-/reload
-```
-Run the full bootstrap:
-```text
-/harness-setup
-```
-`/harness-setup` is idempotent and designed as the one-command initializer for:
-- Graphify knowledge graph setup
-- CLI tool installation and checks
-- Harness/runtime directory scaffolding
-- Extension package verification
-- Model-router bootstrap configuration
-## Run your first harness task
-### Fastest path
-Use the one-command pipeline:
-```text
-/harness-auto "implement feature X safely"
-```
-This runs:
-`plan -> execute -> evaluate -> adversary -> policy decision -> commit/PR (no auto-merge)`
-### Manual path (recommended for learning)
-1. Plan
-```text
-/harness-plan "implement feature X safely"
-```
-2. Execute with approved plan:
-```text
-/harness-run --plan <path-to-plan-packet.json>
-```
-3. Evaluate:
+Use this when you want each step separate:
 ```text
+/harness-plan "your task"
+/harness-run --plan .pi/harness/runs/<run-id>/plan-packet.json
 /harness-eval --run <run-id>
 /harness-review --run <run-id>
-```
-4. Adversarial review:
-```text
 /harness-critic --run <run-id>
 ```
-5. If blocked or ambiguous, record incident:
-```text
-/harness-incident --run <run-id> --trigger "<reason>"
-```
-6. Trace/debug:
-```text
-/harness-trace --run <run-id>
-```
-## Command reference
-### Core workflow commands
-- `/harness-setup` - bootstrap complete environment and harness scaffolding
-- `/harness-auto "<task>"` - run strict end-to-end pipeline
-- `/harness-plan "<task>"` - generate read-only `PlanPacket`
-- `/harness-run --plan <file>` - execute approved scope only
-- `/harness-eval --run <run-id>` - benchmark/evaluation summary
-- `/harness-review --run <run-id>` - independent evaluator verdict
-- `/harness-critic --run <run-id>` - adversarial findings and merge-block signal
-- `/harness-incident --run <run-id> --trigger "<reason>"` - incident record
-- `/harness-trace --run <run-id>` - replay and artifact completeness
-- `/harness-abort [reason]` - reset safely to plan phase and lock mutation until new plan
-### Operational/status commands
-- `/harness-policy-status`
-- `/harness-budget-status`
-- `/harness-review-integrity-status`
-- `/harness-test-integrity-last`
-- `/harness-trace-last` — compact summary of the most recent run trace + `HarnessRunRecord`
-- `/harness-telemetry-status` — PostHog harness layer config and session flush count
-- `/harness-debate-open`
-- `/harness-debate-round`
-- `/harness-debate-consensus`
-## Harness artifacts and file layout
+## Defaults you should know
-Primary harness directories:
+- **Model routing is opt-in** — install does not force `router/auto` or `gpt-5.4-pro`. Enable with `/router profile auto` after `/harness-setup` generates `.pi/model-router.json`, or copy [`.pi/model-router.example.json`](.pi/model-router.example.json).
+- **Plan before mutate** — write/edit/shell that changes the repo is blocked until execute phase.
+- **No auto-merge** — you decide when to open or merge a PR.
+- **Structured runs** — each run writes artifacts under `.pi/harness/runs/` for replay and audit.
-- `.pi/harness/specs/` — JSON schemas for core contracts
-- `.pi/harness/runs/` — per-run trace summaries, `HarnessRunRecord`, event indexes
-- `.pi/harness/incidents/` — incident and policy override records
-- `.pi/harness/debates/` — debate rounds, consensus packets, budget events
-- `.pi/harness/router/` — router tuning proposals and apply flow scripts
-- `.pi/harness/docs/adrs/` — Architectural Decision Records ([index](.pi/harness/docs/adrs/README.md))
-- `.pi/harness/evals/smoke/` — deterministic smoke fixtures
-- `.pi/harness/evolution/` — self-healing rules and meta-optimizer (JSONL-first)
-Core contract schemas in `.pi/harness/specs/`:
-- `PlanPacket`, `RunTrace`, `HarnessRunRecord`
-- `HarnessPostHogEvent`, `HarnessObservation`
-- `EvalVerdict`, `AdversaryReport`
-- `RoundResult`, `ConsensusPacket`
-- `BudgetExhausted`, `IncidentRecord`
-- `RouterTuningProposal`
-## Safety and governance defaults
-The harness intentionally locks in these behaviors:
-- **Plan-before-mutate**: write/edit/mutating shell commands blocked outside execute phase
-- **Mandatory adversarial review** in the strict pipeline
-- **Review isolation**: evaluator/adversary cannot share executor session context
-- **Budget hard-stops** with structured `budget_exhausted` events
-- **Test-diff integrity checks** for suspicious test weakening patterns
-- **Severity policy thresholds**:
-  - block if `security >= 0.70` or `correctness >= 0.70`
-  - block if `architecture >= 0.80` or `test_integrity >= 0.80`
-- **Override policy**: single human approver with explicit justification
-- **Never auto-merge**
-## Router tuning flow
-Router changes are two-step and approval-gated:
-1. Propose (no live mutation):
-```bash
-node .pi/harness/router/propose-router-tuning.mjs \
-  --evidence /path/to/evidence.json \
-  --candidate /path/to/candidate-router.json \
-  --proposal-out .pi/harness/router/proposals/proposal-001.json
-```
-2. Apply (explicit human approval + justification + `--write`):
-```bash
-node .pi/harness/router/apply-router-proposal.mjs \
-  --proposal .pi/harness/router/proposals/proposal-001.json \
-  --approve-by "human.name" \
-  --justification "why this is safe" \
-  --write
-```
-Blind writes to `.pi/model-router.json` are intentionally disallowed.
+Optional: copy [`.env.example`](.env.example) to `.env` if you use PostHog or other integrations wired by `/harness-setup`.
 ## Troubleshooting
-### `/harness-setup` fails early
-- Check `node --version`, `npm --version`, `git --version`
-- Ensure Node is at least 18
-### Graphify not available
-- Install Python 3.10+
-- Then install Graphify and build/update graph
-### Review/integrity blocks in evaluate/adversary phase
-- This means review is not isolated from execute context
-- Fork/switch session, then rerun review commands
-### Budget hard-stop triggers
-- Use `/harness-budget-status`
-- Reduce scope, split task, or restart with a narrower plan
-### Suspicious test diff warning
-- Use `/harness-test-integrity-last`
-- Restore or justify test changes; expect adversarial scrutiny
-### No `harness_*` events in PostHog
-- Run `/harness-telemetry-status` — confirm `POSTHOG_API_KEY` is set and `HARNESS_TELEMETRY_ENABLED` is not `false`
-- Complete a full run (`/harness-auto` or `/harness-run` through `agent_end`) so `harness-telemetry` can flush
-- Filter Live events for `harness_`, not `$ai_*` (those come from `@posthog/pi` only)
-### `npm run harness:verify` fails
-- Read the script output for the first schema or fixture mismatch
-- Compare your change against [`.pi/harness/specs/`](.pi/harness/specs/) and [ADR 0002](.pi/harness/docs/adrs/0002-harness-run-record.md) if you edited run/trace shapes
+| Problem | Try |
+|---------|-----|
+| Setup fails | `node --version` (need 18+), rerun `/harness-setup` |
+| Blocked in evaluate/review | Run review in a fresh session (isolation from execute) |
+| Budget / scope stop | `/harness-budget-status`, narrow the task or split the plan |
+| Test integrity warning | `/harness-test-integrity-last`, fix or justify test changes |
 ## Contributing
-For local dev setup, lint/test commands, Firecrawl notes, harness extension details, and architectural quality gate workflow, see:
-- [CONTRIBUTING.md](./CONTRIBUTING.md)
-- [`.pi/harness/README.md`](.pi/harness/README.md) — scaffold layout, verification, governance extensions
+Local development, harness internals, and quality gates: [CONTRIBUTING.md](./CONTRIBUTING.md) and [`.pi/harness/README.md`](.pi/harness/README.md).
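
Reviewer note: the README hunk above drops the "Safety and governance defaults" section, including the severity policy thresholds (block if `security >= 0.70` or `correctness >= 0.70`, block if `architecture >= 0.80` or `test_integrity >= 0.80`). As a reminder of what that removed text described, here is a minimal sketch of such a threshold check. Only the four dimension names and their numeric thresholds come from the removed lines; the type names and the functions `blockingDimensions` and `shouldBlock` are hypothetical and are not taken from the package's source.

```typescript
// Hypothetical sketch: only the dimension names and thresholds come from the 0.2.3 README.
type Dimension = "security" | "correctness" | "architecture" | "test_integrity";

type SeverityScores = Record<Dimension, number>;

// A score at or above its threshold is treated as blocking.
const BLOCK_THRESHOLDS: SeverityScores = {
  security: 0.7,
  correctness: 0.7,
  architecture: 0.8,
  test_integrity: 0.8,
};

// Returns the dimensions whose score meets or exceeds its block threshold.
function blockingDimensions(scores: SeverityScores): Dimension[] {
  return (Object.keys(BLOCK_THRESHOLDS) as Dimension[]).filter(
    (dim) => scores[dim] >= BLOCK_THRESHOLDS[dim],
  );
}

// True when at least one dimension is at or above its threshold.
function shouldBlock(scores: SeverityScores): boolean {
  return blockingDimensions(scores).length > 0;
}
```

The sketch covers only the `block` outcome. The removed README also listed `pass`, `conditional_pass`, and `human_required`, but it did not say how those are derived, so they are left out here.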

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
 	"name": "ultimate-pi",
-	"version": "0.2.3",
+	"version": "0.2.4",
 	"description": "Ultimate AI coding harness for pi.dev — extensible skills, Obsidian wiki knowledge layer, compressed context, deterministic output",
 	"keywords": [
 		"pi-package",
@@ -35,7 +35,9 @@
 		]
 	},
 	"scripts": {
-		"check:ts": "tsc --noEmit --target ES2022 --moduleResolution nodenext --module nodenext --skipLibCheck .pi/extensions/dotenv-loader.ts .pi/extensions/lib/posthog-node.d.ts .pi/extensions/lib/harness-posthog.ts .pi/extensions/harness-telemetry.ts .pi/extensions/trace-recorder.ts .pi/extensions/observation-bus.ts .pi/extensions/drift-monitor.ts .pi/extensions/sentrux-rules-sync.ts",
+		"check:ts": "tsc --noEmit --target ES2022 --moduleResolution nodenext --module nodenext --skipLibCheck .pi/extensions/dotenv-loader.ts .pi/extensions/lib/posthog-node.d.ts .pi/extensions/lib/harness-posthog.ts .pi/extensions/lib/harness-paths.ts .pi/extensions/model-router-bootstrap.ts .pi/extensions/harness-telemetry.ts .pi/extensions/trace-recorder.ts .pi/extensions/observation-bus.ts .pi/extensions/drift-monitor.ts .pi/extensions/sentrux-rules-sync.ts .pi/extensions/custom-header.ts",
+		"harness:graphify-bootstrap": "bash scripts/harness-graphify-bootstrap.sh",
+		"harness:cli-verify": "bash scripts/harness-cli-verify.sh",
 		"harness:verify": "node scripts/harness-verify.mjs",
 		"harness:sentrux-sync": "node scripts/sentrux-rules-sync.mjs --force",
 		"harness:meta-optimizer": "node .pi/harness/evolution/meta-optimizer.mjs",