ultimate-pi 0.2.3 → 0.2.4

package/README.md CHANGED
@@ -2,397 +2,86 @@
2
2
 
3
3
  > The **ultimate AI coding harness** on top of [**pi.dev**](https://pi.dev).
4
4
 
5
- ## What this project is
5
+ `ultimate-pi` is a pi package that adds a governed coding workflow: plan first, then implement, then independent review—so agents cannot silently skip planning or merge unsafe changes.
6
6
 
7
- `ultimate-pi` is a production-oriented harness for AI-assisted coding with strict safety and governance built in.
7
+ ## Quick start
8
8
 
9
- It gives you:
9
+ **Requirements:** Node 18+, npm 9+, git.
10
10
 
11
- - A phase-based workflow (`plan -> execute -> evaluate -> adversary -> merge`)
12
- - Enforcement that blocks unsafe behavior (for example, mutating code before planning)
13
- - Structured artifacts in `.pi/harness/` for auditability and replay
14
- - Canonical contracts (`HarnessRunRecord`, observations, harness PostHog events) and team ADRs
15
- - Dual PostHog analytics: LLM spans (`$ai_*`) plus harness domain events (`harness_*`)
16
- - A practical bootstrap command that sets up tools, graph, and runtime integrations
17
-
18
- If you are new: start with the **Quick Start** section and run one task through the full pipeline.
19
-
20
- ## 5-minute quickstart
21
-
22
- If you just want to get started fast:
23
-
24
- 1. Install into your current project:
11
+ 1. **Install** (from your project directory):
25
12
 
26
13
  ```bash
27
14
  pi install npm:ultimate-pi
28
15
  /reload
29
16
  ```
30
17
 
31
- 2. Bootstrap the harness:
18
+ 2. **Bootstrap** (once per project):
32
19
 
33
20
  ```text
34
21
  /harness-setup
35
22
  ```
36
23
 
37
- 3. Run your first task:
24
+ 3. **Run a task** (full pipeline in one command):
38
25
 
39
26
  ```text
40
27
  /harness-auto "implement feature X safely"
41
28
  ```
42
29
 
43
- That command runs the strict pipeline:
44
- `plan -> execute -> evaluate -> adversary -> policy decision`.
30
+ That runs: plan → execute → evaluate → adversary → policy decision. It does **not** auto-merge.
45
31
 
46
- If it blocks, inspect with:
32
+ If something blocks, inspect the last run:
47
33
 
48
34
  ```text
49
35
  /harness-trace-last
50
36
  /harness-policy-status
51
37
  ```
52
38
 
53
- ## Table of Contents
54
-
55
- - [5-minute quickstart](#5-minute-quickstart)
56
- - [How the harness works](#how-the-harness-works)
57
- - [Harness Phase 2 (developers)](#harness-phase-2-developers)
58
- - [PostHog and harness telemetry](#posthog-and-harness-telemetry)
59
- - [Verify your harness install](#verify-your-harness-install)
60
- - [Prerequisites](#prerequisites)
61
- - [Quick Start (new users)](#quick-start-new-users)
62
- - [Run your first harness task](#run-your-first-harness-task)
63
- - [Command reference](#command-reference)
64
- - [Harness artifacts and file layout](#harness-artifacts-and-file-layout)
65
- - [Safety and governance defaults](#safety-and-governance-defaults)
66
- - [Router tuning flow](#router-tuning-flow)
67
- - [Troubleshooting](#troubleshooting)
68
- - [Contributing](#contributing)
69
-
70
- ## How the harness works
71
-
72
- The harness enforces a deterministic execution lifecycle:
73
-
74
- 1. **Plan**
75
- Create a `PlanPacket` before any mutating work.
76
- 2. **Execute**
77
- Implement only within the approved plan scope.
78
- 3. **Evaluate**
79
- Run independent evaluation and produce an `EvalVerdict`.
80
- 4. **Adversary**
81
- Run adversarial review and produce an `AdversaryReport`.
82
- 5. **Policy / Merge decision**
83
- Debate consensus + severity policy decides `pass`, `conditional_pass`, `block`, or `human_required`.
84
-
85
- ### Why this matters
86
-
87
- - You get fewer silent mistakes.
88
- - Reviews are reproducible, not opinion-only.
89
- - Incidents and overrides are recorded in structured, machine-readable artifacts.
90
-
91
- ## Harness Phase 2 (developers)
92
-
93
- Phase 2 adds machine-readable contracts, observability, and deterministic checks on top of the phase workflow above. You do not need to read every ADR to use the harness; run `/harness-auto` and `npm run harness:verify` first, then drill down when you are changing behavior.
94
-
95
- **What shipped**
96
-
97
- - **Contracts** in `.pi/harness/specs/` — including `HarnessRunRecord`, `HarnessPostHogEvent`, and `HarnessObservation` (see [specs README](.pi/harness/specs/README.md))
98
- - **Extensions** (auto-loaded from `.pi/extensions/`) — `trace-recorder`, `harness-telemetry`, `observation-bus`, `drift-monitor`, plus existing governance extensions
99
- - **ADRs** — team-shared decisions in [`.pi/harness/docs/adrs/`](.pi/harness/docs/adrs/README.md) (0001–0009)
100
- - **Skills** — `harness-spec`, `harness-plan`, `harness-governor`, `harness-eval`, `harness-context` (context-mode only)
101
- - **Smoke evals** — `.pi/harness/evals/smoke/` (fixtures only; no CI LLM)
102
- - **Evolution** — `.pi/harness/evolution/` (self-healing rules, meta-optimizer)
103
-
104
- **Typical flows**
105
-
106
- | Goal | Command |
107
- |------|---------|
108
- | End-to-end task (strict pipeline) | `/harness-auto "<task>"` |
109
- | Check schemas, fixtures, and extension wiring | `npm run harness:verify` |
110
- | Last run trace summary | `/harness-trace-last` |
111
- | Telemetry config | `/harness-telemetry-status` |
112
- | Sync Sentrux rules from architecture manifest | `npm run harness:sentrux-sync` or `/harness-sentrux-sync` |
113
-
114
- For extension internals, env vars, and verification details, see [`.pi/harness/README.md`](.pi/harness/README.md) and [CONTRIBUTING.md](./CONTRIBUTING.md#harness-governance-extensions).
115
-
116
- ## Sentrux architectural rules
117
-
118
- [Sentrux](https://sentrux.dev/docs/rules-engine/) enforces layers and boundaries via `.sentrux/rules.toml`. The harness keeps that file in sync with the repo layout:
119
-
120
- | Artifact | Role |
121
- |----------|------|
122
- | [`.pi/harness/sentrux/architecture.manifest.json`](.pi/harness/sentrux/architecture.manifest.json) | Canonical layers, boundaries, constraints (edit when architecture changes) |
123
- | [`.sentrux/rules.toml`](.sentrux/rules.toml) | Generated rules for `sentrux check` and MCP `check_rules` (committed; custom TOML outside managed markers is preserved) |
124
-
125
- **When to sync**
126
-
127
- - During `/harness-setup` (Step 2.8)
128
- - After changing `architecture.manifest.json`
129
- - Automatically on harness `plan` / `merge` phases (`sentrux-rules-sync` extension)
130
- - Before release: `npm run harness:verify` fails if manifest and rules are out of date
131
-
132
- ```bash
133
- npm run harness:sentrux-sync # write/merge rules.toml
134
- sentrux check . # enforce rules (CI-friendly exit codes)
135
- ```
136
-
137
- Details: [ADR 0009](.pi/harness/docs/adrs/0009-sentrux-rules-lifecycle.md).
138
-
139
- ## PostHog and harness telemetry
140
-
141
- ultimate-pi uses **two PostHog layers** on the same project key (`POSTHOG_API_KEY`, project `ultimate-pi`):
142
-
143
- | Layer | Source | Events | Purpose |
144
- |-------|--------|--------|---------|
145
- | LLM analytics | `@posthog/pi` | `$ai_generation`, `$ai_span`, `$ai_trace` | Model/tool usage and latency |
146
- | Harness domain | `harness-telemetry.ts` | `harness_run_started`, `harness_run_completed`, `harness_policy_violation`, … | Governance KPIs and run correlation |
147
-
148
- Copy [`.env.example`](.env.example) to `.env` and set at minimum:
149
-
150
- - `POSTHOG_API_KEY` — project API key
151
- - `POSTHOG_PROJECT_NAME=ultimate-pi`
152
- - `HARNESS_TELEMETRY_ENABLED=true` — set `false` to disable **only** `harness_*` captures (LLM layer unchanged)
153
- - `POSTHOG_PRIVACY_MODE` — when `true`, harness properties strip paths (counts/enums only)
154
-
155
- **Verify `harness_*` events**
156
-
157
- 1. Ensure env vars above are set; run `/harness-telemetry-status` in a pi session.
158
- 2. Run `/harness-auto "smoke task"` (or any harness run that completes).
159
- 3. In PostHog → **Live events**, filter `event` contains `harness_`.
160
- 4. Confirm `harness_run_started` and `harness_run_completed` share the same `harness_run_id`.
161
-
162
- Event catalog and dashboard seed queries: [ADR 0008](.pi/harness/docs/adrs/0008-harness-posthog-telemetry.md).
163
-
164
- ## Verify your harness install
165
-
166
- After `/harness-setup` or when changing harness specs/extensions:
167
-
168
- ```bash
169
- npm run harness:verify
170
- ```
171
-
172
- This runs deterministic checks (schemas, smoke fixtures, extension registration) without calling an LLM. Fix any reported errors before relying on `/harness-auto` in production workflows.
173
-
174
- Optional: set `HARNESS_SENTRUX_REQUIRED=true` in `.env` if your environment must assert Sentrux stub wiring (see `.env.example`).
175
-
176
- ## Prerequisites
177
-
178
- Minimum recommended environment:
39
+ ## Commands
179
40
 
180
- - `node >= 18`
181
- - `npm >= 9`
182
- - `git`
183
- - `python >= 3.10` (for Graphify workflow)
41
+ | Command | What it does |
42
+ |---------|----------------|
43
+ | `/harness-setup` | One-time project bootstrap (tools, harness dirs, extensions) |
44
+ | `/harness-auto "<task>"` | End-to-end pipeline (recommended) |
45
+ | `/harness-plan "<task>"` | Plan only (no code changes) |
46
+ | `/harness-run --plan <file>` | Execute an approved plan |
47
+ | `/harness-eval --run <run-id>` | Evaluation summary |
48
+ | `/harness-review --run <run-id>` | Independent review verdict |
49
+ | `/harness-critic --run <run-id>` | Adversarial review |
50
+ | `/harness-trace --run <run-id>` | Full trace for a run |
51
+ | `/harness-trace-last` | Summary of the most recent run |
52
+ | `/harness-policy-status` | Current policy / block reasons |
53
+ | `/harness-abort [reason]` | Stop and return to plan-only mode |
184
54
 
185
- Optional but commonly used:
55
+ ## Manual workflow
186
56
 
187
- - `gh` CLI for GitHub workflow
188
- - Docker (only if you want self-hosted Firecrawl)
189
-
190
- ## Quick Start (new users)
191
-
192
- From your project folder:
193
-
194
- ```bash
195
- pi install npm:ultimate-pi
196
- /reload
197
- ```
198
-
199
- Run the full bootstrap:
200
-
201
- ```text
202
- /harness-setup
203
- ```
204
-
205
- `/harness-setup` is idempotent and designed as the one-command initializer for:
206
-
207
- - Graphify knowledge graph setup
208
- - CLI tool installation and checks
209
- - Harness/runtime directory scaffolding
210
- - Extension package verification
211
- - Model-router bootstrap configuration
212
-
213
- ## Run your first harness task
214
-
215
- ### Fastest path
216
-
217
- Use the one-command pipeline:
218
-
219
- ```text
220
- /harness-auto "implement feature X safely"
221
- ```
222
-
223
- This runs:
224
-
225
- `plan -> execute -> evaluate -> adversary -> policy decision -> commit/PR (no auto-merge)`
226
-
227
- ### Manual path (recommended for learning)
228
-
229
- 1. Plan
230
-
231
- ```text
232
- /harness-plan "implement feature X safely"
233
- ```
234
-
235
- 2. Execute with approved plan:
236
-
237
- ```text
238
- /harness-run --plan <path-to-plan-packet.json>
239
- ```
240
-
241
- 3. Evaluate:
57
+ Use this when you want each step separate:
242
58
 
243
59
  ```text
60
+ /harness-plan "your task"
61
+ /harness-run --plan .pi/harness/runs/<run-id>/plan-packet.json
244
62
  /harness-eval --run <run-id>
245
63
  /harness-review --run <run-id>
246
- ```
247
-
248
- 4. Adversarial review:
249
-
250
- ```text
251
64
  /harness-critic --run <run-id>
252
65
  ```
253
66
 
254
- 5. If blocked or ambiguous, record incident:
255
-
256
- ```text
257
- /harness-incident --run <run-id> --trigger "<reason>"
258
- ```
259
-
260
- 6. Trace/debug:
261
-
262
- ```text
263
- /harness-trace --run <run-id>
264
- ```
265
-
266
- ## Command reference
267
-
268
- ### Core workflow commands
269
-
270
- - `/harness-setup` - bootstrap complete environment and harness scaffolding
271
- - `/harness-auto "<task>"` - run strict end-to-end pipeline
272
- - `/harness-plan "<task>"` - generate read-only `PlanPacket`
273
- - `/harness-run --plan <file>` - execute approved scope only
274
- - `/harness-eval --run <run-id>` - benchmark/evaluation summary
275
- - `/harness-review --run <run-id>` - independent evaluator verdict
276
- - `/harness-critic --run <run-id>` - adversarial findings and merge-block signal
277
- - `/harness-incident --run <run-id> --trigger "<reason>"` - incident record
278
- - `/harness-trace --run <run-id>` - replay and artifact completeness
279
- - `/harness-abort [reason]` - reset safely to plan phase and lock mutation until new plan
280
-
281
- ### Operational/status commands
282
-
283
- - `/harness-policy-status`
284
- - `/harness-budget-status`
285
- - `/harness-review-integrity-status`
286
- - `/harness-test-integrity-last`
287
- - `/harness-trace-last` — compact summary of the most recent run trace + `HarnessRunRecord`
288
- - `/harness-telemetry-status` — PostHog harness layer config and session flush count
289
- - `/harness-debate-open`
290
- - `/harness-debate-round`
291
- - `/harness-debate-consensus`
292
-
293
- ## Harness artifacts and file layout
67
+ ## Defaults you should know
294
68
 
295
- Primary harness directories:
69
+ - **Model routing is opt-in** — install does not force `router/auto` or `gpt-5.4-pro`. Enable with `/router profile auto` after `/harness-setup` generates `.pi/model-router.json`, or copy [`.pi/model-router.example.json`](.pi/model-router.example.json).
70
+ - **Plan before mutate** — write/edit/shell that changes the repo is blocked until execute phase.
71
+ - **No auto-merge** — you decide when to open or merge a PR.
72
+ - **Structured runs** — each run writes artifacts under `.pi/harness/runs/` for replay and audit.
296
73
 
297
- - `.pi/harness/specs/` JSON schemas for core contracts
298
- - `.pi/harness/runs/` — per-run trace summaries, `HarnessRunRecord`, event indexes
299
- - `.pi/harness/incidents/` — incident and policy override records
300
- - `.pi/harness/debates/` — debate rounds, consensus packets, budget events
301
- - `.pi/harness/router/` — router tuning proposals and apply flow scripts
302
- - `.pi/harness/docs/adrs/` — Architectural Decision Records ([index](.pi/harness/docs/adrs/README.md))
303
- - `.pi/harness/evals/smoke/` — deterministic smoke fixtures
304
- - `.pi/harness/evolution/` — self-healing rules and meta-optimizer (JSONL-first)
305
-
306
- Core contract schemas in `.pi/harness/specs/`:
307
-
308
- - `PlanPacket`, `RunTrace`, `HarnessRunRecord`
309
- - `HarnessPostHogEvent`, `HarnessObservation`
310
- - `EvalVerdict`, `AdversaryReport`
311
- - `RoundResult`, `ConsensusPacket`
312
- - `BudgetExhausted`, `IncidentRecord`
313
- - `RouterTuningProposal`
314
-
315
- ## Safety and governance defaults
316
-
317
- The harness intentionally locks in these behaviors:
318
-
319
- - **Plan-before-mutate**: write/edit/mutating shell commands blocked outside execute phase
320
- - **Mandatory adversarial review** in the strict pipeline
321
- - **Review isolation**: evaluator/adversary cannot share executor session context
322
- - **Budget hard-stops** with structured `budget_exhausted` events
323
- - **Test-diff integrity checks** for suspicious test weakening patterns
324
- - **Severity policy thresholds**:
325
- - block if `security >= 0.70` or `correctness >= 0.70`
326
- - block if `architecture >= 0.80` or `test_integrity >= 0.80`
327
- - **Override policy**: single human approver with explicit justification
328
- - **Never auto-merge**
329
-
330
- ## Router tuning flow
331
-
332
- Router changes are two-step and approval-gated:
333
-
334
- 1. Propose (no live mutation):
335
-
336
- ```bash
337
- node .pi/harness/router/propose-router-tuning.mjs \
338
- --evidence /path/to/evidence.json \
339
- --candidate /path/to/candidate-router.json \
340
- --proposal-out .pi/harness/router/proposals/proposal-001.json
341
- ```
342
-
343
- 2. Apply (explicit human approval + justification + `--write`):
344
-
345
- ```bash
346
- node .pi/harness/router/apply-router-proposal.mjs \
347
- --proposal .pi/harness/router/proposals/proposal-001.json \
348
- --approve-by "human.name" \
349
- --justification "why this is safe" \
350
- --write
351
- ```
352
-
353
- Blind writes to `.pi/model-router.json` are intentionally disallowed.
74
+ Optional: copy [`.env.example`](.env.example) to `.env` if you use PostHog or other integrations wired by `/harness-setup`.
354
75
 
355
76
  ## Troubleshooting
356
77
 
357
- ### `/harness-setup` fails early
358
-
359
- - Check `node --version`, `npm --version`, `git --version`
360
- - Ensure Node is at least 18
361
-
362
- ### Graphify not available
363
-
364
- - Install Python 3.10+
365
- - Then install Graphify and build/update graph
366
-
367
- ### Review/integrity blocks in evaluate/adversary phase
368
-
369
- - This means review is not isolated from execute context
370
- - Fork/switch session, then rerun review commands
371
-
372
- ### Budget hard-stop triggers
373
-
374
- - Use `/harness-budget-status`
375
- - Reduce scope, split task, or restart with a narrower plan
376
-
377
- ### Suspicious test diff warning
378
-
379
- - Use `/harness-test-integrity-last`
380
- - Restore or justify test changes; expect adversarial scrutiny
381
-
382
- ### No `harness_*` events in PostHog
383
-
384
- - Run `/harness-telemetry-status` — confirm `POSTHOG_API_KEY` is set and `HARNESS_TELEMETRY_ENABLED` is not `false`
385
- - Complete a full run (`/harness-auto` or `/harness-run` through `agent_end`) so `harness-telemetry` can flush
386
- - Filter Live events for `harness_`, not `$ai_*` (those come from `@posthog/pi` only)
387
-
388
- ### `npm run harness:verify` fails
389
-
390
- - Read the script output for the first schema or fixture mismatch
391
- - Compare your change against [`.pi/harness/specs/`](.pi/harness/specs/) and [ADR 0002](.pi/harness/docs/adrs/0002-harness-run-record.md) if you edited run/trace shapes
78
+ | Problem | Try |
79
+ |---------|-----|
80
+ | Setup fails | `node --version` (need 18+), rerun `/harness-setup` |
81
+ | Blocked in evaluate/review | Run review in a fresh session (isolation from execute) |
82
+ | Budget / scope stop | `/harness-budget-status`, narrow the task or split the plan |
83
+ | Test integrity warning | `/harness-test-integrity-last`, fix or justify test changes |
392
84
 
393
85
  ## Contributing
394
86
 
395
- For local dev setup, lint/test commands, Firecrawl notes, harness extension details, and architectural quality gate workflow, see:
396
-
397
- - [CONTRIBUTING.md](./CONTRIBUTING.md)
398
- - [`.pi/harness/README.md`](.pi/harness/README.md) — scaffold layout, verification, governance extensions
87
+ Local development, harness internals, and quality gates: [CONTRIBUTING.md](./CONTRIBUTING.md) and [`.pi/harness/README.md`](.pi/harness/README.md).
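Reviewer note: the 0.2.4 README no longer spells out the severity policy thresholds that 0.2.3 listed (block if `security >= 0.70` or `correctness >= 0.70`; block if `architecture >= 0.80` or `test_integrity >= 0.80`). The sketch below shows one way those thresholds could be expressed as a check. It is not the package's implementation: the names `BLOCK_THRESHOLDS`, `blockingCategories`, and `shouldBlock` are hypothetical, and treating a missing score as 0 is an assumption. Only the four threshold values come from the removed README text.

```typescript
// Hypothetical sketch of the block rule quoted in the 0.2.3 README.
// Type and function names are illustrative, not ultimate-pi's API.
type Category = "security" | "correctness" | "architecture" | "test_integrity";
type SeverityScores = Partial<Record<Category, number>>;

// Threshold values taken from the removed "Severity policy thresholds" bullet.
const BLOCK_THRESHOLDS: Record<Category, number> = {
  security: 0.7,
  correctness: 0.7,
  architecture: 0.8,
  test_integrity: 0.8,
};

// Returns every category whose score meets or exceeds its block threshold.
// Assumption: a category with no reported score counts as 0.
function blockingCategories(scores: SeverityScores): Category[] {
  return (Object.keys(BLOCK_THRESHOLDS) as Category[]).filter(
    (category) => (scores[category] ?? 0) >= BLOCK_THRESHOLDS[category],
  );
}

// A run is blocked as soon as any single category reaches its threshold.
function shouldBlock(scores: SeverityScores): boolean {
  return blockingCategories(scores).length > 0;
}

// Security is over 0.70, so this run would be blocked.
console.log(shouldBlock({ security: 0.72, architecture: 0.5 }));
// Correctness stays under 0.70; only test_integrity reaches its 0.80 threshold.
console.log(blockingCategories({ correctness: 0.69, test_integrity: 0.8 }));
```

The comparison is inclusive (`>=`), matching the wording of the removed bullet. The other policy outcomes named in 0.2.3 (`pass`, `conditional_pass`, `human_required`) are left out because the README gives no numeric rule for them.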
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "ultimate-pi",
3
- "version": "0.2.3",
3
+ "version": "0.2.4",
4
4
  "description": "Ultimate AI coding harness for pi.dev — extensible skills, Obsidian wiki knowledge layer, compressed context, deterministic output",
5
5
  "keywords": [
6
6
  "pi-package",
@@ -35,7 +35,9 @@
35
35
  ]
36
36
  },
37
37
  "scripts": {
38
- "check:ts": "tsc --noEmit --target ES2022 --moduleResolution nodenext --module nodenext --skipLibCheck .pi/extensions/dotenv-loader.ts .pi/extensions/lib/posthog-node.d.ts .pi/extensions/lib/harness-posthog.ts .pi/extensions/harness-telemetry.ts .pi/extensions/trace-recorder.ts .pi/extensions/observation-bus.ts .pi/extensions/drift-monitor.ts .pi/extensions/sentrux-rules-sync.ts",
38
+ "check:ts": "tsc --noEmit --target ES2022 --moduleResolution nodenext --module nodenext --skipLibCheck .pi/extensions/dotenv-loader.ts .pi/extensions/lib/posthog-node.d.ts .pi/extensions/lib/harness-posthog.ts .pi/extensions/lib/harness-paths.ts .pi/extensions/model-router-bootstrap.ts .pi/extensions/harness-telemetry.ts .pi/extensions/trace-recorder.ts .pi/extensions/observation-bus.ts .pi/extensions/drift-monitor.ts .pi/extensions/sentrux-rules-sync.ts .pi/extensions/custom-header.ts",
39
+ "harness:graphify-bootstrap": "bash scripts/harness-graphify-bootstrap.sh",
40
+ "harness:cli-verify": "bash scripts/harness-cli-verify.sh",
39
41
  "harness:verify": "node scripts/harness-verify.mjs",
40
42
  "harness:sentrux-sync": "node scripts/sentrux-rules-sync.mjs --force",
41
43
  "harness:meta-optimizer": "node .pi/harness/evolution/meta-optimizer.mjs",