npm - ultimate-pi - Versions diffs - 0.13.1 → 0.15.0 - Mend

ultimate-pi 0.13.1 → 0.15.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (66)

package/.pi/harness/docs/adrs/0035-plan-phase-review-gate.md CHANGED

@@ -2,26 +2,34 @@
 ## Status
-Accepted (2026-05-18)
+Accepted (2026-05-18); amended 2026-05-19 (outcome-based debate + ping-pong dialogue)
 ## Context
 `/harness-plan` produced thin PlanPackets (scope + bullets). Post-execute adversarial review (`/harness-critic`) ran too late. Graphify corpus (Structured Planning, ADR-020, Generator–Evaluator) defines WBS, validation, and review gate before baseline.
+Early implementation treated debate as a fixed four-round checklist with single evaluator→adversary exchange per round, which ended debate on round count rather than focus coverage and quality.
 ## Decision
 1. **PlanPacket 1.1.0** — required `execution_plan` (phases, work_items, sprint_contract, dag_validation).
 2. **YAML on disk** — `plan-packet.yaml`, `research-brief.yaml`, `run-context.yaml`, `artifacts/*.yaml`. JSON Schema unchanged; instances validated after YAML parse.
 3. **Review Gate agents** — `stack-researcher`, `execution-plan-author`, debate: `hypothesis-validator`, `plan-evaluator`, `plan-adversary`, `sprint-contract-auditor`, `review-integrator`.
-4. **Debate bus** — `debate_id=plan-<run_id>`, plan budget profile (4 rounds, 12k cap), plan-phase consensus prerequisites.
-5. **No legacy JSON** plan paths; no pre-debate standalone `hypothesis-eval`.
+4. **Debate bus** — `debate_id=plan-<run_id>`, plan budget profile:
+   - `min_focus_rounds=4`, `max_rounds=12`, `max_exchanges_per_round=3`
+   - `round_token_cap=8000`, `debate_global_cap=80000`
+5. **Outcome-based completion** — consensus `adversarial_debate_completed` when all focuses `spec|wbs|schedule|quality` are covered in submitted review rounds, last `review_gate_ready: true`, and parent DAG validation passes (not `round_count >= 4` alone).
+6. **Within-round dialogue** — pi-messenger kinds: `claim`, `rebuttal`, `clarification`, `counter`; parent orchestrates ping-pong via `harness_debate_round_status` / `harness_debate_advance_thread` before integrator.
+7. **Sequential debate spawns** — parent must not parallelize debate lane subagents in one batch.
+8. **No legacy JSON** plan paths; no pre-debate standalone `hypothesis-eval`.
 ## Consequences
-- Positive: PM-grade plans, deterministic DAG gate, blind hypothesis eval in debate R1.
-- Negative: Higher spawn/token cost; `harness-verify` and smoke fixtures must use `.yaml`.
+- Positive: PM-grade plans, deterministic DAG gate, blind hypothesis eval in debate R1, richer evaluator↔adversary threads, extendable round index for partial re-debate.
+- Negative: Higher token cost (80k debate cap vs 12k); parent orchestration more stateful; smoke fixtures must include four `debate_round_focus` values.
 ## References
 - [ADR-0033](0033-parent-orchestrated-planning.md), [ADR-0034](0034-darwin-plan-research-pipeline.md)
 - `raw/decisions/adr-020.md`, `raw/modules/structured-planning.md`
+- `.pi/prompts/planning-rubrics.md`, `.pi/prompts/harness-plan.md` Phase 5
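
The outcome-based completion rule in decision 5 above can be sketched as follows. This is a minimal illustration under stated assumptions, not the package's implementation: the function name `debateOutcomeComplete`, the `ReviewRound` interface, and the `dagValidationPassed` flag are hypothetical, while the field names follow the review-round YAML fixtures in this diff. The required focuses are passed as a parameter so that the same check could serve a reduced focus set.

```typescript
// Hypothetical types; field names mirror the review-round YAML fixtures.
type Focus = "spec" | "wbs" | "schedule" | "quality";

interface ReviewRound {
  round_index: number;
  debate_round_focus: Focus;
  review_gate_ready: boolean;
}

// Assumed helper: debate is complete only when every required focus appears
// in a submitted round, the last round is gate-ready, and the DAG check
// passed. The number of rounds alone never decides the outcome.
function debateOutcomeComplete(
  rounds: ReviewRound[],
  requiredFocuses: Focus[],
  dagValidationPassed: boolean,
): boolean {
  if (rounds.length === 0 || !dagValidationPassed) return false;
  const covered = new Set<Focus>(rounds.map((r) => r.debate_round_focus));
  const allCovered = requiredFocuses.every((f) => covered.has(f));
  const ordered = [...rounds].sort((a, b) => a.round_index - b.round_index);
  const last = ordered[ordered.length - 1];
  return allCovered && last !== undefined && last.review_gate_ready;
}

const ALL_FOCUSES: Focus[] = ["spec", "wbs", "schedule", "quality"];

// Four rounds, one per focus, with only the final round gate-ready.
const coveredRounds: ReviewRound[] = ALL_FOCUSES.map(
  (focus, i): ReviewRound => ({
    round_index: i + 1,
    debate_round_focus: focus,
    review_gate_ready: i === ALL_FOCUSES.length - 1,
  }),
);

// Four rounds that all revisit the same focus.
const repeatedRounds: ReviewRound[] = [1, 2, 3, 4].map(
  (i): ReviewRound => ({
    round_index: i,
    debate_round_focus: "spec",
    review_gate_ready: true,
  }),
);

console.log(debateOutcomeComplete(coveredRounds, ALL_FOCUSES, true)); // true
console.log(debateOutcomeComplete(repeatedRounds, ALL_FOCUSES, true)); // false
```

The second call shows the case the amendment targets: four rounds were submitted, but three focuses were never covered, so the gate stays closed.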

package/.pi/harness/docs/adrs/0036-implementation-research-and-selective-debate.md ADDED

@@ -0,0 +1,51 @@
+# ADR 0036: Implementation research and selective debate
+- **Status:** Accepted
+- **Date:** 2026-05-19
+## Context
+ADR 0034–0035 established Darwin research and outcome-based Review Gate debate. Gaps remained:
+- No dedicated pass for external solution patterns vs in-repo stack selection.
+- Debate always required all four focuses with `min_focus_rounds=4`, even for low-risk tasks.
+- Sprint-contract-auditor spawn in code did not match prompt (quality focus).
+## Decision
+1. **Phase 3.5** — After decompose/hypothesis, parent spawns in parallel:
+   - `harness/planning/implementation-researcher` → `PlanImplementationResearchBrief` → `artifacts/implementation-research.yaml`
+   - `harness/planning/stack-researcher` → `PlanStackBrief` → `artifacts/stack.yaml`
+2. Research stays **outside** debate; debate agents cite artifacts, no web tools.
+3. **Phase 4d** — `harness_plan_debate_eligibility` (pre-debate only) selects `full | standard | light` and `required_focuses`; persisted on messenger + bus at `harness_debate_open`.
+4. **Light profile** — `spec` + `quality` only, `min_focus_rounds=2`, reduced global cap; gate uses stored `required_focuses` (not hardcoded four).
+5. **Sprint auditor** — shared `lanesForRound(roundIndex, focus)` spawns sprint lane when `focus === quality` OR `roundIndex >= 4`.
+6. **`--quick`** still skips semantic scout only; never skips Phase 3.5 or debate.
+## Profiles
+| Profile | When | Focuses | min_focus_rounds |
+|---------|------|---------|-------------------|
+| full | high risk, material fork, open implementation questions, DAG manual patch, many tensions | all four | 4 |
+| standard | default (ambiguous → standard) | all four | 4 |
+| light | low risk, no fork, high-confidence implementation + clear stack primary | spec, quality | 2 |
+## Consequences
+### Positive
+- Better plans on hard tasks (external patterns before WBS).
+- Cheaper low-risk plans (light debate).
+- Deterministic eligibility and gate alignment.
+### Negative
+- Extra subagent per plan (implementation-researcher).
+- Parents must run eligibility before `harness_debate_open`.
+## References
+- `.pi/prompts/harness-plan.md`
+- `.pi/harness/specs/plan-implementation-research-brief.schema.json`
+- `.pi/extensions/lib/plan-debate-eligibility.ts`
+- ADR 0034, ADR 0035
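
The profile table and the sprint-auditor rule in ADR 0036 can be sketched as below. This is an illustrative reading of the ADR text, not the contents of `plan-debate-eligibility.ts`: the `EligibilityInput` shape, the `selectProfile` name, the `MANY_TENSIONS` threshold, and the two base lanes returned by `lanesForRound` are all assumptions. Only the condition `focus === quality` OR `roundIndex >= 4` and the profile rows come from the ADR.

```typescript
type Focus = "spec" | "wbs" | "schedule" | "quality";
type Profile = "full" | "standard" | "light";
type Level = "low" | "med" | "high";

// Hypothetical input; the ADR names these signals but not their encoding.
interface EligibilityInput {
  riskLevel: Level;
  materialFork: boolean;
  openImplementationQuestions: number;
  dagManualPatch: boolean;
  tensionCount: number;
  implementationConfidence: Level;
  stackPrimaryClear: boolean;
}

interface Eligibility {
  profile: Profile;
  requiredFocuses: Focus[];
  minFocusRounds: number;
}

const ALL_FOCUSES: Focus[] = ["spec", "wbs", "schedule", "quality"];
const MANY_TENSIONS = 3; // assumed threshold; the ADR only says "many tensions"

function selectProfile(input: EligibilityInput): Eligibility {
  const needsFull =
    input.riskLevel === "high" ||
    input.materialFork ||
    input.openImplementationQuestions > 0 ||
    input.dagManualPatch ||
    input.tensionCount >= MANY_TENSIONS;
  if (needsFull) {
    return { profile: "full", requiredFocuses: [...ALL_FOCUSES], minFocusRounds: 4 };
  }
  const allowsLight =
    input.riskLevel === "low" &&
    input.implementationConfidence === "high" &&
    input.stackPrimaryClear;
  if (allowsLight) {
    return { profile: "light", requiredFocuses: ["spec", "quality"], minFocusRounds: 2 };
  }
  // Anything ambiguous falls back to standard, as the table states.
  return { profile: "standard", requiredFocuses: [...ALL_FOCUSES], minFocusRounds: 4 };
}

type Lane = "plan-evaluator" | "plan-adversary" | "sprint-contract-auditor";

function lanesForRound(roundIndex: number, focus: Focus): Lane[] {
  const lanes: Lane[] = ["plan-evaluator", "plan-adversary"]; // assumed base lanes
  if (focus === "quality" || roundIndex >= 4) {
    lanes.push("sprint-contract-auditor");
  }
  return lanes;
}

const lowRiskTask: EligibilityInput = {
  riskLevel: "low",
  materialFork: false,
  openImplementationQuestions: 0,
  dagManualPatch: false,
  tensionCount: 0,
  implementationConfidence: "high",
  stackPrimaryClear: true,
};

console.log(selectProfile(lowRiskTask).profile); // light
console.log(lanesForRound(2, "quality"));
```

Under the light profile the quality focus lands in round 2, which is why the rule keys on the focus as well as the round index: a check on `roundIndex >= 4` alone would never spawn the sprint lane for a two-round debate.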

package/.pi/harness/docs/adrs/README.md CHANGED

@@ -20,6 +20,8 @@ Team-shared ADRs for the ultimate-pi harness live under `.pi/harness/docs/adrs/`
 | [0032](0032-harness-command-orchestration.md) | Harness commands as agent orchestrators | Accepted |
 | [0033](0033-parent-orchestrated-planning.md) | Parent-orchestrated harness planning | Accepted |
 | [0034](0034-darwin-plan-research-pipeline.md) | Darwin plan research pipeline | Accepted |
+| [0035](0035-plan-phase-review-gate.md) | Plan-phase Review Gate | Accepted |
+| [0036](0036-implementation-research-and-selective-debate.md) | Implementation research and selective debate | Accepted |
 ## Template

package/.pi/harness/evals/smoke/fixtures/plan-phase/minimal-low-light/artifacts/implementation-research.yaml ADDED

@@ -0,0 +1,28 @@
+schema_version: "1.0.0"
+problem_framing: Validate harness plan-phase with fixture-driven smoke
+sub_problems:
+  - DAG validation
+  - Debate gate coverage
+internal_references:
+  - path: .pi/harness/evals/smoke/smoke-harness-plan.mjs
+    relevance: Existing smoke pattern
+    reuse_signal: high
+external_references: []
+solution_patterns:
+  - name: fixture-driven gate
+    provenance: in-repo smoke
+    fit: Validates plan pipeline without live agents
+    tradeoffs:
+      pros: [Deterministic CI]
+      cons: []
+    risks: []
+similar_implementations: []
+recommended_approach:
+  summary: Extend minimal-med fixture with implementation artifact
+  recommended_approach_confidence: high
+  confidence_rationale: Reuses established smoke-harness-plan pattern
+  evidence_refs:
+    - .pi/harness/evals/smoke/smoke-harness-plan.mjs
+    - .pi/scripts/validate-plan-dag.mjs
+anti_patterns: []
+open_questions: []

package/.pi/harness/evals/smoke/fixtures/plan-phase/minimal-low-light/artifacts/review-round-r1.yaml ADDED

@@ -0,0 +1,24 @@
+schema_version: "1.0.0"
+round_index: 1
+debate_round_focus: spec
+round_summary: Spec round for light profile fixture
+validation_summary: Spec checks pass
+adversary_summary: No blocking findings
+disputes: []
+recommended_packet_patches: []
+review_gate_ready: false
+participants:
+  - PlanEvaluatorAgent
+  - PlanAdversaryAgent
+  - HypothesisValidatorAgent
+  - ReviewIntegratorAgent
+claims:
+  - spec validation complete
+rebuttals: []
+evidence_refs: []
+token_usage:
+  per_agent:
+    PlanEvaluatorAgent: 80
+    PlanAdversaryAgent: 80
+  round_total: 160
+consensus_delta: 0.1

package/.pi/harness/evals/smoke/fixtures/plan-phase/minimal-low-light/artifacts/review-round-r2.yaml ADDED

@@ -0,0 +1,25 @@
+schema_version: "1.0.0"
+round_index: 2
+debate_round_focus: quality
+round_summary: Quality round for light profile fixture
+validation_summary: Quality and sprint contract pass
+adversary_summary: No gaps
+disputes: []
+recommended_packet_patches: []
+review_gate_ready: true
+participants:
+  - PlanEvaluatorAgent
+  - PlanAdversaryAgent
+  - SprintContractAuditorAgent
+  - ReviewIntegratorAgent
+claims:
+  - review gate ready
+rebuttals: []
+evidence_refs: []
+token_usage:
+  per_agent:
+    PlanEvaluatorAgent: 100
+    PlanAdversaryAgent: 90
+    SprintContractAuditorAgent: 70
+  round_total: 260
+consensus_delta: 0.12
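
The `token_usage` block in the review-round fixtures lends itself to a simple consistency check. The sketch below is a hypothetical helper, not harness code: `checkRoundTokens` and the `TokenUsage` interface are assumed names, and it assumes that `round_total` is meant to equal the sum of `per_agent` (which holds in the fixtures in this diff) and that the `round_token_cap=8000` from ADR 0035 applies to this round.

```typescript
// Mirrors the token_usage block of a review-round YAML file.
interface TokenUsage {
  per_agent: Record<string, number>;
  round_total: number;
}

// Returns a list of human-readable problems; an empty list means the
// round's accounting is consistent and within the cap.
function checkRoundTokens(usage: TokenUsage, roundTokenCap: number): string[] {
  const problems: string[] = [];
  const sum = Object.values(usage.per_agent).reduce((acc, n) => acc + n, 0);
  if (sum !== usage.round_total) {
    problems.push(`per_agent sums to ${sum}, round_total is ${usage.round_total}`);
  }
  if (usage.round_total > roundTokenCap) {
    problems.push(`round_total ${usage.round_total} exceeds cap ${roundTokenCap}`);
  }
  return problems;
}

// Values copied from minimal-low-light/artifacts/review-round-r2.yaml.
const r2Usage: TokenUsage = {
  per_agent: {
    PlanEvaluatorAgent: 100,
    PlanAdversaryAgent: 90,
    SprintContractAuditorAgent: 70,
  },
  round_total: 260,
};

console.log(checkRoundTokens(r2Usage, 8000)); // []
```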

package/.pi/harness/evals/smoke/fixtures/plan-phase/minimal-low-light/plan-packet.yaml ADDED

@@ -0,0 +1,196 @@
+schema_version: "1.0.0"
+contract_version: "1.1.0"
+plan_id: plan-smoke-fixture-001
+task_id: task-smoke-001
+scope: Smoke fixture for plan-phase harness validation with execution_plan and debate artifacts.
+assumptions:
+  - Fixture only; no live agent run
+risk_level: low
+acceptance_checks:
+  - id: AC-1
+    description: DAG validation passes
+  - id: AC-2
+    description: Two debate rounds recorded (light profile)
+  - id: AC-3
+    description: Stack brief present in research-brief
+  - id: AC-4
+    description: Sprint contract complete
+  - id: AC-5
+    description: plan-review.md renders
+rollback_plan:
+  revert_commit_ready: true
+  rollback_artifacts:
+    revert_command: git revert HEAD
+    revert_branch: main
+    patch_bundle: .pi/harness/runs/smoke-fixture/patch.bundle
+execution_plan:
+  schema_version: "1.0.0"
+  phases:
+    - phase_id: P1
+      name: Foundation
+      objective: Establish baseline and verify harness wiring
+      entry_criteria:
+        - Fixture loaded
+      exit_criteria:
+        - AC-1 satisfied
+      milestone: M1-baseline
+      work_item_ids: [WI-1, WI-2, WI-3]
+    - phase_id: P2
+      name: Build
+      objective: Implement core changes
+      entry_criteria:
+        - M1-baseline complete
+      exit_criteria:
+        - AC-2 satisfied
+      milestone: M2-build
+      work_item_ids: [WI-4, WI-5, WI-6]
+    - phase_id: P3
+      name: Verify
+      objective: Quality gate and documentation
+      entry_criteria:
+        - M2-build complete
+      exit_criteria:
+        - AC-5 satisfied
+      milestone: M3-ship
+      work_item_ids: [WI-7, WI-8]
+  work_items:
+    - work_item_id: WI-1
+      phase_id: P1
+      title: Load fixture packet
+      description: Read plan-packet.yaml from fixture directory
+      depends_on: []
+      files:
+        - .pi/harness/evals/smoke/fixtures/plan-phase/minimal-med/plan-packet.yaml
+      parallel_safe: true
+      done_criteria:
+        type: manual
+        spec: Fixture packet readable
+      acceptance_check_ids: [AC-1]
+    - work_item_id: WI-2
+      phase_id: P1
+      title: Run DAG validator
+      description: Execute validate-plan-dag.mjs
+      depends_on: [WI-1]
+      files:
+        - .pi/scripts/validate-plan-dag.mjs
+      parallel_safe: false
+      done_criteria:
+        type: command
+        spec: node .pi/scripts/validate-plan-dag.mjs --packet plan-packet.yaml
+      acceptance_check_ids: [AC-1]
+    - work_item_id: WI-3
+      phase_id: P1
+      title: Lint harness-yaml
+      description: Ensure YAML helpers parse fixture
+      depends_on: [WI-1]
+      files:
+        - .pi/lib/harness-yaml.ts
+      parallel_safe: true
+      done_criteria:
+        type: lint
+        spec: npm test
+      acceptance_check_ids: [AC-1]
+    - work_item_id: WI-4
+      phase_id: P2
+      title: Debate round 1-2 artifacts
+      description: Validate review-round YAML
+      depends_on: [WI-2]
+      files:
+        - .pi/harness/evals/smoke/fixtures/plan-phase/minimal-med/artifacts/review-round-r1.yaml
+      parallel_safe: false
+      done_criteria:
+        type: artifact
+        spec: artifacts/review-round-r1.yaml exists
+      acceptance_check_ids: [AC-2]
+    - work_item_id: WI-5
+      phase_id: P2
+      title: Debate round 3-4 artifacts
+      description: Validate final review round
+      depends_on: [WI-4]
+      files:
+        - .pi/harness/evals/smoke/fixtures/plan-phase/minimal-med/artifacts/review-round-r4.yaml
+      parallel_safe: false
+      done_criteria:
+        type: artifact
+        spec: artifacts/review-round-r4.yaml exists
+      acceptance_check_ids: [AC-2]
+    - work_item_id: WI-6
+      phase_id: P2
+      title: Stack research merge
+      description: research-brief includes stack section
+      depends_on: [WI-2]
+      files: []
+      non_code: true
+      parallel_safe: true
+      done_criteria:
+        type: manual
+        spec: research-brief.yaml contains stack key
+      acceptance_check_ids: [AC-3]
+    - work_item_id: WI-7
+      phase_id: P3
+      title: Sprint contract audit
+      description: R4 sprint audit artifact
+      depends_on: [WI-5]
+      files:
+        - .pi/harness/evals/smoke/fixtures/plan-phase/minimal-med/artifacts/sprint-audit-r4.yaml
+      parallel_safe: false
+      done_criteria:
+        type: artifact
+        spec: sprint-audit-r4.yaml present
+      acceptance_check_ids: [AC-4]
+    - work_item_id: WI-8
+      phase_id: P3
+      title: Render plan-review
+      description: Human-readable plan review markdown
+      depends_on: [WI-7]
+      files:
+        - .pi/harness/evals/smoke/fixtures/plan-phase/minimal-med/plan-review.md
+      parallel_safe: false
+      done_criteria:
+        type: manual
+        spec: plan-review.md non-empty
+      acceptance_check_ids: [AC-5]
+  sprint_contract:
+    in_scope:
+      - Fixture validation only
+    out_of_scope:
+      - Production deploy
+    definition_of_done: All smoke checks green
+    assumptions:
+      - CI environment has node
+    external_dependencies: []
+  wbs_dictionary:
+    - work_item_id: WI-1
+      deliverable: Fixture packet loaded
+      owner_role: executor
+      inputs: []
+      outputs: [parsed packet]
+  risk_register:
+    - risk_id: R1
+      description: DAG validator false negative
+      likelihood: low
+      impact: high
+      mitigation: Unit tests on validate-plan-dag.mjs
+      linked_work_item_ids: [WI-2]
+    - risk_id: R2
+      description: Debate cap misconfiguration
+      likelihood: med
+      impact: med
+      mitigation: debate-orchestrator plan profile tests
+      linked_work_item_ids: [WI-4]
+    - risk_id: R3
+      description: YAML parse drift
+      likelihood: low
+      impact: med
+      mitigation: harness-yaml strict parse
+      linked_work_item_ids: [WI-3]
+  schedule_metadata:
+    critical_path_work_item_ids: [WI-1, WI-2, WI-4, WI-5, WI-7, WI-8]
+    parallel_groups:
+      - [WI-1, WI-3]
+    schedule_baseline_note: Fixture topological order; no calendar dates
+  dag_validation:
+    status: pass
+    topological_order: [WI-1, WI-2, WI-3, WI-4, WI-5, WI-6, WI-7, WI-8]
+    cycles: []
+    conflicts: []
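
The `dag_validation` block above records a `topological_order` for the eight work items. One way to confirm that such an order respects every `depends_on` edge is sketched below. This is an independent illustration, not the logic of `validate-plan-dag.mjs`: the `orderViolations` function and the `WorkItem` interface are assumed names. The dependency data is copied from the fixture's `work_items`.

```typescript
interface WorkItem {
  work_item_id: string;
  depends_on: string[];
}

// Returns one message per problem: a work item missing from the order, or a
// dependency that does not appear strictly before its dependent. An empty
// result means the order is a valid topological order, which also implies
// that the dependency graph has no cycles.
function orderViolations(items: WorkItem[], order: string[]): string[] {
  const position = new Map<string, number>();
  order.forEach((id, index) => position.set(id, index));

  const violations: string[] = [];
  for (const item of items) {
    const at = position.get(item.work_item_id);
    if (at === undefined) {
      violations.push(`${item.work_item_id} missing from order`);
      continue;
    }
    for (const dep of item.depends_on) {
      const depAt = position.get(dep);
      if (depAt === undefined || depAt >= at) {
        violations.push(`${item.work_item_id} must come after ${dep}`);
      }
    }
  }
  return violations;
}

// Dependencies as listed in the fixture's work_items.
const fixtureItems: WorkItem[] = [
  { work_item_id: "WI-1", depends_on: [] },
  { work_item_id: "WI-2", depends_on: ["WI-1"] },
  { work_item_id: "WI-3", depends_on: ["WI-1"] },
  { work_item_id: "WI-4", depends_on: ["WI-2"] },
  { work_item_id: "WI-5", depends_on: ["WI-4"] },
  { work_item_id: "WI-6", depends_on: ["WI-2"] },
  { work_item_id: "WI-7", depends_on: ["WI-5"] },
  { work_item_id: "WI-8", depends_on: ["WI-7"] },
];

const fixtureOrder = ["WI-1", "WI-2", "WI-3", "WI-4", "WI-5", "WI-6", "WI-7", "WI-8"];

console.log(orderViolations(fixtureItems, fixtureOrder)); // []
```

Checking a recorded order is cheaper than recomputing one, and it avoids tie-breaking questions: several valid orders exist here (for example, WI-6 may precede WI-5), so a recomputed order need not match the stored list.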

package/.pi/harness/evals/smoke/fixtures/plan-phase/minimal-low-light/plan-review.md ADDED

@@ -0,0 +1,14 @@
+# Plan review (fixture)
+plan_id: plan-smoke-fixture-001
+## Execution plan
+Phases: P1 Foundation → P2 Build → P3 Verify
+Critical path: WI-1 → WI-2 → WI-4 → WI-5 → WI-7 → WI-8
+## Debate
+- Round 1 (spec): review_gate_ready
+- Round 4 (quality): review_gate_ready

package/.pi/harness/evals/smoke/fixtures/plan-phase/minimal-low-light/research-brief.yaml ADDED

@@ -0,0 +1,62 @@
+decomposition:
+  schema_version: "1.0.0"
+  problem_restatement: Light-profile smoke for two-focus debate
+hypothesis:
+  schema_version: "1.0.0"
+  primary:
+    claim: Light debate covers spec and quality only
+    mechanism: Eligibility profile light with min_focus_rounds 2
+    prediction: planDebateOutcomeComplete passes with two rounds
+    experiment: Run smoke-harness-plan.mjs --fixture minimal-low-light
+implementation:
+  schema_version: "1.0.0"
+  problem_framing: Low-risk fixture for selective debate
+  sub_problems: [spec coverage, quality coverage]
+  internal_references:
+    - path: test/plan-debate-eligibility.test.mjs
+      relevance: Eligibility unit tests
+      reuse_signal: high
+  external_references: []
+  solution_patterns:
+    - name: light profile gate
+      provenance: ADR-0036
+      fit: Reduces debate cost on trivial tasks
+      tradeoffs:
+        pros: [Fewer rounds]
+        cons: []
+      risks: []
+  similar_implementations:
+    - name: minimal-med four-focus fixture
+      what_it_solves: Full debate coverage
+      gap_vs_us: Light uses two focuses only
+  recommended_approach:
+    summary: Two review rounds with spec then quality
+    recommended_approach_confidence: high
+    confidence_rationale: Deterministic fixture aligned with eligibility rules
+    evidence_refs:
+      - .pi/extensions/lib/plan-debate-eligibility.ts
+      - test/plan-debate-eligibility.test.mjs
+  anti_patterns: []
+  open_questions: []
+stack:
+  schema_version: "1.0.0"
+  problem_framing: Node harness tooling
+  constraints: []
+  options:
+    - name: extend current stack
+      category: brownfield
+      fit_summary: Use existing ultimate-pi harness
+      tradeoffs:
+        pros: [No new deps]
+        cons: []
+      risks: []
+      evidence_refs: []
+      recommendation_rank: 1
+  recommended_primary: extend current stack
+  rationale: Fixture validates in-repo harness
+eval:
+  schema_version: "1.0.0"
+  revision_recommended: false
+  relevance:
+    passes: true
+    rationale: Hypothesis matches light smoke task

package/.pi/harness/evals/smoke/fixtures/plan-phase/minimal-med/artifacts/implementation-research.yaml ADDED

@@ -0,0 +1,28 @@
+schema_version: "1.0.0"
+problem_framing: Validate harness plan-phase with fixture-driven smoke
+sub_problems:
+  - DAG validation
+  - Debate gate coverage
+internal_references:
+  - path: .pi/harness/evals/smoke/smoke-harness-plan.mjs
+    relevance: Existing smoke pattern
+    reuse_signal: high
+external_references: []
+solution_patterns:
+  - name: fixture-driven gate
+    provenance: in-repo smoke
+    fit: Validates plan pipeline without live agents
+    tradeoffs:
+      pros: [Deterministic CI]
+      cons: []
+    risks: []
+similar_implementations: []
+recommended_approach:
+  summary: Extend minimal-med fixture with implementation artifact
+  recommended_approach_confidence: high
+  confidence_rationale: Reuses established smoke-harness-plan pattern
+  evidence_refs:
+    - .pi/harness/evals/smoke/smoke-harness-plan.mjs
+    - .pi/scripts/validate-plan-dag.mjs
+anti_patterns: []
+open_questions: []

package/.pi/harness/evals/smoke/fixtures/plan-phase/minimal-med/artifacts/review-round-r2.yaml ADDED

@@ -0,0 +1,24 @@
+schema_version: "1.0.0"
+round_index: 2
+debate_round_focus: wbs
+round_summary: WBS round passed for fixture
+validation_summary: Work breakdown structure validated
+adversary_summary: No blocking findings
+disputes: []
+recommended_packet_patches: []
+review_gate_ready: true
+participants:
+  - PlanEvaluatorAgent
+  - PlanAdversaryAgent
+  - ReviewIntegratorAgent
+claims:
+  - wbs validation complete
+rebuttals: []
+evidence_refs: []
+token_usage:
+  per_agent:
+    PlanEvaluatorAgent: 100
+    PlanAdversaryAgent: 100
+    ReviewIntegratorAgent: 50
+  round_total: 250
+consensus_delta: 0.1

package/.pi/harness/evals/smoke/fixtures/plan-phase/minimal-med/artifacts/review-round-r3.yaml ADDED

@@ -0,0 +1,24 @@
+schema_version: "1.0.0"
+round_index: 3
+debate_round_focus: schedule
+round_summary: Schedule round passed for fixture
+validation_summary: Critical path and dependencies validated
+adversary_summary: No schedule risks unmitigated
+disputes: []
+recommended_packet_patches: []
+review_gate_ready: true
+participants:
+  - PlanEvaluatorAgent
+  - PlanAdversaryAgent
+  - ReviewIntegratorAgent
+claims:
+  - schedule validation complete
+rebuttals: []
+evidence_refs: []
+token_usage:
+  per_agent:
+    PlanEvaluatorAgent: 100
+    PlanAdversaryAgent: 100
+    ReviewIntegratorAgent: 50
+  round_total: 250
+consensus_delta: 0.1

package/.pi/harness/evals/smoke/fixtures/plan-phase/minimal-med/research-brief.yaml CHANGED

@@ -8,6 +8,35 @@ hypothesis:
     mechanism: Static artifacts plus validate-plan-dag.mjs
     prediction: CI passes without live agents
     experiment: Run smoke-harness-plan.mjs --fixture
+implementation:
+  schema_version: "1.0.0"
+  problem_framing: Validate harness plan-phase with fixture-driven smoke
+  sub_problems:
+    - DAG validation
+    - Debate gate coverage
+  internal_references:
+    - path: .pi/harness/evals/smoke/smoke-harness-plan.mjs
+      relevance: Existing smoke pattern
+      reuse_signal: high
+  external_references: []
+  solution_patterns:
+    - name: fixture-driven gate
+      provenance: in-repo smoke
+      fit: Validates plan pipeline without live agents
+      tradeoffs:
+        pros: [Deterministic CI]
+        cons: []
+      risks: []
+  similar_implementations: []
+  recommended_approach:
+    summary: Extend minimal-med fixture with implementation artifact
+    recommended_approach_confidence: high
+    confidence_rationale: Reuses established smoke-harness-plan pattern
+    evidence_refs:
+      - .pi/harness/evals/smoke/smoke-harness-plan.mjs
+      - .pi/scripts/validate-plan-dag.mjs
+  anti_patterns: []
+  open_questions: []
 stack:
   schema_version: "1.0.0"
   problem_framing: Node harness tooling