npm - @a-canary/pi-director - Versions diffs - 0.1.0 - Mend

@a-canary/pi-director 0.1.0

Files changed (31)

package/.pi/corrections.jsonl +3 -0
package/CHOICES.md +244 -0
package/PLAN.md +108 -0
package/README.md +97 -0
package/agents/README.md +42 -0
package/agents/builder.md +37 -0
package/agents/critic.md +74 -0
package/agents/director.md +133 -0
package/agents/planner.md +44 -0
package/agents/reviewer.md +35 -0
package/agents/scout.md +37 -0
package/agents/writer.md +28 -0
package/extensions/nightly-analysis.ts +229 -0
package/package.json +39 -0
package/skills/build/SKILL.md +53 -0
package/skills/build/lib/hard-stops.md +66 -0
package/skills/build/lib/phase-loop.md +99 -0
package/skills/build/lib/regression-check.md +63 -0
package/skills/choose/SKILL.md +48 -0
package/skills/choose/lib/pipeline.md +83 -0
package/skills/next/SKILL.md +84 -0
package/skills/next/lib/choice-scanner.md +51 -0
package/skills/next/lib/code-scanner.md +57 -0
package/skills/next/lib/log-scanner.md +55 -0
package/skills/next/lib/ranker.md +72 -0
package/skills/next/lib/session-scanner.md +53 -0
package/templates/NEXT.md +35 -0
package/test/agents.test.ts +63 -0
package/test/next-template.test.ts +41 -0
package/test/package.test.ts +59 -0
package/test/skills.test.ts +52 -0

package/skills/build/lib/phase-loop.md ADDED

@@ -0,0 +1,99 @@
+# Phase Loop
+Core execution loop for a single PLAN.md phase. The director agent follows this for each phase.
+## Input
+- PLAN.md with phases, steps, and gates
+- CHOICES.md for context and constraints
+- Current phase number (first incomplete)
+## Loop
+```
+┌──────────────────┐
+│ 1. Read Gates    │ ← PLAN.md + CHOICES.md
+└────────┬─────────┘
+         ▼
+┌──────────────────┐
+│ 2. Recon         │ ← scout agents (operational, parallel, many tool calls)
+└────────┬─────────┘
+         ▼
+┌──────────────────┐
+│ 3. Plan          │ ← planner agent (tactical, few tool calls)
+└────────┬─────────┘
+         ▼
+┌──────────────────┐
+│ 4. Critique      │ ← critic agent (strategic, ZERO tools, decision tree)
+└────────┬─────────┘
+         ▼
+┌──────────────────┐
+│ 5. Finalize      │ ← planner resolves decision tree branches (tactical)
+└────────┬─────────┘
+         ▼
+┌──────────────────┐
+│ 6. Build & Test  │ ← builder + reviewer agents (operational/tactical)
+└────────┬─────────┘
+         ▼
+┌──────────────────┐
+│ 7. Gate Critique │ ← critic reviews results (strategic, ZERO tools)
+└────────┬─────────┘
+    pass │    │ fail
+         ▼    ▼
+      ✅ Next  → diagnose → retry or ❌ STOP
+```
+## Step Details
+### 1. Read Gates
+- Parse PLAN.md for current phase's steps and gates
+- Parse CHOICES.md for relevant decisions
+- Identify: exit criteria, assumptions, blockers
+### 2. Recon
+Spawn parallel read-only agents:
+- **scout**: `find`, `grep`, `read` relevant codebase areas
+- **scout** (web): context7/web-search if phase references external APIs/libraries
+- Output: compressed context for builder handoff
+### 3. Plan (tactical)
+Delegate to **planner** agent with recon findings:
+- Planner synthesizes concrete steps with file paths and function names
+- Few tool calls — reads specific files to confirm assumptions
+- Produces structured plan for critique
+### 4. Critique (strategic — zero tools)
+Delegate to **critic** agent with:
+- Recon summary (from step 2)
+- Proposed plan (from step 3)
+- CHOICES.md context + priority ladder
+- Critic produces: approval/improvements + **decision tree** (max 8 leaves) for unknowns
+- Critic uses maximum thinking depth for elevated reasoning
+### 5. Finalize (tactical)
+Delegate back to **planner** to incorporate critique:
+- Resolve decision tree branches using tool calls (check conditions)
+- Apply critic's improvements
+- Delegate to **writer** to update PLAN.md
+- If critique rejected the plan → rework or STOP
+### 6. Build & Test (operational)
+- Delegate to **builder** agents (parallel when tasks touch independent files)
+- After each builder: delegate to **reviewer** (tactical)
+- Reviewer issues → delegate fixes to builder
+- Run test suite after implementation
+### 7. Gate Critique (strategic — zero tools)
+Delegate to **critic** with:
+- Phase gate results + exit criteria
+- Regression check results (see [regression-check.md](regression-check.md))
+- Implementation summary
+- Critic approves, or produces decision tree for remediation
+- If critic rejects → diagnose and fix via tactical/operational agents, or STOP if infeasible
+- All gates pass → mark phase complete in PLAN.md
+## Multi-Phase Mode
+When user says "do all phases" or "implement the plan":
+1. Execute steps 1-7 for current phase
+2. On success, loop to step 1 for next phase
+3. Continue until all phases complete or hard stop hit
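The multi-phase loop above could be sketched roughly as follows. This is an illustration under stated assumptions, not pi-director's implementation: `Phase`, `PhaseResult`, and `runAllPhases` are hypothetical names, and the seven steps of a single phase are collapsed into one injected callback.

```typescript
// Hypothetical types for illustration; not part of the pi-director package.
type PhaseResult = "pass" | "hard-stop";

interface Phase {
  number: number;
  complete: boolean;
}

// Runs incomplete phases in order and stops at the first hard stop.
// `runPhase` stands in for steps 1-7 of the phase loop.
function runAllPhases(
  phases: Phase[],
  runPhase: (phase: Phase) => PhaseResult,
): { completed: number[]; stoppedAt: number | null } {
  const completed: number[] = [];
  for (const phase of phases) {
    if (phase.complete) continue; // finished in an earlier run
    if (runPhase(phase) === "hard-stop") {
      return { completed, stoppedAt: phase.number };
    }
    phase.complete = true; // mirrors "mark phase complete in PLAN.md"
    completed.push(phase.number);
  }
  return { completed, stoppedAt: null };
}
```

Passing the per-phase work in as a callback keeps the sketch independent of how agents are actually spawned.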

package/skills/build/lib/regression-check.md ADDED

@@ -0,0 +1,63 @@
+# Regression Check
+Verify that changes don't regress higher-priority concerns (M-0100, A-0100).
+## When to Run
+- After every build step (Step 6 of phase loop)
+- As part of gate check (Step 7 of phase loop)
+- Before marking any phase complete
+## Check Order
+Run checks top-down through the priority ladder. Stop at first regression.
+### 1. UX Quality Regression
+- Do all existing user-facing features still work?
+- Has any error message become less clear?
+- Has any workflow gained extra steps?
+- Do interactive elements still respond correctly?
+**How to verify:**
+```bash
+# Run existing tests
+npm test 2>&1
+# Check for removed/changed public APIs
+git diff --name-only HEAD~1 | grep -E '\.(ts|js|py)$'
+# Manual: does the happy path still work?
+```
+### 2. Security Regression
+- Are there new unvalidated inputs?
+- Are secrets still protected (no hardcoded keys)?
+- Are dependencies from trusted sources?
+- Are permissions still properly scoped?
+**How to verify:**
+```bash
+# Check for hardcoded secrets
+grep -rn 'password\|secret\|api_key\|token' --include='*.ts' --include='*.js' --exclude-dir=node_modules .
+# Check new dependencies
+git diff HEAD~1 -- package.json
+```
+### 3. Scale Regression (if past scale gate)
+- Does the change add O(n²) or worse operations?
+- Are there new unbounded loops or recursions?
+- Are new resources properly cleaned up?
+### 4. Efficiency Check (informational only)
+- Note any efficiency impacts but don't block
+- Log as suggestion for future optimization
+## Output
+```markdown
+### Regression Check
+- [x] UX Quality: {pass/fail — details}
+- [x] Security: {pass/fail — details}
+- [ ] Scale: {pass/fail — details}
+- [ ] Efficiency: {noted — details}
+```
+If any check fails at a level higher than the current work's priority:
+→ **Hard stop. Do not proceed.**
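The hard-stop rule above might be expressed as a small function. This is a minimal sketch: the ladder order comes from the document (UX Quality > Security > Scale > Efficiency), while `CheckResults` and `findHardStop` are hypothetical names introduced for illustration.

```typescript
// Priority ladder from the document, highest priority first.
const LADDER = ["UX Quality", "Security", "Scale", "Efficiency"] as const;
type Level = (typeof LADDER)[number];

// Hypothetical result shape: true means the check passed.
type CheckResults = Record<Level, boolean>;

// Returns the first regressed level that outranks the current work, or null.
// Efficiency is the lowest rung, so it can never outrank the work and never blocks.
function findHardStop(results: CheckResults, workLevel: Level): Level | null {
  const workRank = LADDER.indexOf(workLevel);
  for (let rank = 0; rank < workRank; rank++) {
    const level = LADDER[rank];
    if (!results[level]) return level;
  }
  return null;
}
```

For example, a Security failure blocks Scale-level work, but the same failure is not a hard stop under this rule when the work itself is at the Security level.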

package/skills/choose/SKILL.md ADDED

@@ -0,0 +1,48 @@
+# Choose — Project Intent Clarification
+Wrapper around pi-choose-wisely for managing CHOICES.md within the director workflow.
+## When to Use
+- User asks to clarify project intent, scope, goals
+- User runs `/choose`
+- New project without CHOICES.md
+## Routing
+| User Input | Delegate To |
+|------------|-------------|
+| No CHOICES.md exists | `pi-choose-wisely:choose-wisely` → bootstrap mode (scan docs, extract choices) |
+| "audit" / "check" | `pi-choose-wisely:choose-wisely` → audit mode (8 validation checks) |
+| "init" / "interview" | `pi-choose-wisely:choose-wisely` → interview mode (structured planning) |
+| Describes a change | `pi-choose-wisely:choose-wisely` → change mode (apply + cascade) |
+| "replan" / "plan" | `pi-choose-wisely:replan` → gap analysis → generate PLAN.md |
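The routing table could be approximated in code as below. This is only a sketch: in practice the agent interprets user intent, so the keyword matching here is an assumption, and `Route` and `routeChoose` are hypothetical names. The skill names are the ones listed in the table.

```typescript
// Hypothetical route descriptor; skill names are taken from the routing table.
interface Route {
  skill: "pi-choose-wisely:choose-wisely" | "pi-choose-wisely:replan";
  mode: "bootstrap" | "audit" | "interview" | "change" | "replan";
}

// Keyword matching is an assumed stand-in for the agent's own judgment.
function routeChoose(input: string, choicesExists: boolean): Route {
  const chooseWisely = "pi-choose-wisely:choose-wisely";
  if (!choicesExists) return { skill: chooseWisely, mode: "bootstrap" };
  const text = input.toLowerCase();
  if (/\b(audit|check)\b/.test(text)) return { skill: chooseWisely, mode: "audit" };
  if (/\b(init|interview)\b/.test(text)) return { skill: chooseWisely, mode: "interview" };
  if (/\b(replan|plan)\b/.test(text)) return { skill: "pi-choose-wisely:replan", mode: "replan" };
  // Anything else is treated as a described change (apply + cascade).
  return { skill: chooseWisely, mode: "change" };
}
```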
+## Process
+### Step 1 — Delegate to pi-choose-wisely
+Route the user's request to the appropriate pi-choose-wisely operation (see table above). Pass through all user context.
+### Step 2 — Post-Change Pipeline
+After any CHOICES.md modification:
+1. **Cascade audit** — pi-choose-wisely runs this automatically (upward, lateral, downward checks)
+2. **Priority ladder check** — verify new/changed choices respect M-0100 ordering
+3. **Suggest replan** — if PLAN.md exists, ask: "CHOICES.md changed. Regenerate PLAN.md?" If yes, delegate to `pi-choose-wisely:replan`
+4. **Suggest /next** — "Run `/next` to see how this affects recommendations?"
+### Step 3 — New Choice Validation
+For any new choice added, verify:
+- Has a `Supports:` line (unless top-level Mission)
+- ID is a fresh UID (never reuse or renumber existing IDs)
+- Positioned correctly within its section (position = priority)
+See [pipeline.md](lib/pipeline.md) for the full intent-to-execution flow.
+## Integration Points
+- After CHOICES.md changes → suggest `/build` if PLAN.md exists
+- After CHOICES.md changes → suggest `/next` for impact analysis
+- CHOICES.md gaps feed into `/next` recommendations via choice-scanner
+## Delegates To
+- `pi-choose-wisely:choose-wisely` skill (all CHOICES.md operations)
+- `pi-choose-wisely:replan` skill (gap analysis → PLAN.md generation)

package/skills/choose/lib/pipeline.md ADDED

@@ -0,0 +1,83 @@
+# Intent-to-Execution Pipeline
+How project intent flows from CHOICES.md through to running code.
+## Flow
+```
+User Intent
+    │
+    ▼
+┌─────────────────┐
+│    /choose       │  ← Clarify WHY and WHAT
+│   CHOICES.md     │
+└────────┬────────┘
+         │ changes
+         ▼
+┌─────────────────┐
+│    replan        │  ← Gap analysis: choices vs reality
+│   PLAN.md        │
+└────────┬────────┘
+         │ phases
+         ▼
+┌─────────────────┐
+│    /build        │  ← Execute phases: HOW
+│   code + tests   │
+└────────┬────────┘
+         │ results
+         ▼
+┌─────────────────┐
+│    /next         │  ← Analyze what happened
+│   NEXT.md        │
+└────────┬────────┘
+         │ recommendations
+         ▼
+    Back to /choose or /build
+```
+## Artifact Lifecycle
+### CHOICES.md
+- **Created by**: `/choose` (bootstrap or interview)
+- **Modified by**: `/choose` (change + cascade)
+- **Read by**: `/build` (constraints), `/next` (gap analysis)
+- **Owned by**: pi-choose-wisely
+### PLAN.md
+- **Created by**: replan (from CHOICES.md gap analysis)
+- **Modified by**: `/build` (marks phases complete, refines steps)
+- **Read by**: `/next` (incomplete phases), `/build` (current phase)
+- **Owned by**: pi-choose-wisely:replan + pi-director:/build
+### NEXT.md
+- **Created by**: `/next` (analysis engine)
+- **Modified by**: User approval (items deferred/dismissed)
+- **Read by**: User (recommendations), `/choose` (scope changes), `/build` (approved items)
+- **Owned by**: pi-director:/next
+## Autonomy Boundary
+CHOICES.md is the autonomy boundary:
+- **Inside scope** → director acts freely (bugs, gaps, refactors aligned with choices)
+- **Outside scope** → NEXT.md surfaces it, user decides via `/choose`
+```
+CHOICES.md (user-steered)
+    │
+    ├── In scope? ──→ Director acts autonomously
+    │                  (build, fix, refactor, test)
+    │
+    └── Out of scope? → NEXT.md (agent-discovered)
+                          │
+                          └── User accepts? → Update CHOICES.md → Director can act
+```
+## Cycle
+1. `/choose` → user steers intent (interview, feedback)
+2. replan → generate phases from intent
+3. `/build` → execute phases autonomously (within scope)
+4. `/next` → surface out-of-scope issues for user review
+5. User accepts items → back to `/choose`
+Each cycle tightens alignment between intent and implementation.

package/skills/next/SKILL.md ADDED

@@ -0,0 +1,84 @@
+# Next — Analysis & Recommendation Engine
+Analyze project data and generate ranked recommendations in NEXT.md.
+## When to Use
+- User asks "what should I do next?"
+- User runs `/next`
+- Nightly scheduled analysis
+## Data Sources
+1. **Session history** — `.pi/agent/sessions/*.jsonl` — patterns of repeated work, failed attempts
+2. **Correction logs** — `.pi/corrections.jsonl` — systematic failures (via pi-upskill)
+3. **Code analysis** — complexity, test coverage gaps, dead code, large files
+4. **CHOICES.md** — unimplemented choices, stale decisions
+5. **PLAN.md** — incomplete phases, blocked items
+6. **App output/logs** — runtime errors, performance issues
+## Process
+### Step 1 — Gather
+Spawn **parallel** scout agents, one per scanner module:
+- [Session scanner](lib/session-scanner.md): parse recent sessions for failure patterns, token waste, repeated manual fixes
+- [Code scanner](lib/code-scanner.md): find complexity hotspots, untested code, large files, dead exports
+- [Choice scanner](lib/choice-scanner.md): diff CHOICES.md against codebase reality
+- [Log scanner](lib/log-scanner.md): parse app logs for recurring errors
+Each scanner runs as an independent subagent. All four run in parallel.
+### Step 2 — Analyze
+Synthesize findings into recommendation categories:
+- **refactor** — code quality improvements with clear before/after
+- **simplify** — remove unnecessary complexity, dead code, over-abstraction
+- **scope-change** — CHOICES.md additions/removals based on evidence
+- **ux-improvement** — user experience issues found in logs or session patterns
+- **upskill** — repeated agent failures suggesting a new skill or rule
+- **debt** — technical debt items with effort estimates
+### Step 3 — Rank
+Apply the [ranking algorithm](lib/ranker.md):
+- **Impact** × **Effort** × **Evidence** = priority score (1-27)
+- Filter through priority ladder (M-0100): UX Quality > Security > Scale > Efficiency
+- Flag any recommendation that would regress a higher priority with ⚠️
+### Step 4 — Write NEXT.md
+Generate structured output:
+```markdown
+# NEXT.md — Recommended Actions
+Generated: {date}
+Sources analyzed: {count} sessions, {count} corrections, {count} files
+## Priority 1: {title}
+Category: refactor | Impact: high | Effort: small
+Evidence: {what data supports this}
+Action: {specific steps}
+## Priority 2: {title}
+...
+```
+### Step 5 — Classify & Route
+**Within CHOICES.md scope** → director handles autonomously (no NEXT.md entry needed):
+- Bug fixes aligned with existing choices
+- Test failures for implemented features
+- Implementation gaps for existing choices
+- Refactors that support existing architecture decisions
+**Outside CHOICES.md scope** → write to NEXT.md for user review:
+- Problems that contradict CHOICES.md decisions
+- Opportunities that expand beyond current scope
+- New concerns not addressed by any existing choice
+- Trade-offs that require user judgment
+### Step 6 — Present for Approval
+Show NEXT.md items (scope-external only). User selects items to:
+- Accept → feeds into `/choose` to update CHOICES.md, then director can act
+- Defer → stays in NEXT.md for next cycle
+- Dismiss → removed, with reason logged
+## Output
+- Writes `NEXT.md` in project root
+- Returns summary of top recommendations for user selection

package/skills/next/lib/choice-scanner.md ADDED

@@ -0,0 +1,51 @@
+# Choice Scanner
+Instructions for a scout agent to diff CHOICES.md against codebase reality.
+## Process
+1. Read CHOICES.md, extract all choices with IDs and titles
+2. For each choice, assess implementation status:
+### Status Assessment
+| Status | Evidence |
+|--------|----------|
+| **Fulfilled** | Code exists, tests pass, feature works |
+| **Partial** | Some code exists, incomplete or untested |
+| **Not started** | No implementation evidence |
+| **Stale** | Code exists but contradicts current choice wording |
+| **Orphaned** | Implementation exists for a choice that was removed |
+3. Check for gaps:
+- Choices in Technology section without matching `package.json` deps
+- Choices in Architecture section without matching directory structure
+- Choices in Features section without matching code paths
+4. Check PLAN.md alignment:
+- Plan phases that reference removed/changed choices
+- Completed plan phases whose choices have since changed
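The status assessment in step 2 could be reduced to a decision function like the one below. It is a sketch under assumptions: the `Evidence` fields are hypothetical booleans a scout would have to establish by inspection, and the order of the checks is one reasonable reading of the status table.

```typescript
type Status = "Fulfilled" | "Partial" | "Not started" | "Stale" | "Orphaned";

// Hypothetical evidence flags gathered by the scout for one choice or code area.
interface Evidence {
  choiceExists: boolean;      // the choice is still present in CHOICES.md
  codeExists: boolean;        // some implementation was found
  testsPass: boolean;         // tests covering it exist and pass
  contradictsChoice: boolean; // code disagrees with the current choice wording
}

// Returns null when there is neither a choice nor code, i.e. nothing to report.
function assessStatus(e: Evidence): Status | null {
  if (!e.choiceExists) return e.codeExists ? "Orphaned" : null;
  if (!e.codeExists) return "Not started";
  if (e.contradictsChoice) return "Stale";
  return e.testsPass ? "Fulfilled" : "Partial";
}
```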
+## Output Format
+```markdown
+## Choice-Reality Gap Analysis
+CHOICES.md: {N} choices
+Codebase: {M} source files
+### Status
+- ✓ Fulfilled: {count} — {IDs}
+- ◐ Partial: {count} — {IDs}
+- ✗ Not started: {count} — {IDs}
+- ⚠ Stale: {count} — {IDs}
+- 👻 Orphaned: {count} — {descriptions}
+### Gaps
+1. {choice ID}: {what's missing}
+2. ...
+### PLAN.md Drift
+1. Phase {N}: {misalignment description}
+2. ...
+```

package/skills/next/lib/code-scanner.md ADDED

@@ -0,0 +1,57 @@
+# Code Scanner
+Instructions for a scout agent to find code quality issues.
+## Process
+1. Get project file list:
+```bash
+find . -type f \( -name "*.ts" -o -name "*.js" -o -name "*.py" -o -name "*.go" -o -name "*.rs" \) \
+  -not -path '*/node_modules/*' -not -path '*/.git/*' -not -path '*/dist/*'
+```
+2. For each source file, check:
+### Complexity Hotspots
+- Files > 300 lines: `wc -l` on each file
+- Functions > 50 lines: grep for function definitions, count lines to next function/end
+- Deeply nested code: grep for 4+ levels of indentation
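Two of the hotspot checks above (file length and nesting depth) can be sketched as a pure function over file contents. The function name, the `Hotspot` shape, and the assumption of space-based indentation with a default width of 2 are illustrative choices, not something the package specifies.

```typescript
interface Hotspot {
  path: string;
  lines: number;
  reason: string;
}

// Flags a file with more than 300 lines, or with any line indented 4+ levels.
// Assumes space indentation; `indentWidth` is the number of spaces per level.
function findHotspot(path: string, source: string, indentWidth = 2): Hotspot | null {
  // Drop one trailing newline so the count matches what `wc -l` reports.
  const lines = source.replace(/\n$/, "").split("\n");
  const reasons: string[] = [];
  if (lines.length > 300) reasons.push("over 300 lines");
  // search() returns the index of the first non-whitespace character (-1 if blank).
  const deep = lines.some((line) => line.search(/\S/) >= 4 * indentWidth);
  if (deep) reasons.push("nesting of 4+ levels");
  return reasons.length > 0
    ? { path, lines: lines.length, reason: reasons.join(", ") }
    : null;
}
```

Function-length detection is left out because it needs language-aware parsing rather than line counting.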
+### Test Coverage Gaps
+- Source files without corresponding test file (`*.test.*`, `*.spec.*`)
+- Test files that exist but are empty or have no assertions
+- Untested exports: public functions/classes not referenced in tests
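The first coverage check, source files without a corresponding test file, might look like this. Matching by base name only (so `test/a.test.ts` covers `src/a.ts`) is an assumption; a project that mirrors directory structure would need a stricter comparison. `findUntested` is a hypothetical helper.

```typescript
// Returns source files with no *.test.* or *.spec.* file sharing their base name.
function findUntested(paths: string[]): string[] {
  const isTest = (p: string) => /\.(test|spec)\.[^./]+$/.test(p);
  // Base name without directory, test marker, or extension: "src/a.test.ts" -> "a"
  const stem = (p: string) =>
    p.slice(p.lastIndexOf("/") + 1).replace(/(\.(test|spec))?\.[^.]+$/, "");
  const tested = new Set(paths.filter(isTest).map(stem));
  return paths.filter((p) => !isTest(p) && !tested.has(stem(p)));
}
```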
+### Dead Code
+- Exported functions not imported anywhere else
+- Files not imported by any other file
+- Unused dependencies in package.json
+### Dependency Health
+- `package.json` deps vs what's actually imported
+- Outdated major versions (if lockfile available)
+## Output Format
+```markdown
+## Code Analysis
+Files scanned: {count}
+Total lines: {count}
+### Complexity Hotspots
+1. `{path}` — {lines} lines, {reason}
+2. ...
+### Missing Tests
+1. `{path}` — no test file found
+2. ...
+### Dead Code Candidates
+1. `{path}:{export}` — not imported anywhere
+2. ...
+### Dependency Issues
+1. `{package}` — {issue}
+2. ...
+```

package/skills/next/lib/log-scanner.md ADDED

@@ -0,0 +1,55 @@
+# Log Scanner
+Instructions for a scout agent to parse application output logs.
+## Process
+1. Find log files:
+```bash
+find . -name "*.log" -not -path '*/node_modules/*' -not -path '*/.git/*' 2>/dev/null
+ls logs/ 2>/dev/null
+ls .pi/*.log 2>/dev/null
+```
+2. Also check for common log patterns:
+- `npm test` output (captured in CI or local)
+- Docker logs if containerized
+- stderr output captured in session files
+3. For each log source, extract:
+### Error Patterns
+- Repeated error messages (same error 3+ times)
+- Stack traces with common root causes
+- Warnings that escalated to errors over time
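The "same error 3+ times" rule can be sketched as a frequency count. Treating any line that contains "error" as an error, and stripping a leading ISO-style timestamp so repeats compare equal, are both assumptions about the log format; `recurringErrors` is a hypothetical name.

```typescript
// Counts identical error lines and keeps those seen at least `threshold` times.
// Assumption: an error line is any line containing "error" (case-insensitive).
function recurringErrors(log: string, threshold = 3): Array<[string, number]> {
  const counts = new Map<string, number>();
  for (const raw of log.split("\n")) {
    const line = raw.trim();
    if (!/error/i.test(line)) continue;
    // Strip a leading ISO-style timestamp so repeated messages compare equal.
    const message = line.replace(/^\d{4}-\d{2}-\d{2}[T ][\d:.]+Z?\s*/, "");
    counts.set(message, (counts.get(message) ?? 0) + 1);
  }
  return Array.from(counts.entries())
    .filter(([, n]) => n >= threshold)
    .sort((a, b) => b[1] - a[1]); // most frequent first
}
```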
+### Performance Signals
+- Slow operations (timeout warnings, > 5s responses)
+- Memory warnings
+- Rate limit hits
+### Runtime Issues
+- Deprecated API usage warnings
+- Unhandled promise rejections
+- Missing environment variables
+## Output Format
+```markdown
+## Log Analysis
+Log sources found: {count}
+Period: {date range}
+### Recurring Errors
+1. `{error message}` — {count} occurrences — source: {file/component}
+2. ...
+### Performance Issues
+1. {description} — {frequency} — impact: {assessment}
+2. ...
+### Warnings
+1. {description} — {count} occurrences
+2. ...
+```

package/skills/next/lib/ranker.md ADDED

@@ -0,0 +1,72 @@
+# Recommendation Ranker
+How to score and rank recommendations from scanner outputs.
+## Input
+Combined findings from all scanners: session, code, choice, log.
+## Scoring
+Each recommendation gets three scores (1-3):
+### Impact (how much does fixing this improve the project?)
+- **3 (high)**: Affects core UX, blocks features, or causes user-visible issues
+- **2 (medium)**: Improves maintainability, reduces debt, prevents future problems
+- **1 (low)**: Nice-to-have cleanup, minor optimization
+### Effort (how much work to fix?)
+- **3 (small)**: < 1 hour, single file, clear fix
+- **2 (medium)**: 1-4 hours, multiple files, some design needed
+- **1 (large)**: > 4 hours, architectural change, needs planning
+### Evidence (how strong is the supporting data?)
+- **3 (strong)**: 5+ signals from multiple sources
+- **2 (moderate)**: 2-4 signals or single strong signal
+- **1 (weak)**: 1 signal, inference-based
+### Priority Score
+`priority = impact × effort × evidence` (max 27, min 1)
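As a worked illustration of the formula, here is a minimal sketch; the `Scores` type and `priorityScore` name are introduced for this example only. Note that effort is scored inversely: 3 means a small fix.

```typescript
type Score = 1 | 2 | 3;

interface Scores {
  impact: Score;   // 3 = high
  effort: Score;   // 3 = small (less work scores higher)
  evidence: Score; // 3 = strong
}

// priority = impact × effort × evidence, ranging from 1 to 27.
function priorityScore(s: Scores): number {
  return s.impact * s.effort * s.evidence;
}
```

A high-impact item (3) needing medium effort (2) with moderate evidence (2) scores 3 × 2 × 2 = 12.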
+## Categories
+Assign each recommendation exactly one category:
+- **refactor** — restructure code without changing behavior
+- **simplify** — remove complexity, dead code, over-abstraction
+- **scope-change** — add/remove/modify a CHOICES.md decision
+- **ux-improvement** — improve user experience based on evidence
+- **upskill** — create/modify agent skill or rule to prevent recurring failures
+- **debt** — address accumulated technical debt
+## Priority Ladder Filter
+After ranking, verify each recommendation respects M-0100:
+- UX Quality recommendations always rank above Security-only items
+- Security items rank above Scale-only items
+- Scale items rank above Efficiency-only items
+- Any recommendation that would regress a higher priority is flagged with ⚠️
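One way to read the ladder filter is as a two-key sort: ladder level first, priority score second, with a warning added to anything that would regress a higher level. The sketch below assumes each recommendation carries a `level` and an optional `regresses` field; both are hypothetical and not defined by the package.

```typescript
// Priority ladder from the document, highest priority first.
const LADDER = ["UX Quality", "Security", "Scale", "Efficiency"] as const;
type Level = (typeof LADDER)[number];

interface Rec {
  title: string;
  level: Level;      // ladder level the recommendation serves (assumed field)
  score: number;     // impact × effort × evidence
  regresses?: Level; // a level this change would regress, if any (assumed field)
}

// Sorts by ladder level, then by score descending, and flags regressions with ⚠️.
function applyLadder(recs: Rec[]): Rec[] {
  return [...recs]
    .sort((a, b) =>
      LADDER.indexOf(a.level) - LADDER.indexOf(b.level) || b.score - a.score)
    .map((r) => {
      const warn =
        r.regresses !== undefined &&
        LADDER.indexOf(r.regresses) < LADDER.indexOf(r.level);
      return warn ? { ...r, title: `⚠️ ${r.title}` } : r;
    });
}
```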
+## Scope Classification
+Before ranking, classify each finding:
+### In-scope (autonomous)
+The finding relates to an existing CHOICES.md decision. The director can act without approval.
+- Mark as `scope: in` — these don't go to NEXT.md
+- Route directly to `/build` or fix inline
+### Out-of-scope (needs approval)
+The finding conflicts with, expands beyond, or is not covered by CHOICES.md.
+- Mark as `scope: out` — these go to NEXT.md
+- User must accept (→ update CHOICES.md) before director can act
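The scope rule could be sketched as follows, assuming each finding already lists the CHOICES.md IDs it relates to and whether it contradicts them. The `Finding` shape and `classifyScope` are hypothetical; deciding those two fields is the judgment-heavy part and is not shown.

```typescript
interface Finding {
  title: string;
  choiceIds: string[]; // CHOICES.md IDs the finding relates to (assumed field)
  conflicts: boolean;  // true if it contradicts one of those choices (assumed field)
}

type Scope = "in" | "out";

// In scope only when every referenced choice exists and none is contradicted.
function classifyScope(f: Finding, knownIds: Set<string>): Scope {
  const covered =
    f.choiceIds.length > 0 && f.choiceIds.every((id) => knownIds.has(id));
  return covered && !f.conflicts ? "in" : "out";
}
```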
+## Output
+Top 10 out-of-scope recommendations for NEXT.md, sorted by priority score descending:
+```markdown
+## Priority 1: {title}
+Category: {category} | Impact: {high/med/low} | Effort: {small/med/large} | Score: {N}
+Evidence: {what data supports this — scanner, count, files}
+Action: {specific steps to take}
+Supports: {CHOICES.md IDs affected, if any}
+```
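Producing the top-10 list in the format above might look like this. It is a sketch: the `Recommendation` shape mirrors the template's fields, and `renderTop` is a hypothetical helper that assumes its input has already been filtered to out-of-scope items.

```typescript
interface Recommendation {
  title: string;
  category: string;
  impact: "high" | "med" | "low";
  effort: "small" | "med" | "large";
  score: number;
  evidence: string;
  action: string;
  supports: string[]; // CHOICES.md IDs affected, possibly empty
}

// Sorts by score descending, keeps the top `limit`, and renders the template.
function renderTop(recs: Recommendation[], limit = 10): string {
  return [...recs]
    .sort((a, b) => b.score - a.score)
    .slice(0, limit)
    .map((r, i) =>
      [
        `## Priority ${i + 1}: ${r.title}`,
        `Category: ${r.category} | Impact: ${r.impact} | Effort: ${r.effort} | Score: ${r.score}`,
        `Evidence: ${r.evidence}`,
        `Action: ${r.action}`,
        `Supports: ${r.supports.join(", ")}`,
      ].join("\n"))
    .join("\n\n");
}
```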

package/skills/next/lib/session-scanner.md ADDED

@@ -0,0 +1,53 @@
+# Session Scanner
+Instructions for a scout agent to analyze pi session history.
+## Input
+- Session files: `.pi/agent/sessions/*.jsonl` (current project)
+- Global sessions: `~/.pi/agent/sessions/*.jsonl` (cross-project patterns)
+## Process
+1. List session files, sorted by date (newest first), limit to last 14 days
+2. For each session file, parse JSONL — each line is a message object
+3. Look for these patterns:
+### Failure Patterns
+- Messages with `tool_error` or failed bash commands (exit code != 0)
+- Repeated attempts at the same operation (3+ tries = signal)
+- User corrections ("no", "wrong", "actually", "I meant")
+### Token Waste
+- Sessions > 50 messages without a commit (spinning)
+- Large file reads followed by small edits (could have used grep)
+- Repeated context re-establishment (agent forgot prior work)
+### Repeated Manual Work
+- Same file edited across 3+ sessions (hotspot)
+- Same bash command run across 3+ sessions (should be automated)
+- Same question asked across sessions (missing documentation)
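The "same file edited across 3+ sessions" signal can be sketched once edits have been extracted per session. The input shape (session ID mapped to edited paths) is an assumption, since the session JSONL schema is not described here; `hotspotFiles` is a hypothetical name.

```typescript
// Input shape is an assumption: session ID -> file paths edited in that session.
function hotspotFiles(
  editsBySession: Record<string, string[]>,
  minSessions = 3,
): Array<{ path: string; sessions: number }> {
  const sessionsByPath = new Map<string, Set<string>>();
  for (const session of Object.keys(editsBySession)) {
    for (const path of editsBySession[session] ?? []) {
      // A Set ensures repeated edits within one session count only once.
      const seen = sessionsByPath.get(path) ?? new Set<string>();
      seen.add(session);
      sessionsByPath.set(path, seen);
    }
  }
  return Array.from(sessionsByPath.entries())
    .map(([path, seen]) => ({ path, sessions: seen.size }))
    .filter((h) => h.sessions >= minSessions)
    .sort((a, b) => b.sessions - a.sessions);
}
```

The same counting approach would apply to repeated bash commands, keyed by command string instead of file path.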
+## Output Format
+```markdown
+## Session Analysis
+Period: {date range}
+Sessions analyzed: {count}
+### Failure Patterns
+1. {pattern} — seen {N} times — files: {list}
+2. ...
+### Token Waste Signals
+1. {pattern} — est. {N} tokens wasted — sessions: {list}
+2. ...
+### Repeated Manual Work
+1. {file/command} — {N} occurrences — suggestion: {what to automate}
+2. ...
+### Hotspot Files
+1. `{path}` — touched in {N} sessions — last: {date}
+2. ...
+```