npm - @sienklogic/plan-build-run - Versions diffs - 2.3.1 → 2.4.0 - Mend

@sienklogic/plan-build-run 2.3.1 → 2.4.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (146) hide show

package/plugins/copilot-pbr/references/model-profiles.md ADDED Viewed

@@ -0,0 +1,100 @@
+<!-- canonical: ../../pbr/references/model-profiles.md -->
+# Model Profiles Reference
+How Plan-Build-Run maps agents to models and how to configure model selection.
+---
+## Agent-to-Model Mapping
+Each Plan-Build-Run agent has a default model specified in its agent definition frontmatter (`model:` field). These defaults are overridden by the `models` section of `config.json`.
+### Default Agent Models
+| Agent | Default Model | Rationale |
+|-------|---------------|-----------|
+| `researcher` | `sonnet` | Research requires strong reasoning for source evaluation and synthesis |
+| `planner` | `inherit` | Planning is complex; inherits the session's primary model |
+| `executor` | `inherit` | Execution needs the full capability of the session model |
+| `verifier` | `sonnet` | Verification needs solid reasoning but not the heaviest model |
+| `integration-checker` | `sonnet` | Cross-phase analysis requires strong pattern matching |
+| `plan-checker` | `sonnet` | Plan quality analysis needs good analytical capability |
+| `debugger` | `inherit` | Debugging is complex; inherits session model |
+| `codebase-mapper` | `sonnet` | Codebase analysis requires thorough reasoning |
+| `synthesizer` | `haiku` | Synthesis is mechanical combination; speed over depth |
+| `general` | `inherit` | Lightweight utility; inherits session model |
+The `inherit` value means the agent uses whatever model the parent session is running (typically the user's configured Claude model).
+---
+## Model Profile Presets
+The `/pbr:config model-profile {preset}` command sets all agent models at once using a preset:
+| Profile | Researcher | Planner | Executor | Verifier | Int-Checker | Debugger | Mapper | Synthesizer |
+|---------|-----------|---------|----------|----------|-------------|----------|--------|-------------|
+| `quality` | opus | opus | opus | opus | sonnet | opus | sonnet | sonnet |
+| `balanced` | sonnet | inherit | inherit | sonnet | sonnet | inherit | sonnet | haiku |
+| `budget` | haiku | haiku | haiku | haiku | haiku | haiku | haiku | haiku |
+| `adaptive` | sonnet | sonnet | inherit | sonnet | haiku | inherit | haiku | haiku |
+### Preset Descriptions
+- **quality**: Maximum capability across all agents. Best results but highest token cost. Use for critical projects or complex architectures.
+- **balanced**: The default preset. Front-loads intelligence in research and verification, inherits session model for planning and execution. Good general-purpose choice.
+- **budget**: All agents use haiku. Fastest and cheapest, but reduced quality for complex reasoning tasks. Suitable for well-defined, mechanical work.
+- **adaptive**: Front-loads intelligence in research and planning (where decisions matter most), uses lighter models for mechanical execution and verification. Good cost-quality tradeoff.
+---
+## Configuring Models
+### Per-Agent Configuration
+Set an individual agent's model via `/pbr:config`:
+```
+/pbr:config model executor sonnet
+/pbr:config model verifier opus
+```
+Or edit `config.json` directly:
+```json
+{
+  "models": {
+    "researcher": "sonnet",
+    "planner": "inherit",
+    "executor": "inherit",
+    "verifier": "sonnet",
+    "integration_checker": "sonnet",
+    "debugger": "inherit",
+    "mapper": "sonnet",
+    "synthesizer": "haiku"
+  }
+}
+```
+### Valid Model Values
+| Value | Meaning |
+|-------|---------|
+| `sonnet` | Claude Sonnet (4.5/4.6) -- balanced speed and capability |
+| `opus` | Claude Opus (4.6) -- highest capability, slower |
+| `haiku` | Claude Haiku (4.5) -- fastest, lower capability |
+| `inherit` | Use the session's primary model (whatever the user is running) |
+Note: Claude Code 2.1.45+ supports Sonnet 4.6. Model values are abstract names — Claude Code resolves them to the latest available version.
+---
+## Model Selection in Skill Orchestration
+Skills that spawn subagents use the `model` parameter in `Task()` calls. Some skills hardcode a lighter model for specific tasks:
+- **Build skill**: Spawns inline verifiers with `model: "haiku"` for quick spot-checks
+- **Build skill**: Spawns codebase mapper updates with `model: "haiku"` for incremental map refreshes
+- **Plan skill**: Uses the configured `planner` model for main planning work
+The `subagent_type` parameter automatically loads the agent definition, and the model from `config.json` takes precedence over the agent's default `model:` frontmatter field.

package/plugins/copilot-pbr/references/model-selection.md ADDED Viewed

@@ -0,0 +1,32 @@
+<!-- canonical: ../../pbr/references/model-selection.md -->
+# Model Selection Reference
+Plan-Build-Run uses adaptive model selection to balance cost and capability.
+## How It Works
+1. The planner annotates each task with `complexity="simple|medium|complex"`
+2. The build skill maps complexity to a model via `config.models.complexity_map`
+3. The executor agent is spawned with the selected model
+## Default Mapping
+| Complexity | Model | Rationale |
+|-----------|-------|-----------|
+| simple | haiku | Fast, cheap — sufficient for mechanical changes |
+| medium | sonnet | Good balance for standard feature work |
+| complex | inherit | Uses session model — typically Opus for critical work |
+## Override Mechanisms
+1. **Per-task override**: Add `model="sonnet"` attribute to task XML in PLAN.md
+2. **Config override**: Set `models.complexity_map` in config.json to change defaults
+3. **Agent-level override**: Set `models.executor` in config.json to force a single model for all executor spawns (disables adaptive selection)
+## Heuristics
+The planner uses these signals to determine complexity:
+- File count and types
+- Task name keywords
+- Dependency count
+- Whether the task introduces new patterns vs. follows established ones

package/plugins/copilot-pbr/references/pbr-rules.md ADDED Viewed

@@ -0,0 +1,194 @@
+<!-- canonical: ../../pbr/references/pbr-rules.md -->
+# Plan-Build-Run Rules
+Authoritative rules for all Plan-Build-Run skills, agents, hooks, and workflows.
+Condensed from the 3,100-line `docs/DEVELOPMENT-GUIDE.md`. When in doubt, these rules govern.
+---
+## Philosophy
+1. **Context is precious.** The orchestrator stays lean (~15% usage). Delegate heavy work to subagents.
+2. **State lives on disk.** Skills and agents communicate through `.planning/` files, not messages.
+3. **Agents are black boxes.** Clear input/output contracts. Agents never read other agent definitions.
+4. **Gates provide safety.** Users control pace via config toggles. Never skip a gate.
+5. **One task, one commit.** Atomic commits for clean history and easy rollback.
+6. **Cross-platform always.** `path.join()`, CommonJS, test on Windows/macOS/Linux.
+7. **Test everything.** 70% coverage minimum. All 9 CI matrix combinations must pass.
+8. **PLAN-BUILD-RUN branding only.** Always use `PLAN-BUILD-RUN ►` prefix in banners.
+---
+## Context Budget
+9. Target 15% orchestrator context usage. Warn user at 30%.
+10. **Never** read agent definitions (`agents/*.md`) in the orchestrator — `subagent_type` auto-loads them.
+11. **Never** inline large files into `Task()` prompts — tell agents to read files from disk.
+12. **Never** read full subagent output — read frontmatter only (first 10-20 lines).
+13. Read STATE.md and config.json fully. Read ROADMAP.md by section. Read PLAN.md/SUMMARY.md/VERIFICATION.md frontmatter only.
+14. Use the `limit` parameter on Read to restrict line counts.
+15. Proactively suggest `/pbr:pause` when context gets heavy — before compaction, not after.
+---
+## State Management
+16. STATE.md is the **single source of truth** for current position.
+17. **Never** infer current phase from directory listings, git log, or conversation history.
+18. Always read STATE.md before making state-dependent decisions.
+19. Update STATE.md at: begin, plan, build, review, pause, continue, milestone.
+20. **Never** commit STATE.md mid-skill — hooks handle session persistence.
+21. Agents write artifacts (SUMMARY.md, VERIFICATION.md). Only the orchestrator writes STATE.md.
+22. Every ROADMAP.md phase must have a matching `.planning/phases/` directory, and vice versa.
+23. config.json is validated against `scripts/config-schema.json` on load.
+---
+## Agent Spawning
+24. Always use `subagent_type: "pbr:{name}"` to spawn agents.
+25. Pass file paths to agents, not file contents.
+26. Agents never read other agent definition files.
+27. Each agent gets a fresh 200k token context window — let them do the heavy reading.
+---
+## User Interaction Patterns
+28. Use `AskUserQuestion` for all structured gate checks — never plain-text "Type approved" prompts.
+29. AskUserQuestion is an orchestrator-only tool. It **cannot** be called from subagents (Task contexts).
+30. Max 4 options per AskUserQuestion call. If more options exist, split into a 2-step flow.
+31. `header` field must be max 12 characters. Keep it a single word when possible.
+32. `multiSelect` is always `false` for Plan-Build-Run gate checks.
+33. Always handle the "Other" case — users may type freeform text instead of selecting an option.
+34. Reuse patterns from `skills/shared/gate-prompts.md` by name. Do not reinvent prompts inline.
+35. Do **not** use AskUserQuestion for freeform text input (symptom gathering, Socratic discussion, open-ended questions). Use plain conversation for those.
+36. Skills that do not require user interaction (continue, health, help, pause) intentionally omit AskUserQuestion from allowed-tools.
+---
+## Skill Authoring
+37. Every SKILL.md starts with YAML frontmatter: `name`, `description`, `allowed-tools`, `argument-hint`.
+38. Skills that spawn agents **must** have a `## Context Budget` section.
+39. Mark each step as `(inline)` or `(delegate)` — inline for light work, delegate for analysis.
+40. Reference templates and references by filename — never inline them.
+41. Gate checks: read config toggle → display summary → ask user → proceed/abort/revise.
+42. Use branded UI elements from `references/ui-formatting.md`.
+---
+## Agent Authoring
+43. Every agent file starts with YAML frontmatter: `name`, `description`, `model`, `memory`, `tools`.
+44. Agent name matches the agent file name (no prefix needed).
+45. `tools` array: only include tools the agent actually uses.
+46. Agents write artifacts to disk. They never modify STATE.md or ROADMAP.md.
+47. Agent definitions are self-contained — no cross-agent references.
+---
+## Hook Development
+48. All hooks use **CommonJS** (`require`), never ES modules (`import`).
+49. All hooks call `logHook()` from `hook-logger.js`. No silent exits.
+50. Exit codes: `0` = allow/success, `2` = block (PreToolUse only), other = error (logged, non-blocking).
+51. Hooks receive JSON on stdin, write JSON to stdout.
+52. Fail gracefully: wrap in try/catch, exit 0 on parse errors.
+53. Every hook in `hooks.json` must have a `statusMessage` field (gerund form: "Validating...", keep short).
+54. Windows file deletion: use 3-attempt retry loop to handle AV/indexer locks.
+55. Every hook script must have a corresponding `tests/{name}.test.js`.
+---
+## Commits
+56. Format: `{type}({scope}): {description}`.
+57. Valid types: `feat`, `fix`, `refactor`, `test`, `docs`, `chore`, `wip`.
+58. Valid scopes: `{NN}-{MM}` (phase-plan), `quick-{NNN}`, `planning`, or any lowercase word.
+59. **Never** use `git add .` or `git add -A` — stage specific files only.
+60. Blocked files: `.env` (not `.env.example`), `*.key`, `*.pem`, `*.pfx`, `*.p12`, `*credential*`, `*secret*` (unless in `tests/` or `*.example`).
+61. TDD tasks: exactly 3 commits — RED (test), GREEN (feat), REFACTOR.
+62. Executors: one atomic commit per task.
+---
+## Templates
+63. Default syntax: `{variable}` placeholders for string substitution.
+64. Only use EJS (`<%= %>`, `<% %>`) when loops or conditionals are genuinely needed.
+65. Template files use `.tmpl` extension. Name matches the output file (e.g., `SUMMARY.md.tmpl` → `SUMMARY.md`).
+66. Templates live in `templates/` (global), `templates/codebase/` (scan), `templates/research/` (research), or `skills/{name}/templates/` (skill-specific).
+---
+## File Naming
+67. Skills: `plugins/pbr/skills/{skill-name}/SKILL.md` — lowercase hyphenated dir, uppercase `SKILL.md`.
+68. Agents: `plugins/pbr/agents/{name}.md` — agent name matches file name.
+69. Scripts: `plugins/pbr/scripts/{name}.js` — lowercase, hyphenated, CommonJS.
+70. References: `plugins/pbr/references/{name}.md` — lowercase, hyphenated.
+71. Tests: `tests/{script-name}.test.js` — mirrors the script name.
+---
+## Testing
+72. Test files mirror script names: `scripts/foo.js` → `tests/foo.test.js`.
+73. Use the fixture project at `tests/fixtures/fake-project/` for integration tests.
+74. Hook tests: use stdin/stdout protocol with `execSync()`, timeout 5000ms, run in temp dir.
+75. Coverage target: 70% branches/functions/lines/statements.
+76. CI matrix: Node 18/20/22 × ubuntu/macos/windows. All 9 must pass.
+77. Local pre-push: `npm run lint && npm test && npm run validate`.
+---
+## Cross-Platform
+78. **Never** hardcode `/` or `\` — always `path.join()`.
+79. Use `\n` in string literals — Node.js and Git normalize line endings.
+80. In `hooks.json`, use `${CLAUDE_PLUGIN_ROOT}` — Claude Code expands this internally on all platforms.
+81. **Never** use `$CLAUDE_PLUGIN_ROOT` (shell expansion is platform-dependent).
+82. **Never** rely on execute bits — always invoke via `node script.js`.
+---
+## Dependencies & Gates
+83. Check all `depends_on` plans have SUMMARY.md files before executing.
+84. Checkpoint tasks (`checkpoint:human-verify`) require STOP — write partial SUMMARY.md, return metadata.
+85. Respect all `gates.confirm_*` config toggles. Never auto-proceed when a gate is enabled.
+---
+## Logging
+86. Hook scripts → `logHook()` (writes to `logs/hooks.jsonl`, max 200 entries).
+87. Workflow milestones → `logEvent()` (writes to `logs/events.jsonl`, max 1,000 entries).
+88. Rule of thumb: inside a hook? `logHook()`. Tracking a high-level event? `logEvent()`.
+---
+## Gotchas
+89. JSDoc `*/` in glob patterns closes Babel comments early — use `//` line comments instead.
+90. Regex anchors: use `\b` word boundaries, not `^` anchors (misses chained commands like `cd && git commit`).
+91. Windows cwd locking in tests: use `cwd` option in `execSync()` instead of `process.chdir()`.
+92. PLAN.md frontmatter must include all required fields: `phase`, `plan`, `type`, `wave`, `depends_on`, `files_modified`, `autonomous`, `must_haves`.
+---
+## Quick Reference: Never Do This
+| # | Anti-Pattern | Impact |
+|---|-------------|--------|
+| 1 | Read agent definitions in orchestrator | Context balloons 15% → 88% |
+| 2 | Inline large files into Task() prompts | Wasted orchestrator context |
+| 3 | Read full subagent output | Context bloat |
+| 4 | Use non-Plan-Build-Run branding | User confusion |
+| 5 | Hardcode path separators | Cross-platform breakage |
+| 6 | Use ES modules in hooks | Hook fails to load |
+| 7 | Skip hook logging | Silent failures |
+| 8 | Modify STATE.md in agents | Race conditions, corruption |
+| 9 | Use `git add .` | Sensitive data leaks |
+| 10 | Skip tests for new scripts | CI failures, regressions |

package/plugins/copilot-pbr/references/plan-authoring.md ADDED Viewed

@@ -0,0 +1,182 @@
+<!-- canonical: ../../pbr/references/plan-authoring.md -->
+# Plan Authoring Guide
+Quality guidelines for writing executable plans. Used by planner and plan-checker.
+---
+## Action Writing Guidelines
+The `<action>` element is the most important part of the plan. It must be specific enough that the executor agent can follow it mechanically without making design decisions.
+### Good Action
+```xml
+<action>
+1. Create file `src/auth/discord.ts`
+2. Import `OAuth2Client` from `discord-oauth2` package
+3. Export async function `authenticateWithDiscord(code: string): Promise<User>`
+   - Create OAuth2Client with env vars: DISCORD_CLIENT_ID, DISCORD_CLIENT_SECRET
+   - Exchange authorization code for access token
+   - Fetch user profile from Discord API: GET /api/users/@me
+   - Return User object with fields: id, username, avatar, email
+4. Export function `getDiscordAuthUrl(): string`
+   - Build OAuth2 authorization URL with scopes: identify, email
+   - Include redirect URI from env: DISCORD_REDIRECT_URI
+   - Return the URL string
+5. Add to `.env.example`:
+   DISCORD_CLIENT_ID=
+   DISCORD_CLIENT_SECRET=
+   DISCORD_REDIRECT_URI=http://localhost:3000/auth/callback
+</action>
+```
+### Bad Action
+```xml
+<action>
+Set up Discord OAuth authentication.
+</action>
+```
+### Action Rules
+1. **Number the steps** -- executor follows them in order
+2. **Name specific files** -- never say "create necessary files"
+3. **Name specific functions/exports** -- never say "implement the auth logic"
+4. **Include type signatures** -- when the project uses TypeScript
+5. **Reference existing code** -- when modifying files, say what to modify
+6. **Include code snippets** -- for complex patterns or configurations
+7. **Specify environment variables** -- with example values
+8. **Note error handling** -- only when it's a critical part of the task
+---
+## Verify Command Guidelines
+The `<verify>` element must contain commands that the executor can run to check the task is complete.
+### Good Verify
+```xml
+<verify>
+npx tsc --noEmit
+npm run test -- --grep "discord auth"
+ls -la src/auth/discord.ts
+</verify>
+```
+### Bad Verify
+```xml
+<verify>
+Check that authentication works.
+</verify>
+```
+### Verify Rules
+1. **Must be executable** -- actual shell commands, not descriptions
+2. **Must be deterministic** -- same result every time if code is correct
+3. **Prefer automated checks** -- type checking, tests, linting
+4. **Include existence checks** -- `ls` for created files
+5. **Include build checks** -- `npx tsc --noEmit` for TypeScript
+6. **Avoid interactive commands** -- no commands requiring user input
+---
+## Done Condition Guidelines
+The `<done>` element describes the observable outcome, not the implementation activity.
+### Good Done
+- "User can authenticate via Discord OAuth and is redirected to dashboard"
+- "Auth middleware rejects invalid tokens and passes valid tokens"
+- "All 12 unit tests pass for the auth module"
+### Bad Done
+- "Code was written"
+- "File was created"
+- "Feature is implemented"
+### Done Rules
+1. **Maps to a must-have** -- every done condition traces back to a frontmatter must-have
+2. **Observable** -- describes a state of the system, not an activity
+3. **Falsifiable** -- you can check whether it's true or false
+4. **User-oriented** -- prefer "user can..." over "code does..."
+---
+## Scope Limits
+### Per-Plan Limits
+| Constraint | Limit | Rationale |
+|-----------|-------|-----------|
+| Tasks per plan | **2-3** | Keeps plans atomic and recoverable |
+| Files per plan | **5-8** | Limits blast radius of failures |
+| Dependencies | **3 max** | Avoids deep dependency chains |
+### When to Split
+- More than 3 tasks? Split.
+- More than 8 files? Split.
+- Tasks in different functional areas? Split.
+- Some tasks need human checkpoints, others don't? Split into autonomous and checkpoint plans.
+### Split Signals
+| Signal | Action |
+|--------|--------|
+| >3 tasks needed | Split by subsystem -- one plan per subsystem |
+| Multiple unrelated subsystems | One plan per subsystem |
+| >5 files per task | Task is too big -- break it down |
+| Checkpoint + implementation in same plan | Separate the checkpoint into its own plan |
+| Discovery research + implementation | Separate plans -- research plan first |
+---
+## Discovery Levels
+When a plan requires research before execution, set the `discovery` field in plan frontmatter. Default is 1 for most plans.
+| Level | Name | Description | Executor Behavior |
+|-------|------|-------------|-------------------|
+| 0 | Skip | No research needed | Execute immediately |
+| 1 | Quick | Fast verification | Check official docs for 1-2 specific questions |
+| 2 | Standard | Normal research | Spawn researcher for phase research |
+| 3 | Deep | Extensive investigation | Full research cycle before execution |
+### Level 0 -- Skip
+**When to use**: Simple refactors, documentation updates, file renames, configuration tweaks, or any task where the implementation approach is unambiguous from the plan's `<action>` steps alone.
+### Level 1 -- Quick (default)
+**When to use**: Standard feature work where the technology is known but specific API signatures, config options, or version-specific behavior need a quick check.
+### Level 2 -- Standard
+**When to use**: Work involving unfamiliar libraries, new integration patterns, or approaches the executor hasn't seen in this codebase before.
+### Level 3 -- Deep
+**When to use**: High-risk or architecturally significant work where getting the approach wrong would require substantial rework.
+---
+## Dependency Graph Rules
+### File Conflict Detection
+Two plans CONFLICT if their `files_modified` lists overlap. Conflicting plans:
+- MUST be in different waves (cannot run in parallel)
+- MUST have explicit `depends_on` relationship
+- Later plan's `<action>` must reference what the earlier plan produces
+### Circular Dependencies
+**NEVER create circular dependencies.** If you detect a potential cycle, restructure the plans to break it. Common resolution: merge the circular plans into one, or extract the shared dependency into its own plan.
+---
+## Context Fidelity Checklist
+Before writing plan files, verify context compliance:
+1. **Locked decision coverage**: For each locked decision in CONTEXT.md, identify the task that implements it.
+2. **Deferred idea exclusion**: Scan all tasks -- no task should implement a deferred idea.
+3. **Discretion area handling**: Each "Claude's Discretion" item should be addressed in at least one task.