@sulhadin/orchestrator 3.0.0-beta.9 → 3.0.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (16)

package/README.md +17 -13
package/bin/build-template.js +9 -1
package/package.json +2 -2
package/template/.claude/agents/conductor.md +174 -16
package/template/.claude/agents/reviewer.md +1 -1
package/template/.claude/commands/orchestra/help.md +3 -1
package/template/.claude/commands/orchestra/rewind.md +60 -0
package/template/.claude/commands/orchestra/start.md +6 -2
package/template/.claude/commands/orchestra/status.md +2 -2
package/template/.claude/commands/orchestra/verifier.md +52 -0
package/template/.orchestra/README.md +93 -45
package/template/.orchestra/config.yml +9 -4
package/template/.orchestra/knowledge.md +1 -1
package/template/.orchestra/roles/product-manager.md +43 -11
package/template/CLAUDE.md +18 -2
package/bin/merge-config.test.js +0 -135

package/README.md CHANGED

@@ -4,7 +4,7 @@ AI team orchestration for [Claude Code](https://docs.anthropic.com/en/docs/claud
 ## What is Orchestra?
-Orchestra turns a single Claude Code session into a coordinated development team. A Product Manager plans features, a Conductor executes them — switching between specialized roles (backend, frontend, architect) automatically. Each role has strict boundaries, every commit passes verification, and the system learns from past milestones.
+Orchestra turns a single Claude Code session into a coordinated development team. A Product Manager plans features, a Conductor orchestrates them — delegating each phase to a sub-agent with the right role (backend, frontend, architect). Sub-agents own implementation and verification; conductor owns commits. Each role has strict boundaries, every commit passes verification, and the system learns from past milestones.
 No infrastructure. No API keys. Just markdown files and Claude Code.
@@ -23,12 +23,12 @@ Terminal 1 (PM):                    Terminal 2 (Conductor):
   /orchestra pm                       /orchestra start
   │                                   │
   ├─ Discuss features                 ├─ Scan milestones
-  ├─ Create milestones                ├─ Activate architect → RFC
-  ├─ Groom phases                     ├─ Activate backend → code + tests
-  │                                   ├─ Activate frontend → UI
+  ├─ Create milestones                ├─ Delegate to architect → RFC
+  ├─ Groom phases                     ├─ Delegate to backend → code + tests
+  │                                   ├─ Delegate to frontend → UI
   │  (plan M2 while M1 runs)          ├─ Call reviewer → code review
   │                                   ├─ Push → milestone done
-  │                                   └─ Loop → next milestone
+  │                                   └─ Stop (inline) or next milestone (agent)
 ```
 ## Quick Example
@@ -49,8 +49,7 @@ PM challenges scope, creates M1-user-auth with 3 phases
 ⚙️ backend → phase-2: API endpoints → committed
 🎨 frontend → phase-3: Login UI → committed
 🔍 reviewer → approved
-🚦 Push? → yes
-✅ M1-user-auth done. Checking for next milestone...
+✅ M1-user-auth done. Pushed to origin.
 ```
 ## Commands
@@ -63,6 +62,8 @@ PM challenges scope, creates M1-user-auth with 3 phases
 | `/orchestra start --auto` | Fully autonomous — warns once, then auto-push |
 | `/orchestra hotfix {desc}` | Ultra-fast fix: implement → verify → commit → push |
 | `/orchestra status` | Milestone status report (PM only) |
+| `/orchestra verifier [N]` | Verify milestones match PRD/RFC requirements (PM only) |
+| `/orchestra rewind [N]` | Review execution history: decisions, metrics, insights (PM only) |
 | `/orchestra blueprint {name}` | Generate milestones from template |
 | `/orchestra blueprint add` | Save current work as reusable template |
 | `/orchestra create-role` | Create a new role interactively (Orchestrator only) |
@@ -86,7 +87,7 @@ PM challenges scope, creates M1-user-auth with 3 phases
 │   ├── conductor.md                    ← Autonomous milestone executor
 │   └── reviewer.md                     ← Independent code review
 ├── skills/*.orchestra.md               ← 14 domain checklists
-├── rules/*.orchestra.md                ← 8 discipline rules
+├── rules/*.orchestra.md                ← Discipline rules (auto-loaded)
 └── commands/orchestra/                 ← /orchestra commands
 .orchestra/                             ← Project data + config
@@ -101,10 +102,11 @@ PM challenges scope, creates M1-user-auth with 3 phases
 **Config-driven pipeline** — `.orchestra/config.yml` controls everything: verification commands (customize for Go, Python, Rust), approval gates, thresholds, parallel execution. No hardcoded assumptions.
-**Three complexity levels** — PM sets per milestone:
-- `quick` → Engineer → Commit → Push (trivial changes)
-- `standard` → Engineer → Review → Push (typical features)
-- `full` → Architect → Engineer → Review → Push (complex work)
+**Four complexity levels with model tiering** — PM sets per phase:
+- `trivial` (haiku) → Config changes, version bumps
+- `quick` (sonnet) → Single-file fixes, simple CRUD
+- `standard` (sonnet) → Typical features (default)
+- `complex` (opus) → New subsystems, architectural changes
 **Verification gate** — Tests + lint must pass before every commit. Commands come from config. Fails 3 times → phase marked failed, escalated to user.
@@ -116,6 +118,8 @@ PM challenges scope, creates M1-user-auth with 3 phases
 **Role boundaries** — Enforced via `.claude/rules/`. PM cannot write code. Engineers cannot modify system files. Orchestrator cannot write features. Boundaries checked by file path, not by words.
+**Milestone isolation** — `inline` mode stops after each milestone (user compacts manually). `agent` mode spawns each milestone in its own sub-agent — context freed automatically, enabling 20+ milestones in a single `--auto` session.
 **Stuck detection** — Detects repeated failures, circular fixes, over-engineering. Tries different approach once, then escalates. Auto mode skips to next phase.
 ## Upgrading
@@ -135,7 +139,7 @@ Smart merge on upgrade:
 | Blueprints (your custom) | Preserved |
 | milestones/ | Untouched |
 | knowledge.md | Preserved |
-| config.yml | Preserved |
+| config.yml | Smart merged (user values preserved, new keys added) |
 ## Documentation
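
The upgrade table above now describes `config.yml` as "Smart merged (user values preserved, new keys added)". A minimal sketch of that rule follows, assuming both configs have already been parsed into plain objects. `smartMerge` and `isPlainObject` are hypothetical names, and the template value `parallel: "disabled"` is made up for illustration; this is not the package's actual merge code, which the diff does not show.

```javascript
// Hypothetical sketch of "user values preserved, new keys added".
// Assumes YAML has already been parsed into plain JavaScript objects.
function isPlainObject(value) {
  return value !== null && typeof value === "object" && !Array.isArray(value);
}

function smartMerge(userConfig, templateConfig) {
  const merged = { ...userConfig };
  for (const [key, templateValue] of Object.entries(templateConfig)) {
    if (!(key in merged)) {
      // Key is new in the template: add it.
      merged[key] = templateValue;
    } else if (isPlainObject(merged[key]) && isPlainObject(templateValue)) {
      // Both sides are nested sections: merge them recursively.
      merged[key] = smartMerge(merged[key], templateValue);
    }
    // Otherwise the user's existing value wins and is left untouched.
  }
  return merged;
}

// Illustrative data only; the key names come from the diff, the values are assumed.
const user = { pipeline: { parallel: "enabled" } };
const template = {
  pipeline: { parallel: "disabled", milestone_isolation: "inline" },
  thresholds: { milestone_lock_timeout: 120 },
};
const result = smartMerge(user, template);
console.log(JSON.stringify(result));
```

In this sketch the user's `pipeline.parallel` survives, while `pipeline.milestone_isolation` and `thresholds` are added from the template.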

package/bin/build-template.js CHANGED

@@ -6,9 +6,17 @@ const path = require("path");
 const rootDir = process.cwd();
 const templateDir = path.join(rootDir, "template");
+// Dev-only agents that should NOT be published to users
+const DEV_ONLY_AGENTS = new Set([
+  "codebase-deep-analyzer.md",
+  "orchestra-analyzer.md",
+  "orchestra-reviewer.md",
+  "repo-deep-analyzer.md",
+]);
 // System files to include in the template
 const SYSTEM_PATHS = [
-  { src: ".claude/agents", dest: ".claude/agents" },
+  { src: ".claude/agents", dest: ".claude/agents", filter: (f) => !DEV_ONLY_AGENTS.has(f) },
   { src: ".claude/commands/orchestra", dest: ".claude/commands/orchestra" },
   { src: ".claude/rules", dest: ".claude/rules", filter: (f) => f.endsWith(".orchestra.md") },
   { src: ".claude/skills", dest: ".claude/skills", filter: (f) => f.endsWith(".orchestra.md") },
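
Each `SYSTEM_PATHS` entry above can carry an optional `filter` predicate over file names. The loop that consumes these entries is not part of this diff, so the following is only a sketch of how such an entry might be applied. `selectFiles` is a hypothetical helper, and it takes a list of names instead of reading the disk so that it stays self-contained.

```javascript
// Copied from the diff: agents that should not be published.
const DEV_ONLY_AGENTS = new Set([
  "codebase-deep-analyzer.md",
  "orchestra-analyzer.md",
  "orchestra-reviewer.md",
  "repo-deep-analyzer.md",
]);

// Same shape as the updated SYSTEM_PATHS entry for .claude/agents.
const agentsEntry = {
  src: ".claude/agents",
  dest: ".claude/agents",
  filter: (f) => !DEV_ONLY_AGENTS.has(f),
};

// Hypothetical helper: keep every file when no filter is set,
// otherwise keep only the names the filter accepts.
function selectFiles(entry, fileNames) {
  if (typeof entry.filter !== "function") {
    return [...fileNames];
  }
  return fileNames.filter((name) => entry.filter(name));
}

const published = selectFiles(agentsEntry, [
  "conductor.md",
  "reviewer.md",
  "orchestra-analyzer.md",
]);
console.log(published); // [ 'conductor.md', 'reviewer.md' ]
```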

package/package.json CHANGED

@@ -1,10 +1,10 @@
 {
   "name": "@sulhadin/orchestrator",
-  "version": "3.0.0-beta.9",
+  "version": "3.0.0",
   "description": "AI Team Orchestration System — multi-role coordination for Claude Code",
   "bin": "bin/index.js",
   "scripts": {
-    "test": "node --test bin/**/*.test.js",
+    "test": "node --test test/**/*.test.js",
     "template": "node bin/build-template.js",
     "prepare": "husky"
   },

package/template/.claude/agents/conductor.md CHANGED

@@ -18,8 +18,10 @@ by delegating each phase to a sub-agent. You NEVER implement code yourself.
 When started:
-1. If `--auto`: print `Warning: Auto mode — all gates skipped, auto-push enabled.` and proceed.
+1. If `--auto`: print `Warning: Auto mode — RFC gate skipped, fully autonomous.` and proceed.
 2. Read `.orchestra/config.yml` for pipeline settings and thresholds.
+   - Read `pipeline.milestone_isolation` (default: `inline`).
+   - If `--auto` and `milestone_isolation: inline`: warn once: "Inline mode with --auto: conductor stops after each milestone. Consider `milestone_isolation: agent` for batch runs."
 3. Read `.orchestra/README.md` for orchestration rules.
 4. Read `.orchestra/knowledge.md` Active Knowledge section (skip Archive).
 5. Scan milestones:
@@ -136,10 +138,11 @@ that affect this phase. Omit for first phase.}
 ### 4. Process Sub-Agent Result (Conductor does this)
 - If **done** (verification passed):
-  1. Conductor commits → update phase status → `done`, update context.md
-  2. Store sub-agent ID for potential review fix cycle
+  1. Conductor commits
+  2. Update context.md: set phase `done`, add commit hash + files_changed, append decisions from notes
+  3. Store sub-agent ID for potential review fix cycle
 - If **failed** (verification failed after max retries):
-  1. Log in context.md: phase name, last error summary, retry count
+  1. Update context.md: set phase `failed`, add error summary + last-error + retry count
   2. Decide: retry with new sub-agent or escalate to user
 **Note:** Conductor owns commit only. Sub-agents own implementation + verification.
@@ -168,8 +171,8 @@ If config.yml `pipeline.parallel: enabled`:
 After all implementation phases (unless config says `review: skip`):
 1. Call reviewer agent (`.claude/agents/reviewer.md`) as sub-agent
 2. Reviewer reads git diff independently, applies checklist, returns verdict
-3. **approved** → push gate
-4. **approved-with-comments** → push gate, log comments in context.md
+3. **approved** → push immediately
+4. **approved-with-comments** → push immediately, log comments in context.md
 5. **changes-requested** → fix cycle:
    - Use SendMessage to continue the last phase's sub-agent with reviewer findings
      (if sub-agent no longer available, launch new sub-agent with findings + role)
@@ -179,16 +182,17 @@ After all implementation phases (unless config says `review: skip`):
 ## Approval Gates
 Read gate behavior from config.yml:
-- **Normal mode:** Ask user at configured gates (rfc_approval, push_approval).
+- **Normal mode:** Ask user at RFC gate (rfc_approval). Push is automatic after review passes.
 - **Auto mode:** Skip all gates. Print status but don't wait.
 ## Rejection Flow
 - **RFC Rejected:** Ask feedback → architect revises → re-submit (max config.yml `pipeline.max_rfc_rounds`).
-- **Push Rejected:** Ask feedback → create fix phase → re-submit.
 ## Milestone Completion
+### Inline Mode (default)
 After push:
 1. Update milestone.md `status: done`, remove `Locked-By`.
 2. Append 5-line retrospective to knowledge.md:
@@ -200,26 +204,180 @@ After push:
    - Review findings: {N blocking, N non-blocking} — {top issue}
    - Missing skill: {name or "none"}
    ```
+3. Proceed to "Next Milestone — Mode-Dependent Behavior" → Inline Mode.
+### Agent Mode
+Milestone agent handles push and returns structured result (see Milestone Agent Delegation).
+Conductor processes the return:
+1. Update milestone.md `status: done`, remove `Locked-By`.
+2. Append retro from milestone agent's return to knowledge.md.
+3. Proceed to "Next Milestone — Mode-Dependent Behavior" → Agent Mode.
+## Next Milestone — Mode-Dependent Behavior
+Behavior after milestone completion depends on `pipeline.milestone_isolation`:
+### Inline Mode (default)
+After push and retro:
+1. **STOP.** Print: "Milestone {id} complete and pushed."
+2. Do NOT loop to next milestone.
+### Agent Mode
+After milestone agent returns (retro already written in Milestone Completion above):
+1. Re-read knowledge.md Active section (may have new retros)
+2. Re-scan `.orchestra/milestones/` using Glob (PM may have created new ones)
+3. If pending → spawn next milestone agent
+4. If none → "All milestones complete. Waiting for new work from PM."
+Context stays lean because all phase-level context lived in the (now ended)
+milestone agent. Conductor only accumulates ~1-2k tokens per milestone
+(prompt + structured result).
+## Milestone Agent Delegation (Agent Mode Only)
+This section applies ONLY when config `pipeline.milestone_isolation: agent`.
+In agent mode, the conductor becomes a two-tier dispatcher:
+- Conductor spawns one milestone agent per milestone
+- Milestone agent spawns phase sub-agents (same as current phase delegation)
+- When milestone agent completes, its context is freed entirely
+### Milestone Agent Prompt Template
+```
+You are a Milestone Agent executing milestone {milestone_id}: {title}.
+Rules from `.claude/rules/*.orchestra.md` are automatically loaded.
+**Config:**
+{config_yml_content}
+**Orchestration Rules:**
+{readme_content}
+**Active Knowledge:**
+{knowledge_active_section}
+**Milestone:**
+{milestone.md content}
+**Grooming:**
+{grooming.md content}
+**Context (if resuming):**
+{context.md content}
+Sections: `## Status` (milestone state), `## Phases` (per-phase status — skip `done` phases),
+`## Decisions` (cross-phase context), `## Metrics` (duration + retries per phase).
-## Next Milestone
+**Phase files:**
+{all phase file contents, in order}
-After completion:
-- Re-scan `.orchestra/milestones/` using Glob (PM may have created new ones)
-- If found → start next milestone
-- If none → "All milestones complete. Waiting for new work from PM."
+**Role files (unique, one per role used in phases):**
+{role file contents — deduplicated}
+**Skills (unique, one per skill used in phases):**
+{skill file contents — deduplicated}
+## Your Task
+Execute this milestone using the Phase Execution protocol:
+1. For each phase: pre-flight → compose prompt → delegate to phase sub-agent → process result
+2. Conductor (you) commits after each successful phase, updates context.md
+3. After all phases: trigger review (unless config says skip)
+4. After review passes: push to origin
+5. On phase failure after max retries: set phase to `failed`, log in context.md
+   - If stuck: set milestone status to `failed`, return immediately
+6. You own exactly ONE milestone — do NOT loop to other milestones
+## Return Format
+- status: done | failed
+- phases_completed: [list of phase names]
+- phases_failed: [list with error summaries]
+- review_verdict: approved | approved-with-comments | changes-requested | skipped
+- pushed: true | false
+- retro: |
+    ## Retro: {id} — {title} ({date})
+    - Longest phase: {name} (~{duration}) — {why}
+    - Verification retries: {count} — {which phases}
+    - Stuck: {yes/no} — {root cause if yes}
+    - Review findings: {N blocking, N non-blocking} — {top issue}
+    - Missing skill: {name or "none"}
+- notes: {anything conductor should know for subsequent milestones}
+IMPORTANT: Return retro text in your result. Do NOT write to knowledge.md — conductor handles this.
+```
+### Processing Milestone Agent Result
+Conductor processes the return:
+- **status: done + pushed: true** → Write retro to knowledge.md, update milestone.md status to `done`, remove `Locked-By`, proceed to next milestone.
+- **status: failed** → Log failure to context.md, write partial retro to knowledge.md.
+  - `--auto` mode: move to next milestone.
+  - Normal mode: stop and report to user with options: (a) retry with fresh agent, (b) skip, (c) stop.
+- **status: done + pushed: false** → Log error, escalate to user.
+### Milestone Agent Configuration
+- Use default (general-purpose) subagent_type — milestone identity is in the prompt
+- Do NOT use `isolation: "worktree"` — milestones run sequentially, not in parallel
+- Milestone agent inherits all conductor capabilities: git, Agent tool, file access
+- On resume (milestone was `in-progress`): include context.md in prompt — milestone agent reads phase statuses and continues from last completed phase
 ## Context Persistence
-Update context.md at: phase start, phase completion (with sub-agent summary), errors.
-On resume: read context.md, continue from last completed phase.
+context.md uses a fixed structure. Conductor updates it at phase start, completion, and on errors.
+### context.md Format
+```markdown
+## Status
+milestone: {milestone-id}
+started: {YYYY-MM-DD}
+pipeline: {quick | standard | full}
+## Phases
+- phase-1: {done | in-progress | failed | pending} | commit: {hash} | files: {changed files}
+- phase-2: {status} | error: {error summary, retry count} | last-error: {specific error}
+- phase-3: pending
+...
+## Codebase Map
+{path — one-line description, generated by scout sub-agent}
+## Decisions
+- phase-1: {key decision or trade-off made during implementation}
+- phase-2: {why a specific approach was chosen}
+## Metrics
+- phase-1: duration: ~{N}min | verification_retries: {N}
+- phase-2: duration: ~{N}min | verification_retries: {N}
+```
+### Update Rules
+- **Phase start:** Set phase status to `in-progress`
+- **Phase done:** Set status to `done`, add commit hash and files_changed from sub-agent result
+- **Phase failed:** Set status to `failed`, add error summary and last-error
+- **Decisions:** Append key decisions from sub-agent's `notes` field — only non-obvious choices that affect later phases
+- **Metrics:** Record approximate phase duration and verification_retries from sub-agent result
+- **Milestone complete:** Retro is written to knowledge.md (see Milestone Completion)
+### On Resume
+Read context.md → skip phases marked `done` → resume from first non-done phase.
+`## Decisions` from completed phases are included in "previous phase summary" for the next sub-agent — this preserves cross-phase context even after session restart.
 ## Hotfix Pipeline
+Hotfix always runs inline regardless of `milestone_isolation` setting — single-phase fast path, sub-agent isolation adds no value.
 When user types `/orchestra hotfix {description}`:
 1. Auto-create hotfix milestone + single phase
 2. Launch implementation sub-agent (model: standard) — implements, verifies, reports
 3. If done → conductor commits → push immediately (no RFC, no review, no gates)
-5. Append one-liner to knowledge.md
+4. Append one-liner to knowledge.md
 6. Return to normal execution if active
 ## What Conductor Does NOT Do
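
The "On Resume" rule in this file (skip phases marked `done`, resume from the first non-done phase) can be illustrated with a small parser over the `## Phases` section of context.md. This is a sketch under assumptions: the conductor is a prompt-driven agent, not a script, and `parsePhases` and `resumePoint` are hypothetical names. The sample file content is invented to match the format shown in the diff.

```javascript
// Hypothetical sketch: extract phase statuses from the "## Phases" section.
function parsePhases(contextMd) {
  const phases = [];
  let inPhases = false;
  for (const line of contextMd.split("\n")) {
    if (line.startsWith("## ")) {
      // Track whether we are inside the Phases section.
      inPhases = line.trim() === "## Phases";
      continue;
    }
    if (!inPhases) continue;
    // Lines look like: "- phase-1: done | commit: abc123 | files: ..."
    const match = line.match(/^- (phase-\d+):\s*([a-z-]+)/);
    if (match) phases.push({ name: match[1], status: match[2] });
  }
  return phases;
}

// Resume from the first phase whose status is not "done".
function resumePoint(contextMd) {
  const next = parsePhases(contextMd).find((p) => p.status !== "done");
  return next ? next.name : null;
}

// Invented sample following the context.md format from the diff.
const sample = [
  "## Status",
  "milestone: M1-user-auth",
  "",
  "## Phases",
  "- phase-1: done | commit: abc123 | files: src/auth.js",
  "- phase-2: failed | error: tests failing, 3 retries | last-error: timeout",
  "- phase-3: pending",
  "",
  "## Decisions",
  "- phase-1: chose JWT over sessions",
].join("\n");

console.log(resumePoint(sample)); // phase-2
```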

package/template/.claude/agents/reviewer.md CHANGED

@@ -10,7 +10,7 @@ Review code independently. No implementation context by design — only the code
 ## Process
-1. Read context.md for objectives and acceptance criteria
+1. Read milestone.md for objectives, phase files for acceptance criteria, context.md for codebase map and decisions
 2. Read RFC if exists
 3. `git log origin/{branch}..HEAD` + `git diff origin/{branch}...HEAD`
 4. Detect mode from diff: backend / frontend / both → apply relevant checklist

package/template/.claude/commands/orchestra/help.md CHANGED

@@ -11,6 +11,8 @@ COMMANDS:
   /orchestra start --auto    Fully autonomous (warns once, then auto-push)
   /orchestra hotfix {desc}   Ultra-fast fix: implement → verify → commit → push
   /orchestra status          Milestone status report (PM only)
+  /orchestra verifier [N]    Verify milestones match requirements (PM only)
+  /orchestra rewind [N]      Review milestone execution history (PM only)
   /orchestra help            Show this help
   /orchestra blueprint {name}  Generate milestones from template (PM only)
   /orchestra blueprint add   Save current work as blueprint (PM only)
@@ -41,7 +43,7 @@ FILES:
   .claude/skills/*.orchestra.md    Domain checklists (auth, CRUD, deploy, etc.)
   .claude/rules/*.orchestra.md     Discipline rules (verification, commit format, etc.)
   .claude/commands/orchestra/      Orchestra commands
-  .orchestra/roles/                Role identities (slim, 15 lines each)
+  .orchestra/roles/                Role identities (one file per role)
   .orchestra/config.yml            Pipeline configuration
   .orchestra/blueprints/           Project/component templates
   .orchestra/knowledge.md          Append-only project knowledge

package/template/.claude/commands/orchestra/rewind.md ADDED

@@ -0,0 +1,60 @@
+Review milestone execution history for actionable insights. PM role only.
+**Usage:**
+- `/orchestra rewind` — rewind all `done` milestones
+- `/orchestra rewind 1,2,3` — rewind only specified milestone numbers
+1. Read `.orchestra/roles/product-manager.md` to activate PM.
+2. Scan `.orchestra/milestones/` — collect milestones to review:
+   - No arguments: all milestones with `status: done`
+   - With numbers: only milestones matching those numbers (e.g., `1` matches `M1-*`)
+3. For each milestone, read execution artifacts:
+   - `context.md` — structured sections:
+     - `## Decisions` — key choices made during implementation
+     - `## Metrics` — phase duration and verification retries
+     - `## Phases` — status, commits, errors per phase
+   - `knowledge.md` — retro entry for this milestone
+   - `grooming.md` — original scope vs what actually happened
+   - Review verdict and comments (from context.md or git log)
+4. Extract and present — focus on **what the user needs to know**, not execution mechanics:
+```
+## Rewind: M1-user-auth
+### Key Decisions Made During Execution
+- phase-1: Used Stripe SDK v4 instead of raw API (architect RFC recommendation)
+- phase-2: Split webhook handler into separate file for testability
+- phase-3: Chose CSS modules over Tailwind (frontend preference)
+### Performance
+- Total phases: 5 | Completed: 5 | Failed: 0
+- Longest phase: phase-3 (~12min) — complex UI with form validation
+- Verification retries: 3 total (phase-2: 2, phase-4: 1)
+- Stuck: No
+### Review Findings
+- Verdict: approved-with-comments
+- Comments:
+  - "Consider adding index on user_email for login query" (non-blocking)
+  - "Error messages expose internal details" (non-blocking, logged)
+### Scope Changes
+- Original grooming planned 4 phases, executed 5 (phase-3 was split during implementation)
+- phase-2 scope expanded: webhook handler was not in original PRD, added during RFC
+### Unresolved Items
+- 🔧 DB index on user_email — reviewer flagged, not addressed
+- 🔧 Error message sanitization — reviewer flagged, not addressed
+- 🔧 phase-2 workaround: hardcoded timeout — flagged as tech debt in Decisions
+### What We Learned
+- 📝 Webhook handler pattern — reusable for future integrations
+- ⏱️ Form validation phases consistently slow — consider a form-validation skill
+- 💡 Splitting phase-3 mid-execution worked well — complex UI benefits from smaller phases
+```
+5. After all milestones, present a cross-milestone summary:
+   - **Unresolved items** — review comments and flagged workarounds never addressed, across all milestones
+   - **Recurring patterns** — same review comments, same slow phase types, same failure modes
+   - **Skill gaps** — missing skills that would have helped
+   - **Strategic suggestions** — new skills to create, process improvements, items to fix in upcoming work
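
Step 2 of this command selects milestones either by `status: done` (no arguments) or by number (`1` matches `M1-*`). A sketch of that selection rule follows, assuming the milestone directories have already been scanned into objects. `selectMilestones` and the sample data are hypothetical. The number is anchored so that `1` does not also pick up a directory such as `M10-search`, which is an interpretation of the rule, not something the command file states.

```javascript
// Hypothetical sketch of the milestone selection in step 2.
// Each milestone is { dir: "M1-user-auth", status: "done" }.
function selectMilestones(milestones, arg) {
  if (!arg || arg.trim() === "") {
    // No arguments: all milestones with status "done".
    return milestones.filter((m) => m.status === "done");
  }
  // "1,2,3" -> Set {1, 2, 3}; anything non-numeric is ignored.
  const wanted = new Set(
    arg
      .split(",")
      .map((s) => s.trim())
      .filter((s) => /^\d+$/.test(s))
      .map(Number)
  );
  return milestones.filter((m) => {
    // Match the full number between "M" and the first "-".
    const match = m.dir.match(/^M(\d+)-/);
    return match !== null && wanted.has(Number(match[1]));
  });
}

// Invented sample data.
const milestones = [
  { dir: "M1-user-auth", status: "done" },
  { dir: "M2-billing", status: "in-progress" },
  { dir: "M10-search", status: "done" },
];

console.log(selectMilestones(milestones, "1").map((m) => m.dir)); // [ 'M1-user-auth' ]
console.log(selectMilestones(milestones, "").map((m) => m.dir)); // [ 'M1-user-auth', 'M10-search' ]
```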

package/template/.claude/commands/orchestra/start.md CHANGED

@@ -7,7 +7,11 @@ The conductor will:
 2. Execute phases sequentially (or parallel if configured)
 3. Activate roles, load skills, implement code
 4. Trigger code review via reviewer agent
-5. Push after approval (or auto-push in --auto mode)
-6. Loop to next milestone until all complete
+5. Push automatically after review passes
+6. Behavior after milestone: stop (inline mode) or continue to next (agent mode)
+Config `pipeline.milestone_isolation` controls post-milestone behavior:
+- `inline` (default): stops after each milestone. User compacts and restarts.
+- `agent`: spawns each milestone in sub-agent. Loops automatically. Best with `--auto`.
 Pass `--auto` flag for fully autonomous mode (warns once, then skips all gates).

package/template/.claude/commands/orchestra/status.md CHANGED

@@ -5,7 +5,7 @@ Show full milestone status report. PM role only.
 3. For active milestones, read context.md for progress details.
 4. Report:
    - All milestones with status, current phase, next action
-   - Phase details for active milestone (role, status, cost tracking)
+   - Phase details for active milestone (role, status, metrics)
    - Git status (branch, unpushed commits)
-   - Cost summary (from context.md)
+   - Metrics summary from context.md `## Metrics` section (duration + retries per phase)
    - Actions needed (specific next steps)

package/template/.claude/commands/orchestra/verifier.md ADDED

@@ -0,0 +1,52 @@
+Verify that implemented milestones match their requirements. PM role only.
+**Usage:**
+- `/orchestra verifier` — verify all `done` milestones
+- `/orchestra verifier 1,2,3` — verify only specified milestone numbers
+1. Read `.orchestra/roles/product-manager.md` to activate PM.
+2. Scan `.orchestra/milestones/` — collect milestones to verify:
+   - No arguments: all milestones with `status: done`
+   - With numbers: only milestones matching those numbers (e.g., `1` matches `M1-*`)
+3. For each milestone, read:
+   - `prd.md` — product requirements and acceptance criteria
+   - `rfc.md` — technical design decisions (if exists)
+   - `milestone.md` — summary and acceptance criteria
+   - `grooming.md` — scope decisions and phase breakdown
+   - All `phases/*.md` — phase acceptance criteria
+4. For each milestone, read execution context:
+   - `context.md` — `## Decisions` section (why specific approaches were chosen)
+   - `context.md` — `## Phases` section (which phases completed, which failed)
+5. For each milestone, read the actual implementation:
+   - Run `git log --oneline` filtered to commits from that milestone's phases
+   - Run `git diff` for those commits to see what changed
+   - Read the current state of modified files — diff shows changes, but current code shows completeness
+6. Compare requirements vs implementation. For each requirement/acceptance criterion:
+   - **met** — implementation satisfies the requirement
+   - **partial** — partially implemented, missing aspects noted
+   - **missed** — not implemented at all
+   - **deviated** — implemented differently than specified
+7. Report:
+```
+## Verification: M1-user-auth
+### Requirements Coverage
+- ✅ met: JWT authentication endpoint (phase-1, commit abc123)
+- ⚠️ partial: Rate limiting — implemented but no Redis backing (phase-2)
+- ❌ missed: Password reset flow — not in any commit
+- 🔀 deviated: Token refresh — RFC said rotating tokens, implemented static expiry
+### Summary
+4 requirements: 1 met, 1 partial, 1 missed, 1 deviated
+### Severity
+- 🔴 critical: Password reset flow missing (core auth feature)
+- 🟡 moderate: Rate limiting without Redis (works but won't scale)
+- 🟡 moderate: Token refresh deviation (security concern)
+```
+8. After reporting all milestones, if there are critical or moderate gaps:
+   - List gaps grouped by severity
+   - Suggest: "Use `/orchestra pm` to plan fix milestones for these gaps."
+   - Do NOT create milestones directly — PM decides scope and priority
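
The `### Summary` line in the report template above is a tally over the four verdicts (met, partial, missed, deviated). A sketch of how such a line could be produced from a list of findings follows. `summarize` and the `findings` shape are hypothetical, and the sample data mirrors the example report in the command file.

```javascript
// The four verdicts defined in the comparison step of the command.
const STATUSES = ["met", "partial", "missed", "deviated"];

// Hypothetical helper: count findings per verdict and format the summary line.
function summarize(findings) {
  const counts = Object.fromEntries(STATUSES.map((s) => [s, 0]));
  for (const finding of findings) {
    if (!STATUSES.includes(finding.status)) {
      throw new Error(`Unknown status: ${finding.status}`);
    }
    counts[finding.status] += 1;
  }
  const parts = STATUSES.map((s) => `${counts[s]} ${s}`).join(", ");
  return `${findings.length} requirements: ${parts}`;
}

// Sample findings taken from the example report.
const findings = [
  { requirement: "JWT authentication endpoint", status: "met" },
  { requirement: "Rate limiting", status: "partial" },
  { requirement: "Password reset flow", status: "missed" },
  { requirement: "Token refresh", status: "deviated" },
];

console.log(summarize(findings));
// 4 requirements: 1 met, 1 partial, 1 missed, 1 deviated
```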

package/template/.orchestra/README.md CHANGED

@@ -10,14 +10,13 @@ Terminal 1 (PM):                    Terminal 2 (Conductor):
   /orchestra pm                      /orchestra start
   │                                  │
   ├─ Discuss features with user      ├─ Scan milestones
-  ├─ Create milestones               ├─ 🏗️ architect → RFC
+  ├─ Create milestones               ├─ 🏗️ delegate to architect → RFC
   ├─ Groom phases                    ├─ 🚦 User approves RFC
-  ├─ Always available                ├─ ⚙️ backend → phase by phase
-  │                                  ├─ 🎨 frontend → phase by phase
+  ├─ Always available                ├─ ⚙️ delegate to backend → phase by phase
+  │                                  ├─ 🎨 delegate to frontend → phase by phase
   │  (can plan M2 while M1 runs)     ├─ 🔍 reviewer → review commits
-  │                                  ├─ 🚦 User approves push
   │                                  ├─ git push → milestone done
-  │                                  └─ Loop → next milestone
+  │                                  └─ Stop (inline) or next milestone (agent)
 ```
 ## Directory Structure
@@ -25,7 +24,7 @@ Terminal 1 (PM):                    Terminal 2 (Conductor):
 ```
 .orchestra/
 ├── README.md              # This file
-├── roles/                 # Role identities (slim, ~15 lines each)
+├── roles/                 # Role identities (one file per role)
 │   ├── product-manager.md
 │   ├── architect.md
 │   ├── backend-engineer.md
@@ -56,8 +55,10 @@ You can plan new milestones while the conductor is executing another one.
 ### Terminal 2: `/orchestra start` (Execution)
-Conductor reads milestones, executes phases autonomously. Activates roles per phase.
-Loops to the next milestone when done. Maintains `context.md` for resume capability.
+Conductor reads milestones, delegates each phase to a sub-agent with the right role.
+Sub-agents implement + verify; conductor commits. After milestone completion, behavior
+depends on `milestone_isolation` config: stops (inline) or continues to next (agent).
+Maintains `context.md` for resume capability.
 ```
 /orchestra start
@@ -81,7 +82,6 @@ PM discusses feature with user
   → Conductor executes frontend phases (sequential, each → commit)
   → Conductor calls reviewer agent (reviews unpushed commits)
   → FIX cycle if changes-requested (re-review if fix >= 30 lines)
-  → [USER APPROVAL GATE: Push to origin]
   → Conductor pushes, PM verifies acceptance criteria, closes milestone
   → Conductor appends 5-line retrospective to knowledge.md
@@ -94,19 +94,45 @@ Hotfix (production bugs):
 ### Milestone Lock
 Conductor claims a milestone by writing `Locked-By: {timestamp}` to milestone.md before execution.
-Other conductors skip locked milestones. Lock expires after 2 hours (stale protection).
+Other conductors skip locked milestones. Lock expires after config.yml `thresholds.milestone_lock_timeout` minutes (default 120).
 ### Pipeline Modes (Complexity)
-PM sets a `Complexity` level on each milestone that determines the pipeline:
+PM sets `Complexity` on milestone (pipeline) and `complexity` on each phase (model selection):
-| Complexity | Pipeline | Use when |
-|------------|----------|----------|
-| `quick` | Engineer → Commit → Push | Config tweaks, copy changes, trivial fixes |
-| `standard` | Engineer → Review → Push | Typical features, clear requirements |
-| `full` | Architect → Engineer → Review → Push | Complex features, new subsystems |
+| Complexity | Model | Pipeline | Use when |
+|------------|-------|----------|----------|
+| `trivial` | Haiku | Phases → Commit → Push | Version bumps, env vars, config changes |
+| `quick` | Sonnet | Phases → Commit → Push (skip review) | Single-file fixes, simple CRUD |
+| `standard` | Sonnet | Phases → Review → Push | Typical features, clear requirements |
+| `complex` | Opus | Architect → Phases → Review → Push | New subsystems, unfamiliar territory |
-Default is `full` if not specified. Conductor reads the `Complexity` field from `milestone.md`.
+Defaults: config.yml `pipeline.default_pipeline` and `pipeline.default_complexity`.
+### Milestone Isolation
+Config `pipeline.milestone_isolation` controls how the conductor handles multiple milestones:
+| Mode | Behavior | Best for |
+|------|----------|----------|
+| `inline` (default) | Conductor runs milestone directly, **stops** after completion. User runs `/compact` then `/orchestra start` for next milestone. | Manual sessions, PC-based work |
+| `agent` | Conductor spawns a sub-agent per milestone. Context freed automatically after each. Loops to next milestone. | `--auto` overnight batch runs |
+```
+Inline mode:                          Agent mode:
+  /orchestra start                      /orchestra start --auto
+  → M1 executes → done → STOP          → Spawn Agent(M1) → done → freed
+  user: /compact                        → Spawn Agent(M2) → done → freed
+  /orchestra start                      → Spawn Agent(M3) → done → freed
+  → M2 executes → done → STOP          → All done
+```
+In agent mode, the delegation is two-tier:
+```
+Conductor (lean dispatcher)
+  └── Milestone Agent (fresh context)
+        └── Phase Agent (unchanged)
+```
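Reviewer note: the difference between the two isolation modes comes down to whether the conductor loops. The sketch below is an assumption-laden illustration. `runMilestone` is a hypothetical callback standing in for either direct execution (inline) or a spawned milestone agent (agent). No real Claude Code API is used.

```javascript
// Hypothetical dispatcher: returns the milestones that were run in this session.
// inline: run one milestone, then stop so the user can compact and restart.
// agent:  run every pending milestone, one after another.
function dispatch(milestones, isolation, runMilestone) {
  const completed = [];
  for (const milestone of milestones) {
    runMilestone(milestone);
    completed.push(milestone);
    if (isolation === "inline") break; // STOP after the first milestone
  }
  return completed;
}
```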
 ### Milestone Statuses
@@ -142,8 +168,8 @@ Within each domain (backend/frontend), phases run in order: phase-1 → phase-2
 **Parallel execution:** If PM sets `depends_on` in phase frontmatter, independent phases
 can run in parallel via subagent worktree isolation. No `depends_on` = sequential (default).
-**Verification Gate:** Before every commit, conductor MUST pass type check + tests + lint
-(commands from config.yml). Commit is blocked until all checks pass.
+**Verification Gate:** Sub-agents run typecheck + tests + lint (from config.yml) before reporting.
+Conductor NEVER commits unless verification passes.
 ---
@@ -151,7 +177,8 @@ can run in parallel via subagent worktree isolation. No `depends_on` = sequentia
 - Each phase completion → **one conventional commit** on the current branch
 - No branch creation or switching — work happens on whatever branch is checked out
-- Milestone completion → **push to origin** (after user approval)
+- Milestone completion → **push to origin** (automatic after review passes)
+- Commits stay local until milestone fully completes — no partial push on failure
 - Reviewer reviews unpushed commits: `git log origin/{branch}..HEAD`
 - Clean git history: each commit maps to a phase
@@ -185,16 +212,14 @@ Rules:
 The user must approve before these transitions:
 - **Milestone creation** — PM discusses and plans, but must get user approval before creating the milestone directory and files
-- **RFC → Implementation** — user reviews architect's RFC
-- **Push to origin** — user approves the final changeset
+- **RFC → Implementation** — user reviews architect's RFC (if `rfc_approval` is not `skip`)
-All other transitions are automatic.
+Push is automatic after review passes. All other transitions are automatic.
 ### Rejection Handling
 If the user says **no** at any gate:
-- **RFC rejected** → Architect revises based on feedback, re-submits (max 3 rounds)
-- **Push rejected** → Conductor creates fix phase, implements, re-submits push gate
+- **RFC rejected** → Architect revises based on feedback, re-submits (up to config `pipeline.max_rfc_rounds` rounds)
 - **Milestone rejected** → PM revises in PM terminal
 Rejections are normal. The system does not stall — it loops back with feedback.
@@ -213,12 +238,12 @@ Conductor calls reviewer agent
   → Returns: approved / approved-with-comments / changes-requested
 ```
-**If approved** → proceed to push gate.
+**If approved** → push immediately.
-**If approved-with-comments** → proceed to push gate. Comments are logged in context.md.
+**If approved-with-comments** → push immediately. Comments are logged in context.md.
-**If changes-requested** → Conductor switches to the relevant role, fixes
-and commits. Re-review triggered if fix >= config `re_review_lines` threshold.
+**If changes-requested** → Conductor continues the phase's sub-agent via SendMessage with
+reviewer findings. Re-review triggered if fix >= config `re_review_lines` threshold.
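Reviewer note: the three verdicts and the re-review threshold can be sketched as two small functions. The verdict strings come from the README text. The function names are hypothetical, and the default of 30 lines is taken from the "Re-review Threshold" entry in knowledge.md.

```javascript
// Hypothetical helper: map a reviewer verdict to the conductor's next step.
function nextStep(verdict) {
  switch (verdict) {
    case "approved":
    case "approved-with-comments":
      return "push";
    case "changes-requested":
      return "fix";
    default:
      throw new Error(`Unknown verdict: ${verdict}`);
  }
}

// Hypothetical helper: a fix triggers re-review when it reaches the
// config `re_review_lines` threshold (fix >= threshold).
function needsReReview(fixLines, reReviewLines = 30) {
  return fixLines >= reReviewLines;
}
```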
 ---
@@ -283,16 +308,21 @@ PM and conductor run in **separate terminals**. They communicate through milesto
 ### Context Persistence
-Conductor maintains `context.md` in each milestone directory. This allows:
-- Resume after terminal close/reopen
-- Track decisions made during implementation
-- Record what was committed in each phase
+Conductor maintains `context.md` in each milestone directory with a fixed structure:
+- `## Status` — milestone id, start date, pipeline type
+- `## Phases` — per-phase status, commit hash, files changed, errors
+- `## Codebase Map` — scout-generated file map (survives milestone clear)
+- `## Decisions` — key choices from each phase that affect later phases
+- `## Metrics` — phase duration and verification retries (used by `/orchestra status`)
+This enables resume after terminal close/reopen. On restart, conductor reads context.md and skips completed phases.
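Reviewer note: because `context.md` has fixed `##` sections, a resume step could start by splitting the file on those headings. The sketch below only does that split. The contents of each section are not specified in this diff, so nothing inside a section is parsed, and `splitSections` is a hypothetical name.

```javascript
// Hypothetical helper: split a context.md string into { heading: body } pairs,
// keyed by the text of each level-2 heading ("## Status", "## Phases", ...).
function splitSections(markdown) {
  const collected = {};
  let current = null;
  for (const line of markdown.split("\n")) {
    const heading = line.match(/^## (.+)$/);
    if (heading) {
      current = heading[1].trim();
      collected[current] = [];
    } else if (current !== null) {
      collected[current].push(line);
    }
  }
  const sections = {};
  for (const [name, lines] of Object.entries(collected)) {
    sections[name] = lines.join("\n").trim();
  }
  return sections;
}
```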
 ### Approval Gates (Conductor Terminal)
-Conductor asks the user directly (not PM) at these points:
-1. **RFC ready** — "Approve RFC to start implementation?"
-2. **Push to origin** — "All done. Push to origin?"
+Conductor asks the user directly (not PM) at this point:
+1. **RFC ready** — "Approve RFC to start implementation?" (if `rfc_approval` is not `skip`)
+Push is automatic after review passes — no approval needed.
 ---
@@ -330,16 +360,18 @@ sequenceDiagram
         C->>C: Fix → commit
     end
-    C->>U: Push to origin?
-    U->>C: Yes
     C->>C: git push → milestone done
-    C->>C: Next milestone? → loop or done
+    alt Inline mode (default)
+        C->>C: STOP — user compacts and restarts
+    else Agent mode
+        C->>C: Next milestone? → loop or done
+    end
     Note over PM: PM is free the entire time<br/>Can plan M2 while M1 executes
 ```
-### 2. Conductor Execution Loop
+### 2. Conductor Execution Loop (Inline Mode)
 ```mermaid
 sequenceDiagram
@@ -354,11 +386,27 @@ sequenceDiagram
     C->>C: reviewer → approved
     C->>C: Push → M1 done
-    C->>C: Start M2
-    C->>C: architect → RFC
-    C->>C: backend phase-1
-    C->>C: reviewer → approved
-    C->>C: Push → M2 done
+    Note over C: STOP. "Run /compact or /clear then /orchestra start"
+```
+### 3. Conductor Execution Loop (Agent Mode)
+```mermaid
+sequenceDiagram
+    participant C as Conductor
+    participant MA as Milestone Agent
+    C->>C: Scan milestones/
+    C->>MA: Spawn Agent(M1)
+    MA->>MA: phase-1 → phase-2 → review → push
+    MA-->>C: {status: done, retro: ...}
+    Note over C: Write retro, ~1-2k tokens retained
+    C->>MA: Spawn Agent(M2)
+    MA->>MA: phase-1 → phase-2 → review → push
+    MA-->>C: {status: done, retro: ...}
+    Note over C: Write retro, ~1-2k tokens retained
     C->>C: No more milestones
     Note over C: "All done. Waiting for new work."

package/template/.orchestra/config.yml CHANGED

@@ -13,10 +13,7 @@ pipeline:
     standard: sonnet
     complex: opus
   # RFC approval gate: required | optional | skip
-  rfc_approval: required
-  # Push approval gate: required | auto
-  push_approval: required
+  rfc_approval: skip
   # Code review: required | optional | skip
   review: required
@@ -25,6 +22,11 @@ pipeline:
   # When enabled, phases with depends_on: [] run in parallel
   parallel: disabled
+  # Milestone isolation mode: inline | agent
+  # inline: conductor runs milestones directly, stops after each. User compacts manually. (default)
+  # agent: each milestone runs in its own sub-agent. Context freed automatically. Best for --auto.
+  milestone_isolation: inline
   # Default pipeline when milestone Complexity is missing
   default_pipeline: full  # quick | standard | full
@@ -34,6 +36,9 @@ pipeline:
   # Max RFC rejection rounds before escalating to user
   max_rfc_rounds: 3
+  # Max milestone review rounds before proceeding anyway with warnings
+  max_milestone_review_rounds: 3
 thresholds:
   # Milestone lock timeout in minutes (stale locks are ignored)
   milestone_lock_timeout: 120

package/template/.orchestra/knowledge.md CHANGED

@@ -69,7 +69,7 @@ Last 5 milestones. Conductor reads before every milestone start. PM reads before
 ### Decisions
 - Skill System (markdown-only): Lightweight `.orchestra/skills/` with domain checklists (auth, CRUD, deployment). No registry, no keyword matching — PM manually assigns via `skills:` frontmatter in phase files. Preserves zero-infrastructure philosophy.
-- Cost Awareness: Track duration + verification retries per phase in context.md Cost Tracking table. PM sees this in #status. No token counting (unreliable from prompt), focus on observable metrics.
+- Cost Awareness: Track duration + verification retries per phase in context.md `## Metrics` section. PM sees this in `/orchestra status`. No token counting (unreliable from prompt), focus on observable metrics.
 - Re-review Threshold: Fix < 30 lines → no re-review. Fix >= 30 lines → abbreviated re-review (only the fix commit). Balances quality vs speed.
 - Rejection Flow: RFC rejected → architect revises (max 3 rounds). Push rejected → create fix phase. System no longer stalls on "no".

package/template/.orchestra/roles/product-manager.md CHANGED

@@ -42,13 +42,44 @@ Cannot write: feature code, RFCs, architecture docs, review findings, system fil
     └── phase-2.md
 ```
-### Pre-flight Checklist
+### Milestone Review Loop
+After creating milestone files, launch a milestone-reviewer sub-agent before
+marking the milestone as ready. This catches planning errors before conductor executes.
+**Flow:** PM creates → reviewer sub-agent → PM fixes → reviewer again → max `pipeline.max_milestone_review_rounds`
+Launch sub-agent (general-purpose, model: sonnet) with this prompt:
+```
+You are reviewing a milestone for quality before execution. Read these files
+in {milestone_path}/: prd.md, milestone.md, grooming.md, and all files in phases/.
+(rfc.md and context.md don't exist yet — don't flag them as missing.)
+## Checklist
 1. Every phase has `role:` set?
-2. Every phase has `skills:` reviewed?
-3. Every phase has clear, testable acceptance criteria?
-4. `milestone.md` has `Complexity:` set?
-5. Phase order and dependencies correct?
+2. Every phase has `complexity:` set?
+3. Every phase has `skills:` appropriate for the role and task?
+4. Every phase has `scope:` defining which files/dirs to touch?
+5. Acceptance criteria are testable? (not vague like "works well" — specific like "returns 200")
+6. `milestone.md` has `Complexity:` set?
+7. Phase order and `depends_on` are correct? (frontend depends on backend, etc.)
+8. No overlapping scope between phases? (two phases writing same files)
+9. PRD explains WHY, not just WHAT?
+## Return Format
+verdict: approved | changes-requested
+issues:
+- [severity: blocking|suggestion] {description} — {file}
+summary: {2-3 sentences}
+```
+**Process:**
+1. If **approved** → proceed, milestone is ready for conductor
+2. If **changes-requested** → PM reads issues, fixes milestone files, re-launches reviewer
+3. After max rounds with no blocking issues → proceed with suggestions logged in grooming.md
+4. After max rounds with blocking issues still open → escalate to user, do NOT proceed
+5. Present verdict to user before finalizing
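Reviewer note: the review loop and its two exit conditions after the maximum number of rounds can be sketched as follows. `review` and `fix` are hypothetical callbacks standing in for the reviewer sub-agent and the PM's edits. The result object shape is an assumption based on the return format quoted in the prompt above.

```javascript
// Hypothetical loop: review, fix, review again, up to maxRounds
// (config `pipeline.max_milestone_review_rounds`, default 3 in the template).
function runReviewLoop(review, fix, maxRounds = 3) {
  if (maxRounds < 1) throw new Error("maxRounds must be at least 1");
  let result = null;
  for (let round = 1; round <= maxRounds; round++) {
    result = review();
    if (result.verdict === "approved") {
      return { outcome: "ready", rounds: round };
    }
    if (round < maxRounds) fix(result.issues); // PM fixes, then re-launches the reviewer
  }
  const blocking = result.issues.filter((issue) => issue.severity === "blocking");
  return blocking.length > 0
    ? { outcome: "escalate", rounds: maxRounds, blocking } // do NOT proceed
    : { outcome: "ready-with-suggestions", rounds: maxRounds, suggestions: result.issues };
}
```

In this sketch, `escalate` corresponds to step 4 of the process and `ready-with-suggestions` to step 3, where the suggestions would be logged in grooming.md.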
 ### milestone.md Format
@@ -59,7 +90,7 @@ Cannot write: feature code, RFCs, architecture docs, review findings, system fil
 |-------|-------|
 | Status | planning / in-progress / review / done |
 | Priority | P0 / P1 / P2 |
-| Complexity | quick / standard / full |
+| Complexity | trivial / quick / standard / complex |
 | PRD | prd.md |
 | Created | {date} |
 ```
@@ -85,11 +116,12 @@ depends_on: []
 ### Complexity Levels
-| Level | Pipeline | When |
-|-------|----------|------|
-| `quick` | Engineer → Commit → Push | Trivial: config, copy, single-file fix |
-| `standard` | Engineer → Review → Push | Typical features, clear requirements |
-| `full` | Architect → Engineer → Review → Push | Complex: new subsystems, unfamiliar territory |
+| Level | Model | Pipeline | When |
+|-------|-------|----------|------|
+| `trivial` | Haiku | Phases → Commit → Push | Version bumps, env vars, config changes |
+| `quick` | Sonnet | Phases → Commit → Push (skip review) | Single-file fixes, simple CRUD |
+| `standard` | Sonnet | Phases → Review → Push | Typical features (default) |
+| `complex` | Opus | Architect → Phases → Review → Push | New subsystems, unfamiliar territory |
 ### Blueprint Command

package/template/CLAUDE.md CHANGED

@@ -1,6 +1,6 @@
-# CLAUDE.md — Orchestra Setup Instructions
+# CLAUDE.md
-This file is automatically read by Claude at the start of every session.
+This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
 <!-- orchestra -->
 ## Orchestra — AI Team Orchestration System
@@ -46,6 +46,22 @@ Role IDs: orchestrator, product-manager, architect, backend-engineer, frontend-e
 - Rules (`.claude/rules/*.orchestra.md`) auto-loaded. Skills loaded per phase.
 - **PROTECTED:** Non-Orchestrator roles NEVER modify `.orchestra/roles/`, `.orchestra/config.yml`, `.orchestra/README.md`, `.orchestra/blueprints/`, `.claude/agents/`, `.claude/rules/*.orchestra.md`, `.claude/skills/*.orchestra.md`, `.claude/commands/orchestra/`, `CLAUDE.md`, or `docs/`.
+## Development
+This is an npm package (`@sulhadin/orchestrator`) — a CLI installer that copies Orchestra template files into user projects.
+```bash
+yarn test              # Run tests (node:test, test/**/*.test.js)
+yarn template          # Rebuild template/ from source files (bin/build-template.js)
+yarn build             # Full build (defined in lint-staged)
+```
+**Architecture:** `bin/index.js` is the CLI entry point (runs via `npx`). It copies files from `template/` into the user's project, with smart YAML merge for `config.yml` (preserves user values, adds new keys). `bin/build-template.js` generates the `template/` directory from the source `.orchestra/` and `.claude/` files.
+**npm publishes:** Only `bin/` and `template/` directories (see `package.json` `files` field). Tests, docs, and source orchestra files are excluded.
+**Pre-commit:** Husky + lint-staged runs `yarn template && yarn build` on staged `.js`, `.md`, `.yml`, `.json` files.
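Reviewer note: the "smart YAML merge" mentioned above (preserve user values, add new keys) can be illustrated on already-parsed objects. This is a simplified sketch, not the package's `mergeConfigYaml`, which works on YAML text and also keeps template comments. Keeping user-only keys that the template lacks is an assumption of this sketch.

```javascript
function isPlainObject(value) {
  return value !== null && typeof value === "object" && !Array.isArray(value);
}

// Hypothetical merge: template defines structure and key order,
// user values win wherever the user has set one.
function mergeConfig(user, template) {
  const merged = {};
  for (const [key, templateValue] of Object.entries(template)) {
    const userValue = user[key];
    if (isPlainObject(templateValue)) {
      merged[key] = mergeConfig(isPlainObject(userValue) ? userValue : {}, templateValue);
    } else {
      merged[key] = userValue !== undefined ? userValue : templateValue;
    }
  }
  // Assumption: keys present only in the user's config are kept as-is.
  for (const [key, userValue] of Object.entries(user)) {
    if (!(key in merged)) merged[key] = userValue;
  }
  return merged;
}
```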
 ## Installation
 See `docs/getting-started.md` for setup instructions.

package/bin/merge-config.test.js DELETED

@@ -1,135 +0,0 @@
-const { describe, it } = require("node:test");
-const assert = require("node:assert");
-const fs = require("fs");
-const path = require("path");
-// Extract mergeConfigYaml from index.js
-const src = fs.readFileSync(path.join(__dirname, "index.js"), "utf-8");
-const match = src.match(/function mergeConfigYaml\([\s\S]*?^}/m);
-eval(match[0]);
-const userConfig = `pipeline:
-  models:
-    quick: haiku
-    standard: sonnet
-    complex: opus
-  rfc_approval: skip
-  push_approval: auto
-  review: required
-  parallel: disabled
-thresholds:
-  re_review_lines: 50
-  phase_time_limit: 20
-  phase_tool_limit: 40
-  stuck_retry_limit: 5
-verification:
-  typecheck: "yarn tsc --noEmit"
-  test: "yarn test"
-  lint: "yarn lint"
-`;
-const templateConfig = `# Orchestra Configuration
-# Customize pipeline behavior, thresholds, and verification commands.
-pipeline:
-  # Model selection per phase complexity
-  models:
-    trivial: haiku
-    quick: sonnet
-    standard: sonnet
-    complex: opus
-  rfc_approval: required
-  push_approval: required
-  review: required
-  parallel: disabled
-  default_pipeline: full
-  default_complexity: standard
-  max_rfc_rounds: 3
-thresholds:
-  milestone_lock_timeout: 120
-  re_review_lines: 30
-  phase_time_limit: 15
-  phase_tool_limit: 40
-  stuck_retry_limit: 3
-verification:
-  typecheck: "npx tsc --noEmit"
-  test: "npm test"
-  lint: "npm run lint"
-`;
-describe("mergeConfigYaml", () => {
-  const result = mergeConfigYaml(userConfig, templateConfig);
-  describe("new template keys are added", () => {
-    it("adds trivial model tier", () => {
-      assert.ok(result.includes("trivial: haiku"));
-    });
-    it("adds default_pipeline", () => {
-      assert.ok(result.includes("default_pipeline: full"));
-    });
-    it("adds default_complexity", () => {
-      assert.ok(result.includes("default_complexity: standard"));
-    });
-    it("adds max_rfc_rounds", () => {
-      assert.ok(result.includes("max_rfc_rounds: 3"));
-    });
-    it("adds milestone_lock_timeout", () => {
-      assert.ok(result.includes("milestone_lock_timeout: 120"));
-    });
-  });
-  describe("user values are preserved", () => {
-    it("keeps user models.quick value", () => {
-      assert.ok(result.includes("quick: haiku"));
-    });
-    it("keeps user rfc_approval", () => {
-      assert.ok(result.includes("rfc_approval: skip"));
-    });
-    it("keeps user push_approval", () => {
-      assert.ok(result.includes("push_approval: auto"));
-    });
-    it("keeps user re_review_lines", () => {
-      assert.ok(result.includes("re_review_lines: 50"));
-    });
-    it("keeps user phase_time_limit", () => {
-      assert.ok(result.includes("phase_time_limit: 20"));
-    });
-    it("keeps user stuck_retry_limit", () => {
-      assert.ok(result.includes("stuck_retry_limit: 5"));
-    });
-    it("keeps user verification commands", () => {
-      assert.ok(result.includes('typecheck: "yarn tsc --noEmit"'));
-      assert.ok(result.includes('test: "yarn test"'));
-      assert.ok(result.includes('lint: "yarn lint"'));
-    });
-  });
-  describe("template structure is preserved", () => {
-    it("keeps comments from template", () => {
-      assert.ok(result.includes("# Orchestra Configuration"));
-      assert.ok(result.includes("# Model selection per phase complexity"));
-    });
-    it("maintains section order", () => {
-      const pipelineIdx = result.indexOf("pipeline:");
-      const thresholdsIdx = result.indexOf("thresholds:");
-      const verificationIdx = result.indexOf("verification:");
-      assert.ok(pipelineIdx < thresholdsIdx);
-      assert.ok(thresholdsIdx < verificationIdx);
-    });
-  });
-});