npm - orchestr8 - Versions diffs - 2.5.0 → 2.6.1 - Mend

orchestr8 2.5.0 → 2.6.1

Files changed (54)

package/.blueprint/features/feature_validate-command/story-run-validation.md ADDED

@@ -0,0 +1,59 @@
+# Story — Run Validation Command
+## User story
+As a developer, I want to run `orchestr8 validate` in my project directory so that I can check whether my environment is correctly configured before running the pipeline.
+---
+## Context / scope
+- Developer using orchestr8 CLI
+- Command can be run in any directory (initialized or not)
+- Route: `orchestr8 validate` via `bin/cli.js`
+- This is the primary entry point for the validate feature
+See feature spec: `.blueprint/features/feature_validate-command/FEATURE_SPEC.md`
+---
+## Acceptance criteria
+**AC-1 — Command is available**
+- Given I have orchestr8 installed,
+- When I run `orchestr8 validate`,
+- Then the command executes without throwing an exception.
+**AC-2 — Checks execute in sequence**
+- Given I run `orchestr8 validate`,
+- When the command executes,
+- Then all validation checks are performed:
+  - Directory existence (`.blueprint/`, `.business_context/`, `.claude/commands/`)
+  - System spec existence (`.blueprint/system_specification/SYSTEM_SPEC.md`)
+  - Agent spec files existence (4 files in `.blueprint/agents/`)
+  - Business context non-empty check
+  - Skills installed check (`.claude/commands/implement-feature.md`)
+  - Node.js version check (>=18)
+**AC-3 — Each check produces a status line**
+- Given I run `orchestr8 validate`,
+- When each check completes,
+- Then a status line is printed showing pass or fail for that check.
+**AC-4 — Command completes without crashes**
+- Given any combination of missing/present files,
+- When I run `orchestr8 validate`,
+- Then the command completes gracefully (does not throw exceptions)
+- And all checks are wrapped to handle missing paths.
+**AC-5 — Idempotent execution**
+- Given I run `orchestr8 validate` multiple times,
+- When each execution completes,
+- Then no files are created, modified, or deleted
+- And each run produces the same output for the same state.
+---
+## Out of scope
+- Validation of file contents (e.g., SYSTEM_SPEC.md is well-formed)
+- Automatic remediation of issues
+- Network checks
+- Queue state validation
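
Review note: the acceptance criteria in this story (fixed set of checks, one result per check, no exceptions, no writes) can be illustrated with a minimal sketch. This is not the package's implementation. The names `runCheck` and `validate`, the result shape `{ name, passed }`, and the decision to count `.md` files for the agent-spec check are all assumptions made for illustration.

```javascript
// Hypothetical sketch; function names and result shape are illustrative,
// not orchestr8's actual API.
const fs = require('node:fs');
const path = require('node:path');

// Wrap a check so that a missing path or any thrown error becomes a
// failed result instead of an uncaught exception (AC-4).
function runCheck(name, fn) {
  try {
    return { name, passed: Boolean(fn()) };
  } catch (err) {
    return { name, passed: false };
  }
}

// statSync throws for missing paths; runCheck turns that into a failure.
const isDir = (p) => fs.statSync(p).isDirectory();
const isFile = (p) => fs.statSync(p).isFile();

// Run every check in a fixed order. All checks are read-only, so repeated
// runs give the same results for the same project state (AC-5).
function validate(projectDir, nodeVersion = process.versions.node) {
  const at = (...parts) => path.join(projectDir, ...parts);
  return [
    runCheck('.blueprint/ exists', () => isDir(at('.blueprint'))),
    runCheck('.business_context/ exists', () => isDir(at('.business_context'))),
    runCheck('.claude/commands/ exists', () => isDir(at('.claude', 'commands'))),
    runCheck('SYSTEM_SPEC.md exists', () =>
      isFile(at('.blueprint', 'system_specification', 'SYSTEM_SPEC.md'))),
    // Assumption: "4 files" is approximated by counting Markdown files.
    runCheck('agent specs present (4 files)', () =>
      fs.readdirSync(at('.blueprint', 'agents')).filter((f) => f.endsWith('.md')).length >= 4),
    runCheck('.business_context/ is non-empty', () =>
      fs.readdirSync(at('.business_context')).length > 0),
    runCheck('implement-feature skill installed', () =>
      isFile(at('.claude', 'commands', 'implement-feature.md'))),
    runCheck('Node.js >= 18', () => Number(nodeVersion.split('.')[0]) >= 18),
  ];
}
```

Calling `validate(process.cwd())` in an uninitialized directory would return eight results, with only the Node.js version check able to pass. Passing the version as a parameter is a design choice that keeps the check testable without changing the running interpreter.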

package/.blueprint/features/feature_validate-command/story-success-output.md ADDED

@@ -0,0 +1,50 @@
+# Story — Validation Success Output
+## User story
+As a developer, I want to see clear success indicators when all validation checks pass so that I have confidence my project is ready to run the pipeline.
+---
+## Context / scope
+- Developer using orchestr8 CLI
+- Project is fully initialized with all required artifacts
+- This story covers the happy path output formatting
+See feature spec: `.blueprint/features/feature_validate-command/FEATURE_SPEC.md`
+---
+## Acceptance criteria
+**AC-1 — Checkmark displayed for passed checks**
+- Given all required files and directories exist,
+- When I run `orchestr8 validate`,
+- Then each passed check displays a checkmark indicator.
+**AC-2 — Colorized output when supported**
+- Given my terminal supports color output,
+- When I run `orchestr8 validate` and checks pass,
+- Then checkmarks are displayed in green.
+**AC-3 — ASCII fallback for non-color terminals**
+- Given my terminal does not support color output,
+- When I run `orchestr8 validate`,
+- Then success indicators use ASCII-compatible characters
+- And output remains readable.
+**AC-4 — Overall success message**
+- Given all validation checks pass,
+- When the command completes,
+- Then an overall success message is printed indicating the project is ready.
+**AC-5 — Exit code zero on success**
+- Given all validation checks pass,
+- When the command completes,
+- Then the process exits with code 0
+- And this exit code can be used in scripts/CI pipelines.
+---
+## Out of scope
+- JSON output format (deferred)
+- Verbose mode with additional details
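
Review note: the output rules in this story (check mark per passed check, green when color is supported, ASCII fallback otherwise, exit code 0 on success) might look roughly like the sketch below. The function names, the `[OK]`/`[FAIL]` markers, the summary wording, and the use of exit code 1 on failure are assumptions. The story itself only specifies exit code 0 for success.

```javascript
// Hypothetical sketch; names, symbols, and messages are assumptions.
const GREEN = '\u001b[32m';
const RED = '\u001b[31m';
const RESET = '\u001b[0m';

// Format one status line for a result shaped like { name, passed }.
// With color: a green check mark or a red cross.
// Without color: plain ASCII markers that stay readable in any terminal.
function formatStatus(result, useColor) {
  if (useColor) {
    return result.passed
      ? `${GREEN}\u2713${RESET} ${result.name}`
      : `${RED}\u2717${RESET} ${result.name}`;
  }
  return `${result.passed ? '[OK]' : '[FAIL]'} ${result.name}`;
}

// Build the full report and the exit code: 0 only if every check passed.
function report(results, useColor) {
  const lines = results.map((r) => formatStatus(r, useColor));
  const allPassed = results.every((r) => r.passed);
  lines.push(allPassed ? 'All checks passed. Project is ready.' : 'Some checks failed.');
  return { lines, exitCode: allPassed ? 0 : 1 };
}
```

One plausible way to choose `useColor` is `Boolean(process.stdout.isTTY) && !process.env.NO_COLOR`, though how the package actually detects color support is not shown in this diff. A caller would print `lines` and then set `process.exitCode` from the returned `exitCode` so that scripts and CI pipelines can branch on it.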

package/.blueprint/system_specification/SYSTEM_SPEC.md ADDED

@@ -0,0 +1,248 @@
+# System Specification — orchestr8
+## 1. Purpose & Intent
+**orchestr8** is a multi-agent workflow framework that automates feature development from specification to implementation. It coordinates four specialized AI agents (Alex, Cass, Nigel, Codey) through a sequential pipeline, ensuring that features are explicitly specified, documented with user stories, tested, and implemented with traceability.
+**Why this system exists:**
+- To transform feature ideas into working, tested code through a structured, repeatable process
+- To maintain coherence between specifications, stories, tests, and implementation
+- To reduce ambiguity and drift by enforcing explicit handoffs between agents
+**Who it exists for:**
+- Developers using Claude Code who want automated, spec-driven feature development
+- Teams seeking a structured approach to AI-assisted software engineering
+**What success looks like:**
+- Features are implemented with full traceability from spec to code
+- Tests are written before implementation, providing a stable contract
+- All artifacts (specs, stories, tests, code) are aligned and consistent
+**What must not be compromised:**
+- Explicit specification before implementation
+- Test-first development contracts
+- Agent role boundaries and handoff integrity
+---
+## 2. Business & Domain Context
+orchestr8 operates in the domain of AI-assisted software development, specifically within the Claude Code CLI environment.
+**Relevant drivers:**
+- Growing adoption of AI coding assistants
+- Need for structured processes to guide AI-generated code
+- Requirement for traceability and quality assurance in automated development
+**Domain constraints:**
+- Operates within Claude Code's Task tool for spawning sub-agents
+- Subject to Claude's token limits (optimized via incremental writes)
+- Relies on file-based artifacts for persistence and handoffs
+**Assumptions:**
+- Users have Node.js 18+ installed
+- Users have Claude Code CLI available
+- Projects use standard testing frameworks (Jest, Node test runner)
+- The `.business_context/` directory contains project-specific domain knowledge
+---
+## 3. System Boundaries
+### In Scope
+- **CLI tooling:** `init`, `update`, `skills`, `add-skills`, `queue` commands
+- **Agent orchestration:** Sequential pipeline via `/implement-feature` skill
+- **Artifact management:** Feature specs, user stories, test specs, tests, implementation plans
+- **Queue persistence:** Recovery and resumption from `.claude/implement-queue.json`
+- **Skills integration:** Installing optional skills from skills.sh ecosystem
+### Out of Scope
+- Runtime test execution environment (assumes existing test infrastructure)
+- CI/CD integration (users configure their own pipelines)
+- Project-specific business logic (provided via `.business_context/`)
+- IDE integration beyond Claude Code CLI
+---
+## 4. Actors & Roles
+### Human User
+- **Description:** Developer invoking orchestr8 commands and the `/implement-feature` skill
+- **Primary goals:** Automate feature development, maintain control over scope and quality
+- **Authority:** Final arbiter on intent, scope, and breaking changes; can pause/abort pipeline
+### Alex (System Specification & Chief-of-Staff)
+- **Description:** Creates and maintains system and feature specifications
+- **Primary goals:** Prevent drift, ensure coherence, produce feature specs for downstream agents
+- **Authority:** Can define and evolve specifications; cannot make unilateral product decisions
+### Cass (Story Writer / Business Analyst)
+- **Description:** Translates feature specs into user stories and acceptance criteria
+- **Primary goals:** Produce explicit, testable, behaviour-first stories
+- **Authority:** Elaborates specifications into stories; does not introduce unspecified behaviour
+### Nigel (Tester)
+- **Description:** Creates test specifications and executable tests from user stories
+- **Primary goals:** Expose ambiguities, provide stable test contracts
+- **Authority:** Defines test coverage; does not implement production code
+### Codey (Developer)
+- **Description:** Implements features according to plans and test contracts
+- **Primary goals:** Make tests pass, maintain code quality
+- **Authority:** Writes implementation code; does not modify tests without agreement
+---
+## 5. Core Domain Concepts
+### Feature
+- **Definition:** A bounded unit of functionality identified by a slug (e.g., `user-auth`)
+- **Key attributes:** slug, feature spec, user stories, tests, implementation plan
+- **Lifecycle:** Created by Alex, elaborated by Cass, tested by Nigel, implemented by Codey
+### Pipeline Stage
+- **Definition:** A discrete phase in the feature development workflow
+- **Key attributes:** agent owner, input artifacts, output artifacts
+- **Stages:** alex, cass, nigel, codey-plan, codey-implement, auto-commit
+### Queue
+- **Definition:** Persistent state tracking features through the pipeline
+- **Key attributes:** current, alexQueue, cassQueue, nigelQueue, codeyQueue, completed, failed
+- **Lifecycle:** Created on first run, updated at each stage transition, cleared on completion
+### Artifact
+- **Definition:** A file produced by an agent as output
+- **Types:** SYSTEM_SPEC.md, FEATURE_SPEC.md, story-*.md, test-spec.md, *.test.js, IMPLEMENTATION_PLAN.md
+### Skill
+- **Definition:** A Claude Code command that provides specialized agent capabilities
+- **Key attributes:** name, description, prompt template
+- **Lifecycle:** Installed via `add-skills`, invoked via slash commands
+---
+## 6. High-Level Lifecycle & State Model
+### Pipeline Lifecycle
+```
+INIT → ALEX → CASS → NIGEL → CODEY_PLAN → CODEY_IMPLEMENT → COMMIT → COMPLETE
+                                ↓
+                             PAUSED (--pause-after)
+                                ↓
+                             RESUME
+```
+### Feature States
+| State | Entry Condition | Exit Condition |
+|-------|----------------|----------------|
+| **pending** | Feature slug provided | Agent starts processing |
+| **in_progress** | Agent spawned for stage | Agent completes or fails |
+| **completed** | All stages pass, commit done | Moved to completed list |
+| **failed** | Agent error or abort | Moved to failed list |
+### Recovery
+- On re-invocation, pipeline reads `current.stage` from queue and resumes
+- Failed features can be retried or restarted from scratch
+---
+## 7. Governing Rules & Invariants
+### Pipeline Rules
+- **Sequential execution:** Agents execute in order (Alex → Cass → Nigel → Codey)
+- **Artifact gates:** Each stage requires output artifacts before proceeding
+- **System spec gate:** Pipeline cannot proceed without SYSTEM_SPEC.md
+### Agent Rules
+- **Single responsibility:** Each agent has defined inputs and outputs; no overlap
+- **No silent changes:** Agents flag deviations; do not silently alter specifications
+- **Incremental output:** All agents write one file at a time to avoid token limits
+### Queue Rules
+- **Single current:** Only one feature can be `current` at a time
+- **Immutable completed:** Completed features are not re-processed unless explicitly restarted
+- **Recovery-safe:** Queue file enables resumption after failures
+### Implementation Rules
+- **Tests are contracts:** Codey implements against tests, not around them
+- **No test deletion:** Tests cannot be removed without explicit approval
+- **Green suite required:** Implementation is complete only when all tests pass
+---
+## 8. Cross-Cutting Concerns
+### Traceability
+- Every test maps to acceptance criteria (AC → Test ID table)
+- Every story maps to feature spec sections
+- Commit messages reference artifacts and agents
+### Token Limit Management
+- Incremental file writes (one file at a time)
+- Brief summaries (5 bullets max)
+- Reference by path instead of quoting content
+- Consolidated artifacts (Nigel: 2 files, not 4)
+### Failure Handling
+- Each agent spawn offers: retry, skip, abort
+- Failures are recorded with stage, reason, and timestamp
+- Queue persists state for recovery
+### Observability
+- Queue status available via `orchestr8 queue`
+- Each agent provides completion summary
+- Commit messages document pipeline execution
+---
+## 9. Non-Functional Expectations
+### Reliability
+- Pipeline resumes from last checkpoint on failure
+- Queue file is gitignored to avoid conflicts
+- Deterministic output given same inputs
+### Performance
+- Optimized for Claude's 4096 output token limit
+- Incremental processing reduces memory pressure
+- Skills installed once per project
+### Extensibility
+- Skills can be added per-agent from skills.sh ecosystem
+- Agent specifications can be customized in `.blueprint/agents/`
+- Business context directory allows project-specific grounding
+### Maintainability
+- `update` command replaces framework files while preserving user content
+- Clear separation: framework dirs (replaced) vs user dirs (preserved)
+---
+## 10. Known Gaps, Risks & Open Questions
+### Known Limitations
+- Pipeline is sequential; no parallel agent execution
+- Single-feature focus; no batch processing
+- Assumes Node.js testing ecosystem
+### Risks
+- Token limit errors if agents generate excessive output
+- Agent specifications may need tuning for non-standard projects
+- Skill installation requires network access
+### Open Questions
+- Should agents support alternative testing frameworks?
+- Could pipeline stages be made configurable or skippable?
+- How to handle features that span multiple repositories?
+---
+## 11. Change Log (System-Level)
+| Date | Change | Reason | Approved By |
+|------|--------|--------|-------------|
+| 2026-02-24 | Initial system specification | Document orchestr8 v2.5.0 | Alex |
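
Review note: the recovery rule in section 6 (read `current.stage` from the queue and resume) can be sketched as a small pure function over the stage list from section 5. The queue file's exact schema is not shown in this diff, so the shape `{ current: { stage } }` is an assumption, as is the choice to re-run the recorded stage rather than skip past it.

```javascript
// Hypothetical sketch; the queue schema and resume semantics are assumptions.
// Stage names are taken from the "Pipeline Stage" entry in the spec.
const STAGES = ['alex', 'cass', 'nigel', 'codey-plan', 'codey-implement', 'auto-commit'];

// Given a parsed queue object, return the stages still to run for the
// current feature, starting at the recorded stage. With no current
// feature, or an unrecognized stage, start from the beginning.
function remainingStages(queue) {
  const stage = queue && queue.current ? queue.current.stage : undefined;
  const index = STAGES.indexOf(stage);
  return index === -1 ? [...STAGES] : STAGES.slice(index);
}
```

In practice the queue object would come from parsing `.claude/implement-queue.json`. Keeping the resume logic separate from file I/O makes it straightforward to test against hand-written queue states.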

package/README.md CHANGED

@@ -1,6 +1,6 @@
 # orchestr8
-A multi-agent workflow framework for automated feature development. Four specialized AI agents collaborate in sequence to take features from specification to implementation.
+A multi-agent workflow framework for automated feature development. Four specialized AI agents collaborate in sequence to take features from specification to implementation, with built-in feedback loops and self-improvement capabilities.
 ## Agents
@@ -19,53 +19,155 @@ npx orchestr8 init
 This installs the `.blueprint/` directory, `.business_context/`, and the `/implement-feature` skill to `.claude/commands/`. If files already exist, you'll be prompted before overwriting. It also adds the workflow queue to `.gitignore`.
-### Commands
+## Keeping Up to Date
+**Modules** (history, insights, feedback, retry, validate) are part of the npm package and update automatically when you use `npx` - no action needed.
+**Project files** (agent specs, templates, skill definition) are copied to your project and need explicit updating:
+```bash
+npx orchestr8 update
+```
+This updates `.blueprint/agents/`, `.blueprint/templates/`, `.blueprint/ways_of_working/`, and `.claude/commands/implement-feature.md` while preserving your content in `features/` and `system_specification/`.
+## Commands
+### Core Commands
 | Command | Description |
 |---------|-------------|
 | `npx orchestr8 init` | Initialize `.blueprint/`, `.business_context/`, and skill in your project |
 | `npx orchestr8 update` | Update agents, templates, and rituals to latest version |
-| `npx orchestr8 add-skills [agent]` | Install recommended skills for an agent (alex, cass, nigel, codey, all) |
-| `npx orchestr8 skills [agent]` | List recommended skills |
+| `npx orchestr8 validate` | Pre-flight checks before running pipeline |
 | `npx orchestr8 help` | Show help |
-The `update` command preserves your content in `features/` and `system_specification/` while updating the framework files. Your `.business_context/` directory is separate from `.blueprint/` and unaffected by updates.
-### Optional Skills
+### History & Insights
-Each agent has recommended skills from the [skills.sh](https://skills.sh) ecosystem that enhance their capabilities:
+| Command | Description |
+|---------|-------------|
+| `npx orchestr8 history` | View recent pipeline runs |
+| `npx orchestr8 history --stats` | View aggregate statistics |
+| `npx orchestr8 history --all` | View all runs |
+| `npx orchestr8 history clear` | Clear history |
+| `npx orchestr8 insights` | Analyze patterns and get recommendations |
+| `npx orchestr8 insights --feedback` | View feedback correlation analysis |
+| `npx orchestr8 insights --bottlenecks` | View bottleneck analysis |
+| `npx orchestr8 insights --failures` | View failure pattern analysis |
-| Agent | Skills |
-|-------|--------|
-| **Alex** | `avoid-feature-creep`, `feature-spec` |
-| **Cass** | `user-story-writing` |
-| **Nigel** | `javascript-testing-patterns`, `modern-javascript-patterns` |
-| **Codey** | `javascript-expert`, `modern-javascript-patterns` |
+### Configuration
-```bash
-npx orchestr8 add-skills all     # Install all recommended skills
-npx orchestr8 add-skills codey   # Install skills for Codey only
-```
+| Command | Description |
+|---------|-------------|
+| `npx orchestr8 retry-config` | View retry configuration |
+| `npx orchestr8 retry-config set <key> <value>` | Modify retry settings |
+| `npx orchestr8 retry-config reset` | Reset to defaults |
+| `npx orchestr8 feedback-config` | View feedback thresholds |
+| `npx orchestr8 feedback-config set <key> <value>` | Modify feedback settings |
 ## Usage
 Run the pipeline with the `/implement-feature` skill in Claude Code:
 ```bash
-/implement-feature                                    # Interactive
-/implement-feature "user-auth"                        # New feature
-/implement-feature "user-auth" --update-feature-spec  # Update spec
-/implement-feature "user-auth" --update-story "login" # Update story
-/implement-feature --update-system-spec               # Update system spec
+/implement-feature                           # Interactive
+/implement-feature "user-auth"               # New feature
+/implement-feature "user-auth" --no-feedback # Skip feedback collection
+/implement-feature "user-auth" --no-validate # Skip pre-flight validation
+/implement-feature "user-auth" --no-history  # Skip history recording
+/implement-feature "user-auth" --no-commit   # Skip auto-commit
+/implement-feature "user-auth" --pause-after=alex|cass|nigel|codey-plan
 ```
-### Pipeline Flow
+## Pipeline Flow
+The pipeline now includes validation, feedback loops, and history tracking:
 ```
-New Feature:    Alex → Cass → Nigel → Codey → Commit
-Update Story:   Cass → Nigel → Codey → Commit
-Update Feature: Alex → cascade check
-Update System:  Alex → cascade check
+┌─────────────────────────────────────────────────────────────────┐
+│  Step 0: Pre-flight Validation                                  │
+│  • Check directories, specs, Node.js version                    │
+└─────────────────────────────────────────────────────────────────┘
+                              │
+                              ▼
+┌─────────────────────────────────────────────────────────────────┐
+│  Insights Preview                                               │
+│  • "Last 10 runs: 85% success, estimated ~12 min"               │
+└─────────────────────────────────────────────────────────────────┘
+                              │
+                              ▼
+┌─────────────────────────────────────────────────────────────────┐
+│  Alex (Feature Spec)                                            │
+│         │                                                       │
+│         ▼                                                       │
+│  Cass rates Alex → Quality Gate (pause if rating < 3)           │
+│         │                                                       │
+│         ▼                                                       │
+│  Cass (User Stories)                                            │
+│         │                                                       │
+│         ▼                                                       │
+│  Nigel rates Cass → Quality Gate                                │
+│         │                                                       │
+│         ▼                                                       │
+│  Nigel (Tests)                                                  │
+│         │                                                       │
+│         ▼                                                       │
+│  Codey rates Nigel → Quality Gate                               │
+│         │                                                       │
+│         ▼                                                       │
+│  Codey (Plan → Implement)                                       │
+└─────────────────────────────────────────────────────────────────┘
+                              │
+                    On Failure│
+                              ▼
+┌─────────────────────────────────────────────────────────────────┐
+│  Smart Retry                                                    │
+│  • Check feedback chain for root cause                          │
+│  • Recommend strategy based on history                          │
+│  • Apply: simplify-prompt, add-context, incremental, etc.       │
+└─────────────────────────────────────────────────────────────────┘
+                              │
+                              ▼
+┌─────────────────────────────────────────────────────────────────┐
+│  Auto-commit → Record to History                                │
+│  • Duration, feedback scores, outcome                           │
+└─────────────────────────────────────────────────────────────────┘
+```
+## Modules
+orchestr8 includes these built-in modules for observability and self-improvement:
+| Module | Purpose |
+|--------|---------|
+| **validate** | Pre-flight checks before pipeline runs |
+| **history** | Records execution data (timing, status, feedback) |
+| **insights** | Analyzes patterns, detects bottlenecks, recommends improvements |
+| **retry** | Smart retry strategies based on failure history |
+| **feedback** | Agent-to-agent quality assessment with correlation tracking |
+### How They Work Together
+```
+Pipeline Run
+     │
+     ├──► history.js records timing at each stage
+     │
+     ├──► feedback.js collects quality ratings between stages
+     │
+     └──► On completion/failure, data stored in pipeline-history.json
+                              │
+                              ▼
+                    Future Pipeline Runs
+                              │
+     ┌────────────────────────┼────────────────────────┐
+     │                        │                        │
+     ▼                        ▼                        ▼
+insights.js              retry.js               feedback.js
+analyzes:               recommends:            calibrates:
+• Bottlenecks           • Retry strategies     • Quality thresholds
+• Failure patterns      • Based on history     • Agent accuracy
+• Trends                • And feedback issues
 ```
 ## Directory Structure
@@ -73,7 +175,7 @@ Update System:  Alex → cascade check
 ```
 your-project/
 ├── .blueprint/
-│   ├── agents/                    # Agent specifications
+│   ├── agents/                    # Agent specifications (with guardrails)
 │   │   ├── AGENT_SPECIFICATION_ALEX.md
 │   │   ├── AGENT_BA_CASS.md
 │   │   ├── AGENT_TESTER_NIGEL.md
@@ -86,19 +188,61 @@ your-project/
 │   └── system_specification/      # System spec (populated on first run)
 ├── .business_context/             # Business context documents
 │   └── README.md
-└── .claude/
-    └── commands/
-        └── implement-feature.md   # The /implement-feature skill
+├── .claude/
+│   ├── commands/
+│   │   └── implement-feature.md   # The /implement-feature skill
+│   ├── pipeline-history.json      # Execution history (gitignored)
+│   ├── retry-config.json          # Retry configuration (gitignored)
+│   ├── feedback-config.json       # Feedback thresholds (gitignored)
+│   └── implement-queue.json       # Pipeline queue state (gitignored)
+└── test/
+    ├── artifacts/                 # Test specs per feature
+    └── feature_*.test.js          # Executable tests
 ```
-## How It Works
+## Agent Guardrails
+All agents follow strict guardrails to ensure quality:
+| Guardrail | Description |
+|-----------|-------------|
+| **Source Restrictions** | Only use provided inputs (specs, code, business_context) |
+| **Prohibited Sources** | No social media, forums, external APIs, training data for domain facts |
+| **Citation Requirements** | All claims must cite source files |
+| **Confidentiality** | Business context treated as confidential |
+| **Escalation Protocol** | Clear rules for when to ask vs assume |
-1. **Alex** gates on the system spec — creates it if missing, then routes to the appropriate workflow
-2. **Cass** writes user stories with acceptance criteria from the feature spec
-3. **Nigel** creates test plans, behavior matrices, and executable tests
-4. **Codey** implements code to pass the tests, then auto-commits
+## Self-Improvement Loop
-The pipeline maintains a queue (`.claude/implement-queue.json`) to track progress and enable recovery on failure.
+The pipeline learns from itself:
+1. **Record** — Each run records timing, feedback, and outcomes
+2. **Analyze** — `orchestr8 insights` identifies patterns
+3. **Adapt** — Future runs use history to:
+   - Set dynamic quality thresholds
+   - Select optimal retry strategies
+   - Predict duration and success rate
+4. **Improve** — Feedback patterns suggest prompt improvements
+```bash
+# Example insights output
+$ npx orchestr8 insights
+## Pipeline Insights
+### Bottlenecks
+- nigel averages 4.2 min (42% of pipeline time)
+- Recommendation: Consider simplifying test requirements
+### Failure Patterns
+- codey-implement has 23% failure rate
+- Most common issue: "missing-error-handling" (73% correlation)
+- Recommendation: Ensure Alex includes error handling in specs
+### Trends
+- Success rate: 75% → 85% (improving)
+- Avg duration: 14 min → 11 min (improving)
+```
 ## License