npm - orchestr8 - Versions diffs - 2.5.0 → 2.6.1 - Mend

orchestr8 2.5.0 → 2.6.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (54)

package/.blueprint/features/feature_feedback-loop/story-feedback-collection.md ADDED

@@ -0,0 +1,63 @@
+# Story — Feedback Collection
+## User Story
+As a **pipeline orchestrator**, I want **downstream agents to provide structured feedback on upstream artifacts** so that **quality issues are surfaced explicitly at each stage boundary**.
+---
+## Context / Scope
+- Per FEATURE_SPEC.md:Section 4, feedback is collected at each stage boundary (Cass on Alex, Nigel on Cass, Codey on Nigel)
+- Feedback uses a defined schema with rating, confidence, issues, and recommendation
+- Feedback is captured before the downstream agent begins its main work
+- Per SYSTEM_SPEC.md:Section 7, agents must "flag deviations" — this story operationalises that principle
+---
+## Acceptance Criteria
+**AC-1 — Feedback schema structure**
+- Given an agent is spawned to provide feedback,
+- When the agent completes feedback output,
+- Then the feedback object contains:
+  - `about`: agent name being assessed (alex|cass|nigel)
+  - `rating`: integer 1-5
+  - `confidence`: float 0.0-1.0
+  - `issues`: array of issue codes (may be empty)
+  - `recommendation`: one of "proceed", "pause", or "revise"
+**AC-2 — Cass provides feedback on Alex**
+- Given Alex has completed a feature specification,
+- When Cass is spawned for story writing,
+- Then Cass first produces a feedback object with `about: "alex"` assessing the feature spec quality.
+**AC-3 — Nigel provides feedback on Cass**
+- Given Cass has completed user stories,
+- When Nigel is spawned for test writing,
+- Then Nigel first produces a feedback object with `about: "cass"` assessing story quality and testability.
+**AC-4 — Codey provides feedback on Nigel**
+- Given Nigel has completed test specifications,
+- When Codey is spawned for planning/implementation,
+- Then Codey first produces a feedback object with `about: "nigel"` assessing test coverage and implementation feasibility.
+**AC-5 — Feedback validation**
+- Given an agent produces a feedback object,
+- When the orchestrator reads the feedback,
+- Then the feedback is validated against the schema,
+- And invalid feedback triggers a warning but does not block the pipeline (per FEATURE_SPEC.md:Section 8, degraded mode).
+**AC-6 — Feedback persisted to history**
+- Given feedback is collected from an agent,
+- When the stage completes,
+- Then the feedback is stored in the history entry at `stages[stage].feedback`.
+---
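
The schema in AC-1 and the warn-but-continue rule in AC-5 can be illustrated with a minimal JavaScript sketch. It is not taken from the orchestr8 source. The names `validateFeedback` and `checkFeedback` and the injectable `warn` callback are hypothetical. Treating `issues` as an array of strings is an assumption, because the story only says "issue codes".

```javascript
// Hypothetical helpers, not an orchestr8 API.
const VALID_ABOUT = ["alex", "cass", "nigel"];
const VALID_RECOMMENDATIONS = ["proceed", "pause", "revise"];

/**
 * Returns a list of schema violations (empty list = valid).
 * Never throws, so the caller can warn and continue (degraded mode).
 */
function validateFeedback(feedback) {
  if (feedback === null || typeof feedback !== "object" || Array.isArray(feedback)) {
    return ["feedback must be an object"];
  }
  const errors = [];
  if (!VALID_ABOUT.includes(feedback.about)) {
    errors.push(`about must be one of ${VALID_ABOUT.join("|")}`);
  }
  if (!Number.isInteger(feedback.rating) || feedback.rating < 1 || feedback.rating > 5) {
    errors.push("rating must be an integer from 1 to 5");
  }
  const c = feedback.confidence;
  if (typeof c !== "number" || Number.isNaN(c) || c < 0 || c > 1) {
    errors.push("confidence must be a number from 0.0 to 1.0");
  }
  // Assumption: issue codes are plain strings such as "unclear-scope".
  if (!Array.isArray(feedback.issues) || !feedback.issues.every((i) => typeof i === "string")) {
    errors.push("issues must be an array of issue-code strings");
  }
  if (!VALID_RECOMMENDATIONS.includes(feedback.recommendation)) {
    errors.push(`recommendation must be one of ${VALID_RECOMMENDATIONS.join(", ")}`);
  }
  return errors;
}

/** AC-5: log a warning for invalid feedback but never block the pipeline. */
function checkFeedback(feedback, warn = console.warn) {
  const errors = validateFeedback(feedback);
  if (errors.length > 0) {
    warn(`Invalid feedback (continuing in degraded mode): ${errors.join("; ")}`);
  }
  return { valid: errors.length === 0, errors };
}
```

The check returns a list of errors instead of throwing. That keeps it non-blocking, which matches the degraded-mode requirement.
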
+## Out of Scope
+- Feedback from Alex (no prior stage to assess)
+- Feedback on auto-commit stage
+- Natural language feedback parsing (structured schema only)
+- Automatic remediation based on feedback

package/.blueprint/features/feature_feedback-loop/story-feedback-config.md ADDED

@@ -0,0 +1,61 @@
+# Story — Feedback Configuration
+## User Story
+As a **developer**, I want **CLI commands to view and modify feedback thresholds** so that **I can tune quality gate sensitivity based on my project's needs**.
+---
+## Context / Scope
+- Per FEATURE_SPEC.md:Section 7, configuration is stored in `.claude/feedback-config.json`
+- Per FEATURE_SPEC.md:Section 7, new CLI commands: `orchestr8 feedback-config` and `orchestr8 feedback-config set <key> <value>`
+- Parallel track to quality gates — configuration can be set independently
+---
+## Acceptance Criteria
+**AC-1 — View feedback configuration**
+- Given the user runs `orchestr8 feedback-config`,
+- When the command executes,
+- Then the current configuration is displayed including:
+  - `minRatingThreshold` (default: 3.0)
+  - `enabled` (default: true)
+  - Any custom issue-to-strategy mappings
+**AC-2 — Set threshold value**
+- Given the user runs `orchestr8 feedback-config set minRating <value>`,
+- When the value is a number between 1.0 and 5.0,
+- Then the threshold is updated in `.claude/feedback-config.json`,
+- And a confirmation message is displayed.
+**AC-3 — Invalid threshold rejected**
+- Given the user runs `orchestr8 feedback-config set minRating <value>`,
+- When the value is outside the 1.0-5.0 range or is not a number,
+- Then an error message is displayed,
+- And the configuration is not modified.
+**AC-4 — Enable/disable feedback system**
+- Given the user runs `orchestr8 feedback-config set enabled <true|false>`,
+- When the command executes,
+- Then the `enabled` flag is updated,
+- And when disabled, feedback collection and quality gates are skipped.
+**AC-5 — Configuration file created on first set**
+- Given `.claude/feedback-config.json` does not exist,
+- When the user runs a `feedback-config set` command,
+- Then the file is created with default values plus the specified override.
+**AC-6 — Configuration file is gitignored**
+- Given a project is initialised with orchestr8,
+- When feedback configuration is created,
+- Then `.claude/feedback-config.json` is included in gitignore patterns.
+---
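
AC-2 through AC-5 describe validate-then-write behaviour for `feedback-config set`. The sketch below is a hypothetical illustration and rests on several assumptions the story does not confirm:

- The CLI key `minRating` maps to the stored field `minRatingThreshold`.
- The 1.0-5.0 range is inclusive.
- The helpers `parseConfigValue` and `setFeedbackConfig` receive the project directory explicitly.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Defaults from AC-1; custom issue-to-strategy mappings are omitted in this sketch.
const DEFAULT_CONFIG = { minRatingThreshold: 3.0, enabled: true };

/** Hypothetical: validates a CLI value and returns [storedField, parsedValue]; throws if invalid. */
function parseConfigValue(key, raw) {
  const text = String(raw).trim();
  if (key === "minRating") {
    const value = Number(text);
    if (text === "" || !Number.isFinite(value) || value < 1.0 || value > 5.0) {
      throw new Error(`minRating must be a number between 1.0 and 5.0 (got "${raw}")`);
    }
    return ["minRatingThreshold", value]; // assumed CLI-key-to-field mapping
  }
  if (key === "enabled") {
    if (text !== "true" && text !== "false") {
      throw new Error(`enabled must be true or false (got "${raw}")`);
    }
    return ["enabled", text === "true"];
  }
  throw new Error(`Unknown feedback-config key: ${key}`);
}

/** Hypothetical: validates first, then creates or updates .claude/feedback-config.json. */
function setFeedbackConfig(projectDir, key, raw) {
  const [field, value] = parseConfigValue(key, raw); // throws before any file is touched
  const file = path.join(projectDir, ".claude", "feedback-config.json");
  const current = fs.existsSync(file) ? JSON.parse(fs.readFileSync(file, "utf8")) : {};
  const updated = { ...DEFAULT_CONFIG, ...current, [field]: value };
  fs.mkdirSync(path.dirname(file), { recursive: true });
  fs.writeFileSync(file, JSON.stringify(updated, null, 2) + "\n");
  return updated;
}
```

Parsing happens before any file access, so a rejected value leaves an existing configuration untouched (AC-3). Merging over the defaults covers first-time file creation (AC-5).
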
+## Out of Scope
+- Per-agent threshold configuration (single global threshold for MVP)
+- Custom issue code definition via CLI
+- Configuration import/export

package/.blueprint/features/feature_feedback-loop/story-feedback-insights.md ADDED

@@ -0,0 +1,63 @@
+# Story — Feedback Insights
+## User Story
+As a **developer**, I want **correlation analysis between feedback scores and pipeline outcomes** so that **I can understand how predictive agent feedback is and tune thresholds accordingly**.
+---
+## Context / Scope
+- Per FEATURE_SPEC.md:Section 7, extends `src/insights.js` with feedback analysis functions
+- Per FEATURE_SPEC.md:Section 6 (Rule 4), calculates agent calibration as correlation between ratings and outcomes
+- Depends on feedback being stored in history (story-feedback-collection.md)
+- Per FEATURE_SPEC.md:Section 9, requires 10+ completed runs for meaningful results
+---
+## Acceptance Criteria
+**AC-1 — Feedback analysis command**
+- Given the user runs `orchestr8 insights --feedback`,
+- When sufficient history exists (10+ completed runs with feedback),
+- Then a feedback analysis report is displayed.
+**AC-2 — Agent calibration scoring**
+- Given the feedback analysis runs,
+- When calibration is calculated per agent,
+- Then each agent receives a calibration score (0.0-1.0):
+  - 0.0 = feedback uncorrelated with outcomes
+  - 1.0 = perfect predictor of success/failure
+- And the score is displayed in the format "Cass calibration: 0.72".
+**AC-3 — Issue pattern correlation**
+- Given feedback history contains issue codes,
+- When the analysis runs,
+- Then issue codes are correlated with failure outcomes,
+- And frequently predictive issues are highlighted (e.g., "`unclear-scope` preceded 80% of failures").
+**AC-4 — Threshold recommendation**
+- Given sufficient calibration data exists,
+- When the analysis runs,
+- Then a recommended `minRatingThreshold` is suggested based on historical data,
+- And the recommendation balances false positives (unnecessary pauses) and false negatives (missed quality issues).
+**AC-5 — Insufficient data handling**
+- Given the user runs `orchestr8 insights --feedback`,
+- When fewer than 10 completed runs with feedback exist,
+- Then a message is displayed: "Insufficient data for feedback analysis. {N}/10 runs with feedback available."
+**AC-6 — Retry strategy mapping**
+- Given feedback analysis identifies predictive issue patterns,
+- When the user views insights,
+- Then issue-to-strategy mappings are displayed (per FEATURE_SPEC.md:Rule 3),
+- And the user can see which retry strategies are recommended for common issues.
+---
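
One way to compute the AC-2 calibration score and the AC-3 issue correlation is sketched below in JavaScript. It makes these assumptions:

- Calibration is the Pearson (point-biserial) correlation between an agent's ratings and a 0/1 success outcome.
- Negative correlations are clamped to 0, so the score stays within 0.0-1.0.
- The run shape `{ feedback: { rating, issues }, succeeded }` and all function names are hypothetical.

```javascript
/** Pearson correlation of two equal-length arrays; 0 when either has no variance. */
function pearson(xs, ys) {
  const n = xs.length;
  const mean = (a) => a.reduce((s, v) => s + v, 0) / n;
  const mx = mean(xs);
  const my = mean(ys);
  let cov = 0, vx = 0, vy = 0;
  for (let i = 0; i < n; i++) {
    const dx = xs[i] - mx;
    const dy = ys[i] - my;
    cov += dx * dy;
    vx += dx * dx;
    vy += dy * dy;
  }
  return vx === 0 || vy === 0 ? 0 : cov / Math.sqrt(vx * vy);
}

/** Calibration in [0, 1]: rating-vs-success correlation, negatives clamped to 0 (assumption). */
function calibration(runs) {
  const ratings = runs.map((r) => r.feedback.rating);
  const outcomes = runs.map((r) => (r.succeeded ? 1 : 0));
  return Math.max(0, pearson(ratings, outcomes));
}

/** For each issue code, the share of failed runs whose feedback listed it, highest first. */
function issueFailureShares(runs) {
  const failures = runs.filter((r) => !r.succeeded);
  const counts = new Map();
  for (const run of failures) {
    for (const code of new Set(run.feedback.issues)) {
      counts.set(code, (counts.get(code) || 0) + 1);
    }
  }
  return [...counts]
    .map(([code, count]) => ({ code, share: count / failures.length }))
    .sort((a, b) => b.share - a.share);
}

/** AC-5 guard plus the AC-2 display format for one agent's runs. */
function calibrationLine(agent, runs, minRuns = 10) {
  if (runs.length < minRuns) {
    return `Insufficient data for feedback analysis. ${runs.length}/${minRuns} runs with feedback available.`;
  }
  const label = agent.charAt(0).toUpperCase() + agent.slice(1);
  return `${label} calibration: ${calibration(runs).toFixed(2)}`;
}
```

A share of 0.8 for `unclear-scope` from `issueFailureShares` would render as "`unclear-scope` preceded 80% of failures".
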
+## Out of Scope
+- Cross-pipeline feedback aggregation (each project is independent)
+- Real-time calibration updates during pipeline execution
+- Natural language interpretation of feedback patterns
+- Automatic threshold adjustment (user must run `feedback-config set`)
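
AC-4 leaves the balancing rule open. One simple option, shown here as a hypothetical sketch rather than orchestr8's method, works in three steps:

- Replay the quality-gate rule (a run pauses when `rating < threshold`) over historical runs.
- Count false positives plus false negatives for each candidate threshold.
- Pick the candidate with the lowest total.

The flat run shape `{ rating, succeeded }` and the 0.5-step candidate list are assumptions.

```javascript
/**
 * Hypothetical: choose the minRatingThreshold that minimises
 * false positives (unnecessary pauses) + false negatives (missed failures).
 */
function recommendThreshold(runs, candidates = [1, 1.5, 2, 2.5, 3, 3.5, 4, 4.5, 5]) {
  let best = null;
  for (const threshold of candidates) {
    let falsePositives = 0; // gate would pause, but the run succeeded
    let falseNegatives = 0; // gate would not pause, but the run failed
    for (const run of runs) {
      const wouldPause = run.rating < threshold;
      if (wouldPause && run.succeeded) falsePositives++;
      if (!wouldPause && !run.succeeded) falseNegatives++;
    }
    const errors = falsePositives + falseNegatives;
    // Strict < keeps the earliest (lowest) candidate on ties, i.e. the one that pauses least.
    if (best === null || errors < best.errors) {
      best = { threshold, falsePositives, falseNegatives, errors };
    }
  }
  return best;
}
```

Weighting the two error types equally is itself a design choice. A project that finds missed quality issues costlier than unnecessary pauses could weight false negatives more heavily.
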

package/.blueprint/features/feature_feedback-loop/story-quality-gates.md ADDED

@@ -0,0 +1,57 @@
+# Story — Quality Gates
+## User Story
+As a **developer**, I want **the pipeline to pause when feedback indicates quality concerns** so that **I can review and address issues before proceeding with flawed inputs**.
+---
+## Context / Scope
+- Per FEATURE_SPEC.md:Section 4 (Alternative: Quality Gate Triggers Pause), pipeline pauses when rating < threshold or recommendation is "pause"
+- Default threshold is 3.0 (per FEATURE_SPEC.md:Section 4)
+- Depends on feedback collection (story-feedback-collection.md)
+- Per SYSTEM_SPEC.md:Section 8, failure handling already supports pause/review — quality gates extend this
+---
+## Acceptance Criteria
+**AC-1 — Quality gate evaluation**
+- Given feedback is collected from an agent,
+- When the orchestrator evaluates the feedback,
+- Then `shouldPause` is true if:
+  - `rating < minRatingThreshold`, OR
+  - `recommendation === "pause"`
+**AC-2 — Pipeline pauses on quality gate trigger**
+- Given `shouldPause` evaluates to true,
+- When the quality gate is triggered,
+- Then the pipeline pauses before the current agent begins its main work,
+- And the user is prompted with: "Quality gate triggered. {Agent} rated previous stage {rating}/5. Issues: {issues}. (review/proceed/abort)"
+**AC-3 — User can proceed past quality gate**
+- Given the pipeline is paused at a quality gate,
+- When the user chooses "proceed",
+- Then the pipeline continues with the current agent's main work,
+- And the decision is recorded in history.
+**AC-4 — User can abort at quality gate**
+- Given the pipeline is paused at a quality gate,
+- When the user chooses "abort",
+- Then the pipeline stops,
+- And the feature is moved to the failed list with reason "quality_gate_abort".
+**AC-5 — User can review at quality gate**
+- Given the pipeline is paused at a quality gate,
+- When the user chooses "review",
+- Then the pipeline remains paused,
+- And the user can examine upstream artifacts before deciding to proceed or abort.
+---
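
The gate rule in AC-1 and the user choices in AC-2 to AC-5 reduce to a few small functions. This JavaScript sketch is illustrative only. The following are assumptions:

- the function names and the `action` values;
- the "none" placeholder shown when the issue list is empty;
- treating unrecognised answers as "review".

Only the `quality_gate_abort` reason comes from the story.

```javascript
/** AC-1: pause when the rating is below the threshold or the agent recommends "pause". */
function shouldPause(feedback, minRatingThreshold = 3.0) {
  return feedback.rating < minRatingThreshold || feedback.recommendation === "pause";
}

/** AC-2 prompt text; `agent` is the display name of the agent that gave the feedback. */
function gatePrompt(agent, feedback) {
  const issues = feedback.issues.length > 0 ? feedback.issues.join(", ") : "none";
  return `Quality gate triggered. ${agent} rated previous stage ${feedback.rating}/5. Issues: ${issues}. (review/proceed/abort)`;
}

/** AC-3 to AC-5: map the user's answer to the next pipeline action (hypothetical shape). */
function resolveGate(answer) {
  switch (String(answer).trim().toLowerCase()) {
    case "proceed":
      return { action: "continue", decision: "proceed" }; // decision to be recorded in history (AC-3)
    case "abort":
      return { action: "fail", reason: "quality_gate_abort" }; // reason from AC-4
    default:
      // "review" and unrecognised input keep the pipeline paused (AC-5)
      return { action: "stay_paused" };
  }
}
```
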
+## Out of Scope
+- Automatic remediation or revision of upstream artifacts
+- Multiple threshold levels per agent (single global threshold for MVP)
+- Bypassing quality gates without explicit user action

package/.blueprint/features/feature_pipeline-history/FEATURE_SPEC.md ADDED

@@ -0,0 +1,239 @@
+# Feature Specification — Pipeline History
+## 1. Feature Intent
+**Why this feature exists.**
+- **Problem being addressed:** Currently, orchestr8 provides no visibility into historical pipeline executions. Users cannot see which features have been processed, how long each stage took, or identify patterns in failures.
+- **User need:** Developers want to understand pipeline performance over time, identify bottlenecks, and diagnose recurring failures. This supports continuous improvement of the feature development process.
+- **System purpose alignment:** Per SYSTEM_SPEC.md:Section 8 (Cross-Cutting Concerns:Observability), the system aims for observability via queue status and agent summaries. This feature extends observability to historical data, enabling retrospective analysis.
+> This feature reinforces the system's observability goals without altering core pipeline behaviour.
+---
+## 2. Scope
+### In Scope
+- Recording execution metrics for each pipeline run (start/end times, duration per stage, success/failure)
+- Persisting history to a JSON file (`.claude/pipeline-history.json`)
+- New CLI command `orchestr8 history` with subcommands and flags
+- Display of recent runs, aggregate statistics, and failure analysis
+- Clearing history via CLI
+### Out of Scope
+- Real-time monitoring or streaming metrics
+- Integration with external monitoring systems (Prometheus, Grafana, etc.)
+- Exporting history to formats other than JSON
+- History synchronisation across machines or repositories
+- Detailed error logs or stack traces (only stage-level failure status)
+---
+## 3. Actors Involved
+### Human User
+- **Can do:** View pipeline history via CLI; view aggregate statistics; clear history
+- **Cannot do:** Modify individual history entries; replay failed pipelines from history (out of scope)
+### Pipeline Orchestrator (internal component)
+- **Can do:** Record execution metrics at stage boundaries; write to history file
+- **Cannot do:** Alter past entries; delete selective entries
+---
+## 4. Behaviour Overview
+### Happy-path behaviour
+1. User invokes `/implement-feature "slug"` and pipeline executes normally
+2. At each stage transition (Alex, Cass, Nigel, Codey-plan, Codey-implement), timestamps are recorded
+3. On pipeline completion (success or failure), a history entry is written to `.claude/pipeline-history.json`
+4. User runs `orchestr8 history` to view recent executions and statistics
+### Key alternatives or branches
+- **Pipeline paused:** If `--pause-after` is used, history entry is recorded up to the pause point with status `paused`
+- **Pipeline failure:** If a stage fails, history entry records failure stage and status `failed`
+- **No history file:** On first write, the file is created containing an empty array
+- **History clear:** User runs `orchestr8 history clear` to remove all entries
+### User-visible outcomes
+- List of recent pipeline runs with timing and status
+- Success rate percentage across all runs
+- Average duration per stage
+- Most common failure stage (if any failures exist)
+---
+## 5. State & Lifecycle Interactions
+### States entered
+- **history_recording:** When pipeline starts, a pending history entry is created in memory
+- **history_persisted:** When pipeline completes/fails/pauses, entry is written to file
+### States modified
+- Queue state (`.claude/implement-queue.json`) is extended with `startedAt` timestamp for each stage (if not already present)
+### This feature is:
+- **State-creating:** Creates new history entries per pipeline run
+- **Not state-transitioning:** Does not alter pipeline flow
+- **Not state-constraining:** Does not block pipeline operations
+---
+## 6. Rules & Decision Logic
+### Rule: History Entry Creation
+- **Description:** A history entry is created when a pipeline run completes (success), fails, or pauses
+- **Inputs:** Feature slug, stage timestamps, final status
+- **Outputs:** JSON object appended to history array
+- **Deterministic:** Yes
+### Rule: Duration Calculation
+- **Description:** Duration per stage is calculated as the difference between the stage's start and the next stage's start (or the completion time, for the final stage)
+- **Inputs:** Stage timestamps
+- **Outputs:** Duration in milliseconds for each stage
+- **Deterministic:** Yes
+### Rule: Statistics Aggregation
+- **Description:** Statistics computed on-read from full history file
+- **Inputs:** All history entries
+- **Outputs:** Success rate, average durations, failure frequency by stage
+- **Deterministic:** Yes
+### Rule: Display Limit Default
+- **Description:** By default, show last 10 runs; `--all` shows unlimited
+- **Inputs:** Flag presence, history array length
+- **Outputs:** Truncated or full list
+- **Deterministic:** Yes
+---
+## 7. Dependencies
+### System components
+- `src/orchestrator.js` — Must emit events or expose hooks for recording stage transitions
+- `bin/cli.js` — Must register new `history` command
+- `.claude/implement-queue.json` — May be read for timing data during pipeline execution
+### External systems
+- None
+### Operational dependencies
+- File system access to `.claude/` directory
+- Permissions to write `.claude/pipeline-history.json`
+---
+## 8. Non-Functional Considerations
+### Performance sensitivity
+- History file read/write should be efficient; consider file size growth over time
+- ASSUMPTION: Most users will have <100 runs; O(n) aggregation is acceptable
+### Audit/logging needs
+- History file serves as an audit log of pipeline executions
+- Timestamps must use ISO 8601 format for consistency
+### Error tolerance
+- If history file is corrupted or unreadable, CLI should warn and allow continuation (no blocking)
+- History recording failure should not abort pipeline execution
+### Security implications
+- Feature slugs may contain project information; history file should be gitignored
+- No sensitive data (credentials, secrets) should appear in history entries
+---
+## 9. Assumptions & Open Questions
+### Assumptions
+- ASSUMPTION: History file will grow at a manageable rate (tens to hundreds of entries)
+- ASSUMPTION: Stage names are stable (alex, cass, nigel, codey-plan, codey-implement)
+- ASSUMPTION: ISO 8601 timestamps are sufficient for duration calculations
+### Open Questions
+- Should history entries include partial stage data for paused pipelines?
+- Should `--stats` include median duration alongside average?
+- Should there be a `history export` subcommand for CI integration (deferred)?
+---
+## 10. Impact on System Specification
+### Alignment assessment
+This feature **reinforces existing system assumptions**:
+- Per SYSTEM_SPEC.md:Section 8 (Observability), the system already tracks queue status and completion summaries
+- This feature extends observability to historical analysis without changing core behaviour
+- The queue file structure (`.claude/implement-queue.json`) already captures timestamps; this feature adds persistence beyond current run
+### No contradictions identified
+The feature does not alter:
+- Agent roles or boundaries
+- Pipeline flow or stage order
+- Artifact structures or handoff mechanisms
+### Minor extension to system spec
+The following addition to SYSTEM_SPEC.md:Section 5 (Core Domain Concepts) may be warranted:
+> **History Entry** — A record of a completed pipeline run, including slug, timestamps per stage, duration, and final status. Persisted to `.claude/pipeline-history.json`.
+This is flagged as a **non-breaking extension** for consideration.
+---
+## 11. Handover to BA (Cass)
+### Story themes
+1. **History recording** — Capturing execution data during pipeline runs
+2. **History display** — CLI command for viewing recent runs
+3. **Statistics display** — Aggregate metrics via `--stats` flag
+4. **History management** — Clearing history via `history clear`
+### Expected story boundaries
+- Recording logic should be a separate story from display logic
+- Statistics computation may be combined with display or separated
+- Clear functionality is a distinct, small story
+### Areas needing careful story framing
+- The interaction between `--pause-after` and history recording needs precise acceptance criteria
+- Error handling when history file is corrupted should be explicit
+- The "most common failure stage" calculation needs clear definition when there are ties
+---
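
Section 11 asks for a clear definition of "most common failure stage" when counts tie. The sketch below (JavaScript, hypothetical `computeStats`) returns every tied stage in alphabetical order. This is one possible answer, not a decided rule. The sketch also assumes:

- paused runs count toward the success-rate denominator;
- average durations are taken over only the entries that recorded each stage;
- field names follow the implementation plan's data model (`stages[stage].durationMs`, `failedStage`).

```javascript
/** Hypothetical aggregate statistics computed on read from all history entries. */
function computeStats(entries) {
  const total = entries.length;
  const successes = entries.filter((e) => e.status === "success").length;

  // Average durationMs per stage over entries that recorded that stage.
  const sums = {};
  for (const entry of entries) {
    for (const [stage, data] of Object.entries(entry.stages || {})) {
      if (typeof data.durationMs !== "number") continue;
      sums[stage] = sums[stage] || { sum: 0, count: 0 };
      sums[stage].sum += data.durationMs;
      sums[stage].count += 1;
    }
  }
  const avgDurationMs = {};
  for (const [stage, { sum, count }] of Object.entries(sums)) {
    avgDurationMs[stage] = sum / count;
  }

  // Failure counts by stage; every stage sharing the top count is reported.
  const failures = {};
  for (const entry of entries) {
    if (entry.status === "failed" && entry.failedStage) {
      failures[entry.failedStage] = (failures[entry.failedStage] || 0) + 1;
    }
  }
  const top = Math.max(0, ...Object.values(failures));
  const mostCommonFailureStages = top === 0
    ? []
    : Object.keys(failures).filter((s) => failures[s] === top).sort();

  return {
    successRate: total === 0 ? 0 : (successes / total) * 100, // percentage
    avgDurationMs,
    mostCommonFailureStages,
  };
}
```
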
+## 12. Change Log (Feature-Level)
+| Date       | Change                                | Reason                       | Raised By |
+|------------|---------------------------------------|------------------------------|-----------|
+| 2026-02-24 | Initial feature specification created | Feature request from user    | Alex      |

package/.blueprint/features/feature_pipeline-history/IMPLEMENTATION_PLAN.md ADDED

@@ -0,0 +1,71 @@
+# Implementation Plan - Pipeline History
+## Summary
+Implement pipeline history tracking by creating a new `src/history.js` module that records execution metrics during pipeline runs and provides CLI commands for viewing/managing history. The module will integrate with the existing orchestrator to capture stage timestamps and persist entries to `.claude/pipeline-history.json`. CLI routing in `bin/cli.js` will be extended with a new `history` command supporting subcommands and flags.
+---
+## Files to Create/Modify
+| Path | Action | Purpose |
+|------|--------|---------|
+| `src/history.js` | Create | Core history module: `recordHistory()`, `displayHistory()`, `showStats()`, `clearHistory()` |
+| `bin/cli.js` | Modify | Add `history` command routing with `--all`, `--stats`, `--force` flags and `clear` subcommand |
+| `src/orchestrator.js` | Modify | Add stage timestamp tracking; call `recordHistory()` on pipeline completion/failure/pause |
+---
+## Implementation Steps
+1. **Create `src/history.js` with file I/O helpers** - `readHistoryFile()`, `writeHistoryFile()`, `ensureHistoryFile()` handling missing/corrupted files gracefully.
+2. **Implement `recordHistory(entry)` function** - Accepts history entry object, appends to history array, writes to `.claude/pipeline-history.json`. Wrap in try/catch to log warning on failure without throwing.
+3. **Implement `displayHistory(options)` function** - Read history, sort by `completedAt` descending, slice to 10 entries (unless `--all`), format tabular output with color-coded status.
+4. **Implement `showStats()` function** - Compute success rate, average duration per stage, total average for successful runs, most common failure stage (handling ties).
+5. **Implement `clearHistory(options)` function** - Show confirmation prompt (unless `--force`), reset file to empty array on confirm, display count of removed entries.
+6. **Add CLI routing in `bin/cli.js`** - Register `history` command; parse `--all`, `--stats`, `--force` flags; handle `clear` subcommand.
+7. **Modify `src/orchestrator.js` to track stage timestamps** - Update `setCurrent()` to record `startedAt`; add `completeStage()` helper to record `completedAt` and compute `durationMs`.
+8. **Add `recordPipelineCompletion()` to orchestrator** - Called on success/failure/pause; builds history entry from accumulated stage data and calls `recordHistory()`.
+9. **Add `.claude/pipeline-history.json` to `.gitignore`** - Update `src/init.js` to append this pattern during initialization.
+10. **Run tests and verify all T-* test IDs pass** - Execute `node --test test/feature_pipeline-history.test.js`.
+---
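
Steps 1 and 2 can be sketched as follows. This is a JavaScript illustration, not the package's `src/history.js`. The extra `projectDir` and `warn` parameters (added for testability) and the `historyPath` helper are assumptions. The plan's `recordHistory(entry)` takes only the entry.

```javascript
const fs = require("node:fs");
const path = require("node:path");

/** Hypothetical helper: default history location named in the spec. */
function historyPath(projectDir) {
  return path.join(projectDir, ".claude", "pipeline-history.json");
}

/** Missing file -> []; corrupted or non-array JSON -> warning, then []. */
function readHistoryFile(projectDir, warn = console.warn) {
  const file = historyPath(projectDir);
  if (!fs.existsSync(file)) return [];
  try {
    const parsed = JSON.parse(fs.readFileSync(file, "utf8"));
    if (Array.isArray(parsed)) return parsed;
    warn(`${file} does not contain an array; treating history as empty.`);
  } catch (err) {
    warn(`Could not read ${file} (${err.message}); treating history as empty.`);
  }
  return [];
}

/** Appends one entry; logs a warning and returns false instead of throwing on failure. */
function recordHistory(projectDir, entry, warn = console.warn) {
  try {
    const history = readHistoryFile(projectDir, warn);
    history.push(entry);
    const file = historyPath(projectDir);
    fs.mkdirSync(path.dirname(file), { recursive: true });
    fs.writeFileSync(file, JSON.stringify(history, null, 2) + "\n");
    return true;
  } catch (err) {
    warn(`Failed to record pipeline history: ${err.message}`);
    return false;
  }
}
```

One caveat applies to this sketch. Appending to a corrupted file overwrites it with the new entry alone, so a real implementation might back up the unreadable file before writing.
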
+## Data Model
+```json
+{
+  "slug": "feature-name",
+  "status": "success | failed | paused",
+  "startedAt": "2026-02-24T10:00:00.000Z",
+  "completedAt": "2026-02-24T10:15:00.000Z",
+  "totalDurationMs": 900000,
+  "stages": {
+    "alex": { "startedAt": "...", "completedAt": "...", "durationMs": 120000 },
+    "cass": { "startedAt": "...", "completedAt": "...", "durationMs": 90000 },
+    "nigel": { "startedAt": "...", "completedAt": "...", "durationMs": 180000 },
+    "codey-plan": { "startedAt": "...", "completedAt": "...", "durationMs": 75000 },
+    "codey-implement": { "startedAt": "...", "completedAt": "...", "durationMs": 255000 }
+  },
+  "failedStage": null,
+  "pausedAfter": null
+}
+```
+---
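
Steps 7 and 8 accumulate per-stage timestamps and then emit an object in the data-model shape above. A hypothetical in-memory tracker might look like the JavaScript below. The method names `setCurrent` and `completeStage` come from step 7. `createTracker` and the injectable clock `now` are assumptions.

```javascript
/**
 * Hypothetical tracker: setCurrent() stamps startedAt, completeStage() stamps
 * completedAt and durationMs, buildHistoryEntry() returns a Data Model object.
 * `now` is injectable so timings can be tested deterministically.
 */
function createTracker(slug, now = () => new Date()) {
  const startedAt = now().toISOString();
  const stages = {};

  return {
    setCurrent(stage) {
      stages[stage] = { startedAt: now().toISOString() };
    },
    completeStage(stage) {
      const s = stages[stage]; // assumes setCurrent(stage) was called first
      s.completedAt = now().toISOString();
      s.durationMs = Date.parse(s.completedAt) - Date.parse(s.startedAt);
    },
    buildHistoryEntry(status, { failedStage = null, pausedAfter = null } = {}) {
      const completedAt = now().toISOString();
      return {
        slug,
        status, // "success" | "failed" | "paused"
        startedAt,
        completedAt,
        totalDurationMs: Date.parse(completedAt) - Date.parse(startedAt),
        stages: { ...stages },
        failedStage,
        pausedAfter,
      };
    },
  };
}
```

Here `durationMs` is `completedAt - startedAt` per stage, as in step 7. The FEATURE_SPEC rule instead measures a stage up to the next stage's start. Under this sketch, gaps between stages are therefore not attributed to any stage.
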
+## Risks/Questions
+- **Confirmation prompt testing**: Tests will need to mock stdin for `clearHistory()` confirmation; consider using `readline` interface that can be injected for testing.
+- **Color output detection**: Use `process.stdout.isTTY` to determine if colors should be applied; provide fallback for non-TTY environments.
+- **Orchestrator integration point**: The `/implement-feature` skill (SKILL.md) runs via Task tool sub-agents; recording hooks must be called from the skill's completion handler, not just from `src/orchestrator.js`. Verify integration path.
+- **File corruption recovery**: Per AC-4, corrupted history file should not block CLI; implement robust JSON parsing with fallback to empty array after warning.

package/.blueprint/features/feature_pipeline-history/story-clear-history.md ADDED

@@ -0,0 +1,73 @@
+# Story — Clear Pipeline History
+## User story
+As a developer using orchestr8, I want to clear the pipeline history so that I can reset metrics or remove stale data.
+---
+## Context / scope
+- New CLI subcommand: `orchestr8 history clear`
+- Removes all entries from `.claude/pipeline-history.json`
+- Per FEATURE_SPEC.md:Section 3 (Actors), users can clear history but cannot modify individual entries
+- Destructive action requiring confirmation
+---
+## Acceptance criteria
+**AC-1 — Clear with confirmation**
+- Given `.claude/pipeline-history.json` contains history entries,
+- When I run `orchestr8 history clear`,
+- Then I see a confirmation prompt: "This will delete all 25 history entries. Continue? (y/N)"
+- And I must type 'y' or 'yes' to proceed.
+**AC-2 — Clear executes on confirmation**
+- Given I confirm the clear action,
+- When the command completes,
+- Then `.claude/pipeline-history.json` is reset to an empty array `[]`,
+- And I see "Pipeline history cleared. 25 entries removed."
+**AC-3 — Clear cancelled on decline**
+- Given I decline the confirmation (type 'n', 'no', or press Enter),
+- When the command completes,
+- Then `.claude/pipeline-history.json` remains unchanged,
+- And I see "Clear cancelled. History unchanged."
+**AC-4 — Force clear without confirmation**
+- Given `.claude/pipeline-history.json` contains history entries,
+- When I run `orchestr8 history clear --force`,
+- Then the history is cleared without a confirmation prompt,
+- And I see "Pipeline history cleared. 25 entries removed."
+**AC-5 — Clear empty history**
+- Given `.claude/pipeline-history.json` is empty or does not exist,
+- When I run `orchestr8 history clear`,
+- Then I see "No history to clear."
+- And the command exits with code 0.
+---
+## CLI interaction
+```
+$ orchestr8 history clear
+This will delete all 25 history entries. Continue? (y/N) y
+Pipeline history cleared. 25 entries removed.
+$ orchestr8 history clear --force
+Pipeline history cleared. 25 entries removed.
+$ orchestr8 history clear
+No history to clear.
+```
+---
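
The clear flow in AC-1 to AC-5 is sketched below in JavaScript. Following the risk note in the implementation plan, the confirmation prompt is an injected async `ask` function, so tests need not mock stdin. The function shape and the choice to treat an unreadable file as empty are assumptions.

```javascript
const fs = require("node:fs");
const path = require("node:path");

/**
 * Hypothetical `orchestr8 history clear` handler. Returns the message to print;
 * `ask(question)` must resolve to the user's typed answer.
 */
async function clearHistory(projectDir, { force = false, ask } = {}) {
  const file = path.join(projectDir, ".claude", "pipeline-history.json");
  let entries = [];
  if (fs.existsSync(file)) {
    try {
      const parsed = JSON.parse(fs.readFileSync(file, "utf8"));
      if (Array.isArray(parsed)) entries = parsed;
    } catch {
      entries = []; // assumption: unreadable history counts as empty
    }
  }
  if (entries.length === 0) return "No history to clear."; // AC-5

  if (!force) {
    const answer = await ask(`This will delete all ${entries.length} history entries. Continue? (y/N) `);
    // Only 'y' or 'yes' confirm; Enter, 'n', 'no' or anything else declines (AC-3).
    if (!["y", "yes"].includes(String(answer).trim().toLowerCase())) {
      return "Clear cancelled. History unchanged.";
    }
  }
  fs.writeFileSync(file, "[]\n"); // AC-2: reset to an empty array
  return `Pipeline history cleared. ${entries.length} entries removed.`;
}
```

In a CLI, `ask` could wrap `question()` from Node's `readline/promises` module (Node 17 and later). The caller then prints the returned message and exits with code 0.
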
+## Out of scope
+- Clearing individual entries or filtered subsets
+- Archiving history before clearing
+- Undo/restore functionality
+- Automatic cleanup based on age or count