orchestr8 2.5.0 → 2.6.1

Files changed (54)
  1. package/.blueprint/agents/AGENT_BA_CASS.md +42 -19
  2. package/.blueprint/agents/AGENT_DEVELOPER_CODEY.md +42 -38
  3. package/.blueprint/agents/AGENT_SPECIFICATION_ALEX.md +45 -0
  4. package/.blueprint/agents/AGENT_TESTER_NIGEL.md +42 -21
  5. package/.blueprint/features/feature_adaptive-retry/FEATURE_SPEC.md +239 -0
  6. package/.blueprint/features/feature_adaptive-retry/IMPLEMENTATION_PLAN.md +48 -0
  7. package/.blueprint/features/feature_adaptive-retry/story-prompt-modification.md +85 -0
  8. package/.blueprint/features/feature_adaptive-retry/story-retry-config.md +89 -0
  9. package/.blueprint/features/feature_adaptive-retry/story-should-retry.md +98 -0
  10. package/.blueprint/features/feature_adaptive-retry/story-strategy-recommendation.md +85 -0
  11. package/.blueprint/features/feature_agent-guardrails/FEATURE_SPEC.md +328 -0
  12. package/.blueprint/features/feature_agent-guardrails/IMPLEMENTATION_PLAN.md +90 -0
  13. package/.blueprint/features/feature_agent-guardrails/story-citation-requirements.md +50 -0
  14. package/.blueprint/features/feature_agent-guardrails/story-confidentiality.md +50 -0
  15. package/.blueprint/features/feature_agent-guardrails/story-escalation-protocol.md +55 -0
  16. package/.blueprint/features/feature_agent-guardrails/story-source-restrictions.md +50 -0
  17. package/.blueprint/features/feature_feedback-loop/FEATURE_SPEC.md +347 -0
  18. package/.blueprint/features/feature_feedback-loop/IMPLEMENTATION_PLAN.md +71 -0
  19. package/.blueprint/features/feature_feedback-loop/story-feedback-collection.md +63 -0
  20. package/.blueprint/features/feature_feedback-loop/story-feedback-config.md +61 -0
  21. package/.blueprint/features/feature_feedback-loop/story-feedback-insights.md +63 -0
  22. package/.blueprint/features/feature_feedback-loop/story-quality-gates.md +57 -0
  23. package/.blueprint/features/feature_pipeline-history/FEATURE_SPEC.md +239 -0
  24. package/.blueprint/features/feature_pipeline-history/IMPLEMENTATION_PLAN.md +71 -0
  25. package/.blueprint/features/feature_pipeline-history/story-clear-history.md +73 -0
  26. package/.blueprint/features/feature_pipeline-history/story-display-history.md +75 -0
  27. package/.blueprint/features/feature_pipeline-history/story-record-execution.md +76 -0
  28. package/.blueprint/features/feature_pipeline-history/story-show-statistics.md +85 -0
  29. package/.blueprint/features/feature_pipeline-insights/FEATURE_SPEC.md +288 -0
  30. package/.blueprint/features/feature_pipeline-insights/IMPLEMENTATION_PLAN.md +65 -0
  31. package/.blueprint/features/feature_pipeline-insights/story-anomaly-detection.md +71 -0
  32. package/.blueprint/features/feature_pipeline-insights/story-bottleneck-analysis.md +75 -0
  33. package/.blueprint/features/feature_pipeline-insights/story-failure-patterns.md +75 -0
  34. package/.blueprint/features/feature_pipeline-insights/story-json-output.md +75 -0
  35. package/.blueprint/features/feature_pipeline-insights/story-trend-analysis.md +78 -0
  36. package/.blueprint/features/feature_validate-command/FEATURE_SPEC.md +209 -0
  37. package/.blueprint/features/feature_validate-command/IMPLEMENTATION_PLAN.md +59 -0
  38. package/.blueprint/features/feature_validate-command/story-failure-output.md +61 -0
  39. package/.blueprint/features/feature_validate-command/story-node-version-check.md +52 -0
  40. package/.blueprint/features/feature_validate-command/story-run-validation.md +59 -0
  41. package/.blueprint/features/feature_validate-command/story-success-output.md +50 -0
  42. package/.blueprint/system_specification/SYSTEM_SPEC.md +248 -0
  43. package/README.md +182 -38
  44. package/SKILL.md +333 -23
  45. package/bin/cli.js +128 -20
  46. package/package.json +2 -2
  47. package/src/feedback.js +171 -0
  48. package/src/history.js +306 -0
  49. package/src/index.js +57 -2
  50. package/src/init.js +2 -6
  51. package/src/insights.js +504 -0
  52. package/src/retry.js +274 -0
  53. package/src/validate.js +172 -0
  54. package/src/skills.js +0 -93
@@ -0,0 +1,63 @@
1
+ # Story — Feedback Collection
2
+
3
+ ## User Story
4
+
5
+ As a **pipeline orchestrator**, I want **downstream agents to provide structured feedback on upstream artifacts** so that **quality issues are surfaced explicitly at each stage boundary**.
6
+
7
+ ---
8
+
9
+ ## Context / Scope
10
+
11
+ - Per FEATURE_SPEC.md:Section 4, feedback is collected at each stage boundary (Cass on Alex, Nigel on Cass, Codey on Nigel)
12
+ - Feedback uses a defined schema with rating, confidence, issues, and recommendation
13
+ - Feedback is captured before the downstream agent begins its main work
14
+ - Per SYSTEM_SPEC.md:Section 7, agents must "flag deviations" — this story operationalises that principle
15
+
16
+ ---
17
+
18
+ ## Acceptance Criteria
19
+
20
+ **AC-1 — Feedback schema structure**
21
+ - Given an agent is spawned to provide feedback,
22
+ - When the agent completes feedback output,
23
+ - Then the feedback object contains:
+   - `about`: agent name being assessed (alex|cass|nigel)
+   - `rating`: integer 1-5
+   - `confidence`: float 0.0-1.0
+   - `issues`: array of issue codes (may be empty)
+   - `recommendation`: one of "proceed", "pause", or "revise"
29
+
30
+ **AC-2 — Cass provides feedback on Alex**
31
+ - Given Alex has completed a feature specification,
32
+ - When Cass is spawned for story writing,
33
+ - Then Cass first produces a feedback object with `about: "alex"` assessing the feature spec quality.
34
+
35
+ **AC-3 — Nigel provides feedback on Cass**
36
+ - Given Cass has completed user stories,
37
+ - When Nigel is spawned for test writing,
38
+ - Then Nigel first produces a feedback object with `about: "cass"` assessing story quality and testability.
39
+
40
+ **AC-4 — Codey provides feedback on Nigel**
41
+ - Given Nigel has completed test specifications,
42
+ - When Codey is spawned for planning/implementation,
43
+ - Then Codey first produces a feedback object with `about: "nigel"` assessing test coverage and implementation feasibility.
44
+
45
+ **AC-5 — Feedback validation**
46
+ - Given an agent produces a feedback object,
47
+ - When the orchestrator reads the feedback,
48
+ - Then the feedback is validated against the schema,
49
+ - And invalid feedback triggers a warning but does not block the pipeline (per FEATURE_SPEC.md:Section 8, degraded mode).
50
+
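A minimal sketch of how the AC-1 schema and the AC-5 degraded mode could be checked, assuming issue codes are strings. The function names `validateFeedback` and `readFeedback` are hypothetical and are not taken from `src/feedback.js`.

```javascript
// Allowed values, as listed in AC-1.
const AGENTS = ["alex", "cass", "nigel"];
const RECOMMENDATIONS = ["proceed", "pause", "revise"];

// Returns { valid, errors } instead of throwing, so the caller decides what to do.
function validateFeedback(feedback) {
  if (feedback === null || typeof feedback !== "object") {
    return { valid: false, errors: ["feedback must be an object"] };
  }
  const errors = [];
  if (!AGENTS.includes(feedback.about)) {
    errors.push("about must be one of alex|cass|nigel");
  }
  if (!Number.isInteger(feedback.rating) || feedback.rating < 1 || feedback.rating > 5) {
    errors.push("rating must be an integer 1-5");
  }
  if (
    typeof feedback.confidence !== "number" ||
    Number.isNaN(feedback.confidence) ||
    feedback.confidence < 0 ||
    feedback.confidence > 1
  ) {
    errors.push("confidence must be a number 0.0-1.0");
  }
  // Assumption: issue codes are plain strings.
  if (!Array.isArray(feedback.issues) || !feedback.issues.every((i) => typeof i === "string")) {
    errors.push("issues must be an array of issue codes");
  }
  if (!RECOMMENDATIONS.includes(feedback.recommendation)) {
    errors.push('recommendation must be "proceed", "pause", or "revise"');
  }
  return { valid: errors.length === 0, errors };
}

// Degraded mode (AC-5): invalid feedback produces a warning and is dropped,
// but the pipeline is never blocked.
function readFeedback(feedback, warn = console.warn) {
  const result = validateFeedback(feedback);
  if (!result.valid) {
    warn(`Invalid feedback ignored: ${result.errors.join("; ")}`);
    return null;
  }
  return feedback;
}
```

Returning `null` for invalid feedback is one possible design; the story only requires that a warning is raised and the pipeline continues.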
51
+ **AC-6 — Feedback persisted to history**
52
+ - Given feedback is collected from an agent,
53
+ - When the stage completes,
54
+ - Then the feedback is stored in the history entry at `stages[stage].feedback`.
55
+
56
+ ---
57
+
58
+ ## Out of Scope
59
+
60
+ - Feedback from Alex (no prior stage to assess)
61
+ - Feedback on auto-commit stage
62
+ - Natural language feedback parsing (structured schema only)
63
+ - Automatic remediation based on feedback
@@ -0,0 +1,61 @@
1
+ # Story — Feedback Configuration
2
+
3
+ ## User Story
4
+
5
+ As a **developer**, I want **CLI commands to view and modify feedback thresholds** so that **I can tune quality gate sensitivity based on my project's needs**.
6
+
7
+ ---
8
+
9
+ ## Context / Scope
10
+
11
+ - Per FEATURE_SPEC.md:Section 7, configuration is stored in `.claude/feedback-config.json`
12
+ - Per FEATURE_SPEC.md:Section 7, new CLI commands: `orchestr8 feedback-config` and `orchestr8 feedback-config set <key> <value>`
13
+ - Parallel track to quality gates — configuration can be set independently
14
+
15
+ ---
16
+
17
+ ## Acceptance Criteria
18
+
19
+ **AC-1 — View feedback configuration**
20
+ - Given the user runs `orchestr8 feedback-config`,
21
+ - When the command executes,
22
+ - Then the current configuration is displayed including:
+   - `minRatingThreshold` (default: 3.0)
+   - `enabled` (default: true)
+   - Any custom issue-to-strategy mappings
26
+
27
+ **AC-2 — Set threshold value**
28
+ - Given the user runs `orchestr8 feedback-config set minRating <value>`,
29
+ - When the value is a number between 1.0 and 5.0,
30
+ - Then the threshold is updated in `.claude/feedback-config.json`,
31
+ - And a confirmation message is displayed.
32
+
33
+ **AC-3 — Invalid threshold rejected**
34
+ - Given the user runs `orchestr8 feedback-config set minRating <value>`,
35
+ - When the value is outside 1.0-5.0 range or not a number,
36
+ - Then an error message is displayed,
37
+ - And the configuration is not modified.
38
+
39
+ **AC-4 — Enable/disable feedback system**
40
+ - Given the user runs `orchestr8 feedback-config set enabled <true|false>`,
41
+ - When the command executes,
42
+ - Then the `enabled` flag is updated,
43
+ - And when disabled, feedback collection and quality gates are skipped.
44
+
45
+ **AC-5 — Configuration file created on first set**
46
+ - Given `.claude/feedback-config.json` does not exist,
47
+ - When the user runs a `feedback-config set` command,
48
+ - Then the file is created with default values plus the specified override.
49
+
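The `set` behaviour in AC-2 through AC-5 can be sketched as a pure function that takes the parsed file content (or `null` when the file is missing) and returns the next configuration. This is illustrative only: `applyConfigSet` is a hypothetical name, and the mapping from the CLI key `minRating` to the stored `minRatingThreshold` field is an assumption, since the story uses both names.

```javascript
// Defaults from AC-1.
const DEFAULT_CONFIG = { minRatingThreshold: 3.0, enabled: true };

// existing: parsed feedback-config.json, or null if the file does not exist.
// rawValue: the value exactly as typed on the command line.
function applyConfigSet(existing, key, rawValue) {
  // AC-5: start from defaults so a first `set` creates a complete file.
  const config = { ...DEFAULT_CONFIG, ...(existing || {}) };
  const text = String(rawValue).trim();

  if (key === "minRating") {
    const value = Number(text);
    // AC-3: reject non-numbers and values outside 1.0-5.0; leave config untouched.
    if (text === "" || !Number.isFinite(value) || value < 1.0 || value > 5.0) {
      return { ok: false, error: "minRating must be a number between 1.0 and 5.0", config: existing };
    }
    // Assumption: the CLI key `minRating` is stored as `minRatingThreshold`.
    return { ok: true, config: { ...config, minRatingThreshold: value } };
  }

  if (key === "enabled") {
    if (text !== "true" && text !== "false") {
      return { ok: false, error: "enabled must be true or false", config: existing };
    }
    return { ok: true, config: { ...config, enabled: text === "true" } };
  }

  return { ok: false, error: `Unknown configuration key: ${key}`, config: existing };
}
```

Keeping the function free of file I/O means the caller writes `.claude/feedback-config.json` only when `ok` is true, which matches the requirement that a rejected value leaves the configuration unmodified.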
50
+ **AC-6 — Configuration file is gitignored**
51
+ - Given a project is initialised with orchestr8,
52
+ - When feedback configuration is created,
53
+ - Then `.claude/feedback-config.json` is included in gitignore patterns.
54
+
55
+ ---
56
+
57
+ ## Out of Scope
58
+
59
+ - Per-agent threshold configuration (single global threshold for MVP)
60
+ - Custom issue code definition via CLI
61
+ - Configuration import/export
@@ -0,0 +1,63 @@
1
+ # Story — Feedback Insights
2
+
3
+ ## User Story
4
+
5
+ As a **developer**, I want **correlation analysis between feedback scores and pipeline outcomes** so that **I can understand how predictive agent feedback is and tune thresholds accordingly**.
6
+
7
+ ---
8
+
9
+ ## Context / Scope
10
+
11
+ - Per FEATURE_SPEC.md:Section 7, extends `src/insights.js` with feedback analysis functions
12
+ - Per FEATURE_SPEC.md:Section 6 (Rule 4), calculates agent calibration as correlation between ratings and outcomes
13
+ - Depends on feedback being stored in history (story-feedback-collection.md)
14
+ - Per FEATURE_SPEC.md:Section 9, requires 10+ completed runs for meaningful results
15
+
16
+ ---
17
+
18
+ ## Acceptance Criteria
19
+
20
+ **AC-1 — Feedback analysis command**
21
+ - Given the user runs `orchestr8 insights --feedback`,
22
+ - When sufficient history exists (10+ completed runs with feedback),
23
+ - Then a feedback analysis report is displayed.
24
+
25
+ **AC-2 — Agent calibration scoring**
26
+ - Given the feedback analysis runs,
27
+ - When calibration is calculated per agent,
28
+ - Then each agent receives a calibration score (0.0-1.0):
+   - 0.0 = feedback uncorrelated with outcomes
+   - 1.0 = perfect predictor of success/failure
31
+ - And the score is displayed as "Cass calibration: 0.72" format.
32
+
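The story does not define the calibration formula. One possible reading, sketched below under that assumption, is the Pearson correlation between an agent's ratings and binary outcomes (1 for success, 0 for failure), clamped to the 0.0-1.0 range so that an inverse correlation scores 0. The names `calibrationScore` and `formatCalibration` are hypothetical.

```javascript
// samples: [{ rating: 1-5, succeeded: boolean }] for a single agent.
function calibrationScore(samples) {
  const n = samples.length;
  if (n < 2) return 0; // correlation is undefined for fewer than two points

  const xs = samples.map((s) => s.rating);
  const ys = samples.map((s) => (s.succeeded ? 1 : 0));
  const mean = (values) => values.reduce((total, v) => total + v, 0) / values.length;
  const meanX = mean(xs);
  const meanY = mean(ys);

  let covariance = 0;
  let varianceX = 0;
  let varianceY = 0;
  for (let i = 0; i < n; i++) {
    const dx = xs[i] - meanX;
    const dy = ys[i] - meanY;
    covariance += dx * dy;
    varianceX += dx * dx;
    varianceY += dy * dy;
  }

  // No variation in ratings or outcomes: treat as uncorrelated.
  if (varianceX === 0 || varianceY === 0) return 0;

  const r = covariance / Math.sqrt(varianceX * varianceY);
  // Assumption: negative correlation is reported as 0.0 rather than as a negative score.
  return Math.max(0, Math.min(1, r));
}

// Produces the "Cass calibration: 0.72" display format from AC-2.
function formatCalibration(agent, score) {
  const name = agent.charAt(0).toUpperCase() + agent.slice(1);
  return `${name} calibration: ${score.toFixed(2)}`;
}
```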
33
+ **AC-3 — Issue pattern correlation**
34
+ - Given feedback history contains issue codes,
35
+ - When the analysis runs,
36
+ - Then issue codes are correlated with failure outcomes,
37
+ - And frequently predictive issues are highlighted (e.g., "`unclear-scope` preceded 80% of failures").
38
+
39
+ **AC-4 — Threshold recommendation**
40
+ - Given sufficient calibration data exists,
41
+ - When the analysis runs,
42
+ - Then a recommended `minRatingThreshold` is suggested based on historical data,
43
+ - And the recommendation balances false positives (unnecessary pauses) and false negatives (missed quality issues).
44
+
45
+ **AC-5 — Insufficient data handling**
46
+ - Given the user runs `orchestr8 insights --feedback`,
47
+ - When fewer than 10 completed runs with feedback exist,
48
+ - Then a message is displayed: "Insufficient data for feedback analysis. {N}/10 runs with feedback available."
49
+
50
+ **AC-6 — Retry strategy mapping**
51
+ - Given feedback analysis identifies predictive issue patterns,
52
+ - When the user views insights,
53
+ - Then issue-to-strategy mappings are displayed (per FEATURE_SPEC.md:Rule 3),
54
+ - And the user can see which retry strategies are recommended for common issues.
55
+
56
+ ---
57
+
58
+ ## Out of Scope
59
+
60
+ - Cross-pipeline feedback aggregation (each project is independent)
61
+ - Real-time calibration updates during pipeline execution
62
+ - Natural language interpretation of feedback patterns
63
+ - Automatic threshold adjustment (user must run `feedback-config set`)
@@ -0,0 +1,57 @@
1
+ # Story — Quality Gates
2
+
3
+ ## User Story
4
+
5
+ As a **developer**, I want **the pipeline to pause when feedback indicates quality concerns** so that **I can review and address issues before proceeding with flawed inputs**.
6
+
7
+ ---
8
+
9
+ ## Context / Scope
10
+
11
+ - Per FEATURE_SPEC.md:Section 4 (Alternative: Quality Gate Triggers Pause), pipeline pauses when rating < threshold or recommendation is "pause"
12
+ - Default threshold is 3.0 (per FEATURE_SPEC.md:Section 4)
13
+ - Depends on feedback collection (story-feedback-collection.md)
14
+ - Per SYSTEM_SPEC.md:Section 8, failure handling already supports pause/review — quality gates extend this
15
+
16
+ ---
17
+
18
+ ## Acceptance Criteria
19
+
20
+ **AC-1 — Quality gate evaluation**
21
+ - Given feedback is collected from an agent,
22
+ - When the orchestrator evaluates the feedback,
23
+ - Then `shouldPause` is true if:
+   - `rating < minRatingThreshold`, OR
+   - `recommendation === "pause"`
26
+
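The AC-1 rule is small enough to sketch directly. The version below is an illustration, not the package's code: `shouldPause` is named in the story, but its signature here, the `gatePrompt` helper, and the "none" placeholder for an empty issue list are assumptions.

```javascript
// config mirrors the defaults described in the feedback configuration story.
function shouldPause(feedback, config = { minRatingThreshold: 3.0, enabled: true }) {
  // When the feedback system is disabled, quality gates are skipped.
  if (!config.enabled) return false;
  return feedback.rating < config.minRatingThreshold || feedback.recommendation === "pause";
}

// Builds the AC-2 prompt text.
function gatePrompt(agent, feedback) {
  // Assumption: an empty issue list is shown as "none".
  const issues = feedback.issues.length > 0 ? feedback.issues.join(", ") : "none";
  return (
    `Quality gate triggered. ${agent} rated previous stage ${feedback.rating}/5. ` +
    `Issues: ${issues}. (review/proceed/abort)`
  );
}
```

With the default threshold of 3.0, a rating of exactly 3 does not pause the pipeline because the comparison is strict. As written in AC-1, a "revise" recommendation on its own does not trigger the gate either.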
27
+ **AC-2 — Pipeline pauses on quality gate trigger**
28
+ - Given `shouldPause` evaluates to true,
29
+ - When the quality gate is triggered,
30
+ - Then the pipeline pauses before the current agent begins its main work,
31
+ - And the user is prompted with: "Quality gate triggered. {Agent} rated previous stage {rating}/5. Issues: {issues}. (review/proceed/abort)"
32
+
33
+ **AC-3 — User can proceed past quality gate**
34
+ - Given the pipeline is paused at a quality gate,
35
+ - When the user chooses "proceed",
36
+ - Then the pipeline continues with the current agent's main work,
37
+ - And the decision is recorded in history.
38
+
39
+ **AC-4 — User can abort at quality gate**
40
+ - Given the pipeline is paused at a quality gate,
41
+ - When the user chooses "abort",
42
+ - Then the pipeline stops,
43
+ - And the feature is moved to the failed list with reason "quality_gate_abort".
44
+
45
+ **AC-5 — User can review at quality gate**
46
+ - Given the pipeline is paused at a quality gate,
47
+ - When the user chooses "review",
48
+ - Then the pipeline remains paused,
49
+ - And the user can examine upstream artifacts before deciding to proceed or abort.
50
+
51
+ ---
52
+
53
+ ## Out of Scope
54
+
55
+ - Automatic remediation or revision of upstream artifacts
56
+ - Multiple threshold levels per agent (single global threshold for MVP)
57
+ - Bypassing quality gates without explicit user action
@@ -0,0 +1,239 @@
1
+ # Feature Specification — Pipeline History
2
+
3
+ ## 1. Feature Intent
4
+
5
+ **Why this feature exists.**
6
+
7
+ - **Problem being addressed:** Currently, orchestr8 provides no visibility into historical pipeline executions. Users cannot see which features have been processed, how long each stage took, or identify patterns in failures.
8
+ - **User need:** Developers want to understand pipeline performance over time, identify bottlenecks, and diagnose recurring failures. This supports continuous improvement of the feature development process.
9
+ - **System purpose alignment:** Per SYSTEM_SPEC.md:Section 8 (Cross-Cutting Concerns:Observability), the system aims for observability via queue status and agent summaries. This feature extends observability to historical data, enabling retrospective analysis.
10
+
11
+ > This feature reinforces the system's observability goals without altering core pipeline behaviour.
12
+
13
+ ---
14
+
15
+ ## 2. Scope
16
+
17
+ ### In Scope
18
+
19
+ - Recording execution metrics for each pipeline run (start/end times, duration per stage, success/failure)
20
+ - Persisting history to a JSON file (`.claude/pipeline-history.json`)
21
+ - New CLI command `orchestr8 history` with subcommands and flags
22
+ - Display of recent runs, aggregate statistics, and failure analysis
23
+ - Clearing history via CLI
24
+
25
+ ### Out of Scope
26
+
27
+ - Real-time monitoring or streaming metrics
28
+ - Integration with external monitoring systems (Prometheus, Grafana, etc.)
29
+ - Exporting history to formats other than JSON
30
+ - History synchronisation across machines or repositories
31
+ - Detailed error logs or stack traces (only stage-level failure status)
32
+
33
+ ---
34
+
35
+ ## 3. Actors Involved
36
+
37
+ ### Human User
38
+
39
+ - **Can do:** View pipeline history via CLI; view aggregate statistics; clear history
40
+ - **Cannot do:** Modify individual history entries; replay failed pipelines from history (out of scope)
41
+
42
+ ### Pipeline Orchestrator (internal component)
43
+
44
+ - **Can do:** Record execution metrics at stage boundaries; write to history file
45
+ - **Cannot do:** Alter past entries; delete selective entries
46
+
47
+ ---
48
+
49
+ ## 4. Behaviour Overview
50
+
51
+ ### Happy-path behaviour
52
+
53
+ 1. User invokes `/implement-feature "slug"` and pipeline executes normally
54
+ 2. At each stage transition (Alex, Cass, Nigel, Codey-plan, Codey-implement), timestamps are recorded
55
+ 3. On pipeline completion (success or failure), a history entry is written to `.claude/pipeline-history.json`
56
+ 4. User runs `orchestr8 history` to view recent executions and statistics
57
+
58
+ ### Key alternatives or branches
59
+
60
+ - **Pipeline paused:** If `--pause-after` is used, history entry is recorded up to the pause point with status `paused`
61
+ - **Pipeline failure:** If a stage fails, history entry records failure stage and status `failed`
62
+ - **No history file:** On first write, file is created with empty array structure
63
+ - **History clear:** User runs `orchestr8 history clear` to remove all entries
64
+
65
+ ### User-visible outcomes
66
+
67
+ - List of recent pipeline runs with timing and status
68
+ - Success rate percentage across all runs
69
+ - Average duration per stage
70
+ - Most common failure stage (if any failures exist)
71
+
72
+ ---
73
+
74
+ ## 5. State & Lifecycle Interactions
75
+
76
+ ### States entered
77
+
78
+ - **history_recording:** When pipeline starts, a pending history entry is created in memory
79
+ - **history_persisted:** When pipeline completes/fails/pauses, entry is written to file
80
+
81
+ ### States modified
82
+
83
+ - Queue state (`.claude/implement-queue.json`) is extended with `startedAt` timestamp for each stage (if not already present)
84
+
85
+ ### This feature is:
86
+
87
+ - **State-creating:** Creates new history entries per pipeline run
88
+ - **Not state-transitioning:** Does not alter pipeline flow
89
+ - **Not state-constraining:** Does not block pipeline operations
90
+
91
+ ---
92
+
93
+ ## 6. Rules & Decision Logic
94
+
95
+ ### Rule: History Entry Creation
96
+
97
+ - **Description:** A history entry is created when a pipeline run completes (success), fails, or pauses
98
+ - **Inputs:** Feature slug, stage timestamps, final status
99
+ - **Outputs:** JSON object appended to history array
100
+ - **Deterministic:** Yes
101
+
102
+ ### Rule: Duration Calculation
103
+
104
+ - **Description:** Duration per stage calculated as difference between stage start and next stage start (or completion time for final stage)
105
+ - **Inputs:** Stage timestamps
106
+ - **Outputs:** Duration in milliseconds for each stage
107
+ - **Deterministic:** Yes
108
+
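A minimal sketch of the duration rule, assuming stage start times arrive as ISO 8601 strings keyed by stage name. `stageDurations` and `STAGE_ORDER` are hypothetical names; the stage list comes from the assumptions in Section 9.

```javascript
const STAGE_ORDER = ["alex", "cass", "nigel", "codey-plan", "codey-implement"];

// startedAt: { [stage]: ISO 8601 string }; pipelineEnd: ISO 8601 string.
// Returns { [stage]: durationMs } for every stage that actually started.
function stageDurations(startedAt, pipelineEnd) {
  const started = STAGE_ORDER.filter((stage) => startedAt[stage] !== undefined);
  const durations = {};
  started.forEach((stage, index) => {
    const start = Date.parse(startedAt[stage]);
    // A stage ends when the next one starts; the final stage ends at pipeline end.
    const end =
      index + 1 < started.length
        ? Date.parse(startedAt[started[index + 1]])
        : Date.parse(pipelineEnd);
    durations[stage] = end - start;
  });
  return durations;
}
```

Because only started stages are considered, a failed or paused run yields durations up to the last stage reached, with that stage closed by the pipeline end time.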
109
+ ### Rule: Statistics Aggregation
110
+
111
+ - **Description:** Statistics computed on-read from full history file
112
+ - **Inputs:** All history entries
113
+ - **Outputs:** Success rate, average durations, failure frequency by stage
114
+ - **Deterministic:** Yes
115
+
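The aggregation rule could look like the sketch below, which computes everything in a single O(n) pass per metric. Two points are assumptions because the specification leaves them open: the success rate uses all entries (including paused runs) as the denominator, and ties for the most common failure stage are reported as a sorted list of all tied stages. `computeStats` is a hypothetical name.

```javascript
// entries: history entries shaped like the data model (status, stages, failedStage).
function computeStats(entries) {
  const total = entries.length;
  const successes = entries.filter((e) => e.status === "success").length;
  // Assumption: paused and failed runs both count against the success rate.
  const successRate = total === 0 ? 0 : (successes / total) * 100;

  // Average duration per stage, over entries that recorded that stage.
  const sums = {};
  const counts = {};
  for (const entry of entries) {
    for (const [stage, data] of Object.entries(entry.stages || {})) {
      if (typeof data.durationMs !== "number") continue;
      sums[stage] = (sums[stage] || 0) + data.durationMs;
      counts[stage] = (counts[stage] || 0) + 1;
    }
  }
  const avgDurationMs = {};
  for (const stage of Object.keys(sums)) {
    avgDurationMs[stage] = sums[stage] / counts[stage];
  }

  // Failure frequency by stage.
  const failuresByStage = {};
  for (const entry of entries) {
    if (entry.status === "failed" && entry.failedStage) {
      failuresByStage[entry.failedStage] = (failuresByStage[entry.failedStage] || 0) + 1;
    }
  }
  const highest = Math.max(0, ...Object.values(failuresByStage));
  // Assumption: on a tie, every tied stage is returned, sorted by name.
  const mostCommonFailureStages = Object.keys(failuresByStage)
    .filter((stage) => failuresByStage[stage] === highest)
    .sort();

  return { total, successRate, avgDurationMs, failuresByStage, mostCommonFailureStages };
}
```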
116
+ ### Rule: Display Limit Default
117
+
118
+ - **Description:** By default, show last 10 runs; `--all` shows unlimited
119
+ - **Inputs:** Flag presence, history array length
120
+ - **Outputs:** Truncated or full list
121
+ - **Deterministic:** Yes
122
+
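A short sketch of the display limit, assuming runs are ordered by `completedAt` with the most recent first. `selectRuns` and its `limit` option are hypothetical; the rule itself only names the default of 10 and the `--all` flag.

```javascript
// entries: history entries with an ISO 8601 `completedAt` field.
function selectRuns(entries, { all = false, limit = 10 } = {}) {
  // Copy before sorting so the caller's array is not mutated.
  const newestFirst = [...entries].sort(
    (a, b) => Date.parse(b.completedAt) - Date.parse(a.completedAt)
  );
  return all ? newestFirst : newestFirst.slice(0, limit);
}
```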
123
+ ---
124
+
125
+ ## 7. Dependencies
126
+
127
+ ### System components
128
+
129
+ - `src/orchestrator.js` — Must emit events or expose hooks for recording stage transitions
130
+ - `bin/cli.js` — Must register new `history` command
131
+ - `.claude/implement-queue.json` — May be read for timing data during pipeline execution
132
+
133
+ ### External systems
134
+
135
+ - None
136
+
137
+ ### Operational dependencies
138
+
139
+ - File system access to `.claude/` directory
140
+ - Permissions to write `.claude/pipeline-history.json`
141
+
142
+ ---
143
+
144
+ ## 8. Non-Functional Considerations
145
+
146
+ ### Performance sensitivity
147
+
148
+ - History file read/write should be efficient; consider file size growth over time
149
+ - ASSUMPTION: Most users will have <100 runs; O(n) aggregation is acceptable
150
+
151
+ ### Audit/logging needs
152
+
153
+ - History file serves as an audit log of pipeline executions
154
+ - Timestamps must be ISO 8601 format for consistency
155
+
156
+ ### Error tolerance
157
+
158
+ - If history file is corrupted or unreadable, CLI should warn and allow continuation (no blocking)
159
+ - History recording failure should not abort pipeline execution
160
+
161
+ ### Security implications
162
+
163
+ - Feature slugs may contain project information; history file should be gitignored
164
+ - No sensitive data (credentials, secrets) should appear in history entries
165
+
166
+ ---
167
+
168
+ ## 9. Assumptions & Open Questions
169
+
170
+ ### Assumptions
171
+
172
+ - ASSUMPTION: History file will grow at a manageable rate (tens to hundreds of entries)
173
+ - ASSUMPTION: Stage names are stable (alex, cass, nigel, codey-plan, codey-implement)
174
+ - ASSUMPTION: ISO 8601 timestamps are sufficient for duration calculations
175
+
176
+ ### Open Questions
177
+
178
+ - Should history entries include partial stage data for paused pipelines?
179
+ - Should `--stats` include median duration alongside average?
180
+ - Should there be a `history export` subcommand for CI integration (deferred)?
181
+
182
+ ---
183
+
184
+ ## 10. Impact on System Specification
185
+
186
+ ### Alignment assessment
187
+
188
+ This feature **reinforces existing system assumptions**:
189
+
190
+ - Per SYSTEM_SPEC.md:Section 8 (Observability), the system already tracks queue status and completion summaries
191
+ - This feature extends observability to historical analysis without changing core behaviour
192
+ - The queue file structure (`.claude/implement-queue.json`) already captures timestamps; this feature adds persistence beyond current run
193
+
194
+ ### No contradictions identified
195
+
196
+ The feature does not alter:
197
+
198
+ - Agent roles or boundaries
199
+ - Pipeline flow or stage order
200
+ - Artifact structures or handoff mechanisms
201
+
202
+ ### Minor extension to system spec
203
+
204
+ The following addition to SYSTEM_SPEC.md:Section 5 (Core Domain Concepts) may be warranted:
205
+
206
+ > **History Entry** — A record of a completed pipeline run, including slug, timestamps per stage, duration, and final status. Persisted to `.claude/pipeline-history.json`.
207
+
208
+ This is flagged as a **non-breaking extension** for consideration.
209
+
210
+ ---
211
+
212
+ ## 11. Handover to BA (Cass)
213
+
214
+ ### Story themes
215
+
216
+ 1. **History recording** — Capturing execution data during pipeline runs
217
+ 2. **History display** — CLI command for viewing recent runs
218
+ 3. **Statistics display** — Aggregate metrics via `--stats` flag
219
+ 4. **History management** — Clearing history via `history clear`
220
+
221
+ ### Expected story boundaries
222
+
223
+ - Recording logic should be a separate story from display logic
224
+ - Statistics computation may be combined with display or separated
225
+ - Clear functionality is a distinct, small story
226
+
227
+ ### Areas needing careful story framing
228
+
229
+ - The interaction between `--pause-after` and history recording needs precise acceptance criteria
230
+ - Error handling when history file is corrupted should be explicit
231
+ - The "most common failure stage" calculation needs clear definition when there are ties
232
+
233
+ ---
234
+
235
+ ## 12. Change Log (Feature-Level)
236
+
+ | Date       | Change                                | Reason                       | Raised By |
+ |------------|---------------------------------------|------------------------------|-----------|
+ | 2026-02-24 | Initial feature specification created | Feature request from user    | Alex      |
@@ -0,0 +1,71 @@
1
+ # Implementation Plan - Pipeline History
2
+
3
+ ## Summary
4
+
5
+ Implement pipeline history tracking by creating a new `src/history.js` module that records execution metrics during pipeline runs and provides CLI commands for viewing/managing history. The module will integrate with the existing orchestrator to capture stage timestamps and persist entries to `.claude/pipeline-history.json`. CLI routing in `bin/cli.js` will be extended with a new `history` command supporting subcommands and flags.
6
+
7
+ ---
8
+
9
+ ## Files to Create/Modify
10
+
+ | Path | Action | Purpose |
+ |------|--------|---------|
+ | `src/history.js` | Create | Core history module: `recordHistory()`, `displayHistory()`, `showStats()`, `clearHistory()` |
+ | `bin/cli.js` | Modify | Add `history` command routing with `--all`, `--stats`, `--force` flags and `clear` subcommand |
+ | `src/orchestrator.js` | Modify | Add stage timestamp tracking; call `recordHistory()` on pipeline completion/failure/pause |
16
+
17
+ ---
18
+
19
+ ## Implementation Steps
20
+
21
+ 1. **Create `src/history.js` with file I/O helpers** - `readHistoryFile()`, `writeHistoryFile()`, `ensureHistoryFile()` handling missing/corrupted files gracefully.
22
+
23
+ 2. **Implement `recordHistory(entry)` function** - Accepts history entry object, appends to history array, writes to `.claude/pipeline-history.json`. Wrap in try/catch to log warning on failure without throwing.
24
+
25
+ 3. **Implement `displayHistory(options)` function** - Read history, sort by `completedAt` descending, slice to 10 entries (unless `--all`), format tabular output with color-coded status.
26
+
27
+ 4. **Implement `showStats()` function** - Compute success rate, average duration per stage, total average for successful runs, most common failure stage (handling ties).
28
+
29
+ 5. **Implement `clearHistory(options)` function** - Show confirmation prompt (unless `--force`), reset file to empty array on confirm, display count of removed entries.
30
+
31
+ 6. **Add CLI routing in `bin/cli.js`** - Register `history` command; parse `--all`, `--stats`, `--force` flags; handle `clear` subcommand.
32
+
33
+ 7. **Modify `src/orchestrator.js` to track stage timestamps** - Update `setCurrent()` to record `startedAt`; add `completeStage()` helper to record `completedAt` and compute `durationMs`.
34
+
35
+ 8. **Add `recordPipelineCompletion()` to orchestrator** - Called on success/failure/pause; builds history entry from accumulated stage data and calls `recordHistory()`.
36
+
37
+ 9. **Add `.claude/pipeline-history.json` to `.gitignore`** - Update `src/init.js` to append this pattern during initialization.
38
+
39
+ 10. **Run tests and verify all T-* test IDs pass** - Execute `node --test test/feature_pipeline-history.test.js`.
40
+
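Steps 1 and 2 hinge on reading a history file that may be missing or corrupted without ever throwing. A minimal sketch of that parsing logic follows, kept free of file I/O so it can be tested in isolation. `parseHistory` and `appendEntry` are hypothetical names, not the `readHistoryFile()` and `recordHistory()` functions the plan describes, and the warning texts are illustrative.

```javascript
// raw: file contents as a string, or null/undefined when the file does not exist.
// Always returns an array of entries plus an optional warning; never throws.
function parseHistory(raw) {
  if (raw === null || raw === undefined) {
    return { entries: [], warning: null }; // missing file is a normal first-run case
  }
  try {
    const parsed = JSON.parse(raw);
    if (!Array.isArray(parsed)) {
      return { entries: [], warning: "History file is not a JSON array; treating as empty." };
    }
    return { entries: parsed, warning: null };
  } catch (err) {
    return {
      entries: [],
      warning: `History file is corrupted (${err.message}); treating as empty.`,
    };
  }
}

// Returns the JSON text to write back after appending one entry.
function appendEntry(raw, entry) {
  const { entries, warning } = parseHistory(raw);
  return { json: JSON.stringify([...entries, entry], null, 2), warning };
}
```

Note that writing the result of `appendEntry` over a corrupted file discards the unreadable content. Whether to back the file up first is a design decision the plan does not address.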
41
+ ---
42
+
43
+ ## Data Model
44
+
+ ```json
+ {
+   "slug": "feature-name",
+   "status": "success | failed | paused",
+   "startedAt": "2026-02-24T10:00:00.000Z",
+   "completedAt": "2026-02-24T10:15:00.000Z",
+   "totalDurationMs": 900000,
+   "stages": {
+     "alex": { "startedAt": "...", "completedAt": "...", "durationMs": 120000 },
+     "cass": { "startedAt": "...", "completedAt": "...", "durationMs": 90000 },
+     "nigel": { "startedAt": "...", "completedAt": "...", "durationMs": 180000 },
+     "codey-plan": { "startedAt": "...", "completedAt": "...", "durationMs": 75000 },
+     "codey-implement": { "startedAt": "...", "completedAt": "...", "durationMs": 255000 }
+   },
+   "failedStage": null,
+   "pausedAfter": null
+ }
+ ```
63
+
64
+ ---
65
+
66
+ ## Risks/Questions
67
+
68
+ - **Confirmation prompt testing**: Tests will need to mock stdin for `clearHistory()` confirmation; consider using `readline` interface that can be injected for testing.
69
+ - **Color output detection**: Use `process.stdout.isTTY` to determine if colors should be applied; provide fallback for non-TTY environments.
70
+ - **Orchestrator integration point**: The `/implement-feature` skill (SKILL.md) runs via Task tool sub-agents; recording hooks must be called from the skill's completion handler, not just from `src/orchestrator.js`. Verify integration path.
71
+ - **File corruption recovery**: Per AC-4, corrupted history file should not block CLI; implement robust JSON parsing with fallback to empty array after warning.
@@ -0,0 +1,73 @@
1
+ # Story — Clear Pipeline History
2
+
3
+ ## User story
4
+
5
+ As a developer using orchestr8, I want to clear the pipeline history so that I can reset metrics or remove stale data.
6
+
7
+ ---
8
+
9
+ ## Context / scope
10
+
11
+ - New CLI subcommand: `orchestr8 history clear`
12
+ - Removes all entries from `.claude/pipeline-history.json`
13
+ - Per FEATURE_SPEC.md:Section 3 (Actors), users can clear history but cannot modify individual entries
14
+ - Destructive action requiring confirmation
15
+
16
+ ---
17
+
18
+ ## Acceptance criteria
19
+
20
+ **AC-1 — Clear with confirmation**
21
+ - Given `.claude/pipeline-history.json` contains history entries,
22
+ - When I run `orchestr8 history clear`,
23
+ - Then I see a confirmation prompt: "This will delete all 25 history entries. Continue? (y/N)"
24
+ - And I must type 'y' or 'yes' to proceed.
25
+
26
+ **AC-2 — Clear executes on confirmation**
27
+ - Given I confirm the clear action,
28
+ - When the command completes,
29
+ - Then `.claude/pipeline-history.json` is reset to an empty array `[]`,
30
+ - And I see "Pipeline history cleared. 25 entries removed."
31
+
32
+ **AC-3 — Clear cancelled on decline**
33
+ - Given I decline the confirmation (type 'n', 'no', or press Enter),
34
+ - When the command completes,
35
+ - Then `.claude/pipeline-history.json` remains unchanged,
36
+ - And I see "Clear cancelled. History unchanged."
37
+
38
+ **AC-4 — Force clear without confirmation**
39
+ - Given `.claude/pipeline-history.json` contains history entries,
40
+ - When I run `orchestr8 history clear --force`,
41
+ - Then the history is cleared without a confirmation prompt,
42
+ - And I see "Pipeline history cleared. 25 entries removed."
43
+
44
+ **AC-5 — Clear empty history**
45
+ - Given `.claude/pipeline-history.json` is empty or does not exist,
46
+ - When I run `orchestr8 history clear`,
47
+ - Then I see "No history to clear."
48
+ - And the command exits with code 0.
49
+
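The five acceptance criteria reduce to a small decision function. The sketch below is illustrative: `clearHistory` is named in the implementation plan, but this signature, the `isConfirmed` and `confirmationPrompt` helpers, and the case-insensitive matching of the answer are assumptions.

```javascript
// AC-1: only 'y' or 'yes' confirms; anything else (including Enter) declines.
function isConfirmed(answer) {
  // Assumption: the answer is matched case-insensitively after trimming.
  const normalised = answer.trim().toLowerCase();
  return normalised === "y" || normalised === "yes";
}

// Prompt text from AC-1, with the real entry count substituted.
function confirmationPrompt(count) {
  return `This will delete all ${count} history entries. Continue? (y/N)`;
}

// entries: current history array; answer: what the user typed at the prompt.
// Returns the entries to persist and the message to print.
function clearHistory(entries, { force = false, answer = "" } = {}) {
  const count = entries.length;
  if (count === 0) {
    return { entries, message: "No history to clear." }; // AC-5
  }
  if (!force && !isConfirmed(answer)) {
    return { entries, message: "Clear cancelled. History unchanged." }; // AC-3
  }
  // AC-2 and AC-4: reset to an empty array and report how many were removed.
  return { entries: [], message: `Pipeline history cleared. ${count} entries removed.` };
}
```

Passing the answer in as a parameter keeps the decision logic separate from stdin handling, which sidesteps the need to mock a prompt when testing this part.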
50
+ ---
51
+
52
+ ## CLI interaction
53
+
54
+ ```
55
+ $ orchestr8 history clear
56
+ This will delete all 25 history entries. Continue? (y/N) y
57
+ Pipeline history cleared. 25 entries removed.
58
+
59
+ $ orchestr8 history clear --force
60
+ Pipeline history cleared. 25 entries removed.
61
+
62
+ $ orchestr8 history clear
63
+ No history to clear.
64
+ ```
65
+
66
+ ---
67
+
68
+ ## Out of scope
69
+
70
+ - Clearing individual entries or filtered subsets
71
+ - Archiving history before clearing
72
+ - Undo/restore functionality
73
+ - Automatic cleanup based on age or count