orchestr8 2.4.0 → 2.6.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (55)
  1. package/.blueprint/agents/AGENT_BA_CASS.md +50 -25
  2. package/.blueprint/agents/AGENT_DEVELOPER_CODEY.md +60 -69
  3. package/.blueprint/agents/AGENT_SPECIFICATION_ALEX.md +45 -0
  4. package/.blueprint/agents/AGENT_TESTER_NIGEL.md +72 -105
  5. package/.blueprint/features/feature_adaptive-retry/FEATURE_SPEC.md +239 -0
  6. package/.blueprint/features/feature_adaptive-retry/IMPLEMENTATION_PLAN.md +48 -0
  7. package/.blueprint/features/feature_adaptive-retry/story-prompt-modification.md +85 -0
  8. package/.blueprint/features/feature_adaptive-retry/story-retry-config.md +89 -0
  9. package/.blueprint/features/feature_adaptive-retry/story-should-retry.md +98 -0
  10. package/.blueprint/features/feature_adaptive-retry/story-strategy-recommendation.md +85 -0
  11. package/.blueprint/features/feature_agent-guardrails/FEATURE_SPEC.md +328 -0
  12. package/.blueprint/features/feature_agent-guardrails/IMPLEMENTATION_PLAN.md +90 -0
  13. package/.blueprint/features/feature_agent-guardrails/story-citation-requirements.md +50 -0
  14. package/.blueprint/features/feature_agent-guardrails/story-confidentiality.md +50 -0
  15. package/.blueprint/features/feature_agent-guardrails/story-escalation-protocol.md +55 -0
  16. package/.blueprint/features/feature_agent-guardrails/story-source-restrictions.md +50 -0
  17. package/.blueprint/features/feature_feedback-loop/FEATURE_SPEC.md +347 -0
  18. package/.blueprint/features/feature_feedback-loop/IMPLEMENTATION_PLAN.md +71 -0
  19. package/.blueprint/features/feature_feedback-loop/story-feedback-collection.md +63 -0
  20. package/.blueprint/features/feature_feedback-loop/story-feedback-config.md +61 -0
  21. package/.blueprint/features/feature_feedback-loop/story-feedback-insights.md +63 -0
  22. package/.blueprint/features/feature_feedback-loop/story-quality-gates.md +57 -0
  23. package/.blueprint/features/feature_pipeline-history/FEATURE_SPEC.md +239 -0
  24. package/.blueprint/features/feature_pipeline-history/IMPLEMENTATION_PLAN.md +71 -0
  25. package/.blueprint/features/feature_pipeline-history/story-clear-history.md +73 -0
  26. package/.blueprint/features/feature_pipeline-history/story-display-history.md +75 -0
  27. package/.blueprint/features/feature_pipeline-history/story-record-execution.md +76 -0
  28. package/.blueprint/features/feature_pipeline-history/story-show-statistics.md +85 -0
  29. package/.blueprint/features/feature_pipeline-insights/FEATURE_SPEC.md +288 -0
  30. package/.blueprint/features/feature_pipeline-insights/IMPLEMENTATION_PLAN.md +65 -0
  31. package/.blueprint/features/feature_pipeline-insights/story-anomaly-detection.md +71 -0
  32. package/.blueprint/features/feature_pipeline-insights/story-bottleneck-analysis.md +75 -0
  33. package/.blueprint/features/feature_pipeline-insights/story-failure-patterns.md +75 -0
  34. package/.blueprint/features/feature_pipeline-insights/story-json-output.md +75 -0
  35. package/.blueprint/features/feature_pipeline-insights/story-trend-analysis.md +78 -0
  36. package/.blueprint/features/feature_validate-command/FEATURE_SPEC.md +209 -0
  37. package/.blueprint/features/feature_validate-command/IMPLEMENTATION_PLAN.md +59 -0
  38. package/.blueprint/features/feature_validate-command/story-failure-output.md +61 -0
  39. package/.blueprint/features/feature_validate-command/story-node-version-check.md +52 -0
  40. package/.blueprint/features/feature_validate-command/story-run-validation.md +59 -0
  41. package/.blueprint/features/feature_validate-command/story-success-output.md +50 -0
  42. package/.blueprint/system_specification/SYSTEM_SPEC.md +248 -0
  43. package/README.md +174 -40
  44. package/SKILL.md +399 -74
  45. package/bin/cli.js +128 -20
  46. package/package.json +1 -1
  47. package/src/feedback.js +171 -0
  48. package/src/history.js +306 -0
  49. package/src/index.js +57 -2
  50. package/src/init.js +2 -6
  51. package/src/insights.js +504 -0
  52. package/src/retry.js +274 -0
  53. package/src/update.js +10 -2
  54. package/src/validate.js +172 -0
  55. package/src/skills.js +0 -93
@@ -0,0 +1,288 @@
+ # Feature Specification — Pipeline Insights
+
+ ## 1. Feature Intent
+
+ **Why this feature exists.**
+
+ - **Problem being addressed:** The pipeline-history feature captures execution data but provides only basic statistics. Users cannot identify optimization opportunities—such as bottleneck stages, failure patterns, or performance trends—without manual analysis of the raw history data.
+ - **User need:** Developers want actionable recommendations to improve pipeline efficiency. They need to understand which stages are slowest, why failures occur, and whether the pipeline is improving or degrading over time.
+ - **System purpose alignment:** Per SYSTEM_SPEC.md:Section 8 (Cross-Cutting Concerns:Observability), the system aims for observability via queue status and agent summaries. Per SYSTEM_SPEC.md:Section 2 (Business & Domain Context), orchestr8 seeks to provide "structured processes to guide AI-generated code." This feature extends observability into actionable intelligence, enabling users to optimize their development workflow.
+
+ > This feature builds upon the existing pipeline-history feature (`.blueprint/features/feature_pipeline-history/FEATURE_SPEC.md`) without modifying history recording. It is a read-only analysis layer.
+
+ ---
+
+ ## 2. Scope
+
+ ### In Scope
+
+ - New CLI command `orchestr8 insights` that analyzes `.claude/pipeline-history.json`
+ - Bottleneck detection: Identify which stage consistently takes longest
+ - Failure pattern analysis: Determine which stages fail most and correlate with feature characteristics
+ - Anomaly detection: Flag runs that deviate significantly from average durations
+ - Trend analysis: Track whether pipeline performance is improving or degrading over time
+ - Agent performance comparison: Compare stage durations and success rates across agents
+ - Flag support for filtering analysis types (`--bottlenecks`, `--failures`, `--json`)
+ - Human-readable recommendations based on detected patterns
+
+ ### Out of Scope
+
+ - Modifying pipeline-history recording logic (that feature is separate)
+ - Machine learning or complex statistical models (simple heuristics only)
+ - Automatic remediation or pipeline configuration changes
+ - Integration with external analytics platforms
+ - Predictive modelling of future pipeline performance
+ - Feature-type classification (assumes slugs are opaque identifiers)
+
+ ---
+
+ ## 3. Actors Involved
+
+ ### Human User
+
+ - **Can do:** Invoke `orchestr8 insights` to view optimization recommendations; filter by analysis type; export as JSON for programmatic use
+ - **Cannot do:** Modify the analysis thresholds or algorithms; act on recommendations automatically
+
+ ### Insights Analyzer (internal component)
+
+ - **Can do:** Read history file; compute statistics; generate recommendations; output formatted reports
+ - **Cannot do:** Write to history file; modify pipeline configuration; alter agent behaviour
+
+ ---
+
+ ## 4. Behaviour Overview
+
+ ### Happy-path behaviour
+
+ 1. User runs `orchestr8 insights` after accumulating several pipeline runs
+ 2. System reads `.claude/pipeline-history.json` and validates data sufficiency
+ 3. System performs analysis across four dimensions: bottlenecks, failures, anomalies, trends
+ 4. System generates human-readable report with recommendations
+ 5. User reviews recommendations and decides which to act upon
+
+ ### Key alternatives or branches
+
+ - **Insufficient data:** If fewer than 3 runs exist, display message: "Insufficient data for insights. Complete at least 3 pipeline runs."
+ - **No failures:** If all runs succeeded, omit failure analysis section; note "No failures recorded"
+ - **Filtered analysis:** If `--bottlenecks` or `--failures` flag provided, display only that section
+ - **JSON output:** If `--json` flag provided, output structured JSON instead of formatted text
+ - **Corrupted history:** If history file is corrupted, display warning and exit gracefully
+
+ ### User-visible outcomes
+
+ - Identification of the slowest pipeline stage with percentage of total time
+ - List of stages with high failure rates and potential contributing factors
+ - Flagged anomalous runs that deviated significantly from norms
+ - Trend indicators showing improvement or degradation over recent runs
+ - Actionable recommendations for each identified issue
+
+ ---
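The branch handling above reduces to a small guard over the raw file contents. The sketch below is illustrative only: the name `interpretHistory` and the null-for-missing-file convention are assumptions made here, and the real code would obtain the text via the existing reader in `src/history.js`.

```javascript
// Sketch of the data-sufficiency and error branches. The caller passes
// the raw file contents, or null when the file does not exist.
function interpretHistory(rawText) {
  if (rawText === null) {
    return { error: 'No history found. Run some pipelines first.' };
  }
  let entries;
  try {
    entries = JSON.parse(rawText);
  } catch {
    return { error: 'Warning: pipeline history file is corrupted.' };
  }
  if (!Array.isArray(entries) || entries.length < 3) {
    return { error: 'Insufficient data for insights. Complete at least 3 pipeline runs.' };
  }
  return { entries };
}
```

Keeping the file I/O outside the function makes each branch trivially testable without touching the filesystem.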
+
+ ## 5. State & Lifecycle Interactions
+
+ ### States entered
+
+ - None. This feature is stateless and read-only.
+
+ ### States modified
+
+ - None. This feature does not modify any system state.
+
+ ### This feature is:
+
+ - **Not state-creating:** Does not persist analysis results
+ - **Not state-transitioning:** Does not alter pipeline flow
+ - **Not state-constraining:** Does not block any operations
+
+ This is a pure read-only analysis feature that operates on existing history data.
+
+ ---
+
+ ## 6. Rules & Decision Logic
+
+ ### Rule: Bottleneck Detection
+
+ - **Description:** Identify the stage that consumes the largest proportion of total pipeline time
+ - **Inputs:** Stage durations from successful runs
+ - **Outputs:** Stage name, average duration, percentage of total pipeline time
+ - **Algorithm:** Calculate mean duration per stage; identify stage with highest mean; compute as percentage of sum
+ - **Threshold:** Report as bottleneck if stage accounts for >35% of total pipeline time
+ - **Deterministic:** Yes
+
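The bottleneck rule can be sketched as follows. This is a minimal illustration, not the shipped implementation; it assumes the history entry shape `{ slug, status, stages: { [name]: { durationMs } } }` noted in Section 9, and that successful runs carry `status: "completed"`.

```javascript
// Sketch of the bottleneck rule: mean duration per stage across
// successful runs, then the highest mean as a share of the summed means.
function analyzeBottlenecks(history) {
  const successful = history.filter((run) => run.status === 'completed');
  const totals = {}; // stage -> { sum, count }
  for (const run of successful) {
    for (const [stage, data] of Object.entries(run.stages || {})) {
      const t = totals[stage] || (totals[stage] = { sum: 0, count: 0 });
      t.sum += data.durationMs;
      t.count += 1;
    }
  }
  let top = null;
  let grandTotal = 0;
  for (const [stage, { sum, count }] of Object.entries(totals)) {
    const avg = sum / count;
    grandTotal += avg;
    if (!top || avg > top.avgDurationMs) top = { stage, avgDurationMs: avg };
  }
  if (!top) return null; // no successful runs to analyze
  const percentage = (top.avgDurationMs / grandTotal) * 100;
  return { ...top, percentage, isBottleneck: percentage > 35 };
}
```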
+ ### Rule: Failure Pattern Analysis
+
+ - **Description:** Identify stages with disproportionate failure rates and correlate with feature characteristics
+ - **Inputs:** All history entries with status `failed`; feature slugs
+ - **Outputs:** Failure rate per stage; most common failure stage; correlation hints
+ - **Algorithm:** Count failures by stage; compute failure rate as failures/total runs for that stage; identify features with repeated failures
+ - **Threshold:** Report as concerning if failure rate >15%
+ - **Deterministic:** Yes
+
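A sketch of this rule follows. Note one assumption beyond the spec: the spec does not say where a failed run records which stage failed, so the `failedStage` field here is hypothetical, and tie-breaking on "most common failure stage" falls to first insertion order (a strictly-greater comparison), as the handover section asks to make explicit.

```javascript
// Sketch of the failure-pattern rule. Assumes (hypothetically) that a
// failed run records the stage it failed in as `failedStage`.
function analyzeFailures(history) {
  const counts = {};    // stage -> failure count
  const stageRuns = {}; // stage -> total runs touching that stage
  const repeated = {};  // slug -> failure count
  for (const run of history) {
    for (const stage of Object.keys(run.stages || {})) {
      stageRuns[stage] = (stageRuns[stage] || 0) + 1;
    }
    if (run.status !== 'failed') continue;
    counts[run.failedStage] = (counts[run.failedStage] || 0) + 1;
    repeated[run.slug] = (repeated[run.slug] || 0) + 1;
  }
  let mostCommonStage = null;
  for (const [stage, n] of Object.entries(counts)) {
    // Ties resolve to the first occurrence: strictly-greater comparison.
    if (!mostCommonStage || n > counts[mostCommonStage]) mostCommonStage = stage;
  }
  const rates = Object.fromEntries(
    Object.entries(counts).map(([stage, n]) => {
      const rate = n / stageRuns[stage];
      return [stage, { failures: n, rate, concerning: rate > 0.15 }];
    })
  );
  const repeatedFeatures = Object.entries(repeated)
    .filter(([, n]) => n > 1)
    .map(([slug, n]) => ({ slug, failures: n }));
  return { rates, mostCommonStage, repeatedFeatures };
}
```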
+ ### Rule: Anomaly Detection
+
+ - **Description:** Flag individual runs where stage duration deviates significantly from average
+ - **Inputs:** Stage durations from all runs; calculated means and standard deviations
+ - **Outputs:** List of anomalous runs with stage, actual duration, expected duration
+ - **Algorithm:** Calculate mean and standard deviation per stage; flag if duration > mean + 2*stddev
+ - **Threshold:** 2 standard deviations above mean
+ - **Scope:** Last 10 runs only (to limit output)
+ - **Deterministic:** Yes
+
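The rule above can be sketched directly, using the population standard deviation (divide by N) that the implementation plan settles on. Baselines are computed over all runs; only the last 10 are checked against them.

```javascript
// Sketch of anomaly detection: per-stage mean and population standard
// deviation over all runs, then flag stages in the last 10 runs whose
// duration exceeds mean + 2 * stddev.
function detectAnomalies(history) {
  const samples = {}; // stage -> [durations]
  for (const run of history) {
    for (const [stage, { durationMs }] of Object.entries(run.stages || {})) {
      (samples[stage] = samples[stage] || []).push(durationMs);
    }
  }
  const stats = {};
  for (const [stage, values] of Object.entries(samples)) {
    const mean = values.reduce((a, b) => a + b, 0) / values.length;
    const variance = values.reduce((a, b) => a + (b - mean) ** 2, 0) / values.length;
    stats[stage] = { mean, stddev: Math.sqrt(variance) };
  }
  const anomalies = [];
  for (const run of history.slice(-10)) {
    for (const [stage, { durationMs }] of Object.entries(run.stages || {})) {
      const { mean, stddev } = stats[stage];
      if (durationMs > mean + 2 * stddev) {
        anomalies.push({ slug: run.slug, stage, actual: durationMs, expected: mean });
      }
    }
  }
  return anomalies;
}
```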
+ ### Rule: Trend Analysis
+
+ - **Description:** Determine if pipeline performance is improving or degrading over time
+ - **Inputs:** All history entries, sorted chronologically
+ - **Outputs:** Success rate trend (improving/stable/degrading); duration trend (improving/stable/degrading)
+ - **Algorithm:** Compare metrics from first half vs second half of history; compute percentage change
+ - **Thresholds:** Improving if >10% better; degrading if >10% worse; stable otherwise
+ - **Minimum data:** Requires at least 6 runs to compute trends
+ - **Deterministic:** Yes
+
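A sketch of the half-vs-half comparison, under the same schema assumption as above (`status`, `totalDurationMs` per entry). Note the direction flip: higher is better for success rate, lower is better for duration.

```javascript
// Sketch of trend analysis: split chronologically sorted history into
// halves and compare success rate and mean duration; +/-10% is the
// stable band.
function classify(change, higherIsBetter) {
  if (Math.abs(change) <= 0.10) return 'stable';
  const improved = higherIsBetter ? change > 0 : change < 0;
  return improved ? 'improving' : 'degrading';
}

function analyzeTrends(history) {
  if (history.length < 6) return null; // not enough data for trends
  const mid = Math.floor(history.length / 2);
  const [a, b] = [history.slice(0, mid), history.slice(mid)].map((runs) => ({
    successRate: runs.filter((r) => r.status === 'completed').length / runs.length,
    avgDuration: runs.reduce((s, r) => s + r.totalDurationMs, 0) / runs.length,
  }));
  const successChange = (b.successRate - a.successRate) / (a.successRate || 1);
  const durationChange = (b.avgDuration - a.avgDuration) / a.avgDuration;
  return {
    successRate: { trend: classify(successChange, true), change: successChange },
    duration: { trend: classify(durationChange, false), change: durationChange },
  };
}
```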
+ ### Rule: Agent Performance Comparison
+
+ - **Description:** Compare duration and success metrics across agent stages
+ - **Inputs:** All history entries with stage data
+ - **Outputs:** Ranked list of stages by average duration; success rate per stage
+ - **Algorithm:** Aggregate durations and success/failure counts per stage; rank by mean duration
+ - **Deterministic:** Yes
+
+ ### Rule: Recommendation Generation
+
+ - **Description:** Generate actionable recommendations based on detected patterns
+ - **Inputs:** Analysis results from all rules above
+ - **Outputs:** Human-readable recommendation strings
+ - **Logic:**
+   - If bottleneck stage is >40% of time → "Consider simplifying {stage} requirements or splitting features"
+   - If failure rate >20% on a stage → "Review {stage} agent configuration or specification clarity"
+   - If anomalies detected → "Investigate flagged runs for unusual feature complexity"
+   - If degrading trend → "Review recent changes to agent specifications or system spec"
+ - **Deterministic:** Yes (same inputs produce same recommendations)
+
+ ---
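Since this rule is a pure mapping from analysis results to template strings, it can be sketched as one function. The input shape here (`bottleneck`, `failures` keyed by stage with a `rate`, an `anomalies` array, a `trend` label) is an assumption about how the other rules hand over their results.

```javascript
// Sketch of recommendation generation: pure mapping from detected
// patterns to the template strings in the rule above.
function buildRecommendations({ bottleneck, failures, anomalies, trend }) {
  const recs = [];
  if (bottleneck && bottleneck.percentage > 40) {
    recs.push(`Consider simplifying ${bottleneck.stage} requirements or splitting features`);
  }
  for (const [stage, { rate }] of Object.entries(failures || {})) {
    if (rate > 0.20) {
      recs.push(`Review ${stage} agent configuration or specification clarity`);
    }
  }
  if (anomalies && anomalies.length > 0) {
    recs.push('Investigate flagged runs for unusual feature complexity');
  }
  if (trend === 'degrading') {
    recs.push('Review recent changes to agent specifications or system spec');
  }
  return recs;
}
```

Determinism follows from the function being side-effect free: the same analysis object always yields the same recommendation list.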
162
+
163
+ ## 7. Dependencies
164
+
165
+ ### System components
166
+
167
+ - `src/history.js` — Must expose `readHistoryFile()` function; currently exports this
168
+ - `bin/cli.js` — Must register new `insights` command
169
+ - `.claude/pipeline-history.json` — Must exist with entries from pipeline-history feature
170
+
171
+ ### Upstream features
172
+
173
+ - **pipeline-history** (`.blueprint/features/feature_pipeline-history/`) — This feature depends entirely on history data recorded by pipeline-history. The history entry schema (slug, status, stages, timestamps, durations) must be stable.
174
+
175
+ ### External systems
176
+
177
+ - None
178
+
179
+ ### Operational dependencies
180
+
181
+ - File system read access to `.claude/pipeline-history.json`
182
+
183
+ ---
+
+ ## 8. Non-Functional Considerations
+
+ ### Performance sensitivity
+
+ - Analysis is computed on-demand from full history file
+ - ASSUMPTION: History files contain <500 entries; O(n) algorithms acceptable
+ - No caching required; each invocation recomputes from scratch
+
+ ### Audit/logging needs
+
+ - None. This feature is read-only and does not produce persistent outputs.
+
+ ### Error tolerance
+
+ - If history file is missing, display "No history found. Run some pipelines first."
+ - If history file is corrupted, display warning and exit gracefully
+ - If insufficient data for specific analysis, skip that section with explanation
+
+ ### Security implications
+
+ - Feature slugs may reveal project information; output to terminal only
+ - JSON output should not include sensitive data beyond what is already in history file
+
+ ---
+
+ ## 9. Assumptions & Open Questions
+
+ ### Assumptions
+
+ - ASSUMPTION: The history entry schema from pipeline-history is stable: `{ slug, status, stages, completedAt, totalDurationMs }`
+ - ASSUMPTION: Stage names are fixed: `alex`, `cass`, `nigel`, `codey-plan`, `codey-implement`
+ - ASSUMPTION: 2 standard deviations is an appropriate anomaly threshold for this domain
+ - ASSUMPTION: 6 runs provides sufficient data for meaningful trend analysis
+ - ASSUMPTION: Users will act on recommendations manually; no automation required
+
+ ### Open Questions
+
+ - Should anomaly detection consider stage-specific thresholds rather than uniform 2-stddev?
+ - Should trend analysis use a sliding window rather than first-half/second-half comparison?
+ - Should there be a `--verbose` flag for more detailed analysis output?
+ - Should the feature support analysis of a specific time range (e.g., last 30 days)?
+
+ ---
+
+ ## 10. Impact on System Specification
+
+ ### Alignment assessment
+
+ This feature **reinforces existing system assumptions**:
+
+ - Per SYSTEM_SPEC.md:Section 8 (Observability), the system already aims for visibility into pipeline execution
+ - Per SYSTEM_SPEC.md:Section 5 (Core Domain Concepts), the queue and pipeline concepts are well-defined
+ - This feature adds an intelligence layer without altering core pipeline behaviour
+
+ ### No contradictions identified
+
+ The feature does not alter:
+
+ - Agent roles or boundaries
+ - Pipeline flow or stage order
+ - Artifact structures or handoff mechanisms
+ - History recording behaviour (defers entirely to pipeline-history feature)
+
+ ### Minor extension to system spec
+
+ The following addition to SYSTEM_SPEC.md:Section 5 (Core Domain Concepts) may be warranted:
+
+ > **Pipeline Insights** — An analysis layer that examines historical pipeline data to identify bottlenecks, failure patterns, anomalies, and trends. Provides recommendations for pipeline optimization without modifying pipeline behaviour.
+
+ This is flagged as a **non-breaking extension** for consideration.
+
+ ---
+
+ ## 11. Handover to BA (Cass)
+
+ ### Story themes
+
+ 1. **Bottleneck analysis** — Identifying and reporting the slowest pipeline stages
+ 2. **Failure pattern analysis** — Analyzing failure frequency and generating recommendations
+ 3. **Anomaly detection** — Flagging runs that deviate significantly from averages
+ 4. **Trend analysis** — Computing and displaying performance trends over time
+ 5. **JSON output** — Supporting programmatic consumption of insights data
+
+ ### Expected story boundaries
+
+ - Core insights engine (statistics computation) may be shared across stories
+ - Each analysis type (bottlenecks, failures, anomalies, trends) is a candidate for separate story
+ - JSON output support could be combined with any analysis story or kept separate
+ - CLI command registration is infrastructure supporting all stories
+
+ ### Areas needing careful story framing
+
+ - The threshold values (35% for bottleneck, 15% for failure rate, 2-stddev for anomaly) should be explicitly stated in acceptance criteria
+ - The minimum data requirements (3 runs for basic insights, 6 runs for trends) need clear edge case handling
+ - The recommendation text generation needs precise acceptance criteria for consistent output
+ - Handling of ties in "most common failure stage" should be explicit
+
+ ---
+
+ ## 12. Change Log (Feature-Level)
+
+ | Date | Change | Reason | Raised By |
+ |------------|---------------------------------------|-------------------------------------------|-----------|
+ | 2026-02-24 | Initial feature specification created | Extend pipeline-history with actionable insights | Alex |
@@ -0,0 +1,65 @@
+ # Implementation Plan — Pipeline Insights
+
+ ## Summary
+
+ This feature adds a new `orchestr8 insights` CLI command that performs read-only analysis of pipeline history data. It computes bottleneck detection, failure patterns, anomaly detection, and trend analysis, outputting human-readable recommendations or JSON. The implementation creates a new `src/insights.js` module that reuses `readHistoryFile()` from the existing `src/history.js`.
+
+ ---
+
+ ## Files to Create/Modify
+
+ | Path | Action | Purpose |
+ |------|--------|---------|
+ | `src/insights.js` | Create | Core analysis engine with all computation logic |
+ | `bin/cli.js` | Modify | Register `insights` command and route flags |
+
+ ---
+
+ ## Implementation Steps
+
+ 1. **Create `src/insights.js` scaffold** - Export main `displayInsights(options)` function that reads history via `readHistoryFile()` and validates minimum data (3 runs).
+
+ 2. **Implement bottleneck analysis** - Calculate average duration per stage across successful runs; identify stage with highest mean; compute percentage of total; flag if >35%; generate recommendation if >40%.
+
+ 3. **Implement failure pattern analysis** - Count failures by stage; compute failure rate per stage; identify most common failure stage; list features with repeated failures; flag if rate >15%; generate recommendation if >20%.
+
+ 4. **Implement anomaly detection** - Calculate mean and stddev per stage from all runs; evaluate last 10 runs; flag any stage duration exceeding mean + 2*stddev; include slug, stage, actual, expected, deviation in output.
+
+ 5. **Implement trend analysis** - Require 6+ runs; split history into first and second halves; compare success rates and average durations; classify as improving/stable/degrading based on 10% threshold; show percentage change.
+
+ 6. **Implement output formatters** - Create `formatTextOutput(analysis)` for human-readable output and `formatJsonOutput(analysis)` for structured JSON; handle section filtering based on flags.
+
+ 7. **Handle edge cases** - Missing history file returns "No history found"; corrupted file shows warning; insufficient data (<3 runs) shows appropriate message; no failures omits failure section with "No failures recorded".
+
+ 8. **Register CLI command** - In `bin/cli.js`, import `displayInsights` from `src/insights.js`; add `insights` command with flag parsing for `--bottlenecks`, `--failures`, `--json`.
+
+ 9. **Run tests and verify** - Execute `node --test test/feature_pipeline-insights.test.js` after each file change; ensure all tests pass.
+
+ 10. **Final cleanup** - Verify output formatting matches AC requirements; ensure recommendations use exact wording from spec.
+
+ ---
+
+ ## Key Functions
+
+ **In `src/insights.js`:**
+ - `displayInsights(options)` - Main entry point; orchestrates analysis and output
+ - `analyzeBottlenecks(history)` - Returns `{ stage, avgDurationMs, percentage, isBottleneck, recommendation }`
+ - `analyzeFailures(history)` - Returns `{ failuresByStage, mostCommonStage, repeatedFeatures, recommendation }`
+ - `detectAnomalies(history)` - Returns `{ anomalies: [{slug, stage, actual, expected, deviation}], recommendation }`
+ - `analyzeTrends(history)` - Returns `{ successRate: {trend, change}, duration: {trend, change}, recommendation }`
+ - `formatTextOutput(analysis, sections)` - Formats analysis as human-readable text
+ - `formatJsonOutput(analysis, sections)` - Formats analysis as JSON object
+ - `calculateMean(values)` - Helper: compute arithmetic mean
+ - `calculateStdDev(values, mean)` - Helper: compute population standard deviation
+
+ **In `bin/cli.js`:**
+ - Extend `parseFlags()` to recognize `--bottlenecks`, `--failures`, `--json`
+ - Add `insights` command entry in `commands` object
+
+ ---
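The CLI side can be sketched as below. The real shapes of `parseFlags()` and the `commands` object in `bin/cli.js` are not shown in this diff, so the helper name and the registration line here are assumptions, not the actual code.

```javascript
// Sketch of flag routing for the insights command. Only the three flags
// from the spec are recognized; anything else is ignored here.
function parseInsightsFlags(argv) {
  return {
    bottlenecks: argv.includes('--bottlenecks'),
    failures: argv.includes('--failures'),
    json: argv.includes('--json'),
  };
}

// Hypothetical registration shape, assuming a commands object keyed by name:
// commands.insights = (argv) => displayInsights(parseInsightsFlags(argv));
```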
+
+ ## Risks/Questions
+
+ - **History schema assumption**: Implementation assumes `stages` is an object with `{name: {durationMs}}` structure per test-spec.md. If actual schema differs, adapter logic may be needed.
+ - **Tie-breaking**: Per test-spec.md, ties in "most common failure stage" resolved by first occurrence. Implementation should use stable sort or maintain insertion order.
+ - **Standard deviation formula**: Using population stddev (N divisor) per test-spec.md assumption, not sample stddev (N-1).
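The two statistics helpers named above are small enough to show in full; this sketch follows the plan's stated choice of the population formula (divide by N), not the sample formula (divide by N - 1).

```javascript
// Arithmetic mean of a non-empty array of numbers.
function calculateMean(values) {
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}

// Population standard deviation: variance uses an N divisor.
function calculateStdDev(values, mean = calculateMean(values)) {
  const variance =
    values.reduce((sum, v) => sum + (v - mean) ** 2, 0) / values.length;
  return Math.sqrt(variance);
}
```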
@@ -0,0 +1,71 @@
+ # Story — Anomaly Detection
+
+ ## User story
+
+ As a developer, I want to identify pipeline runs where stage durations deviated significantly from normal so that I can investigate unusual behaviour and understand outliers.
+
+ ---
+
+ ## Context / scope
+
+ - User has accumulated enough history data to establish baseline metrics
+ - Analysis uses statistical deviation (mean + 2*stddev) to identify anomalies
+ - Scope limited to last 10 runs to keep output manageable
+ - This is a read-only analysis; no pipeline state is modified
+ - Route: `orchestr8 insights` (anomaly section included by default)
+
+ Per FEATURE_SPEC.md:Section 6 (Rule: Anomaly Detection):
+ - Threshold: 2 standard deviations above mean
+ - Scope: Last 10 runs only
+
+ ---
+
+ ## Acceptance criteria
+
+ **AC-1 — Detect anomalous stage durations**
+ - Given the history file contains at least 3 pipeline runs,
+ - When the user runs `orchestr8 insights`,
+ - Then the output includes an "Anomalies" section listing any runs where a stage duration exceeded mean + 2*stddev.
+
+ **AC-2 — Display anomaly details**
+ - Given an anomalous run is detected,
+ - When the analysis completes,
+ - Then the output shows: feature slug, stage name, actual duration, expected duration (mean), and deviation factor.
+
+ **AC-3 — Limit scope to recent runs**
+ - Given the history contains more than 10 runs,
+ - When anomaly detection is performed,
+ - Then only the most recent 10 runs are evaluated for anomalies.
+
+ **AC-4 — Generate recommendation when anomalies found**
+ - Given one or more anomalous runs are detected,
+ - When the analysis completes,
+ - Then the output includes the recommendation: "Investigate flagged runs for unusual feature complexity".
+
+ **AC-5 — No anomalies detected**
+ - Given all recent runs have stage durations within 2 standard deviations of the mean,
+ - When the user runs `orchestr8 insights`,
+ - Then the anomalies section displays: "No anomalies detected in recent runs."
+
+ **AC-6 — Insufficient data for statistics**
+ - Given the history file contains fewer than 3 runs,
+ - When anomaly detection is attempted,
+ - Then it is skipped with explanation: "Insufficient data for anomaly detection."
+
+ ---
+
+ ## Out of scope
+
+ - Configurable standard deviation threshold
+ - Stage-specific anomaly thresholds
+ - Anomaly detection for failure counts (only duration-based)
+ - Historical anomaly tracking beyond last 10 runs
+ - Automatic investigation or drill-down into anomalous runs
+
+ ---
+
+ ## References
+
+ - Feature spec: `.blueprint/features/feature_pipeline-insights/FEATURE_SPEC.md`
+ - Upstream dependency: `.blueprint/features/feature_pipeline-history/FEATURE_SPEC.md`
+ - System spec: `.blueprint/system_specification/SYSTEM_SPEC.md`
@@ -0,0 +1,75 @@
+ # Story — Bottleneck Analysis
+
+ ## User story
+
+ As a developer, I want to identify which pipeline stage consistently takes the longest so that I can focus optimization efforts where they will have the greatest impact.
+
+ ---
+
+ ## Context / scope
+
+ - User has executed multiple pipeline runs via `/implement-feature`
+ - History data exists in `.claude/pipeline-history.json`
+ - This is a read-only analysis; no pipeline state is modified
+ - Route: `orchestr8 insights` or `orchestr8 insights --bottlenecks`
+
+ Per FEATURE_SPEC.md:Section 6 (Rule: Bottleneck Detection):
+ - Bottleneck threshold: >35% of total pipeline time
+ - Recommendation threshold: >40% of total pipeline time
+
+ ---
+
+ ## Acceptance criteria
+
+ **AC-1 — Display bottleneck stage**
+ - Given the history file contains at least 3 successful pipeline runs,
+ - When the user runs `orchestr8 insights`,
+ - Then the output includes a "Bottlenecks" section identifying the stage with the highest average duration.
+
+ **AC-2 — Show percentage of total time**
+ - Given a bottleneck stage is identified,
+ - When the analysis completes,
+ - Then the output displays the stage name, average duration in milliseconds, and percentage of total pipeline time.
+
+ **AC-3 — Bottleneck threshold reporting**
+ - Given a stage accounts for more than 35% of total pipeline time,
+ - When the analysis completes,
+ - Then that stage is flagged as a bottleneck in the output.
+
+ **AC-4 — Generate recommendation for severe bottleneck**
+ - Given a stage accounts for more than 40% of total pipeline time,
+ - When the analysis completes,
+ - Then the output includes the recommendation: "Consider simplifying {stage} requirements or splitting features".
+
+ **AC-5 — Filter to bottlenecks only**
+ - Given the user runs `orchestr8 insights --bottlenecks`,
+ - When the analysis completes,
+ - Then only the bottleneck analysis section is displayed (other analysis types are omitted).
+
+ **AC-6 — Insufficient data handling**
+ - Given the history file contains fewer than 3 runs,
+ - When the user runs `orchestr8 insights`,
+ - Then the output displays: "Insufficient data for insights. Complete at least 3 pipeline runs."
+
+ **AC-7 — Missing history file handling**
+ - Given no history file exists at `.claude/pipeline-history.json`,
+ - When the user runs `orchestr8 insights`,
+ - Then the output displays: "No history found. Run some pipelines first."
+
+ ---
+
+ ## Out of scope
+
+ - Modifying the history file or pipeline configuration
+ - Customizing the 35%/40% threshold values
+ - Providing automated remediation
+ - Stage-specific threshold configuration
+ - Analysis of partial/in-progress runs
+
+ ---
+
+ ## References
+
+ - Feature spec: `.blueprint/features/feature_pipeline-insights/FEATURE_SPEC.md`
+ - Upstream dependency: `.blueprint/features/feature_pipeline-history/FEATURE_SPEC.md`
+ - System spec: `.blueprint/system_specification/SYSTEM_SPEC.md`
@@ -0,0 +1,75 @@
+ # Story — Failure Pattern Analysis
+
+ ## User story
+
+ As a developer, I want to analyze which pipeline stages fail most frequently so that I can identify systemic issues and improve pipeline reliability.
+
+ ---
+
+ ## Context / scope
+
+ - User has executed multiple pipeline runs, some of which have failed
+ - History data includes entries with `status: "failed"` and associated stage information
+ - This is a read-only analysis; no pipeline state is modified
+ - Route: `orchestr8 insights` or `orchestr8 insights --failures`
+
+ Per FEATURE_SPEC.md:Section 6 (Rule: Failure Pattern Analysis):
+ - Failure rate threshold: >15% is reported as concerning
+ - Recommendation threshold: >20% triggers specific recommendation
+
+ ---
+
+ ## Acceptance criteria
+
+ **AC-1 — Display failure rates per stage**
+ - Given the history file contains at least 3 pipeline runs with at least one failure,
+ - When the user runs `orchestr8 insights`,
+ - Then the output includes a "Failure Patterns" section showing failure rate for each stage that has experienced failures.
+
+ **AC-2 — Identify most common failure stage**
+ - Given failures exist in the history,
+ - When the analysis completes,
+ - Then the output identifies the stage with the highest failure count as the "most common failure stage".
+
+ **AC-3 — Flag concerning failure rates**
+ - Given a stage has a failure rate greater than 15%,
+ - When the analysis completes,
+ - Then that stage is flagged as having a concerning failure rate.
+
+ **AC-4 — Generate recommendation for high failure rate**
+ - Given a stage has a failure rate greater than 20%,
+ - When the analysis completes,
+ - Then the output includes the recommendation: "Review {stage} agent configuration or specification clarity".
+
+ **AC-5 — Identify features with repeated failures**
+ - Given the same feature slug has failed multiple times,
+ - When the analysis completes,
+ - Then those features are listed as correlation hints (e.g., "Feature 'complex-auth' has failed 3 times").
+
+ **AC-6 — Filter to failures only**
+ - Given the user runs `orchestr8 insights --failures`,
+ - When the analysis completes,
+ - Then only the failure pattern analysis section is displayed (other analysis types are omitted).
+
+ **AC-7 — No failures recorded**
+ - Given all pipeline runs in history have status "completed" (no failures),
+ - When the user runs `orchestr8 insights`,
+ - Then the failure analysis section displays: "No failures recorded" and is omitted from recommendations.
+
+ ---
+
+ ## Out of scope
+
+ - Automatic correlation with feature complexity metrics
+ - Root cause analysis beyond stage identification
+ - Failure notification or alerting
+ - Retry or remediation automation
+ - Classification of failure types (timeout vs error vs abort)
+
+ ---
+
+ ## References
+
+ - Feature spec: `.blueprint/features/feature_pipeline-insights/FEATURE_SPEC.md`
+ - Upstream dependency: `.blueprint/features/feature_pipeline-history/FEATURE_SPEC.md`
+ - System spec: `.blueprint/system_specification/SYSTEM_SPEC.md`
@@ -0,0 +1,75 @@
+ # Story — JSON Output
+
+ ## User story
+
+ As a developer, I want to export pipeline insights as structured JSON so that I can integrate the analysis with other tools or process the data programmatically.
+
+ ---
+
+ ## Context / scope
+
+ - User wants machine-readable output instead of human-readable text
+ - JSON output contains all the same analysis data as text output
+ - Enables integration with CI/CD pipelines, dashboards, or custom tooling
+ - This is a read-only analysis; no pipeline state is modified
+ - Route: `orchestr8 insights --json`
+
+ Per FEATURE_SPEC.md:Section 4 (Key alternatives or branches):
+ - `--json` flag produces structured JSON instead of formatted text
+
+ ---
+
+ ## Acceptance criteria
+
+ **AC-1 — Output JSON when flag provided**
+ - Given the history file contains valid pipeline data,
+ - When the user runs `orchestr8 insights --json`,
+ - Then the output is valid JSON (parseable by `JSON.parse()`).
+
+ **AC-2 — Include bottleneck data in JSON**
+ - Given bottleneck analysis completes successfully,
+ - When the `--json` flag is provided,
+ - Then the JSON output includes a `bottlenecks` object with: `stage`, `averageDurationMs`, `percentageOfTotal`, `isBottleneck`, `recommendation` (if applicable).
+
+ **AC-3 — Include failure data in JSON**
+ - Given failure analysis completes successfully,
+ - When the `--json` flag is provided,
+ - Then the JSON output includes a `failures` object with: `failuresByStage` (array), `mostCommonFailureStage`, `featuresWithRepeatedFailures` (array), `recommendation` (if applicable).
+
+ **AC-4 — Include anomaly data in JSON**
+ - Given anomaly detection completes successfully,
+ - When the `--json` flag is provided,
+ - Then the JSON output includes an `anomalies` object with: `detected` (array of anomalous runs), `recommendation` (if applicable).
+
+ **AC-5 — Include trend data in JSON**
+ - Given trend analysis completes successfully,
+ - When the `--json` flag is provided,
+ - Then the JSON output includes a `trends` object with: `successRate` (trend + percentage), `duration` (trend + percentage), `recommendation` (if applicable).
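Taken together, AC-2 through AC-5 imply an output shape along these lines. This is a sketch assembled from the field names the ACs list; the nesting, element types, and the `InsightsJson` name itself are assumptions, not the package's published schema:

```typescript
// Assumed shape of `orchestr8 insights --json` output per AC-2 … AC-5.
// Only the top-level and named keys come from the story; the rest is guessed.
interface InsightsJson {
  bottlenecks?: {
    stage: string;
    averageDurationMs: number;
    percentageOfTotal: number;
    isBottleneck: boolean;
    recommendation?: string;
  };
  failures?: {
    failuresByStage: { stage: string; failureRate: number }[];
    mostCommonFailureStage: string;
    featuresWithRepeatedFailures: string[];
    recommendation?: string;
  };
  anomalies?: {
    detected: { feature: string; durationMs: number }[];
    recommendation?: string;
  };
  trends?: {
    successRate: { trend: string; percentage: number };
    duration: { trend: string; percentage: number };
    recommendation?: string;
  };
  error?: string; // AC-7: insufficient-data case
}

// Example payload: only the failures section present, as with `--failures --json`.
const sample: InsightsJson = {
  failures: {
    failuresByStage: [{ stage: "develop", failureRate: 0.25 }],
    mostCommonFailureStage: "develop",
    featuresWithRepeatedFailures: ["Feature 'complex-auth' has failed 3 times"],
    recommendation: "Review develop agent configuration or specification clarity",
  },
};
```

Every section is optional, which lets the same shape cover the filter-flag combinations in AC-6 and the error case in AC-7.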
48
+
49
+ **AC-6 — Combine JSON with filter flags**
50
+ - Given the user runs `orchestr8 insights --json --bottlenecks`,
51
+ - When the analysis completes,
52
+ - Then the JSON output includes only the `bottlenecks` section (other analysis types are omitted).
53
+
54
+ **AC-7 — Handle insufficient data in JSON**
55
+ - Given there is insufficient data for analysis,
56
+ - When `--json` flag is provided,
57
+ - Then the JSON output includes an `error` field with the appropriate message (e.g., `{"error": "Insufficient data for insights. Complete at least 3 pipeline runs."}`).
58
+
59
+ ---
60
+
61
+ ## Out of scope
62
+
63
+ - Exporting JSON to a file (output to stdout only)
64
+ - JSON schema validation or versioning
65
+ - Compressed or minified JSON output options
66
+ - Integration with specific external platforms
67
+ - Historical JSON output comparison
68
+
69
+ ---
70
+
71
+ ## References
72
+
73
+ - Feature spec: `.blueprint/features/feature_pipeline-insights/FEATURE_SPEC.md`
74
+ - Upstream dependency: `.blueprint/features/feature_pipeline-history/FEATURE_SPEC.md`
75
+ - System spec: `.blueprint/system_specification/SYSTEM_SPEC.md`