orchestr8 2.4.0 → 2.6.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/.blueprint/agents/AGENT_BA_CASS.md +50 -25
- package/.blueprint/agents/AGENT_DEVELOPER_CODEY.md +60 -69
- package/.blueprint/agents/AGENT_SPECIFICATION_ALEX.md +45 -0
- package/.blueprint/agents/AGENT_TESTER_NIGEL.md +72 -105
- package/.blueprint/features/feature_adaptive-retry/FEATURE_SPEC.md +239 -0
- package/.blueprint/features/feature_adaptive-retry/IMPLEMENTATION_PLAN.md +48 -0
- package/.blueprint/features/feature_adaptive-retry/story-prompt-modification.md +85 -0
- package/.blueprint/features/feature_adaptive-retry/story-retry-config.md +89 -0
- package/.blueprint/features/feature_adaptive-retry/story-should-retry.md +98 -0
- package/.blueprint/features/feature_adaptive-retry/story-strategy-recommendation.md +85 -0
- package/.blueprint/features/feature_agent-guardrails/FEATURE_SPEC.md +328 -0
- package/.blueprint/features/feature_agent-guardrails/IMPLEMENTATION_PLAN.md +90 -0
- package/.blueprint/features/feature_agent-guardrails/story-citation-requirements.md +50 -0
- package/.blueprint/features/feature_agent-guardrails/story-confidentiality.md +50 -0
- package/.blueprint/features/feature_agent-guardrails/story-escalation-protocol.md +55 -0
- package/.blueprint/features/feature_agent-guardrails/story-source-restrictions.md +50 -0
- package/.blueprint/features/feature_feedback-loop/FEATURE_SPEC.md +347 -0
- package/.blueprint/features/feature_feedback-loop/IMPLEMENTATION_PLAN.md +71 -0
- package/.blueprint/features/feature_feedback-loop/story-feedback-collection.md +63 -0
- package/.blueprint/features/feature_feedback-loop/story-feedback-config.md +61 -0
- package/.blueprint/features/feature_feedback-loop/story-feedback-insights.md +63 -0
- package/.blueprint/features/feature_feedback-loop/story-quality-gates.md +57 -0
- package/.blueprint/features/feature_pipeline-history/FEATURE_SPEC.md +239 -0
- package/.blueprint/features/feature_pipeline-history/IMPLEMENTATION_PLAN.md +71 -0
- package/.blueprint/features/feature_pipeline-history/story-clear-history.md +73 -0
- package/.blueprint/features/feature_pipeline-history/story-display-history.md +75 -0
- package/.blueprint/features/feature_pipeline-history/story-record-execution.md +76 -0
- package/.blueprint/features/feature_pipeline-history/story-show-statistics.md +85 -0
- package/.blueprint/features/feature_pipeline-insights/FEATURE_SPEC.md +288 -0
- package/.blueprint/features/feature_pipeline-insights/IMPLEMENTATION_PLAN.md +65 -0
- package/.blueprint/features/feature_pipeline-insights/story-anomaly-detection.md +71 -0
- package/.blueprint/features/feature_pipeline-insights/story-bottleneck-analysis.md +75 -0
- package/.blueprint/features/feature_pipeline-insights/story-failure-patterns.md +75 -0
- package/.blueprint/features/feature_pipeline-insights/story-json-output.md +75 -0
- package/.blueprint/features/feature_pipeline-insights/story-trend-analysis.md +78 -0
- package/.blueprint/features/feature_validate-command/FEATURE_SPEC.md +209 -0
- package/.blueprint/features/feature_validate-command/IMPLEMENTATION_PLAN.md +59 -0
- package/.blueprint/features/feature_validate-command/story-failure-output.md +61 -0
- package/.blueprint/features/feature_validate-command/story-node-version-check.md +52 -0
- package/.blueprint/features/feature_validate-command/story-run-validation.md +59 -0
- package/.blueprint/features/feature_validate-command/story-success-output.md +50 -0
- package/.blueprint/system_specification/SYSTEM_SPEC.md +248 -0
- package/README.md +174 -40
- package/SKILL.md +399 -74
- package/bin/cli.js +128 -20
- package/package.json +1 -1
- package/src/feedback.js +171 -0
- package/src/history.js +306 -0
- package/src/index.js +57 -2
- package/src/init.js +2 -6
- package/src/insights.js +504 -0
- package/src/retry.js +274 -0
- package/src/update.js +10 -2
- package/src/validate.js +172 -0
- package/src/skills.js +0 -93

@@ -0,0 +1,288 @@
# Feature Specification — Pipeline Insights

## 1. Feature Intent

**Why this feature exists.**

- **Problem being addressed:** The pipeline-history feature captures execution data but provides only basic statistics. Users cannot identify optimization opportunities (such as bottleneck stages, failure patterns, or performance trends) without manually analyzing the raw history data.
- **User need:** Developers want actionable recommendations to improve pipeline efficiency. They need to understand which stages are slowest, why failures occur, and whether the pipeline is improving or degrading over time.
- **System purpose alignment:** Per SYSTEM_SPEC.md:Section 8 (Cross-Cutting Concerns:Observability), the system aims for observability via queue status and agent summaries. Per SYSTEM_SPEC.md:Section 2 (Business & Domain Context), orchestr8 seeks to provide "structured processes to guide AI-generated code." This feature extends observability into actionable intelligence, enabling users to optimize their development workflow.

> This feature builds upon the existing pipeline-history feature (`.blueprint/features/feature_pipeline-history/FEATURE_SPEC.md`) without modifying history recording. It is a read-only analysis layer.

---
## 2. Scope

### In Scope

- New CLI command `orchestr8 insights` that analyzes `.claude/pipeline-history.json`
- Bottleneck detection: Identify which stage consistently takes the longest
- Failure pattern analysis: Determine which stages fail most and correlate them with feature characteristics
- Anomaly detection: Flag runs that deviate significantly from average durations
- Trend analysis: Track whether pipeline performance is improving or degrading over time
- Agent performance comparison: Compare stage durations and success rates across agents
- Flags to filter the analysis sections (`--bottlenecks`, `--failures`) and to emit JSON output (`--json`)
- Human-readable recommendations based on detected patterns

### Out of Scope

- Modifying pipeline-history recording logic (that feature is separate)
- Machine learning or complex statistical models (simple heuristics only)
- Automatic remediation or pipeline configuration changes
- Integration with external analytics platforms
- Predictive modelling of future pipeline performance
- Feature-type classification (assumes slugs are opaque identifiers)

---
## 3. Actors Involved

### Human User

- **Can do:** Invoke `orchestr8 insights` to view optimization recommendations; filter by analysis type; export as JSON for programmatic use
- **Cannot do:** Modify the analysis thresholds or algorithms; act on recommendations automatically

### Insights Analyzer (internal component)

- **Can do:** Read history file; compute statistics; generate recommendations; output formatted reports
- **Cannot do:** Write to history file; modify pipeline configuration; alter agent behaviour

---
## 4. Behaviour Overview

### Happy-path behaviour

1. User runs `orchestr8 insights` after accumulating several pipeline runs
2. System reads `.claude/pipeline-history.json` and validates data sufficiency
3. System performs analysis across four dimensions: bottlenecks, failures, anomalies, trends
4. System generates a human-readable report with recommendations
5. User reviews recommendations and decides which to act upon

### Key alternatives or branches

- **Insufficient data:** If fewer than 3 runs exist, display message: "Insufficient data for insights. Complete at least 3 pipeline runs."
- **No failures:** If all runs succeeded, omit the failure analysis section and display "No failures recorded" instead
- **Filtered analysis:** If the `--bottlenecks` or `--failures` flag is provided, display only the corresponding section
- **JSON output:** If the `--json` flag is provided, output structured JSON instead of formatted text
- **Corrupted history:** If the history file is corrupted, display a warning and exit gracefully

### User-visible outcomes

- Identification of the slowest pipeline stage with its percentage of total time
- List of stages with high failure rates and potential contributing factors
- Flagged anomalous runs that deviated significantly from norms
- Trend indicators showing improvement or degradation over recent runs
- Actionable recommendations for each identified issue

---
## 5. State & Lifecycle Interactions

### States entered

- None. This feature is stateless and read-only.

### States modified

- None. This feature does not modify any system state.

### This feature is:

- **Not state-creating:** Does not persist analysis results
- **Not state-transitioning:** Does not alter pipeline flow
- **Not state-constraining:** Does not block any operations

This is a pure read-only analysis feature that operates on existing history data.

---
## 6. Rules & Decision Logic

### Rule: Bottleneck Detection

- **Description:** Identify the stage that consumes the largest proportion of total pipeline time
- **Inputs:** Stage durations from successful runs
- **Outputs:** Stage name, average duration, percentage of total pipeline time
- **Algorithm:** Calculate the mean duration per stage; identify the stage with the highest mean; express that mean as a percentage of the sum of all stage means
- **Threshold:** Report the stage as a bottleneck if it accounts for >35% of total pipeline time
- **Deterministic:** Yes
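The bottleneck rule can be sketched as below. This is a minimal illustration, not the shipped `analyzeBottlenecks()`: `analyzeBottlenecksSketch` is a hypothetical name, the entry shape `{ slug, status, stages: { [stage]: { durationMs } } }` is taken from the implementation plan's schema assumption, and treating every status other than `failed` as a successful run is an additional assumption. The percentage is the top stage's mean over the sum of all stage means, and a tie goes to the first stage encountered.

```javascript
// Hypothetical sketch of the bottleneck rule; not the shipped src/insights.js.
// Assumed entry shape: { slug, status, stages: { [stage]: { durationMs } } }.
const BOTTLENECK_THRESHOLD_PCT = 35;

function analyzeBottlenecksSketch(history) {
  // Only successful runs feed the averages (assumption: "not failed" = successful).
  const successful = history.filter((entry) => entry.status !== "failed");

  // Collect every recorded duration per stage, in first-seen order.
  const durationsByStage = new Map();
  for (const entry of successful) {
    for (const [stage, data] of Object.entries(entry.stages || {})) {
      if (!data || typeof data.durationMs !== "number") continue;
      if (!durationsByStage.has(stage)) durationsByStage.set(stage, []);
      durationsByStage.get(stage).push(data.durationMs);
    }
  }
  if (durationsByStage.size === 0) return null;

  // Mean duration per stage.
  const means = [...durationsByStage].map(([stage, values]) => ({
    stage,
    avgDurationMs: values.reduce((sum, v) => sum + v, 0) / values.length,
  }));
  const totalMs = means.reduce((sum, m) => sum + m.avgDurationMs, 0);

  // Strict ">" keeps the first stage seen when two means tie.
  const top = means.reduce((best, m) => (m.avgDurationMs > best.avgDurationMs ? m : best));
  const percentage = totalMs > 0 ? (top.avgDurationMs / totalMs) * 100 : 0;

  return {
    stage: top.stage,
    avgDurationMs: top.avgDurationMs,
    percentage,
    isBottleneck: percentage > BOTTLENECK_THRESHOLD_PCT,
  };
}
```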
### Rule: Failure Pattern Analysis

- **Description:** Identify stages with disproportionate failure rates and correlate them with feature characteristics
- **Inputs:** All history entries (entries with status `failed` supply the failure counts; all entries supply the per-stage run totals); feature slugs
- **Outputs:** Failure rate per stage; most common failure stage; correlation hints
- **Algorithm:** Count failures by stage; compute each stage's failure rate as failures / total runs for that stage; identify features with repeated failures
- **Threshold:** Report as concerning if the failure rate is >15%
- **Deterministic:** Yes
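The schema assumed in this spec (`{ slug, status, stages, completedAt, totalDurationMs }`) does not say which stage a failed run stopped at, so the sketch below assumes a hypothetical `failedStage` field on failed entries. Under that assumption it counts a stage's runs as the entries that reached it, computes failures / runs per stage, keeps insertion order so a tie for most common failure stage goes to the first stage that failed (the tie-break named in the implementation plan), and lists slugs that failed more than once. `analyzeFailuresSketch` is an illustrative name only.

```javascript
// Hypothetical sketch of the failure-pattern rule. Assumes failed entries carry a
// hypothetical `failedStage` field; the real history schema may record this differently.
const FAILURE_RATE_THRESHOLD_PCT = 15;

function analyzeFailuresSketch(history) {
  const attempts = new Map(); // stage -> runs that reached the stage
  const failures = new Map(); // stage -> failures at that stage (insertion order kept)
  const failuresBySlug = new Map();

  for (const entry of history) {
    const failedStage = entry.status === "failed" ? entry.failedStage : undefined;
    const reached = new Set(Object.keys(entry.stages || {}));
    if (failedStage) reached.add(failedStage);
    for (const stage of reached) attempts.set(stage, (attempts.get(stage) || 0) + 1);

    if (failedStage) {
      failures.set(failedStage, (failures.get(failedStage) || 0) + 1);
      failuresBySlug.set(entry.slug, (failuresBySlug.get(entry.slug) || 0) + 1);
    }
  }

  const failuresByStage = [...failures].map(([stage, count]) => {
    const rate = (count / attempts.get(stage)) * 100;
    return { stage, count, rate, isConcerning: rate > FAILURE_RATE_THRESHOLD_PCT };
  });

  // Strict ">" keeps the first stage that failed when counts tie.
  let mostCommonStage = null;
  let bestCount = 0;
  for (const { stage, count } of failuresByStage) {
    if (count > bestCount) {
      mostCommonStage = stage;
      bestCount = count;
    }
  }

  const repeatedFeatures = [...failuresBySlug].filter(([, n]) => n > 1).map(([slug]) => slug);
  return { failuresByStage, mostCommonStage, repeatedFeatures };
}
```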
### Rule: Anomaly Detection

- **Description:** Flag individual runs where a stage duration is significantly above the average
- **Inputs:** Stage durations from all runs; calculated means and standard deviations
- **Outputs:** List of anomalous runs with stage, actual duration, expected duration
- **Algorithm:** Calculate the mean and standard deviation per stage across all runs; flag a stage duration if it exceeds mean + 2*stddev
- **Threshold:** 2 standard deviations above the mean
- **Scope:** Only the last 10 runs are evaluated (to limit output); the baseline statistics still use all runs
- **Deterministic:** Yes
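A minimal sketch of the anomaly rule follows, assuming the history array is ordered oldest-first and using the population standard deviation chosen in the implementation plan. `detectAnomaliesSketch` is a hypothetical name, and reporting `deviation` as the number of standard deviations above the mean is an assumed reading of "deviation factor". One property worth knowing when framing acceptance criteria: with the population standard deviation, no value can lie more than √(n−1) standard deviations from the mean of n values, so a stage needs at least 6 recorded durations before any run can exceed mean + 2*stddev.

```javascript
// Hypothetical sketch of the anomaly rule; not the shipped detectAnomalies().
// Assumes `history` is ordered oldest-first and stages look like { [stage]: { durationMs } }.
function detectAnomaliesSketch(history, { windowSize = 10, k = 2 } = {}) {
  // Baseline statistics come from every run.
  const durationsByStage = new Map();
  for (const entry of history) {
    for (const [stage, data] of Object.entries(entry.stages || {})) {
      if (!data || typeof data.durationMs !== "number") continue;
      if (!durationsByStage.has(stage)) durationsByStage.set(stage, []);
      durationsByStage.get(stage).push(data.durationMs);
    }
  }
  const stats = new Map();
  for (const [stage, values] of durationsByStage) {
    const mean = values.reduce((sum, v) => sum + v, 0) / values.length;
    // Population variance (divide by N), matching the plan's stated choice.
    const variance = values.reduce((sum, v) => sum + (v - mean) ** 2, 0) / values.length;
    stats.set(stage, { mean, stdDev: Math.sqrt(variance) });
  }

  // Only the most recent `windowSize` runs are evaluated.
  const anomalies = [];
  for (const entry of history.slice(-windowSize)) {
    for (const [stage, data] of Object.entries(entry.stages || {})) {
      if (!data || typeof data.durationMs !== "number") continue;
      const { mean, stdDev } = stats.get(stage);
      if (stdDev > 0 && data.durationMs > mean + k * stdDev) {
        anomalies.push({
          slug: entry.slug,
          stage,
          actual: data.durationMs,
          expected: mean,
          // Assumed meaning of "deviation": standard deviations above the mean.
          deviation: (data.durationMs - mean) / stdDev,
        });
      }
    }
  }
  return { anomalies };
}
```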
### Rule: Trend Analysis

- **Description:** Determine whether pipeline performance is improving or degrading over time
- **Inputs:** All history entries, sorted chronologically
- **Outputs:** Success rate trend (improving/stable/degrading); duration trend (improving/stable/degrading)
- **Algorithm:** Compare metrics from the first half of the history with the second half; compute the percentage change
- **Thresholds:** Improving if >10% better; degrading if >10% worse; stable otherwise (a higher success rate is better; a lower duration is better)
- **Minimum data:** Requires at least 6 runs to compute trends
- **Deterministic:** Yes
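The first-half/second-half comparison can be sketched as follows, assuming entries are sorted oldest-first and carry `status` and `totalDurationMs` as in the schema assumption of Section 9. The helper names are hypothetical. The percentage change is relative, `(after - before) / before * 100`, and its sign is flipped for duration because shorter runs are better; how to classify a zero baseline is not specified, so the branch for that case is an assumption.

```javascript
// Hypothetical sketch of the trend rule; not the shipped analyzeTrends().
// Assumes `history` is sorted oldest-first and entries carry { status, totalDurationMs }.
const TREND_THRESHOLD_PCT = 10;
const MIN_TREND_RUNS = 6;

function classifyChange(before, after, higherIsBetter) {
  if (before === 0) {
    // No baseline to divide by (assumption: any movement away from 0 counts fully).
    if (after === 0) return { trend: "stable", change: 0 };
    return { trend: higherIsBetter ? "improving" : "degrading", change: null };
  }
  const change = ((after - before) / before) * 100; // relative percentage change
  const improvement = higherIsBetter ? change : -change;
  if (improvement > TREND_THRESHOLD_PCT) return { trend: "improving", change };
  if (improvement < -TREND_THRESHOLD_PCT) return { trend: "degrading", change };
  return { trend: "stable", change };
}

function summarizeRuns(runs) {
  const successes = runs.filter((r) => r.status !== "failed").length;
  const durations = runs.map((r) => r.totalDurationMs).filter((d) => typeof d === "number");
  const avgDuration = durations.length ? durations.reduce((s, d) => s + d, 0) / durations.length : 0;
  return { successRate: successes / runs.length, avgDuration };
}

function analyzeTrendsSketch(history) {
  if (history.length < MIN_TREND_RUNS) return null; // caller explains the skipped section
  const mid = Math.floor(history.length / 2);
  const first = summarizeRuns(history.slice(0, mid));
  const second = summarizeRuns(history.slice(mid));
  return {
    successRate: classifyChange(first.successRate, second.successRate, true),
    duration: classifyChange(first.avgDuration, second.avgDuration, false),
  };
}
```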
### Rule: Agent Performance Comparison

- **Description:** Compare duration and success metrics across agent stages
- **Inputs:** All history entries with stage data
- **Outputs:** Ranked list of stages by average duration; success rate per stage
- **Algorithm:** Aggregate durations and success/failure counts per stage; rank by mean duration
- **Deterministic:** Yes
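A sketch of the per-stage aggregation and ranking, under the same assumptions as the failure sketch: stage durations live in `stages[name].durationMs`, and a hypothetical `failedStage` field names the stage a failed run stopped at. `compareStagesSketch` is an illustrative name. Ranking is slowest first; because `Array.prototype.sort` is required to be stable since ES2019, stages with equal means keep their first-seen order, which keeps the output deterministic.

```javascript
// Hypothetical sketch of the agent comparison rule; not shipped code.
// Assumes stages look like { [stage]: { durationMs } } and that failed entries
// name the failing stage in a hypothetical `failedStage` field.
function compareStagesSketch(history) {
  const stats = new Map();
  const statsFor = (stage) => {
    if (!stats.has(stage)) stats.set(stage, { totalMs: 0, timed: 0, runs: 0, failures: 0 });
    return stats.get(stage);
  };

  for (const entry of history) {
    const stages = entry.stages || {};
    for (const [stage, data] of Object.entries(stages)) {
      const s = statsFor(stage);
      s.runs += 1;
      if (data && typeof data.durationMs === "number") {
        s.totalMs += data.durationMs;
        s.timed += 1;
      }
    }
    if (entry.status === "failed" && entry.failedStage) {
      const s = statsFor(entry.failedStage);
      s.failures += 1;
      // Count the run for the failing stage even if it recorded no duration.
      if (!Object.prototype.hasOwnProperty.call(stages, entry.failedStage)) s.runs += 1;
    }
  }

  return [...stats]
    .map(([stage, s]) => ({
      stage,
      avgDurationMs: s.timed ? s.totalMs / s.timed : 0,
      successRate: s.runs ? ((s.runs - s.failures) / s.runs) * 100 : 0,
    }))
    // Slowest first; stable sort keeps first-seen order for equal means.
    .sort((a, b) => b.avgDurationMs - a.avgDurationMs);
}
```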
### Rule: Recommendation Generation

- **Description:** Generate actionable recommendations based on detected patterns
- **Inputs:** Analysis results from all rules above
- **Outputs:** Human-readable recommendation strings
- **Logic:**
  - If the bottleneck stage is >40% of time → "Consider simplifying {stage} requirements or splitting features"
  - If the failure rate on a stage is >20% → "Review {stage} agent configuration or specification clarity"
  - If anomalies are detected → "Investigate flagged runs for unusual feature complexity"
  - If a degrading trend is detected → "Review recent changes to agent specifications or system spec"
- **Deterministic:** Yes (same inputs produce same recommendations)
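The recommendation logic maps directly onto a fixed sequence of checks, which is what makes it deterministic. In the sketch below the input shape (a `bottleneck` with `stage`/`percentage`, `failures.failuresByStage` as an array of `{ stage, rate }`, `anomalies.anomalies`, and `trends` with `successRate`/`duration` trend objects) is an assumption loosely modelled on the return values listed in the implementation plan; `generateRecommendationsSketch` is a hypothetical name. The recommendation strings are copied verbatim from the rule.

```javascript
// Hypothetical sketch of the recommendation rule; the input shape is an assumption.
function generateRecommendationsSketch({ bottleneck, failures, anomalies, trends } = {}) {
  const recommendations = [];

  // Severe bottleneck: stricter than the 35% reporting threshold.
  if (bottleneck && bottleneck.percentage > 40) {
    recommendations.push(`Consider simplifying ${bottleneck.stage} requirements or splitting features`);
  }

  // One recommendation per stage whose failure rate exceeds 20%.
  for (const { stage, rate } of (failures && failures.failuresByStage) || []) {
    if (rate > 20) {
      recommendations.push(`Review ${stage} agent configuration or specification clarity`);
    }
  }

  if (anomalies && Array.isArray(anomalies.anomalies) && anomalies.anomalies.length > 0) {
    recommendations.push("Investigate flagged runs for unusual feature complexity");
  }

  // Either trend degrading triggers the same recommendation.
  const degrading = (t) => Boolean(t) && t.trend === "degrading";
  if (trends && (degrading(trends.successRate) || degrading(trends.duration))) {
    recommendations.push("Review recent changes to agent specifications or system spec");
  }

  return recommendations;
}
```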
---

## 7. Dependencies

### System components

- `src/history.js` — Must expose the `readHistoryFile()` function (already exported)
- `bin/cli.js` — Must register the new `insights` command
- `.claude/pipeline-history.json` — Must exist with entries recorded by the pipeline-history feature

### Upstream features

- **pipeline-history** (`.blueprint/features/feature_pipeline-history/`) — This feature depends entirely on history data recorded by pipeline-history. The history entry schema (slug, status, stages, timestamps, durations) must be stable.

### External systems

- None

### Operational dependencies

- File system read access to `.claude/pipeline-history.json`

---
## 8. Non-Functional Considerations

### Performance sensitivity

- Analysis is computed on demand from the full history file
- ASSUMPTION: History files contain <500 entries; O(n) algorithms are acceptable
- No caching required; each invocation recomputes from scratch

### Audit/logging needs

- None. This feature is read-only and does not produce persistent outputs.

### Error tolerance

- If the history file is missing, display "No history found. Run some pipelines first."
- If the history file is corrupted, display a warning and exit gracefully
- If there is insufficient data for a specific analysis, skip that section with an explanation
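The error-tolerance rules, together with the 3-run minimum from Section 4, can be sketched as a single loading step that returns either the history or the message to show. This is an illustrative stand-in: the plan reuses `readHistoryFile()` from `src/history.js`, whose behaviour on missing or corrupted files is not described here, so the sketch reads the file with Node's `fs` instead. `loadHistoryForInsights` is a hypothetical name, the warning wording is illustrative (the spec only requires "a warning"), and the file is assumed to hold a JSON array of entries.

```javascript
// Hypothetical sketch of the error-tolerance rules; the real command is
// planned to reuse readHistoryFile() from src/history.js instead of fs directly.
const fs = require("fs");

const MIN_RUNS = 3;

function loadHistoryForInsights(filePath = ".claude/pipeline-history.json") {
  if (!fs.existsSync(filePath)) {
    return { ok: false, message: "No history found. Run some pipelines first." };
  }

  let history;
  try {
    history = JSON.parse(fs.readFileSync(filePath, "utf8"));
  } catch (err) {
    // Warning wording is illustrative; the spec only requires a warning.
    return { ok: false, message: `Warning: pipeline history is corrupted (${err.message}).` };
  }

  // Assumption: the file holds a JSON array of history entries.
  if (!Array.isArray(history)) {
    return { ok: false, message: "Warning: pipeline history is corrupted (expected an array)." };
  }

  if (history.length < MIN_RUNS) {
    return { ok: false, message: "Insufficient data for insights. Complete at least 3 pipeline runs." };
  }

  return { ok: true, history };
}
```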
### Security implications

- Feature slugs may reveal project information; insights output is written to the terminal only
- JSON output should not include sensitive data beyond what is already in the history file

---

## 9. Assumptions & Open Questions

### Assumptions

- ASSUMPTION: The history entry schema from pipeline-history is stable: `{ slug, status, stages, completedAt, totalDurationMs }`
- ASSUMPTION: Stage names are fixed: `alex`, `cass`, `nigel`, `codey-plan`, `codey-implement`
- ASSUMPTION: 2 standard deviations is an appropriate anomaly threshold for this domain
- ASSUMPTION: 6 runs provide sufficient data for meaningful trend analysis
- ASSUMPTION: Users will act on recommendations manually; no automation is required

### Open Questions

- Should anomaly detection use stage-specific thresholds rather than a uniform 2-stddev threshold?
- Should trend analysis use a sliding window rather than a first-half/second-half comparison?
- Should there be a `--verbose` flag for more detailed analysis output?
- Should the feature support analysis of a specific time range (e.g., last 30 days)?

---
## 10. Impact on System Specification

### Alignment assessment

This feature **reinforces existing system assumptions**:

- Per SYSTEM_SPEC.md:Section 8 (Observability), the system already aims for visibility into pipeline execution
- Per SYSTEM_SPEC.md:Section 5 (Core Domain Concepts), the queue and pipeline concepts are well-defined
- This feature adds an intelligence layer without altering core pipeline behaviour

### No contradictions identified

The feature does not alter:

- Agent roles or boundaries
- Pipeline flow or stage order
- Artifact structures or handoff mechanisms
- History recording behaviour (defers entirely to pipeline-history feature)

### Minor extension to system spec

The following addition to SYSTEM_SPEC.md:Section 5 (Core Domain Concepts) may be warranted:

> **Pipeline Insights** — An analysis layer that examines historical pipeline data to identify bottlenecks, failure patterns, anomalies, and trends. Provides recommendations for pipeline optimization without modifying pipeline behaviour.

This is flagged as a **non-breaking extension** for consideration.

---
## 11. Handover to BA (Cass)

### Story themes

1. **Bottleneck analysis** — Identifying and reporting the slowest pipeline stages
2. **Failure pattern analysis** — Analyzing failure frequency and generating recommendations
3. **Anomaly detection** — Flagging runs that deviate significantly from averages
4. **Trend analysis** — Computing and displaying performance trends over time
5. **JSON output** — Supporting programmatic consumption of insights data

### Expected story boundaries

- The core insights engine (statistics computation) may be shared across stories
- Each analysis type (bottlenecks, failures, anomalies, trends) is a candidate for a separate story
- JSON output support could be combined with any analysis story or kept separate
- CLI command registration is infrastructure supporting all stories

### Areas needing careful story framing

- The threshold values (35% for bottleneck, 15% for failure rate, 2-stddev for anomaly, plus the 40% and 20% recommendation triggers) should be explicitly stated in acceptance criteria
- The minimum data requirements (3 runs for basic insights, 6 runs for trends) need clear edge-case handling
- The recommendation text generation needs precise acceptance criteria for consistent output
- Handling of ties in "most common failure stage" should be explicit

---

## 12. Change Log (Feature-Level)

| Date       | Change                                | Reason                                           | Raised By |
|------------|---------------------------------------|--------------------------------------------------|-----------|
| 2026-02-24 | Initial feature specification created | Extend pipeline-history with actionable insights | Alex      |

@@ -0,0 +1,65 @@
# Implementation Plan — Pipeline Insights

## Summary

This feature adds a new `orchestr8 insights` CLI command that performs read-only analysis of pipeline history data. It performs bottleneck detection, failure pattern analysis, anomaly detection, and trend analysis, and outputs either human-readable recommendations or JSON. The implementation creates a new `src/insights.js` module that reuses `readHistoryFile()` from the existing `src/history.js`.

---

## Files to Create/Modify

| Path | Action | Purpose |
|------|--------|---------|
| `src/insights.js` | Create | Core analysis engine with all computation logic |
| `bin/cli.js` | Modify | Register `insights` command and route flags |

---
## Implementation Steps

1. **Create `src/insights.js` scaffold** - Export the main `displayInsights(options)` function, which reads history via `readHistoryFile()` and validates the minimum data requirement (3 runs).

2. **Implement bottleneck analysis** - Calculate the average duration per stage across successful runs; identify the stage with the highest mean; compute its percentage of the total; flag it if >35%; generate a recommendation if >40%.

3. **Implement failure pattern analysis** - Count failures by stage; compute the failure rate per stage; identify the most common failure stage; list features with repeated failures; flag if the rate is >15%; generate a recommendation if >20%.

4. **Implement anomaly detection** - Calculate the mean and stddev per stage from all runs; evaluate the last 10 runs; flag any stage duration exceeding mean + 2*stddev; include slug, stage, actual, expected, and deviation in the output.

5. **Implement trend analysis** - Require 6+ runs; split the history into first and second halves; compare success rates and average durations; classify each as improving/stable/degrading using the 10% threshold; show the percentage change.

6. **Implement output formatters** - Create `formatTextOutput(analysis, sections)` for human-readable output and `formatJsonOutput(analysis, sections)` for structured JSON; handle section filtering based on flags.

7. **Handle edge cases** - A missing history file shows "No history found. Run some pipelines first."; a corrupted file shows a warning; insufficient data (<3 runs) shows the insufficient-data message; when there are no failures, omit the failure section and show "No failures recorded".

8. **Register CLI command** - In `bin/cli.js`, import `displayInsights` from `src/insights.js`; add the `insights` command with flag parsing for `--bottlenecks`, `--failures`, `--json`.

9. **Run tests and verify** - Execute `node --test test/feature_pipeline-insights.test.js` after each file change; ensure all tests pass.

10. **Final cleanup** - Verify that output formatting matches the AC requirements; ensure recommendations use the exact wording from the spec.
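Step 6's flag-driven section filtering could look like the sketch below. The section keys, the `options` shape (`{ bottlenecks, failures, json }`), and the helper names are assumptions for illustration, not the planned formatter API. Only `--bottlenecks` and `--failures` narrow the report; `--json` changes the format, not the sections. The spec does not say what happens when both filter flags are given, so this sketch assumes both sections are shown.

```javascript
// Hypothetical sketch of flag-driven section filtering; section keys and the
// options shape are assumptions, not the shipped formatter API.
const ALL_SECTIONS = ["bottlenecks", "failures", "anomalies", "trends"];
const FILTER_FLAGS = ["bottlenecks", "failures"]; // only these flags filter, per the spec

function selectSections(options = {}) {
  const requested = FILTER_FLAGS.filter((name) => options[name] === true);
  // No filter flag means the full report.
  return requested.length > 0 ? requested : ALL_SECTIONS;
}

function formatJsonOutputSketch(analysis, sections) {
  const output = {};
  for (const name of sections) {
    // Skipped sections (e.g. insufficient data) are simply absent from the analysis.
    if (analysis[name] !== undefined) output[name] = analysis[name];
  }
  return JSON.stringify(output, null, 2);
}
```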
---

## Key Functions

**In `src/insights.js`:**

- `displayInsights(options)` - Main entry point; orchestrates analysis and output
- `analyzeBottlenecks(history)` - Returns `{ stage, avgDurationMs, percentage, isBottleneck, recommendation }`
- `analyzeFailures(history)` - Returns `{ failuresByStage, mostCommonStage, repeatedFeatures, recommendation }`
- `detectAnomalies(history)` - Returns `{ anomalies: [{slug, stage, actual, expected, deviation}], recommendation }`
- `analyzeTrends(history)` - Returns `{ successRate: {trend, change}, duration: {trend, change}, recommendation }`
- `formatTextOutput(analysis, sections)` - Formats analysis as human-readable text
- `formatJsonOutput(analysis, sections)` - Formats analysis as JSON object
- `calculateMean(values)` - Helper: compute arithmetic mean
- `calculateStdDev(values, mean)` - Helper: compute population standard deviation

**In `bin/cli.js`:**

- Extend `parseFlags()` to recognize `--bottlenecks`, `--failures`, `--json`
- Add `insights` command entry in `commands` object
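The existing `parseFlags()` signature is not shown in this plan, so the sketch below uses a standalone, hypothetical `parseInsightsFlags(args)` that takes the arguments after the command name. A `Map` is used for the lookup so that inputs like `constructor` cannot match inherited object properties, and unrecognized arguments are returned rather than silently dropped so the CLI can warn about typos. The commented wiring is illustrative only.

```javascript
// Hypothetical sketch of the insights flag parsing; the real parseFlags() in
// bin/cli.js may be structured differently.
const INSIGHTS_FLAGS = new Map([
  ["--bottlenecks", "bottlenecks"],
  ["--failures", "failures"],
  ["--json", "json"],
]);

function parseInsightsFlags(args) {
  const options = { bottlenecks: false, failures: false, json: false };
  const unknown = [];
  for (const arg of args) {
    const key = INSIGHTS_FLAGS.get(arg);
    if (key) options[key] = true;
    else unknown.push(arg); // surfaced so the CLI can warn about typos
  }
  return { options, unknown };
}

// Example wiring (commented out because displayInsights lives in src/insights.js):
// const commands = {
//   insights: () => displayInsights(parseInsightsFlags(process.argv.slice(3)).options),
// };
```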
---

## Risks/Questions

- **History schema assumption**: The implementation assumes `stages` is an object with a `{name: {durationMs}}` structure, per test-spec.md. If the actual schema differs, adapter logic may be needed.
- **Tie-breaking**: Per test-spec.md, ties in "most common failure stage" are resolved by first occurrence. The implementation should use a stable sort or maintain insertion order.
- **Standard deviation formula**: Uses population stddev (N divisor) per the test-spec.md assumption, not sample stddev (N-1).
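The two helpers named in Key Functions can be sketched as follows, showing the population formula the plan chooses. Returning 0 for an empty input is an assumption the plan does not state. For `[2, 4, 4, 4, 5, 5, 7, 9]` the mean is 5 and the squared deviations sum to 32, so the population standard deviation is √(32/8) = 2, whereas the sample formula would give √(32/7), about 2.14.

```javascript
// Sketch of the two helpers named in Key Functions; empty-input behaviour is an assumption.
function calculateMean(values) {
  if (values.length === 0) return 0;
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}

// Population standard deviation: divide the squared deviations by N, not N - 1.
function calculateStdDev(values, mean = calculateMean(values)) {
  if (values.length === 0) return 0;
  const squaredDiffs = values.reduce((sum, v) => sum + (v - mean) ** 2, 0);
  return Math.sqrt(squaredDiffs / values.length);
}
```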
@@ -0,0 +1,71 @@
# Story — Anomaly Detection

## User story

As a developer, I want to identify pipeline runs where stage durations deviated significantly from normal so that I can investigate unusual behaviour and understand outliers.

---

## Context / scope

- User has accumulated enough history data to establish baseline metrics
- Analysis uses statistical deviation (mean + 2*stddev) to identify anomalies
- Scope is limited to the last 10 runs to keep output manageable
- This is a read-only analysis; no pipeline state is modified
- Route: `orchestr8 insights` (anomaly section included by default)

Per FEATURE_SPEC.md:Section 6 (Rule: Anomaly Detection):
- Threshold: 2 standard deviations above mean
- Scope: Last 10 runs only

---

## Acceptance criteria

**AC-1 — Detect anomalous stage durations**
- Given the history file contains at least 3 pipeline runs,
- When the user runs `orchestr8 insights`,
- Then the output includes an "Anomalies" section listing any runs where a stage duration exceeded mean + 2*stddev.

**AC-2 — Display anomaly details**
- Given an anomalous run is detected,
- When the analysis completes,
- Then the output shows: feature slug, stage name, actual duration, expected duration (mean), and deviation factor.

**AC-3 — Limit scope to recent runs**
- Given the history contains more than 10 runs,
- When anomaly detection is performed,
- Then only the most recent 10 runs are evaluated for anomalies.

**AC-4 — Generate recommendation when anomalies found**
- Given one or more anomalous runs are detected,
- When the analysis completes,
- Then the output includes the recommendation: "Investigate flagged runs for unusual feature complexity".

**AC-5 — No anomalies detected**
- Given all recent runs have stage durations within 2 standard deviations of the mean,
- When the user runs `orchestr8 insights`,
- Then the anomalies section displays: "No anomalies detected in recent runs."

**AC-6 — Insufficient data for statistics**
- Given the history file contains fewer than 3 runs,
- When anomaly detection is attempted,
- Then it is skipped with the explanation: "Insufficient data for anomaly detection."

---

## Out of scope

- Configurable standard deviation threshold
- Stage-specific anomaly thresholds
- Anomaly detection for failure counts (only duration-based)
- Historical anomaly tracking beyond the last 10 runs
- Automatic investigation or drill-down into anomalous runs

---

## References

- Feature spec: `.blueprint/features/feature_pipeline-insights/FEATURE_SPEC.md`
- Upstream dependency: `.blueprint/features/feature_pipeline-history/FEATURE_SPEC.md`
- System spec: `.blueprint/system_specification/SYSTEM_SPEC.md`

@@ -0,0 +1,75 @@
# Story — Bottleneck Analysis

## User story

As a developer, I want to identify which pipeline stage consistently takes the longest so that I can focus optimization efforts where they will have the greatest impact.

---

## Context / scope

- User has executed multiple pipeline runs via `/implement-feature`
- History data exists in `.claude/pipeline-history.json`
- This is a read-only analysis; no pipeline state is modified
- Route: `orchestr8 insights` or `orchestr8 insights --bottlenecks`

Per FEATURE_SPEC.md:Section 6 (Rule: Bottleneck Detection):
- Bottleneck threshold: >35% of total pipeline time
- Recommendation threshold: >40% of total pipeline time

---

## Acceptance criteria

**AC-1 — Display bottleneck stage**
- Given the history file contains at least 3 successful pipeline runs,
- When the user runs `orchestr8 insights`,
- Then the output includes a "Bottlenecks" section identifying the stage with the highest average duration.

**AC-2 — Show percentage of total time**
- Given a bottleneck stage is identified,
- When the analysis completes,
- Then the output displays the stage name, average duration in milliseconds, and percentage of total pipeline time.

**AC-3 — Bottleneck threshold reporting**
- Given a stage accounts for more than 35% of total pipeline time,
- When the analysis completes,
- Then that stage is flagged as a bottleneck in the output.

**AC-4 — Generate recommendation for severe bottleneck**
- Given a stage accounts for more than 40% of total pipeline time,
- When the analysis completes,
- Then the output includes the recommendation: "Consider simplifying {stage} requirements or splitting features".

**AC-5 — Filter to bottlenecks only**
- Given the user runs `orchestr8 insights --bottlenecks`,
- When the analysis completes,
- Then only the bottleneck analysis section is displayed (other analysis types are omitted).

**AC-6 — Insufficient data handling**
- Given the history file contains fewer than 3 runs,
- When the user runs `orchestr8 insights`,
- Then the output displays: "Insufficient data for insights. Complete at least 3 pipeline runs."

**AC-7 — Missing history file handling**
- Given no history file exists at `.claude/pipeline-history.json`,
- When the user runs `orchestr8 insights`,
- Then the output displays: "No history found. Run some pipelines first."

---

## Out of scope

- Modifying the history file or pipeline configuration
- Customizing the 35%/40% threshold values
- Providing automated remediation
- Stage-specific threshold configuration
- Analysis of partial/in-progress runs

---

## References

- Feature spec: `.blueprint/features/feature_pipeline-insights/FEATURE_SPEC.md`
- Upstream dependency: `.blueprint/features/feature_pipeline-history/FEATURE_SPEC.md`
- System spec: `.blueprint/system_specification/SYSTEM_SPEC.md`

@@ -0,0 +1,75 @@
|
|
|
1
|
+
# Story — Failure Pattern Analysis

## User story

As a developer, I want to analyze which pipeline stages fail most frequently so that I can identify systemic issues and improve pipeline reliability.

---

## Context / scope

- User has executed multiple pipeline runs, some of which have failed
- History data includes entries with `status: "failed"` and associated stage information
- This is a read-only analysis; no pipeline state is modified
- Route: `orchestr8 insights` or `orchestr8 insights --failures`

Per FEATURE_SPEC.md:Section 6 (Rule: Failure Pattern Analysis):
- Failure rate threshold: >15% is reported as concerning
- Recommendation threshold: >20% triggers specific recommendation

---

## Acceptance criteria

**AC-1 — Display failure rates per stage**

- Given the history file contains at least 3 pipeline runs with at least one failure,
- When the user runs `orchestr8 insights`,
- Then the output includes a "Failure Patterns" section showing the failure rate for each stage that has experienced failures.

**AC-2 — Identify most common failure stage**

- Given failures exist in the history,
- When the analysis completes,
- Then the output identifies the stage with the highest failure count as the "most common failure stage".

**AC-3 — Flag concerning failure rates**

- Given a stage has a failure rate greater than 15%,
- When the analysis completes,
- Then that stage is flagged as having a concerning failure rate.

**AC-4 — Generate recommendation for high failure rate**

- Given a stage has a failure rate greater than 20%,
- When the analysis completes,
- Then the output includes the recommendation: "Review {stage} agent configuration or specification clarity".

**AC-5 — Identify features with repeated failures**

- Given the same feature slug has failed multiple times,
- When the analysis completes,
- Then those features are listed as correlation hints (e.g., "Feature 'complex-auth' has failed 3 times").

**AC-6 — Filter to failures only**

- Given the user runs `orchestr8 insights --failures`,
- When the analysis completes,
- Then only the failure pattern analysis section is displayed (other analysis types are omitted).

**AC-7 — No failures recorded**

- Given all pipeline runs in history have status "completed" (no failures),
- When the user runs `orchestr8 insights`,
- Then the failure analysis section displays: "No failures recorded" and contributes no recommendations.
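
The failure rules can be sketched the same way. This JavaScript sketch is illustrative only and rests on assumptions this story does not state: each run is assumed to carry `slug`, `status`, and a hypothetical `failedStage` field naming the stage where it failed; a stage's failure rate is taken to be its failure count divided by all runs in history; and the 3-run minimum is assumed to have been checked before this function runs. `analyzeFailures` is a hypothetical name.

```javascript
// Sketch of the failure-pattern rules (AC-1 to AC-5, AC-7). Not the orchestr8 source.
// ASSUMPTION: runs look like { slug, status: "completed" | "failed", failedStage? },
// where `failedStage` is a hypothetical field naming the stage that failed.

const CONCERNING_RATE = 15; // AC-3: flag stages above this failure rate (%)
const RECOMMEND_RATE = 20; // AC-4: recommend a review above this rate (%)

function analyzeFailures(history) {
  const failed = history.filter((run) => run.status === "failed");
  if (failed.length === 0) {
    // AC-7: report the absence of failures and produce no recommendations.
    return { message: "No failures recorded" };
  }

  // Count failures per stage and per feature slug.
  const stageCounts = new Map();
  const slugCounts = new Map();
  for (const run of failed) {
    stageCounts.set(run.failedStage, (stageCounts.get(run.failedStage) ?? 0) + 1);
    slugCounts.set(run.slug, (slugCounts.get(run.slug) ?? 0) + 1);
  }

  // ASSUMPTION: failure rate = failures at the stage / all runs in history.
  const failuresByStage = [...stageCounts].map(([stage, count]) => {
    const failureRate = (count * 100) / history.length;
    const entry = { stage, count, failureRate, isConcerning: failureRate > CONCERNING_RATE };
    if (failureRate > RECOMMEND_RATE) {
      entry.recommendation = `Review ${stage} agent configuration or specification clarity`;
    }
    return entry;
  });

  // AC-2: the stage with the highest failure count (the first one wins a tie).
  const mostCommonFailureStage = failuresByStage.reduce((best, s) =>
    s.count > best.count ? s : best
  ).stage;

  // AC-5: features that failed more than once become correlation hints.
  const featuresWithRepeatedFailures = [...slugCounts]
    .filter(([, count]) => count > 1)
    .map(([slug, count]) => `Feature '${slug}' has failed ${count} times`);

  return { failuresByStage, mostCommonFailureStage, featuresWithRepeatedFailures };
}
```

With ten runs where `complex-auth` failed three times at a `build` stage and one other run failed at `test`, `build` has a 30% failure rate (concerning, with a recommendation) and `test` has 10%. Dividing by the runs that actually reached each stage would be a reasonable alternative definition of the rate.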

---

## Out of scope

- Automatic correlation with feature complexity metrics
- Root cause analysis beyond stage identification
- Failure notification or alerting
- Retry or remediation automation
- Classification of failure types (timeout vs error vs abort)

---

## References

- Feature spec: `.blueprint/features/feature_pipeline-insights/FEATURE_SPEC.md`
- Upstream dependency: `.blueprint/features/feature_pipeline-history/FEATURE_SPEC.md`
- System spec: `.blueprint/system_specification/SYSTEM_SPEC.md`

@@ -0,0 +1,75 @@
# Story — JSON Output

## User story

As a developer, I want to export pipeline insights as structured JSON so that I can integrate the analysis with other tools or process the data programmatically.

---

## Context / scope

- User wants machine-readable output instead of human-readable text
- JSON output contains all the same analysis data as text output
- Enables integration with CI/CD pipelines, dashboards, or custom tooling
- This is a read-only analysis; no pipeline state is modified
- Route: `orchestr8 insights --json`

Per FEATURE_SPEC.md:Section 4 (Key alternatives or branches):
- `--json` flag produces structured JSON instead of formatted text

---

## Acceptance criteria

**AC-1 — Output JSON when flag provided**

- Given the history file contains valid pipeline data,
- When the user runs `orchestr8 insights --json`,
- Then the output is valid JSON (parseable by `JSON.parse()`).

**AC-2 — Include bottleneck data in JSON**

- Given bottleneck analysis completes successfully,
- When the `--json` flag is provided,
- Then the JSON output includes a `bottlenecks` object with: `stage`, `averageDurationMs`, `percentageOfTotal`, `isBottleneck`, `recommendation` (if applicable).

**AC-3 — Include failure data in JSON**

- Given failure analysis completes successfully,
- When the `--json` flag is provided,
- Then the JSON output includes a `failures` object with: `failuresByStage` (array), `mostCommonFailureStage`, `featuresWithRepeatedFailures` (array), `recommendation` (if applicable).

**AC-4 — Include anomaly data in JSON**

- Given anomaly detection completes successfully,
- When the `--json` flag is provided,
- Then the JSON output includes an `anomalies` object with: `detected` (array of anomalous runs), `recommendation` (if applicable).

**AC-5 — Include trend data in JSON**

- Given trend analysis completes successfully,
- When the `--json` flag is provided,
- Then the JSON output includes a `trends` object with: `successRate` (trend + percentage), `duration` (trend + percentage), `recommendation` (if applicable).

**AC-6 — Combine JSON with filter flags**

- Given the user runs `orchestr8 insights --json --bottlenecks`,
- When the analysis completes,
- Then the JSON output includes only the `bottlenecks` section (other analysis types are omitted).

**AC-7 — Handle insufficient data in JSON**

- Given there is insufficient data for analysis,
- When the `--json` flag is provided,
- Then the JSON output includes an `error` field with the appropriate message (e.g., `{"error": "Insufficient data for insights. Complete at least 3 pipeline runs."}`).
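
How the `--json` output might be assembled can be shown with a short JavaScript sketch. It is not the orchestr8 implementation: it assumes each analysis is available as a function returning plain data (the `analyses` object is hypothetical), models only the two filter flags documented in these stories (`--bottlenecks` and `--failures`) as booleans, and assumes a missing history file is reported through the same `error` field. `buildInsightsJson` is a hypothetical name.

```javascript
// Sketch of assembling `orchestr8 insights --json` output (AC-1, AC-6, AC-7).
// Not the orchestr8 source. ASSUMPTIONS: `analyses` maps each section name to a
// function computing that section, and `flags` holds the documented
// --bottlenecks / --failures filters as booleans.

const SECTIONS = ["bottlenecks", "failures", "anomalies", "trends"];
const FILTERABLE = ["bottlenecks", "failures"]; // only filters named in these stories

function buildInsightsJson(history, flags, analyses) {
  // AC-7 (and, by assumption, a missing history file): report via an `error` field.
  if (history == null) {
    return JSON.stringify({ error: "No history found. Run some pipelines first." }, null, 2);
  }
  if (history.length < 3) {
    return JSON.stringify(
      { error: "Insufficient data for insights. Complete at least 3 pipeline runs." },
      null,
      2
    );
  }

  // AC-6: a filter flag narrows the output; with no filter, every section is included.
  const requested = FILTERABLE.filter((name) => flags[name]);
  const selected = requested.length > 0 ? requested : SECTIONS;

  const output = {};
  for (const name of selected) {
    output[name] = analyses[name](history);
  }
  // AC-1: plain data serialized with JSON.stringify can be read back with JSON.parse().
  return JSON.stringify(output, null, 2);
}
```

Passing `{ bottlenecks: true }` yields an object whose only key is `bottlenecks`, as AC-6 requires. The two-space indentation passed to `JSON.stringify` is an arbitrary choice; alternative output formatting is out of scope for this story.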

---

## Out of scope

- Exporting JSON to a file (output to stdout only)
- JSON schema validation or versioning
- Compressed or minified JSON output options
- Integration with specific external platforms
- Historical JSON output comparison

---

## References

- Feature spec: `.blueprint/features/feature_pipeline-insights/FEATURE_SPEC.md`
- Upstream dependency: `.blueprint/features/feature_pipeline-history/FEATURE_SPEC.md`
- System spec: `.blueprint/system_specification/SYSTEM_SPEC.md`