orchestr8 2.5.0 → 2.6.0
This diff compares the contents of publicly released package versions as they appear in their respective public registries. It is provided for informational purposes only.
- package/.blueprint/agents/AGENT_BA_CASS.md +42 -19
- package/.blueprint/agents/AGENT_DEVELOPER_CODEY.md +42 -38
- package/.blueprint/agents/AGENT_SPECIFICATION_ALEX.md +45 -0
- package/.blueprint/agents/AGENT_TESTER_NIGEL.md +42 -21
- package/.blueprint/features/feature_adaptive-retry/FEATURE_SPEC.md +239 -0
- package/.blueprint/features/feature_adaptive-retry/IMPLEMENTATION_PLAN.md +48 -0
- package/.blueprint/features/feature_adaptive-retry/story-prompt-modification.md +85 -0
- package/.blueprint/features/feature_adaptive-retry/story-retry-config.md +89 -0
- package/.blueprint/features/feature_adaptive-retry/story-should-retry.md +98 -0
- package/.blueprint/features/feature_adaptive-retry/story-strategy-recommendation.md +85 -0
- package/.blueprint/features/feature_agent-guardrails/FEATURE_SPEC.md +328 -0
- package/.blueprint/features/feature_agent-guardrails/IMPLEMENTATION_PLAN.md +90 -0
- package/.blueprint/features/feature_agent-guardrails/story-citation-requirements.md +50 -0
- package/.blueprint/features/feature_agent-guardrails/story-confidentiality.md +50 -0
- package/.blueprint/features/feature_agent-guardrails/story-escalation-protocol.md +55 -0
- package/.blueprint/features/feature_agent-guardrails/story-source-restrictions.md +50 -0
- package/.blueprint/features/feature_feedback-loop/FEATURE_SPEC.md +347 -0
- package/.blueprint/features/feature_feedback-loop/IMPLEMENTATION_PLAN.md +71 -0
- package/.blueprint/features/feature_feedback-loop/story-feedback-collection.md +63 -0
- package/.blueprint/features/feature_feedback-loop/story-feedback-config.md +61 -0
- package/.blueprint/features/feature_feedback-loop/story-feedback-insights.md +63 -0
- package/.blueprint/features/feature_feedback-loop/story-quality-gates.md +57 -0
- package/.blueprint/features/feature_pipeline-history/FEATURE_SPEC.md +239 -0
- package/.blueprint/features/feature_pipeline-history/IMPLEMENTATION_PLAN.md +71 -0
- package/.blueprint/features/feature_pipeline-history/story-clear-history.md +73 -0
- package/.blueprint/features/feature_pipeline-history/story-display-history.md +75 -0
- package/.blueprint/features/feature_pipeline-history/story-record-execution.md +76 -0
- package/.blueprint/features/feature_pipeline-history/story-show-statistics.md +85 -0
- package/.blueprint/features/feature_pipeline-insights/FEATURE_SPEC.md +288 -0
- package/.blueprint/features/feature_pipeline-insights/IMPLEMENTATION_PLAN.md +65 -0
- package/.blueprint/features/feature_pipeline-insights/story-anomaly-detection.md +71 -0
- package/.blueprint/features/feature_pipeline-insights/story-bottleneck-analysis.md +75 -0
- package/.blueprint/features/feature_pipeline-insights/story-failure-patterns.md +75 -0
- package/.blueprint/features/feature_pipeline-insights/story-json-output.md +75 -0
- package/.blueprint/features/feature_pipeline-insights/story-trend-analysis.md +78 -0
- package/.blueprint/features/feature_validate-command/FEATURE_SPEC.md +209 -0
- package/.blueprint/features/feature_validate-command/IMPLEMENTATION_PLAN.md +59 -0
- package/.blueprint/features/feature_validate-command/story-failure-output.md +61 -0
- package/.blueprint/features/feature_validate-command/story-node-version-check.md +52 -0
- package/.blueprint/features/feature_validate-command/story-run-validation.md +59 -0
- package/.blueprint/features/feature_validate-command/story-success-output.md +50 -0
- package/.blueprint/system_specification/SYSTEM_SPEC.md +248 -0
- package/README.md +170 -38
- package/SKILL.md +333 -23
- package/bin/cli.js +128 -20
- package/package.json +1 -1
- package/src/feedback.js +171 -0
- package/src/history.js +306 -0
- package/src/index.js +57 -2
- package/src/init.js +2 -6
- package/src/insights.js +504 -0
- package/src/retry.js +274 -0
- package/src/validate.js +172 -0
- package/src/skills.js +0 -93
package/.blueprint/features/feature_pipeline-history/story-display-history.md
@@ -0,0 +1,75 @@
# Story — Display Pipeline History

## User story

As a developer using orchestr8, I want to view recent pipeline runs via CLI so that I can review execution history and identify patterns.

---

## Context / scope

- New CLI command: `orchestr8 history`
- Displays a tabular list of recent pipeline executions
- Per FEATURE_SPEC.md:Section 6 (Rule: Display Limit Default), shows the last 10 runs by default
- Requires `.claude/pipeline-history.json` to exist with valid entries

---

## Acceptance criteria

**AC-1 — Display recent runs**
- Given `.claude/pipeline-history.json` contains history entries,
- When I run `orchestr8 history`,
- Then I see a list of the 10 most recent runs showing: slug, status, date, total duration.

**AC-2 — Display all runs with flag**
- Given `.claude/pipeline-history.json` contains more than 10 entries,
- When I run `orchestr8 history --all`,
- Then I see all history entries (not truncated to 10).

**AC-3 — Empty history message**
- Given `.claude/pipeline-history.json` is empty or does not exist,
- When I run `orchestr8 history`,
- Then I see a message: "No pipeline history found."

**AC-4 — Corrupted file handling**
- Given `.claude/pipeline-history.json` contains invalid JSON,
- When I run `orchestr8 history`,
- Then I see a warning: "History file is corrupted. Run 'orchestr8 history clear' to reset."
- And the command exits with code 0 (non-blocking).

**AC-5 — Status colour coding**
- Given history entries with different statuses,
- When displayed in the terminal,
- Then `success` entries show in green, `failed` in red, `paused` in yellow.

**AC-6 — Most recent first ordering**
- Given multiple history entries,
- When displayed,
- Then entries are ordered by `completedAt` descending (most recent first).

---

## CLI output format

```
Pipeline History (showing 10 of 25 runs)

SLUG           STATUS    DATE                 DURATION
user-auth      success   2026-02-24 10:15:00  15m 32s
payment-flow   failed    2026-02-24 09:45:00  8m 12s   (failed at: nigel)
checkout-page  paused    2026-02-23 16:30:00  5m 00s   (paused at: cass)
...

Run 'orchestr8 history --all' to see all entries.
Run 'orchestr8 history --stats' for aggregate statistics.
```

---

## Out of scope

- Filtering by status (e.g., `--status=failed`)
- Filtering by date range
- Pagination beyond the simple `--all` flag
- Detailed per-stage breakdown in list view (use `--stats` for that)
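The selection logic behind AC-1, AC-2, and AC-6 can be sketched in a few lines. This is an illustrative sketch only; `selectRunsForDisplay` is an assumed name, not a function shipped in the package's `src/history.js`.

```javascript
// Illustrative sketch of the display selection rules: sort entries
// most-recent-first by completedAt (AC-6) and truncate to the 10 most
// recent unless --all is given (AC-1, AC-2).
function selectRunsForDisplay(entries, { all = false } = {}) {
  const sorted = [...entries].sort(
    (a, b) => new Date(b.completedAt) - new Date(a.completedAt)
  );
  return all ? sorted : sorted.slice(0, 10);
}

// Example: 12 entries spanning consecutive days.
const entries = Array.from({ length: 12 }, (_, i) => ({
  slug: `feature-${i}`,
  status: "success",
  completedAt: new Date(Date.UTC(2026, 1, 1 + i)).toISOString(),
}));

const shown = selectRunsForDisplay(entries);
// shown holds the 10 most recent runs, newest first.
```

The status colouring and column layout of AC-5 are presentation concerns layered on top of this selection step.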
package/.blueprint/features/feature_pipeline-history/story-record-execution.md
@@ -0,0 +1,76 @@
# Story — Record Pipeline Execution

## User story

As a developer using orchestr8, I want pipeline execution data to be automatically recorded during runs so that I have historical data for later analysis.

---

## Context / scope

- Applies to all pipeline invocations via `/implement-feature`
- Recording occurs at stage boundaries: alex, cass, nigel, codey-plan, codey-implement
- Data persisted to `.claude/pipeline-history.json`
- Per FEATURE_SPEC.md:Section 5 (State & Lifecycle), recording creates new history entries without altering pipeline flow

---

## Acceptance criteria

**AC-1 — History entry created on pipeline completion**
- Given I invoke `/implement-feature "my-feature"`,
- When the pipeline completes successfully,
- Then a history entry is appended to `.claude/pipeline-history.json` with status `success`.

**AC-2 — History entry created on pipeline failure**
- Given I invoke `/implement-feature "my-feature"`,
- When a stage fails during execution,
- Then a history entry is appended with status `failed` and the failing stage recorded.

**AC-3 — History entry created on pipeline pause**
- Given I invoke `/implement-feature "my-feature" --pause-after=cass`,
- When the pipeline pauses after the specified stage,
- Then a history entry is appended with status `paused` and the stages completed up to the pause point.

**AC-4 — Timestamps recorded per stage**
- Given a pipeline run completes (success, failure, or pause),
- When the history entry is created,
- Then each completed stage has `startedAt` and `completedAt` timestamps in ISO 8601 format.

**AC-5 — History file created if absent**
- Given `.claude/pipeline-history.json` does not exist,
- When a pipeline run completes,
- Then the file is created with an array containing the single history entry.

**AC-6 — Recording failure does not abort pipeline**
- Given the history file cannot be written (e.g., a permissions error),
- When a pipeline run completes,
- Then a warning is logged but the pipeline completes normally.

---

## History entry structure

```json
{
  "slug": "my-feature",
  "status": "success" | "failed" | "paused",
  "startedAt": "2026-02-24T10:00:00.000Z",
  "completedAt": "2026-02-24T10:15:00.000Z",
  "stages": {
    "alex": { "startedAt": "...", "completedAt": "...", "durationMs": 120000 },
    "cass": { "startedAt": "...", "completedAt": "...", "durationMs": 90000 },
    ...
  },
  "failedStage": "nigel" | null
}
```

---

## Out of scope

- Real-time metrics streaming during execution
- Detailed error logs or stack traces (only stage-level status)
- Modifying past history entries
- History file rotation or size management
package/.blueprint/features/feature_pipeline-history/story-show-statistics.md
@@ -0,0 +1,85 @@
# Story — Show Pipeline Statistics

## User story

As a developer using orchestr8, I want to see aggregate statistics about pipeline performance so that I can identify bottlenecks and improvement opportunities.

---

## Context / scope

- New CLI flag: `orchestr8 history --stats`
- Computes statistics from all entries in `.claude/pipeline-history.json`
- Per FEATURE_SPEC.md:Section 6 (Rule: Statistics Aggregation), statistics are computed on-read
- Per FEATURE_SPEC.md:Section 4 (User-visible outcomes): success rate, average duration, most common failure stage

---

## Acceptance criteria

**AC-1 — Display success rate**
- Given history contains both successful and failed runs,
- When I run `orchestr8 history --stats`,
- Then I see the success rate as a percentage (e.g., "Success rate: 85% (17/20 runs)").

**AC-2 — Display average duration per stage**
- Given history contains completed runs,
- When I run `orchestr8 history --stats`,
- Then I see the average duration for each stage (alex, cass, nigel, codey-plan, codey-implement).

**AC-3 — Display total average duration**
- Given history contains completed runs,
- When I run `orchestr8 history --stats`,
- Then I see the average total pipeline duration across all successful runs.

**AC-4 — Display most common failure stage**
- Given history contains failed runs,
- When I run `orchestr8 history --stats`,
- Then I see the stage that has failed most frequently (e.g., "Most common failure: nigel (3 failures)").

**AC-5 — Handle ties in failure stage**
- Given multiple stages have equal failure counts,
- When I run `orchestr8 history --stats`,
- Then all tied stages are listed (e.g., "Most common failures: cass, nigel (2 each)").

**AC-6 — No failures message**
- Given history contains only successful runs,
- When I run `orchestr8 history --stats`,
- Then I see "No failures recorded" instead of failure stage data.

**AC-7 — Insufficient data message**
- Given history is empty or has no completed runs,
- When I run `orchestr8 history --stats`,
- Then I see "Insufficient data for statistics. Complete at least one pipeline run."

---

## CLI output format

```
Pipeline Statistics (based on 25 runs)

METRIC                 VALUE
Success rate           77% (17/22 completed runs)
Total runs             25 (17 success, 5 failed, 3 paused)
Avg pipeline duration  12m 45s

STAGE            AVG DURATION  FAILURES
alex             2m 30s        0
cass             1m 45s        1
nigel            3m 00s        3
codey-plan       1m 15s        0
codey-implement  4m 15s        1

Most common failure: nigel (3 failures)
```

---

## Out of scope

- Median duration (per FEATURE_SPEC.md:Section 9 open question)
- Percentile calculations (p50, p90, p99)
- Time-range filtering for statistics
- Graphical visualisations
- Export to external formats
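The aggregation behind AC-1 and the tie handling of AC-5 can be sketched as follows. This is an illustrative sketch; `computeStats` and its return shape are assumptions, not the package's actual API.

```javascript
// Illustrative sketch: success rate over completed (non-paused) runs (AC-1)
// and most common failure stage(s), listing all ties (AC-4, AC-5).
function computeStats(entries) {
  const completed = entries.filter((e) => e.status !== "paused");
  const successes = completed.filter((e) => e.status === "success").length;
  const successRate = completed.length
    ? Math.round((successes / completed.length) * 100)
    : null;

  // Count failures per stage.
  const failureCounts = {};
  for (const e of entries) {
    if (e.status === "failed" && e.failedStage) {
      failureCounts[e.failedStage] = (failureCounts[e.failedStage] || 0) + 1;
    }
  }
  // Keep every stage at the maximum count, so ties are all reported (AC-5).
  const max = Math.max(0, ...Object.values(failureCounts));
  const mostCommonFailures =
    max > 0
      ? Object.keys(failureCounts).filter((s) => failureCounts[s] === max)
      : [];

  return { successRate, mostCommonFailures };
}
```

Note that paused runs are excluded from the success-rate denominator but still counted in the per-stage failure tally, matching the sample output's "completed runs" wording.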
package/.blueprint/features/feature_pipeline-insights/FEATURE_SPEC.md
@@ -0,0 +1,288 @@
# Feature Specification — Pipeline Insights

## 1. Feature Intent

**Why this feature exists.**

- **Problem being addressed:** The pipeline-history feature captures execution data but provides only basic statistics. Users cannot identify optimization opportunities—such as bottleneck stages, failure patterns, or performance trends—without manual analysis of the raw history data.
- **User need:** Developers want actionable recommendations to improve pipeline efficiency. They need to understand which stages are slowest, why failures occur, and whether the pipeline is improving or degrading over time.
- **System purpose alignment:** Per SYSTEM_SPEC.md:Section 8 (Cross-Cutting Concerns:Observability), the system aims for observability via queue status and agent summaries. Per SYSTEM_SPEC.md:Section 2 (Business & Domain Context), orchestr8 seeks to provide "structured processes to guide AI-generated code." This feature extends observability into actionable intelligence, enabling users to optimize their development workflow.

> This feature builds upon the existing pipeline-history feature (`.blueprint/features/feature_pipeline-history/FEATURE_SPEC.md`) without modifying history recording. It is a read-only analysis layer.

---

## 2. Scope

### In Scope

- New CLI command `orchestr8 insights` that analyzes `.claude/pipeline-history.json`
- Bottleneck detection: identify which stage consistently takes longest
- Failure pattern analysis: determine which stages fail most and correlate with feature characteristics
- Anomaly detection: flag runs that deviate significantly from average durations
- Trend analysis: track whether pipeline performance is improving or degrading over time
- Agent performance comparison: compare stage durations and success rates across agents
- Flag support for filtering analysis types (`--bottlenecks`, `--failures`, `--json`)
- Human-readable recommendations based on detected patterns

### Out of Scope

- Modifying pipeline-history recording logic (that feature is separate)
- Machine learning or complex statistical models (simple heuristics only)
- Automatic remediation or pipeline configuration changes
- Integration with external analytics platforms
- Predictive modelling of future pipeline performance
- Feature-type classification (assumes slugs are opaque identifiers)

---

## 3. Actors Involved

### Human User

- **Can do:** Invoke `orchestr8 insights` to view optimization recommendations; filter by analysis type; export as JSON for programmatic use
- **Cannot do:** Modify the analysis thresholds or algorithms; act on recommendations automatically

### Insights Analyzer (internal component)

- **Can do:** Read the history file; compute statistics; generate recommendations; output formatted reports
- **Cannot do:** Write to the history file; modify pipeline configuration; alter agent behaviour

---

## 4. Behaviour Overview

### Happy-path behaviour

1. User runs `orchestr8 insights` after accumulating several pipeline runs
2. System reads `.claude/pipeline-history.json` and validates data sufficiency
3. System performs analysis across four dimensions: bottlenecks, failures, anomalies, trends
4. System generates a human-readable report with recommendations
5. User reviews the recommendations and decides which to act upon

### Key alternatives or branches

- **Insufficient data:** If fewer than 3 runs exist, display the message: "Insufficient data for insights. Complete at least 3 pipeline runs."
- **No failures:** If all runs succeeded, omit the failure analysis section; note "No failures recorded"
- **Filtered analysis:** If the `--bottlenecks` or `--failures` flag is provided, display only that section
- **JSON output:** If the `--json` flag is provided, output structured JSON instead of formatted text
- **Corrupted history:** If the history file is corrupted, display a warning and exit gracefully

### User-visible outcomes

- Identification of the slowest pipeline stage with its percentage of total time
- List of stages with high failure rates and potential contributing factors
- Flagged anomalous runs that deviated significantly from norms
- Trend indicators showing improvement or degradation over recent runs
- Actionable recommendations for each identified issue

---

## 5. State & Lifecycle Interactions

### States entered

- None. This feature is stateless and read-only.

### States modified

- None. This feature does not modify any system state.

### This feature is:

- **Not state-creating:** Does not persist analysis results
- **Not state-transitioning:** Does not alter pipeline flow
- **Not state-constraining:** Does not block any operations

This is a pure read-only analysis feature that operates on existing history data.

---
## 6. Rules & Decision Logic

### Rule: Bottleneck Detection

- **Description:** Identify the stage that consumes the largest proportion of total pipeline time
- **Inputs:** Stage durations from successful runs
- **Outputs:** Stage name, average duration, percentage of total pipeline time
- **Algorithm:** Calculate the mean duration per stage; identify the stage with the highest mean; compute it as a percentage of the sum
- **Threshold:** Report as a bottleneck if the stage accounts for >35% of total pipeline time
- **Deterministic:** Yes
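The bottleneck rule above can be sketched directly from its algorithm and threshold. This is an illustrative sketch; the shipped function in `src/insights.js` is named `analyzeBottlenecks` and may differ in shape.

```javascript
// Illustrative sketch of the bottleneck rule: mean duration per stage from
// successful runs, flagged when the slowest stage's share of the summed
// stage means exceeds the 35% threshold.
function analyzeBottleneck(entries, threshold = 0.35) {
  const totals = {};
  const counts = {};
  for (const run of entries) {
    if (run.status !== "success") continue;
    for (const [stage, data] of Object.entries(run.stages || {})) {
      totals[stage] = (totals[stage] || 0) + data.durationMs;
      counts[stage] = (counts[stage] || 0) + 1;
    }
  }
  const means = {};
  for (const stage of Object.keys(totals)) {
    means[stage] = totals[stage] / counts[stage];
  }
  const sum = Object.values(means).reduce((a, b) => a + b, 0);
  let slowest = null;
  for (const stage of Object.keys(means)) {
    if (!slowest || means[stage] > means[slowest]) slowest = stage;
  }
  if (!slowest) return null;
  const share = means[slowest] / sum;
  return { stage: slowest, share, isBottleneck: share > threshold };
}
```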
### Rule: Failure Pattern Analysis

- **Description:** Identify stages with disproportionate failure rates and correlate with feature characteristics
- **Inputs:** All history entries with status `failed`; feature slugs
- **Outputs:** Failure rate per stage; most common failure stage; correlation hints
- **Algorithm:** Count failures by stage; compute the failure rate as failures/total runs for that stage; identify features with repeated failures
- **Threshold:** Report as concerning if the failure rate is >15%
- **Deterministic:** Yes
### Rule: Anomaly Detection

- **Description:** Flag individual runs where a stage duration deviates significantly from the average
- **Inputs:** Stage durations from all runs; calculated means and standard deviations
- **Outputs:** List of anomalous runs with stage, actual duration, expected duration
- **Algorithm:** Calculate the mean and standard deviation per stage; flag if duration > mean + 2*stddev
- **Threshold:** 2 standard deviations above the mean
- **Scope:** Last 10 runs only (to limit output)
- **Deterministic:** Yes
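The anomaly rule is a straight mean-plus-two-stddev check. A minimal sketch, assuming the history entry shape from the pipeline-history feature; the shipped `detectAnomalies` in `src/insights.js` may differ:

```javascript
// Illustrative sketch of the anomaly rule: per-stage mean and (population)
// standard deviation over all runs, then flag any stage duration in the
// last `recent` runs that exceeds mean + 2 * stddev.
function detectAnomalies(entries, recent = 10) {
  const byStage = {};
  for (const run of entries) {
    for (const [stage, d] of Object.entries(run.stages || {})) {
      (byStage[stage] = byStage[stage] || []).push(d.durationMs);
    }
  }
  const stats = {};
  for (const [stage, xs] of Object.entries(byStage)) {
    const mean = xs.reduce((a, b) => a + b, 0) / xs.length;
    const variance = xs.reduce((a, x) => a + (x - mean) ** 2, 0) / xs.length;
    stats[stage] = { mean, stddev: Math.sqrt(variance) };
  }
  const anomalies = [];
  for (const run of entries.slice(-recent)) { // last 10 runs only, per the rule
    for (const [stage, d] of Object.entries(run.stages || {})) {
      const { mean, stddev } = stats[stage];
      if (d.durationMs > mean + 2 * stddev) {
        anomalies.push({ slug: run.slug, stage, actual: d.durationMs, expected: mean });
      }
    }
  }
  return anomalies;
}
```

One subtlety worth noting: with very few runs, a single outlier cannot mathematically exceed 2 population standard deviations, which is another reason the feature requires a minimum number of runs before analysing.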
### Rule: Trend Analysis

- **Description:** Determine whether pipeline performance is improving or degrading over time
- **Inputs:** All history entries, sorted chronologically
- **Outputs:** Success rate trend (improving/stable/degrading); duration trend (improving/stable/degrading)
- **Algorithm:** Compare metrics from the first half vs the second half of the history; compute the percentage change
- **Thresholds:** Improving if >10% better; degrading if >10% worse; stable otherwise
- **Minimum data:** Requires at least 6 runs to compute trends
- **Deterministic:** Yes
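The duration half of the trend rule can be sketched as below. The function name and the `totalDurationMs` field follow the assumed history schema from Section 9; this is a sketch of the rule, not the shipped code.

```javascript
// Illustrative sketch of the duration trend: compare the average total
// duration of the first half of chronologically ordered runs against the
// second half; a >10% change in either direction classifies the trend.
function classifyDurationTrend(entries) {
  if (entries.length < 6) return "insufficient-data"; // minimum-data rule
  const mid = Math.floor(entries.length / 2);
  const avg = (runs) =>
    runs.reduce((a, r) => a + r.totalDurationMs, 0) / runs.length;
  const first = avg(entries.slice(0, mid));
  const second = avg(entries.slice(mid));
  const change = (second - first) / first; // negative change = faster = improving
  if (change < -0.10) return "improving";
  if (change > 0.10) return "degrading";
  return "stable";
}
```

The success-rate trend would follow the same first-half/second-half comparison with the sign of "better" inverted (a higher rate is improving).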
### Rule: Agent Performance Comparison

- **Description:** Compare duration and success metrics across agent stages
- **Inputs:** All history entries with stage data
- **Outputs:** Ranked list of stages by average duration; success rate per stage
- **Algorithm:** Aggregate durations and success/failure counts per stage; rank by mean duration
- **Deterministic:** Yes

### Rule: Recommendation Generation

- **Description:** Generate actionable recommendations based on detected patterns
- **Inputs:** Analysis results from all rules above
- **Outputs:** Human-readable recommendation strings
- **Logic:**
  - If the bottleneck stage is >40% of total time → "Consider simplifying {stage} requirements or splitting features"
  - If the failure rate is >20% on a stage → "Review {stage} agent configuration or specification clarity"
  - If anomalies are detected → "Investigate flagged runs for unusual feature complexity"
  - If the trend is degrading → "Review recent changes to agent specifications or system spec"
- **Deterministic:** Yes (same inputs produce the same recommendations)

---
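The recommendation rule is a deterministic mapping from analysis results to fixed strings. A sketch under assumed input shapes (the `bottleneck`, `failures`, `anomalies`, and `trend` fields here are illustrative, not the package's actual intermediate structures):

```javascript
// Illustrative sketch of recommendation generation: each detected pattern
// maps to one of the fixed recommendation strings from the Logic bullets.
function generateRecommendations({ bottleneck, failures, anomalies, trend }) {
  const recs = [];
  if (bottleneck && bottleneck.share > 0.40) {
    recs.push(`Consider simplifying ${bottleneck.stage} requirements or splitting features`);
  }
  const rates = (failures && failures.ratesByStage) || {};
  for (const [stage, rate] of Object.entries(rates)) {
    if (rate > 0.20) {
      recs.push(`Review ${stage} agent configuration or specification clarity`);
    }
  }
  if (anomalies && anomalies.length > 0) {
    recs.push("Investigate flagged runs for unusual feature complexity");
  }
  if (trend === "degrading") {
    recs.push("Review recent changes to agent specifications or system spec");
  }
  return recs;
}
```

Because the mapping is a pure function of the analysis results, the same history file always yields the same recommendations, matching the rule's determinism claim.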
## 7. Dependencies

### System components

- `src/history.js` — Must expose a `readHistoryFile()` function; it currently exports this
- `bin/cli.js` — Must register the new `insights` command
- `.claude/pipeline-history.json` — Must exist with entries from the pipeline-history feature

### Upstream features

- **pipeline-history** (`.blueprint/features/feature_pipeline-history/`) — This feature depends entirely on history data recorded by pipeline-history. The history entry schema (slug, status, stages, timestamps, durations) must be stable.

### External systems

- None

### Operational dependencies

- File system read access to `.claude/pipeline-history.json`

---

## 8. Non-Functional Considerations

### Performance sensitivity

- Analysis is computed on demand from the full history file
- ASSUMPTION: History files contain <500 entries; O(n) algorithms are acceptable
- No caching required; each invocation recomputes from scratch

### Audit/logging needs

- None. This feature is read-only and does not produce persistent outputs.

### Error tolerance

- If the history file is missing, display "No history found. Run some pipelines first."
- If the history file is corrupted, display a warning and exit gracefully
- If there is insufficient data for a specific analysis, skip that section with an explanation

### Security implications

- Feature slugs may reveal project information; output to terminal only
- JSON output should not include sensitive data beyond what is already in the history file

---

## 9. Assumptions & Open Questions

### Assumptions

- ASSUMPTION: The history entry schema from pipeline-history is stable: `{ slug, status, stages, completedAt, totalDurationMs }`
- ASSUMPTION: Stage names are fixed: `alex`, `cass`, `nigel`, `codey-plan`, `codey-implement`
- ASSUMPTION: 2 standard deviations is an appropriate anomaly threshold for this domain
- ASSUMPTION: 6 runs provide sufficient data for meaningful trend analysis
- ASSUMPTION: Users will act on recommendations manually; no automation is required

### Open Questions

- Should anomaly detection use stage-specific thresholds rather than a uniform 2-stddev rule?
- Should trend analysis use a sliding window rather than a first-half/second-half comparison?
- Should there be a `--verbose` flag for more detailed analysis output?
- Should the feature support analysis of a specific time range (e.g., the last 30 days)?

---

## 10. Impact on System Specification

### Alignment assessment

This feature **reinforces existing system assumptions**:

- Per SYSTEM_SPEC.md:Section 8 (Observability), the system already aims for visibility into pipeline execution
- Per SYSTEM_SPEC.md:Section 5 (Core Domain Concepts), the queue and pipeline concepts are well-defined
- This feature adds an intelligence layer without altering core pipeline behaviour

### No contradictions identified

The feature does not alter:

- Agent roles or boundaries
- Pipeline flow or stage order
- Artifact structures or handoff mechanisms
- History recording behaviour (defers entirely to the pipeline-history feature)

### Minor extension to system spec

The following addition to SYSTEM_SPEC.md:Section 5 (Core Domain Concepts) may be warranted:

> **Pipeline Insights** — An analysis layer that examines historical pipeline data to identify bottlenecks, failure patterns, anomalies, and trends. Provides recommendations for pipeline optimization without modifying pipeline behaviour.

This is flagged as a **non-breaking extension** for consideration.

---

## 11. Handover to BA (Cass)

### Story themes

1. **Bottleneck analysis** — Identifying and reporting the slowest pipeline stages
2. **Failure pattern analysis** — Analyzing failure frequency and generating recommendations
3. **Anomaly detection** — Flagging runs that deviate significantly from averages
4. **Trend analysis** — Computing and displaying performance trends over time
5. **JSON output** — Supporting programmatic consumption of insights data

### Expected story boundaries

- The core insights engine (statistics computation) may be shared across stories
- Each analysis type (bottlenecks, failures, anomalies, trends) is a candidate for a separate story
- JSON output support could be combined with any analysis story or kept separate
- CLI command registration is infrastructure supporting all stories

### Areas needing careful story framing

- The threshold values (35% for bottlenecks, 15% for failure rates, 2 stddev for anomalies) should be explicitly stated in acceptance criteria
- The minimum data requirements (3 runs for basic insights, 6 runs for trends) need clear edge-case handling
- The recommendation text generation needs precise acceptance criteria for consistent output
- Handling of ties in "most common failure stage" should be explicit

---

## 12. Change Log (Feature-Level)

| Date       | Change                                | Reason                                           | Raised By |
|------------|---------------------------------------|--------------------------------------------------|-----------|
| 2026-02-24 | Initial feature specification created | Extend pipeline-history with actionable insights | Alex      |
@@ -0,0 +1,65 @@
|
|
|
1
|
+
# Implementation Plan — Pipeline Insights
|
|
2
|
+
|
|
3
|
+
## Summary
|
|
4
|
+
|
|
5
|
+
This feature adds a new `orchestr8 insights` CLI command that performs read-only analysis of pipeline history data. It computes bottleneck detection, failure patterns, anomaly detection, and trend analysis, outputting human-readable recommendations or JSON. The implementation creates a new `src/insights.js` module that reuses `readHistoryFile()` from the existing `src/history.js`.

---

## Files to Create/Modify

| Path | Action | Purpose |
|------|--------|---------|
| `src/insights.js` | Create | Core analysis engine with all computation logic |
| `bin/cli.js` | Modify | Register `insights` command and route flags |

---

## Implementation Steps

1. **Create `src/insights.js` scaffold** - Export the main `displayInsights(options)` function, which reads history via `readHistoryFile()` and validates minimum data (3 runs).

2. **Implement bottleneck analysis** - Calculate the average duration per stage across successful runs; identify the stage with the highest mean; compute its percentage of the total; flag if >35%; generate a recommendation if >40%.

3. **Implement failure pattern analysis** - Count failures by stage; compute the failure rate per stage; identify the most common failure stage; list features with repeated failures; flag if the rate is >15%; generate a recommendation if >20%.

4. **Implement anomaly detection** - Calculate the mean and stddev per stage from all runs; evaluate the last 10 runs; flag any stage duration exceeding mean + 2*stddev; include slug, stage, actual, expected, and deviation in the output.

5. **Implement trend analysis** - Require 6+ runs; split the history into first and second halves; compare success rates and average durations; classify as improving/stable/degrading based on a 10% threshold; show the percentage change.

6. **Implement output formatters** - Create `formatTextOutput(analysis)` for human-readable output and `formatJsonOutput(analysis)` for structured JSON; handle section filtering based on flags.

7. **Handle edge cases** - A missing history file returns "No history found"; a corrupted file shows a warning; insufficient data (<3 runs) shows an appropriate message; no failures omits the failure section with "No failures recorded".

8. **Register CLI command** - In `bin/cli.js`, import `displayInsights` from `src/insights.js`; add the `insights` command with flag parsing for `--bottlenecks`, `--failures`, `--json`.

9. **Run tests and verify** - Execute `node --test test/feature_pipeline-insights.test.js` after each file change; ensure all tests pass.

10. **Final cleanup** - Verify that output formatting matches the AC requirements; ensure recommendations use the exact wording from the spec.
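Step 4 is the statistically heaviest piece. A sketch, assuming the `{slug, stages: {name: {durationMs}}}` history shape noted under Risks/Questions (the baseline uses all runs; only the last 10 are candidates for flagging):

```javascript
// Sketch of step 4: flag stage durations above mean + 2 * population stddev.
// The history entry shape {slug, stages: {name: {durationMs}}} is an assumption.
function detectAnomalies(history) {
  // Baseline: collect every duration per stage across the full history.
  const byStage = {};
  for (const run of history) {
    for (const [stage, info] of Object.entries(run.stages || {})) {
      (byStage[stage] ||= []).push(info.durationMs);
    }
  }
  const stats = {};
  for (const [stage, values] of Object.entries(byStage)) {
    const mean = values.reduce((a, b) => a + b, 0) / values.length;
    const variance = values.reduce((a, v) => a + (v - mean) ** 2, 0) / values.length;
    stats[stage] = { mean, stddev: Math.sqrt(variance) };
  }
  // Only the most recent 10 runs are evaluated for anomalies.
  const anomalies = [];
  for (const run of history.slice(-10)) {
    for (const [stage, info] of Object.entries(run.stages || {})) {
      const { mean, stddev } = stats[stage];
      if (info.durationMs > mean + 2 * stddev) {
        anomalies.push({
          slug: run.slug,
          stage,
          actual: info.durationMs,
          expected: mean,
          deviation: stddev > 0 ? (info.durationMs - mean) / stddev : Infinity,
        });
      }
    }
  }
  return { anomalies };
}
```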

---

## Key Functions

**In `src/insights.js`:**

- `displayInsights(options)` - Main entry point; orchestrates analysis and output
- `analyzeBottlenecks(history)` - Returns `{ stage, avgDurationMs, percentage, isBottleneck, recommendation }`
- `analyzeFailures(history)` - Returns `{ failuresByStage, mostCommonStage, repeatedFeatures, recommendation }`
- `detectAnomalies(history)` - Returns `{ anomalies: [{slug, stage, actual, expected, deviation}], recommendation }`
- `analyzeTrends(history)` - Returns `{ successRate: {trend, change}, duration: {trend, change}, recommendation }`
- `formatTextOutput(analysis, sections)` - Formats analysis as human-readable text
- `formatJsonOutput(analysis, sections)` - Formats analysis as JSON object
- `calculateMean(values)` - Helper: compute arithmetic mean
- `calculateStdDev(values, mean)` - Helper: compute population standard deviation
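The two helpers are small enough to sketch in full (population standard deviation, i.e. the N divisor):

```javascript
// Helper sketches: arithmetic mean and population standard deviation (N divisor).
function calculateMean(values) {
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}

function calculateStdDev(values, mean) {
  const variance =
    values.reduce((sum, v) => sum + (v - mean) ** 2, 0) / values.length;
  return Math.sqrt(variance);
}
```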

**In `bin/cli.js`:**

- Extend `parseFlags()` to recognize `--bottlenecks`, `--failures`, `--json`
- Add `insights` command entry in `commands` object
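A minimal sketch of the two `bin/cli.js` changes; the actual `parseFlags()` and `commands` shapes in the existing CLI are assumptions and may differ:

```javascript
// Sketch of the flag recognition and command routing described above.
// The real parseFlags/commands structures in bin/cli.js may differ.
function parseFlags(argv) {
  return {
    bottlenecks: argv.includes("--bottlenecks"),
    failures: argv.includes("--failures"),
    json: argv.includes("--json"),
  };
}

const commands = {
  insights: (argv) => {
    const flags = parseFlags(argv);
    // displayInsights(flags) would be invoked here in the real CLI.
    return flags;
  },
};
```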

---

## Risks/Questions

- **History schema assumption**: The implementation assumes `stages` is an object with a `{name: {durationMs}}` structure, per test-spec.md. If the actual schema differs, adapter logic may be needed.
- **Tie-breaking**: Per test-spec.md, ties in "most common failure stage" are resolved by first occurrence. The implementation should use a stable sort or maintain insertion order.
- **Standard deviation formula**: Use population stddev (N divisor) per the test-spec.md assumption, not sample stddev (N-1).
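The tie-breaking concern can be satisfied without a sort by relying on first-encounter order, e.g. (a sketch; the failure-record shape is illustrative):

```javascript
// Sketch: ties in "most common failure stage" resolve to the stage seen
// first, relying on object insertion order instead of a sort.
function mostCommonFailureStage(failures) {
  const counts = {};
  for (const { stage } of failures) {
    counts[stage] = (counts[stage] || 0) + 1;
  }
  let best = null;
  for (const [stage, count] of Object.entries(counts)) {
    if (best === null || count > counts[best]) best = stage; // strict > keeps the first on ties
  }
  return best;
}
```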

@@ -0,0 +1,71 @@
# Story — Anomaly Detection

## User story

As a developer, I want to identify pipeline runs where stage durations deviated significantly from normal so that I can investigate unusual behaviour and understand outliers.

---

## Context / scope

- User has accumulated enough history data to establish baseline metrics
- Analysis uses statistical deviation (mean + 2*stddev) to identify anomalies
- Scope limited to last 10 runs to keep output manageable
- This is a read-only analysis; no pipeline state is modified
- Route: `orchestr8 insights` (anomaly section included by default)

Per FEATURE_SPEC.md, Section 6 (Rule: Anomaly Detection):
- Threshold: 2 standard deviations above mean
- Scope: Last 10 runs only
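The threshold rule reduces to a one-line predicate (a sketch):

```javascript
// Sketch of the rule above: a duration is anomalous when it exceeds
// the baseline mean by more than 2 population standard deviations.
function isAnomalous(durationMs, mean, stddev) {
  return durationMs > mean + 2 * stddev;
}
```

For example, with a baseline mean of 500ms and stddev of 200ms, the cut-off is 900ms.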

---

## Acceptance criteria

**AC-1 — Detect anomalous stage durations**
- Given the history file contains at least 3 pipeline runs,
- When the user runs `orchestr8 insights`,
- Then the output includes an "Anomalies" section listing any runs where a stage duration exceeded mean + 2*stddev.

**AC-2 — Display anomaly details**
- Given an anomalous run is detected,
- When the analysis completes,
- Then the output shows: feature slug, stage name, actual duration, expected duration (mean), and deviation factor.

**AC-3 — Limit scope to recent runs**
- Given the history contains more than 10 runs,
- When anomaly detection is performed,
- Then only the most recent 10 runs are evaluated for anomalies.

**AC-4 — Generate recommendation when anomalies found**
- Given one or more anomalous runs are detected,
- When the analysis completes,
- Then the output includes the recommendation: "Investigate flagged runs for unusual feature complexity".

**AC-5 — No anomalies detected**
- Given all recent runs have stage durations within 2 standard deviations of the mean,
- When the user runs `orchestr8 insights`,
- Then the anomalies section displays: "No anomalies detected in recent runs."

**AC-6 — Insufficient data for statistics**
- Given the history file contains fewer than 3 runs,
- When anomaly detection is attempted,
- Then it is skipped with the explanation: "Insufficient data for anomaly detection."
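The fixed strings in AC-4 through AC-6 could be centralised in a single formatter. A sketch, where everything around the quoted messages (function name, line layout) is illustrative:

```javascript
// Sketch of the anomalies section formatter. The three quoted messages come
// from AC-4, AC-5, and AC-6; the surrounding layout is illustrative.
function formatAnomalySection(result) {
  if (result.insufficientData) {
    return "Insufficient data for anomaly detection.";
  }
  if (result.anomalies.length === 0) {
    return "No anomalies detected in recent runs.";
  }
  const lines = result.anomalies.map(
    (a) =>
      `${a.slug}: ${a.stage} took ${a.actual}ms ` +
      `(expected ~${a.expected}ms, ${a.deviation.toFixed(1)}x deviation)`
  );
  lines.push("Investigate flagged runs for unusual feature complexity");
  return lines.join("\n");
}
```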

---

## Out of scope

- Configurable standard deviation threshold
- Stage-specific anomaly thresholds
- Anomaly detection for failure counts (only duration-based)
- Historical anomaly tracking beyond last 10 runs
- Automatic investigation or drill-down into anomalous runs

---

## References

- Feature spec: `.blueprint/features/feature_pipeline-insights/FEATURE_SPEC.md`
- Upstream dependency: `.blueprint/features/feature_pipeline-history/FEATURE_SPEC.md`
- System spec: `.blueprint/system_specification/SYSTEM_SPEC.md`