orchestr8 2.4.0 → 2.6.0
- package/.blueprint/agents/AGENT_BA_CASS.md +50 -25
- package/.blueprint/agents/AGENT_DEVELOPER_CODEY.md +60 -69
- package/.blueprint/agents/AGENT_SPECIFICATION_ALEX.md +45 -0
- package/.blueprint/agents/AGENT_TESTER_NIGEL.md +72 -105
- package/.blueprint/features/feature_adaptive-retry/FEATURE_SPEC.md +239 -0
- package/.blueprint/features/feature_adaptive-retry/IMPLEMENTATION_PLAN.md +48 -0
- package/.blueprint/features/feature_adaptive-retry/story-prompt-modification.md +85 -0
- package/.blueprint/features/feature_adaptive-retry/story-retry-config.md +89 -0
- package/.blueprint/features/feature_adaptive-retry/story-should-retry.md +98 -0
- package/.blueprint/features/feature_adaptive-retry/story-strategy-recommendation.md +85 -0
- package/.blueprint/features/feature_agent-guardrails/FEATURE_SPEC.md +328 -0
- package/.blueprint/features/feature_agent-guardrails/IMPLEMENTATION_PLAN.md +90 -0
- package/.blueprint/features/feature_agent-guardrails/story-citation-requirements.md +50 -0
- package/.blueprint/features/feature_agent-guardrails/story-confidentiality.md +50 -0
- package/.blueprint/features/feature_agent-guardrails/story-escalation-protocol.md +55 -0
- package/.blueprint/features/feature_agent-guardrails/story-source-restrictions.md +50 -0
- package/.blueprint/features/feature_feedback-loop/FEATURE_SPEC.md +347 -0
- package/.blueprint/features/feature_feedback-loop/IMPLEMENTATION_PLAN.md +71 -0
- package/.blueprint/features/feature_feedback-loop/story-feedback-collection.md +63 -0
- package/.blueprint/features/feature_feedback-loop/story-feedback-config.md +61 -0
- package/.blueprint/features/feature_feedback-loop/story-feedback-insights.md +63 -0
- package/.blueprint/features/feature_feedback-loop/story-quality-gates.md +57 -0
- package/.blueprint/features/feature_pipeline-history/FEATURE_SPEC.md +239 -0
- package/.blueprint/features/feature_pipeline-history/IMPLEMENTATION_PLAN.md +71 -0
- package/.blueprint/features/feature_pipeline-history/story-clear-history.md +73 -0
- package/.blueprint/features/feature_pipeline-history/story-display-history.md +75 -0
- package/.blueprint/features/feature_pipeline-history/story-record-execution.md +76 -0
- package/.blueprint/features/feature_pipeline-history/story-show-statistics.md +85 -0
- package/.blueprint/features/feature_pipeline-insights/FEATURE_SPEC.md +288 -0
- package/.blueprint/features/feature_pipeline-insights/IMPLEMENTATION_PLAN.md +65 -0
- package/.blueprint/features/feature_pipeline-insights/story-anomaly-detection.md +71 -0
- package/.blueprint/features/feature_pipeline-insights/story-bottleneck-analysis.md +75 -0
- package/.blueprint/features/feature_pipeline-insights/story-failure-patterns.md +75 -0
- package/.blueprint/features/feature_pipeline-insights/story-json-output.md +75 -0
- package/.blueprint/features/feature_pipeline-insights/story-trend-analysis.md +78 -0
- package/.blueprint/features/feature_validate-command/FEATURE_SPEC.md +209 -0
- package/.blueprint/features/feature_validate-command/IMPLEMENTATION_PLAN.md +59 -0
- package/.blueprint/features/feature_validate-command/story-failure-output.md +61 -0
- package/.blueprint/features/feature_validate-command/story-node-version-check.md +52 -0
- package/.blueprint/features/feature_validate-command/story-run-validation.md +59 -0
- package/.blueprint/features/feature_validate-command/story-success-output.md +50 -0
- package/.blueprint/system_specification/SYSTEM_SPEC.md +248 -0
- package/README.md +174 -40
- package/SKILL.md +399 -74
- package/bin/cli.js +128 -20
- package/package.json +1 -1
- package/src/feedback.js +171 -0
- package/src/history.js +306 -0
- package/src/index.js +57 -2
- package/src/init.js +2 -6
- package/src/insights.js +504 -0
- package/src/retry.js +274 -0
- package/src/update.js +10 -2
- package/src/validate.js +172 -0
- package/src/skills.js +0 -93
|
@@ -0,0 +1,347 @@
|
|
|
1
|
+
# Feature Specification — Agent Feedback Loop
|
|
2
|
+
|
|
3
|
+
## 1. Feature Intent
|
|
4
|
+
**Why this feature exists.**
|
|
5
|
+
|
|
6
|
+
- **Problem being addressed:** The orchestr8 pipeline executes sequentially but lacks inter-stage feedback. Agents cannot assess the quality of upstream artifacts, so poor-quality specifications, stories, or tests propagate silently through the pipeline.
|
|
7
|
+
- **User need:** Developers want visibility into how each agent perceives the quality of inputs from previous stages. When quality is low, the pipeline should pause for human review rather than proceeding with flawed inputs.
|
|
8
|
+
- **System alignment:** Per SYSTEM_SPEC.md:Section 7 (Governing Rules), agents are expected to "flag deviations" and "not silently alter specifications". This feature operationalises that principle by requiring explicit quality assessment at each stage boundary.
|
|
9
|
+
|
|
10
|
+
> This feature introduces a quality feedback mechanism that integrates with existing history, insights, and retry modules to create a closed-loop quality system.
|
|
11
|
+
|
|
12
|
+
---
|
|
13
|
+
|
|
14
|
+
## 2. Scope
|
|
15
|
+
|
|
16
|
+
### In Scope
|
|
17
|
+
|
|
18
|
+
- Feedback collection schema and data structure for agent assessments
|
|
19
|
+
- Feedback capture at each stage boundary (Cass on Alex, Nigel on Cass, Codey on Nigel)
|
|
20
|
+
- Quality gate logic: pause pipeline if feedback rating falls below threshold
|
|
21
|
+
- Configuration management for feedback thresholds (`.claude/feedback-config.json`)
|
|
22
|
+
- Storage of feedback in pipeline history entries
|
|
23
|
+
- Insights extension: correlation analysis between feedback scores and outcomes
|
|
24
|
+
- Retry integration: mapping feedback issues to retry strategies
|
|
25
|
+
- CLI commands for feedback configuration and analysis
|
|
26
|
+
|
|
27
|
+
### Out of Scope
|
|
28
|
+
|
|
29
|
+
- Feedback from Alex (no prior stage to assess within the pipeline)
|
|
30
|
+
- Automatic remediation based on feedback (human review required)
|
|
31
|
+
- Cross-pipeline feedback aggregation (each run is independent)
|
|
32
|
+
- Natural language feedback parsing (structured schema only)
|
|
33
|
+
- Feedback on auto-commit stage (no agent assessment)
|
|
34
|
+
|
|
35
|
+
---
|
|
36
|
+
|
|
37
|
+
## 3. Actors Involved
|
|
38
|
+
|
|
39
|
+
### Human User
|
|
40
|
+
|
|
41
|
+
- **Can do:** View feedback thresholds; modify threshold configuration; view feedback correlation insights; review and approve paused pipelines
|
|
42
|
+
- **Cannot do:** Directly inject feedback into history; bypass quality gates without explicit action
|
|
43
|
+
|
|
44
|
+
### Cass (Story Writer Agent)
|
|
45
|
+
|
|
46
|
+
- **Can do:** Provide feedback on Alex's feature specification before writing stories
|
|
47
|
+
- **Feedback target:** Feature specification quality, completeness, clarity
|
|
48
|
+
|
|
49
|
+
### Nigel (Tester Agent)
|
|
50
|
+
|
|
51
|
+
- **Can do:** Provide feedback on Cass's user stories before writing tests
|
|
52
|
+
- **Feedback target:** Story quality, acceptance criteria testability, scope clarity
|
|
53
|
+
|
|
54
|
+
### Codey (Developer Agent)
|
|
55
|
+
|
|
56
|
+
- **Can do:** Provide feedback on Nigel's test specification before planning/implementing
|
|
57
|
+
- **Feedback target:** Test coverage, implementation feasibility, specification clarity
|
|
58
|
+
|
|
59
|
+
### Pipeline Orchestrator (SKILL.md implementation)
|
|
60
|
+
|
|
61
|
+
- **Can do:** Collect feedback from agents; evaluate against thresholds; persist to history; trigger quality gates
|
|
62
|
+
- **Cannot do:** Override human decisions on paused pipelines
|
|
63
|
+
|
|
64
|
+
### History Module (src/history.js)
|
|
65
|
+
|
|
66
|
+
- **Extended by:** New `feedback` field in stage entries
|
|
67
|
+
- **Maintains:** Backward compatibility with existing entries (no feedback = null)
|
|
68
|
+
|
|
69
|
+
### Insights Module (src/insights.js)
|
|
70
|
+
|
|
71
|
+
- **Extended by:** New `--feedback` analysis mode
|
|
72
|
+
- **Provides:** Calibration scoring, issue pattern correlation
|
|
73
|
+
|
|
74
|
+
### Retry Module (src/retry.js)
|
|
75
|
+
|
|
76
|
+
- **Extended by:** Feedback-informed strategy selection
|
|
77
|
+
- **Consumes:** Issue patterns from feedback to recommend targeted strategies
|
|
78
|
+
|
|
79
|
+
---
|
|
80
|
+
|
|
81
|
+
## 4. Behaviour Overview
|
|
82
|
+
|
|
83
|
+
### Happy Path: Feedback Collection and Proceeding
|
|
84
|
+
|
|
85
|
+
1. Alex completes feature specification
|
|
86
|
+
2. Orchestrator spawns Cass with explicit instruction to provide feedback on Alex's output
|
|
87
|
+
3. Cass writes feedback object (rating, confidence, issues, recommendation) to designated output
|
|
88
|
+
4. Orchestrator reads feedback and evaluates against configured threshold (default: 3.0)
|
|
89
|
+
5. Rating >= threshold: pipeline proceeds, feedback stored in history
|
|
90
|
+
6. Cass proceeds to write user stories
|
|
91
|
+
7. Pattern repeats for Nigel (feedback on Cass) and Codey (feedback on Nigel)
|
|
92
|
+
8. On completion, all feedback is persisted in history entry
|
|
93
|
+
|
|
94
|
+
### Alternative: Quality Gate Triggers Pause
|
|
95
|
+
|
|
96
|
+
1. Agent provides feedback with rating < configured threshold
|
|
97
|
+
2. Agent's recommendation is "pause" or "revise"
|
|
98
|
+
3. Orchestrator pauses pipeline before current agent's main work
|
|
99
|
+
4. User is prompted: "Quality gate triggered. {Agent} rated previous stage {rating}/5. Issues: {issues}. (review/proceed/abort)"
|
|
100
|
+
5. User can review upstream artifacts, request revision, or proceed anyway
|
|
101
|
+
6. Decision and feedback are recorded in history
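
The prompt in step 4 can be produced by a small formatter. The sketch below is illustrative only: `formatQualityGatePrompt` is a hypothetical helper name, and printing `none` for an empty issue list is an assumption, not something the spec defines.

```javascript
// Hypothetical helper: builds the quality gate prompt quoted in step 4.
// The wording follows the spec; the "none" fallback is an assumption.
function formatQualityGatePrompt(agentName, feedback) {
  const issues = feedback.issues.length > 0 ? feedback.issues.join(", ") : "none";
  return (
    `Quality gate triggered. ${agentName} rated previous stage ` +
    `${feedback.rating}/5. Issues: ${issues}. (review/proceed/abort)`
  );
}

console.log(
  formatQualityGatePrompt("Nigel", {
    rating: 2,
    issues: ["untestable-criteria", "unclear-scope"],
  })
);
```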
|
|
102
|
+
|
|
103
|
+
### Alternative: Dynamic Threshold Adjustment
|
|
104
|
+
|
|
105
|
+
1. User runs `orchestr8 insights --feedback` after sufficient runs
|
|
106
|
+
2. Insights module calculates agent calibration (how predictive is their feedback of actual outcomes)
|
|
107
|
+
3. User runs `orchestr8 feedback-config set minRating 3.5` to adjust threshold based on data
|
|
108
|
+
4. Future runs use updated threshold
|
|
109
|
+
|
|
110
|
+
### Alternative: Retry with Feedback-Informed Strategy
|
|
111
|
+
|
|
112
|
+
1. Pipeline fails at a stage (e.g., Codey cannot implement)
|
|
113
|
+
2. Retry module examines feedback chain for the failed run
|
|
114
|
+
3. Feedback issues are mapped to strategies:
|
|
115
|
+
- "missing-error-handling" → `add-context`
|
|
116
|
+
- "too-complex" → `simplify-prompt`
|
|
117
|
+
- "too-many-stories" → `reduce-stories`
|
|
118
|
+
4. Recommended strategy reflects feedback analysis
|
|
119
|
+
5. User accepts or chooses alternative
|
|
120
|
+
|
|
121
|
+
---
|
|
122
|
+
|
|
123
|
+
## 5. State & Lifecycle Interactions
|
|
124
|
+
|
|
125
|
+
### States Entered
|
|
126
|
+
|
|
127
|
+
- **feedback_pending:** After upstream agent completes, before downstream agent provides feedback
|
|
128
|
+
- **quality_gate_paused:** When feedback triggers quality gate (rating < threshold)
|
|
129
|
+
|
|
130
|
+
### States Exited
|
|
131
|
+
|
|
132
|
+
- **feedback_pending → in_progress:** When feedback is recorded and threshold is met
|
|
133
|
+
- **quality_gate_paused → in_progress:** When user chooses to proceed
|
|
134
|
+
- **quality_gate_paused → paused:** When user requests review/revision
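
The transitions listed above can be read as a small lookup table. This is a minimal sketch under stated assumptions: the state names come from this spec, but the event names (`threshold_met`, `gate_triggered`, `proceed`, `review`) and the `nextState` helper are hypothetical. The spec does not name a target state for the `abort` option, so it is left out.

```javascript
// Hypothetical transition table. State names are from the spec;
// event names are assumptions made for illustration.
const TRANSITIONS = {
  feedback_pending: {
    threshold_met: "in_progress",
    gate_triggered: "quality_gate_paused",
  },
  quality_gate_paused: {
    proceed: "in_progress",
    review: "paused",
  },
};

// Returns the next state, or throws if the spec defines no such transition.
function nextState(state, event) {
  const target = TRANSITIONS[state] && TRANSITIONS[state][event];
  if (typeof target !== "string") {
    throw new Error(`No transition from "${state}" on "${event}"`);
  }
  return target;
}

console.log(nextState("feedback_pending", "gate_triggered"));
console.log(nextState("quality_gate_paused", "proceed"));
```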
|
|
135
|
+
|
|
136
|
+
### States Modified
|
|
137
|
+
|
|
138
|
+
- Pipeline history entries gain `stages[].feedback` field
|
|
139
|
+
- Queue entries may include temporary feedback data during execution
|
|
140
|
+
|
|
141
|
+
### Lifecycle Classification
|
|
142
|
+
|
|
143
|
+
- **State-creating:** Creates feedback_pending state at each stage boundary
|
|
144
|
+
- **State-constraining:** Quality gates can block progression
|
|
145
|
+
- **State-transitioning:** Moves between feedback states based on ratings
|
|
146
|
+
|
|
147
|
+
---
|
|
148
|
+
|
|
149
|
+
## 6. Rules & Decision Logic
|
|
150
|
+
|
|
151
|
+
### Rule 1: Feedback Schema Validation
|
|
152
|
+
|
|
153
|
+
- **Description:** All feedback must conform to the defined schema
|
|
154
|
+
- **Inputs:** Agent feedback output
|
|
155
|
+
- **Outputs:** Validated feedback object or validation error
|
|
156
|
+
- **Type:** Deterministic
|
|
157
|
+
|
|
158
|
+
```json
{
  "about": "alex|cass|nigel",
  "rating": 1-5,
  "confidence": 0.0-1.0,
  "issues": ["issue-code", ...],
  "recommendation": "proceed|pause|revise"
}
```
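
A validator for this schema might look like the sketch below. It is not the shipped `src/feedback.js`. The field rules follow the schema above, the `{valid, errors}` return shape follows the implementation plan, and treating `rating` as an integer follows the feedback-collection story. The error message wording is an assumption.

```javascript
// Sketch of schema validation for Rule 1 (not the package's actual code).
const VALID_AGENTS = ["alex", "cass", "nigel"];
const VALID_RECOMMENDATIONS = ["proceed", "pause", "revise"];

function validateFeedback(feedback) {
  if (feedback === null || typeof feedback !== "object" || Array.isArray(feedback)) {
    return { valid: false, errors: ["feedback must be an object"] };
  }
  const errors = [];
  if (!VALID_AGENTS.includes(feedback.about)) {
    errors.push("about must be one of: " + VALID_AGENTS.join(", "));
  }
  if (!Number.isInteger(feedback.rating) || feedback.rating < 1 || feedback.rating > 5) {
    errors.push("rating must be an integer from 1 to 5");
  }
  if (
    typeof feedback.confidence !== "number" ||
    Number.isNaN(feedback.confidence) ||
    feedback.confidence < 0 ||
    feedback.confidence > 1
  ) {
    errors.push("confidence must be a number from 0.0 to 1.0");
  }
  if (
    !Array.isArray(feedback.issues) ||
    !feedback.issues.every((code) => typeof code === "string")
  ) {
    errors.push("issues must be an array of issue-code strings");
  }
  if (!VALID_RECOMMENDATIONS.includes(feedback.recommendation)) {
    errors.push("recommendation must be one of: " + VALID_RECOMMENDATIONS.join(", "));
  }
  return { valid: errors.length === 0, errors };
}

console.log(
  validateFeedback({
    about: "alex",
    rating: 4,
    confidence: 0.8,
    issues: [],
    recommendation: "proceed",
  })
);
```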
|
|
167
|
+
|
|
168
|
+
### Rule 2: Quality Gate Evaluation
|
|
169
|
+
|
|
170
|
+
- **Description:** Compare feedback rating against threshold to determine if pipeline should pause
|
|
171
|
+
- **Inputs:** Feedback rating, configured threshold, recommendation
|
|
172
|
+
- **Outputs:** Boolean (shouldPause)
|
|
173
|
+
- **Type:** Deterministic
|
|
174
|
+
|
|
175
|
+
```
shouldPause = (rating < minRatingThreshold) OR (recommendation === "pause")
```
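
Expressed as JavaScript, the rule is a one-line predicate. The sketch assumes the config object carries `minRatingThreshold` and falls back to the documented default of 3.0 when it is absent. The real module may handle a missing config differently.

```javascript
// Sketch of Rule 2. The 3.0 fallback is the default threshold stated in the spec.
function shouldPause(feedback, config = {}) {
  const threshold = config.minRatingThreshold ?? 3.0;
  return feedback.rating < threshold || feedback.recommendation === "pause";
}

console.log(shouldPause({ rating: 2, recommendation: "proceed" })); // rating below threshold
console.log(shouldPause({ rating: 5, recommendation: "pause" })); // explicit pause
console.log(shouldPause({ rating: 4, recommendation: "revise" })); // neither condition holds
```

Under the rule as written, a `revise` recommendation with a rating at or above the threshold does not pause the pipeline. Only a low rating or an explicit `pause` does.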
|
|
178
|
+
|
|
179
|
+
### Rule 3: Issue-to-Strategy Mapping
|
|
180
|
+
|
|
181
|
+
- **Description:** Map feedback issue codes to retry strategies
|
|
182
|
+
- **Inputs:** List of issue codes from feedback chain
|
|
183
|
+
- **Outputs:** Prioritised list of recommended strategies
|
|
184
|
+
- **Type:** Deterministic with configurable mappings
|
|
185
|
+
|
|
186
|
+
Default mappings:

| Issue Code | Strategy |
|------------|----------|
| `missing-error-handling` | `add-context` |
| `unclear-scope` | `simplify-prompt` |
| `too-complex` | `simplify-prompt` |
| `too-many-stories` | `reduce-stories` |
| `untestable-criteria` | `simplify-tests` |
| `missing-edge-cases` | `add-context` |
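
The default mappings can be applied with a plain lookup. The sketch below makes two assumptions the spec does not state: strategies are prioritised by how many issue codes point at them, and unknown issue codes are ignored. Custom mappings are read from `config.issueMappings`, the key named in the implementation plan.

```javascript
// Default mappings copied from the table above.
const DEFAULT_ISSUE_MAPPINGS = {
  "missing-error-handling": "add-context",
  "unclear-scope": "simplify-prompt",
  "too-complex": "simplify-prompt",
  "too-many-stories": "reduce-stories",
  "untestable-criteria": "simplify-tests",
  "missing-edge-cases": "add-context",
};

// Sketch: ranking by occurrence count is an assumed prioritisation rule.
function mapIssuesToStrategies(issues, config = {}) {
  const mappings = { ...DEFAULT_ISSUE_MAPPINGS, ...(config.issueMappings || {}) };
  const counts = new Map();
  for (const issue of issues) {
    // Own-property check so codes like "constructor" never match inherited keys.
    if (!Object.prototype.hasOwnProperty.call(mappings, issue)) continue;
    const strategy = mappings[issue];
    counts.set(strategy, (counts.get(strategy) || 0) + 1);
  }
  // Sort by descending count; the sort is stable, so ties keep first-seen order.
  return [...counts.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([strategy]) => strategy);
}

console.log(mapIssuesToStrategies(["too-complex", "missing-error-handling", "unclear-scope"]));
```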
|
|
195
|
+
|
|
196
|
+
### Rule 4: Agent Calibration Calculation
|
|
197
|
+
|
|
198
|
+
- **Description:** Measure correlation between agent feedback and eventual pipeline outcomes
|
|
199
|
+
- **Inputs:** Historical feedback ratings, pipeline outcomes (success/failed)
|
|
200
|
+
- **Outputs:** Calibration score per agent (0.0 = uncorrelated, 1.0 = perfect predictor)
|
|
201
|
+
- **Type:** Deterministic (statistical calculation)
|
|
202
|
+
|
|
203
|
+
```
calibration[agent] = correlation(feedback_ratings[agent], outcome_success_binary)
```
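
One way to compute this is a Pearson correlation between ratings and binary outcomes. The sketch rests on several assumptions: the flat `samples` input (`{agent, rating, success}`) and the name `calibrationFor` are hypothetical, negative correlations are clamped to 0 to fit the 0.0 to 1.0 output range, and `null` is returned below 10 samples, following the run-count assumption in Section 9.

```javascript
// Pearson correlation coefficient of two equal-length numeric arrays.
function pearson(xs, ys) {
  const n = xs.length;
  const meanX = xs.reduce((a, b) => a + b, 0) / n;
  const meanY = ys.reduce((a, b) => a + b, 0) / n;
  let cov = 0;
  let varX = 0;
  let varY = 0;
  for (let i = 0; i < n; i++) {
    const dx = xs[i] - meanX;
    const dy = ys[i] - meanY;
    cov += dx * dy;
    varX += dx * dx;
    varY += dy * dy;
  }
  // No variance on either side means the correlation is undefined; report 0.
  if (varX === 0 || varY === 0) return 0;
  return cov / Math.sqrt(varX * varY);
}

// Hypothetical helper: samples is a flat list of { agent, rating, success }.
function calibrationFor(agent, samples, minRuns = 10) {
  const own = samples.filter((s) => s.agent === agent);
  if (own.length < minRuns) return null; // too little data to be meaningful
  const ratings = own.map((s) => s.rating);
  const outcomes = own.map((s) => (s.success ? 1 : 0));
  // Assumption: clamp negative correlation to 0 to match the 0.0-1.0 range.
  return Math.max(0, pearson(ratings, outcomes));
}
```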
|
|
206
|
+
|
|
207
|
+
### Rule 5: Threshold Recommendation
|
|
208
|
+
|
|
209
|
+
- **Description:** Suggest optimal threshold based on historical data
|
|
210
|
+
- **Inputs:** All feedback/outcome pairs, desired false positive/negative balance
|
|
211
|
+
- **Outputs:** Recommended minRating threshold
|
|
212
|
+
- **Type:** Deterministic
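
The spec does not say how the threshold is derived, so the following is only one plausible sketch. It scans candidate thresholds from 1.0 to 5.0 in steps of 0.5 and picks the one with the lowest weighted error count. The `samples` shape (`{rating, success}`), the step size, and the `falseNegativeWeight` parameter are all assumptions.

```javascript
// Sketch of Rule 5 under stated assumptions (not the package's algorithm).
// A false positive is a pause on a run that would have succeeded;
// a false negative is a failed run that was allowed to proceed.
function recommendThreshold(samples, falseNegativeWeight = 1) {
  if (samples.length === 0) return null;
  let best = { threshold: null, cost: Infinity };
  for (let t = 1.0; t <= 5.0; t += 0.5) {
    let falsePositives = 0;
    let falseNegatives = 0;
    for (const { rating, success } of samples) {
      const paused = rating < t;
      if (paused && success) falsePositives++;
      if (!paused && !success) falseNegatives++;
    }
    const cost = falsePositives + falseNegativeWeight * falseNegatives;
    // Strict comparison keeps the lowest threshold when costs tie.
    if (cost < best.cost) best = { threshold: t, cost };
  }
  return best.threshold;
}

const samples = [
  { rating: 1, success: false },
  { rating: 2, success: false },
  { rating: 3, success: true },
  { rating: 4, success: true },
  { rating: 5, success: true },
];
console.log(recommendThreshold(samples));
```

Raising `falseNegativeWeight` biases the result toward stricter thresholds, which is one way to express the "desired false positive/negative balance" input.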
|
|
213
|
+
|
|
214
|
+
---
|
|
215
|
+
|
|
216
|
+
## 7. Dependencies
|
|
217
|
+
|
|
218
|
+
### System Components
|
|
219
|
+
|
|
220
|
+
- **src/history.js:** Extended to store feedback in history entries
|
|
221
|
+
- New field: `stages[stage].feedback` containing feedback object
|
|
222
|
+
- Backward compatible: existing entries without feedback are valid
|
|
223
|
+
|
|
224
|
+
- **src/insights.js:** Extended with feedback analysis functions
|
|
225
|
+
- New function: `analyzeFeedbackCorrelation(history)`
|
|
226
|
+
- New CLI flag: `--feedback` for feedback-specific analysis
|
|
227
|
+
|
|
228
|
+
- **src/retry.js:** Extended with feedback-informed strategy selection
|
|
229
|
+
- New function: `mapIssuesToStrategies(issues, config)`
|
|
230
|
+
- Modified: `shouldRetry()` to consider feedback chain
|
|
231
|
+
|
|
232
|
+
- **bin/cli.js:** New command registration
|
|
233
|
+
- `orchestr8 feedback-config` (view)
|
|
234
|
+
- `orchestr8 feedback-config set <key> <value>`
|
|
235
|
+
- `orchestr8 insights --feedback`
|
|
236
|
+
|
|
237
|
+
### File Dependencies
|
|
238
|
+
|
|
239
|
+
- **`.claude/feedback-config.json`:** Configuration storage
|
|
240
|
+
- **`.claude/pipeline-history.json`:** Extended schema for feedback storage
|
|
241
|
+
|
|
242
|
+
### Agent Specification Dependencies
|
|
243
|
+
|
|
244
|
+
- Agent prompts (in SKILL.md or agent specs) must include feedback collection instructions
|
|
245
|
+
- Feedback schema must be communicated to agents in their task prompts
|
|
246
|
+
|
|
247
|
+
---
|
|
248
|
+
|
|
249
|
+
## 8. Non-Functional Considerations
|
|
250
|
+
|
|
251
|
+
### Performance
|
|
252
|
+
|
|
253
|
+
- Feedback collection adds one structured output per stage (minimal overhead)
|
|
254
|
+
- Quality gate evaluation is O(1) comparison
|
|
255
|
+
- Calibration calculation is O(n) over history entries
|
|
256
|
+
|
|
257
|
+
### Resilience
|
|
258
|
+
|
|
259
|
+
- If feedback collection fails, pipeline proceeds with warning (degraded mode)
|
|
260
|
+
- Missing feedback in history is treated as neutral (no quality gate effect)
|
|
261
|
+
- Invalid feedback schema triggers warning but does not block pipeline
|
|
262
|
+
|
|
263
|
+
### Audit/Logging
|
|
264
|
+
|
|
265
|
+
- All feedback is persisted in history for retrospective analysis
|
|
266
|
+
- Quality gate decisions are logged with timestamp and user action
|
|
267
|
+
|
|
268
|
+
### Security
|
|
269
|
+
|
|
270
|
+
- Feedback file is gitignored (contains project-specific assessments)
|
|
271
|
+
- No sensitive data expected in feedback (ratings, codes, recommendations only)
|
|
272
|
+
|
|
273
|
+
---
|
|
274
|
+
|
|
275
|
+
## 9. Assumptions & Open Questions
|
|
276
|
+
|
|
277
|
+
### Assumptions
|
|
278
|
+
|
|
279
|
+
- ASSUMPTION: Agents can reliably produce structured feedback in the specified schema
|
|
280
|
+
- ASSUMPTION: Feedback ratings are comparable across agents and runs
|
|
281
|
+
- ASSUMPTION: Issue codes will emerge from practice and can be standardised iteratively
|
|
282
|
+
- ASSUMPTION: Correlation analysis requires 10+ completed runs for meaningful results
|
|
283
|
+
|
|
284
|
+
### Open Questions
|
|
285
|
+
|
|
286
|
+
- Should feedback influence agent prompts proactively (not just on retry)?
|
|
287
|
+
- How should conflicting feedback (high rating but "pause" recommendation) be handled?
|
|
288
|
+
- Should there be feedback severity levels (blocking vs advisory)?
|
|
289
|
+
- What feedback should Codey provide about Codey-plan (within same agent)?
|
|
290
|
+
|
|
291
|
+
---
|
|
292
|
+
|
|
293
|
+
## 10. Impact on System Specification
|
|
294
|
+
|
|
295
|
+
### Reinforces Existing Assumptions
|
|
296
|
+
|
|
297
|
+
- Per SYSTEM_SPEC.md:Section 7, agents must "flag deviations" - feedback formalises this
|
|
298
|
+
- Per SYSTEM_SPEC.md:Section 8, failure handling already supports pause/review - quality gates extend this
|
|
299
|
+
|
|
300
|
+
### Stretches Existing Assumptions
|
|
301
|
+
|
|
302
|
+
- History module shifts from pure observability to operational dependency (also noted in adaptive-retry)
|
|
303
|
+
- Agent boundaries are subtly extended: agents now assess peer outputs, not just produce artifacts
|
|
304
|
+
- Pipeline flow gains conditional branches (quality gates) beyond explicit `--pause-after`
|
|
305
|
+
|
|
306
|
+
### Potential Contradiction
|
|
307
|
+
|
|
308
|
+
The system spec states pipelines are "sequential" (Section 7). Quality gates introduce conditional pauses that may feel like interruptions. However, this is consistent with the existing `--pause-after` mechanism and does not fundamentally alter sequence.
|
|
309
|
+
|
|
310
|
+
**Flagged for consideration:** Should SYSTEM_SPEC.md:Section 6 be updated to explicitly acknowledge quality gates as a pipeline flow modifier?
|
|
311
|
+
|
|
312
|
+
---
|
|
313
|
+
|
|
314
|
+
## 11. Handover to BA (Cass)
|
|
315
|
+
|
|
316
|
+
### Story Themes
|
|
317
|
+
|
|
318
|
+
1. **Feedback Collection:** Agents provide structured feedback on upstream artifacts
|
|
319
|
+
2. **Quality Gates:** Pipeline pauses when feedback indicates quality concerns
|
|
320
|
+
3. **Configuration Management:** User can view and modify feedback thresholds
|
|
321
|
+
4. **History Integration:** Feedback is stored in pipeline history entries
|
|
322
|
+
5. **Insights Extension:** Feedback correlation analysis and calibration scoring
|
|
323
|
+
6. **Retry Integration:** Feedback issues inform retry strategy selection
|
|
324
|
+
|
|
325
|
+
### Expected Story Boundaries
|
|
326
|
+
|
|
327
|
+
- Feedback schema definition and validation as foundational story
|
|
328
|
+
- Quality gate logic as separate story (depends on schema)
|
|
329
|
+
- CLI configuration commands as separate story (parallel track)
|
|
330
|
+
- History integration as separate story (depends on schema)
|
|
331
|
+
- Insights extension as separate story (depends on history integration)
|
|
332
|
+
- Retry integration as separate story (depends on insights correlation)
|
|
333
|
+
|
|
334
|
+
### Areas Needing Careful Story Framing
|
|
335
|
+
|
|
336
|
+
- Feedback collection happens *within* agent execution; must not disrupt agent focus
|
|
337
|
+
- Quality gate UX: user prompt must clearly explain situation and options
|
|
338
|
+
- Issue code taxonomy: start with small set, plan for iterative expansion
|
|
339
|
+
- Calibration display: statistical concepts must be presented accessibly
|
|
340
|
+
|
|
341
|
+
---
|
|
342
|
+
|
|
343
|
+
## 12. Change Log (Feature-Level)
|
|
344
|
+
|
|
345
|
+
| Date | Change | Reason | Raised By |
|------------|---------------------------------------|---------------------------------|-----------|
| 2026-02-24 | Initial feature specification created | Feature request for agent feedback system | Alex |
|
|
@@ -0,0 +1,71 @@
|
|
|
1
|
+
# Implementation Plan — Feedback Loop Feature
|
|
2
|
+
|
|
3
|
+
## Summary
|
|
4
|
+
|
|
5
|
+
This feature adds a quality feedback mechanism where downstream agents (Cass, Nigel, Codey) assess upstream artifacts before proceeding. Implementation requires a new `src/feedback.js` module for schema validation, quality gate logic, configuration management, and insights analysis. The existing `src/history.js`, `src/insights.js`, and `src/retry.js` modules need extensions to store, analyze, and act on feedback data.
|
|
6
|
+
|
|
7
|
+
---
|
|
8
|
+
|
|
9
|
+
## Files to Create/Modify
|
|
10
|
+
|
|
11
|
+
| Path | Action | Purpose |
|------|--------|---------|
| `src/feedback.js` | Create | Core feedback logic: validation, quality gates, config management |
| `src/history.js` | Modify | Add `storeStageFeedback()` and extend entry schema |
| `src/insights.js` | Modify | Add feedback analysis: `analyzeFeedbackCorrelation()`, calibration |
| `src/retry.js` | Modify | Add `mapIssuesToStrategies()` for feedback-informed retries |
| `bin/cli.js` | Modify | Register `feedback-config` command and `insights --feedback` flag |
| `src/index.js` | Modify | Export feedback module |
|
|
19
|
+
|
|
20
|
+
---
|
|
21
|
+
|
|
22
|
+
## Implementation Steps
|
|
23
|
+
|
|
24
|
+
1. **Create `src/feedback.js` with schema validation** — Implement `validateFeedback()` per FEATURE_SPEC.md:Rule 1. Schema: `{about, rating, confidence, issues, recommendation}`. Return `{valid, errors}`.
|
|
25
|
+
|
|
26
|
+
2. **Add quality gate logic to `src/feedback.js`** — Implement `shouldPause(feedback, config)` per FEATURE_SPEC.md:Rule 2. Returns true if `rating < minRatingThreshold` OR `recommendation === "pause"`.
|
|
27
|
+
|
|
28
|
+
3. **Add config management to `src/feedback.js`** — Implement `getDefaultConfig()`, `readConfig()`, `writeConfig()`, `setConfigValue()`. Default: `{minRatingThreshold: 3.0, enabled: true, issueMappings: {...}}`.
|
|
29
|
+
|
|
30
|
+
4. **Extend `src/history.js`** — Add `storeStageFeedback(slug, stage, feedback)` to persist feedback at `stages[stage].feedback`. Ensure backward compatibility (missing feedback = null).
|
|
31
|
+
|
|
32
|
+
5. **Add calibration calculation to `src/insights.js`** — Implement `calculateCalibration(agent, history)` per FEATURE_SPEC.md:Rule 4. Return null if <10 runs with feedback, else correlation score 0-1.
|
|
33
|
+
|
|
34
|
+
6. **Add issue correlation to `src/insights.js`** — Implement `correlateIssues(history)` to map issue codes to failure rates. Return `{issueCode: failureCorrelation}`.
|
|
35
|
+
|
|
36
|
+
7. **Add threshold recommendation to `src/insights.js`** — Implement `recommendThreshold(history)` to suggest optimal minRatingThreshold based on historical data.
|
|
37
|
+
|
|
38
|
+
8. **Extend `src/retry.js`** — Add `mapIssuesToStrategies(issues, config)` using default mappings from FEATURE_SPEC.md:Rule 3.
|
|
39
|
+
|
|
40
|
+
9. **Register CLI commands in `bin/cli.js`** — Add `feedback-config` (view), `feedback-config set <key> <value>`, and `--feedback` flag to `insights` command.
|
|
41
|
+
|
|
42
|
+
10. **Wire exports in `src/index.js`** — Export feedback module for orchestrator integration.
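
Step 6 describes `correlateIssues(history)` as mapping issue codes to failure rates. A minimal sketch of that idea follows. It takes a simplified, hypothetical input (a flat list of `{issues, success}` runs) rather than the real history format, and it reports a plain failure rate per issue code instead of a statistical correlation.

```javascript
// Sketch: failure rate per issue code over a simplified run list.
// The { issues, success } run shape is an assumption for illustration.
function correlateIssues(runs) {
  const stats = new Map(); // issue code -> { total, failed }
  for (const run of runs) {
    // Count each code at most once per run, even if it was reported twice.
    for (const issue of new Set(run.issues)) {
      const entry = stats.get(issue) || { total: 0, failed: 0 };
      entry.total += 1;
      if (!run.success) entry.failed += 1;
      stats.set(issue, entry);
    }
  }
  const result = {};
  for (const [issue, { total, failed }] of stats) {
    result[issue] = failed / total;
  }
  return result;
}

console.log(
  correlateIssues([
    { issues: ["too-complex"], success: false },
    { issues: ["too-complex", "unclear-scope"], success: true },
    { issues: ["unclear-scope"], success: true },
  ])
);
```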
|
|
43
|
+
|
|
44
|
+
---
|
|
45
|
+
|
|
46
|
+
## Key Functions
|
|
47
|
+
|
|
48
|
+
**src/feedback.js:**
|
|
49
|
+
- `validateFeedback(feedback)` — Schema validation, returns `{valid, errors}`
|
|
50
|
+
- `shouldPause(feedback, config)` — Quality gate evaluation
|
|
51
|
+
- `getDefaultConfig()` / `readConfig()` / `writeConfig(config)` — Config I/O
|
|
52
|
+
- `setConfigValue(key, value)` — CLI config setter with validation
|
|
53
|
+
- `displayConfig()` — Pretty-print current config
|
|
54
|
+
|
|
55
|
+
**src/insights.js (new functions):**
|
|
56
|
+
- `calculateCalibration(agent, history)` — Agent calibration score
|
|
57
|
+
- `correlateIssues(history)` — Issue-to-failure correlation
|
|
58
|
+
- `recommendThreshold(history)` — Optimal threshold suggestion
|
|
59
|
+
- `displayFeedbackInsights(options)` — CLI output for `--feedback`
|
|
60
|
+
|
|
61
|
+
**src/retry.js (new functions):**
|
|
62
|
+
- `mapIssuesToStrategies(issues, config)` — Feedback-informed strategy selection
|
|
63
|
+
|
|
64
|
+
---
|
|
65
|
+
|
|
66
|
+
## Risks/Questions
|
|
67
|
+
|
|
68
|
+
- **Agent prompt integration**: Feedback collection requires agent prompts (in SKILL.md) to include feedback instructions. This is orchestrator-level work outside core modules.
|
|
69
|
+
- **Calibration metric**: Tests use simple accuracy (predicted vs actual). Pearson correlation may be needed as a better calibration measure in production.
|
|
70
|
+
- **Issue taxonomy**: Starting with 6 issue codes per FEATURE_SPEC.md:Rule 3. Plan for iterative expansion.
|
|
71
|
+
- **Conflicting signals**: Per FEATURE_SPEC.md:Section 9, high rating + "pause" recommendation is unresolved. Recommend: recommendation takes precedence.
|
|
@@ -0,0 +1,63 @@
|
|
|
1
|
+
# Story — Feedback Collection
|
|
2
|
+
|
|
3
|
+
## User Story
|
|
4
|
+
|
|
5
|
+
As a **pipeline orchestrator**, I want **downstream agents to provide structured feedback on upstream artifacts** so that **quality issues are surfaced explicitly at each stage boundary**.
|
|
6
|
+
|
|
7
|
+
---
|
|
8
|
+
|
|
9
|
+
## Context / Scope
|
|
10
|
+
|
|
11
|
+
- Per FEATURE_SPEC.md:Section 4, feedback is collected at each stage boundary (Cass on Alex, Nigel on Cass, Codey on Nigel)
|
|
12
|
+
- Feedback uses a defined schema with rating, confidence, issues, and recommendation
|
|
13
|
+
- Feedback is captured before the downstream agent begins its main work
|
|
14
|
+
- Per SYSTEM_SPEC.md:Section 7, agents must "flag deviations" — this story operationalises that principle
|
|
15
|
+
|
|
16
|
+
---
|
|
17
|
+
|
|
18
|
+
## Acceptance Criteria
|
|
19
|
+
|
|
20
|
+
**AC-1 — Feedback schema structure**
|
|
21
|
+
- Given an agent is spawned to provide feedback,
|
|
22
|
+
- When the agent completes feedback output,
|
|
23
|
+
- Then the feedback object contains:
|
|
24
|
+
- `about`: agent name being assessed (alex|cass|nigel)
|
|
25
|
+
- `rating`: integer 1-5
|
|
26
|
+
- `confidence`: float 0.0-1.0
|
|
27
|
+
- `issues`: array of issue codes (may be empty)
|
|
28
|
+
- `recommendation`: one of "proceed", "pause", or "revise"
|
|
29
|
+
|
|
30
|
+
**AC-2 — Cass provides feedback on Alex**
|
|
31
|
+
- Given Alex has completed a feature specification,
|
|
32
|
+
- When Cass is spawned for story writing,
|
|
33
|
+
- Then Cass first produces a feedback object with `about: "alex"` assessing the feature spec quality.
|
|
34
|
+
|
|
35
|
+
**AC-3 — Nigel provides feedback on Cass**
|
|
36
|
+
- Given Cass has completed user stories,
|
|
37
|
+
- When Nigel is spawned for test writing,
|
|
38
|
+
- Then Nigel first produces a feedback object with `about: "cass"` assessing story quality and testability.
|
|
39
|
+
|
|
40
|
+
**AC-4 — Codey provides feedback on Nigel**
|
|
41
|
+
- Given Nigel has completed test specifications,
|
|
42
|
+
- When Codey is spawned for planning/implementation,
|
|
43
|
+
- Then Codey first produces a feedback object with `about: "nigel"` assessing test coverage and implementation feasibility.
|
|
44
|
+
|
|
45
|
+
**AC-5 — Feedback validation**
|
|
46
|
+
- Given an agent produces a feedback object,
|
|
47
|
+
- When the orchestrator reads the feedback,
|
|
48
|
+
- Then the feedback is validated against the schema,
|
|
49
|
+
- And invalid feedback triggers a warning but does not block the pipeline (per FEATURE_SPEC.md:Section 8, degraded mode).
|
|
50
|
+
|
|
51
|
+
**AC-6 — Feedback persisted to history**
|
|
52
|
+
- Given feedback is collected from an agent,
|
|
53
|
+
- When the stage completes,
|
|
54
|
+
- Then the feedback is stored in the history entry at `stages[stage].feedback`.
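
AC-6 can be illustrated on an in-memory history entry. The sketch assumes `stages` is an object keyed by stage name, which is one reading of `stages[stage].feedback`. The helper names `withStageFeedback` and `getStageFeedback` are hypothetical and are not the `storeStageFeedback(slug, stage, feedback)` function from the implementation plan, which also handles persistence.

```javascript
// Hypothetical helpers operating on a single history entry object.
// Assumption: entry.stages is an object keyed by stage name.
function withStageFeedback(entry, stage, feedback) {
  const stages = { ...(entry.stages || {}) };
  // Copy the stage so the original entry is not mutated.
  stages[stage] = { ...(stages[stage] || {}), feedback: feedback ?? null };
  return { ...entry, stages };
}

// Older entries without feedback read back as null (backward compatibility).
function getStageFeedback(entry, stage) {
  return entry.stages?.[stage]?.feedback ?? null;
}

const entry = { slug: "example-feature", stages: { cass: { status: "success" } } };
const updated = withStageFeedback(entry, "cass", {
  about: "alex",
  rating: 4,
  confidence: 0.8,
  issues: [],
  recommendation: "proceed",
});
console.log(getStageFeedback(entry, "cass")); // entry predates feedback
console.log(getStageFeedback(updated, "cass").rating);
```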
|
|
55
|
+
|
|
56
|
+
---
|
|
57
|
+
|
|
58
|
+
## Out of Scope
|
|
59
|
+
|
|
60
|
+
- Feedback from Alex (no prior stage to assess)
|
|
61
|
+
- Feedback on auto-commit stage
|
|
62
|
+
- Natural language feedback parsing (structured schema only)
|
|
63
|
+
- Automatic remediation based on feedback
|
|
@@ -0,0 +1,61 @@
|
|
|
1
|
+
# Story — Feedback Configuration
|
|
2
|
+
|
|
3
|
+
## User Story
|
|
4
|
+
|
|
5
|
+
As a **developer**, I want **CLI commands to view and modify feedback thresholds** so that **I can tune quality gate sensitivity based on my project's needs**.
|
|
6
|
+
|
|
7
|
+
---
|
|
8
|
+
|
|
9
|
+
## Context / Scope
|
|
10
|
+
|
|
11
|
+
- Per FEATURE_SPEC.md:Section 7, configuration is stored in `.claude/feedback-config.json`
|
|
12
|
+
- Per FEATURE_SPEC.md:Section 7, new CLI commands: `orchestr8 feedback-config` and `orchestr8 feedback-config set <key> <value>`
|
|
13
|
+
- Parallel track to quality gates — configuration can be set independently
|
|
14
|
+
|
|
15
|
+
---
|
|
16
|
+
|
|
17
|
+
## Acceptance Criteria
|
|
18
|
+
|
|
19
|
+
**AC-1 — View feedback configuration**
|
|
20
|
+
- Given the user runs `orchestr8 feedback-config`,
|
|
21
|
+
- When the command executes,
|
|
22
|
+
- Then the current configuration is displayed including:
|
|
23
|
+
- `minRatingThreshold` (default: 3.0)
|
|
24
|
+
- `enabled` (default: true)
|
|
25
|
+
- Any custom issue-to-strategy mappings
|
|
26
|
+
|
|
27
|
+
**AC-2 — Set threshold value**
|
|
28
|
+
- Given the user runs `orchestr8 feedback-config set minRating <value>`,
|
|
29
|
+
- When the value is a number between 1.0 and 5.0,
|
|
30
|
+
- Then the threshold is updated in `.claude/feedback-config.json`,
|
|
31
|
+
- And a confirmation message is displayed.
|
|
32
|
+
|
|
33
|
+
**AC-3 — Invalid threshold rejected**
|
|
34
|
+
- Given the user runs `orchestr8 feedback-config set minRating <value>`,
|
|
35
|
+
- When the value is outside 1.0-5.0 range or not a number,
|
|
36
|
+
- Then an error message is displayed,
|
|
37
|
+
- And the configuration is not modified.
|
|
38
|
+
|
|
39
|
+
**AC-4 — Enable/disable feedback system**
|
|
40
|
+
- Given the user runs `orchestr8 feedback-config set enabled <true|false>`,
|
|
41
|
+
- When the command executes,
|
|
42
|
+
- Then the `enabled` flag is updated,
|
|
43
|
+
- And when disabled, feedback collection and quality gates are skipped.
|
|
44
|
+
|
|
45
|
+
**AC-5 — Configuration file created on first set**
|
|
46
|
+
- Given `.claude/feedback-config.json` does not exist,
|
|
47
|
+
- When the user runs a `feedback-config set` command,
|
|
48
|
+
- Then the file is created with default values plus the specified override.
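A minimal sketch of how AC-2, AC-3 and AC-5 might fit together, assuming a hypothetical `setConfig` helper rather than the package's actual implementation. The mapping of the CLI key `minRating` onto the stored `minRatingThreshold` field is an assumption, as is the exact JSON layout.

```javascript
const fs = require("fs");
const path = require("path");

// Defaults as listed in AC-1; the file layout itself is an assumption.
const DEFAULTS = { minRatingThreshold: 3.0, enabled: true };

// Hypothetical helper: validate a raw CLI value and map it to a stored field.
function parseValue(key, raw) {
  if (key === "minRating") {
    const n = Number(raw);
    if (!Number.isFinite(n) || n < 1.0 || n > 5.0) {
      throw new Error(`minRating must be a number between 1.0 and 5.0, got "${raw}"`);
    }
    return ["minRatingThreshold", n]; // assumed mapping of CLI key to stored field
  }
  if (key === "enabled") {
    if (raw !== "true" && raw !== "false") {
      throw new Error(`enabled must be true or false, got "${raw}"`);
    }
    return ["enabled", raw === "true"];
  }
  throw new Error(`Unknown configuration key: ${key}`);
}

// Hypothetical helper: apply one override, creating the file if it is missing.
function setConfig(projectDir, key, raw) {
  // Validation throws before anything is written, so a bad value leaves the file untouched (AC-3).
  const [field, value] = parseValue(key, raw);
  const file = path.join(projectDir, ".claude", "feedback-config.json");
  const current = fs.existsSync(file) ? JSON.parse(fs.readFileSync(file, "utf8")) : {};
  // Defaults first, then existing values, then the new override (AC-5).
  const next = { ...DEFAULTS, ...current, [field]: value };
  fs.mkdirSync(path.dirname(file), { recursive: true });
  fs.writeFileSync(file, JSON.stringify(next, null, 2) + "\n");
  return next;
}
```

Validating before touching the filesystem keeps the "configuration is not modified" guarantee simple: there is no partial write to roll back.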
|
|
49
|
+
|
|
50
|
+
**AC-6 — Configuration file is gitignored**
|
|
51
|
+
- Given a project is initialised with orchestr8,
|
|
52
|
+
- When feedback configuration is created,
|
|
53
|
+
- Then `.claude/feedback-config.json` is included in gitignore patterns.
|
|
54
|
+
|
|
55
|
+
---
|
|
56
|
+
|
|
57
|
+
## Out of Scope
|
|
58
|
+
|
|
59
|
+
- Per-agent threshold configuration (single global threshold for MVP)
|
|
60
|
+
- Custom issue code definition via CLI
|
|
61
|
+
- Configuration import/export
|
|
@@ -0,0 +1,63 @@
|
|
|
1
|
+
# Story — Feedback Insights
|
|
2
|
+
|
|
3
|
+
## User Story
|
|
4
|
+
|
|
5
|
+
As a **developer**, I want **correlation analysis between feedback scores and pipeline outcomes** so that **I can understand how predictive agent feedback is and tune thresholds accordingly**.
|
|
6
|
+
|
|
7
|
+
---
|
|
8
|
+
|
|
9
|
+
## Context / Scope
|
|
10
|
+
|
|
11
|
+
- Per FEATURE_SPEC.md:Section 7, extends `src/insights.js` with feedback analysis functions
|
|
12
|
+
- Per FEATURE_SPEC.md:Section 6 (Rule 4), calculates agent calibration as correlation between ratings and outcomes
|
|
13
|
+
- Depends on feedback being stored in history (story-feedback-collection.md)
|
|
14
|
+
- Per FEATURE_SPEC.md:Section 9, requires 10+ completed runs for meaningful results
|
|
15
|
+
|
|
16
|
+
---
|
|
17
|
+
|
|
18
|
+
## Acceptance Criteria
|
|
19
|
+
|
|
20
|
+
**AC-1 — Feedback analysis command**
|
|
21
|
+
- Given the user runs `orchestr8 insights --feedback`,
|
|
22
|
+
- When sufficient history exists (10+ completed runs with feedback),
|
|
23
|
+
- Then a feedback analysis report is displayed.
|
|
24
|
+
|
|
25
|
+
**AC-2 — Agent calibration scoring**
|
|
26
|
+
- Given the feedback analysis runs,
|
|
27
|
+
- When calibration is calculated per agent,
|
|
28
|
+
- Then each agent receives a calibration score (0.0-1.0):
|
|
29
|
+
- 0.0 = feedback uncorrelated with outcomes
|
|
30
|
+
- 1.0 = perfect predictor of success/failure
|
|
31
|
+
- And the score is displayed in the format "Cass calibration: 0.72".
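One way the calibration score in AC-2 could be computed is as a Pearson correlation between ratings and a binary success flag. This is a sketch under assumptions: the story does not name the correlation method, the run shape `{ rating, success }` is hypothetical, and clamping negative correlations to 0.0 is one possible reading of the 0.0-1.0 range.

```javascript
// Hypothetical: each run is { rating: number, success: boolean }.
function calibration(runs) {
  const n = runs.length;
  if (n === 0) return 0;
  const xs = runs.map((r) => r.rating);
  const ys = runs.map((r) => (r.success ? 1 : 0));
  const mean = (a) => a.reduce((sum, v) => sum + v, 0) / a.length;
  const mx = mean(xs);
  const my = mean(ys);
  let sxy = 0;
  let sxx = 0;
  let syy = 0;
  for (let i = 0; i < n; i++) {
    const dx = xs[i] - mx;
    const dy = ys[i] - my;
    sxy += dx * dy;
    sxx += dx * dx;
    syy += dy * dy;
  }
  // No variance in ratings or outcomes means no measurable correlation.
  if (sxx === 0 || syy === 0) return 0;
  // Assumption: an anti-correlated agent is reported as 0.0 rather than a negative score.
  return Math.max(0, sxy / Math.sqrt(sxx * syy));
}

// Formats the score in the style quoted in AC-2.
function formatCalibration(agent, score) {
  return `${agent} calibration: ${score.toFixed(2)}`;
}
```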
|
|
32
|
+
|
|
33
|
+
**AC-3 — Issue pattern correlation**
|
|
34
|
+
- Given feedback history contains issue codes,
|
|
35
|
+
- When the analysis runs,
|
|
36
|
+
- Then issue codes are correlated with failure outcomes,
|
|
37
|
+
- And issues that frequently precede failures are highlighted (e.g., "`unclear-scope` preceded 80% of failures").
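The "preceded 80% of failures" figure in AC-3 can be read as the share of failed runs whose feedback carried a given issue code. The sketch below follows that reading; the function name and the run shape `{ success, issues }` are assumptions for illustration.

```javascript
// Hypothetical: each run is { success: boolean, issues: string[] }.
function issueFailureShare(runs) {
  const failures = runs.filter((r) => !r.success);
  const counts = new Map();
  for (const run of failures) {
    // Count each code at most once per run, even if it was reported twice.
    for (const code of new Set(run.issues)) {
      counts.set(code, (counts.get(code) || 0) + 1);
    }
  }
  // Highest share first, so the most predictive codes lead the report.
  return [...counts.entries()]
    .map(([code, count]) => ({ code, share: count / failures.length }))
    .sort((a, b) => b.share - a.share);
}
```

Note that this measures how often a code appears among failures, not how often the code leads to failure; a fuller analysis would also compare against its frequency in successful runs.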
|
|
38
|
+
|
|
39
|
+
**AC-4 — Threshold recommendation**
|
|
40
|
+
- Given sufficient calibration data exists,
|
|
41
|
+
- When the analysis runs,
|
|
42
|
+
- Then a recommended `minRatingThreshold` is suggested based on historical data,
|
|
43
|
+
- And the recommendation balances false positives (unnecessary pauses) and false negatives (missed quality issues).
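AC-4 does not say how the balance is struck. As an illustration only, the sketch below scans candidate thresholds in 0.5 steps and picks the one with the fewest combined false positives and false negatives. The step size, the equal weighting, the tie-break toward the lower threshold, and the run shape are all assumptions.

```javascript
// Hypothetical: each run is { rating: number, success: boolean }.
function recommendThreshold(runs) {
  let best = null;
  // Candidates 1.0, 1.5, ..., 5.0; integer loop avoids floating-point drift.
  for (let i = 2; i <= 10; i++) {
    const threshold = i / 2;
    // False positive: the gate would have paused a run that went on to succeed.
    const falsePositives = runs.filter((r) => r.rating < threshold && r.success).length;
    // False negative: the gate would have let through a run that went on to fail.
    const falseNegatives = runs.filter((r) => r.rating >= threshold && !r.success).length;
    const cost = falsePositives + falseNegatives;
    // Strict comparison keeps the lowest threshold when costs tie.
    if (best === null || cost < best.cost) {
      best = { threshold, falsePositives, falseNegatives, cost };
    }
  }
  return best;
}
```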
|
|
44
|
+
|
|
45
|
+
**AC-5 — Insufficient data handling**
|
|
46
|
+
- Given the user runs `orchestr8 insights --feedback`,
|
|
47
|
+
- When fewer than 10 completed runs with feedback exist,
|
|
48
|
+
- Then a message is displayed: "Insufficient data for feedback analysis. {N}/10 runs with feedback available."
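The guard in AC-5 could look roughly like this. The helper name is hypothetical; the minimum of 10 runs and the message wording come from the story.

```javascript
const MIN_RUNS_WITH_FEEDBACK = 10; // minimum stated in the story

// Hypothetical helper: returns the notice text, or null when analysis can proceed.
function insufficientDataMessage(runsWithFeedback) {
  if (runsWithFeedback >= MIN_RUNS_WITH_FEEDBACK) return null;
  return `Insufficient data for feedback analysis. ${runsWithFeedback}/${MIN_RUNS_WITH_FEEDBACK} runs with feedback available.`;
}
```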
|
|
49
|
+
|
|
50
|
+
**AC-6 — Retry strategy mapping**
|
|
51
|
+
- Given feedback analysis identifies predictive issue patterns,
|
|
52
|
+
- When the user views insights,
|
|
53
|
+
- Then issue-to-strategy mappings are displayed (per FEATURE_SPEC.md:Rule 3),
|
|
54
|
+
- And the user can see which retry strategies are recommended for common issues.
|
|
55
|
+
|
|
56
|
+
---
|
|
57
|
+
|
|
58
|
+
## Out of Scope
|
|
59
|
+
|
|
60
|
+
- Cross-pipeline feedback aggregation (each project is independent)
|
|
61
|
+
- Real-time calibration updates during pipeline execution
|
|
62
|
+
- Natural language interpretation of feedback patterns
|
|
63
|
+
- Automatic threshold adjustment (user must run `feedback-config set`)
|
|
@@ -0,0 +1,57 @@
|
|
|
1
|
+
# Story — Quality Gates
|
|
2
|
+
|
|
3
|
+
## User Story
|
|
4
|
+
|
|
5
|
+
As a **developer**, I want **the pipeline to pause when feedback indicates quality concerns** so that **I can review and address issues before proceeding with flawed inputs**.
|
|
6
|
+
|
|
7
|
+
---
|
|
8
|
+
|
|
9
|
+
## Context / Scope
|
|
10
|
+
|
|
11
|
+
- Per FEATURE_SPEC.md:Section 4 (Alternative: Quality Gate Triggers Pause), pipeline pauses when rating < threshold or recommendation is "pause"
|
|
12
|
+
- Default threshold is 3.0 (per FEATURE_SPEC.md:Section 4)
|
|
13
|
+
- Depends on feedback collection (story-feedback-collection.md)
|
|
14
|
+
- Per SYSTEM_SPEC.md:Section 8, failure handling already supports pause/review — quality gates extend this
|
|
15
|
+
|
|
16
|
+
---
|
|
17
|
+
|
|
18
|
+
## Acceptance Criteria
|
|
19
|
+
|
|
20
|
+
**AC-1 — Quality gate evaluation**
|
|
21
|
+
- Given feedback is collected from an agent,
|
|
22
|
+
- When the orchestrator evaluates the feedback,
|
|
23
|
+
- Then `shouldPause` is true if:
|
|
24
|
+
- `rating < minRatingThreshold`, OR
|
|
25
|
+
- `recommendation === "pause"`
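The rule in AC-1 can be sketched as a single predicate. The `feedback` and `config` object shapes are assumptions; the `enabled` check reflects the Feedback Configuration story, which says quality gates are skipped when the feedback system is disabled.

```javascript
// Hypothetical shapes: feedback = { rating, recommendation }, config = { minRatingThreshold, enabled }.
function shouldPause(feedback, config) {
  // When the feedback system is disabled, quality gates are skipped entirely.
  if (config.enabled === false) return false;
  return (
    feedback.rating < config.minRatingThreshold ||
    feedback.recommendation === "pause"
  );
}
```

The comparison is strict, so a rating exactly equal to the threshold does not trigger a pause on its own.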
|
|
26
|
+
|
|
27
|
+
**AC-2 — Pipeline pauses on quality gate trigger**
|
|
28
|
+
- Given `shouldPause` evaluates to true,
|
|
29
|
+
- When the quality gate is triggered,
|
|
30
|
+
- Then the pipeline pauses before the current agent begins its main work,
|
|
31
|
+
- And the user is prompted with: "Quality gate triggered. {Agent} rated previous stage {rating}/5. Issues: {issues}. (review/proceed/abort)"
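For illustration, the prompt in AC-2 could be assembled as below. The function name and argument shape are hypothetical, and joining multiple issue codes with a comma is an assumption the story does not specify.

```javascript
// Hypothetical helper: fills the placeholders of the AC-2 prompt template.
function formatGatePrompt({ agent, rating, issues }) {
  const issueList = issues.join(", "); // assumed separator
  return `Quality gate triggered. ${agent} rated previous stage ${rating}/5. Issues: ${issueList}. (review/proceed/abort)`;
}
```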
|
|
32
|
+
|
|
33
|
+
**AC-3 — User can proceed past quality gate**
|
|
34
|
+
- Given the pipeline is paused at a quality gate,
|
|
35
|
+
- When the user chooses "proceed",
|
|
36
|
+
- Then the pipeline continues with the current agent's main work,
|
|
37
|
+
- And the decision is recorded in history.
|
|
38
|
+
|
|
39
|
+
**AC-4 — User can abort at quality gate**
|
|
40
|
+
- Given the pipeline is paused at a quality gate,
|
|
41
|
+
- When the user chooses "abort",
|
|
42
|
+
- Then the pipeline stops,
|
|
43
|
+
- And the feature is moved to the failed list with reason "quality_gate_abort".
|
|
44
|
+
|
|
45
|
+
**AC-5 — User can review at quality gate**
|
|
46
|
+
- Given the pipeline is paused at a quality gate,
|
|
47
|
+
- When the user chooses "review",
|
|
48
|
+
- Then the pipeline remains paused,
|
|
49
|
+
- And the user can examine upstream artifacts before deciding to proceed or abort.
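AC-3 through AC-5 describe three outcomes of the prompt, which can be sketched as a small dispatch function. The state names and the returned object shape are assumptions; only the `quality_gate_abort` reason string comes from the story.

```javascript
// Hypothetical helper: maps the user's choice to the next pipeline state.
function resolveGate(choice) {
  switch (choice) {
    case "proceed":
      // Continue with the current agent and note the decision for history (AC-3).
      return { state: "running", historyEntry: { decision: "proceed" } };
    case "abort":
      // Stop and move the feature to the failed list with the stated reason (AC-4).
      return { state: "failed", reason: "quality_gate_abort" };
    case "review":
      // Stay paused so the user can inspect upstream artifacts, then choose again (AC-5).
      return { state: "paused" };
    default:
      // No implicit bypass: anything else is rejected rather than treated as "proceed".
      throw new Error(`Unknown quality gate choice: ${choice}`);
  }
}
```

Rejecting unrecognised input keeps the gate consistent with the out-of-scope note that gates are never bypassed without explicit user action.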
|
|
50
|
+
|
|
51
|
+
---
|
|
52
|
+
|
|
53
|
+
## Out of Scope
|
|
54
|
+
|
|
55
|
+
- Automatic remediation or revision of upstream artifacts
|
|
56
|
+
- Multiple threshold levels per agent (single global threshold for MVP)
|
|
57
|
+
- Bypassing quality gates without explicit user action
|