murmur8 3.5.0

Files changed (120)
  1. package/.blueprint/agents/AGENT_BA_CASS.md +239 -0
  2. package/.blueprint/agents/AGENT_DEVELOPER_CODEY.md +308 -0
  3. package/.blueprint/agents/AGENT_SPECIFICATION_ALEX.md +183 -0
  4. package/.blueprint/agents/AGENT_TESTER_NIGEL.md +159 -0
  5. package/.blueprint/agents/GUARDRAILS.md +83 -0
  6. package/.blueprint/agents/TEAM_MANIFESTO.md +91 -0
  7. package/.blueprint/features/.gitkeep +0 -0
  8. package/.blueprint/features/feature_adaptive-retry/FEATURE_SPEC.md +239 -0
  9. package/.blueprint/features/feature_adaptive-retry/IMPLEMENTATION_PLAN.md +48 -0
  10. package/.blueprint/features/feature_adaptive-retry/story-prompt-modification.md +85 -0
  11. package/.blueprint/features/feature_adaptive-retry/story-retry-config.md +89 -0
  12. package/.blueprint/features/feature_adaptive-retry/story-should-retry.md +98 -0
  13. package/.blueprint/features/feature_adaptive-retry/story-strategy-recommendation.md +85 -0
  14. package/.blueprint/features/feature_agent-guardrails/FEATURE_SPEC.md +328 -0
  15. package/.blueprint/features/feature_agent-guardrails/IMPLEMENTATION_PLAN.md +90 -0
  16. package/.blueprint/features/feature_agent-guardrails/story-citation-requirements.md +50 -0
  17. package/.blueprint/features/feature_agent-guardrails/story-confidentiality.md +50 -0
  18. package/.blueprint/features/feature_agent-guardrails/story-escalation-protocol.md +55 -0
  19. package/.blueprint/features/feature_agent-guardrails/story-source-restrictions.md +50 -0
  20. package/.blueprint/features/feature_compressed-feedback/FEATURE_SPEC.md +136 -0
  21. package/.blueprint/features/feature_compressed-feedback/IMPLEMENTATION_PLAN.md +40 -0
  22. package/.blueprint/features/feature_feedback-loop/FEATURE_SPEC.md +347 -0
  23. package/.blueprint/features/feature_feedback-loop/IMPLEMENTATION_PLAN.md +71 -0
  24. package/.blueprint/features/feature_feedback-loop/story-feedback-collection.md +63 -0
  25. package/.blueprint/features/feature_feedback-loop/story-feedback-config.md +61 -0
  26. package/.blueprint/features/feature_feedback-loop/story-feedback-insights.md +63 -0
  27. package/.blueprint/features/feature_feedback-loop/story-quality-gates.md +57 -0
  28. package/.blueprint/features/feature_interactive-alex/FEATURE_SPEC.md +263 -0
  29. package/.blueprint/features/feature_interactive-alex/IMPLEMENTATION_PLAN.md +69 -0
  30. package/.blueprint/features/feature_interactive-alex/handoff-alex.md +19 -0
  31. package/.blueprint/features/feature_interactive-alex/handoff-cass.md +21 -0
  32. package/.blueprint/features/feature_interactive-alex/handoff-nigel.md +19 -0
  33. package/.blueprint/features/feature_interactive-alex/story-flag-routing.md +54 -0
  34. package/.blueprint/features/feature_interactive-alex/story-iterative-drafting.md +65 -0
  35. package/.blueprint/features/feature_interactive-alex/story-pipeline-integration.md +66 -0
  36. package/.blueprint/features/feature_interactive-alex/story-session-lifecycle.md +75 -0
  37. package/.blueprint/features/feature_interactive-alex/story-system-spec-creation.md +57 -0
  38. package/.blueprint/features/feature_lazy-business-context/FEATURE_SPEC.md +140 -0
  39. package/.blueprint/features/feature_lazy-business-context/IMPLEMENTATION_PLAN.md +54 -0
  40. package/.blueprint/features/feature_model-native-features/FEATURE_SPEC.md +174 -0
  41. package/.blueprint/features/feature_model-native-features/IMPLEMENTATION_PLAN.md +45 -0
  42. package/.blueprint/features/feature_parallel-abort/FEATURE_SPEC.md +117 -0
  43. package/.blueprint/features/feature_parallel-confirm/FEATURE_SPEC.md +90 -0
  44. package/.blueprint/features/feature_parallel-features/FEATURE_SPEC.md +291 -0
  45. package/.blueprint/features/feature_parallel-features/IMPLEMENTATION_PLAN.md +73 -0
  46. package/.blueprint/features/feature_parallel-lock/FEATURE_SPEC.md +119 -0
  47. package/.blueprint/features/feature_parallel-logging/FEATURE_SPEC.md +105 -0
  48. package/.blueprint/features/feature_parallel-preflight/FEATURE_SPEC.md +141 -0
  49. package/.blueprint/features/feature_pipeline-history/FEATURE_SPEC.md +239 -0
  50. package/.blueprint/features/feature_pipeline-history/IMPLEMENTATION_PLAN.md +71 -0
  51. package/.blueprint/features/feature_pipeline-history/story-clear-history.md +73 -0
  52. package/.blueprint/features/feature_pipeline-history/story-display-history.md +75 -0
  53. package/.blueprint/features/feature_pipeline-history/story-record-execution.md +76 -0
  54. package/.blueprint/features/feature_pipeline-history/story-show-statistics.md +85 -0
  55. package/.blueprint/features/feature_pipeline-insights/FEATURE_SPEC.md +288 -0
  56. package/.blueprint/features/feature_pipeline-insights/IMPLEMENTATION_PLAN.md +65 -0
  57. package/.blueprint/features/feature_pipeline-insights/story-anomaly-detection.md +71 -0
  58. package/.blueprint/features/feature_pipeline-insights/story-bottleneck-analysis.md +75 -0
  59. package/.blueprint/features/feature_pipeline-insights/story-failure-patterns.md +75 -0
  60. package/.blueprint/features/feature_pipeline-insights/story-json-output.md +75 -0
  61. package/.blueprint/features/feature_pipeline-insights/story-trend-analysis.md +78 -0
  62. package/.blueprint/features/feature_shared-guardrails/FEATURE_SPEC.md +119 -0
  63. package/.blueprint/features/feature_shared-guardrails/IMPLEMENTATION_PLAN.md +34 -0
  64. package/.blueprint/features/feature_shared-guardrails/story-extract-guardrails.md +60 -0
  65. package/.blueprint/features/feature_shared-guardrails/story-update-init-commands.md +63 -0
  66. package/.blueprint/features/feature_slim-agent-prompts/FEATURE_SPEC.md +145 -0
  67. package/.blueprint/features/feature_slim-agent-prompts/IMPLEMENTATION_PLAN.md +87 -0
  68. package/.blueprint/features/feature_slim-agent-prompts/story-create-runtime-prompt-template.md +59 -0
  69. package/.blueprint/features/feature_slim-agent-prompts/story-create-slim-agent-prompts.md +65 -0
  70. package/.blueprint/features/feature_slim-agent-prompts/story-skill-integration.md +53 -0
  71. package/.blueprint/features/feature_smart-story-routing/FEATURE_SPEC.md +147 -0
  72. package/.blueprint/features/feature_smart-story-routing/IMPLEMENTATION_PLAN.md +73 -0
  73. package/.blueprint/features/feature_template-extraction/FEATURE_SPEC.md +134 -0
  74. package/.blueprint/features/feature_template-extraction/IMPLEMENTATION_PLAN.md +46 -0
  75. package/.blueprint/features/feature_upstream-summaries/FEATURE_SPEC.md +150 -0
  76. package/.blueprint/features/feature_upstream-summaries/IMPLEMENTATION_PLAN.md +70 -0
  77. package/.blueprint/features/feature_validate-command/FEATURE_SPEC.md +209 -0
  78. package/.blueprint/features/feature_validate-command/IMPLEMENTATION_PLAN.md +59 -0
  79. package/.blueprint/features/feature_validate-command/story-failure-output.md +61 -0
  80. package/.blueprint/features/feature_validate-command/story-node-version-check.md +52 -0
  81. package/.blueprint/features/feature_validate-command/story-run-validation.md +59 -0
  82. package/.blueprint/features/feature_validate-command/story-success-output.md +50 -0
  83. package/.blueprint/prompts/TEMPLATE.md +65 -0
  84. package/.blueprint/prompts/alex-runtime.md +49 -0
  85. package/.blueprint/prompts/cass-runtime.md +46 -0
  86. package/.blueprint/prompts/codey-implement-runtime.md +52 -0
  87. package/.blueprint/prompts/codey-plan-runtime.md +47 -0
  88. package/.blueprint/prompts/nigel-runtime.md +47 -0
  89. package/.blueprint/system_specification/.gitkeep +0 -0
  90. package/.blueprint/system_specification/SYSTEM_SPEC.md +248 -0
  91. package/.blueprint/templates/FEATURE_SPEC.md +125 -0
  92. package/.blueprint/templates/STORY_TEMPLATE.md +96 -0
  93. package/.blueprint/templates/SYSTEM_SPEC.md +128 -0
  94. package/.blueprint/templates/TEST_TEMPLATE.md +76 -0
  95. package/.blueprint/ways_of_working/DEVELOPMENT_RITUAL.md +178 -0
  96. package/.business_context/README.md +27 -0
  97. package/LICENSE +21 -0
  98. package/README.md +564 -0
  99. package/SKILL.md +840 -0
  100. package/bin/cli.js +388 -0
  101. package/package.json +36 -0
  102. package/src/business-context.js +91 -0
  103. package/src/classifier.js +173 -0
  104. package/src/feedback.js +201 -0
  105. package/src/handoff.js +148 -0
  106. package/src/history.js +306 -0
  107. package/src/index.js +170 -0
  108. package/src/init.js +139 -0
  109. package/src/insights.js +504 -0
  110. package/src/interactive.js +338 -0
  111. package/src/orchestrator.js +217 -0
  112. package/src/parallel.js +1544 -0
  113. package/src/retry.js +274 -0
  114. package/src/stack.js +320 -0
  115. package/src/tools/index.js +27 -0
  116. package/src/tools/prompts.js +45 -0
  117. package/src/tools/schemas.js +38 -0
  118. package/src/tools/validation.js +83 -0
  119. package/src/update.js +112 -0
  120. package/src/validate.js +172 -0
@@ -0,0 +1,136 @@
1
+ # Feature Specification — Compressed Feedback Prompts
2
+
3
+ ## 1. Feature Intent
4
+ **Why this feature exists.**
5
+
6
+ - Current feedback prompts are verbose (~10 lines, ~200 tokens per stage)
7
+ - Feedback is collected at 3 points: Cass→Alex, Nigel→Cass, Codey→Nigel
8
+ - Total overhead: ~600 tokens per pipeline run
9
+ - Compressed prompts achieve same result with ~3 lines each
10
+
11
+ ---
12
+
13
+ ## 2. Scope
14
+ ### In Scope
15
+ - Rewrite feedback prompt sections to be more concise
16
+ - Maintain same output format (JSON with rating, issues, recommendation)
17
+ - Ensure feedback quality is not degraded
18
+
19
+ ### Out of Scope
20
+ - Changing feedback data structure
21
+ - Removing feedback collection
22
+ - Changing quality gate thresholds
23
+
24
+ ---
25
+
26
+ ## 3. Actors Involved
27
+
28
+ | Actor | Feedback Role |
29
+ |-------|--------------|
30
+ | Cass | Rates Alex's feature spec |
31
+ | Nigel | Rates Cass's user stories |
32
+ | Codey | Rates Nigel's tests |
33
+
34
+ ---
35
+
36
+ ## 4. Behaviour Overview
37
+
38
+ **Current verbose prompt (~10 lines):**
39
+ ```
40
+ FIRST, before writing stories, evaluate Alex's feature spec:
41
+ - Rating (1-5): How clear and complete is the spec?
42
+ - Issues: List any problems (e.g., "missing-error-handling", "unclear-scope")
43
+ - Recommendation: "proceed" | "pause" | "revise"
44
+
45
+ Output your feedback as:
46
+ FEEDBACK: { "rating": N, "issues": [...], "recommendation": "..." }
47
+ ```
48
+
49
+ **Compressed prompt (~3 lines):**
50
+ ```
51
+ FEEDBACK FIRST: Rate prior stage 1-5, list issues (e.g., unclear-scope), recommend proceed|pause|revise.
52
+ Format: FEEDBACK: {"rating":N,"issues":["..."],"rec":"proceed|pause|revise"}
53
+ Then continue with your task.
54
+ ```
55
+
56
+ **Key outcomes:**
57
+ - ~400 fewer tokens per pipeline run (3 stages × ~130 token savings)
58
+ - Same feedback data collected
59
+ - Same quality gate functionality
60
+
61
+ ---
62
+
63
+ ## 5. State & Lifecycle Interactions
64
+
65
+ - No state changes
66
+ - Feedback format unchanged
67
+ - Quality gate logic unchanged
68
+
69
+ ---
70
+
71
+ ## 6. Rules & Decision Logic
72
+
73
+ | Rule | Description |
74
+ |------|-------------|
75
+ | Same output format | JSON structure must remain compatible with feedback.js |
76
+ | Abbreviations allowed | "rec" instead of "recommendation" in output |
77
+ | Examples condensed | One inline example instead of multiple lines |
78
+
79
+ ---
80
+
81
+ ## 7. Dependencies
82
+
83
+ - SKILL.md feedback sections updated
84
+ - `src/feedback.js` may need to accept abbreviated keys ("rec" → "recommendation")
85
+ - No other module changes
86
+
87
+ ---
88
+
89
+ ## 8. Non-Functional Considerations
90
+
91
+ - **Performance:** ~400 token reduction per run
92
+ - **Clarity:** Compressed prompts must still be unambiguous
93
+ - **Risk:** Agents may misinterpret terse instructions
94
+
95
+ ---
96
+
97
+ ## 9. Assumptions & Open Questions
98
+
99
+ **Assumptions:**
100
+ - Agents can parse terse instructions correctly
101
+ - Abbreviated JSON keys are acceptable
102
+ - Feedback quality won't degrade with shorter prompts
103
+
104
+ **Open Questions:**
105
+ - Should we A/B test compressed vs verbose prompts?
106
+ - Is "rec" acceptable or should we keep "recommendation"?
107
+ - Do we need to update feedback.js to normalize keys?
108
+
109
+ ---
110
+
111
+ ## 10. Impact on System Specification
112
+
113
+ - No impact on system specification
114
+ - Feedback behaviour unchanged
115
+ - Quality gates unchanged
116
+
117
+ ---
118
+
119
+ ## 11. Handover to BA (Cass)
120
+
121
+ **Story themes:**
122
+ - Rewrite Cass feedback prompt (rates Alex)
123
+ - Rewrite Nigel feedback prompt (rates Cass)
124
+ - Rewrite Codey feedback prompt (rates Nigel)
125
+ - Update feedback.js if key normalization needed
126
+
127
+ **Expected story boundaries:**
128
+ - One story for prompt compression (all 3 stages)
129
+ - One story for feedback.js updates if needed
130
+
131
+ ---
132
+
133
+ ## 12. Change Log (Feature-Level)
134
+ | Date | Change | Reason | Raised By |
135
+ |-----|------|--------|-----------|
136
+ | 2026-02-25 | Initial spec | Token efficiency improvement | Claude |
@@ -0,0 +1,40 @@
1
+ # Implementation Plan — Compressed Feedback Prompts
2
+
3
+ ## Summary
4
+
5
+ Compress verbose feedback prompts (~10 lines) to terse format (~3 lines) across three pipeline stages (Cass, Nigel, Codey). Update `src/feedback.js` to normalize the abbreviated "rec" key to "recommendation" and add a parsing function for extracting feedback JSON from agent output.
6
+
7
+ ## Files to Create/Modify
8
+
9
+ | Path | Action | Purpose |
10
+ |------|--------|---------|
11
+ | `SKILL.md` | Modify | Compress feedback prompts in Steps 6.5, 7.5, 8.5 |
12
+ | `src/feedback.js` | Modify | Add key normalization and output parsing functions |
13
+
14
+ ## Implementation Steps
15
+
16
+ 1. **Add `normalizeFeedbackKeys()` to feedback.js** — Function that converts `{rec: "..."}` to `{recommendation: "..."}` while preserving an existing full `recommendation` key.
17
+
18
+ 2. **Add `parseFeedbackFromOutput()` to feedback.js** — Regex-based parser to extract `FEEDBACK: {...}` JSON from agent output text.
19
+
20
+ 3. **Update `validateFeedback()` in feedback.js** — Accept both "rec" and "recommendation" keys by checking either before validation.
21
+
22
+ 4. **Export new functions from feedback.js** — Add `normalizeFeedbackKeys` and `parseFeedbackFromOutput` to module.exports.
23
+
24
+ 5. **Compress Step 6.5 prompt in SKILL.md** — Replace Cass→Alex verbose prompt with:
25
+ ```
26
+ FEEDBACK FIRST: Rate Alex's spec 1-5, list issues (e.g., unclear-scope), recommend proceed|pause|revise.
27
+ Format: FEEDBACK: {"rating":N,"issues":["..."],"rec":"proceed|pause|revise"}
28
+ Then continue with your task.
29
+ ```
30
+
31
+ 6. **Compress Step 7.5 prompt in SKILL.md** — Replace Nigel→Cass verbose prompt with similar terse format.
32
+
33
+ 7. **Compress Step 8.5 prompt in SKILL.md** — Replace Codey→Nigel verbose prompt with similar terse format.
34
+
35
+ 8. **Run tests** — Execute `node --test test/feature_compressed-feedback.test.js` to verify implementation.
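Step 2 above names a regex-based `parseFeedbackFromOutput()` but does not show it. The following is a minimal sketch of one way it could work, not the package's actual implementation. It assumes the feedback object is flat (no nested braces and no brace characters inside issue strings), which matches the `{rating, issues, rec}` shape in the compressed prompt. Returning `null` on failure is also an assumption.

```javascript
// Hypothetical sketch: pull the first `FEEDBACK: {...}` object out of agent output.
// Assumes a flat JSON object with no nested or embedded braces.
function parseFeedbackFromOutput(output) {
  const match = /FEEDBACK:\s*(\{[^{}]*\})/.exec(String(output));
  if (!match) return null; // no feedback marker found
  try {
    return JSON.parse(match[1]);
  } catch {
    return null; // marker present, but the payload is not valid JSON
  }
}

// Example agent output: feedback line first, then the agent's main work.
const sample = [
  'FEEDBACK: {"rating":4,"issues":["unclear-scope"],"rec":"proceed"}',
  'Story 1: ...',
].join('\n');

console.log(parseFeedbackFromOutput(sample));
```

Returning `null` instead of throwing keeps the caller free to proceed with a warning when an agent omits or mangles its feedback line.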
36
+
37
+ ## Risks/Questions
38
+
39
+ - **Agent interpretation:** Terse prompts may occasionally confuse agents; monitor initial runs for correct feedback format.
40
+ - **Key preference:** If both "rec" and "recommendation" appear, current implementation prefers "recommendation" — this matches test expectations.
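The key-preference rule above can be illustrated with a short sketch. The function name comes from step 1 of this plan; the body is an assumption about how the normalization might look, written so that the full `recommendation` key wins when both keys are present.

```javascript
// Hypothetical sketch of the "rec" -> "recommendation" normalization.
// If both keys are present, the full "recommendation" key is kept.
function normalizeFeedbackKeys(feedback) {
  if (feedback === null || typeof feedback !== 'object') return feedback;
  const { rec, ...rest } = feedback; // drop the abbreviated key from the copy
  if (rest.recommendation === undefined && rec !== undefined) {
    rest.recommendation = rec;
  }
  return rest; // a new object; the input is not mutated
}

console.log(normalizeFeedbackKeys({ rating: 3, issues: [], rec: 'pause' }));
console.log(normalizeFeedbackKeys({ rec: 'pause', recommendation: 'proceed' }));
```

Normalizing before validation means the validator only ever has to check one key name.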
@@ -0,0 +1,347 @@
1
+ # Feature Specification — Agent Feedback Loop
2
+
3
+ ## 1. Feature Intent
4
+ **Why this feature exists.**
5
+
6
+ - **Problem being addressed:** The murmur8 pipeline executes sequentially but lacks inter-stage feedback. Agents cannot assess the quality of upstream artifacts, leading to silent propagation of poor-quality specifications, stories, or tests through the pipeline.
7
+ - **User need:** Developers want visibility into how each agent perceives the quality of inputs from previous stages. When quality is low, the pipeline should pause for human review rather than proceeding with flawed inputs.
8
+ - **System alignment:** Per SYSTEM_SPEC.md:Section 7 (Governing Rules), agents are expected to "flag deviations" and "not silently alter specifications". This feature operationalises that principle by requiring explicit quality assessment at each stage boundary.
9
+
10
+ > This feature introduces a quality feedback mechanism that integrates with existing history, insights, and retry modules to create a closed-loop quality system.
11
+
12
+ ---
13
+
14
+ ## 2. Scope
15
+
16
+ ### In Scope
17
+
18
+ - Feedback collection schema and data structure for agent assessments
19
+ - Feedback capture at each stage boundary (Cass on Alex, Nigel on Cass, Codey on Nigel)
20
+ - Quality gate logic: pause pipeline if feedback rating falls below threshold
21
+ - Configuration management for feedback thresholds (`.claude/feedback-config.json`)
22
+ - Storage of feedback in pipeline history entries
23
+ - Insights extension: correlation analysis between feedback scores and outcomes
24
+ - Retry integration: mapping feedback issues to retry strategies
25
+ - CLI commands for feedback configuration and analysis
26
+
27
+ ### Out of Scope
28
+
29
+ - Feedback from Alex (no prior stage to assess within the pipeline)
30
+ - Automatic remediation based on feedback (human review required)
31
+ - Cross-pipeline feedback aggregation (each run is independent)
32
+ - Natural language feedback parsing (structured schema only)
33
+ - Feedback on auto-commit stage (no agent assessment)
34
+
35
+ ---
36
+
37
+ ## 3. Actors Involved
38
+
39
+ ### Human User
40
+
41
+ - **Can do:** View feedback thresholds; modify threshold configuration; view feedback correlation insights; review and approve paused pipelines
42
+ - **Cannot do:** Directly inject feedback into history; bypass quality gates without explicit action
43
+
44
+ ### Cass (Story Writer Agent)
45
+
46
+ - **Can do:** Provide feedback on Alex's feature specification before writing stories
47
+ - **Feedback target:** Feature specification quality, completeness, clarity
48
+
49
+ ### Nigel (Tester Agent)
50
+
51
+ - **Can do:** Provide feedback on Cass's user stories before writing tests
52
+ - **Feedback target:** Story quality, acceptance criteria testability, scope clarity
53
+
54
+ ### Codey (Developer Agent)
55
+
56
+ - **Can do:** Provide feedback on Nigel's test specification before planning/implementing
57
+ - **Feedback target:** Test coverage, implementation feasibility, specification clarity
58
+
59
+ ### Pipeline Orchestrator (SKILL.md implementation)
60
+
61
+ - **Can do:** Collect feedback from agents; evaluate against thresholds; persist to history; trigger quality gates
62
+ - **Cannot do:** Override human decisions on paused pipelines
63
+
64
+ ### History Module (src/history.js)
65
+
66
+ - **Extended by:** New `feedback` field in stage entries
67
+ - **Maintains:** Backward compatibility with existing entries (no feedback = null)
68
+
69
+ ### Insights Module (src/insights.js)
70
+
71
+ - **Extended by:** New `--feedback` analysis mode
72
+ - **Provides:** Calibration scoring, issue pattern correlation
73
+
74
+ ### Retry Module (src/retry.js)
75
+
76
+ - **Extended by:** Feedback-informed strategy selection
77
+ - **Consumes:** Issue patterns from feedback to recommend targeted strategies
78
+
79
+ ---
80
+
81
+ ## 4. Behaviour Overview
82
+
83
+ ### Happy Path: Feedback Collection and Proceeding
84
+
85
+ 1. Alex completes feature specification
86
+ 2. Orchestrator spawns Cass with explicit instruction to provide feedback on Alex's output
87
+ 3. Cass writes feedback object (rating, confidence, issues, recommendation) to designated output
88
+ 4. Orchestrator reads feedback and evaluates against configured threshold (default: 3.0)
89
+ 5. Rating >= threshold: pipeline proceeds, feedback stored in history
90
+ 6. Cass proceeds to write user stories
91
+ 7. Pattern repeats for Nigel (feedback on Cass) and Codey (feedback on Nigel)
92
+ 8. On completion, all feedback is persisted in history entry
93
+
94
+ ### Alternative: Quality Gate Triggers Pause
95
+
96
+ 1. Agent provides feedback with rating < configured threshold
97
+ 2. Agent's recommendation is "pause" or "revise"
98
+ 3. Orchestrator pauses pipeline before current agent's main work
99
+ 4. User is prompted: "Quality gate triggered. {Agent} rated previous stage {rating}/5. Issues: {issues}. (review/proceed/abort)"
100
+ 5. User can review upstream artifacts, request revision, or proceed anyway
101
+ 6. Decision and feedback are recorded in history
102
+
103
+ ### Alternative: Dynamic Threshold Adjustment
104
+
105
+ 1. User runs `murmur8 insights --feedback` after sufficient runs
106
+ 2. Insights module calculates agent calibration (how well each agent's feedback predicts actual outcomes)
107
+ 3. User runs `murmur8 feedback-config set minRating 3.5` to adjust threshold based on data
108
+ 4. Future runs use updated threshold
109
+
110
+ ### Alternative: Retry with Feedback-Informed Strategy
111
+
112
+ 1. Pipeline fails at a stage (e.g., Codey cannot implement)
113
+ 2. Retry module examines feedback chain for the failed run
114
+ 3. Feedback issues are mapped to strategies:
115
+ - "missing-error-handling" → `add-context`
116
+ - "too-complex" → `simplify-prompt`
117
+ - "too-many-stories" → `reduce-stories`
118
+ 4. Recommended strategy reflects feedback analysis
119
+ 5. User accepts or chooses alternative
120
+
121
+ ---
122
+
123
+ ## 5. State & Lifecycle Interactions
124
+
125
+ ### States Entered
126
+
127
+ - **feedback_pending:** After upstream agent completes, before downstream agent provides feedback
128
+ - **quality_gate_paused:** When feedback triggers quality gate (rating < threshold)
129
+
130
+ ### States Exited
131
+
132
+ - **feedback_pending → in_progress:** When feedback is recorded and threshold is met
133
+ - **quality_gate_paused → in_progress:** When user chooses to proceed
134
+ - **quality_gate_paused → paused:** When user requests review/revision
135
+
136
+ ### States Modified
137
+
138
+ - Pipeline history entries gain `stages[].feedback` field
139
+ - Queue entries may include temporary feedback data during execution
140
+
141
+ ### Lifecycle Classification
142
+
143
+ - **State-creating:** Creates feedback_pending state at each stage boundary
144
+ - **State-constraining:** Quality gates can block progression
145
+ - **State-transitioning:** Moves between feedback states based on ratings
146
+
147
+ ---
148
+
149
+ ## 6. Rules & Decision Logic
150
+
151
+ ### Rule 1: Feedback Schema Validation
152
+
153
+ - **Description:** All feedback must conform to the defined schema
154
+ - **Inputs:** Agent feedback output
155
+ - **Outputs:** Validated feedback object or validation error
156
+ - **Type:** Deterministic
157
+
158
+ ```json
159
+ {
160
+ "about": "alex|cass|nigel",
161
+ "rating": 1-5,
162
+ "confidence": 0.0-1.0,
163
+ "issues": ["issue-code", ...],
164
+ "recommendation": "proceed|pause|revise"
165
+ }
166
+ ```
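The schema above is written as shorthand rather than valid JSON, so a validator has to spell out the ranges. A minimal sketch of such a check follows. The `{valid, errors}` return shape is the one named in the implementation plan for `validateFeedback()`; the error strings and the choice to allow non-integer ratings are assumptions.

```javascript
// Hypothetical validator for the Rule 1 schema. Error messages are illustrative.
const AGENTS = ['alex', 'cass', 'nigel'];
const RECOMMENDATIONS = ['proceed', 'pause', 'revise'];

function validateFeedback(fb) {
  if (fb === null || typeof fb !== 'object') {
    return { valid: false, errors: ['feedback must be an object'] };
  }
  const errors = [];
  if (!AGENTS.includes(fb.about)) {
    errors.push('about must be one of alex, cass, nigel');
  }
  // Number.isFinite rejects NaN, Infinity, and non-number values.
  if (!Number.isFinite(fb.rating) || fb.rating < 1 || fb.rating > 5) {
    errors.push('rating must be a number from 1 to 5');
  }
  if (!Number.isFinite(fb.confidence) || fb.confidence < 0 || fb.confidence > 1) {
    errors.push('confidence must be a number from 0.0 to 1.0');
  }
  if (!Array.isArray(fb.issues) || !fb.issues.every((i) => typeof i === 'string')) {
    errors.push('issues must be an array of issue-code strings');
  }
  if (!RECOMMENDATIONS.includes(fb.recommendation)) {
    errors.push('recommendation must be proceed, pause, or revise');
  }
  return { valid: errors.length === 0, errors };
}

console.log(validateFeedback({
  about: 'alex', rating: 4, confidence: 0.8,
  issues: ['unclear-scope'], recommendation: 'proceed',
}));
```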
167
+
168
+ ### Rule 2: Quality Gate Evaluation
169
+
170
+ - **Description:** Compare feedback rating against threshold to determine if pipeline should pause
171
+ - **Inputs:** Feedback rating, configured threshold, recommendation
172
+ - **Outputs:** Boolean (shouldPause)
173
+ - **Type:** Deterministic
174
+
175
+ ```
176
+ shouldPause = (rating < minRatingThreshold) OR (recommendation === "pause")
177
+ ```
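Rule 2 translates almost directly into code. The sketch below assumes the `shouldPause(feedback, config)` signature and the `minRatingThreshold` key used in the implementation plan, and treats missing feedback as neutral, as Section 8 (Resilience) describes.

```javascript
// Sketch of the Rule 2 quality gate. Default threshold of 3.0 is the spec's default.
function shouldPause(feedback, config = { minRatingThreshold: 3.0 }) {
  if (!feedback) return false; // missing feedback has no quality gate effect
  return (
    feedback.rating < config.minRatingThreshold ||
    feedback.recommendation === 'pause'
  );
}

console.log(shouldPause({ rating: 2, recommendation: 'proceed' })); // low rating
console.log(shouldPause({ rating: 4, recommendation: 'pause' }));   // explicit pause
console.log(shouldPause({ rating: 4, recommendation: 'revise' }));  // neither condition
```

Note that, as the rule is written, a "revise" recommendation with a rating at or above the threshold does not pause the pipeline.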
178
+
179
+ ### Rule 3: Issue-to-Strategy Mapping
180
+
181
+ - **Description:** Map feedback issue codes to retry strategies
182
+ - **Inputs:** List of issue codes from feedback chain
183
+ - **Outputs:** Prioritised list of recommended strategies
184
+ - **Type:** Deterministic with configurable mappings
185
+
186
+ Default mappings:
187
+ | Issue Code | Strategy |
188
+ |------------|----------|
189
+ | `missing-error-handling` | `add-context` |
190
+ | `unclear-scope` | `simplify-prompt` |
191
+ | `too-complex` | `simplify-prompt` |
192
+ | `too-many-stories` | `reduce-stories` |
193
+ | `untestable-criteria` | `simplify-tests` |
194
+ | `missing-edge-cases` | `add-context` |
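The default mappings above can be sketched as a lookup with user overrides. The `mapIssuesToStrategies(issues, config)` signature is taken from Section 7; the `issueMappings` config key appears in the implementation plan. Ranking strategies by how many issue codes point to them is an assumption, since the spec only says the output is a prioritised list.

```javascript
// Default mappings copied from the Rule 3 table.
const DEFAULT_ISSUE_MAPPINGS = {
  'missing-error-handling': 'add-context',
  'unclear-scope': 'simplify-prompt',
  'too-complex': 'simplify-prompt',
  'too-many-stories': 'reduce-stories',
  'untestable-criteria': 'simplify-tests',
  'missing-edge-cases': 'add-context',
};

// Hypothetical sketch: rank strategies by how many issue codes map to each.
function mapIssuesToStrategies(issues, config = {}) {
  // A Map avoids accidental hits on inherited object properties.
  const mappings = new Map(
    Object.entries({ ...DEFAULT_ISSUE_MAPPINGS, ...config.issueMappings })
  );
  const counts = new Map();
  for (const issue of issues) {
    const strategy = mappings.get(issue);
    if (strategy === undefined) continue; // unknown codes are ignored
    counts.set(strategy, (counts.get(strategy) || 0) + 1);
  }
  // Most frequently indicated first; ties keep first-seen order (stable sort).
  return [...counts.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([strategy]) => strategy);
}

console.log(mapIssuesToStrategies(['too-complex', 'missing-error-handling', 'unclear-scope']));
```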
195
+
196
+ ### Rule 4: Agent Calibration Calculation
197
+
198
+ - **Description:** Measure correlation between agent feedback and eventual pipeline outcomes
199
+ - **Inputs:** Historical feedback ratings, pipeline outcomes (success/failed)
200
+ - **Outputs:** Calibration score per agent (0.0 = uncorrelated, 1.0 = perfect predictor)
201
+ - **Type:** Deterministic (statistical calculation)
202
+
203
+ ```
204
+ calibration[agent] = correlation(feedback_ratings[agent], outcome_success_binary)
205
+ ```
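The spec does not say which correlation measure is intended. One plausible reading is the Pearson correlation between ratings and a 0/1 success flag, sketched below. Everything here is an assumption for illustration: the helper names, the simplified input of `{rating, success}` pairs, clamping negative correlations to 0 so the score stays in the stated 0.0 to 1.0 range, and the minimum of 10 runs (taken from the assumption in Section 9).

```javascript
// Pearson correlation of two equal-length numeric arrays.
function pearson(xs, ys) {
  const n = xs.length;
  const mean = (values) => values.reduce((a, b) => a + b, 0) / n;
  const mx = mean(xs);
  const my = mean(ys);
  let sxy = 0, sxx = 0, syy = 0;
  for (let i = 0; i < n; i++) {
    const dx = xs[i] - mx;
    const dy = ys[i] - my;
    sxy += dx * dy;
    sxx += dx * dx;
    syy += dy * dy;
  }
  if (sxx === 0 || syy === 0) return 0; // no variance: treat as uncorrelated
  return sxy / Math.sqrt(sxx * syy);
}

// Hypothetical helper: pairs is [{ rating: number, success: boolean }, ...].
function calibrationFromPairs(pairs, minRuns = 10) {
  if (pairs.length < minRuns) return null; // not enough data to be meaningful
  const r = pearson(
    pairs.map((p) => p.rating),
    pairs.map((p) => (p.success ? 1 : 0))
  );
  return Math.max(0, r); // clamp: an inversely correlated agent scores 0
}

const runs = [5, 5, 5, 5, 5, 1, 1, 1, 1, 1].map((rating) => ({ rating, success: rating === 5 }));
console.log(calibrationFromPairs(runs));
```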
206
+
207
+ ### Rule 5: Threshold Recommendation
208
+
209
+ - **Description:** Suggest optimal threshold based on historical data
210
+ - **Inputs:** All feedback/outcome pairs, desired false positive/negative balance
211
+ - **Outputs:** Recommended minRating threshold
212
+ - **Type:** Deterministic
213
+
214
+ ---
215
+
216
+ ## 7. Dependencies
217
+
218
+ ### System Components
219
+
220
+ - **src/history.js:** Extended to store feedback in history entries
221
+ - New field: `stages[stage].feedback` containing feedback object
222
+ - Backward compatible: existing entries without feedback are valid
223
+
224
+ - **src/insights.js:** Extended with feedback analysis functions
225
+ - New function: `analyzeFeedbackCorrelation(history)`
226
+ - New CLI flag: `--feedback` for feedback-specific analysis
227
+
228
+ - **src/retry.js:** Extended with feedback-informed strategy selection
229
+ - New function: `mapIssuesToStrategies(issues, config)`
230
+ - Modified: `shouldRetry()` to consider feedback chain
231
+
232
+ - **bin/cli.js:** New command registration
233
+ - `murmur8 feedback-config` (view)
234
+ - `murmur8 feedback-config set <key> <value>`
235
+ - `murmur8 insights --feedback`
236
+
237
+ ### File Dependencies
238
+
239
+ - **`.claude/feedback-config.json`:** Configuration storage
240
+ - **`.claude/pipeline-history.json`:** Extended schema for feedback storage
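A minimal sketch of reading and writing `.claude/feedback-config.json` with defaults follows. The `readConfig()` and `writeConfig()` names and the default values (`minRatingThreshold: 3.0`, `enabled: true`) come from the implementation plan; the `dir` parameter, the fallback-on-error behaviour, and the merge with defaults are assumptions.

```javascript
const fs = require('fs');
const path = require('path');

// Defaults as listed in the implementation plan (issueMappings omitted here).
const DEFAULT_CONFIG = { minRatingThreshold: 3.0, enabled: true };

// Hypothetical sketch: `dir` is the project root containing `.claude/`.
function readConfig(dir = process.cwd()) {
  const file = path.join(dir, '.claude', 'feedback-config.json');
  try {
    return { ...DEFAULT_CONFIG, ...JSON.parse(fs.readFileSync(file, 'utf8')) };
  } catch {
    return { ...DEFAULT_CONFIG }; // missing or malformed file: use defaults
  }
}

function writeConfig(config, dir = process.cwd()) {
  const folder = path.join(dir, '.claude');
  fs.mkdirSync(folder, { recursive: true }); // create .claude/ if absent
  fs.writeFileSync(
    path.join(folder, 'feedback-config.json'),
    JSON.stringify(config, null, 2) + '\n'
  );
}
```

Merging the stored file over the defaults means a config that sets only one key still yields a complete object.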
241
+
242
+ ### Agent Specification Dependencies
243
+
244
+ - Agent prompts (in SKILL.md or agent specs) must include feedback collection instructions
245
+ - Feedback schema must be communicated to agents in their task prompts
246
+
247
+ ---
248
+
249
+ ## 8. Non-Functional Considerations
250
+
251
+ ### Performance
252
+
253
+ - Feedback collection adds one structured output per stage (minimal overhead)
254
+ - Quality gate evaluation is O(1) comparison
255
+ - Calibration calculation is O(n) over history entries
256
+
257
+ ### Resilience
258
+
259
+ - If feedback collection fails, pipeline proceeds with warning (degraded mode)
260
+ - Missing feedback in history is treated as neutral (no quality gate effect)
261
+ - Invalid feedback schema triggers warning but does not block pipeline
262
+
263
+ ### Audit/Logging
264
+
265
+ - All feedback is persisted in history for retrospective analysis
266
+ - Quality gate decisions are logged with timestamp and user action
267
+
268
+ ### Security
269
+
270
+ - Feedback file is gitignored (contains project-specific assessments)
271
+ - No sensitive data expected in feedback (ratings, codes, recommendations only)
272
+
273
+ ---
274
+
275
+ ## 9. Assumptions & Open Questions
276
+
277
+ ### Assumptions
278
+
279
+ - ASSUMPTION: Agents can reliably produce structured feedback in the specified schema
280
+ - ASSUMPTION: Feedback ratings are comparable across agents and runs
281
+ - ASSUMPTION: Issue codes will emerge from practice and can be standardised iteratively
282
+ - ASSUMPTION: Correlation analysis requires 10+ completed runs for meaningful results
283
+
284
+ ### Open Questions
285
+
286
+ - Should feedback influence agent prompts proactively (not just on retry)?
287
+ - How should conflicting feedback (high rating but "pause" recommendation) be handled?
288
+ - Should there be feedback severity levels (blocking vs advisory)?
289
+ - What feedback should Codey provide about Codey-plan (within the same agent)?
290
+
291
+ ---
292
+
293
+ ## 10. Impact on System Specification
294
+
295
+ ### Reinforces Existing Assumptions
296
+
297
+ - Per SYSTEM_SPEC.md:Section 7, agents must "flag deviations" - feedback formalises this
298
+ - Per SYSTEM_SPEC.md:Section 8, failure handling already supports pause/review - quality gates extend this
299
+
300
+ ### Stretches Existing Assumptions
301
+
302
+ - History module shifts from pure observability to operational dependency (also noted in adaptive-retry)
303
+ - Agent boundaries are subtly extended: agents now assess peer outputs, not just produce artifacts
304
+ - Pipeline flow gains conditional branches (quality gates) beyond explicit `--pause-after`
305
+
306
+ ### Potential Contradiction
307
+
308
+ The system spec states pipelines are "sequential" (Section 7). Quality gates introduce conditional pauses that may feel like interruptions. However, this is consistent with the existing `--pause-after` mechanism and does not fundamentally alter sequence.
309
+
310
+ **Flagged for consideration:** Should SYSTEM_SPEC.md:Section 6 be updated to explicitly acknowledge quality gates as a pipeline flow modifier?
311
+
312
+ ---
313
+
314
+ ## 11. Handover to BA (Cass)
315
+
316
+ ### Story Themes
317
+
318
+ 1. **Feedback Collection:** Agents provide structured feedback on upstream artifacts
319
+ 2. **Quality Gates:** Pipeline pauses when feedback indicates quality concerns
320
+ 3. **Configuration Management:** User can view and modify feedback thresholds
321
+ 4. **History Integration:** Feedback is stored in pipeline history entries
322
+ 5. **Insights Extension:** Feedback correlation analysis and calibration scoring
323
+ 6. **Retry Integration:** Feedback issues inform retry strategy selection
324
+
325
+ ### Expected Story Boundaries
326
+
327
+ - Feedback schema definition and validation as foundational story
328
+ - Quality gate logic as separate story (depends on schema)
329
+ - CLI configuration commands as separate story (parallel track)
330
+ - History integration as separate story (depends on schema)
331
+ - Insights extension as separate story (depends on history integration)
332
+ - Retry integration as separate story (depends on insights correlation)
333
+
334
+ ### Areas Needing Careful Story Framing
335
+
336
+ - Feedback collection happens *within* agent execution; must not disrupt agent focus
337
+ - Quality gate UX: user prompt must clearly explain situation and options
338
+ - Issue code taxonomy: start with small set, plan for iterative expansion
339
+ - Calibration display: statistical concepts must be presented accessibly
340
+
341
+ ---
342
+
343
+ ## 12. Change Log (Feature-Level)
344
+
345
+ | Date | Change | Reason | Raised By |
346
+ |------------|---------------------------------------|---------------------------------|-----------|
347
+ | 2026-02-24 | Initial feature specification created | Feature request for agent feedback system | Alex |
@@ -0,0 +1,71 @@
1
+ # Implementation Plan — Feedback Loop Feature
2
+
3
+ ## Summary
4
+
5
+ This feature adds a quality feedback mechanism where downstream agents (Cass, Nigel, Codey) assess upstream artifacts before proceeding. Implementation requires a new `src/feedback.js` module for schema validation, quality gate logic, configuration management, and insights analysis. The existing `src/history.js`, `src/insights.js`, and `src/retry.js` modules need extensions to store, analyze, and act on feedback data.
6
+
7
+ ---
8
+
9
+ ## Files to Create/Modify
10
+
11
+ | Path | Action | Purpose |
12
+ |------|--------|---------|
13
+ | `src/feedback.js` | Create | Core feedback logic: validation, quality gates, config management |
14
+ | `src/history.js` | Modify | Add `storeStageFeedback()` and extend entry schema |
15
+ | `src/insights.js` | Modify | Add feedback analysis: `analyzeFeedbackCorrelation()`, calibration |
16
+ | `src/retry.js` | Modify | Add `mapIssuesToStrategies()` for feedback-informed retries |
17
+ | `bin/cli.js` | Modify | Register `feedback-config` command and `insights --feedback` flag |
18
+ | `src/index.js` | Modify | Export feedback module |
19
+
20
+ ---
21
+
22
+ ## Implementation Steps
23
+
24
+ 1. **Create `src/feedback.js` with schema validation** — Implement `validateFeedback()` per FEATURE_SPEC.md:Rule 1. Schema: `{about, rating, confidence, issues, recommendation}`. Return `{valid, errors}`.
25
+
26
+ 2. **Add quality gate logic to `src/feedback.js`** — Implement `shouldPause(feedback, config)` per FEATURE_SPEC.md:Rule 2. Returns true if `rating < minRatingThreshold` OR `recommendation === "pause"`.
27
+
28
+ 3. **Add config management to `src/feedback.js`** — Implement `getDefaultConfig()`, `readConfig()`, `writeConfig()`, `setConfigValue()`. Default: `{minRatingThreshold: 3.0, enabled: true, issueMappings: {...}}`.
29
+
30
+ 4. **Extend `src/history.js`** — Add `storeStageFeedback(slug, stage, feedback)` to persist feedback at `stages[stage].feedback`. Ensure backward compatibility (missing feedback = null).
31
+
32
+ 5. **Add calibration calculation to `src/insights.js`** — Implement `calculateCalibration(agent, history)` per FEATURE_SPEC.md:Rule 4. Return null if <10 runs with feedback, else correlation score 0-1.
33
+
34
+ 6. **Add issue correlation to `src/insights.js`** — Implement `correlateIssues(history)` to map issue codes to failure rates. Return `{issueCode: failureCorrelation}`.
35
+
36
+ 7. **Add threshold recommendation to `src/insights.js`** — Implement `recommendThreshold(history)` to suggest optimal minRatingThreshold based on historical data.
37
+
38
+ 8. **Extend `src/retry.js`** — Add `mapIssuesToStrategies(issues, config)` using default mappings from FEATURE_SPEC.md:Rule 3.
39
+
40
+ 9. **Register CLI commands in `bin/cli.js`** — Add `feedback-config` (view), `feedback-config set <key> <value>`, and `--feedback` flag to `insights` command.
41
+
42
+ 10. **Wire exports in `src/index.js`** — Export feedback module for orchestrator integration.
43
+
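Steps 1 and 2 can be sketched roughly as follows. This is a minimal illustration, not the package's actual `src/feedback.js`: the function names and the `{valid, errors}` return shape come from the plan, the allowed field values are taken from AC-1 of the Feedback Collection story, and the error messages are invented for the example.

```javascript
// Allowed values per AC-1 of the Feedback Collection story.
const VALID_AGENTS = ["alex", "cass", "nigel"];
const VALID_RECOMMENDATIONS = ["proceed", "pause", "revise"];

// Checks a feedback object field by field and collects every problem found.
function validateFeedback(feedback) {
  if (feedback === null || typeof feedback !== "object" || Array.isArray(feedback)) {
    return { valid: false, errors: ["feedback must be an object"] };
  }
  const errors = [];
  if (!VALID_AGENTS.includes(feedback.about)) {
    errors.push("about must be one of: " + VALID_AGENTS.join(", "));
  }
  if (!Number.isInteger(feedback.rating) || feedback.rating < 1 || feedback.rating > 5) {
    errors.push("rating must be an integer from 1 to 5");
  }
  const c = feedback.confidence;
  if (typeof c !== "number" || Number.isNaN(c) || c < 0 || c > 1) {
    errors.push("confidence must be a number from 0.0 to 1.0");
  }
  if (!Array.isArray(feedback.issues) || !feedback.issues.every((i) => typeof i === "string")) {
    errors.push("issues must be an array of issue code strings");
  }
  if (!VALID_RECOMMENDATIONS.includes(feedback.recommendation)) {
    errors.push("recommendation must be one of: " + VALID_RECOMMENDATIONS.join(", "));
  }
  return { valid: errors.length === 0, errors };
}

// Quality gate: pause on a low rating OR an explicit "pause" recommendation.
function shouldPause(feedback, config) {
  return feedback.rating < config.minRatingThreshold || feedback.recommendation === "pause";
}
```

Because the gate uses OR, a rating of 5 with `recommendation: "pause"` still pauses the pipeline.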
44
+ ---
45
+
46
+ ## Key Functions
47
+
48
+ **src/feedback.js:**
49
+ - `validateFeedback(feedback)` — Schema validation, returns `{valid, errors}`
50
+ - `shouldPause(feedback, config)` — Quality gate evaluation
51
+ - `getDefaultConfig()` / `readConfig()` / `writeConfig(config)` — Config I/O
52
+ - `setConfigValue(key, value)` — CLI config setter with validation
53
+ - `displayConfig()` — Pretty-print current config
54
+
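One possible shape for the config defaults and the validated setter is sketched below. `applyConfigValue` is a hypothetical pure helper (the plan's `setConfigValue(key, value)` presumably also reads and writes the config file), and the accepted keys and bounds are assumptions made for illustration. The contents of `issueMappings` are elided in the plan, so the sketch leaves them empty.

```javascript
// Defaults as listed in step 3. The plan elides the contents of
// issueMappings, so it is left empty here rather than guessed.
function getDefaultConfig() {
  return { minRatingThreshold: 3.0, enabled: true, issueMappings: {} };
}

// Hypothetical helper: returns a new config with one validated value changed.
// CLI arguments arrive as strings, so values are parsed before being stored.
function applyConfigValue(config, key, value) {
  if (key === "minRatingThreshold") {
    const parsed = Number(value);
    // Assumed bounds: ratings run from 1 to 5, so thresholds outside
    // that range are rejected.
    if (!Number.isFinite(parsed) || parsed < 1 || parsed > 5) {
      throw new Error("minRatingThreshold must be a number between 1 and 5");
    }
    return { ...config, minRatingThreshold: parsed };
  }
  if (key === "enabled") {
    if (value !== "true" && value !== "false") {
      throw new Error('enabled must be "true" or "false"');
    }
    return { ...config, enabled: value === "true" };
  }
  throw new Error("Unknown config key: " + key);
}
```

Returning a new object instead of mutating the input keeps the helper easy to test without touching the filesystem.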
55
+ **src/insights.js (new functions):**
56
+ - `calculateCalibration(agent, history)` — Agent calibration score
57
+ - `correlateIssues(history)` — Issue-to-failure correlation
58
+ - `recommendThreshold(history)` — Optimal threshold suggestion
59
+ - `displayFeedbackInsights(options)` — CLI output for `--feedback`
60
+
61
+ **src/retry.js (new functions):**
62
+ - `mapIssuesToStrategies(issues, config)` — Feedback-informed strategy selection
63
+
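A rough sketch of `mapIssuesToStrategies(issues, config)` follows. The issue codes and strategy names in the example config are hypothetical placeholders, since the real default mappings live in FEATURE_SPEC.md:Rule 3 and are not reproduced here. Returning a de-duplicated array of strategy names is also an assumption about the return shape.

```javascript
// Hypothetical mappings for illustration only; the real issue codes and
// strategy names are defined in FEATURE_SPEC.md:Rule 3.
const exampleConfig = {
  issueMappings: {
    "example-issue-a": "example-strategy-x",
    "example-issue-b": "example-strategy-y",
    "example-issue-c": "example-strategy-x",
  },
};

// Maps issue codes to retry strategies, skipping unknown codes and
// keeping each strategy only once, in first-seen order.
function mapIssuesToStrategies(issues, config) {
  const mappings = (config && config.issueMappings) || {};
  const strategies = [];
  for (const code of issues) {
    // Own-property check avoids matching inherited keys such as "constructor".
    if (!Object.prototype.hasOwnProperty.call(mappings, code)) continue;
    const strategy = mappings[code];
    if (!strategies.includes(strategy)) strategies.push(strategy);
  }
  return strategies;
}
```

Unknown codes are ignored in this sketch, so an expanded taxonomy would not break older configs.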
64
+ ---
65
+
66
+ ## Risks/Questions
67
+
68
+ - **Agent prompt integration**: Feedback collection requires agent prompts (in SKILL.md) to include feedback instructions. This is orchestrator-level work outside core modules.
69
+ - **Calibration metric**: Tests use simple accuracy (predicted vs. actual). Production may need Pearson correlation as a better calibration measure.
70
+ - **Issue taxonomy**: Starting with 6 issue codes per FEATURE_SPEC.md:Rule 3. Plan for iterative expansion.
71
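+ - **Conflicting signals**: Per FEATURE_SPEC.md:Section 9, how to handle a high rating combined with a "pause" recommendation is unresolved. Recommended resolution: the recommendation takes precedence, which is consistent with the OR condition in `shouldPause`.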
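
For the calibration question above, a Pearson correlation between agent ratings and observed outcomes could be computed along these lines. The function name `pearson` and the input encoding (ratings in one array, outcomes as 1 for success and 0 for failure in the other) are assumptions. Pearson's coefficient ranges from -1 to 1, so it would need rescaling or clamping to fit the 0-1 score described in step 5.

```javascript
// Pearson correlation coefficient of two equal-length numeric arrays.
// Returns null when it is undefined: mismatched lengths, fewer than two
// samples, or zero variance in either array.
function pearson(xs, ys) {
  if (xs.length !== ys.length || xs.length < 2) return null;
  const n = xs.length;
  const meanX = xs.reduce((sum, x) => sum + x, 0) / n;
  const meanY = ys.reduce((sum, y) => sum + y, 0) / n;
  let cov = 0;
  let varX = 0;
  let varY = 0;
  for (let i = 0; i < n; i++) {
    const dx = xs[i] - meanX;
    const dy = ys[i] - meanY;
    cov += dx * dy;
    varX += dx * dx;
    varY += dy * dy;
  }
  if (varX === 0 || varY === 0) return null;
  return cov / Math.sqrt(varX * varY);
}

// Example: ratings given by an agent and whether each run later succeeded.
const ratings = [5, 4, 2, 1, 3];
const succeeded = [1, 1, 0, 0, 1];
console.log(pearson(ratings, succeeded));
```

A value near 1 would suggest that higher ratings track successful runs. An agent that always gives the same rating yields `null` here, because its ratings have no variance.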
@@ -0,0 +1,63 @@
1
+ # Story — Feedback Collection
2
+
3
+ ## User Story
4
+
5
+ As a **pipeline orchestrator**, I want **downstream agents to provide structured feedback on upstream artifacts** so that **quality issues are surfaced explicitly at each stage boundary**.
6
+
7
+ ---
8
+
9
+ ## Context / Scope
10
+
11
+ - Per FEATURE_SPEC.md:Section 4, feedback is collected at each stage boundary (Cass on Alex, Nigel on Cass, Codey on Nigel)
12
+ - Feedback uses a defined schema with rating, confidence, issues, and recommendation
13
+ - Feedback is captured before the downstream agent begins its main work
14
+ - Per SYSTEM_SPEC.md:Section 7, agents must "flag deviations" — this story operationalises that principle
15
+
16
+ ---
17
+
18
+ ## Acceptance Criteria
19
+
20
+ **AC-1 — Feedback schema structure**
21
+ - Given an agent is spawned to provide feedback,
22
+ - When the agent completes feedback output,
23
+ - Then the feedback object contains:
24
+ - `about`: agent name being assessed (alex|cass|nigel)
25
+ - `rating`: integer 1-5
26
+ - `confidence`: float 0.0-1.0
27
+ - `issues`: array of issue codes (may be empty)
28
+ - `recommendation`: one of "proceed", "pause", or "revise"
29
+
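For illustration, a feedback object that satisfies AC-1 might look like the following. The values are made up, and the issue code is a hypothetical placeholder, since the real codes are defined in FEATURE_SPEC.md.

```javascript
// Example feedback that Cass might emit about Alex's feature spec.
// "example-issue-code" is a placeholder, not a real issue code.
const exampleFeedback = {
  about: "alex",             // agent being assessed
  rating: 4,                 // integer 1-5
  confidence: 0.8,           // float 0.0-1.0
  issues: ["example-issue-code"],
  recommendation: "proceed", // "proceed", "pause", or "revise"
};

console.log(JSON.stringify(exampleFeedback, null, 2));
```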
30
+ **AC-2 — Cass provides feedback on Alex**
31
+ - Given Alex has completed a feature specification,
32
+ - When Cass is spawned for story writing,
33
+ - Then Cass first produces a feedback object with `about: "alex"` assessing the feature spec quality.
34
+
35
+ **AC-3 — Nigel provides feedback on Cass**
36
+ - Given Cass has completed user stories,
37
+ - When Nigel is spawned for test writing,
38
+ - Then Nigel first produces a feedback object with `about: "cass"` assessing story quality and testability.
39
+
40
+ **AC-4 — Codey provides feedback on Nigel**
41
+ - Given Nigel has completed test specifications,
42
+ - When Codey is spawned for planning/implementation,
43
+ - Then Codey first produces a feedback object with `about: "nigel"` assessing test coverage and implementation feasibility.
44
+
45
+ **AC-5 — Feedback validation**
46
+ - Given an agent produces a feedback object,
47
+ - When the orchestrator reads the feedback,
48
+ - Then the feedback is validated against the schema,
49
+ - And invalid feedback triggers a warning but does not block the pipeline (per FEATURE_SPEC.md:Section 8, degraded mode).
50
+
51
+ **AC-6 — Feedback persisted to history**
52
+ - Given feedback is collected from an agent,
53
+ - When the stage completes,
54
+ - Then the feedback is stored in the history entry at `stages[stage].feedback`.
55
+
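AC-6 could be satisfied by something like the sketch below. It uses an in-memory `Map` as a stand-in for the history store, because the real persistence in `src/history.js` is not shown here. `getStageFeedback` is a hypothetical reader added to illustrate treating entries without feedback as `null`, and the stage key `"cass"` is an assumed name.

```javascript
// In-memory stand-in for the history store (assumption for illustration).
const historyBySlug = new Map();

// Stores feedback at stages[stage].feedback, creating the entry and the
// stage record if they do not exist yet.
function storeStageFeedback(slug, stage, feedback) {
  const entry = historyBySlug.get(slug) || { slug, stages: {} };
  if (!entry.stages) entry.stages = {};
  const stageRecord = entry.stages[stage] || {};
  stageRecord.feedback = feedback;
  entry.stages[stage] = stageRecord;
  historyBySlug.set(slug, entry);
  return entry;
}

// Hypothetical reader: older entries written before this feature have no
// feedback field, so they read back as null.
function getStageFeedback(entry, stage) {
  const stageRecord = entry && entry.stages && entry.stages[stage];
  return (stageRecord && stageRecord.feedback) || null;
}

const saved = storeStageFeedback("my-feature", "cass", {
  about: "alex",
  rating: 4,
  confidence: 0.8,
  issues: [],
  recommendation: "proceed",
});
console.log(getStageFeedback(saved, "cass").rating);
```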
56
+ ---
57
+
58
+ ## Out of Scope
59
+
60
+ - Feedback from Alex (no prior stage to assess)
61
+ - Feedback on auto-commit stage
62
+ - Natural language feedback parsing (structured schema only)
63
+ - Automatic remediation based on feedback