orchestr8 3.2.0 → 3.3.0

@@ -40,6 +40,7 @@ Implement the feature according to the plan. Work incrementally, making tests pa
40
40
  - Match existing patterns in the codebase
41
41
  - Validate inputs defensively
42
42
  - Handle errors gracefully
43
+ - If tests pass but behaviour feels wrong or forced, consult the failure-mode rituals in `.blueprint/ways_of_working/DEVELOPMENT_RITUAL.md`
43
44
 
44
45
  ## Completion
45
46
 
@@ -1,142 +1,178 @@
1
- # Development Ritual (with CLI + Failure Modes)
1
+ # Development Ritual
2
2
 
3
3
  This document defines:
4
- - the **core ritual**
5
- - a **CLI checklist** agents must walk through
6
- - **micro-rituals for failure modes**
4
+ - The **pipeline stages** and what each agent must deliver
5
+ - **Checklists** each agent must satisfy before handoff
6
+ - **Failure-mode rituals** that override normal flow when triggered
7
+ - The **feedback and handoff** mechanisms that connect stages
7
8
 
8
- A stage is not complete until its ritual is satisfied.
9
+ A stage is not complete until its checklist is satisfied.
9
10
 
10
11
  ---
11
12
 
12
- ## 🔁 Core Ritual (Summary)
13
+ ## Pipeline Stages
13
14
 
14
- 1️⃣ Story → Tester
15
- 2️⃣ Tester → Developer
16
- 3️⃣ Developer → QA
15
+ ```
16
+ Alex (feature spec) → Cass (user stories) → Nigel (tests) → Codey (plan → implement) → Auto-commit → Human QA
17
+ ```
17
18
 
18
- Tests define behaviour. QA validates intent.
19
+ Each agent reads the previous agent's outputs and produces artifacts for the next. Context is passed via **handoff summaries** (max 30 lines) to keep token usage efficient. The pipeline uses a **feedback chain** where each agent rates the previous agent's work before starting their own.
20
+
21
+ Tests define behaviour. The human validates intent after auto-commit.
19
22
 
20
23
  ---
21
24
 
22
- ## 🖥️ CLI Agent Ritual Checklist
25
+ ## Handoff Mechanism
23
26
 
24
- Agents should **print this checklist to the CLI** at the start of their work and explicitly tick items as they complete them.
27
+ Between stages, the pipeline creates a handoff summary (`handoff-{agent}.md`, max 30 lines) that passes key context to the next agent. This keeps each agent focused without re-reading everything from scratch.
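Reviewer note: the 30-line cap on handoff summaries is simple to check mechanically. The sketch below is a hypothetical illustration and is not taken from the orchestr8 source. The names `MAX_HANDOFF_LINES`, `countLines`, and `isWithinHandoffLimit` are assumptions, and how the package actually enforces the limit is not shown in this diff.

```javascript
// Hypothetical helper, not part of orchestr8: checks a handoff summary
// against the 30-line cap described in DEVELOPMENT_RITUAL.md.
const MAX_HANDOFF_LINES = 30;

function countLines(text) {
  if (text === "") return 0;
  const lines = text.split(/\r?\n/);
  // A trailing newline produces one empty final element; do not count it.
  if (lines[lines.length - 1] === "") lines.pop();
  return lines.length;
}

function isWithinHandoffLimit(text, maxLines = MAX_HANDOFF_LINES) {
  return countLines(text) <= maxLines;
}
```

A caller could run this on the contents of a `handoff-{agent}.md` file before passing it to the next stage and ask the producing agent to shorten the summary when the check fails.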
25
28
 
26
- ### Example CLI pattern
29
+ Each agent also provides **feedback** on the previous agent's output:
30
+ - **Rating** (1-5) on quality
31
+ - **Issues** list (if any)
32
+ - **Recommendation**: `proceed`, `pause`, or `revise`
33
+
34
+ If the rating falls below the configured threshold (default: 3.0), the pipeline pauses for human review. See `feedback-config` for threshold settings.
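Reviewer note: the pause rule above amounts to a single comparison. The following is a minimal sketch under stated assumptions: `shouldPauseForReview` and the shape of the `feedback` object are hypothetical, and only the 1-5 rating range and the 3.0 default come from the document. How the `proceed`, `pause`, and `revise` recommendations interact with the threshold is not described in this diff, so the sketch ignores them.

```javascript
// Hypothetical sketch, not the orchestr8 implementation.
const DEFAULT_THRESHOLD = 3.0; // default stated in the ritual document

function shouldPauseForReview(feedback, threshold = DEFAULT_THRESHOLD) {
  const { rating } = feedback;
  // Ratings are described as 1-5; reject anything outside that range.
  if (typeof rating !== "number" || rating < 1 || rating > 5) {
    throw new RangeError("rating must be a number between 1 and 5");
  }
  // "Falls below" is read as strictly less than the threshold.
  return rating < threshold;
}
```

Under this reading, a rating exactly equal to the threshold does not pause the pipeline. That is an interpretation of "falls below", not a confirmed behaviour.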
35
+
36
+ ---
37
+
38
+ ## Agent Checklists
39
+
40
+ ### Alex (System Specification)
41
+
42
+ Before writing the feature spec:
43
+ - [ ] Read the system specification
44
+ - [ ] Read relevant business context (`.business_context/`)
45
+ - [ ] Read the feature template
46
+
47
+ Before handoff:
48
+ - [ ] Feature spec written to `FEATURE_SPEC.md`
49
+ - [ ] Intent, scope, actors, rules, and dependencies covered
50
+ - [ ] Ambiguities flagged explicitly
51
+ - [ ] Assumptions labelled as such
52
+ - [ ] Spec aligns with system boundaries
53
+
54
+ ### Cass (Story Writer)
55
+
56
+ Before writing stories:
57
+ - [ ] Read the feature spec
58
+ - [ ] Read the system specification for context
59
+ - [ ] Identified primary behaviour, entry/exit conditions, branching logic
60
+
61
+ Before handoff:
62
+ - [ ] Each story file (`story-{slug}.md`) has a single clear goal
63
+ - [ ] Acceptance criteria are in Given/When/Then, max 5-7 per story
64
+ - [ ] Routing is explicit (no "goes to next screen")
65
+ - [ ] Out-of-scope items listed
66
+ - [ ] Assumptions flagged
67
+
68
+ ### Nigel (Tester)
27
69
 
28
- ```text
29
- [ ] Read story and acceptance criteria
30
- [ ] Read tester understanding & test plan
31
- [ ] Ran baseline tests
32
- [ ] Implemented behaviour
33
- [ ] Tests passing
34
- [ ] Lint passing
35
- [ ] Summary written
36
- ```
37
- ### Tester CLI Ritual (Nigel)
38
70
  Before writing tests:
39
- [ ] Story has a single clear goal
40
- [ ] Acceptance criteria are testable
41
- [ ] Ambiguities identified
42
- [ ] Assumptions written down
43
-
44
- Before handover to the human to pass to Claude:
45
- [ ] Understanding summary written
46
- [ ] Test plan created
47
- [ ] Happy path tests written
48
- [ ] Edge/error tests written
49
- [ ] Tests runnable via npm test
50
- [ ] Traceability table complete
51
- [ ] Open questions listed
52
-
53
- If any box is unchecked → raise it with the human that its not ready to hand over. If all boxes are checked, let the human know that its ready to handover to Claude.
54
-
55
- 🧑‍💻 Developer CLI Ritual (Claude)
71
+ - [ ] Read all story files and the feature spec
72
+ - [ ] Acceptance criteria are testable
73
+ - [ ] Ambiguities identified
74
+ - [ ] Assumptions written down
75
+
76
+ Before handoff:
77
+ - [ ] `test-spec.md` written (understanding, AC-to-test mapping, assumptions)
78
+ - [ ] Executable test file written
79
+ - [ ] Happy path tests written
80
+ - [ ] Edge case and error tests written
81
+ - [ ] Tests runnable via the project's configured test command (see `.claude/stack-config.json`)
82
+ - [ ] Traceability table complete (every AC mapped to test IDs)
83
+ - [ ] Open questions listed
84
+
85
+ If any box is unchecked, raise it before handoff.
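Reviewer note: the new traceability requirement (every AC mapped to test IDs) can be checked with a small lookup. The sketch below is hypothetical. The function name `findUnmappedCriteria`, the ID formats, and the object shape of the traceability table are assumptions for illustration, since the diff does not show how `test-spec.md` stores the mapping.

```javascript
// Hypothetical sketch: reports acceptance criteria with no mapped tests.
// `traceability` is assumed to map an AC id to an array of test ids.
function findUnmappedCriteria(criteriaIds, traceability) {
  return criteriaIds.filter((id) => {
    const tests = traceability[id];
    // Missing entries and empty arrays both count as unmapped.
    return !Array.isArray(tests) || tests.length === 0;
  });
}

// Example with made-up ids: AC-2 has an empty list and AC-3 has no entry.
const unmapped = findUnmappedCriteria(["AC-1", "AC-2", "AC-3"], {
  "AC-1": ["T-1", "T-2"],
  "AC-2": [],
});
console.log(unmapped);
```

An empty result would correspond to the "Traceability table complete" box being safe to tick.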
86
+
87
+ ### Codey (Developer) — Planning
88
+
89
+ Before writing the plan:
90
+ - [ ] Read feature spec, stories, test-spec, and executable tests
91
+ - [ ] Built mental model of happy path, edge cases, error flows
92
+ - [ ] Identified what already exists vs what is new
93
+
94
+ Before handoff:
95
+ - [ ] `IMPLEMENTATION_PLAN.md` written (summary, files table, steps, risks)
96
+ - [ ] Steps ordered to make tests pass incrementally
97
+ - [ ] No implementation code written yet
98
+
99
+ ### Codey (Developer) — Implementation
100
+
56
101
  Before coding:
57
- [ ] Read story + ACs
58
- [ ] Read tester understanding
59
- [ ] Read executable tests
60
- [ ] Ran baseline tests (expected failures only)
102
+ - [ ] Read implementation plan and tests
103
+ - [ ] Ran baseline tests (note expected failures)
61
104
 
62
105
  During coding:
63
- [ ] Implemented behaviour incrementally
64
- [ ] Ran relevant tests after each change
65
- [ ] Did not weaken or delete tests
66
-
67
- Before handover to the human:
68
- [ ] All tests passing
69
- [ ] Lint passing
70
- [ ] No unexplained skip/todo
71
- [ ] Changes summarised
72
- [ ] Assumptions restated
73
-
74
- If tests pass but confidence is low → trigger a failure-mode ritual.
75
-
76
- 🚨 Failure-Mode Micro-Rituals
77
- These rituals override normal flow. When triggered, stop and follow them explicitly.
78
- ❓ Tests pass, but behaviour feels wrong
79
- Trigger when:
80
- - UX feels off
81
- - behaviour technically matches tests but not intent
82
- - something feels “too easy”
83
- Ritual:
84
- [ ] Re-read original user story
85
- [ ] Re-state intended behaviour in plain English
86
- [ ] Identify mismatch: story vs tests vs implementation
87
- [ ] Decide:
88
- - tests are wrong
89
- - story is underspecified
90
- - implementation misinterpreted behaviour
91
- Outcome:
92
- Update tests (Tester)
93
- Clarify ACs (Story owner)
94
- Fix implementation (Developer)
95
- Never “let it slide”.
96
-
97
- 🧪 Tests are unclear or contradictory
98
- Trigger when:
99
- - assertions conflict
100
- - test names don’t match expectations
101
- - passing tests don’t explain behaviour
102
- Ritual:
103
- [ ] Identify specific confusing test(s)
104
- [ ] State what behaviour they appear to encode
105
- [ ] Compare to acceptance criteria
106
- [ ] Propose corrected test behaviour
107
- Outcome:
108
- - Tester revises tests
109
- - Developer does not guess
110
-
111
- 🔁 Tests are failing for non-behaviour reasons
112
- Trigger when:
113
- - environment/setup issues
114
- - brittle timing
115
- - global state leakage
116
- Ritual:
117
- [ ] Confirm failure is not missing behaviour
118
- [ ] Isolate failing test
119
- [ ] Remove flakiness or hidden coupling
120
- [ ] Re-run full suite
121
- Outcome:
122
- - Stabilise tests before continuing feature work
123
-
124
- ⚠️ Developer changed behaviour to make tests pass
125
- Trigger when:
126
- - implementation feels forced
127
- - logic seems unnatural or overly complex
128
- Ritual:
129
- [ ] Pause implementation
130
- [ ] Identify which test is driving awkward behaviour
131
- [ ] Re-check acceptance criteria
132
- [ ] Raise concern to Tester / QA
133
- Outcome:
134
- - Adjust tests or clarify intent
135
- - Prefer simpler behaviour aligned to story
136
-
137
- 🧭 Meta-Rules (Always On)
138
- ❗ Tests are the behavioural contract
139
- ❗ Green builds are necessary, not sufficient
140
- ❗ Assumptions must be written down
141
- ❗ No silent changes
142
- ❗ When in doubt, slow down and ask the human
106
+ - [ ] Implemented behaviour incrementally (one file at a time)
107
+ - [ ] Ran tests after each file change
108
+ - [ ] Did not weaken or delete Nigel's tests
109
+
110
+ Before handoff:
111
+ - [ ] All tests passing
112
+ - [ ] Lint passing
113
+ - [ ] No unexplained `skip` or `todo`
114
+ - [ ] Changes summarised (files changed, test status, blockers)
115
+ - [ ] Assumptions restated
116
+
117
+ If tests pass but confidence is low, trigger a failure-mode ritual (see below).
118
+
119
+ ---
120
+
121
+ ## Failure-Mode Rituals
122
+
123
+ These override normal flow. When triggered, stop and follow the steps explicitly.
124
+
125
+ ### Tests pass, but behaviour feels wrong
126
+
127
+ **Trigger:** Behaviour technically matches tests but not intent, or something feels "too easy."
128
+
129
+ - [ ] Re-read the original user story
130
+ - [ ] Re-state intended behaviour in plain English
131
+ - [ ] Identify mismatch: story vs tests vs implementation
132
+ - [ ] Decide: tests are wrong, story is underspecified, or implementation misinterpreted behaviour
133
+
134
+ **Outcome:** Update tests (Nigel), clarify ACs (Cass), or fix implementation (Codey). Never "let it slide."
135
+
136
+ ### Tests are unclear or contradictory
137
+
138
+ **Trigger:** Assertions conflict, test names don't match expectations, or passing tests don't explain behaviour.
139
+
140
+ - [ ] Identify the specific confusing test(s)
141
+ - [ ] State what behaviour they appear to encode
142
+ - [ ] Compare to acceptance criteria
143
+ - [ ] Propose corrected test behaviour
144
+
145
+ **Outcome:** Nigel revises tests. Codey does not guess.
146
+
147
+ ### Tests fail for non-behaviour reasons
148
+
149
+ **Trigger:** Environment/setup issues, brittle timing, or global state leakage.
150
+
151
+ - [ ] Confirm failure is not missing behaviour
152
+ - [ ] Isolate failing test
153
+ - [ ] Remove flakiness or hidden coupling
154
+ - [ ] Re-run full suite
155
+
156
+ **Outcome:** Stabilise tests before continuing feature work.
157
+
158
+ ### Implementation feels forced
159
+
160
+ **Trigger:** Logic seems unnatural or overly complex to make tests pass.
161
+
162
+ - [ ] Pause implementation
163
+ - [ ] Identify which test is driving the awkward behaviour
164
+ - [ ] Re-check acceptance criteria
165
+ - [ ] Raise concern to the human
166
+
167
+ **Outcome:** Adjust tests or clarify intent. Prefer simpler behaviour aligned to the story.
168
+
169
+ ---
170
+
171
+ ## Meta-Rules (Always On)
172
+
173
+ - Tests are the behavioural contract
174
+ - Green builds are necessary, not sufficient
175
+ - No silent changes — all assumptions written down
176
+ - When in doubt, slow down and ask the human
177
+
178
+ See `GUARDRAILS.md` for the full shared constraints (source restrictions, escalation protocol, anti-patterns).
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "orchestr8",
3
- "version": "3.2.0",
3
+ "version": "3.3.0",
4
4
  "description": "Multi-agent workflow framework for automated feature development",
5
5
  "main": "src/index.js",
6
6
  "bin": {