orchestr8 2.6.0 → 2.7.0

Files changed (46)
  1. package/.blueprint/agents/AGENT_BA_CASS.md +2 -112
  2. package/.blueprint/agents/AGENT_DEVELOPER_CODEY.md +1 -40
  3. package/.blueprint/agents/AGENT_SPECIFICATION_ALEX.md +1 -40
  4. package/.blueprint/agents/AGENT_TESTER_NIGEL.md +3 -51
  5. package/.blueprint/agents/GUARDRAILS.md +42 -0
  6. package/.blueprint/features/feature_compressed-feedback/FEATURE_SPEC.md +136 -0
  7. package/.blueprint/features/feature_compressed-feedback/IMPLEMENTATION_PLAN.md +40 -0
  8. package/.blueprint/features/feature_lazy-business-context/FEATURE_SPEC.md +140 -0
  9. package/.blueprint/features/feature_lazy-business-context/IMPLEMENTATION_PLAN.md +54 -0
  10. package/.blueprint/features/feature_model-native-features/FEATURE_SPEC.md +174 -0
  11. package/.blueprint/features/feature_model-native-features/IMPLEMENTATION_PLAN.md +45 -0
  12. package/.blueprint/features/feature_shared-guardrails/FEATURE_SPEC.md +119 -0
  13. package/.blueprint/features/feature_shared-guardrails/IMPLEMENTATION_PLAN.md +34 -0
  14. package/.blueprint/features/feature_shared-guardrails/story-extract-guardrails.md +60 -0
  15. package/.blueprint/features/feature_shared-guardrails/story-update-init-commands.md +63 -0
  16. package/.blueprint/features/feature_slim-agent-prompts/FEATURE_SPEC.md +145 -0
  17. package/.blueprint/features/feature_slim-agent-prompts/IMPLEMENTATION_PLAN.md +87 -0
  18. package/.blueprint/features/feature_slim-agent-prompts/story-create-runtime-prompt-template.md +59 -0
  19. package/.blueprint/features/feature_slim-agent-prompts/story-create-slim-agent-prompts.md +65 -0
  20. package/.blueprint/features/feature_slim-agent-prompts/story-skill-integration.md +53 -0
  21. package/.blueprint/features/feature_smart-story-routing/FEATURE_SPEC.md +147 -0
  22. package/.blueprint/features/feature_smart-story-routing/IMPLEMENTATION_PLAN.md +73 -0
  23. package/.blueprint/features/feature_template-extraction/FEATURE_SPEC.md +134 -0
  24. package/.blueprint/features/feature_template-extraction/IMPLEMENTATION_PLAN.md +46 -0
  25. package/.blueprint/features/feature_upstream-summaries/FEATURE_SPEC.md +150 -0
  26. package/.blueprint/features/feature_upstream-summaries/IMPLEMENTATION_PLAN.md +70 -0
  27. package/.blueprint/prompts/TEMPLATE.md +65 -0
  28. package/.blueprint/prompts/alex-runtime.md +48 -0
  29. package/.blueprint/prompts/cass-runtime.md +45 -0
  30. package/.blueprint/prompts/codey-implement-runtime.md +50 -0
  31. package/.blueprint/prompts/codey-plan-runtime.md +46 -0
  32. package/.blueprint/prompts/nigel-runtime.md +46 -0
  33. package/.blueprint/templates/STORY_TEMPLATE.md +96 -0
  34. package/.blueprint/templates/TEST_TEMPLATE.md +76 -0
  35. package/README.md +94 -18
  36. package/SKILL.md +180 -80
  37. package/package.json +2 -2
  38. package/src/business-context.js +91 -0
  39. package/src/classifier.js +173 -0
  40. package/src/feedback.js +47 -17
  41. package/src/handoff.js +148 -0
  42. package/src/index.js +51 -1
  43. package/src/tools/index.js +27 -0
  44. package/src/tools/prompts.js +45 -0
  45. package/src/tools/schemas.js +38 -0
  46. package/src/tools/validation.js +83 -0
package/.blueprint/prompts/nigel-runtime.md ADDED
@@ -0,0 +1,46 @@
1
+ You are Nigel, the Tester Agent.
2
+
3
+ ## Task
4
+
5
+ Create tests from user stories and acceptance criteria. Tests must expose ambiguities and edge cases early, providing a stable contract for the Developer to code against.
6
+
7
+ ## Inputs (read these files)
8
+
9
+ - Stories: {FEAT_DIR}/story-*.md
10
+ - Feature Spec: {FEAT_DIR}/FEATURE_SPEC.md
11
+
12
+ ## Outputs (write these files IN ORDER)
13
+
14
+ Step 1: Write {TEST_DIR}/test-spec.md containing:
15
+ - Brief understanding (5-10 lines)
16
+ - AC to Test ID mapping table (compact)
17
+ - Key assumptions (bullet list)
18
+
19
+ Step 2: Write {TEST_FILE} containing:
20
+ - Executable tests (Jest or Node test runner)
21
+ - One describe block per story
22
+ - One test per acceptance criterion
23
+
24
+ ## Rules
25
+
26
+ - Write test-spec.md FIRST, then write test file
27
+ - Keep test-spec.md under 100 lines using table format
28
+ - Tests should be self-documenting with minimal comments
29
+ - Reference story files by path in test descriptions
30
+ - Make failure states meaningful with expected error messages
31
+ - Do not over-prescribe implementation details
32
+ - Focus on externally observable behaviour
33
+
34
+ ## Test Design Principles
35
+
36
+ - Clarity over cleverness
37
+ - Deterministic tests (avoid flaky patterns)
38
+ - Cover boundaries: min/max, empty/null, invalid formats
39
+
40
+ ## Completion
41
+
42
+ Brief summary: test count, AC coverage %, assumptions (5 bullets max).
43
+
44
+ ## Reference
45
+
46
+ For detailed guidance, see: .blueprint/agents/AGENT_TESTER_NIGEL.md
package/.blueprint/templates/STORY_TEMPLATE.md ADDED
@@ -0,0 +1,96 @@
1
+ # User Story Template
2
+
3
+ Use this template when writing user stories and acceptance criteria.
4
+
5
+ ---
6
+
7
+ ## Screen [N] — [Title]
8
+
9
+ ### User story
10
+ As a [role], I want [capability] so that [benefit].
11
+
12
+ ---
13
+
14
+ ### Context / scope
15
+ - Professional user (Solicitor)
16
+ - England standard possession claim
17
+ - Screen is reached when: [entry condition]
18
+ - Route:
19
+ - `GET /claims/[route-name]`
20
+ - `POST /claims/[route-name]`
21
+ - This screen captures: [what data]
22
+
23
+ ---
24
+
25
+ ### Acceptance criteria
26
+
27
+ Write ACs in **Given/When/Then** format (precondition, action, result):
28
+
29
+ **AC-1 — [Short description]**
30
+ - Given [precondition],
31
+ - When [action],
32
+ - Then [expected result].
33
+
34
+ **AC-2 — [Short description]**
35
+ - Given [precondition],
36
+ - When [action],
37
+ - Then [expected result].
38
+
39
+ <!-- Continue with AC-3, AC-4, etc. -->
40
+
41
+ **AC-N — Previous navigation**
42
+ - Given I click Previous,
43
+ - Then I am returned to [previous route]
44
+ - And any entered data is preserved in session.
45
+
46
+ **AC-N+1 — Continue navigation**
47
+ - Given I click Continue and validation passes,
48
+ - Then I am redirected to [next route].
49
+
50
+ **AC-N+2 — Cancel behaviour**
51
+ - Given I click Cancel,
52
+ - Then I am returned to /case-list
53
+ - And the claim draft remains stored in session.
54
+
55
+ **AC-N+3 — Accessibility compliance**
56
+ - Given validation errors occur,
57
+ - Then:
58
+ - a GOV.UK error summary is displayed at the top of the page,
59
+ - errors link to the relevant field,
60
+ - focus moves to the error summary,
61
+ - and all inputs are properly labelled and keyboard accessible.
62
+
63
+ ---
64
+
65
+ ### Session persistence
66
+
+ ```js
+ session.claim.fieldName = {
+   property: 'value' // string, or null
+ }
+ ```
72
+
73
+ ---
74
+
75
+ ### Out of scope
76
+ - [Item 1]
77
+ - [Item 2]
78
+
79
+ ---
80
+
81
+ ## Guidelines for writing user stories
82
+
83
+ ### Every AC must be:
84
+ - Deterministic
85
+ - Observable via the UI or session
86
+ - Unambiguous
87
+
88
+ ### Routing must be explicit for:
89
+ - Previous link
90
+ - Continue button
91
+ - Cancel link
92
+ - Any conditional paths
93
+
94
+ ### Keep stories focused:
95
+ - Maximum 5-7 ACs per story
96
+ - If more needed, split into multiple story files
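The guidelines above (a cap of 7 ACs per story, explicit routing for Previous, Continue and Cancel) lend themselves to a mechanical check. The following is a minimal sketch only: `checkStory` is a hypothetical helper, not something this diff shows orchestr8 shipping, and it assumes AC headings follow the bold `AC-N` pattern used in the template.

```javascript
// Hypothetical lint helper for a story file; not part of the package diff.
function checkStory(markdown, maxAcs = 7) {
  const problems = [];
  // AC headings in the template are bold lines starting with "AC-<number>".
  const headings = markdown.match(/^\*\*AC-\d+\b.*\*\*$/gm) ?? [];
  if (headings.length === 0) problems.push('no acceptance criteria found');
  if (headings.length > maxAcs) {
    problems.push(`${headings.length} ACs found; split the story (max ${maxAcs})`);
  }
  // Routing must be explicit for Previous, Continue and Cancel.
  for (const control of ['Previous', 'Continue', 'Cancel']) {
    if (!markdown.includes(`I click ${control}`)) {
      problems.push(`no routing AC for ${control}`);
    }
  }
  return problems;
}

// \u2014 is the dash used in the template's AC headings.
const story = [
  '**AC-1 \u2014 Previous navigation**',
  '- Given I click Previous,',
  '- Then I am returned to /start',
  '**AC-2 \u2014 Continue navigation**',
  '- Given I click Continue and validation passes,',
  '- Then I am redirected to /next.',
].join('\n');

console.log(checkStory(story)); // [ 'no routing AC for Cancel' ]
```

The string match on "I click ..." is deliberately naive; a real check would need to agree on how conditional paths are worded.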
package/.blueprint/templates/TEST_TEMPLATE.md ADDED
@@ -0,0 +1,76 @@
1
+ # Test Template
2
+
3
+ Use this template when writing test specifications and executable tests.
4
+
5
+ ---
6
+
7
+ ## Outputs you must produce
8
+
9
+ ### 1. test-spec.md (write FIRST, keep under 100 lines)
10
+ - Brief understanding (5-10 lines max)
11
+ - AC to Test ID mapping table (compact format)
12
+ - Key assumptions (bullet list)
13
+
14
+ ### 2. Executable test file (write SECOND)
15
+ - One `describe` block per user story
16
+ - One `it` block per acceptance criterion
17
+ - Self-documenting test names with minimal comments
18
+
19
+ ---
20
+
21
+ ## AC to Test ID Mapping Table Format
22
+
23
+ | AC | Test ID | Scenario |
24
+ |----|---------|----------|
25
+ | AC-1 | T-1.1 | Valid credentials leads to success |
26
+ | AC-1 | T-1.2 | Invalid password leads to error |
27
+ | AC-2 | T-2.1 | Missing field shows validation |
28
+
29
+ ---
30
+
31
+ ## Traceability Table Format
32
+
33
+ | Acceptance Criterion | Test IDs | Notes |
34
+ |---------------------|----------|-------|
35
+ | AC-1 | T-1.1, T-1.2 | Happy path covered |
36
+ | AC-2 | T-2.1 | Edge case pending |
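The traceability table is the AC-to-Test-ID mapping grouped by criterion. A minimal sketch of that grouping, assuming the mapping rows are available as plain objects (`buildTraceability` and the row shape are hypothetical, chosen for illustration):

```javascript
// Rows copied from the AC to Test ID mapping table above.
const mappingRows = [
  { ac: 'AC-1', testId: 'T-1.1', scenario: 'Valid credentials leads to success' },
  { ac: 'AC-1', testId: 'T-1.2', scenario: 'Invalid password leads to error' },
  { ac: 'AC-2', testId: 'T-2.1', scenario: 'Missing field shows validation' },
];

// Hypothetical helper: collects test IDs per acceptance criterion.
function buildTraceability(rows) {
  const byAc = new Map();
  for (const { ac, testId } of rows) {
    if (!byAc.has(ac)) byAc.set(ac, []);
    byAc.get(ac).push(testId);
  }
  // Render one markdown table row per acceptance criterion.
  return [...byAc].map(([ac, ids]) => `| ${ac} | ${ids.join(', ')} |`);
}

console.log(buildTraceability(mappingRows).join('\n'));
// | AC-1 | T-1.1, T-1.2 |
// | AC-2 | T-2.1 |
```

The Notes column is left out because it carries human judgement that cannot be derived from the mapping.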
37
+
38
+ ---
39
+
40
+ ## Test Design Principles
41
+
42
+ - **Clarity over cleverness**: Prioritise readability with explicit steps
43
+ - **Determinism**: Avoid flaky patterns and random inputs
44
+ - **Coverage with intent**: Focus on behavioural coverage, not test count
45
+ - **Boundaries and edge cases**: Consider min/max, empty/null, invalid formats
46
+
47
+ ---
48
+
49
+ ## Test Structure Example
50
+
+ ```javascript
+ describe('Feature: [Feature Name]', () => {
+   describe('[User Story Reference]', () => {
+     it('T-1.1: [behaviour description]', async () => {
+       // Given [precondition]
+       // When [action]
+       // Then [expected result]
+     });
+
+     it('T-1.2: [another behaviour]', () => {
+       // Test implementation
+     });
+   });
+ });
+ ```
66
+
67
+ ---
68
+
69
+ ## Guidelines
70
+
71
+ - Make failure states meaningful with expected error messages
72
+ - Avoid over-prescribing implementation details
73
+ - Focus on externally observable behaviour
74
+ - Keep tests small and isolated with one main assertion per test
75
+ - Clean up async tasks and resources at test end
76
+ - Use `it.skip` or `test.todo` for pending/blocked tests
package/README.md CHANGED
@@ -19,6 +19,18 @@ npx orchestr8 init
19
19
 
20
20
  This installs the `.blueprint/` directory, `.business_context/`, and the `/implement-feature` skill to `.claude/commands/`. If files already exist, you'll be prompted before overwriting. It also adds the workflow queue to `.gitignore`.
21
21
 
22
+ ## Keeping Up to Date
23
+
24
+ **Modules** (history, insights, feedback, retry, validate) are part of the npm package and update automatically when you use `npx` - no action needed.
25
+
26
+ **Project files** (agent specs, templates, skill definition) are copied to your project and need explicit updating:
27
+
28
+ ```bash
29
+ npx orchestr8 update
30
+ ```
31
+
32
+ This updates `.blueprint/agents/`, `.blueprint/templates/`, `.blueprint/ways_of_working/`, and `.claude/commands/implement-feature.md` while preserving your content in `features/` and `system_specification/`.
33
+
22
34
  ## Commands
23
35
 
24
36
  ### Core Commands
@@ -65,11 +77,34 @@ Run the pipeline with the `/implement-feature` skill in Claude Code:
65
77
  /implement-feature "user-auth" --no-history # Skip history recording
66
78
  /implement-feature "user-auth" --no-commit # Skip auto-commit
67
79
  /implement-feature "user-auth" --pause-after=alex|cass|nigel|codey-plan
80
+ /implement-feature "user-auth" --with-stories # Force include Cass stage
81
+ /implement-feature "user-auth" --skip-stories # Force skip Cass stage
82
+ ```
83
+
84
+ ## Smart Story Routing (v2.7)
85
+
86
+ The pipeline automatically classifies features as **technical** or **user-facing** and routes accordingly:
87
+
88
+ | Feature Type | Cass Stage | Example Features |
89
+ |--------------|------------|------------------|
90
+ | **Technical** | Skipped | refactoring, optimization, infrastructure, caching |
91
+ | **User-facing** | Included | login flows, dashboards, forms, notifications |
92
+
93
+ This saves ~25-40k tokens per technical feature while preserving story quality for user-facing features.
94
+
95
+ ```bash
96
+ # Auto-detection (default)
97
+ /implement-feature "token-optimization" # Detected as technical → skips Cass
98
+ /implement-feature "user-dashboard" # Detected as user-facing → includes Cass
99
+
100
+ # Manual override
101
+ /implement-feature "edge-case" --with-stories # Force include Cass
102
+ /implement-feature "edge-case" --skip-stories # Force skip Cass
68
103
  ```
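The routing decision described above can be sketched as a classifier plus an override check. This is a hypothetical illustration: the keyword list is taken from the example features in the table, the function names are invented, and the signals used by the real `src/classifier.js` are not shown in this diff. It also assumes that a feature with no technical signal defaults to user-facing.

```javascript
// Hypothetical keyword hints, borrowed from the "Example Features" column.
const TECHNICAL_HINTS = ['refactor', 'optimization', 'infrastructure', 'caching'];

// Classify by feature slug; anything without a technical hint is user-facing.
function classifyFeature(slug) {
  const name = slug.toLowerCase();
  return TECHNICAL_HINTS.some((hint) => name.includes(hint)) ? 'technical' : 'user-facing';
}

// Manual flags win over auto-detection, mirroring --with-stories / --skip-stories.
function shouldRunCass(slug, flags = {}) {
  if (flags.withStories) return true;
  if (flags.skipStories) return false;
  return classifyFeature(slug) === 'user-facing';
}

console.log(shouldRunCass('token-optimization')); // false
console.log(shouldRunCass('user-dashboard')); // true
console.log(shouldRunCass('edge-case', { skipStories: true })); // false
```

Defaulting to user-facing errs on the side of keeping stories, at the cost of the token savings when a technical feature goes undetected.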
69
104
 
70
105
  ## Pipeline Flow
71
106
 
72
- The pipeline now includes validation, feedback loops, and history tracking:
107
+ The pipeline includes validation, smart routing, feedback loops, and history tracking:
73
108
 
74
109
  ```
75
110
  ┌─────────────────────────────────────────────────────────────────┐
@@ -85,19 +120,30 @@ The pipeline now includes validation, feedback loops, and history tracking:
85
120
 
86
121
 
87
122
  ┌─────────────────────────────────────────────────────────────────┐
88
- │ Alex (Feature Spec)
89
- │ │ │
90
- ▼ │
91
- │ Cass rates Alex → Quality Gate (pause if rating < 3) │
92
- │ │ │
93
-
94
- Cass (User Stories)
95
-
96
-
97
- │ Nigel rates Cass → Quality Gate │
98
- │ │
99
- │ ▼ │
100
- Nigel (Tests)
123
+ │ Alex (Feature Spec) + Handoff Summary
124
+ └─────────────────────────────────────────────────────────────────┘
125
+
126
+
127
+ ┌─────────────────────────────────────────────────────────────────┐
128
+ Smart Routing (v2.7)
129
+ Classify feature as technical or user-facing
130
+ • Technical → skip Cass (saves ~25-40k tokens)
131
+ • User-facing → include Cass
132
+ └─────────────────────────────────────────────────────────────────┘
133
+
134
+ ┌───────────────┴───────────────┐
135
+
136
+ ▼ ▼
137
+ ┌──────────────────────┐ ┌──────────────────────┐
138
+ │ Technical Features │ │ User-Facing Features│
139
+ │ Skip to Nigel │ │ Cass (User Stories) │
140
+ └──────────────────────┘ └──────────────────────┘
141
+ │ │
142
+ └───────────────┬───────────────┘
143
+
144
+
145
+ ┌─────────────────────────────────────────────────────────────────┐
146
+ │ Nigel (Tests) + Handoff Summary │
101
147
  │ │ │
102
148
  │ ▼ │
103
149
  │ Codey rates Nigel → Quality Gate │
@@ -133,6 +179,10 @@ orchestr8 includes these built-in modules for observability and self-improvement
133
179
  | **insights** | Analyzes patterns, detects bottlenecks, recommends improvements |
134
180
  | **retry** | Smart retry strategies based on failure history |
135
181
  | **feedback** | Agent-to-agent quality assessment with correlation tracking |
182
+ | **classifier** | Smart routing — classifies features as technical or user-facing |
183
+ | **handoff** | Structured summaries between agents for token efficiency |
184
+ | **business-context** | Lazy loading of business context based on feature needs |
185
+ | **tools** | Tool schemas and validation for Claude native features |
136
186
 
137
187
  ### How They Work Together
138
188
 
@@ -163,14 +213,24 @@ analyzes: recommends: calibrates:
163
213
  ```
164
214
  your-project/
165
215
  ├── .blueprint/
166
- │ ├── agents/ # Agent specifications (with guardrails)
216
+ │ ├── agents/ # Agent specifications
167
217
  │ │ ├── AGENT_SPECIFICATION_ALEX.md
168
218
  │ │ ├── AGENT_BA_CASS.md
169
219
  │ │ ├── AGENT_TESTER_NIGEL.md
170
- │ │ └── AGENT_DEVELOPER_CODEY.md
171
- ├── templates/ # Spec templates
220
+ │ │ ├── AGENT_DEVELOPER_CODEY.md
221
+ │ └── GUARDRAILS.md # Shared guardrails (v2.7)
222
+ │ ├── prompts/ # Slim runtime prompts (v2.7)
223
+ │ │ ├── TEMPLATE.md
224
+ │ │ ├── alex-runtime.md
225
+ │ │ ├── cass-runtime.md
226
+ │ │ ├── nigel-runtime.md
227
+ │ │ ├── codey-plan-runtime.md
228
+ │ │ └── codey-implement-runtime.md
229
+ │ ├── templates/ # Spec and output templates
172
230
  │ │ ├── SYSTEM_SPEC.md
173
- │ │ └── FEATURE_SPEC.md
231
+ │ │ ├── FEATURE_SPEC.md
232
+ │ │ ├── STORY_TEMPLATE.md # (v2.7)
233
+ │ │ └── TEST_TEMPLATE.md # (v2.7)
174
234
  │ ├── ways_of_working/ # Development rituals
175
235
  │ ├── features/ # Feature specs (populated per feature)
176
236
  │ └── system_specification/ # System spec (populated on first run)
@@ -232,6 +292,22 @@ $ npx orchestr8 insights
232
292
  - Avg duration: 14 min → 11 min (improving)
233
293
  ```
234
294
 
295
+ ## Token Efficiency (v2.7)
296
+
297
+ Version 2.7 introduces several optimizations to reduce token usage:
298
+
299
+ | Optimization | Savings | Description |
300
+ |--------------|---------|-------------|
301
+ | **Shared Guardrails** | ~1,200 tokens | Single GUARDRAILS.md instead of duplicated in each agent spec |
302
+ | **Slim Runtime Prompts** | ~5,200 tokens | 30-50 line prompts instead of 200-400 line full specs |
303
+ | **Upstream Summaries** | ~2,000-4,000 tokens | Handoff summaries between agents instead of full artifacts |
304
+ | **Template Extraction** | ~800 tokens | Templates moved to separate files, loaded on demand |
305
+ | **Lazy Business Context** | Variable | Only loaded when feature spec references it |
306
+ | **Compressed Feedback** | ~400 tokens | 3-line feedback prompts instead of 7-line |
307
+ | **Smart Story Routing** | ~25,000-40,000 tokens | Skip Cass for technical features |
308
+
309
+ **Total estimated savings: 10,000+ tokens per pipeline run** (more for technical features)
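To see how the headline figure relates to the table, the fixed rows can be summed directly. The sketch below uses only the numbers from the table; Lazy Business Context is excluded because its saving is listed as variable, and Smart Story Routing is added separately since it applies to technical features only.

```javascript
// Per-run savings from the table, as [low, high] token estimates.
const savings = {
  sharedGuardrails: [1200, 1200],
  slimRuntimePrompts: [5200, 5200],
  upstreamSummaries: [2000, 4000],
  templateExtraction: [800, 800],
  compressedFeedback: [400, 400],
};

// Sum the low and high bounds independently.
const total = Object.values(savings).reduce(
  ([lo, hi], [a, b]) => [lo + a, hi + b],
  [0, 0],
);

// Technical features additionally skip Cass (~25,000-40,000 tokens).
const technicalTotal = [total[0] + 25000, total[1] + 40000];

console.log(total); // [ 9600, 11600 ]
console.log(technicalTotal); // [ 34600, 51600 ]
```

The fixed rows come to roughly 9,600 to 11,600 tokens, so the "10,000+" figure sits inside that range rather than below it.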
310
+
235
311
  ## License
236
312
 
237
313
  MIT