npm - orchestr8 - Versions diffs - 3.2.0 → 3.3.0 - Mend

orchestr8 3.2.0 → 3.3.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (3)

package/.blueprint/prompts/codey-implement-runtime.md +1 -0
package/.blueprint/ways_of_working/DEVELOPMENT_RITUAL.md +160 -124
package/package.json +1 -1

package/.blueprint/prompts/codey-implement-runtime.md CHANGED

@@ -40,6 +40,7 @@ Implement the feature according to the plan. Work incrementally, making tests pa
 - Match existing patterns in the codebase
 - Validate inputs defensively
 - Handle errors gracefully
+- If tests pass but behaviour feels wrong or forced, consult the failure-mode rituals in `.blueprint/ways_of_working/DEVELOPMENT_RITUAL.md`
 ## Completion

package/.blueprint/ways_of_working/DEVELOPMENT_RITUAL.md CHANGED

@@ -1,142 +1,178 @@
-# Development Ritual (with CLI + Failure Modes)
+# Development Ritual
 This document defines:
-- the **core ritual**
-- a **CLI checklist** agents must walk through
-- **micro-rituals for failure modes**
+- The **pipeline stages** and what each agent must deliver
+- **Checklists** each agent must satisfy before handoff
+- **Failure-mode rituals** that override normal flow when triggered
+- The **feedback and handoff** mechanisms that connect stages
-A stage is not complete until its ritual is satisfied.
+A stage is not complete until its checklist is satisfied.
 ---
-## 🔁 Core Ritual (Summary)
+## Pipeline Stages
-1️⃣ Story → Tester
-2️⃣ Tester → Developer
-3️⃣ Developer → QA
+```
+Alex (feature spec) → Cass (user stories) → Nigel (tests) → Codey (plan → implement) → Auto-commit → Human QA
+```
-Tests define behaviour. QA validates intent.
+Each agent reads the previous agent's outputs and produces artifacts for the next. Context is passed via **handoff summaries** (max 30 lines) to keep token usage efficient. The pipeline uses a **feedback chain** where each agent rates the previous agent's work before starting their own.
+Tests define behaviour. The human validates intent after auto-commit.
 ---
-## 🖥️ CLI Agent Ritual Checklist
+## Handoff Mechanism
-Agents should **print this checklist to the CLI** at the start of their work and explicitly tick items as they complete them.
+Between stages, the pipeline creates a handoff summary (`handoff-{agent}.md`, max 30 lines) that passes key context to the next agent. This keeps each agent focused without re-reading everything from scratch.
-### Example CLI pattern
+Each agent also provides **feedback** on the previous agent's output:
+- **Rating** (1-5) on quality
+- **Issues** list (if any)
+- **Recommendation**: `proceed`, `pause`, or `revise`
+If the rating falls below the configured threshold (default: 3.0), the pipeline pauses for human review. See `feedback-config` for threshold settings.
+---
+## Agent Checklists
+### Alex (System Specification)
+Before writing the feature spec:
+- [ ] Read the system specification
+- [ ] Read relevant business context (`.business_context/`)
+- [ ] Read the feature template
+Before handoff:
+- [ ] Feature spec written to `FEATURE_SPEC.md`
+- [ ] Intent, scope, actors, rules, and dependencies covered
+- [ ] Ambiguities flagged explicitly
+- [ ] Assumptions labelled as such
+- [ ] Spec aligns with system boundaries
+### Cass (Story Writer)
+Before writing stories:
+- [ ] Read the feature spec
+- [ ] Read the system specification for context
+- [ ] Identified primary behaviour, entry/exit conditions, branching logic
+Before handoff:
+- [ ] Each story file (`story-{slug}.md`) has a single clear goal
+- [ ] Acceptance criteria are in Given/When/Then, max 5-7 per story
+- [ ] Routing is explicit (no "goes to next screen")
+- [ ] Out-of-scope items listed
+- [ ] Assumptions flagged
+### Nigel (Tester)
-```text
-[ ] Read story and acceptance criteria
-[ ] Read tester understanding & test plan
-[ ] Ran baseline tests
-[ ] Implemented behaviour
-[ ] Tests passing
-[ ] Lint passing
-[ ] Summary written
-```
-### Tester CLI Ritual (Nigel)
 Before writing tests:
-[ ] Story has a single clear goal
-[ ] Acceptance criteria are testable
-[ ] Ambiguities identified
-[ ] Assumptions written down
-Before handover to the human to pass to Claude:
-[ ] Understanding summary written
-[ ] Test plan created
-[ ] Happy path tests written
-[ ] Edge/error tests written
-[ ] Tests runnable via npm test
-[ ] Traceability table complete
-[ ] Open questions listed
-If any box is unchecked → raise it with the human that its not ready to hand over. If all boxes are checked, let the human know that its ready to handover to Claude.
-🧑‍💻 Developer CLI Ritual (Claude)
+- [ ] Read all story files and the feature spec
+- [ ] Acceptance criteria are testable
+- [ ] Ambiguities identified
+- [ ] Assumptions written down
+Before handoff:
+- [ ] `test-spec.md` written (understanding, AC-to-test mapping, assumptions)
+- [ ] Executable test file written
+- [ ] Happy path tests written
+- [ ] Edge case and error tests written
+- [ ] Tests runnable via the project's configured test command (see `.claude/stack-config.json`)
+- [ ] Traceability table complete (every AC mapped to test IDs)
+- [ ] Open questions listed
+If any box is unchecked, raise it before handoff.
+### Codey (Developer) — Planning
+Before writing the plan:
+- [ ] Read feature spec, stories, test-spec, and executable tests
+- [ ] Built mental model of happy path, edge cases, error flows
+- [ ] Identified what already exists vs what is new
+Before handoff:
+- [ ] `IMPLEMENTATION_PLAN.md` written (summary, files table, steps, risks)
+- [ ] Steps ordered to make tests pass incrementally
+- [ ] No implementation code written yet
+### Codey (Developer) — Implementation
 Before coding:
-[ ] Read story + ACs
-[ ] Read tester understanding
-[ ] Read executable tests
-[ ] Ran baseline tests (expected failures only)
+- [ ] Read implementation plan and tests
+- [ ] Ran baseline tests (note expected failures)
 During coding:
-[ ] Implemented behaviour incrementally
-[ ] Ran relevant tests after each change
-[ ] Did not weaken or delete tests
-Before handover to the human:
-[ ] All tests passing
-[ ] Lint passing
-[ ] No unexplained skip/todo
-[ ] Changes summarised
-[ ] Assumptions restated
-If tests pass but confidence is low → trigger a failure-mode ritual.
-🚨 Failure-Mode Micro-Rituals
-These rituals override normal flow. When triggered, stop and follow them explicitly.
-❓ Tests pass, but behaviour feels wrong
-Trigger when:
-- UX feels off
-- behaviour technically matches tests but not intent
-- something feels “too easy”
-Ritual:
-[ ] Re-read original user story
-[ ] Re-state intended behaviour in plain English
-[ ] Identify mismatch: story vs tests vs implementation
-[ ] Decide:
-    - tests are wrong
-    - story is underspecified
-    - implementation misinterpreted behaviour
-Outcome:
-Update tests (Tester)
-Clarify ACs (Story owner)
-Fix implementation (Developer)
-Never “let it slide”.
-🧪 Tests are unclear or contradictory
-Trigger when:
-- assertions conflict
-- test names don’t match expectations
-- passing tests don’t explain behaviour
-Ritual:
-[ ] Identify specific confusing test(s)
-[ ] State what behaviour they appear to encode
-[ ] Compare to acceptance criteria
-[ ] Propose corrected test behaviour
-Outcome:
-- Tester revises tests
-- Developer does not guess
-🔁 Tests are failing for non-behaviour reasons
-Trigger when:
-- environment/setup issues
-- brittle timing
-- global state leakage
-Ritual:
-[ ] Confirm failure is not missing behaviour
-[ ] Isolate failing test
-[ ] Remove flakiness or hidden coupling
-[ ] Re-run full suite
-Outcome:
-- Stabilise tests before continuing feature work
-⚠️ Developer changed behaviour to make tests pass
-Trigger when:
-- implementation feels forced
-- logic seems unnatural or overly complex
-Ritual:
-[ ] Pause implementation
-[ ] Identify which test is driving awkward behaviour
-[ ] Re-check acceptance criteria
-[ ] Raise concern to Tester / QA
-Outcome:
-- Adjust tests or clarify intent
-- Prefer simpler behaviour aligned to story
-🧭 Meta-Rules (Always On)
-❗ Tests are the behavioural contract
-❗ Green builds are necessary, not sufficient
-❗ Assumptions must be written down
-❗ No silent changes
-❗ When in doubt, slow down and ask the human
+- [ ] Implemented behaviour incrementally (one file at a time)
+- [ ] Ran tests after each file change
+- [ ] Did not weaken or delete Nigel's tests
+Before handoff:
+- [ ] All tests passing
+- [ ] Lint passing
+- [ ] No unexplained `skip` or `todo`
+- [ ] Changes summarised (files changed, test status, blockers)
+- [ ] Assumptions restated
+If tests pass but confidence is low, trigger a failure-mode ritual (see below).
+---
+## Failure-Mode Rituals
+These override normal flow. When triggered, stop and follow the steps explicitly.
+### Tests pass, but behaviour feels wrong
+**Trigger:** Behaviour technically matches tests but not intent, or something feels "too easy."
+- [ ] Re-read the original user story
+- [ ] Re-state intended behaviour in plain English
+- [ ] Identify mismatch: story vs tests vs implementation
+- [ ] Decide: tests are wrong, story is underspecified, or implementation misinterpreted behaviour
+**Outcome:** Update tests (Nigel), clarify ACs (Cass), or fix implementation (Codey). Never "let it slide."
+### Tests are unclear or contradictory
+**Trigger:** Assertions conflict, test names don't match expectations, or passing tests don't explain behaviour.
+- [ ] Identify the specific confusing test(s)
+- [ ] State what behaviour they appear to encode
+- [ ] Compare to acceptance criteria
+- [ ] Propose corrected test behaviour
+**Outcome:** Nigel revises tests. Codey does not guess.
+### Tests fail for non-behaviour reasons
+**Trigger:** Environment/setup issues, brittle timing, or global state leakage.
+- [ ] Confirm failure is not missing behaviour
+- [ ] Isolate failing test
+- [ ] Remove flakiness or hidden coupling
+- [ ] Re-run full suite
+**Outcome:** Stabilise tests before continuing feature work.
+### Implementation feels forced
+**Trigger:** Logic seems unnatural or overly complex to make tests pass.
+- [ ] Pause implementation
+- [ ] Identify which test is driving the awkward behaviour
+- [ ] Re-check acceptance criteria
+- [ ] Raise concern to the human
+**Outcome:** Adjust tests or clarify intent. Prefer simpler behaviour aligned to the story.
+---
+## Meta-Rules (Always On)
+- Tests are the behavioural contract
+- Green builds are necessary, not sufficient
+- No silent changes — all assumptions written down
+- When in doubt, slow down and ask the human
+See `GUARDRAILS.md` for the full shared constraints (source restrictions, escalation protocol, anti-patterns).
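
Reviewer note on the feedback rule added in this hunk: the new "Handoff Mechanism" text says each agent gives a rating (1-5), an issues list, and a recommendation of `proceed`, `pause`, or `revise`, and that the pipeline pauses for human review when the rating falls below the configured threshold (default: 3.0). A minimal sketch of that gate follows. The function names and the shape of the feedback object are hypothetical and are not taken from orchestr8's source. The diff does not say how the recommendation interacts with the rating, so the sketch only validates the recommendation and applies the threshold rule.

```javascript
// Hypothetical sketch of the feedback gate described in DEVELOPMENT_RITUAL.md.
// Names and object shape are assumptions, not orchestr8's actual API.
const DEFAULT_THRESHOLD = 3.0; // default stated in the document
const RECOMMENDATIONS = ["proceed", "pause", "revise"];

// Check that a feedback object carries the three fields the document lists.
function validateFeedback(feedback) {
  const { rating, issues = [], recommendation } = feedback;
  if (!Number.isFinite(rating) || rating < 1 || rating > 5) {
    throw new RangeError("rating must be a number from 1 to 5");
  }
  if (!Array.isArray(issues)) {
    throw new TypeError("issues must be an array");
  }
  if (!RECOMMENDATIONS.includes(recommendation)) {
    throw new RangeError(
      `recommendation must be one of: ${RECOMMENDATIONS.join(", ")}`
    );
  }
  return { rating, issues, recommendation };
}

// True when the rating falls below the threshold, i.e. the pipeline
// should pause for human review.
function shouldPauseForHuman(feedback, threshold = DEFAULT_THRESHOLD) {
  const { rating } = validateFeedback(feedback);
  return rating < threshold;
}
```

With the default threshold, a rating of exactly 3 does not pause, because the document says "falls below".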

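Reviewer note on the handoff summaries introduced in the DEVELOPMENT_RITUAL.md hunk: the new text names the file `handoff-{agent}.md` and caps it at 30 lines. The sketch below shows one way such a cap could be checked. It is an illustration under assumptions: `handoffFileName`, `countLines`, and `checkHandoff` are hypothetical helpers, and treating a single trailing newline as not starting a new line is a choice made here, not something the diff specifies.

```javascript
// Hypothetical check for the 30-line handoff summary limit.
// Helper names are assumptions, not part of orchestr8.
const MAX_HANDOFF_LINES = 30; // limit stated in the document

// Build the file name from the `handoff-{agent}.md` pattern.
function handoffFileName(agent) {
  return `handoff-${agent}.md`;
}

// Count lines, ignoring one trailing newline at the end of the text.
function countLines(text) {
  if (text === "") return 0;
  const body = text.replace(/\r?\n$/, "");
  return body.split(/\r?\n/).length;
}

// Report the target file, the line count, and whether the cap is respected.
function checkHandoff(agent, text) {
  const lines = countLines(text);
  return {
    file: handoffFileName(agent),
    lines,
    withinLimit: lines <= MAX_HANDOFF_LINES,
  };
}
```

A caller could refuse to pass a summary to the next agent when `withinLimit` is false, or ask the producing agent to shorten it.
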
package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "orchestr8",
-  "version": "3.2.0",
+  "version": "3.3.0",
   "description": "Multi-agent workflow framework for automated feature development",
   "main": "src/index.js",
   "bin": {