npm - orchestr8 - Versions diffs - 2.6.0 → 2.7.0 - Mend

orchestr8 2.6.0 → 2.7.0

Files changed (46)

package/.blueprint/features/feature_slim-agent-prompts/FEATURE_SPEC.md ADDED

@@ -0,0 +1,145 @@
+# Feature Specification — Slim Agent Prompts
+## 1. Feature Intent
+**Why this feature exists.**
+- Current agent prompts tell each agent to "Read your full specification from: AGENT_*.md"
+- Full specs are 200-450 lines (~1,500-2,000 tokens each), loaded every invocation
+- Most spec content is background/training context, not runtime-essential
+- Creating slim runtime prompts reduces token usage by ~5,200 tokens per pipeline run
+---
+## 2. Scope
+### In Scope
+- Create slim runtime prompts in `.blueprint/prompts/` directory
+- Runtime prompts contain only: role reminder, task, inputs, outputs, output rules
+- Update SKILL.md to use runtime prompts instead of full specs
+- Keep full specs for reference/documentation
+### Out of Scope
+- Changing agent behaviour or capabilities
+- Removing full agent specs (they remain for context)
+- Changing the pipeline flow
+---
+## 3. Actors Involved
+| Actor | Can Do | Cannot Do |
+|-------|--------|-----------|
+| Alex | Execute from slim prompt | Access context removed from the prompt, unless needed (via the full spec) |
+| Cass | Execute from slim prompt | Access context removed from the prompt, unless needed (via the full spec) |
+| Nigel | Execute from slim prompt | Access context removed from the prompt, unless needed (via the full spec) |
+| Codey | Execute from slim prompt (plan + implement) | Access context removed from the prompt, unless needed (via the full spec) |
+---
+## 4. Behaviour Overview
+**Happy path:**
+1. Pipeline invokes agent via Task tool
+2. Agent receives slim prompt (~30-50 lines) instead of "read full spec" instruction
+3. Agent executes task with essential context only
+4. Output quality maintained with fewer input tokens
+**Slim prompt structure:**
+```markdown
+You are {Agent}, the {Role}.
+## Task
+{Current task description}
+## Inputs
+{List of files to read}
+## Outputs
+{Files to write, format requirements}
+## Rules
+{5-7 critical rules only}
+```
+**Key outcomes:**
+- ~5,200 fewer input tokens per pipeline run
+- Faster agent responses (less context to process)
+- Same output quality
+---
+## 5. State & Lifecycle Interactions
+- No state changes — prompts are stateless
+- Pipeline flow unchanged
+- Queue structure unchanged
+---
+## 6. Rules & Decision Logic
+| Rule | Description |
+|------|-------------|
+| Essential only | Runtime prompts contain only task-critical information |
+| No duplication | Don't repeat what's in input files (e.g., feature spec) |
+| Reference full spec | Include path to full spec for edge cases: "For detailed guidance, see: AGENT_*.md" |
+| Consistent structure | All runtime prompts follow same template |
+---
+## 7. Dependencies
+- SKILL.md must be updated with new prompt structure
+- New `.blueprint/prompts/` directory created
+- Full agent specs remain in `.blueprint/agents/` for reference
+---
+## 8. Non-Functional Considerations
+- **Performance:** ~5,200 token reduction per run (~35% of agent input tokens)
+- **Latency:** Faster responses due to smaller context
+- **Maintainability:** Two places to update (runtime prompt + full spec) — mitigated by clear separation of concerns
+---
+## 9. Assumptions & Open Questions
+**Assumptions:**
+- Agents can perform tasks effectively with condensed prompts
+- Edge cases are rare enough that full spec reference is sufficient
+- Output quality won't degrade with less verbose instructions
+**Open Questions:**
+- Should runtime prompts be generated from full specs, or maintained separately?
+- What's the minimum context needed for each agent to maintain quality?
+- Should we A/B test slim vs full prompts to measure quality impact?
+---
+## 10. Impact on System Specification
+- Reinforces efficiency goals
+- No contradiction with system spec
+- Consider adding "token efficiency" as a system-level concern
+---
+## 11. Handover to BA (Cass)
+**Story themes:**
+- Create runtime prompt template
+- Create slim prompts for each agent (Alex, Cass, Nigel, Codey-plan, Codey-implement)
+- Update SKILL.md to use runtime prompts
+- Test output quality with slim prompts
+**Expected story boundaries:**
+- One story per agent prompt creation
+- One story for SKILL.md integration
+- One story for quality validation
+---
+## 12. Change Log (Feature-Level)
+| Date | Change | Reason | Raised By |
+|-----|------|--------|-----------|
+| 2026-02-25 | Initial spec | Token efficiency improvement | Claude |
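
The slim prompt structure in section 4 of the spec above can be sketched as a small rendering function. This is only an illustration: `renderSlimPrompt` and its field names are hypothetical and do not appear in the package, and the rule text in the usage example is a placeholder.

```javascript
// Hypothetical helper (not part of orchestr8): builds a slim runtime prompt
// with the sections listed in the spec: role line, Task, Inputs, Outputs, Rules,
// followed by the full spec reference line.
function renderSlimPrompt({ agent, role, task, inputs, outputs, rules, fullSpecPath }) {
  // Render an array of strings as a Markdown bullet list.
  const bullets = (items) => items.map((item) => `- ${item}`).join('\n');
  return [
    `You are ${agent}, the ${role}.`,
    '',
    '## Task',
    task,
    '',
    '## Inputs',
    bullets(inputs),
    '',
    '## Outputs',
    bullets(outputs),
    '',
    '## Rules',
    bullets(rules),
    '',
    `For detailed guidance, see: ${fullSpecPath}`,
  ].join('\n');
}

// Usage with placeholder values; a real prompt would carry 5-7 rules.
const prompt = renderSlimPrompt({
  agent: 'Nigel',
  role: 'Tester Agent',
  task: 'Create tests for the feature.',
  inputs: ['story-*.md', 'FEATURE_SPEC.md'],
  outputs: ['test-spec.md'],
  rules: ['Cover every acceptance criterion.'],
  fullSpecPath: '.blueprint/agents/AGENT_TESTER_NIGEL.md',
});
console.log(prompt);
```

Keeping the section order in one function is one way to get the "consistent structure" rule for free; the spec itself leaves open whether prompts are generated or maintained by hand.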

package/.blueprint/features/feature_slim-agent-prompts/IMPLEMENTATION_PLAN.md ADDED

@@ -0,0 +1,87 @@
+# Implementation Plan — Slim Agent Prompts
+## Summary
+Create slim runtime prompts (~30-50 lines each) for all agents to reduce token usage by ~5,200 tokens per pipeline run. Implementation involves: (1) creating a TEMPLATE.md with standardized structure, (2) creating 5 runtime prompt files following the template, and (3) updating SKILL.md to use runtime prompts instead of full spec references.
+## Files to Create/Modify
+| Path | Action | Purpose |
+|------|--------|---------|
+| `.blueprint/prompts/` | Create dir | New directory for runtime prompts |
+| `.blueprint/prompts/TEMPLATE.md` | Create | Defines standard structure for all runtime prompts |
+| `.blueprint/prompts/alex-runtime.md` | Create | Slim prompt for Alex (specification) |
+| `.blueprint/prompts/cass-runtime.md` | Create | Slim prompt for Cass (stories) |
+| `.blueprint/prompts/nigel-runtime.md` | Create | Slim prompt for Nigel (tests) |
+| `.blueprint/prompts/codey-plan-runtime.md` | Create | Slim prompt for Codey planning phase |
+| `.blueprint/prompts/codey-implement-runtime.md` | Create | Slim prompt for Codey implementation phase |
+| `SKILL.md` | Modify | Replace "Read your full specification" with runtime prompt content |
+## Implementation Steps
+1. **Create prompts directory**: `mkdir -p .blueprint/prompts`
+2. **Create TEMPLATE.md** with sections:
+   - Role identity line pattern
+   - Task section placeholder
+   - Inputs section (file paths)
+   - Outputs section (files + format)
+   - Rules section (5-7 items, warn against duplication)
+   - Full spec reference line
+   - Include guidance: target 30-50 non-blank lines
+3. **Create alex-runtime.md** (~35 lines):
+   - Role: "You are Alex, the System Specification Agent"
+   - Task: Create feature spec
+   - Inputs: System spec, template, business context
+   - Outputs: FEATURE_SPEC.md with format rules
+   - Rules: 5-7 essential rules from full spec
+   - Reference: AGENT_SPECIFICATION_ALEX.md
+4. **Create cass-runtime.md** (~35 lines):
+   - Role: "You are Cass, the Story Writer Agent"
+   - Task: Create user stories
+   - Inputs: Feature spec, system spec
+   - Outputs: story-*.md files
+   - Rules: 5-7 essential rules
+   - Reference: AGENT_BA_CASS.md
+5. **Create nigel-runtime.md** (~35 lines):
+   - Role: "You are Nigel, the Tester Agent"
+   - Task: Create tests
+   - Inputs: Stories, feature spec
+   - Outputs: test-spec.md, test file
+   - Rules: 5-7 essential rules
+   - Reference: AGENT_TESTER_NIGEL.md
+6. **Create codey-plan-runtime.md** (~35 lines):
+   - Role: "You are Codey, the Developer Agent"
+   - Task: Create implementation plan (not implement)
+   - Inputs: Feature spec, stories, test spec, tests
+   - Outputs: IMPLEMENTATION_PLAN.md
+   - Rules: 5-7 essential rules
+   - Reference: AGENT_DEVELOPER_CODEY.md
+7. **Create codey-implement-runtime.md** (~35 lines):
+   - Role: "You are Codey, the Developer Agent"
+   - Task: Implement feature
+   - Inputs: Plan, tests
+   - Outputs: Source files (incremental)
+   - Rules: 5-7 essential rules
+   - Reference: AGENT_DEVELOPER_CODEY.md
+8. **Update SKILL.md agent prompts**: For each of Steps 6-10, replace the pattern:
+   ```
+   Read your full specification from: .blueprint/agents/AGENT_*.md
+   ```
+   with embedded content from corresponding runtime prompt, keeping task-specific context (slug, paths).
+9. **Verify line counts**: Ensure each runtime prompt has 30-50 non-blank lines using test helper `countNonBlankLines()`.
+10. **Run tests**: `node --test test/feature_slim-agent-prompts.test.js` to verify all 18 test cases pass.
+## Risks/Questions
+- **Maintainability**: Two sources of truth (runtime prompt + full spec). Mitigated by clear separation of concerns - runtime prompts contain only task instructions, full specs contain background/training.
+- **Quality impact**: Agents receive less context. Mitigated by including full spec reference for edge cases.
+- **SKILL.md size**: Embedding prompt content may increase file size. Consider using file references if prompts grow.
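
Step 9 of the plan above names a test helper `countNonBlankLines()` but the diff does not show its body. A minimal sketch of what such a helper might look like follows; the implementation and the companion `isWithinTarget` function are assumptions, not the package's actual code.

```javascript
// Assumed implementation of the helper named in the plan: counts lines that
// contain at least one non-whitespace character.
function countNonBlankLines(text) {
  return text.split(/\r?\n/).filter((line) => line.trim() !== '').length;
}

// Hypothetical companion check for the 30-50 non-blank line target.
function isWithinTarget(text, min = 30, max = 50) {
  const count = countNonBlankLines(text);
  return count >= min && count <= max;
}

// Usage: whitespace-only lines are ignored when counting.
console.log(countNonBlankLines('a\n\n  \nb\n')); // 2
```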

package/.blueprint/features/feature_slim-agent-prompts/story-create-runtime-prompt-template.md ADDED

@@ -0,0 +1,59 @@
+# Story — Create Runtime Prompt Template
+## User Story
+As a **framework maintainer**, I want a standardized runtime prompt template so that all agent prompts follow a consistent structure and contain only essential runtime information.
+---
+## Context / Scope
+- Applies to all agents: Alex, Cass, Nigel, Codey (plan + implement)
+- Template defines the structure for slim runtime prompts (~30-50 lines)
+- Full agent specs remain in `.blueprint/agents/` for reference
+- Runtime prompts will live in `.blueprint/prompts/`
+- See feature spec: `.blueprint/features/feature_slim-agent-prompts/FEATURE_SPEC.md`
+---
+## Acceptance Criteria
+**AC-1 — Template structure defined**
+- Given the need for consistent slim prompts,
+- When the runtime prompt template is created,
+- Then it includes exactly these sections in order:
+  1. Role identity line ("You are {Agent}, the {Role}.")
+  2. Task section
+  3. Inputs section (files to read)
+  4. Outputs section (files to write, format requirements)
+  5. Rules section (5-7 critical rules only)
+  6. Full spec reference line
+**AC-2 — Template enforces brevity**
+- Given the goal of token reduction,
+- When a runtime prompt follows the template,
+- Then the total line count is between 30 and 50 lines (excluding blank lines).
+**AC-3 — Template includes full spec reference**
+- Given agents may need additional context for edge cases,
+- When the template is applied,
+- Then each prompt includes: "For detailed guidance, see: `.blueprint/agents/AGENT_*.md`"
+**AC-4 — Template location established**
+- Given the need for organized prompt storage,
+- When runtime prompts are created,
+- Then they are stored in `.blueprint/prompts/` directory with naming convention `{agent-slug}-runtime.md`.
+**AC-5 — No duplication of input content**
+- Given runtime prompts should be minimal,
+- When the Rules section is written,
+- Then it does not duplicate information already present in the agent's input files (feature spec, system spec, etc.).
+---
+## Out of Scope
+- Creating the actual agent prompts (covered in separate story)
+- Updating SKILL.md integration (covered in separate story)
+- Automated template validation tooling
+- A/B testing infrastructure for quality comparison

package/.blueprint/features/feature_slim-agent-prompts/story-create-slim-agent-prompts.md ADDED

@@ -0,0 +1,65 @@
+# Story — Create Slim Agent Prompts
+## User Story
+As a **pipeline user**, I want slim runtime prompts for each agent so that pipeline execution uses fewer input tokens while maintaining output quality.
+---
+## Context / Scope
+- Creates 5 runtime prompts: Alex, Cass, Nigel, Codey-plan, Codey-implement
+- Each prompt follows the template from story-create-runtime-prompt-template.md
+- Full specs remain in `.blueprint/agents/` for reference
+- Expected token reduction: ~5,200 tokens per pipeline run
+- See feature spec: `.blueprint/features/feature_slim-agent-prompts/FEATURE_SPEC.md`
+---
+## Acceptance Criteria
+**AC-1 — All agents have runtime prompts**
+- Given the template has been established,
+- When slim prompts are created,
+- Then the following files exist in `.blueprint/prompts/`:
+  - `alex-runtime.md`
+  - `cass-runtime.md`
+  - `nigel-runtime.md`
+  - `codey-plan-runtime.md`
+  - `codey-implement-runtime.md`
+**AC-2 — Prompts contain role identity**
+- Given each agent has a distinct role,
+- When a runtime prompt is read,
+- Then it begins with "You are {Name}, the {Role}." matching the agent's identity from the full spec.
+**AC-3 — Prompts specify inputs and outputs**
+- Given agents need to know what to read and write,
+- When a runtime prompt is read,
+- Then the Inputs section lists specific file paths to read,
+- And the Outputs section lists specific files to create with format requirements.
+**AC-4 — Rules are essential only**
+- Given the goal of brevity,
+- When a runtime prompt's Rules section is reviewed,
+- Then it contains between 5 and 7 rules,
+- And each rule is critical to correct task execution.
+**AC-5 — Prompts reference full specs**
+- Given edge cases may require additional context,
+- When a runtime prompt is read,
+- Then it includes a reference to the full agent spec path.
+**AC-6 — Line count within target**
+- Given the target of 30-50 lines,
+- When any runtime prompt is measured,
+- Then it contains between 30 and 50 non-blank lines.
+---
+## Out of Scope
+- Modifying full agent specs in `.blueprint/agents/`
+- Changing agent behaviour or capabilities
+- SKILL.md integration (covered in separate story)
+- Automated token counting tooling
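
Acceptance criteria AC-2, AC-4 and AC-5 of the story above lend themselves to a mechanical check. The sketch below is illustrative only: `checkRuntimePrompt` is a hypothetical name, and it assumes rules are written as bullet or numbered list items under a `## Rules` heading, which the story does not specify.

```javascript
// Hypothetical checker (not part of the package) for a single runtime prompt.
function checkRuntimePrompt(text) {
  const lines = text.split(/\r?\n/);
  // AC-2: the first non-blank line should be the role identity line.
  const firstLine = lines.find((line) => line.trim() !== '') ?? '';
  // AC-4: count list items under "## Rules" until the next "## " heading.
  // Assumption: one list item per rule, no nested sub-items.
  const rulesStart = lines.findIndex((line) => line.trim() === '## Rules');
  let ruleCount = 0;
  if (rulesStart !== -1) {
    for (const line of lines.slice(rulesStart + 1)) {
      if (line.startsWith('## ')) break;
      if (/^\s*(?:[-*]|\d+\.)\s+/.test(line)) ruleCount += 1;
    }
  }
  return {
    hasRoleIdentity: /^You are .+, the .+\.$/.test(firstLine.trim()),
    ruleCount,
    rulesInRange: ruleCount >= 5 && ruleCount <= 7,
    // AC-5: the prompt should point at a full agent spec.
    referencesFullSpec: /\.blueprint\/agents\/AGENT_[A-Z_]+\.md/.test(text),
  };
}

// Usage with placeholder rule text.
const sample = [
  'You are Cass, the Story Writer Agent.',
  '## Rules',
  '- Rule one (placeholder)',
  '- Rule two (placeholder)',
  '- Rule three (placeholder)',
  '- Rule four (placeholder)',
  '- Rule five (placeholder)',
  'For detailed guidance, see: `.blueprint/agents/AGENT_BA_CASS.md`',
].join('\n');
const report = checkRuntimePrompt(sample);
console.log(report);
```

Whether each rule is "critical to correct task execution" (the second half of AC-4) is a judgment call that a check like this cannot make.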

package/.blueprint/features/feature_slim-agent-prompts/story-skill-integration.md ADDED

@@ -0,0 +1,53 @@
+# Story — SKILL.md Integration with Runtime Prompts
+## User Story
+As a **pipeline user**, I want SKILL.md to use the slim runtime prompts so that agent invocations benefit from reduced token usage.
+---
+## Context / Scope
+- Updates SKILL.md to embed or reference runtime prompts instead of full specs
+- Runtime prompts located in `.blueprint/prompts/`
+- Pipeline flow remains unchanged: Alex -> Cass -> Nigel -> Codey
+- Must maintain all existing functionality (--pause-after, --no-commit, queue recovery)
+- See feature spec: `.blueprint/features/feature_slim-agent-prompts/FEATURE_SPEC.md`
+---
+## Acceptance Criteria
+**AC-1 — Agent spawns use runtime prompts**
+- Given SKILL.md spawns agents via Task tool,
+- When an agent is invoked,
+- Then the prompt content comes from `.blueprint/prompts/{agent}-runtime.md` rather than instructing "Read your full specification from: AGENT_*.md".
+**AC-2 — All five agent contexts updated**
+- Given the pipeline has 5 agent contexts,
+- When SKILL.md is reviewed,
+- Then Alex, Cass, Nigel, Codey-plan, and Codey-implement all use their respective runtime prompts.
+**AC-3 — Pipeline flow unchanged**
+- Given existing pipeline behaviour must be preserved,
+- When a feature is processed with updated SKILL.md,
+- Then the sequence remains: Alex -> Cass -> Nigel -> Codey-plan -> Codey-implement -> Auto-commit.
+**AC-4 — Pause and commit options preserved**
+- Given users rely on `--pause-after` and `--no-commit` flags,
+- When SKILL.md is updated,
+- Then these options continue to function as documented.
+**AC-5 — Queue recovery unchanged**
+- Given a pipeline may fail mid-execution,
+- When a pipeline is resumed from queue state,
+- Then the correct runtime prompt is used for the resumed stage.
+---
+## Out of Scope
+- Changes to queue file structure
+- Changes to CLI commands
+- Adding new pipeline stages
+- Modifying the Task tool invocation mechanism
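
AC-1, AC-3 and AC-5 of the story above together imply a fixed stage order and a one-to-one mapping from stage to runtime prompt file. A minimal sketch of that mapping follows. The stage identifiers are assumed to match the prompt file name prefixes, and `runtimePromptPath` and `remainingStages` are hypothetical names; the actual queue format is not shown in this diff.

```javascript
// Stage order from AC-3 (auto-commit is not an agent stage, so it is omitted).
// Assumption: queue stage identifiers equal the prompt file name prefixes.
const STAGES = ['alex', 'cass', 'nigel', 'codey-plan', 'codey-implement'];

// Resolve the runtime prompt file for a stage, per the naming convention
// `{agent-slug}-runtime.md` under `.blueprint/prompts/`.
function runtimePromptPath(stage) {
  if (!STAGES.includes(stage)) {
    throw new Error(`Unknown pipeline stage: ${stage}`);
  }
  return `.blueprint/prompts/${stage}-runtime.md`;
}

// On resume, the stages still to run start at the recorded stage.
function remainingStages(resumeFrom) {
  const index = STAGES.indexOf(resumeFrom);
  if (index === -1) {
    throw new Error(`Unknown pipeline stage: ${resumeFrom}`);
  }
  return STAGES.slice(index);
}

// Usage: a pipeline that failed during Nigel resumes with Nigel's prompt.
console.log(remainingStages('nigel').map(runtimePromptPath));
```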

package/.blueprint/features/feature_smart-story-routing/FEATURE_SPEC.md ADDED

@@ -0,0 +1,147 @@
+# Feature Specification — Smart Story Routing
+## 1. Feature Intent
+The pipeline currently requires a manual decision about whether to include the Cass (story writing) step. This feature automates that decision by classifying features as "technical" or "user-facing" based on their content, then routing accordingly.
+- Technical features skip Cass (saves ~25-40k tokens)
+- User-facing features include Cass (stories add value for user journeys)
+- Override flags allow manual control when needed
+---
+## 2. Scope
+### In Scope
+- Classify features based on feature spec content
+- Route pipeline to include/skip Cass based on classification
+- Provide override flags for manual control
+- Log classification decision for transparency
+### Out of Scope
+- Changing Cass's behavior
+- ML-based classification (use keyword matching)
+- Retroactive classification of completed features
+---
+## 3. Actors Involved
+| Actor | Role |
+|-------|------|
+| Pipeline orchestrator | Calls classifier, routes accordingly |
+| Feature classifier | Analyzes spec, returns classification |
+| User | Can override with flags |
+---
+## 4. Behaviour Overview
+### Classification Logic
+**Technical indicators** (skip Cass):
+- Keywords: refactor, token, performance, module, internal, infrastructure, optimization, extract, compress, cache, schema, validation, helper, utility, config
+- Patterns: "reduce.*token", "improve.*efficiency", "extract.*to"
+**User-facing indicators** (include Cass):
+- Keywords: user, customer, UI, screen, journey, flow, experience, interface, form, button, login, signup, dashboard, notification, email
+- Patterns: "user can", "user should", "as a user"
+**Decision rules:**
+1. Count technical indicators in feature spec
+2. Count user-facing indicators in feature spec
+3. If user-facing > technical → include Cass
+4. If technical > user-facing → skip Cass
+5. If tie or unclear → default to include Cass (safer)
+### Override Flags
+- `--with-stories` - Force include Cass regardless of classification
+- `--skip-stories` - Force skip Cass regardless of classification
+### Pipeline Integration
+```
+/implement-feature "slug"
+       │
+       ▼
+  Classify Feature
+       │
+       ├── Technical → Skip Cass → Nigel
+       │
+       └── User-facing → Include Cass → Cass → Nigel
+```
+---
+## 5. State & Lifecycle Interactions
+- Classification stored in queue: `current.featureType: "technical" | "user-facing"`
+- Routing decision stored in queue: `current.skippedCass: boolean`
+- No impact on other pipeline stages
+---
+## 6. Rules & Decision Logic
+| Rule | Description |
+|------|-------------|
+| Keyword matching | Case-insensitive search in feature spec |
+| Threshold | User-facing wins ties (conservative) |
+| Override precedence | Flags override classification |
+| Logging | Classification reason logged to console |
+---
+## 7. Dependencies
+- Feature spec must exist before classification
+- SKILL.md routes based on classification result
+- Queue tracks classification for recovery
+---
+## 8. Non-Functional Considerations
+- **Performance**: Classification is fast (string matching)
+- **Accuracy**: Keyword-based approach may misclassify edge cases
+- **Transparency**: Log why decision was made
+---
+## 9. Assumptions & Open Questions
+**Assumptions:**
+- Keyword matching is sufficient for most cases
+- Users will use override flags for edge cases
+- Technical features don't benefit significantly from stories
+**Open Questions:**
+- Should we track classification accuracy over time?
+- Should confidence score be exposed to user?
+---
+## 10. Impact on System Specification
+- Adds new routing decision point to pipeline
+- No breaking changes to existing behavior
+- Backward compatible (default is current behavior)
+---
+## 11. Handover to Nigel
+**Test themes:**
+- Classification function returns correct type for technical content
+- Classification function returns correct type for user-facing content
+- Override flags work correctly
+- Tie-breaking defaults to user-facing
+- Queue state updated with classification
+---
+## 12. Change Log
+| Date | Change | Reason | Raised By |
+|------|--------|--------|-----------|
+| 2026-02-25 | Initial spec | Optimize pipeline efficiency | User request |
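
The classification logic in section 4 of the spec above can be sketched as follows. The keyword and pattern lists are copied from the spec; everything else is an assumption made for illustration, in particular counting each indicator at most once, matching keywords on word boundaries with an optional plural `s`, and the wording of `reason`. The package's real `src/classifier.js` may differ.

```javascript
const TECHNICAL_KEYWORDS = ['refactor', 'token', 'performance', 'module', 'internal',
  'infrastructure', 'optimization', 'extract', 'compress', 'cache', 'schema',
  'validation', 'helper', 'utility', 'config'];
const USER_FACING_KEYWORDS = ['user', 'customer', 'UI', 'screen', 'journey', 'flow',
  'experience', 'interface', 'form', 'button', 'login', 'signup', 'dashboard',
  'notification', 'email'];
const TECHNICAL_PATTERNS = [/reduce.*token/i, /improve.*efficiency/i, /extract.*to/i];
const USER_FACING_PATTERNS = [/user can/i, /user should/i, /as a user/i];

// Assumption: whole-word, case-insensitive match with an optional plural "s",
// so "UI" does not match inside words such as "build".
const toWordRegex = (keyword) => new RegExp(`\\b${keyword}s?\\b`, 'i');

// Regexes are built once at module load rather than on every call.
const TECHNICAL_REGEXES = [...TECHNICAL_KEYWORDS.map(toWordRegex), ...TECHNICAL_PATTERNS];
const USER_FACING_REGEXES = [...USER_FACING_KEYWORDS.map(toWordRegex), ...USER_FACING_PATTERNS];

// Assumption: each indicator counts once if it appears anywhere in the text.
const countIndicators = (regexes, text) => regexes.filter((re) => re.test(text)).length;

function classifyFeature(specContent) {
  const technicalCount = countIndicators(TECHNICAL_REGEXES, specContent);
  const userFacingCount = countIndicators(USER_FACING_REGEXES, specContent);
  // Ties, including empty input, fall through to "user-facing" (include Cass).
  const type = technicalCount > userFacingCount ? 'technical' : 'user-facing';
  const reason = technicalCount === userFacingCount
    ? `tie (${technicalCount} each), defaulting to user-facing`
    : `${technicalCount} technical vs ${userFacingCount} user-facing indicators`;
  return { type, technicalCount, userFacingCount, reason };
}

console.log(classifyFeature('Refactor the cache module to reduce token usage.'));
console.log(classifyFeature('As a user, I can log in from the dashboard screen.'));
```

The word-boundary choice trades recall for precision: it misses inflected forms such as "configuration", while a plain substring search would count "form" inside "performance". The spec only says "case-insensitive search", so either reading is plausible.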

package/.blueprint/features/feature_smart-story-routing/IMPLEMENTATION_PLAN.md ADDED

@@ -0,0 +1,73 @@
+# Implementation Plan: Smart Story Routing
+## Summary
+This feature adds automatic classification of features as "technical" or "user-facing" to determine whether the Cass (story writing) stage should be included in the pipeline. Implementation requires creating a new classifier module and integrating it with the orchestrator and SKILL.md pipeline definition.
+## Files to Create/Modify
+| Path | Action | Purpose |
+|------|--------|---------|
+| `src/classifier.js` | Create | Classification logic, flag parsing, decision function |
+| `src/orchestrator.js` | Modify | Add `featureType` and `skippedCass` to queue state |
+| `src/index.js` | Modify | Export classifier module |
+| `SKILL.md` | Modify | Add routing logic and new flags documentation |
+| `bin/cli.js` | Modify | Add `classify` command for testing/debugging |
+## Implementation Steps
+1. **Create `src/classifier.js` with keyword lists**
+   - Define `TECHNICAL_KEYWORDS` array (refactor, token, performance, etc.)
+   - Define `USER_FACING_KEYWORDS` array (user, customer, UI, etc.)
+   - Define `TECHNICAL_PATTERNS` and `USER_FACING_PATTERNS` regex arrays
+   - Tests: T-CF-1.1, T-CF-1.2, T-CF-1.3, T-CF-2.1, T-CF-2.2, T-CF-2.3
+2. **Implement `classifyFeature(specContent)` function**
+   - Case-insensitive keyword counting
+   - Pattern matching for regex patterns
+   - Return `{ type, technicalCount, userFacingCount, reason }`
+   - Tie-breaking: user-facing wins (conservative default)
+   - Tests: T-CF-3.1, T-CF-3.2
+3. **Implement `parseStoryFlags(args)` function**
+   - Parse `--with-stories` flag -> `{ override: 'include' }`
+   - Parse `--skip-stories` flag -> `{ override: 'skip' }`
+   - No flag -> `{ override: null }`
+   - Tests: T-FP-1.1, T-FP-1.2, T-FP-1.3, T-FP-1.4
+4. **Implement `shouldIncludeStories(featureType, override)` function**
+   - Override takes precedence over classification
+   - Return boolean for pipeline routing decision
+   - Tests: T-SD-1.1, T-SD-1.2, T-SD-2.1, T-SD-2.2
+5. **Implement logging helper `logClassification(result)`**
+   - Console output: "Feature classified as {type}: {reason}"
+   - Include indicator counts for transparency
+6. **Update `src/orchestrator.js` setCurrent function**
+   - Add optional `featureType` and `skippedCass` fields to queue state
+   - Ensure fields persist through queue operations
+   - Tests: T-QS-1.1, T-QS-1.2, T-QS-1.3
+7. **Update `src/index.js` exports**
+   - Export classifier functions for use by SKILL.md
+8. **Update `SKILL.md` pipeline routing**
+   - Add classification step after Step 5 (Initialize)
+   - Document `--with-stories` and `--skip-stories` flags
+   - Add conditional routing: if technical, skip to Nigel
+9. **Add optional CLI command `classify`**
+   - Usage: `npx orchestr8 classify path/to/FEATURE_SPEC.md`
+   - Outputs classification result for debugging/testing
+10. **Run all tests and verify**
+    - `node --test test/feature_smart-story-routing.test.js`
+    - Ensure all 19 tests pass
+## Risks/Questions
+- **Keyword list completeness**: May need tuning based on real-world usage; current lists from spec are a starting point
+- **Pattern matching performance**: Regex patterns should be compiled once, not per-call
+- **Edge case handling**: Empty/whitespace-only specs default to user-facing (safe default)
+- **SKILL.md integration**: Routing logic is instructional, not code - relies on Claude following instructions correctly
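
Steps 3 and 4 of the plan above describe two small functions whose return shapes are given but whose bodies are not shown in this diff. A minimal sketch under stated assumptions: the behaviour when both flags are passed is not specified in the plan, so this version lets `--with-stories` win as the conservative choice, and any feature type other than `'technical'` is treated as user-facing.

```javascript
// Sketch of parseStoryFlags: maps CLI args to the override shapes in the plan.
// Assumption: if both flags are present, --with-stories takes precedence.
function parseStoryFlags(args) {
  if (args.includes('--with-stories')) return { override: 'include' };
  if (args.includes('--skip-stories')) return { override: 'skip' };
  return { override: null };
}

// Sketch of shouldIncludeStories: an override takes precedence over the
// classification; without one, only technical features skip Cass.
function shouldIncludeStories(featureType, override) {
  if (override === 'include') return true;
  if (override === 'skip') return false;
  return featureType !== 'technical';
}

// Usage: a technical feature normally skips Cass unless the user forces stories.
const { override } = parseStoryFlags(['my-feature', '--with-stories']);
console.log(shouldIncludeStories('technical', override)); // true
console.log(shouldIncludeStories('technical', null)); // false
```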