orchestr8 2.5.0 → 2.6.1
- package/.blueprint/agents/AGENT_BA_CASS.md +42 -19
- package/.blueprint/agents/AGENT_DEVELOPER_CODEY.md +42 -38
- package/.blueprint/agents/AGENT_SPECIFICATION_ALEX.md +45 -0
- package/.blueprint/agents/AGENT_TESTER_NIGEL.md +42 -21
- package/.blueprint/features/feature_adaptive-retry/FEATURE_SPEC.md +239 -0
- package/.blueprint/features/feature_adaptive-retry/IMPLEMENTATION_PLAN.md +48 -0
- package/.blueprint/features/feature_adaptive-retry/story-prompt-modification.md +85 -0
- package/.blueprint/features/feature_adaptive-retry/story-retry-config.md +89 -0
- package/.blueprint/features/feature_adaptive-retry/story-should-retry.md +98 -0
- package/.blueprint/features/feature_adaptive-retry/story-strategy-recommendation.md +85 -0
- package/.blueprint/features/feature_agent-guardrails/FEATURE_SPEC.md +328 -0
- package/.blueprint/features/feature_agent-guardrails/IMPLEMENTATION_PLAN.md +90 -0
- package/.blueprint/features/feature_agent-guardrails/story-citation-requirements.md +50 -0
- package/.blueprint/features/feature_agent-guardrails/story-confidentiality.md +50 -0
- package/.blueprint/features/feature_agent-guardrails/story-escalation-protocol.md +55 -0
- package/.blueprint/features/feature_agent-guardrails/story-source-restrictions.md +50 -0
- package/.blueprint/features/feature_feedback-loop/FEATURE_SPEC.md +347 -0
- package/.blueprint/features/feature_feedback-loop/IMPLEMENTATION_PLAN.md +71 -0
- package/.blueprint/features/feature_feedback-loop/story-feedback-collection.md +63 -0
- package/.blueprint/features/feature_feedback-loop/story-feedback-config.md +61 -0
- package/.blueprint/features/feature_feedback-loop/story-feedback-insights.md +63 -0
- package/.blueprint/features/feature_feedback-loop/story-quality-gates.md +57 -0
- package/.blueprint/features/feature_pipeline-history/FEATURE_SPEC.md +239 -0
- package/.blueprint/features/feature_pipeline-history/IMPLEMENTATION_PLAN.md +71 -0
- package/.blueprint/features/feature_pipeline-history/story-clear-history.md +73 -0
- package/.blueprint/features/feature_pipeline-history/story-display-history.md +75 -0
- package/.blueprint/features/feature_pipeline-history/story-record-execution.md +76 -0
- package/.blueprint/features/feature_pipeline-history/story-show-statistics.md +85 -0
- package/.blueprint/features/feature_pipeline-insights/FEATURE_SPEC.md +288 -0
- package/.blueprint/features/feature_pipeline-insights/IMPLEMENTATION_PLAN.md +65 -0
- package/.blueprint/features/feature_pipeline-insights/story-anomaly-detection.md +71 -0
- package/.blueprint/features/feature_pipeline-insights/story-bottleneck-analysis.md +75 -0
- package/.blueprint/features/feature_pipeline-insights/story-failure-patterns.md +75 -0
- package/.blueprint/features/feature_pipeline-insights/story-json-output.md +75 -0
- package/.blueprint/features/feature_pipeline-insights/story-trend-analysis.md +78 -0
- package/.blueprint/features/feature_validate-command/FEATURE_SPEC.md +209 -0
- package/.blueprint/features/feature_validate-command/IMPLEMENTATION_PLAN.md +59 -0
- package/.blueprint/features/feature_validate-command/story-failure-output.md +61 -0
- package/.blueprint/features/feature_validate-command/story-node-version-check.md +52 -0
- package/.blueprint/features/feature_validate-command/story-run-validation.md +59 -0
- package/.blueprint/features/feature_validate-command/story-success-output.md +50 -0
- package/.blueprint/system_specification/SYSTEM_SPEC.md +248 -0
- package/README.md +182 -38
- package/SKILL.md +333 -23
- package/bin/cli.js +128 -20
- package/package.json +2 -2
- package/src/feedback.js +171 -0
- package/src/history.js +306 -0
- package/src/index.js +57 -2
- package/src/init.js +2 -6
- package/src/insights.js +504 -0
- package/src/retry.js +274 -0
- package/src/validate.js +172 -0
- package/src/skills.js +0 -93
@@ -0,0 +1,89 @@

# Story — Retry Configuration Management

## User Story

As a **developer using orchestr8**, I want to **view, modify, and reset retry configuration** so that **I can customize how the pipeline handles failures based on my project's needs**.

---

## Context / Scope

- Per FEATURE_SPEC.md:Section 2, retry configuration is managed via `.claude/retry-config.json`
- Per FEATURE_SPEC.md:Section 3 (Actors), users retain final decision authority and can modify thresholds
- Per FEATURE_SPEC.md:Section 11 (Handover), this story covers CLI commands for `retry-config` and `retry-config reset`
- Configuration file is gitignored (per FEATURE_SPEC.md:Section 7)

---

## Acceptance Criteria

**AC-1 — View current configuration**
- Given the retry configuration file exists at `.claude/retry-config.json`,
- When I run `orchestr8 retry-config`,
- Then the current configuration is displayed in a readable format showing thresholds, window size, max retries, and enabled strategies.

**AC-2 — View defaults when no configuration exists**
- Given no retry configuration file exists,
- When I run `orchestr8 retry-config`,
- Then the hardcoded default configuration is displayed,
- And a message indicates "Using default configuration (no config file found)".

**AC-3 — Modify configuration value**
- Given I run `orchestr8 retry-config set <key> <value>` (e.g., `orchestr8 retry-config set maxRetries 5`),
- When the key is valid and value is of correct type,
- Then the configuration file is updated with the new value,
- And a confirmation message shows the updated setting.

**AC-4 — Reject invalid configuration key**
- Given I run `orchestr8 retry-config set <invalidKey> <value>`,
- When the key is not recognized,
- Then an error message is displayed listing valid configuration keys,
- And no file modification occurs.

**AC-5 — Reset configuration to defaults**
- Given the retry configuration file exists with custom values,
- When I run `orchestr8 retry-config reset`,
- Then the configuration file is replaced with default values,
- And a confirmation message indicates "Retry configuration reset to defaults".

**AC-6 — Create configuration file on first modification**
- Given no retry configuration file exists,
- When I run `orchestr8 retry-config set <key> <value>`,
- Then the configuration file is created with default values plus the specified modification,
- And the file is created in `.claude/retry-config.json`.

**AC-7 — Handle corrupted configuration gracefully**
- Given the retry configuration file exists but contains invalid JSON,
- When I run `orchestr8 retry-config`,
- Then an error message is displayed indicating the file is corrupted,
- And a suggestion to run `orchestr8 retry-config reset` is shown.

---

## Configuration Schema

Per FEATURE_SPEC.md:Section 6 (Rules), default configuration includes:

```json
{
  "maxRetries": 3,
  "windowSize": 10,
  "highFailureThreshold": 0.2,
  "strategies": {
    "alex": ["simplify-prompt", "add-context"],
    "cass": ["reduce-stories", "simplify-prompt"],
    "nigel": ["simplify-tests", "add-context"],
    "codey-plan": ["add-context", "simplify-prompt"],
    "codey-implement": ["incremental", "rollback"]
  }
}
```
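
The acceptance criteria and default schema above could be exercised with a sketch like the one below. It is an illustration only, not the code shipped in `src/retry.js`. The names `DEFAULT_RETRY_CONFIG`, `resolveRetryConfig`, `setConfigValue`, and `SETTABLE_KEYS` are hypothetical, the functions take the file text as an argument rather than reading `.claude/retry-config.json`, and restricting `set` to the three numeric keys is an assumption.

```javascript
// Defaults copied from the Configuration Schema in the story.
const DEFAULT_RETRY_CONFIG = {
  maxRetries: 3,
  windowSize: 10,
  highFailureThreshold: 0.2,
  strategies: {
    alex: ['simplify-prompt', 'add-context'],
    cass: ['reduce-stories', 'simplify-prompt'],
    nigel: ['simplify-tests', 'add-context'],
    'codey-plan': ['add-context', 'simplify-prompt'],
    'codey-implement': ['incremental', 'rollback'],
  },
};

// Deep-copies the defaults so callers cannot mutate the shared object.
const cloneDefaults = () => JSON.parse(JSON.stringify(DEFAULT_RETRY_CONFIG));

// rawText: contents of the config file, or null when the file is absent.
function resolveRetryConfig(rawText) {
  if (rawText === null) {
    // AC-2: no file, so fall back to the hardcoded defaults.
    return { source: 'defaults', config: cloneDefaults() };
  }
  try {
    const parsed = JSON.parse(rawText);
    if (parsed === null || typeof parsed !== 'object' || Array.isArray(parsed)) {
      return { source: 'corrupted', config: null };
    }
    // File values override defaults; missing keys keep their default.
    return { source: 'file', config: { ...cloneDefaults(), ...parsed } };
  } catch {
    // AC-7: invalid JSON; a CLI would suggest `orchestr8 retry-config reset`.
    return { source: 'corrupted', config: null };
  }
}

// Assumption: only these numeric keys can be changed with `retry-config set`.
const SETTABLE_KEYS = ['maxRetries', 'windowSize', 'highFailureThreshold'];

function setConfigValue(config, key, rawValue) {
  if (!SETTABLE_KEYS.includes(key)) {
    // AC-4: reject unknown keys and list the valid ones.
    return { ok: false, error: `Unknown key "${key}". Valid keys: ${SETTABLE_KEYS.join(', ')}` };
  }
  const value = Number(rawValue);
  if (String(rawValue).trim() === '' || !Number.isFinite(value)) {
    return { ok: false, error: `Value for "${key}" must be a number` };
  }
  // AC-3 / AC-6: return a new config; the caller decides when to write it.
  return { ok: true, config: { ...config, [key]: value } };
}
```

Returning a result object instead of writing the file keeps the "no file modification occurs" requirement of AC-4 easy to check: the caller only persists when `ok` is true.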

---

## Out of Scope

- Modifying strategy definitions themselves (strategies are code, not config)
- Per-feature configuration overrides (configuration is global)
- Configuration UI/interactive editor (CLI only)
- Validating strategy effectiveness (config is just data; strategy logic is separate)

@@ -0,0 +1,98 @@

# Story — Should Retry Decision Logic

## User Story

As a **developer using orchestr8**, I want the **pipeline to intelligently decide whether to recommend retrying** so that **I receive useful guidance based on attempt count, failure history, and system state**.

---

## Context / Scope

- Per FEATURE_SPEC.md:Section 4 (Happy Path), retry module is consulted when an agent fails
- Per FEATURE_SPEC.md:Section 5 (State & Lifecycle), this operates within the transition from any agent state to FAILED
- Per FEATURE_SPEC.md:Section 8 (Resilience), graceful degradation is required when history/config is corrupted
- Per SYSTEM_SPEC.md:Section 8 (Failure Handling), each agent spawn currently offers retry, skip, abort

---

## Acceptance Criteria

**AC-1 — Consult retry module on agent failure**
- Given an agent fails during pipeline execution,
- When the error handling flow is triggered,
- Then the retry module is consulted before displaying options to the user,
- And the module returns a recommendation (strategy name or "abort-recommended").

**AC-2 — Record retry attempts in history**
- Given the user chooses to retry (with or without strategy),
- When the retry completes (success or failure),
- Then the outcome is recorded in `.claude/pipeline-history.json`,
- And the record includes: `retryAttempts` count and `strategiesUsed[]` array.

**AC-3 — Track attempt count per stage per feature**
- Given an agent fails and user retries,
- When tracking retry state,
- Then the attempt count is incremented for the current stage and feature combination,
- And the count persists across retries within the same pipeline run.

**AC-4 — Degrade gracefully with corrupted history**
- Given the history file at `.claude/pipeline-history.json` is corrupted or missing,
- When the retry module is consulted,
- Then the module defaults to simple retry recommendation,
- And a warning is logged: "History file unavailable; defaulting to simple retry".

**AC-5 — Degrade gracefully with corrupted configuration**
- Given the configuration file at `.claude/retry-config.json` is corrupted or missing,
- When the retry module calculates strategy,
- Then hardcoded defaults are used for thresholds and strategies,
- And a warning is logged: "Config file unavailable; using default settings".

**AC-6 — Preserve state transitions**
- Given a retry is in progress,
- When the retry attempt completes successfully,
- Then the pipeline continues to the next stage as normal,
- And the state transitions per SYSTEM_SPEC.md:Section 6 (e.g., CASS to NIGEL).

**AC-7 — Support abort without retry**
- Given the user is shown retry options,
- When the user selects "abort",
- Then the feature is moved to the failed list in the queue,
- And no retry is attempted,
- And the failure is recorded with reason "user-aborted".
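
The graceful-degradation behaviour in AC-4 can be sketched as follows. This is a hedged illustration rather than the module's actual code: `readHistoryOrEmpty` and `consultRetryModule` are hypothetical names, the history text is passed in instead of being read from `.claude/pipeline-history.json`, and the assumption that the file holds a JSON array of records is not confirmed by the story.

```javascript
// rawText: contents of the history file, or null when the file is missing.
// warn: logging callback, injectable so the warning can be observed in tests.
function readHistoryOrEmpty(rawText, warn = console.warn) {
  if (rawText !== null) {
    try {
      const parsed = JSON.parse(rawText);
      // Assumption: a valid history file is a JSON array of run records.
      if (Array.isArray(parsed)) {
        return { available: true, entries: parsed };
      }
    } catch {
      // Invalid JSON: fall through to the degraded path below.
    }
  }
  // AC-4: warning text taken from the acceptance criterion.
  warn('History file unavailable; defaulting to simple retry');
  return { available: false, entries: [] };
}

// Minimal consult step: without usable history the answer is a simple retry.
// With history, a real module would go on to pick a strategy (not shown here).
function consultRetryModule(rawHistoryText, warn = console.warn) {
  const history = readHistoryOrEmpty(rawHistoryText, warn);
  if (!history.available) {
    return { recommendation: 'retry', degraded: true };
  }
  return { recommendation: null, degraded: false, entries: history.entries };
}
```

Injecting `warn` keeps the function free of side effects other than the callback, so the "warning is logged" clause can be asserted directly.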

---

## Integration Points

Per FEATURE_SPEC.md:Section 7 (Dependencies):

- **src/history.js:** `readHistoryFile()` provides failure data
- **src/insights.js:** `analyzeFailures()` calculates failure rates by stage
- **src/orchestrator.js:** Queue management receives retry module functions
- **SKILL.md:** Error handling section references retry recommendations

---

## History Record Schema

Per FEATURE_SPEC.md:Section 8 (Audit/Logging), new fields in history entries:

```json
{
  "featureSlug": "adaptive-retry",
  "stage": "cass",
  "outcome": "success",
  "timestamp": "2026-02-24T10:30:00Z",
  "retryAttempts": 2,
  "strategiesUsed": ["retry", "reduce-stories"]
}
```
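
A minimal sketch of how the per-stage, per-feature attempt count (AC-3) could feed the history record shown above (AC-2). The factory `createRetryTracker` and its methods are hypothetical names, and keeping the state in an in-memory `Map` is an assumption that matches the story's "per-run only" scope.

```javascript
// Tracks retry attempts in memory for the lifetime of one pipeline run.
function createRetryTracker() {
  const state = new Map();
  const keyOf = (featureSlug, stage) => `${featureSlug}::${stage}`;
  const empty = () => ({ retryAttempts: 0, strategiesUsed: [] });

  return {
    // AC-3: increment the count for this feature/stage combination.
    recordAttempt(featureSlug, stage, strategy) {
      const key = keyOf(featureSlug, stage);
      const current = state.get(key) ?? empty();
      const next = {
        retryAttempts: current.retryAttempts + 1,
        strategiesUsed: [...current.strategiesUsed, strategy],
      };
      state.set(key, next);
      return next.retryAttempts;
    },

    // AC-2: build a record with the fields from the History Record Schema.
    toHistoryEntry(featureSlug, stage, outcome, now = new Date()) {
      const current = state.get(keyOf(featureSlug, stage)) ?? empty();
      return {
        featureSlug,
        stage,
        outcome,
        timestamp: now.toISOString(),
        retryAttempts: current.retryAttempts,
        strategiesUsed: [...current.strategiesUsed],
      };
    },
  };
}
```

Note that `Date.prototype.toISOString()` includes milliseconds, so the timestamp reads `2026-02-24T10:30:00.000Z` rather than the shorter form in the schema example.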

---

## Out of Scope

- Automatic retry without user prompt (user always has final say)
- Retry across different features (each feature's retry state is independent)
- Persistent retry state across pipeline invocations (state is per-run only; history is persistent)
- Batch retry of multiple failed features (single-feature focus per SYSTEM_SPEC.md:Section 10)

@@ -0,0 +1,85 @@

# Story — Strategy Recommendation

## User Story

As a **developer using orchestr8**, I want the **pipeline to recommend a retry strategy based on failure history** so that **I can make informed decisions about how to handle failures more effectively**.

---

## Context / Scope

- Per FEATURE_SPEC.md:Section 4 (Behaviour Overview), recommendations are advisory only; user retains final choice
- Per FEATURE_SPEC.md:Section 6 (Rule 1 & Rule 2), strategy selection is based on failure rate calculation
- Per FEATURE_SPEC.md:Section 3 (Actors), the History Module provides failure data (read-only)
- Per FEATURE_SPEC.md:Section 7 (Dependencies), uses `src/history.js` and `src/insights.js`

---

## Acceptance Criteria

**AC-1 — Calculate failure rate from history**
- Given an agent fails during pipeline execution,
- When the retry module is consulted,
- Then the failure rate is calculated as `failedRunsAtStage / totalRecentRuns` over the configured window (default 10 runs),
- And the calculation uses data from `.claude/pipeline-history.json`.

**AC-2 — Recommend simple retry for low failure rate**
- Given the calculated failure rate is at or below the threshold (default 20%),
- When displaying retry options to the user,
- Then the recommendation is "retry" with no prompt modification suggested,
- And the message format is: "Recommended: Simple retry. Retry? (y/n/skip/abort)".

**AC-3 — Recommend alternative strategy for high failure rate**
- Given the calculated failure rate exceeds the threshold (default 20%),
- When displaying retry options to the user,
- Then an alternative strategy from the stage's strategy list is recommended,
- And the message format is: "Recommended strategy: {strategyName}. Retry with this approach? (y/apply/skip/abort)".

**AC-4 — Escalate strategy on subsequent attempts**
- Given a retry attempt has already been made at the current stage,
- When another failure occurs and the user is prompted again,
- Then the next strategy in the stage's strategy list is recommended,
- And if all strategies have been exhausted, "abort-recommended" is suggested.

**AC-5 — Default to simple retry with no history**
- Given this is the first recorded run or no history exists for the current stage,
- When an agent fails,
- Then the recommendation is "retry" (simple retry),
- And a note indicates "No failure history for this stage; defaulting to simple retry".

**AC-6 — Warn when max retries exceeded**
- Given the current attempt count exceeds `maxRetries` (default 3) for the same feature and stage,
- When displaying options to the user,
- Then the recommendation is "abort-recommended" or "skip-recommended",
- And a warning is displayed: "Max retries ({count}) exceeded for {stage}. Consider skipping or aborting."

**AC-7 — Display recommendation without forcing choice**
- Given a strategy recommendation is displayed,
- When the user selects a different option (e.g., chooses "skip" when "retry" was recommended),
- Then the user's choice is respected,
- And no error or warning is shown for overriding the recommendation.

---

## Session State

During pipeline execution, retry state is tracked:

```js
retryState = {
  stage: 'cass',
  featureSlug: 'adaptive-retry',
  attemptCount: 2,
  strategiesUsed: ['retry', 'reduce-stories'],
  failureRate: 0.3
}
```
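
The decision rules in AC-1 through AC-6 can be read as one small function over the history, the configuration, and the session state above. The sketch below is one possible reading, not the shipped logic: `failureRateAtStage` and `recommendStrategy` are hypothetical names, each history record is assumed to carry `stage` and `outcome`, a failed run is assumed to be stored with `outcome: 'failure'`, and the `reason` codes are invented for illustration.

```javascript
// AC-1: failedRunsAtStage / totalRecentRuns over the last `windowSize` records.
// Returns null when nothing is recorded for the stage (AC-5).
function failureRateAtStage(history, stage, windowSize) {
  const recent = history.slice(-windowSize);
  const atStage = recent.filter((run) => run.stage === stage);
  if (atStage.length === 0) return null;
  const failed = atStage.filter((run) => run.outcome === 'failure').length;
  return failed / recent.length;
}

function recommendStrategy(history, config, retryState) {
  const { stage, attemptCount, strategiesUsed } = retryState;

  // AC-6: past the retry budget, stop recommending retries.
  if (attemptCount > config.maxRetries) {
    return { recommendation: 'abort-recommended', reason: 'max-retries-exceeded' };
  }

  const failureRate = failureRateAtStage(history, stage, config.windowSize);

  // AC-5: no history for this stage means a simple retry.
  if (failureRate === null) {
    return { recommendation: 'retry', reason: 'no-history', failureRate };
  }

  // AC-2: at or below the threshold, a simple retry is enough.
  if (failureRate <= config.highFailureThreshold) {
    return { recommendation: 'retry', reason: 'low-failure-rate', failureRate };
  }

  // AC-3 / AC-4: first strategy in the stage's list that has not been tried yet.
  const candidates = config.strategies[stage] ?? [];
  const untried = candidates.find((name) => !strategiesUsed.includes(name));
  return untried
    ? { recommendation: untried, reason: 'high-failure-rate', failureRate }
    : { recommendation: 'abort-recommended', reason: 'strategies-exhausted', failureRate };
}
```

With the example session state (stage `cass`, `reduce-stories` already used, three failures in the last ten records), this reading would recommend `simplify-prompt`, the next entry in the default `cass` strategy list. The result is advisory only; per AC-7 the caller still presents every option to the user.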

---

## Out of Scope

- Machine learning or predictive models for strategy selection (per FEATURE_SPEC.md:Section 2)
- Cross-feature failure correlation (per FEATURE_SPEC.md:Section 2)
- Automatic retry without user confirmation (user choice is paramount)
- Modifying the agent prompts (that is story-prompt-modification)

@@ -0,0 +1,328 @@

# Feature Specification — Agent Guardrails

## 1. Feature Intent

**Why this feature exists.**

The orchestr8 framework relies on four AI agents (Alex, Cass, Nigel, Codey) operating autonomously within a pipeline. Without explicit guardrails, these agents may:

- Generate content based on training data rather than provided inputs
- Reference external sources (social media, forums, web content) that are unreliable or inappropriate
- Expose confidential business context in outputs
- Produce non-deterministic or hallucinated content that cannot be traced to authoritative sources

**Problem being addressed:**
Uncontrolled information sourcing and output generation creates risks around accuracy, confidentiality, and auditability in automated feature development.

**User need:**
Users need confidence that agent outputs are grounded exclusively in provided inputs (specs, code, business_context, templates) and that confidential information is protected.

**System purpose alignment:**
Per `.blueprint/system_specification/SYSTEM_SPEC.md`: "What must not be compromised: Explicit specification before implementation" and "All artifacts (specs, stories, tests, code) are aligned and consistent." Guardrails directly support these principles by ensuring agents produce traceable, consistent outputs.

---

## 2. Scope

### In Scope

- **Source restrictions**: Rules governing what information sources agents may and may not use
- **Grounding requirements**: Citation and traceability standards for all agent assertions
- **Confidentiality constraints**: Rules for protecting `.business_context/` content and preventing data leakage
- **Determinism expectations**: Standards for reproducible, consistent agent behaviour
- **Escalation protocols**: Clear rules for when agents must stop and ask the user
- **Anti-hallucination measures**: Explicit preference for "I don't have this information" over guessing

### Out of Scope

- Technical enforcement mechanisms (runtime validation, content filtering)
- Monitoring or auditing infrastructure for guardrail compliance
- Changes to the pipeline flow or agent sequencing
- Modifications to CLI tooling or queue management
- Integration with external guardrail services or APIs

---

## 3. Actors Involved

### Alex (System Specification & Chief-of-Staff)
- **What they can do**: Define system and feature specifications grounded in provided inputs; cite sources for all assertions; flag missing information
- **What they cannot do**: Use external sources; invent business rules not found in inputs; expose confidential context in specifications

### Cass (Story Writer & Business Analyst)
- **What they can do**: Translate specifications into user stories citing the feature spec; make explicit assumptions when information is missing
- **What they cannot do**: Reference external examples or implementations; introduce behaviour not derived from specifications

### Nigel (Tester)
- **What they can do**: Create tests based on user stories and acceptance criteria; note assumptions explicitly
- **What they cannot do**: Use external testing patterns without attribution; invent requirements beyond what stories specify

### Codey (Developer)
- **What they can do**: Implement against tests and specifications; make implementation assumptions explicit
- **What they cannot do**: Use code from external sources without flagging; modify behaviour beyond what tests require

### Human User
- **What they can do**: Provide source materials; respond to escalations; approve assumptions
- **What they cannot do**: N/A (human user is the authority)

---

## 4. Behaviour Overview

**Happy-path behaviour:**

1. Agent receives task with explicit inputs (specs, stories, tests, code, business_context)
2. Agent processes ONLY the provided inputs to produce outputs
3. Agent cites sources for all assertions using standard format: "Per [filename]: [claim]" or "[filename:section] states..."
4. Agent flags any gaps or assumptions explicitly rather than filling them silently
5. Agent produces self-contained outputs that do not leak confidential context
6. Given identical inputs, agent produces consistent outputs

**Key alternatives or branches:**

- **Missing information path**: When required information is not in provided inputs, agent explicitly states "This information is not available in the provided inputs" and either (a) makes an explicit assumption labelled as such, or (b) escalates to the user
- **Ambiguity path**: When inputs are ambiguous, agent lists possible interpretations and asks the user to clarify
- **Confidentiality conflict path**: When an output would require exposing confidential context, agent flags this and asks for guidance

**User-visible outcomes:**

- All agent outputs contain traceable citations to source files
- Assumptions are explicitly labelled and distinguishable from facts
- Outputs are self-contained and do not reference confidential details by name
- Re-running the pipeline with identical inputs produces consistent results

---

## 5. State & Lifecycle Interactions

This feature is **state-constraining** rather than state-creating or state-transitioning.

**States affected:**
- All pipeline stages (alex, cass, nigel, codey-plan, codey-implement) are constrained by guardrails
- Guardrails apply regardless of whether a feature is pending, in_progress, or being resumed

**No new states introduced:**
The feature adds behavioural constraints to existing states without modifying the state model defined in `.blueprint/system_specification/SYSTEM_SPEC.md` section 6.

---

## 6. Rules & Decision Logic

### Rule 1: Source Restriction Rule

**Description:** Agents must use ONLY information from explicitly provided inputs.

**Inputs:** Task context, file references, `.business_context/` directory contents

**Outputs:** Agent output grounded exclusively in provided inputs

**Deterministic:** Yes

**Allowed sources:**
- System specification (`.blueprint/system_specification/SYSTEM_SPEC.md`)
- Feature specifications (`.blueprint/features/*/FEATURE_SPEC.md`)
- User stories (`story-*.md`)
- Test artifacts (`test-spec.md`, `*.test.js`)
- Implementation code in the project
- Business context (`.business_context/*`)
- Templates (`.blueprint/templates/*`)
- Agent specifications (`.blueprint/agents/AGENT_*.md`)

**Prohibited sources:**
- Social media (Twitter/X, Reddit, LinkedIn, Facebook, etc.)
- Forums, blog posts, or user-generated web content
- Web searches or external APIs
- Training data for domain-specific facts
- External project implementations or company references
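
The spec places technical enforcement out of scope, so the following is purely an illustration of how the allowed-source patterns in Rule 1 could be read as path globs. `ALLOWED_PATTERNS`, `globToRegExp`, and `isAllowedSource` are hypothetical names, `*` is assumed to match within a single path segment, patterns without a slash are assumed to match on the file name only, and "Implementation code in the project" is left out because the spec gives no pattern for it.

```javascript
// Patterns copied from the "Allowed sources" list in Rule 1.
const ALLOWED_PATTERNS = [
  '.blueprint/system_specification/SYSTEM_SPEC.md',
  '.blueprint/features/*/FEATURE_SPEC.md',
  'story-*.md',
  'test-spec.md',
  '*.test.js',
  '.business_context/*',
  '.blueprint/templates/*',
  '.blueprint/agents/AGENT_*.md',
];

// Converts a simple glob to a RegExp: escape regex metacharacters,
// then let each `*` match any run of characters except `/`.
function globToRegExp(glob) {
  const escaped = glob
    .replace(/[.+?^${}()|[\]\\]/g, '\\$&')
    .replace(/\*/g, '[^/]*');
  return new RegExp(`^${escaped}$`);
}

// True when the path matches one of the allowed-source patterns.
function isAllowedSource(path) {
  const fileName = path.split('/').pop();
  return ALLOWED_PATTERNS.some((pattern) => {
    // Assumption: slash-free patterns apply to the file name in any directory.
    const target = pattern.includes('/') ? path : fileName;
    return globToRegExp(pattern).test(target);
  });
}
```

A check like this can only flag file references; it says nothing about whether an agent drew on training data or web content, which is why the spec treats the rule as a behavioural constraint.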

---

### Rule 2: Citation Rule

**Description:** All assertions about requirements, behaviour, or domain knowledge must cite their source.

**Inputs:** Agent assertions about the system or domain

**Outputs:** Assertion with citation in format: "Per [filename]: [claim]" or "[filename:section] states..."

**Deterministic:** Yes

**Examples:**
- "Per FEATURE_SPEC.md: Users must authenticate before accessing the dashboard"
- "Per story-login.md:AC-3: Failed login attempts are logged"
- ".business_context/glossary.md defines 'tenant' as..."
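
To make the "Per [filename]: [claim]" form concrete, here is a small parsing sketch. It is not part of the feature, which explicitly excludes runtime validation. `PER_CITATION` and `parseCitation` are hypothetical names, only Markdown file names and the "Per ..." form are handled, and the optional `:section` suffix is assumed to be a single token such as `AC-3`.

```javascript
// "Per <file>.md: <claim>" with an optional ":<section>" after the file name.
const PER_CITATION = /^Per\s+([\w./-]+\.md)(?::([\w-]+))?:\s+(.+)$/;

// Returns { file, section, claim } for a recognised citation, otherwise null.
function parseCitation(sentence) {
  const match = PER_CITATION.exec(sentence.trim());
  if (!match) return null;
  const [, file, section, claim] = match;
  return { file, section: section ?? null, claim };
}
```

Applied to the first two examples above, this would yield `FEATURE_SPEC.md` with no section and `story-login.md` with section `AC-3`; the third example uses the "[filename] defines..." phrasing and is deliberately not matched by this sketch.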
|
|
154
|
+
|
|
155
|
+
---
|
|
156
|
+
|
|
157
|
+
### Rule 3: Confidentiality Rule
|
|
158
|
+
|
|
159
|
+
**Description:** Agents must treat `.business_context/` content as confidential and prevent data leakage.
|
|
160
|
+
|
|
161
|
+
**Inputs:** Any content from `.business_context/` directory
|
|
162
|
+
|
|
163
|
+
**Outputs:** Outputs that do not expose confidential details
|
|
164
|
+
|
|
165
|
+
**Deterministic:** Yes
|
|
166
|
+
|
|
167
|
+
**Constraints:**
|
|
168
|
+
- Do not reference external projects, companies, or implementations by name
|
|
169
|
+
- Do not use external services that would expose project data
|
|
170
|
+
- Output artifacts should be self-contained
|
|
171
|
+
- Generic descriptions preferred over specific confidential details
|
|
172
|
+
|
|
173
|
+
---
|
|
174
|
+
|
|
175
|
+
### Rule 4: Assumption Declaration Rule
|
|
176
|
+
|
|
177
|
+
**Description:** When information is not available in provided inputs, agents must explicitly declare assumptions.
|
|
178
|
+
|
|
179
|
+
**Inputs:** Gap in provided information
|
|
180
|
+
|
|
181
|
+
**Outputs:** Explicit assumption statement labelled as such
|
|
182
|
+
|
|
183
|
+
**Deterministic:** Yes
|
|
184
|
+
|
|
185
|
+
**Format:**
|
|
186
|
+
- "ASSUMPTION: [statement] - This is not specified in the provided inputs"
|
|
187
|
+
- "NOTE: Assuming [X] in absence of explicit guidance"
|
|
188
|
+
|
|
189
|
+
---
|
|
190
|
+
|
|
191
|
+
### Rule 5: Escalation Rule
|
|
192
|
+
|
|
193
|
+
**Description:** Agents must escalate to the user under defined conditions rather than proceeding with guesses.
|
|
194
|
+
|
|
195
|
+
**Inputs:** Trigger conditions (listed below)
|
|
196
|
+
|
|
197
|
+
**Outputs:** Escalation request to user
|
|
198
|
+
|
|
199
|
+
**Deterministic:** Yes
|
|
200
|
+
|
|
201
|
+
**Escalation triggers:**
|
|
202
|
+
- Information required for the task is not in provided inputs AND cannot be safely assumed
|
|
203
|
+
- Ambiguity in inputs that significantly affects output
|
|
204
|
+
- Conflict between different input sources
|
|
205
|
+
- Request would require violating confidentiality constraints
|
|
206
|
+
- Uncertainty that could lead to material misalignment
|
|
207
|
+
|
|
208
|
+
---
|
|
209
|
+
|
|
210
|
+
### Rule 6: Determinism Rule
|
|
211
|
+
|
|
212
|
+
**Description:** Same inputs should produce consistent outputs across runs.
|
|
213
|
+
|
|
214
|
+
**Inputs:** Identical task context and input files
|
|
215
|
+
|
|
216
|
+
**Outputs:** Consistent agent outputs
|
|
217
|
+
|
|
218
|
+
**Deterministic:** Yes (by definition)
|
|
219
|
+
|
|
220
|
+
**Implications:**
|
|
221
|
+
- Avoid incorporating timestamps or random elements unless explicitly required
|
|
222
|
+
- Avoid referencing volatile or external state
|
|
223
|
+
- Structure outputs to be reproducible
|
|
224
|
+
|
|
225
|
+
---
|
|
226
|
+
|
|
227
|
+
## 7. Dependencies
|
|
228
|
+
|
|
229
|
+
### System Components
|
|
230
|
+
- Agent specifications (`.blueprint/agents/AGENT_*.md`) - must be updated to incorporate guardrails
|
|
231
|
+
- Pipeline orchestration (`SKILL.md` / `/implement-feature`) - no changes required, but agents must apply guardrails
|
|
232
|
+
|
|
233
|
+
### External Systems
|
|
234
|
+
- None (guardrails specifically prohibit external dependencies)
|
|
235
|
+
|
|
236
|
+
### Policy Dependencies
|
|
237
|
+
- None identified
|
|
238
|
+
|
|
239
|
+
### Operational Dependencies
|
|
240
|
+
- Users must provide adequate input materials (business_context, specs) for agents to work from
|
|
241
|
+
- Teams adopting orchestr8 must understand that agents will escalate when information is insufficient
|
|
242
|
+
|
|
243
|
+
---
|
|
244
|
+
|
|
245
|
+
## 8. Non-Functional Considerations
|
|
246
|
+
|
|
247
|
+
### Auditability
|
|
248
|
+
- Citation format enables traceability from outputs back to source inputs
|
|
249
|
+
- Assumption labels enable review of agent decisions
|
|
250
|
+
- Escalation log provides audit trail of human decisions
|
|
251
|
+
|
|
252
|
+
### Reliability
|
|
253
|
+
- Determinism rule supports reproducible builds and debugging
|
|
254
|
+
- Explicit assumptions reduce hidden failure modes
|
|
255
|
+
|
|
256
|
+
### Security / Confidentiality
|
|
257
|
+
- Confidentiality constraints protect business-sensitive information
|
|
258
|
+
- Prohibition on external services prevents data exposure
|
|
259
|
+
|
|
260
|
+
### Performance
|
|
261
|
+
- No significant performance impact expected
|
|
262
|
+
- Guardrails are behavioural constraints, not runtime checks
|
|
263
|
+
|
|
264
|
+
---

## 9. Assumptions & Open Questions

### Assumptions

1. **Input sufficiency**: Users will provide adequate input materials for agents to complete tasks without excessive escalation
2. **Agent compliance**: Guardrails are specification-level constraints that agents will follow; no technical enforcement mechanism is assumed
3. **Citation overhead**: The additional effort of citing sources is acceptable given the traceability benefits
4. **Escalation tolerance**: Users accept that agents will ask clarifying questions rather than guessing

### Open Questions

1. **Granularity of citation**: Should citations reference file-level or section-level? (Recommendation: section-level where feasible)
2. **Assumption threshold**: What level of assumption is acceptable before escalation is required? (Recommendation: err on the side of escalation for domain-specific matters)
3. **Confidentiality scope**: Should confidentiality constraints extend beyond `.business_context/` to include implementation code? (Recommendation: yes, treat all project content as confidential by default)
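
The assumption-threshold question can be pictured as a small decision function. The sketch below is illustrative only: it borrows the escalation conditions listed in the Guardrails template of the implementation plan and adds the recommendation to escalate on domain-specific matters. The function name `decideAction` and every field name are hypothetical and do not exist in orchestr8, where guardrails are behavioural constraints rather than runtime checks.

```javascript
// Hypothetical sketch: the field names below are illustrative, not part of orchestr8.
function decideAction({
  assumption = "",
  criticalInfoMissing = false, // critical information is missing and cannot be safely assumed
  interpretations = [], // competing readings of an ambiguous input
  conflictingSources = [], // source documents that disagree
  wouldBreachConfidentiality = false,
  domainSpecific = false, // recommendation: err on the side of escalation
} = {}) {
  const reasons = [];
  if (criticalInfoMissing) reasons.push("critical information missing");
  if (interpretations.length > 1) reasons.push("ambiguous input");
  if (conflictingSources.length > 1) reasons.push("conflicting sources");
  if (wouldBreachConfidentiality) reasons.push("confidentiality constraint");
  if (domainSpecific) reasons.push("domain-specific matter");

  if (reasons.length > 0) return { action: "escalate", reasons };
  // No escalation trigger: proceed, but label the assumption explicitly.
  return { action: "assume", label: `ASSUMPTION: ${assumption}` };
}
```

Under this reading, any single trigger is enough to escalate; a stricter or looser threshold would change only the conditions, not the shape of the result.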

---

## 10. Impact on System Specification

**Does this feature reinforce, stretch, or contradict existing system assumptions?**

This feature **reinforces** existing system assumptions, particularly:

- Per SYSTEM_SPEC.md section 7 (Governing Rules & Invariants): "No silent changes: Agents flag deviations; do not silently alter specifications" - guardrails extend this principle
- Per SYSTEM_SPEC.md section 8 (Cross-Cutting Concerns): "Traceability" - citation requirements directly support traceability goals
- Per SYSTEM_SPEC.md section 9 (Non-Functional Expectations): "Deterministic output given same inputs" - determinism rule makes this explicit

**Potential system spec enhancement:**

Section 7 (Governing Rules & Invariants) could be extended with a new subsection "Agent Guardrails" that codifies these constraints at the system level. This is a **non-breaking enhancement** that strengthens existing principles.

---

## 11. Handover to BA (Cass)

### Story Themes

Cass should derive stories around four main themes:

1. **Source Restriction Stories**: Stories covering what agents can and cannot reference when producing outputs
2. **Citation & Traceability Stories**: Stories defining how agents cite sources and maintain traceability
3. **Confidentiality Stories**: Stories ensuring business context remains protected
4. **Escalation & Assumption Stories**: Stories defining when and how agents escalate vs. assume

### Expected Story Boundaries

- Each guardrail rule (Rules 1-6 in section 6) likely maps to one or more stories
- Stories should be agent-agnostic where possible (guardrails apply to all agents)
- Stories should include acceptance criteria that can be verified by reviewing agent outputs

### Areas Needing Careful Story Framing

- **Balancing thoroughness vs. practicality**: Citation requirements should not create excessive overhead
- **Escalation threshold**: Stories should clarify when escalation is warranted vs. when reasonable assumption is acceptable
- **Confidentiality boundaries**: What exactly constitutes "confidential" and what is acceptable to reference

---

## 12. Change Log (Feature-Level)

| Date | Change | Reason | Raised By |
|------|--------|--------|-----------|
| 2026-02-24 | Initial feature specification | Define comprehensive guardrails for agent behaviour | Alex |
@@ -0,0 +1,90 @@
# Implementation Plan — Agent Guardrails

## Summary

Add a standardised "Guardrails" section to all four agent specification files (AGENT_SPECIFICATION_ALEX.md, AGENT_BA_CASS.md, AGENT_TESTER_NIGEL.md, AGENT_DEVELOPER_CODEY.md). The section covers source restrictions, citation requirements, confidentiality constraints, and escalation protocols. This is a documentation-only change with no runtime code modifications.

---

## Files to Create/Modify

| Path | Action | Purpose |
|------|--------|---------|
| `.blueprint/agents/AGENT_SPECIFICATION_ALEX.md` | Modify | Add Guardrails section |
| `.blueprint/agents/AGENT_BA_CASS.md` | Modify | Add Guardrails section |
| `.blueprint/agents/AGENT_TESTER_NIGEL.md` | Modify | Add Guardrails section |
| `.blueprint/agents/AGENT_DEVELOPER_CODEY.md` | Modify | Add Guardrails section |

---

## Implementation Steps

1. **Read each agent spec file** to identify the best insertion point for the Guardrails section (after existing content, before any skills section if present).

2. **Add Guardrails section to AGENT_SPECIFICATION_ALEX.md** using the template below.

3. **Add Guardrails section to AGENT_BA_CASS.md** using the template below.

4. **Add Guardrails section to AGENT_TESTER_NIGEL.md** using the template below.

5. **Add Guardrails section to AGENT_DEVELOPER_CODEY.md** using the template below.

6. **Run tests** (`node --test test/feature_agent-guardrails.test.js`) to verify all 21 test assertions pass.

7. **Review outputs** to ensure no test failures remain.
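
The insertion rule in step 1 (before a skills section if one exists, otherwise at the end) can be sketched as a pure string function. This is a minimal illustration, not the plan's actual tooling: the plan describes a manual documentation edit, and the name `insertGuardrails`, the heading pattern, and the idempotency check are all assumptions. In particular, the sketch assumes the skills section is a Markdown heading beginning with "Skills available".

```javascript
// Hypothetical helper: inserts a Guardrails section into an agent spec's text.
// Assumes the skills section is a Markdown heading starting with "Skills available".
function insertGuardrails(
  specText,
  guardrailsSection,
  skillsHeadingPattern = /^#{1,6}\s+Skills available\b.*$/im
) {
  // Leave the file untouched if a Guardrails heading is already present.
  if (/^##\s+Guardrails\s*$/m.test(specText)) return specText;

  const section = guardrailsSection.trim();
  const match = skillsHeadingPattern.exec(specText);

  // No skills section: append at the end of the file.
  if (match === null) return `${specText.trimEnd()}\n\n${section}\n`;

  // Skills section found: place the Guardrails section immediately before it.
  const before = specText.slice(0, match.index).trimEnd();
  const after = specText.slice(match.index);
  const prefix = before.length > 0 ? `${before}\n\n` : "";
  return `${prefix}${section}\n\n${after}`;
}
```

Because the function returns its input unchanged when a `## Guardrails` heading already exists, running it twice over the same file would not duplicate the section.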

---

## Guardrails Section Template

```markdown
## Guardrails

### Allowed Sources
You may use ONLY information from these sources:
- System specification (`.blueprint/system_specification/SYSTEM_SPEC.md`)
- Feature specifications (`.blueprint/features/*/FEATURE_SPEC.md`)
- User stories (`story-*.md`) and test artifacts (`test-spec.md`, `*.test.js`)
- Implementation code in the project
- Business context (`.business_context/*`)
- Templates (`.blueprint/templates/*`) and agent specifications

### Prohibited Sources
Do not use:
- Social media, forums, blog posts, or external APIs
- Training data for domain facts—do not invent business rules
- External project or company references by name

### Citation Requirements
- Cite sources using: "Per [filename]: [claim]" or "[filename:section] states..."
- Use section-level citations where feasible (e.g., "story-login.md:AC-3")
- Reference `.business_context/` files for domain definitions
- Maintain a traceable chain: downstream artifacts cite upstream sources

### Assumptions vs Facts
- Label assumptions explicitly: "ASSUMPTION: [statement]" or "NOTE: Assuming..."
- Distinguish clearly between cited facts and assumptions
- Do not guess—state "This information is not available in the provided inputs"

### Confidentiality
- Do not reproduce `.business_context/` content verbatim; summarise or use generic descriptions
- Do not reference external entities, companies, or projects by name
- Do not use external services that would expose project data
- Outputs must be self-contained and understandable without access to confidential sources

### Escalation Protocol
Escalate to the user when:
- Critical information is missing and cannot be safely assumed
- Inputs are ambiguous with multiple possible interpretations—list options and ask for clarification
- Source documents conflict—cite both sources and request resolution
- Output would require violating confidentiality constraints

When escalation is not warranted, you may proceed with an explicit assumption labelled as such.
```
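
To make the citation and assumption formats in the template concrete, the sketch below classifies a single statement from an agent's output as cited, assumed, or unlabelled. It is an illustration for reviewers, not part of orchestr8: the function name `classifyStatement` and the regular expressions are assumptions, and they read the two citation forms strictly ("Per file: claim", optionally "Per file:section: claim", and "file:section states..."). Looser phrasings would be reported as unlabelled.

```javascript
// Hypothetical reviewer aid: classify one statement from an agent's output.
// The patterns are a strict reading of the template's formats, not an official grammar.
function classifyStatement(line) {
  const text = line.trim();

  // "ASSUMPTION: [statement]" or "NOTE: Assuming..."
  if (/^(ASSUMPTION:|NOTE: Assuming)/i.test(text)) {
    return { kind: "assumption", source: null, section: null };
  }

  // "Per [filename]: [claim]", with an optional ":section" after the filename.
  const per = /^Per ([^\s:]+)(?::([^\s:]+))?: (.+)$/.exec(text);
  if (per) return { kind: "cited", source: per[1], section: per[2] ?? null };

  // "[filename:section] states..."
  const states = /^([^\s:]+):(\S+) states\b/.exec(text);
  if (states) return { kind: "cited", source: states[1], section: states[2] };

  return { kind: "unlabelled", source: null, section: null };
}

// Collect the statements that carry neither a citation nor an assumption label.
function unlabelledStatements(lines) {
  return lines.filter((line) => classifyStatement(line).kind === "unlabelled");
}
```

A reviewer could run `unlabelledStatements` over the lines of a generated story to list claims that still need a source or an explicit assumption label.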

---

## Risks/Questions

- **Insertion point consistency**: Each agent file has slightly different structure; insert before "Skills available" section if present, otherwise at end.
- **Test phrase matching**: Tests use case-insensitive substring matching; template wording must include trigger phrases from test file.
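
The phrase-matching risk can be checked before running the full suite. The sketch below shows case-insensitive substring matching as a plain function; it assumes nothing about the contents of `test/feature_agent-guardrails.test.js`, whose actual trigger phrases are not shown in this plan. The function name `findMissingPhrases` and the sample phrases in the usage example are hypothetical.

```javascript
// Hypothetical pre-check: report which trigger phrases an agent spec does not contain.
// Matching is case-insensitive and substring-based, as the risk note describes.
function findMissingPhrases(specText, triggerPhrases) {
  const haystack = specText.toLowerCase();
  return triggerPhrases.filter(
    (phrase) => !haystack.includes(phrase.toLowerCase())
  );
}

// Usage with made-up phrases; the real list lives in the test file.
const sampleSpec = "## Guardrails\n### Allowed Sources\nDo not guess";
const missing = findMissingPhrases(sampleSpec, [
  "allowed sources",
  "DO NOT GUESS",
  "escalation protocol",
]);
console.log(missing);
```

An empty result for every agent file would suggest that rewording the template has not dropped a phrase the tests depend on, although only the test run itself confirms it.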
|