@abdullah-alnahas/claude-sdd 0.6.0 → 0.7.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/.claude-plugin/plugin.json +1 -1
- package/commands/sdd-autopilot.md +19 -7
- package/commands/sdd-execute.md +18 -0
- package/commands/sdd-phase.md +1 -1
- package/commands/sdd-review.md +52 -18
- package/hooks/hooks.json +2 -1
- package/hooks/scripts/post-edit-review.sh +7 -3
- package/hooks/scripts/session-init.sh +15 -1
- package/package.json +1 -1
- package/scripts/verify-hooks.sh +28 -8
- package/scripts/verify-skills.sh +1 -1
- package/skills/architecture-aware/SKILL.md +4 -5
- package/skills/guardrails/SKILL.md +51 -5
- package/skills/guardrails/references/failure-patterns.md +5 -0
- package/skills/iterative-execution/SKILL.md +20 -6
- package/skills/iterative-execution/references/review-prompts.md +69 -0
- package/skills/performance-optimization/SKILL.md +5 -4
- package/skills/spec-first/SKILL.md +4 -4
- package/skills/tdd-discipline/SKILL.md +23 -6
- package/skills/using-sdd/SKILL.md +72 -0
- package/skills/using-sdd/references/skill-creation-process.md +54 -0
package/commands/sdd-autopilot.md
CHANGED

@@ -54,29 +54,41 @@ When invoked, execute the following phases in order. Announce each phase transit
 
 **Input**: Behavior spec + roadmap from previous phases.
 **Actions**:
-1. Work through roadmap items in priority order
+1. Work through roadmap items in priority order, in **batches of 3**
 2. For each item, use TDD:
    - Write failing test(s) that cover the relevant acceptance criteria
    - Write minimal code to pass
    - Refactor
-3. After each
+3. After each batch of 3 items:
+   - Run available verification (test suite, linters, type checks)
+   - Report progress with verification evidence (actual test output)
+   - Pause for user feedback before continuing
 4. If tests fail, fix using TDD (understand failure → write targeted fix → verify)
 5. Continue until all roadmap items complete
 
 Use the iterative execution outer loop: implement → verify → fix gaps → repeat (max 10 iterations per roadmap item).
 
-**Transition
+**Transition** (only when all items are complete): "Implement phase complete — all M roadmap items done. Entering Verify phase."
 
 ### Phase 4: Verify
 
 **Input**: Implementation from Phase 3.
 **Actions**:
+Use the two-stage review process (see `/sdd-review`):
+
+**Stage 1 — Spec Compliance:**
 1. Run full test suite
 2. Invoke **spec-compliance agent** — compare implementation against behavior-spec.md
-3.
-4.
-5.
-
+3. DO NOT trust the implementation report. Read actual code and test output independently.
+4. For each acceptance criterion: PASS / FAIL / PARTIAL with evidence
+5. Stage 1 must pass before proceeding to Stage 2
+
+**Stage 2 — Code Quality:**
+6. Invoke **critic agent** — find logical errors, assumption issues
+7. Invoke **simplifier agent** — find unnecessary complexity
+8. Invoke **security-reviewer agent** — check for vulnerabilities
+9. If performance optimization was part of the spec, invoke **performance-reviewer agent**
+10. Collect all findings
 
 **Transition**: "Verify phase complete — N findings (X critical, Y high, Z medium). Entering Review phase."
 
package/commands/sdd-execute.md
CHANGED

@@ -73,9 +73,27 @@ No critical issues from critic agent.
 
 Completion is genuine — verified against spec.
 ```
 
+## Batch Execution
+
+When working on multiple criteria or tasks, group them into batches of 3:
+
+1. Implement batch (3 criteria/tasks) using TDD
+2. Verify the batch — run tests, check spec compliance
+3. Report progress with verification evidence (test output, not claims)
+4. Pause for user feedback before continuing to the next batch
+
+This prevents long unverified runs and gives the user control over direction.
+
+## Verification
+
+After each batch, use the two-stage review process:
+- **Stage 1**: Spec compliance — verify each criterion with evidence (see `/sdd-review`)
+- **Stage 2**: Code quality — only after Stage 1 passes
+
 ## Principles
 
 - TDD is the inner discipline: every piece of new code starts with a failing test
 - The outer loop verifies against the spec, not just test results
 - Honest reporting: never claim done when criteria are unsatisfied
 - Bounded: max iterations prevent infinite loops
+- Batch execution: groups of 3 with checkpoint reports
package/commands/sdd-phase.md
CHANGED

@@ -63,5 +63,5 @@ Phase-specific agent recommendations:
 
 - **specify**: spec-compliance (verify spec completeness)
 - **design**: critic (architectural review), simplifier
 - **implement**: spec-compliance (traceability), critic (logic review)
-- **verify**: security-reviewer, performance-reviewer, spec-compliance
+- **verify**: critic, security-reviewer, performance-reviewer, spec-compliance
 - **review**: all agents
package/commands/sdd-review.md
CHANGED

@@ -1,6 +1,6 @@
 ---
 name: sdd-review
-description:
+description: Two-stage review — spec compliance first, then code quality. Only proceeds to Stage 2 after Stage 1 passes.
 argument-hint: "[--max-iterations <n>]"
 allowed-tools:
 - Read

@@ -14,41 +14,70 @@ allowed-tools:
 
 # /sdd-review
 
-Trigger
+Trigger a two-stage review of recent work. Stage 1 verifies spec compliance. Stage 2 reviews code quality. Stage 2 only runs after Stage 1 passes.
 
 ## Usage
 
 - `/sdd-review` — Review recent changes
 - `/sdd-review --max-iterations <n>` — Set max review-fix cycles (default: 3)
 
-##
+## Stage 1: Spec Compliance
+
+**Goal**: Verify the implementation satisfies the behavior spec.
 
 1. Identify what was recently changed (git diff or session context)
-2.
-3. Run the **
-4.
-5.
-6.
-
-
-
+2. Find the relevant behavior spec and acceptance criteria
+3. Run the **spec-compliance agent** with the Stage 1 prompt from `iterative-execution/references/review-prompts.md`
+4. **DO NOT trust the implementation report.** Read the actual code and test output independently.
+5. For each acceptance criterion: PASS / FAIL / PARTIAL with evidence
+6. If any criterion fails:
+   - Present findings
+   - Offer to fix (using TDD — write a test for the fix first)
+   - After fixing, re-run Stage 1
+   - Repeat until all criteria pass or max iterations reached
+
+**Stage 1 must pass before proceeding to Stage 2.**
+
+## Stage 2: Code Quality
+
+**Goal**: Find unnecessary complexity, dead code, scope creep.
+
+1. Run the **critic agent** — find logical errors, assumption issues
+2. Run the **simplifier agent** — find unnecessary complexity
+3. If the changes involve performance optimization, run the **performance-reviewer agent**
+4. Present findings with severity levels:
+   - [Critical] — must fix
+   - [Simplification] — should fix
+   - [Observation] — consider fixing
+5. Offer to auto-fix critical and simplification issues
+6. If fixes are made (using TDD), re-run Stage 2
+7. Repeat until no critical issues remain or max iterations reached
 
 ## Output Format
 
 ```
-SDD Review —
-
+SDD Review — Stage 1: Spec Compliance
+──────────────────────────────────────
+
+Spec: specs/behavior-spec.md (5 criteria)
+
+✓ Criterion 1: User can log in — PASS (test_login passes)
+✓ Criterion 2: Invalid credentials show error — PASS (test_invalid_login passes)
+✗ Criterion 3: Session persists — FAIL (no test found for this criterion)
+
+Stage 1: 2/3 FAIL — must fix before proceeding to Stage 2.
+```
+
+```
+SDD Review — Stage 2: Code Quality
+───────────────────────────────────
 
 Critic Findings:
 [Critical] ...
-[Warning] ...
 
 Simplifier Findings:
 [Simplification] ...
 
-Spec Compliance:
-[X of Y criteria satisfied]
-
 Actions:
 - Fix critical issues? (y/n)
 - Apply simplifications? (y/n)

@@ -56,7 +85,12 @@ Actions:
 
 ## Principles
 
+- Stage 1 is the gate. No code quality review on non-compliant code.
 - Reviews are honest — findings are reported as-is, not softened
+- DO NOT trust the implementer's report. Verify independently.
 - Fixes follow TDD: if the fix changes behavior, write a test first
 - Max iterations prevent infinite loops
-
+
+## References
+
+See: `iterative-execution/references/review-prompts.md` — Subagent prompt templates
package/hooks/scripts/post-edit-review.sh
CHANGED

@@ -10,9 +10,11 @@ if [ "${GUARDRAILS_DISABLED:-false}" = "true" ]; then
 fi
 
 PROJECT_DIR="${CLAUDE_PROJECT_DIR:-.}"
-# Resolve to absolute path
+# Resolve to absolute path (POSIX-compatible fallback when realpath unavailable)
 if command -v realpath &>/dev/null; then
   PROJECT_DIR=$(realpath "$PROJECT_DIR" 2>/dev/null || echo "$PROJECT_DIR")
+else
+  PROJECT_DIR=$(cd "$PROJECT_DIR" 2>/dev/null && pwd || echo "$PROJECT_DIR")
 fi
 
 # Read tool input from stdin (JSON with file_path)

@@ -20,8 +22,8 @@ INPUT=$(cat)
 
 # Use jq if available, fall back to sed
 if command -v jq &>/dev/null; then
-  FILE_PATH=$(echo "$INPUT" | jq -r '.file_path // .filePath // empty' 2>/dev/null)
-  if [
+  FILE_PATH=$(echo "$INPUT" | jq -r '.file_path // .filePath // empty' 2>/dev/null || true)
+  if [ -z "$FILE_PATH" ]; then
     echo "SDD: post-edit-review skipped — could not parse file_path from hook input" >&2
     exit 0
   fi

@@ -40,6 +42,8 @@ fi
 # Resolve file path to absolute for consistent comparison
 if command -v realpath &>/dev/null; then
   FILE_PATH=$(realpath "$FILE_PATH" 2>/dev/null || echo "$FILE_PATH")
+elif [ -f "$FILE_PATH" ]; then
+  FILE_PATH=$(cd "$(dirname "$FILE_PATH")" 2>/dev/null && echo "$(pwd)/$(basename "$FILE_PATH")" || echo "$FILE_PATH")
 fi
 
 # Check if inside project directory
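The fallback chain in the hook diff above (prefer `realpath`, else `cd`+`pwd`, else pass the input through) can be exercised on its own. A minimal sketch — the `resolve_abs` function name is hypothetical, not part of the package:

```shell
# Resolve a path to absolute form, mirroring the hook's fallback chain:
# prefer realpath, else cd+pwd, else leave the input unchanged.
resolve_abs() {
  p="$1"
  if command -v realpath >/dev/null 2>&1; then
    realpath "$p" 2>/dev/null || printf '%s\n' "$p"
  elif [ -d "$p" ]; then
    (cd "$p" 2>/dev/null && pwd) || printf '%s\n' "$p"
  elif [ -f "$p" ]; then
    # For files: resolve the directory, then re-append the basename
    d=$(cd "$(dirname "$p")" 2>/dev/null && pwd) \
      && printf '%s/%s\n' "$d" "$(basename "$p")" \
      || printf '%s\n' "$p"
  else
    printf '%s\n' "$p"
  fi
}

resolve_abs "."   # prints the current directory as an absolute path
```

The `cd`+`pwd` subshell avoids mutating the caller's working directory, which is why the hook can use it safely mid-script.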
package/hooks/scripts/session-init.sh
CHANGED

@@ -1,9 +1,14 @@
 #!/bin/bash
 # SDD Session Initialization Hook
-# Loads .sdd.yaml config, sets environment variables, checks yolo flag
+# Loads .sdd.yaml config, sets environment variables, checks yolo flag, injects using-sdd skill
 
 set -euo pipefail
 
+# Validate plugin environment
+if [ -z "${CLAUDE_PLUGIN_ROOT:-}" ]; then
+  echo "SDD WARNING: CLAUDE_PLUGIN_ROOT is not set — hooks may not locate plugin resources" >&2
+fi
+
 PROJECT_DIR="${CLAUDE_PROJECT_DIR:-.}"
 ENV_FILE="${CLAUDE_ENV_FILE:-}"
 CONFIG_FILE="$PROJECT_DIR/.sdd.yaml"

@@ -42,5 +47,14 @@ else
   fi
 fi
 
+# Inject using-sdd skill as additionalContext
+USING_SDD_PATH="${CLAUDE_PLUGIN_ROOT:-}/skills/using-sdd/SKILL.md"
+if [ -f "$USING_SDD_PATH" ]; then
+  # Strip frontmatter and output as additionalContext
+  sed '1{/^---$/!q;};1,/^---$/d' "$USING_SDD_PATH"
+else
+  echo "SDD WARNING: using-sdd skill not found at $USING_SDD_PATH" >&2
+fi
+
 echo "SDD: Session initialized — guardrails active" >&2
 exit 0
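The `sed` one-liner added in the session-init diff above deletes everything from line 1 through the closing `---` delimiter when the file starts with YAML frontmatter. A standalone sketch against a synthetic file (the file content here is hypothetical, not from the package):

```shell
# Demonstrate the frontmatter-stripping sed expression from session-init.sh
# on a throwaway file with YAML frontmatter.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
---
name: Example Skill
---
body text
EOF

# '1,/^---$/d' deletes line 1 through the next line matching '^---$';
# the leading '1{/^---$/!q;}' bails out early if line 1 is not '---'.
sed '1{/^---$/!q;};1,/^---$/d' "$tmp"   # prints: body text

rm -f "$tmp"
```

Note the range end address is searched starting at line 2, so the opening `---` on line 1 does not terminate the range prematurely.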
package/package.json
CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "@abdullah-alnahas/claude-sdd",
-  "version": "0.
+  "version": "0.7.0",
   "description": "Spec-Driven Development discipline system for Claude Code — behavioral guardrails, spec-first development, architecture awareness, TDD enforcement, iterative execution loops",
   "keywords": [
     "claude-code-plugin",
package/scripts/verify-hooks.sh
CHANGED

@@ -23,36 +23,50 @@ check() {
 echo "SDD Hook Verification"
 echo "─────────────────────"
 
+# Detect Python interpreter
+PYTHON=""
+for candidate in python3 python; do
+  if command -v "$candidate" &>/dev/null; then
+    PYTHON="$candidate"
+    break
+  fi
+done
+
 # Check hooks.json exists and is valid JSON
 echo ""
 echo "hooks.json:"
 check "File exists" test -f "$PLUGIN_DIR/hooks/hooks.json"
-
-check "
+if [ -n "$PYTHON" ]; then
+  check "Valid JSON" "$PYTHON" -c "import json, sys; json.load(open(sys.argv[1]))" "$PLUGIN_DIR/hooks/hooks.json"
+  check "Has hooks wrapper" "$PYTHON" -c "
 import json, sys
 d = json.load(open(sys.argv[1]))
 assert 'hooks' in d, 'Missing hooks key'
 " "$PLUGIN_DIR/hooks/hooks.json"
-check "Has SessionStart hook"
+  check "Has SessionStart hook" "$PYTHON" -c "
 import json, sys
 d = json.load(open(sys.argv[1]))
 assert 'SessionStart' in d['hooks']
 " "$PLUGIN_DIR/hooks/hooks.json"
-check "Has UserPromptSubmit hook"
+  check "Has UserPromptSubmit hook" "$PYTHON" -c "
 import json, sys
 d = json.load(open(sys.argv[1]))
 assert 'UserPromptSubmit' in d['hooks']
 " "$PLUGIN_DIR/hooks/hooks.json"
-check "Has PostToolUse hook"
+  check "Has PostToolUse hook" "$PYTHON" -c "
 import json, sys
 d = json.load(open(sys.argv[1]))
 assert 'PostToolUse' in d['hooks']
 " "$PLUGIN_DIR/hooks/hooks.json"
-check "Has Stop hook"
+  check "Has Stop hook" "$PYTHON" -c "
 import json, sys
 d = json.load(open(sys.argv[1]))
 assert 'Stop' in d['hooks']
 " "$PLUGIN_DIR/hooks/hooks.json"
+else
+  echo "  ⚠ Python not found — skipping JSON validation (install python3 or activate a venv)"
+  FAIL=$((FAIL + 1))
+fi
 
 # Check scripts exist and are executable
 echo ""

@@ -62,10 +76,16 @@ check "post-edit-review.sh exists" test -f "$PLUGIN_DIR/hooks/scripts/post-edit-
 check "session-init.sh is executable or bash-runnable" bash -n "$PLUGIN_DIR/hooks/scripts/session-init.sh"
 check "post-edit-review.sh is executable or bash-runnable" bash -n "$PLUGIN_DIR/hooks/scripts/post-edit-review.sh"
 
-# Test session-init.sh runs without error
+# Test session-init.sh runs without error (in isolated temp dir to avoid side effects)
 echo ""
 echo "Script execution:"
-check "session-init.sh runs without error" bash
+check "session-init.sh runs without error" bash -c "
+  TMPDIR=\$(mktemp -d)
+  CLAUDE_PROJECT_DIR=\"\$TMPDIR\" CLAUDE_ENV_FILE=\"\" bash \"$PLUGIN_DIR/hooks/scripts/session-init.sh\" 2>/dev/null
+  rc=\$?
+  rm -rf \"\$TMPDIR\"
+  exit \$rc
+"
 
 echo ""
 echo "─────────────────────"
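The interpreter-detection and JSON-validation pattern added in the verify-hooks diff above works as a standalone snippet. A minimal sketch, using a throwaway file in place of the real `hooks.json` (the file content below is hypothetical):

```shell
# Find a Python interpreter (python3 preferred), then use it to validate
# a JSON file and assert an expected key — the same shape verify-hooks.sh uses.
PYTHON=""
for candidate in python3 python; do
  if command -v "$candidate" >/dev/null 2>&1; then
    PYTHON="$candidate"
    break
  fi
done

tmp=$(mktemp)
printf '{"hooks": {"SessionStart": []}}' > "$tmp"

if [ -n "$PYTHON" ]; then
  # Exits non-zero if the file is invalid JSON or lacks the expected key
  "$PYTHON" -c "
import json, sys
d = json.load(open(sys.argv[1]))
assert 'SessionStart' in d['hooks']
" "$tmp" && echo "hooks.json OK"
else
  echo "Python not found — skipping JSON validation" >&2
fi
rm -f "$tmp"
```

Shelling out to `python -c` keeps the verifier dependency-free on systems without `jq`, at the cost of skipping validation when no Python is present — which is why the script counts that case as a failure rather than silently passing.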
package/scripts/verify-skills.sh
CHANGED

@@ -23,7 +23,7 @@ check() {
 echo "SDD Skill Verification"
 echo "──────────────────────"
 
-SKILLS=("guardrails" "spec-first" "architecture-aware" "tdd-discipline" "iterative-execution" "performance-optimization")
+SKILLS=("guardrails" "spec-first" "architecture-aware" "tdd-discipline" "iterative-execution" "performance-optimization" "using-sdd")
 
 for skill in "${SKILLS[@]}"; do
   echo ""
package/skills/architecture-aware/SKILL.md
CHANGED

@@ -1,11 +1,9 @@
 ---
 name: Architecture Awareness
 description: >
-
-
-
-  "how should I structure this?", "what pattern should I use?", "should I split this into services?",
-  "should I write an ADR?", or "document this decision."
+  Use when structuring or organizing code, discussing architecture or design patterns, planning
+  integrations between components, or when the user asks "how should I structure this?", "what
+  pattern should I use?", "should I split this into services?", or "document this decision."
 ---
 
 # Architecture Awareness

@@ -41,6 +39,7 @@ If a decision is hard to reverse or affects multiple components, it deserves an
 ## Related Skills
 
 - **spec-first** — architecture decisions emerge during Stage 4 (Architecture)
+- **iterative-execution** — architectural context guides integration during implementation
 - **guardrails** — enforces architectural consistency as part of scope discipline
 
 ## References
package/skills/guardrails/SKILL.md
CHANGED

@@ -1,16 +1,19 @@
 ---
 name: SDD Guardrails
 description: >
-
-
-
-  discipline, simplicity, and verification before claiming completion.
+  Use when implementing, building, fixing, refactoring, adding, changing, or modifying code.
+  Use when reviewing code or claiming work is complete. Use when you notice yourself agreeing
+  without critical evaluation or adding code beyond what was requested.
 ---
 
 # SDD Behavioral Guardrails
 
 Operate under the SDD (Spec-Driven Development) discipline system. These guardrails defend against known LLM failure patterns in software development.
 
+## Spirit vs. Letter
+
+Follow the **spirit** of these guardrails, not just their checklists. The goal is disciplined development that produces correct, simple, spec-compliant code. If following a checklist item mechanically would produce worse results than thoughtful application of the principle behind it, follow the principle. But this is **never** an excuse to skip steps — it's a reason to apply them thoughtfully.
+
 ## Core Principles
 
 ### 1. Honesty Over Agreement

@@ -26,7 +29,50 @@ The right solution is the simplest one that works. Before writing any code, ask:
 Every assumption you make is a potential bug. Enumerate your assumptions explicitly. If you're uncertain about intent, ask. If you're uncertain about behavior, test. Never silently guess.
 
 ### 5. Verify Before Claiming
-Never say "done" until you've verified.
+Never say "done" until you've verified. This is a formal gate:
+
+1. **IDENTIFY** the command or check needed to verify
+2. **RUN** the command (test suite, linter, type checker)
+3. **READ** the output — actually read it, don't skim
+4. **VERIFY** the claim against the output — does the evidence support "done"?
+5. **THEN** claim completion, citing the evidence
+
+A completion claim without verification is a lie.
+
+**Common verification failures:**
+
+| Failure | What Actually Happened |
+|---------|----------------------|
+| "Tests pass" without running them | You guessed. Run them. |
+| Ran tests but didn't read output | A failure was buried in the output. Read it. |
+| Tests pass but don't cover the change | You tested the wrong thing. Check coverage. |
+| "Looks correct" from reading code | Reading is not testing. Execute it. |
+| Verified one case, claimed all cases | Edge cases exist. Test them. |
+
+## Rationalization Red Flags
+
+These thoughts mean STOP — you're about to violate a guardrail:
+
+| Thought | Reality |
+|---------|---------|
+| "This small fix doesn't need the full checkpoint" | Small fixes are where scope creep starts. |
+| "The user seems to want me to just do it" | Discipline is not optional based on tone. |
+| "I'll verify at the end" | Verify continuously. End-of-task verification catches less. |
+| "This is obviously correct" | Obvious code has bugs too. Test it. |
+| "Adding this extra thing will help" | That's scope creep. Mention it, don't do it. |
+| "I'm sure this test passes" | Run it. Being sure is not evidence. |
+| "The user won't notice this improvement" | Unasked changes are defects regardless. |
+| "This is a standard pattern, no need to verify" | Standard patterns fail in specific contexts. Verify. |
+
+## Escalation Rule
+
+After 3 failed attempts to fix the same issue, **STOP**. Do not attempt a 4th fix. Instead:
+
+1. State what you've tried and why each attempt failed
+2. Question whether the approach or architecture is wrong
+3. Suggest an alternative approach or ask the user for direction
+
+Repeated failures on the same issue usually indicate a wrong approach, not insufficient effort.
 
 ## Pre-Implementation Checkpoint
 
package/skills/guardrails/references/failure-patterns.md
CHANGED

@@ -61,3 +61,8 @@
 **Detection**: You've been coding for a while and haven't re-read the original requirement.
 **Response**: Periodically re-read the request. Check that your solution actually solves the stated problem, not a related but different one.
 **Example**: User asks to "sort by date" and you implement alphabetical sort because you started coding before fully reading.
+
+### 13. Fix Thrashing
+**Detection**: You've attempted 3+ fixes for the same issue and it's still broken. Each fix introduces a new problem or reverts to a previous failure.
+**Response**: Stop fixing. Step back and question the approach. State what you've tried, why each failed, and propose an alternative architecture or ask the user for direction.
+**Example**: A test keeps failing despite three different fixes to the handler. The real problem is the test assumes synchronous behavior but the handler is async. The fix isn't in the handler — it's in the test setup or the architectural approach.
package/skills/iterative-execution/SKILL.md
CHANGED

@@ -1,10 +1,9 @@
 ---
 name: Iterative Execution
 description: >
-
-
-
-  tests pass," or "it's not matching the spec yet."
+  Use when implementing a feature from a spec, when implementation needs iterating to match
+  requirements, or when delivering any non-trivial change. Use when the user says "make this work,"
+  "implement this spec," "keep going until all tests pass," or "it's not matching the spec yet."
 ---
 
 # Iterative Execution

@@ -36,6 +35,21 @@ They are complementary, not competing. TDD governs how you write code. Iterative
 7. Report honest completion status
 ```
 
+## Rationalization Red Flags
+
+These thoughts mean STOP — you're about to skip verification:
+
+| Thought | Reality |
+|---------|---------|
+| "I'll verify everything at the end" | Verify after each change. End-of-task catches less. |
+| "The code looks right, no need to run tests" | Looking right is not evidence. Run the tests. |
+| "I fixed the issue, moving on" | Did you verify the fix? Run the test again. |
+| "Only one small thing changed" | Small changes cause big failures. Verify. |
+| "I already know this passes" | You knew the previous version passed. This is a new version. |
+| "Verification would take too long" | Shipping a bug takes longer. Verify. |
+| "The spec is satisfied, I can see it" | Seeing is not testing. Run the criteria checks. |
+| "I'll skip this iteration's verification" | Skipping once becomes skipping always. Never skip. |
+
 ## Completion Criteria
 
 Good completion criteria are:

@@ -63,16 +77,16 @@ For performance optimization tasks, the verification step must additionally incl
 - **Never weaken criteria to match output.** The spec defines done, not the implementation.
 - **Be honest about partial completion.** "3 of 5 criteria met, blocked on X" is better than a false "done."
 
-## References
-
 ## Related Skills
 
 - **tdd-discipline** — the inner discipline used within each implementation step
 - **spec-first** — produces the specs that define completion criteria
 - **guardrails** — the overarching discipline layer
+- **architecture-aware** — architectural context for integration decisions during implementation
 - **performance-optimization** — specialized verification for performance tasks
 
 ## References
 
 See: `references/loop-patterns.md`
 See: `references/completion-criteria.md`
+See: `references/review-prompts.md`
package/skills/iterative-execution/references/review-prompts.md
ADDED

@@ -0,0 +1,69 @@
+# Review Prompt Templates
+
+Subagent prompt templates for the two-stage review process.
+
+## Stage 1: Spec Compliance Reviewer
+
+Use this prompt for the spec-compliance review subagent:
+
+```
+You are reviewing an implementation against its behavior specification.
+
+DO NOT trust the implementer's report of what was done. Read the actual code and actual test output.
+
+Your job:
+1. Read the behavior spec (acceptance criteria)
+2. Read the implementation code
+3. Run or read test results
+4. For EACH acceptance criterion, independently verify:
+   - Is there a test that covers this criterion?
+   - Does the test actually test what the criterion specifies?
+   - Does the test pass? (Read the output, don't trust claims)
+   - Does the implementation match the criterion's intent, not just its letter?
+
+Report format:
+- For each criterion: PASS / FAIL / PARTIAL with evidence
+- Overall: X of Y criteria satisfied
+- Blocking issues (must fix before proceeding)
+- Non-blocking observations
+
+DO NOT soften findings. DO NOT say "mostly works" when a criterion fails.
+A criterion either passes with evidence or it doesn't.
+```
+
+## Stage 2: Code Quality Reviewer
+
+Use this prompt for the code quality review subagent (only run after Stage 1 passes):
+
+```
+You are reviewing code quality after spec compliance has been verified.
+
+Review the implementation for:
+1. Unnecessary complexity (could this be simpler?)
+2. Dead code introduced by the changes
+3. Scope creep (changes beyond what the spec required)
+4. Missing error handling at system boundaries
+5. Naming clarity
+6. Function/file length (aim ~50/~500 lines)
+
+For each finding, classify:
+- [Critical] — must fix (bugs, security issues)
+- [Simplification] — should fix (unnecessary complexity)
+- [Observation] — consider fixing (style, minor improvements)
+
+DO NOT invent requirements. Only flag issues that make the code worse.
+DO NOT suggest adding features, abstractions, or patterns not needed by the spec.
+```
+
+## Implementer Self-Review Checklist
+
+Before requesting external review, the implementer should verify:
+
+1. [ ] Re-read the original request/spec
+2. [ ] Every acceptance criterion has a corresponding test
+3. [ ] All tests pass (actually ran them, read the output)
+4. [ ] No unrelated files were modified
+5. [ ] No dead code was introduced
+6. [ ] No abstractions for single-use patterns
+7. [ ] Function lengths are reasonable
+8. [ ] Changes are the minimum needed to satisfy the spec
package/skills/performance-optimization/SKILL.md
CHANGED

@@ -1,10 +1,9 @@
 ---
 name: Performance Optimization
 description: >
-
-
-
-  performance without breaking correctness.
+  Use when optimizing, speeding up, profiling, reducing memory usage, or improving performance.
+  Use when the user says "profile this," "find the bottleneck," "speed up," or "optimize."
+  Use when any change targets performance without breaking correctness.
 ---
 
 # Performance Optimization Discipline

@@ -59,6 +58,8 @@ Before submitting a performance patch, verify it is NOT:
 - **guardrails** — enforces correctness-first and verify-before-claiming during optimization
 - **iterative-execution** — the outer verify-fix cycle for measuring and iterating on improvements
 - **tdd-discipline** — ensures test suite is maintained through optimization changes
+- **spec-first** — performance requirements originate in specs (stack.md, behavior-spec.md)
+- **architecture-aware** — structural optimizations require architectural context
 
 ## References
 
package/skills/spec-first/SKILL.md
CHANGED

@@ -1,10 +1,9 @@
 ---
 name: Spec-First Development
 description: >
-
-
-
-  plan this out," "write a spec for this," or "let's design this first."
+  Use when starting a new project or feature, creating specs or plans, adopting an existing project,
+  or when the user says "I want to build something," "let's plan this out," "write a spec," or
+  "let's design this first." Use before any non-trivial implementation that lacks a spec.
 ---
 
 # Spec-First Development

@@ -64,6 +63,7 @@ For existing projects, use the adoption flow instead of starting from scratch. S
 
 - **architecture-aware** — for deeper architectural guidance during Stage 4
 - **tdd-discipline** — for test planning from behavior specs (use `references/templates/test-plan.md`)
+- **iterative-execution** — delivers features against the specs produced here
 - **guardrails** — enforces spec-first as a pre-implementation check
 
 ## References
--- a/package/skills/tdd-discipline/SKILL.md
+++ b/package/skills/tdd-discipline/SKILL.md
@@ -1,16 +1,19 @@
 ---
 name: TDD Discipline
 description: >
-
-
-
-  "fix this bug," or "debug this."
+  Use when writing tests, adding test coverage, fixing bugs, debugging, or when any new code needs
+  to be written. Use when the user says "write tests," "add tests," "fix this bug," "debug this,"
+  or "how should I test this?"
 ---
 
 # TDD Discipline
 
 Tests are not an afterthought — they are the first expression of intent. Write the test that describes the behavior, watch it fail, then write the minimum code to make it pass.
 
+## Spirit vs. Letter
+
+The spirit of TDD is: **know what correct behavior looks like before writing the code.** The Red/Green/Refactor cycle is the mechanism, but the principle is that you define "done" before you start. If a situation genuinely doesn't benefit from a test-first approach (see "When TDD Is Overhead" below), skip the mechanism — but never skip the principle of defining expected behavior first.
+
 ## Red → Green → Refactor
 
 1. **Red**: Write a failing test that describes the desired behavior
@@ -19,6 +22,21 @@ Tests are not an afterthought — they are the first expression of intent. Write
 
 This cycle applies at every level: unit, integration, e2e.
 
+## Rationalization Red Flags
+
+These thoughts mean STOP — you're about to skip TDD:
+
+| Thought | Reality |
+|---------|---------|
+| "I'll write tests after the code works" | That's test-after, not TDD. Write the test first. |
+| "This is too simple to need a test" | Simple code with no test becomes complex code with no test. |
+| "I know this works, I'll just verify manually" | Manual verification doesn't persist. Tests do. |
+| "The test is obvious, I'll skip to code" | If it's obvious, it takes 30 seconds to write. Do it. |
+| "I need to see the code structure first" | Write the test to discover the structure. That's the point. |
+| "This is just a refactor, tests already pass" | Run the tests. Confirm they pass. Then refactor. |
+| "Writing a test for this would be too complex" | If you can't test it, you can't verify it. Simplify the design. |
+| "I'll add tests in the next iteration" | Next iteration never comes. Write them now. |
+
 ## Relationship to Iterative Execution
 
 TDD is the **inner discipline** — how you write each piece of code. Iterative execution is the **outer cycle** — how you deliver a complete feature against a spec. They are complementary: TDD ensures correctness at the unit level; iterative execution ensures spec satisfaction at the feature level. See the **iterative-execution** skill for the full outer cycle.
@@ -50,13 +68,12 @@ Code: FormHandler.submit()
 
 This chain ensures nothing is built without a reason and nothing specified goes untested. If a test has no spec criterion, either add the criterion to the spec or question whether the test is needed. If a spec criterion has no test, that is a finding — even if the code works.
 
-## References
-
 ## Related Skills
 
 - **iterative-execution** — the outer delivery cycle that uses TDD internally
 - **spec-first** — produces behavior specs that drive test design (see `spec-first/references/templates/test-plan.md`)
 - **guardrails** — enforces TDD during implementation
+- **performance-optimization** — uses TDD to preserve correctness during optimization
 
 ## References
 
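The Red → Green → Refactor cycle described in the TDD skill above can be illustrated with a minimal sketch. This is not code from the package — the `slugify` function and its test are hypothetical examples:

```python
# RED: write the failing test first — it defines "done" before any code exists.
# At this point slugify() does not exist, so running the test fails.
def test_slugify_replaces_spaces_with_hyphens():
    assert slugify("Hello World") == "hello-world"

# GREEN: the minimum implementation that makes the test pass — nothing more.
def slugify(title: str) -> str:
    return title.lower().replace(" ", "-")

# REFACTOR: improve structure (e.g., strip punctuation later) while the
# test stays green; rerun it after every change.
test_slugify_replaces_spaces_with_hyphens()
```

The point of the sketch is the ordering: the test existed, and failed, before the implementation did.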
--- /dev/null
+++ b/package/skills/using-sdd/SKILL.md
@@ -0,0 +1,72 @@
+---
+name: Using SDD
+description: >
+  Use at the start of every session and before every response to determine which SDD skills apply.
+  This is the meta-skill — it teaches skill discovery and invocation discipline.
+---
+
+# Using SDD Skills
+
+You have access to SDD (Spec-Driven Development) skills that enforce development discipline. **Check for applicable skills before every response.**
+
+## The Rule
+
+**Invoke relevant skills BEFORE any response or action.** Even a 1% chance a skill might apply means you should check. If it turns out to be wrong for the situation, you don't need to use it.
+
+## Available Skills
+
+| Skill | When to Use |
+|-------|-------------|
+| **guardrails** | ANY coding task — implement, build, fix, refactor, add, change, modify |
+| **spec-first** | New project/feature, creating specs/plans, adopting a project |
+| **tdd-discipline** | Writing tests, adding coverage, fixing bugs, debugging |
+| **iterative-execution** | Implementing a feature from spec, iterating to match requirements |
+| **architecture-aware** | Structuring code, design patterns, component integration, ADRs |
+| **performance-optimization** | Optimizing, profiling, speeding up, reducing resource usage |
+
+## Skill Priority Order
+
+When multiple skills apply, use this order:
+
+1. **Guardrails first** — always active for any coding task. This is the discipline layer.
+2. **Process skills second** (spec-first, tdd-discipline, iterative-execution) — these determine HOW to approach the task.
+3. **Domain skills third** (architecture-aware, performance-optimization) — these provide specialized guidance.
+
+## Rationalization Red Flags
+
+These thoughts mean STOP — you're rationalizing skipping a skill:
+
+| Thought | Reality |
+|---------|---------|
+| "This is just a simple question" | Questions about code are tasks. Check guardrails. |
+| "I need more context first" | Skill check comes BEFORE exploration. |
+| "Let me explore the codebase first" | Skills tell you HOW to explore. Check first. |
+| "I can just do this quickly" | Quick work is where discipline matters most. |
+| "This doesn't need a formal skill" | If a skill exists for this task type, use it. |
+| "I remember the skill" | Skills evolve. Read the current version. |
+| "This doesn't count as implementation" | If you're changing code, guardrails apply. |
+| "The skill is overkill" | Simple things become complex. Use it. |
+| "I'll just do this one thing first" | Check BEFORE doing anything. |
+| "This feels productive" | Undisciplined action wastes time. Skills prevent this. |
+| "The user said to skip guardrails" | Only `/sdd-yolo` disables guardrails. Verbal requests don't count. |
+| "I already know what to do" | Knowing the task ≠ following the discipline. |
+
+## Skill Classification
+
+**Rigid skills** (follow exactly, don't adapt away discipline):
+- guardrails
+- tdd-discipline
+
+**Flexible skills** (adapt principles to context):
+- spec-first
+- architecture-aware
+- iterative-execution
+- performance-optimization
+
+## Spirit vs. Letter
+
+Follow the **spirit** of each skill, not just its checklist. The goal is disciplined development that produces correct, simple, spec-compliant code. If following a checklist item mechanically would produce worse results than thoughtful application of the principle behind it, follow the principle. But this is never an excuse to skip steps — it's a reason to apply them thoughtfully.
+
+## References
+
+See: `references/skill-creation-process.md`
--- /dev/null
+++ b/package/skills/using-sdd/references/skill-creation-process.md
@@ -0,0 +1,54 @@
+# Skill Creation Process
+
+Creating new SDD skills follows a RED/GREEN/REFACTOR approach — the same TDD discipline applied to the skills themselves.
+
+## RED: Identify the Failure
+
+Before writing a skill, you need evidence of a failure pattern:
+
+1. **Observe the failure** — identify a specific, repeatable behavior problem (e.g., the agent skips verification, over-engineers, ignores specs)
+2. **Document the failure** — write down exactly what went wrong, with concrete examples
+3. **Pressure test** — verify this isn't a one-off. Does it happen across different tasks, projects, or prompts?
+
+If you can't reproduce the failure consistently, you don't need a skill yet. You need more data.
+
+## GREEN: Write the Minimal Skill
+
+Write the smallest skill that addresses the failure:
+
+1. **Frontmatter** — name + CSO-format description ("Use when..." with trigger conditions only)
+2. **One core principle** — the single behavioral change needed
+3. **Detection** — how the agent recognizes it's about to fail
+4. **Response** — what the agent should do instead
+5. **Rationalization table** — 4-8 entries mapping excuses to counters
+
+The skill should be under 500 words at this stage. If it's longer, you're solving too many problems at once.
+
+## REFACTOR: Plug Loopholes
+
+Deploy the minimal skill and observe:
+
+1. **Does the agent follow it?** If not, the trigger conditions in the description may be wrong — fix them.
+2. **Does the agent rationalize around it?** Add entries to the rationalization table for each observed excuse.
+3. **Does it create new problems?** If the skill causes over-correction (e.g., too rigid in cases where flexibility is needed), add a "When This Skill Is Overhead" section.
+4. **Is it too broad?** Split into focused skills. One skill should address one failure pattern cluster.
+
+## Checklist
+
+Before shipping a new skill:
+
+- [ ] Failure pattern documented with 3+ examples
+- [ ] Description uses "Use when..." CSO format
+- [ ] Rationalization table has 4+ entries
+- [ ] Skill body under 3000 words
+- [ ] References directory exists (even if empty initially)
+- [ ] Added to `using-sdd` skill table
+- [ ] Added to `scripts/verify-skills.sh` SKILLS array
+- [ ] Rigid vs. flexible classification documented in `using-sdd`
+
+## Anti-Patterns
+
+- **Speculative skills**: Writing a skill for a problem you haven't observed yet
+- **Kitchen-sink skills**: Cramming multiple unrelated concerns into one skill
+- **Checklist-only skills**: Lists of rules without detection/response guidance
+- **Aspirational skills**: Describing ideal behavior without addressing the specific failure that motivated the skill
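The GREEN-stage frontmatter format described in the skill-creation process above might look like the following minimal sketch. The skill name and description here are hypothetical, not one of the shipped skills:

```yaml
---
name: Scope Control
# CSO-format description: trigger conditions only, phrased as "Use when..."
description: >
  Use when a change is growing beyond the original request, when adding
  unrequested features, or when the user says "just do X" and the diff
  touches more than X.
---
```

Note that the description names only triggers; the core principle, detection, response, and rationalization table live in the skill body, not the frontmatter.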