@codenhub/skills 0.0.2
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/LICENSE +201 -0
- package/README.md +53 -0
- package/dist/cli.js +213 -0
- package/package.json +36 -0
- package/src/agents-md-improver/SKILL.md +216 -0
- package/src/agents-md-improver/agents/openai.yaml +4 -0
- package/src/agents-md-improver/references/quality-criteria.md +116 -0
- package/src/agents-md-improver/references/templates.md +255 -0
- package/src/agents-md-improver/references/update-guidelines.md +155 -0
- package/src/brainstorming/SKILL.md +118 -0
- package/src/brainstorming/agents/openai.yaml +4 -0
- package/src/caveman/SKILL.md +59 -0
- package/src/caveman/agents/openai.yaml +4 -0
- package/src/caveman-commit/SKILL.md +68 -0
- package/src/caveman-commit/agents/openai.yaml +4 -0
- package/src/caveman-review/SKILL.md +54 -0
- package/src/caveman-review/agents/openai.yaml +4 -0
- package/src/cli.test.ts +102 -0
- package/src/cli.ts +311 -0
- package/src/executing-plans/SKILL.md +92 -0
- package/src/executing-plans/agents/openai.yaml +4 -0
- package/src/frontend-design/SKILL.md +60 -0
- package/src/frontend-design/agents/openai.yaml +4 -0
- package/src/subagent-specialist/SKILL.md +226 -0
- package/src/subagent-specialist/agents/openai.yaml +4 -0
- package/src/subagent-specialist/references/code-quality-reviewer-prompt.md +48 -0
- package/src/subagent-specialist/references/implementer-prompt.md +84 -0
- package/src/subagent-specialist/references/parallel-investigator-prompt.md +49 -0
- package/src/subagent-specialist/references/spec-reviewer-prompt.md +52 -0
- package/src/test-driven-development/SKILL.md +239 -0
- package/src/test-driven-development/agents/openai.yaml +11 -0
- package/src/test-driven-development/testing-anti-patterns.md +162 -0
- package/src/test-driven-development/verification-baselines.md +42 -0
- package/src/writing-plans/SKILL.md +169 -0
- package/src/writing-plans/agents/openai.yaml +4 -0
- package/src/writing-skills/SKILL.md +222 -0
- package/src/writing-skills/agents/openai.yaml +4 -0
- package/src/writing-skills/best-practices.md +321 -0
- package/src/writing-skills/examples/SKILL_AUTHORING_GUIDE_TESTING.md +156 -0
- package/src/writing-skills/persuasion-principles.md +172 -0
- package/src/writing-skills/testing-skills-with-subagents.md +310 -0
- package/src/writing-specs/SKILL.md +72 -0
- package/src/writing-specs/agents/openai.yaml +4 -0
@@ -0,0 +1,239 @@
---
name: test-driven-development
description: Use when an agent must do feature work, bug fixes, refactors, or behavior changes with fail-first TDD.
metadata:
  short-description: Enforce test-first development
---

# Test-Driven Development

## Overview

Use strict test-first development: write a failing test, make it pass with the
smallest possible implementation, then refactor while keeping tests green.

**Core principle:** If the test did not fail first for the expected reason, the
test does not prove the behavior.

## When To Use

Apply this workflow for:

- new features
- bug fixes
- behavior changes
- refactors that can affect behavior

Possible exceptions (only with explicit user confirmation in this conversation):

- throwaway prototypes
- generated code
- strictly non-executable configuration edits (comments, whitespace, key
  ordering, or descriptive metadata that no runtime/build/deploy tooling reads)

Any configuration change that can affect runtime behavior (flags, permissions,
routing, dependency resolution, build output, or environment loading) requires
tests.

Do not self-approve exception paths.

When a change appears to qualify for the config-only exception and explicit
confirmation is missing, ask once for explicit confirmation for this exact
change. If confirmation is still missing, run full TDD.

If there is any uncertainty about runtime impact, treat the change as
behavior-changing and run tests.

Exception authorization must be explicit for this exact change in the
conversation. Generic urgency language is not approval to skip TDD.

Config-only exception checklist (all must be true):

- change is limited to comments, formatting, key ordering, or descriptive text
- no runtime/build/deploy-consumed key or value changed
- no flags, permissions, routes, dependency pins, or environment keys changed
- the diff itself clearly proves the above

If any checklist item is false or uncertain, do not use the exception path.
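
For JSON configuration, part of the checklist can be made mechanical. The sketch below is an assumption-laden illustration, not part of this skill: the function name `qualifies_for_config_exception` is hypothetical, the config is assumed to be JSON (which has no comments), and "non-executable" is approximated conservatively as "both versions parse to the same value". That accepts whitespace, formatting, and key-order changes but rejects every value change, including descriptive text, which keeps it on the safe side of "false or uncertain".

```python
import json


def qualifies_for_config_exception(before: str, after: str, user_confirmed: bool) -> bool:
    """Hypothetical, conservative sketch of the config-only checklist for JSON.

    Returns True only when the user explicitly confirmed this exact change
    AND both versions parse to the same value (only whitespace, formatting,
    or key order differ). Any value change is treated as behavior-changing.
    """
    if not user_confirmed:
        # No explicit confirmation for this change: run full TDD instead.
        return False
    try:
        return json.loads(before) == json.loads(after)
    except json.JSONDecodeError:
        # Unparseable config means runtime impact is uncertain.
        return False


old = '{"debug": false, "port": 8080}'
reordered = '{\n  "port": 8080,\n  "debug": false\n}'
flag_flip = '{"debug": true, "port": 8080}'

print(qualifies_for_config_exception(old, reordered, user_confirmed=True))  # True
print(qualifies_for_config_exception(old, flag_flip, user_confirmed=True))  # False
print(qualifies_for_config_exception(old, reordered, user_confirmed=False))  # False
```

A `True` result only says the diff looks inert; the confirmation itself still has to come from the user in the conversation.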

## The Iron Law

```text
NO PRODUCTION CODE WITHOUT A FAILING TEST FIRST
```

If production code is written before a failing test:

1. Remove that code from the intended delivery change set.
2. Write the failing test.
3. Re-implement from the test.

No exceptions:

- Do not keep the old code as "reference".
- Do not adapt pre-written code while pretending to do RED.
- Do not use destructive version-control operations just to enforce this rule.
- Delete means delete from the delivery path.

## Red-Green-Refactor Loop

### 1) RED - Write a Failing Test

Write one small test for one behavior.

Requirements:

- one behavior per test
- clear, behavior-focused test name
- test real behavior, not mock wiring
- avoid mocks unless isolation is truly required

### 2) Verify RED - Watch It Fail

**MANDATORY. Never skip this step.**

Run the targeted test.

Confirm:

- test fails for the expected missing-behavior reason
- failure is not caused by setup, typos, stale fixtures, or harness issues

If the failure is a setup or harness error, fix setup and re-run until the
failure reflects the expected missing behavior.

If the test passes immediately, it is not proving new behavior. Fix the test or
choose a different scenario.

### 3) GREEN - Write Minimal Code

Implement the smallest change that makes the failing test pass.

Rules:

- no speculative abstractions
- no unrelated refactors
- no extra features beyond the current test
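
As an illustration of "minimal", take a hypothetical `slugify` function whose only failing test expects `"Hello World"` to become `"hello-world"`. A GREEN step could be the one-liner below. It is a sketch, not a recommended slug implementation: it ignores punctuation and accents on purpose, because those are separate behaviors that each need their own failing test first.

```python
def slugify(title: str) -> str:
    # Hypothetical GREEN step: just enough to satisfy the current test.
    # Punctuation and accent handling are deliberately absent; each is a
    # separate behavior that needs its own failing test before any code.
    return "-".join(title.lower().split())


def test_slugify_joins_lowercased_words_with_hyphens():
    assert slugify("Hello World") == "hello-world"


test_slugify_joins_lowercased_words_with_hyphens()  # passes silently
```

Resisting the urge to add a regex for punctuation "while you are there" is the point: untested extras are exactly the speculative features the rules above forbid.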

### 4) Verify GREEN - Watch It Pass

**MANDATORY. Never skip this step.**

Run:

- the targeted test
- nearby related tests
- tests in changed modules or packages
- tests that cover touched public interfaces and integration boundaries
- repo-required smoke or pre-merge suites before finishing

Before coding, run the exact verification commands you will use afterward. If
any tests already fail (flaky or not), capture the failing test IDs plus error
signatures as baseline evidence.

If you are unsure which tests are relevant, run the broader suite.

Confirm:

- new test passes
- existing tests remain green, or only baseline failures with matching test IDs
  and error signatures remain
- no new warnings or runtime errors

For baseline handling and flaky failure triage, use
`verification-baselines.md`.

### 5) REFACTOR - Improve Design Safely

Refactor only while all tests are green.

Allowed work:

- remove duplication
- improve naming
- extract helpers
- simplify design

Do not change behavior during refactor. If behavior changes, start another RED
cycle first.

### 6) Repeat

Move to the next behavior with a new failing test.

## Test Quality Rules

| Quality      | Good                                  | Bad                                      |
| ------------ | ------------------------------------- | ---------------------------------------- |
| Focus        | One behavior per test                 | Multiple behaviors in one test           |
| Naming       | Describes expected behavior           | Vague names like `test1`                 |
| Intent       | Validates externally visible behavior | Validates private implementation details |
| Dependencies | Real collaborators when practical     | Heavy mocking without clear need         |

## Common Rationalizations

| Excuse                                   | Reality                                                                 |
| ---------------------------------------- | ----------------------------------------------------------------------- |
| "It is too small to test."               | Small code breaks too. The test is cheap insurance.                     |
| "I will add tests after coding."         | Passing tests written after code do not prove they can fail correctly.  |
| "I already tested manually."             | Manual checks are not repeatable regression protection.                 |
| "Deleting work is wasteful."             | Keeping unverified code creates future bugs and rework.                 |
| "Tests after are equivalent."            | Tests-after ask what code does; tests-first define what code should do. |
| "TDD is dogmatic. I am being pragmatic." | Pragmatic delivery includes repeatable tests that fail first.           |

If these appear, stop and return to RED.

## Red Flags

Stop and restart the cycle when you see:

- production code added before a failing test
- tests written after implementation "for coverage"
- tests that pass on first run when adding new behavior
- inability to explain why the test failed
- mock-heavy tests that do not validate real behavior
- phrases like "just this once" or "I already manually tested it"
- phrases that rationalize skipping fail-first behavior

## Bug Fix Pattern

For bugs, start by writing a failing regression test that reproduces the issue.

If deterministic reproduction is not immediately possible, write the smallest
deterministic characterization test (contract, property, or boundary case) that
captures the observed failure before changing production code.

If you cannot produce a failing automated test, stop and ask the user before
shipping a bug fix.

Then run the normal RED-GREEN-REFACTOR cycle. Never ship a bug fix without a
reproduction or characterization test.
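
A small worked example of the pattern, using a made-up bug: a hypothetical `word_count` that split on a single space and therefore miscounted empty or space-padded input. All names here are illustrative. In a real suite the regression test would call the production function directly; it takes the implementation as a parameter here only so the RED and GREEN states fit in one runnable snippet.

```python
def word_count_buggy(text: str) -> int:
    # Hypothetical reported bug: splitting on a single space counts the
    # empty strings produced by leading, trailing, or repeated spaces.
    return len(text.split(" "))


def word_count(text: str) -> int:
    # Fix: split() with no separator splits on runs of whitespace and
    # never returns empty strings.
    return len(text.split())


def regression_test_passes(impl) -> bool:
    """Hypothetical regression test reproducing the report."""
    return impl("") == 0 and impl("  two   words ") == 2


# RED: the regression test must fail against the buggy code first...
assert not regression_test_passes(word_count_buggy)
# ...GREEN: and pass once the fix is in place.
assert regression_test_passes(word_count)
```

Keeping the exact reported input (`"  two   words "`) in the test is what makes it a reproduction rather than a generic unit test.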

## Testing Anti-Patterns

When adding or changing tests, especially with mocks, review
`testing-anti-patterns.md`.

When baseline failures or flaky tests appear during verification, review
`verification-baselines.md`.

## Verification Checklist

Before marking work complete:

- [ ] Every behavior change has a test
- [ ] Each new test was observed failing first
- [ ] Failures happened for the expected reason
- [ ] Implementation was minimal for each GREEN step
- [ ] All relevant tests are passing
- [ ] Refactors preserved behavior
- [ ] Edge cases and error paths were covered

If any box is unchecked, the workflow is incomplete.

## Bottom Line

```text
Test first. Fail first. Minimal pass. Refactor safely.
No failing test first means no TDD.
```
@@ -0,0 +1,11 @@
interface:
  display_name: "Test-Driven Development"
  short_description: "Enforce strict test-first red-green-refactor"
  default_prompt: >
    Use $test-driven-development for strict test-first TDD: verify RED before
    production code, remove any premature implementation from the delivery path,
    allow config-only exceptions only with explicit user confirmation for that
    exact change, verify GREEN with targeted plus relevant related suites,
    handle flaky baselines by matching test IDs and error signatures, treat new
    failures as regressions, refactor only while tests stay green, and apply
    testing anti-pattern guardrails when mocks are involved.
@@ -0,0 +1,162 @@
# Testing Anti-Patterns

Load this reference when writing tests, adding mocks, or considering test-only
helpers in production code.

## Overview

Tests should verify behavior that matters to users and calling code. Mocks are
tools for isolation, not the target of assertions.

**Core principle:** Test real behavior, not mock behavior.

## Iron Rules

```text
1. Never test mock existence instead of behavior.
2. Never add test-only methods to production APIs.
3. Never mock dependencies you do not understand.
4. Do not use partial mocks for structured payloads unless fixture builders
   provide all required fields by default.
```

## Anti-Pattern 1: Testing Mock Behavior

Problem:

- assertions target `*-mock` elements or fake methods
- tests pass when mock is present, not when behavior is correct

Fix:

- assert externally visible behavior
- use real collaborators when practical
- if mocking is required, assert the unit output, not mock rendering details

Gate check before adding assertions:

```text
Am I checking behavior of the system under test,
or only confirming a mock exists?
```

If it is only mock existence, rewrite the test.
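
The difference is easiest to see side by side. The sketch below uses Python's standard `unittest.mock.MagicMock`; the unit `greeting_for` and its repository are hypothetical. The bad test passes even though the greeting contains a mock's repr instead of a name, because it only confirms the mock was called. The good test pins the output callers actually see. Where practical, a real or in-memory repository would be better still.

```python
from unittest.mock import MagicMock


def greeting_for(user_id: int, repo) -> str:
    """Hypothetical system under test: builds a greeting from a repo lookup."""
    user = repo.get_user(user_id)
    return f"Hello, {user['name']}!"


def test_bad_only_confirms_the_mock():
    repo = MagicMock()
    greeting_for(1, repo)
    # Anti-pattern: passes even though the greeting contains a MagicMock
    # repr instead of a name. It checks the mock, not behavior.
    repo.get_user.assert_called_once_with(1)


def test_good_asserts_the_unit_output():
    repo = MagicMock()
    repo.get_user.return_value = {"name": "Ada"}
    # Assert what callers actually observe.
    assert greeting_for(1, repo) == "Hello, Ada!"


test_bad_only_confirms_the_mock()
test_good_asserts_the_unit_output()
```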

## Anti-Pattern 2: Test-Only Methods in Production Code

Problem:

- production classes gain methods used only by tests
- public APIs become polluted with lifecycle code that belongs in test tooling

Fix:

- move cleanup/setup behavior to test utilities
- keep production APIs focused on production use cases

Gate check before adding a method:

```text
Is this method required by production behavior,
or only by test setup/cleanup?
```

If only tests need it, place it in test support code.
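
One common way to keep cleanup out of production code is to get isolation from a fresh instance per test. In this sketch `SessionStore` and `fresh_store` are hypothetical; with pytest, `fresh_store` would typically become a fixture in test support code.

```python
from typing import Dict, Optional


class SessionStore:
    """Hypothetical production class: public API covers production use only."""

    def __init__(self) -> None:
        self._sessions: Dict[str, str] = {}

    def create(self, session_id: str, user: str) -> None:
        self._sessions[session_id] = user

    def user_for(self, session_id: str) -> Optional[str]:
        return self._sessions.get(session_id)

    # Deliberately absent: a `reset_for_tests()` method. Cleanup belongs in
    # test support code, not in the production API.


# --- test support code (would live under tests/, not in production) ---
def fresh_store() -> SessionStore:
    """Isolation comes from a new instance per test, not a reset hook."""
    return SessionStore()


def test_created_session_resolves_to_user():
    store = fresh_store()
    store.create("s1", "ada")
    assert store.user_for("s1") == "ada"


def test_unknown_session_resolves_to_none():
    assert fresh_store().user_for("s1") is None


test_created_session_resolves_to_user()
test_unknown_session_resolves_to_none()
```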

## Anti-Pattern 3: Mocking Without Understanding Dependencies

Problem:

- high-level methods are mocked blindly
- required side effects are accidentally removed
- tests fail or pass for the wrong reasons

Fix:

1. Run the test against real code first.
2. Identify which dependency is slow, flaky, or external.
3. Mock only that boundary.
4. Preserve side effects the test relies on.
5. If dependency behavior is unclear, map required side effects before mocking.

Red flags:

- "I will mock this to be safe"
- "I do not know what this dependency does"
- mock setup is longer than the test behavior itself
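
Mocking only the boundary can be as simple as injecting it. In this sketch the hypothetical `make_receipt` takes the clock as a parameter, so the test pins time (the nondeterministic, external part) while the totalling logic runs unmocked. Production code would pass something like `now=lambda: datetime.now(timezone.utc)`.

```python
from datetime import datetime, timezone
from typing import Callable, List, Tuple


def make_receipt(items: List[Tuple[str, int]], now: Callable[[], datetime]) -> dict:
    """Hypothetical unit: real totalling logic plus one external boundary, the clock."""
    total_cents = sum(price_cents for _, price_cents in items)
    return {
        "total": f"{total_cents / 100:.2f}",
        "issued_at": now().isoformat(),
    }


def test_receipt_totals_items_and_stamps_issue_time():
    fixed = datetime(2024, 1, 2, 3, 4, 5, tzinfo=timezone.utc)
    # Only the nondeterministic boundary (time) is replaced; the totalling
    # logic runs for real, so the test still proves behavior.
    receipt = make_receipt([("tea", 250), ("cake", 425)], now=lambda: fixed)
    assert receipt == {"total": "6.75", "issued_at": "2024-01-02T03:04:05+00:00"}


test_receipt_totals_items_and_stamps_issue_time()
```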

## Anti-Pattern 4: Incomplete Data Mocks

Problem:

- mocked payloads include only fields used by the current assertion
- downstream code later depends on missing fields

Fix:

- include all required fields, metadata, and nested defaults from real schemas
- prefer fixture builders/factories so tests override only fields under test
- centralize canonical fixtures when the structure is reused

Gate check:

```text
Does this mock represent the real schema,
or only the subset I remembered?
```
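
A fixture builder can be very small. In the sketch below, `CANONICAL_USER` and `build_user` are hypothetical, and the field set stands in for whatever your real schema requires. Tests override only the fields they care about, and `copy.deepcopy` keeps one test's mutations from leaking into another. Note that overrides are top-level: passing `profile=...` replaces the whole nested dict.

```python
import copy

# Hypothetical canonical fixture: mirrors every required field of the real
# payload schema, including nested defaults.
CANONICAL_USER = {
    "id": 1,
    "name": "Ada Lovelace",
    "email": "ada@example.com",
    "roles": ["member"],
    "profile": {"locale": "en-GB", "timezone": "UTC"},
    "metadata": {"created_at": "2024-01-01T00:00:00Z", "version": 1},
}


def build_user(**overrides):
    """Return a complete user payload; tests override only what they test."""
    user = copy.deepcopy(CANONICAL_USER)  # avoid leaking mutations between tests
    user.update(overrides)  # top-level override; nested dicts are replaced whole
    return user


admin = build_user(roles=["admin"])
print(admin["roles"], admin["profile"]["locale"])  # ['admin'] en-GB
```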

## Anti-Pattern 5: Tests as a Final Step

Problem:

- implementation is marked "done"
- tests are added afterward for confirmation

Fix:

- return to strict RED-GREEN-REFACTOR
- write the failing test first for every behavior change

If a test did not fail first, it did not drive the implementation.

## When Mocks Become Too Complex

Warning signs:

- mock setup is longer than the test behavior
- multiple layers of mocks or fakes are required just to run the test
- the test fails when mock internals change but behavior stays the same
- you cannot explain why each mock is needed

Response:

- prefer an integration test with real collaborators
- mock only true external boundaries (network, process, filesystem, time)
- keep assertions focused on observable behavior

Gate check:

```text
Are mocks helping isolate the behavior,
or replacing the behavior under test?
```

If mocks are replacing behavior, reduce mocking or move to an integration test.

## Quick Diagnostic

If any answer is "yes", revisit your test design:

- Are assertions mostly about mocks?
- Are you introducing test-only production methods?
- Are you mocking dependencies without mapping side effects?
- Are your mocked payloads incomplete vs real schemas?
- Is mock setup larger than the behavior under test?
- Were tests added only after implementation finished?

## Bottom Line

```text
Use mocks to isolate boundaries, not to fake confidence.
Behavior-first tests + strict TDD prevent most test anti-patterns.
```
@@ -0,0 +1,42 @@
# Verification Baselines and Flaky Triage

Load this reference when baseline failures exist before coding, or when
verification fails during Verify GREEN.

## Core Rule

Compare like-for-like test results using identical commands. Never re-label new
or drifted failures as "known flaky."

## 1) Capture Baseline Before Coding

- run the exact verification commands you plan to use after changes
- record failing test IDs and error signatures
- keep enough evidence to compare later (command + failing IDs + signature text)

## 2) Compare Post-Change Results

Run the same commands and classify each failure:

- **baseline match:** test ID and error signature match baseline evidence
  exactly; document and continue
- **new or drifted failure:** new failing test ID or changed signature; treat it
  as a regression and fix it before proceeding
- **infrastructure failure:** runner outage, dependency timeout, or environment
  instability; re-run once, and if it still fails and is clearly unrelated to
  the changed paths, document the evidence and ask the user before completion
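
The first two classifications are mechanical once baseline evidence is recorded as "failing test ID to error signature". The sketch below assumes that shape (the pytest-style node IDs and messages are made up) and leaves infrastructure failures out, since those need human judgment.

```python
from typing import Dict


def classify_failures(baseline: Dict[str, str], current: Dict[str, str]) -> Dict[str, str]:
    """Hypothetical helper: classify each post-change failure against baseline.

    Both dicts map failing test ID -> error signature and must come from the
    *same* verification command. Infrastructure failures are not modeled.
    """
    result = {}
    for test_id, signature in current.items():
        if test_id not in baseline:
            result[test_id] = "new failure: regression, fix before proceeding"
        elif baseline[test_id] != signature:
            result[test_id] = "drifted failure: regression, fix before proceeding"
        else:
            result[test_id] = "baseline match: document and continue"
    return result


baseline = {"tests/test_api.py::test_upstream_timeout": "TimeoutError: read timed out"}
after_change = {
    "tests/test_api.py::test_upstream_timeout": "TimeoutError: read timed out",
    "tests/test_cart.py::test_total": "AssertionError: assert 670 == 675",
}
for test_id, verdict in classify_failures(baseline, after_change).items():
    print(test_id, "->", verdict)
```

Exact string equality on signatures is intentionally strict: a "similar" message is still drift, matching the rule against relabeling.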

## 3) Guardrails

- use identical verification commands for baseline and post-change comparison
- do not downgrade new failures to "already flaky"
- if your change touched a flaky test, stabilize it now or quarantine it with a
  documented follow-up issue before completion
- if you must change verification commands, recapture baseline evidence first

## Bottom Line

```text
Baseline evidence protects trust in GREEN. Same command, same comparison,
no relabeling.
```
@@ -0,0 +1,169 @@
---
name: writing-plans
description: Use when you have a spec or requirements for a multi-step task, before touching code.
metadata:
  short-description: Build complete implementation plans
---

# Writing Plans

## Overview

Write implementation-ready plans for a low-context engineer. Document the file map, task sequence, relevant docs, constraints, and verification needed to execute safely without guessing. Keep the plan concrete, but do not pre-write the full implementation unless the shape is fragile or non-obvious. Prefer plans that assume delegated execution via subagents when practical. DRY. YAGNI. TDD. Use frequent, atomic commits when the workflow calls for them.

Assume the engineer is a skilled developer who knows almost nothing about the toolset or problem domain.

**Announce at start:** "I'm using the writing-plans skill to create the implementation plan."

**Context:** When planning, include implementation workflow guidance where useful (often a new branch or isolated worktree, atomic commits, and a PR at the end). If repo conventions or explicit user instructions call for another approach, follow those instead.

## Tool Compatibility

- Keep instructions tool-agnostic and avoid provider-specific wording.
- When behavior differs across tools, resolve conflicts in this order: OpenCode > Claude Code > Codex CLI > Gemini CLI.

## Save Plans To

- Prefer `.docs/plans/NNNN--kebab-case-plan-name.md` (for example, `.docs/plans/0001--add-bulk-export-plan.md`).
- Before choosing the path, inspect the project for an existing structure and respect it if one is already in use, for example `docs/superpowers/plans/`, `.docs/plans/`, or another documented path.
- User preferences for plan location override this default.
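
Picking the next number is easy to get wrong by hand. Here is a minimal sketch of a hypothetical helper, `next_plan_filename`, that assumes the `NNNN--kebab-case-plan-name.md` convention above and receives the existing file names from the caller (listing the directory is left out so the function stays pure). It ignores files without a four-digit `NNNN--` prefix and does not handle a project that uses a different structure.

```python
import re
from typing import List


def next_plan_filename(existing_names: List[str], plan_name: str) -> str:
    """Hypothetical helper: next `NNNN--kebab-case-plan-name.md` file name.

    `existing_names` are file names already in the plans directory; names
    without a four-digit `NNNN--` prefix are ignored.
    """
    numbers = [
        int(m.group(1))
        for name in existing_names
        if (m := re.match(r"(\d{4})--", name))
    ]
    # Kebab-case: lowercase, and collapse each run of other characters to "-".
    slug = re.sub(r"[^a-z0-9]+", "-", plan_name.lower()).strip("-")
    return f"{max(numbers, default=0) + 1:04d}--{slug}.md"


print(next_plan_filename([], "Add bulk export plan"))
# 0001--add-bulk-export-plan.md
print(next_plan_filename(["0001--add-bulk-export-plan.md", "README.md"], "Fix login redirect plan"))
# 0002--fix-login-redirect-plan.md
```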

## Status

- Every plan must include `**Status:** Draft` near the top.
- Valid statuses for spec and plan docs are `Draft`, `Approved`, `In Progress`, and `Implemented`.
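
A handoff step can check the status line mechanically before acting on a plan. In this sketch `read_status` is hypothetical, and "near the top" is assumed to mean the first ten lines. It rejects values outside the four valid statuses rather than guessing.

```python
import re

VALID_STATUSES = ("Draft", "Approved", "In Progress", "Implemented")


def read_status(doc: str, max_lines: int = 10) -> str:
    """Hypothetical helper: return the `**Status:**` value near the top of a doc.

    "Near the top" is assumed to mean the first `max_lines` lines.
    """
    for line in doc.splitlines()[:max_lines]:
        m = re.match(r"\*\*Status:\*\*\s*(.+?)\s*$", line)
        if m:
            status = m.group(1)
            if status not in VALID_STATUSES:
                raise ValueError(f"invalid status {status!r}; expected one of {VALID_STATUSES}")
            return status
    raise ValueError("no **Status:** line near the top of the document")


plan = "# Bulk Export Implementation Plan\n\n**Status:** Draft\n"
print(read_status(plan))  # Draft
```

An execution step could then refuse to start unless `read_status(plan_text) == "Approved"`, unless the user has explicitly said otherwise.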

## Approval Gate

- Prefer writing plans from a spec documented as `Approved`.
- If no approved spec is available, write from explicit user-provided requirements when the user asks you to proceed.
- Never mark a spec or plan `Approved` on your own. The user must update it, or you may update it only if they explicitly ask you to.

## Scope Check

If the spec covers multiple independent subsystems, it should have been broken into sub-project specs during brainstorming. If it was not, suggest breaking this into separate plans, one per subsystem. Each plan should produce working, testable software on its own.

## File Structure

Before defining tasks, map out which files will be created or modified and what each one is responsible for. This is where decomposition decisions get locked in.

- Design units with clear boundaries and well-defined interfaces. Each file should have one clear responsibility.
- You reason best about code you can hold in context at once, and your edits are more reliable when files are focused. Prefer smaller, focused files over large ones that do too much.
- Files that change together should live together. Split by responsibility, not by technical layer.
- In existing codebases, follow established patterns. If the codebase uses large files, do not unilaterally restructure, but if a file you are modifying has grown unwieldy, including a split in the plan is reasonable.

This structure informs the task decomposition. Each task should produce self-contained changes that make sense independently.

## Bite-Sized Task Granularity

**Each task should be a coherent, verifiable unit of work:**

- Split into smaller steps only when sequencing, validation, or rollback boundaries are fragile.
- Keep tightly coupled work together instead of exploding it into artificial micro-steps.
- Include commit guidance only when the workflow or user explicitly calls for it, or when a milestone boundary matters.

## Plan Document Header

**Every plan MUST start with this header:**

```markdown
# [Feature Name] Implementation Plan

**Status:** Draft

> **For agentic workers:** Optional sub-skill: use `executing-plans` to implement this plan task-by-task when delegated execution is preferred.
> Delegated and inline execution are both valid. When useful, default git flow is a new branch, atomic commits, and a PR at the end unless repo conventions or explicit user instructions say otherwise.

**Goal:** [One sentence describing what this builds]
**Architecture:** [2-3 sentences about approach]
**Tech Stack:** [Key technologies/libraries]

---
```

Steps use checkbox (`- [ ]`) syntax for tracking.

## Task Structure

```markdown
### Task N: [Component Name]

**Goal:** [What this task completes]

**Files:**

- Create: `exact/path/to/file.py`
- Modify: `exact/path/to/existing.py`
- Test: `tests/exact/path/to/test.py`

- [ ] **Implement the task**

Implementation notes:

- [Key behavior, interfaces, constraints, and patterns to follow]
- [Reference an earlier helper or task when relevant, and restate what changes]

- [ ] **Verify the task**

Run: `pytest tests/path/test.py -v`
Expected: PASS
```

## No Placeholders

Every step must contain the actual content an engineer needs.

These are **plan failures** and must never appear:

- "TBD", "TODO", "implement later", "fill in details"
- "Add appropriate error handling" / "add validation" / "handle edge cases"
- "Write tests for the above" without stating what behavior to verify and how to run it
- "Similar to Task N" or "same as above" without pointing to the earlier task or helper and stating what changes
- Steps that describe what to do without giving enough detail to execute safely
- References to types, functions, or methods not defined anywhere in the plan or spec
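
The phrase-based items in this list lend themselves to a quick automated scan before the manual self-review. The sketch below is a hypothetical aid, not a verdict: `PLACEHOLDER_PATTERNS` and `find_placeholders` are illustrative names, the patterns only cover the literal phrases, and a hit such as "Similar to Task 1" still needs a human to check whether it names the earlier task and states what changes. The last two items (vague steps, undefined references) cannot be caught this way.

```python
import re
from typing import List, Tuple

# Hypothetical patterns drawn from the phrase-based items above.
PLACEHOLDER_PATTERNS = [
    r"\bTBD\b",
    r"\bTODO\b",
    r"implement later",
    r"fill in details",
    r"add appropriate error handling",
    r"add validation",
    r"handle edge cases",
    r"similar to task \w+",
    r"same as above",
]


def find_placeholders(plan: str) -> List[Tuple[int, str]]:
    """Return (line number, line) pairs that match any placeholder pattern."""
    hits = []
    for lineno, line in enumerate(plan.splitlines(), start=1):
        if any(re.search(p, line, re.IGNORECASE) for p in PLACEHOLDER_PATTERNS):
            hits.append((lineno, line.strip()))
    return hits


draft = "### Task 1\n- Add `/export` endpoint\n- TODO: error codes\n### Task 2\nSimilar to Task 1\n"
for lineno, line in find_placeholders(draft):
    print(f"line {lineno}: {line}")
```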

## Remember

- Prefer exact file paths when known. If they are not yet certain, identify the owning area and the discovery needed before editing.
- Make the plan concrete enough to execute without guesswork. Include exact code only when interface shape, migrations, commands, or test structure are non-obvious.
- Include exact commands with expected output when verification matters.
- Apply DRY, YAGNI, TDD, and milestone-based commit guidance when the workflow calls for it.

## Self-Review

After writing the complete plan, look at the spec with fresh eyes and check the plan against it.

This is a checklist you run yourself, not a subagent dispatch.

1. **Spec coverage:** Skim each section and requirement in the spec. Can you point to a task that implements it? List any gaps.
2. **Placeholder scan:** Search your plan for red flags: any of the patterns from the "No Placeholders" section above. Fix them.
3. **Type consistency:** Do the types, method signatures, and property names you used in later tasks match what you defined in earlier tasks? A function called `clearLayers()` in Task 3 but `clearFullLayers()` in Task 7 is a bug.

If you find issues, fix them inline. There is no need to re-review; just fix and move on. If you find a spec requirement with no task, add the task.

## User Review Gate

After saving the plan and completing self-review, ask the user to review it before execution.

If they approve in chat but the document is still `Draft`, ask whether they want to update it themselves or want you to update it.

Only hand off to `executing-plans` once the plan document says `Approved`, unless the user explicitly instructs you to execute without documented approval.

## Execution Handoff

After the plan is documented as `Approved`, offer the execution choice:

> "Plan complete and saved to `[PLAN_FILE_PATH]` with status `Approved`.
>
> Two execution options:
>
> 1. Delegated execution
>    - Default when subagents are available and the tasks can be split cleanly
>    - Use `executing-plans` with fresh workers per task or independent workstream
> 2. Inline execution
>    - Use when the work is tightly coupled, subagents are unavailable, or you explicitly prefer inline
>    - Follow the same plan and verification steps in this session
>
> Default implementation workflow is a new branch, atomic commits, and a PR at the end unless repo conventions or your instructions say otherwise.
>
> Which approach?"
@@ -0,0 +1,4 @@
interface:
  display_name: "Writing Plans"
  short_description: "Turn specs into detailed implementation plans"
  default_prompt: "Use $writing-plans to turn an approved spec (preferred), or explicit user requirements when no approved spec exists or the user asks to proceed, into a direct, complete implementation plan with a status, bite-sized tasks, exact test steps, optional subagent-assisted execution guidance, and the usual new-branch/atomic-commit/PR workflow, then wait for documented approval before execution unless the user explicitly says otherwise."