@anionzo/skill 1.4.0 → 1.6.0
- package/CONTRIBUTING.md +2 -1
- package/README.md +21 -9
- package/docs/design-brief.md +19 -13
- package/i18n/CONTRIBUTING.vi.md +2 -1
- package/i18n/README.vi.md +21 -9
- package/i18n/design-brief.vi.md +19 -13
- package/knowledge/global/skill-triggering-rules.md +2 -1
- package/package.json +1 -1
- package/skills/brainstorming/SKILL.md +176 -13
- package/skills/brainstorming/meta.yaml +19 -10
- package/skills/code-review/SKILL.md +214 -19
- package/skills/code-review/meta.yaml +21 -9
- package/skills/commit/SKILL.md +187 -0
- package/skills/commit/examples.md +62 -0
- package/skills/commit/meta.yaml +30 -0
- package/skills/commit/references/output-template.md +14 -0
- package/skills/debug/SKILL.md +252 -0
- package/skills/debug/examples.md +83 -0
- package/skills/debug/meta.yaml +38 -0
- package/skills/debug/references/output-template.md +16 -0
- package/skills/docs-writer/SKILL.md +85 -10
- package/skills/docs-writer/meta.yaml +16 -12
- package/skills/extract/SKILL.md +161 -0
- package/skills/extract/examples.md +47 -0
- package/skills/extract/meta.yaml +27 -0
- package/skills/extract/references/output-template.md +24 -0
- package/skills/feature-delivery/SKILL.md +10 -5
- package/skills/feature-delivery/meta.yaml +5 -0
- package/skills/go-pipeline/SKILL.md +156 -0
- package/skills/go-pipeline/examples.md +56 -0
- package/skills/go-pipeline/meta.yaml +27 -0
- package/skills/go-pipeline/references/output-template.md +17 -0
- package/skills/planning/SKILL.md +128 -17
- package/skills/planning/meta.yaml +15 -6
- package/skills/refactor-safe/SKILL.md +10 -7
- package/skills/repo-onboarding/SKILL.md +11 -7
- package/skills/repo-onboarding/meta.yaml +2 -0
- package/skills/research/SKILL.md +100 -0
- package/skills/research/examples.md +79 -0
- package/skills/research/meta.yaml +27 -0
- package/skills/research/references/output-template.md +23 -0
- package/skills/test-driven-development/SKILL.md +194 -0
- package/skills/test-driven-development/examples.md +77 -0
- package/skills/test-driven-development/meta.yaml +31 -0
- package/skills/test-driven-development/references/.gitkeep +0 -0
- package/skills/test-driven-development/references/output-template.md +31 -0
- package/skills/using-skills/SKILL.md +32 -14
- package/skills/using-skills/examples.md +3 -3
- package/skills/using-skills/meta.yaml +8 -3
- package/skills/verification-before-completion/SKILL.md +127 -13
- package/skills/verification-before-completion/meta.yaml +24 -14
- package/templates/SKILL.md +8 -1
- package/skills/bug-triage/SKILL.md +0 -47
- package/skills/bug-triage/examples.md +0 -68
- package/skills/bug-triage/meta.yaml +0 -25
- package/skills/bug-triage/references/output-template.md +0 -26
@@ -0,0 +1,194 @@
+# Test-Driven Development
+
+## Purpose
+
+Enforce the discipline of writing a failing test before writing production code. This skill exists because tests written after implementation pass immediately, proving nothing — they verify what you built, not what was required.
+
+## When To Use
+
+Load this skill when:
+
+- implementing any new feature or behavior
+- fixing a bug (write a test that reproduces the bug first)
+- refactoring code that lacks test coverage
+- the user says "use TDD", "test first", or "red-green-refactor"
+
+Exceptions (confirm with the user first):
+
+- throwaway prototypes or spikes
+- generated code (codegen, scaffolding)
+- pure configuration files
+
+## The Iron Law
+
+```
+NO PRODUCTION CODE WITHOUT A FAILING TEST FIRST
+```
+
+Wrote code before the test? Delete it. Start over. Implement fresh from the test.
+
+- Do not keep it as "reference"
+- Do not "adapt" it while writing tests
+- Do not look at it while writing the test
+- Delete means delete
+
+## Workflow: Red-Green-Refactor
+
+### 1. RED — Write a Failing Test
+
+Write one minimal test that describes the behavior you want.
+
+Requirements:
+
+- Tests one behavior (if the test name contains "and", split it)
+- Clear name that describes expected behavior
+- Uses real code, not mocks (unless external dependency makes this impossible)
+- Asserts observable outcomes, not implementation details
+
+### 2. Verify RED — Watch It Fail
+
+Run the test. This step is mandatory — never skip it.
+
+```bash
+<test-command> <path-to-test-file>
+```
+
+Confirm:
+
+- The test fails (not errors due to syntax or import issues)
+- The failure message matches what you expect
+- It fails because the feature is missing, not because of a typo
+
+If the test passes immediately, you are testing existing behavior. Rewrite the test.
+
+If the test errors instead of failing, fix the error first, then re-run until it fails correctly.
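The Verify RED rules above separate three outcomes of the first run: pass, error, and failure. The sketch below encodes that decision under stated assumptions: the `TestRun` shape, the verdict names, and `verifyRed` are hypothetical and not part of the skill or of any test runner, and matching the failure message with a substring check is a simplification.

```typescript
// Hypothetical shape for the result of one test run (an assumption, not a runner API).
type TestRun =
  | { kind: 'passed' }
  | { kind: 'failed'; message: string } // an assertion did not hold
  | { kind: 'errored'; message: string }; // syntax, import, or setup problem

type RedVerdict =
  | 'proceed-to-green'
  | 'rewrite-test'
  | 'fix-error-and-rerun'
  | 'do-not-proceed';

// Decide what to do after the first run of a newly written test.
function verifyRed(run: TestRun, expectedFailure: string): RedVerdict {
  switch (run.kind) {
    case 'passed':
      // Passing immediately means the test covers existing behavior.
      return 'rewrite-test';
    case 'errored':
      // An error is not a valid RED: the test never reached its assertion.
      return 'fix-error-and-rerun';
    case 'failed':
      // Only a failure for the expected reason counts as RED.
      return run.message.includes(expectedFailure)
        ? 'proceed-to-green'
        : 'do-not-proceed';
  }
}
```

The `do-not-proceed` verdict mirrors the red flag listed later in the skill: a failure that cannot be explained is not a reason to move on to GREEN.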
+
+### 3. GREEN — Write Minimal Code to Pass
+
+Write the simplest code that makes the test pass. Nothing more.
+
+- Do not add features the test does not require
+- Do not refactor other code
+- Do not "improve" beyond what the test demands
+- YAGNI — You Aren't Gonna Need It
+
+### 4. Verify GREEN — Watch It Pass
+
+Run the test again. This step is mandatory.
+
+```bash
+<test-command> <path-to-test-file>
+```
+
+Confirm:
+
+- The new test passes
+- All other tests still pass
+- No warnings or errors in the output
+
+If the test still fails, fix the code — not the test.
+
+If other tests broke, fix them now before moving on.
+
+### 5. REFACTOR — Clean Up (Tests Must Stay Green)
+
+After green only:
+
+- Remove duplication
+- Improve names
+- Extract helpers or shared utilities
+
+Run tests after every refactor change. If any test fails, undo the refactor and try again.
+
+Do not add new behavior during refactor. Refactor changes structure, not behavior.
+
+### 6. Repeat
+
+Next failing test for the next behavior. One cycle at a time.
+
+## Test Quality Checklist
+
+| Quality | Good | Bad |
+|---------|------|-----|
+| **Minimal** | Tests one thing | "validates email and domain and whitespace" |
+| **Clear name** | Describes expected behavior | "test1", "it works" |
+| **Shows intent** | Demonstrates desired API usage | Tests internal implementation details |
+| **Real code** | Calls actual functions | Mocks everything, tests mock behavior |
+| **Observable** | Asserts return values or side effects | Asserts internal state or call counts |
+
+## When Tests Are Hard to Write
+
+| Problem | What It Means | Action |
+|---------|---------------|--------|
+| Cannot figure out how to test | Design is unclear | Write the API you wish existed first, then assert on it |
+| Test is too complicated | Code design is too complicated | Simplify the interface |
+| Must mock everything | Code is too tightly coupled | Use dependency injection, reduce coupling |
+| Test setup is enormous | Too many dependencies | Extract helpers — if still complex, simplify the design |
+
+Hard-to-test code is hard-to-use code. Listen to what the test is telling you about the design.
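The "Must mock everything" row above recommends dependency injection. One possible illustration follows; the `Clock` interface, `isExpired`, and both clock objects are hypothetical names chosen for this sketch and do not come from the skill.

```typescript
// Hypothetical example: every name here is illustrative, not from the skill.
interface Clock {
  now(): number; // milliseconds since the Unix epoch
}

// The clock is passed in, so a test can supply a fixed one instead of mocking Date.
function isExpired(expiresAtMs: number, clock: Clock): boolean {
  return clock.now() >= expiresAtMs;
}

// Production wiring reads the real system time.
const systemClock: Clock = { now: () => Date.now() };

// Test wiring is a plain object with a fixed time; no mocking library is needed.
const fixedClock: Clock = { now: () => 1000 };
```

With `fixedClock`, `isExpired(999, fixedClock)` is `true` and `isExpired(1001, fixedClock)` is `false`, so the test asserts a return value rather than a call count.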
+
+## Bug Fix Protocol
+
+Every bug fix follows TDD:
+
+1. **RED** — write a test that reproduces the bug
+2. **Verify RED** — confirm the test fails with the bug present
+3. **GREEN** — implement the fix
+4. **Verify GREEN** — confirm the test passes and the bug is gone
+
+Never fix a bug without a test. The test proves the fix works and prevents regression.
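The four steps above can be sketched with the "empty email accepted" bug from the skill's examples. This is an illustration under assumptions: the validator names, the regular expression, and the shape of the bug are invented here, and the check is written as a plain function rather than in a specific test framework.

```typescript
// Illustrative pattern only; real email validation rules are an assumption here.
const EMAIL_PATTERN = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;

// Hypothetical buggy version: treats empty input as "nothing to validate".
function isValidEmailBuggy(input: string): boolean {
  return input === '' || EMAIL_PATTERN.test(input);
}

// Hypothetical fixed version: empty input falls through to the pattern and is rejected.
function isValidEmail(input: string): boolean {
  return EMAIL_PATTERN.test(input);
}

// The regression check takes the validator as a parameter, so the same check
// can be shown failing against the bug (RED) and passing after the fix (GREEN).
function rejectsEmptyEmail(validate: (input: string) => boolean): boolean {
  return validate('') === false;
}
```

Here `rejectsEmptyEmail(isValidEmailBuggy)` is `false` (Verify RED) and `rejectsEmptyEmail(isValidEmail)` is `true` (Verify GREEN). Keeping the check in the suite is what guards against the regression.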
+
+## Common Rationalizations
+
+| Excuse | Reality |
+|--------|---------|
+| "Too simple to test" | Simple code breaks. The test takes 30 seconds. |
+| "I'll write tests after" | Tests written after pass immediately and prove nothing. |
+| "Tests after achieve the same goals" | Tests-after verify "what does this do?" Tests-first define "what should this do?" |
+| "Already manually tested" | Manual testing is ad-hoc: no record, cannot re-run, easy to miss cases. |
+| "Deleting X hours of work is wasteful" | Sunk cost fallacy. Keeping unverified code is technical debt. |
+| "Keep as reference, write tests first" | You will adapt it instead of writing fresh. That is testing after. |
+| "Need to explore first" | Fine. Throw away the exploration. Start fresh with TDD. |
+| "TDD will slow me down" | TDD is faster than debugging after the fact. |
+| "This is different because..." | It is not. Delete the code. Start over with TDD. |
+
+## Output Format
+
+Present results using the Shared Output Contract:
+
+1. **Goal/Result** — what was implemented using TDD and the current cycle state
+2. **Key Details:**
+   - tests written (names and what they verify)
+   - RED/GREEN/REFACTOR status for each cycle
+   - any test that could not be written and why
+   - verification output (pass/fail counts)
+3. **Next Action** — continue with next RED cycle, or hand off:
+   - all tests green and feature complete → `verification-before-completion`
+   - needs broader review → `code-review`
+   - complex feature needs planning first → `planning`
+
+## Red Flags
+
+- writing production code before a failing test exists
+- test passes immediately on first run (testing existing behavior, not new behavior)
+- cannot explain why the test failed (do not proceed to GREEN)
+- adding features the test does not require during GREEN
+- adding behavior during REFACTOR
+- rationalizing "just this once" to skip the failing test step
+- mocking so heavily that the test verifies mock behavior, not real behavior
+- keeping pre-TDD code as "reference" instead of deleting it
+
+## Checklist
+
+- [ ] Every new function/method has a test that was written first
+- [ ] Watched each test fail before writing implementation
+- [ ] Each test failed for the expected reason (feature missing, not typo)
+- [ ] Wrote minimal code to pass each test (YAGNI)
+- [ ] All tests pass after each GREEN step
+- [ ] Refactoring did not break any tests
+- [ ] Tests use real code (mocks only when unavoidable)
+- [ ] Edge cases and error paths are covered
+
+## Done Criteria
+
+This skill is complete when all required behaviors have passing tests that were written before the implementation, the full test suite is green, and no production code exists without a corresponding test that was seen to fail first. If any rationalization was used to skip TDD, the skill is not complete — delete the code and start over.
@@ -0,0 +1,77 @@
+# Test-Driven Development — Examples
+
+## Example 1
+
+**User:** "Add email validation to the signup form"
+
+Expected routing:
+
+- task type: new feature with TDD
+- chosen skill: `test-driven-development`
+- planning required: yes, if multi-file
+- next step: write a failing test for email validation before any implementation
+
+## Example 2
+
+**User:** "Fix bug: empty email accepted by the form"
+
+Expected routing:
+
+- task type: bug fix with TDD
+- chosen skill: `test-driven-development` (via `debug`)
+- next step: write a test that reproduces the bug (empty email accepted), confirm it fails, then fix
+
+## Example 3
+
+**User:** "Refactor the auth module — add tests first since it has none"
+
+Expected routing:
+
+- task type: refactor with TDD safety net
+- chosen skill: `test-driven-development` + `refactor-safe`
+- next step: write characterization tests for existing behavior before refactoring
+
+## Red-Green-Refactor Cycle Example
+
+**Feature:** retry failed HTTP requests 3 times
+
+### RED
+
+```typescript
+test('retries failed operations 3 times', async () => {
+  let attempts = 0;
+  const operation = () => {
+    attempts++;
+    if (attempts < 3) throw new Error('fail');
+    return 'success';
+  };
+
+  const result = await retryOperation(operation);
+
+  expect(result).toBe('success');
+  expect(attempts).toBe(3);
+});
+```
+
+Run test → FAIL: `retryOperation is not defined`
+
+### GREEN
+
+```typescript
+async function retryOperation<T>(fn: () => T | Promise<T>): Promise<T> {
+  for (let i = 0; i < 3; i++) {
+    try {
+      return await fn();
+    } catch (e) {
+      if (i === 2) throw e;
+    }
+  }
+  throw new Error('unreachable');
+}
+```
+
+Run test → PASS
+
+### REFACTOR
+
+Extract magic number 3 into a constant if needed. Run tests → still PASS.
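The package's example stops at describing the REFACTOR step. One way that step might look is sketched below; the constant name `MAX_ATTEMPTS` and the `lastError` variable are assumptions, since the example names neither. Behavior is meant to stay identical to the GREEN version: three attempts, then the last error is rethrown.

```typescript
// Assumed name for the extracted constant; the package example does not name it.
const MAX_ATTEMPTS = 3;

async function retryOperation<T>(fn: () => T | Promise<T>): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= MAX_ATTEMPTS; attempt++) {
    try {
      return await fn();
    } catch (e) {
      lastError = e; // remember the failure and try again
    }
  }
  // Every attempt failed: rethrow the error from the final attempt.
  throw lastError;
}
```

The RED test from the example should pass unchanged against this version, which is the check that the refactor changed structure and not behavior.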
@@ -0,0 +1,31 @@
+name: test-driven-development
+version: 0.1.0
+category: quality
+summary: Enforce test-first discipline with red-green-refactor cycles, preventing production code without a failing test.
+summary_vi: "Thực thi kỷ luật test-first với chu trình red-green-refactor, không cho phép code production khi chưa có test fail."
+triggers:
+- implement with TDD
+- test first
+- red-green-refactor
+- write a failing test first
+- use TDD
+inputs:
+- feature or behavior to implement
+- existing test framework and conventions
+outputs:
+- failing tests written first
+- minimal implementation passing tests
+- refactored code with green tests
+- verification output
+constraints:
+- no production code without a failing test
+- delete code written before its test
+- one behavior per test
+- YAGNI during GREEN phase
+related_skills:
+- using-skills
+- planning
+- feature-delivery
+- debug
+- verification-before-completion
+- code-review
File without changes
@@ -0,0 +1,31 @@
+# TDD Cycle Output Template
+
+Use this template when reporting TDD progress so each cycle is explicit and verifiable.
+
+```markdown
+## Goal/Result
+
+[What behavior was implemented or fixed using TDD]
+
+## Key Details
+
+- Test name: `[exact test name]`
+- RED status: PASS / FAIL
+- RED evidence: `[command and failure reason]`
+- GREEN status: PASS / FAIL
+- GREEN evidence: `[command and pass/fail result]`
+- Refactor performed: yes / no
+- Notes: `[edge cases, blockers, or why a test could not be written]`
+
+## Next Action
+
+- Continue with next RED cycle
+- Or hand off to `verification-before-completion`
+```
+
+Checklist:
+
+- The test was written before production code
+- The RED failure was observed for the expected reason
+- The GREEN pass was observed with fresh output
+- Refactor did not add new behavior
@@ -22,7 +22,7 @@ Load this skill when:
 
 > `an:` stands for "activate now" — a shorthand to immediately load a specific skill.
 
-If the user types `an:<skill-name>` (for example `an:planning` or `an:
+If the user types `an:<skill-name>` (for example `an:planning` or `an:debug`), skip classification and load that skill immediately.
 
 **Rules:**
 
@@ -33,25 +33,31 @@ If the user types `an:<skill-name>` (for example `an:planning` or `an:bug-triage
 
 **Available skills:**
 
-- `an:brainstorming` —
+- `an:brainstorming` — explore ideas, lock decisions, optionally write a spec
 - `an:repo-onboarding` — understand an unfamiliar codebase
-- `an:
+- `an:research` — explore existing code and patterns before implementing
+- `an:planning` — create an execution-ready plan with bite-sized steps
 - `an:feature-delivery` — implement a feature
-- `an:
+- `an:test-driven-development` — implement with TDD (red-green-refactor)
+- `an:debug` — systematic 4-phase debugging: investigate, analyze, fix, learn
 - `an:refactor-safe` — restructure code without behavior change
 - `an:verification-before-completion` — verify before claiming done
-- `an:code-review` — review a diff or
+- `an:code-review` — review a diff/PR, or evaluate received feedback
+- `an:commit` — create a conventional commit with verification
 - `an:docs-writer` — update documentation
+- `an:extract` — extract patterns, decisions, and learnings from completed work
+- `an:go-pipeline` — execute a full spec-to-commit pipeline in one run
 
 ## Workflow
 
 1. Check for direct trigger: if the user typed `an:<skill-name>`, load that skill and skip to step 5.
 2. Classify the request into one of these modes:
-   - idea refinement or
+   - idea refinement, specification, or requirements definition
    - repo understanding
    - bug or regression investigation
    - planning and implementation
-   -
+   - test-driven implementation
+   - code review (giving or receiving)
    - documentation work
    - answer-only guidance
 3. Decide whether the task first needs brainstorming or can go straight to planning.
@@ -64,12 +70,19 @@ If the user types `an:<skill-name>` (for example `an:planning` or `an:bug-triage
 - `an:<skill-name>` (direct trigger) -> load the named skill immediately
 - vague feature idea, unclear goal, tradeoff exploration -> `brainstorming`, then `planning`
 - unfamiliar repo or missing context -> `repo-onboarding`
+- need to understand existing code before implementing -> `research`
+- complex feature needing requirements definition -> `brainstorming` (includes spec writing)
 - docs work in an unfamiliar repo -> `repo-onboarding` first, then `docs-writer`
-- bug report, error trace, failing test, regression -> `
+- bug report, error trace, failing test, regression -> `debug`
 - implement or change behavior -> `planning`, then `feature-delivery`
+- implement with TDD approach -> `planning`, then `test-driven-development`
 - refactor, restructure, extract, or migrate without behavior change -> `planning`, then `refactor-safe`
 - review diff, PR, or changed files -> `code-review`
+- respond to review feedback -> `code-review` (receiving mode)
+- ready to commit -> `commit`
 - update README, runbook, onboarding docs, API notes in a known repo -> `docs-writer`
+- extract learnings from completed work -> `extract`
+- execute an approved spec end-to-end -> `go-pipeline`
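The direct-trigger rule in this file (`an:<skill-name>` loads the named skill and skips classification) could be handled along these lines. This is a sketch under assumptions: `parseDirectTrigger` is a hypothetical helper, the skill text does not say how an unknown name should be treated (here it returns `null`), and requiring the trigger at the start of the message is a simplification.

```typescript
// Skill names copied from the "Available skills" list in version 1.6.0.
const SKILLS = [
  'brainstorming', 'repo-onboarding', 'research', 'planning',
  'feature-delivery', 'test-driven-development', 'debug', 'refactor-safe',
  'verification-before-completion', 'code-review', 'commit', 'docs-writer',
  'extract', 'go-pipeline',
] as const;

type Skill = (typeof SKILLS)[number];

// Returns the skill named by a direct trigger such as "an:debug", or null when
// the message has no trigger or names a skill that is not in the list.
function parseDirectTrigger(message: string): Skill | null {
  const match = /^an:([a-z-]+)/.exec(message.trim());
  if (match === null) return null;
  const name = match[1] ?? '';
  const known = (SKILLS as readonly string[]).indexOf(name) !== -1;
  return known ? (name as Skill) : null;
}
```

Under this sketch `parseDirectTrigger('an:debug')` returns `'debug'`, while `parseDirectTrigger('an:bug-triage')` returns `null`, because `bug-triage` is deleted in this release and no longer appears in the list.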
 
 ## Planning Rule
 
@@ -84,15 +97,19 @@ You may skip a separate planning step only when the change is clearly local, low
 
 ## Verification Rule
 
-Use `verification-before-completion` before any strong claim that work is done, fixed, passing, or ready.
+Use `verification-before-completion` before any strong claim that work is done, fixed, passing, or ready. No completion claims without fresh evidence.
 
 ## Output Format
 
-
-
-
-
--
+Present results using the Shared Output Contract:
+
+1. **Goal/Result** — the task classified and primary skill chosen
+2. **Key Details:**
+   - task type
+   - chosen primary skill
+   - whether planning is required
+   - key assumption or missing decision, if any
+3. **Next Action** — the immediate first step with the chosen skill
 
 ## Red Flags
 
@@ -102,6 +119,7 @@ Use `verification-before-completion` before any strong claim that work is done,
 - loading many skills at once without a clear reason
 - asking broad planning questions before checking if the task is already clear
 - forcing a feature workflow onto a review or docs task
+- skipping TDD when the user requested it
 
 ## Done Criteria
 
@@ -26,9 +26,9 @@ This login flow started failing after yesterday's deploy.
 Expected routing:
 
 - task type: bug investigation
-- chosen skill: `
-- planning required: maybe, after
-- next step:
+- chosen skill: `debug`
+- planning required: maybe, after diagnosis if the fix is not obviously local
+- next step: classify the issue and try to reproduce it
 
 ## Example 3
 
@@ -1,8 +1,8 @@
 name: using-skills
-version: 0.
+version: 0.3.0
 category: routing
 summary: Route a user request to the right primary skill and working mode before deeper work begins.
-summary_vi: Phân loại request và chọn đúng skill chính trước khi bắt đầu công việc sâu hơn.
+summary_vi: "Phân loại request và chọn đúng skill chính trước khi bắt đầu công việc sâu hơn."
 triggers:
 - start a new session
 - decide which skill to load
@@ -20,10 +20,15 @@ constraints:
 related_skills:
 - brainstorming
 - repo-onboarding
+- research
 - planning
--
+- test-driven-development
+- debug
 - feature-delivery
 - refactor-safe
 - code-review
+- commit
 - verification-before-completion
 - docs-writer
+- extract
+- go-pipeline
@@ -2,7 +2,15 @@
 
 ## Purpose
 
-Stop false completion claims by requiring fresh evidence before saying work is done, fixed, or passing.
+Stop false completion claims by requiring fresh evidence before saying work is done, fixed, or passing. Evidence before claims, always.
+
+## The Iron Law
+
+```
+NO COMPLETION CLAIMS WITHOUT FRESH VERIFICATION EVIDENCE
+```
+
+If you have not run the verification command in this response, you cannot claim it passes. No exceptions.
 
 ## When To Use
 
@@ -12,6 +20,37 @@ Load this skill when:
 - about to say tests or builds pass
 - about to mark work complete
 - about to commit, open a PR, or hand off finished work
+- verifying implementation against a spec's acceptance criteria
+- expressing satisfaction about work state ("done", "ready", "all good")
+
+## The Gate Function
+
+```
+BEFORE claiming any status or expressing satisfaction:
+
+1. IDENTIFY: What command proves this claim?
+2. RUN: Execute the FULL command (fresh, complete)
+3. READ: Full output, check exit code, count failures
+4. VERIFY: Does output confirm the claim?
+   - If NO: State actual status with evidence
+   - If YES: State claim WITH evidence
+5. ONLY THEN: Make the claim
+
+Skip any step = lying, not verifying
+```
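The gate above is written as prose steps. A rough sketch of the same decision as a function follows, assuming a hypothetical `VerificationRun` record (the skill does not define a data format) and using the three report states this skill lists in its workflow: verified, failed verification, and verification blocked.

```typescript
// Hypothetical record of one verification run; field names are assumptions.
interface VerificationRun {
  command: string; // the full command that was executed
  exitCode: number; // process exit code
  failures: number; // failures counted in the output
  fresh: boolean; // true only if the command ran in the current response
}

type GateResult =
  | { status: 'verified'; evidence: string }
  | { status: 'failed verification'; evidence: string }
  | { status: 'verification blocked'; reason: string };

// Apply the gate: no claim is produced without a fresh run and its evidence.
function gate(run: VerificationRun | null): GateResult {
  if (run === null || !run.fresh) {
    return { status: 'verification blocked', reason: 'no fresh verification run' };
  }
  const evidence = `${run.command}: exit ${run.exitCode}, ${run.failures} failures`;
  return run.exitCode === 0 && run.failures === 0
    ? { status: 'verified', evidence }
    : { status: 'failed verification', evidence };
}
```

Because every non-blocked result carries an `evidence` string, the claim and its proof travel together, which is the point of step 4.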
+
+## Forbidden Words
+
+Do not use these words in completion claims unless backed by fresh evidence run in the same response:
+
+- "should work now"
+- "probably fixed"
+- "seems to pass"
+- "looks correct"
+- "I'm confident"
+- "Great!", "Perfect!", "Done!" (before verification)
+
+Replace with evidence: "Tests pass (42/42, 0 failures)" or "Build exits 0."
 
 ## Workflow
 
@@ -19,28 +58,103 @@ Load this skill when:
 2. Identify the command, test, or check that proves that claim.
 3. Run the most relevant verification available.
 4. Read the actual result, not just the expectation.
-5.
-
-
--
-
+5. If spec-linked, verify acceptance criteria coverage.
+6. Check verification levels per deliverable.
+7. Report one of three states:
+   - **verified** — evidence confirms the claim
+   - **failed verification** — evidence contradicts the claim
+   - **verification blocked** — cannot verify, state what remains unproven
+
+## Verification Levels
+
+For each deliverable, verify at three levels:
+
+| Level | Check | Meaning |
+|-------|-------|---------|
+| **L1: EXISTS** | File/component/route exists | Created but unknown quality |
+| **L2: SUBSTANTIVE** | Not a stub (no `return null`, empty handlers, TODO-only) | Has real implementation |
+| **L3: WIRED** | Imported and used in the integration layer | Actually connected |
+
+Report status per deliverable:
+
+- L1+L2+L3: fully wired
+- L1+L2 only: created but not integrated (flag it)
+- L1 only (stub): exists but empty (blocks completion)
+- Missing: not found (blocks completion)
+
+## Spec Acceptance Criteria Coverage
+
+When work is linked to a spec, also verify:
+
+1. Map each acceptance criterion to its verification evidence.
+2. Report coverage:
+
+```
+AC Coverage
+===========
+- [x] AC-1: [description] — VERIFIED (test passes)
+- [x] AC-2: [description] — VERIFIED (manual check)
+- [ ] AC-3: [description] — NOT VERIFIED (no test exists)
+
+Coverage: 2/3 (67%)
+```
+
+3. Flag any AC that has no verification evidence.
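The coverage line in the report above is simple arithmetic: verified criteria divided by total criteria, shown as a whole percent. The sketch below works through it; the `AcceptanceCriterion` shape and function names are hypothetical, rounding to the nearest percent is inferred from the 2/3 → 67% sample, and reporting 0% for an empty list is an assumption.

```typescript
// Hypothetical shape for one acceptance criterion; field names are illustrative.
interface AcceptanceCriterion {
  id: string; // e.g. "AC-1"
  evidence: string | null; // e.g. "test passes"; null when nothing verifies it
}

interface CoverageSummary {
  verified: number;
  total: number;
  percent: number; // rounded to the nearest whole percent
  unverifiedIds: string[]; // criteria to flag (step 3)
}

function summarizeCoverage(criteria: AcceptanceCriterion[]): CoverageSummary {
  const unverifiedIds = criteria
    .filter((ac) => ac.evidence === null)
    .map((ac) => ac.id);
  const total = criteria.length;
  const verified = total - unverifiedIds.length;
  // Assumption: an empty criteria list is reported as 0%.
  const percent = total === 0 ? 0 : Math.round((verified / total) * 100);
  return { verified, total, percent, unverifiedIds };
}

function formatCoverage(summary: CoverageSummary): string {
  return `Coverage: ${summary.verified}/${summary.total} (${summary.percent}%)`;
}
```

For the three criteria in the sample report, this yields `Coverage: 2/3 (67%)` with `AC-3` as the only flagged id.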
+
+## Common Rationalizations
+
+| Excuse | Reality |
+|--------|---------|
+| "Should work now" | RUN the verification. |
+| "I'm confident" | Confidence is not evidence. |
+| "Just this once" | No exceptions. |
+| "Linter passed" | Linter is not compiler is not test suite. |
+| "Agent said success" | Verify independently. |
+| "Partial check is enough" | Partial proves nothing about the unchecked parts. |
+| "Different words so rule doesn't apply" | Spirit over letter. Any claim of success requires evidence. |
+| "I'm tired" | Exhaustion is not an excuse to ship broken code. |
 
 ## Output Format
 
-
-
-
-
--
+Present results using the Shared Output Contract:
+
+1. **Goal/Result** — what claim was checked and the verification status
+2. **Key Details:**
+   - claim being checked
+   - evidence run (exact command or check)
+   - result (pass/fail/blocked)
+   - verification level per deliverable (L1/L2/L3)
+   - AC coverage (if spec-linked)
+   - remaining uncertainty, if any
+3. **Next Action:**
+   - if verified → `code-review` or `commit`
+   - if failed → back to `feature-delivery` or `debug`
+   - if blocked → state what is needed
 
 ## Red Flags
 
--
+- using any forbidden word without fresh evidence
+- saying "should work now"
 - treating code edits as proof
 - using stale test output as fresh evidence
 - extrapolating from a partial check to a broader claim
 - declaring success while verification is still blocked
+- marking stubs (L1 only) as complete
+- skipping AC coverage check for spec-linked work
+- expressing satisfaction before running verification
+
+## Checklist
+
+- [ ] Claim identified
+- [ ] Verification command/check identified
+- [ ] Verification run (fresh, not stale)
+- [ ] Actual result read (not assumed)
+- [ ] No forbidden words used without evidence
+- [ ] Verification levels checked per deliverable (L1/L2/L3)
+- [ ] AC coverage verified (if spec-linked)
+- [ ] Status reported (verified / failed / blocked)
+- [ ] Remaining uncertainty stated
 
 ## Done Criteria
 
-This skill is complete when the claim is either backed by fresh evidence or explicitly marked as unverified with the blocker stated. If verification passes and a review is warranted, hand off to `code-review`.
+This skill is complete when the claim is either backed by fresh evidence or explicitly marked as unverified with the blocker stated. If verification passes and a review is warranted, hand off to `code-review`. If spec-linked, AC coverage must be reported.