@anionzo/skill 1.3.0 → 1.6.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (57)
  1. package/CONTRIBUTING.md +2 -1
  2. package/README.md +29 -10
  3. package/docs/design-brief.md +19 -13
  4. package/i18n/CONTRIBUTING.vi.md +2 -1
  5. package/i18n/README.vi.md +29 -10
  6. package/i18n/design-brief.vi.md +19 -13
  7. package/knowledge/global/skill-triggering-rules.md +2 -1
  8. package/package.json +1 -1
  9. package/scripts/install-opencode-skills +161 -12
  10. package/skills/brainstorming/SKILL.md +176 -13
  11. package/skills/brainstorming/meta.yaml +19 -10
  12. package/skills/code-review/SKILL.md +214 -19
  13. package/skills/code-review/meta.yaml +21 -9
  14. package/skills/commit/SKILL.md +187 -0
  15. package/skills/commit/examples.md +62 -0
  16. package/skills/commit/meta.yaml +30 -0
  17. package/skills/commit/references/output-template.md +14 -0
  18. package/skills/debug/SKILL.md +252 -0
  19. package/skills/debug/examples.md +83 -0
  20. package/skills/debug/meta.yaml +38 -0
  21. package/skills/debug/references/output-template.md +16 -0
  22. package/skills/docs-writer/SKILL.md +85 -10
  23. package/skills/docs-writer/meta.yaml +16 -12
  24. package/skills/extract/SKILL.md +161 -0
  25. package/skills/extract/examples.md +47 -0
  26. package/skills/extract/meta.yaml +27 -0
  27. package/skills/extract/references/output-template.md +24 -0
  28. package/skills/feature-delivery/SKILL.md +10 -5
  29. package/skills/feature-delivery/meta.yaml +5 -0
  30. package/skills/go-pipeline/SKILL.md +156 -0
  31. package/skills/go-pipeline/examples.md +56 -0
  32. package/skills/go-pipeline/meta.yaml +27 -0
  33. package/skills/go-pipeline/references/output-template.md +17 -0
  34. package/skills/planning/SKILL.md +128 -17
  35. package/skills/planning/meta.yaml +15 -6
  36. package/skills/refactor-safe/SKILL.md +10 -7
  37. package/skills/repo-onboarding/SKILL.md +11 -7
  38. package/skills/repo-onboarding/meta.yaml +2 -0
  39. package/skills/research/SKILL.md +100 -0
  40. package/skills/research/examples.md +79 -0
  41. package/skills/research/meta.yaml +27 -0
  42. package/skills/research/references/output-template.md +23 -0
  43. package/skills/test-driven-development/SKILL.md +194 -0
  44. package/skills/test-driven-development/examples.md +77 -0
  45. package/skills/test-driven-development/meta.yaml +31 -0
  46. package/skills/test-driven-development/references/.gitkeep +0 -0
  47. package/skills/test-driven-development/references/output-template.md +31 -0
  48. package/skills/using-skills/SKILL.md +32 -14
  49. package/skills/using-skills/examples.md +3 -3
  50. package/skills/using-skills/meta.yaml +8 -3
  51. package/skills/verification-before-completion/SKILL.md +127 -13
  52. package/skills/verification-before-completion/meta.yaml +24 -14
  53. package/templates/SKILL.md +8 -1
  54. package/skills/bug-triage/SKILL.md +0 -47
  55. package/skills/bug-triage/examples.md +0 -68
  56. package/skills/bug-triage/meta.yaml +0 -25
  57. package/skills/bug-triage/references/output-template.md +0 -26
@@ -0,0 +1,194 @@
1
+ # Test-Driven Development
2
+
3
+ ## Purpose
4
+
5
+ Enforce the discipline of writing a failing test before writing production code. This skill exists because tests written after implementation pass immediately, proving nothing — they verify what you built, not what was required.
6
+
7
+ ## When To Use
8
+
9
+ Load this skill when:
10
+
11
+ - implementing any new feature or behavior
12
+ - fixing a bug (write a test that reproduces the bug first)
13
+ - refactoring code that lacks test coverage
14
+ - the user says "use TDD", "test first", or "red-green-refactor"
15
+
16
+ Exceptions (confirm with the user first):
17
+
18
+ - throwaway prototypes or spikes
19
+ - generated code (codegen, scaffolding)
20
+ - pure configuration files
21
+
22
+ ## The Iron Law
23
+
24
+ ```
25
+ NO PRODUCTION CODE WITHOUT A FAILING TEST FIRST
26
+ ```
27
+
28
+ Wrote code before the test? Delete it. Start over. Implement fresh from the test.
29
+
30
+ - Do not keep it as "reference"
31
+ - Do not "adapt" it while writing tests
32
+ - Do not look at it while writing the test
33
+ - Delete means delete
34
+
35
+ ## Workflow: Red-Green-Refactor
36
+
37
+ ### 1. RED — Write a Failing Test
38
+
39
+ Write one minimal test that describes the behavior you want.
40
+
41
+ Requirements:
42
+
43
+ - Tests one behavior (if the test name contains "and", split it)
44
+ - Clear name that describes expected behavior
45
+ - Uses real code, not mocks (unless an external dependency makes this impossible)
46
+ - Asserts observable outcomes, not implementation details
47
+
48
+ ### 2. Verify RED — Watch It Fail
49
+
50
+ Run the test. This step is mandatory — never skip it.
51
+
52
+ ```bash
53
+ <test-command> <path-to-test-file>
54
+ ```
55
+
56
+ Confirm:
57
+
58
+ - The test fails (rather than erroring out on a syntax or import problem)
59
+ - The failure message matches what you expect
60
+ - It fails because the feature is missing, not because of a typo
61
+
62
+ If the test passes immediately, you are testing existing behavior. Rewrite the test.
63
+
64
+ If the test errors instead of failing, fix the error first, then re-run until it fails correctly.
65
+
66
+ ### 3. GREEN — Write Minimal Code to Pass
67
+
68
+ Write the simplest code that makes the test pass. Nothing more.
69
+
70
+ - Do not add features the test does not require
71
+ - Do not refactor other code
72
+ - Do not "improve" beyond what the test demands
73
+ - YAGNI — You Aren't Gonna Need It
74
+
75
+ ### 4. Verify GREEN — Watch It Pass
76
+
77
+ Run the test again. This step is mandatory.
78
+
79
+ ```bash
80
+ <test-command> <path-to-test-file>
81
+ ```
82
+
83
+ Confirm:
84
+
85
+ - The new test passes
86
+ - All other tests still pass
87
+ - No warnings or errors in the output
88
+
89
+ If the test still fails, fix the code — not the test.
90
+
91
+ If other tests broke, fix them now before moving on.
92
+
93
+ ### 5. REFACTOR — Clean Up (Tests Must Stay Green)
94
+
95
+ After green only:
96
+
97
+ - Remove duplication
98
+ - Improve names
99
+ - Extract helpers or shared utilities
100
+
101
+ Run tests after every refactor change. If any test fails, undo the refactor and try again.
102
+
103
+ Do not add new behavior during refactor. Refactor changes structure, not behavior.
104
+
105
+ ### 6. Repeat
106
+
107
+ Next failing test for the next behavior. One cycle at a time.
108
+
109
+ ## Test Quality Checklist
110
+
111
+ | Quality | Good | Bad |
112
+ |---------|------|-----|
113
+ | **Minimal** | Tests one thing | "validates email and domain and whitespace" |
114
+ | **Clear name** | Describes expected behavior | "test1", "it works" |
115
+ | **Shows intent** | Demonstrates desired API usage | Tests internal implementation details |
116
+ | **Real code** | Calls actual functions | Mocks everything, tests mock behavior |
117
+ | **Observable** | Asserts return values or side effects | Asserts internal state or call counts |
118
+
119
+ ## When Tests Are Hard to Write
120
+
121
+ | Problem | What It Means | Action |
122
+ |---------|---------------|--------|
123
+ | Cannot figure out how to test | Design is unclear | Write the API you wish existed first, then assert on it |
124
+ | Test is too complicated | Code design is too complicated | Simplify the interface |
125
+ | Must mock everything | Code is too tightly coupled | Use dependency injection, reduce coupling |
126
+ | Test setup is enormous | Too many dependencies | Extract helpers — if still complex, simplify the design |
127
+
128
+ Hard-to-test code is hard-to-use code. Listen to what the test is telling you about the design.
129
+
130
+ ## Bug Fix Protocol
131
+
132
+ Every bug fix follows TDD:
133
+
134
+ 1. **RED** — write a test that reproduces the bug
135
+ 2. **Verify RED** — confirm the test fails with the bug present
136
+ 3. **GREEN** — implement the fix
137
+ 4. **Verify GREEN** — confirm the test passes and the bug is gone
138
+
139
+ Never fix a bug without a test. The test proves the fix works and prevents regression.
140
+
141
+ ## Common Rationalizations
142
+
143
+ | Excuse | Reality |
144
+ |--------|---------|
145
+ | "Too simple to test" | Simple code breaks. The test takes 30 seconds. |
146
+ | "I'll write tests after" | Tests written after pass immediately and prove nothing. |
147
+ | "Tests after achieve the same goals" | Tests-after verify "what does this do?" Tests-first define "what should this do?" |
148
+ | "Already manually tested" | Manual testing is ad-hoc: no record, cannot re-run, easy to miss cases. |
149
+ | "Deleting X hours of work is wasteful" | Sunk cost fallacy. Keeping unverified code is technical debt. |
150
+ | "Keep as reference, write tests first" | You will adapt it instead of writing fresh. That is testing after. |
151
+ | "Need to explore first" | Fine. Throw away the exploration. Start fresh with TDD. |
152
+ | "TDD will slow me down" | TDD is faster than debugging after the fact. |
153
+ | "This is different because..." | It is not. Delete the code. Start over with TDD. |
154
+
155
+ ## Output Format
156
+
157
+ Present results using the Shared Output Contract:
158
+
159
+ 1. **Goal/Result** — what was implemented using TDD and the current cycle state
160
+ 2. **Key Details:**
161
+ - tests written (names and what they verify)
162
+ - RED/GREEN/REFACTOR status for each cycle
163
+ - any test that could not be written and why
164
+ - verification output (pass/fail counts)
165
+ 3. **Next Action** — continue with next RED cycle, or hand off:
166
+ - all tests green and feature complete → `verification-before-completion`
167
+ - needs broader review → `code-review`
168
+ - complex feature needs planning first → `planning`
169
+
170
+ ## Red Flags
171
+
172
+ - writing production code before a failing test exists
173
+ - test passes immediately on first run (testing existing behavior, not new behavior)
174
+ - cannot explain why the test failed (do not proceed to GREEN)
175
+ - adding features the test does not require during GREEN
176
+ - adding behavior during REFACTOR
177
+ - rationalizing "just this once" to skip the failing test step
178
+ - mocking so heavily that the test verifies mock behavior, not real behavior
179
+ - keeping pre-TDD code as "reference" instead of deleting it
180
+
181
+ ## Checklist
182
+
183
+ - [ ] Every new function/method has a test that was written first
184
+ - [ ] Watched each test fail before writing implementation
185
+ - [ ] Each test failed for the expected reason (feature missing, not typo)
186
+ - [ ] Wrote minimal code to pass each test (YAGNI)
187
+ - [ ] All tests pass after each GREEN step
188
+ - [ ] Refactoring did not break any tests
189
+ - [ ] Tests use real code (mocks only when unavoidable)
190
+ - [ ] Edge cases and error paths are covered
191
+
192
+ ## Done Criteria
193
+
194
+ This skill is complete when all required behaviors have passing tests that were written before the implementation, the full test suite is green, and no production code exists without a corresponding test that was seen to fail first. If any rationalization was used to skip TDD, the skill is not complete — delete the code and start over.
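The "Must mock everything" row under "When Tests Are Hard to Write" recommends dependency injection instead of mocks. Below is a minimal TypeScript sketch of that idea. `Session`, `Clock`, `isSessionExpired`, and `fixedClock` are hypothetical names, not part of this package. The clock is passed in as a plain function, so the test exercises the real logic with a fixed time rather than mocking `Date.now`.

```typescript
// Hypothetical example: the clock is a parameter, so a test can pass a fixed
// time instead of mocking a global.
type Clock = () => number; // milliseconds since the Unix epoch

interface Session {
  token: string;
  expiresAt: number; // milliseconds since the Unix epoch
}

// Production callers use the default clock (Date.now); tests inject their own.
function isSessionExpired(session: Session, now: Clock = Date.now): boolean {
  return now() >= session.expiresAt;
}

// Test helper: a real (not mocked) clock that always returns the same time.
const fixedClock = (time: number): Clock => () => time;

// The assertions target the observable return value, not how the clock was called.
const session: Session = { token: "abc", expiresAt: 1000 };
console.log(isSessionExpired(session, fixedClock(999)));  // false
console.log(isSessionExpired(session, fixedClock(1000))); // true
```

Production code keeps a convenient default, and the tests never touch the real clock. This reduces coupling without adding a mocking layer.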
@@ -0,0 +1,77 @@
1
+ # Test-Driven Development — Examples
2
+
3
+ ## Example 1
4
+
5
+ **User:** "Add email validation to the signup form"
6
+
7
+ Expected routing:
8
+
9
+ - task type: new feature with TDD
10
+ - chosen skill: `test-driven-development`
11
+ - planning required: yes, if multi-file
12
+ - next step: write a failing test for email validation before any implementation
13
+
14
+ ## Example 2
15
+
16
+ **User:** "Fix bug: empty email accepted by the form"
17
+
18
+ Expected routing:
19
+
20
+ - task type: bug fix with TDD
21
+ - chosen skill: `test-driven-development` (via `debug`)
22
+ - next step: write a test that reproduces the bug (empty email accepted), confirm it fails, then fix
23
+
24
+ ## Example 3
25
+
26
+ **User:** "Refactor the auth module — add tests first since it has none"
27
+
28
+ Expected routing:
29
+
30
+ - task type: refactor with TDD safety net
31
+ - chosen skill: `test-driven-development` + `refactor-safe`
32
+ - next step: write characterization tests for existing behavior before refactoring
33
+
34
+ ## Red-Green-Refactor Cycle Example
35
+
36
+ **Feature:** retry failed HTTP requests 3 times
37
+
38
+ ### RED
39
+
40
+ ```typescript
41
+ test('retries failed operations 3 times', async () => {
42
+ let attempts = 0;
43
+ const operation = () => {
44
+ attempts++;
45
+ if (attempts < 3) throw new Error('fail');
46
+ return 'success';
47
+ };
48
+
49
+ const result = await retryOperation(operation);
50
+
51
+ expect(result).toBe('success');
52
+ expect(attempts).toBe(3);
53
+ });
54
+ ```
55
+
56
+ Run test → FAIL: `retryOperation is not defined`
57
+
58
+ ### GREEN
59
+
60
+ ```typescript
61
+ async function retryOperation<T>(fn: () => T | Promise<T>): Promise<T> {
62
+ for (let i = 0; i < 3; i++) {
63
+ try {
64
+ return await fn();
65
+ } catch (e) {
66
+ if (i === 2) throw e;
67
+ }
68
+ }
69
+ throw new Error('unreachable');
70
+ }
71
+ ```
72
+
73
+ Run test → PASS
74
+
75
+ ### REFACTOR
76
+
77
+ Extract magic number 3 into a constant if needed. Run tests → still PASS.
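The REFACTOR step above does not show the resulting code. Here is one possible result, assuming the constant is named `MAX_ATTEMPTS` (a hypothetical name). It keeps the GREEN version's behavior exactly and only names the magic number. The RED scenario is then re-run without a test runner.

```typescript
// REFACTOR: identical behavior to the GREEN version; only the magic number is named.
const MAX_ATTEMPTS = 3; // hypothetical constant name

async function retryOperation<T>(fn: () => T | Promise<T>): Promise<T> {
  for (let i = 0; i < MAX_ATTEMPTS; i++) {
    try {
      return await fn();
    } catch (e) {
      // Rethrow the last failure once every attempt has been used.
      if (i === MAX_ATTEMPTS - 1) throw e;
    }
  }
  // Only reachable if MAX_ATTEMPTS is less than 1.
  throw new Error('unreachable');
}

// Re-run the RED scenario: two failures, then success on the third attempt.
let attempts = 0;
retryOperation(() => {
  attempts++;
  if (attempts < 3) throw new Error('fail');
  return 'success';
}).then((result) => {
  console.log(result, attempts); // success 3
});
```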
@@ -0,0 +1,31 @@
1
+ name: test-driven-development
2
+ version: 0.1.0
3
+ category: quality
4
+ summary: Enforce test-first discipline with red-green-refactor cycles, preventing production code without a failing test.
5
+ summary_vi: "Thực thi kỷ luật test-first với chu trình red-green-refactor, không cho phép code production khi chưa có test fail."
6
+ triggers:
7
+ - implement with TDD
8
+ - test first
9
+ - red-green-refactor
10
+ - write a failing test first
11
+ - use TDD
12
+ inputs:
13
+ - feature or behavior to implement
14
+ - existing test framework and conventions
15
+ outputs:
16
+ - failing tests written first
17
+ - minimal implementation passing tests
18
+ - refactored code with green tests
19
+ - verification output
20
+ constraints:
21
+ - no production code without a failing test
22
+ - delete code written before its test
23
+ - one behavior per test
24
+ - YAGNI during GREEN phase
25
+ related_skills:
26
+ - using-skills
27
+ - planning
28
+ - feature-delivery
29
+ - debug
30
+ - verification-before-completion
31
+ - code-review
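Version 1.6.0 removes `bug-triage` and replaces references to it with `debug`, which makes a stale `related_skills` entry an easy mistake to introduce. Below is a small TypeScript sketch of a consistency check. It assumes the installed skills are the ones listed under "Available skills" in `using-skills`, plus `using-skills` itself. `SkillMeta`, `installedSkills`, and `unknownRelatedSkills` are illustrative names, not a schema or tool shipped by the package.

```typescript
// Illustrative reading of the meta.yaml fields above; not a schema shipped by the package.
interface SkillMeta {
  name: string;
  version: string;
  category: string;
  summary: string;
  summary_vi?: string;
  triggers: string[];
  inputs: string[];
  outputs: string[];
  constraints: string[];
  related_skills: string[];
}

// Skills present in 1.6.0: the "Available skills" list in using-skills, plus using-skills.
const installedSkills = [
  "using-skills", "brainstorming", "repo-onboarding", "research", "planning",
  "feature-delivery", "test-driven-development", "debug", "refactor-safe",
  "verification-before-completion", "code-review", "commit", "docs-writer",
  "extract", "go-pipeline",
];

// Hypothetical helper: related_skills entries that do not name an installed skill.
function unknownRelatedSkills(
  meta: Pick<SkillMeta, "related_skills">,
  installed: Iterable<string>,
): string[] {
  const known = new Set(installed);
  return meta.related_skills.filter((skill) => !known.has(skill));
}

// related_skills copied from test-driven-development/meta.yaml above.
const tddRelated = {
  related_skills: [
    "using-skills", "planning", "feature-delivery", "debug",
    "verification-before-completion", "code-review",
  ],
};
console.log(unknownRelatedSkills(tddRelated, installedSkills)); // []
```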
@@ -0,0 +1,31 @@
1
+ # TDD Cycle Output Template
2
+
3
+ Use this template when reporting TDD progress so each cycle is explicit and verifiable.
4
+
5
+ ```markdown
6
+ ## Goal/Result
7
+
8
+ [What behavior was implemented or fixed using TDD]
9
+
10
+ ## Key Details
11
+
12
+ - Test name: `[exact test name]`
13
+ - RED status: PASS / FAIL
14
+ - RED evidence: `[command and failure reason]`
15
+ - GREEN status: PASS / FAIL
16
+ - GREEN evidence: `[command and pass/fail result]`
17
+ - Refactor performed: yes / no
18
+ - Notes: `[edge cases, blockers, or why a test could not be written]`
19
+
20
+ ## Next Action
21
+
22
+ - Continue with next RED cycle
23
+ - Or hand off to `verification-before-completion`
24
+ ```
25
+
26
+ Checklist:
27
+
28
+ - The test was written before production code
29
+ - The RED failure was observed for the expected reason
30
+ - The GREEN pass was observed with fresh output
31
+ - Refactor did not add new behavior
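The template's checklist can be read as a set of checks on a single cycle record. Below is a hypothetical TypeScript sketch. `TddCycle` and `checklistViolations` are illustrative names. Whether the RED failure happened for the expected reason still has to be judged by reading the recorded evidence, so the code does not attempt that check.

```typescript
// Hypothetical record of one TDD cycle, mirroring the template's Key Details.
type RunStatus = "PASS" | "FAIL";

interface TddCycle {
  testName: string;
  testWrittenFirst: boolean; // test existed before any production code
  redStatus: RunStatus;      // result of the run before implementing
  redEvidence: string;       // command and failure reason
  greenStatus: RunStatus;    // result of the run after implementing
  greenEvidence: string;     // command and result
  refactorAddedBehavior: boolean;
}

// Returns the checklist items this cycle violates; an empty array means the
// record is consistent. A valid cycle shows RED = FAIL and GREEN = PASS.
function checklistViolations(cycle: TddCycle): string[] {
  const problems: string[] = [];
  if (!cycle.testWrittenFirst) problems.push("test was not written before production code");
  if (cycle.redStatus !== "FAIL") problems.push("RED run did not fail");
  if (cycle.redEvidence.trim() === "") problems.push("no RED evidence recorded");
  if (cycle.greenStatus !== "PASS") problems.push("GREEN run did not pass");
  if (cycle.greenEvidence.trim() === "") problems.push("no GREEN evidence recorded");
  if (cycle.refactorAddedBehavior) problems.push("refactor added new behavior");
  return problems;
}

// Example record based on the retry cycle in examples.md.
const retryCycle: TddCycle = {
  testName: "retries failed operations 3 times",
  testWrittenFirst: true,
  redStatus: "FAIL",
  redEvidence: "<test-command> <path-to-test-file>: retryOperation is not defined",
  greenStatus: "PASS",
  greenEvidence: "<test-command> <path-to-test-file>: PASS",
  refactorAddedBehavior: false,
};
console.log(checklistViolations(retryCycle)); // []
```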
@@ -22,7 +22,7 @@ Load this skill when:
22
22
 
23
23
  > `an:` stands for "activate now" — a shorthand to immediately load a specific skill.
24
24
 
25
- If the user types `an:<skill-name>` (for example `an:planning` or `an:bug-triage`), skip classification and load that skill immediately.
25
+ If the user types `an:<skill-name>` (for example `an:planning` or `an:debug`), skip classification and load that skill immediately.
26
26
 
27
27
  **Rules:**
28
28
 
@@ -33,25 +33,31 @@ If the user types `an:<skill-name>` (for example `an:planning` or `an:bug-triage
33
33
 
34
34
  **Available skills:**
35
35
 
36
- - `an:brainstorming` — refine vague ideas before planning
36
+ - `an:brainstorming` — explore ideas, lock decisions, optionally write a spec
37
37
  - `an:repo-onboarding` — understand an unfamiliar codebase
38
- - `an:planning` — create an execution-ready plan
38
+ - `an:research` — explore existing code and patterns before implementing
39
+ - `an:planning` — create an execution-ready plan with bite-sized steps
39
40
  - `an:feature-delivery` — implement a feature
40
- - `an:bug-triage` — investigate errors or regressions
41
+ - `an:test-driven-development` — implement with TDD (red-green-refactor)
42
+ - `an:debug` — systematic 4-phase debugging: investigate, analyze, fix, learn
41
43
  - `an:refactor-safe` — restructure code without behavior change
42
44
  - `an:verification-before-completion` — verify before claiming done
43
- - `an:code-review` — review a diff or PR
45
+ - `an:code-review` — review a diff/PR, or evaluate received feedback
46
+ - `an:commit` — create a conventional commit with verification
44
47
  - `an:docs-writer` — update documentation
48
+ - `an:extract` — extract patterns, decisions, and learnings from completed work
49
+ - `an:go-pipeline` — execute a full spec-to-commit pipeline in one run
45
50
 
46
51
  ## Workflow
47
52
 
48
53
  1. Check for direct trigger: if the user typed `an:<skill-name>`, load that skill and skip to step 5.
49
54
  2. Classify the request into one of these modes:
50
- - idea refinement or specification
55
+ - idea refinement, specification, or requirements definition
51
56
  - repo understanding
52
57
  - bug or regression investigation
53
58
  - planning and implementation
54
- - code review
59
+ - test-driven implementation
60
+ - code review (giving or receiving)
55
61
  - documentation work
56
62
  - answer-only guidance
57
63
  3. Decide whether the task first needs brainstorming or can go straight to planning.
@@ -64,12 +70,19 @@ If the user types `an:<skill-name>` (for example `an:planning` or `an:bug-triage
64
70
  - `an:<skill-name>` (direct trigger) -> load the named skill immediately
65
71
  - vague feature idea, unclear goal, tradeoff exploration -> `brainstorming`, then `planning`
66
72
  - unfamiliar repo or missing context -> `repo-onboarding`
73
+ - need to understand existing code before implementing -> `research`
74
+ - complex feature needing requirements definition -> `brainstorming` (includes spec writing)
67
75
  - docs work in an unfamiliar repo -> `repo-onboarding` first, then `docs-writer`
68
- - bug report, error trace, failing test, regression -> `bug-triage`, then `planning` if the fix is not already obvious and bounded
76
+ - bug report, error trace, failing test, regression -> `debug`
69
77
  - implement or change behavior -> `planning`, then `feature-delivery`
78
+ - implement with TDD approach -> `planning`, then `test-driven-development`
70
79
  - refactor, restructure, extract, or migrate without behavior change -> `planning`, then `refactor-safe`
71
80
  - review diff, PR, or changed files -> `code-review`
81
+ - respond to review feedback -> `code-review` (receiving mode)
82
+ - ready to commit -> `commit`
72
83
  - update README, runbook, onboarding docs, API notes in a known repo -> `docs-writer`
84
+ - extract learnings from completed work -> `extract`
85
+ - execute an approved spec end-to-end -> `go-pipeline`
73
86
 
74
87
  ## Planning Rule
75
88
 
@@ -84,15 +97,19 @@ You may skip a separate planning step only when the change is clearly local, low
84
97
 
85
98
  ## Verification Rule
86
99
 
87
- Use `verification-before-completion` before any strong claim that work is done, fixed, passing, or ready.
100
+ Use `verification-before-completion` before any strong claim that work is done, fixed, passing, or ready. No completion claims without fresh evidence.
88
101
 
89
102
  ## Output Format
90
103
 
91
- - task type
92
- - chosen primary skill
93
- - planning required
94
- - key assumption or missing decision, if any
95
- - immediate next step
104
+ Present results using the Shared Output Contract:
105
+
106
+ 1. **Goal/Result** — the task classified and primary skill chosen
107
+ 2. **Key Details:**
108
+ - task type
109
+ - chosen primary skill
110
+ - whether planning is required
111
+ - key assumption or missing decision, if any
112
+ 3. **Next Action** — the immediate first step with the chosen skill
96
113
 
97
114
  ## Red Flags
98
115
 
@@ -102,6 +119,7 @@ Use `verification-before-completion` before any strong claim that work is done,
102
119
  - loading many skills at once without a clear reason
103
120
  - asking broad planning questions before checking if the task is already clear
104
121
  - forcing a feature workflow onto a review or docs task
122
+ - skipping TDD when the user requested it
105
123
 
106
124
  ## Done Criteria
107
125
 
@@ -26,9 +26,9 @@ This login flow started failing after yesterday's deploy.
26
26
  Expected routing:
27
27
 
28
28
  - task type: bug investigation
29
- - chosen skill: `bug-triage`
30
- - planning required: maybe, after triage if the fix is not obviously local
31
- - next step: restate the symptom and try to reproduce it
29
+ - chosen skill: `debug`
30
+ - planning required: maybe, after diagnosis if the fix is not obviously local
31
+ - next step: classify the issue and try to reproduce it
32
32
 
33
33
  ## Example 3
34
34
 
@@ -1,8 +1,8 @@
1
1
  name: using-skills
2
- version: 0.1.0
2
+ version: 0.3.0
3
3
  category: routing
4
4
  summary: Route a user request to the right primary skill and working mode before deeper work begins.
5
- summary_vi: Phân loại request và chọn đúng skill chính trước khi bắt đầu công việc sâu hơn.
5
+ summary_vi: "Phân loại request và chọn đúng skill chính trước khi bắt đầu công việc sâu hơn."
6
6
  triggers:
7
7
  - start a new session
8
8
  - decide which skill to load
@@ -20,10 +20,15 @@ constraints:
20
20
  related_skills:
21
21
  - brainstorming
22
22
  - repo-onboarding
23
+ - research
23
24
  - planning
24
- - bug-triage
25
+ - test-driven-development
26
+ - debug
25
27
  - feature-delivery
26
28
  - refactor-safe
27
29
  - code-review
30
+ - commit
28
31
  - verification-before-completion
29
32
  - docs-writer
33
+ - extract
34
+ - go-pipeline
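Step 1 of the `using-skills` workflow checks for an `an:<skill-name>` direct trigger before any classification happens. Below is a minimal TypeScript sketch built on two assumptions the skill does not state. First, the trigger must appear at the start of the message. Second, an unrecognized name (such as the removed `an:bug-triage`) falls back to normal classification. `directTrigger` and `AVAILABLE_SKILLS` are hypothetical names; the skill list is copied from "Available skills".

```typescript
// Skills listed under "Available skills" in using-skills (1.6.0).
const AVAILABLE_SKILLS = new Set<string>([
  "brainstorming", "repo-onboarding", "research", "planning", "feature-delivery",
  "test-driven-development", "debug", "refactor-safe",
  "verification-before-completion", "code-review", "commit", "docs-writer",
  "extract", "go-pipeline",
]);

// Hypothetical helper. Returns the skill named by a leading `an:<skill-name>`,
// or null when there is no direct trigger or the name is not an available skill,
// in which case the request goes through normal classification (step 2).
function directTrigger(request: string): string | null {
  const match = /^\s*an:([a-z0-9-]+)/.exec(request);
  if (match === null) return null;
  const skill = match[1];
  return AVAILABLE_SKILLS.has(skill) ? skill : null;
}

console.log(directTrigger("an:debug login fails after deploy")); // debug
console.log(directTrigger("an:bug-triage"));                      // null (removed in 1.6.0)
console.log(directTrigger("please review this PR"));              // null
```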
@@ -2,7 +2,15 @@
2
2
 
3
3
  ## Purpose
4
4
 
5
- Stop false completion claims by requiring fresh evidence before saying work is done, fixed, or passing.
5
+ Stop false completion claims by requiring fresh evidence before saying work is done, fixed, or passing. Evidence before claims, always.
6
+
7
+ ## The Iron Law
8
+
9
+ ```
10
+ NO COMPLETION CLAIMS WITHOUT FRESH VERIFICATION EVIDENCE
11
+ ```
12
+
13
+ If you have not run the verification command in this response, you cannot claim it passes. No exceptions.
6
14
 
7
15
  ## When To Use
8
16
 
@@ -12,6 +20,37 @@ Load this skill when:
12
20
  - about to say tests or builds pass
13
21
  - about to mark work complete
14
22
  - about to commit, open a PR, or hand off finished work
23
+ - verifying implementation against a spec's acceptance criteria
24
+ - expressing satisfaction about work state ("done", "ready", "all good")
25
+
26
+ ## The Gate Function
27
+
28
+ ```
29
+ BEFORE claiming any status or expressing satisfaction:
30
+
31
+ 1. IDENTIFY: What command proves this claim?
32
+ 2. RUN: Execute the FULL command (fresh, complete)
33
+ 3. READ: Full output, check exit code, count failures
34
+ 4. VERIFY: Does output confirm the claim?
35
+ - If NO: State actual status with evidence
36
+ - If YES: State claim WITH evidence
37
+ 5. ONLY THEN: Make the claim
38
+
39
+ Skip any step = lying, not verifying
40
+ ```
41
+
42
+ ## Forbidden Words
43
+
44
+ Do not use these phrases in completion claims unless they are backed by fresh evidence run in the same response:
45
+
46
+ - "should work now"
47
+ - "probably fixed"
48
+ - "seems to pass"
49
+ - "looks correct"
50
+ - "I'm confident"
51
+ - "Great!", "Perfect!", "Done!" (before verification)
52
+
53
+ Replace with evidence: "Tests pass (42/42, 0 failures)" or "Build exits 0."
15
54
 
16
55
  ## Workflow
17
56
 
@@ -19,28 +58,103 @@ Load this skill when:
19
58
  2. Identify the command, test, or check that proves that claim.
20
59
  3. Run the most relevant verification available.
21
60
  4. Read the actual result, not just the expectation.
22
- 5. Report one of three states:
23
- - verified
24
- - failed verification
25
- - verification blocked
26
- 6. If blocked, state what remains unproven.
61
+ 5. If spec-linked, verify acceptance criteria coverage.
62
+ 6. Check verification levels per deliverable.
63
+ 7. Report one of three states:
64
+ - **verified**: evidence confirms the claim
65
+ - **failed verification**: evidence contradicts the claim
66
+ - **verification blocked**: cannot verify; state what remains unproven
67
+
68
+ ## Verification Levels
69
+
70
+ For each deliverable, verify at three levels:
71
+
72
+ | Level | Check | Meaning |
73
+ |-------|-------|---------|
74
+ | **L1: EXISTS** | File/component/route exists | Created but unknown quality |
75
+ | **L2: SUBSTANTIVE** | Not a stub (no `return null`, empty handlers, TODO-only) | Has real implementation |
76
+ | **L3: WIRED** | Imported and used in the integration layer | Actually connected |
77
+
78
+ Report status per deliverable:
79
+
80
+ - L1+L2+L3: fully wired
81
+ - L1+L2 only: created but not integrated (flag it)
82
+ - L1 only (stub): exists but empty (blocks completion)
83
+ - Missing: not found (blocks completion)
84
+
85
+ ## Spec Acceptance Criteria Coverage
86
+
87
+ When work is linked to a spec, also verify:
88
+
89
+ 1. Map each acceptance criterion to its verification evidence.
90
+ 2. Report coverage:
91
+
92
+ ```
93
+ AC Coverage
94
+ ===========
95
+ - [x] AC-1: [description] — VERIFIED (test passes)
96
+ - [x] AC-2: [description] — VERIFIED (manual check)
97
+ - [ ] AC-3: [description] — NOT VERIFIED (no test exists)
98
+
99
+ Coverage: 2/3 (67%)
100
+ ```
101
+
102
+ 3. Flag any AC that has no verification evidence.
103
+
104
+ ## Common Rationalizations
105
+
106
+ | Excuse | Reality |
107
+ |--------|---------|
108
+ | "Should work now" | RUN the verification. |
109
+ | "I'm confident" | Confidence is not evidence. |
110
+ | "Just this once" | No exceptions. |
111
+ | "Linter passed" | Linting is not compiling, and compiling is not testing. |
112
+ | "Agent said success" | Verify independently. |
113
+ | "Partial check is enough" | Partial proves nothing about the unchecked parts. |
114
+ | "Different words so rule doesn't apply" | Spirit over letter. Any claim of success requires evidence. |
115
+ | "I'm tired" | Exhaustion is not an excuse to ship broken code. |
27
116
 
28
117
  ## Output Format
29
118
 
30
- - claim being checked
31
- - evidence run
32
- - result
33
- - final status
34
- - remaining uncertainty, if any
119
+ Present results using the Shared Output Contract:
120
+
121
+ 1. **Goal/Result** — what claim was checked and the verification status
122
+ 2. **Key Details:**
123
+ - claim being checked
124
+ - evidence run (exact command or check)
125
+ - result (pass/fail/blocked)
126
+ - verification level per deliverable (L1/L2/L3)
127
+ - AC coverage (if spec-linked)
128
+ - remaining uncertainty, if any
129
+ 3. **Next Action:**
130
+ - if verified → `code-review` or `commit`
131
+ - if failed → back to `feature-delivery` or `debug`
132
+ - if blocked → state what is needed
35
133
 
36
134
  ## Red Flags
37
135
 
38
- - saying `should work now`
136
+ - using any forbidden word without fresh evidence
137
+ - saying "should work now"
39
138
  - treating code edits as proof
40
139
  - using stale test output as fresh evidence
41
140
  - extrapolating from a partial check to a broader claim
42
141
  - declaring success while verification is still blocked
142
+ - marking stubs (L1 only) as complete
143
+ - skipping AC coverage check for spec-linked work
144
+ - expressing satisfaction before running verification
145
+
146
+ ## Checklist
147
+
148
+ - [ ] Claim identified
149
+ - [ ] Verification command/check identified
150
+ - [ ] Verification run (fresh, not stale)
151
+ - [ ] Actual result read (not assumed)
152
+ - [ ] No forbidden words used without evidence
153
+ - [ ] Verification levels checked per deliverable (L1/L2/L3)
154
+ - [ ] AC coverage verified (if spec-linked)
155
+ - [ ] Status reported (verified / failed / blocked)
156
+ - [ ] Remaining uncertainty stated
43
157
 
44
158
  ## Done Criteria
45
159
 
46
- This skill is complete when the claim is either backed by fresh evidence or explicitly marked as unverified with the blocker stated. If verification passes and a review is warranted, hand off to `code-review`.
160
+ This skill is complete when the claim is either backed by fresh evidence or explicitly marked as unverified with the blocker stated. If verification passes and a review is warranted, hand off to `code-review`. If spec-linked, AC coverage must be reported.
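The verification-level table, the AC coverage report, and the evidence wording recommended under "Forbidden Words" can each be modeled as a small pure function. Below is a hypothetical TypeScript sketch; all names are illustrative. Treating a deliverable that exists but is not substantive as a stub, even if it is wired, is an assumption, because the table does not cover that case. The "Tests fail (...)" wording is likewise invented to mirror the documented "Tests pass (42/42, 0 failures)" form.

```typescript
// Verification levels per deliverable (L1 EXISTS, L2 SUBSTANTIVE, L3 WIRED).
interface Deliverable {
  name: string;
  exists: boolean;      // L1
  substantive: boolean; // L2: not a stub
  wired: boolean;       // L3: imported and used in the integration layer
}

type LevelStatus = "fully wired" | "created but not integrated" | "stub" | "missing";

function levelStatus(d: Deliverable): LevelStatus {
  if (!d.exists) return "missing";
  // Assumption: failing L2 means "stub" even if the file happens to be wired.
  if (!d.substantive) return "stub";
  return d.wired ? "fully wired" : "created but not integrated";
}

// Stubs and missing deliverables block completion; unintegrated ones are only flagged.
function blocksCompletion(status: LevelStatus): boolean {
  return status === "stub" || status === "missing";
}

// AC coverage: a criterion counts as verified only if it has recorded evidence.
interface AcceptanceCriterion {
  id: string;        // e.g. "AC-1"
  evidence?: string; // e.g. "test passes"; missing or blank means not verified
}

const hasEvidence = (ac: AcceptanceCriterion): boolean =>
  ac.evidence !== undefined && ac.evidence.trim() !== "";

function coverageLine(criteria: AcceptanceCriterion[]): string {
  if (criteria.length === 0) return "Coverage: 0/0 (n/a)";
  const verified = criteria.filter(hasEvidence).length;
  const percent = Math.round((100 * verified) / criteria.length);
  return `Coverage: ${verified}/${criteria.length} (${percent}%)`;
}

function unverifiedCriteria(criteria: AcceptanceCriterion[]): string[] {
  return criteria.filter((ac) => !hasEvidence(ac)).map((ac) => ac.id);
}

// Evidence-backed wording in place of "should work now".
function testClaim(passed: number, failed: number): string {
  const total = passed + failed;
  const failures = `${failed} failure${failed === 1 ? "" : "s"}`;
  return failed === 0
    ? `Tests pass (${passed}/${total}, ${failures})`
    : `Tests fail (${passed}/${total}, ${failures})`;
}

const criteria: AcceptanceCriterion[] = [
  { id: "AC-1", evidence: "test passes" },
  { id: "AC-2", evidence: "manual check" },
  { id: "AC-3" },
];
console.log(coverageLine(criteria));       // Coverage: 2/3 (67%)
console.log(unverifiedCriteria(criteria)); // [ 'AC-3' ]
console.log(testClaim(42, 0));             // Tests pass (42/42, 0 failures)
```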