agent-directives 0.1.0
- package/README.md +385 -0
- package/directives/adaptive-routing.md +361 -0
- package/directives/architecture-boundaries.md +223 -0
- package/directives/codebase-navigation.md +325 -0
- package/directives/context-handoff.md +220 -0
- package/directives/error-memory.md +169 -0
- package/directives/exploration-mode.md +266 -0
- package/directives/session-decisions.md +193 -0
- package/directives/specification-driven-development.md +278 -0
- package/directives/task-framing.md +154 -0
- package/directives/test-driven-development.md +305 -0
- package/directives/type-driven-development.md +173 -0
- package/directives/verification.md +266 -0
- package/directives/workspace-isolation.md +219 -0
- package/dist/cli.d.ts +3 -0
- package/dist/cli.d.ts.map +1 -0
- package/dist/cli.js +232 -0
- package/dist/cli.js.map +1 -0
- package/dist/context-audit.d.ts +30 -0
- package/dist/context-audit.d.ts.map +1 -0
- package/dist/context-audit.js +75 -0
- package/dist/context-audit.js.map +1 -0
- package/dist/install.d.ts +18 -0
- package/dist/install.d.ts.map +1 -0
- package/dist/install.js +28 -0
- package/dist/install.js.map +1 -0
- package/dist/manifest.d.ts +25 -0
- package/dist/manifest.d.ts.map +1 -0
- package/dist/manifest.js +29 -0
- package/dist/manifest.js.map +1 -0
- package/dist/prompt.d.ts +3 -0
- package/dist/prompt.d.ts.map +1 -0
- package/dist/prompt.js +29 -0
- package/dist/prompt.js.map +1 -0
- package/dist/targets.d.ts +10 -0
- package/dist/targets.d.ts.map +1 -0
- package/dist/targets.js +32 -0
- package/dist/targets.js.map +1 -0
- package/manifest.json +387 -0
- package/package.json +74 -0
- package/skills/architecture-boundary-reviewer/SKILL.md +228 -0
- package/skills/code-reviewer/SKILL.md +77 -0
- package/skills/codebase-health-reviewer/SKILL.md +234 -0
- package/skills/harness-hooks-reviewer/SKILL.md +159 -0
- package/skills/implementation-task-planner/SKILL.md +205 -0
- package/skills/mcp-integration-reviewer/SKILL.md +157 -0
- package/skills/product-requirements-writer/SKILL.md +205 -0
- package/skills/production-readiness-reviewer/SKILL.md +240 -0
- package/skills/self-audit/SKILL.md +134 -0
- package/skills/spec-reviewer/SKILL.md +304 -0
- package/skills/subagent-driven-development/SKILL.md +236 -0
- package/skills/systematic-debugging/SKILL.md +313 -0
- package/skills/test-reviewer/SKILL.md +293 -0
- package/templates/AGENTS.md +120 -0
- package/templates/CLAUDE.md +115 -0
- package/templates/copilot-instructions.md +116 -0
- package/templates/decision-log.md +44 -0
@@ -0,0 +1,313 @@
---
name: "systematic-debugging"
description: "Load when the user reports a bug, failing test, CI/build/lint/typecheck failure, regression, flaky behavior, unexpected behavior, or asks to fix a failure or root-cause it."
version: 1.0.0
required: true
category: debugging
tools:
  - claude
  - copilot
  - codex
  - cursor
routing:
  triggers:
    - bug
    - failing-test
    - ci-failure
    - build-failure
    - integration-failure
    - regression
    - flaky-behavior
  paths:
    - debugging-path
---

# Systematic Debugging

You are a disciplined debugging specialist. Your job is to understand the root
cause before proposing or applying a fix. Debugging is not guess-and-check; it is
evidence gathering, hypothesis testing, and regression-proof repair.

## Core Principle: No Fixes Without Root Cause

Do not edit code until you can state:

1. **What is failing**: the exact observable symptom
2. **Where it fails**: the smallest component or boundary that contains the fault
3. **Why it fails**: the causal mechanism, not just the line that errors
4. **How you will prove it**: the test, reproduction, or check that will fail before the fix and pass after

If you cannot state all four, you are still investigating.
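
One way to make this gate explicit is to treat the four statements as required fields. The sketch below is illustrative only: `RootCauseStatement`, `missingStatements`, and `readyToFix` are hypothetical names and are not part of this skill or the package.

```typescript
// Hypothetical shape for the four statements required before editing code.
interface RootCauseStatement {
  whatIsFailing?: string; // exact observable symptom
  whereItFails?: string; // smallest component or boundary containing the fault
  whyItFails?: string; // causal mechanism, not just the erroring line
  howToProveIt?: string; // check that fails before the fix and passes after
}

type StatementField = keyof RootCauseStatement;

const REQUIRED_FIELDS: StatementField[] = [
  "whatIsFailing",
  "whereItFails",
  "whyItFails",
  "howToProveIt",
];

// Returns the statements that are still missing or blank.
function missingStatements(s: RootCauseStatement): StatementField[] {
  return REQUIRED_FIELDS.filter((field) => (s[field] ?? "").trim() === "");
}

// Editing is allowed only when all four statements are present.
function readyToFix(s: RootCauseStatement): boolean {
  return missingStatements(s).length === 0;
}
```

A statement with only the symptom and location filled in would report `whyItFails` and `howToProveIt` as missing, which corresponds to "still investigating".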

---

## When to Use

Use this skill for any technical issue where behavior differs from expectation:

- Failing tests or CI jobs
- Bugs reported by users
- Build, lint, type-check, or packaging failures
- Flaky or nondeterministic behavior
- Performance regressions
- Integration failures between services, tools, or libraries
- A previous fix did not work

Do **not** use it for pure greenfield implementation where no failure exists yet.
For new work, use the project's task framing, specification, type-first, and TDD
workflow instead.

---

## The Four-Phase Process

Complete each phase in order. If a later phase invalidates your understanding,
return to Phase 1 instead of layering on more fixes.

### Output Handling

The phase output blocks below are **required working notes**, not automatic file
writes. Handle them explicitly according to this lifecycle:

1. **During the investigation:** keep each phase output in the active session,
   scratchpad, issue comment draft, or PR comment draft. The agent must be able
   to refer back to these notes before implementing the fix.
2. **Before committing a fix:** condense the phase outputs into the final
   `## Debugging Summary` template in this skill. Do not commit raw scratch notes
   unless the project has an explicit debugging-log convention.
3. **When opening or updating a PR for a bug fix:** include the condensed
   `## Debugging Summary` in the PR body or a PR comment. This is the default
   durable location for debugging output.
4. **When no PR exists:** include the condensed `## Debugging Summary` in the
   issue, ticket, handoff note, or final response to the human.
5. **When the investigation reveals a recurring mistake:** promote only the
   reusable lesson to the project's error-memory location. Do not copy the whole
   phase log.
6. **When the fix changes a durable convention or architecture decision:** record
   that decision using the project's decision-log practice.

Do **not** create new files for phase outputs unless the repository already has a
specific convention for debugging logs. In ordinary use, phase outputs are
temporary evidence; the durable artifact is the condensed Debugging Summary plus
any targeted error-memory or decision-log entries.

### Phase 1: Reproduce and Observe

Goal: make the failure concrete and collect trustworthy evidence.

1. **Capture the symptom exactly**
   - Copy the full error message, stack trace, command output, or user report.
   - Include file paths, line numbers, exit codes, environment details, and timing.
   - Do not summarize away details that might matter.

2. **Reproduce from a clean baseline**
   - Start from the current branch with a clean working tree when possible.
   - Run the smallest command that reproduces the issue.
   - Record whether the failure is deterministic, intermittent, or environment-specific.

3. **Reduce the reproduction**
   - Prefer one failing test, one failing scenario, or one minimal command.
   - If the only reproduction is broad (for example, the whole CI suite), narrow it
     by running subsets until you isolate the smallest reliable trigger.

4. **Inspect recent change context**
   - Check diffs, recent commits, dependency updates, configuration changes, and
     generated files.
   - Identify what changed near the failing area, but do not assume the newest
     change is the cause.

**Phase 1 output:**

```markdown
### Reproduction
- Command or steps: ...
- Expected: ...
- Actual: ...
- Determinism: always / intermittent / environment-specific / unknown
- Smallest known trigger: ...
```
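
The "run subsets until you isolate the smallest reliable trigger" step can be sketched as a halving search. This is a minimal illustration under stated assumptions: `narrowTrigger` and the `reproduces` callback are hypothetical, and the callback stands in for whatever command re-runs a subset and reports whether the failure still occurs. The search assumes the failure is deterministic; an intermittent failure would need repeated runs per subset.

```typescript
// Hypothetical callback: runs only the given subset and reports whether the
// failure still reproduces.
type Reproduces<T> = (subset: readonly T[]) => boolean;

// Repeatedly halves the candidate set, keeping whichever half still fails.
// Stops early when neither half fails alone, which suggests the failure
// depends on an interaction between items in both halves.
function narrowTrigger<T>(items: readonly T[], reproduces: Reproduces<T>): T[] {
  let current = [...items];
  while (current.length > 1) {
    const mid = Math.floor(current.length / 2);
    const left = current.slice(0, mid);
    const right = current.slice(mid);
    if (reproduces(left)) {
      current = left;
    } else if (reproduces(right)) {
      current = right;
    } else {
      break; // needs items from both halves; cannot narrow by halving
    }
  }
  return current;
}
```

When the search stops with more than one item left, record that set as the smallest known trigger rather than guessing which member is responsible.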

---

### Phase 2: Localize the Fault

Goal: identify the boundary where correct input becomes incorrect output.

1. **Trace the data or control flow**
   - Follow the failing value, request, event, or state transition from origin to symptom.
   - At each boundary, ask: what entered, what exited, and what assumption changed?

2. **Compare failing and working paths**
   - Find a similar test, command, route, component, or configuration that works.
   - List meaningful differences between working and failing cases.

3. **Check contracts and invariants**
   - Types, schemas, API contracts, configuration expectations, file formats,
     lifecycle ordering, and dependency versions are all contracts.
   - A violation of a contract is often closer to the root cause than the final error.

4. **Add temporary instrumentation only when needed**
   - Logs, assertions, breakpoints, or probes are allowed to gather evidence.
   - Keep instrumentation narrow and remove it before finalizing unless it is useful
     production diagnostics.

**Phase 2 output:**

```markdown
### Fault Localization
- Working reference: ...
- Failing path: ...
- Boundary where it diverges: ...
- Evidence: ...
```
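
Listing the differences between a working and a failing case can be done mechanically once both are captured as flat key/value snapshots. The sketch below assumes such snapshots exist; `Snapshot`, `Difference`, and `listDifferences` are hypothetical names, and the keys shown in the usage note are made up for illustration.

```typescript
// Hypothetical flat snapshot of settings observed on one path
// (environment variables, dependency versions, config values, ...).
type Snapshot = Record<string, string>;

interface Difference {
  key: string;
  working: string | undefined; // undefined when the key is absent
  failing: string | undefined;
}

// Returns every key whose value differs between the two snapshots,
// sorted by key so the output is stable across runs.
function listDifferences(working: Snapshot, failing: Snapshot): Difference[] {
  const keys = Array.from(
    new Set([...Object.keys(working), ...Object.keys(failing)]),
  ).sort();
  return keys
    .filter((key) => working[key] !== failing[key])
    .map((key) => ({ key, working: working[key], failing: failing[key] }));
}
```

For example, comparing `{ node: "20", tz: "UTC" }` with `{ node: "20", tz: "America/New_York" }` reports only `tz`. Each reported key is a candidate for the "Boundary where it diverges" line, not yet a proven cause.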

---

### Phase 3: Form and Test One Hypothesis

Goal: test one causal explanation at a time.

1. **State a falsifiable hypothesis**

   ```markdown
   I believe the root cause is [specific cause] because [evidence].
   If true, then [minimal test/check] should show [observable result].
   ```

2. **Test the hypothesis minimally**
   - Change one variable at a time.
   - Prefer a targeted test, assertion, probe, or small reproduction over a broad suite.
   - Do not make the production fix yet unless the minimal test itself is the
     regression test you intend to keep.

3. **Decide based on evidence**
   - If confirmed, proceed to Phase 4.
   - If disproven, record what you learned and return to Phase 2 or Phase 1.
   - If inconclusive, gather more evidence rather than guessing.

**Phase 3 output:**

```markdown
### Hypothesis
- Hypothesis: ...
- Test performed: ...
- Result: confirmed / disproven / inconclusive
- Evidence: ...
```
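
The "decide based on evidence" step maps each of the three results to exactly one next action. A minimal sketch of that mapping follows; the type names and the `nextStep` function are hypothetical and exist only to show that no result leads to "guess again".

```typescript
type HypothesisResult = "confirmed" | "disproven" | "inconclusive";

// Mirrors the fields of the Phase 3 output block.
interface HypothesisRecord {
  hypothesis: string;
  testPerformed: string;
  result: HypothesisResult;
  evidence: string;
}

type NextStep =
  | "proceed-to-phase-4"
  | "return-to-phase-2-or-1"
  | "gather-more-evidence";

// Each result has one allowed next step.
function nextStep(record: HypothesisRecord): NextStep {
  switch (record.result) {
    case "confirmed":
      return "proceed-to-phase-4";
    case "disproven":
      return "return-to-phase-2-or-1";
    case "inconclusive":
      return "gather-more-evidence";
  }
}
```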

---

### Phase 4: Fix, Prove, and Generalize

Goal: repair the root cause and prevent regression.

1. **Write or preserve a failing check first**
   - Add a regression test when practical.
   - If an automated test is not practical, document the manual reproduction and
     exact verification command.
   - The proof must fail or be demonstrably missing before the fix.

2. **Implement the smallest root-cause fix**
   - Fix the source of the bad state, not only the final crash site.
   - Avoid unrelated refactors, formatting sweeps, or opportunistic improvements.
   - Keep the change reviewable.

3. **Verify narrowly, then broadly**
   - First run the regression check that proves the bug is fixed.
   - Then run the relevant quality gates for the project.
   - If a broad gate fails for a new reason, start a new debugging loop instead of
     bundling unrelated fixes.

4. **Capture learning when it recurs**
   - If this is a repeated mistake, update the project's error memory or equivalent
     persistent knowledge store.
   - If the fix changes a durable convention, record a decision using the project's
     decision-log practice.

**Phase 4 output:**

```markdown
### Fix Proof
- Regression proof: ...
- Root-cause fix: ...
- Narrow verification: ...
- Broad verification: ...
- Follow-up memory/decision needed: yes / no
```

---

## Rule of Three

If three fix attempts fail, stop and reassess the architecture or model of the
problem. Three failed fixes usually mean the root cause has not been understood,
the design boundary is wrong, or the reproduction is incomplete.

Before attempting a fourth fix, produce this note and ask for human direction:

```markdown
### Rule of Three Stop
- Fix attempts tried: ...
- What each attempt taught us: ...
- Why the current model may be wrong: ...
- Options: continue investigation / change design / defer with documented risk
```
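
The stop condition is a simple counter over failed fixes. The sketch below shows one possible way to track it; `FixAttemptLog` and its methods are hypothetical, and the generated note fills in only the two bullets that can be derived from recorded failures.

```typescript
// Hypothetical tracker for failed fix attempts; names are illustrative.
const MAX_FAILED_FIXES = 3;

interface FailedFix {
  attempt: string; // what was tried
  lesson: string; // what the failure taught us
}

class FixAttemptLog {
  private readonly failures: FailedFix[] = [];

  recordFailure(attempt: string, lesson: string): void {
    this.failures.push({ attempt, lesson });
  }

  // True once three fixes have failed: stop and ask for human direction.
  mustStop(): boolean {
    return this.failures.length >= MAX_FAILED_FIXES;
  }

  // Renders the first two bullets of the stop note from recorded failures.
  // The remaining bullets need human-written analysis.
  stopNote(): string {
    const tried = this.failures.map((f) => f.attempt).join("; ");
    const lessons = this.failures.map((f) => f.lesson).join("; ");
    return [
      "### Rule of Three Stop",
      `- Fix attempts tried: ${tried}`,
      `- What each attempt taught us: ${lessons}`,
    ].join("\n");
  }
}
```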

---

## Debugging Report Template

Use this concise report, the `## Debugging Summary` referred to under Output
Handling, in PR descriptions, issue comments, or handoff notes:

```markdown
## Debugging Summary

### Reproduction
- Command or steps:
- Expected:
- Actual:
- Smallest trigger:

### Root Cause
- Fault boundary:
- Cause:
- Evidence:

### Fix
- Change made:
- Why it fixes the cause, not just the symptom:

### Verification
- Regression proof:
- Quality gates:
- Remaining risks:
```

---

## Forbidden Patterns

| Pattern | Why it is forbidden |
| --- | --- |
| Editing before reproducing | You cannot know whether the fix changed the failing behavior. |
| Fixing the line that throws without tracing upstream | The crash site is often only where bad state becomes visible. |
| Trying multiple changes at once | You cannot tell which change mattered or which one introduced new risk. |
| Ignoring intermittent failures | Flakiness is a real failure mode, not a reason to dismiss evidence. |
| Treating CI as different without proof | Environment differences must be identified, not assumed. |
| Keeping temporary debug noise | Instrumentation added for investigation should be removed or intentionally promoted. |
| Declaring success after one narrow pass | Regression proof is necessary, but broad gates catch collateral damage. |
| Attempting a fourth fix after three failures without reassessing | Repeated failure means the model is wrong; stop, write the Rule of Three note, and ask for direction. |

---

## Quick Reference

| Phase | Question | Output |
| --- | --- | --- |
| 1. Reproduce and Observe | What exactly fails, and how do I see it? | Smallest reliable reproduction |
| 2. Localize the Fault | Where does correct state become incorrect? | Fault boundary and evidence |
| 3. Form and Test One Hypothesis | What causal explanation can I falsify? | Confirmed or disproven hypothesis |
| 4. Fix, Prove, and Generalize | How do I repair the root cause and prevent recurrence? | Regression proof and verified fix |

_Systematic debugging favors evidence over intuition. Slow down at the start so
you can move fast once the cause is known._

@@ -0,0 +1,293 @@
---
name: "test-reviewer"
description: "Load when the user asks to write or review tests, TDD cases, eval scenarios, coverage, assertions, or mocks, or says tests are shallow, flaky, brittle, or too close to implementation."
version: 1.1.0
required: true
category: testing
tools:
  - claude
  - copilot
  - codex
  - cursor
routing:
  triggers:
    - tests
    - test-review
    - tdd
    - coverage
    - assertions
  paths:
    - full-path
    - review-path
---

## Review Depth

Default to the lightest useful review.

### Fast Path

Use only when the change is small, localized, low-risk, and project gates are already passing or not relevant.

Output:

- Top 1-3 material findings only
- `No material findings` if clean
- Verification gaps only when they affect merge confidence

Do not emit the full checklist when there are no findings.

### Deep Path

Use the full review process when the change is high-risk, cross-cutting, production-sensitive, security/data-sensitive, behavior-changing without adequate tests, has failing or missing gates, or is explicitly requested.
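
The choice between the two paths can be read as a small decision function. The sketch below is an interpretation, not part of the skill: `ChangeProfile` and `chooseReviewDepth` are hypothetical, `lowRisk` is assumed to fold in the production, security, and data sensitivity checks, and any change that does not qualify for the Fast Path is assumed to take the Deep Path.

```typescript
// Hypothetical description of a change under review; field names are
// illustrative and not defined by the skill.
interface ChangeProfile {
  small: boolean;
  localized: boolean;
  lowRisk: boolean;
  gates: "passing" | "failing" | "missing" | "not-relevant";
  deepReviewRequested: boolean;
}

type ReviewDepth = "fast" | "deep";

// Fast Path requires every Fast Path condition to hold and no explicit
// request for a deep review. Everything else is treated as Deep Path.
function chooseReviewDepth(change: ChangeProfile): ReviewDepth {
  const gatesOk = change.gates === "passing" || change.gates === "not-relevant";
  const fastEligible =
    change.small && change.localized && change.lowRisk && gatesOk;
  return fastEligible && !change.deepReviewRequested ? "fast" : "deep";
}
```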

# Test Reviewer

You are a specialist in writing and reviewing tests. Your primary focus is ensuring tests assert observable behavior rather than reimplementing the logic they're supposed to verify. This file is meant to grow: add good patterns here as the team discovers them.

## Core Principle: Don't Duplicate Production Logic

A test should _state_ what the outcome is, not _recompute_ it. If the test contains logic that mirrors the implementation, it's not testing anything; it's just running the code twice.

---

## What to Flag

### Rule 1: No Implementation Mirroring

Flag any test that derives its expected values using the same logic as the implementation. Treat the following constructs in test code as suspicious when they mirror production code:

- Filters, maps, and reduces
- Conditionals and branching logic
- Loops and iterations
- String concatenation or template logic that rebuilds output

```typescript
// ❌ BAD: Test mirrors the production logic
function getActiveUsers(users: User[]): User[] {
  return users.filter((u) => u.isActive && !u.isDeleted);
}

it("should return active users", () => {
  const users = [
    { id: "1", isActive: true, isDeleted: false },
    { id: "2", isActive: false, isDeleted: false },
    { id: "3", isActive: true, isDeleted: true },
  ];
  const expected = users.filter((u) => u.isActive && !u.isDeleted);
  expect(getActiveUsers(users)).toEqual(expected);
});

// ✅ GOOD: Test asserts on concrete expected output
it("should return only users that are active and not deleted", () => {
  const users = [
    { id: "1", isActive: true, isDeleted: false },
    { id: "2", isActive: false, isDeleted: false },
    { id: "3", isActive: true, isDeleted: true },
  ];
  expect(getActiveUsers(users)).toEqual([
    { id: "1", isActive: true, isDeleted: false },
  ]);
});
```

**How to fix:** Hard-code the expected output. If you can't hard-code it, the test is too complex, so break it into smaller cases.

### Rule 2: Strong Assertions

Every assertion must verify a specific, meaningful value. Weak assertions pass even when the code is broken.

```typescript
// ❌ BAD: Asserts existence, not correctness
it("should create a user", async () => {
  const user = await createUser({ name: "Alice", email: "alice@test.com" });
  expect(user).toBeDefined();
  expect(user.id).toBeTruthy();
});

// ✅ GOOD: Asserts specific values and structure
it("should create a user with the provided details", async () => {
  const user = await createUser({ name: "Alice", email: "alice@test.com" });
  expect(user).toEqual({
    id: expect.any(String),
    name: "Alice",
    email: "alice@test.com",
    createdAt: expect.any(Date),
  });
});
```

**Weak assertions to flag:**

| Assertion | Problem |
| --- | --- |
| `toBeDefined()` | Passes for any non-undefined value, including wrong values |
| `toBeTruthy()` | Passes for `1`, `"wrong"`, `{}`, `[]` (almost anything) |
| `toBeFalsy()` | Passes for `0`, `""`, `null`, `undefined` (too many things) |
| `not.toBeNull()` | Passes for any non-null value, including `undefined` and wrong values |

**Negated assertions** are a related smell: they constrain what a value _isn't_ without saying what it _is_.

```typescript
// ❌ BAD: Says what it's not, so it passes for any other value, including wrong ones
expect(input).not.toHaveValue("old value");
expect(element).not.toBeVisible();
expect(list).not.toHaveLength(0);
expect(button).not.toBeDisabled();

// ✅ GOOD: Says what it is, so only the correct value passes
expect(input).toHaveValue("new value");
expect(element).toBeHidden();
expect(list).toHaveLength(3);
expect(button).toBeEnabled();
```

**Acceptable uses of negated assertions:**

- Verifying absence: `expect(element).not.toBeInTheDocument()` (there is no positive form)
- As _additional_ verification alongside a positive assertion

**Acceptable uses of weak assertions:**

- As guards before stronger ones: `expect(result).toBeDefined(); expect(result.name).toBe("Alice");`
- When testing a boolean function that should return `true`

### Rule 3: Edge Cases Required

Every test suite must include at least one test for each category:

1. **Empty input**: empty string, empty array, empty object
2. **Null/undefined**: missing or absent values
3. **Boundary values**: zero, negative numbers, max length, single element
4. **Error cases**: invalid input, network failure, timeout

```typescript
// ❌ BAD: Only tests the happy path
describe("parseConfig", () => {
  it("should parse valid config", () => {
    expect(parseConfig('{"port": 3000}')).toEqual({ port: 3000 });
  });
});

// ✅ GOOD: Covers happy path + edge cases
describe("parseConfig", () => {
  it("should parse valid config", () => {
    expect(parseConfig('{"port": 3000}')).toEqual({ port: 3000 });
  });

  it("should throw on empty string", () => {
    expect(() => parseConfig("")).toThrow();
  });

  it("should throw on invalid JSON", () => {
    expect(() => parseConfig("not json")).toThrow(ConfigParseError);
  });

  it("should return defaults for empty object", () => {
    expect(parseConfig("{}")).toEqual({ port: 8080 });
  });
});
```

### Rule 4: Behavior Over Mocks

Assert on what the system _did_, not on what mocks were _called with_. Mock assertions test your test setup, not your code.

```typescript
// ❌ BAD: Only asserts on mock calls
it("should send welcome email", async () => {
  const mockMailer = { send: vi.fn() };
  await registerUser({ name: "Alice", email: "alice@test.com" }, mockMailer);
  expect(mockMailer.send).toHaveBeenCalledWith({
    to: "alice@test.com",
    subject: "Welcome",
  });
});

// ✅ GOOD: Asserts on the actual outcome
it("should register user and send welcome email", async () => {
  const sent: Email[] = [];
  const mailer = { send: (email: Email) => sent.push(email) };

  const user = await registerUser(
    { name: "Alice", email: "alice@test.com" },
    mailer,
  );

  expect(user).toEqual({
    id: expect.any(String),
    name: "Alice",
    email: "alice@test.com",
  });
  expect(sent).toEqual([
    { to: "alice@test.com", subject: "Welcome", body: expect.any(String) },
  ]);
});
```

**When mock assertions are acceptable:**

- Verifying a side effect with no observable return value (logging, metrics)
- Verifying a dependency was _not_ called (negative test)
- As _additional_ verification alongside behavioral assertions

**When mock assertions are a smell:**

- `expect(mock).toHaveBeenCalledWith(...)` with no `expect(result)...` in the same test
- Mock setup is longer than the assertion block
- Changing the implementation (not the behavior) would break the test

### Rule 5: DAMP Over DRY

DAMP (Descriptive And Meaningful Phrases) is usually better than DRY (Don't
Repeat Yourself) in tests. Tests should be descriptive and meaningful even when
that means some duplication. Flag shared helpers, fixtures, or setup factories
when they hide the behavior under test, force the reader to chase indirection, or
make many tests fail for one helper change.

### Rule 6: Test Outcomes, Not Internals

Prefer assertions on observable state, returned values, rendered output, persisted records, emitted events, or external side effects. Flag tests that primarily assert private methods, internal call order, implementation structure, or framework behavior when an outcome assertion would prove the same behavior.

### Rule 7: Test Isolation

Flag tests that depend on execution order, shared mutable state, real time, random data, network access, external services, or prior test side effects unless those dependencies are explicitly controlled. Flaky tests erode trust in the suite.
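
One common way to control real time and random data explicitly is to inject them as parameters. The sketch below is illustrative only: `createSession`, `Clock`, and `Random` are hypothetical names, and the token format is arbitrary.

```typescript
// Hypothetical production code: the clock and the random source are
// parameters, so a test can replace them with fixed values.
type Clock = () => Date;
type Random = () => number;

interface Session {
  token: string;
  expiresAt: Date;
}

function createSession(
  ttlMs: number,
  now: Clock = () => new Date(),
  random: Random = Math.random,
): Session {
  // Arbitrary illustrative token: 8 hex digits derived from the random source.
  const token = Math.floor(random() * 0xffffffff)
    .toString(16)
    .padStart(8, "0");
  return { token, expiresAt: new Date(now().getTime() + ttlMs) };
}

// With fixed inputs, the outcome is a literal the test can state directly.
const session = createSession(
  60000,
  () => new Date("2024-01-01T00:00:00.000Z"),
  () => 0.5,
);
// session.token is "7fffffff"
// session.expiresAt.toISOString() is "2024-01-01T00:01:00.000Z"
```

Because nothing in the test reads the wall clock or the global random generator, the expected values can be hard-coded, which also keeps the test in line with Rule 1.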

### Rule 8: Test Names Describe Behavior

Test names should read like behavioral specifications. Flag vague names such as `works`, `handles errors`, or `test 3`, and names that describe implementation mechanics instead of the user-visible or system-visible behavior being verified.
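
A first pass over test names can be automated with a few patterns. This is a rough heuristic sketch, not a definition of a vague name: `isVagueTestName` is hypothetical, and its patterns cover only the three examples named above. Names that describe implementation mechanics still need a human or reviewer judgment.

```typescript
// Hypothetical heuristic built from the examples in Rule 8.
const VAGUE_NAME_PATTERNS: RegExp[] = [
  /^works$/i,
  /^handles errors$/i,
  /^test \d+$/i,
];

// Returns true when the whole name matches one of the known vague patterns.
function isVagueTestName(name: string): boolean {
  const trimmed = name.trim();
  return VAGUE_NAME_PATTERNS.some((pattern) => pattern.test(trimmed));
}
```

With these patterns, `works` and `test 3` are flagged, while `should return only users that are active and not deleted` is not.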

---

## Review Process

For every test you write or review:

1. **Identify the behavior under test**: what outcome or side effect is this test meant to verify?
2. **Check for logic mirroring**: does the test derive the expected value using logic instead of stating it directly?
3. **Check assertion strength**: does every assertion verify a specific value, not just existence?
4. **Check edge case coverage**: are empty, null, boundary, and error cases represented?
5. **Check mock usage**: do assertions target outcomes, or just mock call signatures?
6. **Check readability and isolation**: is the test self-contained enough to understand, named by behavior, and free of hidden order/time/network dependencies?
7. **If any rule is violated**, flag it using the output format below.

---

## Output Format for Flagged Tests

When flagging a test, use this structure:

````
### [Rule violated]: [Brief description]

**File:** `path/to/test.ts`
**Test:** "should [test name]"
**Problem:** [What is wrong and why it matters]

**Current:**
```typescript
[the problematic test code]
```

**Suggested:**
```typescript
[the corrected test code]
```
````