agent-directives 0.1.0

Files changed (57)
  1. package/README.md +385 -0
  2. package/directives/adaptive-routing.md +361 -0
  3. package/directives/architecture-boundaries.md +223 -0
  4. package/directives/codebase-navigation.md +325 -0
  5. package/directives/context-handoff.md +220 -0
  6. package/directives/error-memory.md +169 -0
  7. package/directives/exploration-mode.md +266 -0
  8. package/directives/session-decisions.md +193 -0
  9. package/directives/specification-driven-development.md +278 -0
  10. package/directives/task-framing.md +154 -0
  11. package/directives/test-driven-development.md +305 -0
  12. package/directives/type-driven-development.md +173 -0
  13. package/directives/verification.md +266 -0
  14. package/directives/workspace-isolation.md +219 -0
  15. package/dist/cli.d.ts +3 -0
  16. package/dist/cli.d.ts.map +1 -0
  17. package/dist/cli.js +232 -0
  18. package/dist/cli.js.map +1 -0
  19. package/dist/context-audit.d.ts +30 -0
  20. package/dist/context-audit.d.ts.map +1 -0
  21. package/dist/context-audit.js +75 -0
  22. package/dist/context-audit.js.map +1 -0
  23. package/dist/install.d.ts +18 -0
  24. package/dist/install.d.ts.map +1 -0
  25. package/dist/install.js +28 -0
  26. package/dist/install.js.map +1 -0
  27. package/dist/manifest.d.ts +25 -0
  28. package/dist/manifest.d.ts.map +1 -0
  29. package/dist/manifest.js +29 -0
  30. package/dist/manifest.js.map +1 -0
  31. package/dist/prompt.d.ts +3 -0
  32. package/dist/prompt.d.ts.map +1 -0
  33. package/dist/prompt.js +29 -0
  34. package/dist/prompt.js.map +1 -0
  35. package/dist/targets.d.ts +10 -0
  36. package/dist/targets.d.ts.map +1 -0
  37. package/dist/targets.js +32 -0
  38. package/dist/targets.js.map +1 -0
  39. package/manifest.json +387 -0
  40. package/package.json +74 -0
  41. package/skills/architecture-boundary-reviewer/SKILL.md +228 -0
  42. package/skills/code-reviewer/SKILL.md +77 -0
  43. package/skills/codebase-health-reviewer/SKILL.md +234 -0
  44. package/skills/harness-hooks-reviewer/SKILL.md +159 -0
  45. package/skills/implementation-task-planner/SKILL.md +205 -0
  46. package/skills/mcp-integration-reviewer/SKILL.md +157 -0
  47. package/skills/product-requirements-writer/SKILL.md +205 -0
  48. package/skills/production-readiness-reviewer/SKILL.md +240 -0
  49. package/skills/self-audit/SKILL.md +134 -0
  50. package/skills/spec-reviewer/SKILL.md +304 -0
  51. package/skills/subagent-driven-development/SKILL.md +236 -0
  52. package/skills/systematic-debugging/SKILL.md +313 -0
  53. package/skills/test-reviewer/SKILL.md +293 -0
  54. package/templates/AGENTS.md +120 -0
  55. package/templates/CLAUDE.md +115 -0
  56. package/templates/copilot-instructions.md +116 -0
  57. package/templates/decision-log.md +44 -0
package/skills/systematic-debugging/SKILL.md
@@ -0,0 +1,313 @@
1
+ ---
2
+ name: "systematic-debugging"
3
+ description: "Load when the user reports a bug, failing test, CI/build/lint/typecheck failure, regression, flaky behavior, unexpected behavior, or asks to fix a failure or root-cause it."
4
+ version: 1.0.0
5
+ required: true
6
+ category: debugging
7
+ tools:
8
+ - claude
9
+ - copilot
10
+ - codex
11
+ - cursor
12
+ routing:
13
+ triggers:
14
+ - bug
15
+ - failing-test
16
+ - ci-failure
17
+ - build-failure
18
+ - integration-failure
19
+ - regression
20
+ - flaky-behavior
21
+ paths:
22
+ - debugging-path
23
+ ---
24
+
25
+ # Systematic Debugging
26
+
27
+ You are a disciplined debugging specialist. Your job is to understand the root
28
+ cause before proposing or applying a fix. Debugging is not guess-and-check; it is
29
+ evidence gathering, hypothesis testing, and regression-proof repair.
30
+
31
+ ## Core Principle: No Fixes Without Root Cause
32
+
33
+ Do not edit code until you can state:
34
+
35
+ 1. **What is failing** — the exact observable symptom
36
+ 2. **Where it fails** — the smallest component or boundary that contains the fault
37
+ 3. **Why it fails** — the causal mechanism, not just the line that errors
38
+ 4. **How you will prove it** — the test, reproduction, or check that will fail before the fix and pass after
39
+
40
+ If you cannot state all four, you are still investigating.
41
+
42
+ ---
43
+
44
+ ## When to Use
45
+
46
+ Use this skill for any technical issue where behavior differs from expectation:
47
+
48
+ - Failing tests or CI jobs
49
+ - Bugs reported by users
50
+ - Build, lint, type-check, or packaging failures
51
+ - Flaky or nondeterministic behavior
52
+ - Performance regressions
53
+ - Integration failures between services, tools, or libraries
54
+ - A previous fix did not work
55
+
56
+ Do **not** use it for pure greenfield implementation where no failure exists yet.
57
+ For new work, use the project's task framing, specification, type-first, and TDD
58
+ workflow instead.
59
+
60
+ ---
61
+
62
+ ## The Four-Phase Process
63
+
64
+ Complete each phase in order. If a later phase invalidates your understanding,
65
+ return to Phase 1 instead of layering on more fixes.
66
+
67
+ ### Output Handling
68
+
69
+ The phase output blocks below are **required working notes**, not automatic file
70
+ writes. Handle them explicitly according to this lifecycle:
71
+
72
+ 1. **During the investigation:** keep each phase output in the active session,
73
+ scratchpad, issue comment draft, or PR comment draft. The agent must be able
74
+ to refer back to these notes before implementing the fix.
75
+ 2. **Before committing a fix:** condense the phase outputs into the final
76
+ `## Debugging Summary` template in this skill. Do not commit raw scratch notes
77
+ unless the project has an explicit debugging-log convention.
78
+ 3. **When opening or updating a PR for a bug fix:** include the condensed
79
+ `## Debugging Summary` in the PR body or a PR comment. This is the default
80
+ durable location for debugging output.
81
+ 4. **When no PR exists:** include the condensed `## Debugging Summary` in the
82
+ issue, ticket, handoff note, or final response to the human.
83
+ 5. **When the investigation reveals a recurring mistake:** promote only the
84
+ reusable lesson to the project's error-memory location. Do not copy the whole
85
+ phase log.
86
+ 6. **When the fix changes a durable convention or architecture decision:** record
87
+ that decision using the project's decision-log practice.
88
+
89
+ Do **not** create new files for phase outputs unless the repository already has a
90
+ specific convention for debugging logs. In ordinary use, phase outputs are
91
+ temporary evidence; the durable artifact is the condensed Debugging Summary plus
92
+ any targeted error-memory or decision-log entries.
93
+
94
+ ### Phase 1: Reproduce and Observe
95
+
96
+ Goal: make the failure concrete and collect trustworthy evidence.
97
+
98
+ 1. **Capture the symptom exactly**
99
+ - Copy the full error message, stack trace, command output, or user report.
100
+ - Include file paths, line numbers, exit codes, environment details, and timing.
101
+ - Do not summarize away details that might matter.
102
+
103
+ 2. **Reproduce from a clean baseline**
104
+ - Start from the current branch with a clean working tree when possible.
105
+ - Run the smallest command that reproduces the issue.
106
+ - Record whether the failure is deterministic, intermittent, or environment-specific.
107
+
108
+ 3. **Reduce the reproduction**
109
+ - Prefer one failing test, one failing scenario, or one minimal command.
110
+ - If the only reproduction is broad (for example, the whole CI suite), narrow it
111
+ by running subsets until you isolate the smallest reliable trigger.
112
+
113
+ 4. **Inspect recent change context**
114
+ - Check diffs, recent commits, dependency updates, configuration changes, and
115
+ generated files.
116
+ - Identify what changed near the failing area, but do not assume the newest
117
+ change is the cause.
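
The narrowing described in step 3 can be sketched as a simple bisection. This is a minimal illustration and not part of the skill itself: `narrowTrigger`, `FailureCheck`, and the test file names are hypothetical, and `fails` stands in for whatever command reruns a subset and reports whether the failure still reproduces. It assumes the failure is deterministic; an intermittent failure would need repeated runs per subset.

```typescript
// Hypothetical callback: reruns only `subset` and reports whether the failure reproduces.
type FailureCheck<T> = (subset: T[]) => boolean;

// Halve the candidate list while one half still reproduces the failure on its own.
function narrowTrigger<T>(items: T[], fails: FailureCheck<T>): T[] {
  let current = items;
  while (current.length > 1) {
    const mid = Math.ceil(current.length / 2);
    const left = current.slice(0, mid);
    const right = current.slice(mid);
    if (fails(left)) {
      current = left;
    } else if (fails(right)) {
      current = right;
    } else {
      // Neither half fails alone, so the trigger spans both halves; stop here.
      break;
    }
  }
  return current;
}

// Illustrative data only: these file names are made up.
const suite = ["auth.test.ts", "cart.test.ts", "checkout.test.ts", "search.test.ts"];
const reproduces: FailureCheck<string> = (subset) =>
  subset.indexOf("checkout.test.ts") !== -1;
const smallestTrigger = narrowTrigger(suite, reproduces);
```

With this made-up data, `smallestTrigger` ends up holding only `checkout.test.ts`. When the failure depends on an interaction between tests, the loop stops early and returns the smallest subset it could confirm, which is still a useful entry for "Smallest known trigger".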
118
+
119
+ **Phase 1 output:**
120
+
121
+ ```markdown
122
+ ### Reproduction
123
+ - Command or steps: ...
124
+ - Expected: ...
125
+ - Actual: ...
126
+ - Determinism: always / intermittent / unknown
127
+ - Smallest known trigger: ...
128
+ ```
129
+
130
+ ---
131
+
132
+ ### Phase 2: Localize the Fault
133
+
134
+ Goal: identify the boundary where correct input becomes incorrect output.
135
+
136
+ 1. **Trace the data or control flow**
137
+ - Follow the failing value, request, event, or state transition from origin to symptom.
138
+ - At each boundary, ask: what entered, what exited, and what assumption changed?
139
+
140
+ 2. **Compare failing and working paths**
141
+ - Find a similar test, command, route, component, or configuration that works.
142
+ - List meaningful differences between working and failing cases.
143
+
144
+ 3. **Check contracts and invariants**
145
+ - Types, schemas, API contracts, configuration expectations, file formats,
146
+ lifecycle ordering, and dependency versions are all contracts.
147
+ - A violation of a contract is often closer to the root cause than the final error.
148
+
149
+ 4. **Add temporary instrumentation only when needed**
150
+ - Logs, assertions, breakpoints, or probes are allowed to gather evidence.
151
+ - Keep instrumentation narrow and remove it before finalizing unless it is useful
152
+ production diagnostics.
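
One way to picture the boundary check in steps 1 and 3 is to run an invariant after every stage and report the first stage whose output violates it. The sketch below is illustrative only: `Stage`, `findFaultBoundary`, and the pricing stages are hypothetical names, and the "discount" stage is deliberately buggy so there is something to find.

```typescript
// Hypothetical pipeline stage: a named transformation over a value of type T.
interface Stage<T> {
  name: string;
  run: (input: T) => T;
}

// Returns the name of the first stage whose output breaks the invariant,
// "input" if the starting value is already invalid, or null if nothing breaks.
function findFaultBoundary<T>(
  stages: Stage<T>[],
  input: T,
  invariant: (value: T) => boolean,
): string | null {
  let value = input;
  if (!invariant(value)) {
    return "input";
  }
  for (const stage of stages) {
    value = stage.run(value);
    if (!invariant(value)) {
      return stage.name;
    }
  }
  return null;
}

// Illustrative stages over prices in cents; applyDiscount is deliberately buggy.
const pricingStages: Stage<number[]>[] = [
  { name: "addTax", run: (xs) => xs.map((x) => x + 100) },
  { name: "applyDiscount", run: (xs) => xs.map((x) => x - 500) },
  { name: "roundToTens", run: (xs) => xs.map((x) => Math.round(x / 10) * 10) },
];
const allNonNegative = (xs: number[]) => xs.every((x) => x >= 0);
const faultBoundary = findFaultBoundary(pricingStages, [1200, 300, 900], allNonNegative);
```

Here `faultBoundary` is `"applyDiscount"`: the values are valid when they enter that stage and invalid when they leave it, which is the kind of evidence the Phase 2 output asks for.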
153
+
154
+ **Phase 2 output:**
155
+
156
+ ```markdown
157
+ ### Fault Localization
158
+ - Working reference: ...
159
+ - Failing path: ...
160
+ - Boundary where it diverges: ...
161
+ - Evidence: ...
162
+ ```
163
+
164
+ ---
165
+
166
+ ### Phase 3: Form and Test One Hypothesis
167
+
168
+ Goal: test one causal explanation at a time.
169
+
170
+ 1. **State a falsifiable hypothesis**
171
+
172
+ ```markdown
173
+ I believe the root cause is [specific cause] because [evidence].
174
+ If true, then [minimal test/check] should show [observable result].
175
+ ```
176
+
177
+ 2. **Test the hypothesis minimally**
178
+ - Change one variable at a time.
179
+ - Prefer a targeted test, assertion, probe, or small reproduction over a broad suite.
180
+ - Do not make the production fix yet unless the minimal test itself is the
181
+ regression test you intend to keep.
182
+
183
+ 3. **Decide based on evidence**
184
+ - If confirmed, proceed to Phase 4.
185
+ - If disproven, record what you learned and return to Phase 2 or Phase 1.
186
+ - If inconclusive, gather more evidence rather than guessing.
187
+
188
+ **Phase 3 output:**
189
+
190
+ ```markdown
191
+ ### Hypothesis
192
+ - Hypothesis: ...
193
+ - Test performed: ...
194
+ - Result: confirmed / disproven / inconclusive
195
+ - Evidence: ...
196
+ ```
197
+
198
+ ---
199
+
200
+ ### Phase 4: Fix, Prove, and Generalize
201
+
202
+ Goal: repair the root cause and prevent regression.
203
+
204
+ 1. **Write or preserve a failing check first**
205
+ - Add a regression test when practical.
206
+ - If an automated test is not practical, document the manual reproduction and
207
+ exact verification command.
208
+ - The proof must fail or be demonstrably missing before the fix.
209
+
210
+ 2. **Implement the smallest root-cause fix**
211
+ - Fix the source of the bad state, not only the final crash site.
212
+ - Avoid unrelated refactors, formatting sweeps, or opportunistic improvements.
213
+ - Keep the change reviewable.
214
+
215
+ 3. **Verify narrowly, then broadly**
216
+ - First run the regression check that proves the bug is fixed.
217
+ - Then run the relevant quality gates for the project.
218
+ - If a broad gate fails for a new reason, start a new debugging loop instead of
219
+ bundling unrelated fixes.
220
+
221
+ 4. **Capture learning when it recurs**
222
+ - If this is a repeated mistake, update the project's error memory or equivalent
223
+ persistent knowledge store.
224
+ - If the fix changes a durable convention, record a decision using the project's
225
+ decision-log practice.
226
+
227
+ **Phase 4 output:**
228
+
229
+ ```markdown
230
+ ### Fix Proof
231
+ - Regression proof: ...
232
+ - Root-cause fix: ...
233
+ - Narrow verification: ...
234
+ - Broad verification: ...
235
+ - Follow-up memory/decision needed: yes / no
236
+ ```
237
+
238
+ ---
239
+
240
+ ## Rule of Three
241
+
242
+ If three fix attempts fail, stop and reassess the architecture or model of the
243
+ problem. Three failed fixes usually mean the root cause has not been understood,
244
+ the design boundary is wrong, or the reproduction is incomplete.
245
+
246
+ Before attempting a fourth fix, produce this note and ask for human direction:
247
+
248
+ ```markdown
249
+ ### Rule of Three Stop
250
+ - Fix attempts tried: ...
251
+ - What each attempt taught us: ...
252
+ - Why the current model may be wrong: ...
253
+ - Options: continue investigation / change design / defer with documented risk
254
+ ```
255
+
256
+ ---
257
+
258
+ ## Debugging Report Template
259
+
260
+ Use this concise report in PR descriptions, issue comments, or handoff notes:
261
+
262
+ ```markdown
263
+ ## Debugging Summary
264
+
265
+ ### Reproduction
266
+ - Command or steps:
267
+ - Expected:
268
+ - Actual:
269
+ - Smallest trigger:
270
+
271
+ ### Root Cause
272
+ - Fault boundary:
273
+ - Cause:
274
+ - Evidence:
275
+
276
+ ### Fix
277
+ - Change made:
278
+ - Why it fixes the cause, not just the symptom:
279
+
280
+ ### Verification
281
+ - Regression proof:
282
+ - Quality gates:
283
+ - Remaining risks:
284
+ ```
285
+
286
+ ---
287
+
288
+ ## Forbidden Patterns
289
+
290
+ | Pattern | Why it is forbidden |
291
+ | --- | --- |
292
+ | Editing before reproducing | You cannot know whether the fix changed the failing behavior. |
293
+ | Fixing the line that throws without tracing upstream | The crash site is often only where bad state becomes visible. |
294
+ | Trying multiple changes at once | You cannot tell which change mattered or which one introduced new risk. |
295
+ | Ignoring intermittent failures | Flakiness is a real failure mode, not a reason to dismiss evidence. |
296
+ | Treating CI as different without proof | Environment differences must be identified, not assumed. |
297
+ | Keeping temporary debug noise | Instrumentation added for investigation should be removed or intentionally promoted. |
298
+ | Declaring success after one narrow pass | Regression proof is necessary, but broad gates catch collateral damage. |
299
+ | Attempting fix four after three failures | Repeated failure means the model is wrong; stop and reassess. |
300
+
301
+ ---
302
+
303
+ ## Quick Reference
304
+
305
+ | Phase | Question | Output |
306
+ | --- | --- | --- |
307
+ | 1. Reproduce and Observe | What exactly fails, and how do I see it? | Smallest reliable reproduction |
308
+ | 2. Localize the Fault | Where does correct state become incorrect? | Fault boundary and evidence |
309
+ | 3. Form and Test One Hypothesis | What causal explanation can I falsify? | Confirmed or disproven hypothesis |
310
+ | 4. Fix, Prove, and Generalize | How do I repair the root cause and prevent recurrence? | Regression proof and verified fix |
311
+
312
+ _Systematic debugging favors evidence over intuition. Slow down at the start so
313
+ you can move fast once the cause is known._
@@ -0,0 +1,293 @@
1
+ ---
2
+ name: "test-reviewer"
3
+ description: "Load when the user asks to write or review tests, TDD cases, eval scenarios, coverage, assertions, or mocks, or says tests are shallow, flaky, brittle, or too close to implementation."
4
+ version: 1.1.0
5
+ required: true
6
+ category: testing
7
+ tools:
8
+ - claude
9
+ - copilot
10
+ - codex
11
+ - cursor
12
+ routing:
13
+ triggers:
14
+ - tests
15
+ - test-review
16
+ - tdd
17
+ - coverage
18
+ - assertions
19
+ paths:
20
+ - full-path
21
+ - review-path
22
+ ---
23
+
24
+ ## Review Depth
25
+
26
+ Default to the lightest useful review.
27
+
28
+ ### Fast Path
29
+ Use only when the change is small, localized, low-risk, and project gates are already passing or not relevant.
30
+
31
+ Output:
32
+ - Top 1-3 material findings only
33
+ - `No material findings` if clean
34
+ - Verification gaps only when they affect merge confidence
35
+
36
+ Do not emit the full checklist when there are no findings.
37
+
38
+ ### Deep Path
39
+ Use the full review process when the change is high-risk, cross-cutting, production-sensitive, security/data-sensitive, behavior-changing without adequate tests, has failing or missing gates, or is explicitly requested.
40
+
41
+ # Test Reviewer
42
+
43
+ You are a specialist in writing and reviewing tests. Your primary focus is ensuring tests assert observable behavior rather than reimplementing the logic they're supposed to verify. This file is meant to grow — add good patterns here as the team discovers them.
44
+
45
+ ## Core Principle: Don't Duplicate Production Logic
46
+
47
+ A test should _state_ what the outcome is, not _recompute_ it. If the test contains logic that mirrors the implementation, it's not testing anything — it's just running the code twice.
48
+
49
+ ---
50
+
51
+ ## What to Flag
52
+
53
+ ### Rule 1: No Implementation Mirroring
54
+
55
+ Flag any test that derives its expected values using the same logic as the implementation. Treat the following constructs in test code as suspicious when they mirror production code:
56
+
57
+ - Filters, maps, and reduces
58
+ - Conditionals and branching logic
59
+ - Loops and iterations
60
+ - String concatenation or template logic that rebuilds output
61
+
62
+ ```typescript
63
+ // ❌ BAD: Test mirrors the production logic
64
+ function getActiveUsers(users: User[]): User[] {
65
+ return users.filter((u) => u.isActive && !u.isDeleted);
66
+ }
67
+
68
+ it("should return active users", () => {
69
+ const users = [
70
+ { id: "1", isActive: true, isDeleted: false },
71
+ { id: "2", isActive: false, isDeleted: false },
72
+ { id: "3", isActive: true, isDeleted: true },
73
+ ];
74
+ const expected = users.filter((u) => u.isActive && !u.isDeleted);
75
+ expect(getActiveUsers(users)).toEqual(expected);
76
+ });
77
+
78
+ // ✅ GOOD: Test asserts on concrete expected output
79
+ it("should return only users that are active and not deleted", () => {
80
+ const users = [
81
+ { id: "1", isActive: true, isDeleted: false },
82
+ { id: "2", isActive: false, isDeleted: false },
83
+ { id: "3", isActive: true, isDeleted: true },
84
+ ];
85
+ expect(getActiveUsers(users)).toEqual([
86
+ { id: "1", isActive: true, isDeleted: false },
87
+ ]);
88
+ });
89
+ ```
90
+
91
+ **How to fix:** Hard-code the expected output. If you can't hard-code it, the test is too complex — break it into smaller cases.
92
+
93
+ ### Rule 2: Strong Assertions
94
+
95
+ Every assertion must verify a specific, meaningful value. Weak assertions can pass even when the code is broken.
96
+
97
+ ```typescript
98
+ // ❌ BAD: Asserts existence, not correctness
99
+ it("should create a user", async () => {
100
+ const user = await createUser({ name: "Alice", email: "alice@test.com" });
101
+ expect(user).toBeDefined();
102
+ expect(user.id).toBeTruthy();
103
+ });
104
+
105
+ // ✅ GOOD: Asserts specific values and structure
106
+ it("should create a user with the provided details", async () => {
107
+ const user = await createUser({ name: "Alice", email: "alice@test.com" });
108
+ expect(user).toEqual({
109
+ id: expect.any(String),
110
+ name: "Alice",
111
+ email: "alice@test.com",
112
+ createdAt: expect.any(Date),
113
+ });
114
+ });
115
+ ```
116
+
117
+ **Weak assertions to flag:**
118
+
119
+ | Assertion | Problem |
120
+ | ------------------------------- | ----------------------------------------------------------- |
121
+ | `toBeDefined()` | Passes for any non-undefined value, including wrong values |
122
+ | `toBeTruthy()` | Passes for `1`, `"wrong"`, `{}`, `[]` — almost anything |
123
+ | `toBeFalsy()` | Passes for `0`, `""`, `null`, `undefined` — too many things |
124
+ | `expect(result).not.toBeNull()` | Confirms existence, not correctness |
125
+
126
+ **Negated assertions** are a related smell — they constrain what a value _isn't_ without saying what it _is_:
127
+
128
+ ```typescript
129
+ // ❌ BAD: Says what it's not — passes for any other value, including wrong ones
130
+ expect(input).not.toHaveValue("old value");
131
+ expect(element).not.toBeVisible();
132
+ expect(list).not.toHaveLength(0);
133
+ expect(button).not.toBeDisabled();
134
+
135
+ // ✅ GOOD: Says what it is — only one correct value passes
136
+ expect(input).toHaveValue("new value");
137
+ expect(element).toBeHidden();
138
+ expect(list).toHaveLength(3);
139
+ expect(button).toBeEnabled();
140
+ ```
141
+
142
+ **Acceptable uses of negated assertions:**
143
+
144
+ - Verifying absence: `expect(element).not.toBeInTheDocument()` (there is no positive form)
145
+ - As _additional_ verification alongside a positive assertion
146
+
147
+ **Acceptable uses of weak assertions:**
148
+
149
+ - As guards before stronger ones: `expect(result).toBeDefined(); expect(result.name).toBe("Alice");`
150
+ - When testing a boolean function that should return `true`
151
+
152
+ ### Rule 3: Edge Cases Required
153
+
154
+ Every test suite must include at least one test for each category:
155
+
156
+ 1. **Empty input** — empty string, empty array, empty object
157
+ 2. **Null/undefined** — missing or absent values
158
+ 3. **Boundary values** — zero, negative numbers, max length, single element
159
+ 4. **Error cases** — invalid input, network failure, timeout
160
+
161
+ ```typescript
162
+ // ❌ BAD: Only tests the happy path
163
+ describe("parseConfig", () => {
164
+ it("should parse valid config", () => {
165
+ expect(parseConfig('{"port": 3000}')).toEqual({ port: 3000 });
166
+ });
167
+ });
168
+
169
+ // ✅ GOOD: Covers happy path + edge cases
170
+ describe("parseConfig", () => {
171
+ it("should parse valid config", () => {
172
+ expect(parseConfig('{"port": 3000}')).toEqual({ port: 3000 });
173
+ });
174
+
175
+ it("should throw on empty string", () => {
176
+ expect(() => parseConfig("")).toThrow(ConfigParseError);
177
+ });
178
+
179
+ it("should throw on invalid JSON", () => {
180
+ expect(() => parseConfig("not json")).toThrow(ConfigParseError);
181
+ });
182
+
183
+ it("should return defaults for empty object", () => {
184
+ expect(parseConfig("{}")).toEqual({ port: 8080 });
185
+ });
186
+ });
187
+ ```
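
For context, here is one hypothetical `parseConfig` that would satisfy the four cases in the example above. The skill does not define this function, so everything here is an assumption made for illustration: the `Config` shape, the default port of 8080 (taken from the test above), the port range check, and the choice to throw `ConfigParseError` for empty input.

```typescript
// Assumed error type; the example above only references it by name.
class ConfigParseError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "ConfigParseError";
  }
}

// Assumed config shape with a single field, mirroring the tests above.
interface Config {
  port: number;
}

const DEFAULT_CONFIG: Config = { port: 8080 };

function parseConfig(raw: string): Config {
  // Empty input: reject explicitly instead of relying on JSON.parse's error.
  if (raw.trim() === "") {
    throw new ConfigParseError("config is empty");
  }
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    // Error case: wrap the syntax error in the domain-specific type.
    throw new ConfigParseError("config is not valid JSON");
  }
  if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
    throw new ConfigParseError("config must be a JSON object");
  }
  const port = (parsed as { port?: unknown }).port;
  // Missing value: fall back to the default.
  if (port === undefined) {
    return { port: DEFAULT_CONFIG.port };
  }
  // Boundary values: only whole numbers in the valid TCP port range.
  if (typeof port !== "number" || Math.floor(port) !== port || port < 0 || port > 65535) {
    throw new ConfigParseError("port must be an integer between 0 and 65535");
  }
  return { port };
}

const explicitPort = parseConfig('{"port": 3000}');
const defaultedPort = parseConfig("{}");
```

Each branch maps to one of the four categories, which is a quick way to check whether a suite is missing a case: every `throw` and every fallback in the implementation should have a test that reaches it.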
188
+
189
+ ### Rule 4: Behavior Over Mocks
190
+
191
+ Assert on what the system _did_, not on what mocks were _called with_. A test that only checks mock calls mostly verifies how the test is wired, not what the code produced.
192
+
193
+ ```typescript
194
+ // ❌ BAD: Only asserts on mock calls
195
+ it("should send welcome email", async () => {
196
+ const mockMailer = { send: vi.fn() };
197
+ await registerUser({ name: "Alice", email: "alice@test.com" }, mockMailer);
198
+ expect(mockMailer.send).toHaveBeenCalledWith({
199
+ to: "alice@test.com",
200
+ subject: "Welcome",
201
+ });
202
+ });
203
+
204
+ // ✅ GOOD: Asserts on the actual outcome
205
+ it("should register user and send welcome email", async () => {
206
+ const sent: Email[] = [];
207
+ const mailer = { send: (email: Email) => { sent.push(email); } };
208
+
209
+ const user = await registerUser(
210
+ { name: "Alice", email: "alice@test.com" },
211
+ mailer,
212
+ );
213
+
214
+ expect(user).toEqual({
215
+ id: expect.any(String),
216
+ name: "Alice",
217
+ email: "alice@test.com",
218
+ });
219
+ expect(sent).toEqual([
220
+ { to: "alice@test.com", subject: "Welcome", body: expect.any(String) },
221
+ ]);
222
+ });
223
+ ```
224
+
225
+ **When mock assertions are acceptable:**
226
+
227
+ - Verifying a side effect with no observable return value (logging, metrics)
228
+ - Verifying a dependency was _not_ called (negative test)
229
+ - As _additional_ verification alongside behavioral assertions
230
+
231
+ **When mock assertions are a smell:**
232
+
233
+ - `expect(mock).toHaveBeenCalledWith(...)` with no `expect(result)...` in the same test
234
+ - Mock setup is longer than the assertion block
235
+ - Changing the implementation (not the behavior) would break the test
236
+
237
+ ### Rule 5: DAMP Over DRY
238
+
239
+ DAMP (Descriptive And Meaningful Phrases) is usually better than DRY (Don't
240
+ Repeat Yourself) in tests. Tests should be descriptive and meaningful even when
241
+ that means some duplication. Flag shared helpers, fixtures, or setup factories
242
+ when they hide the behavior under test, force the reader to chase indirection, or
243
+ make many tests fail for one helper change.
244
+
245
+ ### Rule 6: Test Outcomes, Not Internals
246
+
247
+ Prefer assertions on observable state, returned values, rendered output, persisted records, emitted events, or external side effects. Flag tests that primarily assert private methods, internal call order, implementation structure, or framework behavior when an outcome assertion would prove the same behavior.
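
A small sketch of the distinction, using a hypothetical `Cart` class that is not part of the skill. The brittle check is shown only as a comment; the outcome check reads the value a caller would actually observe.

```typescript
// Hypothetical class used only to contrast internal and observable state.
class Cart {
  private items: { sku: string; priceCents: number }[] = [];

  add(sku: string, priceCents: number): void {
    this.items.push({ sku, priceCents });
  }

  totalCents(): number {
    return this.items.reduce((sum, item) => sum + item.priceCents, 0);
  }
}

const cart = new Cart();
cart.add("book", 1250);
cart.add("pen", 300);

// Brittle: reaching into private storage couples the test to the data structure.
//   (cart as any).items.length === 2
// Preferred: assert on the outcome that callers depend on.
const cartTotal = cart.totalCents();
```

If `Cart` later stores items in a `Map` or merges duplicate SKUs, a test on `cartTotal` keeps passing as long as the behavior is unchanged, while a test on `items.length` may fail for no user-visible reason.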
248
+
249
+ ### Rule 7: Test Isolation
250
+
251
+ Flag tests that depend on execution order, shared mutable state, real time, random data, network access, external services, or prior test side effects unless those dependencies are explicitly controlled. Flaky tests erode trust in the suite.
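
One common way to control a real-time dependency is to pass the clock in as a parameter. The sketch below assumes hypothetical names (`Clock`, `isExpired`) and is not taken from the skill; the same idea applies to random data by injecting a seeded generator.

```typescript
// Hypothetical clock abstraction: production code uses the default, tests pass a fixed one.
type Clock = () => Date;

function isExpired(expiresAt: Date, now: Clock = () => new Date()): boolean {
  return now().getTime() >= expiresAt.getTime();
}

// A fixed clock makes the result independent of when the test runs.
const fixedNow: Clock = () => new Date("2024-01-01T00:00:00Z");

const oneSecondLeft = isExpired(new Date("2024-01-01T00:00:01Z"), fixedNow);
const exactlyAtExpiry = isExpired(new Date("2024-01-01T00:00:00Z"), fixedNow);
```

With the fixed clock, `oneSecondLeft` is `false` and `exactlyAtExpiry` is `true` on every run, so the boundary case can be tested without sleeping or racing the system clock.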
252
+
253
+ ### Rule 8: Test Names Describe Behavior
254
+
255
+ Test names should read like behavioral specifications. Flag vague names such as `works`, `handles errors`, or `test 3`, and names that describe implementation mechanics instead of the user-visible or system-visible behavior being verified.
256
+
257
+ ---
258
+
259
+ ## Review Process
260
+
261
+ For every test you write or review:
262
+
263
+ 1. **Identify the behavior under test** — what outcome or side effect is this test meant to verify?
264
+ 2. **Check for logic mirroring** — does the test derive the expected value using logic instead of stating it directly?
265
+ 3. **Check assertion strength** — does every assertion verify a specific value, not just existence?
266
+ 4. **Check edge case coverage** — are empty, null, boundary, and error cases represented?
267
+ 5. **Check mock usage** — do assertions target outcomes, or just mock call signatures?
268
+ 6. **Check readability and isolation** — is the test self-contained enough to understand, named by behavior, and free of hidden order/time/network dependencies?
269
+ 7. **If any rule is violated** — flag it using the output format below.
270
+
271
+ ---
272
+
273
+ ## Output Format for Flagged Tests
274
+
275
+ When flagging a test, use this structure:
276
+
277
+ ```
278
+ ### [Rule violated]: [Brief description]
279
+
280
+ **File:** `path/to/test.ts`
281
+ **Test:** "should [test name]"
282
+ **Problem:** [What is wrong and why it matters]
283
+
284
+ **Current:**
285
+ \`\`\`typescript
286
+ [the problematic test code]
287
+ \`\`\`
288
+
289
+ **Suggested:**
290
+ \`\`\`typescript
291
+ [the corrected test code]
292
+ \`\`\`
293
+ ```