@torus-engineering/tas-kit 1.11.1 → 1.12.0
- package/.tas/README.md +334 -334
- package/{.claude → .tas/_platform/claude-code}/settings.json +0 -12
- package/{.claude → .tas/_platform}/hooks/code-quality.js +1 -1
- package/{.claude → .tas/_platform}/hooks/session-end.js +20 -25
- package/{.claude → .tas}/commands/ado-create.md +5 -4
- package/{.claude → .tas}/commands/ado-delete.md +5 -4
- package/{.claude → .tas}/commands/ado-update.md +5 -4
- package/{.claude → .tas}/commands/tas-adr.md +3 -3
- package/{.claude → .tas}/commands/tas-apitest-plan.md +2 -2
- package/{.claude → .tas}/commands/tas-apitest.md +4 -4
- package/{.claude → .tas}/commands/tas-bug.md +6 -6
- package/{.claude → .tas}/commands/tas-design.md +3 -3
- package/{.claude → .tas}/commands/tas-dev.md +11 -14
- package/{.claude → .tas}/commands/tas-epic.md +3 -3
- package/{.claude → .tas}/commands/tas-feature.md +4 -4
- package/{.claude → .tas}/commands/tas-fix.md +5 -5
- package/{.claude → .tas}/commands/tas-init.md +1 -1
- package/{.claude → .tas}/commands/tas-plan.md +198 -198
- package/{.claude → .tas}/commands/tas-prd.md +3 -3
- package/{.claude → .tas}/commands/tas-review.md +17 -15
- package/{.claude → .tas}/commands/tas-sad.md +3 -3
- package/{.claude → .tas}/commands/tas-security.md +4 -4
- package/{.claude → .tas}/commands/tas-story.md +3 -3
- package/.tas/platforms.json +5 -0
- package/.tas/project-status-example.yaml +17 -17
- package/{.claude/skills/ado-integration/SKILL.md → .tas/rules/ado-integration.md} +5 -15
- package/{.claude/skills/api-design/SKILL.md → .tas/rules/common/api-design.md} +517 -530
- package/{.claude → .tas}/rules/common/code-review.md +30 -6
- package/{.claude/rules/common/post-review-agent.md → .tas/rules/common/post-implementation-review.md} +51 -49
- package/{.claude → .tas}/rules/common/project-status.md +80 -80
- package/{.claude → .tas}/rules/common/stack-detection.md +29 -29
- package/.tas/{checklists → rules/common}/story-done.md +12 -5
- package/{.claude/skills/tas-tdd/SKILL.md → .tas/rules/common/tdd.md} +4 -38
- package/{.claude → .tas}/rules/common/testing.md +3 -8
- package/{.claude → .tas}/rules/common/token-logging.md +36 -27
- package/{.claude → .tas}/rules/csharp/api-testing.md +171 -171
- package/{.claude → .tas}/rules/csharp/coding-style.md +0 -2
- package/{.claude → .tas}/rules/csharp/security.md +10 -0
- package/{.claude → .tas}/rules/python/coding-style.md +0 -2
- package/{.claude → .tas}/rules/typescript/coding-style.md +0 -2
- package/.tas/rules/typescript/patterns.md +142 -0
- package/.tas/rules/typescript/security.md +88 -0
- package/{.claude → .tas}/rules/typescript/testing.md +0 -4
- package/{.claude → .tas}/rules/web/coding-style.md +0 -2
- package/.tas/tas-example.yaml +125 -126
- package/.tas/templates/ADR.md +47 -47
- package/.tas/templates/Bug.md +67 -67
- package/.tas/templates/Design-Spec.md +36 -36
- package/.tas/templates/Epic.md +46 -46
- package/.tas/templates/Feature.md +1 -1
- package/.tas/templates/Security-Report.md +27 -27
- package/.tas/tools/tas-ado-readme.md +169 -169
- package/.tas/tools/tas-ado.py +621 -621
- package/README.md +334 -334
- package/bin/cli.js +91 -73
- package/lib/adapters/antigravity.js +137 -0
- package/lib/adapters/claude-code.js +35 -0
- package/lib/adapters/codex.js +163 -0
- package/lib/adapters/cursor.js +80 -0
- package/lib/adapters/index.js +20 -0
- package/lib/adapters/utils.js +81 -0
- package/lib/deleted-files.json +99 -0
- package/lib/install.js +403 -327
- package/package.json +4 -3
- package/.claude/agents/code-reviewer.md +0 -41
- package/.claude/agents/e2e-runner.md +0 -61
- package/.claude/agents/planner.md +0 -82
- package/.claude/agents/tdd-guide.md +0 -84
- package/.claude/commands/tas-verify.md +0 -51
- package/.claude/rules/typescript/patterns.md +0 -62
- package/.claude/rules/typescript/security.md +0 -28
- package/.claude/settings.local.json +0 -38
- package/.claude/skills/ai-regression-testing/SKILL.md +0 -364
- package/.claude/skills/architecture-decision-records/SKILL.md +0 -184
- package/.claude/skills/benchmark/SKILL.md +0 -98
- package/.claude/skills/browser-qa/SKILL.md +0 -92
- package/.claude/skills/canary-watch/SKILL.md +0 -104
- package/.claude/skills/js-backend-patterns/SKILL.md +0 -603
- package/.claude/skills/tas-conventions/SKILL.md +0 -65
- package/.claude/skills/tas-implementation-complete/SKILL.md +0 -100
- package/.claude/skills/token-logger/SKILL.md +0 -19
- package/.tas/checklists/code-review.md +0 -29
- package/.tas/checklists/security.md +0 -21
- /package/{.claude → .tas}/agents/architect.md +0 -0
- /package/{.claude → .tas}/agents/aws-reviewer.md +0 -0
- /package/{.claude → .tas}/agents/build-resolver.md +0 -0
- /package/{.claude → .tas}/agents/code-explorer.md +0 -0
- /package/{.claude → .tas}/agents/csharp-reviewer.md +0 -0
- /package/{.claude → .tas}/agents/database-reviewer.md +0 -0
- /package/{.claude → .tas}/agents/doc-updater.md +0 -0
- /package/{.claude → .tas}/agents/python-reviewer.md +0 -0
- /package/{.claude → .tas}/agents/security-reviewer.md +0 -0
- /package/{.claude → .tas}/agents/typescript-reviewer.md +0 -0
- /package/{.claude → .tas}/commands/ado-get.md +0 -0
- /package/{.claude → .tas}/commands/ado-status.md +0 -0
- /package/{.claude → .tas}/commands/tas-brainstorm.md +0 -0
- /package/{.claude → .tas}/commands/tas-e2e-mobile.md +0 -0
- /package/{.claude → .tas}/commands/tas-e2e-web.md +0 -0
- /package/{.claude → .tas}/commands/tas-e2e.md +0 -0
- /package/{.claude → .tas}/commands/tas-functest-mobile.md +0 -0
- /package/{.claude → .tas}/commands/tas-functest-web.md +0 -0
- /package/{.claude → .tas}/commands/tas-functest.md +0 -0
- /package/{.claude → .tas}/commands/tas-spec.md +0 -0
- /package/{.claude → .tas}/commands/tas-status.md +0 -0
- /package/{.claude → .tas}/rules/.gitkeep +0 -0
- /package/{.claude → .tas}/rules/common/hooks.md +0 -0
- /package/{.claude → .tas}/rules/common/patterns.md +0 -0
- /package/{.claude → .tas}/rules/common/security.md +0 -0
- /package/{.claude → .tas}/rules/csharp/hooks.md +0 -0
- /package/{.claude → .tas}/rules/csharp/patterns.md +0 -0
- /package/{.claude → .tas}/rules/csharp/testing.md +0 -0
- /package/{.claude → .tas}/rules/python/hooks.md +0 -0
- /package/{.claude → .tas}/rules/python/patterns.md +0 -0
- /package/{.claude → .tas}/rules/python/security.md +0 -0
- /package/{.claude → .tas}/rules/python/testing.md +0 -0
- /package/{.claude → .tas}/rules/typescript/hooks.md +0 -0
- /package/{.claude → .tas}/rules/web/design-quality.md +0 -0
- /package/{.claude → .tas}/rules/web/hooks.md +0 -0
- /package/{.claude → .tas}/rules/web/patterns.md +0 -0
- /package/{.claude → .tas}/rules/web/performance.md +0 -0
- /package/{.claude → .tas}/rules/web/security.md +0 -0
- /package/{.claude → .tas}/rules/web/testing.md +0 -0
- /package/{CLAUDE-Example.md → .tas/templates/AGENTS.md} +0 -0
--- package/.claude/skills/ai-regression-testing/SKILL.md
+++ /dev/null
@@ -1,364 +0,0 @@
----
-name: ai-regression-testing
-description: |
-  Auto-invoke when an AI agent has modified API routes or backend logic, when a bug
-  is found and needs a regression test written, or when running bug-check workflows
-  on AI-generated code. Especially valuable when a sandbox/mock mode exists —
-  enables fast, DB-free API testing to catch sandbox/production path mismatches.
-origin: ECC
-allowed-tools: Read, Write, Edit, Bash, Grep, Glob
----
-
-# AI Regression Testing
-
-Testing patterns specifically designed for AI-assisted development, where the same model writes code and reviews it — creating systematic blind spots that only automated tests can catch.
-
-## When to Activate
-
-- AI agent (Claude Code, Cursor, Codex) has modified API routes or backend logic
-- A bug was found and fixed — need to prevent re-introduction
-- Project has a sandbox/mock mode that can be leveraged for DB-free testing
-- Running `/tas-verify` or post-fix review workflows after code changes
-- Multiple code paths exist (sandbox vs production, feature flags, etc.)
-
-## The Core Problem
-
-When an AI writes code and then reviews its own work, it carries the same assumptions into both steps. This creates a predictable failure pattern:
-
-```
-AI writes fix → AI reviews fix → AI says "looks correct" → Bug still exists
-```
-
-**Real-world example** (observed in production):
-
-```
-Fix 1: Added notification_settings to API response
-  → Forgot to add it to the SELECT query
-  → AI reviewed and missed it (same blind spot)
-
-Fix 2: Added it to SELECT query
-  → TypeScript build error (column not in generated types)
-  → AI reviewed Fix 1 but didn't catch the SELECT issue
-
-Fix 3: Changed to SELECT *
-  → Fixed production path, forgot sandbox path
-  → AI reviewed and missed it AGAIN (4th occurrence)
-
-Fix 4: Test caught it instantly on first run (PASS)
-```
-
-The pattern: **sandbox/production path inconsistency** is the #1 AI-introduced regression.
-
-## Sandbox-Mode API Testing
-
-Most projects with AI-friendly architecture have a sandbox/mock mode. This is the key to fast, DB-free API testing.
-
-### Setup (Vitest + Next.js App Router)
-
-```typescript
-// vitest.config.ts
-import { defineConfig } from "vitest/config";
-import path from "path";
-
-export default defineConfig({
-  test: {
-    environment: "node",
-    globals: true,
-    include: ["__tests__/**/*.test.ts"],
-    setupFiles: ["__tests__/setup.ts"],
-  },
-  resolve: {
-    alias: {
-      "@": path.resolve(__dirname, "."),
-    },
-  },
-});
-```
-
-```typescript
-// __tests__/setup.ts
-// Force sandbox mode — no database needed
-process.env.SANDBOX_MODE = "true";
-process.env.NEXT_PUBLIC_SUPABASE_URL = "";
-process.env.NEXT_PUBLIC_SUPABASE_ANON_KEY = "";
-```
-
-### Test Helper for Next.js API Routes
-
-```typescript
-// __tests__/helpers.ts
-import { NextRequest } from "next/server";
-
-export function createTestRequest(
-  url: string,
-  options?: {
-    method?: string;
-    body?: Record<string, unknown>;
-    headers?: Record<string, string>;
-    sandboxUserId?: string;
-  },
-): NextRequest {
-  const { method = "GET", body, headers = {}, sandboxUserId } = options || {};
-  const fullUrl = url.startsWith("http") ? url : `http://localhost:3000${url}`;
-  const reqHeaders: Record<string, string> = { ...headers };
-
-  if (sandboxUserId) {
-    reqHeaders["x-sandbox-user-id"] = sandboxUserId;
-  }
-
-  const init: { method: string; headers: Record<string, string>; body?: string } = {
-    method,
-    headers: reqHeaders,
-  };
-
-  if (body) {
-    init.body = JSON.stringify(body);
-    reqHeaders["content-type"] = "application/json";
-  }
-
-  return new NextRequest(fullUrl, init);
-}
-
-export async function parseResponse(response: Response) {
-  const json = await response.json();
-  return { status: response.status, json };
-}
-```
-
-### Writing Regression Tests
-
-The key principle: **write tests for bugs that were found, not for code that works**.
-
-```typescript
-// __tests__/api/user/profile.test.ts
-import { describe, it, expect } from "vitest";
-import { createTestRequest, parseResponse } from "../../helpers";
-import { GET, PATCH } from "@/app/api/user/profile/route";
-
-// Define the contract — what fields MUST be in the response
-const REQUIRED_FIELDS = [
-  "id",
-  "email",
-  "full_name",
-  "phone",
-  "role",
-  "created_at",
-  "avatar_url",
-  "notification_settings", // ← Added after bug found it missing
-];
-
-describe("GET /api/user/profile", () => {
-  it("returns all required fields", async () => {
-    const req = createTestRequest("/api/user/profile");
-    const res = await GET(req);
-    const { status, json } = await parseResponse(res);
-
-    expect(status).toBe(200);
-    for (const field of REQUIRED_FIELDS) {
-      expect(json.data).toHaveProperty(field);
-    }
-  });
-
-  // Regression test — this exact bug was introduced by AI 4 times
-  it("notification_settings is not undefined (BUG-R1 regression)", async () => {
-    const req = createTestRequest("/api/user/profile");
-    const res = await GET(req);
-    const { json } = await parseResponse(res);
-
-    expect("notification_settings" in json.data).toBe(true);
-    const ns = json.data.notification_settings;
-    expect(ns === null || typeof ns === "object").toBe(true);
-  });
-});
-```
-
-### Testing Sandbox/Production Parity
-
-The most common AI regression: fixing production path but forgetting sandbox path (or vice versa).
-
-```typescript
-// Test that sandbox responses match the expected contract
-describe("GET /api/user/messages (conversation list)", () => {
-  it("includes partner_name in sandbox mode", async () => {
-    const req = createTestRequest("/api/user/messages", {
-      sandboxUserId: "user-001",
-    });
-    const res = await GET(req);
-    const { json } = await parseResponse(res);
-
-    // This caught a bug where partner_name was added
-    // to production path but not sandbox path
-    if (json.data.length > 0) {
-      for (const conv of json.data) {
-        expect("partner_name" in conv).toBe(true);
-      }
-    }
-  });
-});
-```
-
-## Integrating Tests into Bug-Check Workflow
-
-### Workflow Integration with TAS Kit
-
-Pair with `/tas-bug` for bug tracking and `/tas-verify` for post-fix verification.
-
-```
-User: "/tas-bug" or reports a bug
-  │
-  ├─ Step 1: npm run test
-  │   ├─ FAIL → Bug found mechanically (no AI judgment needed)
-  │   └─ PASS → Continue
-  │
-  ├─ Step 2: npm run build
-  │   ├─ FAIL → Type error found mechanically
-  │   └─ PASS → Continue
-  │
-  ├─ Step 3: AI code review (with known blind spots in mind)
-  │   └─ Findings reported
-  │
-  └─ Step 4: For each fix, write a regression test
-      └─ Next bug-check catches if fix breaks
-```
-
-## Common AI Regression Patterns
-
-### Pattern 1: Sandbox/Production Path Mismatch
-
-**Frequency**: Most common (observed in 3 out of 4 regressions)
-
-```typescript
-// FAIL: AI adds field to production path only
-if (isSandboxMode()) {
-  return { data: { id, email, name } }; // Missing new field
-}
-// Production path
-return { data: { id, email, name, notification_settings } };
-
-// PASS: Both paths must return the same shape
-if (isSandboxMode()) {
-  return { data: { id, email, name, notification_settings: null } };
-}
-return { data: { id, email, name, notification_settings } };
-```
-
-**Test to catch it**:
-
-```typescript
-it("sandbox and production return same fields", async () => {
-  // In test env, sandbox mode is forced ON
-  const res = await GET(createTestRequest("/api/user/profile"));
-  const { json } = await parseResponse(res);
-
-  for (const field of REQUIRED_FIELDS) {
-    expect(json.data).toHaveProperty(field);
-  }
-});
-```
-
-### Pattern 2: SELECT Clause Omission
-
-**Frequency**: Common with Supabase/Prisma when adding new columns
-
-```typescript
-// FAIL: New column added to response but not to SELECT
-const { data } = await supabase
-  .from("users")
-  .select("id, email, name") // notification_settings not here
-  .single();
-
-return { data: { ...data, notification_settings: data.notification_settings } };
-// → notification_settings is always undefined
-
-// PASS: Use SELECT * or explicitly include new columns
-const { data } = await supabase
-  .from("users")
-  .select("*")
-  .single();
-```
-
-### Pattern 3: Error State Leakage
-
-**Frequency**: Moderate — when adding error handling to existing components
-
-```typescript
-// FAIL: Error state set but old data not cleared
-catch (err) {
-  setError("Failed to load");
-  // reservations still shows data from previous tab!
-}
-
-// PASS: Clear related state on error
-catch (err) {
-  setReservations([]); // Clear stale data
-  setError("Failed to load");
-}
-```
-
-### Pattern 4: Optimistic Update Without Proper Rollback
-
-```typescript
-// FAIL: No rollback on failure
-const handleRemove = async (id: string) => {
-  setItems(prev => prev.filter(i => i.id !== id));
-  await fetch(`/api/items/${id}`, { method: "DELETE" });
-  // If API fails, item is gone from UI but still in DB
-};
-
-// PASS: Capture previous state and rollback on failure
-const handleRemove = async (id: string) => {
-  const prevItems = [...items];
-  setItems(prev => prev.filter(i => i.id !== id));
-  try {
-    const res = await fetch(`/api/items/${id}`, { method: "DELETE" });
-    if (!res.ok) throw new Error("API error");
-  } catch {
-    setItems(prevItems); // Rollback
-    alert("削除に失敗しました"); // Japanese for "Failed to delete"
-  }
-};
-```
-
-## Strategy: Test Where Bugs Were Found
-
-Don't aim for 100% coverage. Instead:
-
-```
-Bug found in /api/user/profile     → Write test for profile API
-Bug found in /api/user/messages    → Write test for messages API
-Bug found in /api/user/favorites   → Write test for favorites API
-No bug in /api/user/notifications  → Don't write test (yet)
-```
-
-**Why this works with AI development:**
-
-1. AI tends to make the **same category of mistake** repeatedly
-2. Bugs cluster in complex areas (auth, multi-path logic, state management)
-3. Once tested, that exact regression **cannot happen again**
-4. Test count grows organically with bug fixes — no wasted effort
-
-## Quick Reference
-
-| AI Regression Pattern | Test Strategy | Priority |
-|---|---|---|
-| Sandbox/production mismatch | Assert same response shape in sandbox mode | High |
-| SELECT clause omission | Assert all required fields in response | High |
-| Error state leakage | Assert state cleanup on error | Medium |
-| Missing rollback | Assert state restored on API failure | Medium |
-| Type cast masking null | Assert field is not undefined | Medium |
-
-## DO / DON'T
-
-**DO:**
-- Write tests immediately after finding a bug (before fixing it if possible)
-- Test the API response shape, not the implementation
-- Run tests as the first step of every bug-check
-- Keep tests fast (< 1 second total with sandbox mode)
-- Name tests after the bug they prevent (e.g., "BUG-R1 regression")
-
-**DON'T:**
-- Write tests for code that has never had a bug
-- Trust AI self-review as a substitute for automated tests
-- Skip sandbox path testing because "it's just mock data"
-- Write integration tests when unit tests suffice
-- Aim for coverage percentage — aim for regression prevention
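Note on the removed ai-regression-testing skill: its quick reference asks for an assertion that sandbox and production return the same response shape, but the deleted tests only check the sandbox payload against a fixed field list. A minimal sketch of a direct shape comparison follows. The `shapeMismatch` helper, the `Payload` type, and the example payloads are hypothetical and do not appear in the package.

```typescript
// Hypothetical helper (not part of the package): lists top-level keys that one
// payload has and the other lacks.
type Payload = Record<string, unknown>;

function shapeMismatch(
  sandbox: Payload,
  production: Payload,
): { missingInSandbox: string[]; missingInProduction: string[] } {
  const has = (obj: Payload, key: string): boolean =>
    Object.prototype.hasOwnProperty.call(obj, key);
  return {
    missingInSandbox: Object.keys(production).filter((k) => !has(sandbox, k)).sort(),
    missingInProduction: Object.keys(sandbox).filter((k) => !has(production, k)).sort(),
  };
}

// Example payloads shaped like Pattern 1 in the removed skill.
const sandboxPayload: Payload = { id: "u1", email: "a@example.com", name: "A" };
const productionPayload: Payload = {
  id: "u1",
  email: "a@example.com",
  name: "A",
  notification_settings: null,
};

const mismatch = shapeMismatch(sandboxPayload, productionPayload);
console.log(mismatch.missingInSandbox); // lists "notification_settings"
console.log(mismatch.missingInProduction); // empty list
```

A key whose value is `null` still counts as present, which matches the removed skill's rule that the sandbox path may return `notification_settings: null` as long as the field exists.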
--- package/.claude/skills/architecture-decision-records/SKILL.md
+++ /dev/null
@@ -1,184 +0,0 @@
----
-name: architecture-decision-records
-description: |
-  Auto-invoke when user says "ADR this", "record this decision", "we decided to use X over Y",
-  or "the reason we're doing X instead of Y is...". Also when choosing between significant
-  alternatives (framework, database, pattern, API design) and reaching a conclusion.
-  When user asks "why did we choose X?" — read existing ADRs in docs/adr/.
-origin: ECC
-allowed-tools: Read, Write, Bash, Glob
----
-
-# Architecture Decision Records
-
-Capture architectural decisions as they happen during coding sessions. Instead of decisions living only in Slack threads, PR comments, or someone's memory, this skill produces structured ADR documents that live alongside the code.
-
-## When to Activate
-
-- User explicitly says "let's record this decision" or "ADR this"
-- User chooses between significant alternatives (framework, library, pattern, database, API design)
-- User says "we decided to..." or "the reason we're doing X instead of Y is..."
-- User asks "why did we choose X?" (read existing ADRs)
-- During planning phases when architectural trade-offs are discussed
-
-## ADR Format
-
-Use the lightweight ADR format proposed by Michael Nygard, adapted for AI-assisted development:
-
-```markdown
-# ADR-NNNN: [Decision Title]
-
-**Date**: YYYY-MM-DD
-**Status**: proposed | accepted | deprecated | superseded by ADR-NNNN
-**Deciders**: [who was involved]
-
-## Context
-
-What is the issue that we're seeing that is motivating this decision or change?
-
-[2-5 sentences describing the situation, constraints, and forces at play]
-
-## Decision
-
-What is the change that we're proposing and/or doing?
-
-[1-3 sentences stating the decision clearly]
-
-## Alternatives Considered
-
-### Alternative 1: [Name]
-- **Pros**: [benefits]
-- **Cons**: [drawbacks]
-- **Why not**: [specific reason this was rejected]
-
-### Alternative 2: [Name]
-- **Pros**: [benefits]
-- **Cons**: [drawbacks]
-- **Why not**: [specific reason this was rejected]
-
-## Consequences
-
-What becomes easier or more difficult to do because of this change?
-
-### Positive
-- [benefit 1]
-- [benefit 2]
-
-### Negative
-- [trade-off 1]
-- [trade-off 2]
-
-### Risks
-- [risk and mitigation]
-```
-
-## Workflow
-
-### Capturing a New ADR
-
-When a decision moment is detected:
-
-1. **Initialize (first time only)** — if `docs/adr/` does not exist, ask the user for confirmation before creating the directory, a `README.md` seeded with the index table header (see ADR Index Format below), and a blank `template.md` for manual use. Do not create files without explicit consent.
-2. **Identify the decision** — extract the core architectural choice being made
-3. **Gather context** — what problem prompted this? What constraints exist?
-4. **Document alternatives** — what other options were considered? Why were they rejected?
-5. **State consequences** — what are the trade-offs? What becomes easier/harder?
-6. **Assign a number** — scan existing ADRs in `docs/adr/` and increment
-7. **Confirm and write** — present the draft ADR to the user for review. Only write to `docs/adr/NNNN-decision-title.md` after explicit approval. If the user declines, discard the draft without writing any files.
-8. **Update the index** — append to `docs/adr/README.md`
-
-### Reading Existing ADRs
-
-When a user asks "why did we choose X?":
-
-1. Check if `docs/adr/` exists — if not, respond: "No ADRs found in this project. Would you like to start recording architectural decisions?"
-2. If it exists, scan `docs/adr/README.md` index for relevant entries
-3. Read matching ADR files and present the Context and Decision sections
-4. If no match is found, respond: "No ADR found for that decision. Would you like to record one now?"
-
-### ADR Directory Structure
-
-```
-docs/
-└── adr/
-    ├── README.md              ← index of all ADRs
-    ├── 0001-use-nextjs.md
-    ├── 0002-postgres-over-mongo.md
-    ├── 0003-rest-over-graphql.md
-    └── template.md            ← blank template for manual use
-```
-
-### ADR Index Format
-
-```markdown
-# Architecture Decision Records
-
-| ADR | Title | Status | Date |
-|-----|-------|--------|------|
-| [0001](0001-use-nextjs.md) | Use Next.js as frontend framework | accepted | 2026-01-15 |
-| [0002](0002-postgres-over-mongo.md) | PostgreSQL over MongoDB for primary datastore | accepted | 2026-01-20 |
-| [0003](0003-rest-over-graphql.md) | REST API over GraphQL | accepted | 2026-02-01 |
-```
-
-## Decision Detection Signals
-
-Watch for these patterns in conversation that indicate an architectural decision:
-
-**Explicit signals**
-- "Let's go with X"
-- "We should use X instead of Y"
-- "The trade-off is worth it because..."
-- "Record this as an ADR"
-
-**Implicit signals** (suggest recording an ADR — do not auto-create without user confirmation)
-- Comparing two frameworks or libraries and reaching a conclusion
-- Making a database schema design choice with stated rationale
-- Choosing between architectural patterns (monolith vs microservices, REST vs GraphQL)
-- Deciding on authentication/authorization strategy
-- Selecting deployment infrastructure after evaluating alternatives
-
-## What Makes a Good ADR
-
-### Do
-- **Be specific** — "Use Prisma ORM" not "use an ORM"
-- **Record the why** — the rationale matters more than the what
-- **Include rejected alternatives** — future developers need to know what was considered
-- **State consequences honestly** — every decision has trade-offs
-- **Keep it short** — an ADR should be readable in 2 minutes
-- **Use present tense** — "We use X" not "We will use X"
-
-### Don't
-- Record trivial decisions — variable naming or formatting choices don't need ADRs
-- Write essays — if the context section exceeds 10 lines, it's too long
-- Omit alternatives — "we just picked it" is not a valid rationale
-- Backfill without marking it — if recording a past decision, note the original date
-- Let ADRs go stale — superseded decisions should reference their replacement
-
-## ADR Lifecycle
-
-```
-proposed → accepted → [deprecated | superseded by ADR-NNNN]
-```
-
-- **proposed**: decision is under discussion, not yet committed
-- **accepted**: decision is in effect and being followed
-- **deprecated**: decision is no longer relevant (e.g., feature removed)
-- **superseded**: a newer ADR replaces this one (always link the replacement)
-
-## Categories of Decisions Worth Recording
-
-| Category | Examples |
-|----------|---------|
-| **Technology choices** | Framework, language, database, cloud provider |
-| **Architecture patterns** | Monolith vs microservices, event-driven, CQRS |
-| **API design** | REST vs GraphQL, versioning strategy, auth mechanism |
-| **Data modeling** | Schema design, normalization decisions, caching strategy |
-| **Infrastructure** | Deployment model, CI/CD pipeline, monitoring stack |
-| **Security** | Auth strategy, encryption approach, secret management |
-| **Testing** | Test framework, coverage targets, E2E vs integration balance |
-| **Process** | Branching strategy, review process, release cadence |
-
-## Integration with Other Skills
-
-- **Planner agent**: when the planner proposes architecture changes, suggest creating an ADR
-- **Code reviewer agent**: flag PRs that introduce architectural changes without a corresponding ADR
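Note on the removed architecture-decision-records skill: the "Assign a number" step says to scan existing ADRs in `docs/adr/` and increment, without saying how. One possible reading is sketched below, assuming file names follow the `NNNN-decision-title.md` pattern from the directory listing. The helpers `nextAdrNumber` and `adrFileName` are hypothetical and do not appear in the package.

```typescript
// Hypothetical helper: derives the next ADR number from the file names in docs/adr/.
// Files that do not match NNNN-title.md (README.md, template.md) are ignored.
function nextAdrNumber(fileNames: string[]): string {
  let max = 0;
  for (const name of fileNames) {
    const match = /^(\d{4})-.+\.md$/.exec(name);
    if (match) {
      max = Math.max(max, Number(match[1]));
    }
  }
  const digits = String(max + 1);
  // Left-pad to four digits without relying on String.prototype.padStart.
  return digits.length >= 4 ? digits : "0000".slice(digits.length) + digits;
}

// Hypothetical helper: builds the target file name from a number and a decision title.
function adrFileName(adrNumber: string, title: string): string {
  const slug = title
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
  return adrNumber + "-" + slug + ".md";
}

const existing = [
  "README.md",
  "0001-use-nextjs.md",
  "0002-postgres-over-mongo.md",
  "0003-rest-over-graphql.md",
  "template.md",
];
const adrNumber = nextAdrNumber(existing);
console.log(adrNumber); // "0004"
console.log(adrFileName(adrNumber, "Use Prisma ORM")); // "0004-use-prisma-orm.md"
```

Taking the maximum rather than counting files keeps numbering stable if an ADR file is ever removed, which seems consistent with the skill's rule that superseded ADRs stay in place and link to their replacement.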
--- package/.claude/skills/benchmark/SKILL.md
+++ /dev/null
@@ -1,98 +0,0 @@
----
-name: benchmark
-description: |
-  Auto-invoke when measuring performance impact of a PR, setting up performance baselines,
-  investigating "feels slow" or "it's getting slower" reports, comparing stack alternatives,
-  or validating Core Web Vitals before a launch. Requires browser MCP or direct API access
-  to target environment. SKIP if neither is available — note the gap to user.
-origin: ECC
-allowed-tools: Read, Bash
----
-
-# Benchmark — Performance Baseline & Regression Detection
-
-## When to Use
-
-- Before and after a PR to measure performance impact
-- Setting up performance baselines for a project
-- When users report "it feels slow"
-- Before a launch — ensure you meet performance targets
-- Comparing your stack against alternatives
-
-## How It Works
-
-### Mode 1: Page Performance
-
-Measures real browser metrics via browser MCP:
-
-```
-1. Navigate to each target URL
-2. Measure Core Web Vitals:
-   - LCP (Largest Contentful Paint) — target < 2.5s
-   - CLS (Cumulative Layout Shift) — target < 0.1
-   - INP (Interaction to Next Paint) — target < 200ms
-   - FCP (First Contentful Paint) — target < 1.8s
-   - TTFB (Time to First Byte) — target < 800ms
-3. Measure resource sizes:
-   - Total page weight (target < 1MB)
-   - JS bundle size (target < 200KB gzipped)
-   - CSS size
-   - Image weight
-   - Third-party script weight
-4. Count network requests
-5. Check for render-blocking resources
-```
-
-### Mode 2: API Performance
-
-Benchmarks API endpoints:
-
-```
-1. Hit each endpoint 100 times
-2. Measure: p50, p95, p99 latency
-3. Track: response size, status codes
-4. Test under load: 10 concurrent requests
-5. Compare against SLA targets
-```
-
-### Mode 3: Build Performance
-
-Measures development feedback loop:
-
-```
-1. Cold build time
-2. Hot reload time (HMR)
-3. Test suite duration
-4. TypeScript check time
-5. Lint time
-6. Docker build time
-```
-
-### Mode 4: Before/After Comparison
-
-Run before and after a change to measure impact:
-
-```
-/benchmark baseline    # saves current metrics
-# ... make changes ...
-/benchmark compare     # compares against baseline
-```
-
-Output:
-```
-| Metric | Before | After | Delta | Verdict |
-|--------|--------|-------|-------|---------|
-| LCP | 1.2s | 1.4s | +200ms | WARN |
-| Bundle | 180KB | 175KB | -5KB | ✓ BETTER |
-| Build | 12s | 14s | +2s | WARN |
-```
-
-## Output
-
-Stores baselines in `docs/benchmarks/` as JSON. Git-tracked so the team shares baselines.
-
-## Integration
-
-- CI: run `/benchmark compare` on every PR
-- Pair with `/canary-watch` for post-deploy monitoring
-- Pair with `/browser-qa` for full pre-ship checklist
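Note on the removed benchmark skill: Mode 2 asks for p50, p95 and p99 latency over 100 requests but does not say how the percentiles are computed. The sketch below uses the nearest-rank method, which is an assumption rather than something the skill specifies. The `percentile` function and the sample values are hypothetical.

```typescript
// Nearest-rank percentile: the smallest sample such that at least p percent
// of all samples are less than or equal to it.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) {
    throw new Error("percentile requires at least one sample");
  }
  if (p <= 0 || p > 100) {
    throw new Error("p must be in the range (0, 100]");
  }
  const sorted = samples.slice().sort((a, b) => a - b);
  // Multiply before dividing so integer inputs avoid floating-point drift.
  const rank = Math.ceil((p * sorted.length) / 100); // 1-based rank
  const value = sorted[rank - 1];
  if (value === undefined) {
    throw new Error("rank out of range");
  }
  return value;
}

// Hypothetical latencies in milliseconds for five requests to one endpoint.
const latenciesMs = [120, 80, 100, 300, 90];
console.log(percentile(latenciesMs, 50)); // 100
console.log(percentile(latenciesMs, 95)); // 300
```

With only a handful of samples the upper percentiles collapse onto the slowest request, which is one reason the skill's figure of 100 requests per endpoint is a sensible minimum for reporting p99.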