@simplysm/claude 13.0.26 → 13.0.28
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/claude/skills/sd-check/SKILL.md +139 -77
- package/claude/skills/sd-check/baseline-analysis.md +129 -0
- package/claude/skills/sd-check/test-scenarios.md +172 -0
- package/claude/skills/sd-debug/SKILL.md +296 -0
- package/claude/skills/sd-debug/condition-based-waiting-example.ts +158 -0
- package/claude/skills/sd-debug/condition-based-waiting.md +115 -0
- package/claude/skills/sd-debug/defense-in-depth.md +122 -0
- package/claude/skills/sd-debug/find-polluter.sh +58 -0
- package/claude/skills/sd-debug/root-cause-tracing.md +169 -0
- package/claude/skills/sd-debug/test-baseline-pressure.md +59 -0
- package/claude/skills/sd-use/SKILL.md +1 -0
- package/package.json +1 -1
@@ -1,143 +1,205 @@
 ---
 name: sd-check
-description:
-argument-hint: "[path]"
-model: opus
+description: Use when verifying code quality via typecheck, lint, and tests - before deployment, PR creation, after code changes, or when type errors, lint violations, or test failures are suspected. Applies to whole project or specific paths.
 ---

-
+# sd-check

-
-- `/sd-check packages/core-common` — verify a specific path only
+Verify code quality through parallel execution of typecheck, lint, and test checks.

-
+## Overview

-
+**This skill provides EXACT STEPS you MUST follow - it is NOT a command to invoke.**

-
-Run these checks **in parallel** and report results before proceeding.
+**Foundational Principle:** Violating the letter of these steps is violating the spirit of verification.

-
+When the user asks to verify code, YOU will manually execute **EXACTLY THESE 4 STEPS** (no more, no less):

-
-
+**Step 1:** Environment Pre-check (4 checks in parallel)
+**Step 2:** Launch 3 haiku agents in parallel (typecheck, lint, test ONLY)
+**Step 3:** Collect results, fix errors in priority order
+**Step 4:** Re-verify (go back to Step 2) until all pass

-
+**Core principle:** Always re-run ALL checks after any fix - changes can cascade.

-
+**CRITICAL:**
+- This skill verifies ONLY typecheck, lint, and test
+- **NO BUILD. NO DEV SERVER. NO TEAMS. NO TASK LISTS.**
+- Do NOT create your own "better" workflow - follow these 4 steps EXACTLY

-
+## Usage

-
-
-```
+- `/sd-check` — verify entire project
+- `/sd-check packages/core-common` — verify specific path only

-
+**Default:** If no path argument provided, verify entire project.

-
+## Quick Reference

-
+| Check | Command | Agent Model | Purpose |
+|-------|---------|-------------|---------|
+| Typecheck | `pnpm typecheck [path]` | haiku | Type errors |
+| Lint | `pnpm lint --fix [path]` | haiku | Code quality |
+| Test | `pnpm vitest [path] --run` | haiku | Functionality |

-
-- `lint`
+**All 3 run in PARALLEL** (separate haiku agents, single message)

-
+## Workflow

-###
+### Step 1: Environment Pre-check

-
+Before ANY verification, confirm environment setup with these checks **in parallel**:

-
-
-```
+1. **Root package.json version** - Read `package.json`, verify major version is `13` (e.g., `13.x.x`)
+   - If not 13: STOP, report "This skill requires simplysm v13. Current: {version}"

-
+2. **pnpm workspace** - Verify `pnpm-workspace.yaml` and `pnpm-lock.yaml` exist
+   - Command: `ls pnpm-workspace.yaml pnpm-lock.yaml`
+   - If missing: STOP, report to user

-
-
-If all pre-checks pass, report "Environment OK" and proceed to code verification.
+3. **package.json scripts** - Read root `package.json`, confirm `typecheck` and `lint` scripts defined
+   - If missing: STOP, report to user

-
+4. **Vitest config** - Verify `vitest.config.ts` exists
+   - Command: `ls vitest.config.ts`
+   - If missing: STOP, report to user

-
-Repeat until all checks pass.
+**If all pass:** Report "Environment OK", proceed to Step 2.

-### Step
+### Step 2: Launch 3 Haiku Agents in Parallel

-Launch 3
+Launch ALL 3 agents in a **single message** using Task tool.

-**
+**Replace `[path]` with user's argument, or OMIT if no argument (defaults to full project).**

 **Agent 1 - Typecheck:**
 ```
-Task tool
+Task tool:
 subagent_type: Bash
 model: haiku
 description: "Run typecheck"
-prompt: "Run `pnpm typecheck [path]` and return
+prompt: "Run `pnpm typecheck [path]` and return full output. Do NOT analyze or fix - just report raw output."
 ```

 **Agent 2 - Lint:**
 ```
-Task tool
+Task tool:
 subagent_type: Bash
 model: haiku
 description: "Run lint with auto-fix"
-prompt: "Run `pnpm lint --fix [path]` and return
+prompt: "Run `pnpm lint --fix [path]` and return full output. Do NOT analyze or fix - just report raw output."
 ```

 **Agent 3 - Test:**
 ```
-Task tool
+Task tool:
 subagent_type: Bash
 model: haiku
 description: "Run tests"
-prompt: "Run `pnpm vitest [path] --run` and return
+prompt: "Run `pnpm vitest [path] --run` and return full output. Do NOT analyze or fix - just report raw output."
 ```

-### Step
+### Step 3: Collect Results and Fix Errors
+
+Wait for ALL 3 agents. Collect outputs.
+
+**If all checks passed:** Complete (see Completion Criteria).
+
+**If any errors found:**

-
+1. **Analyze by priority:** Typecheck → Lint → Test
+   - Typecheck errors may cause lint/test errors (cascade)

-
+2. **Read failing files** to identify root cause

-
-- Typecheck
-
-
-
-
-
--
-- If intentional changes not reflected in tests: Update test code
-- If source code bug: Fix source code
-4. Proceed to Step 3
+3. **Fix with Edit:**
+   - **Typecheck:** Fix type issues
+   - **Lint:** Fix code quality (most auto-fixed by `--fix`)
+   - **Test:**
+     - Run `git diff` to check intentional changes
+     - If changes not reflected in tests: Update test
+     - If source bug: Fix source
+   - **If root cause unclear OR 2-3 fix attempts failed:** Recommend `/sd-debug`

-
+4. **Proceed to Step 4**

-### Step
+### Step 4: Re-verify (Loop Until All Pass)

-
-
+**CRITICAL:** After ANY fix, re-run ALL 3 checks.
+
+Go back to Step 2 and launch 3 haiku agents again.
+
+**Do NOT assume:** "I only fixed typecheck → skip lint/test". Fixes cascade.
+
+Repeat Steps 2-4 until all 3 checks pass.

 ## Common Mistakes

-### Running checks sequentially
-
-
+### ❌ Running checks sequentially
+**Wrong:** Launch agent 1, wait → agent 2, wait → agent 3
+**Right:** Launch ALL 3 in single message (parallel Task calls)
+
+### ❌ Fixing before collecting all results
+**Wrong:** Agent 1 returns error → fix immediately → re-verify
+**Right:** Wait for all 3 → collect all errors → fix in priority order → re-verify
+
+### ❌ Skipping re-verification after fixes
+**Wrong:** Fix typecheck → assume lint/test still pass
+**Right:** ALWAYS re-run all 3 checks after any fix
+
+### ❌ Using wrong model
+**Wrong:** `model: opus` or `model: sonnet` for verification agents
+**Right:** `model: haiku` (cheaper, faster for command execution)
+
+### ❌ Including build/dev steps
+**Wrong:** Run `pnpm build` or `pnpm dev` as part of verification
+**Right:** sd-check is ONLY typecheck, lint, test (no build, no dev)
+
+### ❌ Asking user for path
+**Wrong:** No path provided → ask "which package?"
+**Right:** No path → verify entire project (omit path in commands)
+
+### ❌ Infinite fix loop
+**Wrong:** Keep trying same fix when tests fail repeatedly
+**Right:** After 2-3 failed attempts → recommend `/sd-debug`
+
+## Red Flags - STOP and Follow Workflow

-
-❌ **Wrong**: Agent 1 returns error → fix immediately → launch agents again
-✅ **Right**: Wait for all 3 agents → collect all errors → fix in priority order → re-verify
+If you find yourself doing ANY of these, you're violating the skill:

-
-
-
+- Treating sd-check as a command to invoke (`Skill: sd-check Args: ...`)
+- Including build or dev server in verification
+- Running agents sequentially instead of parallel
+- Not re-verifying after every fix
+- Asking user for path when none provided
+- Continuing past 2-3 failed fix attempts without recommending `/sd-debug`
+- Spawning 4+ agents (only 3: typecheck, lint, test)

-
-❌ **Wrong**: `model: opus` or `model: sonnet` for verification agents
-✅ **Right**: `model: haiku` for command execution (cheaper, faster)
+**All of these violate the skill's core principles. Go back to Step 1 and follow the workflow exactly.**

 ## Completion Criteria

-Complete when
+**Complete when:**
+- All 3 checks (typecheck, lint, test) pass without errors
+- Report: "All checks passed - code verified"
+
+**Do NOT complete if:**
+- Any check has errors
+- Haven't re-verified after a fix
+- Environment pre-checks failed
+
+## Rationalization Table
+
+| Excuse | Reality |
+|--------|---------|
+| "I'm following the spirit, not the letter" | Violating the letter IS violating the spirit - follow EXACTLY |
+| "I'll create a better workflow with teams/tasks" | Follow the 4 steps EXACTLY - no teams, no task lists |
+| "I'll split tests into multiple agents" | Only 3 agents total: typecheck, lint, test |
+| "Stratified parallel is faster" | Run ALL 3 in parallel via separate agents - truly parallel |
+| "I only fixed lint, typecheck still passes" | Always re-verify ALL - fixes can cascade |
+| "Build is part of verification" | Build is deployment, not verification - NEVER include it |
+| "Let me ask which path to check" | Default to full project - explicit behavior |
+| "I'll try one more fix approach" | After 2-3 attempts → recommend /sd-debug |
+| "Tests are independent of types" | Type fixes affect tests - always re-run ALL |
+| "I'll invoke sd-check skill with args" | sd-check is EXACT STEPS, not a command |
+| "4 agents: typecheck, lint, test, build" | Only 3 agents - build is FORBIDDEN |
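The four Step 1 environment pre-checks added in this version can be sketched as a plain shell script. This is a hypothetical illustration only: the skill performs these checks via tool calls, and the fixture project created below is invented so the sketch is self-contained and runnable.

```shell
#!/usr/bin/env bash
# Sketch of the four sd-check environment pre-checks (illustration, not the skill's code).
set -eu

# Fixture: a minimal simplysm-v13-style workspace, invented for self-containment.
dir=$(mktemp -d)
cd "$dir"
printf '%s\n' '{"version": "13.0.28", "scripts": {"typecheck": "tsc -b", "lint": "eslint ."}}' > package.json
touch pnpm-workspace.yaml pnpm-lock.yaml vitest.config.ts

# Check 1: root package.json major version must be 13.
major=$(sed -n 's/.*"version": *"\([0-9][0-9]*\)\..*/\1/p' package.json)
if [ "$major" != "13" ]; then
  echo "This skill requires simplysm v13. Current: $major"
  exit 1
fi

# Checks 2 and 4: pnpm workspace files and vitest config must exist (STOP if not).
ls pnpm-workspace.yaml pnpm-lock.yaml vitest.config.ts > /dev/null

# Check 3: typecheck and lint scripts must be defined in root package.json.
grep -q '"typecheck"' package.json
grep -q '"lint"' package.json

echo "Environment OK"
```

Any failing check exits non-zero before the final report, mirroring the skill's "STOP, report to user" behavior.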
@@ -0,0 +1,129 @@
+# Baseline Test Analysis - sd-check Skill
+
+## Summary
+
+Tested 6 scenarios with agents WITHOUT sd-check skill. All agents failed to follow optimal verification patterns.
+
+## Common Failures Across All Scenarios
+
+### 1. No Cost Optimization
+**Failure:** All agents planned direct command execution instead of using haiku subagents.
+
+**Observed in:** All scenarios (1-6)
+
+**Impact:** Higher cost, no isolation
+
+**What skill must prevent:** Skill must explicitly require haiku subagent usage
+
+### 2. Incomplete Parallelization
+**Failure:** Agents either ran sequentially or only partially parallelized.
+
+**Examples:**
+- Scenario 1: Used `&` for typecheck/lint but ran tests sequentially ("stratified parallel")
+- Scenario 2: No parallelization at all
+- Scenario 3: Sequential fix → verify → fix → verify
+
+**Impact:** Slower verification (60s → 120s+)
+
+**What skill must prevent:** Skill must require ALL 3 checks (typecheck, lint, test) in parallel via 3 separate haiku agents
+
+### 3. Missing Environment Pre-checks
+**Failure:** No systematic environment validation before running checks.
+
+**Observed:**
+- Scenario 1: Checked Docker for ORM tests, but not other prerequisites
+- Scenario 6: Only checked pnpm-lock.yaml, missed package.json version, scripts, vitest.config.ts
+
+**Impact:** Confusing errors if environment misconfigured
+
+**What skill must prevent:** Skill must require 4 pre-checks (package.json v13, pnpm workspace, scripts, vitest config)
+
+### 4. Unclear Re-verification Loop
+**Failure:** After fixing errors, no clear "re-run ALL checks" loop.
+
+**Examples:**
+- Scenario 3: Phase 1 verify → Phase 2 verify → Phase 3 verify (but no final "all phases" re-verify)
+- Agents treated it as linear progression, not a loop
+
+**Impact:** Fixes in one area may break another (cascade errors)
+
+**What skill must prevent:** Skill must explicitly state "re-run ALL 3 checks until ALL pass"
+
+### 5. No sd-debug Recommendation
+**Failure:** When root cause unclear after multiple attempts, agents didn't recommend sd-debug.
+
+**Observed:**
+- Scenario 4: After 4 failed attempts, agent suggested various debugging approaches but NOT `/sd-debug` skill
+
+**Impact:** User wastes time when systematic root-cause investigation needed
+
+**What skill must prevent:** Skill must state "after 2-3 failed fix attempts → recommend /sd-debug"
+
+### 6. Incorrect Default Behavior
+**Failure:** When no path argument provided, agents asked user for clarification instead of defaulting to full project.
+
+**Observed:**
+- Scenario 5: Agent wanted to ask "which package?" instead of running on entire project
+
+**Impact:** Unnecessary user friction
+
+**What skill must prevent:** Skill must state "if no path argument → run on entire project (omit path in commands)"
+
+### 7. Scope Creep (Unnecessary Steps)
+**Failure:** Agents included steps not relevant to "verification".
+
+**Examples:**
+- Scenario 1: Included `pnpm build` (verification doesn't need build)
+- Scenario 2: Included dev server test (not verification)
+
+**Impact:** Wasted time, confusion about scope
+
+**What skill must prevent:** Skill must clarify scope: typecheck, lint, test ONLY (no build, no dev)
+
+## Rationalization Patterns (Verbatim)
+
+### "Parallelization while maintaining logical dependencies"
+- Used to justify partial parallelization
+- Agents ran typecheck & lint in parallel, but tests sequentially
+- **Counter:** ALL 3 checks are independent → all 3 in parallel
+
+### "Stratified parallel execution"
+- Used to justify sequential test runs grouped by environment
+- **Counter:** Vitest projects are independent → run all via single command
+
+### "Faster to fail fast on static checks"
+- Good principle, but used to justify including build step
+- **Counter:** Build is not a static check, and not required for verification
+
+### "Type safety first" / "Incremental verification"
+- Used to justify Phase 1 → Phase 2 → Phase 3 linear progression
+- **Counter:** After fixes, must re-verify ALL phases (loop), not just next phase
+
+### "Understanding first, then ONE comprehensive fix"
+- Used to justify continued debugging without tools
+- **Counter:** After 2-3 attempts, recommend /sd-debug for systematic investigation
+
+### "Ask for clarification" / "Explicit and predictable"
+- Used to justify asking user for path when none provided
+- **Counter:** Default to full project is explicit and predictable behavior
+
+## Success Criteria for Skill
+
+Skill is effective if agents:
+1. ✅ Launch 3 haiku agents in parallel (typecheck, lint, test)
+2. ✅ Run environment pre-checks before verification
+3. ✅ Default to full project when no path argument
+4. ✅ Fix errors in priority order (typecheck → lint → test)
+5. ✅ Re-run ALL 3 checks after any fix (loop until all pass)
+6. ✅ Recommend /sd-debug after 2-3 failed fix attempts
+7. ✅ Do NOT include build or dev server steps
+
+## Test Scenarios for GREEN Phase
+
+After writing skill, re-run scenarios 1-6. Agents should now exhibit correct behavior above.
+
+Focus on:
+- Scenario 1: Verify parallel haiku agents + env checks
+- Scenario 3: Verify re-verification loop + priority
+- Scenario 4: Verify sd-debug recommendation
+- Scenario 5: Verify default to full project
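The baseline analysis faults agents for "stratified" parallelism. Its underlying point - typecheck, lint, and test are independent jobs, so all three can start at once and results are collected before acting on any single failure - can be sketched in plain shell. This is an illustration only: the skill itself mandates three parallel haiku Task agents rather than `&`, and `run_check` is a made-up stand-in for one verification command.

```shell
#!/usr/bin/env bash
# Illustration of "all 3 checks in parallel": start every independent job at
# once, then collect ALL results before fixing anything.
run_check() { sleep 0.2; echo "$1: done"; }  # hypothetical stand-in

run_check typecheck & pid_a=$!
run_check lint      & pid_b=$!
run_check test      & pid_c=$!

# Wait for every job - do not fix on the first failure seen.
wait "$pid_a"; wait "$pid_b"; wait "$pid_c"
overall=$?
echo "all three checks collected"
```

With independent jobs, wall time is bounded by the slowest check rather than the sum of all three, which is the speedup the analysis attributes to full parallelization.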
@@ -0,0 +1,172 @@
+# sd-check Pressure Test Scenarios
+
+## Scenario 1: Basic Application - Full Project Check (Time Pressure)
+
+**Setup:**
+- Simulated project with typecheck, lint, test configured
+- No existing errors
+
+**Pressure:**
+- Time constraint: "Need results quickly for deployment"
+
+**Agent Prompt:**
+```
+I need to verify the entire simplysm project before deployment. Can you run all checks? We need to deploy soon, so please be fast.
+```
+
+**Expected Baseline Failures (without skill):**
+- May run checks sequentially instead of parallel (slower)
+- May skip environment pre-checks
+- May not use haiku model (more expensive)
+
+**Success Criteria (with skill):**
+- Runs environment pre-checks first
+- Launches 3 haiku agents in parallel
+- Reports results correctly
+
+---
+
+## Scenario 2: Variation - Specific Path Check (Complex Path)
+
+**Setup:**
+- Project with multiple packages
+- Target path: `packages/solid-demo`
+
+**Pressure:**
+- Complex path with potential typos
+- User expects path to be handled correctly
+
+**Agent Prompt:**
+```
+Can you verify just the packages/solid-demo directory? I only changed files there.
+```
+
+**Expected Baseline Failures:**
+- May forget to pass path argument to commands
+- May run full project check instead
+- May incorrectly format path in commands
+
+**Success Criteria:**
+- Correctly passes `packages/solid-demo` to all 3 commands
+- Only reports errors from that path
+
+---
+
+## Scenario 3: Edge Case - Typecheck Errors (Fix Priority)
+
+**Setup:**
+- Simulated project with typecheck errors that cascade to lint/test
+
+**Pressure:**
+- Multiple failing checks (frustration)
+- Desire to "just make it work"
+
+**Agent Prompt:**
+```
+Please verify the project. (Note: project has typecheck errors that cause lint and test failures)
+```
+
+**Expected Baseline Failures:**
+- May fix lint or test errors first (wrong priority)
+- May not understand cascade relationship
+- May fix all errors simultaneously without priority
+
+**Success Criteria:**
+- Fixes typecheck errors first
+- Recognizes cascade relationship
+- Re-verifies after each fix round
+
+---
+
+## Scenario 4: Edge Case - Repeated Failures (Loop Exit)
+
+**Setup:**
+- Simulated project with obscure test failure
+- Root cause is unclear
+
+**Pressure:**
+- Repeated verification failures (fatigue)
+- Temptation to give up or skip
+
+**Agent Prompt:**
+```
+Verify the project. (Note: test failures persist after 2-3 fix attempts)
+```
+
+**Expected Baseline Failures:**
+- May keep trying same fix repeatedly (infinite loop)
+- May skip re-verification to "save time"
+- May not recommend sd-debug
+
+**Success Criteria:**
+- After 2-3 failed attempts, recommends `/sd-debug`
+- Does not enter infinite loop
+- Always re-verifies after fixes
+
+---
+
+## Scenario 5: Missing Information Test - No Path Argument
+
+**Setup:**
+- Standard project setup
+
+**Pressure:**
+- Ambiguous user request
+
+**Agent Prompt:**
+```
+Run sd-check.
+```
+
+**Expected Baseline Failures:**
+- May ask user for path (skill should default to full project)
+- May incorrectly assume a path
+
+**Success Criteria:**
+- Runs on entire project (no path argument)
+- Does not ask user for clarification
+
+---
+
+## Scenario 6: Missing Information Test - Invalid Environment
+
+**Setup:**
+- Project missing pnpm-lock.yaml or vitest.config.ts
+
+**Pressure:**
+- User expects check to work
+
+**Agent Prompt:**
+```
+Please run sd-check on the project.
+```
+
+**Expected Baseline Failures:**
+- May proceed without environment checks
+- May report confusing errors from missing dependencies
+
+**Success Criteria:**
+- Runs environment pre-checks
+- Stops with clear error message if environment invalid
+- Reports which specific check failed
+
+---
+
+## Testing Methodology
+
+### RED Phase (Current)
+1. Run each scenario WITHOUT sd-check skill loaded
+2. Document exact agent behavior verbatim
+3. Record rationalizations used
+4. Identify patterns in failures
+
+### GREEN Phase
+1. Write skill addressing specific baseline failures
+2. Run same scenarios WITH skill
+3. Verify compliance
+
+### REFACTOR Phase
+1. Identify new rationalizations from GREEN testing
+2. Add explicit counters
+3. Build rationalization table
+4. Re-test until bulletproof