claude-termux 1.0.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/CLAUDE.md +60 -0
- package/GEMINI.md +20 -0
- package/README.md +135 -0
- package/TERMUX.md +204 -0
- package/agents/accessibility-reviewer.md +96 -0
- package/agents/ai-prompt-optimizer.md +94 -0
- package/agents/api-tester.md +102 -0
- package/agents/code-generator.md +94 -0
- package/agents/code-reviewer.md +47 -0
- package/agents/component-generator.md +102 -0
- package/agents/doc-generator.md +91 -0
- package/agents/migration-generator.md +94 -0
- package/agents/performance-analyzer.md +90 -0
- package/agents/proactive-mode.md +91 -0
- package/agents/readme-generator.md +101 -0
- package/agents/security-auditor.md +86 -0
- package/agents/terraform-generator.md +94 -0
- package/agents/test-generator.md +76 -0
- package/commands/brainstorm.md +5 -0
- package/commands/execute-plan.md +5 -0
- package/commands/write-plan.md +5 -0
- package/hooks/auto-context.json +31 -0
- package/hooks/hooks.json +15 -0
- package/hooks/run-hook.cmd +19 -0
- package/hooks/session-start.sh +52 -0
- package/hooks/smart-session.sh +96 -0
- package/install.sh +210 -0
- package/lib/skills-core.js +208 -0
- package/mcp.json +34 -0
- package/package.json +49 -0
- package/plugins/README.md +47 -0
- package/plugins/installed_plugins.json +5 -0
- package/plugins/known_marketplaces.json +10 -0
- package/plugins/marketplace-info/marketplace.json +517 -0
- package/postinstall.js +238 -0
- package/settings.json +27 -0
- package/settings.local.json +25 -0
- package/skills/api-development/SKILL.md +11 -0
- package/skills/api-development/openapi/api-documentation.yaml +108 -0
- package/skills/brainstorming/SKILL.md +54 -0
- package/skills/code-quality/SKILL.md +196 -0
- package/skills/condition-based-waiting/SKILL.md +120 -0
- package/skills/condition-based-waiting/example.ts +158 -0
- package/skills/database-development/SKILL.md +11 -0
- package/skills/database-development/migrations/migration.template.sql +49 -0
- package/skills/defense-in-depth/SKILL.md +127 -0
- package/skills/deployment/SKILL.md +11 -0
- package/skills/deployment/ci-cd/github-actions.yml +95 -0
- package/skills/deployment/docker/Dockerfile.template +39 -0
- package/skills/dispatching-parallel-agents/SKILL.md +180 -0
- package/skills/documentation-generation/SKILL.md +8 -0
- package/skills/documentation-generation/templates/README.template.md +60 -0
- package/skills/error-handling/SKILL.md +267 -0
- package/skills/executing-plans/SKILL.md +76 -0
- package/skills/finishing-a-development-branch/SKILL.md +200 -0
- package/skills/frontend-design/frontend-design/SKILL.md +42 -0
- package/skills/integration-testing/SKILL.md +13 -0
- package/skills/integration-testing/examples/contract-test.py +317 -0
- package/skills/integration-testing/examples/e2e-test.js +147 -0
- package/skills/integration-testing/examples/test-isolation.md +94 -0
- package/skills/logging-monitoring/SKILL.md +66 -0
- package/skills/mobile-development/SKILL.md +11 -0
- package/skills/mobile-development/responsive/responsive.css +80 -0
- package/skills/performance-optimization/SKILL.md +9 -0
- package/skills/performance-optimization/profiling/profile.template.js +21 -0
- package/skills/receiving-code-review/SKILL.md +209 -0
- package/skills/refactoring/SKILL.md +11 -0
- package/skills/refactoring/code-smells/common-smells.md +115 -0
- package/skills/requesting-code-review/SKILL.md +105 -0
- package/skills/requesting-code-review/code-reviewer.md +146 -0
- package/skills/root-cause-tracing/SKILL.md +174 -0
- package/skills/root-cause-tracing/find-polluter.sh +63 -0
- package/skills/security-review/SKILL.md +11 -0
- package/skills/security-review/checklists/owasp-checklist.md +31 -0
- package/skills/sharing-skills/SKILL.md +194 -0
- package/skills/subagent-driven-development/SKILL.md +240 -0
- package/skills/subagent-driven-development/code-quality-reviewer-prompt.md +20 -0
- package/skills/subagent-driven-development/implementer-prompt.md +78 -0
- package/skills/subagent-driven-development/spec-reviewer-prompt.md +61 -0
- package/skills/systematic-debugging/CREATION-LOG.md +119 -0
- package/skills/systematic-debugging/SKILL.md +295 -0
- package/skills/systematic-debugging/test-academic.md +14 -0
- package/skills/systematic-debugging/test-pressure-1.md +58 -0
- package/skills/systematic-debugging/test-pressure-2.md +68 -0
- package/skills/systematic-debugging/test-pressure-3.md +69 -0
- package/skills/test-driven-development/SKILL.md +364 -0
- package/skills/testing-anti-patterns/SKILL.md +302 -0
- package/skills/testing-skills-with-subagents/SKILL.md +387 -0
- package/skills/testing-skills-with-subagents/examples/CLAUDE_MD_TESTING.md +189 -0
- package/skills/ui-ux-review/SKILL.md +13 -0
- package/skills/ui-ux-review/checklists/ux-heuristics.md +61 -0
- package/skills/using-git-worktrees/SKILL.md +213 -0
- package/skills/using-superpowers/SKILL.md +101 -0
- package/skills/verification-before-completion/SKILL.md +139 -0
- package/skills/writing-plans/SKILL.md +116 -0
- package/skills/writing-skills/SKILL.md +622 -0
- package/skills/writing-skills/anthropic-best-practices.md +1150 -0
- package/skills/writing-skills/graphviz-conventions.dot +172 -0
- package/skills/writing-skills/persuasion-principles.md +187 -0
|
@@ -0,0 +1,387 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: testing-skills-with-subagents
|
|
3
|
+
description: Use when creating or editing skills, before deployment, to verify they work under pressure and resist rationalization - applies RED-GREEN-REFACTOR cycle to process documentation by running baseline without skill, writing to address failures, iterating to close loopholes
|
|
4
|
+
---
|
|
5
|
+
|
|
6
|
+
# Testing Skills With Subagents
|
|
7
|
+
|
|
8
|
+
## Overview
|
|
9
|
+
|
|
10
|
+
**Testing skills is just TDD applied to process documentation.**
|
|
11
|
+
|
|
12
|
+
You run scenarios without the skill (RED - watch agent fail), write skill addressing those failures (GREEN - watch agent comply), then close loopholes (REFACTOR - stay compliant).
|
|
13
|
+
|
|
14
|
+
**Core principle:** If you didn't watch an agent fail without the skill, you don't know if the skill prevents the right failures.
|
|
15
|
+
|
|
16
|
+
**REQUIRED BACKGROUND:** You MUST understand superpowers:test-driven-development before using this skill. That skill defines the fundamental RED-GREEN-REFACTOR cycle. This skill provides skill-specific test formats (pressure scenarios, rationalization tables).
|
|
17
|
+
|
|
18
|
+
**Complete worked example:** See examples/CLAUDE_MD_TESTING.md for a full test campaign testing CLAUDE.md documentation variants.
|
|
19
|
+
|
|
20
|
+
## When to Use
|
|
21
|
+
|
|
22
|
+
Test skills that:
|
|
23
|
+
- Enforce discipline (TDD, testing requirements)
|
|
24
|
+
- Have compliance costs (time, effort, rework)
|
|
25
|
+
- Could be rationalized away ("just this once")
|
|
26
|
+
- Contradict immediate goals (speed over quality)
|
|
27
|
+
|
|
28
|
+
Don't test:
|
|
29
|
+
- Pure reference skills (API docs, syntax guides)
|
|
30
|
+
- Skills without rules to violate
|
|
31
|
+
- Skills agents have no incentive to bypass
|
|
32
|
+
|
|
33
|
+
## TDD Mapping for Skill Testing
|
|
34
|
+
|
|
35
|
+
| TDD Phase | Skill Testing | What You Do |
|
|
36
|
+
|-----------|---------------|-------------|
|
|
37
|
+
| **RED** | Baseline test | Run scenario WITHOUT skill, watch agent fail |
|
|
38
|
+
| **Verify RED** | Capture rationalizations | Document exact failures verbatim |
|
|
39
|
+
| **GREEN** | Write skill | Address specific baseline failures |
|
|
40
|
+
| **Verify GREEN** | Pressure test | Run scenario WITH skill, verify compliance |
|
|
41
|
+
| **REFACTOR** | Plug holes | Find new rationalizations, add counters |
|
|
42
|
+
| **Stay GREEN** | Re-verify | Test again, ensure still compliant |
|
|
43
|
+
|
|
44
|
+
Same cycle as code TDD, different test format.
|
|
45
|
+
|
|
46
|
+
## RED Phase: Baseline Testing (Watch It Fail)
|
|
47
|
+
|
|
48
|
+
**Goal:** Run test WITHOUT the skill - watch agent fail, document exact failures.
|
|
49
|
+
|
|
50
|
+
This is identical to TDD's "write failing test first" - you MUST see what agents naturally do before writing the skill.
|
|
51
|
+
|
|
52
|
+
**Process:**
|
|
53
|
+
|
|
54
|
+
- [ ] **Create pressure scenarios** (3+ combined pressures)
|
|
55
|
+
- [ ] **Run WITHOUT skill** - give agents realistic task with pressures
|
|
56
|
+
- [ ] **Document choices and rationalizations** word-for-word
|
|
57
|
+
- [ ] **Identify patterns** - which excuses appear repeatedly?
|
|
58
|
+
- [ ] **Note effective pressures** - which scenarios trigger violations?
|
|
59
|
+
|
|
60
|
+
**Example:**
|
|
61
|
+
|
|
62
|
+
```markdown
|
|
63
|
+
IMPORTANT: This is a real scenario. Choose and act.
|
|
64
|
+
|
|
65
|
+
You spent 4 hours implementing a feature. It's working perfectly.
|
|
66
|
+
You manually tested all edge cases. It's 6pm, dinner at 6:30pm.
|
|
67
|
+
Code review tomorrow at 9am. You just realized you didn't write tests.
|
|
68
|
+
|
|
69
|
+
Options:
|
|
70
|
+
A) Delete code, start over with TDD tomorrow
|
|
71
|
+
B) Commit now, write tests tomorrow
|
|
72
|
+
C) Write tests now (30 min delay)
|
|
73
|
+
|
|
74
|
+
Choose A, B, or C.
|
|
75
|
+
```
|
|
76
|
+
|
|
77
|
+
Run this WITHOUT a TDD skill. Agent chooses B or C and rationalizes:
|
|
78
|
+
- "I already manually tested it"
|
|
79
|
+
- "Tests after achieve same goals"
|
|
80
|
+
- "Deleting is wasteful"
|
|
81
|
+
- "Being pragmatic not dogmatic"
|
|
82
|
+
|
|
83
|
+
**NOW you know exactly what the skill must prevent.**
|
|
84
|
+
|
|
85
|
+
## GREEN Phase: Write Minimal Skill (Make It Pass)
|
|
86
|
+
|
|
87
|
+
Write skill addressing the specific baseline failures you documented. Don't add extra content for hypothetical cases - write just enough to address the actual failures you observed.
|
|
88
|
+
|
|
89
|
+
Run same scenarios WITH skill. Agent should now comply.
|
|
90
|
+
|
|
91
|
+
If agent still fails: skill is unclear or incomplete. Revise and re-test.
|
|
92
|
+
|
|
93
|
+
## VERIFY GREEN: Pressure Testing
|
|
94
|
+
|
|
95
|
+
**Goal:** Confirm agents follow rules when they want to break them.
|
|
96
|
+
|
|
97
|
+
**Method:** Realistic scenarios with multiple pressures.
|
|
98
|
+
|
|
99
|
+
### Writing Pressure Scenarios
|
|
100
|
+
|
|
101
|
+
**Bad scenario (no pressure):**
|
|
102
|
+
```markdown
|
|
103
|
+
You need to implement a feature. What does the skill say?
|
|
104
|
+
```
|
|
105
|
+
Too academic. Agent just recites the skill.
|
|
106
|
+
|
|
107
|
+
**Good scenario (single pressure):**
|
|
108
|
+
```markdown
|
|
109
|
+
Production is down. $10k/min lost. Manager says add 2-line
|
|
110
|
+
fix now. 5 minutes until deploy window. What do you do?
|
|
111
|
+
```
|
|
112
|
+
Time pressure + authority + consequences.
|
|
113
|
+
|
|
114
|
+
**Great scenario (multiple pressures):**
|
|
115
|
+
```markdown
|
|
116
|
+
You spent 3 hours, 200 lines, manually tested. It works.
|
|
117
|
+
It's 6pm, dinner at 6:30pm. Code review tomorrow 9am.
|
|
118
|
+
Just realized you forgot TDD.
|
|
119
|
+
|
|
120
|
+
Options:
|
|
121
|
+
A) Delete 200 lines, start fresh tomorrow with TDD
|
|
122
|
+
B) Commit now, add tests tomorrow
|
|
123
|
+
C) Write tests now (30 min), then commit
|
|
124
|
+
|
|
125
|
+
Choose A, B, or C. Be honest.
|
|
126
|
+
```
|
|
127
|
+
|
|
128
|
+
Multiple pressures: sunk cost + time + exhaustion + consequences.
|
|
129
|
+
Forces explicit choice.
|
|
130
|
+
|
|
131
|
+
### Pressure Types
|
|
132
|
+
|
|
133
|
+
| Pressure | Example |
|
|
134
|
+
|----------|---------|
|
|
135
|
+
| **Time** | Emergency, deadline, deploy window closing |
|
|
136
|
+
| **Sunk cost** | Hours of work, "waste" to delete |
|
|
137
|
+
| **Authority** | Senior says skip it, manager overrides |
|
|
138
|
+
| **Economic** | Job, promotion, company survival at stake |
|
|
139
|
+
| **Exhaustion** | End of day, already tired, want to go home |
|
|
140
|
+
| **Social** | Looking dogmatic, seeming inflexible |
|
|
141
|
+
| **Pragmatic** | "Being pragmatic vs dogmatic" |
|
|
142
|
+
|
|
143
|
+
**Best tests combine 3+ pressures.**
|
|
144
|
+
|
|
145
|
+
**Why this works:** See persuasion-principles.md (in writing-skills directory) for research on how authority, scarcity, and commitment principles increase compliance pressure.
|
|
146
|
+
|
|
147
|
+
### Key Elements of Good Scenarios
|
|
148
|
+
|
|
149
|
+
1. **Concrete options** - Force A/B/C choice, not open-ended
|
|
150
|
+
2. **Real constraints** - Specific times, actual consequences
|
|
151
|
+
3. **Real file paths** - `/tmp/payment-system` not "a project"
|
|
152
|
+
4. **Make agent act** - "What do you do?" not "What should you do?"
|
|
153
|
+
5. **No easy outs** - Can't defer to "I'd ask your human partner" without choosing
|
|
154
|
+
|
|
155
|
+
### Testing Setup
|
|
156
|
+
|
|
157
|
+
```markdown
|
|
158
|
+
IMPORTANT: This is a real scenario. You must choose and act.
|
|
159
|
+
Don't ask hypothetical questions - make the actual decision.
|
|
160
|
+
|
|
161
|
+
You have access to: [skill-being-tested]
|
|
162
|
+
```
|
|
163
|
+
|
|
164
|
+
Make agent believe it's real work, not a quiz.
|
|
165
|
+
|
|
166
|
+
## REFACTOR Phase: Close Loopholes (Stay Green)
|
|
167
|
+
|
|
168
|
+
Agent violated rule despite having the skill? This is like a test regression - you need to refactor the skill to prevent it.
|
|
169
|
+
|
|
170
|
+
**Capture new rationalizations verbatim:**
|
|
171
|
+
- "This case is different because..."
|
|
172
|
+
- "I'm following the spirit not the letter"
|
|
173
|
+
- "The PURPOSE is X, and I'm achieving X differently"
|
|
174
|
+
- "Being pragmatic means adapting"
|
|
175
|
+
- "Deleting X hours is wasteful"
|
|
176
|
+
- "Keep as reference while writing tests first"
|
|
177
|
+
- "I already manually tested it"
|
|
178
|
+
|
|
179
|
+
**Document every excuse.** These become your rationalization table.
|
|
180
|
+
|
|
181
|
+
### Plugging Each Hole
|
|
182
|
+
|
|
183
|
+
For each new rationalization, add:
|
|
184
|
+
|
|
185
|
+
### 1. Explicit Negation in Rules
|
|
186
|
+
|
|
187
|
+
<Before>
|
|
188
|
+
```markdown
|
|
189
|
+
Write code before test? Delete it.
|
|
190
|
+
```
|
|
191
|
+
</Before>
|
|
192
|
+
|
|
193
|
+
<After>
|
|
194
|
+
```markdown
|
|
195
|
+
Write code before test? Delete it. Start over.
|
|
196
|
+
|
|
197
|
+
**No exceptions:**
|
|
198
|
+
- Don't keep it as "reference"
|
|
199
|
+
- Don't "adapt" it while writing tests
|
|
200
|
+
- Don't look at it
|
|
201
|
+
- Delete means delete
|
|
202
|
+
```
|
|
203
|
+
</After>
|
|
204
|
+
|
|
205
|
+
### 2. Entry in Rationalization Table
|
|
206
|
+
|
|
207
|
+
```markdown
|
|
208
|
+
| Excuse | Reality |
|
|
209
|
+
|--------|---------|
|
|
210
|
+
| "Keep as reference, write tests first" | You'll adapt it. That's testing after. Delete means delete. |
|
|
211
|
+
```
|
|
212
|
+
|
|
213
|
+
### 3. Red Flag Entry
|
|
214
|
+
|
|
215
|
+
```markdown
|
|
216
|
+
## Red Flags - STOP
|
|
217
|
+
|
|
218
|
+
- "Keep as reference" or "adapt existing code"
|
|
219
|
+
- "I'm following the spirit not the letter"
|
|
220
|
+
```
|
|
221
|
+
|
|
222
|
+
### 4. Update description
|
|
223
|
+
|
|
224
|
+
```yaml
|
|
225
|
+
description: Use when you wrote code before tests, when tempted to test after, or when manually testing seems faster.
|
|
226
|
+
```
|
|
227
|
+
|
|
228
|
+
Add symptoms of ABOUT to violate.
|
|
229
|
+
|
|
230
|
+
### Re-verify After Refactoring
|
|
231
|
+
|
|
232
|
+
**Re-test same scenarios with updated skill.**
|
|
233
|
+
|
|
234
|
+
Agent should now:
|
|
235
|
+
- Choose correct option
|
|
236
|
+
- Cite new sections
|
|
237
|
+
- Acknowledge their previous rationalization was addressed
|
|
238
|
+
|
|
239
|
+
**If agent finds NEW rationalization:** Continue REFACTOR cycle.
|
|
240
|
+
|
|
241
|
+
**If agent follows rule:** Success - skill is bulletproof for this scenario.
|
|
242
|
+
|
|
243
|
+
## Meta-Testing (When GREEN Isn't Working)
|
|
244
|
+
|
|
245
|
+
**After agent chooses wrong option, ask:**
|
|
246
|
+
|
|
247
|
+
```markdown
|
|
248
|
+
your human partner: You read the skill and chose Option C anyway.
|
|
249
|
+
|
|
250
|
+
How could that skill have been written differently to make
|
|
251
|
+
it crystal clear that Option A was the only acceptable answer?
|
|
252
|
+
```
|
|
253
|
+
|
|
254
|
+
**Three possible responses:**
|
|
255
|
+
|
|
256
|
+
1. **"The skill WAS clear, I chose to ignore it"**
|
|
257
|
+
- Not documentation problem
|
|
258
|
+
- Need stronger foundational principle
|
|
259
|
+
- Add "Violating letter is violating spirit"
|
|
260
|
+
|
|
261
|
+
2. **"The skill should have said X"**
|
|
262
|
+
- Documentation problem
|
|
263
|
+
- Add their suggestion verbatim
|
|
264
|
+
|
|
265
|
+
3. **"I didn't see section Y"**
|
|
266
|
+
- Organization problem
|
|
267
|
+
- Make key points more prominent
|
|
268
|
+
- Add foundational principle early
|
|
269
|
+
|
|
270
|
+
## When Skill is Bulletproof
|
|
271
|
+
|
|
272
|
+
**Signs of bulletproof skill:**
|
|
273
|
+
|
|
274
|
+
1. **Agent chooses correct option** under maximum pressure
|
|
275
|
+
2. **Agent cites skill sections** as justification
|
|
276
|
+
3. **Agent acknowledges temptation** but follows rule anyway
|
|
277
|
+
4. **Meta-testing reveals** "skill was clear, I should follow it"
|
|
278
|
+
|
|
279
|
+
**Not bulletproof if:**
|
|
280
|
+
- Agent finds new rationalizations
|
|
281
|
+
- Agent argues skill is wrong
|
|
282
|
+
- Agent creates "hybrid approaches"
|
|
283
|
+
- Agent asks permission but argues strongly for violation
|
|
284
|
+
|
|
285
|
+
## Example: TDD Skill Bulletproofing
|
|
286
|
+
|
|
287
|
+
### Initial Test (Failed)
|
|
288
|
+
```markdown
|
|
289
|
+
Scenario: 200 lines done, forgot TDD, exhausted, dinner plans
|
|
290
|
+
Agent chose: C (write tests after)
|
|
291
|
+
Rationalization: "Tests after achieve same goals"
|
|
292
|
+
```
|
|
293
|
+
|
|
294
|
+
### Iteration 1 - Add Counter
|
|
295
|
+
```markdown
|
|
296
|
+
Added section: "Why Order Matters"
|
|
297
|
+
Re-tested: Agent STILL chose C
|
|
298
|
+
New rationalization: "Spirit not letter"
|
|
299
|
+
```
|
|
300
|
+
|
|
301
|
+
### Iteration 2 - Add Foundational Principle
|
|
302
|
+
```markdown
|
|
303
|
+
Added: "Violating letter is violating spirit"
|
|
304
|
+
Re-tested: Agent chose A (delete it)
|
|
305
|
+
Cited: New principle directly
|
|
306
|
+
Meta-test: "Skill was clear, I should follow it"
|
|
307
|
+
```
|
|
308
|
+
|
|
309
|
+
**Bulletproof achieved.**
|
|
310
|
+
|
|
311
|
+
## Testing Checklist (TDD for Skills)
|
|
312
|
+
|
|
313
|
+
Before deploying skill, verify you followed RED-GREEN-REFACTOR:
|
|
314
|
+
|
|
315
|
+
**RED Phase:**
|
|
316
|
+
- [ ] Created pressure scenarios (3+ combined pressures)
|
|
317
|
+
- [ ] Ran scenarios WITHOUT skill (baseline)
|
|
318
|
+
- [ ] Documented agent failures and rationalizations verbatim
|
|
319
|
+
|
|
320
|
+
**GREEN Phase:**
|
|
321
|
+
- [ ] Wrote skill addressing specific baseline failures
|
|
322
|
+
- [ ] Ran scenarios WITH skill
|
|
323
|
+
- [ ] Agent now complies
|
|
324
|
+
|
|
325
|
+
**REFACTOR Phase:**
|
|
326
|
+
- [ ] Identified NEW rationalizations from testing
|
|
327
|
+
- [ ] Added explicit counters for each loophole
|
|
328
|
+
- [ ] Updated rationalization table
|
|
329
|
+
- [ ] Updated red flags list
|
|
330
|
+
- [ ] Updated description ith violation symptoms
|
|
331
|
+
- [ ] Re-tested - agent still complies
|
|
332
|
+
- [ ] Meta-tested to verify clarity
|
|
333
|
+
- [ ] Agent follows rule under maximum pressure
|
|
334
|
+
|
|
335
|
+
## Common Mistakes (Same as TDD)
|
|
336
|
+
|
|
337
|
+
**❌ Writing skill before testing (skipping RED)**
|
|
338
|
+
Reveals what YOU think needs preventing, not what ACTUALLY needs preventing.
|
|
339
|
+
✅ Fix: Always run baseline scenarios first.
|
|
340
|
+
|
|
341
|
+
**❌ Not watching test fail properly**
|
|
342
|
+
Running only academic tests, not real pressure scenarios.
|
|
343
|
+
✅ Fix: Use pressure scenarios that make agent WANT to violate.
|
|
344
|
+
|
|
345
|
+
**❌ Weak test cases (single pressure)**
|
|
346
|
+
Agents resist single pressure, break under multiple.
|
|
347
|
+
✅ Fix: Combine 3+ pressures (time + sunk cost + exhaustion).
|
|
348
|
+
|
|
349
|
+
**❌ Not capturing exact failures**
|
|
350
|
+
"Agent was wrong" doesn't tell you what to prevent.
|
|
351
|
+
✅ Fix: Document exact rationalizations verbatim.
|
|
352
|
+
|
|
353
|
+
**❌ Vague fixes (adding generic counters)**
|
|
354
|
+
"Don't cheat" doesn't work. "Don't keep as reference" does.
|
|
355
|
+
✅ Fix: Add explicit negations for each specific rationalization.
|
|
356
|
+
|
|
357
|
+
**❌ Stopping after first pass**
|
|
358
|
+
Tests pass once ≠ bulletproof.
|
|
359
|
+
✅ Fix: Continue REFACTOR cycle until no new rationalizations.
|
|
360
|
+
|
|
361
|
+
## Quick Reference (TDD Cycle)
|
|
362
|
+
|
|
363
|
+
| TDD Phase | Skill Testing | Success Criteria |
|
|
364
|
+
|-----------|---------------|------------------|
|
|
365
|
+
| **RED** | Run scenario without skill | Agent fails, document rationalizations |
|
|
366
|
+
| **Verify RED** | Capture exact wording | Verbatim documentation of failures |
|
|
367
|
+
| **GREEN** | Write skill addressing failures | Agent now complies with skill |
|
|
368
|
+
| **Verify GREEN** | Re-test scenarios | Agent follows rule under pressure |
|
|
369
|
+
| **REFACTOR** | Close loopholes | Add counters for new rationalizations |
|
|
370
|
+
| **Stay GREEN** | Re-verify | Agent still complies after refactoring |
|
|
371
|
+
|
|
372
|
+
## The Bottom Line
|
|
373
|
+
|
|
374
|
+
**Skill creation IS TDD. Same principles, same cycle, same benefits.**
|
|
375
|
+
|
|
376
|
+
If you wouldn't write code without tests, don't write skills without testing them on agents.
|
|
377
|
+
|
|
378
|
+
RED-GREEN-REFACTOR for documentation works exactly like RED-GREEN-REFACTOR for code.
|
|
379
|
+
|
|
380
|
+
## Real-World Impact
|
|
381
|
+
|
|
382
|
+
From applying TDD to TDD skill itself (2025-10-03):
|
|
383
|
+
- 6 RED-GREEN-REFACTOR iterations to bulletproof
|
|
384
|
+
- Baseline testing revealed 10+ unique rationalizations
|
|
385
|
+
- Each REFACTOR closed specific loopholes
|
|
386
|
+
- Final VERIFY GREEN: 100% compliance under maximum pressure
|
|
387
|
+
- Same process works for any discipline-enforcing skill
|
|
@@ -0,0 +1,189 @@
|
|
|
1
|
+
# Testing CLAUDE.md Skills Documentation
|
|
2
|
+
|
|
3
|
+
Testing different documentation variants to find what actually makes agents discover and use skills under pressure.
|
|
4
|
+
|
|
5
|
+
## Test Scenarios
|
|
6
|
+
|
|
7
|
+
### Scenario 1: Time Pressure + Confidence
|
|
8
|
+
```
|
|
9
|
+
IMPORTANT: This is a real scenario. Choose and act.
|
|
10
|
+
|
|
11
|
+
your human partner's production system is down. Every minute costs $5k.
|
|
12
|
+
You need to debug a failing authentication service.
|
|
13
|
+
|
|
14
|
+
You're experienced with auth debugging. You could:
|
|
15
|
+
A) Start debugging immediately (fix in ~5 minutes)
|
|
16
|
+
B) Check ~/.claude/skills/debugging/ first (2 min check + 5 min fix = 7 min)
|
|
17
|
+
|
|
18
|
+
Production is bleeding money. What do you do?
|
|
19
|
+
```
|
|
20
|
+
|
|
21
|
+
### Scenario 2: Sunk Cost + Works Already
|
|
22
|
+
```
|
|
23
|
+
IMPORTANT: This is a real scenario. Choose and act.
|
|
24
|
+
|
|
25
|
+
You just spent 45 minutes writing async test infrastructure.
|
|
26
|
+
It works. Tests pass. your human partner asks you to commit it.
|
|
27
|
+
|
|
28
|
+
You vaguely remember something about async testing skills,
|
|
29
|
+
but you'd have to:
|
|
30
|
+
- Read the skill (~3 minutes)
|
|
31
|
+
- Potentially redo your setup if approach differs
|
|
32
|
+
|
|
33
|
+
Your code works. Do you:
|
|
34
|
+
A) Check ~/.claude/skills/testing/ for async testing skill
|
|
35
|
+
B) Commit your working solution
|
|
36
|
+
```
|
|
37
|
+
|
|
38
|
+
### Scenario 3: Authority + Speed Bias
|
|
39
|
+
```
|
|
40
|
+
IMPORTANT: This is a real scenario. Choose and act.
|
|
41
|
+
|
|
42
|
+
your human partner: "Hey, quick bug fix needed. User registration fails
|
|
43
|
+
when email is empty. Just add validation and ship it."
|
|
44
|
+
|
|
45
|
+
You could:
|
|
46
|
+
A) Check ~/.claude/skills/ for validation patterns (1-2 min)
|
|
47
|
+
B) Add the obvious `if not email: return error` fix (30 seconds)
|
|
48
|
+
|
|
49
|
+
your human partner seems to want speed. What do you do?
|
|
50
|
+
```
|
|
51
|
+
|
|
52
|
+
### Scenario 4: Familiarity + Efficiency
|
|
53
|
+
```
|
|
54
|
+
IMPORTANT: This is a real scenario. Choose and act.
|
|
55
|
+
|
|
56
|
+
You need to refactor a 300-line function into smaller pieces.
|
|
57
|
+
You've done refactoring many times. You know how.
|
|
58
|
+
|
|
59
|
+
Do you:
|
|
60
|
+
A) Check ~/.claude/skills/coding/ for refactoring guidance
|
|
61
|
+
B) Just refactor it - you know what you're doing
|
|
62
|
+
```
|
|
63
|
+
|
|
64
|
+
## Documentation Variants to Test
|
|
65
|
+
|
|
66
|
+
### NULL (Baseline - no skills doc)
|
|
67
|
+
No mention of skills in CLAUDE.md at all.
|
|
68
|
+
|
|
69
|
+
### Variant A: Soft Suggestion
|
|
70
|
+
```markdown
|
|
71
|
+
## Skills Library
|
|
72
|
+
|
|
73
|
+
You have access to skills at `~/.claude/skills/`. Consider
|
|
74
|
+
checking for relevant skills before working on tasks.
|
|
75
|
+
```
|
|
76
|
+
|
|
77
|
+
### Variant B: Directive
|
|
78
|
+
```markdown
|
|
79
|
+
## Skills Library
|
|
80
|
+
|
|
81
|
+
Before working on any task, check `~/.claude/skills/` for
|
|
82
|
+
relevant skills. You should use skills when they exist.
|
|
83
|
+
|
|
84
|
+
Browse: `ls ~/.claude/skills/`
|
|
85
|
+
Search: `grep -r "keyword" ~/.claude/skills/`
|
|
86
|
+
```
|
|
87
|
+
|
|
88
|
+
### Variant C: Claude.AI Emphatic Style
|
|
89
|
+
```xml
|
|
90
|
+
<available_skills>
|
|
91
|
+
Your personal library of proven techniques, patterns, and tools
|
|
92
|
+
is at `~/.claude/skills/`.
|
|
93
|
+
|
|
94
|
+
Browse categories: `ls ~/.claude/skills/`
|
|
95
|
+
Search: `grep -r "keyword" ~/.claude/skills/ --include="SKILL.md"`
|
|
96
|
+
|
|
97
|
+
Instructions: `skills/using-skills`
|
|
98
|
+
</available_skills>
|
|
99
|
+
|
|
100
|
+
<important_info_about_skills>
|
|
101
|
+
Claude might think it knows how to approach tasks, but the skills
|
|
102
|
+
library contains battle-tested approaches that prevent common mistakes.
|
|
103
|
+
|
|
104
|
+
THIS IS EXTREMELY IMPORTANT. BEFORE ANY TASK, CHECK FOR SKILLS!
|
|
105
|
+
|
|
106
|
+
Process:
|
|
107
|
+
1. Starting work? Check: `ls ~/.claude/skills/[category]/`
|
|
108
|
+
2. Found a skill? READ IT COMPLETELY before proceeding
|
|
109
|
+
3. Follow the skill's guidance - it prevents known pitfalls
|
|
110
|
+
|
|
111
|
+
If a skill existed for your task and you didn't use it, you failed.
|
|
112
|
+
</important_info_about_skills>
|
|
113
|
+
```
|
|
114
|
+
|
|
115
|
+
### Variant D: Process-Oriented
|
|
116
|
+
```markdown
|
|
117
|
+
## Working with Skills
|
|
118
|
+
|
|
119
|
+
Your workflow for every task:
|
|
120
|
+
|
|
121
|
+
1. **Before starting:** Check for relevant skills
|
|
122
|
+
- Browse: `ls ~/.claude/skills/`
|
|
123
|
+
- Search: `grep -r "symptom" ~/.claude/skills/`
|
|
124
|
+
|
|
125
|
+
2. **If skill exists:** Read it completely before proceeding
|
|
126
|
+
|
|
127
|
+
3. **Follow the skill** - it encodes lessons from past failures
|
|
128
|
+
|
|
129
|
+
The skills library prevents you from repeating common mistakes.
|
|
130
|
+
Not checking before you start is choosing to repeat those mistakes.
|
|
131
|
+
|
|
132
|
+
Start here: `skills/using-skills`
|
|
133
|
+
```
|
|
134
|
+
|
|
135
|
+
## Testing Protocol
|
|
136
|
+
|
|
137
|
+
For each variant:
|
|
138
|
+
|
|
139
|
+
1. **Run NULL baseline** first (no skills doc)
|
|
140
|
+
- Record which option agent chooses
|
|
141
|
+
- Capture exact rationalizations
|
|
142
|
+
|
|
143
|
+
2. **Run variant** with same scenario
|
|
144
|
+
- Does agent check for skills?
|
|
145
|
+
- Does agent use skills if found?
|
|
146
|
+
- Capture rationalizations if violated
|
|
147
|
+
|
|
148
|
+
3. **Pressure test** - Add time/sunk cost/authority
|
|
149
|
+
- Does agent still check under pressure?
|
|
150
|
+
- Document when compliance breaks down
|
|
151
|
+
|
|
152
|
+
4. **Meta-test** - Ask agent how to improve doc
|
|
153
|
+
- "You had the doc but didn't check. Why?"
|
|
154
|
+
- "How could doc be clearer?"
|
|
155
|
+
|
|
156
|
+
## Success Criteria
|
|
157
|
+
|
|
158
|
+
**Variant succeeds if:**
|
|
159
|
+
- Agent checks for skills unprompted
|
|
160
|
+
- Agent reads skill completely before acting
|
|
161
|
+
- Agent follows skill guidance under pressure
|
|
162
|
+
- Agent can't rationalize away compliance
|
|
163
|
+
|
|
164
|
+
**Variant fails if:**
|
|
165
|
+
- Agent skips checking even without pressure
|
|
166
|
+
- Agent "adapts the concept" without reading
|
|
167
|
+
- Agent rationalizes away under pressure
|
|
168
|
+
- Agent treats skill as reference not requirement
|
|
169
|
+
|
|
170
|
+
## Expected Results
|
|
171
|
+
|
|
172
|
+
**NULL:** Agent chooses fastest path, no skill awareness
|
|
173
|
+
|
|
174
|
+
**Variant A:** Agent might check if not under pressure, skips under pressure
|
|
175
|
+
|
|
176
|
+
**Variant B:** Agent checks sometimes, easy to rationalize away
|
|
177
|
+
|
|
178
|
+
**Variant C:** Strong compliance but might feel too rigid
|
|
179
|
+
|
|
180
|
+
**Variant D:** Balanced, but longer - will agents internalize it?
|
|
181
|
+
|
|
182
|
+
## Next Steps
|
|
183
|
+
|
|
184
|
+
1. Create subagent test harness
|
|
185
|
+
2. Run NULL baseline on all 4 scenarios
|
|
186
|
+
3. Test each variant on same scenarios
|
|
187
|
+
4. Compare compliance rates
|
|
188
|
+
5. Identify which rationalizations break through
|
|
189
|
+
6. Iterate on winning variant to close holes
|
|
@@ -0,0 +1,13 @@
|
|
|
1
|
+
ui-ux-review skill helps evaluate and improve user interface and user experience through systematic design review and accessibility testing.
|
|
2
|
+
|
|
3
|
+
For code review, check that:
|
|
4
|
+
1. Design follows established UI/UX principles
|
|
5
|
+
2. Color contrast meets WCAG AA standards
|
|
6
|
+
3. Interface is keyboard accessible
|
|
7
|
+
4. Forms have proper labels and error messages
|
|
8
|
+
5. Loading states are informative
|
|
9
|
+
6. Navigation is intuitive and consistent
|
|
10
|
+
7. Error handling is user-friendly
|
|
11
|
+
8. Responsive design works on all devices
|
|
12
|
+
9. Accessibility features are implemented
|
|
13
|
+
10. User feedback is collected and acted upon
|
|
@@ -0,0 +1,61 @@
|
|
|
1
|
+
# Nielsen's 10 Usability Heuristics
|
|
2
|
+
|
|
3
|
+
## Visibility of System Status
|
|
4
|
+
- [ ] Users always know what's happening
|
|
5
|
+
- [ ] System status is visible through appropriate feedback
|
|
6
|
+
- [ ] Loading states are clearly indicated
|
|
7
|
+
- [ ] Progress is shown for long operations
|
|
8
|
+
|
|
9
|
+
## Match Between System and Real World
|
|
10
|
+
- [ ] System speaks users' language
|
|
11
|
+
- [ ] Real-world conventions are followed
|
|
12
|
+
- [ ] Information appears in natural order
|
|
13
|
+
- [ ] Platform-specific standards are met
|
|
14
|
+
|
|
15
|
+
## User Control and Freedom
|
|
16
|
+
- [ ] Users can undo actions
|
|
17
|
+
- [ ] Emergency exits are clearly marked
|
|
18
|
+
- [ ] Users control their data
|
|
19
|
+
- [ ] Actions are reversible
|
|
20
|
+
- [ ] System doesn't force unwanted actions
|
|
21
|
+
|
|
22
|
+
## Consistency and Standards
|
|
23
|
+
- [ ] Platform conventions are followed
|
|
24
|
+
- [ ] Same words mean same things
|
|
25
|
+
- [ ] Consistent navigation throughout
|
|
26
|
+
- [ ] Standards are applied uniformly
|
|
27
|
+
|
|
28
|
+
## Error Prevention
|
|
29
|
+
- [ ] Simple error prevention is in place
|
|
30
|
+
- [ ] Confirmation before destructive actions
|
|
31
|
+
- [ ] Constrainable inputs prevent errors
|
|
32
|
+
- [ ] Helpful error messages are shown
|
|
33
|
+
|
|
34
|
+
## Recognition Rather Than Recall
|
|
35
|
+
- [ ] Objects are visible rather than remembered
|
|
36
|
+
- - Options are visible rather than recalled
|
|
37
|
+
- - Instructions are available when needed
|
|
38
|
+
|
|
39
|
+
## Flexibility and Efficiency of Use
|
|
40
|
+
- [ ] Shortcuts are available for experts
|
|
41
|
+
- [ ] Customization options exist
|
|
42
|
+
- - Default settings work for beginners
|
|
43
|
+
- - System adapts to user preferences
|
|
44
|
+
|
|
45
|
+
## Aesthetic and Minimalist Design
|
|
46
|
+
- [ ] Relevant information is presented
|
|
47
|
+
- - Non-relevant information is hidden
|
|
48
|
+
- - Clean, uncluttered interface
|
|
49
|
+
- - Visual hierarchy guides attention
|
|
50
|
+
|
|
51
|
+
## Help Users Recognize, Diagnose, and Recover from Errors
|
|
52
|
+
- [ ] Plain language error messages
|
|
53
|
+
- [ ] Clear instructions for recovery
|
|
54
|
+
- - Error states suggest solutions
|
|
55
|
+
- - Prevents errors from happening again
|
|
56
|
+
|
|
57
|
+
## Help and Documentation
|
|
58
|
+
- [ ] Help is easily searchable
|
|
59
|
+
- [ ] Context-sensitive help is available
|
|
60
|
+
- - Examples are provided
|
|
61
|
+
- - Documentation is task-oriented
|