safeword 0.1.0 → 0.2.1
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/dist/{check-J6DFVBCE.js → check-US6EQLNS.js} +3 -3
- package/dist/check-US6EQLNS.js.map +1 -0
- package/dist/chunk-2XWIUEQK.js +190 -0
- package/dist/chunk-2XWIUEQK.js.map +1 -0
- package/dist/{chunk-UQMQ64CB.js → chunk-GZRQL3SX.js} +41 -2
- package/dist/chunk-GZRQL3SX.js.map +1 -0
- package/dist/chunk-Z2SOGTNJ.js +7 -0
- package/dist/{chunk-WWQ4YRZN.js.map → chunk-Z2SOGTNJ.js.map} +1 -1
- package/dist/cli.js +6 -6
- package/dist/{diff-U4IELWRL.js → diff-72ZUEZ6A.js} +32 -29
- package/dist/diff-72ZUEZ6A.js.map +1 -0
- package/dist/index.d.ts +1 -1
- package/dist/index.js +1 -1
- package/dist/{reset-XETOHTCK.js → reset-3ACTIYYE.js} +44 -27
- package/dist/reset-3ACTIYYE.js.map +1 -0
- package/dist/{setup-CLDCHROZ.js → setup-TSFCHD2D.js} +77 -48
- package/dist/setup-TSFCHD2D.js.map +1 -0
- package/dist/{upgrade-DOKWRK7J.js → upgrade-XDPQFSMC.js} +38 -50
- package/dist/upgrade-XDPQFSMC.js.map +1 -0
- package/package.json +1 -1
- package/templates/SAFEWORD.md +776 -0
- package/templates/commands/arch-review.md +24 -0
- package/templates/commands/lint.md +11 -0
- package/templates/commands/quality-review.md +23 -0
- package/templates/doc-templates/architecture-template.md +136 -0
- package/templates/doc-templates/design-doc-template.md +134 -0
- package/templates/doc-templates/test-definitions-feature.md +131 -0
- package/templates/doc-templates/user-stories-template.md +92 -0
- package/templates/guides/architecture-guide.md +423 -0
- package/templates/guides/code-philosophy.md +195 -0
- package/templates/guides/context-files-guide.md +457 -0
- package/templates/guides/data-architecture-guide.md +200 -0
- package/templates/guides/design-doc-guide.md +171 -0
- package/templates/guides/learning-extraction.md +552 -0
- package/templates/guides/llm-instruction-design.md +248 -0
- package/templates/guides/llm-prompting.md +102 -0
- package/templates/guides/tdd-best-practices.md +615 -0
- package/templates/guides/test-definitions-guide.md +334 -0
- package/templates/guides/testing-methodology.md +618 -0
- package/templates/guides/user-story-guide.md +256 -0
- package/templates/guides/zombie-process-cleanup.md +219 -0
- package/templates/hooks/agents-md-check.sh +27 -0
- package/templates/hooks/inject-timestamp.sh +2 -3
- package/templates/hooks/post-tool.sh +4 -0
- package/templates/hooks/pre-commit.sh +10 -0
- package/templates/lib/common.sh +26 -0
- package/templates/lib/jq-fallback.sh +20 -0
- package/templates/markdownlint.jsonc +25 -0
- package/templates/prompts/arch-review.md +43 -0
- package/templates/prompts/quality-review.md +10 -0
- package/templates/skills/safeword-quality-reviewer/SKILL.md +207 -0
- package/dist/check-J6DFVBCE.js.map +0 -1
- package/dist/chunk-24OB57NJ.js +0 -78
- package/dist/chunk-24OB57NJ.js.map +0 -1
- package/dist/chunk-DB4CMUFD.js +0 -157
- package/dist/chunk-DB4CMUFD.js.map +0 -1
- package/dist/chunk-UQMQ64CB.js.map +0 -1
- package/dist/chunk-WWQ4YRZN.js +0 -7
- package/dist/diff-U4IELWRL.js.map +0 -1
- package/dist/reset-XETOHTCK.js.map +0 -1
- package/dist/setup-CLDCHROZ.js.map +0 -1
- package/dist/upgrade-DOKWRK7J.js.map +0 -1
@@ -0,0 +1,248 @@
# Writing Instructions for LLMs

**Context:** When creating documentation that LLMs will read and follow (like AGENTS.md, CLAUDE.md, testing guides, coding standards), different best practices apply than when prompting an LLM directly.

## Core Principles

**1. MECE Principle (Mutually Exclusive, Collectively Exhaustive)**

Decision trees and categorization must have no overlap and must cover all cases. LLMs struggle with overlapping categories; the McKinsey/BCG MECE framework ensures a clear decision path.

```markdown
❌ BAD - Not mutually exclusive:
├─ Pure function?
├─ Multiple components interacting?
├─ Full user flow?

Problem: A function with database calls could match more than one branch.

✅ GOOD - Sequential, mutually exclusive:

1. AI content quality? → LLM Eval
2. Requires real browser? → E2E test
3. Multiple components? → Integration test
4. Pure function? → Unit test

Stops at first match, no ambiguity.
```
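The ordered checklist above can be sketched as a sequential classifier that stops at the first match. This is a minimal sketch of ours; the predicate names are illustrative, not from the guide:

```typescript
// Sketch of the sequential, MECE decision path above.
// Predicates are checked IN ORDER; the first match wins.
interface TestTarget {
  evaluatesAiContent: boolean;
  requiresRealBrowser: boolean;
  spansMultipleComponents: boolean;
  isPureFunction: boolean;
}

type TestType = "llm-eval" | "e2e" | "integration" | "unit";

function chooseTestType(t: TestTarget): TestType {
  if (t.evaluatesAiContent) return "llm-eval"; // 1. AI content quality?
  if (t.requiresRealBrowser) return "e2e"; // 2. Requires real browser?
  if (t.spansMultipleComponents) return "integration"; // 3. Multiple components?
  if (t.isPureFunction) return "unit"; // 4. Pure function?
  // Dead end: decompose the behavior instead of guessing a category.
  throw new Error("No category matched - break the behavior into smaller pieces");
}
```

Because the checks are ordered, a function with database calls never reaches the "pure function" branch: it classifies as integration, with no ambiguity.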

**2. Explicit Over Implicit**

Never assume LLMs know what you mean. Define all terms, even "obvious" ones.

```markdown
❌ BAD: "Test at the lowest level"
✅ GOOD: "Test with the fastest test type that can catch the bug"

Examples needing definition:

- "Critical paths" → Always critical: auth, payment. Rarely: UI polish, admin
- "Browser" → Real browser (Playwright/Cypress), not jsdom
- "Pure function" → Input → output, no I/O (define edge cases like Date.now())
```

**3. No Contradictions**

Different sections must align. LLMs don't reconcile conflicting guidance. When updating, grep for related terms and update all references.

```markdown
❌ BAD:
Section A: "Write E2E tests only for critical user paths"
Section B: "All user-facing features have at least one E2E test"

✅ GOOD:
Section A: "Write E2E tests only for critical user paths"
Section B: "All critical multi-page user flows have at least one E2E test"

- Plus a definition of "critical" with examples
```

**4. Concrete Examples Over Abstract Rules**

Show, don't just tell. LLMs learn patterns from examples. For every rule, include 2-3 concrete examples showing good vs bad.

```markdown
❌ BAD: "Follow best practices for testing"

✅ GOOD:
// ❌ BAD - Testing business logic with E2E
test('discount calculation', async ({ page }) => {
  await page.goto('/checkout')
  await page.fill('[name="price"]', '100')
  await expect(page.locator('.total')).toContainText('80')
})

// ✅ GOOD - Unit test (runs in milliseconds)
it('applies 20% discount', () => {
  expect(calculateDiscount(100, 0.20)).toBe(80)
})
```

**5. Edge Cases Must Be Explicit**

What seems obvious to humans often isn't to LLMs. After stating a rule, add an "Edge cases:" section with common confusing scenarios.

```markdown
❌ BAD: "Unit test pure functions"

✅ GOOD: "Unit test pure functions"

Edge cases:

- Non-deterministic functions (Math.random(), Date.now()) → Unit test with mocked randomness/time
- Environment dependencies (process.env) → Integration test
- Mixed pure + I/O → Extract the pure part, unit test it separately
```
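The Date.now() edge case becomes concrete when the clock is injected, so the function stays pure and unit-testable. A minimal sketch (ours, not from the guide):

```typescript
// A time-dependent function becomes a pure function when the clock is
// injected instead of calling Date.now() inside the body.
function isExpired(expiresAt: number, now: () => number = Date.now): boolean {
  return now() > expiresAt;
}

// Unit test with a mocked, fixed clock - no real time dependency.
const fixedNow = () => 1_000_000;
console.assert(isExpired(999_999, fixedNow) === true);
console.assert(isExpired(1_000_001, fixedNow) === false);
```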

**6. Actionable Over Vague**

Give LLMs concrete actions, not subjective guidance. Replace subjective terms (most/some/few) with optimization rules plus red flags.

```markdown
❌ BAD: "Most tests: Fast, Some tests: Slow"

✅ GOOD:

- Write as many fast tests as possible
- Write E2E tests only for critical paths requiring a browser
- Red flag: If you have more E2E tests than integration tests, the suite is too slow
```
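A red flag phrased this way can even be checked mechanically. A sketch of ours, assuming the test counts are available as input:

```typescript
// Flags a suite whose shape violates the "more E2E than integration" rule.
interface SuiteCounts {
  unit: number;
  integration: number;
  e2e: number;
}

function suiteRedFlags(c: SuiteCounts): string[] {
  const flags: string[] = [];
  if (c.e2e > c.integration) {
    flags.push("More E2E tests than integration tests - suite is likely too slow");
  }
  return flags;
}
```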

**7. Decision Trees: Sequential Over Parallel**

Structure decisions as ordered steps, not simultaneous checks. Sequential questions force the LLM through a deterministic decision path.

```markdown
❌ BAD - Parallel branches:
├─ Pure function?
├─ Multiple components?
└─ Full user flow?

✅ GOOD - Sequential (see the Principle 1 example above)
Answer questions IN ORDER. Stop at the first match.
```

**8. Tie-Breaking Rules**

When multiple options could apply, tell LLMs how to choose.

```markdown
✅ GOOD:
"If multiple test types can catch the bug, choose the fastest one."

Reference it in decision trees:
"If multiple seem to apply, use the tie-breaking rule stated above: choose the fastest one."
```

**9. Lookup Tables for Complex Decisions**

When decision logic has 3+ branches, nested conditions, or multiple variables to consider, provide a reference table.

```markdown
| Bug Type           | Unit? | Integration? | E2E? | Best Choice       |
| ------------------ | ----- | ------------ | ---- | ----------------- |
| Calculation error  | ✅    | ✅           | ✅   | Unit (fastest)    |
| Database query bug | ❌    | ✅           | ✅   | Integration       |
| CSS layout broken  | ❌    | ❌           | ✅   | E2E (only option) |
```
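The same table can also live next to the prose as data, so tooling and the LLM read one source of truth. A sketch of ours mirroring the rows above:

```typescript
// The lookup table above, expressed as data. "bestChoice" already encodes
// the tie-breaking rule: among the types that can catch the bug, the
// fastest wins.
type TestType = "unit" | "integration" | "e2e";

const bestChoice: Record<string, TestType> = {
  "calculation error": "unit", // all three work; unit is fastest
  "database query bug": "integration", // unit cannot catch it
  "css layout broken": "e2e", // only option
};
```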

**10. Avoid Caveats in Tables**

Keep patterns clean. Parentheticals break LLM pattern matching. Add separate rows for caveat cases.

```markdown
❌ BAD:  | State management bug | ❌ NO (if mocked) | ✅ YES |
✅ GOOD: | State management bug (Zustand, Redux) | ❌ NO | ✅ YES |
```

**11. Percentages: Context or None**

Don't use percentages without adjustment guidance.

```markdown
❌ BAD: "70% unit tests, 20% integration, 10% E2E"

✅ BETTER: "Baseline: 70/20/10. Adjust: Microservices → 60/30/10, UI-heavy → 60/20/20"

✅ BEST: "Write as many fast tests as possible. Red flag: More E2E than integration = too slow."
```

**12. Specificity in Questions**

Use precise technical terms, not general descriptions.

```markdown
❌ BAD: "Does this require seeing the UI?"
✅ GOOD: "Does this require a real browser (Playwright/Cypress)?"

Note: React Testing Library does NOT require a browser - that's integration testing.
```

**13. Re-evaluation Paths**

When LLMs hit dead ends, provide concrete next steps.

```markdown
❌ BAD: "If none of the above apply, re-evaluate your approach"

✅ GOOD: "If testing behavior that doesn't fit the categories:

1. Break it down: Separate pure logic from I/O/UI concerns
2. Test each piece: Pure → Unit, I/O → Integration, Multi-page → E2E
3. Example: Login validation
   - isValidEmail(email) → Unit test
   - checkUserExists(email) → Integration test (database)
   - Login form → Dashboard → E2E test (multi-page)"
```
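The pure piece of that login example decomposes into a plain unit-testable function. A minimal sketch; the regex is illustrative, not a full email parser:

```typescript
// Step 1 of the re-evaluation path: the pure logic, extracted on its own.
// No I/O, so it gets a fast unit test.
function isValidEmail(email: string): boolean {
  // Deliberately simple check for illustration, not an RFC 5322 parser.
  return /^[^\s@]+@[^\s@]+\.[^\s@]+$/.test(email);
}
```

checkUserExists(email) would stay separate and get an integration test, because it touches the database.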

## Anti-Patterns to Avoid

❌ **Visual metaphors** - Pyramids, icebergs: LLMs don't process visual metaphors well
❌ **Undefined jargon** - Terms like "technical debt" and "code smell" need definitions
❌ **Competing guidance** - Multiple decision frameworks that contradict each other
❌ **Outdated references** - Removing a concept but forgetting to update all mentions of it

## Quality Checklist

Before saving/committing LLM-consumable documentation:

- [ ] Decision trees follow the MECE principle (mutually exclusive, collectively exhaustive)
- [ ] Technical terms explicitly defined
- [ ] No contradictions between sections
- [ ] Every rule has 2-3 concrete examples (good vs bad)
- [ ] Edge cases explicitly covered
- [ ] Vague terms replaced with actionable principles
- [ ] Tie-breaking rules provided
- [ ] Complex decisions (3+ branches) have lookup tables
- [ ] Dead-end paths have re-evaluation steps with examples

## Research-Backed Principles

- **MECE (McKinsey):** Mutually exclusive, collectively exhaustive decision trees for reliable LLM decisions
- **Prompt ambiguity (2025):** "Ambiguity is one of the most common causes of poor LLM output" (Zero-Shot Decision Tree Construction)
- **Concrete examples (2025):** Structured approaches with concrete examples consistently improve performance over "act as" or "###" techniques

## Example: Before and After

**Before (ambiguous):**

```markdown
Follow the test pyramid: lots of unit tests, some integration tests, few E2E tests.
```

**After (LLM-optimized):**

```markdown
Answer these questions IN ORDER to choose the test type:

1. Pure function (input → output, no I/O)? → Unit test
2. Multiple components/services interacting? → Integration test
3. Requires real browser (Playwright)? → E2E test

If multiple apply: choose the fastest one.

Edge cases:

- React components with React Testing Library → Integration (not E2E, no real browser)
- Non-deterministic functions (Date.now()) → Unit test with mocked time
```
@@ -0,0 +1,102 @@
# LLM Prompting Best Practices

This guide covers two related topics:

**Part 1: Prompting LLMs** - How to structure prompts when actively using an LLM (API calls, chat interactions)

**Part 2: Writing Instructions for LLMs** - How to write documentation that LLMs will read and follow (SAFEWORD.md, CLAUDE.md, testing guides, coding standards)

---

## Part 1: Prompting LLMs

### Prompt Engineering Principles

**Concrete Examples Over Abstract Rules:**

- ✅ Good: Show "❌ BAD" vs "✅ GOOD" code examples
- ❌ Bad: "Follow best practices" (too vague)

**"Why" Over "What":**

- Explain architectural trade-offs and reasoning
- Include specific numbers (90% cost reduction, 3x faster)
- Document gotchas with explanations

**Structured Outputs:**

- Use JSON mode for predictable LLM responses
- Define explicit schemas with validation
- Return structured data, not prose

```typescript
// ❌ BAD - Prose output
"The user wants to create a campaign named 'Shadows' with 4 players"

// ✅ GOOD - Structured JSON
{ "intent": "create_campaign", "name": "Shadows", "playerCount": 4 }
```
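Enforcing that structure at the boundary might look like the sketch below. The field names mirror the JSON above; the validation function itself is an assumption of ours, not part of any particular library:

```typescript
// Validates a raw LLM response against the expected intent schema
// before any downstream code trusts it.
interface CreateCampaignIntent {
  intent: "create_campaign";
  name: string;
  playerCount: number;
}

function parseCampaignIntent(raw: string): CreateCampaignIntent {
  const data = JSON.parse(raw);
  if (
    data?.intent !== "create_campaign" ||
    typeof data.name !== "string" ||
    typeof data.playerCount !== "number"
  ) {
    throw new Error("LLM response did not match create_campaign schema");
  }
  return data as CreateCampaignIntent;
}
```

Prose output fails this parse loudly instead of being misread downstream; a schema library (e.g. Zod) could replace the hand-written checks.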

### Cost Optimization

**Prompt Caching (Critical for AI Agents):**

- Static rules → System prompt with cache_control: ephemeral (caches for ~5 min, auto-expires)
- Dynamic data (character state, user input) → User message (no caching)
- Example: A 468-line prompt costs $0.10 without caching, $0.01 with it (90% reduction)
- Cache invalidation: ANY change to cached blocks breaks ALL caches
- Rule: Change system prompts sparingly; accept the one-time cache rebuild cost

**Message Architecture:**

```typescript
// ✅ GOOD - Cacheable system prompt
systemPrompt: [
  { text: STATIC_RULES, cache_control: { type: 'ephemeral' } },
  { text: STATIC_EXAMPLES, cache_control: { type: 'ephemeral' } },
];
userMessage: `Character: ${dynamicState}\nAction: ${userInput}`;

// ❌ BAD - Uncacheable (character state in system prompt)
systemPrompt: `Rules + Character: ${dynamicState}`;
```
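The static/dynamic split can be enforced by a small builder that only ever puts static text into the cached blocks. The block shapes follow the cache_control convention above; the builder itself is a sketch of ours:

```typescript
// Builds a request payload that keeps only static text in the (cacheable)
// system blocks and all dynamic state in the user message.
interface SystemBlock {
  text: string;
  cache_control: { type: "ephemeral" };
}

function buildRequest(staticBlocks: string[], dynamicState: string, userInput: string) {
  return {
    system: staticBlocks.map(
      (text): SystemBlock => ({ text, cache_control: { type: "ephemeral" } })
    ),
    userMessage: `Character: ${dynamicState}\nAction: ${userInput}`,
  };
}
```

Because dynamic state never enters the system blocks, repeated calls reuse the cache regardless of what the character or user is doing.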

### Testing AI Outputs

**LLM-as-Judge Pattern:**

- Use an LLM to evaluate nuanced qualities (narrative tone, reasoning quality)
- Avoid brittle keyword matching for creative outputs
- Define rubrics: EXCELLENT / ACCEPTABLE / POOR, each with criteria
- Example: Ask "Does the GM's response show a collaborative tone?" instead of checking for specific words

**Evaluation Framework:**

- Unit tests: Pure functions (parsing, validation)
- Integration tests: Agent + real LLM calls (schema compliance)
- LLM Evals: Judgment quality (position/effect reasoning, atmosphere)
- Cost awareness: 30 scenarios ≈ $0.15-0.30 per run with caching
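Even a judge's verdict needs strict parsing before it drives pass/fail. A minimal sketch of ours, using the rubric labels above:

```typescript
// Extracts the rubric grade from a judge's free-text verdict.
// Only the three rubric labels are accepted; anything else is an error.
type Grade = "EXCELLENT" | "ACCEPTABLE" | "POOR";

function parseGrade(judgeOutput: string): Grade {
  const match = judgeOutput.match(/\b(EXCELLENT|ACCEPTABLE|POOR)\b/);
  if (!match) throw new Error("Judge output contained no rubric grade");
  return match[1] as Grade;
}
```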

---

## Part 2: Writing Instructions for LLMs

**Comprehensive framework:** See @.safeword/guides/llm-instruction-design.md

**Quick summary:** When creating documentation that LLMs will read and follow (SAFEWORD.md, CLAUDE.md, testing guides, coding standards), apply 13 core principles:

1. **MECE Principle** - Decision trees must be mutually exclusive and collectively exhaustive
2. **Explicit Over Implicit** - Define all terms; never assume LLMs know what you mean
3. **No Contradictions** - Different sections must align; LLMs don't reconcile conflicts
4. **Concrete Examples Over Abstract Rules** - Show, don't just tell (2-3 examples per rule)
5. **Edge Cases Must Be Explicit** - What seems obvious to humans often isn't to LLMs
6. **Actionable Over Vague** - Replace subjective terms with optimization rules plus red flags
7. **Decision Trees: Sequential Over Parallel** - Ordered steps that stop at the first match
8. **Tie-Breaking Rules** - Tell LLMs how to choose when multiple options apply
9. **Lookup Tables for Complex Decisions** - Provide reference tables for complex logic
10. **Avoid Caveats in Tables** - Keep patterns clean; parentheticals break LLM pattern matching
11. **Percentages: Context or None** - Include adjustment guidance or use principles instead
12. **Specificity in Questions** - Use precise technical terms, not general descriptions
13. **Re-evaluation Paths** - Provide concrete next steps when LLMs hit dead ends

**Also includes:** Anti-patterns to avoid, a quality checklist, research-backed principles, and before/after examples.