safeword 0.2.3 → 0.2.4

Files changed (235)
  1. package/.claude/commands/arch-review.md +32 -0
  2. package/.claude/commands/lint.md +6 -0
  3. package/.claude/commands/quality-review.md +13 -0
  4. package/.claude/commands/setup-linting.md +6 -0
  5. package/.claude/hooks/auto-lint.sh +6 -0
  6. package/.claude/hooks/auto-quality-review.sh +170 -0
  7. package/.claude/hooks/check-linting-sync.sh +17 -0
  8. package/.claude/hooks/inject-timestamp.sh +6 -0
  9. package/.claude/hooks/question-protocol.sh +12 -0
  10. package/.claude/hooks/run-linters.sh +8 -0
  11. package/.claude/hooks/run-quality-review.sh +76 -0
  12. package/.claude/hooks/version-check.sh +10 -0
  13. package/.claude/mcp/README.md +96 -0
  14. package/.claude/mcp/arcade.sample.json +9 -0
  15. package/.claude/mcp/context7.sample.json +7 -0
  16. package/.claude/mcp/playwright.sample.json +7 -0
  17. package/.claude/settings.json +62 -0
  18. package/.claude/skills/quality-reviewer/SKILL.md +190 -0
  19. package/.claude/skills/safeword-quality-reviewer/SKILL.md +13 -0
  20. package/.env.arcade.example +4 -0
  21. package/.env.example +11 -0
  22. package/.gitmodules +4 -0
  23. package/.safeword/SAFEWORD.md +33 -0
  24. package/.safeword/eslint/eslint-base.mjs +101 -0
  25. package/.safeword/guides/architecture-guide.md +404 -0
  26. package/.safeword/guides/code-philosophy.md +174 -0
  27. package/.safeword/guides/context-files-guide.md +405 -0
  28. package/.safeword/guides/data-architecture-guide.md +183 -0
  29. package/.safeword/guides/design-doc-guide.md +165 -0
  30. package/.safeword/guides/learning-extraction.md +515 -0
  31. package/.safeword/guides/llm-instruction-design.md +239 -0
  32. package/.safeword/guides/llm-prompting.md +95 -0
  33. package/.safeword/guides/tdd-best-practices.md +570 -0
  34. package/.safeword/guides/test-definitions-guide.md +243 -0
  35. package/.safeword/guides/testing-methodology.md +573 -0
  36. package/.safeword/guides/user-story-guide.md +237 -0
  37. package/.safeword/guides/zombie-process-cleanup.md +214 -0
  38. package/{templates → .safeword}/hooks/agents-md-check.sh +0 -0
  39. package/{templates → .safeword}/hooks/post-tool.sh +0 -0
  40. package/{templates → .safeword}/hooks/pre-commit.sh +0 -0
  41. package/.safeword/planning/002-user-story-quality-evaluation.md +1840 -0
  42. package/.safeword/planning/003-langsmith-eval-setup-prompt.md +363 -0
  43. package/.safeword/planning/004-llm-eval-test-cases.md +3226 -0
  44. package/.safeword/planning/005-architecture-enforcement-system.md +169 -0
  45. package/.safeword/planning/006-reactive-fix-prevention-research.md +135 -0
  46. package/.safeword/planning/011-cli-ux-vision.md +330 -0
  47. package/.safeword/planning/012-project-structure-cleanup.md +154 -0
  48. package/.safeword/planning/README.md +39 -0
  49. package/.safeword/planning/automation-plan-v2.md +1225 -0
  50. package/.safeword/planning/automation-plan-v3.md +1291 -0
  51. package/.safeword/planning/automation-plan.md +3058 -0
  52. package/.safeword/planning/design/005-cli-implementation.md +343 -0
  53. package/.safeword/planning/design/013-cli-self-contained-templates.md +596 -0
  54. package/.safeword/planning/design/013a-eslint-plugin-suite.md +256 -0
  55. package/.safeword/planning/design/013b-implementation-snippets.md +385 -0
  56. package/.safeword/planning/design/013c-config-isolation-strategy.md +242 -0
  57. package/.safeword/planning/design/code-philosophy-improvements.md +60 -0
  58. package/.safeword/planning/mcp-analysis.md +545 -0
  59. package/.safeword/planning/phase2-subagents-vs-skills-analysis.md +451 -0
  60. package/.safeword/planning/settings-improvements.md +970 -0
  61. package/.safeword/planning/test-definitions/005-cli-implementation.md +1301 -0
  62. package/.safeword/planning/test-definitions/cli-self-contained-templates.md +205 -0
  63. package/.safeword/planning/user-stories/001-guides-review-user-stories.md +1381 -0
  64. package/.safeword/planning/user-stories/003-reactive-fix-prevention.md +132 -0
  65. package/.safeword/planning/user-stories/004-technical-constraints.md +86 -0
  66. package/.safeword/planning/user-stories/005-cli-implementation.md +311 -0
  67. package/.safeword/planning/user-stories/cli-self-contained-templates.md +172 -0
  68. package/.safeword/planning/versioned-distribution.md +740 -0
  69. package/.safeword/prompts/arch-review.md +43 -0
  70. package/.safeword/prompts/quality-review.md +11 -0
  71. package/.safeword/scripts/arch-review.sh +235 -0
  72. package/.safeword/scripts/check-linting-sync.sh +58 -0
  73. package/.safeword/scripts/setup-linting.sh +559 -0
  74. package/.safeword/templates/architecture-template.md +136 -0
  75. package/.safeword/templates/ci/architecture-check.yml +79 -0
  76. package/.safeword/templates/design-doc-template.md +127 -0
  77. package/.safeword/templates/test-definitions-feature.md +100 -0
  78. package/.safeword/templates/ticket-template.md +74 -0
  79. package/.safeword/templates/user-stories-template.md +82 -0
  80. package/.safeword/tickets/001-guides-review-user-stories.md +83 -0
  81. package/.safeword/tickets/002-architecture-enforcement.md +211 -0
  82. package/.safeword/tickets/003-reactive-fix-prevention.md +57 -0
  83. package/.safeword/tickets/004-technical-constraints-in-user-stories.md +39 -0
  84. package/.safeword/tickets/005-cli-implementation.md +248 -0
  85. package/.safeword/tickets/006-flesh-out-skills.md +43 -0
  86. package/.safeword/tickets/007-flesh-out-questioning.md +44 -0
  87. package/.safeword/tickets/008-upgrade-questioning.md +58 -0
  88. package/.safeword/tickets/009-naming-conventions.md +41 -0
  89. package/.safeword/tickets/010-safeword-md-cleanup.md +34 -0
  90. package/.safeword/tickets/011-cursor-setup.md +86 -0
  91. package/.safeword/tickets/README.md +73 -0
  92. package/.safeword/version +1 -0
  93. package/AGENTS.md +59 -0
  94. package/CLAUDE.md +12 -0
  95. package/README.md +347 -0
  96. package/docs/001-cli-implementation-plan.md +856 -0
  97. package/docs/elite-dx-implementation-plan.md +1034 -0
  98. package/framework/README.md +131 -0
  99. package/framework/mcp/README.md +96 -0
  100. package/framework/mcp/arcade.sample.json +8 -0
  101. package/framework/mcp/context7.sample.json +6 -0
  102. package/framework/mcp/playwright.sample.json +6 -0
  103. package/framework/scripts/arch-review.sh +235 -0
  104. package/framework/scripts/check-linting-sync.sh +58 -0
  105. package/framework/scripts/load-env.sh +49 -0
  106. package/framework/scripts/setup-claude.sh +223 -0
  107. package/framework/scripts/setup-linting.sh +559 -0
  108. package/framework/scripts/setup-quality.sh +477 -0
  109. package/framework/scripts/setup-safeword.sh +550 -0
  110. package/framework/templates/ci/architecture-check.yml +78 -0
  111. package/learnings/ai-sdk-v5-breaking-changes.md +178 -0
  112. package/learnings/e2e-test-zombie-processes.md +231 -0
  113. package/learnings/milkdown-crepe-editor-property.md +96 -0
  114. package/learnings/prosemirror-fragment-traversal.md +119 -0
  115. package/package.json +19 -43
  116. package/packages/cli/AGENTS.md +1 -0
  117. package/packages/cli/ARCHITECTURE.md +279 -0
  118. package/packages/cli/package.json +51 -0
  119. package/packages/cli/src/cli.ts +63 -0
  120. package/packages/cli/src/commands/check.ts +166 -0
  121. package/packages/cli/src/commands/diff.ts +209 -0
  122. package/packages/cli/src/commands/reset.ts +190 -0
  123. package/packages/cli/src/commands/setup.ts +325 -0
  124. package/packages/cli/src/commands/upgrade.ts +163 -0
  125. package/packages/cli/src/index.ts +3 -0
  126. package/packages/cli/src/templates/config.ts +58 -0
  127. package/packages/cli/src/templates/content.ts +18 -0
  128. package/packages/cli/src/templates/index.ts +12 -0
  129. package/packages/cli/src/utils/agents-md.ts +66 -0
  130. package/packages/cli/src/utils/fs.ts +179 -0
  131. package/packages/cli/src/utils/git.ts +124 -0
  132. package/packages/cli/src/utils/hooks.ts +29 -0
  133. package/packages/cli/src/utils/output.ts +60 -0
  134. package/packages/cli/src/utils/project-detector.test.ts +185 -0
  135. package/packages/cli/src/utils/project-detector.ts +44 -0
  136. package/packages/cli/src/utils/version.ts +28 -0
  137. package/packages/cli/src/version.ts +6 -0
  138. package/packages/cli/templates/SAFEWORD.md +776 -0
  139. package/packages/cli/templates/doc-templates/architecture-template.md +136 -0
  140. package/packages/cli/templates/doc-templates/design-doc-template.md +134 -0
  141. package/packages/cli/templates/doc-templates/test-definitions-feature.md +131 -0
  142. package/packages/cli/templates/doc-templates/ticket-template.md +82 -0
  143. package/packages/cli/templates/doc-templates/user-stories-template.md +92 -0
  144. package/packages/cli/templates/guides/architecture-guide.md +423 -0
  145. package/packages/cli/templates/guides/code-philosophy.md +195 -0
  146. package/packages/cli/templates/guides/context-files-guide.md +457 -0
  147. package/packages/cli/templates/guides/data-architecture-guide.md +200 -0
  148. package/packages/cli/templates/guides/design-doc-guide.md +171 -0
  149. package/packages/cli/templates/guides/learning-extraction.md +552 -0
  150. package/packages/cli/templates/guides/llm-instruction-design.md +248 -0
  151. package/packages/cli/templates/guides/llm-prompting.md +102 -0
  152. package/packages/cli/templates/guides/tdd-best-practices.md +615 -0
  153. package/packages/cli/templates/guides/test-definitions-guide.md +334 -0
  154. package/packages/cli/templates/guides/testing-methodology.md +618 -0
  155. package/packages/cli/templates/guides/user-story-guide.md +256 -0
  156. package/packages/cli/templates/guides/zombie-process-cleanup.md +219 -0
  157. package/packages/cli/templates/hooks/agents-md-check.sh +27 -0
  158. package/packages/cli/templates/hooks/post-tool.sh +4 -0
  159. package/packages/cli/templates/hooks/pre-commit.sh +10 -0
  160. package/packages/cli/templates/prompts/arch-review.md +43 -0
  161. package/packages/cli/templates/prompts/quality-review.md +10 -0
  162. package/packages/cli/templates/skills/safeword-quality-reviewer/SKILL.md +207 -0
  163. package/packages/cli/tests/commands/check.test.ts +129 -0
  164. package/packages/cli/tests/commands/cli.test.ts +89 -0
  165. package/packages/cli/tests/commands/diff.test.ts +115 -0
  166. package/packages/cli/tests/commands/reset.test.ts +310 -0
  167. package/packages/cli/tests/commands/self-healing.test.ts +170 -0
  168. package/packages/cli/tests/commands/setup-blocking.test.ts +71 -0
  169. package/packages/cli/tests/commands/setup-core.test.ts +135 -0
  170. package/packages/cli/tests/commands/setup-git.test.ts +139 -0
  171. package/packages/cli/tests/commands/setup-hooks.test.ts +334 -0
  172. package/packages/cli/tests/commands/setup-linting.test.ts +189 -0
  173. package/packages/cli/tests/commands/setup-noninteractive.test.ts +80 -0
  174. package/packages/cli/tests/commands/setup-templates.test.ts +181 -0
  175. package/packages/cli/tests/commands/upgrade.test.ts +215 -0
  176. package/packages/cli/tests/helpers.ts +243 -0
  177. package/packages/cli/tests/npm-package.test.ts +83 -0
  178. package/packages/cli/tests/technical-constraints.test.ts +96 -0
  179. package/packages/cli/tsconfig.json +25 -0
  180. package/packages/cli/tsup.config.ts +11 -0
  181. package/packages/cli/vitest.config.ts +23 -0
  182. package/promptfoo.yaml +3270 -0
  183. package/dist/check-3NGQ4NR5.js +0 -129
  184. package/dist/check-3NGQ4NR5.js.map +0 -1
  185. package/dist/chunk-2XWIUEQK.js +0 -190
  186. package/dist/chunk-2XWIUEQK.js.map +0 -1
  187. package/dist/chunk-GZRQL3SX.js +0 -146
  188. package/dist/chunk-GZRQL3SX.js.map +0 -1
  189. package/dist/chunk-ORQHKDT2.js +0 -10
  190. package/dist/chunk-ORQHKDT2.js.map +0 -1
  191. package/dist/chunk-W66Z3C5H.js +0 -21
  192. package/dist/chunk-W66Z3C5H.js.map +0 -1
  193. package/dist/cli.d.ts +0 -1
  194. package/dist/cli.js +0 -34
  195. package/dist/cli.js.map +0 -1
  196. package/dist/diff-Y6QTAW4O.js +0 -166
  197. package/dist/diff-Y6QTAW4O.js.map +0 -1
  198. package/dist/index.d.ts +0 -11
  199. package/dist/index.js +0 -7
  200. package/dist/index.js.map +0 -1
  201. package/dist/reset-3ACTIYYE.js +0 -143
  202. package/dist/reset-3ACTIYYE.js.map +0 -1
  203. package/dist/setup-RR4M334C.js +0 -266
  204. package/dist/setup-RR4M334C.js.map +0 -1
  205. package/dist/upgrade-6AR3DHUV.js +0 -134
  206. package/dist/upgrade-6AR3DHUV.js.map +0 -1
  207. /package/{templates → framework}/SAFEWORD.md +0 -0
  208. /package/{templates → framework}/guides/architecture-guide.md +0 -0
  209. /package/{templates → framework}/guides/code-philosophy.md +0 -0
  210. /package/{templates → framework}/guides/context-files-guide.md +0 -0
  211. /package/{templates → framework}/guides/data-architecture-guide.md +0 -0
  212. /package/{templates → framework}/guides/design-doc-guide.md +0 -0
  213. /package/{templates → framework}/guides/learning-extraction.md +0 -0
  214. /package/{templates → framework}/guides/llm-instruction-design.md +0 -0
  215. /package/{templates → framework}/guides/llm-prompting.md +0 -0
  216. /package/{templates → framework}/guides/tdd-best-practices.md +0 -0
  217. /package/{templates → framework}/guides/test-definitions-guide.md +0 -0
  218. /package/{templates → framework}/guides/testing-methodology.md +0 -0
  219. /package/{templates → framework}/guides/user-story-guide.md +0 -0
  220. /package/{templates → framework}/guides/zombie-process-cleanup.md +0 -0
  221. /package/{templates → framework}/prompts/arch-review.md +0 -0
  222. /package/{templates → framework}/prompts/quality-review.md +0 -0
  223. /package/{templates/skills/safeword-quality-reviewer → framework/skills/quality-reviewer}/SKILL.md +0 -0
  224. /package/{templates/doc-templates → framework/templates}/architecture-template.md +0 -0
  225. /package/{templates/doc-templates → framework/templates}/design-doc-template.md +0 -0
  226. /package/{templates/doc-templates → framework/templates}/test-definitions-feature.md +0 -0
  227. /package/{templates/doc-templates → framework/templates}/ticket-template.md +0 -0
  228. /package/{templates/doc-templates → framework/templates}/user-stories-template.md +0 -0
  229. /package/{templates → packages/cli/templates}/commands/arch-review.md +0 -0
  230. /package/{templates → packages/cli/templates}/commands/lint.md +0 -0
  231. /package/{templates → packages/cli/templates}/commands/quality-review.md +0 -0
  232. /package/{templates → packages/cli/templates}/hooks/inject-timestamp.sh +0 -0
  233. /package/{templates → packages/cli/templates}/lib/common.sh +0 -0
  234. /package/{templates → packages/cli/templates}/lib/jq-fallback.sh +0 -0
  235. /package/{templates → packages/cli/templates}/markdownlint.jsonc +0 -0
@@ -0,0 +1,248 @@
1
+ # Writing Instructions for LLMs
2
+
3
+ **Context:** When creating documentation that LLMs will read and follow (like AGENTS.md, CLAUDE.md, testing guides, coding standards), different best practices apply than when prompting an LLM directly.
4
+
5
+ ## Core Principles
6
+
7
+ **1. MECE Principle (Mutually Exclusive, Collectively Exhaustive)**
8
+
9
+ Decision trees and categories must not overlap and must cover every case. LLMs tend to struggle with overlapping categories; the MECE framework, popularized by consulting firms such as McKinsey and BCG, keeps each decision path unambiguous.
10
+
11
+ ```markdown
12
+ ❌ BAD - Not mutually exclusive:
13
+ ├─ Pure function?
14
+ ├─ Multiple components interacting?
15
+ └─ Full user flow?
16
+
17
+ Problem: A function with database calls could match both "Pure function?" and "Multiple components interacting?"
18
+
19
+ ✅ GOOD - Sequential, mutually exclusive:
20
+
21
+ 1. AI content quality? → LLM Eval
22
+ 2. Requires real browser? → E2E test
23
+ 3. Multiple components? → Integration test
24
+ 4. Pure function? → Unit test
25
+
26
+ Stops at first match, no ambiguity.
27
+ ```
28
+
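The sequential list above can be sketched as code. This is a minimal illustration, not part of the package: the `Change` interface, its field names, and `chooseTestType` are hypothetical names introduced here to show "ordered rules, stop at first match".

```typescript
// Hypothetical description of the thing being tested; field names are illustrative.
interface Change {
  judgesAiContentQuality: boolean;
  needsRealBrowser: boolean;
  touchesMultipleComponents: boolean;
  isPureFunction: boolean;
}

type TestType = "LLM Eval" | "E2E test" | "Integration test" | "Unit test";

// Rules are kept in the same order as the numbered list: the first match wins.
const rules: ReadonlyArray<[(c: Change) => boolean, TestType]> = [
  [(c) => c.judgesAiContentQuality, "LLM Eval"],
  [(c) => c.needsRealBrowser, "E2E test"],
  [(c) => c.touchesMultipleComponents, "Integration test"],
  [(c) => c.isPureFunction, "Unit test"],
];

function chooseTestType(change: Change): TestType | undefined {
  for (const [matches, testType] of rules) {
    if (matches(change)) return testType; // stop at the first matching question
  }
  // No rule matched: for this input the list is not collectively exhaustive.
  return undefined;
}
```

Because evaluation stops at the first match, an input that satisfies several questions still gets exactly one answer, which is what makes the ordered list behave as mutually exclusive.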
29
+ **2. Explicit Over Implicit**
30
+
31
+ Never assume LLMs know what you mean. Define all terms, even "obvious" ones.
32
+
33
+ ```markdown
34
+ ❌ BAD: "Test at the lowest level"
35
+ ✅ GOOD: "Test with the fastest test type that can catch the bug"
36
+
37
+ Examples needing definition:
38
+
39
+ - "Critical paths" → Always critical: auth, payment. Rarely: UI polish, admin
40
+ - "Browser" → Real browser (Playwright/Cypress), not jsdom
41
+ - "Pure function" → Input → output, no I/O (define edge cases like Date.now())
42
+ ```
43
+
44
+ **3. No Contradictions**
45
+
46
+ Different sections must align. LLMs don't reconcile conflicting guidance. When updating, grep for related terms and update all references.
47
+
48
+ ```markdown
49
+ ❌ BAD:
50
+ Section A: "Write E2E tests only for critical user paths"
51
+ Section B: "All user-facing features have at least one E2E test"
52
+
53
+ ✅ GOOD:
54
+ Section A: "Write E2E tests only for critical user paths"
55
+ Section B: "All critical multi-page user flows have at least one E2E test"
56
+
57
+ Plus: a definition of "critical" with examples
58
+ ```
59
+
60
+ **4. Concrete Examples Over Abstract Rules**
61
+
62
+ Show, don't just tell. LLMs learn patterns from examples. For every rule, include 2-3 concrete examples showing good vs bad.
63
+
64
+ ```markdown
65
+ ❌ BAD: "Follow best practices for testing"
66
+
67
+ ✅ GOOD:
68
+ // ❌ BAD - Testing business logic with E2E
69
+ test('discount calculation', async ({ page }) => {
70
+ await page.goto('/checkout')
71
+ await page.fill('[name="price"]', '100')
72
+ await expect(page.locator('.total')).toContainText('80')
73
+ })
74
+
75
+ // ✅ GOOD - Unit test (runs in milliseconds)
76
+ it('applies 20% discount', () => {
77
+ expect(calculateDiscount(100, 0.20)).toBe(80)
78
+ })
79
+ ```
80
+
81
+ **5. Edge Cases Must Be Explicit**
82
+
83
+ What seems obvious to humans often isn't to LLMs. After stating a rule, add an "Edge cases:" section with common confusing scenarios.
84
+
85
+ ```markdown
86
+ ❌ BAD: "Unit test pure functions"
87
+
88
+ ✅ GOOD: "Unit test pure functions"
89
+
90
+ Edge cases:
91
+
92
+ - Non-deterministic functions (Math.random(), Date.now()) → Unit test with mocked randomness/time
93
+ - Environment dependencies (process.env) → Integration test
94
+ - Mixed pure + I/O → Extract pure part, unit test separately
95
+ ```
96
+
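The "mocked time" edge case can be illustrated as follows. This is a hedged sketch that injects the clock as a parameter rather than using a mocking library; `isExpired` and `fixedClock` are hypothetical names, not part of the package.

```typescript
// Hypothetical function: the clock is a parameter so a test can control time.
// In production code the default (Date.now) is used.
function isExpired(expiresAtMs: number, now: () => number = Date.now): boolean {
  return now() >= expiresAtMs;
}

// In a unit test, pass a fixed clock instead of the real one.
const fixedClock = (): number => 1000;
const expired = isExpired(999, fixedClock); // true: 1000 >= 999
const notExpired = isExpired(1001, fixedClock); // false: 1000 < 1001
```

With the time source fixed, the function is deterministic for the test, so it can stay in the unit-test category.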
97
+ **6. Actionable Over Vague**
98
+
99
+ Give LLMs concrete actions, not subjective guidance. Replace subjective terms (most/some/few) with optimization rules + red flags.
100
+
101
+ ```markdown
102
+ ❌ BAD: "Most tests: Fast, Some tests: Slow"
103
+
104
+ ✅ GOOD:
105
+
106
+ - Write as many fast tests as possible
107
+ - Write E2E tests only for critical paths requiring a browser
108
+ - Red flag: If you have more E2E tests than integration tests, the suite is too slow
109
+ ```
110
+
111
+ **7. Decision Trees: Sequential Over Parallel**
112
+
113
+ Structure decisions as ordered steps, not simultaneous checks. Sequential questions force the LLM through a deterministic decision path.
114
+
115
+ ```markdown
116
+ ❌ BAD - Parallel branches:
117
+ ├─ Pure function?
118
+ ├─ Multiple components?
119
+ └─ Full user flow?
120
+
121
+ ✅ GOOD - Sequential (see Principle 1 example above)
122
+ Answer questions IN ORDER. Stop at the first match.
123
+ ```
124
+
125
+ **8. Tie-Breaking Rules**
126
+
127
+ When multiple options could apply, tell LLMs how to choose.
128
+
129
+ ```markdown
130
+ ✅ GOOD:
131
+ "If multiple test types can catch the bug, choose the fastest one."
132
+
133
+ Reference in decision trees:
134
+ "If multiple seem to apply, use the tie-breaking rule stated above: choose the faster one."
135
+ ```
136
+
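A tie-breaking rule like the one above is easy to state as code. The sketch below assumes a fixed speed ranking (unit faster than integration, integration faster than E2E); `speedRank` and `pickFastest` are hypothetical names used only for illustration.

```typescript
type TestType = "unit" | "integration" | "e2e";

// Assumed relative speed ranking (lower is faster); the numbers are illustrative.
const speedRank: Record<TestType, number> = { unit: 0, integration: 1, e2e: 2 };

// Given every test type that could catch the bug, return the fastest one.
function pickFastest(applicable: TestType[]): TestType | undefined {
  if (applicable.length === 0) return undefined; // nothing applies: no tie to break
  return applicable.reduce((best, t) => (speedRank[t] < speedRank[best] ? t : best));
}
```

For example, `pickFastest(["e2e", "integration"])` returns `"integration"`.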
137
+ **9. Lookup Tables for Complex Decisions**
138
+
139
+ When decision logic has 3+ branches, nested conditions, or multiple variables to consider, provide a reference table.
140
+
141
+ ```markdown
142
+ | Bug Type | Unit? | Integration? | E2E? | Best Choice |
143
+ | ------------------ | ----- | ------------ | ---- | ----------------- |
144
+ | Calculation error | ✅ | ✅ | ✅ | Unit (fastest) |
145
+ | Database query bug | ❌ | ✅ | ✅ | Integration |
146
+ | CSS layout broken | ❌ | ❌ | ✅ | E2E (only option) |
147
+ ```
148
+
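The same lookup table can be expressed as data, with the "Best Choice" column derived rather than hand-written. This is a sketch under the assumption that "best" means "fastest type that can catch the bug"; `canCatch`, `fastestFirst`, and `bestChoice` are hypothetical names.

```typescript
type TestType = "Unit" | "Integration" | "E2E";

// Which test types can catch each bug type, copied from the table above.
const canCatch = new Map<string, TestType[]>([
  ["Calculation error", ["Unit", "Integration", "E2E"]],
  ["Database query bug", ["Integration", "E2E"]],
  ["CSS layout broken", ["E2E"]],
]);

// Ordered fastest-first so the first hit is the preferred choice.
const fastestFirst: TestType[] = ["Unit", "Integration", "E2E"];

function bestChoice(bugType: string): TestType | undefined {
  const options = canCatch.get(bugType);
  if (options === undefined) return undefined; // unknown bug type: the table needs a new row
  for (const t of fastestFirst) {
    if (options.indexOf(t) !== -1) return t;
  }
  return undefined;
}
```

An unknown bug type returns `undefined` instead of a guess, which mirrors the guidance to add a row rather than a caveat.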
149
+ **10. Avoid Caveats in Tables**
150
+
151
+ Keep patterns clean. Parentheticals break LLM pattern matching. Add separate rows for caveat cases.
152
+
153
+ ```markdown
154
+ ❌ BAD: | State management bug | ❌ NO (if mocked) | ✅ YES |
155
+ ✅ GOOD: | State management bug (Zustand, Redux) | ❌ NO | ✅ YES |
156
+ ```
157
+
158
+ **11. Percentages: Context or None**
159
+
160
+ Don't use percentages without adjustment guidance.
161
+
162
+ ```markdown
163
+ ❌ BAD: "70% unit tests, 20% integration, 10% E2E"
164
+
165
+ ✅ BETTER: "Baseline: 70/20/10. Adjust: Microservices → 60/30/10, UI-heavy → 60/20/20"
166
+
167
+ ✅ BEST: "Write as many fast tests as possible. Red flag: More E2E than integration = too slow."
168
+ ```
169
+
170
+ **12. Specificity in Questions**
171
+
172
+ Use precise technical terms, not general descriptions.
173
+
174
+ ```markdown
175
+ ❌ BAD: "Does this require seeing the UI?"
176
+ ✅ GOOD: "Does this require a real browser (Playwright/Cypress)?"
177
+
178
+ Note: React Testing Library does NOT require a browser - that's integration testing.
179
+ ```
180
+
181
+ **13. Re-evaluation Paths**
182
+
183
+ When LLMs hit dead ends, provide concrete next steps.
184
+
185
+ ```markdown
186
+ ❌ BAD: "If none of the above apply, re-evaluate your approach"
187
+
188
+ ✅ GOOD: "If testing behavior that doesn't fit the categories:
189
+
190
+ 1. Break it down: Separate pure logic from I/O/UI concerns
191
+ 2. Test each piece: Pure → Unit, I/O → Integration, Multi-page → E2E
192
+ 3. Example: Login validation
193
+ - isValidEmail(email) → Unit test
194
+ - checkUserExists(email) → Integration test (database)
195
+ - Login form → Dashboard → E2E test (multi-page)"
196
+ ```
197
+
198
+ ## Anti-Patterns to Avoid
199
+
200
+ ❌ **Visual metaphors** - Pyramids, icebergs—LLMs don't process visual information well
201
+ ❌ **Undefined jargon** - "Technical debt", "code smell" need definitions
202
+ ❌ **Competing guidance** - Multiple decision frameworks that contradict each other
203
+ ❌ **Outdated references** - Removing a concept but forgetting to update every mention of it
204
+
205
+ ## Quality Checklist
206
+
207
+ Before saving/committing LLM-consumable documentation:
208
+
209
+ - [ ] Decision trees follow MECE principle (mutually exclusive, collectively exhaustive)
210
+ - [ ] Technical terms explicitly defined
211
+ - [ ] No contradictions between sections
212
+ - [ ] Every rule has 2-3 concrete examples (good vs bad)
213
+ - [ ] Edge cases explicitly covered
214
+ - [ ] Vague terms replaced with actionable principles
215
+ - [ ] Tie-breaking rules provided
216
+ - [ ] Complex decisions (3+ branches) have lookup tables
217
+ - [ ] Dead-end paths have re-evaluation steps with examples
218
+
219
+ ## Research-Backed Principles
220
+
221
+ - **MECE (McKinsey):** Mutually exclusive, collectively exhaustive decision trees for reliable LLM decisions
222
+ - **Prompt ambiguity (2025):** "Ambiguity is one of the most common causes of poor LLM output" (Zero-Shot Decision Tree Construction)
223
+ - **Concrete examples (2025):** Structured approaches with concrete examples consistently improve performance over "act as" or "###" techniques
224
+
225
+ ## Example: Before and After
226
+
227
+ **Before (ambiguous):**
228
+
229
+ ```markdown
230
+ Follow the test pyramid: lots of unit tests, some integration tests, few E2E tests.
231
+ ```
232
+
233
+ **After (LLM-optimized):**
234
+
235
+ ```markdown
236
+ Answer these questions IN ORDER to choose test type:
237
+
238
+ 1. Pure function (input → output, no I/O)? → Unit test
239
+ 2. Multiple components/services interacting? → Integration test
240
+ 3. Requires real browser (Playwright)? → E2E test
241
+
242
+ If multiple apply: choose the faster one.
243
+
244
+ Edge cases:
245
+
246
+ - React components with React Testing Library → Integration (not E2E, no real browser)
247
+ - Non-deterministic functions (Date.now()) → Unit test with mocked time
248
+ ```
@@ -0,0 +1,102 @@
1
+ # LLM Prompting Best Practices
2
+
3
+ This guide covers two related topics:
4
+
5
+ **Part 1: Prompting LLMs** - How to structure prompts when actively using an LLM (API calls, chat interactions)
6
+
7
+ **Part 2: Writing Instructions for LLMs** - How to write documentation that LLMs will read and follow (SAFEWORD.md, CLAUDE.md, testing guides, coding standards)
8
+
9
+ ---
10
+
11
+ ## Part 1: Prompting LLMs
12
+
13
+ ### Prompt Engineering Principles
14
+
15
+ **Concrete Examples Over Abstract Rules:**
16
+
17
+ - ✅ Good: Show "❌ BAD" vs "✅ GOOD" code examples
18
+ - ❌ Bad: "Follow best practices" (too vague)
19
+
20
+ **"Why" Over "What":**
21
+
22
+ - Explain architectural trade-offs and reasoning
23
+ - Include specific numbers (90% cost reduction, 3x faster)
24
+ - Document gotchas with explanations
25
+
26
+ **Structured Outputs:**
27
+
28
+ - Use JSON mode for predictable LLM responses
29
+ - Define explicit schemas with validation
30
+ - Return structured data, not prose
31
+
32
+ ```typescript
33
+ // ❌ BAD - Prose output
34
+ "The user wants to create a campaign named 'Shadows' with 4 players"
35
+
36
+ // ✅ GOOD - Structured JSON
37
+ { "intent": "create_campaign", "name": "Shadows", "playerCount": 4 }
38
+ ```
39
+
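"Define explicit schemas with validation" could look like the sketch below. It is a hand-written validator for the JSON shape shown above, using no schema library; `CreateCampaignIntent` and `parseIntent` are hypothetical names, and the integer check on `playerCount` is an assumption.

```typescript
// Hypothetical schema for the structured output shown above.
interface CreateCampaignIntent {
  intent: "create_campaign";
  name: string;
  playerCount: number;
}

// Parse model output and validate it; return null on any mismatch.
function parseIntent(raw: string): CreateCampaignIntent | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null; // not JSON at all, e.g. the model answered in prose
  }
  if (typeof data !== "object" || data === null) return null;
  const { intent, name, playerCount } = data as Record<string, unknown>;
  if (intent !== "create_campaign") return null;
  if (typeof name !== "string") return null;
  // Assumption: player counts are whole numbers.
  if (typeof playerCount !== "number" || !Number.isInteger(playerCount)) return null;
  return { intent: "create_campaign", name, playerCount };
}
```

Rejecting anything that fails validation keeps prose answers and malformed JSON from reaching downstream code.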
40
+ ### Cost Optimization
41
+
42
+ **Prompt Caching (Critical for AI Agents):**
43
+
44
+ - Static rules → System prompt with cache_control: ephemeral (caches for ~5 min, auto-expires)
45
+ - Dynamic data (character state, user input) → User message (no caching)
46
+ - Example: 468-line prompt costs $0.10 without caching, $0.01 with (90% reduction)
47
+ - Cache invalidation: ANY change to a cached block invalidates that block and everything cached after it
48
+ - Rule: Change system prompts sparingly; accept one-time cache rebuild cost
49
+
50
+ **Message Architecture:**
51
+
52
+ ```typescript
53
+ // ✅ GOOD - Cacheable system prompt
54
+ systemPrompt: [
55
+ { text: STATIC_RULES, cache_control: { type: 'ephemeral' } },
56
+ { text: STATIC_EXAMPLES, cache_control: { type: 'ephemeral' } },
57
+ ];
58
+ userMessage: `Character: ${dynamicState}\nAction: ${userInput}`;
59
+
60
+ // ❌ BAD - Uncacheable (character state in system prompt)
61
+ systemPrompt: `Rules + Character: ${dynamicState}`;
62
+ ```
63
+
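A slightly fuller sketch of the same split is shown here. It assumes an Anthropic-style request shape in which system blocks carry `type: "text"` and an optional `cache_control`; that shape is an assumption to check against your provider's documentation. `buildRequest` and the placeholder constants are hypothetical, and no network call is made.

```typescript
// Assumed block shape for a cacheable system prompt; verify against your provider's docs.
interface SystemBlock {
  type: "text";
  text: string;
  cache_control?: { type: "ephemeral" };
}

// Placeholder content; in practice these are the long, rarely changing parts.
const STATIC_RULES = "You are the game master. Follow the rules below.";
const STATIC_EXAMPLES = "Example 1: a player asks to pick a lock.";

function buildRequest(dynamicState: string, userInput: string) {
  const system: SystemBlock[] = [
    { type: "text", text: STATIC_RULES, cache_control: { type: "ephemeral" } },
    { type: "text", text: STATIC_EXAMPLES, cache_control: { type: "ephemeral" } },
  ];
  // Dynamic data stays in the user message so the system blocks are identical across calls.
  const messages = [
    { role: "user" as const, content: `Character: ${dynamicState}\nAction: ${userInput}` },
  ];
  return { system, messages };
}
```

Two calls with different character state produce identical `system` arrays, which is the property a cache needs in order to be reused.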
64
+ ### Testing AI Outputs
65
+
66
+ **LLM-as-Judge Pattern:**
67
+
68
+ - Use LLM to evaluate nuanced qualities (narrative tone, reasoning quality)
69
+ - Avoid brittle keyword matching for creative outputs
70
+ - Define rubrics: EXCELLENT / ACCEPTABLE / POOR with criteria
71
+ - Example: "Does the GM's response show collaborative tone?" vs checking for specific words
72
+
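The judge pattern can be sketched as two small pure functions: one builds the grading prompt from a rubric and one parses the judge's reply. The rubric wording, `buildJudgePrompt`, and `parseVerdict` are hypothetical; the call to the judging model itself is left out.

```typescript
type Verdict = "EXCELLENT" | "ACCEPTABLE" | "POOR";

// Hypothetical rubric text; real criteria would come from the project.
const rubric: Record<Verdict, string> = {
  EXCELLENT: "Invites player input and builds on the player's stated intent",
  ACCEPTABLE: "Neutral tone; neither invites nor shuts down player input",
  POOR: "Dismisses or overrides the player's stated intent",
};

// Build the prompt sent to the judging model.
function buildJudgePrompt(response: string): string {
  const criteria = (Object.keys(rubric) as Verdict[])
    .map((v) => `${v}: ${rubric[v]}`)
    .join("\n");
  return (
    `Does the GM's response show collaborative tone?\n\nRubric:\n${criteria}\n\n` +
    `Response to grade:\n${response}\n\n` +
    `Answer with exactly one word: EXCELLENT, ACCEPTABLE, or POOR.`
  );
}

// Accept only an exact rubric label; anything else counts as an invalid judgment.
function parseVerdict(judgeOutput: string): Verdict | null {
  const label = judgeOutput.trim().toUpperCase();
  if (label === "EXCELLENT" || label === "ACCEPTABLE" || label === "POOR") return label;
  return null;
}
```

Constraining the judge to one of three labels keeps the evaluation machine-checkable while the judgment itself stays qualitative.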
73
+ **Evaluation Framework:**
74
+
75
+ - Unit tests: Pure functions (parsing, validation)
76
+ - Integration tests: Agent + real LLM calls (schema compliance)
77
+ - LLM Evals: Judgment quality (position/effect reasoning, atmosphere)
78
+ - Cost awareness: 30 scenarios ≈ $0.15-0.30 per run with caching
79
+
80
+ ---
81
+
82
+ ## Part 2: Writing Instructions for LLMs
83
+
84
+ **Comprehensive framework:** See @.safeword/guides/llm-instruction-design.md
85
+
86
+ **Quick summary:** When creating documentation that LLMs will read and follow (SAFEWORD.md, CLAUDE.md, testing guides, coding standards), apply 13 core principles:
87
+
88
+ 1. **MECE Principle** - Decision trees must be mutually exclusive and collectively exhaustive
89
+ 2. **Explicit Over Implicit** - Define all terms, never assume LLMs know what you mean
90
+ 3. **No Contradictions** - Different sections must align, LLMs don't reconcile conflicts
91
+ 4. **Concrete Examples Over Abstract Rules** - Show, don't just tell (2-3 examples per rule)
92
+ 5. **Edge Cases Must Be Explicit** - What seems obvious to humans often isn't to LLMs
93
+ 6. **Actionable Over Vague** - Replace subjective terms with optimization rules + red flags
94
+ 7. **Decision Trees: Sequential Over Parallel** - Ordered steps that stop at first match
95
+ 8. **Tie-Breaking Rules** - Tell LLMs how to choose when multiple options apply
96
+ 9. **Lookup Tables for Complex Decisions** - Provide reference tables for complex logic
97
+ 10. **Avoid Caveats in Tables** - Keep patterns clean, parentheticals break LLM pattern matching
98
+ 11. **Percentages: Context or None** - Include adjustment guidance or use principles instead
99
+ 12. **Specificity in Questions** - Use precise technical terms, not general descriptions
100
+ 13. **Re-evaluation Paths** - Provide concrete next steps when LLMs hit dead ends
101
+
102
+ **Also includes:** Anti-patterns to avoid, quality checklist, research-backed principles, and before/after examples.