safeword 0.2.2 → 0.2.4

Files changed (235)
  1. package/.claude/commands/arch-review.md +32 -0
  2. package/.claude/commands/lint.md +6 -0
  3. package/.claude/commands/quality-review.md +13 -0
  4. package/.claude/commands/setup-linting.md +6 -0
  5. package/.claude/hooks/auto-lint.sh +6 -0
  6. package/.claude/hooks/auto-quality-review.sh +170 -0
  7. package/.claude/hooks/check-linting-sync.sh +17 -0
  8. package/.claude/hooks/inject-timestamp.sh +6 -0
  9. package/.claude/hooks/question-protocol.sh +12 -0
  10. package/.claude/hooks/run-linters.sh +8 -0
  11. package/.claude/hooks/run-quality-review.sh +76 -0
  12. package/.claude/hooks/version-check.sh +10 -0
  13. package/.claude/mcp/README.md +96 -0
  14. package/.claude/mcp/arcade.sample.json +9 -0
  15. package/.claude/mcp/context7.sample.json +7 -0
  16. package/.claude/mcp/playwright.sample.json +7 -0
  17. package/.claude/settings.json +62 -0
  18. package/.claude/skills/quality-reviewer/SKILL.md +190 -0
  19. package/.claude/skills/safeword-quality-reviewer/SKILL.md +13 -0
  20. package/.env.arcade.example +4 -0
  21. package/.env.example +11 -0
  22. package/.gitmodules +4 -0
  23. package/.safeword/SAFEWORD.md +33 -0
  24. package/.safeword/eslint/eslint-base.mjs +101 -0
  25. package/.safeword/guides/architecture-guide.md +404 -0
  26. package/.safeword/guides/code-philosophy.md +174 -0
  27. package/.safeword/guides/context-files-guide.md +405 -0
  28. package/.safeword/guides/data-architecture-guide.md +183 -0
  29. package/.safeword/guides/design-doc-guide.md +165 -0
  30. package/.safeword/guides/learning-extraction.md +515 -0
  31. package/.safeword/guides/llm-instruction-design.md +239 -0
  32. package/.safeword/guides/llm-prompting.md +95 -0
  33. package/.safeword/guides/tdd-best-practices.md +570 -0
  34. package/.safeword/guides/test-definitions-guide.md +243 -0
  35. package/.safeword/guides/testing-methodology.md +573 -0
  36. package/.safeword/guides/user-story-guide.md +237 -0
  37. package/.safeword/guides/zombie-process-cleanup.md +214 -0
  38. package/{templates → .safeword}/hooks/agents-md-check.sh +0 -0
  39. package/{templates → .safeword}/hooks/post-tool.sh +0 -0
  40. package/{templates → .safeword}/hooks/pre-commit.sh +0 -0
  41. package/.safeword/planning/002-user-story-quality-evaluation.md +1840 -0
  42. package/.safeword/planning/003-langsmith-eval-setup-prompt.md +363 -0
  43. package/.safeword/planning/004-llm-eval-test-cases.md +3226 -0
  44. package/.safeword/planning/005-architecture-enforcement-system.md +169 -0
  45. package/.safeword/planning/006-reactive-fix-prevention-research.md +135 -0
  46. package/.safeword/planning/011-cli-ux-vision.md +330 -0
  47. package/.safeword/planning/012-project-structure-cleanup.md +154 -0
  48. package/.safeword/planning/README.md +39 -0
  49. package/.safeword/planning/automation-plan-v2.md +1225 -0
  50. package/.safeword/planning/automation-plan-v3.md +1291 -0
  51. package/.safeword/planning/automation-plan.md +3058 -0
  52. package/.safeword/planning/design/005-cli-implementation.md +343 -0
  53. package/.safeword/planning/design/013-cli-self-contained-templates.md +596 -0
  54. package/.safeword/planning/design/013a-eslint-plugin-suite.md +256 -0
  55. package/.safeword/planning/design/013b-implementation-snippets.md +385 -0
  56. package/.safeword/planning/design/013c-config-isolation-strategy.md +242 -0
  57. package/.safeword/planning/design/code-philosophy-improvements.md +60 -0
  58. package/.safeword/planning/mcp-analysis.md +545 -0
  59. package/.safeword/planning/phase2-subagents-vs-skills-analysis.md +451 -0
  60. package/.safeword/planning/settings-improvements.md +970 -0
  61. package/.safeword/planning/test-definitions/005-cli-implementation.md +1301 -0
  62. package/.safeword/planning/test-definitions/cli-self-contained-templates.md +205 -0
  63. package/.safeword/planning/user-stories/001-guides-review-user-stories.md +1381 -0
  64. package/.safeword/planning/user-stories/003-reactive-fix-prevention.md +132 -0
  65. package/.safeword/planning/user-stories/004-technical-constraints.md +86 -0
  66. package/.safeword/planning/user-stories/005-cli-implementation.md +311 -0
  67. package/.safeword/planning/user-stories/cli-self-contained-templates.md +172 -0
  68. package/.safeword/planning/versioned-distribution.md +740 -0
  69. package/.safeword/prompts/arch-review.md +43 -0
  70. package/.safeword/prompts/quality-review.md +11 -0
  71. package/.safeword/scripts/arch-review.sh +235 -0
  72. package/.safeword/scripts/check-linting-sync.sh +58 -0
  73. package/.safeword/scripts/setup-linting.sh +559 -0
  74. package/.safeword/templates/architecture-template.md +136 -0
  75. package/.safeword/templates/ci/architecture-check.yml +79 -0
  76. package/.safeword/templates/design-doc-template.md +127 -0
  77. package/.safeword/templates/test-definitions-feature.md +100 -0
  78. package/.safeword/templates/ticket-template.md +74 -0
  79. package/.safeword/templates/user-stories-template.md +82 -0
  80. package/.safeword/tickets/001-guides-review-user-stories.md +83 -0
  81. package/.safeword/tickets/002-architecture-enforcement.md +211 -0
  82. package/.safeword/tickets/003-reactive-fix-prevention.md +57 -0
  83. package/.safeword/tickets/004-technical-constraints-in-user-stories.md +39 -0
  84. package/.safeword/tickets/005-cli-implementation.md +248 -0
  85. package/.safeword/tickets/006-flesh-out-skills.md +43 -0
  86. package/.safeword/tickets/007-flesh-out-questioning.md +44 -0
  87. package/.safeword/tickets/008-upgrade-questioning.md +58 -0
  88. package/.safeword/tickets/009-naming-conventions.md +41 -0
  89. package/.safeword/tickets/010-safeword-md-cleanup.md +34 -0
  90. package/.safeword/tickets/011-cursor-setup.md +86 -0
  91. package/.safeword/tickets/README.md +73 -0
  92. package/.safeword/version +1 -0
  93. package/AGENTS.md +59 -0
  94. package/CLAUDE.md +12 -0
  95. package/README.md +347 -0
  96. package/docs/001-cli-implementation-plan.md +856 -0
  97. package/docs/elite-dx-implementation-plan.md +1034 -0
  98. package/framework/README.md +131 -0
  99. package/framework/mcp/README.md +96 -0
  100. package/framework/mcp/arcade.sample.json +8 -0
  101. package/framework/mcp/context7.sample.json +6 -0
  102. package/framework/mcp/playwright.sample.json +6 -0
  103. package/framework/scripts/arch-review.sh +235 -0
  104. package/framework/scripts/check-linting-sync.sh +58 -0
  105. package/framework/scripts/load-env.sh +49 -0
  106. package/framework/scripts/setup-claude.sh +223 -0
  107. package/framework/scripts/setup-linting.sh +559 -0
  108. package/framework/scripts/setup-quality.sh +477 -0
  109. package/framework/scripts/setup-safeword.sh +550 -0
  110. package/framework/templates/ci/architecture-check.yml +78 -0
  111. package/learnings/ai-sdk-v5-breaking-changes.md +178 -0
  112. package/learnings/e2e-test-zombie-processes.md +231 -0
  113. package/learnings/milkdown-crepe-editor-property.md +96 -0
  114. package/learnings/prosemirror-fragment-traversal.md +119 -0
  115. package/package.json +19 -43
  116. package/packages/cli/AGENTS.md +1 -0
  117. package/packages/cli/ARCHITECTURE.md +279 -0
  118. package/packages/cli/package.json +51 -0
  119. package/packages/cli/src/cli.ts +63 -0
  120. package/packages/cli/src/commands/check.ts +166 -0
  121. package/packages/cli/src/commands/diff.ts +209 -0
  122. package/packages/cli/src/commands/reset.ts +190 -0
  123. package/packages/cli/src/commands/setup.ts +325 -0
  124. package/packages/cli/src/commands/upgrade.ts +163 -0
  125. package/packages/cli/src/index.ts +3 -0
  126. package/packages/cli/src/templates/config.ts +58 -0
  127. package/packages/cli/src/templates/content.ts +18 -0
  128. package/packages/cli/src/templates/index.ts +12 -0
  129. package/packages/cli/src/utils/agents-md.ts +66 -0
  130. package/packages/cli/src/utils/fs.ts +179 -0
  131. package/packages/cli/src/utils/git.ts +124 -0
  132. package/packages/cli/src/utils/hooks.ts +29 -0
  133. package/packages/cli/src/utils/output.ts +60 -0
  134. package/packages/cli/src/utils/project-detector.test.ts +185 -0
  135. package/packages/cli/src/utils/project-detector.ts +44 -0
  136. package/packages/cli/src/utils/version.ts +28 -0
  137. package/packages/cli/src/version.ts +6 -0
  138. package/packages/cli/templates/SAFEWORD.md +776 -0
  139. package/packages/cli/templates/doc-templates/architecture-template.md +136 -0
  140. package/packages/cli/templates/doc-templates/design-doc-template.md +134 -0
  141. package/packages/cli/templates/doc-templates/test-definitions-feature.md +131 -0
  142. package/packages/cli/templates/doc-templates/ticket-template.md +82 -0
  143. package/packages/cli/templates/doc-templates/user-stories-template.md +92 -0
  144. package/packages/cli/templates/guides/architecture-guide.md +423 -0
  145. package/packages/cli/templates/guides/code-philosophy.md +195 -0
  146. package/packages/cli/templates/guides/context-files-guide.md +457 -0
  147. package/packages/cli/templates/guides/data-architecture-guide.md +200 -0
  148. package/packages/cli/templates/guides/design-doc-guide.md +171 -0
  149. package/packages/cli/templates/guides/learning-extraction.md +552 -0
  150. package/packages/cli/templates/guides/llm-instruction-design.md +248 -0
  151. package/packages/cli/templates/guides/llm-prompting.md +102 -0
  152. package/packages/cli/templates/guides/tdd-best-practices.md +615 -0
  153. package/packages/cli/templates/guides/test-definitions-guide.md +334 -0
  154. package/packages/cli/templates/guides/testing-methodology.md +618 -0
  155. package/packages/cli/templates/guides/user-story-guide.md +256 -0
  156. package/packages/cli/templates/guides/zombie-process-cleanup.md +219 -0
  157. package/packages/cli/templates/hooks/agents-md-check.sh +27 -0
  158. package/packages/cli/templates/hooks/post-tool.sh +4 -0
  159. package/packages/cli/templates/hooks/pre-commit.sh +10 -0
  160. package/packages/cli/templates/prompts/arch-review.md +43 -0
  161. package/packages/cli/templates/prompts/quality-review.md +10 -0
  162. package/packages/cli/templates/skills/safeword-quality-reviewer/SKILL.md +207 -0
  163. package/packages/cli/tests/commands/check.test.ts +129 -0
  164. package/packages/cli/tests/commands/cli.test.ts +89 -0
  165. package/packages/cli/tests/commands/diff.test.ts +115 -0
  166. package/packages/cli/tests/commands/reset.test.ts +310 -0
  167. package/packages/cli/tests/commands/self-healing.test.ts +170 -0
  168. package/packages/cli/tests/commands/setup-blocking.test.ts +71 -0
  169. package/packages/cli/tests/commands/setup-core.test.ts +135 -0
  170. package/packages/cli/tests/commands/setup-git.test.ts +139 -0
  171. package/packages/cli/tests/commands/setup-hooks.test.ts +334 -0
  172. package/packages/cli/tests/commands/setup-linting.test.ts +189 -0
  173. package/packages/cli/tests/commands/setup-noninteractive.test.ts +80 -0
  174. package/packages/cli/tests/commands/setup-templates.test.ts +181 -0
  175. package/packages/cli/tests/commands/upgrade.test.ts +215 -0
  176. package/packages/cli/tests/helpers.ts +243 -0
  177. package/packages/cli/tests/npm-package.test.ts +83 -0
  178. package/packages/cli/tests/technical-constraints.test.ts +96 -0
  179. package/packages/cli/tsconfig.json +25 -0
  180. package/packages/cli/tsup.config.ts +11 -0
  181. package/packages/cli/vitest.config.ts +23 -0
  182. package/promptfoo.yaml +3270 -0
  183. package/dist/check-M73LGONJ.js +0 -129
  184. package/dist/check-M73LGONJ.js.map +0 -1
  185. package/dist/chunk-2XWIUEQK.js +0 -190
  186. package/dist/chunk-2XWIUEQK.js.map +0 -1
  187. package/dist/chunk-GZRQL3SX.js +0 -146
  188. package/dist/chunk-GZRQL3SX.js.map +0 -1
  189. package/dist/chunk-V5G6BGOK.js +0 -26
  190. package/dist/chunk-V5G6BGOK.js.map +0 -1
  191. package/dist/chunk-W66Z3C5H.js +0 -21
  192. package/dist/chunk-W66Z3C5H.js.map +0 -1
  193. package/dist/cli.d.ts +0 -1
  194. package/dist/cli.js +0 -34
  195. package/dist/cli.js.map +0 -1
  196. package/dist/diff-FSFDCBL5.js +0 -166
  197. package/dist/diff-FSFDCBL5.js.map +0 -1
  198. package/dist/index.d.ts +0 -11
  199. package/dist/index.js +0 -7
  200. package/dist/index.js.map +0 -1
  201. package/dist/reset-3ACTIYYE.js +0 -143
  202. package/dist/reset-3ACTIYYE.js.map +0 -1
  203. package/dist/setup-MKVVQTVA.js +0 -266
  204. package/dist/setup-MKVVQTVA.js.map +0 -1
  205. package/dist/upgrade-FQOL6AF5.js +0 -134
  206. package/dist/upgrade-FQOL6AF5.js.map +0 -1
  207. /package/{templates → framework}/SAFEWORD.md +0 -0
  208. /package/{templates → framework}/guides/architecture-guide.md +0 -0
  209. /package/{templates → framework}/guides/code-philosophy.md +0 -0
  210. /package/{templates → framework}/guides/context-files-guide.md +0 -0
  211. /package/{templates → framework}/guides/data-architecture-guide.md +0 -0
  212. /package/{templates → framework}/guides/design-doc-guide.md +0 -0
  213. /package/{templates → framework}/guides/learning-extraction.md +0 -0
  214. /package/{templates → framework}/guides/llm-instruction-design.md +0 -0
  215. /package/{templates → framework}/guides/llm-prompting.md +0 -0
  216. /package/{templates → framework}/guides/tdd-best-practices.md +0 -0
  217. /package/{templates → framework}/guides/test-definitions-guide.md +0 -0
  218. /package/{templates → framework}/guides/testing-methodology.md +0 -0
  219. /package/{templates → framework}/guides/user-story-guide.md +0 -0
  220. /package/{templates → framework}/guides/zombie-process-cleanup.md +0 -0
  221. /package/{templates → framework}/prompts/arch-review.md +0 -0
  222. /package/{templates → framework}/prompts/quality-review.md +0 -0
  223. /package/{templates/skills/safeword-quality-reviewer → framework/skills/quality-reviewer}/SKILL.md +0 -0
  224. /package/{templates/doc-templates → framework/templates}/architecture-template.md +0 -0
  225. /package/{templates/doc-templates → framework/templates}/design-doc-template.md +0 -0
  226. /package/{templates/doc-templates → framework/templates}/test-definitions-feature.md +0 -0
  227. /package/{templates/doc-templates → framework/templates}/ticket-template.md +0 -0
  228. /package/{templates/doc-templates → framework/templates}/user-stories-template.md +0 -0
  229. /package/{templates → packages/cli/templates}/commands/arch-review.md +0 -0
  230. /package/{templates → packages/cli/templates}/commands/lint.md +0 -0
  231. /package/{templates → packages/cli/templates}/commands/quality-review.md +0 -0
  232. /package/{templates → packages/cli/templates}/hooks/inject-timestamp.sh +0 -0
  233. /package/{templates → packages/cli/templates}/lib/common.sh +0 -0
  234. /package/{templates → packages/cli/templates}/lib/jq-fallback.sh +0 -0
  235. /package/{templates → packages/cli/templates}/markdownlint.jsonc +0 -0
@@ -0,0 +1,239 @@
1
+ # Writing Instructions for LLMs
2
+
3
+ **Context:** When creating documentation that LLMs will read and follow (like AGENTS.md, CLAUDE.md, testing guides, coding standards), different best practices apply than when prompting an LLM directly.
4
+
5
+ ## Core Principles
6
+
7
+ **1. MECE Principle (Mutually Exclusive, Collectively Exhaustive)**
8
+
9
+ Decision trees and categories must not overlap and must cover all cases. LLMs struggle with overlapping categories; the MECE framework (popularized by McKinsey and BCG) keeps decision paths unambiguous.
10
+
11
+ ```markdown
12
+ ❌ BAD - Not mutually exclusive:
13
+ ├─ Pure function?
14
+ ├─ Multiple components interacting?
15
+ └─ Full user flow?
16
+
17
+ Problem: A function with database calls could match more than one branch
18
+
19
+ ✅ GOOD - Sequential, mutually exclusive:
20
+ 1. AI content quality? → LLM Eval
21
+ 2. Requires real browser? → E2E test
22
+ 3. Multiple components? → Integration test
23
+ 4. Pure function? → Unit test
24
+
25
+ Stops at first match, no ambiguity.
26
+ ```
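The sequential tree above can also be sketched in code. This is a minimal illustration, not part of the package: the `Behavior` fields and the `chooseTestType` name are hypothetical, and treating everything that reaches the end of the list as a unit-test candidate is an assumption made to keep the tree collectively exhaustive.

```typescript
type TestType = "LLM Eval" | "E2E test" | "Integration test" | "Unit test";

// Hypothetical description of the behavior under test.
interface Behavior {
  judgesAiContentQuality: boolean;
  requiresRealBrowser: boolean;
  involvesMultipleComponents: boolean;
}

// Questions are evaluated in this order; the first "yes" decides the outcome.
const orderedQuestions: ReadonlyArray<[(b: Behavior) => boolean, TestType]> = [
  [(b) => b.judgesAiContentQuality, "LLM Eval"],
  [(b) => b.requiresRealBrowser, "E2E test"],
  [(b) => b.involvesMultipleComponents, "Integration test"],
];

function chooseTestType(behavior: Behavior): TestType {
  for (const [matches, testType] of orderedQuestions) {
    if (matches(behavior)) {
      return testType; // stop at the first match, so branches cannot overlap
    }
  }
  // Assumption: anything left over is treated as a pure function.
  return "Unit test";
}
```

Because evaluation stops at the first match, a behavior that needs a browser and also touches several components resolves to exactly one answer (E2E) rather than two.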
27
+
28
+ **2. Explicit Over Implicit**
29
+
30
+ Never assume LLMs know what you mean. Define all terms, even "obvious" ones.
31
+
32
+ ```markdown
33
+ ❌ BAD: "Test at the lowest level"
34
+ ✅ GOOD: "Test with the fastest test type that can catch the bug"
35
+
36
+ Examples needing definition:
37
+ - "Critical paths" → Always critical: auth, payment. Rarely: UI polish, admin
38
+ - "Browser" → Real browser (Playwright/Cypress), not jsdom
39
+ - "Pure function" → Input → output, no I/O (define edge cases like Date.now())
40
+ ```
41
+
42
+ **3. No Contradictions**
43
+
44
+ Different sections must align. LLMs don't reconcile conflicting guidance. When updating, grep for related terms and update all references.
45
+
46
+ ```markdown
47
+ ❌ BAD:
48
+ Section A: "Write E2E tests only for critical user paths"
49
+ Section B: "All user-facing features have at least one E2E test"
50
+
51
+ ✅ GOOD:
52
+ Section A: "Write E2E tests only for critical user paths"
53
+ Section B: "All critical multi-page user flows have at least one E2E test"
54
+ + Definition of "critical" with examples
55
+ ```
56
+
57
+ **4. Concrete Examples Over Abstract Rules**
58
+
59
+ Show, don't just tell. LLMs learn patterns from examples. For every rule, include 2-3 concrete examples showing good vs bad.
60
+
61
+ ```markdown
62
+ ❌ BAD: "Follow best practices for testing"
63
+
64
+ ✅ GOOD:
65
+ // ❌ BAD - Testing business logic with E2E
66
+ test('discount calculation', async ({ page }) => {
67
+ await page.goto('/checkout')
68
+ await page.fill('[name="price"]', '100')
69
+ await expect(page.locator('.total')).toContainText('80')
70
+ })
71
+
72
+ // ✅ GOOD - Unit test (runs in milliseconds)
73
+ it('applies 20% discount', () => {
74
+ expect(calculateDiscount(100, 0.20)).toBe(80)
75
+ })
76
+ ```
77
+
78
+ **5. Edge Cases Must Be Explicit**
79
+
80
+ What seems obvious to humans often isn't to LLMs. After stating a rule, add an "Edge cases:" section listing the scenarios most likely to cause confusion.
81
+
82
+ ```markdown
83
+ ❌ BAD: "Unit test pure functions"
84
+
85
+ ✅ GOOD: "Unit test pure functions"
86
+
87
+ Edge cases:
88
+ - Non-deterministic functions (Math.random(), Date.now()) → Unit test with mocked randomness/time
89
+ - Environment dependencies (process.env) → Integration test
90
+ - Mixed pure + I/O → Extract pure part, unit test separately
91
+ ```
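The "mocked time" edge case can be illustrated with a small sketch. The `isExpired` function and `Clock` type are hypothetical names chosen for this example; the idea is only that passing the clock in as a parameter makes a time-dependent function testable as a unit.

```typescript
// A clock is any function returning the current time in milliseconds.
type Clock = () => number;

// Defaults to the real clock, but tests can pass a fixed one.
function isExpired(expiresAtMs: number, now: Clock = Date.now): boolean {
  return now() >= expiresAtMs;
}

// In a unit test, a fixed clock makes the result deterministic.
const fixedClock: Clock = () => 1000;
const alreadyExpired = isExpired(999, fixedClock);
const stillValid = isExpired(1001, fixedClock);
```

With the fixed clock, `alreadyExpired` is `true` and `stillValid` is `false` on every run, so no real time source is needed.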
92
+
93
+ **6. Actionable Over Vague**
94
+
95
+ Give LLMs concrete actions, not subjective guidance. Replace subjective terms (most/some/few) with optimization rules + red flags.
96
+
97
+ ```markdown
98
+ ❌ BAD: "Most tests: Fast, Some tests: Slow"
99
+
100
+ ✅ GOOD:
101
+ - Write as many fast tests as possible
102
+ - Write E2E tests only for critical paths requiring a browser
103
+ - Red flag: If you have more E2E tests than integration tests, the suite is too slow
104
+ ```
105
+
106
+ **7. Decision Trees: Sequential Over Parallel**
107
+
108
+ Structure decisions as ordered steps, not simultaneous checks. Sequential questions force the LLM through a deterministic decision path.
109
+
110
+ ```markdown
111
+ ❌ BAD - Parallel branches:
112
+ ├─ Pure function?
113
+ ├─ Multiple components?
114
+ └─ Full user flow?
115
+
116
+ ✅ GOOD - Sequential (see Principle 1 example above)
117
+ Answer questions IN ORDER. Stop at the first match.
118
+ ```
119
+
120
+ **8. Tie-Breaking Rules**
121
+
122
+ When multiple options could apply, tell LLMs how to choose.
123
+
124
+ ```markdown
125
+ ✅ GOOD:
126
+ "If multiple test types can catch the bug, choose the fastest one."
127
+
128
+ Reference in decision trees:
129
+ "If multiple seem to apply, use the tie-breaking rule stated above: choose the faster one."
130
+ ```
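As a rough sketch of the tie-breaking rule, the function below picks the fastest test type among those that can catch a bug. The speed ranking (unit, then integration, then E2E) and the `breakTie` name are assumptions for illustration.

```typescript
type TestKind = "unit" | "integration" | "e2e";

// Assumed speed ranking, fastest first.
const fastestFirst: readonly TestKind[] = ["unit", "integration", "e2e"];

// Returns the fastest kind that appears in the candidate list.
function breakTie(candidates: readonly TestKind[]): TestKind {
  for (const kind of fastestFirst) {
    if (candidates.indexOf(kind) !== -1) {
      return kind;
    }
  }
  throw new Error("No test type can catch this bug");
}
```

Stating the ranking once and referring back to it keeps every decision tree consistent with the same rule.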
131
+
132
+ **9. Lookup Tables for Complex Decisions**
133
+
134
+ When decision logic has 3+ branches, nested conditions, or multiple variables to consider, provide a reference table.
135
+
136
+ ```markdown
137
+ | Bug Type | Unit? | Integration? | E2E? | Best Choice |
138
+ |----------|-------|--------------|------|-------------|
139
+ | Calculation error | ✅ | ✅ | ✅ | Unit (fastest) |
140
+ | Database query bug | ❌ | ✅ | ✅ | Integration |
141
+ | CSS layout broken | ❌ | ❌ | ✅ | E2E (only option) |
142
+ ```
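The same table can be expressed as a typed lookup, which makes missing rows a compile-time error. This is an illustrative sketch only; the type and function names are hypothetical and the rows simply mirror the table above.

```typescript
type TestKind = "unit" | "integration" | "e2e";
type BugType = "calculation error" | "database query bug" | "CSS layout broken";

interface Row {
  catches: Record<TestKind, boolean>; // which test kinds can catch the bug
  bestChoice: TestKind;
}

// Record<BugType, Row> forces one row per bug type (collectively exhaustive).
const lookup: Record<BugType, Row> = {
  "calculation error": {
    catches: { unit: true, integration: true, e2e: true },
    bestChoice: "unit",
  },
  "database query bug": {
    catches: { unit: false, integration: true, e2e: true },
    bestChoice: "integration",
  },
  "CSS layout broken": {
    catches: { unit: false, integration: false, e2e: true },
    bestChoice: "e2e",
  },
};

function bestTestFor(bug: BugType): TestKind {
  return lookup[bug].bestChoice;
}
```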
143
+
144
+ **10. Avoid Caveats in Tables**
145
+
146
+ Keep patterns clean. Parentheticals break LLM pattern matching. Add separate rows for caveat cases.
147
+
148
+ ```markdown
149
+ ❌ BAD: | State management bug | ❌ NO (if mocked) | ✅ YES |
150
+ ✅ GOOD: | State management bug (Zustand, Redux) | ❌ NO | ✅ YES |
151
+ ```
152
+
153
+ **11. Percentages: Context or None**
154
+
155
+ Don't use percentages without adjustment guidance.
156
+
157
+ ```markdown
158
+ ❌ BAD: "70% unit tests, 20% integration, 10% E2E"
159
+
160
+ ✅ BETTER: "Baseline: 70/20/10. Adjust: Microservices → 60/30/10, UI-heavy → 60/20/20"
161
+
162
+ ✅ BEST: "Write as many fast tests as possible. Red flag: More E2E than integration = too slow."
163
+ ```
164
+
165
+ **12. Specificity in Questions**
166
+
167
+ Use precise technical terms, not general descriptions.
168
+
169
+ ```markdown
170
+ ❌ BAD: "Does this require seeing the UI?"
171
+ ✅ GOOD: "Does this require a real browser (Playwright/Cypress)?"
172
+
173
+ Note: React Testing Library does NOT require a browser - that's integration testing.
174
+ ```
175
+
176
+ **13. Re-evaluation Paths**
177
+
178
+ When LLMs hit dead ends, provide concrete next steps.
179
+
180
+ ```markdown
181
+ ❌ BAD: "If none of the above apply, re-evaluate your approach"
182
+
183
+ ✅ GOOD: "If testing behavior that doesn't fit the categories:
184
+ 1. Break it down: Separate pure logic from I/O/UI concerns
185
+ 2. Test each piece: Pure → Unit, I/O → Integration, Multi-page → E2E
186
+ 3. Example: Login validation
187
+ - isValidEmail(email) → Unit test
188
+ - checkUserExists(email) → Integration test (database)
189
+ - Login form → Dashboard → E2E test (multi-page)"
190
+ ```
191
+
192
+ ## Anti-Patterns to Avoid
193
+
194
+ ❌ **Visual metaphors** - Pyramids and icebergs rely on a picture; LLMs don't process visual information well
195
+ ❌ **Undefined jargon** - "Technical debt", "code smell" need definitions
196
+ ❌ **Competing guidance** - Multiple decision frameworks that contradict each other
197
+ ❌ **Outdated references** - Removing a concept but forgetting to update every mention of it
198
+
199
+ ## Quality Checklist
200
+
201
+ Before saving/committing LLM-consumable documentation:
202
+
203
+ - [ ] Decision trees follow MECE principle (mutually exclusive, collectively exhaustive)
204
+ - [ ] Technical terms explicitly defined
205
+ - [ ] No contradictions between sections
206
+ - [ ] Every rule has 2-3 concrete examples (good vs bad)
207
+ - [ ] Edge cases explicitly covered
208
+ - [ ] Vague terms replaced with actionable principles
209
+ - [ ] Tie-breaking rules provided
210
+ - [ ] Complex decisions (3+ branches) have lookup tables
211
+ - [ ] Dead-end paths have re-evaluation steps with examples
212
+
213
+ ## Research-Backed Principles
214
+
215
+ - **MECE (McKinsey):** Mutually exclusive, collectively exhaustive decision trees for reliable LLM decisions
216
+ - **Prompt ambiguity (2025):** "Ambiguity is one of the most common causes of poor LLM output" (Zero-Shot Decision Tree Construction)
217
+ - **Concrete examples (2025):** Structured approaches with concrete examples consistently improve performance over "act as" or "###" techniques
218
+
219
+ ## Example: Before and After
220
+
221
+ **Before (ambiguous):**
222
+ ```markdown
223
+ Follow the test pyramid: lots of unit tests, some integration tests, few E2E tests.
224
+ ```
225
+
226
+ **After (LLM-optimized):**
227
+ ```markdown
228
+ Answer these questions IN ORDER to choose test type:
229
+
230
+ 1. Pure function (input → output, no I/O)? → Unit test
231
+ 2. Multiple components/services interacting? → Integration test
232
+ 3. Requires real browser (Playwright)? → E2E test
233
+
234
+ If multiple apply: choose the faster one.
235
+
236
+ Edge cases:
237
+ - React components with React Testing Library → Integration (not E2E, no real browser)
238
+ - Non-deterministic functions (Date.now()) → Unit test with mocked time
239
+ ```
@@ -0,0 +1,95 @@
1
+ # LLM Prompting Best Practices
2
+
3
+ This guide covers two related topics:
4
+
5
+ **Part 1: Prompting LLMs** - How to structure prompts when actively using an LLM (API calls, chat interactions)
6
+
7
+ **Part 2: Writing Instructions for LLMs** - How to write documentation that LLMs will read and follow (SAFEWORD.md, CLAUDE.md, testing guides, coding standards)
8
+
9
+ ---
10
+
11
+ ## Part 1: Prompting LLMs
12
+
13
+ ### Prompt Engineering Principles
14
+
15
+ **Concrete Examples Over Abstract Rules:**
16
+ - ✅ Good: Show "❌ BAD" vs "✅ GOOD" code examples
17
+ - ❌ Bad: "Follow best practices" (too vague)
18
+
19
+ **"Why" Over "What":**
20
+ - Explain architectural trade-offs and reasoning
21
+ - Include specific numbers (90% cost reduction, 3x faster)
22
+ - Document gotchas with explanations
23
+
24
+ **Structured Outputs:**
25
+ - Use JSON mode for predictable LLM responses
26
+ - Define explicit schemas with validation
27
+ - Return structured data, not prose
28
+
29
+ ```typescript
30
+ // ❌ BAD - Prose output
31
+ "The user wants to create a campaign named 'Shadows' with 4 players"
32
+
33
+ // ✅ GOOD - Structured JSON
34
+ { "intent": "create_campaign", "name": "Shadows", "playerCount": 4 }
35
+ ```
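"Define explicit schemas with validation" could look like the sketch below. It hand-validates the JSON shape from the example above without any library; the `CreateCampaignIntent` and `parseIntent` names are hypothetical, and requiring `playerCount` to be a whole number is an assumption.

```typescript
interface CreateCampaignIntent {
  intent: "create_campaign";
  name: string;
  playerCount: number;
}

// Returns a typed intent, or null if the LLM output does not match the schema.
function parseIntent(raw: string): CreateCampaignIntent | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null; // not JSON at all (for example, prose output)
  }
  if (typeof data !== "object" || data === null) return null;

  const fields = data as Record<string, unknown>;
  const intent = fields["intent"];
  const name = fields["name"];
  const playerCount = fields["playerCount"];

  if (intent !== "create_campaign") return null;
  if (typeof name !== "string") return null;
  // Assumption: player counts are whole numbers.
  if (typeof playerCount !== "number" || Math.floor(playerCount) !== playerCount) return null;

  return { intent: "create_campaign", name, playerCount };
}
```

Prose such as "The user wants to create a campaign..." fails at the `JSON.parse` step and yields `null`, which the caller can treat as a signal to retry or reject.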
36
+
37
+ ### Cost Optimization
38
+
39
+ **Prompt Caching (Critical for AI Agents):**
40
+ - Static rules → System prompt with cache_control: ephemeral (caches for ~5 min, auto-expires)
41
+ - Dynamic data (character state, user input) → User message (no caching)
42
+ - Example: 468-line prompt costs $0.10 without caching, $0.01 with (90% reduction)
43
+ - Cache invalidation: caching is prefix-based, so ANY change to a cached block invalidates that block and every block after it
44
+ - Rule: Change system prompts sparingly; accept one-time cache rebuild cost
45
+
46
+ **Message Architecture:**
47
+ ```typescript
48
+ // ✅ GOOD - Cacheable system prompt
49
+ systemPrompt: [
50
+ { text: STATIC_RULES, cache_control: { type: 'ephemeral' } },
51
+ { text: STATIC_EXAMPLES, cache_control: { type: 'ephemeral' } }
52
+ ]
53
+ userMessage: `Character: ${dynamicState}\nAction: ${userInput}`
54
+
55
+ // ❌ BAD - Uncacheable (character state in system prompt)
56
+ systemPrompt: `Rules + Character: ${dynamicState}`
57
+ ```
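A fuller version of the message architecture above might look like this. It is a sketch under assumptions: the block shape (`type: "text"` plus `cache_control`) follows the Anthropic Messages API as commonly documented, the `buildCachedRequest` name is hypothetical, and fields such as the model name and token limits are omitted. Check the provider's current documentation before relying on it.

```typescript
interface CachedTextBlock {
  type: "text";
  text: string;
  cache_control: { type: "ephemeral" };
}

interface RequestBody {
  system: CachedTextBlock[];
  messages: { role: "user"; content: string }[];
}

// Static content goes into cacheable system blocks; dynamic content goes
// into the user message so it never invalidates the cached prefix.
function buildCachedRequest(
  staticRules: string,
  staticExamples: string,
  dynamicState: string,
  userInput: string,
): RequestBody {
  return {
    system: [
      { type: "text", text: staticRules, cache_control: { type: "ephemeral" } },
      { type: "text", text: staticExamples, cache_control: { type: "ephemeral" } },
    ],
    messages: [
      { role: "user", content: `Character: ${dynamicState}\nAction: ${userInput}` },
    ],
  };
}
```

Keeping `dynamicState` out of the `system` array is the whole point: the system blocks stay byte-identical between calls, so they remain cacheable.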
58
+
59
+ ### Testing AI Outputs
60
+
61
+ **LLM-as-Judge Pattern:**
62
+ - Use LLM to evaluate nuanced qualities (narrative tone, reasoning quality)
63
+ - Avoid brittle keyword matching for creative outputs
64
+ - Define rubrics: EXCELLENT / ACCEPTABLE / POOR with criteria
65
+ - Example: "Does the GM's response show collaborative tone?" vs checking for specific words
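One way to wire up a rubric-based judge is sketched below. Everything here is illustrative: the criteria text, the `buildJudgePrompt` and `parseVerdict` names, and the convention that the judge replies with the rating as its first word are assumptions, and the actual LLM call is left out.

```typescript
type Verdict = "EXCELLENT" | "ACCEPTABLE" | "POOR";

// Hypothetical criteria; replace with rubric text for the quality being judged.
const rubric: Record<Verdict, string> = {
  EXCELLENT: "Invites player input and builds on the player's ideas",
  ACCEPTABLE: "Neutral tone; neither invites nor shuts down input",
  POOR: "Dismisses or overrides the player's ideas",
};

const verdicts: readonly Verdict[] = ["EXCELLENT", "ACCEPTABLE", "POOR"];

// Builds the prompt sent to the judging LLM.
function buildJudgePrompt(question: string, response: string): string {
  const criteria = verdicts.map((v) => `${v}: ${rubric[v]}`).join("\n");
  return `${question}\n\nRubric:\n${criteria}\n\nResponse to evaluate:\n${response}\n\nReply with the rating as the first word.`;
}

// Extracts the rating from the judge's reply; null means "could not parse".
function parseVerdict(judgeReply: string): Verdict | null {
  const firstWord = (judgeReply.trim().split(/\s+/)[0] ?? "")
    .toUpperCase()
    .replace(/[^A-Z]/g, "");
  for (const v of verdicts) {
    if (v === firstWord) return v;
  }
  return null;
}
```

Returning `null` for an unparseable reply lets the test harness distinguish a failed judgment from a `POOR` rating.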
66
+
67
+ **Evaluation Framework:**
68
+ - Unit tests: Pure functions (parsing, validation)
69
+ - Integration tests: Agent + real LLM calls (schema compliance)
70
+ - LLM Evals: Judgment quality (position/effect reasoning, atmosphere)
71
+ - Cost awareness: 30 scenarios ≈ $0.15-0.30 per run with caching
72
+
73
+ ---
74
+
75
+ ## Part 2: Writing Instructions for LLMs
76
+
77
+ **Comprehensive framework:** See @.safeword/guides/llm-instruction-design.md
78
+
79
+ **Quick summary:** When creating documentation that LLMs will read and follow (SAFEWORD.md, CLAUDE.md, testing guides, coding standards), apply 13 core principles:
80
+
81
+ 1. **MECE Principle** - Decision trees must be mutually exclusive and collectively exhaustive
82
+ 2. **Explicit Over Implicit** - Define all terms, never assume LLMs know what you mean
83
+ 3. **No Contradictions** - Different sections must align, LLMs don't reconcile conflicts
84
+ 4. **Concrete Examples Over Abstract Rules** - Show, don't just tell (2-3 examples per rule)
85
+ 5. **Edge Cases Must Be Explicit** - What seems obvious to humans often isn't to LLMs
86
+ 6. **Actionable Over Vague** - Replace subjective terms with optimization rules + red flags
87
+ 7. **Decision Trees: Sequential Over Parallel** - Ordered steps that stop at first match
88
+ 8. **Tie-Breaking Rules** - Tell LLMs how to choose when multiple options apply
89
+ 9. **Lookup Tables for Complex Decisions** - Provide reference tables for complex logic
90
+ 10. **Avoid Caveats in Tables** - Keep patterns clean, parentheticals break LLM pattern matching
91
+ 11. **Percentages: Context or None** - Include adjustment guidance or use principles instead
92
+ 12. **Specificity in Questions** - Use precise technical terms, not general descriptions
93
+ 13. **Re-evaluation Paths** - Provide concrete next steps when LLMs hit dead ends
94
+
95
+ **Also includes:** Anti-patterns to avoid, quality checklist, research-backed principles, and before/after examples.