safeword 0.2.4 → 0.2.5

Files changed (235)
  1. package/dist/check-3NGQ4NR5.js +129 -0
  2. package/dist/check-3NGQ4NR5.js.map +1 -0
  3. package/dist/chunk-2XWIUEQK.js +190 -0
  4. package/dist/chunk-2XWIUEQK.js.map +1 -0
  5. package/dist/chunk-GZRQL3SX.js +146 -0
  6. package/dist/chunk-GZRQL3SX.js.map +1 -0
  7. package/dist/chunk-ORQHKDT2.js +10 -0
  8. package/dist/chunk-ORQHKDT2.js.map +1 -0
  9. package/dist/chunk-W66Z3C5H.js +21 -0
  10. package/dist/chunk-W66Z3C5H.js.map +1 -0
  11. package/dist/cli.d.ts +1 -0
  12. package/dist/cli.js +34 -0
  13. package/dist/cli.js.map +1 -0
  14. package/dist/diff-Y6QTAW4O.js +166 -0
  15. package/dist/diff-Y6QTAW4O.js.map +1 -0
  16. package/dist/index.d.ts +11 -0
  17. package/dist/index.js +7 -0
  18. package/dist/index.js.map +1 -0
  19. package/dist/reset-3ACTIYYE.js +143 -0
  20. package/dist/reset-3ACTIYYE.js.map +1 -0
  21. package/dist/setup-RR4M334C.js +266 -0
  22. package/dist/setup-RR4M334C.js.map +1 -0
  23. package/dist/upgrade-6AR3DHUV.js +134 -0
  24. package/dist/upgrade-6AR3DHUV.js.map +1 -0
  25. package/package.json +44 -19
  26. package/{.safeword → templates}/hooks/agents-md-check.sh +0 -0
  27. package/{.safeword → templates}/hooks/post-tool.sh +0 -0
  28. package/{.safeword → templates}/hooks/pre-commit.sh +0 -0
  29. package/.claude/commands/arch-review.md +0 -32
  30. package/.claude/commands/lint.md +0 -6
  31. package/.claude/commands/quality-review.md +0 -13
  32. package/.claude/commands/setup-linting.md +0 -6
  33. package/.claude/hooks/auto-lint.sh +0 -6
  34. package/.claude/hooks/auto-quality-review.sh +0 -170
  35. package/.claude/hooks/check-linting-sync.sh +0 -17
  36. package/.claude/hooks/inject-timestamp.sh +0 -6
  37. package/.claude/hooks/question-protocol.sh +0 -12
  38. package/.claude/hooks/run-linters.sh +0 -8
  39. package/.claude/hooks/run-quality-review.sh +0 -76
  40. package/.claude/hooks/version-check.sh +0 -10
  41. package/.claude/mcp/README.md +0 -96
  42. package/.claude/mcp/arcade.sample.json +0 -9
  43. package/.claude/mcp/context7.sample.json +0 -7
  44. package/.claude/mcp/playwright.sample.json +0 -7
  45. package/.claude/settings.json +0 -62
  46. package/.claude/skills/quality-reviewer/SKILL.md +0 -190
  47. package/.claude/skills/safeword-quality-reviewer/SKILL.md +0 -13
  48. package/.env.arcade.example +0 -4
  49. package/.env.example +0 -11
  50. package/.gitmodules +0 -4
  51. package/.safeword/SAFEWORD.md +0 -33
  52. package/.safeword/eslint/eslint-base.mjs +0 -101
  53. package/.safeword/guides/architecture-guide.md +0 -404
  54. package/.safeword/guides/code-philosophy.md +0 -174
  55. package/.safeword/guides/context-files-guide.md +0 -405
  56. package/.safeword/guides/data-architecture-guide.md +0 -183
  57. package/.safeword/guides/design-doc-guide.md +0 -165
  58. package/.safeword/guides/learning-extraction.md +0 -515
  59. package/.safeword/guides/llm-instruction-design.md +0 -239
  60. package/.safeword/guides/llm-prompting.md +0 -95
  61. package/.safeword/guides/tdd-best-practices.md +0 -570
  62. package/.safeword/guides/test-definitions-guide.md +0 -243
  63. package/.safeword/guides/testing-methodology.md +0 -573
  64. package/.safeword/guides/user-story-guide.md +0 -237
  65. package/.safeword/guides/zombie-process-cleanup.md +0 -214
  66. package/.safeword/planning/002-user-story-quality-evaluation.md +0 -1840
  67. package/.safeword/planning/003-langsmith-eval-setup-prompt.md +0 -363
  68. package/.safeword/planning/004-llm-eval-test-cases.md +0 -3226
  69. package/.safeword/planning/005-architecture-enforcement-system.md +0 -169
  70. package/.safeword/planning/006-reactive-fix-prevention-research.md +0 -135
  71. package/.safeword/planning/011-cli-ux-vision.md +0 -330
  72. package/.safeword/planning/012-project-structure-cleanup.md +0 -154
  73. package/.safeword/planning/README.md +0 -39
  74. package/.safeword/planning/automation-plan-v2.md +0 -1225
  75. package/.safeword/planning/automation-plan-v3.md +0 -1291
  76. package/.safeword/planning/automation-plan.md +0 -3058
  77. package/.safeword/planning/design/005-cli-implementation.md +0 -343
  78. package/.safeword/planning/design/013-cli-self-contained-templates.md +0 -596
  79. package/.safeword/planning/design/013a-eslint-plugin-suite.md +0 -256
  80. package/.safeword/planning/design/013b-implementation-snippets.md +0 -385
  81. package/.safeword/planning/design/013c-config-isolation-strategy.md +0 -242
  82. package/.safeword/planning/design/code-philosophy-improvements.md +0 -60
  83. package/.safeword/planning/mcp-analysis.md +0 -545
  84. package/.safeword/planning/phase2-subagents-vs-skills-analysis.md +0 -451
  85. package/.safeword/planning/settings-improvements.md +0 -970
  86. package/.safeword/planning/test-definitions/005-cli-implementation.md +0 -1301
  87. package/.safeword/planning/test-definitions/cli-self-contained-templates.md +0 -205
  88. package/.safeword/planning/user-stories/001-guides-review-user-stories.md +0 -1381
  89. package/.safeword/planning/user-stories/003-reactive-fix-prevention.md +0 -132
  90. package/.safeword/planning/user-stories/004-technical-constraints.md +0 -86
  91. package/.safeword/planning/user-stories/005-cli-implementation.md +0 -311
  92. package/.safeword/planning/user-stories/cli-self-contained-templates.md +0 -172
  93. package/.safeword/planning/versioned-distribution.md +0 -740
  94. package/.safeword/prompts/arch-review.md +0 -43
  95. package/.safeword/prompts/quality-review.md +0 -11
  96. package/.safeword/scripts/arch-review.sh +0 -235
  97. package/.safeword/scripts/check-linting-sync.sh +0 -58
  98. package/.safeword/scripts/setup-linting.sh +0 -559
  99. package/.safeword/templates/architecture-template.md +0 -136
  100. package/.safeword/templates/ci/architecture-check.yml +0 -79
  101. package/.safeword/templates/design-doc-template.md +0 -127
  102. package/.safeword/templates/test-definitions-feature.md +0 -100
  103. package/.safeword/templates/ticket-template.md +0 -74
  104. package/.safeword/templates/user-stories-template.md +0 -82
  105. package/.safeword/tickets/001-guides-review-user-stories.md +0 -83
  106. package/.safeword/tickets/002-architecture-enforcement.md +0 -211
  107. package/.safeword/tickets/003-reactive-fix-prevention.md +0 -57
  108. package/.safeword/tickets/004-technical-constraints-in-user-stories.md +0 -39
  109. package/.safeword/tickets/005-cli-implementation.md +0 -248
  110. package/.safeword/tickets/006-flesh-out-skills.md +0 -43
  111. package/.safeword/tickets/007-flesh-out-questioning.md +0 -44
  112. package/.safeword/tickets/008-upgrade-questioning.md +0 -58
  113. package/.safeword/tickets/009-naming-conventions.md +0 -41
  114. package/.safeword/tickets/010-safeword-md-cleanup.md +0 -34
  115. package/.safeword/tickets/011-cursor-setup.md +0 -86
  116. package/.safeword/tickets/README.md +0 -73
  117. package/.safeword/version +0 -1
  118. package/AGENTS.md +0 -59
  119. package/CLAUDE.md +0 -12
  120. package/README.md +0 -347
  121. package/docs/001-cli-implementation-plan.md +0 -856
  122. package/docs/elite-dx-implementation-plan.md +0 -1034
  123. package/framework/README.md +0 -131
  124. package/framework/mcp/README.md +0 -96
  125. package/framework/mcp/arcade.sample.json +0 -8
  126. package/framework/mcp/context7.sample.json +0 -6
  127. package/framework/mcp/playwright.sample.json +0 -6
  128. package/framework/scripts/arch-review.sh +0 -235
  129. package/framework/scripts/check-linting-sync.sh +0 -58
  130. package/framework/scripts/load-env.sh +0 -49
  131. package/framework/scripts/setup-claude.sh +0 -223
  132. package/framework/scripts/setup-linting.sh +0 -559
  133. package/framework/scripts/setup-quality.sh +0 -477
  134. package/framework/scripts/setup-safeword.sh +0 -550
  135. package/framework/templates/ci/architecture-check.yml +0 -78
  136. package/learnings/ai-sdk-v5-breaking-changes.md +0 -178
  137. package/learnings/e2e-test-zombie-processes.md +0 -231
  138. package/learnings/milkdown-crepe-editor-property.md +0 -96
  139. package/learnings/prosemirror-fragment-traversal.md +0 -119
  140. package/packages/cli/AGENTS.md +0 -1
  141. package/packages/cli/ARCHITECTURE.md +0 -279
  142. package/packages/cli/package.json +0 -51
  143. package/packages/cli/src/cli.ts +0 -63
  144. package/packages/cli/src/commands/check.ts +0 -166
  145. package/packages/cli/src/commands/diff.ts +0 -209
  146. package/packages/cli/src/commands/reset.ts +0 -190
  147. package/packages/cli/src/commands/setup.ts +0 -325
  148. package/packages/cli/src/commands/upgrade.ts +0 -163
  149. package/packages/cli/src/index.ts +0 -3
  150. package/packages/cli/src/templates/config.ts +0 -58
  151. package/packages/cli/src/templates/content.ts +0 -18
  152. package/packages/cli/src/templates/index.ts +0 -12
  153. package/packages/cli/src/utils/agents-md.ts +0 -66
  154. package/packages/cli/src/utils/fs.ts +0 -179
  155. package/packages/cli/src/utils/git.ts +0 -124
  156. package/packages/cli/src/utils/hooks.ts +0 -29
  157. package/packages/cli/src/utils/output.ts +0 -60
  158. package/packages/cli/src/utils/project-detector.test.ts +0 -185
  159. package/packages/cli/src/utils/project-detector.ts +0 -44
  160. package/packages/cli/src/utils/version.ts +0 -28
  161. package/packages/cli/src/version.ts +0 -6
  162. package/packages/cli/templates/SAFEWORD.md +0 -776
  163. package/packages/cli/templates/doc-templates/architecture-template.md +0 -136
  164. package/packages/cli/templates/doc-templates/design-doc-template.md +0 -134
  165. package/packages/cli/templates/doc-templates/test-definitions-feature.md +0 -131
  166. package/packages/cli/templates/doc-templates/ticket-template.md +0 -82
  167. package/packages/cli/templates/doc-templates/user-stories-template.md +0 -92
  168. package/packages/cli/templates/guides/architecture-guide.md +0 -423
  169. package/packages/cli/templates/guides/code-philosophy.md +0 -195
  170. package/packages/cli/templates/guides/context-files-guide.md +0 -457
  171. package/packages/cli/templates/guides/data-architecture-guide.md +0 -200
  172. package/packages/cli/templates/guides/design-doc-guide.md +0 -171
  173. package/packages/cli/templates/guides/learning-extraction.md +0 -552
  174. package/packages/cli/templates/guides/llm-instruction-design.md +0 -248
  175. package/packages/cli/templates/guides/llm-prompting.md +0 -102
  176. package/packages/cli/templates/guides/tdd-best-practices.md +0 -615
  177. package/packages/cli/templates/guides/test-definitions-guide.md +0 -334
  178. package/packages/cli/templates/guides/testing-methodology.md +0 -618
  179. package/packages/cli/templates/guides/user-story-guide.md +0 -256
  180. package/packages/cli/templates/guides/zombie-process-cleanup.md +0 -219
  181. package/packages/cli/templates/hooks/agents-md-check.sh +0 -27
  182. package/packages/cli/templates/hooks/post-tool.sh +0 -4
  183. package/packages/cli/templates/hooks/pre-commit.sh +0 -10
  184. package/packages/cli/templates/prompts/arch-review.md +0 -43
  185. package/packages/cli/templates/prompts/quality-review.md +0 -10
  186. package/packages/cli/templates/skills/safeword-quality-reviewer/SKILL.md +0 -207
  187. package/packages/cli/tests/commands/check.test.ts +0 -129
  188. package/packages/cli/tests/commands/cli.test.ts +0 -89
  189. package/packages/cli/tests/commands/diff.test.ts +0 -115
  190. package/packages/cli/tests/commands/reset.test.ts +0 -310
  191. package/packages/cli/tests/commands/self-healing.test.ts +0 -170
  192. package/packages/cli/tests/commands/setup-blocking.test.ts +0 -71
  193. package/packages/cli/tests/commands/setup-core.test.ts +0 -135
  194. package/packages/cli/tests/commands/setup-git.test.ts +0 -139
  195. package/packages/cli/tests/commands/setup-hooks.test.ts +0 -334
  196. package/packages/cli/tests/commands/setup-linting.test.ts +0 -189
  197. package/packages/cli/tests/commands/setup-noninteractive.test.ts +0 -80
  198. package/packages/cli/tests/commands/setup-templates.test.ts +0 -181
  199. package/packages/cli/tests/commands/upgrade.test.ts +0 -215
  200. package/packages/cli/tests/helpers.ts +0 -243
  201. package/packages/cli/tests/npm-package.test.ts +0 -83
  202. package/packages/cli/tests/technical-constraints.test.ts +0 -96
  203. package/packages/cli/tsconfig.json +0 -25
  204. package/packages/cli/tsup.config.ts +0 -11
  205. package/packages/cli/vitest.config.ts +0 -23
  206. package/promptfoo.yaml +0 -3270
  207. package/{framework → templates}/SAFEWORD.md +0 -0
  208. package/{packages/cli/templates → templates}/commands/arch-review.md +0 -0
  209. package/{packages/cli/templates → templates}/commands/lint.md +0 -0
  210. package/{packages/cli/templates → templates}/commands/quality-review.md +0 -0
  211. package/{framework/templates → templates/doc-templates}/architecture-template.md +0 -0
  212. package/{framework/templates → templates/doc-templates}/design-doc-template.md +0 -0
  213. package/{framework/templates → templates/doc-templates}/test-definitions-feature.md +0 -0
  214. package/{framework/templates → templates/doc-templates}/ticket-template.md +0 -0
  215. package/{framework/templates → templates/doc-templates}/user-stories-template.md +0 -0
  216. package/{framework → templates}/guides/architecture-guide.md +0 -0
  217. package/{framework → templates}/guides/code-philosophy.md +0 -0
  218. package/{framework → templates}/guides/context-files-guide.md +0 -0
  219. package/{framework → templates}/guides/data-architecture-guide.md +0 -0
  220. package/{framework → templates}/guides/design-doc-guide.md +0 -0
  221. package/{framework → templates}/guides/learning-extraction.md +0 -0
  222. package/{framework → templates}/guides/llm-instruction-design.md +0 -0
  223. package/{framework → templates}/guides/llm-prompting.md +0 -0
  224. package/{framework → templates}/guides/tdd-best-practices.md +0 -0
  225. package/{framework → templates}/guides/test-definitions-guide.md +0 -0
  226. package/{framework → templates}/guides/testing-methodology.md +0 -0
  227. package/{framework → templates}/guides/user-story-guide.md +0 -0
  228. package/{framework → templates}/guides/zombie-process-cleanup.md +0 -0
  229. package/{packages/cli/templates → templates}/hooks/inject-timestamp.sh +0 -0
  230. package/{packages/cli/templates → templates}/lib/common.sh +0 -0
  231. package/{packages/cli/templates → templates}/lib/jq-fallback.sh +0 -0
  232. package/{packages/cli/templates → templates}/markdownlint.jsonc +0 -0
  233. package/{framework → templates}/prompts/arch-review.md +0 -0
  234. package/{framework → templates}/prompts/quality-review.md +0 -0
  235. package/{framework/skills/quality-reviewer → templates/skills/safeword-quality-reviewer}/SKILL.md +0 -0
package/.safeword/guides/llm-instruction-design.md +0 -239
@@ -1,239 +0,0 @@
1
- # Writing Instructions for LLMs
2
-
3
- **Context:** Documentation that LLMs will read and follow (like AGENTS.md, CLAUDE.md, testing guides, coding standards) calls for different best practices than prompting an LLM directly.
4
-
5
- ## Core Principles
6
-
7
- **1. MECE Principle (Mutually Exclusive, Collectively Exhaustive)**
8
-
9
- Decision trees and categorization must have no overlap and cover all cases. LLMs struggle with overlapping categories; the MECE framework, popularized by McKinsey and BCG, keeps every decision path unambiguous.
10
-
11
- ```markdown
12
- ❌ BAD - Not mutually exclusive:
13
- ├─ Pure function?
14
- ├─ Multiple components interacting?
15
- └─ Full user flow?
16
-
17
- Problem: A function with database calls could match more than one branch
18
-
19
- ✅ GOOD - Sequential, mutually exclusive:
20
- 1. AI content quality? → LLM Eval
21
- 2. Requires real browser? → E2E test
22
- 3. Multiple components? → Integration test
23
- 4. Pure function? → Unit test
24
-
25
- Stops at first match, no ambiguity.
26
- ```
27
-
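The sequential tree above can be sketched as an ordered rule list where the first matching question decides the answer. This is a minimal illustration only: `Behavior`, `orderedRules`, and `chooseTestType` are hypothetical names, and treating "Unit test" as the fallback when no earlier question matches is an assumption made to keep the tree exhaustive.

```typescript
// Hypothetical names throughout; the question order mirrors the GOOD tree above.
type TestType = "LLM Eval" | "E2E test" | "Integration test" | "Unit test";

interface Behavior {
  judgesAiContentQuality: boolean;
  needsRealBrowser: boolean;
  spansMultipleComponents: boolean;
}

interface Rule {
  matches: (b: Behavior) => boolean;
  result: TestType;
}

// Ordered rules: evaluation stops at the first predicate that returns true.
const orderedRules: Rule[] = [
  { matches: (b) => b.judgesAiContentQuality, result: "LLM Eval" },
  { matches: (b) => b.needsRealBrowser, result: "E2E test" },
  { matches: (b) => b.spansMultipleComponents, result: "Integration test" },
];

function chooseTestType(b: Behavior): TestType {
  for (const rule of orderedRules) {
    if (rule.matches(b)) return rule.result;
  }
  // Assumption: anything that reaches this point is treated as a pure function.
  return "Unit test";
}
```

Because evaluation stops at the first match, a behavior that both needs a real browser and spans several components resolves to "E2E test" with no ambiguity.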
28
- **2. Explicit Over Implicit**
29
-
30
- Never assume LLMs know what you mean. Define all terms, even "obvious" ones.
31
-
32
- ```markdown
33
- ❌ BAD: "Test at the lowest level"
34
- ✅ GOOD: "Test with the fastest test type that can catch the bug"
35
-
36
- Examples needing definition:
37
- - "Critical paths" → Always critical: auth, payment. Rarely: UI polish, admin
38
- - "Browser" → Real browser (Playwright/Cypress), not jsdom
39
- - "Pure function" → Input → output, no I/O (define edge cases like Date.now())
40
- ```
41
-
42
- **3. No Contradictions**
43
-
44
- Different sections must align. LLMs don't reconcile conflicting guidance. When updating, grep for related terms and update all references.
45
-
46
- ```markdown
47
- ❌ BAD:
48
- Section A: "Write E2E tests only for critical user paths"
49
- Section B: "All user-facing features have at least one E2E test"
50
-
51
- ✅ GOOD:
52
- Section A: "Write E2E tests only for critical user paths"
53
- Section B: "All critical multi-page user flows have at least one E2E test"
54
- + Definition of "critical" with examples
55
- ```
56
-
57
- **4. Concrete Examples Over Abstract Rules**
58
-
59
- Show, don't just tell. LLMs learn patterns from examples. For every rule, include 2-3 concrete examples showing good vs bad.
60
-
61
- ```markdown
62
- ❌ BAD: "Follow best practices for testing"
63
-
64
- ✅ GOOD:
65
- // ❌ BAD - Testing business logic with E2E
66
- test('discount calculation', async ({ page }) => {
67
- await page.goto('/checkout')
68
- await page.fill('[name="price"]', '100')
69
- await expect(page.locator('.total')).toContainText('80')
70
- })
71
-
72
- // ✅ GOOD - Unit test (runs in milliseconds)
73
- it('applies 20% discount', () => {
74
- expect(calculateDiscount(100, 0.20)).toBe(80)
75
- })
76
- ```
77
-
78
- **5. Edge Cases Must Be Explicit**
79
-
80
- What seems obvious to humans often isn't to LLMs. After stating a rule, add an "Edge cases:" section with common confusing scenarios.
81
-
82
- ```markdown
83
- ❌ BAD: "Unit test pure functions"
84
-
85
- ✅ GOOD: "Unit test pure functions"
86
-
87
- Edge cases:
88
- - Non-deterministic functions (Math.random(), Date.now()) → Unit test with mocked randomness/time
89
- - Environment dependencies (process.env) → Integration test
90
- - Mixed pure + I/O → Extract pure part, unit test separately
91
- ```
92
-
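The first edge case above (a function that depends on `Date.now()`) might be handled as in the following sketch, which injects the clock so a unit test can fix the time. `Clock` and `isExpired` are hypothetical names chosen for illustration, not part of the guide.

```typescript
// A clock is injected so the function stays deterministic under test.
type Clock = () => number;

// Hypothetical example function: true once the current time reaches the expiry time.
function isExpired(expiresAtMs: number, now: Clock = Date.now): boolean {
  return now() >= expiresAtMs;
}

// Unit test style usage with a fixed clock instead of the real Date.now().
const fixedClock: Clock = () => 1000;
console.log(isExpired(999, fixedClock)); // true
console.log(isExpired(1001, fixedClock)); // false
```

Production callers omit the second argument and get the real clock, so the mock never leaks outside the test.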
93
- **6. Actionable Over Vague**
94
-
95
- Give LLMs concrete actions, not subjective guidance. Replace subjective terms (most/some/few) with optimization rules + red flags.
96
-
97
- ```markdown
98
- ❌ BAD: "Most tests: Fast, Some tests: Slow"
99
-
100
- ✅ GOOD:
101
- - Write as many fast tests as possible
102
- - Write E2E tests only for critical paths requiring a browser
103
- - Red flag: If you have more E2E tests than integration tests, the suite is too slow
104
- ```
105
-
106
- **7. Decision Trees: Sequential Over Parallel**
107
-
108
- Structure decisions as ordered steps, not simultaneous checks. Sequential questions force the LLM through a deterministic decision path.
109
-
110
- ```markdown
111
- ❌ BAD - Parallel branches:
112
- ├─ Pure function?
113
- ├─ Multiple components?
114
- └─ Full user flow?
115
-
116
- ✅ GOOD - Sequential (see Principle 1 example above)
117
- Answer questions IN ORDER. Stop at the first match.
118
- ```
119
-
120
- **8. Tie-Breaking Rules**
121
-
122
- When multiple options could apply, tell LLMs how to choose.
123
-
124
- ```markdown
125
- ✅ GOOD:
126
- "If multiple test types can catch the bug, choose the fastest one."
127
-
128
- Reference in decision trees:
129
- "If multiple seem to apply, use the tie-breaking rule stated above: choose the faster one."
130
- ```
131
-
132
- **9. Lookup Tables for Complex Decisions**
133
-
134
- When decision logic has 3+ branches, nested conditions, or multiple variables to consider, provide a reference table.
135
-
136
- ```markdown
137
- | Bug Type | Unit? | Integration? | E2E? | Best Choice |
138
- |----------|-------|--------------|------|-------------|
139
- | Calculation error | ✅ | ✅ | ✅ | Unit (fastest) |
140
- | Database query bug | ❌ | ✅ | ✅ | Integration |
141
- | CSS layout broken | ❌ | ❌ | ✅ | E2E (only option) |
142
- ```
143
-
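The same lookup table can also be expressed as data, with the "Best Choice" column derived by the tie-breaking rule (choose the fastest test type that can catch the bug). This is a sketch under that assumption; `canCatch`, `fastestFirst`, and `bestChoice` are hypothetical names.

```typescript
type Level = "Unit" | "Integration" | "E2E";
type BugType = "Calculation error" | "Database query bug" | "CSS layout broken";

// Test levels ordered fastest first, so the first capable entry is the best choice.
const fastestFirst: Level[] = ["Unit", "Integration", "E2E"];

// One row per bug type, mirroring the table above.
const canCatch: Record<BugType, Record<Level, boolean>> = {
  "Calculation error": { Unit: true, Integration: true, E2E: true },
  "Database query bug": { Unit: false, Integration: true, E2E: true },
  "CSS layout broken": { Unit: false, Integration: false, E2E: true },
};

function bestChoice(bug: BugType): Level {
  const choice = fastestFirst.find((level) => canCatch[bug][level]);
  if (choice === undefined) throw new Error(`No test level catches: ${bug}`);
  return choice;
}
```

Deriving the last column from the other three keeps the table and the tie-breaking rule from drifting apart.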
144
- **10. Avoid Caveats in Tables**
145
-
146
- Keep patterns clean. Parentheticals break LLM pattern matching. Add separate rows for caveat cases.
147
-
148
- ```markdown
149
- ❌ BAD: | State management bug | ❌ NO (if mocked) | ✅ YES |
150
- ✅ GOOD: | State management bug (Zustand, Redux) | ❌ NO | ✅ YES |
151
- ```
152
-
153
- **11. Percentages: Context or None**
154
-
155
- Don't use percentages without adjustment guidance.
156
-
157
- ```markdown
158
- ❌ BAD: "70% unit tests, 20% integration, 10% E2E"
159
-
160
- ✅ BETTER: "Baseline: 70/20/10. Adjust: Microservices → 60/30/10, UI-heavy → 60/20/20"
161
-
162
- ✅ BEST: "Write as many fast tests as possible. Red flag: More E2E than integration = too slow."
163
- ```
164
-
165
- **12. Specificity in Questions**
166
-
167
- Use precise technical terms, not general descriptions.
168
-
169
- ```markdown
170
- ❌ BAD: "Does this require seeing the UI?"
171
- ✅ GOOD: "Does this require a real browser (Playwright/Cypress)?"
172
-
173
- Note: React Testing Library does NOT require a browser - that's integration testing.
174
- ```
175
-
176
- **13. Re-evaluation Paths**
177
-
178
- When LLMs hit dead ends, provide concrete next steps.
179
-
180
- ```markdown
181
- ❌ BAD: "If none of the above apply, re-evaluate your approach"
182
-
183
- ✅ GOOD: "If testing behavior that doesn't fit the categories:
184
- 1. Break it down: Separate pure logic from I/O/UI concerns
185
- 2. Test each piece: Pure → Unit, I/O → Integration, Multi-page → E2E
186
- 3. Example: Login validation
187
- - isValidEmail(email) → Unit test
188
- - checkUserExists(email) → Integration test (database)
189
- - Login form → Dashboard → E2E test (multi-page)"
190
- ```
191
-
192
- ## Anti-Patterns to Avoid
193
-
194
- ❌ **Visual metaphors** - Pyramids, icebergs—LLMs don't process visual information well
195
- ❌ **Undefined jargon** - "Technical debt", "code smell" need definitions
196
- ❌ **Competing guidance** - Multiple decision frameworks that contradict each other
197
- ❌ **Outdated references** - Removing a concept but forgetting to update all mentions of it
198
-
199
- ## Quality Checklist
200
-
201
- Before saving/committing LLM-consumable documentation:
202
-
203
- - [ ] Decision trees follow MECE principle (mutually exclusive, collectively exhaustive)
204
- - [ ] Technical terms explicitly defined
205
- - [ ] No contradictions between sections
206
- - [ ] Every rule has 2-3 concrete examples (good vs bad)
207
- - [ ] Edge cases explicitly covered
208
- - [ ] Vague terms replaced with actionable principles
209
- - [ ] Tie-breaking rules provided
210
- - [ ] Complex decisions (3+ branches) have lookup tables
211
- - [ ] Dead-end paths have re-evaluation steps with examples
212
-
213
- ## Research-Backed Principles
214
-
215
- - **MECE (McKinsey):** Mutually exclusive, collectively exhaustive decision trees for reliable LLM decisions
216
- - **Prompt ambiguity (2025):** "Ambiguity is one of the most common causes of poor LLM output" (Zero-Shot Decision Tree Construction)
217
- - **Concrete examples (2025):** Structured approaches with concrete examples consistently improve performance over "act as" or "###" techniques
218
-
219
- ## Example: Before and After
220
-
221
- **Before (ambiguous):**
222
- ```markdown
223
- Follow the test pyramid: lots of unit tests, some integration tests, few E2E tests.
224
- ```
225
-
226
- **After (LLM-optimized):**
227
- ```markdown
228
- Answer these questions IN ORDER to choose test type:
229
-
230
- 1. Pure function (input → output, no I/O)? → Unit test
231
- 2. Multiple components/services interacting? → Integration test
232
- 3. Requires real browser (Playwright)? → E2E test
233
-
234
- If multiple apply: choose the faster one.
235
-
236
- Edge cases:
237
- - React components with React Testing Library → Integration (not E2E, no real browser)
238
- - Non-deterministic functions (Date.now()) → Unit test with mocked time
239
- ```
package/.safeword/guides/llm-prompting.md +0 -95
@@ -1,95 +0,0 @@
1
- # LLM Prompting Best Practices
2
-
3
- This guide covers two related topics:
4
-
5
- **Part 1: Prompting LLMs** - How to structure prompts when actively using an LLM (API calls, chat interactions)
6
-
7
- **Part 2: Writing Instructions for LLMs** - How to write documentation that LLMs will read and follow (SAFEWORD.md, CLAUDE.md, testing guides, coding standards)
8
-
9
- ---
10
-
11
- ## Part 1: Prompting LLMs
12
-
13
- ### Prompt Engineering Principles
14
-
15
- **Concrete Examples Over Abstract Rules:**
16
- - ✅ Good: Show "❌ BAD" vs "✅ GOOD" code examples
17
- - ❌ Bad: "Follow best practices" (too vague)
18
-
19
- **"Why" Over "What":**
20
- - Explain architectural trade-offs and reasoning
21
- - Include specific numbers (90% cost reduction, 3x faster)
22
- - Document gotchas with explanations
23
-
24
- **Structured Outputs:**
25
- - Use JSON mode for predictable LLM responses
26
- - Define explicit schemas with validation
27
- - Return structured data, not prose
28
-
29
- ```typescript
30
- // ❌ BAD - Prose output
31
- "The user wants to create a campaign named 'Shadows' with 4 players"
32
-
33
- // ✅ GOOD - Structured JSON
34
- { "intent": "create_campaign", "name": "Shadows", "playerCount": 4 }
35
- ```
36
-
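"Define explicit schemas with validation" could look like the following hand-rolled check for the JSON shape shown above. It is a sketch only: `CreateCampaignIntent` and `parseIntent` are hypothetical names, and the specific constraints (non-empty name, positive integer player count) are assumptions rather than rules from the guide.

```typescript
// Hypothetical schema for the structured output shown above.
interface CreateCampaignIntent {
  intent: "create_campaign";
  name: string;
  playerCount: number;
}

// Parse the raw LLM reply and reject anything that does not match the schema.
function parseIntent(raw: string): CreateCampaignIntent {
  const data: unknown = JSON.parse(raw);
  if (typeof data !== "object" || data === null) {
    throw new Error("Expected a JSON object");
  }
  const { intent, name, playerCount } = data as Record<string, unknown>;
  if (intent !== "create_campaign") {
    throw new Error("Unknown intent");
  }
  if (typeof name !== "string" || name.length === 0) {
    throw new Error("name must be a non-empty string");
  }
  // Assumed constraint: a campaign needs a positive whole number of players.
  if (typeof playerCount !== "number" || !Number.isInteger(playerCount) || playerCount < 1) {
    throw new Error("playerCount must be a positive integer");
  }
  return { intent: "create_campaign", name, playerCount };
}
```

A prose reply fails at `JSON.parse`, which is the point: the caller either gets a typed value or a clear error, never a half-understood sentence.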
37
- ### Cost Optimization
38
-
39
- **Prompt Caching (Critical for AI Agents):**
40
- - Static rules → System prompt with cache_control: ephemeral (caches for ~5 min, auto-expires)
41
- - Dynamic data (character state, user input) → User message (no caching)
42
- - Example: 468-line prompt costs $0.10 without caching, $0.01 with (90% reduction)
43
- - Cache invalidation: ANY change to cached blocks breaks ALL caches
44
- - Rule: Change system prompts sparingly; accept one-time cache rebuild cost
45
-
46
- **Message Architecture:**
47
- ```typescript
48
- // ✅ GOOD - Cacheable system prompt
49
- systemPrompt: [
50
- { text: STATIC_RULES, cache_control: { type: 'ephemeral' } },
51
- { text: STATIC_EXAMPLES, cache_control: { type: 'ephemeral' } }
52
- ]
53
- userMessage: `Character: ${dynamicState}\nAction: ${userInput}`
54
-
55
- // ❌ BAD - Uncacheable (character state in system prompt)
56
- systemPrompt: `Rules + Character: ${dynamicState}`
57
- ```
58
-
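The split between static and dynamic content can be sketched as a small request builder. No SDK is called here; `PromptRequest`, `SystemBlock`, `buildRequest`, and the placeholder constants are hypothetical, and the block shape (a `type: 'text'` field alongside `cache_control`) is an assumption about the target API rather than something the guide specifies.

```typescript
// Assumed block shape; only the cache_control convention comes from the snippet above.
interface SystemBlock {
  type: "text";
  text: string;
  cache_control?: { type: "ephemeral" };
}

interface PromptRequest {
  system: SystemBlock[];
  userMessage: string;
}

// Placeholder static content; real rules and examples would be much longer.
const STATIC_RULES = "Rules: ...";
const STATIC_EXAMPLES = "Examples: ...";

// Static text goes in cacheable system blocks; per-request data goes in the user message.
function buildRequest(dynamicState: string, userInput: string): PromptRequest {
  return {
    system: [
      { type: "text", text: STATIC_RULES, cache_control: { type: "ephemeral" } },
      { type: "text", text: STATIC_EXAMPLES, cache_control: { type: "ephemeral" } },
    ],
    userMessage: `Character: ${dynamicState}\nAction: ${userInput}`,
  };
}
```

Two requests with different character state produce byte-identical `system` arrays, which is the property a cache needs in order to hit.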
59
- ### Testing AI Outputs
60
-
61
- **LLM-as-Judge Pattern:**
62
- - Use LLM to evaluate nuanced qualities (narrative tone, reasoning quality)
63
- - Avoid brittle keyword matching for creative outputs
64
- - Define rubrics: EXCELLENT / ACCEPTABLE / POOR with criteria
65
- - Example: "Does the GM's response show collaborative tone?" vs checking for specific words
66
-
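A rubric-based judge might be wired up roughly as follows. The sketch builds the judge prompt and parses the reply, leaving the actual LLM call to the caller. `rubric`, `buildJudgePrompt`, and `parseVerdict` are hypothetical names, and the criteria strings are illustrative placeholders, not criteria from the guide.

```typescript
type Verdict = "EXCELLENT" | "ACCEPTABLE" | "POOR";

// Illustrative placeholder criteria; a real rubric would be specific to the output being judged.
const rubric: Record<Verdict, string> = {
  EXCELLENT: "Fully meets the quality described in the question",
  ACCEPTABLE: "Mostly meets it, with minor lapses",
  POOR: "Does not meet it",
};

// Build a prompt that asks the judge model for exactly one rubric label.
function buildJudgePrompt(question: string, output: string): string {
  const criteria = (Object.keys(rubric) as Verdict[])
    .map((label) => `${label}: ${rubric[label]}`)
    .join("\n");
  return `${question}\n\nRubric:\n${criteria}\n\nOutput to evaluate:\n${output}\n\nReply with one label only.`;
}

// Accept only an exact rubric label so malformed judge replies fail loudly.
function parseVerdict(reply: string): Verdict {
  const label = reply.trim().toUpperCase();
  if (label === "EXCELLENT" || label === "ACCEPTABLE" || label === "POOR") {
    return label;
  }
  throw new Error(`Unrecognized verdict: ${reply}`);
}
```

A test can then assert on the parsed verdict (for example, that it is not `POOR`) instead of matching specific words in the creative output.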
67
- **Evaluation Framework:**
68
- - Unit tests: Pure functions (parsing, validation)
69
- - Integration tests: Agent + real LLM calls (schema compliance)
70
- - LLM Evals: Judgment quality (position/effect reasoning, atmosphere)
71
- - Cost awareness: 30 scenarios ≈ $0.15-0.30 per run with caching
72
-
73
- ---
74
-
75
- ## Part 2: Writing Instructions for LLMs
76
-
77
- **Comprehensive framework:** See @.safeword/guides/llm-instruction-design.md
78
-
79
- **Quick summary:** When creating documentation that LLMs will read and follow (SAFEWORD.md, CLAUDE.md, testing guides, coding standards), apply 13 core principles:
80
-
81
- 1. **MECE Principle** - Decision trees must be mutually exclusive and collectively exhaustive
82
- 2. **Explicit Over Implicit** - Define all terms, never assume LLMs know what you mean
83
- 3. **No Contradictions** - Different sections must align, LLMs don't reconcile conflicts
84
- 4. **Concrete Examples Over Abstract Rules** - Show, don't just tell (2-3 examples per rule)
85
- 5. **Edge Cases Must Be Explicit** - What seems obvious to humans often isn't to LLMs
86
- 6. **Actionable Over Vague** - Replace subjective terms with optimization rules + red flags
87
- 7. **Decision Trees: Sequential Over Parallel** - Ordered steps that stop at first match
88
- 8. **Tie-Breaking Rules** - Tell LLMs how to choose when multiple options apply
89
- 9. **Lookup Tables for Complex Decisions** - Provide reference tables for complex logic
90
- 10. **Avoid Caveats in Tables** - Keep patterns clean, parentheticals break LLM pattern matching
91
- 11. **Percentages: Context or None** - Include adjustment guidance or use principles instead
92
- 12. **Specificity in Questions** - Use precise technical terms, not general descriptions
93
- 13. **Re-evaluation Paths** - Provide concrete next steps when LLMs hit dead ends
94
-
95
- **Also includes:** Anti-patterns to avoid, quality checklist, research-backed principles, and before/after examples.