npm - safeword - Versions diffs - 0.2.3 → 0.2.4 - Mend

safeword 0.2.3 → 0.2.4

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (235)

package/.claude/commands/arch-review.md +32 -0
package/.claude/commands/lint.md +6 -0
package/.claude/commands/quality-review.md +13 -0
package/.claude/commands/setup-linting.md +6 -0
package/.claude/hooks/auto-lint.sh +6 -0
package/.claude/hooks/auto-quality-review.sh +170 -0
package/.claude/hooks/check-linting-sync.sh +17 -0
package/.claude/hooks/inject-timestamp.sh +6 -0
package/.claude/hooks/question-protocol.sh +12 -0
package/.claude/hooks/run-linters.sh +8 -0
package/.claude/hooks/run-quality-review.sh +76 -0
package/.claude/hooks/version-check.sh +10 -0
package/.claude/mcp/README.md +96 -0
package/.claude/mcp/arcade.sample.json +9 -0
package/.claude/mcp/context7.sample.json +7 -0
package/.claude/mcp/playwright.sample.json +7 -0
package/.claude/settings.json +62 -0
package/.claude/skills/quality-reviewer/SKILL.md +190 -0
package/.claude/skills/safeword-quality-reviewer/SKILL.md +13 -0
package/.env.arcade.example +4 -0
package/.env.example +11 -0
package/.gitmodules +4 -0
package/.safeword/SAFEWORD.md +33 -0
package/.safeword/eslint/eslint-base.mjs +101 -0
package/.safeword/guides/architecture-guide.md +404 -0
package/.safeword/guides/code-philosophy.md +174 -0
package/.safeword/guides/context-files-guide.md +405 -0
package/.safeword/guides/data-architecture-guide.md +183 -0
package/.safeword/guides/design-doc-guide.md +165 -0
package/.safeword/guides/learning-extraction.md +515 -0
package/.safeword/guides/llm-instruction-design.md +239 -0
package/.safeword/guides/llm-prompting.md +95 -0
package/.safeword/guides/tdd-best-practices.md +570 -0
package/.safeword/guides/test-definitions-guide.md +243 -0
package/.safeword/guides/testing-methodology.md +573 -0
package/.safeword/guides/user-story-guide.md +237 -0
package/.safeword/guides/zombie-process-cleanup.md +214 -0
package/{templates → .safeword}/hooks/agents-md-check.sh +0 -0
package/{templates → .safeword}/hooks/post-tool.sh +0 -0
package/{templates → .safeword}/hooks/pre-commit.sh +0 -0
package/.safeword/planning/002-user-story-quality-evaluation.md +1840 -0
package/.safeword/planning/003-langsmith-eval-setup-prompt.md +363 -0
package/.safeword/planning/004-llm-eval-test-cases.md +3226 -0
package/.safeword/planning/005-architecture-enforcement-system.md +169 -0
package/.safeword/planning/006-reactive-fix-prevention-research.md +135 -0
package/.safeword/planning/011-cli-ux-vision.md +330 -0
package/.safeword/planning/012-project-structure-cleanup.md +154 -0
package/.safeword/planning/README.md +39 -0
package/.safeword/planning/automation-plan-v2.md +1225 -0
package/.safeword/planning/automation-plan-v3.md +1291 -0
package/.safeword/planning/automation-plan.md +3058 -0
package/.safeword/planning/design/005-cli-implementation.md +343 -0
package/.safeword/planning/design/013-cli-self-contained-templates.md +596 -0
package/.safeword/planning/design/013a-eslint-plugin-suite.md +256 -0
package/.safeword/planning/design/013b-implementation-snippets.md +385 -0
package/.safeword/planning/design/013c-config-isolation-strategy.md +242 -0
package/.safeword/planning/design/code-philosophy-improvements.md +60 -0
package/.safeword/planning/mcp-analysis.md +545 -0
package/.safeword/planning/phase2-subagents-vs-skills-analysis.md +451 -0
package/.safeword/planning/settings-improvements.md +970 -0
package/.safeword/planning/test-definitions/005-cli-implementation.md +1301 -0
package/.safeword/planning/test-definitions/cli-self-contained-templates.md +205 -0
package/.safeword/planning/user-stories/001-guides-review-user-stories.md +1381 -0
package/.safeword/planning/user-stories/003-reactive-fix-prevention.md +132 -0
package/.safeword/planning/user-stories/004-technical-constraints.md +86 -0
package/.safeword/planning/user-stories/005-cli-implementation.md +311 -0
package/.safeword/planning/user-stories/cli-self-contained-templates.md +172 -0
package/.safeword/planning/versioned-distribution.md +740 -0
package/.safeword/prompts/arch-review.md +43 -0
package/.safeword/prompts/quality-review.md +11 -0
package/.safeword/scripts/arch-review.sh +235 -0
package/.safeword/scripts/check-linting-sync.sh +58 -0
package/.safeword/scripts/setup-linting.sh +559 -0
package/.safeword/templates/architecture-template.md +136 -0
package/.safeword/templates/ci/architecture-check.yml +79 -0
package/.safeword/templates/design-doc-template.md +127 -0
package/.safeword/templates/test-definitions-feature.md +100 -0
package/.safeword/templates/ticket-template.md +74 -0
package/.safeword/templates/user-stories-template.md +82 -0
package/.safeword/tickets/001-guides-review-user-stories.md +83 -0
package/.safeword/tickets/002-architecture-enforcement.md +211 -0
package/.safeword/tickets/003-reactive-fix-prevention.md +57 -0
package/.safeword/tickets/004-technical-constraints-in-user-stories.md +39 -0
package/.safeword/tickets/005-cli-implementation.md +248 -0
package/.safeword/tickets/006-flesh-out-skills.md +43 -0
package/.safeword/tickets/007-flesh-out-questioning.md +44 -0
package/.safeword/tickets/008-upgrade-questioning.md +58 -0
package/.safeword/tickets/009-naming-conventions.md +41 -0
package/.safeword/tickets/010-safeword-md-cleanup.md +34 -0
package/.safeword/tickets/011-cursor-setup.md +86 -0
package/.safeword/tickets/README.md +73 -0
package/.safeword/version +1 -0
package/AGENTS.md +59 -0
package/CLAUDE.md +12 -0
package/README.md +347 -0
package/docs/001-cli-implementation-plan.md +856 -0
package/docs/elite-dx-implementation-plan.md +1034 -0
package/framework/README.md +131 -0
package/framework/mcp/README.md +96 -0
package/framework/mcp/arcade.sample.json +8 -0
package/framework/mcp/context7.sample.json +6 -0
package/framework/mcp/playwright.sample.json +6 -0
package/framework/scripts/arch-review.sh +235 -0
package/framework/scripts/check-linting-sync.sh +58 -0
package/framework/scripts/load-env.sh +49 -0
package/framework/scripts/setup-claude.sh +223 -0
package/framework/scripts/setup-linting.sh +559 -0
package/framework/scripts/setup-quality.sh +477 -0
package/framework/scripts/setup-safeword.sh +550 -0
package/framework/templates/ci/architecture-check.yml +78 -0
package/learnings/ai-sdk-v5-breaking-changes.md +178 -0
package/learnings/e2e-test-zombie-processes.md +231 -0
package/learnings/milkdown-crepe-editor-property.md +96 -0
package/learnings/prosemirror-fragment-traversal.md +119 -0
package/package.json +19 -43
package/packages/cli/AGENTS.md +1 -0
package/packages/cli/ARCHITECTURE.md +279 -0
package/packages/cli/package.json +51 -0
package/packages/cli/src/cli.ts +63 -0
package/packages/cli/src/commands/check.ts +166 -0
package/packages/cli/src/commands/diff.ts +209 -0
package/packages/cli/src/commands/reset.ts +190 -0
package/packages/cli/src/commands/setup.ts +325 -0
package/packages/cli/src/commands/upgrade.ts +163 -0
package/packages/cli/src/index.ts +3 -0
package/packages/cli/src/templates/config.ts +58 -0
package/packages/cli/src/templates/content.ts +18 -0
package/packages/cli/src/templates/index.ts +12 -0
package/packages/cli/src/utils/agents-md.ts +66 -0
package/packages/cli/src/utils/fs.ts +179 -0
package/packages/cli/src/utils/git.ts +124 -0
package/packages/cli/src/utils/hooks.ts +29 -0
package/packages/cli/src/utils/output.ts +60 -0
package/packages/cli/src/utils/project-detector.test.ts +185 -0
package/packages/cli/src/utils/project-detector.ts +44 -0
package/packages/cli/src/utils/version.ts +28 -0
package/packages/cli/src/version.ts +6 -0
package/packages/cli/templates/SAFEWORD.md +776 -0
package/packages/cli/templates/doc-templates/architecture-template.md +136 -0
package/packages/cli/templates/doc-templates/design-doc-template.md +134 -0
package/packages/cli/templates/doc-templates/test-definitions-feature.md +131 -0
package/packages/cli/templates/doc-templates/ticket-template.md +82 -0
package/packages/cli/templates/doc-templates/user-stories-template.md +92 -0
package/packages/cli/templates/guides/architecture-guide.md +423 -0
package/packages/cli/templates/guides/code-philosophy.md +195 -0
package/packages/cli/templates/guides/context-files-guide.md +457 -0
package/packages/cli/templates/guides/data-architecture-guide.md +200 -0
package/packages/cli/templates/guides/design-doc-guide.md +171 -0
package/packages/cli/templates/guides/learning-extraction.md +552 -0
package/packages/cli/templates/guides/llm-instruction-design.md +248 -0
package/packages/cli/templates/guides/llm-prompting.md +102 -0
package/packages/cli/templates/guides/tdd-best-practices.md +615 -0
package/packages/cli/templates/guides/test-definitions-guide.md +334 -0
package/packages/cli/templates/guides/testing-methodology.md +618 -0
package/packages/cli/templates/guides/user-story-guide.md +256 -0
package/packages/cli/templates/guides/zombie-process-cleanup.md +219 -0
package/packages/cli/templates/hooks/agents-md-check.sh +27 -0
package/packages/cli/templates/hooks/post-tool.sh +4 -0
package/packages/cli/templates/hooks/pre-commit.sh +10 -0
package/packages/cli/templates/prompts/arch-review.md +43 -0
package/packages/cli/templates/prompts/quality-review.md +10 -0
package/packages/cli/templates/skills/safeword-quality-reviewer/SKILL.md +207 -0
package/packages/cli/tests/commands/check.test.ts +129 -0
package/packages/cli/tests/commands/cli.test.ts +89 -0
package/packages/cli/tests/commands/diff.test.ts +115 -0
package/packages/cli/tests/commands/reset.test.ts +310 -0
package/packages/cli/tests/commands/self-healing.test.ts +170 -0
package/packages/cli/tests/commands/setup-blocking.test.ts +71 -0
package/packages/cli/tests/commands/setup-core.test.ts +135 -0
package/packages/cli/tests/commands/setup-git.test.ts +139 -0
package/packages/cli/tests/commands/setup-hooks.test.ts +334 -0
package/packages/cli/tests/commands/setup-linting.test.ts +189 -0
package/packages/cli/tests/commands/setup-noninteractive.test.ts +80 -0
package/packages/cli/tests/commands/setup-templates.test.ts +181 -0
package/packages/cli/tests/commands/upgrade.test.ts +215 -0
package/packages/cli/tests/helpers.ts +243 -0
package/packages/cli/tests/npm-package.test.ts +83 -0
package/packages/cli/tests/technical-constraints.test.ts +96 -0
package/packages/cli/tsconfig.json +25 -0
package/packages/cli/tsup.config.ts +11 -0
package/packages/cli/vitest.config.ts +23 -0
package/promptfoo.yaml +3270 -0
package/dist/check-3NGQ4NR5.js +0 -129
package/dist/check-3NGQ4NR5.js.map +0 -1
package/dist/chunk-2XWIUEQK.js +0 -190
package/dist/chunk-2XWIUEQK.js.map +0 -1
package/dist/chunk-GZRQL3SX.js +0 -146
package/dist/chunk-GZRQL3SX.js.map +0 -1
package/dist/chunk-ORQHKDT2.js +0 -10
package/dist/chunk-ORQHKDT2.js.map +0 -1
package/dist/chunk-W66Z3C5H.js +0 -21
package/dist/chunk-W66Z3C5H.js.map +0 -1
package/dist/cli.d.ts +0 -1
package/dist/cli.js +0 -34
package/dist/cli.js.map +0 -1
package/dist/diff-Y6QTAW4O.js +0 -166
package/dist/diff-Y6QTAW4O.js.map +0 -1
package/dist/index.d.ts +0 -11
package/dist/index.js +0 -7
package/dist/index.js.map +0 -1
package/dist/reset-3ACTIYYE.js +0 -143
package/dist/reset-3ACTIYYE.js.map +0 -1
package/dist/setup-RR4M334C.js +0 -266
package/dist/setup-RR4M334C.js.map +0 -1
package/dist/upgrade-6AR3DHUV.js +0 -134
package/dist/upgrade-6AR3DHUV.js.map +0 -1
/package/{templates → framework}/SAFEWORD.md +0 -0
/package/{templates → framework}/guides/architecture-guide.md +0 -0
/package/{templates → framework}/guides/code-philosophy.md +0 -0
/package/{templates → framework}/guides/context-files-guide.md +0 -0
/package/{templates → framework}/guides/data-architecture-guide.md +0 -0
/package/{templates → framework}/guides/design-doc-guide.md +0 -0
/package/{templates → framework}/guides/learning-extraction.md +0 -0
/package/{templates → framework}/guides/llm-instruction-design.md +0 -0
/package/{templates → framework}/guides/llm-prompting.md +0 -0
/package/{templates → framework}/guides/tdd-best-practices.md +0 -0
/package/{templates → framework}/guides/test-definitions-guide.md +0 -0
/package/{templates → framework}/guides/testing-methodology.md +0 -0
/package/{templates → framework}/guides/user-story-guide.md +0 -0
/package/{templates → framework}/guides/zombie-process-cleanup.md +0 -0
/package/{templates → framework}/prompts/arch-review.md +0 -0
/package/{templates → framework}/prompts/quality-review.md +0 -0
/package/{templates/skills/safeword-quality-reviewer → framework/skills/quality-reviewer}/SKILL.md +0 -0
/package/{templates/doc-templates → framework/templates}/architecture-template.md +0 -0
/package/{templates/doc-templates → framework/templates}/design-doc-template.md +0 -0
/package/{templates/doc-templates → framework/templates}/test-definitions-feature.md +0 -0
/package/{templates/doc-templates → framework/templates}/ticket-template.md +0 -0
/package/{templates/doc-templates → framework/templates}/user-stories-template.md +0 -0
/package/{templates → packages/cli/templates}/commands/arch-review.md +0 -0
/package/{templates → packages/cli/templates}/commands/lint.md +0 -0
/package/{templates → packages/cli/templates}/commands/quality-review.md +0 -0
/package/{templates → packages/cli/templates}/hooks/inject-timestamp.sh +0 -0
/package/{templates → packages/cli/templates}/lib/common.sh +0 -0
/package/{templates → packages/cli/templates}/lib/jq-fallback.sh +0 -0
/package/{templates → packages/cli/templates}/markdownlint.jsonc +0 -0

package/packages/cli/templates/guides/tdd-best-practices.md ADDED

@@ -0,0 +1,615 @@
+# TDD Best Practices
+Patterns and examples for user stories and test definitions following TDD best practices.
+**LLM Instruction Design:** These templates create documentation that LLMs read and follow. For a comprehensive framework on writing clear, actionable, LLM-consumable documentation, see `@.safeword/guides/llm-instruction-design.md`.
+---
+## Fillable Template Files (When to Use Each)
+### Quick Reference
+| Need                          | Template                      | Location                               |
+| ----------------------------- | ----------------------------- | -------------------------------------- |
+| Feature/issue user stories    | `user-stories-template.md`    | `.safeword/planning/user-stories/`     |
+| Feature test suites           | `test-definitions-feature.md` | `.safeword/planning/test-definitions/` |
+| Feature implementation design | `design-doc-template.md`      | `.safeword/planning/design/`           |
+| Project-wide architecture     | No template                   | `ARCHITECTURE.md` at root              |
+**Decision rule:** If unclear, ask: "Does this affect the whole project or just one feature?" Project-wide → architecture doc. Single feature → design doc.
+### Template Details
+**User Stories** (`@.safeword/templates/user-stories-template.md`) - **For features/issues**
+- Multiple related stories in one file
+- Status tracking (✅/❌ per story and AC)
+- Test file references and implementation notes
+- Completion % and phase tracking
+- Use for GitHub issues with multiple user stories
+- Guidance: `@.safeword/guides/user-story-guide.md`
+**Test Definitions** (`@.safeword/templates/test-definitions-feature.md`) - **For feature test suites**
+- Organized by test suites and individual tests
+- Status tracking (✅ Passing / ⏭️ Skipped / ❌ Not Implemented / 🔴 Failing)
+- Detailed steps and expected outcomes
+- Coverage summary with percentages
+- Test execution commands
+- Guidance: `@.safeword/guides/test-definitions-guide.md`
+**Design Doc** (`@.safeword/templates/design-doc-template.md`) - **For feature/system implementation**
+- Implementation-focused (architecture, components, data model, user flow, component interaction)
+- Key technical decisions with rationale (includes "why")
+- Full [N] and [N+1] examples (matches user stories/test definitions pattern)
+- ~121 lines, optimized for LLM filling and consumption
+- No duplication (references user stories, test definitions)
+- Guidance: `@.safeword/guides/design-doc-guide.md`
+**Architecture Document** (no template) - **For project/package-wide architecture decisions**
+- One `ARCHITECTURE.md` per project or package (in monorepos)
+- Document principles, data model, component design, decision rationale
+- Living document (updated as architecture evolves)
+- Include version, status, table of contents
+- All architectural decisions in one place (not separate ADRs)
+- Guidance: `@.safeword/guides/architecture-guide.md`
+**Example prompts:**
+- "Create user stories for issue #N" → Uses user stories template
+- "Create test definitions for issue #N" → Uses test definitions template
+- "Create a design doc for [feature]" → Uses design doc template (2-3 pages)
+- "Update the project architecture doc" → Adds to existing ARCHITECTURE.md
+**TDD Workflow:** See `@.safeword/guides/testing-methodology.md` for the full RED → GREEN → REFACTOR workflow and current best practices.
+---
+## User Story Templates
+### When to Use Each Format
+| Format                         | Best For                                    | Example Trigger              |
+| ------------------------------ | ------------------------------------------- | ---------------------------- |
+| Standard (As a/I want/So that) | User-facing features, UI flows              | "User can do X"              |
+| Given-When-Then                | API behavior, state transitions, edge cases | "When X happens, then Y"     |
+| Job Story                      | Problem-solving, user motivation unclear    | "User needs to accomplish X" |
+**Decision rule:** Default to Standard. Use Given-When-Then for APIs or complex state. Use Job Story when focusing on the problem, not the solution.
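The decision rule above can be sketched as a small function. This is only an illustration: the `StoryContext` fields and the `chooseStoryFormat` name are hypothetical, and the choice to check the API/complex-state condition first is an assumption, since the rule does not say which condition wins when both apply.

```typescript
type StoryFormat = 'standard' | 'given-when-then' | 'job-story';

// Hypothetical inputs describing the story being written.
interface StoryContext {
  isApiOrComplexState: boolean; // API behavior or complex state transitions
  focusIsProblemNotSolution: boolean; // the problem matters more than the solution
}

function chooseStoryFormat(ctx: StoryContext): StoryFormat {
  // Assumption: the API/complex-state check takes precedence.
  if (ctx.isApiOrComplexState) return 'given-when-then';
  if (ctx.focusIsProblemNotSolution) return 'job-story';
  // Default to the Standard format.
  return 'standard';
}

const uiStory = chooseStoryFormat({
  isApiOrComplexState: false,
  focusIsProblemNotSolution: false,
});
const apiStory = chooseStoryFormat({
  isApiOrComplexState: true,
  focusIsProblemNotSolution: false,
});
```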
+### Standard Format (Recommended)
+```
+As a [role/persona]
+I want [capability/feature]
+So that [business value/benefit]
+Acceptance Criteria:
+- [Specific, testable condition 1]
+- [Specific, testable condition 2]
+- [Specific, testable condition 3]
+Out of Scope:
+- [What this story explicitly does NOT include]
+```
+### Given-When-Then Format (Behavior-Focused)
+```
+Given [initial context/state]
+When [action/event occurs]
+Then [expected outcome]
+And [additional context/outcome]
+But [exception/edge case]
+```
+**Filled example:**
+```
+Given I am an authenticated API user
+When I POST to /api/campaigns with valid JSON
+Then I receive a 201 Created response with campaign ID
+And the campaign appears in my GET /api/campaigns list
+But invalid JSON returns 400 with descriptive error messages
+```
+### Job Story Format (Outcome-Focused)
+```
+When [situation/context]
+I want to [motivation/job-to-be-done]
+So I can [expected outcome]
+```
+**Filled example:**
+```
+When I'm debugging a failing test
+I want to see the exact LLM prompt and response
+So I can identify whether the issue is prompt engineering or code logic
+```
+---
+## User Story Best Practices
+### ✅ GOOD Examples
+**Web App Feature:**
+```
+As a user with multiple campaigns
+I want to switch between campaigns without reloading the page
+So that I can quickly compare game states
+Acceptance Criteria:
+- Campaign list shows all saved campaigns with last-played timestamp
+- Clicking a campaign loads its state within 200ms
+- Current campaign is visually highlighted
+- Switching preserves unsaved input in the current campaign
+Out of Scope:
+- Campaign merging/deletion (separate story)
+- Multi-campaign view (future epic)
+```
+**API Feature:**
+```
+Given I am an authenticated API user
+When I POST to /api/campaigns with valid JSON
+Then I receive a 201 Created response with campaign ID
+And the campaign appears in my GET /api/campaigns list
+But invalid JSON returns 400 with descriptive error messages
+```
+**CLI Feature:**
+```
+When I'm debugging a failing test
+I want to see the exact LLM prompt and response
+So I can identify whether the issue is prompt engineering or code logic
+Acceptance Criteria:
+- `--verbose` flag prints full prompt to stderr
+- Response JSON is pretty-printed with syntax highlighting
+- Token count and cost are displayed
+- Works with all agent types (rules, narrative, character)
+```
+**With Technical Constraints:**
+```
+As a user with multiple campaigns
+I want to switch between campaigns without reloading the page
+So that I can quickly compare game states
+Acceptance Criteria:
+- Campaign list shows all saved campaigns with last-played timestamp
+- Clicking a campaign loads its state within 200ms
+- Current campaign is visually highlighted
+Technical Constraints:
+Performance:
+- [ ] Campaign switch completes in < 200ms at P95
+- [ ] Works with up to 50 campaigns without UI lag
+Compatibility:
+- [ ] Chrome 100+, Safari 16+, Firefox 115+
+Data:
+- [ ] Campaign data persists across browser sessions
+```
+### ❌ BAD Examples (Anti-Patterns)
+**Too Vague:**
+```
+As a user
+I want the app to work better
+So that I'm happy
+```
+- ❌ No specific role
+- ❌ "Work better" is not measurable
+- ❌ No acceptance criteria
+**Too Technical (Implementation Details):**
+```
+As a developer
+I want to refactor the CharacterStore to use Immer
+So that state mutations are prevented
+```
+- ❌ This is a technical task, not a user story
+- ❌ Users don't care about Immer
+- ✅ Better as: Spike ticket or refactoring task
+**Missing "So That" (No Value):**
+```
+As a GM
+I want to roll dice
+```
+- ❌ No business value stated
+- ❌ Why does the GM need this?
+**Multiple Features in One Story:**
+```
+As a player
+I want to create characters, manage inventory, and track relationships
+So that I can play the game
+```
+- ❌ 3+ separate features bundled together
+- ❌ Cannot be completed in one sprint
+- ✅ Split into 3 stories
+---
+## Test Definition Templates
+### Unit Test Template
+```typescript
+describe('[Unit/Module Name]', () => {
+  describe('[function/method name]', () => {
+    it('should [expected behavior] when [condition]', () => {
+      // Arrange: Set up test data and dependencies
+      const input = {
+        /* test data */
+      };
+      const expected = {
+        /* expected output */
+      };
+      // Act: Execute the function under test
+      const result = functionUnderTest(input);
+      // Assert: Verify the outcome
+      expect(result).toEqual(expected);
+    });
+    it('should throw [error type] when [invalid condition]', () => {
+      const invalidInput = {
+        /* bad data */
+      };
+      expect(() => functionUnderTest(invalidInput)).toThrow('Expected error message');
+    });
+    it('should handle edge case: [specific edge case]', () => {
+      // Edge cases: empty arrays, null, undefined, boundary values
+    });
+  });
+});
+```
+### Integration Test Template
+```typescript
+describe('[Feature Name] Integration', () => {
+  beforeEach(async () => {
+    // Setup: Initialize database, mock external APIs
+    await setupTestDatabase();
+  });
+  afterEach(async () => {
+    // Teardown: Clean up resources
+    await cleanupTestDatabase();
+  });
+  it('should [complete user flow] successfully', async () => {
+    // Arrange: Create test user and prerequisites
+    const user = await createTestUser();
+    // Act: Execute the full workflow
+    const campaign = await createCampaign(user.id);
+    const character = await createCharacter(campaign.id);
+    const result = await performAction(character.id, 'Skirmish');
+    // Assert: Verify end-to-end behavior
+    expect(result.position).toBe('risky');
+    expect(result.effect).toBe('standard');
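+    // Re-fetch the campaign so the assertion does not run against a stale object
+    const updatedCampaign = await getCampaign(campaign.id);
+    expect(updatedCampaign.history).toHaveLength(1);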
+  });
+  it('should rollback transaction when [failure occurs]', async () => {
+    // Test error handling and data consistency
+  });
+  // Filled example: rollback on failure
+  it('should rollback order when payment fails', async () => {
+    const user = await createTestUser();
+    const order = await createOrder(user.id, { items: ['sword'] });
+    // Simulate payment failure
+    mockPaymentGateway.mockRejectedValue(new Error('Card declined'));
+    await expect(processOrder(order.id)).rejects.toThrow('Card declined');
+    // Verify rollback - order cancelled, inventory restored
+    const updatedOrder = await getOrder(order.id);
+    expect(updatedOrder.status).toBe('cancelled');
+    expect(await getInventory('sword')).toBe(1); // Not decremented
+  });
+});
+```
+### E2E Test Template (Playwright/Cypress)
+```typescript
+test.describe('[User Journey Name]', () => {
+  test('should [complete full user flow]', async ({ page }) => {
+    // Arrange: Navigate to starting point
+    await page.goto('/campaigns');
+    // Act: Simulate user interactions
+    await page.click('button:has-text("New Campaign")');
+    await page.fill('[name="campaignName"]', 'The Bloodletters');
+    await page.click('button:has-text("Create")');
+    // Assert: Verify UI state matches expectations
+    await expect(page.locator('h1')).toContainText('The Bloodletters');
+    await expect(page.locator('.campaign-list')).toContainText('The Bloodletters');
+    // Act: Continue the flow
+    await page.click('button:has-text("Create Character")');
+    // Assert: Verify next state
+    await expect(page).toHaveURL(/\/characters\/create/);
+  });
+});
+```
+---
+## Test Best Practices
+### Test Naming Conventions
+**✅ GOOD - Descriptive and Specific:**
+```typescript
+it('should return risky position when outnumbered 3-to-1');
+it('should cache LLM responses for 5 minutes to reduce costs');
+it('should preserve armor state after reducing harm from L2 to L1');
+it('should throw ValidationError when dice pool is negative');
+```
+**❌ BAD - Vague or Implementation-Focused:**
+```typescript
+it('works correctly'); // What does "correctly" mean?
+it('tests the function'); // Obvious, not descriptive
+it('should call setState'); // Implementation detail
+it('scenario 1'); // No context
+```
+**How to rename:**
+1. Identify the behavior being tested
+2. Identify the condition/input
+3. Use pattern: `'should [behavior] when [condition]'`
+Example: `'works correctly'` → `'should return 200 when user is authenticated'`
+### Arrange-Act-Assert (AAA) Pattern
+**Always use AAA structure for clarity:**
+```typescript
+it('should return critical success when two dice show 6', () => {
+  // Arrange: Setup test data
+  const diceResults = [6, 6, 4];
+  // Act: Execute the logic
+  const outcome = evaluateDiceRoll(diceResults);
+  // Assert: Verify expectations (outcome is an object, not a bare string)
+  expect(outcome.result).toBe('critical');
+  expect(outcome.highestDie).toBe(6);
+});
+```
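To make the AAA example concrete, here is a minimal sketch of what an `evaluateDiceRoll` function might look like, assuming it returns an object with a `result` and a `highestDie` field. The `RollOutcome` shape and every threshold except "two 6s is critical" are assumptions for illustration, not taken from this guide.

```typescript
type RollResult = 'critical' | 'success' | 'partial' | 'failure';

// Hypothetical return shape for the function under test.
interface RollOutcome {
  result: RollResult;
  highestDie: number;
}

function evaluateDiceRoll(dice: number[]): RollOutcome {
  if (dice.length === 0) {
    throw new Error('dice pool must not be empty');
  }
  const highestDie = Math.max(...dice);
  const sixes = dice.filter((d) => d === 6).length;

  // Assumed thresholds: two or more 6s is critical, one 6 is success,
  // a highest die of 4-5 is partial, anything lower is failure.
  let result: RollResult;
  if (sixes >= 2) result = 'critical';
  else if (highestDie === 6) result = 'success';
  else if (highestDie >= 4) result = 'partial';
  else result = 'failure';

  return { result, highestDie };
}

const outcome = evaluateDiceRoll([6, 6, 4]);
```

Returning an object keeps each assertion focused on one field, so a failing test points at exactly one part of the outcome.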
+### Test Independence
+**✅ GOOD - Isolated Tests:**
+```typescript
+beforeEach(() => {
+  // Each test gets fresh state
+  gameState = createFreshGameState();
+});
+it('test A', () => {
+  /* uses gameState */
+});
+it('test B', () => {
+  /* uses separate gameState */
+});
+```
+**❌ BAD - Shared State:**
+```typescript
+let sharedState = {}; // Tests modify this
+it('test A', () => {
+  sharedState.foo = 'bar';
+});
+it('test B', () => {
+  expect(sharedState.foo).toBe('bar');
+}); // Depends on test A!
+```
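Why shared state is a problem becomes visible when the same two tests run in a different order. The sketch below uses a tiny hypothetical runner (`runInOrder`) rather than a real test framework, purely to show the order dependence.

```typescript
type TestFn = () => void;

// Runs tests in the given order and returns the names of those that threw.
function runInOrder(tests: Array<[string, TestFn]>): string[] {
  const failed: string[] = [];
  for (const [name, fn] of tests) {
    try {
      fn();
    } catch (err) {
      failed.push(name);
    }
  }
  return failed;
}

// Builds the two tests from the BAD example, closing over one shared object.
function makeSharedSuite(): Array<[string, TestFn]> {
  const sharedState: { foo?: string } = {};
  return [
    ['test A', () => { sharedState.foo = 'bar'; }],
    ['test B', () => {
      if (sharedState.foo !== 'bar') throw new Error('foo was never set');
    }],
  ];
}

// Original order: test B passes only because test A ran first.
const forward = runInOrder(makeSharedSuite());
// Reversed order: test B runs first and fails.
const reversed = runInOrder(makeSharedSuite().slice().reverse());
```

Here `forward` is empty while `reversed` contains `'test B'`, which is the behavior a randomized or parallel test run would expose.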
+### What to Test
+**✅ Test These:**
+- Public API behavior (functions, methods, components)
+- User-facing features (can the user do X?)
+- Edge cases (empty, null, boundary values)
+- Error handling (does it fail gracefully?)
+- Integration points (API calls, database queries)
+**❌ Don't Test These:**
+- Private implementation details (internal helper functions)
+- Third-party library internals (assume React works)
+- Generated code (unless it's business logic)
+- Trivial getters/setters with no logic
+**Boundary example:**
+```typescript
+// ❌ DON'T test this private helper
+function _formatDateInternal(date) {
+  /* internal logic */
+}
+// ✅ DO test the public function that uses it
+export function getFormattedTimestamp(event) {
+  return _formatDateInternal(event.createdAt);
+}
+// Test getFormattedTimestamp, not _formatDateInternal
+```
+### Test Data Builders
+**Use builders for complex test data:**
+```typescript
+// ✅ GOOD - Reusable test data builder
+function buildCharacter(overrides = {}) {
+  return {
+    id: 'test-char-1',
+    name: 'Cutter',
+    playbook: 'Cutter',
+    stress: 0,
+    harm: [],
+    armor: true,
+    ...overrides, // Easy to customize per test
+  };
+}
+it('should increase stress when resisting', () => {
+  const character = buildCharacter({ stress: 3 });
+  // Test uses character with stress=3
+});
+```
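In a TypeScript codebase the same builder can be typed so that misspelled or mistyped overrides fail at compile time. This is a sketch: the `Character` and `Harm` interfaces are assumptions inferred from the examples in this guide, not definitions from a real codebase.

```typescript
// Assumed shapes, inferred from the examples in this guide.
interface Harm {
  level: number;
  description: string;
}

interface Character {
  id: string;
  name: string;
  playbook: string;
  stress: number;
  harm: Harm[];
  armor: boolean;
}

// Partial<Character> restricts overrides to known fields with correct types.
function buildCharacter(overrides: Partial<Character> = {}): Character {
  return {
    id: 'test-char-1',
    name: 'Cutter',
    playbook: 'Cutter',
    stress: 0,
    harm: [], // a new array on every call, so tests never share it
    armor: true,
    ...overrides,
  };
}

const stressed = buildCharacter({ stress: 3 });
const wounded = buildCharacter({ harm: [{ level: 2, description: 'Broken Arm' }] });
```

Because every call returns a fresh object, a builder like this also supports the test independence rule above.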
+---
+## LLM Testing Patterns
+### Promptfoo LLM-as-Judge Template
+```yaml
+# Tests for AI outputs (narrative quality, reasoning)
+prompts:
+  - file://prompts/gm-narrative.txt
+providers:
+  - id: anthropic:messages:claude-sonnet-4
+    config:
+      temperature: 1.0
+tests:
+  - description: 'GM should telegraph position/effect before roll'
+    vars:
+      action: 'I Skirmish with the gang enforcers'
+      character: {} # replace with the character JSON
+    assert:
+      - type: llm-rubric
+        value: |
+          The GM response must:
+          - State position (controlled/risky/desperate) explicitly
+          - State effect (limited/standard/great) explicitly
+          - Explain WHY these were chosen based on fiction
+          Grade as:
+          EXCELLENT: All three present and clear
+          ACCEPTABLE: Position and effect stated, reasoning weak
+          POOR: Missing position or effect
+      - type: llm-rubric
+        value: |
+          Does the GM show collaborative tone (asking questions, inviting detail)?
+          EXCELLENT: Asks open-ended questions, invites player creativity
+          ACCEPTABLE: Acknowledges player action, minimal collaboration
+          POOR: Dictates outcomes without player input
+```
+### Integration Test with Real LLM
+```typescript
+describe('Rules Agent Integration', () => {
+  it('should infer correct position for desperate situation', async () => {
+    // Arrange
+    const scenario = {
+      action: 'I Skirmish against 5 armed guards while wounded',
+      character: buildCharacter({ harm: [{ level: 2, description: 'Broken Arm' }] }),
+    };
+    // Act: Real LLM call (costs ~$0.01)
+    const response = await rulesAgent.processAction(scenario);
+    // Assert: Structured output (not narrative quality)
+    expect(response.position).toBe('desperate');
+    expect(response.effect).toBe('limited');
+    expect(response.dicePool).toBeLessThan(3); // Harm reduces dice
+    expect(response.consequences).toContain('severe harm');
+  });
+});
+```
+---
+## INVEST Checklist (Apply to Every User Story)
+Before finalizing a story, verify it passes all six criteria:
+- [ ] **Independent** - Can be completed without depending on other stories
+- [ ] **Negotiable** - Details emerge through conversation, not a fixed contract
+- [ ] **Valuable** - Delivers clear value to user or business
+- [ ] **Estimable** - Team can estimate effort (not too vague, not too detailed)
+- [ ] **Small** - Completable in one sprint/iteration (typically 1-5 days)
+- [ ] **Testable** - Clear acceptance criteria define when it's done
+**If a story fails any criterion, it's not ready - refine or split it.**
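The checklist can also be recorded as data, which makes the "fails any criterion" rule mechanical. A minimal sketch follows; the `InvestReview` type and the `failedCriteria` and `isReady` helpers are hypothetical names, and the boolean answers still come from human judgment.

```typescript
type InvestCriterion =
  | 'independent'
  | 'negotiable'
  | 'valuable'
  | 'estimable'
  | 'small'
  | 'testable';

// One boolean per criterion, filled in by whoever reviews the story.
type InvestReview = Record<InvestCriterion, boolean>;

const CRITERIA: InvestCriterion[] = [
  'independent',
  'negotiable',
  'valuable',
  'estimable',
  'small',
  'testable',
];

// Lists the criteria a story fails, in INVEST order.
function failedCriteria(review: InvestReview): InvestCriterion[] {
  return CRITERIA.filter((criterion) => !review[criterion]);
}

// A story is ready only when it fails none of the six criteria.
function isReady(review: InvestReview): boolean {
  return failedCriteria(review).length === 0;
}

const bundledStory: InvestReview = {
  independent: true,
  negotiable: true,
  valuable: true,
  estimable: true,
  small: false, // e.g. three features bundled into one story
  testable: true,
};
const failures = failedCriteria(bundledStory);
```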
+---
+## Quick Reference
+**User Story Red Flags (INVEST Violations):**
+- No acceptance criteria → Too vague
+- More than 3 acceptance criteria → Split into multiple stories
+- Technical implementation details → Wrong audience
+- Missing "So that" → No clear value
+**Test Red Flags:**
+- Test name doesn't describe behavior → Rename
+- Test depends on another test's state → Isolate
+- Test is >50 lines → Break into smaller tests
+- Test tests implementation details → Test behavior instead
+- Test never fails → Remove (not testing anything)
+**When to Write E2E vs Integration vs Unit:**
+- **E2E:** User can complete full workflow (slow, expensive, high confidence)
+- **Integration:** Multiple modules work together (moderate speed, good ROI)
+- **Unit:** Single function/module logic (fast, cheap, low-level confidence)
+**Ratio guidance:** 70% unit, 20% integration, 10% E2E (adjust based on project)
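As a worked example of the ratio, the sketch below turns a total test count into per-level targets. The `pyramidTargets` name is hypothetical, and the rounding choice (unit tests absorb any remainder) is an assumption; treat the numbers as a starting point, since the guidance says to adjust per project.

```typescript
interface PyramidTargets {
  unit: number;
  integration: number;
  e2e: number;
}

// Splits a total test count using the 70/20/10 guidance.
function pyramidTargets(totalTests: number): PyramidTargets {
  if (totalTests < 0 || Math.floor(totalTests) !== totalTests) {
    throw new Error('totalTests must be a non-negative integer');
  }
  const integration = Math.round(totalTests * 0.2);
  const e2e = Math.round(totalTests * 0.1);
  // Assumption: unit tests take whatever remains, so the parts always sum to the total.
  const unit = totalTests - integration - e2e;
  return { unit, integration, e2e };
}

const targets = pyramidTargets(200);
// targets: { unit: 140, integration: 40, e2e: 20 }
```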