npm - safeword - Versions diffs - 0.7.6 → 0.8.0 - Mend

safeword 0.7.6 → 0.8.0

Files changed (43)

package/templates/SAFEWORD.md CHANGED

@@ -46,10 +46,8 @@ Training data is stale. Follow this sequence:
 | Trigger                                                   | Guide                                           |
 | --------------------------------------------------------- | ----------------------------------------------- |
-| Starting ANY feature, bug fix, or enhancement             | `./.safeword/guides/development-workflow.md`    |
-| Need to write OR review user stories                      | `./.safeword/guides/user-story-guide.md`        |
-| Need to write OR review test definitions                  | `./.safeword/guides/test-definitions-guide.md`  |
-| Writing tests, doing TDD, or test is failing              | `./.safeword/guides/tdd-best-practices.md`      |
+| Starting feature/task OR writing specs/test definitions   | `./.safeword/guides/planning-guide.md`          |
+| Choosing test type, doing TDD, OR test is failing         | `./.safeword/guides/testing-guide.md`           |
 | Creating OR updating a design doc                         | `./.safeword/guides/design-doc-guide.md`        |
 | Making architectural decision OR writing ADR              | `./.safeword/guides/architecture-guide.md`      |
 | Designing data models, schemas, or database changes       | `./.safeword/guides/data-architecture-guide.md` |
@@ -306,3 +304,14 @@ When markdown lint reports MD040 (missing language), choose:
 - Integration struggle between tools
 **Before extracting:** Check `.safeword/learnings/` for existing similar learnings—update, don't duplicate.
+---
+## Always Remember
+1. **Clarity → Simplicity → Correctness** (in that order)
+2. **Test what you can test**—never ask user to verify
+3. **RED → GREEN → REFACTOR**—never skip steps
+4. **Commit after each GREEN phase**
+5. **Read the matching guide** when a trigger fires
+6. **End every response** with: `{"proposedChanges": bool, "madeChanges": bool, "askedQuestion": bool}`
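
The footer rule in item 6 can be checked mechanically. The sketch below is not part of the package. It assumes the footer is a single JSON object on the last non-empty line of the response, and that the `bool` placeholders stand for JSON `true`/`false`. The helper name `parse_footer` is hypothetical.

```python
import json
from typing import Dict

# Key names are taken from the "Always Remember" footer template.
REQUIRED_KEYS = ("proposedChanges", "madeChanges", "askedQuestion")


def parse_footer(response: str) -> Dict[str, bool]:
    """Return the status footer found on the last non-empty line.

    Assumption: the footer occupies one line by itself.
    Raises ValueError if the footer is missing or malformed.
    """
    lines = [line for line in response.splitlines() if line.strip()]
    if not lines:
        raise ValueError("empty response")
    try:
        footer = json.loads(lines[-1])
    except json.JSONDecodeError as exc:
        raise ValueError(f"last line is not JSON: {exc}") from exc
    if not isinstance(footer, dict) or set(footer) != set(REQUIRED_KEYS):
        raise ValueError(f"footer must have exactly the keys {REQUIRED_KEYS}")
    for key in REQUIRED_KEYS:
        # Check for bool explicitly so that 0/1 or "true" are rejected.
        if not isinstance(footer[key], bool):
            raise ValueError(f"{key} must be true or false")
    return footer
```

A caller could run `parse_footer(reply_text)` on each agent reply and treat a `ValueError` as a rule violation.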

package/templates/doc-templates/feature-spec-template.md CHANGED

@@ -1,6 +1,6 @@
 # Feature Spec: [Feature Name] (Issue #[number])
-**Guide**: `@./.safeword/guides/user-story-guide.md` - Best practices, INVEST criteria, and examples
+**Guide**: `@./.safeword/guides/planning-guide.md` - Best practices, INVEST criteria, and examples
 **Template**: `@./.safeword/templates/feature-spec-template.md`
 **Feature**: [Brief description of the feature]

package/templates/doc-templates/task-spec-template.md CHANGED

@@ -1,6 +1,6 @@
 # Task: [Name]
-**Guide**: `@./.safeword/guides/development-workflow.md`
+**Guide**: `@./.safeword/guides/planning-guide.md`
 **Template**: `@./.safeword/templates/task-spec-template.md`
 ---

package/templates/doc-templates/test-definitions-feature.md CHANGED

@@ -1,6 +1,6 @@
 # Test Definitions: [Feature Name] (Issue #[number])
-**Guide**: `@./.safeword/guides/test-definitions-guide.md` - Structure, status tracking, and TDD workflow
+**Guide**: `@./.safeword/guides/testing-guide.md` - Structure, status tracking, and TDD workflow
 **Template**: `@./.safeword/templates/test-definitions-feature.md`
 **Feature**: [Brief description of the feature]

package/templates/guides/architecture-guide.md CHANGED

@@ -413,11 +413,9 @@ export default defineConfig([
 ---
-## Key Takeaway
+## Key Takeaways
-**One comprehensive architecture document per project** > many scattered ADR files:
-✅ Full context in one place
-✅ Living document (update in place)
-✅ LLMs consume entire architecture at once
-✅ Sequential decision trees prevent ambiguity
+- One Architecture Doc per project—not scattered ADRs
+- Every decision needs: What / Why / Trade-off / Alternatives
+- Update when adding: technology, schema, or project-wide pattern
+- Living document—update in place with version/status tracking

package/templates/guides/cli-reference.md CHANGED

@@ -33,3 +33,11 @@ Common flags:
 - `-y, --yes` - Skip confirmations (setup, reset)
 - `-v, --verbose` - Show detailed output (diff)
 - `-q, --quiet` - Suppress output (sync)
+---
+## Key Takeaways
+- Always use `@latest` for setup/check/upgrade/diff to get current CLI
+- Run `sync` after adding/removing frameworks to update linting plugins
+- Use `diff` before `upgrade` to preview changes

package/templates/guides/code-philosophy.md CHANGED

@@ -196,3 +196,12 @@ Before completing any work, verify:
 # ❌ Bad: "misc fixes"
 # ✅ Good: "fix: login button not responding to clicks"
 ```
+---
+## Key Takeaways
+- Clarity → Simplicity → Correctness (in that order)
+- Delete unused code—no "just in case" abstractions
+- Commit often with descriptive messages
+- Verify library versions before using APIs (training data is stale)

package/templates/guides/context-files-guide.md CHANGED

@@ -455,3 +455,12 @@ Before committing:
 - Bloated files cost more tokens and introduce noise
 - Keep under 50KB for optimal performance (though no hard limit)
 - Use imports to modularize instead of monolithic files
+---
+## Key Takeaways
+- Keep context files under 200 lines—use imports to modularize
+- Short declarative bullets, not narrative paragraphs
+- Update immediately when architecture changes (stale docs = confusion)
+- Put critical rules at the END of documents (recency bias)
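
The size guidance in this file (under 200 lines in the new takeaways, under 50KB in the unchanged text) lends itself to a pre-commit style check. The following is a minimal sketch, not something the package ships. The function name `budget_problems` is hypothetical, and reading "50KB" as 50 × 1024 bytes is an assumption.

```python
from typing import List

MAX_LINES = 200        # "Keep context files under 200 lines"
MAX_BYTES = 50 * 1024  # "Keep under 50KB"; 1 KB = 1024 bytes is an assumption


def budget_problems(text: str) -> List[str]:
    """Return a human-readable problem for each size target the text misses."""
    problems = []
    line_count = len(text.splitlines())
    if line_count >= MAX_LINES:
        problems.append(f"{line_count} lines (target: under {MAX_LINES})")
    size = len(text.encode("utf-8"))
    if size >= MAX_BYTES:
        problems.append(f"{size} bytes (target: under {MAX_BYTES})")
    return problems
```

An empty list means the file meets both targets. Because the guide says there is no hard limit, a non-empty list is better treated as a warning than as a failure.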

package/templates/guides/data-architecture-guide.md CHANGED

@@ -198,3 +198,12 @@ Before finalizing data architecture doc:
 - [ ] Migration strategy covers both additive and breaking changes
 - [ ] Version and status match codebase (verify with git/deployment)
 - [ ] Cross-referenced from root ARCHITECTURE.md or SAFEWORD.md (link exists)
+---
+## Key Takeaways
+- Data quality, governance, accessibility are core principles
+- Every entity needs: attributes, types, relationships, constraints
+- Performance targets use concrete numbers (e.g., <100ms, not "fast")
+- Migration strategy covers both additive and breaking changes

package/templates/guides/design-doc-guide.md CHANGED

@@ -1,5 +1,17 @@
 # Design Doc Guide for Claude Code
+## Escalation Check
+**STOP if ANY apply—use `architecture-guide.md` first:**
+- [ ] Need to choose a technology or library
+- [ ] Need to design a data model or schema
+- [ ] Pattern will affect 2+ features
+Then return here.
+---
 ## How to Fill Out Design Doc
 **Template:** `@.safeword/templates/design-doc-template.md`
@@ -169,3 +181,12 @@ Before saving, verify:
 **Important:** Design docs are instructions that LLMs read and follow.
 **See:** `@.safeword/guides/llm-guide.md` for comprehensive framework on writing clear, actionable documentation that LLMs can reliably follow.
+---
+## Key Takeaways
+- Escalate to Architecture Doc if: new tech, new schema, or pattern affects 2+ features
+- Reference user stories and test definitions—don't duplicate them
+- Every decision needs: what, why, trade-off
+- ~121 lines target (concise, LLM-optimized)

package/templates/guides/learning-extraction.md CHANGED

@@ -550,3 +550,12 @@ This is a **living process** - iterate and refine based on what works.
 - Refactor when multiple learnings cover similar topics (consolidate)
 - Split when learning file >200 lines (focus on single concept)
 - Update SAFEWORD.md references when learnings move or merge
+---
+## Key Takeaways
+- Extract after 5+ debug cycles or 3+ approaches tried
+- Check existing learnings first—update, don't duplicate
+- One concept per file, under 200 lines
+- Extract immediately while fresh (don't defer to "later")

package/templates/guides/llm-guide.md CHANGED

@@ -259,6 +259,44 @@ When LLMs hit dead ends, provide concrete next steps.
    - Login form → Dashboard → E2E test (multi-page)"
 ```
+### 14. Position-Aware Writing (Recency Bias)
+LLMs retain information at the **beginning and end** of context better than the middle. Structure documents accordingly.
+```markdown
+❌ BAD - Critical rules buried in middle:
+# Guide
+## Background (100 lines)
+## Details (200 lines)
+## Critical Rules (10 lines) ← forgotten
+## Appendix (50 lines)
+✅ GOOD - Critical rules at end:
+# Guide
+## Background (100 lines)
+## Details (200 lines)
+## Appendix (50 lines)
+## Key Takeaways (10 lines) ← retained
+```
+**Application:**
+- CLAUDE.md / SAFEWORD.md: Put "Always Remember" section last
+- Guides: End with "Key Takeaways" section
+- Templates: Put most important sections at top OR bottom, not middle
+**Research basis:** "Lost in the middle" phenomenon—models show <40% recall for middle content vs >80% for beginning/end content.
 ---
 ## Anti-Patterns
@@ -267,6 +305,7 @@ When LLMs hit dead ends, provide concrete next steps.
 ❌ **Undefined jargon** - "Technical debt", "code smell" need definitions
 ❌ **Competing guidance** - Multiple decision frameworks that contradict each other
 ❌ **Outdated references** - Remove concepts, but forget to update all mentions
+❌ **Critical info in the middle** - Most important rules buried between background and appendix
 ---
@@ -283,6 +322,7 @@ Before saving/committing LLM-consumable documentation:
 - [ ] Tie-breaking rules provided
 - [ ] Complex decisions (3+ branches) have lookup tables
 - [ ] Dead-end paths have re-evaluation steps with examples
+- [ ] Critical rules positioned at END of document (recency bias)
 ---
@@ -310,3 +350,12 @@ Edge cases:
 - React components with React Testing Library → Integration (not E2E, no real browser)
 - Non-deterministic functions (Date.now()) → Unit test with mocked time
 ```
+---
+## Key Takeaways
+- Decision trees: sequential, MECE, with tie-breakers
+- Every rule needs concrete examples (good vs bad)
+- Define all terms explicitly—assume nothing is obvious
+- Put critical rules at the END of documents (recency bias)
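
The new checklist item ("Critical rules positioned at END of document") can be approximated with a small lint. The sketch below is illustrative only. It assumes the critical section is a level-2 heading named "Key Takeaways" or "Always Remember", which are the two names this release uses. It tracks fenced code blocks with a simple toggle, so fences nested inside longer fences are not handled.

```python
import re
from typing import Optional

# Heading names used by this release for the closing section.
CLOSING_HEADINGS = {"Key Takeaways", "Always Remember"}


def last_h2(markdown: str) -> Optional[str]:
    """Return the text of the last level-2 heading outside fenced code."""
    in_fence = False
    last = None
    for line in markdown.splitlines():
        if line.lstrip().startswith("```"):
            in_fence = not in_fence  # simple toggle; no nested fences
            continue
        if in_fence:
            continue
        match = re.match(r"^## +(.+?)\s*$", line)
        if match:
            last = match.group(1)
    return last


def ends_with_critical_section(markdown: str) -> bool:
    """True when the final level-2 section is one of the closing headings."""
    return last_h2(markdown) in CLOSING_HEADINGS
```

Skipping fenced blocks matters for guides like this one, whose examples contain `## ...` lines that are not real headings.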

package/templates/guides/planning-guide.md ADDED

@@ -0,0 +1,431 @@
+# Planning Guide
+How to write specs, user stories, and test definitions before implementation.
+---
+## Artifact Levels
+**Triage first - answer IN ORDER, stop at first match:**
+| Question                                 | Level          | Artifacts                                            |
+| ---------------------------------------- | -------------- | ---------------------------------------------------- |
+| User-facing feature with business value? | **L2 Feature** | Feature Spec + Test Definitions (+ Design Doc if 3+) |
+| Bug, improvement, internal, or refactor? | **L1 Task**    | Task Spec with inline tests                          |
+| Typo, config, or trivial change?         | **L0 Micro**   | Minimal Task Spec, existing tests                    |
+**Locations:**
+- Specs: `.safeword/planning/specs/`
+- Test definitions: `.safeword/planning/test-definitions/`
+**If none fit:** Break down the work. A single task spanning all three levels should be split into separate L2 feature + L1 tasks.
+---
+## Templates
+| Need                            | Template                                             |
+| ------------------------------- | ---------------------------------------------------- |
+| L2 Feature spec                 | `@./.safeword/templates/feature-spec-template.md`    |
+| L1/L0 Task spec                 | `@./.safeword/templates/task-spec-template.md`       |
+| L2 Test definitions             | `@./.safeword/templates/test-definitions-feature.md` |
+| Complex feature design          | `@./.safeword/templates/design-doc-template.md`      |
+| Architectural decision          | `@./.safeword/templates/architecture-template.md`    |
+| Context anchor for complex work | `@./.safeword/templates/ticket-template.md`          |
+| Execution scratch pad           | `@./.safeword/templates/work-log-template.md`        |
+---
+## Part 1: User Stories
+### When to Use Each Format
+| Format                         | Best For                                    | Example Trigger              |
+| ------------------------------ | ------------------------------------------- | ---------------------------- |
+| Standard (As a/I want/So that) | User-facing features, UI flows              | "User can do X"              |
+| Given-When-Then                | API behavior, state transitions, edge cases | "When X happens, then Y"     |
+| Job Story                      | Problem-solving, user motivation unclear    | "User needs to accomplish X" |
+**Decision rule:** Default to Standard. Use Given-When-Then for APIs or complex state. Use Job Story when focusing on the problem, not the solution.
+**Edge cases:**
+- API with UI? → Standard for UI, Given-When-Then for API contract tests
+- Unclear user role? → Job Story to focus on the problem first, convert to Standard later
+- Technical task (refactor, upgrade)? → Skip story format, use Technical Task template
+### Standard Format (Recommended)
+```text
+As a [role/persona]
+I want [capability/feature]
+So that [business value/benefit]
+Acceptance Criteria:
+- [Specific, testable condition 1]
+- [Specific, testable condition 2]
+- [Specific, testable condition 3]
+Out of Scope:
+- [What this story explicitly does NOT include]
+```
+### Given-When-Then Format (Behavior-Focused)
+```text
+Given [initial context/state]
+When [action/event occurs]
+Then [expected outcome]
+And [additional context/outcome]
+But [exception/edge case]
+```
+**Example:**
+```text
+Given I am an authenticated API user
+When I POST to /api/campaigns with valid JSON
+Then I receive a 201 Created response with campaign ID
+And the campaign appears in my GET /api/campaigns list
+But invalid JSON returns 400 with descriptive error messages
+```
+### Job Story Format (Outcome-Focused)
+```text
+When [situation/context]
+I want to [motivation/job-to-be-done]
+So I can [expected outcome]
+```
+**Example:**
+```text
+When I'm debugging a failing test
+I want to see the exact LLM prompt and response
+So I can identify whether the issue is prompt engineering or code logic
+```
+---
+## INVEST Validation
+Before saving any story, verify it passes all six criteria:
+- [ ] **Independent** - Can be completed without depending on other stories
+- [ ] **Negotiable** - Details emerge through conversation, not a fixed contract
+- [ ] **Valuable** - Delivers clear value to user or business
+- [ ] **Estimable** - Team can estimate effort (not too vague, not too detailed)
+- [ ] **Small** - Completable in one sprint/iteration (typically 1-5 days)
+- [ ] **Testable** - Clear acceptance criteria define when it's done
+**If a story fails any criterion, it's not ready - refine or split it.**
+---
+## Writing Good Acceptance Criteria
+**✅ GOOD - Specific, user-facing, testable:**
+- User can switch campaigns without page reload
+- Response time is under 200ms
+- Current campaign is visually highlighted
+- Error message explains what went wrong
+**❌ BAD - Vague, technical, or implementation-focused:**
+- Campaign switching works ← Too vague
+- Use Zustand for state ← Implementation detail
+- Database is fast ← Not user-facing
+- Code is clean ← Not testable
+---
+## Size Guidelines
+| Indicator           | Too Big | Just Right | Too Small |
+| ------------------- | ------- | ---------- | --------- |
+| Acceptance Criteria | 6+      | 1-5        | 0         |
+| Personas/Screens    | 3+      | 1-2        | N/A       |
+| Duration            | 6+ days | 1-5 days   | <1 hour   |
+| **Action**          | Split   | ✅ Ship    | Combine   |
+**Decision rule:** When borderline, err on the side of splitting. Smaller stories are easier to estimate and complete.
+---
+## Technical Constraints Section
+**Purpose:** Capture non-functional requirements that inform test definitions.
+**When to use:** Fill in constraints BEFORE writing test definitions. Delete sections that don't apply.
+| Category       | What It Captures                 | Examples                                        |
+| -------------- | -------------------------------- | ----------------------------------------------- |
+| Performance    | Speed, throughput, capacity      | Response time < 200ms, 1000 concurrent users    |
+| Security       | Auth, validation, rate limiting  | Sanitized inputs, session required, 100 req/min |
+| Compatibility  | Browsers, devices, accessibility | Chrome 100+, iOS 14+, WCAG 2.1 AA               |
+| Data           | Privacy, retention, compliance   | GDPR delete in 72h, 90-day log retention        |
+| Dependencies   | Existing systems, restrictions   | Use AuthService, no new packages                |
+| Infrastructure | Resources, offline, deployment   | < 512MB memory, offline-capable                 |
+**Include a constraint if:**
+- It affects how you write tests
+- It limits implementation choices
+- Violating it would fail an audit or break SLAs
+---
+## User Story Examples
+### ✅ GOOD Story
+```text
+As a player with multiple campaigns
+I want to switch between campaigns from the sidebar
+So that I can quickly resume different games
+Acceptance Criteria:
+- [ ] Sidebar shows all campaigns with last-played date
+- [ ] Clicking campaign loads it within 200ms
+- [ ] Current campaign is highlighted
+Out of Scope:
+- Campaign merging/deletion (separate story)
+```
+### ❌ BAD Story (Too Big)
+```text
+As a user
+I want a complete campaign management system
+So that I can organize my games
+Acceptance Criteria:
+- [ ] Create, edit, delete campaigns
+- [ ] Share campaigns with other players
+- [ ] Export/import campaign data
+- [ ] Search and filter campaigns
+- [ ] Tag campaigns by theme
+```
+**Problem:** This is 5+ separate stories. Split it.
+### ❌ BAD Story (No Value)
+```text
+As a developer
+I want to refactor the GameStore
+So that code is cleaner
+```
+**Problem:** Developer is not a user. "Cleaner code" is not user-facing value.
+### ✅ BETTER (Technical Task)
+```text
+Technical Task: Refactor GameStore to use Immer
+Why: Prevent state mutation bugs (3 bugs in last sprint)
+Effort: 2-3 hours
+Test: All existing tests pass, no new mutations
+```
+---
+## Part 2: Test Definitions
+### How to Fill Out Test Definitions
+1. Read `@./.safeword/templates/test-definitions-feature.md`
+2. Read the user story's Technical Constraints section (if it exists)
+3. Fill in feature name, issue number, test file path
+4. Organize tests into logical suites
+5. Create numbered tests (Test 1.1, Test 1.2, etc.)
+6. Add status for each test
+7. Include detailed steps and expected outcomes
+8. Add summary with coverage breakdown
+9. Save to `.safeword/planning/test-definitions/feature-[slug].md`
+---
+## Test Status Indicators
+Use these consistently:
+- **✅ Passing** - Test is implemented and passing
+- **⏭️ Skipped** - Test is intentionally skipped (add rationale)
+- **❌ Not Implemented** - Test is defined but not yet written
+- **🔴 Failing** - Test exists but is currently failing
+---
+## Test Definition Naming
+**✅ GOOD - Descriptive and specific:**
+- "Render all three panes"
+- "Cmd+J toggles AI pane visibility"
+- "State persistence across sessions"
+**❌ BAD - Vague or technical:**
+- "Test 1" (no description)
+- "Check state" (too vague)
+- "Verify useUIStore hook" (implementation detail)
+---
+## Writing Test Steps
+**✅ GOOD - Clear, actionable steps:**
+```text
+**Steps**:
+1. Toggle AI pane visible
+2. Get bounding box for AI pane
+3. Get bounding box for Editor pane
+4. Compare X coordinates
+```
+**❌ BAD - Vague or incomplete:**
+```text
+**Steps**:
+1. Check panes
+2. Verify order
+```
+---
+## Writing Expected Outcomes
+**✅ GOOD - Specific, testable assertions:**
+```text
+**Expected**:
+- AI pane X coordinate < Editor pane X coordinate
+- Explorer pane X coordinate > Editor pane X coordinate
+- All coordinates are positive numbers
+```
+**❌ BAD - Vague expectations:**
+```text
+**Expected**:
+- Panes are in correct order
+- Everything works
+```
+---
+## Organizing Test Suites
+Group related tests:
+- **Layout/Structure** - DOM structure, element presence, positioning
+- **User Interactions** - Clicks, keyboard shortcuts, drag/drop
+- **State Management** - State changes, persistence, reactivity
+- **Accessibility** - ARIA labels, keyboard navigation, focus
+- **Edge Cases** - Error handling, boundary conditions
+- **Technical Constraints** - Non-functional requirements from user story
+---
+## Coverage Summary
+**Always include:**
+- Total test count
+- Breakdown by status (passing, skipped, not implemented, failing)
+- Percentages for each category
+- Rationale for skipped tests
+**Example:**
+```text
+**Total**: 20 tests
+**Passing**: 9 tests (45%)
+**Skipped**: 4 tests (20%)
+**Not Implemented**: 7 tests (35%)
+**Failing**: 0 tests
+```
+---
+## Testing Technical Constraints
+User stories include Technical Constraints. These MUST have corresponding tests.
+| Constraint Category | Test Type                  | What to Verify                                |
+| ------------------- | -------------------------- | --------------------------------------------- |
+| Performance         | Load/timing tests          | Response times, throughput, capacity          |
+| Security            | Security tests             | Input sanitization, auth, rate limiting       |
+| Compatibility       | Cross-browser/device tests | Browser versions, mobile, accessibility       |
+| Data                | Compliance tests           | Retention, deletion, privacy rules            |
+| Dependencies        | Integration tests          | Required services work, no forbidden packages |
+| Infrastructure      | Resource tests             | Memory limits, offline behavior               |
+---
+## Test Definition Example
+```markdown
+### Test 3.1: Cmd+J toggles AI pane visibility ✅
+**Status**: ✅ Passing
+**Description**: Verifies Cmd+J keyboard shortcut toggles AI pane
+**Steps**:
+1. Verify AI pane hidden initially (default state)
+2. Press Cmd+J (Mac) or Ctrl+J (Windows/Linux)
+3. Verify AI pane becomes visible
+4. Press Cmd+J again
+5. Verify AI pane becomes hidden
+**Expected**:
+- AI pane starts hidden
+- After first toggle: AI pane visible
+- After second toggle: AI pane hidden
+```
+---
+## File Naming Convention
+**Specs:** `.safeword/planning/specs/feature-[slug].md` or `task-[slug].md`
+**Test definitions:** `.safeword/planning/test-definitions/feature-[slug].md`
+**Good filenames:**
+- `feature-campaign-switching.md`
+- `task-fix-login-timeout.md`
+**Bad filenames:**
+- `user-story-1.md` ← Not descriptive
+- `STORY_CAMPAIGN_FINAL_v2.md` ← Bloated
+---
+## Quick Reference
+**User Story Red Flags (INVEST Violations):**
+- No acceptance criteria → Too vague
+- More than 5 acceptance criteria → Split into multiple stories
+- Technical implementation details → Wrong audience
+- Missing "So that" → No clear value
+**Test Definition Red Flags:**
+- Test name doesn't describe behavior → Rename
+- Steps are vague → Add detail
+- No expected outcomes → Add assertions
+- No coverage summary → Add totals