RubyGems - claude_memory - Versions diffs - 0.2.0 → 0.4.0 - Mend

claude_memory 0.2.0 → 0.4.0

Files changed (120)

checksums.yaml +4 -4
data/.claude/CLAUDE.md +1 -0
data/.claude/output-styles/memory-aware.md +1 -0
data/.claude/rules/claude_memory.generated.md +1 -20
data/.claude/settings.local.json +12 -1
data/.claude/skills/check-memory/DEPRECATED.md +29 -0
data/.claude/skills/check-memory/SKILL.md +77 -0
data/.claude/skills/debug-memory +1 -0
data/.claude/skills/improve/SKILL.md +532 -0
data/.claude/skills/improve/feature-patterns.md +1221 -0
data/.claude/skills/memory-first-workflow +1 -0
data/.claude/skills/quality-update/SKILL.md +229 -0
data/.claude/skills/quality-update/implementation-guide.md +346 -0
data/.claude/skills/review-commit/SKILL.md +199 -0
data/.claude/skills/review-for-quality/SKILL.md +154 -0
data/.claude/skills/review-for-quality/expert-checklists.md +79 -0
data/.claude/skills/setup-memory +1 -0
data/.claude/skills/study-repo/SKILL.md +307 -0
data/.claude/skills/study-repo/analysis-template.md +323 -0
data/.claude/skills/study-repo/focus-examples.md +327 -0
data/.claude-plugin/plugin.json +1 -1
data/.lefthook/map_specs.rb +29 -0
data/CHANGELOG.md +141 -0
data/CLAUDE.md +168 -11
data/README.md +160 -10
data/Rakefile +14 -1
data/WEEK2_COMPLETE.md +250 -0
data/db/migrations/001_create_initial_schema.rb +117 -0
data/db/migrations/002_add_project_scoping.rb +33 -0
data/db/migrations/003_add_session_metadata.rb +42 -0
data/db/migrations/004_add_fact_embeddings.rb +20 -0
data/db/migrations/005_add_incremental_sync.rb +21 -0
data/db/migrations/006_add_operation_tracking.rb +40 -0
data/db/migrations/007_add_ingestion_metrics.rb +26 -0
data/docs/GETTING_STARTED.md +587 -0
data/docs/RELEASE_NOTES_v0.2.0.md +0 -1
data/docs/RUBY_COMMUNITY_POST_v0.2.0.md +0 -2
data/docs/architecture.md +53 -17
data/docs/auto_init_design.md +230 -0
data/docs/ci_integration.md +294 -0
data/docs/eval_week1_summary.md +183 -0
data/docs/eval_week2_summary.md +419 -0
data/docs/evals.md +353 -0
data/docs/improvements.md +551 -726
data/docs/influence/.gitkeep +13 -0
data/docs/influence/grepai.md +933 -0
data/docs/influence/qmd.md +2195 -0
data/docs/plugin.md +257 -11
data/docs/quality_review.md +472 -1273
data/docs/remaining_improvements.md +330 -0
data/lefthook.yml +21 -1
data/lib/claude_memory/commands/checks/claude_md_check.rb +41 -0
data/lib/claude_memory/commands/checks/database_check.rb +120 -0
data/lib/claude_memory/commands/checks/hooks_check.rb +112 -0
data/lib/claude_memory/commands/checks/reporter.rb +110 -0
data/lib/claude_memory/commands/checks/snapshot_check.rb +30 -0
data/lib/claude_memory/commands/doctor_command.rb +12 -129
data/lib/claude_memory/commands/help_command.rb +1 -0
data/lib/claude_memory/commands/hook_command.rb +9 -2
data/lib/claude_memory/commands/index_command.rb +169 -0
data/lib/claude_memory/commands/ingest_command.rb +1 -1
data/lib/claude_memory/commands/init_command.rb +5 -197
data/lib/claude_memory/commands/initializers/database_ensurer.rb +30 -0
data/lib/claude_memory/commands/initializers/global_initializer.rb +85 -0
data/lib/claude_memory/commands/initializers/hooks_configurator.rb +156 -0
data/lib/claude_memory/commands/initializers/mcp_configurator.rb +56 -0
data/lib/claude_memory/commands/initializers/memory_instructions_writer.rb +135 -0
data/lib/claude_memory/commands/initializers/project_initializer.rb +111 -0
data/lib/claude_memory/commands/recover_command.rb +75 -0
data/lib/claude_memory/commands/registry.rb +5 -1
data/lib/claude_memory/commands/stats_command.rb +239 -0
data/lib/claude_memory/commands/uninstall_command.rb +226 -0
data/lib/claude_memory/core/batch_loader.rb +32 -0
data/lib/claude_memory/core/concept_ranker.rb +73 -0
data/lib/claude_memory/core/embedding_candidate_builder.rb +37 -0
data/lib/claude_memory/core/fact_collector.rb +51 -0
data/lib/claude_memory/core/fact_query_builder.rb +154 -0
data/lib/claude_memory/core/fact_ranker.rb +113 -0
data/lib/claude_memory/core/result_builder.rb +54 -0
data/lib/claude_memory/core/result_sorter.rb +25 -0
data/lib/claude_memory/core/scope_filter.rb +61 -0
data/lib/claude_memory/core/text_builder.rb +29 -0
data/lib/claude_memory/embeddings/fastembed_adapter.rb +55 -0
data/lib/claude_memory/embeddings/generator.rb +161 -0
data/lib/claude_memory/embeddings/similarity.rb +69 -0
data/lib/claude_memory/hook/handler.rb +4 -3
data/lib/claude_memory/index/lexical_fts.rb +7 -2
data/lib/claude_memory/infrastructure/operation_tracker.rb +158 -0
data/lib/claude_memory/infrastructure/schema_validator.rb +206 -0
data/lib/claude_memory/ingest/content_sanitizer.rb +6 -7
data/lib/claude_memory/ingest/ingester.rb +103 -15
data/lib/claude_memory/ingest/metadata_extractor.rb +57 -0
data/lib/claude_memory/ingest/tool_extractor.rb +71 -0
data/lib/claude_memory/mcp/response_formatter.rb +331 -0
data/lib/claude_memory/mcp/server.rb +19 -0
data/lib/claude_memory/mcp/setup_status_analyzer.rb +73 -0
data/lib/claude_memory/mcp/tool_definitions.rb +279 -0
data/lib/claude_memory/mcp/tool_helpers.rb +80 -0
data/lib/claude_memory/mcp/tools.rb +330 -320
data/lib/claude_memory/recall/dual_query_template.rb +63 -0
data/lib/claude_memory/recall.rb +304 -237
data/lib/claude_memory/resolve/resolver.rb +52 -49
data/lib/claude_memory/store/sqlite_store.rb +210 -144
data/lib/claude_memory/store/store_manager.rb +6 -6
data/lib/claude_memory/sweep/sweeper.rb +6 -0
data/lib/claude_memory/version.rb +1 -1
data/lib/claude_memory.rb +35 -3
data/output-styles/memory-aware.md +71 -0
data/skills/debug-memory/SKILL.md +146 -0
data/skills/memory-first-workflow/SKILL.md +144 -0
data/skills/setup-memory/SKILL.md +168 -0
metadata +83 -11
data/.claude/.mind.mv2.aLCUZd +0 -0
data/.claude/memory.sqlite3 +0 -0
data/.claude/output-styles/memory-aware.md +0 -21
data/.mcp.json +0 -11
/data/docs/{feature_adoption_plan.md → plans/feature_adoption_plan.md} +0 -0
/data/docs/{feature_adoption_plan_revised.md → plans/feature_adoption_plan_revised.md} +0 -0
/data/docs/{plan.md → plans/plan.md} +0 -0
/data/docs/{updated_plan.md → plans/updated_plan.md} +0 -0

data/docs/architecture.md CHANGED

@@ -9,7 +9,7 @@ ClaudeMemory is architected using Domain-Driven Design (DDD) principles with cle
 ```
 ┌─────────────────────────────────────────────────────────────┐
 │                    Application Layer                         │
-│  CLI (Router) → Commands (16 classes) → Configuration       │
+│  CLI (Router) → Commands (20 classes) → Configuration       │
 └──────────────────────┬──────────────────────────────────────┘
                        │
 ┌──────────────────────▼──────────────────────────────────────┐
@@ -22,12 +22,13 @@ ClaudeMemory is architected using Domain-Driven Design (DDD) principles with cle
 ┌──────────────────────▼──────────────────────────────────────┐
 │                 Business Logic Layer                         │
 │  Recall → Resolve → Distill → Ingest → Publish             │
-│  Sweep → MCP → Hook                                         │
+│  Sweep → Embeddings → MCP → Hook                           │
 └──────────────────────┬──────────────────────────────────────┘
                        │
 ┌──────────────────────▼──────────────────────────────────────┐
 │                 Infrastructure Layer                         │
-│  Store (SQLite via Sequel) → FileSystem → Index (FTS5)     │
+│  Store (SQLite v6 + WAL) → FileSystem → Index (FTS5+Vector) │
+│  Templates                                                   │
 └─────────────────────────────────────────────────────────────┘
 ```
@@ -38,8 +39,8 @@ ClaudeMemory is architected using Domain-Driven Design (DDD) principles with cle
 **Purpose:** Handle user interaction and command routing
 **Components:**
-- **CLI** (`cli.rb`): Thin router (41 lines) that dispatches to command classes
-- **Commands** (`commands/`): 16 command classes, each handling one CLI command
+- **CLI** (`cli.rb`): Thin router that dispatches to command classes
+- **Commands** (`commands/`): 20 command classes, each handling one CLI command
 - **Configuration** (`configuration.rb`): Centralized ENV access and path calculation
 **Key Principles:**
@@ -94,6 +95,9 @@ end
 - **SessionId**: Type-safe session identifiers
 - **TranscriptPath**: Type-safe file paths
 - **FactId**: Type-safe positive integer IDs
+- **TextBuilder**: Searchable text construction from entities/facts/decisions
+- **ResultSorter**: Result ranking and sorting logic
+- **FactQueryBuilder**: SQL query construction for fact retrieval
 - All are immutable (frozen) and self-validating
 #### Null Objects (`core/`)
@@ -115,13 +119,14 @@ end
 **Components:**
-#### Recall (`recall.rb`)
+#### Recall (`recall.rb` + `recall/`)
 - Queries facts from global and project databases
 - **Optimization**: Batch queries to eliminate N+1 issues
   - Before: 2N+1 queries for N facts
   - After: 3 queries total (FTS + batch facts + batch receipts)
 - Supports scope filtering (project, global, all)
 - Returns facts with provenance receipts
+- `DualQueryTemplate`: Query template handling for dual-database queries
 #### Resolve (`resolve/`)
 - Truth maintenance and conflict resolution
@@ -149,9 +154,19 @@ end
 - Time-bounded execution
 - Cleans up old content and expired facts
+#### Embeddings (`embeddings/`)
+- `Generator`: Built-in TF-IDF embedding generation (always available, no dependencies)
+- `FastembedAdapter`: High-quality local embeddings via [fastembed-rb](https://github.com/khasinski/fastembed-rb) (BAAI/bge-small-en-v1.5)
+- 384-dimensional normalized vectors (both generators produce same dimensionality)
+- Asymmetric query/passage encoding (FastEmbed) for better retrieval accuracy
+- `Similarity`: Cosine similarity calculations and top-k ranking
+- Dependency injection: `Recall.new(store, embedding_generator: adapter)`
 #### MCP (`mcp/`)
 - Model Context Protocol server
-- Exposes tools: recall, explain, promote, status, conflicts, changes, sweep_now
+- Exposes 19 tools including: recall, explain, promote, status, decisions, conventions, architecture, semantic search, check_setup, and more
+- `ResponseFormatter`: Consistent MCP response formatting
+- `SetupStatusAnalyzer`: Initialization and version status analysis
 #### Hook (`hook/`)
 - Reads JSON from stdin
@@ -164,10 +179,11 @@ end
 **Components:**
 #### Store (`store/`)
-- **SQLiteStore**: Direct database access via Sequel
+- **SQLiteStore**: Direct database access via Sequel (schema v6)
 - **StoreManager**: Manages dual databases (global + project)
 - **Transaction safety**: Atomic multi-step operations
-- Schema migrations
+- **WAL mode**: Write-Ahead Logging for better concurrency
+- Schema migrations with per-migration transactions
 #### FileSystem (`infrastructure/`)
 - **FileSystem**: Real filesystem wrapper
@@ -176,8 +192,14 @@ end
 - Enables testing without tempdir cleanup
 #### Index (`index/`)
-- SQLite FTS5 full-text search
-- No embeddings required
+- SQLite FTS5 for lexical full-text search
+- Vector embeddings for semantic similarity (384-dimensional vectors)
+- Hybrid search modes: text-only, vector-only, or both (FTS5 + vector)
+#### Templates (`templates/`)
+- Hook configuration examples (`hooks.example.json`)
+- Output style templates (`output-styles/memory-aware.md`)
+- Setup and configuration scaffolding
 **Key Principles:**
 - Ports and Adapters: Clear interfaces for external systems
@@ -276,6 +298,16 @@ FileSystem (write)
 **Solution:** Wrap in database transactions
 **Impact:** Data integrity guaranteed
+### 4. WAL Mode for Concurrency
+**Problem:** Database locks prevented concurrent reads during writes
+**Solution:** Enable Write-Ahead Logging (WAL) mode in SQLite
+**Impact:** MCP server and hooks can operate concurrently without blocking
+### 5. Local Semantic Search
+**Problem:** Traditional semantic search requires cloud API calls for embedding generation
+**Solution:** Local ONNX model via fastembed-rb (BAAI/bge-small-en-v1.5, 384-dimensional vectors)
+**Impact:** High-quality semantic search with no API costs, no network dependency after initial model download
 ## Testing Strategy
 ### Unit Tests
@@ -307,14 +339,17 @@ FileSystem (write)
 - Scattered ENV access
 ### After Refactoring
-- CLI: 41 lines (95% reduction)
-- Tests: 426 examples (149 added)
+- CLI: 41 lines (thin router, 95% reduction from original)
+- Tests: 988 examples (257% increase)
 - Batch queries (3 total)
 - FileSystem abstraction
-- Value objects
+- Value objects (SessionId, TranscriptPath, FactId)
 - Centralized Configuration
 - 4 domain models with business logic
-- 16 command classes
+- 20 command classes
+- 19 MCP tools
+- Semantic search with local embeddings (FastEmbed + TF-IDF fallback)
+- Schema v6 with WAL mode
 ## Future Improvements
@@ -350,11 +385,12 @@ FileSystem (write)
 The refactored architecture provides:
 - ✅ Clear separation of concerns
-- ✅ High testability (426 tests)
+- ✅ High testability (988 tests)
 - ✅ Type safety (value objects)
 - ✅ Null safety (null objects)
-- ✅ Performance (batch queries, in-memory FS)
+- ✅ Performance (batch queries, in-memory FS, WAL mode)
 - ✅ Maintainability (small, focused classes)
 - ✅ Extensibility (easy to add commands/tools)
+- ✅ Semantic search (local FastEmbed ONNX model, TF-IDF fallback)
 The codebase now follows best practices for Ruby applications and is well-positioned for future growth.
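
Note on `data/docs/architecture.md` (illustration only, not part of the package diff): the new Embeddings section describes cosine similarity with top-k ranking over 384-dimensional vectors. The gem's `Similarity` class is not shown in this diff, so the following is a minimal sketch of that technique under assumed names. `SimilaritySketch`, `cosine`, and `top_k` are hypothetical and should not be read as the gem's API.

```ruby
# Hypothetical sketch of cosine similarity with top-k ranking.
# None of these names come from the claude_memory gem.
module SimilaritySketch
  module_function

  # Cosine similarity of two equal-length numeric vectors.
  # Returns 0.0 when either vector has zero length (norm of 0).
  def cosine(a, b)
    raise ArgumentError, "dimension mismatch" unless a.size == b.size

    dot = a.zip(b).sum { |x, y| x * y }
    norm_a = Math.sqrt(a.sum { |x| x * x })
    norm_b = Math.sqrt(b.sum { |x| x * x })
    return 0.0 if norm_a.zero? || norm_b.zero?

    dot / (norm_a * norm_b)
  end

  # Scores each candidate ({id:, vector:}) against the query and
  # returns the k best as {id:, score:} hashes, highest score first.
  def top_k(query, candidates, k)
    candidates
      .map { |c| {id: c[:id], score: cosine(query, c[:vector])} }
      .max_by(k) { |c| c[:score] }
  end
end

# Usage with tiny 2-dimensional vectors (real embeddings would have 384).
candidates = [
  {id: 1, vector: [0.0, 1.0]},
  {id: 2, vector: [1.0, 0.1]},
  {id: 3, vector: [0.5, 0.5]}
]
best = SimilaritySketch.top_k([1.0, 0.0], candidates, 2)
puts best.map { |c| c[:id] }.inspect # prints [2, 3]
```

If the stored vectors are already normalized, as the document says they are, the division by the norms could be skipped and the dot product used directly. The sketch keeps the full formula so it also works on unnormalized input.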

data/docs/auto_init_design.md ADDED

@@ -0,0 +1,230 @@
+# Auto-Initialization and Upgrade Design
+## Problem Statement
+When users install ClaudeMemory (add to MCP), they must manually run `claude-memory init`. There's no:
+- Automatic detection of uninitialized state
+- Upgrade detection when CLAUDE.md instructions change
+- Graceful degradation when not configured
+## Constraints
+1. **No hooks before init**: Can't use SessionStart hook to auto-init (hooks aren't configured yet)
+2. **MCP server is stateless**: Starts fresh each time, no persistent memory
+3. **Skills unavailable pre-init**: Can't use skills to detect/fix initialization
+## Proposed Multi-Layer Solution
+### Layer 1: Setup Status MCP Tool (Immediate Detection)
+**Add new MCP tool: `memory.check_setup`**
+```ruby
+{
+  name: "memory.check_setup",
+  description: "Check if ClaudeMemory is properly initialized. CALL THIS FIRST if memory tools fail or on first use of ClaudeMemory.",
+  result: {
+    initialized: true/false,
+    version: "1.2.3",
+    issues: ["No CLAUDE.md found", "Hooks not configured"],
+    recommendation: "Run: claude-memory init"
+  }
+}
+```
+**Implementation:**
+- Check for database existence
+- Check for CLAUDE.md with version marker
+- Check for hooks configuration
+- Return actionable recommendations
+**Update other tool descriptions:**
+```ruby
+description: "... If this tool fails with 'database not found', run memory.check_setup for guidance."
+```
+### Layer 2: Version Markers (Upgrade Detection)
+**Add version to CLAUDE.md:**
+```markdown
+<!-- ClaudeMemory v1.0.0 -->
+# ClaudeMemory
+...
+```
+**Create `claude-memory upgrade` command:**
+- Detect current version in CLAUDE.md
+- Compare with ClaudeMemory::VERSION
+- Offer to upgrade instructions
+- Preserve user customizations
+**Workflow:**
+```bash
+$ claude-memory upgrade
+Checking configuration version...
+Current: v0.9.0
+Latest: v1.0.0
+Changes in v1.0.0:
+- Added memory-first workflow instructions
+- Updated tool descriptions
+- New /check-memory skill
+Upgrade? [y/N] y
+✓ Backed up old CLAUDE.md to CLAUDE.md.backup
+✓ Updated workflow instructions
+✓ Preserved custom sections
+```
+### Layer 3: Graceful Degradation (Error Handling)
+**Update MCP Tools to detect uninitialized state:**
+```ruby
+def recall(args)
+  unless database_exists?
+    return {
+      error: "ClaudeMemory not initialized",
+      help: "Run 'claude-memory init' to set up databases and configuration",
+      documentation: "https://github.com/your-repo#installation"
+    }
+  end
+  # ... normal recall logic
+end
+```
+**Benefit**: Claude sees clear actionable errors instead of cryptic database failures.
+### Layer 4: Setup Reminder Skill
+**Create `/setup-memory` skill:**
+```markdown
+---
+name: setup-memory
+description: Guide user through ClaudeMemory installation
+disable-model-invocation: true
+---
+# ClaudeMemory Setup Guide
+ClaudeMemory is installed but not initialized.
+## Quick Setup
+Run this command:
+```bash
+claude-memory init
+```
+This will:
+1. Create global and project databases
+2. Configure hooks for automatic ingestion
+3. Add workflow instructions to CLAUDE.md
+4. Set up MCP server
+After running, restart Claude Code to load the configuration.
+## Verification
+After init, run:
+```bash
+claude-memory doctor
+```
+## Need Help?
+See: https://github.com/your-repo#troubleshooting
+```
+**Usage**: When Claude encounters "not initialized" errors, it can suggest: "Run `/setup-memory` for installation help"
+### Layer 5: Doctor Command Enhancement
+**Add `--fix` flag to doctor:**
+```bash
+$ claude-memory doctor --fix
+Checking configuration...
+✗ Project database missing
+✗ No CLAUDE.md found
+Would you like to run init? [y/N] y
+Running: claude-memory init
+...
+```
+**Add `--quiet` flag for programmatic checks:**
+```bash
+$ claude-memory doctor --quiet
+# Exit code 0 = healthy, 1 = needs init, 2 = needs upgrade
+```
+## Implementation Priority
+### Phase 1 (Immediate Value)
+1. ✅ Add version markers to init command
+2. ✅ Create `memory.check_setup` MCP tool
+3. ✅ Update error messages with actionable help
+4. ✅ Create `/setup-memory` skill
+### Phase 2 (Enhanced UX)
+5. ⬜ Create `claude-memory upgrade` command
+6. ⬜ Add `doctor --fix` and `doctor --quiet`
+7. ⬜ Add upgrade detection to SessionStart hook
+### Phase 3 (Polish)
+8. ⬜ Version migration system (v1.0.0 → v1.1.0)
+9. ⬜ Preserve custom CLAUDE.md sections during upgrade
+10. ⬜ Add upgrade notifications via MCP tool
+## Decision: Why Not Auto-Init?
+We deliberately **don't** auto-initialize because:
+1. **User control**: Installation should be explicit, not magical
+2. **Git hygiene**: Creates `.claude/` directory - users should understand this
+3. **Global vs project**: Users choose `--global` or project-local
+4. **Customization**: Users may want to review CLAUDE.md before committing
+Instead, we make initialization **obvious** and **frictionless** when needed.
+## Example User Journey
+### First-Time User
+```
+User: Where are client errors handled?
+Claude: Let me check memory...
+Claude: (calls memory.recall)
+MCP: Error - database not found. Run memory.check_setup.
+Claude: (calls memory.check_setup)
+MCP: Not initialized. Run: claude-memory init
+Claude: "It looks like ClaudeMemory isn't set up yet. Run `claude-memory init` to configure it. Would you like me to explain what this does first?"
+```
+### Upgrading User
+```
+User: Check memory about authentication
+Claude: (calls memory.recall)
+MCP: Returns results with warning: "Using outdated configuration v0.9.0. Run: claude-memory upgrade"
+Claude: "I found these facts about authentication: [...]. Note: You can upgrade to the latest ClaudeMemory configuration by running `claude-memory upgrade`."
+```
+## Testing Strategy
+- Unit tests for version detection logic
+- Integration tests for upgrade workflow
+- Manual testing of error messages
+- Test preservation of custom CLAUDE.md sections
+## Documentation Updates
+- Update README with upgrade instructions
+- Add CHANGELOG for version history
+- Document version markers in CLAUDE.md
+- Add troubleshooting guide for common issues
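
Note on `data/docs/auto_init_design.md` (illustration only, not part of the package diff): Layer 1 lists three checks behind `memory.check_setup` (database existence, a CLAUDE.md version marker, hooks configuration), and Layer 2 compares the marker against the gem version. The sketch below shows one way those checks could fit together. The module name, the file locations under `.claude/`, and the `"hooks"` settings key are assumptions made for illustration. The `claude-memory upgrade` recommendation refers to a command the design lists as not yet implemented.

```ruby
require "json"
require "tmpdir"
require "fileutils"

# Hypothetical sketch of the setup checks described in Layers 1 and 2.
# File locations and the "hooks" key are assumptions, not the gem's layout.
module SetupCheckSketch
  VERSION_MARKER = /<!--\s*ClaudeMemory v(\d+\.\d+\.\d+)\s*-->/

  module_function

  # Returns a hash shaped like the result proposed for memory.check_setup.
  def check(project_dir, current_version:)
    claude_dir = File.join(project_dir, ".claude")
    db_path = File.join(claude_dir, "memory.sqlite3")
    claude_md = File.join(claude_dir, "CLAUDE.md")
    settings = File.join(claude_dir, "settings.json")

    issues = []
    issues << "Project database missing" unless File.exist?(db_path)

    # Capture group 1 of the marker is the installed configuration version.
    installed = File.exist?(claude_md) ? File.read(claude_md)[VERSION_MARKER, 1] : nil
    issues << "No CLAUDE.md version marker found" if installed.nil?

    issues << "Hooks not configured" unless hooks_configured?(settings)

    outdated = installed && Gem::Version.new(installed) < Gem::Version.new(current_version)
    recommendation =
      if issues.any? then "Run: claude-memory init"
      elsif outdated then "Run: claude-memory upgrade"
      end

    {initialized: issues.empty?, version: installed, issues: issues, recommendation: recommendation}
  end

  # True when the settings file parses as a JSON object with a "hooks" key.
  def hooks_configured?(path)
    return false unless File.exist?(path)

    data = JSON.parse(File.read(path))
    data.is_a?(Hash) && data.key?("hooks")
  rescue JSON::ParserError
    false
  end
end

# Usage: an empty directory reports all three issues.
Dir.mktmpdir do |dir|
  result = SetupCheckSketch.check(dir, current_version: "1.0.0")
  puts result[:recommendation] # prints Run: claude-memory init
end
```

Collecting every issue before recommending anything matches the design's goal of returning actionable guidance in a single call, rather than failing on the first missing file.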

data/docs/ci_integration.md ADDED

@@ -0,0 +1,294 @@
+# CI Integration for Eval Framework
+## Current Status: ✅ Already Working
+The eval framework **requires no special CI setup** and already runs in GitHub Actions.
+### What's Already Running
+`.github/workflows/main.yml` runs on:
+- Every push to `main`
+- Every pull request
+It executes: `bundle exec rake` which runs:
+1. `rake spec` - All 1003 tests (including 15 eval tests)
+2. `rake standard` - Ruby linter
+**Evals are automatically included** because they're part of the RSpec suite (`spec/evals/*.rb`).
+### Why Evals Work in CI
+✅ **No API calls** - Use stubbed responses (no Claude API key needed)
+✅ **No external services** - Self-contained in-memory fixtures
+✅ **Fast** - <1s for all 15 eval tests, 40s for full suite
+✅ **Standard dependencies** - Just RSpec + ClaudeMemory gems
+✅ **Temporary directories** - Use `Dir.mktmpdir` (standard in CI)
+✅ **No environment variables** - No configuration needed
+### Current CI Output
+```
+...
+1003 examples, 0 failures
+Took 40 seconds
+```
+The 15 eval tests are included in the 1003 total. They run silently unless they fail.
+## Optional Enhancements
+If you want to make evals more visible in CI, consider these options:
+### Option 1: Separate Eval Report Step ⭐ Recommended
+Add a dedicated step to show eval summary:
+```yaml
+# .github/workflows/main.yml
+steps:
+  - uses: actions/checkout@v4
+  - name: Set up Ruby
+    uses: ruby/setup-ruby@v1
+    with:
+      ruby-version: ${{ matrix.ruby }}
+      bundler-cache: true
+  # NEW: Run evals with summary report
+  - name: Run evals with summary
+    run: ./bin/run-evals
+  # Existing: Run full test suite
+  - name: Run tests and linter
+    run: bundle exec rake
+```
+**Benefits:**
+- Clear "EVAL SUMMARY" section in CI logs
+- Shows behavioral scores prominently
+- Makes eval failures obvious
+**Example output in CI logs:**
+```
+============================================================
+EVAL SUMMARY
+============================================================
+Total Examples: 15
+Passed: 15 ✅
+Failed: 0 ❌
+============================================================
+BEHAVIORAL SCORES
+============================================================
+Convention Recall:       +100% improvement
+Architectural Decision:  +100% improvement
+Tech Stack Recall:       +100% improvement
+OVERALL: Memory improves responses by 100% on average
+============================================================
+```
+**Trade-offs:**
+- ✅ Better visibility
+- ⚠️ Runs evals twice (once in summary, once in full suite)
+- ⚠️ Adds ~1 second to CI time
+### Option 2: Fail Fast on Eval Failures
+Run evals first to catch memory issues early:
+```yaml
+- name: Run evals first (fail fast)
+  run: bundle exec rspec spec/evals/ --fail-fast
+- name: Run full test suite
+  run: bundle exec rake
+```
+**Benefits:**
+- Fails within ~1 second if evals break
+- Saves CI time (skips 1003 tests if evals fail)
+- Evals become "smoke tests" for memory system
+**Trade-offs:**
+- ⚠️ Runs evals twice (but stops fast if they fail)
+### Option 3: Separate Workflow for Evals
+Create `.github/workflows/evals.yml`:
+```yaml
+name: Evals
+on:
+  push:
+    branches: [main]
+  pull_request:
+  schedule:
+    - cron: '0 0 * * 0'  # Weekly on Sunday
+jobs:
+  evals:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+      - name: Set up Ruby
+        uses: ruby/setup-ruby@v1
+        with:
+          ruby-version: '4.0.1'
+          bundler-cache: true
+      - name: Run evals
+        run: ./bin/run-evals
+```
+**Benefits:**
+- Evals have dedicated status badge
+- Can schedule periodic eval runs (e.g., weekly)
+- Clearer separation of concerns
+**Trade-offs:**
+- ⚠️ More complex (2 workflows)
+- ⚠️ Runs evals 3 times (main workflow, eval workflow, scheduled)
+### Option 4: Eval Results as PR Comment
+Post eval summary as PR comment:
+```yaml
+- name: Run evals and capture results
+  id: evals
+  run: |
+    echo "results<<EOF" >> $GITHUB_OUTPUT
+    ./bin/run-evals >> $GITHUB_OUTPUT
+    echo "EOF" >> $GITHUB_OUTPUT
+- name: Comment eval results on PR
+  if: github.event_name == 'pull_request'
+  uses: actions/github-script@v7
+  with:
+    github-token: ${{ secrets.GITHUB_TOKEN }}
+    script: |
+      github.rest.issues.createComment({
+        issue_number: context.issue.number,
+        owner: context.repo.owner,
+        repo: context.repo.repo,
+        body: '## Eval Results\n\n```\n${{ steps.evals.outputs.results }}\n```'
+      })
+```
+**Benefits:**
+- Eval results visible in PR without checking logs
+- Reviewers see memory improvement metrics
+- Historical record in PR comments
+**Trade-offs:**
+- ⚠️ More complex (requires github-script action)
+- ⚠️ Creates comment on every push to PR
+- ⚠️ Requires GITHUB_TOKEN (usually automatic)
+## Recommendation
+**Current setup is perfect for now.** Evals already run and will catch regressions.
+When to add enhancements:
+- **Option 1**: If you want eval results more visible in logs (simple, low cost)
+- **Option 2**: If eval failures become frequent (fail fast saves time)
+- **Option 3**: If you want dedicated eval status badge
+- **Option 4**: If you want eval results visible to PR reviewers
+Most projects should start with **Option 1** (separate step with summary) only if visibility becomes an issue.
+## Testing CI Locally
+Simulate CI behavior locally:
+```bash
+# What CI runs (default rake task)
+bundle exec rake
+# Just evals (what CI could run separately)
+./bin/run-evals
+# Just evals with RSpec (alternative)
+bundle exec rspec spec/evals/ --format documentation
+```
+## CI Failure Scenarios
+### Scenario 1: Eval Test Fails
+```
+Failures:
+  1) Convention Recall Eval mentions stored conventions when asked
+     Failure/Error: expect(mentions_indentation).to be(true)
+       expected true
+            got false
+```
+**What happened**: Memory system regressed, stored conventions not recalled
+**Fix**: Investigate why memory population or recall failed
+### Scenario 2: All Tests Pass But Behavioral Scores Drop
+Current setup won't catch this (scores aren't checked automatically).
+To catch this in future (Week 3+):
+- Store expected scores in test
+- Assert: `expect(score).to be >= 0.9` (allow small variance)
+### Scenario 3: Fixture Setup Fails
+```
+Errno::EACCES: Permission denied @ dir_s_mkdir - /tmp
+```
+**What happened**: CI environment doesn't allow temp directory creation
+**Fix**: Unlikely in GitHub Actions (has `/tmp` access), but could use `ENV['TMPDIR']` fallback
+## Verification
+To verify evals are running in CI:
+1. **Check logs**: Look for "1003 examples, 0 failures" (includes evals)
+2. **Break an eval**: Change assertion to fail, push, check CI fails
+3. **Run locally**: `bundle exec rake` should match CI behavior
+## Future: Real Claude Execution (Week 3+)
+If you add real Claude execution (not stubbed):
+**Will need:**
+- `ANTHROPIC_API_KEY` in GitHub Secrets
+- Tag tests as `:slow` and skip by default
+- Optional: Run only on `main` branch (not PRs)
+- Optional: Schedule runs (don't run on every commit)
+**Example:**
+```yaml
+- name: Run slow evals (real Claude)
+  if: github.ref == 'refs/heads/main'
+  env:
+    ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
+  run: bundle exec rspec spec/evals/ --tag slow
+```
+But for current stubbed evals: **no special setup needed!** ✅
+## Summary
+| Aspect | Status | Notes |
+|--------|--------|-------|
+| Already running in CI? | ✅ Yes | Part of `bundle exec rake` |
+| Requires API keys? | ❌ No | Uses stubbed responses |
+| Requires environment variables? | ❌ No | Self-contained |
+| Requires special permissions? | ❌ No | Standard filesystem access |
+| Fast enough for CI? | ✅ Yes | <1s for evals, 40s total |
+| Catches regressions? | ✅ Yes | Will fail if memory system breaks |
+| Visible in logs? | ⚠️ Partial | Included in total count, not highlighted |
+| Recommended changes? | 🤷 Optional | Add separate summary step if desired |
+**Bottom line**: Evals work in CI today. Optional enhancements can improve visibility, but aren't required.
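
Note on `data/docs/ci_integration.md` (illustration only, not part of the package diff): Scenario 2 observes that behavioral scores are not checked automatically and suggests asserting `score >= 0.9`. A minimal sketch of such a guard follows, assuming scores are available as a hash of behavior name to a fraction between 0 and 1. `EvalScoreGuard` and its methods are hypothetical names. How the real eval framework exposes its scores is not shown in this diff.

```ruby
# Hypothetical guard for behavioral score regressions.
# Assumes scores arrive as {"behavior name" => fraction between 0 and 1}.
module EvalScoreGuard
  module_function

  # Names of behaviors whose score fell below the floor.
  def regressions(scores, floor: 0.9)
    scores.select { |_name, score| score < floor }.keys
  end

  # Returns true when every score meets the floor, raises otherwise,
  # so a CI step running this exits non-zero on a regression.
  def assert_no_regressions!(scores, floor: 0.9)
    failed = regressions(scores, floor: floor)
    return true if failed.empty?

    raise "Behavioral score below #{floor}: #{failed.join(", ")}"
  end
end

# Usage with made-up scores; the names echo the summary report above.
scores = {
  "Convention Recall" => 1.0,
  "Architectural Decision" => 1.0,
  "Tech Stack Recall" => 0.85
}
puts EvalScoreGuard.regressions(scores).inspect # prints ["Tech Stack Recall"]
```

The same comparison could live inside an RSpec example as `expect(score).to be >= 0.9`, which is the form the document proposes. A standalone guard is shown here only so the sketch runs without the eval fixtures.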