claude_memory 0.3.0 → 0.4.0

@@ -0,0 +1,419 @@
1
+ # Week 2 Summary: Extract Patterns
2
+
3
+ **Date**: 2026-01-30
4
+ **Status**: ✅ Complete
5
+ **Duration**: ~1 hour
6
+
7
+ ## What We Built
8
+
9
+ ### Extracted Patterns
10
+
11
+ After implementing 3 eval scenarios in Week 1, clear patterns emerged. Week 2 focused on extracting these patterns without losing simplicity.
12
+
13
+ **Created**: `spec/evals/support/eval_helpers.rb`
14
+
15
+ This module provides 4 reusable components:
16
+
17
+ #### 1. SharedSetup Module
18
+ Common RSpec setup for all evals:
19
+ ```ruby
20
+ include EvalHelpers::SharedSetup
21
+
22
+ # Provides:
23
+ # - let(:tmpdir) - Isolated temp directory
24
+ # - let(:db_path) - Memory database path
25
+ # - before/after hooks for cleanup
26
+ ```
27
+
28
+ **Before** (repeated in each eval):
29
+ ```ruby
30
+ let(:tmpdir) { Dir.mktmpdir("eval_name_#{Process.pid}") }
31
+ let(:db_path) { File.join(tmpdir, ".claude/memory.sqlite3") }
32
+
33
+ before { FileUtils.mkdir_p(File.dirname(db_path)) }
34
+ after { FileUtils.rm_rf(tmpdir) }
35
+ ```
36
+
37
+ **After** (single include):
38
+ ```ruby
39
+ include EvalHelpers::SharedSetup
40
+ ```
41
+
42
+ **Lines saved**: ~6 per eval × 3 evals = 18 lines
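
A minimal sketch of how such a module could be written, assuming it registers its `let` definitions and hooks through `self.included`. The real `eval_helpers.rb` is not shown in this summary, so the structure and the generic `eval_` directory prefix are assumptions.

```ruby
require "tmpdir"
require "fileutils"

# Hypothetical sketch of EvalHelpers::SharedSetup; the real module in
# spec/evals/support/eval_helpers.rb may be structured differently.
module EvalHelpers
  module SharedSetup
    # Runs when an example group does `include EvalHelpers::SharedSetup`.
    # `base` is the including group, which is expected to offer RSpec's
    # class-level `let`, `before`, and `after`.
    def self.included(base)
      # Isolated temp directory, named after the current process id.
      base.let(:tmpdir) { Dir.mktmpdir("eval_#{Process.pid}") }
      # Memory database path inside the temp directory.
      base.let(:db_path) { File.join(tmpdir, ".claude/memory.sqlite3") }

      # Create the .claude directory before each example, remove everything after.
      base.before { FileUtils.mkdir_p(File.dirname(db_path)) }
      base.after { FileUtils.rm_rf(tmpdir) }
    end
  end
end
```

Because the hooks live in one module, a change to the temp directory layout would only need to be made here rather than in each eval.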
43
+
44
+ #### 2. MemoryFixtureBuilder Class
45
+ Simplifies memory population:
46
+
47
+ **Before** (verbose, repetitive):
48
+ ```ruby
49
+ store = ClaudeMemory::Store::SQLiteStore.new(db_path)
50
+ entity_id = store.find_or_create_entity(type: "repo", name: "test-project")
51
+
52
+ content_id = store.upsert_content_item(...)
53
+ fact_id = store.insert_fact(...)
54
+ store.insert_provenance(...)
55
+ fts = ClaudeMemory::Index::LexicalFTS.new(store)
56
+ fts.index_content_item(...)
57
+
58
+ store.close
59
+ ```
60
+
61
+ **After** (declarative, clear intent):
62
+ ```ruby
63
+ builder = EvalHelpers::MemoryFixtureBuilder.new(db_path)
64
+
65
+ builder.add_fact(
66
+ predicate: "convention",
67
+ object: "Use 2-space indentation",
68
+ text: "Use 2-space indentation for Ruby files",
69
+ fts_keywords: "coding convention style"
70
+ )
71
+
72
+ builder.close
73
+ ```
74
+
75
+ **Benefits**:
76
+ - Single responsibility (fact creation)
77
+ - Hides implementation details (content items, provenance, FTS)
78
+ - Supports single fact (`add_fact`) or batch (`add_facts`)
79
+ - **Lines saved**: ~15-20 per eval
80
+
81
+ #### 3. ResponseStubs Module
82
+ Standardizes stubbed responses:
83
+
84
+ **Before**:
85
+ ```ruby
86
+ {
87
+ success: true,
88
+ result: "Response text here...",
89
+ session_id: "stub-session-123"
90
+ }
91
+ ```
92
+
93
+ **After**:
94
+ ```ruby
95
+ stub_success_response(
96
+ "Response text here...",
97
+ session_id: "stub-session-convention-memory"
98
+ )
99
+ ```
100
+
101
+ **Benefits**:
102
+ - Consistent response format
103
+ - Explicit success/failure handling
104
+ - Less boilerplate
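
A minimal sketch of what this module could look like. Only `stub_success_response` appears in this summary; the default `session_id`, the `stub_failure_response` name, and its `:error` key are assumptions added for illustration.

```ruby
# Hypothetical sketch of EvalHelpers::ResponseStubs; the real module
# may use different defaults or a different failure helper.
module EvalHelpers
  module ResponseStubs
    # Builds the same hash the evals previously wrote inline.
    # The default session_id is an assumption.
    def stub_success_response(text, session_id: "stub-session-123")
      { success: true, result: text, session_id: session_id }
    end

    # Assumed counterpart for the failure path (name and keys are hypothetical).
    def stub_failure_response(message, session_id: "stub-session-123")
      { success: false, error: message, session_id: session_id }
    end
  end
end
```

Keeping the hash shape in one place means a new key would be added once rather than in every stubbed response.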
105
+
106
+ #### 4. ScoringHelpers Module
107
+ Simplifies common checks:
108
+
109
+ **Before**:
110
+ ```ruby
111
+ mentions_indentation = response.include?("2-space") || response.include?("2 space")
112
+ mentions_rspec = response.include?("expect syntax") || response.include?("expect")
113
+
114
+ score = 0.0
115
+ score += 0.5 if mentions_indentation
116
+ score += 0.5 if mentions_rspec
117
+ ```
118
+
119
+ **After**:
120
+ ```ruby
121
+ mentions_indentation = includes_any?(response, "2-space", "2 space")
122
+ mentions_rspec = includes_any?(response, "expect syntax", "expect")
123
+
124
+ score = score_from_checks(mentions_indentation, mentions_rspec)
125
+ ```
126
+
127
+ **Helpers provided**:
128
+ - `includes_all?` - true when all terms are present
129
+ - `includes_any?` - true when any term is present
130
+ - `score_from_checks` - automatic equal weighting of the checks passed in
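
A minimal sketch of these three helpers, inferred from the call sites above. The method bodies are assumptions; matching is kept case-sensitive because the Week 1 code used `String#include?`.

```ruby
# Hypothetical sketch of EvalHelpers::ScoringHelpers, inferred from the
# call sites in this summary; the real implementation may differ.
module EvalHelpers
  module ScoringHelpers
    # True when every term occurs in the text (case-sensitive).
    def includes_all?(text, *terms)
      terms.all? { |term| text.include?(term) }
    end

    # True when at least one term occurs in the text (case-sensitive).
    def includes_any?(text, *terms)
      terms.any? { |term| text.include?(term) }
    end

    # Equal weighting: the fraction of checks that passed, from 0.0 to 1.0.
    def score_from_checks(*checks)
      return 0.0 if checks.empty?

      checks.count { |check| check }.to_f / checks.size
    end
  end
end
```

With two checks, each one is worth 0.5, which matches the manual `score += 0.5` arithmetic from Week 1.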
131
+
132
+ ### Refactored All 3 Evals
133
+
134
+ Updated all eval specs to use helpers:
135
+ - `convention_recall_spec.rb`: 124 lines → 123 lines (cleaner, not shorter)
136
+ - `architectural_decision_spec.rb`: 130 lines → 120 lines (-10 lines)
137
+ - `tech_stack_recall_spec.rb`: 157 lines → 146 lines (-11 lines)
138
+
139
+ **Total reduction**: 22 lines (1 + 10 + 11), but more importantly: **clearer intent**.
140
+
141
+ ## Code Quality Improvements
142
+
143
+ ### Before: Inline Everything (Week 1)
144
+ ```ruby
145
+ def populate_fixture_memory
146
+ store = ClaudeMemory::Store::SQLiteStore.new(db_path)
147
+ entity_id = store.find_or_create_entity(type: "repo", name: "test-project")
148
+
149
+ fact_id_1 = store.insert_fact(
150
+ subject_entity_id: entity_id,
151
+ predicate: "convention",
152
+ object_literal: "Use 2-space indentation for Ruby files",
153
+ scope: "project"
154
+ )
155
+
156
+ content_id_1 = store.upsert_content_item(
157
+ source: "test",
158
+ session_id: "test-session",
159
+ text_hash: Digest::SHA256.hexdigest("2-space indentation"),
160
+ byte_len: 20,
161
+ raw_text: "Use 2-space indentation for Ruby files"
162
+ )
163
+
164
+ store.insert_provenance(
165
+ fact_id: fact_id_1,
166
+ content_item_id: content_id_1,
167
+ quote: "2-space indentation",
168
+ strength: "stated"
169
+ )
170
+
171
+ fts = ClaudeMemory::Index::LexicalFTS.new(store)
172
+ fts.index_content_item(content_id_1, "Use 2-space indentation...")
173
+
174
+ # Repeat for fact_id_2...
175
+
176
+ store.close
177
+ end
178
+ ```
179
+
180
+ ### After: Extracted Helpers (Week 2)
181
+ ```ruby
182
+ def populate_fixture_memory
183
+ builder = EvalHelpers::MemoryFixtureBuilder.new(db_path)
184
+
185
+ builder.add_facts([
186
+ {
187
+ predicate: "convention",
188
+ object: "Use 2-space indentation for Ruby files",
189
+ text: "Use 2-space indentation for Ruby files",
190
+ fts_keywords: "coding convention style"
191
+ },
192
+ {
193
+ predicate: "convention",
194
+ object: "Prefer RSpec's expect syntax over should syntax",
195
+ text: "Prefer RSpec's expect syntax over should syntax",
196
+ fts_keywords: "convention style pattern"
197
+ }
198
+ ])
199
+
200
+ builder.close
201
+ end
202
+ ```
203
+
204
+ **Improvements**:
205
+ - ✅ Declarative (what, not how)
206
+ - ✅ Readable (clear intent)
207
+ - ✅ DRY (no repetition)
208
+ - ✅ Hides complexity (content items, provenance, FTS)
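
To make the hidden steps concrete, here is a minimal sketch of how a builder like this could wrap the Week 1 calls. It is not the gem's implementation. The summary's builder is constructed from `db_path`; this version takes an already-open store and FTS index (an assumption) so it can run without the gem. The store and index method names and keyword arguments are copied from the Week 1 code above; the way `fts_keywords` is used is a guess.

```ruby
require "digest"

# Hypothetical sketch of EvalHelpers::MemoryFixtureBuilder. The injected
# store/fts constructor is an assumption; the real class takes db_path.
module EvalHelpers
  class MemoryFixtureBuilder
    def initialize(store, fts, entity_name: "test-project")
      @store = store
      @fts = fts
      @entity_id = store.find_or_create_entity(type: "repo", name: entity_name)
    end

    # Creates one fact plus its content item, provenance row, and FTS entry.
    def add_fact(predicate:, object:, text:, fts_keywords: "")
      fact_id = @store.insert_fact(
        subject_entity_id: @entity_id,
        predicate: predicate,
        object_literal: object,
        scope: "project"
      )
      content_id = @store.upsert_content_item(
        source: "test",
        session_id: "test-session",
        text_hash: Digest::SHA256.hexdigest(text), # derived from the text
        byte_len: text.bytesize,                   # derived, not hand-typed
        raw_text: text
      )
      @store.insert_provenance(
        fact_id: fact_id,
        content_item_id: content_id,
        quote: text,
        strength: "stated"
      )
      # Assumption: keywords are appended to the indexed text.
      @fts.index_content_item(content_id, [text, fts_keywords].join(" ").strip)
      fact_id
    end

    # Batch form: each element is a hash of add_fact keyword arguments.
    def add_facts(facts)
      facts.map { |attrs| add_fact(**attrs) }
    end

    def close
      @store.close
    end
  end
end
```

One side effect of centralizing these calls is that values such as `byte_len` and `text_hash` can be computed from the text instead of being typed by hand in each eval.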
209
+
210
+ ## Test Results
211
+
212
+ ```bash
213
+ $ bundle exec rspec spec/evals/ --format documentation
214
+
215
+ Tech Stack Recall Eval
216
+ with memory populated
217
+ ✓ calculates accuracy score
218
+ ✓ correctly identifies the testing framework
219
+ fixture setup
220
+ ✓ creates memory database with tech stack facts
221
+ baseline (no memory)
222
+ ✓ has lower accuracy score
223
+ ✓ cannot identify the specific framework without memory
224
+
225
+ Convention Recall Eval
226
+ fixture setup
227
+ ✓ creates memory database with conventions
228
+ baseline (no memory)
229
+ ✓ does not mention specific project conventions
230
+ ✓ has lower behavioral score than memory-enabled
231
+ with memory populated
232
+ ✓ mentions stored conventions when asked
233
+ ✓ calculates behavioral score
234
+
235
+ Architectural Decision Eval
236
+ fixture setup
237
+ ✓ creates memory database with architectural decision
238
+ with memory populated
239
+ ✓ mentions the stored architectural decision
240
+ ✓ calculates behavioral score for decision adherence
241
+ baseline (no memory)
242
+ ✓ has lower decision adherence score
243
+ ✓ gives generic advice without knowing the decision
244
+
245
+ Finished in 0.23s
246
+ 15 examples, 0 failures ✅
247
+ ```
248
+
249
+ ## Design Principles Applied
250
+
251
+ ### Sandi Metz: Extract Only When Painful
252
+
253
+ > "Extract collaborators only when you feel pain"
254
+
255
+ **Week 1**: Inline everything, no abstractions
256
+ **Week 2**: Felt pain after 3 evals, extracted patterns
257
+ ✅ **Right timing**: Extracted based on real needs, not speculation
258
+
259
+ ### Kent Beck: Incremental Design
260
+
261
+ > "Make it work, make it right, make it fast"
262
+
263
+ **Week 1**: Make it work (3 evals passing)
264
+ **Week 2**: Make it right (extract patterns)
265
+ ✅ **Emerged design**: Abstraction emerged from real usage
266
+
267
+ ### Avdi Grimm: Tell, Don't Ask
268
+
269
+ **Before**:
270
+ ```ruby
271
+ store = ClaudeMemory::Store::SQLiteStore.new(db_path)
272
+ entity_id = store.find_or_create_entity(...)
273
+ fact_id = store.insert_fact(...)
274
+ # ... more imperative steps
275
+ ```
276
+
277
+ **After**:
278
+ ```ruby
279
+ builder.add_fact(predicate: "convention", object: "...", text: "...")
280
+ ```
281
+
282
+ ✅ **Declarative**: Tell builder what to create, not how
283
+
284
+ ## Files Modified
285
+
286
+ ```
287
+ spec/evals/support/
288
+ └── eval_helpers.rb # NEW: Extracted helpers (145 lines)
289
+
290
+ spec/evals/
291
+ ├── convention_recall_spec.rb # REFACTORED: Uses helpers
292
+ ├── architectural_decision_spec.rb # REFACTORED: Uses helpers
293
+ └── tech_stack_recall_spec.rb # REFACTORED: Uses helpers
294
+ ```
295
+
296
+ ## What We Learned
297
+
298
+ ### Extraction Was Worth It
299
+
300
+ **Benefits**:
301
+ 1. ✅ Less duplication (DRY)
302
+ 2. ✅ Clearer intent (declarative)
303
+ 3. ✅ Easier to add new evals (reuse helpers)
304
+ 4. ✅ Single place to fix bugs (MemoryFixtureBuilder)
305
+
306
+ **Trade-offs**:
307
+ 1. ⚠️ More indirection (need to understand helpers)
308
+ 2. ⚠️ Slightly more complex (4 helpers vs inline code)
309
+
310
+ **Verdict**: Worth it. Adding a 4th eval will be much easier now.
311
+
312
+ ### When to Extract
313
+
314
+ **Right time to extract**:
315
+ - After 2-3 similar implementations
316
+ - When duplication causes pain
317
+ - When pattern is clear and stable
318
+
319
+ **Wrong time to extract**:
320
+ - Before any implementation (speculation)
321
+ - After only 1 implementation (too early)
322
+ - When pattern is still evolving
323
+
324
+ ### What NOT to Extract Yet
325
+
326
+ We deliberately **did not** extract:
327
+ 1. ❌ Base `EvalCase` class - Not enough common interface yet
328
+ 2. ❌ `ClaudeRunner` - Not using real Claude execution
329
+ 3. ❌ `MetricsCollector` - Not tracking results over time
330
+ 4. ❌ `ResultStore` - Not needed yet
331
+
332
+ **Reason**: No pain yet. Extract when needed.
333
+
334
+ ## Next Steps (Week 3+)
335
+
336
+ ### Option A: Add More Scenarios (Recommended)
337
+
338
+ Now that helpers exist, adding new evals is easy:
339
+
340
+ ```ruby
341
+ require_relative "support/eval_helpers"
342
+
343
+ RSpec.describe "New Eval", :eval do
344
+ include EvalHelpers::SharedSetup
345
+ include EvalHelpers::ResponseStubs
346
+ include EvalHelpers::ScoringHelpers
347
+
348
+ def populate_fixture_memory
349
+ builder = EvalHelpers::MemoryFixtureBuilder.new(db_path)
350
+ builder.add_fact(...)
351
+ builder.close
352
+ end
353
+
354
+ # ... rest of eval
355
+ end
356
+ ```
357
+
358
+ **Potential scenarios**:
359
+ - Implementation Consistency (follows existing patterns)
360
+ - Code Style Adherence (respects conventions)
361
+ - Framework Usage (uses correct APIs)
362
+
363
+ ### Option B: Add Real Claude Execution
364
+
365
+ Implement `ClaudeRunner` for integration tests:
366
+
367
+ ```ruby
368
+ module EvalHelpers
369
+ class ClaudeRunner
370
+ def run(prompt, working_dir)
371
+ # Shell out to claude -p --output-format json
372
+ # Parse response
373
+ # Extract tool calls from transcript
374
+ end
375
+ end
376
+ end
377
+ ```
378
+
379
+ **When**: If stubbed responses miss real issues
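
A minimal sketch of how the outline above could be filled in with `Open3`, under stated assumptions. The injectable `command:` argument and the shape of the returned hash are hypothetical. The parsed JSON is returned as-is because this summary does not describe the CLI's output fields, and tool-call extraction is left out because the transcript format is not described either.

```ruby
require "json"
require "open3"

# Hypothetical sketch of EvalHelpers::ClaudeRunner. The base command is
# injectable so tests can substitute a fake executable.
module EvalHelpers
  class ClaudeRunner
    def initialize(command: ["claude"])
      @command = command
    end

    # Runs the CLI in working_dir and returns a hash with :success,
    # :result (parsed JSON, or nil if stdout was not valid JSON),
    # and :stderr. Tool calls are not extracted here.
    def run(prompt, working_dir)
      argv = @command + ["-p", prompt, "--output-format", "json"]
      stdout, stderr, status = Open3.capture3(*argv, chdir: working_dir)
      { success: status.success?, result: parse_json(stdout), stderr: stderr }
    end

    private

    # Returns nil instead of raising when the output is not JSON.
    def parse_json(text)
      JSON.parse(text)
    rescue JSON::ParserError
      nil
    end
  end
end
```

Passing the arguments as an array avoids shell interpolation of the prompt, and the injectable command keeps the runner testable without network access.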
380
+
381
+ ### Option C: Tool Call Tracking
382
+
383
+ Add ability to verify memory tools were invoked:
384
+
385
+ ```ruby
386
+ it "invokes memory.conventions tool" do
387
+ result = run_claude(prompt, tmpdir)
388
+ tool_calls = result[:tool_calls]
389
+
390
+ expect(tool_calls).to include(tool: "memory.conventions")
391
+ end
392
+ ```
393
+
394
+ **When**: If we need to test tool selection (like Vercel's 56% skip rate)
395
+
396
+ ## Summary
397
+
398
+ **Week 2 achieved**:
399
+ - ✅ Extracted 4 helpers (3 modules, 1 class) from repeated patterns
400
+ - ✅ Refactored all 3 evals to use helpers
401
+ - ✅ Maintained 100% test pass rate (15/15)
402
+ - ✅ Improved code clarity and maintainability
403
+ - ✅ Made adding new evals easier
404
+
405
+ **Lines of code**:
406
+ - Added: 145 lines (helpers)
407
+ - Removed: 22 lines (duplication)
407
+ - Net: +123 lines, but much clearer intent
409
+
410
+ **Velocity impact**:
411
+ - Adding a 4th eval: ~30 minutes estimated (vs 1 hour in Week 1)
412
+ - Changing fixture setup: 1 place (vs 3 places)
413
+
414
+ **Quality improvement**:
415
+ - Declarative > Imperative
416
+ - DRY > Repetitive
417
+ - Clear > Verbose
418
+
419
+ **Ready for**: Week 3 (add more scenarios or advanced features)