claude_memory 0.3.0 → 0.4.0
- checksums.yaml +4 -4
- data/.claude/CLAUDE.md +1 -1
- data/.claude/output-styles/memory-aware.md +1 -0
- data/.claude/rules/claude_memory.generated.md +1 -39
- data/.claude/settings.local.json +4 -1
- data/.claude/skills/check-memory/DEPRECATED.md +29 -0
- data/.claude/skills/debug-memory +1 -0
- data/.claude/skills/memory-first-workflow +1 -0
- data/.claude/skills/setup-memory +1 -0
- data/.claude-plugin/plugin.json +1 -1
- data/.lefthook/map_specs.rb +29 -0
- data/CHANGELOG.md +15 -7
- data/CLAUDE.md +38 -0
- data/README.md +43 -0
- data/Rakefile +14 -1
- data/WEEK2_COMPLETE.md +250 -0
- data/docs/architecture.md +49 -14
- data/docs/ci_integration.md +294 -0
- data/docs/eval_week1_summary.md +183 -0
- data/docs/eval_week2_summary.md +419 -0
- data/docs/evals.md +353 -0
- data/docs/improvements.md +22 -23
- data/docs/remaining_improvements.md +2 -2
- data/lefthook.yml +8 -1
- data/lib/claude_memory/embeddings/fastembed_adapter.rb +55 -0
- data/lib/claude_memory/ingest/ingester.rb +7 -3
- data/lib/claude_memory/mcp/tool_definitions.rb +7 -7
- data/lib/claude_memory/version.rb +1 -1
- data/output-styles/memory-aware.md +71 -0
- data/skills/debug-memory/SKILL.md +146 -0
- data/skills/memory-first-workflow/SKILL.md +144 -0
- metadata +16 -4
- data/.claude/.mind.mv2.o2N83S +0 -0
- data/.claude/output-styles/memory-aware.md +0 -21
- data/docs/.claude/mind.mv2.lock +0 -0
- /data/{.claude/skills → skills}/setup-memory/SKILL.md +0 -0
@@ -0,0 +1,419 @@
# Week 2 Summary: Extract Patterns

**Date**: 2026-01-30
**Status**: ✅ Complete
**Duration**: ~1 hour

## What We Built

### Extracted Patterns

After implementing 3 eval scenarios in Week 1, clear patterns emerged. Week 2 focused on extracting these patterns without losing simplicity.

**Created**: `spec/evals/support/eval_helpers.rb`

This module provides 4 reusable components:

#### 1. SharedSetup Module
Common RSpec setup for all evals:
```ruby
include EvalHelpers::SharedSetup

# Provides:
# - let(:tmpdir) - Isolated temp directory
# - let(:db_path) - Memory database path
# - before/after hooks for cleanup
```

**Before** (repeated in each eval):
```ruby
let(:tmpdir) { Dir.mktmpdir("eval_name_#{Process.pid}") }
let(:db_path) { File.join(tmpdir, ".claude/memory.sqlite3") }

before { FileUtils.mkdir_p(File.dirname(db_path)) }
after { FileUtils.rm_rf(tmpdir) }
```

**After** (single include):
```ruby
include EvalHelpers::SharedSetup
```

**Lines saved**: ~6 per eval × 3 evals = 18 lines

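One way such a module could be written is with Ruby's `included` hook, which lets the module register `let`, `before`, and `after` on the including example group. This is a minimal sketch, not the contents of `eval_helpers.rb`: the temp directory prefix and the hook bodies are assumptions modeled on the "Before" snippet above.

```ruby
require "tmpdir"
require "fileutils"

module EvalHelpers
  # Hypothetical sketch of SharedSetup; the real module may differ.
  module SharedSetup
    # Ruby calls this hook when a class (here, an RSpec example group)
    # includes the module, so the group-level DSL can be invoked on it.
    def self.included(base)
      base.let(:tmpdir) { Dir.mktmpdir("eval_#{Process.pid}") }
      base.let(:db_path) { File.join(tmpdir, ".claude/memory.sqlite3") }

      # Create the database directory before each example, remove it after.
      base.before { FileUtils.mkdir_p(File.dirname(db_path)) }
      base.after { FileUtils.rm_rf(tmpdir) }
    end
  end
end
```

Because the hook only calls `let`, `before`, and `after` on the including class, the module stays a single `include` line in each spec.
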
#### 2. MemoryFixtureBuilder Class
Simplifies memory population:

**Before** (verbose, repetitive):
```ruby
store = ClaudeMemory::Store::SQLiteStore.new(db_path)
entity_id = store.find_or_create_entity(type: "repo", name: "test-project")

content_id = store.upsert_content_item(...)
fact_id = store.insert_fact(...)
store.insert_provenance(...)
fts = ClaudeMemory::Index::LexicalFTS.new(store)
fts.index_content_item(...)

store.close
```

**After** (declarative, clear intent):
```ruby
builder = EvalHelpers::MemoryFixtureBuilder.new(db_path)

builder.add_fact(
  predicate: "convention",
  object: "Use 2-space indentation",
  text: "Use 2-space indentation for Ruby files",
  fts_keywords: "coding convention style"
)

builder.close
```

**Benefits**:
- Single responsibility (fact creation)
- Hides implementation details (content items, provenance, FTS)
- Supports single fact (`add_fact`) or batch (`add_facts`)
- **Lines saved**: ~15-20 per eval

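To make the "hides implementation details" point concrete, here is a rough sketch of how a builder like this might wrap the store calls shown in the "Before" snippet. It is an illustration under assumptions: the store and FTS index are injected (the builder described here takes `db_path` instead) so the example runs without the gem, and the way `fts_keywords` is appended to the indexed text is a guess.

```ruby
require "digest"

module EvalHelpers
  # Hypothetical sketch; not the builder shipped in eval_helpers.rb.
  class MemoryFixtureBuilder
    # store and fts are injected here (an assumption) instead of being
    # built from db_path, so the sketch has no dependency on the gem.
    def initialize(store, fts)
      @store = store
      @fts = fts
      @entity_id = store.find_or_create_entity(type: "repo", name: "test-project")
    end

    # Creates the fact plus its content item, provenance row and FTS entry.
    def add_fact(predicate:, object:, text:, fts_keywords: "")
      fact_id = @store.insert_fact(
        subject_entity_id: @entity_id, predicate: predicate,
        object_literal: object, scope: "project"
      )
      content_id = @store.upsert_content_item(
        source: "test", session_id: "test-session",
        text_hash: Digest::SHA256.hexdigest(text),
        byte_len: text.bytesize, raw_text: text
      )
      @store.insert_provenance(
        fact_id: fact_id, content_item_id: content_id,
        quote: object, strength: "stated"
      )
      # Assumed behavior: keywords are indexed alongside the fact text.
      @fts.index_content_item(content_id, "#{text} #{fts_keywords}".strip)
      fact_id
    end

    # Batch variant: one add_fact call per attribute hash.
    def add_facts(facts)
      facts.map { |attrs| add_fact(**attrs) }
    end

    def close
      @store.close
    end
  end
end
```

Keeping the four store calls behind `add_fact` is what gives a single place to fix fixture bugs.
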
#### 3. ResponseStubs Module
Standardizes stubbed responses:

**Before**:
```ruby
{
  success: true,
  result: "Response text here...",
  session_id: "stub-session-123"
}
```

**After**:
```ruby
stub_success_response(
  "Response text here...",
  session_id: "stub-session-convention-memory"
)
```

**Benefits**:
- Consistent response format
- Explicit success/failure handling
- Less boilerplate

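A helper with this call shape could be as small as the sketch below, which returns the same hash as the "Before" snippet. The default `session_id`, the `stub_failure_response` name, and its `:error` key are assumptions added for illustration; only `stub_success_response` appears in this summary.

```ruby
module EvalHelpers
  # Hypothetical sketch of ResponseStubs.
  module ResponseStubs
    # Builds the success hash that evals previously wrote inline.
    def stub_success_response(text, session_id: "stub-session")
      { success: true, result: text, session_id: session_id }
    end

    # Assumed failure counterpart; the name and :error key are illustrative.
    def stub_failure_response(error, session_id: "stub-session")
      { success: false, result: nil, error: error, session_id: session_id }
    end
  end
end
```
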
#### 4. ScoringHelpers Module
Simplifies common checks:

**Before**:
```ruby
mentions_indentation = response.include?("2-space") || response.include?("2 space")
mentions_rspec = response.include?("expect syntax") || response.include?("expect")

score = 0.0
score += 0.5 if mentions_indentation
score += 0.5 if mentions_rspec
```

**After**:
```ruby
mentions_indentation = includes_any?(response, "2-space", "2 space")
mentions_rspec = includes_any?(response, "expect syntax", "expect")

score = score_from_checks(mentions_indentation, mentions_rspec)
```

**Benefits**:
- `includes_all?` - All terms present
- `includes_any?` - Any term present
- `score_from_checks` - Automatic equal weighting

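The three helpers listed above can be sketched in a few lines. This is one plausible implementation, assuming "equal weighting" means the fraction of checks that passed and that an empty check list scores 0.0; the actual module may handle edge cases differently.

```ruby
module EvalHelpers
  # Hypothetical sketch of ScoringHelpers.
  module ScoringHelpers
    # True when every term occurs in the text.
    def includes_all?(text, *terms)
      terms.all? { |term| text.include?(term) }
    end

    # True when at least one term occurs in the text.
    def includes_any?(text, *terms)
      terms.any? { |term| text.include?(term) }
    end

    # Equal weighting: the score is the fraction of truthy checks.
    def score_from_checks(*checks)
      return 0.0 if checks.empty?

      checks.count { |check| check }.to_f / checks.size
    end
  end
end
```

With two checks, one passing and one failing, `score_from_checks(true, false)` returns `0.5`, which matches the hand-written `score += 0.5` version in the "Before" snippet.
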
### Refactored All 3 Evals

Updated all eval specs to use helpers:
- `convention_recall_spec.rb`: 124 lines → 123 lines (cleaner, not shorter)
- `architectural_decision_spec.rb`: 130 lines → 120 lines (-10 lines)
- `tech_stack_recall_spec.rb`: 157 lines → 146 lines (-11 lines)

**Total reduction**: ~22 lines (1 + 10 + 11), but more importantly: **clearer intent**.

## Code Quality Improvements

### Before: Inline Everything (Week 1)
```ruby
def populate_fixture_memory
  store = ClaudeMemory::Store::SQLiteStore.new(db_path)
  entity_id = store.find_or_create_entity(type: "repo", name: "test-project")

  fact_id_1 = store.insert_fact(
    subject_entity_id: entity_id,
    predicate: "convention",
    object_literal: "Use 2-space indentation for Ruby files",
    scope: "project"
  )

  content_id_1 = store.upsert_content_item(
    source: "test",
    session_id: "test-session",
    text_hash: Digest::SHA256.hexdigest("2-space indentation"),
    byte_len: 20,
    raw_text: "Use 2-space indentation for Ruby files"
  )

  store.insert_provenance(
    fact_id: fact_id_1,
    content_item_id: content_id_1,
    quote: "2-space indentation",
    strength: "stated"
  )

  fts = ClaudeMemory::Index::LexicalFTS.new(store)
  fts.index_content_item(content_id_1, "Use 2-space indentation...")

  # Repeat for fact_id_2...

  store.close
end
```

### After: Extracted Helpers (Week 2)
```ruby
def populate_fixture_memory
  builder = EvalHelpers::MemoryFixtureBuilder.new(db_path)

  builder.add_facts([
    {
      predicate: "convention",
      object: "Use 2-space indentation for Ruby files",
      text: "Use 2-space indentation for Ruby files",
      fts_keywords: "coding convention style"
    },
    {
      predicate: "convention",
      object: "Prefer RSpec's expect syntax over should syntax",
      text: "Prefer RSpec's expect syntax over should syntax",
      fts_keywords: "convention style pattern"
    }
  ])

  builder.close
end
```

**Improvements**:
- ✅ Declarative (what, not how)
- ✅ Readable (clear intent)
- ✅ DRY (no repetition)
- ✅ Hides complexity (content items, provenance, FTS)

## Test Results

```bash
$ bundle exec rspec spec/evals/ --format documentation

Tech Stack Recall Eval
  with memory populated
    ✓ calculates accuracy score
    ✓ correctly identifies the testing framework
  fixture setup
    ✓ creates memory database with tech stack facts
  baseline (no memory)
    ✓ has lower accuracy score
    ✓ cannot identify the specific framework without memory

Convention Recall Eval
  fixture setup
    ✓ creates memory database with conventions
  baseline (no memory)
    ✓ does not mention specific project conventions
    ✓ has lower behavioral score than memory-enabled
  with memory populated
    ✓ mentions stored conventions when asked
    ✓ calculates behavioral score

Architectural Decision Eval
  fixture setup
    ✓ creates memory database with architectural decision
  with memory populated
    ✓ mentions the stored architectural decision
    ✓ calculates behavioral score for decision adherence
  baseline (no memory)
    ✓ has lower decision adherence score
    ✓ gives generic advice without knowing the decision

Finished in 0.23s
15 examples, 0 failures ✅
```

## Design Principles Applied

### Sandi Metz: Extract Only When Painful

> "Extract collaborators only when you feel pain"

**Week 1**: Inline everything, no abstractions
**Week 2**: Felt pain after 3 evals, extracted patterns
✅ **Right timing**: Extracted based on real needs, not speculation

### Kent Beck: Incremental Design

> "Make it work, make it right, make it fast"

**Week 1**: Make it work (3 evals passing)
**Week 2**: Make it right (extract patterns)
✅ **Emerged design**: Abstraction emerged from real usage

### Avdi Grimm: Tell, Don't Ask

**Before**:
```ruby
store = ClaudeMemory::Store::SQLiteStore.new(db_path)
entity_id = store.find_or_create_entity(...)
fact_id = store.insert_fact(...)
# ... more imperative steps
```

**After**:
```ruby
builder.add_fact(predicate: "convention", object: "...", text: "...")
```

✅ **Declarative**: Tell builder what to create, not how

## Files Modified

```
spec/evals/support/
└── eval_helpers.rb                      # NEW: Extracted helpers (145 lines)

spec/evals/
├── convention_recall_spec.rb            # REFACTORED: Uses helpers
├── architectural_decision_spec.rb       # REFACTORED: Uses helpers
└── tech_stack_recall_spec.rb            # REFACTORED: Uses helpers
```

## What We Learned

### Extraction Was Worth It

**Benefits**:
1. ✅ Less duplication (DRY)
2. ✅ Clearer intent (declarative)
3. ✅ Easier to add new evals (reuse helpers)
4. ✅ Single place to fix bugs (MemoryFixtureBuilder)

**Trade-offs**:
1. ⚠️ More indirection (need to understand helpers)
2. ⚠️ Slightly more complex (4 modules vs inline code)

**Verdict**: Worth it. Adding a 4th eval will be much easier now.

### When to Extract

**Right time to extract**:
- After 2-3 similar implementations
- When duplication causes pain
- When pattern is clear and stable

**Wrong time to extract**:
- Before any implementation (speculation)
- After only 1 implementation (too early)
- When pattern is still evolving

### What NOT to Extract Yet

We deliberately **did not** extract:
1. ❌ Base `EvalCase` class - Not enough common interface yet
2. ❌ `ClaudeRunner` - Not using real Claude execution
3. ❌ `MetricsCollector` - Not tracking results over time
4. ❌ `ResultStore` - Not needed yet

**Reason**: No pain yet. Extract when needed.

## Next Steps (Week 3+)

### Option A: Add More Scenarios (Recommended)

Now that helpers exist, adding new evals is easy:

```ruby
require_relative "support/eval_helpers"

RSpec.describe "New Eval", :eval do
  include EvalHelpers::SharedSetup
  include EvalHelpers::ResponseStubs
  include EvalHelpers::ScoringHelpers

  def populate_fixture_memory
    builder = EvalHelpers::MemoryFixtureBuilder.new(db_path)
    builder.add_fact(...)
    builder.close
  end

  # ... rest of eval
end
```

**Potential scenarios**:
- Implementation Consistency (follows existing patterns)
- Code Style Adherence (respects conventions)
- Framework Usage (uses correct APIs)

### Option B: Add Real Claude Execution

Implement `ClaudeRunner` for integration tests:

```ruby
module EvalHelpers
  class ClaudeRunner
    def run(prompt, working_dir)
      # Shell out to claude -p --output-format json
      # Parse response
      # Extract tool calls from transcript
    end
  end
end
```

**When**: If stubbed responses miss real issues

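The first two steps of that outline (shell out, parse) might look like the sketch below. Treat it as a starting point under assumptions: the `executor:` injection point is hypothetical and exists only so specs can avoid the real CLI, and the `"result"` and `"session_id"` keys are assumed field names in the CLI's JSON output that should be checked against the installed version. Tool call extraction is left out.

```ruby
require "json"
require "open3"

module EvalHelpers
  # Hypothetical sketch of ClaudeRunner; not part of this release.
  class ClaudeRunner
    # executor defaults to Open3.capture3 and can be replaced in specs.
    def initialize(executor: Open3.method(:capture3))
      @executor = executor
    end

    # Runs `claude -p <prompt> --output-format json` inside working_dir and
    # returns a hash shaped like the stubbed responses used by the evals.
    def run(prompt, working_dir)
      stdout, stderr, status = @executor.call(
        "claude", "-p", prompt, "--output-format", "json", chdir: working_dir
      )
      return { success: false, result: stderr, session_id: nil } unless status.success?

      parsed = JSON.parse(stdout)
      # Assumed field names; verify against the CLI's actual JSON output.
      { success: true, result: parsed["result"], session_id: parsed["session_id"] }
    rescue JSON::ParserError => e
      { success: false, result: "unparseable output: #{e.message}", session_id: nil }
    end
  end
end
```

Returning the same `success`/`result`/`session_id` shape as the stubs would let an eval switch between stubbed and real execution without changing its scoring code.
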
### Option C: Tool Call Tracking

Add ability to verify memory tools were invoked:

```ruby
it "invokes memory.conventions tool" do
  result = run_claude(prompt, tmpdir)
  tool_calls = result[:tool_calls]

  expect(tool_calls).to include(tool: "memory.conventions")
end
```

**When**: If we need to test tool selection (like Vercel's 56% skip rate)

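Producing `result[:tool_calls]` requires pulling tool invocations out of a session transcript. The sketch below shows one way to do that, assuming a JSONL transcript in which each line is an object whose `message.content` array may contain blocks of type `tool_use` with a `name` field. That layout, the module name `ToolCallTracking`, and the method name `extract_tool_calls` are all assumptions made for illustration.

```ruby
require "json"

module EvalHelpers
  # Hypothetical helper; the transcript layout is an assumption.
  module ToolCallTracking
    # Returns one { tool: name } hash per tool_use block, in transcript order.
    def extract_tool_calls(transcript_jsonl)
      transcript_jsonl.each_line.flat_map do |line|
        next [] if line.strip.empty?

        message = JSON.parse(line)["message"]
        content = message.is_a?(Hash) ? message["content"] : nil
        next [] unless content.is_a?(Array)

        content
          .select { |block| block.is_a?(Hash) && block["type"] == "tool_use" }
          .map { |block| { tool: block["name"] } }
      end
    end
  end
end
```

Each entry is a bare `{ tool: ... }` hash so that an expectation such as `include(tool: "memory.conventions")` can match an array element directly.
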
## Summary

**Week 2 achieved**:
- ✅ Extracted 4 helper modules from repeated patterns
- ✅ Refactored all 3 evals to use helpers
- ✅ Maintained 100% test pass rate (15/15)
- ✅ Improved code clarity and maintainability
- ✅ Made adding new evals easier

**Lines of code**:
- Added: 145 lines (helpers)
- Removed: ~22 lines (duplication)
- Net: +123 lines, but much clearer intent

**Velocity impact**:
- Adding 4th eval: ~30 minutes (vs 1 hour in Week 1)
- Changing fixture setup: 1 place (vs 3 places)

**Quality improvement**:
- Declarative > Imperative
- DRY > Repetitive
- Clear > Verbose

**Ready for**: Week 3 (add more scenarios or advanced features)