engrm 0.1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (82)
  1. package/.mcp.json +9 -0
  2. package/AUTH-DESIGN.md +436 -0
  3. package/BRIEF.md +197 -0
  4. package/CLAUDE.md +44 -0
  5. package/COMPETITIVE.md +174 -0
  6. package/CONTEXT-OPTIMIZATION.md +305 -0
  7. package/INFRASTRUCTURE.md +252 -0
  8. package/LICENSE +105 -0
  9. package/MARKET.md +230 -0
  10. package/PLAN.md +278 -0
  11. package/README.md +121 -0
  12. package/SENTINEL.md +293 -0
  13. package/SERVER-API-PLAN.md +553 -0
  14. package/SPEC.md +843 -0
  15. package/SWOT.md +148 -0
  16. package/SYNC-ARCHITECTURE.md +294 -0
  17. package/VIBE-CODER-STRATEGY.md +250 -0
  18. package/bun.lock +375 -0
  19. package/hooks/post-tool-use.ts +144 -0
  20. package/hooks/session-start.ts +64 -0
  21. package/hooks/stop.ts +131 -0
  22. package/mem-page.html +1305 -0
  23. package/package.json +30 -0
  24. package/src/capture/dedup.test.ts +103 -0
  25. package/src/capture/dedup.ts +76 -0
  26. package/src/capture/extractor.test.ts +245 -0
  27. package/src/capture/extractor.ts +330 -0
  28. package/src/capture/quality.test.ts +168 -0
  29. package/src/capture/quality.ts +104 -0
  30. package/src/capture/retrospective.test.ts +115 -0
  31. package/src/capture/retrospective.ts +121 -0
  32. package/src/capture/scanner.test.ts +131 -0
  33. package/src/capture/scanner.ts +100 -0
  34. package/src/capture/scrubber.test.ts +144 -0
  35. package/src/capture/scrubber.ts +181 -0
  36. package/src/cli.ts +517 -0
  37. package/src/config.ts +238 -0
  38. package/src/context/inject.test.ts +940 -0
  39. package/src/context/inject.ts +382 -0
  40. package/src/embeddings/backfill.ts +50 -0
  41. package/src/embeddings/embedder.test.ts +76 -0
  42. package/src/embeddings/embedder.ts +139 -0
  43. package/src/lifecycle/aging.test.ts +103 -0
  44. package/src/lifecycle/aging.ts +36 -0
  45. package/src/lifecycle/compaction.test.ts +264 -0
  46. package/src/lifecycle/compaction.ts +190 -0
  47. package/src/lifecycle/purge.test.ts +100 -0
  48. package/src/lifecycle/purge.ts +37 -0
  49. package/src/lifecycle/scheduler.test.ts +120 -0
  50. package/src/lifecycle/scheduler.ts +101 -0
  51. package/src/provisioning/browser-auth.ts +172 -0
  52. package/src/provisioning/provision.test.ts +198 -0
  53. package/src/provisioning/provision.ts +94 -0
  54. package/src/register.test.ts +167 -0
  55. package/src/register.ts +178 -0
  56. package/src/server.ts +436 -0
  57. package/src/storage/migrations.test.ts +244 -0
  58. package/src/storage/migrations.ts +261 -0
  59. package/src/storage/outbox.test.ts +229 -0
  60. package/src/storage/outbox.ts +131 -0
  61. package/src/storage/projects.test.ts +137 -0
  62. package/src/storage/projects.ts +184 -0
  63. package/src/storage/sqlite.test.ts +798 -0
  64. package/src/storage/sqlite.ts +934 -0
  65. package/src/storage/vec.test.ts +198 -0
  66. package/src/sync/auth.test.ts +76 -0
  67. package/src/sync/auth.ts +68 -0
  68. package/src/sync/client.ts +183 -0
  69. package/src/sync/engine.test.ts +94 -0
  70. package/src/sync/engine.ts +127 -0
  71. package/src/sync/pull.test.ts +279 -0
  72. package/src/sync/pull.ts +170 -0
  73. package/src/sync/push.test.ts +117 -0
  74. package/src/sync/push.ts +230 -0
  75. package/src/tools/get.ts +34 -0
  76. package/src/tools/pin.ts +47 -0
  77. package/src/tools/save.test.ts +301 -0
  78. package/src/tools/save.ts +231 -0
  79. package/src/tools/search.test.ts +69 -0
  80. package/src/tools/search.ts +181 -0
  81. package/src/tools/timeline.ts +64 -0
  82. package/tsconfig.json +22 -0
package/COMPETITIVE.md ADDED

# Competitive Analysis — Why Engrm Wins

## Market Landscape

The AI agent memory space is early-stage and fragmented. No player has captured the "shared memory for coding agents" category. Here's where everyone stands:

---

## Head-to-Head Comparison

### claude-mem (thedotmack)
**What it is**: Local-only memory plugin for Claude Code. SQLite + ChromaDB. AGPL-3.0.

| Aspect | claude-mem | Engrm |
|---|---|---|
| Storage | Local SQLite + local ChromaDB | Local SQLite + Candengo Vector (remote) |
| Cross-device | No | Yes (offline-first sync) |
| Team support | No | Yes (shared namespaces) |
| Multi-agent | Claude Code only | Claude Code + OpenClaw + MCP agents |
| Search quality | ChromaDB default embeddings | BGE-M3 hybrid + cross-encoder reranking |
| Workpacks | No | Yes |
| Secret scrubbing | No | Yes |
| Self-hosted backend | N/A (local only) | Yes |
| License | AGPL-3.0 (viral copyleft) | FSL-1.1-ALv2 (Fair Source, converts to Apache 2.0 after 2yr) |
| Offline support | Always local | Offline-first with sync |

**Why we win**: claude-mem is a good single-device tool but architecturally can't do cross-device or team without a major rewrite. Engrm is built from scratch around cross-device sync — no fork, no shared code, no AGPL constraints.

---

### mem0
**What it is**: Cloud memory layer for AI agents. VC-funded. Proprietary SaaS.

| Aspect | mem0 | Engrm |
|---|---|---|
| Hosting | SaaS only (their cloud) | Self-hosted or our cloud |
| Privacy | Code context sent to mem0's servers | Your infrastructure, your data |
| Cross-device | Yes (cloud) | Yes (self-hosted or cloud) |
| Team support | Limited | Built-in from day one |
| Multi-agent | Multiple via API | Multiple via MCP standard |
| Offline support | No (requires internet) | Yes (offline-first) |
| Workpacks | No | Yes |
| Pricing | Per-API-call | Subscription (predictable) |
| Lock-in | High (proprietary API, their storage) | Low (MCP standard, self-hosted option) |

**Why we win**: Privacy and control. Many developers and enterprises won't send code context to a third-party cloud. Self-hosted Engrm with offline-first sync is the answer they're looking for. mem0's SaaS-only model is their weakness.

---

### Cognee
**What it is**: Knowledge graph memory for AI agents. Open source. Focus on semantic relationships.

| Aspect | Cognee | Engrm |
|---|---|---|
| Architecture | Knowledge graphs (Neo4j / networkx) | Vector search (Qdrant + BGE-M3) |
| Strength | Relationship reasoning | Fast semantic retrieval |
| Cross-device | No | Yes |
| Team support | No | Yes |
| Setup complexity | High (graph DB, entity extraction) | Low (SQLite local, REST API remote) |
| Agent support | OpenClaw focus | Claude Code + OpenClaw + MCP |
| Developer focus | General AI memory | Purpose-built for coding agents |

**Why we win**: Cognee is solving a different problem (knowledge graphs for reasoning). We're solving the practical problem developers have right now: "I need my AI to remember what I did yesterday, on any machine."

---

### OpenClaw Built-in Memory
**What it is**: Local Markdown files + SQLite with FTS5 and sqlite-vec. Per-agent, per-device.

| Aspect | OpenClaw Memory | Engrm |
|---|---|---|
| Storage | Local Markdown + SQLite | Local SQLite + remote vector |
| Cross-device | No | Yes |
| Cross-agent | No (OpenClaw only) | Yes (Claude Code + OpenClaw) |
| Team support | No | Yes |
| Search | 70/30 vector/BM25 hybrid | BGE-M3 hybrid + reranking |
| Workpacks | No | Yes |
| Memory plugin slot | Replaceable (`plugins.slots.memory`) | Can slot in as replacement |

**Why we win**: OpenClaw's memory is good for single-device, single-agent use. We extend it to cross-device, cross-agent, and team scenarios while plugging into their existing memory slot architecture.

---

### Cursor / Windsurf / Continue Memory
**What it is**: Proprietary, IDE-integrated context. Varies by product.

| Aspect | IDE-Native Memory | Engrm |
|---|---|---|
| Portability | Locked to one IDE | Works across agents/IDEs |
| Cross-device | Via IDE account sync (limited) | Full observation sync |
| Control | Proprietary, opaque | Open plugin, transparent |
| Team | Some (via IDE features) | Built-in, flexible |

**Why we win**: These are walled gardens. Switch IDE, lose memory. We're the portable layer that follows you regardless of which agent or IDE you use.

---

## Defensible Moats

### 1. Full Stack Ownership
We control the plugin, the backend, and the workpack ecosystem, with no exposure to third-party pricing or availability changes. This is rare — most memory solutions depend on external vector DBs.

### 2. Workpack Ecosystem
No competitor has anything like workpacks. This is a compounding advantage:
- More users → more observations → better auto-generated workpacks
- Better workpacks → more value → more users
- Premium workpacks = recurring revenue with high margins
- Community workpacks = distribution and engagement

### 3. Network Effects (Team Memory)
Each new team member makes the memory more valuable for everyone. This creates organic retention and word-of-mouth growth. Individual tools don't have this dynamic.

### 4. Self-Hosted Trust
In an era of increasing AI privacy concerns, being self-hostable is a strategic differentiator. Enterprise customers with strict compliance requirements often can only use self-hosted solutions. This market is underserved.

### 5. Cross-Agent Portability
MCP is becoming the standard protocol for AI agent tools. By supporting MCP natively, we're positioned for every future agent that adopts the protocol. Competitors locked to one agent lose users when developers switch tools.

### 6. Offline-First Architecture
Engrm is the only solution in this space that works without internet. This sounds like a niche feature, but it's critical for:
- Developers commuting (trains, flights)
- Corporate environments with restricted internet
- Regions with unreliable connectivity
- Privacy-first users who want to control when data syncs

### 7. Free Tier with Cloud Sync
Every other Claude Code memory plugin is local-only. We're the first to offer free cloud sync (10K observations, 2 devices). This makes cross-device memory the default experience, not a paid upgrade. The free tier is generous enough to be genuinely useful — adoption first, monetisation via natural upgrade when users hit limits or need team features.

### 8. FSL License + Proprietary Sentinel
Source-available under FSL-1.1-ALv2 (Fair Source). Developers can read, modify, and run the core freely. Competitors cannot fork and offer a competing hosted service. Each version converts to Apache 2.0 after 2 years — a trust signal no competitor matches. Premium features (Sentinel real-time AI audit) are in a separate private repo, delivered to paying customers only. This is the GitLab CE/EE pattern: open core attracts adoption, proprietary premium drives revenue.

---

## Market Timing Advantage

### Why Now

1. **Claude Code** is Anthropic's fastest-growing developer tool. Memory is the #1 community request.
2. **OpenClaw** hit 100k+ GitHub stars in weeks. Its community is actively building and seeking memory solutions.
3. **MCP adoption** is accelerating. Every major AI tool is adding MCP support.
4. **Developer AI spend** is normalising. Companies are budgeting for AI tooling. Memory is a natural add-on.
5. **No incumbent** has captured the cross-device shared memory category. The window is open.

### First Mover Advantage Opportunity
The agent memory market will consolidate. The first solution that nails cross-device + team + workpacks becomes the default. Switching costs in memory systems are high (you can't easily migrate years of observations). Early adopters become long-term customers.

---

## Go-to-Market Strategy

### Phase 1: Developer Adoption (Months 1-3)
- Release the plugin source-available (FSL-1.1-ALv2, consistent with the licensing strategy above) → GitHub stars, trust, community
- Launch on Claude Code plugin marketplace + OpenClaw skill directory
- Blog posts, demo videos, Twitter/X developer threads
- Free tier with self-hosted Candengo Vector

### Phase 2: Team Conversion (Months 3-6)
- Target teams already using Claude Code or OpenClaw
- Team onboarding flow: one config snippet, instant shared memory
- Case studies from our own team (dogfooding with Alchemy development)

### Phase 3: Enterprise + Workpacks (Months 6-12)
- Enterprise self-hosted offering with support SLA
- Premium workpack marketplace launch
- Partner with framework communities for co-branded workpacks
- Conference talks and workshops

### Distribution Channels
1. **GitHub** — source-available plugin repo, README, examples
2. **Plugin marketplaces** — Claude Code, OpenClaw skill registry
3. **Candengo website** — product page, pricing, docs
4. **Developer communities** — Reddit, HackerNews, Discord servers
5. **Content marketing** — blog posts, tutorials, comparison guides
6. **Word of mouth** — team features drive organic growth
package/CONTEXT-OPTIMIZATION.md ADDED

# Context & Token Optimization Research

## Date: 2026-03-10

## Current State

- 7 observations, 124KB DB, 178 tests passing
- FTS5 search: 0.01-0.10ms (well under the 50ms target)
- `search()` tool: 11-15ms
- `session_context()`: 11ms, ~653 tokens for 7 observations
- Context injection: top 10 by quality, 150-char narrative truncation

## Open Source Landscape

### Key Projects Studied

| Project | Key Technique | Relevance |
|---------|--------------|-----------|
| [Mem0](https://github.com/mem0ai/mem0) | Memory compression, graph memory, 90% token reduction | High |
| [OpenMemory](https://github.com/CaviraOSS/OpenMemory) | Hierarchical Memory Decomposition, temporal validity, multi-sector embeddings | High |
| [mcp-memory-service](https://github.com/doobidoo/mcp-memory-service) | Dream-inspired consolidation, knowledge graph, decay scoring | Medium |
| [token-optimizer-mcp](https://github.com/ooples/token-optimizer-mcp) | 95%+ token reduction via caching/compression | Medium |
| [mcp-memory-keeper](https://github.com/mkreyman/mcp-memory-keeper) | Token budgets with safety buffers | Medium |
| [Speakeasy Dynamic Toolsets](https://www.speakeasy.com/blog/how-we-reduced-token-usage-by-100x-dynamic-toolsets-v2) | 96% input token reduction via meta-tools and progressive disclosure | High |
| [MCP Protocol #1576](https://github.com/modelcontextprotocol/modelcontextprotocol/issues/1576) | Schema redundancy reduction | Low |

### Key Research

- [Mem0 Paper (arXiv 2504.19413)](https://arxiv.org/abs/2504.19413): 26% accuracy improvement, 91% lower latency, 90% token savings
- [Scott Spence: Optimising MCP Context](https://scottspence.com/posts/optimising-mcp-server-context-usage-in-claude-code): 66K tokens consumed before the conversation even starts when many MCP servers are installed

## Identified Improvements

### Tier 1: Quick Wins (XS-S effort)

#### 1. Token Budget (not count limit)
**Source**: mcp-memory-keeper
**Current**: `session_context` caps at 10 observations regardless of size
**Proposed**: Cap at ~800 tokens, fill greedily by relevance score
**Impact**: Prevents context blowup at scale

#### 2. Facts-First Context
**Source**: Mem0 compression engine
**Current**: Context shows narrative snippets (prose, verbose)
**Proposed**: Show the `facts[]` array (bullet points) instead of the narrative — denser information per token
**Impact**: ~50% more information in the same token budget

#### 3. Tiered Context Injection
**Source**: Speakeasy progressive disclosure pattern
**Current**: All observations get title + 150-char narrative equally
**Proposed**:
- Top 3 by relevance: title + facts/narrative snippet
- Remaining: title-only, single line
- Footer: "N more observations available via `search`"
**Impact**: ~40% token reduction on context injection

#### 4. Recency × Quality Blended Scoring
**Source**: General best practice, Mem0
**Current**: `ORDER BY quality DESC, created_at_epoch DESC` — quality dominates
**Proposed**: `score = quality * 0.6 + recency_normalized * 0.4`
**Impact**: Better relevance = less re-searching by the agent

#### 5. Terse Tool Descriptions
**Source**: MCP Protocol Issue #1576
**Current**: Verbose `.describe()` strings on each tool parameter
**Proposed**: Trim to minimal descriptions, remove obvious ones
**Impact**: ~500 tokens saved per conversation

#### 6. Double-Injection Guard
**Current**: The session-start hook AND the `session_context` MCP tool can both fire
**Proposed**: Track injection per session, skip if already done
**Impact**: Eliminates full duplication of context when both paths fire

### Tier 2: Medium Effort (M)

#### 7. Knowledge Supersession
**Source**: OpenMemory temporal graph (`valid_from` / `valid_to`)
**Current**: Old observations can contradict current reality
**Proposed**: A `superseded_by` field, auto-detected when a new observation about the same topic is saved
**Impact**: Prevents stale/contradictory context

#### 8. Concept-Based Linking
**Source**: Mem0 graph memory, OpenMemory
**Current**: `concepts` are stored but unused for retrieval
**Proposed**: Auto-link observations sharing concepts/files, cluster in context injection
**Impact**: Better retrieval, fewer redundant observations shown

#### 9. CWD/File-Aware Boosting
**Current**: No awareness of what files the developer is working on
**Proposed**: Boost observations whose `files_modified` overlap with the current `git diff --name-only` or directory listing
**Impact**: More relevant context for current work

### Tier 3: Larger Effort (L)

#### 10. Consolidation Scheduling
**Source**: mcp-memory-service dream-inspired consolidation
**Current**: Lifecycle aging planned but not implemented
**Proposed**: Nightly merge of related low-quality observations, decay scoring
**Impact**: Fewer, better observations over time

## What NOT to Copy

- Knowledge graphs with D3.js dashboards — over-engineered for our use case
- Multi-sector embeddings (5 memory types) — too complex for now
- Cloud sync via Cloudflare — we have Candengo Vector
- REST API alongside MCP — MCP is our interface
- Code mode workarounds — we're optimising within MCP

## Performance Baseline (2026-03-10)

```
Database: 7 observations, 124KB
FTS5 search: 0.01-0.10ms
search() tool: 11-15ms
session_context(): 11ms, ~653 tokens
Context injection: ~2,612 chars for 7 obs
Unit tests: 178 passing (600ms)
```

---

## Implementation Plan

### Decision: What Makes the Cut

| # | Improvement | Verdict | Sprint | Rationale |
|---|-------------|---------|--------|-----------|
| 1 | Token budget | **YES** | 1 | Prevents blowup, foundational for tiered context |
| 2 | Facts-first context | **YES** | 1 | Tightly coupled with token budget changes |
| 3 | Tiered context | **YES** | 1 | Tightly coupled with token budget changes |
| 4 | Blended scoring | **YES** | 2 | Independent, but touches the same file as 1-3, so do it after |
| 5 | Terse descriptions | **YES** | 1 | Isolated, zero-risk, immediate savings |
| 6 | Double-injection guard | **YES** | 1 | Simple module-level flag, prevents waste |
| 7 | Knowledge supersession | **YES** | 2 | Requires migration, medium effort, high value |
| 8 | Concept linking | **DEFER** | 3+ | Needs a join table, unclear value until more data |
| 9 | CWD/file-aware boosting | **DEFER** | 3+ | Adds latency to the hook, complexity for unclear gain |
| 10 | Consolidation scheduling | **DEFER** | 3+ | Needs a background process, premature at 7 obs |

### Sprint 1: Core Context Optimizations

Three parallel workstreams, all low-risk:

#### Step A: Token Budget + Facts-First + Tiered Context (Items 1+2+3)

**File**: `src/context/inject.ts`

These three are inseparable — switching to a token budget requires deciding what fills it (facts-first) and how to tier it.

**Changes**:
1. Add `estimateTokens(text: string): number` — a simple `Math.ceil(text.length / 4)` heuristic (standard for English text; the 10% safety margin comes from spending only 720 of the 800-token budget)
2. Change `buildSessionContext` signature: replace `maxObservations: number = 10` with an options object `{ tokenBudget?: number, maxCount?: number }` for backwards compat
3. Replace the count-based LIMIT with a larger fetch (LIMIT 50), apply token-budget filling in TypeScript
4. Add `facts: string | null` to the `ContextObservation` interface, update `toContextObservation`
5. Rewrite `formatContextForInjection` for tiered output:
   - **Top tier (first 3)**: title + parsed facts as bullet points (fallback: 100-char narrative)
   - **Lower tier (rest)**: title-only, one line
   - **Footer**: "N more observations available via search"
6. Add `totalActive: number` to `InjectedContext` for the footer count

**File**: `hooks/session-start.ts`
- Update the `buildSessionContext` call to use the new options: `buildSessionContext(db, event.cwd, { tokenBudget: 800 })`

**Token math**: With an 800 budget → ~3 detailed entries (20-40 tokens each) + 7-10 title-only entries (10-15 each) + 30 tokens of header/footer
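Taken together, items 1+2+3 can be sketched as below. This is a minimal illustration under the assumptions above — the `ContextObservation` shape, field names, and helpers here are illustrative, not the actual interfaces in `src/context/inject.ts`:

```typescript
// Illustrative shape — the real interface stores facts as a JSON string.
interface ContextObservation {
  title: string;
  facts: string[] | null; // parsed facts[] array, if present
  narrative: string;
}

// ~4 characters per token is the standard heuristic for English text.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Greedy fill in relevance order, spending only 90% of the budget
// (720 of 800) as the safety margin against estimation error.
function fillBudget(
  candidates: ContextObservation[],
  tokenBudget = 800,
): ContextObservation[] {
  const effective = Math.floor(tokenBudget * 0.9);
  const picked: ContextObservation[] = [];
  let spent = 0;
  for (const obs of candidates) {
    const cost = estimateTokens(
      obs.title + "\n" + (obs.facts?.join("\n") ?? obs.narrative),
    );
    if (spent + cost > effective) break;
    picked.push(obs);
    spent += cost;
  }
  return picked;
}

// Tiered output: top 3 get title + facts (or a narrative snippet),
// the rest are title-only, with a footer pointing at `search`.
function formatTiered(picked: ContextObservation[], totalActive: number): string {
  const lines = picked.map((obs, i) =>
    i < 3
      ? `## ${obs.title}\n` +
        (obs.facts?.map((f) => `- ${f}`).join("\n") ??
          `- ${obs.narrative.slice(0, 100)}`)
      : `- ${obs.title}`,
  );
  const remaining = totalActive - picked.length;
  if (remaining > 0) {
    lines.push(`${remaining} more observations available via \`search\``);
  }
  return lines.join("\n");
}
```

Pinned-observation handling and the malformed-JSON fallback from the test plan would layer on top of this skeleton.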

#### Step B: Terse Tool Descriptions (Item 5)

**File**: `src/server.ts`

Trim tool and parameter descriptions:

| Tool | Before | After |
|------|--------|-------|
| `save_observation` | "Save a coding observation (discovery, bugfix, decision, pattern, etc.) to memory" | "Save an observation to memory" |
| `search` | "Search memory for relevant observations, discoveries, and decisions" | "Search memory" |
| `get_observations` | "Fetch full details for specific observation IDs" | "Get observations by ID" |
| `timeline` | "Get chronological context around a specific observation" | "Timeline around an observation" |
| `pin_observation` | "Pin or unpin an observation to prevent lifecycle aging" | "Pin/unpin observation" |
| `session_context` | "Get relevant project memory for the current session. Call at the start of a session to load prior context." | "Load project memory for this session" |

Parameter descriptions: trim verbose ones, remove `.describe()` where the enum/type is self-documenting.

Estimated savings: ~100-150 tokens per conversation.

#### Step C: Double-Injection Guard (Item 6)

**File**: `src/server.ts`

Since hooks and the MCP server are separate processes (they can't share state directly), use a simple module-level flag:

```typescript
let contextServed = false;
```

In the `session_context` tool handler:
- First call: serve full context, set `contextServed = true`
- Subsequent calls: return "Context already loaded. Use search for specific queries."

The flag resets on process restart, which maps to session boundaries in stdio transport.

**Note**: Hook-to-MCP dedup (via a DB flag) is deferred to Sprint 2 as it needs a migration.
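End to end, the guard amounts to the following sketch. The handler wiring is illustrative — the real registration goes through the MCP server's tool API, not a bare function:

```typescript
// Module-level flag: lives for the duration of the server process,
// which under stdio transport maps to a single session.
let contextServed = false;

// Illustrative handler shape for the session_context tool.
function handleSessionContext(buildFullContext: () => string): string {
  if (contextServed) {
    // Subsequent calls get the abbreviated message instead of re-injecting.
    return "Context already loaded. Use search for specific queries.";
  }
  contextServed = true;
  return buildFullContext();
}
```

Because the flag is per-process, a restarted server (new session) serves full context again with no persistence needed.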

### Sprint 2: Scoring + Supersession

#### Step D: Recency × Quality Blended Scoring (Item 4)

**File**: `src/context/inject.ts`

Add a scoring function:

```
computeBlendedScore(quality, createdAtEpoch, nowEpoch):
  recencyNorm = max(0, 1 - (now - created) / (30 * 86400))   // 30-day linear decay
  return quality * 0.6 + recencyNorm * 0.4
```

Fetch 50 candidates from the DB (sorted by quality DESC), then re-sort in TypeScript by blended score before applying the token budget.

Result: a 2-day-old q=0.5 observation beats a 25-day-old q=0.7 — the correct behavior.
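As a typed helper, the pseudocode above becomes the following (weights and the 30-day window taken directly from the plan; epochs assumed to be in seconds):

```typescript
// 30-day linear decay window, in seconds.
const DECAY_WINDOW = 30 * 86400;

// quality in [0, 1]; recency decays linearly to 0 over 30 days,
// so the blended score also stays in [0, 1].
function computeBlendedScore(
  quality: number,
  createdAtEpoch: number,
  nowEpoch: number,
): number {
  const recencyNorm = Math.max(0, 1 - (nowEpoch - createdAtEpoch) / DECAY_WINDOW);
  return quality * 0.6 + recencyNorm * 0.4;
}
```

Beyond 30 days the recency term clamps to zero, so very old observations compete on quality alone.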

#### Step E: Knowledge Supersession (Item 7)

**Files**: `src/storage/migrations.ts`, `src/storage/sqlite.ts`, `src/context/inject.ts`, `src/server.ts`

1. Migration v2: `ALTER TABLE observations ADD COLUMN superseded_by INTEGER REFERENCES observations(id) ON DELETE SET NULL`
2. `MemDatabase.supersedeObservation(oldId, newId)` — sets `superseded_by` and archives the old observation
3. Filter `AND superseded_by IS NULL` in context and search queries
4. Add a `supersedes` param to the `save_observation` tool
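The supersede-and-filter logic sketched in plain TypeScript — a deliberate in-memory stand-in for illustration only; the real implementation is the SQL from steps 1-3, run against SQLite:

```typescript
// Illustrative row shape mirroring the v2 schema.
interface ObsRow {
  id: number;
  supersededBy: number | null; // superseded_by column
  archived: boolean;
}

// Mirrors MemDatabase.supersedeObservation(oldId, newId).
function supersedeObservation(rows: ObsRow[], oldId: number, newId: number): void {
  const row = rows.find((r) => r.id === oldId);
  if (!row) throw new Error(`observation ${oldId} not found`);
  row.supersededBy = newId;
  row.archived = true; // per the plan, superseding also archives the old row
}

// Equivalent of `AND superseded_by IS NULL` in context/search queries.
function activeRows(rows: ObsRow[]): ObsRow[] {
  return rows.filter((r) => r.supersededBy === null);
}
```

The `ON DELETE SET NULL` clause in the migration covers the case the filter alone can't: if the superseding observation is later deleted, the old row's pointer clears rather than dangling.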

---

## Test Plan

### New Tests for Sprint 1 (inject.test.ts)

```
describe("estimateTokens", () => {
  test("empty string returns 0")
  test("short string estimates correctly")
  test("long narrative estimates within expected range")
})

describe("token budget", () => {
  test("respects token budget — stops adding when budget exhausted")
  test("always includes pinned observations even if they exceed budget")
  test("800 token budget fits more entries than old 10-count limit with narratives")
  test("single huge observation doesn't crash — graceful overflow")
  test("default budget is 800 when no options provided")
})

describe("facts-first formatting", () => {
  test("top-tier observations show facts as bullet points")
  test("falls back to narrative snippet when no facts available")
  test("lower-tier observations show title only")
  test("footer shows correct remaining count")
  test("handles malformed facts JSON gracefully")
})

describe("tiered context", () => {
  test("first 3 observations get detailed format")
  test("observations 4+ get title-only format")
  test("footer says 'N more available via search'")
  test("with ≤3 observations, all get detailed format, no footer")
})
```

### New Tests for Sprint 1 (server.test.ts or inline)

```
describe("double-injection guard", () => {
  test("first session_context call returns full context")
  test("second session_context call returns abbreviated message")
})
```

### New Tests for Sprint 2 (inject.test.ts)

```
describe("blended scoring", () => {
  test("recent medium-quality beats old high-quality")
  test("very old observations get near-zero recency boost")
  test("same-age observations sort by quality")
  test("blended score is always between 0 and 1")
})
```

### New Tests for Sprint 2 (sqlite.test.ts, inject.test.ts)

```
describe("supersession", () => {
  test("superseded observation excluded from session context")
  test("superseded observation deprioritized in search results")
  test("save with supersedes param archives old observation")
  test("superseded_by set to NULL on referenced observation delete")
  test("migration v2 adds superseded_by column correctly")
})
```

## Risks & Mitigations

| Risk | Impact | Mitigation |
|------|--------|------------|
| Token estimation inaccuracy (code averages fewer characters per token than English prose) | Over-filled budget | 10% safety margin (720 of 800 effective budget) |
| `buildSessionContext` signature change breaks the hook | Hook crash | Options object with defaults for backwards compat |
| Module-level guard persists across sessions (if the process is reused) | Stale guard | Stdio transport = 1 process per session, safe |
| Facts stored as JSON may be malformed | Formatting crash | Try/catch around JSON.parse, fall back to the raw string |
| LIMIT 50 fetch overhead vs LIMIT 10 | Marginal latency | SQLite with indexes handles 50 rows in <1ms |