engrm 0.1.0
- package/.mcp.json +9 -0
- package/AUTH-DESIGN.md +436 -0
- package/BRIEF.md +197 -0
- package/CLAUDE.md +44 -0
- package/COMPETITIVE.md +174 -0
- package/CONTEXT-OPTIMIZATION.md +305 -0
- package/INFRASTRUCTURE.md +252 -0
- package/LICENSE +105 -0
- package/MARKET.md +230 -0
- package/PLAN.md +278 -0
- package/README.md +121 -0
- package/SENTINEL.md +293 -0
- package/SERVER-API-PLAN.md +553 -0
- package/SPEC.md +843 -0
- package/SWOT.md +148 -0
- package/SYNC-ARCHITECTURE.md +294 -0
- package/VIBE-CODER-STRATEGY.md +250 -0
- package/bun.lock +375 -0
- package/hooks/post-tool-use.ts +144 -0
- package/hooks/session-start.ts +64 -0
- package/hooks/stop.ts +131 -0
- package/mem-page.html +1305 -0
- package/package.json +30 -0
- package/src/capture/dedup.test.ts +103 -0
- package/src/capture/dedup.ts +76 -0
- package/src/capture/extractor.test.ts +245 -0
- package/src/capture/extractor.ts +330 -0
- package/src/capture/quality.test.ts +168 -0
- package/src/capture/quality.ts +104 -0
- package/src/capture/retrospective.test.ts +115 -0
- package/src/capture/retrospective.ts +121 -0
- package/src/capture/scanner.test.ts +131 -0
- package/src/capture/scanner.ts +100 -0
- package/src/capture/scrubber.test.ts +144 -0
- package/src/capture/scrubber.ts +181 -0
- package/src/cli.ts +517 -0
- package/src/config.ts +238 -0
- package/src/context/inject.test.ts +940 -0
- package/src/context/inject.ts +382 -0
- package/src/embeddings/backfill.ts +50 -0
- package/src/embeddings/embedder.test.ts +76 -0
- package/src/embeddings/embedder.ts +139 -0
- package/src/lifecycle/aging.test.ts +103 -0
- package/src/lifecycle/aging.ts +36 -0
- package/src/lifecycle/compaction.test.ts +264 -0
- package/src/lifecycle/compaction.ts +190 -0
- package/src/lifecycle/purge.test.ts +100 -0
- package/src/lifecycle/purge.ts +37 -0
- package/src/lifecycle/scheduler.test.ts +120 -0
- package/src/lifecycle/scheduler.ts +101 -0
- package/src/provisioning/browser-auth.ts +172 -0
- package/src/provisioning/provision.test.ts +198 -0
- package/src/provisioning/provision.ts +94 -0
- package/src/register.test.ts +167 -0
- package/src/register.ts +178 -0
- package/src/server.ts +436 -0
- package/src/storage/migrations.test.ts +244 -0
- package/src/storage/migrations.ts +261 -0
- package/src/storage/outbox.test.ts +229 -0
- package/src/storage/outbox.ts +131 -0
- package/src/storage/projects.test.ts +137 -0
- package/src/storage/projects.ts +184 -0
- package/src/storage/sqlite.test.ts +798 -0
- package/src/storage/sqlite.ts +934 -0
- package/src/storage/vec.test.ts +198 -0
- package/src/sync/auth.test.ts +76 -0
- package/src/sync/auth.ts +68 -0
- package/src/sync/client.ts +183 -0
- package/src/sync/engine.test.ts +94 -0
- package/src/sync/engine.ts +127 -0
- package/src/sync/pull.test.ts +279 -0
- package/src/sync/pull.ts +170 -0
- package/src/sync/push.test.ts +117 -0
- package/src/sync/push.ts +230 -0
- package/src/tools/get.ts +34 -0
- package/src/tools/pin.ts +47 -0
- package/src/tools/save.test.ts +301 -0
- package/src/tools/save.ts +231 -0
- package/src/tools/search.test.ts +69 -0
- package/src/tools/search.ts +181 -0
- package/src/tools/timeline.ts +64 -0
- package/tsconfig.json +22 -0
package/COMPETITIVE.md
ADDED

@@ -0,0 +1,174 @@

# Competitive Analysis — Why Engrm Wins

## Market Landscape

The AI agent memory space is early-stage and fragmented. No player has captured the "shared memory for coding agents" category. Here's where everyone stands:

---

## Head-to-Head Comparison

### claude-mem (thedotmack)

**What it is**: Local-only memory plugin for Claude Code. SQLite + ChromaDB. AGPL-3.0.

| Aspect | claude-mem | Engrm |
|---|---|---|
| Storage | Local SQLite + local ChromaDB | Local SQLite + Candengo Vector (remote) |
| Cross-device | No | Yes (offline-first sync) |
| Team support | No | Yes (shared namespaces) |
| Multi-agent | Claude Code only | Claude Code + OpenClaw + MCP agents |
| Search quality | ChromaDB default embeddings | BGE-M3 hybrid + cross-encoder reranking |
| Workpacks | No | Yes |
| Secret scrubbing | No | Yes |
| Self-hosted backend | N/A (local only) | Yes |
| License | AGPL-3.0 (viral copyleft) | FSL-1.1-ALv2 (Fair Source, converts to Apache 2.0 after 2yr) |
| Offline support | Always local | Offline-first with sync |

**Why we win**: claude-mem is a good single-device tool but architecturally can't do cross-device or team without a major rewrite. Engrm is built from scratch around cross-device sync — no fork, no shared code, no AGPL constraints.

---
### mem0

**What it is**: Cloud memory layer for AI agents. VC-funded. Proprietary SaaS.

| Aspect | mem0 | Engrm |
|---|---|---|
| Hosting | SaaS only (their cloud) | Self-hosted or our cloud |
| Privacy | Code context sent to mem0's servers | Your infrastructure, your data |
| Cross-device | Yes (cloud) | Yes (self-hosted or cloud) |
| Team support | Limited | Built-in from day one |
| Multi-agent | Multiple via API | Multiple via MCP standard |
| Offline support | No (requires internet) | Yes (offline-first) |
| Workpacks | No | Yes |
| Pricing | Per-API-call | Subscription (predictable) |
| Lock-in | High (proprietary API, their storage) | Low (MCP standard, self-hosted option) |

**Why we win**: Privacy and control. Many developers and enterprises won't send code context to a third-party cloud. Self-hosted Engrm with offline-first sync is the answer they're looking for. mem0's SaaS-only model is their weakness.

---
### Cognee

**What it is**: Knowledge graph memory for AI agents. Open source. Focus on semantic relationships.

| Aspect | Cognee | Engrm |
|---|---|---|
| Architecture | Knowledge graphs (Neo4j / networkx) | Vector search (Qdrant + BGE-M3) |
| Strength | Relationship reasoning | Fast semantic retrieval |
| Cross-device | No | Yes |
| Team support | No | Yes |
| Setup complexity | High (graph DB, entity extraction) | Low (SQLite local, REST API remote) |
| Agent support | OpenClaw focus | Claude Code + OpenClaw + MCP |
| Developer focus | General AI memory | Purpose-built for coding agents |

**Why we win**: Cognee is solving a different problem (knowledge graphs for reasoning). We're solving the practical problem developers have right now: "I need my AI to remember what I did yesterday, on any machine."

---
### OpenClaw Built-in Memory

**What it is**: Local Markdown files + SQLite with FTS5 and sqlite-vec. Per-agent, per-device.

| Aspect | OpenClaw Memory | Engrm |
|---|---|---|
| Storage | Local Markdown + SQLite | Local SQLite + remote vector |
| Cross-device | No | Yes |
| Cross-agent | No (OpenClaw only) | Yes (Claude Code + OpenClaw) |
| Team support | No | Yes |
| Search | 70/30 vector/BM25 hybrid | BGE-M3 hybrid + reranking |
| Workpacks | No | Yes |
| Memory plugin slot | Replaceable (`plugins.slots.memory`) | Can slot in as replacement |

**Why we win**: OpenClaw's memory is good for single-device, single-agent use. We extend it to cross-device, cross-agent, and team scenarios while plugging into their existing memory slot architecture.

---
### Cursor / Windsurf / Continue Memory

**What it is**: Proprietary, IDE-integrated context. Varies by product.

| Aspect | IDE-Native Memory | Engrm |
|---|---|---|
| Portability | Locked to one IDE | Works across agents/IDEs |
| Cross-device | Via IDE account sync (limited) | Full observation sync |
| Control | Proprietary, opaque | Open plugin, transparent |
| Team | Some (via IDE features) | Built-in, flexible |

**Why we win**: These are walled gardens. Switch IDE, lose memory. We're the portable layer that follows you regardless of which agent or IDE you use.

---
## Defensible Moats

### 1. Full Stack Ownership

We control plugin + backend + workpack ecosystem. No dependency on third-party pricing or availability changes. This is rare — most memory solutions depend on external vector DBs.

### 2. Workpack Ecosystem

No competitor has anything like workpacks. This is a compounding advantage:
- More users → more observations → better auto-generated workpacks
- Better workpacks → more value → more users
- Premium workpacks = recurring revenue with high margins
- Community workpacks = distribution and engagement

### 3. Network Effects (Team Memory)

Each new team member makes the memory more valuable for everyone. This creates organic retention and word-of-mouth growth. Individual tools don't have this dynamic.

### 4. Self-Hosted Trust

In an era of increasing AI privacy concerns, being self-hostable is a strategic differentiator. Enterprise customers with compliance requirements can only use self-hosted solutions. This market is underserved.

### 5. Cross-Agent Portability

MCP is becoming the standard protocol for AI agent tools. By supporting MCP natively, we're positioned for every future agent that adopts the protocol. Competitors locked to one agent lose users when developers switch tools.

### 6. Offline-First Architecture

The only solution that works without internet. This sounds like a niche feature but it's actually critical for:
- Developers commuting (trains, flights)
- Corporate environments with restricted internet
- Regions with unreliable connectivity
- Privacy-first users who want to control when data syncs

### 7. Free Tier with Cloud Sync

Every other Claude Code memory plugin is local-only. We're the first to offer free cloud sync (10K observations, 2 devices). This makes cross-device memory the default experience, not a paid upgrade. The free tier is generous enough to be genuinely useful — adoption first, monetisation via natural upgrade when users hit limits or need team features.

### 8. FSL License + Proprietary Sentinel

Source-available under FSL-1.1-ALv2 (Fair Source). Developers can read, modify, and run the core freely. Competitors cannot fork and offer a competing hosted service. Each version converts to Apache 2.0 after 2 years — a trust signal no competitor matches. Premium features (Sentinel real-time AI audit) are in a separate private repo, delivered to paying customers only. This is the GitLab CE/EE pattern: open core attracts adoption, proprietary premium drives revenue.

---
## Market Timing Advantage

### Why Now

1. **Claude Code** is Anthropic's fastest-growing developer tool. Memory is the #1 community request.
2. **OpenClaw** hit 100k+ GitHub stars in weeks. Its community is actively building and seeking memory solutions.
3. **MCP adoption** is accelerating. Every major AI tool is adding MCP support.
4. **Developer AI spend** is normalising. Companies are budgeting for AI tooling. Memory is a natural add-on.
5. **No incumbent** has captured the cross-device shared memory category. The window is open.

### First Mover Advantage Opportunity

The agent memory market will consolidate. The first solution that nails cross-device + team + workpacks becomes the default. Switching costs in memory systems are high (you can't easily migrate years of observations). Early adopters become long-term customers.

---
## Go-to-Market Strategy

### Phase 1: Developer Adoption (Months 1-3)
- Release the plugin source under FSL-1.1-ALv2 (Fair Source) → GitHub stars, trust, community
- Launch on Claude Code plugin marketplace + OpenClaw skill directory
- Blog posts, demo videos, Twitter/X developer threads
- Free tier with self-hosted Candengo Vector

### Phase 2: Team Conversion (Months 3-6)
- Target teams already using Claude Code or OpenClaw
- Team onboarding flow: one config snippet, instant shared memory
- Case studies from our own team (dogfooding with Alchemy development)

### Phase 3: Enterprise + Workpacks (Months 6-12)
- Enterprise self-hosted offering with support SLA
- Premium workpack marketplace launch
- Partner with framework communities for co-branded workpacks
- Conference talks and workshops

### Distribution Channels
1. **GitHub** — source-available plugin repo, README, examples
2. **Plugin marketplaces** — Claude Code, OpenClaw skill registry
3. **Candengo website** — product page, pricing, docs
4. **Developer communities** — Reddit, HackerNews, Discord servers
5. **Content marketing** — blog posts, tutorials, comparison guides
6. **Word of mouth** — team features drive organic growth

package/CONTEXT-OPTIMIZATION.md
ADDED

@@ -0,0 +1,305 @@

# Context & Token Optimization Research

## Date: 2026-03-10

## Current State

- 7 observations, 124KB DB, 178 tests passing
- FTS5 search: 0.01-0.10ms (well under 50ms target)
- `search()` tool: 11-15ms
- `session_context()`: 11ms, ~653 tokens for 7 observations
- Context injection: top 10 by quality, 150-char narrative truncation

## Open Source Landscape

### Key Projects Studied

| Project | Key Technique | Relevance |
|---------|--------------|-----------|
| [Mem0](https://github.com/mem0ai/mem0) | Memory compression, graph memory, 90% token reduction | High |
| [OpenMemory](https://github.com/CaviraOSS/OpenMemory) | Hierarchical Memory Decomposition, temporal validity, multi-sector embeddings | High |
| [mcp-memory-service](https://github.com/doobidoo/mcp-memory-service) | Dream-inspired consolidation, knowledge graph, decay scoring | Medium |
| [token-optimizer-mcp](https://github.com/ooples/token-optimizer-mcp) | 95%+ token reduction via caching/compression | Medium |
| [mcp-memory-keeper](https://github.com/mkreyman/mcp-memory-keeper) | Token budgets with safety buffers | Medium |
| [Speakeasy Dynamic Toolsets](https://www.speakeasy.com/blog/how-we-reduced-token-usage-by-100x-dynamic-toolsets-v2) | 96% input token reduction via meta-tools and progressive disclosure | High |
| [MCP Protocol #1576](https://github.com/modelcontextprotocol/modelcontextprotocol/issues/1576) | Schema redundancy reduction | Low |

### Key Research

- [Mem0 Paper (arXiv 2504.19413)](https://arxiv.org/abs/2504.19413): 26% accuracy improvement, 91% lower latency, 90% token savings
- [Scott Spence: Optimising MCP Context](https://scottspence.com/posts/optimising-mcp-server-context-usage-in-claude-code): 66K tokens consumed before the conversation starts with many MCPs installed
## Identified Improvements

### Tier 1: Quick Wins (XS-S effort)

#### 1. Token Budget (not count limit)
**Source**: mcp-memory-keeper
**Current**: `session_context` caps at 10 observations regardless of size
**Proposed**: Cap at ~800 tokens, fill greedily by relevance score
**Impact**: Prevents context blowup at scale

#### 2. Facts-First Context
**Source**: Mem0 compression engine
**Current**: Context shows narrative snippets (prose, verbose)
**Proposed**: Show the `facts[]` array (bullet points) instead of narrative — denser information per token
**Impact**: ~50% more information in the same token budget

#### 3. Tiered Context Injection
**Source**: Speakeasy progressive disclosure pattern
**Current**: All observations get title + 150-char narrative equally
**Proposed**:
- Top 3 by relevance: title + facts/narrative snippet
- Remaining: title-only, single line
- Footer: "N more observations available via `search`"
**Impact**: ~40% token reduction on context injection

#### 4. Recency × Quality Blended Scoring
**Source**: General best practice, Mem0
**Current**: `ORDER BY quality DESC, created_at_epoch DESC` — quality dominates
**Proposed**: `score = quality * 0.6 + recency_normalized * 0.4`
**Impact**: Better relevance = less re-searching by the agent

#### 5. Terse Tool Descriptions
**Source**: MCP Protocol Issue #1576
**Current**: Verbose `.describe()` strings on each tool parameter
**Proposed**: Trim to minimal descriptions, remove obvious ones
**Impact**: ~500 tokens saved per conversation

#### 6. Double-Injection Guard
**Current**: The session-start hook AND the `session_context` MCP tool can both fire
**Proposed**: Track injection per session, skip if already done
**Impact**: Eliminates the full duplication that occurs when both paths fire

### Tier 2: Medium Effort (M)

#### 7. Knowledge Supersession
**Source**: OpenMemory temporal graph (`valid_from` / `valid_to`)
**Current**: Old observations can contradict current reality
**Proposed**: `superseded_by` field, auto-detect when a new observation about the same topic is saved
**Impact**: Prevents stale/contradictory context

#### 8. Concept-Based Linking
**Source**: Mem0 graph memory, OpenMemory
**Current**: `concepts` stored but unused for retrieval
**Proposed**: Auto-link observations sharing concepts/files, cluster in context injection
**Impact**: Better retrieval, fewer redundant observations shown

#### 9. CWD/File-Aware Boosting
**Current**: No awareness of what files the developer is working on
**Proposed**: Boost observations whose `files_modified` overlap with the current `git diff --name-only` or directory listing
**Impact**: More relevant context for the current work
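The overlap boost in item 9 could be as small as the following sketch. The function name and shapes are hypothetical illustrations, not the real codebase API; the working set would come from `git diff --name-only`:

```typescript
// Fraction of an observation's modified files that appear in the current
// working set (e.g. the paths reported by `git diff --name-only`).
// Returns a score in [0, 1] that could be added as a weighted term.
function fileOverlapBoost(filesModified: string[], workingSet: Set<string>): number {
  if (filesModified.length === 0) return 0;
  const hits = filesModified.filter((f) => workingSet.has(f)).length;
  return hits / filesModified.length;
}
```

The scoring itself is trivial; the deferral is about the latency cost of computing the working set inside a hook on every session start.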
### Tier 3: Larger Effort (L)

#### 10. Consolidation Scheduling
**Source**: mcp-memory-service dream-inspired consolidation
**Current**: Lifecycle aging planned but not implemented
**Proposed**: Nightly merge of related low-quality observations, decay scoring
**Impact**: Fewer, better observations over time

## What NOT to Copy

- Knowledge graphs with D3.js dashboards — over-engineered for our use case
- Multi-sector embeddings (5 memory types) — too complex for now
- Cloud sync via Cloudflare — we have Candengo Vector
- REST API alongside MCP — MCP is our interface
- Code mode workarounds — we're optimising within MCP
## Performance Baseline (2026-03-10)

```
Database: 7 observations, 124KB
FTS5 search: 0.01-0.10ms
search() tool: 11-15ms
session_context(): 11ms, ~653 tokens
Context injection: ~2,612 chars for 7 obs
Unit tests: 178 passing (600ms)
```

---
## Implementation Plan

### Decision: What Makes the Cut

| # | Improvement | Verdict | Sprint | Rationale |
|---|-------------|---------|--------|-----------|
| 1 | Token budget | **YES** | 1 | Prevents blowup, foundational for tiered context |
| 2 | Facts-first context | **YES** | 1 | Tightly coupled with token budget changes |
| 3 | Tiered context | **YES** | 1 | Tightly coupled with token budget changes |
| 4 | Blended scoring | **YES** | 2 | Independent, but touches the same file as 1-3, so do after |
| 5 | Terse descriptions | **YES** | 1 | Isolated, zero-risk, immediate savings |
| 6 | Double-injection guard | **YES** | 1 | Simple module-level flag, prevents waste |
| 7 | Knowledge supersession | **YES** | 2 | Requires migration, medium effort, high value |
| 8 | Concept linking | **DEFER** | 3+ | Needs join table, unclear value until more data |
| 9 | CWD/file-aware boosting | **DEFER** | 3+ | Adds latency to hook, complexity for unclear gain |
| 10 | Consolidation scheduling | **DEFER** | 3+ | Needs background process, premature at 7 obs |
### Sprint 1: Core Context Optimizations

Three parallel workstreams, all low-risk:

#### Step A: Token Budget + Facts-First + Tiered Context (Items 1+2+3)

**File**: `src/context/inject.ts`

These three are inseparable — switching to a token budget requires deciding what fills it (facts-first) and how to tier it.

**Changes**:
1. Add `estimateTokens(text: string): number` — a simple `Math.ceil(text.length / 4)` heuristic (the standard ~4-chars-per-token estimate for English); a 10% safety margin is built in by filling only 720 of the 800-token budget
2. Change the `buildSessionContext` signature: replace `maxObservations: number = 10` with an options object `{ tokenBudget?: number, maxCount?: number }` for backwards compatibility
3. Replace the count-based LIMIT with a larger fetch (LIMIT 50), then apply token-budget filling in TypeScript
4. Add `facts: string | null` to the `ContextObservation` interface, update `toContextObservation`
5. Rewrite `formatContextForInjection` for tiered output:
   - **Top tier (first 3)**: title + parsed facts as bullet points (fallback: 100-char narrative)
   - **Lower tier (rest)**: title-only, one line
   - **Footer**: "N more observations available via search"
6. Add `totalActive: number` to `InjectedContext` for the footer count

**File**: `hooks/session-start.ts`
- Update the `buildSessionContext` call to use the new options: `buildSessionContext(db, event.cwd, { tokenBudget: 800 })`

**Token math**: With an 800 budget → ~3 detailed entries (20-40 tokens each) + 7-10 title-only entries (10-15 each) + 30 tokens header/footer
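The whole of Step A can be sketched end-to-end as follows. `estimateTokens` matches the heuristic above; `Obs` and `formatTiered` are simplified stand-ins for the real `ContextObservation` and `formatContextForInjection`, not their actual signatures:

```typescript
// ~4 characters per token is the usual estimate for English text.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Simplified stand-in for ContextObservation.
interface Obs {
  title: string;
  facts: string | null; // JSON array of bullet strings; may be malformed
  narrative: string;
  pinned: boolean;
}

// Greedy budget fill with tiered formatting: the first 3 entries are detailed
// (facts bullets, falling back to a 100-char narrative snippet), the rest
// title-only. Fills to 90% of the budget (the 10% safety margin).
function formatTiered(observations: Obs[], tokenBudget = 800): string {
  const effective = Math.floor(tokenBudget * 0.9);
  const lines: string[] = [];
  let used = 0;
  let included = 0;

  for (const obs of observations) {
    let entry: string;
    if (included < 3) {
      let bullets: string[] | null = null;
      try {
        bullets = obs.facts ? JSON.parse(obs.facts) : null;
      } catch {
        bullets = null; // malformed facts JSON: fall back to narrative
      }
      const body = bullets
        ? bullets.map((f) => `  - ${f}`).join("\n")
        : `  ${obs.narrative.slice(0, 100)}`;
      entry = `- ${obs.title}\n${body}`;
    } else {
      entry = `- ${obs.title}`;
    }
    const cost = estimateTokens(entry);
    if (used + cost > effective && !obs.pinned) continue; // pinned always included
    lines.push(entry);
    used += cost;
    included++;
  }

  const remaining = observations.length - included;
  if (remaining > 0) {
    lines.push(`(${remaining} more observations available via search)`);
  }
  return lines.join("\n");
}
```

With the default 800-token budget this produces roughly the shape the token math describes: a few detailed entries, a run of title-only lines, and a footer pointing at `search` for the rest.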
#### Step B: Terse Tool Descriptions (Item 5)

**File**: `src/server.ts`

Trim tool and parameter descriptions:

| Tool | Before | After |
|------|--------|-------|
| `save_observation` | "Save a coding observation (discovery, bugfix, decision, pattern, etc.) to memory" | "Save an observation to memory" |
| `search` | "Search memory for relevant observations, discoveries, and decisions" | "Search memory" |
| `get_observations` | "Fetch full details for specific observation IDs" | "Get observations by ID" |
| `timeline` | "Get chronological context around a specific observation" | "Timeline around an observation" |
| `pin_observation` | "Pin or unpin an observation to prevent lifecycle aging" | "Pin/unpin observation" |
| `session_context` | "Get relevant project memory for the current session. Call at the start of a session to load prior context." | "Load project memory for this session" |

Parameter descriptions: trim verbose ones, remove `.describe()` where the enum/type is self-documenting.

Estimated savings: ~100-150 tokens per conversation.
#### Step C: Double-Injection Guard (Item 6)

**File**: `src/server.ts`

Since the hooks and the MCP server are separate processes (they can't share state directly), use a simple module-level flag:

```typescript
let contextServed = false;
```

In the `session_context` tool handler:
- First call: serve full context, set `contextServed = true`
- Subsequent calls: return "Context already loaded. Use search for specific queries."

The flag resets on process restart, which maps to session boundaries in the stdio transport.

**Note**: Hook-to-MCP dedup (via a DB flag) is deferred to Sprint 2 as it needs a migration.
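A minimal sketch of the guard in handler form. `buildSessionContext` here is a hypothetical stand-in for the real builder in `src/context/inject.ts`, and the handler wiring is simplified:

```typescript
// Module-level flag. Stdio transport spawns one server process per session,
// so the flag naturally resets at each session boundary.
let contextServed = false;

// Hypothetical stand-in for the real context builder.
function buildSessionContext(): string {
  return "## Project memory\n- obs 1\n- obs 2";
}

// session_context tool handler: full context once, abbreviated afterwards.
function handleSessionContext(): string {
  if (contextServed) {
    return "Context already loaded. Use search for specific queries.";
  }
  contextServed = true;
  return buildSessionContext();
}
```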
### Sprint 2: Scoring + Supersession

#### Step D: Recency × Quality Blended Scoring (Item 4)

**File**: `src/context/inject.ts`

Add a scoring function:

```
computeBlendedScore(quality, createdAtEpoch, nowEpoch):
  recencyNorm = max(0, 1 - (now - created) / (30 * 86400))  // 30-day linear decay
  return quality * 0.6 + recencyNorm * 0.4
```

Fetch 50 candidates from the DB (sorted by quality DESC), re-sort in TypeScript by blended score before applying the token budget.

Result: a 2-day-old q=0.5 observation beats a 25-day-old q=0.7 one — the intended behavior.
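The pseudocode above translates directly to TypeScript. This sketch assumes epoch seconds and quality in [0, 1]:

```typescript
const DAY = 86_400; // seconds

// Blend quality with a 30-day linear recency decay, clamped at 0.
function computeBlendedScore(
  quality: number,
  createdAtEpoch: number,
  nowEpoch: number,
): number {
  const recencyNorm = Math.max(0, 1 - (nowEpoch - createdAtEpoch) / (30 * DAY));
  return quality * 0.6 + recencyNorm * 0.4;
}
```

Plugging in the example: a 2-day-old q=0.5 observation scores ≈0.673 and a 25-day-old q=0.7 observation ≈0.487, so the recent one wins. Anything older than 30 days competes on quality alone (score = quality × 0.6).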
#### Step E: Knowledge Supersession (Item 7)

**Files**: `src/storage/migrations.ts`, `src/storage/sqlite.ts`, `src/context/inject.ts`, `src/server.ts`

1. Migration v2: `ALTER TABLE observations ADD COLUMN superseded_by INTEGER REFERENCES observations(id) ON DELETE SET NULL`
2. `MemDatabase.supersedeObservation(oldId, newId)` — sets `superseded_by` and archives the old observation
3. Filter `AND superseded_by IS NULL` in context and search queries
4. Add a `supersedes` param to the `save_observation` tool
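The intended semantics of steps 1-3 can be sketched as follows. The SQL matches step 1 above; `supersedeObservation` and `activeRows` model the behaviour against an in-memory map rather than the real `MemDatabase` API:

```typescript
// Migration v2 (step 1 above): link a replaced observation to its successor.
const MIGRATION_V2 = `
  ALTER TABLE observations
  ADD COLUMN superseded_by INTEGER REFERENCES observations(id) ON DELETE SET NULL
`;

// In-memory model of an observation row for this sketch.
interface ObsRow {
  id: number;
  archived: boolean;
  supersededBy: number | null;
}

// Step 2: set superseded_by on the old row and archive it.
function supersedeObservation(rows: Map<number, ObsRow>, oldId: number, newId: number): void {
  const oldRow = rows.get(oldId);
  if (!oldRow || !rows.has(newId)) throw new Error("unknown observation id");
  oldRow.supersededBy = newId;
  oldRow.archived = true;
}

// Step 3: context/search only sees rows where superseded_by IS NULL.
function activeRows(rows: Map<number, ObsRow>): ObsRow[] {
  return [...rows.values()].filter((r) => r.supersededBy === null && !r.archived);
}
```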
---

## Test Plan

### New Tests for Sprint 1 (inject.test.ts)

```
describe("estimateTokens", () => {
  test("empty string returns 0")
  test("short string estimates correctly")
  test("long narrative estimates within expected range")
})

describe("token budget", () => {
  test("respects token budget — stops adding when budget exhausted")
  test("always includes pinned observations even if they exceed budget")
  test("800 token budget fits more entries than old 10-count limit with narratives")
  test("single huge observation doesn't crash — graceful overflow")
  test("default budget is 800 when no options provided")
})

describe("facts-first formatting", () => {
  test("top-tier observations show facts as bullet points")
  test("falls back to narrative snippet when no facts available")
  test("lower-tier observations show title only")
  test("footer shows correct remaining count")
  test("handles malformed facts JSON gracefully")
})

describe("tiered context", () => {
  test("first 3 observations get detailed format")
  test("observations 4+ get title-only format")
  test("footer says 'N more available via search'")
  test("with ≤3 observations, all get detailed format, no footer")
})
```

### New Tests for Sprint 1 (server.test.ts or inline)

```
describe("double-injection guard", () => {
  test("first session_context call returns full context")
  test("second session_context call returns abbreviated message")
})
```

### New Tests for Sprint 2 (inject.test.ts)

```
describe("blended scoring", () => {
  test("recent medium-quality beats old high-quality")
  test("very old observations get near-zero recency boost")
  test("same-age observations sort by quality")
  test("blended score is always between 0 and 1")
})
```

### New Tests for Sprint 2 (sqlite.test.ts, inject.test.ts)

```
describe("supersession", () => {
  test("superseded observation excluded from session context")
  test("superseded observation deprioritized in search results")
  test("save with supersedes param archives old observation")
  test("superseded_by set to NULL on referenced observation delete")
  test("migration v2 adds superseded_by column correctly")
})
```
## Risks & Mitigations

| Risk | Impact | Mitigation |
|------|--------|------------|
| Token estimation inaccuracy (code tokens shorter than English) | Over-fill budget | Use 10% safety margin (720 of 800 effective budget) |
| `buildSessionContext` signature change breaks hook | Hook crash | Use options object with defaults for backwards compat |
| Module-level guard persists across sessions (if process reused) | Stale guard | Stdio transport = 1 process per session, safe |
| Facts stored as JSON may be malformed | Formatting crash | Try/catch JSON.parse, fall back to raw string |
| LIMIT 50 fetch overhead vs LIMIT 10 | Marginal latency | SQLite with indexes handles 50 rows in <1ms |