nano-brain 2026.3.7 → 2026.3.8
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/CHANGELOG.md +13 -0
- package/openspec/changes/nano-brain-phase1-memory-intelligence/.openspec.yaml +2 -0
- package/openspec/changes/nano-brain-phase1-memory-intelligence/design.md +206 -0
- package/openspec/changes/nano-brain-phase1-memory-intelligence/proposal.md +30 -0
- package/openspec/changes/nano-brain-phase1-memory-intelligence/specs/auto-categorization/spec.md +67 -0
- package/openspec/changes/nano-brain-phase1-memory-intelligence/specs/memory-relevance-decay/spec.md +85 -0
- package/openspec/changes/nano-brain-phase1-memory-intelligence/specs/search-pipeline/spec.md +23 -0
- package/openspec/changes/nano-brain-phase1-memory-intelligence/specs/storage-limits/spec.md +23 -0
- package/openspec/changes/nano-brain-phase1-memory-intelligence/specs/usage-based-boosting/spec.md +60 -0
- package/openspec/changes/nano-brain-phase1-memory-intelligence/tasks.md +49 -0
- package/openspec/changes/nano-brain-phase2-llm-memory-consolidation/.openspec.yaml +2 -0
- package/openspec/changes/nano-brain-phase2-llm-memory-consolidation/design.md +136 -0
- package/openspec/changes/nano-brain-phase2-llm-memory-consolidation/proposal.md +31 -0
- package/openspec/changes/nano-brain-phase2-llm-memory-consolidation/specs/fact-extraction/spec.md +113 -0
- package/openspec/changes/nano-brain-phase2-llm-memory-consolidation/specs/mcp-server/spec.md +62 -0
- package/openspec/changes/nano-brain-phase2-llm-memory-consolidation/specs/memory-consolidation/spec.md +108 -0
- package/openspec/changes/nano-brain-phase2-llm-memory-consolidation/specs/storage-limits/spec.md +36 -0
- package/openspec/changes/nano-brain-phase2-llm-memory-consolidation/tasks.md +105 -0
- package/openspec/changes/nano-brain-phase3-knowledge-graph/.openspec.yaml +2 -0
- package/openspec/changes/nano-brain-phase3-knowledge-graph/design.md +147 -0
- package/openspec/changes/nano-brain-phase3-knowledge-graph/proposal.md +30 -0
- package/openspec/changes/nano-brain-phase3-knowledge-graph/specs/mcp-server/spec.md +83 -0
- package/openspec/changes/nano-brain-phase3-knowledge-graph/specs/memory-entity-graph/spec.md +72 -0
- package/openspec/changes/nano-brain-phase3-knowledge-graph/specs/proactive-surfacing/spec.md +58 -0
- package/openspec/changes/nano-brain-phase3-knowledge-graph/specs/temporal-reasoning/spec.md +74 -0
- package/openspec/changes/nano-brain-phase3-knowledge-graph/tasks.md +98 -0
- package/openspec/changes/nano-brain-resource-optimization/.openspec.yaml +2 -0
- package/package.json +1 -1
- package/src/codebase.ts +74 -8
- package/src/providers/qdrant.ts +56 -24
- package/src/store.ts +5 -0
- package/src/types.ts +1 -0
package/CHANGELOG.md
CHANGED

@@ -1,5 +1,18 @@
 # Changelog
 
+## [2026.3.8] - 2026-03-08
+
+### Fixed
+
+- **Embedding 0 chunks infinite loop**: When `chunkMarkdown` returned 0 chunks (empty/whitespace-only body), the batch was counted as embedded but no `content_vectors` rows were inserted. Next iteration fetched the same docs, looping forever. Now skips empty-body docs and adds them to `failedHashes`.
+- **Qdrant fire-and-forget desync**: `insertEmbedding` upserted to Qdrant via `.catch()` (fire-and-forget) then immediately wrote to `content_vectors`. If Qdrant failed, SQLite thought the doc was embedded but Qdrant didn't have it. Now awaits Qdrant `batchUpsert` before writing `content_vectors`.
+- **Qdrant socket errors under load**: Individual per-chunk upserts created hundreds of concurrent HTTP requests, overwhelming the connection. Replaced with batched upserts (100 vectors/request) with retry + exponential backoff (up to 3 retries) for `UND_ERR_SOCKET`, `ECONNRESET`, and `ECONNREFUSED` errors.
+
+### Added
+
+- **Embed batch file logging**: Embed log now shows file names being processed: `[embed] Batch 3 docs, 10 chunks: package.json, tsconfig.json, README.md`.
+- **`insertEmbeddingLocal`**: SQLite-only embedding record method for use when external vector store is handled separately.
+
 ## [2026.2.0] - 2026-03-05
 
 ### Added
package/openspec/changes/nano-brain-phase1-memory-intelligence/design.md
ADDED

@@ -0,0 +1,206 @@
# Design: Memory Intelligence Phase 1

## Context

nano-brain currently treats all memories equally regardless of age or usage. A 6-month-old debugging note ranks the same as yesterday's architecture decision. The search pipeline (RRF fusion → top rank bonus → centrality boost → supersede demotion → position-aware blending) has no awareness of memory lifecycle. This design adds three lightweight intelligence features to the existing pipeline without introducing LLM dependencies or blocking operations.

The search scoring pipeline lives in `search.ts` and follows this flow:

1. `rrfFuse` — combines BM25 (FTS) and vector search results
2. `applyTopRankBonus` — boosts top-ranked results (not currently in use)
3. `applyCentralityBoost` — multiplies score by `(1 + centralityWeight * centrality)`
4. `applySupersedeDemotion` — multiplies score by `demotionFactor` for superseded docs
5. `positionAwareBlend` — blends RRF and rerank scores based on position (top3/mid/tail)

Schema is in `store.ts` with tables: `documents`, `content`, `document_tags`, `content_vectors`, `documents_fts`. The `documents` table currently tracks `created_at`, `modified_at`, and `active` but has no access tracking.

## Goals

**In Scope:**

- Track memory access patterns (count and timestamp) without performance overhead
- Apply time-based relevance decay to search scoring using a configurable half-life model
- Auto-categorize memories on write using fast heuristic rules (no LLM)
- Boost frequently accessed memories in search results
- Maintain backward compatibility with existing memories and config

**Out of Scope (Phase 2+):**

- LLM-based categorization or summarization
- Automatic memory deletion or archival
- User-facing UI for memory management
- Cross-collection memory relationships
- Memory consolidation or deduplication

## Decisions

### 1. Memory Relevance Decay

**What:** Add `access_count INTEGER DEFAULT 0` and `last_accessed_at TEXT` columns to the `documents` table. Track access on every search result returned to the user. Apply half-life decay to search scores based on time since last access.

**Why:** Memories that haven't been accessed in months are less likely to be relevant now. A decay curve with a configurable half-life (default 30 days) provides intuitive control: a memory is "half as relevant" after N days of no access. This is gentler than LRU eviction (which deletes) and more flexible than fixed TTL (which is binary).

**How:**

- Schema migration: `ALTER TABLE documents ADD COLUMN access_count INTEGER DEFAULT 0; ALTER TABLE documents ADD COLUMN last_accessed_at TEXT;`
- On search result return (in `hybridSearch` after final scoring), increment `access_count` and update `last_accessed_at` for each result shown to the user
- Decay function: `decayScore = 1 / (1 + daysSinceAccess / halfLife)` where `daysSinceAccess = (now - last_accessed_at) / 86400000`
- Applied as a multiplier in the scoring pipeline after RRF fusion but before position-aware blending
- New config section in `config.yml`:

  ```yaml
  decay:
    enabled: true
    halfLife: "30d"   # parsed to days
    boostWeight: 0.15 # how much decay affects final score
  ```

- Backward compatible: existing memories get `access_count=0`, `last_accessed_at=NULL`; decay treats NULL as "never accessed" (maximum decay)

**Alternatives considered:**

- **LRU eviction:** Too aggressive. Deletes memories permanently. Users lose context.
- **Fixed TTL:** Too rigid. A 90-day-old architecture decision is still valuable; a 7-day-old typo fix is not.
- **No decay (status quo):** Loses signal. Old noise drowns out recent signal as memory grows.

**Trade-offs:**

- Adds 2 columns to `documents` table (minimal storage overhead: ~16 bytes per doc)
- Adds 1 UPDATE per search result returned (negligible for typical result counts of 5-20)
- Decay calculation is pure math (no I/O, no blocking)
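The decay formula above can be sketched as a small pure function. This is an illustrative sketch, not the package's actual implementation: the function name and signature are assumptions, and the `created_at` fallback for NULL follows the companion memory-relevance-decay spec.

```typescript
// Illustrative sketch of the decay multiplier: 1 / (1 + daysSinceAccess / halfLife).
// Not the package's actual code; names and signature are assumptions.
const MS_PER_DAY = 86_400_000;

function decayScore(
  lastAccessedAt: string | null, // ISO timestamp, or null if never accessed
  createdAt: string,             // ISO timestamp, used as the NULL fallback
  halfLifeDays: number,
  now: number = Date.now(),
): number {
  const reference = lastAccessedAt ?? createdAt;
  const daysSinceAccess = Math.max(0, (now - Date.parse(reference)) / MS_PER_DAY);
  return 1 / (1 + daysSinceAccess / halfLifeDays);
}
```

At `daysSinceAccess = halfLifeDays` the multiplier is exactly 0.5, matching the "half as relevant after N days" intuition.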
### 2. Auto-Categorization on Write

**What:** When `memory_write` is called, classify content into predefined categories using keyword/pattern heuristics. Populate the existing `document_tags` table with auto-generated tags prefixed with `auto:`.

**Why:** Manual tagging is tedious and inconsistent. Heuristic categorization is instant, deterministic, and requires no external dependencies. Categories help users filter searches (e.g., "show me architecture decisions") and provide structure for future features (e.g., category-specific retention policies).

**How:**

- Categories: `architecture-decision`, `debugging-insight`, `tool-config`, `pattern`, `preference`, `context`, `workflow`
- Detection rules (keyword matching, case-insensitive):
  - `architecture-decision`: "decided", "chose", "architecture", "design decision", "trade-off", "approach"
  - `debugging-insight`: "error", "fix", "bug", "stack trace", "crash", "exception", "workaround"
  - `tool-config`: "config", "setup", "install", "environment", "settings", "configuration"
  - `pattern`: "pattern", "convention", "idiom", "best practice", "anti-pattern"
  - `preference`: "prefer", "avoid", "like", "dislike", "style", "opinion"
  - `context`: "context", "background", "overview", "summary", "explanation"
  - `workflow`: "workflow", "process", "steps", "procedure", "checklist"
- Applied in `store.insertDocument` after content insertion
- Tags are prefixed with `auto:` (e.g., `auto:architecture-decision`) to distinguish from user-provided tags
- Additive: does not remove user-provided tags
- Multiple categories can apply to a single memory

**Alternatives considered:**

- **LLM-based categorization:** Too slow (100-500ms per write), costs money, requires API keys. Deferred to Phase 2.
- **No categorization (status quo):** Memories remain unstructured. Harder to filter or prioritize.
- **User-only tagging:** Requires manual effort. Users forget or skip tagging.

**Trade-offs:**

- Heuristics are imperfect. Some memories will be miscategorized or miss categories.
- Keyword matching is language-dependent (assumes English). Non-English memories may not categorize well.
- Adds ~5-10ms to write latency (negligible for background MCP server)
### 3. Usage-Based Search Boosting

**What:** Integrate `access_count` and `last_accessed_at` into the hybrid search scoring pipeline. Frequently accessed memories get a configurable boost.

**Why:** Memories that are accessed repeatedly are more likely to be relevant in the future. This is a form of implicit feedback: the user's past behavior signals importance. Combined with decay, this creates a "hot/cold" memory system where frequently accessed recent memories rank highest.

**How:**

- New function `applyUsageBoost(results, config)` similar to `applyCentralityBoost`
- Formula: `usageBoost = log2(1 + access_count) * decayScore * boostWeight`
- `log2(1 + access_count)` provides diminishing returns (10 accesses is not 10x better than 1 access)
- `decayScore` is the decay multiplier from Feature 1 (so old memories don't get boosted)
- `boostWeight` is configurable (default 0.15)
- Applied in the scoring pipeline after `applyCentralityBoost`, before `applySupersedeDemotion`
- New config field in `SearchConfig` type (`types.ts`):

  ```typescript
  usage_boost_weight: number; // default 0.15
  ```

- Backward compatible: memories with `access_count=0` get no boost (log2(1) = 0)

**Alternatives considered:**

- **Replace RRF entirely with usage-based ranking:** Too risky. RRF is proven. Usage is a signal, not the only signal.
- **Separate boost index:** Over-engineered. Usage data is already in `documents` table.
- **Linear boost (not log):** Amplifies outliers too much. A memory accessed 100 times would dominate results.

**Trade-offs:**

- Adds 1 JOIN to search query (negligible: `documents` table is already joined)
- Boost calculation is pure math (no I/O, no blocking)
- Cold start problem: new memories have `access_count=0` and rank lower. Mitigated by decay (new memories have no decay penalty).
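A minimal sketch of what such a boost pass could look like. The result shape and field names here are illustrative assumptions, not the package's actual types.

```typescript
// Illustrative sketch; the result shape and field names are assumptions.
interface ScoredResult {
  id: string;
  score: number;       // score after centrality boost
  accessCount: number; // documents.access_count
  decayScore: number;  // decay multiplier from Feature 1, in (0, 1]
}

function applyUsageBoost(results: ScoredResult[], boostWeight = 0.15): ScoredResult[] {
  return results.map((r) => ({
    ...r,
    // log2(1 + access_count) gives diminishing returns; multiplying by the
    // decay score keeps stale-but-once-popular memories from dominating.
    score: r.score + Math.log2(1 + r.accessCount) * r.decayScore * boostWeight,
  }));
}
```

Note that with `access_count = 0` the term is exactly zero, so untouched memories keep their original score.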
## Risks and Trade-offs

### Performance

- **Risk:** Access tracking adds 1 UPDATE per search result. For 20 results, that's 20 UPDATEs.
- **Mitigation:** SQLite handles small UPDATEs efficiently. Batch UPDATEs in a single transaction. Measure latency in testing.
- **Fallback:** If latency is unacceptable, make access tracking async (fire-and-forget).

### Accuracy

- **Risk:** Heuristic categorization is imperfect. Memories may be miscategorized.
- **Mitigation:** Use `auto:` prefix so users can distinguish auto-tags from manual tags. Users can remove incorrect auto-tags.
- **Fallback:** If categorization is too noisy, add a config flag to disable it.

### Cold Start

- **Risk:** New memories have `access_count=0` and rank lower than old frequently accessed memories.
- **Mitigation:** Decay penalizes old memories. New memories have no decay penalty, so they start with a neutral score.
- **Fallback:** Add a "recency boost" in Phase 2 to explicitly favor new memories.

### Backward Compatibility

- **Risk:** Existing memories have `access_count=0` and `last_accessed_at=NULL`.
- **Mitigation:** Decay treats NULL as "never accessed" (maximum decay). This is correct: old memories that were never accessed should rank lower.
- **Fallback:** Provide a migration script to backfill `last_accessed_at` with `created_at` for existing memories.

### Configuration Complexity

- **Risk:** Adding `decay` and `usage_boost_weight` config fields increases surface area.
- **Mitigation:** Provide sensible defaults (enabled, 30d half-life, 0.15 boost weight). Document in README.
- **Fallback:** If users find config overwhelming, hide advanced options behind an `--advanced` flag.
## Implementation Notes

### Schema Migration

```sql
ALTER TABLE documents ADD COLUMN access_count INTEGER DEFAULT 0;
ALTER TABLE documents ADD COLUMN last_accessed_at TEXT;
CREATE INDEX IF NOT EXISTS idx_documents_access ON documents(last_accessed_at);
```

### Scoring Pipeline Order

1. `rrfFuse` — combine BM25 + vector
2. `applyTopRankBonus` (if enabled)
3. `applyCentralityBoost` (existing)
4. **`applyUsageBoost` (new)** — boost frequently accessed memories
5. `applySupersedeDemotion` (existing)
6. `positionAwareBlend` — blend RRF + rerank scores
7. **Track access** (new) — increment `access_count`, update `last_accessed_at`

### Config Schema

```yaml
decay:
  enabled: true
  halfLife: "30d"   # supports "7d", "30d", "90d", etc.
  boostWeight: 0.15

search:
  usage_boost_weight: 0.15
  # ... existing fields
```

### Auto-Categorization Rules

Implemented as a simple keyword matcher in `store.ts`:

```typescript
function autoCategorizeTags(content: string): string[] {
  const tags: string[] = [];
  const lower = content.toLowerCase();

  if (/\b(decided|chose|architecture|design decision|trade-?off|approach)\b/.test(lower)) {
    tags.push('auto:architecture-decision');
  }
  if (/\b(error|fix|bug|stack trace|crash|exception|workaround)\b/.test(lower)) {
    tags.push('auto:debugging-insight');
  }
  // ... more rules

  return tags;
}
```

### Testing Strategy

- Unit tests for decay function (edge cases: NULL, 0, negative)
- Unit tests for usage boost (edge cases: 0 access_count, high access_count)
- Unit tests for auto-categorization (each category, multi-category, no match)
- Integration test: write memory → search → verify access tracking
- Integration test: search with decay enabled vs disabled
- Performance test: measure search latency with 10k memories, 20 results
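The `halfLife: "30d"` strings in the config schema above need parsing into days, with the warn-and-default behavior described in the risks section. A hedged sketch; the function name and log format are illustrative, not the package's actual API:

```typescript
// Illustrative sketch of half-life parsing; name and log format are assumptions.
const DEFAULT_HALF_LIFE_DAYS = 30;

// Parses duration strings like "7d", "30d", "90d" into a day count.
// Invalid input logs a warning and falls back to the 30-day default.
function parseHalfLife(value: string): number {
  const m = /^(\d+)d$/.exec(value.trim());
  if (!m) {
    console.warn(`[decay] invalid halfLife "${value}", using ${DEFAULT_HALF_LIFE_DAYS}d`);
    return DEFAULT_HALF_LIFE_DAYS;
  }
  return Number(m[1]);
}
```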
package/openspec/changes/nano-brain-phase1-memory-intelligence/proposal.md
ADDED

@@ -0,0 +1,30 @@
## Why

nano-brain stores memories indefinitely with equal weight — a 6-month-old unused debugging note ranks the same as yesterday's critical architecture decision. There is no automatic organization, no relevance scoring, and no way to distinguish signal from noise as memory accumulates over time. Competitive memory systems (Mem0, memU) achieve 26-74% higher accuracy on benchmarks partly through intelligent memory lifecycle management. Phase 1 addresses the three lowest-effort, highest-impact gaps: relevance decay, automatic categorization, and usage-based ranking.

## What Changes

- **Memory relevance decay**: Add `access_count` and `last_accessed_at` tracking to documents. Introduce a configurable decay function that deprioritizes stale, unused memories in search results. Memories accessed frequently stay prominent; neglected ones fade gracefully without deletion.
- **Auto-categorization on write**: When `memory_write` is called, classify the content into predefined categories (architecture-decision, debugging-insight, tool-config, pattern, preference, context) using lightweight keyword/heuristic matching. Populate the existing `tags` field automatically. No LLM dependency for Phase 1 — keep it fast and local.
- **Usage-based search boosting**: Integrate access frequency and recency into the hybrid search scoring pipeline. Frequently retrieved memories get a configurable boost in RRF fusion, complementing the existing BM25 + vector + rerank pipeline.

## Capabilities

### New Capabilities

- `memory-relevance-decay`: Track memory access patterns and apply time-based relevance decay to search scoring
- `auto-categorization`: Automatically classify and tag memories on write using heuristic rules
- `usage-based-boosting`: Boost frequently accessed memories in hybrid search results

### Modified Capabilities

- `search-pipeline`: Search scoring now incorporates access frequency and recency as additional ranking signals
- `storage-limits`: Decay metadata (`access_count`, `last_accessed_at`) added to document schema; retention eviction can optionally prioritize low-access documents

## Impact

- **Schema**: New columns on `documents` table (`access_count INTEGER DEFAULT 0`, `last_accessed_at TEXT`)
- **Search pipeline** (`search.ts`): Additional scoring factor in RRF fusion for access-based boosting
- **Store** (`store.ts`): Track access on every search result retrieval; auto-tag on document insert
- **MCP server** (`server.ts`): `memory_write` gains auto-categorization; search tools update access tracking
- **Config** (`config.yml`): New `decay` section with `halfLife`, `boostWeight`, `enabled` fields
- **No new dependencies**: Heuristic categorization avoids LLM calls; decay is pure math
- **Backward compatible**: Existing memories get `access_count=0`, `last_accessed_at=NULL`; decay defaults to disabled until configured
package/openspec/changes/nano-brain-phase1-memory-intelligence/specs/auto-categorization/spec.md
ADDED

@@ -0,0 +1,67 @@
## ADDED Requirements

### Requirement: Automatic tag assignment on write

When `memory_write` is called, the system SHALL analyze the content and assign category tags based on keyword/pattern heuristics. Auto-generated tags SHALL be prefixed with `auto:` (e.g., `auto:architecture-decision`).

#### Scenario: Content contains architecture decision keywords

- **WHEN** content contains "decided to use PostgreSQL"
- **THEN** the document is tagged with `auto:architecture-decision`

#### Scenario: Content contains debugging keywords

- **WHEN** content contains "fixed bug" and "stack trace"
- **THEN** the document is tagged with `auto:debugging-insight`

#### Scenario: Content with no matching patterns

- **WHEN** content does not match any category patterns
- **THEN** no auto tags are added to the document

### Requirement: Category definitions

The system SHALL recognize these categories: `architecture-decision` (keywords: decided, chose, architecture, design, tradeoff, approach), `debugging-insight` (keywords: error, fix, bug, stack trace, debug, workaround), `tool-config` (keywords: config, setup, install, environment, .env), `pattern` (keywords: pattern, convention, always, never, rule), `preference` (keywords: prefer, like, dislike, favorite, default), `context` (keywords: context, background, note, remember), `workflow` (keywords: workflow, process, step, pipeline, deploy).

#### Scenario: Architecture decision content

- **WHEN** content is "We chose React over Vue for the frontend"
- **THEN** the document is tagged with `auto:architecture-decision`

#### Scenario: Debugging insight content

- **WHEN** content is "npm install failed, fixed by clearing cache"
- **THEN** the document is tagged with `auto:debugging-insight`

#### Scenario: Content matching multiple categories

- **WHEN** content matches patterns for both `architecture-decision` and `workflow`
- **THEN** all matching auto tags are applied to the document

### Requirement: Additive tagging

Auto-categorization SHALL NOT remove or replace user-provided tags. Auto tags are added alongside any tags the user explicitly provides.

#### Scenario: User provides explicit tags

- **WHEN** user provides tags=["important"] and content matches debugging patterns
- **THEN** the final tags are ["important", "auto:debugging-insight"]

#### Scenario: User provides no tags

- **WHEN** user provides no tags and content matches categorization patterns
- **THEN** only auto tags are applied to the document

### Requirement: Deterministic and fast

Auto-categorization SHALL NOT use LLM calls. It SHALL complete in under 5ms for typical content (under 10KB). The heuristic engine SHALL be pure keyword/regex matching.

#### Scenario: Small document categorization

- **WHEN** a 5KB markdown document is written
- **THEN** auto-categorization completes in under 5ms

#### Scenario: Large document categorization

- **WHEN** a 100KB document is written
- **THEN** auto-categorization completes in under 50ms
package/openspec/changes/nano-brain-phase1-memory-intelligence/specs/memory-relevance-decay/spec.md
ADDED

@@ -0,0 +1,85 @@
## ADDED Requirements

### Requirement: Access tracking on search results

The system SHALL increment `access_count` and update `last_accessed_at` for every document returned in search results to the user. Internal pipeline queries (expansion, reranking) SHALL NOT trigger access tracking.

#### Scenario: Search returns multiple results

- **WHEN** a search returns 5 results to the user
- **THEN** all 5 documents have their `access_count` incremented by 1
- **THEN** all 5 documents have their `last_accessed_at` updated to the current timestamp

#### Scenario: Same document returned in separate searches

- **WHEN** the same document is returned in two separate user searches
- **THEN** the document's `access_count` is 2
- **THEN** the document's `last_accessed_at` reflects the most recent search timestamp

#### Scenario: Internal pipeline query does not trigger tracking

- **WHEN** a vector-only internal search occurs during the hybrid pipeline
- **THEN** no `access_count` increments occur for those intermediate results
- **THEN** only the final results returned to the user trigger access tracking

### Requirement: Decay score computation

The system SHALL compute a decay score using the formula `1 / (1 + daysSinceAccess / halfLife)` where `daysSinceAccess` is the number of days since `last_accessed_at` and `halfLife` is configurable. Documents with NULL `last_accessed_at` SHALL use `created_at` as fallback.

#### Scenario: Document accessed today

- **WHEN** a document was accessed today (daysSinceAccess = 0)
- **THEN** the decay score is approximately 1.0

#### Scenario: Document not accessed for 30 days with 30-day half-life

- **WHEN** a document has not been accessed for 30 days and `halfLife` is 30 days
- **THEN** the decay score is approximately 0.5

#### Scenario: Document never accessed

- **WHEN** a document has NULL `last_accessed_at`
- **THEN** the system uses `created_at` for the `daysSinceAccess` calculation
- **THEN** the decay score is computed based on the document's age

### Requirement: Schema migration

The system SHALL add `access_count INTEGER DEFAULT 0` and `last_accessed_at TEXT DEFAULT NULL` columns to the `documents` table. Existing documents SHALL retain default values. Migration SHALL be backward compatible (no data loss).

#### Scenario: Fresh database initialization

- **WHEN** a new database is created
- **THEN** the `documents` table includes `access_count` and `last_accessed_at` columns from creation

#### Scenario: Existing database without decay columns

- **WHEN** an existing database does not have `access_count` or `last_accessed_at` columns
- **THEN** an ALTER TABLE migration adds both columns with default values
- **THEN** no existing data is lost

#### Scenario: Existing documents after migration

- **WHEN** the migration completes on a database with existing documents
- **THEN** all existing documents have `access_count` set to 0
- **THEN** all existing documents have `last_accessed_at` set to NULL

### Requirement: Decay configuration

The system SHALL support a `decay` section in config.yml with `enabled` (boolean, default false), `halfLife` (duration string, default "30d"), and `boostWeight` (number 0-1, default 0.15). Invalid values SHALL log a warning and use defaults.

#### Scenario: Decay not configured

- **WHEN** config.yml has no `decay` section
- **THEN** decay is disabled by default
- **THEN** no decay scoring is applied to search results

#### Scenario: Decay enabled with custom half-life

- **WHEN** config.yml contains `decay: { enabled: true, halfLife: "7d" }`
- **THEN** the system uses a 7-day half-life for decay calculations

#### Scenario: Invalid half-life value

- **WHEN** config.yml contains `decay: { halfLife: "banana" }`
- **THEN** a warning is logged indicating the invalid value
- **THEN** the default half-life of 30 days is used
package/openspec/changes/nano-brain-phase1-memory-intelligence/specs/search-pipeline/spec.md
ADDED

@@ -0,0 +1,23 @@
## ADDED Requirements

### Requirement: Usage-aware scoring in hybrid pipeline

The `memory_query` hybrid search pipeline SHALL incorporate usage-based scoring as an additional ranking signal. The scoring pipeline order SHALL be: RRF fusion → top-rank bonus → centrality boost → usage boost → supersede demotion → position-aware blend (if reranking enabled).

#### Scenario: Hybrid search with usage boost enabled

- **WHEN** a hybrid search is performed with usage boost enabled
- **THEN** search results reflect access patterns in their ranking
- **THEN** frequently accessed documents rank higher than identical documents with lower access counts

#### Scenario: Hybrid search with usage boost disabled

- **WHEN** a hybrid search is performed with `usage_boost_weight` set to 0
- **THEN** the search results are identical to the current behavior without usage boosting
- **THEN** no usage-based score adjustments are applied

#### Scenario: BM25-only search does not apply usage boost

- **WHEN** a BM25-only search is performed using `memory_search`
- **THEN** no usage boost is applied to the results
- **THEN** only the hybrid pipeline (`memory_query`) incorporates usage-based scoring
package/openspec/changes/nano-brain-phase1-memory-intelligence/specs/storage-limits/spec.md
ADDED
@@ -0,0 +1,23 @@
+## ADDED Requirements
+
+### Requirement: Access-aware retention eviction
+
+When performing size-based eviction (storage exceeds `maxSize`), the system SHALL optionally consider `access_count` when selecting documents to evict. Within the same age tier, documents with lower `access_count` SHALL be evicted before documents with higher `access_count`. This behavior SHALL be enabled when `decay.enabled` is true in the config.
+
+#### Scenario: Two documents same age with different access counts
+
+- **WHEN** two documents have the same age, one with `access_count` of 10 and one with `access_count` of 0
+- **THEN** the document with `access_count` of 0 is evicted first
+- **THEN** the document with `access_count` of 10 is retained
+
+#### Scenario: Decay disabled uses age-only eviction
+
+- **WHEN** `decay.enabled` is false in config
+- **THEN** eviction uses age-only ordering (current behavior)
+- **THEN** `access_count` is not considered in eviction decisions
+
+#### Scenario: All documents have zero access count
+
+- **WHEN** all documents have `access_count` of 0
+- **THEN** the system falls back to age-based eviction
+- **THEN** the oldest documents are evicted first
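The eviction ordering above can be sketched as a comparator. This sketch treats equal `created_at` values as the same age tier; the field names and the tier definition are assumptions for illustration.

```typescript
interface Doc { id: number; createdAt: string; accessCount: number }

// Returns candidates in eviction order: oldest first, and within the
// same age tier, lower access_count first. With decay disabled (or when
// every access_count is 0) this degrades to pure age ordering.
function evictionOrder(docs: Doc[], decayEnabled: boolean): Doc[] {
  return [...docs].sort((a, b) => {
    // ISO-8601 date strings compare chronologically as strings.
    if (a.createdAt !== b.createdAt) return a.createdAt < b.createdAt ? -1 : 1;
    return decayEnabled ? a.accessCount - b.accessCount : 0;
  });
}
```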
package/openspec/changes/nano-brain-phase1-memory-intelligence/specs/usage-based-boosting/spec.md
ADDED
@@ -0,0 +1,60 @@
+## ADDED Requirements
+
+### Requirement: Usage boost in search scoring
+
+The hybrid search pipeline SHALL apply a usage-based boost to results using the formula `usageBoost = log2(1 + access_count) * decayScore * boostWeight`. The boost SHALL be applied as an additive score adjustment.
+
+#### Scenario: Document with no access history
+
+- **WHEN** a document has `access_count` of 0
+- **THEN** the usage boost is 0 (since log2(1) = 0)
+- **THEN** the document's score is not affected by usage boosting
+
+#### Scenario: Document with moderate access and recent activity
+
+- **WHEN** a document has `access_count` of 7 and was recently accessed
+- **THEN** a moderate usage boost is applied (log2(8) * decayScore * boostWeight)
+- **THEN** the document ranks higher than identical documents with lower access counts
+
+#### Scenario: Document with high access but stale
+
+- **WHEN** a document has `access_count` of 100 but has not been accessed recently
+- **THEN** the usage boost is reduced by the decay score
+- **THEN** the boost is lower than that of a recently accessed document with the same access count
+
+### Requirement: Boost pipeline position
+
+The usage boost SHALL be applied after the centrality boost and before the supersede demotion in the search scoring pipeline.
+
+#### Scenario: Result with high centrality and high usage
+
+- **WHEN** a document has both high centrality and high usage scores
+- **THEN** both the centrality boost and the usage boost are applied
+- **THEN** the boosts compound to increase the document's final score
+
+#### Scenario: Superseded document with high usage
+
+- **WHEN** a superseded document has high usage
+- **THEN** the usage boost is applied first
+- **THEN** the supersede demotion is applied afterward, reducing the final score
+
+### Requirement: Configuration
+
+The SearchConfig SHALL include `usage_boost_weight` (number, default 0.15). Setting it to 0 disables usage boosting without disabling access tracking.
+
+#### Scenario: Usage boost disabled
+
+- **WHEN** `usage_boost_weight` is set to 0
+- **THEN** no usage boost is applied to search results
+- **THEN** access tracking still occurs for all returned documents
+
+#### Scenario: Stronger usage signal
+
+- **WHEN** `usage_boost_weight` is set to 0.3
+- **THEN** usage-based boosting has a stronger effect on search rankings
+
+#### Scenario: Negative boost weight
+
+- **WHEN** `usage_boost_weight` is set to a negative value
+- **THEN** a warning is logged indicating the invalid value
+- **THEN** the default value of 0.15 is used
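The boost formula in the spec above transcribes directly into code. This is a sketch under assumptions: the standalone function shape and default-weight placement are illustrative, and `decayScore` is taken as a precomputed value in (0, 1].

```typescript
// usageBoost = log2(1 + access_count) * decayScore * boostWeight,
// per the requirement above. log2(1 + n) grows sub-linearly, so heavy
// reuse helps without letting one hot document dominate, and the decay
// score discounts stale access history.
function usageBoost(accessCount: number, decayScore: number, boostWeight = 0.15): number {
  return Math.log2(1 + accessCount) * decayScore * boostWeight;
}
```

For example, `access_count` of 7 with a fresh decay score of 1 gives log2(8) × 1 × 0.15 = 0.45, while `access_count` of 0 gives exactly 0, matching the no-history scenario.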
package/openspec/changes/nano-brain-phase1-memory-intelligence/tasks.md
ADDED
@@ -0,0 +1,49 @@
+# Tasks: Memory Intelligence Phase 1
+
+## 1. Schema & Migration
+- [ ] 1.1 Add `access_count INTEGER DEFAULT 0` and `last_accessed_at TEXT` columns to documents table CREATE TABLE in `src/store.ts`
+- [ ] 1.2 Add migration logic to detect and ALTER TABLE for existing databases (follow existing migration pattern in store.ts around lines 196-230)
+- [ ] 1.3 Add `access_count` and `lastAccessedAt` fields to `Document` interface and `SearchResult` interface in `src/types.ts`
+- [ ] 1.4 Add index on `last_accessed_at` column: `CREATE INDEX IF NOT EXISTS idx_documents_access ON documents(last_accessed_at)`
+
+## 2. Decay Configuration
+- [ ] 2.1 Add `decay` section to `CollectionConfig` interface in `src/types.ts`: `{ enabled: boolean, halfLife: string, boostWeight: number }`
+- [ ] 2.2 Add decay config parsing with duration string support and validation (reuse existing parseDuration pattern from storage.ts if available, or add new)
+- [ ] 2.3 Add `usage_boost_weight` field to `SearchConfig` interface and `DEFAULT_SEARCH_CONFIG` in `src/types.ts` (default 0.15)
+- [ ] 2.4 Add config validation for `boostWeight` (0-1 range) and `usage_boost_weight` (warn on negative, use default)
+
+## 3. Auto-Categorization Engine
+- [ ] 3.1 Create new file `src/categorizer.ts` with `categorize(content: string): string[]` function — pure keyword/regex matching, returns array of category strings
+- [ ] 3.2 Implement category rules: architecture-decision, debugging-insight, tool-config, pattern, preference, context, workflow with keyword lists per the spec
+- [ ] 3.3 Add `auto:` prefix to all auto-generated tags
+- [ ] 3.4 Wire categorizer into `memory_write` handler in `src/server.ts` — merge auto tags with user-provided tags before inserting into document_tags
+- [ ] 3.5 Add unit tests for categorizer: test each category detection, multi-category, no-match, and performance (<5ms for 10KB)
+
+## 4. Access Tracking
+- [ ] 4.1 Add `trackAccess(docIds: number[])` method to Store interface and implement in `src/store.ts` — batch UPDATE access_count = access_count + 1, last_accessed_at = datetime('now')
+- [ ] 4.2 Wire access tracking into MCP search tool handlers in `src/server.ts` — call trackAccess after returning results for memory_search, memory_vsearch, memory_query
+- [ ] 4.3 Ensure internal pipeline queries (expansion, reranking) do NOT trigger access tracking
+- [ ] 4.4 Add tests for access tracking (verify increment, verify internal queries don't track)
+
+## 5. Decay Score Computation
+- [ ] 5.1 Add `computeDecayScore(lastAccessedAt: string | null, createdAt: string, halfLifeDays: number): number` function in `src/search.ts`
+- [ ] 5.2 Implement decay formula: `1 / (1 + daysSinceAccess / halfLife)` where daysSinceAccess uses `lastAccessedAt` or falls back to `createdAt` if NULL
+- [ ] 5.3 Add tests for decay score computation (edge cases: NULL last_accessed_at, zero days, large daysSinceAccess)
+
+## 6. Usage-Based Search Boosting
+- [ ] 6.1 Add `applyUsageBoost(results: SearchResult[], config: { usageBoostWeight: number, decayHalfLifeDays: number }): SearchResult[]` function in `src/search.ts`
+- [ ] 6.2 Implement boost formula: `log2(1 + access_count) * decayScore * boostWeight` where decayScore = `1 / (1 + daysSinceAccess / halfLife)`
+- [ ] 6.3 Ensure access_count and last_accessed_at are loaded in search result queries (update SQL in store.ts searchFTS and searchVec)
+- [ ] 6.4 Wire applyUsageBoost into hybrid search pipeline in `src/search.ts` in the correct position: after applyCentralityBoost, before applySupersedeDemotion
+- [ ] 6.5 Add tests for usage boost integration in search pipeline (verify pipeline order, verify weight=0 disables boost)
+
+## 7. Storage Integration
+- [ ] 7.1 Update size-based eviction in `src/storage.ts` to sort by access_count (ascending) within same age tier when decay.enabled is true
+- [ ] 7.2 Ensure eviction falls back to age-only ordering when decay.enabled is false or all access_counts are 0
+- [ ] 7.3 Add tests for access-aware eviction (same age different access counts, decay disabled, all zero access counts)
+
+## 8. Testing & Validation
+- [ ] 8.1 Add tests for schema migration (fresh DB, existing DB without columns)
+- [ ] 8.2 Run full test suite and verify no regressions
+- [ ] 8.3 Manual smoke test: write a memory, search for it multiple times, verify access_count increments and boost affects ranking
+- [ ] 8.4 Performance test: measure search latency with 10k memories, 20 results (verify access tracking overhead is acceptable)