nano-brain 2026.3.7 → 2026.3.8

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (32)
  1. package/CHANGELOG.md +13 -0
  2. package/openspec/changes/nano-brain-phase1-memory-intelligence/.openspec.yaml +2 -0
  3. package/openspec/changes/nano-brain-phase1-memory-intelligence/design.md +206 -0
  4. package/openspec/changes/nano-brain-phase1-memory-intelligence/proposal.md +30 -0
  5. package/openspec/changes/nano-brain-phase1-memory-intelligence/specs/auto-categorization/spec.md +67 -0
  6. package/openspec/changes/nano-brain-phase1-memory-intelligence/specs/memory-relevance-decay/spec.md +85 -0
  7. package/openspec/changes/nano-brain-phase1-memory-intelligence/specs/search-pipeline/spec.md +23 -0
  8. package/openspec/changes/nano-brain-phase1-memory-intelligence/specs/storage-limits/spec.md +23 -0
  9. package/openspec/changes/nano-brain-phase1-memory-intelligence/specs/usage-based-boosting/spec.md +60 -0
  10. package/openspec/changes/nano-brain-phase1-memory-intelligence/tasks.md +49 -0
  11. package/openspec/changes/nano-brain-phase2-llm-memory-consolidation/.openspec.yaml +2 -0
  12. package/openspec/changes/nano-brain-phase2-llm-memory-consolidation/design.md +136 -0
  13. package/openspec/changes/nano-brain-phase2-llm-memory-consolidation/proposal.md +31 -0
  14. package/openspec/changes/nano-brain-phase2-llm-memory-consolidation/specs/fact-extraction/spec.md +113 -0
  15. package/openspec/changes/nano-brain-phase2-llm-memory-consolidation/specs/mcp-server/spec.md +62 -0
  16. package/openspec/changes/nano-brain-phase2-llm-memory-consolidation/specs/memory-consolidation/spec.md +108 -0
  17. package/openspec/changes/nano-brain-phase2-llm-memory-consolidation/specs/storage-limits/spec.md +36 -0
  18. package/openspec/changes/nano-brain-phase2-llm-memory-consolidation/tasks.md +105 -0
  19. package/openspec/changes/nano-brain-phase3-knowledge-graph/.openspec.yaml +2 -0
  20. package/openspec/changes/nano-brain-phase3-knowledge-graph/design.md +147 -0
  21. package/openspec/changes/nano-brain-phase3-knowledge-graph/proposal.md +30 -0
  22. package/openspec/changes/nano-brain-phase3-knowledge-graph/specs/mcp-server/spec.md +83 -0
  23. package/openspec/changes/nano-brain-phase3-knowledge-graph/specs/memory-entity-graph/spec.md +72 -0
  24. package/openspec/changes/nano-brain-phase3-knowledge-graph/specs/proactive-surfacing/spec.md +58 -0
  25. package/openspec/changes/nano-brain-phase3-knowledge-graph/specs/temporal-reasoning/spec.md +74 -0
  26. package/openspec/changes/nano-brain-phase3-knowledge-graph/tasks.md +98 -0
  27. package/openspec/changes/nano-brain-resource-optimization/.openspec.yaml +2 -0
  28. package/package.json +1 -1
  29. package/src/codebase.ts +74 -8
  30. package/src/providers/qdrant.ts +56 -24
  31. package/src/store.ts +5 -0
  32. package/src/types.ts +1 -0
package/CHANGELOG.md CHANGED
@@ -1,5 +1,18 @@
  # Changelog
 
+ ## [2026.3.8] - 2026-03-08
+
+ ### Fixed
+
+ - **Embedding 0 chunks infinite loop**: When `chunkMarkdown` returned 0 chunks (empty/whitespace-only body), the batch was counted as embedded but no `content_vectors` rows were inserted. The next iteration fetched the same docs, looping forever. Now skips empty-body docs and adds them to `failedHashes`.
+ - **Qdrant fire-and-forget desync**: `insertEmbedding` upserted to Qdrant via `.catch()` (fire-and-forget), then immediately wrote to `content_vectors`. If Qdrant failed, SQLite thought the doc was embedded but Qdrant didn't have it. Now awaits Qdrant `batchUpsert` before writing `content_vectors`.
+ - **Qdrant socket errors under load**: Individual per-chunk upserts created hundreds of concurrent HTTP requests, overwhelming the connection. Replaced with batched upserts (100 vectors/request) plus retry with exponential backoff (up to 3 retries) for `UND_ERR_SOCKET`, `ECONNRESET`, and `ECONNREFUSED` errors.
+
+ ### Added
+
+ - **Embed batch file logging**: The embed log now shows the file names being processed: `[embed] Batch 3 docs, 10 chunks: package.json, tsconfig.json, README.md`.
+ - **`insertEmbeddingLocal`**: SQLite-only embedding record method for use when the external vector store is handled separately.
+
  ## [2026.2.0] - 2026-03-05
 
  ### Added
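The batching-plus-backoff strategy described in the fix above can be sketched as follows. The retryable error codes and the 100-vector batch size come from the changelog entry; the helper names (`chunk`, `withRetry`) and the backoff base are illustrative, not nano-brain's actual API.

```typescript
// Error codes the changelog lists as retryable for Qdrant upserts.
const RETRYABLE = new Set(["UND_ERR_SOCKET", "ECONNRESET", "ECONNREFUSED"]);

/** Split a vector list into batches (the changelog uses 100 per request). */
function chunk<T>(items: T[], size: number): T[][] {
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size));
  }
  return out;
}

/** Retry an async operation with exponential backoff, up to `retries` times. */
async function withRetry<T>(
  op: () => Promise<T>,
  retries = 3,
  baseMs = 250,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await op();
    } catch (err: any) {
      const code = err?.code ?? err?.cause?.code;
      // Give up on non-retryable errors or once retries are exhausted.
      if (attempt >= retries || !RETRYABLE.has(code)) throw err;
      await new Promise((r) => setTimeout(r, baseMs * 2 ** attempt));
    }
  }
}
```

Awaiting `withRetry(() => batchUpsert(batch))` for each batch before writing `content_vectors` is what keeps SQLite and Qdrant in sync.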
package/openspec/changes/nano-brain-phase1-memory-intelligence/.openspec.yaml ADDED
@@ -0,0 +1,2 @@
+ schema: spec-driven
+ created: 2026-03-07
package/openspec/changes/nano-brain-phase1-memory-intelligence/design.md ADDED
@@ -0,0 +1,206 @@
+ # Design: Memory Intelligence Phase 1
+
+ ## Context
+
+ nano-brain currently treats all memories equally regardless of age or usage. A 6-month-old debugging note ranks the same as yesterday's architecture decision. The search pipeline (RRF fusion → top rank bonus → centrality boost → supersede demotion → position-aware blending) has no awareness of memory lifecycle. This design adds three lightweight intelligence features to the existing pipeline without introducing LLM dependencies or blocking operations.
+
+ The search scoring pipeline lives in `search.ts` and follows this flow:
+ 1. `rrfFuse` — combines BM25 (FTS) and vector search results
+ 2. `applyTopRankBonus` — boosts top-ranked results (not currently in use)
+ 3. `applyCentralityBoost` — multiplies score by `(1 + centralityWeight * centrality)`
+ 4. `applySupersedeDemotion` — multiplies score by `demotionFactor` for superseded docs
+ 5. `positionAwareBlend` — blends RRF and rerank scores based on position (top3/mid/tail)
+
+ The schema is in `store.ts` with tables: `documents`, `content`, `document_tags`, `content_vectors`, `documents_fts`. The `documents` table currently tracks `created_at`, `modified_at`, and `active` but has no access tracking.
+
+ ## Goals
+
+ **In Scope:**
+ - Track memory access patterns (count and timestamp) without performance overhead
+ - Apply time-based relevance decay to search scoring using a configurable half-life model
+ - Auto-categorize memories on write using fast heuristic rules (no LLM)
+ - Boost frequently accessed memories in search results
+ - Maintain backward compatibility with existing memories and config
+
+ **Out of Scope (Phase 2+):**
+ - LLM-based categorization or summarization
+ - Automatic memory deletion or archival
+ - User-facing UI for memory management
+ - Cross-collection memory relationships
+ - Memory consolidation or deduplication
+
+ ## Decisions
+
+ ### 1. Memory Relevance Decay
+
+ **What:** Add `access_count INTEGER DEFAULT 0` and `last_accessed_at TEXT` columns to the `documents` table. Track access on every search result returned to the user. Apply decay to search scores based on time since last access.
+
+ **Why:** Memories that haven't been accessed in months are less likely to be relevant now. A reciprocal decay curve with a configurable half-life (default 30 days) provides intuitive control: a memory is "half as relevant" after N days of no access. This is gentler than LRU eviction (which deletes) and more flexible than a fixed TTL (which is binary).
+
+ **How:**
+ - Schema migration: `ALTER TABLE documents ADD COLUMN access_count INTEGER DEFAULT 0; ALTER TABLE documents ADD COLUMN last_accessed_at TEXT;`
+ - On search result return (in `hybridSearch` after final scoring), increment `access_count` and update `last_accessed_at` for each result shown to the user
+ - Decay function: `decayScore = 1 / (1 + daysSinceAccess / halfLife)` where `daysSinceAccess = (now - last_accessed_at) / 86400000` (ms per day)
+ - Applied as a multiplier in the scoring pipeline after RRF fusion but before position-aware blending
+ - New config section in `config.yml`:
+   ```yaml
+   decay:
+     enabled: true
+     halfLife: "30d"   # parsed to days
+     boostWeight: 0.15 # how much decay affects final score
+   ```
+ - Backward compatible: existing memories get `access_count=0`, `last_accessed_at=NULL`; decay treats NULL as "never accessed" (maximum decay)
+
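The decay multiplier above can be sketched as a pure function. The function name is an assumption; the NULL-as-maximum-decay behavior follows this design section (the spec file in this change instead falls back to `created_at`, in which case `created_at` would be passed as the timestamp).

```typescript
const MS_PER_DAY = 86_400_000;

/**
 * Reciprocal decay: 1.0 when accessed today, 0.5 after one half-life.
 * NULL last_accessed_at is treated as "never accessed" (maximum decay)
 * per the design; this zeroes the multiplier, which is an assumption.
 */
function decayScore(
  lastAccessedAt: string | null,
  halfLifeDays: number,
  now = Date.now(),
): number {
  if (lastAccessedAt === null) return 0;
  // Clamp negative deltas (clock skew) to "accessed just now".
  const daysSinceAccess = Math.max(0, (now - Date.parse(lastAccessedAt)) / MS_PER_DAY);
  return 1 / (1 + daysSinceAccess / halfLifeDays);
}
```

Note this curve is reciprocal rather than truly exponential, but it preserves the half-life intuition: one half-life of inactivity halves the score.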
54
+ **Alternatives considered:**
+ - **LRU eviction:** Too aggressive. Deletes memories permanently. Users lose context.
+ - **Fixed TTL:** Too rigid. A 90-day-old architecture decision is still valuable; a 7-day-old typo fix is not.
+ - **No decay (status quo):** Loses signal. Old noise drowns out recent signal as memory grows.
+
+ **Trade-offs:**
+ - Adds 2 columns to the `documents` table (minimal storage overhead: ~16 bytes per doc)
+ - Adds 1 UPDATE per search result returned (negligible for typical result counts of 5-20)
+ - Decay calculation is pure math (no I/O, no blocking)
+
+ ### 2. Auto-Categorization on Write
+
+ **What:** When `memory_write` is called, classify content into predefined categories using keyword/pattern heuristics. Populate the existing `document_tags` table with auto-generated tags prefixed with `auto:`.
+
+ **Why:** Manual tagging is tedious and inconsistent. Heuristic categorization is instant, deterministic, and requires no external dependencies. Categories help users filter searches (e.g., "show me architecture decisions") and provide structure for future features (e.g., category-specific retention policies).
+
+ **How:**
+ - Categories: `architecture-decision`, `debugging-insight`, `tool-config`, `pattern`, `preference`, `context`, `workflow`
+ - Detection rules (keyword matching, case-insensitive):
+   - `architecture-decision`: "decided", "chose", "architecture", "design decision", "trade-off", "approach"
+   - `debugging-insight`: "error", "fix", "bug", "stack trace", "crash", "exception", "workaround"
+   - `tool-config`: "config", "setup", "install", "environment", "settings", "configuration"
+   - `pattern`: "pattern", "convention", "idiom", "best practice", "anti-pattern"
+   - `preference`: "prefer", "avoid", "like", "dislike", "style", "opinion"
+   - `context`: "context", "background", "overview", "summary", "explanation"
+   - `workflow`: "workflow", "process", "steps", "procedure", "checklist"
+ - Applied in `store.insertDocument` after content insertion
+ - Tags are prefixed with `auto:` (e.g., `auto:architecture-decision`) to distinguish them from user-provided tags
+ - Additive: does not remove user-provided tags
+ - Multiple categories can apply to a single memory
+
+ **Alternatives considered:**
+ - **LLM-based categorization:** Too slow (100-500ms per write), costs money, requires API keys. Deferred to Phase 2.
+ - **No categorization (status quo):** Memories remain unstructured. Harder to filter or prioritize.
+ - **User-only tagging:** Requires manual effort. Users forget or skip tagging.
+
+ **Trade-offs:**
+ - Heuristics are imperfect. Some memories will be miscategorized or miss categories.
+ - Keyword matching is language-dependent (assumes English). Non-English memories may not categorize well.
+ - Adds ~5-10ms to write latency (negligible for a background MCP server)
+
+ ### 3. Usage-Based Search Boosting
+
+ **What:** Integrate `access_count` and `last_accessed_at` into the hybrid search scoring pipeline. Frequently accessed memories get a configurable boost.
+
+ **Why:** Memories that are accessed repeatedly are more likely to be relevant in the future. This is a form of implicit feedback: the user's past behavior signals importance. Combined with decay, this creates a "hot/cold" memory system where frequently accessed recent memories rank highest.
+
+ **How:**
+ - New function `applyUsageBoost(results, config)` similar to `applyCentralityBoost`
+ - Formula: `usageBoost = log2(1 + access_count) * decayScore * boostWeight`
+ - `log2(1 + access_count)` provides diminishing returns (10 accesses is not 10x better than 1 access)
+ - `decayScore` is the decay multiplier from Feature 1 (so old memories don't get boosted)
+ - `boostWeight` is configurable (default 0.15)
+ - Applied in the scoring pipeline after `applyCentralityBoost`, before `applySupersedeDemotion`
+ - New config field in `SearchConfig` type (`types.ts`):
+   ```typescript
+   usage_boost_weight: number; // default 0.15
+   ```
+ - Backward compatible: memories with `access_count=0` get no boost (log2(1) = 0)
+
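The boost step above might look like the following sketch. `applyUsageBoost` is the name this design proposes; the `ScoredResult` shape is illustrative, and `decayScore` is assumed to be the Feature-1 multiplier already computed per result.

```typescript
interface ScoredResult {
  score: number;       // score after centrality boost
  accessCount: number; // documents.access_count
  decayScore: number;  // Feature-1 decay multiplier (1.0 = freshly accessed)
}

/** Additive usage boost: log2 damps outliers, decayScore gates stale docs. */
function applyUsageBoost(
  results: ScoredResult[],
  boostWeight = 0.15,
): ScoredResult[] {
  return results.map((r) => ({
    ...r,
    score: r.score + Math.log2(1 + r.accessCount) * r.decayScore * boostWeight,
  }));
}
```

With `access_count=0` the term is `log2(1) = 0`, so untouched memories score exactly as before.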
114
+ **Alternatives considered:**
+ - **Replace RRF entirely with usage-based ranking:** Too risky. RRF is proven. Usage is a signal, not the only signal.
+ - **Separate boost index:** Over-engineered. Usage data is already in the `documents` table.
+ - **Linear boost (not log):** Amplifies outliers too much. A memory accessed 100 times would dominate results.
+
+ **Trade-offs:**
+ - Reads two extra columns from the `documents` table, which the search query already joins (no new JOIN)
+ - Boost calculation is pure math (no I/O, no blocking)
+ - Cold start problem: new memories have `access_count=0` and rank lower. Mitigated by decay (new memories have no decay penalty).
+
+ ## Risks and Trade-offs
+
+ ### Performance
+ - **Risk:** Access tracking adds 1 UPDATE per search result. For 20 results, that's 20 UPDATEs.
+ - **Mitigation:** SQLite handles small UPDATEs efficiently. Batch the UPDATEs in a single transaction. Measure latency in testing.
+ - **Fallback:** If latency is unacceptable, make access tracking async (fire-and-forget).
+
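The single-transaction mitigation can collapse the per-result UPDATEs into one statement. This is a sketch: the `id` column name and the placeholder list are assumptions about nano-brain's schema and write path.

```sql
BEGIN;
-- One batched update for all results returned to the user.
UPDATE documents
SET access_count = access_count + 1,
    last_accessed_at = strftime('%Y-%m-%dT%H:%M:%fZ', 'now')
WHERE id IN (?, ?, ?); -- bind the ids of the returned results
COMMIT;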
131
+ ### Accuracy
+ - **Risk:** Heuristic categorization is imperfect. Memories may be miscategorized.
+ - **Mitigation:** Use the `auto:` prefix so users can distinguish auto-tags from manual tags. Users can remove incorrect auto-tags.
+ - **Fallback:** If categorization is too noisy, add a config flag to disable it.
+
+ ### Cold Start
+ - **Risk:** New memories have `access_count=0` and rank lower than old frequently accessed memories.
+ - **Mitigation:** Decay penalizes old memories. New memories have no decay penalty, so they start with a neutral score.
+ - **Fallback:** Add a "recency boost" in Phase 2 to explicitly favor new memories.
+
+ ### Backward Compatibility
+ - **Risk:** Existing memories have `access_count=0` and `last_accessed_at=NULL`.
+ - **Mitigation:** Decay treats NULL as "never accessed" (maximum decay). This is correct: old memories that were never accessed should rank lower.
+ - **Fallback:** Provide a migration script to backfill `last_accessed_at` with `created_at` for existing memories.
+
+ ### Configuration Complexity
+ - **Risk:** Adding `decay` and `usage_boost_weight` config fields increases surface area.
+ - **Mitigation:** Provide sensible defaults (enabled, 30d half-life, 0.15 boost weight). Document in README.
+ - **Fallback:** If users find the config overwhelming, hide advanced options behind an `--advanced` flag.
+
+ ## Implementation Notes
+
+ ### Schema Migration
+ ```sql
+ ALTER TABLE documents ADD COLUMN access_count INTEGER DEFAULT 0;
+ ALTER TABLE documents ADD COLUMN last_accessed_at TEXT;
+ CREATE INDEX IF NOT EXISTS idx_documents_access ON documents(last_accessed_at);
+ ```
+
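For existing databases, the ALTER statements only apply when the columns are missing. A minimal sketch, written as a pure function over the column names from `PRAGMA table_info(documents)` so it can plug into whatever migration pattern `store.ts` already uses; the function name is hypothetical.

```typescript
/** Given existing column names, return the ALTER statements still needed. */
function pendingDecayMigrations(existingColumns: string[]): string[] {
  const have = new Set(existingColumns);
  const stmts: string[] = [];
  if (!have.has("access_count")) {
    stmts.push("ALTER TABLE documents ADD COLUMN access_count INTEGER DEFAULT 0");
  }
  if (!have.has("last_accessed_at")) {
    stmts.push("ALTER TABLE documents ADD COLUMN last_accessed_at TEXT");
  }
  return stmts;
}
```

Fresh databases get both columns in CREATE TABLE and this returns an empty list, so the same code path serves both cases.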
160
+ ### Scoring Pipeline Order
+ 1. `rrfFuse` — combine BM25 + vector
+ 2. `applyTopRankBonus` (if enabled)
+ 3. `applyCentralityBoost` (existing)
+ 4. **`applyUsageBoost` (new)** — boost frequently accessed memories
+ 5. `applySupersedeDemotion` (existing)
+ 6. `positionAwareBlend` — blend RRF + rerank scores
+ 7. **Track access** (new) — increment `access_count`, update `last_accessed_at`
+
+ ### Config Schema
+ ```yaml
+ decay:
+   enabled: true
+   halfLife: "30d"   # supports "7d", "30d", "90d", etc.
+   boostWeight: 0.15
+
+ search:
+   usage_boost_weight: 0.15
+   # ... existing fields
+ ```
+
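The `"30d"`-style duration strings in the config above imply a small parser. This is a sketch under the assumption that only day-suffixed values are needed; the name is illustrative, and nano-brain may already have a `parseDuration` helper to reuse.

```typescript
/** Parse a "30d"-style half-life into days; warn and fall back on bad input. */
function parseHalfLifeDays(value: string, fallback = 30): number {
  const m = /^(\d+(?:\.\d+)?)d$/.exec(value.trim());
  if (!m) {
    // Per the spec, invalid values log a warning and use the default.
    console.warn(`invalid halfLife "${value}", using ${fallback}d`);
    return fallback;
  }
  return Number(m[1]);
}
```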
181
+ ### Auto-Categorization Rules
+ Implemented as a simple keyword matcher in `store.ts`:
+ ```typescript
+ function autoCategorizeTags(content: string): string[] {
+   const tags: string[] = [];
+   const lower = content.toLowerCase();
+
+   if (/\b(decided|chose|architecture|design decision|trade-?off|approach)\b/.test(lower)) {
+     tags.push('auto:architecture-decision');
+   }
+   if (/\b(error|fix|bug|stack trace|crash|exception|workaround)\b/.test(lower)) {
+     tags.push('auto:debugging-insight');
+   }
+   // ... more rules
+
+   return tags;
+ }
+ ```
+
+ ### Testing Strategy
+ - Unit tests for the decay function (edge cases: NULL, 0, negative)
+ - Unit tests for usage boost (edge cases: 0 access_count, high access_count)
+ - Unit tests for auto-categorization (each category, multi-category, no match)
+ - Integration test: write memory → search → verify access tracking
+ - Integration test: search with decay enabled vs disabled
+ - Performance test: measure search latency with 10k memories, 20 results
package/openspec/changes/nano-brain-phase1-memory-intelligence/proposal.md ADDED
@@ -0,0 +1,30 @@
+ ## Why
+
+ nano-brain stores memories indefinitely with equal weight — a 6-month-old unused debugging note ranks the same as yesterday's critical architecture decision. There is no automatic organization, no relevance scoring, and no way to distinguish signal from noise as memory accumulates over time. Competitive memory systems (Mem0, memU) achieve 26-74% higher accuracy on benchmarks partly through intelligent memory lifecycle management. Phase 1 addresses the three lowest-effort, highest-impact gaps: relevance decay, automatic categorization, and usage-based ranking.
+
+ ## What Changes
+
+ - **Memory relevance decay**: Add `access_count` and `last_accessed_at` tracking to documents. Introduce a configurable decay function that deprioritizes stale, unused memories in search results. Memories accessed frequently stay prominent; neglected ones fade gracefully without deletion.
+ - **Auto-categorization on write**: When `memory_write` is called, classify the content into predefined categories (architecture-decision, debugging-insight, tool-config, pattern, preference, context, workflow) using lightweight keyword/heuristic matching. Populate the existing `tags` field automatically. No LLM dependency for Phase 1 — keep it fast and local.
+ - **Usage-based search boosting**: Integrate access frequency and recency into the hybrid search scoring pipeline. Frequently retrieved memories get a configurable boost in RRF fusion, complementing the existing BM25 + vector + rerank pipeline.
+
+ ## Capabilities
+
+ ### New Capabilities
+ - `memory-relevance-decay`: Track memory access patterns and apply time-based relevance decay to search scoring
+ - `auto-categorization`: Automatically classify and tag memories on write using heuristic rules
+ - `usage-based-boosting`: Boost frequently accessed memories in hybrid search results
+
+ ### Modified Capabilities
+ - `search-pipeline`: Search scoring now incorporates access frequency and recency as additional ranking signals
+ - `storage-limits`: Decay metadata (`access_count`, `last_accessed_at`) added to the document schema; retention eviction can optionally prioritize low-access documents
+
+ ## Impact
+
+ - **Schema**: New columns on the `documents` table (`access_count INTEGER DEFAULT 0`, `last_accessed_at TEXT`)
+ - **Search pipeline** (`search.ts`): Additional scoring factor in RRF fusion for access-based boosting
+ - **Store** (`store.ts`): Track access on every search result retrieval; auto-tag on document insert
+ - **MCP server** (`server.ts`): `memory_write` gains auto-categorization; search tools update access tracking
+ - **Config** (`config.yml`): New `decay` section with `halfLife`, `boostWeight`, `enabled` fields
+ - **No new dependencies**: Heuristic categorization avoids LLM calls; decay is pure math
+ - **Backward compatible**: Existing memories get `access_count=0`, `last_accessed_at=NULL`; decay defaults to disabled until configured
package/openspec/changes/nano-brain-phase1-memory-intelligence/specs/auto-categorization/spec.md ADDED
@@ -0,0 +1,67 @@
+ ## ADDED Requirements
+
+ ### Requirement: Automatic tag assignment on write
+
+ When `memory_write` is called, the system SHALL analyze the content and assign category tags based on keyword/pattern heuristics. Auto-generated tags SHALL be prefixed with `auto:` (e.g., `auto:architecture-decision`).
+
+ #### Scenario: Content contains architecture decision keywords
+
+ - **WHEN** content contains "decided to use PostgreSQL"
+ - **THEN** the document is tagged with `auto:architecture-decision`
+
+ #### Scenario: Content contains debugging keywords
+
+ - **WHEN** content contains "fixed bug" and "stack trace"
+ - **THEN** the document is tagged with `auto:debugging-insight`
+
+ #### Scenario: Content with no matching patterns
+
+ - **WHEN** content does not match any category patterns
+ - **THEN** no auto tags are added to the document
+
+ ### Requirement: Category definitions
+
+ The system SHALL recognize these categories: `architecture-decision` (keywords: decided, chose, architecture, design, tradeoff, approach), `debugging-insight` (keywords: error, fix, bug, stack trace, debug, workaround), `tool-config` (keywords: config, setup, install, environment, .env), `pattern` (keywords: pattern, convention, always, never, rule), `preference` (keywords: prefer, like, dislike, favorite, default), `context` (keywords: context, background, note, remember), `workflow` (keywords: workflow, process, step, pipeline, deploy).
+
+ #### Scenario: Architecture decision content
+
+ - **WHEN** content is "We chose React over Vue for the frontend"
+ - **THEN** the document is tagged with `auto:architecture-decision`
+
+ #### Scenario: Debugging insight content
+
+ - **WHEN** content is "npm install failed, fixed by clearing cache"
+ - **THEN** the document is tagged with `auto:debugging-insight`
+
+ #### Scenario: Content matching multiple categories
+
+ - **WHEN** content matches patterns for both `architecture-decision` and `workflow`
+ - **THEN** all matching auto tags are applied to the document
+
+ ### Requirement: Additive tagging
+
+ Auto-categorization SHALL NOT remove or replace user-provided tags. Auto tags are added alongside any tags the user explicitly provides.
+
+ #### Scenario: User provides explicit tags
+
+ - **WHEN** user provides tags=["important"] and content matches debugging patterns
+ - **THEN** the final tags are ["important", "auto:debugging-insight"]
+
+ #### Scenario: User provides no tags
+
+ - **WHEN** user provides no tags and content matches categorization patterns
+ - **THEN** only auto tags are applied to the document
+
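The additive-tagging requirement above amounts to a set union that keeps user tags first. A minimal sketch; `mergeTags` is a hypothetical helper name, not nano-brain's API.

```typescript
/** Merge user tags with auto tags: user tags first, duplicates removed. */
function mergeTags(userTags: string[], autoTags: string[]): string[] {
  return [...new Set([...userTags, ...autoTags])];
}
```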
55
+ ### Requirement: Deterministic and fast
+
+ Auto-categorization SHALL NOT use LLM calls. It SHALL complete in under 5ms for typical content (under 10KB). The heuristic engine SHALL be pure keyword/regex matching.
+
+ #### Scenario: Small document categorization
+
+ - **WHEN** a 5KB markdown document is written
+ - **THEN** auto-categorization completes in under 5ms
+
+ #### Scenario: Large document categorization
+
+ - **WHEN** a 100KB document is written
+ - **THEN** auto-categorization completes in under 50ms
package/openspec/changes/nano-brain-phase1-memory-intelligence/specs/memory-relevance-decay/spec.md ADDED
@@ -0,0 +1,85 @@
+ ## ADDED Requirements
+
+ ### Requirement: Access tracking on search results
+
+ The system SHALL increment `access_count` and update `last_accessed_at` for every document returned in search results to the user. Internal pipeline queries (expansion, reranking) SHALL NOT trigger access tracking.
+
+ #### Scenario: Search returns multiple results
+
+ - **WHEN** a search returns 5 results to the user
+ - **THEN** all 5 documents have their `access_count` incremented by 1
+ - **THEN** all 5 documents have their `last_accessed_at` updated to the current timestamp
+
+ #### Scenario: Same document returned in separate searches
+
+ - **WHEN** the same document is returned in two separate user searches
+ - **THEN** the document's `access_count` is 2
+ - **THEN** the document's `last_accessed_at` reflects the most recent search timestamp
+
+ #### Scenario: Internal pipeline query does not trigger tracking
+
+ - **WHEN** a vector-only internal search occurs during the hybrid pipeline
+ - **THEN** no `access_count` increments occur for those intermediate results
+ - **THEN** only the final results returned to the user trigger access tracking
+
+ ### Requirement: Decay score computation
+
+ The system SHALL compute a decay score using the formula `1 / (1 + daysSinceAccess / halfLife)` where `daysSinceAccess` is the number of days since `last_accessed_at` and `halfLife` is configurable. Documents with NULL `last_accessed_at` SHALL use `created_at` as a fallback.
+
+ #### Scenario: Document accessed today
+
+ - **WHEN** a document was accessed today (daysSinceAccess = 0)
+ - **THEN** the decay score is approximately 1.0
+
+ #### Scenario: Document not accessed for 30 days with 30-day half-life
+
+ - **WHEN** a document has not been accessed for 30 days and `halfLife` is 30 days
+ - **THEN** the decay score is approximately 0.5
+
+ #### Scenario: Document never accessed
+
+ - **WHEN** a document has NULL `last_accessed_at`
+ - **THEN** the system uses `created_at` for the `daysSinceAccess` calculation
+ - **THEN** the decay score is computed based on the document's age
+
+ ### Requirement: Schema migration
+
+ The system SHALL add `access_count INTEGER DEFAULT 0` and `last_accessed_at TEXT DEFAULT NULL` columns to the `documents` table. Existing documents SHALL retain default values. The migration SHALL be backward compatible (no data loss).
+
+ #### Scenario: Fresh database initialization
+
+ - **WHEN** a new database is created
+ - **THEN** the `documents` table includes `access_count` and `last_accessed_at` columns from creation
+
+ #### Scenario: Existing database without decay columns
+
+ - **WHEN** an existing database does not have `access_count` or `last_accessed_at` columns
+ - **THEN** an ALTER TABLE migration adds both columns with default values
+ - **THEN** no existing data is lost
+
+ #### Scenario: Existing documents after migration
+
+ - **WHEN** the migration completes on a database with existing documents
+ - **THEN** all existing documents have `access_count` set to 0
+ - **THEN** all existing documents have `last_accessed_at` set to NULL
+
+ ### Requirement: Decay configuration
+
+ The system SHALL support a `decay` section in config.yml with `enabled` (boolean, default false), `halfLife` (duration string, default "30d"), and `boostWeight` (number 0-1, default 0.15). Invalid values SHALL log a warning and use defaults.
+
+ #### Scenario: Decay not configured
+
+ - **WHEN** config.yml has no `decay` section
+ - **THEN** decay is disabled by default
+ - **THEN** no decay scoring is applied to search results
+
+ #### Scenario: Decay enabled with custom half-life
+
+ - **WHEN** config.yml contains `decay: { enabled: true, halfLife: "7d" }`
+ - **THEN** the system uses a 7-day half-life for decay calculations
+
+ #### Scenario: Invalid half-life value
+
+ - **WHEN** config.yml contains `decay: { halfLife: "banana" }`
+ - **THEN** a warning is logged indicating the invalid value
+ - **THEN** the default half-life of 30 days is used
package/openspec/changes/nano-brain-phase1-memory-intelligence/specs/search-pipeline/spec.md ADDED
@@ -0,0 +1,23 @@
+ ## ADDED Requirements
+
+ ### Requirement: Usage-aware scoring in hybrid pipeline
+
+ The `memory_query` hybrid search pipeline SHALL incorporate usage-based scoring as an additional ranking signal. The scoring pipeline order SHALL be: RRF fusion → top-rank bonus → centrality boost → usage boost → supersede demotion → position-aware blend (if reranking enabled).
+
+ #### Scenario: Hybrid search with usage boost enabled
+
+ - **WHEN** a hybrid search is performed with usage boost enabled
+ - **THEN** search results reflect access patterns in their ranking
+ - **THEN** frequently accessed documents rank higher than identical documents with lower access counts
+
+ #### Scenario: Hybrid search with usage boost disabled
+
+ - **WHEN** a hybrid search is performed with `usage_boost_weight` set to 0
+ - **THEN** the search results are identical to the current behavior without usage boosting
+ - **THEN** no usage-based score adjustments are applied
+
+ #### Scenario: BM25-only search does not apply usage boost
+
+ - **WHEN** a BM25-only search is performed using `memory_search`
+ - **THEN** no usage boost is applied to the results
+ - **THEN** only the hybrid pipeline (`memory_query`) incorporates usage-based scoring
package/openspec/changes/nano-brain-phase1-memory-intelligence/specs/storage-limits/spec.md ADDED
@@ -0,0 +1,23 @@
+ ## ADDED Requirements
+
+ ### Requirement: Access-aware retention eviction
+
+ When performing size-based eviction (storage exceeds maxSize), the system SHALL optionally consider `access_count` when selecting documents to evict. Documents with lower `access_count` SHALL be evicted before documents with higher `access_count`, within the same age tier. This behavior SHALL be enabled when `decay.enabled` is true in config.
+
+ #### Scenario: Two documents same age with different access counts
+
+ - **WHEN** two documents have the same age, one with `access_count` of 10 and one with `access_count` of 0
+ - **THEN** the document with `access_count` of 0 is evicted first
+ - **THEN** the document with `access_count` of 10 is retained
+
+ #### Scenario: Decay disabled uses age-only eviction
+
+ - **WHEN** `decay.enabled` is false in config
+ - **THEN** eviction uses age-only ordering (current behavior)
+ - **THEN** `access_count` is not considered in eviction decisions
+
+ #### Scenario: All documents have zero access count
+
+ - **WHEN** all documents have `access_count` of 0
+ - **THEN** the system falls back to age-based eviction
+ - **THEN** the oldest documents are evicted first
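The ordering these scenarios imply can be expressed as a single candidate query. A sketch under stated assumptions: the `id` and `active` column names and the LIMIT are placeholders, not nano-brain's actual eviction code.

```sql
-- Eviction candidates when decay.enabled is true: least-accessed first,
-- oldest first among documents with the same access count. When every
-- access_count is 0, this degenerates to age-only ordering.
SELECT id
FROM documents
WHERE active = 1
ORDER BY access_count ASC, created_at ASC
LIMIT 50; -- placeholder: however many docs must be evicted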
@@ -0,0 +1,60 @@
1
+ ## ADDED Requirements
2
+
3
+ ### Requirement: Usage boost in search scoring
4
+
5
+ The hybrid search pipeline SHALL apply a usage-based boost to results using the formula `usageBoost = log2(1 + access_count) * decayScore * boostWeight`. The boost SHALL be applied as an additive score adjustment.
6
+
7
+ #### Scenario: Document with no access history
8
+
9
+ - **WHEN** a document has `access_count` of 0
10
+ - **THEN** the usage boost is 0 (since log2(1) = 0)
11
+ - **THEN** the document's score is not affected by usage boosting
12
+
13
+ #### Scenario: Document with moderate access and recent activity
14
+
15
+ - **WHEN** a document has `access_count` of 7 and was recently accessed
16
+ - **THEN** a moderate usage boost is applied (log2(8) * decayScore * boostWeight)
17
+ - **THEN** the document ranks higher than identical documents with lower access counts
18
+
19
+ #### Scenario: Document with high access but stale
20
+
21
+ - **WHEN** a document has `access_count` of 100 but has not been accessed recently
22
+ - **THEN** the usage boost is reduced by the decay score
23
+ - **THEN** the boost is lower than a recently accessed document with the same access count
+
+ ### Requirement: Boost pipeline position
+
+ The usage boost SHALL be applied after centrality boost and before supersede demotion in the search scoring pipeline.
+
+ #### Scenario: Result with high centrality and high usage
+
+ - **WHEN** a document has both high centrality and high usage scores
+ - **THEN** both the centrality boost and usage boost are applied
+ - **THEN** the boosts compound to increase the document's final score
+
+ #### Scenario: Superseded document with high usage
+
+ - **WHEN** a superseded document has high usage
+ - **THEN** the usage boost is applied first
+ - **THEN** the supersede demotion is applied afterward, reducing the final score
+
+ ### Requirement: Configuration
+
+ The SearchConfig SHALL include `usage_boost_weight` (number, default 0.15). Setting it to 0 disables usage boosting while leaving access tracking enabled.
+
+ #### Scenario: Usage boost disabled
+
+ - **WHEN** `usage_boost_weight` is set to 0
+ - **THEN** no usage boost is applied to search results
+ - **THEN** access tracking still occurs for all returned documents
+
+ #### Scenario: Stronger usage signal
+
+ - **WHEN** `usage_boost_weight` is set to 0.3
+ - **THEN** usage-based boosting has a stronger effect on search rankings
+
+ #### Scenario: Negative boost weight
+
+ - **WHEN** `usage_boost_weight` is set to a negative value
+ - **THEN** a warning is logged indicating the invalid value
+ - **THEN** the default value of 0.15 is used
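The validation behavior in the two scenarios above can be sketched as a small resolver; the function name and injected `warn` callback are assumptions for the example, not the package's API:

```typescript
const DEFAULT_USAGE_BOOST_WEIGHT = 0.15;

// Resolve usage_boost_weight from config: a negative value logs a warning
// and falls back to the default; 0 is valid and disables the boost.
function resolveUsageBoostWeight(
  raw: number | undefined,
  warn: (msg: string) => void
): number {
  if (raw === undefined) return DEFAULT_USAGE_BOOST_WEIGHT;
  if (raw < 0) {
    warn(`invalid usage_boost_weight ${raw}; using default ${DEFAULT_USAGE_BOOST_WEIGHT}`);
    return DEFAULT_USAGE_BOOST_WEIGHT;
  }
  return raw;
}
```

Treating 0 as valid rather than falsy is the important detail here, since 0 is the documented way to disable the boost.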
@@ -0,0 +1,49 @@
+ # Tasks: Memory Intelligence Phase 1
+
+ ## 1. Schema & Migration
+ - [ ] 1.1 Add `access_count INTEGER DEFAULT 0` and `last_accessed_at TEXT` columns to the documents table CREATE TABLE statement in `src/store.ts`
+ - [ ] 1.2 Add migration logic to detect and ALTER TABLE for existing databases (follow the existing migration pattern in store.ts around lines 196-230)
+ - [ ] 1.3 Add `access_count` and `lastAccessedAt` fields to the `Document` and `SearchResult` interfaces in `src/types.ts`
+ - [ ] 1.4 Add an index on the `last_accessed_at` column: `CREATE INDEX IF NOT EXISTS idx_documents_access ON documents(last_accessed_at)`
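The column-detection step in tasks 1.1–1.4 might look like the following sketch, which builds the DDL statements from a list of existing column names. The helper name is hypothetical; it only illustrates the idempotent ALTER TABLE pattern the tasks describe:

```typescript
// Hypothetical migration helper: emit ALTER TABLE statements only for
// columns that are missing, so the migration is safe to re-run. The index
// uses IF NOT EXISTS and is always included.
function migrationStatements(existingColumns: string[]): string[] {
  const stmts: string[] = [];
  if (!existingColumns.includes("access_count")) {
    stmts.push("ALTER TABLE documents ADD COLUMN access_count INTEGER DEFAULT 0");
  }
  if (!existingColumns.includes("last_accessed_at")) {
    stmts.push("ALTER TABLE documents ADD COLUMN last_accessed_at TEXT");
  }
  stmts.push(
    "CREATE INDEX IF NOT EXISTS idx_documents_access ON documents(last_accessed_at)"
  );
  return stmts;
}
```

In SQLite the existing column list would typically come from `PRAGMA table_info(documents)`.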
+
+ ## 2. Decay Configuration
+ - [ ] 2.1 Add a `decay` section to the `CollectionConfig` interface in `src/types.ts`: `{ enabled: boolean, halfLife: string, boostWeight: number }`
+ - [ ] 2.2 Add decay config parsing with duration-string support and validation (reuse the existing parseDuration pattern from storage.ts if available, or add a new one)
+ - [ ] 2.3 Add a `usage_boost_weight` field to the `SearchConfig` interface and `DEFAULT_SEARCH_CONFIG` in `src/types.ts` (default 0.15)
+ - [ ] 2.4 Add config validation for `boostWeight` (0-1 range) and `usage_boost_weight` (warn on negative, use default)
+
+ ## 3. Auto-Categorization Engine
+ - [ ] 3.1 Create a new file `src/categorizer.ts` with a `categorize(content: string): string[]` function — pure keyword/regex matching that returns an array of category strings
+ - [ ] 3.2 Implement category rules: architecture-decision, debugging-insight, tool-config, pattern, preference, context, workflow, with keyword lists per the spec
+ - [ ] 3.3 Add the `auto:` prefix to all auto-generated tags
+ - [ ] 3.4 Wire the categorizer into the `memory_write` handler in `src/server.ts` — merge auto tags with user-provided tags before inserting into document_tags
+ - [ ] 3.5 Add unit tests for the categorizer: test each category detection, multi-category, no-match, and performance (<5ms for 10KB)
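A minimal sketch of the `categorize` shape from tasks 3.1–3.3. The keyword lists here are placeholders (the real lists are defined in the spec), and only three of the seven categories are shown:

```typescript
// Illustrative keyword rules; the authoritative per-category keyword lists
// live in the auto-categorization spec, not here.
const CATEGORY_RULES: Record<string, RegExp> = {
  "architecture-decision": /\b(decided|architecture|trade-?off)\b/i,
  "debugging-insight": /\b(bug|root cause|fixed)\b/i,
  "tool-config": /\b(config|setting|flag)\b/i,
};

// Pure keyword/regex categorization. A document can match several
// categories; every returned tag carries the `auto:` prefix so
// auto-generated tags are distinguishable from user-provided ones.
function categorize(content: string): string[] {
  return Object.entries(CATEGORY_RULES)
    .filter(([, re]) => re.test(content))
    .map(([name]) => `auto:${name}`);
}
```

Being pure and regex-based keeps the function trivially unit-testable and well within the <5ms budget from task 3.5.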
+
+ ## 4. Access Tracking
+ - [ ] 4.1 Add a `trackAccess(docIds: number[])` method to the Store interface and implement it in `src/store.ts` — batch UPDATE access_count = access_count + 1, last_accessed_at = datetime('now')
+ - [ ] 4.2 Wire access tracking into the MCP search tool handlers in `src/server.ts` — call trackAccess after returning results for memory_search, memory_vsearch, and memory_query
+ - [ ] 4.3 Ensure internal pipeline queries (expansion, reranking) do NOT trigger access tracking
+ - [ ] 4.4 Add tests for access tracking (verify the increment; verify internal queries don't track)
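One way to realize the batch UPDATE from task 4.1 is to build a single parameterized statement for all ids; this sketch only assembles the SQL and is not the package's implementation:

```typescript
// Build the batched access-tracking statement: one UPDATE with an IN list
// of parameter placeholders, so a whole result page is tracked in a single
// round trip. datetime('now') follows the SQLite convention used for the
// other timestamp columns.
function trackAccessSQL(docIds: number[]): { sql: string; params: number[] } {
  const placeholders = docIds.map(() => "?").join(", ");
  return {
    sql:
      "UPDATE documents SET access_count = access_count + 1, " +
      `last_accessed_at = datetime('now') WHERE id IN (${placeholders})`,
    params: docIds,
  };
}
```

Because only the MCP tool handlers call this path (task 4.2), internal expansion and reranking queries never touch the counters.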
+
+ ## 5. Decay Score Computation
+ - [ ] 5.1 Add a `computeDecayScore(lastAccessedAt: string | null, createdAt: string, halfLifeDays: number): number` function in `src/search.ts`
+ - [ ] 5.2 Implement the decay formula `1 / (1 + daysSinceAccess / halfLife)`, where daysSinceAccess uses `lastAccessedAt`, falling back to `createdAt` when NULL
+ - [ ] 5.3 Add tests for decay score computation (edge cases: NULL last_accessed_at, zero days, large daysSinceAccess)
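Tasks 5.1–5.2 can be sketched directly from the signature they specify; the injectable `now` parameter is an addition for testability, not part of the specified signature:

```typescript
// Decay score: 1 / (1 + daysSinceAccess / halfLife). A never-accessed
// document (NULL last_accessed_at) falls back to created_at, so fresh
// documents start near 1.0 and the score halves after one half-life.
function computeDecayScore(
  lastAccessedAt: string | null,
  createdAt: string,
  halfLifeDays: number,
  now: Date = new Date()
): number {
  const ref = new Date(lastAccessedAt ?? createdAt);
  const days = Math.max(0, (now.getTime() - ref.getTime()) / 86_400_000);
  return 1 / (1 + days / halfLifeDays);
}
```

The hyperbolic form decays more gently than an exponential: the score is 0.5 at one half-life, 1/3 at two, and never reaches zero, so very old memories remain findable.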
+
+ ## 6. Usage-Based Search Boosting
+ - [ ] 6.1 Add an `applyUsageBoost(results: SearchResult[], config: { usageBoostWeight: number, decayHalfLifeDays: number }): SearchResult[]` function in `src/search.ts`
+ - [ ] 6.2 Implement the boost formula `log2(1 + access_count) * decayScore * boostWeight`, where decayScore = `1 / (1 + daysSinceAccess / halfLife)`
+ - [ ] 6.3 Ensure access_count and last_accessed_at are loaded in search result queries (update the SQL in store.ts searchFTS and searchVec)
+ - [ ] 6.4 Wire applyUsageBoost into the hybrid search pipeline in `src/search.ts` at the correct position: after applyCentralityBoost, before applySupersedeDemotion
+ - [ ] 6.5 Add tests for usage boost integration in the search pipeline (verify pipeline order; verify weight=0 disables the boost)
+
+ ## 7. Storage Integration
+ - [ ] 7.1 Update size-based eviction in `src/storage.ts` to sort by access_count (ascending) within the same age tier when decay.enabled is true
+ - [ ] 7.2 Ensure eviction falls back to age-only ordering when decay.enabled is false or all access counts are 0
+ - [ ] 7.3 Add tests for access-aware eviction (same age with different access counts, decay disabled, all-zero access counts)
+
+ ## 8. Testing & Validation
+ - [ ] 8.1 Add tests for schema migration (fresh DB, existing DB without the new columns)
+ - [ ] 8.2 Run the full test suite and verify no regressions
+ - [ ] 8.3 Manual smoke test: write a memory, search for it multiple times, verify access_count increments and the boost affects ranking
+ - [ ] 8.4 Performance test: measure search latency with 10k memories and 20 results (verify access-tracking overhead is acceptable)
@@ -0,0 +1,2 @@
+ schema: spec-driven
+ created: 2026-03-07