keepsake-memory 1.0.0__tar.gz

@@ -0,0 +1,21 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2026 j-zly
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.
@@ -0,0 +1 @@
1
+ include src/fragmented_memory/py.typed
@@ -0,0 +1,424 @@
1
+ Metadata-Version: 2.4
2
+ Name: keepsake-memory
3
+ Version: 1.0.0
4
+ Summary: Keepsake — full-entry memory system for Hermes Agent. On-demand storage of complete memories with semantic search.
5
+ Author: j-zly
6
+ License: MIT
7
+ Keywords: hermes-agent,memory,rag,redisearch,llm
8
+ Classifier: Development Status :: 4 - Beta
9
+ Classifier: Intended Audience :: Developers
10
+ Classifier: License :: OSI Approved :: MIT License
11
+ Classifier: Programming Language :: Python :: 3
12
+ Classifier: Programming Language :: Python :: 3.10
13
+ Classifier: Programming Language :: Python :: 3.11
14
+ Classifier: Programming Language :: Python :: 3.12
15
+ Classifier: Programming Language :: Python :: 3.13
16
+ Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
17
+ Requires-Python: >=3.10
18
+ Description-Content-Type: text/markdown
19
+ License-File: LICENSE
20
+ Requires-Dist: redis>=5.0
21
+ Requires-Dist: jieba>=0.42
22
+ Dynamic: license-file
23
+
24
+ # Keepsake — Memory Plugin for Hermes Agent
25
+
26
+ Keepsake automatically retrieves relevant memories and injects them into the conversation context on every user message.
27
+
28
+ ```text
29
+ User: "How did we set up that React project structure last time?"
30
+
31
+ Keepsake System ← Redis + RediSearch
32
+
33
+ ┌─────────────────────────────────────┐
34
+ │ [1] User prefers TypeScript + Vite │
35
+ │ [2] Previous projects used pinia state management │
36
+ │ [3] Backend suggested using .NET 10 implementation │
37
+ └─────────────────────────────────────┘
38
+
39
+ Model directly uses memories to answer
40
+ ```
41
+
42
+ ## Features
43
+
44
+ - **Full Entry Storage** — stores complete text as-is, no semantic splitting
45
+ - **BM25 Full-Text Search** — works out of the box with no external API
46
+ - **Optional Vector Search** — KNN via RediSearch (OpenAI / DashScope embedder)
47
+ - **Time Decay** — newer entries rank higher (60-day half-life configurable)
48
+ - **Sentiment Weighting** — emotional entries get priority
49
+ - **User Feedback** — mark entries useful/useless to improve ranking
50
+ - **Hot Topic Boost** — frequently discussed topics rank higher
51
+ - **Entity Extraction** — auto-tags entries with entities (people, places, crypto tickers, domain terms) at store time; searched alongside content text for higher recall
52
+ - **Entity Co-occurrence** — auto-track which entities appear together, expand search to co-occurring entities for associative recall ("Python" → also finds entries mentioning "Django")
53
+ - **Domain Dictionary** — jieba user dictionary auto-generated from corpus + synonym table, loaded on `/new` for better Chinese tokenization
54
+ - **Workflow Lock** — set `keepsake:workflow_lock` in Redis to globally disable memory retrieval (e.g. during automated workflows)
55
+ - **Skip Patterns** — define skip lists (via file) to avoid searching on trivial queries like "ok", "got it"
56
+ - **On-Demand Storage** — only `memory(action='add')` stores data; no automatic per-turn archiving
57
+ - **Search-Time Expiry** — `invalid_at` field in index: set a timestamp and the entry is filtered out at search time (no data loss, can be reverted)
58
+ - **Auto Maintenance** — consolidation (keyword clustering + LLM summarization) + selective forgetting (multi-dimension low-value detection) run every 2h to keep storage tidy
59
+
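The entity co-occurrence feature can be pictured as simple pair counting. The sketch below illustrates the general technique and is not Keepsake's actual code: `track_cooccurrence` and `expand_entities` are hypothetical names, and an in-memory `Counter` stands in for whatever Redis structure the plugin uses. The `top_n` and `min_count` parameters mirror the `entity_cooc_top_n` and `entity_cooc_min_count` settings.

```python
from collections import Counter
from itertools import combinations


def track_cooccurrence(counts: Counter, entities: list[str]) -> None:
    """Count every unordered pair of distinct entities found in one entry."""
    for a, b in combinations(sorted(set(entities)), 2):
        counts[(a, b)] += 1


def expand_entities(counts: Counter, entity: str,
                    top_n: int = 3, min_count: int = 2) -> list[str]:
    """Return up to top_n partners seen with `entity` at least min_count times."""
    partners = Counter()
    for (a, b), n in counts.items():
        if n < min_count:
            continue
        if a == entity:
            partners[b] = n
        elif b == entity:
            partners[a] = n
    return [name for name, _ in partners.most_common(top_n)]


counts = Counter()
track_cooccurrence(counts, ["Python", "Django"])
track_cooccurrence(counts, ["Python", "Django", "Redis"])
track_cooccurrence(counts, ["Python", "Flask"])
expanded = expand_entities(counts, "Python")
print(expanded)
```

Only "Django" clears the `min_count` threshold here, so a search for "Python" would be widened to include it, while "Redis" and "Flask" (seen together with "Python" once each) are ignored.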
60
+ ## Design Philosophy: Clean Memory for LLMs
61
+
62
+ Keepsake stores **full, self-contained entries**, not fragmented conversation snippets. The key insight is that LLMs need complete context to make use of stored information. A fragment like "prefers TypeScript + Vite" without its surrounding context is useless; the full entry "User prefers TypeScript + Vite for frontend projects" is immediately actionable.
63
+
64
+ | Mechanism | Implementation |
65
+ |-----------|---------------|
66
+ | Complete Context | Stores full text entries, no splitting |
67
+ | Forgetting Curve | Time decay (60-day half-life) — old memories fade naturally |
68
+ | Emotion Deepens Memory | Emotional weight boost — intense moments stick |
69
+ | Repetition Reinforces | Attention tracking + hot topic scoring |
70
+ | Use It or Lose It | Feedback reinforcement (keepsake_feedback) |
71
+ | Association & Analogy | Synonym discovery (Jaccard co-occurrence statistics) — "deploy" ↔ "release" |
72
+ | Entity Association | Entity co-occurrence tracking — entries mentioning "BTC" also recall "halving" without being synonyms |
73
+ | Entity Tagging | Like the brain tagging memories with people/places/things — auto-extracted entities searched alongside content |
74
+ | On-Demand Storage | No automatic archiving; only saves when explicitly told to (memory tool) |
75
+ | Sleep Consolidation | Background maintenance every 2h: keyword-based clustering + LLM summarization |
76
+ | Context Isolation | agent_id tagging — different identities, separate memories |
77
+ | Fuzzy but Enough | BM25 full-text search — doesn't need an exact match to recall |
78
+
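As a rough illustration of the "Jaccard co-occurrence statistics" row, the sketch below scores word pairs by the overlap of the entries they appear in. It is an assumption about the general technique, not the plugin's implementation: `jaccard_synonyms` is a hypothetical function, and its three parameters are modelled on `synonym_min_word_freq`, `synonym_jaccard_threshold`, and `synonym_min_co_occurrence` (with smaller values so the toy corpus produces a result).

```python
from collections import defaultdict


def jaccard_synonyms(entries, min_word_freq=2, threshold=0.5, min_co_occurrence=2):
    """Pair up words whose sets of containing entries overlap strongly."""
    postings = defaultdict(set)  # word -> ids of the entries that contain it
    for entry_id, words in enumerate(entries):
        for word in set(words):
            postings[word].add(entry_id)

    # Only words that appear in enough entries are considered.
    frequent = sorted(w for w, ids in postings.items() if len(ids) >= min_word_freq)

    pairs = []
    for i, a in enumerate(frequent):
        for b in frequent[i + 1:]:
            shared = len(postings[a] & postings[b])
            if shared < min_co_occurrence:
                continue
            score = shared / len(postings[a] | postings[b])  # Jaccard index
            if score >= threshold:
                pairs.append((a, b, score))
    return pairs


entries = [
    ["deploy", "release", "server"],
    ["deploy", "release"],
    ["deploy", "release", "docs"],
    ["server", "docs"],
]
pairs = jaccard_synonyms(entries)
print(pairs)
```

In this toy corpus "deploy" and "release" always appear in the same entries, so their Jaccard index is 1.0; every other pair shares at most one entry and is dropped by the co-occurrence minimum.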
79
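+ In the default BM25-only mode there is no vector database, no embedding API call, and no LLM inference on the retrieval path. Ranking relies on **pure statistical methods** running on Redis + RediSearch, loosely modelled on how human memory works: frequency, recency, emotional salience, association, and feedback. Embeddings come into play only when the optional vector search is enabled, and LLM summarization only during background consolidation.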
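
The "forgetting curve" can be sketched with a standard exponential half-life formula. This is an assumption about the shape of the decay, since the README only states the half-life; `time_decay` is a hypothetical helper, and `half_life_days` corresponds to the `decay_half_days` setting.

```python
def time_decay(age_days: float, half_life_days: float = 60.0) -> float:
    """Exponential decay: the weight halves every half_life_days."""
    return 0.5 ** (max(age_days, 0.0) / half_life_days)


for age in (0, 60, 120):
    print(age, time_decay(age))
```

Under this formula a brand-new entry keeps its full weight, a 60-day-old entry keeps half of it, and a 120-day-old entry keeps a quarter.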
80
+
81
+ ## Requirements
82
+
83
+ - **Python 3.10+**
84
+ - **Hermes Agent 0.12+** — provides `MemoryProvider` interface
85
+ - **Redis 7+** — with RediSearch module (v2.6+)
86
+ - **jieba** — Chinese tokenization (auto-installed)
87
+ - **Embedding API** (optional) — OpenAI / DashScope / any compatible `/v1/embeddings` service
88
+
89
+ ## Installation
90
+
91
+ ```bash
92
+ pip install keepsake-memory
93
+ ```
94
+
95
+ Or install directly from GitHub:
96
+
97
+ ```bash
98
+ pip install git+https://github.com/j-zly/keepsake.git
99
+ ```
100
+
101
+ ## Configuration
102
+
103
+ Configuration precedence (high to low): **Environment variables > JSON config file > config.yaml inline > defaults**
104
+
105
+ ### 1. Configuration Methods
106
+
107
+ There are three ways to configure Keepsake, listed in order of priority:
108
+
109
+ 1. **Environment Variables** (Highest precedence)
110
+ Set environment variables like `KEEPSAKE_REDIS_HOST`, `KEEPSAKE_REDIS_PASSWORD`, etc.
111
+
112
+ 2. **JSON Config File** (`~/.config/keepsake/config.json`)
113
+ A complete JSON configuration file for all settings.
114
+
115
+ 3. **Code Defaults** (Lowest precedence)
116
+ Default values defined in the code.
117
+
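The precedence rules above can be sketched as a layered merge. This is a minimal illustration, not the plugin's loader: `load_config` and the trimmed `DEFAULTS` dictionary are hypothetical, and only the `KEEPSAKE_` prefix and the three key names come from this document.

```python
import json
import os
from pathlib import Path

# A small subset of the documented defaults, for illustration only.
DEFAULTS = {"redis_host": "127.0.0.1", "redis_port": 6379, "top_k": 5}


def load_config(path, environ=os.environ):
    """Merge settings: code defaults < JSON file < KEEPSAKE_* environment variables."""
    config = dict(DEFAULTS)

    file = Path(path).expanduser()
    if file.is_file():
        config.update(json.loads(file.read_text(encoding="utf-8")))

    for key, default in DEFAULTS.items():
        raw = environ.get(f"KEEPSAKE_{key.upper()}")
        if raw is not None:
            # Coerce the string to the default's type (fine for str/int/float;
            # booleans would need explicit parsing).
            config[key] = type(default)(raw)
    return config


# The file does not exist here, so only defaults and the fake environment apply.
example = load_config("keepsake-missing-example.json",
                      {"KEEPSAKE_TOP_K": "8", "KEEPSAKE_REDIS_HOST": "redis.internal"})
print(example)
```

Note that `json.loads` rejects `//` comments, so this sketch expects a comment-free JSON file.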
118
+ ### 2. Complete Configuration Example
119
+
120
+ Here's a comprehensive example of the configuration file `~/.config/keepsake/config.json` with all available options:
121
+
122
+ ```json
123
+ {
124
+ // Redis connection
125
+ "redis_host": "127.0.0.1",
126
+ "redis_port": 6379,
127
+ "redis_password": "",
128
+
129
+ // Search settings
130
+ "top_k": 5,
131
+ "candidate_k": 10,
132
+ "bm25_limit": 10,
133
+ "tag_filter": "",
134
+
135
+ // Skip patterns
136
+ "skip_min_length": 2,
137
+ "skip_patterns_file": "~/.config/keepsake/skip_patterns.txt",
138
+
139
+ // Time decay
140
+ "decay_half_days": 60,
141
+ "hot_topic_decay_half_days": 30,
142
+
143
+ // Ranking weights
144
+ "sentiment_boost_positive": 1.5,
145
+ "sentiment_boost_negative": 1.3,
146
+ "feedback_positive_boost": 1.3,
147
+ "feedback_negative_penalty": 0.5,
148
+ "hot_topic_boost": 1.2,
149
+
150
+ // Attention
151
+ "attention_boost_max": 1.5,
152
+ "attention_base_increment": 2.0,
153
+ "attention_emotion_factor": 1.5,
154
+
155
+ // Embedding (optional)
156
+ "embedder": {
157
+ "provider": "dashscope",
158
+ "api_key": "sk-xxx",
159
+ "base_url": "https://dashscope.aliyuncs.com/compatible-mode/v1",
160
+ "model": "text-embedding-v2"
161
+ },
162
+
163
+ // Auto maintenance
164
+ "consolidate_min_group": 2,
165
+ "consolidate_max_age_hours": 72,
166
+ "forget_max_age_days": 30,
167
+ "forget_dry_run": false,
168
+
169
+ // Agent isolation
170
+ "agent_id": "main-brain",
171
+ "is_primary": true,
172
+
173
+ // Synonym discovery
174
+ "synonym_min_word_freq": 10,
175
+ "synonym_jaccard_threshold": 0.5,
176
+ "synonym_min_co_occurrence": 3,
177
+
178
+ // Entity co-occurrence
179
+ "entity_cooc_top_n": 3,
180
+ "entity_cooc_min_count": 2,
181
+
182
+ // Emotion intensity factor
183
+ "emotion_intensity_factor": 0.4
184
+ }
185
+ ```
186
+
187
+ > Note: leave `redis_password` empty when Redis requires no authentication; when a password is set, the AUTH command is sent automatically.
188
+
189
+ ### 3. Environment Variables Reference
190
+
191
+ | Environment Variable | Corresponding Config Item | Description |
192
+ |----------------------|----------------------------|-------------|
193
+ | `KEEPSAKE_REDIS_HOST` | `redis_host` | Redis server host |
194
+ | `KEEPSAKE_REDIS_PORT` | `redis_port` | Redis server port |
195
+ | `KEEPSAKE_REDIS_PASSWORD` | `redis_password` | Redis password for authentication |
196
+ | `KEEPSAKE_TOP_K` | `top_k` | Number of final entries returned |
197
+ | `KEEPSAKE_CANDIDATE_K` | `candidate_k` | Candidate entries count (for KNN) |
198
+ | `KEEPSAKE_BM25_LIMIT` | `bm25_limit` | BM25 search candidate count |
199
+ | `KEEPSAKE_TAG_FILTER` | `tag_filter` | Tag filtering (comma-separated) |
200
+ | `KEEPSAKE_DECAY_HALF_DAYS` | `decay_half_days` | Time decay half-life (days) |
201
+ | `KEEPSAKE_HOT_TOPIC_DECAY_HALF_DAYS` | `hot_topic_decay_half_days` | Hot topic time decay half-life (days) |
202
+ | `KEEPSAKE_EMBED_CACHE_TTL` | `embed_cache_ttl` | Embedding cache TTL (seconds) |
203
+ | `KEEPSAKE_EMBEDDER` | `embedder.provider` | Embedding provider (`openai`, `dashscope`) |
204
+ | `KEEPSAKE_EMBEDDER_URL` | `embedder.base_url` | Embedding API endpoint |
205
+ | `KEEPSAKE_EMBEDDER_MODEL` | `embedder.model` | Embedding model name |
206
+ | `KEEPSAKE_CONSOLIDATE_MIN_GROUP` | `consolidate_min_group` | Minimum entries to trigger consolidation |
207
+ | `KEEPSAKE_CONSOLIDATE_MAX_AGE_HOURS` | `consolidate_max_age_hours` | Minimum age (hours) before entries can be consolidated |
208
+ | `KEEPSAKE_FORGET_MAX_AGE_DAYS` | `forget_max_age_days` | Number of days before entries might be forgotten |
209
+ | `KEEPSAKE_FORGET_DRY_RUN` | `forget_dry_run` | Safe mode: `true` = count only, `false` = actually delete |
210
+ | `KEEPSAKE_EMOTION_INTENSITY_FACTOR` | `emotion_intensity_factor` | Emotion intensity → weight coefficient (0=disabled, 1=max) |
211
+
212
+ > Note: `KEEPSAKE_REDIS_PASSWORD` may be left empty (no authentication); when set, it is sent with the AUTH command.
213
+ > Note: Changes to config.json take effect immediately without restarting (just send `/new`).
214
+
215
+ ### 4. Create Redis Index (first-time usage)
216
+
217
+ The index is created automatically by `ensure_index()`. To create it manually, run:
218
+
219
+ ```bash
220
+ redis-cli FT.CREATE idx:memories ON HASH PREFIX 1 "memory:frag:" SCHEMA \
221
+ content TEXT WEIGHT 1 \
222
+ tags TAG SEPARATOR "," \
223
+ category TAG SEPARATOR "," \
224
+ source TEXT WEIGHT 1 \
225
+ created TEXT WEIGHT 0 \
226
+ entry_type TAG SEPARATOR "," \
227
+ invalid_at TAG SEPARATOR "," \
228
+ entities TAG SEPARATOR "," \
229
+ embed_bin VECTOR FLAT 6 TYPE FLOAT32 DIM 1536 DISTANCE_METRIC COSINE
230
+ ```
231
+
232
+ > `DIM` is adjusted automatically to match the embedding model in use; the default is 1536.
233
+ > For Docker: `docker run -d --name redis-stack -p 6379:6379 redis/redis-stack:latest`
234
+
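The `embed_bin` field is declared as a `FLOAT32` vector, and RediSearch reads vector fields in hashes as raw bytes. The sketch below shows one way to serialize an embedding for such a field. It is an assumption about how a client might prepare the value, not a description of Keepsake's code: `pack_embedding` and `unpack_embedding` are hypothetical helpers, and little-endian byte order is assumed, as on common x86 and ARM hosts.

```python
import struct


def pack_embedding(vector, dim=1536):
    """Serialize floats as little-endian float32 bytes, 4 bytes per dimension."""
    if len(vector) != dim:
        raise ValueError(f"expected {dim} dimensions, got {len(vector)}")
    return struct.pack(f"<{dim}f", *vector)


def unpack_embedding(blob):
    """Inverse of pack_embedding: bytes back to a list of floats."""
    return list(struct.unpack(f"<{len(blob) // 4}f", blob))


blob = pack_embedding([0.5, -1.0, 2.0], dim=3)
print(len(blob), unpack_embedding(blob))
```

The length check matters because a blob whose size does not match the index `DIM` cannot be indexed as a vector for that field.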
235
+ ### 5. Hermes Configuration
236
+
237
+ Enable in `~/.hermes/config.yaml`:
238
+
239
+ ```yaml
240
+ memory:
241
+ provider: keepsake
242
+ ```
243
+
244
+ If `embedder` is not configured, Keepsake runs in BM25 full-text search mode only.
245
+
246
+ Environment variables are also supported and take the highest precedence:
247
+
248
+ ```bash
249
+ export KEEPSAKE_REDIS_HOST=127.0.0.1
250
+ export KEEPSAKE_REDIS_PORT=6379
251
+ export KEEPSAKE_TOP_K=5
252
+ export KEEPSAKE_EMBEDDER=dashscope
253
+ export KEEPSAKE_EMBEDDER_MODEL=text-embedding-v2
254
+ export OPENAI_API_KEY=sk-xxx # embedder API key
255
+ ```
256
+
257
+ ### 6. Workflow Lock
258
+
259
+ Temporarily disable memory retrieval during automated workflows (like batch processing):
260
+
261
+ ```bash
262
+ # Lock (3600s TTL)
263
+ redis-cli SET keepsake:workflow_lock 1 EX 3600
264
+
265
+ # Unlock
266
+ redis-cli DEL keepsake:workflow_lock
267
+ ```
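
From Python, the same lock can be checked with a key-existence test. The sketch below is illustrative only: `retrieval_allowed` is a hypothetical helper, and `FakeRedis` is a tiny in-memory stand-in so the example runs without a server. A real `redis.Redis` client exposes `set`, `delete`, and `exists` with the call shapes used here.

```python
LOCK_KEY = "keepsake:workflow_lock"


def retrieval_allowed(client) -> bool:
    """Return False while the workflow lock key exists."""
    return not client.exists(LOCK_KEY)


class FakeRedis:
    """In-memory stand-in for a Redis client (TTL handling is not simulated)."""

    def __init__(self):
        self.keys = set()

    def set(self, name, value, ex=None):
        self.keys.add(name)
        return True

    def delete(self, *names):
        removed = len(self.keys & set(names))
        self.keys -= set(names)
        return removed

    def exists(self, *names):
        return len(self.keys & set(names))


client = FakeRedis()
states = [retrieval_allowed(client)]
client.set(LOCK_KEY, 1, ex=3600)   # lock for one hour
states.append(retrieval_allowed(client))
client.delete(LOCK_KEY)            # unlock
states.append(retrieval_allowed(client))
print(states)
```

Setting the key with an expiry, as in the shell example, means a crashed workflow cannot leave retrieval disabled indefinitely.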
268
+
269
+ ### 7. Skip Patterns File
270
+
271
+ Create a file (one pattern per line, `#` for comments):
272
+
273
+ ```text
274
+ # ~/.config/keepsake/skip_patterns.txt
275
+ 好的
276
+
277
+
278
+
279
+
280
+ 可以
281
+ 没错
282
+ ok
283
+ okay
284
+ yes
285
+ yeah
286
+ ```
287
+
288
+ Then reference it in config.json:
289
+
290
+ ```json
291
+ {
292
+ "skip_min_length": 2,
293
+ "skip_patterns_file": "~/.config/keepsake/skip_patterns.txt"
294
+ }
295
+ ```
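
A minimal sketch of how such a file might be applied to incoming queries follows. `load_skip_patterns` and `should_skip` are hypothetical names, and the case-insensitive comparison is an assumption; the document only specifies one pattern per line, `#` comments, exact matching, and a minimum length.

```python
import tempfile
from pathlib import Path


def load_skip_patterns(path):
    """Read one pattern per line, ignoring blank lines and '#' comments."""
    file = Path(path).expanduser()
    if not file.is_file():
        return set()
    patterns = set()
    for line in file.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            patterns.add(line.lower())
    return patterns


def should_skip(query, patterns, min_length=2):
    """Skip queries that are too short or exactly match a skip pattern."""
    text = query.strip().lower()
    return len(text) < min_length or text in patterns


with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "skip_patterns.txt"
    sample.write_text("# trivial replies\nok\nyes\n好的\n", encoding="utf-8")
    patterns = load_skip_patterns(sample)

print(should_skip("OK", patterns), should_skip("How did we set up React?", patterns))
```

Returning an empty set for a missing file keeps the default (`skip_patterns_file` empty) harmless: only the length check applies.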
296
+
297
+ ### 8. Restart Gateway
298
+
299
+ ```bash
300
+ # For CLI mode, restart session is sufficient
301
+ # For Gateway mode, restart the process
302
+ ```
303
+
304
+ ## Configuration Reference
305
+
306
+ | Config Item | Environment Variable | Default Value | Description |
307
+ |-------------|---------------------|---------------|-------------|
308
+ | `redis_host` | `KEEPSAKE_REDIS_HOST` | `127.0.0.1` | Redis address |
309
+ | `redis_port` | `KEEPSAKE_REDIS_PORT` | `6379` | Redis port |
310
+ | `top_k` | `KEEPSAKE_TOP_K` | `5` | Number of final entries returned |
311
+ | `candidate_k` | `KEEPSAKE_CANDIDATE_K` | `10` | Candidate entries count (for KNN) |
312
+ | `tag_filter` | `KEEPSAKE_TAG_FILTER` | `""` | Tag filtering (comma-separated) |
313
+ | `bm25_limit` | `KEEPSAKE_BM25_LIMIT` | `10` | BM25 search candidate count |
314
+ | `decay_half_days` | `KEEPSAKE_DECAY_HALF_DAYS` | `60` | Time decay half-life (days) |
315
+ | `embed_cache_ttl` | `KEEPSAKE_EMBED_CACHE_TTL` | `3600` | Embedding cache TTL (seconds) |
316
+ | `sentiment_boost_positive` | — | `1.5` | Positive entry weight multiplier |
317
+ | `sentiment_boost_negative` | — | `1.3` | Negative entry weight multiplier |
318
+ | `feedback_positive_boost` | — | `1.3` | Positive feedback bonus weight |
319
+ | `feedback_negative_penalty` | — | `0.5` | Negative feedback penalty coefficient |
320
+ | `hot_topic_boost` | — | `1.2` | Hot topic weighting multiplier |
321
+ | `embedder.provider` | `KEEPSAKE_EMBEDDER` | `openai` | `openai` / `dashscope` |
322
+ | `embedder.api_key` | `OPENAI_API_KEY` | — | Embedding API key |
323
+ | `embedder.base_url` | `KEEPSAKE_EMBEDDER_URL` | `https://api.openai.com/v1` | API endpoint |
324
+ | `embedder.model` | `KEEPSAKE_EMBEDDER_MODEL` | `text-embedding-3-small` | Embedding model name |
325
+ | `consolidate_min_group` | — | `2` | Minimum entries to trigger consolidation |
326
+ | `consolidate_max_age_hours` | — | `72` | Minimum age (hours) before consolidation |
327
+ | `forget_max_age_days` | — | `30` | Max age (days) before deletion |
328
+ | `forget_dry_run` | — | `true` | Safe mode: `true` = count only, `false` = delete |
329
+ | `agent_id` | — | `""` | Agent identity tag for isolation (e.g. `"main-brain"`) |
330
+ | `is_primary` | — | `false` | `true` = sees all entries; `false` = only tagged ones |
331
+ | `hot_topic_decay_half_days` | — | `30` | Hot topic time decay half-life (days) |
332
+ | `emotion_intensity_factor` | — | `0.4` | Emotion intensity → weight coefficient |
333
+ | `skip_min_length` | — | `2` | Minimum query length to trigger search |
334
+ | `skip_patterns_file` | — | `""` | Path to skip patterns file |
335
+ | `attention_boost_max` | — | `1.5` | Max attention weighting value |
336
+ | `attention_base_increment` | — | `2.0` | Base attention increment per mention |
337
+ | `attention_emotion_factor` | — | `1.5` | Emotion amplification for attention |
338
+ | `synonym_min_word_freq` | — | `10` | Min frequency for synonym candidate |
339
+ | `synonym_jaccard_threshold` | — | `0.5` | Jaccard threshold for synonym detection |
340
+ | `synonym_min_co_occurrence` | — | `3` | Min co-occurrence for synonym detection |
341
+ | `entity_cooc_top_n` | — | `3` | Number of co-occurring entities to expand search |
342
+ | `entity_cooc_min_count` | — | `2` | Min co-occurrence for entity association |
343
+
344
+ > `sentiment_*`, `feedback_*`, `hot_topic_*` and other ranking weight parameters can currently be set only in the JSON config file, not through environment variables. Set a weight to `1.0` to disable that dimension's effect.
345
+
346
+ ### Embedding Models and Dimensions
347
+
348
+ | Model | Dimensions |
349
+ |-------|------------|
350
+ | OpenAI text-embedding-3-small | 1536 |
351
+ | OpenAI text-embedding-3-large | 3072 |
352
+ | OpenAI text-embedding-ada-002 | 1536 |
353
+ | DashScope text-embedding-v2 | 1536 |
354
+ | DashScope text-embedding-v3 | 1024 |
355
+
356
+ Dimensions are detected automatically, so switching models does not require reconfiguration.
357
+
358
+ ### Synonym Table
359
+
360
+ Synonyms are stored in the Redis hash `keepsake:synonyms` and expanded at search time to improve recall:
361
+
362
+ ```bash
363
+ redis-cli HSET keepsake:synonyms setup '["install","configure","deploy","setup"]'
364
+ redis-cli HSET keepsake:synonyms fix '["fix","modify","correct","repair","solve"]'
365
+ ```
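
Query expansion against this table might look like the sketch below. It is an illustration under assumptions: `expand_terms` is a hypothetical function, and the plain dictionary stands in for the hash contents (with redis-py, `hgetall("keepsake:synonyms")` on a client created with `decode_responses=True` returns a dictionary of this shape).

```python
import json


def expand_terms(terms, synonym_hash):
    """Add every synonym listed for a term; keep first-seen order, no duplicates."""
    expanded = []
    for term in terms:
        group = [term] + json.loads(synonym_hash.get(term, "[]"))
        for word in group:
            if word not in expanded:
                expanded.append(word)
    return expanded


synonym_hash = {
    "setup": '["install","configure","deploy","setup"]',
    "fix": '["fix","modify","correct","repair","solve"]',
}
expanded_terms = expand_terms(["setup", "redis"], synonym_hash)
print(expanded_terms)
```

Terms without an entry in the table, such as "redis" here, pass through unchanged.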
366
+
367
+ ## Verification
368
+
369
+ Check logs after startup:
370
+
371
+ ```
372
+ Memory provider 'entryed' registered (0 tools)
373
+ entryed: connected (session=xxx, top_k=5, tag_filter=(none))
374
+ entryed: BM25-only mode (no embedder configured)
375
+ ```
376
+
377
+ ## Architecture
378
+
379
+ ```
380
+ ┌────────────────────────────────────────────────────────┐
381
+ │ User sends message │
382
+ └──────────────────┬─────────────────────────────────────┘
383
+
384
+ ┌─────────▼─────────┐
385
+ │ prefetch() │ ← Automatically triggered on every user message
386
+ │ ↓ │
387
+ │ Workflow Lock? │ ← Checks keepsake:workflow_lock
388
+ │ ↓ │
389
+ │ Skip patterns? │ ← Length / exact match against skip list
390
+ │ ↓ │
391
+ │ BM25 Full-Text Search │ ← Default, zero cost, searches full entries
392
+ │ (KNN Vector search) │ ← Optional (needs embedder)
393
+ │ Entity co-occurrence │ ← Expand query with co-occurring entities
394
+ │ ↓ │
395
+ │ Six-dimensional Re-ranking │ ← Similarity × Time decay
396
+ │ │ × Emotion × Feedback × Hot Topic × Attention
397
+ │ ↓ │
398
+ │ Top N Injected into Context │ ← Full entries returned as-is
399
+ └─────────┬─────────┘
400
+
401
+ ┌─────────▼─────────┐
402
+ │ Model Response │ ← Entries used directly (complete text)
403
+ └───────────────────┘
404
+
405
+ ┌─────────▼─────────┐
406
+ │ on_memory_write()│ ← Only on memory(action='add')
407
+ │ Stores Full Text │ ← Complete entry, no splitting
408
+ │ Entity Extraction│ ← jieba + regex → entities TAG field
409
+ │ Entity Co-occur. │ ← Track entity pairs in ZSET
410
+ │ Attention Track │ ← Extract keywords, increase attention score
411
+ │ ↓ │
412
+ │ Stored in Redis │ ← Available for next retrieval as full text
413
+ └───────────────────┘
414
+
415
+ ┌─────────▼─────────┐
416
+ │ [cron] Every 2h │ ← Background maintenance
417
+ │ ① Multi-level Consolidation │ ← Same topic → keyword clustering → LLM → level+1
418
+ │ ② Selective Forgetting │ ← Age>30d + no feedback + low emotion + low attention → delete
419
+ └───────────────────┘
420
+ ```
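
The six-dimensional re-ranking step in the diagram multiplies a similarity score by five weighting factors. The sketch below shows that product under assumed data shapes: the `Candidate` class, its field names, and `rerank` are hypothetical, and how each factor is derived is outside the scope of this example. The factor values reuse the documented ranking weights (`1.3` for positive feedback, `0.5` for negative).

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    text: str
    similarity: float        # BM25 or KNN relevance, higher is better
    decay: float = 1.0       # time-decay factor in (0, 1]
    emotion: float = 1.0
    feedback: float = 1.0
    hot_topic: float = 1.0
    attention: float = 1.0

    def score(self) -> float:
        """Product of the six dimensions; a factor of 1.0 is neutral."""
        return (self.similarity * self.decay * self.emotion
                * self.feedback * self.hot_topic * self.attention)


def rerank(candidates, top_k=5):
    """Sort by combined score, best first, and keep the top_k entries."""
    return sorted(candidates, key=lambda c: c.score(), reverse=True)[:top_k]


candidates = [
    Candidate("old but relevant", similarity=2.0, decay=0.25),
    Candidate("recent and praised", similarity=1.0, feedback=1.3),
    Candidate("recent but downvoted", similarity=1.5, feedback=0.5),
]
top = [c.text for c in rerank(candidates, top_k=2)]
print(top)
```

Because the factors multiply, a strongly decayed or downvoted entry can fall below a less similar one, which is the behavior the ranking weights are meant to produce.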
421
+
422
+ ## License
423
+
424
+ MIT