llm-cortex-memory 1.0.0__tar.gz

MIT License

Copyright (c) 2026 Christopher Carpenter

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
Metadata-Version: 2.4
Name: llm-cortex-memory
Version: 1.0.0
Summary: Portable, model-agnostic memory layer for LLM conversations
Author: Christopher Carpenter
License-Expression: MIT
Project-URL: Homepage, https://github.com/Christopher-B-Carpenter/cortex-memory
Project-URL: Repository, https://github.com/Christopher-B-Carpenter/cortex-memory
Project-URL: Issues, https://github.com/Christopher-B-Carpenter/cortex-memory/issues
Keywords: llm,memory,bm25,rag,claude,openai,conversation
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: numpy>=1.24
Requires-Dist: scipy>=1.11
Provides-Extra: anthropic
Requires-Dist: anthropic>=0.20; extra == "anthropic"
Provides-Extra: openai
Requires-Dist: openai>=1.0; extra == "openai"
Provides-Extra: all
Requires-Dist: anthropic>=0.20; extra == "all"
Requires-Dist: openai>=1.0; extra == "all"
Dynamic: license-file

# Cortex Memory

A portable, model-agnostic memory layer for LLM conversations.

Cortex stores conversation memories as plain text, retrieves them with BM25, and builds associative structure from usage patterns over time. The entire memory state — index, weights, clusters — serializes to a single compressed file of approximately **15 bytes per memory**. No embedding model. No database. No API key required for retrieval.

```python
from cortex_memory import Memory

mem = Memory.load("project.memory")
results = mem.query("what did we decide about authentication?")
mem.store("Decided to use JWT with 24-hour expiry and Redis-backed refresh tokens.")
mem.save("project.memory")
```

---

## Why

Long-lived projects accumulate context that current tools don't manage well:

- **Plain text logs** grow without structure and have no retrieval
- **RAG / vector databases** are tied to a specific embedding model — swap models and the index degrades or must be rebuilt
- **Hosted memory services** (Mem0, Zep) require cloud APIs and don't produce portable files

Cortex targets the gap: a memory file that travels with a project, survives model changes, requires no infrastructure, and improves structurally through use.

---

## Installation

```bash
pip install cortex-memory
```

Optional LLM integrations:

```bash
pip install cortex-memory[anthropic]  # for ClaudeMemoryHarness
pip install cortex-memory[openai]     # for OpenAIMemoryHarness
pip install cortex-memory[all]        # both
```

---

## Quick start

### Create a memory store

```python
from cortex_memory import Memory

mem = Memory.create(
    description="payments-service development",
    tags=["python", "auth", "database"],
)

mem.store("Decided to use JWT tokens with 24-hour expiry.")
mem.store("SQL injection in legacy login fixed with parameterized queries.")
mem.store("Composite index on (user_id, created_at) reduced dashboard query from 8s to 200ms.")

mem.save("project.memory")
```

### Query it anywhere

```python
from cortex_memory import Memory

mem = Memory.load("project.memory")

results = mem.query("what security issues did we fix?", top_k=5)
for r in results:
    print(r)
```

### Merge two memory files

```python
from cortex_memory import Memory

mem_a = Memory.load("alice.memory")
mem_b = Memory.load("bob.memory")

merged = Memory.merge(mem_a, mem_b, description="shared project memory")
merged.save("team.memory")
```

---

## Integration with Claude Code (recommended)

One-command setup. Memory injection and storage happen automatically on every turn.

```bash
pip install cortex-memory
python3 -m cortex_memory install           # project-level setup
python3 -m cortex_memory install --global  # global (cross-project) setup
```

This creates hook files, generates `settings.json` with correct absolute paths, and initializes the `.memory` file. Then restart Claude Code — memory is automatic from that point.

**How it works:**
- `UserPromptSubmit` hook queries memory before each prompt → injects top-5 results as context
- `Stop` hook stores Claude's response after each turn → memory grows every session
- `config.json` controls the source: `project`, `global`, `both` (default), or `off`
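The injection step can be pictured with a minimal hook sketch. This is illustrative only, not the shipped `on_prompt.py`: the `format_memory_context` helper and the exact `<memory>` wrapper are assumptions, and a real hook would query the loaded `Memory` store rather than use a canned result list.

```python
import json
import sys

def format_memory_context(results):
    """Wrap retrieved memories in a <memory> block for prompt injection.
    (Illustrative formatting -- the shipped hook may differ.)"""
    if not results:
        return ""
    bullets = "\n".join(f"- {r}" for r in results)
    return f"<memory>\n{bullets}\n</memory>"

def main():
    # Claude Code passes hook input as JSON on stdin; whatever a
    # UserPromptSubmit hook prints to stdout is added to the prompt context.
    event = json.load(sys.stdin)
    prompt = event.get("prompt", "")
    # Real hook: Memory.load(".claude/memory/project.memory").query(prompt, top_k=5)
    results = ["Decided to use JWT tokens with 24-hour expiry."]  # stand-in
    print(format_memory_context(results))

if __name__ == "__main__":
    main()
```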

**Seed initial context (optional):**

```python
from cortex_memory import Memory

mem = Memory.load(".claude/memory/project.memory")
mem.store("uses Python 3.12, FastAPI, PostgreSQL, deployed on AWS ECS")
mem.store("auth uses JWT with 24h expiry, refresh tokens in Redis")
mem.save(".claude/memory/project.memory")
```

See `examples/claude_code_hooks/setup.md` for tuning options, dev team use cases, the `/memory` slash command, and troubleshooting.

---

## Integration with Claude API

```python
from cortex_memory import ClaudeMemoryHarness

harness = ClaudeMemoryHarness(
    "project.memory",
    model="claude-sonnet-4-6",
    system_prompt="You are a technical assistant with context about this project.",
    top_k=5,
)

response = harness.chat("what indexes did we add to fix the slow queries?")
print(response)

harness.save()  # persists to project.memory
```

Every turn:
1. Queries memory with the user message
2. Injects top-K results into the system prompt as `<memory>` context
3. Calls Claude
4. Stores Claude's response asynchronously

### OpenAI / any OpenAI-compatible API

```python
from cortex_memory import OpenAIMemoryHarness

harness = OpenAIMemoryHarness(
    "project.memory",
    model="gpt-4o",
    # base_url="http://localhost:11434/v1"  # Ollama, Together, Fireworks, etc.
)

response = harness.chat("summarize what we know about the auth service")
harness.save()
```

### Any LLM callable

```python
from cortex_memory import MemoryHarness

def my_llm(messages, system, **kwargs):
    # call any LLM here and return its text response
    ...

harness = MemoryHarness("project.memory", llm_fn=my_llm)
response = harness.chat("what did we decide?")
```

---

## How it works

Three layers on top of BM25 full-text retrieval:

**1. Usage weights** — each memory has a scalar weight that strengthens when the memory is retrieved and decays slowly over time. Decay is computed lazily (no per-query O(N) loop). Frequently useful memories surface slightly ahead of equally relevant alternatives.

**2. Co-retrieval clustering** — when memories A and B appear together in top-K results across multiple queries, they accumulate a co-retrieval count. Above a threshold, they join the same cluster. Clusters emerge from actual usage patterns, not from lexical or semantic similarity.

**3. Two-pass retrieval** — at query time, Pass 1 scores only cluster representatives (O(clusters)) and selects the top-matching clusters; Pass 2 scores only their members. At N=1,000 with ~80 clusters, this scores ~60 memories instead of 1,000. At N=500-2,000, the architecture skips 85-97% of the store while matching flat BM25 precision.
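Layers 1 and 3 can be sketched in a few lines. This is a toy illustration under stated assumptions — the half-life constant, the `rep`/`members` field names, and the function signatures are invented here and do not mirror the internals of `cortex.py`:

```python
def effective_weight(base_weight, last_used_step, now_step, half_life=500.0):
    # Lazy decay: the weight is recomputed on read from the step at which the
    # memory was last reinforced, so no O(N) decay sweep runs per query.
    return base_weight * 0.5 ** ((now_step - last_used_step) / half_life)

def two_pass_query(score_fn, clusters, n_clusters=3, top_k=5):
    # Pass 1: score one representative per cluster (O(clusters), not O(N)).
    best = sorted(clusters, key=lambda c: score_fn(c["rep"]), reverse=True)[:n_clusters]
    # Pass 2: score only the members of the winning clusters.
    candidates = [m for c in best for m in c["members"]]
    return sorted(candidates, key=score_fn, reverse=True)[:top_k]
```

With ~80 clusters of ~12 members each, pass 1 touches 80 texts and pass 2 a few dozen more, which is where the "skips 85-97% of the store" figure comes from.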

---

## File format

A `.memory` file is a zip archive containing:

```
project.memory
├── store.pkl      # Cortex state (BM25 index, weights, clusters, co-retrieval)
├── manifest.json  # metadata: description, tags, query count, LLM hint
└── README.md      # auto-generated summary of top memories and clusters
```

- **~15 bytes per memory** at N=10,000 (148 KB total)
- **81ms load time** at N=10,000
- **Lossless** — two independently loaded instances produce identical results
- No external model required to load or query
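Because the container is a standard zip, the manifest can be inspected with the stdlib alone. A sketch — the manifest keys shown above (description, tags, query count) are whatever the library wrote, so treat the returned dict's shape as unverified:

```python
import json
import zipfile

def peek_manifest(path):
    # A .memory file is an ordinary zip archive, so stdlib zipfile can read
    # manifest.json directly -- no model, database, or cortex_memory import.
    with zipfile.ZipFile(path) as zf:
        return json.loads(zf.read("manifest.json"))
```

This also means standard tools (`unzip -l project.memory`) work for a quick look at the archive.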

---

## Team and shared repositories

`.memory` files are binary — git cannot diff or auto-merge them. The recommended approach is to keep them out of feature-branch commits and merge them explicitly using `Memory.merge()` at the points where you want to consolidate context.

**Merging two memory files:**

```python
from cortex_memory import Memory

merged = Memory.merge(
    Memory.load("alice.memory"),
    Memory.load("bob.memory"),
    description="shared project memory",
)
merged.save("team.memory")
```

Merge semantics are non-destructive: memories are unioned, weights are max-pooled (whichever side used a memory more wins), and co-retrieval counts are summed.
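Those semantics can be pictured with plain dicts. A sketch only — the dict-of-weights and pair-keyed counts here are hypothetical stand-ins, not the pickled layout inside `store.pkl`:

```python
def merge_weights(a, b):
    # Max-pool: whichever store used a memory more keeps its weight.
    return {k: max(a.get(k, 0.0), b.get(k, 0.0)) for k in set(a) | set(b)}

def merge_coretrieval(a, b):
    # Co-retrieval counts are summed, so shared usage evidence accumulates.
    return {k: a.get(k, 0) + b.get(k, 0) for k in set(a) | set(b)}
```

Union plus max/sum means merging is commutative and never discards a memory, so repeated consolidation is safe.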

**Keeping `.memory` out of PR diffs:**

If you do commit `.memory` files, add these lines to `.gitattributes` so they're hidden from code review diffs:

```
*.memory -diff
*.memory linguist-generated=true
```

**CI pipelines:**

For automated consolidation after branch merges, call `Memory.merge()` directly in your pipeline script — it's a straightforward Python call with no external dependencies beyond numpy and scipy.

---

## Benchmarks

Measured on a MacBook Pro (Apple M-series), N=100-10,000 memories, software engineering conversation corpus.

| N | Precision@8 vs flat BM25 | Memories skipped | Load time |
|---|---|---|---|
| 100 | +0.05 | 67% | 1ms |
| 500 | -0.08 | 88% | 4ms |
| 1,000 | ~0 | 95% | 8ms |
| 2,000 | ~0 | 96% | - |
| 10,000 | ~0 | 17% | 81ms |

Context coherence (mean co-retrieval count in the returned set) grows from 5 to 89 over 200 queries without any preprocessing. Token efficiency is ~22% better than flat retrieval in steady state.

See `benchmark.py` to reproduce.

---

## Repository structure

```
cortex-memory/
├── pyproject.toml          # package metadata (pip install cortex-memory)
├── src/cortex_memory/      # installable package
│   ├── __init__.py         # public API exports
│   ├── cortex.py           # storage engine (VectorizedBM25, Cortex)
│   ├── memory.py           # portable artifact (Memory class, merge)
│   ├── harness.py          # LLM integration (MemoryHarness, Claude/OpenAI)
│   └── install.py          # one-command Claude Code setup
├── cortex.py               # standalone (no pip install needed)
├── memory.py               # standalone
├── harness.py              # standalone
├── benchmark.py            # reproduce the benchmarks
├── requirements.txt
└── examples/
    ├── demo.py             # basic usage, no API needed
    ├── claude_api.py       # interactive Claude conversation loop
    └── claude_code_hooks/  # Claude Code hook reference
        ├── on_prompt.py    # UserPromptSubmit hook
        ├── on_stop.py      # Stop hook
        ├── config.json     # memory source config
        ├── memory.md       # /memory slash command
        ├── settings.json   # settings.json template
        └── setup.md        # manual setup, tuning, troubleshooting
```

**Two ways to use:**
- `pip install cortex-memory` — recommended. Hooks use the installed package.
- Clone and copy files — standalone, no pip needed. The root `cortex.py`, `memory.py`, and `harness.py` work independently.

---

## API reference

### `Memory`

| Method | Description |
|---|---|
| `Memory.create(description, tags)` | Create a new empty store |
| `Memory.load(path)` | Load from a `.memory` file |
| `Memory.merge(a, b, description)` | Union two stores |
| `mem.store(text, memory_id, metadata)` | Add a memory |
| `mem.query(text, top_k)` | Retrieve relevant memories (returns a list of strings) |
| `mem.query(text, return_scores=True)` | Return a list of dicts with score/weight/cluster |
| `mem.forget(memory_id)` | Remove a memory |
| `mem.save(path)` | Serialize to disk |
| `mem.stats()` | Store statistics |
| `mem.top_memories(n)` | Most-used memories by weight |
| `mem.clusters(n)` | Current cluster summary |

### `MemoryHarness`

| Method | Description |
|---|---|
| `MemoryHarness(path, llm_fn, ...)` | Create a harness with any LLM callable |
| `ClaudeMemoryHarness(path, model, ...)` | Anthropic SDK subclass |
| `OpenAIMemoryHarness(path, model, ...)` | OpenAI SDK subclass |
| `harness.chat(message)` | Send a message, get a response with memory injection |
| `harness.build_system_prompt(query)` | Get the system prompt with injected context (for manual use) |
| `harness.store(text)` | Manually store a memory |
| `harness.query(text)` | Query without an LLM call |
| `harness.inject_claude_md(query, path)` | Prepend memories to CLAUDE.md |
| `harness.sync_from_transcript(path)` | Store turns from a JSONL transcript |
| `harness.save()` | Flush and save to disk |
| `harness.reset_conversation()` | Clear conversation history (keep memory) |

---

## License

MIT