smos-mcp 0.1.2__tar.gz
- smos_mcp-0.1.2/LICENSE +21 -0
- smos_mcp-0.1.2/PKG-INFO +606 -0
- smos_mcp-0.1.2/README.md +576 -0
- smos_mcp-0.1.2/pyproject.toml +52 -0
- smos_mcp-0.1.2/setup.cfg +4 -0
- smos_mcp-0.1.2/smos/__init__.py +3 -0
- smos_mcp-0.1.2/smos/__main__.py +3 -0
- smos_mcp-0.1.2/smos/compression/__init__.py +0 -0
- smos_mcp-0.1.2/smos/compression/context_builder.py +104 -0
- smos_mcp-0.1.2/smos/install/__init__.py +0 -0
- smos_mcp-0.1.2/smos/install/cli.py +376 -0
- smos_mcp-0.1.2/smos/llm/__init__.py +0 -0
- smos_mcp-0.1.2/smos/llm/client.py +15 -0
- smos_mcp-0.1.2/smos/llm/summarizer.py +128 -0
- smos_mcp-0.1.2/smos/memory/__init__.py +0 -0
- smos_mcp-0.1.2/smos/memory/embeddings.py +24 -0
- smos_mcp-0.1.2/smos/memory/lifecycle.py +87 -0
- smos_mcp-0.1.2/smos/memory/schemas.py +34 -0
- smos_mcp-0.1.2/smos/memory/vector_store.py +335 -0
- smos_mcp-0.1.2/smos/server.py +188 -0
- smos_mcp-0.1.2/smos/tools/__init__.py +0 -0
- smos_mcp-0.1.2/smos/tools/file_tools.py +75 -0
- smos_mcp-0.1.2/smos/tools/semantic_tools.py +49 -0
- smos_mcp-0.1.2/smos_mcp.egg-info/PKG-INFO +606 -0
- smos_mcp-0.1.2/smos_mcp.egg-info/SOURCES.txt +31 -0
- smos_mcp-0.1.2/smos_mcp.egg-info/dependency_links.txt +1 -0
- smos_mcp-0.1.2/smos_mcp.egg-info/entry_points.txt +3 -0
- smos_mcp-0.1.2/smos_mcp.egg-info/requires.txt +10 -0
- smos_mcp-0.1.2/smos_mcp.egg-info/top_level.txt +1 -0
- smos_mcp-0.1.2/tests/test_file_tools.py +82 -0
- smos_mcp-0.1.2/tests/test_semantic_tools.py +70 -0
- smos_mcp-0.1.2/tests/test_summarizer.py +70 -0
- smos_mcp-0.1.2/tests/test_vector_store.py +102 -0
smos_mcp-0.1.2/LICENSE
ADDED
|
@@ -0,0 +1,21 @@
|
|
|
1
|
+
MIT License
|
|
2
|
+
|
|
3
|
+
Copyright (c) 2026 Witchd0ct0r
|
|
4
|
+
|
|
5
|
+
Permission is hereby granted, free of charge, to any person obtaining a copy
|
|
6
|
+
of this software and associated documentation files (the "Software"), to deal
|
|
7
|
+
in the Software without restriction, including without limitation the rights
|
|
8
|
+
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
|
|
9
|
+
copies of the Software, and to permit persons to whom the Software is
|
|
10
|
+
furnished to do so, subject to the following conditions:
|
|
11
|
+
|
|
12
|
+
The above copyright notice and this permission notice shall be included in all
|
|
13
|
+
copies or substantial portions of the Software.
|
|
14
|
+
|
|
15
|
+
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
|
|
16
|
+
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
|
|
17
|
+
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
|
|
18
|
+
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
|
|
19
|
+
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
|
|
20
|
+
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
|
|
21
|
+
SOFTWARE.
|
smos_mcp-0.1.2/PKG-INFO
ADDED
|
@@ -0,0 +1,606 @@
|
|
|
1
|
+
Metadata-Version: 2.4
|
|
2
|
+
Name: smos-mcp
|
|
3
|
+
Version: 0.1.2
|
|
4
|
+
Summary: Semantic Memory Operating System — persistent memory MCP server for Claude Code
|
|
5
|
+
License: MIT
|
|
6
|
+
Project-URL: Homepage, https://github.com/Witchd0ct0r/Semantic_Memory_Operating_System_SMOS
|
|
7
|
+
Project-URL: Repository, https://github.com/Witchd0ct0r/Semantic_Memory_Operating_System_SMOS
|
|
8
|
+
Project-URL: Issues, https://github.com/Witchd0ct0r/Semantic_Memory_Operating_System_SMOS/issues
|
|
9
|
+
Keywords: claude,mcp,memory,semantic,ai,llm
|
|
10
|
+
Classifier: Development Status :: 4 - Beta
|
|
11
|
+
Classifier: Intended Audience :: Developers
|
|
12
|
+
Classifier: License :: OSI Approved :: MIT License
|
|
13
|
+
Classifier: Programming Language :: Python :: 3
|
|
14
|
+
Classifier: Programming Language :: Python :: 3.10
|
|
15
|
+
Classifier: Programming Language :: Python :: 3.11
|
|
16
|
+
Classifier: Programming Language :: Python :: 3.12
|
|
17
|
+
Requires-Python: >=3.10
|
|
18
|
+
Description-Content-Type: text/markdown
|
|
19
|
+
License-File: LICENSE
|
|
20
|
+
Requires-Dist: mcp[cli]<2.0.0,>=1.0.0
|
|
21
|
+
Requires-Dist: faiss-cpu>=1.8.0
|
|
22
|
+
Requires-Dist: sentence-transformers<4.0.0,>=3.0.0
|
|
23
|
+
Requires-Dist: openai<2.0.0,>=1.50.0
|
|
24
|
+
Requires-Dist: pydantic<3.0.0,>=2.5.0
|
|
25
|
+
Requires-Dist: numpy>=1.26.0
|
|
26
|
+
Provides-Extra: dev
|
|
27
|
+
Requires-Dist: pytest>=8.0.0; extra == "dev"
|
|
28
|
+
Requires-Dist: pytest-asyncio>=0.23.0; extra == "dev"
|
|
29
|
+
Dynamic: license-file
|
|
30
|
+
|
|
31
|
+
<h1 align="center">SMOS</h1>
|
|
32
|
+
|
|
33
|
+
<p align="center"><strong>Semantic Memory Operating System for Claude Code</strong></p>
|
|
34
|
+
|
|
35
|
+
<p align="center">
|
|
36
|
+
Compress files out of context. Query knowledge by meaning. Persist across sessions.
|
|
37
|
+
</p>
|
|
38
|
+
|
|
39
|
+
<p align="center">
|
|
40
|
+
<a href="https://github.com/Witchd0ct0r/Semantic_Memory_Operating_System_SMOS/stargazers"><img src="https://img.shields.io/github/stars/Witchd0ct0r/Semantic_Memory_Operating_System_SMOS?style=flat&color=blue" alt="Stars"></a>
|
|
41
|
+
<a href="LICENSE"><img src="https://img.shields.io/github/license/Witchd0ct0r/Semantic_Memory_Operating_System_SMOS?style=flat" alt="License"></a>
|
|
42
|
+
<a href="https://pypi.org/project/smos-mcp/"><img src="https://img.shields.io/pypi/v/smos-mcp?style=flat" alt="PyPI"></a>
|
|
43
|
+
<img src="https://img.shields.io/badge/python-3.10%2B-blue?style=flat" alt="Python">
|
|
44
|
+
</p>
|
|
45
|
+
|
|
46
|
+
<p align="center">
|
|
47
|
+
<a href="#the-problem">Problem</a> •
|
|
48
|
+
<a href="#how-it-works">How it works</a> •
|
|
49
|
+
<a href="#compression-in-practice">In practice</a> •
|
|
50
|
+
<a href="#install">Install</a> •
|
|
51
|
+
<a href="#benchmarks">Benchmarks</a> •
|
|
52
|
+
<a href="#example-use-cases">Examples</a> •
|
|
53
|
+
<a href="#tools">Tools</a> •
|
|
54
|
+
<a href="#configuration">Config</a> •
|
|
55
|
+
<a href="#update">Update</a> •
|
|
56
|
+
<a href="#uninstall">Uninstall</a>
|
|
57
|
+
</p>
|
|
58
|
+
|
|
59
|
+
---
|
|
60
|
+
|
|
61
|
+
## The problem
|
|
62
|
+
|
|
63
|
+
Every file Claude reads stays in the context window until the session ends. On a 20-file codebase, Claude is carrying 40,000+ tokens of raw source by the time it reaches synthesis. Most of that source has already been processed and will never be needed verbatim again, yet it is paid for on every single API call.
|
|
64
|
+
|
|
65
|
+
**Caveman** compresses what Claude *says*. **SMOS** compresses what Claude *holds* — the files, the prior analysis, the context window itself.
|
|
66
|
+
|
|
67
|
+
---
|
|
68
|
+
|
|
69
|
+
## How it works
|
|
70
|
+
|
|
71
|
+
SMOS is an MCP server that gives Claude a persistent memory layer: FAISS vector search + SQLite, powered by a local LLM (qwen2.5 via Ollama) for compression.
|
|
72
|
+
|
|
73
|
+
Instead of reading a file with the built-in Read tool and leaving it in context forever, Claude calls `tool_read_file_compress`. The file is summarised by a local LLM, stored in the vector index, and **the raw source never enters the context window**. At synthesis time, Claude queries the semantic index rather than re-reading anything.
|
|
74
|
+
|
|
75
|
+
```
|
|
76
|
+
WITHOUT SMOS                            WITH SMOS
──────────────────────────────────      ──────────────────────────────────
Read file.py (3,000 tokens)         →   tool_read_file_compress(file.py)
  → stays in context forever              → local LLM compresses to ~85 tokens
                                          → stored in FAISS + SQLite
                                          → nothing in context window

10 files read → 30,000 ctx tokens       10 files compressed → ~850 ctx tokens

Synthesis:                              Synthesis:
  still carrying 30,000 tokens      →     4 semantic queries × ~300 tokens
  on every API call                 →     = ~1,200 tokens total
                                          35× smaller context at synthesis
|
|
89
|
+
```
|
|
90
|
+
|
|
91
|
+
Memory persists across sessions. Session 2 queries what Session 1 stored — no re-reading.
|
|
92
|
+
|
|
93
|
+
---
|
|
94
|
+
|
|
95
|
+
## Compression in practice
|
|
96
|
+
|
|
97
|
+
### Input — raw file (312 tokens in context without SMOS)
|
|
98
|
+
|
|
99
|
+
```python
|
|
100
|
+
# smos/memory/vector_store.py (excerpt)

def store(self, content: str, metadata: dict | None = None, tier: str = "working") -> str:
    meta = metadata or {}
    doc_id = str(uuid.uuid4())
    ts = datetime.now(timezone.utc).isoformat()

    summary = self._summarizer.summarize(content)
    embedding = self._embed(summary)

    with self._lock:
        idx = self._index.ntotal
        self._index.add(embedding.reshape(1, -1))
        self._db.execute(
            "INSERT INTO memories VALUES (?,?,?,?,?,?)",
            (doc_id, content, summary, json.dumps(meta), tier, ts),
        )
        self._db.commit()
        self._id_map[idx] = doc_id
    return doc_id

def query(self, text: str, top_k: int = 5) -> list[dict]:
    embedding = self._embed(text)
    distances, indices = self._index.search(embedding.reshape(1, -1), top_k)
    results = []
    for dist, idx in zip(distances[0], indices[0]):
        if idx == -1:
            continue
        doc_id = self._id_map.get(int(idx))
        row = self._db.execute(
            "SELECT content, summary, metadata, tier, created_at FROM memories WHERE id = ?",
            (doc_id,),
        ).fetchone()
        if row:
            results.append({
                "id": doc_id,
                "summary": row[1],
                "score": float(dist),
                "tier": row[3],
            })
    return results
|
|
141
|
+
```
|
|
142
|
+
|
|
143
|
+
### Output — LLM summary stored in SMOS (42 tokens)
|
|
144
|
+
|
|
145
|
+
```
|
|
146
|
+
Stores text with LLM-compressed embedding into FAISS index and SQLite.
|
|
147
|
+
Assigns UUID, timestamps entry, generates summary via summarizer, embeds
|
|
148
|
+
with sentence-transformer, persists metadata and tier. Query method embeds
|
|
149
|
+
input text, searches FAISS for nearest neighbours, returns scored results
|
|
150
|
+
with summary and tier.
|
|
151
|
+
```
|
|
152
|
+
|
|
153
|
+
**42 tokens stored. 312 tokens never entered the context window. 7.4× compression on this excerpt.**
|
|
154
|
+
|
|
155
|
+
### What's written to SQLite
|
|
156
|
+
|
|
157
|
+
```
|
|
158
|
+
id: f3a2b1c0-8d4e-4f7a-9b2c-1e5d6f3a2b1c
|
|
159
|
+
summary: Stores text with LLM-compressed embedding into FAISS...
|
|
160
|
+
content: [original source, stored for lossless retrieval if needed]
|
|
161
|
+
metadata: {"source": "smos/memory/vector_store.py"}
|
|
162
|
+
tier: working
|
|
163
|
+
created_at: 2026-06-22T14:23:11.847Z
|
|
164
|
+
```
|
|
165
|
+
|
|
166
|
+
The FAISS index stores the 384-dimensional embedding of the summary. Queries embed the search string and find nearest neighbours by cosine distance — no keywords, no exact match required.
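
To make the nearest-neighbour step concrete, here is a minimal pure-Python sketch of cosine-similarity ranking. It is an illustration only, not the package's implementation: SMOS delegates this search to FAISS, and the helper names `cosine_similarity` and `nearest`, along with the toy 3-dimensional vectors, are assumptions made for this example.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector has no direction; treat it as unrelated
    return dot / (norm_a * norm_b)


def nearest(query: list[float], stored: dict[str, list[float]], top_k: int = 5):
    """Hypothetical helper: rank stored vectors by similarity to the query."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in stored.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy 3-dimensional vectors standing in for 384-dimensional embeddings.
stored = {
    "storage": [0.9, 0.1, 0.0],
    "auth": [0.0, 1.0, 0.0],
    "logging": [0.1, 0.0, 0.9],
}
results = nearest([1.0, 0.2, 0.0], stored, top_k=2)
print([doc_id for doc_id, _ in results])  # ['storage', 'auth']
```

Because ranking depends only on vector direction, a query can match a summary that shares no exact keywords with it.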
|
|
167
|
+
|
|
168
|
+
### Query result returned to Claude
|
|
169
|
+
|
|
170
|
+
```
|
|
171
|
+
tool_semantic_query("how does storage work")
|
|
172
|
+
|
|
173
|
+
→ score: 0.91
|
|
174
|
+
summary: "Stores text with LLM-compressed embedding into FAISS index
|
|
175
|
+
and SQLite. Assigns UUID, timestamps entry, generates summary
|
|
176
|
+
via summarizer, embeds with sentence-transformer..."
|
|
177
|
+
source: smos/memory/vector_store.py
|
|
178
|
+
tier: working
|
|
179
|
+
```
|
|
180
|
+
|
|
181
|
+
Claude gets the 42-token summary and a confidence score. The 312-token source stays on disk.
|
|
182
|
+
|
|
183
|
+
### Lossless retrieval — when you need the exact original
|
|
184
|
+
|
|
185
|
+
For code, diffs, or any content where exact bytes matter, use the verbatim path instead:
|
|
186
|
+
|
|
187
|
+
```
|
|
188
|
+
tool_store_verbatim(content="<full source>", label="vector_store.py store+query methods")
|
|
189
|
+
→ key: "f3a2b1c0-8d4e-4f7a-9b2c-1e5d6f3a2b1c"
|
|
190
|
+
```
|
|
191
|
+
|
|
192
|
+
Retrieve it later by key:
|
|
193
|
+
|
|
194
|
+
```
|
|
195
|
+
tool_retrieve(key="f3a2b1c0-8d4e-4f7a-9b2c-1e5d6f3a2b1c")
|
|
196
|
+
|
|
197
|
+
→ key: f3a2b1c0-8d4e-4f7a-9b2c-1e5d6f3a2b1c
|
|
198
|
+
label: vector_store.py store+query methods
|
|
199
|
+
timestamp: 2026-06-22T14:23:11.847Z
|
|
200
|
+
content:
|
|
201
|
+
def store(self, content: str, metadata: dict | None = None, tier: str = "working") -> str:
|
|
202
|
+
meta = metadata or {}
|
|
203
|
+
doc_id = str(uuid.uuid4())
|
|
204
|
+
... [exact original, byte-for-byte]
|
|
205
|
+
```
|
|
206
|
+
|
|
207
|
+
**Verbatim storage has no compression and no semantic index — it is a pure key-value store.** Use it when you need to reconstruct a diff, apply a patch, or pass exact code to a tool.
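
A key-value store of this kind can be sketched with the standard library alone. The table name `verbatim`, its columns, and the functions `open_store`, `store_verbatim`, and `retrieve` are hypothetical names chosen for this example; the actual schema inside `metadata.db` is not shown in this README.

```python
import sqlite3
import uuid
from datetime import datetime, timezone
from typing import Optional


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a store; the schema here is an assumption."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS verbatim ("
        "key TEXT PRIMARY KEY, label TEXT, content TEXT, created_at TEXT)"
    )
    return db


def store_verbatim(db: sqlite3.Connection, content: str, label: str) -> str:
    """Save content unchanged and return the key needed to fetch it."""
    key = str(uuid.uuid4())
    ts = datetime.now(timezone.utc).isoformat()
    db.execute("INSERT INTO verbatim VALUES (?,?,?,?)", (key, label, content, ts))
    db.commit()
    return key


def retrieve(db: sqlite3.Connection, key: str) -> Optional[dict]:
    """Look up by exact key only; there is no semantic search on this path."""
    row = db.execute(
        "SELECT label, content, created_at FROM verbatim WHERE key = ?", (key,)
    ).fetchone()
    if row is None:
        return None
    return {"key": key, "label": row[0], "content": row[1], "timestamp": row[2]}


db = open_store()
source = "def store(self, content):\n    return content\n"
key = store_verbatim(db, source, label="example snippet")
print(retrieve(db, key)["content"] == source)  # True: round-trips unchanged
```

The trade-off is visible in `retrieve`: without the key, the content cannot be found, which is why the semantic path remains the better fit for prose.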
|
|
208
|
+
|
|
209
|
+
### Which path to use
|
|
210
|
+
|
|
211
|
+
| Content type | Tool | Retrieval |
|
|
212
|
+
|---|---|---|
|
|
213
|
+
| Prose, analysis, docs, logs | `tool_read_file_compress` / `tool_semantic_store` | `tool_semantic_query` (by meaning) |
|
|
214
|
+
| Code, diffs, structured data | `tool_store_verbatim` | `tool_retrieve` (by key) |
|
|
215
|
+
|
|
216
|
+
---
|
|
217
|
+
|
|
218
|
+
## Install
|
|
219
|
+
|
|
220
|
+
**Prerequisites:** Python 3.10+, [Claude Code](https://claude.ai/code), [Ollama](https://ollama.com/download)
|
|
221
|
+
|
|
222
|
+
```bash
|
|
223
|
+
pip install smos-mcp
|
|
224
|
+
smos setup
|
|
225
|
+
```
|
|
226
|
+
|
|
227
|
+
> **Note:** `smos-mcp` is the PyPI package name. The CLI commands installed are `smos` and `smos-server`.
|
|
228
|
+
|
|
229
|
+
The setup wizard handles everything else: Python deps, model selection, model pull, MCP registration, and CLAUDE.md policy injection. Restart Claude Code when done.
|
|
230
|
+
|
|
231
|
+
```bash
|
|
232
|
+
claude mcp list # verify: smos should appear
|
|
233
|
+
```
|
|
234
|
+
|
|
235
|
+
Alternatively, install directly from GitHub:
|
|
236
|
+
|
|
237
|
+
```bash
|
|
238
|
+
pip install git+https://github.com/Witchd0ct0r/Semantic_Memory_Operating_System_SMOS.git
|
|
239
|
+
smos setup
|
|
240
|
+
```
|
|
241
|
+
|
|
242
|
+
### If `smos` is not found after install
|
|
243
|
+
|
|
244
|
+
pip installs CLI scripts to a directory that may not be on your `PATH`. Find it:
|
|
245
|
+
|
|
246
|
+
```bash
|
|
247
|
+
python -c "import sysconfig; print(sysconfig.get_path('scripts', sysconfig.get_preferred_scheme('user')))"
|
|
248
|
+
```
|
|
249
|
+
|
|
250
|
+
Then add it permanently:
|
|
251
|
+
|
|
252
|
+
**Windows (PowerShell)**
|
|
253
|
+
|
|
254
|
+
```powershell
|
|
255
|
+
$scripts = python -c "import sysconfig; print(sysconfig.get_path('scripts', sysconfig.get_preferred_scheme('user')))"
|
|
256
|
+
[Environment]::SetEnvironmentVariable("Path", [Environment]::GetEnvironmentVariable("Path", "User") + ";$scripts", "User")
|
|
257
|
+
# Restart PowerShell for the change to take effect
|
|
258
|
+
```
|
|
259
|
+
|
|
260
|
+
**macOS (zsh)**
|
|
261
|
+
|
|
262
|
+
```bash
|
|
263
|
+
echo 'export PATH="$(python3 -m site --user-base)/bin:$PATH"' >> ~/.zshrc && source ~/.zshrc
|
|
264
|
+
```
|
|
265
|
+
|
|
266
|
+
**Linux (bash)**
|
|
267
|
+
|
|
268
|
+
```bash
|
|
269
|
+
echo 'export PATH="$(python3 -m site --user-base)/bin:$PATH"' >> ~/.bashrc && source ~/.bashrc
|
|
270
|
+
```
|
|
271
|
+
|
|
272
|
+
> **conda / miniconda users:** Scripts land in `%CONDA_PREFIX%\Scripts` (Windows) or `$CONDA_PREFIX/bin` (macOS/Linux). Both are on `PATH` while the conda environment is active, so `smos` should work as soon as you activate the environment you installed into (including `base`).
|
|
273
|
+
|
|
274
|
+
---
|
|
275
|
+
|
|
276
|
+
## Benchmarks
|
|
277
|
+
|
|
278
|
+
All numbers measured on real data. Benchmarks live in [`tests/`](./tests/).
|
|
279
|
+
|
|
280
|
+
> **Test hardware:** AMD Ryzen 5 7640HS (6C / 12T, 4.3 GHz) · 32 GB RAM · RTX 4050 Laptop 6 GB VRAM · 1 TB Kioxia NVMe · Windows 11
|
|
281
|
+
|
|
282
|
+
### Compression quality
|
|
283
|
+
|
|
284
|
+
Local LLM (qwen2.5:7b via Ollama) compresses files to a fixed-length summary. **Factual retention is 100% at all sizes** — all seeded keywords recovered from every summary across 3 independent runs.
|
|
285
|
+
|
|
286
|
+
| File size | Tokens in context (before) | Tokens after SMOS | Compression | Retention |
|
|
287
|
+
|-----------|:--------------------------:|:-----------------:|:-----------:|:---------:|
|
|
288
|
+
| 1 KB | ~260 | ~85 | **3.1×** | 100% |
|
|
289
|
+
| 5 KB | ~1,300 | ~110 | **11.8×** | 100% |
|
|
290
|
+
| 10 KB | ~2,585 | ~76 | **34.2×** | 100% |
|
|
291
|
+
| 50 KB | ~12,825 | ~83 | **154.7×** | 100% |
|
|
292
|
+
| **avg** | | | **51×** | **100%** |
|
|
293
|
+
|
|
294
|
+
At 50 KB (typical for large modules, log files, or generated content) SMOS compresses about **154× with zero factual loss** on the seeded-keyword test. Summary length plateaus at ~330–440 characters regardless of input size above ~5KB; the LLM abstracts to a roughly fixed-length output.
|
|
295
|
+
|
|
296
|
+
```
|
|
297
|
+
Context window pressure at synthesis (20-file codebase, 5KB avg)
|
|
298
|
+
──────────────────────────────────────────────────────────────────
|
|
299
|
+
|
|
300
|
+
Without SMOS ████████████████████████████████████████ 26,000 tokens
|
|
301
|
+
With SMOS ██░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 2,200 tokens
|
|
302
|
+
|
|
303
|
+
▲ 91% smaller
|
|
304
|
+
```
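
The arithmetic behind the chart and the table average can be reproduced in a few lines. This is a worked check using only figures quoted above (the 5 KB row for per-file token counts, and the four listed compression ratios). The variable names are illustrative.

```python
FILES = 20
TOKENS_PER_FILE_RAW = 1_300      # 5 KB row, "before" column
TOKENS_PER_FILE_SUMMARY = 110    # 5 KB row, "after SMOS" column

without_smos = FILES * TOKENS_PER_FILE_RAW
with_smos = FILES * TOKENS_PER_FILE_SUMMARY
reduction = 1 - with_smos / without_smos

# Unweighted mean of the four per-size ratios listed in the table.
listed_ratios = [3.1, 11.8, 34.2, 154.7]
mean_ratio = sum(listed_ratios) / len(listed_ratios)

print(without_smos, with_smos, f"{reduction:.1%}", round(mean_ratio))
# 26000 2200 91.5% 51
```

Note that the 51× average is an unweighted mean of the four rows, so it is dominated by the 50 KB result. Small files see far less benefit.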
|
|
305
|
+
|
|
306
|
+
### Query latency
|
|
307
|
+
|
|
308
|
+
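SMOS queries are fast and scale gracefully. Average query latency grows only **1.21× when data grows 100×** (11.6 ms at 1K memories to 14.0 ms at 100K), and P95 latency grows about 1.16× (14.6 ms to 16.9 ms). FAISS uses SIMD dot-product batching that stays sub-linear up to ~500K entries on standard hardware.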
|
|
309
|
+
|
|
310
|
+
| Memories stored | Query avg | Query P95 | Query P99 |
|
|
311
|
+
|:--------------:|:---------:|:---------:|:---------:|
|
|
312
|
+
| 1,000 | 11.6 ms | 14.6 ms | 14.6 ms |
|
|
313
|
+
| 5,000 | 11.4 ms | 14.6 ms | 14.6 ms |
|
|
314
|
+
| 10,000 | 12.3 ms | 16.3 ms | 16.3 ms |
|
|
315
|
+
| 50,000 | 11.2 ms | 13.1 ms | 13.1 ms |
|
|
316
|
+
| 100,000 | 14.0 ms | 16.9 ms | 16.9 ms |
|
|
317
|
+
|
|
318
|
+
```
|
|
319
|
+
Query latency vs. corpus size
─────────────────────────────
[ASCII plot: y-axis 5 to 20 ms, x-axis 1K, 5K, 10K, 50K, 100K memories; × = P95 measured, · = avg measured]

100× more data. 1.21× slower queries.
|
|
332
|
+
```
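
The growth factors can be recomputed directly from the first and last rows of the latency table. The short check below uses only those published figures. On these numbers the 1.21× factor corresponds to the average column, while P95 grows by roughly 1.16×.

```python
# First and last rows of the latency table (milliseconds).
latency_ms = {
    1_000: {"avg": 11.6, "p95": 14.6},
    100_000: {"avg": 14.0, "p95": 16.9},
}


def growth(metric: str) -> float:
    """Latency at 100K memories divided by latency at 1K memories."""
    return latency_ms[100_000][metric] / latency_ms[1_000][metric]


data_growth = 100_000 // 1_000
print(data_growth, round(growth("avg"), 2), round(growth("p95"), 2))
# 100 1.21 1.16
```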
|
|
333
|
+
|
|
334
|
+
### Retrieval quality
|
|
335
|
+
|
|
336
|
+
Evaluated on 200 documents across 8 technical domains (security, auth, FastAPI, PostgreSQL, Redis, Kubernetes, monitoring, CI/CD). 40 queries, 5 per domain.
|
|
337
|
+
|
|
338
|
+
| Metric | Score |
|
|
339
|
+
|--------|------:|
|
|
340
|
+
| P@1 (first result correct domain) | **100%** |
|
|
341
|
+
| MRR (mean reciprocal rank) | **1.000** |
|
|
342
|
+
| P@3 micro-average | 78.3% |
|
|
343
|
+
| P@5 micro-average | 73.0% |
|
|
344
|
+
|
|
345
|
+
Every first result is from the correct domain across all 40 queries. Top-5 bleed is expected and reflects genuine semantic overlap (JWT tokens appear in both security and auth documents, CI/CD pipelines reference Kubernetes, etc.).
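
For readers unfamiliar with these metrics, the sketch below shows how micro-averaged P@k and MRR are computed when a result counts as correct if it comes from the query's domain. The two sample queries are made up for illustration and are not taken from the benchmark. The function names are likewise assumptions.

```python
def hits_at_k(ranked: list[str], target: str, k: int) -> int:
    """Number of the top-k results that come from the target domain."""
    return sum(1 for domain in ranked[:k] if domain == target)


def micro_precision_at_k(runs: list[tuple[str, list[str]]], k: int) -> float:
    """Pool hits over all queries, then divide by all retrieved slots."""
    total_hits = sum(hits_at_k(ranked, target, k) for target, ranked in runs)
    return total_hits / (k * len(runs))


def mean_reciprocal_rank(runs: list[tuple[str, list[str]]]) -> float:
    """Average of 1/rank of the first correct result (0 if none found)."""
    total = 0.0
    for target, ranked in runs:
        for rank, domain in enumerate(ranked, start=1):
            if domain == target:
                total += 1.0 / rank
                break
    return total / len(runs)


# Made-up example: (expected domain, domains of the top-5 results).
runs = [
    ("auth", ["auth", "auth", "security", "auth", "security"]),
    ("redis", ["redis", "postgresql", "redis", "redis", "redis"]),
]
print(micro_precision_at_k(runs, 1))            # 1.0
print(mean_reciprocal_rank(runs))               # 1.0
print(round(micro_precision_at_k(runs, 3), 3))  # 0.667
print(micro_precision_at_k(runs, 5))            # 0.7
```

As in the benchmark, a perfect P@1 and MRR can coexist with lower P@3 and P@5 when related domains appear further down the list.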
|
|
346
|
+
|
|
347
|
+
### Ingest throughput
|
|
348
|
+
|
|
349
|
+
| Path | Rate | Bottleneck |
|
|
350
|
+
|------|-----:|-----------|
|
|
351
|
+
| Real-time (store() call) | 42 docs/s | Embedding model (98% of time) |
|
|
352
|
+
| Bulk import | 300 docs/s | Embedding model only |
|
|
353
|
+
|
|
354
|
+
Embedding is the ceiling on both paths — FAISS add and SQLite write together account for ~2% of ingest time.
|
|
355
|
+
|
|
356
|
+
### Scaling ceiling
|
|
357
|
+
|
|
358
|
+
SMOS is production-ready for ≤100K memories on standard hardware. The lifecycle manager runs O(M) deduplication where M is the batch size (50), independent of total corpus size — dedup cycles stay at ~1.1 seconds whether you have 10K or 1M memories stored.
|
|
359
|
+
|
|
360
|
+
```
|
|
361
|
+
Component limits (tested: Ryzen 5 7640HS, 32 GB RAM, RTX 4050 Laptop 6 GB VRAM)
|
|
362
|
+
──────────────────────────────────────────────────────────────────────────
|
|
363
|
+
Query < 20ms P95 ████████████████████ 100K memories
|
|
364
|
+
Lifecycle functional ██████████████████████████████ 1M+ memories
|
|
365
|
+
Ingest rate 42 docs/s (real-time) / 300 docs/s (bulk)
|
|
366
|
+
Max document size ~50KB (qwen2.5:7b context window)
|
|
367
|
+
FAISS index size 147 MB at 100K memories / 1.4 GB at 1M memories
|
|
368
|
+
```
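
The index sizes quoted above are consistent with a back-of-the-envelope estimate for uncompressed vectors. The sketch below assumes a flat index that stores one 32-bit float per dimension with negligible overhead. The README gives the 384-dimension figure but does not state the index type, so treat the storage layout as an assumption.

```python
DIM = 384               # embedding dimension stated in this README
BYTES_PER_FLOAT32 = 4   # assumption: vectors stored as raw float32


def flat_index_bytes(n_vectors: int) -> int:
    """Estimated size of n uncompressed vectors, ignoring index overhead."""
    return n_vectors * DIM * BYTES_PER_FLOAT32


for n in (100_000, 1_000_000):
    size = flat_index_bytes(n)
    print(n, f"{size / 2**20:.1f} MiB", f"{size / 2**30:.2f} GiB")
# 100000 146.5 MiB 0.14 GiB
# 1000000 1464.8 MiB 1.43 GiB
```

Size scales linearly with the number of memories, so the estimate can be extended to other corpus sizes under the same assumption.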
|
|
369
|
+
|
|
370
|
+
---
|
|
371
|
+
|
|
372
|
+
## When SMOS saves tokens
|
|
373
|
+
|
|
374
|
+
SMOS pays off when the knowledge being accumulated exceeds what fits comfortably in context, or when the same codebase is visited more than once.
|
|
375
|
+
|
|
376
|
+
| Scenario | Savings |
|
|
377
|
+
|----------|---------|
|
|
378
|
+
| 50KB+ files (logs, generated code, docs) | Up to **154× context reduction per file** |
|
|
379
|
+
| Codebases > 30 files | Synthesis context stays fixed; baseline grows linearly |
|
|
380
|
+
| Multi-session work | Session 2+ queries stored memory; no re-reading |
|
|
381
|
+
| Repeated analysis from different angles | Query same compressed knowledge; pay once |
|
|
382
|
+
| Long agentic runs | Prior tool outputs stored out-of-context; don't accumulate |
|
|
383
|
+
|
|
384
|
+
**Single-session, small codebases (< 10 files, < 5KB each):** SMOS overhead exceeds savings. The tool is designed for sustained use and scale, not one-shot audits of tiny repos.
|
|
385
|
+
|
|
386
|
+
---
|
|
387
|
+
|
|
388
|
+
## Example use cases
|
|
389
|
+
|
|
390
|
+
### Codebase audit across many files
|
|
391
|
+
|
|
392
|
+
```
|
|
393
|
+
Read every file in src/ and give me a security audit.
|
|
394
|
+
```
|
|
395
|
+
|
|
396
|
+
Without SMOS, Claude reads 40 files → 60,000 tokens in context by the time it reaches synthesis. With SMOS, each file is compressed to ~85 tokens and stored. Synthesis pulls only what's relevant via semantic query. Context at synthesis: ~1,200 tokens.
|
|
397
|
+
|
|
398
|
+
---
|
|
399
|
+
|
|
400
|
+
### Multi-session feature work
|
|
401
|
+
|
|
402
|
+
Day 1 — Claude reads the auth module, database schema, and API contracts. All compressed and stored.
|
|
403
|
+
|
|
404
|
+
Day 2 — new session, zero re-reading:
|
|
405
|
+
|
|
406
|
+
```
|
|
407
|
+
What did we establish about the auth flow yesterday?
|
|
408
|
+
```
|
|
409
|
+
|
|
410
|
+
SMOS returns the stored context instantly. Claude picks up exactly where it left off without touching a file.
|
|
411
|
+
|
|
412
|
+
---
|
|
413
|
+
|
|
414
|
+
### Large log / generated file analysis
|
|
415
|
+
|
|
416
|
+
```
|
|
417
|
+
Read build/output.log and tell me what failed.
|
|
418
|
+
```
|
|
419
|
+
|
|
420
|
+
A 50KB build log would consume ~12,800 tokens in context and stay there. With SMOS, it compresses 154× to ~83 tokens. Claude gets the failure summary; the raw log never enters the window.
|
|
421
|
+
|
|
422
|
+
---
|
|
423
|
+
|
|
424
|
+
### Accumulating decisions across a long agent run
|
|
425
|
+
|
|
426
|
+
Claude is running a multi-step refactor — reading files, making decisions, writing changes. Without SMOS, every prior decision accumulates in context. With SMOS:
|
|
427
|
+
|
|
428
|
+
```python
|
|
429
|
+
tool_store_verbatim(content=diff, label="auth-refactor-step-3")
|
|
430
|
+
tool_semantic_store("Decided to replace JWT with session tokens — see verbatim key abc123")
|
|
431
|
+
```
|
|
432
|
+
|
|
433
|
+
Prior steps are queryable but out-of-context. The agent runs indefinitely without hitting the context ceiling.
|
|
434
|
+
|
|
435
|
+
---
|
|
436
|
+
|
|
437
|
+
### Repeated analysis from different angles
|
|
438
|
+
|
|
439
|
+
```
|
|
440
|
+
# Session 1
|
|
441
|
+
Analyse src/payments.py for performance issues.
|
|
442
|
+
|
|
443
|
+
# Session 2
|
|
444
|
+
Analyse src/payments.py for security issues.
|
|
445
|
+
```
|
|
446
|
+
|
|
447
|
+
Session 2 queries the compressed version stored in session 1 — no re-read, no re-embedding, instant retrieval. Analysis starts immediately from stored knowledge.
|
|
448
|
+
|
|
449
|
+
---
|
|
450
|
+
|
|
451
|
+
## How Claude uses it
|
|
452
|
+
|
|
453
|
+
Once installed, Claude follows this policy automatically (injected via `~/.claude/CLAUDE.md`):
|
|
454
|
+
|
|
455
|
+
1. **Query first** — before reading any file, call `tool_semantic_query`. If the answer is already in memory, skip the read entirely.
|
|
456
|
+
2. **Compress reads** — use `tool_read_file_compress` for any file not about to be edited. Raw source never enters context.
|
|
457
|
+
3. **Precise reads** — use the built-in Read tool only immediately before an `Edit` or `Write` call.
|
|
458
|
+
4. **Lossless storage** — code, diffs, and structured data go to `tool_store_verbatim` (no LLM compression, exact bytes on retrieval).
|
|
459
|
+
5. **Synthesise from memory** — use `tool_semantic_query` instead of re-reading already-compressed files.
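
The policy above can be read as a small decision procedure. The sketch below is a hypothetical restatement of it in code. The real policy is plain-text guidance in `CLAUDE.md`, and the `Action` enum and both function names are invented for this example.

```python
from enum import Enum


class Action(Enum):
    QUERY_MEMORY = "tool_semantic_query"
    COMPRESS_READ = "tool_read_file_compress"
    BUILTIN_READ = "Read"


def choose_read_action(already_in_memory: bool, about_to_edit: bool) -> Action:
    """Hypothetical encoding of rules 1 to 3 and 5 of the policy."""
    if about_to_edit:
        # Rule 3: exact source is needed immediately before Edit/Write.
        return Action.BUILTIN_READ
    if already_in_memory:
        # Rules 1 and 5: answer from memory instead of re-reading.
        return Action.QUERY_MEMORY
    # Rule 2: compress so the raw source stays out of context.
    return Action.COMPRESS_READ


def choose_store_tool(exact_bytes_matter: bool) -> str:
    """Rule 4: code, diffs, and structured data go to verbatim storage."""
    return "tool_store_verbatim" if exact_bytes_matter else "tool_semantic_store"


print(choose_read_action(already_in_memory=True, about_to_edit=False).value)
print(choose_store_tool(exact_bytes_matter=True))
```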
|
|
460
|
+
|
|
461
|
+
---
|
|
462
|
+
|
|
463
|
+
## Tools
|
|
464
|
+
|
|
465
|
+
| Tool | Description |
|
|
466
|
+
|------|------------|
|
|
467
|
+
| `tool_read_file_compress` | Read a file, compress with local LLM, store summary. Raw file never enters context window. Accepts absolute paths. |
|
|
468
|
+
| `tool_semantic_store` | Store any text as a queryable semantic memory. |
|
|
469
|
+
| `tool_semantic_query` | Retrieve compressed context via natural language. Returns summary + confidence + sources. |
|
|
470
|
+
| `tool_semantic_write` | Store a typed, tagged memory object (doc / adr / log / issue). |
|
|
471
|
+
| `tool_store_verbatim` | Store exact content losslessly — code, diffs, any artifact where exact bytes matter. Returns a retrieval key. |
|
|
472
|
+
| `tool_retrieve` | Retrieve verbatim content by key. |
|
|
473
|
+
| `tool_write_file_safe` | Write files to the sandboxed workspace directory. |
|
|
474
|
+
|
|
475
|
+
---
|
|
476
|
+
|
|
477
|
+
## Configuration
|
|
478
|
+
|
|
479
|
+
Environment variables (set during `smos setup` or in your shell):
|
|
480
|
+
|
|
481
|
+
| Variable | Default | Description |
|
|
482
|
+
|----------|---------|-------------|
|
|
483
|
+
| `OLLAMA_BASE_URL` | `http://localhost:11434/v1` | Ollama endpoint |
|
|
484
|
+
| `OLLAMA_MODEL` | `qwen2.5:7b` | Summarization model |
|
|
485
|
+
| `SUMMARIZER_MAX_TOKENS` | `512` | Max tokens per summary output |
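
As an illustration of how these variables and defaults fit together, the following sketch reads them from the environment. The `Settings` dataclass and `load_settings` function are hypothetical and are not part of the package's API. Only the variable names and default values come from the table above.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    base_url: str
    model: str
    max_tokens: int


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Hypothetical loader: fall back to the documented defaults."""
    return Settings(
        base_url=env.get("OLLAMA_BASE_URL", "http://localhost:11434/v1"),
        model=env.get("OLLAMA_MODEL", "qwen2.5:7b"),
        max_tokens=int(env.get("SUMMARIZER_MAX_TOKENS", "512")),
    )


defaults = load_settings({})
custom = load_settings({"OLLAMA_MODEL": "qwen2.5:3b", "SUMMARIZER_MAX_TOKENS": "256"})
print(defaults.model, defaults.max_tokens)  # qwen2.5:7b 512
print(custom.model, custom.max_tokens)      # qwen2.5:3b 256
```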
|
|
486
|
+
|
|
487
|
+
### Model options (chosen during setup)
|
|
488
|
+
|
|
489
|
+
| Model | Size | Min RAM | Compression quality |
|
|
490
|
+
|-------|-----:|:-------:|-------------------|
|
|
491
|
+
| `qwen2.5:7b` | 4.7 GB | 8 GB | Best (benchmarked) — GPU-accelerated if CUDA available |
|
|
492
|
+
| `qwen2.5:3b` | 2.0 GB | 4 GB | Good |
|
|
493
|
+
| `qwen2.5:1.5b`| 0.9 GB | 4 GB | Fast |
|
|
494
|
+
| none | — | — | Extractive fallback (first sentences only) |
|
|
495
|
+
|
|
496
|
+
The RTX 4050 (or any CUDA GPU) will be used automatically by Ollama if available, reducing LLM latency from ~10s to ~2–3s per compression call.
|
|
497
|
+
|
|
498
|
+
Without Ollama, SMOS falls back to extractive summarization. Semantic querying and verbatim storage work normally — only LLM-driven compression degrades.
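
An extractive fallback of the "first sentences only" kind can be sketched as below. This is an assumed, simplified version: the function name `extractive_summary`, the sentence-splitting regex, and the default of three sentences are choices made for this example, not details confirmed by the package.

```python
import re


def extractive_summary(text: str, max_sentences: int = 3) -> str:
    """Keep the first few sentences verbatim; no LLM involved."""
    # Split after ., ! or ? when followed by whitespace (a rough heuristic).
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return " ".join(s for s in sentences[:max_sentences] if s)


doc = "Loads config. Connects to the database! Retries on failure? Logs errors."
print(extractive_summary(doc, max_sentences=2))
# Loads config. Connects to the database!
```

Unlike LLM compression, this keeps only whatever the opening sentences happen to say, which is why summary quality degrades without Ollama.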
|
|
499
|
+
|
|
500
|
+
---
|
|
501
|
+
|
|
502
|
+
## Data
|
|
503
|
+
|
|
504
|
+
Each project gets its own isolated data store. SMOS creates a `.smos/` folder in the project root the first time Claude Code opens the project — no manual setup required.
|
|
505
|
+
|
|
506
|
+
```
|
|
507
|
+
<your-project>/
|
|
508
|
+
└── .smos/
    ├── faiss.index   - vector index for this project (147 MB at 100K memories)
    ├── metadata.db   - SQLite: content, summaries, tiers, verbatim store
    ├── workspace/    - sandboxed file write area
    └── logs/         - write audit log

~/.smos/
└── .env              - global model preference (OLLAMA_MODEL=qwen2.5:7b)
|
|
516
|
+
```
|
|
517
|
+
|
|
518
|
+
Nothing leaves your machine. Memories from Project A never appear in Project B queries.
|
|
519
|
+
|
|
520
|
+
To delete a project's memory:
|
|
521
|
+
|
|
522
|
+
```bash
|
|
523
|
+
# Windows
|
|
524
|
+
Remove-Item -Recurse -Force .smos
|
|
525
|
+
|
|
526
|
+
# macOS / Linux
|
|
527
|
+
rm -rf .smos
|
|
528
|
+
```
|
|
529
|
+
|
|
530
|
+
Add `.smos/` to your `.gitignore` to keep memory data out of version control (SMOS does this automatically for new projects).
|
|
531
|
+
|
|
532
|
+
The database survives crashes: on restart, SMOS detects FAISS/SQLite divergence and rebuilds the index from SQLite automatically (re-embeds all content in batches of 256).
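
The recovery step can be pictured as a count check followed by a batched re-embed. The sketch below is a simplified illustration under stated assumptions: comparing row counts is one plausible way to detect divergence (the actual check is not described here), and `needs_rebuild`, `batches`, `rebuild`, and `toy_embed` are hypothetical names. Only the batch size of 256 comes from the text above.

```python
from typing import Callable, Iterator, Sequence

BATCH_SIZE = 256  # batch size stated in this README


def needs_rebuild(index_count: int, db_count: int) -> bool:
    """Assumed divergence test: vector count differs from row count."""
    return index_count != db_count


def batches(items: Sequence[str], size: int = BATCH_SIZE) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def rebuild(
    contents: Sequence[str],
    embed: Callable[[Sequence[str]], list[list[float]]],
) -> list[list[float]]:
    """Re-embed everything batch by batch, preserving row order."""
    vectors: list[list[float]] = []
    for batch in batches(contents):
        vectors.extend(embed(batch))
    return vectors


def toy_embed(batch: Sequence[str]) -> list[list[float]]:
    """Stand-in for a real embedding model: one number per text."""
    return [[float(len(text))] for text in batch]


contents = [f"memory {i}" for i in range(600)]
if needs_rebuild(index_count=598, db_count=len(contents)):
    vectors = rebuild(contents, toy_embed)
print([len(b) for b in batches(contents)], len(vectors))
# [256, 256, 88] 600
```

Batching bounds peak memory during recovery, at the cost of re-embedding the entire store whenever the two files disagree.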
|
|
533
|
+
|
|
534
|
+
---
|
|
535
|
+
|
|
536
|
+
## Update
|
|
537
|
+
|
|
538
|
+
```bash
|
|
539
|
+
smos update
|
|
540
|
+
```
|
|
541
|
+
|
|
542
|
+
Pulls the latest version from GitHub and upgrades in place. Restart Claude Code afterward.
|
|
543
|
+
|
|
544
|
+
Check your current version:
|
|
545
|
+
|
|
546
|
+
```bash
|
|
547
|
+
smos --version
|
|
548
|
+
```
|
|
549
|
+
|
|
550
|
+
**To get notified of new releases:** go to the [GitHub repo](https://github.com/Witchd0ct0r/Semantic_Memory_Operating_System_SMOS), click **Watch → Custom → Releases**.
|
|
551
|
+
|
|
552
|
+
---
|
|
553
|
+
|
|
554
|
+
## Uninstall
|
|
555
|
+
|
|
556
|
+
```bash
|
|
557
|
+
smos uninstall
|
|
558
|
+
```
|
|
559
|
+
|
|
560
|
+
This removes:
|
|
561
|
+
|
|
562
|
+
- **MCP registration** — `claude mcp remove smos` (runs automatically)
|
|
563
|
+
- **CLAUDE.md policy block** — strips the injected file-reading policy from `~/.claude/CLAUDE.md`
|
|
564
|
+
- **Global config** — prompts before deleting `~/.smos/` (model preference only)
|
|
565
|
+
|
|
566
|
+
**Per-project data** (`.smos/` in each project folder) must be deleted manually — the uninstaller can't know which projects you've used SMOS in:
|
|
567
|
+
|
|
568
|
+
```bash
|
|
569
|
+
# Windows (run inside the project folder)
|
|
570
|
+
Remove-Item -Recurse -Force .smos
|
|
571
|
+
|
|
572
|
+
# macOS / Linux
|
|
573
|
+
rm -rf .smos
|
|
574
|
+
```
|
|
575
|
+
|
|
576
|
+
The Python package itself is **not** removed automatically — run `pip uninstall smos-mcp` afterward if you want that too.
|
|
577
|
+
|
|
578
|
+
Ollama models are **not** removed — they are shared system-wide. To remove manually:
|
|
579
|
+
|
|
580
|
+
```bash
|
|
581
|
+
ollama rm qwen2.5:7b
|
|
582
|
+
```
|
|
583
|
+
|
|
584
|
+
Dry-run to preview what would be removed without touching anything:
|
|
585
|
+
|
|
586
|
+
```bash
|
|
587
|
+
smos uninstall --dry-run
|
|
588
|
+
```
|
|
589
|
+
|
|
590
|
+
---
|
|
591
|
+
|
|
592
|
+
## Development
|
|
593
|
+
|
|
594
|
+
```bash
|
|
595
|
+
git clone https://github.com/Witchd0ct0r/Semantic_Memory_Operating_System_SMOS
|
|
596
|
+
cd Semantic_Memory_Operating_System_SMOS
|
|
597
|
+
pip install -e ".[dev]"
|
|
598
|
+
pytest tests/ # 31 tests
|
|
599
|
+
python -m smos # run the server directly
|
|
600
|
+
```
|
|
601
|
+
|
|
602
|
+
---
|
|
603
|
+
|
|
604
|
+
## License
|
|
605
|
+
|
|
606
|
+
MIT
|