smos-mcp 0.1.2__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (33)
  1. smos_mcp-0.1.2/LICENSE +21 -0
  2. smos_mcp-0.1.2/PKG-INFO +606 -0
  3. smos_mcp-0.1.2/README.md +576 -0
  4. smos_mcp-0.1.2/pyproject.toml +52 -0
  5. smos_mcp-0.1.2/setup.cfg +4 -0
  6. smos_mcp-0.1.2/smos/__init__.py +3 -0
  7. smos_mcp-0.1.2/smos/__main__.py +3 -0
  8. smos_mcp-0.1.2/smos/compression/__init__.py +0 -0
  9. smos_mcp-0.1.2/smos/compression/context_builder.py +104 -0
  10. smos_mcp-0.1.2/smos/install/__init__.py +0 -0
  11. smos_mcp-0.1.2/smos/install/cli.py +376 -0
  12. smos_mcp-0.1.2/smos/llm/__init__.py +0 -0
  13. smos_mcp-0.1.2/smos/llm/client.py +15 -0
  14. smos_mcp-0.1.2/smos/llm/summarizer.py +128 -0
  15. smos_mcp-0.1.2/smos/memory/__init__.py +0 -0
  16. smos_mcp-0.1.2/smos/memory/embeddings.py +24 -0
  17. smos_mcp-0.1.2/smos/memory/lifecycle.py +87 -0
  18. smos_mcp-0.1.2/smos/memory/schemas.py +34 -0
  19. smos_mcp-0.1.2/smos/memory/vector_store.py +335 -0
  20. smos_mcp-0.1.2/smos/server.py +188 -0
  21. smos_mcp-0.1.2/smos/tools/__init__.py +0 -0
  22. smos_mcp-0.1.2/smos/tools/file_tools.py +75 -0
  23. smos_mcp-0.1.2/smos/tools/semantic_tools.py +49 -0
  24. smos_mcp-0.1.2/smos_mcp.egg-info/PKG-INFO +606 -0
  25. smos_mcp-0.1.2/smos_mcp.egg-info/SOURCES.txt +31 -0
  26. smos_mcp-0.1.2/smos_mcp.egg-info/dependency_links.txt +1 -0
  27. smos_mcp-0.1.2/smos_mcp.egg-info/entry_points.txt +3 -0
  28. smos_mcp-0.1.2/smos_mcp.egg-info/requires.txt +10 -0
  29. smos_mcp-0.1.2/smos_mcp.egg-info/top_level.txt +1 -0
  30. smos_mcp-0.1.2/tests/test_file_tools.py +82 -0
  31. smos_mcp-0.1.2/tests/test_semantic_tools.py +70 -0
  32. smos_mcp-0.1.2/tests/test_summarizer.py +70 -0
  33. smos_mcp-0.1.2/tests/test_vector_store.py +102 -0
smos_mcp-0.1.2/LICENSE ADDED
@@ -0,0 +1,21 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2026 Witchd0ct0r
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.
smos_mcp-0.1.2/PKG-INFO ADDED
@@ -0,0 +1,606 @@
1
+ Metadata-Version: 2.4
2
+ Name: smos-mcp
3
+ Version: 0.1.2
4
+ Summary: Semantic Memory Operating System — persistent memory MCP server for Claude Code
5
+ License: MIT
6
+ Project-URL: Homepage, https://github.com/Witchd0ct0r/Semantic_Memory_Operating_System_SMOS
7
+ Project-URL: Repository, https://github.com/Witchd0ct0r/Semantic_Memory_Operating_System_SMOS
8
+ Project-URL: Issues, https://github.com/Witchd0ct0r/Semantic_Memory_Operating_System_SMOS/issues
9
+ Keywords: claude,mcp,memory,semantic,ai,llm
10
+ Classifier: Development Status :: 4 - Beta
11
+ Classifier: Intended Audience :: Developers
12
+ Classifier: License :: OSI Approved :: MIT License
13
+ Classifier: Programming Language :: Python :: 3
14
+ Classifier: Programming Language :: Python :: 3.10
15
+ Classifier: Programming Language :: Python :: 3.11
16
+ Classifier: Programming Language :: Python :: 3.12
17
+ Requires-Python: >=3.10
18
+ Description-Content-Type: text/markdown
19
+ License-File: LICENSE
20
+ Requires-Dist: mcp[cli]<2.0.0,>=1.0.0
21
+ Requires-Dist: faiss-cpu>=1.8.0
22
+ Requires-Dist: sentence-transformers<4.0.0,>=3.0.0
23
+ Requires-Dist: openai<2.0.0,>=1.50.0
24
+ Requires-Dist: pydantic<3.0.0,>=2.5.0
25
+ Requires-Dist: numpy>=1.26.0
26
+ Provides-Extra: dev
27
+ Requires-Dist: pytest>=8.0.0; extra == "dev"
28
+ Requires-Dist: pytest-asyncio>=0.23.0; extra == "dev"
29
+ Dynamic: license-file
30
+
31
+ <h1 align="center">SMOS</h1>
32
+
33
+ <p align="center"><strong>Semantic Memory Operating System for Claude Code</strong></p>
34
+
35
+ <p align="center">
36
+ Compress files out of context. Query knowledge by meaning. Persist across sessions.
37
+ </p>
38
+
39
+ <p align="center">
40
+ <a href="https://github.com/Witchd0ct0r/Semantic_Memory_Operating_System_SMOS/stargazers"><img src="https://img.shields.io/github/stars/Witchd0ct0r/Semantic_Memory_Operating_System_SMOS?style=flat&color=blue" alt="Stars"></a>
41
+ <a href="LICENSE"><img src="https://img.shields.io/github/license/Witchd0ct0r/Semantic_Memory_Operating_System_SMOS?style=flat" alt="License"></a>
42
+ <a href="https://pypi.org/project/smos-mcp/"><img src="https://img.shields.io/pypi/v/smos-mcp?style=flat" alt="PyPI"></a>
43
+ <img src="https://img.shields.io/badge/python-3.10%2B-blue?style=flat" alt="Python">
44
+ </p>
45
+
46
+ <p align="center">
47
+ <a href="#the-problem">Problem</a> •
48
+ <a href="#how-it-works">How it works</a> •
49
+ <a href="#compression-in-practice">In practice</a> •
50
+ <a href="#install">Install</a> •
51
+ <a href="#benchmarks">Benchmarks</a> •
52
+ <a href="#example-use-cases">Examples</a> •
53
+ <a href="#tools">Tools</a> •
54
+ <a href="#configuration">Config</a> •
55
+ <a href="#update">Update</a> •
56
+ <a href="#uninstall">Uninstall</a>
57
+ </p>
58
+
59
+ ---
60
+
61
+ ## The problem
62
+
63
+ Every file Claude reads stays in the context window until the session ends. On a 20-file codebase, by the time Claude reaches synthesis it's carrying 40,000+ tokens of raw source — most of which it already processed, will never need again verbatim, and is paying for on every single API call.
64
+
65
+ **Caveman** compresses what Claude *says*. **SMOS** compresses what Claude *holds* — the files, the prior analysis, the context window itself.
66
+
67
+ ---
68
+
69
+ ## How it works
70
+
71
+ SMOS is an MCP server that gives Claude a persistent memory layer: FAISS vector search + SQLite, powered by a local LLM (qwen2.5 via Ollama) for compression.
72
+
73
+ Instead of reading a file with the built-in Read tool and leaving it in context forever, Claude calls `tool_read_file_compress`. The file is summarised by a local LLM, stored in the vector index, and **the raw source never enters the context window**. At synthesis time, Claude queries the semantic index rather than re-reading anything.
74
+
75
+ ```
76
+ WITHOUT SMOS WITH SMOS
77
+ ────────────────────────────────── ──────────────────────────────────
78
+ Read file.py (3,000 tokens) → tool_read_file_compress(file.py)
79
+ → stays in context forever → local LLM compresses to ~85 tokens
80
+ → stored in FAISS + SQLite
81
+ → nothing in context window
82
+
83
+ 10 files read → 30,000 ctx tokens 10 files compressed → ~850 ctx tokens
84
+
85
+ Synthesis: Synthesis:
86
+ still carrying 30,000 tokens → 4 semantic queries × ~300 tokens
87
+ on every API call → = ~1,200 tokens total
88
+ 35× smaller context at synthesis
89
+ ```
90
+
91
+ Memory persists across sessions. Session 2 queries what Session 1 stored — no re-reading.
92
+
93
+ ---
94
+
95
+ ## Compression in practice
96
+
97
+ ### Input — raw file (312 tokens in context without SMOS)
98
+
99
+ ```python
100
+ # smos/memory/vector_store.py (excerpt)
101
+
102
+ def store(self, content: str, metadata: dict | None = None, tier: str = "working") -> str:
103
+ meta = metadata or {}
104
+ doc_id = str(uuid.uuid4())
105
+ ts = datetime.now(timezone.utc).isoformat()
106
+
107
+ summary = self._summarizer.summarize(content)
108
+ embedding = self._embed(summary)
109
+
110
+ with self._lock:
111
+ idx = self._index.ntotal
112
+ self._index.add(embedding.reshape(1, -1))
113
+ self._db.execute(
114
+ "INSERT INTO memories VALUES (?,?,?,?,?,?)",
115
+ (doc_id, content, summary, json.dumps(meta), tier, ts),
116
+ )
117
+ self._db.commit()
118
+ self._id_map[idx] = doc_id
119
+ return doc_id
120
+
121
+ def query(self, text: str, top_k: int = 5) -> list[dict]:
122
+ embedding = self._embed(text)
123
+ distances, indices = self._index.search(embedding.reshape(1, -1), top_k)
124
+ results = []
125
+ for dist, idx in zip(distances[0], indices[0]):
126
+ if idx == -1:
127
+ continue
128
+ doc_id = self._id_map.get(int(idx))
129
+ row = self._db.execute(
130
+ "SELECT content, summary, metadata, tier, created_at FROM memories WHERE id = ?",
131
+ (doc_id,),
132
+ ).fetchone()
133
+ if row:
134
+ results.append({
135
+ "id": doc_id,
136
+ "summary": row[1],
137
+ "score": float(dist),
138
+ "tier": row[3],
139
+ })
140
+ return results
141
+ ```
142
+
143
+ ### Output — LLM summary stored in SMOS (42 tokens)
144
+
145
+ ```
146
+ Stores text with LLM-compressed embedding into FAISS index and SQLite.
147
+ Assigns UUID, timestamps entry, generates summary via summarizer, embeds
148
+ with sentence-transformer, persists metadata and tier. Query method embeds
149
+ input text, searches FAISS for nearest neighbours, returns scored results
150
+ with summary and tier.
151
+ ```
152
+
153
+ **42 tokens stored. 312 tokens never entered the context window. 7.4× compression on this excerpt.**
154
+
155
+ ### What's written to SQLite
156
+
157
+ ```
158
+ id: f3a2b1c0-8d4e-4f7a-9b2c-1e5d6f3a2b1c
159
+ summary: Stores text with LLM-compressed embedding into FAISS...
160
+ content: [original source, stored for lossless retrieval if needed]
161
+ metadata: {"source": "smos/memory/vector_store.py"}
162
+ tier: working
163
+ created_at: 2026-06-22T14:23:11.847Z
164
+ ```
165
+
166
+ The FAISS index stores the 384-dimensional embedding of the summary. Queries embed the search string and find nearest neighbours by cosine distance — no keywords, no exact match required.
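The nearest-neighbour lookup described above can be illustrated with a small pure-Python sketch. It is not SMOS code: the real system uses FAISS over 384-dimensional sentence-transformer embeddings, whereas this example uses hand-written 3-dimensional toy vectors, and `cosine_similarity`, `nearest`, and the document ids are hypothetical names chosen for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest(query, stored, top_k=2):
    """Return the top_k (doc_id, score) pairs, most similar first."""
    scored = [(cosine_similarity(query, vec), doc_id) for doc_id, vec in stored.items()]
    scored.sort(reverse=True)
    return [(doc_id, score) for score, doc_id in scored[:top_k]]


# Toy "embeddings" standing in for summary vectors (assumption: 3 dims, not 384).
stored = {
    "storage": [0.9, 0.1, 0.0],
    "auth": [0.1, 0.9, 0.2],
    "logging": [0.0, 0.2, 0.9],
}
query = [0.8, 0.2, 0.1]  # stands in for the embedded search string

results = nearest(query, stored)
for doc_id, score in results:
    print(f"{doc_id}: {score:.2f}")
```

No keyword overlap is involved: ranking depends only on the angle between vectors, which is why a query phrased differently from the stored summary can still match.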
167
+
168
+ ### Query result returned to Claude
169
+
170
+ ```
171
+ tool_semantic_query("how does storage work")
172
+
173
+ → score: 0.91
174
+ summary: "Stores text with LLM-compressed embedding into FAISS index
175
+ and SQLite. Assigns UUID, timestamps entry, generates summary
176
+ via summarizer, embeds with sentence-transformer..."
177
+ source: smos/memory/vector_store.py
178
+ tier: working
179
+ ```
180
+
181
+ Claude gets the 42-token summary and a confidence score. The 312-token source stays on disk.
182
+
183
+ ### Lossless retrieval — when you need the exact original
184
+
185
+ For code, diffs, or any content where exact bytes matter, use the verbatim path instead:
186
+
187
+ ```
188
+ tool_store_verbatim(content="<full source>", label="vector_store.py store+query methods")
189
+ → key: "f3a2b1c0-8d4e-4f7a-9b2c-1e5d6f3a2b1c"
190
+ ```
191
+
192
+ Retrieve it later by key:
193
+
194
+ ```
195
+ tool_retrieve(key="f3a2b1c0-8d4e-4f7a-9b2c-1e5d6f3a2b1c")
196
+
197
+ → key: f3a2b1c0-8d4e-4f7a-9b2c-1e5d6f3a2b1c
198
+ label: vector_store.py store+query methods
199
+ timestamp: 2026-06-22T14:23:11.847Z
200
+ content:
201
+ def store(self, content: str, metadata: dict | None = None, tier: str = "working") -> str:
202
+ meta = metadata or {}
203
+ doc_id = str(uuid.uuid4())
204
+ ... [exact original, byte-for-byte]
205
+ ```
206
+
207
+ **Verbatim storage has no compression and no semantic index — it is a pure key-value store.** Use it when you need to reconstruct a diff, apply a patch, or pass exact code to a tool.
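As a rough illustration of a pure key-value verbatim path, the sketch below stores content unchanged in SQLite and returns it by key. It assumes a table layout (`key`, `label`, `content`, `timestamp`) inferred from the example output above; the class name `VerbatimStore` and its methods are hypothetical and are not taken from the package.

```python
import sqlite3
import uuid
from datetime import datetime, timezone


class VerbatimStore:
    """Minimal lossless key-value store backed by SQLite (illustrative only)."""

    def __init__(self, path=":memory:"):
        self._db = sqlite3.connect(path)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS verbatim "
            "(key TEXT PRIMARY KEY, label TEXT, content TEXT, timestamp TEXT)"
        )

    def store(self, content, label):
        """Save content exactly as given and return its retrieval key."""
        key = str(uuid.uuid4())
        ts = datetime.now(timezone.utc).isoformat()
        self._db.execute(
            "INSERT INTO verbatim VALUES (?, ?, ?, ?)", (key, label, content, ts)
        )
        self._db.commit()
        return key

    def retrieve(self, key):
        """Return the stored record, or None if the key is unknown."""
        row = self._db.execute(
            "SELECT label, content, timestamp FROM verbatim WHERE key = ?", (key,)
        ).fetchone()
        if row is None:
            return None
        return {"key": key, "label": row[0], "content": row[1], "timestamp": row[2]}


store = VerbatimStore()
original = "def store(self, content):\n    return content\n"
key = store.store(original, label="example snippet")
record = store.retrieve(key)
```

Because nothing is summarised or embedded, the only way back to the content is the key, which matches the trade-off described above.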
208
+
209
+ ### Which path to use
210
+
211
+ | Content type | Tool | Retrieval |
212
+ |---|---|---|
213
+ | Prose, analysis, docs, logs | `tool_read_file_compress` / `tool_semantic_store` | `tool_semantic_query` (by meaning) |
214
+ | Code, diffs, structured data | `tool_store_verbatim` | `tool_retrieve` (by key) |
215
+
216
+ ---
217
+
218
+ ## Install
219
+
220
+ **Prerequisites:** Python 3.10+, [Claude Code](https://claude.ai/code), [Ollama](https://ollama.com/download)
221
+
222
+ ```bash
223
+ pip install smos-mcp
224
+ smos setup
225
+ ```
226
+
227
+ > **Note:** `smos-mcp` is the PyPI package name. The CLI commands installed are `smos` and `smos-server`.
228
+
229
+ The setup wizard handles everything else: Python deps, model selection, model pull, MCP registration, and CLAUDE.md policy injection. Restart Claude Code when done.
230
+
231
+ ```bash
232
+ claude mcp list # verify: smos should appear
233
+ ```
234
+
235
+ Alternatively, install directly from GitHub:
236
+
237
+ ```bash
238
+ pip install git+https://github.com/Witchd0ct0r/Semantic_Memory_Operating_System_SMOS.git
239
+ smos setup
240
+ ```
241
+
242
+ ### If `smos` is not found after install
243
+
244
+ pip installs CLI scripts to a directory that may not be on your `PATH`. Find it:
245
+
246
+ ```bash
247
+ python -c "import sysconfig; print(sysconfig.get_path('scripts', sysconfig.get_preferred_scheme('user')))"
248
+ ```
249
+
250
+ Then add it permanently:
251
+
252
+ **Windows (PowerShell)**
253
+
254
+ ```powershell
255
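+ $scripts = python -c "import sysconfig; print(sysconfig.get_path('scripts', sysconfig.get_preferred_scheme('user')))"
256
+ [Environment]::SetEnvironmentVariable("Path", ([Environment]::GetEnvironmentVariable("Path", "User") + ";$scripts"), "User")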
257
+ # Restart PowerShell for the change to take effect
258
+ ```
259
+
260
+ **macOS (zsh)**
261
+
262
+ ```bash
263
+ echo 'export PATH="$(python3 -m site --user-base)/bin:$PATH"' >> ~/.zshrc && source ~/.zshrc
264
+ ```
265
+
266
+ **Linux (bash)**
267
+
268
+ ```bash
269
+ echo 'export PATH="$(python3 -m site --user-base)/bin:$PATH"' >> ~/.bashrc && source ~/.bashrc
270
+ ```
271
+
272
+ > **conda / miniconda users:** Scripts land in `$CONDA_PREFIX\Scripts` (Windows) or `$CONDA_PREFIX/bin` (macOS/Linux). These are on PATH when the conda environment is active — if you installed into `base` or an active env, `smos` should work immediately after activating that environment.
273
+
274
+ ---
275
+
276
+ ## Benchmarks
277
+
278
+ All numbers measured on real data. Benchmarks live in [`tests/`](./tests/).
279
+
280
+ > **Test hardware:** AMD Ryzen 5 7640HS (6C / 12T, 4.3 GHz) · 32 GB RAM · RTX 4050 Laptop 6 GB VRAM · 1 TB Kioxia NVMe · Windows 11
281
+
282
+ ### Compression quality
283
+
284
+ Local LLM (qwen2.5:7b via Ollama) compresses files to a fixed-length summary. **Factual retention is 100% at all sizes** — all seeded keywords recovered from every summary across 3 independent runs.
285
+
286
+ | File size | Tokens in context (before) | Tokens after SMOS | Compression | Retention |
287
+ |-----------|:--------------------------:|:-----------------:|:-----------:|:---------:|
288
+ | 1 KB | ~260 | ~85 | **3.1×** | 100% |
289
+ | 5 KB | ~1,300 | ~110 | **11.8×** | 100% |
290
+ | 10 KB | ~2,585 | ~76 | **34.2×** | 100% |
291
+ | 50 KB | ~12,825 | ~83 | **154.7×** | 100% |
292
+ | **avg** | | | **51×** | **100%** |
293
+
294
+ At 50KB+ files — typical for large modules, log files, or generated content — SMOS compresses **154× with zero factual loss**. Summary length plateaus at ~330–440 characters regardless of input size above ~5KB; the LLM abstracts to a fixed-length output.
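The compression column can be recomputed from the token counts in the table. This is only a sanity-check sketch: the token counts are the rounded `~` values from the table, so the recomputed ratios differ slightly from the published ones (for example about 34.0 rather than 34.2 for the 10 KB row).

```python
# (tokens before, tokens after) per file size, copied from the table above.
rows = {
    "1 KB": (260, 85),
    "5 KB": (1300, 110),
    "10 KB": (2585, 76),
    "50 KB": (12825, 83),
}

# Compression ratio = tokens that would sit in context / tokens actually stored.
ratios = {size: before / after for size, (before, after) in rows.items()}

# The "avg" row is the unweighted mean of the per-row ratios.
mean_ratio = sum(ratios.values()) / len(ratios)

for size, ratio in ratios.items():
    print(f"{size}: {ratio:.1f}x")
print(f"mean: {mean_ratio:.0f}x")
```

Note that the mean is dominated by the 50 KB row; for a workload of mostly small files the per-file ratio is closer to the 3× to 12× range.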
295
+
296
+ ```
297
+ Context window pressure at synthesis (20-file codebase, 5KB avg)
298
+ ──────────────────────────────────────────────────────────────────
299
+
300
+ Without SMOS ████████████████████████████████████████ 26,000 tokens
301
+ With SMOS ██░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 2,200 tokens
302
+
303
+ ▲ 91% smaller
304
+ ```
305
+
306
+ ### Query latency
307
+
308
+ SMOS queries are fast and scale gracefully. Average query latency grows only **1.21× when data grows 100×** (11.6 ms → 14.0 ms); P95 grows about 1.16× (14.6 ms → 16.9 ms). FAISS uses SIMD dot-product batching that stays sub-linear up to ~500K entries on standard hardware.
309
+
310
+ | Memories stored | Query avg | Query P95 | Query P99 |
311
+ |:--------------:|:---------:|:---------:|:---------:|
312
+ | 1,000 | 11.6 ms | 14.6 ms | 14.6 ms |
313
+ | 5,000 | 11.4 ms | 14.6 ms | 14.6 ms |
314
+ | 10,000 | 12.3 ms | 16.3 ms | 16.3 ms |
315
+ | 50,000 | 11.2 ms | 13.1 ms | 13.1 ms |
316
+ | 100,000 | 14.0 ms | 16.9 ms | 16.9 ms |
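The growth factors can be derived directly from the first and last rows of the table. The sketch below only re-does that arithmetic; it does not measure anything.

```python
# (average ms, P95 ms) at the smallest and largest corpus sizes in the table.
latency_ms = {
    1_000: (11.6, 14.6),
    100_000: (14.0, 16.9),
}

small_avg, small_p95 = latency_ms[1_000]
large_avg, large_p95 = latency_ms[100_000]

data_growth = 100_000 / 1_000        # corpus grew 100x
avg_growth = large_avg / small_avg   # slowdown of the average query
p95_growth = large_p95 / small_p95   # slowdown of the P95 query

print(f"data: {data_growth:.0f}x, avg: {avg_growth:.2f}x, P95: {p95_growth:.2f}x")
```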
317
+
318
+ ```
319
+ Query latency vs. corpus size
320
+ ─────────────────────────────
321
+ 20ms │ · · · · · · ·
322
+
323
+ 15ms │ × × × × × = P95 measured
324
+ │ · · · · ·
325
+ 10ms │ · · = avg measured
326
+
327
+ 5ms │
328
+ └──────────────────────────
329
+ 1K 5K 10K 50K 100K
330
+
331
+ 100× more data. 1.21× slower queries.
332
+ ```
333
+
334
+ ### Retrieval quality
335
+
336
+ Evaluated on 200 documents across 8 technical domains (security, auth, FastAPI, PostgreSQL, Redis, Kubernetes, monitoring, CI/CD). 40 queries, 5 per domain.
337
+
338
+ | Metric | Score |
339
+ |--------|------:|
340
+ | P@1 (first result correct domain) | **100%** |
341
+ | MRR (mean reciprocal rank) | **1.000** |
342
+ | P@3 micro-average | 78.3% |
343
+ | P@5 micro-average | 73.0% |
344
+
345
+ Every first result is from the correct domain across all 40 queries. Top-5 bleed is expected and reflects genuine semantic overlap (JWT tokens appear in both security and auth documents, CI/CD pipelines reference Kubernetes, etc.).
346
+
347
+ ### Ingest throughput
348
+
349
+ | Path | Rate | Bottleneck |
350
+ |------|-----:|-----------|
351
+ | Real-time (store() call) | 42 docs/s | Embedding model (98% of time) |
352
+ | Bulk import | 300 docs/s | Embedding model only |
353
+
354
+ Embedding is the ceiling on both paths — FAISS add and SQLite write together account for ~2% of ingest time.
355
+
356
+ ### Scaling ceiling
357
+
358
+ SMOS is production-ready for ≤100K memories on standard hardware. The lifecycle manager runs O(M) deduplication where M is the batch size (50), independent of total corpus size — dedup cycles stay at ~1.1 seconds whether you have 10K or 1M memories stored.
359
+
360
+ ```
361
+ Component limits (tested: Ryzen 5 7640HS, 32 GB RAM, RTX 4050 Laptop 6 GB VRAM)
362
+ ──────────────────────────────────────────────────────────────────────────
363
+ Query < 20ms P95 ████████████████████ 100K memories
364
+ Lifecycle functional ██████████████████████████████ 1M+ memories
365
+ Ingest rate 42 docs/s (real-time) / 300 docs/s (bulk)
366
+ Max document size ~50KB (qwen2.5:7b context window)
367
+ FAISS index size 147 MB at 100K memories / 1.4 GB at 1M memories
368
+ ```
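The quoted index sizes are consistent with a back-of-envelope estimate, assuming the index stores uncompressed 32-bit floats for each 384-dimensional vector (a flat index). That storage layout is an assumption here; the README does not state which FAISS index type is used.

```python
DIM = 384               # embedding dimensionality stated in the README
BYTES_PER_FLOAT32 = 4   # assumption: vectors stored as raw float32


def flat_index_bytes(n_vectors):
    """Raw vector payload of a flat float32 index, ignoring file headers."""
    return n_vectors * DIM * BYTES_PER_FLOAT32


mib_at_100k = flat_index_bytes(100_000) / 2**20
gib_at_1m = flat_index_bytes(1_000_000) / 2**30

print(f"100K memories: {mib_at_100k:.1f} MiB")
print(f"1M memories:   {gib_at_1m:.2f} GiB")
```

Under these assumptions the payload comes to roughly 146.5 MiB and 1.43 GiB, close to the 147 MB and 1.4 GB figures above.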
369
+
370
+ ---
371
+
372
+ ## When SMOS saves tokens
373
+
374
+ SMOS pays off when the knowledge being accumulated exceeds what fits comfortably in context, or when the same codebase is visited more than once.
375
+
376
+ | Scenario | Savings |
377
+ |----------|---------|
378
+ | 50KB+ files (logs, generated code, docs) | Up to **154× context reduction per file** |
379
+ | Codebases > 30 files | Synthesis context stays fixed; baseline grows linearly |
380
+ | Multi-session work | Session 2+ queries stored memory; no re-reading |
381
+ | Repeated analysis from different angles | Query same compressed knowledge; pay once |
382
+ | Long agentic runs | Prior tool outputs stored out-of-context; don't accumulate |
383
+
384
+ **Single-session, small codebases (< 10 files, < 5KB each):** SMOS overhead exceeds savings. The tool is designed for sustained use and scale, not one-shot audits of tiny repos.
385
+
386
+ ---
387
+
388
+ ## Example use cases
389
+
390
+ ### Codebase audit across many files
391
+
392
+ ```
393
+ Read every file in src/ and give me a security audit.
394
+ ```
395
+
396
+ Without SMOS, Claude reads 40 files → 60,000 tokens in context by the time it reaches synthesis. With SMOS, each file is compressed to ~85 tokens and stored. Synthesis pulls only what's relevant via semantic query. Context at synthesis: ~1,200 tokens.
397
+
398
+ ---
399
+
400
+ ### Multi-session feature work
401
+
402
+ Day 1 — Claude reads the auth module, database schema, and API contracts. All compressed and stored.
403
+
404
+ Day 2 — new session, zero re-reading:
405
+
406
+ ```
407
+ What did we establish about the auth flow yesterday?
408
+ ```
409
+
410
+ SMOS returns the stored context instantly. Claude picks up exactly where it left off without touching a file.
411
+
412
+ ---
413
+
414
+ ### Large log / generated file analysis
415
+
416
+ ```
417
+ Read build/output.log and tell me what failed.
418
+ ```
419
+
420
+ A 50KB build log would consume ~12,800 tokens in context and stay there. With SMOS, it compresses 154× to ~83 tokens. Claude gets the failure summary; the raw log never enters the window.
421
+
422
+ ---
423
+
424
+ ### Accumulating decisions across a long agent run
425
+
426
+ Claude is running a multi-step refactor — reading files, making decisions, writing changes. Without SMOS, every prior decision accumulates in context. With SMOS:
427
+
428
+ ```python
429
+ tool_store_verbatim(content=diff, label="auth-refactor-step-3")
430
+ tool_semantic_store("Decided to replace JWT with session tokens — see verbatim key abc123")
431
+ ```
432
+
433
+ Prior steps are queryable but out-of-context. The agent runs indefinitely without hitting the context ceiling.
434
+
435
+ ---
436
+
437
+ ### Repeated analysis from different angles
438
+
439
+ ```
440
+ # Session 1
441
+ Analyse src/payments.py for performance issues.
442
+
443
+ # Session 2
444
+ Analyse src/payments.py for security issues.
445
+ ```
446
+
447
+ Session 2 queries the compressed version stored in session 1 — no re-read, no re-embedding, instant retrieval. Analysis starts immediately from stored knowledge.
448
+
449
+ ---
450
+
451
+ ## How Claude uses it
452
+
453
+ Once installed, Claude follows this policy automatically (injected via `~/.claude/CLAUDE.md`):
454
+
455
+ 1. **Query first** — before reading any file, call `tool_semantic_query`. If the answer is already in memory, skip the read entirely.
456
+ 2. **Compress reads** — use `tool_read_file_compress` for any file not about to be edited. Raw source never enters context.
457
+ 3. **Precise reads** — use the built-in Read tool only immediately before an `Edit` or `Write` call.
458
+ 4. **Lossless storage** — code, diffs, and structured data go to `tool_store_verbatim` (no LLM compression, exact bytes on retrieval).
459
+ 5. **Synthesise from memory** — use `tool_semantic_query` instead of re-reading already-compressed files.
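One way to read the five rules above is as a small decision function. The sketch below is an interpretation, not the injected policy text: `choose_tool` and its three boolean inputs are hypothetical, and the ordering of the checks is an assumption about how the rules interact.

```python
def choose_tool(about_to_edit: bool, answer_in_memory: bool, exact_bytes_needed: bool) -> str:
    """Pick a tool name for the next file-related step (illustrative reading of the policy)."""
    if about_to_edit:
        # Precise reads: the built-in Read tool, only right before Edit/Write.
        return "Read"
    if answer_in_memory:
        # Query first / synthesise from memory: skip the read entirely.
        return "tool_semantic_query"
    if exact_bytes_needed:
        # Lossless storage for code, diffs, and structured data.
        return "tool_store_verbatim"
    # Default: compress the file so the raw source stays out of context.
    return "tool_read_file_compress"


decisions = [
    choose_tool(about_to_edit=True, answer_in_memory=True, exact_bytes_needed=False),
    choose_tool(about_to_edit=False, answer_in_memory=True, exact_bytes_needed=False),
    choose_tool(about_to_edit=False, answer_in_memory=False, exact_bytes_needed=True),
    choose_tool(about_to_edit=False, answer_in_memory=False, exact_bytes_needed=False),
]
```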
460
+
461
+ ---
462
+
463
+ ## Tools
464
+
465
+ | Tool | Description |
466
+ |------|------------|
467
+ | `tool_read_file_compress` | Read a file, compress with local LLM, store summary. Raw file never enters context window. Accepts absolute paths. |
468
+ | `tool_semantic_store` | Store any text as a queryable semantic memory. |
469
+ | `tool_semantic_query` | Retrieve compressed context via natural language. Returns summary + confidence + sources. |
470
+ | `tool_semantic_write` | Store a typed, tagged memory object (doc / adr / log / issue). |
471
+ | `tool_store_verbatim` | Store exact content losslessly — code, diffs, any artifact where exact bytes matter. Returns a retrieval key. |
472
+ | `tool_retrieve` | Retrieve verbatim content by key. |
473
+ | `tool_write_file_safe` | Write files to the sandboxed workspace directory. |
474
+
475
+ ---
476
+
477
+ ## Configuration
478
+
479
+ Environment variables (set during `smos setup` or in your shell):
480
+
481
+ | Variable | Default | Description |
482
+ |----------|---------|-------------|
483
+ | `OLLAMA_BASE_URL` | `http://localhost:11434/v1` | Ollama endpoint |
484
+ | `OLLAMA_MODEL` | `qwen2.5:7b` | Summarization model |
485
+ | `SUMMARIZER_MAX_TOKENS` | `512` | Max tokens per summary output |
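A minimal sketch of how these variables and defaults could be read in Python follows. The variable names and default values come from the table; the `Settings` dataclass and `load_settings` helper are hypothetical and are not the package's actual loader.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    base_url: str
    model: str
    max_tokens: int


def load_settings(env=None):
    """Read SMOS-related settings from a mapping, defaulting to os.environ."""
    env = os.environ if env is None else env
    return Settings(
        base_url=env.get("OLLAMA_BASE_URL", "http://localhost:11434/v1"),
        model=env.get("OLLAMA_MODEL", "qwen2.5:7b"),
        max_tokens=int(env.get("SUMMARIZER_MAX_TOKENS", "512")),
    )


# Passing an explicit mapping keeps the example independent of the real environment.
defaults = load_settings({})
custom = load_settings({"OLLAMA_MODEL": "qwen2.5:3b", "SUMMARIZER_MAX_TOKENS": "256"})
```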
486
+
487
+ ### Model options (chosen during setup)
488
+
489
+ | Model | Size | Min RAM | Compression quality |
490
+ |-------|-----:|:-------:|-------------------|
491
+ | `qwen2.5:7b` | 4.7 GB | 8 GB | Best (benchmarked) — GPU-accelerated if CUDA available |
492
+ | `qwen2.5:3b` | 2.0 GB | 4 GB | Good |
493
+ | `qwen2.5:1.5b`| 0.9 GB | 4 GB | Fast |
494
+ | none | — | — | Extractive fallback (first sentences only) |
495
+
496
+ The RTX 4050 (or any CUDA GPU) will be used automatically by Ollama if available, reducing LLM latency from ~10s to ~2–3s per compression call.
497
+
498
+ Without Ollama, SMOS falls back to extractive summarization. Semantic querying and verbatim storage work normally — only LLM-driven compression degrades.
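To make "extractive fallback (first sentences only)" concrete, here is a minimal sketch of that idea. The function name, the sentence-splitting regex, and the two-sentence limit are assumptions; the package's own fallback may split and truncate differently.

```python
import re


def extractive_summary(text, max_sentences=2):
    """Return the first few sentences of text, with no LLM involved."""
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return " ".join(s for s in sentences[:max_sentences] if s)


text = "SMOS stores summaries. Queries run against FAISS. Logs are kept on disk."
summary = extractive_summary(text)
print(summary)
```

The trade-off is visible even in this toy case: anything that appears only late in the input is dropped, which is why compression quality degrades without a model.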
499
+
500
+ ---
501
+
502
+ ## Data
503
+
504
+ Each project gets its own isolated data store. SMOS creates a `.smos/` folder in the project root the first time Claude Code opens the project — no manual setup required.
505
+
506
+ ```
507
+ <your-project>/
508
+ └── .smos/
509
+ ├── faiss.index — vector index for this project (147 MB at 100K memories)
510
+ ├── metadata.db — SQLite: content, summaries, tiers, verbatim store
511
+ ├── workspace/ — sandboxed file write area
512
+ └── logs/ — write audit log
513
+
514
+ ~/.smos/
515
+ └── .env — global model preference (OLLAMA_MODEL=qwen2.5:7b)
516
+ ```
517
+
518
+ Nothing leaves your machine. Memories from Project A never appear in Project B queries.
519
+
520
+ To delete a project's memory:
521
+
522
+ ```bash
523
+ # Windows
524
+ Remove-Item -Recurse -Force .smos
525
+
526
+ # macOS / Linux
527
+ rm -rf .smos
528
+ ```
529
+
530
+ Add `.smos/` to your `.gitignore` to keep memory data out of version control (SMOS does this automatically for new projects).
531
+
532
+ The database survives crashes: on restart, SMOS detects FAISS/SQLite divergence and rebuilds the index from SQLite automatically (re-embeds all content in batches of 256).
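The recovery step can be sketched as "compare counts, then re-embed in batches". This is a simplified illustration under stated assumptions: the README does not say how divergence is detected, so comparing the number of SQLite rows with the number of indexed vectors is a guess, and `rebuild_if_diverged`, `batches`, and `fake_embed_batch` are hypothetical names. Plain lists stand in for SQLite and FAISS.

```python
def batches(items, size=256):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def rebuild_if_diverged(db_rows, index_vectors, embed_batch, batch_size=256):
    """Return (index, rebuilt). Rebuild from db_rows when the counts disagree."""
    if len(db_rows) == len(index_vectors):
        return index_vectors, False
    rebuilt = []
    for batch in batches(db_rows, batch_size):
        rebuilt.extend(embed_batch(batch))  # re-embed one batch at a time
    return rebuilt, True


def fake_embed_batch(texts):
    """Stand-in for the embedding model: one 1-dimensional vector per text."""
    return [[float(len(t))] for t in texts]


rows = [f"memory {i}" for i in range(600)]   # what SQLite holds
stale_index = [[0.0]] * 590                  # index lost 10 entries in a crash
index, rebuilt = rebuild_if_diverged(rows, stale_index, fake_embed_batch)
batch_sizes = [len(b) for b in batches(rows)]
```

Treating SQLite as the source of truth means the index is always reconstructible, at the cost of re-embedding every stored item after a divergence.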
533
+
534
+ ---
535
+
536
+ ## Update
537
+
538
+ ```bash
539
+ smos update
540
+ ```
541
+
542
+ Pulls the latest version from GitHub and upgrades in place. Restart Claude Code afterward.
543
+
544
+ Check your current version:
545
+
546
+ ```bash
547
+ smos --version
548
+ ```
549
+
550
+ **To get notified of new releases:** go to the [GitHub repo](https://github.com/Witchd0ct0r/Semantic_Memory_Operating_System_SMOS), click **Watch → Custom → Releases**.
551
+
552
+ ---
553
+
554
+ ## Uninstall
555
+
556
+ ```bash
557
+ smos uninstall
558
+ ```
559
+
560
+ This removes:
561
+
562
+ - **MCP registration** — `claude mcp remove smos` (runs automatically)
563
+ - **CLAUDE.md policy block** — strips the injected file-reading policy from `~/.claude/CLAUDE.md`
564
+ - **Global config** — prompts before deleting `~/.smos/` (model preference only)
565
+
566
+ **Per-project data** (`.smos/` in each project folder) must be deleted manually — the uninstaller can't know which projects you've used SMOS in:
567
+
568
+ ```bash
569
+ # Windows (run inside the project folder)
570
+ Remove-Item -Recurse -Force .smos
571
+
572
+ # macOS / Linux
573
+ rm -rf .smos
574
+ ```
575
+
576
+ The Python package itself is **not** removed automatically — run `pip uninstall smos-mcp` afterward if you want that too.
577
+
578
+ Ollama models are **not** removed — they are shared system-wide. To remove manually:
579
+
580
+ ```bash
581
+ ollama rm qwen2.5:7b
582
+ ```
583
+
584
+ Dry-run to preview what would be removed without touching anything:
585
+
586
+ ```bash
587
+ smos uninstall --dry-run
588
+ ```
589
+
590
+ ---
591
+
592
+ ## Development
593
+
594
+ ```bash
595
+ git clone https://github.com/Witchd0ct0r/Semantic_Memory_Operating_System_SMOS
596
+ cd Semantic_Memory_Operating_System_SMOS
597
+ pip install -e ".[dev]"
598
+ pytest tests/ # 31 tests
599
+ python -m smos # run the server directly
600
+ ```
601
+
602
+ ---
603
+
604
+ ## License
605
+
606
+ MIT