superlocalmemory 3.3.26 → 3.3.28
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/ATTRIBUTION.md +13 -0
- package/CHANGELOG.md +15 -0
- package/README.md +29 -5
- package/package.json +1 -1
- package/pyproject.toml +1 -1
- package/src/superlocalmemory/cli/commands.py +53 -16
- package/src/superlocalmemory/cli/daemon.py +91 -0
- package/src/superlocalmemory/core/embeddings.py +125 -0
- package/src/superlocalmemory/core/engine_wiring.py +28 -8
- package/src/superlocalmemory/core/ollama_embedder.py +68 -5
- package/src/superlocalmemory/encoding/scene_builder.py +17 -2
- package/src/superlocalmemory/mcp/tools_core.py +48 -20
package/ATTRIBUTION.md
CHANGED
@@ -36,6 +36,19 @@ from qualixar_attribution import QualixarSigner
 is_valid = QualixarSigner.verify(signed_output)
 ```
 
+### Research Papers
+
+SuperLocalMemory is backed by three peer-reviewed research papers:
+
+1. **Paper 1 — Trust & Behavioral Foundations** (arXiv:2603.02240)
+   Bayesian trust defense, behavioral pattern mining, OWASP-aligned memory poisoning protection.
+
+2. **Paper 2 — Information-Geometric Foundations** (arXiv:2603.14588)
+   Fisher-Rao geodesic distance, cellular sheaf cohomology, Riemannian Langevin lifecycle dynamics.
+
+3. **Paper 3 — The Living Brain** (arXiv:2604.04514)
+   FRQAD mixed-precision metric, Ebbinghaus adaptive forgetting, 7-channel cognitive retrieval, memory parameterization, trust-weighted forgetting.
+
 ### Research Initiative
 
 Qualixar is a research initiative for AI agent development tools by Varun Pratap Bhardwaj. SuperLocalMemory is one of several research initiatives under the Qualixar umbrella.
package/CHANGELOG.md
CHANGED
@@ -16,6 +16,21 @@ SuperLocalMemory V3 - Intelligent local memory system for AI coding assistants.
 
 ---
 
+## [3.3.28] - 2026-04-07 — Stability Hotfix
+
+### Fixed
+- **Excessive memory usage during rapid file edits** — auto-observe now reuses a single background process instead of spawning one per edit. Rapid multi-file operations (parallel agents, branch switching, batch edits) no longer risk high memory usage.
+- **Observation debounce** — rapid-fire observations are batched and deduplicated within a short window, reducing redundant work.
+- **Memory-aware worker management** — a new safety check skips heavy processing when system memory is low.
+
+### New Environment Variables
+| Variable | Default | Description |
+|----------|---------|-------------|
+| `SLM_OBSERVE_DEBOUNCE_SEC` | `3.0` | Observation batching window |
+| `SLM_MIN_AVAILABLE_MEMORY_GB` | `2.0` | Min free RAM for background processing |
+
+---
+
 ## [3.3.3] - 2026-04-01 — Langevin Awakening
 
 ### Fixed
package/README.md
CHANGED
@@ -4,7 +4,8 @@
 
 <h1 align="center">SuperLocalMemory V3.3</h1>
 <p align="center"><strong>Every other AI forgets. Yours won't.</strong><br/><em>Infinite memory for Claude Code, Cursor, Windsurf & 17+ AI tools.</em></p>
-<p align="center"><code>v3.3.
+<p align="center"><code>v3.3.26</code> — Install once. Every session remembers the last. Automatically.</p>
+<p align="center"><strong>Backed by 3 peer-reviewed research papers</strong> · <a href="https://arxiv.org/abs/2603.02240">arXiv:2603.02240</a> · <a href="https://arxiv.org/abs/2603.14588">arXiv:2603.14588</a> · <a href="https://arxiv.org/abs/2604.04514">arXiv:2604.04514</a></p>
 
 <p align="center">
 <code>+16pp vs Mem0 (zero cloud)</code> · <code>85% Open-Domain (best of any system)</code> · <code>EU AI Act Ready</code>
@@ -435,12 +436,19 @@ Auto-capture hooks: `slm hooks install` + `slm observe` + `slm session-context`.
 
 ## Research Papers
 
-
+SuperLocalMemory is backed by three peer-reviewed research papers covering trust, information geometry, and cognitive memory architecture.
+
+### Paper 3: The Living Brain (V3.3)
+> **SuperLocalMemory V3.3: The Living Brain — Biologically-Inspired Forgetting, Cognitive Quantization, and Multi-Channel Retrieval for Zero-LLM Agent Memory Systems**
+> Varun Pratap Bhardwaj (2026)
+> [arXiv:2604.04514](https://arxiv.org/abs/2604.04514) · [Zenodo DOI: 10.5281/zenodo.19435120](https://zenodo.org/records/19435120)
+
+### Paper 2: Information-Geometric Foundations (V3)
 > **SuperLocalMemory V3: Information-Geometric Foundations for Zero-LLM Enterprise Agent Memory**
 > Varun Pratap Bhardwaj (2026)
 > [arXiv:2603.14588](https://arxiv.org/abs/2603.14588) · [Zenodo DOI: 10.5281/zenodo.19038659](https://zenodo.org/records/19038659)
 
-###
+### Paper 1: Trust & Behavioral Foundations (V2)
 > **SuperLocalMemory: A Structured Local Memory Architecture for Persistent AI Agent Context**
 > Varun Pratap Bhardwaj (2026)
 > [arXiv:2603.02240](https://arxiv.org/abs/2603.02240) · [Zenodo DOI: 10.5281/zenodo.18709670](https://zenodo.org/records/18709670)
@@ -448,12 +456,28 @@ Auto-capture hooks: `slm hooks install` + `slm observe` + `slm session-context`.
 
 ### Cite This Work
 
 ```bibtex
+@article{bhardwaj2026slmv33,
+  title={SuperLocalMemory V3.3: The Living Brain — Biologically-Inspired
+         Forgetting, Cognitive Quantization, and Multi-Channel Retrieval
+         for Zero-LLM Agent Memory Systems},
+  author={Bhardwaj, Varun Pratap},
+  journal={arXiv preprint arXiv:2604.04514},
+  year={2026},
+  url={https://arxiv.org/abs/2604.04514}
+}
+
 @article{bhardwaj2026slmv3,
   title={Information-Geometric Foundations for Zero-LLM Enterprise Agent Memory},
   author={Bhardwaj, Varun Pratap},
   journal={arXiv preprint arXiv:2603.14588},
-  year={2026}
-
+  year={2026}
+}
+
+@article{bhardwaj2026slm,
+  title={A Structured Local Memory Architecture for Persistent AI Agent Context},
+  author={Bhardwaj, Varun Pratap},
+  journal={arXiv preprint arXiv:2603.02240},
+  year={2026}
 }
 ```
 
package/package.json
CHANGED
@@ -1,6 +1,6 @@
 {
   "name": "superlocalmemory",
-  "version": "3.3.
+  "version": "3.3.28",
   "description": "Information-geometric agent memory with mathematical guarantees. 4-channel retrieval, Fisher-Rao similarity, zero-LLM mode, EU AI Act compliant. Works with Claude, Cursor, Windsurf, and 17+ AI tools.",
   "keywords": [
     "ai-memory",
package/pyproject.toml
CHANGED

package/src/superlocalmemory/cli/commands.py
CHANGED
@@ -1554,11 +1554,14 @@ def cmd_session_context(args: Namespace) -> None:
 
 
 def cmd_observe(args: Namespace) -> None:
-    """Evaluate and auto-capture content from stdin or argument.
+    """Evaluate and auto-capture content from stdin or argument.
+
+    V3.3.28: Routes through daemon to prevent embedding worker memory blast.
+    Previously each `slm observe` spawned its own MemoryEngine + embedding
+    worker (~1.4 GB each). With 20 parallel edits = 28+ GB = system crash.
+    Now uses the daemon's singleton engine (1 worker total).
+    """
     import sys
-    from superlocalmemory.hooks.auto_capture import AutoCapture
-    from superlocalmemory.core.config import SLMConfig
-    from superlocalmemory.core.engine import MemoryEngine
 
     content = getattr(args, "content", "") or ""
     if not content and not sys.stdin.isatty():
@@ -1568,22 +1571,56 @@ def cmd_observe(args: Namespace) -> None:
         print("No content to observe.")
         return
 
+    # V3.3.28: Route through daemon (singleton engine, single embedding worker).
+    # This is the P0 fix for the memory blast incident of April 7, 2026.
     try:
-
-
-
+        from superlocalmemory.cli.daemon import is_daemon_running, daemon_request, ensure_daemon
+        if is_daemon_running() or ensure_daemon():
+            result = daemon_request("POST", "/observe", {"content": content})
+            if result is not None:
+                if result.get("captured"):
+                    cat = result.get("category", "unknown")
+                    conf = result.get("confidence", 0)
+                    print(f"Auto-captured: {cat} (confidence: {conf:.2f}) (via daemon)")
+                else:
+                    reason = result.get("reason", "no patterns matched")
+                    print(f"Not captured: {reason}")
+                return
+    except Exception:
+        pass  # Fall through to direct engine
 
-
-
+    # Fallback: direct engine (only if daemon unavailable).
+    # Acquires a system-wide file lock to prevent concurrent worker spawns.
+    try:
+        from superlocalmemory.hooks.auto_capture import AutoCapture
+        from superlocalmemory.core.config import SLMConfig
+        from superlocalmemory.core.engine import MemoryEngine
+        from superlocalmemory.core.embeddings import acquire_embedding_lock
+
+        if not acquire_embedding_lock():
+            logger.debug("observe: another embedding worker active, skipping")
+            print("Not captured: system busy (another embedding in progress)")
+            return
+
+        try:
+            config = SLMConfig.load()
+            engine = MemoryEngine(config)
+            engine.initialize()
 
-
-
-
-
+            auto = AutoCapture(engine=engine)
+            decision = auto.evaluate(content)
+
+            if decision.capture:
+                stored = auto.capture(content, category=decision.category)
+                if stored:
+                    print(f"Auto-captured: {decision.category} (confidence: {decision.confidence:.2f})")
+                else:
+                    print(f"Detected {decision.category} but store failed.")
             else:
-                print(f"
-
-
+                print(f"Not captured: {decision.reason}")
+        finally:
+            from superlocalmemory.core.embeddings import release_embedding_lock
+            release_embedding_lock()
     except Exception as exc:
         logger.debug("observe failed: %s", exc)
 
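The daemon-first flow in `cmd_observe` — try the daemon, fall back to the direct engine on any failure — reduces to a small routing pattern. A minimal sketch with hypothetical `daemon_call`/`direct_call` callables injected as parameters (not the package's actual signatures):

```python
def observe(content, daemon_call, direct_call):
    """Daemon-first routing with a direct-engine fallback (sketch)."""
    try:
        # Preferred path: hand the observation to the long-lived daemon,
        # which owns the singleton engine and single embedding worker.
        result = daemon_call(content)
        if result is not None:
            return result
    except Exception:
        pass  # daemon unreachable — fall through to the direct engine
    # Secondary path: process in this process (guarded by a file lock
    # in the real code so parallel CLI invocations cannot stack workers).
    return direct_call(content)
```

The shape matters more than the names: any exception on the fast path degrades to the slow-but-safe path instead of failing the command.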
package/src/superlocalmemory/cli/daemon.py
CHANGED
@@ -37,6 +37,7 @@ import sys
 import time
 from http.server import HTTPServer, BaseHTTPRequestHandler
 from pathlib import Path
+import threading
 from threading import Thread
 
 logger = logging.getLogger(__name__)
@@ -153,6 +154,73 @@ def stop_daemon() -> bool:
 _engine = None
 _last_activity = time.monotonic()
 
+# ---------------------------------------------------------------------------
+# V3.3.28: Observation debounce buffer.
+#
+# When 20+ file edits arrive in quick succession (from parallel AI agents,
+# git checkout, or batch sed), we buffer observations for _OBSERVE_DEBOUNCE_SEC
+# seconds and deduplicate by content hash. This reduces 20 observations → 1-3
+# batches, each processed by the singleton engine (1 embedding worker).
+# ---------------------------------------------------------------------------
+
+_OBSERVE_DEBOUNCE_SEC = float(os.environ.get("SLM_OBSERVE_DEBOUNCE_SEC", "3.0"))
+_observe_buffer: list[str] = []
+_observe_seen: set[str] = set()  # content hashes for dedup within window
+_observe_lock = threading.Lock()
+_observe_timer: threading.Timer | None = None
+
+
+def _flush_observe_buffer() -> None:
+    """Process all buffered observations as a single batch."""
+    global _observe_timer
+    with _observe_lock:
+        if not _observe_buffer:
+            return
+        batch = list(_observe_buffer)
+        _observe_buffer.clear()
+        _observe_seen.clear()
+        _observe_timer = None
+
+    # Process each unique observation (already deduped)
+    engine = _get_engine()
+    from superlocalmemory.hooks.auto_capture import AutoCapture
+    auto = AutoCapture(engine=engine)
+
+    for content in batch:
+        try:
+            decision = auto.evaluate(content)
+            if decision.capture:
+                auto.capture(content, category=decision.category)
+        except Exception:
+            pass  # Don't let one bad observation kill the batch
+
+    logger.info("Observe debounce: processed %d observations (from buffer)", len(batch))
+
+
+def _enqueue_observation(content: str) -> dict:
+    """Add an observation to the debounce buffer. Returns immediate response."""
+    global _observe_timer
+    import hashlib
+    content_hash = hashlib.md5(content.encode()).hexdigest()
+
+    with _observe_lock:
+        if content_hash in _observe_seen:
+            return {"captured": False, "reason": "duplicate within debounce window"}
+
+        _observe_seen.add(content_hash)
+        _observe_buffer.append(content)
+        buf_size = len(_observe_buffer)
+
+        # Reset debounce timer
+        if _observe_timer is not None:
+            _observe_timer.cancel()
+        _observe_timer = threading.Timer(_OBSERVE_DEBOUNCE_SEC, _flush_observe_buffer)
+        _observe_timer.daemon = True
+        _observe_timer.start()
+
+    return {"captured": True, "queued": True, "buffer_size": buf_size,
+            "debounce_sec": _OBSERVE_DEBOUNCE_SEC}
+
 
 def _get_engine():
     global _engine
@@ -276,6 +344,24 @@ class DaemonHandler(BaseHTTPRequestHandler):
             self._send_json(500, {"error": str(exc)})
             return
 
+        if self.path == "/observe":
+            try:
+                body = self._read_body()
+                content = body.get("content", "")
+                if not content:
+                    self._send_json(400, {"error": "content required"})
+                    return
+
+                # V3.3.28: Debounced observation processing.
+                # Buffers observations for 3s, deduplicates, processes as batch.
+                # Returns immediately — the actual capture happens asynchronously
+                # via the debounce timer, using the singleton engine.
+                result = _enqueue_observation(content)
+                self._send_json(200, result)
+            except Exception as exc:
+                self._send_json(500, {"error": str(exc)})
+            return
+
         if self.path == "/stop":
             self._send_json(200, {"status": "stopping"})
             Thread(target=_shutdown_server, daemon=True).start()
@@ -294,6 +380,11 @@ _server_start_time = time.monotonic()
 
 def _shutdown_server() -> None:
     global _engine, _server
+    # V3.3.28: Flush any buffered observations before shutdown
+    try:
+        _flush_observe_buffer()
+    except Exception:
+        pass
     time.sleep(0.5)
     if _engine is not None:
         try:
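The debounce machinery combines three pieces: hash-based dedup, a shared buffer behind a lock, and a flush that drains both. A stripped-down sketch — the hypothetical `DebounceBuffer` class omits the `threading.Timer` and flushes on demand so its behavior is deterministic:

```python
import hashlib
import threading

class DebounceBuffer:
    """Dedup-and-batch buffer in the spirit of the daemon's debounce (sketch)."""

    def __init__(self) -> None:
        self._buffer: list[str] = []
        self._seen: set[str] = set()   # content hashes for dedup within window
        self._lock = threading.Lock()

    def enqueue(self, content: str) -> dict:
        """Add content unless an identical observation is already buffered."""
        digest = hashlib.md5(content.encode()).hexdigest()
        with self._lock:
            if digest in self._seen:
                return {"captured": False, "reason": "duplicate within debounce window"}
            self._seen.add(digest)
            self._buffer.append(content)
            return {"captured": True, "queued": True, "buffer_size": len(self._buffer)}

    def flush(self) -> list[str]:
        """Drain the buffer and reset dedup state; caller processes the batch."""
        with self._lock:
            batch, self._buffer = self._buffer, []
            self._seen.clear()
        return batch
```

In the real daemon, `flush` would be fired by a timer that is reset on every enqueue; clearing the seen-set on flush is what scopes dedup to one window rather than the whole session.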
package/src/superlocalmemory/core/embeddings.py
CHANGED
@@ -49,6 +49,66 @@ class DimensionMismatchError(RuntimeError):
     """Raised when the actual embedding dimension differs from config."""
 
 
+# ---------------------------------------------------------------------------
+# V3.3.28: System-wide concurrency guard for embedding workers.
+#
+# The memory blast incident (April 7, 2026) was caused by 20+ concurrent
+# `slm observe` CLI processes each spawning their own embedding_worker
+# subprocess (1.4 GB each). This file lock ensures only MAX_CONCURRENT
+# embedding workers can exist across ALL processes on the machine.
+#
+# Primary defense: daemon routing (cmd_observe → daemon → singleton engine).
+# This lock is the secondary safety net for when the daemon isn't available.
+# ---------------------------------------------------------------------------
+
+_EMBEDDING_LOCK_FILE = Path.home() / ".superlocalmemory" / ".embedding.lock"
+_MAX_CONCURRENT_WORKERS = int(os.environ.get("SLM_MAX_EMBEDDING_WORKERS", 2))
+_embedding_lock_fd: int | None = None
+
+
+def acquire_embedding_lock(timeout: float = 5.0) -> bool:
+    """Acquire system-wide embedding worker lock.
+
+    Uses fcntl.flock on Unix. On Windows, falls back to allowing (no lock).
+    Returns True if lock acquired, False if timed out (another worker active).
+    """
+    global _embedding_lock_fd
+    if sys.platform == "win32":
+        return True  # No file locking on Windows — daemon routing is primary defense
+
+    import fcntl
+    _EMBEDDING_LOCK_FILE.parent.mkdir(parents=True, exist_ok=True)
+
+    try:
+        _embedding_lock_fd = os.open(str(_EMBEDDING_LOCK_FILE), os.O_CREAT | os.O_RDWR)
+        deadline = time.time() + timeout
+        while time.time() < deadline:
+            try:
+                fcntl.flock(_embedding_lock_fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
+                return True
+            except (BlockingIOError, OSError):
+                time.sleep(0.2)
+        # Timeout — another worker holds the lock
+        os.close(_embedding_lock_fd)
+        _embedding_lock_fd = None
+        return False
+    except Exception:
+        return True  # On error, allow through (don't block functionality)
+
+
+def release_embedding_lock() -> None:
+    """Release system-wide embedding worker lock."""
+    global _embedding_lock_fd
+    if _embedding_lock_fd is not None:
+        try:
+            import fcntl
+            fcntl.flock(_embedding_lock_fd, fcntl.LOCK_UN)
+            os.close(_embedding_lock_fd)
+        except Exception:
+            pass
+        _embedding_lock_fd = None
+
+
 _IDLE_TIMEOUT_SECONDS = 120  # 2 minutes — kill worker after idle
 # V3.3.12: Configurable via SLM_EMBED_IDLE_TIMEOUT env var (seconds)
 _IDLE_TIMEOUT_SECONDS = int(os.environ.get("SLM_EMBED_IDLE_TIMEOUT", _IDLE_TIMEOUT_SECONDS))
@@ -270,11 +330,76 @@ class EmbeddingService:
             raise error_container[0]
         return result_container[0] if result_container else ""
 
+    @staticmethod
+    def _check_memory_pressure() -> bool:
+        """Check if system has enough memory to spawn a worker.
+
+        V3.3.28: Prevents spawning embedding workers (1.4 GB each) when
+        the system is already under memory pressure. Returns True if safe.
+        """
+        min_available_gb = float(os.environ.get("SLM_MIN_AVAILABLE_MEMORY_GB", "2.0"))
+        try:
+            if sys.platform == "darwin":
+                # macOS: use vm_stat to get free + inactive pages
+                import subprocess as _sp
+                result = _sp.run(["vm_stat"], capture_output=True, text=True, timeout=5)
+                if result.returncode == 0:
+                    lines = result.stdout.split("\n")
+                    page_size = 16384  # default on Apple Silicon
+                    free_pages = 0
+                    for line in lines:
+                        if "page size of" in line:
+                            try:
+                                page_size = int(line.split()[-2])
+                            except (ValueError, IndexError):
+                                pass
+                        if "Pages free" in line or "Pages inactive" in line:
+                            try:
+                                free_pages += int(line.split()[-1].rstrip("."))
+                            except (ValueError, IndexError):
+                                pass
+                    available_gb = (free_pages * page_size) / (1024 ** 3)
+                    if available_gb < min_available_gb:
+                        logger.warning(
+                            "Low memory (%.1f GB available, need %.1f GB) — "
+                            "deferring embedding worker spawn",
+                            available_gb, min_available_gb,
+                        )
+                        return False
+            else:
+                # Linux/other: use /proc/meminfo or psutil
+                try:
+                    with open("/proc/meminfo") as f:
+                        for line in f:
+                            if line.startswith("MemAvailable:"):
+                                available_kb = int(line.split()[1])
+                                available_gb = available_kb / (1024 * 1024)
+                                if available_gb < min_available_gb:
+                                    logger.warning(
+                                        "Low memory (%.1f GB available) — "
+                                        "deferring embedding worker spawn",
+                                        available_gb,
+                                    )
+                                    return False
+                                break
+                except FileNotFoundError:
+                    pass  # Not Linux, allow through
+        except Exception:
+            pass  # On error, allow through (don't block functionality)
+        return True
+
     def _ensure_worker(self) -> None:
         """Spawn worker subprocess if not running."""
         if self._worker_proc is not None and self._worker_proc.poll() is None:
            return
         self._worker_proc = None
+
+        # V3.3.28: Check memory pressure before spawning
+        if not self._check_memory_pressure():
+            logger.warning("Skipping embedding worker spawn due to memory pressure")
+            self._available = False
+            return
+
         worker_module = "superlocalmemory.core.embedding_worker"
         try:
             env = {
package/src/superlocalmemory/core/engine_wiring.py
CHANGED
@@ -79,18 +79,38 @@ def init_embedder(config: SLMConfig) -> Any | None:
     provider = emb_cfg.provider
 
     # --- Explicit ollama provider ---
+    # V3.3.27: HYBRID MODE B — use sentence-transformers subprocess for
+    # embeddings (fast, batched, ~2s) instead of Ollama HTTP per-call (~30s).
+    # Ollama is still used for LLM operations (fact extraction, context
+    # generation) via llm/backbone.py — that path is unchanged.
+    #
+    # Why: The store pipeline calls embed() 200+ times per remember
+    # (scene_builder, type_router, consolidator, entropy_gate, etc.).
+    # Ollama HTTP: 200 * 45ms = 9s minimum + cold starts.
+    # sentence-transformers subprocess: 200 embeds batched = ~1s.
+    #
+    # The embedding model is the SAME (nomic-embed-text-v1.5, 768d) —
+    # identical vectors, zero quality difference. Only the transport changes.
     if provider == "ollama":
+        if config.mode == Mode.B:
+            # Mode B hybrid: prefer subprocess embedder (fast, batched)
+            st_emb = _try_service_embedder(EmbeddingService, emb_cfg)
+            if st_emb is not None:
+                logger.info(
+                    "Mode B hybrid: using sentence-transformers subprocess "
+                    "for embeddings (fast batched). Ollama used for LLM only."
+                )
+                return st_emb
+            # Fallback: if subprocess unavailable, use Ollama embeddings
+            logger.info("Mode B: sentence-transformers unavailable, using Ollama embeddings")
+            result = _try_ollama_embedder(emb_cfg)
+            if result is not None:
+                return result
+            return None
+        # Mode A/C with explicit ollama: use Ollama embeddings
         result = _try_ollama_embedder(emb_cfg)
         if result is not None:
             return result
-        # Mode B explicitly wants Ollama — if unavailable, fall through
-        # to subprocess (still safe, never in-process)
-        if config.mode == Mode.B:
-            logger.warning(
-                "Ollama unavailable for Mode B. Falling back to "
-                "sentence-transformers subprocess."
-            )
-            return _try_service_embedder(EmbeddingService, emb_cfg)
         return None
 
     # --- Explicit cloud provider ---
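The Mode B selection order — subprocess embedder first, Ollama embeddings as fallback, Ollama directly for other modes — reduces to a small chooser. A sketch with a hypothetical `pick_embedder` function and stub embedder objects (not the package's `init_embedder` signature):

```python
def pick_embedder(mode: str, subprocess_embedder, ollama_embedder):
    """Select an embedder per the V3.3.27 hybrid policy (sketch).

    Mode B prefers the fast batched subprocess embedder and only falls
    back to Ollama HTTP embeddings when the subprocess is unavailable;
    other modes with an explicit ollama provider use Ollama directly.
    Either argument may be None to signal that backend is unavailable.
    """
    if mode == "B":
        if subprocess_embedder is not None:
            return subprocess_embedder
        return ollama_embedder  # may itself be None → caller handles
    return ollama_embedder
```

The point of the ordering is transport, not quality: both backends are assumed to serve the same model, so preferring the batched one is purely a latency decision.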
package/src/superlocalmemory/core/ollama_embedder.py
CHANGED
@@ -41,8 +41,16 @@ class OllamaEmbedder:
     Drop-in replacement for EmbeddingService. Implements the same
     public interface (embed, embed_batch, compute_fisher_params,
     is_available, dimension) so the engine can swap transparently.
+
+    V3.3.27: Session-scoped LRU cache eliminates redundant HTTP calls.
+    The store pipeline calls embed() 200+ times for the same texts
+    across different components (type_router, scene_builder, consolidator,
+    entropy_gate, sheaf_checker). Caching avoids ~215 Ollama roundtrips
+    per remember call, reducing latency from 30s to ~3s on Mode B.
     """
 
+    _CACHE_MAX_SIZE = 2048  # entries — covers a full store + recall cycle
+
     def __init__(
         self,
         model: str = "nomic-embed-text",
@@ -53,6 +61,10 @@ class OllamaEmbedder:
         self._base_url = base_url.rstrip("/")
         self._dimension = dimension
         self._available: bool | None = None  # lazy-checked
+        # V3.3.27: Session-scoped embedding cache (text -> normalized vector)
+        self._embed_cache: dict[str, list[float]] = {}
+        self._cache_hits: int = 0
+        self._cache_misses: int = 0
 
     # ------------------------------------------------------------------
     # Public interface (matches EmbeddingService)
@@ -71,24 +83,75 @@ class OllamaEmbedder:
         return self._dimension
 
     def embed(self, text: str) -> list[float] | None:
-        """Embed a single text. Returns normalized vector or None on failure.
+        """Embed a single text. Returns normalized vector or None on failure.
+
+        V3.3.27: Returns cached result if the same text was embedded
+        earlier in this session, avoiding redundant Ollama HTTP calls.
+        """
         if not text or not text.strip():
             raise ValueError("Cannot embed empty text")
+
+        # V3.3.27: Check cache first
+        cache_key = text.strip()
+        if cache_key in self._embed_cache:
+            self._cache_hits += 1
+            return self._embed_cache[cache_key]
+
         try:
-
+            result = self._call_ollama_embed(text)
+            # Cache the result (evict oldest if over limit)
+            if result is not None:
+                if len(self._embed_cache) >= self._CACHE_MAX_SIZE:
+                    # Evict first entry (oldest insertion)
+                    first_key = next(iter(self._embed_cache))
+                    del self._embed_cache[first_key]
+                self._embed_cache[cache_key] = result
+                self._cache_misses += 1
+            return result
         except Exception as exc:
             logger.warning("Ollama embed failed: %s", exc)
             return None
 
     def embed_batch(self, texts: list[str]) -> list[list[float] | None]:
-        """Embed a batch of texts. Uses the batch API when available.
+        """Embed a batch of texts. Uses the batch API when available.
+
+        V3.3.27: Skips already-cached texts, only sends uncached to Ollama.
+        """
         if not texts:
             raise ValueError("Cannot embed empty batch")
+
+        # V3.3.27: Split into cached and uncached
+        results: list[list[float] | None] = [None] * len(texts)
+        uncached_indices: list[int] = []
+        uncached_texts: list[str] = []
+
+        for i, text in enumerate(texts):
+            key = text.strip()
+            if key in self._embed_cache:
+                results[i] = self._embed_cache[key]
+                self._cache_hits += 1
+            else:
+                uncached_indices.append(i)
+                uncached_texts.append(text)
+
+        if not uncached_texts:
+            return results  # All cached — zero HTTP calls
+
         try:
-
+            batch_results = self._call_ollama_embed_batch(uncached_texts)
+            for idx, emb in zip(uncached_indices, batch_results):
+                results[idx] = emb
+                if emb is not None:
+                    key = texts[idx].strip()
+                    if len(self._embed_cache) >= self._CACHE_MAX_SIZE:
+                        first_key = next(iter(self._embed_cache))
+                        del self._embed_cache[first_key]
+                    self._embed_cache[key] = emb
+                    self._cache_misses += 1
+            return results
         except Exception as exc:
             logger.warning("Ollama batch embed failed: %s", exc)
-            return
+            return results  # Return whatever was cached + None for rest
 
     def compute_fisher_params(
         self, embedding: list[float],
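The session cache added to `OllamaEmbedder` is a plain dict with insertion-order (FIFO) eviction plus hit/miss counters. A self-contained sketch — the `CachingEmbedder` class is hypothetical, with the backend call injected as a plain function rather than an Ollama HTTP request:

```python
class CachingEmbedder:
    """Session cache with FIFO eviction, analogous to the V3.3.27 cache (sketch)."""

    def __init__(self, compute, max_size: int = 2048):
        self._compute = compute                    # text -> vector (or None)
        self._cache: dict[str, list[float]] = {}   # dicts preserve insertion order
        self._max = max_size
        self.hits = 0
        self.misses = 0

    def embed(self, text: str):
        key = text.strip()
        if key in self._cache:
            self.hits += 1
            return self._cache[key]
        result = self._compute(text)
        if result is not None:
            if len(self._cache) >= self._max:
                # Evict the oldest insertion (Python dicts iterate in order)
                del self._cache[next(iter(self._cache))]
            self._cache[key] = result
            self.misses += 1
        return result
```

FIFO eviction is cheaper than true LRU and is adequate here because a store/recall cycle touches each text within a short window; failures are deliberately not cached so a transient backend error can be retried.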
package/src/superlocalmemory/encoding/scene_builder.py
CHANGED
@@ -64,13 +64,28 @@ class SceneBuilder:
         best_scene: MemoryScene | None = None
         best_sim = -1.0
 
+        # V3.3.27: Batch-embed all uncached scene themes in ONE call.
+        # Previously: 200+ individual embed() calls per fact (30s on Mode B).
+        # Now: 1 batch call for all uncached themes, then cache hits for the rest.
+        uncached_themes = [s.theme for s in scenes if s.theme not in self._scene_embeddings_cache]
+        if uncached_themes and hasattr(self._embedder, 'embed_batch'):
+            try:
+                batch_embs = self._embedder.embed_batch(uncached_themes)
+                for theme, emb in zip(uncached_themes, batch_embs):
+                    if emb is not None:
+                        self._scene_embeddings_cache[theme] = emb
+            except Exception:
+                pass  # Fall through to individual embeds below
+
         for scene in scenes:
-            # Use cached embedding if available, otherwise compute fresh
             if scene.theme in self._scene_embeddings_cache:
                 theme_emb = self._scene_embeddings_cache[scene.theme]
             else:
                 theme_emb = self._embedder.embed(scene.theme)
-
+                if theme_emb is not None:
+                    self._scene_embeddings_cache[scene.theme] = theme_emb
+            if theme_emb is None:
+                continue
             sim = _cosine(fact_emb, theme_emb)
             if sim > best_sim:
                 best_sim = sim
package/src/superlocalmemory/mcp/tools_core.py
CHANGED
@@ -97,26 +97,54 @@ def register_core_tools(server, get_engine: Callable) -> None:
     """
     import asyncio
     try:
-
-
-        #
-        #
-        #
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
+        # V3.3.27: Store-first pattern — write to pending.db immediately
+        # (<100ms), then process through full pipeline in background.
+        # This eliminates the 30-40s blocking that Mode B users experience.
+        # Pending memories are auto-processed on next engine.initialize()
+        # or by the daemon's background loop.
+        from superlocalmemory.cli.pending_store import store_pending, mark_done
+
+        pending_id = store_pending(content, tags=tags, metadata={
+            "project": project,
+            "importance": importance,
+            "agent_id": agent_id,
+            "session_id": session_id,
+        })
+
+        # Fire-and-forget: process in background thread
+        async def _process_in_background():
+            try:
+                from superlocalmemory.core.worker_pool import WorkerPool
+                pool = WorkerPool.shared()
+                result = await asyncio.to_thread(
+                    pool.store, content, metadata={
+                        "tags": tags, "project": project,
+                        "importance": importance, "agent_id": agent_id,
+                        "session_id": session_id,
+                    },
+                )
+                if result.get("ok"):
+                    mark_done(pending_id)
+                    _emit_event("memory.created", {
+                        "content_preview": content[:80],
+                        "agent_id": agent_id,
+                        "fact_count": result.get("count", 0),
+                    }, source_agent=agent_id)
+            except Exception as _bg_exc:
+                logger.warning(
+                    "Background store failed (pending_id=%s): %s",
+                    pending_id, _bg_exc,
+                )
+
+        asyncio.create_task(_process_in_background())
+
+        return {
+            "success": True,
+            "fact_ids": [f"pending:{pending_id}"],
+            "count": 1,
+            "pending": True,
+            "message": "Stored to pending — processing in background.",
+        }
     except Exception as exc:
         logger.exception("remember failed")
         return {"success": False, "error": str(exc)}