@simbimbo/memory-ocmemog 0.1.6 → 0.1.7

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/CHANGELOG.md CHANGED
@@ -1,5 +1,15 @@
  # Changelog

+ ## 0.1.7 — 2026-03-19
+
+ llama.cpp-first cleanup after the 0.1.6 runtime cutover.
+
+ ### Highlights
+ - made llama.cpp / local OpenAI-compatible endpoints the primary documented and scripted local runtime path
+ - reduced misleading Ollama-first defaults in installers, sidecar scripts, docs, and helper tooling
+ - aligned context/distill/runtime helpers with the fixed local model architecture (`17890` gateway, `17891` sidecar, `18080` text, `18081` embeddings)
+ - kept compatibility hooks only where still useful for rollback or mixed environments
+
  ## 0.1.6 — 2026-03-19

  Port-separation and publish-solid follow-up.
@@ -8,6 +18,7 @@ Port-separation and publish-solid follow-up.
  - Split ocmemog sidecar onto dedicated loopback port `17891` to avoid collision with the OpenClaw gateway/dashboard on `17890`
  - Restored the plain realtime dashboard on `/dashboard` and fixed the `local_html` template crash
  - Updated plugin/runtime defaults, scripts, and documentation to use the dedicated sidecar endpoint on `17891`
+ - Switched repo-facing local-runtime defaults to llama.cpp-first endpoints on `18080`/`18081` with Qwen2.5 text and `nomic-embed-text-v1.5` embeddings, while keeping Ollama as explicit legacy fallback only
  - Added governance retrieval/governance-policy hardening plus expanded regression coverage for duplicate, contradiction, supersession, queue, audit, rollback, and auto-resolve flows
  - Aligned package/version metadata across npm, Python, and FastAPI surfaces

@@ -16,7 +27,7 @@ Port-separation and publish-solid follow-up.
  Repair and hardening follow-up after the 0.1.4 publish.

  ### Highlights
- - Fixed vector reindex defaults so repair scripts use provider-backed Ollama embeddings instead of silently rebuilding weak local/hash vectors
+ - Fixed vector reindex defaults so repair scripts use provider-backed local embeddings instead of silently rebuilding weak local/hash vectors
  - Added battery-aware sidecar defaults for macOS laptops (`OCMEMOG_LAPTOP_MODE=auto|ac|battery`)
  - Fixed `record_reinforcement()` so new experiences preserve `memory_reference`, and added integrity repair to backfill legacy missing references
  - Added incremental vector backfill tooling (`scripts/ocmemog-backfill-vectors.py`) for non-destructive backlog repair
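The fixed port layout named in the 0.1.7 highlights can be sanity-checked with a small script. This sketch is illustrative only (the `PORT_LAYOUT` mapping and `endpoint_for` helper are not part of the package); the port assignments themselves come from this changelog.

```python
# Illustrative helper (not package code): encodes the fixed port layout
# described in the 0.1.7 changelog entry above.
PORT_LAYOUT = {
    "gateway": 17890,      # OpenClaw gateway/dashboard
    "sidecar": 17891,      # ocmemog sidecar (dedicated loopback port)
    "text": 18080,         # llama.cpp OpenAI-compatible text endpoint
    "embeddings": 18081,   # llama.cpp OpenAI-compatible embedding endpoint
}


def endpoint_for(role: str) -> str:
    """Return the loopback base URL for a given role."""
    port = PORT_LAYOUT[role]
    # The OpenAI-compatible endpoints are served under /v1.
    suffix = "/v1" if role in {"text", "embeddings"} else ""
    return f"http://127.0.0.1:{port}{suffix}"
```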
package/README.md CHANGED
@@ -78,20 +78,24 @@ Optional environment variables:
  - `OCMEMOG_OPENAI_API_BASE` (default: `https://api.openai.com/v1`)
  - `OCMEMOG_OPENAI_EMBED_MODEL` (default: `text-embedding-3-small`)
  - `BRAIN_EMBED_MODEL_LOCAL` (`simple` by default)
- - `BRAIN_EMBED_MODEL_PROVIDER` (`openai` to enable provider embeddings)
+ - `BRAIN_EMBED_MODEL_PROVIDER` (`local-openai` to use the local llama.cpp embedding endpoint; `openai` remains available for hosted embeddings)
  - `OCMEMOG_TRANSCRIPT_WATCHER` (`true` to auto-start transcript watcher inside the sidecar)
  - `OCMEMOG_TRANSCRIPT_ROOTS` (comma-separated allowed roots for transcript context retrieval; default: `~/.openclaw/workspace/memory`)
  - `OCMEMOG_API_TOKEN` (optional; if set, requests must include `x-ocmemog-token` or `Authorization: Bearer ...`)
  - `OCMEMOG_AUTO_HYDRATION` (`true` to re-enable prompt-time continuity prepending; defaults to `false` as a safety guard until the host runtime is verified not to persist prepended context into session history)
  - `OCMEMOG_LAPTOP_MODE` (`auto` by default; on macOS battery power this slows watcher polling, reduces ingest batch size, and disables sentiment reinforcement unless explicitly overridden)
- - `OCMEMOG_USE_OLLAMA` (`true` to use Ollama for distill/inference)
- - `OCMEMOG_OLLAMA_HOST` (default: `http://127.0.0.1:11434`)
- - `OCMEMOG_OLLAMA_MODEL` (default: `phi3:latest`; lightweight local fallback / cheap cognition)
- - `OCMEMOG_OLLAMA_EMBED_MODEL` (default: `nomic-embed-text:latest`)
+ - `OCMEMOG_LOCAL_LLM_BASE_URL` (default: `http://127.0.0.1:18080/v1`; local OpenAI-compatible text endpoint, e.g. llama.cpp)
+ - `OCMEMOG_LOCAL_LLM_MODEL` (default: `qwen2.5-7b-instruct`; matches the active Qwen2.5-7B-Instruct GGUF runtime)
+ - `OCMEMOG_LOCAL_EMBED_BASE_URL` (default: `http://127.0.0.1:18081/v1`; local OpenAI-compatible embedding endpoint)
+ - `OCMEMOG_LOCAL_EMBED_MODEL` (default: `nomic-embed-text-v1.5`)
+ - `OCMEMOG_USE_OLLAMA` (`true` to force legacy Ollama local inference path)
+ - `OCMEMOG_OLLAMA_HOST` (default: `http://127.0.0.1:11434`; legacy fallback)
+ - `OCMEMOG_OLLAMA_MODEL` (default: `qwen2.5:7b`; legacy fallback for machines that still use Ollama)
+ - `OCMEMOG_OLLAMA_EMBED_MODEL` (default: `nomic-embed-text:latest`; legacy embedding fallback)
  - `OCMEMOG_PROMOTION_THRESHOLD` (default: `0.5`)
  - `OCMEMOG_DEMOTION_THRESHOLD` (default: `0.2`)
  - `OCMEMOG_PONDER_ENABLED` (default: `true`)
- - `OCMEMOG_PONDER_MODEL` (default via launcher: `qwen2.5:7b`; recommended for structured local memory refinement)
+ - `OCMEMOG_PONDER_MODEL` (default via launcher: `local-openai:qwen2.5-7b-instruct`; recommended for structured local memory refinement)
  - `OCMEMOG_LESSON_MINING_ENABLED` (default: `true`)

  ## Security
@@ -129,12 +133,13 @@ This installer will try to:
  - install Python requirements
  - install/enable the OpenClaw plugin when the `openclaw` CLI is available
  - install/load LaunchAgents via `scripts/ocmemog-install.sh`
- - pull required local Ollama models when Ollama is already installed
+ - verify the local llama.cpp runtime and expected text/embed endpoints
  - validate `/healthz`

  Notes:
- - If `OCMEMOG_INSTALL_PREREQS=true` and Homebrew is present, the installer will try to install missing `ollama` and `ffmpeg` automatically.
- - If Ollama is not installed and prereq auto-install is off or unavailable, the installer warns and continues; local model support will remain unavailable until Ollama is installed.
+ - If `OCMEMOG_INSTALL_PREREQS=true` and Homebrew is present, the installer will try to install missing `llama.cpp` and `ffmpeg` automatically.
+ - The installer no longer pulls local models. It assumes your llama.cpp text endpoint is on `127.0.0.1:18080` and your embedding endpoint is on `127.0.0.1:18081`.
+ - Legacy Ollama compatibility remains available only when you explicitly opt into it with `OCMEMOG_USE_OLLAMA=true`.
  - If package install is unavailable in the local OpenClaw build, the installer falls back to local-path plugin install.
  - Advanced flags are available for local debugging/CI (`--skip-plugin-install`, `--skip-launchagents`, `--skip-model-pulls`, `--endpoint`, `--repo-url`).

@@ -154,7 +159,7 @@ launchctl bootstrap gui/$UID scripts/launchagents/com.openclaw.ocmemog.guard.pli

  ## Recent changes

- ### 0.1.5 (current main)
+ ### 0.1.6 (current main)

  Package ownership + runtime safety release:
  - Publish package under `@simbimbo/memory-ocmemog` instead of the unauthorized `@openclaw` scope
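The local-runtime variables documented in the README hunk above all have fixed defaults that an exported value overrides. A minimal sketch of that resolution, assuming only the documented names and defaults (the `resolve_local_runtime` helper itself is hypothetical, not package code):

```python
import os

# Documented llama.cpp-first defaults from the README hunk above.
DEFAULTS = {
    "OCMEMOG_LOCAL_LLM_BASE_URL": "http://127.0.0.1:18080/v1",
    "OCMEMOG_LOCAL_LLM_MODEL": "qwen2.5-7b-instruct",
    "OCMEMOG_LOCAL_EMBED_BASE_URL": "http://127.0.0.1:18081/v1",
    "OCMEMOG_LOCAL_EMBED_MODEL": "nomic-embed-text-v1.5",
}


def resolve_local_runtime(env=os.environ):
    """Merge operator-exported values over the documented defaults."""
    return {key: env.get(key, default) for key, default in DEFAULTS.items()}
```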
@@ -9,8 +9,13 @@ OCMEMOG_MEMORY_MODEL = os.environ.get("OCMEMOG_MEMORY_MODEL", "gpt-4o-mini")
  OCMEMOG_OPENAI_API_BASE = os.environ.get("OCMEMOG_OPENAI_API_BASE", "https://api.openai.com/v1")
  OCMEMOG_OPENAI_EMBED_MODEL = os.environ.get("OCMEMOG_OPENAI_EMBED_MODEL", "text-embedding-3-small")

+ OCMEMOG_LOCAL_LLM_BASE_URL = os.environ.get("OCMEMOG_LOCAL_LLM_BASE_URL", "http://127.0.0.1:18080/v1")
+ OCMEMOG_LOCAL_LLM_MODEL = os.environ.get("OCMEMOG_LOCAL_LLM_MODEL", "qwen2.5-7b-instruct")
+ OCMEMOG_LOCAL_EMBED_BASE_URL = os.environ.get("OCMEMOG_LOCAL_EMBED_BASE_URL", "http://127.0.0.1:18081/v1")
+ OCMEMOG_LOCAL_EMBED_MODEL = os.environ.get("OCMEMOG_LOCAL_EMBED_MODEL", "nomic-embed-text-v1.5")
+
  OCMEMOG_OLLAMA_HOST = os.environ.get("OCMEMOG_OLLAMA_HOST", "http://127.0.0.1:11434")
- OCMEMOG_OLLAMA_MODEL = os.environ.get("OCMEMOG_OLLAMA_MODEL", "phi3:latest")
+ OCMEMOG_OLLAMA_MODEL = os.environ.get("OCMEMOG_OLLAMA_MODEL", "qwen2.5:7b")
  OCMEMOG_OLLAMA_EMBED_MODEL = os.environ.get("OCMEMOG_OLLAMA_EMBED_MODEL", "nomic-embed-text:latest")

  OCMEMOG_PROMOTION_THRESHOLD = float(os.environ.get("OCMEMOG_PROMOTION_THRESHOLD", "0.5"))
@@ -11,6 +11,35 @@ from brain.runtime.instrumentation import emit_event
  LOGFILE = state_store.reports_dir() / "brain_memory.log.jsonl"


+ def _infer_openai_compatible(prompt: str, *, base_url: str, model: str, api_key: str | None = None, provider_label: str = "openai-compatible") -> dict[str, str]:
+     url = f"{base_url.rstrip('/')}/chat/completions"
+     payload = {
+         "model": model,
+         "messages": [{"role": "user", "content": prompt}],
+         "temperature": 0.2,
+     }
+     data = json.dumps(payload).encode("utf-8")
+     req = urllib.request.Request(url, data=data, method="POST")
+     if api_key:
+         req.add_header("Authorization", f"Bearer {api_key}")
+     req.add_header("Content-Type", "application/json")
+
+     try:
+         with urllib.request.urlopen(req, timeout=30) as resp:
+             response = json.loads(resp.read().decode("utf-8"))
+     except Exception as exc:
+         emit_event(LOGFILE, "brain_infer_error", status="error", provider=provider_label, error=str(exc))
+         return {"status": "error", "error": f"request_failed:{exc}"}
+
+     try:
+         output = response["choices"][0]["message"]["content"]
+     except Exception as exc:
+         emit_event(LOGFILE, "brain_infer_error", status="error", provider=provider_label, error=str(exc))
+         return {"status": "error", "error": "invalid_response"}
+
+     return {"status": "ok", "output": str(output).strip()}
+
+
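The request that `_infer_openai_compatible` sends can be sketched in isolation. `build_chat_request` below is an illustrative stand-in, not a function from the package, but it mirrors the URL normalization and payload shape of the hunk above without performing any network I/O:

```python
import json


def build_chat_request(base_url: str, model: str, prompt: str) -> tuple[str, bytes]:
    # Mirrors the diffed helper: strip any trailing slash from the base URL,
    # then build an OpenAI-compatible chat-completions payload.
    url = f"{base_url.rstrip('/')}/chat/completions"
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.2,
    }
    return url, json.dumps(payload).encode("utf-8")
```

Note that the trailing-slash handling means `http://127.0.0.1:18080/v1` and `http://127.0.0.1:18080/v1/` produce the same request URL.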
  def _infer_ollama(prompt: str, model: str | None = None) -> dict[str, str]:
      payload = {
          "model": model or config.OCMEMOG_OLLAMA_MODEL,
@@ -33,6 +62,21 @@ def _infer_ollama(prompt: str, model: str | None = None) -> dict[str, str]:
      return {"status": "ok", "output": str(output).strip()}


+ def _looks_like_local_openai_model(name: str) -> bool:
+     if not name:
+         return False
+     lowered = name.strip().lower()
+     return lowered.startswith("local-openai:") or lowered.startswith("local_openai:") or lowered.startswith("llamacpp:")
+
+
+ def _normalize_local_model_name(name: str) -> str:
+     lowered = (name or "").strip()
+     for prefix in ("local-openai:", "local_openai:", "llamacpp:"):
+         if lowered.lower().startswith(prefix):
+             return lowered[len(prefix):]
+     return lowered
+
+
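The two helpers above accept several prefix spellings for routing a model name to the local OpenAI-compatible path. This standalone copy (assumption: behavior identical to the diffed `_normalize_local_model_name`) shows what each spelling normalizes to:

```python
def normalize_local_model_name(name: str) -> str:
    # Standalone copy of the prefix-stripping helper from the hunk above:
    # matching is case-insensitive, but the remainder keeps its casing.
    lowered = (name or "").strip()
    for prefix in ("local-openai:", "local_openai:", "llamacpp:"):
        if lowered.lower().startswith(prefix):
            return lowered[len(prefix):]
    return lowered
```

Names without a recognized prefix pass through unchanged, which is what lets hosted model names like `gpt-4o-mini` skip the local path.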
  def _looks_like_ollama_model(name: str) -> bool:
      if not name:
          return False
@@ -69,41 +113,37 @@ def infer(prompt: str, provider_name: str | None = None) -> dict[str, str]:

      use_ollama = os.environ.get("OCMEMOG_USE_OLLAMA", "").lower() in {"1", "true", "yes"}
      model_override = provider_name or config.OCMEMOG_MEMORY_MODEL
+     if _looks_like_local_openai_model(model_override):
+         model = _normalize_local_model_name(model_override) or config.OCMEMOG_LOCAL_LLM_MODEL
+         return _infer_openai_compatible(
+             prompt,
+             base_url=config.OCMEMOG_LOCAL_LLM_BASE_URL,
+             model=model,
+             api_key=os.environ.get("OCMEMOG_LOCAL_LLM_API_KEY") or os.environ.get("LOCAL_LLM_API_KEY"),
+             provider_label="local-openai",
+         )
      if use_ollama or _looks_like_ollama_model(model_override):
          model = model_override.split(":", 1)[-1] if model_override.startswith("ollama:") else model_override
          return _infer_ollama(prompt, model)

      api_key = os.environ.get("OCMEMOG_OPENAI_API_KEY") or os.environ.get("OPENAI_API_KEY")
      if not api_key:
-         # fall back to local ollama if configured
-         return _infer_ollama(prompt, config.OCMEMOG_OLLAMA_MODEL)
+         return _infer_openai_compatible(
+             prompt,
+             base_url=config.OCMEMOG_LOCAL_LLM_BASE_URL,
+             model=config.OCMEMOG_LOCAL_LLM_MODEL,
+             api_key=os.environ.get("OCMEMOG_LOCAL_LLM_API_KEY") or os.environ.get("LOCAL_LLM_API_KEY"),
+             provider_label="local-openai",
+         )

      model = model_override
-     url = f"{config.OCMEMOG_OPENAI_API_BASE.rstrip('/')}/chat/completions"
-     payload = {
-         "model": model,
-         "messages": [{"role": "user", "content": prompt}],
-         "temperature": 0.2,
-     }
-     data = json.dumps(payload).encode("utf-8")
-     req = urllib.request.Request(url, data=data, method="POST")
-     req.add_header("Authorization", f"Bearer {api_key}")
-     req.add_header("Content-Type", "application/json")
-
-     try:
-         with urllib.request.urlopen(req, timeout=30) as resp:
-             response = json.loads(resp.read().decode("utf-8"))
-     except Exception as exc:
-         emit_event(LOGFILE, "brain_infer_error", status="error", provider="openai", error=str(exc))
-         return {"status": "error", "error": f"request_failed:{exc}"}
-
-     try:
-         output = response["choices"][0]["message"]["content"]
-     except Exception as exc:
-         emit_event(LOGFILE, "brain_infer_error", status="error", provider="openai", error=str(exc))
-         return {"status": "error", "error": "invalid_response"}
-
-     return {"status": "ok", "output": str(output).strip()}
+     return _infer_openai_compatible(
+         prompt,
+         base_url=config.OCMEMOG_OPENAI_API_BASE,
+         model=model,
+         api_key=api_key,
+         provider_label="openai",
+     )


  def parse_operator_name(text: str) -> dict[str, str] | None:
@@ -316,7 +316,10 @@ def _model_contradiction_hint(left: str, right: str) -> Optional[Dict[str, Any]]
          f"Statement A: {left}\n"
          f"Statement B: {right}\n"
      )
-     result = inference.infer(prompt, provider_name="qwen2.5:7b")
+     result = inference.infer(
+         prompt,
+         provider_name=os.environ.get("OCMEMOG_PONDER_MODEL", "local-openai:qwen2.5-7b-instruct"),
+     )
      if result.get("status") != "ok":
          return None
      try:
@@ -53,7 +53,7 @@ def _groom_queries(prompt: str, limit: int = 3) -> List[str]:
          return []
      if _should_skip_query_grooming(cleaned):
          return _heuristic_queries(cleaned, limit=limit)
-     model = os.environ.get("OCMEMOG_PONDER_MODEL", "qwen2.5:7b")
+     model = os.environ.get("OCMEMOG_PONDER_MODEL", "local-openai:qwen2.5-7b-instruct")
      ask = (
          "Rewrite this raw memory request into up to 3 short search queries. "
          "Return strict JSON as {\"queries\":[\"...\"]}. "
@@ -43,7 +43,7 @@ def _local_distill_summary(text: str) -> str:
          f"Experience:\n{text}\n\n"
          "Summary:"
      )
-     model = os.environ.get("OCMEMOG_PONDER_MODEL", "qwen2.5:7b")
+     model = os.environ.get("OCMEMOG_PONDER_MODEL", "local-openai:qwen2.5-7b-instruct")
      try:
          result = inference.infer(prompt, provider_name=model)
      except Exception:
@@ -17,6 +17,8 @@ def get_provider_for_role(role: str) -> ModelSelection:
      provider = (config.BRAIN_EMBED_MODEL_PROVIDER or "").strip().lower()
      if provider in {"openai", "openai_compatible", "openai-compatible"}:
          return ModelSelection(provider_id="openai", model=config.OCMEMOG_OPENAI_EMBED_MODEL)
+     if provider in {"local-openai", "local_openai", "llamacpp", "llama.cpp"}:
+         return ModelSelection(provider_id="local-openai", model=config.OCMEMOG_LOCAL_EMBED_MODEL)
      if provider in {"ollama", "local-ollama"}:
          return ModelSelection(provider_id="ollama", model=config.OCMEMOG_OLLAMA_EMBED_MODEL)
      return ModelSelection()
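The provider-string matching above can be exercised with a minimal stand-in for `ModelSelection`. The dataclass and literal model names below are illustrative substitutes for the package's config values; the accepted provider spellings come from the diff:

```python
from dataclasses import dataclass


@dataclass
class Selection:
    provider_id: str = ""
    model: str = ""


def select_embed_provider(provider: str) -> Selection:
    # Mirrors get_provider_for_role's matching; literal model names stand in
    # for config.OCMEMOG_*_EMBED_MODEL values.
    provider = (provider or "").strip().lower()
    if provider in {"openai", "openai_compatible", "openai-compatible"}:
        return Selection("openai", "text-embedding-3-small")
    if provider in {"local-openai", "local_openai", "llamacpp", "llama.cpp"}:
        return Selection("local-openai", "nomic-embed-text-v1.5")
    if provider in {"ollama", "local-ollama"}:
        return Selection("ollama", "nomic-embed-text:latest")
    return Selection()  # unknown/empty: fall through to local heuristics
```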
@@ -14,25 +14,34 @@ class ProviderExecute:
      def execute_embedding_call(self, selection, text: str) -> dict[str, object]:
          provider_id = getattr(selection, "provider_id", "") or ""
          model = getattr(selection, "model", "") or config.OCMEMOG_OPENAI_EMBED_MODEL
-         if provider_id == "openai":
-             api_key = os.environ.get("OCMEMOG_OPENAI_API_KEY") or os.environ.get("OPENAI_API_KEY")
-             if not api_key:
-                 return {}
-             url = f"{config.OCMEMOG_OPENAI_API_BASE.rstrip('/')}/embeddings"
+         if provider_id in {"openai", "local-openai"}:
+             api_key = None
+             url_base = config.OCMEMOG_OPENAI_API_BASE
+             provider_label = "openai"
+             if provider_id == "openai":
+                 api_key = os.environ.get("OCMEMOG_OPENAI_API_KEY") or os.environ.get("OPENAI_API_KEY")
+                 if not api_key:
+                     return {}
+             else:
+                 url_base = config.OCMEMOG_LOCAL_EMBED_BASE_URL
+                 api_key = os.environ.get("OCMEMOG_LOCAL_EMBED_API_KEY") or os.environ.get("LOCAL_EMBED_API_KEY")
+                 provider_label = "local-openai"
+             url = f"{url_base.rstrip('/')}/embeddings"
              payload = json.dumps({"model": model, "input": text}).encode("utf-8")
              req = urllib.request.Request(url, data=payload, method="POST")
-             req.add_header("Authorization", f"Bearer {api_key}")
+             if api_key:
+                 req.add_header("Authorization", f"Bearer {api_key}")
              req.add_header("Content-Type", "application/json")
              try:
                  with urllib.request.urlopen(req, timeout=20) as resp:
                      data = json.loads(resp.read().decode("utf-8"))
              except Exception as exc:
-                 emit_event(LOGFILE, "brain_embedding_provider_error", status="error", provider="openai", error=str(exc))
+                 emit_event(LOGFILE, "brain_embedding_provider_error", status="error", provider=provider_label, error=str(exc))
                  return {}
              try:
                  embedding = data["data"][0]["embedding"]
              except Exception as exc:
-                 emit_event(LOGFILE, "brain_embedding_provider_error", status="error", provider="openai", error=str(exc))
+                 emit_event(LOGFILE, "brain_embedding_provider_error", status="error", provider=provider_label, error=str(exc))
                  return {}
              return {"embedding": embedding}
 
@@ -12,8 +12,8 @@ This pass focused on turning `ocmemog` from a noisy/fragile memory stack into a
  ## Changes landed

  ### Embedding and rebuild behavior
- - Fixed the vector reindex entrypoint so it defaults to provider-backed Ollama embeddings instead of silently rebuilding weak hash/simple vectors.
- - Confirmed local Ollama embeddings (`nomic-embed-text:latest`) are available and produce 768-dim vectors.
+ - Fixed the vector reindex entrypoint so it defaults to provider-backed local embeddings instead of silently rebuilding weak hash/simple vectors.
+ - At the time this landed, the provider-backed path used Ollama-hosted `nomic-embed-text:latest`; the current repo default is the llama.cpp embedding endpoint on `127.0.0.1:18081` with `nomic-embed-text-v1.5`.
  - Added a new incremental repair path:
    - `backfill_missing_vectors()` in `brain/runtime/memory/vector_index.py`
    - `scripts/ocmemog-backfill-vectors.py`
@@ -62,7 +62,7 @@ For laptop-friendly backlog burn-down, use staged backfills in roughly this orde
  6. knowledge last

  ## Commits from this sweep
- - `f3d3dd9` — fix: default vector reindex to ollama embeddings
+ - `f3d3dd9` — fix: default vector reindex to provider-backed embeddings
  - `759d23d` — feat: add battery-aware sidecar defaults
  - `4a102eb` — fix: clean memory freshness summaries
  - `9ee7966` — fix: report duplicate promotion counts accurately
@@ -1,8 +1,10 @@
  # Local model role matrix — 2026-03-18

+ Historical note: this bakeoff was recorded before the local-runtime cutover from Ollama to llama.cpp. Keep the conclusions, but map them onto the current llama.cpp-served GGUF models when using this repo today.
+
  Purpose: document which installed local model is best suited for which `ocmemog` task so background cognition can be smarter without putting heavy/slow models on every path.

- Installed local models observed:
+ Installed local models observed at the time:
  - `phi3:latest`
  - `qwen2.5:7b`
  - `llama3.1:8b`
@@ -45,6 +47,8 @@ Installed local models observed:
  - richer optional background cognition: `llama3.1:8b`

  ## Operational recommendation
- - Keep `OCMEMOG_OLLAMA_MODEL=phi3:latest` for lightweight local fallback behavior.
- - Set `OCMEMOG_PONDER_MODEL=qwen2.5:7b` for unresolved-state rewrite, lesson extraction, and cluster recommendation shaping.
+ - Current llama.cpp-first equivalent for this repo:
+   - Set `OCMEMOG_LOCAL_LLM_MODEL=qwen2.5-7b-instruct` and `OCMEMOG_PONDER_MODEL=local-openai:qwen2.5-7b-instruct` for unresolved-state rewrite, lesson extraction, and cluster recommendation shaping.
+   - Set `OCMEMOG_LOCAL_EMBED_MODEL=nomic-embed-text-v1.5` for embeddings on the `18081` endpoint.
+   - If you intentionally keep Ollama on another machine, prefer `OCMEMOG_OLLAMA_MODEL=qwen2.5:7b` instead of `phi3`.
  - Consider `llama3.1:8b` for optional deeper background cognition passes where latency is acceptable.
package/docs/usage.md CHANGED
@@ -2,10 +2,10 @@

  ## Current operating model

- ocmemog is a repo-local OpenClaw memory sidecar backed by SQLite. It is not a full brAIn runtime clone. The safe assumption is:
+ ocmemog is a repo-local OpenClaw memory sidecar backed by SQLite with llama.cpp-first local inference and embeddings. It is not a full brAIn runtime clone. The safe assumption is:

  - search/get over local memory are supported
- - heuristic embeddings are supported by default
+ - provider-backed local embeddings are the primary path
  - several advanced brAIn memory flows are copied in but still degraded by missing runtime dependencies

  ## Running the sidecar
@@ -47,8 +47,12 @@ export OCMEMOG_MEMORY_MODEL=gpt-4o-mini
  export OCMEMOG_OPENAI_API_KEY=sk-...
  export OCMEMOG_OPENAI_API_BASE=https://api.openai.com/v1
  export OCMEMOG_OPENAI_EMBED_MODEL=text-embedding-3-small
+ export OCMEMOG_LOCAL_LLM_BASE_URL=http://127.0.0.1:18080/v1
+ export OCMEMOG_LOCAL_LLM_MODEL=qwen2.5-7b-instruct
+ export OCMEMOG_LOCAL_EMBED_BASE_URL=http://127.0.0.1:18081/v1
+ export OCMEMOG_LOCAL_EMBED_MODEL=nomic-embed-text-v1.5
  export BRAIN_EMBED_MODEL_LOCAL=simple
- export BRAIN_EMBED_MODEL_PROVIDER=openai
+ export BRAIN_EMBED_MODEL_PROVIDER=local-openai
  export OCMEMOG_TRANSCRIPT_DIR=$HOME/.openclaw/workspace/memory/transcripts
  export OCMEMOG_TRANSCRIPT_GLOB=*.log
  export OCMEMOG_TRANSCRIPT_POLL_SECONDS=1
@@ -182,8 +186,8 @@ Notes:
  - `brain/runtime/memory/api.py`
  - It targets missing/legacy tables and columns.
  - Provider-backed embeddings
-   - Available when `BRAIN_EMBED_MODEL_PROVIDER=openai` and `OCMEMOG_OPENAI_API_KEY` is set.
-   - Falls back to local embeddings when missing.
+   - Available when `BRAIN_EMBED_MODEL_PROVIDER=local-openai` and the local embedding endpoint is reachable.
+   - Legacy OpenAI-hosted embeddings remain available when `BRAIN_EMBED_MODEL_PROVIDER=openai` and `OCMEMOG_OPENAI_API_KEY` is set.
  - Model-backed distillation
    - Available when `OCMEMOG_OPENAI_API_KEY` is set; otherwise falls back to heuristic distill.
  - Role-prioritized context building
@@ -19,7 +19,7 @@ from ocmemog.sidecar.transcript_watcher import watch_forever

  DEFAULT_CATEGORIES = ("knowledge", "reflections", "directives", "tasks", "runbooks", "lessons")

- app = FastAPI(title="ocmemog sidecar", version="0.1.6")
+ app = FastAPI(title="ocmemog sidecar", version="0.1.7")

  API_TOKEN = os.environ.get("OCMEMOG_API_TOKEN")

package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
    "name": "@simbimbo/memory-ocmemog",
-   "version": "0.1.6",
+   "version": "0.1.7",
    "description": "Advanced OpenClaw memory plugin with durable recall, transcript-backed continuity, and sidecar APIs",
    "license": "MIT",
    "repository": {
@@ -8,7 +8,9 @@ PLUGIN_PACKAGE="@simbimbo/memory-ocmemog"
  PLUGIN_ID="memory-ocmemog"
  ENDPOINT="${OCMEMOG_ENDPOINT:-http://127.0.0.1:17891}"
  TIMEOUT_MS="${OCMEMOG_TIMEOUT_MS:-30000}"
- DEFAULT_OLLAMA_MODEL="${OCMEMOG_OLLAMA_MODEL:-phi3:latest}"
+ DEFAULT_LOCAL_LLM_MODEL="${OCMEMOG_LOCAL_LLM_MODEL:-qwen2.5-7b-instruct}"
+ DEFAULT_LOCAL_EMBED_MODEL="${OCMEMOG_LOCAL_EMBED_MODEL:-nomic-embed-text-v1.5}"
+ DEFAULT_OLLAMA_MODEL="${OCMEMOG_OLLAMA_MODEL:-qwen2.5:7b}"
  DEFAULT_OLLAMA_EMBED_MODEL="${OCMEMOG_OLLAMA_EMBED_MODEL:-nomic-embed-text:latest}"
  INSTALL_PREREQS="${OCMEMOG_INSTALL_PREREQS:-false}"
  SKIP_PLUGIN_INSTALL="false"
@@ -27,10 +29,10 @@ Arguments:

  Options:
    --help                  Show this help text.
-   --install-prereqs       Auto-install missing ollama/ffmpeg via Homebrew.
+   --install-prereqs       Auto-install missing llama.cpp/ffmpeg via Homebrew.
    --skip-plugin-install   Skip OpenClaw plugin install/enable.
    --skip-launchagents     Skip LaunchAgent install/load.
-   --skip-model-pulls      Skip local Ollama model pulls.
+   --skip-model-pulls      Skip local llama.cpp runtime checks.
    --dry-run               Print what would happen without making changes.
    --endpoint URL          Override sidecar endpoint (default: http://127.0.0.1:17891).
    --timeout-ms N          Override plugin timeout summary value (default: 30000).
@@ -38,8 +40,10 @@ Options:

  Environment:
    OCMEMOG_INSTALL_PREREQS=true   Same as --install-prereqs.
-   OCMEMOG_OLLAMA_MODEL           Default local model to pull.
-   OCMEMOG_OLLAMA_EMBED_MODEL     Default local embedding model to pull.
+   OCMEMOG_LOCAL_LLM_MODEL        Default local llama.cpp/OpenAI-compatible text model.
+   OCMEMOG_LOCAL_EMBED_MODEL      Default local llama.cpp/OpenAI-compatible embedding model.
+   OCMEMOG_OLLAMA_MODEL           Legacy Ollama text model fallback.
+   OCMEMOG_OLLAMA_EMBED_MODEL     Legacy Ollama embedding model fallback.
  EOF
  }

@@ -125,9 +129,9 @@ maybe_install_prereqs() {
      warn "Homebrew not found; cannot auto-install prerequisites"
      return
    fi
-   if ! have ollama; then
-     log "Installing Ollama via Homebrew"
-     run_cmd brew install ollama || warn "brew install ollama failed"
+   if ! have llama-server; then
+     log "Installing llama.cpp via Homebrew"
+     run_cmd brew install llama.cpp || warn "brew install llama.cpp failed"
    fi
    if ! have ffmpeg; then
      log "Installing ffmpeg via Homebrew"
@@ -206,23 +210,18 @@ install_launchagents() {
    run_cmd "$ROOT_DIR/scripts/ocmemog-install.sh"
  }

- ensure_ollama_models() {
+ ensure_local_runtime() {
    if [[ "$SKIP_MODEL_PULLS" == "true" ]]; then
-     log "Skipping local model pulls by request"
+     log "Skipping local llama.cpp runtime checks by request"
      return
    fi
-   if ! have ollama; then
-     warn "Ollama not found. Install from https://ollama.com/download to enable local models."
+   if ! have llama-server; then
+     warn "llama-server not found. Install llama.cpp or provide your own local OpenAI-compatible endpoints."
      return
    fi
-   if ! ollama list | rg -q "$(printf '%s' "$DEFAULT_OLLAMA_MODEL" | sed 's/:.*$//')"; then
-     log "Pulling local model $DEFAULT_OLLAMA_MODEL"
-     run_cmd ollama pull "$DEFAULT_OLLAMA_MODEL"
-   fi
-   if ! ollama list | rg -q "$(printf '%s' "$DEFAULT_OLLAMA_EMBED_MODEL" | sed 's/:.*$//')"; then
-     log "Pulling local embed model $DEFAULT_OLLAMA_EMBED_MODEL"
-     run_cmd ollama pull "$DEFAULT_OLLAMA_EMBED_MODEL"
-   fi
+   log "Detected llama.cpp runtime via llama-server"
+   log "Expect local text endpoint at http://127.0.0.1:18080/v1 using model $DEFAULT_LOCAL_LLM_MODEL"
+   log "Expect local embed endpoint at http://127.0.0.1:18081/v1 using model $DEFAULT_LOCAL_EMBED_MODEL"
  }

  validate_install() {
@@ -252,12 +251,13 @@ ocmemog install summary
  - repo: $ROOT_DIR
  - endpoint: $ENDPOINT
  - timeoutMs: $TIMEOUT_MS
- - local model: $DEFAULT_OLLAMA_MODEL
- - embed model: $DEFAULT_OLLAMA_EMBED_MODEL
+ - local text model: $DEFAULT_LOCAL_LLM_MODEL
+ - local embed model: $DEFAULT_LOCAL_EMBED_MODEL
+ - legacy Ollama fallback model: $DEFAULT_OLLAMA_MODEL
  - install prereqs automatically: $INSTALL_PREREQS
  - skip plugin install: $SKIP_PLUGIN_INSTALL
  - skip LaunchAgents: $SKIP_LAUNCHAGENTS
- - skip model pulls: $SKIP_MODEL_PULLS
+ - skip local runtime checks: $SKIP_MODEL_PULLS
  - dry run: $DRY_RUN

  Next checks:
@@ -272,6 +272,6 @@ maybe_install_prereqs
  ensure_python
  install_plugin
  install_launchagents
- ensure_ollama_models
+ ensure_local_runtime
  validate_install
  print_summary
@@ -9,10 +9,12 @@ from pathlib import Path
  REPO_ROOT = Path(__file__).resolve().parents[1]
  sys.path.insert(0, str(REPO_ROOT))

- os.environ.setdefault("OCMEMOG_USE_OLLAMA", "true")
- os.environ.setdefault("OCMEMOG_OLLAMA_MODEL", "phi3:latest")
- os.environ.setdefault("OCMEMOG_OLLAMA_EMBED_MODEL", "nomic-embed-text:latest")
- os.environ.setdefault("BRAIN_EMBED_MODEL_PROVIDER", "ollama")
+ os.environ.setdefault("OCMEMOG_USE_OLLAMA", "false")
+ os.environ.setdefault("OCMEMOG_LOCAL_LLM_BASE_URL", "http://127.0.0.1:18080/v1")
+ os.environ.setdefault("OCMEMOG_LOCAL_LLM_MODEL", "qwen2.5-7b-instruct")
+ os.environ.setdefault("OCMEMOG_LOCAL_EMBED_BASE_URL", "http://127.0.0.1:18081/v1")
+ os.environ.setdefault("OCMEMOG_LOCAL_EMBED_MODEL", "nomic-embed-text-v1.5")
+ os.environ.setdefault("BRAIN_EMBED_MODEL_PROVIDER", "local-openai")
  os.environ.setdefault("BRAIN_EMBED_MODEL_LOCAL", "")
  os.environ.setdefault("OCMEMOG_STATE_DIR", str(REPO_ROOT / ".ocmemog-state"))
@@ -50,7 +50,7 @@ def demo_precision() -> dict:
      "synology nas",
      "openclaw status --deep",
      "gateway bind loopback",
-     "ollama embeddings",
+     "llama.cpp embeddings",
      "memory pipeline",
      "jira projects",
      "calix arden",
@@ -66,21 +66,13 @@ for plist in "$ROOT_DIR"/scripts/launchagents/com.openclaw.ocmemog.{sidecar,pond
    echo "Loaded $label"
  done

- if ! command -v ollama >/dev/null 2>&1; then
-   echo "Ollama not found. Install from: https://ollama.com/download"
-   echo "Then run: ollama pull phi3 && ollama pull nomic-embed-text"
+ if ! command -v llama-server >/dev/null 2>&1; then
+   echo "llama.cpp not found. Install with: brew install llama.cpp"
    exit 0
  fi

- if ! ollama list | rg -q "phi3"; then
-   echo "Pulling phi3..."
-   ollama pull phi3
- fi
-
- if ! ollama list | rg -q "nomic-embed-text"; then
-   echo "Pulling nomic-embed-text..."
-   ollama pull nomic-embed-text
- fi
+ echo "Expect local llama.cpp text endpoint at http://127.0.0.1:18080/v1"
+ echo "Expect local llama.cpp embed endpoint at http://127.0.0.1:18081/v1"

  if ! command -v ffmpeg >/dev/null 2>&1; then
    echo "ffmpeg not found. Install with: brew install ffmpeg"
@@ -17,7 +17,7 @@ QUERIES = [
      "ssh key policy",
      "synology nas",
      "openclaw status --deep",
-     "ollama embeddings",
+     "llama.cpp embeddings",
      "memory pipeline",
      "calix arden",
      "gateway bind loopback",
@@ -11,7 +11,7 @@ QUERIES = [
      "ssh key policy",
      "synology nas",
      "openclaw status --deep",
-     "ollama embeddings",
+     "llama.cpp embeddings",
      "memory pipeline",
      "calix arden",
  ]
@@ -8,10 +8,12 @@ from pathlib import Path
  REPO_ROOT = Path(__file__).resolve().parents[1]
  sys.path.insert(0, str(REPO_ROOT))

- os.environ.setdefault("OCMEMOG_USE_OLLAMA", "true")
- os.environ.setdefault("OCMEMOG_OLLAMA_MODEL", "phi3:latest")
- os.environ.setdefault("OCMEMOG_OLLAMA_EMBED_MODEL", "nomic-embed-text:latest")
- os.environ.setdefault("BRAIN_EMBED_MODEL_PROVIDER", "ollama")
+ os.environ.setdefault("OCMEMOG_USE_OLLAMA", "false")
+ os.environ.setdefault("OCMEMOG_LOCAL_LLM_BASE_URL", "http://127.0.0.1:18080/v1")
+ os.environ.setdefault("OCMEMOG_LOCAL_LLM_MODEL", "qwen2.5-7b-instruct")
+ os.environ.setdefault("OCMEMOG_LOCAL_EMBED_BASE_URL", "http://127.0.0.1:18081/v1")
+ os.environ.setdefault("OCMEMOG_LOCAL_EMBED_MODEL", "nomic-embed-text-v1.5")
+ os.environ.setdefault("BRAIN_EMBED_MODEL_PROVIDER", "local-openai")
  os.environ.setdefault("BRAIN_EMBED_MODEL_LOCAL", "")
  os.environ.setdefault("OCMEMOG_STATE_DIR", str(REPO_ROOT / ".ocmemog-state"))

@@ -31,12 +31,16 @@ if [[ "$LAPTOP_MODE" == "auto" ]]; then
  fi
  export OCMEMOG_LAPTOP_MODE="$LAPTOP_MODE"

- # defaults for local ollama-backed inference/embeddings
- export OCMEMOG_USE_OLLAMA="${OCMEMOG_USE_OLLAMA:-true}"
- export OCMEMOG_OLLAMA_MODEL="${OCMEMOG_OLLAMA_MODEL:-phi3:latest}"
+ # defaults for local llama.cpp / OpenAI-compatible inference and embeddings
+ export OCMEMOG_USE_OLLAMA="${OCMEMOG_USE_OLLAMA:-false}"
+ export OCMEMOG_LOCAL_LLM_BASE_URL="${OCMEMOG_LOCAL_LLM_BASE_URL:-http://127.0.0.1:18080/v1}"
+ export OCMEMOG_LOCAL_LLM_MODEL="${OCMEMOG_LOCAL_LLM_MODEL:-qwen2.5-7b-instruct}"
+ export OCMEMOG_LOCAL_EMBED_BASE_URL="${OCMEMOG_LOCAL_EMBED_BASE_URL:-http://127.0.0.1:18081/v1}"
+ export OCMEMOG_LOCAL_EMBED_MODEL="${OCMEMOG_LOCAL_EMBED_MODEL:-nomic-embed-text-v1.5}"
+ export OCMEMOG_OLLAMA_MODEL="${OCMEMOG_OLLAMA_MODEL:-qwen2.5:7b}"
  export OCMEMOG_OLLAMA_EMBED_MODEL="${OCMEMOG_OLLAMA_EMBED_MODEL:-nomic-embed-text:latest}"
- export OCMEMOG_PONDER_MODEL="${OCMEMOG_PONDER_MODEL:-qwen2.5:7b}"
- export BRAIN_EMBED_MODEL_PROVIDER="${BRAIN_EMBED_MODEL_PROVIDER:-ollama}"
+ export OCMEMOG_PONDER_MODEL="${OCMEMOG_PONDER_MODEL:-local-openai:qwen2.5-7b-instruct}"
+ export BRAIN_EMBED_MODEL_PROVIDER="${BRAIN_EMBED_MODEL_PROVIDER:-local-openai}"
  export BRAIN_EMBED_MODEL_LOCAL="${BRAIN_EMBED_MODEL_LOCAL:-}"

  # battery-aware transcript watcher defaults
@@ -153,8 +153,9 @@ def _distill_batches(endpoint: str, target: int, batch_sizes: list[int], timeout

  def _enable_local_embeddings() -> None:
      os.environ.setdefault("BRAIN_EMBED_MODEL_LOCAL", "")
-     os.environ.setdefault("BRAIN_EMBED_MODEL_PROVIDER", "ollama")
-     os.environ.setdefault("OCMEMOG_OLLAMA_EMBED_MODEL", "nomic-embed-text:latest")
+     os.environ.setdefault("BRAIN_EMBED_MODEL_PROVIDER", "local-openai")
+     os.environ.setdefault("OCMEMOG_LOCAL_EMBED_BASE_URL", "http://127.0.0.1:18081/v1")
+     os.environ.setdefault("OCMEMOG_LOCAL_EMBED_MODEL", "nomic-embed-text-v1.5")


  def main() -> int: