@simbimbo/memory-ocmemog 0.1.6 → 0.1.8
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/CHANGELOG.md +21 -1
- package/README.md +18 -10
- package/brain/runtime/config.py +6 -1
- package/brain/runtime/inference.py +67 -27
- package/brain/runtime/memory/api.py +4 -1
- package/brain/runtime/memory/context_builder.py +1 -1
- package/brain/runtime/memory/distill.py +1 -1
- package/brain/runtime/model_router.py +2 -0
- package/brain/runtime/providers.py +17 -8
- package/docs/architecture/local-runtime-2026-03-19.md +33 -0
- package/docs/notes/2026-03-18-memory-repair-and-backfill.md +3 -3
- package/docs/notes/local-model-role-matrix-2026-03-18.md +7 -3
- package/docs/usage.md +9 -5
- package/ocmemog/sidecar/app.py +1 -1
- package/package.json +1 -1
- package/scripts/install-ocmemog.sh +24 -24
- package/scripts/ocmemog-backfill-vectors.py +6 -4
- package/scripts/ocmemog-demo.py +1 -1
- package/scripts/ocmemog-install.sh +4 -12
- package/scripts/ocmemog-load-test.py +1 -1
- package/scripts/ocmemog-recall-test.py +1 -1
- package/scripts/ocmemog-reindex-vectors.py +6 -4
- package/scripts/ocmemog-sidecar.sh +9 -5
- package/scripts/ocmemog-test-rig.py +3 -2
package/CHANGELOG.md
CHANGED

@@ -1,5 +1,24 @@
 # Changelog
 
+## 0.1.8 — 2026-03-19
+
+Documentation and release follow-through after the llama.cpp migration and repo grooming pass.
+
+### Highlights
+- documented the stable local runtime architecture (gateway/sidecar/text/embed split)
+- published the repo in a llama.cpp-first state with fixed ports and cleaned installers/scripts
+- kept compatibility hooks only where still useful instead of leaving Ollama as the implied primary path
+
+## 0.1.7 — 2026-03-19
+
+llama.cpp-first cleanup after the 0.1.6 runtime cutover.
+
+### Highlights
+- made llama.cpp / local OpenAI-compatible endpoints the primary documented and scripted local runtime path
+- reduced misleading Ollama-first defaults in installers, sidecar scripts, docs, and helper tooling
+- aligned context/distill/runtime helpers with the fixed local model architecture (`17890` gateway, `17891` sidecar, `18080` text, `18081` embeddings)
+- kept compatibility hooks only where still useful for rollback or mixed environments
+
 ## 0.1.6 — 2026-03-19
 
 Port-separation and publish-solid follow-up.

@@ -8,6 +27,7 @@ Port-separation and publish-solid follow-up.
 - Split ocmemog sidecar onto dedicated loopback port `17891` to avoid collision with the OpenClaw gateway/dashboard on `17890`
 - Restored the plain realtime dashboard on `/dashboard` and fixed the `local_html` template crash
 - Updated plugin/runtime defaults, scripts, and documentation to use the dedicated sidecar endpoint on `17891`
+- Switched repo-facing local-runtime defaults to llama.cpp-first endpoints on `18080`/`18081` with Qwen2.5 text and `nomic-embed-text-v1.5` embeddings, while keeping Ollama as explicit legacy fallback only
 - Added governance retrieval/governance-policy hardening plus expanded regression coverage for duplicate, contradiction, supersession, queue, audit, rollback, and auto-resolve flows
 - Aligned package/version metadata across npm, Python, and FastAPI surfaces

@@ -16,7 +36,7 @@ Port-separation and publish-solid follow-up.
 Repair and hardening follow-up after the 0.1.4 publish.
 
 ### Highlights
-- Fixed vector reindex defaults so repair scripts use provider-backed
+- Fixed vector reindex defaults so repair scripts use provider-backed local embeddings instead of silently rebuilding weak local/hash vectors
 - Added battery-aware sidecar defaults for macOS laptops (`OCMEMOG_LAPTOP_MODE=auto|ac|battery`)
 - Fixed `record_reinforcement()` so new experiences preserve `memory_reference`, and added integrity repair to backfill legacy missing references
 - Added incremental vector backfill tooling (`scripts/ocmemog-backfill-vectors.py`) for non-destructive backlog repair
package/README.md
CHANGED

@@ -14,6 +14,9 @@ Architecture at a glance:
 - **FastAPI sidecar (`ocmemog/sidecar/`)** exposes memory and continuity APIs
 - **SQLite-backed runtime (`brain/runtime/memory/`)** powers storage, hydration, checkpoints, salience ranking, and pondering
 
+Current local runtime architecture note:
+- `docs/architecture/local-runtime-2026-03-19.md`
+
 ## Repo layout
 
 - `openclaw.plugin.json`, `index.ts`, `package.json`: OpenClaw plugin package and manifest.

@@ -78,20 +81,24 @@ Optional environment variables:
 - `OCMEMOG_OPENAI_API_BASE` (default: `https://api.openai.com/v1`)
 - `OCMEMOG_OPENAI_EMBED_MODEL` (default: `text-embedding-3-small`)
 - `BRAIN_EMBED_MODEL_LOCAL` (`simple` by default)
-- `BRAIN_EMBED_MODEL_PROVIDER` (`openai` to
+- `BRAIN_EMBED_MODEL_PROVIDER` (`local-openai` to use the local llama.cpp embedding endpoint; `openai` remains available for hosted embeddings)
 - `OCMEMOG_TRANSCRIPT_WATCHER` (`true` to auto-start transcript watcher inside the sidecar)
 - `OCMEMOG_TRANSCRIPT_ROOTS` (comma-separated allowed roots for transcript context retrieval; default: `~/.openclaw/workspace/memory`)
 - `OCMEMOG_API_TOKEN` (optional; if set, requests must include `x-ocmemog-token` or `Authorization: Bearer ...`)
 - `OCMEMOG_AUTO_HYDRATION` (`true` to re-enable prompt-time continuity prepending; defaults to `false` as a safety guard until the host runtime is verified not to persist prepended context into session history)
 - `OCMEMOG_LAPTOP_MODE` (`auto` by default; on macOS battery power this slows watcher polling, reduces ingest batch size, and disables sentiment reinforcement unless explicitly overridden)
-- `
-- `
-- `
-- `
+- `OCMEMOG_LOCAL_LLM_BASE_URL` (default: `http://127.0.0.1:18080/v1`; local OpenAI-compatible text endpoint, e.g. llama.cpp)
+- `OCMEMOG_LOCAL_LLM_MODEL` (default: `qwen2.5-7b-instruct`; matches the active Qwen2.5-7B-Instruct GGUF runtime)
+- `OCMEMOG_LOCAL_EMBED_BASE_URL` (default: `http://127.0.0.1:18081/v1`; local OpenAI-compatible embedding endpoint)
+- `OCMEMOG_LOCAL_EMBED_MODEL` (default: `nomic-embed-text-v1.5`)
+- `OCMEMOG_USE_OLLAMA` (`true` to force legacy Ollama local inference path)
+- `OCMEMOG_OLLAMA_HOST` (default: `http://127.0.0.1:11434`; legacy fallback)
+- `OCMEMOG_OLLAMA_MODEL` (default: `qwen2.5:7b`; legacy fallback for machines that still use Ollama)
+- `OCMEMOG_OLLAMA_EMBED_MODEL` (default: `nomic-embed-text:latest`; legacy embedding fallback)
 - `OCMEMOG_PROMOTION_THRESHOLD` (default: `0.5`)
 - `OCMEMOG_DEMOTION_THRESHOLD` (default: `0.2`)
 - `OCMEMOG_PONDER_ENABLED` (default: `true`)
-- `OCMEMOG_PONDER_MODEL` (default via launcher: `qwen2.5
+- `OCMEMOG_PONDER_MODEL` (default via launcher: `local-openai:qwen2.5-7b-instruct`; recommended for structured local memory refinement)
 - `OCMEMOG_LESSON_MINING_ENABLED` (default: `true`)
 
 ## Security

@@ -129,12 +136,13 @@ This installer will try to:
 - install Python requirements
 - install/enable the OpenClaw plugin when the `openclaw` CLI is available
 - install/load LaunchAgents via `scripts/ocmemog-install.sh`
--
+- verify the local llama.cpp runtime and expected text/embed endpoints
 - validate `/healthz`
 
 Notes:
-- If `OCMEMOG_INSTALL_PREREQS=true` and Homebrew is present, the installer will try to install missing `
--
+- If `OCMEMOG_INSTALL_PREREQS=true` and Homebrew is present, the installer will try to install missing `llama.cpp` and `ffmpeg` automatically.
+- The installer no longer pulls local models. It assumes your llama.cpp text endpoint is on `127.0.0.1:18080` and your embedding endpoint is on `127.0.0.1:18081`.
+- Legacy Ollama compatibility remains available only when you explicitly opt into it with `OCMEMOG_USE_OLLAMA=true`.
 - If package install is unavailable in the local OpenClaw build, the installer falls back to local-path plugin install.
 - Advanced flags are available for local debugging/CI (`--skip-plugin-install`, `--skip-launchagents`, `--skip-model-pulls`, `--endpoint`, `--repo-url`).

@@ -154,7 +162,7 @@ launchctl bootstrap gui/$UID scripts/launchagents/com.openclaw.ocmemog.guard.pli
 
 ## Recent changes
 
-### 0.1.
+### 0.1.6 (current main)
 
 Package ownership + runtime safety release:
 - Publish package under `@simbimbo/memory-ocmemog` instead of the unauthorized `@openclaw` scope
package/brain/runtime/config.py
CHANGED

@@ -9,8 +9,13 @@ OCMEMOG_MEMORY_MODEL = os.environ.get("OCMEMOG_MEMORY_MODEL", "gpt-4o-mini")
 OCMEMOG_OPENAI_API_BASE = os.environ.get("OCMEMOG_OPENAI_API_BASE", "https://api.openai.com/v1")
 OCMEMOG_OPENAI_EMBED_MODEL = os.environ.get("OCMEMOG_OPENAI_EMBED_MODEL", "text-embedding-3-small")
 
+OCMEMOG_LOCAL_LLM_BASE_URL = os.environ.get("OCMEMOG_LOCAL_LLM_BASE_URL", "http://127.0.0.1:18080/v1")
+OCMEMOG_LOCAL_LLM_MODEL = os.environ.get("OCMEMOG_LOCAL_LLM_MODEL", "qwen2.5-7b-instruct")
+OCMEMOG_LOCAL_EMBED_BASE_URL = os.environ.get("OCMEMOG_LOCAL_EMBED_BASE_URL", "http://127.0.0.1:18081/v1")
+OCMEMOG_LOCAL_EMBED_MODEL = os.environ.get("OCMEMOG_LOCAL_EMBED_MODEL", "nomic-embed-text-v1.5")
+
 OCMEMOG_OLLAMA_HOST = os.environ.get("OCMEMOG_OLLAMA_HOST", "http://127.0.0.1:11434")
-OCMEMOG_OLLAMA_MODEL = os.environ.get("OCMEMOG_OLLAMA_MODEL", "
+OCMEMOG_OLLAMA_MODEL = os.environ.get("OCMEMOG_OLLAMA_MODEL", "qwen2.5:7b")
 OCMEMOG_OLLAMA_EMBED_MODEL = os.environ.get("OCMEMOG_OLLAMA_EMBED_MODEL", "nomic-embed-text:latest")
 
 OCMEMOG_PROMOTION_THRESHOLD = float(os.environ.get("OCMEMOG_PROMOTION_THRESHOLD", "0.5"))
package/brain/runtime/inference.py
CHANGED

@@ -11,6 +11,35 @@ from brain.runtime.instrumentation import emit_event
 LOGFILE = state_store.reports_dir() / "brain_memory.log.jsonl"
 
 
+def _infer_openai_compatible(prompt: str, *, base_url: str, model: str, api_key: str | None = None, provider_label: str = "openai-compatible") -> dict[str, str]:
+    url = f"{base_url.rstrip('/')}/chat/completions"
+    payload = {
+        "model": model,
+        "messages": [{"role": "user", "content": prompt}],
+        "temperature": 0.2,
+    }
+    data = json.dumps(payload).encode("utf-8")
+    req = urllib.request.Request(url, data=data, method="POST")
+    if api_key:
+        req.add_header("Authorization", f"Bearer {api_key}")
+    req.add_header("Content-Type", "application/json")
+
+    try:
+        with urllib.request.urlopen(req, timeout=30) as resp:
+            response = json.loads(resp.read().decode("utf-8"))
+    except Exception as exc:
+        emit_event(LOGFILE, "brain_infer_error", status="error", provider=provider_label, error=str(exc))
+        return {"status": "error", "error": f"request_failed:{exc}"}
+
+    try:
+        output = response["choices"][0]["message"]["content"]
+    except Exception as exc:
+        emit_event(LOGFILE, "brain_infer_error", status="error", provider=provider_label, error=str(exc))
+        return {"status": "error", "error": "invalid_response"}
+
+    return {"status": "ok", "output": str(output).strip()}
+
+
 def _infer_ollama(prompt: str, model: str | None = None) -> dict[str, str]:
     payload = {
         "model": model or config.OCMEMOG_OLLAMA_MODEL,

@@ -33,6 +62,21 @@ def _infer_ollama(prompt: str, model: str | None = None) -> dict[str, str]:
     return {"status": "ok", "output": str(output).strip()}
 
 
+def _looks_like_local_openai_model(name: str) -> bool:
+    if not name:
+        return False
+    lowered = name.strip().lower()
+    return lowered.startswith("local-openai:") or lowered.startswith("local_openai:") or lowered.startswith("llamacpp:")
+
+
+def _normalize_local_model_name(name: str) -> str:
+    lowered = (name or "").strip()
+    for prefix in ("local-openai:", "local_openai:", "llamacpp:"):
+        if lowered.lower().startswith(prefix):
+            return lowered[len(prefix):]
+    return lowered
+
+
 def _looks_like_ollama_model(name: str) -> bool:
     if not name:
         return False

@@ -69,41 +113,37 @@ def infer(prompt: str, provider_name: str | None = None) -> dict[str, str]:
 
     use_ollama = os.environ.get("OCMEMOG_USE_OLLAMA", "").lower() in {"1", "true", "yes"}
     model_override = provider_name or config.OCMEMOG_MEMORY_MODEL
+    if _looks_like_local_openai_model(model_override):
+        model = _normalize_local_model_name(model_override) or config.OCMEMOG_LOCAL_LLM_MODEL
+        return _infer_openai_compatible(
+            prompt,
+            base_url=config.OCMEMOG_LOCAL_LLM_BASE_URL,
+            model=model,
+            api_key=os.environ.get("OCMEMOG_LOCAL_LLM_API_KEY") or os.environ.get("LOCAL_LLM_API_KEY"),
+            provider_label="local-openai",
+        )
     if use_ollama or _looks_like_ollama_model(model_override):
         model = model_override.split(":", 1)[-1] if model_override.startswith("ollama:") else model_override
         return _infer_ollama(prompt, model)
 
     api_key = os.environ.get("OCMEMOG_OPENAI_API_KEY") or os.environ.get("OPENAI_API_KEY")
     if not api_key:
-
-
+        return _infer_openai_compatible(
+            prompt,
+            base_url=config.OCMEMOG_LOCAL_LLM_BASE_URL,
+            model=config.OCMEMOG_LOCAL_LLM_MODEL,
+            api_key=os.environ.get("OCMEMOG_LOCAL_LLM_API_KEY") or os.environ.get("LOCAL_LLM_API_KEY"),
+            provider_label="local-openai",
+        )
 
     model = model_override
-
-
-
-
-
-
-    req = urllib.request.Request(url, data=data, method="POST")
-    req.add_header("Authorization", f"Bearer {api_key}")
-    req.add_header("Content-Type", "application/json")
-
-    try:
-        with urllib.request.urlopen(req, timeout=30) as resp:
-            response = json.loads(resp.read().decode("utf-8"))
-    except Exception as exc:
-        emit_event(LOGFILE, "brain_infer_error", status="error", provider="openai", error=str(exc))
-        return {"status": "error", "error": f"request_failed:{exc}"}
-
-    try:
-        output = response["choices"][0]["message"]["content"]
-    except Exception as exc:
-        emit_event(LOGFILE, "brain_infer_error", status="error", provider="openai", error=str(exc))
-        return {"status": "error", "error": "invalid_response"}
-
-    return {"status": "ok", "output": str(output).strip()}
+    return _infer_openai_compatible(
+        prompt,
+        base_url=config.OCMEMOG_OPENAI_API_BASE,
+        model=model,
+        api_key=api_key,
+        provider_label="openai",
+    )
 
 
 def parse_operator_name(text: str) -> dict[str, str] | None:
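The routing added to `infer()` in this version dispatches on a model-name prefix before falling back to Ollama or hosted OpenAI. A minimal standalone sketch of that prefix handling, reimplemented here for illustration (it mirrors the helpers in the diff but is not the package's importable API, and the Ollama branch is simplified to the explicit `ollama:` prefix case):

```python
# Sketch of the prefix-based routing that infer() performs.
# Prefixes "local-openai:", "local_openai:", and "llamacpp:" mark a model
# that should be served by the local OpenAI-compatible endpoint.

LOCAL_PREFIXES = ("local-openai:", "local_openai:", "llamacpp:")


def looks_like_local_openai_model(name: str) -> bool:
    # Case-insensitive prefix check, mirroring _looks_like_local_openai_model.
    return bool(name) and name.strip().lower().startswith(LOCAL_PREFIXES)


def normalize_local_model_name(name: str) -> str:
    # Strip the routing prefix but keep the model id's original casing.
    stripped = (name or "").strip()
    for prefix in LOCAL_PREFIXES:
        if stripped.lower().startswith(prefix):
            return stripped[len(prefix):]
    return stripped


def route(model_override: str) -> tuple[str, str]:
    # Return (provider_label, model) the way infer() would pick them.
    if looks_like_local_openai_model(model_override):
        return "local-openai", normalize_local_model_name(model_override)
    if model_override.startswith("ollama:"):
        return "ollama", model_override.split(":", 1)[-1]
    return "openai", model_override
```

So `route("local-openai:qwen2.5-7b-instruct")` selects the local endpoint with model `qwen2.5-7b-instruct`, while an unprefixed name like `gpt-4o-mini` stays on the hosted path.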
package/brain/runtime/memory/api.py
CHANGED

@@ -316,7 +316,10 @@ def _model_contradiction_hint(left: str, right: str) -> Optional[Dict[str, Any]]
         f"Statement A: {left}\n"
         f"Statement B: {right}\n"
     )
-    result = inference.infer(
+    result = inference.infer(
+        prompt,
+        provider_name=os.environ.get("OCMEMOG_PONDER_MODEL", "local-openai:qwen2.5-7b-instruct"),
+    )
     if result.get("status") != "ok":
         return None
     try:
package/brain/runtime/memory/context_builder.py
CHANGED

@@ -53,7 +53,7 @@ def _groom_queries(prompt: str, limit: int = 3) -> List[str]:
         return []
     if _should_skip_query_grooming(cleaned):
         return _heuristic_queries(cleaned, limit=limit)
-    model = os.environ.get("OCMEMOG_PONDER_MODEL", "qwen2.5
+    model = os.environ.get("OCMEMOG_PONDER_MODEL", "local-openai:qwen2.5-7b-instruct")
     ask = (
         "Rewrite this raw memory request into up to 3 short search queries. "
         "Return strict JSON as {\"queries\":[\"...\"]}. "
package/brain/runtime/memory/distill.py
CHANGED

@@ -43,7 +43,7 @@ def _local_distill_summary(text: str) -> str:
         f"Experience:\n{text}\n\n"
         "Summary:"
     )
-    model = os.environ.get("OCMEMOG_PONDER_MODEL", "qwen2.5
+    model = os.environ.get("OCMEMOG_PONDER_MODEL", "local-openai:qwen2.5-7b-instruct")
     try:
         result = inference.infer(prompt, provider_name=model)
     except Exception:
package/brain/runtime/model_router.py
CHANGED

@@ -17,6 +17,8 @@ def get_provider_for_role(role: str) -> ModelSelection:
     provider = (config.BRAIN_EMBED_MODEL_PROVIDER or "").strip().lower()
     if provider in {"openai", "openai_compatible", "openai-compatible"}:
         return ModelSelection(provider_id="openai", model=config.OCMEMOG_OPENAI_EMBED_MODEL)
+    if provider in {"local-openai", "local_openai", "llamacpp", "llama.cpp"}:
+        return ModelSelection(provider_id="local-openai", model=config.OCMEMOG_LOCAL_EMBED_MODEL)
     if provider in {"ollama", "local-ollama"}:
         return ModelSelection(provider_id="ollama", model=config.OCMEMOG_OLLAMA_EMBED_MODEL)
     return ModelSelection()
package/brain/runtime/providers.py
CHANGED

@@ -14,25 +14,34 @@ class ProviderExecute:
     def execute_embedding_call(self, selection, text: str) -> dict[str, object]:
         provider_id = getattr(selection, "provider_id", "") or ""
         model = getattr(selection, "model", "") or config.OCMEMOG_OPENAI_EMBED_MODEL
-        if provider_id
-            api_key =
-
-
-
+        if provider_id in {"openai", "local-openai"}:
+            api_key = None
+            url_base = config.OCMEMOG_OPENAI_API_BASE
+            provider_label = "openai"
+            if provider_id == "openai":
+                api_key = os.environ.get("OCMEMOG_OPENAI_API_KEY") or os.environ.get("OPENAI_API_KEY")
+                if not api_key:
+                    return {}
+            else:
+                url_base = config.OCMEMOG_LOCAL_EMBED_BASE_URL
+                api_key = os.environ.get("OCMEMOG_LOCAL_EMBED_API_KEY") or os.environ.get("LOCAL_EMBED_API_KEY")
+                provider_label = "local-openai"
+            url = f"{url_base.rstrip('/')}/embeddings"
         payload = json.dumps({"model": model, "input": text}).encode("utf-8")
         req = urllib.request.Request(url, data=payload, method="POST")
-
+        if api_key:
+            req.add_header("Authorization", f"Bearer {api_key}")
         req.add_header("Content-Type", "application/json")
         try:
             with urllib.request.urlopen(req, timeout=20) as resp:
                 data = json.loads(resp.read().decode("utf-8"))
         except Exception as exc:
-            emit_event(LOGFILE, "brain_embedding_provider_error", status="error", provider=
+            emit_event(LOGFILE, "brain_embedding_provider_error", status="error", provider=provider_label, error=str(exc))
             return {}
         try:
             embedding = data["data"][0]["embedding"]
         except Exception as exc:
-            emit_event(LOGFILE, "brain_embedding_provider_error", status="error", provider=
+            emit_event(LOGFILE, "brain_embedding_provider_error", status="error", provider=provider_label, error=str(exc))
             return {}
         return {"embedding": embedding}
@@ -0,0 +1,33 @@
|
|
|
1
|
+
# Local Runtime Architecture — 2026-03-19
|
|
2
|
+
|
|
3
|
+
This repo is now documented and operated with a **llama.cpp-first** local runtime architecture.
|
|
4
|
+
|
|
5
|
+
## Stable loopback-only service split
|
|
6
|
+
|
|
7
|
+
- OpenClaw gateway/dashboard: `127.0.0.1:17890`
|
|
8
|
+
- ocmemog sidecar/dashboard: `127.0.0.1:17891`
|
|
9
|
+
- llama.cpp text inference: `127.0.0.1:18080`
|
|
10
|
+
- llama.cpp embeddings: `127.0.0.1:18081`
|
|
11
|
+
|
|
12
|
+
## Active local models
|
|
13
|
+
|
|
14
|
+
- Text: `Qwen2.5-7B-Instruct-Q4_K_M.gguf`
|
|
15
|
+
- Embeddings: `nomic-embed-text-v1.5.Q4_K_M.gguf`
|
|
16
|
+
|
|
17
|
+
## Configuration direction
|
|
18
|
+
|
|
19
|
+
Primary local envs:
|
|
20
|
+
|
|
21
|
+
- `OCMEMOG_LOCAL_LLM_BASE_URL=http://127.0.0.1:18080/v1`
|
|
22
|
+
- `OCMEMOG_LOCAL_LLM_MODEL=qwen2.5-7b-instruct`
|
|
23
|
+
- `OCMEMOG_LOCAL_EMBED_BASE_URL=http://127.0.0.1:18081/v1`
|
|
24
|
+
- `OCMEMOG_LOCAL_EMBED_MODEL=nomic-embed-text-v1.5`
|
|
25
|
+
|
|
26
|
+
Legacy Ollama knobs may remain in code for compatibility/rollback, but they are **not the primary runtime path**.
|
|
27
|
+
|
|
28
|
+
## Operational notes
|
|
29
|
+
|
|
30
|
+
- The sidecar should remain loopback-only by default.
|
|
31
|
+
- The old plain dashboard lives at `http://127.0.0.1:17891/dashboard`.
|
|
32
|
+
- Memory search and pondering should target the sidecar, not the OpenClaw gateway port.
|
|
33
|
+
- Avoid reusing `17890` for the sidecar; that previously caused a routing collision with the OpenClaw dashboard/gateway.
|
|
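The fixed four-service split in this architecture note can be sanity-checked mechanically. A small sketch (service names and ports taken from the note; the check itself is an illustration, not part of the package) of the uniqueness invariant whose violation caused the earlier sidecar/gateway collision on `17890`:

```python
# Loopback service map from the architecture note. Asserting the ports
# are pairwise distinct is the invariant the 0.1.6 port-separation
# work restored; a duplicate here is exactly the old 17890 collision.
SERVICE_PORTS = {
    "openclaw-gateway": 17890,
    "ocmemog-sidecar": 17891,
    "llamacpp-text": 18080,
    "llamacpp-embed": 18081,
}


def find_port_collisions(service_ports: dict[str, int]) -> list[int]:
    # Count how many services claim each port and return duplicates.
    counts: dict[int, int] = {}
    for port in service_ports.values():
        counts[port] = counts.get(port, 0) + 1
    return sorted(port for port, n in counts.items() if n > 1)
```

A startup script could run this over its effective configuration and refuse to launch when the list is non-empty.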
package/docs/notes/2026-03-18-memory-repair-and-backfill.md
CHANGED

@@ -12,8 +12,8 @@ This pass focused on turning `ocmemog` from a noisy/fragile memory stack into a
 ## Changes landed
 
 ### Embedding and rebuild behavior
-- Fixed the vector reindex entrypoint so it defaults to provider-backed
--
+- Fixed the vector reindex entrypoint so it defaults to provider-backed local embeddings instead of silently rebuilding weak hash/simple vectors.
+- At the time this landed, the provider-backed path used Ollama-hosted `nomic-embed-text:latest`; the current repo default is the llama.cpp embedding endpoint on `127.0.0.1:18081` with `nomic-embed-text-v1.5`.
 - Added a new incremental repair path:
   - `backfill_missing_vectors()` in `brain/runtime/memory/vector_index.py`
   - `scripts/ocmemog-backfill-vectors.py`

@@ -62,7 +62,7 @@ For laptop-friendly backlog burn-down, use staged backfills in roughly this orde
 6. knowledge last
 
 ## Commits from this sweep
-- `f3d3dd9` — fix: default vector reindex to
+- `f3d3dd9` — fix: default vector reindex to provider-backed embeddings
 - `759d23d` — feat: add battery-aware sidecar defaults
 - `4a102eb` — fix: clean memory freshness summaries
 - `9ee7966` — fix: report duplicate promotion counts accurately
package/docs/notes/local-model-role-matrix-2026-03-18.md
CHANGED

@@ -1,8 +1,10 @@
 # Local model role matrix — 2026-03-18
 
+Historical note: this bakeoff was recorded before the local-runtime cutover from Ollama to llama.cpp. Keep the conclusions, but map them onto the current llama.cpp-served GGUF models when using this repo today.
+
 Purpose: document which installed local model is best suited for which `ocmemog` task so background cognition can be smarter without putting heavy/slow models on every path.
 
-Installed local models observed:
+Installed local models observed at the time:
 - `phi3:latest`
 - `qwen2.5:7b`
 - `llama3.1:8b`

@@ -45,6 +47,8 @@ Installed local models observed:
 - richer optional background cognition: `llama3.1:8b`
 
 ## Operational recommendation
--
-- Set `OCMEMOG_PONDER_MODEL=qwen2.5
+- Current llama.cpp-first equivalent for this repo:
+  - Set `OCMEMOG_LOCAL_LLM_MODEL=qwen2.5-7b-instruct` and `OCMEMOG_PONDER_MODEL=local-openai:qwen2.5-7b-instruct` for unresolved-state rewrite, lesson extraction, and cluster recommendation shaping.
+  - Set `OCMEMOG_LOCAL_EMBED_MODEL=nomic-embed-text-v1.5` for embeddings on the `18081` endpoint.
+- If you intentionally keep Ollama on another machine, prefer `OCMEMOG_OLLAMA_MODEL=qwen2.5:7b` instead of `phi3`.
 - Consider `llama3.1:8b` for optional deeper background cognition passes where latency is acceptable.
package/docs/usage.md
CHANGED

@@ -2,10 +2,10 @@
 
 ## Current operating model
 
-ocmemog is a repo-local OpenClaw memory sidecar backed by SQLite. It is not a full brAIn runtime clone. The safe assumption is:
+ocmemog is a repo-local OpenClaw memory sidecar backed by SQLite with llama.cpp-first local inference and embeddings. It is not a full brAIn runtime clone. The safe assumption is:
 
 - search/get over local memory are supported
--
+- provider-backed local embeddings are the primary path
 - several advanced brAIn memory flows are copied in but still degraded by missing runtime dependencies
 
 ## Running the sidecar

@@ -47,8 +47,12 @@ export OCMEMOG_MEMORY_MODEL=gpt-4o-mini
 export OCMEMOG_OPENAI_API_KEY=sk-...
 export OCMEMOG_OPENAI_API_BASE=https://api.openai.com/v1
 export OCMEMOG_OPENAI_EMBED_MODEL=text-embedding-3-small
+export OCMEMOG_LOCAL_LLM_BASE_URL=http://127.0.0.1:18080/v1
+export OCMEMOG_LOCAL_LLM_MODEL=qwen2.5-7b-instruct
+export OCMEMOG_LOCAL_EMBED_BASE_URL=http://127.0.0.1:18081/v1
+export OCMEMOG_LOCAL_EMBED_MODEL=nomic-embed-text-v1.5
 export BRAIN_EMBED_MODEL_LOCAL=simple
-export BRAIN_EMBED_MODEL_PROVIDER=openai
+export BRAIN_EMBED_MODEL_PROVIDER=local-openai
 export OCMEMOG_TRANSCRIPT_DIR=$HOME/.openclaw/workspace/memory/transcripts
 export OCMEMOG_TRANSCRIPT_GLOB=*.log
 export OCMEMOG_TRANSCRIPT_POLL_SECONDS=1

@@ -182,8 +186,8 @@ Notes:
 - `brain/runtime/memory/api.py`
   - It targets missing/legacy tables and columns.
 - Provider-backed embeddings
-  - Available when `BRAIN_EMBED_MODEL_PROVIDER=openai` and
--
+  - Available when `BRAIN_EMBED_MODEL_PROVIDER=local-openai` and the local embedding endpoint is reachable.
+  - Legacy OpenAI-hosted embeddings remain available when `BRAIN_EMBED_MODEL_PROVIDER=openai` and `OCMEMOG_OPENAI_API_KEY` is set.
 - Model-backed distillation
   - Available when `OCMEMOG_OPENAI_API_KEY` is set; otherwise falls back to heuristic distill.
 - Role-prioritized context building
package/ocmemog/sidecar/app.py
CHANGED

@@ -19,7 +19,7 @@ from ocmemog.sidecar.transcript_watcher import watch_forever
 
 DEFAULT_CATEGORIES = ("knowledge", "reflections", "directives", "tasks", "runbooks", "lessons")
 
-app = FastAPI(title="ocmemog sidecar", version="0.1.
+app = FastAPI(title="ocmemog sidecar", version="0.1.8")
 
 API_TOKEN = os.environ.get("OCMEMOG_API_TOKEN")
package/package.json
CHANGED
|
@@ -8,7 +8,9 @@ PLUGIN_PACKAGE="@simbimbo/memory-ocmemog"
|
|
|
8
8
|
PLUGIN_ID="memory-ocmemog"
|
|
9
9
|
ENDPOINT="${OCMEMOG_ENDPOINT:-http://127.0.0.1:17891}"
|
|
10
10
|
TIMEOUT_MS="${OCMEMOG_TIMEOUT_MS:-30000}"
|
|
11
|
-
|
|
11
|
+
DEFAULT_LOCAL_LLM_MODEL="${OCMEMOG_LOCAL_LLM_MODEL:-qwen2.5-7b-instruct}"
|
|
12
|
+
DEFAULT_LOCAL_EMBED_MODEL="${OCMEMOG_LOCAL_EMBED_MODEL:-nomic-embed-text-v1.5}"
|
|
13
|
+
DEFAULT_OLLAMA_MODEL="${OCMEMOG_OLLAMA_MODEL:-qwen2.5:7b}"
|
|
12
14
|
DEFAULT_OLLAMA_EMBED_MODEL="${OCMEMOG_OLLAMA_EMBED_MODEL:-nomic-embed-text:latest}"
|
|
13
15
|
INSTALL_PREREQS="${OCMEMOG_INSTALL_PREREQS:-false}"
|
|
14
16
|
SKIP_PLUGIN_INSTALL="false"
|
|
@@ -27,10 +29,10 @@ Arguments:
|
|
|
27
29
|
|
|
28
30
|
Options:
|
|
29
31
|
--help Show this help text.
|
|
30
|
-
--install-prereqs Auto-install missing
|
|
32
|
+
--install-prereqs Auto-install missing llama.cpp/ffmpeg via Homebrew.
|
|
31
33
|
--skip-plugin-install Skip OpenClaw plugin install/enable.
|
|
32
34
|
--skip-launchagents Skip LaunchAgent install/load.
|
|
33
|
-
--skip-model-pulls Skip local
|
|
35
|
+
--skip-model-pulls Skip local llama.cpp runtime checks.
|
|
34
36
|
--dry-run Print what would happen without making changes.
|
|
35
37
|
--endpoint URL Override sidecar endpoint (default: http://127.0.0.1:17891).
|
|
36
38
|
--timeout-ms N Override plugin timeout summary value (default: 30000).
|
|
@@ -38,8 +40,10 @@ Options:
|
|
|
38
40
|
|
|
39
41
|
Environment:
|
|
40
42
|
OCMEMOG_INSTALL_PREREQS=true Same as --install-prereqs.
|
|
41
|
-
|
|
42
|
-
|
|
43
|
+
OCMEMOG_LOCAL_LLM_MODEL Default local llama.cpp/OpenAI-compatible text model.
|
|
44
|
+
OCMEMOG_LOCAL_EMBED_MODEL Default local llama.cpp/OpenAI-compatible embedding model.
|
|
45
|
+
OCMEMOG_OLLAMA_MODEL Legacy Ollama text model fallback.
|
|
46
|
+
  OCMEMOG_OLLAMA_EMBED_MODEL  Legacy Ollama embedding model fallback.
 EOF
 }

@@ -125,9 +129,9 @@ maybe_install_prereqs() {
     warn "Homebrew not found; cannot auto-install prerequisites"
     return
   fi
-  if ! have
-    log "Installing
-    run_cmd brew install
+  if ! have llama-server; then
+    log "Installing llama.cpp via Homebrew"
+    run_cmd brew install llama.cpp || warn "brew install llama.cpp failed"
   fi
   if ! have ffmpeg; then
     log "Installing ffmpeg via Homebrew"

@@ -206,23 +210,18 @@ install_launchagents() {
   run_cmd "$ROOT_DIR/scripts/ocmemog-install.sh"
 }

-
+ensure_local_runtime() {
   if [[ "$SKIP_MODEL_PULLS" == "true" ]]; then
-    log "Skipping local
+    log "Skipping local llama.cpp runtime checks by request"
     return
   fi
-  if ! have
-    warn "
+  if ! have llama-server; then
+    warn "llama-server not found. Install llama.cpp or provide your own local OpenAI-compatible endpoints."
     return
   fi
-
-
-
-  fi
-  if ! ollama list | rg -q "$(printf '%s' "$DEFAULT_OLLAMA_EMBED_MODEL" | sed 's/:.*$//')"; then
-    log "Pulling local embed model $DEFAULT_OLLAMA_EMBED_MODEL"
-    run_cmd ollama pull "$DEFAULT_OLLAMA_EMBED_MODEL"
-  fi
+  log "Detected llama.cpp runtime via llama-server"
+  log "Expect local text endpoint at http://127.0.0.1:18080/v1 using model $DEFAULT_LOCAL_LLM_MODEL"
+  log "Expect local embed endpoint at http://127.0.0.1:18081/v1 using model $DEFAULT_LOCAL_EMBED_MODEL"
 }

 validate_install() {

@@ -252,12 +251,13 @@ ocmemog install summary
   - repo: $ROOT_DIR
   - endpoint: $ENDPOINT
   - timeoutMs: $TIMEOUT_MS
-  - local model: $
-  - embed model: $
+  - local text model: $DEFAULT_LOCAL_LLM_MODEL
+  - local embed model: $DEFAULT_LOCAL_EMBED_MODEL
+  - legacy Ollama fallback model: $DEFAULT_OLLAMA_MODEL
   - install prereqs automatically: $INSTALL_PREREQS
   - skip plugin install: $SKIP_PLUGIN_INSTALL
   - skip LaunchAgents: $SKIP_LAUNCHAGENTS
-  - skip
+  - skip local runtime checks: $SKIP_MODEL_PULLS
   - dry run: $DRY_RUN

 Next checks:

@@ -272,6 +272,6 @@ maybe_install_prereqs
 ensure_python
 install_plugin
 install_launchagents
-
+ensure_local_runtime
 validate_install
 print_summary
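The `ensure_local_runtime` step above only probes for the `llama-server` binary and announces the two fixed local endpoints; it never pulls models the way the removed Ollama path did. A minimal Python sketch of that same check (illustrative only; the real installer is a shell script, and the endpoint URLs come from the diff above):

```python
import shutil

# Fixed local endpoints announced by the installer (from the diff above).
TEXT_ENDPOINT = "http://127.0.0.1:18080/v1"
EMBED_ENDPOINT = "http://127.0.0.1:18081/v1"


def runtime_status() -> str:
    """Mirror ensure_local_runtime: probe PATH for llama-server, warn or announce."""
    if shutil.which("llama-server") is None:
        # Matches the installer's warn branch: the runtime is optional.
        return "llama-server not found; provide your own OpenAI-compatible endpoints"
    return f"llama.cpp detected; text={TEXT_ENDPOINT} embed={EMBED_ENDPOINT}"


status = runtime_status()
print(status)
```

Either branch is non-fatal, matching the installer's behavior of returning early rather than aborting when the runtime is absent.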
@@ -9,10 +9,12 @@ from pathlib import Path
 REPO_ROOT = Path(__file__).resolve().parents[1]
 sys.path.insert(0, str(REPO_ROOT))

-os.environ.setdefault("OCMEMOG_USE_OLLAMA", "
-os.environ.setdefault("
-os.environ.setdefault("
-os.environ.setdefault("
+os.environ.setdefault("OCMEMOG_USE_OLLAMA", "false")
+os.environ.setdefault("OCMEMOG_LOCAL_LLM_BASE_URL", "http://127.0.0.1:18080/v1")
+os.environ.setdefault("OCMEMOG_LOCAL_LLM_MODEL", "qwen2.5-7b-instruct")
+os.environ.setdefault("OCMEMOG_LOCAL_EMBED_BASE_URL", "http://127.0.0.1:18081/v1")
+os.environ.setdefault("OCMEMOG_LOCAL_EMBED_MODEL", "nomic-embed-text-v1.5")
+os.environ.setdefault("BRAIN_EMBED_MODEL_PROVIDER", "local-openai")
 os.environ.setdefault("BRAIN_EMBED_MODEL_LOCAL", "")
 os.environ.setdefault("OCMEMOG_STATE_DIR", str(REPO_ROOT / ".ocmemog-state"))
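The `os.environ.setdefault` pattern in this hunk means the script's llama.cpp defaults only apply when the caller has not already exported a value, so user overrides always win. A small sketch of that precedence (the helper name `apply_defaults` is hypothetical, not from the package):

```python
# Defaults taken from the setdefault block above.
DEFAULTS = {
    "OCMEMOG_USE_OLLAMA": "false",
    "OCMEMOG_LOCAL_LLM_BASE_URL": "http://127.0.0.1:18080/v1",
    "OCMEMOG_LOCAL_LLM_MODEL": "qwen2.5-7b-instruct",
    "OCMEMOG_LOCAL_EMBED_BASE_URL": "http://127.0.0.1:18081/v1",
    "OCMEMOG_LOCAL_EMBED_MODEL": "nomic-embed-text-v1.5",
}


def apply_defaults(env: dict) -> dict:
    """Fill missing ocmemog keys from DEFAULTS; caller-provided values win."""
    merged = dict(DEFAULTS)
    merged.update(env)  # same precedence as os.environ.setdefault
    return merged


# A user export overrides the default; everything else falls through.
cfg = apply_defaults({"OCMEMOG_LOCAL_LLM_MODEL": "my-local-model"})
print(cfg["OCMEMOG_LOCAL_LLM_MODEL"])    # caller override preserved
print(cfg["OCMEMOG_LOCAL_LLM_BASE_URL"])  # default applied
```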
package/scripts/ocmemog-demo.py
CHANGED

@@ -66,21 +66,13 @@ for plist in "$ROOT_DIR"/scripts/launchagents/com.openclaw.ocmemog.{sidecar,pond
   echo "Loaded $label"
 done

-if ! command -v
-  echo "
-  echo "Then run: ollama pull phi3 && ollama pull nomic-embed-text"
+if ! command -v llama-server >/dev/null 2>&1; then
+  echo "llama.cpp not found. Install with: brew install llama.cpp"
   exit 0
 fi

-
-
-  ollama pull phi3
-fi
-
-if ! ollama list | rg -q "nomic-embed-text"; then
-  echo "Pulling nomic-embed-text..."
-  ollama pull nomic-embed-text
-fi
+echo "Expect local llama.cpp text endpoint at http://127.0.0.1:18080/v1"
+echo "Expect local llama.cpp embed endpoint at http://127.0.0.1:18081/v1"

 if ! command -v ffmpeg >/dev/null 2>&1; then
   echo "ffmpeg not found. Install with: brew install ffmpeg"
@@ -8,10 +8,12 @@ from pathlib import Path
 REPO_ROOT = Path(__file__).resolve().parents[1]
 sys.path.insert(0, str(REPO_ROOT))

-os.environ.setdefault("OCMEMOG_USE_OLLAMA", "
-os.environ.setdefault("
-os.environ.setdefault("
-os.environ.setdefault("
+os.environ.setdefault("OCMEMOG_USE_OLLAMA", "false")
+os.environ.setdefault("OCMEMOG_LOCAL_LLM_BASE_URL", "http://127.0.0.1:18080/v1")
+os.environ.setdefault("OCMEMOG_LOCAL_LLM_MODEL", "qwen2.5-7b-instruct")
+os.environ.setdefault("OCMEMOG_LOCAL_EMBED_BASE_URL", "http://127.0.0.1:18081/v1")
+os.environ.setdefault("OCMEMOG_LOCAL_EMBED_MODEL", "nomic-embed-text-v1.5")
+os.environ.setdefault("BRAIN_EMBED_MODEL_PROVIDER", "local-openai")
 os.environ.setdefault("BRAIN_EMBED_MODEL_LOCAL", "")
 os.environ.setdefault("OCMEMOG_STATE_DIR", str(REPO_ROOT / ".ocmemog-state"))
@@ -31,12 +31,16 @@ if [[ "$LAPTOP_MODE" == "auto" ]]; then
 fi
 export OCMEMOG_LAPTOP_MODE="$LAPTOP_MODE"

-# defaults for local
-export OCMEMOG_USE_OLLAMA="${OCMEMOG_USE_OLLAMA:-
-export
+# defaults for local llama.cpp / OpenAI-compatible inference and embeddings
+export OCMEMOG_USE_OLLAMA="${OCMEMOG_USE_OLLAMA:-false}"
+export OCMEMOG_LOCAL_LLM_BASE_URL="${OCMEMOG_LOCAL_LLM_BASE_URL:-http://127.0.0.1:18080/v1}"
+export OCMEMOG_LOCAL_LLM_MODEL="${OCMEMOG_LOCAL_LLM_MODEL:-qwen2.5-7b-instruct}"
+export OCMEMOG_LOCAL_EMBED_BASE_URL="${OCMEMOG_LOCAL_EMBED_BASE_URL:-http://127.0.0.1:18081/v1}"
+export OCMEMOG_LOCAL_EMBED_MODEL="${OCMEMOG_LOCAL_EMBED_MODEL:-nomic-embed-text-v1.5}"
+export OCMEMOG_OLLAMA_MODEL="${OCMEMOG_OLLAMA_MODEL:-qwen2.5:7b}"
 export OCMEMOG_OLLAMA_EMBED_MODEL="${OCMEMOG_OLLAMA_EMBED_MODEL:-nomic-embed-text:latest}"
-export OCMEMOG_PONDER_MODEL="${OCMEMOG_PONDER_MODEL:-qwen2.5
-export BRAIN_EMBED_MODEL_PROVIDER="${BRAIN_EMBED_MODEL_PROVIDER:-
+export OCMEMOG_PONDER_MODEL="${OCMEMOG_PONDER_MODEL:-local-openai:qwen2.5-7b-instruct}"
+export BRAIN_EMBED_MODEL_PROVIDER="${BRAIN_EMBED_MODEL_PROVIDER:-local-openai}"
 export BRAIN_EMBED_MODEL_LOCAL="${BRAIN_EMBED_MODEL_LOCAL:-}"

 # battery-aware transcript watcher defaults
@@ -153,8 +153,9 @@ def _distill_batches(endpoint: str, target: int, batch_sizes: list[int], timeout

 def _enable_local_embeddings() -> None:
     os.environ.setdefault("BRAIN_EMBED_MODEL_LOCAL", "")
-    os.environ.setdefault("BRAIN_EMBED_MODEL_PROVIDER", "
-    os.environ.setdefault("
+    os.environ.setdefault("BRAIN_EMBED_MODEL_PROVIDER", "local-openai")
+    os.environ.setdefault("OCMEMOG_LOCAL_EMBED_BASE_URL", "http://127.0.0.1:18081/v1")
+    os.environ.setdefault("OCMEMOG_LOCAL_EMBED_MODEL", "nomic-embed-text-v1.5")


 def main() -> int:
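With `_enable_local_embeddings` pointing the `local-openai` provider at the embed port, embedding calls become standard OpenAI-compatible `/embeddings` requests. A sketch of the request an OpenAI-compatible client would send, using the base URL and model from the defaults above (the helper `build_embed_request` is illustrative, not the package's API; no server is contacted here):

```python
import json

# Values from the _enable_local_embeddings defaults above.
EMBED_BASE_URL = "http://127.0.0.1:18081/v1"
EMBED_MODEL = "nomic-embed-text-v1.5"


def build_embed_request(texts: list[str]) -> tuple[str, str]:
    """Return the URL and JSON body for an OpenAI-compatible embeddings call."""
    url = f"{EMBED_BASE_URL}/embeddings"
    body = json.dumps({"model": EMBED_MODEL, "input": texts})
    return url, body


url, body = build_embed_request(["hello world"])
print(url)
```

Because llama.cpp's `llama-server` exposes this same OpenAI-compatible surface, swapping in another local endpoint only requires changing the two environment variables, not the calling code.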