nexo-brain 7.8.1 → 7.9.0

@@ -1,6 +1,6 @@
  {
    "name": "nexo-brain",
-   "version": "7.8.1",
+   "version": "7.9.0",
    "description": "Local cognitive runtime for Claude Code \u2014 persistent memory, overnight learning, doctor diagnostics, personal scripts, recovery-aware jobs, startup preflight, and optional dashboard/power helper.",
    "author": {
      "name": "NEXO Brain",
package/README.md CHANGED
@@ -18,7 +18,11 @@
 
  [Watch the overview video](https://nexo-brain.com/watch/) · [Watch on YouTube](https://www.youtube.com/watch?v=i2lkGhKyVqI) · [Open the infographic](https://nexo-brain.com/assets/nexo-brain-infographic-v5.png)
 
- Version `7.8.1` is the current packaged-runtime line. Patch release that closes the last compaction-continuity gap Francisco flagged after v7.8.0: `pre-compact.sh` Layer 2 emergency auto-diary and Layer 3 `compaction_memory.record_auto_flush` now use the exact `TARGET_SID` resolved from `CLAUDE_SESSION_ID` instead of falling back to `ORDER BY last_update_epoch DESC LIMIT 1` ("latest active session"). In multi-conversation Desktop that fallback routinely wrote the emergency diary against the wrong conversation even though the main restore path was already exact-SID in v7.8.0. `last_diary_ts` is also scoped by `session_id` now. Fail-closed when no `CLAUDE_SESSION_ID` resolves. New behavioural tests drive the real shell script with two sessions in the DB to pin the invariant. Fixed a latent bash-escape bug in `pre-compact.sh` where a double-quoted string inside a Python comment silently closed the `python3 -c "..."` argument early caught by adding the behavioural tests. Pytest 2092 passing (+2 new behavioural). No Desktop bump.
+ Version `7.9.0` is the current packaged-runtime line. Minor release that ships the foundation of the semantic stack (router + reasoner + CLI) under the ONEPASS LLM Coverage plan, plus two product-bug fixes observed in the wild on 2026-04-23. New `src/semantic_router.py` exposes 18 named `decision_kinds` (13 textual + 5 code-aware) with a per-kind policy table and the layer chain `fast_local → semantic_reasoner → remote_fallback`. New `src/semantic_reasoner.py` adds Mode A (`multipass_local`: reuses the mDeBERTa pin with three prompt-perturbed passes + majority vote + 0.75 floor) and Mode B (`cached_llm`: wrapper over `call_model_raw` with a pid+uuid atomic-write 24h-TTL disk cache at `~/.nexo/runtime/operations/semantic-reasoner-cache.json`, SHA-256 keyed by `decision_kind` + normalized input, LRU-bounded at 2000 entries, corrupt entries dropped on read). New `scripts/semantic-classify.py` JSON-in JSON-out CLI lets external MCP clients (including the closed-source NEXO Desktop companion) query Brain as the single semantic authority. New `NEXO_SEMANTIC_REASONER` kill switch (`0`/`off`/`false`/`no`/`disable`/`disabled`) honours the plan mandate for a runtime opt-out separate from `NEXO_LOCAL_CLASSIFIER`. Bug fixes: `bin/nexo-brain.js` upgrade flow now copies `templates/` root the same way fresh install and same-version refresh already did (Maria iMac 7.1.10→7.8.1 upgrade had lost 27 core-prompts templates and broken post-update import verification); and `tool-enforcement-map.json` `nexo_startup.enforcement.inject_prompt` now instructs the model to preload the 13 `mcp__nexo__*` protocol tools via `ToolSearch` before calling `nexo_startup` when the host MCP client defers tool schemas (Claude Code with many MCPs installed). 
Audit-driven hardening: router/reasoner defensively use `getattr` over the `call_model_raw` module and add a trailing `except Exception` so provider errors degrade with `remote_error` instead of propagating; cache writes use pid+uuid tmp + `fsync` + `os.replace` to survive concurrent writers; `NEXO_SEMANTIC_REASONER_TTL` parse tolerates malformed values. Tests: +50 (22 router, 20 reasoner, 8 CLI). Per-site migration of existing callers (`session_end_intent`, `r14`, `r16`, `r17`, `r20`, `r34`, T4 gates, `tools_drive`, `nexo-followup-runner`) is explicitly deferred to follow-up patch releases and tracked as `NF-SEMANTIC-ROUTER-SITE-MIGRATION`; nothing in this release changes the behaviour of the existing callers. Companion coordinated release: NEXO Desktop v0.28.0.
+
+ Previously in `7.8.2`: patch release that fixes the compact-hook observability gap Francisco flagged after v7.8.1: `hook_runs.session_id` was empty for 7 out of 8 recent compaction rows (and when populated it stored the raw Claude Code token instead of the NEXO sid), so per-session queries over `hook_runs` for compact events could not be joined back to the NEXO session that actually compacted. v7.8.2 adds `src/hooks/compact_session_resolver.py` with `resolve_nexo_sid(claude_session_id)`, which walks the same rails the shell already uses: `sessions.claude_session_id` match, then `session_claude_aliases.claude_session_id` (most recent `last_seen` wins), then the per-conversation sidecar under `runtime/data/compacting/<safe-claude-id>.txt`, then the legacy global sidecar for single-conversation setups. `src/hooks/pre_compact.py` and `src/hooks/post_compact.py` now call the resolver and store the real NEXO sid in `hook_runs.session_id`; both wrappers also stash `{claude_session_id, sid_source}` in `hook_runs.metadata` so "why is this row still empty?" has a one-query answer. Nine new tests in `tests/test_hook_runs_compact_sid_resolution.py` pin the five resolver rails (sessions / alias / sidecar / legacy / none), malformed-sidecar rejection, the pre- and post-compact wrapper end-to-end paths, and the empty-state wrapper rail so a clean audit trail is written even when nothing resolves. No Desktop bump.
+
+ Previously in `7.8.1`: patch release that closed the last compaction-continuity gap Francisco flagged after v7.8.0: `pre-compact.sh` Layer 2 emergency auto-diary and Layer 3 `compaction_memory.record_auto_flush` now use the exact `TARGET_SID` resolved from `CLAUDE_SESSION_ID` instead of falling back to `ORDER BY last_update_epoch DESC LIMIT 1` ("latest active session"). In multi-conversation Desktop that fallback routinely wrote the emergency diary against the wrong conversation even though the main restore path was already exact-SID in v7.8.0. `last_diary_ts` is also scoped by `session_id` now. Fail-closed when no `CLAUDE_SESSION_ID` resolves. New behavioural tests drive the real shell script with two sessions in the DB to pin the invariant. Fixed a latent bash-escape bug in `pre-compact.sh` where a double-quoted string inside a Python comment silently closed the `python3 -c "..."` argument early — caught by adding the behavioural tests. Pytest 2092 passing (+2 new behavioural). No Desktop bump.
 
  Previously in `7.8.0`: minor release that closed the PostCompact continuity work Francisco requested after v7.7: `src/hooks/post_compact.py` is a real registered hook (part of the canonical 9-hook set, was 8), `pre-compact.sh` resolves the exact NEXO SID from `CLAUDE_SESSION_ID` instead of falling back to "latest active session" (that was actively wrong in multi-conversation Desktop), the sidecar moves from `/tmp` to `$NEXO_HOME/runtime/data/compacting-sid.txt` so two concurrent compactions on two conversations cannot race on `/tmp`, `post-compact.sh` removes its "latest checkpoint" fallback (fail-closed to a diagnostic systemMessage instead of restoring the wrong conversation), and the hook cross-checks the sidecar SID against the env-resolved one so a "SID mismatch" is logged as such. Pre- and post-compact now emit NDJSON events the engine drains on every periodic tick via `_consume_pending_hook_events()`; the queue file is truncated after read so an event never fires twice. A new contract test (`tests/test_v78_compaction_continuity.py`) pins 11 invariants across ten rails including the hook registration, the exact-SID resolution path, fail-closed behaviour, and that `compaction_count` only increments on real restore. Pytest 2086 passing (+16 vs v7.7). No Desktop bump — v0.27.0 continues to ship.
 
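The 7.9.0 notes in the README hunk describe two mechanisms that are easier to follow in code: the Mode A majority vote with a 0.75 confidence floor, and the Mode B cache key built from `decision_kind` plus normalized input. The sketch below is a simplified, standalone approximation of both, not the shipped implementation. The function names `aggregate_votes` and `cache_key` and the `demo_kind` value are illustrative assumptions, and the real cache key also hashes a truncated context string that is omitted here.

```python
import hashlib
import json
import re
from collections import defaultdict


def aggregate_votes(votes, confidence_floor=0.75):
    """Majority vote over (label, confidence) pairs, one pair per pass.

    Returns (label_or_None, average_confidence, reason).
    """
    by_label = defaultdict(list)
    for label, confidence in votes:
        if label:  # an empty label marks a pass that failed
            by_label[label].append(confidence)
    if not by_label:
        return None, 0.0, "all_passes_failed"
    # Most votes wins; ties break on the highest single-pass confidence.
    best = max(by_label, key=lambda lbl: (len(by_label[lbl]), max(by_label[lbl])))
    avg = sum(by_label[best]) / len(by_label[best])
    if len(by_label[best]) < 2:
        return None, avg, "no_majority"
    if avg < confidence_floor:
        return None, avg, "below_threshold"
    return best, avg, "ok"


def cache_key(decision_kind, question, labels=()):
    """SHA-256 over the decision kind and whitespace/case-normalized input."""
    normalized = re.sub(r"\s+", " ", question.strip().lower())
    payload = json.dumps(
        {"kind": decision_kind, "q": normalized, "labels": list(labels)},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Two of three passes agree and their average clears the floor.
print(aggregate_votes([("yes", 0.9), ("yes", 0.8), ("no", 0.95)]))
# Prompts that differ only in spacing and case share one cache entry.
print(cache_key("demo_kind", "  Hello   World ") == cache_key("demo_kind", "hello world"))
```

Averaging only the confidences of the winning label means a single very confident dissenting pass cannot lift a weak majority over the floor.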
package/bin/nexo-brain.js CHANGED
@@ -2127,6 +2127,18 @@ async function main() {
  writeRuntimeCoreArtifactsManifest(NEXO_HOME, srcDir);
  log(" Scripts updated.");

+ // Update templates/ root (core-prompts/, CLAUDE.md.template, etc.) — recursive
+ // Managed surface: copyDirRec overwrites without diffing, so any
+ // hand-edited template under ~/.nexo/templates/ is replaced on
+ // upgrade. Keep local forks under personal/ or outside the runtime
+ // home to avoid silent loss.
+ const migTemplatesSrc = path.join(__dirname, "..", "templates");
+ const migTemplatesDest = path.join(NEXO_HOME, "templates");
+ if (fs.existsSync(migTemplatesSrc)) {
+   copyDirRec(migTemplatesSrc, migTemplatesDest);
+   log(" Templates updated (user-edited templates/ files are overwritten).");
+ }
+
  // Register ALL 8 core hooks in settings.json (additive — don't remove user's custom hooks)
  let settings = {};
  if (fs.existsSync(CLAUDE_SETTINGS)) {
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
    "name": "nexo-brain",
-   "version": "7.8.1",
+   "version": "7.9.0",
    "mcpName": "io.github.wazionapps/nexo",
    "description": "NEXO Brain \u2014 Shared brain for AI agents. Persistent memory, semantic RAG, natural forgetting, metacognitive guard, trust scoring, 150+ MCP tools. Works with Claude Code, Codex, Claude Desktop & any MCP client. 100% local, free.",
    "homepage": "https://nexo-brain.com",
@@ -0,0 +1,135 @@
+ """Resolve the NEXO sid for compaction hook observability.
+
+ `hook_runs.session_id` must hold the NEXO sid (`nexo-NNNNNN-N`) so that a
+ query like "every compaction of session X" works without joining on the
+ raw Claude Code token. Pre-v7.8.2 the two Python wrappers stored
+ `os.environ.get("CLAUDE_SESSION_ID", "")` directly, which produced two
+ problems at once: rows with `session_id=''` when the env was missing,
+ and rows with the raw Claude token (not a NEXO sid) when it was
+ present. This helper centralises the resolution against the same rails
+ the shell scripts use.
+
+ Resolution order:
+ 1. ENV `CLAUDE_SESSION_ID` with `sessions.claude_session_id` match.
+ 2. ENV `CLAUDE_SESSION_ID` with `session_claude_aliases.claude_session_id`
+    match (most recent `last_seen` wins).
+ 3. Per-conversation sidecar written by pre-compact.sh at
+    the compacting folder under the runtime data dir.
+ 4. Legacy global sidecar at compacting-sid.txt (single-conv legacy path).
+
+ Returns (nexo_sid, source) so the caller can stash `source` in the
+ `hook_runs.metadata` JSON for debugging "why is this row still empty".
+ """
+ from __future__ import annotations
+
+ import os
+ import re
+ import sqlite3
+ from pathlib import Path
+
+ _NEXO_SID_RE = re.compile(r"^nexo-[0-9]+-[0-9]+$")
+ _SAFE_CLAUDE_ID_RE = re.compile(r"[^a-zA-Z0-9._-]")
+
+
+ def _nexo_home() -> Path:
+     return Path(os.environ.get("NEXO_HOME", str(Path.home() / ".nexo")))
+
+
+ def _candidate_data_dirs() -> list[Path]:
+     home = _nexo_home()
+     dirs: list[Path] = []
+     for cand in (home / "runtime" / "data", home / "data"):
+         if cand not in dirs:
+             dirs.append(cand)
+     return dirs
+
+
+ def _db_path() -> Path | None:
+     for d in _candidate_data_dirs():
+         p = d / "nexo.db"
+         if p.is_file():
+             return p
+     return None
+
+
+ def _safe_claude_id(claude_session_id: str) -> str:
+     return _SAFE_CLAUDE_ID_RE.sub("_", claude_session_id or "")
+
+
+ def _read_sidecar(path: Path) -> str:
+     try:
+         text = path.read_text(encoding="utf-8").strip()
+     except Exception:
+         return ""
+     return text if _NEXO_SID_RE.match(text) else ""
+
+
+ def _db_lookup(claude_session_id: str) -> tuple[str, str]:
+     if not claude_session_id:
+         return "", ""
+     db = _db_path()
+     if db is None:
+         return "", ""
+     try:
+         conn = sqlite3.connect(str(db), timeout=3)
+     except Exception:
+         return "", ""
+     try:
+         try:
+             row = conn.execute(
+                 "SELECT sid FROM sessions WHERE claude_session_id = ? LIMIT 1",
+                 (claude_session_id,),
+             ).fetchone()
+         except Exception:
+             row = None
+         if row and row[0] and _NEXO_SID_RE.match(row[0]):
+             return row[0], "sessions"
+         try:
+             row = conn.execute(
+                 "SELECT sid FROM session_claude_aliases "
+                 "WHERE claude_session_id = ? "
+                 "ORDER BY last_seen DESC LIMIT 1",
+                 (claude_session_id,),
+             ).fetchone()
+         except Exception:
+             row = None
+         if row and row[0] and _NEXO_SID_RE.match(row[0]):
+             return row[0], "alias"
+     finally:
+         try:
+             conn.close()
+         except Exception:
+             pass
+     return "", ""
+
+
+ def resolve_nexo_sid(claude_session_id: str = "") -> tuple[str, str]:
+     """Resolve the NEXO sid for the current compaction invocation.
+
+     Returns ``(nexo_sid, source)`` where ``source`` is one of:
+
+     - ``sessions`` resolved via sessions table claude_session_id.
+     - ``alias`` resolved via session_claude_aliases.
+     - ``sidecar`` per-conversation sidecar file.
+     - ``sidecar_legacy`` legacy global sidecar (single-conv path).
+     - ``none`` no rail matched; caller stores empty string.
+     """
+     token = (claude_session_id or os.environ.get("CLAUDE_SESSION_ID", "") or "").strip()
+
+     if token:
+         sid, source = _db_lookup(token)
+         if sid:
+             return sid, source
+         safe_id = _safe_claude_id(token)
+         if safe_id:
+             for base in _candidate_data_dirs():
+                 side = _read_sidecar(base / "compacting" / f"{safe_id}.txt")
+                 if side:
+                     return side, "sidecar"
+
+     for base in _candidate_data_dirs():
+         side = _read_sidecar(base / "compacting-sid.txt")
+         if side:
+             return side, "sidecar_legacy"
+
+     return "", "none"
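The sidecar rails of the resolver above can be exercised without a database. The following is a minimal sketch that copies the two regular expressions from the new file and replays only the per-conversation and legacy sidecar steps against a temporary directory. `resolve_from_sidecars`, the `conv/a` and `conv-b` tokens, and the sid `nexo-123456-1` are made-up illustrations, not part of the package.

```python
import re
import tempfile
from pathlib import Path

NEXO_SID_RE = re.compile(r"^nexo-[0-9]+-[0-9]+$")
SAFE_CLAUDE_ID_RE = re.compile(r"[^a-zA-Z0-9._-]")


def safe_claude_id(claude_session_id):
    """Replace every character outside [a-zA-Z0-9._-] with an underscore."""
    return SAFE_CLAUDE_ID_RE.sub("_", claude_session_id or "")


def read_sidecar(path):
    """Return the sidecar content only if it looks like a NEXO sid."""
    try:
        text = Path(path).read_text(encoding="utf-8").strip()
    except (OSError, UnicodeDecodeError):
        return ""
    return text if NEXO_SID_RE.match(text) else ""


def resolve_from_sidecars(data_dir, claude_session_id):
    """Hypothetical helper: per-conversation sidecar first, then legacy."""
    data_dir = Path(data_dir)
    safe_id = safe_claude_id(claude_session_id)
    if safe_id:
        sid = read_sidecar(data_dir / "compacting" / f"{safe_id}.txt")
        if sid:
            return sid, "sidecar"
    sid = read_sidecar(data_dir / "compacting-sid.txt")
    if sid:
        return sid, "sidecar_legacy"
    return "", "none"


with tempfile.TemporaryDirectory() as tmp:
    data_dir = Path(tmp)
    (data_dir / "compacting").mkdir()
    # "conv/a" sanitises to "conv_a", so this file is its sidecar.
    (data_dir / "compacting" / "conv_a.txt").write_text("nexo-123456-1\n", encoding="utf-8")
    # A malformed legacy sidecar is rejected by the sid pattern.
    (data_dir / "compacting-sid.txt").write_text("not-a-sid", encoding="utf-8")
    hit = resolve_from_sidecars(data_dir, "conv/a")
    miss = resolve_from_sidecars(data_dir, "conv-b")

print(hit)
print(miss)
```

Because slashes are rewritten before the filename is built, a Claude token cannot escape the `compacting` folder, and a sidecar with unexpected content resolves to nothing instead of a wrong sid.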
@@ -22,15 +22,30 @@ from pathlib import Path
  _DIR = Path(__file__).resolve().parent


- def _record(duration_ms: int, exit_code: int, session_id: str) -> None:
+ def _record(duration_ms: int, exit_code: int, claude_session_id: str) -> None:
+     """Log a hook_runs row with the resolved NEXO sid.
+
+     v7.8.2 — see the matching docstring in pre_compact.py. Post-compact
+     runs after the shell script has already consumed the per-conv
+     sidecar, but the DB rails (sessions/aliases) stay valid, so the
+     resolver still returns a sid in the common case. `sid_source` goes
+     into metadata for empty-row triage.
+     """
      try:
          sys.path.insert(0, str(_DIR.parent))
+         sys.path.insert(0, str(_DIR))
          import hook_observability # type: ignore
+         from compact_session_resolver import resolve_nexo_sid # type: ignore
+         nexo_sid, sid_source = resolve_nexo_sid(claude_session_id)
          hook_observability.record_hook_run(
              "post_compact",
              duration_ms=duration_ms,
              exit_code=exit_code,
-             session_id=session_id,
+             session_id=nexo_sid,
+             metadata={
+                 "claude_session_id": claude_session_id,
+                 "sid_source": sid_source,
+             },
          )
      except Exception:
          pass
@@ -18,14 +18,33 @@ _DIR = Path(__file__).resolve().parent


  def _record(duration_ms: int, exit_code: int) -> None:
+     """Log a hook_runs row with the resolved NEXO sid.
+
+     v7.8.2 — the raw `CLAUDE_SESSION_ID` env token is not a NEXO sid, so
+     storing it in `hook_runs.session_id` made per-session queries useless
+     and left the column empty whenever Claude Code did not forward the
+     env. `compact_session_resolver.resolve_nexo_sid` walks the same
+     rails the shell script uses (sessions → aliases → per-conv sidecar
+     → legacy global sidecar) and returns `(nexo_sid, source)`. The raw
+     Claude token and the resolution source end up in `metadata` so an
+     operator can debug why a given row is still empty.
+     """
      try:
          sys.path.insert(0, str(_DIR.parent))
+         sys.path.insert(0, str(_DIR))
          import hook_observability # type: ignore
+         from compact_session_resolver import resolve_nexo_sid # type: ignore
+         claude_id = os.environ.get("CLAUDE_SESSION_ID", "")
+         nexo_sid, sid_source = resolve_nexo_sid(claude_id)
          hook_observability.record_hook_run(
              "pre_compact",
              duration_ms=duration_ms,
              exit_code=exit_code,
-             session_id=os.environ.get("CLAUDE_SESSION_ID", ""),
+             session_id=nexo_sid,
+             metadata={
+                 "claude_session_id": claude_id,
+                 "sid_source": sid_source,
+             },
          )
      except Exception:
          pass
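With both wrappers storing the resolved sid plus `{claude_session_id, sid_source}` metadata, the two triage questions from the release notes reduce to simple lookups. The sketch below illustrates this against an in-memory SQLite table. The table layout is a hypothetical reduction (the real `hook_runs` schema is not shown in this diff), and the column name `hook`, the tokens, and the sids are invented for the example.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, reduced schema; the real hook_runs table has more columns.
conn.execute("CREATE TABLE hook_runs (hook TEXT, session_id TEXT, metadata TEXT)")
rows = [
    ("pre_compact", "nexo-123456-1", {"claude_session_id": "tok-a", "sid_source": "sessions"}),
    ("post_compact", "nexo-123456-1", {"claude_session_id": "tok-a", "sid_source": "alias"}),
    ("pre_compact", "", {"claude_session_id": "", "sid_source": "none"}),
]
conn.executemany(
    "INSERT INTO hook_runs VALUES (?, ?, ?)",
    [(hook, sid, json.dumps(meta)) for hook, sid, meta in rows],
)


def compactions_for(sid):
    """Every compaction hook run recorded for one NEXO session."""
    cur = conn.execute(
        "SELECT hook FROM hook_runs WHERE session_id = ? ORDER BY rowid", (sid,)
    )
    return [row[0] for row in cur]


def unresolved_sources():
    """Why rows are still empty: the sid_source stored next to each one."""
    cur = conn.execute("SELECT metadata FROM hook_runs WHERE session_id = ''")
    return [json.loads(meta)["sid_source"] for (meta,) in cur]


print(compactions_for("nexo-123456-1"))
print(unresolved_sources())
```

The metadata is decoded in Python here so the example does not depend on SQLite's JSON functions being compiled in.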
@@ -0,0 +1,581 @@
+ """semantic_reasoner — second-layer semantic decision maker.
+
+ Plan ONEPASS LLM Coverage. Called through ``src/semantic_router.py``.
+ Exposes a single ``reason()`` entrypoint with two modes:
+
+ Mode A — ``multipass_local`` (textual decision kinds)
+
+     Reuses the already-pinned ``LocalZeroShotClassifier`` (see
+     ``docs/classifier-model-notes.md``) but with stricter behaviour:
+     three inference passes with mild prompt perturbations, then
+     majority vote across passes. A decision is only accepted if at
+     least two of three passes agree AND the agreed confidence is
+     above the stricter threshold. This kills single-pass false
+     positives without adding a new model dependency.
+
+ Mode B — ``cached_llm`` (code-aware decision kinds)
+
+     Thin wrapper around ``call_model_raw`` with a disk cache scoped
+     by (decision_kind, sha256(normalized_prompt)). TTL = 24h. The
+     cache lives under ``~/.nexo/runtime/operations/semantic-reasoner-cache.json``
+     alongside the existing classifier install state. Cache hits
+     return instantly and are flagged in ``meta.cache_hit``. Misses
+     call the LLM; the response and its normalized verdict are
+     written back to the cache atomically.
+
+ Pin notes: this module does not introduce a new downloaded model.
+ Mode A reuses ``MODEL_ID``/``MODEL_REVISION`` from ``classifier_local``.
+ Mode B resolves the LLM through the standard resonance map with
+ ``caller='semantic_reasoner'`` and ``tier='muy_bajo'``; the pin lives
+ in ``resonance_map`` like every other LLM caller.
+
+ See ``docs/semantic-reasoner-model-notes.md`` for the rationale behind
+ this "upgrade-in-place, pin-by-reuse" strategy, and why a dedicated
+ stronger local LLM (Llama 3.1 8B, etc.) is explicitly deferred to a
+ future release.
+ """
+ from __future__ import annotations
+
+ import hashlib
+ import json
+ import logging
+ import os
+ import re
+ import time
+ from dataclasses import dataclass, field
+ from pathlib import Path
+ from typing import Any
+
+ _logger = logging.getLogger(__name__)
+
+
+ # ---------------------------------------------------------------------------
+ # Shared dataclass imported from the router
+ # ---------------------------------------------------------------------------
+
+
+ def _import_router_result():
+     """Lazy import to avoid circular dependency on semantic_router."""
+     from semantic_router import RouterResult
+
+     return RouterResult
+
+
+ # ---------------------------------------------------------------------------
+ # Mode A — multi-pass local
+ # ---------------------------------------------------------------------------
+
+
+ _PROMPT_PERTURBATIONS: tuple[str, ...] = (
+     "{q}",
+     "Decide: {q}",
+     "Classify this utterance: {q}",
+ )
+
+
+ def _collect_local_votes(
+     question: str, labels: tuple[str, ...]
+ ) -> list[tuple[str, float, dict[str, float]]]:
+     """Run the local classifier three times with mild prompt variations.
+
+     Returns a list of ``(label, confidence, scores)`` triples. Any
+     pass that fails silently returns a zero-confidence entry so the
+     vote aggregator can still detect quorum problems.
+     """
+     try:
+         from classifier_local import LocalZeroShotClassifier
+     except Exception as exc: # pragma: no cover
+         _logger.debug("semantic_reasoner: classifier_local unavailable (%s)", exc)
+         return []
+
+     clf = LocalZeroShotClassifier(confidence_floor=0.0)
+     votes: list[tuple[str, float, dict[str, float]]] = []
+     for template in _PROMPT_PERTURBATIONS:
+         prompt = template.format(q=question)
+         result = clf.classify(prompt, labels)
+         if result is None:
+             votes.append(("", 0.0, {}))
+             continue
+         votes.append((result.label, float(result.confidence), dict(result.scores)))
+     return votes
+
+
+ def _aggregate_votes(
+     votes: list[tuple[str, float, dict[str, float]]],
+     confidence_floor: float,
+ ) -> tuple[str | None, float, dict[str, Any]]:
+     """Majority vote across passes. Returns (label_or_none, confidence, meta)."""
+     if not votes:
+         return None, 0.0, {"reason": "no_votes"}
+
+     counts: dict[str, int] = {}
+     confidences: dict[str, list[float]] = {}
+     for label, confidence, _scores in votes:
+         if not label:
+             continue
+         counts[label] = counts.get(label, 0) + 1
+         confidences.setdefault(label, []).append(confidence)
+
+     if not counts:
+         return None, 0.0, {"reason": "all_passes_failed", "votes": len(votes)}
+
+     best_label = max(counts, key=lambda lbl: (counts[lbl], max(confidences[lbl])))
+     vote_count = counts[best_label]
+     avg_confidence = sum(confidences[best_label]) / len(confidences[best_label])
+
+     meta: dict[str, Any] = {
+         "votes_total": len(votes),
+         "votes_for_best": vote_count,
+         "avg_confidence": round(avg_confidence, 4),
+         "per_label_counts": dict(counts),
+     }
+
+     if vote_count < 2:
+         meta["reason"] = "no_majority"
+         return None, avg_confidence, meta
+     if avg_confidence < confidence_floor:
+         meta["reason"] = "below_threshold"
+         return None, avg_confidence, meta
+     return best_label, avg_confidence, meta
+
+
+ def _reason_multipass_local(
+     *,
+     decision_kind: str,
+     question: str,
+     labels: tuple[str, ...] | None,
+     confidence_floor: float,
+ ):
+     RouterResult = _import_router_result()
+     if not labels:
+         return RouterResult(
+             ok=False,
+             decision_kind=decision_kind,
+             route_used="semantic_reasoner",
+             degraded=True,
+             error="multipass_local requires labels",
+         )
+
+     votes = _collect_local_votes(question, labels)
+     label, confidence, meta = _aggregate_votes(votes, confidence_floor)
+     if label is None:
+         return RouterResult(
+             ok=False,
+             decision_kind=decision_kind,
+             route_used="semantic_reasoner",
+             degraded=True,
+             error=meta.get("reason", "aggregation_failed"),
+             meta={"mode": "multipass_local", "aggregate": meta},
+         )
+     return RouterResult(
+         ok=True,
+         decision_kind=decision_kind,
+         verdict=label,
+         label=label,
+         confidence=round(float(confidence), 4),
+         route_used="semantic_reasoner",
+         degraded=False,
+         meta={"mode": "multipass_local", "aggregate": meta},
+     )
+
+
+ # ---------------------------------------------------------------------------
+ # Mode B — cached LLM
+ # ---------------------------------------------------------------------------
+
+
+ _DEFAULT_CACHE_TTL_SECONDS = 24 * 3600
+
+
+ def _cache_path() -> Path:
+     """Resolve the on-disk cache location.
+
+     Reuses ``paths.operations_dir()`` so the reasoner state lives next to
+     the existing ``classifier-install-state.json``. If ``paths`` is not
+     importable (heavy module; test context), fall back to a deterministic
+     location under ``NEXO_HOME``.
+     """
+     override = os.environ.get("NEXO_SEMANTIC_REASONER_CACHE_PATH", "").strip()
+     if override:
+         return Path(override)
+     try:
+         import paths
+
+         return paths.operations_dir() / "semantic-reasoner-cache.json"
+     except Exception:
+         home = os.environ.get("NEXO_HOME", "").strip()
+         root = Path(home) if home else Path.home() / ".nexo"
+         return root / "runtime" / "operations" / "semantic-reasoner-cache.json"
+
+
+ def _normalize_for_hash(text: str) -> str:
+     """Normalise whitespace/case so equivalent prompts hit the same cache
+     entry. Does not touch content semantics beyond whitespace collapse."""
+     return re.sub(r"\s+", " ", (text or "").strip().lower())
+
+
+ def _cache_key(
+     *,
+     decision_kind: str,
+     question: str,
+     labels: tuple[str, ...] | None,
+     context: str,
+ ) -> str:
+     payload = json.dumps(
+         {
+             "kind": decision_kind,
+             "q": _normalize_for_hash(question),
+             "ctx": _normalize_for_hash(context)[:400],
+             "labels": list(labels) if labels else [],
+         },
+         sort_keys=True,
+         ensure_ascii=False,
+     )
+     return hashlib.sha256(payload.encode("utf-8")).hexdigest()
+
+
+ def _read_cache() -> dict[str, Any]:
+     try:
+         path = _cache_path()
+         if not path.is_file():
+             return {}
+         data = json.loads(path.read_text() or "{}")
+         if isinstance(data, dict):
+             return data
+     except Exception as exc: # pragma: no cover — corrupt cache
+         _logger.warning("semantic_reasoner: cache read failed (%s); starting fresh", exc)
+     return {}
+
+
+ def _write_cache(cache: dict[str, Any]) -> None:
+     """Atomic write with pid+uuid suffix so concurrent Brain / Desktop CLI
+     writers do not stomp each other's temp file."""
+     try:
+         path = _cache_path()
+         path.parent.mkdir(parents=True, exist_ok=True)
+         import os as _os
+         import uuid as _uuid
+
+         tmp = path.with_name(
+             f"{path.name}.tmp.{_os.getpid()}.{_uuid.uuid4().hex[:8]}"
+         )
+         payload = json.dumps(cache, ensure_ascii=False, sort_keys=True)
+         with open(tmp, "w", encoding="utf-8") as handle:
+             handle.write(payload)
+             handle.flush()
+             try:
+                 _os.fsync(handle.fileno())
+             except OSError:
+                 pass
+         _os.replace(tmp, path)
+     except Exception as exc: # pragma: no cover
+         _logger.warning("semantic_reasoner: cache write failed (%s)", exc)
+
+
+ def _cache_get(key: str, ttl_seconds: int) -> dict[str, Any] | None:
+     cache = _read_cache()
+     entry = cache.get(key)
+     if not isinstance(entry, dict):
+         return None
+     ts = float(entry.get("ts", 0.0) or 0.0)
+     if ts <= 0.0:
+         return None
+     if (time.time() - ts) > ttl_seconds:
+         return None
+     return entry
+
+
+ def _cache_put(key: str, entry: dict[str, Any]) -> None:
+     cache = _read_cache()
+     cache[key] = {**entry, "ts": time.time()}
+     if len(cache) > 2000:
+         # Keep the 1800 most-recent entries to avoid unbounded growth. The
+         # bound is advisory; callers should keep reasoner prompts small.
+         items = sorted(cache.items(), key=lambda kv: float(kv[1].get("ts", 0.0) or 0.0))
+         cache = dict(items[-1800:])
+     _write_cache(cache)
+
+
+ def _parse_ttl_env() -> int:
+     """Read ``NEXO_SEMANTIC_REASONER_TTL`` defensively.
+
+     Malformed values (non-integer, negative) fall back to the default so
+     operator typos never crash the reasoner on first call.
+     """
+     raw = os.environ.get("NEXO_SEMANTIC_REASONER_TTL", "")
+     if not raw:
+         return _DEFAULT_CACHE_TTL_SECONDS
+     try:
+         parsed = int(raw)
+     except (TypeError, ValueError):
+         _logger.warning(
+             "semantic_reasoner: invalid NEXO_SEMANTIC_REASONER_TTL=%r; "
+             "using default %d",
+             raw,
+             _DEFAULT_CACHE_TTL_SECONDS,
+         )
+         return _DEFAULT_CACHE_TTL_SECONDS
+     if parsed <= 0:
+         return _DEFAULT_CACHE_TTL_SECONDS
+     return parsed
+
+
+ def _reason_cached_llm(
+     *,
+     decision_kind: str,
+     question: str,
+     labels: tuple[str, ...] | None,
+     context: str,
+     confidence_floor: float,
+ ):
+     RouterResult = _import_router_result()
+     ttl = _parse_ttl_env()
+     key = _cache_key(
+         decision_kind=decision_kind,
+         question=question,
+         labels=labels,
+         context=context,
+     )
+
+     cached = _cache_get(key, ttl)
+     if cached is not None:
+         cached_verdict = cached.get("verdict")
+         if isinstance(cached_verdict, str) and cached_verdict.strip():
+             return RouterResult(
+                 ok=True,
+                 decision_kind=decision_kind,
+                 verdict=cached_verdict,
+                 label=cached_verdict,
+                 confidence=float(cached.get("confidence", 0.6)),
+                 route_used="semantic_reasoner",
+                 degraded=False,
+                 meta={
+                     "mode": "cached_llm",
+                     "cache_hit": True,
+                     "cache_key": key[:12],
+                 },
+             )
+         # Corrupt entry (verdict missing or non-string). Drop it and fall
+         # through to a live call so the caller is never handed a cached
+         # "ok=True, verdict=None" sentinel.
+         _logger.warning(
+             "semantic_reasoner: dropping corrupt cache entry for key=%s",
+             key[:12],
+         )
+
+     try:
+         import call_model_raw as _cmr
+     except Exception as exc: # pragma: no cover
+         return RouterResult(
+             ok=False,
+             decision_kind=decision_kind,
+             route_used="semantic_reasoner",
+             degraded=True,
+             error=f"call_model_raw unavailable: {exc}",
+             meta={"mode": "cached_llm", "cache_hit": False},
+         )
+
+     call_model_raw_fn = getattr(_cmr, "call_model_raw", None)
+     classifier_unavailable_cls = getattr(
+         _cmr, "ClassifierUnavailableError", Exception
+     )
+     if call_model_raw_fn is None:
+         return RouterResult(
+             ok=False,
+             decision_kind=decision_kind,
+             route_used="semantic_reasoner",
+             degraded=True,
+             error="call_model_raw callable missing",
+             meta={"mode": "cached_llm", "cache_hit": False},
+         )
+
+     prompt = _build_reasoner_prompt(
+         decision_kind=decision_kind,
+         question=question,
+         labels=labels,
+         context=context,
+     )
+     system = (
+         "You are NEXO's code-aware semantic reasoner. Answer with the "
+         "single best label from the provided list (no prose). If no "
+         "label fits, answer 'unknown'."
+     )
+     try:
+         raw = call_model_raw_fn(
+             prompt,
+             system=system,
+             caller="semantic_reasoner",
+             tier="muy_bajo",
+             max_tokens=32,
+             temperature=0.0,
+         )
+     except classifier_unavailable_cls as exc:
+         return RouterResult(
+             ok=False,
+             decision_kind=decision_kind,
+             route_used="semantic_reasoner",
+             degraded=True,
+             error=f"remote_unavailable: {exc}",
+             meta={"mode": "cached_llm", "cache_hit": False},
+         )
+     except Exception as exc: # noqa: BLE001 — fail-closed
+         return RouterResult(
+             ok=False,
+             decision_kind=decision_kind,
+             route_used="semantic_reasoner",
+             degraded=True,
+             error=f"remote_error: {exc}",
+             meta={"mode": "cached_llm", "cache_hit": False},
+         )
+
+     verdict = _normalize_verdict(raw, labels)
+     if verdict is None:
+         return RouterResult(
+             ok=False,
+             decision_kind=decision_kind,
+             route_used="semantic_reasoner",
+             degraded=True,
+             error="llm_returned_unknown_or_unparseable",
+             meta={
+                 "mode": "cached_llm",
+                 "cache_hit": False,
+                 "raw": (raw or "")[:80],
+             },
+         )
+
+     _cache_put(
+         key,
+         {
+             "verdict": verdict,
+             "confidence": max(confidence_floor, 0.6),
+             "decision_kind": decision_kind,
+         },
+     )
+
+     return RouterResult(
+         ok=True,
+         decision_kind=decision_kind,
+         verdict=verdict,
+         label=verdict,
+         confidence=max(confidence_floor, 0.6),
+         route_used="semantic_reasoner",
+         degraded=False,
+         meta={"mode": "cached_llm", "cache_hit": False, "cache_key": key[:12]},
+     )
+
+
467
+ def _build_reasoner_prompt(
468
+ *,
469
+ decision_kind: str,
470
+ question: str,
471
+ labels: tuple[str, ...] | None,
472
+ context: str,
473
+ ) -> str:
474
+ parts = [
475
+ f"decision_kind: {decision_kind}",
476
+ f"question: {question}",
477
+ ]
478
+ if context:
479
+ parts.append(f"context: {context[:600]}")
480
+ if labels:
481
+ parts.append("candidate_labels: " + ", ".join(labels))
482
+ parts.append("Reply with exactly one of the labels above.")
483
+ else:
484
+ parts.append("Reply with the shortest phrase answering the question.")
485
+ return "\n".join(parts)
486
+
487
+
488
+ def _normalize_verdict(
489
+ raw: str, labels: tuple[str, ...] | None
490
+ ) -> str | None:
491
+ text = (raw or "").strip().lower()
492
+ if not text:
493
+ return None
494
+ if text == "unknown":
495
+ return None
496
+ if labels:
497
+ for label in labels:
498
+ if label.lower() == text:
499
+ return label
500
+ for label in labels:
501
+ if label.lower() in text:
502
+ return label
503
+ return None
504
+ return text
505
+
506
+
507
+ # ---------------------------------------------------------------------------
508
+ # Public entrypoint
509
+ # ---------------------------------------------------------------------------
510
+
511
+
512
+ _REASONER_OFF_VALUES = {"0", "off", "false", "no", "disable", "disabled"}
513
+
514
+
515
+ def _is_reasoner_disabled() -> bool:
516
+ """Honour the ``NEXO_SEMANTIC_REASONER`` runtime kill switch.
517
+
518
+ The plan (ONEPASS LLM Coverage) explicitly required an env opt-out
519
+ dedicated to the reasoner, separate from ``NEXO_LOCAL_CLASSIFIER``
520
+ (which only gates install-time provisioning). Operators who hit a
521
+ reasoner regression in production can set ``NEXO_SEMANTIC_REASONER=0``
522
+ to force every ``reason()`` call to refuse; the router then falls
523
+ through to ``remote_fallback`` on its own.
524
+ """
525
+ raw = os.environ.get("NEXO_SEMANTIC_REASONER", "").strip().lower()
526
+ return raw in _REASONER_OFF_VALUES
527
+
528
+
529
+ def reason(
530
+ *,
531
+ decision_kind: str,
532
+ question: str,
533
+ labels: tuple[str, ...] | list[str] | None,
534
+ context: str = "",
535
+ mode: str = "multipass_local",
536
+ confidence_floor: float = 0.75,
537
+ ):
538
+ """Dispatch to the configured mode. Called by ``semantic_router.route``.
539
+
540
+ Returns a ``RouterResult``. The router knows how to keep going to
541
+ ``remote_fallback`` if this layer refuses.
542
+ """
543
+ RouterResult = _import_router_result()
544
+
545
+ if _is_reasoner_disabled():
546
+ return RouterResult(
547
+ ok=False,
548
+ decision_kind=decision_kind,
549
+ route_used="semantic_reasoner",
550
+ degraded=True,
551
+ error="reasoner_disabled_by_env",
552
+ meta={"env": "NEXO_SEMANTIC_REASONER"},
553
+ )
554
+
555
+ labels_tuple: tuple[str, ...] | None = tuple(labels) if labels else None
556
+ if mode == "multipass_local":
557
+ return _reason_multipass_local(
558
+ decision_kind=decision_kind,
559
+ question=question,
560
+ labels=labels_tuple,
561
+ confidence_floor=confidence_floor,
562
+ )
563
+ if mode == "cached_llm":
564
+ return _reason_cached_llm(
565
+ decision_kind=decision_kind,
566
+ question=question,
567
+ labels=labels_tuple,
568
+ context=context,
569
+ confidence_floor=confidence_floor,
570
+ )
571
+
572
+ return RouterResult(
573
+ ok=False,
574
+ decision_kind=decision_kind,
575
+ route_used="semantic_reasoner",
576
+ degraded=True,
577
+ error=f"unknown reasoner mode: {mode}",
578
+ )
579
+
580
+
581
+ __all__ = ["reason"]
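
The kill switch and the verdict normalisation added in `semantic_reasoner.py` above can be tried in isolation. The following is a minimal sketch rather than the shipped module: the names `is_disabled` and `normalize_verdict` and the injectable `env` argument are illustrative assumptions, while the off-values and the matching order are copied from the diff.

```python
import os

# Values treated as "off" (copied from _REASONER_OFF_VALUES in the diff).
OFF_VALUES = {"0", "off", "false", "no", "disable", "disabled"}


def is_disabled(env=None):
    """Return True when NEXO_SEMANTIC_REASONER holds an off-value.

    `env` is a hypothetical injection point so the check can be tested
    without mutating os.environ.
    """
    env = os.environ if env is None else env
    return env.get("NEXO_SEMANTIC_REASONER", "").strip().lower() in OFF_VALUES


def normalize_verdict(raw, labels):
    """Map a raw model reply onto one of `labels`, or None if nothing fits."""
    text = (raw or "").strip().lower()
    if not text or text == "unknown":
        return None
    if labels:
        # Pass 1: exact, case-insensitive match.
        for label in labels:
            if label.lower() == text:
                return label
        # Pass 2: first label that occurs as a substring of the reply.
        for label in labels:
            if label.lower() in text:
                return label
        return None
    # No labels supplied: return the cleaned free-text reply.
    return text
```

Because the second pass returns the first label contained in the reply, label order matters when one label is a substring of another: with labels `("safe", "unsafe")`, the reply `"it is unsafe"` resolves to `"safe"`.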
@@ -0,0 +1,452 @@
1
+ """semantic_router — Plan ONEPASS LLM Coverage.
2
+
3
+ Central router for every model-backed semantic decision in NEXO Brain. Call
4
+ sites declare a *decision_kind* and pass question/context; the router
5
+ applies the policy for that kind and dispatches through the stack:
6
+
7
+ fast_local -> semantic_reasoner -> remote_fallback
8
+
9
+ Design contract (from ~/Desktop/NEXO-ONEPASS-LLM-COVERAGE-RELEASE-PLAN.md):
10
+
11
+ - Brain owns the semantic contract, model pins and routing policy.
12
+ - Every call site passes a *named* decision_kind; policy lives here, not in
13
+ the caller. This replaces the previous pattern where each caller invented
14
+ its own policy tree.
15
+ - The existing ``LocalZeroShotClassifier`` stays as the cheap multilingual
16
+ first pass (``fast_local``).
17
+ - ``semantic_reasoner`` is the second, stronger layer. Its implementation
18
+ lives in ``src/semantic_reasoner.py`` with two modes: Mode A (strict
19
+ multi-pass over the same local classifier with tighter thresholds) and
20
+ Mode B (LLM-cached reasoner for code-aware decisions).
21
+ - ``remote_fallback`` is the existing ``call_model_raw`` chain. It is no
22
+ longer the default path for local-friendly decisions; it only fires if
23
+ the upstream layers refuse or degrade.
24
+
25
+ The router returns a ``RouterResult`` dataclass so callers can inspect
26
+ which route was used, whether degraded mode is active, and what confidence
27
+ the decision carries. This is also what Desktop will consume via the
28
+ ``brain-semantic-router.js`` bridge shipped in the companion PR.
29
+ """
30
+ from __future__ import annotations
31
+
32
+ import logging
33
+ from dataclasses import dataclass, field
34
+ from typing import Any
35
+
36
+ _logger = logging.getLogger(__name__)
37
+
38
+
39
+ # ---------------------------------------------------------------------------
40
+ # Contract dataclasses
41
+ # ---------------------------------------------------------------------------
42
+
43
+
44
+ @dataclass
45
+ class RouterResult:
46
+ """Outcome of a ``route()`` call.
47
+
48
+ Fields match the minimum contract documented in the plan (section
49
+ "Minimum router output contract"):
50
+
51
+ - ``ok``: overall success (at least one layer produced a decision)
52
+ - ``decision_kind``: the kind the caller passed
53
+ - ``verdict``: the chosen label when the caller used zero-shot
54
+ classification; None when the underlying layer returned free text
55
+ - ``label``: alias for ``verdict`` to match the plan's wording; kept
56
+ consistent to simplify Desktop bridge mapping
57
+ - ``confidence``: [0.0, 1.0]
58
+ - ``route_used``: one of ``fast_local``, ``semantic_reasoner``,
59
+ ``remote_fallback``, or ``no_route`` when every layer refused
60
+ - ``degraded``: True when the chosen layer could not meet its normal
61
+ bar (fallback fired, stricter threshold not met, cache-only, etc.)
62
+ - ``error``: short human-readable reason when ``ok`` is False
63
+ - ``meta``: free-form layer-specific evidence (scores dict, cache
64
+ key, latency, model id) — Desktop uses it for telemetry
65
+ """
66
+
67
+ ok: bool
68
+ decision_kind: str
69
+ verdict: str | None = None
70
+ label: str | None = None
71
+ confidence: float = 0.0
72
+ route_used: str = "no_route"
73
+ degraded: bool = False
74
+ error: str | None = None
75
+ meta: dict[str, Any] = field(default_factory=dict)
76
+
77
+
78
+ # ---------------------------------------------------------------------------
79
+ # Decision kinds + policy table
80
+ # ---------------------------------------------------------------------------
81
+ #
82
+ # The plan enumerates 18 decision_kinds that need to route through here. They
83
+ # fall into two families:
84
+ #
85
+ # TEXTUAL — the first-line local classifier is good enough; the
86
+ # reasoner adds a stricter multi-pass check for ambiguous
87
+ # cases. Remote is only a last-resort safety net.
88
+ #
89
+ # CODE_AWARE — the fast local classifier is not designed for code-aware
90
+ # semantics (T4 R15/R23e/R23f/R23h, r20). The reasoner
91
+ # routes those straight to a cached LLM call.
92
+ #
93
+ # Any decision_kind not listed here is rejected by ``route()`` with
94
+ # ``route_used="no_route"`` and ``degraded=True``, which makes accidental
95
+ # misuse visible in telemetry instead of silent.
96
+ #
97
+ # Keep this map in lockstep with ``docs/semantic-reasoner-model-notes.md``.
98
+
99
+
100
+ TEXTUAL_KINDS: tuple[str, ...] = (
101
+ "session_end_intent",
102
+ "autonomy_mandate",
103
+ "guard_verbal_ack",
104
+ "r14_correction",
105
+ "r16_declared_done",
106
+ "r17_promise_debt",
107
+ "r34_identity_coherence",
108
+ "followup_operator_attention",
109
+ "drive_signal_type",
110
+ "drive_area",
111
+ "reply_event_type",
112
+ "query_intent",
113
+ "sentiment_intent",
114
+ )
115
+
116
+
117
+ CODE_AWARE_KINDS: tuple[str, ...] = (
118
+ "r20_constant_change",
119
+ "t4_r15",
120
+ "t4_r23e",
121
+ "t4_r23f",
122
+ "t4_r23h",
123
+ )
124
+
125
+
126
+ ALL_DECISION_KINDS: tuple[str, ...] = TEXTUAL_KINDS + CODE_AWARE_KINDS
127
+
128
+
129
+ # Per-kind policy. Explicit, human-readable, no defaults that silently
130
+ # expand coverage. Changing policy = editing this dict + updating the
131
+ # model-notes doc + bumping tests.
132
+ _POLICY: dict[str, dict[str, Any]] = {
133
+ kind: {
134
+ "family": "textual",
135
+ "fast_local_threshold": 0.60,
136
+ "reasoner_mode": "multipass_local",
137
+ "reasoner_threshold": 0.75,
138
+ "allow_remote_fallback": True,
139
+ }
140
+ for kind in TEXTUAL_KINDS
141
+ }
142
+
143
+ _POLICY.update(
144
+ {
145
+ kind: {
146
+ "family": "code_aware",
147
+ "fast_local_threshold": None, # skip fast_local
148
+ "reasoner_mode": "cached_llm",
149
+ "reasoner_threshold": 0.60,
150
+ "allow_remote_fallback": True,
151
+ }
152
+ for kind in CODE_AWARE_KINDS
153
+ }
154
+ )
155
+
156
+
157
+ def policy_for(decision_kind: str) -> dict[str, Any] | None:
158
+ """Return the policy entry for a kind, or None if unknown."""
159
+ return _POLICY.get(decision_kind)
160
+
161
+
162
+ # ---------------------------------------------------------------------------
163
+ # Layer adapters
164
+ # ---------------------------------------------------------------------------
165
+ #
166
+ # The router does not import the heavy modules at the top of the file so
167
+ # that a caller who only wants ``policy_for`` or ``ALL_DECISION_KINDS`` does
168
+ # not pay the import cost. The adapters below resolve the dependencies
169
+ # lazily and wrap failures as ``None`` so the router can advance to the
170
+ # next layer deterministically.
171
+
172
+
173
+ def _run_fast_local(
174
+ *,
175
+ question: str,
176
+ labels: tuple[str, ...],
177
+ confidence_floor: float,
178
+ ) -> RouterResult | None:
179
+ """Try ``LocalZeroShotClassifier``. Return None on unavailable or
180
+ below-threshold so the router advances."""
181
+ try:
182
+ from classifier_local import LocalZeroShotClassifier
183
+ except Exception as exc: # pragma: no cover — install not ready
184
+ _logger.debug("semantic_router: classifier_local unavailable (%s)", exc)
185
+ return None
186
+
187
+ clf = LocalZeroShotClassifier(confidence_floor=confidence_floor)
188
+ result = clf.classify(question, labels)
189
+ if result is None:
190
+ return None
191
+ if result.confidence < confidence_floor:
192
+ return None
193
+
194
+ return RouterResult(
195
+ ok=True,
196
+ decision_kind="", # filled by caller
197
+ verdict=result.label,
198
+ label=result.label,
199
+ confidence=float(result.confidence),
200
+ route_used="fast_local",
201
+ degraded=False,
202
+ meta={
203
+ "scores": dict(result.scores),
204
+ "latency_ms": float(result.latency_ms),
205
+ "threshold": confidence_floor,
206
+ },
207
+ )
208
+
209
+
210
+ def _run_semantic_reasoner(
211
+ *,
212
+ decision_kind: str,
213
+ question: str,
214
+ labels: tuple[str, ...] | None,
215
+ context: str,
216
+ mode: str,
217
+ confidence_floor: float,
218
+ ) -> RouterResult | None:
219
+ """Delegate to ``src/semantic_reasoner.py``. Return None on unavailable
220
+ so the router advances to remote_fallback."""
221
+ try:
222
+ from semantic_reasoner import reason
223
+ except Exception as exc: # pragma: no cover
224
+ _logger.debug("semantic_router: semantic_reasoner unavailable (%s)", exc)
225
+ return None
226
+
227
+ try:
228
+ return reason(
229
+ decision_kind=decision_kind,
230
+ question=question,
231
+ labels=labels,
232
+ context=context,
233
+ mode=mode,
234
+ confidence_floor=confidence_floor,
235
+ )
236
+ except Exception as exc: # noqa: BLE001 — fail-closed, degrade to remote
237
+ _logger.warning("semantic_reasoner.reason raised: %s", exc)
238
+ return None
239
+
240
+
241
+ def _run_remote_fallback(
242
+ *,
243
+ decision_kind: str,
244
+ question: str,
245
+ labels: tuple[str, ...] | None,
246
+ context: str,
247
+ ) -> RouterResult | None:
248
+ """Last-resort LLM call via ``call_model_raw``. The router marks the
249
+ result as ``degraded=True`` so telemetry shows when the stack fell
250
+ through."""
251
+ try:
252
+ import call_model_raw as _cmr
253
+ except Exception as exc: # pragma: no cover
254
+ _logger.debug("semantic_router: call_model_raw unavailable (%s)", exc)
255
+ return None
256
+
257
+ # Resolve symbols defensively. Tests sometimes stub only ``call_model_raw``
258
+ # and forget ``ClassifierUnavailableError`` (or vice versa); without this
259
+ # guard a missing attribute later becomes NameError at ``except`` time and
260
+ # crashes the router instead of degrading.
261
+ call_model_raw_fn = getattr(_cmr, "call_model_raw", None)
262
+ classifier_unavailable_cls = getattr(
263
+ _cmr, "ClassifierUnavailableError", Exception
264
+ )
265
+ if call_model_raw_fn is None:
266
+ return RouterResult(
267
+ ok=False,
268
+ decision_kind=decision_kind,
269
+ route_used="remote_fallback",
270
+ degraded=True,
271
+ error="call_model_raw callable missing",
272
+ )
273
+
274
+ prompt = _build_remote_prompt(
275
+ decision_kind=decision_kind,
276
+ question=question,
277
+ labels=labels,
278
+ context=context,
279
+ )
280
+ system = (
281
+ "You are NEXO's remote semantic fallback. Answer with the single "
282
+ "best label from the provided list, or with 'unknown' if none fit. "
283
+ "No prose, no explanation."
284
+ )
285
+
286
+ try:
287
+ raw = call_model_raw_fn(
288
+ prompt,
289
+ system=system,
290
+ caller="semantic_reasoner",
291
+ tier="muy_bajo",
292
+ max_tokens=32,
293
+ temperature=0.0,
294
+ )
295
+ except classifier_unavailable_cls as exc:
296
+ return RouterResult(
297
+ ok=False,
298
+ decision_kind=decision_kind,
299
+ route_used="remote_fallback",
300
+ degraded=True,
301
+ error=f"remote_unavailable: {exc}",
302
+ )
303
+ except Exception as exc: # noqa: BLE001 — fail-closed, never re-raise
304
+ return RouterResult(
305
+ ok=False,
306
+ decision_kind=decision_kind,
307
+ route_used="remote_fallback",
308
+ degraded=True,
309
+ error=f"remote_error: {exc}",
310
+ )
311
+
312
+ verdict = _normalize_remote_answer(raw, labels)
313
+ raw_preview = (raw or "")[:120]
314
+ return RouterResult(
315
+ ok=verdict is not None,
316
+ decision_kind=decision_kind,
317
+ verdict=verdict,
318
+ label=verdict,
319
+ confidence=0.55 if verdict is not None else 0.0,
320
+ route_used="remote_fallback",
321
+ degraded=True, # always degraded relative to the local-first ideal
322
+ meta={"raw_response": raw_preview},
323
+ )
324
+
325
+
326
+ def _build_remote_prompt(
327
+ *,
328
+ decision_kind: str,
329
+ question: str,
330
+ labels: tuple[str, ...] | None,
331
+ context: str,
332
+ ) -> str:
333
+ parts = [
334
+ f"Decision kind: {decision_kind}",
335
+ f"Question: {question}",
336
+ ]
337
+ if context:
338
+ parts.append(f"Context: {context[:400]}")
339
+ if labels:
340
+ parts.append("Candidate labels: " + ", ".join(labels))
341
+ parts.append("Reply with exactly one of the labels above.")
342
+ else:
343
+ parts.append("Reply with the shortest phrase that answers the question.")
344
+ return "\n".join(parts)
345
+
346
+
347
+ def _normalize_remote_answer(
348
+ raw: str, labels: tuple[str, ...] | None
349
+ ) -> str | None:
350
+ text = (raw or "").strip().lower()
351
+ if not text:
352
+ return None
353
+ if labels:
354
+ for label in labels:
355
+ if label.lower() == text:
356
+ return label
357
+ for label in labels:
358
+ if label.lower() in text:
359
+ return label
360
+ return None
361
+ return text
362
+
363
+
364
+ # ---------------------------------------------------------------------------
365
+ # Public entrypoint
366
+ # ---------------------------------------------------------------------------
367
+
368
+
369
+ def route(
370
+ *,
371
+ decision_kind: str,
372
+ question: str,
373
+ context: str = "",
374
+ labels: tuple[str, ...] | list[str] | None = None,
375
+ allow_remote_fallback: bool = True,
376
+ ) -> RouterResult:
377
+ """Route a semantic decision through the stack.
378
+
379
+ The caller names the *kind* of decision. The router looks up the policy,
380
+ dispatches through fast_local -> semantic_reasoner -> remote_fallback,
381
+ and returns the result of the first layer that produced a decision
382
+ above its threshold; ``remote_fallback`` results are returned as-is.
383
+
384
+ ``allow_remote_fallback=False`` forces local-only behaviour; the router
385
+ will return ``ok=False, route_used='no_route'`` if every local layer
386
+ refused. Useful for strict-offline automation or pytest.
387
+ """
388
+ policy = policy_for(decision_kind)
389
+ if policy is None:
390
+ return RouterResult(
391
+ ok=False,
392
+ decision_kind=decision_kind,
393
+ route_used="no_route",
394
+ degraded=True,
395
+ error=f"unknown decision_kind: {decision_kind}",
396
+ )
397
+
398
+ labels_tuple: tuple[str, ...] | None = (
399
+ tuple(labels) if labels else None
400
+ )
401
+
402
+ # Step 1 — fast_local for textual families only.
403
+ if policy["fast_local_threshold"] is not None and labels_tuple:
404
+ fast = _run_fast_local(
405
+ question=question,
406
+ labels=labels_tuple,
407
+ confidence_floor=float(policy["fast_local_threshold"]),
408
+ )
409
+ if fast is not None:
410
+ fast.decision_kind = decision_kind
411
+ return fast
412
+
413
+ # Step 2 — semantic_reasoner (Mode A or B depending on policy).
414
+ reasoned = _run_semantic_reasoner(
415
+ decision_kind=decision_kind,
416
+ question=question,
417
+ labels=labels_tuple,
418
+ context=context,
419
+ mode=str(policy["reasoner_mode"]),
420
+ confidence_floor=float(policy["reasoner_threshold"]),
421
+ )
422
+ if reasoned is not None and reasoned.ok:
423
+ return reasoned
424
+
425
+ # Step 3 — remote_fallback if allowed.
426
+ if allow_remote_fallback and policy.get("allow_remote_fallback", True):
427
+ remote = _run_remote_fallback(
428
+ decision_kind=decision_kind,
429
+ question=question,
430
+ labels=labels_tuple,
431
+ context=context,
432
+ )
433
+ if remote is not None:
434
+ return remote
435
+
436
+ return RouterResult(
437
+ ok=False,
438
+ decision_kind=decision_kind,
439
+ route_used="no_route",
440
+ degraded=True,
441
+ error="every layer refused or was unavailable",
442
+ )
443
+
444
+
445
+ __all__ = [
446
+ "ALL_DECISION_KINDS",
447
+ "CODE_AWARE_KINDS",
448
+ "RouterResult",
449
+ "TEXTUAL_KINDS",
450
+ "policy_for",
451
+ "route",
452
+ ]
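
The control flow of `route()` above reduces to a three-step cascade. The sketch below models it with stub layers so the fall-through rules can be checked without the classifier or `call_model_raw` installed. `Result`, `Layer` and `cascade` are hypothetical names introduced for illustration; only the ordering and the return conditions are taken from the diff.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Result:
    # Reduced stand-in for RouterResult (illustrative subset of its fields).
    ok: bool
    route_used: str = "no_route"
    verdict: Optional[str] = None
    confidence: float = 0.0
    degraded: bool = False


# A layer takes no arguments here and returns a Result, or None to refuse.
Layer = Callable[[], Optional[Result]]


def cascade(
    fast_local: Optional[Layer],
    reasoner: Layer,
    remote: Optional[Layer],
) -> Result:
    # Step 1: fast_local is optional (skipped for code-aware kinds);
    # any non-None result is returned immediately.
    if fast_local is not None:
        fast = fast_local()
        if fast is not None:
            return fast
    # Step 2: the reasoner result is used only when it reports ok=True.
    reasoned = reasoner()
    if reasoned is not None and reasoned.ok:
        return reasoned
    # Step 3: remote fallback, if allowed; its result is returned as-is.
    if remote is not None:
        fallback = remote()
        if fallback is not None:
            return fallback
    # Every layer refused or was unavailable.
    return Result(ok=False, degraded=True)
```

Passing `remote=None` plays the role of `allow_remote_fallback=False`: when both local layers refuse, the cascade ends in a `no_route` result with `ok=False`.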
@@ -3214,7 +3214,7 @@
3214
3214
  "threshold": 1
3215
3215
  }
3216
3216
  ],
3217
- "inject_prompt": "You must start by calling nexo_startup to register this session. Execute it now with a brief task description. Do not produce visible text.",
3217
+ "inject_prompt": "You must start by calling nexo_startup to register this session. If mcp__nexo__* tools appear as deferred in the tool list (names visible but JSONSchemas not loaded), first call ToolSearch with query \"select:mcp__nexo__nexo_startup,mcp__nexo__nexo_heartbeat,mcp__nexo__nexo_session_diary_read,mcp__nexo__nexo_reminders,mcp__nexo__nexo_smart_startup,mcp__nexo__nexo_task_open,mcp__nexo__nexo_task_close,mcp__nexo__nexo_task_acknowledge_guard,mcp__nexo__nexo_guard_check,mcp__nexo__nexo_learning_add,mcp__nexo__nexo_confidence_check,mcp__nexo__nexo_followup_create,mcp__nexo__nexo_protocol_debt_resolve\" to load the schemas — deferred is not absent. If more nexo_* tools appear deferred later in the session, preload them the same way instead of giving up on them. Then execute nexo_startup with a brief task description. Do not produce visible text.",
3218
3218
  "triggers_after": [
3219
3219
  "nexo_smart_startup",
3220
3220
  "nexo_session_diary_read",