nexo-brain 7.8.2 → 7.9.1

@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "nexo-brain",
3
- "version": "7.8.2",
3
+ "version": "7.9.1",
4
4
  "description": "Local cognitive runtime for Claude Code \u2014 persistent memory, overnight learning, doctor diagnostics, personal scripts, recovery-aware jobs, startup preflight, and optional dashboard/power helper.",
5
5
  "author": {
6
6
  "name": "NEXO Brain",
package/README.md CHANGED
@@ -18,7 +18,11 @@
18
18
 
19
19
  [Watch the overview video](https://nexo-brain.com/watch/) · [Watch on YouTube](https://www.youtube.com/watch?v=i2lkGhKyVqI) · [Open the infographic](https://nexo-brain.com/assets/nexo-brain-infographic-v5.png)
20
20
 
21
- Version `7.8.2` is the current packaged-runtime line. Patch release that fixes the compact-hook observability gap Francisco flagged after v7.8.1: `hook_runs.session_id` was empty for 7 out of 8 recent compaction rows (and when populated it stored the raw Claude Code token instead of the NEXO sid), so per-session queries over `hook_runs` for compact events could not be joined back to the NEXO session that actually compacted. v7.8.2 adds `src/hooks/compact_session_resolver.py` with `resolve_nexo_sid(claude_session_id)`, which walks the same rails the shell already uses: `sessions.claude_session_id` match, then `session_claude_aliases.claude_session_id` (most recent `last_seen` wins), then the per-conversation sidecar under `runtime/data/compacting/<safe-claude-id>.txt`, then the legacy global sidecar for single-conversation setups. `src/hooks/pre_compact.py` and `src/hooks/post_compact.py` now call the resolver and store the real NEXO sid in `hook_runs.session_id`; both wrappers also stash `{claude_session_id, sid_source}` in `hook_runs.metadata` so "why is this row still empty?" has a one-query answer. Nine new tests in `tests/test_hook_runs_compact_sid_resolution.py` pin the five resolver rails (sessions / alias / sidecar / legacy / none), malformed-sidecar rejection, the pre- and post-compact wrapper end-to-end paths, and the empty-state wrapper rail so a clean audit trail is written even when nothing resolves. No Desktop bump.
21
+ Version `7.9.1` is the current packaged-runtime line. Patch release that begins the semantic-router site migration deferred in v7.9.0: six low-risk conversational-text callers (`session_end_intent`, `r14_correction_learning`, `r16_declared_done`, `r17_promise_debt`, `autonomy_mandate`, `guard_verbal_ack`) now route through `semantic_router.route(...)` instead of importing `enforcement_classifier.classify` directly. The patch also makes the semantic stack's local layers classify the live `context` text, so static prompt templates no longer dominate zero-shot decisions, and switches the six callers from generic `yes`/`no` to semantic labels (`session_end`/`continue_session`, `negative_feedback`/`ordinary_request`, etc.). Existing fail-closed behaviour and test injection seams are preserved. Targeted verification: 105 tests passing across the router, the reasoner, the migrated call sites, and their enforcement integrations. The remaining textual and code-aware callers stay tracked under `NF-SEMANTIC-ROUTER-SITE-MIGRATION` for later focused patches. No Desktop bump.
22
+
23
+ Previously in `7.9.0`: minor release that ships the foundation of the semantic stack (router + reasoner + CLI) under the ONEPASS LLM Coverage plan, plus two product-bug fixes observed in the wild on 2026-04-23. New `src/semantic_router.py` exposes 18 named `decision_kinds` (13 textual + 5 code-aware) with a per-kind policy table and the layer chain `fast_local → semantic_reasoner → remote_fallback`. New `src/semantic_reasoner.py` adds Mode A (`multipass_local`: reuses the mDeBERTa pin with three prompt-perturbed passes + majority vote + 0.75 floor) and Mode B (`cached_llm`: wrapper over `call_model_raw` with a 24h-TTL disk cache at `~/.nexo/runtime/operations/semantic-reasoner-cache.json`; writes are atomic via pid+uuid temp files, keys are SHA-256 over `decision_kind` + normalized input, the cache is capped at 2000 entries by trimming to the 1800 most recently written, and corrupt entries are dropped on read). New `scripts/semantic-classify.py` JSON-in JSON-out CLI lets external MCP clients (including the closed-source NEXO Desktop companion) query Brain as the single semantic authority. New `NEXO_SEMANTIC_REASONER` kill switch (`0`/`off`/`false`/`no`/`disable`/`disabled`) honours the plan mandate for a runtime opt-out separate from `NEXO_LOCAL_CLASSIFIER`. Bug fixes: the `bin/nexo-brain.js` upgrade flow now copies the `templates/` root the same way fresh install and same-version refresh already did (the Maria iMac 7.1.10→7.8.1 upgrade had lost 27 core-prompts templates, which broke post-update import verification); and `tool-enforcement-map.json` `nexo_startup.enforcement.inject_prompt` now instructs the model to preload the 13 `mcp__nexo__*` protocol tools via `ToolSearch` before calling `nexo_startup` when the host MCP client defers tool schemas (Claude Code with many MCPs installed). Audit-driven hardening: the router and reasoner look up `call_model_raw` defensively via `getattr` on the module and add a trailing `except Exception`, so provider errors degrade to `remote_error` instead of propagating; cache writes use a pid+uuid temp file + `fsync` + `os.replace` to survive concurrent writers; parsing of `NEXO_SEMANTIC_REASONER_TTL` tolerates malformed values. Tests: +50 (22 router, 20 reasoner, 8 CLI). Per-site migration of existing callers (`session_end_intent`, `r14`, `r16`, `r17`, `r20`, `r34`, T4 gates, `tools_drive`, `nexo-followup-runner`) is explicitly deferred to follow-up patch releases and tracked as `NF-SEMANTIC-ROUTER-SITE-MIGRATION`; nothing in this release changes the behaviour of the existing callers. Coordinated companion release: NEXO Desktop v0.28.0.
24
+
25
+ Previously in `7.8.2`: patch release that fixes the compact-hook observability gap Francisco flagged after v7.8.1: `hook_runs.session_id` was empty for 7 out of 8 recent compaction rows (and when populated it stored the raw Claude Code token instead of the NEXO sid), so per-session queries over `hook_runs` for compact events could not be joined back to the NEXO session that actually compacted. v7.8.2 adds `src/hooks/compact_session_resolver.py` with `resolve_nexo_sid(claude_session_id)`, which walks the same rails the shell already uses: `sessions.claude_session_id` match, then `session_claude_aliases.claude_session_id` (most recent `last_seen` wins), then the per-conversation sidecar under `runtime/data/compacting/<safe-claude-id>.txt`, then the legacy global sidecar for single-conversation setups. `src/hooks/pre_compact.py` and `src/hooks/post_compact.py` now call the resolver and store the real NEXO sid in `hook_runs.session_id`; both wrappers also stash `{claude_session_id, sid_source}` in `hook_runs.metadata` so "why is this row still empty?" has a one-query answer. Nine new tests in `tests/test_hook_runs_compact_sid_resolution.py` pin the five resolver rails (sessions / alias / sidecar / legacy / none), malformed-sidecar rejection, the pre- and post-compact wrapper end-to-end paths, and the empty-state wrapper rail so a clean audit trail is written even when nothing resolves. No Desktop bump.
22
26
 
23
27
  Previously in `7.8.1`: patch release that closed the last compaction-continuity gap Francisco flagged after v7.8.0: `pre-compact.sh` Layer 2 emergency auto-diary and Layer 3 `compaction_memory.record_auto_flush` now use the exact `TARGET_SID` resolved from `CLAUDE_SESSION_ID` instead of falling back to `ORDER BY last_update_epoch DESC LIMIT 1` ("latest active session"). In multi-conversation Desktop that fallback routinely wrote the emergency diary against the wrong conversation even though the main restore path was already exact-SID in v7.8.0. `last_diary_ts` is also scoped by `session_id` now. Fail-closed when no `CLAUDE_SESSION_ID` resolves. New behavioural tests drive the real shell script with two sessions in the DB to pin the invariant. Fixed a latent bash-escape bug in `pre-compact.sh` where a double-quoted string inside a Python comment silently closed the `python3 -c "..."` argument early — caught by adding the behavioural tests. Pytest 2092 passing (+2 new behavioural). No Desktop bump.
24
28
 
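The 7.8.2 resolver order described in the README (sessions, alias, per-conversation sidecar, legacy sidecar, none) can be sketched as follows. This is a minimal illustration, not the shipped `compact_session_resolver.py`: it assumes a `sid` column in both tables, takes the connection and sidecar paths as explicit arguments (the real `resolve_nexo_sid` takes only `claude_session_id`), and uses a hypothetical filename sanitiser and sid pattern for the `<safe-claude-id>` name and the malformed-sidecar check.

```python
import re
import sqlite3
from pathlib import Path
from typing import Optional, Tuple

# Hypothetical sid shape, used here only to reject malformed sidecar contents.
_SID_RE = re.compile(r"^[A-Za-z0-9_-]{1,64}$")


def _safe_claude_id(claude_session_id: str) -> str:
    # Hypothetical sanitiser; the real <safe-claude-id> rule is not in this diff.
    return re.sub(r"[^A-Za-z0-9_-]", "_", claude_session_id)


def _read_sidecar(path: Path) -> Optional[str]:
    """Return the sid stored in a sidecar file, or None if missing/malformed."""
    if not path.is_file():
        return None
    value = path.read_text(encoding="utf-8").strip()
    return value if _SID_RE.match(value) else None


def resolve_nexo_sid(
    conn: sqlite3.Connection,
    claude_session_id: str,
    compacting_dir: Path,
    legacy_sidecar: Path,
) -> Tuple[Optional[str], str]:
    """Return (nexo_sid, sid_source), trying each rail in order."""
    if claude_session_id:
        # Rail 1: direct match on sessions.claude_session_id.
        row = conn.execute(
            "SELECT sid FROM sessions WHERE claude_session_id = ?",
            (claude_session_id,),
        ).fetchone()
        if row:
            return row[0], "sessions"
        # Rail 2: alias table, most recent last_seen wins.
        row = conn.execute(
            "SELECT sid FROM session_claude_aliases WHERE claude_session_id = ? "
            "ORDER BY last_seen DESC LIMIT 1",
            (claude_session_id,),
        ).fetchone()
        if row:
            return row[0], "alias"
        # Rail 3: per-conversation sidecar file.
        sid = _read_sidecar(compacting_dir / f"{_safe_claude_id(claude_session_id)}.txt")
        if sid:
            return sid, "sidecar"
    # Rail 4: legacy global sidecar (single-conversation setups).
    sid = _read_sidecar(legacy_sidecar)
    if sid:
        return sid, "legacy"
    return None, "none"
```

Returning the source next to the sid mirrors the `sid_source` value the wrappers stash in `hook_runs.metadata`, so an empty row can be traced back to the rail that failed.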
package/bin/nexo-brain.js CHANGED
@@ -2127,6 +2127,18 @@ async function main() {
2127
2127
  writeRuntimeCoreArtifactsManifest(NEXO_HOME, srcDir);
2128
2128
  log(" Scripts updated.");
2129
2129
 
2130
+ // Update templates/ root (core-prompts/, CLAUDE.md.template, etc.) — recursive
2131
+ // Managed surface: copyDirRec overwrites without diffing, so any
2132
+ // hand-edited template under ~/.nexo/templates/ is replaced on
2133
+ // upgrade. Keep local forks under personal/ or outside the runtime
2134
+ // home to avoid silent loss.
2135
+ const migTemplatesSrc = path.join(__dirname, "..", "templates");
2136
+ const migTemplatesDest = path.join(NEXO_HOME, "templates");
2137
+ if (fs.existsSync(migTemplatesSrc)) {
2138
+ copyDirRec(migTemplatesSrc, migTemplatesDest);
2139
+ log(" Templates updated (user-edited templates/ files are overwritten).");
2140
+ }
2141
+
2130
2142
  // Register ALL 8 core hooks in settings.json (additive — don't remove user's custom hooks)
2131
2143
  let settings = {};
2132
2144
  if (fs.existsSync(CLAUDE_SETTINGS)) {
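The upgrade step above relies on `copyDirRec` overwriting whatever already sits under `~/.nexo/templates/`. A Python analogue, assuming `copyDirRec` behaves like a recursive copy that replaces same-named files (its body is not part of this diff), shows why a hand-edited template does not survive; `refresh_templates` is a hypothetical name.

```python
import shutil
from pathlib import Path


def refresh_templates(src: Path, dest: Path) -> None:
    """Mirror src into dest, overwriting any file present in both trees."""
    if not src.is_dir():
        return  # same guard as the fs.existsSync(migTemplatesSrc) check
    # dirs_exist_ok=True (Python 3.8+) merges into an existing tree and
    # overwrites same-named files without comparing their contents.
    shutil.copytree(src, dest, dirs_exist_ok=True)
```

In this sketch files that exist only in the destination are left in place; whether `copyDirRec` also preserves them is not shown here, which is one more reason to keep local forks under `personal/` or outside the runtime home, as the comment recommends.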
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "nexo-brain",
3
- "version": "7.8.2",
3
+ "version": "7.9.1",
4
4
  "mcpName": "io.github.wazionapps/nexo",
5
5
  "description": "NEXO Brain \u2014 Shared brain for AI agents. Persistent memory, semantic RAG, natural forgetting, metacognitive guard, trust scoring, 150+ MCP tools. Works with Claude Code, Codex, Claude Desktop & any MCP client. 100% local, free.",
6
6
  "homepage": "https://nexo-brain.com",
@@ -39,6 +39,7 @@ from core_prompts import render_core_prompt
39
39
  NEXO_HOME = Path(os.environ.get("NEXO_HOME", str(Path.home() / ".nexo")))
40
40
  STATE_PATH = NEXO_HOME / "runtime" / "data" / "autonomy_mandate.json"
41
41
  CLASSIFIER_QUESTION = render_core_prompt("autonomy-mandate-question")
42
+ SEMANTIC_LABELS = ("autonomy_mandate", "not_mandate")
42
43
 
43
44
  # Marker list per NF-DS-45569A27. Case-insensitive substring match.
44
45
  MARKERS = (
@@ -119,9 +120,21 @@ def _detect_marker(text: str, *, classifier=None) -> Optional[str]:
119
120
  return marker
120
121
  if classifier is None:
121
122
  try:
122
- from enforcement_classifier import classify as classifier # type: ignore
123
+ from semantic_router import route as semantic_route
123
124
  except Exception:
124
125
  return None
126
+ try:
127
+ result = semantic_route(
128
+ decision_kind="autonomy_mandate",
129
+ question=CLASSIFIER_QUESTION,
130
+ context=text.strip()[:1200],
131
+ labels=SEMANTIC_LABELS,
132
+ )
133
+ if bool(result.ok and (result.label or result.verdict) == "autonomy_mandate"):
134
+ return _SEMANTIC_MARKER
135
+ except Exception:
136
+ return None
137
+ return None
125
138
  try:
126
139
  if bool(classifier(question=CLASSIFIER_QUESTION, context=text.strip()[:1200])):
127
140
  return _SEMANTIC_MARKER
@@ -10,6 +10,7 @@ from core_prompts import render_core_prompt
10
10
 
11
11
 
12
12
  CLASSIFIER_QUESTION = render_core_prompt("guard-verbal-ack-question")
13
+ SEMANTIC_LABELS = ("explicit_ack", "not_ack")
13
14
 
14
15
 
15
16
  def _build_context(
@@ -44,7 +45,7 @@ def detect_guard_verbal_ack(
44
45
  return False
45
46
  if classifier is None:
46
47
  try:
47
- from enforcement_classifier import classify as classifier # type: ignore
48
+ from semantic_router import route as semantic_route
48
49
  except Exception:
49
50
  return False
50
51
  context = _build_context(
@@ -54,6 +55,17 @@ def detect_guard_verbal_ack(
54
55
  file_path=file_path,
55
56
  guard_summary=guard_summary,
56
57
  )
58
+ if classifier is None:
59
+ try:
60
+ result = semantic_route(
61
+ decision_kind="guard_verbal_ack",
62
+ question=CLASSIFIER_QUESTION,
63
+ context=context,
64
+ labels=SEMANTIC_LABELS,
65
+ )
66
+ return bool(result.ok and (result.label or result.verdict) == "explicit_ack")
67
+ except Exception:
68
+ return False
57
69
  try:
58
70
  return bool(classifier(question=CLASSIFIER_QUESTION, context=context))
59
71
  except Exception:
@@ -9,10 +9,9 @@ Fase 2 Protocol Enforcer Fase C (Capa 2) item R14. Plan doc 1 reads:
9
9
 
10
10
  Implementation contract:
11
11
 
12
- - Correction detection goes through the enforcement_classifier
13
- (triple-reinforced yes/no on call_model_raw). Learning #122
14
- prohibits keyword-based semantic detection; the classifier path
15
- is the sanctioned alternative.
12
+ - Correction detection goes through semantic_router decision_kind
13
+ ``r14_correction``. Learning #122 prohibits keyword-based semantic
14
+ detection; the router path is the sanctioned alternative.
16
15
  - Fail-closed: when the classifier is unavailable (no API key,
17
16
  automation_backend=none, timeout, 5xx), is_correction returns
18
17
  False. Downstream R28 (system prompt) and the auto_capture hook
@@ -31,6 +30,8 @@ from __future__ import annotations
31
30
  from core_prompts import render_core_prompt
32
31
 
33
32
  CLASSIFIER_QUESTION = render_core_prompt("r14-correction-learning-question")
33
+ SEMANTIC_LABELS = ("negative_feedback", "ordinary_request")
34
+ POSITIVE_LABEL = "negative_feedback"
34
35
 
35
36
 
36
37
  INJECTION_PROMPT_TEMPLATE = render_core_prompt("r14-correction-learning-injection")
@@ -45,7 +46,7 @@ def detect_correction(user_text: str, *, classifier=None) -> bool:
45
46
  Args:
46
47
  user_text: Raw user-role text from the stream.
47
48
  classifier: Injection point for tests. Defaults to
48
- enforcement_classifier.classify.
49
+ semantic_router.route(decision_kind="r14_correction").
49
50
 
50
51
  Fail-closed on ClassifierUnavailableError — returns False rather
51
52
  than raising so the caller's enforcement loop never crashes on a
@@ -62,7 +63,17 @@ def detect_correction(user_text: str, *, classifier=None) -> bool:
62
63
  return False
63
64
  if classifier is None:
64
65
  try:
65
- from enforcement_classifier import classify as classifier # type: ignore
66
+ from semantic_router import route as semantic_route
67
+ except Exception:
68
+ return False
69
+ try:
70
+ result = semantic_route(
71
+ decision_kind="r14_correction",
72
+ question=CLASSIFIER_QUESTION,
73
+ context=text,
74
+ labels=SEMANTIC_LABELS,
75
+ )
76
+ return bool(result.ok and (result.label or result.verdict) == POSITIVE_LABEL)
66
77
  except Exception:
67
78
  return False
68
79
  try:
@@ -10,9 +10,9 @@ Exposes detect_declared_done(assistant_text, classifier=None) → bool and
10
10
  the reminder prompt template. The window-and-state tracking lives in
11
11
  the HeadlessEnforcer / Desktop EnforcementEngine, not here.
12
12
 
13
- Classifier contract: same triple-reinforced yes/no path as R14
14
- (enforcement_classifier.classify → call_model_raw). Fail-closed on
15
- unavailable backend → detect returns False rather than raising.
13
+ Classifier contract: same binary semantic_router path as R14
14
+ (``decision_kind=r16_declared_done``). Fail-closed on unavailable backend →
15
+ detect returns False rather than raising.
16
16
 
17
17
  Mirror: nexo-desktop/lib/r16-declared-done.js (pending, landing in the
18
18
  next tranche alongside the JS classifier infrastructure).
@@ -22,6 +22,7 @@ from __future__ import annotations
22
22
  from core_prompts import render_core_prompt
23
23
 
24
24
  CLASSIFIER_QUESTION = render_core_prompt("r16-declared-done-question")
25
+ SEMANTIC_LABELS = ("declared_done", "not_done")
25
26
 
26
27
 
27
28
  INJECTION_PROMPT_TEMPLATE = render_core_prompt("r16-declared-done-injection")
@@ -43,7 +44,17 @@ def detect_declared_done(assistant_text: str, *, classifier=None) -> bool:
43
44
  return False
44
45
  if classifier is None:
45
46
  try:
46
- from enforcement_classifier import classify as classifier # type: ignore
47
+ from semantic_router import route as semantic_route
48
+ except Exception:
49
+ return False
50
+ try:
51
+ result = semantic_route(
52
+ decision_kind="r16_declared_done",
53
+ question=CLASSIFIER_QUESTION,
54
+ context=text,
55
+ labels=SEMANTIC_LABELS,
56
+ )
57
+ return bool(result.ok and (result.label or result.verdict) == "declared_done")
47
58
  except Exception:
48
59
  return False
49
60
  try:
@@ -9,9 +9,9 @@ Fase 2 Protocol Enforcer Fase D item R17. Plan doc 1 reads:
9
9
  Exposes detect_promise(text, classifier) → bool. State (promise window
10
10
  countdown) lives in the caller — mirrors the R14 / R16 pattern.
11
11
 
12
- Classifier path is the same as R14 / R16: enforcement_classifier.classify
13
- routes through call_model_raw with triple reinforcement. Fail-closed on
14
- any unavailable backend (no promise flagged rather than a false positive).
12
+ Classifier path is the same as R14 / R16:
13
+ semantic_router decision_kind ``r17_promise_debt``. Fail-closed on any
14
+ unavailable backend (no promise flagged rather than a false positive).
15
15
 
16
16
  Mirror: nexo-desktop/lib/r17-promise-debt.js (bundled with Fase D JS
17
17
  twins at the end of the tranche).
@@ -21,6 +21,7 @@ from __future__ import annotations
21
21
  from core_prompts import render_core_prompt
22
22
 
23
23
  CLASSIFIER_QUESTION = render_core_prompt("r17-promise-debt-question")
24
+ SEMANTIC_LABELS = ("promise", "no_promise")
24
25
 
25
26
  INJECTION_PROMPT_TEMPLATE = render_core_prompt("r17-promise-debt-injection")
26
27
 
@@ -37,7 +38,17 @@ def detect_promise(assistant_text: str, *, classifier=None) -> bool:
37
38
  return False
38
39
  if classifier is None:
39
40
  try:
40
- from enforcement_classifier import classify as classifier # type: ignore
41
+ from semantic_router import route as semantic_route
42
+ except Exception:
43
+ return False
44
+ try:
45
+ result = semantic_route(
46
+ decision_kind="r17_promise_debt",
47
+ question=CLASSIFIER_QUESTION,
48
+ context=text,
49
+ labels=SEMANTIC_LABELS,
50
+ )
51
+ return bool(result.ok and (result.label or result.verdict) == "promise")
41
52
  except Exception:
42
53
  return False
43
54
  try:
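The five migrated call sites in these hunks share one shape: call the router with two semantic labels, accept only `ok=True` plus the positive label (falling back to `verdict` when `label` is empty), and return `False` on any exception. A self-contained sketch of that check follows; the `RouterResult` stub carries only the fields these call sites read and is an assumption about the real dataclass, and `detect_with_router` is a hypothetical helper, not something the package exports.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional, Tuple


@dataclass
class RouterResult:
    # Stub with only the fields the call sites read; the real class may differ.
    ok: bool
    decision_kind: str
    verdict: Optional[str] = None
    label: Optional[str] = None
    meta: dict = field(default_factory=dict)


def detect_with_router(
    text: str,
    *,
    route: Callable[..., RouterResult],
    decision_kind: str,
    question: str,
    labels: Tuple[str, str],
    positive_label: str,
) -> bool:
    """True only when the router succeeded and picked the positive label."""
    try:
        result = route(
            decision_kind=decision_kind,
            question=question,
            context=text,
            labels=labels,
        )
        return bool(result.ok and (result.label or result.verdict) == positive_label)
    except Exception:
        return False  # fail closed: router errors never raise a flag
```

Comparing against an explicit positive label (for example `promise` versus `no_promise`) means a degraded result or an unexpected label can never be read as a positive by accident.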
@@ -0,0 +1,584 @@
1
+ """semantic_reasoner — second-layer semantic decision maker.
2
+
3
+ Plan ONEPASS LLM Coverage. Called through ``src/semantic_router.py``.
4
+ Exposes a single ``reason()`` entrypoint with two modes:
5
+
6
+ Mode A — ``multipass_local`` (textual decision kinds)
7
+
8
+ Reuses the already-pinned ``LocalZeroShotClassifier`` (see
9
+ ``docs/classifier-model-notes.md``) but with stricter behaviour:
10
+ three inference passes with mild prompt perturbations, then
11
+ majority vote across passes. A decision is only accepted if at
12
+ least two of three passes agree AND the agreed confidence is
13
+ above the stricter threshold. This kills single-pass false
14
+ positives without adding a new model dependency.
15
+
16
+ Mode B — ``cached_llm`` (code-aware decision kinds)
17
+
18
+ Thin wrapper around ``call_model_raw`` with a disk cache scoped
19
+ by (decision_kind, sha256(normalized_prompt)). TTL = 24h. The
20
+ cache lives under ``~/.nexo/runtime/operations/semantic-reasoner-cache.json``
21
+ alongside the existing classifier install state. Cache hits
22
+ return instantly and are flagged in ``meta.cache_hit``. Misses
23
+ call the LLM; the response and its normalized verdict are
24
+ written back to the cache atomically.
25
+
26
+ Pin notes: this module does not introduce a new downloaded model.
27
+ Mode A reuses ``MODEL_ID``/``MODEL_REVISION`` from ``classifier_local``.
28
+ Mode B resolves the LLM through the standard resonance map with
29
+ ``caller='semantic_reasoner'`` and ``tier='muy_bajo'``; the pin lives
30
+ in ``resonance_map`` like every other LLM caller.
31
+
32
+ See ``docs/semantic-reasoner-model-notes.md`` for the rationale behind
33
+ this "upgrade-in-place, pin-by-reuse" strategy, and why a dedicated
34
+ stronger local LLM (Llama 3.1 8B, etc.) is explicitly deferred to a
35
+ future release.
36
+ """
37
+ from __future__ import annotations
38
+
39
+ import hashlib
40
+ import json
41
+ import logging
42
+ import os
43
+ import re
44
+ import time
45
+ from dataclasses import dataclass, field
46
+ from pathlib import Path
47
+ from typing import Any
48
+
49
+ _logger = logging.getLogger(__name__)
50
+
51
+
52
+ # ---------------------------------------------------------------------------
53
+ # Shared dataclass imported from the router
54
+ # ---------------------------------------------------------------------------
55
+
56
+
57
+ def _import_router_result():
58
+ """Lazy import to avoid circular dependency on semantic_router."""
59
+ from semantic_router import RouterResult
60
+
61
+ return RouterResult
62
+
63
+
64
+ # ---------------------------------------------------------------------------
65
+ # Mode A — multi-pass local
66
+ # ---------------------------------------------------------------------------
67
+
68
+
69
+ _PROMPT_PERTURBATIONS: tuple[str, ...] = (
70
+ "{q}",
71
+ "Decide: {q}",
72
+ "Classify this utterance: {q}",
73
+ )
74
+
75
+
76
+ def _collect_local_votes(
77
+ question: str, labels: tuple[str, ...]
78
+ ) -> list[tuple[str, float, dict[str, float]]]:
79
+ """Run the local classifier three times with mild prompt variations.
80
+
81
+ Returns a list of ``(label, confidence, scores)`` triples. Any
82
+ pass that fails silently returns a zero-confidence entry so the
83
+ vote aggregator can still detect quorum problems.
84
+ """
85
+ try:
86
+ from classifier_local import LocalZeroShotClassifier
87
+ except Exception as exc: # pragma: no cover
88
+ _logger.debug("semantic_reasoner: classifier_local unavailable (%s)", exc)
89
+ return []
90
+
91
+ clf = LocalZeroShotClassifier(confidence_floor=0.0)
92
+ votes: list[tuple[str, float, dict[str, float]]] = []
93
+ for template in _PROMPT_PERTURBATIONS:
94
+ prompt = template.format(q=question)
95
+ result = clf.classify(prompt, labels)
96
+ if result is None:
97
+ votes.append(("", 0.0, {}))
98
+ continue
99
+ votes.append((result.label, float(result.confidence), dict(result.scores)))
100
+ return votes
101
+
102
+
103
+ def _aggregate_votes(
104
+ votes: list[tuple[str, float, dict[str, float]]],
105
+ confidence_floor: float,
106
+ ) -> tuple[str | None, float, dict[str, Any]]:
107
+ """Majority vote across passes. Returns (label_or_none, confidence, meta)."""
108
+ if not votes:
109
+ return None, 0.0, {"reason": "no_votes"}
110
+
111
+ counts: dict[str, int] = {}
112
+ confidences: dict[str, list[float]] = {}
113
+ for label, confidence, _scores in votes:
114
+ if not label:
115
+ continue
116
+ counts[label] = counts.get(label, 0) + 1
117
+ confidences.setdefault(label, []).append(confidence)
118
+
119
+ if not counts:
120
+ return None, 0.0, {"reason": "all_passes_failed", "votes": len(votes)}
121
+
122
+ best_label = max(counts, key=lambda lbl: (counts[lbl], max(confidences[lbl])))
123
+ vote_count = counts[best_label]
124
+ avg_confidence = sum(confidences[best_label]) / len(confidences[best_label])
125
+
126
+ meta: dict[str, Any] = {
127
+ "votes_total": len(votes),
128
+ "votes_for_best": vote_count,
129
+ "avg_confidence": round(avg_confidence, 4),
130
+ "per_label_counts": dict(counts),
131
+ }
132
+
133
+ if vote_count < 2:
134
+ meta["reason"] = "no_majority"
135
+ return None, avg_confidence, meta
136
+ if avg_confidence < confidence_floor:
137
+ meta["reason"] = "below_threshold"
138
+ return None, avg_confidence, meta
139
+ return best_label, avg_confidence, meta
140
+
141
+
142
+ def _reason_multipass_local(
143
+ *,
144
+ decision_kind: str,
145
+ question: str,
146
+ context: str = "",
147
+ labels: tuple[str, ...] | None,
148
+ confidence_floor: float,
149
+ ):
150
+ RouterResult = _import_router_result()
151
+ if not labels:
152
+ return RouterResult(
153
+ ok=False,
154
+ decision_kind=decision_kind,
155
+ route_used="semantic_reasoner",
156
+ degraded=True,
157
+ error="multipass_local requires labels",
158
+ )
159
+
160
+ semantic_input = (context or "").strip() or question
161
+ votes = _collect_local_votes(semantic_input, labels)
162
+ label, confidence, meta = _aggregate_votes(votes, confidence_floor)
163
+ if label is None:
164
+ return RouterResult(
165
+ ok=False,
166
+ decision_kind=decision_kind,
167
+ route_used="semantic_reasoner",
168
+ degraded=True,
169
+ error=meta.get("reason", "aggregation_failed"),
170
+ meta={"mode": "multipass_local", "aggregate": meta},
171
+ )
172
+ return RouterResult(
173
+ ok=True,
174
+ decision_kind=decision_kind,
175
+ verdict=label,
176
+ label=label,
177
+ confidence=round(float(confidence), 4),
178
+ route_used="semantic_reasoner",
179
+ degraded=False,
180
+ meta={"mode": "multipass_local", "aggregate": meta},
181
+ )
182
+
183
+
184
+ # ---------------------------------------------------------------------------
185
+ # Mode B — cached LLM
186
+ # ---------------------------------------------------------------------------
187
+
188
+
189
+ _DEFAULT_CACHE_TTL_SECONDS = 24 * 3600
190
+
191
+
192
+ def _cache_path() -> Path:
193
+ """Resolve the on-disk cache location.
194
+
195
+ Reuses ``paths.operations_dir()`` so the reasoner state lives next to
196
+ the existing ``classifier-install-state.json``. If ``paths`` is not
197
+ importable (heavy module; test context), fall back to a deterministic
198
+ location under ``NEXO_HOME``.
199
+ """
200
+ override = os.environ.get("NEXO_SEMANTIC_REASONER_CACHE_PATH", "").strip()
201
+ if override:
202
+ return Path(override)
203
+ try:
204
+ import paths
205
+
206
+ return paths.operations_dir() / "semantic-reasoner-cache.json"
207
+ except Exception:
208
+ home = os.environ.get("NEXO_HOME", "").strip()
209
+ root = Path(home) if home else Path.home() / ".nexo"
210
+ return root / "runtime" / "operations" / "semantic-reasoner-cache.json"
211
+
212
+
213
+ def _normalize_for_hash(text: str) -> str:
214
+ """Normalise whitespace/case so equivalent prompts hit the same cache
215
+ entry. Does not touch content semantics beyond lowercasing and whitespace collapse."""
216
+ return re.sub(r"\s+", " ", (text or "").strip().lower())
217
+
218
+
219
+ def _cache_key(
220
+ *,
221
+ decision_kind: str,
222
+ question: str,
223
+ labels: tuple[str, ...] | None,
224
+ context: str,
225
+ ) -> str:
226
+ payload = json.dumps(
227
+ {
228
+ "kind": decision_kind,
229
+ "q": _normalize_for_hash(question),
230
+ "ctx": _normalize_for_hash(context)[:400],
231
+ "labels": list(labels) if labels else [],
232
+ },
233
+ sort_keys=True,
234
+ ensure_ascii=False,
235
+ )
236
+ return hashlib.sha256(payload.encode("utf-8")).hexdigest()
237
+
238
+
239
+ def _read_cache() -> dict[str, Any]:
240
+ try:
241
+ path = _cache_path()
242
+ if not path.is_file():
243
+ return {}
244
+ data = json.loads(path.read_text() or "{}")
245
+ if isinstance(data, dict):
246
+ return data
247
+ except Exception as exc: # pragma: no cover — corrupt cache
248
+ _logger.warning("semantic_reasoner: cache read failed (%s); starting fresh", exc)
249
+ return {}
250
+
251
+
252
+ def _write_cache(cache: dict[str, Any]) -> None:
253
+ """Atomic write with pid+uuid suffix so concurrent Brain / Desktop CLI
254
+ writers do not stomp each other's temp file."""
255
+ try:
256
+ path = _cache_path()
257
+ path.parent.mkdir(parents=True, exist_ok=True)
258
+ import os as _os
259
+ import uuid as _uuid
260
+
261
+ tmp = path.with_name(
262
+ f"{path.name}.tmp.{_os.getpid()}.{_uuid.uuid4().hex[:8]}"
263
+ )
264
+ payload = json.dumps(cache, ensure_ascii=False, sort_keys=True)
265
+ with open(tmp, "w", encoding="utf-8") as handle:
266
+ handle.write(payload)
267
+ handle.flush()
268
+ try:
269
+ _os.fsync(handle.fileno())
270
+ except OSError:
271
+ pass
272
+ _os.replace(tmp, path)
273
+ except Exception as exc: # pragma: no cover
274
+ _logger.warning("semantic_reasoner: cache write failed (%s)", exc)
275
+
276
+
277
+ def _cache_get(key: str, ttl_seconds: int) -> dict[str, Any] | None:
278
+ cache = _read_cache()
279
+ entry = cache.get(key)
280
+ if not isinstance(entry, dict):
281
+ return None
282
+ ts = float(entry.get("ts", 0.0) or 0.0)
283
+ if ts <= 0.0:
284
+ return None
285
+ if (time.time() - ts) > ttl_seconds:
286
+ return None
287
+ return entry
288
+
289
+
290
+ def _cache_put(key: str, entry: dict[str, Any]) -> None:
291
+ cache = _read_cache()
292
+ cache[key] = {**entry, "ts": time.time()}
293
+ if len(cache) > 2000:
294
+ # Keep the 1800 most-recent entries to avoid unbounded growth. The
295
+ # bound is advisory; callers should keep reasoner prompts small.
296
+ items = sorted(cache.items(), key=lambda kv: float(kv[1].get("ts", 0.0) or 0.0))
297
+ cache = dict(items[-1800:])
298
+ _write_cache(cache)
299
+
300
+
301
+ def _parse_ttl_env() -> int:
302
+ """Read ``NEXO_SEMANTIC_REASONER_TTL`` defensively.
303
+
304
+ Malformed values (non-integer, negative) fall back to the default so
305
+ operator typos never crash the reasoner on first call.
306
+ """
307
+ raw = os.environ.get("NEXO_SEMANTIC_REASONER_TTL", "")
308
+ if not raw:
309
+ return _DEFAULT_CACHE_TTL_SECONDS
310
+ try:
311
+ parsed = int(raw)
312
+ except (TypeError, ValueError):
313
+ _logger.warning(
314
+ "semantic_reasoner: invalid NEXO_SEMANTIC_REASONER_TTL=%r; "
315
+ "using default %d",
316
+ raw,
317
+ _DEFAULT_CACHE_TTL_SECONDS,
318
+ )
319
+ return _DEFAULT_CACHE_TTL_SECONDS
320
+ if parsed <= 0:
321
+ return _DEFAULT_CACHE_TTL_SECONDS
322
+ return parsed
323
+
324
+
325
+ def _reason_cached_llm(
326
+ *,
327
+ decision_kind: str,
328
+ question: str,
329
+ labels: tuple[str, ...] | None,
330
+ context: str,
331
+ confidence_floor: float,
332
+ ):
333
+ RouterResult = _import_router_result()
334
+ ttl = _parse_ttl_env()
335
+ key = _cache_key(
336
+ decision_kind=decision_kind,
337
+ question=question,
338
+ labels=labels,
339
+ context=context,
340
+ )
341
+
342
+ cached = _cache_get(key, ttl)
343
+ if cached is not None:
344
+ cached_verdict = cached.get("verdict")
345
+ if isinstance(cached_verdict, str) and cached_verdict.strip():
346
+ return RouterResult(
347
+ ok=True,
348
+ decision_kind=decision_kind,
349
+ verdict=cached_verdict,
350
+ label=cached_verdict,
351
+ confidence=float(cached.get("confidence", 0.6)),
352
+ route_used="semantic_reasoner",
353
+ degraded=False,
354
+ meta={
355
+ "mode": "cached_llm",
356
+ "cache_hit": True,
357
+ "cache_key": key[:12],
358
+ },
359
+ )
360
+ # Corrupt entry (verdict missing or non-string). Drop it and fall
361
+ # through to a live call so the caller is never handed a cached
362
+ # "ok=True, verdict=None" sentinel.
363
+ _logger.warning(
364
+ "semantic_reasoner: dropping corrupt cache entry for key=%s",
365
+ key[:12],
366
+ )
367
+
368
+ try:
369
+ import call_model_raw as _cmr
370
+ except Exception as exc: # pragma: no cover
371
+ return RouterResult(
372
+ ok=False,
373
+ decision_kind=decision_kind,
374
+ route_used="semantic_reasoner",
375
+ degraded=True,
376
+ error=f"call_model_raw unavailable: {exc}",
377
+ meta={"mode": "cached_llm", "cache_hit": False},
378
+ )
379
+
380
+ call_model_raw_fn = getattr(_cmr, "call_model_raw", None)
381
+ classifier_unavailable_cls = getattr(
382
+ _cmr, "ClassifierUnavailableError", Exception
383
+ )
384
+ if call_model_raw_fn is None:
385
+ return RouterResult(
386
+ ok=False,
387
+ decision_kind=decision_kind,
388
+ route_used="semantic_reasoner",
389
+ degraded=True,
390
+ error="call_model_raw callable missing",
391
+ meta={"mode": "cached_llm", "cache_hit": False},
392
+ )
393
+
394
+ prompt = _build_reasoner_prompt(
395
+ decision_kind=decision_kind,
396
+ question=question,
397
+ labels=labels,
398
+ context=context,
399
+ )
400
+ system = (
401
+ "You are NEXO's code-aware semantic reasoner. Answer with the "
402
+ "single best label from the provided list (no prose). If no "
403
+ "label fits, answer 'unknown'."
404
+ )
405
+ try:
406
+ raw = call_model_raw_fn(
407
+ prompt,
408
+ system=system,
409
+ caller="semantic_reasoner",
410
+ tier="muy_bajo",
411
+ max_tokens=32,
412
+ temperature=0.0,
413
+ )
414
+ except classifier_unavailable_cls as exc:
415
+ return RouterResult(
416
+ ok=False,
417
+ decision_kind=decision_kind,
418
+ route_used="semantic_reasoner",
419
+ degraded=True,
420
+ error=f"remote_unavailable: {exc}",
421
+ meta={"mode": "cached_llm", "cache_hit": False},
422
+ )
423
+ except Exception as exc: # noqa: BLE001 — fail-closed
424
+ return RouterResult(
425
+ ok=False,
426
+ decision_kind=decision_kind,
427
+ route_used="semantic_reasoner",
428
+ degraded=True,
429
+ error=f"remote_error: {exc}",
430
+ meta={"mode": "cached_llm", "cache_hit": False},
431
+ )
432
+
433
+ verdict = _normalize_verdict(raw, labels)
434
+ if verdict is None:
435
+ return RouterResult(
436
+ ok=False,
437
+ decision_kind=decision_kind,
438
+ route_used="semantic_reasoner",
439
+ degraded=True,
440
+ error="llm_returned_unknown_or_unparseable",
441
+ meta={
442
+ "mode": "cached_llm",
443
+ "cache_hit": False,
444
+ "raw": (raw or "")[:80],
445
+ },
446
+ )
447
+
448
+ _cache_put(
449
+ key,
450
+ {
451
+ "verdict": verdict,
452
+ "confidence": max(confidence_floor, 0.6),
453
+ "decision_kind": decision_kind,
454
+ },
455
+ )
456
+
457
+ return RouterResult(
458
+ ok=True,
459
+ decision_kind=decision_kind,
460
+ verdict=verdict,
461
+ label=verdict,
462
+ confidence=max(confidence_floor, 0.6),
463
+ route_used="semantic_reasoner",
464
+ degraded=False,
465
+ meta={"mode": "cached_llm", "cache_hit": False, "cache_key": key[:12]},
466
+ )
467
+
468
+
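The cache path above (validate the cached verdict, fall through on a corrupt or missing entry, cache only successful live answers) can be sketched in isolation. This is a minimal in-memory sketch, not the module's cache: `TTLCache` and `cached_decision` are hypothetical names, and the injectable clock is an assumption added so expiry can be tested. The real `_cache_get`/`_cache_put` storage is not shown in this diff.

```python
import time
from typing import Any, Callable, Optional


class TTLCache:
    """In-memory stand-in for a _cache_get/_cache_put pair (hypothetical)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._store: dict[str, tuple[float, dict[str, Any]]] = {}

    def get(self, key: str, ttl: float) -> Optional[dict[str, Any]]:
        entry = self._store.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        if self._clock() - stored_at > ttl:
            del self._store[key]  # expired: forget it so the next call goes live
            return None
        return value

    def put(self, key: str, value: dict[str, Any]) -> None:
        self._store[key] = (self._clock(), value)


def cached_decision(
    cache: TTLCache,
    key: str,
    ttl: float,
    live_call: Callable[[], Optional[str]],
    confidence_floor: float = 0.75,
) -> Optional[tuple[str, float, bool]]:
    """Return (verdict, confidence, cache_hit), or None if the live call refused."""
    cached = cache.get(key, ttl)
    if cached is not None:
        verdict = cached.get("verdict")
        if isinstance(verdict, str) and verdict.strip():
            return verdict, float(cached.get("confidence", 0.6)), True
        # Corrupt entry: ignore it and fall through to a live call.
    verdict = live_call()
    if verdict is None:
        return None  # refusals are never cached
    confidence = max(confidence_floor, 0.6)
    cache.put(key, {"verdict": verdict, "confidence": confidence})
    return verdict, confidence, False
```

Caching only non-None verdicts means a transient "unknown" from the model is retried on the next call instead of being pinned for the whole TTL.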
469
+ def _build_reasoner_prompt(
470
+ *,
471
+ decision_kind: str,
472
+ question: str,
473
+ labels: tuple[str, ...] | None,
474
+ context: str,
475
+ ) -> str:
476
+ parts = [
477
+ f"decision_kind: {decision_kind}",
478
+ f"question: {question}",
479
+ ]
480
+ if context:
481
+ parts.append(f"context: {context[:600]}")
482
+ if labels:
483
+ parts.append("candidate_labels: " + ", ".join(labels))
484
+ parts.append("Reply with exactly one of the labels above.")
485
+ else:
486
+ parts.append("Reply with the shortest phrase answering the question.")
487
+ return "\n".join(parts)
488
+
489
+
490
+ def _normalize_verdict(
491
+ raw: str, labels: tuple[str, ...] | None
492
+ ) -> str | None:
493
+ text = (raw or "").strip().lower()
494
+ if not text:
495
+ return None
496
+ if text == "unknown":
497
+ return None
498
+ if labels:
499
+ for label in labels:
500
+ if label.lower() == text:
501
+ return label
502
+ for label in labels:
503
+ if label.lower() in text:
504
+ return label
505
+ return None
506
+ return text
507
+
508
+
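`_normalize_verdict` maps a raw reply in two passes: an exact case-insensitive match, then a substring match. A standalone sketch of the same logic (the name `normalize_verdict` is hypothetical) shows why the `'unknown'` check has to run before the substring pass: the label `no` is a substring of `unknown`.

```python
from typing import Optional, Sequence


def normalize_verdict(raw: Optional[str], labels: Optional[Sequence[str]]) -> Optional[str]:
    """Map a raw model reply onto one of the candidate labels, or None."""
    text = (raw or "").strip().lower()
    # Reject 'unknown' before any substring matching ("no" is inside "unknown").
    if not text or text == "unknown":
        return None
    if labels:
        for label in labels:  # pass 1: exact, case-insensitive
            if label.lower() == text:
                return label
        for label in labels:  # pass 2: label embedded in a chattier reply
            if label.lower() in text:
                return label
        return None
    return text  # no labels supplied: return the lowercased free-text answer
```

The substring pass is order-sensitive and can also match short labels inside unrelated words: with labels `("yes", "no")`, the reply `I don't know` normalizes to `no` because `know` contains it. Distinctive labels, like the `session_end` / `continue_session` pair used by `detect_session_end_intent`, reduce that risk.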
509
+ # ---------------------------------------------------------------------------
510
+ # Public entrypoint
511
+ # ---------------------------------------------------------------------------
512
+
513
+
514
+ _REASONER_OFF_VALUES = {"0", "off", "false", "no", "disable", "disabled"}
515
+
516
+
517
+ def _is_reasoner_disabled() -> bool:
518
+ """Honour the ``NEXO_SEMANTIC_REASONER`` runtime kill switch.
519
+
520
+ The plan (ONEPASS LLM Coverage) explicitly required an env opt-out
521
+ dedicated to the reasoner, separate from ``NEXO_LOCAL_CLASSIFIER``
522
+ (which only gates install-time provisioning). Operators who hit a
523
+ reasoner regression in production can set ``NEXO_SEMANTIC_REASONER=0``
524
+ to force every ``reason()`` call to refuse; the router then falls
525
+ through to ``remote_fallback`` on its own.
526
+ """
527
+ raw = os.environ.get("NEXO_SEMANTIC_REASONER", "").strip().lower()
528
+ return raw in _REASONER_OFF_VALUES
529
+
530
+
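The kill switch reduces to a normalized membership test. The sketch below (hypothetical `reasoner_disabled`, with the environment passed in as a mapping so it can be tested without mutating `os.environ`) reuses the same off-values; any other value, including an unset variable, leaves the reasoner enabled.

```python
import os
from typing import Mapping, Optional

OFF_VALUES = frozenset({"0", "off", "false", "no", "disable", "disabled"})


def reasoner_disabled(env: Optional[Mapping[str, str]] = None) -> bool:
    """True when NEXO_SEMANTIC_REASONER is set to one of the off-values."""
    source = os.environ if env is None else env
    # Whitespace and case are ignored, so " OFF " disables the reasoner too.
    raw = source.get("NEXO_SEMANTIC_REASONER", "").strip().lower()
    return raw in OFF_VALUES
```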
531
+ def reason(
532
+ *,
533
+ decision_kind: str,
534
+ question: str,
535
+ labels: tuple[str, ...] | list[str] | None,
536
+ context: str = "",
537
+ mode: str = "multipass_local",
538
+ confidence_floor: float = 0.75,
539
+ ):
540
+ """Dispatch to the configured mode. Called by ``semantic_router.route``.
541
+
542
+ Returns a ``RouterResult``. The router knows how to keep going to
543
+ ``remote_fallback`` if this layer refuses.
544
+ """
545
+ RouterResult = _import_router_result()
546
+
547
+ if _is_reasoner_disabled():
548
+ return RouterResult(
549
+ ok=False,
550
+ decision_kind=decision_kind,
551
+ route_used="semantic_reasoner",
552
+ degraded=True,
553
+ error="reasoner_disabled_by_env",
554
+ meta={"env": "NEXO_SEMANTIC_REASONER"},
555
+ )
556
+
557
+ labels_tuple: tuple[str, ...] | None = tuple(labels) if labels else None
558
+ if mode == "multipass_local":
559
+ return _reason_multipass_local(
560
+ decision_kind=decision_kind,
561
+ question=question,
562
+ context=context,
563
+ labels=labels_tuple,
564
+ confidence_floor=confidence_floor,
565
+ )
566
+ if mode == "cached_llm":
567
+ return _reason_cached_llm(
568
+ decision_kind=decision_kind,
569
+ question=question,
570
+ labels=labels_tuple,
571
+ context=context,
572
+ confidence_floor=confidence_floor,
573
+ )
574
+
575
+ return RouterResult(
576
+ ok=False,
577
+ decision_kind=decision_kind,
578
+ route_used="semantic_reasoner",
579
+ degraded=True,
580
+ error=f"unknown reasoner mode: {mode}",
581
+ )
582
+
583
+
584
+ __all__ = ["reason"]
@@ -0,0 +1,462 @@
1
+ """semantic_router — Plan ONEPASS LLM Coverage.
2
+
3
+ Central router for every model-backed semantic decision in NEXO Brain. Call
4
+ sites declare a *decision_kind* and pass question/context; the router
5
+ applies the policy for that kind and dispatches through the stack:
6
+
7
+ fast_local -> semantic_reasoner -> remote_fallback
8
+
9
+ Design contract (from ~/Desktop/NEXO-ONEPASS-LLM-COVERAGE-RELEASE-PLAN.md):
10
+
11
+ - Brain owns the semantic contract, model pins and routing policy.
12
+ - Every call site passes a *named* decision_kind; policy lives here, not in
13
+ the caller. This replaces the previous pattern where each caller invented
14
+ its own policy tree.
15
+ - The existing ``LocalZeroShotClassifier`` stays as the cheap multilingual
16
+ first pass (``fast_local``).
17
+ - ``semantic_reasoner`` is the second, stronger layer. Its implementation
18
+ lives in ``src/semantic_reasoner.py`` with two modes: Mode A (strict
19
+ multi-pass over the same local classifier with tighter thresholds) and
20
+ Mode B (LLM-cached reasoner for code-aware decisions).
21
+ - ``remote_fallback`` is the existing ``call_model_raw`` chain. It is no
22
+ longer the default path for local-friendly decisions; it only fires if
23
+ the upstream layers refuse or degrade.
24
+
25
+ The router returns a ``RouterResult`` dataclass so callers can inspect
26
+ which route was used, whether degraded mode is active, and what confidence
27
+ the decision carries. This is also what Desktop will consume via the
28
+ ``brain-semantic-router.js`` bridge shipped in the companion PR.
29
+ """
30
+ from __future__ import annotations
31
+
32
+ import logging
33
+ from dataclasses import dataclass, field
34
+ from typing import Any
35
+
36
+ _logger = logging.getLogger(__name__)
37
+
38
+
39
+ # ---------------------------------------------------------------------------
40
+ # Contract dataclasses
41
+ # ---------------------------------------------------------------------------
42
+
43
+
44
+ @dataclass
45
+ class RouterResult:
46
+ """Outcome of a ``route()`` call.
47
+
48
+ Fields match the minimum contract documented in the plan (section
49
+ "Minimum router output contract"):
50
+
51
+ - ``ok``: overall success (at least one layer produced a decision)
52
+ - ``decision_kind``: the kind the caller passed
53
+ - ``verdict``: the chosen label when the caller supplied labels, or the
54
+ normalized free-text answer when it did not; None when no layer decided
55
+ - ``label``: alias for ``verdict`` to match the plan's wording; kept
56
+ consistent to simplify Desktop bridge mapping
57
+ - ``confidence``: [0.0, 1.0]
58
+ - ``route_used``: one of ``fast_local``, ``semantic_reasoner``,
59
+ ``remote_fallback``, or ``no_route`` when every layer refused
60
+ - ``degraded``: True when the chosen layer could not meet its normal
61
+ bar (fallback fired, stricter threshold not met, cache-only, etc.)
62
+ - ``error``: short human-readable reason when ``ok`` is False
63
+ - ``meta``: free-form layer-specific evidence (scores dict, cache
64
+ key, latency, model id) — Desktop uses it for telemetry
65
+ """
66
+
67
+ ok: bool
68
+ decision_kind: str
69
+ verdict: str | None = None
70
+ label: str | None = None
71
+ confidence: float = 0.0
72
+ route_used: str = "no_route"
73
+ degraded: bool = False
74
+ error: str | None = None
75
+ meta: dict[str, Any] = field(default_factory=dict)
76
+
77
+
78
+ # ---------------------------------------------------------------------------
79
+ # Decision kinds + policy table
80
+ # ---------------------------------------------------------------------------
81
+ #
82
+ # The plan enumerates 18 decision_kinds that need to route through here. They
83
+ # fall into two families:
84
+ #
85
+ # TEXTUAL — the first-line local classifier is good enough; the
86
+ # reasoner adds a stricter multi-pass check for ambiguous
87
+ # cases. Remote is only a last-resort safety net.
88
+ #
89
+ # CODE_AWARE — the fast local classifier is not designed for code-aware
90
+ # semantics (T4 R15/R23e/R23f/R23h, r20). The reasoner
91
+ # routes those straight to a cached LLM call.
92
+ #
93
+ # Any decision_kind not listed here is rejected by ``route()`` with
95
+ # ``route_used="no_route"`` and ``degraded=True`` so accidental misuse is
96
+ # visible in telemetry instead of silent.
96
+ #
97
+ # Keep this map in lockstep with ``docs/semantic-reasoner-model-notes.md``.
98
+
99
+
100
+ TEXTUAL_KINDS: tuple[str, ...] = (
101
+ "session_end_intent",
102
+ "autonomy_mandate",
103
+ "guard_verbal_ack",
104
+ "r14_correction",
105
+ "r16_declared_done",
106
+ "r17_promise_debt",
107
+ "r34_identity_coherence",
108
+ "followup_operator_attention",
109
+ "drive_signal_type",
110
+ "drive_area",
111
+ "reply_event_type",
112
+ "query_intent",
113
+ "sentiment_intent",
114
+ )
115
+
116
+
117
+ CODE_AWARE_KINDS: tuple[str, ...] = (
118
+ "r20_constant_change",
119
+ "t4_r15",
120
+ "t4_r23e",
121
+ "t4_r23f",
122
+ "t4_r23h",
123
+ )
124
+
125
+
126
+ ALL_DECISION_KINDS: tuple[str, ...] = TEXTUAL_KINDS + CODE_AWARE_KINDS
127
+
128
+
129
+ # Per-kind policy. Explicit, human-readable, no defaults that silently
130
+ # expand coverage. Changing policy = editing this dict + updating the
131
+ # model-notes doc + bumping tests.
132
+ _POLICY: dict[str, dict[str, Any]] = {
133
+ kind: {
134
+ "family": "textual",
135
+ "fast_local_threshold": 0.60,
136
+ "reasoner_mode": "multipass_local",
137
+ "reasoner_threshold": 0.75,
138
+ "allow_remote_fallback": True,
139
+ }
140
+ for kind in TEXTUAL_KINDS
141
+ }
142
+
143
+ _POLICY.update(
144
+ {
145
+ kind: {
146
+ "family": "code_aware",
147
+ "fast_local_threshold": None, # skip fast_local
148
+ "reasoner_mode": "cached_llm",
149
+ "reasoner_threshold": 0.60,
150
+ "allow_remote_fallback": True,
151
+ }
152
+ for kind in CODE_AWARE_KINDS
153
+ }
154
+ )
155
+
156
+
157
+ def policy_for(decision_kind: str) -> dict[str, Any] | None:
158
+ """Return the policy entry for a kind, or None if unknown."""
159
+ return _POLICY.get(decision_kind)
160
+
161
+
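Because `_POLICY` is built from the two family tuples and merged with `dict.update()`, a kind accidentally listed in both families would silently take the code-aware policy. A sketch of a build-time check, reusing the kind names and thresholds from this module; the overlap guard and the `build_policy` name are assumptions, not part of the shipped code.

```python
from typing import Any

TEXTUAL_KINDS = (
    "session_end_intent", "autonomy_mandate", "guard_verbal_ack",
    "r14_correction", "r16_declared_done", "r17_promise_debt",
    "r34_identity_coherence", "followup_operator_attention",
    "drive_signal_type", "drive_area", "reply_event_type",
    "query_intent", "sentiment_intent",
)
CODE_AWARE_KINDS = ("r20_constant_change", "t4_r15", "t4_r23e", "t4_r23f", "t4_r23h")


def build_policy() -> dict[str, dict[str, Any]]:
    overlap = set(TEXTUAL_KINDS) & set(CODE_AWARE_KINDS)
    if overlap:
        # dict.update() below would otherwise let the code-aware entry win.
        raise ValueError(f"kinds listed in both families: {sorted(overlap)}")
    policy = {
        kind: {"family": "textual", "fast_local_threshold": 0.60,
               "reasoner_mode": "multipass_local", "reasoner_threshold": 0.75,
               "allow_remote_fallback": True}
        for kind in TEXTUAL_KINDS
    }
    policy.update({
        kind: {"family": "code_aware", "fast_local_threshold": None,
               "reasoner_mode": "cached_llm", "reasoner_threshold": 0.60,
               "allow_remote_fallback": True}
        for kind in CODE_AWARE_KINDS
    })
    return policy
```

The length check in the test also pins the "18 decision_kinds" figure from the comment above.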
162
+ # ---------------------------------------------------------------------------
163
+ # Layer adapters
164
+ # ---------------------------------------------------------------------------
165
+ #
166
+ # The router does not import the heavy modules at the top of the file so
167
+ # that a caller who only wants ``policy_for`` or ``ALL_DECISION_KINDS`` does
168
+ # not pay the import cost. The adapters below resolve the dependencies
169
+ # lazily and wrap failures as ``None`` so the router can advance to the
170
+ # next layer deterministically.
171
+
172
+
173
+ def _run_fast_local(
174
+ *,
175
+ question: str,
176
+ context: str = "",
177
+ labels: tuple[str, ...],
178
+ confidence_floor: float,
179
+ ) -> RouterResult | None:
180
+ """Try ``LocalZeroShotClassifier``. Return None on unavailable or
181
+ below-threshold so the router advances.
182
+
183
+ The first layer must classify the actual user/assistant payload. For
184
+ guard decisions the ``question`` is usually a stable prompt template and
185
+ the live text lives in ``context``; feeding both into a zero-shot NLI
186
+ classifier makes the static prompt dominate the decision. Use context
187
+ when present, and fall back to question for simple direct callers.
188
+ """
189
+ try:
190
+ from classifier_local import LocalZeroShotClassifier
191
+ except Exception as exc: # pragma: no cover — install not ready
192
+ _logger.debug("semantic_router: classifier_local unavailable (%s)", exc)
193
+ return None
194
+
195
+ clf = LocalZeroShotClassifier(confidence_floor=confidence_floor)
196
+ classifier_input = (context or "").strip() or question
197
+ result = clf.classify(classifier_input, labels)
198
+ if result is None:
199
+ return None
200
+ if result.confidence < confidence_floor:
201
+ return None
202
+
203
+ return RouterResult(
204
+ ok=True,
205
+ decision_kind="", # filled by caller
206
+ verdict=result.label,
207
+ label=result.label,
208
+ confidence=float(result.confidence),
209
+ route_used="fast_local",
210
+ degraded=False,
211
+ meta={
212
+ "scores": dict(result.scores),
213
+ "latency_ms": float(result.latency_ms),
214
+ "threshold": confidence_floor,
215
+ },
216
+ )
217
+
218
+
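The input-selection rule in `_run_fast_local` (classify `context` when present, otherwise `question`) matters because a zero-shot classifier only sees the text it is handed. This sketch swaps in a deliberately crude keyword matcher (hypothetical `KeywordClassifier`, not `LocalZeroShotClassifier`) to show the effect: a question template that names every label scores them all equally, while the live payload separates them.

```python
from dataclasses import dataclass, field
from typing import Optional, Sequence


@dataclass
class Classification:
    label: str
    confidence: float
    scores: dict = field(default_factory=dict)


class KeywordClassifier:
    """Hypothetical stand-in: a label scores 1.0 if it appears in the text."""

    def classify(self, text: str, labels: Sequence[str]) -> Optional[Classification]:
        scores = {label: (1.0 if label in text else 0.0) for label in labels}
        best = max(scores, key=scores.get)  # ties go to the first label
        return Classification(best, scores[best], scores)


def run_fast_local(clf, *, question: str, context: str,
                   labels: Sequence[str], floor: float) -> Optional[Classification]:
    # Prefer the live payload; the question is often a static template.
    text = (context or "").strip() or question
    result = clf.classify(text, labels)
    if result is None or result.confidence < floor:
        return None  # below threshold: let the router try the next layer
    return result
```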
219
+ def _run_semantic_reasoner(
220
+ *,
221
+ decision_kind: str,
222
+ question: str,
223
+ labels: tuple[str, ...] | None,
224
+ context: str,
225
+ mode: str,
226
+ confidence_floor: float,
227
+ ) -> RouterResult | None:
228
+ """Delegate to ``src/semantic_reasoner.py``. Return None on unavailable
229
+ so the router advances to remote_fallback."""
230
+ try:
231
+ from semantic_reasoner import reason
232
+ except Exception as exc: # pragma: no cover
233
+ _logger.debug("semantic_router: semantic_reasoner unavailable (%s)", exc)
234
+ return None
235
+
236
+ try:
237
+ return reason(
238
+ decision_kind=decision_kind,
239
+ question=question,
240
+ labels=labels,
241
+ context=context,
242
+ mode=mode,
243
+ confidence_floor=confidence_floor,
244
+ )
245
+ except Exception as exc: # noqa: BLE001 — fail-closed, degrade to remote
246
+ _logger.warning("semantic_reasoner.reason raised: %s", exc)
247
+ return None
248
+
249
+
250
+ def _run_remote_fallback(
251
+ *,
252
+ decision_kind: str,
253
+ question: str,
254
+ labels: tuple[str, ...] | None,
255
+ context: str,
256
+ ) -> RouterResult | None:
257
+ """Last-resort LLM call via ``call_model_raw``. The router marks the
258
+ result as ``degraded=True`` so telemetry shows when the stack fell
259
+ through."""
260
+ try:
261
+ import call_model_raw as _cmr
262
+ except Exception as exc: # pragma: no cover
263
+ _logger.debug("semantic_router: call_model_raw unavailable (%s)", exc)
264
+ return None
265
+
266
+ # Resolve symbols defensively. Tests sometimes stub only ``call_model_raw``
267
+ # and forget ``ClassifierUnavailableError`` (or vice versa); without this
268
+ # guard a missing attribute later raises AttributeError at ``except`` time and
269
+ # crashes the router instead of degrading.
270
+ call_model_raw_fn = getattr(_cmr, "call_model_raw", None)
271
+ classifier_unavailable_cls = getattr(
272
+ _cmr, "ClassifierUnavailableError", Exception
273
+ )
274
+ if call_model_raw_fn is None:
275
+ return RouterResult(
276
+ ok=False,
277
+ decision_kind=decision_kind,
278
+ route_used="remote_fallback",
279
+ degraded=True,
280
+ error="call_model_raw callable missing",
281
+ )
282
+
283
+ prompt = _build_remote_prompt(
284
+ decision_kind=decision_kind,
285
+ question=question,
286
+ labels=labels,
287
+ context=context,
288
+ )
289
+ system = (
290
+ "You are NEXO's remote semantic fallback. Answer with the single "
291
+ "best label from the provided list, or with 'unknown' if none fit. "
292
+ "No prose, no explanation."
293
+ )
294
+
295
+ try:
296
+ raw = call_model_raw_fn(
297
+ prompt,
298
+ system=system,
299
+ caller="semantic_reasoner",
300
+ tier="muy_bajo",
301
+ max_tokens=32,
302
+ temperature=0.0,
303
+ )
304
+ except classifier_unavailable_cls as exc:
305
+ return RouterResult(
306
+ ok=False,
307
+ decision_kind=decision_kind,
308
+ route_used="remote_fallback",
309
+ degraded=True,
310
+ error=f"remote_unavailable: {exc}",
311
+ )
312
+ except Exception as exc: # noqa: BLE001 — fail-closed, never re-raise
313
+ return RouterResult(
314
+ ok=False,
315
+ decision_kind=decision_kind,
316
+ route_used="remote_fallback",
317
+ degraded=True,
318
+ error=f"remote_error: {exc}",
319
+ )
320
+
321
+ verdict = _normalize_remote_answer(raw, labels)
322
+ raw_preview = (raw or "")[:120]
323
+ return RouterResult(
324
+ ok=verdict is not None,
325
+ decision_kind=decision_kind,
326
+ verdict=verdict,
327
+ label=verdict,
328
+ confidence=0.55 if verdict is not None else 0.0,
329
+ route_used="remote_fallback",
330
+ degraded=True, # always degraded relative to the local-first ideal
331
+ meta={"raw_response": raw_preview},
332
+ )
333
+
334
+
335
+ def _build_remote_prompt(
336
+ *,
337
+ decision_kind: str,
338
+ question: str,
339
+ labels: tuple[str, ...] | None,
340
+ context: str,
341
+ ) -> str:
342
+ parts = [
343
+ f"Decision kind: {decision_kind}",
344
+ f"Question: {question}",
345
+ ]
346
+ if context:
347
+ parts.append(f"Context: {context[:400]}")
348
+ if labels:
349
+ parts.append("Candidate labels: " + ", ".join(labels))
350
+ parts.append("Reply with exactly one of the labels above.")
351
+ else:
352
+ parts.append("Reply with the shortest phrase that answers the question.")
353
+ return "\n".join(parts)
354
+
355
+
356
+ def _normalize_remote_answer(
357
+ raw: str, labels: tuple[str, ...] | None
358
+ ) -> str | None:
359
+ text = (raw or "").strip().lower()
360
+ if not text or text == "unknown":
361
+ return None
362
+ if labels:
363
+ for label in labels:
364
+ if label.lower() == text:
365
+ return label
366
+ for label in labels:
367
+ if label.lower() in text:
368
+ return label
369
+ return None
370
+ return text
371
+
372
+
373
+ # ---------------------------------------------------------------------------
374
+ # Public entrypoint
375
+ # ---------------------------------------------------------------------------
376
+
377
+
378
+ def route(
379
+ *,
380
+ decision_kind: str,
381
+ question: str,
382
+ context: str = "",
383
+ labels: tuple[str, ...] | list[str] | None = None,
384
+ allow_remote_fallback: bool = True,
385
+ ) -> RouterResult:
386
+ """Route a semantic decision through the stack.
387
+
388
+ The caller names the *kind* of decision. The router looks up the policy,
389
+ dispatches through fast_local -> semantic_reasoner -> remote_fallback,
390
+ and returns the first layer that produced a decision above its
391
+ threshold.
392
+
393
+ ``allow_remote_fallback=False`` skips the remote_fallback layer; the router
394
+ will return ``ok=False, route_used='no_route'`` if every earlier layer
395
+ refused. Useful in pytest; code-aware kinds still reach the model via ``cached_llm``.
396
+ """
397
+ policy = policy_for(decision_kind)
398
+ if policy is None:
399
+ return RouterResult(
400
+ ok=False,
401
+ decision_kind=decision_kind,
402
+ route_used="no_route",
403
+ degraded=True,
404
+ error=f"unknown decision_kind: {decision_kind}",
405
+ )
406
+
407
+ labels_tuple: tuple[str, ...] | None = (
408
+ tuple(labels) if labels else None
409
+ )
410
+
411
+ # Step 1 — fast_local for textual families only.
412
+ if policy["fast_local_threshold"] is not None and labels_tuple:
413
+ fast = _run_fast_local(
414
+ question=question,
415
+ context=context,
416
+ labels=labels_tuple,
417
+ confidence_floor=float(policy["fast_local_threshold"]),
418
+ )
419
+ if fast is not None:
420
+ fast.decision_kind = decision_kind
421
+ return fast
422
+
423
+ # Step 2 — semantic_reasoner (Mode A or B depending on policy).
424
+ reasoned = _run_semantic_reasoner(
425
+ decision_kind=decision_kind,
426
+ question=question,
427
+ labels=labels_tuple,
428
+ context=context,
429
+ mode=str(policy["reasoner_mode"]),
430
+ confidence_floor=float(policy["reasoner_threshold"]),
431
+ )
432
+ if reasoned is not None and reasoned.ok:
433
+ return reasoned
434
+
435
+ # Step 3 — remote_fallback if allowed.
436
+ if allow_remote_fallback and policy.get("allow_remote_fallback", True):
437
+ remote = _run_remote_fallback(
438
+ decision_kind=decision_kind,
439
+ question=question,
440
+ labels=labels_tuple,
441
+ context=context,
442
+ )
443
+ if remote is not None:
444
+ return remote
445
+
446
+ return RouterResult(
447
+ ok=False,
448
+ decision_kind=decision_kind,
449
+ route_used="no_route",
450
+ degraded=True,
451
+ error="every layer refused or was unavailable",
452
+ )
453
+
454
+
455
+ __all__ = [
456
+ "ALL_DECISION_KINDS",
457
+ "CODE_AWARE_KINDS",
458
+ "RouterResult",
459
+ "TEXTUAL_KINDS",
460
+ "policy_for",
461
+ "route",
462
+ ]
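The fall-through order in `route()` can be reduced to three callables. In this sketch (hypothetical `Result`, `route_through`, and `is_session_end`, with each layer stubbed as a zero-argument function), fast_local answers only above threshold, the reasoner must report `ok=True`, and remote_fallback's result is returned even when it failed so the error reaches the caller. `is_session_end` mirrors how `detect_session_end_intent` turns a result into a boolean.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Result:
    ok: bool
    route_used: str = "no_route"
    label: Optional[str] = None
    degraded: bool = False


Layer = Callable[[], Optional[Result]]


def route_through(fast: Optional[Layer], reasoner: Layer, remote: Layer,
                  *, allow_remote: bool = True) -> Result:
    # Step 1: fast=None models a policy that skips fast_local (code-aware kinds).
    if fast is not None:
        answer = fast()
        if answer is not None:
            return answer
    # Step 2: a reasoner refusal (None or ok=False) falls through.
    answer = reasoner()
    if answer is not None and answer.ok:
        return answer
    # Step 3: the remote answer is returned as-is, even when ok=False.
    if allow_remote:
        answer = remote()
        if answer is not None:
            return answer
    return Result(ok=False, route_used="no_route", degraded=True)


def is_session_end(result: Result) -> bool:
    # Only an ok verdict of "session_end" counts, as in detect_session_end_intent.
    return bool(result.ok and result.label == "session_end")
```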
@@ -8,6 +8,7 @@ from __future__ import annotations
8
8
  from core_prompts import render_core_prompt
9
9
 
10
10
  CLASSIFIER_QUESTION = render_core_prompt("session-end-intent-question")
11
+ SEMANTIC_LABELS = ("session_end", "continue_session")
11
12
 
12
13
 
13
14
  def detect_session_end_intent(user_text: str, *, classifier=None) -> bool:
@@ -16,7 +17,17 @@ def detect_session_end_intent(user_text: str, *, classifier=None) -> bool:
16
17
  return False
17
18
  if classifier is None:
18
19
  try:
19
- from enforcement_classifier import classify as classifier # type: ignore
20
+ from semantic_router import route as semantic_route
21
+ except Exception:
22
+ return False
23
+ try:
24
+ result = semantic_route(
25
+ decision_kind="session_end_intent",
26
+ question=CLASSIFIER_QUESTION,
27
+ context=text,
28
+ labels=SEMANTIC_LABELS,
29
+ )
30
+ return bool(result.ok and (result.label or result.verdict) == "session_end")
20
31
  except Exception:
21
32
  return False
22
33
  try:
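The `inject_prompt` change in the next hunk embeds a long ToolSearch query: a `select:` prefix followed by comma-separated tool names of the form `mcp__nexo__<tool>`. If that list ever needs to be regenerated, a helper along these lines keeps it in one place; `toolsearch_select_query`, `inject_prompt_fragment`, and `NEXO_PRELOAD_TOOLS` are hypothetical names, and the tool list is copied from the prompt in this release.

```python
import json

# Tool names copied from the inject_prompt in this release, in the same order.
NEXO_PRELOAD_TOOLS = (
    "nexo_startup", "nexo_heartbeat", "nexo_session_diary_read", "nexo_reminders",
    "nexo_smart_startup", "nexo_task_open", "nexo_task_close",
    "nexo_task_acknowledge_guard", "nexo_guard_check", "nexo_learning_add",
    "nexo_confidence_check", "nexo_followup_create", "nexo_protocol_debt_resolve",
)


def toolsearch_select_query(tools, server: str = "nexo") -> str:
    """Build a 'select:' query of fully qualified MCP tool names."""
    return "select:" + ",".join(f"mcp__{server}__{name}" for name in tools)


def inject_prompt_fragment(tools) -> str:
    # json.dumps escapes the inner quotes, matching the \"...\" in the hook JSON.
    return json.dumps(f'first call ToolSearch with query "{toolsearch_select_query(tools)}"')
```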
@@ -3214,7 +3214,7 @@
3214
3214
  "threshold": 1
3215
3215
  }
3216
3216
  ],
3217
- "inject_prompt": "You must start by calling nexo_startup to register this session. Execute it now with a brief task description. Do not produce visible text.",
3217
+ "inject_prompt": "You must start by calling nexo_startup to register this session. If mcp__nexo__* tools appear as deferred in the tool list (names visible but JSONSchemas not loaded), first call ToolSearch with query \"select:mcp__nexo__nexo_startup,mcp__nexo__nexo_heartbeat,mcp__nexo__nexo_session_diary_read,mcp__nexo__nexo_reminders,mcp__nexo__nexo_smart_startup,mcp__nexo__nexo_task_open,mcp__nexo__nexo_task_close,mcp__nexo__nexo_task_acknowledge_guard,mcp__nexo__nexo_guard_check,mcp__nexo__nexo_learning_add,mcp__nexo__nexo_confidence_check,mcp__nexo__nexo_followup_create,mcp__nexo__nexo_protocol_debt_resolve\" to load the schemas — deferred is not absent. If more nexo_* tools appear deferred later in the session, preload them the same way instead of giving up on them. Then execute nexo_startup with a brief task description. Do not produce visible text.",
3218
3218
  "triggers_after": [
3219
3219
  "nexo_smart_startup",
3220
3220
  "nexo_session_diary_read",