npm - nexo-brain - Versions diffs - 7.8.2 → 7.9.1 - Mend

nexo-brain 7.8.2 → 7.9.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (13)

package/.claude-plugin/plugin.json +1 -1
package/README.md +5 -1
package/bin/nexo-brain.js +12 -0
package/package.json +1 -1
package/src/autonomy_mandate.py +14 -1
package/src/guard_verbal_ack.py +13 -1
package/src/r14_correction_learning.py +17 -6
package/src/r16_declared_done.py +15 -4
package/src/r17_promise_debt.py +15 -4
package/src/semantic_reasoner.py +584 -0
package/src/semantic_router.py +462 -0
package/src/session_end_intent.py +12 -1
package/tool-enforcement-map.json +1 -1

package/.claude-plugin/plugin.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "nexo-brain",
-  "version": "7.8.2",
+  "version": "7.9.1",
   "description": "Local cognitive runtime for Claude Code \u2014 persistent memory, overnight learning, doctor diagnostics, personal scripts, recovery-aware jobs, startup preflight, and optional dashboard/power helper.",
   "author": {
     "name": "NEXO Brain",

package/README.md CHANGED

@@ -18,7 +18,11 @@
 [Watch the overview video](https://nexo-brain.com/watch/) · [Watch on YouTube](https://www.youtube.com/watch?v=i2lkGhKyVqI) · [Open the infographic](https://nexo-brain.com/assets/nexo-brain-infographic-v5.png)
-Version `7.8.2` is the current packaged-runtime line. Patch release that fixes the compact-hook observability gap Francisco flagged after v7.8.1: `hook_runs.session_id` was empty for 7 out of 8 recent compaction rows (and when populated it stored the raw Claude Code token instead of the NEXO sid), so per-session queries over `hook_runs` for compact events could not be joined back to the NEXO session that actually compacted. v7.8.2 adds `src/hooks/compact_session_resolver.py` with `resolve_nexo_sid(claude_session_id)`, which walks the same rails the shell already uses: `sessions.claude_session_id` match, then `session_claude_aliases.claude_session_id` (most recent `last_seen` wins), then the per-conversation sidecar under `runtime/data/compacting/<safe-claude-id>.txt`, then the legacy global sidecar for single-conversation setups. `src/hooks/pre_compact.py` and `src/hooks/post_compact.py` now call the resolver and store the real NEXO sid in `hook_runs.session_id`; both wrappers also stash `{claude_session_id, sid_source}` in `hook_runs.metadata` so "why is this row still empty?" has a one-query answer. Nine new tests in `tests/test_hook_runs_compact_sid_resolution.py` pin the five resolver rails (sessions / alias / sidecar / legacy / none), malformed-sidecar rejection, the pre- and post-compact wrapper end-to-end paths, and the empty-state wrapper rail so a clean audit trail is written even when nothing resolves. No Desktop bump.
+Version `7.9.1` is the current packaged-runtime line. Patch release that starts the semantic-router site migration promised after v7.9.0: six safe textual-conversational callers now route through `semantic_router.route(...)` instead of importing `enforcement_classifier.classify` directly (`session_end_intent`, `r14_correction_learning`, `r16_declared_done`, `r17_promise_debt`, `autonomy_mandate`, `guard_verbal_ack`). The patch also fixes the semantic stack's local layers to classify the live `context` text rather than letting static prompt templates dominate zero-shot decisions, and migrates the six callers to semantic labels (`session_end`/`continue_session`, `negative_feedback`/`ordinary_request`, etc.) instead of generic `yes`/`no`. Existing fail-closed behaviour and test injection seams are preserved. Targeted verification: 105 tests passing across router, reasoner, migrated call sites, and their enforcement integrations. Remaining textual/code-aware callers stay tracked under `NF-SEMANTIC-ROUTER-SITE-MIGRATION` for later focused patches. No Desktop bump.
+Previously in `7.9.0`: minor release that ships the foundation of the semantic stack (router + reasoner + CLI) under the ONEPASS LLM Coverage plan, plus two product-bug fixes observed in the wild on 2026-04-23. New `src/semantic_router.py` exposes 18 named `decision_kinds` (13 textual + 5 code-aware) with a per-kind policy table and the layer chain `fast_local → semantic_reasoner → remote_fallback`. New `src/semantic_reasoner.py` adds Mode A (`multipass_local`: reuses the mDeBERTa pin with three prompt-perturbed passes + majority vote + 0.75 floor) and Mode B (`cached_llm`: wrapper over `call_model_raw` with a pid+uuid atomic-write 24h-TTL disk cache at `~/.nexo/runtime/operations/semantic-reasoner-cache.json`, SHA-256 keyed by `decision_kind` + normalized input, LRU-bounded at 2000 entries, corrupt entries dropped on read). New `scripts/semantic-classify.py` JSON-in JSON-out CLI lets external MCP clients (including the closed-source NEXO Desktop companion) query Brain as the single semantic authority. New `NEXO_SEMANTIC_REASONER` kill switch (`0`/`off`/`false`/`no`/`disable`/`disabled`) honours the plan mandate for a runtime opt-out separate from `NEXO_LOCAL_CLASSIFIER`. Bug fixes: `bin/nexo-brain.js` upgrade flow now copies `templates/` root the same way fresh install and same-version refresh already did (Maria iMac 7.1.10→7.8.1 upgrade had lost 27 core-prompts templates and broken post-update import verification); and `tool-enforcement-map.json` `nexo_startup.enforcement.inject_prompt` now instructs the model to preload the 13 `mcp__nexo__*` protocol tools via `ToolSearch` before calling `nexo_startup` when the host MCP client defers tool schemas (Claude Code with many MCPs installed). Audit-driven hardening: router/reasoner defensively use `getattr` over the `call_model_raw` module and add a trailing `except Exception` so provider errors degrade with `remote_error` instead of propagating; cache writes use pid+uuid tmp + `fsync` + `os.replace` to survive concurrent writers; `NEXO_SEMANTIC_REASONER_TTL` parse tolerates malformed values. Tests: +50 (22 router, 20 reasoner, 8 CLI). Per-site migration of existing callers (`session_end_intent`, `r14`, `r16`, `r17`, `r20`, `r34`, T4 gates, `tools_drive`, `nexo-followup-runner`) is explicitly deferred to follow-up patch releases and tracked as `NF-SEMANTIC-ROUTER-SITE-MIGRATION`; nothing in this release changes the behaviour of the existing callers. Companion coordinated release: NEXO Desktop v0.28.0.
+Previously in `7.8.2`: patch release that fixes the compact-hook observability gap Francisco flagged after v7.8.1: `hook_runs.session_id` was empty for 7 out of 8 recent compaction rows (and when populated it stored the raw Claude Code token instead of the NEXO sid), so per-session queries over `hook_runs` for compact events could not be joined back to the NEXO session that actually compacted. v7.8.2 adds `src/hooks/compact_session_resolver.py` with `resolve_nexo_sid(claude_session_id)`, which walks the same rails the shell already uses: `sessions.claude_session_id` match, then `session_claude_aliases.claude_session_id` (most recent `last_seen` wins), then the per-conversation sidecar under `runtime/data/compacting/<safe-claude-id>.txt`, then the legacy global sidecar for single-conversation setups. `src/hooks/pre_compact.py` and `src/hooks/post_compact.py` now call the resolver and store the real NEXO sid in `hook_runs.session_id`; both wrappers also stash `{claude_session_id, sid_source}` in `hook_runs.metadata` so "why is this row still empty?" has a one-query answer. Nine new tests in `tests/test_hook_runs_compact_sid_resolution.py` pin the five resolver rails (sessions / alias / sidecar / legacy / none), malformed-sidecar rejection, the pre- and post-compact wrapper end-to-end paths, and the empty-state wrapper rail so a clean audit trail is written even when nothing resolves. No Desktop bump.
 Previously in `7.8.1`: patch release that closed the last compaction-continuity gap Francisco flagged after v7.8.0: `pre-compact.sh` Layer 2 emergency auto-diary and Layer 3 `compaction_memory.record_auto_flush` now use the exact `TARGET_SID` resolved from `CLAUDE_SESSION_ID` instead of falling back to `ORDER BY last_update_epoch DESC LIMIT 1` ("latest active session"). In multi-conversation Desktop that fallback routinely wrote the emergency diary against the wrong conversation even though the main restore path was already exact-SID in v7.8.0. `last_diary_ts` is also scoped by `session_id` now. Fail-closed when no `CLAUDE_SESSION_ID` resolves. New behavioural tests drive the real shell script with two sessions in the DB to pin the invariant. Fixed a latent bash-escape bug in `pre-compact.sh` where a double-quoted string inside a Python comment silently closed the `python3 -c "..."` argument early — caught by adding the behavioural tests. Pytest 2092 passing (+2 new behavioural). No Desktop bump.
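
The Mode A rule summarised in the 7.9.0 notes (three passes, majority vote, 0.75 floor) can be illustrated with a minimal sketch. This is not the package's code: `majority_with_floor` is a hypothetical name and the vote tuples are made-up inputs. It assumes the accepted confidence is the mean over the passes that agreed, which is how `_aggregate_votes` in the added `semantic_reasoner.py` computes it.

```python
from collections import Counter


def majority_with_floor(votes, floor=0.75, quorum=2):
    """Return the winning label, or None when quorum or floor is missed.

    `votes` is a list of (label, confidence) pairs, one per pass.
    An empty label marks a pass that failed. Hypothetical helper,
    written only to illustrate the rule described in the README.
    """
    valid = [(label, conf) for label, conf in votes if label]
    if not valid:
        return None  # every pass failed

    counts = Counter(label for label, _ in valid)
    best, n_votes = counts.most_common(1)[0]
    if n_votes < quorum:
        return None  # no two passes agree

    agreed = [conf for label, conf in valid if label == best]
    avg = sum(agreed) / len(agreed)
    return best if avg >= floor else None


# Two of three passes agree and their mean confidence clears the floor.
accepted = majority_with_floor(
    [("session_end", 0.9), ("session_end", 0.8), ("continue_session", 0.6)]
)

# Two passes agree, but their mean confidence (0.7) is under 0.75.
rejected = majority_with_floor(
    [("session_end", 0.8), ("session_end", 0.6), ("continue_session", 0.9)]
)
```

The second call shows why the floor matters: a single confident dissenting pass does not win, and a lukewarm majority is rejected rather than accepted.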

package/bin/nexo-brain.js CHANGED

@@ -2127,6 +2127,18 @@ async function main() {
         writeRuntimeCoreArtifactsManifest(NEXO_HOME, srcDir);
         log("  Scripts updated.");
+        // Update templates/ root (core-prompts/, CLAUDE.md.template, etc.) — recursive
+        // Managed surface: copyDirRec overwrites without diffing, so any
+        // hand-edited template under ~/.nexo/templates/ is replaced on
+        // upgrade. Keep local forks under personal/ or outside the runtime
+        // home to avoid silent loss.
+        const migTemplatesSrc = path.join(__dirname, "..", "templates");
+        const migTemplatesDest = path.join(NEXO_HOME, "templates");
+        if (fs.existsSync(migTemplatesSrc)) {
+          copyDirRec(migTemplatesSrc, migTemplatesDest);
+          log("  Templates updated (user-edited templates/ files are overwritten).");
+        }
         // Register ALL 8 core hooks in settings.json (additive — don't remove user's custom hooks)
         let settings = {};
         if (fs.existsSync(CLAUDE_SETTINGS)) {
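
The comment added in this hunk warns that the template copy overwrites without diffing. The effect can be reproduced with a Python analogue. This is a sketch under assumptions, not a port of `copyDirRec`: `upgrade_templates` and `demo` are hypothetical names, and `shutil.copytree(..., dirs_exist_ok=True)` stands in for the JavaScript helper, whose exact behaviour is not shown in this diff.

```python
import shutil
import tempfile
from pathlib import Path


def upgrade_templates(src: Path, dest: Path) -> None:
    """Copy src over dest, replacing files that share a name.

    Python stand-in for the upgrade step; no diffing, no backup.
    """
    if src.exists():
        shutil.copytree(src, dest, dirs_exist_ok=True)


def demo() -> str:
    """Return the template content that survives a simulated upgrade."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        src = root / "package" / "templates"
        dest = root / "home" / "templates"
        src.mkdir(parents=True)
        dest.mkdir(parents=True)

        # The shipped template and a hand-edited copy with the same name.
        (src / "CLAUDE.md.template").write_text("shipped")
        (dest / "CLAUDE.md.template").write_text("hand-edited")

        upgrade_templates(src, dest)
        return (dest / "CLAUDE.md.template").read_text()
```

In this analogue `demo()` returns the shipped text, so the hand edit is gone after the copy, which is the loss the in-code comment tells users to avoid by keeping forks under `personal/` or outside the runtime home.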

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "nexo-brain",
-  "version": "7.8.2",
+  "version": "7.9.1",
   "mcpName": "io.github.wazionapps/nexo",
   "description": "NEXO Brain \u2014 Shared brain for AI agents. Persistent memory, semantic RAG, natural forgetting, metacognitive guard, trust scoring, 150+ MCP tools. Works with Claude Code, Codex, Claude Desktop & any MCP client. 100% local, free.",
   "homepage": "https://nexo-brain.com",

package/src/autonomy_mandate.py CHANGED

@@ -39,6 +39,7 @@ from core_prompts import render_core_prompt
 NEXO_HOME = Path(os.environ.get("NEXO_HOME", str(Path.home() / ".nexo")))
 STATE_PATH = NEXO_HOME / "runtime" / "data" / "autonomy_mandate.json"
 CLASSIFIER_QUESTION = render_core_prompt("autonomy-mandate-question")
+SEMANTIC_LABELS = ("autonomy_mandate", "not_mandate")
 # Marker list per NF-DS-45569A27. Case-insensitive substring match.
 MARKERS = (
@@ -119,9 +120,21 @@ def _detect_marker(text: str, *, classifier=None) -> Optional[str]:
             return marker
     if classifier is None:
         try:
-            from enforcement_classifier import classify as classifier  # type: ignore
+            from semantic_router import route as semantic_route
         except Exception:
             return None
+        try:
+            result = semantic_route(
+                decision_kind="autonomy_mandate",
+                question=CLASSIFIER_QUESTION,
+                context=text.strip()[:1200],
+                labels=SEMANTIC_LABELS,
+            )
+            if bool(result.ok and (result.label or result.verdict) == "autonomy_mandate"):
+                return _SEMANTIC_MARKER
+        except Exception:
+            return None
+        return None
     try:
         if bool(classifier(question=CLASSIFIER_QUESTION, context=text.strip()[:1200])):
             return _SEMANTIC_MARKER
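
The acceptance check added in this hunk, `bool(result.ok and (result.label or result.verdict) == "autonomy_mandate")`, depends on Python operator precedence: `==` binds tighter than `and`, and the parentheses make `label` take priority over `verdict`. A minimal sketch of how it evaluates follows. `StubResult` and `is_positive` are hypothetical; the stub only mirrors the three attributes the hunk reads (`ok`, `label`, `verdict`), since the router's real result type is not shown here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StubResult:
    """Hypothetical stand-in exposing only the fields the hunk reads."""
    ok: bool
    label: Optional[str] = None
    verdict: Optional[str] = None


def is_positive(result: StubResult, positive_label: str) -> bool:
    # Evaluates as: ok and ((label or verdict) == positive_label)
    return bool(result.ok and (result.label or result.verdict) == positive_label)


# label present and matching
case_label = is_positive(StubResult(True, label="autonomy_mandate"), "autonomy_mandate")

# label empty, so verdict is used as the fallback
case_fallback = is_positive(StubResult(True, verdict="autonomy_mandate"), "autonomy_mandate")

# a non-empty label wins even when verdict would have matched
case_label_wins = is_positive(
    StubResult(True, label="not_mandate", verdict="autonomy_mandate"), "autonomy_mandate"
)

# ok=False short-circuits regardless of label
case_not_ok = is_positive(StubResult(False, label="autonomy_mandate"), "autonomy_mandate")
```

Only the first two cases evaluate to `True`. The third shows that `verdict` is consulted only when `label` is empty or `None`.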

package/src/guard_verbal_ack.py CHANGED

@@ -10,6 +10,7 @@ from core_prompts import render_core_prompt
 CLASSIFIER_QUESTION = render_core_prompt("guard-verbal-ack-question")
+SEMANTIC_LABELS = ("explicit_ack", "not_ack")
 def _build_context(
@@ -44,7 +45,7 @@ def detect_guard_verbal_ack(
         return False
     if classifier is None:
         try:
-            from enforcement_classifier import classify as classifier  # type: ignore
+            from semantic_router import route as semantic_route
         except Exception:
             return False
     context = _build_context(
@@ -54,6 +55,17 @@ def detect_guard_verbal_ack(
         file_path=file_path,
         guard_summary=guard_summary,
     )
+    if classifier is None:
+        try:
+            result = semantic_route(
+                decision_kind="guard_verbal_ack",
+                question=CLASSIFIER_QUESTION,
+                context=context,
+                labels=SEMANTIC_LABELS,
+            )
+            return bool(result.ok and (result.label or result.verdict) == "explicit_ack")
+        except Exception:
+            return False
     try:
         return bool(classifier(question=CLASSIFIER_QUESTION, context=context))
     except Exception:

package/src/r14_correction_learning.py CHANGED

@@ -9,10 +9,9 @@ Fase 2 Protocol Enforcer Fase C (Capa 2) item R14. Plan doc 1 reads:
 Implementation contract:
-  - Correction detection goes through the enforcement_classifier
-    (triple-reinforced yes/no on call_model_raw). Learning #122
-    prohibits keyword-based semantic detection; the classifier path
-    is the sanctioned alternative.
+  - Correction detection goes through semantic_router decision_kind
+    ``r14_correction``. Learning #122 prohibits keyword-based semantic
+    detection; the router path is the sanctioned alternative.
   - Fail-closed: when the classifier is unavailable (no API key,
     automation_backend=none, timeout, 5xx), is_correction returns
     False. Downstream R28 (system prompt) and the auto_capture hook
@@ -31,6 +30,8 @@ from __future__ import annotations
 from core_prompts import render_core_prompt
 CLASSIFIER_QUESTION = render_core_prompt("r14-correction-learning-question")
+SEMANTIC_LABELS = ("negative_feedback", "ordinary_request")
+POSITIVE_LABEL = "negative_feedback"
 INJECTION_PROMPT_TEMPLATE = render_core_prompt("r14-correction-learning-injection")
@@ -45,7 +46,7 @@ def detect_correction(user_text: str, *, classifier=None) -> bool:
     Args:
         user_text: Raw user-role text from the stream.
         classifier: Injection point for tests. Defaults to
-            enforcement_classifier.classify.
+            semantic_router.route(decision_kind="r14_correction").
     Fail-closed on ClassifierUnavailableError — returns False rather
     than raising so the caller's enforcement loop never crashes on a
@@ -62,7 +63,17 @@ def detect_correction(user_text: str, *, classifier=None) -> bool:
         return False
     if classifier is None:
         try:
-            from enforcement_classifier import classify as classifier  # type: ignore
+            from semantic_router import route as semantic_route
+        except Exception:
+            return False
+        try:
+            result = semantic_route(
+                decision_kind="r14_correction",
+                question=CLASSIFIER_QUESTION,
+                context=text,
+                labels=SEMANTIC_LABELS,
+            )
+            return bool(result.ok and (result.label or result.verdict) == POSITIVE_LABEL)
         except Exception:
             return False
     try:

package/src/r16_declared_done.py CHANGED

@@ -10,9 +10,9 @@ Exposes detect_declared_done(assistant_text, classifier=None) → bool and
 the reminder prompt template. The window-and-state tracking lives in
 the HeadlessEnforcer / Desktop EnforcementEngine, not here.
-Classifier contract: same triple-reinforced yes/no path as R14
-(enforcement_classifier.classify → call_model_raw). Fail-closed on
-unavailable backend → detect returns False rather than raising.
+Classifier contract: same semantic_router yes/no path as R14
+(``decision_kind=r16_declared_done``). Fail-closed on unavailable backend →
+detect returns False rather than raising.
 Mirror: nexo-desktop/lib/r16-declared-done.js (pending, landing in the
 next tranche alongside the JS classifier infrastructure).
@@ -22,6 +22,7 @@ from __future__ import annotations
 from core_prompts import render_core_prompt
 CLASSIFIER_QUESTION = render_core_prompt("r16-declared-done-question")
+SEMANTIC_LABELS = ("declared_done", "not_done")
 INJECTION_PROMPT_TEMPLATE = render_core_prompt("r16-declared-done-injection")
@@ -43,7 +44,17 @@ def detect_declared_done(assistant_text: str, *, classifier=None) -> bool:
         return False
     if classifier is None:
         try:
-            from enforcement_classifier import classify as classifier  # type: ignore
+            from semantic_router import route as semantic_route
+        except Exception:
+            return False
+        try:
+            result = semantic_route(
+                decision_kind="r16_declared_done",
+                question=CLASSIFIER_QUESTION,
+                context=text,
+                labels=SEMANTIC_LABELS,
+            )
+            return bool(result.ok and (result.label or result.verdict) == "declared_done")
         except Exception:
             return False
     try:

package/src/r17_promise_debt.py CHANGED

@@ -9,9 +9,9 @@ Fase 2 Protocol Enforcer Fase D item R17. Plan doc 1 reads:
 Exposes detect_promise(text, classifier) → bool. State (promise window
 countdown) lives in the caller — mirrors the R14 / R16 pattern.
-Classifier path is the same as R14 / R16: enforcement_classifier.classify
-routes through call_model_raw with triple reinforcement. Fail-closed on
-any unavailable backend (no promise flagged rather than a false positive).
+Classifier path is the same as R14 / R16:
+semantic_router decision_kind ``r17_promise_debt``. Fail-closed on any
+unavailable backend (no promise flagged rather than a false positive).
 Mirror: nexo-desktop/lib/r17-promise-debt.js (bundled with Fase D JS
 twins at the end of the tranche).
@@ -21,6 +21,7 @@ from __future__ import annotations
 from core_prompts import render_core_prompt
 CLASSIFIER_QUESTION = render_core_prompt("r17-promise-debt-question")
+SEMANTIC_LABELS = ("promise", "no_promise")
 INJECTION_PROMPT_TEMPLATE = render_core_prompt("r17-promise-debt-injection")
@@ -37,7 +38,17 @@ def detect_promise(assistant_text: str, *, classifier=None) -> bool:
         return False
     if classifier is None:
         try:
-            from enforcement_classifier import classify as classifier  # type: ignore
+            from semantic_router import route as semantic_route
+        except Exception:
+            return False
+        try:
+            result = semantic_route(
+                decision_kind="r17_promise_debt",
+                question=CLASSIFIER_QUESTION,
+                context=text,
+                labels=SEMANTIC_LABELS,
+            )
+            return bool(result.ok and (result.label or result.verdict) == "promise")
         except Exception:
             return False
     try:
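
The R14, R16 and R17 hunks above apply the same shape: call the router with a `decision_kind` and a label pair, compare against the positive label, and return `False` on any exception. A condensed sketch of that shape follows, with the router passed in as a parameter so it can be run without the package. `detect_with_router`, `fake_route_ok` and `fake_route_down` are hypothetical names, and the empty-text guard is an assumption about the lines the hunks do not show.

```python
from types import SimpleNamespace


def detect_with_router(text, *, route, decision_kind, question, labels, positive_label):
    """Fail-closed detection: any router failure yields False."""
    text = (text or "").strip()
    if not text:
        return False  # assumed guard; not visible in the hunks
    try:
        result = route(
            decision_kind=decision_kind,
            question=question,
            context=text,
            labels=labels,
        )
        return bool(result.ok and (result.label or result.verdict) == positive_label)
    except Exception:
        return False  # backend unavailable: flag nothing rather than raise


def fake_route_ok(**kwargs):
    """Hypothetical router stub that always answers with the first label."""
    first = kwargs["labels"][0]
    return SimpleNamespace(ok=True, label=first, verdict=first)


def fake_route_down(**kwargs):
    """Hypothetical router stub that simulates an unavailable backend."""
    raise RuntimeError("backend unavailable")


common = dict(
    decision_kind="r17_promise_debt",
    question="Does the assistant promise future work?",  # illustrative text only
    labels=("promise", "no_promise"),
    positive_label="promise",
)

flagged = detect_with_router("I will fix that next.", route=fake_route_ok, **common)
degraded = detect_with_router("I will fix that next.", route=fake_route_down, **common)
```

With the failing stub the function returns `False` instead of propagating the error, which is the fail-closed behaviour the module docstrings describe.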

package/src/semantic_reasoner.py ADDED

@@ -0,0 +1,584 @@
+"""semantic_reasoner — second-layer semantic decision maker.
+Plan ONEPASS LLM Coverage. Called through ``src/semantic_router.py``.
+Exposes a single ``reason()`` entrypoint with two modes:
+    Mode A  — ``multipass_local``  (textual decision kinds)
+        Reuses the already-pinned ``LocalZeroShotClassifier`` (see
+        ``docs/classifier-model-notes.md``) but with stricter behaviour:
+        three inference passes with mild prompt perturbations, then
+        majority vote across passes. A decision is only accepted if at
+        least two of three passes agree AND the agreed confidence is
+        above the stricter threshold. This kills single-pass false
+        positives without adding a new model dependency.
+    Mode B  — ``cached_llm``  (code-aware decision kinds)
+        Thin wrapper around ``call_model_raw`` with a disk cache scoped
+        by (decision_kind, sha256(normalized_prompt)). TTL = 24h. The
+        cache lives under ``~/.nexo/runtime/operations/semantic-reasoner-cache.json``
+        alongside the existing classifier install state. Cache hits
+        return instantly and are flagged in ``meta.cache_hit``. Misses
+        call the LLM; the response and its normalized verdict are
+        written back to the cache atomically.
+Pin notes: this module does not introduce a new downloaded model.
+Mode A reuses ``MODEL_ID``/``MODEL_REVISION`` from ``classifier_local``.
+Mode B resolves the LLM through the standard resonance map with
+``caller='semantic_reasoner'`` and ``tier='muy_bajo'``; the pin lives
+in ``resonance_map`` like every other LLM caller.
+See ``docs/semantic-reasoner-model-notes.md`` for the rationale behind
+this "upgrade-in-place, pin-by-reuse" strategy, and why a dedicated
+stronger local LLM (Llama 3.1 8B, etc.) is explicitly deferred to a
+future release.
+"""
+from __future__ import annotations
+import hashlib
+import json
+import logging
+import os
+import re
+import time
+from dataclasses import dataclass, field
+from pathlib import Path
+from typing import Any
+_logger = logging.getLogger(__name__)
+# ---------------------------------------------------------------------------
+# Shared dataclass imported from the router
+# ---------------------------------------------------------------------------
+def _import_router_result():
+    """Lazy import to avoid circular dependency on semantic_router."""
+    from semantic_router import RouterResult
+    return RouterResult
+# ---------------------------------------------------------------------------
+# Mode A — multi-pass local
+# ---------------------------------------------------------------------------
+_PROMPT_PERTURBATIONS: tuple[str, ...] = (
+    "{q}",
+    "Decide: {q}",
+    "Classify this utterance: {q}",
+)
+def _collect_local_votes(
+    question: str, labels: tuple[str, ...]
+) -> list[tuple[str, float, dict[str, float]]]:
+    """Run the local classifier three times with mild prompt variations.
+    Returns a list of ``(label, confidence, scores)`` triples. Any
+    pass that fails silently returns a zero-confidence entry so the
+    vote aggregator can still detect quorum problems.
+    """
+    try:
+        from classifier_local import LocalZeroShotClassifier
+    except Exception as exc:  # pragma: no cover
+        _logger.debug("semantic_reasoner: classifier_local unavailable (%s)", exc)
+        return []
+    clf = LocalZeroShotClassifier(confidence_floor=0.0)
+    votes: list[tuple[str, float, dict[str, float]]] = []
+    for template in _PROMPT_PERTURBATIONS:
+        prompt = template.format(q=question)
+        result = clf.classify(prompt, labels)
+        if result is None:
+            votes.append(("", 0.0, {}))
+            continue
+        votes.append((result.label, float(result.confidence), dict(result.scores)))
+    return votes
+def _aggregate_votes(
+    votes: list[tuple[str, float, dict[str, float]]],
+    confidence_floor: float,
+) -> tuple[str | None, float, dict[str, Any]]:
+    """Majority vote across passes. Returns (label_or_none, confidence, meta)."""
+    if not votes:
+        return None, 0.0, {"reason": "no_votes"}
+    counts: dict[str, int] = {}
+    confidences: dict[str, list[float]] = {}
+    for label, confidence, _scores in votes:
+        if not label:
+            continue
+        counts[label] = counts.get(label, 0) + 1
+        confidences.setdefault(label, []).append(confidence)
+    if not counts:
+        return None, 0.0, {"reason": "all_passes_failed", "votes": len(votes)}
+    best_label = max(counts, key=lambda lbl: (counts[lbl], max(confidences[lbl])))
+    vote_count = counts[best_label]
+    avg_confidence = sum(confidences[best_label]) / len(confidences[best_label])
+    meta: dict[str, Any] = {
+        "votes_total": len(votes),
+        "votes_for_best": vote_count,
+        "avg_confidence": round(avg_confidence, 4),
+        "per_label_counts": dict(counts),
+    }
+    if vote_count < 2:
+        meta["reason"] = "no_majority"
+        return None, avg_confidence, meta
+    if avg_confidence < confidence_floor:
+        meta["reason"] = "below_threshold"
+        return None, avg_confidence, meta
+    return best_label, avg_confidence, meta
+def _reason_multipass_local(
+    *,
+    decision_kind: str,
+    question: str,
+    context: str = "",
+    labels: tuple[str, ...] | None,
+    confidence_floor: float,
+):
+    RouterResult = _import_router_result()
+    if not labels:
+        return RouterResult(
+            ok=False,
+            decision_kind=decision_kind,
+            route_used="semantic_reasoner",
+            degraded=True,
+            error="multipass_local requires labels",
+        )
+    semantic_input = (context or "").strip() or question
+    votes = _collect_local_votes(semantic_input, labels)
+    label, confidence, meta = _aggregate_votes(votes, confidence_floor)
+    if label is None:
+        return RouterResult(
+            ok=False,
+            decision_kind=decision_kind,
+            route_used="semantic_reasoner",
+            degraded=True,
+            error=meta.get("reason", "aggregation_failed"),
+            meta={"mode": "multipass_local", "aggregate": meta},
+        )
+    return RouterResult(
+        ok=True,
+        decision_kind=decision_kind,
+        verdict=label,
+        label=label,
+        confidence=round(float(confidence), 4),
+        route_used="semantic_reasoner",
+        degraded=False,
+        meta={"mode": "multipass_local", "aggregate": meta},
+    )
+# ---------------------------------------------------------------------------
+# Mode B — cached LLM
+# ---------------------------------------------------------------------------
+_DEFAULT_CACHE_TTL_SECONDS = 24 * 3600
+def _cache_path() -> Path:
+    """Resolve the on-disk cache location.
+    Reuses ``paths.operations_dir()`` so the reasoner state lives next to
+    the existing ``classifier-install-state.json``. If ``paths`` is not
+    importable (heavy module; test context), fall back to a deterministic
+    location under ``NEXO_HOME``.
+    """
+    override = os.environ.get("NEXO_SEMANTIC_REASONER_CACHE_PATH", "").strip()
+    if override:
+        return Path(override)
+    try:
+        import paths
+        return paths.operations_dir() / "semantic-reasoner-cache.json"
+    except Exception:
+        home = os.environ.get("NEXO_HOME", "").strip()
+        root = Path(home) if home else Path.home() / ".nexo"
+        return root / "runtime" / "operations" / "semantic-reasoner-cache.json"
+def _normalize_for_hash(text: str) -> str:
+    """Normalise whitespace/case so equivalent prompts hit the same cache
+    entry. Does not touch content semantics beyond whitespace collapse."""
+    return re.sub(r"\s+", " ", (text or "").strip().lower())
+def _cache_key(
+    *,
+    decision_kind: str,
+    question: str,
+    labels: tuple[str, ...] | None,
+    context: str,
+) -> str:
+    payload = json.dumps(
+        {
+            "kind": decision_kind,
+            "q": _normalize_for_hash(question),
+            "ctx": _normalize_for_hash(context)[:400],
+            "labels": list(labels) if labels else [],
+        },
+        sort_keys=True,
+        ensure_ascii=False,
+    )
+    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
+def _read_cache() -> dict[str, Any]:
+    try:
+        path = _cache_path()
+        if not path.is_file():
+            return {}
+        data = json.loads(path.read_text() or "{}")
+        if isinstance(data, dict):
+            return data
+    except Exception as exc:  # pragma: no cover — corrupt cache
+        _logger.warning("semantic_reasoner: cache read failed (%s); starting fresh", exc)
+    return {}
+def _write_cache(cache: dict[str, Any]) -> None:
+    """Atomic write with pid+uuid suffix so concurrent Brain / Desktop CLI
+    writers do not stomp each other's temp file."""
+    try:
+        path = _cache_path()
+        path.parent.mkdir(parents=True, exist_ok=True)
+        import os as _os
+        import uuid as _uuid
+        tmp = path.with_name(
+            f"{path.name}.tmp.{_os.getpid()}.{_uuid.uuid4().hex[:8]}"
+        )
+        payload = json.dumps(cache, ensure_ascii=False, sort_keys=True)
+        with open(tmp, "w", encoding="utf-8") as handle:
+            handle.write(payload)
+            handle.flush()
+            try:
+                _os.fsync(handle.fileno())
+            except OSError:
+                pass
+        _os.replace(tmp, path)
+    except Exception as exc:  # pragma: no cover
+        _logger.warning("semantic_reasoner: cache write failed (%s)", exc)
+def _cache_get(key: str, ttl_seconds: int) -> dict[str, Any] | None:
+    cache = _read_cache()
+    entry = cache.get(key)
+    if not isinstance(entry, dict):
+        return None
+    ts = float(entry.get("ts", 0.0) or 0.0)
+    if ts <= 0.0:
+        return None
+    if (time.time() - ts) > ttl_seconds:
+        return None
+    return entry
+def _cache_put(key: str, entry: dict[str, Any]) -> None:
+    cache = _read_cache()
+    cache[key] = {**entry, "ts": time.time()}
+    if len(cache) > 2000:
+        # Keep the 1800 most-recent entries to avoid unbounded growth. The
+        # bound is advisory; callers should keep reasoner prompts small.
+        items = sorted(cache.items(), key=lambda kv: float(kv[1].get("ts", 0.0) or 0.0))
+        cache = dict(items[-1800:])
+    _write_cache(cache)
+def _parse_ttl_env() -> int:
+    """Read ``NEXO_SEMANTIC_REASONER_TTL`` defensively.
+    Malformed values (non-integer, negative) fall back to the default so
+    operator typos never crash the reasoner on first call.
+    """
+    raw = os.environ.get("NEXO_SEMANTIC_REASONER_TTL", "")
+    if not raw:
+        return _DEFAULT_CACHE_TTL_SECONDS
+    try:
+        parsed = int(raw)
+    except (TypeError, ValueError):
+        _logger.warning(
+            "semantic_reasoner: invalid NEXO_SEMANTIC_REASONER_TTL=%r; "
+            "using default %d",
+            raw,
+            _DEFAULT_CACHE_TTL_SECONDS,
+        )
+        return _DEFAULT_CACHE_TTL_SECONDS
+    if parsed <= 0:
+        return _DEFAULT_CACHE_TTL_SECONDS
+    return parsed
+def _reason_cached_llm(
+    *,
+    decision_kind: str,
+    question: str,
+    labels: tuple[str, ...] | None,
+    context: str,
+    confidence_floor: float,
+):
+    RouterResult = _import_router_result()
+    ttl = _parse_ttl_env()
+    key = _cache_key(
+        decision_kind=decision_kind,
+        question=question,
+        labels=labels,
+        context=context,
+    )
+    cached = _cache_get(key, ttl)
+    if cached is not None:
+        cached_verdict = cached.get("verdict")
+        if isinstance(cached_verdict, str) and cached_verdict.strip():
+            return RouterResult(
+                ok=True,
+                decision_kind=decision_kind,
+                verdict=cached_verdict,
+                label=cached_verdict,
+                confidence=float(cached.get("confidence", 0.6)),
+                route_used="semantic_reasoner",
+                degraded=False,
+                meta={
+                    "mode": "cached_llm",
+                    "cache_hit": True,
+                    "cache_key": key[:12],
+                },
+            )
+        # Corrupt entry (verdict missing or non-string). Drop it and fall
+        # through to a live call so the caller is never handed a cached
+        # "ok=True, verdict=None" sentinel.
+        _logger.warning(
+            "semantic_reasoner: dropping corrupt cache entry for key=%s",
+            key[:12],
+        )
+    try:
+        import call_model_raw as _cmr
+    except Exception as exc:  # pragma: no cover
+        return RouterResult(
+            ok=False,
+            decision_kind=decision_kind,
+            route_used="semantic_reasoner",
+            degraded=True,
+            error=f"call_model_raw unavailable: {exc}",
+            meta={"mode": "cached_llm", "cache_hit": False},
+        )
+    call_model_raw_fn = getattr(_cmr, "call_model_raw", None)
+    classifier_unavailable_cls = getattr(
+        _cmr, "ClassifierUnavailableError", Exception
+    )
+    if call_model_raw_fn is None:
+        return RouterResult(
+            ok=False,
+            decision_kind=decision_kind,
+            route_used="semantic_reasoner",
+            degraded=True,
+            error="call_model_raw callable missing",
+            meta={"mode": "cached_llm", "cache_hit": False},
+        )
+    prompt = _build_reasoner_prompt(
+        decision_kind=decision_kind,
+        question=question,
+        labels=labels,
+        context=context,
+    )
+    system = (
+        "You are NEXO's code-aware semantic reasoner. Answer with the "
+        "single best label from the provided list (no prose). If no "
+        "label fits, answer 'unknown'."
+    )
+    try:
+        raw = call_model_raw_fn(
+            prompt,
+            system=system,
+            caller="semantic_reasoner",
+            tier="muy_bajo",
+            max_tokens=32,
+            temperature=0.0,
+        )
+    except classifier_unavailable_cls as exc:
+        return RouterResult(
+            ok=False,
+            decision_kind=decision_kind,
+            route_used="semantic_reasoner",
+            degraded=True,
+            error=f"remote_unavailable: {exc}",
+            meta={"mode": "cached_llm", "cache_hit": False},
+        )
+    except Exception as exc:  # noqa: BLE001 — fail-closed
+        return RouterResult(
+            ok=False,
+            decision_kind=decision_kind,
+            route_used="semantic_reasoner",
+            degraded=True,
+            error=f"remote_error: {exc}",
+            meta={"mode": "cached_llm", "cache_hit": False},
+        )
+    verdict = _normalize_verdict(raw, labels)
+    if verdict is None:
+        return RouterResult(
+            ok=False,
+            decision_kind=decision_kind,
+            route_used="semantic_reasoner",
+            degraded=True,
+            error="llm_returned_unknown_or_unparseable",
+            meta={
+                "mode": "cached_llm",
+                "cache_hit": False,
+                "raw": (raw or "")[:80],
+            },
+        )
+    _cache_put(
+        key,
+        {
+            "verdict": verdict,
+            "confidence": max(confidence_floor, 0.6),
+            "decision_kind": decision_kind,
+        },
+    )
+    return RouterResult(
+        ok=True,
+        decision_kind=decision_kind,
+        verdict=verdict,
+        label=verdict,
+        confidence=max(confidence_floor, 0.6),
+        route_used="semantic_reasoner",
+        degraded=False,
+        meta={"mode": "cached_llm", "cache_hit": False, "cache_key": key[:12]},
+    )
+def _build_reasoner_prompt(
+    *,
+    decision_kind: str,
+    question: str,
+    labels: tuple[str, ...] | None,
+    context: str,
+) -> str:
+    parts = [
+        f"decision_kind: {decision_kind}",
+        f"question: {question}",
+    ]
+    if context:
+        parts.append(f"context: {context[:600]}")
+    if labels:
+        parts.append("candidate_labels: " + ", ".join(labels))
+        parts.append("Reply with exactly one of the labels above.")
+    else:
+        parts.append("Reply with the shortest phrase answering the question.")
+    return "\n".join(parts)
+def _normalize_verdict(
+    raw: str, labels: tuple[str, ...] | None
+) -> str | None:
+    text = (raw or "").strip().lower()
+    if not text:
+        return None
+    if text == "unknown":
+        return None
+    if labels:
+        for label in labels:
+            if label.lower() == text:
+                return label
+        for label in labels:
+            if label.lower() in text:
+                return label
+        return None
+    return text
+# ---------------------------------------------------------------------------
+# Public entrypoint
+# ---------------------------------------------------------------------------
+_REASONER_OFF_VALUES = {"0", "off", "false", "no", "disable", "disabled"}
+def _is_reasoner_disabled() -> bool:
+    """Honour the ``NEXO_SEMANTIC_REASONER`` runtime kill switch.
+    The plan (ONEPASS LLM Coverage) explicitly required an env opt-out
+    dedicated to the reasoner, separate from ``NEXO_LOCAL_CLASSIFIER``
+    (which only gates install-time provisioning). Operators who hit a
+    reasoner regression in production can set ``NEXO_SEMANTIC_REASONER=0``
+    to force every ``reason()`` call to refuse; the router then falls
+    through to ``remote_fallback`` on its own.
+    """
+    raw = os.environ.get("NEXO_SEMANTIC_REASONER", "").strip().lower()
+    return raw in _REASONER_OFF_VALUES
+def reason(
+    *,
+    decision_kind: str,
+    question: str,
+    labels: tuple[str, ...] | list[str] | None,
+    context: str = "",
+    mode: str = "multipass_local",
+    confidence_floor: float = 0.75,
+):
+    """Dispatch to the configured mode. Called by ``semantic_router.route``.
+    Returns a ``RouterResult``. The router knows how to keep going to
+    ``remote_fallback`` if this layer refuses.
+    """
+    RouterResult = _import_router_result()
+    if _is_reasoner_disabled():
+        return RouterResult(
+            ok=False,
+            decision_kind=decision_kind,
+            route_used="semantic_reasoner",
+            degraded=True,
+            error="reasoner_disabled_by_env",
+            meta={"env": "NEXO_SEMANTIC_REASONER"},
+        )
+    labels_tuple: tuple[str, ...] | None = tuple(labels) if labels else None
+    if mode == "multipass_local":
+        return _reason_multipass_local(
+            decision_kind=decision_kind,
+            question=question,
+            context=context,
+            labels=labels_tuple,
+            confidence_floor=confidence_floor,
+        )
+    if mode == "cached_llm":
+        return _reason_cached_llm(
+            decision_kind=decision_kind,
+            question=question,
+            labels=labels_tuple,
+            context=context,
+            confidence_floor=confidence_floor,
+        )
+    return RouterResult(
+        ok=False,
+        decision_kind=decision_kind,
+        route_used="semantic_reasoner",
+        degraded=True,
+        error=f"unknown reasoner mode: {mode}",
+    )
+__all__ = ["reason"]
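
The two-pass label matching in `_normalize_verdict` above (exact match first, then substring match in declaration order) can be exercised in isolation. The sketch below is a standalone copy for illustration; the name `normalize_verdict` and the `("done", "not_done")` label pair are hypothetical and do not appear in the package. It suggests that when one label is a substring of another, the substring pass may return the shorter label if the model adds any extra character.

```python
from __future__ import annotations


def normalize_verdict(raw: str, labels: tuple[str, ...] | None) -> str | None:
    """Standalone copy of the matching logic shown in the diff (illustrative)."""
    text = (raw or "").strip().lower()
    if not text or text == "unknown":
        return None
    if labels:
        # Pass 1: exact, case-insensitive match against each label.
        for label in labels:
            if label.lower() == text:
                return label
        # Pass 2: first label, in declaration order, contained in the answer.
        for label in labels:
            if label.lower() in text:
                return label
        return None
    # No label list: the lowercased free text is the verdict.
    return text


labels = ("session_end", "continue_session")
print(normalize_verdict("Session_End", labels))              # -> session_end
print(normalize_verdict("label: continue_session", labels))  # -> continue_session
print(normalize_verdict("unknown", labels))                  # -> None
print(normalize_verdict("maybe", labels))                    # -> None

# Hypothetical label pair where one label is a substring of the other:
# the trailing period defeats the exact pass, so "done" wins the substring pass.
print(normalize_verdict("not_done.", ("done", "not_done")))  # -> done
```

Under these assumptions, ordering longer labels first or keeping labels free of mutual substrings would avoid the last case; whether any shipped label set is affected is not shown in this diff.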

package/src/semantic_router.py ADDED

@@ -0,0 +1,462 @@
+"""semantic_router — Plan ONEPASS LLM Coverage.
+Central router for every model-backed semantic decision in NEXO Brain. Call
+sites declare a *decision_kind* and pass question/context; the router
+applies the policy for that kind and dispatches through the stack:
+    fast_local  ->  semantic_reasoner  ->  remote_fallback
+Design contract (from ~/Desktop/NEXO-ONEPASS-LLM-COVERAGE-RELEASE-PLAN.md):
+- Brain owns the semantic contract, model pins and routing policy.
+- Every call site passes a *named* decision_kind; policy lives here, not in
+  the caller. This replaces the previous pattern where each caller invented
+  its own policy tree.
+- The existing ``LocalZeroShotClassifier`` stays as the cheap multilingual
+  first pass (``fast_local``).
+- ``semantic_reasoner`` is the second, stronger layer. Its implementation
+  lives in ``src/semantic_reasoner.py`` with two modes: Mode A (strict
+  multi-pass over the same local classifier with tighter thresholds) and
+  Mode B (LLM-cached reasoner for code-aware decisions).
+- ``remote_fallback`` is the existing ``call_model_raw`` chain. It is no
+  longer the default path for local-friendly decisions; it only fires if
+  the upstream layers refuse or degrade.
+The router returns a ``RouterResult`` dataclass so callers can inspect
+which route was used, whether degraded mode is active, and what confidence
+the decision carries. This is also what Desktop will consume via the
+``brain-semantic-router.js`` bridge shipped in the companion PR.
+"""
+from __future__ import annotations
+import logging
+from dataclasses import dataclass, field
+from typing import Any
+_logger = logging.getLogger(__name__)
+# ---------------------------------------------------------------------------
+# Contract dataclasses
+# ---------------------------------------------------------------------------
+@dataclass
+class RouterResult:
+    """Outcome of a ``route()`` call.
+    Fields match the minimum contract documented in the plan (section
+    "Minimum router output contract"):
+      - ``ok``: overall success (at least one layer produced a decision)
+      - ``decision_kind``: the kind the caller passed
+      - ``verdict``: the chosen label when the caller used zero-shot
+        classification; None when the underlying layer returned free text
+      - ``label``: alias for ``verdict`` to match the plan's wording; kept
+        consistent to simplify Desktop bridge mapping
+      - ``confidence``: [0.0, 1.0]
+      - ``route_used``: one of ``fast_local``, ``semantic_reasoner``,
+        ``remote_fallback``, or ``no_route`` when every layer refused
+      - ``degraded``: True when the chosen layer could not meet its normal
+        bar (fallback fired, stricter threshold not met, cache-only, etc.)
+      - ``error``: short human-readable reason when ``ok`` is False
+      - ``meta``: free-form layer-specific evidence (scores dict, cache
+        key, latency, model id) — Desktop uses it for telemetry
+    """
+    ok: bool
+    decision_kind: str
+    verdict: str | None = None
+    label: str | None = None
+    confidence: float = 0.0
+    route_used: str = "no_route"
+    degraded: bool = False
+    error: str | None = None
+    meta: dict[str, Any] = field(default_factory=dict)
+# ---------------------------------------------------------------------------
+# Decision kinds + policy table
+# ---------------------------------------------------------------------------
+#
+# The plan enumerates 18 decision_kinds that need to route through here. They
+# fall into two families:
+#
+#   TEXTUAL     — the first-line local classifier is good enough; the
+#                 reasoner adds a stricter multi-pass check for ambiguous
+#                 cases. Remote is only a last-resort safety net.
+#
+#   CODE_AWARE  — the fast local classifier is not designed for code-aware
+#                 semantics (T4 R15/R23e/R23f/R23h, r20). The reasoner
+#                 routes those straight to a cached LLM call.
+#
+# Any decision_kind not listed here falls through to remote_fallback with
+# ``degraded=True`` to make accidental misuse visible in telemetry instead
+# of silent.
+#
+# Keep this map in lockstep with ``docs/semantic-reasoner-model-notes.md``.
+TEXTUAL_KINDS: tuple[str, ...] = (
+    "session_end_intent",
+    "autonomy_mandate",
+    "guard_verbal_ack",
+    "r14_correction",
+    "r16_declared_done",
+    "r17_promise_debt",
+    "r34_identity_coherence",
+    "followup_operator_attention",
+    "drive_signal_type",
+    "drive_area",
+    "reply_event_type",
+    "query_intent",
+    "sentiment_intent",
+)
+CODE_AWARE_KINDS: tuple[str, ...] = (
+    "r20_constant_change",
+    "t4_r15",
+    "t4_r23e",
+    "t4_r23f",
+    "t4_r23h",
+)
+ALL_DECISION_KINDS: tuple[str, ...] = TEXTUAL_KINDS + CODE_AWARE_KINDS
+# Per-kind policy. Explicit, human-readable, no defaults that silently
+# expand coverage. Changing policy = editing this dict + updating the
+# model-notes doc + bumping tests.
+_POLICY: dict[str, dict[str, Any]] = {
+    kind: {
+        "family": "textual",
+        "fast_local_threshold": 0.60,
+        "reasoner_mode": "multipass_local",
+        "reasoner_threshold": 0.75,
+        "allow_remote_fallback": True,
+    }
+    for kind in TEXTUAL_KINDS
+}
+_POLICY.update(
+    {
+        kind: {
+            "family": "code_aware",
+            "fast_local_threshold": None,  # skip fast_local
+            "reasoner_mode": "cached_llm",
+            "reasoner_threshold": 0.60,
+            "allow_remote_fallback": True,
+        }
+        for kind in CODE_AWARE_KINDS
+    }
+)
+def policy_for(decision_kind: str) -> dict[str, Any] | None:
+    """Return the policy entry for a kind, or None if unknown."""
+    return _POLICY.get(decision_kind)
+# ---------------------------------------------------------------------------
+# Layer adapters
+# ---------------------------------------------------------------------------
+#
+# The router does not import the heavy modules at the top of the file so
+# that a caller who only wants ``policy_for`` or ``ALL_DECISION_KINDS`` does
+# not pay the import cost. The adapters below resolve the dependencies
+# lazily and wrap failures as ``None`` so the router can advance to the
+# next layer deterministically.
+def _run_fast_local(
+    *,
+    question: str,
+    context: str = "",
+    labels: tuple[str, ...],
+    confidence_floor: float,
+) -> RouterResult | None:
+    """Try ``LocalZeroShotClassifier``. Return None on unavailable or
+    below-threshold so the router advances.
+    The first layer must classify the actual user/assistant payload. For
+    guard decisions the ``question`` is usually a stable prompt template and
+    the live text lives in ``context``; feeding both into a zero-shot NLI
+    classifier makes the static prompt dominate the decision. Use context
+    when present, and fall back to question for simple direct callers.
+    """
+    try:
+        from classifier_local import LocalZeroShotClassifier
+    except Exception as exc:  # pragma: no cover — install not ready
+        _logger.debug("semantic_router: classifier_local unavailable (%s)", exc)
+        return None
+    clf = LocalZeroShotClassifier(confidence_floor=confidence_floor)
+    classifier_input = (context or "").strip() or question
+    result = clf.classify(classifier_input, labels)
+    if result is None:
+        return None
+    if result.confidence < confidence_floor:
+        return None
+    return RouterResult(
+        ok=True,
+        decision_kind="",  # filled by caller
+        verdict=result.label,
+        label=result.label,
+        confidence=float(result.confidence),
+        route_used="fast_local",
+        degraded=False,
+        meta={
+            "scores": dict(result.scores),
+            "latency_ms": float(result.latency_ms),
+            "threshold": confidence_floor,
+        },
+    )
+def _run_semantic_reasoner(
+    *,
+    decision_kind: str,
+    question: str,
+    labels: tuple[str, ...] | None,
+    context: str,
+    mode: str,
+    confidence_floor: float,
+) -> RouterResult | None:
+    """Delegate to ``src/semantic_reasoner.py``. Return None on unavailable
+    so the router advances to remote_fallback."""
+    try:
+        from semantic_reasoner import reason
+    except Exception as exc:  # pragma: no cover
+        _logger.debug("semantic_router: semantic_reasoner unavailable (%s)", exc)
+        return None
+    try:
+        return reason(
+            decision_kind=decision_kind,
+            question=question,
+            labels=labels,
+            context=context,
+            mode=mode,
+            confidence_floor=confidence_floor,
+        )
+    except Exception as exc:  # noqa: BLE001 — fail-closed, degrade to remote
+        _logger.warning("semantic_reasoner.reason raised: %s", exc)
+        return None
+def _run_remote_fallback(
+    *,
+    decision_kind: str,
+    question: str,
+    labels: tuple[str, ...] | None,
+    context: str,
+) -> RouterResult | None:
+    """Last-resort LLM call via ``call_model_raw``. The router marks the
+    result as ``degraded=True`` so telemetry shows when the stack fell
+    through."""
+    try:
+        import call_model_raw as _cmr
+    except Exception as exc:  # pragma: no cover
+        _logger.debug("semantic_router: call_model_raw unavailable (%s)", exc)
+        return None
+    # Resolve symbols defensively. Tests sometimes stub only ``call_model_raw``
+    # and forget ``ClassifierUnavailableError`` (or vice versa); without this
+    # guard a missing attribute later becomes NameError at ``except`` time and
+    # crashes the router instead of degrading.
+    call_model_raw_fn = getattr(_cmr, "call_model_raw", None)
+    classifier_unavailable_cls = getattr(
+        _cmr, "ClassifierUnavailableError", Exception
+    )
+    if call_model_raw_fn is None:
+        return RouterResult(
+            ok=False,
+            decision_kind=decision_kind,
+            route_used="remote_fallback",
+            degraded=True,
+            error="call_model_raw callable missing",
+        )
+    prompt = _build_remote_prompt(
+        decision_kind=decision_kind,
+        question=question,
+        labels=labels,
+        context=context,
+    )
+    system = (
+        "You are NEXO's remote semantic fallback. Answer with the single "
+        "best label from the provided list, or with 'unknown' if none fit. "
+        "No prose, no explanation."
+    )
+    try:
+        raw = call_model_raw_fn(
+            prompt,
+            system=system,
+            caller="semantic_reasoner",
+            tier="muy_bajo",
+            max_tokens=32,
+            temperature=0.0,
+        )
+    except classifier_unavailable_cls as exc:
+        return RouterResult(
+            ok=False,
+            decision_kind=decision_kind,
+            route_used="remote_fallback",
+            degraded=True,
+            error=f"remote_unavailable: {exc}",
+        )
+    except Exception as exc:  # noqa: BLE001 — fail-closed, never re-raise
+        return RouterResult(
+            ok=False,
+            decision_kind=decision_kind,
+            route_used="remote_fallback",
+            degraded=True,
+            error=f"remote_error: {exc}",
+        )
+    verdict = _normalize_remote_answer(raw, labels)
+    raw_preview = (raw or "")[:120]
+    return RouterResult(
+        ok=verdict is not None,
+        decision_kind=decision_kind,
+        verdict=verdict,
+        label=verdict,
+        confidence=0.55 if verdict is not None else 0.0,
+        route_used="remote_fallback",
+        degraded=True,  # always degraded relative to the local-first ideal
+        meta={"raw_response": raw_preview},
+    )
+def _build_remote_prompt(
+    *,
+    decision_kind: str,
+    question: str,
+    labels: tuple[str, ...] | None,
+    context: str,
+) -> str:
+    parts = [
+        f"Decision kind: {decision_kind}",
+        f"Question: {question}",
+    ]
+    if context:
+        parts.append(f"Context: {context[:400]}")
+    if labels:
+        parts.append("Candidate labels: " + ", ".join(labels))
+        parts.append("Reply with exactly one of the labels above.")
+    else:
+        parts.append("Reply with the shortest phrase that answers the question.")
+    return "\n".join(parts)
+def _normalize_remote_answer(
+    raw: str, labels: tuple[str, ...] | None
+) -> str | None:
+    text = (raw or "").strip().lower()
+    if not text:
+        return None
+    if labels:
+        for label in labels:
+            if label.lower() == text:
+                return label
+        for label in labels:
+            if label.lower() in text:
+                return label
+        return None
+    return text
+# ---------------------------------------------------------------------------
+# Public entrypoint
+# ---------------------------------------------------------------------------
+def route(
+    *,
+    decision_kind: str,
+    question: str,
+    context: str = "",
+    labels: tuple[str, ...] | list[str] | None = None,
+    allow_remote_fallback: bool = True,
+) -> RouterResult:
+    """Route a semantic decision through the stack.
+    The caller names the *kind* of decision. The router looks up the policy,
+    dispatches through fast_local -> semantic_reasoner -> remote_fallback,
+    and returns the first layer that produced a decision above its
+    threshold.
+    ``allow_remote_fallback=False`` forces local-only behaviour; the router
+    will return ``ok=False, route_used='no_route'`` if every local layer
+    refused. Useful for strict-offline automation or pytest.
+    """
+    policy = policy_for(decision_kind)
+    if policy is None:
+        return RouterResult(
+            ok=False,
+            decision_kind=decision_kind,
+            route_used="no_route",
+            degraded=True,
+            error=f"unknown decision_kind: {decision_kind}",
+        )
+    labels_tuple: tuple[str, ...] | None = (
+        tuple(labels) if labels else None
+    )
+    # Step 1 — fast_local for textual families only.
+    if policy["fast_local_threshold"] is not None and labels_tuple:
+        fast = _run_fast_local(
+            question=question,
+            context=context,
+            labels=labels_tuple,
+            confidence_floor=float(policy["fast_local_threshold"]),
+        )
+        if fast is not None:
+            fast.decision_kind = decision_kind
+            return fast
+    # Step 2 — semantic_reasoner (Mode A or B depending on policy).
+    reasoned = _run_semantic_reasoner(
+        decision_kind=decision_kind,
+        question=question,
+        labels=labels_tuple,
+        context=context,
+        mode=str(policy["reasoner_mode"]),
+        confidence_floor=float(policy["reasoner_threshold"]),
+    )
+    if reasoned is not None and reasoned.ok:
+        return reasoned
+    # Step 3 — remote_fallback if allowed.
+    if allow_remote_fallback and policy.get("allow_remote_fallback", True):
+        remote = _run_remote_fallback(
+            decision_kind=decision_kind,
+            question=question,
+            labels=labels_tuple,
+            context=context,
+        )
+        if remote is not None:
+            return remote
+    return RouterResult(
+        ok=False,
+        decision_kind=decision_kind,
+        route_used="no_route",
+        degraded=True,
+        error="every layer refused or was unavailable",
+    )
+__all__ = [
+    "ALL_DECISION_KINDS",
+    "CODE_AWARE_KINDS",
+    "RouterResult",
+    "TEXTUAL_KINDS",
+    "policy_for",
+    "route",
+]
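
The fall-through order in `route()` above can be sketched with stubbed layers. This is a minimal illustration, not the package's code: `Result`, `Layer` and `route_sketch` are hypothetical names, and each layer is reduced to a zero-argument callable. It shows the three acceptance rules visible in the diff: a fast-local result is accepted whenever it is not `None`, a reasoner result only when `ok` is true, and a remote result is returned as-is.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Result:
    """Trimmed stand-in for RouterResult (hypothetical, for illustration)."""
    ok: bool
    route_used: str = "no_route"
    label: Optional[str] = None
    degraded: bool = False


Layer = Callable[[], Optional[Result]]


def route_sketch(fast: Optional[Layer], reasoner: Layer, remote: Optional[Layer]) -> Result:
    # Step 1: fast_local is skipped when the policy carries no threshold (fast is None).
    if fast is not None:
        hit = fast()
        if hit is not None:
            return hit
    # Step 2: a reasoner result is accepted only when ok is True.
    reasoned = reasoner()
    if reasoned is not None and reasoned.ok:
        return reasoned
    # Step 3: the remote result is returned unchanged, even when ok is False.
    if remote is not None:
        answer = remote()
        if answer is not None:
            return answer
    # Every layer refused or was unavailable.
    return Result(ok=False, route_used="no_route", degraded=True)


def below_threshold() -> Optional[Result]:
    return None


def refused() -> Optional[Result]:
    return Result(ok=False, route_used="semantic_reasoner", degraded=True)


def remote_ok() -> Optional[Result]:
    return Result(ok=True, route_used="remote_fallback", label="session_end", degraded=True)


print(route_sketch(below_threshold, refused, remote_ok).route_used)  # -> remote_fallback
print(route_sketch(below_threshold, refused, None).route_used)       # -> no_route
print(route_sketch(None, remote_ok, None).route_used)                # -> remote_fallback
```

Passing `remote=None` corresponds to `allow_remote_fallback=False` in the real entrypoint, which is the local-only behaviour the docstring describes for offline automation and tests.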

package/src/session_end_intent.py CHANGED

@@ -8,6 +8,7 @@ from __future__ import annotations
 from core_prompts import render_core_prompt
 CLASSIFIER_QUESTION = render_core_prompt("session-end-intent-question")
+SEMANTIC_LABELS = ("session_end", "continue_session")
 def detect_session_end_intent(user_text: str, *, classifier=None) -> bool:
@@ -16,7 +17,17 @@ def detect_session_end_intent(user_text: str, *, classifier=None) -> bool:
         return False
     if classifier is None:
         try:
-            from enforcement_classifier import classify as classifier  # type: ignore
+            from semantic_router import route as semantic_route
+        except Exception:
+            return False
+        try:
+            result = semantic_route(
+                decision_kind="session_end_intent",
+                question=CLASSIFIER_QUESTION,
+                context=text,
+                labels=SEMANTIC_LABELS,
+            )
+            return bool(result.ok and (result.label or result.verdict) == "session_end")
         except Exception:
             return False
     try:
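
The boolean test that this hunk applies to the router result can be checked on its own. In the sketch below, `is_session_end` is a hypothetical helper wrapping the same expression, and the `SimpleNamespace` objects are stand-ins for whatever `semantic_router.route` returns; only the `ok`, `label` and `verdict` attributes are assumed.

```python
from types import SimpleNamespace


def is_session_end(result) -> bool:
    """Same expression the diff uses to turn a router result into a bool."""
    return bool(result.ok and (result.label or result.verdict) == "session_end")


# Hypothetical results, one per interesting case.
cases = {
    "local_hit": SimpleNamespace(ok=True, label="session_end", verdict="session_end"),
    "keep_going": SimpleNamespace(ok=True, label="continue_session", verdict="continue_session"),
    "refused": SimpleNamespace(ok=False, label=None, verdict=None),
    "verdict_only": SimpleNamespace(ok=True, label=None, verdict="session_end"),
}

for name, res in cases.items():
    print(name, is_session_end(res))
# -> local_hit True
# -> keep_going False
# -> refused False
# -> verdict_only True
```

Because `==` binds tighter than `and`, a result with `ok=False` never reports a session end, and `label` falls back to `verdict` only when `label` is empty or `None`.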

package/tool-enforcement-map.json CHANGED

@@ -3214,7 +3214,7 @@
             "threshold": 1
           }
         ],
-        "inject_prompt": "You must start by calling nexo_startup to register this session. Execute it now with a brief task description. Do not produce visible text.",
+        "inject_prompt": "You must start by calling nexo_startup to register this session. If mcp__nexo__* tools appear as deferred in the tool list (names visible but JSONSchemas not loaded), first call ToolSearch with query \"select:mcp__nexo__nexo_startup,mcp__nexo__nexo_heartbeat,mcp__nexo__nexo_session_diary_read,mcp__nexo__nexo_reminders,mcp__nexo__nexo_smart_startup,mcp__nexo__nexo_task_open,mcp__nexo__nexo_task_close,mcp__nexo__nexo_task_acknowledge_guard,mcp__nexo__nexo_guard_check,mcp__nexo__nexo_learning_add,mcp__nexo__nexo_confidence_check,mcp__nexo__nexo_followup_create,mcp__nexo__nexo_protocol_debt_resolve\" to load the schemas — deferred is not absent. If more nexo_* tools appear deferred later in the session, preload them the same way instead of giving up on them. Then execute nexo_startup with a brief task description. Do not produce visible text.",
         "triggers_after": [
           "nexo_smart_startup",
           "nexo_session_diary_read",