nexo-brain 7.11.1 → 7.11.2

@@ -1,6 +1,6 @@
 {
   "name": "nexo-brain",
-  "version": "7.11.1",
+  "version": "7.11.2",
   "description": "Local cognitive runtime for Claude Code \u2014 persistent memory, overnight learning, doctor diagnostics, personal scripts, recovery-aware jobs, startup preflight, and optional dashboard/power helper.",
   "author": {
     "name": "NEXO Brain",
package/README.md CHANGED
@@ -18,7 +18,9 @@
 
 [Watch the overview video](https://nexo-brain.com/watch/) · [Watch on YouTube](https://www.youtube.com/watch?v=i2lkGhKyVqI) · [Open the infographic](https://nexo-brain.com/assets/nexo-brain-infographic-v5.png)
 
-Version `7.11.1` is the current packaged-runtime line. Patch release — caches the runtime fingerprint by `(file_count, size_total, max_mtime)` signature so MCP startup and the per-tool-call `resolve_restart_required` skip the 263-file rehash when nothing on disk changed. ~11× speedup warm path (~40ms → ~3.7ms locally), ~10-20s/day saved across Claude Code / Codex / headless / deep-sleep / cron startups. Cache miss is always safe (falls through to full hash and self-repairs). Default `use_cache=False` keeps `plugins/update.py` on the ground-truth path around `git pull` / `npm update`. Builds on the v7.11.0 runtime fingerprint that gates `mcp-restart-required.json`. Full write-up in [`docs/runtime-fingerprint.md`](docs/runtime-fingerprint.md).
+Version `7.11.2` is the current packaged-runtime line. Patch release — two reliability fixes in the same family ("components ignoring signals they should respect"): (1) `STUCK CRON REAPER` added to `nexo-watchdog.sh` and (2) the Guardian/Enforcer now honors the `mcp-restart-required` marker. Previously the enforcer kept injecting `<system-reminder>` blocks asking the agent to call `nexo_*` tools while the MCP server was already returning `mcp_restart_required` for every call — every ping was a guaranteed no-op. The new gate at the top of `HeadlessEnforcer._enqueue()` reads the marker file (cached per-instance, 30s TTL) and skips reminders that mention `nexo_` while the marker is present. Reminders that don't reference `nexo_*` (R23 deploy guards, R25 nora/maria read-only, etc.) still fire; they don't depend on the MCP. The watchdog reaper closes a sibling gap: the v5.8.1 fix taught the watchdog to leave running jobs alone (it had been killing `deep-sleep` mid-flight 2026-04-14..17). The same restraint silently let truly hung wrappers — e.g. headless `claude --bare` blocked on an MCP that flagged `mcp_restart_required` — block their own next tick for days (`morning-agent`, `followup-runner` and `orchestrator-v2` went silent 2026-04-24..27). The reaper sweeps every `cron_runs` row with `ended_at IS NULL` and reaps anything older than `stuck_after_seconds` (per-cron from `manifest.json`, with a 12h global fallback). Live wrapper → `SIGTERM` (the wrapper's existing trap closes the row at `exit 143`), 10s grace, then `SIGKILL` on wrapper + descendants. Orphan zombie row → cleaned in-band with `exit_code=137`. `cron_id='watchdog'` is a hard-coded skip so the watchdog never reaps itself. Generous defaults (deep-sleep 8h, sleep/evolution 4h) prevent any v5.8.1 regression. New observability: `summary.reaped` in `watchdog-status.json`, `REAPED:` header in the human report, `REAPED=N` in the final log line. 6 new tests; 3 existing watchdog tests stay green.
+
+Previously in `7.11.1`: patch release — caches the runtime fingerprint by `(file_count, size_total, max_mtime)` signature so MCP startup and the per-tool-call `resolve_restart_required` skip the 263-file rehash when nothing on disk changed. ~11× speedup warm path (~40ms → ~3.7ms locally), ~10-20s/day saved across Claude Code / Codex / headless / deep-sleep / cron startups. Cache miss is always safe (falls through to full hash and self-repairs). Default `use_cache=False` keeps `plugins/update.py` on the ground-truth path around `git pull` / `npm update`. Builds on the v7.11.0 runtime fingerprint that gates `mcp-restart-required.json`. Full write-up in [`docs/runtime-fingerprint.md`](docs/runtime-fingerprint.md).
 
 Previously in `7.10.0`: minor release — **removes the LLM proxy override path that 7.9.28 → 7.9.34 introduced**. Background: 7.9.28 added two opt-in files at `~/.nexo/config/llm_endpoint.json` and `~/.nexo/config/auth_provider.json` that let a third-party orchestrator (NEXO Desktop) redirect every Anthropic SDK call from Brain to a custom proxy and resolve the bearer via a local helper, with concrete model names translated to wire aliases (`nexo-max`, `nexo-high`, `nexo-medium`, `nexo-low`, `nexo-mini`) and an `Idempotency-Key` per request. NEXO Desktop's commercial model has changed: Desktop is now a wrapper over the user's own Claude Code subscription (Max / Pro), with a separate Desktop licence. Brain calls go directly to `api.anthropic.com` using the user's existing OAuth (the one stored under `~/.claude/` and consumed by Claude Code spawns) or a plain `ANTHROPIC_API_KEY`. There is no NEXO bearer, no NEXO proxy, no NEXO credit accounting in this codebase. Every proxy symbol is gone from `call_model_raw.py` and `agent_runner.py`; the proxy-specific tests and `docs/api/override-files.md` are removed; any pre-existing override files on disk are simply ignored from this release forward.
 
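The `7.11.1` entry above describes placing a full rehash behind a cheap `(file_count, size_total, max_mtime)` signature. The Python sketch below illustrates that idea under several assumptions. The helper names (`stat_signature`, `full_fingerprint`, `fingerprint`), the SHA-256 digest, the `*.py` glob and the plain-dict cache are all hypothetical. This is not the package's actual fingerprint code.

```python
import hashlib
from pathlib import Path


def stat_signature(files: list[Path]) -> tuple[int, int, float]:
    """Cheap signature built from stat() only: (file_count, size_total, max_mtime)."""
    sizes, mtimes = [], []
    for p in files:
        st = p.stat()
        sizes.append(st.st_size)
        mtimes.append(st.st_mtime)
    return (len(files), sum(sizes), max(mtimes, default=0.0))


def full_fingerprint(files: list[Path]) -> str:
    """Expensive ground-truth path: hash every file's path and bytes."""
    h = hashlib.sha256()
    for p in sorted(files):
        h.update(str(p).encode())
        h.update(b"\0")  # separator so path and content cannot run together
        h.update(p.read_bytes())
    return h.hexdigest()


def fingerprint(root: Path, cache: dict, use_cache: bool = False) -> str:
    """Return the fingerprint, reusing the cached digest on a signature hit.

    use_cache defaults to False (hypothetical mirror of the README note) so
    callers must opt in to the cached path.
    """
    files = sorted(root.rglob("*.py"))
    sig = stat_signature(files)
    if use_cache and cache.get("signature") == sig:
        return cache["fingerprint"]  # warm path: no file contents are read
    fp = full_fingerprint(files)  # miss: full hash, then repair the cache
    cache.update(signature=sig, fingerprint=fp)
    return fp
```

A miss can never return a stale digest, because it always recomputes. A stat-based signature can overlook an edit that preserves the file count, total size and newest mtime. That is presumably why the ground-truth path around `git pull` / `npm update` keeps `use_cache=False`.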
package/package.json CHANGED
@@ -1,6 +1,6 @@
 {
   "name": "nexo-brain",
-  "version": "7.11.1",
+  "version": "7.11.2",
   "mcpName": "io.github.wazionapps/nexo",
   "description": "NEXO Brain — Shared brain for AI agents. Persistent memory, semantic RAG, natural forgetting, metacognitive guard, trust scoring, 150+ MCP tools. Works with Claude Code, Codex, Claude Desktop & any MCP client. 100% local, free.",
   "homepage": "https://nexo-brain.com",
@@ -13,6 +13,7 @@
       "recovery_policy": "catchup",
       "idempotent": true,
       "max_catchup_age": 172800,
+      "stuck_after_seconds": 28800,
       "run_on_boot": true,
       "run_on_wake": true
     },
@@ -38,6 +39,7 @@
       "recovery_policy": "catchup",
       "idempotent": true,
       "max_catchup_age": 172800,
+      "stuck_after_seconds": 14400,
       "run_on_boot": true,
       "run_on_wake": true
     },
@@ -140,6 +142,7 @@
       "recovery_policy": "catchup",
       "idempotent": true,
       "max_catchup_age": 1209600,
+      "stuck_after_seconds": 14400,
       "run_on_boot": true,
       "run_on_wake": true
     },
@@ -295,6 +298,7 @@
       "recovery_policy": "run_once_on_wake",
       "idempotent": true,
       "max_catchup_age": 1200,
+      "stuck_after_seconds": 600,
       "run_on_boot": false,
       "run_on_wake": true
     },
@@ -308,6 +312,7 @@
       "recovery_policy": "run_once_on_wake",
       "idempotent": true,
       "max_catchup_age": 7200,
+      "stuck_after_seconds": 1800,
       "run_on_boot": false,
       "run_on_wake": true
     },
@@ -321,6 +326,7 @@
       "recovery_policy": "catchup",
       "idempotent": true,
       "max_catchup_age": 86400,
+      "stuck_after_seconds": 1800,
       "run_on_boot": false,
       "run_on_wake": true
     }
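The watchdog's `_build_stuck_thresholds_from_manifest` and `lookup_stuck_threshold` consume the `stuck_after_seconds` values added above: 28800 s (8h), 14400 s (4h), 1800 s and 600 s. Any cron without a value falls back to a 12h (43200 s) default. Below is a Python sketch of the same lookup. The function names are hypothetical. The `crons` and `id` keys are the ones the watchdog's embedded Python reads.

```python
import json

STUCK_DEFAULT_SECONDS = 43200  # 12h global fallback, as in the watchdog


def load_stuck_thresholds(manifest_text: str) -> dict[str, int]:
    """Map cron id -> stuck_after_seconds from a manifest's "crons" list."""
    try:
        data = json.loads(manifest_text)
    except ValueError:  # json.JSONDecodeError subclasses ValueError
        return {}  # unreadable manifest: every cron uses the default
    thresholds: dict[str, int] = {}
    for cron in data.get("crons", []):
        cid = cron.get("id")
        th = cron.get("stuck_after_seconds")
        if cid and isinstance(th, (int, float)) and th > 0:
            # First entry wins, matching the shell's `grep ... | head -1`.
            thresholds.setdefault(cid, int(th))
    return thresholds


def lookup_stuck_threshold(thresholds: dict[str, int], cron_id: str) -> int:
    """Per-cron threshold in seconds, or the 12h default when none is set."""
    return thresholds.get(cron_id, STUCK_DEFAULT_SECONDS)
```

Missing, zero or non-numeric values are dropped rather than rejected. A single bad manifest entry therefore degrades to the generous default instead of disabling the reaper.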
@@ -2520,6 +2520,44 @@ class HeadlessEnforcer:
     # the per-rule tag collision check, and time-dedup at the call site.
     _LEGACY_TAG_PREFIXES = ("after:", "periodic_msgs:", "periodic_time:", "start:")
 
+    @staticmethod
+    def _mcp_restart_marker_path() -> "Path":
+        """Resolve the path to the MCP restart-required marker on disk.
+
+        The marker is written by `plugins/update.py` when a `nexo update`
+        actually changes runtime `.py` bytes (cf. v7.11.0 fingerprint
+        gating). Honors the F0.6 runtime/operations/ canonical layout
+        with a fall-back to the pre-F0.6 operations/ legacy layout so
+        half-migrated installs are still detected correctly.
+        """
+        from pathlib import Path as _Path
+        home = _Path(os.environ.get("NEXO_HOME", str(_Path.home() / ".nexo")))
+        new = home / "runtime" / "operations" / "mcp-restart-required.json"
+        if new.is_file():
+            return new
+        legacy = home / "operations" / "mcp-restart-required.json"
+        return legacy if legacy.is_file() else new
+
+    def _mcp_restart_pending(self) -> bool:
+        """Return True if the MCP server has a restart-required marker on disk.
+
+        Cached per-instance with a 30s TTL: the marker rarely changes mid-
+        session (it's written by `nexo update` and cleared by the next
+        client restart) but a TTL keeps long-lived enforcer instances from
+        getting stuck on a stale negative cache if the operator runs
+        `nexo update` mid-session without restarting.
+        """
+        cached_at = getattr(self, "_mcp_restart_pending_cache_at", 0.0)
+        if (time.time() - cached_at) < 30.0:
+            return getattr(self, "_mcp_restart_pending_cache", False)
+        try:
+            result = self._mcp_restart_marker_path().is_file()
+        except Exception:  # noqa: BLE001 — never block enforcement on path errors
+            result = False
+        self._mcp_restart_pending_cache = result
+        self._mcp_restart_pending_cache_at = time.time()
+        return result
+
     def _enqueue(self, prompt: str, tag: str, rule_id: str = ""):
         """Enqueue an injection. Mirrors Desktop _enqueue for parity.
 
@@ -2535,6 +2573,21 @@ class HeadlessEnforcer:
         """
         if any(q["tag"] == tag for q in self.injection_queue):
             return
+        # v7.11.2: suppress reminders that ask the agent to call nexo_*
+        # tools while the MCP server has a restart-required marker on
+        # disk. Without this gate every periodic ping ("Execute
+        # nexo_session_diary_write", "Execute nexo_smart_startup",
+        # nexo_guard_check pre-Edit, etc) returns mcp_restart_required
+        # and the agent burns cycles on guaranteed no-ops. Reminders that
+        # don't reference nexo_* (R23 deploy guards, R25 nora/maria
+        # read-only, etc) still fire — they don't depend on the MCP.
+        if "nexo_" in prompt and self._mcp_restart_pending():
+            _logger.info(
+                "SKIP: %s — mcp_restart_required marker present (rule_id=%s)",
+                tag,
+                rule_id or "?",
+            )
+            return
         legacy = tag.startswith(self._LEGACY_TAG_PREFIXES)
         if legacy:
             tool = tag.split(":")[-1].split("->")[-1]
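The gate added to `HeadlessEnforcer._enqueue()` tests the cheap substring condition first. Only then does it consult a 30-second TTL cache of `Path.is_file()`. A self-contained sketch of that pattern follows. `MarkerGate`, its injectable `clock` and the explicit `marker_path` argument are hypothetical additions that make the TTL testable without sleeping. The `-inf` initial timestamp replaces the original's `0.0`, so a fake clock starting near zero still forces the first real check.

```python
import time
from pathlib import Path
from typing import Callable


class MarkerGate:
    """TTL-cached "is the restart marker on disk?" check (sketch)."""

    def __init__(self, marker_path: Path, ttl: float = 30.0,
                 clock: Callable[[], float] = time.time):
        self.marker_path = marker_path
        self.ttl = ttl
        self.clock = clock
        self._cached = False
        self._cached_at = float("-inf")  # first call always hits the disk

    def restart_pending(self) -> bool:
        now = self.clock()
        if now - self._cached_at < self.ttl:
            return self._cached  # may be stale for up to `ttl` seconds
        try:
            result = self.marker_path.is_file()
        except OSError:
            result = False  # never block enforcement on path errors
        self._cached, self._cached_at = result, now
        return result

    def should_skip(self, prompt: str) -> bool:
        # Only reminders that mention nexo_* tools depend on the MCP server;
        # the substring test runs first, so other prompts never touch the disk.
        return "nexo_" in prompt and self.restart_pending()
```

The TTL bounds how long a stale answer can persist in both directions. A marker written mid-session is noticed within 30 seconds, and one removed by a client restart stops suppressing reminders within the same window.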
@@ -530,6 +530,183 @@ json_escape() {
     echo "$1" | sed 's/\\/\\\\/g; s/"/\\"/g; s/ / /g' | tr '\n' ' '
 }
 
+# ============================================================================
+# STUCK CRON REAPER (v7.11.2)
+# ============================================================================
+# Mirror image of the v5.8.1 in-flight detection. The v5.8.1 fix taught the
+# watchdog to leave running jobs alone when their cron_runs row was open
+# (started_at present, ended_at NULL) — that closed the loop where the
+# watchdog kept kickstart -k'ing deep-sleep mid-flight (2026-04-14..17).
+#
+# But the same restraint became the new failure mode: when a wrapper child
+# truly hangs (e.g. headless `claude --bare` blocked on an MCP that flagged
+# `mcp_restart_required`), the row stays open forever, no new tick can run
+# (the next wrapper sees "Another instance running. Skipping"), and the
+# watchdog's only response was WARN. Morning brief, followup runner, and
+# orchestrator-v2 went silent for days because of this.
+#
+# The reaper closes that gap without bringing back the v5.8.1 bug:
+#   * Per-cron threshold via `stuck_after_seconds` in manifest.json.
+#   * Generous default (12h) so legitimate long jobs keep running.
+#   * Override deep-sleep to 8h, sleep/evolution to 4h — well above their
+#     real worst-case so the v5.8.1 incident cannot repeat.
+#   * Reaper sends SIGTERM to the wrapper — its trap (line 187) closes the
+#     cron_runs row exit_code=143 and propagates to the child. Only after
+#     a 10s grace does it escalate to SIGKILL on wrapper + descendants.
+#   * If no wrapper PID is alive (orphan row), the reaper just closes the
+#     row in-band with exit_code=137 so the next tick can run.
+# ============================================================================
+
+STUCK_DEFAULT_SECONDS="${STUCK_DEFAULT_SECONDS:-43200}"  # 12h
+STUCK_KILL_GRACE="${STUCK_KILL_GRACE:-10}"
+TOTAL_REAPED=0
+
+# Skip cron_ids that should never be reaped from inside a watchdog tick.
+# 'watchdog' is us — reaping ourselves would be self-immolation.
+STUCK_REAPER_SKIP="watchdog"
+
+_build_stuck_thresholds_from_manifest() {
+    if [ ! -f "$MANIFEST_FILE" ]; then
+        return
+    fi
+    python3 - "$MANIFEST_FILE" <<'PY' 2>/dev/null
+import json, sys
+try:
+    with open(sys.argv[1]) as f:
+        data = json.load(f)
+except Exception:
+    sys.exit(0)
+for c in data.get('crons', []):
+    cid = c.get('id')
+    th = c.get('stuck_after_seconds')
+    if cid and isinstance(th, (int, float)) and th > 0:
+        print(f"{cid}|{int(th)}")
+PY
+}
+
+STUCK_THRESHOLDS_RAW=""
+_load_stuck_thresholds() {
+    STUCK_THRESHOLDS_RAW=$(_build_stuck_thresholds_from_manifest)
+}
+
+lookup_stuck_threshold() {
+    local cron_id="$1"
+    if [ -z "$STUCK_THRESHOLDS_RAW" ]; then
+        echo "$STUCK_DEFAULT_SECONDS"
+        return
+    fi
+    local line
+    line=$(echo "$STUCK_THRESHOLDS_RAW" | grep "^${cron_id}|" | head -1)
+    if [ -n "$line" ]; then
+        echo "$line" | cut -d'|' -f2
+    else
+        echo "$STUCK_DEFAULT_SECONDS"
+    fi
+}
+
+find_wrapper_pids() {
+    local cron_id="$1"
+    # Match the wrapper's exact arg slot: "nexo-cron-wrapper.sh CRON_ID "
+    # The trailing space prevents prefix collisions (e.g. "morning-agent" vs
+    # a hypothetical "morning-agent-v2").
+    pgrep -f "nexo-cron-wrapper\.sh ${cron_id} " 2>/dev/null
+}
+
+reap_stuck_cron_pids() {
+    local cron_id="$1"
+    local pids
+    pids=$(find_wrapper_pids "$cron_id")
+    if [ -z "$pids" ]; then
+        # No wrapper alive — caller should fall through to in-band row cleanup.
+        return 1
+    fi
+    log_repair "STUCK REAPER: SIGTERM to wrapper PIDs ($cron_id): $(echo "$pids" | tr '\n' ' ')"
+    for pid in $pids; do
+        kill -TERM "$pid" 2>/dev/null || true
+    done
+    # Grace period — the wrapper trap (TERM → forward to child → finalize_row)
+    # needs a few seconds to close the cron_runs row cleanly.
+    local waited=0
+    local still
+    while [ $waited -lt "$STUCK_KILL_GRACE" ]; do
+        sleep 1
+        waited=$((waited + 1))
+        still=$(find_wrapper_pids "$cron_id")
+        [ -z "$still" ] && break
+    done
+    # Escalate to SIGKILL for any survivor (wrapper + descendants).
+    local survivors
+    survivors=$(find_wrapper_pids "$cron_id")
+    if [ -n "$survivors" ]; then
+        log_repair "STUCK REAPER: SIGKILL escalation ($cron_id): $(echo "$survivors" | tr '\n' ' ')"
+        for pid in $survivors; do
+            # Kill descendants first so they don't get reparented to PID 1.
+            pkill -KILL -P "$pid" 2>/dev/null || true
+            kill -KILL "$pid" 2>/dev/null || true
+        done
+        sleep 1
+    fi
+    # Last sanity check.
+    if [ -n "$(find_wrapper_pids "$cron_id")" ]; then
+        log "STUCK REAPER: failed to kill wrapper for $cron_id (still alive after SIGKILL)"
+        return 2
+    fi
+    return 0
+}
+
+finalize_stuck_db_row() {
+    local row_id="$1"
+    local cron_id="$2"
+    [ ! -f "$DB_PATH" ] && return 1
+    sqlite3 "$DB_PATH" "
+        UPDATE cron_runs
+        SET ended_at = strftime('%Y-%m-%d %H:%M:%S','now'),
+            exit_code = 137,
+            summary = 'stuck row reaped by watchdog: wrapper PID gone',
+            error = 'Watchdog STUCK REAPER: orphan in-flight row cleaned up',
+            duration_secs = CAST(strftime('%s','now') - strftime('%s', started_at) AS REAL)
+        WHERE id = $row_id;
+    " 2>/dev/null
+    log_repair "STUCK REAPER: cleaned up zombie cron_runs row id=$row_id ($cron_id)"
+}
+
+run_stuck_reaper() {
+    [ ! -f "$DB_PATH" ] && return 0
+    _load_stuck_thresholds
+    local row_id cron_id age_secs threshold
+    while IFS='|' read -r row_id cron_id age_secs; do
+        [ -z "$row_id" ] && continue
+        [ -z "$cron_id" ] && continue
+        # Skip self and any explicitly-protected cron_ids.
+        case " $STUCK_REAPER_SKIP " in
+            *" $cron_id "*) continue ;;
+        esac
+        threshold=$(lookup_stuck_threshold "$cron_id")
+        if [ "$age_secs" -gt "$threshold" ]; then
+            log "STUCK REAPER: cron_id=$cron_id row_id=$row_id age=${age_secs}s threshold=${threshold}s — reaping"
+            if reap_stuck_cron_pids "$cron_id"; then
+                # Wrapper trap closes the row with exit 143; nothing else to do.
+                TOTAL_REAPED=$((TOTAL_REAPED + 1))
+            else
+                # No wrapper alive (orphan zombie row) — close it in-band so the
+                # next tick of this cron isn't blocked by "Another instance running".
+                finalize_stuck_db_row "$row_id" "$cron_id"
+                TOTAL_REAPED=$((TOTAL_REAPED + 1))
+            fi
+        fi
+    done < <(sqlite3 -separator '|' "$DB_PATH" "
+        SELECT id, cron_id, CAST(strftime('%s','now') - strftime('%s', started_at) AS INTEGER)
+        FROM cron_runs
+        WHERE ended_at IS NULL
+        ORDER BY id DESC;
+    " 2>/dev/null)
+    if [ "$TOTAL_REAPED" -gt 0 ]; then
+        log "STUCK REAPER: complete — reaped $TOTAL_REAPED stuck cron(s)"
+    fi
+}
+
+run_stuck_reaper
+
 # ============================================================================
 # RUN CHECKS
 # ============================================================================
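`run_stuck_reaper` selects every open `cron_runs` row and computes its age inside SQLite. It skips `watchdog`, compares the age against the per-cron threshold, and closes the row in-band whenever `reap_stuck_cron_pids` returns non-zero. The Python sketch below replays that sweep with the stdlib `sqlite3` module. Three parts are assumptions: the reduced schema (only the columns touched here), the `run_stuck_reaper` signature, and the `kill_wrapper` callback standing in for `reap_stuck_cron_pids`.

```python
import sqlite3
from typing import Callable

# Reduced cron_runs schema: only the columns this sketch touches (assumed).
SCHEMA = """
CREATE TABLE cron_runs (
    id INTEGER PRIMARY KEY,
    cron_id TEXT,
    started_at TEXT,
    ended_at TEXT,
    exit_code INTEGER,
    duration_secs REAL
)"""

OPEN_ROWS_SQL = """
SELECT id, cron_id, CAST(strftime('%s','now') - strftime('%s', started_at) AS INTEGER)
FROM cron_runs
WHERE ended_at IS NULL
ORDER BY id DESC"""

CLOSE_ROW_SQL = """
UPDATE cron_runs
SET ended_at = strftime('%Y-%m-%d %H:%M:%S','now'),
    exit_code = 137,
    duration_secs = CAST(strftime('%s','now') - strftime('%s', started_at) AS REAL)
WHERE id = ?"""


def run_stuck_reaper(db: sqlite3.Connection,
                     thresholds: dict[str, int],
                     kill_wrapper: Callable[[str], int],
                     default: int = 43200,
                     skip: frozenset = frozenset({"watchdog"})) -> int:
    """Reap open rows older than their threshold; return how many were reaped.

    kill_wrapper(cron_id) mimics reap_stuck_cron_pids' return codes:
    0 = wrapper terminated (its trap closes the row), 1 = no wrapper alive,
    2 = wrapper survived SIGKILL.
    """
    reaped = 0
    for row_id, cron_id, age in db.execute(OPEN_ROWS_SQL).fetchall():
        if not cron_id or cron_id in skip or age is None:
            continue  # never reap the watchdog itself; ignore malformed rows
        if age <= thresholds.get(cron_id, default):
            continue  # still within its allowed runtime
        if kill_wrapper(cron_id) != 0:
            # Like the shell's else branch: close the row in-band with
            # exit_code 137 (128 + SIGKILL's signal number 9).
            db.execute(CLOSE_ROW_SQL, (row_id,))
        reaped += 1
    db.commit()
    return reaped
```

The age is computed in SQL rather than in the caller. The comparison therefore uses the same clock and timestamp format that wrote `started_at`.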
@@ -1023,6 +1200,7 @@ cat > "$STATUS_JSON" <<JSONEOF
     "warn": $TOTAL_WARN,
     "fail": $TOTAL_FAIL,
     "healed": $TOTAL_HEALED,
+    "reaped": $TOTAL_REAPED,
     "overall": "$OVERALL"
   },
   "launch_agents": [
@@ -1047,7 +1225,7 @@ cat > "$REPORT_TXT" <<REPORTEOF
 ======================================================
 NEXO WATCHDOG REPORT — $TS
 ======================================================
-PASS: $TOTAL_PASS | HEALED: $TOTAL_HEALED | WARN: $TOTAL_WARN | FAIL: $TOTAL_FAIL | TOTAL: $TOTAL
+PASS: $TOTAL_PASS | HEALED: $TOTAL_HEALED | WARN: $TOTAL_WARN | FAIL: $TOTAL_FAIL | REAPED: $TOTAL_REAPED | TOTAL: $TOTAL
 OVERALL: $OVERALL
 ======================================================
 
@@ -1261,4 +1439,4 @@ fi
 # ============================================================================
 # LOG SUMMARY
 # ============================================================================
-log "Complete: PASS=$TOTAL_PASS HEALED=$TOTAL_HEALED WARN=$TOTAL_WARN FAIL=$TOTAL_FAIL"
+log "Complete: PASS=$TOTAL_PASS HEALED=$TOTAL_HEALED WARN=$TOTAL_WARN FAIL=$TOTAL_FAIL REAPED=$TOTAL_REAPED"
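`reap_stuck_cron_pids` escalates in stages. It sends `SIGTERM`, polls once per second for up to `STUCK_KILL_GRACE` (10) seconds, then sends `SIGKILL` to any survivor. By shell convention, 143 is 128 + 15 (SIGTERM) and 137 is 128 + 9 (SIGKILL). The Python sketch below shows the same TERM, grace and KILL sequence for a single child started with `subprocess.Popen`. It is POSIX-only and makes some simplifying assumptions: `terminate_with_grace` is a hypothetical helper, it does not walk descendants the way `pkill -P` does, and it waits with `Popen.wait(timeout=...)` instead of polling `pgrep`.

```python
import signal
import subprocess


def terminate_with_grace(proc: subprocess.Popen, grace: float = 10.0) -> str:
    """Send SIGTERM, wait up to `grace` seconds, then SIGKILL (POSIX sketch)."""
    if proc.poll() is not None:
        return "already-exited"
    proc.send_signal(signal.SIGTERM)  # gives a trap the chance to clean up
    try:
        proc.wait(timeout=grace)
        return "terminated"
    except subprocess.TimeoutExpired:
        proc.kill()  # SIGKILL on POSIX: cannot be caught or ignored
        proc.wait()  # reap the child so it does not linger as a zombie
        return "killed"
```

Python reports a signal death as a negative `returncode` (for example `-15` or `-9`), not the shell's 128 + N form. The watchdog presumably polls `pgrep` because it is not the wrapper's parent and so cannot `wait()` on it.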