nexo-brain 7.9.27 → 7.9.30

@@ -1,6 +1,6 @@
  {
    "name": "nexo-brain",
-   "version": "7.9.27",
+   "version": "7.9.30",
    "description": "Local cognitive runtime for Claude Code \u2014 persistent memory, overnight learning, doctor diagnostics, personal scripts, recovery-aware jobs, startup preflight, and optional dashboard/power helper.",
    "author": {
      "name": "NEXO Brain",
package/README.md CHANGED
@@ -18,7 +18,13 @@
 
  [Watch the overview video](https://nexo-brain.com/watch/) · [Watch on YouTube](https://www.youtube.com/watch?v=i2lkGhKyVqI) · [Open the infographic](https://nexo-brain.com/assets/nexo-brain-infographic-v5.png)
 
- Version `7.9.27` is the current packaged-runtime line. Patch release over `7.9.26`: server startup no longer hangs the MCP `initialize` handshake when legacy followups/reminders still need owner backfill; the synchronous startup migration now runs `--rules-only` and skips the multi-minute `LocalZeroShotClassifier` load, keeping handshake under a few seconds.
+ Version `7.9.30` is the current packaged-runtime line. Patch release over `7.9.29`: hotfix for a missing ``import sys`` in ``src/agent_runner.py`` that ruff F821 caught in CI and blocked the 7.9.29 publish workflow before any npm artifact shipped. ``nexo-brain@7.9.30`` is the first npm release that carries the 7.9.29 override-path hardening.
+
+ Previously in `7.9.29`: hardening pass on the optional LLM endpoint and auth provider override path. The bearer is now passed to the Anthropic SDK via `auth_token` so it lands in the standard `Authorization: Bearer` header (7.9.28 sent it as `X-Api-Key` and any compatible proxy rejected every request with 401). The Brain config directory is resolved on each call instead of cached at import, so LaunchAgent crons that export `NEXO_HOME` via a wrapper now reach the right `~/.nexo/config/`. The `Idempotency-Key` header accepts a caller-provided value so application-level retries reuse the same dedup key. Override mode is strict about its bearer source: if `auth_provider.json` is missing or the helper fails, the call raises `ClassifierUnavailableError` instead of falling back to the operator's real `ANTHROPIC_API_KEY`, which would otherwise leak to the custom proxy as a second header. A new end-to-end test suite drives the real SDK against a local `http.server` and asserts on captured wire headers and body, complementing the SDK-mock unit tests.
+
+ Previously in `7.9.28`: optional override files at `~/.nexo/config/llm_endpoint.json` and `~/.nexo/config/auth_provider.json` let third-party orchestrators redirect Brain's Anthropic SDK calls and delegate bearer token resolution to a local command (analogous to git's `credential.helper`). The same redirection is propagated to every CLI child Brain spawns (deep-sleep, evolution, followup-runner, morning-agent, email-monitor, `nexo chat`) by injecting `ANTHROPIC_BASE_URL` and `ANTHROPIC_API_KEY` into the spawned environment, so headless crons reach the proxy too. An `Idempotency-Key` (UUID4 hex) is attached per request for proxy-side dedup of transparent retries within 24h. Brain libre standalone (no override files) hits `api.anthropic.com` directly with `ANTHROPIC_API_KEY` exactly as before.
+
+ Previously in `7.9.27`: server startup no longer hangs the MCP `initialize` handshake when legacy followups/reminders still need owner backfill — the synchronous startup migration now runs `--rules-only` and skips the multi-minute `LocalZeroShotClassifier` load, keeping handshake under a few seconds.
 
  Previously in `7.9.26`: headless automation prompts now receive the operator-language contract centrally, so reports, diaries, syntheses, followups, escalations, and Deep Sleep-generated memory text follow calibration even when the underlying template is English.
 
@@ -1077,6 +1083,19 @@ Use a personal plugin only when you need a new MCP tool in the runtime surface.
  - **Auto-update is resilient.** NEXO checks for updates on startup. If an update fails, it continues with the current version and notifies you. Local migrations (database schema, configuration) always run. Network updates (git pull) can be disabled by setting `auto_update: false` in `NEXO_HOME/config/schedule.json`.
  - **Secret redaction.** API keys and tokens are stripped before they ever reach memory storage.
 
+ ## Custom LLM endpoint (advanced)
+
+ NEXO Brain reads two optional override files at `~/.nexo/config/`:
+
+ - `llm_endpoint.json` — set a custom Anthropic-compatible base URL.
+ - `auth_provider.json` — delegate bearer token resolution to a local command (analogous to git's `credential.helper`).
+
+ This lets third-party orchestrators — for example an Anthropic-compatible proxy that adds rate limiting, cost accounting, multi-provider failover, or per-team auth — route Brain's LLM calls without modifying its source.
+
+ **If neither file exists, Brain operates exactly as before:** direct call to `https://api.anthropic.com` using `ANTHROPIC_API_KEY` from environment or filesystem. The override path is opt-in.
+
+ When override mode is active, Brain attaches an opaque `Idempotency-Key` to every request so the proxy can dedup transparent retries (24h window) without double-billing. The same redirection applies to every CLI child Brain spawns (deep-sleep, evolution, followup-runner, morning-agent, email-monitor, `nexo chat`): `agent_runner.py` injects `ANTHROPIC_BASE_URL` and `ANTHROPIC_API_KEY` into the spawned environment when override mode is on, so headless crons hit the proxy too — LaunchAgent crons do not inherit env from a UI process. See `docs/api/override-files.md` for the full schema, fallback rules, and an end-to-end example.
+
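As a rough illustration of the two override files described in the new README section, the sketch below writes minimal `llm_endpoint.json` and `auth_provider.json` files into a temporary directory and reads them back through a version-1 gate similar to the one `call_model_raw.py` applies in this diff. The field names (`version`, `anthropic_base_url`, `command`, `args`, `timeout_sec`) are taken from the source comments in the diff; the reader `load_override`, the proxy URL, and the helper path are hypothetical placeholders, and `docs/api/override-files.md` remains the reference for the real schema.

```python
from __future__ import annotations

import json
import tempfile
from pathlib import Path

SUPPORTED_VERSION = 1  # the diff only accepts "version": 1


def load_override(config_dir: Path, filename: str) -> dict | None:
    """Hypothetical reader: return the parsed file, or None if it is
    missing, malformed, not a JSON object, or not version 1."""
    path = config_dir / filename
    if not path.is_file():
        return None
    try:
        cfg = json.loads(path.read_text())
    except (OSError, json.JSONDecodeError):
        return None
    if not isinstance(cfg, dict) or cfg.get("version") != SUPPORTED_VERSION:
        return None
    return cfg


with tempfile.TemporaryDirectory() as tmp:
    config_dir = Path(tmp)  # stands in for ~/.nexo/config/
    (config_dir / "llm_endpoint.json").write_text(json.dumps({
        "version": 1,
        "anthropic_base_url": "https://my-proxy.example.com/api/proxy",
    }))
    (config_dir / "auth_provider.json").write_text(json.dumps({
        "version": 1,
        "command": "/path/to/auth-helper",  # placeholder path
        "args": ["--for", "anthropic"],
        "timeout_sec": 5,
    }))
    endpoint = load_override(config_dir, "llm_endpoint.json")
    auth = load_override(config_dir, "auth_provider.json")
    missing = load_override(config_dir, "does_not_exist.json")

print(endpoint["anthropic_base_url"])
print(auth["command"], auth["args"])
print(missing)
```

A file that is absent or fails the version check is treated the same way here: the override is ignored rather than raising, which matches the "never raises" contract stated for `_read_versioned_config` in the diff.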
  ## The Psychology Behind NEXO Brain
 
  NEXO Brain isn't just engineering — it's applied cognitive psychology:
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
    "name": "nexo-brain",
-   "version": "7.9.27",
+   "version": "7.9.30",
    "mcpName": "io.github.wazionapps/nexo",
    "description": "NEXO Brain — Shared brain for AI agents. Persistent memory, semantic RAG, natural forgetting, metacognitive guard, trust scoring, 150+ MCP tools. Works with Claude Code, Codex, Claude Desktop & any MCP client. 100% local, free.",
    "homepage": "https://nexo-brain.com",
@@ -8,6 +8,7 @@ import paths
  import shlex
  import shutil
  import subprocess
+ import sys
  import tempfile
  import time
  from functools import lru_cache
@@ -374,6 +375,68 @@ def _codex_config_path() -> Path:
      return Path.home() / ".codex" / "config.toml"
 
 
+ def _apply_llm_endpoint_override(env: dict) -> dict:
+     """Redirect the child Anthropic-compatible CLI to the configured proxy
+     when Brain is in override mode (``~/.nexo/config/llm_endpoint.json``
+     present). Standalone runs leave ``env`` untouched, so Brain libre keeps
+     hitting ``api.anthropic.com`` directly with whatever ``ANTHROPIC_API_KEY``
+     the operator already had configured.
+
+     Security guarantee: if override mode is active but ``auth_provider.json``
+     is missing, malformed, or the helper command fails to produce a
+     bearer, the helper does NOT inject ``ANTHROPIC_BASE_URL`` either.
+     Otherwise the spawned CLI would inherit the operator's real
+     ``ANTHROPIC_API_KEY`` (a ``sk-ant-...`` key) from the parent process
+     and send it as the bearer to the custom proxy, leaking the real
+     Anthropic credential to a third party. Fail-closed: the override is
+     skipped completely, and the CLI either runs against
+     ``api.anthropic.com`` with the operator's real key (if also present
+     as standalone fallback) or fails locally.
+
+     The contract is symmetric with what ``call_model_raw.py`` does for SDK
+     direct calls: same files, same precedence, same alias system. The CLI
+     child reads ``ANTHROPIC_BASE_URL`` (Anthropic SDK convention) and
+     ``ANTHROPIC_API_KEY`` from the spawned environment.
+
+     No-op (and silent) when ``call_model_raw`` is unavailable for any
+     reason; the headless surface should never block on this helper.
+     """
+     try:
+         from call_model_raw import (
+             is_override_mode,
+             resolve_api_base_url,
+             resolve_auth_token,
+         )
+     except Exception:
+         return env
+     try:
+         if not is_override_mode():
+             return env
+         bearer = resolve_auth_token()
+         if not bearer:
+             # auth_provider.json missing or failed in override mode.
+             # Do NOT redirect to the proxy with a stale operator key — see
+             # the docstring above. Skip the override entirely and let the
+             # spawn either run standalone or fail explicitly elsewhere.
+             sys.stderr.write(
+                 "[brain] llm_endpoint override active but auth_provider "
+                 "produced no bearer; skipping CLI env injection to avoid "
+                 "leaking a real ANTHROPIC_API_KEY to the proxy.\n"
+             )
+             return env
+         base_url = resolve_api_base_url()
+         if base_url:
+             env["ANTHROPIC_BASE_URL"] = base_url
+         env["ANTHROPIC_API_KEY"] = bearer
+     except Exception:
+         # Override is best-effort: a misconfigured override file must not
+         # crash an automation run that would otherwise have worked in
+         # standalone. The SDK direct path already surfaces config errors
+         # via ClassifierUnavailableError; the CLI path stays defensive.
+         pass
+     return env
+
+
  def _headless_env(env: dict | None = None) -> dict:
      merged = os.environ.copy()
      if env:
@@ -382,7 +445,7 @@ def _headless_env(env: dict | None = None) -> dict:
      merged["NEXO_AUTOMATION"] = "1"
      merged.pop("CLAUDECODE", None)
      merged.pop("CLAUDE_CODE", None)
-     return merged
+     return _apply_llm_endpoint_override(merged)
 
 
  def _load_client_bootstrap_prompt(client: str) -> str:
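The fail-closed rule that `_apply_llm_endpoint_override` documents can be illustrated in isolation. The sketch below is not the shipped function: `apply_override` is a hypothetical stand-in that receives the override state as plain arguments instead of importing `call_model_raw`, and the key, token, and URL values are made up for the demonstration.

```python
def apply_override(env: dict, *, override_active: bool, bearer: str, base_url: str) -> dict:
    """Hypothetical stand-in for the env injection step.

    The child process is redirected only when a bearer was resolved.
    Otherwise the environment is left untouched, so the operator's real
    key is never paired with the proxy URL.
    """
    if not override_active:
        return env  # standalone: nothing to do
    if not bearer:
        return env  # fail closed: no base URL, no key swap
    env["ANTHROPIC_BASE_URL"] = base_url
    env["ANTHROPIC_API_KEY"] = bearer
    return env


parent = {"ANTHROPIC_API_KEY": "sk-ant-real-key"}  # made-up value
proxy_url = "https://my-proxy.example.com/api/proxy"

# Helper produced a token: the child talks to the proxy with the proxy token.
ok = apply_override(dict(parent), override_active=True,
                    bearer="proxy-token", base_url=proxy_url)

# Helper failed: neither variable is changed.
failed = apply_override(dict(parent), override_active=True,
                        bearer="", base_url=proxy_url)

print(ok)
print(failed)
```

In the failed case the child still inherits the real key but keeps the default endpoint, which is the outcome the docstring describes as running against `api.anthropic.com` or failing locally.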
@@ -603,6 +666,7 @@ def run_automation_interactive(
      launch_env = os.environ.copy()
      if env:
          launch_env.update(env)
+     launch_env = _apply_llm_endpoint_override(launch_env)
      cwd_path = Path(_interactive_target_cwd(target))
 
      # Best-effort resonance lookup — interactive sessions do not swap the
@@ -1043,7 +1107,21 @@ def run_automation_prompt(
 
      bare_api_key = ""
      if resolved_bare:
-         bare_api_key = _resolve_anthropic_api_key()
+         # In override mode the bearer was already injected into
+         # run_env by _apply_llm_endpoint_override (proxy token, not
+         # the operator's raw Anthropic key). Reuse it instead of
+         # asking the keychain helper for a real Anthropic key — the
+         # proxy expects its own bearer and would reject the real one.
+         override_bearer = run_env.get("ANTHROPIC_API_KEY", "").strip() if run_env else ""
+         try:
+             from call_model_raw import is_override_mode as _is_override_mode
+             _override_active = _is_override_mode()
+         except Exception:
+             _override_active = False
+         if _override_active and override_bearer:
+             bare_api_key = override_bearer
+         else:
+             bare_api_key = _resolve_anthropic_api_key()
          if not bare_api_key:
              # Silent fallback: we would rather take the slower path
              # than force the caller to fail-closed on an env quirk.
@@ -41,7 +41,11 @@ gap.
  from __future__ import annotations
 
  import json
+ import logging
  import os
+ import subprocess
+ import sys
+ import uuid
  from pathlib import Path
 
 
@@ -65,6 +69,246 @@ _OPENAI_KEY_PATHS = (
      Path.home() / ".codex" / "auth.json",
  )
 
+ # ---------------------------------------------------------------------------
+ # Optional override files (~/.nexo/config/)
+ # ---------------------------------------------------------------------------
+ # Two forward-compatible JSON files let third-party orchestrators (such as an
+ # Anthropic-compatible proxy) redirect the LLM endpoint and delegate token
+ # resolution to a local helper. Pattern is analogous to git's `core.editor`
+ # and `credential.helper`.
+ #
+ #   ~/.nexo/config/llm_endpoint.json
+ #   {
+ #     "version": 1,
+ #     "anthropic_base_url": "https://my-proxy.example.com/api/proxy"
+ #   }
+ #
+ #   ~/.nexo/config/auth_provider.json
+ #   {
+ #     "version": 1,
+ #     "command": "/path/to/auth-helper",
+ #     "args": ["--for", "anthropic"],
+ #     "timeout_sec": 5
+ #   }
+ #
+ # If neither file exists the caller falls back to standalone behaviour:
+ # direct call to api.anthropic.com using ANTHROPIC_API_KEY from environment
+ # or filesystem. NEXO Brain's open-source distribution is unaffected.
+
+ def _resolve_brain_config_dir() -> Path:
+     """Honour ``NEXO_HOME`` so tests, devcontainers and non-default
+     installs (Maria iMac, Codex sandboxes, etc.) hit the right
+     ``config/`` directory. Resolved at every call so a process that
+     sets ``NEXO_HOME`` after this module is imported still picks up
+     the right path on the next request — relevant for LaunchAgent
+     crons that rely on env exported by their wrapper script. Falls
+     back to ``~/.nexo/config/``."""
+     nexo_home = os.environ.get("NEXO_HOME", "").strip()
+     if nexo_home:
+         return Path(nexo_home).expanduser() / "config"
+     return Path.home() / ".nexo" / "config"
+
+
+ # Tests monkeypatch this attribute to redirect overrides to a tmp dir.
+ # Production code MUST NOT read this directly — use ``_brain_config_dir()``.
+ # Default ``None`` lets ``_brain_config_dir()`` fall through to the live
+ # ``_resolve_brain_config_dir()`` so call-time NEXO_HOME changes are honoured.
+ _BRAIN_CONFIG_DIR: Path | None = None
+
+
+ def _brain_config_dir() -> Path:
+     """Production-side resolver. Honours the test monkeypatch hook above
+     when set, otherwise resolves from the live environment on every call."""
+     if _BRAIN_CONFIG_DIR is not None:
+         return _BRAIN_CONFIG_DIR
+     return _resolve_brain_config_dir()
+
+
+ _SUPPORTED_OVERRIDE_VERSION = 1
+ _LLM_ENDPOINT_FILENAME = "llm_endpoint.json"
+ _AUTH_PROVIDER_FILENAME = "auth_provider.json"
+ _DEFAULT_ANTHROPIC_BASE_URL = "https://api.anthropic.com"
+ _DEFAULT_AUTH_PROVIDER_TIMEOUT = 5
+
+ # Internal map: (concrete_model, effort) -> wire alias accepted by an
+ # Anthropic-compatible proxy. ONLY consulted when override mode is active.
+ # Standalone mode never reads this map and keeps using the concrete model.
+ #
+ # Add entries here in lockstep with new tiers added to resonance_tiers.json.
+ # Failing fast on an unmapped (model, effort) is preferable to letting the
+ # proxy reject the request with a 400 — the operator gets a clear local
+ # error instead of a remote one.
+ _CONCRETE_TO_ALIAS: dict[tuple[str, str], str] = {
+     ("claude-opus-4-7[1m]", "max"): "nexo-max",
+     ("claude-opus-4-7[1m]", "xhigh"): "nexo-high",
+     ("claude-opus-4-7[1m]", "high"): "nexo-medium",
+     ("claude-opus-4-7[1m]", "medium"): "nexo-low",
+     ("claude-haiku-4-5-20251001", ""): "nexo-mini",
+ }
+
+
+ def _read_versioned_config(filename: str) -> dict | None:
+     """Load a versioned override file from the Brain config directory.
+
+     Calls ``_brain_config_dir()`` on every invocation so a process that
+     sets ``NEXO_HOME`` after importing the module picks up the new path
+     immediately. Tests can monkeypatch ``_BRAIN_CONFIG_DIR`` to redirect
+     to a tmp dir.
+
+     Returns the dict iff the file exists, parses as JSON and declares
+     ``version: 1``. Any other case (missing, malformed, unsupported version)
+     returns None and emits a stderr warning so operators can see why the
+     override was ignored. Never raises.
+     """
+     path = _brain_config_dir() / filename
+     try:
+         if not path.is_file():
+             return None
+         cfg = json.loads(path.read_text())
+     except (OSError, json.JSONDecodeError) as exc:
+         sys.stderr.write(
+             f"[brain] failed to read override {filename}: {exc}; ignoring\n"
+         )
+         return None
+     if not isinstance(cfg, dict):
+         sys.stderr.write(
+             f"[brain] override {filename} is not a JSON object; ignoring\n"
+         )
+         return None
+     version = cfg.get("version", 0)
+     if version != _SUPPORTED_OVERRIDE_VERSION:
+         sys.stderr.write(
+             f"[brain] override {filename} version {version!r} not supported "
+             f"(expected {_SUPPORTED_OVERRIDE_VERSION}); ignoring\n"
+         )
+         return None
+     return cfg
+
+
+ def resolve_api_base_url() -> str:
+     """Return the Anthropic API base URL.
+
+     Resolution order:
+       1) ``~/.nexo/config/llm_endpoint.json`` with ``anthropic_base_url``.
+       2) ``NEXO_LLM_ENDPOINT`` env var.
+       3) Default ``https://api.anthropic.com`` (standalone).
+     """
+     cfg = _read_versioned_config(_LLM_ENDPOINT_FILENAME)
+     if cfg:
+         url = str(cfg.get("anthropic_base_url", "") or "").strip()
+         if url:
+             return url
+     env_url = os.environ.get("NEXO_LLM_ENDPOINT", "").strip()
+     if env_url:
+         return env_url
+     return _DEFAULT_ANTHROPIC_BASE_URL
+
+
+ def _override_force_disabled() -> bool:
+     # Internal escape hatch used by the test suite and by maintainers when
+     # they need to validate a regression against the upstream Anthropic API
+     # without renaming the override files on disk. Intentionally undocumented
+     # outside the source so that the canonical override-mode contract stays
+     # purely file-driven for everybody else.
+     raw = os.environ.get("NEXO_RAW_ANTHROPIC", "").strip().lower()
+     return raw in ("1", "true", "yes", "on")
+
+
+ def is_override_mode() -> bool:
+     """True iff a valid ``llm_endpoint.json`` is present and selects a custom
+     base URL. The override gate is the file (not an env var) so that
+     env-only configurations remain transparent to standalone callers."""
+     if _override_force_disabled():
+         return False
+     cfg = _read_versioned_config(_LLM_ENDPOINT_FILENAME)
+     if not cfg:
+         return False
+     url = str(cfg.get("anthropic_base_url", "") or "").strip()
+     return bool(url)
+
+
+ def _resolve_auth_provider_token() -> str:
+     """Resolve the bearer token strictly from ``auth_provider.json``.
+
+     Returns the trimmed stdout of the configured command on success.
+     Returns ``""`` if the file is absent, malformed, or the command
+     times out / fails / exits non-zero / produces empty stdout. Never
+     falls back to environment or filesystem keys; that decision is
+     made by the caller based on whether override mode is active.
+     """
+     cfg = _read_versioned_config(_AUTH_PROVIDER_FILENAME)
+     if not cfg:
+         return ""
+     cmd = str(cfg.get("command", "") or "").strip()
+     if not cmd:
+         return ""
+     args_raw = cfg.get("args", []) or []
+     args = [str(a) for a in args_raw if isinstance(a, (str, int, float))]
+     try:
+         timeout_sec = int(cfg.get("timeout_sec", _DEFAULT_AUTH_PROVIDER_TIMEOUT))
+     except (TypeError, ValueError):
+         timeout_sec = _DEFAULT_AUTH_PROVIDER_TIMEOUT
+     try:
+         result = subprocess.run(
+             [cmd, *args],
+             capture_output=True,
+             text=True,
+             timeout=timeout_sec,
+             check=False,
+         )
+     except subprocess.TimeoutExpired as exc:
+         # Learning #294: subprocess timeouts must be captured explicitly so
+         # the operator sees the helper hung instead of a generic
+         # "auth missing" downstream.
+         sys.stderr.write(
+             f"[brain] auth_provider command timed out after {timeout_sec}s: "
+             f"{exc}\n"
+         )
+         return ""
+     except (FileNotFoundError, PermissionError, OSError) as exc:
+         sys.stderr.write(f"[brain] auth_provider command failed: {exc}\n")
+         return ""
+     if result.returncode != 0:
+         stderr_excerpt = (result.stderr or "").strip()[:200]
+         sys.stderr.write(
+             f"[brain] auth_provider command exit={result.returncode}: "
+             f"{stderr_excerpt}\n"
+         )
+         return ""
+     return (result.stdout or "").strip()
+
+
+ def resolve_auth_token() -> str:
+     """Return the bearer token to use against the resolved base URL.
+
+     The resolution depends on whether override mode is active:
+
+     * **Override mode** (``llm_endpoint.json`` valid): the token MUST
+       come from ``auth_provider.json``. Falling back to
+       ``ANTHROPIC_API_KEY`` (a real ``sk-ant-...`` key bound to the
+       operator's Anthropic account) and sending it as the bearer to a
+       third-party proxy would leak that credential. If the helper
+       command fails or is not configured, returns ``""`` so the caller
+       raises ``ClassifierUnavailableError``.
+     * **Standalone mode** (no override file): cascade
+       ``auth_provider.json`` → ``ANTHROPIC_API_KEY`` env →
+       ``~/.claude/anthropic-api-key.txt`` → ``~/.nexo/config/anthropic-api-key.txt``.
+       The legacy fallbacks exist so an operator that scripted bearer
+       resolution via the helper can still rely on the env var when
+       Brain is not redirected anywhere.
+     """
+     if is_override_mode():
+         # Strict: the bearer must come from the configured helper. If
+         # the helper is missing or fails, refuse to authenticate rather
+         # than leak a real Anthropic key to a custom proxy.
+         return _resolve_auth_provider_token()
+
+     # Standalone: helper first (if scripted), env/files otherwise.
+     helper_token = _resolve_auth_provider_token()
+     if helper_token:
+         return helper_token
+     return _resolve_anthropic_key()
+
 
  def _resolve_anthropic_key() -> str:
      env_key = os.environ.get("ANTHROPIC_API_KEY", "").strip()
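The credential-helper pattern that `_resolve_auth_provider_token` implements can be tried without a real helper binary. In the sketch below, `run_helper` is a hypothetical reduction of that function (it skips the config file and the stderr warnings), and the current Python interpreter stands in for the `command` entry of `auth_provider.json`. The three helper invocations are invented for the demonstration.

```python
from __future__ import annotations

import subprocess
import sys


def run_helper(cmd: str, args: list[str], timeout_sec: int = 5) -> str:
    """Return the helper's trimmed stdout, or "" on any failure."""
    try:
        result = subprocess.run(
            [cmd, *args],
            capture_output=True,
            text=True,
            timeout=timeout_sec,
            check=False,
        )
    except subprocess.TimeoutExpired:
        return ""  # helper hung past the timeout
    except OSError:
        return ""  # helper missing or not executable
    if result.returncode != 0:
        return ""  # helper reported an error
    return (result.stdout or "").strip()


# A helper that prints a token, one that exits non-zero, and one that is absent.
good = run_helper(sys.executable, ["-c", "print('proxy-token-123')"])
bad = run_helper(sys.executable, ["-c", "import sys; sys.exit(3)"])
gone = run_helper("/nonexistent/auth-helper", [])

print(repr(good), repr(bad), repr(gone))
```

Every failure mode collapses to an empty string, so the caller decides what an empty token means: in override mode the diff treats it as a hard stop, in standalone mode it falls through to the legacy key lookup.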
@@ -138,28 +382,91 @@ def _extract_openai_text(response) -> str:
      return ""
 
 
+ def _resolve_override_alias(model: str, effort: str) -> str:
+     """In override mode the proxy speaks aliases, not concrete model names.
+     Translate ``(model, effort)`` into the wire alias the proxy validates.
+     Unmapped pairs fail-closed: better to surface a local config error than
+     let the proxy reject the request remotely.
+     """
+     key = (model, effort)
+     alias = _CONCRETE_TO_ALIAS.get(key)
+     if not alias:
+         raise ClassifierUnavailableError(
+             f"override mode: no alias mapped for (model={model!r}, "
+             f"effort={effort!r}); update _CONCRETE_TO_ALIAS in call_model_raw.py"
+         )
+     return alias
+
+
  def _call_anthropic_raw(
      *,
      prompt: str,
      system: str | None,
      model: str,
+     effort: str,
      max_tokens: int,
      temperature: float,
      stop_sequences: list[str],
      timeout: float,
+     idempotency_key: str | None = None,
  ) -> str:
      try:
          import anthropic # type: ignore
      except ImportError as exc:
          raise ClassifierUnavailableError(f"anthropic SDK missing: {exc}") from exc
 
-     api_key = _resolve_anthropic_key()
-     if not api_key:
-         raise ClassifierUnavailableError("anthropic: no ANTHROPIC_API_KEY found")
+     override = is_override_mode()
+     if override:
+         # Proxy mode. The Anthropic SDK distinguishes:
+         #   api_key=...    -> header "X-Api-Key: <value>" (Anthropic-style)
+         #   auth_token=... -> header "Authorization: Bearer <value>" (OAuth-style)
+         # NEXO Desktop and any compatible proxy parse the standard
+         # "Authorization: Bearer" header, so we MUST pass the resolved
+         # bearer through ``auth_token`` — passing it as ``api_key`` would
+         # send "X-Api-Key" which the proxy would reject with 401.
+         wire_model = _resolve_override_alias(model, effort)
+         base_url = resolve_api_base_url()
+         bearer = resolve_auth_token()
+         if not bearer:
+             raise ClassifierUnavailableError(
+                 "anthropic override: no bearer resolved from auth_provider.json; "
+                 "override mode requires a configured auth helper to avoid leaking "
+                 "a real ANTHROPIC_API_KEY to a custom proxy"
+             )
+         # The SDK ``__init__`` resolves ``api_key`` from
+         # ``ANTHROPIC_API_KEY`` whenever the kwarg is ``None`` (the
+         # parameter default). It then sends BOTH ``X-Api-Key`` (from the
+         # env-resolved api_key) and ``Authorization: Bearer`` (from
+         # auth_token) on every request. A custom proxy would see and
+         # potentially log the operator's real ``sk-ant-...`` key. Passing
+         # ``api_key=""`` does not fix it either: the SDK's auth_headers
+         # check is ``if api_key is None`` (strict ``is``, not falsy), so
+         # the empty string still produces an ``X-Api-Key:`` header.
+         # Solution: pop the env var around the constructor call so the
+         # SDK records ``api_key=None`` and skips the X-Api-Key header
+         # entirely. Then restore the original env so we don't break
+         # other code paths in the same Python process.
+         _saved_anthropic_env = os.environ.pop("ANTHROPIC_API_KEY", None)
+         try:
+             client = anthropic.Anthropic(
+                 auth_token=bearer,
+                 base_url=base_url,
+                 timeout=timeout,
+             )
+         finally:
+             if _saved_anthropic_env is not None:
+                 os.environ["ANTHROPIC_API_KEY"] = _saved_anthropic_env
+     else:
+         # Standalone: behaviour identical to pre-V11. No override, no alias
+         # translation, no extra headers — direct hit to api.anthropic.com.
+         wire_model = model
+         api_key = _resolve_anthropic_key()
+         if not api_key:
+             raise ClassifierUnavailableError("anthropic: no ANTHROPIC_API_KEY found")
+         client = anthropic.Anthropic(api_key=api_key, timeout=timeout)
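The pop-and-restore step around the `anthropic.Anthropic(...)` constructor can be read as a small scoped-environment pattern. The sketch below shows that pattern on its own, without the SDK: `env_var_removed` is a hypothetical context manager, not part of the diff, and the key value is made up. It only demonstrates that the variable is invisible inside the block and restored afterwards; whether the SDK then omits `X-Api-Key` is the claim made by the source comment, not something this sketch verifies.

```python
import os
from contextlib import contextmanager


@contextmanager
def env_var_removed(name: str):
    """Temporarily remove `name` from os.environ, restoring it on exit."""
    saved = os.environ.pop(name, None)
    try:
        yield
    finally:
        # Restore only if the variable existed before the block.
        if saved is not None:
            os.environ[name] = saved


os.environ["ANTHROPIC_API_KEY"] = "sk-ant-example"  # made-up value

with env_var_removed("ANTHROPIC_API_KEY"):
    # Code running here cannot pick the key up from the environment.
    seen_inside = os.environ.get("ANTHROPIC_API_KEY")

seen_after = os.environ.get("ANTHROPIC_API_KEY")

print(seen_inside, seen_after)
```

Because `os.environ` is process-global, this pattern is not safe when another thread reads or spawns a child with the same variable at the same moment; the diff keeps the window short by wrapping only the constructor call.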
 
-     client = anthropic.Anthropic(api_key=api_key, timeout=timeout)
      kwargs: dict = {
-         "model": model,
+         "model": wire_model,
          "max_tokens": max_tokens,
          "temperature": temperature,
          "stop_sequences": stop_sequences,
@@ -168,6 +475,21 @@ def _call_anthropic_raw(
      if system:
          kwargs["system"] = system
 
+     if override:
+         # Idempotency-Key: opaque per-request token. The proxy dedups on
+         # (token_id + idempotency_key) for 24h, so network-level retries
+         # do not double-bill the user. The caller is encouraged to pass
+         # an explicit ``idempotency_key`` and reuse it across application-
+         # level retries (e.g. enforcement_classifier retrying after a
+         # ClassifierUnavailableError) so the proxy treats the second
+         # attempt as a duplicate of the first instead of a brand-new
+         # billable request. If the caller omits it we generate a fresh
+         # UUID4, which still covers SDK-level transparent retries since
+         # the SDK reuses the same ``kwargs`` across them.
+         if idempotency_key is None:
+             idempotency_key = uuid.uuid4().hex
+         kwargs["extra_headers"] = {"Idempotency-Key": idempotency_key}
+
      try:
          response = client.messages.create(**kwargs)
      except anthropic.APITimeoutError as exc:
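To make the dedup argument in the `Idempotency-Key` comment concrete, the sketch below simulates a proxy that bills once per distinct key and a caller that reuses one key across application-level retries. Everything here is hypothetical: `FakeProxy` and `call_with_retries` are illustration-only names, and the real proxy's dedup rule (the source mentions `token_id + idempotency_key` over 24h) is reduced to a plain dictionary lookup.

```python
import uuid


class FakeProxy:
    """Hypothetical proxy that bills once per distinct Idempotency-Key."""

    def __init__(self, fail_first: int = 1):
        self.seen = {}          # idempotency key -> cached answer
        self.billed = 0         # number of billable upstream calls
        self._failures_left = fail_first

    def request(self, headers: dict) -> str:
        key = headers["Idempotency-Key"]
        if key not in self.seen:
            self.billed += 1            # first time this key is seen
            self.seen[key] = "yes"
        if self._failures_left > 0:
            self._failures_left -= 1    # simulate a lost response
            raise TimeoutError("response lost in transit")
        return self.seen[key]


def call_with_retries(proxy: FakeProxy, attempts: int = 3) -> str:
    key = uuid.uuid4().hex  # generated once, reused for every attempt
    for _ in range(attempts):
        try:
            return proxy.request({"Idempotency-Key": key})
        except TimeoutError:
            continue
    raise RuntimeError("all attempts failed")


proxy = FakeProxy(fail_first=1)
answer = call_with_retries(proxy)
print(answer, proxy.billed)
```

Generating the key inside the retry loop instead would make each attempt look like a new request to the proxy, which is the double-billing case the new `idempotency_key` parameter is meant to avoid.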
@@ -247,21 +569,32 @@ def call_model_raw(
      stop_sequences: list[str] | None = None,
      timeout: float = 10.0,
      system: str | None = None,
+     idempotency_key: str | None = None,
  ) -> str:
      """Run a single short LLM completion for enforcement-class classification.
 
      Parameters follow the Fase 2 plan doc 1 spec:
 
-         prompt         — the user-role text (English or the model's default).
-         tier           — resonance tier; default "muy_bajo" → Haiku / gpt-5.4-mini.
-         caller         — resonance caller label. Must be registered in
-                          resonance_map.SYSTEM_OWNED_CALLERS. Default
-                          "enforcer_classifier".
-         max_tokens     — hard cap on output tokens. Default 3 (yes/no only).
-         temperature    — sampling temperature. Default 0.0 (deterministic).
-         stop_sequences — early-stop strings. Default ["\\n", ".", " "].
-         timeout        — per-request timeout in seconds. Default 10.0.
-         system         — optional system prompt. Default None (provider default).
+         prompt          — the user-role text (English or the model's default).
+         tier            — resonance tier; default "muy_bajo" → Haiku / gpt-5.4-mini.
+         caller          — resonance caller label. Must be registered in
+                           resonance_map.SYSTEM_OWNED_CALLERS. Default
+                           "enforcer_classifier".
+         max_tokens      — hard cap on output tokens. Default 3 (yes/no only).
+         temperature     — sampling temperature. Default 0.0 (deterministic).
+         stop_sequences  — early-stop strings. Default ["\\n", ".", " "].
+         timeout         — per-request timeout in seconds. Default 10.0.
+         system          — optional system prompt. Default None (provider default).
+         idempotency_key — optional opaque token attached as
+                           ``Idempotency-Key`` header in override mode. Reuse
+                           the same value across application-level retries
+                           (e.g. when the caller catches
+                           ``ClassifierUnavailableError`` and tries again)
+                           so the proxy treats the retry as a duplicate of
+                           the first request and does not double-bill.
+                           Ignored in standalone mode. If omitted in
+                           override mode, a fresh UUID4 is generated which
+                           still covers transparent SDK-level retries.
 
      Returns the raw text response, trimmed. The CALLER is responsible for
      parsing yes/no — the "triple reinforcement" (prompt strict, max_tokens
@@ -301,7 +634,7 @@ def call_model_raw(
          raise ClassifierUnavailableError("automation_backend=none")
 
      try:
-         model, _effort = resolve_model_and_effort(
+         model, effort = resolve_model_and_effort(
              caller=caller,
              backend=backend,
              explicit_tier=tier,
@@ -320,10 +653,12 @@ def call_model_raw(
              prompt=prompt,
              system=system,
              model=model,
+             effort=effort,
              max_tokens=max_tokens,
              temperature=temperature,
              stop_sequences=stop_sequences,
              timeout=timeout,
+             idempotency_key=idempotency_key,
          )
      if backend == CLIENT_CODEX:
          return _call_openai_raw(
@@ -339,4 +674,10 @@ def call_model_raw(
      raise ClassifierUnavailableError(f"unsupported backend: {backend}")
 
 
- __all__ = ["call_model_raw", "ClassifierUnavailableError"]
+ __all__ = [
+     "call_model_raw",
+     "ClassifierUnavailableError",
+     "is_override_mode",
+     "resolve_api_base_url",
+     "resolve_auth_token",
+ ]