nexo-brain 7.9.28 → 7.9.31

@@ -1,6 +1,6 @@
  {
    "name": "nexo-brain",
-   "version": "7.9.28",
+   "version": "7.9.31",
    "description": "Local cognitive runtime for Claude Code \u2014 persistent memory, overnight learning, doctor diagnostics, personal scripts, recovery-aware jobs, startup preflight, and optional dashboard/power helper.",
    "author": {
      "name": "NEXO Brain",
package/README.md CHANGED
@@ -18,7 +18,13 @@
 
  [Watch the overview video](https://nexo-brain.com/watch/) · [Watch on YouTube](https://www.youtube.com/watch?v=i2lkGhKyVqI) · [Open the infographic](https://nexo-brain.com/assets/nexo-brain-infographic-v5.png)
 
- Version `7.9.28` is the current packaged-runtime line. Patch release over `7.9.27`: optional override files at `~/.nexo/config/llm_endpoint.json` and `~/.nexo/config/auth_provider.json` let third-party orchestrators redirect Brain's Anthropic SDK calls and delegate bearer token resolution to a local command (analogous to git's `credential.helper`). The same redirection is propagated to every CLI child Brain spawns (deep-sleep, evolution, followup-runner, morning-agent, email-monitor, `nexo chat`) by injecting `ANTHROPIC_BASE_URL` and `ANTHROPIC_API_KEY` into the spawned environment, so headless crons reach the proxy too. An `Idempotency-Key` (UUID4 hex) is attached per request for proxy-side dedup of transparent retries within 24h. Brain libre standalone (no override files) hits `api.anthropic.com` directly with `ANTHROPIC_API_KEY` exactly as before.
+ Version `7.9.31` is the current packaged-runtime line. Patch release over `7.9.30`: fixes a wire-level bug where ``call_model_raw`` was sending ``stop_sequences=["\n", ".", " "]`` by default, which the current Anthropic Messages API rejects with HTTP 400 ``each stop sequence must contain non-whitespace``. The default is now ``None`` (no ``stop_sequences`` field sent) since ``max_tokens=3`` already caps the yes/no classifier output. A local guard rejects whitespace-only caller values up front so the error shows where the caller is, not as a remote 400. Also removes an internal design document that did not belong in the open-source distribution.
+
+ Previously in `7.9.30`: hotfix for a missing ``import sys`` in ``src/agent_runner.py`` that ruff F821 caught in CI and blocked the 7.9.29 publish workflow before any npm artifact shipped. ``nexo-brain@7.9.30`` is the first npm release that carries the 7.9.29 override-path hardening.
+
+ Previously in `7.9.29`: hardening pass on the optional LLM endpoint and auth provider override path. The bearer is now passed to the Anthropic SDK via `auth_token` so it lands in the standard `Authorization: Bearer` header (7.9.28 sent it as `X-Api-Key` and any compatible proxy rejected every request with 401). The Brain config directory is resolved on each call instead of cached at import, so LaunchAgent crons that export `NEXO_HOME` via a wrapper now reach the right `~/.nexo/config/`. The `Idempotency-Key` header accepts a caller-provided value so application-level retries reuse the same dedup key. Override mode is strict about its bearer source: if `auth_provider.json` is missing or the helper fails, the call raises `ClassifierUnavailableError` instead of falling back to the operator's real `ANTHROPIC_API_KEY`, which would otherwise leak to the custom proxy as a second header. A new end-to-end test suite drives the real SDK against a local `http.server` and asserts on captured wire headers and body, complementing the SDK-mock unit tests.
+
+ Previously in `7.9.28`: optional override files at `~/.nexo/config/llm_endpoint.json` and `~/.nexo/config/auth_provider.json` let third-party orchestrators redirect Brain's Anthropic SDK calls and delegate bearer token resolution to a local command (analogous to git's `credential.helper`). The same redirection is propagated to every CLI child Brain spawns (deep-sleep, evolution, followup-runner, morning-agent, email-monitor, `nexo chat`) by injecting `ANTHROPIC_BASE_URL` and `ANTHROPIC_API_KEY` into the spawned environment, so headless crons reach the proxy too. An `Idempotency-Key` (UUID4 hex) is attached per request for proxy-side dedup of transparent retries within 24h. Brain libre standalone (no override files) hits `api.anthropic.com` directly with `ANTHROPIC_API_KEY` exactly as before.
 
  Previously in `7.9.27`: server startup no longer hangs the MCP `initialize` handshake when legacy followups/reminders still need owner backfill — the synchronous startup migration now runs `--rules-only` and skips the multi-minute `LocalZeroShotClassifier` load, keeping handshake under a few seconds.
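
The `auth_provider.json` helper contract mentioned in the release notes above can be illustrated with a minimal sketch. The field names `version`, `command`, `args`, and `timeout_sec` are taken from the Python hunks later in this diff; the function names, the temporary directory, and the demo helper command (a Python one-liner that prints a fake token) are assumptions made purely for illustration and are not part of the package.

```python
import json
import subprocess
import sys
import tempfile
from pathlib import Path


def write_auth_provider(config_dir: Path, command: str, args: list, timeout_sec: int = 5) -> Path:
    """Write a hypothetical auth_provider.json declaring ``version: 1``."""
    config_dir.mkdir(parents=True, exist_ok=True)
    path = config_dir / "auth_provider.json"
    payload = {"version": 1, "command": command, "args": args, "timeout_sec": timeout_sec}
    path.write_text(json.dumps(payload, indent=2))
    return path


def run_helper(path: Path) -> str:
    """Run the configured command and return its trimmed stdout, or "" on failure."""
    cfg = json.loads(path.read_text())
    try:
        result = subprocess.run(
            [cfg["command"], *cfg["args"]],
            capture_output=True,
            text=True,
            timeout=cfg["timeout_sec"],
            check=False,
        )
    except (subprocess.TimeoutExpired, OSError):
        return ""
    return result.stdout.strip() if result.returncode == 0 else ""


# Demo: the "helper" is just the current interpreter printing a fake token.
demo_dir = Path(tempfile.mkdtemp()) / "config"
demo_path = write_auth_provider(demo_dir, sys.executable, ["-c", "print('demo-bearer-token')"])
demo_token = run_helper(demo_path)
```

In a real setup the command would presumably be whatever credential helper the orchestrator ships; the sketch only shows the shape of the file and the "trimmed stdout or empty string" behaviour.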
 
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
    "name": "nexo-brain",
-   "version": "7.9.28",
+   "version": "7.9.31",
    "mcpName": "io.github.wazionapps/nexo",
    "description": "NEXO Brain — Shared brain for AI agents. Persistent memory, semantic RAG, natural forgetting, metacognitive guard, trust scoring, 150+ MCP tools. Works with Claude Code, Codex, Claude Desktop & any MCP client. 100% local, free.",
    "homepage": "https://nexo-brain.com",
@@ -8,6 +8,7 @@ import paths
  import shlex
  import shutil
  import subprocess
+ import sys
  import tempfile
  import time
  from functools import lru_cache
@@ -381,6 +382,17 @@ def _apply_llm_endpoint_override(env: dict) -> dict:
      hitting ``api.anthropic.com`` directly with whatever ``ANTHROPIC_API_KEY``
      the operator already had configured.
 
+     Security guarantee: if override mode is active but ``auth_provider.json``
+     is missing, malformed, or the helper command fails to produce a
+     bearer, the helper does NOT inject ``ANTHROPIC_BASE_URL`` either.
+     Otherwise the spawned CLI would inherit the operator's real
+     ``ANTHROPIC_API_KEY`` (a ``sk-ant-...`` key) from the parent process
+     and send it as the bearer to the custom proxy, leaking the real
+     Anthropic credential to a third party. Fail-closed: the override is
+     skipped completely, and the CLI either runs against
+     ``api.anthropic.com`` with the operator's real key (if also present
+     as standalone fallback) or fails locally.
+
      The contract is symmetric with what ``call_model_raw.py`` does for SDK
      direct calls: same files, same precedence, same alias system. The CLI
      child reads ``ANTHROPIC_BASE_URL`` (Anthropic SDK convention) and
@@ -400,12 +412,22 @@ def _apply_llm_endpoint_override(env: dict) -> dict:
      try:
          if not is_override_mode():
              return env
+         bearer = resolve_auth_token()
+         if not bearer:
+             # auth_provider.json missing or failed in override mode.
+             # Do NOT redirect to the proxy with a stale operator key — see
+             # the docstring above. Skip the override entirely and let the
+             # spawn either run standalone or fail explicitly elsewhere.
+             sys.stderr.write(
+                 "[brain] llm_endpoint override active but auth_provider "
+                 "produced no bearer; skipping CLI env injection to avoid "
+                 "leaking a real ANTHROPIC_API_KEY to the proxy.\n"
+             )
+             return env
          base_url = resolve_api_base_url()
          if base_url:
              env["ANTHROPIC_BASE_URL"] = base_url
-         bearer = resolve_auth_token()
-         if bearer:
-             env["ANTHROPIC_API_KEY"] = bearer
+         env["ANTHROPIC_API_KEY"] = bearer
      except Exception:
          # Override is best-effort: a misconfigured override file must not
          # crash an automation run that would otherwise have worked in
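
The fail-closed ordering in the hunk above (resolve the bearer first, inject nothing if it is empty) can be sketched in isolation. This is a simplified stand-in rather than the package's function: `apply_override` and its explicit parameters are hypothetical, and the key and URL values are placeholders.

```python
def apply_override(env: dict, override_active: bool, bearer: str, base_url: str) -> dict:
    """Hypothetical stand-in for the CLI env injection step.

    Returns ``env`` untouched unless override mode is active AND a bearer
    was resolved, so a parent ``ANTHROPIC_API_KEY`` is never paired with a
    proxy base URL.
    """
    if not override_active:
        return env
    if not bearer:
        # Fail closed: no bearer means no redirection at all.
        return env
    if base_url:
        env["ANTHROPIC_BASE_URL"] = base_url
    env["ANTHROPIC_API_KEY"] = bearer
    return env


operator_env = {"ANTHROPIC_API_KEY": "sk-ant-operator-key"}  # placeholder value

# Helper produced nothing: the environment is left exactly as it was.
skipped = apply_override(dict(operator_env), True, "", "http://127.0.0.1:8080")

# Helper produced a bearer: both variables point at the proxy.
redirected = apply_override(dict(operator_env), True, "proxy-bearer", "http://127.0.0.1:8080")
```

The point of the ordering is that the two variables are written together or not at all; writing `ANTHROPIC_BASE_URL` before checking the bearer is what would expose the operator key to the proxy.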
@@ -98,14 +98,32 @@ _OPENAI_KEY_PATHS = (
  def _resolve_brain_config_dir() -> Path:
      """Honour ``NEXO_HOME`` so tests, devcontainers and non-default
      installs (Maria iMac, Codex sandboxes, etc.) hit the right
-     ``config/`` directory. Falls back to ``~/.nexo/config/``."""
+     ``config/`` directory. Resolved at every call so a process that
+     sets ``NEXO_HOME`` after this module is imported still picks up
+     the right path on the next request — relevant for LaunchAgent
+     crons that rely on env exported by their wrapper script. Falls
+     back to ``~/.nexo/config/``."""
      nexo_home = os.environ.get("NEXO_HOME", "").strip()
      if nexo_home:
          return Path(nexo_home).expanduser() / "config"
      return Path.home() / ".nexo" / "config"
 
 
- _BRAIN_CONFIG_DIR = _resolve_brain_config_dir()
+ # Tests monkeypatch this attribute to redirect overrides to a tmp dir.
+ # Production code MUST NOT read this directly — use ``_brain_config_dir()``.
+ # Default ``None`` lets ``_brain_config_dir()`` fall through to the live
+ # ``_resolve_brain_config_dir()`` so call-time NEXO_HOME changes are honoured.
+ _BRAIN_CONFIG_DIR: Path | None = None
+
+
+ def _brain_config_dir() -> Path:
+     """Production-side resolver. Honours the test monkeypatch hook above
+     when set, otherwise resolves from the live environment on every call."""
+     if _BRAIN_CONFIG_DIR is not None:
+         return _BRAIN_CONFIG_DIR
+     return _resolve_brain_config_dir()
+
+
  _SUPPORTED_OVERRIDE_VERSION = 1
  _LLM_ENDPOINT_FILENAME = "llm_endpoint.json"
  _AUTH_PROVIDER_FILENAME = "auth_provider.json"
@@ -132,17 +150,17 @@ _CONCRETE_TO_ALIAS: dict[tuple[str, str], str] = {
  def _read_versioned_config(filename: str) -> dict | None:
      """Load a versioned override file from the Brain config directory.
 
-     The directory is resolved at call time (not module import time) so
-     tests can monkeypatch ``_BRAIN_CONFIG_DIR`` and so a process that
-     sets ``NEXO_HOME`` after importing the module still picks up the
-     right path on the first real call.
+     Calls ``_brain_config_dir()`` on every invocation so a process that
+     sets ``NEXO_HOME`` after importing the module picks up the new path
+     immediately. Tests can monkeypatch ``_BRAIN_CONFIG_DIR`` to redirect
+     to a tmp dir.
 
      Returns the dict iff the file exists, parses as JSON and declares
      ``version: 1``. Any other case (missing, malformed, unsupported version)
      returns None and emits a stderr warning so operators can see why the
      override was ignored. Never raises.
      """
-     path = _BRAIN_CONFIG_DIR / filename
+     path = _brain_config_dir() / filename
      try:
          if not path.is_file():
              return None
@@ -209,61 +227,86 @@ def is_override_mode() -> bool:
      return bool(url)
 
 
- def resolve_auth_token() -> str:
-     """Return the bearer token to use against the resolved base URL.
+ def _resolve_auth_provider_token() -> str:
+     """Resolve the bearer token strictly from ``auth_provider.json``.
 
-     Resolution order:
-       1) ``~/.nexo/config/auth_provider.json`` ``command`` (subprocess
-          stdout, trimmed). Honours ``timeout_sec`` (default 5). Falls
-          through to (2) on any failure.
-       2) ``ANTHROPIC_API_KEY`` env var.
-       3) Legacy filesystem fallbacks (``_ANTHROPIC_KEY_PATHS``).
-
-     Returns an empty string if nothing resolves; the caller raises
-     ``ClassifierUnavailableError`` so the failure surfaces explicitly.
+     Returns the trimmed stdout of the configured command on success.
+     Returns ``""`` if the file is absent, malformed, or the command
+     times out / fails / exits non-zero / produces empty stdout. Never
+     falls back to environment or filesystem keys; that decision is
+     made by the caller based on whether override mode is active.
      """
      cfg = _read_versioned_config(_AUTH_PROVIDER_FILENAME)
-     if cfg:
-         cmd = str(cfg.get("command", "") or "").strip()
-         if cmd:
-             args_raw = cfg.get("args", []) or []
-             args = [str(a) for a in args_raw if isinstance(a, (str, int, float))]
-             try:
-                 timeout_sec = int(cfg.get("timeout_sec", _DEFAULT_AUTH_PROVIDER_TIMEOUT))
-             except (TypeError, ValueError):
-                 timeout_sec = _DEFAULT_AUTH_PROVIDER_TIMEOUT
-             try:
-                 result = subprocess.run(
-                     [cmd, *args],
-                     capture_output=True,
-                     text=True,
-                     timeout=timeout_sec,
-                     check=False,
-                 )
-             except subprocess.TimeoutExpired as exc:
-                 # Learning #294: subprocess timeouts must be captured
-                 # explicitly so the operator sees the helper hung instead
-                 # of a generic "auth missing" downstream.
-                 sys.stderr.write(
-                     f"[brain] auth_provider command timed out after {timeout_sec}s: "
-                     f"{exc}; falling back to env\n"
-                 )
-             except (FileNotFoundError, PermissionError, OSError) as exc:
-                 sys.stderr.write(
-                     f"[brain] auth_provider command failed: {exc}; falling back to env\n"
-                 )
-             else:
-                 if result.returncode == 0:
-                     token = (result.stdout or "").strip()
-                     if token:
-                         return token
-                 else:
-                     stderr_excerpt = (result.stderr or "").strip()[:200]
-                     sys.stderr.write(
-                         f"[brain] auth_provider command exit={result.returncode}: "
-                         f"{stderr_excerpt}; falling back to env\n"
-                     )
+     if not cfg:
+         return ""
+     cmd = str(cfg.get("command", "") or "").strip()
+     if not cmd:
+         return ""
+     args_raw = cfg.get("args", []) or []
+     args = [str(a) for a in args_raw if isinstance(a, (str, int, float))]
+     try:
+         timeout_sec = int(cfg.get("timeout_sec", _DEFAULT_AUTH_PROVIDER_TIMEOUT))
+     except (TypeError, ValueError):
+         timeout_sec = _DEFAULT_AUTH_PROVIDER_TIMEOUT
+     try:
+         result = subprocess.run(
+             [cmd, *args],
+             capture_output=True,
+             text=True,
+             timeout=timeout_sec,
+             check=False,
+         )
+     except subprocess.TimeoutExpired as exc:
+         # Learning #294: subprocess timeouts must be captured explicitly so
+         # the operator sees the helper hung instead of a generic
+         # "auth missing" downstream.
+         sys.stderr.write(
+             f"[brain] auth_provider command timed out after {timeout_sec}s: "
+             f"{exc}\n"
+         )
+         return ""
+     except (FileNotFoundError, PermissionError, OSError) as exc:
+         sys.stderr.write(f"[brain] auth_provider command failed: {exc}\n")
+         return ""
+     if result.returncode != 0:
+         stderr_excerpt = (result.stderr or "").strip()[:200]
+         sys.stderr.write(
+             f"[brain] auth_provider command exit={result.returncode}: "
+             f"{stderr_excerpt}\n"
+         )
+         return ""
+     return (result.stdout or "").strip()
+
+
+ def resolve_auth_token() -> str:
+     """Return the bearer token to use against the resolved base URL.
 
+     The resolution depends on whether override mode is active:
+
+     * **Override mode** (``llm_endpoint.json`` valid): the token MUST
+       come from ``auth_provider.json``. Falling back to
+       ``ANTHROPIC_API_KEY`` (a real ``sk-ant-...`` key bound to the
+       operator's Anthropic account) and sending it as the bearer to a
+       third-party proxy would leak that credential. If the helper
+       command fails or is not configured, returns ``""`` so the caller
+       raises ``ClassifierUnavailableError``.
+     * **Standalone mode** (no override file): cascade
+       ``auth_provider.json`` → ``ANTHROPIC_API_KEY`` env →
+       ``~/.claude/anthropic-api-key.txt`` → ``~/.nexo/config/anthropic-api-key.txt``.
+       The legacy fallbacks exist so an operator that scripted bearer
+       resolution via the helper can still rely on the env var when
+       Brain is not redirected anywhere.
+     """
+     if is_override_mode():
+         # Strict: the bearer must come from the configured helper. If
+         # the helper is missing or fails, refuse to authenticate rather
+         # than leak a real Anthropic key to a custom proxy.
+         return _resolve_auth_provider_token()
+
+     # Standalone: helper first (if scripted), env/files otherwise.
+     helper_token = _resolve_auth_provider_token()
+     if helper_token:
+         return helper_token
      return _resolve_anthropic_key()
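
The two resolution modes in the rewritten `resolve_auth_token` reduce to a small decision rule, sketched below with the helper and the fallback passed in as plain callables. `resolve_token` and the lambdas are hypothetical and only model the control flow; they do not read any files or environment variables.

```python
from typing import Callable, List


def resolve_token(
    override_active: bool,
    helper: Callable[[], str],
    fallback: Callable[[], str],
) -> str:
    """Strict in override mode, cascading in standalone mode."""
    if override_active:
        # Strict: only the helper may supply the bearer; "" means "refuse".
        return helper()
    # Standalone: helper first, then env/file-style fallbacks.
    return helper() or fallback()


fallback_calls: List[str] = []


def operator_key() -> str:
    """Pretend fallback that records whether it was consulted."""
    fallback_calls.append("called")
    return "sk-ant-operator-key"  # placeholder value


# Override mode with a failing helper: empty result, fallback never consulted.
strict_failed = resolve_token(True, lambda: "", operator_key)
calls_after_strict = len(fallback_calls)

# Standalone mode with a failing helper: the fallback key is used.
standalone = resolve_token(False, lambda: "", operator_key)

# Standalone mode with a working helper: the helper wins.
standalone_helper = resolve_token(False, lambda: "helper-token", operator_key)
```

The empty string in override mode is what lets the caller raise `ClassifierUnavailableError` instead of silently authenticating with the wrong credential.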
 
 
@@ -365,6 +408,7 @@ def _call_anthropic_raw(
      temperature: float,
      stop_sequences: list[str],
      timeout: float,
+     idempotency_key: str | None = None,
  ) -> str:
      try:
          import anthropic  # type: ignore
@@ -373,21 +417,45 @@ def _call_anthropic_raw(
 
      override = is_override_mode()
      if override:
-         # Proxy mode: resolve bearer via auth_provider + env fallbacks,
-         # redirect base_url, translate concrete model to wire alias, and
-         # attach an Idempotency-Key so the proxy can dedup retries.
+         # Proxy mode. The Anthropic SDK distinguishes:
+         #   api_key=...    -> header "X-Api-Key: <value>" (Anthropic-style)
+         #   auth_token=... -> header "Authorization: Bearer <value>" (OAuth-style)
+         # NEXO Desktop and any compatible proxy parse the standard
+         # "Authorization: Bearer" header, so we MUST pass the resolved
+         # bearer through ``auth_token`` — passing it as ``api_key`` would
+         # send "X-Api-Key" which the proxy would reject with 401.
          wire_model = _resolve_override_alias(model, effort)
          base_url = resolve_api_base_url()
-         api_key = resolve_auth_token()
-         if not api_key:
+         bearer = resolve_auth_token()
+         if not bearer:
              raise ClassifierUnavailableError(
-                 "anthropic override: no bearer resolved (auth_provider and env both empty)"
+                 "anthropic override: no bearer resolved from auth_provider.json; "
+                 "override mode requires a configured auth helper to avoid leaking "
+                 "a real ANTHROPIC_API_KEY to a custom proxy"
              )
-         client = anthropic.Anthropic(
-             api_key=api_key,
-             base_url=base_url,
-             timeout=timeout,
-         )
+         # The SDK ``__init__`` resolves ``api_key`` from
+         # ``ANTHROPIC_API_KEY`` whenever the kwarg is ``None`` (the
+         # parameter default). It then sends BOTH ``X-Api-Key`` (from the
+         # env-resolved api_key) and ``Authorization: Bearer`` (from
+         # auth_token) on every request. A custom proxy would see and
+         # potentially log the operator's real ``sk-ant-...`` key. Passing
+         # ``api_key=""`` does not fix it either: the SDK's auth_headers
+         # check is ``if api_key is None`` (strict ``is``, not falsy), so
+         # the empty string still produces an ``X-Api-Key:`` header.
+         # Solution: pop the env var around the constructor call so the
+         # SDK records ``api_key=None`` and skips the X-Api-Key header
+         # entirely. Then restore the original env so we don't break
+         # other code paths in the same Python process.
+         _saved_anthropic_env = os.environ.pop("ANTHROPIC_API_KEY", None)
+         try:
+             client = anthropic.Anthropic(
+                 auth_token=bearer,
+                 base_url=base_url,
+                 timeout=timeout,
+             )
+         finally:
+             if _saved_anthropic_env is not None:
+                 os.environ["ANTHROPIC_API_KEY"] = _saved_anthropic_env
      else:
          # Standalone: behaviour identical to pre-V11. No override, no alias
          # translation, no extra headers — direct hit to api.anthropic.com.
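
The pop-and-restore step around the client constructor in the hunk above is a general pattern that can be packaged as a context manager. The sketch below is not from the package: `env_var_removed` is a hypothetical helper and `DEMO_API_KEY` is a placeholder variable used so the demo does not touch a real credential. It does not import the SDK; it only shows that the variable is absent inside the block and restored afterwards, even if the body raises.

```python
import os
from contextlib import contextmanager
from typing import Iterator


@contextmanager
def env_var_removed(name: str) -> Iterator[None]:
    """Temporarily remove ``name`` from os.environ, restoring it on exit."""
    saved = os.environ.pop(name, None)
    try:
        yield
    finally:
        # Only restore if the variable existed before entering the block.
        if saved is not None:
            os.environ[name] = saved


os.environ["DEMO_API_KEY"] = "sk-demo"  # placeholder, not a real key

with env_var_removed("DEMO_API_KEY"):
    # A constructor that reads the variable at init time would see nothing here.
    seen_inside = os.environ.get("DEMO_API_KEY")

seen_after = os.environ.get("DEMO_API_KEY")

# Clean up the demo variable so the sketch leaves no trace.
os.environ.pop("DEMO_API_KEY", None)
```

One caveat worth keeping in mind with this pattern: `os.environ` is process-wide, so another thread reading the variable during the window would also see it missing.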
@@ -401,17 +469,33 @@ def _call_anthropic_raw(
          "model": wire_model,
          "max_tokens": max_tokens,
          "temperature": temperature,
-         "stop_sequences": stop_sequences,
          "messages": [{"role": "user", "content": prompt}],
      }
+     if stop_sequences:
+         # Anthropic API rejects whitespace-only stop sequences with
+         # 400 ``each stop sequence must contain non-whitespace``. The
+         # caller-validation in call_model_raw filters these out before
+         # we reach this point; the empty/None case is also covered by
+         # the truthy check above so we omit the field entirely instead
+         # of sending ``stop_sequences: null`` to the wire.
+         kwargs["stop_sequences"] = stop_sequences
      if system:
          kwargs["system"] = system
 
      if override:
-         # Idempotency-Key: opaque per-request token reused on transparent
-         # retries. Proxy dedups on (token_id + idempotency_key) for 24h, so
-         # network-level retries do not double-bill the user.
-         kwargs["extra_headers"] = {"Idempotency-Key": uuid.uuid4().hex}
+         # Idempotency-Key: opaque per-request token. The proxy dedups on
+         # (token_id + idempotency_key) for 24h, so network-level retries
+         # do not double-bill the user. The caller is encouraged to pass
+         # an explicit ``idempotency_key`` and reuse it across application-
+         # level retries (e.g. enforcement_classifier retrying after a
+         # ClassifierUnavailableError) so the proxy treats the second
+         # attempt as a duplicate of the first instead of a brand-new
+         # billable request. If the caller omits it we generate a fresh
+         # UUID4, which still covers SDK-level transparent retries since
+         # the SDK reuses the same ``kwargs`` across them.
+         if idempotency_key is None:
+             idempotency_key = uuid.uuid4().hex
+         kwargs["extra_headers"] = {"Idempotency-Key": idempotency_key}
 
      try:
          response = client.messages.create(**kwargs)
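
The request-building rules in the hunk above (omit `stop_sequences` when empty, attach an `Idempotency-Key` only in override mode, reuse a caller-supplied key) can be sketched without the SDK. `build_request` is a hypothetical function that only assembles the keyword arguments; the prompt text is a placeholder and nothing is sent over the network.

```python
import uuid
from typing import List, Optional


def build_request(
    prompt: str,
    stop_sequences: Optional[List[str]] = None,
    override: bool = False,
    idempotency_key: Optional[str] = None,
) -> dict:
    """Assemble request kwargs following the rules described above."""
    kwargs: dict = {
        "max_tokens": 3,
        "temperature": 0.0,
        "messages": [{"role": "user", "content": prompt}],
    }
    if stop_sequences:
        # None or [] means the field is left out entirely.
        kwargs["stop_sequences"] = stop_sequences
    if override:
        if idempotency_key is None:
            # No caller key: generate a fresh one for this request.
            idempotency_key = uuid.uuid4().hex
        kwargs["extra_headers"] = {"Idempotency-Key": idempotency_key}
    return kwargs


# An application-level retry reuses the same key across both attempts.
retry_key = uuid.uuid4().hex
first_attempt = build_request("Is this a yes?", override=True, idempotency_key=retry_key)
second_attempt = build_request("Is this a yes?", override=True, idempotency_key=retry_key)

# Standalone request: no stop_sequences field and no extra headers.
standalone_request = build_request("Is this a yes?")
```

Because both attempts carry the same header value, a proxy that deduplicates on the key can recognise the second attempt as a repeat of the first, which is the behaviour the comment in the hunk describes.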
@@ -492,21 +576,45 @@ def call_model_raw(
      stop_sequences: list[str] | None = None,
      timeout: float = 10.0,
      system: str | None = None,
+     idempotency_key: str | None = None,
  ) -> str:
      """Run a single short LLM completion for enforcement-class classification.
 
      Parameters follow the Fase 2 plan doc 1 spec:
 
-       prompt         — the user-role text (English or the model's default).
-       tier           — resonance tier; default "muy_bajo" → Haiku / gpt-5.4-mini.
-       caller         — resonance caller label. Must be registered in
-                        resonance_map.SYSTEM_OWNED_CALLERS. Default
-                        "enforcer_classifier".
-       max_tokens     — hard cap on output tokens. Default 3 (yes/no only).
-       temperature    — sampling temperature. Default 0.0 (deterministic).
-       stop_sequences — early-stop strings. Default ["\\n", ".", " "].
-       timeout        — per-request timeout in seconds. Default 10.0.
-       system         — optional system prompt. Default None (provider default).
+       prompt          — the user-role text (English or the model's default).
+       tier            — resonance tier; default "muy_bajo" → Haiku / gpt-5.4-mini.
+       caller          — resonance caller label. Must be registered in
+                         resonance_map.SYSTEM_OWNED_CALLERS. Default
+                         "enforcer_classifier".
+       max_tokens      — hard cap on output tokens. Default 3 (yes/no only).
+       temperature     — sampling temperature. Default 0.0 (deterministic).
+       stop_sequences  — early-stop strings. Default ``None`` (no stop
+                         sequence sent on the wire). Anthropic's API
+                         rejects whitespace-only entries with
+                         ``each stop sequence must contain
+                         non-whitespace`` (HTTP 400), so the previous
+                         default of ``["\\n", ".", " "]`` made every
+                         ``enforcer_classifier`` request fail in
+                         production. ``max_tokens=3`` already serves as
+                         the hard cap for yes/no classification, so a
+                         stop sequence is unnecessary by default.
+                         Callers that want a deterministic stop can
+                         pass e.g. ``["."]``; whitespace-only entries
+                         are rejected locally with
+                         ``ClassifierUnavailableError``.
+       timeout         — per-request timeout in seconds. Default 10.0.
+       system          — optional system prompt. Default None (provider default).
+       idempotency_key — optional opaque token attached as
+                         ``Idempotency-Key`` header in override mode. Reuse
+                         the same value across application-level retries
+                         (e.g. when the caller catches
+                         ``ClassifierUnavailableError`` and tries again)
+                         so the proxy treats the retry as a duplicate of
+                         the first request and does not double-bill.
+                         Ignored in standalone mode. If omitted in
+                         override mode, a fresh UUID4 is generated which
+                         still covers transparent SDK-level retries.
 
      Returns the raw text response, trimmed. The CALLER is responsible for
      parsing yes/no — the "triple reinforcement" (prompt strict, max_tokens
@@ -524,8 +632,22 @@ def call_model_raw(
      Callers MUST catch this and fall back to a safer default. Fase 2 spec
      0.20 is explicit: silence is not obedience. Never fail-open.
      """
-     if stop_sequences is None:
-         stop_sequences = ["\n", ".", " "]
+     if stop_sequences is not None:
+         # Anthropic API: ``each stop sequence must contain
+         # non-whitespace`` (HTTP 400). Surface the configuration error
+         # locally instead of letting Anthropic 400 the request — and,
+         # in override mode, instead of letting the proxy translate that
+         # 400 into a misleading ``503 all_providers_down``.
+         invalid = [
+             repr(s) for s in stop_sequences
+             if not isinstance(s, str) or not s.strip()
+         ]
+         if invalid:
+             raise ClassifierUnavailableError(
+                 "stop_sequences contains whitespace-only or non-string "
+                 f"entries: {', '.join(invalid)}; Anthropic API requires "
+                 "every stop sequence to contain non-whitespace"
+             )
 
      # Local imports to avoid circulars and keep agent_runner.py decoupled.
      from client_preferences import (  # type: ignore
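
The local guard added in the hunk above can be exercised on its own. The sketch below mirrors the filtering expression from the diff but wraps it in a hypothetical `validate_stop_sequences` function and raises a stand-in `StopSequenceError`, since the package's `ClassifierUnavailableError` is not importable here.

```python
from typing import List, Optional


class StopSequenceError(ValueError):
    """Stand-in for the package's ClassifierUnavailableError (assumption)."""


def validate_stop_sequences(stop_sequences: Optional[List[str]]) -> Optional[List[str]]:
    """Reject whitespace-only or non-string entries before any request is made."""
    if stop_sequences is None:
        return None
    invalid = [
        repr(s) for s in stop_sequences
        if not isinstance(s, str) or not s.strip()
    ]
    if invalid:
        raise StopSequenceError(
            "stop_sequences contains whitespace-only or non-string "
            f"entries: {', '.join(invalid)}"
        )
    return stop_sequences


# The old default mixes one valid entry (".") with two whitespace-only ones.
try:
    validate_stop_sequences(["\n", ".", " "])
    old_default_error = ""
except StopSequenceError as exc:
    old_default_error = str(exc)

accepted = validate_stop_sequences(["."])
omitted = validate_stop_sequences(None)
```

Using `repr` in the message is what makes the offending entries visible: a bare newline or space would otherwise be invisible in a log line.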
@@ -570,6 +692,7 @@ def call_model_raw(
              temperature=temperature,
              stop_sequences=stop_sequences,
              timeout=timeout,
+             idempotency_key=idempotency_key,
          )
      if backend == CLIENT_CODEX:
          return _call_openai_raw(