npm - comisai - Versions diffs - 1.0.27 → 1.0.30 - Mend

comisai 1.0.27 → 1.0.30

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (246)

package/node_modules/@comis/agent/dist/bootstrap/sections/tool-descriptions.js CHANGED

@@ -18,6 +18,15 @@
  * @module
  */
 import { getToolMetadata } from "@comis/core";
+import { getProviders } from "@mariozechner/pi-ai";
+// ---------------------------------------------------------------------------
+// Layer 1D (260430-vwt) -- live native-catalog provider list
+//
+// Computed once at module load time. Used by the providers_manage TOOL_GUIDE
+// "Built-in vs Custom Provider Check" block below so the text reflects
+// pi-ai's current native catalog without per-version source patches.
+// ---------------------------------------------------------------------------
+const _builtInProvidersList = [...getProviders()].sort().join(", ");
 // ---------------------------------------------------------------------------
 // TOOL_SUMMARIES: 5-8 word terse summaries (system prompt orientation)
 // ---------------------------------------------------------------------------
@@ -272,24 +281,68 @@ Present a plan to the user before creating agents in batch.
 Multiple agents can be created in one turn. Customize ALL workspace files for each agent after creation — or use single-call creation above to inline ROLE.md/IDENTITY.md and skip the post-create writes entirely.`,
     providers_manage: `## Provider Configuration Guide
+### Credential Pre-Check (MANDATORY before any provider switch)
+BEFORE storing or switching, you MUST verify the credential exists. Never patch agents.*.provider, agents.*.model, call agents_manage update, call providers_manage create, or use gateway.patch on any agents.*.{provider,model} key without first running this 4-step flow:
+1. **List existing keys** — gateway({ action: "env_list", filter: "<PROVIDER>*" })
+   Examples: filter "OPENROUTER*" / "ANTHROPIC*" / "GROQ*" / "DEEPSEEK*".
+   env_list returns only NAMES, never values — safe to call proactively.
+   Use the canonical name: <PROVIDER_UPPER>_API_KEY for cloud providers
+   (OPENROUTER_API_KEY, ANTHROPIC_API_KEY, GROQ_API_KEY, DEEPSEEK_API_KEY,
+   GEMINI_API_KEY for google, etc.). If env_list returns a non-canonical
+   name (e.g. "OR_KEY", "MY_OPENROUTER_KEY"), use that matched name verbatim
+   as apiKeyName. Local providers (ollama, lm-studio, vLLM) skip this step
+   entirely (no API key needed).
+2. **Decide based on env_list result:**
+   a. **Match found** — use the matching env name as apiKeyName. Skip to step 4.
+   b. **No match** — ASK THE USER for the API key. Phrase it explicitly:
+      "I don't see a <KEY_NAME> configured. Please share your <Provider>
+      API key (signup link if relevant) and I'll store it before switching."
+      Do NOT proceed without the key. Do NOT invent a fake key. Do NOT
+      silently skip the switch.
+3. **Store the key (only after the user supplies it)** —
+   gateway({ action: "env_set", env_key: "<KEY_NAME>", env_value: "<USER_PROVIDED_KEY>" })
+4. **Now safe to switch** — call providers_manage create (for non-built-in)
+   or proceed directly to agents_manage update (for built-in). See the
+   "Built-in vs Custom Provider Check" and "Switching an Agent's Provider
+   or Model" sections below.
+This rule applies UNCONDITIONALLY across ALL provider-switch paths:
+- agents_manage update with new {provider, model}
+- providers_manage create (the apiKeyName must reference a key already in env)
+- gateway.patch agents.*.provider or agents.*.model
+- providers_manage update changing apiKeyName
+If the agent's primary will use a credential, you verify the credential exists FIRST. Skipping this check is the bug that causes "No API key found for <provider>" failures at the next chat turn — the user sees a generic "An error occurred" message and the bot silently breaks.
 ### Built-in vs Custom Provider Check (MANDATORY first step)
-Before creating a custom provider, check if the model already exists in the built-in catalog. Built-in providers (anthropic, google, openai, groq, mistral, deepseek, cerebras, xai, openrouter) already have their models registered — creating a redundant custom entry is wrong and will be ignored. Use models_manage({ action: "list" }) to see available built-in models.
+Before creating a custom provider, check if the model already exists in the built-in catalog. Built-in providers (${_builtInProvidersList}) already have their models registered — creating a redundant custom entry is wrong and will be ignored. Call models_manage({ action: "list_providers" }) for the live native-catalog list, or models_manage({ action: "list" }) for available models. Prefer list_providers over the static list above when you need an up-to-date roster.
-If the model IS built-in: skip provider creation. Just store the API key (gateway env_set) and switch the agent directly.
+If the model IS built-in: skip provider creation. After credential pre-check passes (above), go straight to agents_manage update with the new provider/model pair (the credential is already in env from pre-check step 3, so apiKeyName resolution succeeds at the next session).
 If the model is NOT built-in: you need a custom provider. Proceed to the steps below, but first gather ALL required configuration.
+### Choosing the \`type\` Field (POST AUTO-PROMOTE FLOW)
+After Layer 1C of the catalog-driven providers redesign, the \`type\` field follows two distinct rules depending on the provider name:
+- **If \`provider_id\` matches a built-in name** (use models_manage list_providers to verify): OMIT \`type\` entirely from the create config. The daemon auto-promotes \`type\` to the native catalog name when \`provider_id\` matches a native entry AND no custom \`baseUrl\` is supplied. Setting \`type:"openai"\` for a built-in name still works (auto-promoted), but omitting it is cleaner.
+- **If \`provider_id\` is a custom OpenAI-compatible proxy** (NVIDIA NIM, Together, Fireworks, etc.) NOT in the native catalog: set \`type:"openai"\` (or whatever wire-format API matches). Auto-promotion does not fire for non-catalog names.
+- **If \`baseUrl\` differs from the native catalog URL** for a built-in name: this signals you want the OpenAI-passthrough shape (custom proxy that masquerades as the built-in). Auto-promotion is suppressed; the entry stays as \`type:"openai"\`.
 ### Information Gathering for Custom Providers
 When creating a non-built-in provider, you MUST have: (1) the API base URL, (2) the exact model ID string, (3) the API protocol type. If the user did not supply all three:
 1. Use web_search to look up the provider's API documentation (search for "<provider name> API base URL" or "<provider name> API docs").
 2. If web search finds the information, use it to fill in the missing fields.
 3. If web search does NOT find the information, ask the user to supply the missing fields before proceeding. Do NOT guess or invent URLs.
-### Credential Workflow
-API keys are NEVER stored in provider config. Always use this two-step process:
-1. Store the API key: gateway({ action: "env_set", env_key: "<KEY_NAME>", env_value: "<key>" })
-2. Create the provider: providers_manage({ action: "create", provider_id: "<name>", config: { type: "openai", baseUrl: "<url>", apiKeyName: "<KEY_NAME>", models: [{ id: "<model>" }] } })
+### Credential Workflow Summary
+API keys are NEVER stored in provider config — they live in env (set via Credential Pre-Check step 3 above) and are referenced by name via apiKeyName. The Credential Pre-Check above is the canonical entry point; this section just documents what gets stored where:
+- Env (~/.comis/.env): the API key value, keyed by name (e.g. OPENROUTER_API_KEY=sk-or-v1-...)
+- providers.entries.<id>.apiKeyName: the env NAME (not the value)
+- SecretManager / setRuntimeApiKey: comis populates this from the env at daemon boot and on hot-reload after env_set; agents never set it directly.
-For local providers (Ollama, LM Studio, vLLM) that don't need API keys, omit apiKeyName.
+For local providers (Ollama, LM Studio, vLLM) that don't need API keys, the Credential Pre-Check is skipped (step 1 noted local providers skip); just call providers_manage create with \`apiKeyName\` omitted.
 ### After Creating a Provider
 Switch an agent to use the new provider:
@@ -301,9 +354,10 @@ To switch an agent to a different provider/model, call agents_manage update with
 \`agents.*.model\` and \`agents.*.provider\` are listed in MUTABLE_CONFIG_OVERRIDES, so the immutability guard does not block the patch.
-**Two preconditions the LLM MUST verify before issuing the update:**
+**Three preconditions the LLM MUST verify before issuing the update:**
   1. The target provider exists as a \`providers.entries.<provider_id>\` key. If it does not, call providers_manage create FIRST (and gateway env_set for the API key if needed). Patching an agent to a provider that has no entry resolves under the wrong provider family at the next session — the original bug.
   2. The model id matches a \`models[].id\` in that provider entry (or is a built-in known to the pi-ai catalog for that provider type). Otherwise \`registry.find(provider, model)\` returns undefined and the next session falls back with a "Model not found" message.
+  3. **Credential pre-check passed** (see top of this guide). The target provider's apiKeyName is non-empty AND \`gateway env_list filter:"<PROVIDER>*"\` confirmed the named secret exists in env. Skipping this step is the bug that causes "No API key found" failures at the next chat turn — verified production repro on 2026-05-01.
 **Timing — the change is NOT hot-applied to the active session.**
 agents_manage update writes through persistToConfig WITHOUT a hot-update callback, which triggers a SIGUSR2 daemon restart (2-second debounce). The new provider/model takes effect on the next session, not the currently-running prompt. Tell the user the switch is queued and will take effect after the daemon settles.
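
The Credential Pre-Check added to the guide above reduces to one decision: find a matching env name and reuse it, or stop and ask the user. The following is a minimal sketch of that decision, not code from the package. `canonicalKeyName`, `decideCredentialStep`, the result shape, and the substring match for non-canonical names are all hypothetical; only the `<PROVIDER_UPPER>_API_KEY` convention, the `GEMINI_API_KEY` case for google, and the keyless local providers come from the guide text.

```javascript
// Hypothetical helper: map a provider id to its canonical env key name.
// The guide names GEMINI_API_KEY for google; others follow <PROVIDER_UPPER>_API_KEY.
function canonicalKeyName(provider) {
  if (provider === "google") return "GEMINI_API_KEY";
  return `${provider.toUpperCase()}_API_KEY`;
}

// Local providers the guide lists as needing no API key.
const KEYLESS = new Set(["ollama", "lm-studio", "vllm"]);

// envNames: key NAMES only (never values), as an env_list-style call would return.
function decideCredentialStep(provider, envNames) {
  const id = provider.toLowerCase();
  if (KEYLESS.has(id)) return { action: "skip" };

  const canonical = canonicalKeyName(id);
  if (envNames.includes(canonical)) {
    return { action: "use_existing", apiKeyName: canonical };
  }

  // Assumption: a non-canonical name is recognised by containing the provider id.
  const loose = envNames.find((name) => name.toUpperCase().includes(id.toUpperCase()));
  if (loose) return { action: "use_existing", apiKeyName: loose };

  // No match: ask the user for the key; never invent one.
  return { action: "ask_user", apiKeyName: canonical };
}

console.log(decideCredentialStep("openrouter", ["MY_OPENROUTER_KEY"]).apiKeyName);
console.log(decideCredentialStep("google", []).action);
```

Returning the canonical name alongside `ask_user` lets the caller phrase the prompt ("I don't see a GEMINI_API_KEY configured") and reuse the same name when storing the key afterwards.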

package/node_modules/@comis/agent/dist/bootstrap/sections/tooling-sections.js CHANGED

@@ -4,6 +4,7 @@
  * and compacted output recovery.
  */
 import { TOOL_SUMMARIES, TOOL_ORDER } from "./tool-descriptions.js";
+import { getProviders } from "@mariozechner/pi-ai";
 export function buildToolingSection(toolNames, _modelTier, toolSummaries) {
     if (toolNames.length === 0)
         return [];
@@ -360,7 +361,8 @@ export function buildPrivilegedToolsSection(toolNames, isMinimal, deferred) {
         "- **Reset vs delete session**: Reset clears messages but keeps the session identity (good for \"start fresh\"). Delete archives the transcript and removes the session entirely.",
         "- **Memory delete vs flush**: Delete removes specific entries by ID (surgical). Flush removes all entries for a scope (nuclear -- use with caution, requires approval).",
         "- **Token rotation**: Prefer rotate over revoke+create -- rotation is atomic and prevents downtime.",
-        "- **Built-in first**: Before creating a custom provider, check if the model is already built-in (models_manage list). Built-in providers (anthropic, google, openai, groq, mistral, deepseek, cerebras, xai, openrouter) need only an API key — no custom provider entry. Only create a custom provider for models NOT in the built-in catalog.",
+        `- **Built-in first**: Before creating a custom provider, check if the model is already built-in (models_manage list). Built-in providers (${[...getProviders()].sort().join(", ")}) need only an API key — no custom provider entry. Only create a custom provider for models NOT in the built-in catalog.`,
+        `- **Discover providers at runtime**: Call \`models_manage({ action: "list_providers" })\` for the live native-catalog list — preferred over relying on the static text above when the SDK is upgraded.`,
         "- **Provider then agent**: When adding a custom (non-built-in) provider, first create the provider entry (providers_manage create), store the API key if needed (gateway env_set -- skip for keyless providers like Ollama), then switch the agent (agents_manage update). Never set an agent's model to a name that has no matching provider. If you lack the provider's base URL or model ID, use web_search to find it; if that fails, ask the user.",
         "- **Failover chain**: After creating multiple providers, configure automatic model failover on the agent (agents_manage update with modelFailover.fallbackModels). Each fallback entry is a {provider, modelId} pair referencing a configured provider. Failover order: primary > cache-aware retry > auth key rotation > fallback models in order. Never add a fallback model whose provider does not exist.",
         "- **Add vs replace fallback**: modelFailover.fallbackModels and authProfiles are REPLACED wholesale on update (scalar fields deep-merge; arrays do not). When the user says 'add' / 'also' / 'in addition', call agents_manage get FIRST to read the current array, append, then update with the full list. When the user says 'set' / 'use' / 'switch to', overwrite directly.",

package/node_modules/@comis/agent/dist/bridge/pi-event-bridge.d.ts CHANGED

@@ -49,6 +49,13 @@ export interface PiEventBridgeDeps {
     onDelta?: (delta: string) => void;
     /** Called when a safety control triggers -- PiExecutor uses this to call session.abort(). */
     onAbort?: () => void;
+    /** Called when a `rate_limited` error fires inside the SDK's auto-retry loop --
+     *  PiExecutor wires this to `session.abortRetry()` to cancel the SDK's
+     *  internal retry. Rate-limit windows are per-minute (longer than the SDK's
+     *  ~30s retry budget), so retrying within the window cannot succeed.
+     *  Non-`rate_limited` retryable errors (overloaded, network, 5xx) bypass this
+     *  hook -- the SDK's normal retry-with-backoff proceeds. (260501-dkl) */
+    onAbortRetry?: () => void;
     /** SDK context usage accessor -- returns live context metrics from AgentSession. */
     getContextUsage?: () => ContextUsageData | undefined;
     /** Context window guard for percent-based warn/block checks. */

package/node_modules/@comis/agent/dist/bridge/pi-event-bridge.js CHANGED

@@ -16,6 +16,7 @@ import { randomUUID } from "node:crypto";
 import { resolveModelPricing } from "../model/model-catalog.js";
 import { getCacheProviderInfo } from "../executor/cache-usage-helpers.js";
 import { sanitizeMcpToolNameForAnalytics } from "../executor/cache-break-detection.js";
+import { classifyError } from "../executor/error-classifier.js";
 import { extractPlanFromResponse } from "../planner/plan-extractor.js";
 import { extractMcpServerName, classifyMcpErrorType, sanitizeToolArgs, extractErrorText } from "./bridge-event-handlers.js";
 import { createBridgeMetrics, buildBridgeResult } from "./bridge-metrics.js";
@@ -843,6 +844,31 @@ export function createPiEventBridge(deps) {
                     break;
                 }
                 // -----------------------------------------------------------------
+                // SDK auto-retry loop: abort on rate_limited (260501-dkl)
+                // -----------------------------------------------------------------
+                case "auto_retry_start": {
+                    const errorMessage = event.errorMessage ?? "";
+                    const attempt = event.attempt;
+                    const maxAttempts = event.maxAttempts;
+                    const delayMs = event.delayMs;
+                    const classification = classifyError(new Error(errorMessage));
+                    if (classification.category === "rate_limited") {
+                        deps.logger.info({
+                            module: "agent.bridge.auto-retry-abort",
+                            attempt,
+                            maxAttempts,
+                            delayMs,
+                            errorMessage,
+                            hint: "Rate-limit windows are per-minute; SDK retry budget cannot bridge the window -- aborting retry to surface terminal failure",
+                            errorKind: "rate_limited",
+                        }, "Aborting SDK auto-retry on rate-limited error");
+                        deps.onAbortRetry?.();
+                    }
+                    // Non-rate_limited categories (overloaded, network, server_error, etc.)
+                    // fall through -- let the SDK's normal retry-with-backoff proceed.
+                    break;
+                }
+                // -----------------------------------------------------------------
                 // Default: ignore unknown event types (future SDK events)
                 // -----------------------------------------------------------------
                 default:
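
The new `auto_retry_start` case can be exercised in isolation. The sketch below mirrors its control flow under stated assumptions: `classifyStub` is a hypothetical stand-in for the package's `classifyError` (whose rules are not part of this diff), and `handleAutoRetryStart` with its boolean return value is an illustrative wrapper, not an exported function.

```javascript
// Hypothetical classifier: treats messages mentioning 429 or "rate limit"
// as rate_limited. The real classifyError may use different rules.
function classifyStub(error) {
  return /429|rate.?limit/i.test(error.message)
    ? { category: "rate_limited" }
    : { category: "other" };
}

// Same shape as the auto_retry_start case: abort only on rate_limited.
function handleAutoRetryStart(event, deps) {
  const errorMessage = event.errorMessage ?? "";
  const { category } = classifyStub(new Error(errorMessage));
  if (category === "rate_limited") {
    deps.onAbortRetry?.(); // optional call: a no-op when the hook is not wired
    return true;
  }
  return false; // other categories: the SDK's retry-with-backoff proceeds
}

let aborts = 0;
const deps = { onAbortRetry: () => { aborts += 1; } };
handleAutoRetryStart({ errorMessage: "429 Too Many Requests" }, deps);
handleAutoRetryStart({ errorMessage: "503 overloaded" }, deps);
handleAutoRetryStart({ errorMessage: "rate limit exceeded" }, {}); // hook absent
console.log(aborts); // 1
```

Because `onAbortRetry` is optional in `PiEventBridgeDeps`, callers that never wire the hook keep the previous behaviour: the event is observed and nothing is cancelled.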

package/node_modules/@comis/agent/dist/context-engine/signature-replay-scrubber.d.ts CHANGED

@@ -20,6 +20,27 @@
  * guarantees the surrounding context changes turn-to-turn. So the latest's
  * signatures get invalidated too. Drop them all.
  *
+ * 260430-anthropic-400-thinking-block: the prior cache-fence skip
+ * (`if (i <= budget.cacheFenceIndex) preserve`) caused a per-execution
+ * regression. In iteration 1 of an execution the fence is -1 so all signed
+ * thinking blocks are stripped and the wire body establishes a cached
+ * prefix WITHOUT signatures. In subsequent iterations the fence becomes
+ * positive (= the breakpoint placed in iter 1) and the skip preserved
+ * messages 0…fence as-is. But `buildSessionContext()` reloads from on-disk
+ * JSONL where signatures are intact, so the wire body re-introduced signed
+ * thinking blocks at fence-protected positions that Anthropic had cached
+ * as unsigned. The cache-prefix validator detected the divergence and
+ * rejected with `400 invalid_request_error: ... blocks cannot be modified`.
+ *
+ * Fix: scrub uniformly across the array, regardless of cacheFenceIndex.
+ * The scrubber is pure/deterministic — input messages → same scrubbed
+ * output every time — so iter 1 strips, Anthropic caches the stripped
+ * prefix, iter 2 strips identically, and the cache hits. There is NO
+ * per-iteration cache penalty: the rebuild only happens once per session
+ * (when iter 1 first establishes the cached prefix). The cacheFenceIndex
+ * is read from the budget for diagnostic stats only and never gates
+ * stripping.
+ *
  * Provider coverage: NOT gated on `model.reasoning` because Gemini's
  * `thoughtSignature` lives on toolCall blocks even when the model itself
  * is not flagged as reasoning. Cost is one walk over assistant messages,

package/node_modules/@comis/agent/dist/context-engine/signature-replay-scrubber.js CHANGED

@@ -21,6 +21,27 @@
  * guarantees the surrounding context changes turn-to-turn. So the latest's
  * signatures get invalidated too. Drop them all.
  *
+ * 260430-anthropic-400-thinking-block: the prior cache-fence skip
+ * (`if (i <= budget.cacheFenceIndex) preserve`) caused a per-execution
+ * regression. In iteration 1 of an execution the fence is -1 so all signed
+ * thinking blocks are stripped and the wire body establishes a cached
+ * prefix WITHOUT signatures. In subsequent iterations the fence becomes
+ * positive (= the breakpoint placed in iter 1) and the skip preserved
+ * messages 0…fence as-is. But `buildSessionContext()` reloads from on-disk
+ * JSONL where signatures are intact, so the wire body re-introduced signed
+ * thinking blocks at fence-protected positions that Anthropic had cached
+ * as unsigned. The cache-prefix validator detected the divergence and
+ * rejected with `400 invalid_request_error: ... blocks cannot be modified`.
+ *
+ * Fix: scrub uniformly across the array, regardless of cacheFenceIndex.
+ * The scrubber is pure/deterministic — input messages → same scrubbed
+ * output every time — so iter 1 strips, Anthropic caches the stripped
+ * prefix, iter 2 strips identically, and the cache hits. There is NO
+ * per-iteration cache penalty: the rebuild only happens once per session
+ * (when iter 1 first establishes the cached prefix). The cacheFenceIndex
+ * is read from the budget for diagnostic stats only and never gates
+ * stripping.
+ *
  * Provider coverage: NOT gated on `model.reasoning` because Gemini's
  * `thoughtSignature` lives on toolCall blocks even when the model itself
  * is not flagged as reasoning. Cost is one walk over assistant messages,
@@ -43,7 +64,7 @@
 export function createSignatureReplayScrubber(deps) {
     return {
         name: "signature-replay-scrubber",
-        async apply(messages, budget) {
+        async apply(messages, _budget) {
             if (messages.length === 0)
                 return messages;
             // Find the latest assistant message index. If none, no scrub.
@@ -64,20 +85,19 @@ export function createSignatureReplayScrubber(deps) {
             for (let i = 0; i < messages.length; i++) {
                 // eslint-disable-next-line security/detect-object-injection -- numeric index
                 const original = messages[i];
-                // Cache fence: messages at or below the fence must not be modified.
-                if (i <= budget.cacheFenceIndex) {
-                    // eslint-disable-next-line security/detect-object-injection -- numeric index
-                    result[i] = original;
-                    continue;
-                }
+                // 260430-anthropic-400-thinking-block: cacheFenceIndex is intentionally
+                // NOT consulted here. Stripping uniformly across the array keeps the
+                // scrubbed prefix identical across iterations of the same execution,
+                // which is what Anthropic's prompt-cache validator requires. See
+                // module docstring for the full rationale.
                 const msg = original;
                 if (msg.role !== "assistant" || !Array.isArray(msg.content)) {
                     // eslint-disable-next-line security/detect-object-injection -- numeric index
                     result[i] = original;
                     continue;
                 }
-                // Assistant message past the fence — walk content blocks. Latest
-                // included: cross-turn signature validation invalidates it too.
+                // Walk content blocks. Latest included: cross-turn signature
+                // validation invalidates it too.
                 const content = msg.content;
                 let messageChanged = false;
                 const newContent = new Array(content.length);
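
The divergence described in the module docstring can be reproduced with a toy model. This is a simplified sketch, not the package's scrubber: the message shape is reduced to the fields needed here, `stripSignatures`, `scrubFenced`, and `scrubUniform` are hypothetical names, and the strip step only drops a `signature` field, which may differ from what the real layer does to a block.

```javascript
// Simplified transcript as it would be reloaded from disk, signature intact.
const onDisk = [
  { role: "user", content: [{ type: "text", text: "hi" }] },
  { role: "assistant", content: [{ type: "thinking", thinking: "t", signature: "sig-1" }] },
  { role: "user", content: [{ type: "text", text: "more" }] },
];

// Drop the signature from assistant thinking blocks; never mutates the input.
function stripSignatures(msg) {
  if (msg.role !== "assistant" || !Array.isArray(msg.content)) return msg;
  return {
    ...msg,
    content: msg.content.map((block) => {
      if (block.type !== "thinking" || !("signature" in block)) return block;
      const { signature, ...rest } = block;
      return rest;
    }),
  };
}

// Prior behaviour: messages at or below the fence pass through untouched.
const scrubFenced = (messages, fence) =>
  messages.map((m, i) => (i <= fence ? m : stripSignatures(m)));

// Fixed behaviour: the fence is ignored.
const scrubUniform = (messages) => messages.map(stripSignatures);

const same = (a, b) => JSON.stringify(a) === JSON.stringify(b);

// Iteration 1 runs with fence = -1; iteration 2 reloads and runs with fence = 1.
console.log(same(scrubFenced(onDisk, -1), scrubFenced(onDisk, 1))); // false
console.log(same(scrubUniform(onDisk), scrubUniform(onDisk))); // true
```

The fenced variant sends two different wire bodies for the same on-disk input, which is the mismatch the docstring attributes the 400 to; the uniform variant sends the same body on every iteration.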

package/node_modules/@comis/agent/dist/context-engine/signature-surrogate-guard.d.ts CHANGED

@@ -21,8 +21,16 @@
  * plain text rather than sending sanitized-text + original-signature
  * mismatch. Skips `redacted: true` blocks (no readable text to taint).
  *
- * Cache fence respected exactly like `thinking-block-cleaner` and
- * `signature-replay-scrubber`.
+ * 260430-anthropic-400-thinking-block: cacheFenceIndex is intentionally
+ * NOT consulted to gate guarding. The guard is pure/deterministic — input
+ * messages → same guarded output every time — so iter 1 strips,
+ * Anthropic caches the guarded prefix, iter 2 strips identically, and the
+ * cache hits. The prior fence-skip caused per-execution divergence
+ * symmetric to the bug found in `signature-replay-scrubber.ts` and
+ * `thinking-block-cleaner.ts`: iter 1 stripped (fence=-1) and built a
+ * surrogate-safe cached prefix, iter 2 preserved fence-protected messages
+ * (fence>0) and re-introduced surrogate-tainted-with-original-signature
+ * blocks at positions Anthropic had cached without them.
  *
  * Immutability: never mutates input; shallow-copies the block and the
  * containing message only when scrubbing is needed. When no scrub fires,

package/node_modules/@comis/agent/dist/context-engine/signature-surrogate-guard.js CHANGED

@@ -22,8 +22,16 @@
  * plain text rather than sending sanitized-text + original-signature
  * mismatch. Skips `redacted: true` blocks (no readable text to taint).
  *
- * Cache fence respected exactly like `thinking-block-cleaner` and
- * `signature-replay-scrubber`.
+ * 260430-anthropic-400-thinking-block: cacheFenceIndex is intentionally
+ * NOT consulted to gate guarding. The guard is pure/deterministic — input
+ * messages → same guarded output every time — so iter 1 strips,
+ * Anthropic caches the guarded prefix, iter 2 strips identically, and the
+ * cache hits. The prior fence-skip caused per-execution divergence
+ * symmetric to the bug found in `signature-replay-scrubber.ts` and
+ * `thinking-block-cleaner.ts`: iter 1 stripped (fence=-1) and built a
+ * surrogate-safe cached prefix, iter 2 preserved fence-protected messages
+ * (fence>0) and re-introduced surrogate-tainted-with-original-signature
+ * blocks at positions Anthropic had cached without them.
  *
  * Immutability: never mutates input; shallow-copies the block and the
  * containing message only when scrubbing is needed. When no scrub fires,
@@ -50,7 +58,7 @@ const UNPAIRED_LOW_SURROGATE = /(?<![\uD800-\uDBFF])[\uDC00-\uDFFF]/;
 export function createSignatureSurrogateGuard(deps) {
     return {
         name: "signature-surrogate-guard",
-        async apply(messages, budget) {
+        async apply(messages, _budget) {
             if (messages.length === 0) {
                 deps?.onGuarded?.({ signaturesStripped: 0 });
                 return messages;
@@ -61,12 +69,10 @@ export function createSignatureSurrogateGuard(deps) {
             for (let i = 0; i < messages.length; i++) {
                 // eslint-disable-next-line security/detect-object-injection -- numeric index
                 const original = messages[i];
-                // Cache fence: messages at or below the fence must not be modified.
-                if (i <= budget.cacheFenceIndex) {
-                    // eslint-disable-next-line security/detect-object-injection -- numeric index
-                    result[i] = original;
-                    continue;
-                }
+                // 260430-anthropic-400-thinking-block: cacheFenceIndex is intentionally
+                // NOT consulted here. Stripping uniformly across the array keeps the
+                // guarded prefix identical across iterations of the same execution,
+                // which is what Anthropic's prompt-cache validator requires.
                 const msg = original;
                 if (msg.role !== "assistant" || !Array.isArray(msg.content)) {
                     // eslint-disable-next-line security/detect-object-injection -- numeric index

package/node_modules/@comis/agent/dist/context-engine/thinking-block-cleaner.d.ts CHANGED

@@ -5,6 +5,18 @@
  * keep-window, measured in assistant turns (not turn pairs). Redacted thinking
  * blocks (containing encrypted signatures for API continuity) are always preserved.
  *
+ * 260430-anthropic-400-thinking-block: cacheFenceIndex is intentionally NOT
+ * consulted to gate stripping. The cleaner is pure/deterministic — input
+ * messages → same cleaned output every time — so iteration 1 strips,
+ * Anthropic caches the cleaned prefix, iteration 2 strips identically, and
+ * the cache hits. The prior fence-skip caused per-execution divergence:
+ * iter 1 stripped (fence=-1) and built a thinking-free cached prefix,
+ * iter 2 preserved fence-protected messages (fence>0) and re-introduced
+ * thinking blocks at positions Anthropic had cached without them, which
+ * the prompt-cache validator rejected with `400 ... blocks cannot be
+ * modified`. The cacheFenceIndex on the budget is read for diagnostic
+ * stats only and never gates the strip decision.
+ *
  * Immutability: never mutates input messages or arrays. Returns new arrays and
  * shallow-copied messages only when changes are needed. When no changes are
  * required, returns the original array reference (zero allocation).
@@ -26,9 +38,12 @@ import type { ContextLayer } from "./types.js";
  */
 export declare function createThinkingBlockCleaner(keepTurns: number, onCleaned?: (stats: {
     blocksRemoved: number;
-    /** Cache fence index when blocks were removed with fence active. */
+    /** Cache fence index when present on the budget; reported for diagnostics
+     *  only. Stripping is no longer gated on the fence (260430-anthropic-400-
+     *  thinking-block). */
     cacheFenceIndex?: number;
-    /** Number of messages protected by the cache fence. */
+    /** Number of messages protected by the cache fence. Always undefined now
+     *  because the fence does not protect any messages from stripping. */
     messagesProtected?: number;
     /** Total messages in the conversation. */
     totalMessages?: number;

package/node_modules/@comis/agent/dist/context-engine/thinking-block-cleaner.js CHANGED

@@ -6,6 +6,18 @@
  * keep-window, measured in assistant turns (not turn pairs). Redacted thinking
  * blocks (containing encrypted signatures for API continuity) are always preserved.
  *
+ * 260430-anthropic-400-thinking-block: cacheFenceIndex is intentionally NOT
+ * consulted to gate stripping. The cleaner is pure/deterministic — input
+ * messages → same cleaned output every time — so iteration 1 strips,
+ * Anthropic caches the cleaned prefix, iteration 2 strips identically, and
+ * the cache hits. The prior fence-skip caused per-execution divergence:
+ * iter 1 stripped (fence=-1) and built a thinking-free cached prefix,
+ * iter 2 preserved fence-protected messages (fence>0) and re-introduced
+ * thinking blocks at positions Anthropic had cached without them, which
+ * the prompt-cache validator rejected with `400 ... blocks cannot be
+ * modified`. The cacheFenceIndex on the budget is read for diagnostic
+ * stats only and never gates the strip decision.
+ *
  * Immutability: never mutates input messages or arrays. Returns new arrays and
  * shallow-copied messages only when changes are needed. When no changes are
  * required, returns the original array reference (zero allocation).
@@ -65,11 +77,10 @@ export function createThinkingBlockCleaner(keepTurns, onCleaned, getKeepTurnsOve
             let blocksRemoved = 0;
             const result = new Array(messages.length);
             for (let i = 0; i < messages.length; i++) {
-                // Messages at or before the cache fence must not be modified
-                if (i <= budget.cacheFenceIndex) {
-                    result[i] = messages[i];
-                    continue;
-                }
+                // 260430-anthropic-400-thinking-block: cacheFenceIndex is intentionally
+                // NOT consulted here. Stripping uniformly across the array keeps the
+                // cleaned prefix identical across iterations of the same execution,
+                // which is what Anthropic's prompt-cache validator requires.
                 const msg = messages[i];
                 if (!oldAssistantIndices.has(i)) {
                     // Within keep-window or not an assistant message -- pass through unchanged
@@ -103,13 +114,13 @@ export function createThinkingBlockCleaner(keepTurns, onCleaned, getKeepTurnsOve
             // If no changes were made to any message, return original array reference
             if (!anyChanged)
                 return messages;
-            // Report cleaning stats via callback
-            // Include cache fence impact when blocks were skipped due to fence
+            // Report cleaning stats via callback. cacheFenceIndex is reported for
+            // diagnostic visibility but is no longer gating stripping. messagesProtected
+            // is intentionally omitted because no messages are fence-protected anymore.
             onCleaned?.({
                 blocksRemoved,
                 ...(budget.cacheFenceIndex >= 0 && blocksRemoved > 0 && {
                     cacheFenceIndex: budget.cacheFenceIndex,
-                    messagesProtected: budget.cacheFenceIndex + 1,
                     totalMessages: messages.length,
                 }),
             });
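
Note on the hunk above: a minimal sketch of fence-independent stripping, written as a standalone function rather than the package's actual cleaner. The function name `stripOldThinkingBlocks`, the `"thinking"` block type, and the message shape are assumptions made for illustration. It only mirrors the two properties the comments describe: old assistant messages are stripped uniformly regardless of any cache fence, and the original array reference is returned when nothing changes.

```javascript
// Hypothetical, simplified cleaner. Not the package's implementation.
function stripOldThinkingBlocks(messages, keepTurns) {
    // Collect indices of assistant messages, oldest first.
    const assistantIndices = [];
    for (let i = 0; i < messages.length; i++) {
        if (messages[i].role === "assistant") assistantIndices.push(i);
    }
    // All assistant messages except the last `keepTurns` count as "old".
    const cutoff = Math.max(0, assistantIndices.length - keepTurns);
    const oldAssistantIndices = new Set(assistantIndices.slice(0, cutoff));

    let anyChanged = false;
    let blocksRemoved = 0;
    const result = new Array(messages.length);
    for (let i = 0; i < messages.length; i++) {
        const msg = messages[i];
        // No cache-fence check here: every index is treated the same way.
        if (!oldAssistantIndices.has(i) || !Array.isArray(msg.content)) {
            result[i] = msg;
            continue;
        }
        // Assumed block type name: "thinking".
        const kept = msg.content.filter((b) => b.type !== "thinking");
        if (kept.length === msg.content.length) {
            result[i] = msg;
            continue;
        }
        blocksRemoved += msg.content.length - kept.length;
        anyChanged = true;
        // Shallow copy so the input message is never mutated.
        result[i] = { ...msg, content: kept };
    }
    // Zero-allocation path for the caller: same reference when unchanged.
    return anyChanged
        ? { messages: result, blocksRemoved }
        : { messages, blocksRemoved: 0 };
}
```

Because the decision depends only on the message's position among assistant turns, running the sketch twice over the same prefix yields the same cleaned prefix, which is the property the new comment attributes to the prompt-cache requirement.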

package/node_modules/@comis/agent/dist/executor/executor-prompt-runner.js CHANGED

@@ -363,6 +363,34 @@ export async function runPrompt(params) {
                         // eslint-disable-next-line no-useless-assignment
                         silentRetryAttempted = true;
                     }
+                    else if (earlyClassification.category === "rate_limited") {
+                        // Provider-side time-based throttle (429/529). Retrying within the
+                        // same runPrompt invocation cannot succeed — the rate-limit window
+                        // hasn't rolled. The model-retry layer's cache-aware short retry
+                        // (model-retry.ts:261-294) is the correct retry point for 429 with
+                        // a parseable Retry-After header < SHORT_RETRY_THRESHOLD_MS. If we
+                        // got here, that retry was either skipped (no Retry-After) or
+                        // exhausted, AND the SDK didn't throw the 429 out (caught inside
+                        // pi-ai's stream wrapper, surfaced as empty response). Re-entering
+                        // runWithModelRetry from this layer would do another N retries that
+                        // all hit the same rate-limit window — observed in production as
+                        // 1 user message → 8 LLM calls (daemon.1.log:23:35:06-23:35:52,
+                        // OpenRouter qwen/qwen3-coder:free 8 RPM cap). Short-circuit.
+                        deps.logger.warn({
+                            llmCalls: earlyBridgeResult.llmCalls,
+                            finishReason: earlyBridgeResult.finishReason,
+                            providerError: llmErrSource,
+                            hint: "Provider returned a rate-limit error; retrying within the same window cannot succeed — surfacing terminal failure to caller",
+                            errorKind: "rate_limited",
+                        }, "Rate-limit error — skipping silent-retry and declaring terminal failure");
+                        promptSucceeded = false;
+                        const llmDetail = llmErrSource ? ` — ${llmErrSource}` : "";
+                        promptError = new Error(`Rate limit exceeded: ${earlyBridgeResult.llmCalls} LLM call(s) produced empty response (finishReason: ${earlyBridgeResult.finishReason ?? "unknown"})${llmDetail}`);
+                        // Defensive invariant: close the gate so a future refactor that
+                        // re-enters this region cannot run a second silent-retry cycle.
+                        // eslint-disable-next-line no-useless-assignment
+                        silentRetryAttempted = true;
+                    }
                     else if (earlyClassification.category === "client_request") {
                         // Plain client_request: deterministic failure (e.g. unprocessable_entity,
                         // bare "cannot be modified" without signature noun). Retrying would
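
Note on the hunk above: a rough sketch of the short-circuit decision, assuming a hypothetical helper `handleEmptyResponse` that is not part of the package. Only the `"rate_limited"` category name and the error message template are taken from the diff. The fallback branch (allow one silent retry for any other category) is an assumption for illustration and does not reflect the real branching in `runPrompt`.

```javascript
// Hypothetical helper. The real logic lives inline in runPrompt.
function handleEmptyResponse(category, bridgeResult, llmErrSource) {
    if (category === "rate_limited") {
        // Same template as the diff; "\u2014" is the dash used in the source string.
        const detail = llmErrSource ? ` \u2014 ${llmErrSource}` : "";
        const finish = bridgeResult.finishReason ?? "unknown";
        return {
            retry: false,
            error: new Error(
                `Rate limit exceeded: ${bridgeResult.llmCalls} LLM call(s) ` +
                `produced empty response (finishReason: ${finish})${detail}`
            ),
        };
    }
    // Assumption for illustration only: other categories may retry.
    return { retry: true, error: undefined };
}

const outcome = handleEmptyResponse("rate_limited", { llmCalls: 8 }, undefined);
console.log(outcome.retry, outcome.error.message);
```

The point of the sketch is that a time-based throttle produces a terminal error immediately, instead of re-entering a retry loop that would hit the same rate-limit window again.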

package/node_modules/@comis/agent/dist/executor/executor-response-filter.js CHANGED

@@ -274,6 +274,9 @@ function summarizeToolCall(call) {
         case "gateway": {
             const action = typeof args.action === "string" ? args.action : undefined;
             const section = typeof args.section === "string" ? args.section : undefined;
+            const key = typeof args.key === "string" ? args.key : undefined;
+            if (action && section && key)
+                return `gateway({action: "${action}", section: "${section}", key: "${key}"})`;
             if (action && section)
                 return `gateway({action: "${action}", section: "${section}"})`;
             if (action)
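
Note on the hunk above: a small sketch of the most-specific-first ordering the new `key` branch relies on. `summarizeGatewayCall` is a hypothetical standalone function. The first two return formats are copied from the diff, while the action-only format and the bare `gateway()` fallback are assumptions, since the hunk is cut off at that point.

```javascript
// Hypothetical standalone version of the "gateway" case.
function summarizeGatewayCall(args) {
    const str = (v) => (typeof v === "string" ? v : undefined);
    const action = str(args.action);
    const section = str(args.section);
    const key = str(args.key);
    // Most specific summary first, so the key is not dropped when present.
    if (action && section && key)
        return `gateway({action: "${action}", section: "${section}", key: "${key}"})`;
    if (action && section)
        return `gateway({action: "${action}", section: "${section}"})`;
    // Assumed formats below: the diff does not show these branches.
    if (action)
        return `gateway({action: "${action}"})`;
    return "gateway()";
}

console.log(summarizeGatewayCall({ action: "patch", section: "agents", key: "default.model" }));
```

A non-string `key` (for example a number) is treated as absent, so the summary falls back to the action-and-section form.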

package/node_modules/@comis/agent/dist/executor/executor-tool-assembly.js CHANGED

@@ -93,7 +93,9 @@ export async function assembleTools(params) {
             enabled: config.sdkRetry?.enabled ?? true,
             maxRetries: config.sdkRetry?.maxRetries ?? 5,
             baseDelayMs: config.sdkRetry?.baseDelayMs ?? 4000,
-            maxDelayMs: config.sdkRetry?.maxDelayMs ?? 60000,
+            provider: {
+                maxRetryDelayMs: config.sdkRetry?.maxDelayMs ?? 60000,
+            },
         },
     };
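
Note on the hunk above: a short sketch of the resulting retry settings shape, assuming a hypothetical `buildRetrySettings` helper that takes the `config.sdkRetry` object directly. The field names and defaults are the ones visible in the diff. The name of the enclosing settings key is not shown there and is left out.

```javascript
// Hypothetical helper mirroring the defaults in the hunk.
function buildRetrySettings(sdkRetry) {
    return {
        // `??` only falls back on null/undefined, so `false` and `0` survive.
        enabled: sdkRetry?.enabled ?? true,
        maxRetries: sdkRetry?.maxRetries ?? 5,
        baseDelayMs: sdkRetry?.baseDelayMs ?? 4000,
        // The user-facing option is still `maxDelayMs`; only the output
        // location changed to `provider.maxRetryDelayMs`.
        provider: {
            maxRetryDelayMs: sdkRetry?.maxDelayMs ?? 60000,
        },
    };
}

console.log(buildRetrySettings({ maxDelayMs: 10000 }));
```

The config key read from `config.sdkRetry` is unchanged, so existing configurations keep working. Only the shape handed onward differs.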
     // Selective override: directive takes precedence over config

package/node_modules/@comis/agent/dist/executor/phase-filter.d.ts CHANGED

@@ -12,10 +12,26 @@ export declare function parsePhase(textSignature: unknown): string | undefined;
 /** True if a content block is user-visible text (not commentary). */
 export declare function isVisibleTextBlock(block: any): boolean;
 /**
- * Extract user-visible text from the last assistant message in a session.
+ * Extract user-visible text from the last "real" assistant message.
  *
- * When the last assistant message contains commentary-phase text blocks,
- * filters them out and returns only visible text. Otherwise delegates to
- * the SDK's getLastAssistantText() method.
+ * Filters non-real assistants from the tail walk:
+ *   - aborted-empty (stopReason "aborted" + empty content) — original.
+ *   - error-empty (stopReason "error" + empty content) — sibling of
+ *     aborted-empty, marks failed LLM calls (e.g. 429 / 5xx swallowed
+ *     inside pi-ai's stream wrapper, surfaced as empty content).
+ *   - synthetic-injected (model === "synthetic") — appended by
+ *     orphaned-message-repair.ts to restore role alternation after a
+ *     daemon restart; not user-visible LLM output.
+ *   - cross-turn boundary (role === "user" encountered before a
+ *     qualifying assistant) — return "" because the user message marks
+ *     the start of the current execution window; assistants before it
+ *     belong to prior turns (260501-gyy).
+ *
+ * When the resulting last assistant contains commentary-phase text
+ * blocks, drops them and returns only visible text. Otherwise returns
+ * the visible (non-commentary) text blocks of the last assistant
+ * directly — does NOT delegate to session.getLastAssistantText(),
+ * which walks past empty messages and would re-introduce the
+ * synthetic-leak (260501-egj).
  */
 export declare function getVisibleAssistantText(session: any): string;

package/node_modules/@comis/agent/dist/executor/phase-filter.js CHANGED

@@ -28,28 +28,61 @@ export function isVisibleTextBlock(block) {
         parsePhase(block.textSignature) !== "commentary");
 }
 /**
- * Extract user-visible text from the last assistant message in a session.
+ * Extract user-visible text from the last "real" assistant message.
  *
- * When the last assistant message contains commentary-phase text blocks,
- * filters them out and returns only visible text. Otherwise delegates to
- * the SDK's getLastAssistantText() method.
+ * Filters non-real assistants from the tail walk:
+ *   - aborted-empty (stopReason "aborted" + empty content) — original.
+ *   - error-empty (stopReason "error" + empty content) — sibling of
+ *     aborted-empty, marks failed LLM calls (e.g. 429 / 5xx swallowed
+ *     inside pi-ai's stream wrapper, surfaced as empty content).
+ *   - synthetic-injected (model === "synthetic") — appended by
+ *     orphaned-message-repair.ts to restore role alternation after a
+ *     daemon restart; not user-visible LLM output.
+ *   - cross-turn boundary (role === "user" encountered before a
+ *     qualifying assistant) — return "" because the user message marks
+ *     the start of the current execution window; assistants before it
+ *     belong to prior turns (260501-gyy).
+ *
+ * When the resulting last assistant contains commentary-phase text
+ * blocks, drops them and returns only visible text. Otherwise returns
+ * the visible (non-commentary) text blocks of the last assistant
+ * directly — does NOT delegate to session.getLastAssistantText(),
+ * which walks past empty messages and would re-introduce the
+ * synthetic-leak (260501-egj).
  */
 export function getVisibleAssistantText(session) {
     const messages = session?.messages;
-    // Find last non-aborted assistant message
-    const lastAssistant = Array.isArray(messages)
-        ? messages
-            .slice()
-            .reverse()
-            .find((m) => {
-            if (m.role !== "assistant")
-                return false;
+    // Find last "real" assistant message in the CURRENT execution window —
+    // skip aborted-empty, error-empty, and synthetic-injected; stop at the
+    // first user message (turn boundary) to avoid leaking prior-turn text
+    // (260501-gyy).
+    const lastAssistant = (() => {
+        if (!Array.isArray(messages))
+            return undefined;
+        for (let i = messages.length - 1; i >= 0; i--) {
+            const m = messages[i]; // eslint-disable-line security/detect-object-injection
+            // Crossed turn boundary — assistants before this user message belong
+            // to a prior turn and must not be returned.
+            if (m?.role === "user")
+                return undefined;
+            // toolResult / tool / other roles — keep walking within current turn.
+            if (m?.role !== "assistant")
+                continue;
+            // Skip aborted-empty (existing behavior — preserved).
             if (m.stopReason === "aborted" && m.content?.length === 0)
-                return false;
-            return true;
-        })
-        : undefined;
-    // Only activate phase filtering when commentary blocks are present
+                continue;
+            // Skip error-empty — failed LLM calls (e.g. 429 swallowed inside
+            // pi-ai's stream wrapper).
+            if (m.stopReason === "error" && m.content?.length === 0)
+                continue;
+            // Skip synthetic-injected — orphaned-message-repair scaffolding.
+            if (m.model === "synthetic")
+                continue;
+            return m;
+        }
+        return undefined;
+    })();
+    // Only activate phase filtering when commentary blocks are present.
     const hasCommentary = lastAssistant?.content?.some((b) => b?.type === "text" && parsePhase(b.textSignature) === "commentary") ?? false;
     if (hasCommentary) {
         return lastAssistant.content
@@ -57,7 +90,17 @@ export function getVisibleAssistantText(session) {
             .map((b) => b.text)
             .join("");
     }
-    // No commentary — delegate to SDK method
-    return session?.getLastAssistantText?.() ?? "";
+    // No commentary — return lastAssistant's visible text directly.
+    // Do NOT delegate to session.getLastAssistantText() because it walks
+    // past empty messages (aborted/error/etc.) and re-introduces the
+    // synthetic-leak (production bug 260501-egj: post-restart-resumption
+    // rate-limit returned synthetic placeholder instead of the
+    // 260501-cur "Rate limit exceeded" terminal error).
+    if (!lastAssistant?.content || !Array.isArray(lastAssistant.content))
+        return "";
+    return lastAssistant.content
+        .filter(isVisibleTextBlock)
+        .map((b) => b.text)
+        .join("");
 }
 /* eslint-enable @typescript-eslint/no-explicit-any */
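
Note on the hunk above: a condensed sketch of the tail walk with three sample sessions, to make the skip rules and the turn boundary easier to follow. `findLastRealAssistant` is a hypothetical standalone extraction of the inline function in the diff. The sample messages are invented for illustration and are not taken from any log.

```javascript
// Hypothetical extraction of the tail walk from getVisibleAssistantText.
function findLastRealAssistant(messages) {
    if (!Array.isArray(messages)) return undefined;
    for (let i = messages.length - 1; i >= 0; i--) {
        const m = messages[i];
        // A user message marks the start of the current turn: stop here.
        if (m?.role === "user") return undefined;
        // Tool results and other roles: keep walking within the turn.
        if (m?.role !== "assistant") continue;
        const isEmpty = m.content?.length === 0;
        // Skip aborted-empty and error-empty assistants.
        if ((m.stopReason === "aborted" || m.stopReason === "error") && isEmpty) continue;
        // Skip scaffolding injected to repair role alternation.
        if (m.model === "synthetic") continue;
        return m;
    }
    return undefined;
}

const text = (t) => [{ type: "text", text: t }];

// 1. A real answer followed by a synthetic placeholder: the real one wins.
const afterRestart = [
    { role: "user", content: text("q") },
    { role: "assistant", content: text("real answer") },
    { role: "assistant", model: "synthetic", content: text("placeholder") },
];

// 2. Current turn only produced an error-empty assistant: nothing is
//    returned, and the prior turn's answer does not leak through.
const failedTurn = [
    { role: "user", content: text("q1") },
    { role: "assistant", content: text("old answer") },
    { role: "user", content: text("q2") },
    { role: "assistant", stopReason: "error", content: [] },
];

// 3. A trailing tool result does not hide the assistant before it.
const withToolResult = [
    { role: "user", content: text("q") },
    { role: "assistant", content: text("done") },
    { role: "toolResult", content: text("ok") },
];

console.log(findLastRealAssistant(afterRestart)?.content[0].text);
console.log(findLastRealAssistant(failedTurn));
console.log(findLastRealAssistant(withToolResult)?.content[0].text);
```

In the second session the old delegation path could have surfaced prior-turn text. With the boundary check the caller gets an empty result and can report the terminal error instead.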