@botbotgo/agent-harness 0.0.400 → 0.0.402

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (23)

package/README.md +17 -0
package/README.zh.md +17 -0
package/dist/contracts/runtime-observability.d.ts +25 -0
package/dist/package-version.d.ts +1 -1
package/dist/package-version.js +1 -1
package/dist/runtime/adapter/flow/stream-runtime.js +16 -1
package/dist/runtime/adapter/local-tool-invocation.js +242 -14
package/dist/runtime/adapter/middleware-assembly.js +1 -1
package/dist/runtime/adapter/model/invocation-request.js +1 -1
package/dist/runtime/adapter/model/model-providers.js +178 -119
package/dist/runtime/adapter/stream-event-projection.js +70 -5
package/dist/runtime/adapter/tool/tool-hitl.js +49 -6
package/dist/runtime/agent-runtime-adapter.d.ts +2 -0
package/dist/runtime/agent-runtime-adapter.js +329 -36
package/dist/runtime/harness/bindings.js +2 -0
package/dist/runtime/harness/tool-gateway/index.d.ts +2 -0
package/dist/runtime/harness/tool-gateway/index.js +2 -0
package/dist/runtime/harness/tool-gateway/policy.d.ts +2 -0
package/dist/runtime/harness/tool-gateway/policy.js +45 -0
package/dist/runtime/harness/tool-gateway/validation.d.ts +33 -0
package/dist/runtime/harness/tool-gateway/validation.js +176 -0
package/dist/runtime/parsing/output-recovery.js +1 -4
package/package.json +15 -15

package/README.md CHANGED

@@ -390,6 +390,22 @@ botbotgo -w /path/to/another-workspace "Summarize this project."
 Development tip: repository-owned Ollama workspaces now default to `http://127.0.0.1:11434` for release-friendly local behavior. During development, point them at a shared remote Ollama by exporting `AGENT_HARNESS_OLLAMA_BASE_URL=https://ollama-rtx-4070.easynet.world` or `AGENT_HARNESS_OPENAI_COMPATIBLE_BASE_URL=https://ollama-rtx-4070.easynet.world/v1` before starting the runtime.
+For CPU-only hosts with large RAM, run `llama.cpp` as an OpenAI-compatible server and use the existing `openai-compatible` provider:
+```yaml
+apiVersion: agent-harness/v1alpha1
+kind: Models
+spec:
+  - name: default
+    provider: openai-compatible
+    model: local-model
+    baseUrl: ${env:AGENT_HARNESS_LLAMA_CPP_BASE_URL:-http://127.0.0.1:8080/v1}
+    apiKey: dummy
+    toolCallingMode: prompted-json
+```
+Start the model separately with `llama-server -m /path/to/model.gguf --host 127.0.0.1 --port 8080`. `apiKey: dummy` uses the existing OpenAI-compatible auth-omission path, so the runtime does not send bearer auth to local `llama-server`.
 Workspace layout:
 ```text
@@ -847,6 +863,7 @@ Practical guidance:
 Local GGUF note:
 - `provider: node-llama-cpp` now exposes a LangChain-style tool-binding shim, so local GGUF models can enter the standard tool-calling path without an app-owned model wrapper
+- `provider: openai-compatible` targets an external `llama-server` endpoint when the model process should be tuned or supervised outside Node.js
 - `backend: langchain-v1` is the straightforward local GGUF path and is the currently verified default for `node-llama-cpp` tool use
 - `backend: deepagent` can also reach the same tool-calling path, but final reliability still depends on the selected model following upstream tool schemas correctly
 - `agent-harness` does not try to normalize every model-specific argument drift or malformed tool payload; once the runtime hands a call to upstream tools, schema fidelity is a model responsibility
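
Note on the `baseUrl` line added in the first README hunk: the value uses a `${env:NAME:-default}` placeholder. The diff does not show how the harness resolves it, so the following is only a minimal sketch that assumes shell-style semantics (the default applies when the variable is unset or empty). `resolveEnvPlaceholders` and the `10.0.0.5` address are hypothetical and exist only for illustration.

```javascript
// Hypothetical helper, not part of agent-harness: resolves `${env:NAME:-default}`
// placeholders in a config string.
// Assumption: shell-style semantics, so the default is used when NAME is unset or empty.
function resolveEnvPlaceholders(value, env) {
    const pattern = /\$\{env:([A-Za-z_][A-Za-z0-9_]*)(?::-([^}]*))?\}/gu;
    return value.replace(pattern, (_match, name, fallback) => {
        const current = env[name];
        if (typeof current === "string" && current.length > 0) {
            return current;
        }
        // No usable value in the environment: fall back to the inline default, if any.
        return fallback ?? "";
    });
}

const template = "${env:AGENT_HARNESS_LLAMA_CPP_BASE_URL:-http://127.0.0.1:8080/v1}";

// Variable unset: the inline default (local llama-server) is used.
const localBaseUrl = resolveEnvPlaceholders(template, {});

// Variable set: the environment value wins over the default.
const remoteBaseUrl = resolveEnvPlaceholders(template, {
    AGENT_HARNESS_LLAMA_CPP_BASE_URL: "http://10.0.0.5:8080/v1", // hypothetical host
});

console.log(localBaseUrl);
console.log(remoteBaseUrl);
```

Under that assumption, leaving `AGENT_HARNESS_LLAMA_CPP_BASE_URL` unset keeps the workspace pointed at the local `llama-server` started with `--port 8080`.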

package/README.zh.md CHANGED

@@ -386,6 +386,22 @@ botbotgo -w /path/to/another-workspace "Summarize this project."
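 To switch the repository-owned Ollama workspaces to a shared remote during development, set an environment variable before startup: release defaults still fall back to a local endpoint such as `http://127.0.0.1:11434`, while during development you can override it to a remote with `AGENT_HARNESS_OLLAMA_BASE_URL=https://ollama-rtx-4070.easynet.world` or `AGENT_HARNESS_OPENAI_COMPATIBLE_BASE_URL=https://ollama-rtx-4070.easynet.world/v1`.
+If the target machine has no GPU but plenty of RAM, start the `llama.cpp` OpenAI-compatible server separately and keep using the existing `openai-compatible` provider: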
+```yaml
+apiVersion: agent-harness/v1alpha1
+kind: Models
+spec:
+  - name: default
+    provider: openai-compatible
+    model: local-model
+    baseUrl: ${env:AGENT_HARNESS_LLAMA_CPP_BASE_URL:-http://127.0.0.1:8080/v1}
+    apiKey: dummy
+    toolCallingMode: prompted-json
+```
+Start the model process separately with `llama-server -m /path/to/model.gguf --host 127.0.0.1 --port 8080`. `apiKey: dummy` reuses the existing OpenAI-compatible auth-omission path, so the runtime does not send bearer auth to the local `llama-server`.
 Workspace layout:
 ```text
@@ -804,6 +820,7 @@ await stop(runtime);
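 Local GGUF note:
 - `provider: node-llama-cpp` now includes a LangChain-style tool-binding shim, so local GGUF models can enter the standard tool-calling path without the application adding its own model wrapper
+- `provider: openai-compatible` can point at an external `llama-server` endpoint; keep reusing this existing path when the model process needs to be tuned, supervised, or deployed separately outside Node.js
 - For `node-llama-cpp`, `backend: langchain-v1` remains the more direct, currently verified local tool-use path
 - `backend: deepagent` can also reach the same tool-calling path, but final stability still depends on whether the selected model correctly follows the upstream tool schema
 - `agent-harness` does not offer unlimited compatibility for every model's argument drift or malformed tool payloads; once the runtime hands a call to upstream tools, schema fidelity is the model's responsibility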

package/dist/contracts/runtime-observability.d.ts CHANGED

@@ -126,6 +126,11 @@ export type RuntimeToolExecutionToolPolicy = {
     hasInputSchema: boolean;
     requiresApproval: boolean;
 };
+export type RuntimeToolGatewayToolPolicy = RuntimeToolExecutionToolPolicy & {
+    gatewayMode: "schema-first" | "approval-gated" | "best-effort";
+    modelRole: "propose";
+    runtimeRole: "validate-and-execute" | "request-approval" | "execute-with-runtime-checks";
+};
 export type RuntimeSnapshotModel = {
     id: string;
     provider: string;
@@ -188,6 +193,26 @@ export type RuntimeSnapshot = {
 };
 export type RuntimeToolExecutionPolicy = {
     agentId: string;
+    gateway: {
+        layer: "tool-gateway";
+        toolScope: {
+            source: "agent-binding";
+            exposedToolCount: number;
+            schemaBoundToolCount: number;
+            approvalRequiredToolCount: number;
+        };
+        validation: {
+            strategy: "schema-first";
+            runtimeValidationRequired: boolean;
+            strictProviderSchemaPreferred: boolean;
+        };
+        correction: {
+            invalidArguments: "structured-error-retry";
+            maxModelRetries: number;
+            highRiskInvalidArguments: "approval-or-deny";
+        };
+        tools: RuntimeToolGatewayToolPolicy[];
+    };
     invokeTimeoutMs?: number;
     streamIdleTimeoutMs: number;
     providerRetries: {
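
Note on the new `gateway` block: the types above describe a per-tool policy (`gatewayMode`, `modelRole`, `runtimeRole`) plus aggregate counts in `toolScope`. The diff for `policy.js` is not shown here, so the sketch below is only one plausible way such values could be derived. The pairing of each `gatewayMode` with a `runtimeRole`, the precedence of approval over schema, the `name` field, and both helper functions are assumptions made for illustration.

```javascript
// Sketch only: toGatewayToolPolicy and summarizeToolScope are hypothetical helpers.
// Assumption: approval takes precedence over schema, and each gatewayMode pairs with
// exactly one runtimeRole. Neither is confirmed by the diff.
function toGatewayToolPolicy(toolPolicy) {
    let gatewayMode = "best-effort";
    let runtimeRole = "execute-with-runtime-checks";
    if (toolPolicy.requiresApproval) {
        gatewayMode = "approval-gated";
        runtimeRole = "request-approval";
    } else if (toolPolicy.hasInputSchema) {
        gatewayMode = "schema-first";
        runtimeRole = "validate-and-execute";
    }
    // modelRole is the single literal "propose" in the type definition.
    return { ...toolPolicy, gatewayMode, modelRole: "propose", runtimeRole };
}

// Builds the aggregate counts carried by gateway.toolScope.
function summarizeToolScope(tools) {
    return {
        source: "agent-binding",
        exposedToolCount: tools.length,
        schemaBoundToolCount: tools.filter((tool) => tool.hasInputSchema).length,
        approvalRequiredToolCount: tools.filter((tool) => tool.requiresApproval).length,
    };
}

// `name` is a hypothetical field; only hasInputSchema and requiresApproval are visible above.
const policies = [
    { name: "read_file", hasInputSchema: true, requiresApproval: false },
    { name: "delete_file", hasInputSchema: true, requiresApproval: true },
    { name: "legacy_tool", hasInputSchema: false, requiresApproval: false },
].map(toGatewayToolPolicy);

const toolScope = summarizeToolScope(policies);
console.log(policies.map((policy) => `${policy.name}: ${policy.gatewayMode}`));
console.log(toolScope);
```

In every variant the model only proposes a call; what differs is how much checking the runtime performs before executing it.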

package/dist/package-version.d.ts CHANGED

@@ -1,2 +1,2 @@
-export declare const AGENT_HARNESS_VERSION = "0.0.400";
+export declare const AGENT_HARNESS_VERSION = "0.0.402";
 export declare const AGENT_HARNESS_RELEASE_DATE = "2026-05-02";

package/dist/package-version.js CHANGED

@@ -1,2 +1,2 @@
-export const AGENT_HARNESS_VERSION = "0.0.400";
+export const AGENT_HARNESS_VERSION = "0.0.402";
 export const AGENT_HARNESS_RELEASE_DATE = "2026-05-02";

package/dist/runtime/adapter/flow/stream-runtime.js CHANGED

@@ -20,6 +20,12 @@ const CLOSE_REQUIRED_PLAN_RECOVERY_INSTRUCTION = [
     "Your next action must be write_todos: update every remaining pending or in_progress item to completed if evidence was gathered, or failed if it cannot be completed with the available tools.",
     "After that write_todos call, provide the final answer required by the agent response format.",
 ].join("\n");
+const RUN_EVIDENCE_AFTER_PREMATURE_PLAN_CLOSE_INSTRUCTION = [
+    "The required todo board was closed before any non-TODO evidence tool returned.",
+    "Do not call write_todos again yet.",
+    "Your next action must be exactly one non-TODO evidence tool call selected from the available tool descriptions and schemas.",
+    "After that evidence tool returns, update the todo board and then provide the final answer required by the agent response format.",
+].join("\n");
 const INITIAL_REQUIRED_PLAN_INSTRUCTION = [
     "This agent has a required visible planning contract.",
     "Your first action for this request must be write_todos with concrete task steps and statuses.",
@@ -193,10 +199,13 @@ function hasUsefulVisibleSynthesis(value) {
     if (/^(?:model_request|tool_call|call_tool)/iu.test(trimmed)) {
         return false;
     }
+    if (/^(?:name|tool_call_id)\s*=/iu.test(trimmed)) {
+        return false;
+    }
     if (/^(?:we\s+need\s+to|so\s+next\s+step\b)/iu.test(trimmed)) {
         return false;
     }
-    if (/^\{\s*"(?:name|arguments|todos|symbol|query|market|count)"\s*:/iu.test(trimmed)) {
+    if (/^\{\s*"(?:name|arguments|args|argv|todos|symbol|query|market|count|stdout|stderr|exitCode)"\s*:/iu.test(trimmed)) {
         return false;
     }
     if (/^(?:stdout|stderr|exitCode)\s*:/iu.test(trimmed)) {
@@ -702,6 +711,11 @@ export async function* streamRuntimeExecution(options) {
             const streamedIncompletePlanRecoveryInstruction = requiresPlanEvidence(options.binding) && streamedExecutionEvidence.hasIncompletePlanState
                 ? CLOSE_REQUIRED_PLAN_RECOVERY_INSTRUCTION
                 : null;
+            const streamedPrematurePlanCloseRecoveryInstruction = requiresPlanEvidence(options.binding)
+                && streamedExecutionEvidence.hasPlanStateEvidence
+                && !streamedExecutionEvidence.hasSuccessfulNonTodoToolResultEvidence
+                ? RUN_EVIDENCE_AFTER_PREMATURE_PLAN_CLOSE_INSTRUCTION
+                : null;
             const delegatedExecutionRecoveryInstruction = !emittedUnsafeStreamSideEffects || streamedDelegatedRecoveryInstruction
                 ? streamedDelegatedRecoveryInstruction
                 : null;
@@ -734,6 +748,7 @@ export async function* streamRuntimeExecution(options) {
                 ? INVALID_TOOL_SELECTION_RECOVERY_INSTRUCTION
                 : delegatedExecutionRecoveryInstruction
                     ?? streamedIncompletePlanRecoveryInstruction
+                    ?? streamedPrematurePlanCloseRecoveryInstruction
                     ?? streamedRuntimeFailureRecoveryInstruction
                     ?? missingPlanRecoveryInstruction
                     ?? streamedDelegationOnlyRecoveryInstruction
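
Note on the `hasUsefulVisibleSynthesis` hunk: the two changed patterns stop leaked tool-call fragments (`name=...`, `tool_call_id=...`, or JSON starting with keys such as `args`, `stdout`, `exitCode`) from being treated as a useful answer. The sketch below copies both patterns from the 0.0.402 side of the diff. `looksLikeLeakedToolPayload` is a hypothetical reduction that applies only these two checks and none of the function's other rules.

```javascript
// Patterns copied from the 0.0.402 hunk of stream-runtime.js.
const LEAKED_KEY_VALUE_PREFIX = /^(?:name|tool_call_id)\s*=/iu;
const LEAKED_JSON_PAYLOAD_PREFIX = /^\{\s*"(?:name|arguments|args|argv|todos|symbol|query|market|count|stdout|stderr|exitCode)"\s*:/iu;

// Hypothetical wrapper: true when the text starts like a leaked tool payload.
// The real hasUsefulVisibleSynthesis applies further checks that are not reproduced here.
function looksLikeLeakedToolPayload(text) {
    const trimmed = text.trim();
    return LEAKED_KEY_VALUE_PREFIX.test(trimmed) || LEAKED_JSON_PAYLOAD_PREFIX.test(trimmed);
}

const samples = [
    "tool_call_id = call_1",
    '{"stdout": "ok", "exitCode": 0}',
    '{ "args": {} }',
    "The deployment finished without errors.",
];

for (const sample of samples) {
    console.log(JSON.stringify(sample), looksLikeLeakedToolPayload(sample));
}
```

The first three samples match one of the patterns; the plain sentence does not.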

package/dist/runtime/adapter/local-tool-invocation.js CHANGED

@@ -4,10 +4,15 @@ import { canReplayToolCallsLocally } from "./tool/tool-replay.js";
 import { extractToolCallsFromResult, normalizeToolArgsForSchema, stringifyToolOutput } from "./tool/tool-arguments.js";
 import { extractMemoryCandidatesFromToolOutput } from "../harness/system/runtime-memory-candidates.js";
 import { maybePersistLargeToolOutput } from "./tool/tool-output-artifacts.js";
+import { toolRequiresRuntimeApproval } from "./tool/tool-hitl.js";
+import { validateToolGatewayInput } from "../harness/tool-gateway/index.js";
 import { appendToolRecoveryInstruction, extractVisibleOutput, resolveMissingPlanRecoveryInstruction, resolveExecutionWithoutToolEvidenceTextInstruction, resolveToolCallRecoveryInstruction, sanitizeVisibleText, STRICT_TOOL_JSON_INSTRUCTION, } from "../parsing/output-parsing.js";
 import { salvageJsonToolCalls } from "../parsing/output-tool-args.js";
 import { AUTONOMOUS_INVESTIGATION_RECOVERY_INSTRUCTION } from "../prompts/runtime-prompts.js";
 const TOOL_FOLLOW_UP_INSTRUCTION = "One or more tool results are already available in this conversation. Answer the user's current request directly from the existing context and tool results. Do not ask the user to repeat inputs that are already present above.";
+function isObject(value) {
+    return typeof value === "object" && value !== null && !Array.isArray(value);
+}
 function readPlanStateSummary(output) {
     if (typeof output !== "object" || output === null) {
         return null;
@@ -38,32 +43,176 @@ function hasIncompleteExecutedPlan(executedToolResults) {
     }
     return false;
 }
+function normalizeToolName(value) {
+    return typeof value === "string" ? value.trim().toLowerCase().replace(/[\s-]+/gu, "_") : "";
+}
 function hasNonTodoToolEvidence(executedToolResults) {
-    return executedToolResults.some((item) => item.toolName !== "write_todos" && item.toolName !== "read_todos");
+    return executedToolResults.some((item) => !isPlanToolName(item.toolName));
 }
 function isPlanToolName(toolName) {
-    return toolName === "write_todos"
-        || toolName === "read_todos"
-        || toolName === "tool_call_write_todos"
-        || toolName === "tool_call_read_todos";
+    const normalized = normalizeToolName(toolName);
+    return normalized === "write_todos"
+        || normalized === "read_todos"
+        || normalized === "tool_call_write_todos"
+        || normalized === "tool_call_read_todos";
 }
 function isFallbackTodoCompletionToolCall(toolCall) {
     return typeof toolCall.id === "string"
         && toolCall.id.startsWith("fallback-complete-")
         && (toolCall.name === "write_todos" || toolCall.name === "tool_call_write_todos");
 }
-function isCompletedTodoUpdateToolCall(toolCall) {
-    if (toolCall.name !== "write_todos" && toolCall.name !== "tool_call_write_todos") {
+function isTerminalTodoUpdateToolCall(toolCall) {
+    if (!isPlanToolName(toolCall.name) || normalizeToolName(toolCall.name).includes("read_todos")) {
         return false;
     }
     if (typeof toolCall.args !== "object" || toolCall.args === null || !Array.isArray(toolCall.args.todos)) {
         return false;
     }
     const todos = toolCall.args.todos;
-    return todos.length > 0 && todos.every((todo) => typeof todo === "object"
-        && todo !== null
-        && typeof todo.status === "string"
-        && todo.status.trim().toLowerCase() === "completed");
+    return todos.length > 0 && todos.every((todo) => {
+        if (typeof todo !== "object" || todo === null || typeof todo.status !== "string") {
+            return false;
+        }
+        const status = todo.status.trim().toLowerCase();
+        return status !== "pending" && status !== "in_progress";
+    });
+}
+function readSchemaShape(schema) {
+    if (!isObject(schema)) {
+        return null;
+    }
+    if (isObject(schema.properties)) {
+        return schema.properties;
+    }
+    if (isObject(schema.shape)) {
+        return schema.shape;
+    }
+    const def = schema._def;
+    if (!def) {
+        return null;
+    }
+    const shape = typeof def.shape === "function" ? def.shape() : def.shape;
+    return isObject(shape) ? shape : null;
+}
+function readSchemaDescription(schemaPart) {
+    if (!isObject(schemaPart)) {
+        return "";
+    }
+    const direct = schemaPart.description;
+    if (typeof direct === "string") {
+        return direct;
+    }
+    const nested = schemaPart._def;
+    if (typeof nested?.description === "string") {
+        return nested.description;
+    }
+    return readSchemaDescription(nested?.innerType);
+}
+function readSchemaDefault(schemaPart) {
+    if (!isObject(schemaPart)) {
+        return undefined;
+    }
+    const typed = schemaPart;
+    const hasJsonDefault = Object.prototype.hasOwnProperty.call(schemaPart, "default") && typeof typed.default !== "function";
+    if (hasJsonDefault) {
+        return typed.default;
+    }
+    if (Object.prototype.hasOwnProperty.call(schemaPart, "const")) {
+        return typed.const;
+    }
+    const def = schemaPart._def;
+    if (!def) {
+        return undefined;
+    }
+    if (def.defaultValue !== undefined) {
+        return typeof def.defaultValue === "function" ? def.defaultValue() : def.defaultValue;
+    }
+    return readSchemaDefault(def.innerType);
+}
+function parseFirstStringArrayExample(description) {
+    const arrayMatch = description.match(/\[[^\]]+\]/u);
+    if (!arrayMatch) {
+        return null;
+    }
+    const values = [...arrayMatch[0].matchAll(/["']([^"']+)["']/gu)].map((match) => match[1]).filter(Boolean);
+    return values.length > 0 ? values : null;
+}
+function buildGenericFallbackArgsFromSchema(schema, latestUserInput) {
+    const shape = readSchemaShape(schema);
+    if (!shape) {
+        return {};
+    }
+    const args = {};
+    for (const [key, schemaPart] of Object.entries(shape)) {
+        const defaultValue = readSchemaDefault(schemaPart);
+        if (defaultValue !== undefined) {
+            args[key] = defaultValue;
+            continue;
+        }
+        const description = readSchemaDescription(schemaPart);
+        const arrayExample = parseFirstStringArrayExample(description);
+        if (arrayExample) {
+            args[key] = arrayExample;
+            continue;
+        }
+        if (latestUserInput
+            && !args[key]
+            && /(?:query|question|prompt|input|text)/iu.test(`${key} ${description}`)) {
+            args[key] = latestUserInput;
+        }
+    }
+    return args;
+}
+function readTodoPlanTextFromToolCalls(toolCalls) {
+    const fragments = [];
+    for (const toolCall of toolCalls) {
+        if (typeof toolCall.args !== "object" || toolCall.args === null) {
+            continue;
+        }
+        const todos = toolCall.args.todos;
+        if (!Array.isArray(todos)) {
+            continue;
+        }
+        for (const todo of todos) {
+            if (typeof todo === "object" && todo !== null && typeof todo.content === "string") {
+                fragments.push(todo.content);
+            }
+        }
+    }
+    return fragments.join("\n");
+}
+function selectGenericFallbackEvidenceTool(params) {
+    const candidates = [];
+    const appendCandidate = (name) => {
+        if (isPlanToolName(name)) {
+            return;
+        }
+        const resolved = resolveModelFacingToolName(name, params.toolNameMapping, params.primaryTools);
+        const executable = params.executableTools.get(name)
+            ?? params.executableTools.get(resolved)
+            ?? params.builtinExecutableTools.get(name)
+            ?? params.builtinExecutableTools.get(resolved);
+        if (!executable || candidates.some((candidate) => candidate.executable.name === executable.name)) {
+            return;
+        }
+        candidates.push({ requestedName: name, executable });
+    };
+    for (const tool of params.primaryTools) {
+        appendCandidate(tool.name);
+        const modelFacing = params.toolNameMapping.originalToModelFacing.get(tool.name);
+        if (modelFacing) {
+            appendCandidate(modelFacing);
+        }
+    }
+    for (const name of [...params.executableTools.keys(), ...params.builtinExecutableTools.keys()]) {
+        appendCandidate(name);
+    }
+    if (candidates.length === 0) {
+        return null;
+    }
+    const normalizedPlanText = params.planText.toLowerCase();
+    return candidates.find((candidate) => normalizedPlanText.includes(candidate.requestedName.toLowerCase())
+        || normalizedPlanText.includes(candidate.executable.name.toLowerCase())) ?? candidates[0];
 }
 function buildDeterministicFinalFromToolEvidence(executedToolResults) {
     const evidence = executedToolResults
@@ -92,6 +241,11 @@ function latestToolErrorRecoveryInstruction(executedToolResults) {
     if (!latest || latest.isError !== true) {
         return null;
     }
+    if (typeof latest.output === "object" &&
+        latest.output !== null &&
+        latest.output.code === "INVALID_ARGUMENTS") {
+        return null;
+    }
     const message = typeof latest.output === "string" ? latest.output : JSON.stringify(latest.output);
     return resolveToolCallRecoveryInstruction(new Error(message)) ?? AUTONOMOUS_INVESTIGATION_RECOVERY_INSTRUCTION;
 }
@@ -140,12 +294,19 @@ export async function runLocalToolInvocationLoop({ binding, request, primaryTool
             const hasIncompletePlanState = hasIncompleteExecutedPlan(executedToolResults);
             const shouldEnforceIncompletePlan = requiresPlanEvidence(binding) && hasIncompletePlanState;
             const hasExecutionBeyondTodoPlanning = hasNonTodoToolEvidence(executedToolResults);
+            const missingInitialPlanRecoveryInstruction = resolveMissingPlanRecoveryInstruction({
+                request: activeRequest,
+                requiresPlan: requiresPlanEvidence(binding),
+                hasPlanStateEvidence: hasPlanStateEvidence(executedToolResults),
+                hasWriteTodosEvidence: executedToolResults.some((item) => item.toolName === "write_todos"),
+                hasToolResultEvidence: hasExecutionBeyondTodoPlanning,
+            });
             const toolErrorRecoveryInstruction = latestToolErrorRecoveryInstruction(executedToolResults)
                 ?? terminalToolErrorRecoveryInstruction(terminalText);
             const leakedJsonToolCallRecoveryInstruction = terminalText && salvageJsonToolCalls(terminalText).length > 0
                 ? STRICT_TOOL_JSON_INSTRUCTION
                 : null;
-            const recoveryInstruction = toolErrorRecoveryInstruction ?? leakedJsonToolCallRecoveryInstruction ?? (terminalText
+            const recoveryInstruction = toolErrorRecoveryInstruction ?? leakedJsonToolCallRecoveryInstruction ?? missingInitialPlanRecoveryInstruction ?? (terminalText
                 ? resolveExecutionWithoutToolEvidenceTextInstruction(activeRequest, terminalText, false, {
                     hasWriteTodosEvidence: executedToolResults.some((item) => item.toolName === "write_todos"),
                     hasToolResultEvidence: hasExecutionBeyondTodoPlanning,
@@ -197,6 +358,7 @@ export async function runLocalToolInvocationLoop({ binding, request, primaryTool
             role: "system",
             content: TOOL_FOLLOW_UP_INSTRUCTION,
         });
+        const hadNonTodoEvidenceBeforeToolReplay = hasNonTodoToolEvidence(executedToolResults);
         for (let toolIndex = 0; toolIndex < toolCalls.length; toolIndex += 1) {
             const toolCall = toolCalls[toolIndex];
             const resolvedToolName = resolveModelFacingToolName(toolCall.name, toolNameMapping, primaryTools);
@@ -214,9 +376,28 @@ export async function runLocalToolInvocationLoop({ binding, request, primaryTool
             const normalizedArgs = normalizeToolArgsForSchema(toolCall.args, activeExecutable.schema, toolCall.rawArgsInput, {
                 latestUserInput,
             });
+            const gateway = validateToolGatewayInput({
+                toolName: activeExecutable.name,
+                schema: activeExecutable.schema,
+                args: normalizedArgs,
+                requiresApproval: compiledTool ? toolRequiresRuntimeApproval(compiledTool) : false,
+            });
+            if (!gateway.ok) {
+                executedToolResults.push({
+                    toolName: activeExecutable.name,
+                    output: gateway.error,
+                    isError: true,
+                });
+                nextMessages.push(new ToolMessage({
+                    name: activeExecutable.name,
+                    tool_call_id: toolCall.id ?? `tool-${iteration + 1}-${toolIndex + 1}`,
+                    content: stringifyToolOutput(gateway.error),
+                }));
+                continue;
+            }
             const toolResult = toolRuntimeContext
-                ? await activeExecutable.invoke(normalizedArgs, { toolRuntimeContext })
-                : await activeExecutable.invoke(normalizedArgs);
+                ? await activeExecutable.invoke(gateway.input, { toolRuntimeContext })
+                : await activeExecutable.invoke(gateway.input);
             const memoryCandidates = compiledTool ? extractMemoryCandidatesFromToolOutput(compiledTool, toolResult) : [];
             const safeToolResult = await maybePersistLargeToolOutput({
                 toolName: activeExecutable.name,
@@ -234,6 +415,53 @@ export async function runLocalToolInvocationLoop({ binding, request, primaryTool
                 content: stringifyToolOutput(safeToolResult),
             }));
         }
+        if (requiresPlanEvidence(binding)
+            && !hadNonTodoEvidenceBeforeToolReplay
+            && !hasNonTodoToolEvidence(executedToolResults)
+            && toolCalls.length > 0
+            && toolCalls.every((toolCall) => isPlanToolName(toolCall.name))
+            && toolCalls.some(isTerminalTodoUpdateToolCall)) {
+            const fallbackEvidenceTool = selectGenericFallbackEvidenceTool({
+                planText: readTodoPlanTextFromToolCalls(toolCalls),
+                primaryTools,
+                toolNameMapping,
+                executableTools,
+                builtinExecutableTools,
+            });
+            if (fallbackEvidenceTool) {
+                const fallbackArgs = buildGenericFallbackArgsFromSchema(fallbackEvidenceTool.executable.schema, latestUserInput);
+                const normalizedArgs = normalizeToolArgsForSchema(fallbackArgs, fallbackEvidenceTool.executable.schema, undefined, {
+                    latestUserInput,
+                });
+                const compiledTool = toolCatalog.get(fallbackEvidenceTool.requestedName) ?? toolCatalog.get(fallbackEvidenceTool.executable.name);
+                const gateway = validateToolGatewayInput({
+                    toolName: fallbackEvidenceTool.executable.name,
+                    schema: fallbackEvidenceTool.executable.schema,
+                    args: normalizedArgs,
+                    requiresApproval: compiledTool ? toolRequiresRuntimeApproval(compiledTool) : false,
+                });
+                if (gateway.ok) {
+                    const toolResult = toolRuntimeContext
+                        ? await fallbackEvidenceTool.executable.invoke(gateway.input, { toolRuntimeContext })
+                        : await fallbackEvidenceTool.executable.invoke(gateway.input);
+                    const memoryCandidates = compiledTool ? extractMemoryCandidatesFromToolOutput(compiledTool, toolResult) : [];
+                    const safeToolResult = await maybePersistLargeToolOutput({
+                        toolName: fallbackEvidenceTool.executable.name,
+                        output: toolResult,
+                        toolRuntimeContext: toolRuntimeContext,
+                    });
+                    executedToolResults.push({
+                        toolName: fallbackEvidenceTool.executable.name,
+                        output: safeToolResult,
+                        ...(memoryCandidates.length > 0 ? { memoryCandidates } : {}),
+                    });
+                    return {
+                        result: buildDeterministicFinalFromToolEvidence(executedToolResults),
+                        executedToolResults,
+                    };
+                }
+            }
+        }
         if (requiresPlanEvidence(binding)
             && toolCalls.length > 0
             && toolCalls.every((toolCall) => isPlanToolName(toolCall.name))
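
Note on the plan-tool helpers in this file: 0.0.402 normalizes tool names before comparing them and renames `isCompletedTodoUpdateToolCall` to `isTerminalTodoUpdateToolCall`, which now accepts any status other than `pending` or `in_progress` (for example `failed`) instead of requiring `completed`. The three functions below are copied from the added lines of the diff; the sample tool calls are made up for illustration.

```javascript
// Copied from the 0.0.402 side of local-tool-invocation.js.
function normalizeToolName(value) {
    return typeof value === "string" ? value.trim().toLowerCase().replace(/[\s-]+/gu, "_") : "";
}

function isPlanToolName(toolName) {
    const normalized = normalizeToolName(toolName);
    return normalized === "write_todos"
        || normalized === "read_todos"
        || normalized === "tool_call_write_todos"
        || normalized === "tool_call_read_todos";
}

function isTerminalTodoUpdateToolCall(toolCall) {
    if (!isPlanToolName(toolCall.name) || normalizeToolName(toolCall.name).includes("read_todos")) {
        return false;
    }
    if (typeof toolCall.args !== "object" || toolCall.args === null || !Array.isArray(toolCall.args.todos)) {
        return false;
    }
    const todos = toolCall.args.todos;
    return todos.length > 0 && todos.every((todo) => {
        if (typeof todo !== "object" || todo === null || typeof todo.status !== "string") {
            return false;
        }
        const status = todo.status.trim().toLowerCase();
        return status !== "pending" && status !== "in_progress";
    });
}

// Illustrative inputs (not from the package).
const closedWithFailure = {
    name: "Write-Todos", // normalizes to "write_todos"
    args: { todos: [{ content: "fetch quote", status: "completed" }, { content: "compare", status: "failed" }] },
};
const stillOpen = {
    name: "write_todos",
    args: { todos: [{ content: "fetch quote", status: "in_progress" }] },
};
const readOnly = { name: "read_todos", args: { todos: [{ content: "x", status: "completed" }] } };

console.log(isTerminalTodoUpdateToolCall(closedWithFailure)); // true
console.log(isTerminalTodoUpdateToolCall(stillOpen));         // false
console.log(isTerminalTodoUpdateToolCall(readOnly));          // false
```

Under 0.0.400 the first call would not have counted: the name check was a strict comparison against `write_todos` and every status had to be `completed`.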

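Note on the `validateToolGatewayInput` call sites: the diff shows only the contract the loop relies on, namely a result with `ok`, then either `input` or `error`, and an `INVALID_ARGUMENTS` code that suppresses the generic tool-error recovery instruction. The body of `validation.js` is not shown. The sketch below uses a hypothetical stand-in, `validateRequiredKeys`, that only checks the `required` keys of a JSON-Schema-like object, to illustrate the control flow: a rejected call is recorded as an error result and the tool is never invoked. For brevity `invoke` is synchronous here, whereas the package awaits it.

```javascript
// Hypothetical stand-in for validateToolGatewayInput. The real validation logic is
// not visible in this diff; only the { ok, input } / { ok, error } shape is.
function validateRequiredKeys({ toolName, schema, args }) {
    const required = Array.isArray(schema?.required) ? schema.required : [];
    const missing = required.filter((key) => args?.[key] === undefined);
    if (missing.length > 0) {
        // The `missing` field is an illustrative detail, not a documented error shape.
        return { ok: false, error: { code: "INVALID_ARGUMENTS", toolName, missing } };
    }
    return { ok: true, input: args };
}

// Mirrors the loop change: validate first, record a structured error and skip
// invocation on failure, otherwise invoke with the validated input.
function runToolCalls(toolCalls, tools) {
    const executedToolResults = [];
    for (const toolCall of toolCalls) {
        const tool = tools.get(toolCall.name);
        if (!tool) {
            continue; // unknown tools are outside the scope of this sketch
        }
        const gateway = validateRequiredKeys({ toolName: tool.name, schema: tool.schema, args: toolCall.args });
        if (!gateway.ok) {
            executedToolResults.push({ toolName: tool.name, output: gateway.error, isError: true });
            continue;
        }
        executedToolResults.push({ toolName: tool.name, output: tool.invoke(gateway.input) });
    }
    return executedToolResults;
}

let invocationCount = 0;
const tools = new Map([
    ["search", {
        name: "search",
        schema: { type: "object", required: ["query"] },
        invoke: (input) => {
            invocationCount += 1;
            return `results for ${input.query}`;
        },
    }],
]);

const results = runToolCalls(
    [{ name: "search", args: {} }, { name: "search", args: { query: "llama.cpp" } }],
    tools,
);
console.log(results);
```

Because the rejected call comes back to the model as a structured tool result, the model can retry with corrected arguments, which is consistent with the `"structured-error-retry"` literal in the observability contract.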
package/dist/runtime/adapter/middleware-assembly.js CHANGED

@@ -418,7 +418,7 @@ export async function invokeBuiltinTaskTool(input) {
     if (!hasSubagentExecutionToolEvidence(result, resolvedSubagentTools, selectedCompiledSubagent?.tools)) {
         result = await invokeSubagent([description, EXECUTION_WITH_TOOL_EVIDENCE_RETRY_INSTRUCTION].filter(Boolean).join("\n\n"));
         if (!hasSubagentExecutionToolEvidence(result, resolvedSubagentTools, selectedCompiledSubagent?.tools)) {
-            throw new Error(`Delegated agent ${selectedSubagent.name} completed without tool execution evidence.`);
+            throw new Error(`Delegated agent ${selectedSubagent.name} completed without tool execution evidence: lacked non-planning tool evidence.`);
         }
     }
     const structuredResponse = typeof result === "object" && result !== null && "structuredResponse" in result

package/dist/runtime/adapter/model/invocation-request.js CHANGED

@@ -130,7 +130,7 @@ function isIncidentFollowUpTurn(inputText) {
     if (!normalized || hasExplicitResourceReference(normalized)) {
         return false;
     }
-    return /(the rca|deep research.*rca|root cause|go deeper|those issues|these issues|that issue|current incident|kubernetes issues)/i.test(normalized);
+    return /(the rca|deep research.*rca|root cause|go deeper|those issues|these issues|that issue|current incident)/i.test(normalized);
 }
 function findLastAssistantText(history) {
     for (let index = history.length - 1; index >= 0; index -= 1) {
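
Note on the `isIncidentFollowUpTurn` hunk: the only change is that the phrase `kubernetes issues` no longer marks a turn as an incident follow-up. The sketch below copies both versions of the pattern from the diff and compares them on made-up inputs. It does not reproduce the surrounding normalization or the `hasExplicitResourceReference` check, so it illustrates the pattern change only, not the function's full behavior.

```javascript
// Patterns copied from the removed (0.0.400) and added (0.0.402) lines.
const FOLLOW_UP_PATTERN_0_0_400 = /(the rca|deep research.*rca|root cause|go deeper|those issues|these issues|that issue|current incident|kubernetes issues)/i;
const FOLLOW_UP_PATTERN_0_0_402 = /(the rca|deep research.*rca|root cause|go deeper|those issues|these issues|that issue|current incident)/i;

// Illustrative inputs (not from the package).
const freshRequest = "list kubernetes issues in the cluster";
const genuineFollowUp = "go deeper on those issues";

const comparison = [freshRequest, genuineFollowUp].map((text) => ({
    text,
    before: FOLLOW_UP_PATTERN_0_0_400.test(text),
    after: FOLLOW_UP_PATTERN_0_0_402.test(text),
}));

console.log(comparison);
```

With the new pattern, a fresh request that merely mentions Kubernetes issues is no longer matched, while phrasing that refers back to an earlier turn still is.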