@swarmclawai/swarmclaw 1.5.37 → 1.5.39
- package/README.md +21 -85
- package/package.json +1 -1
- package/src/components/agents/agent-sheet.tsx +33 -0
- package/src/components/tasks/task-card.tsx +11 -3
- package/src/components/tasks/task-sheet.tsx +19 -8
- package/src/lib/providers/openclaw.ts +15 -0
- package/src/lib/server/autonomy/supervisor-reflection.ts +12 -0
- package/src/lib/server/build-llm.test.ts +12 -0
- package/src/lib/server/build-llm.ts +2 -0
- package/src/lib/server/chat-execution/chat-turn-preparation.ts +33 -6
- package/src/lib/server/chat-execution/chat-turn-state.test.ts +39 -0
- package/src/lib/server/chat-execution/chat-turn-state.ts +4 -0
- package/src/lib/server/chat-execution/continuation-evaluator.ts +10 -0
- package/src/lib/server/chat-execution/tool-event-tracker.test.ts +67 -0
- package/src/lib/server/chat-execution/tool-event-tracker.ts +38 -0
- package/src/lib/server/chat-execution/tool-summary-progress.test.ts +42 -0
- package/src/lib/server/chat-execution/tool-summary-progress.ts +13 -0
- package/src/lib/server/openclaw/gateway.ts +27 -7
- package/src/lib/server/runtime/perf.ts +5 -1
- package/src/lib/server/runtime/queue/core.ts +19 -0
- package/src/lib/server/runtime/queue-recovery.test.ts +176 -0
- package/src/lib/server/skills/runtime-skill-resolver.ts +34 -1
- package/src/lib/server/tasks/task-checkout.ts +7 -1
- package/src/lib/server/universal-tool-access.test.ts +71 -0
- package/src/lib/server/universal-tool-access.ts +23 -0
- package/src/types/agent.ts +7 -0
package/README.md
CHANGED
@@ -389,6 +389,26 @@ Operational docs: https://swarmclaw.ai/docs/observability
 
 ## Releases
 
+### v1.5.39 Highlights
+
+- **Agents default to scoped tool access**: new agents (and existing agents whose `tools` list is non-empty) now only see the tools they've been given in the system prompt. This trims ~3 k input tokens per turn — an observed CEO/coordinator agent with 14 tools and 4 loaded skills went from 62 k to 38 k chars of system prompt. Opt back into the old firehose by toggling **Universal tool access** in the agent sheet's new "Context & Tool Access" section. Memory, context management, and `ask_human` are always included regardless of the scoped list.
+- **Pinned skills budget hardening**: one long markdown skill was eating 24 k of a 62 k prompt. Inlined pinned-skill content is now capped at 3 k chars with a pointer to `use_skill` action="load" for the full guide, and auto-attached *learned* skills get a dedicated sub-budget (max 6 skills / 8 k chars) so they cannot dominate the main pinned-skills section.
+- **OpenClaw chat fast-fails on dangling credentials**: v1.5.38 added gateway-side fast-fail; the chat streaming path now does the same, emitting a clear `err` event naming the missing credential instead of dialing the gateway unauthenticated and waiting 120 s for the timeout.
+- **Queue: orphan-recovery auto-heals stale checkouts**: pre-1.5.38 storage could leave `queued` tasks with a stale `checkoutRunId` that `checkoutTask()` refused forever. Orphan recovery now clears the stale id in the same sweep that re-queues the task, and `reconcileFinishedRunningTasks` / agent-not-found / capability-mismatch paths also null out the checkout when they terminally fail a task.
+- **Perf ring buffer raised to 2 000 entries**: queue/task repository events fire ~20 Hz during task processing and were evicting chat-execution/prompt perf entries out of the 200-entry buffer before they could be read. The larger buffer lets the perf viewer actually show a full turn.
+- **Tests**: added regression tests for pre-1.5.38 stale-checkout orphan recovery and for the scoped-tool-access algorithm.
+
+### v1.5.38 Highlights
+
+- **Task queue: reclaim stale checkouts**: `checkoutTask()` now reclaims a lingering `checkoutRunId` on a `queued` task instead of refusing it forever. An ungraceful server exit mid-turn (crash, SIGKILL, HMR reload) previously left tasks uncheckoutable, producing a dispatch → orphan-recovery → failed-checkout spin that logged "Recovering orphaned queued task" tens of thousands of times per session. `scheduleRetryOrDeadLetter()` also clears the prior checkout when scheduling a retry or dead-lettering.
+- **Chat: suppress duplicate parallel tool calls**: some OSS models on Ollama (notably `devstral`) emit the same tool call twice in a single turn. The LangGraph tool-event tracker now dedupes by `name + input` signature, swallowing the duplicate start and its result while allowing a genuinely later identical call once the first completes. Hardened against replayed-start events (HMR, graph retries) that previously could leak a `run_id` into both the accepted and suppressed sets and leave `pendingCount` stuck above zero.
+- **Chat: disable `parallel_tool_calls` for Ollama**: local Ollama sessions now pass `parallel_tool_calls: false` to prevent the upstream duplicate-call behavior at the source for models that honor it.
+- **Chat: no-progress guard for tool summary retries**: if the model produces essentially no new text on a `tool_summary` continuation, the loop stops retrying instead of streaming the same short sentence two or three times. The guard is snapshot-aware: a transient-error rollback no longer leaves a stale progress counter that silently skips a legitimate retry (`lastToolSummaryTextLen` is now round-tripped through `ChatTurnState.snapshot`/`restore`).
+- **Task UI: distinguish retry-pending from failure**: a retrying task now renders in amber with a "Retry Pending" label in the task card and sheet, instead of the same red treatment used for dead-lettered failures.
+- **Autonomy: dedupe reflection memories across kinds**: the supervisor reflection writer now drops notes whose normalized text has already been stored this run, eliminating near-identical memory rows classified under multiple kinds.
+- **OpenClaw gateway: fast-fail on dangling credentials**: when an agent's OpenClaw route references a deleted or missing credential, the gateway now refuses to dial the WebSocket up front instead of attempting an unauthenticated handshake and waiting the full 120 s for the agent-side timeout. The credential-missing log line is promoted from warn to error so it surfaces in routine monitoring.
+- **Prompt size profiler**: setting `SWARMCLAW_PROFILE_PROMPT=1` now logs a per-section size breakdown of the assembled system prompt (block index, first-line label, char count) on every turn, making it practical to diagnose why a specific agent is eating context budget. Off by default so production turns stay quiet.
+
 ### v1.5.37 Highlights
 
 - **Factory Droid CLI as a provider and delegation backend**: adds [`droid`](https://docs.factory.ai/cli/droid-exec/overview) as a first-class chat provider and `delegate` backend with streaming JSON output, session resume, and a conservative `--auto low` autonomy pin on the delegate path. Install `droid` and sign in via browser (or set `FACTORY_API_KEY`), then pick **Factory Droid CLI** in the setup wizard. Resolves #38.
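The perf-buffer item in the v1.5.39 notes is a capacity-versus-event-rate problem: a fixed-size ring buffer drops its oldest entries first, so a chatty producer can push out the few entries a reader actually wants. The sketch below illustrates that eviction behaviour only. `RingBuffer`, `survivingChatEntries`, and the entry labels are hypothetical names chosen for illustration and are not taken from `perf.ts`.

```typescript
// Hypothetical fixed-capacity buffer; NOT the implementation in perf.ts.
class RingBuffer<T> {
  private items: T[] = []
  private readonly capacity: number

  constructor(capacity: number) {
    this.capacity = capacity
  }

  push(item: T): void {
    this.items.push(item)
    // Once over capacity, the oldest entry is evicted.
    if (this.items.length > this.capacity) this.items.shift()
  }

  toArray(): T[] {
    return [...this.items]
  }
}

// Record one chat-turn entry, then a burst of queue events, and count
// how many chat entries are still readable afterwards.
function survivingChatEntries(capacity: number, queueEvents: number): number {
  const buffer = new RingBuffer<string>(capacity)
  buffer.push('chat-execution')
  for (let i = 0; i < queueEvents; i++) buffer.push('queue')
  return buffer.toArray().filter((entry) => entry === 'chat-execution').length
}
```

At the roughly 20 events per second quoted in the notes, 200 slots hold about 10 seconds of queue traffic and 2 000 slots about 100 seconds, which is why the larger buffer can retain a whole turn.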
@@ -411,91 +431,7 @@ Operational docs: https://swarmclaw.ai/docs/observability
 - **Gateway credential resolution logging**: when a gateway credential can't be resolved, the server now logs a clear warning identifying the missing credential ID.
 - **Credential decryption error logging**: when a stored credential can't be decrypted (e.g. after `CREDENTIAL_SECRET` changes), the server now logs the credential ID and provider so users know which key to re-add.
 
-
-
-- **Ollama Cloud auth fix**: SwarmClaw now normalizes `api.ollama.com` and `www.ollama.com` to `ollama.com` before making authenticated requests, avoiding the redirect that was dropping authorization headers and causing false provider-health/runtime failures.
-- **Chat execution context hardening**: tool invocation now resolves names case-insensitively, oversized tool results are truncated before they are fed back into the model, and proactive grounding/heartbeat prompts stay smaller under pressure to reduce avoidable context blowouts.
-- **API compatibility fixes**: OpenAI-compatible streaming now captures reasoning deltas from providers that emit them outside `delta.content`, and A2A endpoints are exempt from the main proxy access-key gate so they can rely on their own auth scheme.
-
-### v1.5.33 Highlights
-
-- **CLI global flag compatibility**: legacy-routed commands now honor the documented `--access-key` and `--base-url` aliases even when they appear after the subcommand, so authenticated CLI automation works the same across binary entry points.
-- **Docker build memory hardening**: production Next.js builds now size `--max-old-space-size` from the detected container/cgroup memory limit, with `SWARMCLAW_BUILD_MAX_OLD_SPACE_SIZE_MB` available as an explicit override for constrained Docker Desktop and CI environments.
-
-### v1.5.31 Highlights
-
-- **Fix Docker first-run crash**: resolved `EISDIR: illegal operation on a directory, read` error when running `docker compose up` without a pre-existing `.env.local` file. Docker was creating a directory mount instead of a file, which crashed Next.js on startup. Replaced the file bind mount with `env_file` directive using `required: false`.
-
-### v1.5.4 Highlights
-
-- **Cursor Agent CLI built-in provider**: Cursor Agent CLI is now a first-class worker provider with session continuity, headless execution, and delegation support.
-- **Qwen Code CLI built-in provider**: Qwen Code CLI is now available as a built-in worker provider and delegation backend with structured headless execution support.
-- **Goose built-in provider**: Goose is now supported as a runtime-managed worker provider, using its own local auth and provider configuration while preserving SwarmClaw session continuity.
-- **CLI setup and health parity**: setup flows, provider checks, setup doctor, and provider-facing UI now recognize Cursor, Qwen Code, and Goose alongside the existing CLI-backed providers.
-
-### v1.5.3 Highlights
-
-- **Copilot CLI v1.x compatibility**: the `copilot-cli` provider now handles the current event format (`assistant.message_delta`, `assistant.message`, updated `result` payload) while keeping backward compatibility with the legacy format. Also fixes `--resume` flag syntax. (Community contribution by [@borislavnnikolov](https://github.com/borislavnnikolov) -- PR #36)
-
-### v1.5.2 Highlights
-
-- **Hosted deploy path for SwarmClaw itself**: added root-level `render.yaml`, `fly.toml`, and `railway.json` so the published `ghcr.io/swarmclawai/swarmclaw:latest` image is easier to run on always-on platforms.
-- **Public health endpoint for hosted platforms**: added `/api/healthz` and exempted it from access-key auth so Render, Fly.io, and Railway can perform liveness checks without weakening the rest of the API surface.
-- **OTLP/OpenTelemetry foundation**: SwarmClaw can now export traces for chat turns, direct model streams, protocol runs, and tool execution to any OTLP-compatible backend using environment variables only.
-- **Docs and landing-page deploy refresh**: `swarmclaw.ai` now exposes the hosted deploy path and a dedicated observability guide instead of burying those operator workflows in general setup docs.
-
-### v1.5.1 Highlights
-
-- **Standalone connector lifecycle**: connector start, stop, status, and repair now work correctly in standalone production builds (`npm start` / pm2) where the daemon runs in-process. Previously these operations silently failed because the controller assumed a daemon subprocess was always present. (Community contribution by [@borislavnnikolov](https://github.com/borislavnnikolov) -- PR #35)
-
-### v1.5.0 Highlights
-
-- **First-run activation refresh**: setup now includes a dedicated start-path step, broad starter shapes instead of niche presets, and draft agents generated directly from the chosen setup shape.
-- **Guided post-setup launchpad**: finishing setup now routes through action-oriented next steps such as opening the first agent chat, launching a structured session, connecting platforms, or reviewing usage.
-- **State-aware home and protocols**: fresh workspaces now open on a launchpad instead of a sparse ops dashboard, and the Protocols page now surfaces the visual builder and template gallery directly.
-
-### v1.4.9 Highlights
-
-- **Standalone build reliability**: `public/`, `.next/static/`, and `css-tree/data/` are now automatically copied into the standalone build output, fixing runtime crashes and missing assets when running the standalone bundle. (Community contribution by [@borislavnnikolov](https://github.com/borislavnnikolov) — PR #34)
-
-### v1.4.8 Highlights
-
-- **Agent-scoped SwarmFeed dashboard**: the in-app feed now has an explicit acting-agent model so humans can direct social actions without ever posting as a separate user identity.
-- **Expanded feed surface**: added Bookmarks and Notifications tabs, SwarmFeed search, suggested follows, thread detail sheets, profile sheets, and a restored visible composer.
-- **Broader SwarmFeed tool/API support**: the built-in `swarmfeed` tool and internal API now support follow/unfollow, bookmark/unbookmark, quote reposts, notifications, profile lookup, thread reads, and search.
-- **Social heartbeat enforcement**: task-completion posting, daily/manual-only guardrails, and heartbeat dependency warnings now match the agent-first SwarmFeed model instead of leaving social automation loosely implied.
-
-### v1.4.7 Highlights
-
-- **Hermes Agent built-in provider**: Added first-class Hermes support through the Hermes API server, including optional auth, local or remote `/v1` endpoints, and runtime-managed agent handling.
-- **OpenRouter built-in provider**: OpenRouter is now a built-in provider instead of living only behind the generic custom-provider path.
-- **Runtime-managed provider handling**: Hermes now skips SwarmClaw's local extension/tool injection path so its own runtime stays in control, while setup and model discovery still work through the normal provider flow.
-- **Provider docs refresh**: README and docs now reflect the new provider list, remote Hermes API-server support, and logo assets for OpenRouter and Hermes Agent.
-
-### v1.4.6 Highlights
-
-- **SwarmDock startup sync**: Existing SwarmDock agents now authenticate and reconcile their live marketplace profile on connector start, updating stale description, skills, framework/model metadata, and payout wallet fields
-- **Agent wallet fallback**: SwarmDock connectors now fall back to the agent's selected marketplace wallet when no connector-level wallet address is configured
-- **Task filter fix**: The built-in `swarmdock` tool now uses the correct `skills=` task filter when browsing marketplace tasks from chat
-- **SwarmDock SDK bump**: Updated `@swarmdock/sdk` from `0.5.2` to `0.5.3`, aligning the connector with the published metadata-sync fixes
-
-### v1.4.5 Highlights
-
-- **OpenClaw 2026.4.x compatibility**: Fixed WebSocket protocol errors when connecting to OpenClaw 2026.4.2+ gateways (`profileId` was incorrectly included in RPC params)
-- **OpenClaw dependency bump**: Updated minimum OpenClaw from `2026.2.26` to `2026.4.2`
-
-### v1.4.4 Highlights
-
-- **SwarmDock SDK bump**: Updated `@swarmdock/sdk` from `0.4.1` to `0.5.2`, picking up new error types, skill templates, and agent primitives
-
-### v1.4.3 Highlights
-
-- **SwarmDock agent opt-in**: Agents can now opt into the SwarmDock marketplace directly from their settings sheet with description, skills, wallet, and auto-bid configuration
-- **SwarmFeed & SwarmDock tools**: Agents get `swarmfeed` and `swarmdock` tools auto-enabled when opted in, allowing autonomous posting, replying, liking, browsing tasks, and checking status from chat
-- **Auto-registration**: Enabling SwarmFeed on an agent automatically registers it on the SwarmFeed network (no manual connector setup required)
-- **Marketplace page**: New `/marketplace` sidebar page showing live SwarmDock tasks and agents
-- **Following tab fix**: SwarmFeed Following tab gracefully handles unregistered agents instead of showing a 401 error
-- **Compose removal**: Removed manual compose UI from Feed page — agents post autonomously through their tools
+Older releases: https://swarmclaw.ai/docs/release-notes
 
 - GitHub releases: https://github.com/swarmclawai/swarmclaw/releases
 - npm package: https://www.npmjs.com/package/@swarmclawai/swarmclaw
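The v1.5.38 note on duplicate parallel tool calls describes deduplication by a `name + input` signature that also tolerates replayed start events. The tracker's own diff is not part of this excerpt, so the following is only a sketch of that idea under stated assumptions: `ToolCallDeduper`, its method names, and the use of `JSON.stringify` to build the signature are all hypothetical and may differ from `tool-event-tracker.ts`.

```typescript
// Hypothetical sketch; the real tracker in tool-event-tracker.ts may differ.
class ToolCallDeduper {
  // Signatures of calls that have started and not yet finished.
  private readonly inFlight = new Set<string>()
  // Accepted run id -> signature, so the signature can be released on end.
  private readonly acceptedRuns = new Map<string, string>()
  private readonly suppressedRuns = new Set<string>()

  // Returns true when the start event should be surfaced to the caller.
  onStart(runId: string, name: string, input: unknown): boolean {
    // A replayed start for an already-classified run keeps its first verdict,
    // so a run id can never end up in both sets.
    if (this.acceptedRuns.has(runId) || this.suppressedRuns.has(runId)) return false
    // Assumption: JSON.stringify is a good-enough signature (it is key-order sensitive).
    const signature = `${name}:${JSON.stringify(input)}`
    if (this.inFlight.has(signature)) {
      this.suppressedRuns.add(runId)
      return false
    }
    this.inFlight.add(signature)
    this.acceptedRuns.set(runId, signature)
    return true
  }

  // Returns true when the result should be surfaced to the caller.
  onEnd(runId: string): boolean {
    if (this.suppressedRuns.delete(runId)) return false
    const signature = this.acceptedRuns.get(runId)
    if (signature === undefined) return false
    this.acceptedRuns.delete(runId)
    // Releasing the signature lets a later identical call through.
    this.inFlight.delete(signature)
    return true
  }

  get pendingCount(): number {
    return this.acceptedRuns.size
  }
}
```

Keying the accepted runs by run id is what keeps the pending count honest in this sketch: a replayed start is a no-op, and every accepted run is removed exactly once when it ends.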
package/package.json
CHANGED
@@ -1,6 +1,6 @@
 {
   "name": "@swarmclawai/swarmclaw",
-  "version": "1.5.
+  "version": "1.5.39",
   "description": "Build and run autonomous AI agents with OpenClaw, Hermes, multiple model providers, orchestration, delegation, memory, skills, schedules, and chat connectors.",
   "main": "electron-dist/main.js",
   "license": "MIT",
package/src/components/agents/agent-sheet.tsx
CHANGED
@@ -210,6 +210,11 @@ export function AgentSheet() {
   const [delegationTargetMode, setDelegationTargetMode] = useState<'all' | 'selected'>('all')
   const [delegationTargetAgentIds, setDelegationTargetAgentIds] = useState<string[]>([])
   const [tools, setTools] = useState<string[]>([])
+  // Scoped tool access is the default for new agents (cuts ~3 k input tokens
+  // per turn). Existing agents with no toolAccessMode field persisted stay
+  // universal server-side for backward compat; the new-agent setup path
+  // below also explicitly writes 'scoped' so it persists on save.
+  const [toolAccessMode, setToolAccessMode] = useState<'universal' | 'scoped'>('scoped')
   const [extensions, setExtensions] = useState<string[]>([])
   const [enabledExtensionIds, setEnabledExtensionIds] = useState<Set<string> | null>(null)
   const [skills, setSkills] = useState<string[]>([])
@@ -415,6 +420,7 @@ export function AgentSheet() {
     setDelegationTargetMode(editing.delegationTargetMode === 'selected' ? 'selected' : 'all')
     setDelegationTargetAgentIds(editing.delegationTargetAgentIds || [])
     setTools(getEnabledToolIds(editing))
+    setToolAccessMode(editing.toolAccessMode === 'scoped' ? 'scoped' : 'universal')
     setExtensions(getEnabledExtensionIds(editing))
     setSkills(editing.skills || [])
     setSkillIds(editing.skillIds || [])
@@ -497,6 +503,7 @@ export function AgentSheet() {
     setDelegationTargetMode(src.delegationTargetMode === 'selected' ? 'selected' : 'all')
     setDelegationTargetAgentIds(src.delegationTargetAgentIds || [])
     setTools(getEnabledToolIds(src))
+    setToolAccessMode(src.toolAccessMode === 'scoped' ? 'scoped' : 'universal')
     setExtensions(getEnabledExtensionIds(src))
     setSkills(src.skills || [])
     setSkillIds(src.skillIds || [])
@@ -576,6 +583,7 @@ export function AgentSheet() {
     setDelegationTargetMode('all')
     setDelegationTargetAgentIds([])
     setTools(getDefaultAgentToolIds())
+    setToolAccessMode('scoped')
     setExtensions([])
     setSkills([])
     setSkillIds([])
@@ -783,6 +791,7 @@ export function AgentSheet() {
     delegationTargetMode: delegationEnabled || role === 'coordinator' ? delegationTargetMode : 'all',
     delegationTargetAgentIds: (delegationEnabled || role === 'coordinator') && delegationTargetMode === 'selected' ? delegationTargetAgentIds : [],
     tools,
+    toolAccessMode,
     extensions,
     skills,
     skillIds,
@@ -2005,6 +2014,30 @@ export function AgentSheet() {
         summary={advancedSummary}
         badges={agentAdvancedBadges}
       >
+        <SectionCard
+          title="Context & Tool Access"
+          description="Control how many tools are described in this agent's system prompt. Scoped (default) keeps the agent focused and saves ~3 k input tokens per turn; Universal gives it visibility into every built-in tool."
+          className="mb-6 border-white/[0.05] bg-white/[0.01]"
+        >
+          <div className="space-y-3">
+            <label className="flex items-center gap-3 cursor-pointer">
+              <div
+                onClick={() => setToolAccessMode((current) => current === 'universal' ? 'scoped' : 'universal')}
+                className={`w-11 h-6 rounded-full transition-all duration-200 relative cursor-pointer shrink-0 ${toolAccessMode === 'universal' ? 'bg-accent-bright' : 'bg-white/[0.08]'}`}
+              >
+                <div className={`absolute top-0.5 w-5 h-5 rounded-full bg-white transition-all duration-200 ${toolAccessMode === 'universal' ? 'left-[22px]' : 'left-0.5'}`} />
+              </div>
+              <span className="text-[13px] text-text-2">Universal tool access</span>
+              <HintTip text="Off (default, recommended): the agent only sees tools enabled in its Tools list. On: every built-in tool is described in the system prompt. Turn on only for coordinator agents that need visibility across every possible downstream tool, or temporarily for debugging." />
+            </label>
+            <p className="text-[12px] text-text-3/70 pl-[56px] -mt-1">
+              {toolAccessMode === 'universal'
+                ? 'Full tool universe is injected into the prompt. Costs ~3 k more input tokens per turn.'
+                : 'Only the tools enabled above are visible to the agent — this is the focused default.'}
+            </p>
+          </div>
+        </SectionCard>
+
         <SectionCard
           title="Voice & Autonomy"
           description="Tune voice and the detailed heartbeat behavior for this agent."
package/src/components/tasks/task-card.tsx
CHANGED
@@ -343,9 +343,17 @@ export function TaskCard({
         )}
       </div>
 
-      {task.error && (
-
-
+      {task.error && (() => {
+        const retryPending =
+          task.status !== 'failed' &&
+          !task.deadLetteredAt &&
+          (task.retryScheduledAt != null || /^Retry scheduled after failure/i.test(task.error))
+        return (
+          <p className={`mt-2 text-[11px] line-clamp-2 ${retryPending ? 'text-amber-400/80' : 'text-red-400/80'}`}>
+            {task.error}
+          </p>
+        )
+      })()}
 
       {/* Inline comments — show latest 2 */}
       {task.comments && task.comments.length > 0 && (
package/src/components/tasks/task-sheet.tsx
CHANGED
@@ -549,15 +549,26 @@ export function TaskSheet() {
           </div>
         )}
 
-        {/* Error */}
-        {editing.error && (
-
-
-
-
+        {/* Error / Retry notice */}
+        {editing.error && (() => {
+          const retryPending =
+            editing.status !== 'failed' &&
+            !editing.deadLetteredAt &&
+            (editing.retryScheduledAt != null || /^Retry scheduled after failure/i.test(editing.error))
+          const label = retryPending ? 'Retry Pending' : 'Error'
+          const tone = retryPending
+            ? 'border-amber-500/15 bg-amber-500/[0.04] text-amber-300/80'
+            : 'border-red-500/10 bg-red-500/[0.03] text-red-400/80'
+          const labelTone = retryPending ? 'text-amber-400' : 'text-red-400'
+          return (
+            <div className="mb-8">
+              <label className={`block font-display text-[12px] font-600 uppercase tracking-[0.08em] mb-3 ${labelTone}`}>{label}</label>
+              <div className={`p-4 rounded-[14px] border text-[13px] whitespace-pre-wrap ${tone}`}>
+                {editing.error}
+              </div>
             </div>
-
-        )}
+          )
+        })()}
 
         {/* Comments (with input — adding comments from view mode is useful) */}
         <div className="mb-8">
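The task card and the task sheet each compute the same retry-pending predicate inline. A minimal sketch of that predicate as a shared helper is shown below. `TaskLike` and `isRetryPending` are hypothetical names, and the field types are assumptions, since the real task type is not part of this diff.

```typescript
// Hypothetical, reduced task shape; field types are assumed, not taken from the package.
interface TaskLike {
  status: string
  error?: string | null
  deadLetteredAt?: number | null
  retryScheduledAt?: number | null
}

// True when the error text describes a scheduled retry rather than a terminal failure.
function isRetryPending(task: TaskLike): boolean {
  if (!task.error) return false
  return (
    task.status !== 'failed' &&
    !task.deadLetteredAt &&
    // Either an explicit retry timestamp or the message prefix marks a pending retry.
    (task.retryScheduledAt != null || /^Retry scheduled after failure/i.test(task.error))
  )
}
```

Both components could then choose amber or red styling from one call. Whether the package does or should share such a helper is not something this diff shows.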
package/src/lib/providers/openclaw.ts
CHANGED
@@ -422,6 +422,21 @@ export function streamOpenClawChat({ session, message, imagePath, apiKey, write,
 
   const wsUrl = session.apiEndpoint ? deriveOpenClawWsUrl(session.apiEndpoint) : 'ws://127.0.0.1:18789'
   const token = apiKey || session.apiKey || undefined
+  // If the session references a credential but nothing resolved, the credential
+  // was deleted or corrupted. Fail fast with a clear error instead of dialing
+  // the gateway unauthenticated and timing out 120 s later (the original symptom
+  // behind the "OpenClaw gateway timed out after 120 s" report).
+  const credentialIdSet = typeof session.credentialId === 'string' && session.credentialId.trim().length > 0
+  if (credentialIdSet && !token) {
+    return Promise.resolve().then(() => {
+      active.delete(session.id)
+      write(`data: ${JSON.stringify({
+        t: 'err',
+        text: `OpenClaw credential "${session.credentialId}" is missing from the credential store. Reattach an existing credential or create a new one in Settings → Credentials.`,
+      })}\n\n`)
+      return ''
+    })
+  }
   return new Promise((resolve) => {
     let fullResponse = ''
     let settled = false
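The guard added to `streamOpenClawChat` can be read as a pure decision: a credential id is set, no token resolved, so emit one server-sent `err` frame and stop. The sketch below isolates that decision so it can be tested without a WebSocket. `SessionLike` and `credentialFailureFrame` are hypothetical names, and the error text is shortened.

```typescript
// Hypothetical session shape, reduced to the two fields the guard reads.
interface SessionLike {
  credentialId?: string | null
  apiKey?: string | null
}

// Returns an SSE `err` frame when a credential is referenced but no token
// resolved; returns null when the normal connection path should continue.
function credentialFailureFrame(session: SessionLike, apiKey?: string | null): string | null {
  const token = apiKey || session.apiKey || undefined
  const credentialIdSet =
    typeof session.credentialId === 'string' && session.credentialId.trim().length > 0
  if (!credentialIdSet || token) return null
  const payload = {
    t: 'err',
    text: `OpenClaw credential "${session.credentialId}" is missing from the credential store.`,
  }
  // Same framing as the hunk above: a `data:` line followed by a blank line.
  return `data: ${JSON.stringify(payload)}\n\n`
}
```

A session with no `credentialId` at all returns `null` here, matching the hunk: only a referenced credential that failed to resolve triggers the fast-fail.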
package/src/lib/server/autonomy/supervisor-reflection.ts
CHANGED
@@ -779,8 +779,20 @@ function writeReflectionMemories(params: {
     { kind: 'open_loop', notes: params.openLoops },
   ]
 
+  // Cross-kind dedup: skip notes whose normalized text we've already stored
+  // this run. The reflection classifier often produces near-identical notes
+  // under multiple kinds (e.g. "successfully completed a multi-step task…"
+  // appearing as both invariant and lesson), which floods memory with noise.
+  const seenNormalized = new Set<string>()
+  const normalizeNote = (note: string): string =>
+    note.toLowerCase().replace(/\s+/g, ' ').trim().slice(0, 240)
+
   for (const group of groups) {
     for (const note of group.notes) {
+      const norm = normalizeNote(note)
+      if (!norm) continue
+      if (seenNormalized.has(norm)) continue
+      seenNormalized.add(norm)
       const metadata: Record<string, unknown> = {
         origin: 'autonomy-reflection',
         reflectionId: params.reflectionId,
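The cross-kind dedup in this hunk keeps one `Set` of normalized notes across all groups, so whichever kind is iterated first keeps the note. A self-contained sketch of the same technique follows. `NoteGroup` and `dedupeAcrossKinds` are hypothetical names, while `normalizeNote` is copied from the hunk.

```typescript
// Hypothetical group shape mirroring the { kind, notes } entries in the hunk above.
interface NoteGroup {
  kind: string
  notes: string[]
}

// Same normalization as the hunk: lowercase, collapse whitespace, trim, cap at 240 chars.
const normalizeNote = (note: string): string =>
  note.toLowerCase().replace(/\s+/g, ' ').trim().slice(0, 240)

// Drops empty notes and any note whose normalized text was already kept,
// regardless of which kind it was filed under.
function dedupeAcrossKinds(groups: NoteGroup[]): NoteGroup[] {
  const seen = new Set<string>()
  return groups.map((group) => ({
    kind: group.kind,
    notes: group.notes.filter((note) => {
      const norm = normalizeNote(note)
      if (!norm || seen.has(norm)) return false
      seen.add(norm)
      return true
    }),
  }))
}
```

Because only the first 240 normalized characters are compared, two long notes that differ only after that point count as duplicates. That behaviour follows directly from the `slice(0, 240)` in the source.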
package/src/lib/server/build-llm.test.ts
CHANGED
@@ -238,6 +238,18 @@ test('buildChatModel keeps local Ollama local even when a credential and :cloud
   assert.equal(llm.clientConfig?.baseURL, 'http://localhost:11434/v1')
 })
 
+test('buildChatModel disables parallel_tool_calls for Ollama local to avoid duplicate tool emissions from some OSS models', () => {
+  const llm = buildChatModel({
+    provider: 'ollama',
+    model: 'devstral',
+    ollamaMode: 'local',
+    apiKey: null,
+  }) as ChatOpenAI & { modelKwargs?: Record<string, unknown> }
+
+  assert.equal(llm.modelKwargs?.parallel_tool_calls, false)
+  assert.equal(llm.clientConfig?.baseURL, 'http://localhost:11434/v1')
+})
+
 test('buildChatModel uses Ollama Cloud only when explicit cloud mode is selected', () => {
   saveCredentials({
     'cred-1': {
package/src/lib/server/build-llm.ts
CHANGED
@@ -38,6 +38,7 @@ type OpenAiReasoningEffort = 'low' | 'medium' | 'high'
 type ChatOpenAiConfig = ConstructorParameters<typeof ChatOpenAI>[0] & {
   modelKwargs?: {
     reasoning_effort?: OpenAiReasoningEffort
+    parallel_tool_calls?: boolean
   }
   configuration?: {
     baseURL?: string
@@ -104,6 +105,7 @@ export function buildChatModel(opts: {
       timeout: OPENAI_COMPAT_MODEL_TIMEOUT_MS,
       maxRetries: OPENAI_COMPAT_MODEL_MAX_RETRIES,
       configuration: { baseURL },
+      modelKwargs: { parallel_tool_calls: false },
     })
   }
 
package/src/lib/server/chat-execution/chat-turn-preparation.ts
CHANGED
@@ -1,6 +1,7 @@
 import fs from 'fs'
 import os from 'os'
 
+import { log } from '@/lib/server/logger'
 import { getProvider } from '@/lib/providers'
 import type { ExecutionBrief, Message, Session } from '@/types'
 import {
@@ -14,7 +15,7 @@ import { loadSettings } from '@/lib/server/settings/settings-repository'
 import { loadSkills } from '@/lib/server/skills/skill-repository'
 import { resolveImagePath } from '@/lib/server/resolve-image'
 import { resolveSessionToolPolicy } from '@/lib/server/tool-capability-policy'
-import { listUniversalToolAccessExtensionIds } from '@/lib/server/universal-tool-access'
+import { listUniversalToolAccessExtensionIds, listScopedToolAccessExtensionIds } from '@/lib/server/universal-tool-access'
 import {
   buildAgentDisabledMessage,
   isAgentDisabled,
@@ -331,9 +332,17 @@ function buildAgentSystemPrompt(
   const allowSilentReplies = isDirectConnectorSession(session)
   const lightweightDirectChat = options?.lightweightDirectChat === true
   const parts: string[] = []
-  const
-
-
+  const capabilityIds = getEnabledCapabilityIds(session).length > 0
+    ? getEnabledCapabilityIds(session)
+    : getEnabledCapabilityIds(agent)
+  // Scoped tool access is the new default: if the agent declares a non-empty
+  // `tools` list, the system prompt only describes those tools. Explicit
+  // `toolAccessMode: 'universal'` opts into the full firehose (for coordinators
+  // or debugging). Agents with no declared tools fall back to universal so
+  // empty-config agents aren't crippled.
+  const enabledExtensions = agent.toolAccessMode !== 'universal' && Array.isArray(agent.tools) && agent.tools.length > 0
+    ? listScopedToolAccessExtensionIds(agent.tools, capabilityIds)
+    : listUniversalToolAccessExtensionIds(capabilityIds)
 
   const identityLines = ['## My Identity']
   identityLines.push(`Name: ${agent.name}`)
@@ -418,7 +427,17 @@ function buildAgentSystemPrompt(
|
|
|
418
427
|
'You run on an autonomous heartbeat. If you receive a heartbeat poll and nothing needs attention, reply exactly: HEARTBEAT_OK',
|
|
419
428
|
].join('\n'))
|
|
420
429
|
|
|
421
|
-
|
|
430
|
+
const assembled = parts.join('\n\n')
|
|
431
|
+
if (process.env.SWARMCLAW_PROFILE_PROMPT === '1') {
|
|
432
|
+
// Dump per-section sizes once per turn to help size the context budget.
|
|
433
|
+
// Kept behind an env flag so production turns stay quiet.
|
|
434
|
+
const sectionSizes = parts.map((block, idx) => {
|
|
435
|
+
const firstLine = block.split('\n', 1)[0]
|
|
436
|
+
return ` [${idx}] ${firstLine.slice(0, 60)} — ${block.length} chars`
|
|
437
|
+
}).join('\n')
|
|
438
|
+
log.info('prompt-profile', `System prompt assembled (${assembled.length} chars, ${parts.length} blocks, ${enabledExtensions.length} extensions):\n${sectionSizes}`)
|
|
439
|
+
}
|
|
440
|
+
return assembled
|
|
422
441
|
}
|
|
423
442
|
|
|
424
443
|
function resolveApiKeyForSession(session: SessionWithCredentials, provider: ProviderApiKeyConfig): string | null {
|
|
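The `SWARMCLAW_PROFILE_PROMPT` branch in this hunk reports one size per prompt block, keyed by the block's first line. A minimal sketch of the same measurement as a pure function follows; `measurePromptSections` and `SectionSize` are hypothetical names used for illustration, and the result is returned as data rather than logged.

```typescript
interface SectionSize {
  index: number
  heading: string
  chars: number
}

// Measures each prompt block: its position, its first line (truncated), and its length.
function measurePromptSections(parts: string[], maxHeading = 60): SectionSize[] {
  return parts.map((block, index) => ({
    index,
    heading: (block.split('\n', 1)[0] ?? '').slice(0, maxHeading),
    chars: block.length,
  }))
}
```

For example, `measurePromptSections(['## My Identity\nName: Ada', 'Rules'])` reports 24 characters for the first block and 5 for the second.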
@@ -536,8 +555,16 @@ export async function prepareChatTurn(input: ExecuteChatTurnInput): Promise<Prep
|
|
|
536
555
|
const runtimeCapabilityIds = filterRuntimeCapabilityIds(getEnabledCapabilityIds(session), {
|
|
537
556
|
delegationEnabled: agentForSession?.delegationEnabled === true,
|
|
538
557
|
})
|
|
558
|
+
// Match the resolver in buildAgentSystemPrompt: default to scoped whenever
|
|
559
|
+
// the agent declares a non-empty tools list, unless explicitly set to
|
|
560
|
+
// 'universal'. Agents with no declared tools stay universal.
|
|
561
|
+
const scopedAccess = agentForSession?.toolAccessMode !== 'universal'
|
|
562
|
+
&& Array.isArray(agentForSession?.tools)
|
|
563
|
+
&& (agentForSession!.tools!.length > 0)
|
|
539
564
|
const requestedCapabilityIds = runtimeCapabilityIds.length > 0
|
|
540
|
-
?
|
|
565
|
+
? (scopedAccess
|
|
566
|
+
? listScopedToolAccessExtensionIds(agentForSession!.tools!, runtimeCapabilityIds)
|
|
567
|
+
: listUniversalToolAccessExtensionIds(runtimeCapabilityIds))
|
|
541
568
|
: []
|
|
542
569
|
const toolPolicy = resolveSessionToolPolicy(requestedCapabilityIds, appSettings)
|
|
543
570
|
const isHeartbeatRun = input.internal === true && source === 'heartbeat'
|
|
@@ -0,0 +1,39 @@
|
|
|
1
|
+
import { describe, it } from 'node:test'
|
|
2
|
+
import assert from 'node:assert/strict'
|
|
3
|
+
import { ChatTurnState } from '@/lib/server/chat-execution/chat-turn-state'
|
|
4
|
+
|
|
5
|
+
describe('ChatTurnState snapshot/restore', () => {
|
|
6
|
+
it('round-trips lastToolSummaryTextLen so a rollback does not skip a legitimate tool_summary retry', () => {
|
|
7
|
+
// Regression: before this field was included in the snapshot, a transient
|
|
8
|
+
// error + restore() would leave lastToolSummaryTextLen ahead of fullText.
|
|
9
|
+
// The no-progress guard in checkToolSummary then saw a negative delta and
|
|
10
|
+
// silently skipped the retry — the model lost its chance to summarize.
|
|
11
|
+
const state = new ChatTurnState()
|
|
12
|
+
state.fullText = 'initial prompt text'
|
|
13
|
+
state.hasToolCalls = true
|
|
14
|
+
const snap = state.snapshot()
|
|
15
|
+
assert.equal(snap.lastToolSummaryTextLen, -1)
|
|
16
|
+
|
|
17
|
+
// Simulate a tool_summary retry that advanced the guard counter, then a
|
|
18
|
+
// rollback to the pre-iteration snapshot.
|
|
19
|
+
state.lastToolSummaryTextLen = state.fullText.length
|
|
20
|
+
state.fullText += ' partial speculative output that gets thrown away'
|
|
21
|
+
state.restore(snap)
|
|
22
|
+
|
|
23
|
+
assert.equal(state.lastToolSummaryTextLen, -1)
|
|
24
|
+
assert.equal(state.fullText, 'initial prompt text')
|
|
25
|
+
})
|
|
26
|
+
|
|
27
|
+
it('preserves an already-advanced lastToolSummaryTextLen across a non-rollback snapshot cycle', () => {
|
|
28
|
+
const state = new ChatTurnState()
|
|
29
|
+
state.fullText = 'Here is the answer.'
|
|
30
|
+
state.lastToolSummaryTextLen = state.fullText.length
|
|
31
|
+
const snap = state.snapshot()
|
|
32
|
+
|
|
33
|
+
state.fullText += ' Extended thought.'
|
|
34
|
+
state.restore(snap)
|
|
35
|
+
|
|
36
|
+
assert.equal(state.lastToolSummaryTextLen, 'Here is the answer.'.length)
|
|
37
|
+
assert.equal(state.fullText, 'Here is the answer.')
|
|
38
|
+
})
|
|
39
|
+
})
|
|
@@ -19,6 +19,7 @@ export interface TurnStateSnapshot {
|
|
|
19
19
|
toolFrequencyBlocked: string | false
|
|
20
20
|
terminalToolBoundary: 'memory_write' | 'durable_wait' | 'context_compaction' | null
|
|
21
21
|
memoryWriteTerminalAllowed: boolean | null
|
|
22
|
+
lastToolSummaryTextLen: number
|
|
22
23
|
}
|
|
23
24
|
|
|
24
25
|
export class ChatTurnState {
|
|
@@ -39,6 +40,7 @@ export class ChatTurnState {
|
|
|
39
40
|
terminalToolBoundary: 'memory_write' | 'durable_wait' | 'context_compaction' | null = null
|
|
40
41
|
terminalToolResponse = ''
|
|
41
42
|
memoryWriteTerminalAllowed: boolean | null = null
|
|
43
|
+
lastToolSummaryTextLen = -1
|
|
42
44
|
|
|
43
45
|
snapshot(): TurnStateSnapshot {
|
|
44
46
|
return {
|
|
@@ -53,6 +55,7 @@ export class ChatTurnState {
|
|
|
53
55
|
toolFrequencyBlocked: this.toolFrequencyBlocked,
|
|
54
56
|
terminalToolBoundary: this.terminalToolBoundary,
|
|
55
57
|
memoryWriteTerminalAllowed: this.memoryWriteTerminalAllowed,
|
|
58
|
+
lastToolSummaryTextLen: this.lastToolSummaryTextLen,
|
|
56
59
|
}
|
|
57
60
|
}
|
|
58
61
|
|
|
@@ -68,6 +71,7 @@ export class ChatTurnState {
|
|
|
68
71
|
this.toolFrequencyBlocked = snap.toolFrequencyBlocked
|
|
69
72
|
this.terminalToolBoundary = snap.terminalToolBoundary
|
|
70
73
|
this.memoryWriteTerminalAllowed = snap.memoryWriteTerminalAllowed
|
|
74
|
+
this.lastToolSummaryTextLen = snap.lastToolSummaryTextLen
|
|
71
75
|
}
|
|
72
76
|
|
|
73
77
|
/**
|
|
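The regression these snapshot and restore changes address can be reproduced with a two-field model. This is a sketch under assumptions: `MiniTurnState` is a hypothetical reduction of `ChatTurnState`, and the `includeGuard` flag exists only to imitate the pre-fix snapshot that omitted the field.

```typescript
// Hypothetical reduced snapshot: only the two fields relevant to the regression.
interface MiniSnapshot {
  fullText: string
  lastToolSummaryTextLen?: number
}

class MiniTurnState {
  fullText = ''
  lastToolSummaryTextLen = -1

  // includeGuard = false imitates a snapshot that forgets the guard counter.
  snapshot(includeGuard: boolean): MiniSnapshot {
    return includeGuard
      ? { fullText: this.fullText, lastToolSummaryTextLen: this.lastToolSummaryTextLen }
      : { fullText: this.fullText }
  }

  restore(snap: MiniSnapshot): void {
    this.fullText = snap.fullText
    if (snap.lastToolSummaryTextLen !== undefined) {
      this.lastToolSummaryTextLen = snap.lastToolSummaryTextLen
    }
  }
}

// Runs one speculative iteration followed by a rollback and reports both lengths.
function lengthsAfterRollback(includeGuard: boolean): { textLen: number; guardLen: number } {
  const state = new MiniTurnState()
  state.fullText = 'short'
  const snap = state.snapshot(includeGuard)
  state.fullText += ' plus speculative output that is later discarded'
  state.lastToolSummaryTextLen = state.fullText.length
  state.restore(snap)
  return { textLen: state.fullText.length, guardLen: state.lastToolSummaryTextLen }
}
```

When the counter is left out of the snapshot, `guardLen` stays ahead of `textLen` after the rollback, which is the stale state the new tests describe. When it is included, the counter returns to its `-1` sentinel.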
@@ -28,6 +28,7 @@ import {
|
|
|
28
28
|
} from '@/lib/server/chat-execution/memory-mutation-tools'
|
|
29
29
|
import { shouldForceAttachmentFollowthrough } from '@/lib/server/chat-execution/prompt-builder'
|
|
30
30
|
import { shouldSkipToolSummaryForShortResponse } from '@/lib/server/chat-execution/chat-streaming-utils'
|
|
31
|
+
import { toolSummaryHasMeaningfulProgress } from '@/lib/server/chat-execution/tool-summary-progress'
|
|
31
32
|
import { logExecution, type LogCategory } from '@/lib/server/execution-log'
|
|
32
33
|
|
|
33
34
|
// ---------------------------------------------------------------------------
|
|
@@ -378,6 +379,15 @@ function checkToolSummary(ctx: ContinuationContext): ContinuationDecision | null
|
|
|
378
379
|
)
|
|
379
380
|
)
|
|
380
381
|
if (!textIsTrivial) return null
|
|
382
|
+
const currentLen = ctx.state.fullText.length
|
|
383
|
+
const priorLen = ctx.state.lastToolSummaryTextLen
|
|
384
|
+
if (!toolSummaryHasMeaningfulProgress(priorLen, currentLen)) {
|
|
385
|
+
logStatus(ctx, 'decision', `Tool summary retry skipped — no meaningful progress (delta=${currentLen - priorLen} chars)`, {
|
|
386
|
+
priorLen, currentLen, toolEventCount: ctx.state.streamedToolEvents.length,
|
|
387
|
+
})
|
|
388
|
+
return null
|
|
389
|
+
}
|
|
390
|
+
ctx.state.lastToolSummaryTextLen = currentLen
|
|
381
391
|
const count = ctx.limits.increment('tool_summary')
|
|
382
392
|
const summaryReason = !ctx.state.fullText.trim() ? 'empty_response_after_tools' : 'trivial_preamble_after_tools'
|
|
383
393
|
logStatus(ctx, 'decision', `Tools called but response text is trivial (${ctx.state.fullText.trim().length} chars) — forcing summary continuation`, {
|
|
@@ -33,4 +33,71 @@ describe('tool-event-tracker', () => {
|
|
|
33
33
|
assert.equal(tracker.complete('nested_1'), false)
|
|
34
34
|
assert.equal(tracker.pendingCount, 0)
|
|
35
35
|
})
|
|
36
|
+
|
|
37
|
+
it('suppresses duplicate parallel tool_calls with identical name+input in the same turn', () => {
|
|
38
|
+
const tracker = new LangGraphToolEventTracker()
|
|
39
|
+
const metadata = { langgraph_node: 'tools' }
|
|
40
|
+
const event = { name: 'files', data: { input: { action: 'write', path: '/tmp/x' } }, metadata }
|
|
41
|
+
|
|
42
|
+
// First acceptance emits; second identical call is swallowed.
|
|
43
|
+
assert.equal(tracker.acceptStart({ run_id: 'r1', ...event }), true)
|
|
44
|
+
assert.equal(tracker.acceptStart({ run_id: 'r2', ...event }), false)
|
|
45
|
+
assert.equal(tracker.pendingCount, 1)
|
|
46
|
+
|
|
47
|
+
// complete() returns false for the suppressed one so the caller skips its result too.
|
|
48
|
+
assert.equal(tracker.complete('r2'), false)
|
|
49
|
+
assert.equal(tracker.complete('r1'), true)
|
|
50
|
+
assert.equal(tracker.pendingCount, 0)
|
|
51
|
+
|
|
52
|
+
// After the first call fully completes, a legitimately later identical call is accepted.
|
|
53
|
+
assert.equal(tracker.acceptStart({ run_id: 'r3', ...event }), true)
|
|
54
|
+
assert.equal(tracker.complete('r3'), true)
|
|
55
|
+
})
|
|
56
|
+
|
|
57
|
+
it('does not leak the accepted run if the same run_id re-enters acceptStart', () => {
|
|
58
|
+
// Guards against replayed start events (e.g., HMR, graph retries) causing
|
|
59
|
+
// the same run_id to be recorded both as accepted and suppressed, which
|
|
60
|
+
// would leave pendingCount > 0 after complete().
|
|
61
|
+
const tracker = new LangGraphToolEventTracker()
|
|
62
|
+
const metadata = { langgraph_node: 'tools' }
|
|
63
|
+
const event = { name: 'shell', data: { input: { cmd: 'ls' } }, metadata }
|
|
64
|
+
|
|
65
|
+
assert.equal(tracker.acceptStart({ run_id: 'same', ...event }), true)
|
|
66
|
+
assert.equal(tracker.acceptStart({ run_id: 'same', ...event }), false)
|
|
67
|
+
assert.equal(tracker.pendingCount, 1)
|
|
68
|
+
assert.equal(tracker.complete('same'), true)
|
|
69
|
+
assert.equal(tracker.pendingCount, 0)
|
|
70
|
+
})
|
|
71
|
+
|
|
72
|
+
it('handles triple-duplicate (2 suppressed) parallel tool_calls cleanly', () => {
|
|
73
|
+
const tracker = new LangGraphToolEventTracker()
|
|
74
|
+
const metadata = { langgraph_node: 'tools' }
|
|
75
|
+
const event = { name: 'files', data: { input: { action: 'read', path: '/a' } }, metadata }
|
|
76
|
+
|
|
77
|
+
assert.equal(tracker.acceptStart({ run_id: 'r1', ...event }), true)
|
|
78
|
+
assert.equal(tracker.acceptStart({ run_id: 'r2', ...event }), false)
|
|
79
|
+
assert.equal(tracker.acceptStart({ run_id: 'r3', ...event }), false)
|
|
80
|
+
assert.equal(tracker.pendingCount, 1)
|
|
81
|
+
|
|
82
|
+
// Out-of-order completions still settle correctly.
|
|
83
|
+
assert.equal(tracker.complete('r3'), false)
|
|
84
|
+
assert.equal(tracker.complete('r1'), true)
|
|
85
|
+
assert.equal(tracker.complete('r2'), false)
|
|
86
|
+
assert.equal(tracker.pendingCount, 0)
|
|
87
|
+
})
|
|
88
|
+
|
|
89
|
+
it('distinct inputs produce distinct signatures and both are accepted', () => {
|
|
90
|
+
const tracker = new LangGraphToolEventTracker()
|
|
91
|
+
const metadata = { langgraph_node: 'tools' }
|
|
92
|
+
|
|
93
|
+
assert.equal(tracker.acceptStart({
|
|
94
|
+
run_id: 'a', name: 'files', data: { input: { path: '/a' } }, metadata,
|
|
95
|
+
}), true)
|
|
96
|
+
assert.equal(tracker.acceptStart({
|
|
97
|
+
run_id: 'b', name: 'files', data: { input: { path: '/b' } }, metadata,
|
|
98
|
+
}), true)
|
|
99
|
+
assert.equal(tracker.pendingCount, 2)
|
|
100
|
+
assert.equal(tracker.complete('a'), true)
|
|
101
|
+
assert.equal(tracker.complete('b'), true)
|
|
102
|
+
})
|
|
36
103
|
})
|
|
@@ -1,5 +1,7 @@
|
|
|
1
1
|
export interface StreamToolEventLike {
|
|
2
2
|
run_id: string
|
|
3
|
+
name?: string
|
|
4
|
+
data?: { input?: unknown }
|
|
3
5
|
metadata?: Record<string, unknown>
|
|
4
6
|
}
|
|
5
7
|
|
|
@@ -9,18 +11,54 @@ export function isLangGraphToolNodeMetadata(metadata: Record<string, unknown> |
|
|
|
9
11
|
|| typeof metadata.__pregel_task_id === 'string'
|
|
10
12
|
}
|
|
11
13
|
|
|
14
|
+
function toolCallSignature(event: StreamToolEventLike): string {
|
|
15
|
+
const name = event.name || ''
|
|
16
|
+
// Only dedup when we have enough to form a meaningful signature — name
|
|
17
|
+
// is required. Otherwise callers (and tests) that track distinct run ids
|
|
18
|
+
// with no name/input must continue to work as before.
|
|
19
|
+
if (!name) return ''
|
|
20
|
+
const input = event.data?.input
|
|
21
|
+
const inputStr = typeof input === 'string' ? input : JSON.stringify(input ?? '')
|
|
22
|
+
return `${name}:${inputStr}`
|
|
23
|
+
}
|
|
24
|
+
|
|
12
25
|
export class LangGraphToolEventTracker {
|
|
13
26
|
private readonly acceptedRunIds = new Set<string>()
|
|
27
|
+
private readonly suppressedRunIds = new Set<string>()
|
|
28
|
+
// Active signatures -> first accepted run_id. Used to suppress duplicate
|
|
29
|
+
// parallel tool_calls emitted by some open-source models (e.g. Ollama's
|
|
30
|
+
// devstral emits identical tool_calls twice per turn).
|
|
31
|
+
private readonly activeSignatures = new Map<string, string>()
|
|
14
32
|
|
|
15
33
|
acceptStart(event: StreamToolEventLike): boolean {
|
|
16
34
|
if (!isLangGraphToolNodeMetadata(event.metadata)) return false
|
|
35
|
+
const signature = toolCallSignature(event)
|
|
36
|
+
if (signature && this.activeSignatures.has(signature)) {
|
|
37
|
+
const firstAcceptedId = this.activeSignatures.get(signature)
|
|
38
|
+
// If the incoming run_id matches the already-accepted one, this is a
|
|
39
|
+
// duplicate start event for the same run — treat as a no-op accept
|
|
40
|
+
// (do not suppress, since we must still acknowledge its completion).
|
|
41
|
+
if (firstAcceptedId === event.run_id) return false
|
|
42
|
+
this.suppressedRunIds.add(event.run_id)
|
|
43
|
+
return false
|
|
44
|
+
}
|
|
45
|
+
if (signature) this.activeSignatures.set(signature, event.run_id)
|
|
17
46
|
this.acceptedRunIds.add(event.run_id)
|
|
18
47
|
return true
|
|
19
48
|
}
|
|
20
49
|
|
|
21
50
|
complete(runId: string): boolean {
|
|
51
|
+
if (this.suppressedRunIds.has(runId)) {
|
|
52
|
+
this.suppressedRunIds.delete(runId)
|
|
53
|
+
return false
|
|
54
|
+
}
|
|
22
55
|
if (!this.acceptedRunIds.has(runId)) return false
|
|
23
56
|
this.acceptedRunIds.delete(runId)
|
|
57
|
+
// Clear matching signature so a legitimately-new call with the same args
|
|
58
|
+
// later in the turn is not mistaken for a duplicate.
|
|
59
|
+
for (const [sig, id] of this.activeSignatures) {
|
|
60
|
+
if (id === runId) { this.activeSignatures.delete(sig); break }
|
|
61
|
+
}
|
|
24
62
|
return true
|
|
25
63
|
}
|
|
26
64
|
|
|
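One property of the signature above is worth noting: `JSON.stringify` serializes object keys in insertion order, so two calls whose inputs carry the same fields in a different order produce different signatures and are not treated as duplicates. If that ever matters, a key-sorted serialization is one option. The sketch below is an illustration only; `stableStringify` and `stableSignature` are hypothetical helpers, not part of the package.

```typescript
// Serializes a JSON-like value with object keys sorted, so key order does not
// affect the result. Unlike JSON.stringify, undefined is rendered as "null".
function stableStringify(value: unknown): string {
  if (value === undefined) return 'null'
  if (Array.isArray(value)) return `[${value.map(stableStringify).join(',')}]`
  if (value !== null && typeof value === 'object') {
    const obj = value as Record<string, unknown>
    const body = Object.keys(obj)
      .sort()
      .map((key) => `${JSON.stringify(key)}:${stableStringify(obj[key])}`)
      .join(',')
    return `{${body}}`
  }
  return JSON.stringify(value)
}

// Same structure as the signature in the diff, with the serializer swapped.
function stableSignature(name: string, input: unknown): string {
  if (!name) return ''
  const inputStr = typeof input === 'string' ? input : stableStringify(input ?? '')
  return `${name}:${inputStr}`
}
```

Whether order-insensitive matching is desirable depends on how the emitting models build their arguments. The diff only mentions byte-identical duplicate calls, for which plain `JSON.stringify` is sufficient.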
@@ -0,0 +1,42 @@
|
|
|
1
|
+
import { describe, it } from 'node:test'
|
|
2
|
+
import assert from 'node:assert/strict'
|
|
3
|
+
import {
|
|
4
|
+
TOOL_SUMMARY_PROGRESS_MIN_DELTA,
|
|
5
|
+
toolSummaryHasMeaningfulProgress,
|
|
6
|
+
} from '@/lib/server/chat-execution/tool-summary-progress'
|
|
7
|
+
|
|
8
|
+
describe('toolSummaryHasMeaningfulProgress (no-progress guard for tool_summary retries)', () => {
|
|
9
|
+
it('allows the first retry even when fullText is empty (sentinel priorLen = -1)', () => {
|
|
10
|
+
assert.equal(toolSummaryHasMeaningfulProgress(-1, 0), true)
|
|
11
|
+
})
|
|
12
|
+
|
|
13
|
+
it('allows the first retry even when some text already exists (sentinel priorLen = -1)', () => {
|
|
14
|
+
assert.equal(toolSummaryHasMeaningfulProgress(-1, 9999), true)
|
|
15
|
+
})
|
|
16
|
+
|
|
17
|
+
it('skips subsequent retries when the delta is below the minimum', () => {
|
|
18
|
+
assert.equal(toolSummaryHasMeaningfulProgress(100, 105), false)
|
|
19
|
+
assert.equal(toolSummaryHasMeaningfulProgress(100, 100), false)
|
|
20
|
+
// The boundary: exactly MIN_DELTA - 1 still skips.
|
|
21
|
+
assert.equal(
|
|
22
|
+
toolSummaryHasMeaningfulProgress(100, 100 + TOOL_SUMMARY_PROGRESS_MIN_DELTA - 1),
|
|
23
|
+
false,
|
|
24
|
+
)
|
|
25
|
+
})
|
|
26
|
+
|
|
27
|
+
it('allows a retry when the delta meets or exceeds the minimum', () => {
|
|
28
|
+
assert.equal(
|
|
29
|
+
toolSummaryHasMeaningfulProgress(100, 100 + TOOL_SUMMARY_PROGRESS_MIN_DELTA),
|
|
30
|
+
true,
|
|
31
|
+
)
|
|
32
|
+
assert.equal(toolSummaryHasMeaningfulProgress(100, 1000), true)
|
|
33
|
+
})
|
|
34
|
+
|
|
35
|
+
it('skips retry when fullText SHRANK after a restore — protects against stale priorLen post-rollback', () => {
|
|
36
|
+
// This scenario was the real reason we also added lastToolSummaryTextLen to
|
|
37
|
+
// ChatTurnState snapshot/restore. Here we just verify the math: if priorLen
|
|
38
|
+
// is ahead of currentLen (e.g., rollback happened without syncing the
|
|
39
|
+
// counter), the delta is negative, so the guard correctly skips retry.
|
|
40
|
+
assert.equal(toolSummaryHasMeaningfulProgress(500, 200), false)
|
|
41
|
+
})
|
|
42
|
+
})
|
|
@@ -0,0 +1,13 @@
|
|
|
1
|
+
/** Minimum new-character delta required between tool_summary retries. */
|
|
2
|
+
export const TOOL_SUMMARY_PROGRESS_MIN_DELTA = 30
|
|
3
|
+
|
|
4
|
+
/**
|
|
5
|
+
* Returns false when a prior tool_summary retry already ran and the model
|
|
6
|
+
* has produced essentially no additional text on the follow-up turn — the
|
|
7
|
+
* signal to stop retrying. On the first pass (priorLen < 0) this is always
|
|
8
|
+
* true so the retry can happen at least once.
|
|
9
|
+
*/
|
|
10
|
+
export function toolSummaryHasMeaningfulProgress(priorLen: number, currentLen: number): boolean {
|
|
11
|
+
if (priorLen < 0) return true
|
|
12
|
+
return currentLen - priorLen >= TOOL_SUMMARY_PROGRESS_MIN_DELTA
|
|
13
|
+
}
|
|
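To see how the guard behaves across several continuation checks, it helps to replay a sequence of `fullText` lengths. This is a minimal sketch: the guard is copied from the hunk above, while `replayRetryDecisions` is a hypothetical driver that assumes, as in `checkToolSummary`, that the stored length only advances when a retry is actually issued.

```typescript
const MIN_DELTA = 30 // same value as TOOL_SUMMARY_PROGRESS_MIN_DELTA in the diff

function hasMeaningfulProgress(priorLen: number, currentLen: number): boolean {
  if (priorLen < 0) return true
  return currentLen - priorLen >= MIN_DELTA
}

// Replays one fullText length per continuation check and records whether a
// tool_summary retry would be issued at that point.
function replayRetryDecisions(lengths: number[]): boolean[] {
  let priorLen = -1 // sentinel: no retry has run yet
  return lengths.map((currentLen) => {
    const allowed = hasMeaningfulProgress(priorLen, currentLen)
    if (allowed) priorLen = currentLen // skipped checks leave the counter untouched
    return allowed
  })
}
```

For lengths `[0, 5, 40, 45]` the decisions are `[true, false, true, false]`: the first check always passes, 5 new characters are too few, 40 characters since the last issued retry pass, and the final 5 do not.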
@@ -51,6 +51,9 @@ function buildGatewayConfigFromAgent(agent: Agent, options?: { allowDisabled?: b
|
|
|
51
51
|
const route = resolvePrimaryAgentRoute(agent)
|
|
52
52
|
if (route?.provider !== 'openclaw') return null
|
|
53
53
|
|
|
54
|
+
const credential = resolveTokenForCredential(route.credentialId)
|
|
55
|
+
if (credential.dangling) return null
|
|
56
|
+
|
|
54
57
|
const routeProfile = route.gatewayProfileId ? getGatewayProfile(route.gatewayProfileId) : null
|
|
55
58
|
return {
|
|
56
59
|
key: route.gatewayProfileId ? `profile:${route.gatewayProfileId}` : `agent:${agent.id}`,
|
|
@@ -60,7 +63,7 @@ function buildGatewayConfigFromAgent(agent: Agent, options?: { allowDisabled?: b
|
|
|
60
63
|
: route.apiEndpoint
|
|
61
64
|
? deriveOpenClawWsUrl(route.apiEndpoint)
|
|
62
65
|
: DEFAULT_OPENCLAW_WS_URL,
|
|
63
|
-
token:
|
|
66
|
+
token: credential.token,
|
|
64
67
|
}
|
|
65
68
|
}
|
|
66
69
|
|
|
@@ -80,15 +83,28 @@ function normalizeWsUrl(raw: string): string {
|
|
|
80
83
|
return url.replace(/^http:/i, 'ws:').replace(/^https:/i, 'wss:')
|
|
81
84
|
}
|
|
82
85
|
|
|
83
|
-
|
|
84
|
-
|
|
86
|
+
/**
|
|
87
|
+
* Resolves an OpenClaw gateway credential.
|
|
88
|
+
*
|
|
89
|
+
* - `credentialId` null/undefined: the caller wants an unauthenticated
|
|
90
|
+
* connection. Returns `{ token: undefined, dangling: false }`.
|
|
91
|
+
* - `credentialId` set and resolvable: returns `{ token, dangling: false }`.
|
|
92
|
+
* - `credentialId` set but missing/deleted: returns `{ token: undefined,
|
|
93
|
+
* dangling: true }`. Callers should treat this as a hard configuration
|
|
94
|
+
* error and refuse to dial — otherwise the WS handshake fails as
|
|
95
|
+
* `unauthorized: gateway token missing` and the agent-side timeout waits
|
|
96
|
+
* the full 120 s before surfacing the error to the user.
|
|
97
|
+
*/
|
|
98
|
+
function resolveTokenForCredential(credentialId?: string | null): { token: string | undefined; dangling: boolean } {
|
|
99
|
+
if (!credentialId) return { token: undefined, dangling: false }
|
|
85
100
|
const secret = resolveCredentialSecret(credentialId)
|
|
86
101
|
if (!secret) {
|
|
87
|
-
log.
|
|
102
|
+
log.error(TAG, `Credential "${credentialId}" is referenced but missing from the credential store — refusing gateway connection`, {
|
|
88
103
|
credentialId,
|
|
89
104
|
})
|
|
105
|
+
return { token: undefined, dangling: true }
|
|
90
106
|
}
|
|
91
|
-
return secret
|
|
107
|
+
return { token: secret, dangling: false }
|
|
92
108
|
}
|
|
93
109
|
|
|
94
110
|
export function resolveGatewayConfig(target?: {
|
|
@@ -99,11 +115,13 @@ export function resolveGatewayConfig(target?: {
|
|
|
99
115
|
if (profileId) {
|
|
100
116
|
const profile = getGatewayProfile(profileId)
|
|
101
117
|
if (!profile) return null
|
|
118
|
+
const credential = resolveTokenForCredential(profile.credentialId)
|
|
119
|
+
if (credential.dangling) return null
|
|
102
120
|
return {
|
|
103
121
|
key: `profile:${profile.id}`,
|
|
104
122
|
profileId: profile.id,
|
|
105
123
|
wsUrl: profile.wsUrl ? normalizeWsUrl(profile.wsUrl) : deriveOpenClawWsUrl(profile.endpoint),
|
|
106
|
-
token:
|
|
124
|
+
token: credential.token,
|
|
107
125
|
}
|
|
108
126
|
}
|
|
109
127
|
|
|
@@ -123,11 +141,13 @@ export function resolveGatewayConfig(target?: {
|
|
|
123
141
|
const gatewayProfiles = getGatewayProfiles('openclaw')
|
|
124
142
|
if (gatewayProfiles[0]) {
|
|
125
143
|
const profile = gatewayProfiles[0]
|
|
144
|
+
const credential = resolveTokenForCredential(profile.credentialId)
|
|
145
|
+
if (credential.dangling) return null
|
|
126
146
|
return {
|
|
127
147
|
key: `profile:${profile.id}`,
|
|
128
148
|
profileId: profile.id,
|
|
129
149
|
wsUrl: profile.wsUrl ? normalizeWsUrl(profile.wsUrl) : deriveOpenClawWsUrl(profile.endpoint),
|
|
130
|
-
token:
|
|
150
|
+
token: credential.token,
|
|
131
151
|
}
|
|
132
152
|
}
|
|
133
153
|
return null
|
|
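The three outcomes documented for `resolveTokenForCredential` map onto a simple dial decision. The sketch below assumes an in-memory `Map` in place of the real credential store; `resolveToken`, `decideDial`, and `DialDecision` are hypothetical names for illustration.

```typescript
interface ResolvedCredential {
  token: string | undefined
  dangling: boolean
}

// Hypothetical stand-in: the diff looks secrets up via resolveCredentialSecret().
function resolveToken(store: Map<string, string>, credentialId?: string | null): ResolvedCredential {
  if (!credentialId) return { token: undefined, dangling: false } // unauthenticated on purpose
  const secret = store.get(credentialId)
  if (!secret) return { token: undefined, dangling: true } // referenced but missing
  return { token: secret, dangling: false }
}

type DialDecision =
  | { dial: true; token: string | undefined }
  | { dial: false; reason: string }

// Refuses to dial on a dangling reference instead of attempting an
// unauthenticated handshake that the gateway would reject.
function decideDial(store: Map<string, string>, credentialId?: string | null): DialDecision {
  const credential = resolveToken(store, credentialId)
  if (credential.dangling) {
    return { dial: false, reason: `credential "${credentialId}" is missing` }
  }
  return { dial: true, token: credential.token }
}
```

Keeping "no credential configured" distinct from "credential configured but missing" is what lets the caller fail fast in the second case only.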
@@ -34,7 +34,11 @@ const perfState = hmrSingleton('__swarmclaw_perf__', () => ({
|
|
|
34
34
|
recentEntries: [] as PerfEntry[],
|
|
35
35
|
}))
|
|
36
36
|
|
|
37
|
-
|
|
37
|
+
// Keep a generous ring buffer so perf entries from a chat turn survive the
|
|
38
|
+
// flurry of repository/queue events that fire between them. 200 was too small
|
|
39
|
+
// — queue.get/tasks.list fire ~20/s during task processing and would evict
|
|
40
|
+
// chat-execution/prompt entries before they could be read.
|
|
41
|
+
const MAX_RECENT = 2000
|
|
38
42
|
|
|
39
43
|
function emitEntry(entry: PerfEntry): void {
|
|
40
44
|
perfState.recentEntries.push(entry)
|
|
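The comment's sizing argument is simple arithmetic: at roughly 20 events per second, a 200-entry buffer holds about 10 seconds of history, while 2000 entries hold about 100 seconds. The sketch below makes that explicit. Both helpers are hypothetical, and the trimming strategy in `pushBounded` is an assumption, since the diff does not show how `emitEntry` evicts old entries.

```typescript
// Approximate seconds of history a buffer retains at a steady event rate.
function retentionSeconds(capacity: number, eventsPerSecond: number): number {
  return capacity / eventsPerSecond
}

// Appends an entry and drops the oldest ones once the buffer exceeds max.
function pushBounded<T>(buffer: T[], entry: T, max: number): void {
  buffer.push(entry)
  if (buffer.length > max) buffer.splice(0, buffer.length - max)
}
```

With these numbers a chat-turn entry has to be read within about 100 seconds of being recorded, instead of about 10, before queue and repository events push it out.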
@@ -700,6 +700,7 @@ export function reconcileFinishedRunningTasks(): { reconciled: number; deadLette
|
|
|
700
700
|
if (!fallbackText && !task.result) {
|
|
701
701
|
task.status = 'failed'
|
|
702
702
|
task.result = 'Agent session finished without producing output.'
|
|
703
|
+
task.checkoutRunId = null
|
|
703
704
|
task.updatedAt = now
|
|
704
705
|
tasksDirty = true
|
|
705
706
|
continue
|
|
@@ -964,6 +965,7 @@ function scheduleRetryOrDeadLetter(task: BoardTask, reason: string): 'retry' | '
|
|
|
964
965
|
if (isCancelledTask(task)) {
|
|
965
966
|
task.retryScheduledAt = null
|
|
966
967
|
task.deadLetteredAt = null
|
|
968
|
+
task.checkoutRunId = null
|
|
967
969
|
task.updatedAt = Date.now()
|
|
968
970
|
return 'dead_lettered'
|
|
969
971
|
}
|
|
@@ -975,6 +977,10 @@ function scheduleRetryOrDeadLetter(task: BoardTask, reason: string): 'retry' | '
|
|
|
975
977
|
const delayMs = jitteredBackoff((task.retryBackoffSec || 30) * 1000, Math.max(0, (task.attempts || 1) - 1), 6 * 3600_000)
|
|
976
978
|
task.status = 'queued'
|
|
977
979
|
task.retryScheduledAt = now + delayMs
|
|
980
|
+
// Release the prior checkout so the task can be checked out again on retry.
|
|
981
|
+
// Without this, checkoutTask() returns null every attempt and the orphan-
|
|
982
|
+
// recovery loop burns CPU re-queueing a task that can never run.
|
|
983
|
+
task.checkoutRunId = null
|
|
978
984
|
task.updatedAt = now
|
|
979
985
|
task.error = `Retry scheduled after failure: ${reason}`.slice(0, 500)
|
|
980
986
|
if (!task.comments) task.comments = []
|
|
@@ -990,6 +996,7 @@ function scheduleRetryOrDeadLetter(task: BoardTask, reason: string): 'retry' | '
|
|
|
990
996
|
task.status = 'failed'
|
|
991
997
|
task.deadLetteredAt = now
|
|
992
998
|
task.retryScheduledAt = null
|
|
999
|
+
task.checkoutRunId = null
|
|
993
1000
|
task.updatedAt = now
|
|
994
1001
|
task.error = `Dead-lettered after ${task.attempts}/${task.maxAttempts} attempts: ${reason}`.slice(0, 500)
|
|
995
1002
|
if (!task.comments) task.comments = []
|
|
@@ -1099,13 +1106,23 @@ export async function processNext() {
|
|
|
1099
1106
|
const currentQueue = loadQueue()
|
|
1100
1107
|
const queueSet = new Set(currentQueue)
|
|
1101
1108
|
let recovered = false
|
|
1109
|
+
let tasksDirty = false
|
|
1102
1110
|
for (const [id, t] of Object.entries(allTasks) as [string, BoardTask][]) {
|
|
1103
1111
|
if (t.status === 'queued' && !queueSet.has(id)) {
|
|
1104
1112
|
log.info(TAG, `[queue] Recovering orphaned queued task: "${t.title}" (${id})`)
|
|
1113
|
+
// Defence in depth: a queued task must not carry a stale checkoutRunId
|
|
1114
|
+
// (left over from pre-1.5.38 retries). If it does, checkoutTask() will
|
|
1115
|
+
// reject every attempt and this orphan-recovery loop will spin at 100%
|
|
1116
|
+
// CPU re-queueing a task that can never run.
|
|
1117
|
+
if (t.checkoutRunId) {
|
|
1118
|
+
t.checkoutRunId = null
|
|
1119
|
+
tasksDirty = true
|
|
1120
|
+
}
|
|
1105
1121
|
pushQueueUnique(currentQueue, id)
|
|
1106
1122
|
recovered = true
|
|
1107
1123
|
}
|
|
1108
1124
|
}
|
|
1125
|
+
if (tasksDirty) saveTasks(allTasks)
|
|
1109
1126
|
if (recovered) saveQueue(currentQueue)
|
|
1110
1127
|
}
|
|
1111
1128
|
|
|
@@ -1146,6 +1163,7 @@ export async function processNext() {
|
|
|
1146
1163
|
if (!agent) {
|
|
1147
1164
|
task.status = 'failed'
|
|
1148
1165
|
task.deadLetteredAt = Date.now()
|
|
1166
|
+
task.checkoutRunId = null
|
|
1149
1167
|
task.error = `Agent ${task.agentId} not found`
|
|
1150
1168
|
task.updatedAt = Date.now()
|
|
1151
1169
|
saveTasks(latestTasks)
|
|
@@ -1176,6 +1194,7 @@ export async function processNext() {
|
|
|
1176
1194
|
} else {
|
|
1177
1195
|
task.status = 'failed'
|
|
1178
1196
|
task.deadLetteredAt = Date.now()
|
|
1197
|
+
task.checkoutRunId = null
|
|
1179
1198
|
task.error = `No agent matches required capabilities: [${reqCaps.join(', ')}]`
|
|
1180
1199
|
task.updatedAt = Date.now()
|
|
1181
1200
|
saveTasks(latestTasks)
|
|
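The `checkoutRunId` resets in this file all protect one invariant: a task that is not running must not hold a checkout. A minimal sketch of why a stale id blocks retries follows. `MiniTask`, `tryCheckout`, and `requeueForRetry` are hypothetical simplifications; the real `checkoutTask()` is not shown in this part of the diff, and its exact rejection rule is assumed here.

```typescript
interface MiniTask {
  id: string
  status: 'queued' | 'running' | 'failed'
  checkoutRunId?: string | null
}

// Assumed behavior: a task that already holds a checkout cannot be checked out again.
function tryCheckout(task: MiniTask, runId: string): MiniTask | null {
  if (task.checkoutRunId) return null
  task.checkoutRunId = runId
  task.status = 'running'
  return task
}

// releaseCheckout = false imitates the pre-fix retry path that kept the old id.
function requeueForRetry(task: MiniTask, releaseCheckout: boolean): void {
  task.status = 'queued'
  if (releaseCheckout) task.checkoutRunId = null
}
```

Under this assumption, requeueing without releasing leaves the task queued but impossible to check out, which is the loop the comments in this hunk describe. Releasing the id on every exit from `running` removes that state.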
@@ -248,6 +248,182 @@ describe('queue recovery', () => {
|
|
|
248
248
|
assert.equal(output.scheduledCalls, 1)
|
|
249
249
|
})
|
|
250
250
|
|
|
251
|
+
it('scheduleRetryOrDeadLetter via stall recovery clears checkoutRunId so the next attempt can check out', () => {
|
|
252
|
+
// Regression: a task transitioning running -> queued on retry must release its
|
|
253
|
+
// prior checkout. Without this, checkoutTask() returns null on every attempt
|
|
254
|
+
// and the orphan-recovery loop burns CPU re-queueing a task that can never run.
|
|
255
|
+
const output = runWithTempDataDir<{
|
|
256
|
+
status: string | null
|
|
257
|
+
checkoutRunId: string | null
|
|
258
|
+
queued: string[]
|
|
259
|
+
attempts: number | null
|
|
260
|
+
}>(`
|
|
261
|
+
const storageMod = await import('@/lib/server/storage')
|
|
262
|
+
const queueMod = await import('@/lib/server/runtime/queue')
|
|
263
|
+
const storage = storageMod.default || storageMod
|
|
264
|
+
const queue = queueMod.default || queueMod
|
|
265
|
+
|
|
266
|
+
const now = Date.now()
|
|
267
|
+
storage.saveSettings({
|
|
268
|
+
...storage.loadSettings(),
|
|
269
|
+
taskStallTimeoutMin: 5,
|
|
270
|
+
taskRetryBackoffSec: 30,
|
|
271
|
+
})
|
|
272
|
+
storage.saveTasks({
|
|
273
|
+
stuck: {
|
|
274
|
+
id: 'stuck',
|
|
275
|
+
title: 'Stuck with stale checkout',
|
|
276
|
+
description: 'Running task that stalled and must release its checkout on retry',
|
|
277
|
+
status: 'running',
|
|
278
|
+
agentId: 'agent-a',
|
|
279
|
+
startedAt: now - 600_000,
|
|
280
|
+
updatedAt: now - 600_000,
|
|
281
|
+
createdAt: now - 700_000,
|
|
282
|
+
maxAttempts: 3,
|
|
283
|
+
attempts: 0,
|
|
284
|
+
checkoutRunId: 'stale-run-id',
|
|
285
|
+
},
|
|
286
|
+
})
|
|
287
|
+
storage.saveQueue([])
|
|
288
|
+
|
|
289
|
+
const originalSetTimeout = globalThis.setTimeout
|
|
290
|
+
globalThis.setTimeout = () => 0
|
|
291
|
+
try {
|
|
292
|
+
queue.recoverStalledRunningTasks()
|
|
293
|
+
} finally {
|
|
294
|
+
globalThis.setTimeout = originalSetTimeout
|
|
295
|
+
}
|
|
296
|
+
|
|
297
|
+
const task = storage.loadTasks().stuck
|
|
298
|
+
console.log(JSON.stringify({
|
|
299
|
+
status: task?.status ?? null,
|
|
300
|
+
checkoutRunId: task?.checkoutRunId ?? null,
|
|
301
|
+
queued: storage.loadQueue(),
|
|
302
|
+
attempts: task?.attempts ?? null,
|
|
303
|
+
}))
|
|
304
|
+
`)
|
|
305
|
+
|
|
306
|
+
assert.equal(output.status, 'queued', 'task should be requeued for retry')
|
|
307
|
+
assert.equal(output.checkoutRunId, null, 'stale checkoutRunId must be released so retry can check out')
|
|
308
|
+
assert.deepEqual(output.queued, ['stuck'])
|
|
309
|
+
assert.equal(output.attempts, 1)
|
|
310
|
+
})
|
|
311
|
+
|
|
312
|
+
it('processNext orphan recovery clears stale checkoutRunId on queued tasks', () => {
|
|
313
|
+
// Regression: tasks written before the 1.5.38 fix could land in storage with
|
|
314
|
+
// status='queued' + a set checkoutRunId (because the old scheduleRetryOrDeadLetter
|
|
315
|
+
// forgot to release the checkout). Orphan recovery must repair this invalid combo
|
|
316
|
+
// so the next checkoutTask() can succeed — otherwise the loop spins forever.
|
|
317
|
+
const output = runWithTempDataDir<{
|
|
318
|
+
status: string | null
|
|
319
|
+
checkoutRunId: string | null
|
|
320
|
+
queued: string[]
|
|
321
|
+
}>(`
|
|
322
|
+
const storageMod = await import('@/lib/server/storage')
|
|
323
|
+
const queueMod = await import('@/lib/server/runtime/queue')
|
|
324
|
+
const storage = storageMod.default || storageMod
|
|
325
|
+
const queue = queueMod.default || queueMod
|
|
326
|
+
|
|
327
|
+
const now = Date.now()
|
|
328
|
+
storage.saveAgents({
|
|
329
|
+
'agent-a': {
|
|
330
|
+
id: 'agent-a',
|
|
331
|
+
name: 'Agent A',
|
|
332
|
+
provider: 'openai',
|
|
333
|
+
model: 'gpt-test',
|
|
334
|
+
createdAt: now,
|
|
335
|
+
updatedAt: now,
|
|
336
|
+
},
|
|
337
|
+
})
|
|
338
|
+
storage.saveTasks({
|
|
339
|
+
stale: {
|
|
340
|
+
id: 'stale',
|
|
341
|
+
title: 'Pre-1.5.38 stuck task',
|
|
342
|
+
description: 'Queued but still holds a stale checkoutRunId from a prior failed run',
|
|
343
|
+
status: 'queued',
|
|
344
|
+
agentId: 'agent-a',
|
|
345
|
+
checkoutRunId: 'stale-run-id',
|
|
346
|
+
createdAt: now - 10_000,
|
|
347
|
+
updatedAt: now - 10_000,
|
|
348
|
+
},
|
|
349
|
+
})
|
|
350
|
+
// Intentionally NOT in the queue array — simulates the orphan condition.
|
|
351
|
+
storage.saveQueue([])
|
|
352
|
+
|
|
353
|
+
await queue.processNext()
|
|
354
|
+
|
|
355
|
+
const task = storage.loadTasks().stale
|
|
356
|
+
console.log(JSON.stringify({
|
|
357
|
+
status: task?.status ?? null,
|
|
358
|
+
checkoutRunId: task?.checkoutRunId ?? null,
|
|
359
|
+
queued: storage.loadQueue(),
|
|
360
|
+
}))
|
|
361
|
+
`)
|
|
362
|
+
|
|
363
|
+
// Orphan recovery should have put the task back in the queue AND cleared the stale id.
|
|
364
|
+
assert.equal(output.checkoutRunId, null, 'orphan recovery must clear stale checkoutRunId')
|
|
365
|
+
// After recovery the task may have stayed queued, moved to running, or failed.
|
|
366
|
+
// In every case it must not still be stuck in an orphan state.
|
|
367
|
+
assert.ok(
|
|
368
|
+
output.status === 'queued' || output.status === 'running' || output.status === 'failed',
|
|
369
|
+
`unexpected status after recovery: ${output.status}`,
|
|
370
|
+
)
|
|
371
|
+
})
|
|
372
|
+
|
|
373
|
+
it('dead-letter path clears checkoutRunId so terminal tasks do not appear checked-out', () => {
|
|
374
|
+
const output = runWithTempDataDir<{
|
|
375
|
+
status: string | null
|
|
376
|
+
checkoutRunId: string | null
|
|
377
|
+
attempts: number | null
|
|
378
|
+
}>(`
|
|
379
|
+
const storageMod = await import('@/lib/server/storage')
|
|
380
|
+
const queueMod = await import('@/lib/server/runtime/queue')
|
|
381
|
+
const storage = storageMod.default || storageMod
|
|
382
|
+
const queue = queueMod.default || queueMod
|
|
383
|
+
|
|
384
|
+
const now = Date.now()
|
|
385
|
+
storage.saveSettings({
|
|
386
|
+
...storage.loadSettings(),
|
|
387
|
+
taskStallTimeoutMin: 5,
|
|
388
|
+
})
|
|
389
|
+
storage.saveTasks({
|
|
390
|
+
doomed: {
|
|
391
|
+
id: 'doomed',
|
|
392
|
+
title: 'Exhausted retries',
|
|
393
|
+
description: 'Task at its last attempt that stalls should dead-letter and release checkout',
|
|
394
|
+
status: 'running',
|
|
395
|
+
agentId: 'agent-a',
|
|
396
|
+
startedAt: now - 600_000,
|
|
397
|
+
updatedAt: now - 600_000,
|
|
398
|
+
createdAt: now - 700_000,
|
|
399
|
+
maxAttempts: 2,
|
|
400
|
+
attempts: 1,
|
|
401
|
+
checkoutRunId: 'stale-run-id',
|
|
402
|
+
},
|
|
403
|
+
})
|
|
404
|
+
storage.saveQueue([])
|
|
405
|
+
|
|
406
|
+
const originalSetTimeout = globalThis.setTimeout
|
|
407
|
+
globalThis.setTimeout = () => 0
|
|
408
|
+
try {
|
|
409
|
+
queue.recoverStalledRunningTasks()
|
|
410
|
+
} finally {
|
|
411
|
+
globalThis.setTimeout = originalSetTimeout
|
|
412
|
+
}
|
|
413
|
+
|
|
414
|
+
const task = storage.loadTasks().doomed
|
|
415
|
+
console.log(JSON.stringify({
|
|
416
|
+
status: task?.status ?? null,
|
|
417
|
+
checkoutRunId: task?.checkoutRunId ?? null,
|
|
418
|
+
attempts: task?.attempts ?? null,
|
|
419
|
+
}))
|
|
420
|
+
`)
|
|
421
|
+
|
|
422
|
+
assert.equal(output.status, 'failed', 'task should be dead-lettered after exhausting retries')
|
|
423
|
+
assert.equal(output.checkoutRunId, null, 'dead-lettered tasks must not retain a stale checkoutRunId')
|
|
424
|
+
assert.equal(output.attempts, 2)
|
|
425
|
+
})
|
|
426
|
+
|
|
251
427
|
it('resumeQueue restores blocked queued tasks without clobbering their queuedAt timestamp', () => {
|
|
252
428
|
const output = runWithTempDataDir<{
|
|
253
429
|
queued: string[]
|
|
package/src/lib/server/skills/runtime-skill-resolver.ts
CHANGED
|
@@ -654,6 +654,16 @@ export function resolveRuntimeSkills(options: ResolveRuntimeSkillsOptions = {}):
|
|
|
654
654
|
}
|
|
655
655
|
}
|
|
656
656
|
|
|
657
|
+
// Dedicated sub-budget for auto-attached learned skills. buildSeedFromLearned
|
|
658
|
+
// marks every learned skill as `attached`, which means a single coordinator
|
|
659
|
+
// agent with 100+ historical learnings could flood the whole 30 k pinned-skill
|
|
660
|
+
// block every turn (observed: 178 learned skills / 176 k chars candidate pool
|
|
661
|
+
// → 24 k-char Pinned Skills section on every CEO turn). We cap learned-skill
|
|
662
|
+
// injection well below the full budget so explicitly-pinned/always-on skills
|
|
663
|
+
// still fit afterward.
|
|
664
|
+
const MAX_LEARNED_SKILLS_PROMPT_CHARS = 8000
|
|
665
|
+
const MAX_LEARNED_SKILLS_IN_PROMPT = 6
|
|
666
|
+
|
|
657
667
|
function selectPromptSkills(skills: ResolvedRuntimeSkill[]): ResolvedRuntimeSkill[] {
|
|
658
668
|
const ordered = [...skills]
|
|
659
669
|
.filter((skill) =>
|
|
@@ -670,16 +680,39 @@ function selectPromptSkills(skills: ResolvedRuntimeSkill[]): ResolvedRuntimeSkil
|
|
|
670
680
|
|
|
671
681
|
const selected: ResolvedRuntimeSkill[] = []
|
|
672
682
|
let totalChars = 0
|
|
683
|
+
let learnedChars = 0
|
|
684
|
+
let learnedCount = 0
|
|
673
685
|
for (const skill of ordered) {
|
|
674
686
|
if (selected.length >= MAX_SKILLS_IN_PROMPT) break
|
|
675
687
|
const contentLen = skill.name.length + skill.content.length + 12
|
|
676
688
|
if (totalChars + contentLen > MAX_SKILLS_PROMPT_CHARS) continue
|
|
689
|
+
const isLearned = skill.source === 'learned'
|
|
690
|
+
if (isLearned) {
|
|
691
|
+
if (learnedCount >= MAX_LEARNED_SKILLS_IN_PROMPT) continue
|
|
692
|
+
if (learnedChars + contentLen > MAX_LEARNED_SKILLS_PROMPT_CHARS) continue
|
|
693
|
+
learnedChars += contentLen
|
|
694
|
+
learnedCount += 1
|
|
695
|
+
}
|
|
677
696
|
totalChars += contentLen
|
|
678
697
|
selected.push(skill)
|
|
679
698
|
}
|
|
680
699
|
return selected
|
|
681
700
|
}
|
|
682
701
|
|
|
702
|
+
// Hard cap on how much skill content we inline per pinned skill. Long skill
|
|
703
|
+
// files (multi-page markdown guides) were dominating the system prompt — one
|
|
704
|
+
// coordinator agent had 24,402 chars (39% of its 62 k budget) from a single
|
|
705
|
+
// pinned skill. When content exceeds the cap we truncate and instruct the
|
|
706
|
+
// agent to pull the rest on demand via `use_skill` action="load".
|
|
707
|
+
const INLINED_SKILL_CHAR_CAP = 3000
|
|
708
|
+
|
|
709
|
+
function truncateInlinedSkillContent(content: string, skillName: string): string {
|
|
710
|
+
const trimmed = content.trim()
|
|
711
|
+
if (trimmed.length <= INLINED_SKILL_CHAR_CAP) return trimmed
|
|
712
|
+
const head = trimmed.slice(0, INLINED_SKILL_CHAR_CAP)
|
|
713
|
+
return `${head}\n\n[Skill content truncated at ${INLINED_SKILL_CHAR_CAP} chars to save context. Call \`use_skill\` with action="load" and skillId for "${skillName}" to load the full guide when you need it.]`
|
|
714
|
+
}
|
|
715
|
+
|
|
683
716
|
function sectionFromSkills(params: {
|
|
684
717
|
title: string
|
|
685
718
|
preface: string
|
|
@@ -688,7 +721,7 @@ function sectionFromSkills(params: {
|
|
|
688
721
|
const usable = params.skills.filter((skill) => skill.content.trim())
|
|
689
722
|
if (usable.length === 0) return ''
|
|
690
723
|
const body = usable
|
|
691
|
-
.map((skill) => `### ${skill.name}\n${skill.content}`)
|
|
724
|
+
.map((skill) => `### ${skill.name}\n${truncateInlinedSkillContent(skill.content, skill.name)}`)
|
|
692
725
|
.join('\n\n')
|
|
693
726
|
return [params.title, params.preface, '', body].join('\n')
|
|
694
727
|
}
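The two-tier budget in `selectPromptSkills` can be illustrated in isolation. The sketch below is a simplified re-statement of the loop shown in this hunk, not the package's code: `SkillSketch`, `Budget`, and `selectWithinBudget` are hypothetical names, the `'pinned'` source value is an assumption, and the overall caps are passed in as parameters because the values of `MAX_SKILLS_IN_PROMPT` and `MAX_SKILLS_PROMPT_CHARS` are not visible in this diff.

```typescript
// Hypothetical, simplified shape; the real ResolvedRuntimeSkill has more fields.
interface SkillSketch {
  name: string
  content: string
  source: string // 'learned' comes from the diff; other values are assumptions
}

interface Budget {
  maxSkills: number       // stand-in for MAX_SKILLS_IN_PROMPT (value not shown in the diff)
  maxChars: number        // stand-in for MAX_SKILLS_PROMPT_CHARS (value not shown in the diff)
  maxLearned: number      // MAX_LEARNED_SKILLS_IN_PROMPT is 6 in the diff
  maxLearnedChars: number // MAX_LEARNED_SKILLS_PROMPT_CHARS is 8000 in the diff
}

function selectWithinBudget(ordered: SkillSketch[], budget: Budget): SkillSketch[] {
  const selected: SkillSketch[] = []
  let totalChars = 0
  let learnedChars = 0
  let learnedCount = 0
  for (const skill of ordered) {
    if (selected.length >= budget.maxSkills) break
    const contentLen = skill.name.length + skill.content.length + 12
    // Skip (not break) so a smaller skill later in the list can still fit.
    if (totalChars + contentLen > budget.maxChars) continue
    if (skill.source === 'learned') {
      // Learned skills must also fit inside their own, smaller sub-budget.
      if (learnedCount >= budget.maxLearned) continue
      if (learnedChars + contentLen > budget.maxLearnedChars) continue
      learnedChars += contentLen
      learnedCount += 1
    }
    totalChars += contentLen
    selected.push(skill)
  }
  return selected
}

// Ten learned skills are ordered ahead of one explicitly pinned skill.
const learned: SkillSketch[] = Array.from({ length: 10 }, (_, i) => ({
  name: `learned-${i}`,
  content: 'x'.repeat(100),
  source: 'learned',
}))
const pinned: SkillSketch = { name: 'pinned-guide', content: 'y'.repeat(500), source: 'pinned' }

const picked = selectWithinBudget([...learned, pinned], {
  maxSkills: 20,
  maxChars: 30000,
  maxLearned: 6,
  maxLearnedChars: 8000,
})
console.log(picked.map((s) => s.name))
```

With these inputs only the first six learned skills are kept, and the pinned skill still gets a slot even though it sorts last, which is the behavior the sub-budget comment describes.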
|
|
package/src/lib/server/tasks/task-checkout.ts
CHANGED
|
@@ -19,7 +19,13 @@ export function checkoutTask(
|
|
|
19
19
|
const tasks = loadTasks() as Record<string, BoardTask>
|
|
20
20
|
const task = tasks[taskId]
|
|
21
21
|
if (!task || task.status !== 'queued') return null
|
|
22
|
-
|
|
22
|
+
// A stale checkoutRunId can survive an ungraceful server exit (crash,
|
|
23
|
+
// SIGKILL, HMR reload mid-turn). If status is 'queued', the runId cannot
|
|
24
|
+
// reference a live checkout — only running tasks hold active checkouts —
|
|
25
|
+
// so treat the lingering id as stale and reclaim it. Previously this
|
|
26
|
+
// returned null forever, so the dispatch → orphan-recovery → failed-
|
|
27
|
+
// checkout cycle spammed "Recovering orphaned queued task" every ~2 ms
|
|
28
|
+
// (21 k log lines in a single session).
|
|
23
29
|
|
|
24
30
|
const now = Date.now()
|
|
25
31
|
task.status = 'running'
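The reclaim rule described in the comment above can be sketched with an in-memory task map. This is an illustration under stated assumptions rather than the package's implementation: `TaskSketch` and `checkoutSketch` are hypothetical names, the map stands in for `loadTasks()`, and passing the new run id as a parameter is an assumption about the signature.

```typescript
// Hypothetical, trimmed-down task shape for illustration only.
interface TaskSketch {
  id: string
  status: 'queued' | 'running' | 'failed'
  checkoutRunId?: string | null
}

function checkoutSketch(
  tasks: Record<string, TaskSketch>,
  taskId: string,
  runId: string, // assumption: the caller supplies the new run id
): TaskSketch | null {
  const task = tasks[taskId]
  if (!task || task.status !== 'queued') return null
  // Only running tasks hold live checkouts, so any checkoutRunId left on a
  // queued task is stale. Overwrite it instead of refusing the checkout.
  task.status = 'running'
  task.checkoutRunId = runId
  return task
}

const tasks: Record<string, TaskSketch> = {
  stale: { id: 'stale', status: 'queued', checkoutRunId: 'stale-run-id' },
}
const first = checkoutSketch(tasks, 'stale', 'run-1')
const second = checkoutSketch(tasks, 'stale', 'run-2')
console.log(first?.checkoutRunId, second)
```

The first call succeeds despite the leftover id, and the second call returns `null` because the task is now running, so a live checkout is still protected.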
|
|
package/src/lib/server/universal-tool-access.test.ts
ADDED
|
@@ -0,0 +1,71 @@
|
|
|
1
|
+
import { describe, it } from 'node:test'
|
|
2
|
+
import assert from 'node:assert/strict'
|
|
3
|
+
|
|
4
|
+
// NOTE: we intentionally avoid importing the real universal-tool-access
|
|
5
|
+
// module here — it pulls in the extension manager which transitively loads
|
|
6
|
+
// the whole plugin system and OOMs in test workers. We re-declare the pure
|
|
7
|
+
// logic and verify the algorithmic behavior. Integration coverage for the
|
|
8
|
+
// extension-manager branch happens via live-chat profiling instead.
|
|
9
|
+
|
|
10
|
+
const SCOPED_TOOL_BASELINE = ['memory', 'context_mgmt', 'ask_human'] as const
|
|
11
|
+
const UNIVERSAL_SAMPLE = new Set([
|
|
12
|
+
'shell', 'files', 'edit_file', 'delegate', 'web', 'browser', 'memory',
|
|
13
|
+
'manage_platform', 'manage_tasks', 'context_mgmt', 'ask_human',
|
|
14
|
+
'schedule_wake', 'email', 'image_gen',
|
|
15
|
+
])
|
|
16
|
+
|
|
17
|
+
function normalize(value: string[] | undefined | null): string[] {
|
|
18
|
+
if (!Array.isArray(value)) return []
|
|
19
|
+
return value.map((entry) => (typeof entry === 'string' ? entry.trim() : '')).filter(Boolean)
|
|
20
|
+
}
|
|
21
|
+
|
|
22
|
+
function scoped(declared: string[] | null | undefined, universe: Set<string> = UNIVERSAL_SAMPLE): string[] {
|
|
23
|
+
const picks = normalize(declared).filter((t) => universe.has(t))
|
|
24
|
+
return Array.from(new Set([...SCOPED_TOOL_BASELINE, ...picks]))
|
|
25
|
+
}
|
|
26
|
+
|
|
27
|
+
describe('scoped tool access algorithm', () => {
|
|
28
|
+
it('intersects declared tools with the universe and keeps the baseline', () => {
|
|
29
|
+
const out = scoped(['shell', 'files', 'edit_file', 'web'])
|
|
30
|
+
assert.ok(out.includes('memory'))
|
|
31
|
+
assert.ok(out.includes('context_mgmt'))
|
|
32
|
+
assert.ok(out.includes('ask_human'))
|
|
33
|
+
assert.ok(out.includes('shell'))
|
|
34
|
+
assert.ok(out.includes('files'))
|
|
35
|
+
assert.ok(out.includes('edit_file'))
|
|
36
|
+
assert.ok(out.includes('web'))
|
|
37
|
+
assert.ok(!out.includes('browser'))
|
|
38
|
+
assert.ok(!out.includes('manage_platform'))
|
|
39
|
+
assert.ok(!out.includes('delegate'))
|
|
40
|
+
})
|
|
41
|
+
|
|
42
|
+
it('drops declared tools that are not in the universe', () => {
|
|
43
|
+
const out = scoped(['shell', 'not_a_real_tool'])
|
|
44
|
+
assert.ok(out.includes('shell'))
|
|
45
|
+
assert.ok(!out.includes('not_a_real_tool'))
|
|
46
|
+
})
|
|
47
|
+
|
|
48
|
+
it('returns only the baseline when declared tools is empty', () => {
|
|
49
|
+
assert.deepEqual(scoped([]).sort(), ['ask_human', 'context_mgmt', 'memory'])
|
|
50
|
+
})
|
|
51
|
+
|
|
52
|
+
it('produces a strictly smaller set than the universe for a focused agent', () => {
|
|
53
|
+
assert.ok(scoped(['shell', 'files', 'web']).length < UNIVERSAL_SAMPLE.size)
|
|
54
|
+
})
|
|
55
|
+
|
|
56
|
+
it('deduplicates when baseline overlaps with declared tools', () => {
|
|
57
|
+
const out = scoped(['memory', 'shell'])
|
|
58
|
+
assert.equal(out.filter((t) => t === 'memory').length, 1)
|
|
59
|
+
})
|
|
60
|
+
|
|
61
|
+
it('treats null / undefined / non-array declared tools as empty', () => {
|
|
62
|
+
assert.deepEqual(scoped(null).sort(), ['ask_human', 'context_mgmt', 'memory'])
|
|
63
|
+
assert.deepEqual(scoped(undefined).sort(), ['ask_human', 'context_mgmt', 'memory'])
|
|
64
|
+
})
|
|
65
|
+
|
|
66
|
+
it('trims whitespace in declared tool names', () => {
|
|
67
|
+
const out = scoped([' shell ', '\tfiles\n'])
|
|
68
|
+
assert.ok(out.includes('shell'))
|
|
69
|
+
assert.ok(out.includes('files'))
|
|
70
|
+
})
|
|
71
|
+
})
|
|
package/src/lib/server/universal-tool-access.ts
CHANGED
|
@@ -57,3 +57,26 @@ export function listUniversalToolAccessExtensionIds(extraExtensions?: string[] |
|
|
|
57
57
|
...normalizeExtensionList(extraExtensions),
|
|
58
58
|
])
|
|
59
59
|
}
|
|
60
|
+
|
|
61
|
+
// Minimum extensions that a 'scoped' agent always gets regardless of its
|
|
62
|
+
// declared tool list. Memory + context management are required for the agent
|
|
63
|
+
// to function (remembering things, noticing when it's out of context), and
|
|
64
|
+
// ask_human lets it escalate to the user when stuck. Everything else is
|
|
65
|
+
// filterable through agent.tools.
|
|
66
|
+
const SCOPED_TOOL_BASELINE = ['memory', 'context_mgmt', 'ask_human'] as const
|
|
67
|
+
|
|
68
|
+
/**
|
|
69
|
+
* Returns the set of enabled extension IDs for a scoped-access agent: the
|
|
70
|
+
* intersection of `listUniversalToolAccessExtensionIds()` with the agent's
|
|
71
|
+
* declared tools, plus the non-negotiable baseline. Use this when an agent
|
|
72
|
+
* has opted into `toolAccessMode: 'scoped'` to shrink per-turn context.
|
|
73
|
+
*/
|
|
74
|
+
export function listScopedToolAccessExtensionIds(
|
|
75
|
+
declaredTools: string[] | null | undefined,
|
|
76
|
+
extraExtensions?: string[] | null,
|
|
77
|
+
): string[] {
|
|
78
|
+
const universe = new Set(listUniversalToolAccessExtensionIds(extraExtensions))
|
|
79
|
+
const declared = normalizeExtensionList(declaredTools)
|
|
80
|
+
const scoped = declared.filter((tool) => universe.has(tool))
|
|
81
|
+
return dedup([...SCOPED_TOOL_BASELINE, ...scoped])
|
|
82
|
+
}
|
package/src/types/agent.ts
CHANGED
|
@@ -68,6 +68,13 @@ export interface Agent {
|
|
|
68
68
|
delegationTargetMode?: DelegationTargetMode
|
|
69
69
|
delegationTargetAgentIds?: string[]
|
|
70
70
|
tools?: string[]
|
|
71
|
+
// When 'scoped', the chat turn restricts enabled extensions to the
|
|
72
|
+
// intersection of the universal core list and agent.tools (plus a small
|
|
73
|
+
// non-negotiable baseline: memory, context management, ask_human). Default
|
|
74
|
+
// 'universal' preserves existing behavior. Opt in to cut per-turn tool
|
|
75
|
+
// guidance dramatically — a focused agent with 5 tools drops ~15 k chars
|
|
76
|
+
// of tool-related prompt text vs. the full 33-tool universe.
|
|
77
|
+
toolAccessMode?: 'universal' | 'scoped'
|
|
71
78
|
extensions?: string[]
|
|
72
79
|
skills?: string[] // e.g. ['frontend-design'] — pinned Claude Code skills to mention explicitly
|
|
73
80
|
skillIds?: string[] // IDs of pinned managed skills to keep always-on for this agent
|
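A caller might branch on `toolAccessMode` roughly as follows. This is a minimal sketch, not the code in `chat-turn-preparation.ts`: `AgentSketch` and `enabledExtensionsFor` are hypothetical names, and the universe is passed in as a plain array instead of coming from `listUniversalToolAccessExtensionIds()`.

```typescript
// Baseline taken from SCOPED_TOOL_BASELINE in the diff.
const BASELINE = ['memory', 'context_mgmt', 'ask_human'] as const

// Hypothetical subset of the Agent interface.
interface AgentSketch {
  tools?: string[]
  toolAccessMode?: 'universal' | 'scoped'
}

function enabledExtensionsFor(agent: AgentSketch, universe: string[]): string[] {
  // An unset mode behaves like 'universal', preserving existing behavior.
  if (agent.toolAccessMode !== 'scoped') return [...universe]
  const known = new Set(universe)
  const declared = (agent.tools ?? [])
    .map((tool) => tool.trim())
    .filter((tool) => known.has(tool))
  // Baseline first, then declared tools; the Set removes duplicates.
  return Array.from(new Set<string>([...BASELINE, ...declared]))
}

const universe = ['shell', 'files', 'web', 'browser', 'memory', 'context_mgmt', 'ask_human']
const scopedIds = enabledExtensionsFor({ tools: ['shell', 'web'], toolAccessMode: 'scoped' }, universe)
const defaultIds = enabledExtensionsFor({ tools: ['shell', 'web'] }, universe)
console.log(scopedIds)
console.log(defaultIds.length)
```

In this sketch the scoped agent ends up with five extensions while the default agent keeps all seven, which mirrors the opt-in design: nothing changes for an agent until it sets `toolAccessMode: 'scoped'`.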