@swarmclawai/swarmclaw 1.5.37 → 1.5.38

package/README.md CHANGED
@@ -389,6 +389,17 @@ Operational docs: https://swarmclaw.ai/docs/observability

  ## Releases

+ ### v1.5.38 Highlights
+
+ - **Task queue: reclaim stale checkouts**: `checkoutTask()` now reclaims a lingering `checkoutRunId` on a `queued` task instead of refusing it forever. An ungraceful server exit mid-turn (crash, SIGKILL, HMR reload) previously left tasks uncheckoutable, producing a dispatch → orphan-recovery → failed-checkout spin that logged "Recovering orphaned queued task" tens of thousands of times per session. `scheduleRetryOrDeadLetter()` also clears the prior checkout when scheduling a retry or dead-lettering.
+ - **Chat: suppress duplicate parallel tool calls**: some OSS models on Ollama (notably `devstral`) emit the same tool call twice in a single turn. The LangGraph tool-event tracker now dedupes by `name + input` signature, swallowing the duplicate start and its result while allowing a genuinely later identical call once the first completes. Hardened against replayed-start events (HMR, graph retries) that previously could leak a `run_id` into both the accepted and suppressed sets and leave `pendingCount` stuck above zero.
+ - **Chat: disable `parallel_tool_calls` for Ollama**: local Ollama sessions now pass `parallel_tool_calls: false` to prevent the upstream duplicate-call behavior at the source for models that honor it.
+ - **Chat: no-progress guard for tool summary retries**: if the model produces essentially no new text on a `tool_summary` continuation, the loop stops retrying instead of streaming the same short sentence two or three times. The guard is snapshot-aware: a transient-error rollback no longer leaves a stale progress counter that silently skips a legitimate retry (`lastToolSummaryTextLen` is now round-tripped through `ChatTurnState.snapshot`/`restore`).
+ - **Task UI: distinguish retry-pending from failure**: a retrying task now renders in amber with a "Retry Pending" label in the task card and sheet, instead of the same red treatment used for dead-lettered failures.
+ - **Autonomy: dedupe reflection memories across kinds**: the supervisor reflection writer now drops notes whose normalized text has already been stored this run, eliminating near-identical memory rows classified under multiple kinds.
+ - **OpenClaw gateway: fast-fail on dangling credentials**: when an agent's OpenClaw route references a deleted or missing credential, the gateway now refuses to dial the WebSocket up front instead of attempting an unauthenticated handshake and waiting the full 120 s for the agent-side timeout. The credential-missing log line is promoted from warn to error so it surfaces in routine monitoring.
+ - **Prompt size profiler**: setting `SWARMCLAW_PROFILE_PROMPT=1` now logs a per-section size breakdown of the assembled system prompt (block index, first-line label, char count) on every turn, making it practical to diagnose why a specific agent is eating context budget. Off by default so production turns stay quiet.
+
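The stale-checkout reclaim described in the first highlight is not part of the hunks shown here, so the following is only a minimal sketch of the idea. The names `QueuedTask`, `CheckoutResult`, and `checkoutTaskSketch`, and the `'running'` status, are assumptions made for illustration; only `checkoutRunId` and the `queued` status come from the release notes.

```typescript
// Hypothetical task shape: only `checkoutRunId` and the 'queued' status are
// named in the release notes; everything else here is an assumption.
type TaskStatus = 'queued' | 'running' | 'completed' | 'failed'

interface QueuedTask {
  id: string
  status: TaskStatus
  checkoutRunId: string | null
}

interface CheckoutResult {
  ok: boolean
  // The run id that was holding the task before this checkout, if any.
  reclaimedFrom: string | null
}

function checkoutTaskSketch(task: QueuedTask, runId: string): CheckoutResult {
  // Only queued tasks can be checked out.
  if (task.status !== 'queued') return { ok: false, reclaimedFrom: null }
  // A queued task that still carries a checkoutRunId was left behind by a run
  // that exited without releasing it. Reclaim it instead of refusing.
  const stale = task.checkoutRunId
  task.checkoutRunId = runId
  task.status = 'running'
  return { ok: true, reclaimedFrom: stale }
}
```

In this sketch a second checkout of the same task is refused because the task is no longer `queued`, which is the case the reclaim must not disturb.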
  ### v1.5.37 Highlights

  - **Factory Droid CLI as a provider and delegation backend**: adds [`droid`](https://docs.factory.ai/cli/droid-exec/overview) as a first-class chat provider and `delegate` backend with streaming JSON output, session resume, and a conservative `--auto low` autonomy pin on the delegate path. Install `droid` and sign in via browser (or set `FACTORY_API_KEY`), then pick **Factory Droid CLI** in the setup wizard. Resolves #38.
@@ -417,85 +428,7 @@ Operational docs: https://swarmclaw.ai/docs/observability
  - **Chat execution context hardening**: tool invocation now resolves names case-insensitively, oversized tool results are truncated before they are fed back into the model, and proactive grounding/heartbeat prompts stay smaller under pressure to reduce avoidable context blowouts.
  - **API compatibility fixes**: OpenAI-compatible streaming now captures reasoning deltas from providers that emit them outside `delta.content`, and A2A endpoints are exempt from the main proxy access-key gate so they can rely on their own auth scheme.

- ### v1.5.33 Highlights
-
- - **CLI global flag compatibility**: legacy-routed commands now honor the documented `--access-key` and `--base-url` aliases even when they appear after the subcommand, so authenticated CLI automation works the same across binary entry points.
- - **Docker build memory hardening**: production Next.js builds now size `--max-old-space-size` from the detected container/cgroup memory limit, with `SWARMCLAW_BUILD_MAX_OLD_SPACE_SIZE_MB` available as an explicit override for constrained Docker Desktop and CI environments.
-
- ### v1.5.31 Highlights
-
- - **Fix Docker first-run crash**: resolved `EISDIR: illegal operation on a directory, read` error when running `docker compose up` without a pre-existing `.env.local` file. Docker was creating a directory mount instead of a file, which crashed Next.js on startup. Replaced the file bind mount with `env_file` directive using `required: false`.
-
- ### v1.5.4 Highlights
-
- - **Cursor Agent CLI built-in provider**: Cursor Agent CLI is now a first-class worker provider with session continuity, headless execution, and delegation support.
- - **Qwen Code CLI built-in provider**: Qwen Code CLI is now available as a built-in worker provider and delegation backend with structured headless execution support.
- - **Goose built-in provider**: Goose is now supported as a runtime-managed worker provider, using its own local auth and provider configuration while preserving SwarmClaw session continuity.
- - **CLI setup and health parity**: setup flows, provider checks, setup doctor, and provider-facing UI now recognize Cursor, Qwen Code, and Goose alongside the existing CLI-backed providers.
-
- ### v1.5.3 Highlights
-
- - **Copilot CLI v1.x compatibility**: the `copilot-cli` provider now handles the current event format (`assistant.message_delta`, `assistant.message`, updated `result` payload) while keeping backward compatibility with the legacy format. Also fixes `--resume` flag syntax. (Community contribution by [@borislavnnikolov](https://github.com/borislavnnikolov) -- PR #36)
-
- ### v1.5.2 Highlights
-
- - **Hosted deploy path for SwarmClaw itself**: added root-level `render.yaml`, `fly.toml`, and `railway.json` so the published `ghcr.io/swarmclawai/swarmclaw:latest` image is easier to run on always-on platforms.
- - **Public health endpoint for hosted platforms**: added `/api/healthz` and exempted it from access-key auth so Render, Fly.io, and Railway can perform liveness checks without weakening the rest of the API surface.
- - **OTLP/OpenTelemetry foundation**: SwarmClaw can now export traces for chat turns, direct model streams, protocol runs, and tool execution to any OTLP-compatible backend using environment variables only.
- - **Docs and landing-page deploy refresh**: `swarmclaw.ai` now exposes the hosted deploy path and a dedicated observability guide instead of burying those operator workflows in general setup docs.
-
- ### v1.5.1 Highlights
-
- - **Standalone connector lifecycle**: connector start, stop, status, and repair now work correctly in standalone production builds (`npm start` / pm2) where the daemon runs in-process. Previously these operations silently failed because the controller assumed a daemon subprocess was always present. (Community contribution by [@borislavnnikolov](https://github.com/borislavnnikolov) -- PR #35)
-
- ### v1.5.0 Highlights
-
- - **First-run activation refresh**: setup now includes a dedicated start-path step, broad starter shapes instead of niche presets, and draft agents generated directly from the chosen setup shape.
- - **Guided post-setup launchpad**: finishing setup now routes through action-oriented next steps such as opening the first agent chat, launching a structured session, connecting platforms, or reviewing usage.
- - **State-aware home and protocols**: fresh workspaces now open on a launchpad instead of a sparse ops dashboard, and the Protocols page now surfaces the visual builder and template gallery directly.
-
- ### v1.4.9 Highlights
-
- - **Standalone build reliability**: `public/`, `.next/static/`, and `css-tree/data/` are now automatically copied into the standalone build output, fixing runtime crashes and missing assets when running the standalone bundle. (Community contribution by [@borislavnnikolov](https://github.com/borislavnnikolov) — PR #34)
-
- ### v1.4.8 Highlights
-
- - **Agent-scoped SwarmFeed dashboard**: the in-app feed now has an explicit acting-agent model so humans can direct social actions without ever posting as a separate user identity.
- - **Expanded feed surface**: added Bookmarks and Notifications tabs, SwarmFeed search, suggested follows, thread detail sheets, profile sheets, and a restored visible composer.
- - **Broader SwarmFeed tool/API support**: the built-in `swarmfeed` tool and internal API now support follow/unfollow, bookmark/unbookmark, quote reposts, notifications, profile lookup, thread reads, and search.
- - **Social heartbeat enforcement**: task-completion posting, daily/manual-only guardrails, and heartbeat dependency warnings now match the agent-first SwarmFeed model instead of leaving social automation loosely implied.
-
- ### v1.4.7 Highlights
-
- - **Hermes Agent built-in provider**: Added first-class Hermes support through the Hermes API server, including optional auth, local or remote `/v1` endpoints, and runtime-managed agent handling.
- - **OpenRouter built-in provider**: OpenRouter is now a built-in provider instead of living only behind the generic custom-provider path.
- - **Runtime-managed provider handling**: Hermes now skips SwarmClaw's local extension/tool injection path so its own runtime stays in control, while setup and model discovery still work through the normal provider flow.
- - **Provider docs refresh**: README and docs now reflect the new provider list, remote Hermes API-server support, and logo assets for OpenRouter and Hermes Agent.
-
- ### v1.4.6 Highlights
-
- - **SwarmDock startup sync**: Existing SwarmDock agents now authenticate and reconcile their live marketplace profile on connector start, updating stale description, skills, framework/model metadata, and payout wallet fields
- - **Agent wallet fallback**: SwarmDock connectors now fall back to the agent's selected marketplace wallet when no connector-level wallet address is configured
- - **Task filter fix**: The built-in `swarmdock` tool now uses the correct `skills=` task filter when browsing marketplace tasks from chat
- - **SwarmDock SDK bump**: Updated `@swarmdock/sdk` from `0.5.2` to `0.5.3`, aligning the connector with the published metadata-sync fixes
-
- ### v1.4.5 Highlights
-
- - **OpenClaw 2026.4.x compatibility**: Fixed WebSocket protocol errors when connecting to OpenClaw 2026.4.2+ gateways (`profileId` was incorrectly included in RPC params)
- - **OpenClaw dependency bump**: Updated minimum OpenClaw from `2026.2.26` to `2026.4.2`
-
- ### v1.4.4 Highlights
-
- - **SwarmDock SDK bump**: Updated `@swarmdock/sdk` from `0.4.1` to `0.5.2`, picking up new error types, skill templates, and agent primitives
-
- ### v1.4.3 Highlights
-
- - **SwarmDock agent opt-in**: Agents can now opt into the SwarmDock marketplace directly from their settings sheet with description, skills, wallet, and auto-bid configuration
- - **SwarmFeed & SwarmDock tools**: Agents get `swarmfeed` and `swarmdock` tools auto-enabled when opted in, allowing autonomous posting, replying, liking, browsing tasks, and checking status from chat
- - **Auto-registration**: Enabling SwarmFeed on an agent automatically registers it on the SwarmFeed network (no manual connector setup required)
- - **Marketplace page**: New `/marketplace` sidebar page showing live SwarmDock tasks and agents
- - **Following tab fix**: SwarmFeed Following tab gracefully handles unregistered agents instead of showing a 401 error
- - **Compose removal**: Removed manual compose UI from Feed page — agents post autonomously through their tools
+ Older releases: https://swarmclaw.ai/docs/release-notes

  - GitHub releases: https://github.com/swarmclawai/swarmclaw/releases
  - npm package: https://www.npmjs.com/package/@swarmclawai/swarmclaw
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
    "name": "@swarmclawai/swarmclaw",
-   "version": "1.5.37",
+   "version": "1.5.38",
    "description": "Build and run autonomous AI agents with OpenClaw, Hermes, multiple model providers, orchestration, delegation, memory, skills, schedules, and chat connectors.",
    "main": "electron-dist/main.js",
    "license": "MIT",
@@ -343,9 +343,17 @@ export function TaskCard({
      )}
    </div>

-   {task.error && (
-     <p className="mt-2 text-[11px] text-red-400/80 line-clamp-2">{task.error}</p>
-   )}
+   {task.error && (() => {
+     const retryPending =
+       task.status !== 'failed' &&
+       !task.deadLetteredAt &&
+       (task.retryScheduledAt != null || /^Retry scheduled after failure/i.test(task.error))
+     return (
+       <p className={`mt-2 text-[11px] line-clamp-2 ${retryPending ? 'text-amber-400/80' : 'text-red-400/80'}`}>
+         {task.error}
+       </p>
+     )
+   })()}

    {/* Inline comments — show latest 2 */}
    {task.comments && task.comments.length > 0 && (
@@ -549,15 +549,26 @@ export function TaskSheet() {
      </div>
    )}

-   {/* Error */}
-   {editing.error && (
-     <div className="mb-8">
-       <label className="block font-display text-[12px] font-600 text-red-400 uppercase tracking-[0.08em] mb-3">Error</label>
-       <div className="p-4 rounded-[14px] border border-red-500/10 bg-red-500/[0.03] text-[13px] text-red-400/80 whitespace-pre-wrap">
-         {editing.error}
+   {/* Error / Retry notice */}
+   {editing.error && (() => {
+     const retryPending =
+       editing.status !== 'failed' &&
+       !editing.deadLetteredAt &&
+       (editing.retryScheduledAt != null || /^Retry scheduled after failure/i.test(editing.error))
+     const label = retryPending ? 'Retry Pending' : 'Error'
+     const tone = retryPending
+       ? 'border-amber-500/15 bg-amber-500/[0.04] text-amber-300/80'
+       : 'border-red-500/10 bg-red-500/[0.03] text-red-400/80'
+     const labelTone = retryPending ? 'text-amber-400' : 'text-red-400'
+     return (
+       <div className="mb-8">
+         <label className={`block font-display text-[12px] font-600 uppercase tracking-[0.08em] mb-3 ${labelTone}`}>{label}</label>
+         <div className={`p-4 rounded-[14px] border text-[13px] whitespace-pre-wrap ${tone}`}>
+           {editing.error}
+         </div>
        </div>
-     </div>
-   )}
+     )
+   })()}

    {/* Comments (with input — adding comments from view mode is useful) */}
    <div className="mb-8">
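The `TaskCard` and `TaskSheet` hunks evaluate the same retry-pending condition inline. As a reading aid, that condition can be written as one standalone predicate. This is a sketch only: `TaskErrorView` and `isRetryPending` are hypothetical names, and the field types are assumptions inferred from how the diff uses them.

```typescript
// Hypothetical minimal view of a task. Field names follow the diff; the
// types are assumptions for this sketch.
interface TaskErrorView {
  status: string
  error?: string | null
  deadLetteredAt?: number | null
  retryScheduledAt?: number | null
}

// True when the error text describes a scheduled retry rather than a final
// failure: the task is not failed, not dead-lettered, and either has a retry
// timestamp or carries the "Retry scheduled after failure" message prefix.
function isRetryPending(task: TaskErrorView): boolean {
  if (!task.error) return false
  return (
    task.status !== 'failed' &&
    !task.deadLetteredAt &&
    (task.retryScheduledAt != null || /^Retry scheduled after failure/i.test(task.error))
  )
}
```

A shared helper of this kind would keep the card and the sheet from drifting apart if the condition changes, though the diff itself keeps the two copies inline.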
@@ -779,8 +779,20 @@ function writeReflectionMemories(params: {
      { kind: 'open_loop', notes: params.openLoops },
    ]

+   // Cross-kind dedup: skip notes whose normalized text we've already stored
+   // this run. The reflection classifier often produces near-identical notes
+   // under multiple kinds (e.g. "successfully completed a multi-step task…"
+   // appearing as both invariant and lesson), which floods memory with noise.
+   const seenNormalized = new Set<string>()
+   const normalizeNote = (note: string): string =>
+     note.toLowerCase().replace(/\s+/g, ' ').trim().slice(0, 240)
+
    for (const group of groups) {
      for (const note of group.notes) {
+       const norm = normalizeNote(note)
+       if (!norm) continue
+       if (seenNormalized.has(norm)) continue
+       seenNormalized.add(norm)
        const metadata: Record<string, unknown> = {
          origin: 'autonomy-reflection',
          reflectionId: params.reflectionId,
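To see the effect of the cross-kind dedup in isolation, the loop above can be reduced to a small pure function. This is a sketch under assumptions: `ReflectionGroup` and `dedupeAcrossKinds` are hypothetical names, and only `normalizeNote` is copied from the hunk.

```typescript
// Hypothetical group shape mirroring the `{ kind, notes }` entries in the hunk.
interface ReflectionGroup {
  kind: string
  notes: string[]
}

// Same normalization as the hunk: lowercase, collapse whitespace, trim, and
// cap at 240 characters.
function normalizeNote(note: string): string {
  return note.toLowerCase().replace(/\s+/g, ' ').trim().slice(0, 240)
}

// Keeps the first occurrence of each normalized note across all kinds, in
// group order, and drops empty notes.
function dedupeAcrossKinds(groups: ReflectionGroup[]): Array<{ kind: string; note: string }> {
  const seen = new Set<string>()
  const kept: Array<{ kind: string; note: string }> = []
  for (const group of groups) {
    for (const note of group.notes) {
      const norm = normalizeNote(note)
      if (!norm || seen.has(norm)) continue
      seen.add(norm)
      kept.push({ kind: group.kind, note })
    }
  }
  return kept
}
```

Because the key is cut at 240 characters, two long notes that differ only after that point are treated as the same note, and the kind listed first wins.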
@@ -238,6 +238,18 @@ test('buildChatModel keeps local Ollama local even when a credential and :cloud
    assert.equal(llm.clientConfig?.baseURL, 'http://localhost:11434/v1')
  })

+ test('buildChatModel disables parallel_tool_calls for Ollama local to avoid duplicate tool emissions from some OSS models', () => {
+   const llm = buildChatModel({
+     provider: 'ollama',
+     model: 'devstral',
+     ollamaMode: 'local',
+     apiKey: null,
+   }) as ChatOpenAI & { modelKwargs?: Record<string, unknown> }
+
+   assert.equal(llm.modelKwargs?.parallel_tool_calls, false)
+   assert.equal(llm.clientConfig?.baseURL, 'http://localhost:11434/v1')
+ })
+
  test('buildChatModel uses Ollama Cloud only when explicit cloud mode is selected', () => {
    saveCredentials({
      'cred-1': {
@@ -38,6 +38,7 @@ type OpenAiReasoningEffort = 'low' | 'medium' | 'high'
  type ChatOpenAiConfig = ConstructorParameters<typeof ChatOpenAI>[0] & {
    modelKwargs?: {
      reasoning_effort?: OpenAiReasoningEffort
+     parallel_tool_calls?: boolean
    }
    configuration?: {
      baseURL?: string
@@ -104,6 +105,7 @@ export function buildChatModel(opts: {
        timeout: OPENAI_COMPAT_MODEL_TIMEOUT_MS,
        maxRetries: OPENAI_COMPAT_MODEL_MAX_RETRIES,
        configuration: { baseURL },
+       modelKwargs: { parallel_tool_calls: false },
      })
    }

@@ -1,6 +1,7 @@
  import fs from 'fs'
  import os from 'os'

+ import { log } from '@/lib/server/logger'
  import { getProvider } from '@/lib/providers'
  import type { ExecutionBrief, Message, Session } from '@/types'
  import {
@@ -418,7 +419,17 @@ function buildAgentSystemPrompt(
      'You run on an autonomous heartbeat. If you receive a heartbeat poll and nothing needs attention, reply exactly: HEARTBEAT_OK',
    ].join('\n'))

-   return parts.join('\n\n')
+   const assembled = parts.join('\n\n')
+   if (process.env.SWARMCLAW_PROFILE_PROMPT === '1') {
+     // Dump per-section sizes once per turn to help size the context budget.
+     // Kept behind an env flag so production turns stay quiet.
+     const sectionSizes = parts.map((block, idx) => {
+       const firstLine = block.split('\n', 1)[0]
+       return ` [${idx}] ${firstLine.slice(0, 60)} — ${block.length} chars`
+     }).join('\n')
+     log.info('prompt-profile', `System prompt assembled (${assembled.length} chars, ${parts.length} blocks, ${enabledExtensions.length} extensions):\n${sectionSizes}`)
+   }
+   return assembled
  }

  function resolveApiKeyForSession(session: SessionWithCredentials, provider: ProviderApiKeyConfig): string | null {
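The profiler hunk above builds its report inline. A rough standalone version makes the output shape easier to see. This is a sketch, not the package's code: `formatPromptProfile` is a hypothetical name, the extension count is left out because `enabledExtensions` is not available outside `buildAgentSystemPrompt`, and a plain hyphen stands in for the separator.

```typescript
// Hypothetical helper: formats a per-block size report for an assembled
// system prompt, one line per block (index, first line, character count).
function formatPromptProfile(parts: string[]): string {
  // Blocks are joined with a blank line, so the total includes the separators.
  const assembled = parts.join('\n\n')
  const sectionSizes = parts
    .map((block, idx) => {
      // Use the first line of each block as its label, capped at 60 chars.
      const firstLine = block.split('\n', 1)[0] ?? ''
      return ` [${idx}] ${firstLine.slice(0, 60)} - ${block.length} chars`
    })
    .join('\n')
  return `System prompt assembled (${assembled.length} chars, ${parts.length} blocks):\n${sectionSizes}`
}

const report = formatPromptProfile(['# Identity\nYou are X', '# Tools\nabc'])
console.log(report)
```

In the package itself the report is only emitted when `SWARMCLAW_PROFILE_PROMPT` is set to `1`; the sketch leaves that gate to the caller.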
@@ -0,0 +1,39 @@
+ import { describe, it } from 'node:test'
+ import assert from 'node:assert/strict'
+ import { ChatTurnState } from '@/lib/server/chat-execution/chat-turn-state'
+
+ describe('ChatTurnState snapshot/restore', () => {
+   it('round-trips lastToolSummaryTextLen so a rollback does not skip a legitimate tool_summary retry', () => {
+     // Regression: before this field was included in the snapshot, a transient
+     // error + restore() would leave lastToolSummaryTextLen ahead of fullText.
+     // The no-progress guard in checkToolSummary then saw a negative delta and
+     // silently skipped the retry — the model lost its chance to summarize.
+     const state = new ChatTurnState()
+     state.fullText = 'initial prompt text'
+     state.hasToolCalls = true
+     const snap = state.snapshot()
+     assert.equal(snap.lastToolSummaryTextLen, -1)
+
+     // Simulate a tool_summary retry that advanced the guard counter, then a
+     // rollback to the pre-iteration snapshot.
+     state.lastToolSummaryTextLen = state.fullText.length
+     state.fullText += ' partial speculative output that gets thrown away'
+     state.restore(snap)
+
+     assert.equal(state.lastToolSummaryTextLen, -1)
+     assert.equal(state.fullText, 'initial prompt text')
+   })
+
+   it('preserves an already-advanced lastToolSummaryTextLen across a non-rollback snapshot cycle', () => {
+     const state = new ChatTurnState()
+     state.fullText = 'Here is the answer.'
+     state.lastToolSummaryTextLen = state.fullText.length
+     const snap = state.snapshot()
+
+     state.fullText += ' Extended thought.'
+     state.restore(snap)
+
+     assert.equal(state.lastToolSummaryTextLen, 'Here is the answer.'.length)
+     assert.equal(state.fullText, 'Here is the answer.')
+   })
+ })
@@ -19,6 +19,7 @@ export interface TurnStateSnapshot {
    toolFrequencyBlocked: string | false
    terminalToolBoundary: 'memory_write' | 'durable_wait' | 'context_compaction' | null
    memoryWriteTerminalAllowed: boolean | null
+   lastToolSummaryTextLen: number
  }

  export class ChatTurnState {
@@ -39,6 +40,7 @@ export class ChatTurnState {
    terminalToolBoundary: 'memory_write' | 'durable_wait' | 'context_compaction' | null = null
    terminalToolResponse = ''
    memoryWriteTerminalAllowed: boolean | null = null
+   lastToolSummaryTextLen = -1

    snapshot(): TurnStateSnapshot {
      return {
@@ -53,6 +55,7 @@ export class ChatTurnState {
        toolFrequencyBlocked: this.toolFrequencyBlocked,
        terminalToolBoundary: this.terminalToolBoundary,
        memoryWriteTerminalAllowed: this.memoryWriteTerminalAllowed,
+       lastToolSummaryTextLen: this.lastToolSummaryTextLen,
      }
    }

@@ -68,6 +71,7 @@ export class ChatTurnState {
      this.toolFrequencyBlocked = snap.toolFrequencyBlocked
      this.terminalToolBoundary = snap.terminalToolBoundary
      this.memoryWriteTerminalAllowed = snap.memoryWriteTerminalAllowed
+     this.lastToolSummaryTextLen = snap.lastToolSummaryTextLen
    }

    /**
@@ -28,6 +28,7 @@ import {
  } from '@/lib/server/chat-execution/memory-mutation-tools'
  import { shouldForceAttachmentFollowthrough } from '@/lib/server/chat-execution/prompt-builder'
  import { shouldSkipToolSummaryForShortResponse } from '@/lib/server/chat-execution/chat-streaming-utils'
+ import { toolSummaryHasMeaningfulProgress } from '@/lib/server/chat-execution/tool-summary-progress'
  import { logExecution, type LogCategory } from '@/lib/server/execution-log'

  // ---------------------------------------------------------------------------
@@ -378,6 +379,15 @@ function checkToolSummary(ctx: ContinuationContext): ContinuationDecision | null
      )
    )
    if (!textIsTrivial) return null
+   const currentLen = ctx.state.fullText.length
+   const priorLen = ctx.state.lastToolSummaryTextLen
+   if (!toolSummaryHasMeaningfulProgress(priorLen, currentLen)) {
+     logStatus(ctx, 'decision', `Tool summary retry skipped — no meaningful progress (delta=${currentLen - priorLen} chars)`, {
+       priorLen, currentLen, toolEventCount: ctx.state.streamedToolEvents.length,
+     })
+     return null
+   }
+   ctx.state.lastToolSummaryTextLen = currentLen
    const count = ctx.limits.increment('tool_summary')
    const summaryReason = !ctx.state.fullText.trim() ? 'empty_response_after_tools' : 'trivial_preamble_after_tools'
    logStatus(ctx, 'decision', `Tools called but response text is trivial (${ctx.state.fullText.trim().length} chars) — forcing summary continuation`, {
@@ -33,4 +33,71 @@ describe('tool-event-tracker', () => {
      assert.equal(tracker.complete('nested_1'), false)
      assert.equal(tracker.pendingCount, 0)
    })
+
+   it('suppresses duplicate parallel tool_calls with identical name+input in the same turn', () => {
+     const tracker = new LangGraphToolEventTracker()
+     const metadata = { langgraph_node: 'tools' }
+     const event = { name: 'files', data: { input: { action: 'write', path: '/tmp/x' } }, metadata }
+
+     // First acceptance emits; second identical call is swallowed.
+     assert.equal(tracker.acceptStart({ run_id: 'r1', ...event }), true)
+     assert.equal(tracker.acceptStart({ run_id: 'r2', ...event }), false)
+     assert.equal(tracker.pendingCount, 1)
+
+     // complete() returns false for the suppressed one so the caller skips its result too.
+     assert.equal(tracker.complete('r2'), false)
+     assert.equal(tracker.complete('r1'), true)
+     assert.equal(tracker.pendingCount, 0)
+
+     // After the first call fully completes, a legitimately later identical call is accepted.
+     assert.equal(tracker.acceptStart({ run_id: 'r3', ...event }), true)
+     assert.equal(tracker.complete('r3'), true)
+   })
+
+   it('does not leak the accepted run if the same run_id re-enters acceptStart', () => {
+     // Guards against replayed start events (e.g., HMR, graph retries) causing
+     // the same run_id to be recorded both as accepted and suppressed, which
+     // would leave pendingCount > 0 after complete().
+     const tracker = new LangGraphToolEventTracker()
+     const metadata = { langgraph_node: 'tools' }
+     const event = { name: 'shell', data: { input: { cmd: 'ls' } }, metadata }
+
+     assert.equal(tracker.acceptStart({ run_id: 'same', ...event }), true)
+     assert.equal(tracker.acceptStart({ run_id: 'same', ...event }), false)
+     assert.equal(tracker.pendingCount, 1)
+     assert.equal(tracker.complete('same'), true)
+     assert.equal(tracker.pendingCount, 0)
+   })
+
+   it('handles triple-duplicate (2 suppressed) parallel tool_calls cleanly', () => {
+     const tracker = new LangGraphToolEventTracker()
+     const metadata = { langgraph_node: 'tools' }
+     const event = { name: 'files', data: { input: { action: 'read', path: '/a' } }, metadata }
+
+     assert.equal(tracker.acceptStart({ run_id: 'r1', ...event }), true)
+     assert.equal(tracker.acceptStart({ run_id: 'r2', ...event }), false)
+     assert.equal(tracker.acceptStart({ run_id: 'r3', ...event }), false)
+     assert.equal(tracker.pendingCount, 1)
+
+     // Out-of-order completions still settle correctly.
+     assert.equal(tracker.complete('r3'), false)
+     assert.equal(tracker.complete('r1'), true)
+     assert.equal(tracker.complete('r2'), false)
+     assert.equal(tracker.pendingCount, 0)
+   })
+
+   it('distinct inputs produce distinct signatures and both are accepted', () => {
+     const tracker = new LangGraphToolEventTracker()
+     const metadata = { langgraph_node: 'tools' }
+
+     assert.equal(tracker.acceptStart({
+       run_id: 'a', name: 'files', data: { input: { path: '/a' } }, metadata,
+     }), true)
+     assert.equal(tracker.acceptStart({
+       run_id: 'b', name: 'files', data: { input: { path: '/b' } }, metadata,
+     }), true)
+     assert.equal(tracker.pendingCount, 2)
+     assert.equal(tracker.complete('a'), true)
+     assert.equal(tracker.complete('b'), true)
+   })
  })
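The signature these tests rely on is the tool name plus the serialized input. A reduced sketch shows which start events would count as duplicates. `ToolStartEvent` and `filterDuplicateStarts` are hypothetical names used only here; the sketch ignores completions and the LangGraph metadata check, so it models a single batch of parallel starts rather than the full tracker.

```typescript
// Hypothetical event shape with the fields the signature needs.
interface ToolStartEvent {
  run_id: string
  name?: string
  data?: { input?: unknown }
}

// Signature logic as in the tracker diff: the name is required, string
// inputs are used as-is, everything else goes through JSON.stringify.
function toolCallSignature(event: ToolStartEvent): string {
  const name = event.name || ''
  if (!name) return ''
  const input = event.data?.input
  const inputStr = typeof input === 'string' ? input : JSON.stringify(input ?? '')
  return `${name}:${inputStr}`
}

// Returns the run ids that would be accepted from one batch of parallel
// starts, dropping later events whose signature is already active.
function filterDuplicateStarts(events: ToolStartEvent[]): string[] {
  const active = new Set<string>()
  const accepted: string[] = []
  for (const event of events) {
    const sig = toolCallSignature(event)
    if (sig && active.has(sig)) continue
    if (sig) active.add(sig)
    accepted.push(event.run_id)
  }
  return accepted
}
```

One consequence of serializing with `JSON.stringify` is that key order matters: `{ action, path }` and `{ path, action }` yield different signatures, so a duplicate emitted with reordered keys would not be suppressed by this scheme.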
@@ -1,5 +1,7 @@
  export interface StreamToolEventLike {
    run_id: string
+   name?: string
+   data?: { input?: unknown }
    metadata?: Record<string, unknown>
  }

@@ -9,18 +11,54 @@ export function isLangGraphToolNodeMetadata(metadata: Record<string, unknown> |
      || typeof metadata.__pregel_task_id === 'string'
  }

+ function toolCallSignature(event: StreamToolEventLike): string {
+   const name = event.name || ''
+   // Only dedup when we have enough to form a meaningful signature — name
+   // is required. Otherwise callers (and tests) that track distinct run ids
+   // with no name/input must continue to work as before.
+   if (!name) return ''
+   const input = event.data?.input
+   const inputStr = typeof input === 'string' ? input : JSON.stringify(input ?? '')
+   return `${name}:${inputStr}`
+ }
+
  export class LangGraphToolEventTracker {
    private readonly acceptedRunIds = new Set<string>()
+   private readonly suppressedRunIds = new Set<string>()
+   // Active signatures -> first accepted run_id. Used to suppress duplicate
+   // parallel tool_calls emitted by some open-source models (e.g. Ollama's
+   // devstral emits identical tool_calls twice per turn).
+   private readonly activeSignatures = new Map<string, string>()

    acceptStart(event: StreamToolEventLike): boolean {
      if (!isLangGraphToolNodeMetadata(event.metadata)) return false
+     const signature = toolCallSignature(event)
+     if (signature && this.activeSignatures.has(signature)) {
+       const firstAcceptedId = this.activeSignatures.get(signature)
+       // If the incoming run_id matches the already-accepted one, this is a
+       // replayed start for the same run. Return false so it is not emitted
+       // twice, but do not mark it suppressed: complete() must still report it.
+       if (firstAcceptedId === event.run_id) return false
+       this.suppressedRunIds.add(event.run_id)
+       return false
+     }
+     if (signature) this.activeSignatures.set(signature, event.run_id)
      this.acceptedRunIds.add(event.run_id)
      return true
    }

    complete(runId: string): boolean {
+     if (this.suppressedRunIds.has(runId)) {
+       this.suppressedRunIds.delete(runId)
+       return false
+     }
      if (!this.acceptedRunIds.has(runId)) return false
      this.acceptedRunIds.delete(runId)
+     // Clear matching signature so a legitimately-new call with the same args
+     // later in the turn is not mistaken for a duplicate.
+     for (const [sig, id] of this.activeSignatures) {
+       if (id === runId) { this.activeSignatures.delete(sig); break }
+     }
      return true
    }

@@ -0,0 +1,42 @@
+ import { describe, it } from 'node:test'
+ import assert from 'node:assert/strict'
+ import {
+   TOOL_SUMMARY_PROGRESS_MIN_DELTA,
+   toolSummaryHasMeaningfulProgress,
+ } from '@/lib/server/chat-execution/tool-summary-progress'
+
+ describe('toolSummaryHasMeaningfulProgress (no-progress guard for tool_summary retries)', () => {
+   it('allows the first retry even when fullText is empty (sentinel priorLen = -1)', () => {
+     assert.equal(toolSummaryHasMeaningfulProgress(-1, 0), true)
+   })
+
+   it('allows the first retry even when some text already exists (sentinel priorLen = -1)', () => {
+     assert.equal(toolSummaryHasMeaningfulProgress(-1, 9999), true)
+   })
+
+   it('skips subsequent retries when the delta is below the minimum', () => {
+     assert.equal(toolSummaryHasMeaningfulProgress(100, 105), false)
+     assert.equal(toolSummaryHasMeaningfulProgress(100, 100), false)
+     // The boundary: exactly MIN_DELTA - 1 still skips.
+     assert.equal(
+       toolSummaryHasMeaningfulProgress(100, 100 + TOOL_SUMMARY_PROGRESS_MIN_DELTA - 1),
+       false,
+     )
+   })
+
+   it('allows a retry when the delta meets or exceeds the minimum', () => {
+     assert.equal(
+       toolSummaryHasMeaningfulProgress(100, 100 + TOOL_SUMMARY_PROGRESS_MIN_DELTA),
+       true,
+     )
+     assert.equal(toolSummaryHasMeaningfulProgress(100, 1000), true)
+   })
+
+   it('skips retry when fullText SHRANK after a restore — protects against stale priorLen post-rollback', () => {
+     // This scenario was the real reason we also added lastToolSummaryTextLen to
+     // ChatTurnState snapshot/restore. Here we just verify the math: if priorLen
+     // is ahead of currentLen (e.g., rollback happened without syncing the
+     // counter), the delta is negative, so the guard correctly skips retry.
+     assert.equal(toolSummaryHasMeaningfulProgress(500, 200), false)
+   })
+ })
@@ -0,0 +1,13 @@
+ /** Minimum new-character delta required between tool_summary retries. */
+ export const TOOL_SUMMARY_PROGRESS_MIN_DELTA = 30
+
+ /**
+  * Returns false when a prior tool_summary retry already ran and the model
+  * has produced essentially no additional text on the follow-up turn — the
+  * signal to stop retrying. On the first pass (priorLen < 0) this is always
+  * true so the retry can happen at least once.
+  */
+ export function toolSummaryHasMeaningfulProgress(priorLen: number, currentLen: number): boolean {
+   if (priorLen < 0) return true
+   return currentLen - priorLen >= TOOL_SUMMARY_PROGRESS_MIN_DELTA
+ }
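To show how the guard above could gate a retry loop, here is a minimal sketch. The constant and the predicate are copied from the new file; the driver `runToolSummaryRetries`, its `continueTurn` callback, and the `maxRetries` parameter are hypothetical stand-ins, since the actual chat loop is not part of this diff.

```typescript
const TOOL_SUMMARY_PROGRESS_MIN_DELTA = 30

function toolSummaryHasMeaningfulProgress(priorLen: number, currentLen: number): boolean {
  if (priorLen < 0) return true
  return currentLen - priorLen >= TOOL_SUMMARY_PROGRESS_MIN_DELTA
}

// Hypothetical driver: `continueTurn` stands in for one tool_summary
// continuation and returns whatever text the model appended.
function runToolSummaryRetries(
  initialText: string,
  continueTurn: (attempt: number) => string,
  maxRetries: number = 3,
): { text: string; retries: number } {
  let fullText = initialText
  let lastToolSummaryTextLen = -1 // sentinel: no retry has run yet
  let retries = 0

  while (retries < maxRetries) {
    // Stop as soon as the previous retry added fewer than MIN_DELTA characters.
    if (!toolSummaryHasMeaningfulProgress(lastToolSummaryTextLen, fullText.length)) break
    lastToolSummaryTextLen = fullText.length
    fullText += continueTurn(retries)
    retries += 1
  }
  return { text: fullText, retries }
}
```

With a model that keeps answering with a five-character sentence, this loop runs one retry and then stops. With a model that adds 40 characters per turn, it uses all three retries.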
@@ -51,6 +51,9 @@ function buildGatewayConfigFromAgent(agent: Agent, options?: { allowDisabled?: b
    const route = resolvePrimaryAgentRoute(agent)
    if (route?.provider !== 'openclaw') return null

+   const credential = resolveTokenForCredential(route.credentialId)
+   if (credential.dangling) return null
+
    const routeProfile = route.gatewayProfileId ? getGatewayProfile(route.gatewayProfileId) : null
    return {
      key: route.gatewayProfileId ? `profile:${route.gatewayProfileId}` : `agent:${agent.id}`,
@@ -60,7 +63,7 @@ function buildGatewayConfigFromAgent(agent: Agent, options?: { allowDisabled?: b
        : route.apiEndpoint
          ? deriveOpenClawWsUrl(route.apiEndpoint)
          : DEFAULT_OPENCLAW_WS_URL,
-     token: resolveTokenForCredential(route.credentialId),
+     token: credential.token,
    }
  }

@@ -80,15 +83,28 @@ function normalizeWsUrl(raw: string): string {
    return url.replace(/^http:/i, 'ws:').replace(/^https:/i, 'wss:')
  }

- function resolveTokenForCredential(credentialId?: string | null): string | undefined {
-   if (!credentialId) return undefined
+ /**
+  * Resolves an OpenClaw gateway credential.
+  *
+  * - `credentialId` null/undefined: the caller wants an unauthenticated
+  *   connection. Returns `{ token: undefined, dangling: false }`.
+  * - `credentialId` set and resolvable: returns `{ token, dangling: false }`.
+  * - `credentialId` set but missing/deleted: returns `{ token: undefined,
+  *   dangling: true }`. Callers should treat this as a hard configuration
+  *   error and refuse to dial — otherwise the WS handshake fails as
+  *   `unauthorized: gateway token missing` and the agent-side timeout waits
+  *   the full 120 s before surfacing the error to the user.
+  */
+ function resolveTokenForCredential(credentialId?: string | null): { token: string | undefined; dangling: boolean } {
+   if (!credentialId) return { token: undefined, dangling: false }
    const secret = resolveCredentialSecret(credentialId)
    if (!secret) {
-     log.warn(TAG, `Credential "${credentialId}" is referenced but could not be resolved — gateway connection will lack a token`, {
+     log.error(TAG, `Credential "${credentialId}" is referenced but missing from the credential store refusing gateway connection`, {
        credentialId,
      })
+     return { token: undefined, dangling: true }
    }
-   return secret || undefined
+   return { token: secret, dangling: false }
  }
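The three outcomes described in the doc comment above can be exercised in isolation. The sketch below is an assumption-laden stand-in: `resolveCredential`, `buildConfig`, the `Map`-backed store, and the example URL are all hypothetical, replacing the package's `resolveCredentialSecret` lookup and config builders, which are not shown in full here.

```typescript
type ResolvedCredential = { token: string | undefined; dangling: boolean }
type GatewayConfig = { wsUrl: string; token: string | undefined }

// Hypothetical in-memory store standing in for the real credential lookup.
function resolveCredential(
  store: Map<string, string>,
  credentialId?: string | null,
): ResolvedCredential {
  // No credential configured: an unauthenticated connection is intended.
  if (!credentialId) return { token: undefined, dangling: false }
  const secret = store.get(credentialId)
  // Configured but unresolvable: flag it so callers can refuse to dial.
  if (!secret) return { token: undefined, dangling: true }
  return { token: secret, dangling: false }
}

// Mirrors the caller pattern in the diff: bail out before building a config.
function buildConfig(
  store: Map<string, string>,
  wsUrl: string,
  credentialId?: string | null,
): GatewayConfig | null {
  const credential = resolveCredential(store, credentialId)
  if (credential.dangling) return null
  return { wsUrl, token: credential.token }
}
```

Returning an object rather than `string | undefined` is what lets the caller tell "no credential wanted" apart from "credential wanted but gone"; with the old return type both cases collapsed into `undefined`.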

  export function resolveGatewayConfig(target?: {
@@ -99,11 +115,13 @@ export function resolveGatewayConfig(target?: {
    if (profileId) {
      const profile = getGatewayProfile(profileId)
      if (!profile) return null
+     const credential = resolveTokenForCredential(profile.credentialId)
+     if (credential.dangling) return null
      return {
        key: `profile:${profile.id}`,
        profileId: profile.id,
        wsUrl: profile.wsUrl ? normalizeWsUrl(profile.wsUrl) : deriveOpenClawWsUrl(profile.endpoint),
-       token: resolveTokenForCredential(profile.credentialId),
+       token: credential.token,
      }
    }

@@ -123,11 +141,13 @@ export function resolveGatewayConfig(target?: {
    const gatewayProfiles = getGatewayProfiles('openclaw')
    if (gatewayProfiles[0]) {
      const profile = gatewayProfiles[0]
+     const credential = resolveTokenForCredential(profile.credentialId)
+     if (credential.dangling) return null
      return {
        key: `profile:${profile.id}`,
        profileId: profile.id,
        wsUrl: profile.wsUrl ? normalizeWsUrl(profile.wsUrl) : deriveOpenClawWsUrl(profile.endpoint),
-       token: resolveTokenForCredential(profile.credentialId),
+       token: credential.token,
      }
    }
    return null
@@ -964,6 +964,7 @@ function scheduleRetryOrDeadLetter(task: BoardTask, reason: string): 'retry' | '
    if (isCancelledTask(task)) {
      task.retryScheduledAt = null
      task.deadLetteredAt = null
+     task.checkoutRunId = null
      task.updatedAt = Date.now()
      return 'dead_lettered'
    }
@@ -975,6 +976,10 @@ function scheduleRetryOrDeadLetter(task: BoardTask, reason: string): 'retry' | '
      const delayMs = jitteredBackoff((task.retryBackoffSec || 30) * 1000, Math.max(0, (task.attempts || 1) - 1), 6 * 3600_000)
      task.status = 'queued'
      task.retryScheduledAt = now + delayMs
+     // Release the prior checkout so the task can be checked out again on retry.
+     // Without this, checkoutTask() returns null every attempt and the orphan-
+     // recovery loop burns CPU re-queueing a task that can never run.
+     task.checkoutRunId = null
      task.updatedAt = now
      task.error = `Retry scheduled after failure: ${reason}`.slice(0, 500)
      if (!task.comments) task.comments = []
@@ -990,6 +995,7 @@ function scheduleRetryOrDeadLetter(task: BoardTask, reason: string): 'retry' | '
    task.status = 'failed'
    task.deadLetteredAt = now
    task.retryScheduledAt = null
+   task.checkoutRunId = null
    task.updatedAt = now
    task.error = `Dead-lettered after ${task.attempts}/${task.maxAttempts} attempts: ${reason}`.slice(0, 500)
    if (!task.comments) task.comments = []
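The interaction between a lingering `checkoutRunId` and the checkout check is easier to see on a stripped-down model. The sketch below is illustrative only: the `Task` shape, `checkoutStrict`, `checkoutReclaiming`, and `retryOrDeadLetter` are hypothetical simplifications that leave out storage, backoff, and comments, and the place where `attempts` is incremented is an assumption chosen to match the test expectations in this diff.

```typescript
type Task = {
  id: string
  status: 'queued' | 'running' | 'failed'
  checkoutRunId: string | null
  attempts: number
  maxAttempts: number
}

// Pre-fix behaviour, kept for contrast: any lingering id blocks checkout.
function checkoutStrict(task: Task, runId: string): boolean {
  if (task.status !== 'queued') return false
  if (task.checkoutRunId) return false
  task.status = 'running'
  task.checkoutRunId = runId
  return true
}

// Post-fix behaviour: a queued task cannot hold a live checkout,
// so a leftover id is treated as stale and overwritten.
function checkoutReclaiming(task: Task, runId: string): boolean {
  if (task.status !== 'queued') return false
  task.status = 'running'
  task.checkoutRunId = runId
  return true
}

// Simplified retry/dead-letter step: both branches release the checkout.
function retryOrDeadLetter(task: Task): 'retry' | 'dead_lettered' {
  task.attempts += 1
  task.checkoutRunId = null
  if (task.attempts < task.maxAttempts) {
    task.status = 'queued'
    return 'retry'
  }
  task.status = 'failed'
  return 'dead_lettered'
}
```

In this model either fix alone is enough to break the spin: clearing the id on retry keeps the strict check satisfiable, and the reclaiming check tolerates an id that survived an ungraceful exit. The release applies both.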
@@ -248,6 +248,121 @@ describe('queue recovery', () => {
      assert.equal(output.scheduledCalls, 1)
    })

+   it('scheduleRetryOrDeadLetter via stall recovery clears checkoutRunId so the next attempt can check out', () => {
+     // Regression: a task transitioning running -> queued on retry must release its
+     // prior checkout. Without this, checkoutTask() returns null on every attempt
+     // and the orphan-recovery loop burns CPU re-queueing a task that can never run.
+     const output = runWithTempDataDir<{
+       status: string | null
+       checkoutRunId: string | null
+       queued: string[]
+       attempts: number | null
+     }>(`
+       const storageMod = await import('@/lib/server/storage')
+       const queueMod = await import('@/lib/server/runtime/queue')
+       const storage = storageMod.default || storageMod
+       const queue = queueMod.default || queueMod
+
+       const now = Date.now()
+       storage.saveSettings({
+         ...storage.loadSettings(),
+         taskStallTimeoutMin: 5,
+         taskRetryBackoffSec: 30,
+       })
+       storage.saveTasks({
+         stuck: {
+           id: 'stuck',
+           title: 'Stuck with stale checkout',
+           description: 'Running task that stalled and must release its checkout on retry',
+           status: 'running',
+           agentId: 'agent-a',
+           startedAt: now - 600_000,
+           updatedAt: now - 600_000,
+           createdAt: now - 700_000,
+           maxAttempts: 3,
+           attempts: 0,
+           checkoutRunId: 'stale-run-id',
+         },
+       })
+       storage.saveQueue([])
+
+       const originalSetTimeout = globalThis.setTimeout
+       globalThis.setTimeout = () => 0
+       try {
+         queue.recoverStalledRunningTasks()
+       } finally {
+         globalThis.setTimeout = originalSetTimeout
+       }
+
+       const task = storage.loadTasks().stuck
+       console.log(JSON.stringify({
+         status: task?.status ?? null,
+         checkoutRunId: task?.checkoutRunId ?? null,
+         queued: storage.loadQueue(),
+         attempts: task?.attempts ?? null,
+       }))
+     `)
+
+     assert.equal(output.status, 'queued', 'task should be requeued for retry')
+     assert.equal(output.checkoutRunId, null, 'stale checkoutRunId must be released so retry can check out')
+     assert.deepEqual(output.queued, ['stuck'])
+     assert.equal(output.attempts, 1)
+   })
+
+   it('dead-letter path clears checkoutRunId so terminal tasks do not appear checked-out', () => {
+     const output = runWithTempDataDir<{
+       status: string | null
+       checkoutRunId: string | null
+       attempts: number | null
+     }>(`
+       const storageMod = await import('@/lib/server/storage')
+       const queueMod = await import('@/lib/server/runtime/queue')
+       const storage = storageMod.default || storageMod
+       const queue = queueMod.default || queueMod
+
+       const now = Date.now()
+       storage.saveSettings({
+         ...storage.loadSettings(),
+         taskStallTimeoutMin: 5,
+       })
+       storage.saveTasks({
+         doomed: {
+           id: 'doomed',
+           title: 'Exhausted retries',
+           description: 'Task at its last attempt that stalls should dead-letter and release checkout',
+           status: 'running',
+           agentId: 'agent-a',
+           startedAt: now - 600_000,
+           updatedAt: now - 600_000,
+           createdAt: now - 700_000,
+           maxAttempts: 2,
+           attempts: 1,
+           checkoutRunId: 'stale-run-id',
+         },
+       })
+       storage.saveQueue([])
+
+       const originalSetTimeout = globalThis.setTimeout
+       globalThis.setTimeout = () => 0
+       try {
+         queue.recoverStalledRunningTasks()
+       } finally {
+         globalThis.setTimeout = originalSetTimeout
+       }
+
+       const task = storage.loadTasks().doomed
+       console.log(JSON.stringify({
+         status: task?.status ?? null,
+         checkoutRunId: task?.checkoutRunId ?? null,
+         attempts: task?.attempts ?? null,
+       }))
+     `)
+
+     assert.equal(output.status, 'failed', 'task should be dead-lettered after exhausting retries')
+     assert.equal(output.checkoutRunId, null, 'dead-lettered tasks must not retain a stale checkoutRunId')
+     assert.equal(output.attempts, 2)
+   })
+
    it('resumeQueue restores blocked queued tasks without clobbering their queuedAt timestamp', () => {
      const output = runWithTempDataDir<{
        queued: string[]
@@ -19,7 +19,13 @@ export function checkoutTask(
    const tasks = loadTasks() as Record<string, BoardTask>
    const task = tasks[taskId]
    if (!task || task.status !== 'queued') return null
-   if (task.checkoutRunId) return null // already checked out
+   // A stale checkoutRunId can survive an ungraceful server exit (crash,
+   // SIGKILL, HMR reload mid-turn). If status is 'queued', the runId cannot
+   // reference a live checkout — only running tasks hold active checkouts —
+   // so treat the lingering id as stale and reclaim it. Previously this
+   // returned null forever, so the dispatch → orphan-recovery → failed-
+   // checkout cycle spammed "Recovering orphaned queued task" every ~2 ms
+   // (21 k log lines in a single session).

    const now = Date.now()
    task.status = 'running'