@link-assistant/hive-mind 2.0.1 → 2.0.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/CHANGELOG.md CHANGED
@@ -1,5 +1,42 @@
1
1
  # @link-assistant/hive-mind
2
2
 
3
+ ## 2.0.2
4
+
5
+ ### Patch Changes
6
+
7
+ - 19aea85: fix(retry): auto-resume on "Stream idle timeout - partial response received" (#1937)
8
+
9
+ A long-running solve session (391 turns, ~$34.11) had its streaming response
10
+ stall mid-answer. The Claude CLI surfaced it as a `result` event with
11
+ `is_error: true`, `subtype: "success"`, and:
12
+
13
+ ```
14
+ API Error: Stream idle timeout - partial response received
15
+ ```
16
+
17
+ Instead of retrying with the session preserved, the harness fell straight
18
+ through to the generic failure path and exited with code 1 after **zero
19
+ retries** — abandoning the whole session even though it had a valid session ID
20
+ and printed the exact `--resume` command needed to continue.
21
+
22
+ Root cause: the shared retry classifier `classifyRetryableError()`
23
+ (`src/tool-retry.lib.mjs`) had no branch for the stream-idle-timeout family, so
24
+ `isRetryable` was false, `isTransientError` evaluated to false, and the unified
25
+ exponential-backoff retry block was never entered.
26
+
27
+ This error is a transient transport-level stall (a slow/stuck server-sent-events
28
+ socket), not a request-content rejection — the on-disk session transcript stays
29
+ valid, which is why a manual `--resume` works. The fix adds one branch to
30
+ `classifyRetryableError()` returning
31
+ `{ isRetryable: true, isCapacity: false, label: 'Stream idle timeout (partial response)' }`,
32
+ so the existing retry block resumes the session with the same context after an
33
+ exponential backoff. Because the classifier is shared, this fixes the behaviour
34
+ for **all** tools (claude/codex/gemini/opencode/qwen/agent) at once.
35
+
36
+ Added `tests/test-issue-1937-stream-idle-timeout-retry.mjs` (17 assertions) and a
37
+ full case study with timeline, root-cause analysis, upstream references, and the
38
+ captured logs under `docs/case-studies/issue-1937`.
39
+
3
40
  ## 2.0.1
4
41
 
5
42
  ### Patch Changes
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "@link-assistant/hive-mind",
3
- "version": "2.0.1",
3
+ "version": "2.0.2",
4
4
  "description": "AI-powered issue solver and hive mind for collaborative problem solving",
5
5
  "main": "src/hive.mjs",
6
6
  "type": "module",
@@ -43,6 +43,22 @@ export const classifyRetryableError = value => {
43
43
  return { message, isRetryable: true, isCapacity: false, label: 'Stream disconnected before completion' };
44
44
  }
45
45
 
46
+ // Issue #1937: Stream idle timeout. When the Anthropic streaming response stalls
47
+ // (no bytes for the SDK's idle window) after the model has already emitted part of
48
+ // its answer, the Claude CLI aborts the turn and surfaces a synthetic assistant /
49
+ // result message:
50
+ // "API Error: Stream idle timeout - partial response received"
51
+ // This is a transient network/streaming stall (a slow or stuck server-sent-events
52
+ // socket), not a request-content error, so the session is still valid and safe to
53
+ // resume. Before this branch classifyRetryableError() did not recognise it, so
54
+ // isRetryable was false and the whole solve session aborted with exit code 1 even
55
+ // though `--resume <sessionId>` could continue with the same context. Switching
56
+ // models does not help (the stall is in the response stream, not model capacity),
57
+ // so isCapacity is false → retry with the session preserved after a backoff.
58
+ if (lower.includes('stream idle timeout') || (lower.includes('idle timeout') && lower.includes('partial response'))) {
59
+ return { message, isRetryable: true, isCapacity: false, label: 'Stream idle timeout (partial response)' };
60
+ }
61
+
46
62
  // Issue #1881: Transient socket / network disconnects from the SDK's underlying fetch.
47
63
  // When the HTTP(S)/streaming socket drops mid-request, the Claude/Codex CLI surfaces a
48
64
  // synthetic assistant message such as: