alvin-bot 4.12.3 → 4.12.4
package/CHANGELOG.md
CHANGED
@@ -2,6 +2,55 @@
 
 All notable changes to Alvin Bot are documented here.
 
+## [4.12.4] — 2026-04-16
+
+### 🐛 Patch: recover partial output from interrupted background sub-agents
+
+**The bug Ali saw:** Two Telegram messages appeared hours apart: `⏱️ Background agent a5bf8c74 timeout · 720m 3s · 0 in / 0 out` and `... ab9372d4 timeout · 720m 1s · 0 in / 0 out`, both with `(empty output)`. Three more agents were still pending, all interrupted mid-execution with hundreds of KB of real work sitting on disk.
+
+**Root cause:** v4.12.3's bypass-abort calls `session.abortController.abort()`, which propagates through `claude-sdk-provider.ts`'s `internalAbortController` into the SDK's CLI subprocess, which in turn propagates into any in-flight `Agent(run_in_background: true)` tool executions. Evidence from the disk:
+
+- `agent-a03ce829...jsonl`: 116 lines, last event = literally `"[Request interrupted by user for tool use]"` mid-Bash-tool-use
+- `agent-af61fa6e...jsonl`: 81 lines, last assistant text = `"Ich habe jetzt genug Daten für den vollständigen Audit. Hier ist der Report:"` — interrupted while streaming the final report
+- `agent-ac47c4a2...jsonl`: 131 lines, last assistant text = `"## Perseus Audit — Ergebnis\n### Kritische Bugs"` — interrupted a few words into the payoff
+
+None of them reached `stop_reason: "end_turn"`. The pre-v4.12.4 `parseOutputFileStatus` only recognized `end_turn` as a completion signal, so these agents sat in the pending list for 12h until `giveUpAt` elapsed, then got delivered as `(empty output)` while their real work was still on disk.
+
+**The fix:** `parseOutputFileStatus` now has a staleness fallback. When no `end_turn` is present BUT the outputFile hasn't been written to in `stalenessMs` (default 5 min, configurable via `ALVIN_SUBAGENT_STALENESS_MS`) AND there is usable assistant text content in the tail, the parser:
+
+1. Aggregates ALL text blocks across all assistant turns in the tail (not just the last one — bias toward delivering more context)
+2. Prepends a clear banner: `⚠️ _Sub-Agent wurde unterbrochen — hier ist der partielle Output:_`
+3. Returns `state: "completed"` so the watcher delivers it instead of continuing to poll
+
+Result: on the next `pollOnce()` after v4.12.4 ships, the three stuck agents get delivered with their real partial output (combined ~1.2MB of text across the three). Future interrupts recover within 5 minutes instead of hanging 12 hours.
+
+### Behavioral notes
+
+- **Clean `end_turn` sub-agents are unchanged** — the staleness fallback is a *fallback only*. The existing strict path runs first and takes precedence.
+- **`stalenessMs: 0` disables the fallback entirely** — strict end_turn-only mode for callers that prefer it.
+- **Thinking blocks are still filtered out** of the partial delivery — same as with clean completion.
+- **Files with no assistant text at all** (only tool_use) stay in `running` state — nothing useful to deliver.
+- **Tokens are surfaced when available** — the last assistant event's `usage.input_tokens`/`output_tokens` flow through to the delivery banner.
+
+### Known limitations (carried over from v4.12.3, deferred to v4.13)
+
+- The bypass-abort mechanism in `message.ts` still propagates to the SDK subprocess and kills in-flight sub-agents. v4.12.4 works around this at the delivery layer (recovering partial output); a true fix requires either architectural replacement of the SDK's `Task` tool with our own detached-subprocess dispatch, or SDK support for per-task-branch abort signals. Tracked for v4.13.
+- Users may still experience the bot's "typing…" indicator when Claude is thinking in the main turn (before dispatching any background agent). Bypass only fires once `pendingBackgroundCount > 0`. For interrupt before dispatch, use `/cancel`.
+
+### Testing
+
+- **Baseline**: 436 tests (v4.12.3)
+- **New**: `test/async-agent-parser-staleness.test.ts` — 11 tests covering: clean `end_turn` still wins over staleness, fresh-interrupted file stays running, stale-interrupted file delivers partial with banner, no-text file stays running, `stalenessMs: 0` disables, aggregation across multiple turns, thinking-block filtering, token extraction, interrupt-only file with no useful content, and ordering preservation.
+- **Total**: 447 tests, all green, TSC clean.
+
+### Files changed
+
+- **Modified**: `src/services/async-agent-parser.ts` — staleness fallback in `parseOutputFileStatus`, `DEFAULT_STALENESS_MS` constant, `INTERRUPTED_BANNER` prefix.
+- **NEW tests**: `test/async-agent-parser-staleness.test.ts`.
+- **Version**: `package.json` 4.12.3 → 4.12.4.
+
+---
+
 ## [4.12.3] — 2026-04-15
 
 ### 🐛 Patch: Background sub-agent no longer blocks the main Telegram session
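The decision order described in the 4.12.4 changelog entry (strict `end_turn` check first, then the staleness fallback) can be sketched as a pure function. This is a simplified illustration under stated assumptions, not the package's implementation: `classifyTail`, `assistantTexts`, the `TailStatus` type, and the `partial` flag are hypothetical names, the banner is passed in as a parameter, and file I/O is left out so that only the decision logic remains. What the strict path delivers on a clean finish is also an assumption here (the text of the final event).

```typescript
// Hypothetical, simplified result type; the real parser returns other fields.
type TailStatus =
  | { state: "running" }
  | { state: "completed"; output: string; partial: boolean };

// Return the "text" blocks of one assistant event. Thinking and tool_use
// blocks are dropped because only type === "text" is accepted.
function assistantTexts(event: any): string[] {
  if (event?.type !== "assistant" || !Array.isArray(event.message?.content)) {
    return [];
  }
  const texts: string[] = [];
  for (const block of event.message.content) {
    if (block?.type === "text" && typeof block.text === "string") {
      texts.push(block.text);
    }
  }
  return texts;
}

function classifyTail(
  jsonlLines: string[],
  mtimeMs: number,
  nowMs: number,
  stalenessMs: number,
  banner: string,
): TailStatus {
  const events: any[] = [];
  for (const line of jsonlLines) {
    try {
      events.push(JSON.parse(line));
    } catch {
      // Skip truncated or malformed lines.
    }
  }

  // Strict path first: a clean end_turn always wins over staleness.
  // Assumption: the final event's text is what gets delivered.
  for (let i = events.length - 1; i >= 0; i--) {
    const event = events[i];
    if (event?.type === "assistant" && event.message?.stop_reason === "end_turn") {
      return { state: "completed", output: assistantTexts(event).join("\n\n"), partial: false };
    }
  }

  // Fallback disabled (0) or file still fresh: keep polling.
  if (stalenessMs <= 0 || nowMs - mtimeMs < stalenessMs) {
    return { state: "running" };
  }

  // Stale without end_turn: aggregate all assistant text in file order.
  const fragments: string[] = [];
  for (const event of events) {
    fragments.push(...assistantTexts(event));
  }
  const aggregated = fragments.join("\n\n").trim();
  return aggregated.length > 0
    ? { state: "completed", output: banner + aggregated, partial: true }
    : { state: "running" };
}
```

Passing `nowMs` and `mtimeMs` as arguments keeps the sketch deterministic; a caller would typically supply `Date.now()` and the `mtimeMs` of the output file.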
@@ -68,14 +68,45 @@ export function parseAsyncLaunchedToolResult(raw) {
     return { agentId, outputFile };
 }
 const DEFAULT_TAIL_BYTES = 64 * 1024;
+/**
+ * v4.12.4 — Default staleness window for partial-output delivery.
+ *
+ * If an outputFile has not been written to for at least this long AND
+ * there is usable assistant text content in it, treat it as "completed
+ * with partial output" rather than leaving it to time out at 12h with
+ * an empty banner. 5 minutes is a balance between:
+ *   - Fast enough to unblock interrupted agents (most useful work is
+ *     done within a few minutes)
+ *   - Slow enough to avoid false-positives on slow-but-alive agents
+ *     (typical tool_use gaps are under 30s)
+ *
+ * Override per call via opts.stalenessMs, or globally via the
+ * ALVIN_SUBAGENT_STALENESS_MS env var. `0` disables the fallback
+ * entirely (strict end_turn-only completion detection).
+ */
+const DEFAULT_STALENESS_MS = Number(process.env.ALVIN_SUBAGENT_STALENESS_MS) || 5 * 60 * 1000;
+/**
+ * Banner prepended to partial-output deliveries so the user knows the
+ * sub-agent was interrupted and this isn't a clean completion.
+ */
+const INTERRUPTED_BANNER = "⚠️ _Sub-Agent wurde unterbrochen — hier ist der partielle Output:_\n\n";
 /**
  * Read the tail of an SDK background-agent outputFile and decide what
  * state the sub-agent is in. See spec doc for the JSONL format. We only
  * read the last `maxTailBytes` of the file because long-running agents
  * (SEO audits etc.) can produce hundreds of KB of intermediate JSONL.
+ *
+ * v4.12.4 adds staleness-based partial-output delivery. When no
+ * `end_turn` marker is present, the parser checks file mtime: if the
+ * file hasn't grown in `stalenessMs` AND there is text content in the
+ * assistant turns, aggregate the text across all turns (not just the
+ * last), prepend an "interrupted" banner, and return "completed". This
+ * recovers real work from agents killed mid-execution (e.g. by the
+ * v4.12.3 bypass abort propagating through the SDK subprocess).
  */
 export async function parseOutputFileStatus(path, opts = {}) {
     const maxTailBytes = opts.maxTailBytes ?? DEFAULT_TAIL_BYTES;
+    const stalenessMs = opts.stalenessMs ?? DEFAULT_STALENESS_MS;
     let stat;
     try {
         stat = await fs.stat(path);
@@ -147,6 +178,50 @@ export async function parseOutputFileStatus(path, opts = {}) {
             };
         }
     }
-    // No
+    // v4.12.4 — No clean end_turn. Check for staleness + partial text.
+    if (stalenessMs > 0) {
+        const ageMs = Date.now() - stat.mtimeMs;
+        if (ageMs >= stalenessMs) {
+            // Aggregate ALL assistant text blocks across the tail, in order.
+            // We parse forward now (not backward like the end_turn scan) so
+            // the delivered text preserves the natural reading order.
+            const textFragments = [];
+            let lastUsage;
+            for (const line of usable) {
+                let parsed;
+                try {
+                    parsed = JSON.parse(line);
+                }
+                catch {
+                    continue;
+                }
+                if (parsed.type === "assistant" &&
+                    Array.isArray(parsed.message?.content)) {
+                    for (const c of parsed.message.content) {
+                        if (c?.type === "text" && typeof c.text === "string") {
+                            textFragments.push(c.text);
+                        }
+                    }
+                    if (parsed.message?.usage) {
+                        lastUsage = {
+                            input: parsed.message.usage.input_tokens ?? 0,
+                            output: parsed.message.usage.output_tokens ?? 0,
+                        };
+                    }
+                }
+            }
+            if (textFragments.length > 0) {
+                const aggregated = textFragments.join("\n\n").trim();
+                if (aggregated.length > 0) {
+                    return {
+                        state: "completed",
+                        output: INTERRUPTED_BANNER + aggregated,
+                        tokensUsed: lastUsage,
+                    };
+                }
+            }
+        }
+    }
+    // No completion marker found and not stale (or no text) — still running.
     return { state: "running", size: stat.size };
 }
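The doc comment on `DEFAULT_STALENESS_MS` describes two override levels (per call via `opts.stalenessMs`, globally via `ALVIN_SUBAGENT_STALENESS_MS`) and states that `0` disables the fallback. The following is a minimal sketch of one way to resolve that precedence, assuming a hypothetical helper `resolveStalenessMs` that is not part of the package and that receives the raw environment string as an argument instead of reading `process.env`. It treats an explicit `"0"` as a valid value; by contrast, an expression of the form `Number(value) || fallback` substitutes the fallback for `0`, because `0` is falsy in JavaScript.

```typescript
const FIVE_MINUTES_MS = 5 * 60 * 1000;

// Hypothetical helper: the per-call option wins, then the env value,
// then the 5-minute default named in the changelog.
function resolveStalenessMs(
  optValue: number | undefined,
  envValue: string | undefined,
): number {
  if (optValue !== undefined) {
    return optValue;
  }
  if (envValue !== undefined && envValue.trim() !== "") {
    const parsed = Number(envValue);
    // Accept an explicit 0 (fallback disabled); reject NaN, Infinity, negatives.
    if (Number.isFinite(parsed) && parsed >= 0) {
      return parsed;
    }
  }
  return FIVE_MINUTES_MS;
}
```

A caller that passes no option could use `resolveStalenessMs(undefined, process.env.ALVIN_SUBAGENT_STALENESS_MS)` to pick up the global setting.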
package/package.json
CHANGED
@@ -0,0 +1,412 @@
+/**
+ * v4.12.4 — parseOutputFileStatus staleness detection.
+ *
+ * Problem this fixes: when a background sub-agent is interrupted (e.g. by
+ * v4.12.3's bypass-abort propagating through the SDK subprocess), its
+ * outputFile is left with partial JSONL — real work, real text — but
+ * without the `stop_reason: "end_turn"` marker the pre-v4.12.4 parser
+ * required for "completed" state.
+ *
+ * Real-world evidence (2026-04-16):
+ *   - Three agents (a03ce829, af61fa6e, ac47c4a2) pending in state file
+ *   - Each outputFile has 81-131 lines of REAL work (WebSearch, tool_use,
+ *     partial reports like "Here's the summary:\n\n## Critical Bugs")
+ *   - Last event is either "[Request interrupted by user for tool use]"
+ *     or a mid-streaming assistant text that never got end_turn
+ *   - Watcher polls forever, hits 12h giveUpAt, delivers "empty output"
+ *   - User sees useless "720m timeout · 0 in / 0 out · (empty output)"
+ *     messages hours later, while the actual work is sitting on disk
+ *
+ * Fix behavior:
+ *   - If no end_turn is found, check mtime/size of the file
+ *   - If file hasn't been touched for `stalenessMs` (default 5 min) AND
+ *     there's usable text content in the tail, mark as "completed"
+ *     with the partial output PREFIXED by an "⚠️ interrupted, partial
+ *     output" header so the user knows it's not a clean finish
+ *   - If file IS fresh or has no text content, stay in "running" state
+ *     (normal polling continues)
+ *
+ * This deliberately biases toward delivering SOMETHING rather than
+ * nothing. Worst case: an agent that's still alive but genuinely idle
+ * for >5 min gets its partial text delivered early. Best case: dozens
+ * of stuck interrupted agents get their real work back to the user.
+ */
+import { describe, it, expect, beforeEach, afterEach } from "vitest";
+import fs from "fs";
+import os from "os";
+import { resolve } from "path";
+import { parseOutputFileStatus } from "../src/services/async-agent-parser.js";
+
+const TMP_BASE = resolve(os.tmpdir(), `alvin-parser-stale-${process.pid}`);
+
+beforeEach(() => {
+  fs.mkdirSync(TMP_BASE, { recursive: true });
+});
+afterEach(() => {
+  try {
+    fs.rmSync(TMP_BASE, { recursive: true, force: true });
+  } catch {
+    /* ignore */
+  }
+});
+
+/**
+ * Write a JSONL file with a mid-execution interrupted state. No end_turn,
+ * but contains real assistant text + tool calls. Last line is the
+ * "Request interrupted" marker.
+ */
+function writeInterruptedJsonl(name: string): string {
+  const path = resolve(TMP_BASE, name);
+  const lines = [
+    JSON.stringify({
+      type: "user",
+      isSidechain: true,
+      agentId: "x",
+      message: { role: "user", content: "do a report" },
+    }),
+    JSON.stringify({
+      type: "assistant",
+      isSidechain: true,
+      agentId: "x",
+      message: {
+        role: "assistant",
+        content: [{ type: "text", text: "Starting research..." }],
+        stop_reason: "tool_use",
+      },
+    }),
+    JSON.stringify({
+      type: "assistant",
+      isSidechain: true,
+      agentId: "x",
+      message: {
+        role: "assistant",
+        content: [
+          {
+            type: "text",
+            text:
+              "Here's what I found:\n\n## Key Findings\n- Finding A\n- Finding B\n- Finding C",
+          },
+        ],
+        stop_reason: "tool_use",
+      },
+    }),
+    JSON.stringify({
+      type: "user",
+      isSidechain: true,
+      agentId: "x",
+      message: {
+        role: "user",
+        content: [
+          {
+            type: "tool_result",
+            content: "[Request interrupted by user for tool use]",
+          },
+        ],
+      },
+    }),
+  ];
+  fs.writeFileSync(path, lines.join("\n") + "\n", "utf-8");
+  return path;
+}
+
+/** Set file mtime to N ms in the past. */
+function setStale(path: string, ageMs: number): void {
+  const target = Date.now() - ageMs;
+  fs.utimesSync(path, target / 1000, target / 1000);
+}
+
+describe("parseOutputFileStatus — staleness detection (v4.12.4)", () => {
+  it("still returns 'completed' when end_turn is present (staleness is a fallback only)", async () => {
+    const path = resolve(TMP_BASE, "complete.jsonl");
+    fs.writeFileSync(
+      path,
+      JSON.stringify({
+        type: "assistant",
+        agentId: "x",
+        message: {
+          content: [{ type: "text", text: "clean end" }],
+          stop_reason: "end_turn",
+        },
+      }) + "\n",
+      "utf-8",
+    );
+    setStale(path, 3600_000); // 1h old
+    const status = await parseOutputFileStatus(path, {
+      stalenessMs: 300_000,
+    });
+    expect(status.state).toBe("completed");
+    if (status.state === "completed") {
+      expect(status.output).toContain("clean end");
+      // No interrupted banner for clean end_turn
+      expect(status.output).not.toMatch(/interrupt/i);
+    }
+  });
+
+  it("returns 'running' when file is fresh and no end_turn (normal polling)", async () => {
+    const path = writeInterruptedJsonl("fresh-interrupted.jsonl");
+    // File is fresh (just written)
+    const status = await parseOutputFileStatus(path, {
+      stalenessMs: 300_000,
+    });
+    expect(status.state).toBe("running");
+  });
+
+  it("returns 'completed' (partial) when file is stale AND has text content", async () => {
+    const path = writeInterruptedJsonl("stale-interrupted.jsonl");
+    setStale(path, 600_000); // 10 min old
+    const status = await parseOutputFileStatus(path, {
+      stalenessMs: 300_000, // 5 min threshold
+    });
+    expect(status.state).toBe("completed");
+    if (status.state === "completed") {
+      // Should contain the real report content
+      expect(status.output).toContain("Key Findings");
+      expect(status.output).toContain("Finding A");
+      // Should be prefixed with an interrupted banner so user knows
+      // (German "unterbrochen" / "partielle" OR English "interrupted"/"partial")
+      expect(status.output).toMatch(/interrupt|partial|unterbroch|partiell|⚠️/i);
+    }
+  });
+
+  it("returns 'running' when file is stale but has NO text content (nothing to deliver)", async () => {
+    // Only tool-use events, no text. Delivery would be useless.
+    const path = resolve(TMP_BASE, "no-text.jsonl");
+    fs.writeFileSync(
+      path,
+      [
+        JSON.stringify({
+          type: "user",
+          agentId: "x",
+          message: { role: "user", content: "go" },
+        }),
+        JSON.stringify({
+          type: "assistant",
+          agentId: "x",
+          message: {
+            content: [
+              { type: "tool_use", name: "Bash", input: { command: "ls" } },
+            ],
+            stop_reason: "tool_use",
+          },
+        }),
+      ].join("\n") + "\n",
+      "utf-8",
+    );
+    setStale(path, 600_000);
+    const status = await parseOutputFileStatus(path, {
+      stalenessMs: 300_000,
+    });
+    expect(status.state).toBe("running");
+  });
+
+  it("default stalenessMs is applied when not provided (no crashes on legacy callers)", async () => {
+    const path = writeInterruptedJsonl("default-cfg.jsonl");
+    setStale(path, 24 * 3600_000); // 24h old — very stale
+    const status = await parseOutputFileStatus(path);
+    // Whatever the default is, 24h should definitely exceed it
+    expect(status.state).toBe("completed");
+  });
+
+  it("stalenessMs: 0 disables the staleness fallback entirely", async () => {
+    const path = writeInterruptedJsonl("disabled.jsonl");
+    setStale(path, 24 * 3600_000);
+    const status = await parseOutputFileStatus(path, { stalenessMs: 0 });
+    // With staleness disabled, we're back to strict end_turn requirement
+    expect(status.state).toBe("running");
+  });
+
+  it("aggregates ALL text blocks from ALL assistant turns when delivering partial", async () => {
+    const path = resolve(TMP_BASE, "multi-turn-interrupted.jsonl");
+    const lines = [
+      { type: "user", agentId: "x", message: { role: "user", content: "go" } },
+      {
+        type: "assistant",
+        agentId: "x",
+        message: {
+          content: [{ type: "text", text: "First thought." }],
+          stop_reason: "tool_use",
+        },
+      },
+      {
+        type: "assistant",
+        agentId: "x",
+        message: {
+          content: [{ type: "text", text: "Second thought." }],
+          stop_reason: "tool_use",
+        },
+      },
+      {
+        type: "assistant",
+        agentId: "x",
+        message: {
+          content: [{ type: "text", text: "Final partial report." }],
+          stop_reason: "tool_use",
+        },
+      },
+    ];
+    fs.writeFileSync(
+      path,
+      lines.map((l) => JSON.stringify(l)).join("\n") + "\n",
+      "utf-8",
+    );
+    setStale(path, 600_000);
+    const status = await parseOutputFileStatus(path, {
+      stalenessMs: 300_000,
+    });
+    expect(status.state).toBe("completed");
+    if (status.state === "completed") {
+      // Should contain text from all three turns (bias toward delivering more)
+      expect(status.output).toContain("First thought");
+      expect(status.output).toContain("Second thought");
+      expect(status.output).toContain("Final partial report");
+    }
+  });
+
+  it("ignores thinking blocks in partial delivery (user doesn't want Claude's scratchpad)", async () => {
+    const path = resolve(TMP_BASE, "thinking-filter.jsonl");
+    const lines = [
+      {
+        type: "assistant",
+        agentId: "x",
+        message: {
+          content: [
+            { type: "thinking", text: "internal reasoning nobody should see" },
+            { type: "text", text: "Actual output text." },
+          ],
+          stop_reason: "tool_use",
+        },
+      },
+    ];
+    fs.writeFileSync(
+      path,
+      lines.map((l) => JSON.stringify(l)).join("\n") + "\n",
+      "utf-8",
+    );
+    setStale(path, 600_000);
+    const status = await parseOutputFileStatus(path, {
+      stalenessMs: 300_000,
+    });
+    expect(status.state).toBe("completed");
+    if (status.state === "completed") {
+      expect(status.output).toContain("Actual output text");
+      expect(status.output).not.toContain("internal reasoning");
+    }
+  });
+
+  it("extracts usage tokens from the last assistant event when available", async () => {
+    const path = resolve(TMP_BASE, "tokens-partial.jsonl");
+    const lines = [
+      {
+        type: "assistant",
+        agentId: "x",
+        message: {
+          content: [{ type: "text", text: "partial text" }],
+          stop_reason: "tool_use",
+          usage: { input_tokens: 500, output_tokens: 200 },
+        },
+      },
+    ];
+    fs.writeFileSync(
+      path,
+      lines.map((l) => JSON.stringify(l)).join("\n") + "\n",
+      "utf-8",
+    );
+    setStale(path, 600_000);
+    const status = await parseOutputFileStatus(path, {
+      stalenessMs: 300_000,
+    });
+    expect(status.state).toBe("completed");
+    if (status.state === "completed") {
+      expect(status.tokensUsed).toEqual({ input: 500, output: 200 });
+    }
+  });
+
+  it("handles file that only has the interruption marker (nothing useful to deliver)", async () => {
+    // Edge case: only interruption, no prior text
+    const path = resolve(TMP_BASE, "only-interrupt.jsonl");
+    const lines = [
+      {
+        type: "user",
+        agentId: "x",
+        message: {
+          role: "user",
+          content: [
+            {
+              type: "tool_result",
+              content: "[Request interrupted by user for tool use]",
+            },
+          ],
+        },
+      },
+    ];
+    fs.writeFileSync(
+      path,
+      lines.map((l) => JSON.stringify(l)).join("\n") + "\n",
+      "utf-8",
+    );
+    setStale(path, 600_000);
+    const status = await parseOutputFileStatus(path, {
+      stalenessMs: 300_000,
+    });
+    // No assistant text content at all → still running (nothing useful)
+    expect(status.state).toBe("running");
+  });
+
+  it("preserves ordering of text across turns (earlier text first, later text last)", async () => {
+    const path = resolve(TMP_BASE, "order.jsonl");
+    const lines = [
+      {
+        type: "assistant",
+        agentId: "x",
+        message: {
+          content: [{ type: "text", text: "ALPHA" }],
+          stop_reason: "tool_use",
+        },
+      },
+      {
+        type: "user",
+        agentId: "x",
+        message: { content: [{ type: "tool_result", content: "..." }] },
+      },
+      {
+        type: "assistant",
+        agentId: "x",
+        message: {
+          content: [{ type: "text", text: "BETA" }],
+          stop_reason: "tool_use",
+        },
+      },
+      {
+        type: "user",
+        agentId: "x",
+        message: { content: [{ type: "tool_result", content: "..." }] },
+      },
+      {
+        type: "assistant",
+        agentId: "x",
+        message: {
+          content: [{ type: "text", text: "GAMMA" }],
+          stop_reason: "tool_use",
+        },
+      },
+    ];
+    fs.writeFileSync(
+      path,
+      lines.map((l) => JSON.stringify(l)).join("\n") + "\n",
+      "utf-8",
+    );
+    setStale(path, 600_000);
+    const status = await parseOutputFileStatus(path, {
+      stalenessMs: 300_000,
+    });
+    expect(status.state).toBe("completed");
+    if (status.state === "completed") {
+      const alphaIdx = status.output.indexOf("ALPHA");
+      const betaIdx = status.output.indexOf("BETA");
+      const gammaIdx = status.output.indexOf("GAMMA");
+      expect(alphaIdx).toBeGreaterThan(-1);
+      expect(betaIdx).toBeGreaterThan(alphaIdx);
+      expect(gammaIdx).toBeGreaterThan(betaIdx);
+    }
+  });
+});