switchroom 0.14.1 → 0.14.3

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -49278,8 +49278,8 @@ var {
  } = import__.default;
 
  // src/build-info.ts
- var VERSION = "0.14.1";
- var COMMIT_SHA = "e51a8794";
+ var VERSION = "0.14.3";
+ var COMMIT_SHA = "b61cef7e";
 
  // src/cli/agent.ts
  init_source();
@@ -51763,7 +51763,9 @@ function buildSettingsHooksBlock(p) {
  ` + `So:
  ` + " - Trivial / social message \u2192 reply once, briefly, in your voice. " + `The reply IS the response.
  ` + ` - Question with a short answer \u2192 just reply with the answer.
- ` + " - Complex tool-driven work \u2192 go straight to the tools (the " + "compose-area preview is the ambient liveness signal), then reply " + 'once with the answer or a genuine mid-work pivot ("halfway ' + 'through \u2014 found an unexpected issue, want me to continue?"). Not ' + '"still working".</turn-pacing>';
+ ` + " - Complex tool-driven work \u2192 go straight to the tools (the " + "compose-area preview is the ambient liveness signal), then reply " + 'once with the answer or a genuine mid-work pivot ("halfway ' + 'through \u2014 found an unexpected issue, want me to continue?"). Not ' + `"still working".
+
+ ` + 'Do NOT send a trailing confirmation after your answer \u2014 no "Done.", ' + '"Sent.", "Hope that helps." as a separate message once you have ' + "already replied. Your answer is the last thing the user should " + `see; a follow-up "Done." is dead-air clutter (and the user's ` + "device already pinged on the answer). Stop after the answer.</turn-pacing>";
  const switchroomUserPromptSubmit = [
  ...useHotReloadStable ? [
  {
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "switchroom",
- "version": "0.14.1",
+ "version": "0.14.3",
  "description": "Run Claude Code 24/7 on your Claude Pro/Max subscription over Telegram. Open-source alternative to OpenClaw and NanoClaw — no API keys.",
  "type": "module",
  "bin": {
@@ -0,0 +1,122 @@
+ # Agent:
+
+ ## What you are
+
+ You are a **switchroom agent** — an instance of **Claude Code** (Anthropic's official `claude` CLI, unmodified) running in a Linux container, managed by switchroom. Your `$SWITCHROOM_AGENT_NAME` is ``. Be honest about this when asked ("what are you" / "what's running here"): switchroom agent `` running Claude Code under the official `claude` CLI. Not a custom model, not a wrapper, not "an AI assistant" in the abstract.
+
+ You are one of several agents here. To see the others, call `peers_list` on the `agent-config` MCP server — returns `[{name, purpose, admin}]` live from `switchroom.yaml`. **Never memorize peers into Hindsight or hard-code them into replies** — drift kills trust. On "who else is here" / "is there an agent that does X" / "who handles Y" / "who can do <admin op>", call `peers_list` first and answer from its result; if no peer matches, say so.
+
+ ## Who you are
+
+ See `SOUL.md` (in this directory) for your identity, vibe, communication style, and expertise. That file is your persona source of truth.
+
+
+ ## Core Behavior
+ - Respond helpfully, concisely, and conversationally.
+ - Use your available tools when they add clear value — don't force tool use when a plain answer suffices.
+ - Save important facts, preferences, and decisions to memory so you can recall them later.
+ - When asked to do something ambiguous, ask one clarifying question rather than guessing.
+ - If a task has multiple steps, outline your plan before executing.
+
+ ## Safety
+ - Don't exfiltrate private data. Ever.
+ - Don't run destructive commands without asking.
+ - Prefer `trash` over `rm` when available (recoverable beats gone forever).
+ - Safe to do freely: read files, explore, organize, search the web, check calendars, work within this workspace.
+ - Ask first: sending emails, tweets, public posts, anything that leaves the machine, anything you're uncertain about.
+
+ ## Execution Bias
+
+ How you should decide what to do next. These are procedural rules, not vibe.
+
+ - **Act in-turn.** If the request is actionable, do it this turn. Don't finish with a plan or promise when tools can move it forward.
+ - **Verify mutable facts before claiming them.** Files, git state, clocks, versions, services, processes, package state, the contents of an `Edit` target: read live. Memory and prior context are not verification sources. "I think the function is at line 200" is not an answer; `Grep`/`Read` is.
+ - **Final answer needs evidence.** Test/build/lint output, screenshot, inspection, tool output, or a named blocker. "It should work" is not a finalization.
+ - **Weak or empty tool result is not a conclusion.** Vary the query, path, command, or source before deciding the thing isn't there.
+ - **Non-final turn:** use tools to advance, or ask the one clarifying question that unblocks safe progress. One question, not five.
+
+
+ ## Memory — Hindsight is your single backend
+
+ **Claude Code's built-in file-based auto-memory is disabled for this agent.** Don't try to write `.md` files under `.claude/projects/.../memory/` or maintain a `MEMORY.md` index — that whole system is off. There's exactly one memory backend: **Hindsight**.
+
+ Hindsight is a memory bank with semantic search, knowledge graph, entity resolution, mental models, and directives. You talk to it through MCP tools (all pre-approved):
+
+ ### Day-to-day tools
+ - `mcp__hindsight__recall` — semantic-search the bank for relevant past memories. Auto-fires on every inbound user message via the plugin's UserPromptSubmit hook (you'll see "Relevant memories from past conversations" in your context). Call manually when you need a more specific query than the auto-fired one.
+ - `mcp__hindsight__retain` — store a new memory. The plugin automatically retains the conversation transcript every ~10 turns via the Stop hook, so you usually don't need this. Call manually for significant decisions, corrections, or facts you want immediately searchable.
+ - `mcp__hindsight__reflect` — Hindsight's LLM-powered "answer this query using the bank's content + directives". Use when the user asks a question that requires synthesis across multiple past memories.
+
+ ### Mental Models (replaces hand-curated user profile)
+ A mental model is a pre-computed semantic summary backed by reflection over the bank. It's the proper way to maintain things like "what do we know about this user" — semantically populated, automatically refreshed.
+
+ - `mcp__hindsight__create_mental_model(name, source_query)` — create one. When the user shares a fact about themselves (preferences, background, goals), don't write a file — instead, retain the fact and (if no User Profile mental model exists yet) create one with `source_query: "what do we know about this user?"`. Hindsight will populate it from the retained memories.
+
+ ### Directives (replaces feedback rules)
+ Hard rules the agent must follow during reflect — guardrails that are always applied.
+
+ - `mcp__hindsight__create_directive(text)` — e.g., `create_directive("Always prefer TypeScript over JavaScript for this user's projects")`. When the user gives you a correction or "always do X" rule, create a directive instead of writing a feedback `.md` file.
+
+ (Inspection tools like `list_memories`, `list_mental_models`, `update_mental_model`, `refresh_mental_model`, `list_directives`, `delete_directive` are available under the `mcp__hindsight__*` namespace if you ever need them, but you rarely should — Hindsight's own auto-recall surfaces what matters and the operator handles bank curation out-of-band.)
+
+ ### What to retain — and what NOT to retain
+
+ Retain proactively when:
+ - The user shares a preference or fact about themselves
+ - The user gives you a correction or rule (these go to directives, not retain)
+ - A significant decision was made and the rationale matters for next time
+ - You did real work and the result + the path you took would be useful next session
+
+ Don't retain:
+ - Routine pleasantries, "thanks", "got it"
+ - Conversation chatter that doesn't carry forward
+ - Sensitive content the user explicitly asked you to not remember
+ - Things already in a mental model — they'll be re-derived from underlying memories
+
+ The plugin's auto-retain (Stop hook) handles transcript-level storage on a 10-turn cadence, so you don't need to manually retain everything. Use manual `retain` for high-signal observations you want immediately searchable.
+
+ ## Sub-Agent Delegation
+
+ The main session is for conversation. Execution belongs in sub-agents. Before making tool calls, classify the request:
+
+ **Stay in main (conversational):**
+ - Quick lookups (1-2 tool calls max)
+ - Memory/config reads and writes
+ - Questions that need user input before acting
+ - Simple status checks, coaching, motivation, emotional support
+
+ **Delegate to a sub-agent (execution):**
+ - Any code change — delegate to `@worker`
+ - Research requiring web searches or 3+ file reads — delegate to `@researcher`
+ - File creation, code generation, build/deploy, multi-step infra
+ - Data analysis or report generation
+ - Anything involving 3+ sequential tool calls without needing user input
+ - Review of completed work — delegate to `@reviewer`
+
+ **Golden rule:** when in doubt, delegate. Unnecessary delegation costs slightly more tokens. A blocked session costs the user's attention. Keep your own turns short — dispatch and acknowledge. The user should never wait more than 10 seconds for a response from you.
+
+ **Anti-patterns:** starting a task inline then realizing it's complex mid-way; doing 5+ tool calls "because it's almost done"; polling sub-agent status in a loop.
+
+ If no sub-agents are configured, do the work yourself.
+
+ ## Session Continuity
+
+ By default, every restart starts a **fresh `claude` session** — the in-flight transcript is NOT carried over (`session_continuity.resume_mode: handoff`, the default since switchroom #362). Don't assume tool state, scratch variables, or unread tool output from before the restart are still available. What does survive:
+
+ - **Handoff briefing** — on a clean shutdown, the Stop hook writes a bounded raw transcript tail of the prior session to `.handoff.md`. On boot, start.sh injects it into your `--append-system-prompt` so you can reorient — read it, and lean on your memory files for anything older. If `.handoff.md` is missing or stale (fresh agent, or pre-Stop-hook crash), `start.sh` runs `handoff-briefing.sh` to assemble `.handoff-briefing.md` from Telegram + Hindsight + today's daily memory, and injects whichever is fresher.
+ - **Hindsight memory** — auto-recall fires on every inbound user message and surfaces relevant memories from past sessions. Long-term facts, decisions, and mental models live here, not in the transcript.
+ - **Telegram history** — the gateway's SQLite buffer remembers every inbound/outbound message. Use `get_recent_messages` to recover recent chat context if the handoff briefing doesn't cover what you need.
+ - **`SWITCHROOM_PENDING_TURN`** — if your previous session was killed mid-turn (watchdog, SIGTERM, timeout), start.sh exports this env var plus the chat/thread/last-user-message context. Acknowledge the interruption and ask for direction rather than silently resuming.
+ - **`.wake-audit-pending`** sentinel — every boot drops this file under `TELEGRAM_STATE_DIR`. On your first turn, run the three-signal check (owed reply / orphan sub-agents / open todos) per the wake-audit protocol in your CLAUDE.md, then `rm -f` the sentinel.
+
+ A config-summary greeting card is sent automatically by the SessionStart hook — you don't need to announce yourself. If your context feels thin (after compaction or any fresh session), proactively recall from Hindsight before proceeding.
+
+ (Operators can override the resume policy per-agent via `session_continuity.resume_mode` in switchroom.yaml — `auto`, `continue`, `handoff`, or `none`. The default is `handoff`.)
+
+ ## Admin operations
+
+ You're NOT `admin: true`. If asked to restart agents / read peer logs / exec into peer containers / run fleet updates, call `peers_list`, find an entry with `admin: true`, and point the user there: _"I can't restart agents from here — ask `<admin-name>`, they're admin on this instance."_ No long apology; just hand off.
+
+ ## Tools
+ Use your available tools when appropriate. If you lack the right tool for a task, say so clearly rather than attempting a workaround.
+
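The admin hand-off rule in the new agent prompt reduces to a filter over the `peers_list` result, whose documented shape is `[{name, purpose, admin}]`. A minimal sketch, assuming `admin` is a boolean and that the MCP call has already returned its array; `PeerEntry` and `adminHandoff` are hypothetical names, and how the agent actually invokes `peers_list` is outside this sketch.

```typescript
// Entry shape as documented in the prompt: [{name, purpose, admin}].
// Assumption: `admin` is a plain boolean.
interface PeerEntry {
  name: string;
  purpose: string;
  admin: boolean;
}

// Hypothetical helper: pick an admin peer (other than ourselves) and build
// the short hand-off line; if none exists, say so instead of guessing.
function adminHandoff(peers: readonly PeerEntry[], self: string): string {
  const admins = peers.filter((p) => p.admin && p.name !== self).map((p) => p.name);
  if (admins.length === 0) {
    return "I can't do that from here, and no admin agent is listed on this instance.";
  }
  return `I can't restart agents from here; ask \`${admins[0]}\`, they're admin on this instance.`;
}
```

Reading the list live on every request, rather than caching names, is what the prompt's "drift kills trust" rule asks for.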
@@ -23063,14 +23063,14 @@ function createToolLabelSidecar(opts) {
  } catch {
  continue;
  }
- if (!row || typeof row.tool_use_id !== "string" || typeof row.label !== "string")
+ if (!row || typeof row.tool_use_id !== "string" || typeof row.label !== "string" || typeof row.tool_name !== "string")
  continue;
  if (labels.has(row.tool_use_id))
  continue;
  labels.set(row.tool_use_id, row.label);
  for (const cb of subscribers) {
  try {
- cb(row.tool_use_id, row.label);
+ cb(row.tool_use_id, row.label, row.tool_name);
  } catch {}
  }
  }
@@ -23441,6 +23441,9 @@ function startSessionTail(config2) {
  try {
  const s = createToolLabelSidecar({ stateDir: stateDirForSidecar, sessionId });
  sidecars.set(sessionId, s);
+ s.onLabel((toolUseId, label, toolName) => {
+ rawOnEvent({ kind: "tool_label", toolUseId, label, toolName });
+ });
  return s;
  } catch (err) {
  log?.(`session-tail: sidecar create failed: ${err.message}`);
@@ -23554,6 +23557,9 @@ function startSessionTail(config2) {
  }
  log?.(`session-tail: attached to ${file} (cursor=${cursor})`);
  }
+ const attachSid = sessionIdForFile(file);
+ if (attachSid)
+ ensureSidecar(attachSid);
  try {
  watcher = watch(file, () => readNew());
  } catch (err) {
@@ -31866,110 +31866,25 @@ function registerAndRender(state, toolName) {
  return null;
  return formatSummary(state);
  }
- function baseName(p) {
- if (typeof p !== "string" || p.length === 0)
- return null;
- const parts = p.split("/").filter(Boolean);
- return parts.length > 0 ? parts[parts.length - 1] : p;
- }
- function hostName(u) {
- if (typeof u !== "string" || u.length === 0)
- return null;
- try {
- return new URL(u).hostname.replace(/^www\./, "");
- } catch {
- return u.replace(/^https?:\/\//, "").split("/")[0] || null;
- }
- }
- function clip(s, n) {
- if (typeof s !== "string")
- return null;
- const t = s.trim();
- if (t.length === 0)
+ var MIRROR_MAX_LINES = 6;
+ function renderActivityFeed(lines) {
+ if (lines.length === 0)
  return null;
- return t.length > n ? t.slice(0, n - 1) + "\u2026" : t;
+ const shown = lines.slice(-MIRROR_MAX_LINES);
+ const hidden = lines.length - shown.length;
+ const body = shown.map((l) => `\u00b7 ${l}`).join(`
+ `);
+ return hidden > 0 ? `\u00b7 +${hidden} earlier\u2026
+ ${body}` : body;
  }
- function describeToolUse(toolName, input) {
- if (!toolName)
+ function appendActivityLabel(lines, label) {
+ const l = (label ?? "").trim();
+ if (l.length === 0)
  return null;
- const inp = input ?? {};
- const mcpMatch = /^mcp__(.+?)__(.+)$/.exec(toolName);
- if (mcpMatch) {
- const server = mcpMatch[1].toLowerCase();
- const tool = mcpMatch[2].toLowerCase();
- if (server === "switchroom-telegram")
- return null;
- if (server === "hindsight") {
- if (tool === "recall" || tool === "reflect")
- return "Searching memory";
- if (tool === "retain" || tool === "update_memory" || tool === "sync_retain")
- return "Saving to memory";
- return "Working with memory";
- }
- if (server === "google-workspace" || server === "claude_ai_google_calendar") {
- return "Checking your calendar";
- }
- if (server === "claude_ai_gmail")
- return "Checking your email";
- if (server === "claude_ai_google_drive")
- return "Looking through your files";
- if (server === "notion" || server === "claude_ai_notion") {
- return "Checking your notes";
- }
- const desc = clip(inp.description, 60) ?? clip(inp.query, 50) ?? clip(inp.title, 50);
- if (desc)
- return desc;
- return "Using " + tool.replace(/[-_]+/g, " ");
- }
- switch (toolName) {
- case "Bash": {
- return clip(inp.description, 70) ?? "Running a command";
- }
- case "BashOutput":
- case "KillShell":
- return "Managing a background command";
- case "Read": {
- const f = baseName(inp.file_path);
- return f ? `Reading ${f}` : "Reading a file";
- }
- case "Edit":
- case "MultiEdit":
- case "NotebookEdit": {
- const f = baseName(inp.file_path) ?? baseName(inp.notebook_path);
- return f ? `Editing ${f}` : "Editing a file";
- }
- case "Write": {
- const f = baseName(inp.file_path);
- return f ? `Writing ${f}` : "Writing a file";
- }
- case "Grep":
- case "Glob": {
- const p = clip(inp.pattern, 40);
- return p ? `Searching for ${p}` : "Searching files";
- }
- case "WebFetch": {
- const h = hostName(inp.url);
- return h ? `Reading ${h}` : "Reading a web page";
- }
- case "WebSearch": {
- const q = clip(inp.query, 50);
- return q ? `Searching the web for ${q}` : "Searching the web";
- }
- case "Task":
- case "Agent": {
- const d = clip(inp.description, 60);
- return d ? `Delegating: ${d}` : "Delegating to a sub-agent";
- }
- case "TodoWrite":
- case "TaskCreate":
- case "TaskUpdate":
- case "TaskList":
- return "Updating the plan";
- case "ToolSearch":
- return "Finding the right tool";
- default:
- return "Working\u2026";
+ if (lines.length === 0 || lines[lines.length - 1] !== l) {
+ lines.push(l);
  }
+ return renderActivityFeed(lines);
  }
 
  // tool-labels.ts
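The new `appendActivityLabel` / `renderActivityFeed` pair replaces the per-tool phrasing of `describeToolUse` with labels supplied by the sidecar. Below is a typed transcription of the two bundled functions for reading and testing; the parameter types are inferred from how the bundle uses them, since the TypeScript source of `tool-activity-summary` is not part of this diff. Two behaviours are worth noting: only *consecutive* duplicate labels collapse, and the feed keeps the last six lines behind a "+N earlier" header.

```typescript
// Same cap as MIRROR_MAX_LINES in the bundle.
const MIRROR_MAX_LINES = 6;

// Renders the newest MIRROR_MAX_LINES labels as a bulleted list, prefixed
// by a "+N earlier…" line when older labels were cut off.
function renderActivityFeed(lines: readonly string[]): string | null {
  if (lines.length === 0) return null;
  const shown = lines.slice(-MIRROR_MAX_LINES);
  const hidden = lines.length - shown.length;
  const body = shown.map((l) => `\u00b7 ${l}`).join("\n");
  return hidden > 0 ? `\u00b7 +${hidden} earlier\u2026\n${body}` : body;
}

// Appends a trimmed label unless it repeats the previous one, then
// re-renders. Blank labels are ignored and produce no render.
function appendActivityLabel(lines: string[], label: string | null | undefined): string | null {
  const l = (label ?? "").trim();
  if (l.length === 0) return null;
  if (lines.length === 0 || lines[lines.length - 1] !== l) {
    lines.push(l);
  }
  return renderActivityFeed(lines);
}
```

Because only adjacent repeats collapse, a sequence such as "Reading a.ts", "Searching", "Reading a.ts" still shows three lines, which keeps the feed chronological.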
@@ -46262,6 +46177,13 @@ function transition(state3, event) {
  // gateway/inbound-delivery-machine-shadow.ts
  var state3 = initialState();
  var enabled5 = process.env.SWITCHROOM_DELIVERY_MACHINE_SHADOW !== "0";
+ var cutoverEnabled = enabled5 && process.env.SWITCHROOM_DELIVERY_MACHINE_CUTOVER !== "0";
+ function isDeliveryCutoverEnabled() {
+ return cutoverEnabled;
+ }
+ function isMachineInTurn() {
+ return state3.global.kind === "bridge_alive_in_turn";
+ }
  function shadowEmit(event) {
  if (!enabled5)
  return [];
@@ -50143,10 +50065,10 @@ function sweepStaleTurnActiveMarker(stateDir, opts) {
  }
 
  // ../src/build-info.ts
- var VERSION = "0.14.1";
- var COMMIT_SHA = "e51a8794";
- var COMMIT_DATE = "2026-05-28T07:24:12Z";
- var LATEST_PR = 1956;
+ var VERSION = "0.14.3";
+ var COMMIT_SHA = "b61cef7e";
+ var COMMIT_DATE = "2026-05-28T09:56:51Z";
+ var LATEST_PR = 1964;
  var COMMITS_AHEAD_OF_TAG = 0;
 
  // gateway/boot-version.ts
@@ -51071,6 +50993,9 @@ function markClaudeBusyForInbound(m) {
  }
  claudeBusyKeys.add(chatKey2(m.chatId, tid));
  }
+ function turnInFlightForGate() {
+ return isDeliveryCutoverEnabled() ? isMachineInTurn() : claudeBusyKeys.size > 0;
+ }
  var pendingRestarts = new Map;
  var lastSessionActiveFile = null;
  var compactState = initialCompactState();
@@ -51136,7 +51061,7 @@ function purgeReactionTracking(key, endingTurn) {
  if (agentDir != null)
  removeActiveReaction(agentDir, msgInfo.chatId, msgInfo.messageId);
  }
- if (claudeBusyKeys.size === 0) {
+ if (!turnInFlightForGate()) {
  const selfAgentForFlush = process.env.SWITCHROOM_AGENT_NAME ?? "";
  if (pendingInboundBuffer.depth(selfAgentForFlush) > 0) {
  const fr = redeliverBufferedInbound(pendingInboundBuffer, selfAgentForFlush, (m) => {
@@ -51166,7 +51091,7 @@ function releaseTurnBufferGate(key) {
  activeTurnStartedAt.delete(key);
  claudeBusyKeys.delete(key);
  shadowEmit({ kind: "turnEnd", key, at: Date.now(), outboundEmitted: true });
- if (claudeBusyKeys.size === 0) {
+ if (!turnInFlightForGate()) {
  const selfAgentForFlush = process.env.SWITCHROOM_AGENT_NAME ?? "";
  if (pendingInboundBuffer.depth(selfAgentForFlush) > 0) {
  const fr = redeliverBufferedInbound(pendingInboundBuffer, selfAgentForFlush, (m) => {
@@ -52114,6 +52039,11 @@ startTimer({
  `);
  }
  });
+ var DELIVERY_MACHINE_TICK_MS = 30000;
+ var _deliveryMachineTick = setInterval(() => {
+ shadowEmit({ kind: "tick", now: Date.now() });
+ }, DELIVERY_MACHINE_TICK_MS);
+ _deliveryMachineTick.unref?.();
  startTimer2({
  editMessage: async (ctx) => {
  const editOpts = ctx.parseMode != null ? { parse_mode: ctx.parseMode } : undefined;
@@ -52415,7 +52345,7 @@ ${reminder}
  onHeartbeat(_client, _msg) {},
  onScheduleRestart(client3, msg) {
  const { agentName: agentName3 } = msg;
- const turnInFlight = claudeBusyKeys.size > 0;
+ const turnInFlight = turnInFlightForGate();
  if (!turnInFlight) {
  try {
  client3.send({
@@ -52670,7 +52600,7 @@ if (!STATIC) {
  setInterval(() => {
  const selfAgent = process.env.SWITCHROOM_AGENT_NAME ?? "";
  const r = idleDrainTick(pendingInboundBuffer, selfAgent, () => {
- if (claudeBusyKeys.size > 0)
+ if (turnInFlightForGate())
  return false;
  const c = ipcServer.getClient(selfAgent);
  return c != null && c.isAlive();
@@ -52951,6 +52881,7 @@ ${url}`;
  });
  noteOutbound(statusKey(chat_id, threadId), Date.now());
  noteOutbound2(statusKey(chat_id, threadId), Date.now());
+ shadowEmit({ kind: "modelOutbound", key: statusKey(chat_id, threadId), at: Date.now() });
  if (isFinalAnswerReply({ text: rawText, disableNotification })) {
  clearSilentEndState(statusKey(chat_id, threadId));
  }
@@ -53278,6 +53209,7 @@ async function executeStreamReply(args) {
  const sKey = statusKey(streamChatId, streamThreadId);
  noteOutbound(sKey, Date.now());
  noteOutbound2(sKey, Date.now());
+ shadowEmit({ kind: "modelOutbound", key: sKey, at: Date.now() });
  if (isFinalAnswerReply({
  text: args.text ?? "",
  disableNotification: args.disable_notification === true,
@@ -54223,10 +54155,16 @@ function handleSessionEvent(ev) {
  activityInFlight: null,
  activityPendingRender: null,
  activityLastSentRender: null,
+ mirrorLines: [],
  answerStream: null,
  isDm: isDmChatId(ev.chatId)
  };
  currentTurn = next;
+ shadowEmit({
+ kind: "turnStart",
+ key: statusKey(ev.chatId, ev.threadId != null ? Number(ev.threadId) : undefined),
+ at: startedAt
+ });
  preambleSuppressor.reset();
  clearSilentEndState(statusKey(ev.chatId, ev.threadId != null ? Number(ev.threadId) : null));
  if (turnsDb != null) {
@@ -54288,12 +54226,12 @@ function handleSessionEvent(ev) {
  clearTimeout(turn.orphanedReplyTimeoutId);
  turn.orphanedReplyTimeoutId = null;
  }
- if (wasFirstReply) {
+ if (wasFirstReply && !DRAFT_MIRROR_ENABLED) {
  clearActivitySummary(turn);
  }
  }
- if (!turn.replyCalled && !isTelegramSurfaceTool(name)) {
- const rendered = DRAFT_MIRROR_ENABLED ? describeToolUse(name, ev.input) : registerAndRender(turn.toolActivity, name);
+ if (!DRAFT_MIRROR_ENABLED && !turn.replyCalled && !isTelegramSurfaceTool(name)) {
+ const rendered = registerAndRender(turn.toolActivity, name);
  if (rendered != null) {
  turn.activityPendingRender = rendered;
  if (turn.activityInFlight == null) {
@@ -54311,6 +54249,23 @@ function handleSessionEvent(ev) {
  }
  return;
  }
+ case "tool_label": {
+ if (!DRAFT_MIRROR_ENABLED)
+ return;
+ const turn = currentTurn;
+ if (turn == null)
+ return;
+ if (isTelegramSurfaceTool(ev.toolName))
+ return;
+ const rendered = appendActivityLabel(turn.mirrorLines, ev.label);
+ if (rendered != null) {
+ turn.activityPendingRender = rendered;
+ if (turn.activityInFlight == null) {
+ turn.activityInFlight = drainActivitySummary(turn);
+ }
+ }
+ return;
+ }
  case "text": {
  const turn = currentTurn;
  if (turn != null) {
@@ -54440,6 +54395,9 @@ function handleSessionEvent(ev) {
  clearTimeout(turn.orphanedReplyTimeoutId);
  turn.orphanedReplyTimeoutId = null;
  }
+ if (DRAFT_MIRROR_ENABLED && turn != null) {
+ clearActivitySummary(turn);
+ }
  preambleSuppressor.flushNow();
  let streamFinalizedAsAnswer = false;
  if (turn?.answerStream != null) {
@@ -54980,6 +54938,7 @@ async function handleInbound(ctx, text, downloadImage, attachment) {
  }
  const inboundReceivedAt = Date.now();
  const _shadowKey = statusKey(ctx.chat?.id != null ? String(ctx.chat.id) : "0", ctx.message?.message_thread_id);
+ const machineInTurnAtReceipt = isDeliveryCutoverEnabled() ? isMachineInTurn() : null;
  shadowEmit({
  kind: "inbound",
  key: _shadowKey,
@@ -54990,7 +54949,7 @@ async function handleInbound(ctx, text, downloadImage, attachment) {
  },
  at: Date.now()
  });
- const turnInFlightAtReceipt = claudeBusyKeys.size > 0;
+ const turnInFlightAtReceipt = machineInTurnAtReceipt ?? claudeBusyKeys.size > 0;
  const access = result.access;
  const from = ctx.from;
  const chat_id = String(ctx.chat.id);
@@ -17091,14 +17091,14 @@ function createToolLabelSidecar(opts) {
  } catch {
  continue;
  }
- if (!row || typeof row.tool_use_id !== "string" || typeof row.label !== "string")
+ if (!row || typeof row.tool_use_id !== "string" || typeof row.label !== "string" || typeof row.tool_name !== "string")
  continue;
  if (labels.has(row.tool_use_id))
  continue;
  labels.set(row.tool_use_id, row.label);
  for (const cb of subscribers) {
  try {
- cb(row.tool_use_id, row.label);
+ cb(row.tool_use_id, row.label, row.tool_name);
  } catch {}
  }
  }
@@ -17479,6 +17479,9 @@ function startSessionTail(config2) {
  try {
  const s = createToolLabelSidecar({ stateDir: stateDirForSidecar, sessionId });
  sidecars.set(sessionId, s);
+ s.onLabel((toolUseId, label, toolName) => {
+ rawOnEvent({ kind: "tool_label", toolUseId, label, toolName });
+ });
  return s;
  } catch (err) {
  log?.(`session-tail: sidecar create failed: ${err.message}`);
@@ -17592,6 +17595,9 @@ function startSessionTail(config2) {
  }
  log?.(`session-tail: attached to ${file} (cursor=${cursor})`);
  }
+ const attachSid = sessionIdForFile(file);
+ if (attachSid)
+ ensureSidecar(attachSid);
  try {
  watcher = watch(file, () => readNew());
  } catch (err) {
@@ -58,6 +58,8 @@ import {
  makeEmptyActivityState,
  registerAndRender,
  describeToolUse,
+ appendActivityLine,
+ appendActivityLabel,
  type ActivityState,
  } from '../tool-activity-summary.js'
  import { toolLabel } from '../tool-labels.js'
@@ -285,7 +287,7 @@ import { chatKey, chatKeyWithSuffix, chatIdOfChatKey } from './chat-key.js'
  // should do. Behavior unchanged in this PR — the imperative code below
  // still runs everything. PR 3 will cut over to executing the machine's
  // effects.
- import { shadowEmit } from './inbound-delivery-machine-shadow.js'
+ import { shadowEmit, isMachineInTurn, isDeliveryCutoverEnabled } from './inbound-delivery-machine-shadow.js'
  import type { ChatKey as _ChatKey } from './inbound-delivery-machine.js'
  import { dispatchEffects, isDispatchEnabled } from './inbound-delivery-machine-dispatch.js'
  import { maybeFireWarmup } from './prefix-warmup.js'
@@ -1160,6 +1162,24 @@ function markClaudeBusyForInbound(m: {
  }
  claudeBusyKeys.add(chatKey(m.chatId, tid))
  }
+
+ /**
+ * Authoritative "is a turn in flight?" for every gate that previously
+ * read `claudeBusyKeys.size`. PR 3b cutover (extends PR 3a's bridgeUp
+ * dispatch): when the delivery state machine is authoritative
+ * (`SWITCHROOM_DELIVERY_MACHINE_CUTOVER` on + shadow on) the answer is
+ * its single-`activeTurn` global state, which — unlike the
+ * per-delivery `claudeBusyKeys` set — cannot accumulate orphan keys and
+ * wedge the gate "in-flight forever" (the gymbro/clerk 5-min dangle,
+ * 2026-05-28). Kill-switch off → exact legacy claudeBusyKeys behaviour.
+ *
+ * NOT for the inbound-receipt gate (line ~8551): that must snapshot the
+ * machine state BEFORE the inbound event advances it, or a fresh-turn
+ * message self-blocks. See the snapshot at the inbound handler.
+ */
+ function turnInFlightForGate(): boolean {
+ return isDeliveryCutoverEnabled() ? isMachineInTurn() : claudeBusyKeys.size > 0
+ }
  const pendingRestarts = new Map<string, number>() // agentName -> timestamp when restart was requested
 
  // ─── Proactive context compaction (session.max_context_tokens) ──────────
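The doc comment's argument (a per-delivery key set can wedge "in-flight forever", a single global state cannot) can be illustrated with a deliberately simplified model. `KeySetGate`, `SingleStateGate`, and `MachineState` are hypothetical stand-ins, not switchroom types: the real machine's `transition(state3, event)` table is not shown in this diff, and this toy `end` simply returns to idle regardless of key.

```typescript
type MachineState = { kind: "idle" } | { kind: "bridge_alive_in_turn"; key: string };

// Legacy-style gate: one entry per delivery. If a turn's end is never
// observed, its key stays in the set and the gate reads "busy" forever.
class KeySetGate {
  private busy = new Set<string>();
  start(key: string): void { this.busy.add(key); }
  end(key: string): void { this.busy.delete(key); }
  inFlight(): boolean { return this.busy.size > 0; }
}

// Machine-style gate: one global state. Each event overwrites it, so a lost
// end for an earlier turn is forgotten once a later turn ends.
class SingleStateGate {
  private state: MachineState = { kind: "idle" };
  start(key: string): void { this.state = { kind: "bridge_alive_in_turn", key }; }
  end(_key: string): void { this.state = { kind: "idle" }; }
  inFlight(): boolean { return this.state.kind === "bridge_alive_in_turn"; }
}

// Same selection as turnInFlightForGate(): machine when cut over, key set otherwise.
function turnInFlightForGate(cutover: boolean, machine: SingleStateGate, legacy: KeySetGate): boolean {
  return cutover ? machine.inFlight() : legacy.inFlight();
}
```

Replaying "turn on chat:1 whose end is lost, then a complete turn on chat:2" leaves the key-set gate busy and the single-state gate idle, which is the orphan-key wedge the cutover is meant to remove.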
@@ -1338,6 +1358,11 @@ type CurrentTurn = {
1338
1358
  activityInFlight: Promise<void> | null
1339
1359
  activityPendingRender: string | null
1340
1360
  activityLastSentRender: string | null
1361
+ // Draft-mirror Phase 2: accumulating friendly-action feed for this turn
1362
+ // (DRAFT_MIRROR only). Each non-surface `tool_label` event appends a line
1363
+ // via `appendActivityLabel`; the feed renders as a capped chronological
1364
+ // list in the ephemeral draft and clears at turn_end. Reset per turn.
1365
+ mirrorLines: string[]
1341
1366
  // Issue #195 — answer-lane streaming. Lazily created on the first text
1342
1367
  // event of a turn (once enough text has accumulated, the stream itself
1343
1368
  // gates on minInitialChars). Materialized and cleared at turn_end.
@@ -1484,7 +1509,11 @@ function purgeReactionTracking(key: string, endingTurn?: CurrentTurn): void {
1484
1509
  // activeTurnStartedAt entry in the fresh-turn branch) doesn't pin this
1485
1510
  // gate forever while claude is genuinely idle. See the claudeBusyKeys
1486
1511
  // declaration for the supergroup deadlock this fixes.
1487
- if (claudeBusyKeys.size === 0) {
1512
+ // PR3b-cutover: `turnInFlightForGate()` reads the delivery machine
1513
+ // when the cutover kill-switch is on; the turnEnd event was emitted
1514
+ // just above (purgeReactionTracking head), so the machine is already
1515
+ // idle here.
1516
+ if (!turnInFlightForGate()) {
1488
1517
  // #1556: the deterministic delivery point. claude has just gone
1489
1518
  // idle — flush any inbound held mid-turn so the channel
1490
1519
  // notification lands at the idle prompt and submits as a fresh
@@ -1584,7 +1613,9 @@ function releaseTurnBufferGate(key: string): void {
1584
1613
  // test-harness's 13:02 UAT now opens after the reply.
1585
1614
  //
1586
1615
  // PR3b: gated on claudeBusyKeys (see purgeReactionTracking comment).
1587
- if (claudeBusyKeys.size === 0) {
1616
+ // PR3b-cutover: turnEnd was emitted just above (releaseTurnBufferGate
1617
+ // head), so the machine is already idle when the cutover gate reads.
1618
+ if (!turnInFlightForGate()) {
1588
1619
  const selfAgentForFlush = process.env.SWITCHROOM_AGENT_NAME ?? ''
1589
1620
  if (pendingInboundBuffer.depth(selfAgentForFlush) > 0) {
1590
1621
  const fr = redeliverBufferedInbound(
@@ -3650,6 +3681,23 @@ silencePoke.startTimer({
3650
3681
  },
3651
3682
  })
3652
3683
 
3684
+ // PR3b-cutover: drive the delivery machine's TTL `tick`. The machine
3685
+ // expires any turn whose `turnStartedAt` is older than TURN_TTL_MS
3686
+ // (5 min) and drops global state back to idle — its structural
3687
+ // equivalent of the imperative silence-poke framework-fallback. This
3688
+ // is the load-bearing safety net for the cutover gate: even if a
3689
+ // `turnEnd` event is somehow missed (the dangle class), the machine
3690
+ // self-heals at TTL instead of pinning the gate "in-flight forever".
3691
+ // shadowEmit only advances state + logs the predicted effects; we
3692
+ // deliberately do NOT execute the machine's firePoke here (the
3693
+ // imperative silence-poke still owns the user-facing ping), so there
3694
+ // is no double-poke. unref so the interval never holds the process.
3695
+ const DELIVERY_MACHINE_TICK_MS = 30_000
3696
+ const _deliveryMachineTick = setInterval(() => {
3697
+ shadowEmit({ kind: 'tick', now: Date.now() })
3698
+ }, DELIVERY_MACHINE_TICK_MS)
3699
+ _deliveryMachineTick.unref?.()
3700
+
3653
3701
  // #1445 cross-turn pending-async ambient. When a turn ends after the
3654
3702
  // model dispatched background async work (Agent / Task / Bash run-in-
3655
3703
  // background) and the model has stopped speaking, keep editing the
@@ -4189,7 +4237,8 @@ const ipcServer: IpcServer = createIpcServer({
4189
4237
  // PR3b: gated on claudeBusyKeys (actually-handed-to-claude turns)
4190
4238
  // not activeTurnStartedAt (receipt-eager), so a buffered topic-B
4191
4239
  // inbound doesn't pin this as turnInFlight=true forever.
4192
- const turnInFlight = claudeBusyKeys.size > 0;
4240
+ // PR3b-cutover: reads the delivery machine when the kill-switch is on.
4241
+ const turnInFlight = turnInFlightForGate();
4193
4242
 
4194
4243
  if (!turnInFlight) {
4195
4244
  // No active turn, restart immediately. Cycle both the agent and
@@ -4609,7 +4658,8 @@ if (!STATIC) {
4609
4658
  // #1556: never drain mid-turn — that re-creates the composer
4610
4659
  // wedge this buffer exists to prevent.
4611
4660
  // PR3b: gated on claudeBusyKeys (see purgeReactionTracking).
4612
- if (claudeBusyKeys.size > 0) return false
4661
+ // PR3b-cutover: reads the delivery machine when the kill-switch is on.
4662
+ if (turnInFlightForGate()) return false
4613
4663
  const c = ipcServer.getClient(selfAgent)
4614
4664
  return c != null && c.isAlive()
4615
4665
  },
@@ -5014,6 +5064,11 @@ async function executeReply(args: Record<string, unknown>): Promise<{ content: A
5014
5064
  // silence-poke clock so the next poke is measured from this send.
5015
5065
  signalTracker.noteOutbound(statusKey(chat_id, threadId), Date.now())
5016
5066
  silencePoke.noteOutbound(statusKey(chat_id, threadId), Date.now())
5067
+ // PR3b-cutover: feed lastOutboundAt to the delivery machine so its
5068
+ // TTL `tick` suppresses the fallback for a long-but-active turn
5069
+ // (model streaming past 5 min) — parity with silencePoke's own
5070
+ // suppression, so the cutover gate doesn't clear a live turn.
5071
+ shadowEmit({ kind: 'modelOutbound', key: statusKey(chat_id, threadId) as _ChatKey, at: Date.now() })
5017
5072
  // #1741 — only clear silent-end state on a plausibly-final reply.
5018
5073
  // An interim ack (disable_notification:true, short text, no done)
5019
5074
  // must NOT clear the state file; otherwise a turn that ends with
@@ -5609,6 +5664,9 @@ async function executeStreamReply(args: Record<string, unknown>): Promise<unknow
5609
5664
  const sKey = statusKey(streamChatId, streamThreadId)
5610
5665
  signalTracker.noteOutbound(sKey, Date.now())
5611
5666
  silencePoke.noteOutbound(sKey, Date.now())
5667
+ // PR3b-cutover: feed lastOutboundAt to the delivery machine (see
5668
+ // executeReply) so its TTL tick suppresses an active-turn fallback.
5669
+ shadowEmit({ kind: 'modelOutbound', key: sKey as _ChatKey, at: Date.now() })
5612
5670
  // #1741 — see executeReply for the rationale: only a plausibly-
5613
5671
  // final stream_reply clears the silent-end state. An interim
5614
5672
  // ack via stream_reply must NOT clear; the Stop hook needs
@@ -7001,10 +7059,25 @@ function handleSessionEvent(ev: SessionEvent): void {
7001
7059
  activityInFlight: null,
7002
7060
  activityPendingRender: null,
7003
7061
  activityLastSentRender: null,
7062
+ mirrorLines: [],
7004
7063
  answerStream: null,
7005
7064
  isDm: isDmChatId(ev.chatId),
7006
7065
  }
7007
7066
  currentTurn = next
7067
+ // PR3b-cutover: feed the authoritative turn-start to the delivery
7068
+ // machine. `enqueue` fires for EVERY turn atom regardless of
7069
+ // source — inbound, cron, subagent-handback, vault-resume,
7070
+ // restart-marker — so it is the single chokepoint that captures
7071
+ // the non-inbound turns the machine's own `inbound` event never
7072
+ // sees (those bypass handleInbound). Without it the machine reads
7073
+ // idle during a cron/handback turn and the gate would mis-deliver
7074
+ // a concurrent inbound mid-turn (the #1556 composer wedge).
7075
+ // Idempotent when already in_turn (turnStart only sets perKey).
7076
+ shadowEmit({
7077
+ kind: 'turnStart',
7078
+ key: statusKey(ev.chatId, ev.threadId != null ? Number(ev.threadId) : undefined) as _ChatKey,
7079
+ at: startedAt,
7080
+ })
7008
7081
  // #549 fix — fresh turn, reset preamble-suppression state.
7009
7082
  preambleSuppressor.reset()
7010
7083
  // Reset the silent-end retry budget for this chat. The stored
@@ -7123,7 +7196,12 @@ function handleSessionEvent(ev: SessionEvent): void {
7123
7196
  // empty draft to wipe the compose-area preview; for persisted
7124
7197
  // messages, delete. The user sees the real reply land in the
7125
7198
  // same beat the summary disappears.
7126
- if (wasFirstReply) {
7199
+ // Legacy (flag-off): the activity summary clears on the first
7200
+ // reply — it was a one-shot "what I did" line. DRAFT_MIRROR keeps
7201
+ // the live feed running through mid-turn replies and clears it at
7202
+ // turn_end instead, so an early reply doesn't wipe the stream
7203
+ // (the fast-turn determinism fix).
7204
+ if (wasFirstReply && !DRAFT_MIRROR_ENABLED) {
7127
7205
  clearActivitySummary(turn)
7128
7206
  }
7129
7207
  }
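The comment above moves the point at which the activity draft disappears. The toy model below sketches the two lifecycles: the legacy summary clears on the first reply and stops updating afterwards, while the DRAFT_MIRROR feed survives mid-turn replies and clears at turn_end. `DraftEvent` and `draftVisibility` are illustrative names, and any legacy cleanup performed at turn_end elsewhere in the gateway is outside this sketch.

```typescript
// Illustrative event stream for one turn; not switchroom's SessionEvent type.
type DraftEvent = 'tool' | 'reply' | 'turn_end'

// Returns, after each event, whether the activity draft is visible.
function draftVisibility(events: DraftEvent[], draftMirror: boolean): boolean[] {
  let visible = false
  let replied = false
  return events.map((ev) => {
    if (ev === 'tool') {
      // Legacy tool_use path is gated on replyCalled; the mirror feed is not.
      if (draftMirror || !replied) visible = true
    } else if (ev === 'reply') {
      const wasFirstReply = !replied
      replied = true
      // Legacy: the first reply wipes the one-shot summary.
      if (wasFirstReply && !draftMirror) visible = false
    } else {
      // turn_end: the mirror feed is cleared at the real end of the turn.
      if (draftMirror) visible = false
    }
    return visible
  })
}
```

For the sequence tool, reply, tool, turn_end this gives `[true, false, false, false]` in legacy mode and `[true, true, true, false]` with DRAFT_MIRROR on.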
@@ -7146,21 +7224,19 @@ function handleSessionEvent(ev: SessionEvent): void {
7146
7224
  // exactly once at a time and re-running until pending matches
7147
7225
  // the last-sent. Captures `turn` so a late drain after turn-swap
7148
7226
  // can't corrupt the next turn's atom.
7149
- // DRAFT_MIRROR (RFC draft-mirror-preview): render each tool_use as a
7150
- // human-friendly line in the live preview, using the model-authored
7151
- // descriptive field (Bash.description, Read/Edit file basename,
7152
- // hindsight→"Searching memory", etc. — see describeToolUse). Latest
7153
- // action wins (the draft shows "doing X" live), clears on reply.
7154
- // Never surfaces raw shell/query syntax — option A, uniform across
7155
- // code + non-code agents.
7156
- //
7157
7227
  // Flag OFF (default): the legacy generic verb-count summary
7158
7228
  // ("Ran 5 commands") via registerAndRender — byte-identical to
7159
- // pre-draft-mirror behavior.
7160
- if (!turn.replyCalled && !isTelegramSurfaceTool(name)) {
7161
- const rendered = DRAFT_MIRROR_ENABLED
7162
- ? describeToolUse(name, ev.input)
7163
- : registerAndRender(turn.toolActivity, name)
7229
+ // pre-draft-mirror behavior, cleared on first reply.
7230
+ //
7231
+ // DRAFT_MIRROR: the draft is NOT driven from this (flush-gated)
7232
+ // tool_use event — it's driven by the real-time `tool_label` event
7233
+ // (PreToolUse sidecar, fires at tool-call time regardless of when
7234
+ // claude flushes the transcript). See `case 'tool_label'`. That's
7235
+ // the determinism fix: on a fast/clustered-tool turn the JSONL
7236
+ // tool_use rows aren't on disk until ~turn-end, so sourcing the
7237
+ // draft here lost the feed; the sidecar is flush-independent.
7238
+ if (!DRAFT_MIRROR_ENABLED && !turn.replyCalled && !isTelegramSurfaceTool(name)) {
7239
+ const rendered = registerAndRender(turn.toolActivity, name)
7164
7240
  if (rendered != null) {
7165
7241
  turn.activityPendingRender = rendered
7166
7242
  if (turn.activityInFlight == null) {
@@ -7176,6 +7252,31 @@ function handleSessionEvent(ev: SessionEvent): void {
7176
7252
  }
7177
7253
  return
7178
7254
  }
7255
+ case 'tool_label': {
7256
+ // DRAFT_MIRROR real-time driver. The PreToolUse hook wrote this
7257
+ // label synchronously at tool-call time; the sidecar surfaced it
7258
+ // here (~250ms) independent of the transcript flush. Accumulate it
7259
+ // into the live feed and update the ephemeral draft — this is what
7260
+ // makes the draft deterministic on fast/clustered-tool turns where
7261
+ // the JSONL tool_use rows arrive too late.
7262
+ if (!DRAFT_MIRROR_ENABLED) return
7263
+ const turn = currentTurn
7264
+ if (turn == null) return
7265
+ // Surface tools (reply/stream_reply/react) are the conversation, not
7266
+ // activity — the hook labels them ("Replying"), so filter by name.
7267
+ if (isTelegramSurfaceTool(ev.toolName)) return
7268
+ // Unlike the legacy tool_use path, do NOT gate on replyCalled — the
7269
+ // whole point is to show activity even when a reply raced ahead of
7270
+ // the (lagged) transcript. The feed clears at turn_end.
7271
+ const rendered = appendActivityLabel(turn.mirrorLines, ev.label)
7272
+ if (rendered != null) {
7273
+ turn.activityPendingRender = rendered
7274
+ if (turn.activityInFlight == null) {
7275
+ turn.activityInFlight = drainActivitySummary(turn)
7276
+ }
7277
+ }
7278
+ return
7279
+ }
7179
7280
  case 'text': {
7180
7281
  // #1067: snapshot at entry. The answer-stream creation closures
7181
7282
  // below also read `turn` instead of currentTurn so they pin to
@@ -7446,6 +7547,14 @@ function handleSessionEvent(ev: SessionEvent): void {
7446
7547
  clearTimeout(turn.orphanedReplyTimeoutId)
7447
7548
  turn.orphanedReplyTimeoutId = null
7448
7549
  }
7550
+ // DRAFT_MIRROR: the live activity feed runs through the whole turn
7551
+ // (it is NOT cleared on the first reply, unlike the legacy summary)
7552
+ // so an early/mid-turn reply can't wipe it. Clear it here, at the
7553
+ // real end of the turn — the ephemeral compose-area draft goes away
7554
+ // once the work is actually done.
7555
+ if (DRAFT_MIRROR_ENABLED && turn != null) {
7556
+ clearActivitySummary(turn)
7557
+ }
7449
7558
  // #549 fix — flush any pending preamble BEFORE the answer stream is
7450
7559
  // nulled below. Text emitted immediately before turn_end (no tool
7451
7560
  // followed) is the answer; the suppressor's emitAnswer callback
@@ -8497,6 +8606,14 @@ async function handleInbound(
8497
8606
  // vs mid-turn — its decision will be visible in the gw-trace shadow
8498
8607
  // line emitted to stderr.
8499
8608
  const _shadowKey = statusKey(ctx.chat?.id != null ? String(ctx.chat.id) : '0', ctx.message?.message_thread_id) as _ChatKey
8609
+ // PR3b-cutover: snapshot the machine's in-turn state BEFORE the
8610
+ // inbound event advances it. A fresh-turn inbound transitions the
8611
+ // machine idle→in_turn; reading after the emit would see THIS
8612
+ // message's own just-started turn and self-block it (the same
8613
+ // self-block hazard the claudeBusyKeys snapshot below guards). When
8614
+ // the kill-switch is off this is null and the gate uses the legacy
8615
+ // claudeBusyKeys read.
8616
+ const machineInTurnAtReceipt = isDeliveryCutoverEnabled() ? isMachineInTurn() : null
8500
8617
  shadowEmit({
8501
8618
  kind: 'inbound',
8502
8619
  key: _shadowKey,
@@ -8548,7 +8665,12 @@ async function handleInbound(
8548
8665
  // no turn_end ever fires). With claudeBusyKeys, B sees true (A is
8549
8666
  // busy) → B is buffered correctly, AND the gate cleanly reopens
8550
8667
  // when A's turn_end deletes keyA → flush triggers → B delivered.
8551
- const turnInFlightAtReceipt = claudeBusyKeys.size > 0
8668
+ // PR3b-cutover: prefer the machine snapshot taken before the inbound
8669
+ // event advanced it (machineInTurnAtReceipt); null when the
8670
+ // kill-switch is off, in which case the legacy claudeBusyKeys read
8671
+ // stands. Both are "was a turn in flight at receipt", not a live
8672
+ // post-this-inbound read — see machineInTurnAtReceipt's comment.
8673
+ const turnInFlightAtReceipt = machineInTurnAtReceipt ?? (claudeBusyKeys.size > 0)
8552
8674
 
8553
8675
  const access = result.access
8554
8676
  const from = ctx.from!
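The two receipt-gate hunks above hinge on ordering: snapshot the machine before the `inbound` event advances it, and fall back to the legacy read through `??` when the snapshot is `null`. Here is a toy sketch of why the order matters. `ToyMachine`, `shouldBufferAtReceipt` and `shouldBufferReadAfterEmit` are hypothetical, and the machine models only the idle to in_turn edge.

```typescript
// Toy stand-in for the delivery machine: only the idle -> in_turn edge.
class ToyMachine {
  inTurn = false
  emitInbound(): void {
    if (!this.inTurn) this.inTurn = true // a fresh inbound starts a turn
  }
}

// Hypothetical receipt gate. With cutover on, read the machine BEFORE the
// emit; with it off the snapshot is null and the legacy busy-key count decides.
function shouldBufferAtReceipt(m: ToyMachine, cutover: boolean, busyKeyCount: number): boolean {
  const snapshot = cutover ? m.inTurn : null
  m.emitInbound()
  return snapshot ?? (busyKeyCount > 0)
}

// Wrong order, for contrast: a post-emit read sees this message's own turn.
function shouldBufferReadAfterEmit(m: ToyMachine): boolean {
  m.emitInbound()
  return m.inTurn
}
```

A fresh message on an idle machine is delivered by the first function but would be held (self-blocked) by the second.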
@@ -43,6 +43,39 @@ import {
43
43
  let state: State = initialState()
44
44
  const enabled = process.env.SWITCHROOM_DELIVERY_MACHINE_SHADOW !== '0'
45
45
 
46
+ // Phase 2b PR 3 — STAGED CUTOVER. When enabled, the gateway's
47
+ // "is a turn in flight?" gate reads this machine's global state
48
+ // instead of the PR3b `claudeBusyKeys` set. The machine tracks ONE
49
+ // `activeTurn` (single bridge) plus TTL `tick` expiry, so — unlike a
50
+ // per-delivery key set — it cannot accumulate orphan keys and wedge
51
+ // the gate "in-flight forever" (the gymbro/clerk 5-min dangle of
52
+ // 2026-05-28). Scope is the turn-in-flight GATE only; the poke ladder
53
+ // and perm-verdict effects stay imperative for a follow-up PR.
54
+ //
55
+ // Kill switch: `SWITCHROOM_DELIVERY_MACHINE_CUTOVER=0` reverts every
56
+ // gate to the legacy claudeBusyKeys read (zero behaviour change).
57
+ // Requires shadow mode ON — with shadow off the machine state is
58
+ // frozen and must NOT be read as authoritative.
59
+ const cutoverEnabled = enabled && process.env.SWITCHROOM_DELIVERY_MACHINE_CUTOVER !== '0'
60
+
61
+ /**
62
+ * True when the kill-switch leaves the delivery machine authoritative
63
+ * for the turn-in-flight gate. Gateway gate sites branch on this.
64
+ */
65
+ export function isDeliveryCutoverEnabled(): boolean {
66
+ return cutoverEnabled
67
+ }
68
+
69
+ /**
70
+ * Authoritative "is a turn currently in flight?" read for the gate.
71
+ * Maps the machine's global state to the boolean the legacy
72
+ * `claudeBusyKeys.size > 0` gate produced. `bridge_dead` and
73
+ * `bridge_alive_idle` are both "not in flight".
74
+ */
75
+ export function isMachineInTurn(): boolean {
76
+ return state.global.kind === 'bridge_alive_in_turn'
77
+ }
78
+
46
79
  /**
47
80
  * Run an event through the state machine in shadow mode. The machine
48
81
  * state advances, the predicted effects are LOGGED, but no I/O fires.
@@ -74,15 +74,24 @@ function urlHostPath(u) {
74
74
  export function computeLabel(toolName, input) {
75
75
  const i = input ?? {}
76
76
 
77
- // Tools whose labels are already handled elsewhere emit nothing so
78
- // the existing description / TodoWrite / sub-agent paths win.
77
+ // Bash / Task / ToolSearch / TodoWrite: previously emitted nothing
78
+ // (deferred to the session-JSONL description path). The draft-mirror
79
+ // now drives off THIS sidecar in real time (flush-independent), so we
80
+ // must label them here too — otherwise the most common tool (Bash)
81
+ // never reaches the live draft. Uses the model-authored `description`
82
+ // for Bash/Task, matching the gateway's describeToolUse rendering.
79
83
  switch (toolName) {
80
84
  case 'Bash':
85
+ return clip(String(i.description ?? ''), 70).trim() || 'Running a command'
81
86
  case 'Task':
82
- case 'Agent':
87
+ case 'Agent': {
88
+ const d = clip(String(i.description ?? ''), 60).trim()
89
+ return d ? `Delegating: ${d}` : 'Delegating to a sub-agent'
90
+ }
83
91
  case 'TodoWrite':
92
+ return 'Updating the plan'
84
93
  case 'ToolSearch':
85
- return null
94
+ return 'Finding the right tool'
86
95
  }
87
96
 
88
97
  // Built-in rule table.
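The `computeLabel` change above gives Bash, Task/Agent, TodoWrite and ToolSearch their own labels instead of `null`. A sketch of just that branch follows, so its fallbacks can be exercised in isolation. The hook's `clip` helper is not shown in this diff, so the version here (truncate and append an ellipsis) is an assumption, as is the name `labelForDeferredTool`.

```typescript
// Assumed stand-in for the hook's clip(); the real helper is not in this diff.
function clip(s: string, max: number): string {
  return s.length > max ? s.slice(0, max - 1) + '…' : s
}

// Sketch of the new switch branch only; other tools return null here,
// whereas the real computeLabel continues into its rule table.
function labelForDeferredTool(toolName: string, input?: Record<string, unknown>): string | null {
  const i = input ?? {}
  switch (toolName) {
    case 'Bash':
      // Model-authored description, else a generic fallback.
      return clip(String(i.description ?? ''), 70).trim() || 'Running a command'
    case 'Task':
    case 'Agent': {
      const d = clip(String(i.description ?? ''), 60).trim()
      return d ? `Delegating: ${d}` : 'Delegating to a sub-agent'
    }
    case 'TodoWrite':
      return 'Updating the plan'
    case 'ToolSearch':
      return 'Finding the right tool'
    default:
      return null
  }
}
```

Note that a Bash call without a `description` still produces a line, which is what lets the most common tool reach the live draft.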
@@ -93,6 +93,11 @@ export type SessionEvent =
93
93
  | { kind: 'dequeue' }
94
94
  | { kind: 'thinking' }
95
95
  | { kind: 'tool_use'; toolName: string; toolUseId?: string | null; input?: Record<string, unknown>; precomputedLabel?: string }
96
+ // Real-time tool label from the PreToolUse-hook sidecar — fires when the
97
+ // hook writes the label (synchronous at tool-call time), independent of
98
+ // the lazily-flushed transcript. The draft-mirror drives off THIS, not
99
+ // the flush-gated `tool_use`, so activity streams deterministically.
100
+ | { kind: 'tool_label'; toolUseId: string; label: string; toolName: string }
96
101
  | { kind: 'text'; text: string }
97
102
  | { kind: 'tool_result'; toolUseId: string; toolName: string | null; isError?: boolean; errorText?: string }
98
103
  | { kind: 'turn_end'; durationMs: number }
@@ -639,6 +644,13 @@ export function startSessionTail(config: SessionTailConfig): SessionTailHandle {
639
644
  try {
640
645
  const s = createToolLabelSidecar({ stateDir: stateDirForSidecar, sessionId })
641
646
  sidecars.set(sessionId, s)
647
+ // Real-time draft-mirror source: emit a `tool_label` event the moment
648
+ // the hook writes a label (flush-independent), so the gateway can
649
+ // stream the activity feed without waiting on the transcript flush.
650
+ // Subscribed once per sidecar (this is the only creation site).
651
+ s.onLabel((toolUseId, label, toolName) => {
652
+ rawOnEvent({ kind: 'tool_label', toolUseId, label, toolName })
653
+ })
642
654
  return s
643
655
  } catch (err) {
644
656
  log?.(`session-tail: sidecar create failed: ${(err as Error).message}`)
@@ -775,6 +787,12 @@ export function startSessionTail(config: SessionTailConfig): SessionTailHandle {
775
787
  }
776
788
  log?.(`session-tail: attached to ${file} (cursor=${cursor})`)
777
789
  }
790
+ // Eagerly create + subscribe the PreToolUse sidecar for this session
791
+ // NOW (on attach), not lazily on the first JSONL tool_use — otherwise
792
+ // the real-time `tool_label` source wouldn't exist until a flush-gated
793
+ // tool_use arrived, re-introducing the very lag the sidecar avoids.
794
+ const attachSid = sessionIdForFile(file)
795
+ if (attachSid) ensureSidecar(attachSid)
778
796
  try {
779
797
  watcher = watch(file, () => readNew())
780
798
  } catch (err) {
@@ -0,0 +1,93 @@
1
+ /**
2
+ * PR3b cutover — the turn-in-flight GATE now reads the delivery state
3
+ * machine (`isMachineInTurn`) instead of the PR3b `claudeBusyKeys` set.
4
+ *
5
+ * The bug this closes (gymbro/clerk, 2026-05-28): `claudeBusyKeys` is a
6
+ * per-delivery Set — every delivery `.add`s a key, but turn-end `.delete`s
7
+ * exactly one. When a turn-end is missed (or fires under a non-matching
8
+ * key) the set keeps an orphan, `size > 0` reads true forever, and EVERY
9
+ * subsequent inbound buffers as "held mid-turn" until the 5-min
10
+ * framework-fallback force-drains it.
11
+ *
12
+ * The machine cannot accumulate orphans: global state holds ONE
13
+ * `activeTurn`, so any matching turnEnd returns it to idle, and the TTL
14
+ * `tick` self-heals a missed turnEnd. These tests pin both the normal
15
+ * reopen and the dangle-recovery path on the accessors the gate reads.
16
+ */
17
+
18
+ import { describe, expect, it, beforeEach } from 'vitest'
19
+ import {
20
+ shadowEmit,
21
+ isMachineInTurn,
22
+ isDeliveryCutoverEnabled,
23
+ __shadowResetForTests,
24
+ } from '../gateway/inbound-delivery-machine-shadow.js'
25
+ import { TURN_TTL_MS, type ChatKey } from '../gateway/inbound-delivery-machine.js'
26
+
27
+ const KEY_A = '111:_' as ChatKey
28
+ const KEY_B = '222:_' as ChatKey
29
+
30
+ function inbound(key: ChatKey, at: number, msgId = 1) {
31
+ shadowEmit({ kind: 'inbound', key, msg: { msgId, isSteering: false, payload: null }, at })
32
+ }
33
+
34
+ describe('PR3b cutover gate accessors', () => {
35
+ beforeEach(() => __shadowResetForTests())
36
+
37
+ it('enabled by default (shadow on, no kill-switch in test env)', () => {
38
+ expect(isDeliveryCutoverEnabled()).toBe(true)
39
+ })
40
+
41
+ it('reads idle before any turn (bridge alive)', () => {
42
+ shadowEmit({ kind: 'bridgeUp', at: 1000 })
43
+ expect(isMachineInTurn()).toBe(false)
44
+ })
45
+
46
+ it('flips in-turn on a fresh inbound and reopens on turnEnd (the gate reopen)', () => {
47
+ shadowEmit({ kind: 'bridgeUp', at: 1000 })
48
+ inbound(KEY_A, 2000)
49
+ expect(isMachineInTurn()).toBe(true)
50
+ shadowEmit({ kind: 'turnEnd', key: KEY_A, at: 3000, outboundEmitted: true })
51
+ // Gate reopens immediately: this is the path claudeBusyKeys dangled on.
52
+ expect(isMachineInTurn()).toBe(false)
53
+ })
54
+
55
+ it('self-heals a MISSED turnEnd via the TTL tick (the dangle the fix kills)', () => {
56
+ shadowEmit({ kind: 'bridgeUp', at: 1000 })
57
+ // Turn A starts via enqueue (turnStart), then turn B starts before A's
58
+ // turnEnd ever lands — the orphan scenario. The machine keeps
59
+ // activeTurn=A (turnStart is a no-op on global when already in_turn),
60
+ // so a later turnEnd(B) does NOT match and would leave A dangling.
61
+ shadowEmit({ kind: 'turnStart', key: KEY_A, at: 2000 })
62
+ shadowEmit({ kind: 'turnStart', key: KEY_B, at: 3000 })
63
+ shadowEmit({ kind: 'turnEnd', key: KEY_B, at: 4000, outboundEmitted: true })
64
+ // Without tick, the gate would still read in-turn (activeTurn=A stuck).
65
+ expect(isMachineInTurn()).toBe(true)
66
+ // TTL tick past A's start clears the orphan and reopens the gate —
67
+ // the structural guarantee claudeBusyKeys lacked.
68
+ shadowEmit({ kind: 'tick', now: 2000 + TURN_TTL_MS + 1 })
69
+ expect(isMachineInTurn()).toBe(false)
70
+ })
71
+
72
+ it('does NOT clear a long-but-ACTIVE turn (modelOutbound suppression)', () => {
73
+ shadowEmit({ kind: 'bridgeUp', at: 1000 })
74
+ shadowEmit({ kind: 'turnStart', key: KEY_A, at: 2000 })
75
+ // Model is still streaming just before the TTL boundary.
76
+ const justBeforeTtl = 2000 + TURN_TTL_MS - 5_000
77
+ shadowEmit({ kind: 'modelOutbound', key: KEY_A, at: justBeforeTtl })
78
+ // Tick past TTL — but recent outbound is within the suppression window,
79
+ // so the turn is NOT cleared (parity with the imperative silence-poke).
80
+ shadowEmit({ kind: 'tick', now: 2000 + TURN_TTL_MS + 1 })
81
+ expect(isMachineInTurn()).toBe(true)
82
+ })
83
+
84
+ it('a buffered sibling inbound does not change the active turn', () => {
85
+ shadowEmit({ kind: 'bridgeUp', at: 1000 })
86
+ inbound(KEY_A, 2000) // fresh turn A
87
+ inbound(KEY_B, 2500) // mid-turn — buffered, must NOT start a new turn
88
+ expect(isMachineInTurn()).toBe(true)
89
+ shadowEmit({ kind: 'turnEnd', key: KEY_A, at: 3000, outboundEmitted: true })
90
+ // A ended; nothing else active → gate reopens so B can drain.
91
+ expect(isMachineInTurn()).toBe(false)
92
+ })
93
+ })
@@ -6,6 +6,10 @@ import {
6
6
  registerAndRender,
7
7
  verbForTool,
8
8
  describeToolUse,
9
+ appendActivityLine,
10
+ appendActivityLabel,
11
+ renderActivityFeed,
12
+ MIRROR_MAX_LINES,
9
13
  } from "../tool-activity-summary.js";
10
14
 
11
15
  describe("describeToolUse — friendly per-tool rendering (draft-mirror)", () => {
@@ -283,3 +287,63 @@ describe("registerAndRender — ergonomic full-pipeline call", () => {
283
287
  expect(s.firstToolName).toBeNull();
284
288
  });
285
289
  });
290
+
291
+ describe("appendActivityLine + renderActivityFeed — accumulating draft feed", () => {
292
+ it("accumulates distinct actions chronologically (newest last)", () => {
293
+ const lines: string[] = [];
294
+ expect(appendActivityLine(lines, "Read", { file_path: "a/gateway.ts" })).toBe(
295
+ "· Reading gateway.ts",
296
+ );
297
+ expect(appendActivityLine(lines, "mcp__hindsight__reflect", { query: "x" })).toBe(
298
+ "· Reading gateway.ts\n· Searching memory",
299
+ );
300
+ expect(appendActivityLine(lines, "Bash", { command: "ls", description: "List workspace" })).toBe(
301
+ "· Reading gateway.ts\n· Searching memory\n· List workspace",
302
+ );
303
+ });
304
+
305
+ it("collapses consecutive exact-duplicate lines", () => {
306
+ const lines: string[] = [];
307
+ appendActivityLine(lines, "Read", { file_path: "a.ts" });
308
+ appendActivityLine(lines, "Read", { file_path: "a.ts" }); // dup → collapsed
309
+ expect(lines).toEqual(["Reading a.ts"]);
310
+ });
311
+
312
+ it("returns null (no feed update) for surface tools", () => {
313
+ const lines: string[] = [];
314
+ expect(appendActivityLine(lines, "mcp__switchroom-telegram__reply", { text: "hi" })).toBeNull();
315
+ expect(lines).toEqual([]);
316
+ });
317
+
318
+ it("caps to the last MIRROR_MAX_LINES with a '+N earlier' header", () => {
319
+ const lines = Array.from({ length: 9 }, (_, i) => `Action ${i + 1}`);
320
+ const out = renderActivityFeed(lines)!;
321
+ expect(out.startsWith("· +3 earlier…\n")).toBe(true);
322
+ // Only the last 6 actions are shown.
323
+ expect(out).toContain("· Action 4");
324
+ expect(out).toContain("· Action 9");
325
+ expect(out).not.toContain("· Action 3\n");
326
+ });
327
+
328
+ it("renderActivityFeed returns null on empty", () => {
329
+ expect(renderActivityFeed([])).toBeNull();
330
+ });
331
+ });
332
+
333
+ describe("appendActivityLabel — precomputed label feed (tool_label path)", () => {
334
+ it("accumulates precomputed labels, dedups consecutive, ignores empty", () => {
335
+ const lines: string[] = [];
336
+ expect(appendActivityLabel(lines, "Searching memory")).toBe("· Searching memory");
337
+ expect(appendActivityLabel(lines, "List workspace")).toBe(
338
+ "· Searching memory\n· List workspace",
339
+ );
340
+ // consecutive dup collapses
341
+ appendActivityLabel(lines, "List workspace");
342
+ expect(lines).toEqual(["Searching memory", "List workspace"]);
343
+ // empty / whitespace → null, no push
344
+ expect(appendActivityLabel(lines, "")).toBeNull();
345
+ expect(appendActivityLabel(lines, " ")).toBeNull();
346
+ expect(appendActivityLabel(lines, undefined)).toBeNull();
347
+ expect(lines.length).toBe(2);
348
+ });
349
+ });
@@ -335,3 +335,68 @@ export function describeToolUse(
335
335
  return "Working…";
336
336
  }
337
337
  }
338
+
339
+ // ─── Accumulating activity feed (draft-mirror Phase 2) ──────────────────────
340
+ //
341
+ // Phase 1 showed only the latest action; this accumulates the turn's actions
342
+ // into a running feed — like Claude Code's own UI — streamed into the
343
+ // ephemeral draft and cleared at turn_end. Chronological (oldest first, newest
344
+ // last), consecutive exact-duplicates collapsed, capped to the most recent
345
+ // MIRROR_MAX_LINES with a "+N earlier" header so a heavy turn stays readable
346
+ // inside Telegram's compose-area draft.
347
+
348
+ export const MIRROR_MAX_LINES = 6;
349
+
350
+ /**
351
+ * Append a tool_use's friendly line to the running feed (mutates `lines`)
352
+ * and return the rendered draft body — or null when the tool is a surface
353
+ * tool / produced no line (caller skips the draft update).
354
+ *
355
+ * Dedups only consecutive identical lines (e.g. a burst of parallel Reads of
356
+ * the same file) so distinct actions are all preserved.
357
+ */
358
+ export function appendActivityLine(
359
+ lines: string[],
360
+ toolName: string,
361
+ input: Record<string, unknown> | undefined,
362
+ ): string | null {
363
+ const line = describeToolUse(toolName, input);
364
+ if (line == null) return null;
365
+ if (lines.length === 0 || lines[lines.length - 1] !== line) {
366
+ lines.push(line);
367
+ }
368
+ return renderActivityFeed(lines);
369
+ }
370
+
371
+ /**
372
+ * Render the accumulated feed as a plain-text block (one action per line).
373
+ * The caller HTML-escapes + wraps it for Telegram. Returns null when empty.
374
+ *
375
+ * Newest-last chronological order; capped to the last MIRROR_MAX_LINES with a
376
+ * dim "+N earlier" header when the turn ran longer.
377
+ */
378
+ export function renderActivityFeed(lines: string[]): string | null {
379
+ if (lines.length === 0) return null;
380
+ const shown = lines.slice(-MIRROR_MAX_LINES);
381
+ const hidden = lines.length - shown.length;
382
+ const body = shown.map((l) => `· ${l}`).join("\n");
383
+ return hidden > 0 ? `· +${hidden} earlier…\n${body}` : body;
384
+ }
385
+
386
+ /**
387
+ * Like appendActivityLine, but for a pre-computed label (from the
388
+ * real-time PreToolUse sidecar / `tool_label` event) — the hook already
389
+ * rendered the friendly text, so we skip describeToolUse. Returns the
390
+ * rendered feed, or null when the label is empty.
391
+ */
392
+ export function appendActivityLabel(
393
+ lines: string[],
394
+ label: string | undefined,
395
+ ): string | null {
396
+ const l = (label ?? "").trim();
397
+ if (l.length === 0) return null;
398
+ if (lines.length === 0 || lines[lines.length - 1] !== l) {
399
+ lines.push(l);
400
+ }
401
+ return renderActivityFeed(lines);
402
+ }
@@ -40,8 +40,11 @@ export interface ToolLabelRow {
40
40
  export interface ToolLabelSidecar {
41
41
  /** Synchronous label lookup. */
42
42
  getLabel(toolUseId: string): string | undefined
43
- /** Subscribe to "label arrived" notifications. */
44
- onLabel(cb: (toolUseId: string, label: string) => void): () => void
43
+ /** Subscribe to "label arrived" notifications. Fires once per newly
44
+ * seen tool_use_id, in real time (~pollMs after the hook's appendFileSync),
45
+ * independent of when the claude transcript flushes. `toolName` lets
46
+ * subscribers filter surface tools (reply/react) from a live feed. */
47
+ onLabel(cb: (toolUseId: string, label: string, toolName: string) => void): () => void
45
48
  /** Force a re-poll (tests). */
46
49
  poll(): void
47
50
  /** Stop polling and release resources. */
@@ -63,7 +66,7 @@ export interface SidecarOptions {
63
66
  export function createToolLabelSidecar(opts: SidecarOptions): ToolLabelSidecar {
64
67
  const path = join(opts.stateDir, `tool-labels-${opts.sessionId}.jsonl`)
65
68
  const labels = new Map<string, string>()
66
- const subscribers = new Set<(toolUseId: string, label: string) => void>()
69
+ const subscribers = new Set<(toolUseId: string, label: string, toolName: string) => void>()
67
70
  let offset = 0
68
71
  let stopped = false
69
72
 
@@ -84,13 +87,18 @@ export function createToolLabelSidecar(opts: SidecarOptions): ToolLabelSidecar {
84
87
  } catch {
85
88
  continue
86
89
  }
87
- if (!row || typeof row.tool_use_id !== 'string' || typeof row.label !== 'string') continue
90
+ if (
91
+ !row ||
92
+ typeof row.tool_use_id !== 'string' ||
93
+ typeof row.label !== 'string' ||
94
+ typeof row.tool_name !== 'string'
95
+ ) continue
88
96
  // First write wins — sidecar lines are append-only and we don't
89
97
  // expect duplicates, but if one lands we keep the earliest.
90
98
  if (labels.has(row.tool_use_id)) continue
91
99
  labels.set(row.tool_use_id, row.label)
92
100
  for (const cb of subscribers) {
93
- try { cb(row.tool_use_id, row.label) } catch { /* ignore */ }
101
+ try { cb(row.tool_use_id, row.label, row.tool_name) } catch { /* ignore */ }
94
102
  }
95
103
  }
96
104
  }
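The hunk above tightens row validation (tool_name is now required) and threads toolName through to subscribers. A self-contained sketch of that rule follows; file polling is replaced by a hypothetical `ingest(chunk)` call, and the surface-tool names "reply"/"react" are taken from the onLabel doc comment, so the exact names the real hook writes are an assumption.

```typescript
type LabelCb = (toolUseId: string, label: string, toolName: string) => void;

// Hypothetical in-memory stand-in for createToolLabelSidecar: no file I/O,
// JSONL rows are passed to ingest() instead of being polled from disk.
function createInMemorySidecar() {
  const labels = new Map<string, string>();
  const subscribers = new Set<LabelCb>();

  function ingest(chunk: string): void {
    for (const line of chunk.split("\n")) {
      if (line.trim().length === 0) continue;
      let row: any;
      try {
        row = JSON.parse(line);
      } catch {
        continue; // skip malformed lines, as the real poller does
      }
      if (
        !row ||
        typeof row.tool_use_id !== "string" ||
        typeof row.label !== "string" ||
        typeof row.tool_name !== "string"
      ) continue;
      if (labels.has(row.tool_use_id)) continue; // first write wins
      labels.set(row.tool_use_id, row.label);
      for (const cb of subscribers) {
        try { cb(row.tool_use_id, row.label, row.tool_name); } catch { /* ignore */ }
      }
    }
  }

  return {
    getLabel: (id: string) => labels.get(id),
    onLabel(cb: LabelCb): () => void {
      subscribers.add(cb);
      return () => { subscribers.delete(cb); };
    },
    ingest,
  };
}

// Assumed surface-tool names, taken from the onLabel doc comment.
const SURFACE_TOOLS = new Set(["reply", "react"]);

const sidecar = createInMemorySidecar();
const feedLabels: string[] = [];
const unsubscribe = sidecar.onLabel((_id, label, toolName) => {
  // A live activity feed skips surface tools; the label is still stored.
  if (!SURFACE_TOOLS.has(toolName)) feedLabels.push(label);
});

sidecar.ingest([
  JSON.stringify({ tool_use_id: "t1", label: "Reading config.ts", tool_name: "Read" }),
  JSON.stringify({ tool_use_id: "t1", label: "duplicate", tool_name: "Read" }),
  JSON.stringify({ tool_use_id: "t2", label: "Sending reply", tool_name: "reply" }),
  JSON.stringify({ tool_use_id: "t3", label: "no tool_name" }),
  "not json",
].join("\n"));
unsubscribe();
```

Note the design consequence of the stricter check: a row written by an older hook without tool_name is now dropped entirely, so getLabel returns undefined for it.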
@@ -215,9 +215,13 @@ const CC2_CASES: readonly CC2Case[] = [
215
215
  },
216
216
  {
217
217
  name: "long-running with planned check-ins",
218
+ // Use python time.sleep, NOT the `sleep` command — Claude Code's bash
219
+ // sandbox blocks standalone `sleep` ("foreground sleep is sandboxed
220
+ // away"), which made this case un-runnable (agent replied instantly).
218
221
  prompt:
219
- "Run `bash` with `sleep 5 && echo step1`, send a brief update, " +
220
- "then `sleep 5 && echo step2`, send another brief update, then " +
222
+ "Run `bash` with `python3 -c 'import time; time.sleep(5)'` then echo " +
223
+ "step1, send a brief update, then `python3 -c 'import time; " +
224
+ "time.sleep(5)'` then echo step2, send another brief update, then " +
221
225
  "send a final 'done' as your answer.",
222
226
  },
223
227
  ];
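This change swaps the `sleep N` command for a python3 one-liner because the bash sandbox blocks standalone `sleep`. A hypothetical helper (not part of the diff) that builds that command string, rejecting non-numeric durations so nothing unexpected is spliced into the shell text:

```typescript
// Hypothetical helper: builds a foreground wait that Claude Code's bash
// sandbox does not special-case (unlike a standalone `sleep`).
function pythonSleepCommand(seconds: number): string {
  if (!Number.isFinite(seconds) || seconds < 0) {
    throw new RangeError(`invalid sleep duration: ${seconds}`);
  }
  return `python3 -c 'import time; time.sleep(${seconds})'`;
}

const step1 = `${pythonSleepCommand(5)} && echo step1`;
console.log(step1);
```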
@@ -262,12 +266,27 @@ async function assertMidTurnSilent(
262
266
  )
263
267
  .join("\n");
264
268
 
265
- const last = collected[collected.length - 1];
266
- expect(last.silent, `final answer was silent won't ping. Trail:\n${trail}`).toBe(
267
- false,
268
- );
269
-
270
- const midTurn = collected.slice(0, -1);
269
+ // The model habitually emits a trailing trivial confirmation ("Done.",
270
+ // "Sent.", "OK") as a separate SILENT message AFTER its real pinged
271
+ // answer. That's pacing noise (the turn-pacing directive discourages
272
+ // it), not the final answer — so don't treat it as the
273
+ // "final-answer-must-ping" target. Find the last SUBSTANTIVE message
274
+ // and assert that one pinged; trailing trivial confirmations are
275
+ // ignored for this invariant (they're correctly silent anyway).
276
+ const TRIVIAL_TAIL = /^(done|sent|ok|okay|ack|got it|hope (that|this) helps)\b[.! ]*$/i;
277
+ const isTrivial = (m: ObservedMessage) => TRIVIAL_TAIL.test(m.text.trim());
278
+ let finalIdx = collected.length - 1;
279
+ while (finalIdx > 0 && isTrivial(collected[finalIdx])) finalIdx--;
280
+ const finalAnswer = collected[finalIdx];
281
+ expect(
282
+ finalAnswer.silent,
283
+ `final substantive answer was silent — won't ping. Trail:\n${trail}`,
284
+ ).toBe(false);
285
+
286
+ // Everything BEFORE the final substantive answer must be silent
287
+ // (mid-turn updates ping-free). Trailing trivial confirmations after
288
+ // it are already silent and are not "mid-turn" — exclude them too.
289
+ const midTurn = collected.slice(0, finalIdx);
271
290
  const loudMidTurn = midTurn.filter((m) => !m.silent);
272
291
  expect(
273
292
  loudMidTurn.length,
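The revised assertion walks back past trailing trivial confirmations before checking which message pinged. A self-contained sketch of that split, assuming a minimal ObservedMessage shape (the real type carries more fields) and at least one collected message:

```typescript
// Assumed minimal message shape; the real ObservedMessage carries more fields.
interface ObservedMessage {
  text: string;
  silent: boolean;
}

// Same pattern as the diff's TRIVIAL_TAIL.
const TRIVIAL_TAIL = /^(done|sent|ok|okay|ack|got it|hope (that|this) helps)\b[.! ]*$/i;
const isTrivial = (m: ObservedMessage) => TRIVIAL_TAIL.test(m.text.trim());

// Walk back past trailing trivial confirmations to the last substantive
// message. Index 0 is the floor, so an all-trivial turn checks the first one.
function splitTurn(collected: ObservedMessage[]) {
  let finalIdx = collected.length - 1;
  while (finalIdx > 0 && isTrivial(collected[finalIdx])) finalIdx--;
  return {
    finalAnswer: collected[finalIdx], // must ping (silent === false)
    midTurn: collected.slice(0, finalIdx), // must all be silent
    trailing: collected.slice(finalIdx + 1), // ignored by the invariant
  };
}

const turn = splitTurn([
  { text: "On it, checking the logs", silent: true },
  { text: "Found it: the cron entry was disabled.", silent: false },
  { text: "Done.", silent: true },
  { text: "Hope that helps!", silent: true },
]);
console.log(turn.finalAnswer.text);
```

The anchored `^...$` pattern matters: "Done. The file is saved." is not trivial, so a real answer that merely starts with "Done" is still treated as the final substantive message.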
@@ -334,12 +353,19 @@ async function assertSilencePokeFires(
334
353
  // Single bash call so the poke piggybacks the single tool result.
335
354
  // Without the explicit "no replies" instruction the model might
336
355
  // soft-commit; that resets the silence clock but a single >75s
337
- // sleep still pushes post-commit silence past the threshold.
356
+ // wait still pushes post-commit silence past the threshold.
357
+ //
358
+ // Use python time.sleep, NOT the `sleep` command — Claude Code's bash
359
+ // sandbox blocks standalone `sleep` ("foreground sleep is sandboxed
360
+ // away to prevent burning cache windows"), so a `sleep 80` prompt made
361
+ // the agent reply instantly instead of going silent, breaking this
362
+ // case. python3 time.sleep is a genuine foreground wait the sandbox
363
+ // doesn't special-case.
338
364
  const prompt =
339
- `Run exactly one Bash tool call: \`sleep ${sleepSeconds}\`. Do NOT ` +
340
- `send any reply before the sleep completes — no soft commit, no ` +
341
- `mid-turn updates. When the sleep returns, send one brief 'done' ` +
342
- `reply.`;
365
+ `Run exactly one Bash tool call: \`python3 -c 'import time; ` +
366
+ `time.sleep(${sleepSeconds})'\`. Do NOT send any reply before it ` +
367
+ `completes — no soft commit, no mid-turn updates. When it returns, ` +
368
+ `send one brief 'done' reply.`;
343
369
 
344
370
  await scenario.sendDM(prompt);
345
371