npm - oxtail - Versions diffs - 0.14.1 → 0.15.0 - Mend

oxtail 0.14.1 → 0.15.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (7)

package/README.md +26 -6
package/assets/pretooluse.sh +12 -0
package/dist/mailbox.js +73 -0
package/dist/pending-ask.js +167 -0
package/dist/server.js +194 -33
package/package.json +1 -1
package/scripts/hook-constants.mjs +5 -1

package/README.md CHANGED

@@ -67,7 +67,7 @@ Contributing? `git clone https://github.com/d4j3y2k/oxtail && cd oxtail && npm i
 - `set_my_state` — write a small "state card" onto this session's registry entry so peers can see what we're doing without reading our transcript. v1 surfaces a single field, `purpose` (≤200 chars).
 - `send_message` — **fire-and-forget** message to a peer. Target is a tmux session name or a raw `client_session_id` UUID. Body ≤ 8KB. Delivery is async via the peer's mailbox file. A plain message does **not** wake an idle peer; pass `wake: "auto"` to nudge one (state-gated — see [Waking an idle peer](#waking-an-idle-peer)). Replies to `ask_peer` should pass `reply_to: "<request_id>"` when the inbound message carries a `request_id` — and a reply **auto-wakes the requester by default** (strictly gated; `wake: "off"` opts out). (v0.5+)
 - `read_my_messages` — drain this session's mailbox and return any queued messages. Messages include `from_session_id`, server-stamped `origin: "peer"`, and optional `request_id` / `reply_to`. Codex peers (and unhooked Claude Code) poll this; Claude Code peers with the hooks installed see messages mid-turn (PreToolUse) or at turn end (Stop) instead. (v0.5+)
-- `ask_peer` — **delegate-and-wait**. Enqueues a message with a `request_id` and blocks server-side until the peer replies with `send_message({ reply_to: request_id })` or the timeout elapses. Default timeout is 45s (`OXTAIL_ASK_PEER_TIMEOUT_MS`), and each call may pass `timeout_ms`. New peers use strict `reply_to` correlation; legacy/no-capability peers fall back to best-effort first-message matching and the response reports `correlation: "uncorrelated"`. That legacy path may stale-match old same-peer chatter, so callers should treat `uncorrelated` as compatibility-only. Use `send_message` for fire-and-forget. (v0.7+)
+- `ask_peer` — **delegate-and-wait**. Enqueues a message with a `request_id` and blocks server-side until the peer replies with `send_message({ reply_to: request_id })` or the timeout elapses. Default timeout is 60s (`OXTAIL_ASK_PEER_TIMEOUT_MS`), and each call may pass `timeout_ms`. New peers use strict `reply_to` correlation; legacy/no-capability peers fall back to best-effort first-message matching and the response reports `correlation: "uncorrelated"`. That legacy path may stale-match old same-peer chatter, so callers should treat `uncorrelated` as compatibility-only. **Durable on timeout (v0.15+):** if the wait elapses, the request is recorded as a pending obligation, so when the peer's reply finally arrives — minutes or hours later — it *wakes the requester back* (`wake_reason: "late_reply_to_pending"`) instead of landing silently. That makes `ask_peer` safe for long-running delegations: let it time out, end the turn, get pulled back when the work is done. Use `send_message` for fire-and-forget. (v0.7+)
 - `reply_to_message` — **reply by `message_id`**. The atomic, correlation-safe alternative to hand-wiring `send_message`'s `target` + `reply_to`: pass the `message_id` the hook or `read_my_messages` showed you and the server looks the inbound envelope up in this session's durable **received-ledger**, derives the reply target (the original sender), carries `reply_to: request_id` when the inbound was an `ask_peer` (keeping the exchange correlated), and stamps `source_message_id`. Replying to a plain `send_message` works too — it just omits `reply_to`. Ownership is structural (you can only reply to a message delivered to *you*); fail-closed on an unknown/aged-out id. Same wake semantics as `send_message`, including the wake-on-reply default. (v0.13+)
 - `register_my_session` — pin this MCP server's `session_id` directly. Kept for debugging; prefer `claim_session`.
 - `get_my_session` — return this MCP server's own registry entry plus a per-strategy detection diagnosis. Useful for debugging.
@@ -163,7 +163,9 @@ send_message({ target: "<requester>", body: "...", reply_to: "<request_id>" })
 The reply path is deliberately **stricter** than explicit `wake: "auto"`. It fires only when the target is **freshly idle** — an `idle` activity marker newer than `OXTAIL_AUTOWAKE_FRESH_IDLE_MS` (default 5 min). Stale, unknown, missing, or busy state yields `skipped_no_fresh_idle` (no best-effort wake — typing unprompted into a terminal that may be unattended is the risk we refuse to take). Two more guards bound it: a **per-target rate limit** (`OXTAIL_AUTOWAKE_MIN_INTERVAL_MS`, default 4s → `skipped_rate_limited`) since one wake already drains the whole mailbox, and a **one-wake dedupe** keyed on `(session_id, reply_to)` (`skipped_deduped`) so a duplicate or late hook drain of the same reply can't re-fire. If the dedupe/rate store is somehow unwritable the wake degrades to `skipped_store_error` rather than failing the (already-delivered) message. The env kill-switch `OXTAIL_AUTOWAKE=off` disables reply auto-wake entirely (`wake_status: "disabled"`). Every outcome that reaches the gate surfaces a `wake_status`; the reply path also stamps `wake_reason: "reply_to_default"` (present even on a resolve error like `ambiguous-target`, where there's no single target to wake).
-**Coverage (which requesters this reaches).** The fresh-idle gate keys on the requester's busy/idle activity marker, which only the Claude Code hooks maintain. So wake-on-reply currently closes the stranding for a **hooked Claude Code requester** (the originally-observed case: a peer's async reply to an idle Claude session). A **Codex** requester — or a Claude requester without the hooks installed — has no idle marker, so a reply with `wake` unset returns `skipped_no_fresh_idle` and is **not** auto-woken; reach it with an explicit `wake: "auto"`, which always takes the lenient wake path (idle/unknown/stale all wake; only a fresh-`busy` peer is skipped) and bypasses the strict fresh-idle gate even for a reply. Closing the Codex/unhooked-requester direction *by default* needs a requester-side waiter signal (`expects_reply`), which is the next slice — a blind `unknown ⇒ wake` default is deliberately avoided because it reintroduces the double-wake-an-active-waiter risk this gate exists to prevent.
+**Coverage (which requesters this reaches).** The fresh-idle gate keys on the requester's busy/idle activity marker, which only the Claude Code hooks maintain. So wake-on-reply currently closes the stranding for a **hooked Claude Code requester** (the originally-observed case: a peer's async reply to an idle Claude session). A **Codex** requester — or a Claude requester without the hooks installed — has no idle marker, so a reply with `wake` unset returns `skipped_no_fresh_idle` and is **not** auto-woken; reach it with an explicit `wake: "auto"`, which always takes the lenient wake path (idle/unknown/stale all wake; only a fresh-`busy` peer is skipped) and bypasses the strict fresh-idle gate even for a reply.
+For the **`ask_peer` case specifically**, the Codex/unhooked-requester direction is now closed *by default* (v0.15+, see [Durable `ask_peer`](#durable-ask_peer-long-efforts) below): a timed-out `ask_peer` records a durable **pending-ask** keyed on the requester's `session_id` + `request_id`, and the matching late reply takes the lenient wake path regardless of any idle marker — so even a markerless idle Codex requester is pulled back. This is exactly the requester-side waiter signal the blind `unknown ⇒ wake` default was avoided for: it's evidence the requester *explicitly asked and is waiting*, so it can't double-wake an unrelated active turn.
 **Codex and the wake matrix.** The send-keys wake needs a tmux pane. A Codex peer running **outside tmux** has none, so it returns `wake_status: "skipped_no_target"` — its idle delivery stays poll-based (`read_my_messages`). Run Codex **inside a tmux pane** to get symmetric idle-wake; the routing already handles the Codex paste-burst case.
@@ -238,17 +240,34 @@ All wake paths funnel through one place, which **coalesces** rapid repeat wakes
 ### Constraints
 - The target peer must have a registered `client.session_id`. Codex peers must call `claim_session` / `register_my_session` first; without that, `ask_peer` returns `error: "peer-has-no-session-id"` rather than guessing.
-- Timeout defaults to 45000ms (conservative under typical MCP-client tool-call abort windows). Pass `timeout_ms` on a call when a specific delegation needs a different bound; max 300000ms.
+- Timeout defaults to 60000ms — enough headroom for a slower multi-tool-call peer reply (e.g. a Codex peer running `set_my_state` + `reply_to_message` + composing a report, observed ~46s) while staying under both known callers' tool-call abort windows (Claude Code is clean to ~60s; Codex aborts ~120s). Pass `timeout_ms` on a call when a specific delegation needs a different bound; max 300000ms.
 ### Tuning the timeout
-If `ask_peer` returns an abort error before its built-in 45s timeout fires, your MCP client's tool-call ceiling is lower than 45s. Override the bound at server startup:
+If `ask_peer` returns an abort error before its built-in 60s timeout fires, your MCP client's tool-call ceiling is lower than 60s. Override the bound at server startup:
 ```sh
 OXTAIL_ASK_PEER_TIMEOUT_MS=30000 npx -y oxtail@0.10.1
 ```
-The server reads the env var once at boot and uses it as the fixed timeout for all `ask_peer` calls in that session. Values must be positive numbers; anything else falls back to the 45000ms default.
+The server reads the env var once at boot and uses it as the fixed timeout for all `ask_peer` calls in that session. Values must be positive numbers; anything else falls back to the 60000ms default.
+### Durable `ask_peer` (long efforts)
+The blocking wait is a *short* primitive (bounded by the client's tool-call abort window, ~60s). A real task can take minutes or hours — far longer than any wait can block. So `ask_peer` decouples the **wait** from the **delivery of the answer**:
+- On timeout (for a correlated peer + a claimed requester), the request is recorded as a durable **pending-ask** at `~/.oxtail/pending-ask/p-<hash(session_id, request_id)>`, keyed on the *requester's* `session_id` + `request_id`. A `recordPendingAsk` runs **before** one final authoritative union-drain of the requester's mailbox (write-before-final-drain), so a reply that lands in the poll-vs-deadline gap is returned immediately, and a reply that arrives later finds the persisted record.
+- When that reply eventually arrives, `resolveSendWake` finds the matching pending-ask, **consumes it** (atomic `unlink`, single-winner — a duplicate/re-delivered reply can't re-fire), and takes the **lenient** wake path (`wake_reason: "late_reply_to_pending"`). Because the record is *proof the requester explicitly asked and is waiting*, the wake fires regardless of the 5-min fresh-idle window — and reaches a **markerless idle Codex** requester that the strict reply-default would skip. It also stamps the autowake dedupe for `(session_id, request_id)` so a later duplicate can't strict-wake via the fresh-idle fallback.
+- `wake: "off"` still **consumes** the record (the obligation is satisfied — leaving it would let a later duplicate wake and violate the explicit off) but suppresses the wake (`wake_reason: "late_reply_to_pending_suppressed"`). The automatic (wake-unset) path honors `OXTAIL_AUTOWAKE=off` (`wake_status: "disabled"`); an explicit `wake: "auto"` intentionally does not.
+- The reply drain is a **union across the requester's sibling MCP-child pids** (`drainMatchingReplyMany`), mirroring `read_my_messages` — a dual-scope requester's reply may land in a sibling pid, not the one that blocked in `ask_peer`.
+Records are honored for `OXTAIL_PENDING_ASK_TTL_MS` (default 1h, sized for long efforts): a reply after that still delivers durably via `read_my_messages` but won't fire the pull-back wake (`consumePendingAsk` is TTL-aware — it removes an over-TTL record without waking). GC is **opportunistic** — abandoned records (a reply that never came) are swept when a later `ask_peer` times out, not on a wall-clock timer; the files are tiny, and a reply always cleans up its own record on arrival.
+**The pattern:** `ask_peer` a long task → let it return `timed_out: true` → end your turn → get woken when the answer lands. Pair with a generous `OXTAIL_ACTIVITY_BUSY_TTL_MS` if your turns run long (see below).
+### Keeping a long turn marked busy
+`wake: "auto"` skips a peer that is **freshly `busy`** (mid-turn — its hooks deliver, so a keystroke wake would be noise). The `busy` marker is set at turn start (UserPromptSubmit hook) and **re-stamped on every tool call** (PreToolUse hook, v0.15+), so a long *active* turn stays fresh and never invites a spurious wake. A turn that stops making tool calls — one giant single tool call, or a crash without a clean Stop — ages past `OXTAIL_ACTIVITY_BUSY_TTL_MS` (default 10 min) and then *does* wake, which is the intended stale-busy → recovery behavior. Widen the TTL for deployments with very long single-tool-call turns.
 ### Recommended permissions for autonomous agent-to-agent collaboration
@@ -306,8 +325,9 @@ A scheduled CI job (`.github/workflows/codex-drift.yml`, also runnable on demand
 ## Status
-v0.13.0. Pushes the autonomous peer-messaging matrix toward zero human relay, hardens the wake path, then makes correlated replies atomic.
+v0.15.0. Pushes the autonomous peer-messaging matrix toward zero human relay, hardens the wake path, makes correlated replies atomic, and makes delegation durable across long (minutes-to-hours) efforts.
+- **Durable `ask_peer` + long-effort liveness (v0.15.0).** A timed-out `ask_peer` records a pending obligation (`~/.oxtail/pending-ask/`, keyed on requester `session_id` + `request_id`, written *before* a final authoritative union-drain), so the peer's reply — arriving minutes or hours later — *wakes the requester back* (`wake_reason: "late_reply_to_pending"`) instead of landing silently. The pull-back takes the lenient wake path, so it reaches even a markerless idle Codex requester — closing the last wake-on-reply asymmetry. The reply drain unions the requester's sibling MCP-child pids (and sweeps migrate-crash duplicates) so a dual-scope reply can't strand. Separately, the `PreToolUse` hook now re-stamps the `busy` marker every tool call, so a long *active* turn never reads as stale-busy and invites a spurious wake. New env: `OXTAIL_PENDING_ASK_TTL_MS` (1h), `OXTAIL_ACTIVITY_BUSY_TTL_MS` (10m); `ask_peer` default timeout 45s→60s.
 - **Reply by id (v0.13.0).** `reply_to_message(message_id, body)` removes the manual `target` + `reply_to` rewiring that silently degraded a correlated exchange into loose mailbox traffic: the server looks the inbound envelope up in a durable per-session **received-ledger** (`~/.oxtail/received/<hash(session_id)>.jsonl`), derives the reply target and `reply_to` itself, and enforces ownership structurally (you can only reply to a message delivered to you). The ledger is written *before* the mailbox line is visible — so a handle the hook displays is always resolvable even though both delivery paths destroy the queue entry once it is handed off. Fail-closed on an unknown/aged-out id.
 - **Wake-on-reply (v0.11.0).** A reply — `send_message` with `reply_to` — auto-wakes a freshly-idle requester by default, so an awaited answer doesn't strand an idle peer. Strictly gated (fresh-idle only, per-target rate limit, one-wake dedupe, `OXTAIL_AUTOWAKE=off` kill-switch). `wake:"off"` opts out; explicit `wake:"auto"` is the escape hatch for a requester without an idle marker (Codex / hookless Claude).
 - **Wake hardening (v0.12.0).** Wake keystrokes only ever target the pane the process tree confirms hosts the peer's `server_pid` — never a self-written `tmux_pane`/`tmux_session`, and registry entries whose `server_pid` doesn't match their filename are rejected. Rapid repeat wakes to one peer are coalesced (`skipped_debounced`). `oxtail diagnose` summarizes wake outcomes from `MCP_TRACE_FILE`, and a scheduled CI job flags drift in Codex's paste-burst window before it can break the wake.
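
The README's durable `ask_peer` rules fit in a small in-memory model. The rules are: write the pending-ask before one final drain, consume it exactly once when the late reply lands, and pick the wake from the `wake` argument and `OXTAIL_AUTOWAKE`. This is a hypothetical sketch, not oxtail's code. A `Set` stands in for `~/.oxtail/pending-ask/` and an array stands in for the requester's union of mailboxes. `onAskTimeout`, `onPeerReply` and the `path` labels are illustrative names. TTLs, locking, the `from_session_id` match, the dedupe, and the strict fresh-idle gate itself are left out. `wake_reason` values appear only where the README names them.

```javascript
// Hypothetical in-memory model of durable ask_peer (not oxtail's API).
// `pending` stands in for ~/.oxtail/pending-ask/, `mailbox` for the requester's
// union of sibling mailboxes.
function makeWorld() {
  return { pending: new Set(), mailbox: [] };
}

// Unambiguous (session_id, request_id) key, like the record path's JSON pair.
const pendingKey = (sessionId, requestId) => JSON.stringify([sessionId, requestId]);

// Requester side: runs when the blocking wait elapses.
function onAskTimeout(world, sessionId, requestId) {
  const key = pendingKey(sessionId, requestId);
  world.pending.add(key); // 1. record the obligation BEFORE the final drain
  const i = world.mailbox.findIndex((m) => m.reply_to === requestId);
  if (i === -1) return { timed_out: true };
  // 2. The reply landed in the poll-vs-deadline gap: return it and drop the
  //    fresh record, since nobody needs to be woken later.
  const [reply] = world.mailbox.splice(i, 1);
  world.pending.delete(key);
  return { timed_out: false, reply };
}

// Peer side: a reply arrives, possibly hours later. Delivery always happens.
// The record is consumed (Set#delete returns true for one caller only) even
// when wake is "off", so a later duplicate cannot take the late path either.
function onPeerReply(world, requesterSessionId, requestId, body, { wake, killSwitchOff = false } = {}) {
  world.mailbox.push({ reply_to: requestId, body });
  const consumed = world.pending.delete(pendingKey(requesterSessionId, requestId));
  if (consumed) {
    if (wake === "off") return { path: "none", wake_reason: "late_reply_to_pending_suppressed" };
    if (wake !== "auto" && killSwitchOff) return { path: "none", wake_status: "disabled" };
    return { path: "lenient", wake_reason: "late_reply_to_pending" };
  }
  if (wake === "off") return { path: "none" };
  if (wake === "auto") return { path: "lenient" };
  if (killSwitchOff) return { path: "none", wake_status: "disabled" };
  // The real strict gate would also apply fresh-idle, rate limit and dedupe here.
  return { path: "strict_fresh_idle", wake_reason: "reply_to_default" };
}
```

Swapping steps 1 and 2 reopens the gap the README closes. A reply enqueued after the drain but before the record exists would miss the drain and find no record to consume. For a requester without a fresh idle marker, it would then land without a pull-back wake.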

package/assets/pretooluse.sh CHANGED

@@ -42,6 +42,18 @@ if [ ! -t 0 ]; then
 fi
 [ -z "$sid" ] && exit 0
+# Re-stamp "busy" on EVERY tool call (before any early-exit below) so a long,
+# ACTIVE turn keeps a fresh marker and never reads as stale-busy (>TTL) to a
+# peer's wake:auto. UserPromptSubmit sets "busy" once at turn start; without this
+# a turn outrunning the TTL would invite a spurious keystroke wake into a working
+# agent. The Stop hook flips this back to "idle" on a real stop. Keyed by
+# session_id; sanitization MUST match the server's activitySessionKey().
+safe_sid=$(printf '%s' "$sid" | tr -c 'A-Za-z0-9_-' '_')
+[ -n "$safe_sid" ] && {
+  mkdir -p "$HOME/.oxtail/activity" 2>/dev/null || true
+  printf 'busy' > "$HOME/.oxtail/activity/$safe_sid" 2>/dev/null || true
+}
 sessions_dir="$HOME/.oxtail/sessions"
 mailboxes_dir="$HOME/.oxtail/mailboxes"
 [ -d "$sessions_dir" ] || exit 0
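
The hook's comment requires its `tr` sanitization to match the server's `activitySessionKey()`. The re-stamp exists to stop the lenient `wake: "auto"` gate from treating a working agent as stale-busy. Both sides can be sketched in JavaScript with illustrative names that are not the server's.

- `sanitizeSessionKey` assumes GNU `tr`, which works byte by byte, so each non-matching UTF-8 byte becomes one `_`. BSD `tr` in a UTF-8 locale may count characters instead; UUID session ids come out the same either way.
- `lenientWakeDecision` assumes the marker's content is `busy` or `idle`, that its mtime is the stamp time, and that the README's 10-minute `OXTAIL_ACTIVITY_BUSY_TTL_MS` default applies.

```javascript
// Hypothetical mirror of the hook's `tr -c 'A-Za-z0-9_-' '_'`. GNU tr is
// byte-oriented, so every UTF-8 byte outside the set maps to "_".
function sanitizeSessionKey(sid) {
  let out = "";
  for (const b of Buffer.from(sid, "utf8")) {
    const keep =
      (b >= 0x30 && b <= 0x39) || // 0-9
      (b >= 0x41 && b <= 0x5a) || // A-Z
      (b >= 0x61 && b <= 0x7a) || // a-z
      b === 0x5f || // _
      b === 0x2d; // -
    out += keep ? String.fromCharCode(b) : "_";
  }
  return out;
}

// Assumed default, taken from the README's OXTAIL_ACTIVITY_BUSY_TTL_MS (10 min).
const BUSY_TTL_MS = 10 * 60 * 1000;

// Hypothetical lenient gate: only a FRESH busy marker suppresses the wake;
// idle, stale-busy, or a missing marker (null) all fire.
// marker = { state: "busy" | "idle", mtimeMs } or null.
function lenientWakeDecision(marker, nowMs, busyTtlMs = BUSY_TTL_MS) {
  const freshBusy = marker !== null && marker.state === "busy" && nowMs - marker.mtimeMs < busyTtlMs;
  return freshBusy ? "skipped_busy" : "fire";
}
```

With the per-tool-call re-stamp, an active turn's marker is only seconds old, so it keeps returning `skipped_busy`. Only a turn that stops calling tools drifts past the TTL and becomes wakeable.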

package/dist/mailbox.js CHANGED

@@ -349,6 +349,79 @@ export function drainMatchingSession(my_pid, from_session_id) {
 export function drainMatchingReply(my_pid, from_session_id, reply_to) {
     return drainFirstMatching(my_pid, (msg) => msg.from_session_id === from_session_id && msg.reply_to === reply_to);
 }
+// Union variant of drainMatchingReply across a session's sibling/previous MCP
+// child pids. ask_peer waits on the requester's OWN pid, but the reply is
+// addressed by client.session_id and resolveTarget(readAll) enqueues it to the
+// session's freshest sibling — which, in a dual-scope / pid-rotation setup, may
+// NOT be the pid blocked in ask_peer. A single-pid drain would then miss a reply
+// that already landed in a sibling mailbox and strand it. Mirrors the session
+// union read_my_messages / the PreToolUse hook already use.
+//
+// Returns the FIRST matching reply across the (deduped) pids. It does NOT pull
+// every match: two DISTINCT replies to the same request_id (an answer + a
+// follow-up correction) must not both be drained with one silently dropped — the
+// second stays for read_my_messages. But once the first match is found, it DOES
+// sweep an exact same-message_id duplicate out of the remaining pids: a
+// migrate-crash can leave the SAME message in two siblings, and if we returned
+// one copy and left the other, a later union drain would see only the lone
+// survivor and re-deliver it as a "new" message. Sweeping by message_id removes
+// the duplicate while leaving any distinct reply intact.
+//
+// `skipped` reports pids that could not be inspected (lock contention after the
+// internal acquire-retry budget). The poll tolerates this (it retries next tick);
+// the authoritative final drain in ask_peer retries the skipped pids so a
+// transiently-locked sibling holding the reply isn't mistaken for "no reply".
+export function drainMatchingReplyManyChecked(pids, from_session_id, reply_to) {
+    const seen = new Set();
+    const skipped = [];
+    let found = null;
+    for (const pid of pids) {
+        if (seen.has(pid))
+            continue;
+        seen.add(pid);
+        try {
+            if (!found) {
+                const m = drainMatchingReply(pid, from_session_id, reply_to);
+                if (m)
+                    found = m;
+            }
+            else {
+                // Sweep an exact-message_id duplicate (migrate-crash) from this sibling;
+                // a distinct reply (different id) is left untouched.
+                const dupId = found.id;
+                drainFirstMatching(pid, (msg) => msg.id === dupId);
+            }
+        }
+        catch {
+            skipped.push(pid);
+        }
+    }
+    return { reply: found, skipped };
+}
+export function drainMatchingReplyMany(pids, from_session_id, reply_to) {
+    return drainMatchingReplyManyChecked(pids, from_session_id, reply_to).reply;
+}
+// Best-effort removal of an EXACT message_id from each of `pids`. Used to clean
+// up a migrate-crash duplicate that was left in a pid the union drain couldn't
+// inspect (lock contention) at the time the reply was pulled from another pid —
+// otherwise a later read_my_messages would re-deliver the lone survivor as a
+// "new" message. Matches by message_id only, so a DISTINCT reply (different id)
+// in the same pid is never touched. Per-pid errors are skipped.
+export function sweepMessageId(pids, messageId) {
+    const seen = new Set();
+    for (const pid of pids) {
+        if (seen.has(pid))
+            continue;
+        seen.add(pid);
+        try {
+            drainFirstMatching(pid, (msg) => msg.id === messageId);
+        }
+        catch {
+            // best effort — a still-locked pid is left; the dup is a rare crash-window
+            // artifact and the cost is at most one re-delivered (same-id) message.
+        }
+    }
+}
 function drainFirstMatching(my_pid, matches) {
     acquireLock(my_pid);
     try {
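
The dedupe rules in `drainMatchingReplyManyChecked` can be exercised without real mailbox files. It returns the first match, sweeps only an exact `message_id` duplicate from later pids, and reports uninspectable pids in `skipped`. In this hypothetical sketch, `makeMailboxes` and `drainFirst` stand in for the lock-protected files and `drainFirstMatching`. A pid marked as locked throws, as if `acquireLock` had exhausted its retries. The loop in `drainReplyUnion` follows the function above.

```javascript
// Hypothetical in-memory stand-in for the per-pid mailbox files. A pid in
// `locked` throws, the way drainFirstMatching fails once acquireLock gives up.
function makeMailboxes(entries, locked = []) {
  return { boxes: new Map(entries), locked: new Set(locked) };
}

// Remove and return the first message in `pid`'s box matching `pred`, or null.
function drainFirst(mb, pid, pred) {
  if (mb.locked.has(pid)) throw new Error(`mailbox ${pid} is locked`);
  const box = mb.boxes.get(pid) ?? [];
  const i = box.findIndex(pred);
  return i === -1 ? null : box.splice(i, 1)[0];
}

// Same control flow as drainMatchingReplyManyChecked, over the fake boxes.
function drainReplyUnion(mb, pids, fromSessionId, replyTo) {
  const seen = new Set();
  const skipped = [];
  let found = null;
  for (const pid of pids) {
    if (seen.has(pid)) continue; // each pid is inspected once
    seen.add(pid);
    try {
      if (!found) {
        found = drainFirst(mb, pid, (m) => m.from_session_id === fromSessionId && m.reply_to === replyTo);
      } else {
        const dupId = found.id; // sweep only an exact message_id duplicate
        drainFirst(mb, pid, (m) => m.id === dupId);
      }
    } catch {
      skipped.push(pid); // left for the authoritative final drain to retry
    }
  }
  return { reply: found, skipped };
}

// Usage: pid 102 holds a migrate-crash copy of m1 plus a distinct correction m2,
// and pid 103 is locked.
const m1 = { id: "m1", from_session_id: "peer", reply_to: "r1", body: "answer" };
const mb = makeMailboxes(
  [
    [101, [m1]],
    [102, [{ ...m1 }, { id: "m2", from_session_id: "peer", reply_to: "r1", body: "correction" }]],
    [103, []],
  ],
  [103],
);
const { reply, skipped } = drainReplyUnion(mb, [101, 102, 101, 103], "peer", "r1");
// reply.id is "m1", skipped is [103], and only m2 remains in pid 102.
```

The distinct correction `m2` survives for `read_my_messages`, which is the answer-plus-follow-up case the comment above warns about. Pid 103 would be retried by the final drain, or cleaned later with `sweepMessageId`.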

package/dist/pending-ask.js ADDED

@@ -0,0 +1,167 @@
+// Pending-ask registry — durable ask_peer (the long-effort liveness fix).
+//
+// When an ask_peer wait TIMES OUT, the requester records a "pending ask" here:
+// a durable note that it is still awaiting a reply correlated by request_id.
+// When that reply eventually arrives — minutes or hours later, long after the
+// 5-minute fresh-idle window the strict reply-default wake is gated to — the
+// reply handler (server.ts resolveSendWake) finds the matching record and fires
+// a LENIENT wake to pull the requester back, instead of stranding it idle until
+// its next turn. This is what turns ask_peer into "delegate a long task and get
+// pulled back the moment it's done", and it also reaches a markerless idle Codex
+// requester that the fresh-idle gate would skip as skipped_no_fresh_idle.
+//
+// Design mirrors autowake.ts exactly: one small file per record under
+// ~/.oxtail/pending-ask/, mtime is the source of truth (driven by an injected
+// nowMs so it's deterministic in tests), the body is a debug breadcrumb, GC'd by
+// age. Keyed on the REQUESTER's client.session_id + the request_id (the agent
+// identity per AGENTS.md, never server_pid). Best-effort: a broken store
+// degrades to "no record" — it NEVER throws, because a thrown error here would
+// surface on an already-enqueued/already-delivered message and invite a retry.
+import { createHash } from "node:crypto";
+import { closeSync, mkdirSync, openSync, readdirSync, statSync, unlinkSync, utimesSync, writeFileSync, } from "node:fs";
+import { homedir } from "node:os";
+import { join } from "node:path";
+function envPosInt(name, def, env = process.env) {
+    const v = env[name];
+    if (!v)
+        return def;
+    const n = Number(v);
+    return Number.isFinite(n) && n > 0 ? n : def;
+}
+// How long a recorded pending-ask is honored before GC reclaims it. Sized for
+// long efforts (a delegated task that runs for the better part of an hour) — a
+// reply arriving after this window still delivers durably via read_my_messages,
+// it just won't fire the pull-back wake. Generous by default; tunable.
+export const PENDING_ASK_TTL_MS = envPosInt("OXTAIL_PENDING_ASK_TTL_MS", 60 * 60 * 1000);
+export function defaultPendingAskDir() {
+    return join(homedir(), ".oxtail", "pending-ask");
+}
+function hash(s) {
+    // request_id is caller-influenced, so never build a filename from it directly.
+    return createHash("sha256").update(s).digest("hex").slice(0, 32);
+}
+function recordPath(dir, sessionId, requestId) {
+    // JSON-encode the pair so the (sessionId, requestId) boundary is unambiguous
+    // and can't be crafted to collide with a different split (mirrors autowake.ts).
+    return join(dir, `p-${hash(JSON.stringify([sessionId, requestId]))}`);
+}
+function setMtime(path, nowMs) {
+    const t = nowMs / 1000;
+    try {
+        utimesSync(path, t, t);
+    }
+    catch {
+        // best effort — mtime drives TTL math; a failure only skews freshness by the
+        // small real-vs-injected clock delta.
+    }
+}
+// Record a pending ask. Atomic create-exclusive so a duplicate record (same
+// requester + request_id) is a no-op rather than resetting the TTL clock.
+// Returns true if a record now exists for this pair (freshly written OR already
+// present), false only on a missing identity or an unusable store. Never throws.
+export function recordPendingAsk(dir, sessionId, requestId, nowMs) {
+    // Never key on an empty identity: an unclaimed requester can't be correlated
+    // or replied-to, so there's nothing to wake later.
+    if (!sessionId || !requestId)
+        return false;
+    try {
+        mkdirSync(dir, { recursive: true, mode: 0o700 });
+        const p = recordPath(dir, sessionId, requestId);
+        try {
+            const fd = openSync(p, "wx"); // atomic create-exclusive
+            try {
+                writeFileSync(fd, JSON.stringify({ sessionId, requestId, at: nowMs }));
+            }
+            finally {
+                closeSync(fd);
+            }
+            setMtime(p, nowMs);
+            return true;
+        }
+        catch (e) {
+            // EEXIST: a record already exists → fine, leave its original mtime so the
+            // TTL counts from the first record, not this duplicate.
+            if (e.code === "EEXIST")
+                return true;
+            throw e;
+        }
+    }
+    catch {
+        // Store unusable (e.g. ~/.oxtail/pending-ask is a file, permission error) —
+        // degrade to "no durable record"; the strict fresh-idle reply-default still
+        // covers a Claude requester that went idle <5 min ago.
+        return false;
+    }
+}
+// Read-only: is there a live (within TTL) pending-ask for this pair?
+export function hasPendingAsk(dir, sessionId, requestId, nowMs, ttlMs = PENDING_ASK_TTL_MS) {
+    if (!sessionId || !requestId)
+        return false;
+    try {
+        const st = statSync(recordPath(dir, sessionId, requestId));
+        return nowMs - st.mtimeMs < ttlMs;
+    }
+    catch {
+        return false;
+    }
+}
+// Atomically consume (delete) the pending-ask for this pair. Returns true iff a
+// record existed, was within the TTL, and THIS caller removed it — the
+// single-winner signal the reply handler uses to fire exactly one pull-back
+// wake. A concurrent second reply (or a re-delivered duplicate) racing the same
+// key loses: unlinkSync throws ENOENT for the loser, so it returns false and
+// does not re-wake.
+//
+// When nowMs is supplied, an OVER-TTL record is still unlinked (so a stale
+// record can't leak) but the function returns false — honoring the contract that
+// a reply arriving after PENDING_ASK_TTL_MS still delivers durably but does NOT
+// fire the late wake. Omit nowMs to consume regardless of age (used right after
+// recordPendingAsk, where the record is freshly written).
+export function consumePendingAsk(dir, sessionId, requestId, nowMs, ttlMs = PENDING_ASK_TTL_MS) {
+    if (!sessionId || !requestId)
+        return false;
+    const p = recordPath(dir, sessionId, requestId);
+    let withinTtl = true;
+    if (nowMs !== undefined) {
+        try {
+            withinTtl = nowMs - statSync(p).mtimeMs < ttlMs;
+        }
+        catch {
+            return false; // no record to consume
+        }
+    }
+    try {
+        unlinkSync(p); // remove regardless of age so a stale record can't leak
+    }
+    catch {
+        // ENOENT (no record / already consumed by a racing caller) or any store
+        // error → not ours to act on.
+        return false;
+    }
+    return withinTtl;
+}
+// Remove pending-ask records older than the TTL. Cheap, low-volume dir; run
+// opportunistically so abandoned records (a reply that never came) can't
+// accumulate. Mirrors gcAutowake.
+export function gcPendingAsk(dir, nowMs, ttlMs = PENDING_ASK_TTL_MS) {
+    let names;
+    try {
+        names = readdirSync(dir);
+    }
+    catch {
+        return; // dir not created yet
+    }
+    for (const name of names) {
+        if (name[0] !== "p")
+            continue;
+        const p = join(dir, name);
+        try {
+            const st = statSync(p);
+            if (nowMs - st.mtimeMs >= ttlMs)
+                unlinkSync(p);
+        }
+        catch {
+            // best effort
+        }
+    }
+}

package/dist/server.js CHANGED

@@ -15,7 +15,8 @@ import * as mailbox from "./mailbox.js";
 import * as received from "./received.js";
 import { deliverExistingToPeer, deliverToPeer } from "./delivery.js";
 import { recoverClaim, resolveAncestors, writeClaim } from "./claims.js";
-import { decideReplyAutoWake, defaultAutowakeDir } from "./autowake.js";
+import { autowakeKillSwitchOff, claimWake, decideReplyAutoWake, defaultAutowakeDir, } from "./autowake.js";
+import { consumePendingAsk, defaultPendingAskDir, gcPendingAsk, recordPendingAsk, } from "./pending-ask.js";
 import { markWoke, newWakeDebounceStore, recentlyWoke } from "./wake-debounce.js";
 // CLI subcommand dispatch must run before any MCP setup so that
 // `npx oxtail install-hook` doesn't open an MCP transport or register a
@@ -1010,7 +1011,7 @@ function resolveTarget(target, caller) {
 server.registerTool("send_message", {
     description: [
         "Fire-and-forget message to a peer in the same project root. Target: a tmux session name OR a client_session_id (UUID). Async via the peer's mailbox — delivered mid-turn (PreToolUse hook) or next-turn (read_my_messages); cross-project targets are rejected.",
-        "A plain message does NOT wake an idle peer. Pass wake:\"auto\" to nudge one via per-client send-keys, state-gated (skipped if the peer is mid-turn). EXCEPTION (wake-on-reply): when you set reply_to, this auto-wakes the requester by default so your answer doesn't strand them idle — pass wake:\"off\" to suppress. The reply-default wake is strictly gated: it fires only for a FRESHLY-IDLE requester (one whose Claude Code hooks maintain a fresh idle marker), with a per-target rate limit and a one-wake dedupe; env kill-switch OXTAIL_AUTOWAKE=off. A requester with no idle marker (Codex, or Claude without the hooks) returns skipped_no_fresh_idle and is NOT auto-woken — use explicit wake:\"auto\" for those. Response carries wake_status (\"fired\" | \"skipped_busy\" | \"skipped_debounced\" | \"skipped_no_fresh_idle\" | \"skipped_rate_limited\" | \"skipped_deduped\" | \"skipped_store_error\" | \"skipped_no_target\" | \"disabled\") and, on the reply path, wake_reason:\"reply_to_default\".",
+        "A plain message does NOT wake an idle peer. Pass wake:\"auto\" to nudge one via per-client send-keys, state-gated (skipped if the peer is mid-turn). EXCEPTION (wake-on-reply): when you set reply_to, this auto-wakes the requester by default so your answer doesn't strand them idle — pass wake:\"off\" to suppress. The reply-default wake is strictly gated: it fires only for a FRESHLY-IDLE requester (one whose Claude Code hooks maintain a fresh idle marker), with a per-target rate limit and a one-wake dedupe; env kill-switch OXTAIL_AUTOWAKE=off. A requester with no idle marker (Codex, or Claude without the hooks) returns skipped_no_fresh_idle and is NOT auto-woken — use explicit wake:\"auto\" for those. Response carries wake_status (\"fired\" | \"skipped_busy\" | \"skipped_debounced\" | \"skipped_no_fresh_idle\" | \"skipped_rate_limited\" | \"skipped_deduped\" | \"skipped_store_error\" | \"skipped_no_target\" | \"disabled\") and, on the reply path, wake_reason:\"reply_to_default\" — or wake_reason:\"late_reply_to_pending\" when this reply answers an ask_peer that had timed out (durably pulls the requester back regardless of the fresh-idle window; \"late_reply_to_pending_suppressed\" if you passed wake:\"off\").",
         "Body is verbatim — wrap in <system-reminder>...</system-reminder> yourself if you want that framing. When replying to ask_peer, include reply_to: request_id from the inbound message. For a blocking send-and-wait, use ask_peer instead.",
     ].join(" "),
     inputSchema: {
@@ -1085,7 +1086,7 @@ server.registerTool("send_message", {
 server.registerTool("reply_to_message", {
     description: [
         "Reply to a specific inbound peer message by its message_id — the atomic, correlation-safe alternative to hand-wiring send_message's target + reply_to. The server looks the message up in this session's durable received-ledger, so you pass only the message_id the PreToolUse hook or read_my_messages already showed you; it derives the reply target (the original sender), carries reply_to=request_id when the inbound was an ask_peer (keeping the exchange correlated), and sets source_message_id for provenance. Replying to a plain send_message works too — it just omits reply_to. Ownership is structural: you can only reply to a message delivered to you.",
-        "Delivery + wake match send_message exactly, including the wake-on-reply default: when the inbound carried a request_id and you leave wake unset, a freshly-idle requester is auto-woken; pass wake:\"auto\" to nudge any idle peer, or wake:\"off\" to suppress. Fail-closed: an unknown or aged-out message_id returns error message-not-found instead of guessing a target.",
+        "Delivery + wake match send_message exactly, including the wake-on-reply default: when the inbound carried a request_id and you leave wake unset, a freshly-idle requester is auto-woken; pass wake:\"auto\" to nudge any idle peer, or wake:\"off\" to suppress. If the inbound ask_peer had since timed out, this reply durably pulls the requester back (wake_reason late_reply_to_pending) regardless of the fresh-idle window. Fail-closed: an unknown or aged-out message_id returns error message-not-found instead of guessing a target.",
     ].join(" "),
     inputSchema: {
         message_id: z
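The `reply_to_message` description derives the whole reply envelope from one ledger lookup. Below is a minimal sketch of that derivation. It assumes a hypothetical in-memory ledger (a `Map` keyed by `message_id`) and a hypothetical entry shape `{ id, from_session_id, request_id? }`. The real received-ledger is a durable store whose format this diff does not show, and `deriveReply` is an illustrative name only.

```javascript
// Hypothetical ledger entry shape: { id, from_session_id, request_id? }.
// Derive the reply envelope for a message delivered to this session,
// failing closed when the message_id is unknown or has aged out.
function deriveReply(ledger, messageId) {
  const inbound = ledger.get(messageId);
  if (!inbound) {
    // Fail closed: never guess a target.
    return { error: "message-not-found" };
  }
  const reply = {
    target: inbound.from_session_id, // the reply goes back to the original sender
    source_message_id: inbound.id, // provenance
  };
  // Only an ask_peer inbound carries a request_id; replying to a plain
  // send_message simply omits reply_to.
  if (inbound.request_id) reply.reply_to = inbound.request_id;
  return reply;
}

const ledger = new Map([
  ["m1", { id: "m1", from_session_id: "sess-a", request_id: "r42" }],
  ["m2", { id: "m2", from_session_id: "sess-b" }],
]);
// deriveReply(ledger, "m1") returns { target: "sess-a", source_message_id: "m1", reply_to: "r42" }
// deriveReply(ledger, "m2") returns { target: "sess-b", source_message_id: "m2" }
```

Because the sender never supplies a target, a message that was not delivered to this session can never be addressed. That is the "ownership is structural" property the description claims.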
@@ -1261,15 +1262,18 @@ server.registerTool("read_my_messages", {
 // elapses. Reply-to-capable peers must reply with reply_to=request_id; legacy
 // peers fall back to the original from_session_id-only matching.
 //
-// User-tunable override via OXTAIL_ASK_PEER_TIMEOUT_MS; defaults to 45000ms
-// (conservative under typical MCP-client tool-call abort windows). Set to a
-// lower value if your client aborts before our timeout fires.
+// User-tunable override via OXTAIL_ASK_PEER_TIMEOUT_MS; defaults to 60000ms.
+// 60s covers a slower multi-tool-call peer reply (a Codex peer composing
+// set_my_state + reply_to_message + a report was observed at ~46s and falsely
+// timed out under the old 45s default) while staying under both known callers'
+// tool-call abort windows: Claude Code is clean to ~60s, Codex aborts ~120s.
+// Set to a lower value if your client aborts before our timeout fires.
 const ASK_PEER_TIMEOUT_MS = (() => {
     const env = process.env.OXTAIL_ASK_PEER_TIMEOUT_MS;
     if (!env)
-        return 45_000;
+        return 60_000;
     const n = Number(env);
-    return Number.isFinite(n) && n > 0 ? n : 45_000;
+    return Number.isFinite(n) && n > 0 ? n : 60_000;
 })();
 const ASK_PEER_GRACE_MS = 500;
 const ASK_PEER_POLL_MS = 200;
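`ASK_PEER_TIMEOUT_MS` here and `ACTIVITY_BUSY_TTL_MS` in the next hunk share one parse-or-fall-back shape. The sketch below factors that shape into a single helper (`positiveMsFromEnv` is a hypothetical name, not part of oxtail) so it is easy to see which inputs fall back. An unset or empty variable, a non-numeric string, `"Infinity"`, zero, and negative values all yield the default.

```javascript
// Parse a positive millisecond value from an environment map, falling back
// when the variable is unset, empty, non-numeric, non-finite, zero, or negative.
// Same logic as the inline IIFEs in the diff; the helper itself is hypothetical.
function positiveMsFromEnv(env, name, fallbackMs) {
  const raw = env[name];
  if (!raw) return fallbackMs;
  const n = Number(raw);
  return Number.isFinite(n) && n > 0 ? n : fallbackMs;
}

const ASK_PEER_TIMEOUT_MS = positiveMsFromEnv(process.env, "OXTAIL_ASK_PEER_TIMEOUT_MS", 60_000);
const ACTIVITY_BUSY_TTL_MS = positiveMsFromEnv(process.env, "OXTAIL_ACTIVITY_BUSY_TTL_MS", 10 * 60 * 1000);
```

Because a bad value silently becomes the default rather than an error, a typo in either variable reverts to 60 s or 10 min without any warning.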
@@ -1480,7 +1484,19 @@ async function wakePeer(peer) {
 // Keyed by session_id (the agent identity), NOT server_pid: a dual-scope agent
 // has several MCP children sharing one session_id, and the hooks/sender must
 // agree on the key (see AGENTS.md). Must match the sanitization in the hooks.
-const ACTIVITY_BUSY_TTL_MS = 10 * 60 * 1000;
+// How long a "busy" marker is trusted before a peer treats the turn as stale and
+// wakes anyway. The PreToolUse hook now re-stamps "busy" on every tool call, so
+// a long ACTIVE turn stays fresh; this TTL only governs a turn that stops making
+// tool calls (one giant single tool call, or a crash without a clean Stop) — the
+// latter is exactly the stale-busy→wake recovery we want. Configurable for
+// deployments with very long single-tool-call turns.
+const ACTIVITY_BUSY_TTL_MS = (() => {
+    const env = process.env.OXTAIL_ACTIVITY_BUSY_TTL_MS;
+    if (!env)
+        return 10 * 60 * 1000;
+    const n = Number(env);
+    return Number.isFinite(n) && n > 0 ? n : 10 * 60 * 1000;
+})();
 function activitySessionKey(sessionId) {
     return sessionId.replace(/[^A-Za-z0-9_-]/g, "_");
 }
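The new TTL comment rests on one claim: re-stamping "busy" on every tool call keeps a long active turn fresh, while a turn that stops stamping goes stale and becomes wakeable. The sketch below models that claim with a marker reduced to the timestamp of its last stamp. It assumes a marker counts as fresh while its age is strictly under the TTL. The marker's real on-disk form and the exact comparison are not shown in this diff, and `isBusyFresh` is hypothetical.

```javascript
// Hypothetical model: a busy marker is the timestamp (ms) of its last stamp,
// or null if none exists. It is trusted only while younger than the TTL.
function isBusyFresh(lastStampMs, nowMs, ttlMs) {
  return lastStampMs !== null && nowMs - lastStampMs < ttlMs;
}

const MIN = 60 * 1000;
const TTL = 10 * MIN; // the default ACTIVITY_BUSY_TTL_MS

// v7 hook: a 25-minute turn makes a tool call every 2 minutes, re-stamping each time.
let stamp = 0;
for (let t = 0; t <= 25 * MIN; t += 2 * MIN) stamp = t; // last stamp at minute 24
const activeTurnFresh = isBusyFresh(stamp, 25 * MIN, TTL); // still busy: no wake

// Pre-v7 hook (or a crash without a clean Stop): stamped once at minute 0.
const staleTurnFresh = isBusyFresh(0, 25 * MIN, TTL); // stale: a wake is allowed
```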
@@ -1553,11 +1569,64 @@ async function autoWakeOnReply(peer, replyTo) {
     trace("autowake_reply_fire", { target_session_id: sid });
     return wakePeer(peer);
 }
-// Resolve the wake for a send_message. The strict reply-default path engages
-// only for a reply with wake UNSET; an explicit wake:"auto" always means the
-// lenient wakeForSend path (even for a reply — the Codex/hookless escape hatch),
-// and wake:"off" means no wake. Returns the status + reason to surface.
+// Stamp the autowake dedupe record for (sessionId, replyTo) when the durable
+// pending-ask path fires, so a re-delivered / duplicate copy of the SAME reply
+// can't separately strict-wake the requester via the fresh-idle reply-default
+// (the in-memory wakePeer debounce is per-process and not reply_to-keyed, so it
+// doesn't cover a restart or a >1s gap). Best-effort; we're stamping, not gating.
+//
+// Like the existing reply-default path (decideReplyAutoWake → claimWake), this is
+// stamped on the wake ATTEMPT — before wakeForSend's keystroke outcome is known —
+// and claimWake also stamps the per-target RATE record. Intentional and
+// consistent with that path: one wake pulls the requester in to drain its whole
+// mailbox, so a second reply within the rate window doesn't need its own wake.
+// (It is NOT stamped on the wake:"off" / kill-switch-disabled paths, where no
+// wake is intended — see resolveSendWake.)
+function stampReplyWakeDedupe(sessionId, replyTo) {
+    if (!sessionId)
+        return;
+    try {
+        claimWake(defaultAutowakeDir(), sessionId, replyTo, Date.now());
+    }
+    catch {
+        // best effort — a failure only means a duplicate could still strict-wake,
+        // which is harmless (debounced, and the requester drains an empty mailbox).
+    }
+}
+// Resolve the wake for a send_message / reply_to_message. Order matters:
+//   1. DURABLE pending-ask: if this reply satisfies an ask_peer that timed out
+//      and recorded a pending obligation, consume it (regardless of wake mode —
+//      a late reply satisfies the obligation even under wake:"off", and leaving
+//      the record would let a later duplicate wake and violate the explicit off)
+//      and fire the LENIENT wakeForSend so even a long-idle / markerless-Codex
+//      requester is pulled back. The automatic (wake unset) variant honors the
+//      OXTAIL_AUTOWAKE kill-switch; an explicit wake:"auto" intentionally does
+//      not (it's the caller's explicit ask, matching existing semantics).
+//   2. STRICT reply-default: a reply with wake UNSET and no pending record →
+//      fresh-idle-only auto-wake (autowake.ts), wake_reason "reply_to_default".
+//   3. Explicit wake:"auto" → lenient wakeForSend. wake:"off" → no wake.
 async function resolveSendWake(peer, wake, replyTo) {
+    if (replyTo) {
+        const sid = peer.client.session_id ?? "";
+        if (consumePendingAsk(defaultPendingAskDir(), sid, replyTo, Date.now())) {
+            // wake:"off" and the kill-switch path do NOT wake — so they must NOT stamp
+            // the wake-dedupe: stamping there would later suppress the strict wake for a
+            // genuine, distinct second reply to the same request_id (no wake happened,
+            // so there is nothing to dedupe against). Only stamp on the path that fires.
+            if (wake === "off") {
+                trace("late_reply_pending_suppressed", { target_session_id: sid });
+                return { wake_reason: "late_reply_to_pending_suppressed" };
+            }
+            if (wake === undefined && autowakeKillSwitchOff()) {
+                return { wake_status: "disabled", wake_reason: "late_reply_to_pending" };
+            }
+            // About to actually wake → stamp so a re-delivered copy of THIS reply can't
+            // strict-wake again via the fresh-idle fallback.
+            stampReplyWakeDedupe(peer.client.session_id, replyTo);
+            trace("late_reply_pending_wake", { target_session_id: sid });
+            return { wake_status: await wakeForSend(peer), wake_reason: "late_reply_to_pending" };
+        }
+    }
     if (replyAutoWakeTriggered(wake, replyTo)) {
         return { wake_status: await autoWakeOnReply(peer, replyTo), wake_reason: "reply_to_default" };
     }
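The ordered rules in the `resolveSendWake` comment reduce to a pure decision table. The sketch below models them with hypothetical boolean inputs (`pendingConsumed`, `killSwitchOff`) that stand in for the I/O the real function performs: consuming the pending record, checking `OXTAIL_AUTOWAKE`, stamping the dedupe record, and sending keys. The strict path's own gating (fresh idle, rate limit, dedupe) lives inside `autoWakeOnReply` and is not modeled here.

```javascript
// Pure model of the documented decision order. "lenient" = wakeForSend,
// "strict" = the fresh-idle reply-default, "none" = no keystroke wake.
function decideSendWake({ wake, replyTo, pendingConsumed, killSwitchOff }) {
  // 1. Durable pending-ask: the record is consumed whatever the wake mode.
  if (replyTo && pendingConsumed) {
    if (wake === "off") {
      return { path: "none", wake_reason: "late_reply_to_pending_suppressed" };
    }
    if (wake === undefined && killSwitchOff) {
      // The automatic variant honors the kill switch; explicit "auto" does not.
      return { path: "none", wake_status: "disabled", wake_reason: "late_reply_to_pending" };
    }
    // Only the path that actually wakes stamps the reply dedupe record.
    return { path: "lenient", stampDedupe: true, wake_reason: "late_reply_to_pending" };
  }
  // 2. Strict reply-default: a reply with wake unset and no pending record.
  if (replyTo && wake === undefined) return { path: "strict", wake_reason: "reply_to_default" };
  // 3. Explicit wake:"auto" is lenient; wake:"off", or a plain unset send, is no wake.
  if (wake === "auto") return { path: "lenient" };
  return { path: "none" };
}
```

The notable row is the first one. A late reply sent with `wake:"off"` still consumes the obligation, so a later duplicate cannot wake the requester against the sender's explicit opt-out.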
@@ -1571,24 +1640,38 @@ async function resolveSendWake(peer, wake, replyTo) {
 // mailbox lock when there's a probable hit. The lock is held only inside
 // drainMatchingSession (sub-10ms) — never across the poll interval, so the
 // PreToolUse hook on subsequent caller tool calls is never starved.
-async function askPeerPoll(my_pid, from_session_id, request_id, require_reply_to, deadlineMs, signal) {
-    let lastMtime = -1;
-    const path = mailbox.mailboxFilePath(my_pid);
+// The requester's mailbox pid union: own pid first (fast-path locality), then
+// any sibling/previous MCP child sharing the session_id. Recomputed at the final
+// drain so a sibling that appeared DURING the wait is still covered.
+function requesterPids(ownPid, sessionId) {
+    return sessionId
+        ? [ownPid, ...sessionPidsForId(sessionId).filter((p) => p !== ownPid)]
+        : [ownPid];
+}
+async function askPeerPoll(pids, from_session_id, request_id, require_reply_to, deadlineMs, signal) {
+    // Watch the mtime of EVERY sibling pid's mailbox (a dual-scope requester's
+    // reply may land in a pid other than the one blocked here), draining only when
+    // a file that exists has changed — so the lock is acquired on a probable hit,
+    // never every tick. Mirrors the single-pid optimization, widened to the union.
+    const lastMtimes = new Map();
     while (Date.now() < deadlineMs) {
         if (signal.aborted)
             throw new Error("aborted");
-        let stat = null;
-        try {
-            stat = statSync(path);
-        }
-        catch {
-            // ENOENT: mailbox file not created yet; treat as no change
+        let changed = false;
+        for (const pid of pids) {
+            let m = -1;
+            try {
+                m = statSync(mailbox.mailboxFilePath(pid)).mtimeMs;
+            }
+            catch {
+                // ENOENT: mailbox file not created yet
+            }
+            if (m !== -1 && lastMtimes.get(pid) !== m)
+                changed = true;
+            lastMtimes.set(pid, m);
         }
-        if (stat && stat.mtimeMs !== lastMtime) {
-            lastMtime = stat.mtimeMs;
-            const reply = require_reply_to
-                ? mailbox.drainMatchingReply(my_pid, from_session_id, request_id)
-                : mailbox.drainMatchingSession(my_pid, from_session_id);
+        if (changed) {
+            const reply = drainAskPeerReply(pids, from_session_id, request_id, require_reply_to);
             if (reply)
                 return reply;
         }
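`requesterPids` plus the widened mtime loop in `askPeerPoll` can be checked in isolation if the filesystem is swapped for an injected stat function. In the sketch below, `siblingPids` stands in for `sessionPidsForId(sessionId)`, and `statMtime` stands in for `statSync(...).mtimeMs`, returning `-1` for a missing file. Both substitutions are assumptions made for testability. Note that the loop does not short-circuit, so every pid's last-seen mtime is updated on each tick, as in the diff.

```javascript
// Own pid first, then any sibling sharing the session_id (own pid not repeated).
function requesterPids(ownPid, sessionId, siblingPids) {
  return sessionId ? [ownPid, ...siblingPids.filter((p) => p !== ownPid)] : [ownPid];
}

// One poll tick: report whether any EXISTING mailbox's mtime differs from the
// previous tick. Missing files (-1) never count as a change.
function mailboxesChanged(pids, statMtime, lastMtimes) {
  let changed = false;
  for (const pid of pids) {
    const m = statMtime(pid);
    if (m !== -1 && lastMtimes.get(pid) !== m) changed = true;
    lastMtimes.set(pid, m);
  }
  return changed;
}

// Usage: pid 200 is a sibling MCP child whose mailbox doesn't exist yet.
const mtimes = new Map([[100, 5], [200, -1]]);
const stat = (pid) => mtimes.get(pid) ?? -1;
const last = new Map();
const pids = requesterPids(100, "sess-a", [200, 100]); // [100, 200]
const tick1 = mailboxesChanged(pids, stat, last); // true: first sighting of pid 100's file
const tick2 = mailboxesChanged(pids, stat, last); // false: nothing moved
mtimes.set(200, 9); // the reply lands in the sibling's mailbox
const tick3 = mailboxesChanged(pids, stat, last); // true
```

The lock-taking drain runs only on ticks where `changed` is true, which is why a reply landing in a sibling's mailbox no longer goes unnoticed until the deadline.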
@@ -1599,15 +1682,18 @@ async function askPeerPoll(my_pid, from_session_id, request_id, require_reply_to
     }
     return null;
 }
-function drainAskPeerReply(my_pid, from_session_id, request_id, require_reply_to) {
+function drainAskPeerReply(pids, from_session_id, request_id, require_reply_to) {
+    // Correlated peers: union-drain by reply_to across the requester's siblings.
+    // Legacy/uncorrelated peers: keep the best-effort own-pid session match (no
+    // request_id to correlate the union safely).
     return require_reply_to
-        ? mailbox.drainMatchingReply(my_pid, from_session_id, request_id)
-        : mailbox.drainMatchingSession(my_pid, from_session_id);
+        ? mailbox.drainMatchingReplyMany(pids, from_session_id, request_id)
+        : mailbox.drainMatchingSession(pids[0], from_session_id);
 }
 server.registerTool("ask_peer", {
     description: [
         "Delegate-and-wait: enqueue a message to a peer in the same project root, wake them, and block until they reply (via send_message) or the timeout elapses. Use this for back-and-forth; use send_message for fire-and-forget.",
-        "Wakes the peer via per-client tmux send-keys (Codex gets a paste-burst-aware gap, Claude Code doesn't), then polls for a reply. For reply_to-capable peers, only from_session_id + reply_to == request_id satisfies the wait; legacy peers fall back to best-effort from_session_id matching and the response reports correlation:\"uncorrelated\". Response carries wake_status: \"fired\" | \"skipped_busy\" | \"skipped_no_target\" | \"disabled\" (skipped_unsupported is reserved). A peer that is mid-turn is NOT keystroke-woken (skipped_busy) — its hook/poll delivers the enqueued message and we still poll for the reply. Returns reply: null, timed_out: true on timeout (default 45000ms, override per call with timeout_ms, or set OXTAIL_ASK_PEER_TIMEOUT_MS at startup). timeout_ms is clamped to a safe ceiling (default 100000ms, env OXTAIL_ASK_PEER_MAX_TIMEOUT_MS) so the wait can't outlast the client's tool-call abort window — exceeding it makes the client hard-fail the call instead of returning graceful timed_out; the response reports timeout_clamped_from_ms when clamped. Late replies still arrive via read_my_messages / the hook.",
+        "Wakes the peer via per-client tmux send-keys (Codex gets a paste-burst-aware gap, Claude Code doesn't), then polls for a reply. For reply_to-capable peers, only from_session_id + reply_to == request_id satisfies the wait; legacy peers fall back to best-effort from_session_id matching and the response reports correlation:\"uncorrelated\". Response carries wake_status: \"fired\" | \"skipped_busy\" | \"skipped_no_target\" | \"disabled\" (skipped_unsupported is reserved). A peer that is mid-turn is NOT keystroke-woken (skipped_busy) — its hook/poll delivers the enqueued message and we still poll for the reply. Returns reply: null, timed_out: true on timeout (default 60000ms, override per call with timeout_ms, or set OXTAIL_ASK_PEER_TIMEOUT_MS at startup). timeout_ms is clamped to a safe ceiling (default 100000ms, env OXTAIL_ASK_PEER_MAX_TIMEOUT_MS) so the wait can't outlast the client's tool-call abort window — exceeding it makes the client hard-fail the call instead of returning graceful timed_out; the response reports timeout_clamped_from_ms when clamped. DURABLE DELEGATION: on timeout (correlated peers, claimed requester), the request is recorded as a pending obligation, so when the peer's reply finally arrives — minutes or hours later — it WAKES you back (wake_reason late_reply_to_pending), not just landing silently in read_my_messages. So ask_peer is safe for long tasks: let it time out, end your turn, get pulled back when the work is done.",
         "Target must have a registered client.session_id (Codex peers call claim_session first). Body is verbatim — frame it as an assignment (objective + requested action) so it reads as delegation, not chat. Wake overridable via OXTAIL_ASK_PEER_WAKE_STRATEGY=auto|legacy|off.",
     ].join(" "),
     inputSchema: {
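The `ask_peer` description states three timeout rules: a per-call `timeout_ms` overrides the startup default, the effective value is capped at a ceiling (default 100000 ms), and the response reports `timeout_clamped_from_ms` when the cap applies. The sketch below encodes those rules. `resolveAskTimeout` is a hypothetical name, and it clamps the effective value whichever source supplied it. Whether the real server also caps an oversized env default is not shown in this diff.

```javascript
// Hypothetical helper: pick the per-call value or the default, then cap it.
function resolveAskTimeout(requestedMs, defaultMs = 60_000, maxMs = 100_000) {
  const want = requestedMs ?? defaultMs;
  if (want > maxMs) {
    // Report the original request so the caller can see it was clamped.
    return { timeoutMs: maxMs, timeout_clamped_from_ms: want };
  }
  return { timeoutMs: want };
}

// resolveAskTimeout(undefined) -> { timeoutMs: 60000 }
// resolveAskTimeout(300_000)   -> { timeoutMs: 100000, timeout_clamped_from_ms: 300000 }
```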
@@ -1656,6 +1742,10 @@ server.registerTool("ask_peer", {
     const requestId = randomBytes(8).toString("hex");
     const requireReplyTo = peerSupportsReplyTo(peer);
     const fromSessionId = entry.client.session_id ?? undefined;
+    // The reply is addressed to OUR session_id; resolveTarget enqueues it to the
+    // session's freshest sibling, which may not be entry.server_pid. Drain the
+    // union (own pid first for fast-path locality), mirroring read_my_messages.
+    const myPids = requesterPids(entry.server_pid, fromSessionId);
     // Record-before-append (mirrors send_message): lets the peer answer with
     // reply_to_message(message_id) instead of hand-wiring target + reply_to.
     const msg = deliverToPeer(expectedSessionId, peer.server_pid, body, fromSessionId, {
@@ -1683,7 +1773,7 @@ server.registerTool("ask_peer", {
         // our outbound arrived, their hook delivered it as additionalContext and
         // their response may already be in our mailbox.
         await askPeerDelay(ASK_PEER_GRACE_MS, extra.signal);
-        reply = drainAskPeerReply(entry.server_pid, expectedSessionId, requestId, requireReplyTo);
+        reply = drainAskPeerReply(myPids, expectedSessionId, requestId, requireReplyTo);
         if (!reply) {
             // Common path: peer was idle. Route the wake per client_type, but skip
             // the keystroke if the peer is FRESHLY busy (mid-turn): typing into a
@@ -1706,7 +1796,7 @@ server.registerTool("ask_peer", {
                 // return this and the caller fail-fasts instead of polling.
             }
             else {
-                reply = await askPeerPoll(entry.server_pid, expectedSessionId, requestId, requireReplyTo, deadlineMs, extra.signal);
+                reply = await askPeerPoll(myPids, expectedSessionId, requestId, requireReplyTo, deadlineMs, extra.signal);
             }
         }
         else {
@@ -1749,6 +1839,77 @@ server.registerTool("ask_peer", {
     // attempted) is NOT a timeout; the message has been enqueued and will be
     // delivered when the peer next enters a turn.
     const polled = wakeStatus !== "skipped_unsupported";
+    // Durable delegation: we polled to the deadline with no reply. Record a
+    // pending obligation FIRST, then do one final authoritative UNION drain —
+    // write-before-final-drain closes the poll-vs-deadline TOCTOU. A reply that
+    // landed in the gap is caught here and returned now; a reply that arrives
+    // AFTER finds the persisted record and pulls us back via resolveSendWake's
+    // late_reply_to_pending path — even minutes/hours later, and even for a
+    // markerless idle Codex requester. Correlated peers + claimed requester only.
+    if (polled && reply === null && !aborted && requireReplyTo) {
+        if (fromSessionId) {
+            const dir = defaultPendingAskDir();
+            // Opportunistic sweep so abandoned records (a reply that never came)
+            // can't accumulate — mirrors gcAutowake inside decideReplyAutoWake.
+            gcPendingAsk(dir, Date.now());
+            // Write the pending obligation BEFORE the final drain (write-before-
+            // final-drain): a reply that lands after the drain finds this record and
+            // wakes us via resolveSendWake; one that landed before is caught below.
+            if (!recordPendingAsk(dir, fromSessionId, requestId, Date.now())) {
+                // Store unwritable → silently degrades to the read_my_messages path
+                // (no durable pull-back). Surface it so the degradation is observable.
+                trace("ask_peer_pending_record_failed", { request_id: requestId });
+            }
+            // Authoritative final drain. Recompute the pid union NOW — a sibling MCP
+            // child may have appeared during the wait. Use the CHECKED variant and
+            // retry any pid we couldn't inspect (transient lock): silently treating
+            // "couldn't read" as "no reply" would leave the record with no later
+            // event to consume it → a stranded pull-back.
+            const finalPids = requesterPids(entry.server_pid, fromSessionId);
+            let drained = mailbox.drainMatchingReplyManyChecked(finalPids, expectedSessionId, requestId);
+            if (drained.skipped.length > 0) {
+                // A pid we couldn't inspect might hold either the already-landed reply
+                // (if we have none yet) OR a migrate-crash duplicate of the reply we DID
+                // pull (which a later read_my_messages would re-deliver). Retry once
+                // after a brief delay for the lock to clear.
+                try {
+                    await askPeerDelay(ASK_PEER_POLL_MS, extra.signal);
+                    if (!drained.reply) {
+                        drained = mailbox.drainMatchingReplyManyChecked(drained.skipped, expectedSessionId, requestId);
+                        if (!drained.reply && drained.skipped.length > 0) {
+                            // Still un-inspectable after the retry: a lock held past the
+                            // acquire budget + retry (SIGSTOP-class / long holder). diagnose
+                            // can use this to tell "no reply" from "a reply may sit behind a
+                            // locked pid" — the record persists, so a later send still wakes.
+                            trace("ask_peer_skipped_after_final_retry", {
+                                request_id: requestId,
+                                skipped: drained.skipped,
+                            });
+                        }
+                    }
+                    else {
+                        // We have the reply — sweep only its exact id from the skipped pids
+                        // (a distinct second reply, different id, is left for read_my_messages).
+                        mailbox.sweepMessageId(drained.skipped, drained.reply.id);
+                    }
+                }
+                catch {
+                    // aborted during the brief retry delay — leave the record; we return
+                    // timed_out and the reply still delivers via read_my_messages.
+                }
+            }
+            if (drained.reply) {
+                consumePendingAsk(dir, fromSessionId, requestId);
+                reply = drained.reply;
+                trace("ask_peer_late_catch", { request_id: requestId, message_id: drained.reply.id });
+            }
+        }
+        else {
+            // Unclaimed requester: a peer can't correlate/reply_to_message back to
+            // us, so there's nothing to durably wake — surface it rather than guess.
+            trace("ask_peer_pending_skipped_unclaimed", { request_id: requestId });
+        }
+    }
     const timedOut = polled && reply === null;
     trace("ask_peer_end", {
         target_session_id: expectedSessionId,

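The durable-delegation block depends on ordering: the requester writes the pending record before its final drain, and the sender consumes the record after appending its reply. The in-memory model below replays the two interleavings that matter. The `Set` of keys and the mailbox array are simplified stand-ins for the on-disk pending-ask and mailbox stores, and none of the function names are oxtail's.

```javascript
// Hypothetical in-memory model of the write-before-final-drain handshake.
function makeWorld() {
  return { pending: new Set(), mailbox: [], wakes: 0 };
}
const key = (sid, rid) => `${sid}:${rid}`;

// Requester side, after polling to the deadline with no reply.
function requesterTimeout(w, sid, rid) {
  w.pending.add(key(sid, rid)); // 1. record the obligation FIRST
  const i = w.mailbox.findIndex((m) => m.reply_to === rid); // 2. final drain
  if (i === -1) return null; // timed out; a later reply will find the record
  w.pending.delete(key(sid, rid)); // the reply landed in the gap: consume and return it
  return w.mailbox.splice(i, 1)[0];
}

// Sender side: deliver the reply, then try to consume a pending record.
function senderReply(w, sid, rid, body) {
  w.mailbox.push({ reply_to: rid, body });
  if (w.pending.delete(key(sid, rid))) w.wakes += 1; // late reply: pull the requester back
}

// Interleaving A: the reply lands before the final drain.
const a = makeWorld();
senderReply(a, "S", "r1", "done"); // no record yet, so no wake
const caught = requesterTimeout(a, "S", "r1"); // the final drain catches it

// Interleaving B: the reply lands after the final drain.
const b = makeWorld();
const none = requesterTimeout(b, "S", "r2"); // null; the record persists
senderReply(b, "S", "r2", "late"); // consumes the record and wakes
```

Reversing the requester's two steps (drain, then record) opens a window where a reply arriving between them finds no record and misses the drain. That reply would then be stranded without a wake, which is the TOCTOU the comment says write-before-final-drain closes.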
package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "oxtail",
-  "version": "0.14.1",
+  "version": "0.15.0",
   "private": false,
   "type": "module",
   "description": "Coordination layer for parallel AI coding agent sessions, exposed over MCP.",

package/scripts/hook-constants.mjs CHANGED

@@ -33,10 +33,14 @@ export const HOOK_MARKER_KEY = "_oxtailHook";
 //       with no owner check, so during an upgrade window (before re-install) the
 //       old hook can still lose the stall-resume / double-clear races against a
 //       v6 peer. The version bump forces re-install to close that window.
+//   v7: pretooluse re-stamps the "busy" activity marker on every tool call, so a
+//       long ACTIVE turn stays fresh and doesn't invite a spurious wake:auto once
+//       it outruns ACTIVITY_BUSY_TTL_MS. A stale pre-v7 hook just doesn't refresh
+//       (the prior behavior) — never wrong, only less fresh on long turns.
 // INVARIANT: any change to an assets/*.sh script MUST bump this version, so
 // existing installs are forced to re-install. scripts/check-hook-version.mjs
 // enforces this in CI.
-export const HOOK_MARKER_VERSION = 6;
+export const HOOK_MARKER_VERSION = 7;
 const HOOKS_DIR = path.join(os.homedir(), ".oxtail", "hooks");
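The version bump to 7 matters only if installs compare against it. The sketch below shows one way such a gate could work, assuming an installed hook entry is tagged with `HOOK_MARKER_KEY` and the version it was installed at. That entry shape and the `needsReinstall` name are assumptions for illustration; this diff does not show oxtail's actual settings format or its reinstall check.

```javascript
// Hypothetical gate: reinstall any hook entry tagged with an older (or missing)
// marker version. The { _oxtailHook: <version> } shape is an assumption.
const HOOK_MARKER_KEY = "_oxtailHook";
const HOOK_MARKER_VERSION = 7;

function needsReinstall(hookEntry) {
  const installed = hookEntry?.[HOOK_MARKER_KEY];
  return typeof installed !== "number" || installed < HOOK_MARKER_VERSION;
}

// needsReinstall({ _oxtailHook: 6 }) -> true  (pre-v7 hook: no busy re-stamp)
// needsReinstall({ _oxtailHook: 7 }) -> false
```

Under this model, a v6 hook left in place keeps working and only loses the per-tool-call refresh, which matches the comment's "never wrong, only less fresh".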