oxtail 0.14.1 → 0.15.0

package/README.md CHANGED
@@ -67,7 +67,7 @@ Contributing? `git clone https://github.com/d4j3y2k/oxtail && cd oxtail && npm i
67
67
  - `set_my_state` — write a small "state card" onto this session's registry entry so peers can see what we're doing without reading our transcript. v1 surfaces a single field, `purpose` (≤200 chars).
68
68
  - `send_message` — **fire-and-forget** message to a peer. Target is a tmux session name or a raw `client_session_id` UUID. Body ≤ 8KB. Delivery is async via the peer's mailbox file. A plain message does **not** wake an idle peer; pass `wake: "auto"` to nudge one (state-gated — see [Waking an idle peer](#waking-an-idle-peer)). Replies to `ask_peer` should pass `reply_to: "<request_id>"` when the inbound message carries a `request_id` — and a reply **auto-wakes the requester by default** (strictly gated; `wake: "off"` opts out). (v0.5+)
69
69
  - `read_my_messages` — drain this session's mailbox and return any queued messages. Messages include `from_session_id`, server-stamped `origin: "peer"`, and optional `request_id` / `reply_to`. Codex peers (and unhooked Claude Code) poll this; Claude Code peers with the hooks installed see messages mid-turn (PreToolUse) or at turn end (Stop) instead. (v0.5+)
70
- - `ask_peer` — **delegate-and-wait**. Enqueues a message with a `request_id` and blocks server-side until the peer replies with `send_message({ reply_to: request_id })` or the timeout elapses. Default timeout is 45s (`OXTAIL_ASK_PEER_TIMEOUT_MS`), and each call may pass `timeout_ms`. New peers use strict `reply_to` correlation; legacy/no-capability peers fall back to best-effort first-message matching and the response reports `correlation: "uncorrelated"`. That legacy path may stale-match old same-peer chatter, so callers should treat `uncorrelated` as compatibility-only. Use `send_message` for fire-and-forget. (v0.7+)
70
+ - `ask_peer` — **delegate-and-wait**. Enqueues a message with a `request_id` and blocks server-side until the peer replies with `send_message({ reply_to: request_id })` or the timeout elapses. Default timeout is 60s (`OXTAIL_ASK_PEER_TIMEOUT_MS`), and each call may pass `timeout_ms`. New peers use strict `reply_to` correlation; legacy/no-capability peers fall back to best-effort first-message matching and the response reports `correlation: "uncorrelated"`. That legacy path may stale-match old same-peer chatter, so callers should treat `uncorrelated` as compatibility-only. **Durable on timeout (v0.15+):** if the wait elapses, the request is recorded as a pending obligation, so when the peer's reply finally arrives — minutes or hours later — it *wakes the requester back* (`wake_reason: "late_reply_to_pending"`) instead of landing silently. That makes `ask_peer` safe for long-running delegations: let it time out, end the turn, get pulled back when the work is done. Use `send_message` for fire-and-forget. (v0.7+)
71
71
  - `reply_to_message` — **reply by `message_id`**. The atomic, correlation-safe alternative to hand-wiring `send_message`'s `target` + `reply_to`: pass the `message_id` the hook or `read_my_messages` showed you and the server looks the inbound envelope up in this session's durable **received-ledger**, derives the reply target (the original sender), carries `reply_to: request_id` when the inbound was an `ask_peer` (keeping the exchange correlated), and stamps `source_message_id`. Replying to a plain `send_message` works too — it just omits `reply_to`. Ownership is structural (you can only reply to a message delivered to *you*); fail-closed on an unknown/aged-out id. Same wake semantics as `send_message`, including the wake-on-reply default. (v0.13+)
72
72
  - `register_my_session` — pin this MCP server's `session_id` directly. Kept for debugging; prefer `claim_session`.
73
73
  - `get_my_session` — return this MCP server's own registry entry plus a per-strategy detection diagnosis. Useful for debugging.
@@ -163,7 +163,9 @@ send_message({ target: "<requester>", body: "...", reply_to: "<request_id>" })
163
163
 
164
164
  The reply path is deliberately **stricter** than explicit `wake: "auto"`. It fires only when the target is **freshly idle** — an `idle` activity marker newer than `OXTAIL_AUTOWAKE_FRESH_IDLE_MS` (default 5 min). Stale, unknown, missing, or busy state yields `skipped_no_fresh_idle` (no best-effort wake — typing unprompted into a terminal that may be unattended is the risk we refuse to take). Two more guards bound it: a **per-target rate limit** (`OXTAIL_AUTOWAKE_MIN_INTERVAL_MS`, default 4s → `skipped_rate_limited`) since one wake already drains the whole mailbox, and a **one-wake dedupe** keyed on `(session_id, reply_to)` (`skipped_deduped`) so a duplicate or late hook drain of the same reply can't re-fire. If the dedupe/rate store is somehow unwritable the wake degrades to `skipped_store_error` rather than failing the (already-delivered) message. The env kill-switch `OXTAIL_AUTOWAKE=off` disables reply auto-wake entirely (`wake_status: "disabled"`). Every outcome that reaches the gate surfaces a `wake_status`; the reply path also stamps `wake_reason: "reply_to_default"` (present even on a resolve error like `ambiguous-target`, where there's no single target to wake).
165
165
 
166
- **Coverage (which requesters this reaches).** The fresh-idle gate keys on the requester's busy/idle activity marker, which only the Claude Code hooks maintain. So wake-on-reply currently closes the stranding for a **hooked Claude Code requester** (the originally-observed case: a peer's async reply to an idle Claude session). A **Codex** requester — or a Claude requester without the hooks installed — has no idle marker, so a reply with `wake` unset returns `skipped_no_fresh_idle` and is **not** auto-woken; reach it with an explicit `wake: "auto"`, which always takes the lenient wake path (idle/unknown/stale all wake; only a fresh-`busy` peer is skipped) and bypasses the strict fresh-idle gate even for a reply. Closing the Codex/unhooked-requester direction *by default* needs a requester-side waiter signal (`expects_reply`), which is the next slice — a blind `unknown ⇒ wake` default is deliberately avoided because it reintroduces the double-wake-an-active-waiter risk this gate exists to prevent.
166
+ **Coverage (which requesters this reaches).** The fresh-idle gate keys on the requester's busy/idle activity marker, which only the Claude Code hooks maintain. So wake-on-reply currently closes the stranding for a **hooked Claude Code requester** (the originally-observed case: a peer's async reply to an idle Claude session). A **Codex** requester — or a Claude requester without the hooks installed — has no idle marker, so a reply with `wake` unset returns `skipped_no_fresh_idle` and is **not** auto-woken; reach it with an explicit `wake: "auto"`, which always takes the lenient wake path (idle/unknown/stale all wake; only a fresh-`busy` peer is skipped) and bypasses the strict fresh-idle gate even for a reply.
167
+
168
+ For the **`ask_peer` case specifically**, the Codex/unhooked-requester direction is now closed *by default* (v0.15+, see [Durable `ask_peer`](#durable-ask_peer-long-efforts) below): a timed-out `ask_peer` records a durable **pending-ask** keyed on the requester's `session_id` + `request_id`, and the matching late reply takes the lenient wake path regardless of any idle marker — so even a markerless idle Codex requester is pulled back. This is exactly the requester-side waiter signal the blind `unknown ⇒ wake` default was avoided for: it's evidence the requester *explicitly asked and is waiting*, so it can't double-wake an unrelated active turn.
167
169
 
168
170
  **Codex and the wake matrix.** The send-keys wake needs a tmux pane. A Codex peer running **outside tmux** has none, so it returns `wake_status: "skipped_no_target"` — its idle delivery stays poll-based (`read_my_messages`). Run Codex **inside a tmux pane** to get symmetric idle-wake; the routing already handles the Codex paste-burst case.
169
171
 
@@ -238,17 +240,34 @@ All wake paths funnel through one place, which **coalesces** rapid repeat wakes
238
240
  ### Constraints
239
241
 
240
242
  - The target peer must have a registered `client.session_id`. Codex peers must call `claim_session` / `register_my_session` first; without that, `ask_peer` returns `error: "peer-has-no-session-id"` rather than guessing.
241
- - Timeout defaults to 45000ms (conservative under typical MCP-client tool-call abort windows). Pass `timeout_ms` on a call when a specific delegation needs a different bound; max 300000ms.
243
+ - Timeout defaults to 60000ms — enough headroom for a slower multi-tool-call peer reply (e.g. a Codex peer running `set_my_state` + `reply_to_message` + composing a report, observed ~46s) while staying under both known callers' tool-call abort windows (Claude Code is clean to ~60s; Codex aborts ~120s). Pass `timeout_ms` on a call when a specific delegation needs a different bound; max 300000ms.
242
244
 
243
245
  ### Tuning the timeout
244
246
 
245
- If `ask_peer` returns an abort error before its built-in 45s timeout fires, your MCP client's tool-call ceiling is lower than 45s. Override the bound at server startup:
247
+ If `ask_peer` returns an abort error before its built-in 60s timeout fires, your MCP client's tool-call ceiling is lower than 60s. Override the bound at server startup:
246
248
 
247
249
  ```sh
248
250
  OXTAIL_ASK_PEER_TIMEOUT_MS=30000 npx -y oxtail@0.10.1
249
251
  ```
250
252
 
251
- The server reads the env var once at boot and uses it as the fixed timeout for all `ask_peer` calls in that session. Values must be positive numbers; anything else falls back to the 45000ms default.
253
+ The server reads the env var once at boot and uses it as the fixed timeout for all `ask_peer` calls in that session. Values must be positive numbers; anything else falls back to the 60000ms default.
254
+
255
+ ### Durable `ask_peer` (long efforts)
256
+
257
+ The blocking wait is a *short* primitive (bounded by the client's tool-call abort window, ~60s). A real task can take minutes or hours — far longer than any wait can block. So `ask_peer` decouples the **wait** from the **delivery of the answer**:
258
+
259
+ - On timeout (for a correlated peer + a claimed requester), the request is recorded as a durable **pending-ask** at `~/.oxtail/pending-ask/p-<hash(session_id, request_id)>`, keyed on the *requester's* `session_id` + `request_id`. The `recordPendingAsk` write runs **before** one final authoritative union-drain of the requester's mailbox (write-before-final-drain), so a reply that lands in the poll-vs-deadline gap is returned immediately, and a reply that arrives later finds the persisted record.
260
+ - When that reply eventually arrives, `resolveSendWake` finds the matching pending-ask, **consumes it** (atomic `unlink`, single-winner — a duplicate/re-delivered reply can't re-fire), and takes the **lenient** wake path (`wake_reason: "late_reply_to_pending"`). Because the record is *proof the requester explicitly asked and is waiting*, the wake fires regardless of the 5-min fresh-idle window — and reaches a **markerless idle Codex** requester that the strict reply-default would skip. It also stamps the autowake dedupe for `(session_id, request_id)` so a later duplicate can't strict-wake via the fresh-idle fallback.
261
+ - `wake: "off"` still **consumes** the record (the obligation is satisfied — leaving it would let a later duplicate wake and violate the explicit off) but suppresses the wake (`wake_reason: "late_reply_to_pending_suppressed"`). The automatic (wake-unset) path honors `OXTAIL_AUTOWAKE=off` (`wake_status: "disabled"`); an explicit `wake: "auto"` intentionally does not.
262
+ - The reply drain is a **union across the requester's sibling MCP-child pids** (`drainMatchingReplyMany`), mirroring `read_my_messages` — a dual-scope requester's reply may land in a sibling pid, not the one that blocked in `ask_peer`.
263
+
264
+ Records are honored for `OXTAIL_PENDING_ASK_TTL_MS` (default 1h, sized for long efforts): a reply after that still delivers durably via `read_my_messages` but won't fire the pull-back wake (`consumePendingAsk` is TTL-aware — it removes an over-TTL record without waking). GC is **opportunistic** — abandoned records (a reply that never came) are swept when a later `ask_peer` times out, not on a wall-clock timer; the files are tiny, and a reply always cleans up its own record on arrival.
265
+
266
+ **The pattern:** `ask_peer` a long task → let it return `timed_out: true` → end your turn → get woken when the answer lands. Pair with a generous `OXTAIL_ACTIVITY_BUSY_TTL_MS` if your turns run long (see below).
267
+
268
+ ### Keeping a long turn marked busy
269
+
270
+ `wake: "auto"` skips a peer that is **freshly `busy`** (mid-turn — its hooks deliver, so a keystroke wake would be noise). The `busy` marker is set at turn start (UserPromptSubmit hook) and **re-stamped on every tool call** (PreToolUse hook, v0.15+), so a long *active* turn stays fresh and never invites a spurious wake. A turn that stops making tool calls — one giant single tool call, or a crash without a clean Stop — ages past `OXTAIL_ACTIVITY_BUSY_TTL_MS` (default 10 min) and then *does* wake, which is the intended stale-busy → recovery behavior. Widen the TTL for deployments with very long single-tool-call turns.
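The two wake gates described in the README text above can be hard to keep apart, so here is a minimal sketch of how they might combine for a reply. It is an illustration only, not oxtail's implementation: the `marker` shape and the function names `lenientWake`, `strictReplyWake`, and `replyWake` are hypothetical, the use of `skipped_busy` for the lenient skip is an assumption, and the rate limit, dedupe, debounce, `wake: "off"`, and `OXTAIL_AUTOWAKE=off` handling are deliberately left out.

```javascript
// Sketch only: models the busy/idle gates described above; not oxtail's actual code.
const FRESH_IDLE_MS = 5 * 60 * 1000; // OXTAIL_AUTOWAKE_FRESH_IDLE_MS default (5 min)
const BUSY_TTL_MS = 10 * 60 * 1000; // OXTAIL_ACTIVITY_BUSY_TTL_MS default (10 min)

// `marker` is a hypothetical shape: { state: "busy" | "idle", ageMs }, or null
// when the peer has no activity marker (Codex, or Claude Code without hooks).

// Lenient path: only a fresh "busy" marker blocks the wake.
function lenientWake(marker) {
  const freshBusy =
    marker !== null && marker.state === "busy" && marker.ageMs < BUSY_TTL_MS;
  return freshBusy ? "skipped_busy" : "fired";
}

// Strict reply-default path: only a fresh "idle" marker allows the wake.
function strictReplyWake(marker) {
  const freshIdle =
    marker !== null && marker.state === "idle" && marker.ageMs < FRESH_IDLE_MS;
  return freshIdle ? "fired" : "skipped_no_fresh_idle";
}

// A reply takes the lenient path when the sender passed wake:"auto" or when it
// answers a timed-out ask_peer (a live pending-ask); otherwise the strict path.
function replyWake({ wake, marker, hasPendingAsk }) {
  return wake === "auto" || hasPendingAsk
    ? lenientWake(marker)
    : strictReplyWake(marker);
}

const MIN = 60 * 1000;
// Markerless requester, plain reply: the strict gate refuses.
console.log(replyWake({ marker: null, hasPendingAsk: false })); // skipped_no_fresh_idle
// Same requester, but the reply answers a timed-out ask_peer.
console.log(replyWake({ marker: null, hasPendingAsk: true })); // fired
// Explicit wake:"auto" against a turn that stamped "busy" 2 minutes ago.
console.log(replyWake({ wake: "auto", marker: { state: "busy", ageMs: 2 * MIN }, hasPendingAsk: false })); // skipped_busy
// The same turn after 11 minutes without a re-stamp has aged past the TTL.
console.log(replyWake({ wake: "auto", marker: { state: "busy", ageMs: 11 * MIN }, hasPendingAsk: false })); // fired
```

The last two calls show why the per-tool-call re-stamp matters: as long as the marker age keeps resetting, an active turn stays on the `skipped_busy` side of the TTL.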
252
271
 
253
272
  ### Recommended permissions for autonomous agent-to-agent collaboration
254
273
 
@@ -306,8 +325,9 @@ A scheduled CI job (`.github/workflows/codex-drift.yml`, also runnable on demand
306
325
 
307
326
  ## Status
308
327
 
309
- v0.13.0. Pushes the autonomous peer-messaging matrix toward zero human relay, hardens the wake path, then makes correlated replies atomic.
328
+ v0.15.0. Pushes the autonomous peer-messaging matrix toward zero human relay, hardens the wake path, makes correlated replies atomic, and makes delegation durable across long (minutes-to-hours) efforts.
310
329
 
330
+ - **Durable `ask_peer` + long-effort liveness (v0.15.0).** A timed-out `ask_peer` records a pending obligation (`~/.oxtail/pending-ask/`, keyed on requester `session_id` + `request_id`, written *before* a final authoritative union-drain), so the peer's reply — arriving minutes or hours later — *wakes the requester back* (`wake_reason: "late_reply_to_pending"`) instead of landing silently. The pull-back takes the lenient wake path, so it reaches even a markerless idle Codex requester — closing the last wake-on-reply asymmetry. The reply drain unions the requester's sibling MCP-child pids (and sweeps migrate-crash duplicates) so a dual-scope reply can't strand. Separately, the `PreToolUse` hook now re-stamps the `busy` marker every tool call, so a long *active* turn never reads as stale-busy and invites a spurious wake. New env: `OXTAIL_PENDING_ASK_TTL_MS` (1h), `OXTAIL_ACTIVITY_BUSY_TTL_MS` (10m); `ask_peer` default timeout 45s→60s.
311
331
  - **Reply by id (v0.13.0).** `reply_to_message(message_id, body)` removes the manual `target` + `reply_to` rewiring that silently degraded a correlated exchange into loose mailbox traffic: the server looks the inbound envelope up in a durable per-session **received-ledger** (`~/.oxtail/received/<hash(session_id)>.jsonl`), derives the reply target and `reply_to` itself, and enforces ownership structurally (you can only reply to a message delivered to you). The ledger is written *before* the mailbox line is visible — so a handle the hook displays is always resolvable even though both delivery paths destroy the queue entry once it is handed off. Fail-closed on an unknown/aged-out id.
312
332
  - **Wake-on-reply (v0.11.0).** A reply — `send_message` with `reply_to` — auto-wakes a freshly-idle requester by default, so an awaited answer doesn't strand an idle peer. Strictly gated (fresh-idle only, per-target rate limit, one-wake dedupe, `OXTAIL_AUTOWAKE=off` kill-switch). `wake:"off"` opts out; explicit `wake:"auto"` is the escape hatch for a requester without an idle marker (Codex / hookless Claude).
313
333
  - **Wake hardening (v0.12.0).** Wake keystrokes only ever target the pane the process tree confirms hosts the peer's `server_pid` — never a self-written `tmux_pane`/`tmux_session`, and registry entries whose `server_pid` doesn't match their filename are rejected. Rapid repeat wakes to one peer are coalesced (`skipped_debounced`). `oxtail diagnose` summarizes wake outcomes from `MCP_TRACE_FILE`, and a scheduled CI job flags drift in Codex's paste-burst window before it can break the wake.
@@ -42,6 +42,18 @@ if [ ! -t 0 ]; then
42
42
  fi
43
43
  [ -z "$sid" ] && exit 0
44
44
 
45
+ # Re-stamp "busy" on EVERY tool call (before any early-exit below) so a long,
46
+ # ACTIVE turn keeps a fresh marker and never reads as stale-busy (>TTL) to a
47
+ # peer's wake:auto. UserPromptSubmit sets "busy" once at turn start; without this
48
+ # a turn outrunning the TTL would invite a spurious keystroke wake into a working
49
+ # agent. The Stop hook flips this back to "idle" on a real stop. Keyed by
50
+ # session_id; sanitization MUST match the server's activitySessionKey().
51
+ safe_sid=$(printf '%s' "$sid" | tr -c 'A-Za-z0-9_-' '_')
52
+ [ -n "$safe_sid" ] && {
53
+ mkdir -p "$HOME/.oxtail/activity" 2>/dev/null || true
54
+ printf 'busy' > "$HOME/.oxtail/activity/$safe_sid" 2>/dev/null || true
55
+ }
56
+
45
57
  sessions_dir="$HOME/.oxtail/sessions"
46
58
  mailboxes_dir="$HOME/.oxtail/mailboxes"
47
59
  [ -d "$sessions_dir" ] || exit 0
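The hook comment above says the shell sanitization must match the server's `activitySessionKey()`, whose source is not part of this diff. As a hedged illustration, a JavaScript counterpart to `tr -c 'A-Za-z0-9_-' '_'` could look like the sketch below. The name `sanitizeSessionKey` is hypothetical, and the equivalence is only claimed for ASCII input: `tr` may replace a multi-byte character byte by byte, while the regex replaces one UTF-16 code unit at a time.

```javascript
// Hypothetical mirror of the hook's `tr -c 'A-Za-z0-9_-' '_'` step:
// every character outside [A-Za-z0-9_-] becomes a single underscore.
// This is an assumption about what a matching server-side key looks like,
// not the actual activitySessionKey() implementation.
function sanitizeSessionKey(sid) {
  return String(sid).replace(/[^A-Za-z0-9_-]/g, "_");
}

// A UUID-style session id passes through unchanged.
console.log(sanitizeSessionKey("3f2b8c1e-9a4d-4e6f-8b7a-1c2d3e4f5a6b"));
// Path separators and dots are neutralized, so the key cannot escape the
// activity directory.
console.log(sanitizeSessionKey("../x")); // ___x
```

If the two sides ever disagreed on this mapping, the hook would write a marker file that the server never reads, so the busy re-stamp would silently have no effect.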
package/dist/mailbox.js CHANGED
@@ -349,6 +349,79 @@ export function drainMatchingSession(my_pid, from_session_id) {
349
349
  export function drainMatchingReply(my_pid, from_session_id, reply_to) {
350
350
  return drainFirstMatching(my_pid, (msg) => msg.from_session_id === from_session_id && msg.reply_to === reply_to);
351
351
  }
352
+ // Union variant of drainMatchingReply across a session's sibling/previous MCP
353
+ // child pids. ask_peer waits on the requester's OWN pid, but the reply is
354
+ // addressed by client.session_id and resolveTarget(readAll) enqueues it to the
355
+ // session's freshest sibling — which, in a dual-scope / pid-rotation setup, may
356
+ // NOT be the pid blocked in ask_peer. A single-pid drain would then miss a reply
357
+ // that already landed in a sibling mailbox and strand it. Mirrors the session
358
+ // union read_my_messages / the PreToolUse hook already use.
359
+ //
360
+ // Returns the FIRST matching reply across the (deduped) pids. It does NOT pull
361
+ // every match: two DISTINCT replies to the same request_id (an answer + a
362
+ // follow-up correction) must not both be drained with one silently dropped — the
363
+ // second stays for read_my_messages. But once the first match is found, it DOES
364
+ // sweep an exact same-message_id duplicate out of the remaining pids: a
365
+ // migrate-crash can leave the SAME message in two siblings, and if we returned
366
+ // one copy and left the other, a later union drain would see only the lone
367
+ // survivor and re-deliver it as a "new" message. Sweeping by message_id removes
368
+ // the duplicate while leaving any distinct reply intact.
369
+ //
370
+ // `skipped` reports pids that could not be inspected (lock contention after the
371
+ // internal acquire-retry budget). The poll tolerates this (it retries next tick);
372
+ // the authoritative final drain in ask_peer retries the skipped pids so a
373
+ // transiently-locked sibling holding the reply isn't mistaken for "no reply".
374
+ export function drainMatchingReplyManyChecked(pids, from_session_id, reply_to) {
375
+ const seen = new Set();
376
+ const skipped = [];
377
+ let found = null;
378
+ for (const pid of pids) {
379
+ if (seen.has(pid))
380
+ continue;
381
+ seen.add(pid);
382
+ try {
383
+ if (!found) {
384
+ const m = drainMatchingReply(pid, from_session_id, reply_to);
385
+ if (m)
386
+ found = m;
387
+ }
388
+ else {
389
+ // Sweep an exact-message_id duplicate (migrate-crash) from this sibling;
390
+ // a distinct reply (different id) is left untouched.
391
+ const dupId = found.id;
392
+ drainFirstMatching(pid, (msg) => msg.id === dupId);
393
+ }
394
+ }
395
+ catch {
396
+ skipped.push(pid);
397
+ }
398
+ }
399
+ return { reply: found, skipped };
400
+ }
401
+ export function drainMatchingReplyMany(pids, from_session_id, reply_to) {
402
+ return drainMatchingReplyManyChecked(pids, from_session_id, reply_to).reply;
403
+ }
404
+ // Best-effort removal of an EXACT message_id from each of `pids`. Used to clean
405
+ // up a migrate-crash duplicate that was left in a pid the union drain couldn't
406
+ // inspect (lock contention) at the time the reply was pulled from another pid —
407
+ // otherwise a later read_my_messages would re-deliver the lone survivor as a
408
+ // "new" message. Matches by message_id only, so a DISTINCT reply (different id)
409
+ // in the same pid is never touched. Per-pid errors are skipped.
410
+ export function sweepMessageId(pids, messageId) {
411
+ const seen = new Set();
412
+ for (const pid of pids) {
413
+ if (seen.has(pid))
414
+ continue;
415
+ seen.add(pid);
416
+ try {
417
+ drainFirstMatching(pid, (msg) => msg.id === messageId);
418
+ }
419
+ catch {
420
+ // best effort — a still-locked pid is left; the dup is a rare crash-window
421
+ // artifact and the cost is at most one re-delivered (same-id) message.
422
+ }
423
+ }
424
+ }
352
425
  function drainFirstMatching(my_pid, matches) {
353
426
  acquireLock(my_pid);
354
427
  try {
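The union-drain comments in the mailbox.js hunk above describe two rules: return only the first matching reply, and sweep an exact same-id duplicate from the remaining pids. The sketch below replays those rules against an in-memory model so the effect is easy to see. It is a simplified illustration under stated assumptions: mailboxes are plain arrays in a `Map` rather than locked files, the `skipped` reporting is omitted, and `drainFirst` and `unionDrainReply` are hypothetical stand-ins for `drainFirstMatching` and `drainMatchingReplyMany`.

```javascript
// In-memory model (assumption): pid -> array of queued messages.
function drainFirst(mailboxes, pid, matches) {
  const box = mailboxes.get(pid) ?? [];
  const i = box.findIndex(matches);
  if (i === -1) return null;
  return box.splice(i, 1)[0]; // remove and return the first match only
}

function unionDrainReply(mailboxes, pids, fromSessionId, replyTo) {
  let found = null;
  for (const pid of new Set(pids)) { // dedupe repeated pids
    if (!found) {
      found = drainFirst(
        mailboxes,
        pid,
        (m) => m.from_session_id === fromSessionId && m.reply_to === replyTo,
      );
    } else {
      // Sweep only an exact same-id duplicate; a distinct reply stays queued.
      const dupId = found.id;
      drainFirst(mailboxes, pid, (m) => m.id === dupId);
    }
  }
  return found;
}

const mailboxes = new Map([
  [101, [{ id: "m1", from_session_id: "peer", reply_to: "r1", body: "answer" }]],
  [102, [
    { id: "m1", from_session_id: "peer", reply_to: "r1", body: "answer" }, // crash duplicate
    { id: "m2", from_session_id: "peer", reply_to: "r1", body: "correction" },
  ]],
]);

const first = unionDrainReply(mailboxes, [101, 102, 101], "peer", "r1");
console.log(first.id); // m1
console.log(mailboxes.get(101).length); // 0
console.log(mailboxes.get(102).map((m) => m.id)); // [ 'm2' ]
```

The follow-up correction `m2` survives for a later `read_my_messages`, while the duplicated `m1` cannot be re-delivered as a new message.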
@@ -0,0 +1,167 @@
1
+ // Pending-ask registry — durable ask_peer (the long-effort liveness fix).
2
+ //
3
+ // When an ask_peer wait TIMES OUT, the requester records a "pending ask" here:
4
+ // a durable note that it is still awaiting a reply correlated by request_id.
5
+ // When that reply eventually arrives — minutes or hours later, long after the
6
+ // 5-minute fresh-idle window the strict reply-default wake is gated to — the
7
+ // reply handler (server.ts resolveSendWake) finds the matching record and fires
8
+ // a LENIENT wake to pull the requester back, instead of stranding it idle until
9
+ // its next turn. This is what turns ask_peer into "delegate a long task and get
10
+ // pulled back the moment it's done", and it also reaches a markerless idle Codex
11
+ // requester that the fresh-idle gate would skip as skipped_no_fresh_idle.
12
+ //
13
+ // Design mirrors autowake.ts exactly: one small file per record under
14
+ // ~/.oxtail/pending-ask/, mtime is the source of truth (driven by an injected
15
+ // nowMs so it's deterministic in tests), the body is a debug breadcrumb, GC'd by
16
+ // age. Keyed on the REQUESTER's client.session_id + the request_id (the agent
17
+ // identity per AGENTS.md, never server_pid). Best-effort: a broken store
18
+ // degrades to "no record" — it NEVER throws, because a thrown error here would
19
+ // surface on an already-enqueued/already-delivered message and invite a retry.
20
+ import { createHash } from "node:crypto";
21
+ import { closeSync, mkdirSync, openSync, readdirSync, statSync, unlinkSync, utimesSync, writeFileSync, } from "node:fs";
22
+ import { homedir } from "node:os";
23
+ import { join } from "node:path";
24
+ function envPosInt(name, def, env = process.env) {
25
+ const v = env[name];
26
+ if (!v)
27
+ return def;
28
+ const n = Number(v);
29
+ return Number.isFinite(n) && n > 0 ? n : def;
30
+ }
31
+ // How long a recorded pending-ask is honored before GC reclaims it. Sized for
32
+ // long efforts (a delegated task that runs for the better part of an hour) — a
33
+ // reply arriving after this window still delivers durably via read_my_messages,
34
+ // it just won't fire the pull-back wake. Generous by default; tunable.
35
+ export const PENDING_ASK_TTL_MS = envPosInt("OXTAIL_PENDING_ASK_TTL_MS", 60 * 60 * 1000);
36
+ export function defaultPendingAskDir() {
37
+ return join(homedir(), ".oxtail", "pending-ask");
38
+ }
39
+ function hash(s) {
40
+ // request_id is caller-influenced, so never build a filename from it directly.
41
+ return createHash("sha256").update(s).digest("hex").slice(0, 32);
42
+ }
43
+ function recordPath(dir, sessionId, requestId) {
44
+ // JSON-encode the pair so the (sessionId, requestId) boundary is unambiguous
45
+ // and can't be crafted to collide with a different split (mirrors autowake.ts).
46
+ return join(dir, `p-${hash(JSON.stringify([sessionId, requestId]))}`);
47
+ }
48
+ function setMtime(path, nowMs) {
49
+ const t = nowMs / 1000;
50
+ try {
51
+ utimesSync(path, t, t);
52
+ }
53
+ catch {
54
+ // best effort — mtime drives TTL math; a failure only skews freshness by the
55
+ // small real-vs-injected clock delta.
56
+ }
57
+ }
58
+ // Record a pending ask. Atomic create-exclusive so a duplicate record (same
59
+ // requester + request_id) is a no-op rather than resetting the TTL clock.
60
+ // Returns true if a record now exists for this pair (freshly written OR already
61
+ // present), false only on a missing identity or an unusable store. Never throws.
62
+ export function recordPendingAsk(dir, sessionId, requestId, nowMs) {
63
+ // Never key on an empty identity: an unclaimed requester can't be correlated
64
+ // or replied-to, so there's nothing to wake later.
65
+ if (!sessionId || !requestId)
66
+ return false;
67
+ try {
68
+ mkdirSync(dir, { recursive: true, mode: 0o700 });
69
+ const p = recordPath(dir, sessionId, requestId);
70
+ try {
71
+ const fd = openSync(p, "wx"); // atomic create-exclusive
72
+ try {
73
+ writeFileSync(fd, JSON.stringify({ sessionId, requestId, at: nowMs }));
74
+ }
75
+ finally {
76
+ closeSync(fd);
77
+ }
78
+ setMtime(p, nowMs);
79
+ return true;
80
+ }
81
+ catch (e) {
82
+ // EEXIST: a record already exists → fine, leave its original mtime so the
83
+ // TTL counts from the first record, not this duplicate.
84
+ if (e.code === "EEXIST")
85
+ return true;
86
+ throw e;
87
+ }
88
+ }
89
+ catch {
90
+ // Store unusable (e.g. ~/.oxtail/pending-ask is a file, permission error) —
91
+ // degrade to "no durable record"; the strict fresh-idle reply-default still
92
+ // covers a Claude requester that went idle <5 min ago.
93
+ return false;
94
+ }
95
+ }
96
+ // Read-only: is there a live (within TTL) pending-ask for this pair?
97
+ export function hasPendingAsk(dir, sessionId, requestId, nowMs, ttlMs = PENDING_ASK_TTL_MS) {
98
+ if (!sessionId || !requestId)
99
+ return false;
100
+ try {
101
+ const st = statSync(recordPath(dir, sessionId, requestId));
102
+ return nowMs - st.mtimeMs < ttlMs;
103
+ }
104
+ catch {
105
+ return false;
106
+ }
107
+ }
108
+ // Atomically consume (delete) the pending-ask for this pair. Returns true iff a
109
+ // record existed, was within the TTL, and THIS caller removed it — the
110
+ // single-winner signal the reply handler uses to fire exactly one pull-back
111
+ // wake. A concurrent second reply (or a re-delivered duplicate) racing the same
112
+ // key loses: unlinkSync throws ENOENT for the loser, so it returns false and
113
+ // does not re-wake.
114
+ //
115
+ // When nowMs is supplied, an OVER-TTL record is still unlinked (so a stale
116
+ // record can't leak) but the function returns false — honoring the contract that
117
+ // a reply arriving after PENDING_ASK_TTL_MS still delivers durably but does NOT
118
+ // fire the late wake. Omit nowMs to consume regardless of age (used right after
119
+ // recordPendingAsk, where the record is freshly written).
120
+ export function consumePendingAsk(dir, sessionId, requestId, nowMs, ttlMs = PENDING_ASK_TTL_MS) {
121
+ if (!sessionId || !requestId)
122
+ return false;
123
+ const p = recordPath(dir, sessionId, requestId);
124
+ let withinTtl = true;
125
+ if (nowMs !== undefined) {
126
+ try {
127
+ withinTtl = nowMs - statSync(p).mtimeMs < ttlMs;
128
+ }
129
+ catch {
130
+ return false; // no record to consume
131
+ }
132
+ }
133
+ try {
134
+ unlinkSync(p); // remove regardless of age so a stale record can't leak
135
+ }
136
+ catch {
137
+ // ENOENT (no record / already consumed by a racing caller) or any store
138
+ // error → not ours to act on.
139
+ return false;
140
+ }
141
+ return withinTtl;
142
+ }
143
+ // Remove pending-ask records older than the TTL. Cheap, low-volume dir; run
144
+ // opportunistically so abandoned records (a reply that never came) can't
145
+ // accumulate. Mirrors gcAutowake.
146
+ export function gcPendingAsk(dir, nowMs, ttlMs = PENDING_ASK_TTL_MS) {
147
+ let names;
148
+ try {
149
+ names = readdirSync(dir);
150
+ }
151
+ catch {
152
+ return; // dir not created yet
153
+ }
154
+ for (const name of names) {
155
+ if (name[0] !== "p")
156
+ continue;
157
+ const p = join(dir, name);
158
+ try {
159
+ const st = statSync(p);
160
+ if (nowMs - st.mtimeMs >= ttlMs)
161
+ unlinkSync(p);
162
+ }
163
+ catch {
164
+ // best effort
165
+ }
166
+ }
167
+ }
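To make the record and consume lifecycle of the pending-ask file above easier to follow, here is a small in-memory model of the same contract. It is a sketch under stated assumptions, not the module itself: a `Map` of key to recorded time stands in for the files and their mtimes, and `keyFor`, `record`, and `consume` are hypothetical names. What it preserves from the source is the JSON-encoded key pair, the create-exclusive behavior (a duplicate record does not reset the clock), and the TTL-aware single-winner consume.

```javascript
const TTL_MS = 60 * 60 * 1000; // mirrors the 1h OXTAIL_PENDING_ASK_TTL_MS default

// JSON-encoding the pair keeps the (sessionId, requestId) boundary unambiguous.
function keyFor(sessionId, requestId) {
  return JSON.stringify([sessionId, requestId]);
}

// Create-exclusive: a duplicate record keeps the original timestamp.
function record(store, sessionId, requestId, nowMs) {
  if (!sessionId || !requestId) return false;
  const k = keyFor(sessionId, requestId);
  if (!store.has(k)) store.set(k, nowMs);
  return true;
}

// Single-winner consume: the record is always removed, but the caller is told
// to wake only if the record was still within the TTL.
function consume(store, sessionId, requestId, nowMs, ttlMs = TTL_MS) {
  if (!sessionId || !requestId) return false;
  const k = keyFor(sessionId, requestId);
  if (!store.has(k)) return false; // nothing recorded, or already consumed
  const withinTtl = nowMs - store.get(k) < ttlMs;
  store.delete(k);
  return withinTtl;
}

const MINUTE = 60 * 1000;
const store = new Map();

record(store, "sess-A", "req-1", 0);
record(store, "sess-A", "req-1", 10 * MINUTE); // duplicate: clock stays at 0
console.log(consume(store, "sess-A", "req-1", 30 * MINUTE)); // true  (fire the wake)
console.log(consume(store, "sess-A", "req-1", 31 * MINUTE)); // false (duplicate reply loses)

record(store, "sess-A", "req-2", 0);
console.log(consume(store, "sess-A", "req-2", 61 * MINUTE)); // false (over TTL, no wake)
console.log(store.size); // 0 (the stale record was still removed)

// Naive concatenation would collide for these two pairs; the JSON key does not.
console.log(keyFor("a", "bc") === keyFor("ab", "c")); // false
```

On disk the same single-winner property comes from `unlinkSync` failing for the second caller, which this model approximates with the `store.has` check.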
package/dist/server.js CHANGED
@@ -15,7 +15,8 @@ import * as mailbox from "./mailbox.js";
15
15
  import * as received from "./received.js";
16
16
  import { deliverExistingToPeer, deliverToPeer } from "./delivery.js";
17
17
  import { recoverClaim, resolveAncestors, writeClaim } from "./claims.js";
18
- import { decideReplyAutoWake, defaultAutowakeDir } from "./autowake.js";
18
+ import { autowakeKillSwitchOff, claimWake, decideReplyAutoWake, defaultAutowakeDir, } from "./autowake.js";
19
+ import { consumePendingAsk, defaultPendingAskDir, gcPendingAsk, recordPendingAsk, } from "./pending-ask.js";
19
20
  import { markWoke, newWakeDebounceStore, recentlyWoke } from "./wake-debounce.js";
20
21
  // CLI subcommand dispatch must run before any MCP setup so that
21
22
  // `npx oxtail install-hook` doesn't open an MCP transport or register a
@@ -1010,7 +1011,7 @@ function resolveTarget(target, caller) {
1010
1011
  server.registerTool("send_message", {
1011
1012
  description: [
1012
1013
  "Fire-and-forget message to a peer in the same project root. Target: a tmux session name OR a client_session_id (UUID). Async via the peer's mailbox — delivered mid-turn (PreToolUse hook) or next-turn (read_my_messages); cross-project targets are rejected.",
1013
- "A plain message does NOT wake an idle peer. Pass wake:\"auto\" to nudge one via per-client send-keys, state-gated (skipped if the peer is mid-turn). EXCEPTION (wake-on-reply): when you set reply_to, this auto-wakes the requester by default so your answer doesn't strand them idle — pass wake:\"off\" to suppress. The reply-default wake is strictly gated: it fires only for a FRESHLY-IDLE requester (one whose Claude Code hooks maintain a fresh idle marker), with a per-target rate limit and a one-wake dedupe; env kill-switch OXTAIL_AUTOWAKE=off. A requester with no idle marker (Codex, or Claude without the hooks) returns skipped_no_fresh_idle and is NOT auto-woken — use explicit wake:\"auto\" for those. Response carries wake_status (\"fired\" | \"skipped_busy\" | \"skipped_debounced\" | \"skipped_no_fresh_idle\" | \"skipped_rate_limited\" | \"skipped_deduped\" | \"skipped_store_error\" | \"skipped_no_target\" | \"disabled\") and, on the reply path, wake_reason:\"reply_to_default\".",
1014
+ "A plain message does NOT wake an idle peer. Pass wake:\"auto\" to nudge one via per-client send-keys, state-gated (skipped if the peer is mid-turn). EXCEPTION (wake-on-reply): when you set reply_to, this auto-wakes the requester by default so your answer doesn't strand them idle — pass wake:\"off\" to suppress. The reply-default wake is strictly gated: it fires only for a FRESHLY-IDLE requester (one whose Claude Code hooks maintain a fresh idle marker), with a per-target rate limit and a one-wake dedupe; env kill-switch OXTAIL_AUTOWAKE=off. A requester with no idle marker (Codex, or Claude without the hooks) returns skipped_no_fresh_idle and is NOT auto-woken — use explicit wake:\"auto\" for those. Response carries wake_status (\"fired\" | \"skipped_busy\" | \"skipped_debounced\" | \"skipped_no_fresh_idle\" | \"skipped_rate_limited\" | \"skipped_deduped\" | \"skipped_store_error\" | \"skipped_no_target\" | \"disabled\") and, on the reply path, wake_reason:\"reply_to_default\" — or wake_reason:\"late_reply_to_pending\" when this reply answers an ask_peer that had timed out (durably pulls the requester back regardless of the fresh-idle window; \"late_reply_to_pending_suppressed\" if you passed wake:\"off\").",
  "Body is verbatim — wrap in <system-reminder>...</system-reminder> yourself if you want that framing. When replying to ask_peer, include reply_to: request_id from the inbound message. For a blocking send-and-wait, use ask_peer instead.",
  ].join(" "),
  inputSchema: {
@@ -1085,7 +1086,7 @@ server.registerTool("send_message", {
  server.registerTool("reply_to_message", {
  description: [
  "Reply to a specific inbound peer message by its message_id — the atomic, correlation-safe alternative to hand-wiring send_message's target + reply_to. The server looks the message up in this session's durable received-ledger, so you pass only the message_id the PreToolUse hook or read_my_messages already showed you; it derives the reply target (the original sender), carries reply_to=request_id when the inbound was an ask_peer (keeping the exchange correlated), and sets source_message_id for provenance. Replying to a plain send_message works too — it just omits reply_to. Ownership is structural: you can only reply to a message delivered to you.",
- "Delivery + wake match send_message exactly, including the wake-on-reply default: when the inbound carried a request_id and you leave wake unset, a freshly-idle requester is auto-woken; pass wake:\"auto\" to nudge any idle peer, or wake:\"off\" to suppress. Fail-closed: an unknown or aged-out message_id returns error message-not-found instead of guessing a target.",
+ "Delivery + wake match send_message exactly, including the wake-on-reply default: when the inbound carried a request_id and you leave wake unset, a freshly-idle requester is auto-woken; pass wake:\"auto\" to nudge any idle peer, or wake:\"off\" to suppress. If the inbound ask_peer had since timed out, this reply durably pulls the requester back (wake_reason late_reply_to_pending) regardless of the fresh-idle window. Fail-closed: an unknown or aged-out message_id returns error message-not-found instead of guessing a target.",
  ].join(" "),
  inputSchema: {
  message_id: z
@@ -1261,15 +1262,18 @@ server.registerTool("read_my_messages", {
  // elapses. Reply-to-capable peers must reply with reply_to=request_id; legacy
  // peers fall back to the original from_session_id-only matching.
  //
- // User-tunable override via OXTAIL_ASK_PEER_TIMEOUT_MS; defaults to 45000ms
- // (conservative under typical MCP-client tool-call abort windows). Set to a
- // lower value if your client aborts before our timeout fires.
+ // User-tunable override via OXTAIL_ASK_PEER_TIMEOUT_MS; defaults to 60000ms.
+ // 60s covers a slower multi-tool-call peer reply (a Codex peer composing
+ // set_my_state + reply_to_message + a report was observed at ~46s and falsely
+ // timed out under the old 45s default) while staying under both known callers'
+ // tool-call abort windows: Claude Code is clean to ~60s, Codex aborts ~120s.
+ // Set to a lower value if your client aborts before our timeout fires.
  const ASK_PEER_TIMEOUT_MS = (() => {
  const env = process.env.OXTAIL_ASK_PEER_TIMEOUT_MS;
  if (!env)
- return 45_000;
+ return 60_000;
  const n = Number(env);
- return Number.isFinite(n) && n > 0 ? n : 45_000;
+ return Number.isFinite(n) && n > 0 ? n : 60_000;
  })();
  const ASK_PEER_GRACE_MS = 500;
  const ASK_PEER_POLL_MS = 200;
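The env-override rule in this hunk can be sketched in isolation. `positiveMsFromEnv` is a hypothetical helper name, not something oxtail exports; the sketch only mirrors the fallback behavior visible in the diff, where unset, non-numeric, non-finite, zero, or negative values all fall back to the default.

```javascript
// Hypothetical helper (an assumption for illustration, not part of oxtail):
// parse a positive millisecond value from an environment-style string and
// fall back to a default when the value is unusable.
function positiveMsFromEnv(raw, fallbackMs) {
    if (!raw)
        return fallbackMs; // unset or empty string: use the default
    const n = Number(raw);
    // Reject NaN, Infinity, zero, and negatives; accept any other finite number.
    return Number.isFinite(n) && n > 0 ? n : fallbackMs;
}

const examples = {
    unset: positiveMsFromEnv(undefined, 60_000),
    valid: positiveMsFromEnv("30000", 60_000),
    garbage: positiveMsFromEnv("soon", 60_000),
    negative: positiveMsFromEnv("-5", 60_000),
};
```

Under these assumptions, only the `valid` entry takes the supplied value; the other three keep the 60000 ms default.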
@@ -1480,7 +1484,19 @@ async function wakePeer(peer) {
  // Keyed by session_id (the agent identity), NOT server_pid: a dual-scope agent
  // has several MCP children sharing one session_id, and the hooks/sender must
  // agree on the key (see AGENTS.md). Must match the sanitization in the hooks.
- const ACTIVITY_BUSY_TTL_MS = 10 * 60 * 1000;
+ // How long a "busy" marker is trusted before a peer treats the turn as stale and
+ // wakes anyway. The PreToolUse hook now re-stamps "busy" on every tool call, so
+ // a long ACTIVE turn stays fresh; this TTL only governs a turn that stops making
+ // tool calls (one giant single tool call, or a crash without a clean Stop) — the
+ // latter is exactly the stale-busy→wake recovery we want. Configurable for
+ // deployments with very long single-tool-call turns.
+ const ACTIVITY_BUSY_TTL_MS = (() => {
+ const env = process.env.OXTAIL_ACTIVITY_BUSY_TTL_MS;
+ if (!env)
+ return 10 * 60 * 1000;
+ const n = Number(env);
+ return Number.isFinite(n) && n > 0 ? n : 10 * 60 * 1000;
+ })();
  function activitySessionKey(sessionId) {
  return sessionId.replace(/[^A-Za-z0-9_-]/g, "_");
  }
@@ -1553,11 +1569,64 @@ async function autoWakeOnReply(peer, replyTo) {
  trace("autowake_reply_fire", { target_session_id: sid });
  return wakePeer(peer);
  }
- // Resolve the wake for a send_message. The strict reply-default path engages
- // only for a reply with wake UNSET; an explicit wake:"auto" always means the
- // lenient wakeForSend path (even for a reply — the Codex/hookless escape hatch),
- // and wake:"off" means no wake. Returns the status + reason to surface.
+ // Stamp the autowake dedupe record for (sessionId, replyTo) when the durable
+ // pending-ask path fires, so a re-delivered / duplicate copy of the SAME reply
+ // can't separately strict-wake the requester via the fresh-idle reply-default
+ // (the in-memory wakePeer debounce is per-process and not reply_to-keyed, so it
+ // doesn't cover a restart or a >1s gap). Best-effort; we're stamping, not gating.
+ //
+ // Like the existing reply-default path (decideReplyAutoWake → claimWake), this is
+ // stamped on the wake ATTEMPT — before wakeForSend's keystroke outcome is known —
+ // and claimWake also stamps the per-target RATE record. Intentional and
+ // consistent with that path: one wake pulls the requester in to drain its whole
+ // mailbox, so a second reply within the rate window doesn't need its own wake.
+ // (It is NOT stamped on the wake:"off" / kill-switch-disabled paths, where no
+ // wake is intended — see resolveSendWake.)
+ function stampReplyWakeDedupe(sessionId, replyTo) {
+ if (!sessionId)
+ return;
+ try {
+ claimWake(defaultAutowakeDir(), sessionId, replyTo, Date.now());
+ }
+ catch {
+ // best effort — a failure only means a duplicate could still strict-wake,
+ // which is harmless (debounced, and the requester drains an empty mailbox).
+ }
+ }
+ // Resolve the wake for a send_message / reply_to_message. Order matters:
+ // 1. DURABLE pending-ask: if this reply satisfies an ask_peer that timed out
+ // and recorded a pending obligation, consume it (regardless of wake mode —
+ // a late reply satisfies the obligation even under wake:"off", and leaving
+ // the record would let a later duplicate wake and violate the explicit off)
+ // and fire the LENIENT wakeForSend so even a long-idle / markerless-Codex
+ // requester is pulled back. The automatic (wake unset) variant honors the
+ // OXTAIL_AUTOWAKE kill-switch; an explicit wake:"auto" intentionally does
+ // not (it's the caller's explicit ask, matching existing semantics).
+ // 2. STRICT reply-default: a reply with wake UNSET and no pending record →
+ // fresh-idle-only auto-wake (autowake.ts), wake_reason "reply_to_default".
+ // 3. Explicit wake:"auto" → lenient wakeForSend. wake:"off" → no wake.
  async function resolveSendWake(peer, wake, replyTo) {
+ if (replyTo) {
+ const sid = peer.client.session_id ?? "";
+ if (consumePendingAsk(defaultPendingAskDir(), sid, replyTo, Date.now())) {
+ // wake:"off" and the kill-switch path do NOT wake — so they must NOT stamp
+ // the wake-dedupe: stamping there would later suppress the strict wake for a
+ // genuine, distinct second reply to the same request_id (no wake happened,
+ // so there is nothing to dedupe against). Only stamp on the path that fires.
+ if (wake === "off") {
+ trace("late_reply_pending_suppressed", { target_session_id: sid });
+ return { wake_reason: "late_reply_to_pending_suppressed" };
+ }
+ if (wake === undefined && autowakeKillSwitchOff()) {
+ return { wake_status: "disabled", wake_reason: "late_reply_to_pending" };
+ }
+ // About to actually wake → stamp so a re-delivered copy of THIS reply can't
+ // strict-wake again via the fresh-idle fallback.
+ stampReplyWakeDedupe(peer.client.session_id, replyTo);
+ trace("late_reply_pending_wake", { target_session_id: sid });
+ return { wake_status: await wakeForSend(peer), wake_reason: "late_reply_to_pending" };
+ }
+ }
  if (replyAutoWakeTriggered(wake, replyTo)) {
  return { wake_status: await autoWakeOnReply(peer, replyTo), wake_reason: "reply_to_default" };
  }
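The three-step ordering described in the `resolveSendWake` comment can be read as a small decision table. The sketch below is a simplified, side-effect-free model: `classifyWake`, its input flags, and the `path` field are hypothetical names invented for illustration, and it assumes the strict reply-default engages only for a reply with `wake` unset. It does not model the kill-switch or fresh-idle gating inside the strict path itself.

```javascript
// Illustrative model only; none of these names exist in oxtail.
//   wake:          "auto" | "off" | undefined
//   replyTo:       request_id being answered, if any
//   hasPendingAsk: a timed-out ask_peer recorded a pending obligation
//   killSwitchOff: stands in for OXTAIL_AUTOWAKE=off
function classifyWake({ wake, replyTo, hasPendingAsk, killSwitchOff }) {
    if (replyTo && hasPendingAsk) {
        // 1. Durable pending-ask: the record is consumed in every sub-case.
        if (wake === "off")
            return { path: "none", wake_reason: "late_reply_to_pending_suppressed" };
        if (wake === undefined && killSwitchOff)
            return { path: "none", wake_status: "disabled", wake_reason: "late_reply_to_pending" };
        return { path: "lenient", wake_reason: "late_reply_to_pending" };
    }
    if (replyTo && wake === undefined) {
        // 2. Strict reply-default: fresh-idle-only auto-wake.
        return { path: "strict", wake_reason: "reply_to_default" };
    }
    // 3. Explicit modes: "auto" takes the lenient path, anything else does not wake.
    return wake === "auto" ? { path: "lenient" } : { path: "none" };
}
```

In this model an explicit `wake: "auto"` on a late reply bypasses the kill-switch, while the same reply with `wake` unset is reported as `disabled`, matching the asymmetry the comment calls intentional.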
@@ -1571,24 +1640,38 @@ async function resolveSendWake(peer, wake, replyTo) {
  // mailbox lock when there's a probable hit. The lock is held only inside
  // drainMatchingSession (sub-10ms) — never across the poll interval, so the
  // PreToolUse hook on subsequent caller tool calls is never starved.
- async function askPeerPoll(my_pid, from_session_id, request_id, require_reply_to, deadlineMs, signal) {
- let lastMtime = -1;
- const path = mailbox.mailboxFilePath(my_pid);
+ // The requester's mailbox pid union: own pid first (fast-path locality), then
+ // any sibling/previous MCP child sharing the session_id. Recomputed at the final
+ // drain so a sibling that appeared DURING the wait is still covered.
+ function requesterPids(ownPid, sessionId) {
+ return sessionId
+ ? [ownPid, ...sessionPidsForId(sessionId).filter((p) => p !== ownPid)]
+ : [ownPid];
+ }
+ async function askPeerPoll(pids, from_session_id, request_id, require_reply_to, deadlineMs, signal) {
+ // Watch the mtime of EVERY sibling pid's mailbox (a dual-scope requester's
+ // reply may land in a pid other than the one blocked here), draining only when
+ // a file that exists has changed — so the lock is acquired on a probable hit,
+ // never every tick. Mirrors the single-pid optimization, widened to the union.
+ const lastMtimes = new Map();
  while (Date.now() < deadlineMs) {
  if (signal.aborted)
  throw new Error("aborted");
- let stat = null;
- try {
- stat = statSync(path);
- }
- catch {
- // ENOENT: mailbox file not created yet; treat as no change
+ let changed = false;
+ for (const pid of pids) {
+ let m = -1;
+ try {
+ m = statSync(mailbox.mailboxFilePath(pid)).mtimeMs;
+ }
+ catch {
+ // ENOENT: mailbox file not created yet
+ }
+ if (m !== -1 && lastMtimes.get(pid) !== m)
+ changed = true;
+ lastMtimes.set(pid, m);
  }
- if (stat && stat.mtimeMs !== lastMtime) {
- lastMtime = stat.mtimeMs;
- const reply = require_reply_to
- ? mailbox.drainMatchingReply(my_pid, from_session_id, request_id)
- : mailbox.drainMatchingSession(my_pid, from_session_id);
+ if (changed) {
+ const reply = drainAskPeerReply(pids, from_session_id, request_id, require_reply_to);
  if (reply)
  return reply;
  }
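The multi-mailbox change detection added to `askPeerPoll` can be isolated from the filesystem to show when a drain would be attempted. In this sketch `anyMailboxChanged` is a hypothetical helper, and the `current` map stands in for the per-pid `statSync(...).mtimeMs` readings, with -1 meaning the mailbox file does not exist yet.

```javascript
// Hypothetical pure helper (not from oxtail). `lastMtimes` is a Map of
// pid -> last seen mtime; `current` is a Map of pid -> mtime in ms, with -1
// standing for "mailbox file does not exist yet".
function anyMailboxChanged(lastMtimes, current) {
    let changed = false;
    for (const [pid, m] of current) {
        // Only an existing file whose mtime differs from the last reading counts.
        if (m !== -1 && lastMtimes.get(pid) !== m)
            changed = true;
        lastMtimes.set(pid, m); // remember every reading, including -1
    }
    return changed;
}

const seen = new Map();
// First tick: pid 101 has a mailbox, pid 202 does not yet.
const firstTick = anyMailboxChanged(seen, new Map([[101, 5], [202, -1]]));
// Nothing moved, so no lock would be taken on this tick.
const quietTick = anyMailboxChanged(seen, new Map([[101, 5], [202, -1]]));
// A reply lands in the sibling's mailbox: a drain is now worth attempting.
const siblingTick = anyMailboxChanged(seen, new Map([[101, 5], [202, 9]]));
```

The point of the design, as the diff comment states, is that the mailbox lock is acquired only on a probable hit rather than on every poll tick.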
@@ -1599,15 +1682,18 @@ async function askPeerPoll(my_pid, from_session_id, request_id, require_reply_to
  }
  return null;
  }
- function drainAskPeerReply(my_pid, from_session_id, request_id, require_reply_to) {
+ function drainAskPeerReply(pids, from_session_id, request_id, require_reply_to) {
+ // Correlated peers: union-drain by reply_to across the requester's siblings.
+ // Legacy/uncorrelated peers: keep the best-effort own-pid session match (no
+ // request_id to correlate the union safely).
  return require_reply_to
- ? mailbox.drainMatchingReply(my_pid, from_session_id, request_id)
- : mailbox.drainMatchingSession(my_pid, from_session_id);
+ ? mailbox.drainMatchingReplyMany(pids, from_session_id, request_id)
+ : mailbox.drainMatchingSession(pids[0], from_session_id);
  }
  server.registerTool("ask_peer", {
  description: [
  "Delegate-and-wait: enqueue a message to a peer in the same project root, wake them, and block until they reply (via send_message) or the timeout elapses. Use this for back-and-forth; use send_message for fire-and-forget.",
- "Wakes the peer via per-client tmux send-keys (Codex gets a paste-burst-aware gap, Claude Code doesn't), then polls for a reply. For reply_to-capable peers, only from_session_id + reply_to == request_id satisfies the wait; legacy peers fall back to best-effort from_session_id matching and the response reports correlation:\"uncorrelated\". Response carries wake_status: \"fired\" | \"skipped_busy\" | \"skipped_no_target\" | \"disabled\" (skipped_unsupported is reserved). A peer that is mid-turn is NOT keystroke-woken (skipped_busy) — its hook/poll delivers the enqueued message and we still poll for the reply. Returns reply: null, timed_out: true on timeout (default 45000ms, override per call with timeout_ms, or set OXTAIL_ASK_PEER_TIMEOUT_MS at startup). timeout_ms is clamped to a safe ceiling (default 100000ms, env OXTAIL_ASK_PEER_MAX_TIMEOUT_MS) so the wait can't outlast the client's tool-call abort window — exceeding it makes the client hard-fail the call instead of returning graceful timed_out; the response reports timeout_clamped_from_ms when clamped. Late replies still arrive via read_my_messages / the hook.",
+ "Wakes the peer via per-client tmux send-keys (Codex gets a paste-burst-aware gap, Claude Code doesn't), then polls for a reply. For reply_to-capable peers, only from_session_id + reply_to == request_id satisfies the wait; legacy peers fall back to best-effort from_session_id matching and the response reports correlation:\"uncorrelated\". Response carries wake_status: \"fired\" | \"skipped_busy\" | \"skipped_no_target\" | \"disabled\" (skipped_unsupported is reserved). A peer that is mid-turn is NOT keystroke-woken (skipped_busy) — its hook/poll delivers the enqueued message and we still poll for the reply. Returns reply: null, timed_out: true on timeout (default 60000ms, override per call with timeout_ms, or set OXTAIL_ASK_PEER_TIMEOUT_MS at startup). timeout_ms is clamped to a safe ceiling (default 100000ms, env OXTAIL_ASK_PEER_MAX_TIMEOUT_MS) so the wait can't outlast the client's tool-call abort window — exceeding it makes the client hard-fail the call instead of returning graceful timed_out; the response reports timeout_clamped_from_ms when clamped. DURABLE DELEGATION: on timeout (correlated peers, claimed requester), the request is recorded as a pending obligation, so when the peer's reply finally arrives — minutes or hours later — it WAKES you back (wake_reason late_reply_to_pending), not just landing silently in read_my_messages. So ask_peer is safe for long tasks: let it time out, end your turn, get pulled back when the work is done.",
  "Target must have a registered client.session_id (Codex peers call claim_session first). Body is verbatim — frame it as an assignment (objective + requested action) so it reads as delegation, not chat. Wake overridable via OXTAIL_ASK_PEER_WAKE_STRATEGY=auto|legacy|off.",
  ].join(" "),
  inputSchema: {
@@ -1656,6 +1742,10 @@ server.registerTool("ask_peer", {
  const requestId = randomBytes(8).toString("hex");
  const requireReplyTo = peerSupportsReplyTo(peer);
  const fromSessionId = entry.client.session_id ?? undefined;
+ // The reply is addressed to OUR session_id; resolveTarget enqueues it to the
+ // session's freshest sibling, which may not be entry.server_pid. Drain the
+ // union (own pid first for fast-path locality), mirroring read_my_messages.
+ const myPids = requesterPids(entry.server_pid, fromSessionId);
  // Record-before-append (mirrors send_message): lets the peer answer with
  // reply_to_message(message_id) instead of hand-wiring target + reply_to.
  const msg = deliverToPeer(expectedSessionId, peer.server_pid, body, fromSessionId, {
@@ -1683,7 +1773,7 @@ server.registerTool("ask_peer", {
  // our outbound arrived, their hook delivered it as additionalContext and
  // their response may already be in our mailbox.
  await askPeerDelay(ASK_PEER_GRACE_MS, extra.signal);
- reply = drainAskPeerReply(entry.server_pid, expectedSessionId, requestId, requireReplyTo);
+ reply = drainAskPeerReply(myPids, expectedSessionId, requestId, requireReplyTo);
  if (!reply) {
  // Common path: peer was idle. Route the wake per client_type, but skip
  // the keystroke if the peer is FRESHLY busy (mid-turn): typing into a
@@ -1706,7 +1796,7 @@ server.registerTool("ask_peer", {
  // return this and the caller fail-fasts instead of polling.
  }
  else {
- reply = await askPeerPoll(entry.server_pid, expectedSessionId, requestId, requireReplyTo, deadlineMs, extra.signal);
+ reply = await askPeerPoll(myPids, expectedSessionId, requestId, requireReplyTo, deadlineMs, extra.signal);
  }
  }
  else {
@@ -1749,6 +1839,77 @@ server.registerTool("ask_peer", {
  // attempted) is NOT a timeout; the message has been enqueued and will be
  // delivered when the peer next enters a turn.
  const polled = wakeStatus !== "skipped_unsupported";
+ // Durable delegation: we polled to the deadline with no reply. Record a
+ // pending obligation FIRST, then do one final authoritative UNION drain —
+ // write-before-final-drain closes the poll-vs-deadline TOCTOU. A reply that
+ // landed in the gap is caught here and returned now; a reply that arrives
+ // AFTER finds the persisted record and pulls us back via resolveSendWake's
+ // late_reply_to_pending path — even minutes/hours later, and even for a
+ // markerless idle Codex requester. Correlated peers + claimed requester only.
+ if (polled && reply === null && !aborted && requireReplyTo) {
+ if (fromSessionId) {
+ const dir = defaultPendingAskDir();
+ // Opportunistic sweep so abandoned records (a reply that never came)
+ // can't accumulate — mirrors gcAutowake inside decideReplyAutoWake.
+ gcPendingAsk(dir, Date.now());
+ // Write the pending obligation BEFORE the final drain (write-before-
+ // final-drain): a reply that lands after the drain finds this record and
+ // wakes us via resolveSendWake; one that landed before is caught below.
+ if (!recordPendingAsk(dir, fromSessionId, requestId, Date.now())) {
+ // Store unwritable → silently degrades to the read_my_messages path
+ // (no durable pull-back). Surface it so the degradation is observable.
+ trace("ask_peer_pending_record_failed", { request_id: requestId });
+ }
+ // Authoritative final drain. Recompute the pid union NOW — a sibling MCP
+ // child may have appeared during the wait. Use the CHECKED variant and
+ // retry any pid we couldn't inspect (transient lock): silently treating
+ // "couldn't read" as "no reply" would leave the record with no later
+ // event to consume it → a stranded pull-back.
+ const finalPids = requesterPids(entry.server_pid, fromSessionId);
+ let drained = mailbox.drainMatchingReplyManyChecked(finalPids, expectedSessionId, requestId);
+ if (drained.skipped.length > 0) {
+ // A pid we couldn't inspect might hold either the already-landed reply
+ // (if we have none yet) OR a migrate-crash duplicate of the reply we DID
+ // pull (which a later read_my_messages would re-deliver). Retry once
+ // after a brief delay for the lock to clear.
+ try {
+ await askPeerDelay(ASK_PEER_POLL_MS, extra.signal);
+ if (!drained.reply) {
+ drained = mailbox.drainMatchingReplyManyChecked(drained.skipped, expectedSessionId, requestId);
+ if (!drained.reply && drained.skipped.length > 0) {
+ // Still un-inspectable after the retry: a lock held past the
+ // acquire budget + retry (SIGSTOP-class / long holder). diagnose
+ // can use this to tell "no reply" from "a reply may sit behind a
+ // locked pid" — the record persists, so a later send still wakes.
+ trace("ask_peer_skipped_after_final_retry", {
+ request_id: requestId,
+ skipped: drained.skipped,
+ });
+ }
+ }
+ else {
+ // We have the reply — sweep only its exact id from the skipped pids
+ // (a distinct second reply, different id, is left for read_my_messages).
+ mailbox.sweepMessageId(drained.skipped, drained.reply.id);
+ }
+ }
+ catch {
+ // aborted during the brief retry delay — leave the record; we return
+ // timed_out and the reply still delivers via read_my_messages.
+ }
+ }
+ if (drained.reply) {
+ consumePendingAsk(dir, fromSessionId, requestId);
+ reply = drained.reply;
+ trace("ask_peer_late_catch", { request_id: requestId, message_id: drained.reply.id });
+ }
+ }
+ else {
+ // Unclaimed requester: a peer can't correlate/reply_to_message back to
+ // us, so there's nothing to durably wake — surface it rather than guess.
+ trace("ask_peer_pending_skipped_unclaimed", { request_id: requestId });
+ }
+ }
  const timedOut = polled && reply === null;
  trace("ask_peer_end", {
  target_session_id: expectedSessionId,
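The write-before-final-drain ordering in this hunk can be illustrated with a toy in-memory model. Everything here is an assumption made for the sketch: `finishTimedOutAsk` and `deliverLateReply` are hypothetical names, a `Set` stands in for the on-disk pending-ask store, and an array stands in for the mailbox. Locking, the pid union, and the retry path are left out.

```javascript
// Toy model only; not oxtail's implementation.
// Requester side, run once the poll deadline passes with no reply.
function finishTimedOutAsk(pending, mailbox, requestId) {
    pending.add(requestId); // 1. record the obligation BEFORE the final drain
    const i = mailbox.findIndex((m) => m.reply_to === requestId);
    if (i === -1)
        return { reply: null, timed_out: true }; // record stays for a late reply
    const [reply] = mailbox.splice(i, 1); // 2. a reply landed in the gap
    pending.delete(requestId); // 3. obligation satisfied, so consume the record
    return { reply, timed_out: false };
}

// Sender side: a reply that arrives later consumes the record, which is what
// tells the sender to wake the requester instead of delivering silently.
function deliverLateReply(pending, mailbox, message) {
    mailbox.push(message);
    return pending.delete(message.reply_to) ? "late_reply_to_pending" : undefined;
}
```

Because the record is written first, a reply is either found by the final drain or finds the record when it is sent. Reversing the two steps would leave a window in which a reply sees neither.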
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "oxtail",
- "version": "0.14.1",
+ "version": "0.15.0",
  "private": false,
  "type": "module",
  "description": "Coordination layer for parallel AI coding agent sessions, exposed over MCP.",
@@ -33,10 +33,14 @@ export const HOOK_MARKER_KEY = "_oxtailHook";
  // with no owner check, so during an upgrade window (before re-install) the
  // old hook can still lose the stall-resume / double-clear races against a
  // v6 peer. The version bump forces re-install to close that window.
+ // v7: pretooluse re-stamps the "busy" activity marker on every tool call, so a
+ // long ACTIVE turn stays fresh and doesn't invite a spurious wake:auto once
+ // it outruns ACTIVITY_BUSY_TTL_MS. A stale pre-v7 hook just doesn't refresh
+ // (the prior behavior) — never wrong, only less fresh on long turns.
  // INVARIANT: any change to an assets/*.sh script MUST bump this version, so
  // existing installs are forced to re-install. scripts/check-hook-version.mjs
  // enforces this in CI.
- export const HOOK_MARKER_VERSION = 6;
+ export const HOOK_MARKER_VERSION = 7;
  
  const HOOKS_DIR = path.join(os.homedir(), ".oxtail", "hooks");