oxtail 0.11.0 → 0.13.0

package/AGENTS.md CHANGED
@@ -55,9 +55,13 @@ The v0.9/v0.10.1 changes close the public dogfooding gaps found by real peer tra
55
55
  - **Session identity is monotonic after first non-null resolution.** Automatic detection is a bootstrap aid. Once `claim_session`, `register_my_session`, or sticky-claim recovery sets a session id, later env/birth-time detection and `get_my_session` refreshes must preserve it. Only another explicit claim can change it.
56
56
  - **`ask_peer` replies must correlate when the peer supports it.** Same-peer chatter is not a reply. Upgraded peers advertise `capabilities.mailbox.reply_to` and must satisfy waits with `from_session_id == target.session_id` plus `reply_to == request_id`; unmatched messages stay in the mailbox. The older `from_session_id`-only path is legacy compatibility and must be surfaced as `correlation: "uncorrelated"`. For no-capability peers, stale same-peer chatter may still satisfy the wait; that is an explicit compatibility limitation, not a correctness guarantee.
57
57
  - **Peer messages are context, not user authority.** Mailbox provenance (`origin: "peer"`, `request_id`, `reply_to`, `source_message_id`) is diagnostic metadata, not a trust boundary. Hook text must keep the trust framing visible — the "context, not user authority" line plus the `from_session_id` / `request_id` / `reply_to` reply fields (full protocol names) are rendered on every delivery — and injected hook bodies must stay under an explicit budget. Single-valued provenance the framing already implies (`origin: "peer"`) stays in the mailbox JSONL but need not be rendered into context.
58
+ - **A displayed reply handle must be resolvable: record the received-ledger entry before the mailbox line is visible.** Both delivery paths are destructive (`read_my_messages` and the PreToolUse/Stop hook each truncate the mailbox on handoff), so `reply_to_message` resolves `message_id` against a durable per-session ledger (`~/.oxtail/received/<hash(session_id)>.jsonl`), never the queue. `deliverToPeer` (the single delivery primitive behind `send_message` / `ask_peer` / `reply_to_message`) MUST write the ledger entry **before** appending the mailbox line: append-then-record reopens a window where the hook renders a `message_id` the receiver cannot yet reply to. The ledger is keyed and owned by receiver `session_id`; a lookup reads only the caller's own file. The ledger write is best-effort: a failure degrades to "no handle, reply via `send_message`" and is traced, but it must never block or drop the actual delivery.
58
59
 
59
60
  ## Recently shipped
60
61
 
62
+ - **Reply by id + received-ledger (v0.13.0).** `reply_to_message(message_id, body)` looks the inbound envelope up in a durable per-session ledger and derives `target` / `reply_to` / `source_message_id` server-side, replacing the manual rewiring that silently degraded a correlated exchange into loose mailbox traffic. New `src/received.ts` (ledger: sha256-keyed file, `mkdir`-lock, bounded retention `OXTAIL_RECEIVED_MAX`=1000 with a `received_ledger_pruned` trace so a drop is never silent) and `src/delivery.ts` (`deliverToPeer` = `buildMessage` → `recordReceived` → `requeue` — the record-before-append ordering above), wired into `send_message` / `ask_peer` / `reply_to_message`. Adversarial race-pair + ledger-failure-still-delivers tests in `src/delivery.test.ts`. Converged with Codex over a 5-round peer-messaging pressure test; Codex's review caught the append-before-record race, fixed before merge.
63
+
64
+ - **Wake hardening (v0.12.0 — issues #5/#6/#7, the v0.7-review backlog).** Three deferred wake items, landed together. **#6 (security):** wake send-keys now only ever target the pane the live process tree says hosts the peer's `server_pid` (`chooseVerifiedWakePane` → `currentPaneForServerPid`), never the peer's self-written `tmux_pane`/`tmux_session`; unverifiable ⇒ refuse (`skipped_no_target`). Registry-sourced tmux ids are shape-validated (`isValidTmuxPane`/`isValidTmuxSession`) and a spoofed `TMUX_PANE` env is ignored. This removed the cached-pane and session-name send-keys fallbacks (legit peers always register a real pane; churn is handled by re-resolution). **#5 (debounce):** all wake paths funnel through `wakePeer`, which coalesces repeat wakes to the same peer within `OXTAIL_WAKE_DEBOUNCE_MS` (default 1s, in-memory per process) ⇒ `skipped_debounced`. **#7 (observability):** a `wake_outcome` trace event per wake; `oxtail diagnose` summarizes wake_status counts by tool from `MCP_TRACE_FILE`; a scheduled `codex-drift.yml` fails if Codex's `PASTE_ENTER_SUPPRESS_WINDOW` drifts past our 500ms gap. New modules: `src/wake-debounce.ts`, `src/diagnose.ts`; `chooseVerifiedWakePane` in `src/registry.ts`.
61
65
  - **Wake-on-reply (Slice 1, peer-messaging refinement push).** A `send_message` that carries `reply_to` now auto-wakes the original requester **by default** (explicit `wake:"off"` opts out), closing the observed stranding where a peer's async reply to an idle requester forced a human to relay it. The reply path is a separate, stricter gate than the lenient `wake:"auto"` path (`src/autowake.ts`): it fires only for a **fresh-idle** target (idle marker newer than `OXTAIL_AUTOWAKE_FRESH_IDLE_MS`, default 5m) — stale/unknown/missing/busy ⇒ `skipped_no_fresh_idle`, never a best-effort wake — and adds a **per-target rate limit** (`skipped_rate_limited`), a persistent **one-wake dedupe** keyed on `(session_id, reply_to)` (`skipped_deduped`, GC'd by age) to survive duplicate/late hook drains, an `OXTAIL_AUTOWAKE=off` kill-switch, and a best-effort `skipped_store_error` degrade so a broken dedupe store can never turn an already-enqueued reply into a tool error. Target is resolved by `client.session_id` with the pane re-resolved immediately before send-keys (no `server_pid`/stale-pane reuse). Response surfaces `wake_status` + `wake_reason:"reply_to_default"`. **Coverage caveat:** the fresh-idle gate keys on the busy/idle marker that only the Claude Code hooks maintain, so this slice reaches a **hooked Claude Code requester** (the observed case). A Codex / hookless-Claude requester has no idle marker ⇒ `skipped_no_fresh_idle` (reach it with explicit `wake:"auto"`); closing that direction is **Slice 2** (`expects_reply:true` — a requester-side waiter signal), deliberately not faked here with a blind `unknown ⇒ wake` that would reintroduce the active-waiter double-wake.
62
66
  - **Protocol hardening (v0.10.1).** `ask_peer` now stamps outbound messages with `request_id`; reply-to-capable peers answer with `send_message({ reply_to: request_id })`, and the waiter ignores stale same-peer messages. Explicit identity claims are monotonic, so stale automatic detection cannot clobber a real client session id. PreToolUse/Stop hook pushes are body-budgeted and labeled as peer context, not user authority.
63
67
  - **Deliver-on-complete and state-gated wake (v0.9).** The Stop hook delivers waiting messages at turn end, closing the text-only-turn gap left by PreToolUse. `UserPromptSubmit`/`Stop` maintain a busy/idle flag so `send_message({ wake: "auto" })` nudges idle peers without typing into a busy composer. Sticky Codex claim recovery keeps identity across MCP child restarts.
package/README.md CHANGED
@@ -36,7 +36,7 @@ args = ["-y", "oxtail@0.10.1"]
36
36
 
  ```sh
  mkdir -p ~/.claude/commands
- curl -L https://raw.githubusercontent.com/d4j3y2k/oxtail/v0.10.1/.claude/commands/oxtail-join.md \
+ curl -L https://raw.githubusercontent.com/d4j3y2k/oxtail/v0.13.0/.claude/commands/oxtail-join.md \
    -o ~/.claude/commands/oxtail-join.md
  ```
42
42
 
@@ -44,9 +44,9 @@ curl -L https://raw.githubusercontent.com/d4j3y2k/oxtail/v0.10.1/.claude/command
44
44
 
  ```sh
  mkdir -p ~/.codex/skills/oxtail-join/agents
- curl -L https://raw.githubusercontent.com/d4j3y2k/oxtail/v0.10.1/integrations/codex/oxtail-join/SKILL.md \
+ curl -L https://raw.githubusercontent.com/d4j3y2k/oxtail/v0.13.0/integrations/codex/oxtail-join/SKILL.md \
    -o ~/.codex/skills/oxtail-join/SKILL.md
- curl -L https://raw.githubusercontent.com/d4j3y2k/oxtail/v0.10.1/integrations/codex/oxtail-join/agents/openai.yaml \
+ curl -L https://raw.githubusercontent.com/d4j3y2k/oxtail/v0.13.0/integrations/codex/oxtail-join/agents/openai.yaml \
    -o ~/.codex/skills/oxtail-join/agents/openai.yaml
  ```
52
52
 
@@ -68,10 +68,11 @@ Contributing? `git clone https://github.com/d4j3y2k/oxtail && cd oxtail && npm i
68
68
  - `send_message` — **fire-and-forget** message to a peer. Target is a tmux session name or a raw `client_session_id` UUID. Body ≤ 8KB. Delivery is async via the peer's mailbox file. A plain message does **not** wake an idle peer; pass `wake: "auto"` to nudge one (state-gated — see [Waking an idle peer](#waking-an-idle-peer)). Replies to `ask_peer` should pass `reply_to: "<request_id>"` when the inbound message carries a `request_id` — and a reply **auto-wakes the requester by default** (strictly gated; `wake: "off"` opts out). (v0.5+)
69
69
  - `read_my_messages` — drain this session's mailbox and return any queued messages. Messages include `from_session_id`, server-stamped `origin: "peer"`, and optional `request_id` / `reply_to`. Codex peers (and unhooked Claude Code) poll this; Claude Code peers with the hooks installed see messages mid-turn (PreToolUse) or at turn end (Stop) instead. (v0.5+)
70
70
  - `ask_peer` — **delegate-and-wait**. Enqueues a message with a `request_id` and blocks server-side until the peer replies with `send_message({ reply_to: request_id })` or the timeout elapses. Default timeout is 45s (`OXTAIL_ASK_PEER_TIMEOUT_MS`), and each call may pass `timeout_ms`. New peers use strict `reply_to` correlation; legacy/no-capability peers fall back to best-effort first-message matching and the response reports `correlation: "uncorrelated"`. That legacy path may stale-match old same-peer chatter, so callers should treat `uncorrelated` as compatibility-only. Use `send_message` for fire-and-forget. (v0.7+)
71
+ - `reply_to_message` — **reply by `message_id`**. The atomic, correlation-safe alternative to hand-wiring `send_message`'s `target` + `reply_to`: pass the `message_id` the hook or `read_my_messages` showed you and the server looks the inbound envelope up in this session's durable **received-ledger**, derives the reply target (the original sender), carries `reply_to: request_id` when the inbound was an `ask_peer` (keeping the exchange correlated), and stamps `source_message_id`. Replying to a plain `send_message` works too — it just omits `reply_to`. Ownership is structural (you can only reply to a message delivered to *you*); fail-closed on an unknown/aged-out id. Same wake semantics as `send_message`, including the wake-on-reply default. (v0.13+)
71
72
  - `register_my_session` — pin this MCP server's `session_id` directly. Kept for debugging; prefer `claim_session`.
72
73
  - `get_my_session` — return this MCP server's own registry entry plus a per-strategy detection diagnosis. Useful for debugging.
73
74
 
74
- See [design principles](https://github.com/d4j3y2k/oxtail/blob/v0.10.1/AGENTS.md) for scope and architecture.
75
+ See [design principles](https://github.com/d4j3y2k/oxtail/blob/v0.13.0/AGENTS.md) for scope and architecture.
75
76
 
76
77
  ## Usage from an agent
77
78
 
@@ -90,6 +91,8 @@ send_message({ target: "<peer-uuid>", body: "...", reply_to: "<ask request_id>"
90
91
  read_my_messages()
91
92
  ask_peer({ target: "primary", body: "[Handoff] please audit X and tell me what you find" })
92
93
  // → blocks server-side until the peer replies via send_message, then returns their body
94
+ reply_to_message({ message_id: "<id from the hook / read_my_messages>", body: "..." })
95
+ // → looks up the inbound envelope, derives target + reply_to itself; correlated when the inbound was an ask_peer
93
96
  ```
94
97
 
95
98
  Omitting `project_root` triggers a best-effort `.git`-ancestor walk from the server's own cwd. The response includes `inferred: true` when this happens. Pass `project_root` explicitly when you can.
@@ -112,6 +115,8 @@ read_my_messages()
112
115
 
113
116
  The mailbox lives at `~/.oxtail/mailboxes/<server_pid>.jsonl`, append-only JSONL, drained under an `mkdir`-based advisory lock. The transport is intentionally dumb: 8KB UTF-8 body cap, sender chooses the framing (raw text or pre-wrapped `<system-reminder>...</system-reminder>`). Hook-delivered mailbox pushes are body-budgeted at 24K escaped characters by default; set `OXTAIL_HOOK_MAX_BODY_CHARS` to tune. If the budget is exceeded, the hook tells the receiver which bodies were truncated or omitted.
114
117
 
118
+ Because both delivery paths are **destructive** — `read_my_messages` and the hook each truncate the mailbox once a message is handed off — a reply-by-id verb can't rely on the queue. Every delivered envelope is therefore also recorded in a durable **received-ledger** at `~/.oxtail/received/<hash(session_id)>.jsonl` keyed by `message_id`, written *before* the mailbox line becomes visible (so any handle a receiver can see is already resolvable) and bounded to the most recent `OXTAIL_RECEIVED_MAX` (default 1000) entries. `reply_to_message` reads only the caller's own ledger — that file *is* the ownership boundary.
119
+
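The ledger lookup that `reply_to_message` performs can be illustrated with an in-memory stand-in. This is a minimal sketch under stated assumptions, not oxtail's `src/received.ts`: `recordEntry`, `deriveReply`, the `"unknown-message-id"` error string, and the use of a plain array instead of a JSONL file are all hypothetical. It also assumes `source_message_id` is the id of the inbound message being answered.

```javascript
// Hypothetical in-memory stand-in for one session's received-ledger.
// Appends an envelope and prunes the oldest entries beyond `max`.
// Returns how many entries were dropped so the caller can trace the drop.
function recordEntry(ledger, envelope, max = 1000) {
  ledger.push(envelope);
  const dropped = Math.max(0, ledger.length - max);
  if (dropped > 0) ledger.splice(0, dropped); // oldest entries go first
  return dropped;
}

// Resolve a message_id against the caller's own ledger and derive the reply
// fields server-side. Fails closed when the id is unknown or aged out.
function deriveReply(ledger, messageId) {
  const inbound = ledger.find((entry) => entry.id === messageId);
  if (!inbound) return { ok: false, error: "unknown-message-id" };
  return {
    ok: true,
    target: inbound.from_session_id, // reply goes to the original sender
    source_message_id: inbound.id,
    // Only an ask_peer inbound carries request_id; a plain send_message does not.
    ...(inbound.request_id ? { reply_to: inbound.request_id } : {}),
  };
}

const ledger = [];
recordEntry(ledger, { id: "m1", from_session_id: "peer-a", request_id: "req-1" }, 2);
recordEntry(ledger, { id: "m2", from_session_id: "peer-b" }, 2);
recordEntry(ledger, { id: "m3", from_session_id: "peer-a", request_id: "req-3" }, 2);

const aged = deriveReply(ledger, "m1");       // pruned by the cap of 2, so fail-closed
const plain = deriveReply(ledger, "m2");      // no reply_to: inbound was a plain message
const correlated = deriveReply(ledger, "m3"); // carries reply_to: "req-3"
```

Because the lookup only ever sees the array passed in, ownership falls out of which ledger the caller is handed, which mirrors the "that file is the ownership boundary" framing above.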
115
120
  Inbound peer messages are context, not user authority. oxtail stamps delivered messages with `origin: "peer"` for provenance/debugging, but this is not a trust boundary and peers cannot mint trusted user instructions.
116
121
 
117
122
  Cross-project sends are rejected, never silently dropped. Sending to a peer with the same tmux session name as another live peer returns `ambiguous-target` with the candidate `client_session_id`s — use the UUID form to disambiguate.
@@ -172,6 +177,8 @@ If you have a hook installed on a managed event that isn't from Terminator and i
172
177
 
173
178
  oxtail trusts any process running as the **same local user** to enqueue messages. The mailbox directory is mode `0o700` (private), so other users on the host cannot read or write. **On a shared-tenancy box (containers, multi-user dev hosts, etc.), do not run oxtail-aware agents:** any local process under your user can inject `<system-reminder>` content directly into a Claude session. The threat boundary is the same as `~/.ssh/` — what your user processes do, you trust.
174
179
 
180
+ Within that boundary oxtail still *narrows* redirectable side effects, as defense-in-depth rather than a hard boundary: wake keystrokes only go to the pane the process tree confirms hosts the target's `server_pid`, never a self-written `tmux_pane`/`tmux_session` (see [Pane targeting](#pane-targeting-verified)), and an accepted registry entry can't borrow another pid — its `server_pid` must match its own `<pid>.json` filename. So one peer's entry can't masquerade as hosting another agent to redirect that agent's wake. A same-user process can still overwrite any registry file outright (that's the trust boundary above); what it can't do is smuggle a pid mismatch past a reader.
181
+
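The pid/filename rule above, together with the `%\d+` pane shape the README gives in its pane-targeting section, can be sketched as two small validators. The function names here are hypothetical and oxtail's own checks (`isValidTmuxPane` and friends) may differ in detail; this only illustrates the shape of the rule.

```javascript
// Hypothetical validators, for illustration only.

// Extract the pid from a registry filename such as "4242.json".
function pidFromRegistryFilename(filename) {
  const match = /^(\d+)\.json$/.exec(filename);
  return match ? Number(match[1]) : null;
}

// Accept an entry only when its self-written server_pid matches its filename,
// so one entry cannot claim to be hosted by another process.
function acceptRegistryEntry(filename, entry) {
  const pid = pidFromRegistryFilename(filename);
  return pid !== null && Number.isInteger(entry.server_pid) && entry.server_pid === pid;
}

// tmux pane ids look like "%12"; anything else is refused before reaching tmux.
function looksLikePaneId(value) {
  return typeof value === "string" && /^%\d+$/.test(value);
}

const honest = acceptRegistryEntry("4242.json", { server_pid: 4242 });
const forged = acceptRegistryEntry("4242.json", { server_pid: 1 });
const paneOk = looksLikePaneId("%12");
const paneBad = looksLikePaneId("%12; kill-server");
```

As the paragraph above notes, this narrows what a forged entry can redirect; it does not stop a same-user process from overwriting a registry file outright.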
175
182
  ## Delegate-and-wait (v0.10.1)
176
183
 
177
184
  `ask_peer` extends v0.5's mailbox transport into a blocking primitive:
@@ -182,7 +189,7 @@ ask_peer({ target, body })
182
189
  ok: true,
183
190
  message_id,
184
191
  request_id,
185
- wake_status: "fired" | "skipped_unsupported" | "skipped_no_target" | "disabled",
192
+ wake_status: "fired" | "skipped_busy" | "skipped_debounced" | "skipped_no_target" | "disabled",
186
193
  reply: { id, body, enqueued_at, from_session_id, reply_to, correlation } | null,
187
194
  correlation: "correlated" | "uncorrelated" | "none",
188
195
  timeout_ms,
@@ -190,7 +197,7 @@ ask_peer({ target, body })
190
197
  }
191
198
  ```
192
199
 
193
- `wake_status` distinguishes the four outcomes a caller may need to handle differently. `fired` means the wake was attempted (or the reply arrived during the grace window, so no wake was needed). `skipped_unsupported` is reserved: no client currently returns this in auto mode (both Codex and Claude Code wake via send-keys). `skipped_no_target` means no tmux pane/session resolved for the target. `disabled` means `OXTAIL_ASK_PEER_WAKE_STRATEGY=off` is in effect.
200
+ `wake_status` distinguishes the outcomes a caller may need to handle differently. `fired` means the wake was attempted (or the reply arrived during the grace window, so no wake was needed). `skipped_busy` means the peer is mid-turn (its hooks/poll will deliver the message, and `ask_peer` still polls for the reply). `skipped_debounced` means a wake fired for this peer moments ago and this one was coalesced. `skipped_no_target` means no process-tree-verified pane resolved for the target. `disabled` means `OXTAIL_ASK_PEER_WAKE_STRATEGY=off` is in effect. (`skipped_unsupported` is reserved; no client currently returns it.)
194
201
 
195
202
  `timed_out` is `true` only when the poll loop ran to its deadline without a reply.
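A caller might branch on these fields roughly as follows. This is a hypothetical caller-side policy, not something oxtail prescribes: `nextStepFor` and the action strings it returns are invented for illustration, and only the `wake_status` values and the `reply` field come from the response shape above.

```javascript
// Hypothetical policy: map an ask_peer response to a next step.
function nextStepFor(result) {
  if (result.reply) return "use_reply";
  switch (result.wake_status) {
    case "fired":
    case "skipped_busy":
    case "skipped_debounced":
      // The peer was woken, is mid-turn, or was woken moments ago; a late
      // reply should still arrive via the hook / read_my_messages path.
      return "check_mailbox_later";
    case "skipped_no_target":
      // No process-tree-verified pane: the peer cannot be woken automatically.
      return "ask_human_to_relay";
    case "disabled":
      // Wake strategy is off, so only polling will surface a reply.
      return "poll_only";
    default:
      // Includes the reserved skipped_unsupported and any future value.
      return "unknown_status";
  }
}

const answered = nextStepFor({ reply: { body: "done" }, wake_status: "fired" });
const stranded = nextStepFor({ reply: null, timed_out: true, wake_status: "skipped_no_target" });
```

Keeping a `default` branch means a status added in a later release degrades to an explicit "unknown" rather than being silently treated as success.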
196
203
 
@@ -220,9 +227,13 @@ ask_peer({ target, body })
220
227
  4. Poll the caller's mailbox at 200ms. For reply-to-capable peers, only a message with both `from_session_id == target.session_id` and `reply_to == request_id` satisfies the wait; non-matching messages stay in the mailbox untouched. Legacy/no-capability peers are best-effort and are marked `correlation: "uncorrelated"`; this preserves old peers but can stale-match old same-peer chatter.
221
228
  5. Return the reply on match, or `{ reply: null, timed_out: true, wake_status, correlation: "none" }` after the timeout. Late replies fall back to the normal v0.5 hook / `read_my_messages` path — never lost, just delivered out of band.
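The matching rule in steps 4 and 5 can be sketched as a small pure function over an in-memory mailbox. This is a minimal sketch, not oxtail's implementation: `matchReply`, `takeReply`, and the `wait` object shape (`targetSessionId`, `requestId`, `supportsReplyTo`) are hypothetical names, with `supportsReplyTo` standing in for the peer advertising `capabilities.mailbox.reply_to`.

```javascript
// Decide whether one mailbox message satisfies a pending ask_peer wait.
// Returns "correlated", "uncorrelated", or null (leave it in the mailbox).
function matchReply(msg, wait) {
  if (msg.from_session_id !== wait.targetSessionId) return null;
  if (wait.supportsReplyTo) {
    // Upgraded peer: sender AND reply_to must both match.
    return msg.reply_to === wait.requestId ? "correlated" : null;
  }
  // Legacy peer: the first same-peer message wins, which can stale-match.
  return "uncorrelated";
}

// Remove and return the first matching message; non-matching ones stay put.
function takeReply(mailbox, wait) {
  for (let i = 0; i < mailbox.length; i++) {
    const correlation = matchReply(mailbox[i], wait);
    if (correlation) {
      const [reply] = mailbox.splice(i, 1);
      return { reply, correlation };
    }
  }
  return { reply: null, correlation: "none" };
}

const mailbox = [
  { id: "a", from_session_id: "peer-1", body: "stale chatter" },
  { id: "b", from_session_id: "peer-1", reply_to: "req-9", body: "the answer" },
];
const strict = takeReply(mailbox, {
  targetSessionId: "peer-1",
  requestId: "req-9",
  supportsReplyTo: true,
});
// strict.reply.id is "b"; message "a" is still in the mailbox.
```

Running the same mailbox with `supportsReplyTo: false` would return message "a" as `"uncorrelated"`, which is exactly the stale-match limitation the legacy path documents.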
222
229
 
223
- ### Pane staleness
230
+ ### Pane targeting (verified)
231
+
232
+ A peer's cached `tmux_pane` / `tmux_session` are written by the peer into its **own** registry file, so they aren't trustworthy targets for keystrokes — a malicious local peer could point them at someone else's pane. The **only** send-keys target oxtail uses is the pane the live process tree says currently hosts the peer's `server_pid` (resolved at wake-time via `ps`/`tmux` ancestry — unforgeable by editing a JSON file). This also handles pane-id churn for free: the pane is always re-resolved fresh. If `server_pid` can't be bound to any live pane, oxtail **refuses** to wake (`wake_status: "skipped_no_target"`) rather than fall back to a self-written value. `server_pid` itself is self-written too, so registry entries whose `server_pid` doesn't match their own `<pid>.json` filename are rejected — a forged entry can't borrow another process's pane. The pane id that does reach `tmux` is shape-validated (`%\d+`); session names are no longer used as a send-keys target at all. (Hardening from issue #6.)
224
233
 
225
- Pane targeting can go stale: `tmux_pane` is cached at server startup, but tmux can reuse pane ids after a pane is killed. v0.7 re-resolves the pane from the peer's `server_pid` at wake-time (via process-tree ancestry), preferring the live pane id over the cached one. If the peer is no longer in any tmux pane (orphaned), oxtail falls back to the registered tmux session name. If both targeting attempts fail, `wake_status` returns `skipped_no_target`.
234
+ ### Wake debouncing
235
+
236
+ All wake paths funnel through one place, which **coalesces** rapid repeat wakes to the same peer: if a wake fired for a peer within `OXTAIL_WAKE_DEBOUNCE_MS` (default 1s), a follow-up wake is skipped (`wake_status: "skipped_debounced"`) and relies on the still-pending response. This keeps a retried `ask_peer`, two callers racing the same peer, or a polling loop from stacking notification lines into the peer's composer. In-memory and per-process by design. (Issue #5.)
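The coalescing behaviour can be sketched with an in-memory map of last-fired timestamps. This is an illustrative sketch only: `createWakeDebouncer` and its injectable `now` clock are hypothetical, and it assumes that only a wake that actually fires resets the window (the source does not say whether a debounced attempt extends it).

```javascript
// Hypothetical per-process wake debouncer keyed by peer id.
function createWakeDebouncer(windowMs = 1000, now = Date.now) {
  const lastFired = new Map(); // peer id -> timestamp of the last fired wake
  return function tryWake(peerId) {
    const t = now();
    const previous = lastFired.get(peerId);
    if (previous !== undefined && t - previous < windowMs) {
      return "skipped_debounced"; // a wake fired moments ago; coalesce this one
    }
    lastFired.set(peerId, t);
    return "fired";
  };
}

// Drive it with a fake clock so the behaviour is deterministic.
let clock = 0;
const tryWake = createWakeDebouncer(1000, () => clock);
const first = tryWake("peer-1");  // "fired"
clock = 500;
const second = tryWake("peer-1"); // "skipped_debounced"
const other = tryWake("peer-2");  // "fired": the window is per peer
clock = 1000;
const third = tryWake("peer-1");  // "fired": the 1000ms window has elapsed
```

Because the map lives in one process, two separate MCP servers waking the same peer would not coalesce with each other, which matches the "in-memory and per-process by design" note above.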
226
237
 
227
238
  ### Constraints
228
239
 
@@ -281,10 +292,25 @@ When a strategy doesn't fire, it returns an abstention with a `reason` (e.g. `"2
281
292
 
282
293
  If `MCP_TRACE_FILE` is set in the environment, every detection run appends an NDJSON record with trigger, winning strategy, per-strategy outcomes, and `next_step`. Useful for diagnosing unresolved `client_session_id`s in the wild.
283
294
 
295
+ ### Diagnosing wakes (`oxtail diagnose`)
296
+
297
+ The same `MCP_TRACE_FILE` also captures a `wake_outcome` record for every wake (which tool drove it and the resulting `wake_status`). Run:
298
+
299
+ ```sh
300
+ oxtail diagnose
301
+ ```
302
+
303
+ to get a summary — counts by `wake_status`, broken down by tool — so "is the wake mechanism working in my environment?" is one command instead of grepping JSONL. With `MCP_TRACE_FILE` unset it just prints how to enable tracing. (Issue #7.)
304
+
305
+ A scheduled CI job (`.github/workflows/codex-drift.yml`, also runnable on demand) fetches Codex's upstream `PASTE_ENTER_SUPPRESS_WINDOW` and fails if it drifts past oxtail's 500ms Codex wake gap — so a future Codex release that would break the wake surfaces as a red job rather than a silent field regression.
306
+
284
307
  ## Status
285
308
 
286
- v0.10.1. Completes the autonomous peer-messaging matrix and hardens the protocol: a message reaches a Claude Code peer whether it's mid-turn, finishing, or fully idle, and delegate-and-wait replies are correlated by `request_id` / `reply_to` for upgraded peers.
309
+ v0.13.0. Pushes the autonomous peer-messaging matrix toward zero human relay, hardens the wake path, then makes correlated replies atomic.
287
310
 
311
+ - **Reply by id (v0.13.0).** `reply_to_message(message_id, body)` removes the manual `target` + `reply_to` rewiring that silently degraded a correlated exchange into loose mailbox traffic: the server looks the inbound envelope up in a durable per-session **received-ledger** (`~/.oxtail/received/<hash(session_id)>.jsonl`), derives the reply target and `reply_to` itself, and enforces ownership structurally (you can only reply to a message delivered to you). The ledger is written *before* the mailbox line is visible — so a handle the hook displays is always resolvable even though both delivery paths destroy the queue entry once it is handed off. Fail-closed on an unknown/aged-out id.
312
+ - **Wake-on-reply (v0.11.0).** A reply — `send_message` with `reply_to` — auto-wakes a freshly-idle requester by default, so an awaited answer doesn't strand an idle peer. Strictly gated (fresh-idle only, per-target rate limit, one-wake dedupe, `OXTAIL_AUTOWAKE=off` kill-switch). `wake:"off"` opts out; explicit `wake:"auto"` is the escape hatch for a requester without an idle marker (Codex / hookless Claude).
313
+ - **Wake hardening (v0.12.0).** Wake keystrokes only ever target the pane the process tree confirms hosts the peer's `server_pid` — never a self-written `tmux_pane`/`tmux_session`, and registry entries whose `server_pid` doesn't match their filename are rejected. Rapid repeat wakes to one peer are coalesced (`skipped_debounced`). `oxtail diagnose` summarizes wake outcomes from `MCP_TRACE_FILE`, and a scheduled CI job flags drift in Codex's paste-burst window before it can break the wake.
288
314
  - **Correlated delegate-and-wait.** `ask_peer` now sends a `request_id`; upgraded peers reply with `send_message({ reply_to })`, and the waiter ignores same-peer chatter that does not match. Legacy peers are still supported, but their replies are marked `correlation: "uncorrelated"`.
289
315
  - **Identity monotonicity.** `claim_session` / `register_my_session` and sticky-claim recovery are authoritative after they set a session id; later automatic detection cannot clobber a claimed id with stale env data.
290
316
  - **Hook push budgeting and provenance.** PreToolUse/Stop delivery stamps `origin: "peer"`, reminds receivers that peer messages are not user authority, and caps hook-injected body text via `OXTAIL_HOOK_MAX_BODY_CHARS`.
@@ -0,0 +1,32 @@
+ import * as mailbox from "./mailbox.js";
+ import { recordReceived } from "./received.js";
+ import { trace } from "./trace.js";
+ // Deliver a message to a peer's mailbox, recording the durable reply-handle in
+ // the receiver's ledger BEFORE the mailbox line becomes visible. The ordering is
+ // the correctness guarantee: a hook/poll drainer can only observe the mailbox
+ // line after the append, which happens strictly after the ledger write, so any
+ // message_id a receiver can drain/render already has a ledger entry behind it.
+ // The reverse order (append, then record) left a window where the hook rendered
+ // a handle reply_to_message could not yet resolve (the race Codex caught).
+ //
+ // receiverSessionId may be null/empty (an unclaimed peer): then there is no
+ // ledger to own the handle and we skip the record; reply_to_message simply
+ // won't find it, which is the documented fall-back-to-send_message path.
+ //
+ // The ledger write is best-effort: a ledger failure must NEVER drop the actual
+ // delivery. Worst case the reply handle is missing and the peer falls back to
+ // send_message, never the reverse (a visible line with no handle on success),
+ // because record precedes append.
+ export function deliverToPeer(receiverSessionId, targetPid, body, fromSessionId, options = {}) {
+     const msg = mailbox.buildMessage(body, fromSessionId, options);
+     if (receiverSessionId) {
+         try {
+             recordReceived(receiverSessionId, msg);
+         }
+         catch (e) {
+             trace("received_ledger_write_failed", { message_id: msg.id, error: String(e) });
+         }
+     }
+     mailbox.requeue(targetPid, msg);
+     return msg;
+ }
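The record-before-append ordering, and the rule that a ledger failure still delivers, can be exercised with injected fakes. This is a minimal sketch under stated assumptions: `deliverSketch` and the `steps` object (`record`, `append`, `onLedgerError`) are hypothetical stand-ins for `recordReceived`, `mailbox.requeue`, and `trace`, not the module's real wiring.

```javascript
// Hypothetical stand-in for deliverToPeer with injected record/append steps.
function deliverSketch(steps, receiverSessionId, msg) {
  if (receiverSessionId) {
    try {
      steps.record(receiverSessionId, msg); // ledger entry first
    } catch (err) {
      steps.onLedgerError(msg, err); // best-effort: note the failure, keep going
    }
  }
  steps.append(msg); // the mailbox line becomes visible only now
  return msg;
}

const events = [];
const steps = {
  record: (sessionId, msg) => events.push(`record:${msg.id}`),
  append: (msg) => events.push(`append:${msg.id}`),
  onLedgerError: (msg) => events.push(`ledger_failed:${msg.id}`),
};

// Happy path: the ledger write is observed before the mailbox append.
deliverSketch(steps, "session-a", { id: "m1" });

// Ledger failure: the message is still appended, just without a reply handle.
const failing = { ...steps, record: () => { throw new Error("disk full"); } };
deliverSketch(failing, "session-a", { id: "m2" });

// Unclaimed peer (no session id): the record step is skipped entirely.
deliverSketch(steps, null, { id: "m3" });
```

After these three calls, `events` reads `record:m1`, `append:m1`, `ledger_failed:m2`, `append:m2`, `append:m3`, which is the ordering the comment block above describes.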
@@ -0,0 +1,75 @@
+ // Issue #7: `oxtail diagnose`.
+ //
+ // The wake mechanism is environment-sensitive (tmux present? peer in a pane?
+ // Codex paste-burst gap still sufficient?). When it silently doesn't work, a
+ // user otherwise has to spelunk MCP_TRACE_FILE by hand. This summarizes the
+ // `wake_outcome` trace events oxtail emits (counts by wake_status, broken down
+ // by which tool drove the wake) so "is wake working here?" is one command.
+ import { readFileSync } from "node:fs";
+ // Keep only `wake_outcome` events, newest `limit`, and tally them. Malformed
+ // JSONL lines are skipped (a trace file can be concurrently appended).
+ export function summarizeWakeOutcomes(lines, limit = 200) {
+     const outcomes = [];
+     for (const line of lines) {
+         if (!line.trim())
+             continue;
+         let rec;
+         try {
+             rec = JSON.parse(line);
+         }
+         catch {
+             continue;
+         }
+         // Optional chaining also skips valid-JSON non-objects such as `null`.
+         if (rec?.event === "wake_outcome")
+             outcomes.push(rec);
+     }
+     const recent = limit > 0 ? outcomes.slice(-limit) : outcomes;
+     const byStatus = {};
+     const byVia = {};
+     for (const r of recent) {
+         const status = String(r.wake_status ?? "unknown");
+         const via = String(r.via ?? "unknown");
+         byStatus[status] = (byStatus[status] ?? 0) + 1;
+         const viaBucket = (byVia[via] ??= {});
+         viaBucket[status] = (viaBucket[status] ?? 0) + 1;
+     }
+     return { total: recent.length, considered: outcomes.length, byStatus, byVia };
+ }
+ function sortedCounts(counts) {
+     return Object.entries(counts).sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0]));
+ }
+ export function formatWakeSummary(s) {
+     if (s.total === 0) {
+         return "oxtail diagnose: no wake_outcome events in the trace yet (no ask_peer / wake:auto / reply-default wakes recorded).";
+     }
+     const lines = [];
+     const capped = s.considered > s.total ? ` (newest ${s.total} of ${s.considered})` : ` (${s.total})`;
+     lines.push(`oxtail diagnose — wake outcomes${capped}:`);
+     for (const [status, n] of sortedCounts(s.byStatus)) {
+         lines.push(` ${status}: ${n}`);
+     }
+     lines.push("by tool:");
+     for (const [via, counts] of Object.entries(s.byVia).sort()) {
+         const parts = sortedCounts(counts).map(([st, n]) => `${st} ${n}`);
+         lines.push(` ${via}: ${parts.join(", ")}`);
+     }
+     return lines.join("\n");
+ }
+ // CLI entry. Returns a process exit code; `out` is injectable for tests.
+ export function runDiagnose(traceFile, out = console.log) {
+     if (!traceFile) {
+         out("oxtail diagnose: MCP_TRACE_FILE is not set, so there is no trace data to summarize.");
+         out("Set MCP_TRACE_FILE=/path/to/oxtail-trace.jsonl in the oxtail MCP server's env (e.g. in .mcp.json / ~/.claude.json / ~/.codex/config.toml), reproduce some wakes, then re-run `oxtail diagnose`.");
+         return 0;
+     }
+     let content;
+     try {
+         content = readFileSync(traceFile, "utf8");
+     }
+     catch {
+         out(`oxtail diagnose: could not read trace file ${traceFile} (set MCP_TRACE_FILE and reproduce some wakes first).`);
+         return 1;
+     }
+     out(formatWakeSummary(summarizeWakeOutcomes(content.split("\n"))));
+     return 0;
+ }
package/dist/mailbox.js CHANGED
@@ -107,8 +107,12 @@ export function serializeMailboxLine(msg) {
      }
      return line;
  }
- export function enqueue(target_pid, body, from_session_id, options = {}) {
-     const msg = {
+ // Mint a message envelope WITHOUT writing it anywhere. Split out from enqueue so
+ // a higher layer (delivery.ts) can record the durable received-ledger entry
+ // BEFORE the mailbox line becomes visible: the ordering that guarantees any
+ // message_id a receiver can drain/render already has a ledger entry behind it.
+ export function buildMessage(body, from_session_id, options = {}) {
+     return {
          schema_version: 1,
          id: randomBytes(8).toString("hex"),
          body,
@@ -120,10 +124,12 @@ export function enqueue(target_pid, body, from_session_id, options = {}) {
          ...(options.reply_to ? { reply_to: options.reply_to } : {}),
          ...(options.source_message_id ? { source_message_id: options.source_message_id } : {}),
      };
-     const line = serializeMailboxLine(msg);
+ }
+ export function enqueue(target_pid, body, from_session_id, options = {}) {
+     const msg = buildMessage(body, from_session_id, options);
      acquireLock(target_pid);
      try {
-         appendFileSync(mailboxPath(target_pid), line);
+         appendFileSync(mailboxPath(target_pid), serializeMailboxLine(msg));
      }
      finally {
          releaseLock(target_pid);
@@ -0,0 +1,176 @@
1
+ import { createHash } from "node:crypto";
2
+ import { mkdirSync, readFileSync, rmdirSync, statSync, writeFileSync, } from "node:fs";
3
+ import { homedir } from "node:os";
4
+ import { join } from "node:path";
5
+ import { trace } from "./trace.js";
6
+ // The received-message ledger: a durable, per-session index of every inbound
7
+ // envelope, keyed by message_id. It exists because both delivery paths are
8
+ // DESTRUCTIVE — mailbox.drain() truncates the queue to 0 after a read, and the
9
+ // PreToolUse hook does `:> "$m"` after rendering messages into model context.
10
+ // So once a message is delivered, the mailbox no longer holds it. A reply verb
11
+ // (reply_to_message) that looks a message up by id therefore cannot rely on the
12
+ // mailbox; it needs this separate ledger.
13
+ //
14
+ // Correctness hinges on ORDERING, enforced by delivery.ts: the ledger entry is
15
+ // written BEFORE the mailbox line becomes visible. A drainer can only observe
16
+ // the line after the append, which happens strictly after this write — so any
17
+ // message_id a receiver can see has a ledger entry behind it. (The reverse order
18
+ // left a window where the hook rendered a handle reply_to_message couldn't yet
19
+ // resolve — the race Codex caught in review.)
20
+ //
21
+ // Ownership is structural: the ledger lives at received/<hash(session_id)>, and
22
+ // lookups only ever read the caller's own file. You can only reply to a message
23
+ // that was delivered to YOU.
24
+ function receivedDir() {
25
+ // Resolved lazily so tests can swap HOME between cases (mirrors mailbox.ts).
26
+ return join(homedir(), ".oxtail", "received");
27
+ }
28
+ // Hash the session_id into the filename (mirrors claims.ts) so two distinct ids
29
+ // can never collide onto one ledger file — a lossy character-sanitize could map
30
+ // different sessions to the same path. UUIDs are already path-safe; the hash is
31
+ // defensive and, at 128 bits of SHA-256, collision-resistant in practice.
32
+ function ledgerKey(sessionId) {
33
+ return createHash("sha256").update(sessionId).digest("hex").slice(0, 32);
34
+ }
35
+ function ledgerPath(sessionId) {
36
+ return join(receivedDir(), `${ledgerKey(sessionId)}.jsonl`);
37
+ }
38
+ function lockPath(sessionId) {
39
+ return `${ledgerPath(sessionId)}.lock`;
40
+ }
41
+ // Lock idiom mirrors mailbox.ts (mkdir-based, staleness-cleared). The ledger
42
+ // read-modify-write is small (bounded by receivedMax() lines) so the lock
43
+ // window is short.
44
+ const LOCK_STALE_MS = 30_000;
45
+ const LOCK_RETRY_LIMIT = 50;
46
+ const LOCK_RETRY_DELAY_MS = 10;
47
+ // Bounded retention: keep at most this many of the most-recent inbound messages
48
+ // per session. Read lazily so tests can tune it per-case. Generous by default so
49
+ // a realistic mailbox burst (read_my_messages budgets 50/drain) can't push a
50
+ // just-displayed handle out of the ledger before the receiver replies; when the
51
+ // cap DOES bite, recordReceived traces the drop so it is never silent.
52
+ export function receivedMax() {
53
+ const n = Number(process.env.OXTAIL_RECEIVED_MAX);
54
+ return Number.isFinite(n) && n > 0 ? Math.floor(n) : 1000;
55
+ }
56
+ function sleepSync(ms) {
57
+ Atomics.wait(new Int32Array(new SharedArrayBuffer(4)), 0, 0, ms);
58
+ }
59
+ function acquireLock(sessionId) {
60
+ mkdirSync(receivedDir(), { recursive: true, mode: 0o700 });
61
+ const lock = lockPath(sessionId);
62
+ for (let i = 0; i < LOCK_RETRY_LIMIT; i++) {
63
+ try {
64
+ mkdirSync(lock, { mode: 0o700 });
65
+ return;
66
+ }
67
+ catch (e) {
68
+ const err = e;
69
+ if (err.code !== "EEXIST")
70
+ throw err;
71
+ try {
72
+ const st = statSync(lock);
73
+ if (Date.now() - st.mtimeMs > LOCK_STALE_MS) {
74
+ try {
75
+ rmdirSync(lock);
76
+ trace("received_lock_stale_clear", { session_id: sessionId });
77
+ }
78
+ catch {
79
+ // raced with another clearer; fall through to retry
80
+ }
81
+ continue;
82
+ }
83
+ }
84
+ catch {
85
+ // stat may race; just retry
86
+ }
87
+ sleepSync(LOCK_RETRY_DELAY_MS);
88
+ }
89
+ }
90
+ throw new Error(`could not acquire received-ledger lock for ${sessionId}`);
91
+ }
92
+ function releaseLock(sessionId) {
93
+ try {
94
+ rmdirSync(lockPath(sessionId));
95
+ }
96
+ catch {
97
+ // ignore ENOENT / not-empty / EPERM
98
+ }
99
+ }
100
+ function readLines(sessionId) {
101
+ try {
102
+ const raw = readFileSync(ledgerPath(sessionId), "utf8");
103
+ if (!raw)
104
+ return [];
105
+ return raw.split("\n").filter((l) => l.length > 0);
106
+ }
107
+ catch (e) {
108
+ const err = e;
109
+ if (err.code === "ENOENT")
110
+ return [];
111
+ throw err;
112
+ }
113
+ }
114
+ // Append an inbound envelope to the receiver's ledger and prune to receivedMax()
115
+ // (oldest dropped first). Called by delivery.ts BEFORE the mailbox append.
116
+ export function recordReceived(receiverSessionId, msg) {
117
+ if (!receiverSessionId)
118
+ return;
119
+ acquireLock(receiverSessionId);
120
+ try {
121
+ const lines = readLines(receiverSessionId);
122
+ lines.push(JSON.stringify(msg));
123
+ const max = receivedMax();
124
+ let pruned = lines;
125
+ if (lines.length > max) {
126
+ pruned = lines.slice(lines.length - max);
127
+ // No silent caps: a dropped handle becomes reply_to_message
128
+ // "message-not-found", so surface that the bound bit.
129
+ trace("received_ledger_pruned", {
130
+ session_id: receiverSessionId,
131
+ dropped: lines.length - max,
132
+ kept: max,
133
+ });
134
+ }
135
+ writeFileSync(ledgerPath(receiverSessionId), pruned.join("\n") + "\n", {
136
+ mode: 0o600,
137
+ });
138
+ }
139
+ finally {
140
+ releaseLock(receiverSessionId);
141
+ }
142
+ }
143
+ // Look up a previously-received envelope by message_id in this session's ledger.
144
+ // Newest-first scan (ids are unique, so the first match is the only match).
145
+ // Returns null when not found / aged out — the fail-closed signal the reply
146
+ // verb turns into message-not-found. Read under the same lock so a concurrent
147
+ // recordReceived rewrite can't be observed half-written.
148
+ export function lookupReceived(receiverSessionId, messageId) {
149
+ if (!receiverSessionId)
150
+ return null;
151
+ acquireLock(receiverSessionId);
152
+ try {
153
+ const lines = readLines(receiverSessionId);
154
+ for (let i = lines.length - 1; i >= 0; i--) {
155
+ let parsed;
156
+ try {
157
+ parsed = JSON.parse(lines[i]);
158
+ }
159
+ catch {
160
+ continue;
161
+ }
162
+ if (parsed &&
163
+ typeof parsed === "object" &&
164
+ parsed.id === messageId) {
165
+ return parsed;
166
+ }
167
+ }
168
+ return null;
169
+ }
170
+ finally {
171
+ releaseLock(receiverSessionId);
172
+ }
173
+ }
174
+ export function receivedFilePath(sessionId) {
175
+ return ledgerPath(sessionId);
176
+ }
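The bounded-retention and newest-first lookup behaviour described in the `received.js` comments can be illustrated in isolation. The following is a minimal sketch, not the package's implementation: it assumes a plain in-memory array of JSON strings in place of the on-disk JSONL file, skips locking entirely, and the helper names `appendAndPrune` and `lookupNewestFirst` are hypothetical.

```javascript
// Hypothetical in-memory stand-in for the ledger file: one JSON string per line.
function appendAndPrune(lines, msg, max) {
  const next = [...lines, JSON.stringify(msg)];
  // Anything beyond `max` falls off the front (oldest dropped first).
  const dropped = Math.max(0, next.length - max);
  return { lines: next.slice(dropped), dropped };
}

// Scan from the newest line backwards; ids are assumed unique.
function lookupNewestFirst(lines, messageId) {
  for (let i = lines.length - 1; i >= 0; i--) {
    let parsed;
    try {
      parsed = JSON.parse(lines[i]);
    } catch {
      continue; // tolerate a corrupt line instead of failing the lookup
    }
    if (parsed && typeof parsed === "object" && parsed.id === messageId) {
      return parsed;
    }
  }
  return null; // fail closed: the caller reports message-not-found
}

// Record three messages into a ledger capped at two entries.
let ledger = [];
let totalDropped = 0;
for (const id of ["m1", "m2", "m3"]) {
  const result = appendAndPrune(ledger, { id, body: `hello from ${id}` }, 2);
  ledger = result.lines;
  totalDropped += result.dropped;
}

const found = lookupNewestFirst(ledger, "m3");
const agedOut = lookupNewestFirst(ledger, "m1");
```

With a cap of two, `m1` is pruned when `m3` arrives, so a later lookup for `m1` returns `null`. That is the case the real module surfaces through its `received_ledger_pruned` trace.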
package/dist/registry.js CHANGED
@@ -42,6 +42,73 @@ function ensureDir() {
42
42
  function entryPath(pid) {
43
43
  return join(registryDir(), `${pid}.json`);
44
44
  }
45
+ // tmux's own identifiers, used to sanitize registry-sourced values before they
46
+ // reach a `tmux` command. A pane id is always `%<n>`; a session name, per tmux's
47
+ // rules for names we create, is `[A-Za-z0-9_-]+`. Validating defends against a
48
+ // malicious local peer writing a crafted `tmux_pane`/`tmux_session` into its own
49
+ // registry file to redirect or trick our wake send-keys (issue #6).
50
+ export function isValidTmuxPane(s) {
51
+ return /^%\d+$/.test(s);
52
+ }
53
+ export function isValidTmuxSession(s) {
54
+ return /^[A-Za-z0-9_-]+$/.test(s);
55
+ }
56
+ // The ONLY trustworthy send-keys target for waking a peer: the pane the live
57
+ // process tree says currently hosts the peer's `server_pid`. This is computed
58
+ // from `ps`/`tmux` state (currentPaneForServerPid), so it cannot be forged by a
59
+ // peer editing its own `~/.oxtail/sessions/<pid>.json` — unlike the cached
60
+ // `tmux_pane`/`tmux_session` fields, which the peer self-writes. Returns null
61
+ // (caller must refuse to wake) when:
62
+ // - the peer never registered a pane: a legit tmux-hosted peer always does
63
+ // (its session is derived from the pane), so a pane-less/session-only entry
64
+ // is hand-written or spoofed and must never be blind-fired; gating on a
65
+ // registered pane also avoids fishing for a pane from server_pid alone,
66
+ // which in tests can collide with the test runner's own pane.
67
+ // - server_pid isn't under any live tmux pane: we can't bind a trustworthy
68
+ // target, so we refuse rather than fall back to the self-written cached value.
69
+ // - the resolved pane isn't a well-formed pane id (tmux output anomaly).
70
+ // resolvePane is injected in tests; production uses currentPaneForServerPid.
71
+ export function chooseVerifiedWakePane(peer, resolvePane = currentPaneForServerPid) {
72
+ if (!peer.tmux_pane)
73
+ return null;
74
+ const live = resolvePane(peer.server_pid);
75
+ if (!live || !isValidTmuxPane(live))
76
+ return null;
77
+ return live;
78
+ }
79
+ // Extract the pid a registry filename encodes: `<pid>.json` → pid, else null.
80
+ export function filenamePid(file) {
81
+ const m = /^(\d+)\.json$/.exec(file);
82
+ if (!m)
83
+ return null;
84
+ const pid = Number(m[1]);
85
+ return Number.isInteger(pid) && pid > 0 ? pid : null;
86
+ }
87
+ // Read + parse a registry file, enforcing the provenance invariant that a
88
+ // process only ever writes its OWN `<pid>.json`: the parsed `server_pid` MUST
89
+ // equal the pid in the filename. register() always writes them in agreement, so
90
+ // a mismatch means the entry was hand-forged to borrow another process's pid —
91
+ // the #6 redirect where a peer self-writes `server_pid: <victimPid>` so that
92
+ // chooseVerifiedWakePane → currentPaneForServerPid resolves (and wakes) the
93
+ // victim's pane. Such entries, plus non-`<pid>.json` names and parse failures,
94
+ // are rejected (returns null) so no raw-registry reader trusts them. The
95
+ // local-user trust boundary still holds (a same-user process can overwrite any
96
+ // file), but this stops one peer's entry from impersonating another pid.
97
+ export function readEntryFile(dir, file) {
98
+ const fnamePid = filenamePid(file);
99
+ if (fnamePid === null)
100
+ return null;
101
+ let entry;
102
+ try {
103
+ entry = JSON.parse(readFileSync(join(dir, file), "utf8"));
104
+ }
105
+ catch {
106
+ return null;
107
+ }
108
+ if (!entry || entry.server_pid !== fnamePid)
109
+ return null;
110
+ return entry;
111
+ }
45
112
  function resolveTmuxSessionFromPane(pane) {
46
113
  if (!pane)
47
114
  return null;
@@ -120,7 +187,10 @@ export function findTmuxPaneByAncestry(startPid, panePids, ppids) {
120
187
  return null;
121
188
  }
122
189
  export function resolveTmuxPane(env = process.env, pid = process.pid) {
123
- if (env.TMUX_PANE)
190
+ // TMUX_PANE is a peer-controllable env var: only trust it if it has tmux's
191
+ // pane-id shape (%N). A spoofed/malformed value falls through to process-tree
192
+ // ancestry, which can't be forged by editing the environment (issue #6).
193
+ if (env.TMUX_PANE && isValidTmuxPane(env.TMUX_PANE))
124
194
  return env.TMUX_PANE;
125
195
  return findTmuxPaneByAncestry(pid, listTmuxPanePids(), listAllPpids());
126
196
  }
@@ -194,16 +264,10 @@ function gcDeadSiblings(entry) {
194
264
  if (!existsSync(dir))
195
265
  return;
196
266
  for (const file of readdirSync(dir)) {
197
- if (!file.endsWith(".json"))
198
- continue;
267
+ const other = readEntryFile(dir, file);
268
+ if (!other)
269
+ continue; // skip non-<pid>.json, parse errors, and forged entries
199
270
  const full = join(dir, file);
200
- let other;
201
- try {
202
- other = JSON.parse(readFileSync(full, "utf8"));
203
- }
204
- catch {
205
- continue;
206
- }
207
271
  if (other.server_pid === entry.server_pid)
208
272
  continue;
209
273
  if (other.client.session_id !== sid)
@@ -267,16 +331,10 @@ export function readAll() {
267
331
  return [];
268
332
  const live = [];
269
333
  for (const file of readdirSync(dir)) {
270
- if (!file.endsWith(".json"))
271
- continue;
334
+ const entry = readEntryFile(dir, file);
335
+ if (!entry)
336
+ continue; // non-<pid>.json, parse error, or forged server_pid
272
337
  const full = join(dir, file);
273
- let entry;
274
- try {
275
- entry = JSON.parse(readFileSync(full, "utf8"));
276
- }
277
- catch {
278
- continue;
279
- }
280
338
  if (!isAlive(entry.server_pid)) {
281
339
  // Reap-deferral: a dead child's mailbox may still hold undrained mail
282
340
  // that the session's union-drain (PreToolUse hook + read_my_messages)
@@ -342,15 +400,9 @@ export function sessionPidsForId(sessionId) {
342
400
  return [];
343
401
  const entries = [];
344
402
  for (const file of readdirSync(dir)) {
345
- if (!file.endsWith(".json"))
346
- continue;
347
- let e;
348
- try {
349
- e = JSON.parse(readFileSync(join(dir, file), "utf8"));
350
- }
351
- catch {
352
- continue;
353
- }
403
+ const e = readEntryFile(dir, file);
404
+ if (!e)
405
+ continue; // skip non-<pid>.json, parse errors, and forged entries
354
406
  if (e.client.session_id === sessionId)
355
407
  entries.push(e);
356
408
  }
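The registry hardening in this file combines three checks: the filename must encode a pid, the parsed `server_pid` must match it, and a wake target must come from a live resolver and have tmux's `%N` pane shape. The sketch below restates those checks as pure functions so they can be exercised without a registry directory or a running tmux. `parseEntry` is a hypothetical variant of `readEntryFile` that takes the file text as an argument, and the resolver passed to `chooseVerifiedWakePane` is a stub; the null guard on the parsed value is an assumption added for the sketch.

```javascript
const isValidTmuxPane = (s) => /^%\d+$/.test(s);

// `<pid>.json` -> pid, anything else -> null.
function filenamePid(file) {
  const m = /^(\d+)\.json$/.exec(file);
  if (!m) return null;
  const pid = Number(m[1]);
  return Number.isInteger(pid) && pid > 0 ? pid : null;
}

// Hypothetical pure variant of readEntryFile: takes the text, not a path.
function parseEntry(file, text) {
  const pid = filenamePid(file);
  if (pid === null) return null;
  let entry;
  try {
    entry = JSON.parse(text);
  } catch {
    return null;
  }
  // Reject entries whose self-declared pid disagrees with the filename.
  if (!entry || entry.server_pid !== pid) return null;
  return entry;
}

// Only a pane reported by the (injected) live resolver is ever returned.
function chooseVerifiedWakePane(peer, resolvePane) {
  if (!peer.tmux_pane) return null;
  const live = resolvePane(peer.server_pid);
  return live && isValidTmuxPane(live) ? live : null;
}

const honest = parseEntry("123.json", '{"server_pid":123,"tmux_pane":"%1"}');
const forged = parseEntry("123.json", '{"server_pid":999,"tmux_pane":"%1"}');
const pane = chooseVerifiedWakePane({ tmux_pane: "%1", server_pid: 123 }, () => "%3");
```

Here the cached pane `%1` is ignored in favour of the resolver's `%3`, which mirrors the rule that the self-written `tmux_pane` field only gates whether a wake is attempted, never where it goes.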
package/dist/server.js CHANGED
@@ -10,10 +10,13 @@ import { dirname, join, sep } from "node:path";
10
10
  import { clientFromHandshake, detectClient, enrichWithDiagnosis, transcriptPathFor, } from "./clients.js";
11
11
  import { isAbstain } from "./detect/index.js";
12
12
  import { trace } from "./trace.js";
13
- import { buildEntry, currentPaneForServerPid, findByTmuxSession, readAll, refreshTmuxBinding, register, sessionPidsForId, unregister, } from "./registry.js";
13
+ import { buildEntry, chooseVerifiedWakePane, findByTmuxSession, readAll, refreshTmuxBinding, register, sessionPidsForId, unregister, } from "./registry.js";
14
14
  import * as mailbox from "./mailbox.js";
15
+ import * as received from "./received.js";
16
+ import { deliverToPeer } from "./delivery.js";
15
17
  import { recoverClaim, resolveAncestors, writeClaim } from "./claims.js";
16
18
  import { decideReplyAutoWake, defaultAutowakeDir } from "./autowake.js";
19
+ import { markWoke, newWakeDebounceStore, recentlyWoke } from "./wake-debounce.js";
17
20
  // CLI subcommand dispatch must run before any MCP setup so that
18
21
  // `npx oxtail install-hook` doesn't open an MCP transport or register a
19
22
  // session. Use named exports and await them; calling `await import(...)`
@@ -33,6 +36,10 @@ import { decideReplyAutoWake, defaultAutowakeDir } from "./autowake.js";
33
36
  await mod.uninstall();
34
37
  process.exit(0);
35
38
  }
39
+ if (sub === "diagnose") {
40
+ const { runDiagnose } = await import("./diagnose.js");
41
+ process.exit(runDiagnose(process.env.MCP_TRACE_FILE));
42
+ }
36
43
  }
37
44
  import { readClaudeTranscript, readCodexTranscript, } from "./transcripts.js";
38
45
  // Single builder for every readSession return so the field set (including the
@@ -1003,7 +1010,7 @@ function resolveTarget(target, caller) {
1003
1010
  server.registerTool("send_message", {
1004
1011
  description: [
1005
1012
  "Fire-and-forget message to a peer in the same project root. Target: a tmux session name OR a client_session_id (UUID). Async via the peer's mailbox — delivered mid-turn (PreToolUse hook) or next-turn (read_my_messages); cross-project targets are rejected.",
1006
- "A plain message does NOT wake an idle peer. Pass wake:\"auto\" to nudge one via per-client send-keys, state-gated (skipped if the peer is mid-turn). EXCEPTION (wake-on-reply): when you set reply_to, this auto-wakes the requester by default so your answer doesn't strand them idle — pass wake:\"off\" to suppress. The reply-default wake is strictly gated: it fires only for a FRESHLY-IDLE requester (one whose Claude Code hooks maintain a fresh idle marker), with a per-target rate limit and a one-wake dedupe; env kill-switch OXTAIL_AUTOWAKE=off. A requester with no idle marker (Codex, or Claude without the hooks) returns skipped_no_fresh_idle and is NOT auto-woken — use explicit wake:\"auto\" for those. Response carries wake_status (\"fired\" | \"skipped_busy\" | \"skipped_no_fresh_idle\" | \"skipped_rate_limited\" | \"skipped_deduped\" | \"skipped_store_error\" | \"skipped_no_target\" | \"disabled\") and, on the reply path, wake_reason:\"reply_to_default\".",
1013
+ "A plain message does NOT wake an idle peer. Pass wake:\"auto\" to nudge one via per-client send-keys, state-gated (skipped if the peer is mid-turn). EXCEPTION (wake-on-reply): when you set reply_to, this auto-wakes the requester by default so your answer doesn't strand them idle — pass wake:\"off\" to suppress. The reply-default wake is strictly gated: it fires only for a FRESHLY-IDLE requester (one whose Claude Code hooks maintain a fresh idle marker), with a per-target rate limit and a one-wake dedupe; env kill-switch OXTAIL_AUTOWAKE=off. A requester with no idle marker (Codex, or Claude without the hooks) returns skipped_no_fresh_idle and is NOT auto-woken — use explicit wake:\"auto\" for those. Response carries wake_status (\"fired\" | \"skipped_busy\" | \"skipped_debounced\" | \"skipped_no_fresh_idle\" | \"skipped_rate_limited\" | \"skipped_deduped\" | \"skipped_store_error\" | \"skipped_no_target\" | \"disabled\") and, on the reply path, wake_reason:\"reply_to_default\".",
1007
1014
  "Body is verbatim — wrap in <system-reminder>...</system-reminder> yourself if you want that framing. When replying to ask_peer, include reply_to: request_id from the inbound message. For a blocking send-and-wait, use ask_peer instead.",
1008
1015
  ].join(" "),
1009
1016
  inputSchema: {
@@ -1048,11 +1055,23 @@ server.registerTool("send_message", {
1048
1055
  }
1049
1056
  const peer = resolved.entry;
1050
1057
  const fromSessionId = entry.client.session_id ?? undefined;
1051
- const msg = mailbox.enqueue(peer.server_pid, body, fromSessionId, {
1058
+ // deliverToPeer records the durable reply-handle in the recipient's ledger
1059
+ // BEFORE the mailbox line is visible, so a later reply_to_message(message_id)
1060
+ // resolves even after the destructive mailbox/hook drain — and never sees a
1061
+ // displayed-but-unrecorded handle (record precedes append).
1062
+ const msg = deliverToPeer(peer.client.session_id, peer.server_pid, body, fromSessionId, {
1052
1063
  reply_to,
1053
1064
  source_message_id,
1054
1065
  });
1055
1066
  const { wake_status, wake_reason } = await resolveSendWake(peer, wake, reply_to);
1067
+ if (wake_status) {
1068
+ trace("wake_outcome", {
1069
+ via: wake_reason === "reply_to_default" ? "reply_default" : "send_message",
1070
+ wake_status,
1071
+ target_session_id: peer.client.session_id,
1072
+ client_type: peer.client.type,
1073
+ });
1074
+ }
1056
1075
  return jsonResult({
1057
1076
  schema_version: 1,
1058
1077
  ok: true,
@@ -1063,6 +1082,100 @@ server.registerTool("send_message", {
1063
1082
  ...(wake_reason ? { wake_reason } : {}),
1064
1083
  });
1065
1084
  });
1085
+ server.registerTool("reply_to_message", {
1086
+ description: [
1087
+ "Reply to a specific inbound peer message by its message_id — the atomic, correlation-safe alternative to hand-wiring send_message's target + reply_to. The server looks the message up in this session's durable received-ledger, so you pass only the message_id the PreToolUse hook or read_my_messages already showed you; it derives the reply target (the original sender), carries reply_to=request_id when the inbound was an ask_peer (keeping the exchange correlated), and sets source_message_id for provenance. Replying to a plain send_message works too — it just omits reply_to. Ownership is structural: you can only reply to a message delivered to you.",
1088
+ "Delivery + wake match send_message exactly, including the wake-on-reply default: when the inbound carried a request_id and you leave wake unset, a freshly-idle requester is auto-woken; pass wake:\"auto\" to nudge any idle peer, or wake:\"off\" to suppress. Fail-closed: an unknown or aged-out message_id returns error message-not-found instead of guessing a target.",
1089
+ ].join(" "),
1090
+ inputSchema: {
1091
+ message_id: z
1092
+ .string()
1093
+ .min(1)
1094
+ .describe("The message_id of the inbound peer message you are replying to, as shown by the PreToolUse hook or read_my_messages."),
1095
+ body: z
1096
+ .string()
1097
+ .min(1)
1098
+ .refine((s) => Buffer.byteLength(s, "utf8") <= 8192, {
1099
+ message: "body exceeds 8192 UTF-8 bytes",
1100
+ })
1101
+ .describe("Reply body, ≤8KB UTF-8. Verbatim."),
1102
+ wake: z
1103
+ .enum(["off", "auto"])
1104
+ .optional()
1105
+ .describe('Wake strategy, same semantics as send_message. Unset: wake-on-reply default (auto-wakes a freshly-idle requester when the inbound was an ask_peer). "auto": nudge any idle peer. "off": no nudge.'),
1106
+ },
1107
+ }, async ({ message_id, body, wake }) => {
1108
+ const myId = entry.client.session_id;
1109
+ if (!myId) {
1110
+ return jsonResult({
1111
+ schema_version: 1,
1112
+ ok: false,
1113
+ error: "no-session-id",
1114
+ message: "This session has not claimed a session_id, so it has no received-ledger to reply from. Call claim_session first.",
1115
+ });
1116
+ }
1117
+ const inbound = received.lookupReceived(myId, message_id);
1118
+ if (!inbound) {
1119
+ return jsonResult({
1120
+ schema_version: 1,
1121
+ ok: false,
1122
+ error: "message-not-found",
1123
+ message: `No received message ${message_id} in this session's ledger (it may have aged out of retention, or predates reply_to_message). Fall back to send_message with an explicit target.`,
1124
+ });
1125
+ }
1126
+ const targetSid = inbound.from_session_id;
1127
+ if (!targetSid) {
1128
+ return jsonResult({
1129
+ schema_version: 1,
1130
+ ok: false,
1131
+ error: "no-reply-target",
1132
+ message: `Inbound message ${message_id} has no from_session_id, so there is no peer to reply to.`,
1133
+ });
1134
+ }
1135
+ const replyTo = inbound.request_id; // undefined when the inbound was a plain send_message
1136
+ const resolved = resolveTarget(targetSid, entry);
1137
+ if (!resolved.ok) {
1138
+ const replyDefault = replyAutoWakeTriggered(wake, replyTo);
1139
+ const wakeIntended = wake === "auto" || replyDefault;
1140
+ const wake_status = wakeIntended ? resolveErrorWakeStatus(resolved.error) : undefined;
1141
+ return jsonResult({
1142
+ schema_version: 1,
1143
+ ...resolved,
1144
+ in_reply_to_message_id: message_id,
1145
+ original_from_session_id: targetSid,
1146
+ ...(wake_status ? { wake_status } : {}),
1147
+ ...(replyDefault ? { wake_reason: "reply_to_default" } : {}),
1148
+ });
1149
+ }
1150
+ const peer = resolved.entry;
1151
+ const fromSessionId = entry.client.session_id ?? undefined;
1152
+ // Record the reply itself into the original asker's ledger (record-before-
1153
+ // append) so replies can be replied to in turn — chained correlation.
1154
+ const msg = deliverToPeer(peer.client.session_id, peer.server_pid, body, fromSessionId, {
1155
+ reply_to: replyTo,
1156
+ source_message_id: message_id,
1157
+ });
1158
+ const { wake_status, wake_reason } = await resolveSendWake(peer, wake, replyTo);
1159
+ if (wake_status) {
1160
+ trace("wake_outcome", {
1161
+ via: wake_reason === "reply_to_default" ? "reply_default" : "reply_to_message",
1162
+ wake_status,
1163
+ target_session_id: peer.client.session_id,
1164
+ client_type: peer.client.type,
1165
+ });
1166
+ }
1167
+ return jsonResult({
1168
+ schema_version: 1,
1169
+ ok: true,
1170
+ message_id: msg.id,
1171
+ in_reply_to_message_id: message_id,
1172
+ target_session_id: peer.client.session_id,
1173
+ target_server_pid: peer.server_pid,
1174
+ correlation: replyTo ? "correlated" : "uncorrelated",
1175
+ ...(wake_status ? { wake_status } : {}),
1176
+ ...(wake_reason ? { wake_reason } : {}),
1177
+ });
1178
+ });
1066
1179
  // read_my_messages budget. A session's union drain can return a backlog; cap
1067
1180
  // how much one call hands back so a flood (or a peer spamming near-8KB bodies)
1068
1181
  // can't blow the caller's context in a single drain. Overflow is NOT dropped or
@@ -1251,11 +1364,11 @@ function askPeerDelay(ms, signal) {
1251
1364
  // parsed as a key event. The -l flag neutralizes any tmux keysequences a
1252
1365
  // malicious peer could plant in its registry entry.
1253
1366
  //
1254
- // Pane targeting can go stale: tmux_pane is cached at server startup
1255
- // (registry resolveTmuxPane), but Terminator-style window churn can move or
1256
- // close the pane after registration. send-keys against a dead pane id
1257
- // errors; if pane targeting fails and a sessionName is also available,
1258
- // retry against it (targets the session's currently-active pane).
1367
+ // askPeerWakeImpl keeps a generic pane→sessionName retry for its own unit
1368
+ // tests, but PRODUCTION wakePeer now passes only the process-tree-verified pane
1369
+ // (sessionName = null): a self-written tmux_session is not a trustworthy
1370
+ // send-keys target (issue #6), and pane-id churn is handled by re-resolving the
1371
+ // pane from server_pid on every wake rather than by a session fallback.
1259
1372
  async function defaultFireWakeKeystrokes(target, clientType) {
1260
1373
  execFileSync("tmux", ["send-keys", "-t", target, "-l", ASK_PEER_WAKE_TEXT], {
1261
1374
  stdio: ["ignore", "pipe", "pipe"],
@@ -1302,46 +1415,61 @@ export async function askPeerWakeImpl(pane, sessionName, fire) {
1302
1415
  // peer's client_type. Returns the wake_status that should surface in the
1303
1416
  // ask_peer response so callers can distinguish "we tried, no answer" from
1304
1417
  // "we didn't try because the client can't be woken."
1418
+ // In-memory per-process wake-debounce state, keyed by peer session_id. Coalesces
1419
+ // rapid repeat wakes to the same peer across all wake paths (issue #5).
1420
+ const wakeDebounce = newWakeDebounceStore();
1305
1421
  async function wakePeer(peer) {
1306
1422
  if (ASK_PEER_WAKE_STRATEGY === "off") {
1307
1423
  trace("ask_peer_wake_skipped", { reason: "strategy-off" });
1308
1424
  return "disabled";
1309
1425
  }
1310
1426
  const clientType = peer.client.type;
1311
- if (!peer.tmux_pane && !peer.tmux_session) {
1312
- return "skipped_no_target";
1313
- }
1314
- // Race-fix: tmux_pane is cached at registration but pane ids can be reused
1315
- // by tmux after a pane is killed. If we send-keys against a reused id we
1316
- // wake the wrong shell. When the peer registered WITH a cached pane,
1317
- // re-resolve from its server_pid at wake-time and prefer the live value.
1318
- // If the peer registered without a pane (no TMUX_PANE in env, no ancestry
1319
- // match), skip the re-resolution entirely; fishing for a pane based on
1320
- // server_pid alone is unsafe (server_pid may not even still be alive, and
1321
- // in tests it can coincide with the test runner's process tree).
1322
- const livePane = peer.tmux_pane
1323
- ? currentPaneForServerPid(peer.server_pid)
1324
- : null;
1325
- if (peer.tmux_pane && livePane && livePane !== peer.tmux_pane) {
1326
- trace("ask_peer_wake_pane_refreshed", {
1427
+ // #5: coalesce a rapid repeat wake to the same peer (concurrent/retried
1428
+ // ask_peer, polling loops) so we don't stack a second notification line into
1429
+ // its composer. Keyed on session_id; an unclaimed peer (no id) isn't debounced.
1430
+ const sid = peer.client.session_id;
1431
+ if (sid && recentlyWoke(wakeDebounce, sid, Date.now())) {
1432
+ trace("ask_peer_wake_skipped", { reason: "debounced", target_session_id: sid });
1433
+ return "skipped_debounced";
1434
+ }
1435
+ // Security (#6): tmux_pane / tmux_session come from the peer's OWN registry
1436
+ // file, so a malicious local peer could point them at someone else's pane or
1437
+ // session to redirect our wake keystrokes. The ONLY trustworthy send-keys
1438
+ // target is the pane the live process tree says currently hosts the peer's
1439
+ // server_pid — chooseVerifiedWakePane resolves that and refuses (returns null)
1440
+ // when it can't be verified, instead of falling back to the self-written
1441
+ // cached pane or tmux_session. This also subsumes the old stale-pane re-
1442
+ // resolution race fix: we ALWAYS use the freshly process-tree-resolved pane.
1443
+ const verifiedPane = chooseVerifiedWakePane(peer);
1444
+ if (!verifiedPane) {
1445
+ trace("ask_peer_wake_skipped", {
1446
+ reason: "no-verified-pane",
1327
1447
  cached: peer.tmux_pane,
1328
- live: livePane,
1329
1448
  server_pid: peer.server_pid,
1449
+ target_session_id: peer.client.session_id,
1330
1450
  });
1451
+ return "skipped_no_target";
1331
1452
  }
1332
- else if (peer.tmux_pane && !livePane) {
1333
- trace("ask_peer_wake_pane_orphaned", {
1453
+ if (verifiedPane !== peer.tmux_pane) {
1454
+ trace("ask_peer_wake_pane_refreshed", {
1334
1455
  cached: peer.tmux_pane,
1456
+ live: verifiedPane,
1335
1457
  server_pid: peer.server_pid,
1336
1458
  });
1337
1459
  }
1338
- const effectivePane = livePane ?? peer.tmux_pane;
1339
1460
  // Legacy mode bypasses per-client routing: every wake is the v0.6 sequence
1340
1461
  // (no inter-keystroke delay). Cast to "unknown" so defaultFireWakeKeystrokes
1341
1462
  // skips the Codex delay branch.
1342
1463
  const fireType = ASK_PEER_WAKE_STRATEGY === "legacy" ? "unknown" : clientType;
1343
1464
  const fire = (target) => defaultFireWakeKeystrokes(target, fireType);
1344
- const ok = await askPeerWakeImpl(effectivePane, peer.tmux_session, fire);
1465
+ // #5: stamp the debounce BEFORE the (possibly async, paste-burst-delayed) fire
1466
+ // so a concurrent second wakePeer for this peer — which runs while we're
1467
+ // awaiting send-keys — sees the stamp and coalesces instead of double-firing.
1468
+ if (sid)
1469
+ markWoke(wakeDebounce, sid, Date.now());
1470
+ // No session-name fallback: a self-written tmux_session could target another
1471
+ // session, and the verified pane already handles pane-id churn. Pass null.
1472
+ const ok = await askPeerWakeImpl(verifiedPane, null, fire);
1345
1473
  return ok ? "fired" : "skipped_no_target";
1346
1474
  }
1347
1475
  // --- send_message wake:auto gating -------------------------------------------
@@ -1528,7 +1656,9 @@ server.registerTool("ask_peer", {
1528
1656
  const requestId = randomBytes(8).toString("hex");
1529
1657
  const requireReplyTo = peerSupportsReplyTo(peer);
1530
1658
  const fromSessionId = entry.client.session_id ?? undefined;
1531
- const msg = mailbox.enqueue(peer.server_pid, body, fromSessionId, {
1659
+ // Record-before-append (mirrors send_message): lets the peer answer with
1660
+ // reply_to_message(message_id) instead of hand-wiring target + reply_to.
1661
+ const msg = deliverToPeer(expectedSessionId, peer.server_pid, body, fromSessionId, {
1532
1662
  request_id: requestId,
1533
1663
  });
1534
1664
  const startedAt = Date.now();
@@ -1562,6 +1692,12 @@ server.registerTool("ask_peer", {
1562
1692
  // send_message wake:auto. (Codex has no activity file, so it is never
1563
1693
  // detected busy and still fires — unchanged for that client.)
1564
1694
  wakeStatus = await wakeForSend(peer);
1695
+ trace("wake_outcome", {
1696
+ via: "ask_peer",
1697
+ wake_status: wakeStatus,
1698
+ target_session_id: peer.client.session_id,
1699
+ client_type: peer.client.type,
1700
+ });
1565
1701
  if (wakeStatus === "skipped_unsupported") {
1566
1702
  // Reserved branch. No client currently returns skipped_unsupported
1567
1703
  // in auto mode (Codex and Claude Code both wake via send-keys).
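Two ideas from the `server.js` changes lend themselves to a small standalone illustration: the record-before-append ordering that `deliverToPeer` is described as enforcing, and the way `reply_to_message` derives its reply fields from the stored inbound envelope. The sketch below is an approximation under stated assumptions. `deliverSketch` and `deriveReply` are hypothetical names, the ledger and mailbox are plain arrays instead of files, and wake handling and target resolution are left out.

```javascript
// Hypothetical stand-ins: `ledger` and `mailbox` are arrays, not files.
function deliverSketch(ledger, mailbox, msg) {
  try {
    ledger.push(msg); // 1. record the reply handle first (best-effort)
  } catch {
    // A ledger failure degrades to "no handle"; delivery still proceeds.
  }
  mailbox.push(msg); // 2. only now can a drainer observe the message
  return msg;
}

// Derive the reply fields from a looked-up inbound envelope.
function deriveReply(inbound, messageId) {
  if (!inbound) return { ok: false, error: "message-not-found" };
  if (!inbound.from_session_id) return { ok: false, error: "no-reply-target" };
  const replyTo = inbound.request_id; // undefined for a plain send_message
  return {
    ok: true,
    target: inbound.from_session_id,
    ...(replyTo ? { reply_to: replyTo } : {}),
    source_message_id: messageId,
    correlation: replyTo ? "correlated" : "uncorrelated",
  };
}

const ledger = [];
const mailbox = [];
deliverSketch(ledger, mailbox, { id: "a1", from_session_id: "sess-A", request_id: "req-9" });
deliverSketch(ledger, mailbox, { id: "a2", from_session_id: "sess-A" });

// A destructive drain empties the mailbox; the ledger still resolves handles.
mailbox.length = 0;
const askReply = deriveReply(ledger.find((m) => m.id === "a1"), "a1");
const plainReply = deriveReply(ledger.find((m) => m.id === "a2"), "a2");
const missing = deriveReply(ledger.find((m) => m.id === "zz"), "zz");
```

The reply to the `ask_peer`-style message carries `reply_to` and is reported as correlated, the reply to the plain message omits `reply_to`, and an unknown id fails closed instead of guessing a target.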
@@ -0,0 +1,45 @@
+ // Issue #5 — per-peer wake debouncer.
+ //
+ // Every wake fires `tmux send-keys` into the peer's composer. When the same peer
+ // is woken again within a fraction of a second — a caller retrying ask_peer, two
+ // callers targeting the same peer concurrently, or a polling loop — oxtail blasts
+ // a second WAKE_TEXT line on top of the first, which (with the Codex paste-burst
+ // gap) can land inside an already-active turn. This debouncer coalesces those:
+ // if a wake fired for a peer within a short window, subsequent wakes are skipped
+ // and rely on the still-pending response.
+ //
+ // Deliberately in-memory and per-process (state lives on the calling oxtail
+ // server): the common burst — one caller hammering one peer — is same-process,
+ // and cross-process coordination is out of scope for this slice. All wake paths
+ // (ask_peer, send_message wake:"auto", the reply-default wake) funnel through
+ // wakePeer, so one check there covers them all.
+ function envPosInt(name, def, env = process.env) {
+     const v = env[name];
+     if (!v)
+         return def;
+     const n = Number(v);
+     return Number.isFinite(n) && n > 0 ? n : def;
+ }
+ // Default 1s — long enough to swallow a rapid retry / concurrent double-wake,
+ // short enough that a genuinely separate follow-up wake a moment later still
+ // lands. Tunable via OXTAIL_WAKE_DEBOUNCE_MS.
+ export const WAKE_DEBOUNCE_MS = envPosInt("OXTAIL_WAKE_DEBOUNCE_MS", 1000);
+ export function newWakeDebounceStore() {
+     return new Map();
+ }
+ // True if a wake fired for this key within the window — i.e. skip this one.
+ export function recentlyWoke(store, key, nowMs, windowMs = WAKE_DEBOUNCE_MS) {
+     const last = store.get(key);
+     return last !== undefined && nowMs - last < windowMs;
+ }
+ // Record that a wake fired for this key. Opportunistically evicts stale entries
+ // so the map can't grow unbounded across many short-lived peers.
+ export function markWoke(store, key, nowMs, windowMs = WAKE_DEBOUNCE_MS) {
+     store.set(key, nowMs);
+     if (store.size > 256) {
+         for (const [k, t] of store) {
+             if (nowMs - t > windowMs * 10)
+                 store.delete(k);
+         }
+     }
+ }
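
The debouncer above can be exercised with a small sketch. This is an illustration under stated assumptions, not oxtail's actual `wakePeer`: the `wakeOnce` wrapper, the `fire` callback, and the choice to mark the store before firing are all hypothetical, and `markWoke` is simplified here (no eviction pass) so the block runs standalone.

```javascript
// Simplified copies of the two helpers from the diff, so this runs standalone.
function recentlyWoke(store, key, nowMs, windowMs = 1000) {
  const last = store.get(key);
  return last !== undefined && nowMs - last < windowMs;
}
function markWoke(store, key, nowMs) {
  store.set(key, nowMs); // eviction of stale entries omitted in this sketch
}

// Hypothetical wrapper: `fire` stands in for whatever actually sends the wake.
// Marking before firing is an assumption of this sketch, not taken from the source.
function wakeOnce(store, peerKey, nowMs, fire, windowMs = 1000) {
  if (recentlyWoke(store, peerKey, nowMs, windowMs)) {
    return { woke: false, reason: "debounced" };
  }
  markWoke(store, peerKey, nowMs);
  fire(peerKey);
  return { woke: true };
}

const store = new Map();
const fired = [];
const fire = (key) => fired.push(key);

// Timestamps are passed in explicitly, which keeps the logic testable
// without touching the real clock.
const results = [
  wakeOnce(store, "peer-a", 0, fire),    // first wake for peer-a
  wakeOnce(store, "peer-a", 400, fire),  // retry inside the window
  wakeOnce(store, "peer-b", 400, fire),  // different peer, separate key
  wakeOnce(store, "peer-a", 1000, fire), // window elapsed (comparison is strict)
];
console.log(results.map((r) => r.woke), fired);
```

Because `recentlyWoke` uses a strict `<` comparison, a wake arriving exactly `windowMs` after the previous one is allowed through; only wakes strictly inside the window are skipped.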
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "oxtail",
- "version": "0.11.0",
+ "version": "0.13.0",
  "private": false,
  "type": "module",
  "description": "Coordination layer for parallel AI coding agent sessions, exposed over MCP.",
@@ -0,0 +1,63 @@
+ #!/usr/bin/env node
+ // Issue #7 — drift detector for Codex's paste-burst window.
+ //
+ // oxtail's Codex wake inserts a 500ms gap (ASK_PEER_CODEX_SUBMIT_DELAY_MS)
+ // between the typed wake text and Enter, to outlast Codex's paste-burst
+ // PASTE_ENTER_SUPPRESS_WINDOW — a private constant tested at 120ms. If Codex
+ // bumps that window past our gap in a future release, our wake silently
+ // regresses to "Enter gets swallowed" with no signal pointing at the cause.
+ //
+ // This script fetches the upstream constant and exits non-zero if it changed
+ // (or moved/renamed). Run on a schedule (see .github/workflows/codex-drift.yml)
+ // so drift surfaces as a failing job rather than a silent field regression.
+
+ const URL =
+   "https://raw.githubusercontent.com/openai/codex/main/codex-rs/tui/src/bottom_pane/paste_burst.rs";
+ const EXPECTED_MS = 120; // value oxtail's 500ms gap was verified against
+ const OUR_GAP_MS = 500; // ASK_PEER_CODEX_SUBMIT_DELAY_MS in src/server.ts
+
+ async function fetchSource(attempts = 3) {
+   let lastErr;
+   for (let i = 0; i < attempts; i++) {
+     try {
+       const res = await fetch(URL);
+       if (res.ok) return await res.text();
+       lastErr = new Error(`HTTP ${res.status}`);
+     } catch (e) {
+       lastErr = e;
+     }
+     await new Promise((r) => setTimeout(r, 1000 * (i + 1)));
+   }
+   throw lastErr;
+ }
+
+ let src;
+ try {
+   src = await fetchSource();
+ } catch (e) {
+   console.error(`drift-check: could not fetch paste_burst.rs (${e?.message ?? e}). Transient — re-run.`);
+   process.exit(2);
+ }
+
+ const m = src.match(/PASTE_ENTER_SUPPRESS_WINDOW[\s\S]{0,120}?from_millis\((\d+)\)/);
+ if (!m) {
+   console.error(
+     "drift-check: PASTE_ENTER_SUPPRESS_WINDOW / from_millis(...) not found upstream — Codex may have renamed or restructured the paste-burst logic. Re-verify oxtail's Codex wake gap (ASK_PEER_CODEX_SUBMIT_DELAY_MS) by hand.",
+   );
+   process.exit(1);
+ }
+
+ const ms = Number(m[1]);
+ if (ms !== EXPECTED_MS) {
+   const stillSafe = ms < OUR_GAP_MS;
+   console.error(
+     `drift-check: PASTE_ENTER_SUPPRESS_WINDOW changed ${EXPECTED_MS}ms -> ${ms}ms. ` +
+       `oxtail's gap is ${OUR_GAP_MS}ms — ` +
+       (stillSafe
+         ? "still larger, so wake should still submit, but update EXPECTED_MS here once re-verified."
+         : "NO LONGER LARGER: Codex wake will regress (Enter swallowed). Bump ASK_PEER_CODEX_SUBMIT_DELAY_MS in src/server.ts."),
+   );
+   process.exit(1);
+ }
+
+ console.log(`drift-check: PASTE_ENTER_SUPPRESS_WINDOW still ${ms}ms; oxtail gap ${OUR_GAP_MS}ms — OK.`);
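
The script's decision logic can be tried offline with a minimal sketch that separates classification from the network fetch. The `classifyDrift` helper, its status names, and the sample Rust-like declaration are hypothetical and exist only for illustration; the sample line is not claimed to match the real upstream `paste_burst.rs`. Only the regular expression and the 120ms / 500ms values come from the script above.

```javascript
// Same pattern as the drift-check script: the constant name, then up to
// 120 characters (lazily), then the from_millis(...) literal.
const PATTERN = /PASTE_ENTER_SUPPRESS_WINDOW[\s\S]{0,120}?from_millis\((\d+)\)/;

// Hypothetical helper: returns a status instead of calling process.exit,
// so the branches can be checked without a network fetch.
function classifyDrift(src, expectedMs = 120, gapMs = 500) {
  const m = src.match(PATTERN);
  if (!m) return { status: "missing" };
  const ms = Number(m[1]);
  if (ms === expectedMs) return { status: "ok", ms };
  // Changed: still safe only while the upstream window stays below our gap.
  return { status: ms < gapMs ? "changed-still-safe" : "changed-unsafe", ms };
}

// Assumed shape of the upstream declaration, for illustration only.
const sample = (ms) =>
  `const PASTE_ENTER_SUPPRESS_WINDOW: Duration = Duration::from_millis(${ms});`;

const statuses = [
  classifyDrift(sample(120)).status,
  classifyDrift(sample(300)).status,
  classifyDrift(sample(600)).status,
  classifyDrift("const RENAMED: Duration = Duration::from_millis(120);").status,
];
console.log(statuses);
```

In the script above, every outcome except `ok` ends in a non-zero exit (1 for a missing or changed constant, 2 for a fetch failure), so a scheduled job fails on any of them; the still-safe versus unsafe distinction only changes the message.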