@dmsdc-ai/aigentry-telepty 0.1.97 → 0.3.3

@@ -0,0 +1,139 @@
1
+ # 2026-05-02 — `inject --submit-force` + idempotent client retry
2
+
3
+ Closes task #347 (telepty 0.3.2 `--submit` prompt-symbol gate reliability —
4
+ context-ref inject arrived at orchestrator but Enter was skipped when the
5
+ input area had a transient render mismatch: autocomplete dropdown open,
6
+ cursor moved, mid-render race).
7
+
8
+ ## Problem
9
+
10
+ `telepty inject --submit` runs three layers of gating before pressing
11
+ Enter:
12
+
13
+ | Layer | File | Trigger | Skip behavior |
14
+ |---|---|---|---|
15
+ | 3. Prompt-symbol (0.3.2) | `src/submit-gate.js` `awaitPromptSymbol` | `cmux read-screen` does not show the per-CLI prompt symbol stably for ≥200 ms within 8 s | Falls through to Layer 1 (`no_prompt_symbol_seen`) |
16
+ | 1. State-gated (0.3.1) | `src/submit-gate.js` `awaitReplReady` | `sessionStateManager` is not in `idle`/`waiting` with confidence ≥ 0.5 within 10 s | Best-effort dispatch on `timeout`; hard-fail short-circuits to 504 on `session_dead`/`error`/`restarting`/`no_state` |
17
+ | Verify | `src/submit-gate.js` `verifyBodyConsumed` | Injected body still visible in `outputRing` after dispatch | One bounded retry; if still visible, 504 with `reason: 'gated_dispatch_unconsumed'` |
18
+
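The Layer 3 rule in the table above (symbol visible stably for ≥200 ms within 8 s) can be sketched as a polling loop. This is a minimal illustration, not the code in `src/submit-gate.js`: `waitForStableSymbol`, `readScreen`, `pollMs`, and the injectable `now`/`sleep` hooks are hypothetical names introduced here.

```javascript
// Hypothetical helper: resolves true once `symbol` has been present on every
// poll for at least `stableMs`, or false once `timeoutMs` has elapsed.
async function waitForStableSymbol(readScreen, symbol, opts = {}) {
  const {
    stableMs = 200,   // required unbroken visibility window
    timeoutMs = 8000, // overall budget
    pollMs = 50,      // assumed poll interval (not from the spec)
    now = Date.now,
    sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms)),
  } = opts;

  const deadline = now() + timeoutMs;
  let stableSince = null; // time of the first hit in the current unbroken run

  while (now() <= deadline) {
    const screen = await readScreen();
    if (screen.includes(symbol)) {
      if (stableSince === null) stableSince = now();
      if (now() - stableSince >= stableMs) return true;
    } else {
      stableSince = null; // any miss resets the stability window
    }
    await sleep(pollMs);
  }
  return false;
}
```

Resetting `stableSince` on every miss is what makes a flickering prompt (for example, an autocomplete dropdown briefly covering it) count as unstable rather than ready.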
19
+ In production this still produces a residual failure rate when the
20
+ orchestrator session has a transient render mismatch (autocomplete drop-down,
21
+ cursor outside input area, mid-paste). The body is injected, the gate times
22
+ out, the dispatch fires Enter into a "wrong" focus, and `verifyBodyConsumed`
23
+ correctly sees the body still in the input box → 504. Sub-sessions then
24
+ print `⚠️ Submit gated-timeout` and the human user has to press Enter
25
+ manually for the orchestrator to consume the inject.
26
+
27
+ ## Constraints
28
+
29
+ - **Article 1 (lightweight)**: minimum-touch fix. No new modules, no new daemon
30
+ endpoint, no new helper module.
31
+ - **Article 17 (dependency-free)**: no new runtime dependency.
32
+ - **Article 9 (independent)**: telepty must keep working standalone (no cmux/kitty
33
+ required for the new flags).
34
+ - **Backward compat**: existing `--submit` semantics unchanged. Default
35
+ `--submit-retry` value MUST have no effect on the happy path (which is the
36
+ vast majority of calls, currently shipping reliably).
37
+ - **Idempotency**: a retry must never double-press Enter.
38
+
39
+ ## Approach
40
+
41
+ Two opt-in CLI knobs on `telepty inject`, both implemented client-side
42
+ in `cli.js`. Daemon `/submit` endpoint is untouched — `force: true` is
43
+ already supported (introduced in 0.3.1 for `telepty send-key`); we just
44
+ plumb it through from the inject path.
45
+
46
+ ### `--submit-force`
47
+
48
+ Adds `force: true` to the `/submit` POST body. Daemon-side this skips
49
+ both Layer 3 (prompt-symbol) and Layer 1 (state-gate) and dispatches Enter
50
+ once via the existing `terminalLevelSubmit` chain (kitty → cmux → PTY).
51
+
52
+ Use case: caller is confident the target REPL is ready (e.g., orchestrator
53
+ visibly idle, or Phase-6 cascade where sub-session has just verified the
54
+ orchestrator's last bus event). Mirrors the existing `telepty send-key`
55
+ escape hatch but at the inject level so a single command does both.
56
+
57
+ ### `--submit-retry N` (default 1, clamp [0, 3])
58
+
59
+ After a 504 from `/submit` with a **retry-safe** reason, wait 300 ms and
60
+ re-issue the same `/submit` request up to N times. Retry-safe reasons:
61
+
62
+ | Reason | Source | Why retry is idempotent |
63
+ |---|---|---|
64
+ | `gated_dispatch_unconsumed` | `daemon.js:1680` | The verify path saw the body STILL in the input box after best-effort dispatch. Re-firing Enter when the body is visibly un-consumed cannot double-submit. |
65
+ | `gate_timeout` | `awaitReplReady` returning `timeout` (no longer reaches 504 directly in 0.3.1, but kept for forward-compat) | Same: body has not been consumed if we're still on the gated path. |
66
+ | `no_prompt_symbol_seen` | `awaitPromptSymbol` Layer 3 timeout (also not currently a 504 source, but kept for forward-compat) | Layer 3 alone never emits 504 today. Listed for completeness. |
67
+
68
+ Retry is **explicitly NOT** safe for hard-fail reasons — `session_dead`,
69
+ `session_error`, `session_restarting`, `no_state`, `no_state_manager`. Those
70
+ short-circuit the loop immediately because re-firing won't recover. Same
71
+ for any non-504 status (4xx) — no point retrying a malformed request.
72
+
73
+ The retry preserves the original flag set (`force` stays `force`, etc.).
74
+ The `attemptsMade` counter is rendered into the success line as
75
+ `[retry K/N]` so operators can see when the retry path actually fired.
76
+
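The retry rule described above can be sketched as follows. This is a hedged illustration rather than the shipped `cli.js` code: `submitWithRetry`, `postSubmit`, the `{ status, body }` response shape, and `retrySuffix` are assumptions made for the example; only `RETRY_SAFE_REASONS`, the 300 ms delay, and the `[retry K/N]` format come from this document.

```javascript
// Explicit allowlist: any reason not listed here is treated as not retry-safe.
const RETRY_SAFE_REASONS = new Set([
  'gated_dispatch_unconsumed',
  'gate_timeout',
  'no_prompt_symbol_seen',
]);
const RETRY_DELAY_MS = 300;

const defaultSleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// `postSubmit` is a hypothetical function that re-issues the same /submit
// request and resolves to { status, body }. `maxRetries` is assumed to be
// already clamped to [0, 3].
async function submitWithRetry(postSubmit, maxRetries, sleep = defaultSleep) {
  let attemptsMade = 0; // number of retries actually performed
  let res = await postSubmit();
  while (
    res.status === 504 &&
    RETRY_SAFE_REASONS.has(res.body && res.body.reason) &&
    attemptsMade < maxRetries
  ) {
    await sleep(RETRY_DELAY_MS);
    attemptsMade += 1;
    res = await postSubmit();
  }
  return { res, attemptsMade };
}

// Suffix for the success line; empty when the retry path never fired.
function retrySuffix(attemptsMade, maxRetries) {
  return attemptsMade > 0 ? ` [retry ${attemptsMade}/${maxRetries}]` : '';
}
```

Hard-fail reasons and non-504 statuses leave the loop on the first check, so they cost exactly one request.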
77
+ ### Why client-side (not daemon-side)?
78
+
79
+ - Server-side already retries once internally inside `verifyBodyConsumed`
80
+ (`daemon.js:1663-1672`). Adding a second loop server-side conflates two
81
+ feedback signals (the inner verify retry vs. the outer client retry) in
82
+ one response shape.
83
+ - Per-call client control is more flexible — sub-sessions that have
84
+ cheap evidence of orchestrator readiness can pass `--submit-retry 0`
85
+ to avoid the extra round-trip; ones that don't can pass `--submit-retry 2`.
86
+ - Keeps the daemon stable. 0.3.0 cluster (memory:
87
+ `feedback_telepty_send_key_regression.md`) was a daemon-side change that
88
+ rippled into manual-override breakage. Client-side change has a strictly
89
+ smaller blast radius.
90
+
91
+ ## File map
92
+
93
+ | File | Change | LoC delta |
94
+ |---|---|---|
95
+ | `cli.js` (inject command) | Parse `--submit-force` + `--submit-retry`. Wrap existing `useSubmit` block in idempotent retry loop on 504-with-safe-reason. | +~55, -~25 |
96
+ | `test/cli.test.js` | Three new tests: --submit-force passes force=true; --submit-retry retries on safe-reason 504; --submit-retry does NOT retry on hard-fail 504. | +~120 |
97
+ | `CHANGELOG.md` | 0.3.3 entry. | +~30 |
98
+ | `package.json` | 0.3.2 → 0.3.3. | +1, -1 |
99
+ | `test/enforce-report.test.js:280` | Update stale version assertion 0.2.0 → 0.3.3. | +1, -1 |
100
+ | `README.md` | Mention new flags in inject summary. | +~6 |
101
+
102
+ No new files outside `test/` and `docs/`. No daemon changes. No new
103
+ dependencies. Total surface is roughly 200 LoC including tests.
104
+
105
+ ## Tests
106
+
107
+ ### Unit / integration (`test/cli.test.js`)
108
+
109
+ 1. **`--submit-force` passes `force: true` to /submit**
110
+ Spawn a session, intercept `/submit` (use existing harness method or
111
+ inspect bus event), invoke `telepty inject --submit --submit-force <id>
112
+ "x"`, assert daemon received `{ force: true }` in the request body.
113
+
114
+ 2. **`--submit-retry N` retries on safe-reason 504**
115
+ Mock the daemon to return 504 `{reason: 'gated_dispatch_unconsumed'}`
116
+ on the first call and 200 on the second. Assert the CLI made exactly
117
+ 2 POST /submit calls and exited 0. Assert `[retry 1/N]` is present
118
+ in stdout.
119
+
120
+ 3. **`--submit-retry N` does NOT retry on hard-fail 504**
121
+ Mock the daemon to return 504 `{reason: 'session_dead'}`. Assert the
122
+ CLI made exactly 1 POST /submit call (no retry).
123
+
124
+ ### Regression — full suite
125
+
126
+ `npm test` — 229 tests, all should pass after updating the stale
127
+ `enforce-report.test.js:280` version assertion.
128
+
129
+ ## Future-proofing notes
130
+
131
+ - If the daemon adds new 504 reasons, they are by default **NOT** retry-
132
+ safe (the safe set is an explicit allowlist). Adding a new safe reason
133
+ is a one-line `RETRY_SAFE_REASONS.add(...)` change in `cli.js`.
134
+ - The flag pair composes: `--submit-force --submit-retry 0` (force-once),
135
+ `--submit-force --submit-retry 2` (force, with idempotent retry on the
136
+ rare 504, though force never returns 504 today).
137
+ - The 300 ms retry delay is a constant, not a flag, to keep the surface
138
+ small. Empirically chosen at the upper end of the architect's
139
+ 100–300 ms window for the autocomplete-dropdown-close case.
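How the two flags compose can be sketched with a small parser. The real argument handling in `cli.js` is not shown in this diff, so `parseSubmitFlags` and `buildSubmitBody` are hypothetical names; the default of 1 and the clamp to [0, 3] follow the `--submit-retry` heading above.

```javascript
// Hypothetical parser for the two opt-in flags on `telepty inject`.
function parseSubmitFlags(argv) {
  let force = false;
  let retry = 1; // documented default
  for (let i = 0; i < argv.length; i++) {
    if (argv[i] === '--submit-force') {
      force = true;
    } else if (argv[i] === '--submit-retry') {
      const n = Number.parseInt(argv[i + 1], 10);
      if (Number.isInteger(n)) {
        retry = Math.min(3, Math.max(0, n)); // clamp to [0, 3]
        i += 1; // consume the value
      }
    }
  }
  return { force, retry };
}

// Body for the /submit POST: `force: true` only when the flag was given.
function buildSubmitBody({ force }) {
  return force ? { force: true } : {};
}
```

With this shape, `--submit-force --submit-retry 0` yields `{ force: true, retry: 0 }`, which is the force-once combination listed above.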
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "@dmsdc-ai/aigentry-telepty",
3
- "version": "0.1.97",
3
+ "version": "0.3.3",
4
4
  "main": "daemon.js",
5
5
  "bin": {
6
6
  "aigentry-telepty": "install.js",
@@ -9,9 +9,9 @@
9
9
  "telepty-mcp": "mcp-server/index.mjs"
10
10
  },
11
11
  "scripts": {
12
- "test": "node --test test/auth.test.js test/daemon.test.js test/daemon-singleton.test.js test/cli.test.js test/skill-installer.test.js test/interactive-terminal.test.js test/runtime-info.test.js test/session-routing.test.js test/session-state.test.js test/mailbox-lock.test.js",
13
- "test:watch": "node --test --watch test/auth.test.js test/daemon.test.js test/daemon-singleton.test.js test/cli.test.js test/skill-installer.test.js test/interactive-terminal.test.js test/runtime-info.test.js test/session-routing.test.js test/session-state.test.js test/mailbox-lock.test.js",
14
- "test:ci": "node --test --test-reporter=spec test/auth.test.js test/daemon.test.js test/daemon-singleton.test.js test/cli.test.js test/skill-installer.test.js test/interactive-terminal.test.js test/runtime-info.test.js test/session-routing.test.js test/session-state.test.js test/mailbox-lock.test.js"
12
+ "test": "node --test test/auth.test.js test/daemon.test.js test/daemon-singleton.test.js test/cli.test.js test/skill-installer.test.js test/interactive-terminal.test.js test/runtime-info.test.js test/session-routing.test.js test/session-state.test.js test/mailbox-lock.test.js test/report-enforcement.test.js test/enforce-report.test.js test/submit-gate.test.js test/prompt-symbol-registry.test.js test/inject-submit-flags.test.js",
13
+ "test:watch": "node --test --watch test/auth.test.js test/daemon.test.js test/daemon-singleton.test.js test/cli.test.js test/skill-installer.test.js test/interactive-terminal.test.js test/runtime-info.test.js test/session-routing.test.js test/session-state.test.js test/mailbox-lock.test.js test/report-enforcement.test.js test/enforce-report.test.js test/submit-gate.test.js test/prompt-symbol-registry.test.js test/inject-submit-flags.test.js",
14
+ "test:ci": "node --test --test-reporter=spec test/auth.test.js test/daemon.test.js test/daemon-singleton.test.js test/cli.test.js test/skill-installer.test.js test/interactive-terminal.test.js test/runtime-info.test.js test/session-routing.test.js test/session-state.test.js test/mailbox-lock.test.js test/report-enforcement.test.js test/enforce-report.test.js test/submit-gate.test.js test/prompt-symbol-registry.test.js test/inject-submit-flags.test.js"
15
15
  },
16
16
  "keywords": [
17
17
  "pty",
@@ -0,0 +1,201 @@
1
+ # SPEC: Codex inject reliability — 4 issues
2
+
3
+ **Bug source:** orchestrator inject e9f41301...
4
+ **Session:** aigentry-telepty
5
+ **Status:** SPEC — awaiting orchestrator approval
6
+
7
+ ---
8
+
9
+ ## Goal
10
+
11
+ Make `telepty inject` work reliably with codex sessions. Currently 4 failure
12
+ modes: Enter not pressed, active work overwrite, REPORT not sent, multi-task
13
+ partial processing.
14
+
15
+ ---
16
+
17
+ ## Root Cause Analysis
18
+
19
+ ### Issue 1: inject succeeds but Enter NOT pressed
20
+
21
+ **Flow:** daemon `deliverInjectionToSession()` → mailbox → `tick()` →
22
+ `writeDataToSession()` sends text via WS → allow-bridge → `child.write(text)`.
23
+ Then 500ms later, `writeDataToSession(id, session, '\r')` → WS → allow-bridge →
24
+ `child.write('\r')`.
25
+
26
+ **Root cause:** codex CLI puts terminal in raw mode with custom input handling.
27
+ PTY-level `\r` via `child.write('\r')` is NOT equivalent to pressing Enter in
28
+ codex's input model. codex reads PTY input character by character in raw mode
29
+ and interprets `\r` differently than a keyboard Enter event.
30
+
31
+ **Evidence:** Project memory: "Do not depend on PTY `\r` directly" and
32
+ "inject submit should always prefer osascript/kitty terminal-level submit".
33
+
34
+ The `--submit` flag exists in CLI but POST /submit also uses `submitViaPty()` →
35
+ same `\r` via WS. It does NOT use terminal-level submit (kitty/cmux).
36
+
37
+ ### Issue 2: New inject overwrites active work
38
+
39
+ **Flow:** `deliverInjectionToSession()` enqueues to mailbox and calls
40
+ `mailboxDelivery.tick()` immediately. Text goes via WS → allow-bridge.
41
+
42
+ Allow-bridge has queuing: if `isIdle()` is false, text goes to
43
+ `enqueueBridgeMessage()`. The safety timer flushes after 5s regardless. But the
44
+ daemon doesn't check session state — it pushes immediately.
45
+
46
+ **Root cause:** Two layers of the problem:
47
+ 1. Daemon sends inject regardless of session state (working/thinking/idle)
48
+ 2. Allow-bridge 5s safety flush writes queued text to PTY even if session is
49
+ still working, which interrupts codex's current task
50
+
51
+ ### Issue 3: REPORT not sent after completion
52
+
53
+ **Flow:** Auto-report mechanism (`pendingReports`) triggers when allow-bridge
54
+ sends `{ type: 'ready' }` WS message. The `ready` signal fires when
55
+ `promptPattern.test(data)` matches in the PTY output.
56
+
57
+ **Root cause:** codex prompt pattern `codex: /[❯>]\s*$/` doesn't reliably match
58
+ codex's actual prompt output. If prompt is never detected → `ready` never sent →
59
+ `pendingReports` never cleared → auto-report never fires.
60
+
61
+ The new session state machine (#185) detects `idle` via OSC 133 + silence
62
+ timeout, but auto-report still uses the legacy `ready` WS signal (daemon.js
63
+ lines 2290-2315), not the `session_auto_state` transitions.
64
+
65
+ ### Issue 4: Multiple tasks in one inject — partial processing
66
+
67
+ **Root cause:** AI behavior, not telepty bug. When a --ref file contains Task A
68
+ + Task B, codex processes Task A and returns to prompt. This is standard LLM
69
+ behavior — no telepty fix needed.
70
+
71
+ **Mitigation:** Orchestrator should split multi-task injects into separate
72
+ sequential calls with idle-gating between them (orchestrator-side logic).
73
+
74
+ ---
75
+
76
+ ## Scope
77
+
78
+ **Phase 1 (this spec):** Fix Issues 1 and 3 (guaranteed Enter + guaranteed
79
+ REPORT). These are telepty-side fixes.
80
+
81
+ **Phase 2 (separate task):** Fix Issue 2 (inject queuing during active work).
82
+ Requires daemon-side session state awareness.
83
+
84
+ **Out of scope:** Issue 4 (orchestrator-level task splitting).
85
+
86
+ ---
87
+
88
+ ## Files to Modify
89
+
90
+ | File | Change |
91
+ |---|---|
92
+ | `daemon.js` | Fix 1: `deliverInjectionToSession()` — use `sendViaKitty()` for CR instead of PTY `\r`. Fix 3: Wire auto-report to session state `idle` transition instead of legacy `ready` signal. |
93
+ | `daemon.js` | Fix 1: POST `/submit` endpoint — use kitty send-text with cmux fallback instead of `submitViaPty()`. |
94
+
95
+ ---
96
+
97
+ ## Approach
98
+
99
+ ### Fix 1: Terminal-level submit for wrapped sessions
100
+
101
+ Replace PTY `\r` with `sendViaKitty()` in `deliverInjectionToSession()`:
102
+
103
+ ```js
+ // BEFORE (daemon.js ~line 590):
+ if (!options.noEnter && session.type !== 'aterm') {
+   const submitDelay = session.type === 'wrapped' ? 500 : 300;
+   setTimeout(async () => {
+     const submitResult = await writeDataToSession(id, session, '\r');
+     // ...
+   }, submitDelay);
+ }
+
+ // AFTER:
+ if (!options.noEnter && session.type !== 'aterm') {
+   const submitDelay = session.type === 'wrapped' ? 500 : 300;
+   setTimeout(async () => {
+     let submitted = false;
+     // Priority 1: kitty send-text (terminal-level, bypasses PTY quirks)
+     if (session.type === 'wrapped') {
+       submitted = sendViaKitty(id, '\r');
+     }
+     // Priority 2: cmux send-key (for cmux-managed sessions)
+     if (!submitted && session.backend === 'cmux' && session.cmuxWorkspaceId) {
+       submitted = submitViaCmux(id);
+     }
+     // Priority 3: PTY fallback (spawned sessions without kitty)
+     if (!submitted) {
+       const submitResult = await writeDataToSession(id, session, '\r');
+       if (!submitResult.success) {
+         emitInjectFailureEvent(id, submitResult.code, submitResult.error, {
+           phase: 'submit', source: options.source || 'inject'
+         }, session);
+       }
+     }
+   }, submitDelay);
+ }
+ ```
138
+
139
+ Also update POST `/submit` endpoint to use same priority chain instead of
140
+ always calling `submitViaPty()`.
141
+
142
+ ### Fix 3: Auto-report via session state machine
143
+
144
+ Wire auto-report to the `session_auto_state` transition event (already emitted
145
+ by `sessionStateManager.onTransition()`). When a session transitions to `idle`
146
+ and has a pending report, fire the auto-report.
147
+
148
+ ```js
+ // In the existing sessionStateManager.onTransition callback (daemon.js ~line 37):
+ sessionStateManager.onTransition((sessionId, from, to, detail) => {
+   const session = sessions[sessionId];
+   if (!session) return;
+   broadcastSessionEvent('session_auto_state', sessionId, session, {
+     extra: { auto_state: to, auto_state_from: from, auto_detail: detail }
+   });
+
+   // Auto-report: fire when session transitions to idle after inject
+   if (to === 'idle' && pendingReports[sessionId]) {
+     const pendingReport = pendingReports[sessionId];
+     delete pendingReports[sessionId];
+     const elapsed = ((Date.now() - new Date(pendingReport.injectedAt).getTime()) / 1000).toFixed(1);
+     const reportMsg = `TASK_COMPLETE: ${sessionId} is now idle after processing inject (${elapsed}s)`;
+     const srcId = resolveSessionAlias(pendingReport.source) || pendingReport.source;
+     const srcSession = sessions[srcId];
+     if (srcSession) {
+       deliverInjectionToSession(srcId, srcSession, reportMsg, { noEnter: false, source: 'auto_report' });
+       console.log(`[AUTO-REPORT] ${sessionId} → ${srcId}: idle after ${elapsed}s`);
+     }
+   }
+ });
+ ```
172
+
173
+ Keep the legacy `ready`-based auto-report as fallback (don't remove it).
174
+
175
+ ---
176
+
177
+ ## Verification
178
+
179
+ 1. **Test:** `telepty inject xtem-rtm "echo hello"` → codex processes it
180
+ (Enter pressed via kitty send-text)
181
+ 2. **Test:** `telepty inject --ref --from orchestrator xtem-rtm 'task'` → after
182
+ codex completes → auto-report fires via idle state transition
183
+ 3. **Test:** Sessions without kitty (spawned) → PTY `\r` fallback still works
184
+ 4. **Test:** Existing 131 tests still pass
185
+
186
+ ---
187
+
188
+ ## Risks
189
+
190
+ 1. **kitty not available.** Mitigated: 3-tier fallback (kitty → cmux → PTY).
191
+ PTY path preserved as last resort.
192
+ 2. **`sendViaKitty()` needs kitty socket + window ID match.** Already
193
+ implemented and working for other features. If kitty window not found,
194
+ falls through to PTY.
195
+ 3. **Auto-report via state machine may fire too early.** The idle detection
196
+ uses 5s silence timeout. If codex pauses >5s mid-task, it may fire
197
+ prematurely. Mitigated: auto-report has `AUTO_REPORT_IDLE_SECONDS` (10s)
198
+ threshold. Can add a minimum elapsed time guard.
199
+ 4. **Dual auto-report paths (state machine + legacy ready).** Could fire
200
+ twice. Mitigated: `delete pendingReports[sessionId]` in both paths —
201
+ whichever fires first consumes the pending report.
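The "whichever fires first consumes the pending report" mitigation in risk 4 can be sketched as a single take-and-delete helper shared by both paths. `takePendingReport` is a hypothetical name; the spec itself only shows an inline `delete pendingReports[sessionId]`.

```javascript
// Hypothetical helper: returns the pending report for `sessionId` and removes
// it in the same synchronous step, or returns null if it was already consumed.
function takePendingReport(pendingReports, sessionId) {
  const pending = pendingReports[sessionId];
  if (!pending) return null;
  delete pendingReports[sessionId];
  return pending;
}

// Both the state-machine path and the legacy `ready` path would call the
// helper and only report when it returns a non-null entry.
function maybeAutoReport(pendingReports, sessionId, fire) {
  const pending = takePendingReport(pendingReports, sessionId);
  if (pending) fire(pending);
  return Boolean(pending);
}
```

Because the lookup and the delete happen in one synchronous call, two callbacks on the Node.js event loop cannot both observe the same entry.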
@@ -0,0 +1,237 @@
1
+ # SPEC: Enforce result-summary REPORT when sessions go idle
2
+
3
+ **Source:** orchestrator inject d94c9990...
4
+ **Session:** aigentry-telepty
5
+ **Status:** SPEC — awaiting orchestrator approval
6
+ **Topic:** REPORT enforcement after inject-driven idle transitions
7
+
8
+ ---
9
+
10
+ ## 1. Design options & recommendation
11
+
12
+ ### Option A — Gate idle transition until REPORT arrives
13
+ Hold back the `idle` transition for up to N seconds, until a content REPORT
14
+ is detected as sent by the session.
15
+
16
+ - ❌ Violates invariant: "Do NOT break existing idle detection"
17
+ - ❌ Requires invasive state machine changes
18
+ - **Rejected.**
19
+
20
+ ### Option B — Auto-summarize PTY output
21
+ Scrape last X lines of session PTY output, strip ANSI, attach as
22
+ `auto_summary` field on `TASK_COMPLETE`.
23
+
24
+ - ✅ Zero session-side changes
25
+ - ✅ Always provides content payload
26
+ - ❌ PTY scraping is noisy (progress bars, status lines, spinner remnants)
27
+ - ❌ Masks the root cause — sessions still forget to REPORT
28
+ - **Keep as fallback, not primary.**
29
+
30
+ ### Option C — Two-stage notification
31
+ On idle transition, fire `TASK_IDLE_NO_REPORT` (not `TASK_COMPLETE`).
32
+ Watch for content REPORT inject BACK to the source session for N seconds.
33
+ If REPORT detected → emit `TASK_COMPLETE_WITH_REPORT`. Else → emit
34
+ `TASK_TIMEOUT_NO_REPORT` with `auto_summary` fallback (Option B).
35
+
36
+ - ✅ Observable from orchestrator without code changes (richer events)
37
+ - ✅ Doesn't break existing idle detection (fires AFTER idle transition)
38
+ - ✅ No session-side changes required
39
+ - ✅ Backward-compat (old consumers see bus event, just with new `type`)
40
+ - ✅ Provides clear state difference between "REPORTed" and "idled silently"
41
+ - **Recommended primary.**
42
+
43
+ ### Option D — Prompt-injection reminder
44
+ When a session is about to go idle after an inject, auto-inject reminder text.
45
+
46
+ - ❌ Interferes with active work
47
+ - ❌ Doesn't guarantee compliance
48
+ - ❌ Session might be in final cleanup — inject causes confusion
49
+ - **Rejected.**
50
+
51
+ ### Recommendation: **Option C + Option B fallback**
52
+
53
+ Two-stage notification with PTY-scrape auto-summary as timeout fallback.
54
+ Minimal blast radius, maximal observability, preserves all invariants.
55
+
56
+ ---
57
+
58
+ ## 2. Content REPORT schema
59
+
60
+ Parse from inject body text via prefix. Structured envelope would require
61
+ session-side library; free-text prefix keeps all LLMs compatible.
62
+
63
+ **Detection rule:** An inject from session X BACK to session Y (where Y was
64
+ the original `--from` source for X's last inject) whose prompt text starts
65
+ with one of:
66
+ - `REPORT:` (completed / partial result)
67
+ - `STATUS:` (blocked / dismissed / error)
68
+ - `ENFORCE-SPEC:`, `SPEC:`, `OWNER-DIAGNOSIS:` — recognized REPORT variants
69
+
70
+ Required fields (parsed from pipe-separated text):
71
+ - `source_session` — auto (sender of the reply inject)
72
+ - `target_session` — auto (recipient, i.e. the original orchestrator)
73
+ - `inject_ref` — auto (matched via pendingReports tracking)
74
+ - `status` — parsed from prefix: `REPORT:` → completed; `STATUS: blocked` → blocked; etc.
75
+ - `summary` — the full prompt text (20-500 chars recommended, not enforced)
76
+ - `artifacts` — optional, parsed from `files={...}` pipe-field
77
+ - `next_action` — optional, parsed from `next={...}` pipe-field
78
+
79
+ **Non-breaking:** If the reply inject doesn't match any REPORT prefix, it's
80
+ treated as a regular inject (current behavior preserved).
81
+
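The prefix detection and field parsing above can be sketched like this. The exact pipe-field syntax is not pinned down in this spec, so the sketch assumes a shape such as `REPORT: summary | files={a, b} | next={...}`; `parseReport` is a hypothetical name, and mapping the non-`STATUS:` variants to `completed` is an assumption.

```javascript
const REPORT_PREFIXES = ['REPORT:', 'STATUS:', 'ENFORCE-SPEC:', 'SPEC:', 'OWNER-DIAGNOSIS:'];

// Returns a parsed report, or null when the text is a regular inject.
function parseReport(text) {
  const trimmed = text.trimStart();
  const prefix = REPORT_PREFIXES.find((p) => trimmed.startsWith(p));
  if (!prefix) return null;

  const rest = trimmed.slice(prefix.length).trim();
  let status = 'completed'; // assumed for REPORT: and the recognized variants
  if (prefix === 'STATUS:') {
    // First word after the prefix, e.g. "blocked" or "dismissed".
    status = rest.split(/[\s|]/, 1)[0].toLowerCase() || 'unknown';
  }

  // Extract an optional `name={...}` pipe-field (assumed syntax).
  const field = (name) => {
    const m = trimmed.match(new RegExp(`(?:^|\\|)\\s*${name}=\\{([^}]*)\\}`));
    return m ? m[1].trim() : undefined;
  };
  const files = field('files');

  return {
    prefix,
    status,
    summary: trimmed, // the full prompt text, as in the field list above
    artifacts: files ? files.split(',').map((s) => s.trim()).filter(Boolean) : [],
    next_action: field('next'),
  };
}
```

The `source_session`, `target_session`, and `inject_ref` fields are left out because they are filled in automatically from the inject metadata rather than parsed from text.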
82
+ ---
83
+
84
+ ## 3. Timeout + failure handling
85
+
86
+ | Condition | Action | Notification |
87
+ |---|---|---|
88
+ | REPORT arrives within `reportTimeoutSecs` (default 120s) | Cancel timer, mark as reported | `TASK_COMPLETE_WITH_REPORT` (rich payload) |
89
+ | No REPORT within `reportTimeoutSecs` | Fire timeout | `TASK_TIMEOUT_NO_REPORT` with `auto_summary` (last 40 non-blank stripAnsi lines from `session.outputRing`) |
90
+ | Session sends `STATUS: blocked` explicitly | Immediate settlement | `TASK_BLOCKED_WITH_REASON` |
91
+ | Session dies before REPORT | Detected via `dead` transition | `TASK_DEAD_NO_REPORT` with `auto_summary` |
92
+
93
+ **Interaction with existing 60s deliberation timeout:** Orthogonal. Deliberation
94
+ timeout is a separate orchestrator-level concept. This daemon-level REPORT
95
+ timeout fires AFTER idle but BEFORE any orchestrator follow-up. Default 120s
96
+ gives orchestrator time to see `TASK_IDLE_NO_REPORT` and follow up before
97
+ auto-summary fires.
98
+
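The settlement rules in the table above amount to a one-shot watch: whichever outcome arrives first wins, and the timer is cancelled otherwise. A minimal sketch follows, assuming a hypothetical `startReportWatch` helper and an `emit(type, pending)` callback standing in for the bus.

```javascript
// Hypothetical one-shot watch started when a session goes idle with a
// pending report. Exactly one event is emitted per watch.
function startReportWatch(pending, timeoutMs, emit) {
  let settled = false;

  const timer = setTimeout(() => {
    if (settled) return;
    settled = true;
    emit('TASK_TIMEOUT_NO_REPORT', pending);
  }, timeoutMs);

  return {
    // Call with TASK_COMPLETE_WITH_REPORT, TASK_BLOCKED_WITH_REASON, or
    // TASK_DEAD_NO_REPORT. Returns false if the watch was already settled.
    settle(type) {
      if (settled) return false;
      settled = true;
      clearTimeout(timer);
      emit(type, pending);
      return true;
    },
  };
}
```

In the daemon the timeout would be `reportTimeoutSecs * 1000`; the sketch takes milliseconds directly so it can be exercised with short delays.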
99
+ ---
100
+
101
+ ## 4. Back-compat
102
+
103
+ - Legacy `TASK_COMPLETE: {session} is now idle after processing inject ({N}s)`
104
+ text format: **deprecated but kept emitting** for 1 minor version. Emit BOTH
105
+ the new `TASK_IDLE_NO_REPORT` bus event AND the legacy text-inject-to-source
106
+ during transition period.
107
+ - New bus event types: `TASK_IDLE_NO_REPORT`, `TASK_COMPLETE_WITH_REPORT`,
108
+ `TASK_TIMEOUT_NO_REPORT`, `TASK_BLOCKED_WITH_REASON`, `TASK_DEAD_NO_REPORT`.
109
+ - Sessions that never send REPORT: grandfathered — they get
110
+ `TASK_TIMEOUT_NO_REPORT` with auto-summary fallback (no hard failure).
111
+ - Orchestrator code that parses legacy `TASK_COMPLETE: ...` text: still works
112
+ (text still emitted during transition).
113
+
114
+ ---
115
+
116
+ ## 5. Scope boundaries
117
+
118
+ | Work source | Require REPORT? | How distinguished |
119
+ |---|---|---|
120
+ | Inject with `--from X` | ✅ Yes (track in `pendingReports[sessionId]`) | `pendingReports` map populated on inject |
121
+ | Inject without `--from` | ❌ No (no one to report to) | `pendingReports` key absent |
122
+ | User typed directly | ❌ No | No inject event, no pendingReport entry |
123
+ | Self-initiated REPORT inject | ❌ No (it IS the report) | prefix match: `REPORT:` etc. |
124
+
125
+ **Key rule:** Only sessions with a `pendingReports[id]` entry are subject to
126
+ enforcement. User-driven work naturally doesn't populate this map.
127
+
128
+ ---
129
+
130
+ ## 6. Files to modify
131
+
132
+ | File | Change |
133
+ |---|---|
134
+ | `daemon.js` — sessionStateManager.onTransition (lines 37-57) | Replace direct auto-report with two-stage notification. Fire `TASK_IDLE_NO_REPORT`, start REPORT watch timer. |
135
+ | `daemon.js` — inject endpoint (lines 1547-1550) | Extend `pendingReports[id]` with `awaitingReport: true`, `reportWatchUntil: ts`. |
136
+ | `daemon.js` — inject endpoint (new detection) | Check incoming inject prompt for REPORT prefix + reverse-match to originating pendingReport. If matched: cancel timer, fire `TASK_COMPLETE_WITH_REPORT`. |
137
+ | `daemon.js` — state machine `dead` transition handler | Fire `TASK_DEAD_NO_REPORT` with auto-summary. |
138
+ | `daemon.js` — new helper `buildAutoSummary(session)` | Read `session.outputRing`, strip ANSI, filter blanks, take last 40 lines, max 4KB. |
139
+ | `src/mailbox/config.js` or similar config | Add `reportTimeoutSecs: 120`, `autoSummaryLines: 40`, `autoSummaryMaxBytes: 4096`. |
140
+ | `daemon.js` — legacy auto-report removal (lines 2131-2147, 2328-2346) | Retire duplicate legacy paths (or keep with deprecation flag). |
141
+ | `test/daemon.test.js` | New tests: REPORT-detected path, timeout path, dead-before-report path, no-inject-source ignored path. |
142
+
143
+ No new files. No new ports. No new process spawning.
144
+
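The `buildAutoSummary(session)` helper listed in the table can be sketched as below. The shape of `session.outputRing` is not shown in this diff, so the sketch assumes an array of string chunks; the ANSI pattern is a generic CSI/OSC matcher and may differ from the project's own `stripAnsi`.

```javascript
// Matches CSI sequences (ESC [ ... final byte) and OSC sequences
// (ESC ] ... terminated by BEL or ESC \).
const ANSI_PATTERN = /\x1b\[[0-?]*[ -\/]*[@-~]|\x1b\][^\x07\x1b]*(?:\x07|\x1b\\)/g;

function buildAutoSummary(session, { maxLines = 40, maxBytes = 4096 } = {}) {
  // Assumption: outputRing is an array of string chunks, oldest first.
  const raw = (session.outputRing || []).join('');
  const lines = raw
    .replace(ANSI_PATTERN, '')
    .split(/\r?\n/)
    .map((line) => line.trimEnd())
    .filter((line) => line.trim() !== '');

  // Keep the most recent lines, then drop the oldest until under the byte cap.
  const kept = lines.slice(-maxLines);
  while (kept.length > 1 && Buffer.byteLength(kept.join('\n'), 'utf8') > maxBytes) {
    kept.shift();
  }

  // A single oversized line is trimmed from the front until it fits.
  let out = kept.join('\n').slice(-maxBytes);
  while (Buffer.byteLength(out, 'utf8') > maxBytes) out = out.slice(1);
  return out;
}
```

Trimming from the oldest end keeps the tail of the output, which is usually where the final result or error message sits.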
145
+ ---
146
+
147
+ ## 7. Test plan
148
+
149
+ **Unit tests (test/daemon.test.js additions):**
150
+ 1. Idle after inject → emits `TASK_IDLE_NO_REPORT` bus event (NOT `TASK_COMPLETE`)
151
+ 2. REPORT-prefixed inject reply within timeout → emits `TASK_COMPLETE_WITH_REPORT` with parsed fields
152
+ 3. No REPORT within timeout → emits `TASK_TIMEOUT_NO_REPORT` with auto_summary containing last session output
153
+ 4. `STATUS: blocked` reply → immediate `TASK_BLOCKED_WITH_REASON`
154
+ 5. Session dies before report → `TASK_DEAD_NO_REPORT` with auto_summary
155
+ 6. Idle WITHOUT pendingReports entry (user-driven work) → no enforcement events
156
+ 7. `buildAutoSummary()`: strips ANSI, drops blanks, truncates to 40 lines / 4KB
157
+ 8. Legacy text-inject to source still fires (back-compat grandfathering)
158
+
159
+ **E2E tests:**
160
+ 1. Full cycle: `inject --from A B "task"` → B works → B sends `telepty inject --from B A "REPORT: ..."` → A receives REPORT → bus emits `TASK_COMPLETE_WITH_REPORT`
161
+ 2. Timeout cycle: same but B never replies → after 120s → A receives `TASK_TIMEOUT_NO_REPORT` with auto_summary
162
+
163
+ **Regression:**
164
+ - All 131 existing tests pass unchanged
165
+ - Existing `TASK_COMPLETE:` text format still emitted (grandfather)
166
+
167
+ ---
168
+
169
+ ## 8. Semver
170
+
171
+ **Minor bump → 0.2.0.**
172
+
173
+ Justification:
174
+ - New bus event types (additive, not breaking)
175
+ - New config keys (additive with defaults)
176
+ - Legacy notification text preserved (back-compat)
177
+ - No breaking API changes
178
+ - Observable new behavior that consumers may opt into
179
+
180
+ Not a patch because it introduces new observable event types.
181
+ Not major because nothing is removed or renamed.
182
+
183
+ ---
184
+
185
+ ## 9. Risks — top 3
186
+
187
+ 1. **REPORT detection false positives** — an inject back to source that
188
+ happens to start with "REPORT:" but is actually a new task request gets
189
+ miscategorized. Mitigation: REPORT detection requires BOTH prefix match
190
+ AND reverse-match to `pendingReports[senderSession]` with matching
191
+ `inject_ref`. If no pending outbound report tracked, treat as new inject.
192
+ 2. **Auto-summary leaks sensitive output** — PTY output may contain secrets
193
+ (tokens, passwords echoed). Mitigation: honor a denylist regex
194
+ (`api[_-]?key|password|token=\\S+`) before attaching; truncate aggressively.
195
+ Document that auto_summary is best-effort preview, not full transcript.
196
+ 3. **Timeout storm on orchestrator**: if many sessions time out simultaneously,
197
+ orchestrator receives a flurry of `TASK_TIMEOUT_NO_REPORT` events.
198
+ Mitigation: rate-limit timeout emissions per-orchestrator via mailbox
199
+ coalescing (existing `notifyCoalesceMs`).
200
+
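The denylist mitigation in risk 2 could look like the sketch below. The spec does not say whether a match should mask the token or drop the line, so this example assumes the more conservative choice of replacing the whole line; `redactSummary` is a hypothetical name and the case-insensitive flag is an added assumption.

```javascript
// Denylist from risk 2; the `i` flag is an assumption made for this sketch.
const SECRET_DENYLIST = /api[_-]?key|password|token=\S+/i;

// Replaces every line that matches the denylist with a placeholder before
// the summary is attached to an event.
function redactSummary(summary) {
  return summary
    .split('\n')
    .map((line) => (SECRET_DENYLIST.test(line) ? '[redacted]' : line))
    .join('\n');
}
```

Dropping the whole line loses some context but avoids leaking the part of a secret that sits outside the matched span.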
201
+ ---
202
+
203
+ ## 10. Open questions
204
+
205
+ 1. **Should `TASK_IDLE_NO_REPORT` be delivered as an inject (legacy) or ONLY
206
+ as a bus event?** Recommendation: bus event only during transition — legacy
207
+ text-inject preserved unchanged. Rich event flows via bus where consumers
208
+ can subscribe.
209
+ 2. **Cross-machine:** Does the REPORT watch timer survive tailnet peer relay?
210
+ Current `pendingReports` is in-memory on the daemon handling the inject.
211
+ If orchestrator is on a different machine, does the remote peer also track?
212
+ Recommendation: timer stays on the daemon that accepted the original
213
+ inject; remote orchestrator gets events via existing bus relay. No
214
+ cross-machine state sync needed.
215
+ 3. **Should `dismissed` be session-initiated or orchestrator-initiated?**
216
+ Proposed: session sends `STATUS: dismissed` (I decided not to do this);
217
+ orchestrator can also mark via `DELETE /api/pendingReports/{id}`
218
+ (new endpoint). Both clear the watch.
219
+ 4. **Two injects in quick succession from same orchestrator:** First inject
220
+ creates pendingReport; second inject arrives before REPORT for first.
221
+ Does second inject overwrite or queue? Recommendation: overwrite (only
222
+ latest inject expects REPORT). Log `[AUTO-REPORT] overwritten pending`
223
+ warning for observability.
224
+ 5. **reportTimeoutSecs default (120s):** Is this the right baseline? Evidence
225
+ table shows tasks ranging from 7.5s to 649s, so 120s is too short for long tasks.
226
+ Alternative: no default timer — only fire fallback when `dead` detected
227
+ or explicit orchestrator-side query. Needs orchestrator input.
228
+
229
+ ---
230
+
231
+ ## Invariants honored
232
+
233
+ - ✅ Existing idle detection unchanged (state machine onTransition fires as before)
234
+ - ✅ Orchestrator needs no code changes to benefit (bus events flow passively)
235
+ - ✅ No new process spawning / no new network ports
236
+ - ✅ Cross-machine sync via existing mailbox unchanged
237
+ - ✅ Scoped to REPORT enforcement — no inject rewrite