@dmsdc-ai/aigentry-telepty 0.1.97 → 0.3.3

@@ -0,0 +1,608 @@
1
+ ---
2
+ status: draft
3
+ date: 2026-04-26
4
+ topic: submit-gate-fixes-v2
5
+ predecessors:
6
+ - docs/superpowers/specs/2026-04-26-inject-submit-enter-reliability.md (δ Phase 1+2, telepty 0.3.0 commit 0c66d87)
7
+ fixes:
8
+ - δ-fix-2: send-key bypass gate (P0)
9
+ - δ-fix-3: gate threshold relaxation (P1)
10
+ - δ-fix-4: timeout extension + dispatch-on-timeout (P1)
11
+ constitution_rules: [Rule 17 no dependencies, Rule 26 cross-OS]
12
+ ---
13
+
14
+ # SPEC: submit-gate-fixes-v2 — δ-fix-2/3/4
15
+
16
+ **Date:** 2026-04-26
17
+ **Author:** aigentry-telepty-coder
18
+ **Status:** SPEC — Phase 1, awaiting orchestrator approval
19
+ **Track:** orchestrator UX trap — δ-fix-2/3/4 (post-δ-Phase-2 regression cluster)
20
+ **Authority:** orchestrator's root-cause analysis 2026-04-26
21
+ **Predecessor:** `docs/superpowers/specs/2026-04-26-inject-submit-enter-reliability.md` (δ Phase 1, landed as 0.3.0 commit `0c66d87`)
22
+ **Memory citations:** `feedback_telepty_send_key_regression.md`, `feedback_evidence_based_bugfix.md`, `feedback_dustcraw_evidence_required.md`, `feedback_git_explicit_paths.md`
23
+
24
+ ---
25
+
26
+ ## 0. Problem statement
27
+
28
+ After δ Phase 2 (telepty 0.3.0, `0c66d87`) shipped, three regressions were observed across 4 fresh-spawned sessions (impl-a, impl-b, v3-tester, builder-e2e) on 2026-04-26:
29
+
30
+ 1. **`telepty send-key <sid> enter` returns 504** on fresh-spawned claude/codex sessions. The CLI's send-key path (`cli.js:1741`) routes through the same gated `/submit` endpoint. Even the manual Enter override fails — the orchestrator is forced to bypass telepty entirely (`cmux send-key --workspace workspace:N enter`).
31
+ 2. **Gate confidence threshold 0.85 is too strict for fresh sessions.** `sessionStateManager` reports `state='idle'` once silence detection fires, but with `confidence=0.6` when neither OSC 133 nor a shell-prompt pattern matched (the common case for the claude TUI). `awaitReplReady` (`src/submit-gate.js:62`) then rejects, and the gate eventually times out with `reason='timeout'`.
32
+ 3. **Default `gate_timeout_ms=5000` is too short** for fresh REPLs (claude observed at 3–6 s; with stale silence detection on top, 5 s is borderline). When the gate times out, dispatch is abandoned entirely (`daemon.js:1558-1568`) — strictly worse than the pre-0.3.0 blind retry, which at least *attempted* the dispatch.
33
+
34
+ These three regressions all stem from a single decision in 0.3.0: gating became binary ("ready or 504") with strict thresholds tuned to high-confidence states (OSC 133 / shell prompt pattern), while the most common AI-CLI ready state on fresh spawn is the silence-fallback (`confidence=0.6`).
35
+
36
+ ---
37
+
38
+ ## 1. Root cause analysis
39
+
40
+ ### 1.1 Fix 1 — send-key routes through gate (P0)
41
+
42
+ **`cli.js:1741`** (verbatim):
43
+
44
+ ```js
45
+ const res = await fetchWithAuth(`http://${target.host}:${PORT}/api/sessions/${encodeURIComponent(target.id)}/submit`, { method: 'POST' });
46
+ ```
47
+
48
+ The `send-key` command was unchanged by δ Phase 2 — it still POSTs to `/submit` with an empty body. But that endpoint's behaviour was changed to gate by default (`daemon.js:1552-1568`):
49
+
50
+ ```js
51
+ // daemon.js:1554-1569 — gate runs unconditionally when TELEPTY_SUBMIT_GATE != 'off'
52
+ const gateResult = await submitGate.awaitReplReady(id, sessionStateManager, {
53
+ timeoutMs: gateTimeoutMs,
54
+ });
55
+ if (!gateResult.ready) {
56
+ return res.status(504).json({
57
+ error: 'Submit gated-timeout — target REPL never readied for input',
58
+ reason: gateResult.reason, last_state: gateResult.last_state,
59
+ strategy: 'none', attempts: 0, gated: true, gate_wait_ms: gateResult.waited_ms,
60
+ });
61
+ }
62
+ ```
63
+
64
+ There is no per-request opt-out, only a daemon-global env var (`TELEPTY_SUBMIT_GATE=off` at line `daemon.js:1511`). Thus a manual Enter — explicitly the "press this key, no questions" semantic — inherits `inject --submit`'s render-readiness gate. **This is the regression.**
65
+
66
+ The δ Phase 1 spec §4 contemplated `send-key` benefiting from the same gate (§4.2 line 322 in the predecessor spec) — that turned out to be wrong. **Manual override must remain manual.** Memory `feedback_telepty_send_key_regression.md` records the workaround in production (`cmux send-key` direct).
67
+
68
+ ### 1.2 Fix 2 — confidence threshold too strict (P1)
69
+
70
+ **`src/submit-gate.js:50-51`** (verbatim):
71
+
72
+ ```js
73
+ const timeoutMs = Number.isFinite(opts.timeoutMs) ? opts.timeoutMs : 5000;
74
+ const minConfidence = Number.isFinite(opts.minConfidence) ? opts.minConfidence : 0.85;
75
+ ```
76
+
77
+ **`session-state.js:378-380`** (verbatim — the only IDLE-confidence assignment site):
78
+
79
+ ```js
80
+ const hasOsc133 = this._lastOsc133At && (now - this._lastOsc133At) < this.config.idle_timeout_ms * 2;
81
+ const hasPrompt = this._matchesAny(lastLine, PROMPT_PATTERNS);
82
+ const confidence = hasOsc133 ? 0.95 : (hasPrompt ? 0.9 : 0.6);
83
+ ```
84
+
85
+ The state machine emits exactly three `IDLE` confidences in normal operation:
86
+
87
+ | Trigger | Confidence | Source |
88
+ |---|---|---|
89
+ | OSC 133;A or 133;B mark within last 2× idle_timeout | **0.95** | `session-state.js:380` |
90
+ | Last line matches `PROMPT_PATTERNS` (shell-style `$#%>❯›»`, python `>>>`, etc.) | **0.9** | `session-state.js:380` |
91
+ | Silence > `idle_timeout_ms` (5 s default), no prompt match | **0.6** | `session-state.js:380` |
92
+
93
+ **`session-state.js:77-83`** — `PROMPT_PATTERNS` (verbatim):
94
+
95
+ ```js
96
+ const PROMPT_PATTERNS = [
97
+ /[$#%>❯›»] *$/, // common shell prompts
98
+ />>> *$/, // python REPL
99
+ /\.\.\. *$/, // python continuation
100
+ /\(.*\) *[$#>] *$/, // virtualenv / conda prefix
101
+ /^\[.*@.*\][$#] *$/m, // [user@host]$
102
+ ];
103
+ ```
104
+
105
+ The claude TUI is a custom Ink/React renderer: its input UI is a Unicode-box with a `│ > │` interior — `>` is followed by `│`, not end-of-line, so PROMPT_PATTERNS do NOT match. The claude TUI also does NOT emit OSC 133 (verified by reading the upstream Ink source — no `\x1b]133;` in claude-code's prompt component). **Therefore fresh claude lands in the silence-fallback bucket → confidence 0.6.**
106
+
107
+ With `minConfidence=0.85` (the hard-coded default), 0.6 is rejected and the gate either waits for a transition (none comes — the session is already at peak readiness) or times out. **The state machine considers the session ready; the gate disagrees.** Mismatch.
108
+
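The mismatch can be reproduced in isolation. The sketch below copies `PROMPT_PATTERNS` and the confidence ternary from the excerpts above; `matchesAny` and `idleConfidence` are hypothetical stand-ins for `_matchesAny` and the logic at `session-state.js:378-380`, and the two sample lines are illustrative rather than captured terminal output.

```javascript
// PROMPT_PATTERNS as quoted above from session-state.js:77-83.
const PROMPT_PATTERNS = [
  /[$#%>❯›»] *$/,          // common shell prompts
  />>> *$/,                // python REPL
  /\.\.\. *$/,             // python continuation
  /\(.*\) *[$#>] *$/,      // virtualenv / conda prefix
  /^\[.*@.*\][$#] *$/m,    // [user@host]$
];

// Hypothetical stand-in for the state manager's _matchesAny helper.
function matchesAny(line, patterns) {
  return patterns.some((re) => re.test(line));
}

// Hypothetical wrapper around the confidence ternary quoted above.
function idleConfidence(lastLine, hasOsc133) {
  const hasPrompt = matchesAny(lastLine, PROMPT_PATTERNS);
  return hasOsc133 ? 0.95 : (hasPrompt ? 0.9 : 0.6);
}

const shellLine = 'user@host:~$ ';               // ends in "$ " so a prompt pattern matches
const boxedLine = '│ >                      │';  // ">" is followed by "│", not end-of-line

console.log(idleConfidence(shellLine, false));         // 0.9
console.log(idleConfidence(boxedLine, false));         // 0.6
console.log(idleConfidence(boxedLine, false) >= 0.85); // false
```

Under these assumptions the boxed input row can only reach the gate's old 0.85 threshold if an OSC 133 mark was seen recently, which the claude TUI does not emit.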
109
+ ### 1.3 Fix 3 — timeout too short + dispatch abandoned on timeout (P1)
110
+
111
+ **Two sub-problems compounded:**
112
+
113
+ (a) **Timeout default 5000 ms is below empirical claude-ready time.** Predecessor spec §1.2 (line 144) cites "freshly-spawned `claude` REPL takes 3–6 s before its input loop is ready". With `idle_timeout_ms=5000` (`session-state.js:58`), silence detection itself takes 5 s before any IDLE transition. A 5 s gate timeout is on the failure side of that distribution.
114
+
115
+ (b) **Gate timeout abandons dispatch (`daemon.js:1558-1568`):**
116
+
117
+ ```js
118
+ if (!gateResult.ready) {
119
+ return res.status(504).json({ ... attempts: 0, gated: true, ... }); // ← never calls terminalLevelSubmit
120
+ }
121
+ ```
122
+
123
+ Pre-0.3.0 (legacy blind path) always called `terminalLevelSubmit` at least once. Post-0.3.0, on gate timeout, *zero* dispatch attempts happen. This is **strictly worse than the pre-0.3.0 worst case**: previously the body might land late (silently); now it never lands and the orchestrator sees 504. The gate was supposed to be *additive* (verify ready, *then* dispatch); on timeout it became *subtractive* (skip dispatch entirely).
124
+
125
+ The verify step (§5.4 of predecessor, `src/submit-gate.js:125-162`) already returns optimistic `consumed:true` when the body was never visible (`reason: 'never_visible'`) — it would handle the "we dispatched blind, body got consumed" case. We just never get there.
126
+
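The timing argument in (a) can be checked with a little arithmetic. The sketch below is a simplified model, not the daemon's scheduler: it assumes silence-based IDLE fires exactly `idleTimeoutMs` after the last PTY output and that the gate gives up `gateTimeoutMs` after the submit request arrives. It ignores the confidence threshold from §1.2. `gateOutcome` and the spawn timings are hypothetical.

```javascript
// Simplified timing model (assumption): IDLE fires idleTimeoutMs after the
// last output; the gate deadline is gateTimeoutMs after the submit arrives.
function gateOutcome({ lastOutputAtMs, submitAtMs, idleTimeoutMs, gateTimeoutMs }) {
  const idleAtMs = lastOutputAtMs + idleTimeoutMs;
  const deadlineMs = submitAtMs + gateTimeoutMs;
  return { idleAtMs, deadlineMs, timesOut: idleAtMs > deadlineMs };
}

// Hypothetical fresh spawn: banner finishes rendering at 1 s, while the
// orchestrator's submit request already arrived at 0.5 s.
const spawn = { lastOutputAtMs: 1000, submitAtMs: 500, idleTimeoutMs: 5000 };

const before = gateOutcome({ ...spawn, gateTimeoutMs: 5000 });  // 0.3.0 default
const after = gateOutcome({ ...spawn, gateTimeoutMs: 10000 });  // proposed default

console.log(before.idleAtMs, before.deadlineMs, before.timesOut); // 6000 5500 true
console.log(after.idleAtMs, after.deadlineMs, after.timesOut);    // 6000 10500 false
```

In this model any output that lands after the submit request pushes the earliest IDLE past a 5 s deadline, which is why a gate timeout equal to `idle_timeout_ms` sits on the failure side for spawn-then-inject callers.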
127
+ ### 1.4 Why this is most reproducible on fresh sessions
128
+
129
+ - Long-running claude sessions: have already had OSC-133 events (if any plugin ever fired one) cached, OR have produced enough output to push prior shell prompts into recent lines, OR have completed at least one full cycle that bumped confidence. Less common to hit 0.6 fallback.
130
+ - Fresh sessions: no event history. Spawn → render banner (~1 s) → trust dialog → blank input box. The first IDLE transition on a 5 s silence after spawn produces 0.6 confidence. Gate rejects. The orchestrator's "spawn-then-immediately-inject" pattern (used for parallel session fan-out) maximally exposes the trap — which exactly matches the production observations.
131
+
132
+ ### 1.5 Existing primitives (no new deps — Rule 17)
133
+
134
+ - `src/submit-gate.js` is already a self-contained module with `awaitReplReady` and `verifyBodyConsumed`. All three fixes can be implemented by adding parameters to its existing API surface and one early-exit branch in the `/submit` endpoint.
135
+ - `terminalLevelSubmit` (`daemon.js:636-644`) is unchanged — it remains the single dispatch primitive.
136
+ - HTTP body parameters (`pre_delay_ms`, `retries`, `injected_body`, etc.) already follow a pass-through pattern with `req.body?.<field>` clamping; adding `force` and `min_confidence` follows the same convention.
137
+
138
+ ---
139
+
140
+ ## 2. Decision matrix per fix
141
+
142
+ ### 2.1 Fix 1 — send-key bypass gate
143
+
144
+ | Approach | What it does | API impact | LOC | Cross-OS | Backwards compat | Verdict |
145
+ |---|---|---|---|---|---|---|
146
+ | **A. `force` body param on `/submit`** | `POST /submit { force: true }` skips gate + verify, dispatches once via `terminalLevelSubmit`, returns `{ success, strategy, attempts:1, gated:false }`. CLI's `send-key` command always sets `force:true`. | Additive body field; response shape unchanged when `force:false`. | ~15 daemon + ~5 CLI | ✅ same path | ✅ default opt-in gate for inject; opt-out per-request for send-key | ✅ **Recommended.** |
147
+ | B. New `POST /api/sessions/:id/key` endpoint | Dedicated never-gated endpoint. CLI's `send-key` POSTs `/key`. `/submit` retains gate semantics. | New endpoint; new test surface; bus event duplication. | ~40 daemon + ~10 CLI + ~30 test | ✅ same path | ✅ but doubles routes | ⚠️ Cleaner contract but heavier. |
148
+
149
+ **Recommendation: A.** Rationale tied to constitution:
150
+
151
+ - **Rule 1 (lightweight) / KISS**: One endpoint, one contract. A new endpoint duplicates routing, body parsing, bus emission, and tests.
152
+ - **Rule 17 (no dependencies)**: No new dependency surface. `force` is a body field, so it uses the same JSON serializer and the same bus event format.
153
+ - **Symmetry with the existing `TELEPTY_SUBMIT_GATE=off` env var**: that flag is the daemon-wide opt-out at `daemon.js:1511`. `force` is the per-request opt-out — same code path, narrower scope.
154
+ - **Backwards compatibility**: every existing caller omits `force` → unchanged behaviour. Only `cli.js:1741` needs to pass `{ force:true }` when the user types `send-key`.
155
+
156
+ The trade-off is conceptual purity (B's argument: send-key and submit are semantically different operations) versus operational simplicity (A's argument: same dispatch primitive, one switch). Given this codebase's existing pattern of body-field gating (e.g. `injected_body`, `gate_timeout_ms`), A is more idiomatic.
157
+
158
+ ### 2.2 Fix 2 — gate confidence threshold
159
+
160
+ | Approach | What it does | Tuning vs evidence | Risk of false-ready | Verdict |
161
+ |---|---|---|---|---|
162
+ | (i) **Lower default `minConfidence` to 0.5** | Allows the silence-fallback IDLE (conf=0.6) through. Per-request `min_confidence` body param remains for callers wanting tighter gating. | Matches `session-state.js:380` lowest legitimate IDLE confidence; 0.5 is comfortably below 0.6 with margin. | Very low — IDLE is only entered after `idle_timeout_ms` (5 s) silence. Low-confidence IDLE is the *dominant* ready state for AI-CLI TUIs (no OSC 133, no shell prompt). | ✅ **Recommended.** |
163
+ | (ii) Allow `confidence === undefined` to pass | Loophole tactic; current state machine never emits undefined confidence (the `_transition` constructor always assigns one). Adopts a contract that doesn't actually exist today and creates fragility if state shape changes. | No empirical basis. | Higher — future state shape changes could leak through. | ❌ |
164
+ | (iii) Per-CLI threshold table | Map `claude→0.5`, `codex→0.7`, `gemini→0.7`, default 0.85. | Over-tuned; current state machine emits the same {0.95, 0.9, 0.6} regardless of CLI. The CLI-specific axis is *prompt-pattern matchability*, not confidence semantics. | Low but with maintenance cost as new CLIs appear. | ⚠️ Premature optimization (YAGNI). |
165
+
166
+ **Recommendation: (i) — single default `minConfidence = 0.5`**, plus per-request override `min_confidence` (already accepted but un-clamped — clamp `[0, 1]`).
167
+
168
+ Why 0.5 specifically (not 0.6 or 0.7)?
169
+ - 0.5 sits below the lowest legitimate IDLE conf (0.6) with explicit margin.
170
+ - 0.7 would still admit shell-prompt and OSC 133 only — same regression.
171
+ - 0.6 (exact match) is fragile: if `session-state.js:380` ever drops the silence-fallback to 0.55 (e.g. when stale), 0.6 threshold breaks silently. 0.5 is the conservative "below all legitimate IDLEs" boundary.
172
+ - The state machine's `WAITING` state is set to 0.9 (`session-state.js:316`), so 0.5 also admits all WAITING.
173
+ - 0.5 is well above the "no signal" floor — the state machine never emits an IDLE/WAITING below 0.6 today.
174
+
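The threshold comparison above can be tabulated directly. This is a minimal sketch: `IDLE_CONFIDENCES` restates the three values from §1.2, and `admittedTriggers` is a hypothetical helper that applies the inclusive boundary this spec attributes to `isReady` (reject only when `confidence < minConfidence`).

```javascript
// The three IDLE confidences listed in §1.2.
const IDLE_CONFIDENCES = { osc133: 0.95, promptMatch: 0.9, silenceFallback: 0.6 };

// Hypothetical helper: which IDLE triggers pass a given threshold.
// A trigger is rejected only when its confidence is strictly below the threshold.
function admittedTriggers(minConfidence) {
  return Object.entries(IDLE_CONFIDENCES)
    .filter(([, confidence]) => !(confidence < minConfidence))
    .map(([trigger]) => trigger);
}

for (const threshold of [0.85, 0.7, 0.6, 0.5]) {
  console.log(threshold, admittedTriggers(threshold).join(', '));
}
// 0.85 osc133, promptMatch
// 0.7 osc133, promptMatch
// 0.6 osc133, promptMatch, silenceFallback
// 0.5 osc133, promptMatch, silenceFallback
```

0.6 and 0.5 admit the same set today; the difference is only the margin if the silence-fallback value is ever lowered.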
175
+ **Test impact:** the existing test at `test/submit-gate.test.js:185-193` ("rejects ready transition with low confidence and falls through to timeout") uses `minConfidence: 0.85` + `confidence: 0.6` and expects timeout. It tests **the threshold mechanism**, not the specific value. Update the test to pass `minConfidence: 0.7` and simulate a transition to `confidence: 0.5`: same assertion, different numbers.
176
+
177
+ ### 2.3 Fix 3 — timeout extension + dispatch-on-timeout
178
+
179
+ Two concerns; recommended jointly because the right fix for (b) reduces the operational sting of (a).
180
+
181
+ **(a) Default `gate_timeout_ms`:**
182
+
183
+ | Option | Rationale | Verdict |
183
+ |---|---|---|
184
+ | Keep 5000 | Regression untouched | ❌ |
185
+ | Raise to 10000 | Matches claude 3–6 s ready window with margin | ✅ |
186
+ | Per-CLI (claude=10000, codex=8000, gemini=8000) | Tighter on faster CLIs, but YAGNI: the gate short-circuits when ready, so a 10 s ceiling pays nothing on warm sessions and 2 extra seconds on cold codex/gemini if they ever miss the 8 s mark | ⚠️ |
188
+
189
+ **Recommendation: 10000 ms uniform default.** Per-CLI is over-engineered until evidence (E2E §4.3 below) shows codex/gemini routinely paying the extra 2 s.
190
+
191
+ **(b) Dispatch-on-timeout (best-effort):**
192
+
193
+ | Option | Behaviour on timeout | Distinguishability of failure | Verdict |
194
+ |---|---|---|---|
195
+ | **D₁. Dispatch + verify (recommended)** | Call `terminalLevelSubmit` once, then `verifyBodyConsumed` (timeout 2 s, polling outputRing). Four terminal states: <br/>- ready=true, dispatch ok → 200 (normal path) <br/>- ready=false (gate timeout), dispatched, `consumed=true` → 200 with `gated_dispatch_after_timeout: true` flag <br/>- ready=false, dispatched, `consumed=false` → 504 honest fail (`reason: 'gated_dispatch_unconsumed'`) <br/>- ready=false, no `injected_body` (e.g. `inject --submit ""` empty body / send-key edge) → 200 with flag (no way to verify; trust dispatch) | High: the flag distinguishes "gate failed but body landed" from "ready and consumed". | ✅ **Recommended.** |
196
+ | D₂. Dispatch blind on timeout (no verify) | Single `terminalLevelSubmit` on timeout, return 200. | Low — collapses two outcomes into one; loses honesty signal. | ❌ regression of 0.3.0's main goal. |
197
+ | D₃. Keep 504-on-timeout (strict) | 0.3.0 behaviour. | High — but unhelpfully so; 504 means "we didn't even try" today. | ❌ status quo. |
198
+
199
+ **Recommendation: D₁, dispatch + verify on timeout.** This restores the dispatch attempt that pre-0.3.0 would have made, while keeping the new honesty signal: after a gate timeout, 504 fires only when verification confirms the body is still in the input box. It strictly Pareto-dominates D₂ and D₃.
200
+
201
+ The new response field `gated_dispatch_after_timeout: true` is additive — clients that don't handle it see a normal 200 OK. Clients that special-cased 504 to retry will see fewer 504s, which is a relaxation, not a tightening.
202
+
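The D₁ outcomes can be summarised as a pure decision function. This is an illustrative sketch, not the daemon code: `classifySubmit` is a hypothetical name, its input fields mirror `gateResult.ready`, `gateResult.reason`, the presence of `injected_body`, and `verify.consumed`, and the hard-fail branch follows the implementation plan in §3.3. The 503 "all strategies failed" path is omitted for brevity.

```javascript
// Hypothetical summary of the D₁ decision table. Assumes dispatch itself
// succeeds (the 503 path is left out).
function classifySubmit({ ready, reason, hasBody, consumed }) {
  // Non-timeout gate failures (dead/error/etc.) never dispatch.
  if (!ready && reason && reason !== 'timeout') {
    return { status: 504, dispatched: false };
  }
  const late = !ready; // gate timed out, dispatch anyway
  // Honest failure: we dispatched after a timeout and the body is still visible.
  if (late && hasBody && !consumed) {
    return { status: 504, dispatched: true, reason: 'gated_dispatch_unconsumed' };
  }
  return late
    ? { status: 200, dispatched: true, gated_dispatch_after_timeout: true }
    : { status: 200, dispatched: true };
}

console.log(classifySubmit({ ready: true, hasBody: true, consumed: true }));
console.log(classifySubmit({ ready: false, reason: 'timeout', hasBody: true, consumed: true }));
console.log(classifySubmit({ ready: false, reason: 'timeout', hasBody: true, consumed: false }));
console.log(classifySubmit({ ready: false, reason: 'timeout', hasBody: false }));
```

Writing the table this way makes the additive nature of the flag visible: the only new 200 shape is the one carrying `gated_dispatch_after_timeout`.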
203
+ ---
204
+
205
+ ## 3. Implementation plan per fix
206
+
207
+ All three fixes ship in **one commit** (telepty 0.3.1). Rationale: they share the same endpoint and helper module; splitting would produce three commits each touching daemon.js:1497-1624 with merge conflicts. Memory `feedback_git_explicit_paths.md` applies — stage with explicit paths only:
208
+
209
+ ```bash
210
+ git add src/submit-gate.js daemon.js cli.js test/submit-gate.test.js test/daemon.test.js docs/superpowers/specs/2026-04-26-submit-gate-fixes-v2.md package.json CHANGELOG.md
211
+ ```
212
+
213
+ ### 3.1 Fix 1 (send-key bypass)
214
+
215
+ **File: `daemon.js:1497-1550` (POST /submit, near top after body parsing).**
216
+
217
+ Insert immediately after `gateOff` block:
218
+
219
+ ```js
220
+ // Per-request bypass: { force: true } skips gate + verify, single dispatch.
221
+ // Used by `telepty send-key` (manual override) and any caller explicitly
222
+ // opting out of render-readiness gating.
223
+ const force = req.body?.force === true;
224
+ if (force) {
225
+ const strategy = terminalLevelSubmit(id, session);
226
+ if (strategy) {
227
+ emitSubmitBus({ strategy, attempts: 1, gated: false, forced: true });
228
+ return res.json({ success: true, strategy, attempts: 1, gated: false, forced: true });
229
+ }
230
+ return res.status(503).json({
231
+ error: 'Submit failed via all strategies (kitty/cmux/pty)',
232
+ strategy: 'none', attempts: 0, gated: false, forced: true,
233
+ });
234
+ }
235
+ ```
236
+
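For unit testing, the force branch could be expressed as a pure function. The sketch below is one possible shape under stated assumptions: `handleForcedSubmit` is a hypothetical name, `dispatch` stands in for `terminalLevelSubmit(id, session)`, and a `null` return means "not forced, continue to the gate". Bus emission is left out.

```javascript
// Hypothetical extraction of the force branch. `dispatch` is a stand-in for
// terminalLevelSubmit and returns a strategy name or null.
function handleForcedSubmit(body, dispatch) {
  // Strict comparison: only the boolean true bypasses the gate.
  if (body?.force !== true) return null;
  const strategy = dispatch();
  if (strategy) {
    return {
      status: 200,
      json: { success: true, strategy, attempts: 1, gated: false, forced: true },
    };
  }
  return {
    status: 503,
    json: {
      error: 'Submit failed via all strategies (kitty/cmux/pty)',
      strategy: 'none', attempts: 0, gated: false, forced: true,
    },
  };
}

console.log(handleForcedSubmit({ force: true }, () => 'pty_cr').status);  // 200
console.log(handleForcedSubmit({ force: true }, () => null).status);      // 503
console.log(handleForcedSubmit({ force: 'true' }, () => 'pty_cr'));       // null
```

The strict `=== true` check means a string `'true'` or a missing body still goes through the gate, which keeps every existing caller on the gated path.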
237
+ **File: `cli.js:1741` (send-key command).**
238
+
239
+ Change:
240
+
241
+ ```js
242
+ const res = await fetchWithAuth(`http://${target.host}:${PORT}/api/sessions/${encodeURIComponent(target.id)}/submit`, { method: 'POST' });
243
+ ```
244
+
245
+ To:
246
+
247
+ ```js
248
+ const res = await fetchWithAuth(`http://${target.host}:${PORT}/api/sessions/${encodeURIComponent(target.id)}/submit`, {
249
+ method: 'POST',
250
+ headers: { 'Content-Type': 'application/json' },
251
+ body: JSON.stringify({ force: true }),
252
+ });
253
+ ```
254
+
255
+ ### 3.2 Fix 2 (threshold relaxation)
256
+
257
+ **File: `src/submit-gate.js:51`.**
258
+
259
+ Change:
260
+
261
+ ```js
262
+ const minConfidence = Number.isFinite(opts.minConfidence) ? opts.minConfidence : 0.85;
263
+ ```
264
+
265
+ To:
266
+
267
+ ```js
268
+ const minConfidence = Number.isFinite(opts.minConfidence) ? opts.minConfidence : 0.5;
269
+ ```
270
+
271
+ **File: `daemon.js:1507-1508` (POST /submit body parsing).**
272
+
273
+ Add per-request override (clamped) immediately after `verifyTimeoutMs`:
274
+
275
+ ```js
276
+ const minConfidence = req.body?.min_confidence != null
277
+ ? Math.min(Math.max(Number(req.body.min_confidence), 0), 1)
278
+ : undefined; // undefined → src/submit-gate.js default
279
+ ```
280
+
281
+ Then in the `awaitReplReady` call (`daemon.js:1555-1557`), pass it through:
282
+
283
+ ```js
284
+ const gateResult = await submitGate.awaitReplReady(id, sessionStateManager, {
285
+ timeoutMs: gateTimeoutMs,
286
+ ...(minConfidence !== undefined ? { minConfidence } : {}),
287
+ });
288
+ ```
289
+
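The clamp and the conditional spread behave as follows on edge inputs. `parseMinConfidence` and `gateOptions` are hypothetical wrappers around the two expressions shown above, written only to make the behaviour checkable.

```javascript
// Hypothetical wrapper around the clamp expression from daemon.js above.
function parseMinConfidence(body) {
  return body?.min_confidence != null
    ? Math.min(Math.max(Number(body.min_confidence), 0), 1)
    : undefined; // undefined means "use the submit-gate default"
}

// Hypothetical helper: build the options object handed to awaitReplReady.
function gateOptions(body, gateTimeoutMs) {
  const minConfidence = parseMinConfidence(body);
  return {
    timeoutMs: gateTimeoutMs,
    ...(minConfidence !== undefined ? { minConfidence } : {}),
  };
}

console.log(parseMinConfidence({ min_confidence: -1 }));    // 0
console.log(parseMinConfidence({ min_confidence: 2 }));     // 1
console.log(parseMinConfidence({}));                        // undefined
console.log(parseMinConfidence({ min_confidence: 'abc' })); // NaN
console.log(gateOptions({}, 10000));                        // { timeoutMs: 10000 }
```

A non-numeric value yields `NaN` rather than an error. Because the `Number.isFinite(opts.minConfidence)` check quoted from `src/submit-gate.js` rejects `NaN`, such a request would fall back to the module default instead of disabling the gate.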
290
+ **File: `test/submit-gate.test.js:185-193` (existing test — UPDATE).**
291
+
292
+ Change literals to keep the same semantic ("below threshold rejects") but with values that don't conflate with the new default:
293
+
294
+ ```js
295
+ test('awaitReplReady rejects ready transition with below-threshold confidence and falls through to timeout', async () => {
296
+ const sm = makeStateManager({ s1: { state: 'working', confidence: 0.9 } });
297
+ const promise = awaitReplReady('s1', sm, { timeoutMs: 80, minConfidence: 0.7 });
298
+ // idle but with confidence below the explicit threshold — should NOT settle.
299
+ setImmediate(() => sm.setState('s1', 'idle', 0.5));
300
+ const result = await promise;
301
+ assert.equal(result.ready, false);
302
+ assert.equal(result.reason, 'timeout');
303
+ });
304
+ ```
305
+
306
+ ### 3.3 Fix 3 (timeout + dispatch-on-timeout)
307
+
308
+ **File: `daemon.js:1507` (POST /submit body parsing).**
309
+
310
+ Change:
311
+
312
+ ```js
313
+ const gateTimeoutMs = Math.min(Math.max(Number(req.body?.gate_timeout_ms) || 5000, 500), 15000);
314
+ ```
315
+
316
+ To:
317
+
318
+ ```js
319
+ const gateTimeoutMs = Math.min(Math.max(Number(req.body?.gate_timeout_ms) || 10000, 500), 30000);
320
+ ```
321
+
322
+ (Upper clamp raised to 30 s for the rare extreme-cold case; default 10 s is the operative change.)
323
+
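The new expression can be exercised on a few inputs to confirm the default and both clamp bounds. `resolveGateTimeoutMs` is a hypothetical wrapper around the one-liner above.

```javascript
// Hypothetical wrapper around the proposed gateTimeoutMs expression.
function resolveGateTimeoutMs(body) {
  return Math.min(Math.max(Number(body?.gate_timeout_ms) || 10000, 500), 30000);
}

console.log(resolveGateTimeoutMs({}));                          // 10000 (default)
console.log(resolveGateTimeoutMs({ gate_timeout_ms: 0 }));      // 10000 (0 is falsy, so default)
console.log(resolveGateTimeoutMs({ gate_timeout_ms: 100 }));    // 500   (lower clamp)
console.log(resolveGateTimeoutMs({ gate_timeout_ms: 60000 }));  // 30000 (upper clamp)
console.log(resolveGateTimeoutMs({ gate_timeout_ms: '8000' })); // 8000  (numeric string)
```

One consequence worth noting: if the clamp is applied as written, a requested `gate_timeout_ms` of 100 or 50 (the values used in §4.3 items 15 and 18) is raised to the 500 ms floor, so those tests would observe a `gate_wait_ms` near 500 rather than near the requested value.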
324
+ **File: `daemon.js:1554-1606` (gate + dispatch + verify block — REWRITE).**
325
+
326
+ Replace:
327
+
328
+ ```js
329
+ // Step 1: wait for REPL readiness via session state machine.
330
+ const gateResult = await submitGate.awaitReplReady(...);
331
+ if (!gateResult.ready) {
332
+ return res.status(504).json({ ... });
333
+ }
334
+
335
+ // Step 2: dispatch Enter via existing kitty → cmux → PTY chain.
336
+ let strategy = terminalLevelSubmit(id, session);
337
+ let attempts = strategy ? 1 : 0;
338
+ if (!strategy) { return res.status(503).json({...}); }
339
+
340
+ // Step 3: verify body consumption (only when the caller provided the body).
341
+ let verify = null;
342
+ if (injectedBody && injectedBody.length > 0) { ... }
343
+ ```
344
+
345
+ With:
346
+
347
+ ```js
348
+ // Step 1: wait for REPL readiness (best-effort — proceed on timeout).
349
+ const gateResult = await submitGate.awaitReplReady(id, sessionStateManager, {
350
+ timeoutMs: gateTimeoutMs,
351
+ ...(minConfidence !== undefined ? { minConfidence } : {}),
352
+ });
353
+ const gatedDispatchAfterTimeout = !gateResult.ready;
354
+ if (gatedDispatchAfterTimeout) {
355
+ // Distinguish unrecoverable session states (dead/error/restarting/no_state) —
356
+ // those still produce 504 (no point dispatching to a dead PTY).
357
+ if (gateResult.reason && gateResult.reason !== 'timeout') {
358
+ console.log(`[SUBMIT] gate hard-fail ${id}: ${gateResult.reason} (last_state=${gateResult.last_state})`);
359
+ return res.status(504).json({
360
+ error: 'Submit gated-timeout — target REPL not in a dispatchable state',
361
+ reason: gateResult.reason, last_state: gateResult.last_state,
362
+ strategy: 'none', attempts: 0, gated: true, gate_wait_ms: gateResult.waited_ms,
363
+ });
364
+ }
365
+ console.log(`[SUBMIT] gate timeout ${id}: dispatching anyway (last_state=${gateResult.last_state})`);
366
+ }
367
+
368
+ // Step 2: dispatch Enter (always attempts at least once unless hard-fail above).
369
+ let strategy = terminalLevelSubmit(id, session);
370
+ let attempts = strategy ? 1 : 0;
371
+ if (!strategy) {
372
+ return res.status(503).json({
373
+ error: 'Submit failed via all strategies (kitty/cmux/pty)',
374
+ strategy: 'none', attempts: 0, gated: true, gate_wait_ms: gateResult.waited_ms,
375
+ });
376
+ }
377
+
378
+ // Step 3: verify body consumption.
379
+ let verify = null;
380
+ if (injectedBody && injectedBody.length > 0) {
381
+ verify = await submitGate.verifyBodyConsumed(session, injectedBody, {
382
+ timeoutMs: verifyTimeoutMs, stripAnsi: stripAnsiState,
383
+ });
384
+ if (!verify.consumed) {
385
+ await new Promise(resolve => setTimeout(resolve, retryDelayMs));
386
+ const retryStrategy = terminalLevelSubmit(id, session);
387
+ if (retryStrategy) {
388
+ strategy = retryStrategy;
389
+ attempts++;
390
+ verify = await submitGate.verifyBodyConsumed(session, injectedBody, {
391
+ timeoutMs: verifyTimeoutMs, stripAnsi: stripAnsiState,
392
+ });
393
+ }
394
+ }
395
+ // If gate timed out AND verify still says still_visible → honest 504.
396
+ if (gatedDispatchAfterTimeout && !verify.consumed) {
397
+ emitSubmitBus({ strategy, attempts, gated: true, gate_wait_ms: gateResult.waited_ms, verify, gated_dispatch_after_timeout: true });
398
+ return res.status(504).json({
399
+ error: 'Submit gated-timeout and body not consumed after best-effort dispatch',
400
+ reason: 'gated_dispatch_unconsumed',
401
+ last_state: gateResult.last_state,
402
+ strategy, attempts, gated: true, gate_wait_ms: gateResult.waited_ms, verify,
403
+ });
404
+ }
405
+ }
406
+
407
+ const responseBody = {
408
+ success: true, strategy, attempts,
409
+ gated: true, gate_wait_ms: gateResult.waited_ms, verify,
410
+ ...(gatedDispatchAfterTimeout ? { gated_dispatch_after_timeout: true } : {}),
411
+ };
412
+ emitSubmitBus(responseBody);
413
+ return res.json(responseBody);
414
+ ```
415
+
416
+ **File: `cli.js:1665-1671` (inject --submit response handling).**
417
+
418
+ Surface the new flag in the success path:
419
+
420
+ ```js
421
+ if (submitRes.ok) {
422
+ const gateNote = submitData.gated && submitData.gate_wait_ms > 0
423
+ ? ` [gate ${submitData.gate_wait_ms}ms]`
424
+ : '';
425
+ const lateNote = submitData.gated_dispatch_after_timeout
426
+ ? ' (dispatched-after-gate-timeout)'
427
+ : '';
428
+ const attemptsNote = submitData.attempts > 1 ? ` (${submitData.attempts} attempts)` : '';
429
+ console.log(`✅ Submitted via ${submitData.strategy}${attemptsNote}${gateNote}${lateNote}.`);
430
+ }
431
+ ```
432
+
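To make the message format testable without a daemon, the same logic can be pulled into a pure function. `formatSubmitSuccess` is a hypothetical name, and the sample `submitData` objects are illustrative values, not recorded responses.

```javascript
// Hypothetical pure-function version of the success-path formatting above.
function formatSubmitSuccess(submitData) {
  const gateNote = submitData.gated && submitData.gate_wait_ms > 0
    ? ` [gate ${submitData.gate_wait_ms}ms]`
    : '';
  const lateNote = submitData.gated_dispatch_after_timeout
    ? ' (dispatched-after-gate-timeout)'
    : '';
  const attemptsNote = submitData.attempts > 1 ? ` (${submitData.attempts} attempts)` : '';
  return `✅ Submitted via ${submitData.strategy}${attemptsNote}${gateNote}${lateNote}.`;
}

// Gate timed out, dispatch landed on the second attempt.
console.log(formatSubmitSuccess({
  strategy: 'pty_cr', attempts: 2, gated: true, gate_wait_ms: 10012,
  gated_dispatch_after_timeout: true,
}));
// ✅ Submitted via pty_cr (2 attempts) [gate 10012ms] (dispatched-after-gate-timeout).

// Forced send-key response: no gate, single attempt.
console.log(formatSubmitSuccess({ strategy: 'pty_cr', attempts: 1, gated: false, forced: true }));
// ✅ Submitted via pty_cr.
```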
433
+ ---
434
+
435
+ ## 4. Test plan per fix
436
+
437
+ All new tests use `node:test` (existing harness). No new dev deps. Memory `feedback_evidence_based_bugfix.md` applies — every assertion is grounded in code we can cite.
438
+
439
+ ### 4.1 Fix 1 — send-key bypass (unit + daemon integration)
440
+
441
+ In `test/submit-gate.test.js` — Fix 1 needs no submit-gate-module changes (it short-circuits before the gate is invoked). Cover at the daemon level only.
442
+
443
+ In `test/daemon.test.js` (or new `test/submit-force.test.js`):
444
+
445
+ 1. `POST /api/sessions/:id/submit { force: true }` on a session in `state='starting'` (would normally fail gate) → response `{ success:true, strategy:'pty_cr', attempts:1, gated:false, forced:true }`, HTTP 200.
446
+ 2. Same call on a missing session → 404 (existing behaviour, regression check).
447
+ 3. Same call when all dispatch strategies fail (mock `terminalLevelSubmit` returning null) → 503 with `forced:true`.
448
+ 4. **Regression**: `POST /submit { }` (no `force`) on a `state='starting'` fresh session — gate path still runs; with new defaults this resolves either via Fix 3 dispatch-on-timeout (200) or via 504 if hard-fail. Captured in §4.3.
449
+ 5. **Smoke**: `telepty send-key <id> enter` against a fresh `claude` session in the live E2E — succeeds without manual workaround.
450
+
451
+ ### 4.2 Fix 2 — threshold relaxation (unit)
452
+
453
+ In `test/submit-gate.test.js`:
454
+
455
+ 6. `awaitReplReady` with default opts (no `minConfidence` passed), session `{state:'idle', confidence: 0.6}` → resolves immediately `ready:true`. **This is the pre-fix failure case.**
456
+ 7. With explicit `minConfidence: 0.85`, session `{state:'idle', confidence: 0.6}` → respects override, falls through to timeout. (Verifies the override still works.)
457
+ 8. With `minConfidence: 0.5` (new default), session `{state:'idle', confidence: 0.5}` → ready (boundary inclusive, since `isReady` checks `< minConfidence` strictly).
458
+ 9. With `minConfidence: 0.5`, session `{state:'idle', confidence: 0.49}` → not ready, falls through to timeout.
459
+ 10. **Update existing test** at submit-gate.test.js:185-193 per §3.2 above.
460
+
461
+ In `test/daemon.test.js`:
462
+
463
+ 11. `POST /submit { min_confidence: 0.95 }` on a session that just hit IDLE with conf=0.9 (prompt-match) → 504 (per-request stricter override). Validates clamp + pass-through.
464
+ 12. `POST /submit { min_confidence: -1 }` (invalid) → clamped to 0, no error. Validates clamp.
465
+ 13. `POST /submit { min_confidence: 2 }` (invalid) → clamped to 1, gate effectively never passes for non-1.0 confidence — expect 504. Validates clamp.
466
+
467
+ ### 4.3 Fix 3 — timeout extension + dispatch-on-timeout
468
+
469
+ In `test/submit-gate.test.js` — `awaitReplReady` defaults are tested at the submit-gate module layer; daemon integration covers the dispatch-on-timeout branch.
470
+
471
+ In `test/daemon.test.js`:
472
+
473
+ 14. **Pre-existing regression check**: `POST /submit { injected_body: 'X' }` on a session that goes IDLE (conf=0.95) within 200 ms → 200, `attempts:1`, `gate_wait_ms <= 250`, no `gated_dispatch_after_timeout`. (Warm-session happy path unchanged.)
474
+ 15. **Dispatch-on-timeout success**: `POST /submit { injected_body: 'X', gate_timeout_ms: 100 }` against a session that never reaches IDLE; mock outputRing such that `verifyBodyConsumed` returns `consumed:true` (body never visible OR cleared) → 200, `gated_dispatch_after_timeout:true`, `attempts >= 1`.
475
+ 16. **Dispatch-on-timeout honest fail**: same setup, but outputRing keeps body visible past `verify_timeout_ms` → 504 with `reason:'gated_dispatch_unconsumed'`.
476
+ 17. **Hard-fail short-circuit**: session in `state='dead'` → `awaitReplReady` returns `reason:'session_dead'` (not `'timeout'`) → 504 immediately, no dispatch attempted (the hard-fail branch in §3.3).
477
+ 18. **Bare Enter (no injected_body) on timeout**: `POST /submit { force: false, gate_timeout_ms: 50 }` empty body → dispatches anyway, returns 200 with `gated_dispatch_after_timeout:true` and `verify: null` (the §3.3 response body always includes the field; verification is skipped without a body).
478
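+ 19. **Default timeout verification**: `POST /submit { }` on a warm session should report `gate_wait_ms < 200`. The 10000 ms default is confirmed from the response of a gate-timeout case, which should report `gate_wait_ms ≈ 10000` (with Fix 3 that case returns 200 with `gated_dispatch_after_timeout:true`, or 504 `gated_dispatch_unconsumed`, rather than the 0.3.0 bare 504).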
479
+
480
+ ### 4.4 E2E reliability harness (test/e2e-submit.manual.js — extend)
481
+
482
+ Already-opt-in (`TELEPTY_E2E=1`). Extend the harness from δ Phase 1 spec §4.3:
483
+
484
+ 20. **100× spawn-and-inject on fresh `claude`** — same harness; pass criterion ≥99/100 maintained or improved.
485
+ 21. **100× send-key on fresh `claude`** (NEW): `telepty allow --id e2e-claude-NN claude` then immediately `telepty send-key e2e-claude-NN enter`. Pass ≥99/100. **Currently 0/100 by orchestrator's evidence.**
486
+
487
+ ### 4.5 Regression coverage
488
+
489
+ 22. All 23 existing `test/submit-gate.test.js` tests pass — except test at line 185 (semantically preserved, literals updated per §3.2).
490
+ 23. All daemon tests pass unchanged.
491
+ 24. `inject --ref` (without `--submit`) — unchanged daemon path (`deliverInjectionToSession`), unchanged tests.
492
+ 25. `TELEPTY_SUBMIT_GATE=off` legacy escape hatch — preserved verbatim.
493
+ 26. Aterm sessions — `terminalLevelSubmit` already short-circuits via `session.type === 'aterm'` guards; aterm path is skipped (existing test `test/daemon.test.js:135` covers).
494
+
495
+ ---
496
+
497
+ ## 5. Failed approaches (must NOT propose)
498
+
499
+ | Anti-approach | Why rejected |
500
+ |---|---|
501
+ | Set `TELEPTY_SUBMIT_GATE=off` as default | Defeats the purpose of the gate; reverts to 0.2.x's open-loop blind retry. Memory `feedback_telepty_send_key_regression.md` lines 21-22 — the env var is a parity-test escape hatch only. |
502
+ | Remove the gate entirely from `/submit` | Regression of 0.3.0; predecessor spec §4.3 reliability target (≥99% on warm sessions) would rely on dispatch-on-timeout alone — weaker than gate-then-dispatch. |
503
+ | Add a new external dependency (e.g. `osc-detect`, `tty-cursor`) | Violates Rule 17 (무의존, zero-dependency). All required primitives exist in `src/submit-gate.js` + `session-state.js` + `daemon.js`. |
504
+ | Lower `idle_timeout_ms` from 5000 to e.g. 1500 in `session-state.js` | Out of scope — that is a state-machine tuning, not a gate fix. Would over-fire IDLE on long-running working sessions. Memory `feedback_evidence_based_bugfix.md` — no evidence supports this change. |
505
+ | Detect claude TUI specifically (regex on banner / Ink markers) | Couples gate to a specific CLI's UI string. Fragile; breaks across claude versions; violates Rule 26 (cross-OS / cross-CLI). |
506
+ | Move gate to CLI side (have `cli.js` poll state before POST) | Couples CLI to in-process daemon state and breaks remote injects (`crossMachine.remoteInject`). Predecessor spec §5 already rejected this. |
507
+ | "Just ship a 30 s default timeout" | Inflates latency floor on warm-session fan-out without addressing root cause (threshold + dispatch-on-timeout). |
508
+ | Per-CLI threshold table | YAGNI — current state machine emits same {0.95, 0.9, 0.6} for every CLI. The CLI-specific axis is *prompt-pattern matchability*, not confidence; the right fix lives in the threshold, not a table. |
509
+ | Bundle unrelated fixes (e.g. enforce-report tweaks) into this commit | Repeats the commit-hygiene issue from δ Phase 2 that the task brief calls out. Memory `feedback_git_explicit_paths.md` requires explicit-path staging only. |
510
+
511
+ ---
512
+
513
+ ## 6. Constitution check
514
+
515
+ | Rule | Compliance |
516
+ |---|---|
517
+ | **Rule 1 — 경량 (lightweight)** | ✅ All three fixes are parameter additions and a single branch insertion: ~60 net LOC of source plus ~115 LOC of tests (~175 total, per §8). No new helper modules; reuses `awaitReplReady`/`verifyBodyConsumed`/`terminalLevelSubmit`. |
518
+ | **Rule 5 — 최선 (best-first)** | ✅ Restores send-key as a true manual override (no workaround); identifies the actual confidence gap (state-machine evidence cited verbatim); replaces the 0.3.0 strict-fail with best-effort dispatch + honest verification. |
519
+ | **Rule 13 — 비판적+건설적+객관적 (critical + constructive + objective)** | ✅ Anti-approaches enumerated with reasons. Recommendations cite line numbers + evidence, not assertion. Decision matrices show losing options. |
520
+ | **Rule 17 — 무의존 (zero-dependency)** | ✅ Zero new external dependencies. All edits within `src/submit-gate.js`, `daemon.js`, `cli.js`, existing test files. |
521
+ | **Rule 26 — cross-OS** | ✅ No new per-OS branches. The fixes are pure JS reading in-memory state. The OS-specific shell-outs (`kitty`, `cmux`, `osascript`) are unchanged — only the *gate parameters* and the *dispatch-on-timeout* branch are new. |
522
+ | **Constitution Rule 1 (AI gap)** | ✅ Closes the orchestrator UX trap that wastes parallel-fanout latency budget AND breaks the manual-override fallback. |
523
+
524
+ ---
525
+
526
+ ## 7. Invariants (what MUST NOT change vs δ Phase 2 spec)
527
+
528
+ - ✅ **Default behaviour of `inject --submit` on already-warm sessions**: a session that short-circuits the gate at conf≥0.85 still passes after the threshold drops to 0.5. `gate_wait_ms` remains <250 ms in the warm path. ≥99% reliability target preserved.
529
+ - ✅ **504 still emitted in true-fail case**: when `verifyBodyConsumed` returns `consumed:false` after best-effort dispatch on timeout, response is 504 with `reason:'gated_dispatch_unconsumed'` (new) — the old `reason:'gate_timeout'` is replaced. 504 is preserved as a status code; consumers checking for "any 504" continue to work.
530
+ - ✅ **`TELEPTY_SUBMIT_GATE=off` escape hatch**: preserved verbatim (`daemon.js:1511, 1529-1550` block unchanged).
531
+ - ✅ **Bus event `submit` shape**: existing fields (`strategy`, `attempts`, `gated`, `gate_wait_ms`, `verify`) preserved. New optional fields `forced`, `gated_dispatch_after_timeout` are additive.
532
+ - ✅ **HTTP 503 (dispatch-failure)**: preserved when all strategies (kitty/cmux/pty) return null.
533
+ - ✅ **HTTP 200 success shape**: existing fields preserved. `forced:true` and `gated_dispatch_after_timeout:true` are additive optional fields.
534
+ - ✅ **23 existing unit tests in `test/submit-gate.test.js`**: 22 pass unchanged. **1 changes**: the test at lines 185-193 must be updated per §3.2. Its semantics are preserved (threshold rejects low confidence), but its literals shift to avoid colliding with the new default.
535
+ - ✅ **Aterm sessions**: unaffected (gate path is bypassed via `session.type === 'aterm'` guards in `terminalLevelSubmit`).
536
+ - ✅ **Cross-machine remote inject**: `crossMachine.remoteInject` path unchanged; only local-daemon `/submit` callers affected.
537
+
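The status-code invariants above can be read as one decision function. The following is a minimal sketch under stated assumptions, not the handler in `daemon.js`: `classifySubmit` and its input names are hypothetical, the 503 body is left empty because this section does not specify it, and the forced path is assumed to return 200 whenever a dispatch strategy succeeded.

```javascript
// Hypothetical pure function mirroring the response invariants in this section.
// Inputs:
//   forced    - request carried force:true, so the gate was skipped
//   gateReady - the readiness gate passed before gate_timeout_ms elapsed
//   strategy  - 'kitty' | 'cmux' | 'pty', or null when every strategy failed
//   verify    - body-verification result, or undefined when no body was injected
function classifySubmit({ forced = false, gateReady = false, strategy = null, verify } = {}) {
  if (strategy === null) {
    return { status: 503, body: {} }; // dispatch failure: all strategies returned null
  }
  const body = { strategy };
  if (verify !== undefined) body.verify = verify;
  if (forced) {
    body.forced = true; // additive optional field
    return { status: 200, body };
  }
  if (!gateReady) {
    // Gate timed out: dispatch happened anyway, then verification decides.
    if (verify && verify.consumed === false) {
      return { status: 504, body: { ...body, reason: 'gated_dispatch_unconsumed' } };
    }
    body.gated_dispatch_after_timeout = true; // additive optional field
  }
  return { status: 200, body };
}

console.log(classifySubmit({ strategy: 'pty' }));
console.log(classifySubmit({ strategy: 'pty', verify: { consumed: false } }));
console.log(classifySubmit({ forced: true, strategy: 'cmux' }));
```

Keeping the decision in a pure function like this makes each invariant checkable without spawning a session, which is the property the unit tests in §4 rely on.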
538
+ ---
539
+
540
+ ## 8. Implementation estimate
541
+
542
+ **LOC (net add):**
543
+
544
+ | Fix | daemon.js | cli.js | src/submit-gate.js | tests | Total |
545
+ |---|---|---|---|---|---|
546
+ | 1 (send-key bypass) | +14 | +5 | 0 | +30 (3 new) | +49 |
547
+ | 2 (threshold relax) | +5 | 0 | -1 / +1 | +25 (4 new + 1 update) | +30 |
548
+ | 3 (timeout + dispatch) | +35 (mostly rewriting an existing block) | +4 | 0 | +60 (6 new) | +99 |
549
+ | **Total** | **~50 net** | **~9 net** | **~1 net** | **~115** | **~175 LOC** |
550
+
551
+ **Wall budget for Phase 2 implementation:** ≤ 4 h (matches δ Phase 1 spec §11). Sub-budgets (~195 min in total):
552
+
553
+ - Code edits: ~45 min
554
+ - Test scaffolding + new tests: ~90 min
555
+ - Local smoke (`npm test`, manual claude spawn × 5): ~30 min
556
+ - E2E §4.4 #21 (100× send-key on fresh claude): ~30 min if feasible locally; otherwise gated under `TELEPTY_E2E=1`
557
+
558
+ **Risk surface:** confined to `src/submit-gate.js` (1-line default change + 1 clamp) and `daemon.js:1497-1624` (one branch + one block rewrite). No state-machine changes. No bus-schema changes. CLI changes are body-field additions in two call sites.
559
+
560
+ ---
561
+
562
+ ## 9. Out of scope
563
+
564
+ - **Claude TUI ready detection via screen content**: Detecting claude's specific input box (`│ > │`) via screen scraping to short-circuit the silence-fallback. Speculative; better solved by lowering threshold (Fix 2). Separate spec if ever needed.
565
+ - **Daemon refactor (`/submit` vs `/key` architecture) beyond Fix 1**: We chose A (force flag) and rejected B (new endpoint) for this iteration. Re-litigation belongs in a follow-up if A proves insufficient.
566
+ - **Tuning `idle_timeout_ms` (currently 5000) in `session-state.js`**: Affects all state-machine consumers, not just submit. Out of scope.
567
+ - **Per-CLI confidence tables in `session-state.js`**: Same — state-machine concern.
568
+ - **Adding OSC 133 emission to claude/codex/gemini**: Requires upstream changes; not telepty's domain.
569
+ - **REPORT enforcement (`specs/enforce-report-spec.md`)**: orthogonal — that spec governs post-idle behaviour after inject succeeds; we handle whether inject submitted at all.
570
+
571
+ ---
572
+
573
+ ## 10. Semver impact
574
+
575
+ **Recommendation: PATCH bump 0.3.0 → 0.3.1.**
576
+
577
+ Rationale:
578
+
579
+ 1. **Fix 1 (`force` body field)**: additive opt-in body parameter. Existing callers (omitting `force`) see unchanged behaviour. CLI's send-key client is the only caller flipping the new flag. Non-breaking.
580
+ 2. **Fix 2 (threshold default 0.85 → 0.5)**: a relaxation. Sessions that previously failed gate now pass; sessions that previously passed still pass. No new failure modes. Per-request `min_confidence` override preserves strict-mode availability for callers who need it. Non-breaking.
581
+ 3. **Fix 3 (timeout 5000 → 10000 + dispatch-on-timeout)**: the latency ceiling extension is conservative (warm sessions short-circuit; only cold sessions pay). Dispatch-on-timeout converts some 504s into 200s, which is strictly a relaxation of failure semantics. The new optional field `gated_dispatch_after_timeout` is additive. The 504 status code surface itself is preserved; only the trigger conditions narrow.
582
+ 4. **No new HTTP endpoints, no removed fields, no schema changes.**
583
+ 5. **All three fixes are bug fixes against regressions introduced in 0.3.0** — patch is the conventional vehicle.
584
+
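The relaxation argument for Fix 2 can be checked mechanically against the confidence values that §5 says the state machine emits. This is an illustrative sketch only: `passesGate` is a hypothetical predicate, and the `>=` comparison is an assumption based on the "conf≥0.85" wording in §7.

```javascript
// Confidence values the spec says the state machine currently emits.
const EMITTED_CONFIDENCES = [0.95, 0.9, 0.6];

const OLD_DEFAULT = 0.85; // 0.3.0 default threshold
const NEW_DEFAULT = 0.5;  // proposed default threshold

// Hypothetical gate predicate: pass when confidence meets the minimum.
const passesGate = (confidence, minConfidence) => confidence >= minConfidence;

const table = EMITTED_CONFIDENCES.map((confidence) => ({
  confidence,
  old: passesGate(confidence, OLD_DEFAULT),
  next: passesGate(confidence, NEW_DEFAULT),
}));
console.table(table);

// Relaxation check: nothing that passed under the old default fails under the new one.
const regressions = table.filter((row) => row.old && !row.next);
console.log('regressions:', regressions.length);
```

Only the 0.6 row changes outcome, from fail to pass, which is why the change can be described as non-breaking for callers that previously succeeded.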
585
+ **Alternative considered: MINOR (0.3.0 → 0.4.0).** The rationale would be: visibility of new optional fields and the threshold semantic shift. Rejected because (a) consumers tolerate unknown fields per JSON convention; (b) the threshold change is a behaviour fix in service of the stated 0.3.0 goal, not a new feature; (c) δ Phase 2's CHANGELOG positioned 0.3.0 specifically as "render-gated submit reliability" — a follow-up patch is more honest than a minor that implies new capability.
586
+
587
+ If the orchestrator prefers the conservative MINOR for visibility reasons, the impl is identical — only the version literal in `package.json` changes.
588
+
589
+ ---
590
+
591
+ ## 11. Phase 2 entry criteria
592
+
593
+ - Orchestrator approves:
594
+ 1. Fix 1 approach: **A** (force body param)
595
+ 2. Fix 2 approach: **(i)** lower default to 0.5
596
+ 3. Fix 3 approach: **D₁** dispatch + verify on timeout, default `gate_timeout_ms=10000`
597
+ 4. Semver: **PATCH 0.3.1**
598
+ - Phase 2 implementation budget ≤ 4 h wall, ≤ 175 net LOC.
599
+ - Phase 2 success: all assertions in §4 pass, no regression on `inject --ref` (no-submit), `inject --submit` warm path, or aterm paths.
600
+ - Stage with explicit paths only (memory `feedback_git_explicit_paths.md`); commit message follows δ Phase 1 commit pattern (`fix(submit): …`).
601
+
602
+ ---
603
+
604
+ ## 12. Open questions (for Phase 2 input — non-blocking)
605
+
606
+ 1. **Should `cli.js`'s `send-key` ALSO accept an optional `--gate` flag** to opt back into gating (e.g. `telepty send-key <id> enter --gate`)? Recommendation: not in this spec. YAGNI; if a caller wants gating, they should use `inject --submit` semantics. Revisit only if a use case appears.
607
+ 2. **Should the bus event distinguish `gated_dispatch_after_timeout` from `forced:true` for consumers?** Today both produce a `submit` event; only the new optional flags differ. Recommendation: ship as-is; downstream listeners can inspect the flags.
608
+ 3. **Should the `gate_wait_ms` upper bound be raised in the bus emission for telemetry?** The event applies no explicit cap, but the value is effectively bounded by `gateTimeoutMs` (10 s default). Recommendation: leave as-is; the cap is implicit in the dispatch-on-timeout path.
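
For question 2, a downstream listener could tell the dispatch paths apart from the optional flags alone. The helper below is a hypothetical listener-side sketch: `submitPath` and its return labels are illustrative names, and only the event fields named in §7 (`gated`, `forced`, `gated_dispatch_after_timeout`) are read.

```javascript
// Hypothetical listener-side helper for `submit` bus events.
// Older events without the new flags fall through to the `gated` check.
function submitPath(event) {
  if (event.forced === true) return 'forced';
  if (event.gated_dispatch_after_timeout === true) return 'dispatch_after_timeout';
  return event.gated === true ? 'gated' : 'ungated';
}

console.log(submitPath({ strategy: 'pty', gated: true, gate_wait_ms: 120 }));
console.log(submitPath({ strategy: 'pty', gated: true, gated_dispatch_after_timeout: true }));
console.log(submitPath({ strategy: 'cmux', forced: true }));
```

Because the flags are optional and additive, a listener written this way keeps working against events emitted by 0.3.0.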