@dmsdc-ai/aigentry-telepty 0.1.98 → 0.3.3
- package/CHANGELOG.md +326 -0
- package/CLAUDE.md +5 -1
- package/README.md +3 -0
- package/cli.js +109 -16
- package/daemon.js +399 -39
- package/docs/superpowers/specs/2026-04-26-inject-submit-enter-reliability.md +447 -0
- package/docs/superpowers/specs/2026-04-26-prompt-symbol-render-gate.md +571 -0
- package/docs/superpowers/specs/2026-04-26-submit-gate-fixes-v2.md +608 -0
- package/docs/superpowers/specs/2026-05-02-submit-force-and-retry.md +139 -0
- package/package.json +4 -4
- package/specs/enforce-report-spec.md +237 -0
- package/src/prompt-symbol-registry.js +97 -0
- package/src/report-enforcement.js +86 -0
- package/src/submit-gate.js +269 -0
@@ -0,0 +1,139 @@
# 2026-05-02 — `inject --submit-force` + idempotent client retry

Closes task #347 (telepty 0.3.2 `--submit` prompt-symbol gate reliability —
context-ref inject arrived at orchestrator but Enter was skipped when the
input area had a transient render mismatch: autocomplete dropdown open,
cursor moved, mid-render race).

## Problem

`telepty inject --submit` runs three layers of gating before pressing
Enter:

| Layer | File | Trigger | Skip behavior |
|---|---|---|---|
| 3. Prompt-symbol (0.3.2) | `src/submit-gate.js` `awaitPromptSymbol` | `cmux read-screen` does not show the per-CLI prompt symbol stably for ≥200 ms within 8 s | Falls through to Layer 1 (`no_prompt_symbol_seen`) |
| 1. State-gated (0.3.1) | `src/submit-gate.js` `awaitReplReady` | `sessionStateManager` is not in `idle`/`waiting` with conf ≥ 0.5 within 10 s | Best-effort dispatch on `timeout`; hard-fail short-circuits to 504 on `session_dead`/`error`/`restarting`/`no_state` |
| Verify | `src/submit-gate.js` `verifyBodyConsumed` | Injected body still visible in `outputRing` after dispatch | One bounded retry; if still visible, 504 with `reason: 'gated_dispatch_unconsumed'` |

In production this still produces a residual failure rate when the
orchestrator session has a transient render mismatch (autocomplete drop-down,
cursor outside input area, mid-paste). The body is injected, the gate times
out, the dispatch fires Enter into a "wrong" focus, and `verifyBodyConsumed`
correctly sees the body still in the input box → 504. Sub-sessions then
print `⚠️ Submit gated-timeout` and the human user has to press Enter
manually for the orchestrator to consume the inject.
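
To make the Layer 3 trigger concrete, the "prompt symbol visible stably for ≥200 ms within 8 s" condition can be pictured as a polling loop that resets its stability window whenever the symbol disappears. This is only a minimal sketch, not the code in `src/submit-gate.js`: `awaitStablePrompt`, `readScreen`, `pollMs`, and the option names are hypothetical, and only the 200 ms and 8 s figures come from the table above.

```javascript
'use strict';

// Hypothetical helper: resolves true once detect() has reported `found`
// continuously for stableMs, or false when timeoutMs elapses first.
async function awaitStablePrompt(readScreen, detect, opts = {}) {
  const { stableMs = 200, timeoutMs = 8000, pollMs = 50 } = opts;
  const deadline = Date.now() + timeoutMs;
  let firstSeenAt = null;
  while (Date.now() < deadline) {
    const { found } = detect(await readScreen());
    if (!found) {
      firstSeenAt = null; // a flicker (e.g. a dropdown opening) resets the window
    } else if (firstSeenAt === null) {
      firstSeenAt = Date.now();
    } else if (Date.now() - firstSeenAt >= stableMs) {
      return true;
    }
    await new Promise((resolve) => setTimeout(resolve, pollMs));
  }
  return false;
}
```

Under these assumptions, a transient render mismatch shows up as repeated window resets, so the loop runs to the deadline and the caller falls through to the next layer.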

## Constraints

- **Article 1 (lightweight)**: minimum-touch fix. No new modules and no new
  daemon endpoint.
- **Article 17 (dependency-free)**: no new runtime dependency.
- **Article 9 (standalone)**: telepty must keep working standalone (no cmux/kitty
  required for the new flags).
- **Backward compat**: existing `--submit` semantics unchanged. The default
  `--submit-retry` value MUST have no effect on the happy path (which is the
  vast majority of calls, currently shipping reliably).
- **Idempotency**: a retry must never double-press Enter.

## Approach

Two opt-in CLI knobs on `telepty inject`, both implemented client-side
in `cli.js`. Daemon `/submit` endpoint is untouched — `force: true` is
already supported (introduced in 0.3.1 for `telepty send-key`); we just
plumb it through from the inject path.

### `--submit-force`

Adds `force: true` to the `/submit` POST body. Daemon-side this skips
both Layer 3 (prompt-symbol) and Layer 1 (state-gate) and dispatches Enter
once via the existing `terminalLevelSubmit` chain (kitty → cmux → PTY).

Use case: caller is confident the target REPL is ready (e.g., orchestrator
visibly idle, or Phase-6 cascade where sub-session has just verified the
orchestrator's last bus event). Mirrors the existing `telepty send-key`
escape hatch but at the inject level so a single command does both.

### `--submit-retry N` (default 1, clamp [0, 3])

After a 504 from `/submit` with a **retry-safe** reason, wait 300 ms and
re-issue the same `/submit` request up to N times. Retry-safe reasons:

| Reason | Source | Why retry is idempotent |
|---|---|---|
| `gated_dispatch_unconsumed` | `daemon.js:1680` | The verify path saw the body STILL in the input box after best-effort dispatch. Re-firing Enter when the body is visibly un-consumed cannot double-submit. |
| `gate_timeout` | `awaitReplReady` returning `timeout` (no longer reaches 504 directly in 0.3.1, but kept for forward-compat) | Same: body has not been consumed if we're still on the gated path. |
| `no_prompt_symbol_seen` | `awaitPromptSymbol` Layer 3 timeout (also not currently a 504 source, but kept for forward-compat) | Layer 3 alone never emits 504 today. Listed for completeness. |

Retry is **explicitly NOT** safe for hard-fail reasons — `session_dead`,
`session_error`, `session_restarting`, `no_state`, `no_state_manager`. Those
short-circuit the loop immediately because re-firing won't recover. Same
for any non-504 status (4xx) — no point retrying a malformed request.

The retry preserves the original flag set (`force` stays `force`, etc.).
The `attemptsMade` counter is rendered into the success line as
`[retry K/N]` so operators can see when the retry path actually fired.
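
The retry rules above can be sketched as a small client-side loop. This is an illustrative sketch under stated assumptions, not the shipped `cli.js` code: `clampRetries`, `submitWithRetry`, and the injected `postSubmit` function are hypothetical names, and the `{ status, reason }` response shape is assumed. The allowlisted reason strings, the 300 ms delay, the default of 1, and the [0, 3] clamp are taken from this section.

```javascript
'use strict';

// Explicit allowlist: any reason not listed here is treated as not retry-safe.
const RETRY_SAFE_REASONS = new Set([
  'gated_dispatch_unconsumed',
  'gate_timeout',
  'no_prompt_symbol_seen',
]);
const RETRY_DELAY_MS = 300;

// Parse the flag value, falling back to the default of 1 and clamping to [0, 3].
function clampRetries(value) {
  const n = Number.parseInt(value, 10);
  if (!Number.isFinite(n)) return 1;
  return Math.min(3, Math.max(0, n));
}

// postSubmit is an injected async function (hypothetical) that performs one
// POST to /submit and resolves to { status, reason }.
async function submitWithRetry(postSubmit, body, maxRetries, delayMs = RETRY_DELAY_MS) {
  let attemptsMade = 0; // number of retries actually issued
  for (;;) {
    const res = await postSubmit(body); // same body every time, so `force` is preserved
    const retryable = res.status === 504 && RETRY_SAFE_REASONS.has(res.reason);
    if (!retryable || attemptsMade >= maxRetries) {
      return { ...res, attemptsMade };
    }
    attemptsMade += 1;
    await new Promise((resolve) => setTimeout(resolve, delayMs));
  }
}
```

Because success, hard-fail 504s, and non-504 statuses all leave the loop on the first response, the happy path costs exactly one request regardless of `N`.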

### Why client-side (not daemon-side)?

- Server-side already retries once internally inside `verifyBodyConsumed`
  (`daemon.js:1663-1672`). Adding a second loop server-side conflates two
  feedback signals (the inner verify retry vs. the outer client retry) in
  one response shape.
- Per-call client control is more flexible — sub-sessions that have
  cheap evidence of orchestrator readiness can pass `--submit-retry 0`
  to avoid the extra round-trip; ones that don't can pass `--submit-retry 2`.
- Keeps the daemon stable. 0.3.0 cluster (memory:
  `feedback_telepty_send_key_regression.md`) was a daemon-side change that
  rippled into manual-override breakage. Client-side change has a strictly
  smaller blast radius.

## File map

| File | Change | LoC delta |
|---|---|---|
| `cli.js` (inject command) | Parse `--submit-force` + `--submit-retry`. Wrap existing `useSubmit` block in idempotent retry loop on 504-with-safe-reason. | +~55, -~25 |
| `test/cli.test.js` | Three new tests: --submit-force passes force=true; --submit-retry retries on safe-reason 504; --submit-retry does NOT retry on hard-fail 504. | +~120 |
| `CHANGELOG.md` | 0.3.3 entry. | +~30 |
| `package.json` | 0.3.2 → 0.3.3. | +1, -1 |
| `test/enforce-report.test.js:280` | Update stale version assertion 0.2.0 → 0.3.3. | +1, -1 |
| `README.md` | Mention new flags in inject summary. | +~6 |

No new files outside `test/` and `docs/`. No daemon changes. No new
dependencies. Total surface is about 200 LoC including tests.

## Tests

### Unit / integration (`test/cli.test.js`)

1. **`--submit-force` passes `force: true` to /submit**
   Spawn a session, intercept `/submit` (use existing harness method or
   inspect bus event), invoke
   `telepty inject --submit --submit-force <id> "x"`, assert daemon
   received `{ force: true }` in the request body.

2. **`--submit-retry N` retries on safe-reason 504**
   Mock the daemon to return 504 `{reason: 'gated_dispatch_unconsumed'}`
   on the first call and 200 on the second. Assert the CLI made exactly
   2 POST /submit calls and exited 0. Assert `[retry 1/N]` is present
   in stdout.

3. **`--submit-retry N` does NOT retry on hard-fail 504**
   Mock the daemon to return 504 `{reason: 'session_dead'}`. Assert the
   CLI made exactly 1 POST /submit call (no retry).

### Regression — full suite

`npm test` — 229 tests, all should pass after updating the stale
`enforce-report.test.js:280` version assertion.

## Future-proofing notes

- If the daemon adds new 504 reasons, they are by default **NOT**
  retry-safe (the safe set is an explicit allowlist). Adding a new safe reason
  is a one-line `RETRY_SAFE_REASONS.add(...)` change in `cli.js`.
- The flag pair composes: `--submit-force --submit-retry 0` (force-once),
  `--submit-force --submit-retry 2` (force, with idempotent retry on the
  rare 504, though force never returns 504 today).
- The 300 ms retry delay is a constant, not a flag, to keep the surface
  small. Empirically chosen at the upper end of the architect's
  100–300 ms window for the autocomplete-dropdown-close case.
package/package.json
CHANGED
@@ -1,6 +1,6 @@
 {
   "name": "@dmsdc-ai/aigentry-telepty",
-  "version": "0.
+  "version": "0.3.3",
   "main": "daemon.js",
   "bin": {
     "aigentry-telepty": "install.js",
@@ -9,9 +9,9 @@
     "telepty-mcp": "mcp-server/index.mjs"
   },
   "scripts": {
-    "test": "node --test test/auth.test.js test/daemon.test.js test/daemon-singleton.test.js test/cli.test.js test/skill-installer.test.js test/interactive-terminal.test.js test/runtime-info.test.js test/session-routing.test.js test/session-state.test.js test/mailbox-lock.test.js",
-    "test:watch": "node --test --watch test/auth.test.js test/daemon.test.js test/daemon-singleton.test.js test/cli.test.js test/skill-installer.test.js test/interactive-terminal.test.js test/runtime-info.test.js test/session-routing.test.js test/session-state.test.js test/mailbox-lock.test.js",
-    "test:ci": "node --test --test-reporter=spec test/auth.test.js test/daemon.test.js test/daemon-singleton.test.js test/cli.test.js test/skill-installer.test.js test/interactive-terminal.test.js test/runtime-info.test.js test/session-routing.test.js test/session-state.test.js test/mailbox-lock.test.js"
+    "test": "node --test test/auth.test.js test/daemon.test.js test/daemon-singleton.test.js test/cli.test.js test/skill-installer.test.js test/interactive-terminal.test.js test/runtime-info.test.js test/session-routing.test.js test/session-state.test.js test/mailbox-lock.test.js test/report-enforcement.test.js test/enforce-report.test.js test/submit-gate.test.js test/prompt-symbol-registry.test.js test/inject-submit-flags.test.js",
+    "test:watch": "node --test --watch test/auth.test.js test/daemon.test.js test/daemon-singleton.test.js test/cli.test.js test/skill-installer.test.js test/interactive-terminal.test.js test/runtime-info.test.js test/session-routing.test.js test/session-state.test.js test/mailbox-lock.test.js test/report-enforcement.test.js test/enforce-report.test.js test/submit-gate.test.js test/prompt-symbol-registry.test.js test/inject-submit-flags.test.js",
+    "test:ci": "node --test --test-reporter=spec test/auth.test.js test/daemon.test.js test/daemon-singleton.test.js test/cli.test.js test/skill-installer.test.js test/interactive-terminal.test.js test/runtime-info.test.js test/session-routing.test.js test/session-state.test.js test/mailbox-lock.test.js test/report-enforcement.test.js test/enforce-report.test.js test/submit-gate.test.js test/prompt-symbol-registry.test.js test/inject-submit-flags.test.js"
   },
   "keywords": [
     "pty",
@@ -0,0 +1,237 @@
# SPEC: Enforce result-summary REPORT when sessions go idle

**Source:** orchestrator inject d94c9990...
**Session:** aigentry-telepty
**Status:** SPEC — awaiting orchestrator approval
**Topic:** REPORT enforcement after inject-driven idle transitions

---

## 1. Design options & recommendation

### Option A — Gate idle transition until REPORT arrives
Prevent `idle` transition from firing for N seconds until content REPORT
detected as sent by the session.

- ❌ Violates invariant: "Do NOT break existing idle detection"
- ❌ Requires invasive state machine changes
- **Rejected.**

### Option B — Auto-summarize PTY output
Scrape last X lines of session PTY output, strip ANSI, attach as
`auto_summary` field on `TASK_COMPLETE`.

- ✅ Zero session-side changes
- ✅ Always provides content payload
- ❌ PTY scraping is noisy (progress bars, status lines, spinner remnants)
- ❌ Masks the root cause — sessions still forget to REPORT
- **Keep as fallback, not primary.**

### Option C — Two-stage notification
On idle transition, fire `TASK_IDLE_NO_REPORT` (not `TASK_COMPLETE`).
Watch for content REPORT inject BACK to the source session for N seconds.
If REPORT detected → emit `TASK_COMPLETE_WITH_REPORT`. Else → emit
`TASK_TIMEOUT_NO_REPORT` with `auto_summary` fallback (Option B).

- ✅ Observable from orchestrator without code changes (richer events)
- ✅ Doesn't break existing idle detection (fires AFTER idle transition)
- ✅ No session-side changes required
- ✅ Backward-compat (old consumers see bus event, just with new `type`)
- ✅ Provides clear state difference between "REPORTed" and "idled silently"
- **Recommended primary.**

### Option D — Prompt-injection reminder
When session about to go idle after inject, auto-inject reminder text.

- ❌ Interferes with active work
- ❌ Doesn't guarantee compliance
- ❌ Session might be in final cleanup — inject causes confusion
- **Rejected.**

### Recommendation: **Option C + Option B fallback**

Two-stage notification with PTY-scrape auto-summary as timeout fallback.
Minimal blast radius, maximal observability, preserves all invariants.

---

## 2. Content REPORT schema

Parse from inject body text via prefix. Structured envelope would require
session-side library; free-text prefix keeps all LLMs compatible.

**Detection rule:** An inject from session X BACK to session Y (where Y was
the original `--from` source for X's last inject) whose prompt text starts
with one of:
- `REPORT:` (completed / partial result)
- `STATUS:` (blocked / dismissed / error)
- `ENFORCE-SPEC:`, `SPEC:`, `OWNER-DIAGNOSIS:` — recognized REPORT variants

Required fields (parsed from pipe-separated text):
- `source_session` — auto (sender of the reply inject)
- `target_session` — auto (recipient, i.e. the original orchestrator)
- `inject_ref` — auto (matched via pendingReports tracking)
- `status` — parsed from prefix: `REPORT:` → completed; `STATUS: blocked` → blocked; etc.
- `summary` — the full prompt text (20-500 chars recommended, not enforced)
- `artifacts` — optional, parsed from `files={...}` pipe-field
- `next_action` — optional, parsed from `next={...}` pipe-field

**Non-breaking:** If the reply inject doesn't match any REPORT prefix, it's
treated as a regular inject (current behavior preserved).
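
As an illustration of the optional pipe-fields, the sketch below pulls `files={...}` and `next={...}` out of a REPORT prompt. The spec does not pin down the exact wire format, so the layout assumed here (`REPORT: <summary> | files={a,b} | next={...}`, comma-separated file names, no `|` or `}` inside a field) and the function name `parseReportFields` are hypothetical.

```javascript
'use strict';

// Assumed wire format (not confirmed by the spec):
//   "REPORT: <summary> | files={a.js,b.js} | next={run tests}"
function parseReportFields(prompt) {
  const text = String(prompt);
  // Per the schema, `summary` is the full prompt text.
  const fields = { summary: text, artifacts: [], next_action: null };
  for (const segment of text.split('|').map((s) => s.trim())) {
    const m = /^(files|next)=\{([^}]*)\}$/.exec(segment);
    if (!m) continue; // free-text segments are left in `summary` only
    if (m[1] === 'files') {
      fields.artifacts = m[2].split(',').map((f) => f.trim()).filter(Boolean);
    } else {
      fields.next_action = m[2].trim() || null;
    }
  }
  return fields;
}
```

Segments that do not match either pipe-field are ignored rather than rejected, which keeps the parser in line with the non-breaking rule: an unstructured REPORT still yields a usable `summary`.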

---

## 3. Timeout + failure handling

| Condition | Action | Notification |
|---|---|---|
| REPORT arrives within `reportTimeoutSecs` (default 120s) | Cancel timer, mark as reported | `TASK_COMPLETE_WITH_REPORT` (rich payload) |
| No REPORT within `reportTimeoutSecs` | Fire timeout | `TASK_TIMEOUT_NO_REPORT` with `auto_summary` (last 40 non-blank stripAnsi lines from `session.outputRing`) |
| Session sends `STATUS: blocked` explicitly | Immediate settlement | `TASK_BLOCKED_WITH_REASON` |
| Session dies before REPORT | Detected via `dead` transition | `TASK_DEAD_NO_REPORT` with `auto_summary` |

**Interaction with existing 60s deliberation timeout:** Orthogonal. Deliberation
timeout is a separate orchestrator-level concept. This daemon-level REPORT
timeout fires AFTER idle but BEFORE any orchestrator follow-up. Default 120s
gives orchestrator time to see `TASK_IDLE_NO_REPORT` and follow up before
auto-summary fires.
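
The four rows of the table can be read as a single watch that settles exactly once, whichever condition arrives first. The sketch below shows that shape only as a rough illustration; `startReportWatch`, its callback names, and the payload fields other than `auto_summary` are hypothetical, and the daemon's real wiring into `pendingReports` is not shown.

```javascript
'use strict';

// Hypothetical sketch: one watch per pending report; the first settlement wins.
function startReportWatch(emit, timeoutMs, buildSummary = () => '') {
  let settled = false;
  let timer = null;
  const settle = (type, payload) => {
    if (settled) return false; // late REPORTs or duplicate transitions are ignored
    settled = true;
    clearTimeout(timer);
    emit(type, payload);
    return true;
  };
  // Timeout row: no REPORT arrived in time, so fall back to the auto-summary.
  timer = setTimeout(
    () => settle('TASK_TIMEOUT_NO_REPORT', { auto_summary: buildSummary() }),
    timeoutMs,
  );
  return {
    onReport: (summary) => settle('TASK_COMPLETE_WITH_REPORT', { summary }),
    onBlocked: (reason) => settle('TASK_BLOCKED_WITH_REASON', { reason }),
    onDead: () => settle('TASK_DEAD_NO_REPORT', { auto_summary: buildSummary() }),
  };
}
```

Funnelling every outcome through one `settle` function is a design choice that makes it impossible for a REPORT and a timeout to both emit for the same inject.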

---

## 4. Back-compat

- Legacy `TASK_COMPLETE: {session} is now idle after processing inject ({N}s)`
  text format: **deprecated but kept emitting** for 1 minor version. Emit BOTH
  the new `TASK_IDLE_NO_REPORT` bus event AND the legacy text-inject-to-source
  during transition period.
- New bus event types: `TASK_IDLE_NO_REPORT`, `TASK_COMPLETE_WITH_REPORT`,
  `TASK_TIMEOUT_NO_REPORT`, `TASK_BLOCKED_WITH_REASON`, `TASK_DEAD_NO_REPORT`.
- Sessions that never send REPORT: grandfathered — they get
  `TASK_TIMEOUT_NO_REPORT` with auto-summary fallback (no hard failure).
- Orchestrator code that parses legacy `TASK_COMPLETE: ...` text: still works
  (text still emitted during transition).

---

## 5. Scope boundaries

| Work source | Require REPORT? | How distinguished |
|---|---|---|
| Inject with `--from X` | ✅ Yes (track in `pendingReports[sessionId]`) | `pendingReports` map populated on inject |
| Inject without `--from` | ❌ No (no one to report to) | `pendingReports` key absent |
| User typed directly | ❌ No | No inject event, no pendingReport entry |
| Self-initiated REPORT inject | ❌ No (it IS the report) | prefix match: `REPORT:` etc. |

**Key rule:** Only sessions with a `pendingReports[id]` entry are subject to
enforcement. User-driven work naturally doesn't populate this map.

---

## 6. Files to modify

| File | Change |
|---|---|
| `daemon.js` — sessionStateManager.onTransition (lines 37-57) | Replace direct auto-report with two-stage notification. Fire `TASK_IDLE_NO_REPORT`, start REPORT watch timer. |
| `daemon.js` — inject endpoint (lines 1547-1550) | Extend `pendingReports[id]` with `awaitingReport: true`, `reportWatchUntil: ts`. |
| `daemon.js` — inject endpoint (new detection) | Check incoming inject prompt for REPORT prefix + reverse-match to originating pendingReport. If matched: cancel timer, fire `TASK_COMPLETE_WITH_REPORT`. |
| `daemon.js` — state machine `dead` transition handler | Fire `TASK_DEAD_NO_REPORT` with auto-summary. |
| `daemon.js` — new helper `buildAutoSummary(session)` | Read `session.outputRing`, strip ANSI, filter blanks, take last 40 lines, max 4KB. |
| `src/mailbox/config.js` or similar config | Add `reportTimeoutSecs: 120`, `autoSummaryLines: 40`, `autoSummaryMaxBytes: 4096`. |
| `daemon.js` — legacy auto-report removal (lines 2131-2147, 2328-2346) | Retire duplicate legacy paths (or keep with deprecation flag). |
| `test/daemon.test.js` | New tests: REPORT-detected path, timeout path, dead-before-report path, no-inject-source ignored path. |

No new files. No new ports. No new process spawning.

---

## 7. Test plan

**Unit tests (test/daemon.test.js additions):**
1. Idle after inject → emits `TASK_IDLE_NO_REPORT` bus event (NOT `TASK_COMPLETE`)
2. REPORT-prefixed inject reply within timeout → emits `TASK_COMPLETE_WITH_REPORT` with parsed fields
3. No REPORT within timeout → emits `TASK_TIMEOUT_NO_REPORT` with auto_summary containing last session output
4. `STATUS: blocked` reply → immediate `TASK_BLOCKED_WITH_REASON`
5. Session dies before report → `TASK_DEAD_NO_REPORT` with auto_summary
6. Idle WITHOUT pendingReports entry (user-driven work) → no enforcement events
7. `buildAutoSummary()`: strips ANSI, drops blanks, truncates to 40 lines / 4KB
8. Legacy text-inject to source still fires (back-compat grandfathering)

**E2E tests:**
1. Full cycle: `inject --from A B "task"` → B works → B sends `telepty inject --from B A "REPORT: ..."` → A receives REPORT → bus emits `TASK_COMPLETE_WITH_REPORT`
2. Timeout cycle: same but B never replies → after 120s → A receives `TASK_TIMEOUT_NO_REPORT` with auto_summary

**Regression:**
- All 131 existing tests pass unchanged
- Existing `TASK_COMPLETE:` text format still emitted (grandfather)

---

## 8. Semver

**Minor bump → 0.2.0.**

Justification:
- New bus event types (additive, not breaking)
- New config keys (additive with defaults)
- Legacy notification text preserved (back-compat)
- No breaking API changes
- Observable new behavior that consumers may opt into

Not a patch because it introduces new observable event types.
Not major because nothing is removed or renamed.

---

## 9. Risks — top 3

1. **REPORT detection false positives** — an inject back to source that
   happens to start with "REPORT:" but is actually a new task request gets
   miscategorized. Mitigation: REPORT detection requires BOTH prefix match
   AND reverse-match to `pendingReports[senderSession]` with matching
   `inject_ref`. If no pending outbound report tracked, treat as new inject.
2. **Auto-summary leaks sensitive output** — PTY output may contain secrets
   (tokens, passwords echoed). Mitigation: honor a denylist regex
   (`api[_-]?key|password|token=\\S+`) before attaching; truncate aggressively.
   Document that auto_summary is a best-effort preview, not a full transcript.
3. **Timeout storm on orchestrator** — if many sessions time out simultaneously,
   orchestrator receives a flurry of `TASK_TIMEOUT_NO_REPORT` events.
   Mitigation: rate-limit timeout emissions per-orchestrator via mailbox
   coalescing (existing `notifyCoalesceMs`).
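
The mitigation for risk 1 above (prefix match AND reverse-match) can be sketched as a single guard. This is a hedged illustration rather than the daemon's code: `matchPendingReport` is a hypothetical name, the `source` field on a `pendingReports` entry is an assumption about its shape, and the prefix regex is deliberately trimmed to two of the recognized variants.

```javascript
'use strict';

// Trimmed-down prefix list for illustration; the spec recognizes more variants.
const REPORT_PREFIX = /^\s*(REPORT|STATUS):/;

// pendingReports is assumed to be keyed by the session that owes a report,
// with the requesting session stored under a hypothetical `source` field.
function matchPendingReport(pendingReports, sender, recipient, prompt) {
  const pending = pendingReports[sender];
  if (!pending || pending.source !== recipient) return null; // nothing owed to this recipient
  if (!REPORT_PREFIX.test(String(prompt))) return null; // ordinary inject
  return pending; // caller can read inject_ref from the matched entry
}
```

A `null` result means the inject is handled as a regular inject, so a message that merely starts with `REPORT:` but has no tracked pending report is never miscategorized.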

---

## 10. Open questions

1. **Should `TASK_IDLE_NO_REPORT` be delivered as an inject (legacy) or ONLY
   as a bus event?** Recommendation: bus event only during transition — legacy
   text-inject preserved unchanged. Rich event flows via bus where consumers
   can subscribe.
2. **Cross-machine:** Does the REPORT watch timer survive tailnet peer relay?
   Current `pendingReports` is in-memory on the daemon handling the inject.
   If orchestrator is on a different machine, does the remote peer also track?
   Recommendation: timer stays on the daemon that accepted the original
   inject; remote orchestrator gets events via existing bus relay. No
   cross-machine state sync needed.
3. **Should `dismissed` be session-initiated or orchestrator-initiated?**
   Proposed: session sends `STATUS: dismissed` ("I decided not to do this");
   orchestrator can also mark via `DELETE /api/pendingReports/{id}`
   (new endpoint). Both clear the watch.
4. **Two injects in quick succession from same orchestrator:** First inject
   creates pendingReport; second inject arrives before REPORT for first.
   Does second inject overwrite or queue? Recommendation: overwrite (only
   latest inject expects REPORT). Log `[AUTO-REPORT] overwritten pending`
   warning for observability.
5. **reportTimeoutSecs default (120s):** Is this the right baseline? Evidence
   table shows tasks ranging 7.5s → 649s. 120s is too short for long tasks.
   Alternative: no default timer — only fire fallback when `dead` detected
   or explicit orchestrator-side query. Needs orchestrator input.

---

## Invariants honored

- ✅ Existing idle detection unchanged (state machine onTransition fires as before)
- ✅ Orchestrator needs no code changes to benefit (bus events flow passively)
- ✅ No new process spawning / no new network ports
- ✅ Cross-machine sync via existing mailbox unchanged
- ✅ Scoped to REPORT enforcement — no inject rewrite
@@ -0,0 +1,97 @@
// src/prompt-symbol-registry.js — Per-CLI prompt-symbol detection (0.3.2)
// See docs/superpowers/specs/2026-04-26-prompt-symbol-render-gate.md
//
// Maps `session.command` (e.g. 'claude', 'codex', 'gemini') to a
//   { symbol, byteSeq, detect(screen) → { found, line_index?, col? } }
// entry. The detect() function takes the rendered screen text from
// `cmux read-screen` (already terminal-state-applied; no ANSI stripping
// needed) and returns the LAST occurrence (closest to the bottom) so
// transcript echoes earlier in the viewport do not produce false positives.
//
// Adding a new CLI: append a new entry + write a unit test against a
// captured `cmux read-screen` sample.

'use strict';

const ENTRIES = {
  // claude renders an empty input row as "❯" + spaces, sandwiched between
  // two horizontal-rule lines made of U+2500 ('─').
  claude: {
    symbol: '❯',
    byteSeq: Buffer.from([0xE2, 0x9D, 0xAF]),
    detect(screen) {
      const lines = String(screen == null ? '' : screen).split('\n');
      for (let i = lines.length - 1; i >= 1; i--) {
        const line = lines[i];
        if (!/^❯\s*$/.test(line)) continue;
        const above = lines[i - 1] || '';
        const below = lines[i + 1] || '';
        if (above.includes('─') || below.includes('─')) {
          return { found: true, line_index: i, col: line.indexOf('❯') + 1 };
        }
      }
      return { found: false };
    },
  },
  // codex renders idle as " › <placeholder>" (column 2). Status footer
  // ("gpt-5.5 …" or "gpt-5 …") sits 1–2 lines below.
  codex: {
    symbol: '›',
    byteSeq: Buffer.from([0xE2, 0x80, 0xBA]),
    detect(screen) {
      const lines = String(screen == null ? '' : screen).split('\n');
      for (let i = lines.length - 1; i >= 0; i--) {
        const line = lines[i];
        if (!/^ › /.test(line)) continue;
        const footer = (lines[i + 1] || '') + '\n' + (lines[i + 2] || '');
        if (/gpt-\d/.test(footer)) {
          return { found: true, line_index: i, col: 2 };
        }
      }
      return { found: false };
    },
  },
  // gemini empty input: " * Type your message or @path/to/file"
  // gemini non-empty: " * <user typed text>"
  // Geometry: bracketed by U+2580 ('▀') above and U+2584 ('▄') below.
  gemini: {
    symbol: '*',
    byteSeq: Buffer.from([0x2A]),
    detect(screen) {
      const lines = String(screen == null ? '' : screen).split('\n');
      for (let i = lines.length - 1; i >= 1; i--) {
        const line = lines[i];
        if (!/^ \* {2,}/.test(line)) continue;
        const above = lines[i - 1] || '';
        const below = lines[i + 1] || '';
        if (above.includes('▀') || below.includes('▄')) {
          return { found: true, line_index: i, col: 2 };
        }
      }
      return { found: false };
    },
  },
};

// Normalize: strip path and args
//   '/usr/local/bin/claude --resume' → 'claude'
//   'codex resume' → 'codex' (a naive last-token split would yield 'resume')
//
// A naive split/pop returns the LAST whitespace-or-slash-delimited token,
// which is correct for absolute paths but wrong for `<bin> <subcmd>` forms.
// We compensate by path-stripping every whitespace-delimited token in order
// and returning the first one that matches an ENTRIES key.
function lookup(command) {
  if (!command) return null;
  const raw = String(command).trim();
  if (!raw) return null;
  const tokens = raw.split(/\s+/).filter(Boolean);
  for (const tok of tokens) {
    const base = tok.split('/').filter(Boolean).pop() || '';
    const key = base.toLowerCase();
    if (ENTRIES[key]) return ENTRIES[key];
  }
  return null;
}

module.exports = { lookup, ENTRIES };
@@ -0,0 +1,86 @@
// src/report-enforcement.js — REPORT enforcement helpers (0.2.0)
// See specs/enforce-report-spec.md
//
// Exports pure, testable helpers:
//   - classifyReportPrompt(prompt): categorize an inject prompt
//   - buildAutoSummary(session, opts): scrape last lines of output with redaction
//   - ANSI_STRIPPER_RE, SECRET_DENYLIST_RE: regex constants (exported for tests)
//   - REPORT_PREFIX_RE, REPORT_STATUS_*_RE: classification regexes

'use strict';

// Prefix patterns that identify a content REPORT inject (reverse-match required)
const REPORT_PREFIX_RE = /^\s*(REPORT|STATUS|SPEC|OWNER-DIAGNOSIS|ENFORCE-SPEC|LOG-FIX-SPEC|LOG-FIX-IMPLEMENTED|FIX-SPEC|FIX-IMPLEMENTED|SPEC-SYNC|DIAGNOSIS|ENFORCE-IMPLEMENTED)[:\s]/;
const REPORT_STATUS_BLOCKED_RE = /^\s*STATUS:\s*blocked\b/i;
const REPORT_STATUS_DISMISSED_RE = /^\s*STATUS:\s*dismissed\b/i;
const REPORT_STATUS_ERROR_RE = /^\s*STATUS:\s*error\b/i;

// ANSI stripper (matches session-state.js)
const ANSI_STRIPPER_RE = /\x1b\[[0-9;]*[a-zA-Z]|\x1b\][^\x07]*\x07|\x1b[()][AB012]|\x1b\[[\?]?[0-9;]*[hlm]/g;

// Secret denylist — redact common credential patterns
const SECRET_DENYLIST_RE = /(api[_-]?key\s*[:=]\s*\S+|password\s*[:=]\s*\S+|token\s*[:=]\s*\S+|secret\s*[:=]\s*\S+)/gi;

// Default config (overridable via options)
const DEFAULT_AUTO_SUMMARY_LINES = 40;
const DEFAULT_AUTO_SUMMARY_MAX_BYTES = 4096;

/**
 * Classify incoming inject prompt for REPORT enforcement.
 * Returns one of: 'report_dismissed', 'report_blocked', 'report_error',
 * 'report_complete', or null (not a report).
 *
 * Order matters: STATUS variants checked before generic prefix.
 */
function classifyReportPrompt(prompt) {
  if (typeof prompt !== 'string') return null;
  if (REPORT_STATUS_DISMISSED_RE.test(prompt)) return 'report_dismissed';
  if (REPORT_STATUS_BLOCKED_RE.test(prompt)) return 'report_blocked';
  if (REPORT_STATUS_ERROR_RE.test(prompt)) return 'report_error';
  if (REPORT_PREFIX_RE.test(prompt)) return 'report_complete';
  return null;
}

/**
 * Build an auto_summary from a session's output ring.
 * - Strips ANSI sequences
 * - Filters blank lines
 * - Takes last N non-blank lines
 * - Redacts secrets via denylist regex
 * - Caps at max_bytes total (UTF-8 byte length)
 *
 * @param {Object} session — { outputRing: string[] }
 * @param {Object} [options]
 * @param {number} [options.maxLines] — default 40
 * @param {number} [options.maxBytes] — default 4096
 * @returns {string}
 */
function buildAutoSummary(session, options = {}) {
  const maxLines = options.maxLines || DEFAULT_AUTO_SUMMARY_LINES;
  const maxBytes = options.maxBytes || DEFAULT_AUTO_SUMMARY_MAX_BYTES;
  if (!session || !session.outputRing || session.outputRing.length === 0) return '';

  const raw = session.outputRing.join('');
  const stripped = raw.replace(ANSI_STRIPPER_RE, '');
  const lines = stripped.split(/\r?\n/).map(l => l.trim()).filter(l => l.length > 0);
  const tail = lines.slice(-maxLines);
  let joined = tail.join('\n');
  joined = joined.replace(SECRET_DENYLIST_RE, '[REDACTED]');
  if (Buffer.byteLength(joined, 'utf8') > maxBytes) {
    joined = Buffer.from(joined, 'utf8').subarray(0, maxBytes).toString('utf8').replace(/\uFFFD+$/, ''); // cap by UTF-8 bytes (String#slice counts UTF-16 units); drop a split trailing character
  }
  return joined;
}

module.exports = {
  classifyReportPrompt,
  buildAutoSummary,
  REPORT_PREFIX_RE,
  REPORT_STATUS_BLOCKED_RE,
  REPORT_STATUS_DISMISSED_RE,
  REPORT_STATUS_ERROR_RE,
  ANSI_STRIPPER_RE,
  SECRET_DENYLIST_RE,
  DEFAULT_AUTO_SUMMARY_LINES,
  DEFAULT_AUTO_SUMMARY_MAX_BYTES,
};