dominds 1.12.1 → 1.12.2

@@ -0,0 +1,338 @@
1
+ # Daemon Runner and Recoverable Scrollback
2
+
3
+ Chinese version: [中文版](./daemon-cmd-runner.zh.md)
4
+
5
+ This document defines a replacement design for the `os` toolset's daemon mechanism. The current model can rediscover a previously launched daemon after a Dominds crash/restart, but it cannot recover the daemon's captured stdout/stderr scrollback because those buffers live in the Dominds main process memory. Once the main process is gone, `get_daemon_output` loses its source of truth.
6
+
7
+ The new design does not patch around that limitation. It moves daemon ownership, scrollback ownership, output queries, and stop control into a dedicated per-daemon runner process that survives a Dominds main-process restart.
8
+
9
+ This is a design document, not an implementation plan.
10
+
11
+ ---
12
+
13
+ ## Goals
14
+
15
+ - Keep daemon scrollback readable after the Dominds main process crashes and comes back.
16
+ - Move daemon ownership out of the Dominds main process and into an external `cmd_runner`.
17
+ - Use one execution path for both short-lived and long-lived commands.
18
+ - Replace the current `get_daemon_output(stream=...)` contract with a friendlier API that can fetch both streams at once.
19
+ - Make stdin behavior explicit: for now, daemonized shell commands are non-interactive.
20
+
21
+ ## Non-goals
22
+
23
+ - No PTY support in this round.
24
+ - No API for writing into daemon stdin.
25
+ - No cross-machine or cross-user daemon control protocol.
26
+ - No compatibility layer for the old daemon tracking model.
27
+
28
+ ---
29
+
30
+ ## Core Decisions
31
+
32
+ ### 1. `1 daemon : 1 runner`
33
+
34
+ Every daemon-capable `shell_cmd` execution gets its own `cmd_runner` process. That runner is responsible for:
35
+
36
+ - spawning the actual shell/command process
37
+ - owning the stdout/stderr pipes
38
+ - maintaining scrollback buffers
39
+ - answering local IPC requests for status, output, and stop
40
+
41
+ The runner is not a shared global service. It lives and dies with the daemon it owns:
42
+
43
+ - if the daemon exits normally, the runner exits too
44
+ - if the daemon is stopped, the runner exits too
45
+ - a runner never manages multiple daemon commands
46
+
47
+ This is the key architectural shift. The component that owns the output buffers must not be the same process that is allowed to crash and restart independently.
48
+
49
+ ### 2. Every `shell_cmd` starts under a runner from the beginning
50
+
51
+ The old model had an ownership split:
52
+
53
+ - Dominds main process spawns the command
54
+ - Dominds main process captures output
55
+ - if the timeout expires, the command is reclassified as a daemon
56
+
57
+ That split is the root of the recovery problem. The new model removes it entirely.
58
+
59
+ Every `shell_cmd` runs under a runner from the start:
60
+
61
+ - if the command finishes before the timeout, the runner returns the result and exits
62
+ - if the timeout expires, the runner stays alive, the command becomes a tracked daemon, and reminder metadata is written
63
+
64
+ There is no “upgrade” step where ownership changes mid-flight.
65
+
66
+ ### 3. On Unix, the runner is the process-group leader
67
+
68
+ The runner should be the process-group leader, and the daemon command should inherit that pgid by default. That makes fallback cleanup straightforward.
69
+
70
+ The stop flow is intentionally two-layered:
71
+
72
+ - **Graceful stop path:** the runner sends a signal directly to the daemon pid
73
+ - **Fallback path:** if the daemon does not exit in time, the Dominds main process kills the whole process group
74
+ - **Escape is allowed:** if the daemon intentionally changes its pgid, that is treated as an escape hatch; fallback cleanup should still attempt a direct pid kill afterward
75
+
76
+ The direct signal to the daemon pid is the primary stop mechanism, not an afterthought.
77
+
78
+ ---
79
+
80
+ ## High-level Structure
81
+
82
+ There are three roles in the new model:
83
+
84
+ - **Dominds main process**
85
+ - handles tool calls
86
+ - persists reminders
87
+ - reconnects to runners after restart
88
+ - **cmd_runner**
89
+ - executes the shell command
90
+ - owns stdout/stderr scrollback
91
+ - serves a local IPC endpoint
92
+ - **daemon command**
93
+ - the actual user-requested command
94
+ - supervised by the runner
95
+
96
+ Ownership boundaries are simple:
97
+
98
+ - the runner is the source of truth for daemon output
99
+ - the Dominds main process is not a durable owner of daemon scrollback
100
+ - reminders store only the data needed to reconnect and validate identity
101
+
102
+ ---
103
+
104
+ ## Tool Contract Changes
105
+
106
+ ## `get_daemon_output`
107
+
108
+ ### Old contract
109
+
110
+ - `pid`
111
+ - optional `stream: "stdout" | "stderr"`
112
+
113
+ ### New contract
114
+
115
+ - `pid: number`
116
+ - `stdout?: boolean`
117
+ - `stderr?: boolean`
118
+
119
+ ### Semantics
120
+
121
+ - if both booleans are omitted, treat them as `stdout=true` and `stderr=true`
122
+ - if both are explicitly `false`, return an error
123
+ - output order is always `stdout` first, then `stderr`
124
+ - each stream gets its own heading, content block, and scroll notice
125
+ - unrequested streams are omitted entirely
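The flag rules above can be sketched as a small resolver. This is a minimal sketch, not the shipped implementation: `resolveStreams` and the two interface names are hypothetical, and the document does not say what happens when only one flag is supplied, so the sketch assumes each omitted flag defaults to `true` on its own.

```typescript
// Hypothetical shapes; only `pid`, `stdout`, and `stderr` come from the contract.
interface GetDaemonOutputArgs {
  pid: number;
  stdout?: boolean;
  stderr?: boolean;
}

interface StreamSelection {
  stdout: boolean;
  stderr: boolean;
}

// Assumption: an omitted flag defaults to true independently of the other one.
// That covers the documented "both omitted means both streams" case.
function resolveStreams(args: GetDaemonOutputArgs): StreamSelection {
  const stdout = args.stdout ?? true;
  const stderr = args.stderr ?? true;
  if (!stdout && !stderr) {
    // Documented rule: explicitly disabling both streams is an error.
    throw new Error("get_daemon_output: stdout and stderr cannot both be false");
  }
  return { stdout, stderr };
}
```

Under that assumption, `{ pid, stderr: false }` means "only stdout" and `{ pid, stdout: false }` means "only stderr".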
126
+
127
+ ### Why this is better
128
+
129
+ Daemon troubleshooting usually wants both streams together. Two booleans are also a better fit for “only stdout”, “only stderr”, or “both” than a single enum.
130
+
131
+ There is no compatibility shim for the old `stream` parameter.
132
+
133
+ ## `stop_daemon`
134
+
135
+ The tool keeps the same role, but the control path changes:
136
+
137
+ 1. Dominds connects to the runner
138
+ 2. the runner sends a graceful stop signal directly to the daemon pid
139
+ 3. Dominds waits for a short grace window
140
+ 4. if needed, Dominds kills the whole process group as fallback
141
+ 5. if needed, Dominds also kills the daemon pid directly
142
+ 6. reminder state and local tracking are removed
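The six steps above can be sketched as an escalation routine. This is a sketch under stated assumptions: `StopOps` and its method names are hypothetical stand-ins for the real IPC and signal calls, `waitForExit` is synchronous only to keep the example short, and the document does not specify what happens to the reminder when the daemon survives every kill (the sketch keeps it and raises an error).

```typescript
// Hypothetical operations injected by the caller; none of these names are from the source.
interface StopOps {
  requestRunnerStop(): void; // steps 1-2: connect, runner signals the daemon pid
  waitForExit(graceMs: number): boolean; // step 3: true when the daemon is gone
  killProcessGroup(): void; // step 4: fallback over the whole process group
  killDaemonPid(): void; // step 5: direct kill, covers a daemon that changed its pgid
  isDaemonAlive(): boolean;
  removeReminder(): void; // step 6: drop reminder state and local tracking
}

type StopOutcome = "graceful" | "group_kill" | "pid_kill";

function stopDaemon(ops: StopOps, graceMs: number): StopOutcome {
  ops.requestRunnerStop();
  let outcome: StopOutcome = "graceful";
  if (!ops.waitForExit(graceMs)) {
    ops.killProcessGroup();
    outcome = "group_kill";
    if (ops.isDaemonAlive()) {
      ops.killDaemonPid();
      outcome = "pid_kill";
      if (ops.isDaemonAlive()) {
        // Assumption: surface a loud error instead of pretending the stop worked.
        throw new Error("stop_daemon: daemon is still alive after fallback kills");
      }
    }
  }
  ops.removeReminder();
  return outcome;
}
```

A real implementation would wait asynchronously and allow a short delay between a kill and the liveness check. The order of escalation is what the sketch is meant to show.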
143
+
144
+ The runner-driven direct pid signal is the first-class stop mechanism.
145
+
146
+ ---
147
+
148
+ ## IPC Model
149
+
150
+ ### Transport
151
+
152
+ - Linux: prefer `${XDG_RUNTIME_DIR}`; fall back to a writable temp directory if needed; the endpoint name includes the daemon pid
153
+ - macOS: use a socket under `${TMPDIR}`; the exact path is persisted in reminder metadata, so `TMPDIR` drift across restarts does not matter
154
+ - Windows: use a named pipe with a stable daemon-oriented name
155
+
156
+ The main rule is simple:
157
+
158
+ - the exact endpoint path/name must be stored in reminder metadata
159
+ - recovery should reconnect to that exact endpoint instead of trying to reconstruct it heuristically
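One way to picture the transport rules is a pure function that picks the endpoint once at launch. This is only a sketch: the `dominds-daemon-<pid>` naming scheme, the function name, and the explicit `fallbackTmpDir` parameter are assumptions for illustration, and the Linux branch only checks whether `XDG_RUNTIME_DIR` is set, not whether it is writable.

```typescript
type RunnerPlatform = "linux" | "darwin" | "win32";

interface EndpointEnv {
  XDG_RUNTIME_DIR?: string;
  TMPDIR?: string;
}

// Joins a directory and a file name with exactly one separator.
function joinUnixPath(dir: string, file: string): string {
  return dir.replace(/\/+$/, "") + "/" + file;
}

// Hypothetical naming scheme; the document only requires that the name
// includes the daemon pid (Linux) or is daemon-oriented (Windows).
function chooseRunnerEndpoint(
  platform: RunnerPlatform,
  env: EndpointEnv,
  daemonPid: number,
  fallbackTmpDir: string,
): string {
  const name = "dominds-daemon-" + daemonPid;
  if (platform === "win32") {
    return "\\\\.\\pipe\\" + name; // Windows named-pipe namespace
  }
  if (platform === "linux") {
    return joinUnixPath(env.XDG_RUNTIME_DIR || fallbackTmpDir, name + ".sock");
  }
  return joinUnixPath(env.TMPDIR || fallbackTmpDir, name + ".sock");
}
```

The returned string would be written to `runnerEndpoint` once and reused verbatim on recovery. The function is not called again after a restart, which is what makes `TMPDIR` drift harmless.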
160
+
161
+ Defaulting to a path directly under `/run` is discouraged, because ordinary unprivileged processes usually cannot create sockets there.
162
+
163
+ ### Protocol style
164
+
165
+ v1 only needs a local request-response protocol. No subscriptions, no streaming channel, no long-lived session state.
166
+
167
+ The runner should support at least:
168
+
169
+ - `ping`
170
+ - `get_status`
171
+ - `get_output`
172
+ - `stop`
173
+
174
+ Responses should always include enough identity data for validation:
175
+
176
+ - `daemonPid`
177
+ - `runnerPid`
178
+ - `startTime`
179
+ - `daemonCommandLine`
180
+
181
+ The point is not just to prove “something is listening on that endpoint”. The point is to prove that the listener is still the runner for the original daemon instance.
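The request set and the identity check might look like the following. This is a hedged sketch: the `op` discriminator, the string type for `startTime`, and the helper name are assumptions. Only the four request names and the four identity fields come from this document. The comparison covers the three fields that the recovery procedure matches on.

```typescript
// Assumed wire shape: one tagged object per request.
type RunnerRequest =
  | { op: "ping" }
  | { op: "get_status" }
  | { op: "get_output"; stdout: boolean; stderr: boolean }
  | { op: "stop" };

// Identity fields echoed in every response.
interface RunnerIdentity {
  daemonPid: number;
  runnerPid: number;
  startTime: string; // assumption: a serialized timestamp
  daemonCommandLine: string;
}

// Returns the names of identity fields that differ from the reminder.
// An empty array means the listener is the runner for the original daemon.
function identityMismatches(expected: RunnerIdentity, actual: RunnerIdentity): string[] {
  const fields: Array<keyof RunnerIdentity> = ["daemonPid", "daemonCommandLine", "startTime"];
  return fields.filter((field) => expected[field] !== actual[field]);
}
```

Returning the mismatching field names, instead of a bare boolean, keeps the failure diagnosable when a stale endpoint is answered by a different runner.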
182
+
183
+ ### `get_output`
184
+
185
+ Its request shape should mirror the tool contract:
186
+
187
+ - `stdout: boolean`
188
+ - `stderr: boolean`
189
+
190
+ Its response should keep the streams separate:
191
+
192
+ - `stdout.content`
193
+ - `stdout.linesScrolledOut`
194
+ - `stderr.content`
195
+ - `stderr.linesScrolledOut`
196
+
197
+ Do not collapse both streams into one merged blob. The main process still needs stream-aware rendering and diagnostics.
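As an illustration of why the streams stay separate, the main process could render the response roughly as follows. The interface names, the heading text, and the wording of the scroll notice are hypothetical. Only the `content` and `linesScrolledOut` fields and the stdout-before-stderr order come from this document.

```typescript
interface StreamOutput {
  content: string;
  linesScrolledOut: number;
}

// A stream that was not requested is simply absent from the response.
interface GetOutputResponse {
  stdout?: StreamOutput;
  stderr?: StreamOutput;
}

function renderDaemonOutput(resp: GetOutputResponse): string {
  const order: Array<"stdout" | "stderr"> = ["stdout", "stderr"]; // fixed order
  const sections: string[] = [];
  for (const name of order) {
    const stream = resp[name];
    if (stream === undefined) {
      continue; // unrequested streams are omitted entirely
    }
    let section = "=== " + name + " ===\n";
    if (stream.linesScrolledOut > 0) {
      section += "(" + stream.linesScrolledOut + " earlier lines scrolled out)\n";
    }
    section += stream.content;
    sections.push(section);
  }
  return sections.join("\n\n");
}
```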
198
+
199
+ ---
200
+
201
+ ## Reminder Metadata Contract
202
+
203
+ Daemon reminders should store at least:
204
+
205
+ - `kind: "daemon"`
206
+ - `daemonPid`
207
+ - `runnerPid`
208
+ - `runnerEndpoint`
209
+ - `initialCommandLine`
210
+ - `daemonCommandLine`
211
+ - `shell`
212
+ - `startTime`
213
+ - `processGroupId`
214
+ - `originDialogId`
215
+ - `completed?`
216
+ - `lastUpdated?`
217
+
218
+ Why each field matters:
219
+
220
+ - `daemonPid` is the tool-facing identity users care about
221
+ - `runnerEndpoint` is the primary reconnect target
222
+ - `runnerPid` and `processGroupId` help with stop and stale cleanup
223
+ - `daemonCommandLine + startTime` protect against pid reuse
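The field list above could be written down as a type plus a runtime guard. The field names are taken from this document, but every field type here is an assumption, as is the guard's name. A reminder written by the previous implementation would lack `runnerEndpoint` and therefore fail the guard, which fits the rule that legacy reminders are treated as non-recoverable.

```typescript
// Field names from the contract; the types are assumed for illustration.
interface DaemonReminderMeta {
  kind: "daemon";
  daemonPid: number;
  runnerPid: number;
  runnerEndpoint: string;
  initialCommandLine: string;
  daemonCommandLine: string;
  shell: string;
  startTime: string;
  processGroupId: number;
  originDialogId: string;
  completed?: boolean;
  lastUpdated?: string;
}

// Hypothetical guard: accepts only reminders that carry every required field.
function isRunnerAwareReminder(value: unknown): value is DaemonReminderMeta {
  if (typeof value !== "object" || value === null) {
    return false;
  }
  const meta = value as { [key: string]: unknown };
  const stringFields = [
    "runnerEndpoint",
    "initialCommandLine",
    "daemonCommandLine",
    "shell",
    "startTime",
    "originDialogId",
  ];
  const numberFields = ["daemonPid", "runnerPid", "processGroupId"];
  return (
    meta["kind"] === "daemon" &&
    stringFields.every((field) => typeof meta[field] === "string") &&
    numberFields.every((field) => typeof meta[field] === "number")
  );
}
```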
224
+
225
+ This design intentionally does **not** add an `authToken`. If recovery must survive a main-process restart, any such token would also need to survive in recoverable state. Under that constraint, it adds complexity without much real local-security value. The meaningful protection boundary here is local reachability plus restrictive endpoint permissions.
226
+
227
+ ---
228
+
229
+ ## Stale Detection and Cleanup
230
+
231
+ When Dominds touches a daemon reminder, recovery should work like this:
232
+
233
+ 1. read `runnerEndpoint` from reminder metadata
234
+ 2. try to connect and issue `ping` or `get_status`
235
+ 3. if the runner reports matching `daemonPid`, `daemonCommandLine`, and `startTime`, treat it as healthy
236
+ 4. if the endpoint is unreachable, inspect the OS process that currently holds `daemonPid`
237
+ 5. if that process no longer exists, drop the reminder
238
+ 6. if that process still exists and still matches the recorded command line and start time, treat it as a **stale daemon**
239
+ 7. kill the stale daemon and drop the reminder
240
+ 8. if the pid now belongs to an unrelated process, do not kill it; just invalidate the reminder
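The recovery steps above reduce to a small decision function once the probing is separated from the decision. This is a sketch under assumptions: all type names and verdict labels are hypothetical, and the "reachable but identity differs" branch is not one of the numbered steps. It is added here as an explicit error state, in line with the loud-failure rule.

```typescript
interface DaemonIdentity {
  daemonPid: number;
  daemonCommandLine: string;
  startTime: string; // assumption: a serialized timestamp
}

// Result of steps 1-2: either no connection, or the identity the runner reported.
type RunnerProbe = { reachable: false } | ({ reachable: true } & DaemonIdentity);

// What the OS reports for the recorded pid, or null if no such process exists.
type OsProcessInfo = { commandLine: string; startTime: string } | null;

type ReminderVerdict =
  | "healthy" // step 3
  | "identity_mismatch" // assumption: surfaced as an explicit error
  | "drop_reminder" // step 5
  | "kill_stale_then_drop" // steps 6-7
  | "invalidate_without_kill"; // step 8

function classifyReminder(
  meta: DaemonIdentity,
  probe: RunnerProbe,
  osProcess: OsProcessInfo,
): ReminderVerdict {
  if (probe.reachable) {
    const same =
      probe.daemonPid === meta.daemonPid &&
      probe.daemonCommandLine === meta.daemonCommandLine &&
      probe.startTime === meta.startTime;
    return same ? "healthy" : "identity_mismatch";
  }
  if (osProcess === null) {
    return "drop_reminder";
  }
  const stillOurs =
    osProcess.commandLine === meta.daemonCommandLine &&
    osProcess.startTime === meta.startTime;
  return stillOurs ? "kill_stale_then_drop" : "invalidate_without_kill";
}
```

Keeping the decision pure means the pid-reuse case can be unit-tested without spawning or killing any process.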
241
+
242
+ The key rule is:
243
+
244
+ - a daemon is only truly recoverable if its runner is still reachable
245
+
246
+ If the daemon process is still alive but the runner is gone, the scrollback owner is already gone too. That is a stale instance, not a healthy one.
247
+
248
+ ---
249
+
250
+ ## Scrollback Semantics
251
+
252
+ The runner owns two independent rolling buffers:
253
+
254
+ - `stdout`
255
+ - `stderr`
256
+
257
+ The retention policy can stay line-based:
258
+
259
+ - each stream tracks its own `linesScrolledOut`
260
+ - `get_daemon_output` reports the two streams separately
261
+ - daemon reminder snapshots render the two streams separately too
262
+
263
+ As long as the runner is alive, a restarted Dominds main process can reconnect and read the same scrollback state. That is the entire point of the redesign.
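A line-based rolling buffer of the kind described here could look like the sketch below. The class name, the `maxLines` parameter, and the handling of a trailing partial line are assumptions. Only the per-stream `linesScrolledOut` counter comes from this document.

```typescript
class LineScrollback {
  private lines: string[] = [];
  private partial = ""; // text received after the last newline
  private maxLines: number;
  linesScrolledOut = 0;

  constructor(maxLines: number) {
    this.maxLines = maxLines;
  }

  // Append a raw chunk from the pipe; chunks may split lines arbitrarily.
  append(chunk: string): void {
    const parts = (this.partial + chunk).split("\n");
    this.partial = parts.pop() ?? "";
    for (const line of parts) {
      this.lines.push(line);
      if (this.lines.length > this.maxLines) {
        this.lines.shift(); // drop the oldest complete line
        this.linesScrolledOut += 1;
      }
    }
  }

  // Snapshot in the shape used by the get_output response for one stream.
  snapshot(): { content: string; linesScrolledOut: number } {
    const visible = this.partial === "" ? this.lines : this.lines.concat([this.partial]);
    return { content: visible.join("\n"), linesScrolledOut: this.linesScrolledOut };
  }
}
```

The runner would hold two independent instances, one for `stdout` and one for `stderr`, so that a noisy stream never evicts lines from the other.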
264
+
265
+ ---
266
+
267
+ ## stdin Policy
268
+
269
+ For this round, shell commands under the runner are explicitly **non-interactive**:
270
+
271
+ - `stdin` is always configured as `ignore`
272
+ - commands do not inherit the Dominds main-process terminal
273
+ - there is no stdin forwarding API
274
+ - the system does not pretend to support partial interactivity
275
+
276
+ This is cleaner than the old quasi-interactive setup:
277
+
278
+ - commands that expect stdin will see EOF immediately or fail according to their own logic
279
+ - commands do not hang forever waiting for invisible input
280
+ - the product contract becomes honest: interactive shells are not supported yet
281
+
282
+ If interactive command sessions are needed later, that should be a separate design with:
283
+
284
+ - a PTY-backed runtime
285
+ - explicit write-to-stdin APIs
286
+ - a clear split between ordinary shell commands and interactive terminal sessions
287
+
288
+ ---
289
+
290
+ ## Failure Semantics
291
+
292
+ The redesign must stay loud by default. It must not silently degrade into “no output”.
293
+
294
+ Examples:
295
+
296
+ - runner endpoint unreachable
297
+ - runner endpoint reachable but identity does not match the reminder
298
+ - daemon pid still alive but runner is gone
299
+ - `stdout=false` and `stderr=false`
300
+ - stop requested but the daemon refuses to die
301
+
302
+ Those cases should surface as explicit errors or explicit stale/unrecoverable states. In particular:
303
+
304
+ - “runner gone, daemon still alive” must not be rendered as an empty output buffer
305
+ - pid reuse must be detected explicitly so that Dominds never mistakes an unrelated process for the original daemon
306
+
307
+ ---
308
+
309
+ ## Replacement Scope
310
+
311
+ This redesign should replace the daemon path end-to-end:
312
+
313
+ - `shell_cmd` daemon execution becomes runner-owned
314
+ - `get_daemon_output` moves to the dual-boolean contract
315
+ - `stop_daemon` becomes runner-aware
316
+ - daemon reminders become runner-aware
317
+ - the main-process daemon scrollback owner logic is removed
318
+
319
+ The following are explicitly out of scope:
320
+
321
+ - a compatibility layer for the old `stream` parameter
322
+ - long-term compatibility for old daemon reminder metadata
323
+ - dual-write or dual-read between main-process buffers and runner buffers
324
+
325
+ Old daemon reminders from the previous implementation should be treated as non-recoverable legacy state and cleaned up on first contact instead of being kept alive through a half-compatible path.
326
+
327
+ ---
328
+
329
+ ## Summary
330
+
331
+ This redesign is not about making `get_daemon_output` slightly smarter. It is about moving daemon ownership to the only place where restart-safe scrollback can actually exist.
332
+
333
+ - **execution owner:** runner
334
+ - **scrollback owner:** runner
335
+ - **stop owner:** runner first, main process as fallback
336
+ - **recovery owner:** main process reconnecting through reminder metadata
337
+
338
+ Once those ownership boundaries are corrected, daemon log recovery after a Dominds main-process restart becomes a real capability instead of a best-effort illusion.
@@ -0,0 +1,339 @@
1
+ # daemon runner 与可恢复 scrollback 设计
2
+
3
+ 英文版:[English](./daemon-cmd-runner.md)
4
+
5
+ 本文定义 `os` 工具中 daemon 机制的新设计,用于解决一个明确缺陷:Dominds 主进程异常退出并重启后,虽然还能凭 reminder 重新识别出先前的 daemon,但旧实现里的 stdout/stderr scrollback buffer 已经随主进程内存一起丢失,`get_daemon_output` 因而失效。
6
+
7
+ 这次设计不做兼容层,也不保留双轨实现。目标不是“在现有内存态 tracking 上补丁式修复”,而是把 daemon 托管、scrollback 持有、日志读取、停止控制这一整条链路改成一套可以跨主进程重连恢复的机制。
8
+
9
+ 本文是设计文档,不讨论具体代码拆分与实现排期。
10
+
11
+ ---
12
+
13
+ ## 目标
14
+
15
+ - 让 daemon 的 stdout/stderr scrollback 在 Dominds 主进程崩溃重启后仍可读取。
16
+ - 把 daemon 的真实 owner 从 Dominds 主进程迁移到独立 `cmd_runner` 进程。
17
+ - 统一短命令与长命令执行链路,避免“先由主进程跑,超时后再升级成 daemon”这种中途切换。
18
+ - 简化 `get_daemon_output` 工具契约,让一次调用即可同时看到 `stdout` 与 `stderr`。
19
+ - 明确 interactive stdin 当前不支持,避免命令误以为可从 Dominds 主进程终端读取输入。
20
+
21
+ ## 非目标
22
+
23
+ - 本轮不引入 PTY。
24
+ - 本轮不提供“向 daemon stdin 喂数据”的工具 API。
25
+ - 本轮不设计跨机器、跨用户、跨宿主机的远程守护协议。
26
+ - 本轮不保留旧 reminder meta 与旧 daemon tracking 的兼容恢复路径。
27
+
28
+ ---
29
+
30
+ ## 核心结论
31
+
32
+ ### 1. 采用 `1 daemon : 1 runner`
33
+
34
+ 每一个通过 `shell_cmd` 进入 daemon 托管语义的命令,都由一个独立的 `cmd_runner` 进程负责:
35
+
36
+ - 创建目标 shell / command 子进程
37
+ - 持有该子进程的 `stdout` / `stderr` pipe
38
+ - 维护 scrollback buffer
39
+ - 对外提供状态查询、日志读取、停止请求
40
+
41
+ runner 与其托管的 daemon 绑定生死:
42
+
43
+ - daemon 正常退出,runner 随之退出
44
+ - daemon 被 `stop_daemon` 停止后,runner 也应退出
45
+ - runner 不做全局常驻服务,不做多 daemon 复用
46
+
47
+ 这样做的意义是把“真正拥有日志流与滚动缓冲区的主体”从 Dominds 主进程中剥离出去。主进程即便崩溃,runner 只要还活着,日志查询能力就还在。
48
+
49
+ ### 2. 所有 `shell_cmd` 一律从一开始就经由 runner 执行
50
+
51
+ 不再允许以下旧路径:
52
+
53
+ - Dominds 主进程自己 `spawn`
54
+ - 主进程自己监听 `stdout` / `stderr`
55
+ - 运行超时后才把它“视作 daemon”
56
+
57
+ 新设计下,不论命令最终是短命还是长命,都先由 runner 执行:
58
+
59
+ - 若命令在超时前结束,runner 汇总输出后直接返回结果并退出
60
+ - 若命令超时,则返回“已作为 daemon 启动”,同时写入 reminder meta,runner 继续托管该命令
61
+
62
+ 这样才能保证 pipe ownership、scrollback owner、stop owner 始终一致。
63
+
64
+ ### 3. Unix 下默认让 runner 成为 process group leader
65
+
66
+ runner 自己作为进程组 leader,daemon 默认继承该 pgid。这样 `stop_daemon` 的兜底清理可以直接面向整个进程组。
67
+
68
+ 这不是说“优雅停止靠杀进程组”,而是:
69
+
70
+ - **优雅停止主路径:** runner 直接向 daemon pid 发信号
71
+ - **兜底清理:** 若 daemon 未在预期时间内退出,Dominds 主进程再面向整个 pg 杀一轮
72
+ - **逃逸容忍:** 如果 daemon 主动改了 pgid,允许它作为逃逸手段存在;此时兜底还应再补一轮直接面向 daemon pid 的 kill
73
+
74
+ ---
75
+
76
+ ## 总体结构
77
+
78
+ ### 运行角色
79
+
80
+ 有三个角色:
81
+
82
+ - **Dominds 主进程**
83
+ - 处理工具调用
84
+ - 维护 reminder
85
+ - 在重启后依据 reminder 恢复对 runner 的连接
86
+ - **cmd_runner**
87
+ - 真正执行 shell 命令
88
+ - 维护 scrollback buffer
89
+ - 提供本地 IPC 服务
90
+ - **daemon command**
91
+ - 真正业务命令
92
+ - 被 runner 托管
93
+
94
+ ### 所有权边界
95
+
96
+ - `stdout` / `stderr` buffer 的 single source of truth 在 runner
97
+ - Dominds 主进程不再保存 daemon 运行期日志缓冲的权威副本
98
+ - reminder 只保存恢复连接和 stale 判定所需的元信息
99
+
100
+ ---
101
+
102
+ ## 工具契约调整
103
+
104
+ ## `get_daemon_output`
105
+
106
+ ### 旧契约
107
+
108
+ - 参数:`pid`,可选 `stream: "stdout" | "stderr"`
109
+ - 一次只能看一个流
110
+
111
+ ### 新契约
112
+
113
+ - 参数:
114
+ - `pid: number`
115
+ - `stdout?: boolean`
116
+ - `stderr?: boolean`
117
+
118
+ ### 语义
119
+
120
+ - 两个参数都省略时,默认 `stdout=true` 且 `stderr=true`
121
+ - 显式传 `stdout=false, stderr=false` 时,直接报错
122
+ - 返回顺序固定为 `stdout` 在前、`stderr` 在后
123
+ - 每个流各自带自己的标题、内容、scroll notice
124
+ - 未请求的流不展示
125
+
126
+ ### 取舍理由
127
+
128
+ - daemon 调试场景里,绝大多数时候需要同时看两个流
129
+ - 双 bool 比单选枚举更适合表达“看其中一个 / 看两个 / 明确排除某一个”
130
+ - 不保留旧 `stream` 参数兼容层,避免工具契约双轨
131
+
132
+ ## `stop_daemon`
133
+
134
+ `stop_daemon` 的职责不变,但内部控制链路改为:
135
+
136
+ 1. 主进程连接 runner
137
+ 2. runner 直接对 daemon pid 发优雅停止信号
138
+ 3. 等待短暂宽限期
139
+ 4. 若未退出,主进程对整个 pg 做兜底 kill
140
+ 5. 若需要,再补一轮直接对 daemon pid kill
141
+ 6. 清理 reminder 与本地 tracking 状态
142
+
143
+ 这里“对 daemon pid 发信号”是主路径,不是后备手段。
144
+
145
+ ---
146
+
147
+ ## IPC 设计
148
+
149
+ ### 传输介质
150
+
151
+ - Linux:优先 `${XDG_RUNTIME_DIR}`;若不可用,再退化到可写临时目录;endpoint 名称中包含 daemon pid
152
+ - macOS:使用 `${TMPDIR}` 下的 socket 路径;准确路径直接写入 reminder meta,不依赖重启后重新推导
153
+ - Windows:使用全局 named pipe 命名
154
+
155
+ 约定重点是:
156
+
157
+ - **endpoint 的精确路径/名称必须写入 reminder meta**
158
+ - 主进程恢复时优先信任 meta 里的 endpoint,而不是靠平台规则重新猜
159
+
160
+ 不建议把 Linux 默认路径写死为 `/run/...` 根目录,因为普通用户进程通常并不具备在那里直接创建 socket 的权限。
161
+
162
+ ### 协议风格
163
+
164
+ v1 采用简单本地请求-响应协议即可,不做长连接订阅。
165
+
166
+ 建议 runner 支持以下请求:
167
+
168
+ - `ping`
169
+ - `get_status`
170
+ - `get_output`
171
+ - `stop`
172
+
173
+ 建议响应中始终带上:
174
+
175
+ - `ok`
176
+ - `daemonPid`
177
+ - `runnerPid`
178
+ - `startTime`
179
+ - `daemonCommandLine`
180
+
181
+ 其中 `ping` / `get_status` 的作用不只是“证明 endpoint 可连”,还要让主进程确认“这个 endpoint 对应的确实是当初那个 daemon”。
182
+
183
+ ### `get_output` 请求
184
+
185
+ 请求体语义建议与工具契约对齐:
186
+
187
+ - `stdout: boolean`
188
+ - `stderr: boolean`
189
+
190
+ 响应体应分别返回:
191
+
192
+ - `stdout.content`
193
+ - `stdout.linesScrolledOut`
194
+ - `stderr.content`
195
+ - `stderr.linesScrolledOut`
196
+
197
+ 不要把两个流合并成单一文本再返回,否则主进程就失去精确展示与错误诊断能力。
198
+
199
+ ---
200
+
201
+ ## reminder meta 契约
202
+
203
+ daemon reminder 至少应保存以下字段:
204
+
205
+ - `kind: "daemon"`
206
+ - `daemonPid`
207
+ - `runnerPid`
208
+ - `runnerEndpoint`
209
+ - `initialCommandLine`
210
+ - `daemonCommandLine`
211
+ - `shell`
212
+ - `startTime`
213
+ - `processGroupId`
214
+ - `originDialogId`
215
+ - `completed?`
216
+ - `lastUpdated?`
217
+
218
+ 其中:
219
+
220
+ - `daemonPid` 是工具层面和用户认知里的主标识
221
+ - `runnerEndpoint` 是重连 runner 的一手信息
222
+ - `runnerPid` 与 `processGroupId` 用于 stop 与 stale 清理辅助
223
+ - `daemonCommandLine + startTime` 用于抵御 pid reuse
224
+
225
+ 本设计不引入 `authToken`。恢复能力既然必须跨主进程存在,token 最终也必须落到可恢复状态里;在这种约束下,它对本地同用户攻击面的收益有限,不值得让协议与 meta 额外复杂化。安全边界主要依赖本机可达性与 endpoint 文件权限。
226
+
227
+ ---
228
+
229
+ ## stale 判定与清理
230
+
231
+ 主进程在使用 daemon reminder 时,按以下顺序恢复:
232
+
233
+ 1. 读取 reminder meta 中的 `runnerEndpoint`
234
+ 2. 尝试连接 runner 并发出 `ping` / `get_status`
235
+ 3. 若响应中的 `daemonPid`、`daemonCommandLine`、`startTime` 与 reminder 匹配,则视为健康 runner
236
+ 4. 若 endpoint 无法连接,则检查 `daemonPid` 当前对应的 OS 进程
237
+ 5. 若该进程不存在,则视为 daemon 已结束,drop reminder
238
+ 6. 若该进程存在,且命令行与启动时间仍与 reminder 相符,则视为 **stale daemon**
239
+ 7. 对 stale daemon 做清理性 kill,然后 drop reminder
240
+ 8. 若 pid 已被其他无关进程复用,或命令行/启动时间不匹配,则不得误杀,应直接把原 reminder 视为失效并 drop
241
+
242
+ 这里的关键判断是:
243
+
244
+ - **可连接的 runner** 才算真正“可恢复”
245
+ - **仅剩 daemon 进程但没有 runner**,不是“继续沿用”,而是 stale
246
+
247
+ 因为 scrollback buffer owner 是 runner,不是 daemon 本身。只剩 daemon 存活时,旧日志读取能力已经不可恢复。
248
+
249
+ ---
250
+
251
+ ## scrollback 语义
252
+
253
+ runner 维护两个独立滚动缓冲区:
254
+
255
+ - `stdout`
256
+ - `stderr`
257
+
258
+ 缓冲策略沿用“按行滚动保留”的语义即可:
259
+
260
+ - 每个流单独计算 `linesScrolledOut`
261
+ - `get_daemon_output` 返回时分别显示
262
+ - reminder 状态快照也分别展示
263
+
264
+ 只要 runner 还活着,主进程重连后就仍能读到同一份缓冲区;不再存在“主进程重启导致日志历史立即失忆”的问题。
265
+
266
+ ---
267
+
268
+ ## stdin 政策
269
+
270
+ 本轮明确将 daemon/std shell 命令视为**非交互执行**:
271
+
272
+ - runner 启动命令时,`stdin` 一律设为 `ignore`
273
+ - 不继承 Dominds 主进程的终端
274
+ - 不提供输入转发 API
275
+ - 不尝试维持“好像能交互,但其实没有人喂数据”的半交互状态
276
+
277
+ 这样做比旧实现更干净:
278
+
279
+ - 命令若依赖 stdin,会立即看到 EOF 或按其自身逻辑报错
280
+ - 命令不会因为等不到输入而无意义挂起
281
+ - 语义上也更符合当前 `os` 工具能力边界
282
+
283
+ 未来若需要支持交互命令,应另起一轮设计:
284
+
285
+ - 使用 PTY 作为 stdin/stdout/stderr 容器
286
+ - 暴露明确的输入写入 API
287
+ - 在工具层显式区分“普通 shell 命令”和“交互终端会话”
288
+
289
+ ---
290
+
291
+ ## 失败语义
292
+
293
+ 新设计必须 loud by default,不允许静默降级成“无输出”。
294
+
295
+ 典型场景:
296
+
297
+ - runner 连不上
298
+ - reminder 里的 pid 仍活着但 runner 已消失
299
+ - runner 返回的 daemon 身份信息与 reminder 不匹配
300
+ - 请求 `stdout=false, stderr=false`
301
+ - stop 后 daemon 仍拒绝退出
302
+
303
+ 这些情况都应给出明确错误或状态说明,至少在运行时日志与工具输出层面做到可诊断。尤其是:
304
+
305
+ - “runner 不可达但 daemon 还活着”必须暴露为 stale / unrecoverable,而不是伪装成空日志
306
+ - “pid 被复用成别的进程”必须显式识别,不能误把陌生进程当成旧 daemon
307
+
308
+ ---
309
+
310
+ ## 实施边界
311
+
312
+ 本次重构应一次性完成以下替换:
313
+
314
+ - `shell_cmd` 的 daemon 路径改由 runner 托管
315
+ - `get_daemon_output` 改为双 bool 契约
316
+ - `stop_daemon` 改为 runner-aware stop 链路
317
+ - daemon reminder meta 改为 runner-aware 契约
318
+ - 主进程内存态 daemon scrollback owner 逻辑整体删除
319
+
320
+ 不引入:
321
+
322
+ - 老 `stream` 参数兼容层
323
+ - 老 reminder meta 的长期兼容恢复
324
+ - “主进程内存 buffer + runner buffer” 双写双读
325
+
326
+ 旧实现下已经存在的 daemon reminder,在新实现落地后的处理原则应是:**不尝试恢复其历史 scrollback;在首次接触时按旧契约不可恢复对象处理并尽快清理。**
327
+
328
+ ---
329
+
330
+ ## 设计摘要
331
+
332
+ 这次改造的本质,不是给 `get_daemon_output` 补一个“崩溃后再猜一猜日志在哪”的修补逻辑,而是重新定义 daemon 机制的 owner:
333
+
334
+ - **命令执行 owner**:runner
335
+ - **scrollback owner**:runner
336
+ - **停止控制 owner**:runner 为主,主进程兜底
337
+ - **状态恢复 owner**:主进程凭 reminder 重连 runner
338
+
339
+ 只有把 owner 边界彻底搬对,主进程崩溃重启后的日志恢复能力才会真正成立。