openclaw-diag-cli 0.1.0 → 0.1.2
- package/README.md +90 -200
- package/bin/openclaw-diag.js +67 -44
- package/ocdiag/__init__.py +1 -1
- package/ocdiag/dispatcher.py +120 -69
- package/package.json +1 -1
package/README.md
CHANGED
@@ -1,259 +1,149 @@
-#
+# OpenClaw Diagnostic Toolbox
 
-> OpenClaw / ArkClaw
+> A read-only CLI for troubleshooting OpenClaw / ArkClaw failures. One set of diagnostics, one entry point, zero dependencies.
 
-##
-
-No git clone needed; pull a cached copy through npm (usable offline afterwards):
+## Installation
 
 ```bash
-#
-npx openclaw-diag-cli
-npx openclaw-diag-cli run gateway
-npx openclaw-diag-cli run all --json | jq -s '.'
+# One-off run (no install needed; works offline once npm has cached it)
+npx openclaw-diag-cli
 
-# Install onto PATH
+# Install onto PATH
 npm install -g openclaw-diag-cli
-openclaw-diag
-openclaw-diag doctor # check that the environment is ready
-openclaw-diag bundle gateway > gw.py # generate a single-file diagnostic script
+openclaw-diag
 ```
 
-Dependencies: Node 18
-`python3` and passes the arguments through to the existing dispatcher, so `python3 diag/04_gateway.py`
-and `python3 bin/ocdiag run gateway` still work exactly as before.
-
-## Why This Exists
-
-The real pain points when troubleshooting OpenClaw failures:
-
-- **Data is scattered across many corners**: session.jsonl lives under agents/, configuration in openclaw.json, process behavior in journalctl, cron state in cron/jobs.json, and model latency is buried in the trajectory... Hand-typing jq + grep combinations is slow and easy to get wrong.
-- **`openclaw-diag.sh` has grown into a 4391-line bash monolith** stuffed with 10 heredoc-embedded Python snippets: hard to modify, hard to unit-test, hard to reuse.
-- **Diagnostic scripts should be "atomic operations"**: every piece of data has a clear source, every module solves one class of problem, and each can run alone, be composed in pipelines, or be driven by automation.
-
-This repository takes that 4391-line bash script apart and rewrites it: each collection step becomes its own Python script, designed from a set of axioms, so that "collect → analyze → report" becomes engineering you can reason about rather than manual work.
-
----
-
-## Design Axioms (First Principles)
-
-The 6 rules below are **non-negotiable** hard constraints. Every directory layout, API, and output format is derived from them.
-
-### 1. Read-Only
-Diagnostic scripts must **never** modify files, write configuration, or restart services. The cost: even hard-to-get data must be obtained by reading; mutating `openclaw <subcmd>` entry points are off limits.
-**Benefit**: safe to run in production, during a live incident, or on a customer's machine.
-
-### 2. Zero Runtime Dependencies
-**Python 3.8+ standard library only.** No `requirements.txt`, no `pip install`. The one exception: `croniter` is an optional import in `06_cron_jobs.py` (when missing, it falls back to inferring the interval from historical runs).
-**Benefit**: any node that can run OpenClaw can run the diagnostics (OpenClaw itself depends on Node.js, but the diagnostic scripts depend on no Python-side package installed by OpenClaw). After `git clone`, run `python3 diag/04_gateway.py` directly.
+Requirements: Node 18+ and Python 3.8+.
 
-
-**Every diagnostic script must run on its own**: no dependence on the dispatcher, no env file to source, no other script to run first.
-**Corollary**: each script starts with `sys.path.insert(0, ...)` to add the repository root before importing the shared library; installing a package is not required.
+## Five-Minute Quick Start
 
-
-
--
-- `bin/ocdiag run all --json` emits **NDJSON** (one module's JSON per line), which you can aggregate with `... | jq -s '.'` or filter with `... | jq 'select(.module=="cron_jobs") | .data'`.
-**Corollary**: progress messages go to stderr and never pollute the JSON stream on stdout.
-
-### 5. Data Fidelity
-Every number and every status a script outputs must be traceable to its source:
-- System data → `subprocess.run(["free","-m"])`, `/proc/<pid>/environ`, `journalctl ...`
-- OpenClaw data → `~/.openclaw/openclaw.json`, `~/.openclaw/cron/jobs.json`, `~/.openclaw/agents/*/sessions/*.jsonl`
-- Log data → `/tmp/openclaw/openclaw-*.log` (today's files, selected by mtime)
-
-Data sources are listed module by module in the documentation; "looks plausible" is not acceptable. For the same field, the text output and the JSON output must carry **the same value**.
+```bash
+# 1. See what is available
+openclaw-diag
 
-
--
-- A single missing data source (no config, no log generated, session file deleted) must **not** raise an exception; report "not found" explicitly.
-- Do not swallow exceptions silently: a failure must leave a traceback on stderr and a non-zero rc.
+# 2. Check that the environment is ready
+openclaw-diag doctor
 
-
+# 3. Run a single diagnostic
+openclaw-diag gateway
 
-
+# 4. Run every state collector (one crashing does not affect the others)
+openclaw-diag all
 
-
-openclaw-diag
-├── ocdiag/                  shared primitives (corollary of axiom #2: a small, stable library)
-│   ├── paths.py             path constants + environment-variable overrides
-│   ├── jsonlog.py           OpenClaw JSON log parsing (axiom #5)
-│   ├── timeutil.py          ISO/epoch time conversion + human-friendly formatting
-│   ├── tokens.py            fmt_tokens / percentile / human_size
-│   ├── sensitive.py         secret/token redaction (extension of axiom #1: output must be safe too)
-│   ├── output.py            dual-mode output (human-readable + JSON), implements axiom #4
-│   ├── recent_logs.py       finds logs updated today
-│   ├── cli.py               shared argparse (--config / --log-dir / --json)
-│   └── dispatcher.py        entry point reused by bin/ocdiag
-│
-├── diag/                    diagnostic modules (axiom #3: each runs standalone)
-│   ├── 01_sys_health.py     system health (DNS/network/CPU/memory/disk/IO/processes/time sync)
-│   ├── 02_environment.py    OpenClaw base environment (version consistency, Gateway process env)
-│   ├── 03_configuration.py  flattened openclaw.json (after redaction)
-│   ├── 04_gateway.py        Gateway status (WS lifecycle + unified error-code view)
-│   ├── 05_recent_errors.py  recent errors (multi-log aggregation + journalctl + tool errors)
-│   ├── 06_cron_jobs.py      cron jobs (merges three sources: jobs.json + state + runs/)
-│   ├── 07_performance.py    model/tool performance (Top 20 slow calls / E2E latency / cache)
-│   ├── 08_sessions.py       session data (six-dimension analysis + stuck detection)
-│   ├── 09_plugin_diag.py    plugin diagnostics (consistency + ERROR/WARN + Hook + Channel + DNS)
-│   └── 10_shell_history.py  shell history (high-risk commands + openclaw commands)
-│
-├── tools/                   deep-dive tools (not collection; analysis of a specific object)
-│   ├── oc_session_trace.py    traces the full timeline of one user message from arrival to response
-│   └── oc_session_extract.py  exports a session jsonl to a readable format (covering all reset/bak/deleted states)
-│
-└── bin/
-    └── ocdiag               optional top-level entry point (list / run <id> / run all)
+# 5. Emit structured JSON
+openclaw-diag gateway --json
 ```
 
-
+## Diagnostics
 
-
+Diagnostics fall into two groups, depending on whether they take arguments.
 
-
+### State collectors (no arguments; scan the system's current state)
 
-|
+| Diagnostic | What it examines |
 |---|---|
-|
-|
-|
-|
-|
-|
-|
-|
-|
-|
-
-
-
-
-
-
+| `sys_health` | DNS / network / CPU / memory / disk / IO / processes / time sync |
+| `environment` | OpenClaw version consistency, Gateway process environment variables |
+| `configuration` | Flattened `openclaw.json` (sensitive fields redacted) |
+| `gateway` | Gateway process, port, starts/stops over 24h, WS lifecycle, error codes |
+| `recent_errors` | Aggregated errors from application logs / journalctl / session tool calls |
+| `cron_jobs` | Cron job status, consecutive failures, schedule drift, silence detection |
+| `performance` | Model/tool latency P50/P95, Top 20 slow calls, E2E latency, cache hit rate |
+| `sessions` | Session overview, activity, stuck detection |
+| `plugin_diag` | Plugin state consistency, ERROR/WARN, hook failures, channels, DNS for external dependencies |
+| `shell_history` | High-risk commands, openclaw commands, recent operations |
+
+### Object inspectors (require a session uuid; dig into one specific object)
+
+| Diagnostic | What it examines |
+|---|---|
+| `trace <uuid>` | Traces the full timeline of one user message from arrival to response |
+| `extract <uuid>` | Exports session.jsonl to a readable format (all reset / bak / deleted states) |
 
-###
+### Meta
 
-
-
-
-
-
-
+| Command | Purpose |
+|---|---|
+| `openclaw-diag all` | Run all state collectors |
+| `openclaw-diag list` | List all diagnostics |
+| `openclaw-diag doctor` | Check the Node / Python / ocdiag / OpenClaw environment |
+| `openclaw-diag bundle <id>` | Bundle into a self-contained single-file .py |
 
-
+## Common Recipes
 
 ```bash
-
-
-python3 bin/ocdiag run all # run everything (one module crashing does not affect the others)
-python3 bin/ocdiag run all --skip performance,sessions # skip the heavy modules
-```
+# Find which cron jobs are failing repeatedly
+openclaw-diag cron_jobs --json | jq '.data.jobs[] | select(.status!="ok")'
 
-
+# See which model has the highest P95 latency
+openclaw-diag performance | grep -A1 "P95"
 
-
-
-npx openclaw-diag-cli run gateway --json
-npx openclaw-diag-cli run all --skip performance,sessions
-npx openclaw-diag-cli doctor # check Node/Python/ocdiag/OpenClaw
-npx openclaw-diag-cli bundle 04_gateway > standalone-gateway.py
-```
+# Which plugins logged an ERROR today
+openclaw-diag plugin_diag --json | jq '.data.plugin_errors | to_entries[] | select(.value.error_count > 0)'
 
-
+# Aggregate all diagnostics into a single JSON report
+openclaw-diag all --json 2>/dev/null | jq -s '.' > report.json
 
-
-
-python3 diag/06_cron_jobs.py --json | jq '.data.jobs | length'
+# Find events involving stuck sessions
+openclaw-diag sessions --json | jq '.data.stuck_sessions'
 
-#
-
+# Trace a user message's timeline
+openclaw-diag trace <session-uuid> --msg-index 0
 
-#
-
-
-# 4) Extract the success rate of every cron job
-python3 bin/ocdiag run all --json 2>/dev/null \
-  | jq 'select(.module=="cron_jobs") | .data.jobs[] | {name, success_rate}'
+# Export a session to a readable format
+openclaw-diag extract <session-uuid> --summary
 ```
 
-
+## Offline Machines: Bundle a Single File
 
 ```bash
-#
-
+# On a machine with network access
+openclaw-diag bundle gateway > standalone-gateway.py
 
-#
-
-python3
+# scp to the target machine (only Python 3.8+ is needed; nothing to install)
+scp standalone-gateway.py prod-server:/tmp/
+ssh prod-server "python3 /tmp/standalone-gateway.py --json"
 ```
 
-
+`bundle` merges the script and the shared code it depends on into one self-contained `.py` with zero dependencies.
+
+## Configuration Overrides
 
-
+When diagnosing someone else's machine or a container, no code changes are needed:
 
-|
+| Environment variable | Default | Description |
 |---|---|---|
 | `OPENCLAW_HOME` | `~/.openclaw` | OpenClaw home directory |
 | `OPENCLAW_CONFIG` | `$OPENCLAW_HOME/openclaw.json` | Configuration file |
 | `OPENCLAW_LOG_DIR` | `/tmp/openclaw` | Log directory |
 | `OPENCLAW_SESSIONS` | `$OPENCLAW_HOME/agents` | Session root |
-| `OPENCLAW_SERVICE_FILE` | `~/.config/systemd/user/openclaw-gateway.service` | systemd service unit |
 
-You can also use `--config /path/to/
+You can also override them for a single run with `--config /path/to/file --log-dir /path/to/logs`.
 
-
-
-## Exit Codes and Error Isolation
+## Exit Codes
 
 | rc | Meaning |
 |---|---|
-| 0 |
-| 1 |
-| 2 |
-
-The overall rc of `bin/ocdiag run all` is the maximum; if any module crashes, a traceback is left on stderr, but the stdout stream stays intact.
-
----
-
-## Extending: Adding a New Diagnostic Module
-
-Just follow the axioms:
+| 0 | Diagnostic succeeded |
+| 1 | Diagnostic ran but reported `status: "error"` (missing data source, etc.) |
+| 2 | Diagnostic crashed (isolated; does not affect `all`) |
 
-
-2. A top-of-file docstring stating: **what is collected + data sources + meaning of the output**
-3. `sys.path.insert(0, str(Path(__file__).resolve().parent.parent))` to hook into the shared library
-4. `from ocdiag import cli, output, paths` to get the shared infrastructure
-5. `parser = cli.build_common_parser(...); args = parser.parse_args()`
-6. `out = output.init("my_check", json_mode=args.json, ...)`
-7. Business logic: use `out.item / out.evidence / out.section` for text output and `out.set_data("key", value)` for JSON data
-8. Read JSONL as a stream (`for line in open(...)`), never `.read().split('\n')`
-9. Subprocess calls must pass a `timeout`
-10. A missing data source must be reported explicitly as "not found", not raised as an exception
+## Design Principles
 
-
-
----
-
-## What We Don't Do (Anti-Patterns)
-
-| Not done | Reason |
+| | |
 |---|---|
-|
-|
-|
-|
-|
-| No jq subprocess | Python ships with json, which is easier to control |
-| No Python embedded in bash heredocs | That is exactly the old form we are replacing |
-
----
+| **Read-only** | Never modifies files or restarts services |
+| **Zero dependencies** | Python 3.8+ standard library only |
+| **Fault isolation** | A single diagnostic crashing does not bring down `all` |
+| **Data fidelity** | Every field can be traced to its source |
+| **Composable** | Dual text + JSON output; stderr and stdout kept separate |
 
-
+Detailed design → [docs/DESIGN.md](docs/DESIGN.md) (axiom derivations, directory layout, extension guide)
 
-
+## Feedback
 
-
+- Issues: https://github.com/wujiaming88/openclaw-diag-cli/issues
+- Origin: split out and rewritten from the 4391-line `openclaw-diag.sh`
 
 ## License
 
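The README recipes above aggregate the NDJSON stream from `openclaw-diag all --json` with `jq -s '.'`. Where `jq` is not installed, the same aggregation can be sketched in Python. This is a minimal sketch, not part of the package: the function names are hypothetical, and the `module` / `data` keys are assumed from the jq filter in the removed 0.1.0 README (`select(.module=="cron_jobs") | .data`).

```python
import json
from typing import Any, Dict, Iterable, List, Optional


def aggregate_ndjson(lines: Iterable[str]) -> List[Dict[str, Any]]:
    """Collect NDJSON records (one JSON object per line) into a list."""
    records: List[Dict[str, Any]] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        records.append(json.loads(line))
    return records


def select_module(records: List[Dict[str, Any]], module: str) -> Optional[Any]:
    """Return the `data` payload of the first record for `module`, else None.

    Assumes each record carries `module` and `data` keys (an assumption
    taken from the README's jq examples, not from the package source).
    """
    for rec in records:
        if rec.get("module") == module:
            return rec.get("data")
    return None


if __name__ == "__main__":
    # Hypothetical sample lines standing in for real diagnostic output.
    sample = [
        '{"module": "gateway", "status": "ok", "data": {"port": 1}}',
        "",
        '{"module": "cron_jobs", "status": "ok", "data": {"jobs": []}}',
    ]
    records = aggregate_ndjson(sample)
    print(len(records))
    print(select_module(records, "cron_jobs"))
```

To use it on live output, pass `sys.stdin` to `aggregate_ndjson` and pipe `openclaw-diag all --json 2>/dev/null` into the script; progress messages go to stderr, so stdout should contain only the JSON records.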
package/bin/openclaw-diag.js
CHANGED
@@ -17,6 +17,22 @@ const PYTHON_CANDIDATES = process.platform === 'win32'
   ? ['python3', 'python', 'py']
   : ['python3', 'python'];
 
+// Keep these in sync with ocdiag/dispatcher.py.
+const STATE_COLLECTORS = [
+  'sys_health', 'environment', 'configuration', 'gateway', 'recent_errors',
+  'cron_jobs', 'performance', 'sessions', 'plugin_diag', 'shell_history',
+];
+const OBJECT_INSPECTORS = ['trace', 'extract'];
+const MODULE_IDS = new Set([...STATE_COLLECTORS, ...OBJECT_INSPECTORS]);
+
+const STATE_SCRIPTS = [
+  '01_sys_health.py', '02_environment.py', '03_configuration.py',
+  '04_gateway.py', '05_recent_errors.py', '06_cron_jobs.py',
+  '07_performance.py', '08_sessions.py', '09_plugin_diag.py',
+  '10_shell_history.py',
+];
+const OBJECT_SCRIPTS = ['oc_session_trace.py', 'oc_session_extract.py'];
+
 function findPython() {
   for (const cmd of PYTHON_CANDIDATES) {
     try {
@@ -51,23 +67,26 @@ function printVersion() {
 
 function printHelp() {
   const lines = [
-    'openclaw-diag — OpenClaw / ArkClaw
+    'openclaw-diag — OpenClaw / ArkClaw Diagnostic Toolbox',
     '',
     'Usage:',
-    ' openclaw-diag
-    ' openclaw-diag list
-    ' openclaw-diag
-    ' openclaw-diag
-    ' openclaw-diag
-    ' openclaw-diag bundle <id>
-    ' openclaw-diag doctor [--json]
-    ' openclaw-diag --version
-    ' openclaw-diag --help
+    ' openclaw-diag print the banner + diagnostic catalogue',
+    ' openclaw-diag list list all diagnostics (grouped by type)',
+    ' openclaw-diag <id> [args...] run a single diagnostic',
+    ' openclaw-diag all [--skip a,b] run all state collectors',
+    ' openclaw-diag all [--json] aggregated NDJSON output',
+    ' openclaw-diag bundle <id> bundle into a self-contained single-file .py',
+    ' openclaw-diag doctor [--json] check the Node / Python / ocdiag / OpenClaw env',
+    ' openclaw-diag --version print the version',
+    ' openclaw-diag --help this help',
     '',
-    '
-    '
+    'State collectors (no arguments):',
+    ' ' + STATE_COLLECTORS.join(' '),
     '',
-    '
+    'Object inspectors (require a session uuid):',
+    ' ' + OBJECT_INSPECTORS.join(' '),
+    '',
+    'Passed through to diagnostic scripts: --config --log-dir --json --no-color',
   ];
   console.log(lines.join('\n'));
 }
@@ -90,15 +109,10 @@ function runDispatcher(args) {
   });
 }
 
-function
-  if (args.length === 0) {
-    console.error('Error: bundle requires a module id (e.g. `openclaw-diag bundle gateway`)');
-    process.exit(2);
-  }
+function runScript(scriptPath, args) {
   const py = findPython();
   if (!py) pythonNotFound();
-  const
-  const child = spawn(py.cmd, [bundleScript, ...args], { stdio: 'inherit' });
+  const child = spawn(py.cmd, [scriptPath, ...args], { stdio: 'inherit' });
   child.on('error', (err) => {
     console.error(`Error: failed to spawn ${py.cmd}: ${err.message}`);
     process.exit(1);
@@ -112,14 +126,15 @@ function runBundle(args) {
   });
 }
 
-
+function runBundle(args) {
+  if (args.length === 0) {
+    console.error('Error: bundle requires a module id (e.g. `openclaw-diag bundle gateway`)');
+    process.exit(2);
+  }
+  runScript(path.join(REPO_ROOT, 'lib', 'bundle.py'), args);
+}
 
-
-  '01_sys_health.py', '02_environment.py', '03_configuration.py',
-  '04_gateway.py', '05_recent_errors.py', '06_cron_jobs.py',
-  '07_performance.py', '08_sessions.py', '09_plugin_diag.py',
-  '10_shell_history.py',
-];
+// ── doctor ──
 
 function nodeVersionOk() {
   const m = process.versions.node.match(/^(\d+)\./);
@@ -143,17 +158,20 @@ function checkOcdiagImport(pyCmd) {
 
 function checkDiagScripts(pyCmd) {
   const failed = [];
-
-
-
+  const all = [
+    ...STATE_SCRIPTS.map((n) => ({ name: n, path: path.join(REPO_ROOT, 'diag', n) })),
+    ...OBJECT_SCRIPTS.map((n) => ({ name: n, path: path.join(REPO_ROOT, 'tools', n) })),
+  ];
+  for (const item of all) {
+    const r = spawnSync(pyCmd, [item.path, '--help'], {
       stdio: ['ignore', 'pipe', 'pipe'],
       timeout: 10000,
     });
     if (r.status !== 0) {
-      failed.push({ script: name, status: r.status, stderr: ((r.stderr || '').toString().trim()).slice(0, 200) });
+      failed.push({ script: item.name, status: r.status, stderr: ((r.stderr || '').toString().trim()).slice(0, 200) });
     }
   }
-  return failed;
+  return { failed, total: all.length };
 }
 
 function checkOpenclawConfig() {
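The `checkDiagScripts` hunk above smoke-tests every diagnostic by running it with `--help` under a 10-second timeout and collecting the ones that exit non-zero. The same idea can be sketched in Python (used here only to keep the examples in one language). This is an illustration under stated assumptions, not the package's code: `check_scripts` is a hypothetical name, and a timeout is reported with `status=None`, an assumption chosen to resemble how Node's `spawnSync` reports a `null` status when a child is killed.

```python
import subprocess
import sys
from pathlib import Path
from typing import Dict, List, Sequence


def check_scripts(paths: Sequence[Path], timeout: float = 10.0) -> Dict[str, object]:
    """Run each script with --help and report those that do not exit 0."""
    failed: List[dict] = []
    for path in paths:
        try:
            proc = subprocess.run(
                [sys.executable, str(path), "--help"],
                stdin=subprocess.DEVNULL,
                stdout=subprocess.PIPE,
                stderr=subprocess.PIPE,
                timeout=timeout,  # never let one hung script block the whole check
            )
            status = proc.returncode
            stderr = proc.stderr.decode(errors="replace")
        except subprocess.TimeoutExpired:
            status, stderr = None, "timed out"
        if status != 0:
            failed.append({
                "script": path.name,
                "status": status,
                "stderr": stderr.strip()[:200],  # keep the report short
            })
    # Return the total alongside the failures, as the 0.1.2 hunk does.
    return {"failed": failed, "total": len(paths)}
```

Returning `total` with `failed` lets the caller print "N/total failed" without keeping a second copy of the script list, which appears to be the motivation for the changed return shape in this hunk.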
@@ -190,10 +208,10 @@ function runDoctor(args) {
   const ocdiag = checkOcdiagImport(py.cmd);
   result.ocdiag = ocdiag;
 
-  const failed = checkDiagScripts(py.cmd);
+  const { failed, total } = checkDiagScripts(py.cmd);
   result.diag_scripts = {
     ok: failed.length === 0,
-    total
+    total,
     failed,
   };
 
@@ -214,9 +232,9 @@ function runDoctor(args) {
     }
   }
   if (failed.length === 0) {
-    console.log(`✓ All ${
+    console.log(`✓ All ${total} diagnostics respond to --help`);
   } else {
-    console.log(`✗ ${failed.length}/${
+    console.log(`✗ ${failed.length}/${total} diagnostics failed --help:`);
     for (const f of failed) {
       console.log(` ${f.script} (rc=${f.status})`);
     }
@@ -238,15 +256,20 @@ function main() {
   const argv = process.argv.slice(2);
 
   if (argv.length === 0) {
-
+    const py = findPython();
+    if (!py) pythonNotFound();
+    console.log(`openclaw-diag v${PKG.version} — OpenClaw / ArkClaw Diagnostic Toolbox`);
     console.log('');
-
-
-    console.log(' npx openclaw-diag-cli doctor check that the environment is ready');
-    console.log(' npx openclaw-diag-cli --help show the full help');
+    const dispatcher = path.join(REPO_ROOT, 'bin', 'ocdiag');
+    spawnSync(py.cmd, [dispatcher, 'list'], { stdio: 'inherit' });
     console.log('');
-
-
+    console.log('Common commands:');
+    console.log(' openclaw-diag gateway run a single state collector');
+    console.log(' openclaw-diag all all state collectors');
+    console.log(' openclaw-diag trace <uuid> trace a user message');
+    console.log(' openclaw-diag doctor check the environment');
+    console.log(' openclaw-diag --help full help');
+    process.exit(0);
   }
 
   const head = argv[0];
@@ -268,7 +291,7 @@ function main() {
     return;
   }
 
-  // Pass through everything else
+  // Pass through everything else (flat ids, `all`, `list`, `run` alias, unknown) to dispatcher.
   runDispatcher(argv);
 }
 
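The comment `Keep these in sync with ocdiag/dispatcher.py` in this file marks an invariant that is maintained by hand: the id list and the script list must line up. A small check can make drift visible. The sketch below is hypothetical and not part of the package; it assumes that a state collector's id is its script name without the numeric prefix and the `.py` suffix, which holds for the ten names shown in this diff but not for the object inspectors (`trace` maps to `oc_session_trace.py`).

```python
import re
from typing import List

# Copied from the lists added in this diff.
STATE_COLLECTORS = [
    "sys_health", "environment", "configuration", "gateway", "recent_errors",
    "cron_jobs", "performance", "sessions", "plugin_diag", "shell_history",
]
STATE_SCRIPTS = [
    "01_sys_health.py", "02_environment.py", "03_configuration.py",
    "04_gateway.py", "05_recent_errors.py", "06_cron_jobs.py",
    "07_performance.py", "08_sessions.py", "09_plugin_diag.py",
    "10_shell_history.py",
]


def script_to_id(script_name: str) -> str:
    """Derive an id from a script name, e.g. '04_gateway.py' gives 'gateway'."""
    match = re.fullmatch(r"\d+_(\w+)\.py", script_name)
    if match is None:
        raise ValueError(f"unexpected script name: {script_name!r}")
    return match.group(1)


def find_mismatches(ids: List[str], scripts: List[str]) -> List[str]:
    """Return human-readable descriptions of every id/script disagreement."""
    problems: List[str] = []
    if len(ids) != len(scripts):
        problems.append(f"length differs: {len(ids)} ids vs {len(scripts)} scripts")
    for index, (mid, script) in enumerate(zip(ids, scripts)):
        derived = script_to_id(script)
        if mid != derived:
            problems.append(f"position {index}: id {mid!r} vs script {script!r}")
    return problems


if __name__ == "__main__":
    print(find_mismatches(STATE_COLLECTORS, STATE_SCRIPTS))
```

A check like this could run in CI or inside `doctor`; whether the project wants that is a design decision this diff does not settle.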
package/ocdiag/__init__.py
CHANGED
package/ocdiag/dispatcher.py
CHANGED
@@ -1,8 +1,15 @@
-"""Dispatcher:
+"""Dispatcher: every diagnostic is a top-level subcommand.
+
+Layout:
+    ocdiag <state-collector> runs that collector (e.g. `ocdiag gateway`)
+    ocdiag <object-inspector> ARG runs that inspector (e.g. `ocdiag trace UUID`)
+    ocdiag all [--skip a,b] runs every state collector
+    ocdiag list prints the catalogue grouped by parameter mode
+    ocdiag run <id> [args...] legacy alias retained for 0.1.x users
+"""
 
 from __future__ import annotations
 
-import argparse
 import os
 import runpy
 import sys
@@ -13,29 +20,47 @@ from typing import List
 
 REPO_ROOT = Path(__file__).resolve().parent.parent
 
-#
-
-    ("sys_health", "System health check",
-    ("environment", "
-    ("configuration", "
-    ("gateway", "
-    ("recent_errors", "
-    ("cron_jobs", "
-    ("performance", "
-    ("sessions", "
-    ("plugin_diag", "
-    ("shell_history", "
+# State collectors: zero required args, parameter-free observation of system state.
+STATE_COLLECTORS = [
+    ("sys_health", "System health check", "diag/01_sys_health.py"),
+    ("environment", "OpenClaw base environment", "diag/02_environment.py"),
+    ("configuration", "Flattened configuration (redacted)", "diag/03_configuration.py"),
+    ("gateway", "Gateway status", "diag/04_gateway.py"),
+    ("recent_errors", "Recent error aggregation", "diag/05_recent_errors.py"),
+    ("cron_jobs", "Cron job status", "diag/06_cron_jobs.py"),
+    ("performance", "Model/tool performance", "diag/07_performance.py"),
+    ("sessions", "Session data", "diag/08_sessions.py"),
+    ("plugin_diag", "Plugin diagnostics", "diag/09_plugin_diag.py"),
+    ("shell_history", "Shell history", "diag/10_shell_history.py"),
+]
+
+# Object inspectors: take a session uuid (or other identifier) and inspect it.
+OBJECT_INSPECTORS = [
+    ("trace", "Trace a user message timeline", "tools/oc_session_trace.py"),
+    ("extract", "Export a session to a readable format", "tools/oc_session_extract.py"),
 ]
 
-
+STATE_BY_ID = {mid: (label, script) for mid, label, script in STATE_COLLECTORS}
+OBJECT_BY_ID = {mid: (label, script) for mid, label, script in OBJECT_INSPECTORS}
+MODULE_BY_ID = {**STATE_BY_ID, **OBJECT_BY_ID}
+MODULE_IDS = set(MODULE_BY_ID.keys())
 
 
 def cmd_list() -> int:
-    print("Available
-
-
+    print("Available diagnostics:")
+    print()
+    print(" State collectors (no args needed):")
+    for mid, label, _ in STATE_COLLECTORS:
+        print(f" {mid:<16s} {label}")
     print()
-    print("
+    print(" Object inspectors (require session uuid):")
+    for mid, label, _ in OBJECT_INSPECTORS:
+        print(f" {mid:<16s} {label}")
+    print()
+    print(" Meta:")
+    print(" all run all state collectors")
+    print(" doctor check the Node/Python/OpenClaw environment")
+    print(" bundle <id> bundle into a self-contained single file")
     return 0
 
 
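The hunk above builds `MODULE_BY_ID` by merging two dicts with `{**STATE_BY_ID, **OBJECT_BY_ID}`. With that idiom, an id that appears in both registries is silently resolved in favor of the later dict. The sketch below shows one way to detect such a collision before merging. It is an illustration only: the registries are abbreviated, the labels are translated placeholders, and `duplicate_ids` / `build_index` are hypothetical helpers that do not exist in the package.

```python
from typing import Dict, List, Set, Tuple

Entry = Tuple[str, str, str]  # (id, label, script path)

# Abbreviated registries, for illustration only.
STATE_COLLECTORS: List[Entry] = [
    ("gateway", "Gateway status", "diag/04_gateway.py"),
    ("sessions", "Session data", "diag/08_sessions.py"),
]
OBJECT_INSPECTORS: List[Entry] = [
    ("trace", "Trace a user message timeline", "tools/oc_session_trace.py"),
]


def duplicate_ids(*registries: List[Entry]) -> Set[str]:
    """Return every id that occurs more than once across the registries."""
    seen: Set[str] = set()
    dupes: Set[str] = set()
    for registry in registries:
        for mid, _label, _script in registry:
            if mid in seen:
                dupes.add(mid)
            seen.add(mid)
    return dupes


def build_index(*registries: List[Entry]) -> Dict[str, Tuple[str, str]]:
    """Merge registries into {id: (label, script)}, refusing duplicate ids."""
    dupes = duplicate_ids(*registries)
    if dupes:
        raise ValueError(f"duplicate diagnostic ids: {sorted(dupes)}")
    return {
        mid: (label, script)
        for registry in registries
        for mid, label, script in registry
    }


if __name__ == "__main__":
    index = build_index(STATE_COLLECTORS, OBJECT_INSPECTORS)
    print(index["trace"][1])
```

With only twelve ids the risk is small, so a plain dict merge is a reasonable choice; the check matters more if the catalogue grows or is assembled from several places.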
@@ -64,72 +89,98 @@ def run_script(script_rel: str, extra_args: List[str]) -> int:
         sys.argv = saved_argv
 
 
-def
+def cmd_all(extra_args: List[str], skip_ids: List[str]) -> int:
     json_mode = "--json" in extra_args
     progress_stream = sys.stderr if json_mode else sys.stdout
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
+    rc_overall = 0
+    total = sum(1 for mid, _, _ in STATE_COLLECTORS if mid not in skip_ids)
+    n = 0
+    for mid, label, script in STATE_COLLECTORS:
+        if mid in skip_ids:
+            continue
+        n += 1
+        print(f"\n[{n}/{total}] {label} ({mid})...", flush=True, file=progress_stream)
+        t0 = time.time()
+        rc = run_script(script, extra_args)
+        elapsed = time.time() - t0
+        print(f"[{n}/{total}] {label} ({mid}) ... done ({elapsed:.1f}s)",
+              flush=True, file=progress_stream)
+        if rc != 0:
+            rc_overall = rc
+    return rc_overall
+
+
+def _split_skip(rest: List[str]) -> (List[str], List[str]):
+    """Pull out --skip a,b out of an argv tail; return (skip_ids, passthrough)."""
+    skip_ids: List[str] = []
+    passthrough: List[str] = []
+    i = 0
+    while i < len(rest):
+        a = rest[i]
+        if a == "--skip" and i + 1 < len(rest):
+            skip_ids.extend(s.strip() for s in rest[i + 1].split(",") if s.strip())
+            i += 2
+            continue
+        passthrough.append(a)
+        i += 1
+    return skip_ids, passthrough
+
+
+def print_help() -> None:
+    print("ocdiag — OpenClaw Diagnostic Toolbox")
+    print()
+    print("Usage:")
+    print(" ocdiag <id> [args...] run a single diagnostic (state collector or object inspector)")
+    print(" ocdiag all [--skip a,b] run all state collectors")
+    print(" ocdiag list list all diagnostics")
+    print(" ocdiag run <id> [args...] legacy alias (0.1.x compatibility)")
+    print()
+    print("State collectors:")
+    print(" " + " ".join(mid for mid, _, _ in STATE_COLLECTORS))
+    print("Object inspectors:")
+    print(" " + " ".join(mid for mid, _, _ in OBJECT_INSPECTORS))
+    print()
+    print("--skip takes a comma-separated id list (only meaningful for all).")
+    print("Other arguments (--config / --log-dir / --json / --no-color) are passed to the script unchanged.")
 
 
 def main(argv=None) -> int:
     argv = list(sys.argv[1:] if argv is None else argv)
 
     if not argv or argv[0] in ("-h", "--help"):
-
-        print()
-        print("Usage:")
-        print(" ocdiag list list all diagnostic modules")
-        print(" ocdiag run <id> run a single module (id or all)")
-        print(" ocdiag run all [--skip ids] run all modules, optionally skipping some")
-        print()
-        print("--skip takes a comma-separated list of module ids (e.g. performance,sessions).")
-        print("Other arguments (--config / --log-dir / --json / --no-color) are passed through unchanged.")
+        print_help()
         return 0
 
-
+    head, rest = argv[0], argv[1:]
 
-    if
+    if head == "list":
         return cmd_list()
 
-    if
+    if head == "all":
+        skip_ids, passthrough = _split_skip(rest)
+        return cmd_all(passthrough, skip_ids)
+
+    # Backward-compat alias: `ocdiag run <id> [args...]` still works.
+    if head == "run":
         if not rest:
             print("Error: run requires a target (module id or 'all').", file=sys.stderr)
             return 2
-        target = rest[0]
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
+        target, sub = rest[0], rest[1:]
+        if target == "all":
+            skip_ids, passthrough = _split_skip(sub)
+            return cmd_all(passthrough, skip_ids)
+        if target in MODULE_BY_ID:
+            _, script = MODULE_BY_ID[target]
+            return run_script(script, sub)
+        print(f"Error: unknown diagnostic '{target}'. Use `ocdiag list`.", file=sys.stderr)
+        return 2
+
+    if head in MODULE_BY_ID:
+        _, script = MODULE_BY_ID[head]
+        return run_script(script, rest)
+
+    print(f"Error: unknown command '{head}'. Use `ocdiag list` to see available diagnostics.",
+          file=sys.stderr)
     return 2
 
 
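Two behaviors in this hunk are easy to misread: how `--skip` is separated from the pass-through arguments, and how `cmd_all` folds per-diagnostic exit codes into one. The sketch below restates both so they can be exercised in isolation. It is a sketch under stated assumptions, not the package's code: `split_skip` follows the logic of `_split_skip` but swaps in a `Tuple[...]` return annotation, `fold_last_nonzero` mirrors the `if rc != 0: rc_overall = rc` loop, and `fold_max` is a hypothetical alternative matching the removed 0.1.0 README line that described the overall rc as the maximum.

```python
from typing import Iterable, List, Tuple


def split_skip(rest: List[str]) -> Tuple[List[str], List[str]]:
    """Separate `--skip a,b` from an argv tail; return (skip_ids, passthrough)."""
    skip_ids: List[str] = []
    passthrough: List[str] = []
    i = 0
    while i < len(rest):
        arg = rest[i]
        if arg == "--skip" and i + 1 < len(rest):
            # Split the comma list, dropping empty items and stray spaces.
            skip_ids.extend(s.strip() for s in rest[i + 1].split(",") if s.strip())
            i += 2
            continue
        passthrough.append(arg)
        i += 1
    return skip_ids, passthrough


def fold_last_nonzero(codes: Iterable[int]) -> int:
    """Mirror the loop in this hunk: the most recent non-zero rc wins."""
    overall = 0
    for rc in codes:
        if rc != 0:
            overall = rc
    return overall


def fold_max(codes: Iterable[int]) -> int:
    """Hypothetical alternative: report the highest rc seen (0 if none)."""
    return max(codes, default=0)


if __name__ == "__main__":
    print(split_skip(["--json", "--skip", "performance, sessions"]))
    print(split_skip(["--skip"]))  # a trailing --skip with no value is passed through
    print(fold_last_nonzero([0, 2, 1]), fold_max([0, 2, 1]))
```

Under the README's exit-code table (1 for a reported error, 2 for a crash), the two folds disagree when a crash is followed by a reported error: last-non-zero yields 1, the maximum yields 2. The sketch also makes two edge cases visible: a `--skip=a,b` token and a trailing `--skip` without a value are both passed through to the scripts untouched.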
package/package.json
CHANGED