npm - openclaw-diag-cli - Versions diffs - 0.1.0 → 0.1.2 - Mend

openclaw-diag-cli 0.1.0 → 0.1.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (5)

package/README.md CHANGED

@@ -1,259 +1,149 @@
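-# openclaw-diag-cli
+# OpenClaw Diagnostic Toolbox
-> A diagnostic toolset for OpenClaw / ArkClaw failures. Zero-dependency, read-only, composable pure-Python scripts.
+> A read-only CLI for troubleshooting OpenClaw / ArkClaw failures. One set of diagnostics, one entry point, zero dependencies.
-## Quick start
-No git clone needed; pull a cached copy through npm (usable offline afterwards):
+## Installation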
 ```bash
-# One-off run (usable offline once npm has cached it)
-npx openclaw-diag-cli list
-npx openclaw-diag-cli run gateway
-npx openclaw-diag-cli run all --json | jq -s '.'
+# One-off run (no install needed; usable offline once npm has cached it)
+npx openclaw-diag-cli
-# Install onto PATH (shorter command)
+# Install onto PATH
 npm install -g openclaw-diag-cli
-openclaw-diag list
-openclaw-diag doctor                       # check that the environment is ready
-openclaw-diag bundle gateway > gw.py       # generate a single-file diagnostic script
+openclaw-diag
 ```
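-Requirements: Node 18+ (for npx) and Python 3.8+. The Node layer is a thin shell with zero npm dependencies; it only locates
-`python3` and forwards arguments to the existing dispatcher, so `python3 diag/04_gateway.py`
-and `python3 bin/ocdiag run gateway` remain fully usable.
-## Why this exists
-The real pain points when troubleshooting OpenClaw failures:
-- **Data is scattered across many places**: session.jsonl lives under agents/, configuration in openclaw.json, process behavior in journalctl, cron state in cron/jobs.json, model latency is buried in trajectory files... Hand-typing jq + grep combinations is slow and error-prone.
-- **`openclaw-diag.sh` has grown into a 4391-line bash monolith** stuffed with 10 heredoc-embedded Python snippets; it is hard to modify, hard to unit-test, and hard to reuse.
-- **Diagnostic scripts should be "atomic operations"**: every piece of data has a clear source, every module solves one class of problem, and each can run on its own, be combined in a pipeline, or be driven by automation.
-This repository takes that 4391-line bash script apart and rewrites it: each collection step becomes a standalone Python script, designed under a set of axioms, so that "collect → analyze → report" becomes engineering you can reason about rather than manual work.
----
-## Design axioms (First Principles)
-The 6 axioms below are **non-negotiable** hard constraints. Every directory layout, API, and output format is derived from them.
-### 1. Read-Only
-Diagnostic scripts must **never** modify files, write configuration, or restart services. The cost: even hard-to-get data must be obtained by reading; mutating entry points such as `openclaw <subcmd>` are off limits.
-**Benefit**: safe to run in production, at an incident scene, or on a customer's machine.
-### 2. Zero Runtime Dependencies
-**Python 3.8+ standard library only**. No `requirements.txt`, no `pip install`. The sole exception: `croniter` is optionally imported in `06_cron_jobs.py` (when missing, it falls back to inferring the interval from historical runs).
-**Benefit**: any node that can run OpenClaw can run the diagnostics (OpenClaw itself depends on Node.js, but the diagnostic scripts do not depend on any Python package installed by OpenClaw). After `git clone`, just run `python3 diag/04_gateway.py`.
+Requirements: Node 18+ and Python 3.8+.
-### 3. Independent
-**Every diagnostic script must run on its own**: no dependency on the dispatcher, no env file to source, no other script to run first.
-**Corollary**: the top of each script uses `sys.path.insert(0, ...)` to add the repository root before importing the shared library; no package install is required.
+## Up and running in five minutes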
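-### 4. Composable
-Default output is human-readable text (Chinese, decorated with emoji); add `--json` for **structured JSON**.
-- Single script: `{"module": "<id>", "status": "ok|error", "data": {...}}`
-- `bin/ocdiag run all --json` emits **NDJSON** (one module's JSON per line), which can be aggregated with `... | jq -s '.'` or filtered with `... | jq 'select(.module=="cron_jobs") | .data'`.
-**Corollary**: progress messages go to stderr and never pollute the JSON stream on stdout.
-### 5. Data Fidelity
-Every number and every status a script outputs must be traceable to its source:
-- System data → `subprocess.run(["free","-m"])`, `/proc/<pid>/environ`, `journalctl ...`
-- OpenClaw data → `~/.openclaw/openclaw.json`, `~/.openclaw/cron/jobs.json`, `~/.openclaw/agents/*/sessions/*.jsonl`
-- Log data → `/tmp/openclaw/openclaw-*.log` (today's files, selected by mtime)
-Data sources are listed module by module in the documentation; "looks plausible" is not acceptable. For the same field, text output and JSON output must have **identical values**.
+```bash
+# 1. See what it can do
+openclaw-diag
-### 6. Failure Isolation
-- A single module crash must **not** bring down `run all`: the dispatcher wraps `runpy.run_path` in try/except.
-- A single missing data source (config absent, logs not generated, session file deleted) must **not** raise an exception; report "not found" explicitly.
-- Do not swallow exceptions silently: a failure must leave a traceback on stderr and a non-zero rc.
+# 2. Check that the environment is ready
+openclaw-diag doctor
----
+# 3. Run one diagnostic
+openclaw-diag gateway
-## Derived architecture
+# 4. Run all state collectors (one crash does not affect the others)
+openclaw-diag all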
-```
-openclaw-diag-cli/
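-├── ocdiag/         Shared primitives (corollary of axiom #2: a small, stable library)
-│   ├── paths.py        Path constants + environment-variable overrides
-│   ├── jsonlog.py      OpenClaw JSON log parsing (axiom #5)
-│   ├── timeutil.py     ISO/epoch time conversion + human-friendly formatting
-│   ├── tokens.py       fmt_tokens / percentile / human_size
-│   ├── sensitive.py    Secret/token redaction (extension of axiom #1: output must be safe too)
-│   ├── output.py       Dual-mode output (human-readable + JSON), implements axiom #4
-│   ├── recent_logs.py  Discovers logs updated today
-│   ├── cli.py          Shared argparse (--config / --log-dir / --json)
-│   └── dispatcher.py   Entry point reused by bin/ocdiag
-│
-├── diag/           Diagnostic modules (axiom #3: each runs on its own)
-│   ├── 01_sys_health.py        System health (DNS/network/CPU/memory/disk/IO/processes/time sync)
-│   ├── 02_environment.py       OpenClaw base environment (version consistency, Gateway process env)
-│   ├── 03_configuration.py     Flattened openclaw.json (after redaction)
-│   ├── 04_gateway.py           Gateway status (WS lifecycle + unified error-code view)
-│   ├── 05_recent_errors.py     Recent errors (multi-log aggregation + journalctl + tool errors)
-│   ├── 06_cron_jobs.py         Scheduled jobs (jobs.json + state + runs/, three sources merged)
-│   ├── 07_performance.py       Model/tool performance (slow calls Top 20 / E2E latency / cache)
-│   ├── 08_sessions.py          Session data (six-dimension analysis + stuck detection)
-│   ├── 09_plugin_diag.py       Plugin diagnostics (consistency + ERROR/WARN + Hook + Channel + DNS)
-│   └── 10_shell_history.py     Shell history (high-risk commands + openclaw commands)
-│
-├── tools/          Deep-dive tools (not collection; analysis of one specific object)
-│   ├── oc_session_trace.py     Traces the full timeline of one user message from arrival to response
-│   └── oc_session_extract.py   Exports session jsonl in readable form (all states: reset/bak/deleted)
-│
-└── bin/
-    └── ocdiag      Optional top-level entry point (list / run <id> / run all)
+# 5. Emit structured JSON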
+openclaw-diag gateway --json
 ```
----
+## Diagnostics
-## Data sources (where each piece of data is read from)
+Diagnostics fall into two categories, depending on whether they take an argument.
-Axiom #5 in practice: any field used downstream can be traced back to its origin:
+### State collectors (no arguments; scan the current system state)
-| Module | Data source |
+| Diagnostic | What it inspects |
 |---|---|
-| 01_sys_health | `dig`/`getent`, `free -m`, `df -m`, `/proc/<pid>/limits`, `timedatectl` |
-| 02_environment | `openclaw --version`, `/proc/<gw-pid>/environ`, `~/.config/systemd/user/openclaw-gateway.service.d/env.conf` |
-| 03_configuration | `~/.openclaw/openclaw.json` (redaction: values are masked when a key matches keywords such as key/secret/token/password) |
-| 04_gateway | `systemctl status` + `journalctl --since 24h` + `~/.openclaw/openclaw.json:gateway.port` + `/tmp/openclaw/openclaw-*.log` (filtered by a subsystem allowlist) |
-| 05_recent_errors | ERROR/FATAL entries from today's `openclaw-*.log` + `journalctl --priority err` + toolResult.isError from recent session.jsonl files |
-| 06_cron_jobs | `~/.openclaw/cron/jobs.json` + `jobs-state.json` + `runs/<jobId>.jsonl` (three sources merged) |
-| 07_performance | The 20 most recent `agents/*/sessions/*.jsonl` files (including reset files, ordered by mtime) |
-| 08_sessions | Same as above + stuck-session lines with subsystem=diagnostic in `/tmp/openclaw/openclaw-*.log` |
-| 09_plugin_diag | `_meta.name` parsed from today's logs + `~/.openclaw/openclaw.json:plugins.entries` + `~/.openclaw/extensions/` + DNS probes |
-| 10_shell_history | `~/.bash_history` + `~/.zsh_history` |
-| oc_session_trace | session.jsonl + `*.trajectory.jsonl` in the same directory (optional) + Gateway logs (optional) |
-| oc_session_extract | session.jsonl + sibling files `.deleted` / `.reset.N` / `.bak-*` |
----
-## Usage
+| `sys_health` | DNS / network / CPU / memory / disk / IO / processes / time sync |
+| `environment` | OpenClaw version consistency, Gateway process environment variables |
+| `configuration` | Flattened `openclaw.json` (sensitive fields redacted) |
+| `gateway` | Gateway process, port, starts/stops over 24h, WS lifecycle, error codes |
+| `recent_errors` | Aggregated errors from application logs / journalctl / session tool calls |
+| `cron_jobs` | Scheduled-job status, consecutive failures, schedule drift, silence detection |
+| `performance` | Model/tool latency P50/P95, slow calls Top 20, E2E latency, cache hit rate |
+| `sessions` | Session overview, activity, stuck detection |
+| `plugin_diag` | Plugin state consistency, ERROR/WARN, Hook anomalies, Channel, external-dependency DNS |
+| `shell_history` | High-risk commands, openclaw commands, recent operations |
+### Object inspectors (require a session uuid; dig into one specific object)
+| Diagnostic | What it inspects |
+|---|---|
+| `trace <uuid>` | Trace the full timeline of one user message from arrival to response |
+| `extract <uuid>` | Export session.jsonl in readable form (all states: reset / bak / deleted) |
-### Minimal usage (standalone scripts)
+### Meta
-```bash
-git clone https://github.com/wujiaming88/openclaw-diag-cli.git
-cd openclaw-diag-cli
-python3 diag/04_gateway.py            # run directly, zero configuration
-python3 diag/04_gateway.py --json     # same data, JSON format
-```
+| Command | Purpose |
+|---|---|
+| `openclaw-diag all` | Run all state collectors |
+| `openclaw-diag list` | List all diagnostics |
+| `openclaw-diag doctor` | Check the Node / Python / ocdiag / OpenClaw environment |
+| `openclaw-diag bundle <id>` | Pack into a self-contained single-file .py |
-### Top-level entry point (optional)
+## Common recipes
 ```bash
-python3 bin/ocdiag list                # list the 10 modules
-python3 bin/ocdiag run gateway         # run 04_gateway
-python3 bin/ocdiag run all             # run everything (one module crashing does not affect the others)
-python3 bin/ocdiag run all --skip performance,sessions  # skip the heavy modules
-```
+# Find which cron jobs are failing repeatedly
+openclaw-diag cron_jobs --json | jq '.data.jobs[] | select(.status!="ok")'
-### npm / npx entry point (supports all of the arguments above)
+# See which model has the highest P95 latency
+openclaw-diag performance | grep -A1 "P95"
-```bash
-npx openclaw-diag-cli list
-npx openclaw-diag-cli run gateway --json
-npx openclaw-diag-cli run all --skip performance,sessions
-npx openclaw-diag-cli doctor                # check Node/Python/ocdiag/OpenClaw
-npx openclaw-diag-cli bundle 04_gateway > standalone-gateway.py
-```
+# Which plugins logged an ERROR today
+openclaw-diag plugin_diag --json | jq '.data.plugin_errors | to_entries[] | select(.value.error_count > 0)'
-### JSON pipelines (what axiom #4 is really for)
+# Aggregate all diagnostics into a single JSON report
+openclaw-diag all --json 2>/dev/null | jq -s '.' > report.json
-```bash
-# 1) Single-module JSON → extract key fields with jq
-python3 diag/06_cron_jobs.py --json | jq '.data.jobs | length'
+# Find stuck-session events
+openclaw-diag sessions --json | jq '.data.stuck_sessions'
-# 2) run all NDJSON → aggregate into a single document
-python3 bin/ocdiag run all --json 2>/dev/null | jq -s '.' > report.json
+# Trace a user message timeline
+openclaw-diag trace <session-uuid> --msg-index 0
-# 3) Find modules that reported errors
-python3 bin/ocdiag run all --json 2>/dev/null | jq 'select(.status=="error")'
-# 4) Extract the success rate of every cron job
-python3 bin/ocdiag run all --json 2>/dev/null \
-  | jq 'select(.module=="cron_jobs") | .data.jobs[] | {name, success_rate}'
+# Export a session in readable form
+openclaw-diag extract <session-uuid> --summary
 ```
-### Tools: deep dives
+## Offline machines: bundle into a single file
 ```bash
-# Trace the processing timeline of one user message
-python3 tools/oc_session_trace.py <session-uuid> --msg-index 0
+# On a machine with network access
+openclaw-diag bundle gateway > standalone-gateway.py
-# Export a session in readable form
-python3 tools/oc_session_extract.py <session-uuid> --summary
-python3 tools/oc_session_extract.py <session-uuid> --types message --no-pretty
+# scp to the target machine (only Python 3.8+ is needed; nothing to install)
+scp standalone-gateway.py prod-server:/tmp/
+ssh prod-server "python3 /tmp/standalone-gateway.py --json"
 ```
-### Environment variable overrides
+`bundle` merges a script and the shared code it depends on into one self-contained `.py` with zero dependencies.
+## Configuration overrides
-When running on someone else's machine or container, override the paths instead of changing code:
+When diagnosing someone else's machine or container, no code changes are needed:
-| Variable | Default | Description |
+| Environment variable | Default | Description |
 |---|---|---|
 | `OPENCLAW_HOME` | `~/.openclaw` | OpenClaw home directory |
 | `OPENCLAW_CONFIG` | `$OPENCLAW_HOME/openclaw.json` | Configuration file |
 | `OPENCLAW_LOG_DIR` | `/tmp/openclaw` | Log directory |
 | `OPENCLAW_SESSIONS` | `$OPENCLAW_HOME/agents` | Session root |
-| `OPENCLAW_SERVICE_FILE` | `~/.config/systemd/user/openclaw-gateway.service` | systemd service unit |
-You can also override a single parameter with `--config /path/to/openclaw.json --log-dir /path/to/logs`.
+You can also override them for a single run with `--config /path/to/file --log-dir /path/to/logs`.
----
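-## Exit codes and error isolation
+## Exit codes
 | rc | Meaning |
 |---|---|
-| 0 | Module succeeded; the data field is populated |
-| 1 | Module ran but reported `status: "error"` (business errors such as a missing data source) |
-| 2 | A single module crashed (isolated by the dispatcher; other modules are unaffected) |
-The overall rc of `bin/ocdiag run all` is the maximum; if any module crashes, a traceback is left on stderr, but the stdout stream stays intact.
----
-## Extending: adding a new diagnostic module
-Just follow the axioms:
+| 0 | Diagnostic succeeded |
+| 1 | Diagnostic ran but reported `status: "error"` (missing data source, etc.) |
+| 2 | Diagnostic crashed (isolated; does not affect `all`) |
-1. Create `diag/11_my_check.py` with the shebang `#!/usr/bin/env python3`
-2. Add a top-level docstring stating: **what is collected + data sources + what the output means**
-3. `sys.path.insert(0, str(Path(__file__).resolve().parent.parent))` to hook into the shared library
-4. `from ocdiag import cli, output, paths` to get the common infrastructure
-5. `parser = cli.build_common_parser(...); args = parser.parse_args()`
-6. `out = output.init("my_check", json_mode=args.json, ...)`
-7. Business logic: use `out.item / out.evidence / out.section` for text output and `out.set_data("key", value)` for JSON data
-8. Read JSONL as a stream (`for line in open(...)`); do not use `.read().split('\n')`
-9. Subprocess calls must pass a `timeout`
-10. A missing data source must be reported explicitly as "not found"; do not raise
+## Design principles
-Registering with `bin/ocdiag` only takes one added line in the `ocdiag/dispatcher.py:MODULES` list.
----
-## What we do not do (anti-patterns)
-| Not done | Reason |
+| | |
 |---|---|
-| No test framework | Validation against ground truth comes first; tests will be added later |
-| No web UI / TUI / Rich | Conflicts with axiom #2 (zero dependencies) and axiom #4 (pipeline-friendly) |
-| No `pip install` required | Axioms #2 + #3 |
-| No restarts / no modifications / no outgoing requests | Axiom #1 |
-| No mandatory configuration / no mandatory token | Any node can clone and run |
-| No jq subprocess | Python ships with json, which is easier to control |
-| No Python embedded in bash heredocs | That is the old form we are replacing |
----
+| **Read-only** | Never modifies files or restarts services |
+| **Zero dependencies** | Python 3.8+ standard library only |
+| **Failure isolation** | One diagnostic crashing does not bring down `all` |
+| **Data fidelity** | Every field is traceable to its source |
+| **Composable** | Dual text + JSON output; stderr and stdout kept separate |
-## Origins
+Detailed design → [docs/DESIGN.md](docs/DESIGN.md) (axiom derivation, directory layout, extension guide)
-Split out and rewritten from the 4391-line `openclaw-diag.sh` (10 bash modules + 10 heredoc Python snippets). The original script is still maintained as an all-in-one "package the collection + send the report remotely" use case; this repository targets "modular, automatable, reasoned" diagnostic scenarios.
+## Feedback
----
+- Issues: https://github.com/wujiaming88/openclaw-diag-cli/issues
+- Origin: split out and rewritten from the 4391-line `openclaw-diag.sh`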
 ## License

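The README diff above documents a JSON envelope of the form `{"module": "<id>", "status": "ok|error", "data": {...}}`, emitted one object per line (NDJSON) when all collectors run with `--json`. As a rough illustration of consuming that stream without `jq`, the sketch below groups module ids by reported status. The sample records and the helper names `parse_ndjson` and `group_by_status` are hypothetical and not part of the package; real `data` payloads are not shown in this diff.

```python
import json
from typing import Dict, Iterable, List


def parse_ndjson(lines: Iterable[str]) -> List[dict]:
    """Parse one JSON object per non-empty line."""
    return [json.loads(line) for line in lines if line.strip()]


def group_by_status(records: List[dict]) -> Dict[str, List[str]]:
    """Map each status value to the module ids that reported it."""
    grouped: Dict[str, List[str]] = {}
    for rec in records:
        status = rec.get("status", "unknown")
        grouped.setdefault(status, []).append(rec.get("module", "?"))
    return grouped


# Hypothetical sample; real input would be the stdout of
# `openclaw-diag all --json 2>/dev/null`.
SAMPLE = """\
{"module": "gateway", "status": "ok", "data": {"note": "hypothetical"}}
{"module": "cron_jobs", "status": "error", "data": {}}
"""

if __name__ == "__main__":
    print(group_by_status(parse_ndjson(SAMPLE.splitlines())))
```

Because progress messages go to stderr in JSON mode, redirecting stderr (as the README recipes do) should leave stdout parseable line by line.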
package/bin/openclaw-diag.js CHANGED

@@ -17,6 +17,22 @@ const PYTHON_CANDIDATES = process.platform === 'win32'
   ? ['python3', 'python', 'py']
   : ['python3', 'python'];
+// Keep these in sync with ocdiag/dispatcher.py.
+const STATE_COLLECTORS = [
+  'sys_health', 'environment', 'configuration', 'gateway', 'recent_errors',
+  'cron_jobs', 'performance', 'sessions', 'plugin_diag', 'shell_history',
+];
+const OBJECT_INSPECTORS = ['trace', 'extract'];
+const MODULE_IDS = new Set([...STATE_COLLECTORS, ...OBJECT_INSPECTORS]);
+const STATE_SCRIPTS = [
+  '01_sys_health.py', '02_environment.py', '03_configuration.py',
+  '04_gateway.py', '05_recent_errors.py', '06_cron_jobs.py',
+  '07_performance.py', '08_sessions.py', '09_plugin_diag.py',
+  '10_shell_history.py',
+];
+const OBJECT_SCRIPTS = ['oc_session_trace.py', 'oc_session_extract.py'];
 function findPython() {
   for (const cmd of PYTHON_CANDIDATES) {
     try {
@@ -51,23 +67,26 @@ function printVersion() {
 function printHelp() {
   const lines = [
-    'openclaw-diag — OpenClaw / ArkClaw read-only diagnostic CLI',
+    'openclaw-diag — OpenClaw / ArkClaw 诊断工具箱',
     '',
     'Usage:',
-    '  openclaw-diag                          Show banner + module list',
-    '  openclaw-diag list                     List all diagnostic modules',
-    '  openclaw-diag run <id>                 Run a single module (or "all")',
-    '  openclaw-diag run all [--skip a,b]     Run all modules (skip optional)',
-    '  openclaw-diag run <id> --json          Emit JSON (NDJSON for "all")',
-    '  openclaw-diag bundle <id>              Print self-contained single-file .py to stdout',
-    '  openclaw-diag doctor [--json]          Check Node / Python / ocdiag / OpenClaw env',
-    '  openclaw-diag --version                Print package version',
-    '  openclaw-diag --help                   Print this help',
+    '  openclaw-diag                          打印 banner + 诊断目录',
+    '  openclaw-diag list                     列出全部诊断（按类型分组）',
+    '  openclaw-diag <id> [args...]           跑单个诊断',
+    '  openclaw-diag all [--skip a,b]         跑全部 state collectors',
+    '  openclaw-diag all [--json]             NDJSON 聚合输出',
+    '  openclaw-diag bundle <id>              打成 self-contained 单文件 .py',
+    '  openclaw-diag doctor [--json]          检查 Node / Python / ocdiag / OpenClaw env',
+    '  openclaw-diag --version                打印版本号',
+    '  openclaw-diag --help                   本帮助',
     '',
-    'Module ids: sys_health environment configuration gateway recent_errors',
-    '            cron_jobs performance sessions plugin_diag shell_history',
+    'State collectors (无需参数):',
+    '  ' + STATE_COLLECTORS.join('  '),
     '',
-    'Pass-through flags (forwarded to Python): --config --log-dir --json --no-color',
+    'Object inspectors (需要 session uuid):',
+    '  ' + OBJECT_INSPECTORS.join('  '),
+    '',
+    '透传给诊断脚本: --config --log-dir --json --no-color',
   ];
   console.log(lines.join('\n'));
 }
@@ -90,15 +109,10 @@ function runDispatcher(args) {
   });
 }
-function runBundle(args) {
-  if (args.length === 0) {
-    console.error('Error: bundle requires a module id (e.g. `openclaw-diag bundle gateway`)');
-    process.exit(2);
-  }
+function runScript(scriptPath, args) {
   const py = findPython();
   if (!py) pythonNotFound();
-  const bundleScript = path.join(REPO_ROOT, 'lib', 'bundle.py');
-  const child = spawn(py.cmd, [bundleScript, ...args], { stdio: 'inherit' });
+  const child = spawn(py.cmd, [scriptPath, ...args], { stdio: 'inherit' });
   child.on('error', (err) => {
     console.error(`Error: failed to spawn ${py.cmd}: ${err.message}`);
     process.exit(1);
@@ -112,14 +126,15 @@ function runBundle(args) {
   });
 }
-// ── doctor ──
+function runBundle(args) {
+  if (args.length === 0) {
+    console.error('Error: bundle requires a module id (e.g. `openclaw-diag bundle gateway`)');
+    process.exit(2);
+  }
+  runScript(path.join(REPO_ROOT, 'lib', 'bundle.py'), args);
+}
-const DIAG_SCRIPTS = [
-  '01_sys_health.py', '02_environment.py', '03_configuration.py',
-  '04_gateway.py', '05_recent_errors.py', '06_cron_jobs.py',
-  '07_performance.py', '08_sessions.py', '09_plugin_diag.py',
-  '10_shell_history.py',
-];
+// ── doctor ──
 function nodeVersionOk() {
   const m = process.versions.node.match(/^(\d+)\./);
@@ -143,17 +158,20 @@ function checkOcdiagImport(pyCmd) {
 function checkDiagScripts(pyCmd) {
   const failed = [];
-  for (const name of DIAG_SCRIPTS) {
-    const p = path.join(REPO_ROOT, 'diag', name);
-    const r = spawnSync(pyCmd, [p, '--help'], {
+  const all = [
+    ...STATE_SCRIPTS.map((n) => ({ name: n, path: path.join(REPO_ROOT, 'diag', n) })),
+    ...OBJECT_SCRIPTS.map((n) => ({ name: n, path: path.join(REPO_ROOT, 'tools', n) })),
+  ];
+  for (const item of all) {
+    const r = spawnSync(pyCmd, [item.path, '--help'], {
       stdio: ['ignore', 'pipe', 'pipe'],
       timeout: 10000,
     });
     if (r.status !== 0) {
-      failed.push({ script: name, status: r.status, stderr: ((r.stderr || '').toString().trim()).slice(0, 200) });
+      failed.push({ script: item.name, status: r.status, stderr: ((r.stderr || '').toString().trim()).slice(0, 200) });
     }
   }
-  return failed;
+  return { failed, total: all.length };
 }
 function checkOpenclawConfig() {
@@ -190,10 +208,10 @@ function runDoctor(args) {
   const ocdiag = checkOcdiagImport(py.cmd);
   result.ocdiag = ocdiag;
-  const failed = checkDiagScripts(py.cmd);
+  const { failed, total } = checkDiagScripts(py.cmd);
   result.diag_scripts = {
     ok: failed.length === 0,
-    total: DIAG_SCRIPTS.length,
+    total,
     failed,
   };
@@ -214,9 +232,9 @@ function runDoctor(args) {
       }
     }
     if (failed.length === 0) {
-      console.log(`✓ All ${DIAG_SCRIPTS.length} diag modules respond to --help`);
+      console.log(`✓ All ${total} diagnostics respond to --help`);
     } else {
-      console.log(`✗ ${failed.length}/${DIAG_SCRIPTS.length} diag modules failed --help:`);
+      console.log(`✗ ${failed.length}/${total} diagnostics failed --help:`);
       for (const f of failed) {
         console.log(`    ${f.script} (rc=${f.status})`);
       }
@@ -238,15 +256,20 @@ function main() {
   const argv = process.argv.slice(2);
   if (argv.length === 0) {
-    console.log(`openclaw-diag v${PKG.version} — OpenClaw / ArkClaw 诊断 CLI`);
+    const py = findPython();
+    if (!py) pythonNotFound();
+    console.log(`openclaw-diag v${PKG.version} — OpenClaw / ArkClaw 诊断工具箱`);
     console.log('');
-    console.log('  npx openclaw-diag-cli list           列出所有诊断模块');
-    console.log('  npx openclaw-diag-cli run <id>       运行单个模块（或 all）');
-    console.log('  npx openclaw-diag-cli doctor         检查环境是否就绪');
-    console.log('  npx openclaw-diag-cli --help         查看完整帮助');
+    const dispatcher = path.join(REPO_ROOT, 'bin', 'ocdiag');
+    spawnSync(py.cmd, [dispatcher, 'list'], { stdio: 'inherit' });
     console.log('');
-    runDispatcher(['list']);
-    return;
+    console.log('常用命令：');
+    console.log('  openclaw-diag gateway           跑单个 state collector');
+    console.log('  openclaw-diag all               全部 state collectors');
+    console.log('  openclaw-diag trace <uuid>      追踪一条用户消息');
+    console.log('  openclaw-diag doctor            检查环境');
+    console.log('  openclaw-diag --help            完整帮助');
+    process.exit(0);
   }
   const head = argv[0];
@@ -268,7 +291,7 @@ function main() {
     return;
   }
-  // Pass through everything else to the Python dispatcher.
+  // Pass through everything else (flat ids, `all`, `list`, `run` alias, unknown) to dispatcher.
   runDispatcher(argv);
 }

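The `checkDiagScripts` change above makes `doctor` probe both the `diag/` and `tools/` scripts by invoking each with `--help` under a 10-second timeout and collecting those that do not exit with status 0. A minimal Python sketch of the same idea follows. It is not the package's code (that logic lives in the Node wrapper); `check_scripts` is a hypothetical name, and the two throwaway scripts exist only to make the example runnable.

```python
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import Dict, List


def check_scripts(paths: List[Path], timeout_s: float = 10.0) -> Dict[str, object]:
    """Run each script with --help and collect the ones that do not exit 0."""
    failed = []
    for p in paths:
        try:
            r = subprocess.run(
                [sys.executable, str(p), "--help"],
                stdin=subprocess.DEVNULL,
                stdout=subprocess.PIPE,
                stderr=subprocess.PIPE,
                timeout=timeout_s,
            )
            status, stderr = r.returncode, r.stderr.decode(errors="replace")
        except subprocess.TimeoutExpired:
            # A timeout counts as a failure, with no exit status available.
            status, stderr = None, "timed out"
        if status != 0:
            failed.append({"script": p.name, "status": status,
                           "stderr": stderr.strip()[:200]})
    return {"failed": failed, "total": len(paths)}


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        good = Path(tmp) / "good.py"  # argparse exits 0 on --help
        good.write_text("import argparse\nargparse.ArgumentParser().parse_args()\n")
        bad = Path(tmp) / "bad.py"    # always exits 3
        bad.write_text("import sys\nsys.exit(3)\n")
        print(check_scripts([good, bad]))
```

Returning the total alongside the failures mirrors the diff's switch from a module-level constant to a value computed from the list actually probed.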
package/ocdiag/__init__.py CHANGED

@@ -1,3 +1,3 @@
 """ocdiag — shared library for openclaw-diag-cli scripts."""
-__version__ = "0.1.0"
+__version__ = "0.1.2"

package/ocdiag/dispatcher.py CHANGED

@@ -1,8 +1,15 @@
-"""Dispatcher: list / run <name> / run all."""
+"""Dispatcher: every diagnostic is a top-level subcommand.
+Layout:
+  ocdiag <state-collector>      runs that collector (e.g. `ocdiag gateway`)
+  ocdiag <object-inspector> ARG runs that inspector  (e.g. `ocdiag trace UUID`)
+  ocdiag all [--skip a,b]       runs every state collector
+  ocdiag list                   prints the catalogue grouped by parameter mode
+  ocdiag run <id> [args...]     legacy alias retained for 0.1.x users
+"""
 from __future__ import annotations
-import argparse
 import os
 import runpy
 import sys
@@ -13,29 +20,47 @@ from typing import List
 REPO_ROOT = Path(__file__).resolve().parent.parent
-# Module ID -> (label, script filename relative to REPO_ROOT)
-MODULES = [
-    ("sys_health",     "系统健康检查",     "diag/01_sys_health.py"),
-    ("environment",    "采集基础环境",     "diag/02_environment.py"),
-    ("configuration",  "采集配置",         "diag/03_configuration.py"),
-    ("gateway",        "采集 Gateway 状态", "diag/04_gateway.py"),
-    ("recent_errors",  "采集近期日志",     "diag/05_recent_errors.py"),
-    ("cron_jobs",      "采集定时任务",     "diag/06_cron_jobs.py"),
-    ("performance",    "采集模型与性能数据", "diag/07_performance.py"),
-    ("sessions",       "采集 Session 数据", "diag/08_sessions.py"),
-    ("plugin_diag",    "采集插件诊断",     "diag/09_plugin_diag.py"),
-    ("shell_history",  "采集命令执行历史",  "diag/10_shell_history.py"),
+# State collectors: zero required args, parameter-free observation of system state.
+STATE_COLLECTORS = [
+    ("sys_health",     "系统健康检查",          "diag/01_sys_health.py"),
+    ("environment",    "OpenClaw 基础环境",     "diag/02_environment.py"),
+    ("configuration",  "配置展平（脱敏）",      "diag/03_configuration.py"),
+    ("gateway",        "Gateway 状态",          "diag/04_gateway.py"),
+    ("recent_errors",  "近期错误聚合",          "diag/05_recent_errors.py"),
+    ("cron_jobs",      "定时任务状态",          "diag/06_cron_jobs.py"),
+    ("performance",    "模型/工具性能",         "diag/07_performance.py"),
+    ("sessions",       "Session 数据",          "diag/08_sessions.py"),
+    ("plugin_diag",    "插件诊断",              "diag/09_plugin_diag.py"),
+    ("shell_history",  "Shell 历史",            "diag/10_shell_history.py"),
+]
+# Object inspectors: take a session uuid (or other identifier) and inspect it.
+OBJECT_INSPECTORS = [
+    ("trace",   "追踪用户消息时间轴",  "tools/oc_session_trace.py"),
+    ("extract", "导出 session 为可读格式", "tools/oc_session_extract.py"),
 ]
-MODULE_BY_ID = {mid: (label, script) for mid, label, script in MODULES}
+STATE_BY_ID = {mid: (label, script) for mid, label, script in STATE_COLLECTORS}
+OBJECT_BY_ID = {mid: (label, script) for mid, label, script in OBJECT_INSPECTORS}
+MODULE_BY_ID = {**STATE_BY_ID, **OBJECT_BY_ID}
+MODULE_IDS = set(MODULE_BY_ID.keys())
 def cmd_list() -> int:
-    print("Available modules:")
-    for mid, label, _ in MODULES:
-        print(f"  [x] {mid:<16s} {label}")
+    print("Available diagnostics:")
+    print()
+    print("  State collectors (no args needed):")
+    for mid, label, _ in STATE_COLLECTORS:
+        print(f"    {mid:<16s} {label}")
     print()
-    print("Usage: ocdiag run <id> | ocdiag run all [--skip id1,id2] [--json]")
+    print("  Object inspectors (require session uuid):")
+    for mid, label, _ in OBJECT_INSPECTORS:
+        print(f"    {mid:<16s} {label}")
+    print()
+    print("  Meta:")
+    print("    all              跑全部 state collectors")
+    print("    doctor           检查 Node/Python/OpenClaw 环境")
+    print("    bundle <id>      打包成 self-contained 单文件")
     return 0
@@ -64,72 +89,98 @@ def run_script(script_rel: str, extra_args: List[str]) -> int:
         sys.argv = saved_argv
-def cmd_run(target: str, extra_args: List[str], skip_ids: List[str]) -> int:
+def cmd_all(extra_args: List[str], skip_ids: List[str]) -> int:
     json_mode = "--json" in extra_args
     progress_stream = sys.stderr if json_mode else sys.stdout
-    if target == "all":
-        rc_overall = 0
-        total = sum(1 for mid, _, _ in MODULES if mid not in skip_ids)
-        n = 0
-        for mid, label, script in MODULES:
-            if mid in skip_ids:
-                continue
-            n += 1
-            print(f"\n[{n}/{total}] {label} ({mid})...", flush=True, file=progress_stream)
-            t0 = time.time()
-            rc = run_script(script, extra_args)
-            elapsed = time.time() - t0
-            print(f"[{n}/{total}] {label} ({mid}) ... done ({elapsed:.1f}s)", flush=True, file=progress_stream)
-            if rc != 0:
-                rc_overall = rc
-        return rc_overall
-    if target not in MODULE_BY_ID:
-        print(f"Error: unknown module '{target}'. Use `ocdiag list`.", file=sys.stderr)
-        return 2
-    _, script = MODULE_BY_ID[target]
-    return run_script(script, extra_args)
+    rc_overall = 0
+    total = sum(1 for mid, _, _ in STATE_COLLECTORS if mid not in skip_ids)
+    n = 0
+    for mid, label, script in STATE_COLLECTORS:
+        if mid in skip_ids:
+            continue
+        n += 1
+        print(f"\n[{n}/{total}] {label} ({mid})...", flush=True, file=progress_stream)
+        t0 = time.time()
+        rc = run_script(script, extra_args)
+        elapsed = time.time() - t0
+        print(f"[{n}/{total}] {label} ({mid}) ... done ({elapsed:.1f}s)",
+              flush=True, file=progress_stream)
+        if rc != 0:
+            rc_overall = rc
+    return rc_overall
+def _split_skip(rest: List[str]) -> (List[str], List[str]):
+    """Pull --skip a,b out of an argv tail; return (skip_ids, passthrough)."""
+    skip_ids: List[str] = []
+    passthrough: List[str] = []
+    i = 0
+    while i < len(rest):
+        a = rest[i]
+        if a == "--skip" and i + 1 < len(rest):
+            skip_ids.extend(s.strip() for s in rest[i + 1].split(",") if s.strip())
+            i += 2
+            continue
+        passthrough.append(a)
+        i += 1
+    return skip_ids, passthrough
+def print_help() -> None:
+    print("ocdiag — OpenClaw 诊断工具箱")
+    print()
+    print("Usage:")
+    print("  ocdiag <id> [args...]            跑单个诊断（state collector 或 object inspector）")
+    print("  ocdiag all [--skip a,b]          跑全部 state collectors")
+    print("  ocdiag list                      列出所有诊断")
+    print("  ocdiag run <id> [args...]        旧用法别名（0.1.x 兼容）")
+    print()
+    print("State collectors:")
+    print("  " + "  ".join(mid for mid, _, _ in STATE_COLLECTORS))
+    print("Object inspectors:")
+    print("  " + "  ".join(mid for mid, _, _ in OBJECT_INSPECTORS))
+    print()
+    print("--skip 后接逗号分隔 id 列表（仅对 all 有意义）。")
+    print("其它参数（--config / --log-dir / --json / --no-color）原样传递给脚本。")
 def main(argv=None) -> int:
     argv = list(sys.argv[1:] if argv is None else argv)
     if not argv or argv[0] in ("-h", "--help"):
-        print("ocdiag — OpenClaw 诊断 CLI dispatcher")
-        print()
-        print("Usage:")
-        print("  ocdiag list                      列出所有诊断模块")
-        print("  ocdiag run <id>                  运行单个模块（id 或 all）")
-        print("  ocdiag run all [--skip ids]      运行全部模块，可跳过若干")
-        print()
-        print("--skip 后接逗号分隔的 module id 列表（如 performance,sessions）。")
-        print("其它参数（--config / --log-dir / --json / --no-color）原样传递。")
+        print_help()
         return 0
-    cmd, rest = argv[0], argv[1:]
+    head, rest = argv[0], argv[1:]
-    if cmd == "list":
+    if head == "list":
         return cmd_list()
-    if cmd == "run":
+    if head == "all":
+        skip_ids, passthrough = _split_skip(rest)
+        return cmd_all(passthrough, skip_ids)
+    # Backward-compat alias: `ocdiag run <id> [args...]` still works.
+    if head == "run":
         if not rest:
             print("Error: run requires a target (module id or 'all').", file=sys.stderr)
             return 2
-        target = rest[0]
-        sub = rest[1:]
-        skip_ids: List[str] = []
-        passthrough: List[str] = []
-        i = 0
-        while i < len(sub):
-            a = sub[i]
-            if a == "--skip" and i + 1 < len(sub):
-                skip_ids.extend(s.strip() for s in sub[i + 1].split(",") if s.strip())
-                i += 2
-                continue
-            passthrough.append(a)
-            i += 1
-        return cmd_run(target, passthrough, skip_ids)
-    print(f"Error: unknown command '{cmd}'", file=sys.stderr)
+        target, sub = rest[0], rest[1:]
+        if target == "all":
+            skip_ids, passthrough = _split_skip(sub)
+            return cmd_all(passthrough, skip_ids)
+        if target in MODULE_BY_ID:
+            _, script = MODULE_BY_ID[target]
+            return run_script(script, sub)
+        print(f"Error: unknown diagnostic '{target}'. Use `ocdiag list`.", file=sys.stderr)
+        return 2
+    if head in MODULE_BY_ID:
+        _, script = MODULE_BY_ID[head]
+        return run_script(script, rest)
+    print(f"Error: unknown command '{head}'. Use `ocdiag list` to see available diagnostics.",
+          file=sys.stderr)
     return 2

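The `_split_skip` helper introduced above separates `--skip a,b` from the arguments forwarded to each script. The standalone copy below (renamed `split_skip` and given a `Tuple` return annotation, both changes made for this illustration) shows its behavior on two inputs, including the edge case where `--skip` is the final token and therefore passes through untouched.

```python
from typing import List, Tuple


def split_skip(rest: List[str]) -> Tuple[List[str], List[str]]:
    """Separate `--skip a,b` pairs from the arguments passed through to scripts."""
    skip_ids: List[str] = []
    passthrough: List[str] = []
    i = 0
    while i < len(rest):
        arg = rest[i]
        if arg == "--skip" and i + 1 < len(rest):
            # Consume the flag and its comma-separated value together.
            skip_ids.extend(s.strip() for s in rest[i + 1].split(",") if s.strip())
            i += 2
            continue
        passthrough.append(arg)
        i += 1
    return skip_ids, passthrough


if __name__ == "__main__":
    print(split_skip(["--json", "--skip", "performance, sessions"]))
    # → (['performance', 'sessions'], ['--json'])
    print(split_skip(["--skip"]))
    # → ([], ['--skip'])  a trailing --skip with no value is not consumed
```

Only the exact token `--skip` followed by a separate value is matched, so a `--skip=a,b` spelling would be forwarded to the scripts unchanged.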
package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "openclaw-diag-cli",
-  "version": "0.1.0",
+  "version": "0.1.2",
   "description": "OpenClaw / ArkClaw read-only diagnostic CLI. Zero-dependency Python scripts wrapped in Node for npx-friendly install.",
   "keywords": [
     "openclaw",