npm - superlab - Versions diffs - 0.1.13 → 0.1.14 - Mend

superlab 0.1.13 → 0.1.14

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (10)

package/README.md +8 -0
package/README.zh-CN.md +8 -0
package/lib/i18n.cjs +16 -4
package/package-assets/claude/commands/lab/auto.md +3 -0
package/package-assets/claude/commands/lab.md +14 -0
package/package-assets/codex/prompts/lab-auto.md +3 -0
package/package-assets/codex/prompts/lab.md +14 -0
package/package-assets/shared/lab/context/auto-mode.md +8 -1
package/package-assets/shared/skills/lab/stages/auto.md +20 -0
package/package.json +1 -1

package/README.md CHANGED

@@ -187,6 +187,14 @@ superlab auto stop
 It does not replace manual `idea`, `data`, `framing`, or `spec` decisions.
+Good `/lab:auto` input is explicit. Treat `Autonomy level L1/L2/L3` as execution privilege, and treat `paper layer`, `phase`, or `table` as experiment targets. If the workflow language is Chinese, summaries, checklist items, task labels, and progress updates should also stay in Chinese unless a literal identifier must remain unchanged.
+Example:
+```text
+/lab:auto Autonomy level L2. Objective: advance paper layer 3 organizer enforcement. Terminal goal: task-completion. Scope: bounded protocol, tests, minimal implementation, and one small run. Allowed modifications: evaluator prompt registry, ingestion, and parser only.
+```
 ## Version
 Show the CLI version and the current project asset version:

package/README.zh-CN.md CHANGED

@@ -185,6 +185,14 @@ superlab auto stop
 它不会替代手动的 `idea`、`data`、`framing`、`spec` 决策。
+好的 `/lab:auto` 输入应该显式写清。把 `Autonomy level L1/L2/L3` 当成执行权限级别，把 `paper layer`、`phase`、`table` 当成实验目标，不要混用。如果 workflow language 是中文，摘要、清单条目、任务标签和进度更新也应保持中文，除非某个字面标识符必须保持原样。
+示例：
+```text
+/lab:auto 自治级别 L2。目标：推进 paper layer 3 的 organizer enforcement。终止条件：完成 bounded protocol、测试、最小实现和一轮小规模结果。允许修改：evaluator prompt registry、ingestion、parser。
+```
 ## 版本查询
 查看当前 CLI 版本和当前目录项目的资产版本：

package/lib/i18n.cjs CHANGED

@@ -934,12 +934,16 @@ const ZH_SKILL_FILES = {
 - Objective:
 - Autonomy level: L2
+- Autonomy level 只表示执行权限级别，不表示论文 layer 或 table 编号。
+- 如果你想表达论文层、实验 phase 或主表，请明确写成 \`paper layer\`、\`phase\` 或 \`table\`。
 - Approval status: draft
 - Allowed stages: run, iterate, review, report
 - Success criteria:
 - Terminal goal type:
 - Terminal goal target:
 - Required terminal artifact:
+- 如果 workflow language 是中文，摘要、清单条目、任务标签和进度更新都应使用中文。
+- 示例 Objective: 推进 paper layer 3 的 organizer enforcement，完成一轮 bounded protocol、测试、最小实现和一轮小规模结果。
 ## 循环预算
@@ -950,6 +954,9 @@ const ZH_SKILL_FILES = {
 ## 阶段命令
+- Rung 的 \`Command\` 应该绑定真实的长任务命令，由它产出最终实验结果。
+- 短 watcher 只用于查看进度；当真实实验还在运行时，不要把短 watcher 当成 stage 或 rung 的主命令。
+- 当真实实验进程还活着时，只记录进度更新并继续等待。
 - Run command:
 - Iterate command:
 - Review command:
@@ -1383,7 +1390,7 @@ ZH_CONTENT[path.join(".lab", ".managed", "templates", "framing.md")] = `# 论文
 ZH_CONTENT[path.join(".codex", "prompts", "lab.md")] = codexPrompt(
   "查看 /lab 研究工作流总览并选择合适阶段",
   "workflow question 或 stage choice",
-  "# `/lab` for Codex\n\n`/lab` 是严格的研究工作流命令族。每次都使用同一套仓库工件和阶段边界。\n\n## 子命令\n\n- `/lab:idea`\n  调研 idea，定义问题与 failure case，归类 contribution 与 breakthrough level，对比现有方法，收束三个一眼就有意义的点，并在实现前保留 approval gate。\n\n- `/lab:data`\n  把已批准的 idea 转成数据集与 benchmark 方案，记录数据集年份、使用过该数据集的论文、下载来源、许可或访问限制，以及 classic-public、recent-strong-public、claim-specific 三类 benchmark 的纳入理由，和 canonical baselines、strong historical baselines、recent strong public methods、closest prior work 四类对比方法的纳入理由。\n\n- `/lab:auto`\n  在不改变 mission、framing 和核心 claims 的前提下，读取 eval-protocol 与 auto-mode 契约并自动编排 `run`、`iterate`、`review`、`report`，必要时扩展数据集、benchmark 和 comparison methods，并在满足升格策略时自动升级 primary package。启动前必须选定 autonomy level、声明 terminal goal，并显式批准契约。\n\n- `/lab:framing`\n  通过审计当前领域与相邻领域的术语，锁定 paper-facing 的方法名、模块名、论文题目和 contribution bullets，并在 section 起草前保留 approval gate。\n\n- `/lab:spec`\n  把已批准的 idea 转成 `.lab/changes/<change-id>/` 下的一个 lab change 目录，并在其中写出 `proposal`、`design`、`spec`、`tasks`。\n\n- `/lab:run`\n  执行最小有意义验证运行，登记 run，并生成第一版标准化评估摘要。\n\n- `/lab:iterate`\n  在冻结 mission、阈值、verification commands 与 `completion_promise` 的前提下执行有边界的实验迭代。\n\n- `/lab:review`\n  以 reviewer mode 审查文档或结果，先给短摘要，再输出 findings、fatal flaws、fix priority 和 residual risks。\n\n- `/lab:report`\n  从 runs 和 iterations 工件生成最终研究报告。\n\n- `/lab:write`\n  使用已安装 `lab` skill 下 vendored 的 paper-writing references，把稳定 report 工件转成论文 section。\n\n## 调度规则\n\n- 始终使用 `skills/lab/SKILL.md` 作为工作流合同。\n- 用户显式调用 `/lab:<stage>` 时，要立刻执行该 stage，而不是只推荐别的 `/lab` stage。\n- 先给简洁摘要，再决定是否写工件，最后回报输出路径和下一步。\n- 如果歧义会影响结论，一次只问一个问题；如果有多条可行路径，先给 2-3 个方案再收敛。\n- `/lab:spec` 前应已有经批准的数据集与 benchmark 方案。\n- `/lab:run`、`/lab:iterate`、`/lab:auto`、`/lab:report` 都应遵循 `.lab/context/eval-protocol.md`。\n- `.lab/context/eval-protocol.md` 不只定义主指标和主表，也应定义指标释义、实验阶梯，以及指标和对比实现的来源。\n- `/lab:auto` 只编排已批准边界内的执行阶段，不替代手动的 idea/data/framing/spec 决策。\n- `/lab:write` 前必须已有经批准的 `/lab:framing` 工件。\n"
+  "# `/lab` for Codex\n\n`/lab` 是严格的研究工作流命令族。每次都使用同一套仓库工件和阶段边界。\n\n## 子命令\n\n- `/lab:idea`\n  调研 idea，定义问题与 failure case，归类 contribution 与 breakthrough level，对比现有方法，收束三个一眼就有意义的点，并在实现前保留 approval gate。\n\n- `/lab:data`\n  把已批准的 idea 转成数据集与 benchmark 方案，记录数据集年份、使用过该数据集的论文、下载来源、许可或访问限制，以及 classic-public、recent-strong-public、claim-specific 三类 benchmark 的纳入理由，和 canonical baselines、strong historical baselines、recent strong public methods、closest prior work 四类对比方法的纳入理由。\n\n- `/lab:auto`\n  在不改变 mission、framing 和核心 claims 的前提下，读取 eval-protocol 与 auto-mode 契约并自动编排 `run`、`iterate`、`review`、`report`，必要时扩展数据集、benchmark 和 comparison methods，并在满足升格策略时自动升级 primary package。启动前必须选定 autonomy level、声明 terminal goal，并显式批准契约。\n\n- `/lab:framing`\n  通过审计当前领域与相邻领域的术语，锁定 paper-facing 的方法名、模块名、论文题目和 contribution bullets，并在 section 起草前保留 approval gate。\n\n- `/lab:spec`\n  把已批准的 idea 转成 `.lab/changes/<change-id>/` 下的一个 lab change 目录，并在其中写出 `proposal`、`design`、`spec`、`tasks`。\n\n- `/lab:run`\n  执行最小有意义验证运行，登记 run，并生成第一版标准化评估摘要。\n\n- `/lab:iterate`\n  在冻结 mission、阈值、verification commands 与 `completion_promise` 的前提下执行有边界的实验迭代。\n\n- `/lab:review`\n  以 reviewer mode 审查文档或结果，先给短摘要，再输出 findings、fatal flaws、fix priority 和 residual risks。\n\n- `/lab:report`\n  从 runs 和 iterations 工件生成最终研究报告。\n\n- `/lab:write`\n  使用已安装 `lab` skill 下 vendored 的 paper-writing references，把稳定 report 工件转成论文 section。\n\n## 调度规则\n\n- 始终使用 `skills/lab/SKILL.md` 作为工作流合同。\n- 用户显式调用 `/lab:<stage>` 时，要立刻执行该 stage，而不是只推荐别的 `/lab` stage。\n- 先给简洁摘要，再决定是否写工件，最后回报输出路径和下一步。\n- 如果歧义会影响结论，一次只问一个问题；如果有多条可行路径，先给 2-3 个方案再收敛。\n- `/lab:spec` 前应已有经批准的数据集与 benchmark 方案。\n- `/lab:run`、`/lab:iterate`、`/lab:auto`、`/lab:report` 都应遵循 `.lab/context/eval-protocol.md`。\n- `.lab/context/eval-protocol.md` 不只定义主指标和主表，也应定义指标释义、实验阶梯，以及指标和对比实现的来源。\n- `/lab:auto` 只编排已批准边界内的执行阶段，不替代手动的 idea/data/framing/spec 决策。\n- `/lab:write` 前必须已有经批准的 `/lab:framing` 工件。\n\n## 如何输入 `/lab:auto`\n\n- 把 `Autonomy level L1/L2/L3` 视为执行权限级别，不要和论文里的 layer、phase、table 编号混用。\n- 把 
`paper layer`、`phase`、`table` 视为实验目标。例如 `paper layer 3` 或 `Phase 1 reviewer fidelity` 不是 `Autonomy level L3`。\n- 一条好的 `/lab:auto` 输入应至少说清：objective、自治级别、terminal goal、scope、allowed modifications。\n- 如果 workflow language 是中文，摘要、清单条目、任务标签和进度更新都应使用中文，除非文件路径、代码标识符或字面指标名必须保持原样。\n- 示例：`/lab:auto 自治级别 L2。目标：推进 paper layer 3 的 organizer enforcement。终止条件：完成 bounded protocol、测试、最小实现和一轮小规模结果。允许修改：evaluator prompt registry、ingestion、parser。`\n"
 );
 ZH_CONTENT[path.join(".codex", "prompts", "lab-data.md")] = codexPrompt(
@@ -1395,14 +1402,14 @@ ZH_CONTENT[path.join(".codex", "prompts", "lab-data.md")] = codexPrompt(
 ZH_CONTENT[path.join(".codex", "prompts", "lab-auto.md")] = codexPrompt(
   "在已批准边界内编排自动实验循环",
   "auto mode objective",
-  "使用已安装的 `lab` 技能：`.codex/skills/lab/SKILL.md`。\n\n立刻针对用户当前给出的参数执行 `/lab:auto`，不要只推荐别的 `/lab` 阶段。只有在缺少阻塞性前提时，才明确指出缺什么，并且一次最多追问一个问题。\n\n本命令运行 `/lab:auto` 阶段。它必须读取 `.lab/context/eval-protocol.md`、`.lab/context/auto-mode.md`、`.lab/context/auto-status.md` 与 `.lab/context/auto-outcome.md`，先确认 autonomy level、approval status 与 terminal goal schema，再把 eval-protocol 里的指标释义、主表计划、来源约束与结构化实验阶梯当作执行依据，在不修改 mission、framing 和核心 claims 的前提下编排已批准的 `run`、`iterate`、`review`、`report`，轮询长任务完成情况；如果声明了 rung，就保持会话活着并按 rung 转移继续推进。"
+  "使用已安装的 `lab` 技能：`.codex/skills/lab/SKILL.md`。\n\n立刻针对用户当前给出的参数执行 `/lab:auto`，不要只推荐别的 `/lab` 阶段。只有在缺少阻塞性前提时，才明确指出缺什么，并且一次最多追问一个问题。\n\n本命令运行 `/lab:auto` 阶段。它必须读取 `.lab/context/eval-protocol.md`、`.lab/context/auto-mode.md`、`.lab/context/auto-status.md` 与 `.lab/context/auto-outcome.md`，先确认 autonomy level、approval status 与 terminal goal schema，再把 eval-protocol 里的指标释义、主表计划、来源约束与结构化实验阶梯当作执行依据，在不修改 mission、framing 和核心 claims 的前提下编排已批准的 `run`、`iterate`、`review`、`report`，轮询长任务完成情况；如果声明了 rung，就保持会话活着并按 rung 转移继续推进。\n如果仓库的 workflow language 是中文，摘要、清单条目、任务标签和进度更新都必须使用中文，除非某个文件路径、代码标识符或字面指标名必须保持原样。\n把 `Layer 3`、`Phase 1`、`Table 2` 这类表达视为论文范围目标；只有显式写成 `Autonomy level L3` 或 `自治级别 L3` 时，才把它当成执行权限级别。\n不要用 `sleep 30`、单次 `pgrep` 或一次性的 `metrics.json` 探针来代替真实长任务命令；当真实实验进程还活着时，只允许发进度更新并继续等待。"
 );
 ZH_CONTENT[path.join(".claude", "commands", "lab.md")] = claudeCommand(
   "LAB",
   "查看 /lab 研究工作流总览并选择合适阶段",
   "workflow, research, overview",
-  "# `/lab` for Claude\n\n`/lab` 是严格的研究工作流命令族。每次都使用同一套仓库工件和阶段边界。\n\n## 子命令\n\n- `/lab:idea`\n  调研 idea，定义问题与 failure case，归类 contribution 与 breakthrough level，对比现有方法，收束三个一眼就有意义的点，并在实现前保留 approval gate。\n\n- `/lab:data`\n  把已批准的 idea 转成数据集与 benchmark 方案，记录数据集年份、使用过该数据集的论文、下载来源、许可或访问限制，以及 classic-public、recent-strong-public、claim-specific 三类 benchmark 的纳入理由，和 canonical baselines、strong historical baselines、recent strong public methods、closest prior work 四类对比方法的纳入理由。\n\n- `/lab:auto`\n  在不改变 mission、framing 和核心 claims 的前提下，读取 eval-protocol 与 auto-mode 契约并自动编排 `run`、`iterate`、`review`、`report`，必要时扩展数据集、benchmark 和 comparison methods，并在满足升格策略时自动升级 primary package。启动前必须选定 autonomy level、声明 terminal goal，并显式批准契约。\n\n- `/lab:framing`\n  通过审计当前领域与相邻领域的术语，锁定 paper-facing 的方法名、模块名、论文题目和 contribution bullets，并在 section 起草前保留 approval gate。\n\n- `/lab:spec`\n  把已批准的 idea 转成 `.lab/changes/<change-id>/` 下的一个 lab change 目录，并在其中写出 `proposal`、`design`、`spec`、`tasks`。\n\n- `/lab:run`\n  执行最小有意义验证运行，登记 run，并生成第一版标准化评估摘要。\n\n- `/lab:iterate`\n  在冻结 mission、阈值、verification commands 与 `completion_promise` 的前提下执行有边界的实验迭代。\n\n- `/lab:review`\n  以 reviewer mode 审查文档或结果，先给短摘要，再输出 findings、fatal flaws、fix priority 和 residual risks。\n\n- `/lab:report`\n  从 runs 和 iterations 工件生成最终研究报告。\n\n- `/lab:write`\n  使用已安装 `lab` skill 下 vendored 的 paper-writing references，把稳定 report 工件转成论文 section。\n\n## 调度规则\n\n- 始终使用 `skills/lab/SKILL.md` 作为工作流合同。\n- 用户显式调用 `/lab:<stage>` 时，要立刻执行该 stage，而不是只推荐别的 `/lab` stage。\n- 先给简洁摘要，再决定是否写工件，最后回报输出路径和下一步。\n- 如果歧义会影响结论，一次只问一个问题；如果有多条可行路径，先给 2-3 个方案再收敛。\n- `/lab:spec` 前应已有经批准的数据集与 benchmark 方案。\n- `/lab:run`、`/lab:iterate`、`/lab:auto`、`/lab:report` 都应遵循 `.lab/context/eval-protocol.md`。\n- `.lab/context/eval-protocol.md` 不只定义主指标和主表，也应定义指标释义、实验阶梯，以及指标和对比实现的来源。\n- `/lab:auto` 只编排已批准边界内的执行阶段，不替代手动的 idea/data/framing/spec 决策。\n- `/lab:write` 前必须已有经批准的 `/lab:framing` 工件。\n"
+  "# `/lab` for Claude\n\n`/lab` 是严格的研究工作流命令族。每次都使用同一套仓库工件和阶段边界。\n\n## 子命令\n\n- `/lab:idea`\n  调研 idea，定义问题与 failure case，归类 contribution 与 breakthrough level，对比现有方法，收束三个一眼就有意义的点，并在实现前保留 approval gate。\n\n- `/lab:data`\n  把已批准的 idea 转成数据集与 benchmark 方案，记录数据集年份、使用过该数据集的论文、下载来源、许可或访问限制，以及 classic-public、recent-strong-public、claim-specific 三类 benchmark 的纳入理由，和 canonical baselines、strong historical baselines、recent strong public methods、closest prior work 四类对比方法的纳入理由。\n\n- `/lab:auto`\n  在不改变 mission、framing 和核心 claims 的前提下，读取 eval-protocol 与 auto-mode 契约并自动编排 `run`、`iterate`、`review`、`report`，必要时扩展数据集、benchmark 和 comparison methods，并在满足升格策略时自动升级 primary package。启动前必须选定 autonomy level、声明 terminal goal，并显式批准契约。\n\n- `/lab:framing`\n  通过审计当前领域与相邻领域的术语，锁定 paper-facing 的方法名、模块名、论文题目和 contribution bullets，并在 section 起草前保留 approval gate。\n\n- `/lab:spec`\n  把已批准的 idea 转成 `.lab/changes/<change-id>/` 下的一个 lab change 目录，并在其中写出 `proposal`、`design`、`spec`、`tasks`。\n\n- `/lab:run`\n  执行最小有意义验证运行，登记 run，并生成第一版标准化评估摘要。\n\n- `/lab:iterate`\n  在冻结 mission、阈值、verification commands 与 `completion_promise` 的前提下执行有边界的实验迭代。\n\n- `/lab:review`\n  以 reviewer mode 审查文档或结果，先给短摘要，再输出 findings、fatal flaws、fix priority 和 residual risks。\n\n- `/lab:report`\n  从 runs 和 iterations 工件生成最终研究报告。\n\n- `/lab:write`\n  使用已安装 `lab` skill 下 vendored 的 paper-writing references，把稳定 report 工件转成论文 section。\n\n## 调度规则\n\n- 始终使用 `skills/lab/SKILL.md` 作为工作流合同。\n- 用户显式调用 `/lab:<stage>` 时，要立刻执行该 stage，而不是只推荐别的 `/lab` stage。\n- 先给简洁摘要，再决定是否写工件，最后回报输出路径和下一步。\n- 如果歧义会影响结论，一次只问一个问题；如果有多条可行路径，先给 2-3 个方案再收敛。\n- `/lab:spec` 前应已有经批准的数据集与 benchmark 方案。\n- `/lab:run`、`/lab:iterate`、`/lab:auto`、`/lab:report` 都应遵循 `.lab/context/eval-protocol.md`。\n- `.lab/context/eval-protocol.md` 不只定义主指标和主表，也应定义指标释义、实验阶梯，以及指标和对比实现的来源。\n- `/lab:auto` 只编排已批准边界内的执行阶段，不替代手动的 idea/data/framing/spec 决策。\n- `/lab:write` 前必须已有经批准的 `/lab:framing` 工件。\n\n## 如何输入 `/lab:auto`\n\n- 把 `Autonomy level L1/L2/L3` 视为执行权限级别，不要和论文里的 layer、phase、table 编号混用。\n- 把 
`paper layer`、`phase`、`table` 视为实验目标。例如 `paper layer 3` 或 `Phase 1 reviewer fidelity` 不是 `Autonomy level L3`。\n- 一条好的 `/lab:auto` 输入应至少说清：objective、自治级别、terminal goal、scope、allowed modifications。\n- 如果 workflow language 是中文，摘要、清单条目、任务标签和进度更新都应使用中文，除非文件路径、代码标识符或字面指标名必须保持原样。\n- 示例：`/lab:auto 自治级别 L2。目标：推进 paper layer 3 的 organizer enforcement。终止条件：完成 bounded protocol、测试、最小实现和一轮小规模结果。允许修改：evaluator prompt registry、ingestion、parser。`\n"
 );
 ZH_CONTENT[path.join(".claude", "commands", "lab", "data.md")] = claudeCommand(
@@ -1416,7 +1423,7 @@ ZH_CONTENT[path.join(".claude", "commands", "lab", "auto.md")] = claudeCommand(
   "LAB: Auto",
   "在已批准边界内编排自动实验循环",
   "workflow, research, auto",
-  "使用已安装的 `lab` 技能：`.claude/skills/lab/SKILL.md`。\n\n立刻针对用户当前给出的参数执行 `/lab:auto`，不要只推荐别的 `/lab` 阶段。只有在缺少阻塞性前提时，才明确指出缺什么，并且一次最多追问一个问题。\n\n本命令运行 `/lab:auto` 阶段。它必须读取 `.lab/context/eval-protocol.md`、`.lab/context/auto-mode.md`、`.lab/context/auto-status.md` 与 `.lab/context/auto-outcome.md`，先确认 autonomy level、approval status 与 terminal goal schema，再把 eval-protocol 里的指标释义、主表计划、来源约束与结构化实验阶梯当作执行依据，在不修改 mission、framing 和核心 claims 的前提下编排已批准的 `run`、`iterate`、`review`、`report`，轮询长任务完成情况；如果声明了 rung，就保持会话活着并按 rung 转移继续推进。"
+  "使用已安装的 `lab` 技能：`.claude/skills/lab/SKILL.md`。\n\n立刻针对用户当前给出的参数执行 `/lab:auto`，不要只推荐别的 `/lab` 阶段。只有在缺少阻塞性前提时，才明确指出缺什么，并且一次最多追问一个问题。\n\n本命令运行 `/lab:auto` 阶段。它必须读取 `.lab/context/eval-protocol.md`、`.lab/context/auto-mode.md`、`.lab/context/auto-status.md` 与 `.lab/context/auto-outcome.md`，先确认 autonomy level、approval status 与 terminal goal schema，再把 eval-protocol 里的指标释义、主表计划、来源约束与结构化实验阶梯当作执行依据，在不修改 mission、framing 和核心 claims 的前提下编排已批准的 `run`、`iterate`、`review`、`report`，轮询长任务完成情况；如果声明了 rung，就保持会话活着并按 rung 转移继续推进。\n如果仓库的 workflow language 是中文，摘要、清单条目、任务标签和进度更新都必须使用中文，除非某个文件路径、代码标识符或字面指标名必须保持原样。\n把 `Layer 3`、`Phase 1`、`Table 2` 这类表达视为论文范围目标；只有显式写成 `Autonomy level L3` 或 `自治级别 L3` 时，才把它当成执行权限级别。\n不要用 `sleep 30`、单次 `pgrep` 或一次性的 `metrics.json` 探针来代替真实长任务命令；当真实实验进程还活着时，只允许发进度更新并继续等待。"
 );
 ZH_CONTENT[path.join(".codex", "skills", "lab", "SKILL.md")] = `---
@@ -1939,6 +1946,11 @@ ZH_CONTENT[path.join(".codex", "skills", "lab", "stages", "auto.md")] = `# \`/la
 - 如果契约本身不完整，一次只追问一个问题。
 - 如果存在多个可信的下一动作，先给 2-3 个 bounded 方案和推荐项，再启动长任务。
 - 只有当下一步会离开已批准的 exploration envelope、超出选定 autonomy level，或实质改变 frozen core 时，才保留人工 approval gate。
+- 先做输入归一化：把 \`Autonomy level L1/L2/L3\` 视为执行权限级别，把 \`Layer 3\`、\`Phase 1\`、\`Table 2\` 视为论文范围目标。
+- 如果用户同时提了论文层、实验 phase 和自治级别，先用一句话重述：objective、自治级别、terminal goal、scope、allowed modifications。
+- 如果 workflow language 是中文，摘要、清单条目、任务标签和进度更新都应使用中文，除非文件路径、代码标识符或字面指标名必须保持原样。
+- 不要把 \`sleep 30\`、单次 \`pgrep\` 或一次性的 \`metrics.json\` 探针当成 rung 主命令；这些只能算进度检查。
+- 当真实实验进程还活着时，只允许发进度更新并继续等待，不能把这一 rung 当作已经完成。
 `;
 ZH_CONTENT[path.join(".claude", "skills", "lab", "stages", "auto.md")] =

package/package-assets/claude/commands/lab/auto.md CHANGED

@@ -9,3 +9,6 @@ Use the installed `lab` skill at `.claude/skills/lab/SKILL.md`.
 Execute the requested `/lab:auto` stage against the user's argument now. Do not only recommend another lab stage. If a blocking prerequisite is missing, say exactly what is missing and ask at most one clarifying question.
 This command runs the `/lab:auto` stage. It must read `.lab/context/eval-protocol.md`, `.lab/context/auto-mode.md`, `.lab/context/auto-status.md`, and `.lab/context/auto-outcome.md`, enforce the declared terminal goal schema, orchestrate approved run, iterate, review, and report stages inside that contract, poll long-running work until completion or stop conditions, and write progress plus the final outcome back into `.lab/context/auto-status.md` and `.lab/context/auto-outcome.md`.
+When the repository workflow language is Chinese, summaries, checklist items, task labels, and progress updates should be written in Chinese unless a literal identifier must stay unchanged.
+Treat `Layer 3`, `Phase 1`, or `Table 2` as paper-scope targets. Treat `Autonomy level L3` as the execution permission level.
+Do not replace the real long-running experiment command with a short watcher such as `sleep 30`, `pgrep`, or a one-shot `metrics.json` probe. While the real experiment process is still alive, emit only a progress update and keep waiting.
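The waiting rule added above can be sketched as a small decision function. This is only an illustration under stated assumptions: `isProcessAlive` and `decidePollAction` are hypothetical names that do not appear in superlab, and the timeout branch is borrowed from the stage skill's "finish, hit a timeout, or hit a stop condition" wording rather than from this command file.

```javascript
// Hypothetical helpers for illustration; not part of the superlab package.

// Returns true when a process with this pid exists.
// Signal 0 sends nothing and only performs the existence check.
function isProcessAlive(pid) {
  try {
    process.kill(pid, 0);
    return true;
  } catch (err) {
    // EPERM means the process exists but we are not allowed to signal it.
    return err.code === "EPERM";
  }
}

// Decide what a single poll tick may do for the current rung.
function decidePollAction({ alive, stopRequested, elapsedMs, timeoutMs }) {
  if (stopRequested) return "stop";
  if (alive && elapsedMs >= timeoutMs) return "timeout";
  // The real experiment is still running: report progress and keep waiting.
  if (alive) return "progress-update";
  // The real command has exited, so the rung can now be evaluated.
  return "finalize";
}
```

The point of the sketch is that liveness is checked against the real experiment's own process, so a short watcher exiting never moves the rung to `finalize`.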

package/package-assets/claude/commands/lab.md CHANGED

@@ -54,3 +54,17 @@ tags: [workflow, research, overview]
 - `/lab:run`, `/lab:iterate`, `/lab:auto`, and `/lab:report` should all follow `.lab/context/eval-protocol.md`, including its recorded sources for metrics and comparison implementations.
 - `/lab:write` requires an approved framing artifact from `/lab:framing`.
 - `/lab:write` requires stable report artifacts, a mini-outline, the active section guide, `paper-review.md`, and `does-my-writing-flow-source.md`, and should only change one section per round.
+## How to Ask for `/lab:auto`
+- Treat `Autonomy level L1/L2/L3` as the execution privilege level, not as a paper layer, phase, or table number.
+- Treat `paper layer`, `phase`, and `table` as experiment targets. For example, `paper layer 3` or `Phase 1 reviewer fidelity` should not be interpreted as `Autonomy level L3`.
+- A good `/lab:auto` request should name:
+  - the objective
+  - the autonomy level
+  - the terminal goal
+  - the scope or phase to advance
+  - the allowed modifications
+- If the repository workflow language is Chinese, summaries, checklist items, task labels, and progress updates should be written in Chinese unless a code identifier or file path must stay literal.
+- Good example:
+  - `/lab:auto Autonomy level L2. Objective: advance paper layer 3 organizer enforcement. Terminal goal: task-completion. Scope: bounded protocol, tests, minimal implementation, and one small run. Allowed modifications: evaluator prompt registry, ingestion, and parser only.`
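The five-item checklist above could be checked mechanically before a run is armed. The sketch below is a rough illustration, not superlab's behavior: `missingAutoFields` is a hypothetical name, and the `Label: value` patterns are an assumption inferred from the "Good example" line only.

```javascript
// Hypothetical checker; field patterns are inferred from the example request only.
const REQUIRED_FIELDS = [
  ["objective", /\bObjective:\s*\S/i],
  ["autonomy level", /\bAutonomy level\s+L[1-3]\b/i],
  ["terminal goal", /\bTerminal goal:\s*\S/i],
  ["scope", /\bScope:\s*\S/i],
  ["allowed modifications", /\bAllowed modifications:\s*\S/i],
];

// Returns the names of the checklist fields that the request does not state.
function missingAutoFields(request) {
  return REQUIRED_FIELDS
    .filter(([, pattern]) => !pattern.test(request))
    .map(([name]) => name);
}
```

Run against the "Good example" request, the function returns an empty list. A bare `/lab:auto Layer 3 organizer enforcement` is reported as missing all five fields.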

package/package-assets/codex/prompts/lab-auto.md CHANGED

@@ -7,3 +7,6 @@ Use the installed `lab` skill at `.codex/skills/lab/SKILL.md`.
 Execute the requested `/lab:auto` stage against the user's argument now. Do not only recommend another lab stage. If a blocking prerequisite is missing, say exactly what is missing and ask at most one clarifying question.
 This command runs the `/lab:auto` stage. It must read `.lab/context/eval-protocol.md`, `.lab/context/auto-mode.md`, `.lab/context/auto-status.md`, and `.lab/context/auto-outcome.md`, enforce the declared terminal goal schema, orchestrate approved run, iterate, review, and report stages inside that contract, poll long-running work until completion or stop conditions, and write progress plus the final outcome back into `.lab/context/auto-status.md` and `.lab/context/auto-outcome.md`.
+When the repository workflow language is Chinese, summaries, checklist items, task labels, and progress updates should be written in Chinese unless a literal identifier must stay unchanged.
+Treat `Layer 3`, `Phase 1`, or `Table 2` as paper-scope targets. Treat `Autonomy level L3` as the execution permission level.
+Do not replace the real long-running experiment command with a short watcher such as `sleep 30`, `pgrep`, or a one-shot `metrics.json` probe. While the real experiment process is still alive, emit only a progress update and keep waiting.

package/package-assets/codex/prompts/lab.md CHANGED

@@ -52,3 +52,17 @@ argument-hint: workflow question or stage choice
 - `/lab:run`, `/lab:iterate`, `/lab:auto`, and `/lab:report` should all follow `.lab/context/eval-protocol.md`, including its recorded sources for metrics and comparison implementations.
 - `/lab:write` requires an approved framing artifact from `/lab:framing`.
 - `/lab:write` requires stable report artifacts, a mini-outline, the active section guide, `paper-review.md`, and `does-my-writing-flow-source.md`, and should only change one section per round.
+## How to Ask for `/lab:auto`
+- Treat `Autonomy level L1/L2/L3` as the execution privilege level, not as a paper layer, phase, or table number.
+- Treat `paper layer`, `phase`, and `table` as experiment targets. For example, `paper layer 3` or `Phase 1 reviewer fidelity` should not be interpreted as `Autonomy level L3`.
+- A good `/lab:auto` request should name:
+  - the objective
+  - the autonomy level
+  - the terminal goal
+  - the scope or phase to advance
+  - the allowed modifications
+- If the repository workflow language is Chinese, summaries, checklist items, task labels, and progress updates should be written in Chinese unless a code identifier or file path must stay literal.
+- Good example:
+  - `/lab:auto Autonomy level L2. Objective: advance paper layer 3 organizer enforcement. Terminal goal: task-completion. Scope: bounded protocol, tests, minimal implementation, and one small run. Allowed modifications: evaluator prompt registry, ingestion, and parser only.`

package/package-assets/shared/lab/context/auto-mode.md CHANGED

@@ -6,14 +6,18 @@ If `eval-protocol.md` declares structured rung entries, auto mode follows those
 ## Objective
-- Objective:
+- Objective:
 - Autonomy level: L2
+- Autonomy level controls execution privilege, not paper layer or table number.
+- If you mean a paper layer, phase, or table, spell it explicitly as `paper layer`, `phase`, or `table`.
 - Approval status: draft
 - Allowed stages: run, iterate, review, report
 - Success criteria:
 - Terminal goal type:
 - Terminal goal target:
 - Required terminal artifact:
+- If the workflow language is Chinese, keep summaries, checklist items, task labels, and progress updates in Chinese.
+- Example objective: advance paper layer 3 organizer enforcement through one bounded protocol, tests, minimal implementation, and one small run.
 ## Loop Budget
@@ -24,6 +28,9 @@ If `eval-protocol.md` declares structured rung entries, auto mode follows those
 ## Stage Commands
+- Rung `Command` should be the real long-running command that owns the experiment result.
+- A short watcher is only a progress probe. Do not use a short watcher as the stage or rung command when the real experiment is still running.
+- While the real experiment process is still alive, only record a progress update and keep waiting.
 - Run command:
 - Iterate command:
 - Review command:

package/package-assets/shared/skills/lab/stages/auto.md CHANGED

@@ -54,6 +54,8 @@
 - You may promote exploratory additions to the primary package only when the contract's promotion policy is satisfied and the promotion is written back into `data-decisions.md`, `decisions.md`, `state.md`, and `session-brief.md`.
 - Poll long-running commands until they finish, hit a timeout, or hit a stop condition.
 - Keep a poll-based waiting loop instead of sleeping blindly.
+- Do not treat a short watcher such as `sleep 30`, a one-shot `pgrep`, or a single `metrics.json` probe as the rung command when the real experiment is still running.
+- Bind each rung to the real long-running command or process that owns the experiment result.
 - Always write a canonical `.lab/context/auto-outcome.md` when the run completes, stops, or fails.
 - When the evaluation protocol declares structured ladder rungs, execute them as a foreground rung state machine:
   - each rung must declare `Stage`, `Goal`, `Command`, `Watch`, `Gate`, `On pass`, `On fail`, and `On stop`
@@ -87,3 +89,21 @@
 - If the contract is incomplete, ask one clarifying question at a time.
 - If multiple next actions are credible, present 2-3 bounded options with trade-offs before arming a long run.
 - Only ask for approval when the next step would leave the approved exploration envelope, exceed the chosen autonomy level, or materially change the frozen core.
+## Input Normalization
+- Normalize ambiguous user requests before arming the loop.
+- Treat `Autonomy level L1/L2/L3` as execution privilege only.
+- Treat `Layer`, `Phase`, and `Table` references as paper-structure or experiment-scope targets, not as autonomy levels.
+- Example:
+  - `Layer 3 organizer enforcement` means a paper layer or experiment target.
+  - `Autonomy level L3` means the aggressive campaign permission envelope.
+- If the user mixes framework work and experiment work in one request, restate a normalized contract with:
+  - objective
+  - autonomy level
+  - terminal goal
+  - scope
+  - allowed modifications
+- Then ask at most one clarifying question if a blocking field is still missing.
+- If `.lab/config/workflow.json` sets the workflow language to Chinese, write summaries, options, checklist items, task labels, and progress updates in Chinese unless a file path, code identifier, or literal metric name must remain unchanged.
+- While the real experiment process is still alive, emit only a progress update and keep waiting. Do not present a terminal summary for that rung until the process exits or the rung hits an explicit stop boundary.
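The normalization rule in this section can be sketched as follows. This is a minimal sketch, assuming the distinction can be drawn with plain pattern matching: `normalizeAutoRequest` is a hypothetical name, the regular expressions are illustrative, and nothing in the diff shows that superlab implements the rule in code rather than as prompt guidance.

```javascript
// Hypothetical normalizer; superlab states this rule as prompt text, not code.
function normalizeAutoRequest(text) {
  // Only an explicit "Autonomy level Ln" or "自治级别 Ln" counts as execution privilege.
  const autonomy = text.match(/(?:\bAutonomy level|自治级别)\s*(L[1-3])\b/i);

  // "Layer 3", "Phase 1", "Table 2" (optionally prefixed by "paper") are paper-scope targets.
  const paperTargets = [];
  for (const m of text.matchAll(/\b(?:paper\s+)?(layer|phase|table)\s+(\d+)\b/gi)) {
    paperTargets.push({ kind: m[1].toLowerCase(), number: Number(m[2]) });
  }

  return {
    autonomyLevel: autonomy ? autonomy[1].toUpperCase() : null,
    paperTargets,
  };
}
```

With this split, `Layer 3 organizer enforcement` yields no autonomy level and one `layer` target, so a caller would still have to ask for the autonomy level instead of guessing `L3`.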

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "superlab",
-  "version": "0.1.13",
+  "version": "0.1.14",
   "description": "Strict /lab research workflow installer for Codex and Claude",
   "keywords": [
     "codex",