superlab 0.1.39 → 0.1.41

This diff shows the contents of publicly released package versions as they appear in their public registries. It is provided for informational purposes only.
Files changed (28)
  1. package/lib/i18n.cjs +88 -11
  2. package/lib/install.cjs +2 -0
  3. package/lib/lab_write_contract.json +2 -2
  4. package/package-assets/claude/commands/lab-auto.md +1 -1
  5. package/package-assets/claude/commands/lab-write.md +1 -1
  6. package/package-assets/claude/commands/lab.md +2 -1
  7. package/package-assets/codex/prompts/lab-auto.md +1 -1
  8. package/package-assets/codex/prompts/lab-write.md +1 -1
  9. package/package-assets/codex/prompts/lab.md +2 -1
  10. package/package-assets/shared/lab/.managed/templates/artifact-status.md +8 -0
  11. package/package-assets/shared/lab/.managed/templates/final-report.md +8 -0
  12. package/package-assets/shared/lab/.managed/templates/idea.md +2 -0
  13. package/package-assets/shared/lab/.managed/templates/review-checklist.md +6 -0
  14. package/package-assets/shared/lab/.managed/templates/spec.md +6 -0
  15. package/package-assets/shared/lab/.managed/templates/terminology-glossary.md +6 -4
  16. package/package-assets/shared/lab/.managed/templates/write-iteration.md +11 -1
  17. package/package-assets/shared/lab/context/auto-mode.md +5 -0
  18. package/package-assets/shared/lab/system/core.md +1 -0
  19. package/package-assets/shared/skills/lab/SKILL.md +10 -2
  20. package/package-assets/shared/skills/lab/references/recipes.md +38 -0
  21. package/package-assets/shared/skills/lab/references/workflow.md +2 -0
  22. package/package-assets/shared/skills/lab/stages/auto.md +9 -1
  23. package/package-assets/shared/skills/lab/stages/idea.md +4 -0
  24. package/package-assets/shared/skills/lab/stages/report.md +2 -0
  25. package/package-assets/shared/skills/lab/stages/review.md +4 -0
  26. package/package-assets/shared/skills/lab/stages/spec.md +3 -0
  27. package/package-assets/shared/skills/lab/stages/write.md +5 -3
  28. package/package.json +1 -1
package/lib/i18n.cjs CHANGED
@@ -473,6 +473,7 @@ const ZH_SKILL_FILES = {
473
473
  - 不要把 idea 工件本身当成唯一证据记录;两轮文献检索的查询、来源分桶和最终来源数必须同步记到 \`.lab/writing/idea-source-log.md\`。
474
474
  - 在 \`idea\` 阶段可以说大致怎么验证、最小实验是什么,但不要在这里冻结 sample size、招募方案、条件数、问卷设计或随机化 protocol。
475
475
  - human-subject experiment design 应该留到 \`/lab:spec\`,在那里再把招募、分组、测量和伦理细节写死。
476
+ - 使用 \`skills/lab/references/recipes.md\` 作为常见阶段链路的快速路径,不要据此发明新的命令或别名。
476
477
  - 三个 meaningful points 每个都控制在一句直接的话里。
477
478
  - 在批准前,必须运行 \`.lab/.managed/scripts/validate_idea_artifact.py --idea <idea-artifact> --source-log .lab/writing/idea-source-log.md --workflow-config .lab/config/workflow.json\`。
478
479
  - rewrite-only 模式下不能更新 \`.lab/context/mission.md\`、\`.lab/context/decisions.md\` 或 \`.lab/context/open-questions.md\`。
@@ -518,6 +519,46 @@ const ZH_SKILL_FILES = {
518
519
  - 主指标是否符合顶会惯例
519
520
  - claims 是否有记录在案的证据支撑
520
521
  - failures 和 limitations 是否被保留
522
+ `,
523
+ [path.join(".codex", "skills", "lab", "references", "recipes.md")]:
524
+ `# /lab 阶段快速路径
525
+
526
+ 把这个文件当成常见阶段链路的快速路由图。它不会新增命令,也不会替代各 stage 的正式合同。
527
+
528
+ ## idea -> data -> spec
529
+
530
+ 当问题还在筛选或收窄时,走这条路径。
531
+
532
+ - 从 \`/lab:idea\` 开始,对候选方向做比较、文献检索,并在 approval gate 处收口。
533
+ - idea 批准后,如果需要冻结 benchmark 方案,就进入 \`/lab:data\`。
534
+ - idea 和 data package 都批准后,如果需要把变更写成可执行设计,就进入 \`/lab:spec\`。
535
+
536
+ ## run -> iterate -> review -> report
537
+
538
+ 当 mission 已批准、当前工作变成证据生成时,走这条路径。
539
+
540
+ - 从 \`/lab:run\` 开始,先做最小有意义验证。
541
+ - 当 campaign 需要在冻结 mission 内做 bounded rounds 时,进入 \`/lab:iterate\`。
542
+ - 当方法学、公平性、泄漏风险或 claim discipline 需要 reviewer 式审查时,进入 \`/lab:review\`。
543
+ - 当验证过的证据已经稳定到可以给协作者阅读时,进入 \`/lab:report\`。
544
+
545
+ ## framing -> write -> review
546
+
547
+ 当 report 工件已经稳定、论文叙事需要定稿时,走这条路径。
548
+
549
+ - 从 \`/lab:framing\` 开始,锁定 paper-facing 名称、题目方向和 contribution wording。
550
+ - 进入 \`/lab:write\`,基于已批准的 framing 和 report 证据逐 section 起草稿件。
551
+ - 当 section 草稿或整篇论文的 claim set 需要 reviewer mode critique 时,回到 \`/lab:review\`。
552
+
553
+ ## \`/lab:auto\` 按已批准范围使用
554
+
555
+ 只有在上游阶段决策都已经批准后,才使用 \`/lab:auto\`。
556
+
557
+ - \`L1\`:针对已批准执行阶段的一次 bounded validation cycle
558
+ - \`L2\`:在冻结核心边界内做默认的 bounded experiment iteration
559
+ - \`L3\`:更大范围的 campaign;只有 framing 已批准时才可选带写作
560
+
561
+ \`/lab:auto\` 是对已批准阶段链路的编排层,不替代 \`idea\`、\`data\`、\`framing\` 或 \`spec\`。
521
562
  `,
522
563
  [path.join(".lab", ".managed", "templates", "idea.md")]:
523
564
  `# Idea 工件
@@ -1325,10 +1366,12 @@ const ZH_SKILL_FILES = {
1325
1366
  - Key terms introduced or revised this round:
1326
1367
  - Was \`.lab/writing/terminology-glossary.md\` updated:
1327
1368
  - Did first mention use the full form, with an approved short form only after that:
1369
+ - Did each concept keep one natural-language paper-facing name throughout the prose:
1328
1370
  - For each term, what it is and why it matters here:
1329
1371
  - Did any internal identifier leak into reader-facing prose:
1330
- - Did any newly coined label joined by hyphens remain in reader-facing prose:
1372
+ - Did any label containing \`_\` or \`-\` remain in reader-facing prose:
1331
1373
  - Were the first mentions explained in the prose:
1374
+ - Did any alias drift remain unresolved:
1332
1375
  - Remaining reader-facing jargon risk:
1333
1376
 
1334
1377
  ## Language Decision
@@ -1353,9 +1396,10 @@ const ZH_SKILL_FILES = {
1353
1396
  ## 命名规则
1354
1397
 
1355
1398
  - 第一次出现先写全称;如果后面要复用简称或缩写,就在第一次出现时写成 \`全称(简称)\`。
1356
- - 同一个概念只能有一个 canonical paper-facing name。
1357
- - 内部标识符、配置键和实验包标签默认不要进正文;若必须出现,也只能先给读者做一次映射。
1358
- - 尽量使用自然短语,不要新造连字符拼接标签。
1399
+ - 同一个概念只能有一个自然语言 paper-facing 名称。
1400
+ - 正文里只用自然语言全称。
1401
+ - 正文里不要使用包含 \`_\` 或 \`-\` 的标签名。
1402
+ - 内部标识符、配置键和实验包标签默认不要进正文;若必须出现,也只能先给读者做一次映射,然后移回正文之外。
1359
1403
 
1360
1404
  ## Entries
1361
1405
 
@@ -1367,7 +1411,8 @@ const ZH_SKILL_FILES = {
1367
1411
  - Reader-facing explanation:
1368
1412
  - Why this term matters here:
1369
1413
  - First-use section:
1370
- - Allowed aliases:
1414
+ - Approved short natural-language alias, if any:
1415
+ - Forbidden aliases or labels:
1371
1416
  - Internal identifiers to avoid in prose:
1372
1417
 
1373
1418
  ## Audit
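The glossary hunks above tighten the naming rule from "avoid newly coined hyphenated labels" to "no label containing `_` or `-` in reader-facing prose". A minimal sketch of how such a check could be automated follows. The function name `findJoinedLabels` and the decision to skip backticked spans are assumptions made for illustration; nothing in this diff shows that the package ships such a checker.

```javascript
// Hypothetical checker (not part of superlab): list tokens in reader-facing
// prose that contain `_` or `-`, as the revised glossary rule forbids.
function findJoinedLabels(prose) {
  // Assumption: inline code spans are not reader-facing prose, so drop them.
  const withoutCode = prose.replace(/`[^`]*`/g, " ");
  // ASCII word runs joined by at least one `_` or `-`.
  const pattern = /[A-Za-z0-9]+(?:[_-][A-Za-z0-9]+)+/g;
  // De-duplicate while keeping first-seen order.
  return [...new Set(withoutCode.match(pattern) ?? [])];
}

const sample =
  "The cross_attn_v2 module and the gated-fusion block read `cfg_key`.";
console.log(findJoinedLabels(sample));
```

Taken literally, the rule also flags ordinary hyphenated English words, so a real check would probably need an allowlist; the sketch deliberately does not guess at one.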
@@ -1537,6 +1582,11 @@ const ZH_SKILL_FILES = {
1537
1582
  - Terminal goal type:
1538
1583
  - Terminal goal target:
1539
1584
  - Required terminal artifact:
1585
+ - Primary gate:
1586
+ - Secondary guard:
1587
+ - Promotion condition:
1588
+ - Stop reason:
1589
+ - Escalation reason:
1540
1590
  - 如果 workflow language 是中文,摘要、清单条目、任务标签和进度更新都应使用中文。
1541
1591
  - 示例 Objective: 推进 paper layer 3,完成一轮 bounded protocol、测试、最小实现和一轮小规模结果。
1542
1592
 
@@ -1952,8 +2002,10 @@ ZH_CONTENT[path.join(".codex", "skills", "lab", "stages", "write.md")] = `# \`/l
1952
2002
  - 把 \`.lab/writing/terminology-glossary.md\` 当成写作期 glossary,用来沉淀全称、批准缩写、对外解释和可接受别名;它和 \`.lab/context/terminology-lock.md\` 的职责不同,不能混用。
1953
2003
  - 无论当前语言是什么,都要满足同一套学术可读性标准;如果本轮引入或改写了关键术语、缩写、指标名、机制名或系统标签,就在首次出现时顺手说明它是什么、为什么在这里重要。
1954
2004
  - 第一次出现先写全称;如果后面要复用简称或缩写,就在首次出现时定义。
1955
- - 同一个概念只能保留一个 paper-facing 名称;内部标识符、配置键和实验包标签默认不要进入 reader-facing prose。
1956
- - 尽量使用自然短语,不要新造连字符拼接标签。
2005
+ - 同一个概念只保留一个自然语言 paper-facing 名称。
2006
+ - 正文里使用自然语言全称;如果后面要复用简称,就先在第一次出现时定义。
2007
+ - reader-facing prose 里不要使用包含 \`_\` 或 \`-\` 的标签名。
2008
+ - 内部标识符、配置键和实验包标签默认不要进正文;若必须出现,也只能给读者映射一次,然后移回正文之外。
1957
2009
  - 不要靠堆术语来制造学术感;如果本轮引入或改写了关键术语,就在用户可见的轮次总结里补一段简短术语说明,并把 terminology-clarity 自检写进 write iteration artifact。
1958
2010
  - 只加载当前 section guide,不要一次加载全部章节参考。
1959
2011
  - 如果当前 section 是 \`abstract\`、\`introduction\` 或 \`method\`,还必须继续读取本地 example bank:\`references/paper-writing/examples/index.md\`、对应的 examples index,以及 1-2 个具体 example 文件。
@@ -2077,7 +2129,7 @@ ZH_CONTENT[path.join(".lab", ".managed", "templates", "framing.md")] = `# 论文
2077
2129
  ZH_CONTENT[path.join(".codex", "prompts", "lab.md")] = codexPrompt(
2078
2130
  "查看 /lab 研究工作流总览并选择合适阶段",
2079
2131
  "workflow question 或 stage choice",
2080
- "# `/lab` for Codex\n\n`/lab` 是严格的研究工作流命令族。每次都使用同一套仓库工件和阶段边界。\n\n## 子命令\n\n- `/lab:idea`\n 先做两轮脑暴和两轮文献检索,再定义问题与 failure case、对比最接近前作,并输出带 approval gate 的 source-backed recommendation。\n\n- `/lab:data`\n 把已批准的 idea 转成数据集与 benchmark 方案,记录数据集年份、使用过该数据集的论文、下载来源、许可或访问限制,以及 classic-public、recent-strong-public、claim-specific 三类 benchmark 的纳入理由,和 canonical baselines、strong historical baselines、recent strong public methods、closest prior work 四类对比方法的纳入理由。\n\n- `/lab:auto`\n 在不改变 mission、framing 和核心 claims 的前提下,读取 eval-protocol 与 auto-mode 契约并自动编排 `run`、`iterate`、`review`、`report`,必要时扩展数据集、benchmark 和 comparison methods,并在满足升格策略时自动升级 primary package。启动前必须选定 autonomy level、声明 terminal goal,并显式批准契约。\n\n- `/lab:framing`\n 通过审计当前领域与相邻领域的术语,锁定 paper-facing 的方法名、模块名、论文题目和 contribution bullets,并在 section 起草前保留 approval gate。\n\n- `/lab:spec`\n 把已批准的 idea 转成 `.lab/changes/<change-id>/` 下的一个 lab change 目录,并在其中写出 `proposal`、`design`、`spec`、`tasks`。\n\n- `/lab:run`\n 执行最小有意义验证运行,登记 run,并生成第一版标准化评估摘要。\n\n- `/lab:iterate`\n 在冻结 mission、阈值、verification commands 与 `completion_promise` 的前提下执行有边界的实验迭代。\n\n- `/lab:review`\n 以 reviewer mode 审查文档或结果,先给短摘要,再输出 findings、fatal flaws、fix priority 和 residual risks。\n\n- `/lab:report`\n 从 runs 和 iterations 工件生成最终研究报告。\n\n- `/lab:write`\n 使用已安装 `lab` skill 下 vendored 的 paper-writing references,把稳定 report 工件转成论文 section。\n\n## 调度规则\n\n- 始终使用 `skills/lab/SKILL.md` 作为工作流合同。\n- 用户显式调用 `/lab:<stage>` 时,要立刻执行该 stage,而不是只推荐别的 `/lab` stage。\n- 先给简洁的阶段摘要;只要 stage 合同要求受管工件,就应立刻落盘,再回报输出路径和下一步。\n- 如果歧义会影响结论,一次只问一个问题;如果有多条可行路径,先给 2-3 个方案再收敛。\n- `/lab:spec` 前应已有经批准的数据集与 benchmark 方案。\n- `/lab:run`、`/lab:iterate`、`/lab:auto`、`/lab:report` 都应遵循 `.lab/context/eval-protocol.md`。\n- `.lab/context/eval-protocol.md` 不只定义主指标和主表,也应定义指标释义、实验阶梯,以及指标和对比实现的来源。\n- `/lab:auto` 只编排已批准边界内的执行阶段,不替代手动的 idea/data/framing/spec 决策。\n- `/lab:write` 前必须已有经批准的 `/lab:framing` 工件。\n\n## 如何输入 `/lab:auto`\n\n## `/lab:auto` 层级指南\n\n- `L1`:适合安全验证、一轮 bounded 真实运行,或简单 report 刷新。\n- 
`L2`:默认推荐级别,适合冻结核心边界内的常规实验迭代。\n- `L3`:激进 campaign 级别,只在你明确想做更大范围探索和可选写作时使用。\n- 如果不确定,默认推荐 `L2`。\n- 如果用户输入没写级别,或者把级别和 `paper layer`、`phase`、`table` 混用了,就应先停下来,要求用户明确选 `L1/L2/L3`。\n\n- 把 `Autonomy level L1/L2/L3` 视为执行权限级别,不要和论文里的 layer、phase、table 编号混用。\n- 把 `paper layer`、`phase`、`table` 视为实验目标。例如 `paper layer 3` 或 `Phase 1` 不是 `Autonomy level L3`。\n- 一条好的 `/lab:auto` 输入应至少说清:objective、自治级别、terminal goal、scope、allowed modifications。\n- 如果 workflow language 是中文,摘要、清单条目、任务标签和进度更新都应使用中文,除非文件路径、代码标识符或字面指标名必须保持原样。\n- 示例:`/lab:auto 自治级别 L2。目标:推进 paper layer 3。终止条件:完成 bounded protocol、测试、最小实现和一轮小规模结果。允许修改:配置、数据接入、评估脚本。`\n"
2132
+ "# `/lab` for Codex\n\n`/lab` 是严格的研究工作流命令族。每次都使用同一套仓库工件和阶段边界。\n\n## 子命令\n\n- `/lab:idea`\n 先做两轮脑暴和两轮文献检索,再定义问题与 failure case、对比最接近前作,并输出带 approval gate 的 source-backed recommendation。\n\n- `/lab:data`\n 把已批准的 idea 转成数据集与 benchmark 方案,记录数据集年份、使用过该数据集的论文、下载来源、许可或访问限制,以及 classic-public、recent-strong-public、claim-specific 三类 benchmark 的纳入理由,和 canonical baselines、strong historical baselines、recent strong public methods、closest prior work 四类对比方法的纳入理由。\n\n- `/lab:auto`\n 在不改变 mission、framing 和核心 claims 的前提下,读取 eval-protocol 与 auto-mode 契约并自动编排 `run`、`iterate`、`review`、`report`,必要时扩展数据集、benchmark 和 comparison methods,并在满足升格策略时自动升级 primary package。启动前必须选定 autonomy level、声明 terminal goal,并显式写清 primary gate、secondary guard、promotion condition、stop reason 和 escalation reason,再批准契约。\n\n- `/lab:framing`\n 通过审计当前领域与相邻领域的术语,锁定 paper-facing 的方法名、模块名、论文题目和 contribution bullets,并在 section 起草前保留 approval gate。\n\n- `/lab:spec`\n 把已批准的 idea 转成 `.lab/changes/<change-id>/` 下的一个 lab change 目录,并在其中写出 `proposal`、`design`、`spec`、`tasks`。\n\n- `/lab:run`\n 执行最小有意义验证运行,登记 run,并生成第一版标准化评估摘要。\n\n- `/lab:iterate`\n 在冻结 mission、阈值、verification commands 与 `completion_promise` 的前提下执行有边界的实验迭代。\n\n- `/lab:review`\n 以 reviewer mode 审查文档或结果,先给短摘要,再输出 findings、fatal flaws、fix priority 和 residual risks。\n\n- `/lab:report`\n 从 runs 和 iterations 工件生成最终研究报告。\n\n- `/lab:write`\n 使用已安装 `lab` skill 下 vendored 的 paper-writing references,把稳定 report 工件转成论文 section。\n\n## 调度规则\n\n- 始终使用 `skills/lab/SKILL.md` 作为工作流合同。\n- 用户显式调用 `/lab:<stage>` 时,要立刻执行该 stage,而不是只推荐别的 `/lab` stage。\n- 先给简洁的阶段摘要;只要 stage 合同要求受管工件,就应立刻落盘,再回报输出路径和下一步。\n- 如果歧义会影响结论,一次只问一个问题;如果有多条可行路径,先给 2-3 个方案再收敛。\n- `/lab:spec` 前应已有经批准的数据集与 benchmark 方案。\n- `/lab:run`、`/lab:iterate`、`/lab:auto`、`/lab:report` 都应遵循 `.lab/context/eval-protocol.md`。\n- `.lab/context/eval-protocol.md` 不只定义主指标和主表,也应定义指标释义、实验阶梯,以及指标和对比实现的来源。\n- `/lab:auto` 只编排已批准边界内的执行阶段,不替代手动的 idea/data/framing/spec 决策。\n- `/lab:write` 前必须已有经批准的 `/lab:framing` 工件。\n\n## 如何输入 
`/lab:auto`\n\n## `/lab:auto` 层级指南\n\n- `L1`:适合安全验证、一轮 bounded 真实运行,或简单 report 刷新。\n- `L2`:默认推荐级别,适合冻结核心边界内的常规实验迭代。\n- `L3`:激进 campaign 级别,只在你明确想做更大范围探索和可选写作时使用。\n- 如果不确定,默认推荐 `L2`。\n- 如果用户输入没写级别,或者把级别和 `paper layer`、`phase`、`table` 混用了,就应先停下来,要求用户明确选 `L1/L2/L3`。\n\n- 把 `Autonomy level L1/L2/L3` 视为执行权限级别,不要和论文里的 layer、phase、table 编号混用。\n- 把 `paper layer`、`phase`、`table` 视为实验目标。例如 `paper layer 3` 或 `Phase 1` 不是 `Autonomy level L3`。\n- 一条好的 `/lab:auto` 输入应至少说清:objective、自治级别、terminal goal、scope、allowed modifications。\n- 如果 workflow language 是中文,摘要、清单条目、任务标签和进度更新都应使用中文,除非文件路径、代码标识符或字面指标名必须保持原样。\n- 示例:`/lab:auto 自治级别 L2。目标:推进 paper layer 3。终止条件:完成 bounded protocol、测试、最小实现和一轮小规模结果。允许修改:配置、数据接入、评估脚本。`\n"
2081
2133
  );
2082
2134
 
2083
2135
  ZH_CONTENT[path.join(".codex", "prompts", "lab-data.md")] = codexPrompt(
@@ -2089,7 +2141,7 @@ ZH_CONTENT[path.join(".codex", "prompts", "lab-data.md")] = codexPrompt(
2089
2141
  ZH_CONTENT[path.join(".codex", "prompts", "lab-auto.md")] = codexPrompt(
2090
2142
  "在已批准边界内编排自动实验循环",
2091
2143
  "auto mode objective",
2092
- "使用已安装的 `lab` 技能:`.codex/skills/lab/SKILL.md`。\n\n立刻针对用户当前给出的参数执行 `/lab:auto`,不要只推荐别的 `/lab` 阶段。只有在缺少阻塞性前提时,才明确指出缺什么,并且一次最多追问一个问题。\n\n本命令运行 `/lab:auto` 阶段。它必须读取 `.lab/context/eval-protocol.md`、`.lab/context/auto-mode.md`、`.lab/context/auto-status.md` 与 `.lab/context/auto-outcome.md`,先确认 autonomy level、approval statusterminal goal schema,再把 eval-protocol 里的指标释义、主表计划、来源约束与结构化实验阶梯当作执行依据,在不修改 mission、framing 和核心 claims 的前提下编排已批准的 `run`、`iterate`、`review`、`report`,轮询长任务完成情况;如果声明了 rung,就保持会话活着并按 rung 转移继续推进。\n如果仓库的 workflow language 是中文,摘要、清单条目、任务标签和进度更新都必须使用中文,除非某个文件路径、代码标识符或字面指标名必须保持原样。\n把 `Layer 3`、`Phase 1`、`Table 2` 这类表达视为论文范围目标;只有显式写成 `Autonomy level L3` 或 `自治级别 L3` 时,才把它当成执行权限级别。\n不要用 `sleep 30`、单次 `pgrep` 或一次性的 `metrics.json` 探针来代替真实长任务命令;当真实实验进程还活着时,只允许发进度更新并继续等待。"
2144
+ "使用已安装的 `lab` 技能:`.codex/skills/lab/SKILL.md`。\n\n立刻针对用户当前给出的参数执行 `/lab:auto`,不要只推荐别的 `/lab` 阶段。只有在缺少阻塞性前提时,才明确指出缺什么,并且一次最多追问一个问题。\n\n本命令运行 `/lab:auto` 阶段。它必须读取 `.lab/context/eval-protocol.md`、`.lab/context/auto-mode.md`、`.lab/context/auto-status.md` 与 `.lab/context/auto-outcome.md`,先确认 autonomy level、approval statusterminal goal schema,以及 primary gate、secondary guard、promotion condition、stop reason、escalation reason,再把 eval-protocol 里的指标释义、主表计划、来源约束与结构化实验阶梯当作执行依据,在不修改 mission、framing 和核心 claims 的前提下编排已批准的 `run`、`iterate`、`review`、`report`,轮询长任务完成情况;如果声明了 rung,就保持会话活着并按 rung 转移继续推进。\n如果仓库的 workflow language 是中文,摘要、清单条目、任务标签和进度更新都必须使用中文,除非某个文件路径、代码标识符或字面指标名必须保持原样。\n把 `Layer 3`、`Phase 1`、`Table 2` 这类表达视为论文范围目标;只有显式写成 `Autonomy level L3` 或 `自治级别 L3` 时,才把它当成执行权限级别。\n不要用 `sleep 30`、单次 `pgrep` 或一次性的 `metrics.json` 探针来代替真实长任务命令;当真实实验进程还活着时,只允许发进度更新并继续等待。"
2093
2145
  );
2094
2146
 
2095
2147
  ZH_CONTENT[path.join(".claude", "commands", "lab.md")] = claudeCommand(
@@ -2110,7 +2162,24 @@ ZH_CONTENT[path.join(".claude", "commands", "lab-auto.md")] = claudeCommand(
2110
2162
  "lab-auto",
2111
2163
  "在已批准边界内编排自动实验循环",
2112
2164
  "auto mode objective",
2113
- "使用已安装的 `lab` 技能:`.claude/skills/lab/SKILL.md`。\n\n立刻针对用户当前给出的参数执行 `auto` 阶段,不要只推荐别的 lab 阶段。只有在缺少阻塞性前提时,才明确指出缺什么,并且一次最多追问一个问题。\n\n本命令运行 lab workflow 的 `auto` 阶段。它必须读取 `.lab/context/eval-protocol.md`、`.lab/context/auto-mode.md`、`.lab/context/auto-status.md` 与 `.lab/context/auto-outcome.md`,先确认 autonomy level、approval statusterminal goal schema,再把 eval-protocol 里的指标释义、主表计划、来源约束与结构化实验阶梯当作执行依据,在不修改 mission、framing 和核心 claims 的前提下编排已批准的 `run`、`iterate`、`review`、`report`,轮询长任务完成情况;如果声明了 rung,就保持会话活着并按 rung 转移继续推进。\n如果仓库的 workflow language 是中文,摘要、清单条目、任务标签和进度更新都必须使用中文,除非某个文件路径、代码标识符或字面指标名必须保持原样。\n把 `Layer 3`、`Phase 1`、`Table 2` 这类表达视为论文范围目标;只有显式写成 `Autonomy level L3` 或 `自治级别 L3` 时,才把它当成执行权限级别。\n不要用 `sleep 30`、单次 `pgrep` 或一次性的 `metrics.json` 探针来代替真实长任务命令;当真实实验进程还活着时,只允许发进度更新并继续等待。"
2165
+ "使用已安装的 `lab` 技能:`.claude/skills/lab/SKILL.md`。\n\n立刻针对用户当前给出的参数执行 `auto` 阶段,不要只推荐别的 lab 阶段。只有在缺少阻塞性前提时,才明确指出缺什么,并且一次最多追问一个问题。\n\n本命令运行 lab workflow 的 `auto` 阶段。它必须读取 `.lab/context/eval-protocol.md`、`.lab/context/auto-mode.md`、`.lab/context/auto-status.md` 与 `.lab/context/auto-outcome.md`,先确认 autonomy level、approval statusterminal goal schema,以及 primary gate、secondary guard、promotion condition、stop reason、escalation reason,再把 eval-protocol 里的指标释义、主表计划、来源约束与结构化实验阶梯当作执行依据,在不修改 mission、framing 和核心 claims 的前提下编排已批准的 `run`、`iterate`、`review`、`report`,轮询长任务完成情况;如果声明了 rung,就保持会话活着并按 rung 转移继续推进。\n如果仓库的 workflow language 是中文,摘要、清单条目、任务标签和进度更新都必须使用中文,除非某个文件路径、代码标识符或字面指标名必须保持原样。\n把 `Layer 3`、`Phase 1`、`Table 2` 这类表达视为论文范围目标;只有显式写成 `Autonomy level L3` 或 `自治级别 L3` 时,才把它当成执行权限级别。\n不要用 `sleep 30`、单次 `pgrep` 或一次性的 `metrics.json` 探针来代替真实长任务命令;当真实实验进程还活着时,只允许发进度更新并继续等待。"
2166
+ );
2167
+
2168
+ const zhRecipeQuickPathLine =
2169
+ "- 使用 `skills/lab/references/recipes.md` 作为常见阶段链路的快速路径,不要据此发明新的 slash 命令或别名。\n";
2170
+
2171
+ ZH_CONTENT[path.join(".codex", "prompts", "lab.md")] = ZH_CONTENT[
2172
+ path.join(".codex", "prompts", "lab.md")
2173
+ ].replace(
2174
+ "- 始终使用 `skills/lab/SKILL.md` 作为工作流合同。\n",
2175
+ `- 始终使用 \`skills/lab/SKILL.md\` 作为工作流合同。\n${zhRecipeQuickPathLine}`
2176
+ );
2177
+
2178
+ ZH_CONTENT[path.join(".claude", "commands", "lab.md")] = ZH_CONTENT[
2179
+ path.join(".claude", "commands", "lab.md")
2180
+ ].replace(
2181
+ "- 始终使用 `skills/lab/SKILL.md` 作为工作流合同。\n",
2182
+ `- 始终使用 \`skills/lab/SKILL.md\` 作为工作流合同。\n${zhRecipeQuickPathLine}`
2114
2183
  );
2115
2184
 
2116
2185
  ZH_CONTENT[path.join(".codex", "skills", "lab", "SKILL.md")] = `---
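The hunk above injects the recipes line by calling `String.prototype.replace` with the existing `SKILL.md` rule as the anchor. `replace` returns the input unchanged when the anchor is missing, so a wording change in the anchor would silently drop the new line. The sketch below shows one way to make that failure visible. `injectAfterAnchor` is a hypothetical helper, and the trailing sample rule is invented for the demonstration; only the anchor and the inserted line are taken from the diff.

```javascript
// Hypothetical helper (not in the package): append `extraLine` right after
// `anchorLine`, and fail loudly if the anchor is absent.
function injectAfterAnchor(text, anchorLine, extraLine) {
  if (!text.includes(anchorLine)) {
    throw new Error(`anchor not found: ${JSON.stringify(anchorLine)}`);
  }
  // A function replacement avoids `$`-pattern expansion in the inserted text.
  return text.replace(anchorLine, () => anchorLine + extraLine);
}

// Anchor and inserted line copied from the diff.
const anchor = "- 始终使用 `skills/lab/SKILL.md` 作为工作流合同。\n";
const extra =
  "- 使用 `skills/lab/references/recipes.md` 作为常见阶段链路的快速路径,不要据此发明新的 slash 命令或别名。\n";
// The last rule here is made-up sample text.
const prompt = "## 调度规则\n\n" + anchor + "- 下一条规则。\n";

const patched = injectAfterAnchor(prompt, anchor, extra);
console.log(patched);
```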
@@ -2631,8 +2700,15 @@ ZH_CONTENT[path.join(".codex", "skills", "lab", "stages", "auto.md")] = `# \`/la
2631
2700
  - 把 \`/lab:auto\` 当作编排层,不要再发明第二套 workflow。
2632
2701
  - 把 \`.lab/context/eval-protocol.md\` 当作论文导向指标、指标释义、主表、gate 与结构化实验阶梯的唯一来源。
2633
2702
  - 把评估协议当作“带来源的协议”,不是“临场想出来的说明”:指标定义、baseline 行为、对比实现和偏差都必须先写明来源,再用于 gate 或 promotion。
2703
+ - 把 \`.lab/context/auto-mode.md\` 当成可见的 control plane,在启动前显式写清 \`Primary gate\`、\`Secondary guard\`、\`Promotion condition\`、\`Stop reason\` 和 \`Escalation reason\`。
2634
2704
  - 契约里必须声明 \`Autonomy level\` 和 \`Approval status\`,只有显式写成 \`approved\` 才能启动。
2635
2705
  - 契约里还必须声明具体的 terminal goal:\`rounds\`、\`metric-threshold\` 或 \`task-completion\`,并补齐 \`Terminal goal target\` 与 \`Required terminal artifact\`。
2706
+ - control plane 术语固定为:
2707
+ - \`Primary gate\`:当前 rung 或 loop 目标的主通过条件
2708
+ - \`Secondary guard\`:即使主 gate 命中也能拦下通过的安全或异常检查
2709
+ - \`Promotion condition\`:允许 exploratory 工作升格为 primary package 的明确条件
2710
+ - \`Stop reason\`:结束当前循环但不升格的明确边界
2711
+ - \`Escalation reason\`:必须转人工审查或收窄下一步的明确条件
2636
2712
  - 级别含义固定为:
2637
2713
  - \`L1\`:safe run,只允许 \`run\`、\`review\`、\`report\`
2638
2714
  - \`L2\`:bounded iteration,允许 \`run\`、\`iterate\`、\`review\`、\`report\`
@@ -2663,7 +2739,8 @@ ZH_CONTENT[path.join(".codex", "skills", "lab", "stages", "auto.md")] = `# \`/la
2663
2739
  5. 发起有边界动作
2664
2740
  6. 轮询进程、checkpoint 或 summary 的变化
2665
2741
  7. 评估声明过的 terminal goal 是否已经达成
2666
- 8. 记录结果并决定 continue、promote、stop escalate
2742
+ 8. 在正确边界上评估 \`Primary gate\`、\`Secondary guard\`、\`Promotion condition\`、\`Stop reason\` 和 \`Escalation reason\`
2743
+ 9. 记录结果并决定 continue、promote、stop 或 escalate
2667
2744
 
2668
2745
  ## 交互约束
2669
2746
 
package/lib/install.cjs CHANGED
@@ -520,6 +520,7 @@ function localizeInstalledAssets(targetDir, lang, { newlyCreatedProjectOwnedPath
520
520
  path.join(".codex", "skills", "lab", "stages", "report.md"),
521
521
  path.join(".codex", "skills", "lab", "stages", "write.md"),
522
522
  path.join(".codex", "skills", "lab", "references", "workflow.md"),
523
+ path.join(".codex", "skills", "lab", "references", "recipes.md"),
523
524
  path.join(".claude", "skills", "lab", "SKILL.md"),
524
525
  path.join(".claude", "skills", "lab", "stages", "idea.md"),
525
526
  path.join(".claude", "skills", "lab", "stages", "data.md"),
@@ -532,6 +533,7 @@ function localizeInstalledAssets(targetDir, lang, { newlyCreatedProjectOwnedPath
532
533
  path.join(".claude", "skills", "lab", "stages", "report.md"),
533
534
  path.join(".claude", "skills", "lab", "stages", "write.md"),
534
535
  path.join(".claude", "skills", "lab", "references", "workflow.md"),
536
+ path.join(".claude", "skills", "lab", "references", "recipes.md"),
535
537
  path.join(".lab", ".managed", "templates", "idea.md"),
536
538
  path.join(".lab", ".managed", "templates", "data.md"),
537
539
  path.join(".lab", ".managed", "templates", "framing.md"),
package/lib/lab_write_contract.json CHANGED
@@ -1,7 +1,7 @@
1
1
  {
2
2
  "stage_prompt": {
3
- "codex_en": "This command runs the `/lab:write` stage. Use `.codex/skills/lab/stages/write.md` as the single source of truth for template choice, paper-plan requirements, section references, validator gates, asset coverage, and final manuscript rules. Read the matching paper-writing reference and any bundled example-bank files for the requested section, revise only one section, and keep draft rounds warning-only while final-draft or export rounds must satisfy the write-stage acceptance gates. Draft ordinary manuscript rounds in `workflow_language`, and ordinary `.tex` section drafts must stay in `workflow_language` instead of treating `paper_language` as the default draft language. Treat a readable workflow-language deliverable as a real persisted artifact rather than a review layer. Maintain `.lab/writing/terminology-glossary.md` as the write-stage glossary for full forms, approved short forms, reader-facing explanations, and aliases. Apply the same academic readability standard in every language: when the round introduces or revises key terms, abbreviations, metrics, mechanism names, or system labels, use the full form first, define any short form at first mention, explain what the term is and why it matters here, keep one paper-facing name per concept, and avoid newly coined labels joined by hyphens. Keep internal identifiers out of reader-facing prose unless they are mapped once for the reader, and record the terminology-clarity self-check in the write iteration artifact. If the manuscript would start from the managed scaffold and no template decision is recorded yet, ask once whether to keep the default scaffold or attach a template directory first. 
If finalization reaches a round where `workflow_language` and `paper_language` differ, finish and preserve the workflow-language deliverable first, then ask once whether to keep the draft language or convert the canonical manuscript to `paper_language`, persist that answer, record both the language decision and the workflow-language deliverable path in the latest write iteration, and only then edit the final manuscript in the chosen language.",
4
- "claude_en": "This command runs the `write` stage of the lab workflow. Use `.claude/skills/lab/stages/write.md` as the single source of truth for template choice, paper-plan requirements, section references, validator gates, asset coverage, and final manuscript rules. Read the matching paper-writing reference and any bundled example-bank files for the requested section, revise only one section, and keep draft rounds warning-only while final-draft or export rounds must satisfy the write-stage acceptance gates. Draft ordinary manuscript rounds in `workflow_language`, and ordinary `.tex` section drafts must stay in `workflow_language` instead of treating `paper_language` as the default draft language. Treat a readable workflow-language deliverable as a real persisted artifact rather than a review layer. Maintain `.lab/writing/terminology-glossary.md` as the write-stage glossary for full forms, approved short forms, reader-facing explanations, and aliases. Apply the same academic readability standard in every language: when the round introduces or revises key terms, abbreviations, metrics, mechanism names, or system labels, use the full form first, define any short form at first mention, explain what the term is and why it matters here, keep one paper-facing name per concept, and avoid newly coined labels joined by hyphens. Keep internal identifiers out of reader-facing prose unless they are mapped once for the reader, and record the terminology-clarity self-check in the write iteration artifact. If the manuscript would start from the managed scaffold and no template decision is recorded yet, ask once whether to keep the default scaffold or attach a template directory first. 
If finalization reaches a round where `workflow_language` and `paper_language` differ, finish and preserve the workflow-language deliverable first, then ask once whether to keep the draft language or convert the canonical manuscript to `paper_language`, persist that answer, record both the language decision and the workflow-language deliverable path in the latest write iteration, and only then edit the final manuscript in the chosen language.",
3
+ "codex_en": "This command runs the `/lab:write` stage. Use `.codex/skills/lab/stages/write.md` as the single source of truth for template choice, paper-plan requirements, section references, validator gates, asset coverage, and final manuscript rules. Read the matching paper-writing reference and any bundled example-bank files for the requested section, revise only one section, and keep draft rounds warning-only while final-draft or export rounds must satisfy the write-stage acceptance gates. Draft ordinary manuscript rounds in `workflow_language`, and ordinary `.tex` section drafts must stay in `workflow_language` instead of treating `paper_language` as the default draft language. Treat a readable workflow-language deliverable as a real persisted artifact rather than a review layer. Maintain `.lab/writing/terminology-glossary.md` as the write-stage glossary for full forms, approved short forms, reader-facing explanations, and aliases. Apply the same academic readability standard in every language: when the round introduces or revises key terms, abbreviations, metrics, mechanism names, or system labels, use the full form first, define any short form at first mention, explain what the term is and why it matters here, keep one natural-language paper-facing name per concept, use natural-language full names in prose, and do not use labels containing `_` or `-` in reader-facing prose. Keep internal identifiers out of reader-facing prose unless they are mapped once for the reader and then moved back out of prose, and record the terminology-clarity self-check in the write iteration artifact. If the manuscript would start from the managed scaffold and no template decision is recorded yet, ask once whether to keep the default scaffold or attach a template directory first. 
If finalization reaches a round where `workflow_language` and `paper_language` differ, finish and preserve the workflow-language deliverable first, then ask once whether to keep the draft language or convert the canonical manuscript to `paper_language`, persist that answer, record both the language decision and the workflow-language deliverable path in the latest write iteration, and only then edit the final manuscript in the chosen language.",
4
+ "claude_en": "This command runs the `write` stage of the lab workflow. Use `.claude/skills/lab/stages/write.md` as the single source of truth for template choice, paper-plan requirements, section references, validator gates, asset coverage, and final manuscript rules. Read the matching paper-writing reference and any bundled example-bank files for the requested section, revise only one section, and keep draft rounds warning-only while final-draft or export rounds must satisfy the write-stage acceptance gates. Draft ordinary manuscript rounds in `workflow_language`, and ordinary `.tex` section drafts must stay in `workflow_language` instead of treating `paper_language` as the default draft language. Treat a readable workflow-language deliverable as a real persisted artifact rather than a review layer. Maintain `.lab/writing/terminology-glossary.md` as the write-stage glossary for full forms, approved short forms, reader-facing explanations, and aliases. Apply the same academic readability standard in every language: when the round introduces or revises key terms, abbreviations, metrics, mechanism names, or system labels, use the full form first, define any short form at first mention, explain what the term is and why it matters here, keep one natural-language paper-facing name per concept, use natural-language full names in prose, and do not use labels containing `_` or `-` in reader-facing prose. Keep internal identifiers out of reader-facing prose unless they are mapped once for the reader and then moved back out of prose, and record the terminology-clarity self-check in the write iteration artifact. If the manuscript would start from the managed scaffold and no template decision is recorded yet, ask once whether to keep the default scaffold or attach a template directory first. 
If finalization reaches a round where `workflow_language` and `paper_language` differ, finish and preserve the workflow-language deliverable first, then ask once whether to keep the draft language or convert the canonical manuscript to `paper_language`, persist that answer, record both the language decision and the workflow-language deliverable path in the latest write iteration, and only then edit the final manuscript in the chosen language.",
5
5
  "codex_zh": "本命令运行 `/lab:write` 阶段。把 `.codex/skills/lab/stages/write.md` 当成模板选择、paper-plan、section 参考、校验 gate、资产覆盖和最终 manuscript 规则的单一来源。读取与当前 section 对应的 paper-writing reference 和 bundled example-bank 文件,一次只修改一个 section;普通草稿轮次把写作校验当 warning,最终定稿或导出轮次必须满足 write-stage 的接受 gate。普通起草轮次先跟随 `workflow_language`,普通 `.tex` section 草稿也必须先停留在 `workflow_language`,不能把 `paper_language` 当成默认草稿语言,并把可阅读的 workflow-language 交付稿当成正式持久化产物,而不是 review 层。把 `.lab/writing/terminology-glossary.md` 作为写作期 glossary,用来沉淀全称、批准缩写、对外解释和可接受别名。无论当前语言是什么,都要满足同一套学术可读性标准:如果本轮引入或改写了关键术语、缩写、指标名、机制名或系统标签,就先写全称;如果后面要复用短写,就在首次出现时定义;同时说明它是什么、为什么在这里重要,保持一个概念只有一个 paper-facing 名称,并尽量避免新造的连字符拼接标签。内部标识符默认不要进入 reader-facing prose;若必须出现,只能在完成一次读者映射后使用,并把 terminology-clarity 自检写进 write iteration artifact。如果当前稿件将从托管默认 scaffold 开始,且还没有模板决定,就先追问一次:继续使用默认 scaffold,还是先接入模板目录。如果进入最终定稿时 `workflow_language` 与 `paper_language` 不一致,就先完成并保留 workflow-language 交付稿,再追问一次:保持当前语言,还是把 canonical manuscript 转成 `paper_language`;先持久化这个决定,再在最新 write iteration 里记录语言决策和 workflow-language 交付稿路径,最后才允许按该语言修改最终稿。",
6
6
  "claude_zh": "本命令运行 lab workflow 的 `write` 阶段。把 `.claude/skills/lab/stages/write.md` 当成模板选择、paper-plan、section 参考、校验 gate、资产覆盖和最终 manuscript 规则的单一来源。读取与当前 section 对应的 paper-writing reference 和 bundled example-bank 文件,一次只修改一个 section;普通草稿轮次把写作校验当 warning,最终定稿或导出轮次必须满足 write-stage 的接受 gate。普通起草轮次先跟随 `workflow_language`,普通 `.tex` section 草稿也必须先停留在 `workflow_language`,不能把 `paper_language` 当成默认草稿语言,并把可阅读的 workflow-language 交付稿当成正式持久化产物,而不是 review 层。把 `.lab/writing/terminology-glossary.md` 作为写作期 glossary,用来沉淀全称、批准缩写、对外解释和可接受别名。无论当前语言是什么,都要满足同一套学术可读性标准:如果本轮引入或改写了关键术语、缩写、指标名、机制名或系统标签,就先写全称;如果后面要复用短写,就在首次出现时定义;同时说明它是什么、为什么在这里重要,保持一个概念只有一个 paper-facing 名称,并尽量避免新造的连字符拼接标签。内部标识符默认不要进入 reader-facing prose;若必须出现,只能在完成一次读者映射后使用,并把 terminology-clarity 自检写进 write iteration artifact。如果当前稿件将从托管默认 scaffold 开始,且还没有模板决定,就先追问一次:继续使用默认 scaffold,还是先接入模板目录。如果进入最终定稿时 `workflow_language` 与 `paper_language` 不一致,就先完成并保留 workflow-language 交付稿,再追问一次:保持当前语言,还是把 canonical manuscript 转成 `paper_language`;先持久化这个决定,再在最新 write iteration 里记录语言决策和 workflow-language 交付稿路径,最后才允许按该语言修改最终稿。"
  }
@@ -7,7 +7,7 @@ argument-hint: autonomous campaign target
  Use the installed `lab` skill at `.claude/skills/lab/SKILL.md`.
 
  Execute the requested `/lab-auto` command against the user's argument now. Do not only recommend another lab stage. If a blocking prerequisite is missing, say exactly what is missing and ask at most one clarifying question.
- This command runs the `auto` stage of the lab workflow. It must read `.lab/context/eval-protocol.md`, `.lab/context/auto-mode.md`, `.lab/context/auto-status.md`, and `.lab/context/auto-outcome.md`, enforce the declared terminal goal schema, orchestrate approved run, iterate, review, and report stages inside that contract, poll long-running work until completion or stop conditions, and write progress plus the final outcome back into `.lab/context/auto-status.md` and `.lab/context/auto-outcome.md`.
+ This command runs the `auto` stage of the lab workflow. It must read `.lab/context/eval-protocol.md`, `.lab/context/auto-mode.md`, `.lab/context/auto-status.md`, and `.lab/context/auto-outcome.md`, enforce the declared terminal goal schema, make the primary gate, secondary guard, promotion condition, stop reason, and escalation reason explicit, orchestrate approved run, iterate, review, and report stages inside that contract, poll long-running work until completion or stop conditions, and write progress plus the final outcome back into `.lab/context/auto-status.md` and `.lab/context/auto-outcome.md`.
  When the repository workflow language is Chinese, summaries, checklist items, task labels, and progress updates should be written in Chinese unless a literal identifier must stay unchanged.
  Treat `Layer 3`, `Phase 1`, or `Table 2` as paper-scope targets. Treat `Autonomy level L3` as the execution permission level.
  Do not replace the real long-running experiment command with a short watcher such as `sleep 30`, `pgrep`, or a one-shot `metrics.json` probe. While the real experiment process is still alive, emit only a progress update and keep waiting.
@@ -7,4 +7,4 @@ argument-hint: section or writing target
  Use the installed `lab` skill at `.claude/skills/lab/SKILL.md`.
 
  Execute the requested `/lab-write` command against the user's argument now. Do not only recommend another lab stage. If a blocking prerequisite is missing, say exactly what is missing and ask at most one clarifying question.
- This command runs the `write` stage of the lab workflow. Use `.claude/skills/lab/stages/write.md` as the single source of truth for template choice, paper-plan requirements, section references, validator gates, asset coverage, and final manuscript rules. Read the matching paper-writing reference and any bundled example-bank files for the requested section, revise only one section, and keep draft rounds warning-only while final-draft or export rounds must satisfy the write-stage acceptance gates. Draft ordinary manuscript rounds in `workflow_language`, and ordinary `.tex` section drafts must stay in `workflow_language` instead of treating `paper_language` as the default draft language. Treat a readable workflow-language deliverable as a real persisted artifact rather than a review layer. Maintain `.lab/writing/terminology-glossary.md` as the write-stage glossary for full forms, approved short forms, reader-facing explanations, and aliases. Apply the same academic readability standard in every language: when the round introduces or revises key terms, abbreviations, metrics, mechanism names, or system labels, use the full form first, define any short form at first mention, explain what the term is and why it matters here, keep one paper-facing name per concept, and avoid newly coined labels joined by hyphens. Keep internal identifiers out of reader-facing prose unless they are mapped once for the reader, and record the terminology-clarity self-check in the write iteration artifact. If the manuscript would start from the managed scaffold and no template decision is recorded yet, ask once whether to keep the default scaffold or attach a template directory first. 
If finalization reaches a round where `workflow_language` and `paper_language` differ, finish and preserve the workflow-language deliverable first, then ask once whether to keep the draft language or convert the canonical manuscript to `paper_language`, persist that answer, record both the language decision and the workflow-language deliverable path in the latest write iteration, and only then edit the final manuscript in the chosen language.
+ This command runs the `write` stage of the lab workflow. Use `.claude/skills/lab/stages/write.md` as the single source of truth for template choice, paper-plan requirements, section references, validator gates, asset coverage, and final manuscript rules. Read the matching paper-writing reference and any bundled example-bank files for the requested section, revise only one section, and keep draft rounds warning-only while final-draft or export rounds must satisfy the write-stage acceptance gates. Draft ordinary manuscript rounds in `workflow_language`, and ordinary `.tex` section drafts must stay in `workflow_language` instead of treating `paper_language` as the default draft language. Treat a readable workflow-language deliverable as a real persisted artifact rather than a review layer. Maintain `.lab/writing/terminology-glossary.md` as the write-stage glossary for full forms, approved short forms, reader-facing explanations, and aliases. Apply the same academic readability standard in every language: when the round introduces or revises key terms, abbreviations, metrics, mechanism names, or system labels, use the full form first, define any short form at first mention, explain what the term is and why it matters here, keep one natural-language paper-facing name per concept, use natural-language full names in prose, and do not use labels containing `_` or `-` in reader-facing prose. Keep internal identifiers out of reader-facing prose unless they are mapped once for the reader and then moved back out of prose, and record the terminology-clarity self-check in the write iteration artifact. If the manuscript would start from the managed scaffold and no template decision is recorded yet, ask once whether to keep the default scaffold or attach a template directory first. 
If finalization reaches a round where `workflow_language` and `paper_language` differ, finish and preserve the workflow-language deliverable first, then ask once whether to keep the draft language or convert the canonical manuscript to `paper_language`, persist that answer, record both the language decision and the workflow-language deliverable path in the latest write iteration, and only then edit the final manuscript in the chosen language.
@@ -22,7 +22,7 @@ Use the same repository artifacts and stage boundaries every time.
  Turn the approved idea into an approved dataset and benchmark package with dataset years, papers that used each dataset, source audit, download plan, classic-public versus recent-strong-public versus claim-specific benchmark roles, and explicit rationale for canonical baselines, strong historical baselines, recent strong public methods, and closest prior work.
 
  - `/lab auto ...` or `/lab-auto`
- Run a bounded orchestration loop over approved execution stages. Use an auto-mode contract plus live auto-status to drive `run`, `iterate`, `review`, `report`, and optionally `write` without changing the frozen mission or framing. Choose an autonomy level, declare a concrete terminal goal, explicitly approve the contract before starting, and treat `.lab/context/eval-protocol.md` as the source of truth for metrics, metric glossary, source-backed comparison semantics, tables, and structured experiment-ladder rungs.
+ Run a bounded orchestration loop over approved execution stages. Use an auto-mode contract plus live auto-status to drive `run`, `iterate`, `review`, `report`, and optionally `write` without changing the frozen mission or framing. Choose an autonomy level, declare a concrete terminal goal, make the primary gate, secondary guard, promotion condition, stop reason, and escalation reason explicit, explicitly approve the contract before starting, and treat `.lab/context/eval-protocol.md` as the source of truth for metrics, metric glossary, source-backed comparison semantics, tables, and structured experiment-ladder rungs.
 
  - `/lab framing ...` or `/lab-framing`
  Lock paper-facing method name, module names, paper title, and contribution bullets by auditing current-field and adjacent-field terminology, then keep an approval gate before any section drafting.
@@ -49,6 +49,7 @@ Use the same repository artifacts and stage boundaries every time.
  ## Dispatch Rules
 
  - Always use `skills/lab/SKILL.md` as the workflow contract.
+ - Use `skills/lab/references/recipes.md` as the quick path for common stage chains; do not invent new slash commands or aliases from it.
  - When the user explicitly invokes `/lab <stage> ...` or a direct `/lab-<stage>` alias, execute that stage now against the provided argument instead of only recommending another lab stage.
  - Start by giving the user a concise stage summary. Materialize managed artifacts immediately when the stage contract requires them, then report the output path and next step.
  - When ambiguity matters, ask one clarifying question at a time; when multiple paths are viable, present 2-3 approaches before converging.
@@ -6,7 +6,7 @@ argument-hint: autonomous campaign target
  Use the installed `lab` skill at `.codex/skills/lab/SKILL.md`.
 
  Execute the requested `/lab:auto` stage against the user's argument now. Do not only recommend another lab stage. If a blocking prerequisite is missing, say exactly what is missing and ask at most one clarifying question.
- This command runs the `/lab:auto` stage. It must read `.lab/context/eval-protocol.md`, `.lab/context/auto-mode.md`, `.lab/context/auto-status.md`, and `.lab/context/auto-outcome.md`, enforce the declared terminal goal schema, orchestrate approved run, iterate, review, and report stages inside that contract, poll long-running work until completion or stop conditions, and write progress plus the final outcome back into `.lab/context/auto-status.md` and `.lab/context/auto-outcome.md`.
+ This command runs the `/lab:auto` stage. It must read `.lab/context/eval-protocol.md`, `.lab/context/auto-mode.md`, `.lab/context/auto-status.md`, and `.lab/context/auto-outcome.md`, enforce the declared terminal goal schema, make the primary gate, secondary guard, promotion condition, stop reason, and escalation reason explicit, orchestrate approved run, iterate, review, and report stages inside that contract, poll long-running work until completion or stop conditions, and write progress plus the final outcome back into `.lab/context/auto-status.md` and `.lab/context/auto-outcome.md`.
  When the repository workflow language is Chinese, summaries, checklist items, task labels, and progress updates should be written in Chinese unless a literal identifier must stay unchanged.
  Treat `Layer 3`, `Phase 1`, or `Table 2` as paper-scope targets. Treat `Autonomy level L3` as the execution permission level.
  Do not replace the real long-running experiment command with a short watcher such as `sleep 30`, `pgrep`, or a one-shot `metrics.json` probe. While the real experiment process is still alive, emit only a progress update and keep waiting.
@@ -6,4 +6,4 @@ argument-hint: section or writing target
  Use the installed `lab` skill at `.codex/skills/lab/SKILL.md`.
 
  Execute the requested `/lab:write` stage against the user's argument now. Do not only recommend another lab stage. If a blocking prerequisite is missing, say exactly what is missing and ask at most one clarifying question.
- This command runs the `/lab:write` stage. Use `.codex/skills/lab/stages/write.md` as the single source of truth for template choice, paper-plan requirements, section references, validator gates, asset coverage, and final manuscript rules. Read the matching paper-writing reference and any bundled example-bank files for the requested section, revise only one section, and keep draft rounds warning-only while final-draft or export rounds must satisfy the write-stage acceptance gates. Draft ordinary manuscript rounds in `workflow_language`, and ordinary `.tex` section drafts must stay in `workflow_language` instead of treating `paper_language` as the default draft language. Treat a readable workflow-language deliverable as a real persisted artifact rather than a review layer. Maintain `.lab/writing/terminology-glossary.md` as the write-stage glossary for full forms, approved short forms, reader-facing explanations, and aliases. Apply the same academic readability standard in every language: when the round introduces or revises key terms, abbreviations, metrics, mechanism names, or system labels, use the full form first, define any short form at first mention, explain what the term is and why it matters here, keep one paper-facing name per concept, and avoid newly coined labels joined by hyphens. Keep internal identifiers out of reader-facing prose unless they are mapped once for the reader, and record the terminology-clarity self-check in the write iteration artifact. If the manuscript would start from the managed scaffold and no template decision is recorded yet, ask once whether to keep the default scaffold or attach a template directory first. 
If finalization reaches a round where `workflow_language` and `paper_language` differ, finish and preserve the workflow-language deliverable first, then ask once whether to keep the draft language or convert the canonical manuscript to `paper_language`, persist that answer, record both the language decision and the workflow-language deliverable path in the latest write iteration, and only then edit the final manuscript in the chosen language.
+ This command runs the `/lab:write` stage. Use `.codex/skills/lab/stages/write.md` as the single source of truth for template choice, paper-plan requirements, section references, validator gates, asset coverage, and final manuscript rules. Read the matching paper-writing reference and any bundled example-bank files for the requested section, revise only one section, and keep draft rounds warning-only while final-draft or export rounds must satisfy the write-stage acceptance gates. Draft ordinary manuscript rounds in `workflow_language`, and ordinary `.tex` section drafts must stay in `workflow_language` instead of treating `paper_language` as the default draft language. Treat a readable workflow-language deliverable as a real persisted artifact rather than a review layer. Maintain `.lab/writing/terminology-glossary.md` as the write-stage glossary for full forms, approved short forms, reader-facing explanations, and aliases. Apply the same academic readability standard in every language: when the round introduces or revises key terms, abbreviations, metrics, mechanism names, or system labels, use the full form first, define any short form at first mention, explain what the term is and why it matters here, keep one natural-language paper-facing name per concept, use natural-language full names in prose, and do not use labels containing `_` or `-` in reader-facing prose. Keep internal identifiers out of reader-facing prose unless they are mapped once for the reader and then moved back out of prose, and record the terminology-clarity self-check in the write iteration artifact. If the manuscript would start from the managed scaffold and no template decision is recorded yet, ask once whether to keep the default scaffold or attach a template directory first. 
If finalization reaches a round where `workflow_language` and `paper_language` differ, finish and preserve the workflow-language deliverable first, then ask once whether to keep the draft language or convert the canonical manuscript to `paper_language`, persist that answer, record both the language decision and the workflow-language deliverable path in the latest write iteration, and only then edit the final manuscript in the chosen language.
@@ -16,7 +16,7 @@ argument-hint: workflow question or stage choice
  Turn the approved idea into an approved dataset and benchmark package with dataset years, papers that used each dataset, source audit, download plan, classic-public versus recent-strong-public versus claim-specific benchmark roles, and explicit rationale for canonical baselines, strong historical baselines, recent strong public methods, and closest prior work.
 
  - `/lab:auto`
- Run a bounded orchestration loop over approved execution stages. Use an auto-mode contract plus live auto-status to drive `run`, `iterate`, `review`, `report`, and optionally `write` without changing the frozen mission or framing. Choose an autonomy level, declare a concrete terminal goal, explicitly approve the contract before starting, and treat `.lab/context/eval-protocol.md` as the source of truth for metrics, metric glossary, source-backed comparison semantics, tables, and structured experiment-ladder rungs.
+ Run a bounded orchestration loop over approved execution stages. Use an auto-mode contract plus live auto-status to drive `run`, `iterate`, `review`, `report`, and optionally `write` without changing the frozen mission or framing. Choose an autonomy level, declare a concrete terminal goal, make the primary gate, secondary guard, promotion condition, stop reason, and escalation reason explicit, explicitly approve the contract before starting, and treat `.lab/context/eval-protocol.md` as the source of truth for metrics, metric glossary, source-backed comparison semantics, tables, and structured experiment-ladder rungs.
 
  - `/lab:framing`
  Lock paper-facing method name, module names, paper title, and contribution bullets by auditing current-field and adjacent-field terminology, then keep an approval gate before any section drafting.
@@ -43,6 +43,7 @@ argument-hint: workflow question or stage choice
  ## Dispatch Rules
 
  - Always use `skills/lab/SKILL.md` as the workflow contract.
+ - Use `skills/lab/references/recipes.md` as the quick path for common stage chains; do not invent new slash commands or aliases from it.
  - When the user explicitly invokes `/lab:<stage>`, execute that stage now against the provided argument instead of only recommending another `/lab` stage.
  - Start by giving the user a concise stage summary. Materialize managed artifacts immediately when the stage contract requires them, then report the output path and next step.
  - When ambiguity matters, ask one clarifying question at a time; when multiple paths are viable, present 2-3 approaches before converging.
@@ -20,6 +20,14 @@
  - Canonical context files refreshed:
  - Evidence index anchors:
 
+ ## Handoff
+
+ - Completed work:
+ - Frozen scope:
+ - Allowed next action:
+ - Required read set for the next owner:
+ - Accept / revise / reject boundary:
+
  ## Paper Handoff
 
  - Sections ready for `/lab:write`:
@@ -121,6 +121,14 @@ Preserve failed runs and rejected ideas.
 
  Describe unresolved risks and external validity limits.
 
+ ## Handoff
+
+ - Completed work:
+ - Frozen scope:
+ - Allowed next action:
+ - Required read set for the next owner:
+ - Accept / revise / reject boundary:
+
  ## Next Steps
 
  List concrete follow-up actions.
@@ -132,7 +132,9 @@ Suggested levels:
  - Why it survived:
  - Surviving direction 2:
  - Why it survived:
+ - Why the survivors still remain:
  - Rejected directions and why:
+ - Rejected Options:
  - Recommended narrowed direction:
  - Why this is stronger now:
 
@@ -21,6 +21,12 @@
  - Risk 1:
  - Risk 2:
 
+ ## Alternative Explanations and Boundary Risks
+
+ - Strongest alternative explanation still on the table:
+ - Boundary risk that could narrow the claim:
+ - What evidence would rule the alternative explanation out:
+
  ## Checklist
 
  - Are the academic validity checks filled and still consistent with the actual setup?
@@ -19,3 +19,9 @@
  - Validation run executed
  - Iteration reports generated
  - Final report generated
+
+ ## Failure Modes or Scenario Table
+
+ - Scenario or failure mode:
+ - Why it matters:
+ - Mitigation or design response:
@@ -5,9 +5,10 @@ Use this glossary during `/lab:write` to keep reader-facing naming stable.
  ## Naming Rules
 
  - First mention should use the full form. If a short form or acronym will be reused, define it at first mention as `Full Form (Short Form)`.
- - Keep one canonical paper-facing name per concept.
- - Keep internal identifiers, config keys, and experiment package labels out of prose unless they are mapped once for the reader.
- - Prefer natural phrase names over newly coined labels joined by hyphens.
+ - Keep one canonical natural-language paper-facing name per concept.
+ - Use natural-language full names in prose.
+ - Do not use labels containing `_` or `-` in reader-facing prose.
+ - Keep internal identifiers, config keys, and experiment package labels out of prose unless they are mapped once for the reader and then moved back out of prose.
 
  ## Entries
 
@@ -19,7 +20,8 @@ Use this glossary during `/lab:write` to keep reader-facing naming stable.
  - Reader-facing explanation:
  - Why this term matters here:
  - First-use section:
- - Allowed aliases:
+ - Approved short natural-language alias, if any:
+ - Forbidden aliases or labels:
  - Internal identifiers to avoid in prose:
 
  ## Audit
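The new naming rule (no labels containing `_` or `-` in reader-facing prose) can be illustrated with a small check. This is only a sketch: `findForbiddenLabels` is a hypothetical helper, not part of superlab, and it assumes that text inside backticks counts as a literal identifier rather than prose. The diff does not say how ordinary hyphenated words are treated, so this strict version flags them too.

```javascript
// Hypothetical helper, not shipped by superlab: lists tokens in prose that
// contain "_" or "-" between alphanumeric runs.
function findForbiddenLabels(prose) {
  // Assumption: inline code spans hold literal identifiers, so skip them.
  const withoutCode = prose.replace(/`[^`]*`/g, " ");
  // A label here is two or more alphanumeric runs joined by "_" or "-".
  const pattern = /[A-Za-z0-9]+(?:[_-][A-Za-z0-9]+)+/g;
  return withoutCode.match(pattern) || [];
}

module.exports = { findForbiddenLabels };
```

Run on a sentence such as "The cross-attn module reads `max_len` and run_id values.", the sketch reports `cross-attn` and `run_id` and ignores the backticked `max_len`. A real audit would likely need an allowlist for everyday hyphenated words.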
@@ -32,10 +32,12 @@
  - Key terms introduced or revised this round:
  - Was `.lab/writing/terminology-glossary.md` updated:
  - Did first mention use the full form, with an approved short form only after that:
+ - Did each concept keep one natural-language paper-facing name throughout the prose:
  - For each term, what it is and why it matters here:
  - Did any internal identifier leak into reader-facing prose:
- - Did any newly coined label joined by hyphens remain in reader-facing prose:
+ - Did any label containing `_` or `-` remain in reader-facing prose:
  - Were the first mentions explained in the prose:
+ - Did any alias drift remain unresolved:
  - Remaining reader-facing jargon risk:
 
  ## Language Decision
@@ -51,3 +53,11 @@
  - Continue or stop:
  - Next writing target:
  - Route back to `review` or `iterate` if needed:
+
+ ## Handoff
+
+ - Completed work:
+ - Frozen scope:
+ - Allowed next action:
+ - Required read set for the next owner:
+ - Accept / revise / reject boundary:
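The same five-field `## Handoff` block is added to three templates in this release. One way to keep its wording stable is to render it from data, as in the sketch below. `renderHandoff` and the camelCase keys are illustrative assumptions, not superlab APIs; only the heading and the five labels come from the diff.

```javascript
// Hypothetical renderer: emits the Handoff block with the exact labels
// used by the templates in this diff. The object keys are made up here.
const HANDOFF_LABELS = [
  ["completedWork", "Completed work"],
  ["frozenScope", "Frozen scope"],
  ["allowedNextAction", "Allowed next action"],
  ["requiredReadSet", "Required read set for the next owner"],
  ["boundary", "Accept / revise / reject boundary"],
];

function renderHandoff(fields) {
  const bullets = HANDOFF_LABELS.map(([key, label]) => {
    const value = (fields[key] || "").trim();
    // Leave the label in place even when no value is known yet.
    return value ? `- ${label}: ${value}` : `- ${label}:`;
  });
  return ["## Handoff", "", ...bullets].join("\n");
}

module.exports = { renderHandoff };
```

Calling `renderHandoff({ completedWork: "drafted intro" })` fills the first bullet and leaves the other four labels empty, so an unfinished handoff stays visibly incomplete.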
@@ -22,6 +22,11 @@ If `eval-protocol.md` declares structured rung entries, auto mode follows those
  - Terminal goal type:
  - Terminal goal target:
  - Required terminal artifact:
+ - Primary gate:
+ - Secondary guard:
+ - Promotion condition:
+ - Stop reason:
+ - Escalation reason:
  - If the workflow language is Chinese, keep summaries, checklist items, task labels, and progress updates in Chinese.
  - Example objective: advance paper layer 3 through one bounded protocol, tests, minimal implementation, and one small run.
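The five fields added to the auto-mode contract above could be checked before a run with something like the sketch below. `readAutoContract` is a hypothetical function, not part of superlab. It assumes that "explicit" means each field is a `- Label: value` bullet with a non-empty value, which the diff itself does not spell out.

```javascript
// Field names come from the diff; the parsing rules are an assumption.
const REQUIRED_AUTO_FIELDS = [
  "Primary gate",
  "Secondary guard",
  "Promotion condition",
  "Stop reason",
  "Escalation reason",
];

// Hypothetical reader for the text of `.lab/context/auto-mode.md`.
function readAutoContract(markdown) {
  const values = {};
  for (const line of markdown.split(/\r?\n/)) {
    // Match "- Label: value" bullets and keep only the required labels.
    const match = /^\s*-\s+([^:]+):\s*(.*)$/.exec(line);
    if (match && REQUIRED_AUTO_FIELDS.includes(match[1])) {
      values[match[1]] = match[2].trim();
    }
  }
  // A field counts as missing when it is absent or left blank.
  const missing = REQUIRED_AUTO_FIELDS.filter((name) => !values[name]);
  return { ready: missing.length === 0, missing, values };
}

module.exports = { readAutoContract, REQUIRED_AUTO_FIELDS };
```

A contract that fills four fields and leaves `Escalation reason` blank would come back with `ready: false` and that one name in `missing`.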
 
@@ -36,6 +36,7 @@ For auto-mode orchestration or long-running experiment campaigns, also read:
  - `.lab/context/state.md` is a derived durable research snapshot; `.lab/context/workflow-state.md` holds live workflow state.
  - `.lab/context/summary.md` is the durable project summary; `.lab/context/session-brief.md` is the next-session startup brief.
  - `.lab/context/auto-mode.md` defines the bounded autonomous envelope; `.lab/context/auto-status.md` records live state for resume and handoff.
+ - Handoff-style artifacts should use stable wording for completed work, frozen scope, allowed next action, required read set, and accept or revise or reject boundaries.
  - If the user provides a LaTeX template directory, validate it and attach it through `paper_template_root` before drafting.
  - Treat attached template directories as user-owned assets. Do not rewrite template files unless the user explicitly asks.
  - If no template is configured, use the managed default LaTeX scaffold under the deliverable paper directory.
@@ -31,6 +31,7 @@ Use this skill when the user invokes `/lab:*` or asks for the structured researc
  - Final paper output should default to LaTeX, and its manuscript language should be decided separately from the workflow language.
  - Separate sourced facts from model-generated hypotheses.
  - Preserve failed runs, failed ideas, and limitations.
+ - Use `skills/lab/references/recipes.md` as the quick path for common stage chains without inventing new commands.
 
  ## Stage Contract
 
@@ -41,6 +42,7 @@ Use this skill when the user invokes `/lab:*` or asks for the structured researc
  - For each brainstorm pass 1 candidate direction, explain what it is, why it matters, roughly how it would work, what problem it solves, and its main risk.
  - Run literature sweep 1 with closest-prior references for each candidate direction before narrowing.
  - Use brainstorm pass 2 to keep only the strongest 1-2 directions, explain why each surviving direction remains, explain what was rejected, and say why the narrowed recommendation is stronger now.
+ - Name rejected options explicitly and say why the surviving directions still remain before making the final recommendation.
  - Run literature sweep 2 before making a final recommendation or novelty claim.
  - Include a user-visible literature summary that names the closest prior found, the recent strong papers found, and what existing work still does not solve before giving the final recommendation.
  - Build a literature-scoping bundle before claiming novelty. The default target is 20 relevant sources unless the field is genuinely too narrow and that exception is written down.
@@ -108,6 +110,7 @@ Use this skill when the user invokes `/lab:*` or asks for the structured researc
  - Use this stage to orchestrate approved execution stages with bounded autonomy.
  - Read `.lab/config/workflow.json`, `.lab/context/mission.md`, `.lab/context/state.md`, `.lab/context/workflow-state.md`, `.lab/context/decisions.md`, `.lab/context/data-decisions.md`, `.lab/context/evidence-index.md`, `.lab/context/terminology-lock.md`, `.lab/context/auto-mode.md`, and `.lab/context/auto-status.md` before acting.
  - Treat `.lab/context/auto-mode.md` as the control contract and `.lab/context/auto-status.md` as the live state file.
+ - Require `.lab/context/auto-mode.md` to expose `Primary gate`, `Secondary guard`, `Promotion condition`, `Stop reason`, and `Escalation reason` before execution.
  - Require `Autonomy level` and `Approval status` in `.lab/context/auto-mode.md` before execution.
  - Treat `L1` as safe-run validation, `L2` as bounded iteration, and `L3` as aggressive campaign mode.
  - Surface the level guide every time `/lab:auto` starts, and make the detailed guide mandatory when the user omits the level or mixes it with a paper layer, phase, or table target.
@@ -126,6 +129,7 @@ Use this skill when the user invokes `/lab:*` or asks for the structured researc
  - Use `.lab/changes/<change-id>/` as the canonical lab change directory.
  - Convert the approved idea into lab change artifacts using `.lab/.managed/templates/proposal.md`, `.lab/.managed/templates/design.md`, `.lab/.managed/templates/spec.md`, and `.lab/.managed/templates/tasks.md`.
  - When the approved idea involves human-subject evaluation, use this stage to freeze the human-subject experiment design: participant recruitment, sample-size rationale, condition design, assignment, measurement, and ethics or debrief details.
+ - When the design depends on interaction, safety, protocol, or human-subject boundaries, add a failure-mode or scenario table instead of leaving those cases implicit.
  - Update `.lab/context/decisions.md` after freezing the spec, then refresh derived views.
  - Do not skip task definition.
 
@@ -168,6 +172,7 @@ Use this skill when the user invokes `/lab:*` or asks for the structured researc
  - Read `.lab/context/mission.md`, `.lab/context/decisions.md`, `.lab/context/evidence-index.md`, and `.lab/context/data-decisions.md` before reviewing.
  - Start with a concise summary of what is being reviewed and the top review question.
  - Prioritize methodology, fairness, benchmark representativeness, comparison-category coverage, leakage, statistics, ablations, and claim discipline.
+ - Surface the strongest alternative explanation and any boundary risk that should narrow the claim.
  - Output findings first, then fatal flaws, then fix priority, then residual risks.
  - Use `.lab/.managed/templates/review-checklist.md`.
  - Write durable review conclusions back to `.lab/context/decisions.md`, `.lab/context/evidence-index.md`, or `.lab/context/open-questions.md` when they affect later stages. Do not use `.lab/context/state.md` as a primary write target.
@@ -200,8 +205,10 @@ Use this skill when the user invokes `/lab:*` or asks for the structured researc
  - When the languages differ, record the workflow language, paper language, finalization decision, why the decision was chosen, and the workflow-language deliverable path in the latest write iteration artifact.
  - Apply the same reader-facing standard in both languages: when a round introduces or revises key terms, abbreviations, metrics, mechanism names, or system labels, explain them at first mention by saying what they are and why they matter here.
  - First mention should use the full form. If a short form or acronym will be reused, define it at first mention and then keep usage stable.
- - Keep one canonical paper-facing name per concept, and keep internal identifiers out of prose unless they are mapped once for the reader.
- - Prefer natural phrase names over newly coined labels joined by hyphens.
+ - Keep one canonical natural-language paper-facing name per concept.
+ - Use natural-language full names in prose, and define any approved short form once before reusing it.
+ - Do not use labels containing `_` or `-` in reader-facing prose.
+ - Keep internal identifiers out of prose unless they are mapped once for the reader and then moved back out of prose.
  - Do not rely on unexplained jargon density as a substitute for academic tone.
  - Bind each claim to evidence from `report`, iteration reports, or normalized summaries.
  - Use the write-stage contract in `.codex/skills/lab/stages/write.md` or `.claude/skills/lab/stages/write.md` as the single source of truth for template choice, paper-plan requirements, section-specific references, validator calls, asset coverage, and final manuscript gates.
@@ -236,6 +243,7 @@ Use this skill when the user invokes `/lab:*` or asks for the structured researc
  ## References
 
  - Workflow summary: `.codex/skills/lab/references/workflow.md` or `.claude/skills/lab/references/workflow.md`
+ - Stage recipes: `skills/lab/references/recipes.md`
  - Brainstorming integration: `.codex/skills/lab/references/brainstorming-integration.md` or `.claude/skills/lab/references/brainstorming-integration.md`
240
248
  - Idea stage guide: `.codex/skills/lab/stages/idea.md` or `.claude/skills/lab/stages/idea.md`
241
249
  - Data stage guide: `.codex/skills/lab/stages/data.md` or `.claude/skills/lab/stages/data.md`
@@ -0,0 +1,38 @@
1
+ # /lab Stage Recipes
2
+
3
+ Use this file as a quick routing map for common stage chains. It does not add new commands or replace stage contracts.
4
+
5
+ ## idea -> data -> spec
6
+
7
+ Use this path when the problem is still being selected or narrowed.
8
+
9
+ - Start at `/lab:idea` to compare candidate directions, run literature sweeps, and end with an approval gate.
10
+ - Move to `/lab:data` after the idea is approved and the benchmark package needs to be frozen.
11
+ - Move to `/lab:spec` after the idea and data package are approved and the change must be turned into an actionable design.
12
+
13
+ ## run -> iterate -> review -> report
14
+
15
+ Use this path when the mission is already approved and the work is now evidence generation.
16
+
17
+ - Start at `/lab:run` for the smallest meaningful validation.
18
+ - Move to `/lab:iterate` when the campaign needs bounded rounds against a frozen mission.
19
+ - Move to `/lab:review` when methodology, fairness, leakage, or claim discipline needs reviewer-style scrutiny.
20
+ - Move to `/lab:report` when the validated evidence is stable enough for collaborator-facing synthesis.
21
+
22
+ ## framing -> write -> review
23
+
24
+ Use this path when report artifacts are stable and the paper narrative is being finalized.
25
+
26
+ - Start at `/lab:framing` to lock paper-facing names, title direction, and contribution wording.
27
+ - Move to `/lab:write` for section-by-section manuscript drafting against the approved framing and report evidence.
28
+ - Revisit `/lab:review` when a draft section or paper-level claim set needs reviewer-mode critique before finalization.
29
+
30
+ ## `/lab:auto` by approved scope
31
+
32
+ Use `/lab:auto` only after the upstream stage decisions are already approved.
33
+
34
+ - `L1`: one bounded validation cycle over approved execution stages
35
+ - `L2`: default bounded experiment iteration inside a frozen core
36
+ - `L3`: broader campaign that may include writing only when framing is already approved
37
+
38
+ `/lab:auto` is an orchestration layer over approved stage chains, not a replacement for `idea`, `data`, `framing`, or `spec`.
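
Reviewer note: the stage chains in this new recipes file can be read as a small routing table. The following is a minimal sketch under that reading, not code shipped in superlab; the names `STAGE_CHAINS` and `next_stage` and the chain keys are hypothetical, and the stage order only restates the headings above.

```python
from typing import Optional

# Hypothetical routing table; the stage order restates the recipe headings.
STAGE_CHAINS = {
    "problem selection": ["idea", "data", "spec"],
    "evidence generation": ["run", "iterate", "review", "report"],
    "paper narrative": ["framing", "write", "review"],
}


def next_stage(chain: str, current: str) -> Optional[str]:
    """Return the stage that follows `current` in `chain`, or None at the end."""
    stages = STAGE_CHAINS[chain]
    index = stages.index(current)  # raises ValueError for an unknown stage
    return stages[index + 1] if index + 1 < len(stages) else None
```

For example, `next_stage("evidence generation", "run")` yields `"iterate"`. Like the recipes file itself, the table is a routing aid and does not replace any stage contract.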
@@ -1,5 +1,7 @@
1
1
  # /lab Workflow Reference
2
2
 
3
+ For common stage chains, see the quick-path guide in `skills/lab/references/recipes.md`.
4
+
3
5
  ## Frozen Mission
4
6
 
5
7
  Within `/lab:iterate`, the campaign target stays fixed across rounds.
@@ -46,12 +46,19 @@
46
46
  - Treat `Academic Validity Checks` and `Integrity self-check` as mandatory automation gates. Auto mode should not proceed, promote, or declare success while those fields are missing, stale, or contradicted by the current rung.
47
47
  - Treat `Sanity and Alternative-Explanation Checks` as the anomaly gate for automation. When a rung yields all-null outputs, suspiciously identical runs, no-op deltas, or impl/result mismatches, pause promotion logic until implementation reality checks, alternative explanations, and at least one cross-check are recorded.
48
48
  - Treat paper-template selection as an explicit write-time gate, not as a silent fallback, when the loop is about to create `.tex` deliverables for the first time.
49
+ - Treat `.lab/context/auto-mode.md` as a visible control plane. The contract should make the primary gate, secondary guard, promotion condition, stop reason, and escalation reason explicit before execution starts.
49
50
  - The contract must declare `Autonomy level` and `Approval status`, and execution starts only when approval is explicitly set to `approved`.
50
51
  - The contract must also declare a concrete terminal goal:
51
52
  - `rounds`
52
53
  - `metric-threshold`
53
54
  - `task-completion`
54
55
  - The contract must provide both `Terminal goal target` and `Required terminal artifact`.
56
+ - Use these control-plane terms consistently:
57
+ - `Primary gate`: the main pass condition that decides whether the current rung or loop objective is satisfied
58
+ - `Secondary guard`: the safety or anomaly check that can block a pass even when the primary gate looks good
59
+ - `Promotion condition`: the explicit rule that allows exploratory work to be promoted into the primary package
60
+ - `Stop reason`: the concrete boundary that ends the current loop without promotion
61
+ - `Escalation reason`: the concrete condition that forces human review or a narrower next step
55
62
  - Recommended level meanings:
56
63
  - `L1`: safe run validation over `run`, `review`, and `report`
57
64
  - `L2`: bounded iteration over `run`, `iterate`, `review`, and `report`
@@ -65,6 +72,7 @@
65
72
  - Do not treat a short watcher such as `sleep 30`, a one-shot `pgrep`, or a single `metrics.json` probe as the rung command when the real experiment is still running.
66
73
  - Bind each rung to the real long-running command or process that owns the experiment result.
67
74
  - Always write a canonical `.lab/context/auto-outcome.md` when the run completes, stops, or fails.
75
+ - Keep handoff wording stable across auto outcomes and downstream report or write handoffs: record completed work, frozen scope, allowed next action, required read set for the next owner, and the accept or revise or reject boundary.
68
76
  - When the evaluation protocol declares structured ladder rungs, execute them as a foreground rung state machine:
69
77
  - each rung must declare `Stage`, `Goal`, `Command`, `Watch`, `Gate`, `On pass`, `On fail`, and `On stop`
70
78
  - keep the session alive while the current rung is running
@@ -92,7 +100,7 @@
92
100
  6. Poll for process completion, checkpoint movement, or summary generation while keeping the session alive
93
101
  7. Evaluate the declared rung gate and transition to the next rung when structured ladder mode is active
94
102
  8. Evaluate the declared terminal goal semantics at the correct boundary
95
- 9. Evaluate stop, success, and promotion checks at the correct boundary
103
+ 9. Evaluate the primary gate, secondary guard, promotion condition, stop reason, and escalation reason at the correct boundary
96
104
  10. Write auto-outcome and decide continue, promote, stop, or escalate
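
Reviewer note: the contract requirements added in this stage guide (a declared autonomy level, approval set to `approved`, one of three terminal goals, and the five control-plane terms) can be checked before execution starts. The sketch below assumes the contract has already been parsed into plain strings; `AutoContract`, its field names, and `contract_problems` are hypothetical and are not part of superlab.

```python
from dataclasses import dataclass, fields
from typing import List

# Terminal goal values named in the stage guide.
TERMINAL_GOALS = {"rounds", "metric-threshold", "task-completion"}


@dataclass
class AutoContract:
    # Hypothetical field names that mirror the labels used in the stage guide.
    autonomy_level: str
    approval_status: str
    terminal_goal: str
    terminal_goal_target: str
    required_terminal_artifact: str
    primary_gate: str
    secondary_guard: str
    promotion_condition: str
    stop_reason: str
    escalation_reason: str


def contract_problems(contract: AutoContract) -> List[str]:
    """List reasons the contract is not ready; an empty list means it may run."""
    problems = [
        f"missing value: {field.name}"
        for field in fields(contract)
        if not getattr(contract, field.name).strip()
    ]
    if contract.terminal_goal not in TERMINAL_GOALS:
        problems.append(f"unsupported terminal goal: {contract.terminal_goal!r}")
    if contract.approval_status != "approved":
        problems.append("approval status is not 'approved'")
    return problems
```

Returning a list of problems rather than raising on the first one keeps every missing control-plane term visible in a single pass, which matches the guide's description of the contract as a visible control plane.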
97
105
 
98
106
  ## Interaction Contract
@@ -23,6 +23,8 @@
23
23
  - literature sweep 1 with 3-5 closest-prior references per direction
24
24
  - brainstorm pass 2 that narrows to 1-2 surviving directions
25
25
  - each brainstorm pass 2 surviving direction explained with why it survived, plus rejected directions and why, and why the narrowed recommendation is stronger now
26
+ - rejected options stated explicitly, not only implied
27
+ - why the survivors remain stated explicitly, not only implied
26
28
  - literature sweep 2 that expands the surviving directions into the full source bundle
27
29
  - literature summary for recommendation with closest prior found, recent strong papers found, and what existing work still does not solve
28
30
  - literature scoping bundle with a default target of 20 sources, or an explicit explanation for a smaller scoped field
@@ -126,6 +128,8 @@
126
128
  - Give brainstorm pass 1 at least three candidate directions, and explain each one with what it is, why it matters, rough how it would work, what problem it solves, and its main risk.
127
129
  - Use literature sweep 1 to test candidate directions against real papers before narrowing them.
128
130
  - Use brainstorm pass 2 to explain what survived, why it survived, what was rejected, why it was rejected, and why the narrowed recommendation is stronger now.
131
+ - Name rejected options explicitly and keep a short explanation of why each one lost.
132
+ - Name why the survivors remain explicitly so later stages inherit the narrowed logic instead of only the final winner label.
129
133
  - Use literature sweep 2 to support the final recommendation with real references across the required buckets.
130
134
  - Add a short literature summary for recommendation so the final output shows the closest prior found, the recent strong papers found, and what existing work still does not solve.
131
135
  - Compare against existing methods explicitly, not by vague novelty language.
@@ -23,6 +23,7 @@
23
23
  - limitations
24
24
  - next steps
25
25
  - artifact status kept separate from validated findings
26
+ - stable handoff section that says what is complete, what is frozen, what the next owner may do, what they must read, and what counts as accept, revise, or reject
26
27
 
27
28
  ## Context Read Set
28
29
 
@@ -71,6 +72,7 @@
71
72
  - If the existing `report.md` or `main-tables.md` is missing required collaborator-facing sections from the managed templates, treat that as a report deficiency. A rerun must repair the missing sections instead of declaring "no content change" or treating the rerun as a no-op.
72
73
  - After drafting or rerunning the report, run `.lab/.managed/scripts/validate_collaborator_report.py --report <deliverables_root>/report.md --main-tables <deliverables_root>/main-tables.md`. If it fails, keep editing until it passes; do not stop at a no-op audit rerun.
73
74
  - Do not mix workflow deliverable status, rerun ids, or manuscript skeleton status into validated scientific findings; keep those in `<deliverables_root>/artifact-status.md`.
75
+ - Keep handoff wording stable across `report.md` and `<deliverables_root>/artifact-status.md`: record completed work, frozen scope, allowed next action, the required read set for the next owner, and the accept or revise or reject boundary.
74
76
  - Write durable report-level conclusions into canonical context such as `.lab/context/mission.md`, `.lab/context/eval-protocol.md`, `.lab/context/decisions.md`, and `.lab/context/evidence-index.md`, then refresh the derived `state.md` snapshot. Write live reporting progress or immediate handoff actions into `.lab/context/workflow-state.md`.
75
77
  - If `.lab/config/workflow.json` sets the workflow language to Chinese, write `report.md` and `<deliverables_root>/main-tables.md` in Chinese unless a file path, code identifier, or literal metric name must remain unchanged.
76
78
  - Prefer conservative interpretation over marketing language.
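
Reviewer note: the handoff wording added in this release names five fields: completed work, frozen scope, allowed next action, required read set, and the accept, revise, or reject boundary. One way to keep that wording stable across artifacts is to render it from a single record. This is a minimal sketch; the `Handoff` class, `render_handoff`, and the `## Handoff` heading are assumptions, and the managed templates may lay the section out differently.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Handoff:
    # The five fields named in the handoff rule; attribute names are hypothetical.
    completed_work: str
    frozen_scope: str
    allowed_next_action: str
    required_read_set: List[str]
    decision_boundary: str  # what counts as accept, revise, or reject


def render_handoff(handoff: Handoff) -> str:
    """Render the handoff as a Markdown section with fixed labels and order."""
    lines = [
        "## Handoff",  # assumed heading, not taken from the templates
        f"- Completed work: {handoff.completed_work}",
        f"- Frozen scope: {handoff.frozen_scope}",
        f"- Allowed next action: {handoff.allowed_next_action}",
        "- Required read set: " + ", ".join(handoff.required_read_set),
        f"- Accept, revise, or reject boundary: {handoff.decision_boundary}",
    ]
    return "\n".join(lines)
```

Rendering the same record into `report.md` and `artifact-status.md` would keep the labels and their order identical in both files.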
@@ -38,6 +38,8 @@
38
38
  - unsupported causal or statistical claims
39
39
  - missing ablations
40
40
  - irreproducible reporting
41
+ - unresolved alternative explanations
42
+ - boundary risks that should narrow the claim even if the implementation is correct
41
43
 
42
44
  ## Output Style
43
45
 
@@ -47,6 +49,7 @@
47
49
  - fix priority stated clearly
48
50
  - evidence-linked critique
49
51
  - explicit residual risks
52
+ - explicit alternative explanations and boundary risks
50
53
 
51
54
  ## Interaction Contract
52
55
 
@@ -54,3 +57,4 @@
54
57
  - Ask one clarifying question at a time only if review scope ambiguity would change findings or severity.
55
58
  - If there are multiple legitimate review framings, present 2-3 approaches with trade-offs and recommend the strictest useful framing.
56
59
  - Do not use brainstorming to soften critique; once scope is clear, stay in reviewer mode and deliver findings directly.
60
+ - Call out the strongest remaining alternative explanation and the strongest boundary risk when either one could materially narrow the claim.
@@ -38,6 +38,7 @@
38
38
  - Carry the approved dataset package, source choices, and benchmark mix into the change.
39
39
  - Preserve evaluation boundaries from the idea stage.
40
40
  - If the approved idea includes human-subject evaluation, convert the rough evaluation sketch into an explicit human-subject experiment design instead of leaving recruitment or protocol details implicit.
41
+ - When interaction, safety, protocol, or human-subject risk matters, add a failure-mode or scenario table instead of leaving those edge cases implicit.
41
42
  - Translate risks into concrete tasks when possible.
42
43
  - Make task granularity small enough that `/lab:run` and `/lab:iterate` can execute predictably.
43
44
  - Use one lab-native change directory per approved idea instead of scattering spec artifacts.
@@ -48,6 +49,7 @@
48
49
  - If change decomposition or validation strategy is materially ambiguous, ask one clarifying question at a time.
49
50
  - If there are multiple viable ways to structure the lab change directory, present 2-3 approaches with trade-offs and a recommendation before freezing the change.
50
51
  - Keep an approval gate before locking a change structure that will drive `/lab:run` and `/lab:iterate`.
52
+ - Make failure modes or scenario coverage explicit when the design depends on interaction boundaries, hidden assumptions, safety constraints, or human-subject protocol choices.
51
53
 
52
54
  ## Minimum Task Coverage
53
55
 
@@ -64,6 +66,7 @@
64
66
  - assignment or randomization protocol
65
67
  - measurement or survey plan
66
68
  - ethics, consent, and debrief plan when applicable
69
+ - failure-mode or scenario table when the approved idea depends on interaction, safety, protocol, or human-subject boundaries
67
70
  - artifact creation
68
71
  - validation run
69
72
  - evaluation normalization
@@ -95,9 +95,10 @@ Run these on every round:
95
95
  - Academic readability standards are the same in `workflow_language` and `paper_language`; changing languages must not lower external-reader clarity.
96
96
  - If the current round introduces or revises key terms, abbreviations, metric names, mechanism names, or system labels, explain them at first mention by briefly stating what they are and why they matter here.
97
97
  - First mention should use the full form. If a short form or acronym will be reused later, define it at first mention as `Full Form (Short Form)` before switching to the short form.
98
- - Keep one canonical paper-facing name per concept. Do not let one concept drift across paper-facing names, experiment labels, and internal identifiers.
99
- - Keep internal identifiers, config keys, and experiment package labels out of reader-facing prose unless they are mapped once for the reader.
100
- - Prefer natural phrase names over newly coined labels joined by hyphens.
98
+ - Keep one canonical natural-language paper-facing name per concept. Do not let one concept drift across paper-facing names, experiment labels, and internal identifiers.
99
+ - Use natural-language full names in prose. If an approved short form is needed later, define it once and reuse it consistently.
100
+ - Do not use labels containing `_` or `-` in reader-facing prose.
101
+ - Keep internal identifiers, config keys, and experiment package labels out of reader-facing prose unless they are mapped once for the reader and then moved back out of prose.
101
102
  - Do not use unexplained terminology density as a substitute for academic tone.
102
103
  - Build the paper asset plan before prose when the section carries introduction, experimental, method, related-work, or conclusion claims:
103
104
  - record the asset coverage targets and gaps for the current paper
@@ -145,6 +146,7 @@ Run these on every round:
145
146
  - When a round introduces or revises key terms, include a compact terminology note in the user-facing round summary and record the terminology-clarity self-check in the write-iteration artifact.
146
147
  - When `workflow_language` and `paper_language` differ, record the final manuscript language choice in the write-iteration artifact with the workflow language, paper language, finalization decision, and why that decision was chosen.
147
148
  - When `workflow_language` and `paper_language` differ, also record the persisted workflow-language deliverable path in the write-iteration artifact.
149
+ - Keep the handoff wording in the write-iteration artifact explicit and stable: say what is completed, what scope is frozen for the next round, what the next owner is allowed to do, what they must read first, and what counts as accept, revise, or reject.
148
150
  - Return paragraph-level roles for the revised prose when drafting.
149
151
  - Run the five-dimension self-review checklist before accepting a round.
150
152
  - Run reviewer-style checks after every round.
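
Reviewer note: the new rule against labels containing `_` or `-` in reader-facing prose can be partly screened by machine. The sketch below is an illustration only, not a superlab validator; `find_label_tokens` is a hypothetical name. It treats inline code spans as exempt, on the assumption that an identifier mapped once for the reader appears in code formatting.

```python
import re
from typing import List

# Inline code spans are skipped (assumption: mapped identifiers use backticks).
CODE_SPAN = re.compile(r"`[^`]*`")
# Word characters joined by at least one "_" or "-", for example "frozen_core".
LABEL = re.compile(r"\b\w+(?:[_-]\w+)+\b")


def find_label_tokens(prose: str) -> List[str]:
    """Return tokens that contain "_" or "-" outside inline code spans."""
    without_code = CODE_SPAN.sub(" ", prose)
    return LABEL.findall(without_code)
```

The pattern is deliberately over-inclusive: ordinary hyphenated words and numeric ranges such as `3-5` also match, so the result is a list of candidates for a human pass, not a verdict.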
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "superlab",
3
- "version": "0.1.39",
3
+ "version": "0.1.41",
4
4
  "description": "Strict /lab research workflow installer for Codex and Claude",
5
5
  "keywords": [
6
6
  "codex",