superlab 0.1.74 → 0.1.76

package/lib/i18n.cjs CHANGED
@@ -1974,7 +1974,7 @@ for (const [relativePath, content] of Object.entries(ZH_SKILL_FILES)) {
1974
1974
 
1975
1975
  const zhRebuttalModeReference = `# Rebuttal Mode
1976
1976
 
1977
- 本文件是 reviewer panel 和外部 rebuttal intake 的唯一共享合同。review、write、auto 阶段只引用本文件,不复制四审稿人逻辑。
1977
+ 本文件是 reviewer panel 和外部 rebuttal intake 的唯一共享合同。review、write、auto 阶段只引用本文件,不复制五审稿人逻辑。
1978
1978
 
1979
1979
  ## 触发条件
1980
1980
 
@@ -1985,6 +1985,19 @@ const zhRebuttalModeReference = `# Rebuttal Mode
1985
1985
 
1986
1986
  普通路径修复、依赖安装、实验轮询不触发本模式,除非它们会影响 paper-facing claim。
1987
1987
 
1988
+ ## Light Read Set / 轻量读取范围
1989
+
1990
+ Rebuttal 是批评和路由,不是全仓审计。默认只读最小证据集合:
1991
+
1992
+ - active LaTeX / 现役 LaTeX:\`main.tex\`、\`sections/*.tex\`、\`tables/*.tex\`、\`figures/*.tex\`、\`analysis/*.tex\`
1993
+ - result summaries / 结果摘要:\`summary.csv\`、\`summary.tsv\`、\`summary.json\`、\`score_effect_summary.json\`、\`metric_summary.*\`、\`run_table.*\`
1994
+ - 受管索引:evidence index、evaluation protocol、paper plan、metric glossary、terminology glossary、artifact status、active topology
1995
+ - 用户提供的外部 rebuttal、批评或审稿意见
1996
+
1997
+ Do not run a whole-repository scan / 不要默认全仓扫描。不要默认读取 raw datasets / 原始数据集、full logs / 完整日志、完整 outputs 树、源码、notebook 或无关旧稿。
1998
+
1999
+ 只有在 LaTeX claim 与结果摘要冲突、表格数值无来源、validator 指向具体文件、或用户明确要求 deep audit 时,才扩大读取范围。扩大时必须在 rebuttal panel 记录原因和额外路径。
2000
+
1988
2001
  ## 外部 Rebuttal Intake
1989
2002
 
1990
2003
  外部批评必须先转成内部 issue,再进入改稿或实验。
@@ -1994,7 +2007,7 @@ const zhRebuttalModeReference = `# Rebuttal Mode
1994
2007
  - 来源:reviewer id、AC、meta-review、同事或用户
1995
2008
  - 批评摘要
1996
2009
  - 影响对象:claim、section、table、figure、protocol、metric、threat model、experiment 或 wording
1997
- - 审稿轴:R1、R2、R3 或 R4
2010
+ - 审稿轴:R1、R2、R3、R4 或 R5
1998
2011
  - 严重性:fatal、major、minor 或 clarification
1999
2012
  - 路由:\`write\`、\`iterate\`、\`report\`、\`framing\`、\`data\`、\`spec\` 或 \`ask-user\`
2000
2013
  - 接受检查:什么证据或稿件状态算修完
@@ -2013,7 +2026,11 @@ const zhRebuttalModeReference = `# Rebuttal Mode
2013
2026
 
2014
2027
  检查消融、鲁棒性、泛化、失败案例、替代解释和指标解释是否完整。
2015
2028
 
2016
- ### R4 Presentation / Clarity
2029
+ ### R4 Results / Tables / Numeric Evidence / 结果、表格与数值证据
2030
+
2031
+ 检查实验数值、差值、表格设计、指标方向、split 数、统计支持、加粗规则、caption 和表注是否可审计;每张主表是否说明评估什么、指标如何解释、协议如何产生行,以及哪些比较边界不能跨越。
2032
+
2033
+ ### R5 Presentation / Clarity
2017
2034
 
2018
2035
  检查叙事线、术语、图表自解释、引用、LaTeX 和 section flow 是否清楚。
2019
2036
 
@@ -2037,13 +2054,25 @@ ZH_CONTENT[path.join(".lab", ".managed", "templates", "rebuttal-panel.md")] = `#
2037
2054
  - 证据基础:
2038
2055
  - 外部 rebuttal 来源(如果有):
2039
2056
 
2057
+ ## Read-scope audit / 读取范围审计
2058
+
2059
+ - 是否从 Light Read Set / 轻量读取范围开始:
2060
+ - Active LaTeX / 现役 LaTeX 文件:
2061
+ - Result summaries / 结果摘要:
2062
+ - 受管索引:
2063
+ - 额外读取路径:
2064
+ - 如有扩大范围,原因:
2065
+ - 是否避免 whole-repository scan / 全仓扫描:
2066
+ - 是否避免 raw datasets / 原始数据集:
2067
+ - 是否避免 full logs / 完整日志:
2068
+
2040
2069
  ## 外部 Rebuttal Intake
2041
2070
 
2042
2071
  | 来源 | 批评摘要 | 影响对象 | 审稿轴 | 严重性 | 路由 | 接受检查 |
2043
2072
  | --- | --- | --- | --- | --- | --- | --- |
2044
2073
  | | | | | | | |
2045
2074
 
2046
- ## 四类审稿视角
2075
+ ## 五类审稿视角
2047
2076
 
2048
2077
  ### R1 Significance / Originality / Insight
2049
2078
 
@@ -2069,7 +2098,15 @@ ZH_CONTENT[path.join(".lab", ".managed", "templates", "rebuttal-panel.md")] = `#
2069
2098
  - 路由:
2070
2099
  - 接受检查:
2071
2100
 
2072
- ### R4 Presentation / Clarity
2101
+ ### R4 Results / Tables / Numeric Evidence / 结果、表格与数值证据
2102
+
2103
+ - 问题:
2104
+ - 为什么重要:
2105
+ - 必要修复:
2106
+ - 路由:
2107
+ - 接受检查:
2108
+
2109
+ ### R5 Presentation / Clarity
2073
2110
 
2074
2111
  - 问题:
2075
2112
  - 为什么重要:
@@ -2172,9 +2209,10 @@ const zhReviewRebuttalMode = `
2172
2209
  ## Rebuttal 模式
2173
2210
 
2174
2211
  - 当目标是论文、section、表、图、report、claim set 或外部 rebuttal 批评时,必须读取 \`skills/lab/references/rebuttal-mode.md\`。
2175
- - 不要在 review 阶段复制四审稿人逻辑;使用 \`.lab/.managed/templates/rebuttal-panel.md\` 写持久 reviewer panel 工件。
2212
+ - 不要在 review 阶段复制五审稿人逻辑;使用 \`.lab/.managed/templates/rebuttal-panel.md\` 写持久 reviewer panel 工件。
2213
+ - 对“rebuttal 一下看有什么缺点”这类快速审查,默认只用 Light Read Set / 轻量读取范围:active LaTeX / 现役 LaTeX、result summaries / 结果摘要、受管索引和用户提供的批评。不要默认 whole-repository scan / 全仓扫描。
2176
2214
  - 外部 reviewer、AC、meta-review、同事或用户批评必须先转成内部可执行 issue,再进入改稿或 response draft。
2177
- - Reviewer Panel 按 R1 Significance / Originality / Insight、R2 Soundness / Technical Quality、R3 Evaluation / Analysis、R4 Presentation / Clarity 四类审稿视角分类。
2215
+ - Reviewer Panel 按 R1 Significance / Originality / Insight、R2 Soundness / Technical Quality、R3 Evaluation / Analysis、R4 Results / Tables / Numeric Evidence、R5 Presentation / Clarity 五类审稿视角分类。
2178
2216
  - L1/L2 默认把核心变更当作批准边界;L3 通过共享核心变更台账策略处理核心 claim、协议、指标、threat model、数据集范围、benchmark 范围或 framing 变化。
2179
2217
  `;
2180
2218
 
@@ -2184,8 +2222,9 @@ const zhWriteRebuttalMode = `
2184
2222
 
2185
2223
  - 当用户提供外部 reviewer、AC、meta-review、rebuttal、同事或用户自己的批评时,起草前必须读取 \`skills/lab/references/rebuttal-mode.md\`。
2186
2224
  - 非平凡 paper-facing 写作轮次应把 rebuttal mode 当成 reviewer acceptance gate,并用 \`.lab/.managed/templates/rebuttal-panel.md\` 写 critique artifact。
2225
+ - write 的 rebuttal gate 必须先用 Light Read Set / 轻量读取范围:active LaTeX / 现役 LaTeX、result summaries / 结果摘要、受管索引和用户提供的批评;除非 rebuttal panel 记录具体扩大原因,否则不要 whole-repository scan / 全仓扫描。
2187
2226
  - 不要实现 write-only rebuttal workflow;共享 rebuttal-mode 负责审稿轴、外部 rebuttal intake、issue routing 和核心变更策略。
2188
- - fatal 或 major 的 R1/R2/R3 issue 未解决前,不要进入 prose polish;先修复、路由到 \`iterate\` / \`report\` / \`framing\` / \`spec\`,或用证据显式 waive。
2227
+ - fatal 或 major 的 R1/R2/R3/R4 issue 未解决前,不要进入 prose polish;先修复、路由到 \`iterate\` / \`report\` / \`framing\` / \`spec\`,或用证据显式 waive。
2189
2228
  - L3 或显式授权的写作 campaign 可以改 paper-level claim、协议、指标、threat model、数据集范围、benchmark 范围或 framing,但必须通过 \`skills/lab/references/rebuttal-mode.md\` 里的 Core Mutation Ledger 策略。
2190
2229
  - 在 write iteration artifact 里记录 rebuttal panel 路径、核心变更台账路径和未解决 issue id。
2191
2230
  `;
@@ -2196,6 +2235,7 @@ const zhAutoRebuttalMode = `
2196
2235
 
2197
2236
  - 当 auto campaign 包含 paper-facing \`report\`、\`write\`、外部 rebuttal repair 或 reviewer-driven paper revision 时,必须读取 \`skills/lab/references/rebuttal-mode.md\`。
2198
2237
  - 使用 \`.lab/.managed/templates/rebuttal-panel.md\` 写持久 Reviewer Panel 工件,不要在 auto mode 里复制一套 reviewer workflow。
2238
+ - reviewer-driven repair 先用 Light Read Set / 轻量读取范围:active LaTeX / 现役 LaTeX、result summaries / 结果摘要、受管索引和用户批评。除非 rebuttal panel 记录扩大原因,否则不要 whole-repository scan / 全仓扫描、raw datasets / 原始数据集或 full logs / 完整日志。
2199
2239
  - 外部 rebuttal 批评必须先转成内部 issue、route 和 acceptance check,再开始 \`run\`、\`iterate\`、\`report\` 或 \`write\`。
2200
2240
  - L1/L2 默认把核心变更当作批准边界;L3 可以在已批准 envelope 内修改 paper-level claim、协议、指标、threat model、reviewer profile、数据集范围、benchmark 范围或 framing。
2201
2241
  - L3 执行核心变更前,必须用 \`.lab/.managed/templates/core-mutation-ledger.md\` 写或更新 \`.lab/writing/core-mutation-ledger.md\`。
@@ -2703,6 +2743,8 @@ const zhAutoPriorityCodexLine =
2703
2743
  "显式的 `/lab:auto` 或 `/lab-auto` 请求,其优先级高于 brainstorming、spec review 这类更宽的创作或审阅技能路径。";
2704
2744
  const zhAutoPriorityClaudeLine =
2705
2745
  "显式的 `/lab auto` 或 `/lab-auto` 请求,其优先级高于 brainstorming、spec review 这类更宽的创作或审阅技能路径。";
2746
+ const zhAutoVisibleCloseoutLine =
2747
+ "最终可见收尾必须直接消费已通过校验的 stage report:展示请求交付物或目标的状态、核心说明表的关键行、证据路径、验证命令和验证结果、已知缺口,以及下一步动作和原因。不能只用“已完成”“已推送”或流水账命令日志结束。";
2706
2748
 
2707
2749
  ZH_CONTENT[path.join(".codex", "prompts", "lab.md")] = ZH_CONTENT[
2708
2750
  path.join(".codex", "prompts", "lab.md")
@@ -2716,6 +2758,9 @@ ZH_CONTENT[path.join(".codex", "prompts", "lab-auto.md")] = ZH_CONTENT[
2716
2758
  ].replace(
2717
2759
  "已批准的 `L2` 和 `L3` 执行 campaign 默认进入执行模式。",
2718
2760
  `${zhAutoPriorityCodexLine}\n已批准的 \`L2\` 和 \`L3\` 执行 campaign 默认进入执行模式。`
2761
+ ).replace(
2762
+ "不要用 `sleep 30`、单次 `pgrep` 或一次性的 `metrics.json` 探针来代替真实长任务命令;当真实实验进程还活着时,只允许在出现有意义变化时发进度更新,并继续等待。没有新变化时,也只按保活节奏汇报,不要让用户触发下一次轮询。",
2763
+ "不要用 `sleep 30`、单次 `pgrep` 或一次性的 `metrics.json` 探针来代替真实长任务命令;当真实实验进程还活着时,只允许在出现有意义变化时发进度更新,并继续等待。没有新变化时,也只按保活节奏汇报,不要让用户触发下一次轮询。\n\n" + zhAutoVisibleCloseoutLine
2719
2764
  );
2720
2765
 
2721
2766
  ZH_CONTENT[path.join(".claude", "commands", "lab.md")] = ZH_CONTENT[
@@ -2730,6 +2775,9 @@ ZH_CONTENT[path.join(".claude", "commands", "lab-auto.md")] = ZH_CONTENT[
2730
2775
  ].replace(
2731
2776
  "已批准的 `L2` 和 `L3` 执行 campaign 默认进入执行模式。",
2732
2777
  `${zhAutoPriorityClaudeLine}\n已批准的 \`L2\` 和 \`L3\` 执行 campaign 默认进入执行模式。`
2778
+ ).replace(
2779
+ "不要用 `sleep 30`、单次 `pgrep` 或一次性的 `metrics.json` 探针来代替真实长任务命令;当真实实验进程还活着时,只允许在出现有意义变化时发进度更新,并继续等待。没有新变化时,也只按保活节奏汇报,不要让用户触发下一次轮询。",
2780
+ "不要用 `sleep 30`、单次 `pgrep` 或一次性的 `metrics.json` 探针来代替真实长任务命令;当真实实验进程还活着时,只允许在出现有意义变化时发进度更新,并继续等待。没有新变化时,也只按保活节奏汇报,不要让用户触发下一次轮询。\n\n" + zhAutoVisibleCloseoutLine
2733
2781
  );
2734
2782
 
2735
2783
  const zhRecipeQuickPathLine =
@@ -2761,6 +2809,12 @@ ZH_CONTENT[path.join(".claude", "commands", "lab.md")] = ZH_CONTENT[
2761
2809
  "- 用户只要显式调用某个 stage,无论写成 `/lab:<stage>`、`/lab: <stage>`、`/lab <stage>`、`/lab-<stage>` 还是 `/lab:<stage>`,都要立刻执行该 stage,而不是只推荐别的阶段。\n- 如果输入看起来像 stage 请求,但又不属于上述受支持写法,就必须停下并要求用户用精确的 stage 名重述,而不是自己猜。\n"
2762
2810
  );
2763
2811
 
2812
+ for (const rootPromptKey of [path.join(".codex", "prompts", "lab.md"), path.join(".claude", "commands", "lab.md")]) {
2813
+ if (ZH_CONTENT[rootPromptKey] && !ZH_CONTENT[rootPromptKey].includes("最终可见收尾")) {
2814
+ ZH_CONTENT[rootPromptKey] += `\n\n${zhAutoVisibleCloseoutLine}\n`;
2815
+ }
2816
+ }
2817
+
2764
2818
  ZH_CONTENT[path.join(".codex", "skills", "lab", "SKILL.md")] = `---
2765
2819
  name: lab
2766
2820
  description: 严格研究工作流,覆盖 idea、data、auto、framing、spec、run、iterate、review、report 和 paper-writing。
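Note on the root-prompt loop in the hunk above: it appends the closeout line only when the entry exists and does not already contain the marker text, so running it again leaves the content unchanged. A minimal sketch of that guard, assuming a plain object in place of `ZH_CONTENT`; the helper name `appendOnce` and its parameters are hypothetical and do not appear in the package.

```javascript
// Hypothetical helper mirroring the guard used in the diff:
// append `block` to store[key] only if the entry exists and
// does not already contain `marker`.
function appendOnce(store, key, marker, block) {
  if (store[key] && !store[key].includes(marker)) {
    store[key] += `\n\n${block}\n`;
    return true; // content was changed
  }
  return false; // missing entry, or marker already present
}

// Usage with stand-in data (not the real prompt text).
const prompts = { "lab.md": "existing prompt" };
appendOnce(prompts, "lab.md", "CLOSEOUT", "CLOSEOUT: consume the stage report.");
appendOnce(prompts, "lab.md", "CLOSEOUT", "CLOSEOUT: consume the stage report.");
```

The guard only works if the appended block itself contains the marker, which holds in the diff because `zhAutoVisibleCloseoutLine` begins with the checked phrase.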
@@ -3405,6 +3459,18 @@ ZH_CONTENT[path.join(".codex", "skills", "lab", "stages", "auto.md")] = ZH_CONTE
3405
3459
  "- 只有当级别本身真的有歧义时,才停下来追问,例如 \\`第三层\\`、\\`phase 3\\`、\\`table 3\\`。",
3406
3460
  "- 只有当级别本身真的有歧义时,才停下来追问,例如 \\`第三层\\`、\\`phase 3\\`、\\`table 3\\`。\n- 如果用户显式调用 \\`/lab:auto\\` 或 \\`/lab-auto\\`,就保持在 auto 执行路径里;只要请求仍在已批准 execution envelope 内,即使目标听起来像 feature selection、baseline selection、离散化或 candidate sweep,也不要重新路由到 brainstorming 或 spec review。"
3407
3461
  );
3462
+ const zhAutoStageVisibleCloseout = `
3463
+
3464
+ ## 最终可见收尾
3465
+
3466
+ - 最终可见收尾必须在 stage report 校验通过后给出,不能只写“已完成”“已推送”或命令流水账。
3467
+ - 最终可见收尾必须直接来自已校验的阶段报告,而不是另起一套临场叙述。
3468
+ - 最终可见收尾至少包含:请求交付物或目标及状态、核心说明表关键行、证据路径、验证命令和验证结果、已知缺口、下一步动作和为什么这样做。
3469
+ - 如果说“已完成”,也必须同时写明仍然存在的 handoff 边界,例如 PDF 编译、版面检查、外部审批、预算耗尽、冻结核心风险或环境缺失。
3470
+ `;
3471
+ if (!ZH_CONTENT[path.join(".codex", "skills", "lab", "stages", "auto.md")].includes("最终可见收尾")) {
3472
+ ZH_CONTENT[path.join(".codex", "skills", "lab", "stages", "auto.md")] += zhAutoStageVisibleCloseout;
3473
+ }
3408
3474
  ZH_CONTENT[path.join(".claude", "skills", "lab", "stages", "auto.md")] =
3409
3475
  ZH_CONTENT[path.join(".codex", "skills", "lab", "stages", "auto.md")];
3410
3476
  ZH_CONTENT[path.join(".claude", "skills", "lab", "stages", "report.md")] =
@@ -100,6 +100,7 @@ Treat all of these as equivalent stage requests:
100
100
  - While the loop is alive, `/lab auto` should keep `.lab/context/auto-ledger.md` updated with the active owner, observed state, and resume boundary.
101
101
  - Separate internal polling from user-facing progress reports.
102
102
  - While the loop is healthy, `/lab auto` should report to the user only on a meaningful change or at the keepalive cadence recorded in the current contract or runtime state, and it should not ask the user to trigger the next poll.
103
+ - Final visible closeout must consume the validated stage report: show requested deliverable statuses, Core Explanation Table rows, evidence paths, validation/verification commands and results, known gaps, and the next action. Do not end with only "done", "pushed", or a chronological command log.
103
104
 
104
105
  - Treat `Autonomy level L1/L2/L3` as the execution privilege level, not as a paper layer, phase, or table number.
105
106
  - Treat `paper layer`, `phase`, and `table` as experiment targets. For example, `paper layer 3` or `Phase 1` should not be interpreted as `Autonomy level L3`.
@@ -27,3 +27,4 @@ If the preflight block cannot be completed because any required field is missing
27
27
  When the repository workflow language is Chinese, summaries, checklist items, task labels, and progress updates should be written in Chinese unless a literal identifier must stay unchanged.
28
28
  Treat `Layer 3`, `Phase 1`, or `Table 2` as paper-scope targets. Treat `Autonomy level L3` as the execution permission level.
29
29
  Do not replace the real long-running experiment command with a short watcher such as `sleep 30`, `pgrep`, or a one-shot `metrics.json` probe. While the real experiment process is still alive, emit only a progress update and keep waiting.
30
+ Final visible closeout is mandatory when `/lab:auto` reaches stop, failure, escalation, or handoff. After validating the stage report, the final answer must consume that report directly: list the requested deliverables or objectives with status, summarize the Core Explanation Table rows, provide evidence paths, show validation/verification commands and validation results, name known gaps or commands that could not run, and state the next action plus why it is appropriate. Do not end with only `done`, `pushed`, `completed`, or a chronological command log.
@@ -27,3 +27,4 @@ If the preflight block cannot be completed because any required field is missing
27
27
  When the repository workflow language is Chinese, summaries, checklist items, task labels, and progress updates should be written in Chinese unless a literal identifier must stay unchanged.
28
28
  Treat `Layer 3`, `Phase 1`, or `Table 2` as paper-scope targets. Treat `Autonomy level L3` as the execution permission level.
29
29
  Do not replace the real long-running experiment command with a short watcher such as `sleep 30`, `pgrep`, or a one-shot `metrics.json` probe. While the real experiment process is still alive, emit only a progress update and keep waiting.
30
+ Final visible closeout is mandatory when `/lab:auto` reaches stop, failure, escalation, or handoff. After validating the stage report, the final answer must consume that report directly: list the requested deliverables or objectives with status, summarize the Core Explanation Table rows, provide evidence paths, show validation/verification commands and validation results, name known gaps or commands that could not run, and state the next action plus why it is appropriate. Do not end with only `done`, `pushed`, `completed`, or a chronological command log.
@@ -94,6 +94,7 @@ Treat all of these as equivalent stage requests:
94
94
  - While the loop is alive, `/lab:auto` should keep `.lab/context/auto-ledger.md` updated with the active owner, observed state, and resume boundary.
95
95
  - Separate internal polling from user-facing progress reports.
96
96
  - While the loop is healthy, `/lab:auto` should report to the user only on a meaningful change or at the keepalive cadence recorded in the current contract or runtime state, and it should not ask the user to trigger the next poll.
97
+ - Final visible closeout must consume the validated stage report: show requested deliverable statuses, Core Explanation Table rows, evidence paths, validation/verification commands and results, known gaps, and the next action. Do not end with only "done", "pushed", or a chronological command log.
97
98
 
98
99
  - Treat `Autonomy level L1/L2/L3` as the execution privilege level, not as a paper layer, phase, or table number.
99
100
  - Treat `paper layer`, `phase`, and `table` as experiment targets. For example, `paper layer 3` or `Phase 1` should not be interpreted as `Autonomy level L3`.
@@ -27,3 +27,4 @@ If the preflight block cannot be completed because any required field is missing
27
27
  When the repository workflow language is Chinese, summaries, checklist items, task labels, and progress updates should be written in Chinese unless a literal identifier must stay unchanged.
28
28
  Treat `Layer 3`, `Phase 1`, or `Table 2` as paper-scope targets. Treat `Autonomy level L3` as the execution permission level.
29
29
  Do not replace the real long-running experiment command with a short watcher such as `sleep 30`, `pgrep`, or a one-shot `metrics.json` probe. While the real experiment process is still alive, emit only a progress update and keep waiting.
30
+ Final visible closeout is mandatory when `/lab:auto` reaches stop, failure, escalation, or handoff. After validating the stage report, the final answer must consume that report directly: list the requested deliverables or objectives with status, summarize the Core Explanation Table rows, provide evidence paths, show validation/verification commands and validation results, name known gaps or commands that could not run, and state the next action plus why it is appropriate. Do not end with only `done`, `pushed`, `completed`, or a chronological command log.
@@ -27,3 +27,4 @@ If the preflight block cannot be completed because any required field is missing
27
27
  When the repository workflow language is Chinese, summaries, checklist items, task labels, and progress updates should be written in Chinese unless a literal identifier must stay unchanged.
28
28
  Treat `Layer 3`, `Phase 1`, or `Table 2` as paper-scope targets. Treat `Autonomy level L3` as the execution permission level.
29
29
  Do not replace the real long-running experiment command with a short watcher such as `sleep 30`, `pgrep`, or a one-shot `metrics.json` probe. While the real experiment process is still alive, emit only a progress update and keep waiting.
30
+ Final visible closeout is mandatory when `/lab:auto` reaches stop, failure, escalation, or handoff. After validating the stage report, the final answer must consume that report directly: list the requested deliverables or objectives with status, summarize the Core Explanation Table rows, provide evidence paths, show validation/verification commands and validation results, name known gaps or commands that could not run, and state the next action plus why it is appropriate. Do not end with only `done`, `pushed`, `completed`, or a chronological command log.
@@ -8,6 +8,18 @@
8
8
  - Evidence base:
9
9
  - External rebuttal source, if any:
10
10
 
11
+ ## Read-scope audit
12
+
13
+ - Started from the Light Read Set:
14
+ - active LaTeX files read:
15
+ - result summaries read:
16
+ - Managed indices read:
17
+ - Extra paths read:
18
+ - Why scope was expanded, if any:
19
+ - Whole-repository scan avoided:
20
+ - Raw datasets avoided:
21
+ - Full logs avoided:
22
+
11
23
  ## External Rebuttal Intake
12
24
 
13
25
  | Source | Raw criticism summary | Affected unit | Reviewer axis | Severity | Route | Acceptance check |
@@ -40,7 +52,15 @@
40
52
  - Route:
41
53
  - Acceptance check:
42
54
 
43
- ### R4 Presentation / Clarity
55
+ ### R4 Results / Tables / Numeric Evidence
56
+
57
+ - Finding:
58
+ - Why it matters:
59
+ - Required fix:
60
+ - Route:
61
+ - Acceptance check:
62
+
63
+ ### R5 Presentation / Clarity
44
64
 
45
65
  - Finding:
46
66
  - Why it matters:
@@ -68,4 +88,3 @@
68
88
  - Next route:
69
89
  - Blocking issue, if any:
70
90
  - Handoff note:
71
-
@@ -49,11 +49,12 @@ Use this skill when the user invokes `/lab:*` or asks for the structured researc
49
49
  - If the stage says improvement is needed, do not choose `stop` unless the next action states a concrete terminal boundary such as budget exhaustion, frozen-core risk, safety or integrity failure, impossible target, or a required approval boundary. Otherwise choose `continue`, `revise`, `rerun`, or `escalate`.
50
50
  - Stage reports are closeout and handoff artifacts, not a new user command and not a replacement for stage-specific artifacts such as idea memos, iteration reports, final reports, or write-iteration records.
51
51
  - Run `.lab/.managed/scripts/validate_stage_report.py --stage-report <stage-report> --stage <stage>` before claiming the stage is complete, and include the stage-report path plus validation result in the final user-facing summary.
52
+ - For `/lab:auto`, the final user-facing answer must visibly consume the validated stage report: summarize requested deliverable statuses, Core Explanation Table rows, evidence paths, validation/verification commands and results, known gaps, and the next action. A chat-only chronological result list is not a valid closeout.
52
53
  - Final paper output should default to LaTeX, and its manuscript language should be decided separately from the workflow language.
53
54
  - Separate sourced facts from model-generated hypotheses.
54
55
  - Preserve failed runs, failed ideas, and limitations.
55
56
  - Use `skills/lab/references/recipes.md` as the quick path for common stage chains without inventing new commands.
56
- - Use `.codex/skills/lab/references/rebuttal-mode.md` or `.claude/skills/lab/references/rebuttal-mode.md` as the single shared reviewer-panel and external rebuttal intake contract. Do not copy four-reviewer logic into `review`, `write`, or `auto` stage guides.
57
+ - Use `.codex/skills/lab/references/rebuttal-mode.md` or `.claude/skills/lab/references/rebuttal-mode.md` as the single shared reviewer-panel and external rebuttal intake contract. Do not copy five-reviewer logic into `review`, `write`, or `auto` stage guides.
57
58
 
58
59
  ## Stage Contract
59
60
 
@@ -23,6 +23,28 @@ Do not trigger rebuttal mode for routine implementation reviews, path fixes, dep
23
23
  - external rebuttal text when provided
24
24
  - active autonomy level when the stage is `/lab:auto`
25
25
 
26
+ ## Light Read Set
27
+
28
+ Rebuttal mode is a criticism and routing pass, not a full repository audit. Start with the smallest evidence bundle that can support reviewer-style findings.
29
+
30
+ Default read set:
31
+
32
+ - active LaTeX manuscript files: `main.tex`, `sections/*.tex`, `tables/*.tex`, `figures/*.tex`, and `analysis/*.tex` when they are part of the active paper topology
33
+ - result summaries: `summary.csv`, `summary.tsv`, `summary.json`, `score_effect_summary.json`, `metric_summary.*`, `run_table.*`, selected aggregate tables, and already-rendered table inputs
34
+ - managed paper indices when present: evidence index, evaluation protocol, paper plan, metric glossary, terminology glossary, artifact status, and active topology file
35
+ - the specific external rebuttal text, user criticism, or reviewer comments supplied for the pass
36
+
37
+ Do not run a whole-repository scan by default. Do not read raw datasets, full logs, full output trees, source code, notebooks, or unrelated drafts unless a specific issue cannot be resolved from the light read set.
38
+
39
+ Expand the read set only when one of these conditions holds:
40
+
41
+ - a LaTeX claim names a result whose summary file is missing or contradictory
42
+ - a table value cannot be traced to any result summary
43
+ - a validator points to a specific source file or generated artifact
44
+ - the user explicitly asks for a deep audit instead of a rebuttal pass
45
+
46
+ When expanding scope, record the reason and extra paths in the rebuttal panel. If the pass stays within the light read set, record that as well.
47
+
26
48
  ## External Rebuttal Intake
27
49
 
28
50
  External criticism must be converted into internal issues before any rewrite.
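Note on the Light Read Set hunk above: the default read set can be read as a path filter. A minimal sketch, assuming repo-relative paths, LaTeX globs anchored at the paper root, and result summaries matched by basename. `inLightReadSet`, `globToRegExp`, and the two constant names are hypothetical, and managed indices and supplied criticism are not modeled here.

```javascript
// Glob lists copied from the patterns named in the reference text.
const LATEX_GLOBS = [
  "main.tex", "sections/*.tex", "tables/*.tex", "figures/*.tex", "analysis/*.tex",
];
const SUMMARY_GLOBS = [
  "summary.csv", "summary.tsv", "summary.json",
  "score_effect_summary.json", "metric_summary.*", "run_table.*",
];

// Translate a simple glob into a RegExp; "*" matches within one path segment.
function globToRegExp(glob) {
  const escaped = glob
    .replace(/[.+?^${}()|[\]\\]/g, "\\$&")
    .replace(/\*/g, "[^/]*");
  return new RegExp(`^${escaped}$`);
}

// Assumption: LaTeX files are matched on the full relative path,
// result summaries on the file name only.
function inLightReadSet(relPath) {
  const normalized = relPath.split("\\").join("/");
  const base = normalized.slice(normalized.lastIndexOf("/") + 1);
  const isLatex = LATEX_GLOBS.some((g) => globToRegExp(g).test(normalized));
  const isSummary = SUMMARY_GLOBS.some((g) => globToRegExp(g).test(base));
  return isLatex || isSummary;
}
```

A filter like this would be applied to an already-known list of candidate paths. Walking the whole repository to build that list would defeat the purpose of the light read set.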
@@ -32,7 +54,7 @@ For each external comment, record:
32
54
  - source: reviewer id, AC, meta-review, colleague, or user
33
55
  - raw criticism summary
34
56
  - affected paper unit: claim, section, table, figure, protocol, metric, threat model, experiment, or wording
35
- - reviewer axis: R1, R2, R3, or R4
57
+ - reviewer axis: R1, R2, R3, R4, or R5
36
58
  - severity: fatal, major, minor, or clarification
37
59
  - route: `write`, `iterate`, `report`, `framing`, `data`, `spec`, or `ask-user`
38
60
  - acceptance check: concrete evidence or manuscript condition that resolves the issue
@@ -41,7 +63,7 @@ Do not answer external criticism with prose-only reassurance. If the issue is va
41
63
 
42
64
  ## Reviewer Panel
43
65
 
44
- Run four independent review lenses. Each lens must produce actionable issues, not vague advice.
66
+ Run five independent review lenses. Each lens must produce actionable issues, not vague advice.
45
67
 
46
68
  ### R1 Significance / Originality / Insight
47
69
 
@@ -61,7 +83,13 @@ Ask whether evaluation covers ablations, robustness, generalization, failure cas
61
83
 
62
84
  Typical fixes route to `iterate`, `report`, or `write`.
63
85
 
64
- ### R4 Presentation / Clarity
86
+ ### R4 Results / Tables / Numeric Evidence
87
+
88
+ Ask whether reported numbers, deltas, table design, metric directions, split counts, statistical support, bolding, captions, and table notes make the evidence auditable. Check whether each major table states what it evaluates, how metrics are computed or interpreted, what protocol generated the rows, and what can or cannot be compared.
89
+
90
+ Typical fixes route to `report`, `iterate`, or `write`.
91
+
92
+ ### R5 Presentation / Clarity
65
93
 
66
94
  Ask whether the storyline, terminology, figure/table semantics, citations, LaTeX, and section flow are readable and self-contained.
67
95
 
@@ -121,7 +149,7 @@ If old evidence remains usable under a narrower interpretation, say exactly wher
121
149
 
122
150
  `/lab:review` uses rebuttal mode as its reviewer-panel operating mode when the target is paper-facing or when external criticism is supplied.
123
151
 
124
- `/lab:write` uses rebuttal mode as an acceptance gate for nontrivial section or manuscript rounds. A write round may not proceed to prose polish while a fatal or major R1/R2/R3 issue remains unresolved.
152
+ `/lab:write` uses rebuttal mode as an acceptance gate for nontrivial section or manuscript rounds. A write round may not proceed to prose polish while a fatal or major R1/R2/R3/R4 issue remains unresolved.
125
153
 
126
154
  `/lab:auto` uses rebuttal mode as a promotion guard when the campaign includes paper-facing `report`, `write`, or external rebuttal repair. In L3, auto may execute core mutation after ledger entry and impact audit.
127
155
 
@@ -132,4 +160,3 @@ If old evidence remains usable under a narrower interpretation, say exactly wher
132
160
  - Revise when the fix is manuscript-only.
133
161
  - Escalate when the issue requires a decision outside the current autonomy level.
134
162
  - Stop only when the remaining issue is terminal, already waived with evidence, or outside the campaign boundary.
135
-
@@ -129,6 +129,7 @@
129
129
 
130
130
  - When an auto campaign includes paper-facing `report`, `write`, external rebuttal repair, or reviewer-driven paper revision, load the shared rebuttal procedure in `skills/lab/references/rebuttal-mode.md`.
131
131
  - Use `.lab/.managed/templates/rebuttal-panel.md` for the durable Reviewer Panel artifact instead of embedding a separate reviewer workflow in auto mode.
132
+ - Start reviewer-driven repair from the rebuttal Light Read Set: active LaTeX, result summaries, managed indices, and supplied criticism. Do not perform whole-repository scans, raw dataset reads, or full log sweeps unless the rebuttal panel records a concrete expansion reason.
132
133
  - External rebuttal criticism must be converted into internal issues, routes, and acceptance checks before `run`, `iterate`, `report`, or `write` work starts.
133
134
  - In L1/L2, core mutation remains an approval boundary unless explicitly authorized by the auto contract.
134
135
  - In L3, auto may change paper-level claim, protocol, metric, threat model, reviewer profile, dataset scope, benchmark scope, or framing inside the approved campaign envelope. It must first write or update `.lab/writing/core-mutation-ledger.md` from `.lab/.managed/templates/core-mutation-ledger.md`.
@@ -230,3 +231,13 @@
230
231
  - Fill the `Core Explanation Table` in plain language: background, why now, what ran, how the loop ran, what worked, what did not work, what was verified, what remains unverified, what needs improvement and why, how to improve and why, key evidence, and the continue/stop/revise/rerun/escalate/handoff decision.
231
232
  - If the table says improvement is needed, the next action may be `stop` only when a terminal boundary is explicitly named; otherwise choose `continue`, `revise`, `rerun`, or `escalate`.
232
233
  - Run `.lab/.managed/scripts/validate_stage_report.py --stage-report <stage-report> --stage auto` and include the report path plus validation result in the final user-facing summary.
234
+ - Final visible closeout is mandatory after validation. Do not end `/lab:auto` with only "done", "pushed", "completed", or a chronological command log.
235
+ - The final visible closeout must be derived from the validated stage report, not from a separate improvised narrative.
236
+ - The final visible closeout must include:
237
+ - the user's requested deliverables or objectives and their status: completed, repaired, failed-gate, not promoted, blocked, or handoff
238
+ - the key Core Explanation Table rows: what was done, how it was done, what worked, what did not work, what was verified, what remains unverified, whether improvement is needed and why, how to improve and why
239
+ - evidence paths and primary artifacts
240
+ - validation/verification commands and validation result, including commands that could not run
241
+ - known gaps or compile/runtime limitations
242
+ - next action and why that action is appropriate
243
+ - If the final answer says the work is "completed", it must still name any remaining handoff boundary such as PDF compile, layout check, external approval, budget exhaustion, frozen-core risk, or missing environment.
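Note on the closeout requirements in the hunk above: the mandatory items can be treated as a checklist over a closeout record. A minimal sketch, assuming the closeout is represented as a plain object. The field names and `missingCloseoutFields` are hypothetical; the package itself only documents `validate_stage_report.py` for validating the stage report.

```javascript
// Hypothetical field names, one per required closeout item in the stage guide.
const REQUIRED_CLOSEOUT_FIELDS = [
  "deliverables",        // requested deliverables or objectives with status
  "coreExplanationRows", // key Core Explanation Table rows
  "evidencePaths",       // evidence paths and primary artifacts
  "validation",          // validation commands and results
  "knownGaps",           // known gaps or compile/runtime limitations
  "nextAction",          // next action and why it is appropriate
];

// Return the names of required fields that are absent or empty.
function missingCloseoutFields(closeout) {
  return REQUIRED_CLOSEOUT_FIELDS.filter((key) => {
    const value = closeout[key];
    if (value === undefined || value === null) return true;
    if (typeof value === "string") return value.trim() === "";
    if (Array.isArray(value)) return value.length === 0;
    return false;
  });
}
```

Under this sketch a closeout that only says "done" fails every check. A closeout with no known gaps would need to say so explicitly, for example with the string "none", rather than leave the field empty.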
@@ -44,9 +44,10 @@
44
44
  ## Rebuttal Mode
45
45
 
46
46
  - When the target is a paper, paper section, table, figure, report, claim set, or external rebuttal criticism, run the shared reviewer-panel procedure in `skills/lab/references/rebuttal-mode.md`.
47
- - Do not duplicate the four-reviewer logic in this stage file. Use `.lab/.managed/templates/rebuttal-panel.md` for the durable critique artifact.
47
+ - Do not duplicate the five-reviewer logic in this stage file. Use `.lab/.managed/templates/rebuttal-panel.md` for the durable critique artifact.
48
+ - For quick prompts such as "rebuttal一下看有什么缺点", start with the rebuttal Light Read Set only: active LaTeX, result summaries, managed indices, and supplied criticism. Do not run a whole-repository scan unless the panel records a specific escalation reason.
48
49
  - External rebuttal, AC, meta-review, colleague, or user criticism must be converted into internal actionable issues before any rewrite or response draft.
49
- - The Reviewer Panel must classify issues across R1 Significance / Originality / Insight, R2 Soundness / Technical Quality, R3 Evaluation / Analysis, and R4 Presentation / Clarity.
50
+ - The Reviewer Panel must classify issues across R1 Significance / Originality / Insight, R2 Soundness / Technical Quality, R3 Evaluation / Analysis, R4 Results / Tables / Numeric Evidence, and R5 Presentation / Clarity.
50
51
  - Each issue must include severity, affected artifact, required fix, route, acceptance check, and whether core mutation is required.
51
52
  - In L1/L2, core mutation remains an approval boundary unless explicitly authorized. In L3, route core mutation through the shared ledger policy instead of treating it as a reviewer-stage blocker.
52
53
 
@@ -75,8 +75,9 @@ Run these on every round:
75
75
 
76
76
  - When the user provides external reviewer, AC, meta-review, rebuttal, colleague, or user criticism, load `skills/lab/references/rebuttal-mode.md` before drafting.
77
77
  - For nontrivial paper-facing write rounds, use rebuttal mode as the reviewer acceptance gate and write the critique artifact from `.lab/.managed/templates/rebuttal-panel.md`.
78
+ - Rebuttal gating in write mode must start from the rebuttal Light Read Set. Read active LaTeX, result summaries, managed indices, and supplied criticism first; avoid whole-repository scans unless the rebuttal panel records a concrete expansion reason.
78
79
  - Do not implement a separate write-only rebuttal workflow. The shared rebuttal-mode reference owns reviewer axes, external rebuttal intake, issue routing, and core mutation policy.
79
- - Fatal or major R1/R2/R3 issues block prose polish until they are repaired, routed to `iterate`/`report`/`framing`/`spec`, or explicitly waived with evidence.
80
+ - Fatal or major R1/R2/R3/R4 issues block prose polish until they are repaired, routed to `iterate`/`report`/`framing`/`spec`, or explicitly waived with evidence.
80
81
  - In L3 or an explicitly core-authorized write campaign, paper-level claim, protocol, metric, threat model, dataset scope, benchmark scope, or framing changes are allowed only through the shared Core Mutation Ledger policy in `skills/lab/references/rebuttal-mode.md`.
81
82
  - Record the rebuttal panel path, any core mutation ledger path, and unresolved issue ids in the write-iteration artifact.
82
83
 
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "superlab",
3
- "version": "0.1.74",
3
+ "version": "0.1.76",
4
4
  "description": "Strict /lab research workflow installer for Codex and Claude",
5
5
  "keywords": [
6
6
  "codex",