superlab 0.1.73 → 0.1.75
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/lib/i18n.cjs +272 -0
- package/lib/install.cjs +4 -0
- package/package-assets/claude/commands/lab.md +1 -0
- package/package-assets/codex/prompts/lab/auto.md +1 -0
- package/package-assets/codex/prompts/lab-auto.md +1 -0
- package/package-assets/codex/prompts/lab.md +1 -0
- package/package-assets/codex/prompts/lab:auto.md +1 -0
- package/package-assets/codex/prompts/lab：auto.md +1 -0
- package/package-assets/shared/lab/.managed/templates/core-mutation-ledger.md +66 -0
- package/package-assets/shared/lab/.managed/templates/rebuttal-panel.md +71 -0
- package/package-assets/shared/skills/lab/SKILL.md +4 -0
- package/package-assets/shared/skills/lab/references/rebuttal-mode.md +135 -0
- package/package-assets/shared/skills/lab/stages/auto.md +20 -0
- package/package-assets/shared/skills/lab/stages/review.md +9 -0
- package/package-assets/shared/skills/lab/stages/write.md +9 -0
- package/package.json +1 -1
package/lib/i18n.cjs
CHANGED
@@ -1972,6 +1972,237 @@ for (const [relativePath, content] of Object.entries(ZH_SKILL_FILES)) {
   ZH_CONTENT[relativePath.replace(".codex", ".claude")] = content;
 }
 
+const zhRebuttalModeReference = `# Rebuttal Mode
+
+本文件是 reviewer panel 和外部 rebuttal intake 的唯一共享合同。review、write、auto 阶段只引用本文件,不复制四审稿人逻辑。
+
+## 触发条件
+
+- 用户提供 reviewer、AC、meta-review、rebuttal、同事或用户自己的批评意见
+- \`/lab:review\` 审查论文、section、表、图、report 或 claim set
+- \`/lab:write\` 在接受非平凡论文轮次前需要 reviewer gate
+- \`/lab:auto\` 包含 paper-facing report、write 或 reviewer-driven repair
+
+普通路径修复、依赖安装、实验轮询不触发本模式,除非它们会影响 paper-facing claim。
+
+## 外部 Rebuttal Intake
+
+外部批评必须先转成内部 issue,再进入改稿或实验。
+
+每条外部意见必须记录:
+
+- 来源:reviewer id、AC、meta-review、同事或用户
+- 批评摘要
+- 影响对象:claim、section、table、figure、protocol、metric、threat model、experiment 或 wording
+- 审稿轴:R1、R2、R3 或 R4
+- 严重性:fatal、major、minor 或 clarification
+- 路由:\`write\`、\`iterate\`、\`report\`、\`framing\`、\`data\`、\`spec\` 或 \`ask-user\`
+- 接受检查:什么证据或稿件状态算修完
+
+## Reviewer Panel
+
+### R1 Significance / Originality / Insight
+
+检查问题是否重要、贡献是否只是工程堆叠、核心 insight 是否回答社区学到了什么。
+
+### R2 Soundness / Technical Quality
+
+检查方法、假设、协议、baseline、实现细节、指标和统计是否站得住。
+
+### R3 Evaluation / Analysis
+
+检查消融、鲁棒性、泛化、失败案例、替代解释和指标解释是否完整。
+
+### R4 Presentation / Clarity
+
+检查叙事线、术语、图表自解释、引用、LaTeX 和 section flow 是否清楚。
+
+## Core Mutation Ledger
+
+核心变更包括改 paper-level claim、实验协议、指标定义、主指标、threat model、reviewer profile、数据集范围、benchmark 范围或 framing。
+
+L1/L2 默认把核心变更当作批准边界。L3 可以在已批准 envelope 内执行核心变更,但必须先用 \`.lab/.managed/templates/core-mutation-ledger.md\` 写入 \`.lab/writing/core-mutation-ledger.md\`。
+`;
+
+ZH_CONTENT[path.join(".codex", "skills", "lab", "references", "rebuttal-mode.md")] = zhRebuttalModeReference;
+ZH_CONTENT[path.join(".claude", "skills", "lab", "references", "rebuttal-mode.md")] = zhRebuttalModeReference;
+
+ZH_CONTENT[path.join(".lab", ".managed", "templates", "rebuttal-panel.md")] = `# Rebuttal Panel / rebuttal 面板
+
+## 审查目标
+
+- 目标工件:
+- 阶段:
+- 自治级别:
+- 证据基础:
+- 外部 rebuttal 来源(如果有):
+
+## 外部 Rebuttal Intake
+
+| 来源 | 批评摘要 | 影响对象 | 审稿轴 | 严重性 | 路由 | 接受检查 |
+| --- | --- | --- | --- | --- | --- | --- |
+| | | | | | | |
+
+## 四类审稿视角
+
+### R1 Significance / Originality / Insight
+
+- 问题:
+- 为什么重要:
+- 必要修复:
+- 路由:
+- 接受检查:
+
+### R2 Soundness / Technical Quality
+
+- 问题:
+- 为什么重要:
+- 必要修复:
+- 路由:
+- 接受检查:
+
+### R3 Evaluation / Analysis
+
+- 问题:
+- 为什么重要:
+- 必要修复:
+- 路由:
+- 接受检查:
+
+### R4 Presentation / Clarity
+
+- 问题:
+- 为什么重要:
+- 必要修复:
+- 路由:
+- 接受检查:
+
+## 可执行问题登记
+
+| ID | 审稿轴 | 严重性 | 影响工件 | 问题 | 必要修复 | 路由 | 接受检查 | 是否需要核心变更 |
+| --- | --- | --- | --- | --- | --- | --- | --- | --- |
+| | | | | | | | | |
+
+## 核心变更检查
+
+- 是否需要改变 paper-level claim、协议、指标、threat model、数据集范围、benchmark 范围或 framing:
+- 当前自治级别是否允许:
+- 核心变更台账路径:
+- 被废弃的旧 claim 或资产:
+- 新验证要求:
+
+## 决策
+
+- continue / revise / rerun / escalate / stop:
+- 下一路由:
+- 阻塞 issue:
+- 交接说明:
+`;
+
+ZH_CONTENT[path.join(".lab", ".managed", "templates", "core-mutation-ledger.md")] = `# 核心变更台账
+
+## 变更摘要
+
+- 变更 ID:
+- 日期:
+- Mutation trigger / 触发源:internal reviewer panel / external rebuttal / failed gate / anomaly / user request
+- 是否被当前自治级别允许:
+- campaign 或阶段:
+- owner:
+
+## 被改变的核心项
+
+- 类型:claim / protocol / metric / threat model / reviewer profile / dataset scope / benchmark scope / framing
+- 旧状态:
+- 新状态:
+- 为什么必须改:
+- 为什么不能只靠稿件措辞修复:
+
+## 来源 Issue
+
+- rebuttal 面板路径:
+- issue ID:
+- 审稿轴:
+- 严重性:
+- 外部批评来源(如果有):
+
+## 作废或收窄的旧结论
+
+- Invalidated prior claims:
+- 作废 claims:
+- 受影响 sections:
+- 受影响 tables:
+- 受影响 figures:
+- 受影响 reports:
+- 被 supersede 的结果或运行:
+- 仍可在更窄解释下使用的证据:
+
+## 全文影响审计
+
+- Paper-wide impact audit:
+- Abstract:
+- Introduction:
+- Method:
+- Experiments:
+- Related work:
+- Conclusion:
+- Tables:
+- Figures / analysis assets:
+- Metric glossary:
+- Terminology glossary:
+- Paper plan:
+
+## 新验证
+
+- 需要的新证据:
+- rerun 或 validator 命令:
+- manuscript validators:
+- confirmation check:
+- 结果:
+
+## 回滚
+
+- 回滚条件:
+- 回滚目标:
+- 备注:
+`;
+
+const zhReviewRebuttalMode = `
+
+## Rebuttal 模式
+
+- 当目标是论文、section、表、图、report、claim set 或外部 rebuttal 批评时,必须读取 \`skills/lab/references/rebuttal-mode.md\`。
+- 不要在 review 阶段复制四审稿人逻辑;使用 \`.lab/.managed/templates/rebuttal-panel.md\` 写持久 reviewer panel 工件。
+- 外部 reviewer、AC、meta-review、同事或用户批评必须先转成内部可执行 issue,再进入改稿或 response draft。
+- Reviewer Panel 按 R1 Significance / Originality / Insight、R2 Soundness / Technical Quality、R3 Evaluation / Analysis、R4 Presentation / Clarity 四类审稿视角分类。
+- L1/L2 默认把核心变更当作批准边界;L3 通过共享核心变更台账策略处理核心 claim、协议、指标、threat model、数据集范围、benchmark 范围或 framing 变化。
+`;
+
+const zhWriteRebuttalMode = `
+
+## Rebuttal 模式
+
+- 当用户提供外部 reviewer、AC、meta-review、rebuttal、同事或用户自己的批评时,起草前必须读取 \`skills/lab/references/rebuttal-mode.md\`。
+- 非平凡 paper-facing 写作轮次应把 rebuttal mode 当成 reviewer acceptance gate,并用 \`.lab/.managed/templates/rebuttal-panel.md\` 写 critique artifact。
+- 不要实现 write-only rebuttal workflow;共享 rebuttal-mode 负责审稿轴、外部 rebuttal intake、issue routing 和核心变更策略。
+- fatal 或 major 的 R1/R2/R3 issue 未解决前,不要进入 prose polish;先修复、路由到 \`iterate\` / \`report\` / \`framing\` / \`spec\`,或用证据显式 waive。
+- L3 或显式授权的写作 campaign 可以改 paper-level claim、协议、指标、threat model、数据集范围、benchmark 范围或 framing,但必须通过 \`skills/lab/references/rebuttal-mode.md\` 里的 Core Mutation Ledger 策略。
+- 在 write iteration artifact 里记录 rebuttal panel 路径、核心变更台账路径和未解决 issue id。
+`;
+
+const zhAutoRebuttalMode = `
+
+## Rebuttal Mode Promotion Guard
+
+- 当 auto campaign 包含 paper-facing \`report\`、\`write\`、外部 rebuttal repair 或 reviewer-driven paper revision 时,必须读取 \`skills/lab/references/rebuttal-mode.md\`。
+- 使用 \`.lab/.managed/templates/rebuttal-panel.md\` 写持久 Reviewer Panel 工件,不要在 auto mode 里复制一套 reviewer workflow。
+- 外部 rebuttal 批评必须先转成内部 issue、route 和 acceptance check,再开始 \`run\`、\`iterate\`、\`report\` 或 \`write\`。
+- L1/L2 默认把核心变更当作批准边界;L3 可以在已批准 envelope 内修改 paper-level claim、协议、指标、threat model、reviewer profile、数据集范围、benchmark 范围或 framing。
+- L3 执行核心变更前,必须用 \`.lab/.managed/templates/core-mutation-ledger.md\` 写或更新 \`.lab/writing/core-mutation-ledger.md\`。
+- 核心变更必须触发全文影响审计:section、table、figure、report、metric glossary、terminology glossary 和 paper plan 都要标记为已更新、已 supersede 或仍待处理。
+- 当前 L3 envelope 允许修复且预算未耗尽时,不要因为可恢复 reviewer issue 停止;按 issue route 继续进入 \`write\`、\`iterate\`、\`report\`、\`framing\` 或 \`spec\`。
+`;
+
 ZH_CONTENT[path.join(".lab", "system", "core.md")] = `# Lab 系统核心
 
 本项目使用 \`.lab/\` 作为持久研究工作流根目录。
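The `zhWriteRebuttalMode` text above says prose polish must wait until every fatal or major R1/R2/R3 issue is fixed, routed to another stage, or explicitly waived with evidence. The package expresses this only as prompt text; the following is a minimal sketch of how such a gate could be checked, assuming issues are plain objects with hypothetical fields `axis`, `severity`, `status` and `waiverEvidence`.

```javascript
// Hypothetical sketch of the write-stage polish gate described in zhWriteRebuttalMode.
// Field names (axis, severity, status, waiverEvidence) are illustrative assumptions.
const GATED_AXES = new Set(["R1", "R2", "R3"]);
const GATED_SEVERITIES = new Set(["fatal", "major"]);

// An issue blocks polish if it is fatal/major on R1-R3 and has not been
// resolved, routed to another stage, or waived with recorded evidence.
function blocksPolish(issue) {
  if (!GATED_AXES.has(issue.axis) || !GATED_SEVERITIES.has(issue.severity)) {
    return false;
  }
  if (issue.status === "resolved" || issue.status === "routed") {
    return false;
  }
  const waived = issue.status === "waived" && Boolean(issue.waiverEvidence);
  return !waived;
}

function polishBlockers(issues) {
  return issues.filter(blocksPolish).map((issue) => issue.id);
}

// R4 issues never block polish; a waiver without evidence does not count.
const issues = [
  { id: "I-1", axis: "R2", severity: "major", status: "open" },
  { id: "I-2", axis: "R4", severity: "fatal", status: "open" },
  { id: "I-3", axis: "R3", severity: "fatal", status: "waived", waiverEvidence: "" },
  { id: "I-4", axis: "R1", severity: "major", status: "routed" },
];
console.log(polishBlockers(issues)); // [ 'I-1', 'I-3' ]
```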
@@ -2472,6 +2703,8 @@ const zhAutoPriorityCodexLine =
   "显式的 `/lab:auto` 或 `/lab-auto` 请求,其优先级高于 brainstorming、spec review 这类更宽的创作或审阅技能路径。";
 const zhAutoPriorityClaudeLine =
   "显式的 `/lab auto` 或 `/lab-auto` 请求,其优先级高于 brainstorming、spec review 这类更宽的创作或审阅技能路径。";
+const zhAutoVisibleCloseoutLine =
+  "最终可见收尾必须直接消费已通过校验的 stage report:展示请求交付物或目标的状态、核心说明表的关键行、证据路径、验证命令和验证结果、已知缺口,以及下一步动作和原因。不能只用“已完成”“已推送”或流水账命令日志结束。";
 
 ZH_CONTENT[path.join(".codex", "prompts", "lab.md")] = ZH_CONTENT[
   path.join(".codex", "prompts", "lab.md")
@@ -2485,6 +2718,9 @@ ZH_CONTENT[path.join(".codex", "prompts", "lab-auto.md")] = ZH_CONTENT[
 ].replace(
   "已批准的 `L2` 和 `L3` 执行 campaign 默认进入执行模式。",
   `${zhAutoPriorityCodexLine}\n已批准的 \`L2\` 和 \`L3\` 执行 campaign 默认进入执行模式。`
+).replace(
+  "不要用 `sleep 30`、单次 `pgrep` 或一次性的 `metrics.json` 探针来代替真实长任务命令;当真实实验进程还活着时,只允许在出现有意义变化时发进度更新,并继续等待。没有新变化时,也只按保活节奏汇报,不要让用户触发下一次轮询。",
+  "不要用 `sleep 30`、单次 `pgrep` 或一次性的 `metrics.json` 探针来代替真实长任务命令;当真实实验进程还活着时,只允许在出现有意义变化时发进度更新,并继续等待。没有新变化时,也只按保活节奏汇报,不要让用户触发下一次轮询。\n\n" + zhAutoVisibleCloseoutLine
 );
 
 ZH_CONTENT[path.join(".claude", "commands", "lab.md")] = ZH_CONTENT[
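The `.replace()` chains above append the closeout line by matching the full keepalive sentence. `String.prototype.replace` with a string pattern replaces only the first match and returns the input unchanged when there is no match, so a reworded source sentence would silently skip the append. A sketch of a stricter variant, assuming one wants localization to fail loudly; `replaceExactlyOnce` is a hypothetical name, not a superlab function.

```javascript
// Hypothetical stricter alternative to the chained .replace() calls in i18n.cjs.
// Throws when the anchor is missing or ambiguous instead of silently doing nothing.
function replaceExactlyOnce(text, search, replacement) {
  const first = text.indexOf(search);
  if (first === -1) {
    throw new Error(`anchor not found: ${search.slice(0, 40)}`);
  }
  if (text.indexOf(search, first + search.length) !== -1) {
    throw new Error(`anchor is ambiguous: ${search.slice(0, 40)}`);
  }
  // Splice by index rather than calling .replace(), so "$&" or "$1" sequences in
  // `replacement` are inserted literally instead of being treated as patterns.
  return text.slice(0, first) + replacement + text.slice(first + search.length);
}

const keepalive = "没有新变化时,也只按保活节奏汇报。";
const closeout = "最终可见收尾必须直接消费已通过校验的 stage report。";
const prompt = "# lab-auto\n" + keepalive + "\n";
const localized = replaceExactlyOnce(prompt, keepalive, keepalive + "\n\n" + closeout);
console.log(localized.endsWith(closeout + "\n")); // true
```

The index-based splice also sidesteps the `$`-pattern handling that `.replace()` applies to string replacements, which matters if a localized string ever contains a literal `$`.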
@@ -2499,6 +2735,9 @@ ZH_CONTENT[path.join(".claude", "commands", "lab-auto.md")] = ZH_CONTENT[
 ].replace(
   "已批准的 `L2` 和 `L3` 执行 campaign 默认进入执行模式。",
   `${zhAutoPriorityClaudeLine}\n已批准的 \`L2\` 和 \`L3\` 执行 campaign 默认进入执行模式。`
+).replace(
+  "不要用 `sleep 30`、单次 `pgrep` 或一次性的 `metrics.json` 探针来代替真实长任务命令;当真实实验进程还活着时,只允许在出现有意义变化时发进度更新,并继续等待。没有新变化时,也只按保活节奏汇报,不要让用户触发下一次轮询。",
+  "不要用 `sleep 30`、单次 `pgrep` 或一次性的 `metrics.json` 探针来代替真实长任务命令;当真实实验进程还活着时,只允许在出现有意义变化时发进度更新,并继续等待。没有新变化时,也只按保活节奏汇报,不要让用户触发下一次轮询。\n\n" + zhAutoVisibleCloseoutLine
 );
 
 const zhRecipeQuickPathLine =
@@ -2530,6 +2769,12 @@ ZH_CONTENT[path.join(".claude", "commands", "lab.md")] = ZH_CONTENT[
   "- 用户只要显式调用某个 stage,无论写成 `/lab:<stage>`、`/lab: <stage>`、`/lab <stage>`、`/lab-<stage>` 还是 `/lab:<stage>`,都要立刻执行该 stage,而不是只推荐别的阶段。\n- 如果输入看起来像 stage 请求,但又不属于上述受支持写法,就必须停下并要求用户用精确的 stage 名重述,而不是自己猜。\n"
 );
 
+for (const rootPromptKey of [path.join(".codex", "prompts", "lab.md"), path.join(".claude", "commands", "lab.md")]) {
+  if (ZH_CONTENT[rootPromptKey] && !ZH_CONTENT[rootPromptKey].includes("最终可见收尾")) {
+    ZH_CONTENT[rootPromptKey] += `\n\n${zhAutoVisibleCloseoutLine}\n`;
+  }
+}
+
 ZH_CONTENT[path.join(".codex", "skills", "lab", "SKILL.md")] = `---
 name: lab
 description: 严格研究工作流,覆盖 idea、data、auto、framing、spec、run、iterate、review、report 和 paper-writing。
@@ -3174,11 +3419,38 @@ ZH_CONTENT[path.join(".codex", "skills", "lab", "stages", "auto.md")] = ZH_CONTE
   "- 只有当级别本身真的有歧义时,才停下来追问,例如 \\`第三层\\`、\\`phase 3\\`、\\`table 3\\`。",
   "- 只有当级别本身真的有歧义时,才停下来追问,例如 \\`第三层\\`、\\`phase 3\\`、\\`table 3\\`。\n- 如果用户显式调用 \\`/lab:auto\\` 或 \\`/lab-auto\\`,就保持在 auto 执行路径里;只要请求仍在已批准 execution envelope 内,即使目标听起来像 feature selection、baseline selection、离散化或 candidate sweep,也不要重新路由到 brainstorming 或 spec review。"
 );
+const zhAutoStageVisibleCloseout = `
+
+## 最终可见收尾
+
+- 最终可见收尾必须在 stage report 校验通过后给出,不能只写“已完成”“已推送”或命令流水账。
+- 最终可见收尾必须直接来自已校验的阶段报告,而不是另起一套临场叙述。
+- 最终可见收尾至少包含:请求交付物或目标及状态、核心说明表关键行、证据路径、验证命令和验证结果、已知缺口、下一步动作和为什么这样做。
+- 如果说“已完成”,也必须同时写明仍然存在的 handoff 边界,例如 PDF 编译、版面检查、外部审批、预算耗尽、冻结核心风险或环境缺失。
+`;
+if (!ZH_CONTENT[path.join(".codex", "skills", "lab", "stages", "auto.md")].includes("最终可见收尾")) {
+  ZH_CONTENT[path.join(".codex", "skills", "lab", "stages", "auto.md")] += zhAutoStageVisibleCloseout;
+}
 ZH_CONTENT[path.join(".claude", "skills", "lab", "stages", "auto.md")] =
   ZH_CONTENT[path.join(".codex", "skills", "lab", "stages", "auto.md")];
 ZH_CONTENT[path.join(".claude", "skills", "lab", "stages", "report.md")] =
   ZH_CONTENT[path.join(".codex", "skills", "lab", "stages", "report.md")];
 
+for (const platformRoot of [".codex", ".claude"]) {
+  const reviewKey = path.join(platformRoot, "skills", "lab", "stages", "review.md");
+  const writeKey = path.join(platformRoot, "skills", "lab", "stages", "write.md");
+  const autoKey = path.join(platformRoot, "skills", "lab", "stages", "auto.md");
+  if (ZH_CONTENT[reviewKey] && !ZH_CONTENT[reviewKey].includes("rebuttal-mode.md")) {
+    ZH_CONTENT[reviewKey] += zhReviewRebuttalMode;
+  }
+  if (ZH_CONTENT[writeKey] && !ZH_CONTENT[writeKey].includes("rebuttal-mode.md")) {
+    ZH_CONTENT[writeKey] += zhWriteRebuttalMode;
+  }
+  if (ZH_CONTENT[autoKey] && !ZH_CONTENT[autoKey].includes("core-mutation-ledger.md")) {
+    ZH_CONTENT[autoKey] += zhAutoRebuttalMode;
+  }
+}
+
 const zhStageReportCloseout = `
 
 ## 阶段报告收尾
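Every append added to i18n.cjs in this release is guarded by a sentinel check (`includes("最终可见收尾")`, `includes("rebuttal-mode.md")`, `includes("core-mutation-ledger.md")`), so re-running the localization step does not duplicate a section. A minimal sketch of that pattern as a reusable helper, assuming a plain object map like `ZH_CONTENT`; the helper name `appendOnce` is hypothetical. It folds in the existence check that the two `for` loops use, which the standalone `auto.md` append omits.

```javascript
// Hypothetical helper mirroring the sentinel-guarded appends in i18n.cjs.
// Appends `addition` to map[key] only if the entry exists and does not already
// contain `sentinel`. Returns true when it appended.
function appendOnce(map, key, sentinel, addition) {
  const current = map[key];
  if (typeof current !== "string") {
    return false; // missing entry: leave the map untouched (the `ZH_CONTENT[key] &&` guard)
  }
  if (current.includes(sentinel)) {
    return false; // already localized; appending again would duplicate the section
  }
  map[key] = current + addition;
  return true;
}

// Running the same localization step twice appends only once.
const content = { "stages/review.md": "# Review\n" };
const section = "\n## Rebuttal 模式\n- 读取 `skills/lab/references/rebuttal-mode.md`\n";
console.log(appendOnce(content, "stages/review.md", "rebuttal-mode.md", section)); // true
console.log(appendOnce(content, "stages/review.md", "rebuttal-mode.md", section)); // false
```

The sentinel must be text that appears in the appended section itself; otherwise the guard never trips and the section is appended on every run.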
package/lib/install.cjs
CHANGED
@@ -654,6 +654,7 @@ function localizeInstalledAssets(targetDir, lang, { newlyCreatedProjectOwnedPath
 path.join(".codex", "skills", "lab", "stages", "write.md"),
 path.join(".codex", "skills", "lab", "references", "workflow.md"),
 path.join(".codex", "skills", "lab", "references", "recipes.md"),
+path.join(".codex", "skills", "lab", "references", "rebuttal-mode.md"),
 path.join(".claude", "skills", "lab", "SKILL.md"),
 path.join(".claude", "skills", "lab", "stages", "idea.md"),
 path.join(".claude", "skills", "lab", "stages", "data.md"),
@@ -667,6 +668,7 @@ function localizeInstalledAssets(targetDir, lang, { newlyCreatedProjectOwnedPath
 path.join(".claude", "skills", "lab", "stages", "write.md"),
 path.join(".claude", "skills", "lab", "references", "workflow.md"),
 path.join(".claude", "skills", "lab", "references", "recipes.md"),
+path.join(".claude", "skills", "lab", "references", "rebuttal-mode.md"),
 path.join(".lab", ".managed", "templates", "idea.md"),
 path.join(".lab", ".managed", "templates", "data.md"),
 path.join(".lab", ".managed", "templates", "framing.md"),
@@ -677,6 +679,8 @@ function localizeInstalledAssets(targetDir, lang, { newlyCreatedProjectOwnedPath
 path.join(".lab", ".managed", "templates", "stage-report.md"),
 path.join(".lab", ".managed", "templates", "iteration-report.md"),
 path.join(".lab", ".managed", "templates", "review-checklist.md"),
+path.join(".lab", ".managed", "templates", "rebuttal-panel.md"),
+path.join(".lab", ".managed", "templates", "core-mutation-ledger.md"),
 path.join(".lab", ".managed", "templates", "final-report.md"),
 path.join(".lab", ".managed", "templates", "main-tables.md"),
 path.join(".lab", ".managed", "templates", "artifact-status.md"),
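The install.cjs hunks register exactly the four paths that i18n.cjs newly populates: the two `rebuttal-mode.md` references and the two new templates. A small consistency check can catch a localized key that was never added to the install list. This is a hypothetical sketch: it assumes both sides can be compared as POSIX-style relative path strings, and the example list deliberately omits one template to show a detection.

```javascript
// Hypothetical check: every path that receives localized content should also be
// listed for install-time localization. Paths are compared as POSIX-style strings.
function unlistedLocalizedPaths(localizedKeys, listedPaths) {
  const listed = new Set(listedPaths);
  return localizedKeys.filter((key) => !listed.has(key)).sort();
}

const localizedKeys = [
  ".codex/skills/lab/references/rebuttal-mode.md",
  ".claude/skills/lab/references/rebuttal-mode.md",
  ".lab/.managed/templates/rebuttal-panel.md",
  ".lab/.managed/templates/core-mutation-ledger.md",
];
// Deliberately incomplete list, to show what a missed registration looks like.
const listedPaths = [
  ".codex/skills/lab/references/rebuttal-mode.md",
  ".claude/skills/lab/references/rebuttal-mode.md",
  ".lab/.managed/templates/rebuttal-panel.md",
];

console.log(unlistedLocalizedPaths(localizedKeys, listedPaths));
// [ '.lab/.managed/templates/core-mutation-ledger.md' ]
```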
@@ -100,6 +100,7 @@ Treat all of these as equivalent stage requests:
 - While the loop is alive, `/lab auto` should keep `.lab/context/auto-ledger.md` updated with the active owner, observed state, and resume boundary.
 - Separate internal polling from user-facing progress reports.
 - While the loop is healthy, `/lab auto` should report to the user only on a meaningful change or at the keepalive cadence recorded in the current contract or runtime state, and it should not ask the user to trigger the next poll.
+- Final visible closeout must consume the validated stage report: show requested deliverable statuses, Core Explanation Table rows, evidence paths, validation/verification commands and results, known gaps, and the next action. Do not end with only "done", "pushed", or a chronological command log.
 
 - Treat `Autonomy level L1/L2/L3` as the execution privilege level, not as a paper layer, phase, or table number.
 - Treat `paper layer`, `phase`, and `table` as experiment targets. For example, `paper layer 3` or `Phase 1` should not be interpreted as `Autonomy level L3`.
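The closeout rule is enforced only through prompt text in this diff. As an illustration, a heuristic lint could check that a final answer names each required element and is not just a bare status word. The section labels below are assumptions chosen for this sketch; superlab does not define them, and a real check would more likely read the validated stage report directly.

```javascript
// Hypothetical heuristic lint for the "final visible closeout" rule.
// The labels are illustrative assumptions, not a format superlab defines.
const REQUIRED_PARTS = [
  "Deliverables",
  "Core Explanation Table",
  "Evidence",
  "Validation",
  "Known gaps",
  "Next action",
];
const BARE_ENDINGS = new Set(["done", "pushed", "completed", "已完成", "已推送"]);

function lintCloseout(text) {
  const problems = [];
  const trimmed = text.trim();
  // Ignore trailing punctuation so "Done." or "已完成。" still count as bare status words.
  const normalized = trimmed.toLowerCase().replace(/[.!。！]+$/u, "");
  if (BARE_ENDINGS.has(normalized)) {
    problems.push("closeout is only a bare status word");
  }
  for (const part of REQUIRED_PARTS) {
    if (!trimmed.includes(part)) {
      problems.push(`missing: ${part}`);
    }
  }
  return problems;
}

const closeout = [
  "Deliverables: Table 2 rerun (complete), ablation (blocked)",
  "Core Explanation Table: ablation row still pending",
  "Evidence: stage report path and run logs",
  "Validation: validate_stage_report.py passed",
  "Known gaps: PDF not compiled",
  "Next action: rerun the ablation, because it is the only open R3 issue",
].join("\n");

console.log(lintCloseout(closeout)); // []
console.log(lintCloseout("Done.").length); // 7
```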
@@ -27,3 +27,4 @@ If the preflight block cannot be completed because any required field is missing
 When the repository workflow language is Chinese, summaries, checklist items, task labels, and progress updates should be written in Chinese unless a literal identifier must stay unchanged.
 Treat `Layer 3`, `Phase 1`, or `Table 2` as paper-scope targets. Treat `Autonomy level L3` as the execution permission level.
 Do not replace the real long-running experiment command with a short watcher such as `sleep 30`, `pgrep`, or a one-shot `metrics.json` probe. While the real experiment process is still alive, emit only a progress update and keep waiting.
+Final visible closeout is mandatory when `/lab:auto` reaches stop, failure, escalation, or handoff. After validating the stage report, the final answer must consume that report directly: list the requested deliverables or objectives with status, summarize the Core Explanation Table rows, provide evidence paths, show validation/verification commands and validation results, name known gaps or commands that could not run, and state the next action plus why it is appropriate. Do not end with only `done`, `pushed`, `completed`, or a chronological command log.
@@ -27,3 +27,4 @@ If the preflight block cannot be completed because any required field is missing
 When the repository workflow language is Chinese, summaries, checklist items, task labels, and progress updates should be written in Chinese unless a literal identifier must stay unchanged.
 Treat `Layer 3`, `Phase 1`, or `Table 2` as paper-scope targets. Treat `Autonomy level L3` as the execution permission level.
 Do not replace the real long-running experiment command with a short watcher such as `sleep 30`, `pgrep`, or a one-shot `metrics.json` probe. While the real experiment process is still alive, emit only a progress update and keep waiting.
+Final visible closeout is mandatory when `/lab:auto` reaches stop, failure, escalation, or handoff. After validating the stage report, the final answer must consume that report directly: list the requested deliverables or objectives with status, summarize the Core Explanation Table rows, provide evidence paths, show validation/verification commands and validation results, name known gaps or commands that could not run, and state the next action plus why it is appropriate. Do not end with only `done`, `pushed`, `completed`, or a chronological command log.
@@ -94,6 +94,7 @@ Treat all of these as equivalent stage requests:
 - While the loop is alive, `/lab:auto` should keep `.lab/context/auto-ledger.md` updated with the active owner, observed state, and resume boundary.
 - Separate internal polling from user-facing progress reports.
 - While the loop is healthy, `/lab:auto` should report to the user only on a meaningful change or at the keepalive cadence recorded in the current contract or runtime state, and it should not ask the user to trigger the next poll.
+- Final visible closeout must consume the validated stage report: show requested deliverable statuses, Core Explanation Table rows, evidence paths, validation/verification commands and results, known gaps, and the next action. Do not end with only "done", "pushed", or a chronological command log.
 
 - Treat `Autonomy level L1/L2/L3` as the execution privilege level, not as a paper layer, phase, or table number.
 - Treat `paper layer`, `phase`, and `table` as experiment targets. For example, `paper layer 3` or `Phase 1` should not be interpreted as `Autonomy level L3`.
@@ -27,3 +27,4 @@ If the preflight block cannot be completed because any required field is missing
 When the repository workflow language is Chinese, summaries, checklist items, task labels, and progress updates should be written in Chinese unless a literal identifier must stay unchanged.
 Treat `Layer 3`, `Phase 1`, or `Table 2` as paper-scope targets. Treat `Autonomy level L3` as the execution permission level.
 Do not replace the real long-running experiment command with a short watcher such as `sleep 30`, `pgrep`, or a one-shot `metrics.json` probe. While the real experiment process is still alive, emit only a progress update and keep waiting.
+Final visible closeout is mandatory when `/lab:auto` reaches stop, failure, escalation, or handoff. After validating the stage report, the final answer must consume that report directly: list the requested deliverables or objectives with status, summarize the Core Explanation Table rows, provide evidence paths, show validation/verification commands and validation results, name known gaps or commands that could not run, and state the next action plus why it is appropriate. Do not end with only `done`, `pushed`, `completed`, or a chronological command log.
@@ -27,3 +27,4 @@ If the preflight block cannot be completed because any required field is missing
 When the repository workflow language is Chinese, summaries, checklist items, task labels, and progress updates should be written in Chinese unless a literal identifier must stay unchanged.
 Treat `Layer 3`, `Phase 1`, or `Table 2` as paper-scope targets. Treat `Autonomy level L3` as the execution permission level.
 Do not replace the real long-running experiment command with a short watcher such as `sleep 30`, `pgrep`, or a one-shot `metrics.json` probe. While the real experiment process is still alive, emit only a progress update and keep waiting.
+Final visible closeout is mandatory when `/lab:auto` reaches stop, failure, escalation, or handoff. After validating the stage report, the final answer must consume that report directly: list the requested deliverables or objectives with status, summarize the Core Explanation Table rows, provide evidence paths, show validation/verification commands and validation results, name known gaps or commands that could not run, and state the next action plus why it is appropriate. Do not end with only `done`, `pushed`, `completed`, or a chronological command log.
@@ -0,0 +1,66 @@
+# Core Mutation Ledger
+
+## Mutation Summary
+
+- Mutation ID:
+- Date:
+- Mutation trigger: internal reviewer panel / external rebuttal / failed gate / anomaly / user request
+- Allowed by autonomy level:
+- Campaign or stage:
+- Owner:
+
+## Changed Core Item
+
+- Core item type: claim / protocol / metric / threat model / reviewer profile / dataset scope / benchmark scope / framing
+- Previous state:
+- New state:
+- Why the change is necessary:
+- Why a smaller manuscript-only fix is insufficient:
+
+## Source Issue
+
+- Rebuttal panel path:
+- Issue ID:
+- Reviewer axis:
+- Severity:
+- External criticism source, if any:
+
+## Invalidated Prior Claims
+
+- Invalidated prior claims:
+- Claims invalidated:
+- Sections invalidated:
+- Tables invalidated:
+- Figures invalidated:
+- Reports invalidated:
+- Results or runs superseded:
+- Evidence that remains valid under a narrower interpretation:
+
+## Paper-Wide Impact Audit
+
+- Paper-wide impact audit:
+- Abstract impact:
+- Introduction impact:
+- Method impact:
+- Experiments impact:
+- Related work impact:
+- Conclusion impact:
+- Table impact:
+- Figure or analysis asset impact:
+- Metric glossary impact:
+- Terminology glossary impact:
+- Paper plan impact:
+
+## New Verification
+
+- New evidence required:
+- Rerun command or validation command:
+- Manuscript validators required:
+- Confirmation check:
+- Result:
+
+## Rollback
+
+- Rollback condition:
+- Rollback target:
+- Notes:
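The ledger template pairs with the policy stated in the rebuttal-mode reference: L1/L2 treat a core mutation as an approval boundary, while L3 inside its approved envelope (or an explicitly approved campaign) may mutate, but only after `.lab/writing/core-mutation-ledger.md` is written. A minimal sketch of that decision, assuming boolean inputs; the function name and the returned labels are hypothetical.

```javascript
// Hypothetical encoding of the core-mutation policy from rebuttal-mode.md.
// Return labels ("escalate", "write-ledger-first", "proceed") are illustrative.
const LEVELS = new Set(["L1", "L2", "L3"]);

function coreMutationDecision({ autonomyLevel, withinEnvelope, explicitlyApproved, ledgerWritten }) {
  if (!LEVELS.has(autonomyLevel)) {
    // "Layer 3", "Phase 1" or "Table 2" are paper targets, not autonomy levels.
    throw new Error(`not an autonomy level: ${autonomyLevel}`);
  }
  const mayMutate = explicitlyApproved || (autonomyLevel === "L3" && withinEnvelope);
  if (!mayMutate) {
    return "escalate"; // approval boundary reached
  }
  // Even a permitted mutation must be recorded in the ledger before it runs.
  return ledgerWritten ? "proceed" : "write-ledger-first";
}

const base = { withinEnvelope: true, explicitlyApproved: false, ledgerWritten: true };
console.log(coreMutationDecision({ ...base, autonomyLevel: "L2" })); // escalate
console.log(coreMutationDecision({ ...base, autonomyLevel: "L3", ledgerWritten: false })); // write-ledger-first
console.log(coreMutationDecision({ ...base, autonomyLevel: "L3" })); // proceed
```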
@@ -0,0 +1,71 @@
+# Rebuttal Panel
+
+## Review Target
+
+- Target artifact:
+- Stage:
+- Autonomy level:
+- Evidence base:
+- External rebuttal source, if any:
+
+## External Rebuttal Intake
+
+| Source | Raw criticism summary | Affected unit | Reviewer axis | Severity | Route | Acceptance check |
+| --- | --- | --- | --- | --- | --- | --- |
+| | | | | | | |
+
+## Reviewer Panel Findings
+
+### R1 Significance / Originality / Insight
+
+- Finding:
+- Why it matters:
+- Required fix:
+- Route:
+- Acceptance check:
+
+### R2 Soundness / Technical Quality
+
+- Finding:
+- Why it matters:
+- Required fix:
+- Route:
+- Acceptance check:
+
+### R3 Evaluation / Analysis
+
+- Finding:
+- Why it matters:
+- Required fix:
+- Route:
+- Acceptance check:
+
+### R4 Presentation / Clarity
+
+- Finding:
+- Why it matters:
+- Required fix:
+- Route:
+- Acceptance check:
+
+## Actionable Issue Register
+
+| ID | Axis | Severity | Affected artifact | Finding | Required fix | Route | Acceptance check | Core mutation required |
+| --- | --- | --- | --- | --- | --- | --- | --- | --- |
+| | | | | | | | | |
+
+## Core Mutation Check
+
+- Does any issue require changing a paper-level claim, protocol, metric, threat model, dataset scope, benchmark scope, or framing:
+- If yes, is the current autonomy level L3 or explicitly approved for core mutation:
+- Core mutation ledger path:
+- Prior claims or assets invalidated:
+- New verification required:
+
+## Decision
+
+- Continue / revise / rerun / escalate / stop:
+- Next route:
+- Blocking issue, if any:
+- Handoff note:
+
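The intake rules fix the allowed values for each register column: axes R1 to R4, severities `fatal`, `major`, `minor` or `clarification`, and routes `write`, `iterate`, `report`, `framing`, `data`, `spec` or `ask-user`, with a mandatory acceptance check. A sketch of validating one issue and rendering it as an Actionable Issue Register row, assuming a hypothetical object shape whose field names are illustrative.

```javascript
// Hypothetical validator/renderer for one Actionable Issue Register row.
// Allowed values come from the rebuttal-mode intake rules; field names are assumptions.
const AXES = new Set(["R1", "R2", "R3", "R4"]);
const SEVERITIES = new Set(["fatal", "major", "minor", "clarification"]);
const ROUTES = new Set(["write", "iterate", "report", "framing", "data", "spec", "ask-user"]);

// Escape pipes so free-text cells cannot break the Markdown table.
const cell = (value) => String(value).replace(/\|/g, "\\|");

function toRegisterRow(issue) {
  if (!AXES.has(issue.axis)) throw new Error(`unknown axis: ${issue.axis}`);
  if (!SEVERITIES.has(issue.severity)) throw new Error(`unknown severity: ${issue.severity}`);
  if (!ROUTES.has(issue.route)) throw new Error(`unknown route: ${issue.route}`);
  if (!issue.acceptanceCheck) throw new Error(`${issue.id}: acceptance check is required`);
  const cells = [
    issue.id,
    issue.axis,
    issue.severity,
    issue.artifact,
    issue.finding,
    issue.requiredFix,
    issue.route,
    issue.acceptanceCheck,
    issue.coreMutation ? "yes" : "no",
  ].map(cell);
  return `| ${cells.join(" | ")} |`;
}

const issue = {
  id: "I-1",
  axis: "R3",
  severity: "major",
  artifact: "Table 2",
  finding: "No ablation of component A",
  requiredFix: "Add ablation run",
  route: "iterate",
  acceptanceCheck: "Ablation row present in Table 2",
  coreMutation: false,
};
console.log(toRegisterRow(issue));
// | I-1 | R3 | major | Table 2 | No ablation of component A | Add ablation run | iterate | Ablation row present in Table 2 | no |
```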
@@ -49,10 +49,12 @@ Use this skill when the user invokes `/lab:*` or asks for the structured researc
 - If the stage says improvement is needed, do not choose `stop` unless the next action states a concrete terminal boundary such as budget exhaustion, frozen-core risk, safety or integrity failure, impossible target, or a required approval boundary. Otherwise choose `continue`, `revise`, `rerun`, or `escalate`.
 - Stage reports are closeout and handoff artifacts, not a new user command and not a replacement for stage-specific artifacts such as idea memos, iteration reports, final reports, or write-iteration records.
 - Run `.lab/.managed/scripts/validate_stage_report.py --stage-report <stage-report> --stage <stage>` before claiming the stage is complete, and include the stage-report path plus validation result in the final user-facing summary.
+- For `/lab:auto`, the final user-facing answer must visibly consume the validated stage report: summarize requested deliverable statuses, Core Explanation Table rows, evidence paths, validation/verification commands and results, known gaps, and the next action. A chat-only chronological result list is not a valid closeout.
 - Final paper output should default to LaTeX, and its manuscript language should be decided separately from the workflow language.
 - Separate sourced facts from model-generated hypotheses.
 - Preserve failed runs, failed ideas, and limitations.
 - Use `skills/lab/references/recipes.md` as the quick path for common stage chains without inventing new commands.
+- Use `.codex/skills/lab/references/rebuttal-mode.md` or `.claude/skills/lab/references/rebuttal-mode.md` as the single shared reviewer-panel and external rebuttal intake contract. Do not copy four-reviewer logic into `review`, `write`, or `auto` stage guides.
 
 ## Stage Contract
 
@@ -220,6 +222,7 @@ Use this skill when the user invokes `/lab:*` or asks for the structured researc
 - Surface the strongest alternative explanation and any boundary risk that should narrow the claim.
 - Output findings first, then fatal flaws, then fix priority, then residual risks.
 - Use `.lab/.managed/templates/review-checklist.md`.
+- When reviewing paper-facing artifacts or external rebuttal criticism, use the shared rebuttal mode reference and write the panel artifact from `.lab/.managed/templates/rebuttal-panel.md`.
 - Write durable review conclusions back to `.lab/context/decisions.md`, `.lab/context/evidence-index.md`, or `.lab/context/open-questions.md` when they affect later stages. Do not use `.lab/context/state.md` as a primary write target.
 
 ### `/lab:report`
@@ -348,6 +351,7 @@ Use this skill when the user invokes `/lab:*` or asks for the structured researc
 - Workflow summary: `.codex/skills/lab/references/workflow.md` or `.claude/skills/lab/references/workflow.md`
 - Stage recipes: `skills/lab/references/recipes.md`
 - Brainstorming integration: `.codex/skills/lab/references/brainstorming-integration.md` or `.claude/skills/lab/references/brainstorming-integration.md`
+- Rebuttal mode: `.codex/skills/lab/references/rebuttal-mode.md` or `.claude/skills/lab/references/rebuttal-mode.md`
 - Idea stage guide: `.codex/skills/lab/stages/idea.md` or `.claude/skills/lab/stages/idea.md`
 - Data stage guide: `.codex/skills/lab/stages/data.md` or `.claude/skills/lab/stages/data.md`
 - Auto stage guide: `.codex/skills/lab/stages/auto.md` or `.claude/skills/lab/stages/auto.md`
@@ -0,0 +1,135 @@
|
|
|
1
|
+
# Rebuttal Mode
|
|
2
|
+
|
|
3
|
+
Use this reference whenever a lab stage must turn reviewer-style criticism into executable research or writing work. Keep this logic here; stage guides should reference this file instead of copying the reviewer panel rules.
|
|
4
|
+
|
|
5
|
+
## When To Trigger
|
|
6
|
+
|
|
7
|
+
Trigger rebuttal mode when any of these are present:
|
|
8
|
+
|
|
9
|
+
- external reviewer, AC, meta-review, rebuttal, or colleague criticism
|
|
10
|
+
- a user asks for reviewer-style criticism before another write or auto round
|
|
11
|
+
- `/lab:review` targets a paper, section, table, figure, report, or claim set
|
|
12
|
+
- `/lab:write` needs a reviewer gate before accepting a nontrivial manuscript round
|
|
13
|
+
- `/lab:auto` is allowed to run `write`, paper-facing `report`, or reviewer-driven repair
|
|
14
|
+
|
|
15
|
+
Do not trigger rebuttal mode for routine implementation reviews, path fixes, dependency setup, or experiment polling unless the issue affects paper-facing claims.
|
|
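The trigger rules above reduce to a single predicate. The sketch below is hypothetical: superlab does not ship a `Request` type or a `should_trigger_rebuttal_mode` helper, and the target labels such as `"dependency-setup"` are assumed names that mirror the bullet list.

```python
from dataclasses import dataclass


# Hypothetical request shape; superlab does not define this type.
@dataclass
class Request:
    stage: str                           # "review", "write", or "auto"
    target: str = ""                     # assumed labels, e.g. "paper", "table", "dependency-setup"
    external_criticism: bool = False     # reviewer, AC, meta-review, rebuttal, or colleague text
    user_asked_for_review: bool = False  # reviewer-style criticism before another write/auto round
    nontrivial_round: bool = False       # write round that needs a reviewer gate
    auto_touches_paper: bool = False     # auto may run write, paper-facing report, or repair
    affects_paper_claims: bool = False


PAPER_TARGETS = {"paper", "section", "table", "figure", "report", "claim-set"}
ROUTINE_TARGETS = {"implementation", "path-fix", "dependency-setup", "experiment-polling"}


def should_trigger_rebuttal_mode(req: Request) -> bool:
    # Routine engineering work is excluded unless it touches paper-facing claims.
    if req.target in ROUTINE_TARGETS and not req.affects_paper_claims:
        return False
    if req.external_criticism or req.user_asked_for_review:
        return True
    if req.stage == "review":
        return req.target in PAPER_TARGETS
    if req.stage == "write":
        return req.nontrivial_round
    if req.stage == "auto":
        return req.auto_touches_paper
    return False
```

Checking the exclusion first keeps routine work out of the panel even when it happens during a `review` stage.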
16
|
+
|
|
17
|
+
## Required Inputs
|
|
18
|
+
|
|
19
|
+
- artifact under review: section, paper, report, table, figure, result, or claim set
|
|
20
|
+
- evidence base used by the artifact
|
|
21
|
+
- current framing or mission when available
|
|
22
|
+
- current metric and terminology glossaries when relevant
|
|
23
|
+
- external rebuttal text when provided
|
|
24
|
+
- active autonomy level when the stage is `/lab:auto`
|
|
25
|
+
|
|
26
|
+
## External Rebuttal Intake
|
|
27
|
+
|
|
28
|
+
External criticism must be converted into internal issues before any rewrite.
|
|
29
|
+
|
|
30
|
+
For each external comment, record:
|
|
31
|
+
|
|
32
|
+
- source: reviewer id, AC, meta-review, colleague, or user
|
|
33
|
+
- raw criticism summary
|
|
34
|
+
- affected paper unit: claim, section, table, figure, protocol, metric, threat model, experiment, or wording
|
|
35
|
+
- reviewer axis: R1, R2, R3, or R4
|
|
36
|
+
- severity: fatal, major, minor, or clarification
|
|
37
|
+
- route: `write`, `iterate`, `report`, `framing`, `data`, `spec`, or `ask-user`
|
|
38
|
+
- acceptance check: concrete evidence or manuscript condition that resolves the issue
|
|
39
|
+
|
|
40
|
+
Do not answer external criticism with prose-only reassurance. If the issue is valid, it must become a repair task. If it is invalid, state the evidence that rules it out.
|
|
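The intake record can be sketched as a small dataclass with a checker. This is a minimal sketch that assumes hypothetical names (`ExternalComment`, `check_intake`). The axis, severity, and route vocabularies are copied from the list above. `valid` and `rebuttal_evidence` are assumed fields that encode the rule that rejected criticism must cite evidence.

```python
from dataclasses import dataclass
from typing import Optional

# Vocabularies copied from the intake list; the class and checker are hypothetical.
AXES = {"R1", "R2", "R3", "R4"}
SEVERITIES = {"fatal", "major", "minor", "clarification"}
ROUTES = {"write", "iterate", "report", "framing", "data", "spec", "ask-user"}


@dataclass
class ExternalComment:
    source: str            # reviewer id, AC, meta-review, colleague, or user
    summary: str           # raw criticism summary
    affected_unit: str     # claim, section, table, figure, protocol, ...
    axis: str
    severity: str
    route: str
    acceptance_check: str
    valid: bool = True                       # assumed field: is the criticism correct?
    rebuttal_evidence: Optional[str] = None  # assumed field: evidence that rules it out


def check_intake(c: ExternalComment) -> list[str]:
    """Return problems with one intake record; an empty list means it is usable."""
    problems = []
    for name in ("source", "summary", "affected_unit", "acceptance_check"):
        if not getattr(c, name).strip():
            problems.append(f"missing {name}")
    if c.axis not in AXES:
        problems.append(f"unknown axis {c.axis!r}")
    if c.severity not in SEVERITIES:
        problems.append(f"unknown severity {c.severity!r}")
    if c.route not in ROUTES:
        problems.append(f"unknown route {c.route!r}")
    # No prose-only reassurance: rejected criticism must cite the evidence against it.
    if not c.valid and not (c.rebuttal_evidence or "").strip():
        problems.append("invalid criticism needs evidence that rules it out")
    return problems
```

In this sketch a valid comment becomes a repair task through its `route` and `acceptance_check`. Nothing in it drafts rebuttal prose.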
41
|
+
|
|
42
|
+
## Reviewer Panel
|
|
43
|
+
|
|
44
|
+
Run four independent review lenses. Each lens must produce actionable issues, not vague advice.
|
|
45
|
+
|
|
46
|
+
### R1 Significance / Originality / Insight
|
|
47
|
+
|
|
48
|
+
Ask whether the problem matters, whether the paper teaches the community something beyond an artifact, whether the motivation is necessary, and whether the claimed insight is deeper than a module stack or benchmark gain.
|
|
49
|
+
|
|
50
|
+
Typical fixes route to `framing`, `write`, or `iterate`.
|
|
51
|
+
|
|
52
|
+
### R2 Soundness / Technical Quality
|
|
53
|
+
|
|
54
|
+
Ask whether the method, assumptions, proofs, protocol, baselines, implementation details, metrics, and statistics are technically defensible and reproducible.
|
|
55
|
+
|
|
56
|
+
Typical fixes route to `spec`, `run`, `iterate`, `report`, or `write`.
|
|
57
|
+
|
|
58
|
+
### R3 Evaluation / Analysis
|
|
59
|
+
|
|
60
|
+
Ask whether evaluation covers ablations, robustness, generalization, failure cases, alternative explanations, and metric interpretation.
|
|
61
|
+
|
|
62
|
+
Typical fixes route to `iterate`, `report`, or `write`.
|
|
63
|
+
|
|
64
|
+
### R4 Presentation / Clarity
|
|
65
|
+
|
|
66
|
+
Ask whether the storyline, terminology, figure/table semantics, citations, LaTeX, and section flow are readable and self-contained.
|
|
67
|
+
|
|
68
|
+
Typical fixes route to `write`.
|
|
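The typical routes per axis can be kept as a lookup table. Because the reference says "typical", this hypothetical sketch only flags an atypical route for a second look; it does not reject it. R2 lists `run`, which is absent from the intake route list above. The table copies each section as written.

```python
# Hypothetical helpers; typical repair routes copied from the R1-R4 sections.
TYPICAL_ROUTES = {
    "R1": ("framing", "write", "iterate"),
    "R2": ("spec", "run", "iterate", "report", "write"),
    "R3": ("iterate", "report", "write"),
    "R4": ("write",),
}


def is_typical_route(axis: str, route: str) -> bool:
    return route in TYPICAL_ROUTES.get(axis, ())


def flag_atypical(issues: list[dict]) -> list[str]:
    """Ids of issues whose route is unusual for their axis (a prompt to re-check, not an error)."""
    return [i["id"] for i in issues if not is_typical_route(i["axis"], i["route"])]
```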
69
|
+
|
|
70
|
+
## Actionable Issue Register
|
|
71
|
+
|
|
72
|
+
Every issue must include:
|
|
73
|
+
|
|
74
|
+
- id
|
|
75
|
+
- reviewer axis
|
|
76
|
+
- severity
|
|
77
|
+
- affected artifact
|
|
78
|
+
- finding
|
|
79
|
+
- why it matters
|
|
80
|
+
- required fix
|
|
81
|
+
- route
|
|
82
|
+
- acceptance check
|
|
83
|
+
- whether core mutation is required
|
|
84
|
+
|
|
85
|
+
Prioritize fatal and major issues before language polish. Minor presentation fixes may be batched.
|
|
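The register and its priority rule can be sketched with hypothetical names. The dataclass makes every listed field mandatory at construction. `plan` orders issues by severity and batches minor R4 (presentation) issues into one task. Treating only `minor` R4 issues as batchable is an assumption.

```python
from dataclasses import dataclass

# Hypothetical register entry mirroring the required fields above.
SEVERITY_ORDER = {"fatal": 0, "major": 1, "minor": 2, "clarification": 3}


@dataclass
class Issue:
    id: str
    axis: str
    severity: str
    artifact: str
    finding: str
    why_it_matters: str
    required_fix: str
    route: str
    acceptance_check: str
    core_mutation: bool


def plan(issues: list[Issue]) -> list[list[str]]:
    """Fatal and major issues first; minor R4 presentation fixes batched into one task."""
    ordered = sorted(issues, key=lambda i: SEVERITY_ORDER[i.severity])  # stable sort
    tasks, batch = [], []
    for issue in ordered:
        if issue.axis == "R4" and issue.severity == "minor":
            batch.append(issue.id)
        else:
            tasks.append([issue.id])
    if batch:
        tasks.append(batch)
    return tasks
```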
86
|
+
|
|
87
|
+
## Core Mutation Policy
|
|
88
|
+
|
|
89
|
+
Core mutation means changing any of:
|
|
90
|
+
|
|
91
|
+
- paper-level claim or contribution
|
|
92
|
+
- experiment or evaluation protocol
|
|
93
|
+
- metric definition or primary metric choice
|
|
94
|
+
- threat model, reviewer profile, dataset scope, or benchmark scope
|
|
95
|
+
- paper-facing framing that invalidates existing sections or tables
|
|
96
|
+
|
|
97
|
+
L1 and L2 treat core mutation as an approval boundary unless the current contract explicitly allows it.
|
|
98
|
+
|
|
99
|
+
L3 may perform core mutation inside the approved campaign envelope. It must not disguise the change as ordinary writing or routine repair.
|
|
100
|
+
|
|
101
|
+
When L3 changes a core item, write or update `.lab/writing/core-mutation-ledger.md` from `.lab/.managed/templates/core-mutation-ledger.md` before claiming the issue is resolved.
|
|
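The approval rule can be sketched as a decision function. The sketch assumes the following: autonomy levels are the plain strings `L1`, `L2`, and `L3`; core items use hypothetical short labels; and a contract-authorized L1/L2 mutation also needs a ledger entry. The last point follows the write stage guide in this diff, which routes authorized core changes through the same ledger policy.

```python
# Hypothetical short labels for the core items listed above.
CORE_ITEMS = {"claim", "protocol", "metric", "threat-model", "reviewer-profile",
              "dataset-scope", "benchmark-scope", "framing"}


def core_mutation_decision(level: str, changed: set[str], contract_allows: bool,
                           inside_envelope: bool, ledger_written: bool) -> str:
    """Hypothetical gate: proceed, needs-approval, escalate, or write-ledger-first."""
    if not changed & CORE_ITEMS:
        return "proceed"                      # not a core mutation
    if level in ("L1", "L2"):
        if not contract_allows:               # approval boundary unless the contract allows it
            return "needs-approval"
    elif level == "L3":
        if not inside_envelope:               # outside the approved campaign envelope
            return "escalate"
    else:
        raise ValueError(f"unknown autonomy level {level!r}")
    # Assumption: any permitted core change is logged before the issue counts as resolved.
    return "proceed" if ledger_written else "write-ledger-first"
```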
102
|
+
|
|
103
|
+
## Core Mutation Ledger
|
|
104
|
+
|
|
105
|
+
The ledger must record:
|
|
106
|
+
|
|
107
|
+
- mutation trigger: internal reviewer panel, external rebuttal, failed gate, anomaly, or user request
|
|
108
|
+
- changed core item
|
|
109
|
+
- previous state
|
|
110
|
+
- new state
|
|
111
|
+
- why the change is necessary
|
|
112
|
+
- invalidated prior claims, sections, tables, figures, reports, or results
|
|
113
|
+
- new evidence required
|
|
114
|
+
- rerun, rewrite, or validation command
|
|
115
|
+
- paper-wide impact audit
|
|
116
|
+
- rollback condition
|
|
117
|
+
|
|
118
|
+
If old evidence remains usable under a narrower interpretation, say exactly where it remains valid. If it does not remain valid, mark it superseded.
|
|
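A ledger entry can be checked mechanically before an issue is marked resolved. This hypothetical sketch (`LedgerEntry`, `ledger_gaps`) enforces three conditions: every text field is filled, the trigger is one of the listed kinds, and each invalidated item is either marked superseded or scoped to where it remains valid.

```python
from dataclasses import dataclass, field

# Hypothetical record mirroring the ledger fields listed above.
TRIGGERS = {"internal reviewer panel", "external rebuttal", "failed gate", "anomaly", "user request"}
REQUIRED_TEXT = ("changed_item", "previous_state", "new_state", "rationale",
                 "new_evidence_required", "command", "impact_audit", "rollback_condition")


@dataclass
class LedgerEntry:
    trigger: str
    changed_item: str
    previous_state: str
    new_state: str
    rationale: str                 # why the change is necessary
    invalidated: list[str]         # prior claims, sections, tables, figures, reports, results
    new_evidence_required: str
    command: str                   # rerun, rewrite, or validation command
    impact_audit: str
    rollback_condition: str
    superseded: list[str] = field(default_factory=list)
    still_valid_where: dict[str, str] = field(default_factory=dict)  # item -> narrower scope


def ledger_gaps(e: LedgerEntry) -> list[str]:
    gaps = [name for name in REQUIRED_TEXT if not getattr(e, name).strip()]
    if e.trigger not in TRIGGERS:
        gaps.append("trigger")
    # Each invalidated item must be marked superseded or scoped to where it remains valid.
    for item in e.invalidated:
        if item not in e.superseded and item not in e.still_valid_where:
            gaps.append(f"unresolved:{item}")
    return gaps
```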
119
|
+
|
|
120
|
+
## Stage Integration
|
|
121
|
+
|
|
122
|
+
`/lab:review` uses rebuttal mode as its reviewer-panel operating mode when the target is paper-facing or when external criticism is supplied.
|
|
123
|
+
|
|
124
|
+
`/lab:write` uses rebuttal mode as an acceptance gate for nontrivial section or manuscript rounds. A write round may not proceed to prose polish while a fatal or major R1/R2/R3 issue remains unresolved.
|
|
125
|
+
|
|
126
|
+
`/lab:auto` uses rebuttal mode as a promotion guard when the campaign includes paper-facing `report`, `write`, or external rebuttal repair. In L3, auto may execute a core mutation after a ledger entry and an impact audit.
|
|
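The `/lab:write` acceptance gate can be sketched as a blocker list. Following the write stage guide in this diff, an issue stops blocking once it is repaired, routed elsewhere, or waived with evidence. The status strings and the `waiver_evidence` key are hypothetical.

```python
# Hypothetical gate for the write-stage rule above.
BLOCKING_AXES = {"R1", "R2", "R3"}
BLOCKING_SEVERITIES = {"fatal", "major"}


def polish_blockers(issues: list[dict]) -> list[str]:
    """Ids of unresolved fatal/major R1-R3 issues; prose polish waits until this is empty."""
    blockers = []
    for i in issues:
        if i["axis"] not in BLOCKING_AXES or i["severity"] not in BLOCKING_SEVERITIES:
            continue
        status = i.get("status", "open")
        if status in ("repaired", "routed"):
            continue
        if status == "waived" and i.get("waiver_evidence"):
            continue  # a waiver only counts when it cites evidence
        blockers.append(i["id"])
    return blockers
```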
127
|
+
|
|
128
|
+
## Stop Or Continue Decision
|
|
129
|
+
|
|
130
|
+
- Continue when issues are actionable inside the approved envelope.
|
|
131
|
+
- Rerun when the acceptance check requires new evidence.
|
|
132
|
+
- Revise when the fix is manuscript-only.
|
|
133
|
+
- Escalate when the issue requires a decision outside the current autonomy level.
|
|
134
|
+
- Stop only when the remaining issue is terminal, already waived with evidence, or outside the campaign boundary.
|
|
135
|
+
|
|
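The stop-or-continue list can be sketched as one function per remaining issue. The flag names are hypothetical. The precedence (stop, then escalate, then rerun, then revise, then continue) is also an assumption, because the reference does not rank the rules.

```python
def next_action(issue: dict) -> str:
    """Map one remaining issue to a next action; flag names and precedence are assumptions."""
    if issue.get("terminal") or issue.get("waived_with_evidence"):
        return "stop"
    if issue.get("outside_campaign_boundary"):
        return "stop"
    if issue.get("needs_decision_beyond_autonomy"):
        return "escalate"
    if issue.get("needs_new_evidence"):      # acceptance check needs new results
        return "rerun"
    if issue.get("manuscript_only"):
        return "revise"
    return "continue"                        # actionable inside the approved envelope
```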
@@ -125,6 +125,16 @@
|
|
|
125
125
|
- Before each rung and before each success, stop, or promotion decision, re-check the generic academic-risk questions: setting semantics, visibility/leakage, anchor or label policy, scale comparability, metric validity, comparison validity, statistical validity, claim boundary, and integrity self-check.
|
|
126
126
|
- Before each success, stop, or promotion decision, also re-check the anomaly policy: whether anomaly signals fired, whether simpler explanations were ruled out, whether a cross-check was performed, and whether the current interpretation is still the narrowest supported one.
|
|
127
127
|
|
|
128
|
+
## Rebuttal Mode Promotion Guard
|
|
129
|
+
|
|
130
|
+
- When an auto campaign includes paper-facing `report`, `write`, external rebuttal repair, or reviewer-driven paper revision, load the shared rebuttal procedure in `skills/lab/references/rebuttal-mode.md`.
|
|
131
|
+
- Use `.lab/.managed/templates/rebuttal-panel.md` for the durable Reviewer Panel artifact instead of embedding a separate reviewer workflow in auto mode.
|
|
132
|
+
- External rebuttal criticism must be converted into internal issues, routes, and acceptance checks before `run`, `iterate`, `report`, or `write` work starts.
|
|
133
|
+
- In L1/L2, core mutation remains an approval boundary unless explicitly authorized by the auto contract.
|
|
134
|
+
- In L3, auto may change paper-level claim, protocol, metric, threat model, reviewer profile, dataset scope, benchmark scope, or framing inside the approved campaign envelope. It must first write or update `.lab/writing/core-mutation-ledger.md` from `.lab/.managed/templates/core-mutation-ledger.md`.
|
|
135
|
+
- A core mutation must trigger a paper-wide impact audit before promotion: affected sections, tables, figures, reports, metric glossary, terminology glossary, and paper plan must be listed as updated, superseded, or still pending.
|
|
136
|
+
- Do not stop after a recoverable reviewer issue when the current L3 envelope allows a repair route and budget remains. Continue through `write`, `iterate`, `report`, `framing`, or `spec` according to the issue route.
|
|
137
|
+
|
|
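The impact-audit requirement can be checked as a listing rule. This hypothetical sketch verifies only that every affected artifact carries one of the three allowed statuses. The guard does not say whether "still pending" items block promotion, so the sketch does not treat them as blocking.

```python
from collections import Counter

# Status vocabulary from the promotion-guard bullet; the helpers are hypothetical.
AUDIT_STATUSES = {"updated", "superseded", "still pending"}


def audit_gaps(affected: list[str], audit: dict[str, str]) -> list[str]:
    """Affected artifacts not yet listed with an allowed status."""
    return [a for a in affected if audit.get(a) not in AUDIT_STATUSES]


def summarize(audit: dict[str, str]) -> dict[str, int]:
    """Count audit entries per status, e.g. for the stage report."""
    return dict(Counter(audit.values()))
```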
128
138
|
## Gate Miss And Repair Loop
|
|
129
139
|
|
|
130
140
|
- A gate miss is not automatically a terminal stop for `L2` or `L3` when `iterate` is allowed and the loop budget remains.
|
|
@@ -220,3 +230,13 @@
|
|
|
220
230
|
- Fill the `Core Explanation Table` in plain language: background, why now, what ran, how the loop ran, what worked, what did not work, what was verified, what remains unverified, what needs improvement and why, how to improve and why, key evidence, and the continue/stop/revise/rerun/escalate/handoff decision.
|
|
221
231
|
- If the table says improvement is needed, the next action may be `stop` only when a terminal boundary is explicitly named; otherwise choose `continue`, `revise`, `rerun`, or `escalate`.
|
|
222
232
|
- Run `.lab/.managed/scripts/validate_stage_report.py --stage-report <stage-report> --stage auto` and include the report path plus validation result in the final user-facing summary.
|
|
233
|
+
- Final visible closeout is mandatory after validation. Do not end `/lab:auto` with only "done", "pushed", "completed", or a chronological command log.
|
|
234
|
+
- The final visible closeout must be derived from the validated stage report, not from a separate improvised narrative.
|
|
235
|
+
- The final visible closeout must include:
|
|
236
|
+
- the user's requested deliverables or objectives and their status: completed, repaired, failed-gate, not promoted, blocked, or handoff
|
|
237
|
+
- the key Core Explanation Table rows: what was done, how it was done, what worked, what did not work, what was verified, what remains unverified, whether improvement is needed and why, how to improve and why
|
|
238
|
+
- evidence paths and primary artifacts
|
|
239
|
+
- validation/verification commands and validation result, including commands that could not run
|
|
240
|
+
- known gaps or compile/runtime limitations
|
|
241
|
+
- next action and why that action is appropriate
|
|
242
|
+
- If the final answer says the work is "completed", it must still name any remaining handoff boundary such as PDF compile, layout check, external approval, budget exhaustion, frozen-core risk, or missing environment.
|
|
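The closeout checklist can be sketched as a validator over a structured summary. The part names are assumed keys, not a superlab schema. Empty values count as missing, so "no known gaps" should be written out explicitly. A "completed" deliverable must come with a `handoff_boundary` entry, which may be an explicit "none".

```python
# Hypothetical closeout check; part names are assumed keys, not a superlab schema.
REQUIRED_PARTS = ("deliverables", "core_explanation", "evidence_paths",
                  "validation", "known_gaps", "next_action")
DELIVERABLE_STATUSES = {"completed", "repaired", "failed-gate", "not promoted", "blocked", "handoff"}


def closeout_problems(closeout: dict) -> list[str]:
    # Empty values count as missing, so "no known gaps" should be stated explicitly.
    problems = [p for p in REQUIRED_PARTS if not closeout.get(p)]
    deliverables = closeout.get("deliverables") or {}
    for name, status in deliverables.items():
        if status not in DELIVERABLE_STATUSES:
            problems.append(f"bad status for {name}: {status}")
    # "completed" must still name any remaining handoff boundary (may be an explicit "none").
    if "completed" in deliverables.values() and "handoff_boundary" not in closeout:
        problems.append("completed without a named handoff boundary")
    return problems
```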
@@ -41,6 +41,15 @@
|
|
|
41
41
|
- unresolved alternative explanations
|
|
42
42
|
- boundary risks that should narrow the claim even if the implementation is correct
|
|
43
43
|
|
|
44
|
+
## Rebuttal Mode
|
|
45
|
+
|
|
46
|
+
- When the target is a paper, paper section, table, figure, report, claim set, or external rebuttal criticism, run the shared reviewer-panel procedure in `skills/lab/references/rebuttal-mode.md`.
|
|
47
|
+
- Do not duplicate the four-reviewer logic in this stage file. Use `.lab/.managed/templates/rebuttal-panel.md` for the durable critique artifact.
|
|
48
|
+
- External rebuttal, AC, meta-review, colleague, or user criticism must be converted into internal actionable issues before any rewrite or response draft.
|
|
49
|
+
- The Reviewer Panel must classify issues across R1 Significance / Originality / Insight, R2 Soundness / Technical Quality, R3 Evaluation / Analysis, and R4 Presentation / Clarity.
|
|
50
|
+
- Each issue must include severity, affected artifact, required fix, route, acceptance check, and whether core mutation is required.
|
|
51
|
+
- In L1/L2, core mutation remains an approval boundary unless explicitly authorized. In L3, route core mutation through the shared ledger policy instead of treating it as a reviewer-stage blocker.
|
|
52
|
+
|
|
44
53
|
## Output Style
|
|
45
54
|
|
|
46
55
|
- concise summary first
|
|
@@ -71,6 +71,15 @@ Run these on every round:
|
|
|
71
71
|
- reviewer pass -> `skills/lab/references/paper-writing/paper-review.md`
|
|
72
72
|
- section-specific style policy -> `skills/lab/references/paper-writing/section-style-policies.md` (load the block matching the current section)
|
|
73
73
|
|
|
74
|
+
## Rebuttal Mode
|
|
75
|
+
|
|
76
|
+
- When the user provides external reviewer, AC, meta-review, rebuttal, colleague, or user criticism, load `skills/lab/references/rebuttal-mode.md` before drafting.
|
|
77
|
+
- For nontrivial paper-facing write rounds, use rebuttal mode as the reviewer acceptance gate and write the critique artifact from `.lab/.managed/templates/rebuttal-panel.md`.
|
|
78
|
+
- Do not implement a separate write-only rebuttal workflow. The shared rebuttal-mode reference owns reviewer axes, external rebuttal intake, issue routing, and core mutation policy.
|
|
79
|
+
- Fatal or major R1/R2/R3 issues block prose polish until they are repaired, routed to `iterate`/`report`/`framing`/`spec`, or explicitly waived with evidence.
|
|
80
|
+
- In L3 or an explicitly core-authorized write campaign, paper-level claim, protocol, metric, threat model, dataset scope, benchmark scope, or framing changes are allowed only through the shared Core Mutation Ledger policy in `skills/lab/references/rebuttal-mode.md`.
|
|
81
|
+
- Record the rebuttal panel path, any core mutation ledger path, and unresolved issue ids in the write-iteration artifact.
|
|
82
|
+
|
|
74
83
|
## Reference-Guided Deep Write
|
|
75
84
|
|
|
76
85
|
Trigger this mode automatically when the user provides reference PDFs, paper URLs, local reference-paper paths, a template paper, or asks to "参考" (consult) papers while continuing `/lab:write`.
|