superlab 0.1.74 → 0.1.75

package/lib/i18n.cjs CHANGED
@@ -2703,6 +2703,8 @@ const zhAutoPriorityCodexLine =
  "显式的 `/lab:auto` 或 `/lab-auto` 请求,其优先级高于 brainstorming、spec review 这类更宽的创作或审阅技能路径。";
  const zhAutoPriorityClaudeLine =
  "显式的 `/lab auto` 或 `/lab-auto` 请求,其优先级高于 brainstorming、spec review 这类更宽的创作或审阅技能路径。";
+ const zhAutoVisibleCloseoutLine =
+ "最终可见收尾必须直接消费已通过校验的 stage report:展示请求交付物或目标的状态、核心说明表的关键行、证据路径、验证命令和验证结果、已知缺口,以及下一步动作和原因。不能只用“已完成”“已推送”或流水账命令日志结束。";
 
  ZH_CONTENT[path.join(".codex", "prompts", "lab.md")] = ZH_CONTENT[
  path.join(".codex", "prompts", "lab.md")
@@ -2716,6 +2718,9 @@ ZH_CONTENT[path.join(".codex", "prompts", "lab-auto.md")] = ZH_CONTENT[
  ].replace(
  "已批准的 `L2` 和 `L3` 执行 campaign 默认进入执行模式。",
  `${zhAutoPriorityCodexLine}\n已批准的 \`L2\` 和 \`L3\` 执行 campaign 默认进入执行模式。`
+ ).replace(
+ "不要用 `sleep 30`、单次 `pgrep` 或一次性的 `metrics.json` 探针来代替真实长任务命令;当真实实验进程还活着时,只允许在出现有意义变化时发进度更新,并继续等待。没有新变化时,也只按保活节奏汇报,不要让用户触发下一次轮询。",
+ "不要用 `sleep 30`、单次 `pgrep` 或一次性的 `metrics.json` 探针来代替真实长任务命令;当真实实验进程还活着时,只允许在出现有意义变化时发进度更新,并继续等待。没有新变化时,也只按保活节奏汇报,不要让用户触发下一次轮询。\n\n" + zhAutoVisibleCloseoutLine
  );
 
  ZH_CONTENT[path.join(".claude", "commands", "lab.md")] = ZH_CONTENT[
@@ -2730,6 +2735,9 @@ ZH_CONTENT[path.join(".claude", "commands", "lab-auto.md")] = ZH_CONTENT[
  ].replace(
  "已批准的 `L2` 和 `L3` 执行 campaign 默认进入执行模式。",
  `${zhAutoPriorityClaudeLine}\n已批准的 \`L2\` 和 \`L3\` 执行 campaign 默认进入执行模式。`
+ ).replace(
+ "不要用 `sleep 30`、单次 `pgrep` 或一次性的 `metrics.json` 探针来代替真实长任务命令;当真实实验进程还活着时,只允许在出现有意义变化时发进度更新,并继续等待。没有新变化时,也只按保活节奏汇报,不要让用户触发下一次轮询。",
+ "不要用 `sleep 30`、单次 `pgrep` 或一次性的 `metrics.json` 探针来代替真实长任务命令;当真实实验进程还活着时,只允许在出现有意义变化时发进度更新,并继续等待。没有新变化时,也只按保活节奏汇报,不要让用户触发下一次轮询。\n\n" + zhAutoVisibleCloseoutLine
  );
 
  const zhRecipeQuickPathLine =
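The two `.replace(...)` chains above splice the closeout paragraph in after a fixed anchor sentence. A minimal sketch of that pattern follows, assuming short English stand-ins for the real prompt text; `appendAfterAnchor`, `anchorSentence`, and `closeoutRule` are hypothetical names, not part of `i18n.cjs`. It illustrates two properties of `String.prototype.replace` with a string pattern: only the first match is replaced, and a missing anchor leaves the text unchanged without any error.

```javascript
// Hypothetical stand-ins for one ZH_CONTENT entry and the appended paragraph.
const anchorSentence = "Anchor sentence.";
const closeoutRule = "Closeout rule.";
const labAutoPrompt = `Intro.\n${anchorSentence}\nOutro.`;

// Splice `addition` in directly after the first occurrence of `anchorText`.
function appendAfterAnchor(text, anchorText, addition) {
  // A function replacer avoids `$&`-style pattern expansion in the replacement.
  return text.replace(anchorText, () => `${anchorText}\n\n${addition}`);
}

const patched = appendAfterAnchor(labAutoPrompt, anchorSentence, closeoutRule);
// When the anchor is absent, replace() is a silent no-op.
const untouched = appendAfterAnchor("No anchor here.", anchorSentence, closeoutRule);

console.log(JSON.stringify(patched));
console.log(JSON.stringify(untouched));
```

Because a missing anchor is silent, a change to the upstream sentence would drop the appended paragraph without warning, which may be one reason the root prompts further down use an explicit `includes` guard instead.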
@@ -2761,6 +2769,12 @@ ZH_CONTENT[path.join(".claude", "commands", "lab.md")] = ZH_CONTENT[
  "- 用户只要显式调用某个 stage,无论写成 `/lab:<stage>`、`/lab: <stage>`、`/lab <stage>`、`/lab-<stage>` 还是 `/lab:<stage>`,都要立刻执行该 stage,而不是只推荐别的阶段。\n- 如果输入看起来像 stage 请求,但又不属于上述受支持写法,就必须停下并要求用户用精确的 stage 名重述,而不是自己猜。\n"
  );
 
+ for (const rootPromptKey of [path.join(".codex", "prompts", "lab.md"), path.join(".claude", "commands", "lab.md")]) {
+ if (ZH_CONTENT[rootPromptKey] && !ZH_CONTENT[rootPromptKey].includes("最终可见收尾")) {
+ ZH_CONTENT[rootPromptKey] += `\n\n${zhAutoVisibleCloseoutLine}\n`;
+ }
+ }
+
  ZH_CONTENT[path.join(".codex", "skills", "lab", "SKILL.md")] = `---
  name: lab
  description: 严格研究工作流,覆盖 idea、data、auto、framing、spec、run、iterate、review、report 和 paper-writing。
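The `for` loop added in this hunk appends the closeout paragraph to each root prompt only when the key exists and the marker phrase is not already present. A minimal sketch of that guarded append, assuming an English marker and a plain object in place of `ZH_CONTENT`; `appendOnce`, `MARKER`, and `promptMap` are hypothetical names used only for illustration.

```javascript
// Hypothetical marker and paragraph; the diff uses a Chinese marker phrase instead.
const MARKER = "Final visible closeout";
const closeoutParagraph = `${MARKER} must consume the validated stage report.`;

// Append the paragraph only when the key exists and the marker is not there yet.
function appendOnce(contentMap, key, paragraph, marker) {
  if (contentMap[key] && !contentMap[key].includes(marker)) {
    contentMap[key] += `\n\n${paragraph}\n`;
  }
  return contentMap;
}

const promptMap = { "lab.md": "Root prompt." };
appendOnce(promptMap, "lab.md", closeoutParagraph, MARKER);
appendOnce(promptMap, "lab.md", closeoutParagraph, MARKER); // second call is a no-op
appendOnce(promptMap, "missing.md", closeoutParagraph, MARKER); // absent key is skipped

console.log(JSON.stringify(promptMap));
```

The guard keeps the operation idempotent, so a prompt that already received the paragraph through an earlier `.replace(...)` step is not given a second copy.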
@@ -3405,6 +3419,18 @@ ZH_CONTENT[path.join(".codex", "skills", "lab", "stages", "auto.md")] = ZH_CONTE
  "- 只有当级别本身真的有歧义时,才停下来追问,例如 \\`第三层\\`、\\`phase 3\\`、\\`table 3\\`。",
  "- 只有当级别本身真的有歧义时,才停下来追问,例如 \\`第三层\\`、\\`phase 3\\`、\\`table 3\\`。\n- 如果用户显式调用 \\`/lab:auto\\` 或 \\`/lab-auto\\`,就保持在 auto 执行路径里;只要请求仍在已批准 execution envelope 内,即使目标听起来像 feature selection、baseline selection、离散化或 candidate sweep,也不要重新路由到 brainstorming 或 spec review。"
  );
+ const zhAutoStageVisibleCloseout = `
+
+ ## 最终可见收尾
+
+ - 最终可见收尾必须在 stage report 校验通过后给出,不能只写“已完成”“已推送”或命令流水账。
+ - 最终可见收尾必须直接来自已校验的阶段报告,而不是另起一套临场叙述。
+ - 最终可见收尾至少包含:请求交付物或目标及状态、核心说明表关键行、证据路径、验证命令和验证结果、已知缺口、下一步动作和为什么这样做。
+ - 如果说“已完成”,也必须同时写明仍然存在的 handoff 边界,例如 PDF 编译、版面检查、外部审批、预算耗尽、冻结核心风险或环境缺失。
+ `;
+ if (!ZH_CONTENT[path.join(".codex", "skills", "lab", "stages", "auto.md")].includes("最终可见收尾")) {
+ ZH_CONTENT[path.join(".codex", "skills", "lab", "stages", "auto.md")] += zhAutoStageVisibleCloseout;
+ }
  ZH_CONTENT[path.join(".claude", "skills", "lab", "stages", "auto.md")] =
  ZH_CONTENT[path.join(".codex", "skills", "lab", "stages", "auto.md")];
  ZH_CONTENT[path.join(".claude", "skills", "lab", "stages", "report.md")] =
@@ -100,6 +100,7 @@ Treat all of these as equivalent stage requests:
  - While the loop is alive, `/lab auto` should keep `.lab/context/auto-ledger.md` updated with the active owner, observed state, and resume boundary.
  - Separate internal polling from user-facing progress reports.
  - While the loop is healthy, `/lab auto` should report to the user only on a meaningful change or at the keepalive cadence recorded in the current contract or runtime state, and it should not ask the user to trigger the next poll.
+ - Final visible closeout must consume the validated stage report: show requested deliverable statuses, Core Explanation Table rows, evidence paths, validation/verification commands and results, known gaps, and the next action. Do not end with only "done", "pushed", or a chronological command log.
 
  - Treat `Autonomy level L1/L2/L3` as the execution privilege level, not as a paper layer, phase, or table number.
  - Treat `paper layer`, `phase`, and `table` as experiment targets. For example, `paper layer 3` or `Phase 1` should not be interpreted as `Autonomy level L3`.
@@ -27,3 +27,4 @@ If the preflight block cannot be completed because any required field is missing
  When the repository workflow language is Chinese, summaries, checklist items, task labels, and progress updates should be written in Chinese unless a literal identifier must stay unchanged.
  Treat `Layer 3`, `Phase 1`, or `Table 2` as paper-scope targets. Treat `Autonomy level L3` as the execution permission level.
  Do not replace the real long-running experiment command with a short watcher such as `sleep 30`, `pgrep`, or a one-shot `metrics.json` probe. While the real experiment process is still alive, emit only a progress update and keep waiting.
+ Final visible closeout is mandatory when `/lab:auto` reaches stop, failure, escalation, or handoff. After validating the stage report, the final answer must consume that report directly: list the requested deliverables or objectives with status, summarize the Core Explanation Table rows, provide evidence paths, show validation/verification commands and validation results, name known gaps or commands that could not run, and state the next action plus why it is appropriate. Do not end with only `done`, `pushed`, `completed`, or a chronological command log.
@@ -27,3 +27,4 @@ If the preflight block cannot be completed because any required field is missing
  When the repository workflow language is Chinese, summaries, checklist items, task labels, and progress updates should be written in Chinese unless a literal identifier must stay unchanged.
  Treat `Layer 3`, `Phase 1`, or `Table 2` as paper-scope targets. Treat `Autonomy level L3` as the execution permission level.
  Do not replace the real long-running experiment command with a short watcher such as `sleep 30`, `pgrep`, or a one-shot `metrics.json` probe. While the real experiment process is still alive, emit only a progress update and keep waiting.
+ Final visible closeout is mandatory when `/lab:auto` reaches stop, failure, escalation, or handoff. After validating the stage report, the final answer must consume that report directly: list the requested deliverables or objectives with status, summarize the Core Explanation Table rows, provide evidence paths, show validation/verification commands and validation results, name known gaps or commands that could not run, and state the next action plus why it is appropriate. Do not end with only `done`, `pushed`, `completed`, or a chronological command log.
@@ -94,6 +94,7 @@ Treat all of these as equivalent stage requests:
  - While the loop is alive, `/lab:auto` should keep `.lab/context/auto-ledger.md` updated with the active owner, observed state, and resume boundary.
  - Separate internal polling from user-facing progress reports.
  - While the loop is healthy, `/lab:auto` should report to the user only on a meaningful change or at the keepalive cadence recorded in the current contract or runtime state, and it should not ask the user to trigger the next poll.
+ - Final visible closeout must consume the validated stage report: show requested deliverable statuses, Core Explanation Table rows, evidence paths, validation/verification commands and results, known gaps, and the next action. Do not end with only "done", "pushed", or a chronological command log.
 
  - Treat `Autonomy level L1/L2/L3` as the execution privilege level, not as a paper layer, phase, or table number.
  - Treat `paper layer`, `phase`, and `table` as experiment targets. For example, `paper layer 3` or `Phase 1` should not be interpreted as `Autonomy level L3`.
@@ -27,3 +27,4 @@ If the preflight block cannot be completed because any required field is missing
  When the repository workflow language is Chinese, summaries, checklist items, task labels, and progress updates should be written in Chinese unless a literal identifier must stay unchanged.
  Treat `Layer 3`, `Phase 1`, or `Table 2` as paper-scope targets. Treat `Autonomy level L3` as the execution permission level.
  Do not replace the real long-running experiment command with a short watcher such as `sleep 30`, `pgrep`, or a one-shot `metrics.json` probe. While the real experiment process is still alive, emit only a progress update and keep waiting.
+ Final visible closeout is mandatory when `/lab:auto` reaches stop, failure, escalation, or handoff. After validating the stage report, the final answer must consume that report directly: list the requested deliverables or objectives with status, summarize the Core Explanation Table rows, provide evidence paths, show validation/verification commands and validation results, name known gaps or commands that could not run, and state the next action plus why it is appropriate. Do not end with only `done`, `pushed`, `completed`, or a chronological command log.
@@ -27,3 +27,4 @@ If the preflight block cannot be completed because any required field is missing
  When the repository workflow language is Chinese, summaries, checklist items, task labels, and progress updates should be written in Chinese unless a literal identifier must stay unchanged.
  Treat `Layer 3`, `Phase 1`, or `Table 2` as paper-scope targets. Treat `Autonomy level L3` as the execution permission level.
  Do not replace the real long-running experiment command with a short watcher such as `sleep 30`, `pgrep`, or a one-shot `metrics.json` probe. While the real experiment process is still alive, emit only a progress update and keep waiting.
+ Final visible closeout is mandatory when `/lab:auto` reaches stop, failure, escalation, or handoff. After validating the stage report, the final answer must consume that report directly: list the requested deliverables or objectives with status, summarize the Core Explanation Table rows, provide evidence paths, show validation/verification commands and validation results, name known gaps or commands that could not run, and state the next action plus why it is appropriate. Do not end with only `done`, `pushed`, `completed`, or a chronological command log.
@@ -49,6 +49,7 @@ Use this skill when the user invokes `/lab:*` or asks for the structured researc
  - If the stage says improvement is needed, do not choose `stop` unless the next action states a concrete terminal boundary such as budget exhaustion, frozen-core risk, safety or integrity failure, impossible target, or a required approval boundary. Otherwise choose `continue`, `revise`, `rerun`, or `escalate`.
  - Stage reports are closeout and handoff artifacts, not a new user command and not a replacement for stage-specific artifacts such as idea memos, iteration reports, final reports, or write-iteration records.
  - Run `.lab/.managed/scripts/validate_stage_report.py --stage-report <stage-report> --stage <stage>` before claiming the stage is complete, and include the stage-report path plus validation result in the final user-facing summary.
+ - For `/lab:auto`, the final user-facing answer must visibly consume the validated stage report: summarize requested deliverable statuses, Core Explanation Table rows, evidence paths, validation/verification commands and results, known gaps, and the next action. A chat-only chronological result list is not a valid closeout.
  - Final paper output should default to LaTeX, and its manuscript language should be decided separately from the workflow language.
  - Separate sourced facts from model-generated hypotheses.
  - Preserve failed runs, failed ideas, and limitations.
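The validation rule in this hunk names a script path and two flags. A minimal sketch of assembling that invocation follows. The script path and the `--stage-report` / `--stage` flags come from the rule text; the `python3` launcher, the helper name `buildValidateCommand`, and the placeholder report path are assumptions made for illustration.

```javascript
const path = require("node:path");

// Build the argv for the stage-report validator without running it.
// Assumption: the script is launched through `python3`; the source does not say.
function buildValidateCommand(stageReportPath, stage) {
  const script = path.join(".lab", ".managed", "scripts", "validate_stage_report.py");
  return ["python3", script, "--stage-report", stageReportPath, "--stage", stage];
}

// "<stage-report>" is kept as the placeholder used in the rule text.
const validateArgv = buildValidateCommand("<stage-report>", "auto");
console.log(validateArgv.join(" "));
```

The array could be handed to `child_process.spawnSync(validateArgv[0], validateArgv.slice(1))`, and the exit status recorded alongside the report path in the final summary.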
@@ -230,3 +230,13 @@
  - Fill the `Core Explanation Table` in plain language: background, why now, what ran, how the loop ran, what worked, what did not work, what was verified, what remains unverified, what needs improvement and why, how to improve and why, key evidence, and the continue/stop/revise/rerun/escalate/handoff decision.
  - If the table says improvement is needed, the next action may be `stop` only when a terminal boundary is explicitly named; otherwise choose `continue`, `revise`, `rerun`, or `escalate`.
  - Run `.lab/.managed/scripts/validate_stage_report.py --stage-report <stage-report> --stage auto` and include the report path plus validation result in the final user-facing summary.
+ - Final visible closeout is mandatory after validation. Do not end `/lab:auto` with only "done", "pushed", "completed", or a chronological command log.
+ - The final visible closeout must be derived from the validated stage report, not from a separate improvised narrative.
+ - The final visible closeout must include:
+   - the user's requested deliverables or objectives and their status: completed, repaired, failed-gate, not promoted, blocked, or handoff
+   - the key Core Explanation Table rows: what was done, how it was done, what worked, what did not work, what was verified, what remains unverified, whether improvement is needed and why, how to improve and why
+   - evidence paths and primary artifacts
+   - validation/verification commands and validation result, including commands that could not run
+   - known gaps or compile/runtime limitations
+   - next action and why that action is appropriate
+ - If the final answer says the work is "completed", it must still name any remaining handoff boundary such as PDF compile, layout check, external approval, budget exhaustion, frozen-core risk, or missing environment.
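The closeout requirements listed above can be read as a checklist. A minimal sketch of such a check follows, assuming the closeout is held as a plain object; the field names, `checkCloseout`, and the sample values are all hypothetical and are not part of superlab or its validator script.

```javascript
// Hypothetical field names mirroring the required closeout items listed above.
const REQUIRED_FIELDS = [
  "deliverables", "coreExplanationRows", "evidencePaths",
  "validation", "knownGaps", "nextAction",
];
const BARE_ENDINGS = new Set(["done", "pushed", "completed"]);

// Treat null/undefined, blank strings, and empty arrays as missing.
function isEmpty(value) {
  if (value == null) return true;
  if (typeof value === "string") return value.trim() === "";
  return Array.isArray(value) && value.length === 0;
}

// Return a list of problems; an empty list means the closeout looks complete.
function checkCloseout(closeout) {
  const problems = REQUIRED_FIELDS
    .filter((field) => isEmpty(closeout[field]))
    .map((field) => `missing: ${field}`);
  const summary = String(closeout.summary ?? "").trim().toLowerCase();
  if (BARE_ENDINGS.has(summary)) problems.push("summary is a bare status word");
  return problems;
}

// Illustrative inputs only; none of these values come from a real run.
const bareProblems = checkCloseout({ summary: "Done" });
const fullProblems = checkCloseout({
  summary: "Requested rerun completed; PDF compile remains a handoff boundary.",
  deliverables: [{ name: "requested rerun", status: "completed" }],
  coreExplanationRows: ["what was done", "what was verified"],
  evidencePaths: ["<evidence-path>"],
  validation: { command: "<validation-command>", result: "pass" },
  knownGaps: ["PDF compile not run"],
  nextAction: "handoff, because the compile environment is missing",
});

console.log(bareProblems);
console.log(fullProblems);
```

In this sketch an empty `knownGaps` list counts as missing, which forces an explicit statement even when no gaps are known; a looser reading of the rule would allow an empty list.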
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "superlab",
- "version": "0.1.74",
+ "version": "0.1.75",
  "description": "Strict /lab research workflow installer for Codex and Claude",
  "keywords": [
  "codex",