npm - superlab - Versions diffs - 0.1.23 → 0.1.25 - Mend

This diff shows the changes between two publicly released versions of the package as they appear in the public registry. It is provided for informational purposes only.

Files changed (29)

package/README.md +3 -2
package/README.zh-CN.md +3 -2
package/lib/auto_contracts.cjs +4 -2
package/lib/auto_runner.cjs +30 -0
package/lib/auto_state.cjs +30 -0
package/lib/context.cjs +437 -14
package/lib/eval_protocol.cjs +75 -0
package/lib/i18n.cjs +140 -24
package/lib/install.cjs +2 -0
package/package-assets/claude/commands/lab.md +2 -2
package/package-assets/codex/prompts/lab.md +2 -2
package/package-assets/shared/lab/.managed/scripts/validate_collaborator_report.py +53 -0
package/package-assets/shared/lab/.managed/templates/artifact-status.md +28 -0
package/package-assets/shared/lab/.managed/templates/final-report.md +24 -19
package/package-assets/shared/lab/.managed/templates/review-checklist.md +4 -0
package/package-assets/shared/lab/context/auto-mode.md +3 -3
package/package-assets/shared/lab/context/auto-outcome.md +15 -0
package/package-assets/shared/lab/context/eval-protocol.md +21 -0
package/package-assets/shared/lab/context/session-brief.md +1 -1
package/package-assets/shared/lab/context/state.md +19 -13
package/package-assets/shared/lab/context/workflow-state.md +19 -0
package/package-assets/shared/lab/system/core.md +4 -2
package/package-assets/shared/skills/lab/SKILL.md +10 -10
package/package-assets/shared/skills/lab/stages/auto.md +5 -1
package/package-assets/shared/skills/lab/stages/iterate.md +4 -0
package/package-assets/shared/skills/lab/stages/report.md +11 -1
package/package-assets/shared/skills/lab/stages/review.md +4 -0
package/package-assets/shared/skills/lab/stages/run.md +4 -0
package/package.json +1 -1

package/README.md

@@ -180,7 +180,7 @@ superlab auto stop
 - `run` and `iterate` must change persistent outputs under `results_root`
 - `review` must update canonical review context
-- `report` must write `<deliverables_root>/report.md` and `<deliverables_root>/main-tables.md`
+- `report` must write `<deliverables_root>/report.md`, `<deliverables_root>/main-tables.md`, and `<deliverables_root>/artifact-status.md`
 - `write` must produce LaTeX output under `<deliverables_root>/paper/`
 - a successful promotion must write back into `.lab/context/data-decisions.md`, `.lab/context/decisions.md`, `.lab/context/state.md`, and `.lab/context/session-brief.md`
 - every run must end with `.lab/context/auto-outcome.md`, including why it stopped, whether the terminal goal was reached, and which artifact is the final outcome
@@ -201,7 +201,7 @@ Level Guide for `/lab:auto`:
 Example:
 ```text
-/lab:auto Autonomy level L2. Objective: advance paper layer 3 organizer enforcement. Terminal goal: task-completion. Scope: bounded protocol, tests, minimal implementation, and one small run. Allowed modifications: evaluator prompt registry, ingestion, and parser only.
+/lab:auto Autonomy level L2. Objective: advance paper layer 3 through one bounded protocol improvement. Terminal goal: task-completion. Scope: bounded protocol, tests, one minimal implementation, and one small run. Allowed modifications: configuration, evaluation script, and data-loading logic only.
 ```
 ## Version
@@ -309,6 +309,7 @@ See the source command docs in [commands/codex/lab.md](/Users/zhouhao119/coding/
 - `docs/research/report.md`
 - `docs/research/main-tables.md`
+- `docs/research/artifact-status.md`
 - `docs/research/paper/main.tex`
 - `docs/research/paper/sections/*.tex`

package/README.zh-CN.md

@@ -178,7 +178,7 @@ superlab auto stop
 - `run` 和 `iterate` 必须更新 `results_root` 下的持久输出
 - `review` 必须更新规范的审查上下文
-- `report` 必须写出 `<deliverables_root>/report.md` 和 `<deliverables_root>/main-tables.md`
+- `report` 必须写出 `<deliverables_root>/report.md`、`<deliverables_root>/main-tables.md` 和 `<deliverables_root>/artifact-status.md`
 - `write` 必须写出 `<deliverables_root>/paper/` 下的 LaTeX 论文产物
 - promotion 成功后必须写回 `.lab/context/data-decisions.md`、`.lab/context/decisions.md`、`.lab/context/state.md` 和 `.lab/context/session-brief.md`
 - 每次运行都必须写出 `.lab/context/auto-outcome.md`，记录为什么停止、是否达到终止目标，以及哪一个工件是最终结果
@@ -199,7 +199,7 @@ superlab auto stop
 示例：
 ```text
-/lab:auto 自治级别 L2。目标：推进 paper layer 3 的 organizer enforcement。终止条件：完成 bounded protocol、测试、最小实现和一轮小规模结果。允许修改：evaluator prompt registry、ingestion、parser。
+/lab:auto 自治级别 L2。目标：推进 paper layer 3 的一项有边界协议改进。终止条件：完成 bounded protocol、测试、一项最小实现和一轮小规模结果。允许修改：配置、评估脚本、数据加载逻辑。
 ```
 ## 版本查询
@@ -294,6 +294,7 @@ Codex 和 Claude 的命令入口不一样：
 - `docs/research/report.md`
 - `docs/research/main-tables.md`
+- `docs/research/artifact-status.md`
 - `docs/research/paper/main.tex`
 - `docs/research/paper/sections/*.tex`

package/lib/auto_contracts.cjs

@@ -34,6 +34,7 @@ const FROZEN_CORE_ALIASES = {
 const REVIEW_CONTEXT_FILES = [
   path.join(".lab", "context", "decisions.md"),
   path.join(".lab", "context", "state.md"),
+  path.join(".lab", "context", "workflow-state.md"),
   path.join(".lab", "context", "open-questions.md"),
   path.join(".lab", "context", "evidence-index.md"),
 ];
@@ -288,6 +289,7 @@ function stageContractSnapshot(targetDir, stage) {
     report: [
       path.join(deliverablesRoot, "report.md"),
       path.join(deliverablesRoot, "main-tables.md"),
+      path.join(deliverablesRoot, "artifact-status.md"),
     ],
     write: [
       path.join(deliverablesRoot, "paper", "main.tex"),
@@ -318,7 +320,7 @@ function verifyStageContract({ stage, snapshot }) {
   if (stage === "review") {
     if (changedPaths.length === 0) {
       throw new Error(
-        "review stage did not update canonical review context (.lab/context/decisions.md, state.md, open-questions.md, or evidence-index.md)"
+        "review stage did not update canonical review context (.lab/context/decisions.md, state.md, workflow-state.md, open-questions.md, or evidence-index.md)"
       );
     }
     return;
@@ -327,7 +329,7 @@ function verifyStageContract({ stage, snapshot }) {
   if (stage === "report") {
     const missing = Array.from(snapshot.keys()).filter((absolutePath) => !changedPaths.includes(absolutePath));
     if (missing.length > 0) {
-      throw new Error("report stage did not produce the deliverable report.md and main-tables.md under deliverables_root");
+      throw new Error("report stage did not produce report.md, main-tables.md, and artifact-status.md under deliverables_root");
     }
     return;
   }
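
The report-stage check in the hunk above can be read as a set difference: every path in the snapshot must also appear among the changed paths, otherwise the stage fails. The following is a minimal standalone sketch of that idea, not the package's code. It assumes `snapshot` is a `Map` keyed by path and that `changedPaths` is an array of paths; the hunk does not show how `changedPaths` is computed, so it is passed in as a parameter here. `findMissingDeliverables` is a hypothetical helper name, and `docs/research` is used only as an illustrative deliverables root.

```javascript
const path = require("node:path");

// Hypothetical helper: returns the snapshot paths that were not changed.
function findMissingDeliverables(snapshot, changedPaths) {
  const changed = new Set(changedPaths);
  return Array.from(snapshot.keys()).filter((p) => !changed.has(p));
}

// Illustrative deliverables root (assumption; the real value is project config).
const deliverablesRoot = path.join("docs", "research");

// Snapshot values are placeholders; only the keys matter for this check.
const snapshot = new Map([
  [path.join(deliverablesRoot, "report.md"), null],
  [path.join(deliverablesRoot, "main-tables.md"), null],
  [path.join(deliverablesRoot, "artifact-status.md"), null],
]);

// Simulate a report stage that wrote only the two pre-0.1.25 deliverables.
const missing = findMissingDeliverables(snapshot, [
  path.join(deliverablesRoot, "report.md"),
  path.join(deliverablesRoot, "main-tables.md"),
]);

console.log(missing.map((p) => path.basename(p)));
```

In this sketch the only missing path is `artifact-status.md`. Under the 0.1.25 contract a non-empty `missing` list makes the report stage throw, so a stage that writes only `report.md` and `main-tables.md` no longer passes.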

package/lib/auto_runner.cjs

@@ -278,6 +278,21 @@ async function startAutoMode({ targetDir, now = new Date() }) {
     comparisonSourcePapers: evalProtocol.comparisonSourcePapers,
     comparisonImplementationSource: evalProtocol.comparisonImplementationSource,
     deviationFromOriginalImplementation: evalProtocol.deviationFromOriginalImplementation,
+    evaluationSettingSemantics: evalProtocol.evaluationSettingSemantics,
+    visibilityAndLeakageRisks: evalProtocol.visibilityAndLeakageRisks,
+    anchorAndLabelPolicy: evalProtocol.anchorAndLabelPolicy,
+    scaleAndComparabilityPolicy: evalProtocol.scaleAndComparabilityPolicy,
+    metricValidityChecks: evalProtocol.metricValidityChecks,
+    comparisonValidityChecks: evalProtocol.comparisonValidityChecks,
+    statisticalValidityChecks: evalProtocol.statisticalValidityChecks,
+    claimBoundary: evalProtocol.claimBoundary,
+    integritySelfCheck: evalProtocol.integritySelfCheck,
+    anomalySignals: evalProtocol.anomalySignals,
+    implementationRealityChecks: evalProtocol.implementationRealityChecks,
+    alternativeExplanationsConsidered: evalProtocol.alternativeExplanationsConsidered,
+    crossCheckMethod: evalProtocol.crossCheckMethod,
+    bestSupportedInterpretation: evalProtocol.bestSupportedInterpretation,
+    escalationThreshold: evalProtocol.escalationThreshold,
   };
   const writeRunningStatus = (overrides = {}) => {
@@ -768,6 +783,21 @@ function stopAutoMode({ targetDir, now = new Date() }) {
     comparisonSourcePapers: evalProtocol.comparisonSourcePapers,
     comparisonImplementationSource: evalProtocol.comparisonImplementationSource,
     deviationFromOriginalImplementation: evalProtocol.deviationFromOriginalImplementation,
+    evaluationSettingSemantics: evalProtocol.evaluationSettingSemantics,
+    visibilityAndLeakageRisks: evalProtocol.visibilityAndLeakageRisks,
+    anchorAndLabelPolicy: evalProtocol.anchorAndLabelPolicy,
+    scaleAndComparabilityPolicy: evalProtocol.scaleAndComparabilityPolicy,
+    metricValidityChecks: evalProtocol.metricValidityChecks,
+    comparisonValidityChecks: evalProtocol.comparisonValidityChecks,
+    statisticalValidityChecks: evalProtocol.statisticalValidityChecks,
+    claimBoundary: evalProtocol.claimBoundary,
+    integritySelfCheck: evalProtocol.integritySelfCheck,
+    anomalySignals: evalProtocol.anomalySignals,
+    implementationRealityChecks: evalProtocol.implementationRealityChecks,
+    alternativeExplanationsConsidered: evalProtocol.alternativeExplanationsConsidered,
+    crossCheckMethod: evalProtocol.crossCheckMethod,
+    bestSupportedInterpretation: evalProtocol.bestSupportedInterpretation,
+    escalationThreshold: evalProtocol.escalationThreshold,
   };
   const status = {
     ...existing,
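
Both hunks above add the same fifteen `evalProtocol` fields, once in `startAutoMode` and once in `stopAutoMode`. As an illustration only, the copy can be expressed with a single shared key list. The field names below come from the diff; `EVAL_PROTOCOL_OUTCOME_KEYS` and `pickEvalProtocolFields` are hypothetical names that do not exist in the package.

```javascript
// Field names taken from the hunks above; the constant itself is hypothetical.
const EVAL_PROTOCOL_OUTCOME_KEYS = [
  "evaluationSettingSemantics",
  "visibilityAndLeakageRisks",
  "anchorAndLabelPolicy",
  "scaleAndComparabilityPolicy",
  "metricValidityChecks",
  "comparisonValidityChecks",
  "statisticalValidityChecks",
  "claimBoundary",
  "integritySelfCheck",
  "anomalySignals",
  "implementationRealityChecks",
  "alternativeExplanationsConsidered",
  "crossCheckMethod",
  "bestSupportedInterpretation",
  "escalationThreshold",
];

// Hypothetical helper: copies each listed field from evalProtocol.
// Fields absent from evalProtocol come through as undefined, which is
// the same result as writing `key: evalProtocol.key` by hand.
function pickEvalProtocolFields(evalProtocol) {
  return Object.fromEntries(
    EVAL_PROTOCOL_OUTCOME_KEYS.map((key) => [key, evalProtocol[key]])
  );
}

// Example input with made-up values.
const evalProtocol = {
  claimBoundary: "in-distribution test split only",
  escalationThreshold: "any unexplained metric jump",
};
const outcomeFields = pickEvalProtocolFields(evalProtocol);
console.log(Object.keys(outcomeFields).length);
```

A shared list like this would keep the start and stop paths in sync when fields are added, at the cost of making the copied fields less visible at each call site.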

package/lib/auto_state.cjs

@@ -154,6 +154,21 @@ function renderAutoOutcome(outcome, { lang = "en" } = {}) {
 - 对比方法来源论文: ${outcome.comparisonSourcePapers || ""}
 - 对比方法实现来源: ${outcome.comparisonImplementationSource || ""}
 - 与原始实现的偏差: ${outcome.deviationFromOriginalImplementation || ""}
+- 评测设定语义: ${outcome.evaluationSettingSemantics || ""}
+- 可见性与泄漏风险: ${outcome.visibilityAndLeakageRisks || ""}
+- 锚点与标签策略: ${outcome.anchorAndLabelPolicy || ""}
+- 尺度与可比性策略: ${outcome.scaleAndComparabilityPolicy || ""}
+- 指标有效性检查: ${outcome.metricValidityChecks || ""}
+- 对比有效性检查: ${outcome.comparisonValidityChecks || ""}
+- 统计有效性检查: ${outcome.statisticalValidityChecks || ""}
+- 结论边界: ${outcome.claimBoundary || ""}
+- 完整性自检: ${outcome.integritySelfCheck || ""}
+- 异常信号: ${outcome.anomalySignals || ""}
+- 实现层现实检查: ${outcome.implementationRealityChecks || ""}
+- 已考虑的替代解释: ${outcome.alternativeExplanationsConsidered || ""}
+- 交叉验证方法: ${outcome.crossCheckMethod || ""}
+- 当前最站得住的解释: ${outcome.bestSupportedInterpretation || ""}
+- 升级阈值: ${outcome.escalationThreshold || ""}
 - 终止目标类型: ${outcome.terminalGoalType || ""}
 - 终止目标目标值: ${outcome.terminalGoalTarget || ""}
 - 必要终止工件: ${outcome.requiredTerminalArtifact || ""}
@@ -191,6 +206,21 @@ function renderAutoOutcome(outcome, { lang = "en" } = {}) {
 - Comparison source papers: ${outcome.comparisonSourcePapers || ""}
 - Comparison implementation source: ${outcome.comparisonImplementationSource || ""}
 - Deviation from original implementation: ${outcome.deviationFromOriginalImplementation || ""}
+- Evaluation setting semantics: ${outcome.evaluationSettingSemantics || ""}
+- Visibility and leakage risks: ${outcome.visibilityAndLeakageRisks || ""}
+- Anchor and label policy: ${outcome.anchorAndLabelPolicy || ""}
+- Scale and comparability policy: ${outcome.scaleAndComparabilityPolicy || ""}
+- Metric validity checks: ${outcome.metricValidityChecks || ""}
+- Comparison validity checks: ${outcome.comparisonValidityChecks || ""}
+- Statistical validity checks: ${outcome.statisticalValidityChecks || ""}
+- Claim boundary: ${outcome.claimBoundary || ""}
+- Integrity self-check: ${outcome.integritySelfCheck || ""}
+- Anomaly signals: ${outcome.anomalySignals || ""}
+- Implementation reality checks: ${outcome.implementationRealityChecks || ""}
+- Alternative explanations considered: ${outcome.alternativeExplanationsConsidered || ""}
+- Cross-check method: ${outcome.crossCheckMethod || ""}
+- Best-supported interpretation: ${outcome.bestSupportedInterpretation || ""}
+- Escalation threshold: ${outcome.escalationThreshold || ""}
 - Terminal goal type: ${outcome.terminalGoalType || ""}
 - Terminal goal target: ${outcome.terminalGoalTarget || ""}
 - Required terminal artifact: ${outcome.requiredTerminalArtifact || ""}
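
The template lines added above all follow one pattern: a localized label followed by `${outcome.field || ""}`, so an unset field renders as an empty value rather than the string `undefined`. A minimal sketch of that pattern for two of the new fields follows. The labels are copied from the diff; `LABELS`, `renderFields`, and the `"zh"` language key are assumptions made for illustration, since the diff shows only `lang = "en"` as the default.

```javascript
// Labels copied from the diff; the table structure and the "zh" key are assumptions.
const LABELS = {
  en: { claimBoundary: "Claim boundary", escalationThreshold: "Escalation threshold" },
  zh: { claimBoundary: "结论边界", escalationThreshold: "升级阈值" },
};

// Hypothetical renderer: one "- label: value" line per field.
// `|| ""` turns any falsy value (undefined, null, "", 0) into an empty string.
function renderFields(outcome, { lang = "en" } = {}) {
  const labels = LABELS[lang] || LABELS.en;
  return Object.entries(labels)
    .map(([key, label]) => `- ${label}: ${outcome[key] || ""}`)
    .join("\n");
}

// Example with a made-up value; escalationThreshold is left unset.
const rendered = renderFields({ claimBoundary: "in-distribution only" });
console.log(rendered);
```

Here the second line renders as `- Escalation threshold: ` with nothing after the colon, which is how an outcome file would look when a field was never filled in.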