npm - superlab - Versions diffs - 0.1.29 → 0.1.31 - Mend

superlab 0.1.29 → 0.1.31

This diff shows the content changes between two publicly released versions of the package, as they appear in the public registry. It is provided for informational purposes only.

Files changed (8)

package/bin/superlab.cjs +0 -0
package/lib/lab_idea_contract.json +4 -4
package/package-assets/claude/commands/lab-idea.md +1 -1
package/package-assets/codex/prompts/lab-idea.md +1 -1
package/package-assets/shared/lab/.managed/scripts/validate_idea_artifact.py +32 -0
package/package-assets/shared/lab/.managed/templates/idea.md +30 -0
package/package-assets/shared/skills/lab/stages/idea.md +25 -10
package/package.json +1 -1

package/bin/superlab.cjs CHANGED

File without changes

package/lib/lab_idea_contract.json CHANGED

@@ -1,8 +1,8 @@
 {
   "stage_prompt": {
-    "codex_en": "This command runs the `/lab:idea` stage. Use `.codex/skills/lab/stages/idea.md` as the single source of truth for the two brainstorm passes, two literature sweeps, closest-prior comparison, source-backed proposal memo, minimum viable experiment, and approval gate. Start with brainstorm pass 1 over 2-4 candidate directions, run literature sweep 1 with real closest-prior references for each direction, narrow the field with brainstorm pass 2, then run literature sweep 2 to build the final source bundle before producing a collaborator-readable recommendation. Keep `.lab/writing/idea-source-log.md` synchronized with the actual search queries, bucketed sources, and final source count used in both sweeps. The literature bundle should default to about 20 sources unless the field is genuinely narrow and that smaller bundle is explicitly justified.",
-    "claude_en": "This command runs the `idea` stage of the lab workflow. Use `.claude/skills/lab/stages/idea.md` as the single source of truth for the two brainstorm passes, two literature sweeps, closest-prior comparison, source-backed proposal memo, minimum viable experiment, and approval gate. Start with brainstorm pass 1 over 2-4 candidate directions, run literature sweep 1 with real closest-prior references for each direction, narrow the field with brainstorm pass 2, then run literature sweep 2 to build the final source bundle before producing a collaborator-readable recommendation. Keep `.lab/writing/idea-source-log.md` synchronized with the actual search queries, bucketed sources, and final source count used in both sweeps. The literature bundle should default to about 20 sources unless the field is genuinely narrow and that smaller bundle is explicitly justified.",
-    "codex_zh": "本命令运行 `/lab:idea` 阶段。把 `.codex/skills/lab/stages/idea.md` 当成两轮脑暴、两轮文献检索、最接近前作对照、source-backed proposal memo、最小可行实验和 approval gate 的单一来源。先做第一轮脑暴，产出 2-4 个候选方向；再做第一轮文献检索，为每个方向补最接近前作；然后用第二轮脑暴淘汰或收敛方向；最后做第二轮文献检索，补齐最终来源包，再输出协作者可读的推荐结论。`.lab/writing/idea-source-log.md` 必须和两轮检索实际用到的查询、来源分桶和最终来源数保持一致。文献来源包默认目标约 20 篇；如果领域确实很窄，必须显式解释为什么合理地低于这个目标。",
-    "claude_zh": "本命令运行 lab workflow 的 `idea` 阶段。把 `.claude/skills/lab/stages/idea.md` 当成两轮脑暴、两轮文献检索、最接近前作对照、source-backed proposal memo、最小可行实验和 approval gate 的单一来源。先做第一轮脑暴，产出 2-4 个候选方向；再做第一轮文献检索，为每个方向补最接近前作；然后用第二轮脑暴淘汰或收敛方向；最后做第二轮文献检索，补齐最终来源包，再输出协作者可读的推荐结论。`.lab/writing/idea-source-log.md` 必须和两轮检索实际用到的查询、来源分桶和最终来源数保持一致。文献来源包默认目标约 20 篇；如果领域确实很窄，必须显式解释为什么合理地低于这个目标。"
+    "codex_en": "This command runs the `/lab:idea` stage. Use `.codex/skills/lab/stages/idea.md` as the single source of truth for the two brainstorm passes, two literature sweeps, closest-prior comparison, source-backed proposal memo, evaluation sketch, tentative contributions, user guidance, minimum viable experiment, and approval gate. Start with brainstorm pass 1 over 2-4 candidate directions, run literature sweep 1 with real closest-prior references for each direction, narrow the field with brainstorm pass 2, then run literature sweep 2 to build the final source bundle before producing a collaborator-readable recommendation. The final idea memo must explain the real-world scenario, the problem solved, why current methods fall short, roughly how the idea would work, how it would be evaluated, what the tentative contributions are, and what the user should decide next. Keep `.lab/writing/idea-source-log.md` synchronized with the actual search queries, bucketed sources, and final source count used in both sweeps. The literature bundle should default to about 20 sources unless the field is genuinely narrow and that smaller bundle is explicitly justified.",
+    "claude_en": "This command runs the `idea` stage of the lab workflow. Use `.claude/skills/lab/stages/idea.md` as the single source of truth for the two brainstorm passes, two literature sweeps, closest-prior comparison, source-backed proposal memo, evaluation sketch, tentative contributions, user guidance, minimum viable experiment, and approval gate. Start with brainstorm pass 1 over 2-4 candidate directions, run literature sweep 1 with real closest-prior references for each direction, narrow the field with brainstorm pass 2, then run literature sweep 2 to build the final source bundle before producing a collaborator-readable recommendation. The final idea memo must explain the real-world scenario, the problem solved, why current methods fall short, roughly how the idea would work, how it would be evaluated, what the tentative contributions are, and what the user should decide next. Keep `.lab/writing/idea-source-log.md` synchronized with the actual search queries, bucketed sources, and final source count used in both sweeps. The literature bundle should default to about 20 sources unless the field is genuinely narrow and that smaller bundle is explicitly justified.",
+    "codex_zh": "本命令运行 `/lab:idea` 阶段。把 `.codex/skills/lab/stages/idea.md` 当成两轮脑暴、两轮文献检索、最接近前作对照、source-backed proposal memo、评测草图、暂定贡献、用户引导、最小可行实验和 approval gate 的单一来源。先做第一轮脑暴，产出 2-4 个候选方向；再做第一轮文献检索，为每个方向补最接近前作；然后用第二轮脑暴淘汰或收敛方向；最后做第二轮文献检索，补齐最终来源包，再输出协作者可读的推荐结论。最终 idea memo 必须讲清真实场景、解决了什么问题、现有方法为什么不够、准备怎么做、大致怎么评、暂定贡献是什么，以及用户下一步该决定什么。`.lab/writing/idea-source-log.md` 必须和两轮检索实际用到的查询、来源分桶和最终来源数保持一致。文献来源包默认目标约 20 篇；如果领域确实很窄，必须显式解释为什么合理地低于这个目标。",
+    "claude_zh": "本命令运行 lab workflow 的 `idea` 阶段。把 `.claude/skills/lab/stages/idea.md` 当成两轮脑暴、两轮文献检索、最接近前作对照、source-backed proposal memo、评测草图、暂定贡献、用户引导、最小可行实验和 approval gate 的单一来源。先做第一轮脑暴，产出 2-4 个候选方向；再做第一轮文献检索，为每个方向补最接近前作；然后用第二轮脑暴淘汰或收敛方向；最后做第二轮文献检索，补齐最终来源包，再输出协作者可读的推荐结论。最终 idea memo 必须讲清真实场景、解决了什么问题、现有方法为什么不够、准备怎么做、大致怎么评、暂定贡献是什么，以及用户下一步该决定什么。`.lab/writing/idea-source-log.md` 必须和两轮检索实际用到的查询、来源分桶和最终来源数保持一致。文献来源包默认目标约 20 篇；如果领域确实很窄，必须显式解释为什么合理地低于这个目标。"
   }
 }
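
The change above adds the same three elements (evaluation sketch, tentative contributions, user guidance) to all four prompt variants. A minimal sketch of a consistency check follows, assuming the key suffix (`_en`, `_zh`) identifies the prompt language. The `missing_phrases` helper and the abbreviated prompt strings are hypothetical and are not part of the package.

```python
import json

# Abbreviated stand-in for package/lib/lab_idea_contract.json.
# The real prompt strings are much longer; only the new phrases are kept here.
contract_text = json.dumps(
    {
        "stage_prompt": {
            "codex_en": "... evaluation sketch, tentative contributions, user guidance, ...",
            "claude_en": "... evaluation sketch, tentative contributions, user guidance, ...",
            "codex_zh": "... 评测草图、暂定贡献、用户引导、...",
            "claude_zh": "... 评测草图、暂定贡献、用户引导、...",
        }
    },
    ensure_ascii=False,
)

# Phrases introduced in 0.1.31, grouped by the language suffix of the prompt key.
REQUIRED_PHRASES = {
    "en": ("evaluation sketch", "tentative contributions", "user guidance"),
    "zh": ("评测草图", "暂定贡献", "用户引导"),
}


def missing_phrases(contract: dict) -> dict[str, list[str]]:
    """Hypothetical helper: map each prompt key to the required phrases it lacks."""
    missing: dict[str, list[str]] = {}
    for key, prompt in contract["stage_prompt"].items():
        language = key.rsplit("_", 1)[-1]  # "codex_en" -> "en"
        absent = [phrase for phrase in REQUIRED_PHRASES[language] if phrase not in prompt]
        if absent:
            missing[key] = absent
    return missing


print(missing_phrases(json.loads(contract_text)))
```

An empty result means every variant mentions all three new elements. A variant that was missed during an update would appear as a key with its absent phrases.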

package/package-assets/claude/commands/lab-idea.md CHANGED

@@ -7,4 +7,4 @@ argument-hint: idea or research problem
 Use the installed `lab` skill at `.claude/skills/lab/SKILL.md`.
 Execute the requested `/lab-idea` command against the user's argument now. Do not only recommend another lab stage. If a blocking prerequisite is missing, say exactly what is missing and ask at most one clarifying question.
-This command runs the `idea` stage of the lab workflow. Use `.claude/skills/lab/stages/idea.md` as the single source of truth for the two brainstorm passes, two literature sweeps, closest-prior comparison, source-backed proposal memo, minimum viable experiment, and approval gate. Start with brainstorm pass 1 over 2-4 candidate directions, run literature sweep 1 with real closest-prior references for each direction, narrow the field with brainstorm pass 2, then run literature sweep 2 to build the final source bundle before producing a collaborator-readable recommendation. Keep `.lab/writing/idea-source-log.md` synchronized with the actual search queries, bucketed sources, and final source count used in both sweeps. The literature bundle should default to about 20 sources unless the field is genuinely narrow and that smaller bundle is explicitly justified.
+This command runs the `idea` stage of the lab workflow. Use `.claude/skills/lab/stages/idea.md` as the single source of truth for the two brainstorm passes, two literature sweeps, closest-prior comparison, source-backed proposal memo, evaluation sketch, tentative contributions, user guidance, minimum viable experiment, and approval gate. Start with brainstorm pass 1 over 2-4 candidate directions, run literature sweep 1 with real closest-prior references for each direction, narrow the field with brainstorm pass 2, then run literature sweep 2 to build the final source bundle before producing a collaborator-readable recommendation. The final idea memo must explain the real-world scenario, the problem solved, why current methods fall short, roughly how the idea would work, how it would be evaluated, what the tentative contributions are, and what the user should decide next. Keep `.lab/writing/idea-source-log.md` synchronized with the actual search queries, bucketed sources, and final source count used in both sweeps. The literature bundle should default to about 20 sources unless the field is genuinely narrow and that smaller bundle is explicitly justified.

package/package-assets/codex/prompts/lab-idea.md CHANGED

@@ -6,4 +6,4 @@ argument-hint: idea or research problem
 Use the installed `lab` skill at `.codex/skills/lab/SKILL.md`.
 Execute the requested `/lab:idea` stage against the user's argument now. Do not only recommend another lab stage. If a blocking prerequisite is missing, say exactly what is missing and ask at most one clarifying question.
-This command runs the `/lab:idea` stage. Use `.codex/skills/lab/stages/idea.md` as the single source of truth for the two brainstorm passes, two literature sweeps, closest-prior comparison, source-backed proposal memo, minimum viable experiment, and approval gate. Start with brainstorm pass 1 over 2-4 candidate directions, run literature sweep 1 with real closest-prior references for each direction, narrow the field with brainstorm pass 2, then run literature sweep 2 to build the final source bundle before producing a collaborator-readable recommendation. Keep `.lab/writing/idea-source-log.md` synchronized with the actual search queries, bucketed sources, and final source count used in both sweeps. The literature bundle should default to about 20 sources unless the field is genuinely narrow and that smaller bundle is explicitly justified.
+This command runs the `/lab:idea` stage. Use `.codex/skills/lab/stages/idea.md` as the single source of truth for the two brainstorm passes, two literature sweeps, closest-prior comparison, source-backed proposal memo, evaluation sketch, tentative contributions, user guidance, minimum viable experiment, and approval gate. Start with brainstorm pass 1 over 2-4 candidate directions, run literature sweep 1 with real closest-prior references for each direction, narrow the field with brainstorm pass 2, then run literature sweep 2 to build the final source bundle before producing a collaborator-readable recommendation. The final idea memo must explain the real-world scenario, the problem solved, why current methods fall short, roughly how the idea would work, how it would be evaluated, what the tentative contributions are, and what the user should decide next. Keep `.lab/writing/idea-source-log.md` synchronized with the actual search queries, bucketed sources, and final source count used in both sweeps. The literature bundle should default to about 20 sources unless the field is genuinely narrow and that smaller bundle is explicitly justified.

package/package-assets/shared/lab/.managed/scripts/validate_idea_artifact.py CHANGED

@@ -18,9 +18,13 @@ REQUIRED_SECTIONS = {
     "Brainstorm Pass 2": [r"^##\s+Brainstorm Pass 2\s*$", r"^##\s+第二轮脑暴\s*$"],
     "Literature Sweep 2": [r"^##\s+Literature Sweep 2\s*$", r"^##\s+第二轮文献(?:检索|收敛)?\s*$"],
     "Rough Approach": [r"^##\s+Rough Approach\s*$", r"^##\s+我们准备怎么做\s*$"],
+    "Problem Solved": [r"^##\s+Problem Solved\s*$", r"^##\s+解决了什么问题\s*$"],
+    "Evaluation Sketch": [r"^##\s+Evaluation Sketch\s*$", r"^##\s+评测草图\s*$"],
+    "Tentative Contributions": [r"^##\s+Tentative Contributions\s*$", r"^##\s+暂定贡献\s*$"],
     "Candidate Experiment": [r"^##\s+Candidate Experiment\s*$", r"^##\s+(?:最小实验|候选实验)\s*$"],
     "Falsifiable Hypothesis": [r"^##\s+Falsifiable Hypothesis\s*$", r"^##\s+可证伪假设\s*$"],
     "Final Recommendation": [r"^##\s+Final Recommendation\s*$", r"^##\s+最终推荐\s*$"],
+    "User Guidance": [r"^##\s+User Guidance\s*$", r"^##\s+用户引导\s*$"],
 }
 SOURCE_LOG_SECTIONS = {
@@ -226,6 +230,26 @@ def validate_content(text: str) -> list[str]:
     if not contains_any(rough_approach, ("plain-language", "how this would work", "粗略做法", "怎么做", "why this design", "为什么")):
         issues.append("idea artifact is missing a rough plain-language approach")
+    problem_solved = extract_section_body(text, REQUIRED_SECTIONS["Problem Solved"])
+    if not has_field_value(problem_solved, ("In plain language", "白话问题", "用大白话说")):
+        issues.append("idea artifact is missing a plain-language statement of what problem the idea actually solves")
+    if not has_field_value(problem_solved, ("What becomes possible if this works", "如果这条路成立", "如果这条路可行")):
+        issues.append("idea artifact is missing the payoff of solving the proposed problem")
+    evaluation_sketch = extract_section_body(text, REQUIRED_SECTIONS["Evaluation Sketch"])
+    if not has_field_value(evaluation_sketch, ("Evaluation subject", "评测对象")):
+        issues.append("idea artifact is missing an evaluation sketch with the evaluation subject")
+    if not has_field_value(evaluation_sketch, ("Proxy or simulator, if any", "代理或模拟器")):
+        issues.append("idea artifact is missing an evaluation sketch that states any proxy or simulator")
+    if not has_field_value(evaluation_sketch, ("Main outcome to observe", "主要观察结果")):
+        issues.append("idea artifact is missing the main outcome in the evaluation sketch")
+    if not has_field_value(evaluation_sketch, ("Main validity risk", "主要有效性风险")):
+        issues.append("idea artifact is missing the main validity risk in the evaluation sketch")
+    tentative_contributions = extract_section_body(text, REQUIRED_SECTIONS["Tentative Contributions"])
+    if sum(1 for label in ("Contribution 1", "Contribution 2", "Contribution 3", "贡献 1", "贡献 2", "贡献 3") if has_field_value(tentative_contributions, (label,))) < 2:
+        issues.append("idea artifact is missing tentative contributions stated at the idea level")
     experiment = extract_section_body(text, REQUIRED_SECTIONS["Candidate Experiment"])
     if not contains_any(experiment, ("minimum viable experiment", "minimum experiment", "dataset", "metric", "最小实验", "主指标", "次指标")):
         issues.append("idea artifact is missing a minimum experiment")
@@ -234,6 +258,14 @@ def validate_content(text: str) -> list[str]:
     if not contains_any(final_recommendation, ("recommended direction", "paper-worthy", "推荐方向", "值得做论文")):
         issues.append("idea artifact is missing a final recommendation after the second sweep")
+    user_guidance = extract_section_body(text, REQUIRED_SECTIONS["User Guidance"])
+    if not has_field_value(user_guidance, ("Immediate decision needed from the user", "现在最需要你确认的选择", "Immediate decision")):
+        issues.append("idea artifact is missing user guidance about the next decision")
+    if not has_field_value(user_guidance, ("Information that would sharpen the idea", "哪些信息会显著提高下一轮判断质量", "Information that would sharpen")):
+        issues.append("idea artifact is missing user guidance about what information would sharpen the idea")
+    if not has_field_value(user_guidance, ("Recommended next stage", "推荐下一步")):
+        issues.append("idea artifact is missing user guidance about the next lab stage")
     return issues
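
The new checks rely on `extract_section_body` and `has_field_value`, whose implementations are not shown in this diff. The sketch below illustrates how such checks could behave on a small artifact. Both helper bodies and `count_filled` are hypothetical stand-ins written for illustration, and the sample text is invented. Only the section patterns and field labels come from the diff.

```python
import re

# Section patterns copied from the diff.
PROBLEM_SOLVED = [r"^##\s+Problem Solved\s*$", r"^##\s+解决了什么问题\s*$"]
TENTATIVE_CONTRIBUTIONS = [r"^##\s+Tentative Contributions\s*$", r"^##\s+暂定贡献\s*$"]


def extract_section_body(text: str, patterns: list[str]) -> str:
    """Hypothetical stand-in: lines between a matching `##` heading and the next one."""
    lines = text.splitlines()
    for index, line in enumerate(lines):
        if any(re.match(pattern, line) for pattern in patterns):
            body = []
            for following in lines[index + 1:]:
                if re.match(r"^##\s+", following):
                    break
                body.append(following)
            return "\n".join(body)
    return ""


def has_field_value(body: str, labels: tuple[str, ...]) -> bool:
    """Hypothetical stand-in: True if a `- Label: value` bullet has a non-empty value."""
    for line in body.splitlines():
        stripped = line.strip().lstrip("-").strip()
        for label in labels:
            if stripped.startswith(label):
                rest = stripped[len(label):].lstrip()
                # Accept both the ASCII and the full-width colon.
                if rest[:1] in (":", "：") and rest[1:].strip():
                    return True
    return False


def count_filled(body: str, labels: tuple[str, ...]) -> int:
    """Count labels with a value, mirroring the 'at least two contributions' rule."""
    return sum(1 for label in labels if has_field_value(body, (label,)))


sample = """## Problem Solved
- In plain language: Reviewers cannot tell which claims are source-backed.
- What becomes possible if this works:

## Tentative Contributions
- Contribution 1: A labeling scheme for claim provenance.
- Contribution 2:
- Contribution 3: A small benchmark of annotated memos.
"""

problem = extract_section_body(sample, PROBLEM_SOLVED)
print(has_field_value(problem, ("In plain language",)))
print(has_field_value(problem, ("What becomes possible if this works",)))

contributions = extract_section_body(sample, TENTATIVE_CONTRIBUTIONS)
print(count_filled(contributions, ("Contribution 1", "Contribution 2", "Contribution 3")))
```

Under these assumptions, a field that is present but left blank counts as missing. The sample would fail the payoff check and pass the contributions check, which needs at least two filled entries.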

package/package-assets/shared/lab/.managed/templates/idea.md CHANGED

@@ -54,7 +54,13 @@ Suggested levels:
 ## Existing Methods
 - Mainstream line 1:
+- Citation:
+- What it solves:
+- Why it still falls short here:
 - Mainstream line 2:
+- Citation:
+- What it solves:
+- Why it still falls short here:
 - Shared assumption:
 - Why that assumption breaks here:
@@ -127,6 +133,24 @@ Suggested levels:
 - Plain-language description of how this would work:
 - Why this design might resolve the failure case:
+## Problem Solved
+- In plain language:
+- What becomes possible if this works:
+## Evaluation Sketch
+- Evaluation subject:
+- Proxy or simulator, if any:
+- Main outcome to observe:
+- Main validity risk:
+## Tentative Contributions
+- Contribution 1:
+- Contribution 2:
+- Contribution 3:
 ## Three Meaningful Points
 1. Significance:
@@ -176,6 +200,12 @@ Suggested levels:
 - Recommended direction after two sweeps:
 - Why this is still paper-worthy:
+## User Guidance
+- Immediate decision needed from the user:
+- Information that would sharpen the idea:
+- Recommended next stage:
 ## Approval Gate
 - User-approved direction:

package/package-assets/shared/skills/lab/stages/idea.md CHANGED

@@ -6,6 +6,7 @@
 - one-sentence problem statement
 - why the problem matters in plain language
 - failure case
+- problem solved in plain language
 - idea classification
 - contribution category
 - breakthrough level
@@ -13,6 +14,8 @@
 - closest-prior-work comparison
 - why the proposed idea is better than existing methods
 - rough plain-language approach description
+- evaluation sketch with the evaluation subject, any proxy or simulator, the main outcome to observe, and the main validity risk
+- tentative contributions stated at idea level, not final paper-facing wording
 - three meaningful points
 - brainstorm pass 1 with 2-4 candidate directions
 - literature sweep 1 with 3-5 closest-prior references per direction
@@ -26,6 +29,7 @@
 - generated innovation hypothesis
 - critique before convergence
 - minimum viable experiment
+- user guidance that tells the user what decision matters now, what missing information would sharpen the idea, and what stage should come next
 - explicit approval gate before `/lab:spec`
 - canonical mission context updated with the approved problem, importance, failure case, and direction
@@ -45,6 +49,9 @@
 - If the field is genuinely too narrow to support that target, say so explicitly in both the idea artifact and the idea source log, and justify the smaller literature bundle instead of silently skipping the search.
 - The idea artifact must follow the repository `workflow_language`, not whichever language is easiest locally.
 - Before writing the full artifact, give the user a short summary with the one-sentence problem, why current methods fail, and the three meaningful points.
+- If the current evaluation plan uses a proxy, simulator, or synthetic user in place of a real subject, say that explicitly in the idea artifact and explain why it is acceptable at the idea stage.
+- Keep tentative contributions at the idea level. Do not drift into final paper-facing naming, title, or contribution wording; that belongs to `/lab:framing`.
+- End the stage output with a user-guidance block that tells the user what to decide next, what information would most improve the idea, and which `/lab` stage should follow.
 ## Context Read Set
@@ -78,16 +85,20 @@
 11. Brainstorm pass 2
 12. Literature sweep 2
 13. Rough approach in plain language
-14. Why the proposed idea is better
-15. Three meaningful points
-16. Candidate approaches and recommendation
-17. Dataset, baseline, and metric candidates
-18. Falsifiable hypothesis
-19. Expert critique
-20. Revised proposal or final recommendation
-21. Approval gate
-22. Minimum viable experiment
-23. Idea source log aligned with the two literature sweeps
+14. Problem solved in plain language
+15. Why the proposed idea is better
+16. Evaluation sketch
+17. Tentative contributions
+18. Three meaningful points
+19. Candidate approaches and recommendation
+20. Dataset, baseline, and metric candidates
+21. Falsifiable hypothesis
+22. Expert critique
+23. Revised proposal or final recommendation
+24. User guidance
+25. Approval gate
+26. Minimum viable experiment
+27. Idea source log aligned with the two literature sweeps
 ## Writing Standard
@@ -103,6 +114,10 @@
 - Do not call something paper-worthy or novel after only one brainstorm pass or one literature sweep.
 - Do not treat the idea artifact itself as the only evidence record; keep `.lab/writing/idea-source-log.md` synchronized with the actual searches and source buckets used in both literature sweeps.
 - Explain what current methods do, why they fall short, and roughly how the proposed idea would work in plain language.
+- Explain what problem the idea actually solves before describing tentative contributions.
+- Keep the evaluation sketch high-level: who or what is evaluated, what proxy or simulator is used if any, what outcome matters, and what the main validity risk is. Leave full protocol design to later stages.
+- Keep contributions tentative and high-level. The goal here is to explain what the paper might contribute, not to freeze paper-facing wording.
 - The three meaningful points should each fit in one direct sentence.
+- The final output must guide the user. Tell them what decision matters now, what information would sharpen the idea, and which `/lab` stage should come next.
 - Before approval, run `.lab/.managed/scripts/validate_idea_artifact.py --idea <idea-artifact> --source-log .lab/writing/idea-source-log.md --workflow-config .lab/config/workflow.json`.
 - Do not leave `.lab/context/mission.md` as an empty template after convergence; write the approved problem, why it matters, the current benchmark scope, and the approved direction back into canonical context.
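
The validator invocation named in the writing standard can be wrapped in a small script. This is a rough sketch, assuming the script is run with the current Python interpreter from the repository root. The script path and the three flags come from the text above. The function names and the `idea.md` artifact path are hypothetical, and the script's exit code and output format are not documented in this diff.

```python
import subprocess
import sys
from pathlib import Path

# Paths taken from the stage instructions, relative to the repository root.
VALIDATOR = Path(".lab/.managed/scripts/validate_idea_artifact.py")
SOURCE_LOG = Path(".lab/writing/idea-source-log.md")
WORKFLOW_CONFIG = Path(".lab/config/workflow.json")


def build_validate_command(idea_artifact: Path) -> list[str]:
    """Assemble the argument list for the validator (hypothetical helper)."""
    return [
        sys.executable,
        str(VALIDATOR),
        "--idea", str(idea_artifact),
        "--source-log", str(SOURCE_LOG),
        "--workflow-config", str(WORKFLOW_CONFIG),
    ]


def run_validator(idea_artifact: Path) -> subprocess.CompletedProcess:
    """Run the validator and capture its output without raising on failure."""
    return subprocess.run(
        build_validate_command(idea_artifact),
        capture_output=True,
        text=True,
        check=False,
    )


# "idea.md" is a placeholder; substitute the real idea artifact path.
print(build_validate_command(Path("idea.md")))
```

Passing the arguments as a list avoids shell quoting problems if the artifact path contains spaces.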

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "superlab",
-  "version": "0.1.29",
+  "version": "0.1.31",
   "description": "Strict /lab research workflow installer for Codex and Claude",
   "keywords": [
     "codex",