superlab 0.1.65 → 0.1.66

This diff shows the changes between two publicly released versions of the package, as published to a supported registry. It is provided for informational purposes only.
package/lib/i18n.cjs CHANGED
@@ -2150,6 +2150,12 @@ ZH_CONTENT[path.join(".codex", "skills", "lab", "stages", "write.md")] = `# \`/l
 - 如果当前 section 是 \`abstract\`、\`introduction\` 或 \`method\`,还必须继续读取本地 example bank:\`references/paper-writing/examples/index.md\`、对应的 examples index,以及 1-2 个具体 example 文件。
 - 如果当前 section 是 \`related work\`、\`experiments\` 或 \`conclusion\`,也要读取对应的本地 example bank:\`references/paper-writing/examples/index.md\`、对应的 examples index,以及 1-2 个具体 example 文件。
 - 例子只能复用结构、段落角色和句法逻辑,不能直接复用原句。
+- 如果用户在 \`/lab:write\` 里提供本地 PDF、PDF URL、HTML 页面或本地参考论文,起草前必须先运行 \`.lab/.managed/scripts/extract_reference_paper_structure.py --output-dir .lab/writing/reference-patterns <sources...>\`,除非 \`.lab/writing/reference-patterns/aggregate-template-playbook.md\` 已经覆盖这些完全相同的来源。
+- reference-patterns 是 \`/lab:write\` 的内部能力,不是让用户额外学习的新命令;用户仍然只需要输入 \`/lab-write\`。
+- 使用参考论文时,必须读取 \`.lab/writing/reference-patterns/aggregate-template-playbook.md\`、当前 section 对应的 \`.lab/writing/reference-patterns/section-templates/<section>.json\`,以及需要图表时的 \`.lab/writing/reference-patterns/visual-templates/experiment-assets.json\`。
+- 写 experiments 时,如果存在 \`.lab/writing/reference-patterns/section-templates/experiments-protocol.json\`,还必须读取它,并保留其中的协议槽位逻辑:数据集描述、数据集统计或 appendix 链接、split 或 sampling 协议、baseline / 对比方法设置、指标定义、implementation / tuning 细节、主结果、消融、敏感性或分析。
+- 不要把 \`setup\`、\`overall performance\` 这类粗标签当成完整的 experiments 结构抽取;如果参考论文里有单独的 dataset、baseline、metric、implementation 或 appendix-detail 段落,就必须把这些细槽位抽出来并在 mini-outline 中体现。
+- 参考论文只允许复现结构、段落功能、图表作用、放置逻辑和前后桥接;不得复制措辞、claim、指标、baseline、数据、caption 或结论。
 - 先写 mini-outline 再写 prose。
 - 如果当前 section 带引言、方法、实验、相关工作或结论 claim,先规划需要的表格、figure placeholders 和 citations,再写 prose。
 - 在起草 \`introduction\`、\`method\`、\`experiments\`、\`related work\` 或 \`conclusion\` 之前,必须先运行 \`.lab/.managed/scripts/validate_paper_plan.py --paper-plan .lab/writing/plan.md\`。
@@ -87,6 +87,53 @@ CANONICAL_HEADING_TITLES = {
     "appendix",
 }
 
+EXPERIMENT_PROTOCOL_SLOT_GUIDANCE = {
+    "dataset_description": {
+        "reader_question": "Which datasets define the evaluation scope, and why are they relevant?",
+        "placement_guidance": "place before baselines, metrics, and main results",
+    },
+    "dataset_statistics": {
+        "reader_question": "What dataset scale, feature, treatment, or split facts constrain interpretation?",
+        "placement_guidance": "place near dataset descriptions or move detailed statistics to appendix with a main-text pointer",
+    },
+    "split_protocol": {
+        "reader_question": "How are train, validation, test, seed, or sampling decisions made?",
+        "placement_guidance": "place before metrics and results so comparisons have a fixed protocol",
+    },
+    "baseline_setup": {
+        "reader_question": "Which comparator families are included, and what role does each comparator play?",
+        "placement_guidance": "place after datasets and before the main comparison table",
+    },
+    "metric_definition": {
+        "reader_question": "Which metrics decide ranking, what do they measure, and which direction is better?",
+        "placement_guidance": "place before the first result table and repeat local definitions in table notes when needed",
+    },
+    "implementation_details": {
+        "reader_question": "Which tuning, validation, training, or hardware details are needed for reproducibility?",
+        "placement_guidance": "place after metrics or in appendix when details are long",
+    },
+    "main_results": {
+        "reader_question": "What is the primary comparison result under the declared protocol?",
+        "placement_guidance": "place after setup, baselines, and metrics",
+    },
+    "ablation": {
+        "reader_question": "Which component or design choice accounts for the claimed effect?",
+        "placement_guidance": "place after the main results",
+    },
+    "sensitivity": {
+        "reader_question": "How stable is the result under relevant protocol or hyperparameter changes?",
+        "placement_guidance": "place after ablations or in an analysis subsection",
+    },
+    "appendix_dataset_statistics": {
+        "reader_question": "Which detailed dataset facts support the compact dataset setup in the main experiments?",
+        "placement_guidance": "link from the main experimental setup and keep detailed tables in appendix",
+    },
+    "appendix_baseline_metric_details": {
+        "reader_question": "Which baseline, metric, or implementation details are too long for the main setup?",
+        "placement_guidance": "link from the main setup and keep long comparator or metric definitions in appendix",
+    },
+}
+
 
 
 @dataclass
 class SectionRecord:
@@ -272,6 +319,8 @@ def normalize_heading(line: str) -> str:
 
 def section_type_for_title(title: str) -> str:
     lowered = title.lower()
+    if re.match(r"^(appendix\b|[a-z]\.\d+(?:\.\d+)*\.?\s+)", lowered.strip()):
+        return "appendix"
     lowered = re.sub(r"^\d+(?:\.\d+)*\.?\s+", "", lowered).strip()
     for section_type, aliases in SECTION_ALIASES.items():
         if any(alias in lowered for alias in aliases):
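The new appendix guard in this hunk can be exercised on its own. The sketch below lifts the regex into a hypothetical helper (`is_appendix_heading` is a name introduced here for illustration; the diff inlines the check inside `section_type_for_title`):

```python
import re

def is_appendix_heading(title: str) -> bool:
    # Mirrors the new guard: bare "Appendix ..." headings, or "B.2 "-style
    # letter-dot-number appendix headings, are classified as appendix.
    lowered = title.lower()
    return re.match(r"^(appendix\b|[a-z]\.\d+(?:\.\d+)*\.?\s+)", lowered.strip()) is not None

print(is_appendix_heading("Appendix A: Proofs"))  # → True
print(is_appendix_heading("B.2 Extra Results"))   # → True
print(is_appendix_heading("3.1 Method"))          # → False
```

Note the guard runs before the numeric-prefix strip in the diff, so "B.2 Extra Results" is caught as appendix rather than falling through to alias matching.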
@@ -424,6 +473,60 @@ def paragraph_role(section_type: str, paragraph: str) -> str:
         return "visual_or_table_anchor"
     if section_type == "abstract":
         return "abstract_summary"
+    if any(token in lowered for token in ("dataset statistics", "statistics of the dataset", "sample count", "feature dimension")):
+        return "dataset_statistics"
+    if re.search(r"\bdatasets?\s*[:.]", lowered) or any(
+        token in lowered for token in ("benchmark dataset", "public dataset", "semi-synthetic dataset")
+    ):
+        return "dataset_description"
+    if re.search(r"\bbaselines?\s*[:.]", lowered) or any(
+        token in lowered for token in ("baseline family", "comparator", "compare with")
+    ):
+        return "baseline_setup"
+    if re.search(r"\bmetrics?\s*[:.]", lowered) or any(
+        token in lowered
+        for token in (
+            "metric definition",
+            "we report",
+            "primary metric",
+            "secondary metric",
+            "higher is better",
+            "lower is better",
+            "auuc",
+            "qini",
+        )
+    ):
+        return "metric_definition"
+    if any(
+        token in lowered
+        for token in (
+            "train/test",
+            "train, validation",
+            "training split",
+            "validation split",
+            "test split",
+            "random split",
+            "repeated split",
+            "sampling",
+            "seed",
+            "protocol",
+        )
+    ):
+        return "split_protocol"
+    if any(
+        token in lowered
+        for token in (
+            "implementation detail",
+            "hyperparameter",
+            "tuning",
+            "learning rate",
+            "epoch",
+            "batch size",
+            "hardware",
+            "gpu",
+        )
+    ):
+        return "implementation_details"
     if any(token in lowered for token in ("limitation", "future work", "caveat", "drift", "局限")):
         return "limitation_boundary"
     if any(token in lowered for token in ("ablation", "component", "without", "remove")):
@@ -726,6 +829,172 @@ def merge_unique(values: list[str]) -> list[str]:
     return merged
 
 
+def normalized_section_title(title: str) -> str:
+    title = re.sub(r"^\d+(?:\.\d+)*\.?\s+", "", title.strip())
+    title = re.sub(r"^[A-Z]\.\d+(?:\.\d+)*\.?\s+", "", title)
+    title = re.sub(r"^Appendix\s+[A-Z](?:\.\d+)*\.?\s*[:\-]?\s*", "", title, flags=re.IGNORECASE)
+    return re.sub(r"\s+", " ", title).strip().lower()
+
+
+def experiment_slots_from_signal(section: SectionRecord, role: str, text: str) -> list[str]:
+    title = normalized_section_title(section.title)
+    lowered = f"{title} {text.lower()}"
+    is_appendix = section.section_type == "appendix"
+    slots: list[str] = []
+
+    if is_appendix and "dataset" in lowered and any(
+        token in lowered for token in ("statistics", "statistic", "summary", "sample", "feature")
+    ):
+        slots.append("appendix_dataset_statistics")
+    if is_appendix and any(token in lowered for token in ("baseline", "metric", "implementation", "hyperparameter")):
+        slots.append("appendix_baseline_metric_details")
+    if role == "dataset_statistics" or (
+        "dataset" in lowered and any(token in lowered for token in ("statistics", "summary", "sample", "feature"))
+    ):
+        slots.append("dataset_statistics")
+    if role == "dataset_description" or "datasets" in title or "dataset" in title:
+        slots.append("dataset_description")
+    if role == "split_protocol" or any(
+        token in lowered
+        for token in ("split", "seed", "sampling", "train/test", "validation", "protocol")
+    ):
+        slots.append("split_protocol")
+    if role == "baseline_setup" or "baseline" in lowered or "comparator" in lowered:
+        slots.append("baseline_setup")
+    if role == "metric_definition" or "metric" in lowered or "auuc" in lowered or "qini" in lowered:
+        slots.append("metric_definition")
+    if role == "implementation_details" or any(
+        token in lowered
+        for token in ("implementation", "hyperparameter", "tuning", "epoch", "learning rate", "hardware")
+    ):
+        slots.append("implementation_details")
+    if "ablation" in lowered:
+        slots.append("ablation")
+    if "sensitivity" in lowered or "shift" in lowered or "trade-off" in lowered or "tradeoff" in lowered:
+        slots.append("sensitivity")
+    if role == "result_interpretation" or any(
+        token in lowered for token in ("main result", "overall performance", "performance", "results and discussion")
+    ):
+        slots.append("main_results")
+    return list(dict.fromkeys(slots))
+
+
+def is_experiment_protocol_section(section: SectionRecord) -> bool:
+    if section.section_type in {"experiments", "discussion"}:
+        return True
+    if section.section_type != "appendix":
+        return False
+    title = normalized_section_title(section.title)
+    return any(
+        token in title
+        for token in (
+            "dataset",
+            "baseline",
+            "metric",
+            "experiment",
+            "experimental",
+            "setup",
+            "result",
+            "ablation",
+            "sensitivity",
+            "complexity",
+            "online",
+        )
+    )
+
+
+def slot_payload(
+    *,
+    source_paper: str,
+    slot: str,
+    section: SectionRecord,
+    evidence_excerpt: str,
+    paragraph_index: int | None = None,
+    asset_type: str | None = None,
+    asset_id: str | None = None,
+) -> dict:
+    guidance = EXPERIMENT_PROTOCOL_SLOT_GUIDANCE[slot]
+    payload = {
+        "slot": slot,
+        "source_paper": source_paper,
+        "source_heading": section.title,
+        "source_section_type": section.section_type,
+        "paragraph_index": paragraph_index,
+        "evidence_excerpt": short_excerpt(evidence_excerpt),
+        "reader_question": guidance["reader_question"],
+        "placement_guidance": guidance["placement_guidance"],
+        "linked_main_section": "experiments" if slot.startswith("appendix_") else "",
+        "reuse_guidance": "Reuse this protocol role and placement logic only; do not copy wording, claims, metrics, data, or conclusions.",
+    }
+    if asset_type and asset_id:
+        payload["asset_type"] = asset_type
+        payload["asset_id"] = asset_id
+    return payload
+
+
+def build_experiment_protocol_slots_for_payload(payload: dict) -> list[dict]:
+    slots: list[dict] = []
+    sections_by_title = {section.title: section for section in payload["sections"]}
+
+    for role in payload["roles"]:
+        section = sections_by_title.get(role["section_title"])
+        if not section:
+            continue
+        if not is_experiment_protocol_section(section):
+            continue
+        role_slots = experiment_slots_from_signal(section, role["role"], role["excerpt"])
+        if not role_slots:
+            continue
+        for slot in role_slots:
+            slots.append(
+                slot_payload(
+                    source_paper=payload["slug"],
+                    slot=slot,
+                    section=section,
+                    paragraph_index=role["paragraph_index"],
+                    evidence_excerpt=role["excerpt"],
+                )
+            )
+
+    for asset in payload["assets"]:
+        section = sections_by_title.get(asset.appears_in_title)
+        if not section:
+            continue
+        if not is_experiment_protocol_section(section):
+            continue
+        asset_slots = experiment_slots_from_signal(section, asset.evidence_role, asset.caption)
+        if asset.evidence_role == "dataset_or_protocol" and section.section_type == "appendix":
+            asset_slots = ["appendix_dataset_statistics"]
+        if not asset_slots:
+            continue
+        for slot in asset_slots:
+            slots.append(
+                slot_payload(
+                    source_paper=payload["slug"],
+                    slot=slot,
+                    section=section,
+                    evidence_excerpt=asset.caption,
+                    asset_type=asset.asset_type,
+                    asset_id=asset.asset_id,
+                )
+            )
+
+    seen: set[tuple[str, str, str, str]] = set()
+    unique_slots: list[dict] = []
+    for item in slots:
+        key = (
+            item["source_paper"],
+            item["slot"],
+            item["source_heading"].lower(),
+            item["evidence_excerpt"].lower(),
+        )
+        if key in seen:
+            continue
+        seen.add(key)
+        unique_slots.append(item)
+    return unique_slots
+
+
 def build_section_templates(output_dir: Path, paper_payloads: list[dict]) -> None:
     target = output_dir / "section-templates"
     target.mkdir(parents=True, exist_ok=True)
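The hunk above leans on `normalized_section_title` to match headings across papers. Extracted verbatim from the diff, it can be tested standalone to confirm the three prefix-stripping regexes compose as intended:

```python
import re

def normalized_section_title(title: str) -> str:
    # Strip "4.1 "-style numeric prefixes, "B.2 "-style appendix prefixes,
    # and "Appendix A: "-style labels, then collapse whitespace and lowercase.
    title = re.sub(r"^\d+(?:\.\d+)*\.?\s+", "", title.strip())
    title = re.sub(r"^[A-Z]\.\d+(?:\.\d+)*\.?\s+", "", title)
    title = re.sub(r"^Appendix\s+[A-Z](?:\.\d+)*\.?\s*[:\-]?\s*", "", title, flags=re.IGNORECASE)
    return re.sub(r"\s+", " ", title).strip().lower()

print(normalized_section_title("5.1 Experimental Setup"))      # → experimental setup
print(normalized_section_title("Appendix B: Dataset Details"))  # → dataset details
print(normalized_section_title("B.2 Baseline Details"))         # → baseline details
```

Downstream, `experiment_slots_from_signal` concatenates this normalized title with the paragraph text before token matching, so a heading alone (e.g. "B.2 Baseline Details") is enough to trigger the `baseline_setup` slot.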
@@ -774,10 +1043,30 @@ def build_section_templates(output_dir: Path, paper_payloads: list[dict]) -> Non
             "asset_roles": asset_roles,
             "reuse_rule": "Reuse structure only; do not copy wording, claims, metrics, or conclusions from reference papers.",
         }
+        if section_type == "experiments":
+            protocol_slots: list[dict] = []
+            for payload in paper_payloads:
+                protocol_slots.extend(build_experiment_protocol_slots_for_payload(payload))
+            template["experiment_protocol_slots"] = protocol_slots
         (target / f"{section_type}.json").write_text(
             json.dumps(template, indent=2, ensure_ascii=False),
             encoding="utf-8",
         )
+        if section_type == "experiments":
+            (target / "experiments-protocol.json").write_text(
+                json.dumps(
+                    {
+                        "section": "experiments",
+                        "template_id": "experiments-protocol-slots",
+                        "source_papers": template["source_papers"],
+                        "experiment_protocol_slots": template["experiment_protocol_slots"],
+                        "reuse_rule": "Reuse experiment setup topology only: dataset, split, baseline, metric, implementation, result, ablation, sensitivity, and appendix-link roles.",
+                    },
+                    indent=2,
+                    ensure_ascii=False,
+                ),
+                encoding="utf-8",
+            )
 
 
 def build_visual_templates(output_dir: Path, paper_payloads: list[dict]) -> None:
@@ -824,9 +1113,10 @@ def write_aggregate_playbook(output_dir: Path, paper_payloads: list[dict]) -> No
         "## Multi-Template Write Procedure",
         "",
         "1. Pick 2-3 closest section templates for the current paper section.",
-        "2. Build a mini-outline from common slots and current-paper evidence.",
-        "3. Add required table/figure assets with local before/after bridge functions.",
-        "4. Draft with current-paper terminology and evidence only.",
+        "2. For experiment sections, preserve protocol slots when present: datasets, splits, baselines, metrics, implementation details, main results, ablations, sensitivity analysis, and appendix links.",
+        "3. Build a mini-outline from common slots and current-paper evidence.",
+        "4. Add required table/figure assets with local before/after bridge functions.",
+        "5. Draft with current-paper terminology and evidence only.",
         "",
         "## Table/Figure Planning Rule",
         "",
@@ -91,6 +91,14 @@ def has_meaningful_field_value(value: str) -> bool:
     return normalized not in {"", "-", "n/a", "na", "none", "no", "not applicable", "null", "false"}
 
 
+def latest_write_iteration(project_root: Path) -> Path | None:
+    iteration_dir = project_root / ".lab" / "writing" / "iterations"
+    if not iteration_dir.exists():
+        return None
+    iteration_files = sorted(iteration_dir.glob("*.md"))
+    return iteration_files[-1] if iteration_files else None
+
+
 SECTION_STYLE_WARNINGS = {
     "abstract": [
        (
@@ -432,6 +440,32 @@ def check_active_paper_topology(section_path: Path, issues: list[str]):
     issues.extend(validate_topology_artifacts(project_root))
 
 
+def check_reference_template_intake(section_path: Path, issues: list[str]):
+    project_root = find_project_root(section_path)
+    if project_root is None:
+        return
+
+    reference_root = project_root / ".lab" / "writing" / "reference-patterns"
+    aggregate_playbook = reference_root / "aggregate-template-playbook.md"
+    legacy_notes_root = project_root / ".lab" / "writing" / "pdf-structure-notes"
+    has_legacy_notes = legacy_notes_root.exists() and any(legacy_notes_root.glob("*"))
+
+    latest_iteration = latest_write_iteration(project_root)
+    reference_sources = ""
+    if latest_iteration:
+        iteration_text = read_text(latest_iteration)
+        reference_sources = extract_markdown_field(
+            iteration_text,
+            "Reference Template Intake",
+            "Reference sources used:",
+        )
+
+    if (has_legacy_notes or has_meaningful_field_value(reference_sources)) and not aggregate_playbook.exists():
+        issues.append(
+            "reference papers appear to be used without .lab/writing/reference-patterns/aggregate-template-playbook.md; run extract_reference_paper_structure.py and use structured section/visual templates instead of legacy pdf-structure-notes"
+        )
+
+
 def check_abstract(text: str, issues: list[str]):
     numbers = re.findall(r"\b\d+(?:\.\d+)?\b", text)
     if len(numbers) > 6:
@@ -672,6 +706,7 @@ def main():
     check_workflow_language_targeting(section_path, blocking_issues)
     check_common_section_gate_risks(text, warning_issues)
     check_section_style_policy(text, args.section, warning_issues)
+    check_reference_template_intake(section_path, warning_issues)
     SECTION_CHECKS[args.section](text, warning_issues)
     check_neighbor_asset_files(args.section, section_path, warning_issues)
 
@@ -19,6 +19,16 @@
 - Related work:
 - Method:
 - Experiments:
+  - Experiments protocol slots:
+    - Dataset descriptions:
+    - Dataset statistics / appendix link:
+    - Split or sampling protocol:
+    - Baseline setup:
+    - Metric definitions:
+    - Implementation and tuning details:
+    - Main results:
+    - Ablations:
+    - Sensitivity or analysis:
 - Conclusion:
 
 ## Visual/Table Templates
@@ -120,6 +120,8 @@ Run these on every round:
 - For every reference table or figure, extract what reader question it answers, which section/subsection it supports, why it is placed there, what the prose before it should do, and what the prose after it should explain.
 - When drafting from reference templates, reproduce structure and logic only. Do not copy wording, claims, metrics, baselines, data, captions, or conclusions from reference papers.
 - Before drafting a section from reference templates, read `.lab/writing/reference-patterns/aggregate-template-playbook.md`, the matching file under `.lab/writing/reference-patterns/section-templates/`, and the matching visual/table template under `.lab/writing/reference-patterns/visual-templates/` when the section uses tables or figures.
+- For experiment sections, also read `.lab/writing/reference-patterns/section-templates/experiments-protocol.json` when it exists and preserve its protocol slot logic: dataset descriptions, dataset statistics, split/sampling protocol, baseline setup, metric definitions, implementation or tuning details, main results, ablations, sensitivity analysis, and appendix-to-main links.
+- Do not accept coarse labels such as “setup” or “overall performance” as a complete experiment-template extraction when the source papers contain explicit dataset, baseline, metric, implementation, or appendix-detail paragraphs.
 - Build a compact mini-outline before prose.
 - Academic readability standards are the same in `workflow_language` and `paper_language`; changing languages must not lower external-reader clarity.
 - If the current round introduces or revises key terms, abbreviations, metric names, mechanism names, or system labels, explain them at first mention by briefly stating what they are and why they matter here.
package/package.json CHANGED
@@ -1,6 +1,6 @@
 {
   "name": "superlab",
-  "version": "0.1.65",
+  "version": "0.1.66",
   "description": "Strict /lab research workflow installer for Codex and Claude",
   "keywords": [
     "codex",