superlab 0.1.70 → 0.1.72

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (56)
  1. package/lib/i18n.cjs +178 -6
  2. package/lib/install.cjs +1 -0
  3. package/lib/lab_idea_contract.json +4 -4
  4. package/lib/lab_write_contract.json +1 -1
  5. package/package-assets/claude/commands/lab/idea.md +1 -1
  6. package/package-assets/claude/commands/lab/report.md +1 -0
  7. package/package-assets/claude/commands/lab/write.md +1 -0
  8. package/package-assets/claude/commands/lab-idea.md +1 -1
  9. package/package-assets/claude/commands/lab-report.md +1 -0
  10. package/package-assets/claude/commands/lab-write.md +1 -0
  11. package/package-assets/claude/commands/lab:idea.md +1 -1
  12. package/package-assets/claude/commands/lab:report.md +1 -0
  13. package/package-assets/claude/commands/lab:write.md +1 -0
  14. package/package-assets/claude/commands/lab/357/274/232idea.md +1 -1
  15. package/package-assets/claude/commands/lab/357/274/232report.md +1 -0
  16. package/package-assets/claude/commands/lab/357/274/232write.md +1 -0
  17. package/package-assets/codex/prompts/lab/idea.md +1 -1
  18. package/package-assets/codex/prompts/lab/report.md +1 -0
  19. package/package-assets/codex/prompts/lab/write.md +1 -1
  20. package/package-assets/codex/prompts/lab-idea.md +1 -1
  21. package/package-assets/codex/prompts/lab-report.md +1 -0
  22. package/package-assets/codex/prompts/lab-write.md +1 -1
  23. package/package-assets/codex/prompts/lab:idea.md +1 -1
  24. package/package-assets/codex/prompts/lab:report.md +1 -0
  25. package/package-assets/codex/prompts/lab:write.md +1 -1
  26. package/package-assets/codex/prompts/lab/357/274/232idea.md +1 -1
  27. package/package-assets/codex/prompts/lab/357/274/232report.md +1 -0
  28. package/package-assets/codex/prompts/lab/357/274/232write.md +1 -1
  29. package/package-assets/shared/lab/.managed/scripts/validate_collaborator_report.py +55 -1
  30. package/package-assets/shared/lab/.managed/scripts/validate_idea_artifact.py +75 -0
  31. package/package-assets/shared/lab/.managed/scripts/validate_section_draft.py +119 -0
  32. package/package-assets/shared/lab/.managed/scripts/validate_stage_report.py +547 -0
  33. package/package-assets/shared/lab/.managed/templates/final-report.md +11 -0
  34. package/package-assets/shared/lab/.managed/templates/idea.md +18 -0
  35. package/package-assets/shared/lab/.managed/templates/main-tables.md +6 -0
  36. package/package-assets/shared/lab/.managed/templates/paper-plan.md +9 -0
  37. package/package-assets/shared/lab/.managed/templates/stage-report.md +71 -0
  38. package/package-assets/shared/lab/.managed/templates/write-iteration.md +13 -0
  39. package/package-assets/shared/skills/lab/SKILL.md +23 -0
  40. package/package-assets/shared/skills/lab/references/paper-writing/abstract.md +14 -0
  41. package/package-assets/shared/skills/lab/references/paper-writing/conclusion.md +13 -0
  42. package/package-assets/shared/skills/lab/references/paper-writing/experiments.md +19 -0
  43. package/package-assets/shared/skills/lab/references/paper-writing/introduction.md +17 -2
  44. package/package-assets/shared/skills/lab/references/paper-writing/method.md +10 -0
  45. package/package-assets/shared/skills/lab/references/paper-writing/section-style-policies.md +10 -1
  46. package/package-assets/shared/skills/lab/stages/auto.md +26 -0
  47. package/package-assets/shared/skills/lab/stages/data.md +9 -0
  48. package/package-assets/shared/skills/lab/stages/framing.md +9 -0
  49. package/package-assets/shared/skills/lab/stages/idea.md +33 -13
  50. package/package-assets/shared/skills/lab/stages/iterate.md +9 -0
  51. package/package-assets/shared/skills/lab/stages/report.md +17 -0
  52. package/package-assets/shared/skills/lab/stages/review.md +9 -0
  53. package/package-assets/shared/skills/lab/stages/run.md +9 -0
  54. package/package-assets/shared/skills/lab/stages/spec.md +9 -0
  55. package/package-assets/shared/skills/lab/stages/write.md +18 -0
  56. package/package.json +1 -1
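A note on entries 14–16 and 26–28 above: git quotes non-ASCII paths using octal byte escapes, and the diff viewer appears to have rendered the backslashes as `/`. The byte sequence `\357\274\232` is the UTF-8 encoding of the fullwidth colon `：` (U+FF1A), so these are the `lab：idea.md`-style variants of the `lab:idea.md` commands. A minimal decode sketch (the escape handling shown here is an illustration, not part of the package):

```python
# Decode a git-style octal-escaped path: \357\274\232 -> bytes EF BC 9A -> "：" (U+FF1A).
raw = r"lab\357\274\232idea.md"  # as git prints it, with backslashes restored

# unicode_escape parses \ooo octal escapes into Latin-1 code points;
# re-encoding as latin-1 recovers the raw bytes, which are then UTF-8.
decoded = raw.encode("ascii").decode("unicode_escape").encode("latin-1").decode("utf-8")
print(decoded)  # -> lab：idea.md
```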
@@ -0,0 +1,71 @@
+ # Stage Report
+
+ ## Rule Preflight
+
+ - Rule source file:
+ - Rule source revision:
+ - Project version:
+ - Resolved stage:
+ - Resolved mode:
+ - Resolved target:
+ - Preflight stamp:
+ - Override reason, if any:
+
+ ## Stage Identity
+
+ - Stage:
+ - Target:
+ - Date:
+ - Status:
+ - Primary artifact:
+ - Next owner:
+
+ ## Requested Outcome Mapping
+
+ - Original request:
+ - Requested deliverables:
+ - Completion mapping:
+ - Response shape:
+
+ ## Repair Control
+
+ - Repair budget:
+ - Repair attempts used:
+ - Current failure class:
+ - Repair hypothesis:
+ - Evidence-changing knobs changed:
+ - Ordinary engineering fixes allowed:
+ - Frozen core unchanged:
+ - Forbidden repairs avoided:
+ - Confirmation check:
+
+ ## Core Explanation Table
+
+ | Question | Plain Answer |
+ |---|---|
+ | What stage is this? | |
+ | What is the background? | |
+ | Why do it now? | |
+ | What exactly was done this round? | |
+ | How was it done? | |
+ | What was good about the results? | |
+ | What was bad about the results? | |
+ | What does this verify? | |
+ | What has not been verified yet? | |
+ | Is improvement needed, and why? | |
+ | How should the next step change, and why that way? | |
+ | Where is the key evidence? | |
+ | Should we now continue, stop, redo, or escalate? | |
+
+ ## Evidence And Artifacts
+
+ - Primary artifact:
+ - Supporting artifacts:
+ - Validation commands:
+ - Known gaps:
+
+ ## Next Action
+
+ - Decision: continue / stop / revise / rerun / escalate / handoff
+ - Concrete next step:
+ - Why this next step:
@@ -38,6 +38,19 @@
  - Terminology consistency:
  - Five-dimension self-review outcome:
 
+ ## Insight Integration
+
+ - Core insight anchor used:
+ - Section role in the insight chain:
+ - Common assumption or surface explanation challenged:
+ - Mechanism or why-explanation added:
+ - Evidence or diagnostic result tied to the insight:
+ - Did the prose avoid an isolated `Our Insights`-style section:
+ - If the section is Introduction, what cognitive contrast was established:
+ - If the section is Method, which design choice follows from the insight:
+ - If the section is Experiments, which mechanism did the result or ablation diagnose:
+ - If the section is Conclusion, what broader principle or action implication was stated:
+
  ## Terminology Clarity
 
  - Key terms introduced or revised this round:
@@ -42,6 +42,13 @@ Use this skill when the user invokes `/lab:*` or asks for the structured researc
  - Generate the `Rule Preflight` block from `.lab/.managed/rule-manifest.json` with the managed preflight renderer instead of handwriting it from memory.
  - Treat missing, stale, or contradictory `Rule Preflight` data as a stage-contract failure.
  - Project-installed rules take priority over model memory. If remembered patterns conflict with the installed rule source, follow the installed source recorded in `.lab/.managed/rule-manifest.json`.
+ - Before a `/lab:*` stage reaches a final handoff, write or update one plain-language stage report under `.lab/stage-reports/` from `.lab/.managed/templates/stage-report.md`.
+ - The stage report must include `Requested Outcome Mapping`: the user's original request, requested deliverables, completion status for each requested deliverable, and the response shape the user should see.
+ - The stage report must include `Repair Control`. For non-repair stages, mark the section as not applicable; for auto repair, record budget, attempts used, failure class, repair hypothesis, evidence-changing knobs, ordinary engineering fixes that remain allowed, unchanged frozen core, forbidden repairs avoided, and confirmation check.
+ - The stage report must include a filled `Core Explanation Table` that answers, in workflow language and plain language: background, why now, what was done, how it was done, what worked, what did not work, what was verified, what remains unverified, whether improvement is needed and why, how to improve and why, key evidence, and the continue/stop/revise/rerun/escalate/handoff decision.
+ - If the stage says improvement is needed, do not choose `stop` unless the next action states a concrete terminal boundary such as budget exhaustion, frozen-core risk, safety or integrity failure, impossible target, or a required approval boundary. Otherwise choose `continue`, `revise`, `rerun`, or `escalate`.
+ - Stage reports are closeout and handoff artifacts, not a new user command and not a replacement for stage-specific artifacts such as idea memos, iteration reports, final reports, or write-iteration records.
+ - Run `.lab/.managed/scripts/validate_stage_report.py --stage-report <stage-report> --stage <stage>` before claiming the stage is complete, and include the stage-report path plus validation result in the final user-facing summary.
  - Final paper output should default to LaTeX, and its manuscript language should be decided separately from the workflow language.
  - Separate sourced facts from model-generated hypotheses.
  - Preserve failed runs, failed ideas, and limitations.
@@ -69,6 +76,9 @@ Use this skill when the user invokes `/lab:*` or asks for the structured researc
  - Ask one clarifying question at a time when critical ambiguity remains.
  - State the scenario, the problem, the failure case, and why the problem matters before proposing solutions.
  - Classify the idea by contribution category and breakthrough level.
+ - Separate contribution from insight. Contribution is what the work adds; insight is what the work teaches and why it should matter beyond the artifact.
+ - Write one reusable core insight anchor sentence so later write and report stages can keep a stable story.
+ - Require an insight evidence chain before final recommendation: observation, why existing explanations fail, core insight, mechanism, validation tests, generalization or action implication, and prediction.
  - Compare against existing methods explicitly and state why the idea should be better.
  - Include a closest-prior-work comparison and a plain-language description of how the proposed direction would work.
  - In the final user-facing summary, state what current methods do, why they still fall short, how the proposed direction differs, the rough approach, the main risk, and where to read `.lab/writing/idea.md` plus `.lab/writing/idea-source-log.md`.
@@ -147,6 +157,14 @@ Use this skill when the user invokes `/lab:*` or asks for the structured researc
  - Poll long-running commands until they complete, time out, or hit a stop condition.
  - Update `.lab/context/auto-status.md`, `.lab/context/workflow-state.md`, `.lab/context/decisions.md`, `.lab/context/data-decisions.md`, and `.lab/context/evidence-index.md` as the campaign advances, then refresh the derived handoff files.
  - Keep an explicit approval gate when a proposed action would leave the frozen core defined by the auto-mode contract.
+ - A failed metric gate is not by itself a terminal stop in `L2` or `L3` when `iterate` is allowed and loop budget remains. First classify the miss as recoverable or terminal.
+ - Treat ordinary target misses, weak effects, overly strong effects, low coverage, placement or extraction mismatch, threshold mismatch, candidate-generation weakness, and no-op deltas as recoverable until a bounded repair attempt rules them out.
+ - For recoverable misses, run at least one bounded repair iteration inside the approved envelope before stopping. Generic repair knobs include intervention strength, delivery or placement, detector or scoring threshold, candidate generation, sampling, baseline alignment, extraction/parser fixes, calibration, and control checks.
+ - Ordinary engineering fixes do not count against the repair budget when they do not change evidence interpretation: path fixes, dependency fixes, parser bugs, data loading bugs, runner retries, logging, cache invalidation, and result serialization can be fixed directly inside the current envelope.
+ - Count evidence-changing repairs against the repair budget: changes to intervention strength, delivery semantics, scoring thresholds, sampling, candidate generation, baseline alignment, calibration, extraction behavior that changes observed evidence, or controls that change the evaluated set.
+ - Forbidden repair moves require explicit approval and cannot be used to claim success: changing the primary metric definition, relaxing target thresholds, deleting hard cases, changing labels or ground truth, switching the final test split, changing paper-facing claims, or changing threat model, reviewer profile, dataset scope, or frozen core.
+ - A repair pilot that passes is not enough for promotion or final success. Require a confirmation check such as a new seed, holdout, control batch, repeated run, or anomaly check before promotion.
+ - Stop without repair only when the report names the terminal boundary: exhausted budget, frozen-core change, approval-required scope change, safety or integrity risk, invalid metric, impossible target, or repeated failed repair attempts.
 
  ### `/lab:spec`
 
@@ -214,6 +232,8 @@ Use this skill when the user invokes `/lab:*` or asks for the structured researc
  - Aggregate them with `.lab/.managed/scripts/summarize_iterations.py`.
  - Write the final document with `.lab/.managed/templates/final-report.md`, the managed table summary with `.lab/.managed/templates/main-tables.md`, and the internal handoff with `.lab/.managed/templates/artifact-status.md`.
  - Keep failed attempts and limitations visible.
+ - Put the report-level insight near the top: what was learned beyond the produced artifact, what evidence supports it, what action or design implication follows, and what boundary still applies.
+ - Use main tables and ablations as diagnostic evidence for the insight rather than only containers for metric values.
  - Update `.lab/context/mission.md`, `.lab/context/eval-protocol.md`, `.lab/context/workflow-state.md`, and `.lab/context/evidence-index.md` with report-level handoff notes, then refresh derived views.
  - If canonical context is still skeletal, hydrate the smallest trustworthy version from frozen artifacts before finalizing the report.
  - If collaborator-critical fields remain missing after hydration, downgrade to an `artifact-anchored interim report` instead of presenting a final collaborator-ready report.
@@ -246,6 +266,8 @@ Use this skill when the user invokes `/lab:*` or asks for the structured researc
  - If a section uses canonical short names or variant labels before the section that formally defines them has been drafted, add a local naming bridge in that section and then keep those labels stable.
  - Keep one canonical natural-language paper-facing name per concept.
  - Once a paper-facing model or ablation label is chosen, reuse the canonical label instead of replacing it with a narrative alias in later prose, tables, or captions.
+ - Carry the same core insight anchor through the paper: Introduction creates the contrast, Method turns it into design motivation, Experiments diagnose it with evidence, and Conclusion states the broader principle and boundary.
+ - Do not create a standalone `Our Insights` section just to satisfy this; dissolve the insight into motivation, mechanism, evidence, and limitations.
  - Before drafting or polishing, check the current section block in `skills/lab/references/paper-writing/section-style-policies.md` and follow its encouraged, discouraged, and banned expression lists.
  - When the user provides reference PDFs, paper URLs, local reference-paper paths, or asks to write by reference, stay within `/lab:write` but switch to reference-guided deep writing: extract structure, map section/subsection slots, paragraph roles, and table/figure roles to the current paper, record the mapping, and only then draft prose.
  - The reference-consumption plan is not sufficient by itself. The current section must visibly realize the adopted structure slots through subsection or paragraph anchors, table/figure placement, local bridges, and reader-facing prose.
@@ -317,6 +339,7 @@ Use this skill when the user invokes `/lab:*` or asks for the structured researc
  - No auto start without an explicit autonomy level and `Approval status: approved`.
  - No final report without validated normalized results.
  - No paper-writing round without stable report artifacts, an approved framing artifact, evidence links, and LaTeX manuscript output.
+ - No stage-final handoff without a validated plain-language stage report.
  - No final-draft or export round without passing section-quality, claim-safety, and manuscript-delivery validation.
  - No final-draft or export round with mismatched `workflow_language` and `paper_language` unless the latest write iteration records the language decision audit that justified the final manuscript language and the persisted workflow-language paper-layer path.
 
@@ -57,6 +57,20 @@ Introduce the technical challenge, then use one to two sentences to present the
  4. The technical term must be easy to understand; do not create a jump in reading.
  5. This ability is very important for writing a good abstract.
 
+ ## Insight Anchor Rule
+
+ Use one mechanism-level insight sentence as the abstract hinge. The sentence should explain why the problem behaves as it does, not only name the proposed method.
+
+ Good shape:
+
+ 1. `We observe that [surface failure], suggesting that [root mechanism].`
+ 2. `This motivates [method/design], which [technical effect].`
+
+ Avoid:
+
+ 1. A standalone "Our insight is..." sentence that is disconnected from the challenge.
+ 2. A method-name sentence that could be deleted without changing the reader's understanding.
+
  ## Version 3: Multiple Contributions
 
  Version 3: When there are multiple technical contributions, describe each contribution together with its technical advantage.
@@ -12,6 +12,19 @@ Close the paper with clear takeaways and credible limitations.
  4. Add limitation paragraph.
  5. End with concrete future direction.
 
+ ## Insight Closeout
+
+ The conclusion should not introduce a new insight. It should restate the same core insight anchor as a supported takeaway and turn it into a broader principle or action implication.
+
+ Use this order:
+
+ 1. Evidence-backed takeaway.
+ 2. Broader principle implied by the takeaway.
+ 3. Boundary that prevents overclaiming.
+ 4. One future direction that follows from the boundary.
+
+ Avoid repeating the method inventory or ending with generic impact language.
+
  ## Limitation Guidance
 
  Prefer limitations tied to task goal/setting boundaries, for example:
@@ -20,6 +20,25 @@ Convince reviewers with complete evidence on effectiveness, causality, and pract
  - Add stress-test scenarios (more complex scenes, rarer cases, noisier inputs, or stricter constraints).
  - Report both gains and failure modes to show realistic boundaries.
 
+ ## Insight-Diagnostic Reading
+
+ Experiments should not only prove that a method is strong. They should diagnose whether the paper's insight is true.
+
+ For each main result, ablation, robustness check, or failure analysis, write down:
+
+ 1. Which part of the insight this experiment tests.
+ 2. What alternative explanation the experiment rules out or weakens.
+ 3. What mechanism the observed pattern supports.
+ 4. What boundary or failure mode remains.
+
+ Good interpretation shape:
+
+ 1. `This result supports the hypothesis that [mechanism], because [observed pattern].`
+ 2. `The ablation weakens the simpler explanation that [alternative], since [diagnostic contrast].`
+ 3. `The remaining failures indicate that [boundary], rather than [overclaim].`
+
+ Avoid paragraphs that only say "Table X shows Y improves by Z." The table already contains the number; prose should explain what the number teaches.
+
  ## Experiment Planning
 
  ```mermaid
@@ -56,12 +56,27 @@ graph LR
  3. What are the benefits of our contributions, why can they solve this technical challenge, and what new insight do they bring? (important)
  4. How do we use prior methods to lead readers to our solved challenge and our new insight?
 
+ ### Insight-driven introduction pass
+
+ Before drafting the final introduction, write one core insight anchor sentence and test whether every paragraph points toward it.
+
+ Use this causal arc:
+
+ 1. Conventional assumption or prior explanation.
+ 2. Observation, anomaly, or failure that this assumption does not explain.
+ 3. Root mechanism or insight.
+ 4. Method or evaluation introduced as a way to test or exploit that insight.
+ 5. Boundary of what the evidence can and cannot claim.
+
+ Avoid making insight a separate subsection. The introduction should let the reader feel the contrast before the method name appears.
+
  ### Forward story (write in this order)
 
  1. Introduce the paper's task.
  2. Use prior methods to lead to the technical challenge we solve.
- 3. Present xx contributions to solve this technical challenge.
- 4. Explain technical advantages of our contributions and explicitly express our new insight. (important)
+ 3. State the insight as the root explanation for that challenge.
+ 4. Present xx contributions to solve this technical challenge.
+ 5. Explain technical advantages of our contributions and explicitly express our new insight. (important)
 
  ## Section Skeleton
 
@@ -23,6 +23,15 @@ Recommended organization:
 
  3. Organize answers as a mind map or a table for clarity.
 
+ Add an insight-to-design row before module details:
+
+ 1. What is the paper's core insight anchor?
+ 2. Which failure mechanism does the method need to model, block, separate, or expose?
+ 3. Which design choice follows from that mechanism?
+ 4. What prediction would be false if the mechanism were wrong?
+
+ Method writing should read as "because this mechanism exists, this design is necessary", not as an inventory of modules.
+
  ## Method Writing Steps
 
  `Method writing steps: (1) draw pipeline figure sketch, (2) map subsections from the sketch, (3) plan each subsection with motivation/design/advantages, (4) write module design first, (5) then add motivation and technical advantages.`
@@ -52,6 +61,7 @@ Definition:
 
  1. Explain why this module is needed.
  2. Use problem-driven logic: because problem X exists, we design module Y.
+ 3. Tie the problem back to the core insight when possible, so the module feels derived rather than arbitrary.
 
  ### 3) Technical Advantages of This Module
 
@@ -19,6 +19,7 @@ These are paper-facing defaults. They are not project-specific branding rules.
  - Direct problem statements.
  - Explicit gap language tied to prior work.
  - One-sentence mechanism summaries.
+ - Challenge -> insight -> contribution progression.
  - Bounded result claims with concrete scope.
 
  **Discouraged expressions**
@@ -32,6 +33,7 @@ These are paper-facing defaults. They are not project-specific branding rules.
  - Unbounded superiority claims such as "universally", "always", or "in every setting".
  - Service-style or AI-assistant meta language such as "用户说", "按你的要求", "我来解释", "let me explain", or "as requested by the user".
  - Workflow-only placeholder language such as "图的意图", "资产意图", "占位符", "workflow-language", or "sync this wording".
+ - Standalone insight headings such as "Our Insights" when the insight is not woven into the abstract's challenge and contribution arc.
 
  ## Introduction
 
@@ -40,6 +42,7 @@ These are paper-facing defaults. They are not project-specific branding rules.
 
  **Encouraged expressions**
  - Problem -> gap -> challenge -> contribution progression.
+ - Common assumption -> observed anomaly -> root mechanism progression.
  - Explicit prior-work limitation statements.
  - Clear contribution bullets or equivalent prose.
  - Early explanation of task setting and scope.
@@ -53,6 +56,7 @@ These are paper-facing defaults. They are not project-specific branding rules.
  - Empty macro-importance claims such as "this problem is increasingly critical" with no concrete consequence.
  - Marketing-style first-claim language such as "revolutionary", "game-changing", or "unprecedented" without evidence.
  - Paragraphs that only praise the paper instead of stating the research gap.
+ - Standalone "Our Insights" sections; the insight should be part of the motivation and gap logic.
  - Service-style or AI-assistant meta language such as "用户说", "按你的要求", "我来解释", "let me explain", or "as requested by the user".
  - Workflow-only placeholder language such as "图的意图", "资产意图", "占位符", "workflow-language", or "sync this wording".
 
@@ -85,6 +89,7 @@ These are paper-facing defaults. They are not project-specific branding rules.
 
  **Encouraged expressions**
  - Motivation -> design -> technical effect progression.
+ - Insight -> required design consequence progression.
  - Explicit role statements for modules or steps.
  - Concrete descriptions of information flow and interaction.
  - Local naming bridges when canonical labels appear before their defining section.
@@ -97,6 +102,7 @@ These are paper-facing defaults. They are not project-specific branding rules.
  **Banned expressions / moves**
  - Marketing-style or self-promotional wording such as "elegant", "powerful", "dramatically stronger", or "significantly outperforms prior methods" when used as prose decoration rather than evidence-backed result reporting.
  - Explaining the method by saying it is "better", "stronger", or "more advanced" without saying how it works.
+ - Method subsections that read like API documentation without explaining which mechanism or insight requires the design.
  - Introducing new narrative aliases for canonical model or ablation labels after they have already been locked.
  - Service-style or AI-assistant meta language such as "用户说", "按你的要求", "我来解释", "let me explain", or "as requested by the user".
  - Workflow-only placeholder language such as "图的意图", "资产意图", "占位符", "workflow-language", or "sync this wording".
@@ -109,13 +115,15 @@ These are paper-facing defaults. They are not project-specific branding rules.
  **Encouraged expressions**
  - Direct statements of protocol, metric definition, and comparison scope.
  - Immediate result reporting with concrete numbers.
- - Short interpretation tied to the table or figure.
+ - Short diagnostic interpretation tied to the mechanism tested by the table or figure.
+ - Ablation prose that says which alternative explanation is weakened.
  - Explicit limitations or boundary statements after the result.
 
  **Discouraged expressions**
  - Long policy or deployment discussion after every table.
  - Re-explaining the same metric in every paragraph.
  - Paragraphs that only restate the table without synthesis.
+ - Result paragraphs that say only "higher/lower/better" without explaining what the pattern teaches.
 
  **Banned expressions / moves**
  - Meta-reader guidance such as "这样读者可以……", "the reader can first...", or "this table lets the reader...".
@@ -133,6 +141,7 @@ These are paper-facing defaults. They are not project-specific branding rules.
 
  **Encouraged expressions**
  - Short recap of the paper's supported findings.
+ - Broader principle implied by the supported findings.
  - Boundary or limitation statement.
  - One concrete next step or open question.
 
@@ -125,6 +125,23 @@
125
125
  - Before each rung and before each success, stop, or promotion decision, re-check the generic academic-risk questions: setting semantics, visibility/leakage, anchor or label policy, scale comparability, metric validity, comparison validity, statistical validity, claim boundary, and integrity self-check.
126
126
  - Before each success, stop, or promotion decision, also re-check the anomaly policy: whether anomaly signals fired, whether simpler explanations were ruled out, whether a cross-check was performed, and whether the current interpretation is still the narrowest supported one.
127
127
 
128
+ ## Gate Miss And Repair Loop
129
+
130
+ - A gate miss is not automatically a terminal stop for `L2` or `L3` when `iterate` is allowed and the loop budget remains.
131
+ - After any failed metric gate, classify the miss before writing a terminal outcome:
132
+ - recoverable: ordinary target miss, weak effect, overly strong effect, low coverage, placement or extraction mismatch, threshold mismatch, candidate-generation weakness, no-op delta, or noisy split
133
+ - terminal: budget exhausted, frozen-core change required, approval-required scope change, safety or integrity risk, invalid metric, impossible target, or repeated failed repair attempts
+ - For a recoverable miss, run at least one bounded repair iteration inside the approved envelope before stopping. The repair must state the hypothesis, the specific knob changed, the unchanged frozen core, and the validation command.
+ - Generic repair knobs include intervention strength, delivery channel or placement, detector/scoring threshold, candidate generation, sampling or stratification, baseline alignment, extraction/parser behavior, calibration, and control checks.
+ - Separate ordinary engineering fixes from evidence-changing repairs. Ordinary fixes such as path repair, parser bugs, dependency setup, data loading, runner retry, logging, cache invalidation, and result serialization should be fixed directly and do not spend repair budget when they do not change evidence interpretation.
+ - Evidence-changing repairs must spend repair budget and be logged: changes to intervention strength, delivery semantics, scoring thresholds, sampling, candidate generation, baseline alignment, calibration, extraction behavior that changes observed evidence, or the evaluated control set.
+ - Do not over-constrain problem solving: a repair may change multiple coupled knobs when the hypothesis requires it, but the report must name every changed knob and explain why the knobs are coupled.
+ - The following repair moves are forbidden and must not be used to claim success without explicit approval: changing the primary metric definition, relaxing target thresholds, deleting hard cases, changing labels or ground truth, switching the final test split, changing paper-facing claims, or changing the threat model, reviewer profile, dataset scope, or frozen core.
+ - A repair pilot that passes must go through a confirmation check before promotion or final success. Valid confirmation includes a new seed, holdout, control batch, repeated run, anomaly check, or other predeclared cross-check.
+ - For `L3`, prefer continuing through the repair ladder until pass, terminal boundary, or budget exhaustion. Do not pause merely because the first pilot failed.
+ - If stopping after a miss, the final outcome and stage report must name the terminal boundary. "The gate failed" alone is not a sufficient stop reason when a plausible repair remains.
+ - The user-facing final answer must start from the user's requested deliverables: list each requested table, artifact, or objective; mark it completed, failed-gate, repaired, not promoted, or blocked; then give evidence paths and the next action.
+
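The ordinary-versus-evidence-changing split above can be sketched as a small bookkeeping helper. This is a hedged illustration only: the `RepairLedger` class, the category strings, and the hyphenated knob names are hypothetical labels paraphrasing the bullets above, not part of the superlab package.

```python
# Illustrative knob taxonomy paraphrased from the repair rules; names are hypothetical.
ORDINARY_FIXES = {
    "path-repair", "parser-bug", "dependency-setup", "data-loading",
    "runner-retry", "logging", "cache-invalidation", "result-serialization",
}
EVIDENCE_CHANGING_KNOBS = {
    "intervention-strength", "delivery-semantics", "scoring-threshold",
    "sampling", "candidate-generation", "baseline-alignment",
    "calibration", "extraction-behavior", "control-set",
}

class RepairLedger:
    """Track repair budget; only evidence-changing repairs spend it."""

    def __init__(self, budget: int):
        self.budget = budget
        self.attempts: list[str] = []

    def record(self, knob: str) -> str:
        if knob in ORDINARY_FIXES:
            return "ordinary"           # fix directly, no budget spent
        if knob in EVIDENCE_CHANGING_KNOBS:
            if self.budget == 0:
                return "terminal"       # budget exhausted: a stop boundary
            self.budget -= 1
            self.attempts.append(knob)  # must also be logged in the stage report
            return "evidence-changing"
        return "needs-approval"         # unlisted knob: escalate, do not guess
```

A forbidden move such as redefining the primary metric never enters this ledger; it requires explicit approval rather than budget.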
  ## Minimum Procedure

  1. Validate the auto-mode contract
@@ -194,3 +211,12 @@
  - If the user chooses to convert, persist `paper_language_finalization_decision: convert-to-paper-language`
  - While the real experiment process is still alive, emit only a progress update and keep waiting. Do not present a terminal summary for that rung until the process exits or the rung hits an explicit stop boundary.
  - While the loop is healthy, do not ask the user to trigger the next poll. Keep polling until a meaningful change, keepalive boundary, stop boundary, escalation boundary, or terminal boundary is reached.
+
+ ## Stage Report Closeout
+
+ - At every stop, failure, escalation, or final handoff, write or update `.lab/stage-reports/<date>--auto--<target>.md` from `.lab/.managed/templates/stage-report.md`.
+ - Fill `Requested Outcome Mapping` before the core table so the final answer can be checked against the user's original request rather than only against internal stage state.
+ - Fill `Repair Control` with repair budget, attempts used, failure class, repair hypothesis, evidence-changing knobs, ordinary engineering fixes still allowed, unchanged frozen core, forbidden repairs avoided, and confirmation check.
+ - Fill the `Core Explanation Table` in plain language: background, why now, what ran, how the loop ran, what worked, what did not work, what was verified, what remains unverified, what needs improvement and why, how to improve and why, key evidence, and the continue/stop/revise/rerun/escalate/handoff decision.
+ - If the table says improvement is needed, the next action may be `stop` only when a terminal boundary is explicitly named; otherwise choose `continue`, `revise`, `rerun`, or `escalate`.
+ - Run `.lab/.managed/scripts/validate_stage_report.py --stage-report <stage-report> --stage auto` and include the report path plus validation result in the final user-facing summary.
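The closeout path convention `.lab/stage-reports/<date>--<stage>--<target>.md` can be built mechanically. A minimal sketch, assuming the `<date>` placeholder is an ISO calendar date; the function name and the example target are illustrative, not superlab API.

```python
from datetime import date

def stage_report_path(stage: str, target: str, day: date) -> str:
    """Build the closeout path `.lab/stage-reports/<date>--<stage>--<target>.md`."""
    return f".lab/stage-reports/{day.isoformat()}--{stage}--{target}.md"
```

For example, `stage_report_path("auto", "main-claim", date(2025, 1, 2))` yields `.lab/stage-reports/2025-01-02--auto--main-claim.md`, the path that would then be passed to `validate_stage_report.py --stage-report`.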
@@ -66,3 +66,12 @@
  6. Recommended approved dataset package
  7. Risks and exclusions
  8. Approval gate
+
+ ## Stage Report Closeout
+
+ - Before final handoff, write or update `.lab/stage-reports/<date>--data--<target>.md` from `.lab/.managed/templates/stage-report.md`.
+ - Fill `Requested Outcome Mapping` before the core table so the final answer can be checked against the user's original request rather than only against internal stage state.
+ - Fill `Repair Control`; if no repair loop ran, mark it as not applicable and state that ordinary drafting or evidence fixes remain allowed inside the stage contract.
+ - Fill the `Core Explanation Table` in plain language: background, why now, what changed, how the dataset package was chosen, what worked, what did not work, what was verified, what remains unverified, what needs improvement and why, how to improve and why, key evidence, and the continue/stop/revise/rerun/escalate/handoff decision.
+ - If the table says improvement is needed, the next action may be `stop` only when a terminal boundary is explicitly named; otherwise choose `continue`, `revise`, `rerun`, or `escalate`.
+ - Run `.lab/.managed/scripts/validate_stage_report.py --stage-report <stage-report> --stage data` and include the report path plus validation result in the final user-facing summary.
@@ -69,3 +69,12 @@
  5. Recommended framing pack
  6. Forbidden claims and wording
  7. Approval gate
+
+ ## Stage Report Closeout
+
+ - Before final handoff, write or update `.lab/stage-reports/<date>--framing--<target>.md` from `.lab/.managed/templates/stage-report.md`.
+ - Fill `Requested Outcome Mapping` before the core table so the final answer can be checked against the user's original request rather than only against internal stage state.
+ - Fill `Repair Control`; if no repair loop ran, mark it as not applicable and state that ordinary drafting or evidence fixes remain allowed inside the stage contract.
+ - Fill the `Core Explanation Table` in plain language: background, why now, what naming or framing changed, how it was checked, what worked, what did not work, what was verified, what remains unverified, what needs improvement and why, how to improve and why, key evidence, and the continue/stop/revise/rerun/escalate/handoff decision.
+ - If the table says improvement is needed, the next action may be `stop` only when a terminal boundary is explicitly named; otherwise choose `continue`, `revise`, `rerun`, or `escalate`.
+ - Run `.lab/.managed/scripts/validate_stage_report.py --stage-report <stage-report> --stage framing` and include the report path plus validation result in the final user-facing summary.
@@ -16,6 +16,8 @@
  - rough plain-language approach description
  - evaluation sketch with the evaluation subject, any proxy or simulator, the main outcome to observe, and the main validity risk
  - tentative contributions stated at idea level, not final paper-facing wording
+ - explicit contribution-vs-insight separation: contribution says what the work adds, insight says what the work teaches
+ - insight evidence chain: observation, why existing explanations fail, core insight, mechanism, validation tests, generalization or action implication, and prediction
  - convergence status that says what is already source-backed, what is still hypothesis-only, and whether the stage may end with a final recommendation
  - three meaningful points
  - brainstorm pass 1 with 3-4 candidate directions
@@ -67,6 +69,11 @@
  - Before ending the stage, give the user a concise decision summary that states the recommended direction, what current methods do, why they still fall short, how the proposed direction differs, the rough approach, the main risk, and where to read the full idea artifact and source log.
  - If the current evaluation plan uses a proxy, simulator, or synthetic user in place of a real subject, say that explicitly in the idea artifact and explain why it is acceptable at the idea stage.
  - Keep tentative contributions at the idea level. Do not drift into final paper-facing naming, title, or contribution wording; that belongs to `/lab:framing`.
+ - Treat contribution and insight as different outputs. Contribution answers what the work adds; insight answers what the community or decision-maker learns and why the idea should generalize beyond the artifact.
+ - Do not let a method name, module list, or metric gain substitute for insight. If the method name were removed, the idea should still have a clear observation, explanation, mechanism, and prediction.
+ - Write a single core insight anchor sentence that downstream `/lab:write` and `/lab:report` can reuse. It should be a mechanism-level statement, not a method-name sentence.
+ - Present insight as a structured evidence chain: observation -> why existing explanations fail -> core insight -> mechanism -> validation tests -> generalization or action implication -> prediction.
+ - For academic ideas, the insight should explain external validity and community value. For technical or business reports, the insight should lead to an action, decision rule, or system change.
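The seven-link evidence chain above can be represented as a small record so a draft can be checked for missing links before the stage ends. A hedged sketch: the `InsightChain` field names paraphrase the chain and are not superlab API.

```python
from dataclasses import dataclass, fields

@dataclass
class InsightChain:
    """One link per stage of the insight evidence chain."""
    observation: str
    failed_explanations: str   # why existing explanations fail
    core_insight: str          # the reusable mechanism-level anchor sentence
    mechanism: str
    validation_tests: str      # at least one test that could falsify the insight
    generalization: str        # or the action implication, for non-academic work
    prediction: str            # expected if the insight is right

def missing_links(chain: InsightChain) -> list[str]:
    """Return the names of any empty links, in chain order."""
    return [f.name for f in fields(chain) if not getattr(chain, f.name).strip()]
```

An empty `mechanism` or `prediction` field is exactly the failure mode the bullets warn about: a method name standing in for an insight.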
  - End the stage output with a user-guidance block that tells the user what to decide next, what information would most improve the idea, and which `/lab` stage should follow.

  ## Context Read Set
@@ -105,19 +112,21 @@
  14. Rough approach in plain language
  15. Problem solved in plain language
  16. Why the proposed idea is better
- 17. Evaluation sketch
- 18. Tentative contributions
- 19. Three meaningful points
- 20. Candidate approaches and recommendation
- 21. Dataset, baseline, and metric candidates
- 22. Falsifiable hypothesis
- 23. Convergence status
- 24. Expert critique
- 25. Revised proposal or final recommendation
- 26. User guidance
- 27. Approval gate
- 28. Minimum viable experiment
- 29. Idea source log aligned with the two literature sweeps
+ 17. Contribution vs insight
+ 18. Insight evidence chain
+ 19. Evaluation sketch
+ 20. Tentative contributions
+ 21. Three meaningful points
+ 22. Candidate approaches and recommendation
+ 23. Dataset, baseline, and metric candidates
+ 24. Falsifiable hypothesis
+ 25. Convergence status
+ 26. Expert critique
+ 27. Revised proposal or final recommendation
+ 28. User guidance
+ 29. Approval gate
+ 30. Minimum viable experiment
+ 31. Idea source log aligned with the two literature sweeps

  ## Writing Standard

@@ -141,6 +150,8 @@
  - Do not update `.lab/context/mission.md`, `.lab/context/decisions.md`, or `.lab/context/open-questions.md` from rewrite-only mode.
  - Explain what current methods do, why they fall short, and roughly how the proposed idea would work in plain language.
  - Explain what problem the idea actually solves before describing tentative contributions.
+ - Before listing tentative contributions, write the insight evidence chain in plain language. It should make the reader see the anomaly or failure first, then accept the proposed explanation.
+ - A valid insight should be testable. Include at least one prediction that would be expected if the insight is right and one validation test that could falsify it.
  - Keep the evaluation sketch high-level: who or what is evaluated, what proxy or simulator is used if any, what outcome matters, and what the main validity risk is. Leave full protocol design to later stages.
  - Use the idea stage to say roughly how the idea would be validated and what the minimum viable experiment looks like, but do not freeze sample size, recruitment plan, condition count, questionnaire design, or randomization protocol here.
  - Human-subject experiment design belongs to `/lab:spec`, where recruitment, assignment, measurement, and ethics details can be made explicit.
@@ -150,3 +161,12 @@
  - The final output must be short but decision-capable. Do not hide the key recommendation logic only inside `.lab/writing/idea.md`; summarize the recommended direction, current-method contrast, difference, rough approach, and main risk in the user-facing reply, then point to `.lab/writing/idea.md` and `.lab/writing/idea-source-log.md` for the full detail.
  - Before approval, run `.lab/.managed/scripts/validate_idea_artifact.py --idea <idea-artifact> --source-log .lab/writing/idea-source-log.md --workflow-config .lab/config/workflow.json`.
  - Do not leave `.lab/context/mission.md` as an empty template after convergence; write the approved problem, why it matters, the current benchmark scope, and the approved direction back into canonical context.
+
+ ## Stage Report Closeout
+
+ - Before final handoff, write or update `.lab/stage-reports/<date>--idea--<target>.md` from `.lab/.managed/templates/stage-report.md`.
+ - Fill `Requested Outcome Mapping` before the core table so the final answer can be checked against the user's original request rather than only against internal stage state.
+ - Fill `Repair Control`; if no repair loop ran, mark it as not applicable and state that ordinary drafting or evidence fixes remain allowed inside the stage contract.
+ - Fill the `Core Explanation Table` in plain language: background, why now, what idea work ran, how evidence was checked, what worked, what did not work, what was verified, what remains unverified, what needs improvement and why, how to improve and why, key evidence, and the continue/stop/revise/rerun/escalate/handoff decision.
+ - If the table says improvement is needed, the next action may be `stop` only when a terminal boundary is explicitly named; otherwise choose `continue`, `revise`, `rerun`, or `escalate`.
+ - Run `.lab/.managed/scripts/validate_stage_report.py --stage-report <stage-report> --stage idea` and include the report path plus validation result in the final user-facing summary.
@@ -78,3 +78,12 @@ If the loop stops without success, record:
  - If the next move depends on an unresolved assumption, ask one clarifying question at a time.
  - If more than one next hypothesis is credible, present 2-3 approaches with trade-offs and recommend the next bounded experiment before changing the mission state.
  - Keep an approval gate when a proposed change would alter the frozen mission instead of only changing the implementation hypothesis.
+
+ ## Stage Report Closeout
+
+ - Before final handoff, write or update `.lab/stage-reports/<date>--iterate--<target>.md` from `.lab/.managed/templates/stage-report.md`.
+ - Fill `Requested Outcome Mapping` before the core table so the final answer can be checked against the user's original request rather than only against internal stage state.
+ - Fill `Repair Control`; if no repair loop ran, mark it as not applicable and state that ordinary drafting or evidence fixes remain allowed inside the stage contract.
+ - Fill the `Core Explanation Table` in plain language: background, why now, what rounds ran, how the loop evaluated them, what worked, what did not work, what was verified, what remains unverified, what needs improvement and why, how to improve and why, key evidence, and the continue/stop/revise/rerun/escalate/handoff decision.
+ - If the table says improvement is needed, the next action may be `stop` only when a terminal boundary is explicitly named; otherwise choose `continue`, `revise`, `rerun`, or `escalate`.
+ - Run `.lab/.managed/scripts/validate_stage_report.py --stage-report <stage-report> --stage iterate` and include the report path plus validation result in the final user-facing summary.
@@ -7,6 +7,8 @@
  - problem and background in plain language
  - dataset scene notes in plain language
  - contribution summary
+ - core insight summary that explains what was learned beyond the produced artifact
+ - decision or action implication derived from that insight
  - method overview
  - selected metrics summary
  - plain-language metric guide
@@ -60,6 +62,10 @@
  - Treat `report.md` as an external-review-ready memo. Source sections must not rely on local file paths or internal provenance notes; they must give a few human-readable anchor references instead.
  - Pull the approved method name and contribution bullets out of `.lab/context/terminology-lock.md` when that framing context exists; do not silently drop them from the collaborator-facing report.
  - Explain the method overview in collaborator language: what the method roughly does, what changed relative to the closest prior work or strongest baseline, what those prior methods do, and why they remain insufficient for the approved claim.
+ - Explain the report-level insight as a mechanism, not as a slogan: observed phenomenon, why the simplest or prior explanation is insufficient, what mechanism best explains the result, what evidence supports it, what action or design implication follows, and what boundary still applies.
+ - Keep insight interpretation in the reader summary, method overview, main result interpretation, ablations, and limitations. Do not hide it in a standalone inspirational paragraph.
+ - When results are negative, mixed, or too strong, still write the insight honestly: what the result teaches about the mechanism, target, setup, metric, or boundary, and what action follows.
+ - For technical or business reports, state the decision implication as an actionable rule, next experiment, system change, or stop boundary. Do not leave the insight as a theoretical phrase only.
  - When citing prior work or baselines in the method overview, include only the few anchor references a collaborator needs, and summarize their role and limitation in one short line each.
  - Report only the few references a collaborator needs to orient themselves quickly; do not turn `report.md` into a full bibliography dump.
  - In `Background Sources`, `Method and Baseline Sources`, and `Metric Sources`, every anchor must include a citation line, one short line about what it established or measures, and one limitation or caveat.
@@ -87,6 +93,17 @@
  - Proactively deliver a user-readable plain-language summary when the stage is reached from `/lab:auto`; do not wait for a separate follow-up request asking what the metrics or tables mean.
  - Treat `report.md` as a user-facing artifact rather than an internal dump. Prefer plain-language explanations before jargon, and explain each metric the first time it matters.
  - Treat contribution bullets as collaborator-facing claim summaries, not as internal TODOs; tie each one to the current evidence boundary.
+ - Put the bottom-line insight near the top of the report: one-sentence conclusion, core insight, evidence that supports it, action implication, and biggest risk.
+ - Use the main tables as diagnostic evidence for the insight, not just result containers. For each main table, state what mechanism or diagnostic question it addresses and what it does not prove.
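The per-table discipline above can be enforced with a small completeness check before the report ships. A hedged sketch, assuming each main table is described by a dict with hypothetical `diagnostic_question` and `not_proven` keys; superlab defines no such structure.

```python
def tables_missing_diagnostics(tables: list[dict]) -> list[str]:
    """Return the ids of main tables that do not declare both a diagnostic
    question and a statement of what the table does not prove."""
    return [
        t["id"] for t in tables
        if not t.get("diagnostic_question") or not t.get("not_proven")
    ]
```

Any id this returns marks a table still acting as a bare result container rather than diagnostic evidence for the insight.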
  - If a missing assumption would change report interpretation, ask one clarifying question at a time.
  - If there are multiple defensible report framings, present 2-3 approaches with trade-offs and recommend the most evidence-faithful framing before writing.
  - Keep an approval gate when the reporting frame would materially affect what the paper later claims.
+
+ ## Stage Report Closeout
+
+ - Before final handoff, write or update `.lab/stage-reports/<date>--report--<target>.md` from `.lab/.managed/templates/stage-report.md`.
+ - Fill `Requested Outcome Mapping` before the core table so the final answer can be checked against the user's original request rather than only against internal stage state.
+ - Fill `Repair Control`; if no repair loop ran, mark it as not applicable and state that ordinary drafting or evidence fixes remain allowed inside the stage contract.
+ - Fill the `Core Explanation Table` in plain language: background, why now, what report artifacts were produced, how evidence was carried forward, what worked, what did not work, what was verified, what remains unverified, what needs improvement and why, how to improve and why, key evidence, and the continue/stop/revise/rerun/escalate/handoff decision.
+ - If the table says improvement is needed, the next action may be `stop` only when a terminal boundary is explicitly named; otherwise choose `continue`, `revise`, `rerun`, or `escalate`.
+ - Run `.lab/.managed/scripts/validate_stage_report.py --stage-report <stage-report> --stage report` and include the report path plus validation result in the final user-facing summary.
@@ -58,3 +58,12 @@
  - If there are multiple legitimate review framings, present 2-3 approaches with trade-offs and recommend the strictest useful framing.
  - Do not use brainstorming to soften critique; once scope is clear, stay in reviewer mode and deliver findings directly.
  - Call out the strongest remaining alternative explanation and the strongest boundary risk when either one could materially narrow the claim.
+
+ ## Stage Report Closeout
+
+ - Before final handoff, write or update `.lab/stage-reports/<date>--review--<target>.md` from `.lab/.managed/templates/stage-report.md`.
+ - Fill `Requested Outcome Mapping` before the core table so the final answer can be checked against the user's original request rather than only against internal stage state.
+ - Fill `Repair Control`; if no repair loop ran, mark it as not applicable and state that ordinary drafting or evidence fixes remain allowed inside the stage contract.
+ - Fill the `Core Explanation Table` in plain language: background, why now, what was reviewed, how the review was performed, what worked, what did not work, what was verified, what remains unverified, what needs improvement and why, how to improve and why, key evidence, and the continue/stop/revise/rerun/escalate/handoff decision.
+ - If the table says improvement is needed, the next action may be `stop` only when a terminal boundary is explicitly named; otherwise choose `continue`, `revise`, `rerun`, or `escalate`.
+ - Run `.lab/.managed/scripts/validate_stage_report.py --stage-report <stage-report> --stage review` and include the report path plus validation result in the final user-facing summary.
@@ -55,3 +55,12 @@
  - If the next run depends on an unresolved assumption, ask one clarifying question at a time.
  - If there are multiple defensible tiny-run options, present 2-3 approaches with trade-offs and recommend the cheapest informative run.
  - Only ask for approval when choosing a run path would materially spend more time or compute than the default smallest experiment.
+
+ ## Stage Report Closeout
+
+ - Before final handoff, write or update `.lab/stage-reports/<date>--run--<target>.md` from `.lab/.managed/templates/stage-report.md`.
+ - Fill `Requested Outcome Mapping` before the core table so the final answer can be checked against the user's original request rather than only against internal stage state.
+ - Fill `Repair Control`; if no repair loop ran, mark it as not applicable and state that ordinary drafting or evidence fixes remain allowed inside the stage contract.
+ - Fill the `Core Explanation Table` in plain language: background, why now, what ran, how it ran, what worked, what did not work, what was verified, what remains unverified, what needs improvement and why, how to improve and why, key evidence, and the continue/stop/revise/rerun/escalate/handoff decision.
+ - If the table says improvement is needed, the next action may be `stop` only when a terminal boundary is explicitly named; otherwise choose `continue`, `revise`, `rerun`, or `escalate`.
+ - Run `.lab/.managed/scripts/validate_stage_report.py --stage-report <stage-report> --stage run` and include the report path plus validation result in the final user-facing summary.
@@ -72,3 +72,12 @@
  - evaluation normalization
  - bounded iteration
  - final report
+
+ ## Stage Report Closeout
+
+ - Before final handoff, write or update `.lab/stage-reports/<date>--spec--<target>.md` from `.lab/.managed/templates/stage-report.md`.
+ - Fill `Requested Outcome Mapping` before the core table so the final answer can be checked against the user's original request rather than only against internal stage state.
+ - Fill `Repair Control`; if no repair loop ran, mark it as not applicable and state that ordinary drafting or evidence fixes remain allowed inside the stage contract.
+ - Fill the `Core Explanation Table` in plain language: background, why now, what change artifacts were created, how the spec was structured, what worked, what did not work, what was verified, what remains unverified, what needs improvement and why, how to improve and why, key evidence, and the continue/stop/revise/rerun/escalate/handoff decision.
+ - If the table says improvement is needed, the next action may be `stop` only when a terminal boundary is explicitly named; otherwise choose `continue`, `revise`, `rerun`, or `escalate`.
+ - Run `.lab/.managed/scripts/validate_stage_report.py --stage-report <stage-report> --stage spec` and include the report path plus validation result in the final user-facing summary.