superlab 0.1.71 → 0.1.73

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (55)
  1. package/lib/i18n.cjs +117 -6
  2. package/lib/lab_idea_contract.json +4 -4
  3. package/lib/lab_write_contract.json +1 -1
  4. package/package-assets/claude/commands/lab/idea.md +1 -1
  5. package/package-assets/claude/commands/lab/report.md +1 -0
  6. package/package-assets/claude/commands/lab/write.md +1 -0
  7. package/package-assets/claude/commands/lab-idea.md +1 -1
  8. package/package-assets/claude/commands/lab-report.md +1 -0
  9. package/package-assets/claude/commands/lab-write.md +1 -0
  10. package/package-assets/claude/commands/lab:idea.md +1 -1
  11. package/package-assets/claude/commands/lab:report.md +1 -0
  12. package/package-assets/claude/commands/lab:write.md +1 -0
  13. package/package-assets/claude/commands/lab：idea.md +1 -1
  14. package/package-assets/claude/commands/lab：report.md +1 -0
  15. package/package-assets/claude/commands/lab：write.md +1 -0
  16. package/package-assets/codex/prompts/lab/idea.md +1 -1
  17. package/package-assets/codex/prompts/lab/report.md +1 -0
  18. package/package-assets/codex/prompts/lab/write.md +1 -1
  19. package/package-assets/codex/prompts/lab-idea.md +1 -1
  20. package/package-assets/codex/prompts/lab-report.md +1 -0
  21. package/package-assets/codex/prompts/lab-write.md +1 -1
  22. package/package-assets/codex/prompts/lab:idea.md +1 -1
  23. package/package-assets/codex/prompts/lab:report.md +1 -0
  24. package/package-assets/codex/prompts/lab:write.md +1 -1
  25. package/package-assets/codex/prompts/lab：idea.md +1 -1
  26. package/package-assets/codex/prompts/lab：report.md +1 -0
  27. package/package-assets/codex/prompts/lab：write.md +1 -1
  28. package/package-assets/shared/lab/.managed/scripts/validate_collaborator_report.py +55 -1
  29. package/package-assets/shared/lab/.managed/scripts/validate_idea_artifact.py +75 -0
  30. package/package-assets/shared/lab/.managed/scripts/validate_section_draft.py +119 -0
  31. package/package-assets/shared/lab/.managed/scripts/validate_stage_report.py +246 -0
  32. package/package-assets/shared/lab/.managed/templates/final-report.md +11 -0
  33. package/package-assets/shared/lab/.managed/templates/idea.md +18 -0
  34. package/package-assets/shared/lab/.managed/templates/main-tables.md +6 -0
  35. package/package-assets/shared/lab/.managed/templates/paper-plan.md +9 -0
  36. package/package-assets/shared/lab/.managed/templates/stage-report.md +29 -0
  37. package/package-assets/shared/lab/.managed/templates/write-iteration.md +24 -0
  38. package/package-assets/shared/skills/lab/SKILL.md +18 -0
  39. package/package-assets/shared/skills/lab/references/paper-writing/abstract.md +14 -0
  40. package/package-assets/shared/skills/lab/references/paper-writing/conclusion.md +13 -0
  41. package/package-assets/shared/skills/lab/references/paper-writing/experiments.md +19 -0
  42. package/package-assets/shared/skills/lab/references/paper-writing/introduction.md +17 -2
  43. package/package-assets/shared/skills/lab/references/paper-writing/method.md +10 -0
  44. package/package-assets/shared/skills/lab/references/paper-writing/section-style-policies.md +10 -1
  45. package/package-assets/shared/skills/lab/stages/auto.md +20 -0
  46. package/package-assets/shared/skills/lab/stages/data.md +3 -0
  47. package/package-assets/shared/skills/lab/stages/framing.md +3 -0
  48. package/package-assets/shared/skills/lab/stages/idea.md +33 -19
  49. package/package-assets/shared/skills/lab/stages/iterate.md +3 -0
  50. package/package-assets/shared/skills/lab/stages/report.md +17 -0
  51. package/package-assets/shared/skills/lab/stages/review.md +3 -0
  52. package/package-assets/shared/skills/lab/stages/run.md +3 -0
  53. package/package-assets/shared/skills/lab/stages/spec.md +3 -0
  54. package/package-assets/shared/skills/lab/stages/write.md +19 -0
  55. package/package.json +1 -1
@@ -43,7 +43,10 @@ Use this skill when the user invokes `/lab:*` or asks for the structured researc
 - Treat missing, stale, or contradictory `Rule Preflight` data as a stage-contract failure.
 - Project-installed rules take priority over model memory. If remembered patterns conflict with the installed rule source, follow the installed source recorded in `.lab/.managed/rule-manifest.json`.
 - Before a `/lab:*` stage reaches a final handoff, write or update one plain-language stage report under `.lab/stage-reports/` from `.lab/.managed/templates/stage-report.md`.
+- The stage report must include `Requested Outcome Mapping`: the user's original request, requested deliverables, completion status for each requested deliverable, and the response shape the user should see.
+- The stage report must include `Repair Control`. For non-repair stages, mark the section as not applicable; for auto repair, record budget, attempts used, failure class, repair hypothesis, evidence-changing knobs, ordinary engineering fixes that remain allowed, unchanged frozen core, forbidden repairs avoided, and confirmation check.
 - The stage report must include a filled `Core Explanation Table` that answers, in workflow language and plain language: background, why now, what was done, how it was done, what worked, what did not work, what was verified, what remains unverified, whether improvement is needed and why, how to improve and why, key evidence, and the continue/stop/revise/rerun/escalate/handoff decision.
+- If the stage says improvement is needed, do not choose `stop` unless the next action states a concrete terminal boundary such as budget exhaustion, frozen-core risk, safety or integrity failure, impossible target, or a required approval boundary. Otherwise choose `continue`, `revise`, `rerun`, or `escalate`.
 - Stage reports are closeout and handoff artifacts, not a new user command and not a replacement for stage-specific artifacts such as idea memos, iteration reports, final reports, or write-iteration records.
 - Run `.lab/.managed/scripts/validate_stage_report.py --stage-report <stage-report> --stage <stage>` before claiming the stage is complete, and include the stage-report path plus validation result in the final user-facing summary.
 - Final paper output should default to LaTeX, and its manuscript language should be decided separately from the workflow language.
@@ -73,6 +76,9 @@ Use this skill when the user invokes `/lab:*` or asks for the structured researc
 - Ask one clarifying question at a time when critical ambiguity remains.
 - State the scenario, the problem, the failure case, and why the problem matters before proposing solutions.
 - Classify the idea by contribution category and breakthrough level.
+- Separate contribution from insight. Contribution is what the work adds; insight is what the work teaches and why it should matter beyond the artifact.
+- Write one reusable core insight anchor sentence so later write and report stages can keep a stable story.
+- Require an insight evidence chain before final recommendation: observation, why existing explanations fail, core insight, mechanism, validation tests, generalization or action implication, and prediction.
 - Compare against existing methods explicitly and state why the idea should be better.
 - Include a closest-prior-work comparison and a plain-language description of how the proposed direction would work.
 - In the final user-facing summary, state what current methods do, why they still fall short, how the proposed direction differs, the rough approach, the main risk, and where to read `.lab/writing/idea.md` plus `.lab/writing/idea-source-log.md`.
@@ -151,6 +157,14 @@ Use this skill when the user invokes `/lab:*` or asks for the structured researc
 - Poll long-running commands until they complete, time out, or hit a stop condition.
 - Update `.lab/context/auto-status.md`, `.lab/context/workflow-state.md`, `.lab/context/decisions.md`, `.lab/context/data-decisions.md`, and `.lab/context/evidence-index.md` as the campaign advances, then refresh the derived handoff files.
 - Keep an explicit approval gate when a proposed action would leave the frozen core defined by the auto-mode contract.
+- A failed metric gate is not by itself a terminal stop in `L2` or `L3` when `iterate` is allowed and loop budget remains. First classify the miss as recoverable or terminal.
+- Treat ordinary target misses, weak effects, overly strong effects, low coverage, placement or extraction mismatch, threshold mismatch, candidate-generation weakness, and no-op deltas as recoverable until a bounded repair attempt rules them out.
+- For recoverable misses, run at least one bounded repair iteration inside the approved envelope before stopping. Generic repair knobs include intervention strength, delivery or placement, detector or scoring threshold, candidate generation, sampling, baseline alignment, extraction/parser fixes, calibration, and control checks.
+- Ordinary engineering fixes do not count against the repair budget when they do not change evidence interpretation: path fixes, dependency fixes, parser bugs, data loading bugs, runner retries, logging, cache invalidation, and result serialization can be fixed directly inside the current envelope.
+- Count evidence-changing repairs against the repair budget: changes to intervention strength, delivery semantics, scoring thresholds, sampling, candidate generation, baseline alignment, calibration, extraction behavior that changes observed evidence, or controls that change the evaluated set.
+- Forbidden repair moves require explicit approval and cannot be used to claim success: changing the primary metric definition, relaxing target thresholds, deleting hard cases, changing labels or ground truth, switching the final test split, changing paper-facing claims, or changing threat model, reviewer profile, dataset scope, or frozen core.
+- A repair pilot that passes is not enough for promotion or final success. Require a confirmation check such as a new seed, holdout, control batch, repeated run, or anomaly check before promotion.
+- Stop without repair only when the report names the terminal boundary: exhausted budget, frozen-core change, approval-required scope change, safety or integrity risk, invalid metric, impossible target, or repeated failed repair attempts.
 
 ### `/lab:spec`
 
@@ -218,6 +232,8 @@ Use this skill when the user invokes `/lab:*` or asks for the structured researc
 - Aggregate them with `.lab/.managed/scripts/summarize_iterations.py`.
 - Write the final document with `.lab/.managed/templates/final-report.md`, the managed table summary with `.lab/.managed/templates/main-tables.md`, and the internal handoff with `.lab/.managed/templates/artifact-status.md`.
 - Keep failed attempts and limitations visible.
+- Put the report-level insight near the top: what was learned beyond the produced artifact, what evidence supports it, what action or design implication follows, and what boundary still applies.
+- Use main tables and ablations as diagnostic evidence for the insight rather than only containers for metric values.
 - Update `.lab/context/mission.md`, `.lab/context/eval-protocol.md`, `.lab/context/workflow-state.md`, and `.lab/context/evidence-index.md` with report-level handoff notes, then refresh derived views.
 - If canonical context is still skeletal, hydrate the smallest trustworthy version from frozen artifacts before finalizing the report.
 - If collaborator-critical fields remain missing after hydration, downgrade to an `artifact-anchored interim report` instead of presenting a final collaborator-ready report.
@@ -250,6 +266,8 @@ Use this skill when the user invokes `/lab:*` or asks for the structured researc
 - If a section uses canonical short names or variant labels before the section that formally defines them has been drafted, add a local naming bridge in that section and then keep those labels stable.
 - Keep one canonical natural-language paper-facing name per concept.
 - Once a paper-facing model or ablation label is chosen, reuse the canonical label instead of replacing it with a narrative alias in later prose, tables, or captions.
+- Carry the same core insight anchor through the paper: Introduction creates the contrast, Method turns it into design motivation, Experiments diagnose it with evidence, and Conclusion states the broader principle and boundary.
+- Do not create a standalone `Our Insights` section just to satisfy this; dissolve the insight into motivation, mechanism, evidence, and limitations.
 - Before drafting or polishing, check the current section block in `skills/lab/references/paper-writing/section-style-policies.md` and follow its encouraged, discouraged, and banned expression lists.
 - When the user provides reference PDFs, paper URLs, local reference-paper paths, or asks to write by reference, stay within `/lab:write` but switch to reference-guided deep writing: extract structure, map section/subsection slots, paragraph roles, and table/figure roles to the current paper, record the mapping, and only then draft prose.
 - The reference-consumption plan is not sufficient by itself. The current section must visibly realize the adopted structure slots through subsection or paragraph anchors, table/figure placement, local bridges, and reader-facing prose.
@@ -57,6 +57,20 @@ Introduce the technical challenge, then use one to two sentences to present the
 4. The technical term must be easy to understand; do not create a jump in reading.
 5. This ability is very important for writing a good abstract.
 
+## Insight Anchor Rule
+
+Use one mechanism-level insight sentence as the abstract hinge. The sentence should explain why the problem behaves as it does, not only name the proposed method.
+
+Good shape:
+
+1. `We observe that [surface failure], suggesting that [root mechanism].`
+2. `This motivates [method/design], which [technical effect].`
+
+Avoid:
+
+1. A standalone "Our insight is..." sentence that is disconnected from the challenge.
+2. A method-name sentence that could be deleted without changing the reader's understanding.
+
 ## Version 3: Multiple Contributions
 
 Version 3: When there are multiple technical contributions, describe each contribution together with its technical advantage.
@@ -12,6 +12,19 @@ Close the paper with clear takeaways and credible limitations.
 4. Add limitation paragraph.
 5. End with concrete future direction.
 
+## Insight Closeout
+
+The conclusion should not introduce a new insight. It should restate the same core insight anchor as a supported takeaway and turn it into a broader principle or action implication.
+
+Use this order:
+
+1. Evidence-backed takeaway.
+2. Broader principle implied by the takeaway.
+3. Boundary that prevents overclaiming.
+4. One future direction that follows from the boundary.
+
+Avoid repeating the method inventory or ending with generic impact language.
+
 ## Limitation Guidance
 
 Prefer limitations tied to task goal/setting boundaries, for example:
@@ -20,6 +20,25 @@ Convince reviewers with complete evidence on effectiveness, causality, and pract
 - Add stress-test scenarios (more complex scenes, rarer cases, noisier inputs, or stricter constraints).
 - Report both gains and failure modes to show realistic boundaries.
 
+## Insight-Diagnostic Reading
+
+Experiments should not only prove that a method is strong. They should diagnose whether the paper's insight is true.
+
+For each main result, ablation, robustness check, or failure analysis, write down:
+
+1. Which part of the insight this experiment tests.
+2. What alternative explanation the experiment rules out or weakens.
+3. What mechanism the observed pattern supports.
+4. What boundary or failure mode remains.
+
+Good interpretation shape:
+
+1. `This result supports the hypothesis that [mechanism], because [observed pattern].`
+2. `The ablation weakens the simpler explanation that [alternative], since [diagnostic contrast].`
+3. `The remaining failures indicate that [boundary], rather than [overclaim].`
+
+Avoid paragraphs that only say "Table X shows Y improves by Z." The table already contains the number; prose should explain what the number teaches.
+
 ## Experiment Planning
 
 ```mermaid
@@ -56,12 +56,27 @@ graph LR
 3. What are the benefits of our contributions, why can they solve this technical challenge, and what new insight do they bring? (important)
 4. How do we use prior methods to lead readers to our solved challenge and our new insight?
 
+### Insight-driven introduction pass
+
+Before drafting the final introduction, write one core insight anchor sentence and test whether every paragraph points toward it.
+
+Use this causal arc:
+
+1. Conventional assumption or prior explanation.
+2. Observation, anomaly, or failure that this assumption does not explain.
+3. Root mechanism or insight.
+4. Method or evaluation introduced as a way to test or exploit that insight.
+5. Boundary of what the evidence can and cannot claim.
+
+Avoid making insight a separate subsection. The introduction should let the reader feel the contrast before the method name appears.
+
 ### Forward story (write in this order)
 
 1. Introduce the paper's task.
 2. Use prior methods to lead to the technical challenge we solve.
-3. Present xx contributions to solve this technical challenge.
-4. Explain technical advantages of our contributions and explicitly express our new insight. (important)
+3. State the insight as the root explanation for that challenge.
+4. Present xx contributions to solve this technical challenge.
+5. Explain technical advantages of our contributions and explicitly express our new insight. (important)
 
 ## Section Skeleton
 
@@ -23,6 +23,15 @@ Recommended organization:
 
 3. Organize answers as a mind map or a table for clarity.
 
+Add an insight-to-design row before module details:
+
+1. What is the paper's core insight anchor?
+2. Which failure mechanism does the method need to model, block, separate, or expose?
+3. Which design choice follows from that mechanism?
+4. What prediction would be false if the mechanism were wrong?
+
+Method writing should read as "because this mechanism exists, this design is necessary", not as an inventory of modules.
+
 ## Method Writing Steps
 
 `Method writing steps: (1) draw pipeline figure sketch, (2) map subsections from the sketch, (3) plan each subsection with motivation/design/advantages, (4) write module design first, (5) then add motivation and technical advantages.`
@@ -52,6 +61,7 @@ Definition:
 
 1. Explain why this module is needed.
 2. Use problem-driven logic: because problem X exists, we design module Y.
+3. Tie the problem back to the core insight when possible, so the module feels derived rather than arbitrary.
 
 ### 3) Technical Advantages of This Module
 
@@ -19,6 +19,7 @@ These are paper-facing defaults. They are not project-specific branding rules.
 - Direct problem statements.
 - Explicit gap language tied to prior work.
 - One-sentence mechanism summaries.
+- Challenge -> insight -> contribution progression.
 - Bounded result claims with concrete scope.
 
 **Discouraged expressions**
@@ -32,6 +33,7 @@ These are paper-facing defaults. They are not project-specific branding rules.
 - Unbounded superiority claims such as "universally", "always", or "in every setting".
 - Service-style or AI-assistant meta language such as "用户说", "按你的要求", "我来解释", "let me explain", or "as requested by the user".
 - Workflow-only placeholder language such as "图的意图", "资产意图", "占位符", "workflow-language", or "sync this wording".
+- Standalone insight headings such as "Our Insights" when the insight is not woven into the abstract's challenge and contribution arc.
 
 ## Introduction
 
@@ -40,6 +42,7 @@ These are paper-facing defaults. They are not project-specific branding rules.
 
 **Encouraged expressions**
 - Problem -> gap -> challenge -> contribution progression.
+- Common assumption -> observed anomaly -> root mechanism progression.
 - Explicit prior-work limitation statements.
 - Clear contribution bullets or equivalent prose.
 - Early explanation of task setting and scope.
@@ -53,6 +56,7 @@ These are paper-facing defaults. They are not project-specific branding rules.
 - Empty macro-importance claims such as "this problem is increasingly critical" with no concrete consequence.
 - Marketing-style first-claim language such as "revolutionary", "game-changing", or "unprecedented" without evidence.
 - Paragraphs that only praise the paper instead of stating the research gap.
+- Standalone "Our Insights" sections; the insight should be part of the motivation and gap logic.
 - Service-style or AI-assistant meta language such as "用户说", "按你的要求", "我来解释", "let me explain", or "as requested by the user".
 - Workflow-only placeholder language such as "图的意图", "资产意图", "占位符", "workflow-language", or "sync this wording".
 
@@ -85,6 +89,7 @@ These are paper-facing defaults. They are not project-specific branding rules.
 
 **Encouraged expressions**
 - Motivation -> design -> technical effect progression.
+- Insight -> required design consequence progression.
 - Explicit role statements for modules or steps.
 - Concrete descriptions of information flow and interaction.
 - Local naming bridges when canonical labels appear before their defining section.
@@ -97,6 +102,7 @@ These are paper-facing defaults. They are not project-specific branding rules.
 **Banned expressions / moves**
 - Marketing-style or self-promotional wording such as "elegant", "powerful", "dramatically stronger", or "significantly outperforms prior methods" when used as prose decoration rather than evidence-backed result reporting.
 - Explaining the method by saying it is "better", "stronger", or "more advanced" without saying how it works.
+- Method subsections that read like API documentation without explaining which mechanism or insight requires the design.
 - Introducing new narrative aliases for canonical model or ablation labels after they have already been locked.
 - Service-style or AI-assistant meta language such as "用户说", "按你的要求", "我来解释", "let me explain", or "as requested by the user".
 - Workflow-only placeholder language such as "图的意图", "资产意图", "占位符", "workflow-language", or "sync this wording".
@@ -109,13 +115,15 @@ These are paper-facing defaults. They are not project-specific branding rules.
 **Encouraged expressions**
 - Direct statements of protocol, metric definition, and comparison scope.
 - Immediate result reporting with concrete numbers.
-- Short interpretation tied to the table or figure.
+- Short diagnostic interpretation tied to the mechanism tested by the table or figure.
+- Ablation prose that says which alternative explanation is weakened.
 - Explicit limitations or boundary statements after the result.
 
 **Discouraged expressions**
 - Long policy or deployment discussion after every table.
 - Re-explaining the same metric in every paragraph.
 - Paragraphs that only restate the table without synthesis.
+- Result paragraphs that say only "higher/lower/better" without explaining what the pattern teaches.
 
 **Banned expressions / moves**
 - Meta-reader guidance such as "这样读者可以……", "the reader can first...", or "this table lets the reader...".
@@ -133,6 +141,7 @@ These are paper-facing defaults. They are not project-specific branding rules.
 
 **Encouraged expressions**
 - Short recap of the paper's supported findings.
+- Broader principle implied by the supported findings.
 - Boundary or limitation statement.
 - One concrete next step or open question.
 
@@ -125,6 +125,23 @@
 - Before each rung and before each success, stop, or promotion decision, re-check the generic academic-risk questions: setting semantics, visibility/leakage, anchor or label policy, scale comparability, metric validity, comparison validity, statistical validity, claim boundary, and integrity self-check.
 - Before each success, stop, or promotion decision, also re-check the anomaly policy: whether anomaly signals fired, whether simpler explanations were ruled out, whether a cross-check was performed, and whether the current interpretation is still the narrowest supported one.
 
+## Gate Miss And Repair Loop
+
+- A gate miss is not automatically a terminal stop for `L2` or `L3` when `iterate` is allowed and the loop budget remains.
+- After any failed metric gate, classify the miss before writing a terminal outcome:
+  - recoverable: ordinary target miss, weak effect, overly strong effect, low coverage, placement or extraction mismatch, threshold mismatch, candidate-generation weakness, no-op delta, or noisy split
+  - terminal: budget exhausted, frozen-core change required, approval-required scope change, safety or integrity risk, invalid metric, impossible target, or repeated failed repair attempts
+- For a recoverable miss, run at least one bounded repair iteration inside the approved envelope before stopping. The repair must state the hypothesis, the specific knob changed, the unchanged frozen core, and the validation command.
+- Generic repair knobs include intervention strength, delivery channel or placement, detector/scoring threshold, candidate generation, sampling or stratification, baseline alignment, extraction/parser behavior, calibration, and control checks.
+- Separate ordinary engineering fixes from evidence-changing repairs. Ordinary fixes such as path repair, parser bugs, dependency setup, data loading, runner retry, logging, cache invalidation, and result serialization should be fixed directly and do not spend repair budget when they do not change evidence interpretation.
+- Evidence-changing repairs must spend repair budget and be logged: changes to intervention strength, delivery semantics, scoring thresholds, sampling, candidate generation, baseline alignment, calibration, extraction behavior that changes observed evidence, or evaluated control set.
+- Do not over-constrain problem solving: a repair may change multiple coupled knobs when the hypothesis requires it, but the report must name every changed knob and explain why the knobs are coupled.
+- Forbidden repair moves cannot be used to claim success without explicit approval: changing the primary metric definition, relaxing target thresholds, deleting hard cases, changing labels or ground truth, switching the final test split, changing paper-facing claims, or changing threat model, reviewer profile, dataset scope, or frozen core.
+- A repair pilot that passes must go through a confirmation check before promotion or final success. Valid confirmation includes a new seed, holdout, control batch, repeated run, anomaly check, or other predeclared cross-check.
+- For `L3`, prefer continuing through the repair ladder until pass, terminal boundary, or budget exhaustion. Do not pause merely because the first pilot failed.
+- If stopping after a miss, the final outcome and stage report must name the terminal boundary. "The gate failed" alone is not a sufficient stop reason when a plausible repair remains.
+- The user-facing final answer must start from the user's requested deliverables: list each requested table, artifact, or objective; mark completed, failed-gate, repaired, not promoted, or blocked; then give evidence paths and the next action.
+
 ## Minimum Procedure
 
 1. Validate the auto-mode contract
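The gate-miss rules above describe a small decision procedure: classify the miss, repair only inside the budget, charge only evidence-changing knobs, and refuse forbidden moves without approval. An illustrative sketch of that bookkeeping (the category names are this editor's labels, not identifiers from the package's contract):

```python
# Illustrative sketch of the Gate Miss And Repair Loop bookkeeping; the set
# contents paraphrase the bullets above, and names are hypothetical.
RECOVERABLE = {"target_miss", "weak_effect", "strong_effect", "low_coverage",
               "placement_mismatch", "threshold_mismatch",
               "candidate_generation_weakness", "noop_delta", "noisy_split"}
TERMINAL = {"budget_exhausted", "frozen_core_change", "scope_change",
            "safety_or_integrity_risk", "invalid_metric", "impossible_target",
            "repeated_repair_failure"}
ORDINARY_FIXES = {"path_repair", "parser_bug", "dependency_setup",
                  "data_loading", "runner_retry", "logging",
                  "cache_invalidation", "result_serialization"}
FORBIDDEN = {"metric_redefinition", "threshold_relaxation",
             "hard_case_deletion", "label_change", "test_split_switch",
             "paper_claim_change"}

def next_action(failure_class, attempts_used, repair_budget):
    """Decide the loop's next step after a failed metric gate."""
    if failure_class in TERMINAL:
        return "stop"        # report must name this terminal boundary
    if failure_class in RECOVERABLE:
        if attempts_used < repair_budget:
            return "repair"  # bounded repair inside the approved envelope
        return "stop"        # exhausted budget is itself terminal
    return "escalate"        # unknown class needs human review

def charge_budget(repair_knob, attempts_used):
    """Ordinary engineering fixes are free; evidence-changing knobs spend budget."""
    if repair_knob in FORBIDDEN:
        raise PermissionError("forbidden repair requires explicit approval")
    if repair_knob in ORDINARY_FIXES:
        return attempts_used          # does not change evidence interpretation
    return attempts_used + 1          # evidence-changing repair, logged
```

A passing repair pilot would still need the confirmation check (new seed, holdout, control batch) before `next_action` output could be treated as promotion.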
@@ -198,5 +215,8 @@
 ## Stage Report Closeout
 
 - At every stop, failure, escalation, or final handoff, write or update `.lab/stage-reports/<date>--auto--<target>.md` from `.lab/.managed/templates/stage-report.md`.
+- Fill `Requested Outcome Mapping` before the core table so the final answer can be checked against the user's original request rather than only against internal stage state.
+- Fill `Repair Control` with repair budget, attempts used, failure class, repair hypothesis, evidence-changing knobs, ordinary engineering fixes still allowed, unchanged frozen core, forbidden repairs avoided, and confirmation check.
 - Fill the `Core Explanation Table` in plain language: background, why now, what ran, how the loop ran, what worked, what did not work, what was verified, what remains unverified, what needs improvement and why, how to improve and why, key evidence, and the continue/stop/revise/rerun/escalate/handoff decision.
+- If the table says improvement is needed, the next action may be `stop` only when a terminal boundary is explicitly named; otherwise choose `continue`, `revise`, `rerun`, or `escalate`.
 - Run `.lab/.managed/scripts/validate_stage_report.py --stage-report <stage-report> --stage auto` and include the report path plus validation result in the final user-facing summary.
@@ -70,5 +70,8 @@
 ## Stage Report Closeout
 
 - Before final handoff, write or update `.lab/stage-reports/<date>--data--<target>.md` from `.lab/.managed/templates/stage-report.md`.
+- Fill `Requested Outcome Mapping` before the core table so the final answer can be checked against the user's original request rather than only against internal stage state.
+- Fill `Repair Control`; if no repair loop ran, mark it as not applicable and state that ordinary drafting or evidence fixes remain allowed inside the stage contract.
 - Fill the `Core Explanation Table` in plain language: background, why now, what changed, how the dataset package was chosen, what worked, what did not work, what was verified, what remains unverified, what needs improvement and why, how to improve and why, key evidence, and the continue/stop/revise/rerun/escalate/handoff decision.
+- If the table says improvement is needed, the next action may be `stop` only when a terminal boundary is explicitly named; otherwise choose `continue`, `revise`, `rerun`, or `escalate`.
 - Run `.lab/.managed/scripts/validate_stage_report.py --stage-report <stage-report> --stage data` and include the report path plus validation result in the final user-facing summary.
@@ -73,5 +73,8 @@
 ## Stage Report Closeout
 
 - Before final handoff, write or update `.lab/stage-reports/<date>--framing--<target>.md` from `.lab/.managed/templates/stage-report.md`.
+- Fill `Requested Outcome Mapping` before the core table so the final answer can be checked against the user's original request rather than only against internal stage state.
+- Fill `Repair Control`; if no repair loop ran, mark it as not applicable and state that ordinary drafting or evidence fixes remain allowed inside the stage contract.
 - Fill the `Core Explanation Table` in plain language: background, why now, what naming or framing changed, how it was checked, what worked, what did not work, what was verified, what remains unverified, what needs improvement and why, how to improve and why, key evidence, and the continue/stop/revise/rerun/escalate/handoff decision.
+- If the table says improvement is needed, the next action may be `stop` only when a terminal boundary is explicitly named; otherwise choose `continue`, `revise`, `rerun`, or `escalate`.
 - Run `.lab/.managed/scripts/validate_stage_report.py --stage-report <stage-report> --stage framing` and include the report path plus validation result in the final user-facing summary.
@@ -16,6 +16,8 @@
  - rough plain-language approach description
  - evaluation sketch with the evaluation subject, any proxy or simulator, the main outcome to observe, and the main validity risk
  - tentative contributions stated at idea level, not final paper-facing wording
+ - explicit contribution-vs-insight separation: contribution says what the work adds, insight says what the work teaches
+ - insight evidence chain: observation, why existing explanations fail, core insight, mechanism, validation tests, generalization or action implication, and prediction
  - convergence status that says what is already source-backed, what is still hypothesis-only, and whether the stage may end with a final recommendation
  - three meaningful points
  - brainstorm pass 1 with 3-4 candidate directions
@@ -67,6 +69,11 @@
  - Before ending the stage, give the user a concise decision summary that states the recommended direction, what current methods do, why they still fall short, how the proposed direction differs, the rough approach, the main risk, and where to read the full idea artifact and source log.
  - If the current evaluation plan uses a proxy, simulator, or synthetic user in place of a real subject, say that explicitly in the idea artifact and explain why it is acceptable at the idea stage.
  - Keep tentative contributions at the idea level. Do not drift into final paper-facing naming, title, or contribution wording; that belongs to `/lab:framing`.
+ - Treat contribution and insight as different outputs. Contribution answers what the work adds; insight answers what the community or decision-maker learns and why the idea should generalize beyond the artifact.
+ - Do not let a method name, module list, or metric gain substitute for insight. If the method name were removed, the idea should still have a clear observation, explanation, mechanism, and prediction.
+ - Write a single core insight anchor sentence that downstream `/lab:write` and `/lab:report` can reuse. It should be a mechanism-level statement, not a method-name sentence.
+ - Present insight as a structured evidence chain: observation -> why existing explanations fail -> core insight -> mechanism -> validation tests -> generalization or action implication -> prediction.
+ - For academic ideas, the insight should explain external validity and community value. For technical or business reports, the insight should lead to an action, decision rule, or system change.
  - End the stage output with a user-guidance block that tells the user what to decide next, what information would most improve the idea, and which `/lab` stage should follow.

  ## Context Read Set
@@ -105,25 +112,21 @@
  14. Rough approach in plain language
  15. Problem solved in plain language
  16. Why the proposed idea is better
- 17. Evaluation sketch
- 18. Tentative contributions
- 19. Three meaningful points
- 20. Candidate approaches and recommendation
- 21. Dataset, baseline, and metric candidates
- 22. Falsifiable hypothesis
- 23. Convergence status
- 24. Expert critique
- 25. Revised proposal or final recommendation
- 26. User guidance
- 27. Approval gate
- 28. Minimum viable experiment
- 29. Idea source log aligned with the two literature sweeps
-
- ## Stage Report Closeout
-
- - Before final handoff, write or update `.lab/stage-reports/<date>--idea--<target>.md` from `.lab/.managed/templates/stage-report.md`.
- - Fill the `Core Explanation Table` in plain language: background, why now, what idea work was done, how sources and brainstorm passes were used, what worked, what did not work, what was verified, what remains unverified, what needs improvement and why, how to improve and why, key evidence, and the continue/stop/revise/rerun/escalate/handoff decision.
- - Run `.lab/.managed/scripts/validate_stage_report.py --stage-report <stage-report> --stage idea` and include the report path plus validation result in the final user-facing summary.
+ 17. Contribution vs insight
+ 18. Insight evidence chain
+ 19. Evaluation sketch
+ 20. Tentative contributions
+ 21. Three meaningful points
+ 22. Candidate approaches and recommendation
+ 23. Dataset, baseline, and metric candidates
+ 24. Falsifiable hypothesis
+ 25. Convergence status
+ 26. Expert critique
+ 27. Revised proposal or final recommendation
+ 28. User guidance
+ 29. Approval gate
+ 30. Minimum viable experiment
+ 31. Idea source log aligned with the two literature sweeps

  ## Writing Standard
 
@@ -147,6 +150,8 @@
  - Do not update `.lab/context/mission.md`, `.lab/context/decisions.md`, or `.lab/context/open-questions.md` from rewrite-only mode.
  - Explain what current methods do, why they fall short, and roughly how the proposed idea would work in plain language.
  - Explain what problem the idea actually solves before describing tentative contributions.
+ - Before listing tentative contributions, write the insight evidence chain in plain language. It should make the reader see the anomaly or failure first, then accept the proposed explanation.
+ - A valid insight should be testable. Include at least one prediction that would be expected if the insight is right and one validation test that could falsify it.
  - Keep the evaluation sketch high-level: who or what is evaluated, what proxy or simulator is used if any, what outcome matters, and what the main validity risk is. Leave full protocol design to later stages.
  - Use the idea stage to say roughly how the idea would be validated and what the minimum viable experiment looks like, but do not freeze sample size, recruitment plan, condition count, questionnaire design, or randomization protocol here.
  - Human-subject experiment design belongs to `/lab:spec`, where recruitment, assignment, measurement, and ethics details can be made explicit.
@@ -156,3 +161,12 @@
  - The final output must be short but decision-capable. Do not hide the key recommendation logic only inside `.lab/writing/idea.md`; summarize the recommended direction, current-method contrast, difference, rough approach, and main risk in the user-facing reply, then point to `.lab/writing/idea.md` and `.lab/writing/idea-source-log.md` for the full detail.
  - Before approval, run `.lab/.managed/scripts/validate_idea_artifact.py --idea <idea-artifact> --source-log .lab/writing/idea-source-log.md --workflow-config .lab/config/workflow.json`.
  - Do not leave `.lab/context/mission.md` as an empty template after convergence; write the approved problem, why it matters, the current benchmark scope, and the approved direction back into canonical context.
+
+ ## Stage Report Closeout
+
+ - Before final handoff, write or update `.lab/stage-reports/<date>--idea--<target>.md` from `.lab/.managed/templates/stage-report.md`.
+ - Fill `Requested Outcome Mapping` before the core table so the final answer can be checked against the user's original request rather than only against internal stage state.
+ - Fill `Repair Control`; if no repair loop ran, mark it as not applicable and state that ordinary drafting or evidence fixes remain allowed inside the stage contract.
+ - Fill the `Core Explanation Table` in plain language: background, why now, what idea work ran, how evidence was checked, what worked, what did not work, what was verified, what remains unverified, what needs improvement and why, how to improve and why, key evidence, and the continue/stop/revise/rerun/escalate/handoff decision.
+ - If the table says improvement is needed, the next action may be `stop` only when a terminal boundary is explicitly named; otherwise choose `continue`, `revise`, `rerun`, or `escalate`.
+ - Run `.lab/.managed/scripts/validate_stage_report.py --stage-report <stage-report> --stage idea` and include the report path plus validation result in the final user-facing summary.
@@ -82,5 +82,8 @@ If the loop stops without success, record:
  ## Stage Report Closeout

  - Before final handoff, write or update `.lab/stage-reports/<date>--iterate--<target>.md` from `.lab/.managed/templates/stage-report.md`.
+ - Fill `Requested Outcome Mapping` before the core table so the final answer can be checked against the user's original request rather than only against internal stage state.
+ - Fill `Repair Control`; if no repair loop ran, mark it as not applicable and state that ordinary drafting or evidence fixes remain allowed inside the stage contract.
  - Fill the `Core Explanation Table` in plain language: background, why now, what rounds ran, how the loop evaluated them, what worked, what did not work, what was verified, what remains unverified, what needs improvement and why, how to improve and why, key evidence, and the continue/stop/revise/rerun/escalate/handoff decision.
+ - If the table says improvement is needed, the next action may be `stop` only when a terminal boundary is explicitly named; otherwise choose `continue`, `revise`, `rerun`, or `escalate`.
  - Run `.lab/.managed/scripts/validate_stage_report.py --stage-report <stage-report> --stage iterate` and include the report path plus validation result in the final user-facing summary.
@@ -7,6 +7,8 @@
  - problem and background in plain language
  - dataset scene notes in plain language
  - contribution summary
+ - core insight summary that explains what was learned beyond the produced artifact
+ - decision or action implication derived from that insight
  - method overview
  - selected metrics summary
  - plain-language metric guide
@@ -60,6 +62,16 @@
  - Treat `report.md` as an external-review-ready memo. Source sections must not rely on local file paths or internal provenance notes; they must give a few human-readable anchor references instead.
  - Pull the approved method name and contribution bullets out of `.lab/context/terminology-lock.md` when that framing context exists; do not silently drop them from the collaborator-facing report.
  - Explain the method overview in collaborator language: what the method roughly does, what changed relative to the closest prior work or strongest baseline, what those prior methods do, and why they remain insufficient for the approved claim.
+ - Explain the report-level insight as a mechanism, not as a slogan: observed phenomenon, why the simplest or prior explanation is insufficient, what mechanism best explains the result, what evidence supports it, what action or design implication follows, and what boundary still applies.
+ - Keep insight interpretation in the reader summary, method overview, main result interpretation, ablations, and limitations. Do not hide it in a standalone inspirational paragraph.
+ - When results are negative, mixed, or too strong, still write the insight honestly: what the result teaches about the mechanism, target, setup, metric, or boundary, and what action follows.
+ - For technical or business reports, state the decision implication as an actionable rule, next experiment, system change, or stop boundary. Do not leave the insight as a theoretical phrase only.
+ - Draft reports with three separated revision passes:
+   - Logic / evidence pass: check the bottom-line claim, evidence chain, table-to-claim mapping, failed-run handling, and whether conclusions follow from the approved protocol. Do not polish wording in this pass.
+   - Theory / metric pass: after logic is clean, check terminology, metric definitions, denominators, directionality, baseline meaning, comparison scope, and whether the chosen interpretation fits the field or report audience.
+   - Language / reader pass: only after logic and metric blockers are resolved, improve plain-language explanation, section flow, table-reading guidance, concision, and non-marketing tone.
+ - Do not move to language cleanup when the logic / evidence pass or theory / metric pass still has an unresolved blocker; repair the blocker first or downgrade the report to artifact-anchored interim.
+ - Default report execution may run the three passes without asking the user between them. Ask only when a failed pass would change the approved claim boundary, metric interpretation, or action recommendation.
  - When citing prior work or baselines in the method overview, include only the few anchor references a collaborator needs, and summarize their role and limitation in one short line each.
  - Report only the few references a collaborator needs to orient themselves quickly; do not turn `report.md` into a full bibliography dump.
  - In `Background Sources`, `Method and Baseline Sources`, and `Metric Sources`, every anchor must include a citation line, one short line about what it established or measures, and one limitation or caveat.
@@ -87,6 +99,8 @@
  - Proactively deliver a user-readable plain-language summary when the stage is reached from `/lab:auto`; do not wait for a separate follow-up request asking what the metrics or tables mean.
  - Treat `report.md` as a user-facing artifact rather than an internal dump. Prefer plain-language explanations before jargon, and explain each metric the first time it matters.
  - Treat contribution bullets as collaborator-facing claim summaries, not as internal TODOs; tie each one to the current evidence boundary.
+ - Put the bottom-line insight near the top of the report: one-sentence conclusion, core insight, evidence that supports it, action implication, and biggest risk.
+ - Use the main tables as diagnostic evidence for the insight, not just result containers. For each main table, state what mechanism or diagnostic question it addresses and what it does not prove.
  - If a missing assumption would change report interpretation, ask one clarifying question at a time.
  - If there are multiple defensible report framings, present 2-3 approaches with trade-offs and recommend the most evidence-faithful framing before writing.
  - Keep an approval gate when the reporting frame would materially affect what the paper later claims.
@@ -94,5 +108,8 @@
  ## Stage Report Closeout

  - Before final handoff, write or update `.lab/stage-reports/<date>--report--<target>.md` from `.lab/.managed/templates/stage-report.md`.
+ - Fill `Requested Outcome Mapping` before the core table so the final answer can be checked against the user's original request rather than only against internal stage state.
+ - Fill `Repair Control`; if no repair loop ran, mark it as not applicable and state that ordinary drafting or evidence fixes remain allowed inside the stage contract.
  - Fill the `Core Explanation Table` in plain language: background, why now, what report artifacts were produced, how evidence was carried forward, what worked, what did not work, what was verified, what remains unverified, what needs improvement and why, how to improve and why, key evidence, and the continue/stop/revise/rerun/escalate/handoff decision.
+ - If the table says improvement is needed, the next action may be `stop` only when a terminal boundary is explicitly named; otherwise choose `continue`, `revise`, `rerun`, or `escalate`.
  - Run `.lab/.managed/scripts/validate_stage_report.py --stage-report <stage-report> --stage report` and include the report path plus validation result in the final user-facing summary.
@@ -62,5 +62,8 @@
  ## Stage Report Closeout

  - Before final handoff, write or update `.lab/stage-reports/<date>--review--<target>.md` from `.lab/.managed/templates/stage-report.md`.
+ - Fill `Requested Outcome Mapping` before the core table so the final answer can be checked against the user's original request rather than only against internal stage state.
+ - Fill `Repair Control`; if no repair loop ran, mark it as not applicable and state that ordinary drafting or evidence fixes remain allowed inside the stage contract.
  - Fill the `Core Explanation Table` in plain language: background, why now, what was reviewed, how the review was performed, what worked, what did not work, what was verified, what remains unverified, what needs improvement and why, how to improve and why, key evidence, and the continue/stop/revise/rerun/escalate/handoff decision.
+ - If the table says improvement is needed, the next action may be `stop` only when a terminal boundary is explicitly named; otherwise choose `continue`, `revise`, `rerun`, or `escalate`.
  - Run `.lab/.managed/scripts/validate_stage_report.py --stage-report <stage-report> --stage review` and include the report path plus validation result in the final user-facing summary.
@@ -59,5 +59,8 @@
  ## Stage Report Closeout

  - Before final handoff, write or update `.lab/stage-reports/<date>--run--<target>.md` from `.lab/.managed/templates/stage-report.md`.
+ - Fill `Requested Outcome Mapping` before the core table so the final answer can be checked against the user's original request rather than only against internal stage state.
+ - Fill `Repair Control`; if no repair loop ran, mark it as not applicable and state that ordinary drafting or evidence fixes remain allowed inside the stage contract.
  - Fill the `Core Explanation Table` in plain language: background, why now, what ran, how it ran, what worked, what did not work, what was verified, what remains unverified, what needs improvement and why, how to improve and why, key evidence, and the continue/stop/revise/rerun/escalate/handoff decision.
+ - If the table says improvement is needed, the next action may be `stop` only when a terminal boundary is explicitly named; otherwise choose `continue`, `revise`, `rerun`, or `escalate`.
  - Run `.lab/.managed/scripts/validate_stage_report.py --stage-report <stage-report> --stage run` and include the report path plus validation result in the final user-facing summary.
@@ -76,5 +76,8 @@
  ## Stage Report Closeout

  - Before final handoff, write or update `.lab/stage-reports/<date>--spec--<target>.md` from `.lab/.managed/templates/stage-report.md`.
+ - Fill `Requested Outcome Mapping` before the core table so the final answer can be checked against the user's original request rather than only against internal stage state.
+ - Fill `Repair Control`; if no repair loop ran, mark it as not applicable and state that ordinary drafting or evidence fixes remain allowed inside the stage contract.
  - Fill the `Core Explanation Table` in plain language: background, why now, what change artifacts were created, how the spec was structured, what worked, what did not work, what was verified, what remains unverified, what needs improvement and why, how to improve and why, key evidence, and the continue/stop/revise/rerun/escalate/handoff decision.
+ - If the table says improvement is needed, the next action may be `stop` only when a terminal boundary is explicitly named; otherwise choose `continue`, `revise`, `rerun`, or `escalate`.
  - Run `.lab/.managed/scripts/validate_stage_report.py --stage-report <stage-report> --stage spec` and include the report path plus validation result in the final user-facing summary.
@@ -148,6 +148,21 @@ Do not enter prose polish until the current section has passed the reference-con
  - If a section must use canonical short names, model labels, or ablation labels before the section that formally introduces them has been drafted, add a local naming bridge in that section that briefly maps the descriptive phrase to the canonical paper-facing labels and then reuse those labels consistently.
  - Keep one canonical natural-language paper-facing name per concept. Do not let one concept drift across paper-facing names, experiment labels, and internal identifiers.
  - Once a paper-facing model or ablation label is chosen, reuse the canonical label in later prose, tables, captions, and ranking summaries instead of replacing it with a narrative alias.
+ - Treat the paper's core insight as an anchor that must be woven through section logic, not as an isolated `Our Insights` subsection.
+ - Before drafting, recover the current core insight anchor from `.lab/writing/idea.md`, `.lab/writing/framing.md`, `.lab/writing/plan.md`, or the collaborator report. If no reliable anchor exists, write the best supported one in the write-iteration artifact and mark it as provisional instead of inventing a new paper claim.
+ - Use the same insight anchor across Abstract, Introduction, Method, Experiments, and Conclusion unless the evidence changed and the framing artifact is revised.
+ - In Introduction, create cognitive contrast: common assumption or prior explanation -> observed failure or anomaly -> root mechanism or insight -> contribution.
+ - In Method, make design choices consequences of the insight: why the mechanism requires this decomposition, module, representation, loss, or protocol before explaining how it runs.
+ - In Experiments, interpret results diagnostically: say which part of the insight each result, ablation, robustness check, or failure case supports, weakens, or bounds. Do not only read numbers from a table.
+ - In Conclusion, state the broader principle or action implication implied by the evidence, then state the boundary. Do not introduce a new insight there.
+ - Avoid paper-facing headings such as `Our Insights` or `核心洞见`; if a heading is needed, use normal section roles such as motivation, analysis, ablation, or discussion and let the insight appear in the prose.
+ - Nontrivial section work must use three separated revision passes instead of one all-purpose rewrite:
+   - Logic pass: check the paragraph role, claim chain, premise-to-conclusion transition, evidence dependency, and whether the section naturally follows from adjacent sections. Do not polish wording in this pass.
+   - Theory / field pass: after the logic pass is clean, check concept use, field terminology, metric definitions, citation anchors, and whether the chosen framework actually fits the claim. Do not treat fluent language as proof that the theory is right.
+   - Language pass: only after logic and theory blockers are resolved, revise academic tone, sentence rhythm, transitions, concision, and local readability.
+ - Do not continue into language polish when the logic pass or theory / field pass still has an unresolved blocker; repair the blocker first or route back to `review`, `iterate`, or `report` if the blocker is evidentiary.
+ - Default automation should not require the user to approve every pass. Record the three pass outcomes in the write-iteration artifact and stop for one user question only when a failed pass would change paper-level framing, claims, protocol, or downstream section structure.
+ - If the user explicitly asks for interactive or human-in-the-loop rewriting, show the result of each pass and wait before moving from logic -> theory -> language.
  - Before drafting or polishing, check the current section's block in `section-style-policies.md` and follow its encouraged, discouraged, and banned expression lists.
  - Before any additional tighten, compress, or polish pass on the same section, run a section-level acceptance gate first.
  - The section-level acceptance gate is passed only when canonical naming consistency, adjacent-section consistency, claim, metric, and ranking consistency with the current evidence, local clarity, local concision, and section-style compliance are all explicitly checked and no unresolved blocker remains.
@@ -246,6 +261,7 @@ Do not enter prose polish until the current section has passed the reference-con
  - When a round introduces or revises metrics, include a compact metric-glossary note in the user-facing round summary and record the metric-glossary validation in the write-iteration artifact.
  - Record the section-level acceptance gate in the write-iteration artifact before recommending further tightening on the same section.
  - Record section-style policy compliance, any retained discouraged move, and any banned move found in the write-iteration artifact.
+ - Record the insight integration audit in the write-iteration artifact: core insight anchor, section role in the insight chain, challenged assumption, mechanism explanation, diagnostic evidence, and whether the prose avoided an isolated insight section.
  - Record the round target layer in the write-iteration artifact as `canonical manuscript`, `workflow-language paper layer`, or `both`.
  - If workflow-language was active and the round still targeted the canonical manuscript, record why canonical-only writing was acceptable in the write-iteration artifact.
  - If both layers were edited, record why the cross-language sync was required and whether it was explicitly requested by the user or required by final-draft/export finalization.
@@ -307,5 +323,8 @@ Do not enter prose polish until the current section has passed the reference-con
  ## Stage Report Closeout

  - Before final handoff, write or update `.lab/stage-reports/<date>--write--<target>.md` from `.lab/.managed/templates/stage-report.md`.
+ - Fill `Requested Outcome Mapping` before the core table so the final answer can be checked against the user's original request rather than only against internal stage state.
+ - Fill `Repair Control`; if no repair loop ran, mark it as not applicable and state that ordinary drafting or evidence fixes remain allowed inside the stage contract.
  - Fill the `Core Explanation Table` in plain language: background, why now, what section or asset changed, how evidence and writing rules were applied, what worked, what did not work, what was verified, what remains unverified, what needs improvement and why, how to improve and why, key evidence, and the continue/stop/revise/rerun/escalate/handoff decision.
+ - If the table says improvement is needed, the next action may be `stop` only when a terminal boundary is explicitly named; otherwise choose `continue`, `revise`, `rerun`, or `escalate`.
  - Run `.lab/.managed/scripts/validate_stage_report.py --stage-report <stage-report> --stage write` and include the report path plus validation result in the final user-facing summary.
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "superlab",
- "version": "0.1.71",
+ "version": "0.1.73",
  "description": "Strict /lab research workflow installer for Codex and Claude",
  "keywords": [
  "codex",