superlab 0.1.28 → 0.1.29
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/lib/auto_contracts.cjs +1 -3
- package/lib/auto_runner.cjs +0 -1
- package/lib/context.cjs +22 -30
- package/lib/i18n.cjs +152 -44
- package/lib/lab_idea_contract.json +8 -0
- package/package-assets/claude/commands/lab-idea.md +1 -1
- package/package-assets/claude/commands/lab.md +2 -4
- package/package-assets/codex/prompts/lab-idea.md +1 -1
- package/package-assets/codex/prompts/lab.md +2 -4
- package/package-assets/shared/lab/.managed/scripts/validate_idea_artifact.py +208 -1
- package/package-assets/shared/lab/.managed/templates/idea-source-log.md +37 -0
- package/package-assets/shared/lab/.managed/templates/idea.md +37 -1
- package/package-assets/shared/lab/context/auto-mode.md +2 -2
- package/package-assets/shared/lab/context/session-brief.md +3 -14
- package/package-assets/shared/lab/context/state.md +2 -0
- package/package-assets/shared/lab/system/core.md +4 -3
- package/package-assets/shared/skills/lab/SKILL.md +19 -11
- package/package-assets/shared/skills/lab/references/workflow.md +14 -0
- package/package-assets/shared/skills/lab/stages/auto.md +6 -3
- package/package-assets/shared/skills/lab/stages/data.md +1 -1
- package/package-assets/shared/skills/lab/stages/framing.md +1 -1
- package/package-assets/shared/skills/lab/stages/idea.md +40 -14
- package/package-assets/shared/skills/lab/stages/iterate.md +3 -1
- package/package-assets/shared/skills/lab/stages/report.md +2 -1
- package/package-assets/shared/skills/lab/stages/review.md +3 -1
- package/package-assets/shared/skills/lab/stages/run.md +4 -1
- package/package-assets/shared/skills/lab/stages/spec.md +2 -1
- package/package-assets/shared/skills/lab/stages/write.md +1 -1
- package/package.json +1 -1
@@ -22,6 +22,9 @@ Use this skill when the user invokes `/lab:*` or asks for the structured researc
 - Write durable artifacts to disk instead of leaving key decisions only in chat.
 - Use `.lab/config/workflow.json` as the global contract for workflow language, paper language, and paper format.
 - Use `.lab/context/` as the shared project state for both Codex and Claude entrypoints.
+- Treat `.lab/context/state.md` as a derived durable research snapshot and evidence-boundary view, not as a primary write target or live workflow scratchpad.
+- Treat `.lab/context/workflow-state.md` as the live workflow tracker for stage, latest update, and immediate next step.
+- Treat `.lab/context/summary.md` as the derived long-horizon summary, `.lab/context/session-brief.md` as the startup brief, and `.lab/context/next-action.md` as the lightweight action card.
 - Use `.lab/context/eval-protocol.md` as the shared evaluation contract for run, iterate, auto, and report stages, including metric glossary and experiment ladder semantics.
 - Treat evaluation semantics as source-backed once evaluation planning starts: metrics, benchmark gates, baseline behavior, comparison implementations, and deviations should come from recorded sources, not memory.
 - Workflow artifacts should follow the installed workflow language.
@@ -34,6 +37,10 @@ Use this skill when the user invokes `/lab:*` or asks for the structured researc
 ### `/lab:idea`
 
 - Search relevant literature, baselines, datasets, and evaluation metrics before proposing a plan.
+- Start with brainstorm pass 1 over 2-4 candidate directions instead of locking the first idea immediately.
+- Run literature sweep 1 with closest-prior references for each candidate direction before narrowing.
+- Use brainstorm pass 2 to keep only the strongest 1-2 directions and explain what was rejected.
+- Run literature sweep 2 before making a final recommendation or novelty claim.
 - Build a literature-scoping bundle before claiming novelty. The default target is 20 relevant sources unless the field is genuinely too narrow and that exception is written down.
 - Read `.lab/context/mission.md` and `.lab/context/open-questions.md` before drafting.
 - Read `.lab/config/workflow.json` before drafting and follow its `workflow_language` for idea artifacts.
@@ -49,7 +56,8 @@ Use this skill when the user invokes `/lab:*` or asks for the structured researc
 - Include a minimum viable experiment before approval.
 - Keep an explicit approval gate before `/lab:spec`.
 - Write idea artifacts with the template in `.lab/.managed/templates/idea.md`.
--
+- Keep `.lab/writing/idea-source-log.md` as the source-backed search manifest for the two literature sweeps.
+- Run `.lab/.managed/scripts/validate_idea_artifact.py --idea <idea-artifact> --source-log .lab/writing/idea-source-log.md --workflow-config .lab/config/workflow.json` before treating the idea as converged.
 - Update `.lab/context/mission.md`, `.lab/context/decisions.md`, and `.lab/context/open-questions.md` after convergence.
 - Do not leave `.lab/context/mission.md` as a template shell once the problem statement and approved direction are known.
 - Do not implement code in this stage.
@@ -68,7 +76,7 @@ Use this skill when the user invokes `/lab:*` or asks for the structured researc
 - Prefer official benchmark pages, official dataset pages, author project pages, and official repositories before mirrors or reposts.
 - Record download source, license or access constraints, split availability, and main risks.
 - Write the durable dataset artifact with `.lab/.managed/templates/data.md`.
-- Update `.lab/context/data-decisions.md
+- Update `.lab/context/data-decisions.md` and `.lab/context/decisions.md` after convergence, then refresh derived views.
 - Keep an explicit approval gate before `/lab:spec`.
 
 ### `/lab:framing`
@@ -82,7 +90,7 @@ Use this skill when the user invokes `/lab:*` or asks for the structured researc
 - Avoid acronym-first naming and names that sound like implementation patches or marketing language.
 - Produce 2-3 candidate framing packs with trade-offs before recommending one.
 - Write the durable framing artifact with `.lab/.managed/templates/framing.md`.
-- Update `.lab/writing/framing.md`, `.lab/context/decisions.md`, and `.lab/context/
+- Update `.lab/writing/framing.md`, `.lab/context/decisions.md`, and `.lab/context/terminology-lock.md` after convergence, then refresh derived views.
 - Keep an explicit approval gate before `/lab:write`.
 
 ### `/lab:auto`
@@ -96,9 +104,9 @@ Use this skill when the user invokes `/lab:*` or asks for the structured researc
 - Reuse `/lab:run`, `/lab:iterate`, `/lab:review`, `/lab:report`, and optional `/lab:write` instead of inventing a second workflow.
 - Do not automatically change the research mission, paper-facing framing, or core claims.
 - You may add exploratory datasets, benchmarks, and comparison methods inside the approved exploration envelope.
-- You may promote an exploratory addition to the primary package only after the promotion policy in `auto-mode.md` is satisfied and the promotion is written back into `.lab/context/data-decisions.md`, `.lab/context/decisions.md`, `.lab/context/state.md`,
+- You may promote an exploratory addition to the primary package only after the promotion policy in `auto-mode.md` is satisfied and the promotion is written back into `.lab/context/data-decisions.md`, `.lab/context/decisions.md`, and `.lab/context/workflow-state.md`, then refresh derived views.
 - Poll long-running commands until they complete, time out, or hit a stop condition.
-- Update `.lab/context/auto-status.md`, `.lab/context/
+- Update `.lab/context/auto-status.md`, `.lab/context/workflow-state.md`, `.lab/context/decisions.md`, `.lab/context/data-decisions.md`, and `.lab/context/evidence-index.md` as the campaign advances, then refresh the derived handoff files.
 - Keep an explicit approval gate when a proposed action would leave the frozen core defined by the auto-mode contract.
 
 ### `/lab:spec`
@@ -107,7 +115,7 @@ Use this skill when the user invokes `/lab:*` or asks for the structured researc
 - Read `.lab/context/mission.md`, `.lab/context/decisions.md`, `.lab/context/state.md`, `.lab/context/workflow-state.md`, and `.lab/context/data-decisions.md` before drafting the change.
 - Use `.lab/changes/<change-id>/` as the canonical lab change directory.
 - Convert the approved idea into lab change artifacts using `.lab/.managed/templates/proposal.md`, `.lab/.managed/templates/design.md`, `.lab/.managed/templates/spec.md`, and `.lab/.managed/templates/tasks.md`.
-- Update `.lab/context/
+- Update `.lab/context/decisions.md` after freezing the spec, then refresh derived views.
 - Do not skip task definition.
 
 ### `/lab:run`
@@ -118,7 +126,7 @@ Use this skill when the user invokes `/lab:*` or asks for the structured researc
 - Normalize the result with `.lab/.managed/scripts/eval_report.py`.
 - Validate normalized output with `.lab/.managed/scripts/validate_results.py`.
 - Read `.lab/context/eval-protocol.md` before choosing the smallest run so the first experiment already targets the approved tables, metrics, and gates.
-- Update `.lab/context/
+- Update `.lab/context/evidence-index.md`, `.lab/context/eval-protocol.md`, and `.lab/context/workflow-state.md` after the run; keep durable conclusions in canonical context and let `state.md` refresh as a derived snapshot.
 - If the evaluation protocol is still skeletal, initialize the smallest trustworthy source-backed version before treating the run as the protocol anchor.
 
 ### `/lab:iterate`
@@ -139,7 +147,7 @@ Use this skill when the user invokes `/lab:*` or asks for the structured researc
 - Keep metric definitions, baseline behavior, and comparison implementations anchored to the source-backed evaluation protocol before changing thresholds, gates, or ladder transitions.
 - Switch to diagnostic mode if risk increases for two consecutive rounds.
 - Write round reports with `.lab/.managed/templates/iteration-report.md`.
-- Update `.lab/context/
+- Update `.lab/context/workflow-state.md`, `.lab/context/decisions.md`, `.lab/context/evidence-index.md`, `.lab/context/open-questions.md`, and `.lab/context/eval-protocol.md` each round as needed, then refresh derived views.
 - Keep `.lab/context/eval-protocol.md` synchronized with accepted ladder changes, benchmark scope, and source-backed implementation deviations.
 - Stop at threshold success or iteration cap, and record blockers plus next-best actions when the campaign ends without success.
 
@@ -151,7 +159,7 @@ Use this skill when the user invokes `/lab:*` or asks for the structured researc
 - Prioritize methodology, fairness, benchmark representativeness, comparison-category coverage, leakage, statistics, ablations, and claim discipline.
 - Output findings first, then fatal flaws, then fix priority, then residual risks.
 - Use `.lab/.managed/templates/review-checklist.md`.
-- Write durable review conclusions back to `.lab/context/
+- Write durable review conclusions back to `.lab/context/decisions.md`, `.lab/context/evidence-index.md`, or `.lab/context/open-questions.md` when they affect later stages. Do not use `.lab/context/state.md` as a primary write target.
 
 ### `/lab:report`
 
@@ -162,7 +170,7 @@ Use this skill when the user invokes `/lab:*` or asks for the structured researc
 - Aggregate them with `.lab/.managed/scripts/summarize_iterations.py`.
 - Write the final document with `.lab/.managed/templates/final-report.md`, the managed table summary with `.lab/.managed/templates/main-tables.md`, and the internal handoff with `.lab/.managed/templates/artifact-status.md`.
 - Keep failed attempts and limitations visible.
-- Update `.lab/context/mission.md`, `.lab/context/eval-protocol.md`, `.lab/context/
+- Update `.lab/context/mission.md`, `.lab/context/eval-protocol.md`, `.lab/context/workflow-state.md`, and `.lab/context/evidence-index.md` with report-level handoff notes, then refresh derived views.
 - If canonical context is still skeletal, hydrate the smallest trustworthy version from frozen artifacts before finalizing the report.
 - If collaborator-critical fields remain missing after hydration, downgrade to an `artifact-anchored interim report` instead of presenting a final collaborator-ready report.
 
@@ -188,7 +196,7 @@ Use this skill when the user invokes `/lab:*` or asks for the structured researc
 - Before finalizing a round, append and answer the five-dimension self-review checklist and revise unresolved items.
 - Apply paper-writing discipline without changing experimental truth.
 - If the evidence is insufficient, stop and route back to `review` or `iterate`.
-- Update `.lab/context/
+- Update `.lab/context/evidence-index.md` and any directly affected canonical context file when section-level claim status changes, then refresh derived views.
 
 ## Hard Gates
 
@@ -25,12 +25,26 @@ Escalate to a higher-level redesign instead of mutating the mission when the cur
 ## Required Artifacts
 
 - one approved idea artifact derived from `.lab/.managed/templates/idea.md`
+- one idea source log at `.lab/writing/idea-source-log.md` derived from `.lab/.managed/templates/idea-source-log.md`
 - one approved dataset artifact derived from `.lab/.managed/templates/data.md`
 - one lab change directory under `.lab/changes/<change-id>/`
 - normalized JSON summary from `scripts/eval_report.py`
 - per-round report in `.lab/iterations/`
 - final report under the configured `deliverables_root`
 
+## Artifact Roles
+
+- `.lab/context/mission.md` = canonical problem statement and approved direction
+- `.lab/context/state.md` = derived durable research snapshot and evidence-boundary view
+- `.lab/context/workflow-state.md` = live workflow state for the current stage, latest update, and next step
+- `.lab/context/summary.md` = derived long-horizon project summary
+- `.lab/context/session-brief.md` = next-session startup brief with only the current focus, mission snapshot, and main risk
+- `.lab/context/next-action.md` = lightweight action card for the immediate step and fallback path
+- `.lab/writing/idea-source-log.md` = idea-stage literature evidence log
+- `<deliverables_root>/report.md` = collaborator-facing research memo
+- `<deliverables_root>/artifact-status.md` = internal artifact and workflow status
+- canonical durable writes belong in `mission.md`, `decisions.md`, `data-decisions.md`, `evidence-index.md`, `eval-protocol.md`, `open-questions.md`, and `terminology-lock.md`; refresh derived views afterward
+
 ## Reviewer Priorities
 
 - Is the baseline fair and current?
@@ -13,6 +13,7 @@
 - `.lab/config/workflow.json`
 - `.lab/context/mission.md`
 - `.lab/context/state.md`
+- `.lab/context/workflow-state.md`
 - `.lab/context/decisions.md`
 - `.lab/context/data-decisions.md`
 - `.lab/context/evidence-index.md`
@@ -26,10 +27,11 @@
 
 - `.lab/context/mission.md`
 - `.lab/context/eval-protocol.md`
-- `.lab/context/state.md`
+- `.lab/context/workflow-state.md`
 - `.lab/context/decisions.md`
 - `.lab/context/data-decisions.md`
 - `.lab/context/evidence-index.md`
+- `.lab/context/next-action.md`
 - `.lab/context/summary.md`
 - `.lab/context/session-brief.md`
 - `.lab/context/auto-status.md`
@@ -39,6 +41,7 @@
 
 - Treat `/lab:auto` as an orchestration layer, not a replacement for existing `/lab:*` stages.
 - Treat `.lab/context/eval-protocol.md` as the source of truth for paper-facing metrics, metric glossary, table plan, gates, and structured experiment ladders.
+- Treat `.lab/context/state.md` as a derived durable snapshot and `.lab/context/workflow-state.md` as the live workflow tracker. Auto mode should update canonical durable context plus `.lab/context/workflow-state.md`, then refresh derived views instead of treating `state.md` as a primary write target.
 - Treat the evaluation protocol as source-backed, not imagination-backed: metric definitions, baseline behavior, comparison implementations, and deviations must come from recorded sources before they are used in gates or promotions.
 - Treat `Academic Validity Checks` and `Integrity self-check` as mandatory automation gates. Auto mode should not proceed, promote, or declare success while those fields are missing, stale, or contradicted by the current rung.
 - Treat `Sanity and Alternative-Explanation Checks` as the anomaly gate for automation. When a rung yields all-null outputs, suspiciously identical runs, no-op deltas, or impl/result mismatches, pause promotion logic until implementation reality checks, alternative explanations, and at least one cross-check are recorded.
@@ -56,7 +59,7 @@
 - Default allowed stages are `run`, `iterate`, `review`, and `report`. Only include `write` when framing is already approved and manuscript drafting is within scope.
 - Do not automatically change the research mission, paper-facing framing, or core claims.
 - You may add exploratory datasets, benchmarks, and comparison methods inside the exploration envelope.
-- You may promote exploratory additions to the primary package only when the contract's promotion policy is satisfied and the promotion is written back into `data-decisions.md`, `decisions.md`, `state.md`,
+- You may promote exploratory additions to the primary package only when the contract's promotion policy is satisfied and the promotion is written back into `data-decisions.md`, `decisions.md`, and `workflow-state.md`, then refresh derived views.
 - Poll long-running commands until they finish, hit a timeout, or hit a stop condition.
 - Keep a poll-based waiting loop instead of sleeping blindly.
 - Do not treat a short watcher such as `sleep 30`, a one-shot `pgrep`, or a single `metrics.json` probe as the rung command when the real experiment is still running.
@@ -74,7 +77,7 @@
 - `review` must update canonical review context
 - `report` must produce `<deliverables_root>/report.md` and `<deliverables_root>/main-tables.md`
 - `write` must produce LaTeX output under `<deliverables_root>/paper/`
-- Treat promotion as incomplete unless it writes back to `data-decisions.md`, `decisions.md`, `state.md`,
+- Treat promotion as incomplete unless it writes back to `data-decisions.md`, `decisions.md`, and `workflow-state.md`, then refresh derived views.
 - Do not stop or promote on the basis of a metric or comparison claim whose source-backed definition is missing from the approved evaluation protocol.
 - Before each rung and before each success, stop, or promotion decision, re-check the generic academic-risk questions: setting semantics, visibility/leakage, anchor or label policy, scale comparability, metric validity, comparison validity, statistical validity, claim boundary, and integrity self-check.
 - Before each success, stop, or promotion decision, also re-check the anomaly policy: whether anomaly signals fired, whether simpler explanations were ruled out, whether a cross-check was performed, and whether the current interpretation is still the narrowest supported one.
@@ -14,6 +14,10 @@
 - why the proposed idea is better than existing methods
 - rough plain-language approach description
 - three meaningful points
+- brainstorm pass 1 with 2-4 candidate directions
+- literature sweep 1 with 3-5 closest-prior references per direction
+- brainstorm pass 2 that narrows to 1-2 surviving directions
+- literature sweep 2 that expands the surviving directions into the full source bundle
 - literature scoping bundle with a default target of 20 sources, or an explicit explanation for a smaller scoped field
 - literature-backed framing
 - sourced datasets and metrics
@@ -32,7 +36,13 @@
 - Do not merge them into one undifferentiated summary.
 - Ask one clarifying question at a time when a missing assumption would materially change the proposal.
 - Build a source bundle before claiming novelty. The default target is 20 relevant sources split across closest prior work, recent strong papers, benchmark or evaluation papers, surveys or taxonomies, and adjacent-field work when useful.
--
+- Treat closest prior work, recent strong papers, benchmark or evaluation papers, and survey or taxonomy papers as mandatory coverage buckets. Do not leave those buckets empty in the final source bundle.
+- Keep a separate idea source log that records the actual search queries, bucketed sources, and final source count for both literature sweeps.
+- Use the first brainstorm pass only to generate candidate directions. Treat it as hypothesis generation, not as a novelty judgment.
+- After brainstorm pass 1, run a first literature sweep that gathers 3-5 closest-prior references per direction before narrowing the idea.
+- After literature sweep 1, run a second brainstorm pass that explicitly kills, merges, or narrows directions.
+- Only after literature sweep 2 may the artifact give a final recommendation, paper fit, or novelty claim.
+- If the field is genuinely too narrow to support that target, say so explicitly in both the idea artifact and the idea source log, and justify the smaller literature bundle instead of silently skipping the search.
 - The idea artifact must follow the repository `workflow_language`, not whichever language is easiest locally.
 - Before writing the full artifact, give the user a short summary with the one-sentence problem, why current methods fail, and the three meaningful points.
 
@@ -48,6 +58,11 @@
 - `.lab/context/decisions.md`
 - `.lab/context/open-questions.md`
 
+## Required Artifacts
+
+- idea artifact derived from `.lab/.managed/templates/idea.md`
+- idea source log at `.lab/writing/idea-source-log.md`, derived from `.lab/.managed/templates/idea-source-log.md`
+
 ## Recommended Structure
 
 1. Scenario
@@ -56,27 +71,38 @@
 4. Failure of existing methods
 5. Idea classification, contribution category, and breakthrough level
 6. Existing methods and shared assumptions
-7.
-8.
-9.
-10.
-11.
-12.
-13.
-14.
-15.
-16.
-17.
-18.
+7. Brainstorm pass 1
+8. Literature sweep 1
+9. Literature scoping bundle
+10. Closest-prior-work comparison
+11. Brainstorm pass 2
+12. Literature sweep 2
+13. Rough approach in plain language
+14. Why the proposed idea is better
+15. Three meaningful points
+16. Candidate approaches and recommendation
+17. Dataset, baseline, and metric candidates
+18. Falsifiable hypothesis
+19. Expert critique
+20. Revised proposal or final recommendation
+21. Approval gate
+22. Minimum viable experiment
+23. Idea source log aligned with the two literature sweeps
 
 ## Writing Standard
 
 - Keep the problem statement short, concrete, and easy to scan.
 - Explain the scenario, target user or beneficiary, and why the problem matters before talking about novelty.
 - State why the target problem matters before talking about the method.
+- Use brainstorm pass 1 to open the space, not to declare a winner.
+- Use literature sweep 1 to test candidate directions against real papers before narrowing them.
+- Use brainstorm pass 2 to explain what survived, what was rejected, and why.
+- Use literature sweep 2 to support the final recommendation with real references across the required buckets.
 - Compare against existing methods explicitly, not by vague novelty language.
 - Do not call something new without a literature-scoping bundle and a closest-prior comparison.
+- Do not call something paper-worthy or novel after only one brainstorm pass or one literature sweep.
+- Do not treat the idea artifact itself as the only evidence record; keep `.lab/writing/idea-source-log.md` synchronized with the actual searches and source buckets used in both literature sweeps.
 - Explain what current methods do, why they fall short, and roughly how the proposed idea would work in plain language.
 - The three meaningful points should each fit in one direct sentence.
-- Before approval, run `.lab/.managed/scripts/validate_idea_artifact.py --idea <idea-artifact> --workflow-config .lab/config/workflow.json`.
+- Before approval, run `.lab/.managed/scripts/validate_idea_artifact.py --idea <idea-artifact> --source-log .lab/writing/idea-source-log.md --workflow-config .lab/config/workflow.json`.
 - Do not leave `.lab/context/mission.md` as an empty template after convergence; write the approved problem, why it matters, the current benchmark scope, and the approved direction back into canonical context.
@@ -17,6 +17,7 @@ Declare and keep fixed:
 
 - `.lab/context/mission.md`
 - `.lab/context/state.md`
+- `.lab/context/workflow-state.md`
 - `.lab/context/decisions.md`
 - `.lab/context/evidence-index.md`
 - `.lab/context/data-decisions.md`
@@ -25,7 +26,7 @@ Declare and keep fixed:
 
 ## Context Write Set
 
-- `.lab/context/state.md`
+- `.lab/context/workflow-state.md`
 - `.lab/context/decisions.md`
 - `.lab/context/evidence-index.md`
 - `.lab/context/open-questions.md`
@@ -62,6 +63,7 @@ If the loop stops without success, record:
 - Keep figures or plots under `figures_root`.
 - Do not accumulate long-lived results under `.lab/changes/<change-id>/runs`.
 - Do not change metric definitions, baseline semantics, or comparison implementations unless the approved evaluation protocol records both their sources and any deviations.
+- Write durable findings and evidence boundary changes into canonical context such as `.lab/context/decisions.md`, `.lab/context/evidence-index.md`, `.lab/context/open-questions.md`, and `.lab/context/eval-protocol.md`, then refresh the derived `state.md` snapshot. Write round-to-round execution progress into `.lab/context/workflow-state.md`.
 - When you change ladders, sample sizes, or promotion gates, keep the resulting logic anchored to the source-backed evaluation protocol instead of ad-hoc chat reasoning.
 - Keep `.lab/context/eval-protocol.md` synchronized with the active benchmark scope, ladder gates, source-backed metric definitions, and any accepted implementation deviations instead of leaving it as a stale template.
 - Re-run the `Academic Validity Checks` and `Integrity self-check` whenever you change inputs, anchors, labels, metrics, comparisons, or promotion logic.
@@ -39,8 +39,8 @@
 
 - `.lab/context/mission.md`
 - `.lab/context/eval-protocol.md`
-- `.lab/context/state.md`
 - `.lab/context/workflow-state.md`
+- `.lab/context/decisions.md`
 - `.lab/context/evidence-index.md`
 
 ## Evidence Rules
@@ -71,6 +71,7 @@
 - If the existing `report.md` or `main-tables.md` is missing required collaborator-facing sections from the managed templates, treat that as a report deficiency. A rerun must repair the missing sections instead of declaring "no content change" or treating the rerun as a no-op.
 - After drafting or rerunning the report, run `.lab/.managed/scripts/validate_collaborator_report.py --report <deliverables_root>/report.md --main-tables <deliverables_root>/main-tables.md`. If it fails, keep editing until it passes; do not stop at a no-op audit rerun.
 - Do not mix workflow deliverable status, rerun ids, or manuscript skeleton status into validated scientific findings; keep those in `<deliverables_root>/artifact-status.md`.
+- Write durable report-level conclusions into canonical context such as `.lab/context/mission.md`, `.lab/context/eval-protocol.md`, `.lab/context/decisions.md`, and `.lab/context/evidence-index.md`, then refresh the derived `state.md` snapshot. Write live reporting progress or immediate handoff actions into `.lab/context/workflow-state.md`.
 - If `.lab/config/workflow.json` sets the workflow language to Chinese, write `report.md` and `<deliverables_root>/main-tables.md` in Chinese unless a file path, code identifier, or literal metric name must remain unchanged.
 - Prefer conservative interpretation over marketing language.
 - Leave a clear handoff path into `/lab:write` with evidence links that section drafts can cite.
@@ -12,13 +12,15 @@
 
 - `.lab/context/mission.md`
 - `.lab/context/state.md`
+- `.lab/context/workflow-state.md`
 - `.lab/context/data-decisions.md`
 - `.lab/context/eval-protocol.md`
 - `.lab/config/workflow.json`
 
 ## Context Write Set
 
-- `.lab/context/state.md`
+- `.lab/context/workflow-state.md`
+- `.lab/context/decisions.md`
 - `.lab/context/evidence-index.md`
 - `.lab/context/eval-protocol.md`
 
@@ -30,6 +32,7 @@
 - Do not invent metric definitions, baseline behavior, or comparison implementations from memory; anchor them to the approved evaluation protocol and its recorded sources.
 - Treat `Academic Validity Checks` and `Integrity self-check` as preflight gates, not optional notes. Do not bless a run as protocol-valid until those fields are filled and still match the current experiment.
 - Treat `Sanity and Alternative-Explanation Checks` as a second preflight gate. If anomaly signals have fired and the implementation reality checks, alternative explanations, cross-check method, best-supported interpretation, or escalation threshold are still blank, do not bless the run as valid evidence.
+- Write durable research conclusions into canonical context such as `.lab/context/decisions.md`, `.lab/context/evidence-index.md`, and `.lab/context/eval-protocol.md`, then refresh the derived `state.md` snapshot. Keep live execution progress in `.lab/context/workflow-state.md`.
 - If `.lab/context/eval-protocol.md` is still skeletal, write the smallest trustworthy version of the current evaluation objective, metric set, ladder, and source-backed implementation notes before treating the run as the new protocol anchor.
 - Refuse to treat a run as scientifically valid if the protocol has not answered the generic academic-risk questions: setting semantics, visibility/leakage, anchor or label policy, scale comparability, metric validity, comparison validity, statistical validity, and claim boundary.
 - Treat all-null outputs, suspiciously identical reruns, no-op deltas, and impl/result mismatches as diagnostic triggers first; check code paths and rule out simpler explanations before interpreting them as findings.
@@ -17,11 +17,12 @@
 - `.lab/context/mission.md`
 - `.lab/context/decisions.md`
 - `.lab/context/state.md`
+- `.lab/context/workflow-state.md`
 - `.lab/context/data-decisions.md`
 
 ## Context Write Set
 
-- `.lab/context/state.md`
+- `.lab/context/workflow-state.md`
 - `.lab/context/decisions.md`
 
 ## Required Change Layout
|