superlab 0.1.28 → 0.1.29

Files changed (29)
  1. package/lib/auto_contracts.cjs +1 -3
  2. package/lib/auto_runner.cjs +0 -1
  3. package/lib/context.cjs +22 -30
  4. package/lib/i18n.cjs +152 -44
  5. package/lib/lab_idea_contract.json +8 -0
  6. package/package-assets/claude/commands/lab-idea.md +1 -1
  7. package/package-assets/claude/commands/lab.md +2 -4
  8. package/package-assets/codex/prompts/lab-idea.md +1 -1
  9. package/package-assets/codex/prompts/lab.md +2 -4
  10. package/package-assets/shared/lab/.managed/scripts/validate_idea_artifact.py +208 -1
  11. package/package-assets/shared/lab/.managed/templates/idea-source-log.md +37 -0
  12. package/package-assets/shared/lab/.managed/templates/idea.md +37 -1
  13. package/package-assets/shared/lab/context/auto-mode.md +2 -2
  14. package/package-assets/shared/lab/context/session-brief.md +3 -14
  15. package/package-assets/shared/lab/context/state.md +2 -0
  16. package/package-assets/shared/lab/system/core.md +4 -3
  17. package/package-assets/shared/skills/lab/SKILL.md +19 -11
  18. package/package-assets/shared/skills/lab/references/workflow.md +14 -0
  19. package/package-assets/shared/skills/lab/stages/auto.md +6 -3
  20. package/package-assets/shared/skills/lab/stages/data.md +1 -1
  21. package/package-assets/shared/skills/lab/stages/framing.md +1 -1
  22. package/package-assets/shared/skills/lab/stages/idea.md +40 -14
  23. package/package-assets/shared/skills/lab/stages/iterate.md +3 -1
  24. package/package-assets/shared/skills/lab/stages/report.md +2 -1
  25. package/package-assets/shared/skills/lab/stages/review.md +3 -1
  26. package/package-assets/shared/skills/lab/stages/run.md +4 -1
  27. package/package-assets/shared/skills/lab/stages/spec.md +2 -1
  28. package/package-assets/shared/skills/lab/stages/write.md +1 -1
  29. package/package.json +1 -1
package/package-assets/shared/skills/lab/SKILL.md CHANGED
@@ -22,6 +22,9 @@ Use this skill when the user invokes `/lab:*` or asks for the structured researc
22
22
  - Write durable artifacts to disk instead of leaving key decisions only in chat.
23
23
  - Use `.lab/config/workflow.json` as the global contract for workflow language, paper language, and paper format.
24
24
  - Use `.lab/context/` as the shared project state for both Codex and Claude entrypoints.
25
+ - Treat `.lab/context/state.md` as a derived durable research snapshot and evidence-boundary view, not as a primary write target or live workflow scratchpad.
26
+ - Treat `.lab/context/workflow-state.md` as the live workflow tracker for stage, latest update, and immediate next step.
27
+ - Treat `.lab/context/summary.md` as the derived long-horizon summary, `.lab/context/session-brief.md` as the startup brief, and `.lab/context/next-action.md` as the lightweight action card.
25
28
  - Use `.lab/context/eval-protocol.md` as the shared evaluation contract for run, iterate, auto, and report stages, including metric glossary and experiment ladder semantics.
26
29
  - Treat evaluation semantics as source-backed once evaluation planning starts: metrics, benchmark gates, baseline behavior, comparison implementations, and deviations should come from recorded sources, not memory.
27
30
  - Workflow artifacts should follow the installed workflow language.
@@ -34,6 +37,10 @@ Use this skill when the user invokes `/lab:*` or asks for the structured researc
34
37
  ### `/lab:idea`
35
38
 
36
39
  - Search relevant literature, baselines, datasets, and evaluation metrics before proposing a plan.
40
+ - Start with brainstorm pass 1 over 2-4 candidate directions instead of locking the first idea immediately.
41
+ - Run literature sweep 1 with closest-prior references for each candidate direction before narrowing.
42
+ - Use brainstorm pass 2 to keep only the strongest 1-2 directions and explain what was rejected.
43
+ - Run literature sweep 2 before making a final recommendation or novelty claim.
37
44
  - Build a literature-scoping bundle before claiming novelty. The default target is 20 relevant sources unless the field is genuinely too narrow and that exception is written down.
38
45
  - Read `.lab/context/mission.md` and `.lab/context/open-questions.md` before drafting.
39
46
  - Read `.lab/config/workflow.json` before drafting and follow its `workflow_language` for idea artifacts.
@@ -49,7 +56,8 @@ Use this skill when the user invokes `/lab:*` or asks for the structured researc
49
56
  - Include a minimum viable experiment before approval.
50
57
  - Keep an explicit approval gate before `/lab:spec`.
51
58
  - Write idea artifacts with the template in `.lab/.managed/templates/idea.md`.
52
- - Run `.lab/.managed/scripts/validate_idea_artifact.py --idea <idea-artifact> --workflow-config .lab/config/workflow.json` before treating the idea as converged.
59
+ - Keep `.lab/writing/idea-source-log.md` as the source-backed search manifest for the two literature sweeps.
60
+ - Run `.lab/.managed/scripts/validate_idea_artifact.py --idea <idea-artifact> --source-log .lab/writing/idea-source-log.md --workflow-config .lab/config/workflow.json` before treating the idea as converged.
53
61
  - Update `.lab/context/mission.md`, `.lab/context/decisions.md`, and `.lab/context/open-questions.md` after convergence.
54
62
  - Do not leave `.lab/context/mission.md` as a template shell once the problem statement and approved direction are known.
55
63
  - Do not implement code in this stage.
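The updated validator invocation above can be wrapped in a small helper. This is a minimal sketch, not code shipped in the package: the function names are hypothetical, the three flags are copied from the changed line, and the idea that a zero exit code means "valid" is an assumption about the script's behavior.

```python
import subprocess
import sys


def build_validate_command(
    idea_artifact: str,
    source_log: str = ".lab/writing/idea-source-log.md",
    workflow_config: str = ".lab/config/workflow.json",
) -> list[str]:
    """Assemble the validator call with the three flags named in the skill text."""
    return [
        sys.executable,
        ".lab/.managed/scripts/validate_idea_artifact.py",
        "--idea", idea_artifact,
        "--source-log", source_log,
        "--workflow-config", workflow_config,
    ]


def idea_passes_validation(idea_artifact: str) -> bool:
    """Run the validator; ASSUMPTION: a non-zero exit code signals failure."""
    result = subprocess.run(build_validate_command(idea_artifact), check=False)
    return result.returncode == 0
```

Keeping command construction separate from execution makes the new `--source-log` argument easy to inspect without running the script.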
@@ -68,7 +76,7 @@ Use this skill when the user invokes `/lab:*` or asks for the structured researc
68
76
  - Prefer official benchmark pages, official dataset pages, author project pages, and official repositories before mirrors or reposts.
69
77
  - Record download source, license or access constraints, split availability, and main risks.
70
78
  - Write the durable dataset artifact with `.lab/.managed/templates/data.md`.
71
- - Update `.lab/context/data-decisions.md`, `.lab/context/decisions.md`, and `.lab/context/state.md` after convergence.
79
+ - Update `.lab/context/data-decisions.md` and `.lab/context/decisions.md` after convergence, then refresh derived views.
72
80
  - Keep an explicit approval gate before `/lab:spec`.
73
81
 
74
82
  ### `/lab:framing`
@@ -82,7 +90,7 @@ Use this skill when the user invokes `/lab:*` or asks for the structured researc
82
90
  - Avoid acronym-first naming and names that sound like implementation patches or marketing language.
83
91
  - Produce 2-3 candidate framing packs with trade-offs before recommending one.
84
92
  - Write the durable framing artifact with `.lab/.managed/templates/framing.md`.
85
- - Update `.lab/writing/framing.md`, `.lab/context/decisions.md`, and `.lab/context/state.md` after convergence.
93
+ - Update `.lab/writing/framing.md`, `.lab/context/decisions.md`, and `.lab/context/terminology-lock.md` after convergence, then refresh derived views.
86
94
  - Keep an explicit approval gate before `/lab:write`.
87
95
 
88
96
  ### `/lab:auto`
@@ -96,9 +104,9 @@ Use this skill when the user invokes `/lab:*` or asks for the structured researc
96
104
  - Reuse `/lab:run`, `/lab:iterate`, `/lab:review`, `/lab:report`, and optional `/lab:write` instead of inventing a second workflow.
97
105
  - Do not automatically change the research mission, paper-facing framing, or core claims.
98
106
  - You may add exploratory datasets, benchmarks, and comparison methods inside the approved exploration envelope.
99
- - You may promote an exploratory addition to the primary package only after the promotion policy in `auto-mode.md` is satisfied and the promotion is written back into `.lab/context/data-decisions.md`, `.lab/context/decisions.md`, `.lab/context/state.md`, and `.lab/context/workflow-state.md`.
107
+ - You may promote an exploratory addition to the primary package only after the promotion policy in `auto-mode.md` is satisfied and the promotion is written back into `.lab/context/data-decisions.md`, `.lab/context/decisions.md`, and `.lab/context/workflow-state.md`, then refresh derived views.
100
108
  - Poll long-running commands until they complete, time out, or hit a stop condition.
101
- - Update `.lab/context/auto-status.md`, `.lab/context/state.md`, `.lab/context/workflow-state.md`, `.lab/context/decisions.md`, `.lab/context/data-decisions.md`, and `.lab/context/evidence-index.md` as the campaign advances, then refresh the derived handoff files.
109
+ - Update `.lab/context/auto-status.md`, `.lab/context/workflow-state.md`, `.lab/context/decisions.md`, `.lab/context/data-decisions.md`, and `.lab/context/evidence-index.md` as the campaign advances, then refresh the derived handoff files.
102
110
  - Keep an explicit approval gate when a proposed action would leave the frozen core defined by the auto-mode contract.
103
111
 
104
112
  ### `/lab:spec`
@@ -107,7 +115,7 @@ Use this skill when the user invokes `/lab:*` or asks for the structured researc
107
115
  - Read `.lab/context/mission.md`, `.lab/context/decisions.md`, `.lab/context/state.md`, `.lab/context/workflow-state.md`, and `.lab/context/data-decisions.md` before drafting the change.
108
116
  - Use `.lab/changes/<change-id>/` as the canonical lab change directory.
109
117
  - Convert the approved idea into lab change artifacts using `.lab/.managed/templates/proposal.md`, `.lab/.managed/templates/design.md`, `.lab/.managed/templates/spec.md`, and `.lab/.managed/templates/tasks.md`.
110
- - Update `.lab/context/state.md` and `.lab/context/decisions.md` after freezing the spec.
118
+ - Update `.lab/context/decisions.md` after freezing the spec, then refresh derived views.
111
119
  - Do not skip task definition.
112
120
 
113
121
  ### `/lab:run`
@@ -118,7 +126,7 @@ Use this skill when the user invokes `/lab:*` or asks for the structured researc
118
126
  - Normalize the result with `.lab/.managed/scripts/eval_report.py`.
119
127
  - Validate normalized output with `.lab/.managed/scripts/validate_results.py`.
120
128
  - Read `.lab/context/eval-protocol.md` before choosing the smallest run so the first experiment already targets the approved tables, metrics, and gates.
121
- - Update `.lab/context/state.md`, `.lab/context/workflow-state.md`, `.lab/context/evidence-index.md`, and `.lab/context/eval-protocol.md` after the run.
129
+ - Update `.lab/context/evidence-index.md`, `.lab/context/eval-protocol.md`, and `.lab/context/workflow-state.md` after the run; keep durable conclusions in canonical context and let `state.md` refresh as a derived snapshot.
122
130
  - If the evaluation protocol is still skeletal, initialize the smallest trustworthy source-backed version before treating the run as the protocol anchor.
123
131
 
124
132
  ### `/lab:iterate`
@@ -139,7 +147,7 @@ Use this skill when the user invokes `/lab:*` or asks for the structured researc
139
147
  - Keep metric definitions, baseline behavior, and comparison implementations anchored to the source-backed evaluation protocol before changing thresholds, gates, or ladder transitions.
140
148
  - Switch to diagnostic mode if risk increases for two consecutive rounds.
141
149
  - Write round reports with `.lab/.managed/templates/iteration-report.md`.
142
- - Update `.lab/context/state.md`, `.lab/context/workflow-state.md`, `.lab/context/decisions.md`, `.lab/context/evidence-index.md`, `.lab/context/open-questions.md`, and `.lab/context/eval-protocol.md` each round as needed.
150
+ - Update `.lab/context/workflow-state.md`, `.lab/context/decisions.md`, `.lab/context/evidence-index.md`, `.lab/context/open-questions.md`, and `.lab/context/eval-protocol.md` each round as needed, then refresh derived views.
143
151
  - Keep `.lab/context/eval-protocol.md` synchronized with accepted ladder changes, benchmark scope, and source-backed implementation deviations.
144
152
  - Stop at threshold success or iteration cap, and record blockers plus next-best actions when the campaign ends without success.
145
153
 
@@ -151,7 +159,7 @@ Use this skill when the user invokes `/lab:*` or asks for the structured researc
151
159
  - Prioritize methodology, fairness, benchmark representativeness, comparison-category coverage, leakage, statistics, ablations, and claim discipline.
152
160
  - Output findings first, then fatal flaws, then fix priority, then residual risks.
153
161
  - Use `.lab/.managed/templates/review-checklist.md`.
154
- - Write durable review conclusions back to `.lab/context/state.md` or `.lab/context/open-questions.md` when they affect later stages.
162
+ - Write durable review conclusions back to `.lab/context/decisions.md`, `.lab/context/evidence-index.md`, or `.lab/context/open-questions.md` when they affect later stages. Do not use `.lab/context/state.md` as a primary write target.
155
163
 
156
164
  ### `/lab:report`
157
165
 
@@ -162,7 +170,7 @@ Use this skill when the user invokes `/lab:*` or asks for the structured researc
162
170
  - Aggregate them with `.lab/.managed/scripts/summarize_iterations.py`.
163
171
  - Write the final document with `.lab/.managed/templates/final-report.md`, the managed table summary with `.lab/.managed/templates/main-tables.md`, and the internal handoff with `.lab/.managed/templates/artifact-status.md`.
164
172
  - Keep failed attempts and limitations visible.
165
- - Update `.lab/context/mission.md`, `.lab/context/eval-protocol.md`, `.lab/context/state.md`, `.lab/context/workflow-state.md`, and `.lab/context/evidence-index.md` with report-level handoff notes.
173
+ - Update `.lab/context/mission.md`, `.lab/context/eval-protocol.md`, `.lab/context/workflow-state.md`, and `.lab/context/evidence-index.md` with report-level handoff notes, then refresh derived views.
166
174
  - If canonical context is still skeletal, hydrate the smallest trustworthy version from frozen artifacts before finalizing the report.
167
175
  - If collaborator-critical fields remain missing after hydration, downgrade to an `artifact-anchored interim report` instead of presenting a final collaborator-ready report.
168
176
 
@@ -188,7 +196,7 @@ Use this skill when the user invokes `/lab:*` or asks for the structured researc
188
196
  - Before finalizing a round, append and answer the five-dimension self-review checklist and revise unresolved items.
189
197
  - Apply paper-writing discipline without changing experimental truth.
190
198
  - If the evidence is insufficient, stop and route back to `review` or `iterate`.
191
- - Update `.lab/context/state.md` and `.lab/context/evidence-index.md` when section-level claim status changes.
199
+ - Update `.lab/context/evidence-index.md` and any directly affected canonical context file when section-level claim status changes, then refresh derived views.
192
200
 
193
201
  ## Hard Gates
194
202
 
package/package-assets/shared/skills/lab/references/workflow.md CHANGED
@@ -25,12 +25,26 @@ Escalate to a higher-level redesign instead of mutating the mission when the cur
25
25
  ## Required Artifacts
26
26
 
27
27
  - one approved idea artifact derived from `.lab/.managed/templates/idea.md`
28
+ - one idea source log at `.lab/writing/idea-source-log.md` derived from `.lab/.managed/templates/idea-source-log.md`
28
29
  - one approved dataset artifact derived from `.lab/.managed/templates/data.md`
29
30
  - one lab change directory under `.lab/changes/<change-id>/`
30
31
  - normalized JSON summary from `scripts/eval_report.py`
31
32
  - per-round report in `.lab/iterations/`
32
33
  - final report under the configured `deliverables_root`
33
34
 
35
+ ## Artifact Roles
36
+
37
+ - `.lab/context/mission.md` = canonical problem statement and approved direction
38
+ - `.lab/context/state.md` = derived durable research snapshot and evidence-boundary view
39
+ - `.lab/context/workflow-state.md` = live workflow state for the current stage, latest update, and next step
40
+ - `.lab/context/summary.md` = derived long-horizon project summary
41
+ - `.lab/context/session-brief.md` = next-session startup brief with only the current focus, mission snapshot, and main risk
42
+ - `.lab/context/next-action.md` = lightweight action card for the immediate step and fallback path
43
+ - `.lab/writing/idea-source-log.md` = idea-stage literature evidence log
44
+ - `<deliverables_root>/report.md` = collaborator-facing research memo
45
+ - `<deliverables_root>/artifact-status.md` = internal artifact and workflow status
46
+ - canonical durable writes belong in `mission.md`, `decisions.md`, `data-decisions.md`, `evidence-index.md`, `eval-protocol.md`, `open-questions.md`, and `terminology-lock.md`; refresh derived views afterward
47
+
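The role split listed above can be sketched as a small lookup. This is an illustration only, not part of the package: the set names and both helpers are hypothetical, and only the files the text explicitly calls derived are treated as derived here.

```python
from pathlib import PurePosixPath

# Canonical durable write targets, as listed in the last Artifact Roles bullet.
CANONICAL = frozenset({
    "mission.md", "decisions.md", "data-decisions.md", "evidence-index.md",
    "eval-protocol.md", "open-questions.md", "terminology-lock.md",
})
# Files the text explicitly describes as derived views.
DERIVED = frozenset({"state.md", "summary.md"})
# The live workflow tracker.
LIVE = frozenset({"workflow-state.md"})


def classify(path: str) -> str:
    """Return 'canonical', 'derived', 'live', or 'other' for a context file path."""
    name = PurePosixPath(path).name
    if name in CANONICAL:
        return "canonical"
    if name in DERIVED:
        return "derived"
    if name in LIVE:
        return "live"
    return "other"


def primary_write_violations(write_set: list[str]) -> list[str]:
    """List entries of a stage write set that point at derived views."""
    return [path for path in write_set if classify(path) == "derived"]
```

A check like this would flag the old write sets that listed `.lab/context/state.md`, while accepting the new ones that list `.lab/context/workflow-state.md` and canonical files.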
34
48
  ## Reviewer Priorities
35
49
 
36
50
  - Is the baseline fair and current?
package/package-assets/shared/skills/lab/stages/auto.md CHANGED
@@ -13,6 +13,7 @@
13
13
  - `.lab/config/workflow.json`
14
14
  - `.lab/context/mission.md`
15
15
  - `.lab/context/state.md`
16
+ - `.lab/context/workflow-state.md`
16
17
  - `.lab/context/decisions.md`
17
18
  - `.lab/context/data-decisions.md`
18
19
  - `.lab/context/evidence-index.md`
@@ -26,10 +27,11 @@
26
27
 
27
28
  - `.lab/context/mission.md`
28
29
  - `.lab/context/eval-protocol.md`
29
- - `.lab/context/state.md`
30
+ - `.lab/context/workflow-state.md`
30
31
  - `.lab/context/decisions.md`
31
32
  - `.lab/context/data-decisions.md`
32
33
  - `.lab/context/evidence-index.md`
34
+ - `.lab/context/next-action.md`
33
35
  - `.lab/context/summary.md`
34
36
  - `.lab/context/session-brief.md`
35
37
  - `.lab/context/auto-status.md`
@@ -39,6 +41,7 @@
39
41
 
40
42
  - Treat `/lab:auto` as an orchestration layer, not a replacement for existing `/lab:*` stages.
41
43
  - Treat `.lab/context/eval-protocol.md` as the source of truth for paper-facing metrics, metric glossary, table plan, gates, and structured experiment ladders.
44
+ - Treat `.lab/context/state.md` as a derived durable snapshot and `.lab/context/workflow-state.md` as the live workflow tracker. Auto mode should update canonical durable context plus `.lab/context/workflow-state.md`, then refresh derived views instead of treating `state.md` as a primary write target.
42
45
  - Treat the evaluation protocol as source-backed, not imagination-backed: metric definitions, baseline behavior, comparison implementations, and deviations must come from recorded sources before they are used in gates or promotions.
43
46
  - Treat `Academic Validity Checks` and `Integrity self-check` as mandatory automation gates. Auto mode should not proceed, promote, or declare success while those fields are missing, stale, or contradicted by the current rung.
44
47
  - Treat `Sanity and Alternative-Explanation Checks` as the anomaly gate for automation. When a rung yields all-null outputs, suspiciously identical runs, no-op deltas, or impl/result mismatches, pause promotion logic until implementation reality checks, alternative explanations, and at least one cross-check are recorded.
@@ -56,7 +59,7 @@
56
59
  - Default allowed stages are `run`, `iterate`, `review`, and `report`. Only include `write` when framing is already approved and manuscript drafting is within scope.
57
60
  - Do not automatically change the research mission, paper-facing framing, or core claims.
58
61
  - You may add exploratory datasets, benchmarks, and comparison methods inside the exploration envelope.
59
- - You may promote exploratory additions to the primary package only when the contract's promotion policy is satisfied and the promotion is written back into `data-decisions.md`, `decisions.md`, `state.md`, and `workflow-state.md`.
62
+ - You may promote exploratory additions to the primary package only when the contract's promotion policy is satisfied and the promotion is written back into `data-decisions.md`, `decisions.md`, and `workflow-state.md`, then refresh derived views.
60
63
  - Poll long-running commands until they finish, hit a timeout, or hit a stop condition.
61
64
  - Keep a poll-based waiting loop instead of sleeping blindly.
62
65
  - Do not treat a short watcher such as `sleep 30`, a one-shot `pgrep`, or a single `metrics.json` probe as the rung command when the real experiment is still running.
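The poll-based waiting rule above can be sketched as follows. This is a hedged illustration rather than the package's runner: `poll_until_done`, its parameters, and the returned status strings are hypothetical names, and the stop condition is passed in as a plain callable.

```python
import subprocess
import sys
import time
from typing import Callable, Sequence


def poll_until_done(
    cmd: Sequence[str],
    timeout_s: float,
    poll_interval_s: float = 1.0,
    should_stop: Callable[[], bool] = lambda: False,
) -> str:
    """Start the real experiment command and poll it until it ends.

    Returns 'completed', 'failed', 'stopped', or 'timed_out'.
    """
    proc = subprocess.Popen(cmd)
    deadline = time.monotonic() + timeout_s
    while True:
        # Check the actual process, not a stand-in watcher.
        if proc.poll() is not None:
            return "completed" if proc.returncode == 0 else "failed"
        if should_stop():
            proc.terminate()
            proc.wait()
            return "stopped"
        if time.monotonic() >= deadline:
            proc.terminate()
            proc.wait()
            return "timed_out"
        time.sleep(poll_interval_s)
```

The loop watches the experiment process itself, so a short `sleep` or a one-shot probe can never be mistaken for the rung command finishing.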
@@ -74,7 +77,7 @@
74
77
  - `review` must update canonical review context
75
78
  - `report` must produce `<deliverables_root>/report.md` and `<deliverables_root>/main-tables.md`
76
79
  - `write` must produce LaTeX output under `<deliverables_root>/paper/`
77
- - Treat promotion as incomplete unless it writes back to `data-decisions.md`, `decisions.md`, `state.md`, and `workflow-state.md`.
80
+ - Treat promotion as incomplete unless it writes back to `data-decisions.md`, `decisions.md`, and `workflow-state.md`, then refresh derived views.
78
81
  - Do not stop or promote on the basis of a metric or comparison claim whose source-backed definition is missing from the approved evaluation protocol.
79
82
  - Before each rung and before each success, stop, or promotion decision, re-check the generic academic-risk questions: setting semantics, visibility/leakage, anchor or label policy, scale comparability, metric validity, comparison validity, statistical validity, claim boundary, and integrity self-check.
80
83
  - Before each success, stop, or promotion decision, also re-check the anomaly policy: whether anomaly signals fired, whether simpler explanations were ruled out, whether a cross-check was performed, and whether the current interpretation is still the narrowest supported one.
package/package-assets/shared/skills/lab/stages/data.md CHANGED
@@ -24,7 +24,7 @@
24
24
 
25
25
  ## Context Write Set
26
26
 
27
- - `.lab/context/state.md`
27
+ - `.lab/context/workflow-state.md`
28
28
  - `.lab/context/decisions.md`
29
29
  - `.lab/context/data-decisions.md`
30
30
 
package/package-assets/shared/skills/lab/stages/framing.md CHANGED
@@ -23,7 +23,7 @@
23
23
 
24
24
  ## Context Write Set
25
25
 
26
- - `.lab/context/state.md`
26
+ - `.lab/context/workflow-state.md`
27
27
  - `.lab/context/decisions.md`
28
28
  - `.lab/context/terminology-lock.md`
29
29
 
package/package-assets/shared/skills/lab/stages/idea.md CHANGED
@@ -14,6 +14,10 @@
14
14
  - why the proposed idea is better than existing methods
15
15
  - rough plain-language approach description
16
16
  - three meaningful points
17
+ - brainstorm pass 1 with 2-4 candidate directions
18
+ - literature sweep 1 with 3-5 closest-prior references per direction
19
+ - brainstorm pass 2 that narrows to 1-2 surviving directions
20
+ - literature sweep 2 that expands the surviving directions into the full source bundle
17
21
  - literature scoping bundle with a default target of 20 sources, or an explicit explanation for a smaller scoped field
18
22
  - literature-backed framing
19
23
  - sourced datasets and metrics
@@ -32,7 +36,13 @@
32
36
  - Do not merge them into one undifferentiated summary.
33
37
  - Ask one clarifying question at a time when a missing assumption would materially change the proposal.
34
38
  - Build a source bundle before claiming novelty. The default target is 20 relevant sources split across closest prior work, recent strong papers, benchmark or evaluation papers, surveys or taxonomies, and adjacent-field work when useful.
35
- - If the field is genuinely too narrow to support that target, say so explicitly and justify the smaller literature bundle instead of silently skipping the search.
39
+ - Treat closest prior work, recent strong papers, benchmark or evaluation papers, and survey or taxonomy papers as mandatory coverage buckets. Do not leave those buckets empty in the final source bundle.
40
+ - Keep a separate idea source log that records the actual search queries, bucketed sources, and final source count for both literature sweeps.
41
+ - Use the first brainstorm pass only to generate candidate directions. Treat it as hypothesis generation, not as a novelty judgment.
42
+ - After brainstorm pass 1, run a first literature sweep that gathers 3-5 closest-prior references per direction before narrowing the idea.
43
+ - After literature sweep 1, run a second brainstorm pass that explicitly kills, merges, or narrows directions.
44
+ - Only after literature sweep 2 may the artifact give a final recommendation, paper fit, or novelty claim.
45
+ - If the field is genuinely too narrow to support that target, say so explicitly in both the idea artifact and the idea source log, and justify the smaller literature bundle instead of silently skipping the search.
36
46
  - The idea artifact must follow the repository `workflow_language`, not whichever language is easiest locally.
37
47
  - Before writing the full artifact, give the user a short summary with the one-sentence problem, why current methods fail, and the three meaningful points.
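The bucket and source-count rules above can be expressed as a small check. This is a sketch under stated assumptions and does not reproduce the logic of `validate_idea_artifact.py`: the function name, the dictionary shape, and the bucket labels as keys are all hypothetical.

```python
# The four buckets the text marks as mandatory (labels chosen for this sketch).
REQUIRED_BUCKETS = (
    "closest prior work",
    "recent strong papers",
    "benchmark or evaluation papers",
    "surveys or taxonomies",
)
DEFAULT_TARGET = 20  # default source target stated in the skill text


def check_source_bundle(
    bundle: dict[str, list[str]],
    narrow_field_justification: str = "",
) -> list[str]:
    """Return a list of problems; an empty list means the bundle passes."""
    problems = []
    for bucket in REQUIRED_BUCKETS:
        if not bundle.get(bucket):
            problems.append(f"empty mandatory bucket: {bucket}")
    total = sum(len(sources) for sources in bundle.values())
    # A smaller bundle is acceptable only with a written justification.
    if total < DEFAULT_TARGET and not narrow_field_justification.strip():
        problems.append(
            f"only {total} sources; target is {DEFAULT_TARGET} "
            "or a written narrow-field justification"
        )
    return problems
```

Adjacent-field work is left optional here, matching the "when useful" wording, so it counts toward the total but is never required.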
38
48
 
@@ -48,6 +58,11 @@
48
58
  - `.lab/context/decisions.md`
49
59
  - `.lab/context/open-questions.md`
50
60
 
61
+ ## Required Artifacts
62
+
63
+ - idea artifact derived from `.lab/.managed/templates/idea.md`
64
+ - idea source log at `.lab/writing/idea-source-log.md`, derived from `.lab/.managed/templates/idea-source-log.md`
65
+
51
66
  ## Recommended Structure
52
67
 
53
68
  1. Scenario
@@ -56,27 +71,38 @@
56
71
  4. Failure of existing methods
57
72
  5. Idea classification, contribution category, and breakthrough level
58
73
  6. Existing methods and shared assumptions
59
- 7. Literature scoping bundle
60
- 8. Closest-prior-work comparison
61
- 9. Rough approach in plain language
62
- 10. Why the proposed idea is better
63
- 11. Three meaningful points
64
- 12. Candidate approaches and recommendation
65
- 13. Dataset, baseline, and metric candidates
66
- 14. Falsifiable hypothesis
67
- 15. Expert critique
68
- 16. Revised proposal
69
- 17. Approval gate
70
- 18. Minimum viable experiment
74
+ 7. Brainstorm pass 1
75
+ 8. Literature sweep 1
76
+ 9. Literature scoping bundle
77
+ 10. Closest-prior-work comparison
78
+ 11. Brainstorm pass 2
79
+ 12. Literature sweep 2
80
+ 13. Rough approach in plain language
81
+ 14. Why the proposed idea is better
82
+ 15. Three meaningful points
83
+ 16. Candidate approaches and recommendation
84
+ 17. Dataset, baseline, and metric candidates
85
+ 18. Falsifiable hypothesis
86
+ 19. Expert critique
87
+ 20. Revised proposal or final recommendation
88
+ 21. Approval gate
89
+ 22. Minimum viable experiment
90
+ 23. Idea source log aligned with the two literature sweeps
71
91
 
72
92
  ## Writing Standard
73
93
 
74
94
  - Keep the problem statement short, concrete, and easy to scan.
75
95
  - Explain the scenario, target user or beneficiary, and why the problem matters before talking about novelty.
76
96
  - State why the target problem matters before talking about the method.
97
+ - Use brainstorm pass 1 to open the space, not to declare a winner.
98
+ - Use literature sweep 1 to test candidate directions against real papers before narrowing them.
99
+ - Use brainstorm pass 2 to explain what survived, what was rejected, and why.
100
+ - Use literature sweep 2 to support the final recommendation with real references across the required buckets.
77
101
  - Compare against existing methods explicitly, not by vague novelty language.
78
102
  - Do not call something new without a literature-scoping bundle and a closest-prior comparison.
103
+ - Do not call something paper-worthy or novel after only one brainstorm pass or one literature sweep.
104
+ - Do not treat the idea artifact itself as the only evidence record; keep `.lab/writing/idea-source-log.md` synchronized with the actual searches and source buckets used in both literature sweeps.
79
105
  - Explain what current methods do, why they fall short, and roughly how the proposed idea would work in plain language.
80
106
  - The three meaningful points should each fit in one direct sentence.
81
- - Before approval, run `.lab/.managed/scripts/validate_idea_artifact.py --idea <idea-artifact> --workflow-config .lab/config/workflow.json`.
107
+ - Before approval, run `.lab/.managed/scripts/validate_idea_artifact.py --idea <idea-artifact> --source-log .lab/writing/idea-source-log.md --workflow-config .lab/config/workflow.json`.
82
108
  - Do not leave `.lab/context/mission.md` as an empty template after convergence; write the approved problem, why it matters, the current benchmark scope, and the approved direction back into canonical context.
package/package-assets/shared/skills/lab/stages/iterate.md CHANGED
@@ -17,6 +17,7 @@ Declare and keep fixed:
17
17
 
18
18
  - `.lab/context/mission.md`
19
19
  - `.lab/context/state.md`
20
+ - `.lab/context/workflow-state.md`
20
21
  - `.lab/context/decisions.md`
21
22
  - `.lab/context/evidence-index.md`
22
23
  - `.lab/context/data-decisions.md`
@@ -25,7 +26,7 @@ Declare and keep fixed:
25
26
 
26
27
  ## Context Write Set
27
28
 
28
- - `.lab/context/state.md`
29
+ - `.lab/context/workflow-state.md`
29
30
  - `.lab/context/decisions.md`
30
31
  - `.lab/context/evidence-index.md`
31
32
  - `.lab/context/open-questions.md`
@@ -62,6 +63,7 @@ If the loop stops without success, record:
62
63
  - Keep figures or plots under `figures_root`.
63
64
  - Do not accumulate long-lived results under `.lab/changes/<change-id>/runs`.
64
65
  - Do not change metric definitions, baseline semantics, or comparison implementations unless the approved evaluation protocol records both their sources and any deviations.
66
+ - Write durable findings and evidence boundary changes into canonical context such as `.lab/context/decisions.md`, `.lab/context/evidence-index.md`, `.lab/context/open-questions.md`, and `.lab/context/eval-protocol.md`, then refresh the derived `state.md` snapshot. Write round-to-round execution progress into `.lab/context/workflow-state.md`.
65
67
  - When you change ladders, sample sizes, or promotion gates, keep the resulting logic anchored to the source-backed evaluation protocol instead of ad-hoc chat reasoning.
66
68
  - Keep `.lab/context/eval-protocol.md` synchronized with the active benchmark scope, ladder gates, source-backed metric definitions, and any accepted implementation deviations instead of leaving it as a stale template.
67
69
  - Re-run the `Academic Validity Checks` and `Integrity self-check` whenever you change inputs, anchors, labels, metrics, comparisons, or promotion logic.
package/package-assets/shared/skills/lab/stages/report.md CHANGED
@@ -39,8 +39,8 @@
39
39
 
40
40
  - `.lab/context/mission.md`
41
41
  - `.lab/context/eval-protocol.md`
42
- - `.lab/context/state.md`
43
42
  - `.lab/context/workflow-state.md`
43
+ - `.lab/context/decisions.md`
44
44
  - `.lab/context/evidence-index.md`
45
45
 
46
46
  ## Evidence Rules
@@ -71,6 +71,7 @@
71
71
  - If the existing `report.md` or `main-tables.md` is missing required collaborator-facing sections from the managed templates, treat that as a report deficiency. A rerun must repair the missing sections instead of declaring "no content change" or treating the rerun as a no-op.
72
72
  - After drafting or rerunning the report, run `.lab/.managed/scripts/validate_collaborator_report.py --report <deliverables_root>/report.md --main-tables <deliverables_root>/main-tables.md`. If it fails, keep editing until it passes; do not stop at a no-op audit rerun.
73
73
  - Do not mix workflow deliverable status, rerun ids, or manuscript skeleton status into validated scientific findings; keep those in `<deliverables_root>/artifact-status.md`.
74
+ - Write durable report-level conclusions into canonical context such as `.lab/context/mission.md`, `.lab/context/eval-protocol.md`, `.lab/context/decisions.md`, and `.lab/context/evidence-index.md`, then refresh the derived `state.md` snapshot. Write live reporting progress or immediate handoff actions into `.lab/context/workflow-state.md`.
74
75
  - If `.lab/config/workflow.json` sets the workflow language to Chinese, write `report.md` and `<deliverables_root>/main-tables.md` in Chinese unless a file path, code identifier, or literal metric name must remain unchanged.
75
76
  - Prefer conservative interpretation over marketing language.
76
77
  - Leave a clear handoff path into `/lab:write` with evidence links that section drafts can cite.
package/package-assets/shared/skills/lab/stages/review.md CHANGED
@@ -19,8 +19,10 @@
19
19
 
20
20
  ## Context Write Set
21
21
 
22
- - `.lab/context/state.md`
22
+ - `.lab/context/workflow-state.md`
23
+ - `.lab/context/decisions.md`
23
24
  - `.lab/context/open-questions.md`
25
+ - `.lab/context/evidence-index.md`
24
26
 
25
27
  ## Reviewer Priorities
26
28
 
package/package-assets/shared/skills/lab/stages/run.md CHANGED
@@ -12,13 +12,15 @@
12
12
 
13
13
  - `.lab/context/mission.md`
14
14
  - `.lab/context/state.md`
15
+ - `.lab/context/workflow-state.md`
15
16
  - `.lab/context/data-decisions.md`
16
17
  - `.lab/context/eval-protocol.md`
17
18
  - `.lab/config/workflow.json`
18
19
 
19
20
  ## Context Write Set
20
21
 
21
- - `.lab/context/state.md`
22
+ - `.lab/context/workflow-state.md`
23
+ - `.lab/context/decisions.md`
22
24
  - `.lab/context/evidence-index.md`
23
25
  - `.lab/context/eval-protocol.md`
24
26
 
@@ -30,6 +32,7 @@
30
32
  - Do not invent metric definitions, baseline behavior, or comparison implementations from memory; anchor them to the approved evaluation protocol and its recorded sources.
31
33
  - Treat `Academic Validity Checks` and `Integrity self-check` as preflight gates, not optional notes. Do not bless a run as protocol-valid until those fields are filled and still match the current experiment.
32
34
  - Treat `Sanity and Alternative-Explanation Checks` as a second preflight gate. If anomaly signals have fired and the implementation reality checks, alternative explanations, cross-check method, best-supported interpretation, or escalation threshold are still blank, do not bless the run as valid evidence.
35
+ - Write durable research conclusions into canonical context such as `.lab/context/decisions.md`, `.lab/context/evidence-index.md`, and `.lab/context/eval-protocol.md`, then refresh the derived `state.md` snapshot. Keep live execution progress in `.lab/context/workflow-state.md`.
33
36
  - If `.lab/context/eval-protocol.md` is still skeletal, write the smallest trustworthy version of the current evaluation objective, metric set, ladder, and source-backed implementation notes before treating the run as the new protocol anchor.
34
37
  - Refuse to treat a run as scientifically valid if the protocol has not answered the generic academic-risk questions: setting semantics, visibility/leakage, anchor or label policy, scale comparability, metric validity, comparison validity, statistical validity, and claim boundary.
35
38
  - Treat all-null outputs, suspiciously identical reruns, no-op deltas, and impl/result mismatches as diagnostic triggers first; check code paths and rule out simpler explanations before interpreting them as findings.
package/package-assets/shared/skills/lab/stages/spec.md CHANGED
@@ -17,11 +17,12 @@
17
17
  - `.lab/context/mission.md`
18
18
  - `.lab/context/decisions.md`
19
19
  - `.lab/context/state.md`
20
+ - `.lab/context/workflow-state.md`
20
21
  - `.lab/context/data-decisions.md`
21
22
 
22
23
  ## Context Write Set
23
24
 
24
- - `.lab/context/state.md`
25
+ - `.lab/context/workflow-state.md`
25
26
  - `.lab/context/decisions.md`
26
27
 
27
28
  ## Required Change Layout
package/package-assets/shared/skills/lab/stages/write.md CHANGED
@@ -29,7 +29,7 @@
29
29
 
30
30
  ## Context Write Set
31
31
 
32
- - `.lab/context/state.md`
32
+ - `.lab/context/workflow-state.md`
33
33
  - `.lab/context/evidence-index.md`
34
34
 
35
35
  ## Required References
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "superlab",
3
- "version": "0.1.28",
3
+ "version": "0.1.29",
4
4
  "description": "Strict /lab research workflow installer for Codex and Claude",
5
5
  "keywords": [
6
6
  "codex",