superlab 0.1.28 → 0.1.30

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (29)
  1. package/lib/auto_contracts.cjs +1 -3
  2. package/lib/auto_runner.cjs +0 -1
  3. package/lib/context.cjs +22 -30
  4. package/lib/i18n.cjs +152 -44
  5. package/lib/lab_idea_contract.json +8 -0
  6. package/package-assets/claude/commands/lab-idea.md +1 -1
  7. package/package-assets/claude/commands/lab.md +2 -4
  8. package/package-assets/codex/prompts/lab-idea.md +1 -1
  9. package/package-assets/codex/prompts/lab.md +2 -4
  10. package/package-assets/shared/lab/.managed/scripts/validate_idea_artifact.py +240 -1
  11. package/package-assets/shared/lab/.managed/templates/idea-source-log.md +37 -0
  12. package/package-assets/shared/lab/.managed/templates/idea.md +67 -1
  13. package/package-assets/shared/lab/context/auto-mode.md +2 -2
  14. package/package-assets/shared/lab/context/session-brief.md +3 -14
  15. package/package-assets/shared/lab/context/state.md +2 -0
  16. package/package-assets/shared/lab/system/core.md +4 -3
  17. package/package-assets/shared/skills/lab/SKILL.md +19 -11
  18. package/package-assets/shared/skills/lab/references/workflow.md +14 -0
  19. package/package-assets/shared/skills/lab/stages/auto.md +6 -3
  20. package/package-assets/shared/skills/lab/stages/data.md +1 -1
  21. package/package-assets/shared/skills/lab/stages/framing.md +1 -1
  22. package/package-assets/shared/skills/lab/stages/idea.md +55 -14
  23. package/package-assets/shared/skills/lab/stages/iterate.md +3 -1
  24. package/package-assets/shared/skills/lab/stages/report.md +2 -1
  25. package/package-assets/shared/skills/lab/stages/review.md +3 -1
  26. package/package-assets/shared/skills/lab/stages/run.md +4 -1
  27. package/package-assets/shared/skills/lab/stages/spec.md +2 -1
  28. package/package-assets/shared/skills/lab/stages/write.md +1 -1
  29. package/package.json +1 -1
@@ -16,7 +16,7 @@ Use the same repository artifacts and stage boundaries every time.
  ## Stage Aliases
 
  - `/lab idea ...` or `/lab-idea`
- Research the idea, define the problem and failure case, classify the contribution and breakthrough level, compare against existing methods, end with three meaningful points, and keep an explicit approval gate before any implementation.
+ Research the idea through two brainstorm passes and two literature sweeps, define the problem and failure case, compare against closest prior work, then end with a source-backed recommendation and an explicit approval gate before any implementation.
 
  - `/lab data ...` or `/lab-data`
  Turn the approved idea into an approved dataset and benchmark package with dataset years, papers that used each dataset, source audit, download plan, classic-public versus recent-strong-public versus claim-specific benchmark roles, and explicit rationale for canonical baselines, strong historical baselines, recent strong public methods, and closest prior work.
@@ -58,9 +58,7 @@ Use the same repository artifacts and stage boundaries every time.
  - `iterate` requires a normalized summary from `scripts/eval_report.py`.
  - `run`, `iterate`, `auto`, and `report` should all follow `.lab/context/eval-protocol.md`, including its recorded sources for metrics and comparison implementations.
  - `write` requires an approved framing artifact from the `framing` stage.
- - `write` requires stable report artifacts and should only change one section per round while following the installed write-stage contract under `skills/lab/stages/write.md`.
- - `write` should use `.lab/writing/plan.md` as the write-time source of truth for planned tables, figures, citations, and asset coverage.
- - `write` should treat section-quality, claim-safety, and manuscript-delivery validators as the canonical acceptance gates for final-draft or export rounds.
+ - `write` requires stable report artifacts and must follow the installed write-stage contract under `skills/lab/stages/write.md` instead of re-stating write-specific rules here.
 
  ## How to Ask for `/lab auto`
 
@@ -6,4 +6,4 @@ argument-hint: idea or research problem
  Use the installed `lab` skill at `.codex/skills/lab/SKILL.md`.
 
  Execute the requested `/lab:idea` stage against the user's argument now. Do not only recommend another lab stage. If a blocking prerequisite is missing, say exactly what is missing and ask at most one clarifying question.
- This command runs the `/lab:idea` stage. It must produce a collaborator-readable proposal memo with a plain-language scenario, problem, why-it-matters explanation, explicit current-method landscape, closest-prior-work comparison, a literature scoping bundle that defaults to roughly 20 relevant sources unless the field is too narrow, a rough approach description, and a minimum viable experiment before the approval gate.
+ This command runs the `/lab:idea` stage. Use `.codex/skills/lab/stages/idea.md` as the single source of truth for the two brainstorm passes, two literature sweeps, closest-prior comparison, source-backed proposal memo, evaluation sketch, tentative contributions, user guidance, minimum viable experiment, and approval gate. Start with brainstorm pass 1 over 2-4 candidate directions, run literature sweep 1 with real closest-prior references for each direction, narrow the field with brainstorm pass 2, then run literature sweep 2 to build the final source bundle before producing a collaborator-readable recommendation. The final idea memo must explain the real-world scenario, the problem solved, why current methods fall short, roughly how the idea would work, how it would be evaluated, what the tentative contributions are, and what the user should decide next. Keep `.lab/writing/idea-source-log.md` synchronized with the actual search queries, bucketed sources, and final source count used in both sweeps. The literature bundle should default to about 20 sources unless the field is genuinely narrow and that smaller bundle is explicitly justified.
@@ -10,7 +10,7 @@ argument-hint: workflow question or stage choice
  ## Subcommands
 
  - `/lab:idea`
- Research the idea, define the problem and failure case, classify the contribution and breakthrough level, compare against existing methods, end with three meaningful points, and keep an explicit approval gate before any implementation.
+ Research the idea through two brainstorm passes and two literature sweeps, define the problem and failure case, compare against closest prior work, then end with a source-backed recommendation and an explicit approval gate before any implementation.
 
  - `/lab:data`
  Turn the approved idea into an approved dataset and benchmark package with dataset years, papers that used each dataset, source audit, download plan, classic-public versus recent-strong-public versus claim-specific benchmark roles, and explicit rationale for canonical baselines, strong historical baselines, recent strong public methods, and closest prior work.
@@ -52,9 +52,7 @@ argument-hint: workflow question or stage choice
  - `/lab:iterate` requires a normalized summary from `scripts/eval_report.py`.
  - `/lab:run`, `/lab:iterate`, `/lab:auto`, and `/lab:report` should all follow `.lab/context/eval-protocol.md`, including its recorded sources for metrics and comparison implementations.
  - `/lab:write` requires an approved framing artifact from `/lab:framing`.
- - `/lab:write` requires stable report artifacts and should only change one section per round while following the installed write-stage contract under `skills/lab/stages/write.md`.
- - `/lab:write` should use `.lab/writing/plan.md` as the write-time source of truth for planned tables, figures, citations, and asset coverage.
- - `/lab:write` should treat section-quality, claim-safety, and manuscript-delivery validators as the canonical acceptance gates for final-draft or export rounds.
+ - `/lab:write` requires stable report artifacts and must follow the installed write-stage contract under `skills/lab/stages/write.md` instead of re-stating write-specific rules here.
 
  ## How to Ask for `/lab:auto`
 
@@ -11,19 +11,60 @@ REQUIRED_SECTIONS = {
      "One-Sentence Problem": [r"^##\s+One-Sentence Problem\s*$", r"^##\s+一句话问题(?:定义)?\s*$"],
      "Why It Matters": [r"^##\s+Why It Matters\s*$", r"^##\s+为什么重要\s*$"],
      "Existing Methods": [r"^##\s+Existing Methods\s*$", r"^##\s+现有方法(?:与失败模式)?\s*$"],
+     "Brainstorm Pass 1": [r"^##\s+Brainstorm Pass 1\s*$", r"^##\s+第一轮脑暴\s*$"],
+     "Literature Sweep 1": [r"^##\s+Literature Sweep 1\s*$", r"^##\s+第一轮文献(?:检索|收敛)?\s*$"],
      "Literature Scoping Bundle": [r"^##\s+Literature Scoping Bundle\s*$", r"^##\s+文献范围(?:包)?\s*$"],
      "Closest Prior Work Comparison": [r"^##\s+Closest Prior Work Comparison\s*$", r"^##\s+最接近前作对照\s*$"],
+     "Brainstorm Pass 2": [r"^##\s+Brainstorm Pass 2\s*$", r"^##\s+第二轮脑暴\s*$"],
+     "Literature Sweep 2": [r"^##\s+Literature Sweep 2\s*$", r"^##\s+第二轮文献(?:检索|收敛)?\s*$"],
      "Rough Approach": [r"^##\s+Rough Approach\s*$", r"^##\s+我们准备怎么做\s*$"],
+     "Problem Solved": [r"^##\s+Problem Solved\s*$", r"^##\s+解决了什么问题\s*$"],
+     "Evaluation Sketch": [r"^##\s+Evaluation Sketch\s*$", r"^##\s+评测草图\s*$"],
+     "Tentative Contributions": [r"^##\s+Tentative Contributions\s*$", r"^##\s+暂定贡献\s*$"],
      "Candidate Experiment": [r"^##\s+Candidate Experiment\s*$", r"^##\s+(?:最小实验|候选实验)\s*$"],
      "Falsifiable Hypothesis": [r"^##\s+Falsifiable Hypothesis\s*$", r"^##\s+可证伪假设\s*$"],
+     "Final Recommendation": [r"^##\s+Final Recommendation\s*$", r"^##\s+最终推荐\s*$"],
+     "User Guidance": [r"^##\s+User Guidance\s*$", r"^##\s+用户引导\s*$"],
+ }
+
+ SOURCE_LOG_SECTIONS = {
+     "Search Intent": [r"^##\s+Search Intent\s*$", r"^##\s+检索意图\s*$"],
+     "Sweep 1 Log": [r"^##\s+Sweep 1 Log\s*$", r"^##\s+第一轮检索记录\s*$"],
+     "Sweep 2 Log": [r"^##\s+Sweep 2 Log\s*$", r"^##\s+第二轮检索记录\s*$"],
+     "Source Integrity Notes": [r"^##\s+Source Integrity Notes\s*$", r"^##\s+来源完整性说明\s*$"],
+ }
+
+ REFERENCE_PATTERN = re.compile(
+     r"(https?://\S+|arxiv\.org/\S+|doi\.org/\S+|doi:\s*\S+|\[[^\]]+\]\([^)]+\))",
+     flags=re.IGNORECASE,
+ )
+
+ MANDATORY_SOURCE_BUCKETS = {
+     "Closest prior": (
+         ("Closest prior bucket", "最接近前作来源数", "最接近前作 bucket"),
+         ("Closest prior", "最接近前作"),
+     ),
+     "Recent strong papers": (
+         ("Recent strong papers", "近期强相关论文", "近期强论文"),
+         ("Recent strong papers", "近期强相关论文", "近期强论文"),
+     ),
+     "Benchmark or evaluation papers": (
+         ("Benchmark or evaluation papers", "基准或评测论文", "基准论文"),
+         ("Benchmark or evaluation papers", "基准或评测论文", "基准论文"),
+     ),
+     "Survey or taxonomy papers": (
+         ("Survey or taxonomy papers", "综述或 taxonomy 论文", "综述论文"),
+         ("Survey or taxonomy papers", "综述或 taxonomy 论文", "综述论文"),
+     ),
  }
 
 
  def parse_args():
      parser = argparse.ArgumentParser(
-         description="Validate that an idea artifact is source-backed, plain-language, and aligned with workflow language."
+         description="Validate that an idea artifact and idea source log are source-backed, plain-language, and aligned with workflow language."
      )
      parser.add_argument("--idea", required=True, help="Path to the idea artifact markdown file")
+     parser.add_argument("--source-log", required=True, help="Path to the idea source log markdown file")
      parser.add_argument("--workflow-config", required=True, help="Path to .lab/config/workflow.json")
      return parser.parse_args()
 
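The new `REFERENCE_PATTERN` treats bare URLs, arXiv and DOI paths, `doi:` strings, and markdown links as reference markers. A quick runnable sketch of its counting behavior, re-stated outside the diff; the sample citations below are made-up placeholders, not real sources:

```python
import re

# Copy of the REFERENCE_PATTERN added in validate_idea_artifact.py.
REFERENCE_PATTERN = re.compile(
    r"(https?://\S+|arxiv\.org/\S+|doi\.org/\S+|doi:\s*\S+|\[[^\]]+\]\([^)]+\))",
    flags=re.IGNORECASE,
)

# Hypothetical sweep excerpt; every citation here is a placeholder.
sample = """
- Closest prior: [Foo et al. 2023](https://arxiv.org/abs/2301.00001)
- Recent strong: arxiv.org/abs/2405.12345
- Benchmark: doi:10.1000/xyz123
"""

# The markdown link matches as a single reference (the link branch consumes
# the embedded URL), and the bare arXiv path and doi: form match once each.
matches = REFERENCE_PATTERN.findall(sample)
print(len(matches))  # 3
```

Because the markdown-link alternative swallows the whole `[text](url)` span, a linked citation is not double-counted against the `https?://` branch.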
@@ -57,11 +98,36 @@ def missing_sections(text: str) -> list[str]:
      return missing
 
 
+ def missing_source_log_sections(text: str) -> list[str]:
+     missing = []
+     for section_name, patterns in SOURCE_LOG_SECTIONS.items():
+         if not any(re.search(pattern, text, flags=re.MULTILINE) for pattern in patterns):
+             missing.append(section_name)
+     return missing
+
+
  def contains_any(text: str, needles: tuple[str, ...]) -> bool:
      lowered = text.lower()
      return any(needle.lower() in lowered for needle in needles)
 
 
+ def count_references(text: str) -> int:
+     return len(REFERENCE_PATTERN.findall(text))
+
+
+ def unique_references(text: str) -> set[str]:
+     return {match[0] if isinstance(match, tuple) else match for match in REFERENCE_PATTERN.findall(text)}
+
+
+ def extract_numeric_field(body: str, labels: tuple[str, ...]) -> int | None:
+     for label in labels:
+         pattern = re.compile(rf"{re.escape(label)}\s*[::]\s*(\d+)", flags=re.IGNORECASE)
+         match = pattern.search(body)
+         if match:
+             return int(match.group(1))
+     return None
+
+
  def has_field_value(body: str, labels: tuple[str, ...]) -> bool:
      for label in labels:
          pattern = re.compile(rf"^\s*(?:-|\d+\.)\s*{re.escape(label)}[::][ \t]*([^\n]+?)\s*$", flags=re.MULTILINE)
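The added `extract_numeric_field` helper reads `Label: 18` style fields and accepts either an ASCII or a full-width colon, which is what lets the English and Chinese templates share one validator. A runnable sketch, with a hypothetical bundle excerpt:

```python
import re

# Sketch of the extract_numeric_field helper added in this version.
def extract_numeric_field(body, labels):
    for label in labels:
        # The [::] character class accepts an ASCII or full-width colon.
        pattern = re.compile(rf"{re.escape(label)}\s*[::]\s*(\d+)", flags=re.IGNORECASE)
        match = pattern.search(body)
        if match:
            return int(match.group(1))
    return None

# Hypothetical literature-bundle excerpt with one English and one Chinese label.
bundle = "- Default target source count: 20\n- 当前已覆盖来源数:18\n"

print(extract_numeric_field(bundle, ("Default target source count",)))  # 20
print(extract_numeric_field(bundle, ("Actual source count", "当前已覆盖来源数")))  # 18
print(extract_numeric_field(bundle, ("Closest prior bucket",)))  # None
```

Labels are tried in order, so the English field name can be listed first with the Chinese alias as a fallback.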
@@ -72,6 +138,38 @@ def has_field_value(body: str, labels: tuple[str, ...]) -> bool:
      return False
 
 
+ def has_non_placeholder_field_value(body: str, labels: tuple[str, ...]) -> bool:
+     return has_field_value(body, labels)
+
+
+ def extract_bucket_body(body: str, labels: tuple[str, ...]) -> str:
+     lines = body.splitlines()
+     start_index = None
+     start_indent = 0
+     for index, line in enumerate(lines):
+         stripped = line.lstrip()
+         indent = len(line) - len(stripped)
+         for label in labels:
+             if re.match(rf"^-\s*{re.escape(label)}\s*:\s*$", stripped, flags=re.IGNORECASE):
+                 start_index = index + 1
+                 start_indent = indent
+                 break
+         if start_index is not None:
+             break
+
+     if start_index is None:
+         return ""
+
+     captured: list[str] = []
+     for line in lines[start_index:]:
+         stripped = line.lstrip()
+         indent = len(line) - len(stripped)
+         if stripped.startswith("- ") and indent <= start_indent:
+             break
+         captured.append(line)
+     return "\n".join(captured).strip()
+
+
  def validate_content(text: str) -> list[str]:
      issues: list[str] = []
      scenario = extract_section_body(text, REQUIRED_SECTIONS["Scenario"])
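The new `extract_bucket_body` helper finds a `- Label:` bullet and captures the indented lines beneath it, stopping at the next bullet at the same or a shallower indent. A runnable sketch against a hypothetical Sweep 2 Log excerpt (the URLs are placeholders):

```python
import re

# Sketch of the extract_bucket_body helper added in this version.
def extract_bucket_body(body, labels):
    lines = body.splitlines()
    start_index = None
    start_indent = 0
    for index, line in enumerate(lines):
        stripped = line.lstrip()
        indent = len(line) - len(stripped)
        for label in labels:
            # The bullet must end at the colon; nested lines carry the values.
            if re.match(rf"^-\s*{re.escape(label)}\s*:\s*$", stripped, flags=re.IGNORECASE):
                start_index = index + 1
                start_indent = indent
                break
        if start_index is not None:
            break
    if start_index is None:
        return ""
    captured = []
    for line in lines[start_index:]:
        stripped = line.lstrip()
        indent = len(line) - len(stripped)
        # Stop at the next bullet at the same or a shallower indent.
        if stripped.startswith("- ") and indent <= start_indent:
            break
        captured.append(line)
    return "\n".join(captured).strip()

sweep = (
    "- Closest prior:\n"
    "  - https://arxiv.org/abs/2301.00001\n"
    "- Recent strong papers:\n"
    "  - https://arxiv.org/abs/2405.12345\n"
)

closest = extract_bucket_body(sweep, ("Closest prior", "最接近前作"))
print(closest)  # only the reference nested under "Closest prior:" survives
```

This is how the per-bucket reference checks below isolate one bucket's sources without being fooled by references in neighboring buckets.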
@@ -86,22 +184,88 @@ def validate_content(text: str) -> list[str]:
      if not contains_any(existing_methods, ("mainstream", "current", "shared assumption", "主流", "共同假设", "不够", "不足")):
          issues.append("idea artifact is missing a concrete current-methods landscape")
 
+     brainstorm_1 = extract_section_body(text, REQUIRED_SECTIONS["Brainstorm Pass 1"])
+     if not contains_any(brainstorm_1, ("candidate direction", "候选方向", "worth checking", "值得检查")):
+         issues.append("idea artifact is missing a real brainstorm pass 1 shortlist")
+
+     sweep_1 = extract_section_body(text, REQUIRED_SECTIONS["Literature Sweep 1"])
+     if count_references(sweep_1) < 3:
+         issues.append("idea artifact is missing literature sweep 1 with real references")
+
      literature = extract_section_body(text, REQUIRED_SECTIONS["Literature Scoping Bundle"])
      if not contains_any(literature, ("target", "source count", "closest prior", "recent strong", "benchmark", "survey", "adjacent", "目标", "来源数", "最接近前作", "近期", "基准", "综述", "相邻领域")):
          issues.append("idea artifact is missing a literature scoping bundle")
+     if not re.search(r"(actual source count|当前已覆盖来源数|实际来源数)\s*[::]\s*\d+", literature, flags=re.IGNORECASE):
+         issues.append("idea artifact is missing a concrete literature source count")
+     default_target = extract_numeric_field(literature, ("Default target source count", "默认目标来源数", "默认来源目标数"))
+     actual_source_count = extract_numeric_field(literature, ("Actual source count", "当前已覆盖来源数", "实际来源数"))
+     if default_target is None:
+         issues.append("idea artifact is missing a default target source count")
+     if actual_source_count is not None and default_target is not None and actual_source_count < default_target:
+         if not has_non_placeholder_field_value(
+             literature,
+             ("If the total is below the default target, why", "如果总数低于默认目标,为什么"),
+         ):
+             issues.append("idea artifact is below the default target without explaining why the smaller source bundle is acceptable")
+     for bucket_name, (count_labels, _) in MANDATORY_SOURCE_BUCKETS.items():
+         bucket_count = extract_numeric_field(literature, count_labels)
+         if bucket_count is None or bucket_count <= 0:
+             issues.append(f"idea artifact is missing mandatory literature coverage for {bucket_name.lower()}")
 
      closest_prior = extract_section_body(text, REQUIRED_SECTIONS["Closest Prior Work Comparison"])
      if not contains_any(closest_prior, ("citation", "difference", "limitation", "引用", "差异", "局限")):
          issues.append("idea artifact is missing a closest prior work comparison")
+     if count_references(closest_prior) < 1:
+         issues.append("idea artifact is missing real reference markers in the closest prior work comparison")
+
+     brainstorm_2 = extract_section_body(text, REQUIRED_SECTIONS["Brainstorm Pass 2"])
+     if not contains_any(brainstorm_2, ("surviving direction", "recommended narrowed direction", "surviving", "幸存方向", "推荐收敛方向", "淘汰")):
+         issues.append("idea artifact is missing a real brainstorm pass 2 narrowing step")
+
+     sweep_2 = extract_section_body(text, REQUIRED_SECTIONS["Literature Sweep 2"])
+     if count_references(sweep_2) < 5:
+         issues.append("idea artifact is missing literature sweep 2 with real references")
 
      rough_approach = extract_section_body(text, REQUIRED_SECTIONS["Rough Approach"])
      if not contains_any(rough_approach, ("plain-language", "how this would work", "粗略做法", "怎么做", "why this design", "为什么")):
          issues.append("idea artifact is missing a rough plain-language approach")
 
+     problem_solved = extract_section_body(text, REQUIRED_SECTIONS["Problem Solved"])
+     if not has_field_value(problem_solved, ("In plain language", "白话问题", "用大白话说")):
+         issues.append("idea artifact is missing a plain-language statement of what problem the idea actually solves")
+     if not has_field_value(problem_solved, ("What becomes possible if this works", "如果这条路成立", "如果这条路可行")):
+         issues.append("idea artifact is missing the payoff of solving the proposed problem")
+
+     evaluation_sketch = extract_section_body(text, REQUIRED_SECTIONS["Evaluation Sketch"])
+     if not has_field_value(evaluation_sketch, ("Evaluation subject", "评测对象")):
+         issues.append("idea artifact is missing an evaluation sketch with the evaluation subject")
+     if not has_field_value(evaluation_sketch, ("Proxy or simulator, if any", "代理或模拟器")):
+         issues.append("idea artifact is missing an evaluation sketch that states any proxy or simulator")
+     if not has_field_value(evaluation_sketch, ("Main outcome to observe", "主要观察结果")):
+         issues.append("idea artifact is missing the main outcome in the evaluation sketch")
+     if not has_field_value(evaluation_sketch, ("Main validity risk", "主要有效性风险")):
+         issues.append("idea artifact is missing the main validity risk in the evaluation sketch")
+
+     tentative_contributions = extract_section_body(text, REQUIRED_SECTIONS["Tentative Contributions"])
+     if sum(1 for label in ("Contribution 1", "Contribution 2", "Contribution 3", "贡献 1", "贡献 2", "贡献 3") if has_field_value(tentative_contributions, (label,))) < 2:
+         issues.append("idea artifact is missing tentative contributions stated at the idea level")
+
      experiment = extract_section_body(text, REQUIRED_SECTIONS["Candidate Experiment"])
      if not contains_any(experiment, ("minimum viable experiment", "minimum experiment", "dataset", "metric", "最小实验", "主指标", "次指标")):
          issues.append("idea artifact is missing a minimum experiment")
 
+     final_recommendation = extract_section_body(text, REQUIRED_SECTIONS["Final Recommendation"])
+     if not contains_any(final_recommendation, ("recommended direction", "paper-worthy", "推荐方向", "值得做论文")):
+         issues.append("idea artifact is missing a final recommendation after the second sweep")
+
+     user_guidance = extract_section_body(text, REQUIRED_SECTIONS["User Guidance"])
+     if not has_field_value(user_guidance, ("Immediate decision needed from the user", "现在最需要你确认的选择", "Immediate decision")):
+         issues.append("idea artifact is missing user guidance about the next decision")
+     if not has_field_value(user_guidance, ("Information that would sharpen the idea", "哪些信息会显著提高下一轮判断质量", "Information that would sharpen")):
+         issues.append("idea artifact is missing user guidance about what information would sharpen the idea")
+     if not has_field_value(user_guidance, ("Recommended next stage", "推荐下一步")):
+         issues.append("idea artifact is missing user guidance about the next lab stage")
+
      return issues
 
 
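All of these checks hang off the bilingual section regexes at the top of the file: each required section accepts either an English or a Chinese `## ` heading, matched with `re.MULTILINE`. A minimal runnable sketch using just two of the new entries:

```python
import re

# Two of the new REQUIRED_SECTIONS entries, re-stated for a standalone demo.
REQUIRED = {
    "Brainstorm Pass 1": [r"^##\s+Brainstorm Pass 1\s*$", r"^##\s+第一轮脑暴\s*$"],
    "Final Recommendation": [r"^##\s+Final Recommendation\s*$", r"^##\s+最终推荐\s*$"],
}

def missing_sections(text):
    # A section passes if any of its heading patterns matches a whole line.
    missing = []
    for name, patterns in REQUIRED.items():
        if not any(re.search(p, text, flags=re.MULTILINE) for p in patterns):
            missing.append(name)
    return missing

# The Chinese heading satisfies "Brainstorm Pass 1"; nothing satisfies
# "Final Recommendation", so only that section is reported missing.
idea = "## 第一轮脑暴\n- Candidate direction 1: ...\n"
print(missing_sections(idea))  # ['Final Recommendation']
```

This is why an artifact written entirely in Chinese, or entirely in English, validates against the same section table.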
@@ -114,25 +278,100 @@ def validate_language(text: str, workflow_language: str) -> list[str]:
      return []
 
 
+ def validate_source_log(text: str) -> list[str]:
+     issues: list[str] = []
+     search_intent = extract_section_body(text, SOURCE_LOG_SECTIONS["Search Intent"])
+     if not contains_any(search_intent, ("problem framing", "search constraints", "search window", "问题 framing", "检索约束", "检索窗口", "目标方向")):
+         issues.append("idea source log is missing search intent and search boundary details")
+
+     sweep_1 = extract_section_body(text, SOURCE_LOG_SECTIONS["Sweep 1 Log"])
+     if not contains_any(sweep_1, ("query strings", "sources found", "查询", "已找到来源")):
+         issues.append("idea source log is missing sweep 1 search queries or source listings")
+     if count_references(sweep_1) < 3:
+         issues.append("idea source log is missing enough real references for sweep 1")
+
+     sweep_2 = extract_section_body(text, SOURCE_LOG_SECTIONS["Sweep 2 Log"])
+     if not contains_any(sweep_2, ("final source bundle", "closest prior", "recent strong", "benchmark", "survey", "adjacent", "final source bundle", "最终来源包")):
+         issues.append("idea source log is missing a bucketed sweep 2 source bundle")
+     if count_references(sweep_2) < 5:
+         issues.append("idea source log is missing enough real references for sweep 2")
+     actual_count = extract_numeric_field(sweep_2, ("Actual source count", "实际来源数", "当前已覆盖来源数"))
+     if actual_count is None:
+         issues.append("idea source log is missing an actual source count")
+     elif len(unique_references(text)) < actual_count:
+         issues.append("idea source log actual source count exceeds the unique references recorded in the source log")
+     for bucket_name, (_, bucket_labels) in MANDATORY_SOURCE_BUCKETS.items():
+         bucket_body = extract_bucket_body(sweep_2, bucket_labels)
+         if count_references(bucket_body) < 1:
+             issues.append(f"idea source log is missing mandatory sweep 2 references for {bucket_name.lower()}")
+
+     integrity = extract_section_body(text, SOURCE_LOG_SECTIONS["Source Integrity Notes"])
+     if not contains_any(integrity, ("duplicates removed", "unused or weak sources", "caveat", "去重", "未直接依赖", "备注", "caveat")):
+         issues.append("idea source log is missing source integrity notes")
+
+     return issues
+
+
+ def cross_validate_idea_and_source_log(idea_text: str, source_log_text: str) -> list[str]:
+     issues: list[str] = []
+     literature = extract_section_body(idea_text, REQUIRED_SECTIONS["Literature Scoping Bundle"])
+     source_sweep_2 = extract_section_body(source_log_text, SOURCE_LOG_SECTIONS["Sweep 2 Log"])
+     idea_count = extract_numeric_field(literature, ("Actual source count", "当前已覆盖来源数", "实际来源数"))
+     source_count = extract_numeric_field(source_sweep_2, ("Actual source count", "当前已覆盖来源数", "实际来源数"))
+     default_target = extract_numeric_field(literature, ("Default target source count", "默认目标来源数", "默认来源目标数"))
+     source_integrity = extract_section_body(source_log_text, SOURCE_LOG_SECTIONS["Source Integrity Notes"])
+
+     if idea_count is not None and source_count is not None and idea_count != source_count:
+         issues.append("idea source log actual source count does not match the literature scoping bundle in the idea artifact")
+
+     if count_references(source_sweep_2) < count_references(extract_section_body(idea_text, REQUIRED_SECTIONS["Literature Sweep 2"])):
+         issues.append("idea source log sweep 2 is missing references that appear in the idea artifact literature sweep 2")
+
+     if idea_count is not None and default_target is not None and idea_count < default_target:
+         if not (
+             has_non_placeholder_field_value(
+                 source_sweep_2,
+                 ("Why the bundle is below the default target", "Why the bundle stays below the default target", "为什么最终来源包低于默认目标"),
+             )
+             or has_non_placeholder_field_value(
+                 source_integrity,
+                 ("Why the bundle is below the default target", "Why the bundle stays below the default target", "为什么最终来源包低于默认目标"),
+             )
+         ):
+             issues.append("idea source log is below the default target without recording why the smaller source bundle is acceptable")
+
+     return issues
+
+
  def main():
      args = parse_args()
      idea_path = Path(args.idea)
+     source_log_path = Path(args.source_log)
      config_path = Path(args.workflow_config)
      if not idea_path.exists():
          print(f"idea artifact does not exist: {idea_path}", file=sys.stderr)
          return 1
+     if not source_log_path.exists():
+         print(f"idea source log does not exist: {source_log_path}", file=sys.stderr)
+         return 1
      if not config_path.exists():
          print(f"workflow config does not exist: {config_path}", file=sys.stderr)
          return 1
 
      text = read_text(idea_path)
+     source_log_text = read_text(source_log_path)
      workflow_language = load_workflow_language(config_path)
      issues = []
      missing = missing_sections(text)
      if missing:
          issues.append(f"idea artifact is missing required sections: {', '.join(missing)}")
+     missing_source_sections = missing_source_log_sections(source_log_text)
+     if missing_source_sections:
+         issues.append(f"idea source log is missing required sections: {', '.join(missing_source_sections)}")
      issues.extend(validate_content(text))
      issues.extend(validate_language(text, workflow_language))
+     issues.extend(validate_source_log(source_log_text))
+     issues.extend(cross_validate_idea_and_source_log(text, source_log_text))
 
      if issues:
          for issue in issues:
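One consistency rule these new checks enforce is that the claimed `Actual source count` may not exceed the number of unique reference markers actually recorded in the log. A simplified runnable sketch (the log text is a made-up example, and the tuple-handling branch of the real `unique_references` is omitted since the pattern has a single group):

```python
import re

# Same reference pattern as in the validator.
REFERENCE_PATTERN = re.compile(
    r"(https?://\S+|arxiv\.org/\S+|doi\.org/\S+|doi:\s*\S+|\[[^\]]+\]\([^)]+\))",
    flags=re.IGNORECASE,
)

def unique_references(text):
    # With a single capture group, findall yields plain strings.
    return set(REFERENCE_PATTERN.findall(text))

# Hypothetical source-log excerpt with a duplicated entry.
log = (
    "- Actual source count: 3\n"
    "- https://arxiv.org/abs/2301.00001\n"
    "- https://arxiv.org/abs/2301.00001\n"  # duplicate entry
    "- https://arxiv.org/abs/2405.12345\n"
)
claimed = 3

# The duplicate collapses to one unique reference, so the claim of 3 sources
# is not backed by the log and the validator would flag it.
print(len(unique_references(log)))  # 2
print(len(unique_references(log)) >= claimed)  # False
```

Padding the log with repeated links therefore cannot satisfy the count check, which is the point of deduplicating before comparing.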
@@ -0,0 +1,37 @@
+ # Idea Source Log
+
+ ## Search Intent
+
+ - Problem framing:
+ - Search constraints:
+ - Search window:
+
+ ## Sweep 1 Log
+
+ - Direction 1 query strings:
+   - Sources found:
+ - Direction 2 query strings:
+   - Sources found:
+ - Direction 3 query strings:
+   - Sources found:
+ - Direction 4 query strings:
+   - Sources found:
+ - Early elimination notes:
+
+ ## Sweep 2 Log
+
+ - Surviving directions:
+ - Final source bundle:
+   - Closest prior:
+   - Recent strong papers:
+   - Benchmark or evaluation papers:
+   - Survey or taxonomy papers:
+   - Adjacent-field papers:
+ - Actual source count:
+
+ ## Source Integrity Notes
+
+ - Duplicates removed:
+ - Unused or weak sources not relied on:
+ - Why the bundle stays below the default target:
+ - Caveats:
@@ -54,10 +54,32 @@ Suggested levels:
  ## Existing Methods
 
  - Mainstream line 1:
+   - Citation:
+   - What it solves:
+   - Why it still falls short here:
  - Mainstream line 2:
+   - Citation:
+   - What it solves:
+   - Why it still falls short here:
  - Shared assumption:
  - Why that assumption breaks here:
 
+ ## Brainstorm Pass 1
+
+ - Candidate direction 1:
+ - Candidate direction 2:
+ - Candidate direction 3:
+ - Candidate direction 4:
+ - Why these directions are worth checking:
+
+ ## Literature Sweep 1
+
+ - Direction 1 seed references:
+ - Direction 2 seed references:
+ - Direction 3 seed references:
+ - Direction 4 seed references:
+ - Early conclusion from the first sweep:
+
  ## Literature Scoping Bundle
 
  - Default target source count:
@@ -67,7 +89,7 @@ Suggested levels:
  - Benchmark or evaluation papers:
  - Survey or taxonomy papers:
  - Adjacent-field papers:
- - If the total is below the default target, why:
+ - If the total is below the default target, why is the smaller bundle still acceptable:
 
  ## Closest Prior Work Comparison
 
@@ -84,6 +106,21 @@ Suggested levels:
  - Limitation for the current problem:
  - Difference from our direction:
 
+ ## Brainstorm Pass 2
+
+ - Surviving direction 1:
+ - Surviving direction 2:
+ - Rejected directions and why:
+ - Recommended narrowed direction:
+
+ ## Literature Sweep 2
+
+ - Recent strong papers:
+ - Benchmark or evaluation papers:
+ - Survey or taxonomy papers:
+ - Adjacent-field papers:
+ - Final literature takeaway:
+
  ## Why Ours Is Different
 
  - Existing methods rely on:
@@ -96,6 +133,24 @@ Suggested levels:
  - Plain-language description of how this would work:
  - Why this design might resolve the failure case:
 
+ ## Problem Solved
+
+ - In plain language:
+ - What becomes possible if this works:
+
+ ## Evaluation Sketch
+
+ - Evaluation subject:
+ - Proxy or simulator, if any:
+ - Main outcome to observe:
+ - Main validity risk:
+
+ ## Tentative Contributions
+
+ - Contribution 1:
+ - Contribution 2:
+ - Contribution 3:
+
  ## Three Meaningful Points
 
  1. Significance:
@@ -140,6 +195,17 @@ Suggested levels:
  - What must be validated before implementation:
  - Kill criteria:
 
+ ## Final Recommendation
+
+ - Recommended direction after two sweeps:
+ - Why this is still paper-worthy:
+
+ ## User Guidance
+
+ - Immediate decision needed from the user:
+ - Information that would sharpen the idea:
+ - Recommended next stage:
+
  ## Approval Gate
 
  - User-approved direction:
@@ -51,7 +51,7 @@ If `eval-protocol.md` declares structured rung entries, auto mode follows those
 
  - Run stage contract: write persistent outputs under `results_root`.
  - Iterate stage contract: update persistent outputs under `results_root`.
- - Review stage contract: update canonical review context such as `.lab/context/decisions.md`, `state.md`, `workflow-state.md`, `open-questions.md`, or `evidence-index.md`.
+ - Review stage contract: update canonical review context such as `.lab/context/decisions.md`, `workflow-state.md`, `open-questions.md`, or `evidence-index.md`, then refresh derived views.
  - Report stage contract: write `<deliverables_root>/report.md`, `<deliverables_root>/main-tables.md`, and `<deliverables_root>/artifact-status.md`.
  - Write stage contract: write LaTeX output under `<deliverables_root>/paper/`.
 
@@ -68,4 +68,4 @@ If `eval-protocol.md` declares structured rung entries, auto mode follows those
 
  - Stop conditions:
  - Escalation conditions:
- - Canonical promotion writeback: update `.lab/context/data-decisions.md`, `.lab/context/decisions.md`, `.lab/context/state.md`, and `.lab/context/workflow-state.md`.
+ - Canonical promotion writeback: update `.lab/context/data-decisions.md`, `.lab/context/decisions.md`, and `.lab/context/workflow-state.md`, then refresh derived views such as `state.md`.
@@ -1,27 +1,16 @@
  # Session Brief
 
- ## Active Stage
+ ## Immediate Focus
 
  - Stage:
  - Current objective:
  - Immediate next action:
 
- ## Mission
-
- One sentence describing the active research mission.
-
- ## Best Current Path
+ ## Mission Snapshot
 
+ - Mission:
  - Approved direction:
  - Strongest supported claim:
- - Auto mode:
- - Auto objective:
- - Auto decision:
- - Collaborator report mode:
- - Canonical context readiness:
- - Method name:
- - Primary metrics:
- - Secondary metrics:
 
  ## Main Risk
 
@@ -1,5 +1,7 @@
  # Research State
 
+ > This file is a derived durable snapshot. Update canonical context files such as `mission.md`, `decisions.md`, `data-decisions.md`, `evidence-index.md`, `eval-protocol.md`, and `open-questions.md`, then refresh derived context instead of editing this file directly.
+
  ## Approved Direction
 
  - One-sentence problem:
@@ -31,7 +31,7 @@ For auto-mode orchestration or long-running experiment campaigns, also read:
  - Figures and plots belong under the configured `figures_root`, not inside `.lab/changes/`.
  - Deliverables belong under the configured `deliverables_root`, not inside `.lab/context/`.
  - Change-local `data/` directories may hold lightweight manifests or batch specs, but not the canonical dataset copy.
- - `.lab/context/state.md` holds durable research state; `.lab/context/workflow-state.md` holds live workflow state.
+ - `.lab/context/state.md` is a derived durable research snapshot; `.lab/context/workflow-state.md` holds live workflow state.
  - `.lab/context/summary.md` is the durable project summary; `.lab/context/session-brief.md` is the next-session startup brief.
  - `.lab/context/auto-mode.md` defines the bounded autonomous envelope; `.lab/context/auto-status.md` records live state for resume and handoff.
  - If the user provides a LaTeX template directory, validate it and attach it through `paper_template_root` before drafting.
@@ -55,6 +55,7 @@ Do not force `/lab:*` onto unrelated engineering tasks.
 
  ## State Discipline
 
- - Treat `.lab/context/*` as durable project state.
- - Do not silently overwrite context files.
+ - Treat canonical context files such as `mission.md`, `decisions.md`, `data-decisions.md`, `evidence-index.md`, `eval-protocol.md`, and `open-questions.md` as durable project state.
+ - Treat `state.md`, `summary.md`, `session-brief.md`, and `next-action.md` as derived views.
+ - Do not silently overwrite canonical context files.
  - Keep sourced evidence separate from generated hypotheses.