@researai/deepscientist 1.5.14 → 1.5.16

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (225)
  1. package/README.md +336 -90
  2. package/assets/branding/logo-raster.png +0 -0
  3. package/bin/ds.js +816 -131
  4. package/docs/en/00_QUICK_START.md +36 -15
  5. package/docs/en/01_SETTINGS_REFERENCE.md +53 -4
  6. package/docs/en/02_START_RESEARCH_GUIDE.md +7 -0
  7. package/docs/en/03_QQ_CONNECTOR_GUIDE.md +19 -0
  8. package/docs/en/05_TUI_GUIDE.md +6 -0
  9. package/docs/en/06_RUNTIME_AND_CANVAS.md +4 -3
  10. package/docs/en/09_DOCTOR.md +11 -5
  11. package/docs/en/10_WEIXIN_CONNECTOR_GUIDE.md +20 -0
  12. package/docs/en/14_PROMPT_SKILLS_AND_MCP_GUIDE.md +65 -13
  13. package/docs/en/15_CODEX_PROVIDER_SETUP.md +25 -8
  14. package/docs/en/16_TELEGRAM_CONNECTOR_GUIDE.md +134 -0
  15. package/docs/en/17_WHATSAPP_CONNECTOR_GUIDE.md +126 -0
  16. package/docs/en/18_FEISHU_CONNECTOR_GUIDE.md +136 -0
  17. package/docs/en/19_EXTERNAL_CONTROLLER_GUIDE.md +226 -0
  18. package/docs/en/19_LOCAL_BROWSER_AUTH.md +70 -0
  19. package/docs/en/20_WORKSPACE_MODES_GUIDE.md +250 -0
  20. package/docs/en/README.md +24 -0
  21. package/docs/zh/00_QUICK_START.md +36 -15
  22. package/docs/zh/01_SETTINGS_REFERENCE.md +53 -4
  23. package/docs/zh/02_START_RESEARCH_GUIDE.md +7 -0
  24. package/docs/zh/03_QQ_CONNECTOR_GUIDE.md +19 -0
  25. package/docs/zh/05_TUI_GUIDE.md +6 -0
  26. package/docs/zh/09_DOCTOR.md +11 -5
  27. package/docs/zh/10_WEIXIN_CONNECTOR_GUIDE.md +20 -0
  28. package/docs/zh/14_PROMPT_SKILLS_AND_MCP_GUIDE.md +65 -13
  29. package/docs/zh/15_CODEX_PROVIDER_SETUP.md +25 -8
  30. package/docs/zh/16_TELEGRAM_CONNECTOR_GUIDE.md +134 -0
  31. package/docs/zh/17_WHATSAPP_CONNECTOR_GUIDE.md +126 -0
  32. package/docs/zh/18_FEISHU_CONNECTOR_GUIDE.md +136 -0
  33. package/docs/zh/19_EXTERNAL_CONTROLLER_GUIDE.md +226 -0
  34. package/docs/zh/19_LOCAL_BROWSER_AUTH.md +68 -0
  35. package/docs/zh/20_WORKSPACE_MODES_GUIDE.md +251 -0
  36. package/docs/zh/README.md +24 -0
  37. package/install.sh +2 -0
  38. package/package.json +1 -1
  39. package/pyproject.toml +1 -1
  40. package/src/deepscientist/__init__.py +1 -1
  41. package/src/deepscientist/acp/envelope.py +6 -0
  42. package/src/deepscientist/artifact/charts.py +567 -0
  43. package/src/deepscientist/artifact/guidance.py +50 -10
  44. package/src/deepscientist/artifact/metrics.py +228 -5
  45. package/src/deepscientist/artifact/schemas.py +3 -0
  46. package/src/deepscientist/artifact/service.py +4276 -308
  47. package/src/deepscientist/bash_exec/models.py +23 -0
  48. package/src/deepscientist/bash_exec/monitor.py +147 -67
  49. package/src/deepscientist/bash_exec/runtime.py +218 -156
  50. package/src/deepscientist/bash_exec/service.py +309 -69
  51. package/src/deepscientist/bash_exec/shells.py +87 -0
  52. package/src/deepscientist/bridges/connectors.py +51 -2
  53. package/src/deepscientist/cli.py +115 -19
  54. package/src/deepscientist/codex_cli_compat.py +232 -0
  55. package/src/deepscientist/config/models.py +8 -4
  56. package/src/deepscientist/config/service.py +38 -11
  57. package/src/deepscientist/connector/weixin_support.py +122 -1
  58. package/src/deepscientist/daemon/api/handlers.py +199 -9
  59. package/src/deepscientist/daemon/api/router.py +5 -0
  60. package/src/deepscientist/daemon/app.py +1458 -289
  61. package/src/deepscientist/doctor.py +51 -0
  62. package/src/deepscientist/file_lock.py +48 -0
  63. package/src/deepscientist/gitops/__init__.py +10 -1
  64. package/src/deepscientist/gitops/diff.py +296 -1
  65. package/src/deepscientist/gitops/service.py +4 -1
  66. package/src/deepscientist/mcp/server.py +212 -5
  67. package/src/deepscientist/process_control.py +161 -0
  68. package/src/deepscientist/prompts/builder.py +501 -453
  69. package/src/deepscientist/quest/layout.py +15 -2
  70. package/src/deepscientist/quest/service.py +2539 -195
  71. package/src/deepscientist/quest/stage_views.py +177 -1
  72. package/src/deepscientist/runners/base.py +2 -0
  73. package/src/deepscientist/runners/codex.py +169 -31
  74. package/src/deepscientist/runners/runtime_overrides.py +17 -1
  75. package/src/deepscientist/skills/__init__.py +2 -2
  76. package/src/deepscientist/skills/installer.py +196 -5
  77. package/src/deepscientist/skills/registry.py +66 -0
  78. package/src/prompts/connectors/qq.md +18 -8
  79. package/src/prompts/connectors/weixin.md +16 -6
  80. package/src/prompts/contracts/shared_interaction.md +24 -4
  81. package/src/prompts/system.md +921 -72
  82. package/src/prompts/system_copilot.md +43 -0
  83. package/src/skills/analysis-campaign/SKILL.md +32 -2
  84. package/src/skills/analysis-campaign/references/artifact-orchestration.md +1 -1
  85. package/src/skills/analysis-campaign/references/writing-facing-slice-examples.md +65 -0
  86. package/src/skills/baseline/SKILL.md +10 -0
  87. package/src/skills/decision/SKILL.md +27 -2
  88. package/src/skills/experiment/SKILL.md +16 -2
  89. package/src/skills/figure-polish/SKILL.md +1 -0
  90. package/src/skills/finalize/SKILL.md +19 -0
  91. package/src/skills/idea/SKILL.md +79 -0
  92. package/src/skills/idea/references/idea-generation-playbook.md +100 -0
  93. package/src/skills/idea/references/outline-seeding-example.md +60 -0
  94. package/src/skills/intake-audit/SKILL.md +9 -1
  95. package/src/skills/mentor/SKILL.md +217 -0
  96. package/src/skills/mentor/references/correction-rules.md +210 -0
  97. package/src/skills/mentor/references/knowledge-profile.md +91 -0
  98. package/src/skills/mentor/references/persona-profile.md +138 -0
  99. package/src/skills/mentor/references/taste-profile.md +128 -0
  100. package/src/skills/mentor/references/thought-style-profile.md +138 -0
  101. package/src/skills/mentor/references/work-profile.md +289 -0
  102. package/src/skills/mentor/references/workflow-profile.md +240 -0
  103. package/src/skills/optimize/SKILL.md +1645 -0
  104. package/src/skills/rebuttal/SKILL.md +3 -1
  105. package/src/skills/review/SKILL.md +3 -1
  106. package/src/skills/scout/SKILL.md +8 -0
  107. package/src/skills/write/SKILL.md +81 -12
  108. package/src/skills/write/references/outline-evidence-contract-example.md +107 -0
  109. package/src/tui/dist/app/AppContainer.js +22 -11
  110. package/src/tui/dist/index.js +4 -1
  111. package/src/tui/dist/lib/api.js +33 -3
  112. package/src/tui/package.json +1 -1
  113. package/src/ui/dist/assets/AiManusChatView-COFACy7V.js +204 -0
  114. package/src/ui/dist/assets/AnalysisPlugin-DnSm0GZn.js +1 -0
  115. package/src/ui/dist/assets/CliPlugin-CvwCmDQ5.js +109 -0
  116. package/src/ui/dist/assets/CodeEditorPlugin-cOqSa0xq.js +2 -0
  117. package/src/ui/dist/assets/CodeViewerPlugin-itb0tltR.js +270 -0
  118. package/src/ui/dist/assets/DocViewerPlugin-DqKkiCI6.js +7 -0
  119. package/src/ui/dist/assets/GitCommitViewerPlugin-DVgNHBCS.js +1 -0
  120. package/src/ui/dist/assets/GitDiffViewerPlugin-DxL2ezFG.js +6 -0
  121. package/src/ui/dist/assets/GitSnapshotViewer-B_RQm1YZ.js +30 -0
  122. package/src/ui/dist/assets/ImageViewerPlugin-tHqlXY3n.js +26 -0
  123. package/src/ui/dist/assets/LabCopilotPanel-ClMbq5Yu.js +14 -0
  124. package/src/ui/dist/assets/LabPlugin-L_SuE8ow.js +22 -0
  125. package/src/ui/dist/assets/LatexPlugin-B495DTXC.js +25 -0
  126. package/src/ui/dist/assets/MarkdownViewerPlugin-DG28-61B.js +128 -0
  127. package/src/ui/dist/assets/MarketplacePlugin-BiOGT-Kj.js +13 -0
  128. package/src/ui/dist/assets/{NotebookEditor-CccQYZjX.css → NotebookEditor-BHH8rdGj.css} +1 -1
  129. package/src/ui/dist/assets/NotebookEditor-BOr3x3Ej.css +1 -0
  130. package/src/ui/dist/assets/NotebookEditor-C-4Kt1p9.js +81 -0
  131. package/src/ui/dist/assets/NotebookEditor-CVsj8h_T.js +361 -0
  132. package/src/ui/dist/assets/PdfLoader-CASDQmxJ.js +16 -0
  133. package/src/ui/dist/assets/PdfLoader-Cy5jtWrr.css +1 -0
  134. package/src/ui/dist/assets/PdfMarkdownPlugin-BFhwoKsY.js +1 -0
  135. package/src/ui/dist/assets/PdfViewerPlugin-DcOzU9vd.js +17 -0
  136. package/src/ui/dist/assets/PdfViewerPlugin-nwwE-fjJ.css +1 -0
  137. package/src/ui/dist/assets/SearchPlugin-CHj7M58O.js +16 -0
  138. package/src/ui/dist/assets/SearchPlugin-DA4en4hK.css +1 -0
  139. package/src/ui/dist/assets/TextViewerPlugin-CB4DYfWO.js +54 -0
  140. package/src/ui/dist/assets/VNCViewer-CjlbyCB3.js +11 -0
  141. package/src/ui/dist/assets/bot-CFkZY-JP.js +6 -0
  142. package/src/ui/dist/assets/browser-CTB2jwNe.js +8 -0
  143. package/src/ui/dist/assets/chevron-up-Dq5ofbht.js +6 -0
  144. package/src/ui/dist/assets/code-DLC6G24T.js +6 -0
  145. package/src/ui/dist/assets/file-content-Dv4LoZec.js +1 -0
  146. package/src/ui/dist/assets/file-diff-panel-Denq-lC3.js +1 -0
  147. package/src/ui/dist/assets/file-jump-queue-DA-SdG__.js +1 -0
  148. package/src/ui/dist/assets/file-socket-Cu4Qln7Y.js +1 -0
  149. package/src/ui/dist/assets/git-commit-horizontal-BUh6G52n.js +6 -0
  150. package/src/ui/dist/assets/image-B9HUUddG.js +6 -0
  151. package/src/ui/dist/assets/index-B2B1sg-M.js +1 -0
  152. package/src/ui/dist/assets/index-Cgla8biy.css +33 -0
  153. package/src/ui/dist/assets/index-DRyx7vAc.js +1 -0
  154. package/src/ui/dist/assets/index-Gbl53BNp.js +2496 -0
  155. package/src/ui/dist/assets/index-wQ7RIIRd.js +11 -0
  156. package/src/ui/dist/assets/monaco-CiHMMNH_.js +1 -0
  157. package/src/ui/dist/assets/pdf-effect-queue-ZtnHFCAi.js +6 -0
  158. package/src/ui/dist/assets/plugin-monaco-C8UgLomw.js +19 -0
  159. package/src/ui/dist/assets/plugin-notebook-HbW2K-1c.js +169 -0
  160. package/src/ui/dist/assets/plugin-pdf-CR8hgQBV.js +357 -0
  161. package/src/ui/dist/assets/plugin-terminal-MXFIPun8.js +227 -0
  162. package/src/ui/dist/assets/popover-DL6h35vr.js +1 -0
  163. package/src/ui/dist/assets/project-sync-CsX08Qno.js +1 -0
  164. package/src/ui/dist/assets/select-DvmXt1yY.js +11 -0
  165. package/src/ui/dist/assets/sigma-7jpXazui.js +6 -0
  166. package/src/ui/dist/assets/trash-xA7kFt8i.js +11 -0
  167. package/src/ui/dist/assets/useCliAccess-DsMwDjOp.js +1 -0
  168. package/src/ui/dist/assets/useFileDiffOverlay-FuhcnKiw.js +1 -0
  169. package/src/ui/dist/assets/wrap-text-CwMn-iqb.js +11 -0
  170. package/src/ui/dist/assets/zoom-out-R-GWEhzS.js +11 -0
  171. package/src/ui/dist/index.html +5 -2
  172. package/src/ui/dist/assets/AiManusChatView-DaF9Nge_.js +0 -26597
  173. package/src/ui/dist/assets/AnalysisPlugin-BSVx6dXE.js +0 -123
  174. package/src/ui/dist/assets/CliPlugin-C9gzJX41.js +0 -5905
  175. package/src/ui/dist/assets/CodeEditorPlugin-DU9G0Tox.js +0 -427
  176. package/src/ui/dist/assets/CodeViewerPlugin-DoX_fI9l.js +0 -905
  177. package/src/ui/dist/assets/DocViewerPlugin-C4FWIXuU.js +0 -278
  178. package/src/ui/dist/assets/GitDiffViewerPlugin-BgfFMgtf.js +0 -2661
  179. package/src/ui/dist/assets/ImageViewerPlugin-tcPkfY_x.js +0 -500
  180. package/src/ui/dist/assets/LabCopilotPanel-_dKV60Bf.js +0 -4104
  181. package/src/ui/dist/assets/LabPlugin-Bje0ayoC.js +0 -2677
  182. package/src/ui/dist/assets/LatexPlugin-CVsBzAln.js +0 -1792
  183. package/src/ui/dist/assets/MarkdownViewerPlugin-xjmrqv_8.js +0 -308
  184. package/src/ui/dist/assets/MarketplacePlugin-mMM2A8wP.js +0 -413
  185. package/src/ui/dist/assets/NotebookEditor-3kVDSOBo.js +0 -4214
  186. package/src/ui/dist/assets/NotebookEditor-C3VQ7ylN.css +0 -1405
  187. package/src/ui/dist/assets/NotebookEditor-SoJ8X-MO.js +0 -84873
  188. package/src/ui/dist/assets/PdfLoader-C-Y707R3.css +0 -49
  189. package/src/ui/dist/assets/PdfLoader-DElVuHl9.js +0 -25468
  190. package/src/ui/dist/assets/PdfMarkdownPlugin-Bq88XT4G.js +0 -409
  191. package/src/ui/dist/assets/PdfViewerPlugin-CsCXMo9S.js +0 -3095
  192. package/src/ui/dist/assets/PdfViewerPlugin-DQ11QcSf.css +0 -3627
  193. package/src/ui/dist/assets/SearchPlugin-DDMrGDkh.css +0 -379
  194. package/src/ui/dist/assets/SearchPlugin-oUPvy19k.js +0 -741
  195. package/src/ui/dist/assets/TextViewerPlugin-CRkT9yNy.js +0 -472
  196. package/src/ui/dist/assets/VNCViewer-BgbuvWhR.js +0 -18821
  197. package/src/ui/dist/assets/awareness-C0NPR2Dj.js +0 -292
  198. package/src/ui/dist/assets/bot-v_RASACv.js +0 -21
  199. package/src/ui/dist/assets/browser-BAcuE0Xj.js +0 -2895
  200. package/src/ui/dist/assets/code-5hC9d0VH.js +0 -17
  201. package/src/ui/dist/assets/file-content-D1PxfOrp.js +0 -377
  202. package/src/ui/dist/assets/file-diff-panel-DG1oT_Hj.js +0 -92
  203. package/src/ui/dist/assets/file-jump-queue-r5XKgJEV.js +0 -16
  204. package/src/ui/dist/assets/file-socket-BmdFYQlk.js +0 -58
  205. package/src/ui/dist/assets/function-B5QZkkHC.js +0 -1895
  206. package/src/ui/dist/assets/image-Dqe2X2tW.js +0 -18
  207. package/src/ui/dist/assets/index-BQG-1s2o.css +0 -12553
  208. package/src/ui/dist/assets/index-DVsMKK_y.js +0 -25
  209. package/src/ui/dist/assets/index-Duvz8Ip0.js +0 -159
  210. package/src/ui/dist/assets/index-Nt9hS4ck.js +0 -244829
  211. package/src/ui/dist/assets/index-RDlNXXx1.js +0 -120
  212. package/src/ui/dist/assets/monaco-DIXge1CP.js +0 -623
  213. package/src/ui/dist/assets/pdf-effect-queue-BBTTQaO-.js +0 -47
  214. package/src/ui/dist/assets/pdf_viewer-e0g1is2C.js +0 -8206
  215. package/src/ui/dist/assets/popover-BWlolyxo.js +0 -476
  216. package/src/ui/dist/assets/project-sync-BM5PkFH4.js +0 -297
  217. package/src/ui/dist/assets/select-D4dAtrA8.js +0 -1690
  218. package/src/ui/dist/assets/sigma-CKbE5jJT.js +0 -22
  219. package/src/ui/dist/assets/square-check-big-CZNGMgiB.js +0 -17
  220. package/src/ui/dist/assets/trash-DaB37xAz.js +0 -32
  221. package/src/ui/dist/assets/useCliAccess-C2OmAcWe.js +0 -957
  222. package/src/ui/dist/assets/useFileDiffOverlay-Dowd1Ij4.js +0 -53
  223. package/src/ui/dist/assets/wrap-text-BGjAhAUq.js +0 -35
  224. package/src/ui/dist/assets/yjs-DncrqiZ8.js +0 -11243
  225. package/src/ui/dist/assets/zoom-out-dMZQMXzc.js +0 -34
@@ -0,0 +1,1645 @@
1
+ ---
2
+ name: optimize
3
+ description: Use when an algorithm-first quest should manage candidate briefs, optimization frontier, branch promotion, or fusion-aware search instead of the paper-oriented default loop.
4
+ skill_role: stage
5
+ ---
6
+
7
+ # Optimize
8
+
9
+ Use this skill for algorithm-first quests where the goal is the strongest justified optimization result rather than paper packaging.
10
+
11
+ This skill is the lightweight optimization control layer for DeepScientist.
12
+ It does not replace the normal quest runtime. It tells you how to use the existing DeepScientist artifact, memory, bash_exec, Git, and worktree mechanisms as an optimization system.
13
+
14
+ ## Interaction discipline
15
+
16
+ - Follow the shared interaction contract injected by the system prompt.
17
+ - For ordinary active work, prefer a concise progress update once work has crossed roughly 6 tool calls with a human-meaningful delta, and do not drift beyond roughly 12 tool calls or about 8 minutes without a user-visible update.
18
+ - Ordinary candidate creation, smoke checks, and route updates should stay concise.
19
+ - Use richer milestone updates only when a candidate is promoted, a strong run finishes, the frontier shifts materially, or a fusion/debug route becomes the new main path.
20
+ - When the user asks for the current optimization state, answer from the frontier and durable artifacts rather than from chat memory.
21
+ - Hard execution rule: every terminal command in this stage must go through `bash_exec`; do not use any other terminal path for smoke checks, quick validations, long runs, Git, Python, package-manager, or file-inspection commands.
22
+
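The update cadence above can be sketched as a small predicate. This is a minimal illustration, not part of the package: the class and function names are hypothetical, and the "roughly 6 / 12 tool calls" and "about 8 minutes" figures are treated as hard thresholds purely for the sake of the example.

```python
from dataclasses import dataclass

# Thresholds copied from the guidance above; treating them as exact
# cut-offs is an assumption made for this sketch.
SOFT_TOOL_CALLS = 6
HARD_TOOL_CALLS = 12
HARD_MINUTES = 8.0


@dataclass
class WorkSinceUpdate:
    """Hypothetical record of work done since the last user-visible update."""
    tool_calls: int
    minutes: float
    meaningful_delta: bool  # is there something a human would care about?


def should_send_update(work: WorkSinceUpdate) -> bool:
    """Return True when a concise progress update is due."""
    # Hard ceiling: do not drift past ~12 tool calls or ~8 minutes silently.
    if work.tool_calls >= HARD_TOOL_CALLS or work.minutes >= HARD_MINUTES:
        return True
    # Soft trigger: ~6 tool calls plus a human-meaningful delta.
    return work.tool_calls >= SOFT_TOOL_CALLS and work.meaningful_delta
```

The soft trigger requires both conditions, so a burst of tool calls with nothing to report stays quiet until the hard ceiling is reached.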
23
+ ## Stage purpose
24
+
25
+ The optimize stage should do four things:
26
+
27
+ 1. turn loose ideas into candidate briefs
28
+ 2. rank and promote only the strongest briefs into durable lines
29
+ 3. manage candidate attempts within a durable line
30
+ 4. choose when to explore, exploit, fuse, debug, or stop
31
+
32
+ This skill is especially appropriate when `startup_contract.need_research_paper = false`.
33
+
34
+ Treat `optimize` as one stable stage skill with six internal submodes:
35
+
36
+ - `brief`
37
+ - `rank`
38
+ - `seed`
39
+ - `loop`
40
+ - `fusion`
41
+ - `debug`
42
+
43
+ Do not treat these as separate public skills.
44
+ Treat them as internal execution modes inside one optimize workflow.
45
+
46
+ InternAgent maps most naturally onto the `brief` and `rank` side of this stage.
47
+ MLEvolve maps most naturally onto the `seed`, `loop`, `fusion`, and `debug` side of this stage.
48
+ Do not collapse those two layers into one vague "optimize more" loop.
49
+
50
+ ## Required working files
51
+
52
+ Before broad optimization search or candidate management becomes substantial, maintain these quest-visible control files:
53
+
54
+ - `OPTIMIZE_CHECKLIST.md`
55
+ - `CANDIDATE_BOARD.md`
56
+
57
+ Use:
58
+
59
+ - the integrated `optimize checklist template` appendix section
60
+ - the integrated `candidate board template` appendix section
61
+
62
+ `OPTIMIZE_CHECKLIST.md` is the execution control surface.
63
+ It should track:
64
+
65
+ - current frontier mode
66
+ - current optimize submode
67
+ - candidate brief count
68
+ - promoted line count
69
+ - current smoke queue
70
+ - current full-eval queue
71
+ - stagnation / fusion checks
72
+ - next concrete action
73
+
74
+ `CANDIDATE_BOARD.md` is the compact candidate ledger.
75
+ It should track:
76
+
77
+ - candidate id
78
+ - candidate type: brief or implementation attempt
79
+ - parent line or parent candidate
80
+ - strategy: explore / exploit / fusion / debug
81
+ - status
82
+ - expected gain
83
+ - observed result
84
+ - promote / archive recommendation
85
+
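One way to keep `CANDIDATE_BOARD.md` compact is to render it from structured rows. The sketch below assumes a Markdown table layout and hypothetical names (`BoardRow`, `render_board`); the skill itself only lists the fields to track and does not prescribe a file format.

```python
from dataclasses import dataclass
from typing import List

# Column titles follow the tracked fields listed above.
COLUMNS = [
    "candidate id", "candidate type", "parent", "strategy",
    "status", "expected gain", "observed result", "recommendation",
]


@dataclass
class BoardRow:
    """Hypothetical in-memory form of one ledger entry."""
    candidate_id: str
    candidate_type: str   # "brief" or "implementation attempt"
    parent: str           # parent line or parent candidate
    strategy: str         # explore / exploit / fusion / debug
    status: str
    expected_gain: str
    observed_result: str
    recommendation: str   # promote / archive


def render_board(rows: List[BoardRow]) -> str:
    """Render the rows as a Markdown table (layout is an assumption)."""
    header = "| " + " | ".join(COLUMNS) + " |"
    divider = "| " + " | ".join("---" for _ in COLUMNS) + " |"
    body = [
        "| " + " | ".join([
            r.candidate_id, r.candidate_type, r.parent, r.strategy,
            r.status, r.expected_gain, r.observed_result, r.recommendation,
        ]) + " |"
        for r in rows
    ]
    return "\n".join([header, divider, *body]) + "\n"
```

Writing the result to the quest workspace would still go through whatever file mechanism the quest runtime provides; that step is deliberately left out here.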
86
+ ## Required MCP-driven workflow
87
+
88
+ Treat this as the concrete optimize workflow. Do not skip these steps just because the quest is algorithm-first.
89
+
90
+ ### 1. Recover the optimization state first
91
+
92
+ At the start of each meaningful optimize pass, use this order unless a stronger local reason exists:
93
+
94
+ 1. `artifact.get_optimization_frontier(...)`
95
+ 2. `memory.list_recent(scope='quest', limit=5)`
96
+ 3. `memory.search(...)`
97
+ 4. `artifact.get_quest_state(detail='summary')`
98
+ 5. `artifact.read_quest_documents(...)` when exact durable wording matters
99
+
100
+ Do not create new candidates before the frontier, recent optimization lessons, and current runtime refs are checked.
101
+ If the frontier is missing or obviously stale, recover that state before proposing more work.
102
+
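The recovery order above can be sketched as a single helper. The `call` dispatcher is a hypothetical stand-in for however the agent invokes MCP tools; only the tool names and the `scope`, `limit`, and `detail` arguments come from the list above. Arguments that the skill leaves elided are passed through untouched as caller-supplied dictionaries.

```python
from typing import Any, Callable, Dict, Optional

# Hypothetical dispatcher signature: call(tool_name, **kwargs) -> result.
ToolCall = Callable[..., Any]


def recover_optimization_state(
    call: ToolCall,
    frontier_args: Optional[Dict[str, Any]] = None,
    search_args: Optional[Dict[str, Any]] = None,
    document_args: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Run the state-recovery calls in the documented order."""
    state: Dict[str, Any] = {}
    # 1. Frontier first: it is the primary optimization-state summary.
    state["frontier"] = call(
        "artifact.get_optimization_frontier", **(frontier_args or {})
    )
    # 2. Most recent quest-scoped memory.
    state["recent_memory"] = call("memory.list_recent", scope="quest", limit=5)
    # 3. Targeted memory search (arguments supplied by the caller).
    state["memory_hits"] = call("memory.search", **(search_args or {}))
    # 4. Quest state summary.
    state["quest_state"] = call("artifact.get_quest_state", detail="summary")
    # 5. Documents only when exact durable wording matters.
    if document_args is not None:
        state["documents"] = call(
            "artifact.read_quest_documents", **document_args
        )
    return state
```

Keeping the order in one function makes it harder to create candidates before the frontier and recent lessons have been read.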
103
+ ### 2. Shape candidate briefs before branch promotion
104
+
105
+ When the next direction is still fuzzy, do not jump straight into code or branch creation.
106
+ First turn the direction into a compact candidate brief.
107
+
108
+ The brief-shaping sequence is:
109
+
110
+ 1. clarify the bottleneck, constraints, and comparability boundary
111
+ 2. identify the incumbent or baseline that this brief must beat or complement
112
+ 3. generate a small differentiated slate, usually `2-3` serious approaches
113
+ 4. compare them on one shared surface
114
+ 5. recommend exactly one lead brief
115
+ 6. self-check the recommended brief before submission
116
+
117
+ Every serious brief should answer:
118
+
119
+ - bottleneck
120
+ - why_current_line_is_limited
121
+ - mechanism
122
+ - why_now
123
+ - keep_unchanged
124
+ - expected_gain
125
+ - implementation_surface
126
+ - main_risks
127
+
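A brief can be checked against the field list above before submission. This is a sketch under the assumption that a brief is held as a plain dictionary keyed by those field names; `missing_brief_fields` is a hypothetical helper, not a DeepScientist API.

```python
from typing import Any, Dict, List

# Field names taken from the list above.
REQUIRED_BRIEF_FIELDS = (
    "bottleneck",
    "why_current_line_is_limited",
    "mechanism",
    "why_now",
    "keep_unchanged",
    "expected_gain",
    "implementation_surface",
    "main_risks",
)


def missing_brief_fields(brief: Dict[str, Any]) -> List[str]:
    """Return the required fields that are absent, None, or blank."""
    missing = []
    for field in REQUIRED_BRIEF_FIELDS:
        value = brief.get(field)
        if value is None or not str(value).strip():
            missing.append(field)
    return missing
```

An empty result means the brief answers every question; it says nothing about whether the answers are good, which is what the self-check step is for.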
128
+ The durable call for this step is usually:
129
+
130
+ - `artifact.submit_idea(mode='create', submission_mode='candidate', ...)`
131
+
132
+ Use `idea` when the mechanism family itself is still unresolved.
133
+ Use `optimize` when the family is already chosen and the work is now branchless brief shaping, ranking, or within-line search.
134
+
135
+ ### 3. Rank candidate briefs on one explicit surface
136
+
137
+ Before promoting a line, compare the serious briefs on one shared ranking surface.
138
+ At minimum evaluate:
139
+
140
+ - expected information gain
141
+ - feasibility in current repo
142
+ - comparability against baseline
143
+ - implementation surface
144
+ - novelty or distinctiveness
145
+ - family diversity
146
+ - change-layer diversity
147
+ - incumbent-improvement potential
148
+ - failure risk
149
+
150
+ Then state:
151
+
152
+ - winner justification
153
+ - non-winner defer / reject reasons
154
+ - promotion cap: how many lines should actually be promoted now
155
+
156
+ Do not promote every plausible brief.
157
+ Default rule: promote only `1-3` candidate briefs, and usually fewer.
158
+
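A shared ranking surface with a promotion cap might look like the sketch below. The scoring scheme is an assumption: each criterion is scored so that higher is better, all criteria are weighted equally, and failure risk is subtracted. The skill lists the criteria and the `1-3` cap but does not define a formula.

```python
from typing import Dict, List, Tuple

# Criteria from the list above, as hypothetical dictionary keys.
BENEFIT_CRITERIA = (
    "expected_information_gain",
    "feasibility",
    "comparability",
    "implementation_surface",
    "novelty",
    "family_diversity",
    "change_layer_diversity",
    "incumbent_improvement_potential",
)
RISK_CRITERION = "failure_risk"


def brief_score(scores: Dict[str, float]) -> float:
    """Equal-weight sum of benefits minus failure risk (assumed scheme)."""
    return sum(scores[c] for c in BENEFIT_CRITERIA) - scores[RISK_CRITERION]


def rank_and_promote(
    briefs: Dict[str, Dict[str, float]], cap: int = 1
) -> Tuple[List[str], List[str]]:
    """Return (full ranking, promoted subset), best brief first."""
    if not 1 <= cap <= 3:
        raise ValueError("promotion cap must stay within 1-3")
    ranked = sorted(briefs, key=lambda name: brief_score(briefs[name]),
                    reverse=True)
    return ranked, ranked[:cap]
```

The cap defaults to one because the rule says to promote "usually fewer"; the non-promoted tail of the ranking is where the defer and reject reasons should be written down.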
159
+ The durable call for this step is one of:
160
+
161
+ - `artifact.submit_idea(mode='create', submission_mode='line', source_candidate_id=..., ...)`
162
+ - `artifact.record(payload={'kind': 'decision', 'action': 'branch'|'continue'|'stop', ...})`
163
+
164
+ ### 4. Hand off promoted lines into experiment cleanly
165
+
166
+ Once a brief is promoted, the next main work belongs to `experiment`, not to vague optimize chatter.
167
+ Before substantial implementation or compute:
168
+
169
+ - activate or confirm the intended durable line
170
+ - update `OPTIMIZE_CHECKLIST.md`
171
+ - update `CANDIDATE_BOARD.md`
172
+ - create or revise `PLAN.md`
173
+ - create or revise `CHECKLIST.md`
174
+ - define the smoke queue and full-eval queue explicitly
175
+
176
+ Then hand off into `experiment` for:
177
+
178
+ - one clean implementation pass
179
+ - one bounded smoke or pilot run
180
+ - one real measured main run
181
+
182
+ Do not keep reshaping the method after the run contract is already concrete.
183
+
184
+ ### 5. Record every meaningful result durably
185
+
186
+ Use these artifact forms consistently:
187
+
188
+ - candidate brief:
189
+ - `artifact.submit_idea(..., submission_mode='candidate')`
190
+ - durable optimization line:
191
+ - `artifact.submit_idea(..., submission_mode='line')`
192
+ - implementation-level candidate attempt inside one line:
193
+ - `artifact.record(payload={'kind': 'report', 'report_type': 'optimization_candidate', ...})`
194
+ - real measured main result:
195
+ - `artifact.record_main_experiment(...)`
196
+ - route change after the result:
197
+ - `artifact.record(payload={'kind': 'decision', 'action': 'iterate'|'branch'|'continue'|'stop', ...})`
198
+
199
+ Do not treat chat summaries as substitutes for these durable records.
200
+
201
+ ### 6. Manage process lifecycle explicitly
202
+
203
+ Optimize uses the same long-run process discipline as `experiment`.
204
+
205
+ - Use `bash_exec` for smoke checks, quick validations, and long runs.
206
+ - Before launching a new run, inspect current managed sessions first.
207
+ - Do not start a duplicate process for the same purpose if a valid live session already exists.
208
+ - Use bounded smoke before long runs unless direct quick validation is already cheap and equally informative.
209
+ - Use `bash_exec(mode='detach', ...)` for long runs and monitor with `list/read/await`.
210
+ - Read logs before retrying a failed or suspicious run; do not relaunch blindly.
211
+ - Kill only on explicit invalidity, supersession, or checked no-progress conditions.
212
+ - After pause, resume, or daemon recovery, recover session state before spawning new runs.
213
+
214
+ ### 7. Route from evidence, not from momentum
215
+
216
+ After every real measured result:
217
+
218
+ 1. refresh the frontier
219
+ 2. compare the result against the incumbent and backlog
220
+ 3. choose exactly one dominant next action:
221
+ - explore
222
+ - exploit
223
+ - fusion
224
+ - debug
225
+ - stop
226
+ 4. record that route durably
227
+
228
+ Do not treat one candidate creation, one smoke pass, or one detached launch as stage completion.
229
+
230
+ ## Integrated templates and playbooks
231
+
232
+ Use the following integrated structures directly inside this skill. They replace the old optimize reference files conceptually, even if those files still exist on disk.
233
+
234
+ ### Candidate brief template
235
+
236
+ Every serious candidate brief should include:
237
+
238
+ - title
239
+ - bottleneck
240
+ - why_current_line_is_limited
241
+ - mechanism
242
+ - mechanism_family
243
+ - change_layer: `Tier1` / `Tier2` / `Tier3`
244
+ - source_lens
245
+ - keep_unchanged
246
+ - expected_gain
247
+ - implementation_surface
248
+ - risks
249
+ - foundation
250
+ - promote_now
251
+ - next_target
252
+
253
+ ### Brief-shaping playbook
254
+
255
+ Use this when a candidate direction is still fuzzy and needs to become a ranking-ready brief.
256
+
257
+ - clarify the concrete bottleneck before widening
258
+ - resolve the evaluation or comparability boundary
259
+ - identify the main hard constraint
260
+ - identify the current incumbent
261
+ - generate only a small differentiated slate
262
+ - compare on one shared surface
263
+ - recommend exactly one lead brief
264
+ - self-check for ambiguity, overlap, and weak justification
265
+
266
+ ### Candidate ranking template
267
+
268
+ When several briefs compete, produce:
269
+
270
+ - candidate set
271
+ - ranking scope
272
+ - comparison surface
273
+ - ranked candidates with score summary, why each ranks there, and promote / hold / reject
274
+ - winner justification
275
+ - non-winner notes
276
+ - promotion cap
277
+
278
+ ### Candidate board template
279
+
280
+ `CANDIDATE_BOARD.md` should expose at least these columns:
281
+
282
+ - candidate id
283
+ - level: `brief` or `implementation`
284
+ - parent
285
+ - strategy
286
+ - status
287
+ - expected gain
288
+ - observed result
289
+ - promote / archive recommendation
290
+
291
+ ### Optimize checklist template
292
+
293
+ `OPTIMIZE_CHECKLIST.md` should track at least:
294
+
295
+ - frontier has been refreshed
296
+ - primary optimize submode chosen
297
+ - current route mode chosen
298
+ - recent optimization memory reviewed
299
+ - brief slate checked for family diversity
300
+ - candidate briefs updated or confirmed
301
+ - candidate ranking updated
302
+ - promotion decision made
303
+ - current implementation pool recorded
304
+ - smoke queue defined
305
+ - full-eval queue defined
306
+ - failures classified
307
+ - stagnation check performed
308
+ - fusion eligibility checked
309
+ - next concrete action written
310
+
311
+ ### Frontier review template
312
+
313
+ Whenever route choice is unclear, write down:
314
+
315
+ - current frontier
316
+ - evidence summary
317
+ - route choice
318
+ - active optimize submode
319
+ - immediate next action
320
+
321
+ ### Code-generation route playbook
322
+
323
+ Choose one route deliberately:
324
+
325
+ - brief-only when the direction is still unclear
326
+ - stepwise generation for first substantial implementation of a new line
327
+ - diff / patch generation for improve / exploit / debug / most fusion work
328
+ - full rewrite only when the current implementation is structurally broken or mismatched
329
+
330
+ Do not jump to a rewrite merely because one local patch failed.
331
+
332
+ ### Debug response template
333
+
334
+ When a candidate fails but still looks strategically valuable, record:
335
+
336
+ - error
337
+ - retrieved memory
338
+ - root cause
339
+ - minimal fix
340
+ - keep unchanged
341
+ - next check
342
+ - archive threshold
343
+
344
+ ### Fusion playbook
345
+
346
+ Before opening a fusion candidate, answer:
347
+
348
+ - what exactly is being fused?
349
+ - why are the source strengths complementary rather than redundant?
350
+ - what remains unchanged for comparability?
351
+ - what bounded evidence would prove the fusion worthwhile?
352
+ - what bounded first validation step should run before any broad rollout?
353
+
354
+ Do not fuse two weak lines or two same-mechanism lines under different names.
355
+
356
+ ### Optimization memory template
357
+
358
+ When writing reusable optimization lessons, capture:
359
+
360
+ - type
361
+ - context
362
+ - observation
363
+ - why it matters
364
+ - retrieval hint
365
+ - reuse hint
366
+
367
+ ### Plateau response playbook
368
+
369
+ If one line keeps producing non-improving results:
370
+
371
+ 1. state that the line is plateauing
372
+ 2. identify the most likely root cause
373
+ 3. choose one larger route change:
374
+ - widen search
375
+ - promote a stronger alternative
376
+ - fuse
377
+ - debug
378
+ - stop
379
+ 4. record one explicit non-repeat rule
380
+
381
+ Do not hide plateau under a sequence of tiny "one more tweak" loops.
382
+
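Stating that a line is plateauing is easier with an explicit test. The sketch below is one possible definition, not the package's: a line counts as plateauing when its last `patience` measured results all fail to beat the incumbent by more than `min_delta`. Both parameters and the function name are assumptions.

```python
from typing import Sequence


def is_plateauing(
    results: Sequence[float],
    incumbent: float,
    patience: int = 3,
    min_delta: float = 0.0,
    higher_is_better: bool = True,
) -> bool:
    """True when the last `patience` results are all non-improving."""
    if patience < 1:
        raise ValueError("patience must be at least 1")
    if len(results) < patience:
        return False  # not enough evidence to call a plateau yet
    recent = results[-patience:]
    if higher_is_better:
        return all(r <= incumbent + min_delta for r in recent)
    return all(r >= incumbent - min_delta for r in recent)
```

A positive result should trigger the playbook above (root cause, one larger route change, one non-repeat rule) rather than another small tweak.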
383
+ ### Prompt patterns worth preserving
384
+
385
+ For candidate-brief, improve, fusion, and debug prompts, preserve:
386
+
387
+ - introduction
388
+ - task description
389
+ - memory
390
+ - previous solution or previous line
391
+ - instructions
392
+ - explicit response format
393
+
394
+ Preserve these reasoning contracts whenever possible:
395
+
396
+ - WHAT is changing?
397
+ - WHY is the current line limited?
398
+ - HOW should the change address the limitation?
399
+ - KEEP UNCHANGED
400
+ - NEXT ACTION
401
+
402
+ ## Non-negotiable rules
403
+
404
+ - Do not treat every patch or micro-attempt as a new durable idea line.
405
+ - Do not create a new Git branch/worktree for every implementation-level candidate.
406
+ - Use `artifact.submit_idea(..., submission_mode='candidate')` for candidate briefs that should be ranked before promotion.
407
+ - Use `artifact.submit_idea(..., submission_mode='line')` only for directions that deserve a durable optimization line and branch/worktree.
408
+ - Use `artifact.record(payload={'kind': 'report', 'report_type': 'optimization_candidate', ...})` for implementation-level candidate attempts inside one durable line.
409
+ - Before deciding the next route, call `artifact.get_optimization_frontier(...)` when available and use it as the primary optimization-state summary.
410
+ - Keep all major optimization successes and failures durable through artifacts and memory.
411
+ - Do not drift into paper-outline, bundle, or finalize work by default while this stage is active.
412
+ - Do not convert ranking uncertainty into premature branch creation.
413
+ - Do not treat an implementation-level candidate report as a new durable optimization line.
414
+ - Do not keep widening the frontier once a small serious slate already exists.
415
+ - Do not let one optimize pass mix multiple major route changes.
416
+ One pass may inspect several possibilities, but it should finish with one dominant next action.
417
+
418
+ ## When to use
419
+
420
+ - the quest is algorithm-first
421
+ - the baseline gate is already confirmed or waived
422
+ - the task has at least one plausible optimization direction
423
+ - multiple candidate directions exist and the system should rank them before promotion
424
+ - a durable line exists and the next step is to manage explore / exploit / fuse / debug
425
+
426
+ ## Do not use when
427
+
428
+ - the baseline gate is unresolved
429
+ - the main need is a paper draft, rebuttal, or review task
430
+ - the quest is still in broad literature scouting with no concrete optimization handle
431
+
432
+ ## Core object model
433
+
434
+ Use these three object levels consistently:
435
+
436
+ 1. candidate brief
437
+ `artifact.submit_idea(mode='create', submission_mode='candidate', ...)`
438
+ This records a possible direction or method brief without opening a branch yet.
439
+
440
+ 2. durable optimization line
441
+ `artifact.submit_idea(mode='create', submission_mode='line', ...)`
442
+ This opens a real branch/worktree and becomes a formal optimization path.
443
+
444
+ 3. implementation-level candidate attempt
445
+ `artifact.record(payload={'kind': 'report', 'report_type': 'optimization_candidate', ...})`
446
+ This is a within-line attempt such as one patch, one smoke candidate, one debug candidate, or one fusion candidate.
447
+
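The three object levels map onto two tools with different fixed arguments. The sketch below only builds the tool name and keyword arguments; it does not call anything. `ObjectLevel` and `durable_call` are hypothetical names, and any extra fields are passed through as given because the skill elides them.

```python
from enum import Enum
from typing import Any, Dict, Tuple


class ObjectLevel(Enum):
    CANDIDATE_BRIEF = "candidate_brief"
    DURABLE_LINE = "durable_line"
    IMPLEMENTATION_ATTEMPT = "implementation_attempt"


def durable_call(level: ObjectLevel, **fields: Any) -> Tuple[str, Dict[str, Any]]:
    """Return (tool name, kwargs) for the durable record of this level."""
    if level is ObjectLevel.CANDIDATE_BRIEF:
        # Branchless brief, ranked before any promotion.
        return "artifact.submit_idea", {
            "mode": "create", "submission_mode": "candidate", **fields
        }
    if level is ObjectLevel.DURABLE_LINE:
        # Opens a real branch/worktree.
        return "artifact.submit_idea", {
            "mode": "create", "submission_mode": "line", **fields
        }
    # Within-line attempt: a report, never a new line.
    return "artifact.record", {
        "payload": {
            "kind": "report",
            "report_type": "optimization_candidate",
            **fields,
        }
    }
```

Routing every record through one function like this makes it harder to open a branch for what is really a single patch or smoke candidate.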
448
+ ## Recommended workflow
449
+
450
+ 1. Read the current frontier and recent durable state.
451
+ 2. If only loose candidate directions exist, create or refine candidate briefs first.
452
+ 3. Rank the candidate briefs and promote only the best `1-3` into durable lines.
453
+ 4. Inside a durable line, generate a small candidate pool, then run bounded smoke checks before full evaluations.
454
+ 5. Record each implementation-level attempt durably with status, change plan, and result.
455
+ 6. After each real result, decide whether to explore, exploit, fuse, debug, or stop.
456
+ 7. Write optimization lessons to memory before leaving the stage.
457
+
458
+ At the start of each meaningful optimize pass, update `OPTIMIZE_CHECKLIST.md` before spending significant code or compute.
459
+
460
+ ## Mandatory first-call sequence
461
+
462
+ At the start of a meaningful optimize pass, use this order unless a stronger local reason exists:
463
+
464
+ 1. `artifact.get_optimization_frontier(...)`
465
+ 2. `memory.search(...)`
466
+ 3. `artifact.get_quest_state(detail='summary')`
467
+ 4. `artifact.read_quest_documents(...)` when exact durable wording matters
468
+
469
+ Do not start generating new candidates before the frontier and recent optimization lessons are checked.
470
+
471
+ ## Stage-start requirement
472
+
473
+ Stage-start requirement:
474
+
475
+ - run `memory.list_recent(scope='quest', limit=5)`
476
+ - run at least one `memory.search(...)`
477
+ - read `artifact.get_optimization_frontier(...)`
478
+ - update `OPTIMIZE_CHECKLIST.md`
479
+
480
+ If the frontier is missing or obviously stale, recover that state before proposing more work.
481
+
482
+ ## Internal submode selection
483
+
484
+ Choose exactly one primary optimize submode for the current meaningful pass.
485
+
486
+ Default selection order:
487
+
488
+ 1. `fusion`
489
+ - when the frontier explicitly says `fusion`
490
+ 2. `debug`
491
+ - when a strategically valuable candidate failed for a concrete and likely fixable reason
492
+ 3. `rank`
493
+ - when several candidate briefs already exist and promotion is the main unresolved question
494
+ 4. `brief`
495
+ - when the candidate-brief slate is too thin or too weak
496
+ 5. `seed`
497
+ - when a durable line exists but there is no live implementation-candidate pool
498
+ 6. `loop`
499
+ - when a live candidate pool or leading durable line already exists and the main need is bounded execution progress
500
+
501
+ Do not bounce among submodes repeatedly in one pass.
502
+ If the best submode changes after new evidence appears, record that route shift explicitly.
503
+
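The default selection order above reads as a first-match rule. The sketch below assumes the pass state has already been reduced to a few booleans; the `PassState` class and its field names are hypothetical, and the `None` fallback is an assumption for states the list does not cover.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PassState:
    """Hypothetical summary of the state read at the start of a pass."""
    frontier_mode: str                 # e.g. "explore", "exploit", "fusion"
    fixable_valuable_failure: bool     # a valuable candidate failed for a likely fixable reason
    briefs_awaiting_promotion: bool    # several briefs exist and promotion is the open question
    brief_slate_too_thin: bool         # the candidate-brief slate is too thin or too weak
    durable_line_exists: bool
    live_candidate_pool: bool          # a live implementation-candidate pool exists


def select_submode(state: PassState) -> Optional[str]:
    """Return the first submode whose condition holds, in the documented order."""
    if state.frontier_mode == "fusion":
        return "fusion"
    if state.fixable_valuable_failure:
        return "debug"
    if state.briefs_awaiting_promotion:
        return "rank"
    if state.brief_slate_too_thin:
        return "brief"
    if state.durable_line_exists and not state.live_candidate_pool:
        return "seed"
    if state.live_candidate_pool or state.durable_line_exists:
        return "loop"
    return None  # not covered by the list above; the caller must decide
```

Because the function returns on the first match, it yields exactly one primary submode per pass, which is the property the section asks for.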
504
+ ## Candidate brief protocol
505
+
506
+ When a direction is interesting but not yet worthy of a new branch:
507
+
508
+ - create a candidate brief with `submission_mode='candidate'`
509
+ - keep it branchless
510
+ - record enough structure that later ranking or promotion is possible
511
+
512
+ Good candidate-brief fields include:
513
+
514
+ - title
515
+ - problem
516
+ - hypothesis
517
+ - mechanism
518
+ - mechanism_family
519
+ - change_layer
520
+ - source_lens
521
+ - expected_gain
522
+ - risks
523
+ - decision_reason
524
+ - foundation_ref
525
+ - lineage_intent
526
+
527
+ Do not promote every candidate automatically.
528
+
529
+ Use the integrated `method brief template` section for the minimum acceptable candidate-brief structure.
530
+ Use the integrated `brief shaping playbook` section when the brief is still too vague, too implementation-first, or too collapsed onto one familiar mechanism.
531
+
532
+ Candidate briefs should explicitly answer:
533
+
534
+ - WHAT bottleneck is being targeted?
535
+ - WHY is the current line limited?
536
+ - HOW does this mechanism address the limitation?
537
+ - WHAT must remain unchanged for comparability?
538
+
539
+ If the brief cannot answer those four questions clearly, it is not ready for promotion or implementation.
540
+
541
+ Treat a candidate brief as the DeepScientist form of a method brief.
542
+ It should sit between "idea intuition" and "code implementation".
543
+
544
+ Preserve this brief-shaping discipline:
545
+
546
+ 1. clarify the bottleneck, constraints, and comparability boundary first
547
+ 2. generate a small differentiated slate, usually `2-3` serious approaches
548
+ 3. recommend one approach with explicit tradeoffs against the alternatives
549
+ 4. self-check the winning brief for ambiguity, overlap, and weak justification before submission
550
+
551
+ Do not jump from "interesting intuition" to branch creation.
552
+ Do not jump from "I know how to code this" to "this deserves promotion."
553
+
554
+ When running the `brief` submode:
555
+
556
+ - produce only `2-4` serious candidate briefs by default
557
+ - ask or answer the minimum clarifying questions needed to remove ambiguity around bottleneck, constraint fit, and comparability
558
+ - explicitly keep one incumbent-compatible refinement when possible
559
+ - explicitly keep one orthogonal alternative when possible
560
+ - explicitly keep one broader lens or paradigm shift candidate when possible
561
+ - avoid generating several renamed variants of the same mechanism
562
+ - prefer mechanism-level distinctness over volume
563
+ - present the differentiated slate on one shared comparison surface before choosing a recommended brief
564
+ - keep the questioning bounded and execution-oriented rather than open-ended brainstorming
565
+
566
+ Use a coverage contract for every serious brief slate:
567
+
568
+ - one `incumbent-deepening` direction when justified
569
+ - one `orthogonal-mechanism` direction when justified
570
+ - one `paradigm/objective/data-view shift` direction when justified
571
+
572
+ If all serious briefs belong to the same mechanism family, do one widening pass before ranking.
573
+ Do not treat a same-family slate as sufficient merely because the local scores look good.
574
+
575
+ For each serious brief, record at least:
576
+
577
+ - bottleneck
578
+ - why_current_line_is_limited
579
+ - mechanism
580
+ - why_now
581
+ - mechanism_family
582
+ - change_layer: `Tier1` / `Tier2` / `Tier3`
583
+ - source_lens
584
+ - keep_unchanged
585
+ - expected_gain
586
+ - implementation_surface
587
+ - main_risks
588
+ - promote_now: yes or no
589
+
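A brief can be checked against the minimum field list above before it is ranked. This is a sketch that assumes a brief is held as a plain dictionary keyed by the listed field names; the function name and the message wording are hypothetical.

```python
REQUIRED_BRIEF_FIELDS = (
    "bottleneck",
    "why_current_line_is_limited",
    "mechanism",
    "why_now",
    "mechanism_family",
    "change_layer",
    "source_lens",
    "keep_unchanged",
    "expected_gain",
    "implementation_surface",
    "main_risks",
    "promote_now",
)
ALLOWED_CHANGE_LAYERS = {"Tier1", "Tier2", "Tier3"}


def brief_problems(brief: dict) -> list[str]:
    """List what is missing or malformed; an empty list means the record is complete."""
    problems = [
        f"missing field: {name}"
        for name in REQUIRED_BRIEF_FIELDS
        if name not in brief or brief[name] in (None, "")
    ]
    layer = brief.get("change_layer")
    if layer not in (None, "") and layer not in ALLOWED_CHANGE_LAYERS:
        problems.append(f"change_layer must be one of {sorted(ALLOWED_CHANGE_LAYERS)}")
    return problems
```

A check like this only confirms that the fields are present. It does not judge whether the bottleneck or the `why_now` argument is convincing.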
590
+ InternAgent-style behavior to preserve here:
591
+
592
+ - generate candidate methods first
593
+ - critique them before promotion
594
+ - express them as method-layer objects rather than code patches
595
+ - defer branch creation until the candidate is actually chosen
596
+ - prefer one-question-at-a-time clarification when one missing assumption would otherwise contaminate the whole brief slate
597
+
598
+ Do not require a paper-style literature hard gate inside this submode unless the quest explicitly moved back toward paper work.
599
+
600
+ ## Promotion protocol
601
+
602
+ Only promote a candidate brief into a durable line when at least one of the following is true:
603
+
604
+ - it clearly dominates the nearby alternatives
605
+ - it is top-ranked and sufficiently distinct
606
+ - the user explicitly asked to pursue it
607
+ - the current frontier indicates the line is the strongest next move
608
+
609
+ Promotion should use:
610
+
611
+ `artifact.submit_idea(mode='create', submission_mode='line', source_candidate_id=..., ...)`
612
+
613
+ When several candidate briefs are plausible, rank them explicitly before promotion.
614
+ Use the integrated `candidate ranking template` section for the minimum acceptable ranking record.
615
+
616
+ Default promotion rule:
617
+
618
+ - promote only `1-3` candidate briefs into durable lines
619
+ - if one candidate clearly dominates, promote only that one
620
+ - if the frontier is still structurally uncertain, promote at most two sufficiently distinct lines
621
+
622
+ When running the `rank` submode:
623
+
624
+ - compare the current serious briefs on one explicit shared surface
625
+ - score or rank them with written reasons
626
+ - state why the winner is better now
627
+ - state why the main alternatives are deferred rather than erased
628
+ - never treat "all seem promising" as a sufficient reason to promote them all
629
+
630
+ Use a distinctness-preserving promotion policy:
631
+
632
+ - default rule: each mechanism family should contribute at most one promoted line
633
+ - do not let one familiar family fill the whole promoted slate
634
+ - only override that family cap when one candidate clearly dominates the whole field
635
+
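The family cap can be illustrated with a small selection helper. It assumes the briefs arrive already ranked best-first as dictionaries with a `mechanism_family` key. The function name is hypothetical, and the "one candidate clearly dominates" override is deliberately left out.

```python
def pick_promotions(ranked_briefs: list[dict], max_lines: int = 3) -> list[dict]:
    """Walk the ranking best-first and keep at most one brief per mechanism family."""
    promoted: list[dict] = []
    seen_families: set[str] = set()
    for brief in ranked_briefs:
        family = brief["mechanism_family"]
        if family in seen_families:
            continue  # this family already contributes a promoted line
        promoted.append(brief)
        seen_families.add(family)
        if len(promoted) == max_lines:
            break
    return promoted
```

The default `max_lines=3` mirrors the `1-3` promotion limit. Passing `max_lines=2` would match the stricter limit suggested when the frontier is still structurally uncertain.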
636
+ When ranking, explicitly check:
637
+
638
+ - family diversity
639
+ - change-layer diversity
640
+ - whether the brief slate is collapsing into one familiar lens
641
+
642
+ If the top briefs are all same-family, either:
643
+
644
+ - keep only the strongest one
645
+ - or return to `brief` for a widening pass
646
+
647
+ The output of `rank` should be promotion-ready.
648
+ The output of `brief` should be candidate-ready.
649
+
650
+ ## Frontier protocol
651
+
652
+ At meaningful route boundaries, inspect:
653
+
654
+ - best branch
655
+ - best recent run
656
+ - stagnant branches
657
+ - candidate backlog
658
+ - possible fusion opportunities
659
+ - recommended mode
660
+
661
+ Prefer these route meanings:
662
+
663
+ - `explore`: widen search with fresh candidate directions
664
+ - `exploit`: focus on the strongest current line
665
+ - `fusion`: merge insights from multiple successful or complementary lines
666
+ - `debug`: rescue a candidate or line blocked by a concrete failure mode
667
+ - `stop`: the current frontier is saturated or the remaining routes are not justified
668
+
669
+ Use the integrated `frontier review template` section when the next route is unclear.
670
+
671
+ Interpret frontier state with these default heuristics:
672
+
673
+ - `explore`
674
+ - use when no line is clearly dominant
675
+ - use when current lines are too similar
676
+ - use when the search has not yet established a strong incumbent
677
+
678
+ - `exploit`
679
+ - use when one line clearly leads on evidence and comparability
680
+ - use when smoke results already narrowed the candidate pool
681
+
682
+ - `fusion`
683
+ - use when at least two lines have meaningful strengths
684
+ - use when one line is strong but another line contributes a complementary mechanism
685
+ - use when the current incumbent is stagnating but the broader frontier is still promising
686
+
687
+ - `debug`
688
+ - use when a candidate failed for a concrete and likely fixable reason
689
+ - use when the candidate is still strategically valuable after the failure
690
+
691
+ - `stop`
692
+ - use when the frontier is saturated
693
+ - use when remaining routes are low-value, redundant, or too weak relative to cost
694
+
695
+ When the frontier says `explore`, the default optimize submode is `brief`.
696
+ When the frontier says `exploit`, the default optimize submode is `seed` or `loop`.
697
+ When the frontier says `fusion`, the default optimize submode is `fusion`.
698
+ When a candidate failure dominates the next move, the default optimize submode is `debug` even if the frontier does not yet say so explicitly.
699
+
700
+ ## Seed protocol
701
+
702
+ Use `seed` after a durable line exists and before a broad execution loop begins.
703
+
704
+ The goal is not to launch a full run immediately.
705
+ The goal is to generate a small within-line candidate pool that can be smoke-tested and triaged.
706
+
707
+ When running `seed`:
708
+
709
+ - generate only `2-3` implementation-level candidates by default
710
+ - make each candidate meaningfully different in mechanism, implementation path, or risk profile
711
+ - prefer plan-first candidates over immediate large edits
712
+ - record each candidate as `report_type='optimization_candidate'`
713
+ - define which candidates enter smoke first
714
+ - for a newly promoted line, keep at least one `simple-first` candidate in the initial seed batch
715
+ - do not start a fresh line with ensemble stacking, broad HPO, or a heavy multi-stage pipeline unless durable evidence already proves the simple route is insufficient
716
+
717
+ For each seed candidate, record at least:
718
+
719
+ - candidate_id
720
+ - parent line
721
+ - strategy
722
+ - mechanism_family
723
+ - change_layer
724
+ - change_plan
725
+ - expected_gain
726
+ - keep_unchanged
727
+ - first validation step
728
+ - archive condition
729
+
730
+ MLEvolve-style behavior to preserve here:
731
+
732
+ - one durable line may produce multiple candidate attempts
733
+ - candidate generation is bounded
734
+ - smoke comes before full evaluation unless the task is explicitly `fast-check` and direct quick validation is cheaper and equally informative
735
+
736
+ Use a validation-cost-aware seed policy:
737
+
738
+ - `fast-check`: the first objective smoke signal is likely under about `20` minutes
739
+ - `slow-check`: the first objective smoke signal is likely over about `20` minutes or expensive enough that broad probing is wasteful
740
+
741
+ For `fast-check` seed work:
742
+
743
+ - widen a bit more aggressively inside the line
744
+ - a seed batch of `3-5` candidates can be justified when they are genuinely differentiated
745
+ - prefer multiple orthogonal quick tests over one over-discussed candidate
746
+ - a separate smoke stage is optional; direct submission into quick parallel validation is acceptable when the first check is already cheap
747
+ - only skip smoke when the parallel quick validations are expected to produce distinguishable conclusions rather than repeated near-duplicate outcomes
748
+
749
+ For `slow-check` seed work:
750
+
751
+ - keep the initial seed batch tighter, usually `1-2` candidates and rarely `3`
752
+ - insist on a stronger reason for every candidate entering smoke
753
+ - prefer one dominant hypothesis plus one hedge candidate over a broad exploratory pool
754
+ - do not spend long runs to discover that the brief itself was weak
755
+
756
+ Do not keep a live implementation pool dominated by the same mechanism family.
757
+ Default active-pool rule:
758
+
759
+ - at most `1-2` live candidates from the same family
760
+ - if one family already fills the live pool, new same-family candidates do not enter smoke by default
761
+
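The validation-cost-aware seed policy and the active-pool rule can be sketched as follows. The text says "about `20` minutes", so the hard cutoff here is an assumption, as are the function names and the choice to use the upper end (`2`) of the `1-2` same-family cap.

```python
FAST_CHECK_LIMIT_MINUTES = 20  # assumption: treat "about 20 minutes" as a hard cutoff


def check_class(first_signal_minutes: float) -> str:
    """Classify a task by how long the first objective smoke signal takes."""
    if first_signal_minutes < FAST_CHECK_LIMIT_MINUTES:
        return "fast-check"
    return "slow-check"


def seed_batch_range(check: str) -> tuple[int, int]:
    """Usual (min, max) size of the initial seed batch for each check class."""
    if check == "fast-check":
        return (3, 5)  # only justified when the candidates are genuinely differentiated
    if check == "slow-check":
        return (1, 2)  # rarely 3
    raise ValueError(f"unknown check class: {check!r}")


def may_enter_smoke(live_families: list[str], family: str, family_cap: int = 2) -> bool:
    """Apply the active-pool rule: a family already at its cap adds no new live candidates."""
    return live_families.count(family) < family_cap
```

For example, a task whose first smoke signal takes five minutes would be classed as `fast-check` and could justify a wider batch, while a ninety-minute signal keeps the batch at one or two candidates.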
762
+ ## Loop protocol
763
+
764
+ Use `loop` when a durable line and implementation-candidate pool already exist and the main need is bounded forward motion.
765
+
766
+ Before changing code in `loop`, inspect the same-line local attempt memory for the current line.
767
+ Treat recent sibling attempts on the same line as the first memory surface, ahead of broader quest memory.
768
+
769
+ When running `loop`, choose one primary action:
770
+
771
+ - `smoke`
772
+ - `promote_to_full_eval`
773
+ - `archive`
774
+ - `record_main_result`
775
+ - `switch_to_fusion`
776
+ - `switch_to_debug`
777
+ - `stop`
778
+
779
+ Every loop pass should end with:
780
+
781
+ - one updated candidate status
782
+ - one updated next action
783
+ - one frontier review trigger
784
+
785
+ Do not leave the line with several half-started directions and no dominant next move.
786
+
787
+ Default exploit rule: one atomic improvement per pass.
788
+ Do not bundle several unrelated changes into one exploit candidate unless:
789
+
790
+ - the changes are one tightly coupled design package
791
+ - or the pass is explicitly a fusion route
792
+
793
+ MLEvolve-style behavior to preserve here:
794
+
795
+ - bounded parallelism
796
+ - small live candidate pool
797
+ - explicit move from draft -> smoke -> full eval -> archive or result
798
+ - measured frontier review after real evidence
799
+
800
+ Use a validation-cost-aware loop policy:
801
+
802
+ - for `fast-check` tasks, it is acceptable to run more quick, different tests before converging
803
+ - for `fast-check` tasks, direct quick validation may replace a separate smoke stage if that saves time without losing decision quality
804
+ - for `slow-check` tasks, use fewer but sharper passes, and require objective gain before widening or evolving further
805
+ - if the validation loop is slow, do not keep paying for frontier uncertainty that could have been reduced in `brief`
806
+ - if the validation loop is fast, prefer resolving uncertainty with evidence instead of over-arguing in chat
807
+
808
+ Use a branch/family diversity cap during exploitation:
809
+
810
+ - do not keep selecting only the locally familiar family because it is easiest to elaborate
811
+ - when several strong candidates are close, prefer the one that preserves frontier diversity
812
+ - if one branch or family already dominates recent attempts, require stronger evidence before selecting another near-duplicate attempt
813
+
814
+ ## Memory protocol
815
+
816
+ Before broad new search, run at least one `memory.search(...)` using:
817
+
818
+ - the current task name
819
+ - the active idea id
820
+ - a method keyword
821
+ - the most recent failure mode or successful mechanism
822
+
823
+ When the search appears too narrow, also retrieve one of:
824
+
825
+ - a similar failure pattern
826
+ - an orthogonal success pattern
827
+ - a deliberately dissimilar but high-value prior attempt
828
+
829
+ For `seed`, `loop`, and `debug`, also inspect the same-line local attempt memory from the current leading line before widening to broader quest memory.
830
+
831
+ Write at least one quest memory card when you learn something reusable, such as:
832
+
833
+ - a successful optimization pattern
834
+ - a repeated failure pattern
835
+ - a fusion lesson
836
+ - a reason a candidate should not be retried
837
+
838
+ Use the integrated `optimization memory template` section for the minimum acceptable memory-card shape.
839
+
840
+ Do not write generic "we tried some optimization" memory cards.
841
+ Each card should be retrieval-friendly and decision-relevant.
842
+
843
+ ## Artifact protocol
844
+
845
+ Use:
846
+
847
+ - `artifact.submit_idea(..., submission_mode='candidate')` for candidate briefs
848
+ - `artifact.submit_idea(..., submission_mode='line')` for durable promoted lines
849
+ - `artifact.record(payload={'kind': 'report', 'report_type': 'optimization_candidate', ...})` for within-line attempts
850
+ - `artifact.record(payload={'kind': 'decision', 'action': 'iterate'|'branch'|'continue'|'stop', ...})` for route changes
851
+ - `artifact.record_main_experiment(...)` for real measured line results
852
+
853
+ When the optimize pass is about ranking or promotion, also record one durable decision explaining:
854
+
855
+ - which briefs were compared
856
+ - which one won
857
+ - why promotion was justified now
858
+ - why the others were held, fused, or rejected
859
+
860
+ When recording implementation-level candidates, prefer these status values:
861
+
862
+ - `proposed`
863
+ - `smoke_running`
864
+ - `smoke_passed`
865
+ - `smoke_failed`
866
+ - `promoted`
867
+ - `full_eval_running`
868
+ - `succeeded`
869
+ - `failed`
870
+ - `archived`
871
+
872
+ Use `report_type='optimization_candidate'` consistently for implementation-level attempts so they can later be summarized into the frontier.
873
+
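One way to keep the status values consistent is to validate them before a report payload is built. In this sketch the enum members come from the list above, while the helper name and the `candidate_id` and `status` payload keys are assumptions rather than a confirmed schema.

```python
from enum import Enum
from typing import Any


class CandidateStatus(str, Enum):
    PROPOSED = "proposed"
    SMOKE_RUNNING = "smoke_running"
    SMOKE_PASSED = "smoke_passed"
    SMOKE_FAILED = "smoke_failed"
    PROMOTED = "promoted"
    FULL_EVAL_RUNNING = "full_eval_running"
    SUCCEEDED = "succeeded"
    FAILED = "failed"
    ARCHIVED = "archived"


def attempt_report(candidate_id: str, status: str, **fields: Any) -> dict[str, Any]:
    """Build a report payload; an unknown status raises ValueError."""
    checked = CandidateStatus(status)
    return {
        "kind": "report",
        "report_type": "optimization_candidate",
        "candidate_id": candidate_id,  # hypothetical key name
        "status": checked.value,       # hypothetical key name
        **fields,
    }
```

Rejecting unknown status strings early keeps later frontier summaries from silently splitting one state across several spellings.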
874
+ ## Execution protocol
875
+
876
+ - Use `bash_exec` for smoke checks and full runs.
877
+ - Prefer bounded smoke before full evaluation unless `fast-check` direct validation is cheaper and equally informative.
878
+ - Do not keep rerunning the same unchanged candidate.
879
+ - If a candidate fails with a clear root cause, either debug it deliberately or archive it.
880
+ - If the same line stalls repeatedly, switch to explore or fusion rather than pretending more of the same is new evidence.
881
+
882
+ Use this execution order by default:
883
+
884
+ 1. candidate brief selection
885
+ 2. implementation-level candidate generation
886
+ 3. smoke test or direct quick validation
887
+ 4. promotion to fuller evaluation when justified
888
+ 5. durable result recording
889
+ 6. frontier review
890
+
891
+ Prefer only a small active pool at once:
892
+
893
+ - usually `2-4` candidate briefs before promotion
894
+ - usually `2-3` live implementation candidates in smoke
895
+ - usually `1-2` full evaluations running at once unless the environment clearly supports more
896
+
897
+ Validation-cost-aware override:
898
+
899
+ - if first-pass validation is under about `20` minutes, it is reasonable to increase smoke breadth modestly and compare more alternatives early
900
+ - if first-pass validation is under about `20` minutes, you may skip a separate smoke stage and submit several quick validations in parallel
901
+ - only do that when the validations are likely to yield different conclusions such as clear win / tie / fail / instability, rather than redundant repeats
902
+ - if first-pass validation is slower than that, keep the active pool narrow and gate evolution on clear objective signal
903
+ - for slow validation, do not promote a candidate into heavier resource investment until smoke or pilot evidence shows a real performance improvement, stability improvement, or comparability-preserving advantage
904
+
905
+ ## Code-generation route selection
906
+
907
+ Do not use the same code-generation route for every optimization step.
908
+
909
+ Prefer:
910
+
911
+ 1. brief-first, no code yet
912
+ - when the direction is still unclear
913
+ - stay at candidate-brief level
914
+
915
+ 2. stepwise generation
916
+ - for the first substantial implementation of a new durable line
917
+ - especially when the line touches multiple subsystems such as data processing, model design, and training/evaluation
918
+
919
+ 3. diff / patch generation
920
+ - when a strong current implementation already exists
921
+ - for improve, exploit, debug, and most fusion work
922
+
923
+ 4. full rewrite
924
+ - only when the current implementation is too broken or too structurally mismatched for diff patching to remain safe
925
+
926
+ Use the integrated `codegen route playbook` section before committing to a larger rewrite.
927
+
928
+ ## Debug protocol
929
+
930
+ Use `debug` when a candidate failed but still looks strategically valuable.
931
+
932
+ `debug` is bugfix-only.
933
+ Do not use a debug pass to sneak in a new performance-improvement idea.
934
+ If the proposed change goes beyond the minimal fix and becomes a new mechanism, stop and route back to `brief` or `loop` instead.
935
+
936
+ When a candidate fails:
937
+
938
+ - classify whether the failure is structural, local, or environmental
939
+ - retrieve similar failure patterns from memory before changing code
940
+ - prefer targeted fixes over broad rewrites
941
+ - define the exact post-fix bounded check before editing
942
+
943
+ Good debug prompts should make these explicit:
944
+
945
+ - the concrete error
946
+ - the likely root cause
947
+ - the minimal fix
948
+ - what must remain unchanged
949
+
950
+ Use the integrated `debug response template` section for the minimum acceptable debug response shape.
951
+
952
+ Archive rather than debug when:
953
+
954
+ - the failure is mostly strategic rather than local
955
+ - the candidate no longer looks better than the nearby alternatives
956
+ - the fix would effectively turn it into a different candidate anyway
957
+
958
+ ## Fusion protocol
959
+
960
+ Use `fusion` only when the frontier justifies cross-line combination.
961
+
962
+ Before opening a fusion candidate:
963
+
964
+ - identify the real strength of each source line
965
+ - identify the real weakness of each source line
966
+ - explain why the strengths are complementary rather than redundant
967
+ - define what remains unchanged for comparability
968
+ - define the bounded evidence that would prove the fusion was worthwhile
969
+
970
+ Use the integrated `fusion playbook` section before launching cross-line fusion.
971
+
972
+ Do not fuse:
973
+
974
+ - two lines with the same mechanism under different names
975
+ - two weak lines that lack a clear strength
976
+ - merely because multiple branches exist
977
+
978
+ If the fusion hypothesis is still underspecified, return to `brief` instead of pretending fusion is ready.
979
+
980
+ ## Prompt patterns worth preserving
981
+
982
+ For candidate-brief, improve, fusion, and debug prompts, preserve these recurring structures:
983
+
984
+ - Introduction
985
+ - Task description
986
+ - Memory
987
+ - Previous solution or previous line
988
+ - Instructions
989
+ - assistant_prefix when a stable response lead-in reduces drift
990
+ - explicit response format
991
+
992
+ And preserve these recurring reasoning contracts:
993
+
994
+ - root cause first
995
+ - WHAT / WHY / HOW
996
+ - KEEP UNCHANGED
997
+ - explicit next action
998
+
999
+ Use the integrated `prompt patterns` section as the canonical optimization prompt crib sheet.
1000
+
1001
+ ## Plateau and fusion protocol
1002
+
1003
+ Treat repeated local edits without evidence gain as a search failure mode.
1004
+
1005
+ If one line shows repeated non-improving results:
1006
+
1007
+ - stop issuing near-duplicate attempts
1008
+ - record the stagnation explicitly
1009
+ - either widen the search or fuse with another line
1010
+
1011
+ Use the integrated `fusion playbook` section before launching cross-line fusion.
1012
+ Use the integrated `plateau response playbook` section when deciding how to respond to repeated non-improving results.
1013
+
1014
+ Good fusion candidates usually satisfy both:
1015
+
1016
+ - each source line has at least one real strength
1017
+ - the strengths are complementary rather than redundant
1018
+
1019
+ Do not fuse merely because two lines both exist.
1020
+
1021
+ When a line plateaus:
1022
+
1023
+ - stop issuing near-duplicate low-information attempts
1024
+ - say explicitly that the line is plateauing
1025
+ - force one larger route change:
1026
+ - widen the brief slate
1027
+ - promote a stronger alternative
1028
+ - fuse
1029
+ - debug one blocked but valuable candidate
1030
+ - stop
1031
+
1032
+ Do not hide a plateau under a sequence of tiny "one more tweak" loops.
1033
+
1034
+ Family-shift trigger:
1035
+
1036
+ - if recent attempts stay inside one mechanism family and there is no meaningful improvement
1037
+ - or if `success_patience >= 2`
1038
+ - or if `total_patience >= 5`
1039
+ - then the next pass must not be another same-family Tier1 tweak
1039
+ - instead, choose one of:
1041
+ - orthogonal family
1042
+ - Tier2 or Tier3 shift
1043
+ - fusion
1044
+ - stop
1045
+
1046
+ This is the default anti-collapse rule for optimize.
1047
+
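The family-shift trigger is a simple disjunction, sketched below. The text does not define how `success_patience` and `total_patience` are counted, so they are treated here as integer counters supplied by the caller. The function and constant names are hypothetical.

```python
ALLOWED_ROUTES_AFTER_TRIGGER = ("orthogonal family", "Tier2 or Tier3 shift", "fusion", "stop")


def family_shift_required(
    same_family_without_gain: bool,
    success_patience: int,
    total_patience: int,
) -> bool:
    """True when the next pass must not be another same-family Tier1 tweak."""
    return same_family_without_gain or success_patience >= 2 or total_patience >= 5
```

When the function returns `True`, the next pass would pick one entry from `ALLOWED_ROUTES_AFTER_TRIGGER` instead of another local edit.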
1048
+ ## Task-category primer
1049
+
1050
+ Before widening a stale frontier, classify the task briefly into one or more dominant structures:
1051
+
1052
+ - tabular
1053
+ - vision / spatial
1054
+ - sequence / language
1055
+ - graph / topology
1056
+ - systems / optimization
1057
+ - mixed
1058
+
1059
+ Then ask whether the current brief slate overfits one familiar method family for that task.
1060
+ If it does, require at least one serious candidate from a different plausible family or lens before promotion.
1061
+
1062
+ ## Stall-recovery protocol
1063
+
1064
+ If the optimize stage appears to stall, diagnose the stall explicitly instead of idling.
1065
+
1066
+ Common stall classes:
1067
+
1068
+ - no frontier information
1069
+ - no candidate clearly worth promotion
1070
+ - candidate pool is too similar
1071
+ - repeated failures on one line
1072
+ - no active runs and no next action recorded
1073
+
1074
+ Preferred recovery order:
1075
+
1076
+ 1. refresh the frontier
1077
+ 2. inspect the current candidate board
1078
+ 3. inspect recent optimization memory
1079
+ 4. record one explicit route decision
1080
+ 5. continue with exactly one concrete next action
1081
+
1082
+ Do not leave the stage parked without a recorded reason and a concrete reopen condition.
1083
+
1084
+ ## Stage-end requirement
1085
+
1086
+ Stage-end requirement:
1087
+
1088
+ - write at least one `memory.write(...)` when the pass produced a reusable success pattern, repeated failure pattern, fusion lesson, or explicit non-retry rule
1089
+ - update `OPTIMIZE_CHECKLIST.md`
1090
+ - update `CANDIDATE_BOARD.md` when the candidate pool changed
1091
+ - leave one durable next action or stop condition
1092
+
1093
+ If nothing reusable was learned, record why this pass was still necessary instead of writing a fake memory card.
1094
+
1095
+ ## Completion rule
1096
+
1097
+ This stage is complete only when one of these is durably true:
1098
+
1099
+ - a stronger line was promoted and the next anchor is clear
1100
+ - the current line produced a real measured result and the next route is recorded
1101
+ - the optimization frontier says stop and that stop decision is durably recorded
1102
+
1103
+ Do not treat one candidate creation or one smoke pass as stage completion.
1104
+
1105
+ ## Integrated reference appendix
1106
+
1107
+ This appendix inlines the former `optimize/references/*.md` material so the skill remains self-contained.
1108
+
1109
+ ### brief-shaping-playbook.md
1110
+
1111
+ # Brief Shaping Playbook
1112
+
1113
+ Use this reference when a candidate direction is still fuzzy and needs to become a structured, ranking-ready brief.
1114
+
1115
+ This playbook borrows the useful part of product-style brainstorming without importing a full software-spec workflow.
1116
+ The goal is not a long design document.
1117
+ The goal is a compact candidate brief that is clear enough to compare, rank, and either submit as `submission_mode='candidate'` or reject.
1118
+
1119
+ ## 1. Clarify before widening
1120
+
1121
+ Before generating more variants, resolve the minimum ambiguity around:
1122
+
1123
+ - the concrete bottleneck
1124
+ - the evaluation or comparability boundary
1125
+ - the main hard constraint: data, metric, compute, latency, memory, interface, or training budget
1126
+ - the current incumbent or baseline that this brief must beat or complement
1127
+
1128
+ If one unknown would materially change every candidate, clarify it first instead of generating a noisy slate.
1129
+ Prefer one question at a time when clarification is genuinely needed.
1130
+ If the answer is already available from durable state, use that instead of asking.
1131
+
1132
+ ## 2. Generate a small differentiated slate
1133
+
1134
+ Default target: `2-3` serious approaches.
1135
+
1136
+ The slate should usually include:
1137
+
1138
+ - one incumbent-deepening refinement
1139
+ - one orthogonal mechanism
1140
+ - one broader shift candidate when justified
1141
+
1142
+ Do not produce several renamed variants of the same mechanism family.
1143
+ If two variants differ only by parameter choice or patch detail, keep only the sharper one.
1144
+
1145
+ For each candidate, write:
1146
+
1147
+ - bottleneck
1148
+ - why_current_line_is_limited
1149
+ - mechanism
1150
+ - why_now
1151
+ - keep_unchanged
1152
+ - expected_gain
1153
+ - main_risks
1154
+
1155
+ ## 3. Compare on one shared surface
1156
+
1157
+ Before recommending a winner, compare the serious candidates on the same dimensions:
1158
+
1159
+ - expected upside
1160
+ - comparability safety
1161
+ - implementation surface
1162
+ - mechanism distinctness
1163
+ - failure risk
1164
+ - reason this route is better now than the nearby alternatives
1165
+
1166
+ Do not let each candidate justify itself with a different scoring story.
1167
+ Use one comparison surface so ranking is auditable.
1168
+
1169
+ ## 4. Recommend exactly one lead brief
1170
+
1171
+ After comparison, recommend one lead brief and explain:
1172
+
1173
+ - why it is the best next move now
1174
+ - why the main alternatives are deferred instead of promoted
1175
+ - what evidence would quickly disconfirm the lead brief
1176
+
1177
+ Do not say "all are promising" and promote everything.
1178
+ If the slate is still too close to call, return to widening once or narrow the slate further.
1179
+
1180
+ ## 5. Self-check before submission
1181
+
1182
+ Before calling `artifact.submit_idea(..., submission_mode='candidate', ...)`, check:
1183
+
1184
+ - Is the bottleneck concrete rather than generic?
1185
+ - Does `why_current_line_is_limited` explain a real gap instead of restating the mechanism?
1186
+ - Does `why_now` explain what changed in evidence, failure pattern, or frontier state?
1187
+ - Is the comparability boundary explicit?
1188
+ - Is the recommendation based on tradeoffs rather than implementation convenience?
1189
+ - Would the brief still make sense if handed to another agent with no chat context?
1190
+
1191
+ If any answer is no, refine the brief before submission.
1192
+
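The self-check can be treated as a gate, sketched below. This is not from the package and it does not call `artifact.submit_idea`, whose full signature is not shown here. The check names are hypothetical shorthand for the six questions above.

```python
# Hypothetical short names for the six self-check questions, in the order listed above.
SELF_CHECK = (
    "bottleneck_is_concrete",
    "limitation_explains_real_gap",
    "why_now_names_a_change",
    "comparability_boundary_explicit",
    "based_on_tradeoffs",
    "stands_alone_without_chat_context",
)


def failed_checks(answers: dict[str, bool]) -> list[str]:
    """Return the checks answered 'no'; unanswered checks count as failures."""
    return [name for name in SELF_CHECK if not answers.get(name, False)]


answers = {name: True for name in SELF_CHECK}
answers["why_now_names_a_change"] = False
print(failed_checks(answers))
```

Submission would proceed only when the returned list is empty.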
1193
+ ## 6. Output shape
1194
+
1195
+ A good final brief package is short and structured:
1196
+
1197
+ 1. brief title
1198
+ 2. one-paragraph bottleneck and constraint summary
1199
+ 3. a `2-3` candidate comparison table or bullet slate
1200
+ 4. recommended brief with tradeoff summary
1201
+ 5. self-check outcome
1202
+ 6. fields ready for the integrated `method-brief-template.md` section
1203
+
1204
+ Keep it compact.
1205
+ This is a shaping pass for optimization candidates, not a paper draft or engineering spec.
1206
+
1207
+ ### candidate-board-template.md
1208
+
1209
+ # CANDIDATE_BOARD.md
1210
+
1211
+ | Candidate ID | Level | Parent | Strategy | Status | Expected Gain | Observed Result | Promote / Archive |
1212
+ | --- | --- | --- | --- | --- | --- | --- | --- |
1213
+ | cand-001 | brief | current-head | explore | proposed | Better tail accuracy | n/a | pending |
1214
+ | cand-002 | implementation | cand-001 | exploit | smoke_passed | Faster convergence | smoke ok | consider promote |
1215
+
1216
+ Notes:
1217
+
1218
+ - `Level` should be `brief` or `implementation`
1219
+ - `Parent` may be a branch, idea id, run id, or candidate id
1220
+ - `Strategy` should usually be one of `explore`, `exploit`, `fusion`, `debug`
1221
+ - `Promote / Archive` should be a clear recommendation, not an empty placeholder
1222
+
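The notes above can be checked mechanically. The sketch below is not from the package. It assumes each board row has already been parsed into a dict keyed by the table's column headers, and `check_row` is a hypothetical helper.

```python
ALLOWED_LEVELS = {"brief", "implementation"}
USUAL_STRATEGIES = {"explore", "exploit", "fusion", "debug"}


def check_row(row: dict[str, str]) -> list[str]:
    """Return human-readable problems for one candidate-board row."""
    problems = []
    if row.get("Level") not in ALLOWED_LEVELS:
        problems.append("Level should be 'brief' or 'implementation'")
    if row.get("Strategy") not in USUAL_STRATEGIES:
        # The note says "usually", so treat this as a warning rather than an error.
        problems.append("Strategy is outside the usual set")
    if not row.get("Promote / Archive", "").strip():
        problems.append("Promote / Archive is an empty placeholder")
    return problems


# Illustrative row only.
row = {"Candidate ID": "cand-003", "Level": "brief", "Strategy": "explore", "Promote / Archive": ""}
print(check_row(row))
```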
1223
+ ### candidate-ranking-template.md
1224
+
1225
+ # Candidate Ranking Template
1226
+
1227
+ ## Candidate Set
1228
+
1229
+ - Candidate IDs:
1230
+ - Ranking scope:
1231
+ - Comparison surface:
1232
+
1233
+ ## Criteria
1234
+
1235
+ - expected information gain
1236
+ - feasibility in current repo
1237
+ - comparability against baseline
1238
+ - implementation surface
1239
+ - likely novelty or distinctiveness
1240
+ - risk of redundant overlap
1241
+ - incumbent-improvement potential
1242
+ - distinctness from other candidates
1243
+ - mechanism-family diversity
1244
+ - change-layer diversity
1245
+
1246
+ ## Ranked Candidates
1247
+
1248
+ 1. `candidate_id`
1249
+ Score summary:
1250
+ Why it ranks here:
1251
+ Promote / hold / reject:
1252
+
1253
+ 2. `candidate_id`
1254
+ Score summary:
1255
+ Why it ranks here:
1256
+ Promote / hold / reject:
1257
+
1258
+ 3. `candidate_id`
1259
+ Score summary:
1260
+ Why it ranks here:
1261
+ Promote / hold / reject:
1262
+
1263
+ ## Winner Justification
1264
+
1265
+ Why the selected candidate should become a durable line now.
1266
+
1267
+ ## Non-Winner Notes
1268
+
1269
+ Why the other candidates were deferred, fused, or rejected.
1270
+
1271
+ ## Promotion Cap
1272
+
1273
+ - how many candidates should be promoted now:
1274
+ - why more promotion would dilute the frontier:
1275
+ - same-family cap override justification:
1276
+
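A promotion cap with a same-family limit could be applied to a ranked list roughly as follows. This is a sketch under assumptions, not the package's logic: the default caps of one are illustrative, and `select_promotions` is a hypothetical helper. Overriding `max_per_family` would correspond to the override justification above.

```python
def select_promotions(
    ranked: list[tuple[str, str]],
    max_total: int = 1,
    max_per_family: int = 1,
) -> list[str]:
    """Walk (candidate_id, mechanism_family) pairs in rank order and apply both caps."""
    chosen: list[str] = []
    per_family: dict[str, int] = {}
    for candidate_id, family in ranked:
        if len(chosen) >= max_total:
            break
        if per_family.get(family, 0) >= max_per_family:
            continue  # same-family cap reached; skip to the next distinct family
        chosen.append(candidate_id)
        per_family[family] = per_family.get(family, 0) + 1
    return chosen


# Illustrative ranking only.
ranked = [("cand-001", "loss"), ("cand-002", "loss"), ("cand-003", "adapter")]
print(select_promotions(ranked, max_total=2))
```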
1277
+ ### codegen-route-playbook.md
1278
+
1279
+ # Codegen Route Playbook
1280
+
1281
+ Choose the code-generation route deliberately.
1282
+
1283
+ ## Use brief-only
1284
+
1285
+ Use no-code candidate briefs when:
1286
+
1287
+ - the direction is still underspecified
1288
+ - multiple distinct directions still need ranking
1289
+ - a new line should not be promoted yet
1290
+
1291
+ ## Use stepwise generation
1292
+
1293
+ Prefer stepwise generation when:
1294
+
1295
+ - a new durable line is being implemented for the first time
1296
+ - the change spans data processing, model design, and training/evaluation
1297
+ - a modular decomposition will reduce large integrated errors
1298
+ - a plan -> refine -> implement sequence is safer than one monolithic edit
1299
+
1300
+ ## Use diff / patch generation
1301
+
1302
+ Prefer diff / patch generation when:
1303
+
1304
+ - a strong current implementation already exists
1305
+ - the current change is local enough to preserve most of the line
1306
+ - the task is an improve, exploit, or debug pass, or most fusion work
1307
+ - the desired change can be described as a bounded delta from the current solution
1308
+
1309
+ ## Use full rewrite
1310
+
1311
+ Use a full rewrite only when:
1312
+
1313
+ - the existing implementation is structurally broken
1314
+ - the desired architecture no longer matches the current codebase shape
1315
+ - diff patching would be more fragile than replacement
1316
+
1317
+ Do not jump to a rewrite merely because one local patch failed.
1318
+
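The four routes above can be read as a small decision function. The sketch below is one possible ordering of the conditions, not the package's own logic. The `LineState` flags and the route labels are hypothetical simplifications of the bullet lists.

```python
from dataclasses import dataclass


@dataclass
class LineState:
    # Hypothetical flags summarising the conditions listed above.
    direction_is_specified: bool
    has_strong_implementation: bool
    change_is_bounded_delta: bool
    structurally_broken: bool


def choose_route(state: LineState) -> str:
    """Pick a codegen route; the check order is an assumption."""
    if not state.direction_is_specified:
        return "brief-only"
    if state.structurally_broken:
        return "full-rewrite"
    if state.has_strong_implementation and state.change_is_bounded_delta:
        return "diff-patch"
    # New durable lines and cross-cutting changes fall through to stepwise generation.
    return "stepwise"


print(choose_route(LineState(True, True, True, False)))
```

Note that a single failed patch does not set `structurally_broken`, so it does not trigger a rewrite on its own.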
1319
+ ## Response shape
1320
+
1321
+ For non-trivial codegen work, prefer this shape:
1322
+
1323
+ 1. short plan
1324
+ 2. bounded implementation surface
1325
+ 3. keep-unchanged contract
1326
+ 4. validation step
1327
+
1328
+ Do not go from a vague idea directly into a large patch with no intermediate plan.
1329
+
1330
+ ### debug-response-template.md
1331
+
1332
+ # Debug Response Template
1333
+
1334
+ ## Error
1335
+
1336
+ What concrete error or failure occurred?
1337
+
1338
+ ## Retrieved Memory
1339
+
1340
+ What similar failure pattern or repair lesson should be reused before changing code?
1341
+
1342
+ ## Root Cause
1343
+
1344
+ What is the most likely underlying cause?
1345
+
1346
+ ## Minimal Fix
1347
+
1348
+ What is the smallest plausible fix?
1349
+
1350
+ ## Keep Unchanged
1351
+
1352
+ What parts of the line must remain unchanged for comparability and stability?
1353
+
1354
+ ## Next Check
1355
+
1356
+ What bounded smoke or validation check should confirm the fix?
1357
+
1358
+ ## Archive Threshold
1359
+
1360
+ What outcome would prove this candidate should be archived instead of debugged again?
1361
+
1362
+ ### frontier-review-template.md
1363
+
1364
+ # Frontier Review Template
1365
+
1366
+ ## Current Frontier
1367
+
1368
+ - mode:
1369
+ - best branch:
1370
+ - best run:
1371
+ - stagnant branches:
1372
+ - candidate backlog:
1373
+ - fusion candidates:
1374
+
1375
+ ## Evidence Summary
1376
+
1377
+ - strongest support:
1378
+ - strongest contradiction:
1379
+ - biggest unresolved risk:
1380
+
1381
+ ## Route Choice
1382
+
1383
+ - explore / exploit / fusion / debug / stop:
1384
+ - why this is the best next move:
1385
+
1386
+ ## Active Optimize Submode
1387
+
1388
+ - brief / rank / seed / loop / fusion / debug:
1389
+ - why this submode is dominant now:
1390
+
1391
+ ## Immediate Next Action
1392
+
1393
+ - exact next step:
1394
+ - what result will trigger another frontier review:
1395
+ - what result would force a different mode:
1396
+
1397
+ ### fusion-playbook.md
1398
+
1399
+ # Fusion Playbook
1400
+
1401
+ Use fusion only when:
1402
+
1403
+ - at least two lines have real strengths
1404
+ - the strengths are complementary
1405
+ - one line alone is no longer improving fast enough
1406
+
1407
+ Before fusion, write down:
1408
+
1409
+ - source line A:
1410
+ strongest mechanism:
1411
+ strongest evidence:
1412
+ main weakness:
1413
+ what must survive the fusion:
1414
+
1415
+ - source line B:
1416
+ strongest mechanism:
1417
+ strongest evidence:
1418
+ main weakness:
1419
+ what must survive the fusion:
1420
+
1421
+ Then answer:
1422
+
1423
+ - what exactly is being fused?
1424
+ - why does this combination address a real bottleneck?
1425
+ - why are the source strengths complementary rather than redundant?
1426
+ - what remains unchanged for comparability?
1427
+ - what evidence would prove the fusion was worth it?
1428
+ - what bounded first validation step should run before any broad rollout?
1429
+
1430
+ Do not fuse:
1431
+
1432
+ - two lines with the same mechanism under different names
1433
+ - two weak lines with no clear strengths
1434
+ - merely because multiple branches exist
1435
+
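The fusion preconditions and the "Do not fuse" rules can be sketched as a blocker list. This is not from the package. Using a differing mechanism family as a stand-in for "complementary" is a crude assumption, and `SourceLine` and `fusion_blockers` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class SourceLine:
    name: str
    mechanism_family: str
    strongest_evidence: str  # empty string means no recorded strength


def fusion_blockers(a: SourceLine, b: SourceLine, single_line_still_improving: bool) -> list[str]:
    """Return reasons not to fuse; an empty list means the preconditions look satisfied."""
    blockers = []
    if a.mechanism_family == b.mechanism_family:
        blockers.append("same mechanism under different names")
    if not a.strongest_evidence.strip() or not b.strongest_evidence.strip():
        blockers.append("a source line has no clear strength")
    if single_line_still_improving:
        blockers.append("one line alone is still improving fast enough")
    return blockers


# Illustrative lines only.
line_a = SourceLine("line-a", "loss", "better tail accuracy on smoke run")
line_b = SourceLine("line-b", "loss", "faster convergence on smoke run")
print(fusion_blockers(line_a, line_b, single_line_still_improving=False))
```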
1436
+ ### method-brief-template.md
1437
+
1438
+ # Method Brief Template
1439
+
1440
+ ## Title
1441
+
1442
+ One short line naming the candidate direction.
1443
+
1444
+ ## Bottleneck
1445
+
1446
+ What concrete bottleneck or limitation does this target?
1447
+
1448
+ ## Why Current Line Is Limited
1449
+
1450
+ Why is the current best line or baseline not already solving this?
1451
+
1452
+ ## Mechanism
1453
+
1454
+ What specific intervention or design change is proposed?
1455
+
1456
+ ## Mechanism Family
1457
+
1458
+ Name the family explicitly, for example `adapter`, `loss`, `architecture`, `augmentation`, `ensemble`, `retrieval`, `objective-shift`.
1459
+
1460
+ ## Change Layer
1461
+
1462
+ One of:
1463
+
1464
+ - `Tier1`: local optimization / training detail
1465
+ - `Tier2`: representation or component change
1466
+ - `Tier3`: paradigm or system-level shift
1467
+
1468
+ ## Source Lens
1469
+
1470
+ Where did this candidate come from?
1471
+
1472
+ - baseline_refinement
1473
+ - orthogonal_mechanism
1474
+ - failure_repair
1475
+ - cross_domain_transfer
1476
+ - objective_shift
1477
+ - search_widening
1478
+
1479
+ ## Keep Unchanged
1480
+
1481
+ What must remain stable for comparability?
1482
+
1483
+ ## Expected Gain
1484
+
1485
+ What evidence should improve if this works?
1486
+
1487
+ ## Implementation Surface
1488
+
1489
+ - main files or modules likely involved:
1490
+ - likely change scope: local / moderate / broad
1491
+
1492
+ ## Risks
1493
+
1494
+ - Main failure mode
1495
+ - Comparability risk
1496
+ - Implementation risk
1497
+
1498
+ ## Foundation
1499
+
1500
+ - Source branch / run / baseline:
1501
+ - Why this foundation is the right starting point:
1502
+
1503
+ ## Promote Now
1504
+
1505
+ - yes / no
1506
+ - why:
1507
+
1508
+ ## Next Target
1509
+
1510
+ Usually `optimize` or `experiment`.
1511
+
1512
+ ### optimization-memory-template.md
1513
+
1514
+ # Optimization Memory Template
1515
+
1516
+ ## Type
1517
+
1518
+ - success pattern / failure pattern / fusion lesson
1519
+
1520
+ ## Context
1521
+
1522
+ - task:
1523
+ - branch or idea:
1524
+ - candidate id:
1525
+ - strategy:
1526
+
1527
+ ## Observation
1528
+
1529
+ What actually happened?
1530
+
1531
+ ## Why It Matters
1532
+
1533
+ Why should a later optimization pass retrieve this?
1534
+
1535
+ ## Retrieval Hint
1536
+
1537
+ - query keywords:
1538
+ - closest line or mechanism family:
1539
+ - when this should be recalled first:
1540
+
1541
+ ## Reuse Hint
1542
+
1543
+ When should this lesson be reused, and when should it be avoided?
1544
+
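The retrieval hints above suggest a simple keyword lookup, sketched below. This is not the package's memory system. `MemoryNote` and `retrieve` are hypothetical, and the scoring (keyword overlap plus one point for a matching mechanism family) is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MemoryNote:
    kind: str  # "success pattern", "failure pattern", or "fusion lesson"
    observation: str
    query_keywords: list[str]
    mechanism_family: str


def retrieve(notes: list[MemoryNote], query: str, family: Optional[str] = None) -> list[MemoryNote]:
    """Return notes with at least one matching signal, best match first."""
    terms = set(query.lower().split())
    scored = []
    for note in notes:
        score = len(terms & {k.lower() for k in note.query_keywords})
        if family is not None and note.mechanism_family == family:
            score += 1
        if score:
            scored.append((score, note))
    scored.sort(key=lambda pair: -pair[0])  # stable sort keeps insertion order on ties
    return [note for _, note in scored]


# Illustrative notes only.
notes = [
    MemoryNote("failure pattern", "sampler change broke comparability", ["sampler", "comparability"], "augmentation"),
    MemoryNote("success pattern", "loss reweighting helped tail classes", ["loss", "tail"], "loss"),
]
print([n.kind for n in retrieve(notes, "tail loss plateau", family="loss")])
```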
1545
+ ### optimize-checklist-template.md
1546
+
1547
+ # OPTIMIZE_CHECKLIST.md
1548
+
1549
+ - [ ] Read `artifact.get_optimization_frontier(...)` or equivalent durable frontier summary
1550
+ - [ ] Select the primary optimize submode: `brief`, `rank`, `seed`, `loop`, `fusion`, or `debug`
1551
+ - [ ] Confirm whether the current pass is `explore`, `exploit`, `fusion`, `debug`, or `stop`
1552
+ - [ ] Review recent optimization memory before generating new candidates
1553
+ - [ ] Check whether the current brief slate covers more than one mechanism family
1554
+ - [ ] Candidate briefs updated or confirmed
1555
+ - [ ] Candidate ranking updated
1556
+ - [ ] Promote only the strongest brief(s) into durable line(s) if justified
1557
+ - [ ] Current implementation candidate pool recorded
1558
+ - [ ] Smoke queue defined
1559
+ - [ ] Full-eval queue defined
1560
+ - [ ] Recent failures classified and either debugged or archived
1561
+ - [ ] Stagnation check performed
1562
+ - [ ] Family-shift trigger checked
1563
+ - [ ] Fusion eligibility checked
1564
+ - [ ] Next concrete action written
1565
+
1566
+ ### plateau-response-playbook.md
1567
+
1568
+ # Plateau Response Playbook
1569
+
1570
+ Use this when one line keeps producing non-improving results.
1571
+
1572
+ ## Plateau indicators
1573
+
1574
+ - repeated non-improving results on the same line
1575
+ - repeated "small tweak" proposals with no structural change
1576
+ - candidate queue filled with near-duplicate mechanisms
1577
+
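The first indicator, repeated non-improving results on one line, can be detected with a short check. The sketch below is not from the package. It assumes a higher-is-better metric, and the `window` and `min_delta` defaults are illustrative, not recommended thresholds.

```python
def is_plateauing(results: list[float], window: int = 3, min_delta: float = 0.0) -> bool:
    """True if none of the last `window` results beats the earlier best by more than `min_delta`."""
    if len(results) <= window:
        return False  # not enough history to compare against
    best_before = max(results[:-window])
    return all(r <= best_before + min_delta for r in results[-window:])


# Illustrative metric history only.
print(is_plateauing([0.70, 0.74, 0.74, 0.73, 0.74]))
```

The other two indicators (repeated small-tweak proposals, near-duplicate queues) depend on mechanism labels rather than metrics and are not covered by this check.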
1578
+ ## Required response
1579
+
1580
+ 1. state that the line is plateauing
1581
+ 2. identify the most likely root cause of the plateau
1582
+ 3. choose one of:
1583
+ - widen search
1584
+ - promote a stronger alternative
1585
+ - fuse with another line
1586
+ - debug a strategically valuable blocked candidate
1587
+ - stop the line
1588
+ 4. record one explicit non-repeat rule so the next pass does not retry the same low-information move
1589
+
1590
+ ## Do not do
1591
+
1592
+ - keep proposing near-identical local tweaks
1593
+ - rerun the same unchanged candidate
1594
+ - fuse without a clear complementary mechanism
1595
+ - hide a plateau under a sequence of tiny "one more tweak" edits
1596
+
1597
+ ### prompt-patterns.md
1598
+
1599
+ # Optimization Prompt Patterns
1600
+
1601
+ These prompt structures are worth preserving across optimize subroutines.
1602
+
1603
+ ## Common skeleton
1604
+
1605
+ - Introduction
1606
+ - Task description
1607
+ - Memory
1608
+ - Previous solution or previous line
1609
+ - Instructions
1610
+ - `assistant_prefix`, when a stable response lead-in reduces drift
1611
+ - Explicit response format
1612
+
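The skeleton can be assembled programmatically, as sketched below. This is not from the package. `build_prompt` is a hypothetical helper, rendering sections as `##` headings is an assumption, and so is placing the optional `assistant_prefix` at the very end as a response lead-in.

```python
# Required sections, in the order listed above (assistant_prefix is handled separately).
SKELETON = [
    "Introduction",
    "Task description",
    "Memory",
    "Previous solution or previous line",
    "Instructions",
    "Explicit response format",
]


def build_prompt(sections: dict[str, str], assistant_prefix: str = "") -> str:
    """Join the skeleton sections in order; fail loudly if any required section is empty."""
    missing = [name for name in SKELETON if not sections.get(name, "").strip()]
    if missing:
        raise ValueError(f"missing sections: {missing}")
    body = "\n\n".join(f"## {name}\n{sections[name].strip()}" for name in SKELETON)
    return body + (f"\n\n{assistant_prefix}" if assistant_prefix else "")


# Placeholder section text only.
prompt = build_prompt({name: f"({name.lower()} goes here)" for name in SKELETON}, "WHAT is changing:")
print(prompt.splitlines()[0])
```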
1613
+ ## Common reasoning contract
1614
+
1615
+ - WHAT is changing?
1616
+ - WHY is the current line limited?
1617
+ - HOW should the change address the limitation?
1618
+ - KEEP UNCHANGED: what must remain stable for comparability?
1619
+ - NEXT ACTION: what concrete step follows this prompt?
1620
+
1621
+ ## Plateau pattern
1622
+
1623
+ When the line is stagnating:
1624
+
1625
+ - explicitly state that the current approach has plateaued
1626
+ - forbid trivial hyperparameter-only tweaks when a deeper change is needed
1627
+ - require a larger representational or architectural shift
1628
+
1629
+ ## Fusion pattern
1630
+
1631
+ When combining lines:
1632
+
1633
+ - identify the real strength of each source line
1634
+ - explain why those strengths are complementary
1635
+ - avoid combining everything
1636
+ - preserve the comparison surface
1637
+
1638
+ ## Debug pattern
1639
+
1640
+ For debugging:
1641
+
1642
+ - restate the concrete error
1643
+ - state the likely root cause
1644
+ - require the minimal targeted fix
1645
+ - preserve the original solution intent unless the bug proves the design invalid