@researai/deepscientist 1.5.14 → 1.5.15

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (119)
  1. package/README.md +8 -0
  2. package/assets/branding/logo-raster.png +0 -0
  3. package/bin/ds.js +134 -49
  4. package/docs/en/00_QUICK_START.md +2 -2
  5. package/docs/en/01_SETTINGS_REFERENCE.md +20 -4
  6. package/docs/en/03_QQ_CONNECTOR_GUIDE.md +19 -0
  7. package/docs/en/10_WEIXIN_CONNECTOR_GUIDE.md +20 -0
  8. package/docs/en/14_PROMPT_SKILLS_AND_MCP_GUIDE.md +2 -0
  9. package/docs/en/16_TELEGRAM_CONNECTOR_GUIDE.md +134 -0
  10. package/docs/en/17_WHATSAPP_CONNECTOR_GUIDE.md +126 -0
  11. package/docs/en/18_FEISHU_CONNECTOR_GUIDE.md +136 -0
  12. package/docs/en/README.md +6 -0
  13. package/docs/zh/00_QUICK_START.md +2 -2
  14. package/docs/zh/01_SETTINGS_REFERENCE.md +20 -4
  15. package/docs/zh/03_QQ_CONNECTOR_GUIDE.md +19 -0
  16. package/docs/zh/10_WEIXIN_CONNECTOR_GUIDE.md +20 -0
  17. package/docs/zh/14_PROMPT_SKILLS_AND_MCP_GUIDE.md +2 -0
  18. package/docs/zh/16_TELEGRAM_CONNECTOR_GUIDE.md +134 -0
  19. package/docs/zh/17_WHATSAPP_CONNECTOR_GUIDE.md +126 -0
  20. package/docs/zh/18_FEISHU_CONNECTOR_GUIDE.md +136 -0
  21. package/docs/zh/README.md +6 -0
  22. package/install.sh +2 -0
  23. package/package.json +1 -1
  24. package/pyproject.toml +1 -1
  25. package/src/deepscientist/__init__.py +1 -1
  26. package/src/deepscientist/artifact/charts.py +567 -0
  27. package/src/deepscientist/artifact/guidance.py +50 -10
  28. package/src/deepscientist/artifact/metrics.py +228 -5
  29. package/src/deepscientist/artifact/schemas.py +3 -0
  30. package/src/deepscientist/artifact/service.py +3534 -191
  31. package/src/deepscientist/bash_exec/models.py +23 -0
  32. package/src/deepscientist/bash_exec/monitor.py +147 -67
  33. package/src/deepscientist/bash_exec/runtime.py +218 -156
  34. package/src/deepscientist/bash_exec/service.py +79 -64
  35. package/src/deepscientist/bash_exec/shells.py +87 -0
  36. package/src/deepscientist/bridges/connectors.py +51 -2
  37. package/src/deepscientist/config/models.py +6 -3
  38. package/src/deepscientist/config/service.py +7 -2
  39. package/src/deepscientist/connector/weixin_support.py +122 -1
  40. package/src/deepscientist/daemon/api/handlers.py +75 -4
  41. package/src/deepscientist/daemon/api/router.py +1 -0
  42. package/src/deepscientist/daemon/app.py +758 -206
  43. package/src/deepscientist/doctor.py +51 -0
  44. package/src/deepscientist/file_lock.py +48 -0
  45. package/src/deepscientist/gitops/diff.py +167 -1
  46. package/src/deepscientist/mcp/server.py +173 -5
  47. package/src/deepscientist/process_control.py +161 -0
  48. package/src/deepscientist/prompts/builder.py +267 -442
  49. package/src/deepscientist/quest/service.py +2255 -163
  50. package/src/deepscientist/quest/stage_views.py +171 -0
  51. package/src/deepscientist/runners/base.py +2 -0
  52. package/src/deepscientist/runners/codex.py +88 -5
  53. package/src/deepscientist/runners/runtime_overrides.py +17 -1
  54. package/src/prompts/contracts/shared_interaction.md +13 -4
  55. package/src/prompts/system.md +916 -72
  56. package/src/skills/analysis-campaign/SKILL.md +31 -2
  57. package/src/skills/analysis-campaign/references/artifact-orchestration.md +1 -1
  58. package/src/skills/analysis-campaign/references/writing-facing-slice-examples.md +65 -0
  59. package/src/skills/baseline/SKILL.md +2 -0
  60. package/src/skills/decision/SKILL.md +19 -2
  61. package/src/skills/experiment/SKILL.md +8 -2
  62. package/src/skills/finalize/SKILL.md +18 -0
  63. package/src/skills/idea/SKILL.md +78 -0
  64. package/src/skills/idea/references/idea-generation-playbook.md +100 -0
  65. package/src/skills/idea/references/outline-seeding-example.md +60 -0
  66. package/src/skills/intake-audit/SKILL.md +1 -1
  67. package/src/skills/optimize/SKILL.md +1644 -0
  68. package/src/skills/rebuttal/SKILL.md +2 -1
  69. package/src/skills/review/SKILL.md +2 -1
  70. package/src/skills/write/SKILL.md +80 -12
  71. package/src/skills/write/references/outline-evidence-contract-example.md +107 -0
  72. package/src/tui/dist/app/AppContainer.js +3 -0
  73. package/src/tui/package.json +1 -1
  74. package/src/ui/dist/assets/{AiManusChatView-DaF9Nge_.js → AiManusChatView-DDjbFnbt.js} +12 -12
  75. package/src/ui/dist/assets/{AnalysisPlugin-BSVx6dXE.js → AnalysisPlugin-Yb5IdmaU.js} +1 -1
  76. package/src/ui/dist/assets/CliPlugin-e64sreyu.js +31037 -0
  77. package/src/ui/dist/assets/{CodeEditorPlugin-DU9G0Tox.js → CodeEditorPlugin-C4D2TIkU.js} +8 -8
  78. package/src/ui/dist/assets/{CodeViewerPlugin-DoX_fI9l.js → CodeViewerPlugin-BVoNZIvC.js} +5 -5
  79. package/src/ui/dist/assets/{DocViewerPlugin-C4FWIXuU.js → DocViewerPlugin-CLChbllo.js} +3 -3
  80. package/src/ui/dist/assets/{GitDiffViewerPlugin-BgfFMgtf.js → GitDiffViewerPlugin-C4xeFyFQ.js} +20 -20
  81. package/src/ui/dist/assets/{ImageViewerPlugin-tcPkfY_x.js → ImageViewerPlugin-OiMUAcLi.js} +5 -5
  82. package/src/ui/dist/assets/{LabCopilotPanel-_dKV60Bf.js → LabCopilotPanel-BjD2ThQF.js} +11 -11
  83. package/src/ui/dist/assets/{LabPlugin-Bje0ayoC.js → LabPlugin-DQPg-NrB.js} +2 -2
  84. package/src/ui/dist/assets/{LatexPlugin-CVsBzAln.js → LatexPlugin-CI05XAV9.js} +7 -7
  85. package/src/ui/dist/assets/{MarkdownViewerPlugin-xjmrqv_8.js → MarkdownViewerPlugin-DpeBLYZf.js} +4 -4
  86. package/src/ui/dist/assets/{MarketplacePlugin-mMM2A8wP.js → MarketplacePlugin-DolE58Q2.js} +3 -3
  87. package/src/ui/dist/assets/{NotebookEditor-3kVDSOBo.js → NotebookEditor-7Qm2rSWD.js} +11 -11
  88. package/src/ui/dist/assets/{NotebookEditor-SoJ8X-MO.js → NotebookEditor-C1kWaxKi.js} +1 -1
  89. package/src/ui/dist/assets/{PdfLoader-DElVuHl9.js → PdfLoader-BfOHw8Zw.js} +1 -1
  90. package/src/ui/dist/assets/{PdfMarkdownPlugin-Bq88XT4G.js → PdfMarkdownPlugin-BulDREv1.js} +2 -2
  91. package/src/ui/dist/assets/{PdfViewerPlugin-CsCXMo9S.js → PdfViewerPlugin-C-daaOaL.js} +10 -10
  92. package/src/ui/dist/assets/{SearchPlugin-oUPvy19k.js → SearchPlugin-CjpaiJ3A.js} +1 -1
  93. package/src/ui/dist/assets/{TextViewerPlugin-CRkT9yNy.js → TextViewerPlugin-BxIyqPQC.js} +5 -5
  94. package/src/ui/dist/assets/{VNCViewer-BgbuvWhR.js → VNCViewer-HAg9mF7M.js} +10 -10
  95. package/src/ui/dist/assets/{bot-v_RASACv.js → bot-0DYntytV.js} +1 -1
  96. package/src/ui/dist/assets/{code-5hC9d0VH.js → code-B20Slj_w.js} +1 -1
  97. package/src/ui/dist/assets/{file-content-D1PxfOrp.js → file-content-DT24KFma.js} +1 -1
  98. package/src/ui/dist/assets/{file-diff-panel-DG1oT_Hj.js → file-diff-panel-DK13YPql.js} +1 -1
  99. package/src/ui/dist/assets/{file-socket-BmdFYQlk.js → file-socket-B4T2o4nR.js} +1 -1
  100. package/src/ui/dist/assets/{image-Dqe2X2tW.js → image-DSeR_sDS.js} +1 -1
  101. package/src/ui/dist/assets/{index-RDlNXXx1.js → index-BrFje2Uk.js} +2 -2
  102. package/src/ui/dist/assets/{index-DVsMKK_y.js → index-BwRJaoTl.js} +1 -1
  103. package/src/ui/dist/assets/{index-Nt9hS4ck.js → index-D_E4281X.js} +5007 -28514
  104. package/src/ui/dist/assets/{index-Duvz8Ip0.js → index-DnYB3xb1.js} +12 -12
  105. package/src/ui/dist/assets/{index-BQG-1s2o.css → index-G7AcWcMu.css} +43 -2
  106. package/src/ui/dist/assets/{monaco-DIXge1CP.js → monaco-LExaAN3Y.js} +1 -1
  107. package/src/ui/dist/assets/{pdf-effect-queue-BBTTQaO-.js → pdf-effect-queue-BJk5okWJ.js} +1 -1
  108. package/src/ui/dist/assets/{popover-BWlolyxo.js → popover-D3Gg_FoV.js} +1 -1
  109. package/src/ui/dist/assets/{project-sync-BM5PkFH4.js → project-sync-C_ygLlVU.js} +1 -1
  110. package/src/ui/dist/assets/{select-D4dAtrA8.js → select-CpAK6uWm.js} +2 -2
  111. package/src/ui/dist/assets/{sigma-CKbE5jJT.js → sigma-DEccaSgk.js} +1 -1
  112. package/src/ui/dist/assets/{square-check-big-CZNGMgiB.js → square-check-big-uUfyVsbD.js} +1 -1
  113. package/src/ui/dist/assets/{trash-DaB37xAz.js → trash-CXvwwSe8.js} +1 -1
  114. package/src/ui/dist/assets/{useCliAccess-C2OmAcWe.js → useCliAccess-Bnop4mgR.js} +1 -1
  115. package/src/ui/dist/assets/{useFileDiffOverlay-Dowd1Ij4.js → useFileDiffOverlay-B8eUAX0I.js} +1 -1
  116. package/src/ui/dist/assets/{wrap-text-BGjAhAUq.js → wrap-text-9vbOBpkW.js} +1 -1
  117. package/src/ui/dist/assets/{zoom-out-dMZQMXzc.js → zoom-out-BgVMmOW4.js} +1 -1
  118. package/src/ui/dist/index.html +2 -2
  119. package/src/ui/dist/assets/CliPlugin-C9gzJX41.js +0 -5905
@@ -0,0 +1,1644 @@
1
+ ---
2
+ name: optimize
3
+ description: Use when an algorithm-first quest should manage candidate briefs, optimization frontier, branch promotion, or fusion-aware search instead of the paper-oriented default loop.
4
+ ---
5
+
6
+ # Optimize
7
+
8
+ Use this skill for algorithm-first quests where the goal is the strongest justified optimization result rather than paper packaging.
9
+
10
+ This skill is the lightweight optimization control layer for DeepScientist.
11
+ It does not replace the normal quest runtime. It tells you how to use the existing DeepScientist artifact, memory, bash_exec, Git, and worktree mechanisms as an optimization system.
12
+
13
+ ## Interaction discipline
14
+
15
+ - Follow the shared interaction contract injected by the system prompt.
16
+ - For ordinary active work, prefer a concise progress update once work has crossed roughly 6 tool calls with a human-meaningful delta, and do not drift beyond roughly 12 tool calls or about 8 minutes without a user-visible update.
17
+ - Ordinary candidate creation, smoke checks, and route updates should stay concise.
18
+ - Use richer milestone updates only when a candidate is promoted, a strong run finishes, the frontier shifts materially, or a fusion/debug route becomes the new main path.
19
+ - When the user asks for the current optimization state, answer from the frontier and durable artifacts rather than from chat memory.
20
+ - Hard execution rule: every terminal command in this stage must go through `bash_exec`; do not use any other terminal path for smoke checks, quick validations, long runs, Git, Python, package-manager, or file-inspection commands.
21
+
22
+ ## Stage purpose
23
+
24
+ The optimize stage should do four things:
25
+
26
+ 1. turn loose ideas into candidate briefs
27
+ 2. rank and promote only the strongest briefs into durable lines
28
+ 3. manage candidate attempts within a durable line
29
+ 4. choose when to explore, exploit, fuse, debug, or stop
30
+
31
+ This skill is especially appropriate when `startup_contract.need_research_paper = false`.
32
+
33
+ Treat `optimize` as one stable stage skill with six internal submodes:
34
+
35
+ - `brief`
36
+ - `rank`
37
+ - `seed`
38
+ - `loop`
39
+ - `fusion`
40
+ - `debug`
41
+
42
+ Do not treat these as separate public skills.
43
+ Treat them as internal execution modes inside one optimize workflow.
44
+
45
+ InternAgent maps most naturally onto the `brief` and `rank` side of this stage.
46
+ MLEvolve maps most naturally onto the `seed`, `loop`, `fusion`, and `debug` side of this stage.
47
+ Do not collapse those two layers into one vague "optimize more" loop.
48
+
49
+ ## Required working files
50
+
51
+ Before broad optimization search or candidate management becomes substantial, maintain these quest-visible control files:
52
+
53
+ - `OPTIMIZE_CHECKLIST.md`
54
+ - `CANDIDATE_BOARD.md`
55
+
56
+ Use:
57
+
58
+ - the integrated `optimize checklist template` appendix section
59
+ - the integrated `candidate board template` appendix section
60
+
61
+ `OPTIMIZE_CHECKLIST.md` is the execution control surface.
62
+ It should track:
63
+
64
+ - current frontier mode
65
+ - current optimize submode
66
+ - candidate brief count
67
+ - promoted line count
68
+ - current smoke queue
69
+ - current full-eval queue
70
+ - stagnation / fusion checks
71
+ - next concrete action
72
+
73
+ `CANDIDATE_BOARD.md` is the compact candidate ledger.
74
+ It should track:
75
+
76
+ - candidate id
77
+ - candidate type: brief or implementation attempt
78
+ - parent line or parent candidate
79
+ - strategy: explore / exploit / fusion / debug
80
+ - status
81
+ - expected gain
82
+ - observed result
83
+ - promote / archive recommendation
84
+
85
+ ## Required MCP-driven workflow
86
+
87
+ Treat this as the concrete optimize workflow. Do not skip these steps just because the quest is algorithm-first.
88
+
89
+ ### 1. Recover the optimization state first
90
+
91
+ At the start of each meaningful optimize pass, use this order unless a stronger local reason exists:
92
+
93
+ 1. `artifact.get_optimization_frontier(...)`
94
+ 2. `memory.list_recent(scope='quest', limit=5)`
95
+ 3. `memory.search(...)`
96
+ 4. `artifact.get_quest_state(detail='summary')`
97
+ 5. `artifact.read_quest_documents(...)` when exact durable wording matters
98
+
99
+ Do not create new candidates before the frontier, recent optimization lessons, and current runtime refs are checked.
100
+ If the frontier is missing or obviously stale, recover that state before proposing more work.
101
+
102
+ ### 2. Shape candidate briefs before branch promotion
103
+
104
+ When the next direction is still fuzzy, do not jump straight into code or branch creation.
105
+ First turn the direction into a compact candidate brief.
106
+
107
+ The brief-shaping sequence is:
108
+
109
+ 1. clarify the bottleneck, constraints, and comparability boundary
110
+ 2. identify the incumbent or baseline that this brief must beat or complement
111
+ 3. generate a small differentiated slate, usually `2-3` serious approaches
112
+ 4. compare them on one shared surface
113
+ 5. recommend exactly one lead brief
114
+ 6. self-check the recommended brief before submission
115
+
116
+ Every serious brief should answer:
117
+
118
+ - bottleneck
119
+ - why_current_line_is_limited
120
+ - mechanism
121
+ - why_now
122
+ - keep_unchanged
123
+ - expected_gain
124
+ - implementation_surface
125
+ - main_risks
126
+
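As an illustration of the eight questions above, a brief can be checked for completeness before it is submitted. This is a minimal sketch and not part of the DeepScientist API: the field names come from the list above, while the helper `missing_brief_fields` and the plain-dict representation of a brief are assumptions made for the example.

```python
# Field names are taken from the skill text; the helper itself is hypothetical.
REQUIRED_BRIEF_FIELDS = (
    "bottleneck",
    "why_current_line_is_limited",
    "mechanism",
    "why_now",
    "keep_unchanged",
    "expected_gain",
    "implementation_surface",
    "main_risks",
)


def missing_brief_fields(brief: dict) -> list[str]:
    """Return the required fields that are absent or blank, in checklist order."""
    return [
        name
        for name in REQUIRED_BRIEF_FIELDS
        if not str(brief.get(name) or "").strip()
    ]
```

A brief with a non-empty result is not yet ready for `artifact.submit_idea(..., submission_mode='candidate')`.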
127
+ The durable call for this step is usually:
128
+
129
+ - `artifact.submit_idea(mode='create', submission_mode='candidate', ...)`
130
+
131
+ Use `idea` when the mechanism family itself is still unresolved.
132
+ Use `optimize` when the family is already chosen and the work is now branchless brief shaping, ranking, or within-line search.
133
+
134
+ ### 3. Rank candidate briefs on one explicit surface
135
+
136
+ Before promoting a line, compare the serious briefs on one shared ranking surface.
137
+ At minimum evaluate:
138
+
139
+ - expected information gain
140
+ - feasibility in current repo
141
+ - comparability against baseline
142
+ - implementation surface
143
+ - novelty or distinctiveness
144
+ - family diversity
145
+ - change-layer diversity
146
+ - incumbent-improvement potential
147
+ - failure risk
148
+
149
+ Then state:
150
+
151
+ - winner justification
152
+ - non-winner defer / reject reasons
153
+ - promotion cap: how many lines should actually be promoted now
154
+
155
+ Do not promote every plausible brief.
156
+ Default rule: promote only `1-3` candidate briefs, and usually fewer.
157
+
158
+ The durable call for this step is one of:
159
+
160
+ - `artifact.submit_idea(mode='create', submission_mode='line', source_candidate_id=..., ...)`
161
+ - `artifact.record(payload={'kind': 'decision', 'action': 'branch'|'continue'|'stop', ...})`
162
+
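The ranking step above can be sketched as a single shared scoring surface plus a promotion cap. The skill does not prescribe a scoring formula, so everything numeric here is an assumption: scores are taken to lie in 0..1, every criterion is treated as "higher is more favorable" except `failure_risk`, and the unweighted mean is used only for simplicity. `ScoredBrief` and `rank_and_cap` are hypothetical names.

```python
from dataclasses import dataclass

# Criteria names mirror the list in the skill text.
CRITERIA = (
    "expected_information_gain",
    "feasibility_in_current_repo",
    "comparability_against_baseline",
    "implementation_surface",
    "novelty_or_distinctiveness",
    "family_diversity",
    "change_layer_diversity",
    "incumbent_improvement_potential",
    "failure_risk",
)
LOWER_IS_BETTER = {"failure_risk"}  # assumption: only risk is inverted


@dataclass
class ScoredBrief:
    candidate_id: str
    scores: dict[str, float]  # one 0..1 score per criterion (assumed scale)

    def total(self) -> float:
        """Unweighted mean over all criteria, with risk inverted."""
        parts = []
        for name in CRITERIA:
            value = self.scores[name]
            parts.append(1.0 - value if name in LOWER_IS_BETTER else value)
        return sum(parts) / len(parts)


def rank_and_cap(briefs: list[ScoredBrief], promotion_cap: int = 1):
    """Rank briefs on the shared surface and split them at the promotion cap."""
    if not 1 <= promotion_cap <= 3:
        raise ValueError("the default rule promotes only 1-3 briefs")
    ranked = sorted(briefs, key=lambda b: b.total(), reverse=True)
    promote = [b.candidate_id for b in ranked[:promotion_cap]]
    hold = [b.candidate_id for b in ranked[promotion_cap:]]
    return promote, hold
```

The numeric total is only a tie-breaking aid; the winner justification and the defer or reject reasons still have to be written out in words.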
163
+ ### 4. Hand off promoted lines into experiment cleanly
164
+
165
+ Once a brief is promoted, the next main work belongs to `experiment`, not to vague optimize chatter.
166
+ Before substantial implementation or compute:
167
+
168
+ - activate or confirm the intended durable line
169
+ - update `OPTIMIZE_CHECKLIST.md`
170
+ - update `CANDIDATE_BOARD.md`
171
+ - create or revise `PLAN.md`
172
+ - create or revise `CHECKLIST.md`
173
+ - define the smoke queue and full-eval queue explicitly
174
+
175
+ Then hand off into `experiment` for:
176
+
177
+ - one clean implementation pass
178
+ - one bounded smoke or pilot run
179
+ - one real measured main run
180
+
181
+ Do not keep reshaping the method after the run contract is already concrete.
182
+
183
+ ### 5. Record every meaningful result durably
184
+
185
+ Use these artifact forms consistently:
186
+
187
+ - candidate brief:
188
+ - `artifact.submit_idea(..., submission_mode='candidate')`
189
+ - durable optimization line:
190
+ - `artifact.submit_idea(..., submission_mode='line')`
191
+ - implementation-level candidate attempt inside one line:
192
+ - `artifact.record(payload={'kind': 'report', 'report_type': 'optimization_candidate', ...})`
193
+ - real measured main result:
194
+ - `artifact.record_main_experiment(...)`
195
+ - route change after the result:
196
+ - `artifact.record(payload={'kind': 'decision', 'action': 'iterate'|'branch'|'continue'|'stop', ...})`
197
+
198
+ Do not treat chat summaries as substitutes for these durable records.
199
+
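The two `artifact.record` payload shapes listed above can be sketched as small builder functions. Only the keys `kind`, `report_type`, and `action` and their values appear in the skill text; the builder names and any extra detail fields passed through `**details` are hypothetical and shown for illustration.

```python
# Allowed route actions, as listed for the post-result decision record.
ROUTE_ACTIONS = ("iterate", "branch", "continue", "stop")


def candidate_report_payload(**details) -> dict:
    """Payload for an implementation-level candidate attempt inside one line."""
    # Fixed keys are applied last so caller details cannot override them.
    return {**details, "kind": "report", "report_type": "optimization_candidate"}


def decision_payload(action: str, **details) -> dict:
    """Payload for the route change recorded after a measured result."""
    if action not in ROUTE_ACTIONS:
        raise ValueError(f"unknown route action: {action!r}")
    return {**details, "kind": "decision", "action": action}
```

Either dictionary would then be passed as `payload=` to `artifact.record(...)`.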
200
+ ### 6. Manage process lifecycle explicitly
201
+
202
+ Optimize uses the same long-run process discipline as `experiment`.
203
+
204
+ - Use `bash_exec` for smoke checks, quick validations, and long runs.
205
+ - Before launching a new run, inspect current managed sessions first.
206
+ - Do not start a duplicate process for the same purpose if a valid live session already exists.
207
+ - Use bounded smoke before long runs unless direct quick validation is already cheap and equally informative.
208
+ - Use `bash_exec(mode='detach', ...)` for long runs and monitor with `list/read/await`.
209
+ - Read logs before retrying a failed or suspicious run; do not relaunch blindly.
210
+ - Kill only on explicit invalidity, supersession, or checked no-progress conditions.
211
+ - After pause, resume, or daemon recovery, recover session state before spawning new runs.
212
+
213
+ ### 7. Route from evidence, not from momentum
214
+
215
+ After every real measured result:
216
+
217
+ 1. refresh the frontier
218
+ 2. compare the result against the incumbent and backlog
219
+ 3. choose exactly one dominant next action:
220
+ - explore
221
+ - exploit
222
+ - fusion
223
+ - debug
224
+ - stop
225
+ 4. record that route durably
226
+
227
+ Do not treat one candidate creation, one smoke pass, or one detached launch as stage completion.
228
+
229
+ ## Integrated templates and playbooks
230
+
231
+ Use the following integrated structures directly inside this skill. They replace the old optimize reference files conceptually, even if those files still exist on disk.
232
+
233
+ ### Candidate brief template
234
+
235
+ Every serious candidate brief should include:
236
+
237
+ - title
238
+ - bottleneck
239
+ - why_current_line_is_limited
240
+ - mechanism
241
+ - mechanism_family
242
+ - change_layer: `Tier1` / `Tier2` / `Tier3`
243
+ - source_lens
244
+ - keep_unchanged
245
+ - expected_gain
246
+ - implementation_surface
247
+ - risks
248
+ - foundation
249
+ - promote_now
250
+ - next_target
251
+
252
+ ### Brief-shaping playbook
253
+
254
+ Use this when a candidate direction is still fuzzy and needs to become a ranking-ready brief.
255
+
256
+ - clarify the concrete bottleneck before widening
257
+ - resolve the evaluation or comparability boundary
258
+ - identify the main hard constraint
259
+ - identify the current incumbent
260
+ - generate only a small differentiated slate
261
+ - compare on one shared surface
262
+ - recommend exactly one lead brief
263
+ - self-check for ambiguity, overlap, and weak justification
264
+
265
+ ### Candidate ranking template
266
+
267
+ When several briefs compete, produce:
268
+
269
+ - candidate set
270
+ - ranking scope
271
+ - comparison surface
272
+ - ranked candidates with score summary, why each ranks there, and promote / hold / reject
273
+ - winner justification
274
+ - non-winner notes
275
+ - promotion cap
276
+
277
+ ### Candidate board template
278
+
279
+ `CANDIDATE_BOARD.md` should expose at least these columns:
280
+
281
+ - candidate id
282
+ - level: `brief` or `implementation`
283
+ - parent
284
+ - strategy
285
+ - status
286
+ - expected gain
287
+ - observed result
288
+ - promote / archive recommendation
289
+
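One possible way to keep `CANDIDATE_BOARD.md` consistent is to render it from structured rows. The column names below follow the template above; the `BoardRow` dataclass, the `render_board` helper, and the choice of a Markdown table are assumptions for this sketch, not a format the skill mandates.

```python
from dataclasses import astuple, dataclass

COLUMNS = (
    "candidate id",
    "level",
    "parent",
    "strategy",
    "status",
    "expected gain",
    "observed result",
    "promote / archive recommendation",
)


@dataclass
class BoardRow:
    candidate_id: str
    level: str  # `brief` or `implementation`
    parent: str
    strategy: str  # explore / exploit / fusion / debug
    status: str
    expected_gain: str
    observed_result: str
    recommendation: str


def render_board(rows: list[BoardRow]) -> str:
    """Render the rows as a Markdown table with the template's columns."""
    lines = [
        "| " + " | ".join(COLUMNS) + " |",
        "|" + " --- |" * len(COLUMNS),
    ]
    for row in rows:
        if row.level not in ("brief", "implementation"):
            raise ValueError(f"unknown level: {row.level!r}")
        lines.append("| " + " | ".join(astuple(row)) + " |")
    return "\n".join(lines) + "\n"
```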
290
+ ### Optimize checklist template
291
+
292
+ `OPTIMIZE_CHECKLIST.md` should track at least:
293
+
294
+ - frontier has been refreshed
295
+ - primary optimize submode chosen
296
+ - current route mode chosen
297
+ - recent optimization memory reviewed
298
+ - brief slate checked for family diversity
299
+ - candidate briefs updated or confirmed
300
+ - candidate ranking updated
301
+ - promotion decision made
302
+ - current implementation pool recorded
303
+ - smoke queue defined
304
+ - full-eval queue defined
305
+ - failures classified
306
+ - stagnation check performed
307
+ - fusion eligibility checked
308
+ - next concrete action written
309
+
310
+ ### Frontier review template
311
+
312
+ Whenever route choice is unclear, write down:
313
+
314
+ - current frontier
315
+ - evidence summary
316
+ - route choice
317
+ - active optimize submode
318
+ - immediate next action
319
+
320
+ ### Code-generation route playbook
321
+
322
+ Choose one route deliberately:
323
+
324
+ - brief-only when the direction is still unclear
325
+ - stepwise generation for first substantial implementation of a new line
326
+ - diff / patch generation for improve / exploit / debug / most fusion work
327
+ - full rewrite only when the current implementation is structurally broken or mismatched
328
+
329
+ Do not jump to a rewrite merely because one local patch failed.
330
+
331
+ ### Debug response template
332
+
333
+ When a candidate fails but still looks strategically valuable, record:
334
+
335
+ - error
336
+ - retrieved memory
337
+ - root cause
338
+ - minimal fix
339
+ - keep unchanged
340
+ - next check
341
+ - archive threshold
342
+
343
+ ### Fusion playbook
344
+
345
+ Before opening a fusion candidate, answer:
346
+
347
+ - what exactly is being fused?
348
+ - why are the source strengths complementary rather than redundant?
349
+ - what remains unchanged for comparability?
350
+ - what bounded evidence would prove the fusion worthwhile?
351
+ - what bounded first validation step should run before any broad rollout?
352
+
353
+ Do not fuse two weak lines or two same-mechanism lines under different names.
354
+
355
+ ### Optimization memory template
356
+
357
+ When writing reusable optimization lessons, capture:
358
+
359
+ - type
360
+ - context
361
+ - observation
362
+ - why it matters
363
+ - retrieval hint
364
+ - reuse hint
365
+
366
+ ### Plateau response playbook
367
+
368
+ If one line keeps producing non-improving results:
369
+
370
+ 1. state that the line is plateauing
371
+ 2. identify the most likely root cause
372
+ 3. choose one larger route change:
373
+ - widen search
374
+ - promote a stronger alternative
375
+ - fuse
376
+ - debug
377
+ - stop
378
+ 4. record one explicit non-repeat rule
379
+
380
+ Do not hide plateau under a sequence of tiny "one more tweak" loops.
381
+
382
+ ### Prompt patterns worth preserving
383
+
384
+ For candidate-brief, improve, fusion, and debug prompts, preserve:
385
+
386
+ - introduction
387
+ - task description
388
+ - memory
389
+ - previous solution or previous line
390
+ - instructions
391
+ - explicit response format
392
+
393
+ Preserve these reasoning contracts whenever possible:
394
+
395
+ - WHAT is changing?
396
+ - WHY is the current line limited?
397
+ - HOW should the change address the limitation?
398
+ - KEEP UNCHANGED
399
+ - NEXT ACTION
400
+
401
+ ## Non-negotiable rules
402
+
403
+ - Do not treat every patch or micro-attempt as a new durable idea line.
404
+ - Do not create a new Git branch/worktree for every implementation-level candidate.
405
+ - Use `artifact.submit_idea(..., submission_mode='candidate')` for candidate briefs that should be ranked before promotion.
406
+ - Use `artifact.submit_idea(..., submission_mode='line')` only for directions that deserve a durable optimization line and branch/worktree.
407
+ - Use `artifact.record(payload={'kind': 'report', 'report_type': 'optimization_candidate', ...})` for implementation-level candidate attempts inside one durable line.
408
+ - Before deciding the next route, call `artifact.get_optimization_frontier(...)` when available and use it as the primary optimization-state summary.
409
+ - Keep all major optimization successes and failures durable through artifacts and memory.
410
+ - Do not drift into paper-outline, bundle, or finalize work by default while this stage is active.
411
+ - Do not convert ranking uncertainty into premature branch creation.
412
+ - Do not treat an implementation-level candidate report as a new durable optimization line.
413
+ - Do not keep widening the frontier once a small serious slate already exists.
414
+ - Do not let one optimize pass mix multiple major route changes.
415
+ One pass may inspect several possibilities, but it should finish with one dominant next action.
416
+
417
+ ## When to use
418
+
419
+ - the quest is algorithm-first
420
+ - the baseline gate is already confirmed or waived
421
+ - the task has at least one plausible optimization direction
422
+ - multiple candidate directions exist and the system should rank them before promotion
423
+ - a durable line exists and the next step is to manage explore / exploit / fuse / debug
424
+
425
+ ## Do not use when
426
+
427
+ - the baseline gate is unresolved
428
+ - the main need is a paper draft, rebuttal, or review task
429
+ - the quest is still in broad literature scouting with no concrete optimization handle
430
+
431
+ ## Core object model
432
+
433
+ Use these three object levels consistently:
434
+
435
+ 1. candidate brief
436
+ `artifact.submit_idea(mode='create', submission_mode='candidate', ...)`
437
+ This records a possible direction or method brief without opening a branch yet.
438
+
439
+ 2. durable optimization line
440
+ `artifact.submit_idea(mode='create', submission_mode='line', ...)`
441
+ This opens a real branch/worktree and becomes a formal optimization path.
442
+
443
+ 3. implementation-level candidate attempt
444
+ `artifact.record(payload={'kind': 'report', 'report_type': 'optimization_candidate', ...})`
445
+ This is a within-line attempt such as one patch, one smoke candidate, one debug candidate, or one fusion candidate.
446
+
447
+ ## Recommended workflow
448
+
449
+ 1. Read the current frontier and recent durable state.
450
+ 2. If only loose candidate directions exist, create or refine candidate briefs first.
451
+ 3. Rank the candidate briefs and promote only the best `1-3` into durable lines.
452
+ 4. Inside a durable line, generate a small candidate pool, then run bounded smoke checks before full evaluations.
453
+ 5. Record each implementation-level attempt durably with status, change plan, and result.
454
+ 6. After each real result, decide whether to explore, exploit, fuse, debug, or stop.
455
+ 7. Write optimization lessons to memory before leaving the stage.
456
+
457
+ At the start of each meaningful optimize pass, update `OPTIMIZE_CHECKLIST.md` before spending significant code or compute.
458
+
459
+ ## Mandatory first-call sequence
460
+
461
+ At the start of a meaningful optimize pass, use this order unless a stronger local reason exists:
462
+
463
+ 1. `artifact.get_optimization_frontier(...)`
464
+ 2. `memory.search(...)`
465
+ 3. `artifact.get_quest_state(detail='summary')`
466
+ 4. `artifact.read_quest_documents(...)` when exact durable wording matters
467
+
468
+ Do not start generating new candidates before the frontier and recent optimization lessons are checked.
469
+
470
+ ## Stage-start requirement
471
+
472
+ Stage-start requirement:
473
+
474
+ - run `memory.list_recent(scope='quest', limit=5)`
475
+ - run at least one `memory.search(...)`
476
+ - read `artifact.get_optimization_frontier(...)`
477
+ - update `OPTIMIZE_CHECKLIST.md`
478
+
479
+ If the frontier is missing or obviously stale, recover that state before proposing more work.
480
+
481
+ ## Internal submode selection
482
+
483
+ Choose exactly one primary optimize submode for the current meaningful pass.
484
+
485
+ Default selection order:
486
+
487
+ 1. `fusion`
488
+ - when the frontier explicitly says `fusion`
489
+ 2. `debug`
490
+ - when a strategically valuable candidate failed for a concrete and likely fixable reason
491
+ 3. `rank`
492
+ - when several candidate briefs already exist and promotion is the main unresolved question
493
+ 4. `brief`
494
+ - when the candidate-brief slate is too thin or too weak
495
+ 5. `seed`
496
+ - when a durable line exists but there is no live implementation-candidate pool
497
+ 6. `loop`
498
+ - when a live candidate pool or leading durable line already exists and the main need is bounded execution progress
499
+
500
+ Do not bounce among submodes repeatedly in one pass.
501
+ If the best submode changes after new evidence appears, record that route shift explicitly.
502
+
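The default selection order above amounts to a first-match priority check. The sketch below is one reading of it, not the package's implementation: the `OptimizeState` fields are hypothetical summaries of what the frontier and the control files would report, "several" briefs is interpreted as two or more, and the final fallback to `brief` is an assumption for the case where nothing else applies.

```python
from dataclasses import dataclass


@dataclass
class OptimizeState:
    # Hypothetical summary fields; the real frontier payload may differ.
    frontier_mode: str = ""
    fixable_failed_candidate: bool = False
    brief_count: int = 0
    promotion_unresolved: bool = False
    slate_thin_or_weak: bool = False
    durable_line_exists: bool = False
    live_candidate_pool: bool = False


def select_submode(state: OptimizeState) -> str:
    """Return exactly one primary submode, checking conditions in priority order."""
    if state.frontier_mode == "fusion":
        return "fusion"
    if state.fixable_failed_candidate:
        return "debug"
    if state.brief_count >= 2 and state.promotion_unresolved:
        return "rank"
    if state.slate_thin_or_weak:
        return "brief"
    if state.durable_line_exists and not state.live_candidate_pool:
        return "seed"
    if state.live_candidate_pool or state.durable_line_exists:
        return "loop"
    return "brief"  # assumed fallback: nothing exists yet, so shape briefs
```

Because the function returns on the first match, a pass cannot end up with two primary submodes; a later change of submode would still need its own durable route record.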
503
+ ## Candidate brief protocol
504
+
505
+ When a direction is interesting but not yet worthy of a new branch:
506
+
507
+ - create a candidate brief with `submission_mode='candidate'`
508
+ - keep it branchless
509
+ - record enough structure that later ranking or promotion is possible
510
+
511
+ Good candidate-brief fields include:
512
+
513
+ - title
514
+ - problem
515
+ - hypothesis
516
+ - mechanism
517
+ - mechanism_family
518
+ - change_layer
519
+ - source_lens
520
+ - expected_gain
521
+ - risks
522
+ - decision_reason
523
+ - foundation_ref
524
+ - lineage_intent
525
+
526
+ Do not promote every candidate automatically.
527
+
528
+ Use the integrated `candidate brief template` section for the minimum acceptable candidate-brief structure.

529
+ Use the integrated `brief-shaping playbook` section when the brief is still too vague, too implementation-first, or too collapsed onto one familiar mechanism.
530
+
531
+ Candidate briefs should explicitly answer:
532
+
533
+ - WHAT bottleneck is being targeted?
534
+ - WHY is the current line limited?
535
+ - HOW does this mechanism address the limitation?
536
+ - WHAT must remain unchanged for comparability?
537
+
538
+ If the brief cannot answer those four questions clearly, it is not ready for promotion or implementation.
539
+
540
+ Treat a candidate brief as the DeepScientist form of a method brief.
541
+ It should sit between "idea intuition" and "code implementation".
542
+
543
+ Preserve this brief-shaping discipline:
544
+
545
+ 1. clarify the bottleneck, constraints, and comparability boundary first
546
+ 2. generate a small differentiated slate, usually `2-3` serious approaches
547
+ 3. recommend one approach with explicit tradeoffs against the alternatives
548
+ 4. self-check the winning brief for ambiguity, overlap, and weak justification before submission
549
+
550
+ Do not jump from "interesting intuition" to branch creation.
551
+ Do not jump from "I know how to code this" to "this deserves promotion."
552
+
553
+ When running the `brief` submode:
554
+
555
+ - produce only `2-4` serious candidate briefs by default
556
+ - ask or answer the minimum clarifying questions needed to remove ambiguity around bottleneck, constraint fit, and comparability
557
+ - explicitly keep one incumbent-compatible refinement when possible
558
+ - explicitly keep one orthogonal alternative when possible
559
+ - explicitly keep one broader lens or paradigm shift candidate when possible
560
+ - avoid generating several renamed variants of the same mechanism
561
+ - prefer mechanism-level distinctness over volume
562
+ - present the differentiated slate on one shared comparison surface before choosing a recommended brief
563
+ - keep the questioning bounded and execution-oriented rather than open-ended brainstorming
564
+
565
+ Use a coverage contract for every serious brief slate:
566
+
567
+ - one `incumbent-deepening` direction when justified
568
+ - one `orthogonal-mechanism` direction when justified
569
+ - one `paradigm/objective/data-view shift` direction when justified
570
+
571
+ If all serious briefs belong to the same mechanism family, do one widening pass before ranking.
572
+ Do not treat a same-family slate as sufficient merely because the local scores look good.
573
+
574
+ For each serious brief, record at least:
575
+
576
+ - bottleneck
577
+ - why_current_line_is_limited
578
+ - mechanism
579
+ - why_now
580
+ - mechanism_family
581
+ - change_layer: `Tier1` / `Tier2` / `Tier3`
582
+ - source_lens
583
+ - keep_unchanged
584
+ - expected_gain
585
+ - implementation_surface
586
+ - main_risks
587
+ - promote_now: yes or no
588
+
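As an illustration of the per-brief record above, here is a minimal sketch of a container with those fields and a completeness check. The class name `CandidateBrief` and the helper `missing_fields` are hypothetical and are not part of the DeepScientist API; only the field names and the `Tier1` / `Tier2` / `Tier3` values come from the list.

```python
from dataclasses import dataclass, fields
from typing import List

# Allowed values for change_layer, taken from the field list above.
VALID_CHANGE_LAYERS = {"Tier1", "Tier2", "Tier3"}


@dataclass
class CandidateBrief:
    """Hypothetical container mirroring the minimum per-brief fields."""
    bottleneck: str
    why_current_line_is_limited: str
    mechanism: str
    why_now: str
    mechanism_family: str
    change_layer: str
    source_lens: str
    keep_unchanged: str
    expected_gain: str
    implementation_surface: str
    main_risks: str
    promote_now: bool


def missing_fields(brief: CandidateBrief) -> List[str]:
    """Return the names of blank text fields, plus change_layer if it is invalid."""
    problems = []
    for f in fields(brief):
        value = getattr(brief, f.name)
        if isinstance(value, str) and not value.strip():
            problems.append(f.name)
    if brief.change_layer not in VALID_CHANGE_LAYERS and "change_layer" not in problems:
        problems.append("change_layer")
    return problems
```

A brief with a non-empty result from `missing_fields` would, under this sketch, go back for refinement before ranking.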
589
+ InternAgent-style behavior to preserve here:
590
+
591
+ - generate candidate methods first
592
+ - critique them before promotion
593
+ - express them as method-layer objects rather than code patches
594
+ - defer branch creation until the candidate is actually chosen
595
+ - prefer one-question-at-a-time clarification when one missing assumption would otherwise contaminate the whole brief slate
596
+
597
+ Do not require a paper-style literature hard gate inside this submode unless the quest explicitly moved back toward paper work.
598
+
599
+ ## Promotion protocol
600
+
601
+ Only promote a candidate brief into a durable line when at least one of the following is true:
602
+
603
+ - it clearly dominates the nearby alternatives
604
+ - it is top-ranked and sufficiently distinct
605
+ - the user explicitly asked to pursue it
606
+ - the current frontier indicates the line is the strongest next move
607
+
608
+ Promotion should use:
609
+
610
+ `artifact.submit_idea(mode='create', submission_mode='line', source_candidate_id=..., ...)`
611
+
612
+ When several candidate briefs are plausible, rank them explicitly before promotion.
613
+ Use the integrated `candidate ranking template` section for the minimum acceptable ranking record.
614
+
615
+ Default promotion rule:
616
+
617
+ - promote only `1-2` candidate briefs into durable lines
618
+ - if one candidate clearly dominates, promote only that one
619
+ - if the frontier is still structurally uncertain, promote at most two sufficiently distinct lines
620
+
621
+ When running the `rank` submode:
622
+
623
+ - compare the current serious briefs on one explicit shared surface
624
+ - score or rank them with written reasons
625
+ - state why the winner is better now
626
+ - state why the main alternatives are deferred rather than erased
627
+ - never treat "all seem promising" as a sufficient reason to promote them all
628
+
629
+ Use a distinctness-aware promotion policy:
630
+
631
+ - default rule: each mechanism family should contribute at most one promoted line
632
+ - do not let one familiar family fill the whole promoted slate
633
+ - only override that family cap when one candidate clearly dominates the whole field
634
+
635
+ When ranking, explicitly check:
636
+
637
+ - family diversity
638
+ - change-layer diversity
639
+ - whether the brief slate is collapsing into one familiar lens
640
+
641
+ If the top briefs are all same-family, either:
642
+
643
+ - keep only the strongest one
644
+ - or return to `brief` for a widening pass
645
+
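The family cap and the same-family check can be sketched as follows. This is an illustrative helper, not the skill's actual implementation: the function names are hypothetical, the input is assumed to be a best-first list of `(candidate_id, mechanism_family)` pairs, and the default of two promoted lines follows the "at most two sufficiently distinct lines" rule. The "clearly dominates" override is left to the caller.

```python
from typing import List, Tuple

RankedCandidate = Tuple[str, str]  # (candidate_id, mechanism_family), best first


def select_promotions(ranked: List[RankedCandidate], max_lines: int = 2) -> List[str]:
    """Promote at most one candidate per mechanism family, up to max_lines."""
    promoted: List[str] = []
    seen_families = set()
    for candidate_id, family in ranked:
        if len(promoted) >= max_lines:
            break
        if family in seen_families:
            continue  # family cap: this family already has a promoted line
        promoted.append(candidate_id)
        seen_families.add(family)
    return promoted


def needs_widening(ranked: List[RankedCandidate]) -> bool:
    """True when several top briefs all share one mechanism family."""
    return len(ranked) > 1 and len({family for _, family in ranked}) == 1
```

When `needs_widening` is true, the two options in the text apply: keep only the strongest brief, or return to `brief` for a widening pass.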
646
+ The output of `rank` should be promotion-ready.
647
+ The output of `brief` should be candidate-ready.
648
+
649
+ ## Frontier protocol
650
+
651
+ At meaningful route boundaries, inspect:
652
+
653
+ - best branch
654
+ - best recent run
655
+ - stagnant branches
656
+ - candidate backlog
657
+ - possible fusion opportunities
658
+ - recommended mode
659
+
660
+ Prefer these route meanings:
661
+
662
+ - `explore`: widen search with fresh candidate directions
663
+ - `exploit`: focus on the strongest current line
664
+ - `fusion`: merge insights from multiple successful or complementary lines
665
+ - `debug`: rescue a candidate or line blocked by a concrete failure mode
666
+ - `stop`: the current frontier is saturated or the remaining routes are not justified
667
+
668
+ Use the integrated `frontier review template` section when the next route is unclear.
669
+
670
+ Interpret frontier state with these default heuristics:
671
+
672
+ - `explore`
673
+ - use when no line is clearly dominant
674
+ - use when current lines are too similar
675
+ - use when the search has not yet established a strong incumbent
676
+
677
+ - `exploit`
678
+ - use when one line clearly leads on evidence and comparability
679
+ - use when smoke results already narrowed the candidate pool
680
+
681
+ - `fusion`
682
+ - use when at least two lines have meaningful strengths
683
+ - use when one line is strong but another line contributes a complementary mechanism
684
+ - use when the current incumbent is stagnating but the broader frontier is still promising
685
+
686
+ - `debug`
687
+ - use when a candidate failed for a concrete and likely fixable reason
688
+ - use when the candidate is still strategically valuable after the failure
689
+
690
+ - `stop`
691
+ - use when the frontier is saturated
692
+ - use when remaining routes are low-value, redundant, or too weak relative to cost
693
+
694
+ When the frontier says `explore`, the default optimize submode is `brief`.
695
+ When the frontier says `exploit`, the default optimize submode is `seed` or `loop`.
696
+ When the frontier says `fusion`, the default optimize submode is `fusion`.
697
+ When a candidate failure dominates the next move, the default optimize submode is `debug` even if the frontier does not yet say so explicitly.
698
+
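The default mapping from frontier mode to optimize submode can be written as a small lookup. This is a sketch under stated assumptions: the function name is hypothetical, the `debug` frontier mode is assumed to map to the `debug` submode, and `stop` is assumed to select no submode; the other entries restate the rules above.

```python
from typing import Tuple

# Frontier mode -> default optimize submode(s), restating the rules above.
# The "debug" and "stop" entries are assumptions, not stated in the text.
DEFAULT_SUBMODES = {
    "explore": ("brief",),
    "exploit": ("seed", "loop"),
    "fusion": ("fusion",),
    "debug": ("debug",),
    "stop": (),
}


def default_submodes(frontier_mode: str, failure_dominates: bool = False) -> Tuple[str, ...]:
    """Return the default submode options for a frontier mode.

    A dominating candidate failure selects debug even when the frontier
    does not yet say so explicitly.
    """
    if failure_dominates:
        return ("debug",)
    if frontier_mode not in DEFAULT_SUBMODES:
        raise ValueError(f"unknown frontier mode: {frontier_mode!r}")
    return DEFAULT_SUBMODES[frontier_mode]
```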
699
+ ## Seed protocol
700
+
701
+ Use `seed` after a durable line exists and before a broad execution loop begins.
702
+
703
+ The goal is not to launch a full run immediately.
704
+ The goal is to generate a small within-line candidate pool that can be smoke-tested and triaged.
705
+
706
+ When running `seed`:
707
+
708
+ - generate only `2-3` implementation-level candidates by default
709
+ - make each candidate meaningfully different in mechanism, implementation path, or risk profile
710
+ - prefer plan-first candidates over immediate large edits
711
+ - record each candidate as `report_type='optimization_candidate'`
712
+ - define which candidates enter smoke first
713
+ - for a newly promoted line, keep at least one `simple-first` candidate in the initial seed batch
714
+ - do not start a fresh line with ensemble stacking, broad HPO, or a heavy multi-stage pipeline unless durable evidence already proves the simple route is insufficient
715
+
716
+ For each seed candidate, record at least:
717
+
718
+ - candidate_id
719
+ - parent line
720
+ - strategy
721
+ - mechanism_family
722
+ - change_layer
723
+ - change_plan
724
+ - expected_gain
725
+ - keep_unchanged
726
+ - first validation step
727
+ - archive condition
728
+
729
+ MLEvolve-style behavior to preserve here:
730
+
731
+ - one durable line may produce multiple candidate attempts
732
+ - candidate generation is bounded
733
+ - smoke comes before full evaluation unless the task is explicitly `fast-check` and direct quick validation is cheaper and equally informative
734
+
735
+ Use a validation-cost-aware seed policy:
736
+
737
+ - `fast-check`: the first objective smoke signal is likely under about `20` minutes
738
+ - `slow-check`: the first objective smoke signal is likely over about `20` minutes or expensive enough that broad probing is wasteful
739
+
740
+ For `fast-check` seed work:
741
+
742
+ - widen a bit more aggressively inside the line
743
+ - a seed batch of `3-5` candidates can be justified when they are genuinely differentiated
744
+ - prefer multiple orthogonal quick tests over one over-discussed candidate
745
+ - a separate smoke stage is optional; direct submission into quick parallel validation is acceptable when the first check is already cheap
746
+ - only skip smoke when the parallel quick validations are expected to produce distinguishable conclusions rather than repeated near-duplicate outcomes
747
+
748
+ For `slow-check` seed work:
749
+
750
+ - keep the initial seed batch tighter, usually `1-2` candidates and rarely `3`
751
+ - insist on a stronger reason for every candidate entering smoke
752
+ - prefer one dominant hypothesis plus one hedge candidate over a broad exploratory pool
753
+ - do not spend long runs to discover that the brief itself was weak
754
+
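The validation-cost-aware seed policy can be sketched as a simple classifier. The names `SeedPolicy` and `seed_policy` are hypothetical, the text's "about `20` minutes" is treated here as a hard threshold purely for illustration, and the slow-check cap of `2` ignores the "rarely `3`" allowance.

```python
from dataclasses import dataclass

# "about 20 minutes" in the text; a hard cutoff here is an assumption.
FAST_CHECK_MINUTES = 20.0


@dataclass(frozen=True)
class SeedPolicy:
    check_class: str               # "fast-check" or "slow-check"
    max_seed_candidates: int       # upper bound for the initial seed batch
    separate_smoke_optional: bool  # True when direct quick validation is acceptable


def seed_policy(first_signal_minutes: float, expensive: bool = False) -> SeedPolicy:
    """Classify a task by how long the first objective smoke signal takes."""
    if first_signal_minutes < FAST_CHECK_MINUTES and not expensive:
        # fast-check: a batch of 3-5 can be justified, and smoke is optional
        return SeedPolicy("fast-check", 5, True)
    # slow-check: usually 1-2 candidates, smoke stays mandatory
    return SeedPolicy("slow-check", 2, False)
```

The `expensive` flag covers the case where a check is quick but costly enough that broad probing is wasteful.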
755
+ Do not keep a live implementation pool dominated by the same mechanism family.
756
+ Default active-pool rule:
757
+
758
+ - at most `1-2` live candidates from the same family
759
+ - if one family already fills the live pool, new same-family candidates do not enter smoke by default
760
+
761
+ ## Loop protocol
762
+
763
+ Use `loop` when a durable line and implementation-candidate pool already exist and the main need is bounded forward motion.
764
+
765
+ Before changing code in `loop`, inspect the same-line local attempt memory for the current line.
766
+ Treat recent sibling attempts on the same line as the first memory surface, ahead of broader quest memory.
767
+
768
+ When running `loop`, choose one primary action:
769
+
770
+ - `smoke`
771
+ - `promote_to_full_eval`
772
+ - `archive`
773
+ - `record_main_result`
774
+ - `switch_to_fusion`
775
+ - `switch_to_debug`
776
+ - `stop`
777
+
778
+ Every loop pass should end with:
779
+
780
+ - one updated candidate status
781
+ - one updated next action
782
+ - one frontier review trigger
783
+
784
+ Do not leave the line with several half-started directions and no dominant next move.
785
+
786
+ Default exploit rule: one atomic improvement per pass.
787
+ Do not bundle several unrelated changes into one exploit candidate unless:
788
+
789
+ - the changes are one tightly coupled design package
790
+ - or the pass is explicitly a fusion route
791
+
792
+ MLEvolve-style behavior to preserve here:
793
+
794
+ - bounded parallelism
795
+ - small live candidate pool
796
+ - explicit move from draft -> smoke -> full eval -> archive or result
797
+ - measured frontier review after real evidence
798
+
799
+ Use a validation-cost-aware loop policy:
800
+
801
+ - for `fast-check` tasks, it is acceptable to run more quick, different tests before converging
802
+ - for `fast-check` tasks, direct quick validation may replace a separate smoke stage if that saves time without losing decision quality
803
+ - for `slow-check` tasks, use fewer but sharper passes, and require objective gain before widening or evolving further
804
+ - if the validation loop is slow, do not keep paying for frontier uncertainty that could have been reduced in `brief`
805
+ - if the validation loop is fast, prefer resolving uncertainty with evidence instead of over-arguing in chat
806
+
807
+ Use a branch/family diversity cap during exploitation:
808
+
809
+ - do not keep selecting only the locally familiar family because it is easiest to elaborate
810
+ - when several strong candidates are close, prefer the one that preserves frontier diversity
811
+ - if one branch or family already dominates recent attempts, require stronger evidence before selecting another near-duplicate attempt
812
+
813
+ ## Memory protocol
814
+
815
+ Before broad new search, run at least one `memory.search(...)` using:
816
+
817
+ - the current task name
818
+ - the active idea id
819
+ - a method keyword
820
+ - the most recent failure mode or successful mechanism
821
+
822
+ When the search appears too narrow, also retrieve one of:
823
+
824
+ - a similar failure pattern
825
+ - an orthogonal success pattern
826
+ - a deliberately dissimilar but high-value prior attempt
827
+
828
+ For `seed`, `loop`, and `debug`, also inspect the same-line local attempt memory from the current leading line before widening to broader quest memory.
829
+
830
+ Write at least one quest memory card when you learn something reusable, such as:
831
+
832
+ - a successful optimization pattern
833
+ - a repeated failure pattern
834
+ - a fusion lesson
835
+ - a reason a candidate should not be retried
836
+
837
+ Use the integrated `optimization memory template` section for the minimum acceptable memory-card shape.
838
+
839
+ Do not write generic "we tried some optimization" memory cards.
840
+ Each card should be retrieval-friendly and decision-relevant.
841
+
842
+ ## Artifact protocol
843
+
844
+ Use:
845
+
846
+ - `artifact.submit_idea(..., submission_mode='candidate')` for candidate briefs
847
+ - `artifact.submit_idea(..., submission_mode='line')` for durable promoted lines
848
+ - `artifact.record(payload={'kind': 'report', 'report_type': 'optimization_candidate', ...})` for within-line attempts
849
+ - `artifact.record(payload={'kind': 'decision', 'action': 'iterate'|'branch'|'continue'|'stop', ...})` for route changes
850
+ - `artifact.record_main_experiment(...)` for real measured line results
851
+
852
+ When the optimize pass is about ranking or promotion, also record one durable decision explaining:
853
+
854
+ - which briefs were compared
855
+ - which one won
856
+ - why promotion was justified now
857
+ - why the others were held, fused, or rejected
858
+
859
+ When recording implementation-level candidates, prefer these status values:
860
+
861
+ - `proposed`
862
+ - `smoke_running`
863
+ - `smoke_passed`
864
+ - `smoke_failed`
865
+ - `promoted`
866
+ - `full_eval_running`
867
+ - `succeeded`
868
+ - `failed`
869
+ - `archived`
870
+
871
+ Use `report_type='optimization_candidate'` consistently for implementation-level attempts so they can later be summarized into the frontier.
872
+
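A minimal sketch of building a consistent payload for implementation-level attempts is shown below. Only `kind`, `report_type`, and the status values come from the text above; the `candidate_id` and `status` key names, the helper itself, and any extra fields are assumptions about the payload schema, and the call to `artifact.record(...)` is deliberately left out.

```python
from enum import Enum
from typing import Any, Dict


class CandidateStatus(str, Enum):
    """Preferred status values for implementation-level candidates."""
    PROPOSED = "proposed"
    SMOKE_RUNNING = "smoke_running"
    SMOKE_PASSED = "smoke_passed"
    SMOKE_FAILED = "smoke_failed"
    PROMOTED = "promoted"
    FULL_EVAL_RUNNING = "full_eval_running"
    SUCCEEDED = "succeeded"
    FAILED = "failed"
    ARCHIVED = "archived"


def candidate_report_payload(candidate_id: str, status: str, **extra: Any) -> Dict[str, Any]:
    """Build a report payload; raises ValueError for a status outside the list."""
    status_value = CandidateStatus(status).value
    # Fixed keys are written last so extra fields cannot override them.
    return {
        **extra,
        "kind": "report",
        "report_type": "optimization_candidate",
        "candidate_id": candidate_id,  # hypothetical key name
        "status": status_value,        # hypothetical key name
    }
```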
873
+ ## Execution protocol
874
+
875
+ - Use `bash_exec` for smoke checks and full runs.
876
+ - Prefer bounded smoke before full evaluation unless `fast-check` direct validation is cheaper and equally informative.
877
+ - Do not keep rerunning the same unchanged candidate.
878
+ - If a candidate fails with a clear root cause, either debug it deliberately or archive it.
879
+ - If the same line stalls repeatedly, switch to explore or fusion rather than pretending more of the same is new evidence.
880
+
881
+ Use this execution order by default:
882
+
883
+ 1. candidate brief selection
884
+ 2. implementation-level candidate generation
885
+ 3. smoke test or direct quick validation
886
+ 4. promotion to fuller evaluation when justified
887
+ 5. durable result recording
888
+ 6. frontier review
889
+
890
+ Prefer only a small active pool at once:
891
+
892
+ - usually `2-4` candidate briefs before promotion
893
+ - usually `2-3` live implementation candidates in smoke
894
+ - usually `1-2` full evaluations running at once unless the environment clearly supports more
895
+
896
+ Validation-cost-aware override:
897
+
898
+ - if first-pass validation is under about `20` minutes, it is reasonable to increase smoke breadth modestly and compare more alternatives early
899
+ - if first-pass validation is under about `20` minutes, you may skip a separate smoke stage and submit several quick validations in parallel
900
+ - only do that when the validations are likely to yield different conclusions such as clear win / tie / fail / instability, rather than redundant repeats
901
+ - if first-pass validation is slower than that, keep the active pool narrow and gate evolution on clear objective signal
902
+ - for slow validation, do not promote a candidate into heavier resource investment until smoke or pilot evidence shows a real performance improvement, stability improvement, or comparability-preserving advantage
903
+
904
+ ## Code-generation route selection
905
+
906
+ Do not use the same code-generation route for every optimization step.
907
+
908
+ Prefer:
909
+
910
+ 1. brief-first, no code yet
911
+ - when the direction is still unclear
912
+ - stay at candidate-brief level
913
+
914
+ 2. stepwise generation
915
+ - for the first substantial implementation of a new durable line
916
+ - especially when the line touches multiple subsystems such as data processing, model design, and training/evaluation
917
+
918
+ 3. diff / patch generation
919
+ - when a strong current implementation already exists
920
+ - for improve, exploit, debug, and most fusion work
921
+
922
+ 4. full rewrite
923
+ - only when the current implementation is too broken or too structurally mismatched for diff patching to remain safe
924
+
925
+ Use the integrated `codegen route playbook` section before committing to a larger rewrite.
926
+
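The route preference above can be expressed as a short decision function. This is a hedged sketch: the function name, the three boolean inputs, and the returned labels are illustrative choices, and defaulting to stepwise generation when no strong implementation exists is an assumption drawn from the "first substantial implementation" case.

```python
def choose_codegen_route(
    direction_clear: bool,
    strong_impl_exists: bool,
    impl_structurally_broken: bool,
) -> str:
    """Pick a code-generation route following the preference order above."""
    if not direction_clear:
        # Stay at candidate-brief level; no code yet.
        return "brief-first"
    if impl_structurally_broken:
        # Diff patching is no longer safe on this implementation.
        return "full-rewrite"
    if strong_impl_exists:
        # Improve, exploit, debug, and most fusion work.
        return "diff-patch"
    # First substantial implementation of a new durable line.
    return "stepwise"
```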
927
+ ## Debug protocol
928
+
929
+ Use `debug` when a candidate failed but still looks strategically valuable.
930
+
931
+ `debug` is bugfix-only.
932
+ Do not use a debug pass to sneak in a new performance-improvement idea.
933
+ If the proposed change goes beyond the minimal fix and becomes a new mechanism, stop and route back to `brief` or `loop` instead.
934
+
935
+ When a candidate fails:
936
+
937
+ - classify whether the failure is structural, local, or environmental
938
+ - retrieve similar failure patterns from memory before changing code
939
+ - prefer targeted fixes over broad rewrites
940
+ - define the exact post-fix bounded check before editing
941
+
942
+ Good debug prompts should make these explicit:
943
+
944
+ - the concrete error
945
+ - the likely root cause
946
+ - the minimal fix
947
+ - what must remain unchanged
948
+
949
+ Use the integrated `debug response template` section for the minimum acceptable debug response shape.
950
+
951
+ Archive rather than debug when:
952
+
953
+ - the failure is mostly strategic rather than local
954
+ - the candidate no longer looks better than the nearby alternatives
955
+ - the fix would effectively turn it into a different candidate anyway
956
+
957
+ ## Fusion protocol
958
+
959
+ Use `fusion` only when the frontier justifies cross-line combination.
960
+
961
+ Before opening a fusion candidate:
962
+
963
+ - identify the real strength of each source line
964
+ - identify the real weakness of each source line
965
+ - explain why the strengths are complementary rather than redundant
966
+ - define what remains unchanged for comparability
967
+ - define the bounded evidence that would prove the fusion was worthwhile
968
+
969
+ Use the integrated `fusion playbook` section before launching cross-line fusion.
970
+
971
+ Do not fuse:
972
+
973
+ - two lines with the same mechanism under different names
974
+ - two weak lines that lack a clear strength
975
+ - merely because multiple branches exist
976
+
977
+ If the fusion hypothesis is still underspecified, return to `brief` instead of pretending fusion is ready.
978
+
979
+ ## Prompt patterns worth preserving
980
+
981
+ For candidate-brief, improve, fusion, and debug prompts, preserve these recurring structures:
982
+
983
+ - Introduction
984
+ - Task description
985
+ - Memory
986
+ - Previous solution or previous line
987
+ - Instructions
988
+ - assistant_prefix when a stable response lead-in reduces drift
989
+ - explicit response format
990
+
991
+ And preserve these recurring reasoning contracts:
992
+
993
+ - root cause first
994
+ - WHAT / WHY / HOW
995
+ - KEEP UNCHANGED
996
+ - explicit next action
997
+
998
+ Use the integrated `prompt patterns` section as the canonical optimization prompt crib sheet.
999
+
1000
+ ## Plateau and fusion protocol
1001
+
1002
+ Treat repeated local edits without evidence gain as a search failure mode.
1003
+
1004
+ If one line shows repeated non-improving results:
1005
+
1006
+ - stop issuing near-duplicate attempts
1007
+ - record the stagnation explicitly
1008
+ - either widen the search or fuse with another line
1009
+
1010
+ Use the integrated `fusion playbook` section before launching cross-line fusion.
1011
+ Use the integrated `plateau response playbook` section when deciding how to respond to repeated non-improving results.
1012
+
1013
+ Good fusion candidates usually satisfy both:
1014
+
1015
+ - each source line has at least one real strength
1016
+ - the strengths are complementary rather than redundant
1017
+
1018
+ Do not fuse merely because two lines both exist.
1019
+
1020
+ When a line plateaus:
1021
+
1022
+ - stop issuing near-duplicate low-information attempts
1023
+ - say explicitly that the line is plateauing
1024
+ - force one larger route change:
1025
+ - widen the brief slate
1026
+ - promote a stronger alternative
1027
+ - fuse
1028
+ - debug one blocked but valuable candidate
1029
+ - stop
1030
+
1031
+ Do not hide a plateau behind a sequence of tiny "one more tweak" loops.
1032
+
1033
+ Family-shift trigger:
1034
+
1035
+ - if recent attempts stay inside one mechanism family and there is no meaningful improvement
1036
+ - or if `success_patience >= 2`
1037
+ - or if `total_patience >= 5`
1038
+ - the next pass must not be another same-family Tier1 tweak
1039
+ - instead choose one of:
1040
+ - orthogonal family
1041
+ - Tier2 or Tier3 shift
1042
+ - fusion
1043
+ - stop
1044
+
1045
+ This is the default anti-collapse rule for optimize.
1046
+
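The family-shift trigger can be checked mechanically. The sketch below is illustrative only: the function name is hypothetical, `success_patience` and `total_patience` are taken as counters supplied by the caller (how they are computed is not specified here), and requiring at least two recent attempts before the same-family condition fires is an assumption.

```python
from typing import Sequence


def family_shift_required(
    recent_families: Sequence[str],
    meaningful_improvement: bool,
    success_patience: int,
    total_patience: int,
) -> bool:
    """Return True when the next pass must not be another same-family Tier1 tweak."""
    same_family_without_gain = (
        len(recent_families) >= 2             # assumption: need at least two attempts
        and len(set(recent_families)) == 1    # all recent attempts share one family
        and not meaningful_improvement
    )
    return same_family_without_gain or success_patience >= 2 or total_patience >= 5
```

When this returns true, the next pass would pick from the listed alternatives: an orthogonal family, a Tier2 or Tier3 shift, fusion, or stop.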
1047
+ ## Task-category primer
1048
+
1049
+ Before widening a stale frontier, classify the task briefly into one or more dominant structures:
1050
+
1051
+ - tabular
1052
+ - vision / spatial
1053
+ - sequence / language
1054
+ - graph / topology
1055
+ - systems / optimization
1056
+ - mixed
1057
+
1058
+ Then ask whether the current brief slate overfits one familiar method family for that task.
1059
+ If it does, require at least one serious candidate from a different plausible family or lens before promotion.
1060
+
1061
+ ## Stall-recovery protocol
1062
+
1063
+ If the optimize stage appears to stall, diagnose the stall explicitly instead of idling.
1064
+
1065
+ Common stall classes:
1066
+
1067
+ - no frontier information
1068
+ - no candidate clearly worth promotion
1069
+ - candidate pool is too similar
1070
+ - repeated failures on one line
1071
+ - no active runs and no next action recorded
1072
+
1073
+ Preferred recovery order:
1074
+
1075
+ 1. refresh the frontier
1076
+ 2. inspect the current candidate board
1077
+ 3. inspect recent optimization memory
1078
+ 4. record one explicit route decision
1079
+ 5. continue with exactly one concrete next action
1080
+
1081
+ Do not leave the stage parked without a recorded reason and a concrete reopen condition.
1082
+
1083
+ ## Stage-end requirement
1084
+
1085
+ At the end of every optimize pass:
1086
+
1087
+ - write at least one `memory.write(...)` when the pass produced a reusable success pattern, repeated failure pattern, fusion lesson, or explicit non-retry rule
1088
+ - update `OPTIMIZE_CHECKLIST.md`
1089
+ - update `CANDIDATE_BOARD.md` when the candidate pool changed
1090
+ - leave one durable next action or stop condition
1091
+
1092
+ If nothing reusable was learned, record why this pass was still necessary instead of writing a fake memory card.
1093
+
1094
+ ## Completion rule
1095
+
1096
+ This stage is complete only when one of these is durably true:
1097
+
1098
+ - a stronger line was promoted and the next anchor is clear
1099
+ - the current line produced a real measured result and the next route is recorded
1100
+ - the optimization frontier says stop and that stop decision is durably recorded
1101
+
1102
+ Do not treat one candidate creation or one smoke pass as stage completion.
1103
+
1104
+ ## Integrated reference appendix
1105
+
1106
+ This appendix inlines the former `optimize/references/*.md` material so the skill remains self-contained.
1107
+
1108
+ ### brief-shaping-playbook.md
1109
+
1110
+ # Brief Shaping Playbook
1111
+
1112
+ Use this reference when a candidate direction is still fuzzy and needs to become a structured, ranking-ready brief.
1113
+
1114
+ This playbook borrows the useful part of product-style brainstorming without importing a full software-spec workflow.
1115
+ The goal is not a long design document.
1116
+ The goal is a compact candidate brief that is clear enough to compare, rank, and either submit as `submission_mode='candidate'` or reject.
1117
+
1118
+ ## 1. Clarify before widening
1119
+
1120
+ Before generating more variants, resolve the minimum ambiguity around:
1121
+
1122
+ - the concrete bottleneck
1123
+ - the evaluation or comparability boundary
1124
+ - the main hard constraint: data, metric, compute, latency, memory, interface, or training budget
1125
+ - the current incumbent or baseline that this brief must beat or complement
1126
+
1127
+ If one unknown would materially change every candidate, clarify it first instead of generating a noisy slate.
1128
+ Prefer one question at a time when clarification is genuinely needed.
1129
+ If the answer is already available from durable state, use that instead of asking.
1130
+
1131
+ ## 2. Generate a small differentiated slate
1132
+
1133
+ Default target: `2-3` serious approaches.
1134
+
1135
+ The slate should usually include:
1136
+
1137
+ - one incumbent-deepening refinement
1138
+ - one orthogonal mechanism
1139
+ - one broader shift candidate when justified
1140
+
1141
+ Do not produce several renamed variants of the same mechanism family.
1142
+ If two variants differ only by parameter choice or patch detail, keep only the sharper one.
1143
+
1144
+ For each candidate, write:
1145
+
1146
+ - bottleneck
1147
+ - why_current_line_is_limited
1148
+ - mechanism
1149
+ - why_now
1150
+ - keep_unchanged
1151
+ - expected_gain
1152
+ - main_risks
1153
+
1154
+ ## 3. Compare on one shared surface
1155
+
1156
+ Before recommending a winner, compare the serious candidates on the same dimensions:
1157
+
1158
+ - expected upside
1159
+ - comparability safety
1160
+ - implementation surface
1161
+ - mechanism distinctness
1162
+ - failure risk
1163
+ - reason this route is better now than the nearby alternatives
1164
+
1165
+ Do not let each candidate justify itself with a different scoring story.
1166
+ Use one comparison surface so ranking is auditable.
1167
+
1168
+ ## 4. Recommend exactly one lead brief
1169
+
1170
+ After comparison, recommend one lead brief and explain:
1171
+
1172
+ - why it is the best next move now
1173
+ - why the main alternatives are deferred instead of promoted
1174
+ - what evidence would quickly disconfirm the lead brief
1175
+
1176
+ Do not say "all are promising" and promote everything.
1177
+ If the slate is still too close to call, return to widening once or narrow the slate further.
1178
+
1179
+ ## 5. Self-check before submission
1180
+
1181
+ Before calling `artifact.submit_idea(..., submission_mode='candidate', ...)`, check:
1182
+
1183
+ - Is the bottleneck concrete rather than generic?
1184
+ - Does `why_current_line_is_limited` explain a real gap instead of restating the mechanism?
1185
+ - Does `why_now` explain what changed in evidence, failure pattern, or frontier state?
1186
+ - Is the comparability boundary explicit?
1187
+ - Is the recommendation based on tradeoffs rather than implementation convenience?
1188
+ - Would the brief still make sense if handed to another agent with no chat context?
1189
+
1190
+ If any answer is no, refine the brief before submission.
1191
+
1192
+ ## 6. Output shape
1193
+
1194
+ A good final brief package is short and structured:
1195
+
1196
+ 1. brief title
1197
+ 2. one-paragraph bottleneck and constraint summary
1198
+ 3. a `2-3` candidate comparison table or bullet slate
1199
+ 4. recommended brief with tradeoff summary
1200
+ 5. self-check outcome
1201
+ 6. fields ready for the integrated `method-brief-template.md` section
1202
+
1203
+ Keep it compact.
1204
+ This is a shaping pass for optimization candidates, not a paper draft or engineering spec.
1205
+
1206
+ ### candidate-board-template.md
1207
+
1208
+ # CANDIDATE_BOARD.md
1209
+
1210
+ | Candidate ID | Level | Parent | Strategy | Status | Expected Gain | Observed Result | Promote / Archive |
1211
+ | --- | --- | --- | --- | --- | --- | --- | --- |
1212
+ | cand-001 | brief | current-head | explore | proposed | Better tail accuracy | n/a | pending |
1213
+ | cand-002 | implementation | cand-001 | exploit | smoke_passed | Faster convergence | smoke ok | consider promote |
1214
+
1215
+ Notes:
1216
+
1217
+ - `Level` should be `brief` or `implementation`
1218
+ - `Parent` may be a branch, idea id, run id, or candidate id
1219
+ - `Strategy` should usually be one of `explore`, `exploit`, `fusion`, `debug`
1220
+ - `Promote / Archive` should be a clear recommendation, not an empty placeholder
1221
+
1222
+ ### candidate-ranking-template.md
1223
+
1224
+ # Candidate Ranking Template
1225
+
1226
+ ## Candidate Set
1227
+
1228
+ - Candidate IDs:
1229
+ - Ranking scope:
1230
+ - Comparison surface:
1231
+
1232
+ ## Criteria
1233
+
1234
+ - expected information gain
1235
+ - feasibility in current repo
1236
+ - comparability against baseline
1237
+ - implementation surface
1238
+ - likely novelty or distinctiveness
1239
+ - risk of redundant overlap
1240
+ - incumbent-improvement potential
1241
+ - distinctness from other candidates
1242
+ - mechanism-family diversity
1243
+ - change-layer diversity
1244
+
1245
+ ## Ranked Candidates
1246
+
1247
+ 1. `candidate_id`
1248
+ Score summary:
1249
+ Why it ranks here:
1250
+ Promote / hold / reject:
1251
+
1252
+ 2. `candidate_id`
1253
+ Score summary:
1254
+ Why it ranks here:
1255
+ Promote / hold / reject:
1256
+
1257
+ 3. `candidate_id`
1258
+ Score summary:
1259
+ Why it ranks here:
1260
+ Promote / hold / reject:
1261
+
1262
+ ## Winner Justification
1263
+
1264
+ Why the selected candidate should become a durable line now.
1265
+
1266
+ ## Non-Winner Notes
1267
+
1268
+ Why the other candidates were deferred, fused, or rejected.
1269
+
1270
+ ## Promotion Cap
1271
+
1272
+ - how many candidates should be promoted now:
1273
+ - why more promotion would dilute the frontier:
1274
+ - same-family cap override justification:
1275
+
1276
+ ### codegen-route-playbook.md
1277
+
1278
+ # Codegen Route Playbook
1279
+
1280
+ Choose the code-generation route deliberately.
1281
+
1282
+ ## Use brief-only
1283
+
1284
+ Use no-code candidate briefs when:
1285
+
1286
+ - the direction is still underspecified
1287
+ - multiple distinct directions still need ranking
1288
+ - a new line should not be promoted yet
1289
+
1290
+ ## Use stepwise generation
1291
+
1292
+ Prefer stepwise generation when:
1293
+
1294
+ - a new durable line is being implemented for the first time
1295
+ - the change spans data processing, model design, and training/evaluation
1296
+ - a modular decomposition will reduce large integrated errors
1297
+ - a plan -> refine -> implement sequence is safer than one monolithic edit
1298
+
1299
+ ## Use diff / patch generation
1300
+
1301
+ Prefer diff / patch generation when:
1302
+
1303
+ - a strong current implementation already exists
1304
+ - the current change is local enough to preserve most of the line
1305
+ - the task is improve, exploit, debug, or most fusion work
1306
+ - the desired change can be described as a bounded delta from the current solution
1307
+
1308
+ ## Use full rewrite
1309
+
1310
+ Use a full rewrite only when:
1311
+
1312
+ - the existing implementation is structurally broken
1313
+ - the desired architecture no longer matches the current codebase shape
1314
+ - diff patching would be more fragile than replacement
1315
+
1316
+ Do not jump to a rewrite merely because one local patch failed.
1317
+
1318
+ ## Response shape
1319
+
1320
+ For non-trivial codegen work, prefer this shape:
1321
+
1322
+ 1. short plan
1323
+ 2. bounded implementation surface
1324
+ 3. keep-unchanged contract
1325
+ 4. validation step
1326
+
1327
+ Do not go from a vague idea directly into a large patch with no intermediate plan.
1328
+
1329
+ ### debug-response-template.md
1330
+
1331
+ # Debug Response Template
1332
+
1333
+ ## Error
1334
+
1335
+ What concrete error or failure occurred?
1336
+
1337
+ ## Retrieved Memory
1338
+
1339
+ What similar failure pattern or repair lesson should be reused before changing code?
1340
+
1341
+ ## Root Cause
1342
+
1343
+ What is the most likely underlying cause?
1344
+
1345
+ ## Minimal Fix
1346
+
1347
+ What is the smallest plausible fix?
1348
+
1349
+ ## Keep Unchanged
1350
+
1351
+ What parts of the line must remain unchanged for comparability and stability?
1352
+
1353
+ ## Next Check
1354
+
1355
+ What bounded smoke or validation check should confirm the fix?
1356
+
1357
+ ## Archive Threshold
1358
+
1359
+ What outcome would prove this candidate should be archived instead of debugged again?
1360
+
1361
+ ### frontier-review-template.md
1362
+
1363
+ # Frontier Review Template
1364
+
1365
+ ## Current Frontier
1366
+
1367
+ - mode:
1368
+ - best branch:
1369
+ - best run:
1370
+ - stagnant branches:
1371
+ - candidate backlog:
1372
+ - fusion candidates:
1373
+
1374
+ ## Evidence Summary
1375
+
1376
+ - strongest support:
1377
+ - strongest contradiction:
1378
+ - biggest unresolved risk:
1379
+
1380
+ ## Route Choice
1381
+
1382
+ - explore / exploit / fusion / debug / stop:
1383
+ - why this is the best next move:
1384
+
1385
+ ## Active Optimize Submode
1386
+
1387
+ - brief / rank / seed / loop / fusion / debug:
1388
+ - why this submode is dominant now:
1389
+
1390
+ ## Immediate Next Action
1391
+
1392
+ - exact next step:
1393
+ - what result will trigger another frontier review:
1394
+ - what result would force a different mode:
1395
+
1396
+ ### fusion-playbook.md
1397
+
1398
+ # Fusion Playbook
1399
+
1400
+ Use fusion only when:
1401
+
1402
+ - at least two lines have real strengths
1403
+ - the strengths are complementary
1404
+ - one line alone is no longer improving fast enough
1405
+
1406
+ Before fusion, write down:
1407
+
1408
+ - source line A:
1409
+ strongest mechanism:
1410
+ strongest evidence:
1411
+ main weakness:
1412
+ what must survive the fusion:
1413
+
1414
+ - source line B:
1415
+ strongest mechanism:
1416
+ strongest evidence:
1417
+ main weakness:
1418
+ what must survive the fusion:
1419
+
1420
+ Then answer:
1421
+
1422
+ - what exactly is being fused?
1423
+ - why does this combination address a real bottleneck?
1424
+ - why are the source strengths complementary rather than redundant?
1425
+ - what remains unchanged for comparability?
1426
+ - what evidence would prove the fusion was worth it?
1427
+ - what bounded first validation step should run before any broad rollout?
1428
+
1429
+ Do not fuse:
1430
+
1431
+ - two lines with the same mechanism under different names
1432
+ - two weak lines with no clear strengths
1433
+ - merely because multiple branches exist
1434
+
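The fusion preconditions and the "Do not fuse" list above could be checked mechanically before any fusion work starts. This is a hedged sketch only: `SourceLine`, `fusion_blockers`, and all field names are hypothetical, comparing `mechanism_family` strings is a coarse stand-in for "same mechanism under different names", and treating "improving fast enough" as a per-line boolean is one interpretation of the playbook.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SourceLine:
    """Hypothetical record for one candidate source line."""
    name: str
    mechanism_family: str          # e.g. "adapter", "loss"
    has_real_strength: bool        # backed by evidence, not just existence
    improving_fast_enough: bool    # the line alone is still making good progress


def fusion_blockers(a: SourceLine, b: SourceLine, complementary: bool) -> List[str]:
    """Return the reasons not to fuse; an empty list means fusion is eligible."""
    blockers = []
    if not (a.has_real_strength and b.has_real_strength):
        blockers.append("both lines need real strengths")
    if a.mechanism_family == b.mechanism_family:
        blockers.append("same mechanism under different names")
    if not complementary:
        blockers.append("strengths are not complementary")
    if a.improving_fast_enough or b.improving_fast_enough:
        blockers.append("a single line is still improving fast enough")
    return blockers
```

Returning the list of blockers rather than a bare boolean keeps the reason for rejecting a fusion available for the written record the playbook asks for.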
1435
+ ### method-brief-template.md
1436
+
1437
+ # Method Brief Template
1438
+
1439
+ ## Title
1440
+
1441
+ One short line naming the candidate direction.
1442
+
1443
+ ## Bottleneck
1444
+
1445
+ What concrete bottleneck or limitation does this target?
1446
+
1447
+ ## Why Current Line Is Limited
1448
+
1449
+ Why is the current best line or baseline not already solving this?
1450
+
1451
+ ## Mechanism
1452
+
1453
+ What specific intervention or design change is proposed?
1454
+
1455
+ ## Mechanism Family
1456
+
1457
+ Name the family explicitly, for example `adapter`, `loss`, `architecture`, `augmentation`, `ensemble`, `retrieval`, `objective-shift`.
1458
+
1459
+ ## Change Layer
1460
+
1461
+ One of:
1462
+
1463
+ - `Tier1`: local optimization / training detail
1464
+ - `Tier2`: representation or component change
1465
+ - `Tier3`: paradigm or system-level shift
1466
+
1467
+ ## Source Lens
1468
+
1469
+ Where did this candidate come from?
1470
+
1471
+ - baseline_refinement
1472
+ - orthogonal_mechanism
1473
+ - failure_repair
1474
+ - cross_domain_transfer
1475
+ - objective_shift
1476
+ - search_widening
1477
+
1478
+ ## Keep Unchanged
1479
+
1480
+ What must remain stable for comparability?
1481
+
1482
+ ## Expected Gain
1483
+
1484
+ What evidence should improve if this works?
1485
+
1486
+ ## Implementation Surface
1487
+
1488
+ - main files or modules likely involved:
1489
+ - likely change scope: local / moderate / broad
1490
+
1491
+ ## Risks
1492
+
1493
+ - Main failure mode
1494
+ - Comparability risk
1495
+ - Implementation risk
1496
+
1497
+ ## Foundation
1498
+
1499
+ - Source branch / run / baseline:
1500
+ - Why this foundation is the right starting point:
1501
+
1502
+ ## Promote Now
1503
+
1504
+ - yes / no
1505
+ - why:
1506
+
1507
+ ## Next Target
1508
+
1509
+ Usually `optimize` or `experiment`.
1510
+
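A brief following this template could be held in a small record that checks the two enumerated fields. The sketch below is an assumption-laden illustration, not the package's schema: `MethodBrief`, `problems`, and the choice to treat an empty `keep_unchanged` as an error are hypothetical; only the `Tier1`/`Tier2`/`Tier3` labels and the six source-lens names come from the template.

```python
from dataclasses import dataclass
from typing import List

CHANGE_LAYERS = {"Tier1", "Tier2", "Tier3"}
SOURCE_LENSES = {
    "baseline_refinement",
    "orthogonal_mechanism",
    "failure_repair",
    "cross_domain_transfer",
    "objective_shift",
    "search_widening",
}


@dataclass
class MethodBrief:
    """Hypothetical in-memory form of a subset of the template's fields."""
    title: str
    bottleneck: str
    mechanism: str
    mechanism_family: str
    change_layer: str
    source_lens: str
    keep_unchanged: str
    promote_now: bool = False
    next_target: str = "optimize"  # the template says usually optimize or experiment

    def problems(self) -> List[str]:
        """List validation problems; an empty list means the brief looks well formed."""
        found = []
        if self.change_layer not in CHANGE_LAYERS:
            found.append(f"unknown change layer: {self.change_layer}")
        if self.source_lens not in SOURCE_LENSES:
            found.append(f"unknown source lens: {self.source_lens}")
        if not self.keep_unchanged.strip():
            found.append("keep_unchanged is empty")
        return found
```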
1511
+ ### optimization-memory-template.md
1512
+
1513
+ # Optimization Memory Template
1514
+
1515
+ ## Type
1516
+
1517
+ - success pattern / failure pattern / fusion lesson
1518
+
1519
+ ## Context
1520
+
1521
+ - task:
1522
+ - branch or idea:
1523
+ - candidate id:
1524
+ - strategy:
1525
+
1526
+ ## Observation
1527
+
1528
+ What actually happened?
1529
+
1530
+ ## Why It Matters
1531
+
1532
+ Why should a later optimization pass retrieve this?
1533
+
1534
+ ## Retrieval Hint
1535
+
1536
+ - query keywords:
1537
+ - closest line or mechanism family:
1538
+ - when this should be recalled first:
1539
+
1540
+ ## Reuse Hint
1541
+
1542
+ When should this lesson be reused, and when should it be avoided?
1543
+
1544
+ ### optimize-checklist-template.md
1545
+
1546
+ # OPTIMIZE_CHECKLIST.md
1547
+
1548
+ - [ ] Read `artifact.get_optimization_frontier(...)` or equivalent durable frontier summary
1549
+ - [ ] Select the primary optimize submode: `brief`, `rank`, `seed`, `loop`, `fusion`, or `debug`
1550
+ - [ ] Confirm whether the current pass is `explore`, `exploit`, `fusion`, `debug`, or `stop`
1551
+ - [ ] Review recent optimization memory before generating new candidates
1552
+ - [ ] Check whether the current brief slate covers more than one mechanism family
1553
+ - [ ] Candidate briefs updated or confirmed
1554
+ - [ ] Candidate ranking updated
1555
+ - [ ] Promote only the strongest brief(s) into durable line(s) if justified
1556
+ - [ ] Current implementation candidate pool recorded
1557
+ - [ ] Smoke queue defined
1558
+ - [ ] Full-eval queue defined
1559
+ - [ ] Recent failures classified and either debugged or archived
1560
+ - [ ] Stagnation check performed
1561
+ - [ ] Family-shift trigger checked
1562
+ - [ ] Fusion eligibility checked
1563
+ - [ ] Next concrete action written
1564
+
1565
+ ### plateau-response-playbook.md
1566
+
1567
+ # Plateau Response Playbook
1568
+
1569
+ Use this when one line keeps producing non-improving results.
1570
+
1571
+ ## Plateau indicators
1572
+
1573
+ - repeated non-improving results on the same line
1574
+ - repeated "small tweak" proposals with no structural change
1575
+ - candidate queue filled with near-duplicate mechanisms
1576
+
1577
+ ## Required response
1578
+
1579
+ 1. state that the line is plateauing
1580
+ 2. identify the most likely root cause of the plateau
1581
+ 3. choose one of:
1582
+    - widen search
1583
+    - promote a stronger alternative
1584
+    - fuse with another line
1585
+    - debug a strategically valuable blocked candidate
1586
+    - stop the line
1587
+ 4. record one explicit non-repeat rule so the next pass does not retry the same low-information move
1588
+
1589
+ ## Do not do
1590
+
1591
+ - keep proposing near-identical local tweaks
1592
+ - rerun the same unchanged candidate
1593
+ - fuse without a clear complementary mechanism
1594
+ - hide a plateau under a sequence of tiny "one more tweak" edits
1595
+
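Two of the plateau indicators above lend themselves to a simple check. This is a minimal sketch under stated assumptions: the function names, the `window` and `min_gain` parameters, and the convention that higher scores are better are all illustrative and not taken from the package.

```python
from typing import List, Sequence


def is_plateauing(scores: Sequence[float], window: int = 3, min_gain: float = 0.0) -> bool:
    """True when none of the last `window` results beat the best earlier result.

    Assumes higher is better and that `scores` is ordered oldest to newest.
    """
    if len(scores) <= window:
        return False  # not enough history to call a plateau
    best_before = max(scores[:-window])
    return all(s <= best_before + min_gain for s in scores[-window:])


def queue_is_near_duplicate(mechanism_families: List[str]) -> bool:
    """True when a multi-item candidate queue covers only one mechanism family."""
    return len(mechanism_families) > 1 and len(set(mechanism_families)) == 1
```

A positive result from either check only flags the plateau. Choosing among widening search, promoting an alternative, fusing, debugging, or stopping remains the judgment call described in the required response.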
1596
+ ### prompt-patterns.md
1597
+
1598
+ # Optimization Prompt Patterns
1599
+
1600
+ These prompt structures are worth preserving across optimize subroutines.
1601
+
1602
+ ## Common skeleton
1603
+
1604
+ - Introduction
1605
+ - Task description
1606
+ - Memory
1607
+ - Previous solution or previous line
1608
+ - Instructions
1609
+ - `assistant_prefix`, when a stable response lead-in reduces drift
1610
+ - Explicit response format
1611
+
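The common skeleton above could be assembled in a fixed order so that every optimize subroutine produces prompts with the same shape. The sketch below is illustrative: `SKELETON`, `build_prompt`, and the `##` heading rendering are assumptions, and `assistant_prefix` is left out on the assumption that it would be supplied as the start of the response rather than as a prompt section.

```python
from typing import Dict, List

# Section order follows the skeleton list; assistant_prefix is handled elsewhere.
SKELETON: List[str] = [
    "Introduction",
    "Task description",
    "Memory",
    "Previous solution or previous line",
    "Instructions",
    "Explicit response format",
]


def build_prompt(sections: Dict[str, str]) -> str:
    """Render the provided sections in skeleton order, skipping empty ones."""
    unknown = set(sections) - set(SKELETON)
    if unknown:
        raise ValueError(f"unknown sections: {sorted(unknown)}")
    parts = [
        f"## {name}\n\n{sections[name].strip()}"
        for name in SKELETON
        if sections.get(name, "").strip()
    ]
    return "\n\n".join(parts)
```

Rejecting unknown section names keeps ad hoc sections from drifting into individual prompts, which is the point of preserving a shared skeleton.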
1612
+ ## Common reasoning contract
1613
+
1614
+ - WHAT is changing?
1615
+ - WHY is the current line limited?
1616
+ - HOW should the change address the limitation?
1617
+ - KEEP UNCHANGED: what must remain stable for comparability?
1618
+ - NEXT ACTION: what concrete step follows this prompt?
1619
+
1620
+ ## Plateau pattern
1621
+
1622
+ When the line is stagnating:
1623
+
1624
+ - explicitly state that the current approach has plateaued
1625
+ - forbid trivial hyperparameter-only tweaks when a deeper change is needed
1626
+ - require a larger representational or architectural shift
1627
+
1628
+ ## Fusion pattern
1629
+
1630
+ When combining lines:
1631
+
1632
+ - identify the real strength of each source line
1633
+ - explain why those strengths are complementary
1634
+ - avoid combining everything
1635
+ - preserve the comparison surface
1636
+
1637
+ ## Debug pattern
1638
+
1639
+ For debugging:
1640
+
1641
+ - restate the concrete error
1642
+ - state the likely root cause
1643
+ - require the minimal targeted fix
1644
+ - preserve the original solution intent unless the bug proves the design invalid