@researai/deepscientist 1.5.13 → 1.5.15

Files changed (142)
  1. package/README.md +8 -0
  2. package/assets/branding/logo-raster.png +0 -0
  3. package/bin/ds.js +134 -49
  4. package/docs/en/00_QUICK_START.md +2 -2
  5. package/docs/en/01_SETTINGS_REFERENCE.md +20 -4
  6. package/docs/en/03_QQ_CONNECTOR_GUIDE.md +19 -0
  7. package/docs/en/05_TUI_GUIDE.md +466 -96
  8. package/docs/en/10_WEIXIN_CONNECTOR_GUIDE.md +20 -0
  9. package/docs/en/14_PROMPT_SKILLS_AND_MCP_GUIDE.md +2 -0
  10. package/docs/en/16_TELEGRAM_CONNECTOR_GUIDE.md +134 -0
  11. package/docs/en/17_WHATSAPP_CONNECTOR_GUIDE.md +126 -0
  12. package/docs/en/18_FEISHU_CONNECTOR_GUIDE.md +136 -0
  13. package/docs/en/README.md +8 -0
  14. package/docs/zh/00_QUICK_START.md +2 -2
  15. package/docs/zh/01_SETTINGS_REFERENCE.md +20 -4
  16. package/docs/zh/03_QQ_CONNECTOR_GUIDE.md +19 -0
  17. package/docs/zh/05_TUI_GUIDE.md +465 -82
  18. package/docs/zh/10_WEIXIN_CONNECTOR_GUIDE.md +20 -0
  19. package/docs/zh/14_PROMPT_SKILLS_AND_MCP_GUIDE.md +2 -0
  20. package/docs/zh/16_TELEGRAM_CONNECTOR_GUIDE.md +134 -0
  21. package/docs/zh/17_WHATSAPP_CONNECTOR_GUIDE.md +126 -0
  22. package/docs/zh/18_FEISHU_CONNECTOR_GUIDE.md +136 -0
  23. package/docs/zh/README.md +8 -0
  24. package/install.sh +2 -0
  25. package/package.json +1 -1
  26. package/pyproject.toml +1 -1
  27. package/src/deepscientist/__init__.py +1 -1
  28. package/src/deepscientist/artifact/charts.py +567 -0
  29. package/src/deepscientist/artifact/guidance.py +50 -10
  30. package/src/deepscientist/artifact/metrics.py +228 -5
  31. package/src/deepscientist/artifact/schemas.py +3 -0
  32. package/src/deepscientist/artifact/service.py +4004 -538
  33. package/src/deepscientist/bash_exec/models.py +23 -0
  34. package/src/deepscientist/bash_exec/monitor.py +147 -67
  35. package/src/deepscientist/bash_exec/runtime.py +218 -156
  36. package/src/deepscientist/bash_exec/service.py +79 -64
  37. package/src/deepscientist/bash_exec/shells.py +87 -0
  38. package/src/deepscientist/bridges/connectors.py +51 -2
  39. package/src/deepscientist/config/models.py +6 -3
  40. package/src/deepscientist/config/service.py +7 -2
  41. package/src/deepscientist/connector/lingzhu_support.py +23 -4
  42. package/src/deepscientist/connector/weixin_support.py +122 -1
  43. package/src/deepscientist/daemon/api/handlers.py +75 -4
  44. package/src/deepscientist/daemon/api/router.py +1 -0
  45. package/src/deepscientist/daemon/app.py +869 -236
  46. package/src/deepscientist/doctor.py +51 -0
  47. package/src/deepscientist/file_lock.py +48 -0
  48. package/src/deepscientist/gitops/diff.py +167 -1
  49. package/src/deepscientist/mcp/server.py +331 -21
  50. package/src/deepscientist/process_control.py +161 -0
  51. package/src/deepscientist/prompts/builder.py +275 -491
  52. package/src/deepscientist/quest/service.py +2336 -145
  53. package/src/deepscientist/quest/stage_views.py +305 -29
  54. package/src/deepscientist/runners/base.py +2 -0
  55. package/src/deepscientist/runners/codex.py +88 -5
  56. package/src/deepscientist/runners/runtime_overrides.py +17 -1
  57. package/src/deepscientist/shared.py +6 -1
  58. package/src/prompts/contracts/shared_interaction.md +13 -4
  59. package/src/prompts/system.md +984 -1985
  60. package/src/skills/analysis-campaign/SKILL.md +31 -2
  61. package/src/skills/analysis-campaign/references/artifact-orchestration.md +1 -1
  62. package/src/skills/analysis-campaign/references/writing-facing-slice-examples.md +65 -0
  63. package/src/skills/baseline/SKILL.md +267 -994
  64. package/src/skills/baseline/references/baseline-checklist-template.md +21 -32
  65. package/src/skills/baseline/references/baseline-plan-template.md +41 -57
  66. package/src/skills/decision/SKILL.md +19 -2
  67. package/src/skills/experiment/SKILL.md +8 -2
  68. package/src/skills/finalize/SKILL.md +18 -0
  69. package/src/skills/idea/SKILL.md +78 -0
  70. package/src/skills/idea/references/idea-generation-playbook.md +100 -0
  71. package/src/skills/idea/references/outline-seeding-example.md +60 -0
  72. package/src/skills/intake-audit/SKILL.md +1 -1
  73. package/src/skills/optimize/SKILL.md +1644 -0
  74. package/src/skills/rebuttal/SKILL.md +2 -1
  75. package/src/skills/review/SKILL.md +2 -1
  76. package/src/skills/write/SKILL.md +80 -12
  77. package/src/skills/write/references/outline-evidence-contract-example.md +107 -0
  78. package/src/tui/dist/app/AppContainer.js +1445 -52
  79. package/src/tui/dist/components/Composer.js +1 -1
  80. package/src/tui/dist/components/ConfigScreen.js +190 -36
  81. package/src/tui/dist/components/GradientStatusText.js +1 -20
  82. package/src/tui/dist/components/InputPrompt.js +41 -32
  83. package/src/tui/dist/components/LoadingIndicator.js +1 -1
  84. package/src/tui/dist/components/Logo.js +61 -38
  85. package/src/tui/dist/components/MainContent.js +10 -3
  86. package/src/tui/dist/components/WelcomePanel.js +4 -12
  87. package/src/tui/dist/components/messages/AssistantMessage.js +1 -1
  88. package/src/tui/dist/components/messages/BashExecOperationMessage.js +3 -3
  89. package/src/tui/dist/components/messages/OperationMessage.js +1 -1
  90. package/src/tui/dist/index.js +28 -1
  91. package/src/tui/dist/layouts/DefaultAppLayout.js +3 -3
  92. package/src/tui/dist/lib/api.js +17 -0
  93. package/src/tui/dist/lib/connectors.js +261 -0
  94. package/src/tui/dist/semantic-colors.js +29 -19
  95. package/src/tui/package.json +1 -1
  96. package/src/ui/dist/assets/{AiManusChatView-CnJcXynW.js → AiManusChatView-DDjbFnbt.js} +12 -12
  97. package/src/ui/dist/assets/{AnalysisPlugin-DeyzPEhV.js → AnalysisPlugin-Yb5IdmaU.js} +1 -1
  98. package/src/ui/dist/assets/CliPlugin-e64sreyu.js +31037 -0
  99. package/src/ui/dist/assets/{CodeEditorPlugin-B-xicq1e.js → CodeEditorPlugin-C4D2TIkU.js} +8 -8
  100. package/src/ui/dist/assets/{CodeViewerPlugin-DT54ysXa.js → CodeViewerPlugin-BVoNZIvC.js} +5 -5
  101. package/src/ui/dist/assets/{DocViewerPlugin-DQtKT-VD.js → DocViewerPlugin-CLChbllo.js} +3 -3
  102. package/src/ui/dist/assets/{GitDiffViewerPlugin-hqHbCfnv.js → GitDiffViewerPlugin-C4xeFyFQ.js} +20 -20
  103. package/src/ui/dist/assets/{ImageViewerPlugin-OcVo33jV.js → ImageViewerPlugin-OiMUAcLi.js} +5 -5
  104. package/src/ui/dist/assets/{LabCopilotPanel-DdGwhEUV.js → LabCopilotPanel-BjD2ThQF.js} +11 -11
  105. package/src/ui/dist/assets/{LabPlugin-Ciz1gDaX.js → LabPlugin-DQPg-NrB.js} +2 -2
  106. package/src/ui/dist/assets/{LatexPlugin-BhmjNQRC.js → LatexPlugin-CI05XAV9.js} +7 -7
  107. package/src/ui/dist/assets/{MarkdownViewerPlugin-BzdVH9Bx.js → MarkdownViewerPlugin-DpeBLYZf.js} +4 -4
  108. package/src/ui/dist/assets/{MarketplacePlugin-DmyHspXt.js → MarketplacePlugin-DolE58Q2.js} +3 -3
  109. package/src/ui/dist/assets/{NotebookEditor-BTVYRGkm.js → NotebookEditor-7Qm2rSWD.js} +11 -11
  110. package/src/ui/dist/assets/{NotebookEditor-BMXKrDRk.js → NotebookEditor-C1kWaxKi.js} +1 -1
  111. package/src/ui/dist/assets/{PdfLoader-CvcjJHXv.js → PdfLoader-BfOHw8Zw.js} +1 -1
  112. package/src/ui/dist/assets/{PdfMarkdownPlugin-DW2ej8Vk.js → PdfMarkdownPlugin-BulDREv1.js} +2 -2
  113. package/src/ui/dist/assets/{PdfViewerPlugin-CmlDxbhU.js → PdfViewerPlugin-C-daaOaL.js} +10 -10
  114. package/src/ui/dist/assets/{SearchPlugin-DAjQZPSv.js → SearchPlugin-CjpaiJ3A.js} +1 -1
  115. package/src/ui/dist/assets/{TextViewerPlugin-C-nVAZb_.js → TextViewerPlugin-BxIyqPQC.js} +5 -5
  116. package/src/ui/dist/assets/{VNCViewer-D7-dIYon.js → VNCViewer-HAg9mF7M.js} +10 -10
  117. package/src/ui/dist/assets/{bot-C_G4WtNI.js → bot-0DYntytV.js} +1 -1
  118. package/src/ui/dist/assets/{code-Cd7WfiWq.js → code-B20Slj_w.js} +1 -1
  119. package/src/ui/dist/assets/{file-content-B57zsL9y.js → file-content-DT24KFma.js} +1 -1
  120. package/src/ui/dist/assets/{file-diff-panel-DVoheLFq.js → file-diff-panel-DK13YPql.js} +1 -1
  121. package/src/ui/dist/assets/{file-socket-B5kXFxZP.js → file-socket-B4T2o4nR.js} +1 -1
  122. package/src/ui/dist/assets/{image-LLOjkMHF.js → image-DSeR_sDS.js} +1 -1
  123. package/src/ui/dist/assets/{index-hOUOWbW2.js → index-BrFje2Uk.js} +2 -2
  124. package/src/ui/dist/assets/{index-Dxa2eYMY.js → index-BwRJaoTl.js} +1 -1
  125. package/src/ui/dist/assets/{index-CLQauncb.js → index-D_E4281X.js} +5418 -28620
  126. package/src/ui/dist/assets/{index-C3r2iGrp.js → index-DnYB3xb1.js} +12 -12
  127. package/src/ui/dist/assets/{index-BQG-1s2o.css → index-G7AcWcMu.css} +43 -2
  128. package/src/ui/dist/assets/{monaco-BGGAEii3.js → monaco-LExaAN3Y.js} +1 -1
  129. package/src/ui/dist/assets/{pdf-effect-queue-DlEr1_y5.js → pdf-effect-queue-BJk5okWJ.js} +1 -1
  130. package/src/ui/dist/assets/{popover-CWJbJuYY.js → popover-D3Gg_FoV.js} +1 -1
  131. package/src/ui/dist/assets/{project-sync-CRJiucYO.js → project-sync-C_ygLlVU.js} +1 -1
  132. package/src/ui/dist/assets/{select-CoHB7pvH.js → select-CpAK6uWm.js} +2 -2
  133. package/src/ui/dist/assets/{sigma-D5aJWR8J.js → sigma-DEccaSgk.js} +1 -1
  134. package/src/ui/dist/assets/{square-check-big-DUK_mnkS.js → square-check-big-uUfyVsbD.js} +1 -1
  135. package/src/ui/dist/assets/{trash-ChU3SEE3.js → trash-CXvwwSe8.js} +1 -1
  136. package/src/ui/dist/assets/{useCliAccess-BrJBV3tY.js → useCliAccess-Bnop4mgR.js} +1 -1
  137. package/src/ui/dist/assets/{useFileDiffOverlay-C2OQaVWc.js → useFileDiffOverlay-B8eUAX0I.js} +1 -1
  138. package/src/ui/dist/assets/{wrap-text-C7Qqh-om.js → wrap-text-9vbOBpkW.js} +1 -1
  139. package/src/ui/dist/assets/{zoom-out-rtX0FKya.js → zoom-out-BgVMmOW4.js} +1 -1
  140. package/src/ui/dist/index.html +2 -2
  141. package/uv.lock +1 -1
  142. package/src/ui/dist/assets/CliPlugin-CB1YODQn.js +0 -5905
@@ -1,7 +1,7 @@
1
1
  # Baseline Checklist Template
2
2
 
3
3
  Use this as a living checklist.
4
- Update it during reading, setup, smoke testing, real execution, verification, and route changes.
4
+ Keep it short by default. For a fast path, complete the core checklist first and expand only if the route becomes complex or unstable.
5
5
 
6
6
  ## Identity
7
7
 
@@ -9,49 +9,38 @@ Update it during reading, setup, smoke testing, real execution, verification, an
9
9
  - route:
10
10
  - owner stage:
11
11
 
12
- ## Analysis
12
+ ## Core
13
+
14
+ - [ ] baseline object and route are explicit
15
+ - [ ] dataset / split and metric contract are explicit enough to judge comparability
16
+ - [ ] `PLAN.md` captures the command path, expected outputs, acceptance condition, and fallback
17
+ - [ ] smoke decision is explicit:
18
+ - skipped for a justified reason, or run once with outputs checked
19
+ - [ ] real validation/run decision is explicit:
20
+ - skipped for a justified reason, or launched/read with durable evidence
21
+ - [ ] expected result files and required metrics are checked
22
+ - [ ] baseline is accepted, blocked, or waived with a durable note
23
+
24
+ ## Closeout
25
+
26
+ - [ ] concise `1-2` sentence baseline summary written
27
+ - [ ] next stage named explicitly
28
+
29
+ ## Optional Expansion
30
+
31
+ Fill this only when the route becomes full-audit, repair-heavy, or publication-oriented.
13
32
 
14
33
  - [ ] paper source identified
15
34
  - [ ] repo source identified
16
35
  - [ ] paper read enough to restate the core method faithfully
17
36
  - [ ] repo read enough to identify the real entrypoints
18
- - [ ] dataset / split contract confirmed
19
- - [ ] metric contract confirmed
20
37
  - [ ] main files to inspect or modify listed
21
- - [ ] risks and fallbacks written into `PLAN.md`
22
-
23
- ## Setup
24
-
25
38
  - [ ] working directory confirmed
26
39
  - [ ] environment route chosen
27
40
  - [ ] key dependencies checked
28
41
  - [ ] model / data download path confirmed
29
42
  - [ ] fallback source recorded for critical downloads
30
-
31
- ## Smoke Test
32
-
33
- - [ ] smoke command written in `PLAN.md`
34
- - [ ] smoke command executed
35
- - [ ] smoke outputs verified
36
- - [ ] smoke failure handled or route revised
37
-
38
- ## Main Run
39
-
40
- - [ ] real command written in `PLAN.md`
41
- - [ ] real run launched with durable logging
42
43
  - [ ] monitoring cadence started
43
44
  - [ ] health signals confirmed
44
45
  - [ ] any execution deviation reflected back into `PLAN.md`
45
-
46
- ## Verification
47
-
48
- - [ ] expected result files exist
49
- - [ ] metric keys are complete
50
- - [ ] baseline is comparable to the intended contract
51
46
  - [ ] verification note written
52
- - [ ] baseline accepted or explicitly blocked / waived
53
-
54
- ## Closeout
55
-
56
- - [ ] concise `1-2` sentence baseline summary written
57
- - [ ] next stage named explicitly
@@ -1,9 +1,10 @@
1
1
  # Baseline Plan Template
2
2
 
3
3
  Use this when the `baseline` stage becomes concrete enough to act.
4
- Keep it short when the route is simple, but do not skip the sections that affect reproducibility, code touchpoints, or fallback handling.
4
+ Keep it short when the route is simple. For fast-path attach/import/prebound validation, a one-screen plan is enough if it preserves the route, command path, outputs, acceptance condition, and fallback.
5
+ Expand the optional sections only when the route is ambiguous, code-touching, broken, multi-variant, or intended for reuse beyond the current quest.
5
6
 
6
- ## 1. Objective
7
+ ## 1. Core Contract
7
8
 
8
9
  - quest goal:
9
10
  - user's core requirements:
@@ -12,80 +13,63 @@ Keep it short when the route is simple, but do not skip the sections that affect
12
13
  - attach / import / reproduce / repair
13
14
  - baseline id:
14
15
  - variant id:
15
-
16
- ## 2. Source Package
17
-
18
16
  - source paper:
19
17
  - source repo:
20
- - fallback repo or mirror:
21
18
  - source commit / version / tag:
22
19
  - task:
23
20
  - dataset / split:
24
21
  - metric contract:
22
+ - expected command path:
23
+ - expected outputs:
24
+ - acceptance condition:
25
+ - cheapest fallback:
25
26
 
26
- ## 3. Paper And Repo Reading Notes
27
-
28
- - paper summary in `1-3` bullets:
29
- - repo summary in `1-3` bullets:
30
- - what the baseline actually does:
31
- - what the likely bottlenecks or brittle points are:
32
- - what still needs verification:
33
-
34
- ## 4. Code Touchpoints
35
-
36
- List the main files or modules that matter before you change anything substantial.
37
-
38
- | Path | Role | Why it matters now | Expected action | Notes |
39
- |---|---|---|---|---|
40
- | | | | inspect / modify / leave alone | |
41
-
42
- ## 5. Environment And Asset Plan
27
+ ## 2. Execution Path
43
28
 
44
29
  - working directory:
45
30
  - environment plan:
46
31
  - required downloads:
47
- - checkpoints / models:
48
32
  - hardware assumptions:
49
- - likely external blockers:
50
-
51
- Fallbacks and contingency options:
52
-
53
- - if Hugging Face is slow, blocked, or rate-limited:
54
- - try ModelScope, official mirrors, quest-local caches, or manually staged files
55
- - if the official repo is unavailable:
56
- - use a verified mirror and record the exact provenance
57
- - if the full run is too expensive:
58
- - define the smoke-test path and the cheapest comparable reduced pilot
33
+ - smoke test needed:
34
+ - yes / no
35
+ - smoke command:
36
+ - main validation or run command:
37
+ - expected runtime / budget:
38
+ - durable log path:
39
+ - verification targets:
40
+ - fastest failure signal:
59
41
 
60
- ## 6. Execution Strategy
42
+ ## 3. Risks And Revision
61
43
 
62
- ### Smoke Test
44
+ - main risks:
45
+ - when to escalate from fast path to full audit:
46
+ - revision note:
63
47
 
64
- - command:
65
- - purpose:
66
- - expected outputs:
67
- - fastest failure signal:
48
+ ## 4. Optional Expansion
68
49
 
69
- ### Main Run
50
+ Fill this only when the route is no longer simple.
70
51
 
71
- - command:
72
- - expected outputs:
73
- - expected runtime / budget:
74
- - durable log path:
52
+ - fallback repo or mirror:
53
+ - checkpoints / models:
54
+ - likely external blockers:
75
55
  - safe efficiency levers to try first:
76
-
77
- ### Monitoring And Sleep Rules
78
-
79
- - first checks:
80
- - `60s`
81
- - `120s`
82
- - `300s`
83
- - `600s`
84
- - `1800s`
85
56
  - health signals that justify continued monitoring rather than intervention:
86
57
  - conditions that require plan revision or kill-and-relaunch:
58
+ - paper summary in `1-3` bullets:
59
+ - repo summary in `1-3` bullets:
60
+ - what the baseline actually does:
61
+ - what the likely bottlenecks or brittle points are:
62
+ - what still needs verification:
63
+
64
+ ## 5. Optional Code Touchpoints
65
+
66
+ List the main files or modules only when you expect real inspection or edits.
67
+
68
+ | Path | Role | Why it matters now | Expected action | Notes |
69
+ |---|---|---|---|---|
70
+ | | | | inspect / modify / leave alone | |
87
71
 
88
- ## 7. Verification Plan
72
+ ## 6. Optional Verification Plan
89
73
 
90
74
  - required result files:
91
75
  - required metric keys:
@@ -93,12 +77,12 @@ Fallbacks and contingency options:
93
77
  - acceptance condition:
94
78
  - downgrade / blocked condition:
95
79
 
96
- ## 8. Checklist Link
80
+ ## 7. Checklist Link
97
81
 
98
82
  - checklist path:
99
83
  - which item should move next:
100
84
 
101
- ## 9. Revision Log
85
+ ## 8. Revision Log
102
86
 
103
87
  | Time | What changed | Why it changed | Impact on execution |
104
88
  |---|---|---|---|
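
The revised plan template says a one-screen plan is enough as long as it preserves the command path, outputs, acceptance condition, and fallback. A minimal sketch of checking that, assuming the plan keeps the template's `- label: value` bullets; the function name `unfilled_core_fields` and the `CORE_FIELDS` tuple are hypothetical and not part of the package:

```python
import re

# Labels taken from the "+" lines of the Core Contract section above.
CORE_FIELDS = (
    "expected command path",
    "expected outputs",
    "acceptance condition",
    "cheapest fallback",
)


def unfilled_core_fields(plan_text: str) -> list[str]:
    """Return the core labels whose bullet is missing or has no value.

    Assumption: each field is written as a '- label: value' bullet.
    A label that appears more than once counts as filled if any
    occurrence has a value.
    """
    filled: dict[str, str] = {}
    for line in plan_text.splitlines():
        match = re.match(r"^\s*-\s*([^:]+):\s*(.*)$", line)
        if match:
            label = match.group(1).strip().lower()
            value = match.group(2).strip()
            filled[label] = filled.get(label) or value
    return [field for field in CORE_FIELDS if not filled.get(field)]
```

An empty return value would suggest the fast-path plan is complete enough to act on; anything else names the bullets still to fill.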
@@ -84,12 +84,14 @@ Choose the smallest action that genuinely resolves the current state.
84
84
 
85
85
  In the current runtime, prefer these concrete flow actions:
86
86
 
87
+ - record a candidate brief before branch promotion -> `artifact.submit_idea(mode='create', submission_mode='candidate', ...)`
87
88
  - accepted idea -> `artifact.submit_idea(mode='create', lineage_intent='continue_line'|'branch_alternative', ...)`
89
+ - promote a candidate brief into a durable optimization line -> `artifact.submit_idea(mode='create', submission_mode='line', source_candidate_id=..., lineage_intent='continue_line'|'branch_alternative', ...)`
88
90
  - maintenance-only in-place cleanup of the same branch -> `artifact.submit_idea(mode='revise', ...)`
89
91
  - compare branch foundations before a new round -> `artifact.list_research_branches(...)`
90
92
  - return to an older durable branch without creating a new node -> `artifact.activate_branch(...)`
91
93
  - materialize the concrete main-result node when a real main experiment line is about to be or was just durably recorded -> dedicated child `run/*` branch/worktree
92
- - start the next optimization round from a measured result -> `artifact.record(kind='decision', action='iterate', ...)`
94
+ - start the next optimization round from a measured result -> `artifact.record(payload={'kind': 'decision', 'action': 'iterate', ...})`
93
95
  - launch analysis campaign -> `artifact.create_analysis_campaign(...)`
94
96
  - finish one analysis slice -> `artifact.record_analysis_slice(...)`
95
97
  - select a paper outline -> `artifact.submit_paper_outline(mode='select', ...)`
@@ -104,6 +106,7 @@ If the chosen action is baseline reuse, the decision is not complete until one o
104
106
  Treat `prepare_branch` as a compatibility or recovery action, not the normal path.
105
107
  Treat `activate_branch` as the correct recovery or revisit action when the quest should resume on an existing older durable branch while preserving the newer research head.
106
108
  Treat each accepted branch as one durable research round.
109
+ Treat candidate briefs as branchless pre-promotion objects; they are not yet durable optimization lines.
107
110
  If a branch already has a durable main-experiment result, a genuinely new optimization round should normally create a child branch from a chosen foundation rather than keep revising that old branch in place.
108
111
  Treat each durable main experiment as its own child `run/*` branch/node, not as another mutable state on the idea branch.
109
112
  When paper mode is enabled and the necessary analysis for a strong run is done, the next default route is `write` on a dedicated `paper/*` branch/worktree derived from that run branch.
@@ -121,6 +124,12 @@ Make decisions from durable evidence:
121
124
 
122
125
  Do not make major decisions from vibe or momentum.
123
126
 
127
+ When the quest is algorithm-first, add one extra truth-source rule before non-trivial route choices:
128
+
129
+ - read `artifact.get_optimization_frontier(...)`
130
+ - treat the frontier as the primary optimize-state summary
131
+ - only override it when newer durable evidence clearly dominates
132
+
124
133
  ## Workflow
125
134
 
126
135
  ### 1. State the question
@@ -249,6 +258,14 @@ When recording the decision, make explicit:
249
258
  - which existing evidence was decisive
250
259
  - what residual risk remains after the choice
251
260
 
261
+ For algorithm-first route choices, prefer this default mapping:
262
+
263
+ - frontier says `explore` -> widen or refine candidate briefs before new branch creation
264
+ - frontier says `exploit` -> keep the strongest line active and advance the best implementation candidates
265
+ - frontier says `fusion` -> open at most one bounded fusion candidate
266
+ - a fixable candidate failure dominates -> run a debug route instead of widening search blindly
267
+ - frontier says `stop` -> record the stop decision and explicit reopen condition
268
+
252
269
  Good route-selection criteria often include:
253
270
 
254
271
  - feasibility
@@ -326,7 +343,7 @@ When asking, use a structured decision request with:
326
343
 
327
344
  ### 6. Record the decision durably
328
345
 
329
- Use `artifact.record(kind='decision', ...)` for the final decision.
346
+ Use `artifact.record(payload={'kind': 'decision', ...})` for the final decision.
330
347
 
331
348
  If user input is needed, also use `artifact.interact(kind='decision_request', ...)`.
332
349
  If the timeout expires without a user reply, choose the best option yourself, record why, and notify the user of the chosen option before moving on.
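
The algorithm-first default mapping added in the `@@ -249,6 +258,14 @@` hunk above can be read as a small lookup table. The sketch below is illustrative only: `FRONTIER_ROUTES`, `DEBUG_ROUTE`, and `default_route` are hypothetical names, and giving the fixable-failure case precedence over the frontier mode is an assumption drawn from the word "dominates", not something the diff states outright.

```python
# Hypothetical sketch; none of these names come from the package.
FRONTIER_ROUTES = {
    "explore": "widen or refine candidate briefs before new branch creation",
    "exploit": "keep the strongest line active and advance the best implementation candidates",
    "fusion": "open at most one bounded fusion candidate",
    "stop": "record the stop decision and explicit reopen condition",
}

DEBUG_ROUTE = "run a debug route instead of widening search blindly"


def default_route(frontier_mode: str, fixable_failure_dominates: bool = False) -> str:
    """Return the default next move for an algorithm-first route choice."""
    # Assumption: a dominating fixable failure is checked before the frontier mode.
    if fixable_failure_dominates:
        return DEBUG_ROUTE
    if frontier_mode not in FRONTIER_ROUTES:
        raise ValueError(f"unknown frontier mode: {frontier_mode!r}")
    return FRONTIER_ROUTES[frontier_mode]
```

Raising on an unknown mode rather than guessing keeps the sketch in line with the skill's rule against deciding "from vibe or momentum".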
@@ -39,7 +39,9 @@ Use this skill for the main evidence-producing runs of the quest.
39
39
  - If the runtime starts an auto-continue turn with no new user message, continue from the current run state, logs, artifacts, and active requirements instead of replaying the previous user turn.
40
40
  - Progress message templates are references only. Adapt to the actual context and vary wording so messages feel human, respectful, and non-robotic.
41
41
  - If a threaded user reply arrives, interpret it relative to the latest experiment progress update before assuming the task changed completely.
42
+ - Hard execution rule: every terminal command in this stage must go through `bash_exec`; do not use any other terminal path for smoke tests, real runs, Git, Python, package-manager, or file-inspection commands.
42
43
  - Prefer `bash_exec` for experiment commands so each run gets a durable session id, quest-local log folder, and later `read/list/kill` control.
44
+ - For meaningful long-running runs, include the estimated next reply time or next check-in window whenever it is defensible.
43
45
 
44
46
  ## Stage purpose
45
47
 
@@ -64,6 +66,9 @@ Use `references/evidence-ladder.md` when deciding whether the current package is
64
66
  Completing one main run is not quest completion.
65
67
  After reporting the run, keep moving to iterate, analyze, write, or finalize unless a genuine blocking decision remains.
66
68
 
69
+ When the quest is algorithm-first, treat `experiment` as the execution surface of `optimize`, not as the terminal goal of the workflow.
70
+ After a measured result, the default next move is frontier review and optimize-side route selection rather than paper packaging.
71
+
67
72
  ## Quick workflow
68
73
 
69
74
  Treat this as the short run-order summary. The detailed run contract, execution rules, and recording rules remain in `Workflow`.
@@ -90,6 +95,7 @@ Treat this as the short run-order summary. The detailed run contract, execution
90
95
  - After each `artifact.record_main_experiment(...)`, route from the measured result:
91
96
  - if paper mode is enabled, decide whether to strengthen evidence, analyze, or write
92
97
  - if paper mode is disabled, prefer iterate / revise-idea / branch over default writing
98
+ - In algorithm-first work, after each main run, return to `optimize` or `decision` for frontier review before launching another large run.
93
99
 
94
100
  ## Experiment mental guardrails
95
101
 
@@ -429,8 +435,8 @@ For commands that may run longer than a few minutes:
429
435
  - if you only need wall-clock waiting between checks, use `bash_exec(command='sleep N', mode='await', timeout_seconds=N+buffer, ...)`
430
436
  - keep a real buffer on that sleep timeout; do not set `timeout_seconds` exactly equal to `N`
431
437
  - if you are waiting on an already running managed session, prefer `bash_exec(mode='await', id=..., timeout_seconds=...)` instead of starting a new sleep command
432
- - after every completed sleep / await cycle, inspect logs and send `artifact.interact(kind='progress', ...)` with the latest real status, latest evidence, the next checkpoint, and the estimated next reply time
433
- - after the first meaningful signal and then at real checkpoints (e.g., completion, or roughly every ~30 minutes if still running), keep those progress updates going rather than waiting silently
438
+ - after every completed sleep / await cycle, inspect logs first; only send `artifact.interact(kind='progress', ...)` when the user-visible state, frontier, blocker status, or ETA materially changed
439
+ - after the first meaningful signal and then at real checkpoints (e.g., completion, recovery, blocker, or a materially widened comparable surface), keep those progress updates going rather than waiting silently
434
440
  - if the run is clearly invalid, wedged, or superseded, stop it with `bash_exec(mode='kill', id=..., wait=true, timeout_seconds=...)`; if it must die immediately, add `force=true`, record the reason, fix the issue, and relaunch cleanly
435
441
  - do not report completion until logs and output files both confirm completion
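
The sleep rule in this hunk (`timeout_seconds=N+buffer`, never exactly `N`) can be sketched as a small argument builder. The keyword names `command`, `mode`, and `timeout_seconds` come from the text above; the helper name `sleep_await_args` and the 30-second default buffer are assumptions made for illustration.

```python
def sleep_await_args(seconds: int, buffer_seconds: int = 30) -> dict:
    """Build keyword arguments for a wall-clock wait between checks.

    The timeout is always strictly larger than the sleep itself, so the
    await does not expire at the same moment the sleep finishes.
    """
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    if buffer_seconds <= 0:
        raise ValueError("buffer_seconds must be positive")
    return {
        "command": f"sleep {seconds}",
        "mode": "await",
        "timeout_seconds": seconds + buffer_seconds,
    }
```

Under these assumptions, a five-minute wait would be passed along as `bash_exec(**sleep_await_args(300))`.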
436
442
 
@@ -11,10 +11,13 @@ Use this skill to close or pause a quest responsibly.
11
11
 
12
12
  - Follow the shared interaction contract injected by the system prompt.
13
13
  - For ordinary active work, prefer a concise progress update once work has crossed roughly 6 tool calls with a human-meaningful delta, and do not drift beyond roughly 12 tool calls or about 8 minutes without a user-visible update.
14
+ - Do not emit another finalize progress update when the user-visible state is unchanged.
14
15
  - If the runtime starts an auto-continue turn with no new user message, keep finalizing from the durable quest state and active requirements instead of replaying the previous user turn.
15
16
  - If a threaded user reply arrives, interpret it relative to the latest finalize progress update before assuming the task changed completely.
16
17
  - When finalize reaches a real closure state, pause-ready packet, or route-back decision, send one threaded `artifact.interact(kind='milestone', ...)` update that names the recommendation, why it is the right call, and any reopen condition that still matters.
17
18
  - True quest completion still requires explicit user approval through the runtime completion flow before calling `artifact.complete_quest(...)`.
19
+ - Rechecking that the same bundle files still exist, or re-aligning status surfaces without changing the closure judgment, does not by itself count as a fresh milestone.
20
+ - Hard execution rule: if this stage needs terminal work such as Git inspection, packaging checks, document builds, or file inspection, every such command must go through `bash_exec`.
18
21
 
19
22
  ## Stage purpose
20
23
 
@@ -54,8 +57,19 @@ Before finalizing, gather:
54
57
  - latest quest documents
55
58
  - latest review / proofing / submission state when a paper bundle exists
56
59
  - the paper bundle manifest and its referenced paths when the quest has a paper-like deliverable
60
+ - the paper evidence ledger and selected-outline section statuses when the quest has a paper-like deliverable
57
61
 
58
62
  If finalization reveals that the quest is still too uncertain, route back through `decision` rather than forcing closure.
63
+ For paper-like deliverables, do not finalize while any of these remain true:
64
+
65
+ - required main-text outline items are still unresolved
66
+ - completed analysis remains unmapped into the paper contract
67
+ - the active paper line still reports open supplementary work that is expected to block the manuscript
68
+
69
+ If the current paper-state blocker is not obvious from the existing files, call `artifact.get_paper_contract_health(detail='full')` before deciding whether finalize is legitimate.
70
+ If the active quest/runtime state is unclear after restart or long pause, call `artifact.get_quest_state(detail='summary')` first.
71
+ If the exact latest `SUMMARY.md`, `status.md`, or active user requirement wording matters for closure, call `artifact.read_quest_documents(...)`.
72
+ If earlier user/assistant continuity matters for whether the quest should really stop, call `artifact.get_conversation_context(...)` instead of guessing from prompt context alone.
59
73
 
60
74
  ## Truth sources
61
75
 
@@ -90,6 +104,7 @@ The finalize stage should usually leave behind:
90
104
  If the quest produced a paper-style bundle, finalization should also check that the writing stage left behind enough closure evidence, such as:
91
105
 
92
106
  - selected outline and outline selection records
107
+ - evidence ledger records and section-level result tables
93
108
  - review output
94
109
  - proofing output
95
110
  - submission or packaging checklist
@@ -113,12 +128,14 @@ Say clearly what exists and why it matters. Name concrete paths or artifact ids
113
128
  When a paper bundle exists, verify the manifest inventory explicitly, including:
114
129
 
115
130
  - `paper/paper_bundle_manifest.json`
131
+ - `paper/evidence_ledger.json`
116
132
  - the recorded `paper_branch` and source evidence branch / run fields in that manifest
117
133
  - referenced `outline_path`
118
134
  - referenced `draft_path`
119
135
  - referenced `writing_plan_path`
120
136
  - referenced `references_path`
121
137
  - referenced `claim_evidence_map_path`
138
+ - referenced `evidence_ledger_path`
122
139
  - referenced `baseline_inventory_path`
123
140
  - referenced `compile_report_path`
124
141
  - referenced `pdf_path`
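
The manifest inventory check listed in this hunk could look roughly like the sketch below. It assumes the manifest is a flat JSON object whose `*_path` values are paths relative to the quest root; the diff does not confirm either point, and `REFERENCED_KEYS` and `missing_manifest_paths` are hypothetical names. Only the `*_path` keys visible in the hunk are covered, not the `paper_branch` or run fields.

```python
import json
from pathlib import Path

# Keys copied from the "referenced ..." bullets in the hunk above.
REFERENCED_KEYS = (
    "outline_path",
    "draft_path",
    "writing_plan_path",
    "references_path",
    "claim_evidence_map_path",
    "evidence_ledger_path",
    "baseline_inventory_path",
    "compile_report_path",
    "pdf_path",
)


def missing_manifest_paths(quest_root: Path) -> list[str]:
    """List referenced paths that are unrecorded or absent on disk."""
    manifest_file = quest_root / "paper" / "paper_bundle_manifest.json"
    manifest = json.loads(manifest_file.read_text(encoding="utf-8"))
    problems = []
    for key in REFERENCED_KEYS:
        value = manifest.get(key)
        if not value:
            problems.append(f"{key}: not recorded")
        elif not (quest_root / value).exists():
            # Assumption: values are relative to the quest root.
            problems.append(f"{key}: {value} does not exist")
    return problems
```

A non-empty result would be a reason to route back rather than finalize, in the spirit of the "do not finalize while any of these remain true" list earlier in the same file.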
@@ -243,6 +260,7 @@ Weak finalization:
243
260
  - leaves no clear recommendation
244
261
  - claims “done” without showing what is actually done
245
262
  - drops the package or file inventory needed for resumption
263
+ - ignores unmapped completed analysis that never entered the paper contract
246
264
 
247
265
  ## Memory rules
248
266
 
@@ -7,6 +7,10 @@ description: Use when a quest needs concrete hypotheses, limitation analysis, ca
7
7
 
8
8
  Use this skill to turn the current baseline and problem frame into concrete, literature-grounded, testable directions.
9
9
 
10
+ When `startup_contract.need_research_paper = false` and the quest already has a concrete optimization handle, `idea` may stop after selecting or seeding a direction and then hand off into `optimize` instead of insisting on the full paper-oriented ideation loop.
11
+ In that algorithm-first case, `idea` should usually produce a small method-brief frontier and then defer candidate ranking, promotion, and bounded search to `optimize`.
12
+ When doing that handoff, prefer the brief-shaping discipline later used by `optimize`: clarify the bottleneck and constraints, keep only a small differentiated `2-3` option slate, and hand off a recommended brief rather than a pile of loose intuitions.
13
+
10
14
  ## Interaction discipline
11
15
 
12
16
  - Follow the shared interaction contract injected by the system prompt.
@@ -39,6 +43,15 @@ The output must survive three checks at once:
39
43
  - feasibility in the current repo and resource budget
40
44
  - manuscript defensibility if the line later becomes a paper claim
41
45
 
46
+ When the route already looks likely to become a paper-facing line, seed one lightweight structured outline candidate during idea work.
47
+ Use `artifact.submit_paper_outline(mode='candidate', ...)` for that seed instead of leaving the future paper structure only in prose.
48
+ Use `references/outline-seeding-example.md` for the minimum acceptable shape.
49
+ The idea-stage outline candidate is not the full paper line yet, but it should already name the likely `research_questions`, `experimental_designs`, and the first section-level evidence needs that later supplementary slices must satisfy.
50
+ Keep that seed minimal and executable: a small section skeleton plus expected evidence items is better than a long narrative outline with no concrete evidence hooks.
51
+ If the current research head, strongest measured branch, or active runtime refs are unclear after resume, call `artifact.get_quest_state(detail='summary')` and `artifact.list_research_branches(...)` before choosing a foundation.
52
+ If the current brief / plan / status wording matters for direction choice, call `artifact.read_quest_documents(...)`.
53
+ If earlier user conversation materially changes the direction-selection target, call `artifact.get_conversation_context(...)` before locking the next idea.
54
+
42
55
  Finishing one idea deliverable is not quest completion.
43
56
  After reporting a completed idea package, continue into the next justified stage unless a real blocking decision is still unresolved.
44
57
 
@@ -106,6 +119,11 @@ Break ties primarily through careful reasoning over:
106
119
  - Do not write, promote, or submit a final idea until the durable survey covers at least `5` and usually `5-10` task-modeling-related, mechanism-relevant, or otherwise directly usable papers.
107
120
  - Treat that literature floor as a hard gate, not a suggestion.
108
121
  If the direct task-modeling neighborhood truly contains fewer than `5` usable papers, record that evidence explicitly and fill the remaining slots with the closest adjacent papers whose mechanism can be translated into the current task and codebase.
122
+ - Algorithm-first exception:
123
+ - when `startup_contract.need_research_paper = false` and a concrete optimization handle already exists, you may stop after a memory sweep plus a small targeted paper check instead of satisfying the full `5-10` paper floor
124
+ - use that exception only when the immediate goal is method-brief selection for `optimize`, not paper-level novelty claims
125
+ - if you use the exception, say explicitly that the output is an optimization brief frontier rather than a paper-ready idea package
126
+ - still shape that frontier deliberately: clarify the bottleneck and comparability boundary first, keep a differentiated `2-3` candidate slate, and explain why one brief is recommended now
109
127
  - Every fresh idea build or idea-refinement pass must begin with:
110
128
  - a memory sweep, and
111
129
  - an external literature sweep.
@@ -133,12 +151,19 @@ Break ties primarily through careful reasoning over:
133
151
  - Unless strong durable evidence already narrows the route to one obvious serious option, run one bounded divergent pass that produces a small but meaningfully varied slate, usually `6-12` raw ideas before collapsing to a serious frontier that is usually `2-3` and at most `5`.
134
152
  - If all surviving candidates belong to the same mechanism family, widen once with at least two new ideation lenses before converging.
135
153
  - Keep structurally coherent rejected ideas in a parking-lot or rejected-candidate section so they can be recombined later if needed.
154
+ - In algorithm-first work, `idea` should usually produce direction families, not a large within-family variant swarm.
155
+ - Treat within-family micro-variants as `optimize` brief work unless the mechanism family itself is still unresolved.
136
156
  - Every serious candidate must answer `why now?` or `what changed?`, not just `what is the mechanism?`
137
157
  - Every selected idea must survive a two-sentence pitch and strongest-objection check before promotion.
138
158
  - Do not promote a direction unless you can explain:
139
159
  - what limitation it targets
140
160
  - why prior methods do not already solve it
141
161
  - what evidence would later be needed to defend the claim
162
+ - When the likely next route is a paper-facing main experiment plus analysis package, do not stop at prose-only idea notes; seed the likely `research_questions`, `experimental_designs`, and per-section evidence needs in the outline candidate.
163
+ - If the likely route already has a clear paper-facing structure, seed the future paper line early:
164
+ - identify the likely main-text sections
165
+ - identify which sections will need supplementary evidence rather than only the main run
166
+ - identify the concrete evidence items that must later be maintained in the paper line's outline folder or compiled outline contract
142
167
  - If the idea is not novel but still worth doing, state that honestly as:
143
168
  - replication value
144
169
  - transfer-to-new-setting value
@@ -182,6 +207,51 @@ In practice:
182
207
 
183
208
  Do not skip the `scout` pass just because the quest is already in the `idea` stage.
184
209
 
210
+ ## Direction-shaping protocol
211
+
212
+ Use `references/idea-thinking-flow.md` when the main need is better reasoning hygiene.
213
+ Use `references/idea-generation-playbook.md` when the main need is to create a new idea slate and select one clear next research object.
214
+
215
+ Default creation flow for a fresh idea pass:
216
+
217
+ 1. frame one concrete limitation
218
+ 2. separate symptom / mechanism hypothesis / consequence
219
+ 3. keep one main hypothesis plus `2-3` competing hypotheses
220
+ 4. name the primary lever bucket
221
+ 5. generate a bounded candidate slate from that framing
222
+ 6. record selected / deferred / rejected outcomes explicitly
223
+
224
+ Set the frontier width with a validation-cost estimate before widening:
225
+
226
+ - `fast-check`: the first objective validation loop is likely to finish in under about `20` minutes
227
+ - `slow-check`: the first objective validation loop is likely to take longer than about `20` minutes, or is otherwise expensive in compute, queue time, or human delay
228
+
229
+ For `fast-check` idea work:
230
+
231
+ - allow a slightly wider serious slate when the candidates are meaningfully different
232
+ - prefer candidates with cheap, orthogonal falsification paths
233
+ - keep more alternatives alive into `optimize` because validation is cheaper than overthinking
234
+
235
+ For `slow-check` idea work:
236
+
237
+ - keep the serious slate tighter, usually `1-3`
238
+ - demand a clearer bottleneck story and stronger evidence before adding another family
239
+ - prefer the route with the best expected evidence-per-run, not the route with the most speculative upside
240
+ - do not hand off a broad speculative slate just because it sounds interesting
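The frontier-width rule above can be sketched as a small helper. This is only an illustration, not part of the package: `SlateBudget`, `slate_budget`, and the `otherwise_expensive` flag are hypothetical names, and the fast-check range of `2-5` is an assumption that reads "slightly wider" as the general frontier cap of `5`.

```python
from dataclasses import dataclass

# Threshold taken from the text ("about 20 minutes").
FAST_CHECK_LIMIT_MINUTES = 20


@dataclass(frozen=True)
class SlateBudget:
    mode: str         # "fast-check" or "slow-check"
    min_serious: int  # smallest serious slate worth carrying forward
    max_serious: int  # largest serious slate worth carrying forward


def slate_budget(first_loop_minutes: float, otherwise_expensive: bool = False) -> SlateBudget:
    """Classify the first validation loop and return a serious-slate width.

    `otherwise_expensive` stands in for compute, queue-time, or human-delay cost.
    """
    if first_loop_minutes < FAST_CHECK_LIMIT_MINUTES and not otherwise_expensive:
        # Assumption: "slightly wider" is interpreted as 2-5 serious candidates.
        return SlateBudget("fast-check", 2, 5)
    # The text states that slow-check slates are usually 1-3.
    return SlateBudget("slow-check", 1, 3)
```

A cheap loop that still waits on a long queue would be passed as `slate_budget(5, otherwise_expensive=True)` and lands in the tighter `slow-check` budget.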
241
+
242
+ Do not start by shopping for modules to add.
243
+ Do not let one attractive mechanism become the de facto framing before the limitation is pinned down.
244
+ Do not let direction-family ideation collapse into within-family variant generation too early.
245
+
246
+ In normal idea work, stop at the direction-family level:
247
+
248
+ - select which mechanism families deserve serious consideration
249
+ - identify the strongest one to carry forward
250
+ - hand off within-family brief shaping to `optimize` when the quest is algorithm-first
251
+
252
+ If the task still requires choosing among mechanism families, stay in `idea`.
253
+ If the family is already chosen and the next need is branchless method-brief shaping, hand off to `optimize`.
254
+
185
255
  ## Truth sources
186
256
 
187
257
  Use:
@@ -1118,6 +1188,14 @@ When writing paper memory cards, include enough metadata to avoid redundant sear
1118
1188
 
1119
1189
  At the end of ideation, at least one part of the literature survey must be preserved in memory so a later idea pass can retrieve it directly instead of rebuilding the search from scratch.
1120
1190
 
1191
+ Every serious idea pass should also leave a durable outcome split:
1192
+
1193
+ - one selected idea or selected direction family
1194
+ - any deferred but still plausible alternatives
1195
+ - any rejected alternatives with a one-line rejection reason
1196
+
1197
+ Do not leave the rejected and deferred reasoning only in chat.
1198
+
1121
1199
  Promote to global memory only when the lesson is reusable outside this quest.
1122
1200
 
1123
1201
  ## Artifact rules
@@ -0,0 +1,100 @@
1
+ # Idea Generation Playbook
2
+
3
+ Use this reference when the `idea` stage needs a concrete creation flow for producing a new idea slate.
4
+
5
+ The goal is not a bag of clever mechanisms.
6
+ The goal is one clear next research object plus a durable record of what was deferred or rejected.
7
+
8
+ ## 1. Start from one limitation card
9
+
10
+ Write one limitation card before generating ideas:
11
+
12
+ - observed symptom
13
+ - condition where it appears
14
+ - why it matters for the target metric or claim
15
+ - strongest evidence that this is a real pattern
16
+
17
+ If the limitation card is weak, do not widen into ideation yet.
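One way to make the limitation card checkable is a small record type. The sketch below is hypothetical: `LimitationCard` and its field names are invented for illustration, and "weak" is approximated as "any field left blank", which is a much cruder test than the judgment the playbook asks for.

```python
from dataclasses import dataclass, fields


@dataclass
class LimitationCard:
    observed_symptom: str
    condition: str           # condition where the symptom appears
    why_it_matters: str      # link to the target metric or claim
    strongest_evidence: str  # evidence that this is a real pattern

    def missing_fields(self) -> list[str]:
        """Return the names of fields that are empty or whitespace-only."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]

    def is_weak(self) -> bool:
        # Assumption: a card with any blank field is treated as weak.
        return bool(self.missing_fields())


# Illustrative values only; not taken from any real quest.
card = LimitationCard(
    observed_symptom="accuracy drops on long inputs",
    condition="inputs much longer than the training distribution",
    why_it_matters="caps the headline metric",
    strongest_evidence="",
)
```

Here `card.is_weak()` is true because the evidence field is blank, so ideation should not widen yet.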
18
+
19
+ ## 2. Split the problem into three layers
20
+
21
+ For the active limitation, separate:
22
+
23
+ - symptom
24
+ - mechanism hypothesis
25
+ - consequence
26
+
27
+ Do not skip this split.
28
+ It prevents solution-shopping from becoming the hidden driver.
29
+
30
+ ## 3. Keep competing hypotheses alive
31
+
32
+ Before selecting mechanisms, record:
33
+
34
+ - one main hypothesis
35
+ - `2-3` competing hypotheses
36
+
37
+ If there is only one hypothesis, the idea pass has usually collapsed too early.
38
+
39
+ ## 4. Name the lever bucket
40
+
41
+ Choose the primary lever bucket explicitly:
42
+
43
+ - data
44
+ - model
45
+ - objective
46
+ - optimization or training dynamics
47
+ - inference
48
+ - evaluation protocol
49
+ - infrastructure
50
+
51
+ If the lever bucket is unclear, the idea is usually still too fuzzy.
52
+
53
+ ## 5. Generate direction families, not micro-variants
54
+
55
+ Create a bounded slate of direction families.
56
+ Default target:
57
+
58
+ - `6-12` raw ideas in one bounded divergent pass
59
+ - collapse to a serious frontier that is usually `2-3` candidates and at most `5`
60
+
61
+ Prefer family-level differences over small parameter or implementation variations.
62
+ If several candidates are really the same family with minor tweaks, merge them.
63
+
64
+ For each serious candidate, record:
65
+
66
+ - limitation targeted
67
+ - mechanism family
68
+ - why now
69
+ - strongest prior-work overlap
70
+ - expected evidence burden
71
+ - likely falsification path
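The per-candidate record and the "merge same-family candidates" rule can be sketched as follows. `Candidate`, `merge_by_family`, and `frontier_ok` are hypothetical names, and treating two candidates as the same family when their family labels match case-insensitively is an assumption; real family equivalence needs judgment.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    limitation_targeted: str
    mechanism_family: str
    why_now: str
    strongest_prior_overlap: str
    expected_evidence_burden: str
    likely_falsification_path: str


def merge_by_family(candidates: list[Candidate]) -> list[Candidate]:
    """Keep the first candidate seen for each mechanism family, preserving order."""
    seen: dict[str, Candidate] = {}
    for cand in candidates:
        # Assumption: family identity is approximated by the normalized label.
        key = cand.mechanism_family.strip().lower()
        seen.setdefault(key, cand)
    return list(seen.values())


def frontier_ok(serious: list[Candidate], cap: int = 5) -> bool:
    """Check that the merged serious frontier is non-empty and within the cap of 5."""
    return 1 <= len(merge_by_family(serious)) <= cap
```

Keeping the first candidate per family is a deliberate simplification; a real pass would pick the strongest representative rather than the earliest one.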
72
+
73
+ ## 6. Select one and ledger the rest
74
+
75
+ At the end of the pass, produce three durable buckets:
76
+
77
+ - selected
78
+ - deferred
79
+ - rejected
80
+
81
+ For deferred ideas, write why they remain plausible but are not first.
82
+ For rejected ideas, write one-line rejection reasons such as:
83
+
84
+ - redundant with closer prior work
85
+ - too confounded to test cleanly
86
+ - weak value if positive
87
+ - too broad for current compute or codebase
88
+ - lower priority than the selected route
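The three buckets can be kept as a small serializable ledger so the reasoning does not live only in chat. This is a sketch under assumptions: `IdeaLedger`, its methods, and the JSON layout are hypothetical and do not describe the package's own storage format.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class IdeaLedger:
    selected: str
    # idea name -> why it remains plausible but is not first
    deferred: dict[str, str] = field(default_factory=dict)
    # idea name -> one-line rejection reason
    rejected: dict[str, str] = field(default_factory=dict)

    def problems(self) -> list[str]:
        """List entries that break the selected / deferred / rejected discipline."""
        issues = []
        if not self.selected.strip():
            issues.append("no selected idea")
        for idea, why in self.deferred.items():
            if not why.strip():
                issues.append(f"deferred idea '{idea}' has no rationale")
        for idea, reason in self.rejected.items():
            # A rejection reason must be present and fit on a single line.
            if not reason.strip() or "\n" in reason.strip():
                issues.append(f"rejected idea '{idea}' needs a one-line reason")
        return issues

    def to_json(self) -> str:
        """Serialize the ledger so it can be written to a durable file."""
        return json.dumps(asdict(self), indent=2)
```

Calling `problems()` before saving gives a quick check that every deferred and rejected idea carries its rationale.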
89
+
90
+ ## 7. Exit criterion
91
+
92
+ The pass is complete only when the output contains:
93
+
94
+ - one selected idea or selected direction family
95
+ - one falsifiable claim
96
+ - one minimal experiment concept
97
+ - one abandonment condition
98
+ - deferred and rejected rationale recorded durably
99
+
100
+ If the result is still a bag of possibilities, stay in `idea`.
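The exit criterion reads naturally as a checklist. The sketch below assumes the pass output is a plain dictionary; the key names and the `"advance"` return value are hypothetical placeholders, since the playbook does not define a schema or name the next stage here.

```python
# Hypothetical keys mirroring the five exit requirements in the text.
EXIT_REQUIREMENTS = (
    "selected",
    "falsifiable_claim",
    "minimal_experiment",
    "abandonment_condition",
    "deferred_rejected_rationale_recorded",
)


def unmet_exit_requirements(output: dict) -> list[str]:
    """Return the requirements that are missing or empty in the pass output."""
    return [key for key in EXIT_REQUIREMENTS if not output.get(key)]


def next_stage(output: dict) -> str:
    """Stay in `idea` until every exit requirement is met."""
    # Assumption: "advance" is a placeholder for whichever stage follows.
    return "idea" if unmet_exit_requirements(output) else "advance"
```

An output that lacks an abandonment condition, for example, keeps the quest in `idea` rather than moving on with a bag of possibilities.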