@researai/deepscientist 1.5.9 → 1.5.12

Files changed (165)
  1. package/README.md +112 -99
  2. package/assets/branding/connector-qq.png +0 -0
  3. package/assets/branding/connector-rokid.png +0 -0
  4. package/assets/branding/connector-weixin.png +0 -0
  5. package/assets/branding/projects.png +0 -0
  6. package/bin/ds.js +519 -63
  7. package/docs/assets/branding/projects.png +0 -0
  8. package/docs/en/00_QUICK_START.md +338 -68
  9. package/docs/en/01_SETTINGS_REFERENCE.md +14 -0
  10. package/docs/en/02_START_RESEARCH_GUIDE.md +180 -4
  11. package/docs/en/04_LINGZHU_CONNECTOR_GUIDE.md +62 -179
  12. package/docs/en/09_DOCTOR.md +66 -5
  13. package/docs/en/10_WEIXIN_CONNECTOR_GUIDE.md +137 -0
  14. package/docs/en/11_LICENSE_AND_RISK.md +256 -0
  15. package/docs/en/12_GUIDED_WORKFLOW_TOUR.md +446 -0
  16. package/docs/en/13_CORE_ARCHITECTURE_GUIDE.md +297 -0
  17. package/docs/en/14_PROMPT_SKILLS_AND_MCP_GUIDE.md +506 -0
  18. package/docs/en/15_CODEX_PROVIDER_SETUP.md +284 -0
  19. package/docs/en/99_ACKNOWLEDGEMENTS.md +4 -1
  20. package/docs/en/README.md +83 -0
  21. package/docs/images/lingzhu/rokid-agent-platform-create.png +0 -0
  22. package/docs/images/weixin/weixin-plugin-entry.png +0 -0
  23. package/docs/images/weixin/weixin-plugin-entry.svg +33 -0
  24. package/docs/images/weixin/weixin-qr-confirm.svg +30 -0
  25. package/docs/images/weixin/weixin-quest-media-flow.svg +44 -0
  26. package/docs/images/weixin/weixin-settings-bind.svg +57 -0
  27. package/docs/zh/00_QUICK_START.md +345 -72
  28. package/docs/zh/01_SETTINGS_REFERENCE.md +14 -0
  29. package/docs/zh/02_START_RESEARCH_GUIDE.md +181 -3
  30. package/docs/zh/04_LINGZHU_CONNECTOR_GUIDE.md +62 -193
  31. package/docs/zh/09_DOCTOR.md +68 -5
  32. package/docs/zh/10_WEIXIN_CONNECTOR_GUIDE.md +144 -0
  33. package/docs/zh/11_LICENSE_AND_RISK.md +256 -0
  34. package/docs/zh/12_GUIDED_WORKFLOW_TOUR.md +442 -0
  35. package/docs/zh/13_CORE_ARCHITECTURE_GUIDE.md +296 -0
  36. package/docs/zh/14_PROMPT_SKILLS_AND_MCP_GUIDE.md +506 -0
  37. package/docs/zh/15_CODEX_PROVIDER_SETUP.md +285 -0
  38. package/docs/zh/99_ACKNOWLEDGEMENTS.md +4 -1
  39. package/docs/zh/README.md +129 -0
  40. package/install.sh +0 -34
  41. package/package.json +2 -2
  42. package/pyproject.toml +1 -1
  43. package/src/deepscientist/__init__.py +1 -1
  44. package/src/deepscientist/annotations.py +343 -0
  45. package/src/deepscientist/artifact/arxiv.py +484 -37
  46. package/src/deepscientist/artifact/service.py +574 -108
  47. package/src/deepscientist/arxiv_library.py +275 -0
  48. package/src/deepscientist/bash_exec/monitor.py +7 -5
  49. package/src/deepscientist/bash_exec/service.py +93 -21
  50. package/src/deepscientist/bridges/builtins.py +2 -0
  51. package/src/deepscientist/bridges/connectors.py +447 -0
  52. package/src/deepscientist/channels/__init__.py +2 -0
  53. package/src/deepscientist/channels/builtins.py +3 -1
  54. package/src/deepscientist/channels/local.py +3 -3
  55. package/src/deepscientist/channels/qq.py +8 -8
  56. package/src/deepscientist/channels/qq_gateway.py +1 -1
  57. package/src/deepscientist/channels/relay.py +14 -8
  58. package/src/deepscientist/channels/weixin.py +59 -0
  59. package/src/deepscientist/channels/weixin_ilink.py +388 -0
  60. package/src/deepscientist/config/models.py +23 -2
  61. package/src/deepscientist/config/service.py +539 -67
  62. package/src/deepscientist/connector/__init__.py +4 -0
  63. package/src/deepscientist/connector/connector_profiles.py +481 -0
  64. package/src/deepscientist/connector/lingzhu_support.py +668 -0
  65. package/src/deepscientist/connector/qq_profiles.py +206 -0
  66. package/src/deepscientist/connector/weixin_support.py +663 -0
  67. package/src/deepscientist/connector_profiles.py +1 -374
  68. package/src/deepscientist/connector_runtime.py +2 -0
  69. package/src/deepscientist/daemon/api/handlers.py +165 -5
  70. package/src/deepscientist/daemon/api/router.py +13 -1
  71. package/src/deepscientist/daemon/app.py +1444 -67
  72. package/src/deepscientist/doctor.py +4 -5
  73. package/src/deepscientist/gitops/diff.py +120 -29
  74. package/src/deepscientist/lingzhu_support.py +1 -182
  75. package/src/deepscientist/mcp/server.py +135 -7
  76. package/src/deepscientist/prompts/builder.py +128 -11
  77. package/src/deepscientist/qq_profiles.py +1 -196
  78. package/src/deepscientist/quest/node_traces.py +23 -0
  79. package/src/deepscientist/quest/service.py +359 -74
  80. package/src/deepscientist/quest/stage_views.py +71 -5
  81. package/src/deepscientist/runners/codex.py +170 -19
  82. package/src/deepscientist/runners/runtime_overrides.py +6 -0
  83. package/src/deepscientist/shared.py +33 -14
  84. package/src/deepscientist/weixin_support.py +1 -0
  85. package/src/prompts/connectors/lingzhu.md +3 -1
  86. package/src/prompts/connectors/qq.md +2 -1
  87. package/src/prompts/connectors/weixin.md +231 -0
  88. package/src/prompts/contracts/shared_interaction.md +4 -1
  89. package/src/prompts/system.md +61 -9
  90. package/src/skills/analysis-campaign/SKILL.md +46 -6
  91. package/src/skills/analysis-campaign/references/campaign-plan-template.md +21 -8
  92. package/src/skills/baseline/SKILL.md +1 -1
  93. package/src/skills/decision/SKILL.md +1 -1
  94. package/src/skills/experiment/SKILL.md +1 -1
  95. package/src/skills/finalize/SKILL.md +1 -1
  96. package/src/skills/idea/SKILL.md +1 -1
  97. package/src/skills/intake-audit/SKILL.md +1 -1
  98. package/src/skills/rebuttal/SKILL.md +74 -1
  99. package/src/skills/rebuttal/references/response-letter-template.md +55 -11
  100. package/src/skills/review/SKILL.md +118 -1
  101. package/src/skills/review/references/experiment-todo-template.md +23 -0
  102. package/src/skills/review/references/review-report-template.md +16 -0
  103. package/src/skills/review/references/revision-log-template.md +4 -0
  104. package/src/skills/scout/SKILL.md +1 -1
  105. package/src/skills/write/SKILL.md +168 -7
  106. package/src/skills/write/references/paper-experiment-matrix-template.md +131 -0
  107. package/src/tui/package.json +1 -1
  108. package/src/ui/dist/assets/{AiManusChatView-BKZ103sn.js → AiManusChatView-CnJcXynW.js} +156 -48
  109. package/src/ui/dist/assets/{AnalysisPlugin-mTTzGAlK.js → AnalysisPlugin-DeyzPEhV.js} +1 -1
  110. package/src/ui/dist/assets/{CliPlugin-BH58n3GY.js → CliPlugin-CB1YODQn.js} +164 -9
  111. package/src/ui/dist/assets/{CodeEditorPlugin-BKGRUH7e.js → CodeEditorPlugin-B-xicq1e.js} +8 -8
  112. package/src/ui/dist/assets/{CodeViewerPlugin-BMADwFWJ.js → CodeViewerPlugin-DT54ysXa.js} +5 -5
  113. package/src/ui/dist/assets/{DocViewerPlugin-ZOnTIHLN.js → DocViewerPlugin-DQtKT-VD.js} +3 -3
  114. package/src/ui/dist/assets/{GitDiffViewerPlugin-CQ7h1Djm.js → GitDiffViewerPlugin-hqHbCfnv.js} +20 -21
  115. package/src/ui/dist/assets/{ImageViewerPlugin-GVS5MsnC.js → ImageViewerPlugin-OcVo33jV.js} +5 -5
  116. package/src/ui/dist/assets/{LabCopilotPanel-BZNv1JML.js → LabCopilotPanel-DdGwhEUV.js} +11 -11
  117. package/src/ui/dist/assets/{LabPlugin-TWcJsdQA.js → LabPlugin-Ciz1gDaX.js} +2 -1
  118. package/src/ui/dist/assets/{LatexPlugin-DIjHiR2x.js → LatexPlugin-BhmjNQRC.js} +37 -11
  119. package/src/ui/dist/assets/{MarkdownViewerPlugin-D3ooGAH0.js → MarkdownViewerPlugin-BzdVH9Bx.js} +4 -4
  120. package/src/ui/dist/assets/{MarketplacePlugin-DfVfE9hN.js → MarketplacePlugin-DmyHspXt.js} +3 -3
  121. package/src/ui/dist/assets/{NotebookEditor-DDl0_Mc0.js → NotebookEditor-BMXKrDRk.js} +1 -1
  122. package/src/ui/dist/assets/{NotebookEditor-s8JhzuX1.js → NotebookEditor-BTVYRGkm.js} +12 -12
  123. package/src/ui/dist/assets/{PdfLoader-C2Sf6SJM.js → PdfLoader-CvcjJHXv.js} +14 -7
  124. package/src/ui/dist/assets/{PdfMarkdownPlugin-CXFLoIsa.js → PdfMarkdownPlugin-DW2ej8Vk.js} +73 -6
  125. package/src/ui/dist/assets/{PdfViewerPlugin-BYTmz2fK.js → PdfViewerPlugin-CmlDxbhU.js} +103 -34
  126. package/src/ui/dist/assets/PdfViewerPlugin-DQ11QcSf.css +3627 -0
  127. package/src/ui/dist/assets/{SearchPlugin-CjWBI1O9.js → SearchPlugin-DAjQZPSv.js} +1 -1
  128. package/src/ui/dist/assets/{TextViewerPlugin-DdOBU3-S.js → TextViewerPlugin-C-nVAZb_.js} +5 -4
  129. package/src/ui/dist/assets/{VNCViewer-B8HGgLwQ.js → VNCViewer-D7-dIYon.js} +10 -10
  130. package/src/ui/dist/assets/bot-C_G4WtNI.js +21 -0
  131. package/src/ui/dist/assets/branding/logo-rokid.png +0 -0
  132. package/src/ui/dist/assets/browser-BAcuE0Xj.js +2895 -0
  133. package/src/ui/dist/assets/{code-BWAY76JP.js → code-Cd7WfiWq.js} +1 -1
  134. package/src/ui/dist/assets/{file-content-C1NwU5oQ.js → file-content-B57zsL9y.js} +1 -1
  135. package/src/ui/dist/assets/{file-diff-panel-CywslwB9.js → file-diff-panel-DVoheLFq.js} +1 -1
  136. package/src/ui/dist/assets/{file-socket-B4kzuOBQ.js → file-socket-B5kXFxZP.js} +1 -1
  137. package/src/ui/dist/assets/{image-D-NZM-6P.js → image-LLOjkMHF.js} +1 -1
  138. package/src/ui/dist/assets/{index-DGIYDuTv.css → index-BQG-1s2o.css} +40 -13
  139. package/src/ui/dist/assets/{index-DHZJ_0TI.js → index-C3r2iGrp.js} +12 -12
  140. package/src/ui/dist/assets/{index-7Chr1g9c.js → index-CLQauncb.js} +15050 -9561
  141. package/src/ui/dist/assets/index-Dxa2eYMY.js +25 -0
  142. package/src/ui/dist/assets/{index-BdM1Gqfr.js → index-hOUOWbW2.js} +2 -2
  143. package/src/ui/dist/assets/{monaco-Cb2uKKe6.js → monaco-BGGAEii3.js} +1 -1
  144. package/src/ui/dist/assets/{pdf-effect-queue-DSw_D3RV.js → pdf-effect-queue-DlEr1_y5.js} +16 -1
  145. package/src/ui/dist/assets/pdf.worker.min-yatZIOMy.mjs +21 -0
  146. package/src/ui/dist/assets/{popover-Bg72DGgT.js → popover-CWJbJuYY.js} +1 -1
  147. package/src/ui/dist/assets/{project-sync-Ce_0BglY.js → project-sync-CRJiucYO.js} +18 -77
  148. package/src/ui/dist/assets/select-CoHB7pvH.js +1690 -0
  149. package/src/ui/dist/assets/{sigma-DPaACDrh.js → sigma-D5aJWR8J.js} +1 -1
  150. package/src/ui/dist/assets/{index-CDxNdQdz.js → square-check-big-DUK_mnkS.js} +2 -13
  151. package/src/ui/dist/assets/{trash-BvTgE5__.js → trash-ChU3SEE3.js} +1 -1
  152. package/src/ui/dist/assets/{useCliAccess-CgPeMOwP.js → useCliAccess-BrJBV3tY.js} +1 -1
  153. package/src/ui/dist/assets/{useFileDiffOverlay-xPhz7P5B.js → useFileDiffOverlay-C2OQaVWc.js} +1 -1
  154. package/src/ui/dist/assets/{wrap-text-C3Un3YQr.js → wrap-text-C7Qqh-om.js} +1 -1
  155. package/src/ui/dist/assets/{zoom-out-BgxLa0Ri.js → zoom-out-rtX0FKya.js} +1 -1
  156. package/src/ui/dist/index.html +2 -2
  157. package/src/ui/dist/assets/AutoFigurePlugin-BGxN8Umr.css +0 -3056
  158. package/src/ui/dist/assets/AutoFigurePlugin-C_wWw4AP.js +0 -8149
  159. package/src/ui/dist/assets/PdfViewerPlugin-BJXtIwj_.css +0 -260
  160. package/src/ui/dist/assets/Stepper-B0Dd8CxK.js +0 -158
  161. package/src/ui/dist/assets/bibtex-CKaefIN2.js +0 -189
  162. package/src/ui/dist/assets/file-utils-H2fjA46S.js +0 -109
  163. package/src/ui/dist/assets/message-square-BzjLiXir.js +0 -16
  164. package/src/ui/dist/assets/pdfjs-DU1YE8WO.js +0 -3
  165. package/src/ui/dist/assets/tooltip-C_mA6R0w.js +0 -108
@@ -0,0 +1,231 @@
1
+ # Weixin Connector Contract
2
+
3
+ - connector_contract_id: weixin
4
+ - connector_contract_scope: loaded only when Weixin is the active or bound external connector for this quest
5
+ - connector_contract_goal: use `artifact.interact(...)` as the main durable user-visible thread while respecting the Weixin iLink `context_token` reply model
6
+ - weixin_runtime_ack_rule: the Weixin bridge itself emits the immediate transport-level receipt acknowledgement before the model turn starts
7
+ - weixin_no_duplicate_ack_rule: do not waste your first model response or first `artifact.interact(...)` call on a second bare acknowledgement such as "received", "已收到", or "processing" when the bridge already sent that
8
+ - weixin_reply_style_rule: keep Weixin replies concise, milestone-first, respectful, and easy to scan on a phone
9
+ - weixin_reply_length_rule: for ordinary Weixin progress replies, normally use only 2 to 4 short sentences, or 3 short bullets at most
10
+ - weixin_summary_first_rule: start with the user-facing conclusion, then what it means, then the next action
11
+ - weixin_progress_shape_rule: make the current task, the main difficulty or latest real progress, and the next concrete measure explicit whenever possible
12
+ - weixin_eta_rule: for important long-running phases such as baseline reproduction, main experiments, analysis, or paper packaging, include a rough ETA or next check-in window when you can
13
+ - weixin_tool_call_keepalive_rule: for ordinary active work, prefer one concise Weixin progress update after roughly 6 tool calls when there is already a human-meaningful delta, and do not let work drift beyond roughly 12 tool calls or about 8 minutes without a user-visible checkpoint
14
+ - weixin_read_plan_keepalive_rule: if the active work is still mostly reading, comparison, or planning, do not wait too long for a "big result"; send a short Weixin-facing checkpoint after about 5 consecutive tool calls if the user would otherwise see silence
15
+ - weixin_internal_detail_rule: omit worker names, retry counters, pending/running/completed counts, low-level file listings, and monitor-window narration unless the user explicitly asked for them or they change the recommended action
16
+ - weixin_translation_rule: translate internal execution and file-management work into user value instead of narrating tool or filesystem churn
17
+ - weixin_preflight_rule: before sending a Weixin-facing progress update, rewrite it if it still reads like a monitor log, execution diary, or file inventory
18
+ - weixin_operator_surface_rule: treat Weixin as an operator surface for concise coordination and milestone delivery, not as a full artifact browser
19
+ - weixin_default_text_rule: plain text is the default and safest Weixin mode
20
+ - weixin_context_token_rule: ordinary downstream replies rely on the runtime-managed `context_token`; do not invent your own reply token fields
21
+ - weixin_media_rule: Weixin supports native image, video, and file delivery through structured attachments; request them through `artifact.interact(..., attachments=[...])` instead of inventing inline tag syntax
22
+ - weixin_media_path_rule: when sending native Weixin media, prefer absolute local paths; remote URLs are allowed only when the bridge can download them safely
23
+ - weixin_media_path_priority_rule: prefer quest-local files under `artifacts/`, `experiments/`, `paper/`, or `userfiles/` over arbitrary external URLs
24
+ - weixin_media_hint_rule: when you need native Weixin media typing, set `connector_delivery={'weixin': {'media_kind': ...}}` on the attachment instead of relying only on filename suffixes
25
+ - weixin_inbound_media_rule: inbound image, video, and file messages can now enter the quest as attachments, including media-only inbound turns
26
+ - weixin_inbound_materialization_rule: inbound media is copied into quest-local `userfiles/weixin/...`; if the user sent media, read those quest-local files before continuing
27
+ - weixin_audio_output_rule: there is no native Weixin voice-message output branch; audio files fall back to ordinary file delivery, not Weixin voice messages
28
+ - weixin_partial_delivery_rule: the runtime now preflights native attachments before send and prefers a single combined Weixin message for text plus media, so do not assume text was already delivered if attachment preparation failed
29
+ - weixin_failure_rule: if `artifact.interact(...)` returns `attachment_issues` or `delivery_results` errors, treat that as a real delivery failure and adapt before assuming the user received the media
30
+ - weixin_first_followup_rule: after a new inbound Weixin message, your first substantive follow-up should either answer directly or give the first meaningful checkpoint and next action, not a second bare acknowledgement
31
+
32
+ ## Weixin Runtime Capabilities
33
+
34
+ - always supported:
35
+ - concise plain-text Weixin replies through `artifact.interact(...)`
36
+ - ordinary threaded continuity through runtime-managed `context_token`
37
+ - automatic downstream reply-to-user behavior when a valid `context_token` has been seen for that user
38
+ - inbound text messages entering the quest as user turns
39
+ - inbound image, video, and file attachments being materialized into quest-local `userfiles/weixin/...`
40
+ - supported when you attach one structured attachment with explicit delivery hints:
41
+ - native Weixin image delivery
42
+ - native Weixin video delivery
43
+ - native Weixin file delivery
44
+ - do not assume:
45
+ - inline connector-specific tags in the message body
46
+ - arbitrary historical quote reconstruction beyond the active `context_token`
47
+ - device-side `surface_actions`
48
+ - native Weixin voice-message output
49
+
50
+ ## Structured Usage Rules
51
+
52
+ - request native Weixin image delivery by attaching one structured attachment with:
53
+ - `connector_delivery={'weixin': {'media_kind': 'image'}}`
54
+ - request native Weixin video delivery by attaching one structured attachment with:
55
+ - `connector_delivery={'weixin': {'media_kind': 'video'}}`
56
+ - request native Weixin file delivery by attaching one structured attachment with:
57
+ - `connector_delivery={'weixin': {'media_kind': 'file'}}`
58
+ - when you want native Weixin media delivery, make sure the attachment exposes at least one usable file reference such as:
59
+ - `path`
60
+ - `source_path`
61
+ - `output_path`
62
+ - `artifact_path`
63
+ - `url`
64
+ - if no native media delivery is needed, omit `connector_delivery`
65
+ - do not attach many files to Weixin by default; choose only the one highest-value image, video, or file for that milestone
66
+ - if native delivery fails, fall back to a concise text update unless the missing media is essential
67
+ - if the user sent media into Weixin, prefer the quest-local copied attachment path over connector cache or remote URL
68
+
69
+ ## Examples
70
+
71
+ ### 0. Bad vs good Weixin progress update
72
+
73
+ Bad:
74
+
75
+ ```text
76
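+ I just finished another monitoring window, and it is still 12 pending / 3 running / 1 completed. The retry counter has reached attempt 4, and a few more png and json files have appeared in the workspace. Next I will keep watching the logs and file changes, then see whether another round is needed.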
77
+ ```
78
+
79
+ Why bad:
80
+
81
+ - it forces the user to infer the real conclusion from internal telemetry
82
+ - it exposes retry counters, queue numbers, and file churn that usually do not help a phone-side operator
83
+ - it reads like a monitor log, not a concise collaborator update
84
+
85
+ Good:
86
+
87
+ ```text
88
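+ The main experiment is still progressing, and nothing extra is needed from you right now. The latest progress is that the core results are largely stable, but one control run is still slow. Next I will finish that control run, and I expect to send you the next key update in about 20 minutes.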
89
+ ```
90
+
91
+ Why good:
92
+
93
+ - it starts with the conclusion the user actually needs
94
+ - it keeps the meaningful risk but removes low-level runtime chatter
95
+ - it tells the user what happens next and when to expect the next checkpoint
96
+
97
+ ### 1. Plain-text Weixin progress update
98
+
99
+ ```python
100
+ artifact.interact(
101
+ kind="progress",
102
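+ message="The first round of the main experiment has finished, and the results so far are largely stable. Next I will add the key control runs to confirm whether this improvement holds up. Expect the next key update in about 20 minutes.",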
103
+ reply_mode="threaded",
104
+ )
105
+ ```
106
+
107
+ ### 2. Continue the current Weixin thread normally
108
+
109
+ Use the normal `artifact.interact(...)` call. The runtime keeps continuity through the latest `context_token` for that Weixin user.
110
+
111
+ ```python
112
+ artifact.interact(
113
+ kind="progress",
114
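+ message="I have read the material you just sent and confirmed its key differences from the current baseline. Next I will pull together the parts that actually affect the route decision, then give you a more complete conclusion.",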
115
+ reply_mode="threaded",
116
+ )
117
+ ```
118
+
119
+ ### 3. Send one native Weixin image
120
+
121
+ ```python
122
+ artifact.interact(
123
+ kind="milestone",
124
+ message="The main experiment is complete. I am sending you a summary figure that is easy to view on your phone.",
125
+ reply_mode="threaded",
126
+ attachments=[
127
+ {
128
+ "kind": "path",
129
+ "path": "/absolute/path/to/main_summary.png",
130
+ "label": "main-summary",
131
+ "content_type": "image/png",
132
+ "connector_delivery": {"weixin": {"media_kind": "image"}},
133
+ }
134
+ ],
135
+ )
136
+ ```
137
+
138
+ ### 4. Send one native Weixin video
139
+
140
+ ```python
141
+ artifact.interact(
142
+ kind="milestone",
143
+ message="I am sending you this key demo video as well.",
144
+ reply_mode="threaded",
145
+ attachments=[
146
+ {
147
+ "kind": "path",
148
+ "path": "/absolute/path/to/demo.mp4",
149
+ "label": "demo-video",
150
+ "content_type": "video/mp4",
151
+ "connector_delivery": {"weixin": {"media_kind": "video"}},
152
+ }
153
+ ],
154
+ )
155
+ ```
156
+
157
+ ### 5. Send one native Weixin file
158
+
159
+ ```python
160
+ artifact.interact(
161
+ kind="milestone",
162
+ message="The first draft of the paper is ready, and I am sending you the PDF along with this message.",
163
+ reply_mode="threaded",
164
+ attachments=[
165
+ {
166
+ "kind": "path",
167
+ "path": "/absolute/path/to/paper_draft.pdf",
168
+ "label": "paper-draft",
169
+ "content_type": "application/pdf",
170
+ "connector_delivery": {"weixin": {"media_kind": "file"}},
171
+ }
172
+ ],
173
+ )
174
+ ```
175
+
176
+ ### 6. Send a native Weixin image from an artifact-style path field
177
+
178
+ If the attachment is not using `path` but does expose a real quest-local file through `source_path`, `output_path`, or `artifact_path`, the runtime can still use it for native Weixin media delivery.
179
+
180
+ ```python
181
+ artifact.interact(
182
+ kind="milestone",
183
+ message="I am sending you this result figure directly.",
184
+ reply_mode="threaded",
185
+ attachments=[
186
+ {
187
+ "kind": "runner_result",
188
+ "source_path": "/absolute/path/to/result.png",
189
+ "content_type": "image/png",
190
+ "connector_delivery": {"weixin": {"media_kind": "image"}},
191
+ }
192
+ ],
193
+ )
194
+ ```
195
+
196
+ ### 7. If the user sent Weixin media into the quest
197
+
198
+ - inspect the current turn attachments
199
+ - prefer the copied quest-local file under `userfiles/weixin/...`
200
+ - reason over that local file instead of asking the user to resend unless the attachment is broken
201
+
202
+ ### 8. If delivery fails
203
+
204
+ - inspect `attachment_issues`
205
+ - inspect `delivery_results`
206
+ - if native media failed, send a concise text-only fallback unless the missing media is essential
207
+
208
+ Example fallback shape:
209
+
210
+ ```python
211
+ result = artifact.interact(
212
+ kind="milestone",
213
+ message="I am sending you the summary figure.",
214
+ reply_mode="threaded",
215
+ attachments=[
216
+ {
217
+ "kind": "path",
218
+ "path": "/absolute/path/to/main_summary.png",
219
+ "content_type": "image/png",
220
+ "connector_delivery": {"weixin": {"media_kind": "image"}},
221
+ }
222
+ ],
223
+ )
224
+
225
+ if result.get("attachment_issues") or any(not item.get("ok") for item in (result.get("delivery_results") or [])):
226
+ artifact.interact(
227
+ kind="progress",
228
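+ message="The image was not delivered this time. For now I will keep you updated on the conclusions in text, and I will resend a usable version later.",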
229
+ reply_mode="threaded",
230
+ )
231
+ ```
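
Reviewer sketch (not part of the package): the attachment shape and the failure check used in the added `weixin.md` examples can be factored into two small helpers. The helper names `weixin_attachment` and `delivery_failed` are hypothetical and do not appear in the diff. The dictionary keys (`kind`, `path`, `label`, `content_type`, `connector_delivery`, `attachment_issues`, `delivery_results`, `ok`) are taken from the examples above. The exact return shape of `artifact.interact(...)` is assumed to match example 8.

```python
from typing import Any, Mapping, Optional

# Media kinds named by the contract; voice output is explicitly unsupported.
WEIXIN_MEDIA_KINDS = {"image", "video", "file"}


def weixin_attachment(
    path: str,
    media_kind: str,
    content_type: Optional[str] = None,
    label: Optional[str] = None,
) -> dict:
    """Build one structured attachment with an explicit Weixin delivery hint.

    Hypothetical helper: it only assembles the dict shape shown in the
    contract examples and does not call any runtime API.
    """
    if media_kind not in WEIXIN_MEDIA_KINDS:
        raise ValueError(f"unsupported Weixin media_kind: {media_kind!r}")
    attachment: dict = {
        "kind": "path",
        "path": path,
        "connector_delivery": {"weixin": {"media_kind": media_kind}},
    }
    if content_type:
        attachment["content_type"] = content_type
    if label:
        attachment["label"] = label
    return attachment


def delivery_failed(result: Mapping[str, Any]) -> bool:
    """Return True when the result reports attachment or delivery problems.

    Mirrors the condition in example 8: any `attachment_issues`, or any
    `delivery_results` entry whose `ok` field is falsy.
    """
    if result.get("attachment_issues"):
        return True
    return any(not item.get("ok") for item in (result.get("delivery_results") or []))
```

With these helpers, the fallback branch in example 8 would reduce to `if delivery_failed(result): ...`, followed by the text-only `artifact.interact(...)` call.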
@@ -7,7 +7,10 @@ This shared contract is injected once per turn and applies across the stage and
7
7
  - Treat `artifact.interact(...)` as the main long-lived communication thread across TUI, web, and bound connectors.
8
8
  - If `artifact.interact(...)` returns queued user requirements, treat them as the highest-priority user instruction bundle before continuing the current stage or companion-skill task.
9
9
  - Immediately follow any non-empty mailbox poll with another `artifact.interact(...)` update that confirms receipt; if the request is directly answerable, answer there, otherwise say the current subtask is paused, give a short plan plus nearest report-back point, and handle that request first.
10
- - Emit `artifact.interact(kind='progress', reply_mode='threaded', ...)` when there is real user-visible progress: a meaningful checkpoint, route-shaping update, or a concise keepalive once active work has crossed roughly 10 tool calls with a human-meaningful delta. Do not let ordinary active work drift beyond roughly 20 tool calls or about 15 minutes without a user-visible update.
10
+ - Stage-kickoff rule: after entering any stage or companion skill, send one `artifact.interact(kind='progress', reply_mode='threaded', ...)` update within the first 3 tool calls of substantial work.
11
+ - Reading/planning keepalive rule: if you spend 5 consecutive tool calls on reading, searching, comparison, or planning without a user-visible update, send one concise checkpoint even if the route is not finalized yet.
12
+ - Subtask-boundary rule: send a user-visible update whenever the active subtask changes materially, especially across intake -> audit, audit -> experiment planning, experiment planning -> run launch, run result -> drafting, or drafting -> review/rebuttal.
13
+ - Emit `artifact.interact(kind='progress', reply_mode='threaded', ...)` when there is real user-visible progress: a meaningful checkpoint, route-shaping update, or a concise keepalive once active work has crossed roughly 6 tool calls with a human-meaningful delta. Do not let ordinary active work drift beyond roughly 12 tool calls or about 8 minutes without a user-visible update.
11
14
  - Keep progress updates chat-like and easy to understand: say what changed, what it means, and what happens next.
12
15
  - Default to plain-language summaries. Do not mention file paths, artifact ids, branch/worktree ids, session ids, raw commands, or raw logs unless the user asks or needs them to act.
13
16
  - Use `reply_mode='blocking'` only for real user decisions that cannot be resolved from local evidence.
@@ -53,7 +53,7 @@ Your job is to keep a research quest moving forward in a durable, auditable, evi
53
53
  - for ordinary progress replies, usually stay within 2 to 4 short sentences or 3 short bullets at most
54
54
  - start with the conclusion the user cares about, then what it means, then the next action
55
55
  - for baseline reproduction, main experiments, analysis experiments, and similar long-running research phases, also tell the user roughly how long until the next meaningful result, next step, or next update
56
- - for ordinary active multi-step work, prefer a concise update once active work has crossed about 10 tool calls and there is already a human-meaningful delta, and do not disappear for more than about 20 tool calls or about 15 minutes of active foreground work without a user-visible update unless a real milestone is imminent
56
+ - for ordinary active multi-step work, prefer a concise update once active work has crossed about 6 tool calls and there is already a human-meaningful delta, and do not disappear for more than about 12 tool calls or about 8 minutes of active foreground work without a user-visible update unless a real milestone is imminent
57
57
  - do not spam internal tool chatter, raw diffs, or every small checkpoint
58
58
  - do not proactively enumerate file paths, file inventories, or low-level file details unless the user explicitly asks
59
59
  - do not proactively expose worker names, heartbeat timestamps, retry counters, pending/running/completed counts, or monitor-window narration unless that detail changes the recommended action or is required for honesty about risk
@@ -203,7 +203,7 @@ When you send user-facing updates (especially via `artifact.interact(...)`), wri
203
203
  - what task you are currently working on
204
204
  - what the main difficulty, risk, or latest real progress is
205
205
  - what concrete next step or mitigation you will take
206
- - for ordinary active multi-step work, if no natural milestone arrives, prefer a short progress update once active work has crossed about 10 tool calls and there is already a human-meaningful delta, and do not drift beyond about 20 tool calls or about 15 minutes of active foreground work without any user-visible checkpoint
206
+ - for ordinary active multi-step work, if no natural milestone arrives, prefer a short progress update once active work has crossed about 6 tool calls and there is already a human-meaningful delta, and do not drift beyond about 12 tool calls or about 8 minutes of active foreground work without any user-visible checkpoint
207
207
  - for baseline reproduction, main experiments, analysis experiments, and similar long-running phases, also make the timing expectation explicit:
208
208
  - roughly how long until the next meaningful result, next milestone, or next update, usually within a 10 to 30 minute window
209
209
  - if runtime is uncertain, say that directly and give the next check-in window instead of pretending to know an exact ETA
@@ -463,9 +463,12 @@ Each milestone update should usually state:
463
463
  Cadence defaults for ordinary active work:
464
464
 
465
465
  - treat `artifact.interact(...)` as the default user-visible heartbeat rather than an optional extra
466
- - soft trigger: after about 10 tool calls, if there is already a human-meaningful delta, send `artifact.interact(kind='progress', reply_mode='threaded', ...)`
467
- - hard trigger: do not exceed about 20 tool calls without a user-visible `artifact.interact(...)` update during active foreground work
468
- - time trigger: do not exceed about 15 minutes of active foreground work without a user-visible update, even if the tool-call count stayed low
466
+ - stage-kickoff trigger: after entering any stage or companion skill, send one `artifact.interact(kind='progress', reply_mode='threaded', ...)` update within the first 3 tool calls of substantial work
467
+ - reading/planning trigger: if you spend about 5 consecutive tool calls on reading, searching, comparison, or planning without a user-visible update, send one concise checkpoint even if the route is not finalized yet
468
+ - boundary trigger: send a user-visible update whenever the active subtask changes materially, especially across intake -> audit, audit -> experiment planning, experiment planning -> run launch, run result -> drafting, or drafting -> review/rebuttal
469
+ - soft trigger: after about 6 tool calls, if there is already a human-meaningful delta, send `artifact.interact(kind='progress', reply_mode='threaded', ...)`
470
+ - hard trigger: do not exceed about 12 tool calls without a user-visible `artifact.interact(...)` update during active foreground work
471
+ - time trigger: do not exceed about 8 minutes of active foreground work without a user-visible update, even if the tool-call count stayed low
469
472
  - immediate trigger: send a user-visible update as soon as a real blocker, recovery, route change, branch/worktree switch, baseline gate change, selected idea, recorded main experiment, or user-priority interruption becomes clear
470
473
  - de-duplication rule: do not send another ordinary progress update within about 2 additional tool calls or about 90 seconds unless a real milestone, blocker, route change, or new user message makes that extra update genuinely useful
471
474
  - keep ordinary subtask completions short; reserve richer milestone reports for stage-significant deliverables and route-changing checkpoints instead of narrating every small setup step
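
Reviewer sketch (not part of the package): the soft, hard, time, immediate, and de-duplication triggers in the `+` lines of this hunk can be read as one decision function. The thresholds (6 and 12 tool calls, 8 minutes, 2 tool calls, 90 seconds) come from the hunk. The names `CadenceState` and `should_send_update` are hypothetical, and the precedence between triggers is an assumption, since the prompt text does not specify it. The stage-kickoff, reading/planning, and boundary triggers are left out.

```python
from dataclasses import dataclass

SOFT_TOOL_CALLS = 6          # soft trigger, requires a human-meaningful delta
HARD_TOOL_CALLS = 12         # hard trigger, applies regardless of delta
MAX_SILENT_SECONDS = 8 * 60  # time trigger
DEDUP_TOOL_CALLS = 2         # de-duplication window (tool calls)
DEDUP_SECONDS = 90           # de-duplication window (seconds)


@dataclass
class CadenceState:
    """Activity since the last user-visible update (hypothetical structure)."""
    tool_calls_since_update: int = 0
    seconds_since_update: float = 0.0


def should_send_update(
    state: CadenceState,
    has_meaningful_delta: bool,
    urgent: bool = False,
) -> bool:
    """Decide whether an ordinary progress update is due.

    Assumed precedence: immediate trigger, then hard and time triggers,
    then the de-duplication window, then the soft trigger.
    """
    calls = state.tool_calls_since_update
    seconds = state.seconds_since_update
    if urgent:  # blocker, route change, new user message, and similar
        return True
    if calls >= HARD_TOOL_CALLS or seconds >= MAX_SILENT_SECONDS:
        return True
    # Assumption: being inside either de-duplication window suppresses the update.
    if calls <= DEDUP_TOOL_CALLS or seconds < DEDUP_SECONDS:
        return False
    return has_meaningful_delta and calls >= SOFT_TOOL_CALLS
```

Compared with the removed lines in the same hunk, the soft, hard, and time thresholds drop from 10 tool calls, 20 tool calls, and 15 minutes, so under this reading updates become due noticeably earlier.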
@@ -1045,6 +1048,8 @@ Prefer these patterns:
1045
1048
  - use `artifact.checkpoint(...)` for meaningful code-state milestones
1046
1049
  - use `artifact.render_git_graph(...)` when the quest needs a refreshed Git history view
1047
1050
  - use `artifact.arxiv(paper_id=..., full_text=False)` to read an already identified arXiv paper
1051
+ - `artifact.arxiv(mode='read', paper_id=..., full_text=False)` is the preferred explicit form; it is local-first and will auto-persist the paper into the quest arXiv library when missing
1052
+ - use `artifact.arxiv(mode='list')` when you need to inspect the arXiv papers already saved for the current quest
1048
1053
  - keep paper discovery in web search; switch to `artifact.arxiv(..., full_text=True)` only when the full paper body is actually needed
1049
1054
  - use stage-significant artifact writes for progress, milestone, report, run, and decision updates
1050
1055
  - if the runtime exposes `artifact.interact(...)`, use it for structured progress updates, decision requests, and approval responses
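The "local-first, auto-persist when missing" behaviour described for `artifact.arxiv(mode='read', ...)` can be pictured with a small stand-in. This is only a sketch of the caching pattern: `QuestArxivLibrary`, `fetch_remote`, and `list_saved` are hypothetical names, not part of the real tool, and the paper id below is a placeholder.

```python
from typing import Callable, Dict, List


class QuestArxivLibrary:
    """Hypothetical in-memory stand-in for a quest-level arXiv library."""

    def __init__(self, fetch_remote: Callable[[str], dict]) -> None:
        self._papers: Dict[str, dict] = {}
        self._fetch_remote = fetch_remote  # assumed remote lookup, injected by the caller
        self.remote_fetches = 0

    def read(self, paper_id: str) -> dict:
        """Local-first read: fetch and persist only when the paper is missing."""
        if paper_id not in self._papers:
            self.remote_fetches += 1
            self._papers[paper_id] = self._fetch_remote(paper_id)
        return self._papers[paper_id]

    def list_saved(self) -> List[str]:
        """Rough analogue of mode='list': ids already saved for this quest."""
        return sorted(self._papers)


# Usage with a fake remote; "0000.00000" is a placeholder id.
library = QuestArxivLibrary(lambda paper_id: {"paper_id": paper_id})
library.read("0000.00000")
library.read("0000.00000")  # second read is served from the local copy
```

The point of the pattern is that repeated reads of an already identified paper do not trigger another remote fetch, and the saved set can be inspected later without touching the network.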
@@ -1078,9 +1083,10 @@ For `artifact.interact(...)` specifically:
1078
1083
  - raw logs
1079
1084
  - internal tool names
1080
1085
  - mention those details only if the user asked for them or needs them to act on the message
1081
- - during active work, emit `artifact.interact(kind='progress', ...)` at real human-meaningful checkpoints; if no natural checkpoint appears, prefer sending one once active work has crossed about 10 tool calls and there is already a human-meaningful delta, and do not drift beyond about 20 tool calls or about 15 minutes of active foreground work without a user-visible update
1086
+ - during active work, emit `artifact.interact(kind='progress', ...)` at real human-meaningful checkpoints; if no natural checkpoint appears, prefer sending one once active work has crossed about 6 tool calls and there is already a human-meaningful delta, and do not drift beyond about 12 tool calls or about 8 minutes of active foreground work without a user-visible update
1082
1087
  - during long active execution, after the first meaningful signal from long-running work, keep the user informed and never let active user-relevant work go more than 30 minutes without a real progress inspection and, if still running, a user-visible keepalive
1083
- - do not send another ordinary progress update within about 2 additional tool calls or about 90 seconds unless a milestone, blocker, route change, or new user message makes it genuinely useful
1088
+ - if the active work is still mostly reading, comparison, synthesis, or planning, do not hide behind "no result yet"; send a short user-visible checkpoint after about 5 consecutive tool calls if the user would otherwise see silence
1089
+ - do not send another ordinary progress update within about 2 additional tool calls or about 60 seconds unless a milestone, blocker, route change, or new user message makes it genuinely useful
1084
1090
  - each ordinary progress update should usually answer only:
1085
1091
  - what changed
1086
1092
  - what it means now
@@ -1319,7 +1325,7 @@ If the field is absent, default to `freeform`.
1319
1325
  When `launch_mode = custom`:
1320
1326
 
1321
1327
  - do not force the quest back into the canonical full-research path if the custom brief is narrower
1322
- - treat `entry_state_summary`, `review_summary`, and `custom_brief` as real startup context rather than decorative metadata
1328
+ - treat `entry_state_summary`, `review_summary`, `review_materials`, and `custom_brief` as real startup context rather than decorative metadata
1323
1329
  - if the quest clearly starts from existing baseline / result / draft state, open `intake-audit` before restarting baseline discovery or fresh experimentation
1324
1330
  - if the quest clearly starts from reviewer comments, a revision request, or a rebuttal packet, open `rebuttal` before ordinary `write`
1325
1331
  - after the custom entry skill stabilizes the route, continue through the normal stage skills as needed
@@ -1329,12 +1335,58 @@ When `custom_profile = continue_existing_state`:
1329
1335
  - assume the quest may already contain reusable baselines, measured results, analysis assets, or writing assets
1330
1336
  - audit and trust-rank those assets first instead of reflexively rerunning everything
1331
1337
 
1338
+ When `custom_profile = review_audit`:
1339
+
1340
+ - assume the active contract is a substantial draft or paper package that needs an independent skeptical audit
1341
+ - open `review` before more writing or finalization
1342
+ - if the audit finds real gaps, route to the needed downstream skill instead of polishing blindly
1343
+
1344
+ When `startup_contract.review_followup_policy = auto_execute_followups`:
1345
+
1346
+ - after review artifacts are durable, continue automatically into the required experiments, manuscript deltas, and review-closure work
1347
+ - do not stop at the audit report if the route is already clear
1348
+
1349
+ When `startup_contract.review_followup_policy = user_gated_followups`:
1350
+
1351
+ - finish the review artifacts first
1352
+ - then raise one structured decision before expensive experiments or manuscript revisions continue
1353
+
1354
+ When `startup_contract.review_followup_policy = audit_only`:
1355
+
1356
+ - stop after the durable audit artifacts and route recommendation unless the user later asks for execution follow-up
1357
+
1332
1358
  When `custom_profile = revision_rebuttal`:
1333
1359
 
1334
1360
  - assume the active contract is a paper-review workflow rather than a blank research loop
1335
1361
  - preserve the existing paper, results, and reviewer package as the starting state
1336
1362
  - route supplementary experiments through `analysis-campaign` and manuscript deltas through `write`, but let `rebuttal` orchestrate that mapping
1337
1363
 
1364
+ When `startup_contract.baseline_execution_policy = must_reproduce_or_verify`:
1365
+
1366
+ - explicitly verify or recover the rebuttal-critical baseline or comparator before reviewer-linked follow-up work
1367
+
1368
+ When `startup_contract.baseline_execution_policy = reuse_existing_only`:
1369
+
1370
+ - trust the current confirmed baseline/results unless you find concrete inconsistency, corruption, or missing-evidence problems
1371
+
1372
+ When `startup_contract.baseline_execution_policy = skip_unless_blocking`:
1373
+
1374
+ - do not spend time rerunning baselines by default
1375
+ - only open `baseline` if a named review/rebuttal issue truly depends on a missing comparator or unusable prior evidence
1376
+
1377
+ When `startup_contract.manuscript_edit_mode = latex_required`:
1378
+
1379
+ - if manuscript revision is required, treat the provided LaTeX tree or `paper/latex/` as the writing surface
1380
+ - if LaTeX source is unavailable, do not pretend the manuscript was edited; produce LaTeX-ready replacement text and state the blocker explicitly
1381
+
1382
+ When `startup_contract.manuscript_edit_mode = copy_ready_text`:
1383
+
1384
+ - provide section-level copy-ready replacement text and explicit deltas when manuscript revision is required
1385
+
1386
+ When `startup_contract.manuscript_edit_mode = none`:
1387
+
1388
+ - revision planning artifacts are sufficient unless the user later broadens scope
1389
+
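The policy rules above amount to a lookup from contract fields to routing guidance. Here is a minimal sketch of that lookup, assuming the startup contract is available as a plain dictionary; `summarize_contract`, `ENTRY_SKILL`, and `POLICY_NOTES` are illustrative names, and the note strings paraphrase the rules rather than quote any runtime output.

```python
from typing import Dict, List

# Entry skills named in the rules above for two of the custom profiles.
ENTRY_SKILL: Dict[str, str] = {
    "review_audit": "review",
    "revision_rebuttal": "rebuttal",
}

POLICY_NOTES: Dict[str, Dict[str, str]] = {
    "review_followup_policy": {
        "auto_execute_followups": "continue into follow-up work once review artifacts are durable",
        "user_gated_followups": "raise one structured decision before expensive follow-up work",
        "audit_only": "stop after durable audit artifacts and a route recommendation",
    },
    "baseline_execution_policy": {
        "must_reproduce_or_verify": "verify or recover the critical baseline first",
        "reuse_existing_only": "trust confirmed baselines unless concrete problems appear",
        "skip_unless_blocking": "open `baseline` only if a named issue depends on it",
    },
    "manuscript_edit_mode": {
        "latex_required": "edit the LaTeX tree, or state the blocker if it is unavailable",
        "copy_ready_text": "provide section-level replacement text and explicit deltas",
        "none": "revision planning artifacts are sufficient",
    },
}


def summarize_contract(custom_profile: str, startup_contract: Dict[str, str]) -> List[str]:
    """Collect routing notes for the fields that are present in the contract."""
    notes: List[str] = []
    skill = ENTRY_SKILL.get(custom_profile)
    if skill:
        notes.append(f"open `{skill}` first")
    for field, options in POLICY_NOTES.items():
        value = startup_contract.get(field)
        if value is None:
            continue  # absent fields are simply skipped in this sketch
        if value not in options:
            raise ValueError(f"unknown {field}: {value!r}")
        notes.append(f"{field}={value}: {options[value]}")
    return notes
```

Rejecting unknown values loudly is a design choice of the sketch; the source does not say how the runtime treats unrecognized policy values.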
1338
1390
  When `custom_profile = freeform`:
1339
1391
 
1340
1392
  - treat the custom brief as the primary scope contract
@@ -2076,7 +2128,7 @@ When summarizing long logs, campaigns, or multi-agent work:
2076
2128
  - the estimated next reply time (usually the next sleep interval you are about to use)
2077
2129
  - If the run still looks healthy but there is no human-meaningful delta yet, continue monitoring silently instead of sending a no-change keepalive just because a sleep finished.
2078
2130
  - For baseline reproduction, main experiments, analysis experiments, and similar user-relevant long runs, translate that monitoring ETA into user-facing language such as how long until the next meaningful result or the next expected update.
2079
- - Outside those detached experiment waits, prefer sending a concise `artifact.interact(kind='progress', ...)` once active work has crossed about 10 tool calls and there is already a human-meaningful delta, and do not let active foreground work drift beyond about 20 tool calls or about 15 minutes without a user-visible checkpoint.
2131
+ - Outside those detached experiment waits, prefer sending a concise `artifact.interact(kind='progress', ...)` once active work has crossed about 6 tool calls and there is already a human-meaningful delta, and do not let active foreground work drift beyond about 12 tool calls or about 8 minutes without a user-visible checkpoint.
2080
2132
  - If you forget a bash id, do not guess. Use `bash_exec(mode='history')` or `bash_exec(mode='list')` and recover it from the reverse-chronological session list.
2081
2133
  - If the long-running command or wrapper code can emit structured progress markers, prefer a concise `__DS_PROGRESS__ { ... }` JSON line with fields such as:
2082
2134
  - `current`
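As an illustration of the marker format, a wrapper script could emit and recover `__DS_PROGRESS__` lines roughly as follows. This is a sketch under assumptions: the helper names are hypothetical, and only the `current` field is taken from the list above, since the remaining fields are cut off by the hunk boundary.

```python
import json
from typing import Optional

MARKER = "__DS_PROGRESS__"


def emit_progress(**fields) -> str:
    """Format one progress marker line: the marker followed by a JSON object."""
    return f"{MARKER} {json.dumps(fields)}"


def parse_progress_line(line: str) -> Optional[dict]:
    """Return the JSON payload of a marker line, or None for any other line."""
    stripped = line.strip()
    if not stripped.startswith(MARKER):
        return None
    payload = stripped[len(MARKER):].strip()
    try:
        data = json.loads(payload)
    except json.JSONDecodeError:
        return None  # malformed marker lines are ignored in this sketch
    return data if isinstance(data, dict) else None


# A long-running script would print such a line to its log:
print(emit_progress(current=3))
```

Keeping each marker on a single line lets a monitor scan an ordinary log and pick out structured progress without parsing the rest of the output.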
@@ -15,12 +15,19 @@ Use the same route for:
15
15
  - rebuttal-driven extra experiments
16
16
  - writing-driven evidence gaps
17
17
 
18
+ For paper-facing work, treat “analysis campaign” broadly:
19
+
20
+ - not only post-hoc interpretation
21
+ - also ablations, sensitivity checks, robustness checks, efficiency or cost checks, highlight-validation runs, and limitation-boundary work beyond the main result
22
+
23
+ Do not assume a writing-facing campaign means “analysis only”.
24
+
18
25
  Do not invent a separate experiment system for those cases.
19
26
 
20
27
  ## Interaction discipline
21
28
 
22
29
  - Follow the shared interaction contract injected by the system prompt.
23
- - For ordinary active work, prefer a concise progress update once work has crossed roughly 10 tool calls with a human-meaningful delta, and do not drift beyond roughly 20 tool calls or about 15 minutes without a user-visible update.
30
+ - For ordinary active work, prefer a concise progress update once work has crossed roughly 6 tool calls with a human-meaningful delta, and do not drift beyond roughly 12 tool calls or about 8 minutes without a user-visible update.
24
31
  - Prefer `bash_exec` for campaign slice commands so each run has a durable session id, quest-local log folder, and later `read/list/kill` control.
25
32
  - Keep ordinary subtask completions concise. When an analysis campaign or a stage-significant campaign checkpoint is complete, upgrade to a richer `artifact.interact(kind='milestone', reply_mode='threaded', ...)` report.
26
33
  - That richer campaign milestone report should normally cover: which slices completed, the main takeaway, whether the claim got stronger or weaker, and the exact recommended next route.
@@ -69,11 +76,12 @@ For campaign prioritization and writing-facing slice design, read `references/ca
69
76
  Treat this as the compressed campaign map. The authoritative slice protocol and aggregation rules remain in `Workflow`.
70
77
 
71
78
  1. Bind the campaign to the parent run or idea and, when writing-facing, to the selected outline.
72
- 2. Before launching slices, create `PLAN.md` and `CHECKLIST.md`.
73
- 3. Use `PLAN.md` as the durable charter and `CHECKLIST.md` as the living execution surface while launching, monitoring, recording, and aggregating slices.
74
- 4. Run claim-critical slices first and smoke-test long slices before their real runs.
75
- 5. Revise the plan if slice feasibility, ordering, comparators, or campaign interpretation changes materially, and record every slice durably, including honest non-success states.
76
- 6. Close meaningful campaign milestones with a concise `1-2` sentence summary that says whether the claim gained stable support, partial support, contradiction, or unresolved ambiguity, and what happens next.
79
+ 2. When the campaign is writing-facing, refresh `paper/paper_experiment_matrix.*` before freezing the slice frontier.
80
+ 3. Before launching slices, create `PLAN.md` and `CHECKLIST.md`.
81
+ 4. Use `PLAN.md` as the durable charter and `CHECKLIST.md` as the living execution surface while launching, monitoring, recording, and aggregating slices.
82
+ 5. Run claim-critical slices first and smoke-test long slices before their real runs.
83
+ 6. Revise the plan and matrix if slice feasibility, ordering, comparators, or campaign interpretation changes materially, and record every slice durably, including honest non-success states.
84
+ 7. Close meaningful campaign milestones with a concise `1-2` sentence summary that says whether the claim gained stable support, partial support, contradiction, or unresolved ambiguity, what the matrix frontier now looks like, and what happens next.
77
85
 
78
86
  ## Non-negotiable rules
79
87
 
@@ -83,6 +91,8 @@ Treat this as the compressed campaign map. The authoritative slice protocol and
83
91
  - Every analysis slice must have a specific research question and a falsifiable or at least decision-relevant expectation.
84
92
  - If the campaign is supporting a paper or paper-like report, do not launch it until a selected outline exists.
85
93
  - When a selected outline exists, every slice should map to a named `research_question` and `experimental_design` from that outline.
94
+ - When the campaign is supporting a paper or paper-like report, do not launch or reorder the slice set without first reading `paper/paper_experiment_matrix.md` when it exists.
95
+ - For writing-facing campaigns, every slice should correspond to a stable matrix row such as `exp_id`, not just a free-form note.
86
96
  - Do not aggregate campaign conclusions without per-run evidence.
87
97
  - Do not bury null or contradictory findings.
88
98
 
@@ -110,6 +120,7 @@ Before launching a campaign, confirm:
110
120
  - the list of specific analysis questions
111
121
  - the current quest / user-provided assets that each planned slice will actually use
112
122
  - whether each slice is executable with the current assets, tooling, and available credentials
123
+ - for paper-facing campaigns, the current paper experiment matrix frontier and which rows are actually feasible now
113
124
  - if durable state exposes `active_baseline_metric_contract_json`, read that JSON file before defining slice success criteria or comparison tables
114
125
  - treat `active_baseline_metric_contract_json` as the default baseline comparison contract unless a slice is explicitly testing a different evaluation contract
115
126
 
@@ -150,6 +161,8 @@ A campaign should usually leave behind:
150
161
 
151
162
  - a campaign identifier
152
163
  - a selected outline reference when the campaign is writing-facing
164
+ - a refreshed `paper/paper_experiment_matrix.md`
165
+ - a refreshed `paper/paper_experiment_matrix.json`
153
166
  - one directory per analysis run
154
167
  - any supplementary baseline reproduced for analysis under `baselines/local/<baseline_id>/` or attached under `baselines/imported/<baseline_id>/`
155
168
  - one quest-level supplementary baseline inventory at `artifacts/baselines/analysis_inventory.json`
@@ -198,17 +211,28 @@ If the campaign exists to support a paper or paper-like report:
198
211
 
199
212
  - do not proceed until one selected outline exists
200
213
  - if no selected outline exists yet, route to `write` or `decision` first so the outline can be created and selected durably
214
+ - before deciding the slice list, create or refresh `paper/paper_experiment_matrix.md` when it is missing or stale
215
+ - treat that matrix as the upstream paper experiment contract, not `todo_items` alone
216
+ - use the matrix to decide:
217
+ - which rows are `main_required`
218
+ - which are `main_optional`
219
+ - which are appendix-only
220
+ - which are optional or should be dropped
221
+ - do not start stable experiments-section drafting while currently feasible non-optional matrix rows remain unresolved
201
222
  - call `artifact.create_analysis_campaign(...)` with:
202
223
  - `selected_outline_ref`
203
224
  - `research_questions`
204
225
  - `experimental_designs`
205
226
  - `todo_items`
206
227
  - ensure each todo item names at least:
228
+ - `exp_id`
207
229
  - `todo_id`
208
230
  - `slice_id`
209
231
  - `title`
210
232
  - `research_question`
211
233
  - `experimental_design`
234
+ - `tier`
235
+ - `paper_placement`
212
236
  - `completion_condition`
213
237
 
214
238
  This keeps the analysis campaign aligned with the paper plan instead of becoming a free-floating batch of slices.
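Before calling `artifact.create_analysis_campaign(...)`, the todo items could be checked against the field list above. This is a small sketch assuming each todo item is a plain dictionary; the helper name and the sample values are hypothetical.

```python
from typing import Dict, List

# Field names taken from the todo-item list above.
REQUIRED_TODO_FIELDS = (
    "exp_id",
    "todo_id",
    "slice_id",
    "title",
    "research_question",
    "experimental_design",
    "tier",
    "paper_placement",
    "completion_condition",
)


def missing_todo_fields(item: Dict[str, object]) -> List[str]:
    """Return the required fields that are absent or blank in one todo item."""
    missing = []
    for name in REQUIRED_TODO_FIELDS:
        value = item.get(name)
        if value is None or not str(value).strip():
            missing.append(name)
    return missing


# Illustrative item with made-up values; it lacks the three matrix-facing fields.
draft_item = {
    "todo_id": "todo-1",
    "slice_id": "slice-1",
    "title": "Seed sensitivity",
    "research_question": "Is the gain stable across seeds?",
    "experimental_design": "Rerun the main configuration with additional seeds",
    "completion_condition": "All reruns recorded",
}
print(missing_todo_fields(draft_item))
```

A check like this catches items that were written before the matrix fields (`exp_id`, `tier`, `paper_placement`) were introduced.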
@@ -229,6 +253,7 @@ The charter should also include:
229
253
  - campaign type priority order
230
254
  - expected slice count
231
255
  - dependency structure between slices
256
+ - the matrix path and current execution frontier
232
257
  - whether any slice requires isolated code changes or only reruns/config changes
233
258
  - the top-level success condition for ending the campaign
234
259
  - the top-level abandonment condition for stopping it early
@@ -238,6 +263,7 @@ Prefer to keep this charter in `PLAN.md` first and mirror the execution frontier
238
263
  For each analysis question, also state:
239
264
 
240
265
  - why it matters to the main claim
266
+ - whether it exists mainly to support a core claim, validate a highlight, answer an efficiency or cost concern, or bound a limitation
241
267
  - what result would strengthen the claim
242
268
  - what result would weaken or complicate the claim
243
269
  - whether the run is:
@@ -267,6 +293,8 @@ Each analysis run should correspond to one need, such as:
267
293
  - run additional seeds
268
294
  - inspect one failure bucket
269
295
  - test one environment variation
296
+ - measure one efficiency or cost dimension
297
+ - validate one highlight hypothesis
270
298
 
271
299
  Avoid changing many factors at once unless the campaign is explicitly exploratory.
272
300
 
@@ -283,9 +311,13 @@ For each slice, define at minimum:
283
311
 
284
312
  Recommended extra per-slice fields:
285
313
 
314
+ - `exp_id`
286
315
  - `slice_id`
287
316
  - `run_kind`
288
317
  - `slice_class`, such as `auxiliary`, `claim-carrying`, or `supporting`
318
+ - `tier`, such as `main_required`, `main_optional`, `appendix`, or `optional`
319
+ - `paper_placement`
320
+ - `highlight_ids`
289
321
  - `required_baselines`, where each item records at least `baseline_id` plus the reason, benchmark, and split when known
290
322
 
291
323
  If a slice needs an extra comparator baseline:
@@ -321,6 +353,14 @@ Treat `campaign_id` as system-owned, and treat `slice_id` / `todo_id` as agent-a
321
353
  Do not replace the normal campaign flow with repeated manual `artifact.prepare_branch(...)` calls.
322
354
  After each slice finishes, call `artifact.record_analysis_slice(...)` immediately so the result is mirrored back to the parent branch and the next slice can be activated.
323
355
  If a slice fails or becomes infeasible, still call `artifact.record_analysis_slice(...)` with an honest non-success status plus the real blocker and next recommendation; do not leave the campaign state ambiguous.
356
+ After every completed, excluded, or blocked writing-facing slice:
357
+
358
+ - reopen `paper/paper_experiment_matrix.md`
359
+ - update the row status, feasibility, and result artifacts
360
+ - update whether the row now belongs in main text, appendix, or omission
361
+ - update the remaining execution frontier before choosing the next slice
362
+
363
+ Do not keep launching writing-facing slices from stale memory when the matrix has changed.
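The refresh loop above can be sketched against an in-memory copy of the matrix. Everything about the row shape here is an assumption: the source does not show the schema of `paper/paper_experiment_matrix.json`, so the keys (`tier`, `feasible`, `status`, `result_artifacts`, `paper_placement`), the status value `pending`, the artifact path, and the reading of "non-optional" as "tier is not `optional`" are all illustrative.

```python
from typing import Dict, List

Matrix = Dict[str, dict]  # hypothetical shape: exp_id -> row


def update_row(matrix: Matrix, exp_id: str, *, status: str, feasible: bool,
               result_artifacts: List[str], paper_placement: str) -> None:
    """Record the outcome of one slice on its matrix row."""
    matrix[exp_id].update(
        status=status,
        feasible=feasible,
        result_artifacts=list(result_artifacts),
        paper_placement=paper_placement,
    )


def execution_frontier(matrix: Matrix) -> List[str]:
    """Rows that are feasible now, not optional, and still unresolved."""
    return [
        exp_id
        for exp_id, row in matrix.items()
        if row["tier"] != "optional" and row["feasible"] and row["status"] == "pending"
    ]


def can_draft_experiments_section(matrix: Matrix) -> bool:
    """Gate stable experiments-section drafting on an empty frontier."""
    return not execution_frontier(matrix)


matrix: Matrix = {
    "E1": {"tier": "main_required", "feasible": True, "status": "pending"},
    "E2": {"tier": "optional", "feasible": True, "status": "pending"},
}
update_row(matrix, "E1", status="completed", feasible=True,
           result_artifacts=["runs/E1/summary.json"], paper_placement="main_text")
```

Recomputing the frontier from the matrix after every update, rather than carrying a remembered slice list forward, is what keeps the next slice choice from drifting out of date.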
324
364
  For slice recording, `deviations` and `evidence_paths` are optional context fields, not mandatory ceremony; include them only when they materially help explanation or auditability.
325
365
  Each `artifact.record_analysis_slice(...)` call should also include an `evaluation_summary` with exactly these six fields:
326
366
 
@@ -10,6 +10,9 @@ Treat it as the durable version of the charter, not a separate optional memo.
10
10
  - main claim under test:
11
11
  - user's core requirements:
12
12
  - campaign outcome needed:
13
+ - selected outline ref:
14
+ - paper experiment matrix path:
15
+ - current matrix execution frontier:
13
16
 
14
17
  ## 2. Boundary And Comparability
15
18
 
@@ -20,18 +23,26 @@ Treat it as the durable version of the charter, not a separate optional memo.
20
23
 
21
24
  ## 3. Slice Plan
22
25
 
23
- | Slice id | Slice class | Research question | Expected value | Priority | Needs code change? | Needs extra baseline? |
24
- |---|---|---|---|---|---|---|
25
- | | auxiliary / claim-carrying / supporting | | | | yes / no | yes / no |
26
+ | Exp id | Slice id | Tier | Slice class | Experiment type | Research question | Expected value | Priority | Paper placement | Needs code change? | Needs extra baseline? |
27
+ |---|---|---|---|---|---|---|---|---|---|---|
28
+ | | | main_required / main_optional / appendix / optional | auxiliary / claim-carrying / supporting | ablation / sensitivity / robustness / efficiency / highlight / boundary / case-study | | | | main_text / appendix / maybe / omit | yes / no | yes / no |
26
29
 
27
- ## 4. Assets And Dependencies
30
+ ## 4. Highlight Hypotheses
31
+
32
+ - highlight id:
33
+ - one-line claim:
34
+ - why it is plausible:
35
+ - which slices validate or falsify it:
36
+ - what happens if it fails:
37
+
38
+ ## 5. Assets And Dependencies
28
39
 
29
40
  - quest-local assets already available:
30
41
  - checkpoints / baselines already available:
31
42
  - downloads or services still needed:
32
43
  - fallback options if external assets are blocked:
33
44
 
34
- ## 5. Execution Strategy
45
+ ## 6. Execution Strategy
35
46
 
36
47
  - first slices to run:
37
48
  - smoke-test policy:
@@ -49,19 +60,21 @@ Monitoring and sleep plan:
49
60
  - health signals that justify continued monitoring:
50
61
  - conditions that trigger slice redesign, kill, or campaign revision:
51
62
 
52
- ## 6. Reporting Plan
63
+ ## 7. Reporting Plan
53
64
 
54
65
  - what will count as stable support:
55
66
  - what will count as contradiction:
56
67
  - what will count as unresolved ambiguity:
57
68
  - campaign summary should say in `1-2` sentences:
69
+ - matrix refresh rule after every slice:
70
+ - main-text gating rule:
58
71
 
59
- ## 7. Checklist Link
72
+ ## 8. Checklist Link
60
73
 
61
74
  - checklist path:
62
75
  - next unchecked item:
63
76
 
64
- ## 8. Revision Log
77
+ ## 9. Revision Log
65
78
 
66
79
  | Time | What changed | Why it changed | Impact on slices or interpretation |
67
80
  |---|---|---|---|
@@ -11,7 +11,7 @@ It absorbs the essential old DeepScientist reproducer discipline into one stage
11
11
  ## Interaction discipline
12
12
 
13
13
  - Follow the shared interaction contract injected by the system prompt.
14
- - For ordinary active work, prefer a concise progress update once work has crossed roughly 10 tool calls with a human-meaningful delta, and do not drift beyond roughly 20 tool calls or about 15 minutes without a user-visible update.
14
+ - For ordinary active work, prefer a concise progress update once work has crossed roughly 6 tool calls with a human-meaningful delta, and do not drift beyond roughly 12 tool calls or about 8 minutes without a user-visible update.
15
15
  - Keep ordinary setup and debugging updates concise. Reserve richer milestone reports for accepted / waived / blocked baseline outcomes or other route-changing checkpoints instead of narrating every small setup step.
16
16
  - Message templates are references only. Adapt to the actual context and vary wording so updates feel natural and non-robotic.
17
17
  - If a threaded user reply arrives, interpret it relative to the latest baseline progress update before assuming the task changed completely.
@@ -10,7 +10,7 @@ Use this skill whenever continuation is non-trivial.
10
10
  ## Interaction discipline
11
11
 
12
12
  - Follow the shared interaction contract injected by the system prompt.
13
- - For ordinary active work, prefer a concise progress update once work has crossed roughly 10 tool calls with a human-meaningful delta, and do not drift beyond roughly 20 tool calls or about 15 minutes without a user-visible update.
13
+ - For ordinary active work, prefer a concise progress update once work has crossed roughly 6 tool calls with a human-meaningful delta, and do not drift beyond roughly 12 tool calls or about 8 minutes without a user-visible update.
14
14
  - Message templates are references only. Adapt to context and vary wording so updates feel natural and non-robotic.
15
15
  - If the runtime starts an auto-continue turn with no new user message, continue from the active requirements and durable quest state instead of replaying the previous user turn.
16
16
  - If `startup_contract.decision_policy = autonomous`, do not emit ordinary `artifact.interact(kind='decision_request', ...)` calls; decide the route yourself, record the reason, and continue.
@@ -10,7 +10,7 @@ Use this skill for the main evidence-producing runs of the quest.
10
10
  ## Interaction discipline
11
11
 
12
12
  - Follow the shared interaction contract injected by the system prompt.
13
- - For ordinary active work, prefer a concise progress update once work has crossed roughly 10 tool calls with a human-meaningful delta, and do not drift beyond roughly 20 tool calls or about 15 minutes without a user-visible update.
13
+ - For ordinary active work, prefer a concise progress update once work has crossed roughly 6 tool calls with a human-meaningful delta, and do not drift beyond roughly 12 tool calls or about 8 minutes without a user-visible update.
14
14
  - Keep ordinary subtask completions concise. When a main experiment actually finishes or reaches a stage-significant checkpoint, upgrade to a richer `artifact.interact(kind='milestone', reply_mode='threaded', ...)` report rather than another short progress line.
15
15
  - That richer experiment-stage milestone report should normally cover: what run finished, the headline result versus baseline or expectation, the main caveat, and the exact recommended next action.
16
16
  - That richer milestone report is still normally non-blocking. If the next route is already justified locally, continue automatically after reporting rather than idling for acknowledgment.