@researai/deepscientist 1.5.14 → 1.5.16
- package/README.md +336 -90
- package/assets/branding/logo-raster.png +0 -0
- package/bin/ds.js +816 -131
- package/docs/en/00_QUICK_START.md +36 -15
- package/docs/en/01_SETTINGS_REFERENCE.md +53 -4
- package/docs/en/02_START_RESEARCH_GUIDE.md +7 -0
- package/docs/en/03_QQ_CONNECTOR_GUIDE.md +19 -0
- package/docs/en/05_TUI_GUIDE.md +6 -0
- package/docs/en/06_RUNTIME_AND_CANVAS.md +4 -3
- package/docs/en/09_DOCTOR.md +11 -5
- package/docs/en/10_WEIXIN_CONNECTOR_GUIDE.md +20 -0
- package/docs/en/14_PROMPT_SKILLS_AND_MCP_GUIDE.md +65 -13
- package/docs/en/15_CODEX_PROVIDER_SETUP.md +25 -8
- package/docs/en/16_TELEGRAM_CONNECTOR_GUIDE.md +134 -0
- package/docs/en/17_WHATSAPP_CONNECTOR_GUIDE.md +126 -0
- package/docs/en/18_FEISHU_CONNECTOR_GUIDE.md +136 -0
- package/docs/en/19_EXTERNAL_CONTROLLER_GUIDE.md +226 -0
- package/docs/en/19_LOCAL_BROWSER_AUTH.md +70 -0
- package/docs/en/20_WORKSPACE_MODES_GUIDE.md +250 -0
- package/docs/en/README.md +24 -0
- package/docs/zh/00_QUICK_START.md +36 -15
- package/docs/zh/01_SETTINGS_REFERENCE.md +53 -4
- package/docs/zh/02_START_RESEARCH_GUIDE.md +7 -0
- package/docs/zh/03_QQ_CONNECTOR_GUIDE.md +19 -0
- package/docs/zh/05_TUI_GUIDE.md +6 -0
- package/docs/zh/09_DOCTOR.md +11 -5
- package/docs/zh/10_WEIXIN_CONNECTOR_GUIDE.md +20 -0
- package/docs/zh/14_PROMPT_SKILLS_AND_MCP_GUIDE.md +65 -13
- package/docs/zh/15_CODEX_PROVIDER_SETUP.md +25 -8
- package/docs/zh/16_TELEGRAM_CONNECTOR_GUIDE.md +134 -0
- package/docs/zh/17_WHATSAPP_CONNECTOR_GUIDE.md +126 -0
- package/docs/zh/18_FEISHU_CONNECTOR_GUIDE.md +136 -0
- package/docs/zh/19_EXTERNAL_CONTROLLER_GUIDE.md +226 -0
- package/docs/zh/19_LOCAL_BROWSER_AUTH.md +68 -0
- package/docs/zh/20_WORKSPACE_MODES_GUIDE.md +251 -0
- package/docs/zh/README.md +24 -0
- package/install.sh +2 -0
- package/package.json +1 -1
- package/pyproject.toml +1 -1
- package/src/deepscientist/__init__.py +1 -1
- package/src/deepscientist/acp/envelope.py +6 -0
- package/src/deepscientist/artifact/charts.py +567 -0
- package/src/deepscientist/artifact/guidance.py +50 -10
- package/src/deepscientist/artifact/metrics.py +228 -5
- package/src/deepscientist/artifact/schemas.py +3 -0
- package/src/deepscientist/artifact/service.py +4276 -308
- package/src/deepscientist/bash_exec/models.py +23 -0
- package/src/deepscientist/bash_exec/monitor.py +147 -67
- package/src/deepscientist/bash_exec/runtime.py +218 -156
- package/src/deepscientist/bash_exec/service.py +309 -69
- package/src/deepscientist/bash_exec/shells.py +87 -0
- package/src/deepscientist/bridges/connectors.py +51 -2
- package/src/deepscientist/cli.py +115 -19
- package/src/deepscientist/codex_cli_compat.py +232 -0
- package/src/deepscientist/config/models.py +8 -4
- package/src/deepscientist/config/service.py +38 -11
- package/src/deepscientist/connector/weixin_support.py +122 -1
- package/src/deepscientist/daemon/api/handlers.py +199 -9
- package/src/deepscientist/daemon/api/router.py +5 -0
- package/src/deepscientist/daemon/app.py +1458 -289
- package/src/deepscientist/doctor.py +51 -0
- package/src/deepscientist/file_lock.py +48 -0
- package/src/deepscientist/gitops/__init__.py +10 -1
- package/src/deepscientist/gitops/diff.py +296 -1
- package/src/deepscientist/gitops/service.py +4 -1
- package/src/deepscientist/mcp/server.py +212 -5
- package/src/deepscientist/process_control.py +161 -0
- package/src/deepscientist/prompts/builder.py +501 -453
- package/src/deepscientist/quest/layout.py +15 -2
- package/src/deepscientist/quest/service.py +2539 -195
- package/src/deepscientist/quest/stage_views.py +177 -1
- package/src/deepscientist/runners/base.py +2 -0
- package/src/deepscientist/runners/codex.py +169 -31
- package/src/deepscientist/runners/runtime_overrides.py +17 -1
- package/src/deepscientist/skills/__init__.py +2 -2
- package/src/deepscientist/skills/installer.py +196 -5
- package/src/deepscientist/skills/registry.py +66 -0
- package/src/prompts/connectors/qq.md +18 -8
- package/src/prompts/connectors/weixin.md +16 -6
- package/src/prompts/contracts/shared_interaction.md +24 -4
- package/src/prompts/system.md +921 -72
- package/src/prompts/system_copilot.md +43 -0
- package/src/skills/analysis-campaign/SKILL.md +32 -2
- package/src/skills/analysis-campaign/references/artifact-orchestration.md +1 -1
- package/src/skills/analysis-campaign/references/writing-facing-slice-examples.md +65 -0
- package/src/skills/baseline/SKILL.md +10 -0
- package/src/skills/decision/SKILL.md +27 -2
- package/src/skills/experiment/SKILL.md +16 -2
- package/src/skills/figure-polish/SKILL.md +1 -0
- package/src/skills/finalize/SKILL.md +19 -0
- package/src/skills/idea/SKILL.md +79 -0
- package/src/skills/idea/references/idea-generation-playbook.md +100 -0
- package/src/skills/idea/references/outline-seeding-example.md +60 -0
- package/src/skills/intake-audit/SKILL.md +9 -1
- package/src/skills/mentor/SKILL.md +217 -0
- package/src/skills/mentor/references/correction-rules.md +210 -0
- package/src/skills/mentor/references/knowledge-profile.md +91 -0
- package/src/skills/mentor/references/persona-profile.md +138 -0
- package/src/skills/mentor/references/taste-profile.md +128 -0
- package/src/skills/mentor/references/thought-style-profile.md +138 -0
- package/src/skills/mentor/references/work-profile.md +289 -0
- package/src/skills/mentor/references/workflow-profile.md +240 -0
- package/src/skills/optimize/SKILL.md +1645 -0
- package/src/skills/rebuttal/SKILL.md +3 -1
- package/src/skills/review/SKILL.md +3 -1
- package/src/skills/scout/SKILL.md +8 -0
- package/src/skills/write/SKILL.md +81 -12
- package/src/skills/write/references/outline-evidence-contract-example.md +107 -0
- package/src/tui/dist/app/AppContainer.js +22 -11
- package/src/tui/dist/index.js +4 -1
- package/src/tui/dist/lib/api.js +33 -3
- package/src/tui/package.json +1 -1
- package/src/ui/dist/assets/AiManusChatView-COFACy7V.js +204 -0
- package/src/ui/dist/assets/AnalysisPlugin-DnSm0GZn.js +1 -0
- package/src/ui/dist/assets/CliPlugin-CvwCmDQ5.js +109 -0
- package/src/ui/dist/assets/CodeEditorPlugin-cOqSa0xq.js +2 -0
- package/src/ui/dist/assets/CodeViewerPlugin-itb0tltR.js +270 -0
- package/src/ui/dist/assets/DocViewerPlugin-DqKkiCI6.js +7 -0
- package/src/ui/dist/assets/GitCommitViewerPlugin-DVgNHBCS.js +1 -0
- package/src/ui/dist/assets/GitDiffViewerPlugin-DxL2ezFG.js +6 -0
- package/src/ui/dist/assets/GitSnapshotViewer-B_RQm1YZ.js +30 -0
- package/src/ui/dist/assets/ImageViewerPlugin-tHqlXY3n.js +26 -0
- package/src/ui/dist/assets/LabCopilotPanel-ClMbq5Yu.js +14 -0
- package/src/ui/dist/assets/LabPlugin-L_SuE8ow.js +22 -0
- package/src/ui/dist/assets/LatexPlugin-B495DTXC.js +25 -0
- package/src/ui/dist/assets/MarkdownViewerPlugin-DG28-61B.js +128 -0
- package/src/ui/dist/assets/MarketplacePlugin-BiOGT-Kj.js +13 -0
- package/src/ui/dist/assets/{NotebookEditor-CccQYZjX.css → NotebookEditor-BHH8rdGj.css} +1 -1
- package/src/ui/dist/assets/NotebookEditor-BOr3x3Ej.css +1 -0
- package/src/ui/dist/assets/NotebookEditor-C-4Kt1p9.js +81 -0
- package/src/ui/dist/assets/NotebookEditor-CVsj8h_T.js +361 -0
- package/src/ui/dist/assets/PdfLoader-CASDQmxJ.js +16 -0
- package/src/ui/dist/assets/PdfLoader-Cy5jtWrr.css +1 -0
- package/src/ui/dist/assets/PdfMarkdownPlugin-BFhwoKsY.js +1 -0
- package/src/ui/dist/assets/PdfViewerPlugin-DcOzU9vd.js +17 -0
- package/src/ui/dist/assets/PdfViewerPlugin-nwwE-fjJ.css +1 -0
- package/src/ui/dist/assets/SearchPlugin-CHj7M58O.js +16 -0
- package/src/ui/dist/assets/SearchPlugin-DA4en4hK.css +1 -0
- package/src/ui/dist/assets/TextViewerPlugin-CB4DYfWO.js +54 -0
- package/src/ui/dist/assets/VNCViewer-CjlbyCB3.js +11 -0
- package/src/ui/dist/assets/bot-CFkZY-JP.js +6 -0
- package/src/ui/dist/assets/browser-CTB2jwNe.js +8 -0
- package/src/ui/dist/assets/chevron-up-Dq5ofbht.js +6 -0
- package/src/ui/dist/assets/code-DLC6G24T.js +6 -0
- package/src/ui/dist/assets/file-content-Dv4LoZec.js +1 -0
- package/src/ui/dist/assets/file-diff-panel-Denq-lC3.js +1 -0
- package/src/ui/dist/assets/file-jump-queue-DA-SdG__.js +1 -0
- package/src/ui/dist/assets/file-socket-Cu4Qln7Y.js +1 -0
- package/src/ui/dist/assets/git-commit-horizontal-BUh6G52n.js +6 -0
- package/src/ui/dist/assets/image-B9HUUddG.js +6 -0
- package/src/ui/dist/assets/index-B2B1sg-M.js +1 -0
- package/src/ui/dist/assets/index-Cgla8biy.css +33 -0
- package/src/ui/dist/assets/index-DRyx7vAc.js +1 -0
- package/src/ui/dist/assets/index-Gbl53BNp.js +2496 -0
- package/src/ui/dist/assets/index-wQ7RIIRd.js +11 -0
- package/src/ui/dist/assets/monaco-CiHMMNH_.js +1 -0
- package/src/ui/dist/assets/pdf-effect-queue-ZtnHFCAi.js +6 -0
- package/src/ui/dist/assets/plugin-monaco-C8UgLomw.js +19 -0
- package/src/ui/dist/assets/plugin-notebook-HbW2K-1c.js +169 -0
- package/src/ui/dist/assets/plugin-pdf-CR8hgQBV.js +357 -0
- package/src/ui/dist/assets/plugin-terminal-MXFIPun8.js +227 -0
- package/src/ui/dist/assets/popover-DL6h35vr.js +1 -0
- package/src/ui/dist/assets/project-sync-CsX08Qno.js +1 -0
- package/src/ui/dist/assets/select-DvmXt1yY.js +11 -0
- package/src/ui/dist/assets/sigma-7jpXazui.js +6 -0
- package/src/ui/dist/assets/trash-xA7kFt8i.js +11 -0
- package/src/ui/dist/assets/useCliAccess-DsMwDjOp.js +1 -0
- package/src/ui/dist/assets/useFileDiffOverlay-FuhcnKiw.js +1 -0
- package/src/ui/dist/assets/wrap-text-CwMn-iqb.js +11 -0
- package/src/ui/dist/assets/zoom-out-R-GWEhzS.js +11 -0
- package/src/ui/dist/index.html +5 -2
- package/src/ui/dist/assets/AiManusChatView-DaF9Nge_.js +0 -26597
- package/src/ui/dist/assets/AnalysisPlugin-BSVx6dXE.js +0 -123
- package/src/ui/dist/assets/CliPlugin-C9gzJX41.js +0 -5905
- package/src/ui/dist/assets/CodeEditorPlugin-DU9G0Tox.js +0 -427
- package/src/ui/dist/assets/CodeViewerPlugin-DoX_fI9l.js +0 -905
- package/src/ui/dist/assets/DocViewerPlugin-C4FWIXuU.js +0 -278
- package/src/ui/dist/assets/GitDiffViewerPlugin-BgfFMgtf.js +0 -2661
- package/src/ui/dist/assets/ImageViewerPlugin-tcPkfY_x.js +0 -500
- package/src/ui/dist/assets/LabCopilotPanel-_dKV60Bf.js +0 -4104
- package/src/ui/dist/assets/LabPlugin-Bje0ayoC.js +0 -2677
- package/src/ui/dist/assets/LatexPlugin-CVsBzAln.js +0 -1792
- package/src/ui/dist/assets/MarkdownViewerPlugin-xjmrqv_8.js +0 -308
- package/src/ui/dist/assets/MarketplacePlugin-mMM2A8wP.js +0 -413
- package/src/ui/dist/assets/NotebookEditor-3kVDSOBo.js +0 -4214
- package/src/ui/dist/assets/NotebookEditor-C3VQ7ylN.css +0 -1405
- package/src/ui/dist/assets/NotebookEditor-SoJ8X-MO.js +0 -84873
- package/src/ui/dist/assets/PdfLoader-C-Y707R3.css +0 -49
- package/src/ui/dist/assets/PdfLoader-DElVuHl9.js +0 -25468
- package/src/ui/dist/assets/PdfMarkdownPlugin-Bq88XT4G.js +0 -409
- package/src/ui/dist/assets/PdfViewerPlugin-CsCXMo9S.js +0 -3095
- package/src/ui/dist/assets/PdfViewerPlugin-DQ11QcSf.css +0 -3627
- package/src/ui/dist/assets/SearchPlugin-DDMrGDkh.css +0 -379
- package/src/ui/dist/assets/SearchPlugin-oUPvy19k.js +0 -741
- package/src/ui/dist/assets/TextViewerPlugin-CRkT9yNy.js +0 -472
- package/src/ui/dist/assets/VNCViewer-BgbuvWhR.js +0 -18821
- package/src/ui/dist/assets/awareness-C0NPR2Dj.js +0 -292
- package/src/ui/dist/assets/bot-v_RASACv.js +0 -21
- package/src/ui/dist/assets/browser-BAcuE0Xj.js +0 -2895
- package/src/ui/dist/assets/code-5hC9d0VH.js +0 -17
- package/src/ui/dist/assets/file-content-D1PxfOrp.js +0 -377
- package/src/ui/dist/assets/file-diff-panel-DG1oT_Hj.js +0 -92
- package/src/ui/dist/assets/file-jump-queue-r5XKgJEV.js +0 -16
- package/src/ui/dist/assets/file-socket-BmdFYQlk.js +0 -58
- package/src/ui/dist/assets/function-B5QZkkHC.js +0 -1895
- package/src/ui/dist/assets/image-Dqe2X2tW.js +0 -18
- package/src/ui/dist/assets/index-BQG-1s2o.css +0 -12553
- package/src/ui/dist/assets/index-DVsMKK_y.js +0 -25
- package/src/ui/dist/assets/index-Duvz8Ip0.js +0 -159
- package/src/ui/dist/assets/index-Nt9hS4ck.js +0 -244829
- package/src/ui/dist/assets/index-RDlNXXx1.js +0 -120
- package/src/ui/dist/assets/monaco-DIXge1CP.js +0 -623
- package/src/ui/dist/assets/pdf-effect-queue-BBTTQaO-.js +0 -47
- package/src/ui/dist/assets/pdf_viewer-e0g1is2C.js +0 -8206
- package/src/ui/dist/assets/popover-BWlolyxo.js +0 -476
- package/src/ui/dist/assets/project-sync-BM5PkFH4.js +0 -297
- package/src/ui/dist/assets/select-D4dAtrA8.js +0 -1690
- package/src/ui/dist/assets/sigma-CKbE5jJT.js +0 -22
- package/src/ui/dist/assets/square-check-big-CZNGMgiB.js +0 -17
- package/src/ui/dist/assets/trash-DaB37xAz.js +0 -32
- package/src/ui/dist/assets/useCliAccess-C2OmAcWe.js +0 -957
- package/src/ui/dist/assets/useFileDiffOverlay-Dowd1Ij4.js +0 -53
- package/src/ui/dist/assets/wrap-text-BGjAhAUq.js +0 -35
- package/src/ui/dist/assets/yjs-DncrqiZ8.js +0 -11243
- package/src/ui/dist/assets/zoom-out-dMZQMXzc.js +0 -34
|
@@ -0,0 +1,43 @@
|
|
|
1
|
+
# DeepScientist Copilot System Prompt
|
|
2
|
+
|
|
3
|
+
You are DeepScientist, the user's research copilot for a single quest.
|
|
4
|
+
Help with planning, reading, coding, experiments, writing, debugging, environment work, analysis, and synthesis.
|
|
5
|
+
Do not assume the user wants the full autonomous research graph unless they explicitly ask for it.
|
|
6
|
+
You are a user-directed copilot, not an auto-pilot stage scheduler.
|
|
7
|
+
|
|
8
|
+
Treat arbitrary research tasks as valid first-class work here: repo audit, paper reading, experiment design, code changes, run inspection, result analysis, writing, and research planning can all be handled directly.
|
|
9
|
+
Default to request-scoped help, not stage expansion. Only shift into longer autonomous continuation when the user explicitly asks for end-to-end ownership or unattended progress.
|
|
10
|
+
|
|
11
|
+
Work in short cycles: understand the request, make a brief plan, execute the smallest useful unit, record important context durably, then report what changed and wait.
|
|
12
|
+
Use memory for durable recall, artifact for quest state and git-aware research operations, and bash_exec for terminal execution.
|
|
13
|
+
Prefer `artifact.git(...)` when a coherent implementation unit materially changed files and should become one durable git node.
|
|
14
|
+
|
|
15
|
+
Copilot SOP for ordinary user turns:
|
|
16
|
+
|
|
17
|
+
1. classify the request first:
|
|
18
|
+
- direct answer or judgment
|
|
19
|
+
- repo / workspace inspection
|
|
20
|
+
- code or file change
|
|
21
|
+
- git operation
|
|
22
|
+
- command / environment / debugging task
|
|
23
|
+
- experiment or long-running execution
|
|
24
|
+
2. choose the narrowest correct tool path before acting:
|
|
25
|
+
- use `artifact.git(...)` first for git state, commit, diff, branch, checkout, log, and show operations inside the current quest repository or worktree
|
|
26
|
+
- use `bash_exec(...)` for any shell, CLI, Python, bash, node, git CLI, or environment command execution
|
|
27
|
+
- use `artifact.read_quest_documents(...)`, `artifact.get_quest_state(...)`, or `memory.*` when you need durable quest context instead of shelling out
|
|
28
|
+
3. execute the smallest useful unit, persist only the important result, then answer plainly
|
|
29
|
+
|
|
30
|
+
Hard copilot tool rules:
|
|
31
|
+
|
|
32
|
+
- **Do not use native `shell_command` or Codex `command_execution`.**
|
|
33
|
+
- **All shell, CLI, Python, bash, node, git, package, environment, and terminal-like operations must go through `bash_exec(...)`.**
|
|
34
|
+
- **Even if the runner or model surface exposes `shell_command`, ignore it and reformulate the action as `bash_exec(...)`.**
|
|
35
|
+
- **Treat any attempt to use native `shell_command` / `command_execution` as a policy violation and immediately switch back to `bash_exec(...)`.**
|
|
36
|
+
- Do not default into `decision`-style route analysis for an ordinary direct task just because the request is open-ended or exploratory.
|
|
37
|
+
- Use `decision` only when the user is explicitly asking for a route / go-no-go judgment, or when cost, scope, branch choice, or scientific direction would materially change.
|
|
38
|
+
- If the user asks to test git itself rather than mutate the current quest repo, prefer an isolated scratch repo through `bash_exec(...)`; if the task is about the current quest repo, prefer `artifact.git(...)`.
|
|
39
|
+
|
|
40
|
+
When a branch, cost, or scientific direction materially changes the user's intent, ask before proceeding.
|
|
41
|
+
If the user asks for an open-ended research goal, first frame the immediate next unit clearly and start there instead of inventing a full autonomous route.
|
|
42
|
+
After finishing the requested unit of work, park and wait for the next user message or `/resume`.
|
|
43
|
+
stop_rule: once the current requested unit is done, summarize what changed, note anything still pending, and wait instead of auto-continuing.
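The routing rule in the prompt above (quest-repo git through `artifact.git(...)`, everything else terminal-like through `bash_exec(...)`, never the native shell) can be sketched as a small decision function. This is only an illustration: `route_command` and its parameters are hypothetical names, not part of the package, and the function returns tool names as strings rather than calling any real tool.

```python
def route_command(is_git: bool, targets_quest_repo: bool) -> str:
    """Pick the preferred tool family for a terminal-like action.

    Hypothetical helper: it only mirrors the preference stated in the
    prompt and does not invoke artifact.git or bash_exec.
    """
    if is_git and targets_quest_repo:
        # Git state, commit, diff, branch, checkout, log, and show inside
        # the current quest repository or worktree go to artifact.git first.
        return "artifact.git"
    # Every other shell, CLI, Python, node, or environment command, and
    # git smoke tests in an isolated scratch repo, go through bash_exec.
    # The native shell_command / command_execution path is never returned.
    return "bash_exec"
```

For example, `route_command(is_git=True, targets_quest_repo=False)` yields `"bash_exec"`, matching the scratch-repo rule for testing git itself.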
|
|
@@ -1,6 +1,7 @@
|
|
|
1
1
|
---
|
|
2
2
|
name: analysis-campaign
|
|
3
3
|
description: Use when a quest needs one or more follow-up runs such as ablations, robustness checks, error analysis, or failure analysis after a main experiment.
|
|
4
|
+
skill_role: stage
|
|
4
5
|
---
|
|
5
6
|
|
|
6
7
|
# Analysis Campaign
|
|
@@ -28,6 +29,7 @@ Do not invent a separate experiment system for those cases.
|
|
|
28
29
|
|
|
29
30
|
- Follow the shared interaction contract injected by the system prompt.
|
|
30
31
|
- For ordinary active work, prefer a concise progress update once work has crossed roughly 6 tool calls with a human-meaningful delta, and do not drift beyond roughly 12 tool calls or about 8 minutes without a user-visible update.
|
|
32
|
+
- Hard execution rule: every terminal command in this stage must go through `bash_exec`; do not use any other terminal path for slice execution, smoke tests, Git, Python, package-manager, or file-inspection commands.
|
|
31
33
|
- Prefer `bash_exec` for campaign slice commands so each run has a durable session id, quest-local log folder, and later `read/list/kill` control.
|
|
32
34
|
- Keep ordinary subtask completions concise. When an analysis campaign or a stage-significant campaign checkpoint is complete, upgrade to a richer `artifact.interact(kind='milestone', reply_mode='threaded', ...)` report.
|
|
33
35
|
- That richer campaign milestone report should normally cover: which slices completed, the main takeaway, whether the claim got stronger or weaker, and the exact recommended next route.
|
|
@@ -70,6 +72,7 @@ It preserves the core old DeepScientist analysis-experimenter discipline:
|
|
|
70
72
|
The campaign should behave like a disciplined evidence program, not an unstructured pile of extra runs.
|
|
71
73
|
|
|
72
74
|
For campaign prioritization and writing-facing slice design, read `references/campaign-design.md`.
|
|
75
|
+
When the campaign is paper-facing and the mapping fields are not obvious, also read `references/writing-facing-slice-examples.md`.
|
|
73
76
|
|
|
74
77
|
## Quick workflow
|
|
75
78
|
|
|
@@ -93,6 +96,7 @@ Treat this as the compressed campaign map. The authoritative slice protocol and
|
|
|
93
96
|
- When a selected outline exists, every slice should map to a named `research_question` and `experimental_design` from that outline.
|
|
94
97
|
- When the campaign is supporting a paper or paper-like report, do not launch or reorder the slice set without first reading `paper/paper_experiment_matrix.md` when it exists.
|
|
95
98
|
- For writing-facing campaigns, every slice should correspond to a stable matrix row such as `exp_id`, not just a free-form note.
|
|
99
|
+
- For writing-facing campaigns, every todo item must also carry `section_id`, `item_id`, `claim_links`, and `paper_role`; otherwise the slice is not paper-ready.
|
|
96
100
|
- Do not aggregate campaign conclusions without per-run evidence.
|
|
97
101
|
- Do not bury null or contradictory findings.
|
|
98
102
|
|
|
@@ -129,6 +133,22 @@ Treat quest files, attached user assets, checkpoints, configs, extracted texts,
|
|
|
129
133
|
Do not design slices around hypothetical resources that the current system cannot actually access or run.
|
|
130
134
|
If a slice cannot be executed with the current system, redesign it around available assets or explicitly report that the task cannot currently be completed.
|
|
131
135
|
If infeasibility appears mid-run, attempt bounded recovery first; if still blocked, record the slice with a non-success status and explain why.
|
|
136
|
+
If ids, active refs, or current quest state are unclear after restart, call `artifact.get_quest_state(detail='summary')` and `artifact.resolve_runtime_refs(...)` before launching or recording slices.
|
|
137
|
+
If the exact quest brief / plan / status wording matters for campaign scope, call `artifact.read_quest_documents(...)`.
|
|
138
|
+
If earlier user instructions materially affect campaign scope or ordering, call `artifact.get_conversation_context(...)` before changing the slice set.
|
|
139
|
+
|
|
140
|
+
For concrete paper-facing cases:
|
|
141
|
+
|
|
142
|
+
- if the slice is the only thing keeping a main-text section unsupported, make it `main_required` / `main_text`
|
|
143
|
+
- if the slice is useful but non-blocking, make it `appendix`
|
|
144
|
+
- if the slice is informative but not meant for the manuscript, keep it durable and mark it `reference_only` with a reason
|
|
145
|
+
- after every completed paper-facing slice, verify the return path immediately:
|
|
146
|
+
- the matching outline `result_table` row is updated
|
|
147
|
+
- the section notes are updated when the outline folder exists
|
|
148
|
+
- `paper/evidence_ledger.json` reflects the new mapping
|
|
149
|
+
- the active paper line summary no longer treats that slice as missing
|
|
150
|
+
|
|
151
|
+
Do not leave a slice "completed" while the paper contract still looks stale.
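As a rough illustration of the paper-facing cases listed above, the choice of role could be expressed as follows. The function name and its boolean parameters are assumptions made for this sketch; only the returned role strings come from the skill text.

```python
def paper_role_for_slice(blocks_main_text: bool, for_manuscript: bool) -> str:
    """Map a slice to a paper role (hypothetical helper, not a package API)."""
    if blocks_main_text:
        # The slice is the only thing keeping a main-text section unsupported.
        return "main_text"
    if for_manuscript:
        # Useful for the paper, but not blocking any main-text section.
        return "appendix"
    # Informative but not meant for the manuscript; the skill also asks
    # for a written reason whenever this role is used.
    return "reference_only"
```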
|
|
132
152
|
|
|
133
153
|
## Required plan and checklist
|
|
134
154
|
|
|
@@ -235,6 +255,16 @@ If the campaign exists to support a paper or paper-like report:
|
|
|
235
255
|
- `paper_placement`
|
|
236
256
|
- `completion_condition`
|
|
237
257
|
|
|
258
|
+
For writing-facing campaigns, every slice should also carry paper-contract identity, not just free-form text:
|
|
259
|
+
|
|
260
|
+
- `section_id`
|
|
261
|
+
- `item_id`
|
|
262
|
+
- `claim_links`
|
|
263
|
+
- `paper_role`
|
|
264
|
+
|
|
265
|
+
Do not treat a completed analysis slice as paper-ready until those fields exist and the slice is mappable back into the selected outline or paper experiment matrix.
|
|
266
|
+
Use `references/writing-facing-slice-examples.md` when the correct field values are not obvious.
|
|
267
|
+
|
|
238
268
|
This keeps the analysis campaign aligned with the paper plan instead of becoming a free-floating batch of slices.
|
|
239
269
|
|
|
240
270
|
### 1. Define the campaign charter
|
|
@@ -393,8 +423,8 @@ For slices that run longer than a quick smoke check:
|
|
|
393
423
|
- if you only need wall-clock waiting between checks, use `bash_exec(command='sleep N', mode='await', timeout_seconds=N+buffer, ...)`
|
|
394
424
|
- keep a real buffer on that sleep timeout; do not set `timeout_seconds` exactly equal to `N`
|
|
395
425
|
- if you are waiting on an already running managed session, prefer `bash_exec(mode='await', id=..., timeout_seconds=...)` instead of starting a new sleep command
|
|
396
|
-
- after the first meaningful signal and then at real checkpoints (e.g., completion,
|
|
397
|
-
- after each completed sleep / await monitoring cycle for an active slice, send another
|
|
426
|
+
- after the first meaningful signal and then at real checkpoints (e.g., completion, blocker, recovery, or a materially changed evidence frontier), send `artifact.interact(kind='progress', ...)` so the user sees the newest real state
|
|
427
|
+
- after each completed sleep / await monitoring cycle for an active slice, inspect state first; only send another `artifact.interact(kind='progress', ...)` update if the user-visible state materially changed
|
|
398
428
|
- include the estimated next reply time or next check time in those monitoring updates
|
|
399
429
|
- stop them with `bash_exec(mode='kill', id=..., wait=true, timeout_seconds=...)` if the slice is invalid, wedged, or superseded; add `force=true` when immediate termination is required
|
|
400
430
|
- when you control the slice code, prefer a throttled `tqdm` progress reporter and, when feasible, pair it with concise `__DS_PROGRESS__` lines carrying phase and ETA
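The sleep-with-buffer rule above can be sketched as a helper that builds the keyword arguments for a waiting call. This is a minimal sketch: `sleep_await_kwargs` and the 30-second default buffer are assumptions, and the helper only builds a dictionary, it does not call `bash_exec`.

```python
def sleep_await_kwargs(seconds: int, buffer_seconds: int = 30) -> dict:
    """Build arguments for a wall-clock wait (hypothetical helper).

    The timeout is always strictly larger than the sleep duration, as the
    skill requires; the default buffer value is an assumption.
    """
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    if buffer_seconds <= 0:
        raise ValueError("buffer_seconds must be positive so timeout > sleep")
    return {
        "command": f"sleep {seconds}",
        "mode": "await",
        "timeout_seconds": seconds + buffer_seconds,
    }
```

When an already running managed session exists, the skill prefers awaiting that session by `id` instead of starting a new sleep, so this helper applies only to the pure wall-clock case.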
|
|
@@ -23,7 +23,7 @@ Use this reference because the current runtime has no dedicated `campaign` artif
|
|
|
23
23
|
|
|
24
24
|
## Recommended per-slice fields
|
|
25
25
|
|
|
26
|
-
Because `artifact.record(...)` accepts extra fields, include:
|
|
26
|
+
Because `artifact.record(payload={...})` accepts extra fields, include:
|
|
27
27
|
|
|
28
28
|
- `campaign_id`
|
|
29
29
|
- `slice_id`
|
|
@@ -0,0 +1,65 @@
|
|
|
1
|
+
# Writing-Facing Slice Examples
|
|
2
|
+
|
|
3
|
+
Use this reference when an analysis campaign is supporting a paper-like deliverable and each slice must bind to the paper contract.
|
|
4
|
+
|
|
5
|
+
## Good writing-facing todo item
|
|
6
|
+
|
|
7
|
+
```json
|
|
8
|
+
{
|
|
9
|
+
"exp_id": "EXP-ABL-001",
|
|
10
|
+
"todo_id": "todo-ablation-core",
|
|
11
|
+
"slice_id": "ablation-core",
|
|
12
|
+
"title": "Core component ablation",
|
|
13
|
+
"research_question": "RQ2",
|
|
14
|
+
"experimental_design": "Component ablation",
|
|
15
|
+
"tier": "main_required",
|
|
16
|
+
"paper_placement": "main_text",
|
|
17
|
+
"paper_role": "main_text",
|
|
18
|
+
"section_id": "analysis-mechanism",
|
|
19
|
+
"item_id": "AN-ABL-001",
|
|
20
|
+
"claim_links": ["C2"],
|
|
21
|
+
"completion_condition": "Show whether the central gain survives removal of the core component.",
|
|
22
|
+
"why_now": "The draft cannot support the mechanism claim without this slice.",
|
|
23
|
+
"success_criteria": "Produce a fair ablation under the accepted metric contract.",
|
|
24
|
+
"abandonment_criteria": "Stop only if the evaluation contract becomes invalid.",
|
|
25
|
+
"manuscript_targets": ["Results", "Mechanism analysis"]
|
|
26
|
+
}
|
|
27
|
+
```
|
|
28
|
+
|
|
29
|
+
## Bad writing-facing todo item
|
|
30
|
+
|
|
31
|
+
```json
|
|
32
|
+
{
|
|
33
|
+
"slice_id": "ablation-core",
|
|
34
|
+
"title": "Try one ablation",
|
|
35
|
+
"research_question": "RQ2"
|
|
36
|
+
}
|
|
37
|
+
```
|
|
38
|
+
|
|
39
|
+
Why it is bad:
|
|
40
|
+
|
|
41
|
+
- no `section_id`
|
|
42
|
+
- no `item_id`
|
|
43
|
+
- no `claim_links`
|
|
44
|
+
- no paper placement
|
|
45
|
+
- impossible to write back into the outline cleanly later
|
|
46
|
+
|
|
47
|
+
## Case guide
|
|
48
|
+
|
|
49
|
+
- Main claim support:
|
|
50
|
+
use `paper_role=main_text` and make the item part of `required_items`
|
|
51
|
+
- Supporting but non-blocking evidence:
|
|
52
|
+
use `paper_role=appendix` and make the item part of `optional_items`
|
|
53
|
+
- Useful but paper-excluded result:
|
|
54
|
+
keep the slice durable, but mark it `reference_only` or exclude it with a written reason in the matrix
|
|
55
|
+
|
|
56
|
+
## Completion rule
|
|
57
|
+
|
|
58
|
+
After `artifact.record_analysis_slice(...)`:
|
|
59
|
+
|
|
60
|
+
1. the slice result must exist under the analysis worktree
|
|
61
|
+
2. the mirror must exist under `experiments/analysis-results/`
|
|
62
|
+
3. the evidence ledger must contain the corresponding `item_id`
|
|
63
|
+
4. the selected outline section must show the updated row in `result_table`
|
|
64
|
+
|
|
65
|
+
If step 3 or 4 is missing, the slice is not paper-ready yet.
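The difference between the good and bad todo items above can be checked mechanically. The sketch below assumes todo items are plain dictionaries shaped like the JSON examples; `missing_paper_fields` is a hypothetical helper, not a function shipped in the package.

```python
# Paper-contract fields that every writing-facing todo item must carry.
REQUIRED_PAPER_FIELDS = ("section_id", "item_id", "claim_links", "paper_role")


def missing_paper_fields(todo_item: dict) -> list[str]:
    """Return the required paper-contract fields that are absent or empty."""
    return [name for name in REQUIRED_PAPER_FIELDS if not todo_item.get(name)]


good_item = {
    "slice_id": "ablation-core",
    "section_id": "analysis-mechanism",
    "item_id": "AN-ABL-001",
    "claim_links": ["C2"],
    "paper_role": "main_text",
}
bad_item = {
    "slice_id": "ablation-core",
    "title": "Try one ablation",
    "research_question": "RQ2",
}
```

An empty result means the item carries the paper-contract identity; it does not by itself confirm that the evidence ledger or the outline `result_table` has been updated.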
|
|
@@ -1,6 +1,7 @@
|
|
|
1
1
|
---
|
|
2
2
|
name: baseline
|
|
3
3
|
description: Use when a quest needs to attach, import, reproduce, repair, verify, compare, or publish a baseline and its metrics.
|
|
4
|
+
skill_role: stage
|
|
4
5
|
---
|
|
5
6
|
|
|
6
7
|
# Baseline
|
|
@@ -13,8 +14,16 @@ The target is one trustworthy baseline line, not an endless reproduction diary.
|
|
|
13
14
|
- Follow the shared interaction contract injected by the system prompt.
|
|
14
15
|
- Keep ordinary setup and debugging updates concise.
|
|
15
16
|
- Use richer milestone updates only when the baseline becomes trusted, caveated, blocked, waived, or route-changing.
|
|
17
|
+
- Hard execution rule: every terminal command in this stage must go through `bash_exec`; do not use any other terminal path for setup, reproduction, monitoring, verification, Git, Python, package-manager, or file-inspection commands.
|
|
16
18
|
- Prefer `bash_exec` for setup, reproduction, monitoring, and verification commands so the baseline line stays durable and auditable.
|
|
17
19
|
|
|
20
|
+
## Tool discipline
|
|
21
|
+
|
|
22
|
+
- **Do not use native `shell_command` / `command_execution` in this skill.**
|
|
23
|
+
- **All shell, CLI, Python, bash, node, git, npm, uv, and environment work must go through `bash_exec(...)`.**
|
|
24
|
+
- **For git work inside the current quest repository or worktree, prefer `artifact.git(...)` before raw shell git commands.**
|
|
25
|
+
- **If a generic git smoke test is needed outside the quest repo, use `bash_exec(...)` in an isolated scratch repository.**
|
|
26
|
+
|
|
18
27
|
## Non-negotiable rules
|
|
19
28
|
|
|
20
29
|
- no fabricated metrics, logs, run status, or success claims
|
|
@@ -463,6 +472,7 @@ Metric-contract rules:
|
|
|
463
472
|
- when confirming a baseline, submit the canonical `metrics_summary` as a flat top-level dictionary keyed by the paper-facing metric ids
|
|
464
473
|
- every canonical baseline metric entry should include `description`, either `derivation` or `origin_path`, and `source_ref`
|
|
465
474
|
- if the paper reports both aggregate and per-dataset or per-task results, preserve both whenever feasible through `metrics_summary` plus structured rows rather than one cherry-picked scalar
|
|
475
|
+
- if the source package already has a richer leaderboard table, structured result file, or `json/metric_contract.json`, reuse that richer contract instead of hand-writing a thinner one that keeps only one averaged scalar
|
|
466
476
|
- `Result/metric.md` is optional temporary scratch memory only; reconcile against it before calling `artifact.confirm_baseline(...)`, but do not treat it as a required durable file
|
|
467
477
|
|
|
468
478
|
## Publication and reuse
|
|
@@ -1,6 +1,7 @@
|
|
|
1
1
|
---
|
|
2
2
|
name: decision
|
|
3
3
|
description: Use when the quest needs an explicit go, stop, branch, reuse-baseline, write, finalize, reset, or user-decision transition with reasons and evidence.
|
|
4
|
+
skill_role: stage
|
|
4
5
|
---
|
|
5
6
|
|
|
6
7
|
# Decision
|
|
@@ -18,6 +19,13 @@ Use this skill whenever continuation is non-trivial.
|
|
|
18
19
|
- If a threaded user reply arrives, interpret it relative to the latest decision or progress interaction before assuming the task changed completely.
|
|
19
20
|
- Quest completion is a special terminal decision: first ask for explicit completion approval with `artifact.interact(kind='decision_request', reply_mode='blocking', reply_schema={'decision_type': 'quest_completion_approval'}, ...)`, and only after an explicit approval reply should you call `artifact.complete_quest(...)`.
|
|
20
21
|
|
|
22
|
+
## Tool discipline
|
|
23
|
+
|
|
24
|
+
- **Do not use native `shell_command` / `command_execution` in this skill.**
|
|
25
|
+
- **If decision-making needs shell, CLI, Python, bash, node, git, npm, uv, or environment evidence, gather it through `bash_exec(...)`.**
|
|
26
|
+
- **For git state inside the current quest repository or worktree, prefer `artifact.git(...)` before raw shell git commands.**
|
|
27
|
+
- **Use `decision` to judge the route, not as an excuse to bypass the `bash_exec(...)` / `artifact.git(...)` tool contract.**
|
|
28
|
+
|
|
21
29
|
## Stage purpose
|
|
22
30
|
|
|
23
31
|
`decision` is not a normal anchor.
|
|
@@ -84,12 +92,14 @@ Choose the smallest action that genuinely resolves the current state.
|
|
|
84
92
|
|
|
85
93
|
In the current runtime, prefer these concrete flow actions:
|
|
86
94
|
|
|
95
|
+
- record a candidate brief before branch promotion -> `artifact.submit_idea(mode='create', submission_mode='candidate', ...)`
|
|
87
96
|
- accepted idea -> `artifact.submit_idea(mode='create', lineage_intent='continue_line'|'branch_alternative', ...)`
|
|
97
|
+
- promote a candidate brief into a durable optimization line -> `artifact.submit_idea(mode='create', submission_mode='line', source_candidate_id=..., lineage_intent='continue_line'|'branch_alternative', ...)`
|
|
88
98
|
- maintenance-only in-place cleanup of the same branch -> `artifact.submit_idea(mode='revise', ...)`
|
|
89
99
|
- compare branch foundations before a new round -> `artifact.list_research_branches(...)`
|
|
90
100
|
- return to an older durable branch without creating a new node -> `artifact.activate_branch(...)`
|
|
91
101
|
- materialize the concrete main-result node when a real main experiment line is about to be or was just durably recorded -> dedicated child `run/*` branch/worktree
|
|
92
|
-
- start the next optimization round from a measured result -> `artifact.record(
|
|
102
|
+
- start the next optimization round from a measured result -> `artifact.record(payload={'kind': 'decision', 'action': 'iterate', ...})`
|
|
93
103
|
- launch analysis campaign -> `artifact.create_analysis_campaign(...)`
|
|
94
104
|
- finish one analysis slice -> `artifact.record_analysis_slice(...)`
|
|
95
105
|
- select a paper outline -> `artifact.submit_paper_outline(mode='select', ...)`
|
|
@@ -104,6 +114,7 @@ If the chosen action is baseline reuse, the decision is not complete until one o
|
|
|
104
114
|
Treat `prepare_branch` as a compatibility or recovery action, not the normal path.
|
|
105
115
|
Treat `activate_branch` as the correct recovery or revisit action when the quest should resume on an existing older durable branch while preserving the newer research head.
|
|
106
116
|
Treat each accepted branch as one durable research round.
|
|
117
|
+
Treat candidate briefs as branchless pre-promotion objects; they are not yet durable optimization lines.
|
|
107
118
|
If a branch already has a durable main-experiment result, a genuinely new optimization round should normally create a child branch from a chosen foundation rather than keep revising that old branch in place.
|
|
108
119
|
Treat each durable main experiment as its own child `run/*` branch/node, not as another mutable state on the idea branch.
|
|
109
120
|
When paper mode is enabled and the necessary analysis for a strong run is done, the next default route is `write` on a dedicated `paper/*` branch/worktree derived from that run branch.
|
|
@@ -121,6 +132,12 @@ Make decisions from durable evidence:
|
|
|
121
132
|
|
|
122
133
|
Do not make major decisions from vibe or momentum.
|
|
123
134
|
|
|
135
|
+
When the quest is algorithm-first, add one extra truth-source rule before non-trivial route choices:
|
|
136
|
+
|
|
137
|
+
- read `artifact.get_optimization_frontier(...)`
|
|
138
|
+
- treat the frontier as the primary optimize-state summary
|
|
139
|
+
- only override it when newer durable evidence clearly dominates
|
|
140
|
+
|
|
124
141
|
## Workflow
|
|
125
142
|
|
|
126
143
|
### 1. State the question
|
|
@@ -249,6 +266,14 @@ When recording the decision, make explicit:
|
|
|
249
266
|
- which existing evidence was decisive
|
|
250
267
|
- what residual risk remains after the choice
|
|
251
268
|
|
|
269
|
+
For algorithm-first route choices, prefer this default mapping:
|
|
270
|
+
|
|
271
|
+
- frontier says `explore` -> widen or refine candidate briefs before new branch creation
|
|
272
|
+
- frontier says `exploit` -> keep the strongest line active and advance the best implementation candidates
|
|
273
|
+
- frontier says `fusion` -> open at most one bounded fusion candidate
|
|
274
|
+
- a fixable candidate failure dominates -> run a debug route instead of widening search blindly
|
|
275
|
+
- frontier says `stop` -> record the stop decision and explicit reopen condition
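
The default mapping above can be sketched as a small lookup. This is only an illustrative sketch: the mode strings `explore`, `exploit`, `fusion`, and `stop` come from the list above, while the route labels, the `choose_route` helper, and the choice to check the debug case first are assumptions, not part of the runtime.

```python
# Hypothetical route labels; only the frontier mode names come from the skill text.
FRONTIER_ROUTES = {
    "explore": "widen_or_refine_candidate_briefs",
    "exploit": "advance_best_candidates_on_strongest_line",
    "fusion": "open_one_bounded_fusion_candidate",
    "stop": "record_stop_with_reopen_condition",
}


def choose_route(frontier_mode: str, fixable_failure_dominates: bool = False) -> str:
    """Return a default route label for an algorithm-first decision.

    Assumption: a dominating fixable failure is checked before the frontier mode.
    The skill text does not state a precedence order.
    """
    if fixable_failure_dominates:
        return "debug_route"
    try:
        return FRONTIER_ROUTES[frontier_mode]
    except KeyError:
        raise ValueError(f"unknown frontier mode: {frontier_mode!r}") from None
```

A caller would still record the chosen route durably; the lookup only picks the default.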
|
|
276
|
+
|
|
252
277
|
Good route-selection criteria often include:
|
|
253
278
|
|
|
254
279
|
- feasibility
|
|
@@ -326,7 +351,7 @@ When asking, use a structured decision request with:
|
|
|
326
351
|
|
|
327
352
|
### 6. Record the decision durably
|
|
328
353
|
|
|
329
|
-
Use `artifact.record(
|
|
354
|
+
Use `artifact.record(payload={'kind': 'decision', ...})` for the final decision.
|
|
330
355
|
|
|
331
356
|
If user input is needed, also use `artifact.interact(kind='decision_request', ...)`.
|
|
332
357
|
If the timeout expires without a user reply, choose the best option yourself, record why, and notify the user of the chosen option before moving on.
|
|
@@ -1,6 +1,7 @@
|
|
|
1
1
|
---
|
|
2
2
|
name: experiment
|
|
3
3
|
description: Use when a quest is ready for a concrete implementation pass or a main experiment run tied to a selected idea and an accepted baseline.
|
|
4
|
+
skill_role: stage
|
|
4
5
|
---
|
|
5
6
|
|
|
6
7
|
# Experiment
|
|
@@ -39,7 +40,16 @@ Use this skill for the main evidence-producing runs of the quest.
|
|
|
39
40
|
- If the runtime starts an auto-continue turn with no new user message, continue from the current run state, logs, artifacts, and active requirements instead of replaying the previous user turn.
|
|
40
41
|
- Progress message templates are references only. Adapt to the actual context and vary wording so messages feel human, respectful, and non-robotic.
|
|
41
42
|
- If a threaded user reply arrives, interpret it relative to the latest experiment progress update before assuming the task changed completely.
|
|
43
|
+
- Hard execution rule: every terminal command in this stage must go through `bash_exec`; do not use any other terminal path for smoke tests, real runs, Git, Python, package-manager, or file-inspection commands.
|
|
42
44
|
- Prefer `bash_exec` for experiment commands so each run gets a durable session id, quest-local log folder, and later `read/list/kill` control.
|
|
45
|
+
- For meaningful long-running runs, include the estimated next reply time or next check-in window whenever it is defensible.
|
|
46
|
+
|
|
47
|
+
## Tool discipline
|
|
48
|
+
|
|
49
|
+
- **Do not use native `shell_command` / `command_execution` in this skill.**
|
|
50
|
+
- **All smoke tests, real runs, shell, CLI, Python, bash, node, git, npm, uv, and environment work must go through `bash_exec(...)`.**
|
|
51
|
+
- **For git work inside the current quest repository or worktree, prefer `artifact.git(...)` before raw shell git commands.**
|
|
52
|
+
- **If a scratch repository or isolated test environment is needed, create and drive it through `bash_exec(...)`, not native shell tools.**
|
|
43
53
|
|
|
44
54
|
## Stage purpose
|
|
45
55
|
|
|
@@ -64,6 +74,9 @@ Use `references/evidence-ladder.md` when deciding whether the current package is
|
|
|
64
74
|
Completing one main run is not quest completion.
|
|
65
75
|
After reporting the run, keep moving to iterate, analyze, write, or finalize unless a genuine blocking decision remains.
|
|
66
76
|
|
|
77
|
+
When the quest is algorithm-first, treat `experiment` as the execution surface of `optimize`, not as the terminal goal of the workflow.
|
|
78
|
+
After a measured result, the default next move is frontier review and optimize-side route selection rather than paper packaging.
|
|
79
|
+
|
|
67
80
|
## Quick workflow
|
|
68
81
|
|
|
69
82
|
Treat this as the short run-order summary. The detailed run contract, execution rules, and recording rules remain in `Workflow`.
|
|
@@ -90,6 +103,7 @@ Treat this as the short run-order summary. The detailed run contract, execution
|
|
|
90
103
|
- After each `artifact.record_main_experiment(...)`, route from the measured result:
|
|
91
104
|
- if paper mode is enabled, decide whether to strengthen evidence, analyze, or write
|
|
92
105
|
- if paper mode is disabled, prefer iterate / revise-idea / branch over default writing
|
|
106
|
+
- In algorithm-first work, after each main run, return to `optimize` or `decision` for frontier review before launching another large run.
|
|
93
107
|
|
|
94
108
|
## Experiment mental guardrails
|
|
95
109
|
|
|
@@ -429,8 +443,8 @@ For commands that may run longer than a few minutes:
|
|
|
429
443
|
- if you only need wall-clock waiting between checks, use `bash_exec(command='sleep N', mode='await', timeout_seconds=N+buffer, ...)`
|
|
430
444
|
- keep a real buffer on that sleep timeout; do not set `timeout_seconds` exactly equal to `N`
|
|
431
445
|
- if you are waiting on an already running managed session, prefer `bash_exec(mode='await', id=..., timeout_seconds=...)` instead of starting a new sleep command
|
|
432
|
-
- after every completed sleep / await cycle, inspect logs
|
|
433
|
-
- after the first meaningful signal and then at real checkpoints (e.g., completion,
|
|
446
|
+
- after every completed sleep / await cycle, inspect logs first; only send `artifact.interact(kind='progress', ...)` when the user-visible state, frontier, blocker status, or ETA materially changed
|
|
447
|
+
- after the first meaningful signal and then at real checkpoints (e.g., completion, recovery, blocker, or a materially widened comparable surface), keep those progress updates going rather than waiting silently
|
|
434
448
|
- if the run is clearly invalid, wedged, or superseded, stop it with `bash_exec(mode='kill', id=..., wait=true, timeout_seconds=...)`; if it must die immediately, add `force=true`, record the reason, fix the issue, and relaunch cleanly
|
|
435
449
|
- do not report completion until logs and output files both confirm completion
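
The sleep-timeout rule above can be sketched as a helper that builds the arguments for a waiting call. The argument names `command`, `mode`, and `timeout_seconds` appear in the rules above; the helper name `sleep_await_args` and the 30-second default buffer are assumptions for illustration.

```python
def sleep_await_args(sleep_seconds: int, buffer_seconds: int = 30) -> dict:
    """Build keyword arguments for a wall-clock wait with a real timeout buffer.

    Hypothetical helper: it only assembles a dict and does not call any tool.
    The 30-second default buffer is an assumed value, not a documented one.
    """
    if sleep_seconds <= 0:
        raise ValueError("sleep_seconds must be positive")
    if buffer_seconds <= 0:
        # The rule forbids timeout_seconds being exactly equal to N.
        raise ValueError("buffer_seconds must be positive")
    return {
        "command": f"sleep {sleep_seconds}",
        "mode": "await",
        "timeout_seconds": sleep_seconds + buffer_seconds,
    }
```

For example, `sleep_await_args(300)` yields a `sleep 300` command with a timeout of 330 seconds.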
|
|
436
450
|
|
|
@@ -1,6 +1,7 @@
|
|
|
1
1
|
---
|
|
2
2
|
name: finalize
|
|
3
3
|
description: Use when the quest is ready to consolidate final claims, limitations, recommendations, summary state, and graph exports before stopping or archiving.
|
|
4
|
+
skill_role: stage
|
|
4
5
|
---
|
|
5
6
|
|
|
6
7
|
# Finalize
|
|
@@ -11,10 +12,13 @@ Use this skill to close or pause a quest responsibly.
|
|
|
11
12
|
|
|
12
13
|
- Follow the shared interaction contract injected by the system prompt.
|
|
13
14
|
- For ordinary active work, prefer a concise progress update once work has crossed roughly 6 tool calls with a human-meaningful delta, and do not drift beyond roughly 12 tool calls or about 8 minutes without a user-visible update.
|
|
15
|
+
- Do not emit another finalize progress update when the user-visible state is unchanged.
|
|
14
16
|
- If the runtime starts an auto-continue turn with no new user message, keep finalizing from the durable quest state and active requirements instead of replaying the previous user turn.
|
|
15
17
|
- If a threaded user reply arrives, interpret it relative to the latest finalize progress update before assuming the task changed completely.
|
|
16
18
|
- When finalize reaches a real closure state, pause-ready packet, or route-back decision, send one threaded `artifact.interact(kind='milestone', ...)` update that names the recommendation, why it is the right call, and any reopen condition that still matters.
|
|
17
19
|
- True quest completion still requires explicit user approval through the runtime completion flow before calling `artifact.complete_quest(...)`.
|
|
20
|
+
- Rechecking that the same bundle files still exist, or re-aligning status surfaces without changing the closure judgment, does not by itself count as a fresh milestone.
|
|
21
|
+
- Hard execution rule: if this stage needs terminal work such as Git inspection, packaging checks, document builds, or file inspection, every such command must go through `bash_exec`.
|
|
18
22
|
|
|
19
23
|
## Stage purpose
|
|
20
24
|
|
|
@@ -54,8 +58,19 @@ Before finalizing, gather:
|
|
|
54
58
|
- latest quest documents
|
|
55
59
|
- latest review / proofing / submission state when a paper bundle exists
|
|
56
60
|
- the paper bundle manifest and its referenced paths when the quest has a paper-like deliverable
|
|
61
|
+
- the paper evidence ledger and selected-outline section statuses when the quest has a paper-like deliverable
|
|
57
62
|
|
|
58
63
|
If finalization reveals that the quest is still too uncertain, route back through `decision` rather than forcing closure.
|
|
64
|
+
For paper-like deliverables, do not finalize while any of these remain true:
|
|
65
|
+
|
|
66
|
+
- required main-text outline items are still unresolved
|
|
67
|
+
- completed analysis remains unmapped into the paper contract
|
|
68
|
+
- the active paper line still reports open supplementary work that is expected to block the manuscript
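
The three blocking conditions above can be read as a simple gate. The sketch below assumes a hypothetical `PaperState` record; its field names are invented for illustration and do not reflect the actual shape of the paper contract or of any `artifact.*` return value.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PaperState:
    # Hypothetical fields, one per blocking condition in the list above.
    unresolved_main_text_items: int
    unmapped_completed_analyses: int
    blocking_supplementary_open: bool


def finalize_blockers(state: PaperState) -> List[str]:
    """Return the reasons finalize should not proceed; empty means no blocker found."""
    blockers = []
    if state.unresolved_main_text_items > 0:
        blockers.append("required main-text outline items are still unresolved")
    if state.unmapped_completed_analyses > 0:
        blockers.append("completed analysis remains unmapped into the paper contract")
    if state.blocking_supplementary_open:
        blockers.append("open supplementary work is expected to block the manuscript")
    return blockers
```

An empty list only means none of these three conditions holds; it does not replace the explicit user approval required for quest completion.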
|
|
69
|
+
|
|
70
|
+
If the current paper-state blocker is not obvious from the existing files, call `artifact.get_paper_contract_health(detail='full')` before deciding whether finalize is legitimate.
|
|
71
|
+
If the active quest/runtime state is unclear after restart or long pause, call `artifact.get_quest_state(detail='summary')` first.
|
|
72
|
+
If the exact latest `SUMMARY.md`, `status.md`, or active user requirement wording matters for closure, call `artifact.read_quest_documents(...)`.
|
|
73
|
+
If earlier user/assistant continuity matters for whether the quest should really stop, call `artifact.get_conversation_context(...)` instead of guessing from prompt context alone.
|
|
59
74
|
|
|
60
75
|
## Truth sources
|
|
61
76
|
|
|
@@ -90,6 +105,7 @@ The finalize stage should usually leave behind:
|
|
|
90
105
|
If the quest produced a paper-style bundle, finalization should also check that the writing stage left behind enough closure evidence, such as:
|
|
91
106
|
|
|
92
107
|
- selected outline and outline selection records
|
|
108
|
+
- evidence ledger records and section-level result tables
|
|
93
109
|
- review output
|
|
94
110
|
- proofing output
|
|
95
111
|
- submission or packaging checklist
|
|
@@ -113,12 +129,14 @@ Say clearly what exists and why it matters. Name concrete paths or artifact ids
|
|
|
113
129
|
When a paper bundle exists, verify the manifest inventory explicitly, including:
|
|
114
130
|
|
|
115
131
|
- `paper/paper_bundle_manifest.json`
|
|
132
|
+
- `paper/evidence_ledger.json`
|
|
116
133
|
- the recorded `paper_branch` and source evidence branch / run fields in that manifest
|
|
117
134
|
- referenced `outline_path`
|
|
118
135
|
- referenced `draft_path`
|
|
119
136
|
- referenced `writing_plan_path`
|
|
120
137
|
- referenced `references_path`
|
|
121
138
|
- referenced `claim_evidence_map_path`
|
|
139
|
+
- referenced `evidence_ledger_path`
|
|
122
140
|
- referenced `baseline_inventory_path`
|
|
123
141
|
- referenced `compile_report_path`
|
|
124
142
|
- referenced `pdf_path`
|
|
@@ -243,6 +261,7 @@ Weak finalization:
|
|
|
243
261
|
- leaves no clear recommendation
|
|
244
262
|
- claims “done” without showing what is actually done
|
|
245
263
|
- drops the package or file inventory needed for resumption
|
|
264
|
+
- ignores unmapped completed analysis that never entered the paper contract
|
|
246
265
|
|
|
247
266
|
## Memory rules
|
|
248
267
|
|
package/src/skills/idea/SKILL.md
CHANGED
|
@@ -1,12 +1,17 @@
|
|
|
1
1
|
---
|
|
2
2
|
name: idea
|
|
3
3
|
description: Use when a quest needs concrete hypotheses, limitation analysis, candidate directions, or a selected idea relative to the active baseline.
|
|
4
|
+
skill_role: stage
|
|
4
5
|
---
|
|
5
6
|
|
|
6
7
|
# Idea
|
|
7
8
|
|
|
8
9
|
Use this skill to turn the current baseline and problem frame into concrete, literature-grounded, testable directions.
|
|
9
10
|
|
|
11
|
+
When `startup_contract.need_research_paper = false` and the quest already has a concrete optimization handle, `idea` may stop after selecting or seeding a direction and then hand off into `optimize` instead of insisting on the full paper-oriented ideation loop.
|
|
12
|
+
In that algorithm-first case, `idea` should usually produce a small method-brief frontier and then defer candidate ranking, promotion, and bounded search to `optimize`.
|
|
13
|
+
When doing that handoff, prefer the brief-shaping discipline later used by `optimize`: clarify the bottleneck and constraints, keep only a small differentiated `2-3` option slate, and hand off a recommended brief rather than a pile of loose intuitions.
|
|
14
|
+
|
|
10
15
|
## Interaction discipline
|
|
11
16
|
|
|
12
17
|
- Follow the shared interaction contract injected by the system prompt.
|
|
@@ -39,6 +44,15 @@ The output must survive three checks at once:
|
|
|
39
44
|
- feasibility in the current repo and resource budget
|
|
40
45
|
- manuscript defensibility if the line later becomes a paper claim
|
|
41
46
|
|
|
47
|
+
When the route already looks likely to become a paper-facing line, seed one lightweight structured outline candidate during idea work.
|
|
48
|
+
Use `artifact.submit_paper_outline(mode='candidate', ...)` for that seed instead of leaving the future paper structure only in prose.
|
|
49
|
+
Use `references/outline-seeding-example.md` for the minimum acceptable shape.
|
|
50
|
+
The idea-stage outline candidate is not the full paper line yet, but it should already name the likely `research_questions`, `experimental_designs`, and the first section-level evidence needs that later supplementary slices must satisfy.
|
|
51
|
+
Keep that seed minimal and executable: a small section skeleton plus expected evidence items is better than a long narrative outline with no concrete evidence hooks.
|
|
52
|
+
If the current research head, strongest measured branch, or active runtime refs are unclear after resume, call `artifact.get_quest_state(detail='summary')` and `artifact.list_research_branches(...)` before choosing a foundation.
|
|
53
|
+
If the current brief / plan / status wording matters for direction choice, call `artifact.read_quest_documents(...)`.
|
|
54
|
+
If earlier user conversation materially changes the direction-selection target, call `artifact.get_conversation_context(...)` before locking the next idea.
|
|
55
|
+
|
|
42
56
|
Finishing one idea deliverable is not quest completion.
|
|
43
57
|
After reporting a completed idea package, continue into the next justified stage unless a real blocking decision is still unresolved.
|
|
44
58
|
|
|
@@ -106,6 +120,11 @@ Break ties primarily through careful reasoning over:
|
|
|
106
120
|
- Do not write, promote, or submit a final idea until the durable survey covers at least `5` and usually `5-10` task-modeling-related, mechanism-relevant, or otherwise directly usable papers.
|
|
107
121
|
- Treat that literature floor as a hard gate, not a suggestion.
|
|
108
122
|
If the direct task-modeling neighborhood truly contains fewer than `5` usable papers, record that evidence explicitly and fill the remaining slots with the closest adjacent papers whose mechanism can be translated into the current task and codebase.
|
|
123
|
+
- Algorithm-first exception:
|
|
124
|
+
- when `startup_contract.need_research_paper = false` and a concrete optimization handle already exists, you may stop after a memory sweep plus a small targeted paper check instead of satisfying the full `5-10` paper floor
|
|
125
|
+
- use that exception only when the immediate goal is method-brief selection for `optimize`, not paper-level novelty claims
|
|
126
|
+
- if you use the exception, say explicitly that the output is an optimization brief frontier rather than a paper-ready idea package
|
|
127
|
+
- still shape that frontier deliberately: clarify the bottleneck and comparability boundary first, keep a differentiated `2-3` candidate slate, and explain why one brief is recommended now
|
|
109
128
|
- Every fresh idea build or idea-refinement pass must begin with:
|
|
110
129
|
- a memory sweep, and
|
|
111
130
|
- an external literature sweep.
|
|
@@ -133,12 +152,19 @@ Break ties primarily through careful reasoning over:
|
|
|
133
152
|
- Unless strong durable evidence already narrows the route to one obvious serious option, run one bounded divergent pass that produces a small but meaningfully varied slate, usually `6-12` raw ideas before collapsing to a serious frontier that is usually `2-3` and at most `5`.
|
|
134
153
|
- If all surviving candidates belong to the same mechanism family, widen once with at least two new ideation lenses before converging.
|
|
135
154
|
- Keep structurally coherent rejected ideas in a parking-lot or rejected-candidate section so they can be recombined later if needed.
|
|
155
|
+
- In algorithm-first work, `idea` should usually produce direction families, not a large within-family variant swarm.
|
|
156
|
+
- Treat within-family micro-variants as `optimize` brief work unless the mechanism family itself is still unresolved.
|
|
136
157
|
- Every serious candidate must answer `why now?` or `what changed?`, not just `what is the mechanism?`
|
|
137
158
|
- Every selected idea must survive a two-sentence pitch and strongest-objection check before promotion.
|
|
138
159
|
- Do not promote a direction unless you can explain:
|
|
139
160
|
- what limitation it targets
|
|
140
161
|
- why prior methods do not already solve it
|
|
141
162
|
- what evidence would later be needed to defend the claim
|
|
163
|
+
- When the likely next route is a paper-facing main experiment plus analysis package, do not stop at prose-only idea notes; seed the likely `research_questions`, `experimental_designs`, and per-section evidence needs in the outline candidate.
|
|
164
|
+
- If the likely route already has a clear paper-facing structure, seed the future paper line early:
|
|
165
|
+
- identify the likely main-text sections
|
|
166
|
+
- identify which sections will need supplementary evidence rather than only the main run
|
|
167
|
+
- identify the concrete evidence items that must later be maintained in the paper line's outline folder or compiled outline contract
|
|
142
168
|
- If the idea is not novel but still worth doing, state that honestly as:
|
|
143
169
|
- replication value
|
|
144
170
|
- transfer-to-new-setting value
|
|
@@ -182,6 +208,51 @@ In practice:
|
|
|
182
208
|
|
|
183
209
|
Do not skip the `scout` pass just because the quest is already in the `idea` stage.
|
|
184
210
|
|
|
211
|
+
## Direction-shaping protocol
|
|
212
|
+
|
|
213
|
+
Use `references/idea-thinking-flow.md` when the main need is better reasoning hygiene.
|
|
214
|
+
Use `references/idea-generation-playbook.md` when the main need is to create a new idea slate and select one clear next research object.
|
|
215
|
+
|
|
216
|
+
Default creation flow for a fresh idea pass:
|
|
217
|
+
|
|
218
|
+
1. frame one concrete limitation
|
|
219
|
+
2. separate symptom / mechanism hypothesis / consequence
|
|
220
|
+
3. keep one main hypothesis plus `2-3` competing hypotheses
|
|
221
|
+
4. name the primary lever bucket
|
|
222
|
+
5. generate a bounded candidate slate from that framing
|
|
223
|
+
6. record selected / deferred / rejected outcomes explicitly
|
|
224
|
+
|
|
225
|
+
Set the frontier width with a validation-cost estimate before widening:
|
|
226
|
+
|
|
227
|
+
- `fast-check`: the first objective validation loop is likely under about `20` minutes
|
|
228
|
+
- `slow-check`: the first objective validation loop is likely over about `20` minutes or otherwise expensive in compute, queue time, or human delay
|
|
229
|
+
|
|
230
|
+
For `fast-check` idea work:
|
|
231
|
+
|
|
232
|
+
- allow a slightly wider serious slate when the candidates are meaningfully different
|
|
233
|
+
- prefer candidates with cheap, orthogonal falsification paths
|
|
234
|
+
- keep more alternatives alive into `optimize` because validation is cheaper than overthinking
|
|
235
|
+
|
|
236
|
+
For `slow-check` idea work:
|
|
237
|
+
|
|
238
|
+
- keep the serious slate tighter, usually `1-3`
|
|
239
|
+
- demand a clearer bottleneck story and stronger evidence before adding another family
|
|
240
|
+
- prefer the route with the best expected evidence-per-run, not the route with the most speculative upside
|
|
241
|
+
- do not hand off a broad speculative slate just because it sounds interesting
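
The validation-cost split above can be sketched as follows. The `fast-check` and `slow-check` labels and the roughly 20-minute threshold come from the text; the function names, the treatment of exactly 20 minutes as `slow-check`, and the slate caps (3 for slow, 5 for fast, borrowed from the "at most `5`" serious-frontier limit) are assumptions.

```python
def classify_validation_cost(first_loop_minutes: float, otherwise_expensive: bool = False) -> str:
    """Label the first objective validation loop as fast-check or slow-check.

    Assumption: the text says "about 20 minutes", so the exact boundary is a
    judgment call; here 20 minutes or more counts as slow-check.
    """
    if otherwise_expensive or first_loop_minutes >= 20:
        return "slow-check"
    return "fast-check"


def serious_slate_cap(cost_label: str) -> int:
    """Return an assumed upper bound on the serious candidate slate."""
    caps = {"slow-check": 3, "fast-check": 5}
    if cost_label not in caps:
        raise ValueError(f"unknown cost label: {cost_label!r}")
    return caps[cost_label]
```

The cap is an upper bound, not a target: a slow-check pass may still carry only one candidate forward.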
|
|
242
|
+
|
|
243
|
+
Do not start by shopping for modules to add.
|
|
244
|
+
Do not let one attractive mechanism become the de facto framing before the limitation is pinned down.
|
|
245
|
+
Do not let direction-family ideation collapse into within-family variant generation too early.
|
|
246
|
+
|
|
247
|
+
In normal idea work, stop at the direction-family level:
|
|
248
|
+
|
|
249
|
+
- select which mechanism families deserve serious consideration
|
|
250
|
+
- identify the strongest one to carry forward
|
|
251
|
+
- hand off within-family brief shaping to `optimize` when the quest is algorithm-first
|
|
252
|
+
|
|
253
|
+
If the task still requires choosing among mechanism families, stay in `idea`.
|
|
254
|
+
If the family is already chosen and the next need is branchless method-brief shaping, hand off to `optimize`.
|
|
255
|
+
|
|
185
256
|
## Truth sources
|
|
186
257
|
|
|
187
258
|
Use:
|
|
@@ -1118,6 +1189,14 @@ When writing paper memory cards, include enough metadata to avoid redundant sear
|
|
|
1118
1189
|
|
|
1119
1190
|
At the end of ideation, at least one part of the literature survey must be preserved in memory so a later idea pass can retrieve it directly instead of rebuilding the search from scratch.
|
|
1120
1191
|
|
|
1192
|
+
Every serious idea pass should also leave a durable outcome split:
|
|
1193
|
+
|
|
1194
|
+
- one selected idea or selected direction family
|
|
1195
|
+
- any deferred but still plausible alternatives
|
|
1196
|
+
- any rejected alternatives with a one-line rejection reason
|
|
1197
|
+
|
|
1198
|
+
Do not leave the rejected and deferred reasoning only in chat.
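
One way to keep that outcome split out of chat is to hold it in a small structured record before writing it to a durable file. The sketch below is hypothetical: the class name, field names, and Markdown layout are assumptions, and it does not use any runtime API.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class IdeaOutcomeSplit:
    selected: str                                             # selected idea or direction family
    deferred: List[str] = field(default_factory=list)         # still-plausible alternatives
    rejected: Dict[str, str] = field(default_factory=dict)    # name -> one-line rejection reason

    def to_markdown(self) -> str:
        """Render the split as a bullet list; every rejection needs a reason."""
        for name, reason in self.rejected.items():
            if not reason.strip():
                raise ValueError(f"rejected alternative {name!r} has no rejection reason")
        lines = [f"- selected: {self.selected}"]
        lines += [f"- deferred: {name}" for name in self.deferred]
        lines += [f"- rejected: {name} ({reason})" for name, reason in self.rejected.items()]
        return "\n".join(lines)
```

The rendered text could then be appended to whichever quest document or memory card holds the idea-pass record.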
|
|
1199
|
+
|
|
1121
1200
|
Promote to global memory only when the lesson is reusable outside this quest.
|
|
1122
1201
|
|
|
1123
1202
|
## Artifact rules
|