@research-copilot/plugin 1.1.15 → 1.1.16
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/dist/.claude-plugin/plugin.json +3 -2
- package/dist/.codex-plugin/plugin.toml +2 -1
- package/dist/.cursor-plugin/plugin.json +3 -2
- package/dist/.gemini-plugin/plugin.json +3 -2
- package/dist/.opencode-plugin/plugin.json +3 -2
- package/dist/.windsurf-plugin/plugin.json +3 -2
- package/dist/agents/copilot-conductor.agent.md +60 -0
- package/dist/agents/copilot-experiment.agent.md +56 -0
- package/dist/agents/copilot-ideation.agent.md +45 -0
- package/dist/agents/copilot-literature.agent.md +34 -0
- package/dist/agents/copilot-polisher.agent.md +30 -0
- package/dist/agents/copilot-rebuttal.agent.md +35 -0
- package/dist/agents/copilot-reviewer.agent.md +35 -0
- package/dist/agents/copilot-writer.agent.md +39 -0
- package/dist/hooks/dispatch-reminder.json +17 -0
- package/dist/hooks/loop-armer.json +17 -0
- package/dist/hooks/research-copilot-guard.hook.md +51 -0
- package/dist/hooks/scientist-guardrails.json +17 -0
- package/dist/hooks/scripts/__tests__/__init__.py +0 -0
- package/dist/hooks/scripts/__tests__/test_post_tool_loop_armer.py +88 -0
- package/dist/hooks/scripts/__tests__/test_research_copilot_guard_main_session.py +150 -0
- package/dist/hooks/scripts/__tests__/test_session_start_memory_injector.py +66 -0
- package/dist/hooks/scripts/__tests__/test_user_prompt_dispatch_reminder.py +37 -0
- package/dist/hooks/scripts/_copilot_hook_lib.py +564 -0
- package/dist/hooks/scripts/copilot_subagent_stop.py +203 -0
- package/dist/hooks/scripts/copilot_write_guard.py +96 -0
- package/dist/hooks/scripts/post_tool_loop_armer.py +61 -0
- package/dist/hooks/scripts/research_copilot_guard.py +208 -0
- package/dist/hooks/scripts/scientist_guardrails.py +29 -0
- package/dist/hooks/scripts/session_start_memory_injector.py +188 -0
- package/dist/hooks/scripts/user_prompt_dispatch_reminder.py +40 -0
- package/dist/hooks/session-memory-injector.json +17 -0
- package/dist/hooks/tests/__init__.py +0 -0
- package/dist/hooks/tests/conftest.py +61 -0
- package/dist/hooks/tests/fixtures/transcript_copilot_experiment_complete.jsonl +2 -0
- package/dist/hooks/tests/fixtures/transcript_copilot_experiment_state_jump.jsonl +2 -0
- package/dist/hooks/tests/fixtures/transcript_copilot_literature.jsonl +2 -0
- package/dist/hooks/tests/fixtures/transcript_main_only.jsonl +2 -0
- package/dist/hooks/tests/fixtures/transcript_malformed_state_output.jsonl +2 -0
- package/dist/hooks/tests/integration_run.ps1 +65 -0
- package/dist/hooks/tests/test_copilot_hook_lib.py +398 -0
- package/dist/hooks/tests/test_copilot_subagent_stop.py +186 -0
- package/dist/hooks/tests/test_copilot_write_guard.py +137 -0
- package/dist/hooks/tests/test_session_start_snapshot.py +116 -0
- package/dist/hooks/tests/test_state_machine_consistency.py +75 -0
- package/dist/skills/arxivsub-skill/SKILL.md +98 -0
- package/dist/skills/arxivsub-skill/skill.json +5 -0
- package/dist/skills/de-ai-checker/SKILL.md +110 -0
- package/dist/skills/de-ai-checker/skill.json +5 -0
- package/dist/skills/deep-interview/SKILL.md +91 -0
- package/dist/skills/deep-interview/skill.json +5 -0
- package/dist/skills/grill-with-docs/SKILL.md +120 -0
- package/dist/skills/grill-with-docs/skill.json +5 -0
- package/dist/skills/init-mcp/SKILL.md +83 -0
- package/dist/skills/init-mcp/skill.json +5 -0
- package/dist/skills/model-escalation/SKILL.md +93 -0
- package/dist/skills/model-escalation/skill.json +5 -0
- package/dist/skills/paper-architecture-web-drawing/SKILL.md +282 -0
- package/dist/skills/paper-architecture-web-drawing/skill.json +5 -0
- package/dist/skills/paper-deai/SKILL.md +53 -0
- package/dist/skills/paper-deai/skill.json +5 -0
- package/dist/skills/paper-en2zh/SKILL.md +29 -0
- package/dist/skills/paper-en2zh/skill.json +5 -0
- package/dist/skills/paper-expand/SKILL.md +43 -0
- package/dist/skills/paper-expand/skill.json +5 -0
- package/dist/skills/paper-experiment-analysis/SKILL.md +38 -0
- package/dist/skills/paper-experiment-analysis/skill.json +5 -0
- package/dist/skills/paper-figure-caption/SKILL.md +29 -0
- package/dist/skills/paper-figure-caption/skill.json +5 -0
- package/dist/skills/paper-logic-check/SKILL.md +30 -0
- package/dist/skills/paper-logic-check/skill.json +5 -0
- package/dist/skills/paper-polish/SKILL.md +34 -305
- package/dist/skills/paper-polish/skill.json +5 -0
- package/dist/skills/paper-review/SKILL.md +49 -0
- package/dist/skills/paper-review/skill.json +5 -0
- package/dist/skills/paper-sanity-check/SKILL.md +122 -0
- package/dist/skills/paper-sanity-check/skill.json +5 -0
- package/dist/skills/paper-shorten/SKILL.md +42 -0
- package/dist/skills/paper-shorten/skill.json +5 -0
- package/dist/skills/paper-table-caption/SKILL.md +29 -0
- package/dist/skills/paper-table-caption/skill.json +5 -0
- package/dist/skills/paper-translate/SKILL.md +48 -0
- package/dist/skills/paper-translate/skill.json +5 -0
- package/dist/skills/plugin-dev-agent-development/SKILL.md +95 -0
- package/dist/skills/plugin-dev-agent-development/skill.json +5 -0
- package/dist/skills/research-workflow/SKILL.md +116 -0
- package/dist/skills/research-workflow/skill.json +5 -0
- package/dist/skills/scientist-experiment-runner/SKILL.md +76 -0
- package/dist/skills/scientist-experiment-runner/skill.json +5 -0
- package/dist/skills/scientist-ideation/SKILL.md +52 -0
- package/dist/skills/scientist-ideation/skill.json +5 -0
- package/dist/skills/scientist-plotting/SKILL.md +49 -0
- package/dist/skills/scientist-plotting/skill.json +5 -0
- package/dist/skills/scientist-review/SKILL.md +40 -0
- package/dist/skills/scientist-review/skill.json +5 -0
- package/dist/skills/scientist-runtime-init/SKILL.md +46 -0
- package/dist/skills/scientist-runtime-init/skill.json +5 -0
- package/dist/skills/scientist-writeup/SKILL.md +60 -0
- package/dist/skills/scientist-writeup/skill.json +5 -0
- package/dist/skills/talk-normal/SKILL.md +73 -0
- package/dist/skills/talk-normal/skill.json +5 -0
- package/package.json +1 -1
- package/dist/agents/rc-experiment.md +0 -203
- package/dist/agents/rc-ideation.md +0 -224
- package/dist/agents/rc-literature.md +0 -228
- package/dist/agents/rc-plan.md +0 -189
- package/dist/agents/rc-polisher.md +0 -166
- package/dist/agents/rc-rebuttal.md +0 -194
- package/dist/agents/rc-reviewer.md +0 -187
- package/dist/agents/rc-update-spec.md +0 -231
- package/dist/agents/rc-verify.md +0 -234
- package/dist/agents/rc-writer.md +0 -161
- package/dist/skills/experiment-design/SKILL.md +0 -331
- package/dist/skills/full-research-workflow/SKILL.md +0 -363
- package/dist/skills/literature-search/SKILL.md +0 -244
- package/dist/skills/sanity-check/SKILL.md +0 -449
- package/dist/skills/submission-sprint/SKILL.md +0 -361

package/dist/skills/scientist-ideation/SKILL.md
@@ -0,0 +1,52 @@
+---
+name: scientist-ideation
+description: "Use when the user has a workshop / topic Markdown and wants it turned into an AI-Scientist-format ideas JSON, generated directly in Copilot. Triggers on: '生成 ideas', 'topic 变成想法', 'AI Scientist 出点子', 'generate ideas from topic'. Copilot-native — no workspace ideation script call."
+version: 0.2.0
+---
+
+# scientist-ideation
+
+Convert a workshop / topic Markdown into an AI-Scientist-compatible ideas JSON. The model output MUST be produced by Copilot in-session.
+
+## Execution model
+
+This is a **Copilot-native model task**. Copilot reads the topic, brainstorms ideas, generates the JSON, and writes to a workspace file when the user requests it.
+
+## Workflow
+
+1. Read the user-supplied workshop / topic Markdown.
+2. If needed, check an existing ideas JSON to avoid duplicate directions.
+3. Generate candidate ideas in-session and organize them in the AI-Scientist schema.
+4. If the user asks for persistence, create or update the ideas JSON file directly.
+
+## JSON schema
+
+- `Name`
+- `Title`
+- `Short Hypothesis`
+- `Related Work`
+- `Abstract`
+- `Experiments`
+- `Risk Factors and Limitations`
+
+## Input
+
+- `workshop_file` or topic Markdown path
+- Existing ideas JSON (if any)
+- Directional / dataset / resource constraints the user wants preserved
+
+## Output
+
+- AI-Scientist-style ideas JSON
+- Written directly to a workspace file if requested
+- Explicit output path, idea count, and a list of duplicates filtered out
+
+## Forbidden
+
+- NEVER call any workspace-custom ideation pipeline.
+- NEVER call a model SDK from workspace code to generate ideas.
+
+## Failure handling
+
+- If the topic file's structure is too thin, surface the gap and ask for supplementation first.
+- If the user asks for persistence but the schema is incomplete, fill it in-session before writing.

package/dist/skills/scientist-ideation/skill.json
@@ -0,0 +1,5 @@
+{
+  "name": "scientist-ideation",
+  "description": "Use when the user has a workshop / topic Markdown and wants it turned into an AI-Scientist-format ideas JSON, generated directly in Copilot. Triggers on: '生成 ideas', 'topic 变成想法', 'AI Scientist 出点子', 'generate ideas from topic'. Copilot-native — no workspace ideation script call.",
+  "entry": "SKILL.md"
+}
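
The scientist-ideation skill leaves deduplication and schema completeness to Copilot in-session. As an illustration only, the same bookkeeping can be expressed in Python, assuming the ideas file is a JSON list of objects keyed by the seven schema fields listed in the skill, and assuming two ideas are duplicates when their `Name` values match case-insensitively (the skill defines no duplicate rule). `IDEA_FIELDS`, `missing_fields`, `merge_ideas`, and `write_ideas` are hypothetical helpers, not part of the package; they make no model calls.

```python
import json
from pathlib import Path
from typing import List, Tuple

# Field names taken from the "JSON schema" section of scientist-ideation/SKILL.md.
IDEA_FIELDS = (
    "Name",
    "Title",
    "Short Hypothesis",
    "Related Work",
    "Abstract",
    "Experiments",
    "Risk Factors and Limitations",
)


def missing_fields(idea: dict) -> List[str]:
    """Schema fields that are absent, empty, or whitespace-only in one idea."""
    gaps = []
    for name in IDEA_FIELDS:
        value = idea.get(name)
        if not value or (isinstance(value, str) and not value.strip()):
            gaps.append(name)
    return gaps


def merge_ideas(existing: List[dict], candidates: List[dict]) -> Tuple[List[dict], List[str]]:
    """Append new candidates; return (merged list, names filtered as duplicates).

    Hypothetical duplicate rule: same Name, compared case-insensitively.
    """
    seen = {str(idea.get("Name", "")).strip().lower() for idea in existing}
    merged = list(existing)
    duplicates = []
    for idea in candidates:
        gaps = missing_fields(idea)
        if gaps:
            # The skill says to complete the schema in-session before writing.
            raise ValueError(f"idea {idea.get('Name')!r} is missing {gaps}")
        key = idea["Name"].strip().lower()
        if key in seen:
            duplicates.append(idea["Name"])
            continue
        seen.add(key)
        merged.append(idea)
    return merged, duplicates


def write_ideas(path: Path, ideas: List[dict]) -> None:
    """Persist the ideas list as UTF-8 JSON (only when the user asks for a file)."""
    path.write_text(json.dumps(ideas, indent=2, ensure_ascii=False), encoding="utf-8")
```

The caller would then report `len(merged)`, the path passed to `write_ideas`, and `duplicates`, which covers the idea count and filtered-duplicate list the Output section asks for.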

package/dist/skills/scientist-plotting/SKILL.md
@@ -0,0 +1,49 @@
+---
+name: scientist-plotting
+description: "Use when the user asks to '聚合作图', '补图表', '整理实验图', or wants experiment outputs converted into plots + plotting scripts directly in Copilot. Triggers on: 'aggregate plots', 'make figures from results'. Copilot-native — Copilot designs the figure and edits the script; the terminal only runs the Python plotting code. Do NOT use without existing experiment outputs."
+version: 0.2.0
+---
+
+# scientist-plotting
+
+Generate plots and plotting scripts from an existing experiment directory. Model judgment and figure design are done by Copilot in-session.
+
+## Execution model
+
+This is a **Copilot-native model task**. Copilot reads results, decides figure structure, writes / edits plotting code; the terminal only runs pure-Python plotting scripts.
+
+## Workflow
+
+1. Read the experiment directory's summary JSON, logs, CSVs, NPY files, or existing plots.
+2. Decide which metrics and comparisons to display.
+3. Create or edit matplotlib / seaborn / pandas plotting code directly.
+4. Run the plotting script and inspect the output.
+5. If the figure is unclear, iterate.
+
+## Input
+
+- `folder`: experiment directory
+- Result file paths and formats
+- Figure conventions or paper-layout constraints the user wants preserved
+
+## Output
+
+- Plotting script or edits to an existing script
+- Output figure paths
+- Figure design rationale and the key visual conclusions
+
+## Operating principles
+
+- Only invoke this skill when the experiment outputs already exist.
+- If result files are incomplete, flag the gap; NEVER fabricate plots.
+
+## Forbidden
+
+- NEVER call any workspace-custom plotting model pipeline.
+- NEVER use custom model calls in workspace code to "auto-plot."
+
+## Deliverable requirements
+
+- Report figure paths.
+- Name the source result files used.
+- If a figure failed to render, return the real error and a suggested next step.

package/dist/skills/scientist-plotting/skill.json
@@ -0,0 +1,5 @@
+{
+  "name": "scientist-plotting",
+  "description": "Use when the user asks to '聚合作图', '补图表', '整理实验图', or wants experiment outputs converted into plots + plotting scripts directly in Copilot. Triggers on: 'aggregate plots', 'make figures from results'. Copilot-native — Copilot designs the figure and edits the script; the terminal only runs the Python plotting code. Do NOT use without existing experiment outputs.",
+  "entry": "SKILL.md"
+}
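
The plotting skill requires real result files before any figure is drawn. A minimal Python sketch of that gate plus one matplotlib figure follows, assuming a hypothetical `summary.json` shaped `{run_name: {metric: value}}` inside a hypothetical `runs/exp01` folder (the skill defines neither). `find_missing_results` and `plot_metric_bars` are illustrative names; matplotlib is imported inside the plotting function so the preflight check runs even where matplotlib is absent.

```python
import json
from pathlib import Path
from typing import List


def find_missing_results(folder: Path, required: List[str]) -> List[str]:
    """Required result files (relative to the experiment folder) that do not exist."""
    return [name for name in required if not (folder / name).is_file()]


def plot_metric_bars(summary_path: Path, metric: str, out_path: Path) -> Path:
    """One bar per run for `metric`, read from a summary JSON shaped {run: {metric: value}}."""
    import matplotlib

    matplotlib.use("Agg")  # headless backend: render to a file, open no window
    import matplotlib.pyplot as plt

    summary = json.loads(summary_path.read_text(encoding="utf-8"))
    runs = [run for run, metrics in summary.items() if metric in metrics]
    if not runs:
        # Report the gap instead of drawing an empty or invented figure.
        raise ValueError(f"no run in {summary_path} reports {metric!r}")
    values = [summary[run][metric] for run in runs]

    fig, ax = plt.subplots(figsize=(4, 3))
    ax.bar(runs, values)
    ax.set_ylabel(metric)
    ax.set_title(f"{metric} by run")
    fig.tight_layout()
    fig.savefig(out_path, dpi=200)
    plt.close(fig)
    return out_path


if __name__ == "__main__":
    folder = Path("runs/exp01")  # hypothetical experiment directory
    missing = find_missing_results(folder, ["summary.json"])
    if missing:
        print("cannot plot; missing result files:", missing)
    else:
        print(plot_metric_bars(folder / "summary.json", "accuracy", folder / "accuracy.png"))
```

Returning the output path and naming `summary.json` as the source maps onto the skill's deliverable requirements; a `ValueError` message is the "real error" to pass back when a figure cannot be drawn.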

package/dist/skills/scientist-review/SKILL.md
@@ -0,0 +1,40 @@
+---
+name: scientist-review
+description: "Use when the user asks to '审一下这篇 PDF', '自动审稿', '给我 review', or wants a manuscript / PDF reviewed in Copilot with structured feedback. Triggers on: 'review this manuscript', 'auto-review'. Copilot-native — no workspace review script call."
+version: 0.2.0
+---
+
+# scientist-review
+
+Text-level Copilot-native review of a paper or PDF. Model judgment and review output are produced by Copilot in-session; NEVER call a workspace-custom model script.
+
+## Execution model
+
+This is a **Copilot-native model task**. If only a PDF is supplied, extract text first, then have Copilot produce the review directly.
+
+## Workflow
+
+1. Acquire the paper text: prefer existing Markdown / LaTeX / TXT; fall back to PDF-text extraction only if necessary.
+2. Produce the review, scoring, and risk assessment directly in-session.
+3. If the user requests structured output, generate JSON or write to a file.
+
+## Input
+
+- `pdf_path`, LaTeX source, or already-extracted text
+- Reviewer perspective and scoring dimensions the user wants applied
+
+## Output
+
+- Review notes
+- Optional structured JSON review result on request
+- Explicit Strengths / Main Issues / Score / Risks
+
+## Forbidden
+
+- NEVER call any workspace-custom review pipeline.
+- NEVER use custom model calls in workspace scripts for reviewing.
+
+## Deliverable requirements
+
+- Default to a "Strengths / Main Issues / Score / Risks" summary.
+- If only a PDF is provided and text extraction fails, name the blocker explicitly.

package/dist/skills/scientist-review/skill.json
@@ -0,0 +1,5 @@
+{
+  "name": "scientist-review",
+  "description": "Use when the user asks to '审一下这篇 PDF', '自动审稿', '给我 review', or wants a manuscript / PDF reviewed in Copilot with structured feedback. Triggers on: 'review this manuscript', 'auto-review'. Copilot-native — no workspace review script call.",
+  "entry": "SKILL.md"
+}
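
The review skill's default deliverable has four parts plus an optional JSON form. One way to keep both renderings in sync is a small Python dataclass; the 1-10 score range and the snake_case JSON keys are assumptions for illustration (the skill names the sections but defines no scale or key names). The class only formats text that Copilot has already written in-session; it performs no model call.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


def _bullets(items: List[str]) -> str:
    return "\n".join(f"- {item}" for item in items) or "- (none)"


@dataclass
class Review:
    """The four default sections named in scientist-review/SKILL.md."""

    strengths: List[str] = field(default_factory=list)
    main_issues: List[str] = field(default_factory=list)
    score: int = 0  # hypothetical 1-10 scale; replace with the user's scoring dimensions
    risks: List[str] = field(default_factory=list)

    def validate(self) -> None:
        if not 1 <= self.score <= 10:
            raise ValueError(f"score {self.score} is outside the assumed 1-10 scale")

    def to_json(self) -> str:
        """Structured result, emitted only when the user asks for JSON."""
        self.validate()
        return json.dumps(asdict(self), indent=2, ensure_ascii=False)

    def to_markdown(self) -> str:
        """Default 'Strengths / Main Issues / Score / Risks' summary."""
        self.validate()
        return (
            f"**Strengths**\n{_bullets(self.strengths)}\n\n"
            f"**Main Issues**\n{_bullets(self.main_issues)}\n\n"
            f"**Score**: {self.score}/10\n\n"
            f"**Risks**\n{_bullets(self.risks)}"
        )
```

Empty sections render as "- (none)" so the summary always shows all four headings, which keeps the "Explicit Strengths / Main Issues / Score / Risks" requirement visible even for a short review.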

package/dist/skills/scientist-runtime-init/SKILL.md
@@ -0,0 +1,46 @@
+---
+name: scientist-runtime-init
+description: "Use when the user asks to '检查环境', '能不能跑 AI Scientist', 'runtime check', '初始化 AI Scientist 环境', or wants the ai-scientist MCP to validate Python / CUDA / LaTeX / poppler / runtime prerequisites. Routes through the `ai-scientist` MCP `validate_runtime`. Do NOT use as a substitute for actually running an experiment."
+version: 0.2.0
+---
+
+# scientist-runtime-init
+
+Validate the scientist-support AI Scientist runtime preconditions in the current workspace via the `ai-scientist` MCP.
+
+## Goal
+
+Confirm the following before launching any long experiment:
+
+- Runtime root directory exists
+- Python is available
+- `pdflatex`, `bibtex`, `pdftotext`, `chktex` are available
+- `torch.cuda.is_available()` is true
+- The current platform is suitable for local experiments and LaTeX compilation
+
+## Preferred method
+
+For a full check, use the `ai-scientist` MCP `validate_runtime` tool.
+
+This skill organizes the check steps and the output format; it does NOT depend on any in-skill runner or alternative script entry point.
+
+If the MCP is unavailable, fall back to terminal checks on the same conditions, keeping the same output structure.
+
+## Output requirements
+
+Summarize in three columns: Ready / Missing / Risk:
+
+- **Ready**: satisfied items
+- **Missing**: missing items
+- **Risk**: e.g. Windows platform, no GPU, no LaTeX
+
+End with a next-step recommendation:
+
+- Can continue with ideation
+- Can continue with local experiments, plotting support, or paper compilation
+- Must fix the environment first
+
+## Forbidden
+
+- NEVER call any in-skill runner or script entry point.
+- NEVER treat API-key or model-SDK availability as a runtime-init check item.

package/dist/skills/scientist-runtime-init/skill.json
@@ -0,0 +1,5 @@
+{
+  "name": "scientist-runtime-init",
+  "description": "Use when the user asks to '检查环境', '能不能跑 AI Scientist', 'runtime check', '初始化 AI Scientist 环境', or wants the ai-scientist MCP to validate Python / CUDA / LaTeX / poppler / runtime prerequisites. Routes through the `ai-scientist` MCP `validate_runtime`. Do NOT use as a substitute for actually running an experiment.",
+  "entry": "SKILL.md"
+}
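
The skill prefers the `ai-scientist` MCP `validate_runtime` tool, whose implementation is not part of this diff. For the terminal fallback it describes, an ad hoc Python check (run by hand, not shipped as an in-skill runner) might look like the sketch below. It assumes `shutil.which` on PATH is an adequate availability test, treats torch as optional, and maps the report onto the skill's three recommendations with a hypothetical rule: a missing runtime root blocks everything, and any other missing item limits the user to ideation. Per the Forbidden list, it checks no API keys or model SDKs.

```python
import importlib.util
import platform
import shutil
import sys
from pathlib import Path
from typing import Dict, List

# Command-line tools listed in the skill's Goal section.
TOOLS = ("pdflatex", "bibtex", "pdftotext", "chktex")


def check_runtime(root: Path) -> Dict[str, List[str]]:
    """Sort each precondition into Ready / Missing / Risk."""
    report: Dict[str, List[str]] = {"Ready": [], "Missing": [], "Risk": []}

    bucket = "Ready" if root.is_dir() else "Missing"
    report[bucket].append(f"runtime root {root}")
    report["Ready"].append(f"python {platform.python_version()} at {sys.executable}")

    for tool in TOOLS:
        report["Ready" if shutil.which(tool) else "Missing"].append(tool)
    if shutil.which("pdflatex") is None:
        report["Risk"].append("no LaTeX")

    # Probe torch without failing when it is not installed.
    if importlib.util.find_spec("torch") is None:
        report["Missing"].append("torch")
        report["Risk"].append("no GPU (torch not installed)")
    else:
        import torch

        if torch.cuda.is_available():
            report["Ready"].append("torch.cuda.is_available() is true")
        else:
            report["Missing"].append("CUDA")
            report["Risk"].append("no GPU")

    if platform.system() == "Windows":
        report["Risk"].append("Windows platform")
    return report


def next_step(report: Dict[str, List[str]]) -> str:
    """Map a report onto the skill's three recommendations (hypothetical rule)."""
    missing = report["Missing"]
    if any(item.startswith("runtime root") for item in missing):
        return "Must fix the environment first"
    if missing:
        return "Can continue with ideation"
    return "Can continue with local experiments, plotting support, or paper compilation"
```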

package/dist/skills/scientist-writeup/SKILL.md
@@ -0,0 +1,60 @@
+---
+name: scientist-writeup
+description: "Use when the user asks to '开始写论文', '生成 PDF', '整理成论文', or wants LaTeX / Markdown drafted directly in Copilot from experiment artifacts. Triggers on: 'write the paper', 'generate PDF', 'compile to paper'. Copilot-native — no workspace writeup script call. Do NOT use for review (scientist-review) or plotting (scientist-plotting)."
+version: 0.2.0
+---
+
+# scientist-writeup
+
+Generate or edit LaTeX / Markdown paper content directly from an existing experiment directory. Model output is produced by Copilot in-session; NEVER call workspace-custom model scripts.
+
+## Execution model
+
+This is a **Copilot-native model task**. Copilot reads results, writes content, and edits LaTeX files; the terminal handles non-model commands like `pdflatex`.
+
+## Workflow
+
+1. Read the experiment directory, summary files, figures, and logs.
+2. Identify the user-supplied `latex/template.tex` or the existing draft.
+3. Write or edit paper content directly in the editor.
+4. On user request, run `pdflatex` / `bibtex` for a compilation check.
+5. Report the produced manuscript path, compilation result, and remaining gaps.
+
+## Verification before declaring completion
+
+**Before claiming the paper is drafted, you MUST produce one of:**
+- the file path + a short verbatim quote of new content,
+- a `Read` confirmation that the new content is in the file,
+- a successful `pdflatex` exit and the produced PDF path,
+- or an explicit "drafted but could not verify — here is what I have so far."
+
+A turn that ends with "the paper is drafted" without one of the above is a failure mode.
+
+## Input
+
+- `folder`: experiment directory
+- `folder/latex/template.tex`: user-provided template entry
+- Figures, summarized results, citation info, and target-layout requirements
+
+## Output
+
+- Edited LaTeX / Markdown files
+- Compiled PDF path (if compilation was run)
+- List of unmet prerequisites
+
+## Operating principles
+
+1. Confirm the template and dependency files exist before writing.
+2. Write from real experimental results; NEVER fabricate conclusions or citations.
+3. When the user only wants a text draft, do not force a PDF compile.
+
+## Forbidden
+
+- NEVER call any workspace-custom writeup model pipeline.
+- NEVER use custom model calls in workspace code to generate paper text.
+
+## Deliverable requirements
+
+- Name which paper files were edited.
+- If compilation fails, return the real LaTeX error summary.
+- If conclusions still lack experimental support, name the missing results explicitly.

package/dist/skills/scientist-writeup/skill.json
@@ -0,0 +1,5 @@
+{
+  "name": "scientist-writeup",
+  "description": "Use when the user asks to '开始写论文', '生成 PDF', '整理成论文', or wants LaTeX / Markdown drafted directly in Copilot from experiment artifacts. Triggers on: 'write the paper', 'generate PDF', 'compile to paper'. Copilot-native — no workspace writeup script call. Do NOT use for review (scientist-review) or plotting (scientist-plotting).",
+  "entry": "SKILL.md"
+}
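
The writeup skill's verification rule accepts, among other proofs, a file path plus a verbatim quote, or a successful `pdflatex` exit plus the PDF path. A Python sketch of both checks follows, assuming the manuscript is UTF-8 and that one `pdflatex` pass in the file's own directory is an acceptable compile check; `verify_in_file` and `compile_check` are hypothetical helpers. `-interaction=nonstopmode` and `-halt-on-error` are standard pdflatex options that stop the run from waiting on terminal input.

```python
import shutil
import subprocess
from pathlib import Path
from typing import Tuple


def verify_in_file(tex_path: Path, quote: str) -> bool:
    """True only if the quoted new content is actually present in the file."""
    return tex_path.is_file() and quote in tex_path.read_text(encoding="utf-8")


def compile_check(tex_path: Path, timeout: int = 300) -> Tuple[bool, str]:
    """Run pdflatex once; return (True, PDF path) or (False, tail of the real log)."""
    if shutil.which("pdflatex") is None:
        return False, "pdflatex not found on PATH"
    proc = subprocess.run(
        ["pdflatex", "-interaction=nonstopmode", "-halt-on-error", tex_path.name],
        cwd=tex_path.parent,
        capture_output=True,
        text=True,
        encoding="utf-8",
        errors="replace",
        timeout=timeout,
    )
    pdf = tex_path.with_suffix(".pdf")
    if proc.returncode == 0 and pdf.is_file():
        return True, str(pdf)
    # Keep the last lines of output, where LaTeX usually prints the fatal error.
    return False, "\n".join(proc.stdout.splitlines()[-20:])
```

A single pass leaves BibTeX citations unresolved; a full build with references typically runs `pdflatex`, `bibtex`, then `pdflatex` twice.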

package/dist/skills/talk-normal/SKILL.md
@@ -0,0 +1,73 @@
+---
+name: talk-normal
+description: "Reply-style controller. Use this skill on every turn to keep responses direct, dense, and free of filler. Triggers on: every conversational context — load this skill as the default reply style, in any language. 对话风格控制技能。"
+applyTo: "**"
+version: 0.2.0
+---
+
+# Talk Normal
+
+When generating any response, follow these rules.
+
+## Reasoning vs. output language
+
+- **Think in English.** Internal reasoning, planning, scratchpad, tool-selection rationale: English. English is denser for this class of model and reduces drift.
+- **Answer in the user's language.** If the user wrote Chinese, answer in Chinese. If the user wrote English, answer in English. If mixed, match the dominant language of the latest turn.
+- For Chinese answers, first compose the answer in English (silently), then translate the final reply to Chinese before emitting. Do not show the English draft.
+
+## Core Principles
+
+Be direct and informative. No filler, no fluff, but give enough to be useful.
+
+## Rules
+
+### Negation Ban
+
+Your single hardest constraint: prefer direct positive claims. Do not use negation-based contrastive phrasing in any language or position — neither "reject then correct" (不是X,而是Y) nor "correct then reject" (X,而不是Y). If you catch yourself writing a sentence where a negative adverb sets up or follows a positive claim, restructure and state only the positive.
+
+Examples:
+- BAD: 真正的创新者不是"有创意的人",而是五种特质同时拉满的人
+- GOOD: 真正的创新者是五种特质同时拉满的人
+
+- BAD: 真正的创新者是五种特质同时拉满的人,而不是单纯"聪明"的人
+- GOOD: 真正的创新者是五种特质同时拉满的人
+
+- BAD: 这更像创始人筛选框架,不是交易信号
+- GOOD: 这是一个创始人筛选框架
+
+- BAD: It's not about intelligence, it's about taste
+- GOOD: Taste is what matters
+
+This covers any sentence structure where a negative adverb rejects an alternative to set up or append to a positive claim: in any order ("reject then correct" or "correct then reject"), chained (不是A,不是B,而是C), symmetric (适合X,不适合Y), or with or without an explicit "but / 而 / but rather" conjunction. Just state the positive claim directly. If a genuine distinction needs both sides, name them as parallel positive clauses. Narrow exception: technical statements about necessary or sufficient conditions in logic, math, or formal proofs.
+
+### Structure & Flow
+
+- Lead with the answer, then add context only if it genuinely helps
+- End with a concrete recommendation or next step when relevant
+- Use structure (numbered steps, bullets) only when the content has natural sequential or parallel structure. Do not use bullets as decoration
+- Match depth to complexity. Simple question = short answer. Complex question = structured but still tight
+
+### Banned Closings
+
+Do not use summary-stamp closings — any closing phrase or label that announces "here comes my one-line summary" before delivering it. This covers: "In conclusion", "In summary", "Hope this helps", "Feel free to ask", "一句话总结", "一句话落地", "一句话讲", "一句话概括", "一句话说", "一句话收尾", "总结一下", "简而言之", "概括来说", "总而言之", and any structural variant like "一句话X:" or "X一下:" that labels a summary before delivering it. If you have a final punchy claim, just state it as the last sentence without a summary label.
+
+### Banned Fillers
+
+Kill all filler: "I'd be happy to", "Great question", "It's worth noting", "Certainly", "Of course", "Let me break this down", "首先我们需要", "值得注意的是", "综上所述", "让我们一起来看看"
+
+### No Restatement
+
+- Never restate the question
+- Do not restate the same point in "plain language" or "in human terms" after already explaining it. Say it once clearly. No "翻成人话", "in other words", "简单来说" rewording blocks
+
+### No Hypothetical Offers
+
+Do not end with hypothetical follow-up offers or conditional next-step menus. This includes "If you want, I can also...", "如果你愿意,我还可以...", "If you tell me...", "如果你告诉我...", "如果你说X,我就Y", "我下一步可以...", "If you'd like, my next step could be...". Do not stage menus where the user has to say a magic phrase to unlock the next action. Answer what was asked, give the recommendation, stop. If a real next action is needed, just take it or name it directly without the conditional wrapper.
+
+### Response Patterns
+
+- Yes/no questions: answer first, one sentence of reasoning
+- Comparisons: give your recommendation with brief reasoning, not a balanced essay
+- Code: give the code + usage example if non-trivial. No "Certainly! Here is..."
+- Explanations: 3-5 sentences max for conceptual questions. Cover the essence, not every subtopic. If the user wants more, they will ask
+- When listing pros/cons or comparing options: max 3-4 points per side, pick the most important ones

package/dist/skills/talk-normal/skill.json
@@ -0,0 +1,5 @@
+{
+  "name": "talk-normal",
+  "description": "Reply-style controller. Use this skill on every turn to keep responses direct, dense, and free of filler. Triggers on: every conversational context — load this skill as the default reply style, in any language. 对话风格控制技能。",
+  "entry": "SKILL.md"
+}
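
talk-normal is enforced by the model itself; nothing in this diff runs a checker over replies. For reviewing drafts offline, a rough Python lint over a few of the listed phrases could look like the sketch below. `BANNED`, `ONE_LINE_LABEL`, and `lint_reply` are hypothetical names; the phrase lists are small subsets of the skill's lists, and string matching covers only literal phrases and the "一句话X:" label pattern, so the Negation Ban and restatement rules still need human or model judgment.

```python
import re
from typing import List, Tuple

# Small subsets of the phrase lists in talk-normal/SKILL.md.
BANNED = {
    "closing": ["In conclusion", "In summary", "Hope this helps", "Feel free to ask",
                "总结一下", "简而言之", "总而言之"],
    "filler": ["I'd be happy to", "Great question", "It's worth noting", "Certainly",
               "值得注意的是", "综上所述"],
    "offer": ["If you want, I can also", "If you'd like, my next step could be", "如果你愿意"],
}
# Structural variants such as "一句话X:" with an ASCII or full-width colon.
ONE_LINE_LABEL = re.compile(r"一句话\S{0,4}?[:：]")


def lint_reply(text: str) -> List[Tuple[str, str]]:
    """Return (rule, phrase) pairs for banned phrases found in a draft reply."""
    hits = []
    for rule, phrases in BANNED.items():
        for phrase in phrases:
            if phrase.isascii():
                # Word boundaries so "Certainly" does not match inside "uncertainly".
                found = re.search(rf"\b{re.escape(phrase)}\b", text, re.IGNORECASE)
            else:
                # Chinese text has no spaces between words, so use a plain substring test.
                found = phrase in text
            if found:
                hits.append((rule, phrase))
    hits.extend(("closing", m.group(0)) for m in ONE_LINE_LABEL.finditer(text))
    return hits
```

Each hit is a prompt to rewrite the sentence; an empty result only means none of these literal phrases appeared.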

package/package.json
CHANGED

package/dist/agents/rc-experiment.md
@@ -1,203 +0,0 @@
----
-name: rc-experiment
-description: Runs experiments with long-task discipline (Monitor), enforces config traceability. Use for experiment tasks.
-kind: experiment
-model: sonnet
-color: green
----
-
-# Experiment Executor
-
-You run experiments and validate results with strict traceability.
-
-## Recursion Guard
-
-You are already the `rc-experiment` sub-agent. Do NOT spawn other `rc-*` agents.
-
-## Context Injection
-
-Read:
-- `prd.md` — metrics to achieve
-- `execute.jsonl` — methodology specs
-- `.research/spec/methodology/` — experiment protocols
-
-## Core Responsibilities
-
-### 1. Long-Task Discipline
-
-For training jobs >5 minutes, use background + Monitor:
-
-```bash
-# Launch in background
-Bash(
-  command="python train.py --config config.json 2>&1 | tee train.log",
-  run_in_background=true
-)
-
-# Monitor for completion
-Monitor(
-  command="tail -f train.log | grep --line-buffered 'epoch\\|loss\\|accuracy\\|DONE\\|Error'",
-  description="Training progress for experiment <name>",
-  persistent=true
-)
-```
-
-Main session continues, you're notified when done.
-
-### 2. Config Traceability (CRITICAL)
-
-Every experiment MUST record for reproducibility:
-
-Write to `.research/tasks/<id>/artifacts/config.json`:
-
-```json
-{
-  "seed": 42,
-  "learning_rate": 1e-4,
-  "batch_size": 32,
-  "model": "resnet50",
-  "dataset": "imagenet_split_v2",
-  "data_split": {
-    "train": 0.8,
-    "val": 0.1,
-    "test": 0.1
-  },
-  "framework": "pytorch==2.0.0",
-  "cuda_version": "11.8",
-  "timestamp": "2026-06-07T10:30:00Z"
-}
-```
-
-### 3. Metric Extraction
-
-Extract metrics from logs and compare to prd.md targets:
-
-```bash
-# Extract final metrics
-ACCURACY=$(grep "Final accuracy" train.log | tail -1 | awk '{print $3}')
-
-# Compare to target
-TARGET=$(grep "target accuracy" .research/tasks/<id>/prd.md | awk '{print $3}')
-
-if (( $(echo "$ACCURACY < $TARGET" | bc -l) )); then
-  rc task add-gap --desc "Accuracy $ACCURACY < target $TARGET" --suggest experiment
-fi
-```
-
-Write to `.research/tasks/<id>/artifacts/results/metrics.json`:
-
-```json
-{
-  "accuracy": 0.952,
-  "f1_score": 0.94,
-  "precision": 0.95,
-  "recall": 0.93,
-  "training_time": "3.5 hours",
-  "converged": true,
-  "final_loss": 0.032
-}
-```
-
-### 4. Record Results (Structured)
-
-Organize results in `.research/tasks/<id>/artifacts/results/`:
-
-```
-results/
-├── metrics.json # Final numbers (for paper)
-├── train.log # Full training log
-├── config.json # Config used (for reproducibility)
-├── checkpoints/ # Model weights
-│   ├── best_model.pth
-│   └── final_model.pth
-└── plots/ # Training curves
-    ├── loss.png
-    └── accuracy.png
-```
-
-### 5. Validate Against Goal
-
-Check prd.md success criteria:
-- All target metrics achieved?
-- Required ablations run?
-- Baseline comparisons complete?
-
-Record gaps for missing items.
-
-## Quality Gate (Self-Check)
-
-Before `rc task set-status <id> verify`:
-- [ ] All prd.md metrics achieved (or gaps recorded)
-- [ ] Config recorded (seed/hyperparams/data/versions)
-- [ ] Results logged to artifacts/results/
-- [ ] Reproducibility verified (can re-run with same config)
-- [ ] Baseline comparisons included
-
-## What You DON'T Do
-
-- ❌ Search papers or lock baselines (rc-literature)
-- ❌ Design novelty or analyze feasibility (rc-ideation)
-- ❌ Write paper sections (rc-writer)
-- ❌ Polish text (rc-polisher)
-
-## Error Recovery
-
-### Training fails
-```bash
-# Check log for error
-ERROR=$(grep -i "error\\|exception" train.log | tail -1)
-
-# Record as gap
-rc task add-gap --desc "Training failed: $ERROR" --suggest experiment
-```
-
-### Metric below target
-```bash
-rc task add-gap --desc "Accuracy $ACCURACY below target $TARGET, need hyperparameter tuning" --suggest experiment
-```
-
-### Out of memory
-```bash
-rc task add-gap --desc "OOM error, reduce batch size or model size" --suggest ideation
-# (May need different approach)
-```
-
-### Baseline comparison missing
-```bash
-rc task add-gap --desc "Missing baseline X for comparison" --suggest literature
-```
-
-## Report Format
-
-```markdown
-## Experiment Complete
-
-### Metrics (vs Targets)
-- Accuracy: 95.2% (target: 95.0%) ✅
-- F1-Score: 0.94 (target: 0.93) ✅
-- Training Time: 3.5 hours
-
-### Config Traceability
-- Seed: 42 (recorded)
-- Config: `.research/tasks/<id>/artifacts/config.json`
-- Reproducible: ✅
-
-### Artifacts
-- Results: `.research/tasks/<id>/artifacts/results/`
-- Metrics: metrics.json
-- Logs: train.log
-- Checkpoints: checkpoints/best_model.pth
-
-### Quality Gate: PASSED
-- ✅ All target metrics achieved
-- ✅ Config recorded
-- ✅ Reproducibility verified
-
-### Open Gaps
-- None (or list if any)
-```
-
-Then:
-```bash
-rc task set-status <id> verify
-```
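
The removed rc-experiment agent extracted the final accuracy with `grep | tail -1 | awk '{print $3}'`, compared it to the prd.md target with `bc`, and recorded gaps through `rc task add-gap`. A Python sketch of the same bookkeeping follows, assuming log lines shaped like `Final accuracy: 0.952` (the awk field index implies a similar shape, but the agent never shows its log format). It returns gap text instead of invoking the `rc` CLI, whose interface appears here only through those examples. `write_config`, `final_accuracy`, and `accuracy_gap` are hypothetical names.

```python
import json
import platform
import re
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional

# Assumed log format: "Final accuracy: 0.952" (colon and spacing optional).
FINAL_ACC = re.compile(r"Final accuracy\W*([0-9]*\.?[0-9]+)")


def write_config(path: Path, **settings) -> dict:
    """Record seed/hyperparameters/data/versions plus a UTC timestamp as JSON."""
    config = dict(settings)
    config.setdefault("python", platform.python_version())
    config["timestamp"] = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(config, indent=2), encoding="utf-8")
    return config


def final_accuracy(log_text: str) -> Optional[float]:
    """Last reported final accuracy, mirroring `grep ... | tail -1`."""
    values = FINAL_ACC.findall(log_text)
    return float(values[-1]) if values else None


def accuracy_gap(log_text: str, target: float) -> Optional[str]:
    """Gap description when the metric is missing or below target, else None."""
    acc = final_accuracy(log_text)
    if acc is None:
        return "No 'Final accuracy' line found in log"
    if acc < target:
        return f"Accuracy {acc} < target {target}"
    return None
```

A log without the metric line becomes a gap of its own here, where the bash version would hand an empty `$ACCURACY` to `bc`.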