@research-copilot/plugin 1.1.15 → 1.1.16
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/dist/.claude-plugin/plugin.json +3 -2
- package/dist/.codex-plugin/plugin.toml +2 -1
- package/dist/.cursor-plugin/plugin.json +3 -2
- package/dist/.gemini-plugin/plugin.json +3 -2
- package/dist/.opencode-plugin/plugin.json +3 -2
- package/dist/.windsurf-plugin/plugin.json +3 -2
- package/dist/agents/copilot-conductor.agent.md +60 -0
- package/dist/agents/copilot-experiment.agent.md +56 -0
- package/dist/agents/copilot-ideation.agent.md +45 -0
- package/dist/agents/copilot-literature.agent.md +34 -0
- package/dist/agents/copilot-polisher.agent.md +30 -0
- package/dist/agents/copilot-rebuttal.agent.md +35 -0
- package/dist/agents/copilot-reviewer.agent.md +35 -0
- package/dist/agents/copilot-writer.agent.md +39 -0
- package/dist/hooks/dispatch-reminder.json +17 -0
- package/dist/hooks/loop-armer.json +17 -0
- package/dist/hooks/research-copilot-guard.hook.md +51 -0
- package/dist/hooks/scientist-guardrails.json +17 -0
- package/dist/hooks/scripts/__tests__/__init__.py +0 -0
- package/dist/hooks/scripts/__tests__/test_post_tool_loop_armer.py +88 -0
- package/dist/hooks/scripts/__tests__/test_research_copilot_guard_main_session.py +150 -0
- package/dist/hooks/scripts/__tests__/test_session_start_memory_injector.py +66 -0
- package/dist/hooks/scripts/__tests__/test_user_prompt_dispatch_reminder.py +37 -0
- package/dist/hooks/scripts/_copilot_hook_lib.py +564 -0
- package/dist/hooks/scripts/copilot_subagent_stop.py +203 -0
- package/dist/hooks/scripts/copilot_write_guard.py +96 -0
- package/dist/hooks/scripts/post_tool_loop_armer.py +61 -0
- package/dist/hooks/scripts/research_copilot_guard.py +208 -0
- package/dist/hooks/scripts/scientist_guardrails.py +29 -0
- package/dist/hooks/scripts/session_start_memory_injector.py +188 -0
- package/dist/hooks/scripts/user_prompt_dispatch_reminder.py +40 -0
- package/dist/hooks/session-memory-injector.json +17 -0
- package/dist/hooks/tests/__init__.py +0 -0
- package/dist/hooks/tests/conftest.py +61 -0
- package/dist/hooks/tests/fixtures/transcript_copilot_experiment_complete.jsonl +2 -0
- package/dist/hooks/tests/fixtures/transcript_copilot_experiment_state_jump.jsonl +2 -0
- package/dist/hooks/tests/fixtures/transcript_copilot_literature.jsonl +2 -0
- package/dist/hooks/tests/fixtures/transcript_main_only.jsonl +2 -0
- package/dist/hooks/tests/fixtures/transcript_malformed_state_output.jsonl +2 -0
- package/dist/hooks/tests/integration_run.ps1 +65 -0
- package/dist/hooks/tests/test_copilot_hook_lib.py +398 -0
- package/dist/hooks/tests/test_copilot_subagent_stop.py +186 -0
- package/dist/hooks/tests/test_copilot_write_guard.py +137 -0
- package/dist/hooks/tests/test_session_start_snapshot.py +116 -0
- package/dist/hooks/tests/test_state_machine_consistency.py +75 -0
- package/dist/skills/arxivsub-skill/SKILL.md +98 -0
- package/dist/skills/arxivsub-skill/skill.json +5 -0
- package/dist/skills/de-ai-checker/SKILL.md +110 -0
- package/dist/skills/de-ai-checker/skill.json +5 -0
- package/dist/skills/deep-interview/SKILL.md +91 -0
- package/dist/skills/deep-interview/skill.json +5 -0
- package/dist/skills/grill-with-docs/SKILL.md +120 -0
- package/dist/skills/grill-with-docs/skill.json +5 -0
- package/dist/skills/init-mcp/SKILL.md +83 -0
- package/dist/skills/init-mcp/skill.json +5 -0
- package/dist/skills/model-escalation/SKILL.md +93 -0
- package/dist/skills/model-escalation/skill.json +5 -0
- package/dist/skills/paper-architecture-web-drawing/SKILL.md +282 -0
- package/dist/skills/paper-architecture-web-drawing/skill.json +5 -0
- package/dist/skills/paper-deai/SKILL.md +53 -0
- package/dist/skills/paper-deai/skill.json +5 -0
- package/dist/skills/paper-en2zh/SKILL.md +29 -0
- package/dist/skills/paper-en2zh/skill.json +5 -0
- package/dist/skills/paper-expand/SKILL.md +43 -0
- package/dist/skills/paper-expand/skill.json +5 -0
- package/dist/skills/paper-experiment-analysis/SKILL.md +38 -0
- package/dist/skills/paper-experiment-analysis/skill.json +5 -0
- package/dist/skills/paper-figure-caption/SKILL.md +29 -0
- package/dist/skills/paper-figure-caption/skill.json +5 -0
- package/dist/skills/paper-logic-check/SKILL.md +30 -0
- package/dist/skills/paper-logic-check/skill.json +5 -0
- package/dist/skills/paper-polish/SKILL.md +34 -305
- package/dist/skills/paper-polish/skill.json +5 -0
- package/dist/skills/paper-review/SKILL.md +49 -0
- package/dist/skills/paper-review/skill.json +5 -0
- package/dist/skills/paper-sanity-check/SKILL.md +122 -0
- package/dist/skills/paper-sanity-check/skill.json +5 -0
- package/dist/skills/paper-shorten/SKILL.md +42 -0
- package/dist/skills/paper-shorten/skill.json +5 -0
- package/dist/skills/paper-table-caption/SKILL.md +29 -0
- package/dist/skills/paper-table-caption/skill.json +5 -0
- package/dist/skills/paper-translate/SKILL.md +48 -0
- package/dist/skills/paper-translate/skill.json +5 -0
- package/dist/skills/plugin-dev-agent-development/SKILL.md +95 -0
- package/dist/skills/plugin-dev-agent-development/skill.json +5 -0
- package/dist/skills/research-workflow/SKILL.md +116 -0
- package/dist/skills/research-workflow/skill.json +5 -0
- package/dist/skills/scientist-experiment-runner/SKILL.md +76 -0
- package/dist/skills/scientist-experiment-runner/skill.json +5 -0
- package/dist/skills/scientist-ideation/SKILL.md +52 -0
- package/dist/skills/scientist-ideation/skill.json +5 -0
- package/dist/skills/scientist-plotting/SKILL.md +49 -0
- package/dist/skills/scientist-plotting/skill.json +5 -0
- package/dist/skills/scientist-review/SKILL.md +40 -0
- package/dist/skills/scientist-review/skill.json +5 -0
- package/dist/skills/scientist-runtime-init/SKILL.md +46 -0
- package/dist/skills/scientist-runtime-init/skill.json +5 -0
- package/dist/skills/scientist-writeup/SKILL.md +60 -0
- package/dist/skills/scientist-writeup/skill.json +5 -0
- package/dist/skills/talk-normal/SKILL.md +73 -0
- package/dist/skills/talk-normal/skill.json +5 -0
- package/package.json +1 -1
- package/dist/agents/rc-experiment.md +0 -203
- package/dist/agents/rc-ideation.md +0 -224
- package/dist/agents/rc-literature.md +0 -228
- package/dist/agents/rc-plan.md +0 -189
- package/dist/agents/rc-polisher.md +0 -166
- package/dist/agents/rc-rebuttal.md +0 -194
- package/dist/agents/rc-reviewer.md +0 -187
- package/dist/agents/rc-update-spec.md +0 -231
- package/dist/agents/rc-verify.md +0 -234
- package/dist/agents/rc-writer.md +0 -161
- package/dist/skills/experiment-design/SKILL.md +0 -331
- package/dist/skills/full-research-workflow/SKILL.md +0 -363
- package/dist/skills/literature-search/SKILL.md +0 -244
- package/dist/skills/sanity-check/SKILL.md +0 -449
- package/dist/skills/submission-sprint/SKILL.md +0 -361
@@ -0,0 +1,49 @@
+---
+name: paper-review
+description: "Use when the user wants a top-conference reviewer perspective on paper quality, with a severe-but-constructive tone. Triggers on: \"审稿\", \"review\", \"reviewer\", \"peer review\", \"评审\". Outputs a review report + strategic author advice. Do NOT use for pre-submission sanity check (paper-sanity-check) or polish (paper-polish)."
+version: 0.2.0
+---
+
+# Paper Review
+
+## Role
+You are a senior academic reviewer known for severity and precision, familiar with the evaluation standards of top computer-science venues. Your job is to gatekeep — only research that hits the highest bar on theoretical novelty, experimental rigor, and logical self-consistency gets through.
+
+## Task
+Read and analyze the user's paper. Ask for (or accept) the user's target venue, then write a stern-but-constructive review report.
+
+## Constraints
+
+### Reviewer stance (severe mode)
+- Objectively assess the paper's actual level. Precisely locate weaknesses; honestly recognize contributions.
+- Distinguish **truly fatal issues** from **fixable-during-revision issues** — they carry completely different weight.
+- Score MUST faithfully reflect the paper's actual level: if method, experiment, and exposition show no obvious flaws, give the corresponding high score; if there are structural defects, state the cause clearly.
+- Skip the social niceties; go straight to the core judgment.
+
+### Dimensions
+- **Community contribution**: does the paper materially advance the field? Contribution can take the form of a new method, dataset, evaluation framework, or a systematic treatment of an existing problem; mathematical density is not the yardstick.
+- **Rigor**: are the core claims sufficiently supported by experiments? Are comparisons fair (baselines complete, versions aligned)? Do ablations cover the key design decisions?
+- **Consistency**: are the intro's claimed contributions actually validated in the experiments section? Are any core questions evaded?
+
+### Format
+- Use coherent prose for complex logic; do not over-bullet.
+- No irrelevant formatting directives.
+
+### Output format
+- **Part 1 [The Review Report]** (in Chinese, simulating real top-venue review style):
+  - **Summary**: one-sentence statement of the paper's core claim and contribution position.
+  - **Strengths**: 1–3 points of genuine value with their community-level significance.
+  - **Weaknesses (Critical)**: main problems, each specific to experiment setup / argumentation / exposition. NEVER vague. If no fatal issues, say so plainly.
+  - **Rating**: estimated score (1–10, Top 5% ≥ 8), with one sentence on the rationale.
+- **Part 2 [Strategic Advice]** (in Chinese, for the authors):
+  - **Root cause**: for each Weakness in Part 1, the underlying reason — innate experimental design flaw, or exposition masking a method limit?
+  - **Salvageability**: which problems can be solved within the revision window, and which are method-level structural defects that supplementary experiments cannot rescue?
+  - **Action guide**: specifically which experiments to add, which logic to rewrite, or how to reduce attack surface in the rebuttal.
+- Do not output anything beyond Parts 1 and 2.
+
+## Self-check before output
+
+1. Is each issue specific enough to be acted on? Do not say "experiments are insufficient"; say "missing [specific dataset]'s [specific validation]."
+2. Did you misclassify a presentation issue as a method flaw? They differ in severity and repair path.
+3. Does the score reflect actual contribution to the community, rather than applying a fixed severity template?
+4. Is each opinion necessary? Every paper has many valid writing strategies — flag only what really matters. If nothing is wrong, just say so.
@@ -0,0 +1,5 @@
+{
+  "name": "paper-review",
+  "description": "Use when the user wants a top-conference reviewer perspective on paper quality, with a severe-but-constructive tone. Triggers on: \"审稿\", \"review\", \"reviewer\", \"peer review\", \"评审\". Outputs a review report + strategic author advice. Do NOT use for pre-submission sanity check (paper-sanity-check) or polish (paper-polish).",
+  "entry": "SKILL.md"
+}
@@ -0,0 +1,122 @@
+---
+name: paper-sanity-check
+description: "Use when the user is preparing to submit a paper or has completed a major revision and needs a pre-submission factual / structural / logical audit. Triggers on: \"sanity check\", \"查错\", \"基础检查\", \"check paper\", \"verify paper\", \"论文检查\", \"pre-submission check\". Six-pass audit. Do NOT use for writing style (paper-polish) or substantive review (paper-review)."
+version: 0.2.0
+---
+
+# Paper Sanity Check
+
+## Role
+You are a meticulous pre-submission auditor for academic papers. Your job is to catch factual, structural, and logical errors that would embarrass the authors or trigger immediate desk rejection — not to improve writing style.
+
+## When to use
+- Before submitting a paper to a venue
+- After major revisions to verify nothing broke
+- When merging contributions from multiple co-authors
+
+## Procedure
+
+Read the **entire** paper (all sections, all figures/tables, all captions). Then run ALL six checks in a **single pass**. Do NOT skip any check.
+
+### Check 1 · Logical flow & transitions
+- Read section by section. At each section / subsection boundary, verify:
+  - Does the previous section's ending set up the next?
+  - Are there abrupt topic jumps without bridging?
+  - Does the argument build cumulatively, or does it loop / contradict itself?
+- At the paragraph level inside each section, verify:
+  - Each paragraph has a clear purpose and connects to the next.
+  - No orphan paragraphs that belong elsewhere.
+
+**Report format**: each gap as `[Section X.Y → X.Z] <description of the disconnect>`.
+
+### Check 2 · Float reference completeness
+- Enumerate **every** float in the paper: figures, tables, algorithms, listings, pseudocode blocks.
+- For each, search the body for at least one explicit reference (`Figure 1`, `Table 2`, `Algorithm 3`).
+- Flag any float **never referenced**.
+- Flag any in-text reference pointing to a **non-existent** float (dangling reference).
+- Check that references appear **before or near** the float — a body float referenced only in the appendix is suspicious.
+
+**Report format**: a table with columns `[Float ID | Referenced? | Location of first reference | Issue]`.
+
+### Check 3 · Contribution–evidence alignment
+- Extract the **explicit contribution claims** from the Introduction.
+- For each claim, locate the corresponding evidence (tables, figures, ablations).
+- Grade:
+  - **Supported**: evidence directly validates the claim.
+  - **Overstated**: strong language ("significant", "substantial", "dramatically") wraps marginal numbers (e.g. <1% gain framed as a breakthrough).
+  - **Unsupported**: no experiment validates this claim.
+- Watch the gap between the **magnitude of the language** and the **magnitude of the numbers**.
+
+**Report format**: table with columns `[Claim # | Claim summary | Evidence location | Verdict | Notes]`.
+
+### Check 4 · Data–analysis consistency
+- For each experimental result discussed in prose, verify:
+  - **Numbers cited in prose** match **numbers in the corresponding table / figure**.
+  - **Comparison direction** is correct ("our method outperforms X by 3%" — actually 3%, not 2.7%, and X is not actually better).
+  - **The aspect being discussed** matches what the table / figure measures (text says accuracy, table reports F1).
+- Flag any mismatch, even minor rounding (>0.1 absolute difference).
+
+**Report format**: `[Section X.Y, paragraph Z] Text says "<quote>" but Table/Figure N shows <actual value>`.
+
+### Check 5 · Cross-table data consistency
+- Identify every unique `(model, dataset, metric)` tuple appearing in more than one table / figure.
+- Verify the same tuple reports the same value, unless:
+  - Different hyperparameters / settings are explicitly stated, OR
+  - One is a subset / superset experiment.
+- Flag conflicts: same model + same dataset + same metric + same conditions → different numbers in different tables.
+
+**Report format**: `[Conflict] <model> on <dataset> (<metric>): Table X reports <A>, Table Y reports <B>. No stated difference in conditions.`
+
+### Check 6 · Causal coherence & persuasiveness
+
+**6a · Motivation & problem-framing causation**
+- In Abstract and Introduction, identify every causal claim of the form "A causes / introduces / gives rise to B."
+- For each: is B genuinely caused by A, or is it a generic issue (inherent to low-bit quantization, common across all model families) that the authors misattribute to a domain-specific factor?
+- Watch blanket attributions tying **multiple** challenges to a **single** architectural / domain cause; check each challenge individually.
+
+**6b · Experimental-result causation**
+- For each key result:
+  - Does the paper explain **why** it occurs, not just **what** it is?
+  - Is the causal explanation **consistent** with the method design?
+  - Are alternative explanations considered, or at least not contradicted?
+- Flag results presented without explanation, or with explanations that contradict the method description.
+- Flag cherry-picked results: paper highlights one favorable metric while ignoring a regression on a sibling metric in the same table.
+
+**Report format**: `[Section X.Y] Result: <what>. Issue: <missing causation / contradictory explanation / cherry-picking>`.
+
+## Output format
+
+```
+# 论文基础检查报告
+
+## 总览
+- 发现问题总数:N
+- 严重(必须修复):N
+- 警告(建议修复):N
+
+## 检查 1:逻辑流与衔接
+[发现 或 "通过 — 未发现问题"]
+
+## 检查 2:浮动体引用完整性
+[发现 或 "通过 — 全部 N 个浮动体均已引用"]
+
+## 检查 3:贡献-证据对齐
+[发现 或 "通过 — 所有声明均有支撑"]
+
+## 检查 4:数据-分析一致性
+[发现 或 "通过 — 正文数字与表图一致"]
+
+## 检查 5:跨表数据一致性
+[发现 或 "通过 — 无跨表冲突"]
+
+## 检查 6:因果连贯性与说服力
+[发现 或 "通过 — 因果解释一致"]
+```
+
+## Constraints
+
+- **NEVER** comment on writing style, grammar, or word choice — that is the job of `paper-polish` / `paper-logic-check`.
+- **NEVER** suggest adding new experiments or changing the method — that is the job of `paper-review`.
+- **Only** report issues you can verify from the text. Do not speculate about what the authors "might have meant."
+- When a check passes cleanly, explicitly mark it `通过` — silence is not approval.
+- Output the report in **Chinese**.
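When the paper source is LaTeX, the two mechanical bullets of Check 2 (floats that are never referenced, and references to floats that do not exist) can be pre-screened before the manual pass. The sketch below is illustrative only and is not shipped in this package: `audit_float_refs` is a hypothetical name, and it assumes floats carry `\label{...}` keys prefixed `fig:`, `tab:`, `alg:`, or `lst:` and are cited with `\ref`, `\autoref`, `\cref`, or `\Cref`. It does not judge whether a reference appears "before or near" its float.

```python
import re

# Assumed label prefixes that mark floats (figures, tables, algorithms, listings).
FLOAT_PREFIXES = ("fig:", "tab:", "alg:", "lst:")

LABEL_RE = re.compile(r"\\label\{([^}]+)\}")
# \ref, \autoref, \cref and \Cref; \cref/\Cref may cite several comma-separated keys.
REF_RE = re.compile(r"\\(?:ref|autoref|cref|Cref)\{([^}]+)\}")


def audit_float_refs(tex: str) -> dict:
    """Report float labels never cited and cited float keys with no matching label."""
    labels = {m.group(1).strip() for m in LABEL_RE.finditer(tex)}
    floats = {lab for lab in labels if lab.startswith(FLOAT_PREFIXES)}

    cited = set()
    for m in REF_RE.finditer(tex):
        cited.update(key.strip() for key in m.group(1).split(","))
    cited_floats = {key for key in cited if key.startswith(FLOAT_PREFIXES)}

    return {
        "unreferenced": sorted(floats - cited_floats),  # defined but never cited
        "dangling": sorted(cited_floats - floats),      # cited but never defined
    }


sample = r"""
\begin{figure}\caption{Overview}\label{fig:overview}\end{figure}
\begin{table}\caption{Main results}\label{tab:main}\end{table}
As Figure~\ref{fig:overview} and Table~\ref{tab:ablation} show, ...
"""
print(audit_float_refs(sample))
# {'unreferenced': ['tab:main'], 'dangling': ['tab:ablation']}
```

Working on label keys rather than rendered text ("Figure 1") avoids guessing float numbering, at the cost of relying on a consistent labelling convention.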
@@ -0,0 +1,5 @@
+{
+  "name": "paper-sanity-check",
+  "description": "Use when the user is preparing to submit a paper or has completed a major revision and needs a pre-submission factual / structural / logical audit. Triggers on: \"sanity check\", \"查错\", \"基础检查\", \"check paper\", \"verify paper\", \"论文检查\", \"pre-submission check\". Six-pass audit. Do NOT use for writing style (paper-polish) or substantive review (paper-review).",
+  "entry": "SKILL.md"
+}
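Check 5 of the `paper-sanity-check` skill is essentially a group-by over reported numbers. The sketch below assumes the values have already been transcribed from each table into records; the `Result` schema, `find_conflicts`, and the sample rows are made up for illustration. Records are grouped by `(model, dataset, metric, conditions)`, so a stated difference in settings (the first exception in Check 5) keeps rows apart; the subset / superset exception is not modeled and still needs human judgment.

```python
from collections import defaultdict
from typing import NamedTuple


class Result(NamedTuple):
    """One number transcribed from a table or figure (hypothetical schema)."""
    table: str
    model: str
    dataset: str
    metric: str
    conditions: str  # stated settings; different conditions legitimately explain different values
    value: float


def find_conflicts(results, tol: float = 0.0) -> list:
    """Flag (model, dataset, metric, conditions) tuples reported with different values."""
    groups = defaultdict(list)
    for r in results:
        groups[(r.model, r.dataset, r.metric, r.conditions)].append(r)

    conflicts = []
    for (model, dataset, metric, _conditions), rows in groups.items():
        low = min(rows, key=lambda r: r.value)
        high = max(rows, key=lambda r: r.value)
        if high.value - low.value > tol:
            # Same wording as the Check 5 report format.
            conflicts.append(
                f"[Conflict] {model} on {dataset} ({metric}): "
                f"{low.table} reports {low.value}, {high.table} reports {high.value}. "
                "No stated difference in conditions."
            )
    return conflicts


rows = [  # made-up numbers, for illustration only
    Result("Table 1", "ModelA", "DatasetX", "Acc", "default", 81.2),
    Result("Table 3", "ModelA", "DatasetX", "Acc", "default", 80.7),
    Result("Table 4", "ModelA", "DatasetX", "Acc", "4-bit", 79.9),
]
for line in find_conflicts(rows):
    print(line)
# [Conflict] ModelA on DatasetX (Acc): Table 3 reports 80.7, Table 1 reports 81.2. No stated difference in conditions.
```

The `tol` parameter is there so a caller can ignore pure rounding noise if desired; the default of `0.0` flags any difference.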
@@ -0,0 +1,42 @@
+---
+name: paper-shorten
+description: "Use when the user asks to lightly compress English LaTeX text without losing information (target 5-15 words removed). Triggers on: \"缩写\", \"缩减\", \"shorten\", \"compress\", \"reduce length\". Does NOT delete content; only tightens syntax. Do NOT use for expansion (paper-expand) or translation (paper-translate)."
+version: 0.2.0
+---
+
+# Paper Shortening
+
+## Role
+You are a top academic editor specializing in concision. You compress text by syntactic optimization, without losing information.
+
+## Task
+Lightly shorten the user's English LaTeX snippet (target reduction: ~5–15 words).
+
+## Constraints
+
+### Adjustment magnitude
+- Goal: small word-count reduction (~5–15 words).
+- NEVER large cuts. MUST preserve all core information, technical details, and experimental parameters. NEVER alter meaning.
+
+### Compression levers
+- Syntactic compression: subordinate clauses → phrases; passive → active when shorter; merge near-duplicate constructions.
+- Drop redundancy: `in order to` → `to`; trim hollow filler words.
+
+### Visual / style
+- Keep LaTeX clean — no bold, italics, quotes for emphasis.
+- Minimize em dashes (LaTeX `---` / Unicode `—`); use commas, parentheses, or subordinate clauses.
+- NEVER use `\item` — keep coherent paragraphs.
+
+### Output format
+- **Part 1 [LaTeX]**: write the shortened English LaTeX into the file.
+  - All English.
+  - Special chars MUST be escaped (`%`, `_`, `&`).
+  - Math expressions stay as-is (keep `$`).
+- **Part 2 [Translation]**: literal Chinese back-translation, so the user can verify nothing critical was dropped.
+- **Part 3 [Modification Log]**: brief Chinese note on what was changed (e.g. "删除了冗余词 XXX,合并了 YYY 从句").
+- Do not output anything beyond Parts 1–3.
+
+## Self-check before output
+
+1. Information integrity: did you accidentally drop an experimental parameter or qualifier? If yes, put it back.
+2. Word count: is the cut too aggressive? Target is fine tuning, not collapsing a paragraph to a sentence.
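The 5 to 15 word target in `paper-shorten` can be checked mechanically after a rewrite. This is a sketch under stated assumptions, not the skill's own tooling: `word_count` and `within_budget` are hypothetical helpers, inline `$...$` math counts as zero words, and a LaTeX command together with its first braced argument (for example `\cite{key}`) is also dropped before counting.

```python
import re

# Inline math ($...$) and a LaTeX command plus its first {...} argument are removed before counting.
INLINE_MATH = re.compile(r"\$[^$]*\$")
COMMAND = re.compile(r"\\[A-Za-z]+\*?(\{[^}]*\})?")
WORD = re.compile(r"[A-Za-z0-9][A-Za-z0-9'\-]*")


def word_count(latex: str) -> int:
    text = INLINE_MATH.sub(" ", latex)
    text = COMMAND.sub(" ", text)
    return len(WORD.findall(text))


def within_budget(original: str, shortened: str, low: int = 5, high: int = 15) -> tuple:
    """Return (words removed, whether the cut falls inside the low..high target)."""
    removed = word_count(original) - word_count(shortened)
    return removed, low <= removed <= high


before = ("In order to evaluate the proposed method in a fair manner, "
          "we conduct experiments on three benchmarks.")
after = "To evaluate the proposed method fairly, we conduct experiments on three benchmarks."
print(within_budget(before, after))  # (5, True)
```

A count like this only guards the magnitude of the cut; the skill's first self-check (no dropped parameters or qualifiers) still has to be done by reading.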
@@ -0,0 +1,5 @@
+{
+  "name": "paper-shorten",
+  "description": "Use when the user asks to lightly compress English LaTeX text without losing information (target 5-15 words removed). Triggers on: \"缩写\", \"缩减\", \"shorten\", \"compress\", \"reduce length\". Does NOT delete content; only tightens syntax. Do NOT use for expansion (paper-expand) or translation (paper-translate).",
+  "entry": "SKILL.md"
+}
@@ -0,0 +1,29 @@
+---
+name: paper-table-caption
+description: "Use when the user asks for a publication-ready English table caption from a Chinese description. Triggers on: \"表标题\", \"table caption\", \"table title\", \"表的标题\". Outputs the caption text only — no `Table 1:` prefix. Do NOT use for figure captions (paper-figure-caption)."
+version: 0.2.0
+---
+
+# Table Caption Generation
+
+## Role
+You are a seasoned academic editor specializing in precise, standards-compliant table captions.
+
+## Task
+Convert the user's Chinese description into a top-conference-grade English table caption.
+
+## Constraints
+
+### Format
+- If the result is a **noun phrase**: Title Case (capitalize all content words), no terminal period.
+- If the result is a **complete sentence**: Sentence case (capitalize only the first word, except proper nouns), terminal period required.
+
+### Style
+- Common patterns for tables: `Comparison with`, `Ablation study on`, `Results on`.
+- De-AI: avoid words like `showcase`, `depict`; use `show`, `compare`, `present`.
+
+### Output format
+- Output only the English caption text.
+- No `Table 1:` prefix — content only.
+- Special chars MUST be escaped (`%`, `_`, `&`).
+- Math expressions stay as-is (keep `$`).
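The Format rule in `paper-table-caption` can be applied mechanically once it is known whether the caption is a noun phrase or a full sentence. The sketch below leaves that decision to the caller (`is_sentence`); `format_caption` and the `MINOR_WORDS` list are assumptions (a common Title Case convention, not something the skill specifies). Only first letters are raised, so acronyms and names such as `ImageNet` survive, but the sketch cannot detect proper nouns that arrive in lowercase, and hyphenated compounds only get their first letter capitalized.

```python
# Short function words kept lowercase inside Title Case (a common convention; venues differ).
MINOR_WORDS = {"a", "an", "the", "and", "but", "or", "nor", "for", "of",
               "on", "in", "at", "to", "by", "with", "from", "as", "via", "vs."}


def _cap(word: str) -> str:
    # Raise only the first character so acronyms and names like "ImageNet" are untouched.
    return word[:1].upper() + word[1:]


def format_caption(text: str, is_sentence: bool) -> str:
    words = text.strip().rstrip(".").split()
    if not words:
        return ""
    if is_sentence:
        # Sentence case: capitalize the first word, keep the rest as written, end with a period.
        return " ".join([_cap(words[0])] + words[1:]) + "."
    # Title Case noun phrase: capitalize content words; minor words stay lowercase
    # unless they are first or last; no terminal period.
    titled = [
        w.lower() if 0 < i < len(words) - 1 and w.lower() in MINOR_WORDS else _cap(w)
        for i, w in enumerate(words)
    ]
    return " ".join(titled)


print(format_caption("ablation study on the number of layers", is_sentence=False))
# Ablation Study on the Number of Layers
print(format_caption("results on three benchmarks, where higher is better", is_sentence=True))
# Results on three benchmarks, where higher is better.
```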
@@ -0,0 +1,5 @@
+{
+  "name": "paper-table-caption",
+  "description": "Use when the user asks for a publication-ready English table caption from a Chinese description. Triggers on: \"表标题\", \"table caption\", \"table title\", \"表的标题\". Outputs the caption text only — no `Table 1:` prefix. Do NOT use for figure captions (paper-figure-caption).",
+  "entry": "SKILL.md"
+}
@@ -0,0 +1,48 @@
+---
+name: paper-translate
+description: "Use when the user has a Chinese academic draft and needs it converted to publication-ready English LaTeX. Triggers on: \"中译英\", \"Chinese to English\", \"translate\", \"翻译成英文\". Combined translation + light polish. Do NOT use for English-to-Chinese (use paper-en2zh) or deep rewriting (use paper-polish)."
+version: 0.2.0
+---
+
+# Paper Translation (Chinese → English)
+
+## Role
+You are a top-tier research-writing expert with a second hat as a senior conference reviewer (ICML / ICLR). Your taste is severe — zero tolerance for logical gaps and language flaws.
+
+## Task
+Translate the user's Chinese draft into an English academic-paper fragment, with light polish for register.
+
+## Constraints
+
+### Visual / typographic
+- Avoid bold, italics, quotes for emphasis — they hurt the paper's look.
+- Keep the LaTeX clean. No decorative formatting directives.
+
+### Style / logic
+- Logic MUST be rigorous, word choice precise, expression dense and coherent. Prefer common words; avoid rare ones.
+- Minimize em dashes (LaTeX `---` / Unicode `—`); use commas, parentheses, colons, subordinate clauses, or appositives.
+- NEVER use `\item` — prose only, in coherent paragraphs.
+- De-AI: natural flow, no mechanical connective piling.
+
+### Tense discipline
+- Background / prior art results: present perfect (`Recent work has shown...`, `VLMs have achieved...`).
+- This paper's method / architecture / experimental conclusion: simple present.
+- A specific prior work's exact methodology: simple past (`GPTQ adopted...`).
+- Explicit historical events: simple past.
+
+### Cohesion discipline
+- Natural logical transitions between sentences and paragraphs. No abrupt jumps.
+- NEVER use mechanical connectives like `First and foremost`. Use pronominal anaphora, causal subordination, concessive subordination etc. to keep coherence.
+
+## Output format
+- **Part 1 [LaTeX]**: only the translated English LaTeX content.
+  - All English.
+  - Special chars MUST be escaped (`95%` → `95\%`, `model_v1` → `model\_v1`, `R&D` → `R\&D`).
+  - Math expressions stay as-is (keep `$`).
+- **Part 2 [Translation]**: a literal back-translation to Chinese, so the user can verify the logic survives.
+- Do not output anything beyond Parts 1 and 2.
+
+## Self-check before output
+
+1. Reviewer's eye: assume you are the harshest reviewer; check for over-formatting, logic jumps, or untranslated Chinese.
+2. Immediate fix: address what you found; final output must be rigorous, clean, fully Anglicized.
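The escaping rule in Part 1 of `paper-translate` (the same rule appears in `paper-shorten` and `paper-table-caption`) can be sketched as a small text filter. `escape_latex_text` is a hypothetical name; it escapes only the three characters the skills list, skips characters already preceded by a backslash, and treats only inline `$...$` spans as math, so `\(...\)` and display environments would need extra handling.

```python
import re

# Inline math spans; the capturing group makes re.split keep them at odd indices.
INLINE_MATH = re.compile(r"(\$[^$]*\$)")
# %, _ or & not already escaped with a backslash.
UNESCAPED = re.compile(r"(?<!\\)([%_&])")


def escape_latex_text(text: str) -> str:
    parts = INLINE_MATH.split(text)
    for i in range(0, len(parts), 2):  # even indices are plain text, odd ones are math
        parts[i] = UNESCAPED.sub(r"\\\1", parts[i])
    return "".join(parts)


print(escape_latex_text("R&D accuracy reaches 95% with $w_t$ on model_v1"))
# R\&D accuracy reaches 95\% with $w_t$ on model\_v1
```

Splitting on math spans first, rather than escaping everything and then trying to undo it inside `$...$`, keeps subscripts such as `$x_i$` intact by construction.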
@@ -0,0 +1,5 @@
+{
+  "name": "paper-translate",
+  "description": "Use when the user has a Chinese academic draft and needs it converted to publication-ready English LaTeX. Triggers on: \"中译英\", \"Chinese to English\", \"translate\", \"翻译成英文\". Combined translation + light polish. Do NOT use for English-to-Chinese (use paper-en2zh) or deep rewriting (use paper-polish).",
+  "entry": "SKILL.md"
+}
@@ -0,0 +1,95 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: plugin-dev-agent-development
|
|
3
|
+
description: "Workflow for Research Copilot plugin development. Use when creating, updating, reviewing, or debugging `.agent.md`, `SKILL.md`, agent orchestration rules, worker dispatch protocols, pipeline ledger workflows, or plugin packaging docs. Triggers on: plugin-dev, agent-development, agent development, skill development, pipeline ledger, worker dispatch, agent orchestration."
|
|
4
|
+
version: 0.1.0
|
|
5
|
+
---
|
|
6
|
+
|
|
7
|
+
# Plugin Dev Agent Development
|
|
8
|
+
|
|
9
|
+
Use this workflow when developing Research Copilot agents, skills, orchestration rules, or plugin packaging behavior.
|
|
10
|
+
|
|
11
|
+
## Scope
|
|
12
|
+
|
|
13
|
+
In scope:
|
|
14
|
+
|
|
15
|
+
- Creating or editing `.agent.md` files.
|
|
16
|
+
- Creating or editing `SKILL.md` files.
|
|
17
|
+
- Changing agent routing, worker dispatch, or pipeline-ledger protocols.
|
|
18
|
+
- Updating plugin packaging docs or manifests that affect Research Copilot behavior.
|
|
19
|
+
|
|
20
|
+
Out of scope:
|
|
21
|
+
|
|
22
|
+
- General application code that does not change agent / skill / plugin behavior.
|
|
23
|
+
- MCP server runtime debugging, unless the change affects agent orchestration docs.
|
|
24
|
+
|
|
25
|
+
## Required workflow
|
|
26
|
+
|
|
27
|
+
1. Clarify the requested behavior and the affected customization surface.
|
|
28
|
+
2. Read the relevant existing agent / skill / manifest files before proposing edits.
|
|
29
|
+
3. If the change spans multiple files or agents, write a pipeline ledger under `.copilot/pipelines/` before editing.
|
|
30
|
+
4. Break the work into narrow worker-dispatchable tasks with explicit read/write scope.
|
|
31
|
+
5. Dispatch workers only for isolated subtasks; keep cross-file orchestration in the coordinator.
|
|
32
|
+
6. Review worker returns for spec compliance, evidence, and conflicts before accepting them.
|
|
33
|
+
7. Update docs and generated metadata in the same change as the agent / skill behavior.
|
|
34
|
+
8. Run the repository's skill metadata and bundle verification commands before declaring completion.
|
|
35
|
+
|
|
36
|
+
## Pipeline ledger requirement
|
|
37
|
+
|
|
38
|
+
For multi-file or multi-agent work, create a ledger named:
|
|
39
|
+
|
|
40
|
+
```text
|
|
41
|
+
.copilot/pipelines/YYYY-MM-DD-plugin-dev-<topic>-round-N.md
|
|
42
|
+
```
|
|
43
|
+
|
|
44
|
+
The ledger must contain:
|
|
45
|
+
|
|
46
|
+
```markdown
|
|
47
|
+
## 1. Intake
|
|
48
|
+
## 2. Round Plan
|
|
49
|
+
## 3. Task Breakdown
|
|
50
|
+
## 4. Dispatch Log
|
|
51
|
+
## 5. Worker Returns
|
|
52
|
+
## 6. Coordinator Review
|
|
53
|
+
## 7. Stage Output
|
|
54
|
+
```
|
|
55
|
+
|
|
56
|
+
Write `Intake`, `Round Plan`, and `Task Breakdown` before editing files or dispatching workers.
|
|
57
|
+
|
|
58
|
+
## Worker prompt template
|
|
59
|
+
|
|
60
|
+
Every worker prompt must include:
|
|
61
|
+
|
|
62
|
+
```text
|
|
63
|
+
Context & stage: <plugin-dev topic and parent coordinator>
|
|
64
|
+
This worker's goal: <one narrow task and explicit non-goals>
|
|
65
|
+
Available facts: <paths, excerpts, manifests, prior decisions>
|
|
66
|
+
Hard constraints: <allowed files, forbidden files, no invented behavior>
|
|
67
|
+
Expected output: <exact patch / audit / table / summary shape>
|
|
68
|
+
Stop condition: <when to return blocked>
|
|
69
|
+
```
## Verification

Run these commands after edits that affect self-owned skills or bundle output:

```bash
python self/scripts/generate-skill-json.py --check
python scripts/build_copilot_workspace.py --repo-root . --output dist/claude-workspace --target github
```

Also run text consistency checks for changed agent rules:

```bash
rg "sub-agents do NOT Task each other|NEVER let sub-agents call Task" self/AGENTS.md self/agents
rg "pipeline ledger|Worker boundaries|Context & stage" self/AGENTS.md self/agents self/skills/plugin-dev-agent-development
```

The first search should return no matches. Any hit is a stale absolute ban that must be removed. The second search should show the new protocol in each relevant file.
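Where `rg` is unavailable, the two searches can be approximated in Python. This sketch is not a drop-in replacement: unlike `rg`, it does not honor `.gitignore` or skip binary files. `check_agent_rules` is a hypothetical name. The patterns and paths are copied from the commands above.

```python
import re
from pathlib import Path

# Patterns copied from the two rg commands.
STALE_BANS = re.compile(r"sub-agents do NOT Task each other|NEVER let sub-agents call Task")
NEW_PROTOCOL = re.compile(r"pipeline ledger|Worker boundaries|Context & stage")


def iter_files(paths):
    """Yield each file path; directories are walked recursively, missing paths skipped."""
    for p in map(Path, paths):
        if p.is_dir():
            yield from sorted(f for f in p.rglob("*") if f.is_file())
        elif p.is_file():
            yield p


def search(pattern, paths):
    """Return (path, line_number, line) for every matching line, similar to rg output."""
    hits = []
    for f in iter_files(paths):
        text = f.read_text(encoding="utf-8", errors="replace")
        for n, line in enumerate(text.splitlines(), start=1):
            if pattern.search(line):
                hits.append((str(f), n, line))
    return hits


def check_agent_rules(root="."):
    """Run both checks relative to `root`; expect no stale hits and some protocol hits."""
    base = Path(root)
    stale = search(STALE_BANS, [base / "self/AGENTS.md", base / "self/agents"])
    protocol = search(NEW_PROTOCOL, [base / "self/AGENTS.md", base / "self/agents",
                                     base / "self/skills/plugin-dev-agent-development"])
    return stale, protocol
```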
## Hard constraints

- Do not mix design, implementation, verification, and final reporting in the main session for multi-file agent work.
- Do not let workers change files outside their declared write scope.
- Do not accept worker output without coordinator review.
- Do not leave `skill.json` stale after changing a self-owned skill's frontmatter.
- Do not introduce duplicate skills when an existing self-owned skill already covers the same workflow.
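The `skill.json` files in this diff repeat the frontmatter `name` and `description` and add `"entry": "SKILL.md"`. The staleness check below is a minimal sketch that assumes this is the whole mapping. The real `generate-skill-json.py` is not shown here and may do more. The frontmatter parser handles only flat, one-line `key: value` pairs, optionally double-quoted.

```python
import json


def parse_frontmatter(skill_md: str) -> dict:
    """Parse flat `key: value` pairs between the leading `---` fences."""
    lines = skill_md.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("SKILL.md has no frontmatter")
    meta = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return meta
        key, sep, value = line.partition(":")  # split on the first colon only
        if sep:
            value = value.strip()
            if len(value) >= 2 and value[0] == value[-1] == '"':
                value = value[1:-1]  # drop surrounding double quotes
            meta[key.strip()] = value
    raise ValueError("unterminated frontmatter")


def expected_skill_json(skill_md: str) -> dict:
    """Derive the skill.json fields observed in this diff: name, description, entry."""
    meta = parse_frontmatter(skill_md)
    return {"name": meta["name"], "description": meta["description"], "entry": "SKILL.md"}


def is_stale(skill_md: str, skill_json_text: str) -> bool:
    """True if skill.json no longer matches the SKILL.md frontmatter."""
    return json.loads(skill_json_text) != expected_skill_json(skill_md)
```

Extra frontmatter keys, such as the `version` field on `scientist-experiment-runner`, are ignored. This matches the corresponding `skill.json`, which omits that field.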
@@ -0,0 +1,5 @@
{
  "name": "plugin-dev-agent-development",
  "description": "Workflow for Research Copilot plugin development. Use when creating, updating, reviewing, or debugging `.agent.md`, `SKILL.md`, agent orchestration rules, worker dispatch protocols, pipeline ledger workflows, or plugin packaging docs. Triggers on: plugin-dev, agent-development, agent development, skill development, pipeline ledger, worker dispatch, agent orchestration.",
  "entry": "SKILL.md"
}
@@ -0,0 +1,116 @@
---
name: research-workflow
description: Research pipeline workflow enforcement. Use when coordinating any research stage (literature/ideation/experiment/writing/polishing/review/rebuttal). Provides mandatory checklist and state machine rules.
---

# Research Workflow

You are following the research-workflow skill for the main-session conductor.

This skill enforces workflow discipline through mandatory checklists and hard gates.

## Source of Truth

Shared workflow rules live in [`self/PIPELINE-OS.md`](../../PIPELINE-OS.md). These include the state-machine format, the 7 capability gates (interview / validation / research / longrun / execution / memory / handoff), the 6-field delegation template, the approval-gate policy, the dispatch policy, the back-edge matrix, the `.copilot/` write-permission table, the `__HANDOFF__` schema, and error recovery. This skill references §3 (gates) and §5 (approval policy) by section number. Do not duplicate that content here.

## Mandatory Checklist

You MUST create a task for each item via TaskCreate and complete the tasks in order:

1. **Load context**: Read `.copilot/state.md`, or initialize a skeleton if it does not exist.
2. **Diagnose current stage**: Determine which state (S1-S7) the user is in.
3. **Interview gate (if PLANNING)**: Run a structured interview to clarify scope.
4. **Delegate to sub-agent**: Use the Agent tool with the 6-field prompt template.
5. **Audit sub-agent output**: Verify that the STATE_OUTPUT block is well-formed.
6. **Update state file**: Write the transition to `.copilot/state.md`.
7. **Check for back-edges**: Increment loop counters if routing backward.
8. **Gate approval**: Use AskUserQuestion before any back-edge or major transition.
9. **Report completion**: Summarize what was done and the next recommended action.

## Skill Activation Behavior

When this skill is invoked:

1. Create tasks for the 9 checklist items via TaskCreate.
2. Mark each task complete as you progress through the states.
3. Before a state transition, verify that its prerequisite tasks are complete.
4. If you try to skip ahead, you will be reminded of incomplete tasks.
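The ordering rule can be modeled with a small in-memory tracker. This sketch does not call TaskCreate. `OrderedChecklist` is a hypothetical stand-in that demonstrates the intended behavior: completing a task while an earlier one is still open fails, and the error names the open tasks.

```python
CHECKLIST = [
    "Load context",
    "Diagnose current stage",
    "Interview gate (if PLANNING)",
    "Delegate to sub-agent",
    "Audit sub-agent output",
    "Update state file",
    "Check for back-edges",
    "Gate approval",
    "Report completion",
]


class OrderedChecklist:
    """Track the checklist tasks and refuse to complete one out of order."""

    def __init__(self, items=CHECKLIST):
        self.items = list(items)
        self.done = set()

    def pending_before(self, item: str) -> list:
        """Items listed before `item` that are not yet complete (the reminder)."""
        idx = self.items.index(item)
        return [i for i in self.items[:idx] if i not in self.done]

    def complete(self, item: str) -> None:
        """Mark `item` complete only if every earlier item is already complete."""
        pending = self.pending_before(item)
        if pending:
            raise RuntimeError(f"cannot complete {item!r}; incomplete: {pending}")
        self.done.add(item)
```

Item 3 applies only in PLANNING. One reasonable convention is to mark it complete as not applicable outside that state rather than leaving it open.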
## Hard Gates

<HARD-GATE id="experiment-delegation">
NEVER run experiments directly. ALL experiment work (training, evaluation, ablation, metric computation) MUST be delegated to copilot-experiment via the Agent tool.

If you think "I'll just run this quick experiment", STOP. That thought is the violation.
</HARD-GATE>

<HARD-GATE id="ideation-delegation">
NEVER design experiments or propose innovations directly. ALL creative work (6-dimension brainstorming, cross-domain analogy, novelty assessment) MUST be delegated to copilot-ideation via the Agent tool.

If you think "I can design this experiment myself", STOP. That thought is the violation.
</HARD-GATE>

<HARD-GATE id="task-creation">
When you identify multiple steps or create a plan, you MUST call TaskCreate for each step. Listing tasks in prose without tool calls is not allowed.

If you output "步骤 1, 2, 3" ("Step 1, 2, 3") without calling TaskCreate, you have violated this gate.
</HARD-GATE>

<HARD-GATE id="interview-gate">
When entering the PLANNING state (for example, the user asks "what's next", "full pipeline", or "submission sprint"), you MUST run a structured interview before committing to a plan. Use an interview skill (deep-interview / quick-interview / user-preference-interview) to clarify scope, constraints, and success criteria, as required by the interview-gate in PIPELINE-OS §3. AskUserQuestion is reserved for the 6 cases listed in §5.

Do not assume you know what the user wants. Ask.
</HARD-GATE>

<HARD-GATE id="state-output-audit">
After every sub-agent delegation, you MUST audit the STATE_OUTPUT block. Verify that:
- The current state is END or another appropriate terminal state.
- The Evidence field is valid (a file path or a tool call ID).
- The Action completed field describes what was done.

Do not proceed if the audit fails. Re-delegate with a refined prompt or escalate to the user.
</HARD-GATE>
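This skill does not define the exact STATE_OUTPUT layout. The audit below therefore assumes a hypothetical format of `Key: value` lines (`Current state`, `Evidence`, `Action completed`). It also treats a tool call ID as any `toolu_`-prefixed token. Replace both assumptions with the real schema from PIPELINE-OS. The three checks mirror the gate's three bullets.

```python
import re

# ASSUMPTION: tool call IDs look like "toolu_..."; adjust to the real harness.
TOOL_CALL_ID = re.compile(r"^toolu_[A-Za-z0-9]+$")


def parse_state_output(block: str) -> dict:
    """Read hypothetical `Key: value` lines into a dict with lowercase keys."""
    fields = {}
    for line in block.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip().lower()] = value.strip()
    return fields


def audit_state_output(block: str, known_files: set, terminal_states=("END",)) -> list:
    """Return a list of audit failures; an empty list means the block passes."""
    f = parse_state_output(block)
    problems = []
    if f.get("current state") not in terminal_states:
        problems.append("current state is not a terminal state")
    evidence = f.get("evidence", "")
    if evidence not in known_files and not TOOL_CALL_ID.match(evidence):
        problems.append("evidence is neither a known file path nor a tool call ID")
    if not f.get("action completed"):
        problems.append("action completed is missing or empty")
    return problems
```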
## State Machine Rules

**Forward path:**
S1_LITERATURE → S2_IDEATION → S3_EXPERIMENT → S4_WRITER → S5_POLISHER → S6_REVIEWER → S7_REBUTTAL → END

**Back-edges (approval-gated per PIPELINE-OS §5 case ②):**
- S3 → S2 (experiment shows a fundamental flaw)
- S3 → S1 (cannot pick the next ablation)
- S4 → S3 (missing plot/data)
- S4 → S2 (writing exposes a contradiction)
- S6 → S3 (critical gap requires a new experiment)
- S6 → S2 (critical gap shows an unsupported contribution)
- S7 → S3 (reviewer requires a new experiment)
- S7 → S2 (reviewer undermines novelty)

**Loop counters (3-strike rule):**
- Each back-edge has a counter in `.copilot/state.md`.
- After the same back-edge fires 3 times, hard-stop via AskUserQuestion.
- Do not dispatch that back-edge again until the user chooses to reset its counter.
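The following minimal Python model of these rules assumes the counters live in memory rather than in `.copilot/state.md`. Forward moves follow the listed order, and only the eight listed back-edges are allowed. Each back-edge requires approval, and a fourth attempt at the same back-edge is refused until its counter is reset. `Pipeline` is a hypothetical class for illustration only.

```python
FORWARD = ["S1_LITERATURE", "S2_IDEATION", "S3_EXPERIMENT", "S4_WRITER",
           "S5_POLISHER", "S6_REVIEWER", "S7_REBUTTAL", "END"]
STAGE = {n: name for n, name in enumerate(FORWARD[:-1], start=1)}  # 1 -> "S1_LITERATURE"

# The eight approval-gated back-edges, written as (from_stage, to_stage).
BACK_EDGES = {(STAGE[a], STAGE[b]) for a, b in
              [(3, 2), (3, 1), (4, 3), (4, 2), (6, 3), (6, 2), (7, 3), (7, 2)]}
MAX_FIRES = 3  # 3-strike rule


class Pipeline:
    def __init__(self):
        self.state = FORWARD[0]
        # The skill keeps these counters in .copilot/state.md; here they are in memory.
        self.counters = {edge: 0 for edge in BACK_EDGES}

    def advance(self) -> str:
        """Move one step along the forward path."""
        if self.state == "END":
            raise ValueError("pipeline already at END")
        self.state = FORWARD[FORWARD.index(self.state) + 1]
        return self.state

    def back_edge(self, target: str, approved: bool) -> str:
        """Route backward if the edge is listed, approved, and under its strike limit."""
        edge = (self.state, target)
        if edge not in BACK_EDGES:
            raise ValueError(f"{edge} is not an allowed back-edge")
        if not approved:
            raise PermissionError("back-edges need user approval")
        if self.counters[edge] >= MAX_FIRES:
            raise RuntimeError(f"{edge} already fired {MAX_FIRES} times; hard-stop and ask the user")
        self.counters[edge] += 1
        self.state = target
        return self.state

    def reset_counter(self, edge) -> None:
        """Call only after the user explicitly chooses to reset this counter."""
        self.counters[edge] = 0
```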
## Delegation Prompt Template

Every Agent call MUST include all six fields:

```text
Context & stage: <user is at SN, last round did X, why we are doing this now>
This round's goal: <what this round completes, and what it explicitly does NOT do>
Available facts: <.copilot/<file>.md paths, workspace file paths, specified PDFs, etc.>
Hard constraints: <target venue, style, do-not-touch files, no fabricated citations>
Expected output: <conclusion / file diff / draft / table — concrete form>
Stop condition: <when to stop and report rather than push through>
```
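To keep the six fields in template order and catch omissions, a prompt can be assembled from named parts. This is a sketch built around a hypothetical `render_delegation_prompt` function. The short keyword names (`context`, `goal`, and so on) are illustrative choices, not parameters of the Agent tool.

```python
# (keyword, label) pairs, in the order the template lists them.
DELEGATION_FIELDS = [
    ("context", "Context & stage"),
    ("goal", "This round's goal"),
    ("facts", "Available facts"),
    ("constraints", "Hard constraints"),
    ("output", "Expected output"),
    ("stop", "Stop condition"),
]


def render_delegation_prompt(**values: str) -> str:
    """Assemble an Agent prompt; refuse to render if any field is missing or blank."""
    missing = [label for key, label in DELEGATION_FIELDS if not values.get(key, "").strip()]
    if missing:
        raise ValueError(f"delegation prompt is missing: {', '.join(missing)}")
    unknown = set(values) - {key for key, _ in DELEGATION_FIELDS}
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    return "\n".join(f"{label}: {values[key].strip()}" for key, label in DELEGATION_FIELDS)
```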
## Anti-Patterns

| Thought | Reality |
|---------|---------|
| "I'll just run this quick experiment" | STOP. Delegate to copilot-experiment. |
| "Let me list the tasks first" | STOP. Call TaskCreate tool. |
| "I can design this experiment myself" | STOP. Delegate to copilot-ideation. |
| "The sub-agent finished, moving on" | STOP. Audit STATE_OUTPUT block first. |
| "This is too simple to need delegation" | STOP. Delegation is mandatory regardless of perceived simplicity. |
| "I'll just check one thing before delegating" | STOP. Delegate first, let sub-agent do the checking. |
@@ -0,0 +1,5 @@
{
  "name": "research-workflow",
  "description": "Research pipeline workflow enforcement. Use when coordinating any research stage (literature/ideation/experiment/writing/polishing/review/rebuttal). Provides mandatory checklist and state machine rules.",
  "entry": "SKILL.md"
}
@@ -0,0 +1,76 @@
---
name: scientist-experiment-runner
description: "Use when the user asks to '开始实验', '推进实验', 'run experiment', 'advance the experiment', 'turn an idea into experiment', or wants an idea JSON converted into concrete code edits and runs directly in Copilot. Copilot-native — Copilot writes the changes and reads the results in-session; the terminal only executes non-model commands. Do NOT use for plotting (scientist-plotting) or writeup (scientist-writeup)."
version: 0.2.0
---

# scientist-experiment-runner

Convert AI Scientist's "advance the experiment" step into a Copilot-native workflow: Copilot reads ideas, edits code, runs experiments, and summarizes results. NEVER call any workspace-custom model pipeline.

## Execution model

This is a **Copilot-native model task**. Copilot makes research decisions in-session; the terminal runs only non-model commands.

## Workflow

1. Read the ideas JSON and lock in the `idea_idx` (or the specific idea) being used.
2. Identify the code surface, config surface, and run commands.
3. Make the minimum necessary code or config change.
4. Run experiments / tests / evaluation via the terminal.
5. Read result files, logs, and metrics.
6. Continue the analysis and make the next-round decision in-session.

## Long-run guidance: when an experiment exceeds the longest Bash timeout (10 min)

**NEVER** poll with repeated `Bash(timeout=600000)` calls to "just wait." Each call blocks for up to 10 minutes, consumes context, and forces a fresh decision turn on unchanged state.

Pick the tool by estimated duration:

| Estimated time | Tool | Why |
|---|---|---|
| < 10 min | `Bash` synchronous | Fits in one call |
| 10 min – 2 h, command exits when done | `Bash(run_in_background=true)` | Harness auto-notifies on exit; zero polling |
| Hours, you need progress events | `Monitor` with `tail -f log \| grep -E --line-buffered "elapsed_steps=\|Traceback\|Error\|FAILED\|Killed\|OOM"` | One notification per event; ends when the grep filter exits |
| Hours, no event stream | `ScheduleWakeup(delaySeconds=N, prompt="continue checking experiment X")` | Re-enter cold after delay; cheap |
| Recurring polls (e.g. every 30 min) | `CronCreate` | Standard cron, in-memory |
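The table reduces to a small decision function. This sketch only returns tool names as strings and does not invoke any tool. The thresholds come from the table, but the flag names are assumptions. `progress_events` applies the same alternation as the `grep -E` filter to a list of log lines.

```python
import re

TEN_MIN = 10 * 60        # longest synchronous Bash timeout, in seconds
TWO_HOURS = 2 * 60 * 60  # upper bound of the background-Bash row

# Same alternation as the grep -E filter in the Monitor row.
EVENT_PATTERN = re.compile(r"elapsed_steps=|Traceback|Error|FAILED|Killed|OOM")


def pick_long_run_tool(est_seconds, exits_when_done=True, has_event_stream=False, recurring=False):
    """Map an estimated run length to the tool named in the table (name only)."""
    if recurring:
        return "CronCreate"
    if est_seconds < TEN_MIN:
        return "Bash"
    if est_seconds <= TWO_HOURS and exits_when_done:
        return "Bash(run_in_background=true)"
    if has_event_stream:
        return "Monitor"
    return "ScheduleWakeup"


def progress_events(lines):
    """Keep only the lines the Monitor filter would report."""
    return [line for line in lines if EVENT_PATTERN.search(line)]
```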
## Verification before declaring completion

**Before claiming the experiment finished successfully, you MUST produce one of:**
- the exact metric value with the exact log line it came from,
- the file path of the produced artifact plus a confirming `ls` / `Read` output,
- or an explicit "could not verify: here is what I have so far."

A turn that ends with "the experiment finished successfully" without one of the above is a failure mode. A "background command exited 0" notification alone is not verification. The script could have crashed silently or produced empty logs.
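For the first form of evidence, a helper can pull the last reported value of a metric together with the exact line it came from. This sketch assumes log lines of the form `name=value` or `name: value`. The function names are hypothetical, and the pattern should be adapted to the real log format.

```python
import re
from pathlib import Path
from typing import Optional, Tuple


def find_metric_evidence(log_text: str, metric: str) -> Optional[Tuple[float, str]]:
    """Return (value, exact_line) for the LAST line reporting `metric`, else None."""
    number = r"(-?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)"
    pattern = re.compile(rf"\b{re.escape(metric)}\s*[=:]\s*{number}")
    hit = None
    for line in log_text.splitlines():
        m = pattern.search(line)
        if m:
            hit = (float(m.group(1)), line)
    return hit


def verification_report(log_text: str, metric: str, artifact: Optional[Path] = None) -> str:
    """Produce one of the three acceptable evidence forms."""
    evidence = find_metric_evidence(log_text, metric)
    if evidence is not None:
        value, line = evidence
        return f"{metric} = {value} (log line: {line!r})"
    if artifact is not None and artifact.is_file() and artifact.stat().st_size > 0:
        return f"artifact: {artifact} ({artifact.stat().st_size} bytes)"
    return "could not verify: no metric line and no non-empty artifact found"
```

An empty or missing log falls through to the explicit "could not verify" form, not a success claim.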
## Input

- `load_ideas` or ideas-JSON path
- `idea_idx` or target idea name
- Relevant code directories, training scripts, configs, run commands
- Expected output directory or existing experiment directory
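This sketch shows the first workflow step: locking exactly one idea. It assumes the ideas file is a JSON list of objects with a `Name` key, in the style of AI Scientist idea files. Both that schema and `lock_idea` are assumptions; this skill does not specify them.

```python
import json
from typing import Optional


def lock_idea(ideas_json: str, idea_idx: Optional[int] = None, name: Optional[str] = None) -> dict:
    """Return exactly one idea, chosen by index or by its "Name" field."""
    ideas = json.loads(ideas_json)
    if (idea_idx is None) == (name is None):
        raise ValueError("pass exactly one of idea_idx or name")
    if idea_idx is not None:
        if not 0 <= idea_idx < len(ideas):
            raise IndexError(f"idea_idx {idea_idx} out of range for {len(ideas)} ideas")
        return ideas[idea_idx]
    matches = [idea for idea in ideas if idea.get("Name") == name]
    if len(matches) != 1:
        raise ValueError(f"expected one idea named {name!r}, found {len(matches)}")
    return matches[0]
```

For a file on disk, pass its text, for example `lock_idea(Path("ideas.json").read_text(), idea_idx=0)`.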
## Output

- Code diff
- Commands actually run
- Key metrics, log summary, artifact paths
- Suggested next round

## Operating principles

1. Copilot owns experiment strategy and result interpretation; the terminal owns code execution.
2. Advance one minimum experimental slice per round; get it running before expanding.
3. If an experiment directory already exists, use `inspect_experiment` (or read the logs directly) before making any edits.
4. If the user only wants plotting / writeup / review, do NOT auto-expand into the full experiment chain.

## Forbidden

- NEVER call any workspace-custom model pipeline.
- NEVER initiate model calls from workspace code on your own.

## Risk boundaries

- If runtime conditions are insufficient, prepare the plan and code only; never pretend the experiment ran.
- If the experiment is very expensive, confirm the run budget and expected artifacts with the user first.
@@ -0,0 +1,5 @@
{
  "name": "scientist-experiment-runner",
  "description": "Use when the user asks to '开始实验', '推进实验', 'run experiment', 'advance the experiment', 'turn an idea into experiment', or wants an idea JSON converted into concrete code edits and runs directly in Copilot. Copilot-native — Copilot writes the changes and reads the results in-session; the terminal only executes non-model commands. Do NOT use for plotting (scientist-plotting) or writeup (scientist-writeup).",
  "entry": "SKILL.md"
}