scientify 3.0.0 → 3.2.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.en.md +24 -6
- package/README.md +31 -6
- package/dist/index.js +2 -2
- package/dist/index.js.map +1 -1
- package/dist/src/cli/research.d.ts.map +1 -1
- package/dist/src/cli/research.js +42 -1
- package/dist/src/cli/research.js.map +1 -1
- package/dist/src/commands.d.ts.map +1 -1
- package/dist/src/commands.js +159 -1
- package/dist/src/commands.js.map +1 -1
- package/dist/src/release-gate.d.ts +14 -0
- package/dist/src/release-gate.d.ts.map +1 -0
- package/dist/src/release-gate.js +124 -0
- package/dist/src/release-gate.js.map +1 -0
- package/dist/src/templates/bootstrap.d.ts.map +1 -1
- package/dist/src/templates/bootstrap.js +139 -62
- package/dist/src/templates/bootstrap.js.map +1 -1
- package/openclaw.plugin.json +9 -2
- package/package.json +10 -2
- package/skills/algorithm-selection/SKILL.md +103 -0
- package/skills/algorithm-selection/references/candidate-template.md +13 -0
- package/skills/algorithm-selection/references/selection-template.md +39 -0
- package/skills/artifact-review/SKILL.md +146 -0
- package/skills/artifact-review/references/release-gate-template.md +40 -0
- package/skills/artifact-review/references/review-checklist.md +45 -0
- package/skills/artifact-review/references/style-review-checklist.md +30 -0
- package/skills/baseline-runner/SKILL.md +103 -0
- package/skills/baseline-runner/references/baseline-matrix-template.md +9 -0
- package/skills/baseline-runner/references/baseline-report-template.md +25 -0
- package/skills/dataset-validate/SKILL.md +104 -0
- package/skills/dataset-validate/references/data-validation-template.md +38 -0
- package/skills/figure-standardize/SKILL.md +110 -0
- package/skills/figure-standardize/references/caption-template.md +12 -0
- package/skills/figure-standardize/references/figure-placement-template.md +30 -0
- package/skills/figure-standardize/references/figure-style-guide.md +36 -0
- package/skills/release-layout/SKILL.md +73 -0
- package/skills/release-layout/references/page-structure.md +14 -0
- package/skills/research-experiment/SKILL.md +10 -1
- package/skills/research-survey/SKILL.md +19 -2
- package/skills/write-paper/SKILL.md +252 -0
- package/skills/write-paper/references/boundary-notes-template.md +34 -0
- package/skills/write-paper/references/claim-inventory-template.md +32 -0
- package/skills/write-paper/references/evidence-contract.md +57 -0
- package/skills/write-paper/references/figure-callout-template.md +38 -0
- package/skills/write-paper/references/figures-manifest-template.md +44 -0
- package/skills/write-paper/references/latex/README.md +22 -0
- package/skills/write-paper/references/latex/build_paper.sh +41 -0
- package/skills/write-paper/references/latex/manuscript.tex +39 -0
- package/skills/write-paper/references/latex/references.bib +10 -0
- package/skills/write-paper/references/latex/sections/ablations.tex +3 -0
- package/skills/write-paper/references/latex/sections/abstract.tex +3 -0
- package/skills/write-paper/references/latex/sections/conclusion.tex +3 -0
- package/skills/write-paper/references/latex/sections/discussion_scope.tex +7 -0
- package/skills/write-paper/references/latex/sections/experimental_protocol.tex +3 -0
- package/skills/write-paper/references/latex/sections/introduction.tex +3 -0
- package/skills/write-paper/references/latex/sections/main_results.tex +9 -0
- package/skills/write-paper/references/latex/sections/method_system.tex +3 -0
- package/skills/write-paper/references/latex/sections/problem_setup.tex +3 -0
- package/skills/write-paper/references/latex/sections/related_work.tex +3 -0
- package/skills/write-paper/references/paper-template.md +155 -0
- package/skills/write-paper/references/paragraph-contract.md +139 -0
- package/skills/write-paper/references/paragraph-examples.md +171 -0
- package/skills/write-paper/references/style-banlist.md +81 -0
- package/skills/write-review-paper/SKILL.md +10 -4
package/skills/artifact-review/SKILL.md
@@ -0,0 +1,146 @@
---
name: artifact-review
description: "Use this when the user wants a draft paper, figure bundle, README, release page, or experiment artifact reviewed before sharing. Checks evidence binding, claim scope, captions, layout clarity, and release readiness."
metadata:
  {
    "openclaw": {
      "emoji": "🧾"
    }
  }
---

# Artifact Review

**Don't ask permission. Just do it.**

This is a release-readiness review skill. It does **not** invent new claims or run new experiments. It checks whether the current artifacts are safe to share.

## Required Outputs

- `review/artifact_review.md`
- `review/release_checklist.md`
- `review/release_gate.json`

## Review Scope

Use this for any mix of:

- `paper/draft.md`
- `paper/figures_manifest.md`
- `review/draft.md`
- `experiment_res.md`
- figure bundles
- `README.md`
- `docs/index.html`

Review the artifact set in one or more of these modes:

- `paper review`
  - checks claim scope, evidence binding, baseline wording, and abstract/results discipline
- `figure review`
  - checks units, legends, captions, readability, and evidence labels
- `release page review`
  - checks first-screen clarity, artifact entry points, and scope-boundary wording
- `style review`
  - checks paragraph discipline, quantitative grounding, adjective inflation, and result-vs-interpretation separation

## Workflow

### Step 1: Inventory the Artifact Set

List the files being reviewed, the headline claims they appear to make, the source artifact path for each headline claim when available, which figures or tables support them, and which review mode applies to each file (`paper review`, `figure review`, `release page review`, or `style review`).

### Step 2: Review Findings First

Write `artifact_review.md` as a findings-first review using severity levels:

- `P0` = unsafe to publish as-is
- `P1` = materially weakens the claim or readability
- `P2` = polish or consistency issue

Each finding must include:

- the problem
- the affected file(s)
- the `evidence_path` (`N/A` if the issue is structural rather than evidence-bound)
- the `affected_claim_id` (`N/A` if the issue is not tied to a specific claim)
- why it matters
- the concrete fix

Also write a top-level line:

```text
release_verdict: HOLD | CONDITIONAL_GO | GO
```

Use these verdict rules:

- `HOLD`
  - any `P0` finding exists
  - a headline metric has no baseline, no protocol/guardrail, or no source artifact
  - simulator/proxy evidence is written as runtime evidence
- `CONDITIONAL_GO`
  - no `P0` findings exist, but one or more unresolved `P1` findings remain
- `GO`
  - no `P0` findings remain
  - no unresolved `P1` finding weakens a headline claim
  - every headline claim can be traced to a concrete source artifact

### Step 3: Check Release Readiness

Write `release_checklist.md` using the checklist in `references/review-checklist.md`.

If `style review` applies, also use `references/style-review-checklist.md`.
Then write `review/release_gate.json` using `references/release-gate-template.md`.

If a paper-facing figure set exists, explicitly check the figure-text contract across:

- `paper/claim_inventory.md`
- `paper/figures_manifest.md`
- the first prose callout
- the figure caption
- the LaTeX or Markdown figure block

`release_gate.json` should include:

- `release_verdict`
- `generated_at`
- `review_scope`
- `blocking_findings`
- `p1_findings`
- `checked_files`
- `stale_if_any_newer_than`

Use `stale_if_any_newer_than` to list the release-facing artifacts that would invalidate the current gate if they change later, for example:

- `paper/draft.md`
- `paper/claim_inventory.md`
- `paper/figures_manifest.md`
- `README.md`
- `docs/index.html`

## Required Checks

1. Every headline metric has a baseline, protocol/guardrail, and source artifact.
2. Simulator/proxy evidence is not written as runtime evidence.
3. Figures have readable titles, units, legends, and captions.
4. The first screen of README/docs answers:
   - what this is
   - how to use it
   - what artifacts exist
   - what the scope boundary is
5. Unsupported claims are downgraded or explicitly marked as open.
6. Results paragraphs are quantitative and baseline-anchored.
7. Conclusion and abstract do not introduce claims that exceed the allowed confidence or section scope.
8. Every paper-facing figure has `supports_claim_ids`, a usable `callout_sentence`, and a caption that names baseline, metric, evidence type, and protocol when relevant.
9. Every headline claim has a matching figure, table, or explicit text-only justification.
10. Figures are introduced before or adjacent to the claims they are supposed to support.
11. `review/release_gate.json` matches the current verdict and names the files that would make the gate stale if changed later.

## Safety Rules

1. If the evidence trail is broken, flag it. Do not repair it with guesswork.
2. Prefer short, specific findings over generic writing advice.
3. Review the artifact that exists, not the artifact you wish existed.
4. If a sentence sounds impressive but is not measurable, downgrade it or flag it.
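The verdict rules above are mechanical enough to sketch in code. A minimal sketch, assuming each finding is a dict with a `severity` field and an optional `resolved` flag, plus a separate traceability flag for headline claims (these names are illustrative, not part of the skill contract):

```python
def release_verdict(findings, headline_claims_traceable=True):
    """Apply the HOLD / CONDITIONAL_GO / GO rules to a findings list.

    findings: dicts like {"severity": "P0"|"P1"|"P2", "resolved": bool}.
    """
    # Only unresolved findings count against the gate.
    open_severities = [f["severity"] for f in findings if not f.get("resolved")]
    # HOLD: any open P0, or a headline claim without a source artifact trail.
    if "P0" in open_severities or not headline_claims_traceable:
        return "HOLD"
    # CONDITIONAL_GO: no P0, but unresolved P1 findings remain.
    if "P1" in open_severities:
        return "CONDITIONAL_GO"
    # GO: nothing blocking remains (open P2 polish items are allowed).
    return "GO"
```

This mirrors the prose rules; the real review still has to decide severity and traceability by hand.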
package/skills/artifact-review/references/release-gate-template.md
@@ -0,0 +1,40 @@
# Release Gate Template

Write `review/release_gate.json` using this shape:

```json
{
  "release_verdict": "CONDITIONAL_GO",
  "generated_at": "2026-04-02T12:00:00Z",
  "review_scope": ["paper", "figure", "release_page", "style"],
  "blocking_findings": 0,
  "p1_findings": 2,
  "checked_files": [
    "paper/draft.md",
    "paper/figures_manifest.md",
    "README.md",
    "docs/index.html"
  ],
  "stale_if_any_newer_than": [
    "paper/draft.md",
    "paper/claim_inventory.md",
    "paper/figures_manifest.md",
    "README.md",
    "docs/index.html"
  ]
}
```

Rules:

- `release_verdict` must be one of `HOLD`, `CONDITIONAL_GO`, or `GO`.
- `generated_at` should be an ISO-8601 timestamp.
- `review_scope` should name the review modes actually used.
- `blocking_findings` should count `P0` issues.
- `p1_findings` should count unresolved `P1` issues.
- `checked_files` should list the concrete files reviewed in this pass.
- `stale_if_any_newer_than` should list the files that would invalidate the gate if they change after review.

Freshness rule:

- if any path in `stale_if_any_newer_than` changes after the gate file is written, the gate should be treated as stale and `/artifact-review` should be rerun before sharing.
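The freshness rule can be checked with file modification times. A minimal sketch, assuming the gate file lives at the path shown above and the watched paths are relative to the working directory:

```python
import json
from pathlib import Path


def gate_is_stale(gate_path="review/release_gate.json"):
    """True if any watched artifact changed after the gate file was written."""
    gate = Path(gate_path)
    gate_mtime = gate.stat().st_mtime
    watched = json.loads(gate.read_text()).get("stale_if_any_newer_than", [])
    # Any watched file with a newer mtime than the gate invalidates it.
    return any(
        Path(p).exists() and Path(p).stat().st_mtime > gate_mtime
        for p in watched
    )
```

Mtime comparison is a heuristic (copies and checkouts can reset timestamps); content hashing would be stricter, but this matches the rule as written.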
package/skills/artifact-review/references/review-checklist.md
@@ -0,0 +1,45 @@
# Release Checklist

```text
[Required]
[ ] Every headline metric includes a baseline
[ ] Every headline metric includes a source artifact path
[ ] Every headline metric includes a protocol or guardrail
[ ] Simulator/local_runtime/runtime wording is correct
[ ] Every headline claim can be traced to a concrete artifact
[ ] Every headline claim has a figure, table, or explicit text-only justification
[ ] review/release_gate.json exists and matches the current verdict
[ ] Paper review findings include affected_claim_id where applicable
[ ] Figures include units and readable legends
[ ] Every paper-facing figure has supports_claim_ids in paper/figures_manifest.md
[ ] Every paper-facing figure has a callout_sentence before or at first use
[ ] Figure placement is aligned with the claim order in the text
[ ] Figure captions describe the evidence boundary
[ ] Figure captions include baseline, metric, evidence type, and protocol when relevant
[ ] README/docs first screen explains what this is
[ ] README/docs first screen explains how to use it
[ ] README/docs first screen explains artifact outputs
[ ] README/docs first screen explains the scope boundary

[Recommended]
[ ] Abstract only uses high-confidence claims
[ ] Result paragraphs can be mapped back to claim_id entries
[ ] Figure callouts, captions, and figure blocks are consistent with paper/figures_manifest.md
[ ] review/release_gate.json lists the files that would make the gate stale if changed later
[ ] Figure titles and captions use consistent naming
[ ] Release page links directly to paper/review artifacts when they exist
[ ] Evidence boundaries and missing validations are stated somewhere explicit, even if there is no dedicated limitations section
```

Verdict mapping:

- `HOLD`
  - any required item fails in a way that breaks claim safety
  - simulator/proxy evidence is presented as runtime evidence
  - a headline metric lacks a baseline, protocol/guardrail, or source artifact
- `CONDITIONAL_GO`
  - all required items pass
  - one or more recommended items fail, or unresolved `P1` issues remain
- `GO`
  - all required items pass
  - no unresolved `P1` issue weakens a headline claim
package/skills/artifact-review/references/style-review-checklist.md
@@ -0,0 +1,30 @@
# Style Review Checklist

```text
[ ] Every result paragraph contains at least one quantitative statement
[ ] Every comparison sentence names a baseline or comparison target
[ ] Abstract uses only high-confidence claims
[ ] No unsupported adjective inflation appears in headline result sentences
[ ] Observation and interpretation are separable in discussion paragraphs
[ ] Conclusion does not introduce a new claim
[ ] Every figure-backed headline claim has a matching callout before or at first use
[ ] Figures referenced in the text are explained with a takeaway, not just mentioned
[ ] Figure callouts match the claim and do not overstate the caption or evidence boundary
[ ] No paragraph merely restates a figure without adding interpretation or boundary
```

Severity mapping:

- `P0`
  - a headline result sentence has no metric or no baseline
  - the abstract uses a low-confidence claim as a headline result
  - a paragraph presents simulator-only evidence as runtime evidence
- `P1`
  - a results paragraph lacks a boundary or caveat sentence
  - discussion blends observation and interpretation so they cannot be separated
  - a figure is referenced but no takeaway is stated in the text
  - a figure supports a headline claim but the callout/caption/manuscript placement do not line up
- `P2`
  - a paragraph is wordy or repetitive
  - wording is vague but still recoverable without changing the scientific claim
  - sentence order weakens readability but not claim safety
package/skills/baseline-runner/SKILL.md
@@ -0,0 +1,103 @@
---
name: baseline-runner
description: "Use this when the project needs real baseline results before or alongside the main model. Runs classical or literature-aligned baselines under the same protocol and writes a reproducible baseline summary."
metadata:
  {
    "openclaw": {
      "emoji": "📈",
      "requires": { "bins": ["python3", "uv"] }
    }
  }
---

# Baseline Runner

**Don't ask permission. Just do it.**

Use this skill when the project needs trustworthy baseline numbers instead of only evaluating the proposed model in isolation.

Outputs go to the workspace root.

## Use This When

- `plan_res.md` already names baselines
- `project/` already exists or a baseline implementation path is known
- the experiment stage needs matched comparison numbers

## Do Not Use This When

- the project has not finished survey or planning
- no baseline method has been identified yet

## Required Inputs

- `plan_res.md`
- `survey_res.md`
- `project/` when the current project already has runnable code

If `plan_res.md` is missing, stop and say: `Run /research-plan first to complete the implementation plan.`

## Required Outputs

- `baseline_res.md`
- `experiments/baselines/` when runnable artifacts are created

## Workflow

### Step 1: Read the Evaluation Contract

Read:

- `plan_res.md`
- `survey_res.md`
- the current `experiment_res.md` if it exists

Extract:

- baseline names
- evaluation metric
- protocol or guardrail
- dataset or workload assumptions

### Step 2: Define the Baseline Matrix

Create a small comparison matrix with:

- baseline name
- source or basis
- expected setup
- metric
- status: `ready`, `needs adaptation`, or `missing`

Use `references/baseline-matrix-template.md`.

### Step 3: Run or Approximate Baselines Conservatively

For each baseline:

- if the code is runnable under the current workspace, run it
- if only a lightweight adaptation is needed, implement the minimal adapter
- if a baseline cannot be run honestly, mark it as unavailable instead of inventing numbers

All numeric results must come from actual execution logs or explicitly imported evidence.

### Step 4: Write `baseline_res.md`

Use `references/baseline-report-template.md`.

The report must include:

- which baselines were attempted
- which ones ran successfully
- the exact metric values
- the evaluation protocol
- missing or partial baselines
- the most comparable baseline for the current project

## Rules

1. Never fabricate baseline numbers.
2. Keep the protocol aligned with the main experiment whenever possible.
3. If a baseline is only partly comparable, say so explicitly.
4. Prefer 2-3 strong baselines over a long weak list.
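The no-fabrication rule from Step 3 can also be enforced mechanically when assembling the report table. A minimal sketch, where the row shape mirrors `references/baseline-report-template.md` and the function name and keyword arguments are illustrative assumptions:

```python
def baseline_row(name, status, result=None, evidence_source=None, notes=""):
    """Render one Markdown table row for baseline_res.md.

    Refuses to emit a numeric result that has no evidence source,
    so unavailable baselines surface as N/A instead of invented numbers.
    """
    if result is not None and not evidence_source:
        raise ValueError(f"{name}: result given without an evidence source")
    result_cell = "N/A" if result is None else str(result)
    source_cell = evidence_source or "-"
    return f"| {name} | {status} | {result_cell} | {source_cell} | {notes} |"
```

Usage: a baseline that ran passes its log path as `evidence_source`; a missing baseline passes no result at all and renders as `N/A`.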
package/skills/baseline-runner/references/baseline-matrix-template.md
@@ -0,0 +1,9 @@
# Baseline Matrix Template

```markdown
# Baseline Matrix

| Baseline | Source | Metric | Protocol | Status | Notes |
|----------|--------|--------|----------|--------|-------|
| {name} | {paper/repo} | {metric} | {protocol} | ready / needs adaptation / missing | {note} |
```
package/skills/baseline-runner/references/baseline-report-template.md
@@ -0,0 +1,25 @@
# Baseline Report Template

```markdown
# Baseline Results

## Evaluation Contract
- dataset or workload:
- metric:
- guardrail or protocol:

## Baselines Attempted

| Baseline | Status | Result | Evidence Source | Notes |
|----------|--------|--------|-----------------|-------|
| {name} | ran / partial / missing | {value or N/A} | {log or file} | {notes} |

## Most Comparable Baseline
- baseline:
- why this is the main comparison:

## Gaps
- baseline not run:
- reason:
- how to close the gap:
```
package/skills/dataset-validate/SKILL.md
@@ -0,0 +1,104 @@
---
name: dataset-validate
description: "Use this when the project needs a dedicated data-quality review before model review. Checks data reality, split correctness, label health, leakage risk, shape consistency, and mock-data disclosure."
metadata:
  {
    "openclaw": {
      "emoji": "🗄️",
      "requires": { "bins": ["python3", "uv"] }
    }
  }
---

# Dataset Validate

**Don't ask permission. Just do it.**

Use this skill before or alongside model implementation review when data quality needs to be checked separately from model quality.

Outputs go to the workspace root.

## Use This When

- `plan_res.md` already exists
- the project is about to implement or has just implemented a model
- data quality, split quality, or label integrity is still uncertain

## Do Not Use This When

- the project has no concrete plan yet
- there is no dataset or data-loading path to inspect

## Required Inputs

- `plan_res.md`
- `project/` if a data pipeline already exists
- `survey_res.md` when it defines dataset or protocol expectations

If `plan_res.md` is missing, stop and say: `Run /research-plan first to complete the implementation plan.`

## Required Output

- `data_validation.md`

## Workflow

### Step 1: Read the Data Contract

Read:

- `plan_res.md`
- `survey_res.md` if present
- current data-loading code under `project/data/` if present

Extract:

- expected dataset name
- source
- split structure
- label or target format
- expected shapes

### Step 2: Audit Data Reality

Check:

- whether dataset files actually exist
- whether the data is real or mock
- whether mock usage is clearly declared
- whether the row count / sample count is plausible

### Step 3: Audit Data Integrity

Check:

- train / val / test split existence and separation
- label distribution or target sanity
- shape / dtype consistency
- obvious leakage risks
- preprocessing consistency with `plan_res.md`

If code exists, run lightweight inspection commands under the project environment to verify counts and sample structure.

### Step 4: Write `data_validation.md`

Use `references/data-validation-template.md`.

The report must include:

- dataset identity
- data reality check
- split integrity
- label / target health
- leakage risk
- mock-data disclosure
- verdict: `PASS`, `NEEDS_REVISION`, or `BLOCKED`
- exact next step

## Rules

1. Keep data quality separate from model quality.
2. Never infer that data is real if the files or loading path are missing.
3. If mock data is used, call it out explicitly.
4. If data leakage is plausible, treat it as blocking until clarified.
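One of the cheapest Step 3 checks is sample-ID overlap between splits, since any shared ID is an immediate leakage flag. A minimal sketch, assuming each split can be reduced to a list of hashable sample IDs (the function names are illustrative):

```python
def split_overlap(train_ids, val_ids, test_ids):
    """Return pairwise ID overlaps between splits; empty dict means clean."""
    splits = {"train": set(train_ids), "val": set(val_ids), "test": set(test_ids)}
    names = list(splits)
    overlaps = {}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            shared = splits[a] & splits[b]
            if shared:
                overlaps[(a, b)] = sorted(shared)
    return overlaps


def leakage_verdict(overlaps):
    """Rule 4: plausible leakage is blocking until clarified."""
    return "BLOCKED" if overlaps else "PASS"
```

This catches only exact-ID leakage; near-duplicate content across splits still needs a manual or similarity-based check.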
package/skills/dataset-validate/references/data-validation-template.md
@@ -0,0 +1,38 @@
# Data Validation Template

```markdown
# Data Validation

## Dataset Identity
- dataset:
- source:
- expected split:

## Reality Check
- files present:
- real or mock:
- evidence:

## Split Integrity
- train split:
- val split:
- test split:
- leakage risk:

## Label / Target Health
- label format:
- distribution or range:
- obvious anomalies:

## Preprocessing Check
- expected preprocessing:
- observed preprocessing:
- mismatch:

## Verdict
- PASS / NEEDS_REVISION / BLOCKED

## Next Step
- recommended command:
- reason:
```
package/skills/figure-standardize/SKILL.md
@@ -0,0 +1,110 @@
---
name: figure-standardize
description: "Use this when the user wants to improve chart quality, standardize plotting style, regenerate release figures, or add captions/protocol notes. Normalizes fonts, colors, legends, units, and scope notes across Scientify figures."
metadata:
  {
    "openclaw": {
      "emoji": "📊"
    }
  }
---

# Figure Standardization

**Don't ask permission. Just do it.**

Use this skill to turn one-off Scientify charts into release-ready figures.

**Do not run new experiments here.** Work from existing results, plotting scripts, and figure bundles. If the source data is missing or inconsistent, report that explicitly instead of smoothing it over.

## Required Outputs

1. Updated plotting script(s) or a shared style helper
2. Regenerated `.png` and `.pdf` files when the pipeline supports both
3. A figure spec file:
   - prefer `reports/figures/figure_spec.md`
   - otherwise `project/figures/figure_spec.md`
4. `paper/figures_manifest.md` when the figure family is paper-facing or a `paper/` workspace already exists

## Workflow

### Step 1: Inspect Inputs

Read:

- existing figures
- the generator script(s)
- the result tables / JSON / Markdown that feed the figures
- any surrounding README or release notes that explain the figure family

Prefer improving an existing generator over creating a second one-off script.

### Step 2: Standardize the Figure Family

Normalize the full family, not just one chart:

- font family and title hierarchy
- semantic color mapping
- axis labels and units
- legend order and naming
- decimal precision and tick formatting
- line widths / marker sizes
- caption structure
- protocol note wording
- callout wording
- paper placement intent

Use:

- `references/figure-style-guide.md`
- `references/caption-template.md`
- `references/figure-placement-template.md`

### Step 3: Write the Figure Spec

Create or update `figure_spec.md` with one section per figure:

- figure filename
- source files
- metrics shown
- baseline or comparison family
- quality guard / evaluation constraint
- simulator/runtime note
- intended takeaway

If the figure is used in a paper or paper-facing report, also create or update the matching entry in `paper/figures_manifest.md` with:

- `figure_id`
- `file_path`
- `latex_label`
- `section`
- `placement_hint`
- `caption_short`
- `caption_long`
- `takeaway_sentence`
- `callout_sentence`
- `baseline`
- `evidence_type`
- `source_metrics`
- `source_files`
- `supports_claim_ids`
- `must_appear_before_claim_ids`

Keep `figure_spec.md` and `paper/figures_manifest.md` aligned. The spec is the release-facing summary; the manifest is the paper-facing contract.

### Step 4: Re-render and Verify

Re-render the figures after script updates.

Keep filenames stable unless the user explicitly asked for a new release bundle.

## Figure Rules

1. Keep metric semantics identical across a figure family.
2. Always show units explicitly.
3. If a result comes from simulator or proxy evaluation, state that in the caption or protocol note.
4. Do not hide failing or quality-guard-breaking baselines; mark them clearly.
5. Do not change the scientific claim. This skill improves packaging, not evidence.
6. If a figure is paper-facing, produce both a long caption and a first-use callout sentence.
7. If a figure supports a claim, the manifest must name that claim in `supports_claim_ids`.
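Rules 6 and 7 amount to a completeness contract on each manifest entry, which can be linted before artifact review. A minimal sketch, assuming entries are parsed into dicts keyed by the field names listed in Step 3 (the required subset chosen here is an assumption):

```python
# Fields a paper-facing figure entry must fill before review;
# placement_hint and must_appear_before_claim_ids stay optional here.
REQUIRED_FIELDS = [
    "figure_id", "file_path", "latex_label", "section",
    "caption_short", "caption_long", "callout_sentence",
    "baseline", "evidence_type", "supports_claim_ids",
]


def missing_manifest_fields(entry):
    """Return the required fields that are absent or empty in one entry."""
    return [f for f in REQUIRED_FIELDS if not entry.get(f)]
```

An entry that returns a non-empty list here would fail check 9 in the artifact-review skill.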
package/skills/figure-standardize/references/caption-template.md
@@ -0,0 +1,12 @@
# Caption Template

```text
caption_short:
Figure X. {Short statement of what is compared}.

caption_long:
Figure X. {What the figure shows and the main takeaway}. Baseline: {baseline family}. Metric: {metric + unit}. Evidence type: {simulator/local_runtime/runtime}. Protocol: {guardrail or evaluation scope}. Boundary: {scope note if needed}.

callout_sentence:
Figure \ref{{latex_label}} shows {takeaway_sentence} under {protocol or evaluation frame}.
```
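The `caption_long` slot-filling above is simple enough to generate from manifest fields. A minimal sketch; the function name and argument order are assumptions, and the sentence structure follows the template exactly:

```python
def caption_long(num, takeaway, baseline, metric, evidence_type, protocol, boundary=None):
    """Fill the caption_long template from paper/figures_manifest.md fields."""
    parts = [
        f"Figure {num}. {takeaway}.",
        f"Baseline: {baseline}.",
        f"Metric: {metric}.",
        f"Evidence type: {evidence_type}.",
        f"Protocol: {protocol}.",
    ]
    if boundary:  # the Boundary clause is optional, matching the template
        parts.append(f"Boundary: {boundary}.")
    return " ".join(parts)
```

Generating captions this way keeps the caption, manifest entry, and callout sentence from drifting apart across re-renders.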