scientify 2.1.0 → 3.1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (186)
  1. package/README.en.md +21 -1
  2. package/README.md +27 -0
  3. package/dist/index.d.ts.map +1 -1
  4. package/dist/index.js +2 -77
  5. package/dist/index.js.map +1 -1
  6. package/dist/src/cli/research.d.ts.map +1 -1
  7. package/dist/src/cli/research.js +47 -23
  8. package/dist/src/cli/research.js.map +1 -1
  9. package/dist/src/commands/metabolism-status.d.ts.map +1 -1
  10. package/dist/src/commands/metabolism-status.js +5 -25
  11. package/dist/src/commands/metabolism-status.js.map +1 -1
  12. package/dist/src/commands.d.ts +8 -8
  13. package/dist/src/commands.d.ts.map +1 -1
  14. package/dist/src/commands.js +230 -243
  15. package/dist/src/commands.js.map +1 -1
  16. package/dist/src/release-gate.d.ts +14 -0
  17. package/dist/src/release-gate.d.ts.map +1 -0
  18. package/dist/src/release-gate.js +124 -0
  19. package/dist/src/release-gate.js.map +1 -0
  20. package/dist/src/templates/bootstrap.d.ts.map +1 -1
  21. package/dist/src/templates/bootstrap.js +157 -94
  22. package/dist/src/templates/bootstrap.js.map +1 -1
  23. package/dist/src/types.d.ts +2 -10
  24. package/dist/src/types.d.ts.map +1 -1
  25. package/openclaw.plugin.json +11 -17
  26. package/package.json +2 -3
  27. package/skills/algorithm-selection/SKILL.md +103 -0
  28. package/skills/algorithm-selection/references/candidate-template.md +13 -0
  29. package/skills/algorithm-selection/references/selection-template.md +39 -0
  30. package/skills/artifact-review/SKILL.md +146 -0
  31. package/skills/artifact-review/references/release-gate-template.md +40 -0
  32. package/skills/artifact-review/references/review-checklist.md +45 -0
  33. package/skills/artifact-review/references/style-review-checklist.md +30 -0
  34. package/skills/baseline-runner/SKILL.md +103 -0
  35. package/skills/baseline-runner/references/baseline-matrix-template.md +9 -0
  36. package/skills/baseline-runner/references/baseline-report-template.md +25 -0
  37. package/skills/dataset-validate/SKILL.md +104 -0
  38. package/skills/dataset-validate/references/data-validation-template.md +38 -0
  39. package/skills/figure-standardize/SKILL.md +110 -0
  40. package/skills/figure-standardize/references/caption-template.md +12 -0
  41. package/skills/figure-standardize/references/figure-placement-template.md +30 -0
  42. package/skills/figure-standardize/references/figure-style-guide.md +36 -0
  43. package/skills/idea-generation/SKILL.md +20 -44
  44. package/skills/idea-generation/references/code-mapping.md +3 -3
  45. package/skills/idea-generation/references/idea-template.md +1 -1
  46. package/skills/idea-generation/references/reading-long-papers.md +3 -3
  47. package/skills/metabolism/SKILL.md +80 -36
  48. package/skills/paper-download/SKILL.md +61 -0
  49. package/skills/release-layout/SKILL.md +73 -0
  50. package/skills/release-layout/references/page-structure.md +14 -0
  51. package/skills/research-collect/SKILL.md +41 -111
  52. package/skills/research-experiment/SKILL.md +20 -12
  53. package/skills/research-implement/SKILL.md +10 -11
  54. package/skills/research-pipeline/SKILL.md +23 -31
  55. package/skills/research-plan/SKILL.md +7 -11
  56. package/skills/research-review/SKILL.md +21 -22
  57. package/skills/research-survey/SKILL.md +28 -25
  58. package/skills/write-paper/SKILL.md +252 -0
  59. package/skills/write-paper/references/boundary-notes-template.md +34 -0
  60. package/skills/write-paper/references/claim-inventory-template.md +32 -0
  61. package/skills/write-paper/references/evidence-contract.md +57 -0
  62. package/skills/write-paper/references/figure-callout-template.md +38 -0
  63. package/skills/write-paper/references/figures-manifest-template.md +44 -0
  64. package/skills/write-paper/references/latex/README.md +22 -0
  65. package/skills/write-paper/references/latex/build_paper.sh +41 -0
  66. package/skills/write-paper/references/latex/manuscript.tex +39 -0
  67. package/skills/write-paper/references/latex/references.bib +10 -0
  68. package/skills/write-paper/references/latex/sections/ablations.tex +3 -0
  69. package/skills/write-paper/references/latex/sections/abstract.tex +3 -0
  70. package/skills/write-paper/references/latex/sections/conclusion.tex +3 -0
  71. package/skills/write-paper/references/latex/sections/discussion_scope.tex +7 -0
  72. package/skills/write-paper/references/latex/sections/experimental_protocol.tex +3 -0
  73. package/skills/write-paper/references/latex/sections/introduction.tex +3 -0
  74. package/skills/write-paper/references/latex/sections/main_results.tex +9 -0
  75. package/skills/write-paper/references/latex/sections/method_system.tex +3 -0
  76. package/skills/write-paper/references/latex/sections/problem_setup.tex +3 -0
  77. package/skills/write-paper/references/latex/sections/related_work.tex +3 -0
  78. package/skills/write-paper/references/paper-template.md +155 -0
  79. package/skills/write-paper/references/paragraph-contract.md +139 -0
  80. package/skills/write-paper/references/paragraph-examples.md +171 -0
  81. package/skills/write-paper/references/style-banlist.md +81 -0
  82. package/skills/write-review-paper/SKILL.md +22 -16
  83. package/skills/write-review-paper/references/note-template.md +1 -1
  84. package/skills/write-review-paper/references/survey-template.md +1 -1
  85. package/dist/src/hooks/research-mode.d.ts +0 -22
  86. package/dist/src/hooks/research-mode.d.ts.map +0 -1
  87. package/dist/src/hooks/research-mode.js +0 -35
  88. package/dist/src/hooks/research-mode.js.map +0 -1
  89. package/dist/src/hooks/scientify-cron-autofill.d.ts +0 -15
  90. package/dist/src/hooks/scientify-cron-autofill.d.ts.map +0 -1
  91. package/dist/src/hooks/scientify-cron-autofill.js +0 -156
  92. package/dist/src/hooks/scientify-cron-autofill.js.map +0 -1
  93. package/dist/src/hooks/scientify-signature.d.ts +0 -21
  94. package/dist/src/hooks/scientify-signature.d.ts.map +0 -1
  95. package/dist/src/hooks/scientify-signature.js +0 -150
  96. package/dist/src/hooks/scientify-signature.js.map +0 -1
  97. package/dist/src/knowledge-state/project.d.ts +0 -13
  98. package/dist/src/knowledge-state/project.d.ts.map +0 -1
  99. package/dist/src/knowledge-state/project.js +0 -88
  100. package/dist/src/knowledge-state/project.js.map +0 -1
  101. package/dist/src/knowledge-state/render.d.ts +0 -63
  102. package/dist/src/knowledge-state/render.d.ts.map +0 -1
  103. package/dist/src/knowledge-state/render.js +0 -368
  104. package/dist/src/knowledge-state/render.js.map +0 -1
  105. package/dist/src/knowledge-state/store.d.ts +0 -19
  106. package/dist/src/knowledge-state/store.d.ts.map +0 -1
  107. package/dist/src/knowledge-state/store.js +0 -978
  108. package/dist/src/knowledge-state/store.js.map +0 -1
  109. package/dist/src/knowledge-state/types.d.ts +0 -182
  110. package/dist/src/knowledge-state/types.d.ts.map +0 -1
  111. package/dist/src/knowledge-state/types.js +0 -2
  112. package/dist/src/knowledge-state/types.js.map +0 -1
  113. package/dist/src/literature/subscription-state.d.ts +0 -112
  114. package/dist/src/literature/subscription-state.d.ts.map +0 -1
  115. package/dist/src/literature/subscription-state.js +0 -696
  116. package/dist/src/literature/subscription-state.js.map +0 -1
  117. package/dist/src/research-subscriptions/constants.d.ts +0 -16
  118. package/dist/src/research-subscriptions/constants.d.ts.map +0 -1
  119. package/dist/src/research-subscriptions/constants.js +0 -59
  120. package/dist/src/research-subscriptions/constants.js.map +0 -1
  121. package/dist/src/research-subscriptions/cron-client.d.ts +0 -8
  122. package/dist/src/research-subscriptions/cron-client.d.ts.map +0 -1
  123. package/dist/src/research-subscriptions/cron-client.js +0 -81
  124. package/dist/src/research-subscriptions/cron-client.js.map +0 -1
  125. package/dist/src/research-subscriptions/delivery.d.ts +0 -10
  126. package/dist/src/research-subscriptions/delivery.d.ts.map +0 -1
  127. package/dist/src/research-subscriptions/delivery.js +0 -82
  128. package/dist/src/research-subscriptions/delivery.js.map +0 -1
  129. package/dist/src/research-subscriptions/handlers.d.ts +0 -6
  130. package/dist/src/research-subscriptions/handlers.d.ts.map +0 -1
  131. package/dist/src/research-subscriptions/handlers.js +0 -204
  132. package/dist/src/research-subscriptions/handlers.js.map +0 -1
  133. package/dist/src/research-subscriptions/parse.d.ts +0 -11
  134. package/dist/src/research-subscriptions/parse.d.ts.map +0 -1
  135. package/dist/src/research-subscriptions/parse.js +0 -492
  136. package/dist/src/research-subscriptions/parse.js.map +0 -1
  137. package/dist/src/research-subscriptions/prompt.d.ts +0 -5
  138. package/dist/src/research-subscriptions/prompt.d.ts.map +0 -1
  139. package/dist/src/research-subscriptions/prompt.js +0 -347
  140. package/dist/src/research-subscriptions/prompt.js.map +0 -1
  141. package/dist/src/research-subscriptions/types.d.ts +0 -66
  142. package/dist/src/research-subscriptions/types.d.ts.map +0 -1
  143. package/dist/src/research-subscriptions/types.js +0 -2
  144. package/dist/src/research-subscriptions/types.js.map +0 -1
  145. package/dist/src/research-subscriptions.d.ts +0 -2
  146. package/dist/src/research-subscriptions.d.ts.map +0 -1
  147. package/dist/src/research-subscriptions.js +0 -2
  148. package/dist/src/research-subscriptions.js.map +0 -1
  149. package/dist/src/services/auto-updater.d.ts +0 -15
  150. package/dist/src/services/auto-updater.d.ts.map +0 -1
  151. package/dist/src/services/auto-updater.js +0 -188
  152. package/dist/src/services/auto-updater.js.map +0 -1
  153. package/dist/src/tools/arxiv-download.d.ts +0 -24
  154. package/dist/src/tools/arxiv-download.d.ts.map +0 -1
  155. package/dist/src/tools/arxiv-download.js +0 -177
  156. package/dist/src/tools/arxiv-download.js.map +0 -1
  157. package/dist/src/tools/github-search-tool.d.ts +0 -25
  158. package/dist/src/tools/github-search-tool.d.ts.map +0 -1
  159. package/dist/src/tools/github-search-tool.js +0 -114
  160. package/dist/src/tools/github-search-tool.js.map +0 -1
  161. package/dist/src/tools/openreview-lookup.d.ts +0 -31
  162. package/dist/src/tools/openreview-lookup.d.ts.map +0 -1
  163. package/dist/src/tools/openreview-lookup.js +0 -414
  164. package/dist/src/tools/openreview-lookup.js.map +0 -1
  165. package/dist/src/tools/paper-browser.d.ts +0 -23
  166. package/dist/src/tools/paper-browser.d.ts.map +0 -1
  167. package/dist/src/tools/paper-browser.js +0 -121
  168. package/dist/src/tools/paper-browser.js.map +0 -1
  169. package/dist/src/tools/scientify-cron.d.ts +0 -63
  170. package/dist/src/tools/scientify-cron.d.ts.map +0 -1
  171. package/dist/src/tools/scientify-cron.js +0 -265
  172. package/dist/src/tools/scientify-cron.js.map +0 -1
  173. package/dist/src/tools/scientify-literature-state.d.ts +0 -303
  174. package/dist/src/tools/scientify-literature-state.d.ts.map +0 -1
  175. package/dist/src/tools/scientify-literature-state.js +0 -957
  176. package/dist/src/tools/scientify-literature-state.js.map +0 -1
  177. package/dist/src/tools/unpaywall-download.d.ts +0 -21
  178. package/dist/src/tools/unpaywall-download.d.ts.map +0 -1
  179. package/dist/src/tools/unpaywall-download.js +0 -169
  180. package/dist/src/tools/unpaywall-download.js.map +0 -1
  181. package/dist/src/tools/workspace.d.ts +0 -32
  182. package/dist/src/tools/workspace.d.ts.map +0 -1
  183. package/dist/src/tools/workspace.js +0 -69
  184. package/dist/src/tools/workspace.js.map +0 -1
  185. package/skills/metabolism-init/SKILL.md +0 -80
  186. package/skills/research-subscription/SKILL.md +0 -119
package/skills/algorithm-selection/references/candidate-template.md
@@ -0,0 +1,13 @@
+ # Candidate Route Template
+
+ Use one block per route.
+
+ ```markdown
+ ## Route A: {name}
+ - Core idea:
+ - Based on:
+ - Expected strengths:
+ - Expected risks:
+ - Implementation cost: low / medium / high
+ - Baseline compatibility:
+ ```
package/skills/algorithm-selection/references/selection-template.md
@@ -0,0 +1,39 @@
+ # Selection Result Template
+
+ ```markdown
+ # Algorithm Selection
+
+ ## Project Goal
+ - task:
+ - main metric:
+ - key constraints:
+
+ ## Decision Criteria
+ - criterion 1:
+ - criterion 2:
+ - criterion 3:
+
+ ## Candidate Options
+
+ | Route | Core idea | Strengths | Risks | Cost | Basis |
+ |-------|-----------|-----------|-------|------|-------|
+ | A | ... | ... | ... | ... | ... |
+ | B | ... | ... | ... | ... | ... |
+
+ ## Chosen Route
+ - route:
+ - why this route:
+ - what to implement first:
+
+ ## Rejected Routes
+ - route:
+ - why not now:
+
+ ## Fallback Route
+ - route:
+ - when to switch:
+
+ ## Next Step
+ - recommended command:
+ - expected output:
+ ```
package/skills/artifact-review/SKILL.md
@@ -0,0 +1,146 @@
+ ---
+ name: artifact-review
+ description: "Use this when the user wants a draft paper, figure bundle, README, release page, or experiment artifact reviewed before sharing. Checks evidence binding, claim scope, captions, layout clarity, and release readiness."
+ metadata:
+   {
+     "openclaw":
+       {
+         "emoji": "🧾",
+       },
+   }
+ ---
+
+ # Artifact Review
+
+ **Don't ask permission. Just do it.**
+
+ This is a release-readiness review skill. It does **not** invent new claims or run new experiments. It checks whether the current artifacts are safe to share.
+
+ ## Required Outputs
+
+ - `review/artifact_review.md`
+ - `review/release_checklist.md`
+ - `review/release_gate.json`
+
+ ## Review Scope
+
+ Use this for any mix of:
+
+ - `paper/draft.md`
+ - `paper/figures_manifest.md`
+ - `review/draft.md`
+ - `experiment_res.md`
+ - figure bundles
+ - `README.md`
+ - `docs/index.html`
+
+ Review the artifact set in one or more of these modes:
+
+ - `paper review`
+   - checks claim scope, evidence binding, baseline wording, and abstract/results discipline
+ - `figure review`
+   - checks units, legends, captions, readability, and evidence labels
+ - `release page review`
+   - checks first-screen clarity, artifact entry points, and scope-boundary wording
+ - `style review`
+   - checks paragraph discipline, quantitative grounding, adjective inflation, and result-vs-interpretation separation
+
+ ## Workflow
+
+ ### Step 1: Inventory the Artifact Set
+
+ List the files being reviewed, the headline claims they appear to make, the source artifact path for each headline claim when available, which figures or tables support them, and which review mode applies to each file (`paper review`, `figure review`, `release page review`, or `style review`).
+
+ ### Step 2: Review Findings First
+
+ Write `artifact_review.md` as a findings-first review using severity levels:
+
+ - `P0` = unsafe to publish as-is
+ - `P1` = materially weakens the claim or readability
+ - `P2` = polish or consistency issue
+
+ Each finding must include:
+
+ - the problem
+ - the affected file(s)
+ - the `evidence_path` (`N/A` if the issue is structural rather than evidence-bound)
+ - the `affected_claim_id` (`N/A` if the issue is not tied to a specific claim)
+ - why it matters
+ - the concrete fix
+
+ Also write a top-level line:
+
+ ```text
+ release_verdict: HOLD | CONDITIONAL_GO | GO
+ ```
+
+ Use these verdict rules:
+
+ - `HOLD`
+   - any `P0` finding exists
+   - a headline metric has no baseline, no protocol/guardrail, or no source artifact
+   - simulator/proxy evidence is written as runtime evidence
+ - `CONDITIONAL_GO`
+   - no `P0` findings exist, but one or more unresolved `P1` findings remain
+ - `GO`
+   - no `P0` findings remain
+   - no unresolved `P1` finding weakens a headline claim
+   - every headline claim can be traced to a concrete source artifact
+
+ ### Step 3: Check Release Readiness
+
+ Write `release_checklist.md` using the checklist in `references/review-checklist.md`.
+
+ If `style review` applies, also use `references/style-review-checklist.md`.
+ Then write `review/release_gate.json` using `references/release-gate-template.md`.
+
+ If a paper-facing figure set exists, explicitly check the figure-text contract across:
+
+ - `paper/claim_inventory.md`
+ - `paper/figures_manifest.md`
+ - the first prose callout
+ - the figure caption
+ - the LaTeX or Markdown figure block
+
+ `release_gate.json` should include:
+
+ - `release_verdict`
+ - `generated_at`
+ - `review_scope`
+ - `blocking_findings`
+ - `p1_findings`
+ - `checked_files`
+ - `stale_if_any_newer_than`
+
+ Use `stale_if_any_newer_than` to list the release-facing artifacts that would invalidate the current gate if they change later, for example:
+
+ - `paper/draft.md`
+ - `paper/claim_inventory.md`
+ - `paper/figures_manifest.md`
+ - `README.md`
+ - `docs/index.html`
+
+ ## Required Checks
+
+ 1. Every headline metric has a baseline, protocol/guardrail, and source artifact.
+ 2. Simulator/proxy evidence is not written as runtime evidence.
+ 3. Figures have readable titles, units, legends, and captions.
+ 4. The first screen of README/docs answers:
+    - what this is
+    - how to use it
+    - what artifacts exist
+    - what the scope boundary is
+ 5. Unsupported claims are downgraded or explicitly marked as open.
+ 6. Results paragraphs are quantitative and baseline-anchored.
+ 7. Conclusion and abstract do not introduce claims that exceed the allowed confidence or section scope.
+ 8. Every headline claim has a matching figure, table, or explicit text-only justification.
+ 9. Every paper-facing figure has `supports_claim_ids`, a usable `callout_sentence`, and a caption that names baseline, metric, evidence type, and protocol when relevant.
+ 10. Figures are introduced before or adjacent to the claims they are supposed to support.
+ 11. `review/release_gate.json` matches the current verdict and names the files that would make the gate stale if changed later.
+
+ ## Safety Rules
+
+ 1. If the evidence trail is broken, flag it. Do not repair it with guesswork.
+ 2. Prefer short, specific findings over generic writing advice.
+ 3. Review the artifact that exists, not the artifact you wish existed.
+ 4. If a sentence sounds impressive but is not measurable, downgrade it or flag it.
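The HOLD / CONDITIONAL_GO / GO verdict rules added in this skill reduce to a small decision function over finding severities. The sketch below is illustrative only, not code shipped in this package; it assumes findings are records with a `severity` field of `P0`, `P1`, or `P2`, and it deliberately ignores the traceability conditions, which need human judgment:

```python
# Illustrative sketch of the verdict rules above; not part of the scientify package.
# Assumes each finding is a dict with a "severity" key: "P0", "P1", or "P2".
def release_verdict(findings):
    severities = [f["severity"] for f in findings]
    if "P0" in severities:
        return "HOLD"  # any P0 finding blocks release outright
    if "P1" in severities:
        return "CONDITIONAL_GO"  # no P0, but unresolved P1 findings remain
    return "GO"  # only P2 polish issues, or no findings at all
```

P2 findings never change the verdict on their own, which matches their definition as polish or consistency issues.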
package/skills/artifact-review/references/release-gate-template.md
@@ -0,0 +1,40 @@
+ # Release Gate Template
+
+ Write `review/release_gate.json` using this shape:
+
+ ```json
+ {
+   "release_verdict": "CONDITIONAL_GO",
+   "generated_at": "2026-04-02T12:00:00Z",
+   "review_scope": ["paper", "figure", "release_page", "style"],
+   "blocking_findings": 0,
+   "p1_findings": 2,
+   "checked_files": [
+     "paper/draft.md",
+     "paper/figures_manifest.md",
+     "README.md",
+     "docs/index.html"
+   ],
+   "stale_if_any_newer_than": [
+     "paper/draft.md",
+     "paper/claim_inventory.md",
+     "paper/figures_manifest.md",
+     "README.md",
+     "docs/index.html"
+   ]
+ }
+ ```
+
+ Rules:
+
+ - `release_verdict` must be one of `HOLD`, `CONDITIONAL_GO`, or `GO`.
+ - `generated_at` should be an ISO-8601 timestamp.
+ - `review_scope` should name the review modes actually used.
+ - `blocking_findings` should count `P0` issues.
+ - `p1_findings` should count unresolved `P1` issues.
+ - `checked_files` should list the concrete files reviewed in this pass.
+ - `stale_if_any_newer_than` should list the files that would invalidate the gate if they change after review.
+
+ Freshness rule:
+
+ - if any path in `stale_if_any_newer_than` changes after the gate file is written, the gate should be treated as stale and `/artifact-review` should be rerun before sharing.
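The freshness rule above can be mechanized. A minimal sketch, not part of the scientify package, that treats the gate as stale when any watched artifact has a newer modification time than the gate file itself (mtime is used as a proxy for "changed after review"):

```python
import json
import os

def gate_is_stale(gate_path):
    """Illustrative staleness check for review/release_gate.json.

    Returns True if any file listed under "stale_if_any_newer_than" has
    been modified after the gate file was written (by mtime comparison).
    """
    with open(gate_path) as f:
        gate = json.load(f)
    gate_mtime = os.path.getmtime(gate_path)
    return any(
        os.path.exists(path) and os.path.getmtime(path) > gate_mtime
        for path in gate.get("stale_if_any_newer_than", [])
    )
```

A version-control diff against the commit that produced the gate would be a more robust signal than mtimes, but the mtime check needs no repository state.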
package/skills/artifact-review/references/review-checklist.md
@@ -0,0 +1,45 @@
+ # Release Checklist
+
+ ```text
+ [Required]
+ [ ] Every headline metric includes a baseline
+ [ ] Every headline metric includes a source artifact path
+ [ ] Every headline metric includes a protocol or guardrail
+ [ ] Simulator/local_runtime/runtime wording is correct
+ [ ] Every headline claim can be traced to a concrete artifact
+ [ ] Every headline claim has a figure, table, or explicit text-only justification
+ [ ] review/release_gate.json exists and matches the current verdict
+ [ ] Paper review findings include affected_claim_id where applicable
+ [ ] Figures include units and readable legends
+ [ ] Every paper-facing figure has supports_claim_ids in paper/figures_manifest.md
+ [ ] Every paper-facing figure has a callout_sentence before or at first use
+ [ ] Figure placement is aligned with the claim order in the text
+ [ ] Figure captions describe evidence boundary
+ [ ] Figure captions include baseline, metric, evidence type, and protocol when relevant
+ [ ] README/docs first screen explains what this is
+ [ ] README/docs first screen explains how to use it
+ [ ] README/docs first screen explains artifact outputs
+ [ ] README/docs first screen explains scope boundary
+
+ [Recommended]
+ [ ] Abstract only uses high-confidence claims
+ [ ] Result paragraphs can be mapped back to claim_id entries
+ [ ] Figure callouts, captions, and figure blocks are consistent with paper/figures_manifest.md
+ [ ] review/release_gate.json lists the files that would make the gate stale if changed later
+ [ ] Figure titles and captions use consistent naming
+ [ ] Release page links directly to paper/review artifacts when they exist
+ [ ] Evidence boundaries and missing validations are stated somewhere explicit, even if there is no dedicated limitations section
+ ```
+
+ Verdict mapping:
+
+ - `HOLD`
+   - any required item fails in a way that breaks claim safety
+   - simulator/proxy evidence is presented as runtime evidence
+   - a headline metric lacks baseline, protocol/guardrail, or source artifact
+ - `CONDITIONAL_GO`
+   - all required items pass
+   - one or more recommended items fail, or unresolved `P1` issues remain
+ - `GO`
+   - all required items pass
+   - no unresolved `P1` issue weakens a headline claim
package/skills/artifact-review/references/style-review-checklist.md
@@ -0,0 +1,30 @@
+ # Style Review Checklist
+
+ ```text
+ [ ] Every result paragraph contains at least one quantitative statement
+ [ ] Every comparison sentence names a baseline or comparison target
+ [ ] Abstract uses only high-confidence claims
+ [ ] No unsupported adjective inflation appears in headline result sentences
+ [ ] Observation and interpretation are separable in discussion paragraphs
+ [ ] Conclusion does not introduce a new claim
+ [ ] Every figure-backed headline claim has a matching callout before or at first use
+ [ ] Figures referenced in the text are explained with a takeaway, not just mentioned
+ [ ] Figure callouts match the claim and do not overstate the caption or evidence boundary
+ [ ] No paragraph merely restates a figure without adding interpretation or boundary
+ ```
+
+ Severity mapping:
+
+ - `P0`
+   - a headline result sentence has no metric or no baseline
+   - the abstract uses a low-confidence claim as a headline result
+   - a paragraph presents simulator-only evidence as runtime evidence
+ - `P1`
+   - a results paragraph lacks a boundary or caveat sentence
+   - discussion blends observation and interpretation so they cannot be separated
+   - a figure is referenced but no takeaway is stated in the text
+   - a figure supports a headline claim but the callout/caption/manuscript placement do not line up
+ - `P2`
+   - paragraph is wordy or repetitive
+   - wording is vague but still recoverable without changing the scientific claim
+   - sentence order weakens readability but not claim safety
package/skills/baseline-runner/SKILL.md
@@ -0,0 +1,103 @@
+ ---
+ name: baseline-runner
+ description: "Use this when the project needs real baseline results before or alongside the main model. Runs classical or literature-aligned baselines under the same protocol and writes a reproducible baseline summary."
+ metadata:
+   {
+     "openclaw":
+       {
+         "emoji": "📏",
+         "requires": { "bins": ["python3", "uv"] },
+       },
+   }
+ ---
+
+ # Baseline Runner
+
+ **Don't ask permission. Just do it.**
+
+ Use this skill when the project needs trustworthy baseline numbers instead of only evaluating the proposed model in isolation.
+
+ Outputs go to the workspace root.
+
+ ## Use This When
+
+ - `plan_res.md` already names baselines
+ - `project/` already exists or a baseline implementation path is known
+ - the experiment stage needs matched comparison numbers
+
+ ## Do Not Use This When
+
+ - the project has not finished survey or planning
+ - no baseline method has been identified yet
+
+ ## Required Inputs
+
+ - `plan_res.md`
+ - `survey_res.md`
+ - `project/` when the current project already has runnable code
+
+ If `plan_res.md` is missing, stop and say: `Run /research-plan first to complete the implementation plan.`
+
+ ## Required Outputs
+
+ - `baseline_res.md`
+ - `experiments/baselines/` when runnable artifacts are created
+
+ ## Workflow
+
+ ### Step 1: Read the Evaluation Contract
+
+ Read:
+
+ - `plan_res.md`
+ - `survey_res.md`
+ - current `experiment_res.md` if it exists
+
+ Extract:
+
+ - baseline names
+ - evaluation metric
+ - protocol or guardrail
+ - dataset or workload assumptions
+
+ ### Step 2: Define the Baseline Matrix
+
+ Create a small comparison matrix with:
+
+ - baseline name
+ - source or basis
+ - expected setup
+ - metric
+ - status: `ready`, `needs adaptation`, or `missing`
+
+ Use `references/baseline-matrix-template.md`.
+
+ ### Step 3: Run or Approximate Baselines Conservatively
+
+ For each baseline:
+
+ - if code is runnable under the current workspace, run it
+ - if only a lightweight adaptation is needed, implement the minimal adapter
+ - if a baseline cannot be run honestly, mark it as unavailable instead of inventing numbers
+
+ All numeric results must come from actual execution logs or explicit imported evidence.
+
+ ### Step 4: Write `baseline_res.md`
+
+ Use `references/baseline-report-template.md`.
+
+ The report must include:
+
+ - which baselines were attempted
+ - which ones ran successfully
+ - the exact metric values
+ - the evaluation protocol
+ - missing or partial baselines
+ - the most comparable baseline for the current project
+
+ ## Rules
+
+ 1. Never fabricate baseline numbers.
+ 2. Keep the protocol aligned with the main experiment whenever possible.
+ 3. If a baseline is only partly comparable, say so explicitly.
+ 4. Prefer 2-3 strong baselines over a long weak list.
package/skills/baseline-runner/references/baseline-matrix-template.md
@@ -0,0 +1,9 @@
+ # Baseline Matrix Template
+
+ ```markdown
+ # Baseline Matrix
+
+ | Baseline | Source | Metric | Protocol | Status | Notes |
+ |----------|--------|--------|----------|--------|-------|
+ | {name} | {paper/repo} | {metric} | {protocol} | ready / needs adaptation / missing | {note} |
+ ```
package/skills/baseline-runner/references/baseline-report-template.md
@@ -0,0 +1,25 @@
+ # Baseline Report Template
+
+ ```markdown
+ # Baseline Results
+
+ ## Evaluation Contract
+ - dataset or workload:
+ - metric:
+ - guardrail or protocol:
+
+ ## Baselines Attempted
+
+ | Baseline | Status | Result | Evidence Source | Notes |
+ |----------|--------|--------|-----------------|-------|
+ | {name} | ran / partial / missing | {value or N/A} | {log or file} | {notes} |
+
+ ## Most Comparable Baseline
+ - baseline:
+ - why this is the main comparison:
+
+ ## Gaps
+ - baseline not run:
+ - reason:
+ - how to close the gap:
+ ```
package/skills/dataset-validate/SKILL.md
@@ -0,0 +1,104 @@
+ ---
+ name: dataset-validate
+ description: "Use this when the project needs a dedicated data-quality review before model review. Checks data reality, split correctness, label health, leakage risk, shape consistency, and mock-data disclosure."
+ metadata:
+   {
+     "openclaw":
+       {
+         "emoji": "🗂️",
+         "requires": { "bins": ["python3", "uv"] },
+       },
+   }
+ ---
+
+ # Dataset Validate
+
+ **Don't ask permission. Just do it.**
+
+ Use this skill before or alongside model implementation review when data quality needs to be checked separately from model quality.
+
+ Outputs go to the workspace root.
+
+ ## Use This When
+
+ - `plan_res.md` already exists
+ - the project is about to implement or has just implemented a model
+ - data quality, split quality, or label integrity is still uncertain
+
+ ## Do Not Use This When
+
+ - the project has no concrete plan yet
+ - there is no dataset or data-loading path to inspect
+
+ ## Required Inputs
+
+ - `plan_res.md`
+ - `project/` if a data pipeline already exists
+ - `survey_res.md` when it defines dataset or protocol expectations
+
+ If `plan_res.md` is missing, stop and say: `Run /research-plan first to complete the implementation plan.`
+
+ ## Required Output
+
+ - `data_validation.md`
+
+ ## Workflow
+
+ ### Step 1: Read the Data Contract
+
+ Read:
+
+ - `plan_res.md`
+ - `survey_res.md` if present
+ - current data-loading code under `project/data/` if present
+
+ Extract:
+
+ - expected dataset name
+ - source
+ - split structure
+ - label or target format
+ - expected shapes
+
+ ### Step 2: Audit Data Reality
+
+ Check:
+
+ - whether dataset files actually exist
+ - whether the data is real or mock
+ - whether mock usage is clearly declared
+ - whether row count / sample count is plausible
+
+ ### Step 3: Audit Data Integrity
+
+ Check:
+
+ - train / val / test split existence and separation
+ - label distribution or target sanity
+ - shape / dtype consistency
+ - obvious leakage risks
+ - preprocessing consistency with `plan_res.md`
+
+ If code exists, run lightweight inspection commands under the project environment to verify counts and sample structure.
+
+ ### Step 4: Write `data_validation.md`
+
+ Use `references/data-validation-template.md`.
+
+ The report must include:
+
+ - dataset identity
+ - data reality check
+ - split integrity
+ - label / target health
+ - leakage risk
+ - mock-data disclosure
+ - verdict: `PASS`, `NEEDS_REVISION`, or `BLOCKED`
+ - exact next step
+
+ ## Rules
+
+ 1. Keep data quality separate from model quality.
+ 2. Never infer that data is real if the files or loading path are missing.
+ 3. If mock data is used, call it out explicitly.
+ 4. If data leakage is plausible, treat it as blocking until clarified.
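The split-separation part of the integrity audit in this skill can be partly mechanized. A minimal sketch, not part of the scientify package, assuming samples are identified by hashable IDs (the function name and signature are hypothetical):

```python
def split_overlaps(train_ids, val_ids, test_ids):
    """Return the pairwise ID overlaps between dataset splits.

    Any non-empty overlap is a leakage red flag: under the rules above
    it should be treated as blocking until clarified.
    """
    train, val, test = set(train_ids), set(val_ids), set(test_ids)
    return {
        "train/val": train & val,
        "train/test": train & test,
        "val/test": val & test,
    }
```

ID overlap catches only the crudest leakage; near-duplicate samples or target-derived features still need the manual checks the skill describes.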
package/skills/dataset-validate/references/data-validation-template.md
@@ -0,0 +1,38 @@
+ # Data Validation Template
+
+ ```markdown
+ # Data Validation
+
+ ## Dataset Identity
+ - dataset:
+ - source:
+ - expected split:
+
+ ## Reality Check
+ - files present:
+ - real or mock:
+ - evidence:
+
+ ## Split Integrity
+ - train split:
+ - val split:
+ - test split:
+ - leakage risk:
+
+ ## Label / Target Health
+ - label format:
+ - distribution or range:
+ - obvious anomalies:
+
+ ## Preprocessing Check
+ - expected preprocessing:
+ - observed preprocessing:
+ - mismatch:
+
+ ## Verdict
+ - PASS / NEEDS_REVISION / BLOCKED
+
+ ## Next Step
+ - recommended command:
+ - reason:
+ ```