scientify 3.0.0 → 3.1.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.en.md +21 -1
- package/README.md +27 -0
- package/dist/index.js +1 -1
- package/dist/index.js.map +1 -1
- package/dist/src/cli/research.d.ts.map +1 -1
- package/dist/src/cli/research.js +42 -1
- package/dist/src/cli/research.js.map +1 -1
- package/dist/src/commands.d.ts.map +1 -1
- package/dist/src/commands.js +159 -1
- package/dist/src/commands.js.map +1 -1
- package/dist/src/release-gate.d.ts +14 -0
- package/dist/src/release-gate.d.ts.map +1 -0
- package/dist/src/release-gate.js +124 -0
- package/dist/src/release-gate.js.map +1 -0
- package/dist/src/templates/bootstrap.d.ts.map +1 -1
- package/dist/src/templates/bootstrap.js +139 -62
- package/dist/src/templates/bootstrap.js.map +1 -1
- package/openclaw.plugin.json +8 -1
- package/package.json +1 -1
- package/skills/algorithm-selection/SKILL.md +103 -0
- package/skills/algorithm-selection/references/candidate-template.md +13 -0
- package/skills/algorithm-selection/references/selection-template.md +39 -0
- package/skills/artifact-review/SKILL.md +146 -0
- package/skills/artifact-review/references/release-gate-template.md +40 -0
- package/skills/artifact-review/references/review-checklist.md +45 -0
- package/skills/artifact-review/references/style-review-checklist.md +30 -0
- package/skills/baseline-runner/SKILL.md +103 -0
- package/skills/baseline-runner/references/baseline-matrix-template.md +9 -0
- package/skills/baseline-runner/references/baseline-report-template.md +25 -0
- package/skills/dataset-validate/SKILL.md +104 -0
- package/skills/dataset-validate/references/data-validation-template.md +38 -0
- package/skills/figure-standardize/SKILL.md +110 -0
- package/skills/figure-standardize/references/caption-template.md +12 -0
- package/skills/figure-standardize/references/figure-placement-template.md +30 -0
- package/skills/figure-standardize/references/figure-style-guide.md +36 -0
- package/skills/release-layout/SKILL.md +73 -0
- package/skills/release-layout/references/page-structure.md +14 -0
- package/skills/research-experiment/SKILL.md +10 -1
- package/skills/research-survey/SKILL.md +19 -2
- package/skills/write-paper/SKILL.md +252 -0
- package/skills/write-paper/references/boundary-notes-template.md +34 -0
- package/skills/write-paper/references/claim-inventory-template.md +32 -0
- package/skills/write-paper/references/evidence-contract.md +57 -0
- package/skills/write-paper/references/figure-callout-template.md +38 -0
- package/skills/write-paper/references/figures-manifest-template.md +44 -0
- package/skills/write-paper/references/latex/README.md +22 -0
- package/skills/write-paper/references/latex/build_paper.sh +41 -0
- package/skills/write-paper/references/latex/manuscript.tex +39 -0
- package/skills/write-paper/references/latex/references.bib +10 -0
- package/skills/write-paper/references/latex/sections/ablations.tex +3 -0
- package/skills/write-paper/references/latex/sections/abstract.tex +3 -0
- package/skills/write-paper/references/latex/sections/conclusion.tex +3 -0
- package/skills/write-paper/references/latex/sections/discussion_scope.tex +7 -0
- package/skills/write-paper/references/latex/sections/experimental_protocol.tex +3 -0
- package/skills/write-paper/references/latex/sections/introduction.tex +3 -0
- package/skills/write-paper/references/latex/sections/main_results.tex +9 -0
- package/skills/write-paper/references/latex/sections/method_system.tex +3 -0
- package/skills/write-paper/references/latex/sections/problem_setup.tex +3 -0
- package/skills/write-paper/references/latex/sections/related_work.tex +3 -0
- package/skills/write-paper/references/paper-template.md +155 -0
- package/skills/write-paper/references/paragraph-contract.md +139 -0
- package/skills/write-paper/references/paragraph-examples.md +171 -0
- package/skills/write-paper/references/style-banlist.md +81 -0
- package/skills/write-review-paper/SKILL.md +10 -4
package/skills/artifact-review/references/release-gate-template.md
@@ -0,0 +1,40 @@
+# Release Gate Template
+
+Write `review/release_gate.json` using this shape:
+
+```json
+{
+  "release_verdict": "CONDITIONAL_GO",
+  "generated_at": "2026-04-02T12:00:00Z",
+  "review_scope": ["paper", "figure", "release_page", "style"],
+  "blocking_findings": 0,
+  "p1_findings": 2,
+  "checked_files": [
+    "paper/draft.md",
+    "paper/figures_manifest.md",
+    "README.md",
+    "docs/index.html"
+  ],
+  "stale_if_any_newer_than": [
+    "paper/draft.md",
+    "paper/claim_inventory.md",
+    "paper/figures_manifest.md",
+    "README.md",
+    "docs/index.html"
+  ]
+}
+```
+
+Rules:
+
+- `release_verdict` must be one of `HOLD`, `CONDITIONAL_GO`, or `GO`.
+- `generated_at` should be an ISO-8601 timestamp.
+- `review_scope` should name the review modes actually used.
+- `blocking_findings` should count `P0` issues.
+- `p1_findings` should count unresolved `P1` issues.
+- `checked_files` should list the concrete files reviewed in this pass.
+- `stale_if_any_newer_than` should list the files that would invalidate the gate if they change after review.
+
+Freshness rule:
+
+- if any path in `stale_if_any_newer_than` changes after the gate file is written, the gate should be treated as stale and `/artifact-review` should be rerun before sharing.
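The freshness rule above can be checked mechanically by comparing file modification times against `generated_at`. A minimal sketch, assuming the JSON shape shown in the template; the helper name `gate_is_stale` is our own, not part of the package:

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def gate_is_stale(gate_path="review/release_gate.json"):
    """Return True if any watched file changed after the gate was written."""
    gate = json.loads(Path(gate_path).read_text())
    # fromisoformat() before Python 3.11 does not accept a trailing "Z".
    written = datetime.fromisoformat(gate["generated_at"].replace("Z", "+00:00"))
    for rel in gate.get("stale_if_any_newer_than", []):
        path = Path(rel)
        if not path.exists():
            return True  # a watched file vanished; treat the gate as stale
        mtime = datetime.fromtimestamp(path.stat().st_mtime, tz=timezone.utc)
        if mtime > written:
            return True
    return False
```

A `True` result means `/artifact-review` should be rerun before sharing.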
package/skills/artifact-review/references/review-checklist.md
@@ -0,0 +1,45 @@
+# Release Checklist
+
+```text
+[Required]
+[ ] Every headline metric includes a baseline
+[ ] Every headline metric includes a source artifact path
+[ ] Every headline metric includes a protocol or guardrail
+[ ] Simulator/local_runtime/runtime wording is correct
+[ ] Every headline claim can be traced to a concrete artifact
+[ ] Every headline claim has a figure, table, or explicit text-only justification
+[ ] review/release_gate.json exists and matches the current verdict
+[ ] Paper review findings include affected_claim_id where applicable
+[ ] Figures include units and readable legends
+[ ] Every paper-facing figure has supports_claim_ids in paper/figures_manifest.md
+[ ] Every paper-facing figure has a callout_sentence before or at first use
+[ ] Figure placement is aligned with the claim order in the text
+[ ] Figure captions describe evidence boundary
+[ ] Figure captions include baseline, metric, evidence type, and protocol when relevant
+[ ] README/docs first screen explains what this is
+[ ] README/docs first screen explains how to use it
+[ ] README/docs first screen explains artifact outputs
+[ ] README/docs first screen explains scope boundary
+
+[Recommended]
+[ ] Abstract only uses high-confidence claims
+[ ] Result paragraphs can be mapped back to claim_id entries
+[ ] Figure callouts, captions, and figure blocks are consistent with paper/figures_manifest.md
+[ ] review/release_gate.json lists the files that would make the gate stale if changed later
+[ ] Figure titles and captions use consistent naming
+[ ] Release page links directly to paper/review artifacts when they exist
+[ ] Evidence boundaries and missing validations are stated somewhere explicit, even if there is no dedicated limitations section
+```
+
+Verdict mapping:
+
+- `HOLD`
+  - any required item fails in a way that breaks claim safety
+  - simulator/proxy evidence is presented as runtime evidence
+  - a headline metric lacks baseline, protocol/guardrail, or source artifact
+- `CONDITIONAL_GO`
+  - all required items pass
+  - one or more recommended items fail, or unresolved `P1` issues remain
+- `GO`
+  - all required items pass
+  - no unresolved `P1` issue weakens a headline claim
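The verdict mapping above reduces to a small decision function. A sketch, with one simplification: it treats every required-item failure as claim-breaking (`HOLD`), which is slightly stricter than the prose; the function name is ours:

```python
def release_verdict(required_failures: int, recommended_failures: int,
                    unresolved_p1: int) -> str:
    """Map checklist outcomes onto the HOLD / CONDITIONAL_GO / GO verdicts."""
    if required_failures > 0:
        # Simplification: assume any required failure breaks claim safety.
        return "HOLD"
    if recommended_failures > 0 or unresolved_p1 > 0:
        return "CONDITIONAL_GO"
    return "GO"
```

Whether an unresolved `P1` weakens a headline claim still needs a human judgment before upgrading `CONDITIONAL_GO` to `GO`.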
package/skills/artifact-review/references/style-review-checklist.md
@@ -0,0 +1,30 @@
+# Style Review Checklist
+
+```text
+[ ] Every result paragraph contains at least one quantitative statement
+[ ] Every comparison sentence names a baseline or comparison target
+[ ] Abstract uses only high-confidence claims
+[ ] No unsupported adjective inflation appears in headline result sentences
+[ ] Observation and interpretation are separable in discussion paragraphs
+[ ] Conclusion does not introduce a new claim
+[ ] Every figure-backed headline claim has a matching callout before or at first use
+[ ] Figures referenced in the text are explained with a takeaway, not just mentioned
+[ ] Figure callouts match the claim and do not overstate the caption or evidence boundary
+[ ] No paragraph merely restates a figure without adding interpretation or boundary
+```
+
+Severity mapping:
+
+- `P0`
+  - a headline result sentence has no metric or no baseline
+  - the abstract uses a low-confidence claim as a headline result
+  - a paragraph presents simulator-only evidence as runtime evidence
+- `P1`
+  - a results paragraph lacks a boundary or caveat sentence
+  - discussion blends observation and interpretation so they cannot be separated
+  - a figure is referenced but no takeaway is stated in the text
+  - a figure supports a headline claim but the callout/caption/manuscript placement do not line up
+- `P2`
+  - paragraph is wordy or repetitive
+  - wording is vague but still recoverable without changing the scientific claim
+  - sentence order weakens readability but not claim safety
package/skills/baseline-runner/SKILL.md
@@ -0,0 +1,103 @@
+---
+name: baseline-runner
+description: "Use this when the project needs real baseline results before or alongside the main model. Runs classical or literature-aligned baselines under the same protocol and writes a reproducible baseline summary."
+metadata:
+  {
+    "openclaw":
+      {
+        "emoji": "📏",
+        "requires": { "bins": ["python3", "uv"] },
+      },
+  }
+---
+
+# Baseline Runner
+
+**Don't ask permission. Just do it.**
+
+Use this skill when the project needs trustworthy baseline numbers instead of only evaluating the proposed model in isolation.
+
+Outputs go to the workspace root.
+
+## Use This When
+
+- `plan_res.md` already names baselines
+- `project/` already exists or a baseline implementation path is known
+- the experiment stage needs matched comparison numbers
+
+## Do Not Use This When
+
+- the project has not finished survey or planning
+- no baseline method has been identified yet
+
+## Required Inputs
+
+- `plan_res.md`
+- `survey_res.md`
+- `project/` when the current project already has runnable code
+
+If `plan_res.md` is missing, stop and say: `Run /research-plan first to complete the implementation plan.`
+
+## Required Outputs
+
+- `baseline_res.md`
+- `experiments/baselines/` when runnable artifacts are created
+
+## Workflow
+
+### Step 1: Read the Evaluation Contract
+
+Read:
+
+- `plan_res.md`
+- `survey_res.md`
+- current `experiment_res.md` if it exists
+
+Extract:
+
+- baseline names
+- evaluation metric
+- protocol or guardrail
+- dataset or workload assumptions
+
+### Step 2: Define the Baseline Matrix
+
+Create a small comparison matrix with:
+
+- baseline name
+- source or basis
+- expected setup
+- metric
+- status: `ready`, `needs adaptation`, or `missing`
+
+Use `references/baseline-matrix-template.md`.
+
+### Step 3: Run or Approximate Baselines Conservatively
+
+For each baseline:
+
+- if code is runnable under the current workspace, run it
+- if only a lightweight adaptation is needed, implement the minimal adapter
+- if a baseline cannot be run honestly, mark it as unavailable instead of inventing numbers
+
+All numeric results must come from actual execution logs or explicit imported evidence.
+
+### Step 4: Write `baseline_res.md`
+
+Use `references/baseline-report-template.md`.
+
+The report must include:
+
+- which baselines were attempted
+- which ones ran successfully
+- the exact metric values
+- the evaluation protocol
+- missing or partial baselines
+- the most comparable baseline for the current project
+
+## Rules
+
+1. Never fabricate baseline numbers.
+2. Keep the protocol aligned with the main experiment whenever possible.
+3. If a baseline is only partly comparable, say so explicitly.
+4. Prefer 2-3 strong baselines over a long weak list.
package/skills/baseline-runner/references/baseline-matrix-template.md
@@ -0,0 +1,9 @@
+# Baseline Matrix Template
+
+```markdown
+# Baseline Matrix
+
+| Baseline | Source | Metric | Protocol | Status | Notes |
+|----------|--------|--------|----------|--------|-------|
+| {name} | {paper/repo} | {metric} | {protocol} | ready / needs adaptation / missing | {note} |
+```
package/skills/baseline-runner/references/baseline-report-template.md
@@ -0,0 +1,25 @@
+# Baseline Report Template
+
+```markdown
+# Baseline Results
+
+## Evaluation Contract
+- dataset or workload:
+- metric:
+- guardrail or protocol:
+
+## Baselines Attempted
+
+| Baseline | Status | Result | Evidence Source | Notes |
+|----------|--------|--------|-----------------|-------|
+| {name} | ran / partial / missing | {value or N/A} | {log or file} | {notes} |
+
+## Most Comparable Baseline
+- baseline:
+- why this is the main comparison:
+
+## Gaps
+- baseline not run:
+- reason:
+- how to close the gap:
+```
package/skills/dataset-validate/SKILL.md
@@ -0,0 +1,104 @@
+---
+name: dataset-validate
+description: "Use this when the project needs a dedicated data-quality review before model review. Checks data reality, split correctness, label health, leakage risk, shape consistency, and mock-data disclosure."
+metadata:
+  {
+    "openclaw":
+      {
+        "emoji": "🗂️",
+        "requires": { "bins": ["python3", "uv"] },
+      },
+  }
+---
+
+# Dataset Validate
+
+**Don't ask permission. Just do it.**
+
+Use this skill before or alongside model implementation review when data quality needs to be checked separately from model quality.
+
+Outputs go to the workspace root.
+
+## Use This When
+
+- `plan_res.md` already exists
+- the project is about to implement or has just implemented a model
+- data quality, split quality, or label integrity is still uncertain
+
+## Do Not Use This When
+
+- the project has no concrete plan yet
+- there is no dataset or data-loading path to inspect
+
+## Required Inputs
+
+- `plan_res.md`
+- `project/` if a data pipeline already exists
+- `survey_res.md` when it defines dataset or protocol expectations
+
+If `plan_res.md` is missing, stop and say: `Run /research-plan first to complete the implementation plan.`
+
+## Required Output
+
+- `data_validation.md`
+
+## Workflow
+
+### Step 1: Read the Data Contract
+
+Read:
+
+- `plan_res.md`
+- `survey_res.md` if present
+- current data-loading code under `project/data/` if present
+
+Extract:
+
+- expected dataset name
+- source
+- split structure
+- label or target format
+- expected shapes
+
+### Step 2: Audit Data Reality
+
+Check:
+
+- whether dataset files actually exist
+- whether the data is real or mock
+- whether mock usage is clearly declared
+- whether row count / sample count is plausible
+
+### Step 3: Audit Data Integrity
+
+Check:
+
+- train / val / test split existence and separation
+- label distribution or target sanity
+- shape / dtype consistency
+- obvious leakage risks
+- preprocessing consistency with `plan_res.md`
+
+If code exists, run lightweight inspection commands under the project environment to verify counts and sample structure.
+
+### Step 4: Write `data_validation.md`
+
+Use `references/data-validation-template.md`.
+
+The report must include:
+
+- dataset identity
+- data reality check
+- split integrity
+- label / target health
+- leakage risk
+- mock-data disclosure
+- verdict: `PASS`, `NEEDS_REVISION`, or `BLOCKED`
+- exact next step
+
+## Rules
+
+1. Keep data quality separate from model quality.
+2. Never infer that data is real if the files or loading path are missing.
+3. If mock data is used, call it out explicitly.
+4. If data leakage is plausible, treat it as blocking until clarified.
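The split-separation part of the integrity audit above is the easiest to automate: overlapping sample IDs between train, val, and test are a direct leakage flag. A minimal sketch, assuming samples carry comparable IDs; the helper name is ours:

```python
def split_leakage(train_ids, val_ids, test_ids):
    """Return sample IDs shared between splits; any non-empty entry flags leakage."""
    train, val, test = set(train_ids), set(val_ids), set(test_ids)
    return {
        "train_val": sorted(train & val),
        "train_test": sorted(train & test),
        "val_test": sorted(val & test),
    }
```

Exact-ID overlap is only the most obvious case; near-duplicate content across splits needs a separate, domain-specific check.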
package/skills/dataset-validate/references/data-validation-template.md
@@ -0,0 +1,38 @@
+# Data Validation Template
+
+```markdown
+# Data Validation
+
+## Dataset Identity
+- dataset:
+- source:
+- expected split:
+
+## Reality Check
+- files present:
+- real or mock:
+- evidence:
+
+## Split Integrity
+- train split:
+- val split:
+- test split:
+- leakage risk:
+
+## Label / Target Health
+- label format:
+- distribution or range:
+- obvious anomalies:
+
+## Preprocessing Check
+- expected preprocessing:
+- observed preprocessing:
+- mismatch:
+
+## Verdict
+- PASS / NEEDS_REVISION / BLOCKED
+
+## Next Step
+- recommended command:
+- reason:
+```
package/skills/figure-standardize/SKILL.md
@@ -0,0 +1,110 @@
+---
+name: figure-standardize
+description: "Use this when the user wants to improve chart quality, standardize plotting style, regenerate release figures, or add captions/protocol notes. Normalizes fonts, colors, legends, units, and scope notes across Scientify figures."
+metadata:
+  {
+    "openclaw":
+      {
+        "emoji": "📊",
+      },
+  }
+---
+
+# Figure Standardization
+
+**Don't ask permission. Just do it.**
+
+Use this skill to turn one-off Scientify charts into release-ready figures.
+
+**Do not run new experiments here.** Work from existing results, plotting scripts, and figure bundles. If the source data is missing or inconsistent, report that explicitly instead of smoothing it over.
+
+## Required Outputs
+
+1. Updated plotting script(s) or a shared style helper
+2. Regenerated `.png` and `.pdf` files when the pipeline supports both
+3. A figure spec file:
+   - prefer `reports/figures/figure_spec.md`
+   - otherwise `project/figures/figure_spec.md`
+4. `paper/figures_manifest.md` when the figure family is paper-facing or a `paper/` workspace already exists
+
+## Workflow
+
+### Step 1: Inspect Inputs
+
+Read:
+
+- existing figures
+- the generator script(s)
+- the result tables / JSON / Markdown that feed the figures
+- any surrounding README or release notes that explain the figure family
+
+Prefer improving an existing generator over creating a second one-off script.
+
+### Step 2: Standardize the Figure Family
+
+Normalize the full family, not just one chart:
+
+- font family and title hierarchy
+- semantic color mapping
+- axis labels and units
+- legend order and naming
+- decimal precision and tick formatting
+- line widths / marker sizes
+- caption structure
+- protocol note wording
+- callout wording
+- paper placement intent
+
+Use:
+
+- `references/figure-style-guide.md`
+- `references/caption-template.md`
+- `references/figure-placement-template.md`
+
+### Step 3: Write the Figure Spec
+
+Create or update `figure_spec.md` with one section per figure:
+
+- figure filename
+- source files
+- metrics shown
+- baseline or comparison family
+- quality guard / evaluation constraint
+- simulator/runtime note
+- intended takeaway
+
+If the figure is used in a paper or paper-facing report, also create or update the matching entry in `paper/figures_manifest.md` with:
+
+- `figure_id`
+- `file_path`
+- `latex_label`
+- `section`
+- `placement_hint`
+- `caption_short`
+- `caption_long`
+- `takeaway_sentence`
+- `callout_sentence`
+- `baseline`
+- `evidence_type`
+- `source_metrics`
+- `source_files`
+- `supports_claim_ids`
+- `must_appear_before_claim_ids`
+
+Keep `figure_spec.md` and `paper/figures_manifest.md` aligned. The spec is the release-facing summary; the manifest is the paper-facing contract.
+
+### Step 4: Re-render and Verify
+
+Re-render the figures after script updates.
+
+Keep filenames stable unless the user explicitly asked for a new release bundle.
+
+## Figure Rules
+
+1. Keep metric semantics identical across a figure family.
+2. Always show units explicitly.
+3. If a result comes from simulator or proxy evaluation, state that in the caption or protocol note.
+4. Do not hide failing or quality-guard-breaking baselines; mark them clearly.
+5. Do not change the scientific claim. This skill improves packaging, not evidence.
+6. If a figure is paper-facing, produce both a long caption and a first-use callout sentence.
+7. If a figure supports a claim, the manifest must name that claim in `supports_claim_ids`.
package/skills/figure-standardize/references/caption-template.md
@@ -0,0 +1,12 @@
+# Caption Template
+
+```text
+caption_short:
+Figure X. {Short statement of what is compared}.
+
+caption_long:
+Figure X. {What the figure shows and the main takeaway}. Baseline: {baseline family}. Metric: {metric + unit}. Evidence type: {simulator/local_runtime/runtime}. Protocol: {guardrail or evaluation scope}. Boundary: {scope note if needed}.
+
+callout_sentence:
+Figure \ref{{latex_label}} shows {takeaway_sentence} under {protocol or evaluation frame}.
+```
package/skills/figure-standardize/references/figure-placement-template.md
@@ -0,0 +1,30 @@
+# Figure Placement Template
+
+Use this guide when assigning `section` and `placement_hint` in `paper/figures_manifest.md`.
+
+Recommended fields:
+
+```yaml
+section: "main_results"
+placement_hint: "figure[t]"
+must_appear_before_claim_ids:
+  - "claim-001"
+```
+
+Placement hint options:
+
+- `figure[t]`
+  - standard single-column top placement
+- `figure[b]`
+  - standard single-column bottom placement
+- `figure*[t]`
+  - wide figure for two-column layouts
+- `inline_reference_only`
+  - no figure block in the current section; the text only points to a figure placed elsewhere
+
+Placement rules:
+
+- Put the figure in the same section that carries its main supported claim whenever possible.
+- If a figure supports a headline result, the first callout should appear before or immediately adjacent to the corresponding claim.
+- Do not place a figure so late that the reader sees the claim well before the figure is introduced.
+- Use `must_appear_before_claim_ids` to mark claims that should not appear before the figure callout.
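The ordering constraint above (callout before claim) can be spot-checked mechanically against the manuscript source. A minimal sketch under the assumption that callout and claim sentences appear verbatim in the text; the function name is ours:

```python
def callout_precedes_claims(manuscript: str, callout: str,
                            claim_sentences: list[str]) -> bool:
    """True if the figure callout appears before every listed claim sentence.

    Claims absent from the text are ignored; a missing callout fails outright.
    """
    at = manuscript.find(callout)
    if at == -1:
        return False
    return all(
        manuscript.find(claim) == -1 or manuscript.find(claim) > at
        for claim in claim_sentences
    )
```

Run it once per `must_appear_before_claim_ids` entry; any `False` means the placement or the callout position needs to move.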
package/skills/figure-standardize/references/figure-style-guide.md
@@ -0,0 +1,36 @@
+# Figure Style Guide
+
+Use one consistent style across a figure family:
+
+- one font family
+- one semantic color per method family
+- one stable baseline line style
+- explicit units in axis labels
+- compact legend names
+- protocol note under the figure or in the caption
+- one stable caption and callout structure across the family
+
+Required paper-facing figure contract fields:
+
+- `caption_short`
+- `caption_long`
+- `callout_sentence`
+- `placement_hint`
+- `supports_claim_ids`
+
+Required caption fields:
+
+- what is compared
+- which baseline is used
+- what metric is shown
+- what evidence type supports the figure
+- what protocol or guardrail defines the evaluation regime when that matters for the claim
+
+Paper-facing figures should stay aligned across four surfaces:
+
+1. `paper/figures_manifest.md`
+2. the first prose callout
+3. the figure caption
+4. the eventual LaTeX figure block
+
+Do not let these four surfaces drift apart.
package/skills/release-layout/SKILL.md
@@ -0,0 +1,73 @@
+---
+name: release-layout
+description: "Use this when the user wants to improve README, docs pages, or microsites so a new reader can understand what the project is, how to use it, what artifacts exist, and what the scope boundaries are within one screen."
+metadata:
+  {
+    "openclaw":
+      {
+        "emoji": "🪄",
+      },
+  }
+---
+
+# Release Layout
+
+**Don't ask permission. Just do it.**
+
+Use this skill for outward-facing packaging surfaces such as:
+
+- `README.md`
+- `docs/index.html`
+- release page generator scripts
+
+This skill improves structure and legibility. It does **not** upgrade the scientific claim on its own.
+
+## Core Goal
+
+A first-time reader should understand, within one screen:
+
+1. what this is
+2. how to use it
+3. what artifacts it produces
+4. what the scope boundary is
+
+## Workflow
+
+### Step 1: Detect the Real Edit Target
+
+If a page is generated by a script, prefer editing the generator rather than the built HTML.
+
+If `review/release_gate.json` exists, read it before polishing release-facing copy.
+
+### Step 2: Audit the First Screen
+
+Check whether the hero / opening section answers the four core questions above.
+
+### Step 3: Reshape the Page
+
+Prefer this order:
+
+1. hero / product definition
+2. quick-start or usage path
+3. artifact map
+4. evidence / results block
+5. scope note
+6. FAQ or next steps
+
+Use `references/page-structure.md`.
+
+### Step 4: Clean the Reading Path
+
+Reduce:
+
+- duplicated claims
+- buried usage instructions
+- unexplained metrics
+- isolated figures without framing text
+
+## Safety Rules
+
+1. Do not hide limitations for the sake of visual polish.
+2. Do not introduce stronger language than the underlying artifacts support.
+3. If the result is simulator-only, say that near the top instead of burying it below the fold.
+4. If the release gate is `HOLD`, stale, or missing for a share-ready artifact set, do not present the project as fully ready to share.
package/skills/release-layout/references/page-structure.md
@@ -0,0 +1,14 @@
+# Page Structure
+
+Recommended first-screen order:
+
+1. one-line definition
+2. quick-start
+3. artifact outputs
+4. evidence boundary
+
+Avoid:
+
+- leading with large result claims before the project is defined
+- hiding usage instructions below the fold
+- showing figures without telling the reader what they mean
|