npm - medsci-skills - Versions diffs - 4.8.0 → 4.9.0 - Mend

medsci-skills 4.8.0 → 4.9.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (56)

package/README.md CHANGED

@@ -268,6 +268,11 @@ The E2E pipeline (`orchestrate --e2e`) produces everything up to `qc/`. The `sub
 ## What's New
+**v4.9** — analysis-integrity hardening promoted from real review cycles, plus journal-mechanics additions. Additive and backward-compatible; still 45 skills / 36 guidelines, analysis-integrity detectors **32 → 36**:
+- **Four new gates** — a **duplicate-bibliography** check (`check_reference_duplication.py`) for the hybrid `[@key]` + hand-typed `## References` build that renders the list twice; a **cross-script binning / composite-indicator** consistency check (`check_binning_consistency.py`, `BINNING_DRIFT` / `DERIVED_DEF_DRIFT`) for a derived categorical or composite indicator defined inconsistently across analysis scripts; a **float citation-order** check (`check_citation_order.py`) for numbered Tables/Figures not first cited in ascending order per series; and an **audit-dump leak** gate (`/sync-submission`) that blocks a `/check-reporting` output mistakenly attached as a submission file.
+- **KJR technical-check conventions + percentage-decimal style**, reader-allocation-under-burden and generative-image-as-study-object reporting (`/design-ai-benchmarking`, `/check-reporting`), and a **Liver International** CSL with that journal's submission mechanics (`/manage-refs`).
 **v4.8** is the **review-harvest batch** — deterministic detector hardening promoted from real-manuscript review cycles. Additive and backward-compatible; still 45 skills / 36 guidelines, analysis-integrity detectors **30 → 32**:
 - **Two new gates** — `check_supplement_hygiene.py` lints the rendered supplement / tables / caption files (not just the manuscript) for §-labels, placeholders, build markers, response-letter framing, and unresolved body↔supplement cross-references; `check_null_calibration.py` flags a headline negative/equivalence claim made without a minimum-detectable-effect / power / equivalence statement.
@@ -356,7 +361,7 @@ Earlier in this series: analysis-integrity guards (confounding completeness, cla
 | **Battle-tested** | Used on real manuscript submissions by a practicing physician-researcher | Unknown provenance and validation |
 | **Depth per skill** | 150-600 lines of documentation + bundled reference files (curated journal profile library, checklists, formula sheets, code templates) | Typically thin SKILL.md templates |
-**MedSci-Audit** — the verification edge in the first rows above is a named suite of **28 deterministic detectors** (citation & reference integrity, cohort & pool arithmetic, scope/estimand contracts, reporting compliance, and more) that catch fabricated or drifted content before a manuscript reaches a reviewer. See **[`MEDSCI_AUDIT.md`](MEDSCI_AUDIT.md)** for the suite, its six families, and its evaluation evidence.
+**MedSci-Audit** — the verification edge in the first rows above is a named suite of **36 deterministic detectors** (citation & reference integrity, cohort & pool arithmetic, scope/estimand contracts, reporting compliance, and more) that catch fabricated or drifted content before a manuscript reaches a reviewer. See **[`MEDSCI_AUDIT.md`](MEDSCI_AUDIT.md)** for the suite, its six families, and its evaluation evidence.
 ---
@@ -601,6 +606,17 @@ Projects declare their source-of-truth layout in `SSOT.yaml`, and a `qc/migratio
 ### Skills Work Together
 Skills call each other. `check-reporting` invokes `make-figures` for PRISMA diagrams. `write-paper` calls `search-lit` for citation verification. `self-review` delegates reporting compliance to `check-reporting`. `calc-sample-size` output feeds directly into `write-protocol`'s IRB justification section.
+### Skill boundaries — which to use, and in what order
+The skill set is deliberately *specialized, not consolidated* — each skill owns a distinct artifact or lifecycle step, so the routing stays precise. The boundaries that are easy to confuse:
+- **Reference pipeline** — `search-lit` (discover candidates) → `lit-sync` (sole writer of `refs.bib`, syncs Zotero/Obsidian) → `manage-refs` (render CSL / inject CWYW / cross-ref QC, sole writer of the rendered DOCX) → `verify-refs` (read-only audit; never edits `refs.bib`). They are one pipeline, not four overlapping tools.
+- **Language passes run in order** — `humanize` (remove AI-writing tells) → `polish-language` (deterministic ESL/house-style consistency: abbreviations, spelling, en-dashes, p-value case) → `academic-aio` (AI-search/GEO visibility). Three sequential passes with non-overlapping jobs.
+- **Manuscript type picks the skill** — `write-paper` (original/IMRAD articles, case reports, MAs) vs `review-paper` (narrative / scoping / systematic literature reviews) vs `revise` (reviewer-response + tracked changes). Different structures and reporting guidelines.
+- **Author vs external reviewer** — `self-review` is your own pre-submission check (anticipated comments); `peer-review` drafts a journal-facing review as an external reviewer. Same domain probes, different user and output.
+- **Project entry** — `intake-project` classifies and scaffolds a *new or messy folder*; `orchestrate` routes a *goal or task* ("help me write a paper"). Start with `intake-project` when you have files but no structure, `orchestrate` when you have a task but no plan.
+- **Study design** — `design-study` covers general validity (analysis unit, leakage, comparator, validation) **and** carries a design-stage ceiling gate for perceptual / observer / reader / visual-Turing-test / image-provenance studies; `design-ai-benchmarking` specializes in AI-vs-human-expert evaluation (rubrics, calibration probes, LLM-as-judge).
+- **Content vs template** — `write-protocol` drafts IRB/ethics scientific content; `fill-protocol` renders that content into an institutional Word template without breaking its formatting.
 ### Validation status — available vs CI-gated vs evaluated
 Be precise about what "validated" means here — the three tiers are different facts:
 - **Available** — every bundled skill and deterministic detector. The current totals are the single source of truth in [`metadata/catalog_counts.json`](metadata/catalog_counts.json) and [`MEDSCI_AUDIT.md`](MEDSCI_AUDIT.md).
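
The float citation-order rule described in the v4.9 notes above (numbered Tables/Figures should be first cited in ascending order within each series) can be illustrated with a minimal sketch. This is not the packaged `check_citation_order.py`, whose source is not shown in this diff; the function name `first_citation_order_violations`, the regex, and the two-series split are assumptions made for illustration only.

```python
import re
from collections import defaultdict

# Hypothetical pattern: matches "Table 2", "Figure 3", "Fig. 1".
# It does not separate supplementary series ("Supplementary Table 1"
# is treated as "Table 1"), which a real checker would need to handle.
FLOAT_REF = re.compile(r"\b(Table|Figure|Fig\.?)\s+(\d+)", re.IGNORECASE)


def first_citation_order_violations(text):
    """Return (series, number, highest_seen_before) for each float whose
    first citation comes after a higher-numbered float of the same series."""
    first_seen = defaultdict(list)  # series -> numbers in order of first citation
    for match in FLOAT_REF.finditer(text):
        series = "Figure" if match.group(1).lower().startswith("fig") else "Table"
        number = int(match.group(2))
        if number not in first_seen[series]:
            first_seen[series].append(number)

    violations = []
    for series, order in first_seen.items():
        highest = 0
        for number in order:
            if number < highest:
                violations.append((series, number, highest))
            highest = max(highest, number)
    return violations
```

Only the first mention of each float counts, so re-citing Table 1 after Table 3 is fine as long as Table 1 was already introduced earlier.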

package/metadata/distribution_files.json CHANGED

@@ -51,6 +51,11 @@
       "size": 25500,
       "sha256": "6a632a88617889a1ac36418822b8af3f2bcab75bfa28169e99ae4fdf0b810365"
     },
+    {
+      "path": "skills/MAINTENANCE.md",
+      "size": 4061,
+      "sha256": "a4eaa6062e7d5879afcdac3bd954fcb783282707eea22b815d5a6f794d5a5217"
+    },
     {
       "path": "skills/academic-aio/SKILL.md",
       "size": 31396,
@@ -148,8 +153,8 @@
     },
     {
       "path": "skills/analyze-stats/SKILL.md",
-      "size": 46340,
-      "sha256": "427b587784ab62562184299f6fcb9625275c42d9191075635be1f31a3f69def3"
+      "size": 47388,
+      "sha256": "12121ea6224d8c75d4aa98a6e2ee2947c95cfc17a3902780e7bb8d7ddb0be052"
     },
     {
       "path": "skills/analyze-stats/references/analysis_guides/mediation.md",
@@ -468,8 +473,8 @@
     },
     {
       "path": "skills/check-reporting/SKILL.md",
-      "size": 32921,
-      "sha256": "6901d90a56ac4f752988b533310eedb773819241774befc2bbc64bb592d52e9f"
+      "size": 35835,
+      "sha256": "a11617fb2bcf03b63a788638ad68ab9dac8623281e8b58428706b7c43a02e8c3"
     },
     {
       "path": "skills/check-reporting/references/LICENSES.md",
@@ -666,6 +671,11 @@
       "size": 4565,
       "sha256": "f955a0479da6474e43ece05361838f8db95923ec9f7dc56863afbf4cba66174d"
     },
+    {
+      "path": "skills/check-reporting/references/genai_image_study_object_decision_aid.md",
+      "size": 4287,
+      "sha256": "34f79571566ef06eee0fc4c8c646be530806fca658720902d16642faadc8844b"
+    },
     {
       "path": "skills/check-reporting/references/step4c_registration_timing.md",
       "size": 4197,
@@ -853,8 +863,13 @@
     },
     {
       "path": "skills/design-ai-benchmarking/SKILL.md",
-      "size": 10820,
-      "sha256": "28bc4edc34b28e3d40d176743a1ccaa60b1c559ee1e401ca94c4a14131c10630"
+      "size": 12094,
+      "sha256": "b8f794a1f6c800d821305a4df8a797bea61cf34a602e0dc0dbea8f2c0c458ca5"
+    },
+    {
+      "path": "skills/design-ai-benchmarking/references/anchor_rotate_reader_allocation.md",
+      "size": 4585,
+      "sha256": "a763572efd764118e6ee57c950268c175cfeeecca00a43be53412e97c053421d"
     },
     {
       "path": "skills/design-ai-benchmarking/references/benchmark_export_schema.json",
@@ -1198,8 +1213,8 @@
     },
     {
       "path": "skills/find-journal/references/journal_profiles/KJR.md",
-      "size": 2553,
-      "sha256": "11aea434bb653d304d107133f9af5a74937907b558b22dd28029f7fdba64f31b"
+      "size": 3036,
+      "sha256": "a0814e6d62288389db7528b73a25db870ab91635dc4b946fb0c8bf8af47150a3"
     },
     {
       "path": "skills/find-journal/references/journal_profiles/Korean_Circulation_Journal.md",
@@ -1968,8 +1983,8 @@
     },
     {
       "path": "skills/manage-project/SKILL.md",
-      "size": 12313,
-      "sha256": "ea1925634e5da4eff202fd34ae874f64b7b76d577ffa0bd49c406a5d84bd4441"
+      "size": 12315,
+      "sha256": "40c3a0098a3729b839e132db3987ad0f3fc5f3eeaf1c5a56dd77673cffdb5dbd"
     },
     {
       "path": "skills/manage-project/references/pre_submission_checklist.md",
@@ -2018,13 +2033,13 @@
     },
     {
       "path": "skills/manage-refs/SKILL.md",
-      "size": 17029,
-      "sha256": "61441a243ea113e9a5aeba3154ca741db9f1301118051f461b0a3c3de7b61fdf"
+      "size": 18165,
+      "sha256": "49adc82dea2b5d7eb93946b2cd8143d66d50b53169d3ed18f4bd738bfe3af39f"
     },
     {
       "path": "skills/manage-refs/citation_styles/README.md",
-      "size": 2001,
-      "sha256": "e25f3c5689112527d0eccabcd5dc69a061f56355b3e02c8b71a1d831e04e443d"
+      "size": 2205,
+      "sha256": "d957bbdd13df10884fd54f9eb4efb096a73824c69167f872b0cb9819be031cdf"
     },
     {
       "path": "skills/manage-refs/citation_styles/american-journal-of-roentgenology.csl",
@@ -2061,6 +2076,11 @@
       "size": 5849,
       "sha256": "edde670da20212820d54649dcb96594db835eb55498e88c7de41891dfb370114"
     },
+    {
+      "path": "skills/manage-refs/citation_styles/liver-international.csl",
+      "size": 18264,
+      "sha256": "c7c144ff5df948fc09c9604bf9f8269c6cd427c29bd043da6ead24e75c80971f"
+    },
     {
       "path": "skills/manage-refs/citation_styles/nature.csl",
       "size": 6444,
@@ -2118,8 +2138,13 @@
     },
     {
       "path": "skills/manage-refs/scripts/check_csl_render.py",
-      "size": 5151,
-      "sha256": "bdaf85f75a2ebfb0224d39b4862ccf31dd914453b540de8f3c1ea6e2a0dccc48"
+      "size": 7718,
+      "sha256": "a1848c33e945024719fa1b7cc996d37555801e011c6280be6644ae4f01642601"
+    },
+    {
+      "path": "skills/manage-refs/scripts/check_reference_duplication.py",
+      "size": 10210,
+      "sha256": "439f02252338e204dadf24dc4de13e38ab3ce7b6ea394e8dee38b8ee1cf92524"
     },
     {
       "path": "skills/manage-refs/scripts/check_xref.py",
@@ -2158,8 +2183,8 @@
     },
     {
       "path": "skills/meta-analysis/SKILL.md",
-      "size": 55293,
-      "sha256": "d370d5fddffaa5fa26f438a1157d4b96412dbbf7cbe65c9c28cc8474cc736d0b"
+      "size": 49604,
+      "sha256": "4947eae188dc2fcfba68ed991ba126da8315dc79deac0649d2beea558e73025e"
     },
     {
       "path": "skills/meta-analysis/references/LICENSES.md",
@@ -2216,6 +2241,11 @@
       "size": 5538,
       "sha256": "9b2dc03572cb066528e1ef19b0699ec9434ec29cbad8b84f5fbab1492ead5480"
     },
+    {
+      "path": "skills/meta-analysis/references/empirical_lessons.md",
+      "size": 7616,
+      "sha256": "f49ebc21369095d19661d39186d1c41368811fbe689c77762daf60c74cd73ee8"
+    },
     {
       "path": "skills/meta-analysis/references/icmje_coi_guide.md",
       "size": 6043,
@@ -2773,8 +2803,8 @@
     },
     {
       "path": "skills/revise/SKILL.md",
-      "size": 27309,
-      "sha256": "80d274dbca054955c1f4953e9db4f2bf0a02aaf91a26a54ecedd855546dbaceb"
+      "size": 27775,
+      "sha256": "2da4f80e879c2d3ff31d2af435cb27ecae0ba09f1014b8ebbd799ac2472ff1ea"
     },
     {
       "path": "skills/revise/references/r2r_voice.md",
@@ -2848,8 +2878,8 @@
     },
     {
       "path": "skills/self-review/SKILL.md",
-      "size": 89813,
-      "sha256": "54bcd9c6e751555044b9b436db9e8f10e0be280b60c00048c68de5570512bf16"
+      "size": 93517,
+      "sha256": "92b6e1c0e6cdaa27d5f033ce28e58a210208d619793d221f9ffa944aa1055bba"
     },
     {
       "path": "skills/self-review/references/domain-probes/ai_overclaiming.md",
@@ -2961,6 +2991,16 @@
       "size": 17113,
       "sha256": "56096c39ddb0083c04a1254f06bafa6fac9fc8a136c9246f68773f0ba5da96d4"
     },
+    {
+      "path": "skills/self-review/scripts/check_binning_consistency.py",
+      "size": 19541,
+      "sha256": "e3bf7dd2e0871ce6905abc1d33a26c7afac76a93d184bfe2d431af97d0622f74"
+    },
+    {
+      "path": "skills/self-review/scripts/check_citation_order.py",
+      "size": 8705,
+      "sha256": "38525b4dd3ca8c9d99f090e4d42b65f10baf442f56fc4eac5174fb6ba13d90bb"
+    },
     {
       "path": "skills/self-review/scripts/check_claim_artifact.py",
       "size": 10757,
@@ -2968,8 +3008,8 @@
     },
     {
       "path": "skills/self-review/scripts/check_classical_style.py",
-      "size": 10953,
-      "sha256": "3b5c85edc57ee607a2b0a10898d68ac163ce3be6fe50b82c8c11490d7bc2705a"
+      "size": 12210,
+      "sha256": "c973ee8b776f28515439fb185e1254e08e62c2e1410e260f18a824241a331af0"
     },
     {
       "path": "skills/self-review/scripts/check_cohort_arithmetic.py",
@@ -3038,14 +3078,19 @@
     },
     {
       "path": "skills/sync-submission/SKILL.md",
-      "size": 26836,
-      "sha256": "7aa91cc5355c3257877e0e9fdb12b60a7315f3865faa225ece32a4cc6a9c2d76"
+      "size": 27787,
+      "sha256": "4da14c76c6c9326d31ee93e9515854291cba2c48692eb85cf5d9f6f4301ce465"
     },
     {
       "path": "skills/sync-submission/references/journal_availability_policy.json",
       "size": 1257,
       "sha256": "6d278675d7c734aa3589165817f5413cc46c44402ea15039e51052ab2f52c0a8"
     },
+    {
+      "path": "skills/sync-submission/scripts/_yaml_frontmatter.py",
+      "size": 1669,
+      "sha256": "028fa8c4f7a4440c72d693a2ba6d4799410de0c565c61bd30d68eb0e7c208c78"
+    },
     {
       "path": "skills/sync-submission/scripts/assemble_supplement.py",
       "size": 8979,
@@ -3066,6 +3111,11 @@
       "size": 13869,
       "sha256": "caba039c6cfbfa09aec681a9840c7e0b5650cccdf9e00ddfd869557b0fec57c8"
     },
+    {
+      "path": "skills/sync-submission/scripts/check_checklist_dump_leak.py",
+      "size": 8745,
+      "sha256": "320765b9e975601fc2d73ce15a65b1419668982630c3b7546d0909158e5a5374"
+    },
     {
       "path": "skills/sync-submission/scripts/check_cross_artifact_stale.py",
       "size": 8286,
@@ -3078,13 +3128,13 @@
     },
     {
       "path": "skills/sync-submission/scripts/check_wordcount_cap.py",
-      "size": 10053,
-      "sha256": "7d47c194bdde03accbb6fab6347621cb3efdec17131651d7e58b4a69b4f0f0c6"
+      "size": 9788,
+      "sha256": "16fecbceae672e4192a138a0509321ea367f61079ca4ef4d630667b1e64eda58"
     },
     {
       "path": "skills/sync-submission/scripts/cover_letter_drift_check.py",
-      "size": 16554,
-      "sha256": "3188551ea2557ec7445f668a4d4b64396e94c5d691114b3d5bf52a16ef27cd7b"
+      "size": 16001,
+      "sha256": "347c5b702fbe9375899795791a34e8a60e246253bafb53b15ebb59d51dd45e7d"
     },
     {
       "path": "skills/sync-submission/scripts/cross_document_n_check.py",
@@ -3098,8 +3148,8 @@
     },
     {
       "path": "skills/sync-submission/scripts/preflight_gate.py",
-      "size": 20061,
-      "sha256": "e3e7a300f258cab373410504c1f48d27a1b75c49a12d7fa5cc0f8ba62ab86c4b"
+      "size": 20954,
+      "sha256": "f4be9edf587ec5ea2b7fb782e4912173b78a4fe2035720e6d0681b3d8f36340f"
     },
     {
       "path": "skills/sync-submission/scripts/scope_drift_check.py",
@@ -3418,8 +3468,8 @@
     },
     {
       "path": "skills/write-paper/references/journal_profiles/KJR.md",
-      "size": 10367,
-      "sha256": "e74e6454054dc3665c6012b973020222f9d02e469d195a60b2570e850fc23e2c"
+      "size": 12737,
+      "sha256": "ea71f1be90ff7088ba8931c97515f221486ca7ac9c7079fefba25417e7a0e932"
     },
     {
       "path": "skills/write-paper/references/journal_profiles/Korean_Circulation_Journal.md",
@@ -3438,8 +3488,8 @@
     },
     {
       "path": "skills/write-paper/references/journal_profiles/Liver_International.md",
-      "size": 9474,
-      "sha256": "50b7cc65792c17b406ffa4cb32cfb54963d0972b71e48730d4e92a7172dcbe12"
+      "size": 12174,
+      "sha256": "4a12d53605045b20aabc827e4b803edd73f603b6112ced34ec8acfef965950aa"
     },
     {
       "path": "skills/write-paper/references/journal_profiles/Medical_Image_Analysis.md",

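Each entry in the hunks above records a `path`, a `size`, and a `sha256`. A minimal sketch of checking one file on disk against one such entry follows. It assumes that `size` is the byte length and that `sha256` is the hex digest of the raw file bytes; the diff does not state either, and the function name `verify_entry` is hypothetical. The top-level layout of `distribution_files.json` is not visible here, so the sketch takes a single entry rather than parsing the whole file.

```python
import hashlib
from pathlib import Path


def verify_entry(entry, root):
    """Compare one manifest entry against the file under `root`.

    Returns a list of problems: [] when the file matches, ["missing"] when
    it does not exist, otherwise any of "size" and "sha256".
    Assumption: size is a byte count and sha256 is a hex digest of the bytes.
    """
    file_path = Path(root) / entry["path"]
    if not file_path.is_file():
        return ["missing"]

    data = file_path.read_bytes()
    problems = []
    if len(data) != entry["size"]:
        problems.append("size")
    if hashlib.sha256(data).hexdigest() != entry["sha256"]:
        problems.append("sha256")
    return problems
```

Reading the file once and deriving both checks from the same bytes avoids a race in which the file changes between the size check and the hash.
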
package/metadata/distribution_manifest.json CHANGED

@@ -1,6 +1,6 @@
 {
   "schema_version": 1,
-  "version": "4.8.0",
+  "version": "4.9.0",
   "owned_skills": [
     "academic-aio",
     "add-journal",

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "medsci-skills",
-  "version": "4.8.0",
+  "version": "4.9.0",
   "description": "MedSci Skills — a medical/scientific research skill suite for AI coding agents (Claude Code, Codex, Cursor, Copilot). The npm package is a terminal-friendly installer shortcut; the canonical distribution remains the GitHub repository and the Claude Code plugin marketplace.",
   "license": "SEE LICENSE IN LICENSE",
   "homepage": "https://github.com/Aperivue/medsci-skills#readme",

package/skills/MAINTENANCE.md ADDED

@@ -0,0 +1,68 @@
+# Skill Script Maintenance — taxonomy & wiring rules
+Every `.py`/`.sh` under `skills/*/scripts/` and `skills/*/tests/` falls into one of
+four categories. Misclassifying one is how a detector goes "dormant" (counted in the
+catalog but never invoked) or how a regression test gives false coverage (exists but
+never runs in CI). This file is the source of truth for which is which and what each
+category must satisfy.
+## 1. Counted analysis-integrity detector
+A script whose **filename** matches the catalog glob — `check_*.py`, `detect_*.py`,
+`derive_*.py`, or `verify_refs.py` — under `skills/*/scripts/`. The glob is the SSOT:
+`scripts/gen_detectors_catalog_json.py` and `scripts/validate_catalog_consistency.py`
+both count these and they must agree with `metadata/catalog_counts.json`
+(`integrity_detectors`).
+A counted detector MUST:
+- be registered in `scripts/gen_detectors_catalog_json.py` `FAMILY_BY_ID` (an unmapped id
+  fails generation), and bump `metadata/catalog_counts.json` + `MEDSCI_AUDIT.md` when added;
+- be **invoked** from its skill's `SKILL.md` (a named workflow step) — otherwise it is
+  dormant (counted but never run on a real manuscript);
+- have a **CI-wired** regression test (a `tests/test_*.sh` step in
+  `.github/workflows/validate.yml`) with PII-free synthetic fixtures.
+> Naming trap: a reusable helper must NOT be named `check_*`/`detect_*` or it inflates the
+> detector count. Prefix helpers with `_` (see category 2).
+## 2. Helper / library module
+Shared logic imported by other scripts in the **same** skill (skills are self-contained —
+no cross-skill imports). Name it with a leading underscore (`_yaml_frontmatter.py`) or a
+plain verb (`fill_journal_abbrev.py`) so the detector glob never counts it. Helpers do not
+need their own SKILL.md step, but if a user runs them directly they should be listed in the
+skill's tool table (e.g. `manage-refs` documents `fill_journal_abbrev.py`).
+## 3. Run-once authoring tool
+A generator a maintainer runs by hand to (re)build a committed asset — NOT invoked at skill
+invocation. These are intentionally not wired into any SKILL.md step. Keep them; document
+their purpose in their own docstring. Current run-once tools:
+- `skills/make-figures/scripts/build_jacc_template.py` — rebuilds the committed JACC Central
+  Illustration PPTX template (`references/visual_abstract_templates/jacc_central_illustration.pptx`).
+- `skills/make-figures/scripts/extract_exemplar_from_pdf.py` — extracts a figure region from a
+  PDF page to grow the make-figures Critic-Loop exemplar reference set.
+## 4. Test fixture / regression test
+Lives under `skills/<skill>/tests/`. A `test_*.sh`/`test_*.py` is only real coverage if it
+is wired into `.github/workflows/validate.yml` as a `run:` step. **Adding a test file is not
+enough** — if it is not listed in `validate.yml` it never runs and gives false confidence.
+When you add a detector and its test in the same PR, add the `validate.yml` step in that PR.
+## When you touch a skill script — checklist
+1. New `check_*`/`detect_*` detector → register in `gen_detectors_catalog_json.py`
+   (`FAMILY_BY_ID`) + bump `catalog_counts.json` + `MEDSCI_AUDIT.md` + wire into the skill's
+   `SKILL.md` + add a CI-wired test. Then run all three generators in `--check` mode.
+2. New helper → underscore/plain name (never `check_*`), import only within the same skill.
+3. New asset/fixture file → re-run `python3 scripts/gen_distribution_manifest.py` (it tracks
+   payload files and hashes; tests are excluded from the distributed payload but the manifest
+   `--check` still gates on edited payload scripts).
+4. New/edited test → add its `run:` step to `.github/workflows/validate.yml`.
+Run the full local CI-mirror before pushing (see the repo `CONTRIBUTING.md` / `validate.yml`
+gates): `validate_skills.sh`, the three `gen_*.py --check`, `validate_catalog_consistency.py`,
+`check_version_consistency.py`, `gen_skill_docs.py --check`, `check_locale_inventory.py`,
+`validate_routing_assets.py --strict`, and the installer tests.
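
The naming rule in MAINTENANCE.md (a script under `skills/*/scripts/` counts as a detector when its filename matches `check_*.py`, `detect_*.py`, `derive_*.py`, or `verify_refs.py`) can be sketched as below. This is an illustration of the glob logic only, not the code in `gen_detectors_catalog_json.py` or `validate_catalog_consistency.py`, which this diff does not show; `classify_script` and its return labels are hypothetical. Helpers and run-once tools cannot be told apart by filename alone, so the sketch groups them.

```python
from fnmatch import fnmatchcase

# Filename globs quoted from MAINTENANCE.md category 1.
DETECTOR_GLOBS = ("check_*.py", "detect_*.py", "derive_*.py", "verify_refs.py")


def classify_script(rel_path):
    """Classify a repo-relative path such as 'skills/<skill>/scripts/<file>'.

    Labels are illustrative: 'counted-detector', 'test',
    'helper-or-run-once', or 'out-of-scope'.
    """
    parts = rel_path.split("/")
    if len(parts) < 4 or parts[0] != "skills":
        return "out-of-scope"
    if parts[2] == "tests":
        return "test"
    if parts[2] != "scripts":
        return "out-of-scope"
    # Only files directly under scripts/ are matched against the glob.
    if len(parts) == 4 and any(fnmatchcase(parts[3], g) for g in DETECTOR_GLOBS):
        return "counted-detector"
    return "helper-or-run-once"
```

Under this rule a helper named `check_utils.py` would be counted as a detector, which is the naming trap the document warns about; the leading underscore in `_yaml_frontmatter.py` keeps it out of the count.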

package/skills/analyze-stats/SKILL.md CHANGED

@@ -55,6 +55,8 @@ from `analysis_guides/` to ensure correct methodology and reporting.
 ### Phase 2: Analysis Plan
+**Precondition (observational studies).** Before proposing an analysis plan for an observational design (cohort, case-control, cross-sectional, registry, or survey), confirm that a literature-grounded variable operationalization exists — a `variable_operationalization.md` from `/define-variables`, or an equivalent codebook-backed definition table. If none exists, **warn** the user and recommend running `/define-variables` first, so exposure / outcome / covariate definitions and cutoffs are citation-backed rather than invented ad hoc from the data dictionary (ad-hoc phenotype/cutoff definitions are a common reviewer-rejection trigger for observational work — see the dictionary-first discipline). This is a WARN, not a hard block: proceed on explicit user confirmation, recording that the operationalization artifact was not available. For stricter projects, treat the missing artifact as a hard stop until `/define-variables` has run. (This mirrors the same precondition already enforced in `/write-protocol` before drafting Methods.)
 Based on the data structure and research question, propose an analysis plan:
 1. **Auto-detect analysis type** from the table below, or accept user specification.
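
The observational-study precondition added in this hunk (WARN by default, hard stop for stricter projects, proceed on explicit confirmation) can be summarized as a small decision function. This is a sketch under stated assumptions, not part of the skill: the function name, the return labels, and looking for `variable_operationalization.md` at the project root are all hypothetical, and the "equivalent codebook-backed definition table" alternative is not modelled.

```python
from pathlib import Path

# Observational designs listed in the precondition text.
OBSERVATIONAL = {"cohort", "case-control", "cross-sectional", "registry", "survey"}


def operationalization_gate(project_dir, design, strict=False, user_confirmed=False):
    """Return 'ok', 'warn', 'block', or 'proceed-unbacked' (illustrative labels).

    Assumption: the artifact lives at <project_dir>/variable_operationalization.md.
    """
    if design not in OBSERVATIONAL:
        return "ok"  # precondition applies to observational designs only
    if (Path(project_dir) / "variable_operationalization.md").is_file():
        return "ok"
    if strict:
        return "block"  # stricter projects treat the missing artifact as a hard stop
    # Default is a warning; explicit confirmation lets the plan proceed,
    # and the caller should record that the artifact was unavailable.
    return "proceed-unbacked" if user_confirmed else "warn"
```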

package/skills/check-reporting/SKILL.md CHANGED

@@ -115,13 +115,15 @@ user specification.
 | Quality of systematic reviews | AMSTAR 2 | ROBIS |
 | Radiomics study | CLEAR | CLAIM 2024 (if deep learning component) |
 | Educational / QI study | SQUIRE 2.0 | -- |
+| Generative AI **images ARE the study object** (realism / real-vs-synthetic reader study / model-vs-model quality) | (no single guideline -- assemble) | see decision aid below |
 **Rules:**
 - If the study involves AI/ML, always apply the AI extension in addition to the base guideline.
   - **Exception — TRIPOD**: TRIPOD+AI 2024 (Collins et al., BMJ 2024) is a complete rewrite, not an addendum to TRIPOD 2015 (Moons et al., Ann Intern Med 2015). For non-AI prediction models, use TRIPOD 2015 only. For AI/ML prediction models, use TRIPOD+AI 2024 only. Do NOT apply both simultaneously.
 - **STARD-AI** (Sounderajah et al., Nat Med 2025) extends STARD 2015 with 14 new and 4 modified items (40 total). For AI diagnostic accuracy studies, use STARD-AI (which incorporates all STARD 2015 items). Do NOT apply both STARD 2015 and STARD-AI simultaneously — STARD-AI supersedes STARD 2015 for AI studies.
 - **TRIPOD-LLM** (Gallifant et al., Nat Med 2025) is the reporting guideline for studies that develop, fine-tune, prompt, or evaluate a large language model for a clinical/biomedical task. It extends the TRIPOD family (TRIPOD 2015 → TRIPOD+AI 2024 → TRIPOD-LLM 2025); name the base instrument and the extension and cite each. It is modular — task-specific items (Annotation, Prompting, Summarization, Instruction-tuning) are N/A when that component is absent. Use TRIPOD-LLM for LLM studies in place of TRIPOD+AI; pair with MI-CLEAR-LLM when LLM accuracy is an evaluated outcome. The vendored checklist is an educational summary (own-words paraphrase of item intent); complete the official instrument for a submission checklist.
-- **MI-CLEAR-LLM** is a supplementary checklist (6 items), not a standalone reporting guideline. Always pair it with the study's primary guideline (e.g., STARD-AI for AI diagnostic accuracy, CLAIM for imaging AI). Apply MI-CLEAR-LLM whenever the study evaluates LLM accuracy as an outcome — do NOT apply it merely because the manuscript was written with LLM assistance.
+- **MI-CLEAR-LLM** is a supplementary checklist (6 items), not a standalone reporting guideline. Always pair it with the study's primary guideline (e.g., STARD-AI for AI diagnostic accuracy, CLAIM for imaging AI). Apply MI-CLEAR-LLM whenever the study evaluates LLM accuracy as an outcome — do NOT apply it merely because the manuscript was written with LLM assistance. Its scope is **LLM accuracy** studies (including VLMs interpreting images); it does **not** apply at study level to studies where a generative model *produces* the images under study (see next bullet).
+- **Generative-AI images as the study object** (a generative model synthesizes images and the study evaluates their realism, controllability, real-vs-synthetic distinguishability, or model-vs-model quality) has **no single dominant checklist**. Assemble: CLAIM 2024 (imaging-AI umbrella; model-development items N/A when commercial models are used as-is) + FUTURE-AI traceability + MI-CLEAR-LLM **transparency items only** (prompt/model/version/params/runs — for generation provenance, not study-level compliance) on the generator side; STARD-AI (for real-vs-synthetic detection) + GRRAS (reader reliability) + MRMC reporting on the evaluation side. Map applicable items and cite base + extension; never claim wholesale compliance. Full decision aid: `${CLAUDE_SKILL_DIR}/references/genai_image_study_object_decision_aid.md`.
 - If multiple guidelines apply (e.g., a diagnostic accuracy study that is also an AI study), check against all relevant guidelines and merge into one report.
 - If the user requests a specific guideline, use that one regardless of auto-detection.
@@ -246,6 +248,23 @@ study's data integrity immediately.
 - Reasons for exclusion (Methods + Figure legend) agree on counts and category names.
 **Procedure:**
+Run the deterministic implementation first — it performs steps 1, 4, 5, and 6 below
+automatically (same keyword regex, the four arithmetic equations, the body↔figure
+cross-reference) and writes `qc/prisma_figure_audit.json`:
+```bash
+python3 ${CLAUDE_SKILL_DIR}/scripts/check_prisma_figure.py \
+  --md <manuscript.md> --figure <Figure 1 source: .md manifest / caption / text export> \
+  --out qc/prisma_figure_audit.json
+```
+Exit `1` = an arithmetic or cross-reference MISMATCH (log a Part C Action Item labelled
+`[PRISMA-FIGURE]`, `fixable_by_ai: false` — the author must reconcile the numbers); exit
+`2` = missing/unparsable input. The manual algorithm below documents exactly what the
+script checks and is the fallback when Figure 1 numbers live only in a PNG/SVG that must
+be transcribed by hand:
 1. Extract numbers from manuscript Results / PRISMA flow paragraph (regex: integers near
    keywords `identified`, `duplicates`, `screened`, `excluded`, `sought`, `retrieved`,
    `assessed`, `included`).
@@ -323,9 +342,22 @@ critical item and the journal's own required elements.
 Produce a structured compliance report in two parts.
+This report is an **internal working audit** — it carries auto-fix annotations, a
+machine-readable JSON block (`compliance_pct`, `fixable_by_ai`, …), and Action
+Items. It is **NOT** the official reporting checklist a journal expects (that is
+the blank guideline form with `Item | Recommendation | Reported in page/section`,
+which the authors fill in). Never submit this report as the submission checklist.
+To make the file self-identifying so it cannot be reused by filename into a later
+submission package, **the report MUST begin with the NOT-FOR-SUBMISSION banner
+below** as its very first line. (`/sync-submission`'s `check_checklist_dump_leak`
+gate also catches this dump if it ever lands in a submission directory.)
 #### Part A: Summary
 ```
+<!-- INTERNAL AUDIT — NOT FOR SUBMISSION. This is the /check-reporting working
+report, not the official journal checklist. Do not upload to a submission portal. -->
 ## Reporting Guideline Compliance Report
 Manuscript: {title}

package/skills/check-reporting/references/genai_image_study_object_decision_aid.md ADDED

@@ -0,0 +1,60 @@
+# Decision aid — reporting studies where generative AI images ARE the study object
+**When this applies:** the study evaluates **images that a generative AI model synthesized**
+(realism, controllability/steerability, whether human readers can distinguish synthetic from
+real, or model-vs-model quality). The generative model's *output* is the object under study.
+**When this does NOT apply:** a model (incl. a vision-language model) *interprets* images and
+you measure its diagnostic accuracy — that is an AI-accuracy study; use the relevant
+accuracy guideline directly (e.g., STARD-AI, CLAIM, TRIPOD+AI, MI-CLEAR-LLM).
+## There is no single dominant checklist for this study type
+Generative-image-as-study-object work (e.g., RSNA reader studies on AI-synthesized or
+"deepfake" medical images) is reported by **assembling** existing guidelines plus a precedent
+bar. Do not claim wholesale compliance with any one checklist; map applicable items and cite
+the base guideline together with any AI extension (verify each item against the published
+source — never invent items).
+### Generator / provenance side
+- **CLAIM 2024** — medical-imaging-AI umbrella; the 2024 revision covers generative/foundation
+  models. If commercial models are used **as-is** (no training/fine-tuning by the authors),
+  the model-development / training / validation-split items are **N/A**; report data sources,
+  reference/real comparators, evaluation, transparency, and limitations.
+- **FUTURE-AI** — use the **Traceability** principle: persist verbatim prompts, a generation
+  manifest, model + version + access date, and parameters (a prompt/generation registry).
+- **MI-CLEAR-LLM — transparency *items* only, not study-level compliance.** MI-CLEAR-LLM is
+  scoped to **LLM *accuracy* studies in healthcare** (including VLMs interpreting images); it
+  is **not** a guideline for generative-output studies. Borrow its reporting *items* for
+  prompt-driven foundation models — verbatim prompt(s), model name + version + access date,
+  access channel/API, sampling parameters, number of runs, handling of non-determinism,
+  responsible party — to document generation provenance. Cite it as the basis for prompt
+  logging, not as the study's reporting guideline.
+### Reader / evaluation side
+- **STARD 2015 + STARD-AI** — if the reader task is **real-vs-synthetic discrimination**, that
+  is a diagnostic-accuracy structure: report the reference standard (what counts as
+  "real"/"synthetic"), reader blinding, flow, and accuracy with intervals. Cite base STARD
+  **and** the STARD-AI extension.
+- **GRRAS** (Guidelines for Reporting Reliability and Agreement Studies) — for inter-reader
+  feature/quality ratings: number and qualification of readers, blinding, the agreement
+  statistic (ICC / weighted kappa) with 95% CI, and separate reporting of any anchor/control
+  items.
+- **MRMC reporting** — for multi-reader multi-case designs: a-priori power, per-reader
+  randomization/seed, and a real-control arm matched on non-content attributes (resolution,
+  cropping, compression) so that a classifier using format cues alone cannot rival the readers.
+### Precedent bar (de-facto standard for this study type)
+Match the methodological bar set by published generative-image-as-study-object reader studies
+in high-impact radiology venues: a-priori power, MRMC reader platform with per-reader seeds,
+real-control matching on non-content attributes, and **explicit, pre-specified handling of
+failed / low-quality generations** (count them rather than silently excluding survivors).
+## Cross-cutting cautions
+- **No overclaim:** state which items of which guideline the study satisfies, verified against
+  the published checklist; do not assert a blanket "reported per [guideline]".
+- **Manuscript's own AI-use disclosure** (writing assistance) is separate from the study-object
+  reporting above — see ICMJE/COPE and the write-paper LLM-disclosure feature.
+- **Pre-registration** of the primary estimand, frequency/realism references, and the
+  fresh-only firewall (pilot/calibration images excluded from the confirmatory set) belongs in
+  a study registry (e.g., OSF) for non-clinical reader studies — not PROSPERO (systematic
+  reviews) or a clinical-trial registry (no health-outcome intervention).
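
Editor's illustration (not part of the package): the Traceability and prompt-logging items in the guide above, together with its requirement to count failed or low-quality generations rather than silently exclude them, could be captured in a small generation manifest. This is a minimal sketch only. The class name `GenerationRecord`, every field name, the status labels, and the placeholder model name are hypothetical and do not come from the package or from any guideline text.

```python
import json
from collections import Counter
from dataclasses import asdict, dataclass, field

# Hypothetical status labels; the guide only asks that failures be counted.
STATUSES = ("ok", "failed", "low_quality")


@dataclass(frozen=True)
class GenerationRecord:
    """One generation attempt. All field names are illustrative assumptions."""
    image_id: str
    prompt: str            # verbatim prompt, stored exactly as sent
    model: str
    model_version: str
    access_date: str       # ISO 8601 date on which the model was accessed
    access_channel: str    # e.g. "api" or "web-ui"
    sampling_params: dict = field(default_factory=dict)
    run_index: int = 0     # distinguishes repeated runs of the same prompt
    status: str = "ok"     # one of STATUSES


def summarize(records):
    """Count every attempt, including failed and low-quality generations."""
    counts = Counter(r.status for r in records)
    summary = {"attempted": len(records)}
    summary.update({s: counts[s] for s in STATUSES})
    return summary


def make(image_id, run_index, status):
    """Build a record with placeholder provenance values (not real data)."""
    return GenerationRecord(
        image_id=image_id,
        prompt="placeholder prompt text",
        model="example-image-model",   # hypothetical model name
        model_version="0.0",
        access_date="2025-01-01",
        access_channel="api",
        sampling_params={"seed": run_index},
        run_index=run_index,
        status=status,
    )


records = [
    make("img-001", 0, "ok"),
    make("img-002", 0, "failed"),
    make("img-002", 1, "ok"),
    make("img-003", 0, "low_quality"),
]

# The manifest keeps every attempt, so the denominator stays visible.
manifest = {"records": [asdict(r) for r in records], "summary": summarize(records)}
print(json.dumps(manifest["summary"]))
```

Keeping failed attempts in the same list as successful ones means the reported denominator is the number of attempts, not the number of surviving images.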

package/skills/check-reporting/tests/fixtures/prisma_body.md ADDED

@@ -0,0 +1,7 @@
+## PRISMA flow
+A total of 1000 records identified through database searching. After 200 duplicates
+removed, 800 records screened. Of these, 600 records excluded at screening, leaving
+200 reports sought for retrieval. 10 reports not retrieved. 190 reports retrieved and
+190 reports assessed for eligibility. 40 records excluded with reasons. 150 studies
+included in the synthesis.

package/skills/check-reporting/tests/fixtures/prisma_fig_clean.md ADDED

@@ -0,0 +1,10 @@
+1000 records identified
+200 duplicates removed
+800 records screened
+600 records excluded at screening
+200 reports sought for retrieval
+10 reports not retrieved
+190 reports retrieved
+190 reports assessed for eligibility
+40 records excluded with reasons
+150 studies included

package/skills/check-reporting/tests/fixtures/prisma_fig_mismatch.md ADDED

@@ -0,0 +1,10 @@
+1000 records identified
+200 duplicates removed
+800 records screened
+600 records excluded at screening
+200 reports sought for retrieval
+10 reports not retrieved
+190 reports retrieved
+190 reports assessed for eligibility
+40 records excluded with reasons
+149 studies included

package/skills/check-reporting/tests/test_prisma_figure.sh ADDED

@@ -0,0 +1,50 @@
+#!/usr/bin/env bash
+# Regression test for the PRISMA Figure 1 arithmetic + cross-reference audit
+# (check-reporting Step 4d / check_prisma_figure.py). Synthetic, PII-free fixtures.
+# Stdlib-only (python3); no network, no pandoc.
+set -u
+HERE="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
+SCRIPT="$HERE/../scripts/check_prisma_figure.py"
+BODY="$HERE/fixtures/prisma_body.md"
+CLEAN="$HERE/fixtures/prisma_fig_clean.md"
+MM="$HERE/fixtures/prisma_fig_mismatch.md"
+OUT="$(mktemp -t prisma_fig_XXXX).json"
+trap 'rm -f "$OUT" "${OUT%.json}"' EXIT
+for f in "$SCRIPT" "$BODY" "$CLEAN" "$MM"; do
+  [[ -f "$f" ]] || { echo "ENV-ERR: missing $f" >&2; exit 2; }
+done
+fail=0
+pass() { printf '  PASS  %s\n' "$1"; }
+bad()  { printf '  FAIL  %s\n' "$1"; fail=$((fail+1)); }
+echo "test_prisma_figure:"
+# 1. Clean figure (numbers match body, arithmetic consistent) -> audit_safe, exit 0.
+python3 "$SCRIPT" --md "$BODY" --figure "$CLEAN" --out "$OUT" >/dev/null 2>&1; rc=$?
+if [[ $rc -eq 0 ]] && python3 -c "import json,sys; sys.exit(0 if json.load(open('$OUT'))['audit_safe'] else 1)"; then
+  pass "clean body/figure -> audit_safe, exit 0"
+else
+  bad "clean case rc=$rc (expected 0 + audit_safe)"
+fi
+# 2. Mismatched figure (included 149 vs body 150) -> MISMATCH, exit 1, PRISMA-FIGURE flag.
+out="$(python3 "$SCRIPT" --md "$BODY" --figure "$MM" --out "$OUT" 2>&1)"; rc=$?
+if [[ $rc -eq 1 && "$out" == *"[PRISMA-FIGURE]"* ]] \
+   && python3 -c "import json,sys; d=json.load(open('$OUT')); sys.exit(0 if (not d['audit_safe'] and d['action_items']) else 1)"; then
+  pass "mismatched figure -> MISMATCH flagged, exit 1"
+else
+  bad "mismatch case rc=$rc (expected 1 + [PRISMA-FIGURE] + action_items)"
+fi
+# 3. Missing input -> clean error, exit 2 (no traceback).
+err="$(python3 "$SCRIPT" --md /nonexistent_prisma.md --figure "$CLEAN" --out "$OUT" 2>&1)"; rc=$?
+if [[ $rc -eq 2 && "$err" == *"not found"* && "$err" != *"Traceback"* ]]; then
+  pass "missing manuscript -> clean error, exit 2"
+else
+  bad "missing-input case rc=$rc: $err"
+fi
+if [[ $fail -eq 0 ]]; then echo "  OK"; exit 0; else echo "  $fail check(s) failed"; exit 1; fi
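
Editor's illustration (not part of the package): the fixtures and assertions above imply two kinds of check, stage-to-stage arithmetic within the figure (for example 1000 - 200 = 800 and 190 - 40 = 150) and a cross-reference of each figure count against the manuscript body. The sketch below shows one way such an audit could work. It is not the actual `check_prisma_figure.py`, whose source is not shown here. Only the report keys `audit_safe` and `action_items` are taken from the test script; the regular expressions, rule table, and function names are assumptions.

```python
import re

# Hypothetical patterns keyed by flow stage, written to match the fixture wording.
PATTERNS = {
    "identified": r"(\d+) records identified",
    "duplicates": r"(\d+) duplicates removed",
    "screened": r"(\d+) records screened",
    "excluded_screening": r"(\d+) records excluded at screening",
    "sought": r"(\d+) reports sought for retrieval",
    "not_retrieved": r"(\d+) reports not retrieved",
    "assessed": r"(\d+) reports assessed for eligibility",
    "excluded_reasons": r"(\d+) records excluded with reasons",
    "included": r"(\d+) studies included",
}

# Each rule states: counts[a] - counts[b] should equal counts[c].
RULES = [
    ("identified", "duplicates", "screened"),
    ("screened", "excluded_screening", "sought"),
    ("sought", "not_retrieved", "assessed"),
    ("assessed", "excluded_reasons", "included"),
]


def parse(text):
    """Extract stage counts; whitespace is collapsed so wrapped lines still match."""
    flat = " ".join(text.split())
    counts = {}
    for key, pattern in PATTERNS.items():
        match = re.search(pattern, flat)
        if match:
            counts[key] = int(match.group(1))
    return counts


def audit(body_text, figure_text):
    """Return a report with the two keys the regression test reads."""
    body, fig = parse(body_text), parse(figure_text)
    items = []
    for a, b, c in RULES:
        if all(k in fig for k in (a, b, c)) and fig[a] - fig[b] != fig[c]:
            items.append(f"arithmetic: {a} - {b} = {fig[a] - fig[b]}, figure {c} = {fig[c]}")
    for key, value in fig.items():
        if key in body and body[key] != value:
            items.append(f"cross-reference: {key} is {body[key]} in body, {value} in figure")
    return {"audit_safe": not items, "action_items": items}


# Text copied from the fixtures above.
BODY = """A total of 1000 records identified through database searching. After 200 duplicates
removed, 800 records screened. Of these, 600 records excluded at screening, leaving
200 reports sought for retrieval. 10 reports not retrieved. 190 reports retrieved and
190 reports assessed for eligibility. 40 records excluded with reasons. 150 studies
included in the synthesis."""

CLEAN = """1000 records identified
200 duplicates removed
800 records screened
600 records excluded at screening
200 reports sought for retrieval
10 reports not retrieved
190 reports retrieved
190 reports assessed for eligibility
40 records excluded with reasons
150 studies included"""

MISMATCH = CLEAN.replace("150 studies included", "149 studies included")

print(audit(BODY, CLEAN)["audit_safe"])
for item in audit(BODY, MISMATCH)["action_items"]:
    print(item)
```

In this sketch the mismatched figure produces two action items, because 149 breaks both the final subtraction (190 - 40) and the agreement with the body's count of 150.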