medsci-skills 4.11.0 → 5.0.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (23)

package/README.md CHANGED

@@ -4,7 +4,7 @@
 **51 skills that actually work.** Built by a physician-researcher, tested on real publications.
-*MedSci Skills is a submission-grade clinical manuscript workflow, not a generic biomedical skill catalog. Its moat is the compliance layer — 38 reporting guidelines and risk-of-bias tools, reference/citation verification, and deterministic integrity gates, before peer review sees the manuscript. It competes on clinical submission reliability, not skill count.*
+*MedSci Skills is an end-to-end research tool for physician and medical-engineering researchers — design → scaffold → validate → publish — for the clinical manuscript and the medical-AI model behind it. Its moat is the compliance layer — 38 reporting guidelines and risk-of-bias tools, reference/citation verification, and deterministic integrity gates before peer review — now extended by a model-engineering lane that scaffolds reproducible, leakage-safe training repos and audits model validation. Clinical AI model research engineering is in scope; a general AI-scientist platform is not. It competes on clinical submission reliability, not skill count.*
 [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE)
 [![Release](https://img.shields.io/github/v/release/Aperivue/medsci-skills?style=flat-square&color=blue)](https://github.com/Aperivue/medsci-skills/releases/latest)
@@ -42,14 +42,20 @@
 ## What is MedSci Skills?
 MedSci Skills is an open-source Claude Code skill collection for **clinical
-manuscript preparation**. It helps physician-researchers and biomedical
-investigators move from literature search, study design, statistics, and figures to
-reporting-guideline compliance, citation/reference auditing, numerical-consistency
-checks, and response-to-reviewer workflows — combining agentic writing with
-**deterministic integrity gates** for submission-grade biomedical research. It is
-**not** a diagnostic tool, an autonomous author, or a general AI-scientist platform;
-every output requires human-expert verification. New here? See the
-[3 workflows below](#start-here-3-workflows), the [FAQ](docs/faq.md), and the
+research — the manuscript and the medical-AI model alike**. It helps
+physician-researchers and biomedical/medical-engineering investigators move from
+literature search, study design, statistics, and figures to reporting-guideline
+compliance, citation/reference auditing, numerical-consistency checks, and
+response-to-reviewer workflows — combining agentic writing with **deterministic
+integrity gates** for submission-grade biomedical research. As of **v5.0** it adds a
+**model-engineering lane**: choose a paper-grounded architecture, scaffold a
+reproducible, leakage-safe PyTorch training repo, and validate, document, and
+evaluate a medical-imaging or LLM/MLLM model so the work reaches a paper — it
+**integrates** MONAI / nnU-Net, never reimplements them. Clinical AI model research
+engineering is in scope; it is **not** a diagnostic tool, an autonomous author, or a
+general AI-scientist platform, and every output requires human-expert verification.
+New here? See the [3 workflows below](#start-here-3-workflows), the
+[FAQ](docs/faq.md), and the
 [scope boundary](ROADMAP.md#not-planned--explicitly-out-of-scope).
 ---
@@ -82,17 +88,18 @@ Restart Claude Code, then start with **`/orchestrate`** — it classifies your r
 ### Install as a Claude Code plugin
-Prefer plugins? One line adds the marketplace; `/plugin` then lets you browse eight category plugins and enable the ones you want:
+Prefer plugins? One line adds the marketplace; `/plugin` then lets you browse nine category plugins and enable the ones you want:
 ```text
 /plugin marketplace add Aperivue/medsci-skills
-/plugin            # browse eight category plugins; enable the ones you want
+/plugin            # browse nine category plugins; enable the ones you want
 ```
 | Plugin | Covers |
 |--------|--------|
 | `medsci-literature` | Literature search, full-text retrieval, Zotero sync, reference-integrity audits |
 | `medsci-data` | Study design, variable operationalization, sample size, data cleaning, de-identification, codebooks, dataset versioning |
+| `medsci-modeling` | Architecture selection, reproducible model-scaffold repos, model-validation audits, Model Card/Datasheet, model & LLM/MLLM evaluation |
 | `medsci-analysis` | Statistics, figures, batch/cross-national/replication analysis, meta-analysis |
 | `medsci-writing` | IMRAD & protocol drafting, AI-pattern removal, AI-search optimization, reviewer responses |
 | `medsci-review` | Self-review, peer review, reporting-guideline compliance |
@@ -452,7 +459,7 @@ ma-scout -> search-lit -> fulltext-retrieval -> design-study ──> write-proto
 | **design-study** | Study design review: identifies analysis unit, cohort logic, data leakage risks, comparator design, validation strategy, and reporting guideline fit. |
 | **design-ai-benchmarking** | Design and validity review for benchmarking AI system(s) against a human-expert panel: evaluation-question and arm definition, decoupled multi-dimensional rubrics with anchors, planted calibration probes (positive-control / known-bad / instability / mechanism-contradiction), reviewer-panel construction with per-reviewer randomization, inter-rater reliability targets with separate control-item reliability, LLM-as-judge vs human-as-judge adjudication, construct-independence guards, and a structured JSON rating-export schema. Locks the rubric before data collection. |
 | **model-validation** | Design or audit the clinical-validation study for an engineer-built medical-imaging model (segmentation / classification / detection): patient-level split disjointness and the data-leakage taxonomy, tuning-on-test, internal vs genuine external validation, comparator design, single-run vs multi-seed variance, task-correct metric selection (Metrics Reloaded), test-set sizing, and CLAIM 2024 / TRIPOD+AI / STARD-AI reporting fit. Ships a deterministic split-leakage gate that proves patient disjointness by set arithmetic on the emitted split table. Integrates with MONAI / nnU-Net — does not replace them. |
-| **model-scaffold** | Generate a reproducible, runnable PyTorch training repo for a medical-imaging segmentation task — the missing middle link between choosing an architecture and validating a trained model. Emits a patient-level seed-locked split as an auditable artifact, a configurable U-Net, train/evaluate scripts that seed every RNG and infer under eval mode, a config, requirements, a reproducibility record, and a Methods stub with VERIFY placeholders (no fabricated numbers). Reproducibility holds by construction; ships a `check_training_hygiene` AST gate + a network-free build→validate challenge. Integrates with MONAI / nnU-Net / TorchIO — does not reimplement them. |
+| **model-scaffold** | Generate a reproducible, runnable PyTorch training repo for a medical-imaging task — segmentation (U-Net), classification, detection, image-to-image synthesis, or self-supervised pretraining — the missing middle link between choosing an architecture and validating a trained model. Emits a patient-level seed-locked split as an auditable artifact, a task-appropriate model, train/evaluate scripts that seed every RNG and infer under eval mode, a config, requirements, a reproducibility record, and a Methods stub with VERIFY placeholders (no fabricated numbers). Reproducibility holds by construction; ships a `check_training_hygiene` AST gate + a network-free build→validate challenge. Integrates with MONAI / nnU-Net / TorchIO / timm / torchvision — does not reimplement them. |
 | **architecture-zoo** | "Which architecture for which research question" decision tool: maps task (classification / segmentation / detection / transfer), modality, data scale, and class imbalance to a paper-grounded architecture shortlist. Curates the foundational curriculum (ResNet / DenseNet / EfficientNet / ViT / Swin; U-Net / 3-D U-Net / Attention & Residual U-Net / nnU-Net / Mask R-CNN; SAM/MedSAM / TotalSegmentator / BiomedCLIP / DINO / MAE / SimCLR) — each with core idea, when-to-use, medical-imaging use, reference implementation, validation setup, and the matching model-scaffold template. Advisory; teaches archetypes, not a live SOTA leaderboard. |
 | **model-card** | Generate the documentation an engineer-built medical-imaging model must carry — a Model Card (Mitchell et al. 2019), a Datasheet for its dataset (Gebru et al. 2021), and a METRIC-informed data-quality pass — filled from user-supplied facts (never fabricated), then verify every required section is present and non-empty with a deterministic completeness gate (`check_model_card_complete`). Model Card / Datasheet are documentation standards vendored as templates, not counted reporting checklists. |
 | **model-evaluation** | Compute and report task-correct held-out metrics for a trained medical-imaging model — segmentation (Dice + a boundary metric HD95/NSD, per structure), classification (AUROC + AUPRC + sensitivity/specificity with bootstrap CIs at the deployment prevalence), or detection (FROC/mAP with a stated IoU criterion) — plus calibration and subgroup slices. Emits a per-case table for analyze-stats and gates the metric choice against Metrics Reloaded / CLAIM 2024 (`check_metric_reporting`). Numbers come only from executed code. |
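
The README rows above describe a patient-level, seed-locked split and a gate that proves patient disjointness by set arithmetic on the emitted split table. A minimal sketch of that idea follows. It is not the package's implementation: the hash-based assignment, the function names, and the `(patient_id, image_path, split)` row layout are all assumptions made for illustration.

```python
import hashlib


def assign_split(patient_id, seed, val_frac=0.15, test_frac=0.15):
    """Map a patient to a split by hashing (seed, patient_id).

    Hypothetical scheme: the same seed and patient ID always give the same split.
    """
    digest = hashlib.sha256(f"{seed}:{patient_id}".encode("utf-8")).digest()
    u = int.from_bytes(digest[:8], "big") / 2**64  # value in [0, 1)
    if u < test_frac:
        return "test"
    if u < test_frac + val_frac:
        return "val"
    return "train"


def build_split_table(rows, seed):
    """rows: iterable of (patient_id, image_path).

    Every image inherits its patient's split, so a patient cannot straddle splits.
    """
    return [(pid, path, assign_split(pid, seed)) for pid, path in rows]


def check_patient_disjoint(table):
    """Return patients that appear in more than one split (empty set = disjoint)."""
    by_split = {}
    for pid, _path, split in table:
        by_split.setdefault(split, set()).add(pid)
    names = sorted(by_split)
    leaked = set()
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            leaked |= by_split[a] & by_split[b]  # pairwise set intersection
    return leaked


# Illustrative manifest rows (made-up IDs and paths).
rows = [
    ("P001", "p001_a.nii"),
    ("P001", "p001_b.nii"),
    ("P002", "p002_a.nii"),
    ("P003", "p003_a.nii"),
]
table = build_split_table(rows, seed=42)
leaked = check_patient_disjoint(table)
```

Because the split is assigned per patient rather than per image, the check is expected to return an empty set for any table built this way; it is still useful as an independent audit of a split table produced elsewhere.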

package/metadata/distribution_files.json CHANGED

@@ -398,14 +398,19 @@
     },
     {
       "path": "skills/architecture-zoo/SKILL.md",
-      "size": 5444,
-      "sha256": "6d8f81262a42ff24e36dca425511804b9f324d2c900f31f701c703e3d8326729"
+      "size": 5712,
+      "sha256": "ffb9a52d417f309a3b22c8d0f74e6700b67350c0d7a2a2e2c785f0cb4cb066bc"
     },
     {
       "path": "skills/architecture-zoo/references/classification.md",
       "size": 5986,
       "sha256": "035e0fddaccb0e19e23ffd7756b075154f847ee20f9a512cd2a010c0db4210fa"
     },
+    {
+      "path": "skills/architecture-zoo/references/detection.md",
+      "size": 3917,
+      "sha256": "60549b53a87dcb149c442498695321e94c28f0c8853316dda995a2dd730dfeec"
+    },
     {
       "path": "skills/architecture-zoo/references/foundation_models.md",
       "size": 5267,
@@ -413,18 +418,23 @@
     },
     {
       "path": "skills/architecture-zoo/references/index.md",
-      "size": 4065,
-      "sha256": "a1ed80efcf9a56e0286972ae9b08bd38965bb20a569e3aac5314549dfa6ad5f4"
+      "size": 4024,
+      "sha256": "d2360eb8635347f0f22be0b0e337a540db865aafdc12678455cf32bc49fd9d3d"
     },
     {
       "path": "skills/architecture-zoo/references/segmentation.md",
       "size": 6508,
       "sha256": "17618fe3d6884cf89b034ededcd69e081a65d7bfd473495eb2ab1fb5d8b15d8b"
     },
+    {
+      "path": "skills/architecture-zoo/references/synthesis.md",
+      "size": 3973,
+      "sha256": "b7fec1a55a7b9e2f8eea6979d7bfbda1fc03664aeec0cc89daac8c8f9bdfb8bc"
+    },
     {
       "path": "skills/architecture-zoo/skill.yml",
-      "size": 2889,
-      "sha256": "275cfb1d0779028d79d8596879c09b5b6c714859142cb72a12b7ec901acc69e1"
+      "size": 2836,
+      "sha256": "7d7c727a9fc75383e775feac14faf1c4caad90307ed8ca20859ef499ac17deb0"
     },
     {
       "path": "skills/author-strategy/SKILL.md",
@@ -2628,8 +2638,8 @@
     },
     {
       "path": "skills/model-scaffold/SKILL.md",
-      "size": 7048,
-      "sha256": "a1612de8a2888667137c3d81100adcb4a6c42d45c73edf93f715fad1da7e179f"
+      "size": 7686,
+      "sha256": "cede2020d3ceee0e1599cbcfb412230ab9c04b08032c8bbaff620e4129ec6785"
     },
     {
       "path": "skills/model-scaffold/references/training_guide.md",
@@ -2643,8 +2653,8 @@
     },
     {
       "path": "skills/model-scaffold/scripts/scaffold.py",
-      "size": 19327,
-      "sha256": "d37d617d9753faf6cd6f4d9332d896ad3704d4f2dea8578d4d23f02baedbfd09"
+      "size": 40470,
+      "sha256": "33569806ecd230aebb9d35c77a27b8480d23329b00f633d7fee9531a7bcec974"
     },
     {
       "path": "skills/model-scaffold/scripts/scaffold_challenge/expected/split_assignment.csv",
@@ -2663,13 +2673,13 @@
     },
     {
       "path": "skills/model-scaffold/scripts/scaffold_challenge/verify.sh",
-      "size": 4191,
-      "sha256": "f2476dc72772ffa77ba13ed43067a19bf4f9b6fbce9ce49a1726eee2f64d7b21"
+      "size": 5136,
+      "sha256": "040acee712943e8392b70b40b790438bec056a3a4e7a282f76455d2a2b791447"
     },
     {
       "path": "skills/model-scaffold/skill.yml",
-      "size": 3104,
-      "sha256": "a6284452adbbb533c63dcd294c5b4b7fce7d93e1be381d27323168ac3888ba33"
+      "size": 3195,
+      "sha256": "06c4ed02872bfc38abd1c753c02a640a2bfaa1141fbfdd9f8b9ccbe18a148d82"
     },
     {
       "path": "skills/model-validation/SKILL.md",
@@ -2963,8 +2973,8 @@
     },
     {
       "path": "skills/present-paper/SKILL.md",
-      "size": 29247,
-      "sha256": "aa8455317bd4996d5b1e0cc9d27c8be33b8112b7826749d87376c161d6cf87d5"
+      "size": 33518,
+      "sha256": "777197edf83a4d242508b366abe24681198467a95950b0acc385ac2714f33cb4"
     },
     {
       "path": "skills/present-paper/references/critic_rubrics/slide.md",
@@ -2981,11 +2991,41 @@
       "size": 15007,
       "sha256": "d0f964af7523ec8bfef50ca627878f8c2cfe58159c2a827c5f7dfd43585cf9ee"
     },
+    {
+      "path": "skills/present-paper/references/presentation_design_guidelines.md",
+      "size": 7460,
+      "sha256": "689d021ff7fc5e04abf93f3d6e1b0646bb5aa86430239b76b51d2224b3b92f0c"
+    },
     {
       "path": "skills/present-paper/references/slide_design_principles.md",
       "size": 10436,
       "sha256": "7f2a5e03c8f2ddbb2d84a163506c5f3a2d1cca1353a694abd7bfb14225324826"
     },
+    {
+      "path": "skills/present-paper/references/slide_visual_styles/CATALOG.md",
+      "size": 3177,
+      "sha256": "78782ce6916212bcae9e1d8513197721a8ac72ed1faa33ce335b36d0c88a828f"
+    },
+    {
+      "path": "skills/present-paper/references/slide_visual_styles/clinical_blue.md",
+      "size": 3249,
+      "sha256": "334ee770935b93a1ddf86c426a8f3da377d2618c0f6553308ccc38e871f26160"
+    },
+    {
+      "path": "skills/present-paper/references/slide_visual_styles/dark_modern.md",
+      "size": 3197,
+      "sha256": "b5adf2331318a1276fa5eedf173a36c056d220fbfa04819a5c4d4d68b1109f68"
+    },
+    {
+      "path": "skills/present-paper/references/slide_visual_styles/editorial_mono.md",
+      "size": 3068,
+      "sha256": "db07e84bbf4d248f0a4837fe22ed61c82e176fe94055b590131e24d25c59f20b"
+    },
+    {
+      "path": "skills/present-paper/references/slide_visual_styles/institutional_brand.md",
+      "size": 4431,
+      "sha256": "c8b04c93bf61072fc6c4ee7d402e107d7375e933d60085bf325d74830896684a"
+    },
     {
       "path": "skills/present-paper/references/slide_visual_styles/nature_lancet.md",
       "size": 7989,
@@ -3011,6 +3051,11 @@
       "size": 6758,
       "sha256": "6eeaf94c396d0f4ff365eaea0408f2fc00f8e2dc75b53a9514214194cb9329f9"
     },
+    {
+      "path": "skills/present-paper/scripts/inspect_pptx_template.py",
+      "size": 5073,
+      "sha256": "648fe3d2904a5ffffb41eb064d1780f605a672dc44618a74b5e3e59c023cb63d"
+    },
     {
       "path": "skills/present-paper/scripts/strip_notes_for_sharing.py",
       "size": 5508,

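Each record in the hunks above pairs a `path` with a `size` and a `sha256`. A small sketch of how such a record could be checked against an installed file is given below. It assumes that `size` is the byte length and `sha256` is the lowercase hex digest of the raw file bytes; the diff does not state either, and `verify_entry` is a hypothetical helper, not part of the package.

```python
import hashlib
import os
import tempfile


def verify_entry(root, entry):
    """Check one {"path", "size", "sha256"} record against the file under root."""
    full = os.path.join(root, entry["path"])
    if not os.path.isfile(full):
        return "missing"
    with open(full, "rb") as fh:
        data = fh.read()
    if len(data) != entry["size"]:
        return "size mismatch"
    if hashlib.sha256(data).hexdigest() != entry["sha256"]:
        return "hash mismatch"
    return "ok"


# Demo on a throwaway file (not a real package file).
with tempfile.TemporaryDirectory() as root:
    payload = b"# demo skill\n"
    with open(os.path.join(root, "SKILL.md"), "wb") as fh:
        fh.write(payload)
    entry = {
        "path": "SKILL.md",
        "size": len(payload),
        "sha256": hashlib.sha256(payload).hexdigest(),
    }
    status_ok = verify_entry(root, entry)
    status_bad = verify_entry(root, {**entry, "sha256": "0" * 64})
```

Checking the size first is only a cheap early exit; the hash comparison is what actually detects a changed file of the same length.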
package/metadata/distribution_manifest.json CHANGED

@@ -1,6 +1,6 @@
 {
   "schema_version": 1,
-  "version": "4.11.0",
+  "version": "5.0.0",
   "owned_skills": [
     "academic-aio",
     "add-journal",

package/metadata/skills_catalog.json CHANGED

@@ -17,7 +17,6 @@
       "key": "data_study_design",
       "label": "Data & Study Design",
       "slugs": [
-        "architecture-zoo",
         "calc-sample-size",
         "clean-data",
         "define-variables",
@@ -25,12 +24,19 @@
         "design-ai-benchmarking",
         "design-study",
         "generate-codebook",
+        "version-dataset"
+      ]
+    },
+    {
+      "key": "model_engineering",
+      "label": "Model Engineering & Validation",
+      "slugs": [
+        "architecture-zoo",
         "mllm-eval",
         "model-card",
         "model-evaluation",
         "model-scaffold",
-        "model-validation",
-        "version-dataset"
+        "model-validation"
       ]
     },
     {
@@ -132,8 +138,8 @@
     },
     {
       "slug": "architecture-zoo",
-      "category": "data_study_design",
-      "category_label": "Data & Study Design",
+      "category": "model_engineering",
+      "category_label": "Model Engineering & Validation",
       "layer": "D",
       "owner_domain": "architecture_reference",
       "maturity": "official",
@@ -366,8 +372,8 @@
     },
     {
       "slug": "mllm-eval",
-      "category": "data_study_design",
-      "category_label": "Data & Study Design",
+      "category": "model_engineering",
+      "category_label": "Model Engineering & Validation",
       "layer": "D",
       "owner_domain": "model_evaluation",
       "maturity": "official",
@@ -375,8 +381,8 @@
     },
     {
       "slug": "model-card",
-      "category": "data_study_design",
-      "category_label": "Data & Study Design",
+      "category": "model_engineering",
+      "category_label": "Model Engineering & Validation",
       "layer": "C",
       "owner_domain": "model_reporting",
       "maturity": "official",
@@ -384,8 +390,8 @@
     },
     {
       "slug": "model-evaluation",
-      "category": "data_study_design",
-      "category_label": "Data & Study Design",
+      "category": "model_engineering",
+      "category_label": "Model Engineering & Validation",
       "layer": "B",
       "owner_domain": "model_evaluation",
       "maturity": "official",
@@ -393,17 +399,17 @@
     },
     {
       "slug": "model-scaffold",
-      "category": "data_study_design",
-      "category_label": "Data & Study Design",
+      "category": "model_engineering",
+      "category_label": "Model Engineering & Validation",
       "layer": "B",
       "owner_domain": "model_development",
       "maturity": "official",
-      "description": "Generate a reproducible, runnable PyTorch training repo for a medical-imaging segmentation task — the missing middle link between choosing an architecture and validating a trained model."
+      "description": "Generate a reproducible, runnable PyTorch training repo for a medical-imaging task — segmentation, classification, detection, image-to-image synthesis, or self-supervised pretraining — the missing mid…"
     },
     {
       "slug": "model-validation",
-      "category": "data_study_design",
-      "category_label": "Data & Study Design",
+      "category": "model_engineering",
+      "category_label": "Model Engineering & Validation",
       "layer": "D",
       "owner_domain": "model_validation",
       "maturity": "official",

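The catalog change above moves six slugs into a new `model_engineering` group and updates each skill's `category` and `category_label` to match. The two places can drift apart, so a consistency check along the following lines may be useful. The field names come from the diff, but the helper and the way the two lists are passed in are assumptions, since the file's top-level keys are not visible in these hunks.

```python
def find_category_mismatches(groups, skills):
    """Compare group membership with per-skill category fields.

    groups: list of {"key", "label", "slugs"} records.
    skills: list of {"slug", "category", "category_label"} records.
    Returns a list of (slug, problem) tuples; an empty list means consistent.
    """
    slug_to_group = {}
    for group in groups:
        for slug in group["slugs"]:
            slug_to_group[slug] = group
    problems = []
    for skill in skills:
        group = slug_to_group.get(skill["slug"])
        if group is None:
            problems.append((skill["slug"], "not listed in any group"))
        elif (group["key"] != skill["category"]
              or group["label"] != skill["category_label"]):
            problems.append((skill["slug"], "category fields disagree with group"))
    return problems


# Data mirroring the 5.0.0 side of the diff, plus one deliberately stale entry.
groups = [{
    "key": "model_engineering",
    "label": "Model Engineering & Validation",
    "slugs": ["architecture-zoo", "mllm-eval", "model-card",
              "model-evaluation", "model-scaffold", "model-validation"],
}]
skills = [
    {"slug": "model-card", "category": "model_engineering",
     "category_label": "Model Engineering & Validation"},
    {"slug": "model-scaffold", "category": "data_study_design",  # stale on purpose
     "category_label": "Data & Study Design"},
]
problems = find_category_mismatches(groups, skills)
```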
package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "medsci-skills",
-  "version": "4.11.0",
+  "version": "5.0.0",
   "description": "MedSci Skills — a medical/scientific research skill suite for AI coding agents (Claude Code, Codex, Cursor, Copilot). The npm package is a terminal-friendly installer shortcut; the canonical distribution remains the GitHub repository and the Claude Code plugin marketplace.",
   "license": "SEE LICENSE IN LICENSE",
   "homepage": "https://github.com/Aperivue/medsci-skills#readme",

package/skills/architecture-zoo/SKILL.md CHANGED

@@ -56,6 +56,10 @@ a family card.
   ViT / Swin / DeiT.
 - `${CLAUDE_SKILL_DIR}/references/segmentation.md` — U-Net / 3-D U-Net / V-Net / Attention & Residual
   U-Net / nnU-Net / SegResNet / Swin-UNETR / Mask R-CNN.
+- `${CLAUDE_SKILL_DIR}/references/detection.md` — R-CNN family / Faster R-CNN + FPN / Mask R-CNN /
+  RetinaNet / YOLO / DETR.
+- `${CLAUDE_SKILL_DIR}/references/synthesis.md` — Pix2Pix / CycleGAN / SPADE / diffusion (DDPM, latent) /
+  VAE / fastMRI reconstruction.
 - `${CLAUDE_SKILL_DIR}/references/foundation_models.md` — SAM / MedSAM / MedSAM2 / TotalSegmentator /
   SegVol / BiomedCLIP / DINO / MAE / SimCLR / MoCo.
 Each card gives the paper, core idea, when-to-use, medical-imaging use, reference implementation, and the

package/skills/architecture-zoo/references/detection.md ADDED

@@ -0,0 +1,68 @@
+# Detection architectures (architecture-zoo)
+For "find and localise lesions" questions — boxes / points, a count, and a per-lesion
+hit/miss (FROC). Distinct from segmentation (a pixel mask) and classification (a per-image
+label): detection localises *instances*. `/model-scaffold --task detection` emits a
+torchvision Faster R-CNN repo whose FROC/mAP you compute downstream.
+Each card: **paper → core idea → when to use → medical-imaging use → reference impl →
+validation/experiment setup.**
+---
+## Two-stage detectors (region proposal → classify)
+### R-CNN → Fast R-CNN → Faster R-CNN (+ FPN)
+- **Papers**: Girshick et al., R-CNN, *CVPR* 2014; Girshick, Fast R-CNN, *ICCV* 2015; Ren
+  et al., Faster R-CNN, *NeurIPS* 2015; Lin et al., **FPN**, *CVPR* 2017.
+- **Core idea**: Faster R-CNN adds a learned Region Proposal Network (end-to-end); FPN adds
+  a multi-scale feature pyramid so small and large lesions are both detected.
+- **When to use**: the **default two-stage detector** for medical lesion detection — strong,
+  well-understood, good on small objects with FPN; favour accuracy over real-time speed.
+- **Medical-imaging use**: nodule / lesion / aneurysm detection on CT / MR / mammography
+  (ResNet-FPN backbone).
+- **Reference impl**: torchvision `fasterrcnn_resnet50_fpn`; MONAI detection (RetinaNet).
+- **Validation setup**: report **FROC** (sensitivity per false-positive-per-scan) or **mAP
+  with the IoU match criterion stated**; per-lesion analysis with patient-level clustering
+  disclosed; not patient-level accuracy (`/model-validation` MD6).
+### Mask R-CNN (detect + segment instances)
+- **Paper**: He et al., Mask R-CNN, *ICCV* 2017.
+- **Core idea**: a mask head on Faster R-CNN → per-instance box + class + mask.
+- **When to use**: **count + localise + delineate** separate lesions (instance-level), not a
+  single semantic mask (that is `segmentation.md`).
+- **Reference impl**: torchvision `maskrcnn_resnet50_fpn`.
+- **Validation setup**: detection metrics for the boxes + per-instance Dice for the masks.
+## One-stage / query-based detectors (faster, end-to-end)
+### RetinaNet (focal loss)
+- **Paper**: Lin et al., "Focal Loss for Dense Object Detection," *ICCV* 2017.
+- **Core idea**: a one-stage dense detector with **focal loss** to handle the extreme
+  foreground/background imbalance — relevant when lesions are sparse.
+- **When to use**: faster than two-stage, strong under heavy class imbalance.
+- **Reference impl**: torchvision `retinanet_resnet50_fpn`; MONAI detection.
+### YOLO family
+- **Papers**: Redmon et al., YOLO, *CVPR* 2016; later YOLOv3+/YOLOX.
+- **Core idea**: a single network predicts boxes + classes directly on a grid — real-time.
+- **When to use**: speed-critical / interactive settings; usually two-stage detectors are
+  preferred for maximal sensitivity on small medical lesions.
+### DETR (transformer, set prediction)
+- **Paper**: Carion et al., "End-to-End Object Detection with Transformers," *ECCV* 2020.
+- **Core idea**: a transformer treats detection as direct **set prediction** (no anchors /
+  NMS) via learned object queries + bipartite matching.
+- **When to use**: large datasets where an anchor-free, end-to-end pipeline is attractive;
+  more data-hungry and slower to converge than CNN detectors.
+- **Reference impl**: the official DETR repo; Deformable DETR for faster convergence.
+---
+## Choosing among these
+Default lesion detection → **Faster R-CNN + FPN** (torchvision; `/model-scaffold --task
+detection`). Sparse lesions / imbalance → **RetinaNet (focal loss)**. Count + delineate
+instances → **Mask R-CNN**. Speed-critical → **YOLO**. Large data, anchor-free → **DETR**.
+Always report **FROC / mAP with the IoU criterion stated**, per-lesion with patient-level
+clustering disclosed. Record the choice + paper, hand to `/model-scaffold`, validate with
+`/model-validation` and `/model-evaluation`.
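
The added file asks for FROC (sensitivity per false positive per scan) with the IoU match criterion stated. The sketch below computes one FROC operating point in plain Python. The greedy highest-score-first matching, the 0.5 IoU default, and the `{"gt": ..., "pred": ...}` layout are assumptions chosen for illustration; other hit criteria (for example centre-in-box) are also used in practice, and this is not code from the package.

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def froc_point(scans, score_thr, iou_thr=0.5):
    """Return (lesion-level sensitivity, false positives per scan) at one threshold.

    scans: list of {"gt": [box, ...], "pred": [(score, box), ...]}.
    Each ground-truth lesion can be matched by at most one prediction.
    """
    hits = total_gt = false_pos = 0
    for scan in scans:
        kept = sorted((p for p in scan["pred"] if p[0] >= score_thr),
                      key=lambda p: p[0], reverse=True)
        matched = set()
        for _score, box in kept:
            best_j, best_iou = None, iou_thr
            for j, gt in enumerate(scan["gt"]):
                if j in matched:
                    continue
                value = iou(box, gt)
                if value >= best_iou:
                    best_j, best_iou = j, value
            if best_j is None:
                false_pos += 1  # no unmatched lesion overlaps enough
            else:
                matched.add(best_j)
        hits += len(matched)
        total_gt += len(scan["gt"])
    sensitivity = hits / total_gt if total_gt else 0.0
    fp_per_scan = false_pos / len(scans) if scans else 0.0
    return sensitivity, fp_per_scan


# Two made-up scans: one detected lesion, one missed lesion, two spurious boxes.
scans = [
    {"gt": [(10, 10, 20, 20)],
     "pred": [(0.9, (11, 11, 21, 21)), (0.6, (50, 50, 60, 60))]},
    {"gt": [(30, 30, 40, 40)],
     "pred": [(0.4, (0, 0, 5, 5))]},
]
point_strict = froc_point(scans, score_thr=0.5)
point_loose = froc_point(scans, score_thr=0.3)
```

Sweeping `score_thr` over the observed scores and collecting the resulting points traces the FROC curve; patient-level clustering of lesions still has to be handled separately in the statistical analysis.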

package/skills/architecture-zoo/references/index.md CHANGED

@@ -10,10 +10,10 @@ per-paper detail and the `/model-scaffold` template to instantiate.
 |---|---|---|
 | "is finding X present / which class" (per image / per patient) | **classification** (binary / multi-label) | `classification.md` |
 | "delineate / measure structure X" (pixel/voxel mask, volume, boundary) | **segmentation** | `segmentation.md` |
-| "find and localise lesions" (boxes / points, count, FROC) | **detection** | *(forthcoming — see segmentation's Mask R-CNN note)* |
+| "find and localise lesions" (boxes / points, count, FROC) | **detection** | `detection.md` |
 | "I have few labels / want to pretrain on unlabelled scans" | **self-supervised pretraining → fine-tune** | `foundation_models.md` |
 | "adapt a released medical foundation model" | **transfer / prompt a foundation model** | `foundation_models.md` |
-| "synthesise / translate a modality" (MRI→CT, denoise) | **image-to-image / generative** | *(forthcoming)* |
+| "synthesise / translate a modality" (MRI→CT, denoise) | **image-to-image / generative** | `synthesis.md` |
 | "generate a report / answer a visual question" | **multimodal LLM** | *(use `/mllm-eval`; not a CNN choice)* |
 ## Step 2 — let the constraints narrow it

package/skills/architecture-zoo/references/synthesis.md ADDED

@@ -0,0 +1,71 @@
+# Image synthesis / translation architectures (architecture-zoo)
+For "synthesise or translate a modality" questions — MRI→CT, non-contrast→contrast,
+low-dose→full-dose, denoising, super-resolution, or generating training images.
+`/model-scaffold --task synthesis` emits a Pix2Pix repo. **Caveat**: a synthetic image can
+carry hallucinated structure, so a downstream-task or reader validation is mandatory (the
+`image_synthesis.md` reviewer probe, IS1–IS4, owns this).
+Each card: **paper → core idea → when to use → medical-imaging use → reference impl →
+validation/experiment setup.**
+---
+## Conditional GANs (image-to-image)
+### Pix2Pix (paired) / CycleGAN (unpaired)
+- **Papers**: Isola et al., Pix2Pix, *CVPR* 2017 (paired, U-Net generator + PatchGAN);
+  Zhu et al., CycleGAN, *ICCV* 2017 (unpaired, cycle-consistency).
+- **Core idea**: a conditional GAN maps a source image to a target domain; Pix2Pix needs
+  **paired** (registered) images, CycleGAN works **unpaired** via cycle consistency.
+- **When to use**: cross-modality translation when paired data exist (Pix2Pix) or do not
+  (CycleGAN — but it can hallucinate, so validate carefully).
+- **Medical-imaging use**: MRI→CT for attenuation correction / planning, CBCT→CT, virtual
+  contrast, stain transfer in pathology; bone suppression on CXR (paired).
+- **Reference impl**: the official pytorch-CycleGAN-and-pix2pix repo; `/model-scaffold
+  --task synthesis` emits a small Pix2Pix (U-Net generator + PatchGAN).
+- **Validation setup**: image-fidelity metrics (SSIM / PSNR) are necessary but **not
+  sufficient** — add a **downstream-task** metric (does a model / clinician perform the
+  clinical task as well on synthetic as on real?) and disclose hallucination risk
+  (`image_synthesis.md` IS1–IS4).
+### SPADE / conditional generators
+- **Paper**: Park et al., SPADE, *CVPR* 2019 (spatially-adaptive normalisation from a
+  semantic map).
+- **When to use**: generating images conditioned on a segmentation map (e.g. lesion
+  insertion / data augmentation with controlled anatomy).
+- **Medical-imaging use**: nodule / lesion synthesis for augmentation (with a perceptual
+  loss; Johnson et al. 2016).
+## Diffusion models (current SOTA for fidelity / diversity)
+### DDPM / latent diffusion
+- **Papers**: Ho et al., DDPM, *NeurIPS* 2020; Rombach et al., latent diffusion, *CVPR*
+  2022.
+- **Core idea**: learn to reverse a gradual noising process; higher fidelity and mode
+  coverage than GANs, at higher compute.
+- **When to use**: when sample quality / diversity matters and compute allows; increasingly
+  the default for medical image generation and reconstruction.
+- **Reference impl**: MONAI `generative` (DiffusionModelUNet); HuggingFace `diffusers`.
+- **Validation setup**: as GANs — fidelity + downstream-task + hallucination disclosure;
+  for reconstruction, compare against the acquired ground truth.
+## Reconstruction / restoration
+### VAE / U-Net restoration / fastMRI baselines
+- **Papers**: Kingma & Welling, VAE, *ICLR* 2014; the fastMRI benchmark (Zbontar et al.
+  2018) for MRI reconstruction.
+- **When to use**: denoising, artefact removal, accelerated MRI reconstruction (often a
+  U-Net or unrolled model rather than a GAN).
+- **Validation setup**: against the fully-sampled / full-dose reference, with a downstream
+  diagnostic metric.
+---
+## Choosing among these
+Paired translation → **Pix2Pix** (`/model-scaffold --task synthesis`). Unpaired → **CycleGAN**
+(validate for hallucination). Conditioned on a map / augmentation → **SPADE**. Highest fidelity,
+compute available → **diffusion** (MONAI generative). Reconstruction / denoising → **U-Net /
+unrolled / fastMRI baselines**. In every case, **image-fidelity metrics are not enough** — add a
+downstream-task or reader validation and disclose hallucination risk. Record the choice + paper,
+hand to `/model-scaffold`, validate with `/model-validation` (and the `image_synthesis` probe).
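
The added file treats image-fidelity metrics such as PSNR as necessary but not sufficient. For reference, a minimal PSNR sketch is shown below, using the standard definition 10 * log10(data_range^2 / MSE). The flat-list image representation and the function name are assumptions for illustration; a real pipeline would more likely use an array library.

```python
import math


def psnr(reference, synthetic, data_range):
    """PSNR in dB between two equally sized images given as flat lists of pixel values."""
    if not reference or len(reference) != len(synthetic):
        raise ValueError("images must be non-empty and the same size")
    mse = sum((r - s) ** 2 for r, s in zip(reference, synthetic)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(data_range ** 2 / mse)


# Made-up 4-pixel example with an 8-bit intensity range.
reference = [0, 50, 100, 150]
synthetic = [0, 52, 98, 150]
score = psnr(reference, synthetic, data_range=255)
```

A high value here says only that pixel intensities are close on average. A small hallucinated structure can leave PSNR almost unchanged, which is why the file pairs fidelity metrics with a downstream-task or reader validation.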

package/skills/architecture-zoo/skill.yml CHANGED

@@ -29,7 +29,7 @@ safety_boundaries:
   - "Advisory only: it writes a decision note, never code or weights; the build is /model-scaffold."
   - "Every recommendation names its source paper; benchmark numbers are cited, never invented; the zoo describes archetypes, not a live leaderboard."
 known_limitations:
-  - "The literature moves fast; this is a curated archetype map (classification, segmentation, foundation/SSL families seeded), not an exhaustive or current SOTA ranking — additional families (detection, synthesis) land in later phases."
+  - "The literature moves fast; this is a curated archetype map (classification, segmentation, detection, synthesis, foundation/SSL families), not an exhaustive or current SOTA ranking."
   - "A sound architecture choice is necessary, not sufficient; validity still depends on the split, validation design, and metrics (/model-validation, /model-evaluation)."
 validation_commands:
   - "carry the decision note into /model-scaffold to instantiate the chosen template, then /model-validation"

package/skills/model-scaffold/SKILL.md CHANGED

@@ -1,14 +1,15 @@
 ---
 name: model-scaffold
 description: >
-  Generate a reproducible, runnable PyTorch training repo for a medical-imaging segmentation task —
-  the missing middle link between choosing an architecture and validating a trained model. Emits a
-  patient-level seed-locked split as an auditable artifact, a configurable U-Net, train and evaluate
-  scripts that seed every RNG and infer under eval mode, a config, requirements, a reproducibility
-  record, and a Methods stub with VERIFY placeholders (no fabricated numbers). The reproducibility
-  guarantees hold by construction, so the build is leakage-safe before any training runs. Integrates
-  with MONAI, nnU-Net, and TorchIO — it does not reimplement them.
-triggers: model scaffold, scaffold a model, training repo, PyTorch repo, build a model, train a segmentation model, U-Net, UNet, segmentation model, nnU-Net, MONAI, dataloader, train.py, patient-level split, reproducible training, seed everything, generate training code, medical imaging model
+  Generate a reproducible, runnable PyTorch training repo for a medical-imaging task — segmentation,
+  classification, detection, image-to-image synthesis, or self-supervised pretraining — the missing
+  middle link between choosing an architecture and validating a trained model. Emits a patient-level
+  seed-locked split as an auditable artifact, a task-appropriate model, train and evaluate scripts that
+  seed every RNG and infer under eval mode, a config, requirements, a reproducibility record, and a
+  Methods stub with VERIFY placeholders (no fabricated numbers). The reproducibility guarantees hold by
+  construction, so the build is leakage-safe before any training runs. Integrates with MONAI, nnU-Net,
+  TorchIO, timm, and torchvision — it does not reimplement them.
+triggers: model scaffold, scaffold a model, training repo, PyTorch repo, build a model, train a model, segmentation, classification, detection, image synthesis, self-supervised, SimCLR, Pix2Pix, Faster R-CNN, U-Net, UNet, nnU-Net, MONAI, timm, torchvision, dataloader, train.py, patient-level split, reproducible training, seed everything, generate training code, medical imaging model
 tools: Read, Write, Edit, Bash, Grep, Glob
 model: inherit
 ---
@@ -17,7 +18,9 @@ model: inherit
 ## Purpose
-This skill stamps out a **runnable PyTorch training repo** for a medical-imaging segmentation task
+This skill stamps out a **runnable PyTorch training repo** for a medical-imaging task — `--task`
+**segmentation** (U-Net), **classification** (CNN / `timm` backbone), **detection** (torchvision Faster
+R-CNN / FPN), **synthesis** (Pix2Pix generator + PatchGAN), or **ssl** (SimCLR encoder) —
 with the reproducibility guarantees **baked in by construction** — so the build is leakage-safe and
 reproducible before a single epoch runs. It is the imaging analogue of how `/analyze-stats` generates
 runnable statistical code: the generator produces the repo, you run the training on your GPU / Colab,
@@ -49,11 +52,14 @@ patient level off this column.
 ### Phase 2 — Generate the repo
 ```bash
 python3 ${CLAUDE_SKILL_DIR}/scripts/scaffold.py \
-  --manifest <manifest.csv> --out model_repo --seed 42 \
+  --manifest <manifest.csv> --task segmentation --out model_repo --seed 42 \
   --in-channels 1 --out-channels 1
+# --task = segmentation | classification | detection | synthesis | ssl
+#   (out-channels = num classes for classification, target channels for synthesis)
 ```
-This writes `model_repo/` with `config.yaml`, `model.py` (configurable U-Net), `dataset.py` (reads the
-frozen split), `losses.py` (Dice + BCE), `train.py`, `evaluate.py`, `requirements.txt`,
+This writes `model_repo/` with `config.yaml`, `model.py` (the task's model — U-Net / CNN / Faster R-CNN
+/ Pix2Pix / SimCLR encoder), `dataset.py` (reads the frozen split), `losses.py` (task-appropriate),
+`train.py`, `evaluate.py`, `requirements.txt`,
 `REPRODUCIBILITY.md`, `methods_stub.md`, and — the key artifact — `splits/split_assignment.csv` +
 `splits/split_seed.txt`. The split is **patient-disjoint by construction** (a deterministic group split)
 and the emitted code seeds every RNG, sets cuDNN deterministic, builds the training loader from the