medsci-skills 4.11.0 → 5.0.0

package/README.md CHANGED
@@ -4,7 +4,7 @@
4
4
 
5
5
  **51 skills that actually work.** Built by a physician-researcher, tested on real publications.
6
6
 
7
- *MedSci Skills is a submission-grade clinical manuscript workflow, not a generic biomedical skill catalog. Its moat is the compliance layer — 38 reporting guidelines and risk-of-bias tools, reference/citation verification, and deterministic integrity gates, before peer review sees the manuscript. It competes on clinical submission reliability, not skill count.*
7
+ *MedSci Skills is an end-to-end research tool for physician and medical-engineering researchers — design → scaffold → validate → publish — for the clinical manuscript and the medical-AI model behind it. Its moat is the compliance layer — 38 reporting guidelines and risk-of-bias tools, reference/citation verification, and deterministic integrity gates before peer review, now extended by a model-engineering lane that scaffolds reproducible, leakage-safe training repos and audits model validation. Clinical AI model research engineering is in scope; a general AI-scientist platform is not. It competes on clinical submission reliability, not skill count.*
8
8
 
9
9
  [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE)
10
10
  [![Release](https://img.shields.io/github/v/release/Aperivue/medsci-skills?style=flat-square&color=blue)](https://github.com/Aperivue/medsci-skills/releases/latest)
@@ -42,14 +42,20 @@
42
42
  ## What is MedSci Skills?
43
43
 
44
44
  MedSci Skills is an open-source Claude Code skill collection for **clinical
45
- manuscript preparation**. It helps physician-researchers and biomedical
46
- investigators move from literature search, study design, statistics, and figures to
47
- reporting-guideline compliance, citation/reference auditing, numerical-consistency
48
- checks, and response-to-reviewer workflows — combining agentic writing with
49
- **deterministic integrity gates** for submission-grade biomedical research. It is
50
- **not** a diagnostic tool, an autonomous author, or a general AI-scientist platform;
51
- every output requires human-expert verification. New here? See the
52
- [3 workflows below](#start-here-3-workflows), the [FAQ](docs/faq.md), and the
45
+ research — the manuscript and the medical-AI model alike**. It helps
46
+ physician-researchers and biomedical/medical-engineering investigators move from
47
+ literature search, study design, statistics, and figures to reporting-guideline
48
+ compliance, citation/reference auditing, numerical-consistency checks, and
49
+ response-to-reviewer workflows, combining agentic writing with **deterministic
50
+ integrity gates** for submission-grade biomedical research. As of **v5.0** it adds a
51
+ **model-engineering lane**: choose a paper-grounded architecture, scaffold a
52
+ reproducible, leakage-safe PyTorch training repo, and validate, document, and
53
+ evaluate a medical-imaging or LLM/MLLM model so the work reaches a paper — it
54
+ **integrates** MONAI / nnU-Net, never reimplements them. Clinical AI model research
55
+ engineering is in scope; it is **not** a diagnostic tool, an autonomous author, or a
56
+ general AI-scientist platform, and every output requires human-expert verification.
57
+ New here? See the [3 workflows below](#start-here-3-workflows), the
58
+ [FAQ](docs/faq.md), and the
53
59
  [scope boundary](ROADMAP.md#not-planned--explicitly-out-of-scope).
54
60
 
55
61
  ---
@@ -82,17 +88,18 @@ Restart Claude Code, then start with **`/orchestrate`** — it classifies your r
82
88
 
83
89
  ### Install as a Claude Code plugin
84
90
 
85
- Prefer plugins? One line adds the marketplace; `/plugin` then lets you browse eight category plugins and enable the ones you want:
91
+ Prefer plugins? One line adds the marketplace; `/plugin` then lets you browse nine category plugins and enable the ones you want:
86
92
 
87
93
  ```text
88
94
  /plugin marketplace add Aperivue/medsci-skills
89
- /plugin # browse eight category plugins; enable the ones you want
95
+ /plugin # browse nine category plugins; enable the ones you want
90
96
  ```
91
97
 
92
98
  | Plugin | Covers |
93
99
  |--------|--------|
94
100
  | `medsci-literature` | Literature search, full-text retrieval, Zotero sync, reference-integrity audits |
95
101
  | `medsci-data` | Study design, variable operationalization, sample size, data cleaning, de-identification, codebooks, dataset versioning |
102
+ | `medsci-modeling` | Architecture selection, reproducible model-scaffold repos, model-validation audits, Model Card/Datasheet, model & LLM/MLLM evaluation |
96
103
  | `medsci-analysis` | Statistics, figures, batch/cross-national/replication analysis, meta-analysis |
97
104
  | `medsci-writing` | IMRAD & protocol drafting, AI-pattern removal, AI-search optimization, reviewer responses |
98
105
  | `medsci-review` | Self-review, peer review, reporting-guideline compliance |
@@ -452,7 +459,7 @@ ma-scout -> search-lit -> fulltext-retrieval -> design-study ──> write-proto
452
459
  | **design-study** | Study design review: identifies analysis unit, cohort logic, data leakage risks, comparator design, validation strategy, and reporting guideline fit. |
453
460
  | **design-ai-benchmarking** | Design and validity review for benchmarking AI system(s) against a human-expert panel: evaluation-question and arm definition, decoupled multi-dimensional rubrics with anchors, planted calibration probes (positive-control / known-bad / instability / mechanism-contradiction), reviewer-panel construction with per-reviewer randomization, inter-rater reliability targets with separate control-item reliability, LLM-as-judge vs human-as-judge adjudication, construct-independence guards, and a structured JSON rating-export schema. Locks the rubric before data collection. |
454
461
  | **model-validation** | Design or audit the clinical-validation study for an engineer-built medical-imaging model (segmentation / classification / detection): patient-level split disjointness and the data-leakage taxonomy, tuning-on-test, internal vs genuine external validation, comparator design, single-run vs multi-seed variance, task-correct metric selection (Metrics Reloaded), test-set sizing, and CLAIM 2024 / TRIPOD+AI / STARD-AI reporting fit. Ships a deterministic split-leakage gate that proves patient disjointness by set arithmetic on the emitted split table. Integrates with MONAI / nnU-Net — does not replace them. |
455
- | **model-scaffold** | Generate a reproducible, runnable PyTorch training repo for a medical-imaging segmentation task — the missing middle link between choosing an architecture and validating a trained model. Emits a patient-level seed-locked split as an auditable artifact, a configurable U-Net, train/evaluate scripts that seed every RNG and infer under eval mode, a config, requirements, a reproducibility record, and a Methods stub with VERIFY placeholders (no fabricated numbers). Reproducibility holds by construction; ships a `check_training_hygiene` AST gate + a network-free build→validate challenge. Integrates with MONAI / nnU-Net / TorchIO — does not reimplement them. |
462
+ | **model-scaffold** | Generate a reproducible, runnable PyTorch training repo for a medical-imaging task — segmentation (U-Net), classification, detection, image-to-image synthesis, or self-supervised pretraining — the missing middle link between choosing an architecture and validating a trained model. Emits a patient-level seed-locked split as an auditable artifact, a task-appropriate model, train/evaluate scripts that seed every RNG and infer under eval mode, a config, requirements, a reproducibility record, and a Methods stub with VERIFY placeholders (no fabricated numbers). Reproducibility holds by construction; ships a `check_training_hygiene` AST gate + a network-free build→validate challenge. Integrates with MONAI / nnU-Net / TorchIO / timm / torchvision — does not reimplement them. |
456
463
  | **architecture-zoo** | "Which architecture for which research question" decision tool: maps task (classification / segmentation / detection / transfer), modality, data scale, and class imbalance to a paper-grounded architecture shortlist. Curates the foundational curriculum (ResNet / DenseNet / EfficientNet / ViT / Swin; U-Net / 3-D U-Net / Attention & Residual U-Net / nnU-Net / Mask R-CNN; SAM/MedSAM / TotalSegmentator / BiomedCLIP / DINO / MAE / SimCLR) — each with core idea, when-to-use, medical-imaging use, reference implementation, validation setup, and the matching model-scaffold template. Advisory; teaches archetypes, not a live SOTA leaderboard. |
457
464
  | **model-card** | Generate the documentation an engineer-built medical-imaging model must carry — a Model Card (Mitchell et al. 2019), a Datasheet for its dataset (Gebru et al. 2021), and a METRIC-informed data-quality pass — filled from user-supplied facts (never fabricated), then verify every required section is present and non-empty with a deterministic completeness gate (`check_model_card_complete`). Model Card / Datasheet are documentation standards vendored as templates, not counted reporting checklists. |
458
465
  | **model-evaluation** | Compute and report task-correct held-out metrics for a trained medical-imaging model — segmentation (Dice + a boundary metric HD95/NSD, per structure), classification (AUROC + AUPRC + sensitivity/specificity with bootstrap CIs at the deployment prevalence), or detection (FROC/mAP with a stated IoU criterion) — plus calibration and subgroup slices. Emits a per-case table for analyze-stats and gates the metric choice against Metrics Reloaded / CLAIM 2024 (`check_metric_reporting`). Numbers come only from executed code. |
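The model-validation row above describes a gate that "proves patient disjointness by set arithmetic on the emitted split table". The following is a minimal sketch of that idea, not the package's actual gate: the column names `patient_id` and `split`, the function name `find_leaked_patients`, and the inline table are all assumptions made for illustration.

```python
import csv
import io
from collections import defaultdict
from itertools import combinations


def find_leaked_patients(rows):
    """Return {(split_a, split_b): set of patient IDs present in both splits}."""
    patients_by_split = defaultdict(set)
    for row in rows:
        patients_by_split[row["split"]].add(row["patient_id"])
    leaks = {}
    # Intersect every pair of splits; a non-empty intersection is a leak.
    for a, b in combinations(sorted(patients_by_split), 2):
        shared = patients_by_split[a] & patients_by_split[b]
        if shared:
            leaks[(a, b)] = shared
    return leaks


# Hypothetical split table; a real one would be read from the emitted CSV file.
SPLIT_TABLE = """patient_id,split
P001,train
P002,train
P003,val
P004,test
P002,test
"""

rows = list(csv.DictReader(io.StringIO(SPLIT_TABLE)))
leaks = find_leaked_patients(rows)
for (a, b), shared in leaks.items():
    print(f"LEAK between {a} and {b}: {sorted(shared)}")
```

An empty result means every pair of splits is patient-disjoint. Because the check works on patient IDs rather than image files, two images from the same patient in different splits are caught even when the file names differ.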
@@ -398,14 +398,19 @@
398
398
  },
399
399
  {
400
400
  "path": "skills/architecture-zoo/SKILL.md",
401
- "size": 5444,
402
- "sha256": "6d8f81262a42ff24e36dca425511804b9f324d2c900f31f701c703e3d8326729"
401
+ "size": 5712,
402
+ "sha256": "ffb9a52d417f309a3b22c8d0f74e6700b67350c0d7a2a2e2c785f0cb4cb066bc"
403
403
  },
404
404
  {
405
405
  "path": "skills/architecture-zoo/references/classification.md",
406
406
  "size": 5986,
407
407
  "sha256": "035e0fddaccb0e19e23ffd7756b075154f847ee20f9a512cd2a010c0db4210fa"
408
408
  },
409
+ {
410
+ "path": "skills/architecture-zoo/references/detection.md",
411
+ "size": 3917,
412
+ "sha256": "60549b53a87dcb149c442498695321e94c28f0c8853316dda995a2dd730dfeec"
413
+ },
409
414
  {
410
415
  "path": "skills/architecture-zoo/references/foundation_models.md",
411
416
  "size": 5267,
@@ -413,18 +418,23 @@
413
418
  },
414
419
  {
415
420
  "path": "skills/architecture-zoo/references/index.md",
416
- "size": 4065,
417
- "sha256": "a1ed80efcf9a56e0286972ae9b08bd38965bb20a569e3aac5314549dfa6ad5f4"
421
+ "size": 4024,
422
+ "sha256": "d2360eb8635347f0f22be0b0e337a540db865aafdc12678455cf32bc49fd9d3d"
418
423
  },
419
424
  {
420
425
  "path": "skills/architecture-zoo/references/segmentation.md",
421
426
  "size": 6508,
422
427
  "sha256": "17618fe3d6884cf89b034ededcd69e081a65d7bfd473495eb2ab1fb5d8b15d8b"
423
428
  },
429
+ {
430
+ "path": "skills/architecture-zoo/references/synthesis.md",
431
+ "size": 3973,
432
+ "sha256": "b7fec1a55a7b9e2f8eea6979d7bfbda1fc03664aeec0cc89daac8c8f9bdfb8bc"
433
+ },
424
434
  {
425
435
  "path": "skills/architecture-zoo/skill.yml",
426
- "size": 2889,
427
- "sha256": "275cfb1d0779028d79d8596879c09b5b6c714859142cb72a12b7ec901acc69e1"
436
+ "size": 2836,
437
+ "sha256": "7d7c727a9fc75383e775feac14faf1c4caad90307ed8ca20859ef499ac17deb0"
428
438
  },
429
439
  {
430
440
  "path": "skills/author-strategy/SKILL.md",
@@ -2628,8 +2638,8 @@
2628
2638
  },
2629
2639
  {
2630
2640
  "path": "skills/model-scaffold/SKILL.md",
2631
- "size": 7048,
2632
- "sha256": "a1612de8a2888667137c3d81100adcb4a6c42d45c73edf93f715fad1da7e179f"
2641
+ "size": 7686,
2642
+ "sha256": "cede2020d3ceee0e1599cbcfb412230ab9c04b08032c8bbaff620e4129ec6785"
2633
2643
  },
2634
2644
  {
2635
2645
  "path": "skills/model-scaffold/references/training_guide.md",
@@ -2643,8 +2653,8 @@
2643
2653
  },
2644
2654
  {
2645
2655
  "path": "skills/model-scaffold/scripts/scaffold.py",
2646
- "size": 19327,
2647
- "sha256": "d37d617d9753faf6cd6f4d9332d896ad3704d4f2dea8578d4d23f02baedbfd09"
2656
+ "size": 40470,
2657
+ "sha256": "33569806ecd230aebb9d35c77a27b8480d23329b00f633d7fee9531a7bcec974"
2648
2658
  },
2649
2659
  {
2650
2660
  "path": "skills/model-scaffold/scripts/scaffold_challenge/expected/split_assignment.csv",
@@ -2663,13 +2673,13 @@
2663
2673
  },
2664
2674
  {
2665
2675
  "path": "skills/model-scaffold/scripts/scaffold_challenge/verify.sh",
2666
- "size": 4191,
2667
- "sha256": "f2476dc72772ffa77ba13ed43067a19bf4f9b6fbce9ce49a1726eee2f64d7b21"
2676
+ "size": 5136,
2677
+ "sha256": "040acee712943e8392b70b40b790438bec056a3a4e7a282f76455d2a2b791447"
2668
2678
  },
2669
2679
  {
2670
2680
  "path": "skills/model-scaffold/skill.yml",
2671
- "size": 3104,
2672
- "sha256": "a6284452adbbb533c63dcd294c5b4b7fce7d93e1be381d27323168ac3888ba33"
2681
+ "size": 3195,
2682
+ "sha256": "06c4ed02872bfc38abd1c753c02a640a2bfaa1141fbfdd9f8b9ccbe18a148d82"
2673
2683
  },
2674
2684
  {
2675
2685
  "path": "skills/model-validation/SKILL.md",
@@ -2963,8 +2973,8 @@
2963
2973
  },
2964
2974
  {
2965
2975
  "path": "skills/present-paper/SKILL.md",
2966
- "size": 29247,
2967
- "sha256": "aa8455317bd4996d5b1e0cc9d27c8be33b8112b7826749d87376c161d6cf87d5"
2976
+ "size": 33518,
2977
+ "sha256": "777197edf83a4d242508b366abe24681198467a95950b0acc385ac2714f33cb4"
2968
2978
  },
2969
2979
  {
2970
2980
  "path": "skills/present-paper/references/critic_rubrics/slide.md",
@@ -2981,11 +2991,41 @@
2981
2991
  "size": 15007,
2982
2992
  "sha256": "d0f964af7523ec8bfef50ca627878f8c2cfe58159c2a827c5f7dfd43585cf9ee"
2983
2993
  },
2994
+ {
2995
+ "path": "skills/present-paper/references/presentation_design_guidelines.md",
2996
+ "size": 7460,
2997
+ "sha256": "689d021ff7fc5e04abf93f3d6e1b0646bb5aa86430239b76b51d2224b3b92f0c"
2998
+ },
2984
2999
  {
2985
3000
  "path": "skills/present-paper/references/slide_design_principles.md",
2986
3001
  "size": 10436,
2987
3002
  "sha256": "7f2a5e03c8f2ddbb2d84a163506c5f3a2d1cca1353a694abd7bfb14225324826"
2988
3003
  },
3004
+ {
3005
+ "path": "skills/present-paper/references/slide_visual_styles/CATALOG.md",
3006
+ "size": 3177,
3007
+ "sha256": "78782ce6916212bcae9e1d8513197721a8ac72ed1faa33ce335b36d0c88a828f"
3008
+ },
3009
+ {
3010
+ "path": "skills/present-paper/references/slide_visual_styles/clinical_blue.md",
3011
+ "size": 3249,
3012
+ "sha256": "334ee770935b93a1ddf86c426a8f3da377d2618c0f6553308ccc38e871f26160"
3013
+ },
3014
+ {
3015
+ "path": "skills/present-paper/references/slide_visual_styles/dark_modern.md",
3016
+ "size": 3197,
3017
+ "sha256": "b5adf2331318a1276fa5eedf173a36c056d220fbfa04819a5c4d4d68b1109f68"
3018
+ },
3019
+ {
3020
+ "path": "skills/present-paper/references/slide_visual_styles/editorial_mono.md",
3021
+ "size": 3068,
3022
+ "sha256": "db07e84bbf4d248f0a4837fe22ed61c82e176fe94055b590131e24d25c59f20b"
3023
+ },
3024
+ {
3025
+ "path": "skills/present-paper/references/slide_visual_styles/institutional_brand.md",
3026
+ "size": 4431,
3027
+ "sha256": "c8b04c93bf61072fc6c4ee7d402e107d7375e933d60085bf325d74830896684a"
3028
+ },
2989
3029
  {
2990
3030
  "path": "skills/present-paper/references/slide_visual_styles/nature_lancet.md",
2991
3031
  "size": 7989,
@@ -3011,6 +3051,11 @@
3011
3051
  "size": 6758,
3012
3052
  "sha256": "6eeaf94c396d0f4ff365eaea0408f2fc00f8e2dc75b53a9514214194cb9329f9"
3013
3053
  },
3054
+ {
3055
+ "path": "skills/present-paper/scripts/inspect_pptx_template.py",
3056
+ "size": 5073,
3057
+ "sha256": "648fe3d2904a5ffffb41eb064d1780f605a672dc44618a74b5e3e59c023cb63d"
3058
+ },
3014
3059
  {
3015
3060
  "path": "skills/present-paper/scripts/strip_notes_for_sharing.py",
3016
3061
  "size": 5508,
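Each manifest entry in the hunks above pairs a `path` with a `size` and a `sha256`. As a rough sketch of how such an entry could be checked against file contents (the function name `matches_manifest_entry` and the sample entry are hypothetical, and nothing here is taken from the package's own tooling):

```python
import hashlib


def matches_manifest_entry(data: bytes, entry: dict) -> bool:
    """Check raw file bytes against a {"size", "sha256"} manifest entry."""
    if len(data) != entry["size"]:
        return False  # cheap size check before hashing
    return hashlib.sha256(data).hexdigest() == entry["sha256"]


# Hypothetical entry built from in-memory bytes so the sketch runs without the package.
payload = b"# example skill file\n"
entry = {
    "path": "skills/example/SKILL.md",  # illustrative path, not from the manifest
    "size": len(payload),
    "sha256": hashlib.sha256(payload).hexdigest(),
}

print(matches_manifest_entry(payload, entry))            # unchanged bytes
print(matches_manifest_entry(payload + b"edit", entry))  # modified bytes
```

With real files, `data` would come from `Path(entry["path"]).read_bytes()` relative to the package root.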
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "schema_version": 1,
3
- "version": "4.11.0",
3
+ "version": "5.0.0",
4
4
  "owned_skills": [
5
5
  "academic-aio",
6
6
  "add-journal",
@@ -17,7 +17,6 @@
17
17
  "key": "data_study_design",
18
18
  "label": "Data & Study Design",
19
19
  "slugs": [
20
- "architecture-zoo",
21
20
  "calc-sample-size",
22
21
  "clean-data",
23
22
  "define-variables",
@@ -25,12 +24,19 @@
25
24
  "design-ai-benchmarking",
26
25
  "design-study",
27
26
  "generate-codebook",
27
+ "version-dataset"
28
+ ]
29
+ },
30
+ {
31
+ "key": "model_engineering",
32
+ "label": "Model Engineering & Validation",
33
+ "slugs": [
34
+ "architecture-zoo",
28
35
  "mllm-eval",
29
36
  "model-card",
30
37
  "model-evaluation",
31
38
  "model-scaffold",
32
- "model-validation",
33
- "version-dataset"
39
+ "model-validation"
34
40
  ]
35
41
  },
36
42
  {
@@ -132,8 +138,8 @@
132
138
  },
133
139
  {
134
140
  "slug": "architecture-zoo",
135
- "category": "data_study_design",
136
- "category_label": "Data & Study Design",
141
+ "category": "model_engineering",
142
+ "category_label": "Model Engineering & Validation",
137
143
  "layer": "D",
138
144
  "owner_domain": "architecture_reference",
139
145
  "maturity": "official",
@@ -366,8 +372,8 @@
366
372
  },
367
373
  {
368
374
  "slug": "mllm-eval",
369
- "category": "data_study_design",
370
- "category_label": "Data & Study Design",
375
+ "category": "model_engineering",
376
+ "category_label": "Model Engineering & Validation",
371
377
  "layer": "D",
372
378
  "owner_domain": "model_evaluation",
373
379
  "maturity": "official",
@@ -375,8 +381,8 @@
375
381
  },
376
382
  {
377
383
  "slug": "model-card",
378
- "category": "data_study_design",
379
- "category_label": "Data & Study Design",
384
+ "category": "model_engineering",
385
+ "category_label": "Model Engineering & Validation",
380
386
  "layer": "C",
381
387
  "owner_domain": "model_reporting",
382
388
  "maturity": "official",
@@ -384,8 +390,8 @@
384
390
  },
385
391
  {
386
392
  "slug": "model-evaluation",
387
- "category": "data_study_design",
388
- "category_label": "Data & Study Design",
393
+ "category": "model_engineering",
394
+ "category_label": "Model Engineering & Validation",
389
395
  "layer": "B",
390
396
  "owner_domain": "model_evaluation",
391
397
  "maturity": "official",
@@ -393,17 +399,17 @@
393
399
  },
394
400
  {
395
401
  "slug": "model-scaffold",
396
- "category": "data_study_design",
397
- "category_label": "Data & Study Design",
402
+ "category": "model_engineering",
403
+ "category_label": "Model Engineering & Validation",
398
404
  "layer": "B",
399
405
  "owner_domain": "model_development",
400
406
  "maturity": "official",
401
- "description": "Generate a reproducible, runnable PyTorch training repo for a medical-imaging segmentation task — the missing middle link between choosing an architecture and validating a trained model."
407
+ "description": "Generate a reproducible, runnable PyTorch training repo for a medical-imaging task (segmentation, classification, detection, image-to-image synthesis, or self-supervised pretraining), the missing mid…"
402
408
  },
403
409
  {
404
410
  "slug": "model-validation",
405
- "category": "data_study_design",
406
- "category_label": "Data & Study Design",
411
+ "category": "model_engineering",
412
+ "category_label": "Model Engineering & Validation",
407
413
  "layer": "D",
408
414
  "owner_domain": "model_validation",
409
415
  "maturity": "official",
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "medsci-skills",
3
- "version": "4.11.0",
3
+ "version": "5.0.0",
4
4
  "description": "MedSci Skills — a medical/scientific research skill suite for AI coding agents (Claude Code, Codex, Cursor, Copilot). The npm package is a terminal-friendly installer shortcut; the canonical distribution remains the GitHub repository and the Claude Code plugin marketplace.",
5
5
  "license": "SEE LICENSE IN LICENSE",
6
6
  "homepage": "https://github.com/Aperivue/medsci-skills#readme",
@@ -56,6 +56,10 @@ a family card.
56
56
  ViT / Swin / DeiT.
57
57
  - `${CLAUDE_SKILL_DIR}/references/segmentation.md` — U-Net / 3-D U-Net / V-Net / Attention & Residual
58
58
  U-Net / nnU-Net / SegResNet / Swin-UNETR / Mask R-CNN.
59
+ - `${CLAUDE_SKILL_DIR}/references/detection.md` — R-CNN family / Faster R-CNN + FPN / Mask R-CNN /
60
+ RetinaNet / YOLO / DETR.
61
+ - `${CLAUDE_SKILL_DIR}/references/synthesis.md` — Pix2Pix / CycleGAN / SPADE / diffusion (DDPM, latent) /
62
+ VAE / fastMRI reconstruction.
59
63
  - `${CLAUDE_SKILL_DIR}/references/foundation_models.md` — SAM / MedSAM / MedSAM2 / TotalSegmentator /
60
64
  SegVol / BiomedCLIP / DINO / MAE / SimCLR / MoCo.
61
65
  Each card gives the paper, core idea, when-to-use, medical-imaging use, reference implementation, and the
@@ -0,0 +1,68 @@
1
+ # Detection architectures (architecture-zoo)
2
+
3
+ For "find and localise lesions" questions — boxes / points, a count, and a per-lesion
4
+ hit/miss (FROC). Distinct from segmentation (a pixel mask) and classification (a per-image
5
+ label): detection localises *instances*. `/model-scaffold --task detection` emits a
6
+ torchvision Faster R-CNN repo whose FROC/mAP you compute downstream.
7
+
8
+ Each card: **paper → core idea → when to use → medical-imaging use → reference impl →
9
+ validation/experiment setup.**
10
+
11
+ ---
12
+
13
+ ## Two-stage detectors (region proposal → classify)
14
+
15
+ ### R-CNN → Fast R-CNN → Faster R-CNN (+ FPN)
16
+ - **Papers**: Girshick et al., R-CNN, *CVPR* 2014; Girshick, Fast R-CNN, *ICCV* 2015; Ren
17
+ et al., Faster R-CNN, *NeurIPS* 2015; Lin et al., **FPN**, *CVPR* 2017.
18
+ - **Core idea**: Faster R-CNN adds a learned Region Proposal Network (end-to-end); FPN adds
19
+ a multi-scale feature pyramid so small and large lesions are both detected.
20
+ - **When to use**: the **default two-stage detector** for medical lesion detection — strong,
21
+ well-understood, good on small objects with FPN; favour accuracy over real-time speed.
22
+ - **Medical-imaging use**: nodule / lesion / aneurysm detection on CT / MR / mammography
23
+ (ResNet-FPN backbone).
24
+ - **Reference impl**: torchvision `fasterrcnn_resnet50_fpn`; MONAI detection (RetinaNet).
25
+ - **Validation setup**: report **FROC** (sensitivity per false-positive-per-scan) or **mAP
26
+ with the IoU match criterion stated**; per-lesion analysis with patient-level clustering
27
+ disclosed; not patient-level accuracy (`/model-validation` MD6).
28
+
29
+ ### Mask R-CNN (detect + segment instances)
30
+ - **Paper**: He et al., Mask R-CNN, *ICCV* 2017.
31
+ - **Core idea**: a mask head on Faster R-CNN → per-instance box + class + mask.
32
+ - **When to use**: **count + localise + delineate** separate lesions (instance-level), not a
33
+ single semantic mask (that is `segmentation.md`).
34
+ - **Reference impl**: torchvision `maskrcnn_resnet50_fpn`.
35
+ - **Validation setup**: detection metrics for the boxes + per-instance Dice for the masks.
36
+
37
+ ## One-stage / query-based detectors (faster, end-to-end)
38
+
39
+ ### RetinaNet (focal loss)
40
+ - **Paper**: Lin et al., "Focal Loss for Dense Object Detection," *ICCV* 2017.
41
+ - **Core idea**: a one-stage dense detector with **focal loss** to handle the extreme
42
+ foreground/background imbalance — relevant when lesions are sparse.
43
+ - **When to use**: faster than two-stage, strong under heavy class imbalance.
44
+ - **Reference impl**: torchvision `retinanet_resnet50_fpn`; MONAI detection.
45
+
46
+ ### YOLO family
47
+ - **Papers**: Redmon et al., YOLO, *CVPR* 2016; later YOLOv3+/YOLOX.
48
+ - **Core idea**: a single network predicts boxes + classes directly on a grid — real-time.
49
+ - **When to use**: speed-critical / interactive settings; usually two-stage detectors are
50
+ preferred for maximal sensitivity on small medical lesions.
51
+
52
+ ### DETR (transformer, set prediction)
53
+ - **Paper**: Carion et al., "End-to-End Object Detection with Transformers," *ECCV* 2020.
54
+ - **Core idea**: a transformer treats detection as direct **set prediction** (no anchors /
55
+ NMS) via learned object queries + bipartite matching.
56
+ - **When to use**: large datasets where an anchor-free, end-to-end pipeline is attractive;
57
+ more data-hungry and slower to converge than CNN detectors.
58
+ - **Reference impl**: the official DETR repo; Deformable DETR for faster convergence.
59
+
60
+ ---
61
+
62
+ ## Choosing among these
63
+ Default lesion detection → **Faster R-CNN + FPN** (torchvision; `/model-scaffold --task
64
+ detection`). Sparse lesions / imbalance → **RetinaNet (focal loss)**. Count + delineate
65
+ instances → **Mask R-CNN**. Speed-critical → **YOLO**. Large data, anchor-free → **DETR**.
66
+ Always report **FROC / mAP with the IoU criterion stated**, per-lesion with patient-level
67
+ clustering disclosed. Record the choice + paper, hand to `/model-scaffold`, validate with
68
+ `/model-validation` and `/model-evaluation`.
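The validation notes in this file ask for FROC or mAP "with the IoU match criterion stated". A minimal sketch of a single FROC operating point is given here under stated assumptions: boxes are `(x1, y1, x2, y2)` corner tuples, predictions are already filtered at one confidence threshold and sorted by descending score, matching is greedy, and the 0.5 IoU threshold is only a placeholder. The function names are hypothetical and this is not the package's evaluation code.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def froc_point(scans, iou_threshold=0.5):
    """Return (lesion-level sensitivity, false positives per scan).

    `scans` is a non-empty list of (predicted_boxes, ground_truth_boxes) pairs.
    Each ground-truth lesion can be matched by at most one prediction.
    """
    hits = false_positives = total_lesions = 0
    for predictions, lesions in scans:
        total_lesions += len(lesions)
        unmatched = list(lesions)
        for pred in predictions:
            # Greedy: take the still-unmatched lesion with the highest overlap.
            best = max(unmatched, key=lambda gt: iou(pred, gt), default=None)
            if best is not None and iou(pred, best) >= iou_threshold:
                unmatched.remove(best)
                hits += 1
            else:
                false_positives += 1
    sensitivity = hits / total_lesions if total_lesions else float("nan")
    return sensitivity, false_positives / len(scans)


# Toy data: two scans, three lesions in total.
scans = [
    ([(10, 10, 20, 20), (50, 50, 60, 60)], [(11, 11, 21, 21)]),
    ([(0, 0, 10, 10)], [(0, 0, 10, 10), (30, 30, 40, 40)]),
]
sensitivity, fp_per_scan = froc_point(scans)
print(f"sensitivity={sensitivity:.3f}, FP/scan={fp_per_scan:.1f}")
```

Sweeping the confidence threshold and recomputing this pair at each step traces out the FROC curve. The counts are per lesion and per scan, so patient-level clustering still has to be handled and disclosed separately.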
@@ -10,10 +10,10 @@ per-paper detail and the `/model-scaffold` template to instantiate.
10
10
  |---|---|---|
11
11
  | "is finding X present / which class" (per image / per patient) | **classification** (binary / multi-label) | `classification.md` |
12
12
  | "delineate / measure structure X" (pixel/voxel mask, volume, boundary) | **segmentation** | `segmentation.md` |
13
- | "find and localise lesions" (boxes / points, count, FROC) | **detection** | *(forthcoming — see segmentation's Mask R-CNN note)* |
13
+ | "find and localise lesions" (boxes / points, count, FROC) | **detection** | `detection.md` |
14
14
  | "I have few labels / want to pretrain on unlabelled scans" | **self-supervised pretraining → fine-tune** | `foundation_models.md` |
15
15
  | "adapt a released medical foundation model" | **transfer / prompt a foundation model** | `foundation_models.md` |
16
- | "synthesise / translate a modality" (MRI→CT, denoise) | **image-to-image / generative** | *(forthcoming)* |
16
+ | "synthesise / translate a modality" (MRI→CT, denoise) | **image-to-image / generative** | `synthesis.md` |
17
17
  | "generate a report / answer a visual question" | **multimodal LLM** | *(use `/mllm-eval`; not a CNN choice)* |
18
18
 
19
19
  ## Step 2 — let the constraints narrow it
@@ -0,0 +1,71 @@
1
+ # Image synthesis / translation architectures (architecture-zoo)
2
+
3
+ For "synthesise or translate a modality" questions — MRI→CT, non-contrast→contrast,
4
+ low-dose→full-dose, denoising, super-resolution, or generating training images.
5
+ `/model-scaffold --task synthesis` emits a Pix2Pix repo. **Caveat**: a synthetic image can
6
+ carry hallucinated structure, so a downstream-task or reader validation is mandatory (the
7
+ `image_synthesis.md` reviewer probe, IS1–IS4, owns this).
8
+
9
+ Each card: **paper → core idea → when to use → medical-imaging use → reference impl →
10
+ validation/experiment setup.**
11
+
12
+ ---
13
+
14
+ ## Conditional GANs (image-to-image)
15
+
16
+ ### Pix2Pix (paired) / CycleGAN (unpaired)
17
+ - **Papers**: Isola et al., Pix2Pix, *CVPR* 2017 (paired, U-Net generator + PatchGAN);
18
+ Zhu et al., CycleGAN, *ICCV* 2017 (unpaired, cycle-consistency).
19
+ - **Core idea**: a conditional GAN maps a source image to a target domain; Pix2Pix needs
20
+ **paired** (registered) images, CycleGAN works **unpaired** via cycle consistency.
21
+ - **When to use**: cross-modality translation when paired data exist (Pix2Pix) or do not
22
+ (CycleGAN — but it can hallucinate, so validate carefully).
23
+ - **Medical-imaging use**: MRI→CT for attenuation correction / planning, CBCT→CT, virtual
24
+ contrast, stain transfer in pathology; bone suppression on CXR (paired).
25
+ - **Reference impl**: the official pytorch-CycleGAN-and-pix2pix repo; `/model-scaffold
26
+ --task synthesis` emits a small Pix2Pix (U-Net generator + PatchGAN).
27
+ - **Validation setup**: image-fidelity metrics (SSIM / PSNR) are necessary but **not
28
+ sufficient** — add a **downstream-task** metric (does a model / clinician perform the
29
+ clinical task as well on synthetic as on real?) and disclose hallucination risk
30
+ (`image_synthesis.md` IS1–IS4).
31
+
32
+ ### SPADE / conditional generators
33
+ - **Paper**: Park et al., SPADE, *CVPR* 2019 (spatially-adaptive normalisation from a
34
+ semantic map).
35
+ - **When to use**: generating images conditioned on a segmentation map (e.g. lesion
36
+ insertion / data augmentation with controlled anatomy).
37
+ - **Medical-imaging use**: nodule / lesion synthesis for augmentation (with a perceptual
38
+ loss; Johnson et al. 2016).
39
+
40
+ ## Diffusion models (current SOTA for fidelity / diversity)
41
+
42
+ ### DDPM / latent diffusion
43
+ - **Papers**: Ho et al., DDPM, *NeurIPS* 2020; Rombach et al., latent diffusion, *CVPR*
44
+ 2022.
45
+ - **Core idea**: learn to reverse a gradual noising process; higher fidelity and mode
46
+ coverage than GANs, at higher compute.
47
+ - **When to use**: when sample quality / diversity matters and compute allows; increasingly
48
+ the default for medical image generation and reconstruction.
49
+ - **Reference impl**: MONAI `generative` (DiffusionModelUNet); HuggingFace `diffusers`.
50
+ - **Validation setup**: as GANs — fidelity + downstream-task + hallucination disclosure;
51
+ for reconstruction, compare against the acquired ground truth.
52
+
53
+ ## Reconstruction / restoration
54
+
55
+ ### VAE / U-Net restoration / fastMRI baselines
56
+ - **Papers**: Kingma & Welling, VAE, *ICLR* 2014; the fastMRI benchmark (Zbontar et al.
57
+ 2018) for MRI reconstruction.
58
+ - **When to use**: denoising, artefact removal, accelerated MRI reconstruction (often a
59
+ U-Net or unrolled model rather than a GAN).
60
+ - **Validation setup**: against the fully-sampled / full-dose reference, with a downstream
61
+ diagnostic metric.
62
+
63
+ ---
64
+
65
+ ## Choosing among these
66
+ Paired translation → **Pix2Pix** (`/model-scaffold --task synthesis`). Unpaired → **CycleGAN**
67
+ (validate for hallucination). Conditioned on a map / augmentation → **SPADE**. Highest fidelity,
68
+ compute available → **diffusion** (MONAI generative). Reconstruction / denoising → **U-Net /
69
+ unrolled / fastMRI baselines**. In every case, **image-fidelity metrics are not enough** — add a
70
+ downstream-task or reader validation and disclose hallucination risk. Record the choice + paper,
71
+ hand to `/model-scaffold`, validate with `/model-validation` (and the `image_synthesis` probe).
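For the image-fidelity half of the validation advice, PSNR is the simpler of the two metrics named above. A minimal pure-Python sketch follows, using the standard definition of 10·log10(MAX² / MSE). The `data_range` parameter name and the toy pixel values are assumptions for illustration. As the text stresses, a high PSNR says nothing about hallucinated structure, so it does not replace downstream-task or reader validation.

```python
import math


def psnr(reference, synthetic, data_range=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences."""
    if len(reference) != len(synthetic):
        raise ValueError("images must have the same number of pixels")
    mse = sum((r - s) ** 2 for r, s in zip(reference, synthetic)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(data_range ** 2 / mse)


# Toy 2x2 images flattened to lists (illustrative values only).
real = [0, 64, 128, 255]
fake = [0, 60, 130, 250]
print(f"PSNR = {psnr(real, fake):.2f} dB")
```

For real volumes, `data_range` should be the intensity range of the reference modality rather than the 8-bit default used here.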
@@ -29,7 +29,7 @@ safety_boundaries:
29
29
  - "Advisory only: it writes a decision note, never code or weights; the build is /model-scaffold."
30
30
  - "Every recommendation names its source paper; benchmark numbers are cited, never invented; the zoo describes archetypes, not a live leaderboard."
31
31
  known_limitations:
32
- - "The literature moves fast; this is a curated archetype map (classification, segmentation, foundation/SSL families seeded), not an exhaustive or current SOTA ranking — additional families (detection, synthesis) land in later phases."
32
+ - "The literature moves fast; this is a curated archetype map (classification, segmentation, detection, synthesis, foundation/SSL families), not an exhaustive or current SOTA ranking."
33
33
  - "A sound architecture choice is necessary, not sufficient; validity still depends on the split, validation design, and metrics (/model-validation, /model-evaluation)."
34
34
  validation_commands:
35
35
  - "carry the decision note into /model-scaffold to instantiate the chosen template, then /model-validation"
@@ -1,14 +1,15 @@
1
1
  ---
2
2
  name: model-scaffold
3
3
  description: >
4
- Generate a reproducible, runnable PyTorch training repo for a medical-imaging segmentation task —
5
- the missing middle link between choosing an architecture and validating a trained model. Emits a
6
- patient-level seed-locked split as an auditable artifact, a configurable U-Net, train and evaluate
7
- scripts that seed every RNG and infer under eval mode, a config, requirements, a reproducibility
8
- record, and a Methods stub with VERIFY placeholders (no fabricated numbers). The reproducibility
9
- guarantees hold by construction, so the build is leakage-safe before any training runs. Integrates
10
- with MONAI, nnU-Net, and TorchIO; it does not reimplement them.
11
- triggers: model scaffold, scaffold a model, training repo, PyTorch repo, build a model, train a segmentation model, U-Net, UNet, segmentation model, nnU-Net, MONAI, dataloader, train.py, patient-level split, reproducible training, seed everything, generate training code, medical imaging model
4
+ Generate a reproducible, runnable PyTorch training repo for a medical-imaging task (segmentation,
5
+ classification, detection, image-to-image synthesis, or self-supervised pretraining): the missing
6
+ middle link between choosing an architecture and validating a trained model. Emits a patient-level
7
+ seed-locked split as an auditable artifact, a task-appropriate model, train and evaluate scripts that
8
+ seed every RNG and infer under eval mode, a config, requirements, a reproducibility record, and a
9
+ Methods stub with VERIFY placeholders (no fabricated numbers). The reproducibility guarantees hold by
10
+ construction, so the build is leakage-safe before any training runs. Integrates with MONAI, nnU-Net,
11
+ TorchIO, timm, and torchvision; it does not reimplement them.
12
+ triggers: model scaffold, scaffold a model, training repo, PyTorch repo, build a model, train a model, segmentation, classification, detection, image synthesis, self-supervised, SimCLR, Pix2Pix, Faster R-CNN, U-Net, UNet, nnU-Net, MONAI, timm, torchvision, dataloader, train.py, patient-level split, reproducible training, seed everything, generate training code, medical imaging model
12
13
  tools: Read, Write, Edit, Bash, Grep, Glob
13
14
  model: inherit
14
15
  ---
@@ -17,7 +18,9 @@ model: inherit
17
18
 
18
19
  ## Purpose
19
20
 
20
- This skill stamps out a **runnable PyTorch training repo** for a medical-imaging segmentation task
21
+ This skill stamps out a **runnable PyTorch training repo** for a medical-imaging task — `--task`
22
+ **segmentation** (U-Net), **classification** (CNN / `timm` backbone), **detection** (torchvision Faster
23
+ R-CNN / FPN), **synthesis** (Pix2Pix generator + PatchGAN), or **ssl** (SimCLR encoder) —
21
24
  with the reproducibility guarantees **baked in by construction** — so the build is leakage-safe and
22
25
  reproducible before a single epoch runs. It is the imaging analogue of how `/analyze-stats` generates
23
26
  runnable statistical code: the generator produces the repo, you run the training on your GPU / Colab,
@@ -49,11 +52,14 @@ patient level off this column.
49
52
  ### Phase 2 — Generate the repo
50
53
  ```bash
51
54
  python3 ${CLAUDE_SKILL_DIR}/scripts/scaffold.py \
52
- --manifest <manifest.csv> --out model_repo --seed 42 \
55
+ --manifest <manifest.csv> --task segmentation --out model_repo --seed 42 \
53
56
  --in-channels 1 --out-channels 1
57
+ # --task = segmentation | classification | detection | synthesis | ssl
58
+ # (out-channels = num classes for classification, target channels for synthesis)
54
59
  ```
55
- This writes `model_repo/` with `config.yaml`, `model.py` (configurable U-Net), `dataset.py` (reads the
56
- frozen split), `losses.py` (Dice + BCE), `train.py`, `evaluate.py`, `requirements.txt`,
60
+ This writes `model_repo/` with `config.yaml`, `model.py` (the task's model — U-Net / CNN / Faster R-CNN
61
+ / Pix2Pix / SimCLR encoder), `dataset.py` (reads the frozen split), `losses.py` (task-appropriate),
62
+ `train.py`, `evaluate.py`, `requirements.txt`,
57
63
  `REPRODUCIBILITY.md`, `methods_stub.md`, and — the key artifact — `splits/split_assignment.csv` +
58
64
  `splits/split_seed.txt`. The split is **patient-disjoint by construction** (a deterministic group split)
59
65
  and the emitted code seeds every RNG, sets cuDNN deterministic, builds the training loader from the