landmarkdiff 0.2.3__py3-none-any.whl

Files changed (46)
  1. landmarkdiff/__init__.py +40 -0
  2. landmarkdiff/__main__.py +207 -0
  3. landmarkdiff/api_client.py +316 -0
  4. landmarkdiff/arcface_torch.py +583 -0
  5. landmarkdiff/audit.py +338 -0
  6. landmarkdiff/augmentation.py +293 -0
  7. landmarkdiff/benchmark.py +213 -0
  8. landmarkdiff/checkpoint_manager.py +361 -0
  9. landmarkdiff/cli.py +252 -0
  10. landmarkdiff/clinical.py +223 -0
  11. landmarkdiff/conditioning.py +278 -0
  12. landmarkdiff/config.py +358 -0
  13. landmarkdiff/curriculum.py +191 -0
  14. landmarkdiff/data.py +405 -0
  15. landmarkdiff/data_version.py +301 -0
  16. landmarkdiff/displacement_model.py +745 -0
  17. landmarkdiff/ensemble.py +330 -0
  18. landmarkdiff/evaluation.py +415 -0
  19. landmarkdiff/experiment_tracker.py +231 -0
  20. landmarkdiff/face_verifier.py +947 -0
  21. landmarkdiff/fid.py +244 -0
  22. landmarkdiff/hyperparam.py +347 -0
  23. landmarkdiff/inference.py +754 -0
  24. landmarkdiff/landmarks.py +432 -0
  25. landmarkdiff/log.py +90 -0
  26. landmarkdiff/losses.py +348 -0
  27. landmarkdiff/manipulation.py +651 -0
  28. landmarkdiff/masking.py +316 -0
  29. landmarkdiff/metrics_agg.py +313 -0
  30. landmarkdiff/metrics_viz.py +464 -0
  31. landmarkdiff/model_registry.py +362 -0
  32. landmarkdiff/morphometry.py +342 -0
  33. landmarkdiff/postprocess.py +600 -0
  34. landmarkdiff/py.typed +0 -0
  35. landmarkdiff/safety.py +395 -0
  36. landmarkdiff/synthetic/__init__.py +23 -0
  37. landmarkdiff/synthetic/augmentation.py +188 -0
  38. landmarkdiff/synthetic/pair_generator.py +208 -0
  39. landmarkdiff/synthetic/tps_warp.py +273 -0
  40. landmarkdiff/validation.py +324 -0
  41. landmarkdiff-0.2.3.dist-info/METADATA +1173 -0
  42. landmarkdiff-0.2.3.dist-info/RECORD +46 -0
  43. landmarkdiff-0.2.3.dist-info/WHEEL +5 -0
  44. landmarkdiff-0.2.3.dist-info/entry_points.txt +2 -0
  45. landmarkdiff-0.2.3.dist-info/licenses/LICENSE +21 -0
  46. landmarkdiff-0.2.3.dist-info/top_level.txt +1 -0
@@ -0,0 +1,1173 @@
1
+ Metadata-Version: 2.4
2
+ Name: landmarkdiff
3
+ Version: 0.2.3
4
+ Summary: Anatomically-conditioned latent diffusion for facial surgery outcome prediction from standard clinical photography
5
+ Author-email: dreamlessx <dreamlessx@users.noreply.github.com>
6
+ License: MIT
7
+ Project-URL: Homepage, https://github.com/dreamlessx/LandmarkDiff-public
8
+ Project-URL: Repository, https://github.com/dreamlessx/LandmarkDiff-public
9
+ Project-URL: Documentation, https://landmarkdiff.readthedocs.io
10
+ Project-URL: Issues, https://github.com/dreamlessx/LandmarkDiff-public/issues
11
+ Project-URL: Changelog, https://github.com/dreamlessx/LandmarkDiff-public/blob/main/CHANGELOG.md
12
+ Classifier: Development Status :: 4 - Beta
13
+ Classifier: Intended Audience :: Science/Research
14
+ Classifier: License :: OSI Approved :: MIT License
15
+ Classifier: Operating System :: OS Independent
16
+ Classifier: Programming Language :: Python :: 3
17
+ Classifier: Programming Language :: Python :: 3.10
18
+ Classifier: Programming Language :: Python :: 3.11
19
+ Classifier: Programming Language :: Python :: 3.12
20
+ Classifier: Topic :: Scientific/Engineering :: Image Processing
21
+ Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
22
+ Classifier: Typing :: Typed
23
+ Requires-Python: >=3.10
24
+ Description-Content-Type: text/markdown
25
+ License-File: LICENSE
26
+ Requires-Dist: torch>=2.1.0
27
+ Requires-Dist: diffusers>=0.27.0
28
+ Requires-Dist: transformers>=4.38.0
29
+ Requires-Dist: accelerate>=0.27.0
30
+ Requires-Dist: safetensors>=0.4.0
31
+ Requires-Dist: mediapipe>=0.10.9
32
+ Requires-Dist: opencv-python>=4.9.0
33
+ Requires-Dist: numpy>=1.26.0
34
+ Requires-Dist: Pillow>=10.0.0
35
+ Requires-Dist: pyyaml>=6.0
36
+ Provides-Extra: train
37
+ Requires-Dist: wandb>=0.16.0; extra == "train"
38
+ Requires-Dist: webdataset>=0.2.0; extra == "train"
39
+ Requires-Dist: deepspeed>=0.13.0; extra == "train"
40
+ Requires-Dist: lpips>=0.1.4; extra == "train"
41
+ Requires-Dist: insightface>=0.7.3; extra == "train"
42
+ Requires-Dist: onnxruntime-gpu>=1.17.0; extra == "train"
43
+ Provides-Extra: eval
44
+ Requires-Dist: torch-fidelity>=0.3.0; extra == "eval"
45
+ Requires-Dist: lpips>=0.1.4; extra == "eval"
46
+ Requires-Dist: scikit-image>=0.22.0; extra == "eval"
47
+ Requires-Dist: scipy>=1.10.0; extra == "eval"
48
+ Provides-Extra: app
49
+ Requires-Dist: gradio>=4.15.0; extra == "app"
50
+ Requires-Dist: fastapi>=0.109.0; extra == "app"
51
+ Requires-Dist: uvicorn[standard]>=0.27.0; extra == "app"
52
+ Provides-Extra: dev
53
+ Requires-Dist: pytest>=8.0.0; extra == "dev"
54
+ Requires-Dist: pytest-cov>=4.1.0; extra == "dev"
55
+ Requires-Dist: ruff>=0.2.0; extra == "dev"
56
+ Requires-Dist: mypy>=1.8.0; extra == "dev"
57
+ Requires-Dist: types-PyYAML>=6.0.0; extra == "dev"
58
+ Requires-Dist: types-requests>=2.31.0; extra == "dev"
59
+ Requires-Dist: scipy>=1.10.0; extra == "dev"
60
+ Provides-Extra: gpu
61
+ Requires-Dist: torch>=2.1.0; extra == "gpu"
62
+ Requires-Dist: xformers>=0.0.23; extra == "gpu"
63
+ Requires-Dist: triton>=2.1.0; extra == "gpu"
64
+ Dynamic: license-file
65
+
66
+ <p align="center">
67
+ <picture>
68
+ <source media="(prefers-color-scheme: dark)" srcset="assets/logo.png">
69
+ <source media="(prefers-color-scheme: light)" srcset="assets/logo.png">
70
+ <img src="assets/logo.png" alt="LandmarkDiff" width="140" height="140">
71
+ </picture>
72
+ </p>
73
+ <h1 align="center">LandmarkDiff</h1>
74
+ <p align="center">
75
+ <em>Photorealistic facial surgery outcome prediction from a single photo</em>
76
+ </p>
77
+
78
+ <p align="center">
79
+ <a href="https://github.com/dreamlessx/LandmarkDiff-public/actions/workflows/ci.yml"><img src="https://github.com/dreamlessx/LandmarkDiff-public/actions/workflows/ci.yml/badge.svg" alt="CI"></a>
80
+ <a href="https://codecov.io/gh/dreamlessx/LandmarkDiff-public"><img src="https://codecov.io/gh/dreamlessx/LandmarkDiff-public/graph/badge.svg" alt="codecov"></a>
81
+ <a href="https://dreamlessx.github.io/LandmarkDiff-public/"><img src="https://img.shields.io/badge/docs-GitHub%20Pages-blue" alt="Docs"></a>
82
+ <a href="https://pypi.org/project/landmarkdiff/"><img src="https://img.shields.io/pypi/v/landmarkdiff.svg" alt="PyPI version"></a>
83
+ <a href="https://opensource.org/licenses/MIT"><img src="https://img.shields.io/badge/License-MIT-yellow.svg" alt="License: MIT"></a>
84
+ <a href="https://www.python.org/downloads/"><img src="https://img.shields.io/badge/python-3.10%20|%203.11%20|%203.12-blue.svg" alt="Python 3.10 | 3.11 | 3.12"></a>
85
+ <a href="https://pytorch.org/"><img src="https://img.shields.io/badge/pytorch-2.1+-ee4c2c.svg" alt="PyTorch 2.1+"></a>
86
+ <a href="https://huggingface.co/spaces/dreamlessx/LandmarkDiff"><img src="https://img.shields.io/badge/%F0%9F%A4%97-Live%20Demo-yellow" alt="Hugging Face Space"></a>
87
+ <a href="https://github.com/astral-sh/ruff"><img src="https://img.shields.io/badge/code%20style-ruff-000000.svg" alt="Code style: ruff"></a>
88
+ </p>
89
+
90
+ Photorealistic facial surgery outcome prediction from a single photo, powered by anatomically-conditioned latent diffusion.
91
+
92
+ <table align="center">
93
+ <tr>
94
+ <td width="50%" valign="top">
95
+
96
+ **Input & Output**
97
+ - Single 2D photo -- any clinical photo or phone selfie
98
+ - Photorealistic post-op prediction
99
+ - Just a phone -- no depth sensors, no clinical equipment
100
+
101
+ </td>
102
+ <td width="50%" valign="top">
103
+
104
+ **Capabilities**
105
+ - **6 procedures** -- rhinoplasty, blepharoplasty, rhytidectomy, orthognathic, brow lift, mentoplasty
106
+ - **4 inference modes** -- TPS (CPU), img2img, ControlNet, ControlNet+IP
107
+ - **4 clinical flags** -- vitiligo, Bell's palsy, keloid, Ehlers-Danlos, plus Fitzpatrick-stratified eval
108
+
109
+ </td>
110
+ </tr>
111
+ </table>
112
+
113
+ ### Where We're Headed
114
+
115
+ The 2D pipeline ships now and works well. The end goal is full 3D: you hold up your phone, slowly rotate your head, and we reconstruct a 3D face model from that video alone. Surgical deformations then happen in 3D space -- anatomically grounded, not pixel-level warping -- and you get an interactive model you can rotate to see the predicted result from any angle. No depth sensors, no clinical scanning rigs. Just a phone camera and a short video. See the [Roadmap](#roadmap) for details on each step.
116
+
117
+ LandmarkDiff extracts MediaPipe's 478-point face mesh from the input photo, applies procedure-specific Gaussian RBF deformations calibrated from anthropometric surgical data, renders the deformed mesh as a tessellation wireframe, and feeds that wireframe into a ControlNet-conditioned Stable Diffusion 1.5 backbone to synthesize the predicted face. The output is composited back onto the original image using Laplacian pyramid blending with feathered surgical masks, then refined through neural face restoration and identity verification.
118
+
119
+ > **Paper:** "LandmarkDiff: Anatomically-Conditioned Latent Diffusion for Photorealistic Facial Surgery Outcome Prediction," targeting MICCAI 2026.
120
+
121
+ <p align="center">
122
+ <img src="demos/demo_pipeline_0.png" alt="LandmarkDiff pipeline" width="90%">
123
+ <br>
124
+ <em>Full pipeline: input photo -- landmark extraction -- mesh deformation -- ControlNet synthesis -- compositing</em>
125
+ </p>
126
+
127
+ ### Try the Live Demo
128
+
129
+ <p align="center">
130
+ <a href="https://huggingface.co/spaces/dreamlessx/LandmarkDiff">
131
+ <img src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Live%20Demo-yellow?style=for-the-badge" alt="Try the Live Demo">
132
+ </a>
133
+ </p>
134
+
135
+ The demo runs entirely on CPU; no GPU or local install is needed. Upload a photo, pick a procedure, adjust intensity, and see the predicted result with symmetry analysis in seconds.
136
+
137
+ ```bash
138
+ # Quick install
139
+ pip install -e ".[train,eval,app,dev]"
140
+
141
+ # Run a prediction
142
+ python scripts/run_inference.py photo.jpg --procedure rhinoplasty --intensity 60 --mode controlnet
143
+ ```
144
+
145
+ <br>
146
+
147
+ ---
148
+
149
+ ## Table of Contents
150
+
151
+ - [Features](#features)
152
+ - [Why LandmarkDiff](#why-landmarkdiff)
153
+ - [Supported Procedures](#supported-procedures)
154
+ - [How It Works](#how-it-works)
155
+ - [Demo Outputs](#demo-outputs)
156
+ - [Quick Start](#quick-start)
157
+ - [Inference Modes](#inference-modes)
158
+ - [Gradio Web Demo](#gradio-web-demo)
159
+ - [Symmetry Analysis](#symmetry-analysis)
160
+ - [Training](#training)
161
+ - [Evaluation and Metrics](#evaluation-and-metrics)
162
+ - [Clinical Edge Cases](#clinical-edge-cases)
163
+ - [Post-Processing Pipeline](#post-processing-pipeline)
164
+ - [Project Structure](#project-structure)
165
+ - [Configuration](#configuration)
166
+ - [Benchmarks](#benchmarks)
167
+ - [Model Zoo](#model-zoo)
168
+ - [Requirements](#requirements)
169
+ - [Docker](#docker)
170
+ - [Make Targets](#make-targets)
171
+ - [Roadmap](#roadmap)
172
+ - [Citation](#citation)
173
+ - [Contributors](#contributors)
174
+ - [Contributing](#contributing)
175
+ - [License](#license)
176
+ - [Acknowledgments](#acknowledgments)
177
+
178
+ <br>
179
+
180
+ ---
181
+
182
+ ## Features
183
+
184
+ - **Single-photo input** -- works from any 2D clinical photograph or phone selfie, no 3D scanning hardware needed
185
+ - **6 surgical procedure presets** -- rhinoplasty, blepharoplasty, rhytidectomy, orthognathic surgery, brow lift, mentoplasty (extensible to custom procedures)
186
+ - **4 inference modes** -- TPS (instant CPU), img2img, ControlNet, and ControlNet+IP-Adapter with configurable quality/speed tradeoffs
187
+ - **MediaPipe 478-point face mesh** -- anatomically grounded landmark extraction for precise deformation control
188
+ - **Gaussian RBF deformation engine** -- smooth, spatially weighted displacements calibrated from anthropometric surgical data
189
+ - **ControlNet-conditioned generation** -- photorealistic texture synthesis via Stable Diffusion 1.5 with wireframe conditioning
190
+ - **Neural post-processing** -- CodeFormer face restoration, Real-ESRGAN upscaling, LAB histogram matching, Laplacian pyramid blending
191
+ - **ArcFace identity verification** -- ensures the predicted face preserves patient identity (cosine similarity check)
192
+ - **Clinical edge-case handling** -- built-in support for vitiligo, Bell's palsy, keloid-prone skin, and Ehlers-Danlos syndrome
193
+ - **Fitzpatrick-stratified evaluation** -- all metrics (FID, LPIPS, SSIM, NME, identity) broken down by skin type I through VI
194
+ - **Intensity slider (0-100%)** -- preview subtle through aggressive versions of any procedure
195
+ - **Gradio web demo** -- 5-tab interface with single procedure, multi-procedure comparison, intensity sweep, face analysis, and multi-angle capture
196
+ - **HPC training pipeline** -- SLURM scripts with preemption checkpointing, DDP multi-GPU, curriculum training configs
197
+ - **Docker and Apptainer support** -- CPU and GPU container images for reproducible deployment
198
+ - **PEP 561 typed package** -- ships with `py.typed` marker for downstream type checking
199
+
200
+ ---
201
+
202
+ ## Why LandmarkDiff
203
+
204
+ ### The Clinical Need
205
+
206
+ Facial cosmetic surgery is one of the most common elective procedures worldwide. The American Society of Plastic Surgeons (ASPS) reported [15.6 million cosmetic procedures in the US in 2020](https://www.plasticsurgery.org/news/plastic-surgery-statistics), with rhinoplasty and blepharoplasty consistently ranking among the top 5 surgical procedures. These numbers have only grown since.
207
+
208
+ The problem is expectation management. Roughly 10--15% of rhinoplasty patients seek revision surgery, and a significant driver is the gap between what patients expected and what they got (Rohrich & Ahmad, "A Practical Approach to Rhinoplasty," *Plastic and Reconstructive Surgery*, 2016). Preoperative visualization directly affects satisfaction -- patients who see a realistic preview report better alignment between expectations and results (Kandathil et al., "Examining Preoperative Expectations and Postoperative Satisfaction in Rhinoplasty Patients," *Facial Plastic Surgery & Aesthetic Medicine*, 2021). Systematic reviews of patient-reported outcomes in rhinoplasty confirm that expectation alignment is a key predictor of satisfaction (Leong & Iglesias, "A systematic review of patient-reported outcome measures in aesthetic and functional rhinoplasty," *Journal of Plastic, Reconstructive & Aesthetic Surgery*, 2016).
209
+
210
+ But here's the catch: the tools that produce good visualizations are expensive, proprietary, or both. Most surgeons -- especially outside wealthy urban practices -- don't have access to them.
211
+
212
+ ### Existing Tools and Their Limitations
213
+
214
+ **Tier 1: Clinical 3D Simulation**
215
+
216
+ - **Canfield Scientific VECTRA** (~$30-100K) -- Dedicated structured-light 3D scanner paired with Mirror simulation software. The gold standard in top-tier practices. Produces accurate surface meshes with Face Sculptor for tissue movement simulation. Requires trained operators, expensive hardware, and in-office capture. Proprietary with no published validation studies on prediction accuracy. [Website](https://www.canfieldsci.com/imaging-systems/vectra-xt-3d-imaging-system/)
217
+ - **Crisalix** (~$200-500/mo) -- Cloud-based 3D simulation from 2D photos. 17 years on the market, PE-backed (BID Equity). Supports breast and face procedures. Uses geometric morphing, not AI or diffusion. More accessible than VECTRA, but subscription-based and proprietary, with no open evaluation of its fidelity. [Website](https://www.crisalix.com)
218
+ - **AEDIT** ($60/mo consumer) -- Phone-based 3D scanning using 100+ photos via TrueDepth camera. Patented morphing with "100,000 facial recognition points." Covers rhinoplasty, lip filler, brow lift, and Botox simulation. Multiple patents on 3D reconstruction from phone input. Consumer-first approach, iOS only. [Website](https://aedit.com/aeditor-app)
219
+
220
+ **Tier 2: Practice Management + Lite Simulation**
221
+
222
+ - **FaceTouchUp** (~$50-100/mo) -- 2D morphing tool with AR overlay. Affordable and quick for consultations, but results look like warped photographs because that's exactly what they are -- geometric transforms with no understanding of how skin, light, or tissue actually behave. [Website](https://www.facetouchup.com)
223
+ - **TouchMD / Symplast / Consentz** -- EMR and practice management platforms with basic photo ghosting or overlay features, not true surgical simulation.
224
+
225
+ **Tier 3: Consumer Beauty Tech**
226
+
227
+ - **Perfect Corp** -- AI-powered face reshape for beauty and med spa applications. Focused on fillers and Botox visualization, not structural surgical prediction. [Website](https://www.perfectcorp.com)
228
+ - **GlamAR** -- Virtual try-on API for beauty brands. Cosmetics overlay layer, not surgical simulation. [Website](https://www.glamar.io)
229
+
230
+ **Academic approaches:**
231
+
232
+ Most recent academic work on face manipulation focuses on generic editing (make someone look older, change their expression, swap identities) rather than surgery-specific prediction. A few notable examples:
233
+
234
+ - **DiscoFaceGAN** (Deng et al., CVPR 2020) -- Disentangled controllable face generation using 3DMM coefficients. Powerful for attribute editing, but designed for general-purpose face manipulation, not surgical planning. No procedure-specific deformation models.
235
+ - **FaceShifter** (Li et al., 2019) -- High-fidelity face swapping with occlusion awareness. Impressive identity transfer, but the goal is swapping one person's face onto another, not simulating what a surgical procedure would do to the same person.
236
+ - **DiffFace** (Kim et al., 2022) -- Diffusion-based face swapping with facial guidance. Shows the potential of diffusion models for face manipulation, but targets identity transfer, not surgical outcome prediction.
237
+
238
+ The common thread: none of the commercial tools use diffusion models (all rely on geometric warping or morphing), almost none of the academic work uses real surgical data to drive deformations, none evaluates fairness across skin tones, and none handles clinical edge cases like Bell's palsy or keloid-prone skin.
239
+
240
+ | Feature | Canfield VECTRA | Crisalix | AEDIT | FaceTouchUp | **LandmarkDiff** |
241
+ |---------|-----------------|----------|-------|-------------|-------------------|
242
+ | Input | $50K+ scanner | Photos | Phone (iOS) | Photos | **Any phone** |
243
+ | Method | Geometric warp | Geometric morph | Patented morph | 2D pixel push | **ControlNet diffusion** |
244
+ | Output quality | High (3D mesh) | Medium (3D morph) | Medium (morph) | Low (pixel warp) | **High (photorealistic)** |
245
+ | Procedures | Many | Breast + face | Face + injectables | Manual any | **6 facial** |
246
+ | Price | $30-100K | ~$200-500/mo | Free/$60/mo | $50-100/mo | **Free (MIT)** |
247
+ | Open source | No | No | No | No | **Yes** |
248
+ | Published research | No | No | No | No | **Yes (arXiv)** |
249
+ | Diffusion-based | No | No | No | No | **Yes** |
250
+ | Fairness eval | No | No | No | No | **Fitzpatrick I-VI** |
251
+
252
+ ### What Makes LandmarkDiff Different
253
+
254
+ LandmarkDiff is not trying to compete with VECTRA on 3D accuracy -- we're solving a different problem. We want to make surgery visualization accessible to any surgeon with a phone and any patient who walks into a consultation, while being honest about what the tool can and can't do.
255
+
256
+ **No existing tool uses diffusion models.** Every competitor in the comparison table above relies on geometric warping or morphing. LandmarkDiff is the first published system to apply ControlNet-conditioned latent diffusion to surgical outcome prediction, producing photorealistic texture synthesis rather than geometric pixel manipulation. Combined with open-source access, published research, and Fitzpatrick-stratified fairness evaluation, this positions LandmarkDiff as both the most technically advanced and most transparent surgical visualization system available.
257
+
258
+ Concretely:
259
+
260
+ - **Open source (MIT license).** Unlike every commercial tool listed above, you can inspect, modify, and extend the code. If you don't trust the output, you can trace exactly how it was generated.
261
+ - **Single 2D photo input.** No $50K+ hardware, no multi-view capture rigs. A standard clinical photograph or phone selfie is enough.
262
+ - **Anatomically grounded deformations.** Procedure-specific landmark displacements are fitted from real surgical data (pre/post pairs), not hand-tuned or based on generic face editing semantics.
263
+ - **Diffusion-based photorealism.** ControlNet-guided Stable Diffusion produces realistic skin texture, lighting, and shadows -- not geometric morphs.
264
+ - **Clinical edge-case handling.** Built-in flags and modified behavior for vitiligo, Bell's palsy, keloid-prone skin, and Ehlers-Danlos syndrome.
265
+ - **Fitzpatrick-stratified fairness evaluation.** All metrics are broken down by Fitzpatrick skin type (I--VI) to catch and prevent performance disparities across skin tones.
266
+ - **Roadmap toward 3D.** We're working on phone-video-to-3D reconstruction to eventually provide accessible 3D visualization without VECTRA-class hardware.
267
+
268
+ **Honest limitations:** We don't have prospective clinical validation yet (that's planned). Our deformation model is calibrated from a limited dataset. We currently produce 2D output, not 3D. And diffusion models can hallucinate details, so outputs should always be reviewed by a clinician before being shown to patients. This is a research tool, not a medical device. The comparison above reflects publicly available information as of March 2026; commercial tools may have undisclosed technical capabilities.
269
+
270
+ ### References
271
+
272
+ 1. American Society of Plastic Surgeons. [2020 Plastic Surgery Statistics Report](https://www.plasticsurgery.org/news/plastic-surgery-statistics). ASPS, 2021.
273
+ 2. Rohrich RJ, Ahmad J. "A Practical Approach to Rhinoplasty." *Plastic and Reconstructive Surgery*. 2016;137(4):725e--746e.
274
+ 3. Kandathil CK, et al. "Examining Preoperative Expectations and Postoperative Satisfaction in Rhinoplasty Patients: A Single-Center Study." *Facial Plastic Surgery & Aesthetic Medicine*. 2021;23(1):33--38.
275
+ 4. Leong SC, Iglesias MA. "A systematic review of patient-reported outcome measures in aesthetic and functional rhinoplasty." *Journal of Plastic, Reconstructive & Aesthetic Surgery*. 2016;69(12):1635--1645.
276
+ 5. Deng Y, et al. "Disentangled and Controllable Face Image Generation via 3D Imitative-Contrastive Learning." CVPR 2020.
277
+ 6. Li L, et al. "FaceShifter: Towards High Fidelity And Occlusion Aware Face Swapping." arXiv:1912.13457, 2019.
278
+ 7. Kim K, et al. "DiffFace: Diffusion-based Face Swapping with Facial Guidance." arXiv:2212.13344, 2022.
279
+
280
+ ---
281
+
282
+ ## Supported Procedures
283
+
284
+ LandmarkDiff ships with six procedure presets, each targeting specific anatomical regions with calibrated displacement vectors.
285
+
286
+ ### Rhinoplasty (Nose Reshaping)
287
+
288
+ Targets 25 landmarks across the nasal bridge, tip, and alar base. Key deformations include alar base narrowing (nostril width reduction), tip refinement with upward rotation, and dorsal hump reduction. Uses a 30px Gaussian RBF influence radius for smooth transitions across the nasal region.
289
+
290
+ **Landmark indices:** 1, 2, 4, 5, 6, 19, 94, 141, 168, 195, 197, 236, 240, 274, 275, 278, 279, 294, 326, 327, 360, 363, 370, 456, 460
291
+
292
+ ### Blepharoplasty (Eyelid Surgery)
293
+
294
+ Targets 30 landmarks around the upper and lower eyelids. Deformations include upper lid elevation (hooded eye correction), medial and lateral canthal tapering, and lower lid tightening. Uses a tighter 15px influence radius to avoid affecting surrounding structures like the brow.
295
+
296
+ **Landmark indices:** 33, 7, 163, 144, 145, 153, 154, 155, 157, 158, 159, 160, 161, 246, 362, 382, 381, 380, 374, 373, 390, 249, 263, 466, 388, 387, 386, 385, 384, 398
297
+
298
+ ### Rhytidectomy (Facelift)
299
+
300
+ Targets 33 landmarks along the jawline, cheeks, and periauricular region. Deformations include jowl lifting (upward and lateral traction), submental tightening, and gentle temple lifting to simulate tissue redistribution. Uses a wider 40px influence radius for the broad soft tissue mobilization typical of facelifts.
301
+
302
+ **Landmark indices:** 10, 21, 54, 58, 67, 93, 103, 109, 127, 132, 136, 150, 162, 172, 176, 187, 207, 213, 234, 284, 297, 323, 332, 338, 356, 361, 365, 379, 389, 397, 400, 427, 454
303
+
304
+ ### Orthognathic Surgery (Jaw Repositioning)
305
+
306
+ Targets 47 landmarks across the mandible, maxilla, and chin. Deformations simulate mandibular advancement or setback, chin projection changes, and lateral jaw narrowing. Uses a 35px influence radius. Note that identity loss is disabled for orthognathic predictions because jaw repositioning inherently changes facial proportions more than the other procedures.
307
+
308
+ **Landmark indices:** 0, 17, 18, 36, 37, 39, 40, 57, 61, 78, 80, 81, 82, 84, 87, 88, 91, 95, 146, 167, 169, 170, 175, 181, 191, 200, 201, 202, 204, 208, 211, 212, 214, 269, 270, 291, 311, 312, 317, 321, 324, 325, 375, 396, 405, 407, 415
309
+
310
+ ### Brow Lift
311
+
312
+ Targets 19 landmarks across the left and right brows and the upper forehead. Lateral brow landmarks receive progressively stronger upward displacement (weighted 0.7 to 1.1), simulating the lateral brow peak that defines a youthful arch. Forehead landmarks get a gentler lift with a wider influence radius (1.2x) for smooth tissue redistribution. Uses a 25px influence radius.
313
+
314
+ **Landmark indices:** 70, 63, 105, 66, 107, 300, 293, 334, 296, 336, 9, 8, 10, 109, 67, 103, 338, 297, 332
315
+
316
+ *Contributed by [@Deepak8858](https://github.com/Deepak8858) in [#35](https://github.com/dreamlessx/LandmarkDiff-public/pull/35).*
317
+
318
+ ### Mentoplasty (Chin Surgery)
319
+
320
+ Targets 8 landmarks on the chin tip, lower contour, and jaw angles. The chin tip (landmarks 152, 175) receives the strongest advancement, the lower contour follows with softer displacement at a tighter radius (0.8x), and the jaw angles get minimal pull (0.6x radius) for a natural transition. Uses a 25px influence radius.
321
+
322
+ **Landmark indices:** 148, 149, 150, 152, 171, 175, 176, 377
323
+
324
+ *Contributed by [@P-r-e-m-i-u-m](https://github.com/P-r-e-m-i-u-m) in [#36](https://github.com/dreamlessx/LandmarkDiff-public/pull/36).*
325
+
326
+ ### Adding Your Own Procedure
327
+
328
+ You can define custom procedures by specifying which landmarks to move, how far, and in what direction. See [docs/tutorials/custom_procedures.md](docs/tutorials/custom_procedures.md) for a step-by-step guide.
329
+
330
+ ---
331
+
332
+ ## How It Works
333
+
334
+ LandmarkDiff is a five-stage pipeline. Each stage is independently testable and swappable.
335
+
336
+ ```mermaid
337
+ graph LR
338
+ classDef current fill:#2563eb,stroke:#1e40af,color:#fff,stroke-width:2px
339
+ classDef postproc fill:#1d4ed8,stroke:#1e3a8a,color:#fff,stroke-width:2px
340
+ classDef planned fill:#f59e0b,stroke:#d97706,color:#fff,stroke-width:2px,stroke-dasharray:6 3
341
+ classDef io fill:#0f172a,stroke:#334155,color:#fff,stroke-width:2px
342
+
343
+ A["Input Photo<br/>(512x512)"]:::io
344
+ B["MediaPipe<br/>Face Mesh<br/><i>478 landmarks</i>"]:::current
345
+ C["Gaussian RBF<br/>Deformation<br/><i>procedure-specific</i>"]:::current
346
+ D["Conditioning<br/>Generation<br/><i>wireframe + edges</i>"]:::current
347
+ E["ControlNet +<br/>Stable Diff 1.5<br/><i>CrucibleAI model</i>"]:::current
348
+ G["Output<br/>Prediction"]:::io
349
+
350
+ A --> B --> C --> D --> E
351
+
352
+ subgraph postprocess ["Post-Processing"]
353
+ F1["CodeFormer<br/>Restoration"]:::postproc
354
+ F2["Real-ESRGAN<br/>Upscaling"]:::postproc
355
+ F3["LAB Histogram<br/>Matching"]:::postproc
356
+ F4["Laplacian<br/>Blending"]:::postproc
357
+ F5["ArcFace<br/>Identity Check"]:::postproc
358
+ end
359
+
360
+ E --> F1 --> F2 --> F3 --> F4 --> F5 --> G
361
+
362
+ subgraph future ["Planned -- 3D Extension"]
363
+ H1["Phone Video<br/>Capture"]:::planned
364
+ H2["FLAME 3D<br/>Reconstruction"]:::planned
365
+ H3["3D Surgical<br/>Deformation"]:::planned
366
+ H4["Multi-Angle<br/>Rendering"]:::planned
367
+ H5["Interactive<br/>3D Viewer"]:::planned
368
+ end
369
+
370
+ H1 -.-> H2 -.-> H3 -.-> H4 -.-> H5
371
+ ```
372
+
373
+ ### Stage 1: Landmark Extraction
374
+
375
+ MediaPipe Face Mesh detects 478 facial landmarks in 3D (x, y, z normalized coordinates) at roughly 30 fps on CPU. The landmarks are grouped into anatomical regions:
376
+
377
+ | Region | Landmark count |
378
+ |--------|---------------|
379
+ | Jawline | 33 |
380
+ | Left eye | 16 |
381
+ | Right eye | 16 |
382
+ | Left eyebrow | 10 |
383
+ | Right eyebrow | 10 |
384
+ | Nose | 25 |
385
+ | Lips | 22 |
386
+ | Left iris | 5 |
387
+ | Right iris | 5 |
388
+ | Face oval | 37 |
389
+
390
+ The extraction runs at the start of every prediction and again on the output for evaluation (NME metric).
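To make the NME step concrete, here is a minimal sketch of how normalized landmarks could be scaled to pixels and compared. The helper names `to_pixels` and `nme` are hypothetical, not LandmarkDiff's API, and dividing by an inter-ocular distance is a common convention assumed here rather than something the README confirms.

```python
import math


def to_pixels(landmarks, width, height):
    """Scale normalized (x, y) pairs in [0, 1] to pixel coordinates."""
    return [(x * width, y * height) for x, y in landmarks]


def nme(predicted, target, norm_dist):
    """Mean Euclidean error over matched points, divided by norm_dist.

    norm_dist is an assumed normalizer (e.g. inter-ocular distance in pixels).
    """
    if len(predicted) != len(target) or not predicted:
        raise ValueError("point lists must be non-empty and of equal length")
    total = sum(math.dist(p, t) for p, t in zip(predicted, target))
    return total / (len(predicted) * norm_dist)


# Two toy "eye corner" points on a 512x512 image.
target = to_pixels([(0.25, 0.5), (0.75, 0.5)], 512, 512)
# Shift every point by (3, 4) pixels, i.e. a 5 px error per landmark.
predicted = [(x + 3.0, y + 4.0) for x, y in target]
inter_ocular = math.dist(target[0], target[1])
score = nme(predicted, target, inter_ocular)
print(score)
```

A lower score means the landmarks re-extracted from the output sit closer to the intended target positions.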
391
+
392
+ ### Stage 2: Gaussian RBF Deformation
393
+
394
+ Each procedure preset defines a set of `DeformationHandle` objects, each specifying:
395
+ - **Which landmark** to move (index into the 478-point mesh)
396
+ - **How far** to move it (pixel displacement vector, scaled by the intensity slider)
397
+ - **How wide** the influence is (Gaussian RBF radius in pixels)
398
+
399
+ The deformation is applied as a smooth, spatially weighted field. Landmarks near the handle move the most; landmarks far away are unaffected. This prevents the jarring discontinuities you get from simple point-to-point warping.
400
+
401
+ All displacement magnitudes are scaled by the `intensity` parameter (0 to 100), so you can preview subtle through aggressive versions of the same procedure.
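The weighting described above can be sketched in a few lines. This is an illustration under stated assumptions, not the project's implementation: `Handle` is a hypothetical stand-in for `DeformationHandle`, the kernel is taken to be `exp(-d^2 / (2 * radius^2))` (treating the radius as a Gaussian standard deviation is an assumption), and displacements from multiple handles are simply summed.

```python
import math
from dataclasses import dataclass


@dataclass
class Handle:
    """Hypothetical stand-in for a DeformationHandle."""
    index: int     # landmark index the handle is anchored to
    dx: float      # displacement at 100% intensity, in pixels
    dy: float
    radius: float  # Gaussian RBF radius, in pixels


def deform(points, handles, intensity):
    """Return displaced copies of points; intensity is clamped to 0..100."""
    scale = max(0.0, min(100.0, intensity)) / 100.0
    moved = []
    for x, y in points:
        off_x, off_y = 0.0, 0.0
        for h in handles:
            cx, cy = points[h.index]
            d2 = (x - cx) ** 2 + (y - cy) ** 2
            # Weight is 1 at the handle and decays smoothly with distance.
            w = math.exp(-d2 / (2.0 * h.radius ** 2))
            off_x += w * h.dx * scale
            off_y += w * h.dy * scale
        moved.append((x + off_x, y + off_y))
    return moved


points = [(100.0, 100.0), (110.0, 100.0), (400.0, 400.0)]
handles = [Handle(index=0, dx=0.0, dy=-10.0, radius=30.0)]
moved = deform(points, handles, intensity=50)
print(moved)
```

With a 30px radius, the handle's own landmark moves the full scaled distance, a neighbor 10px away moves slightly less, and a point hundreds of pixels away is effectively untouched.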
402
+
403
+ ### Stage 3: Conditioning Generation
404
+
405
+ The deformed landmarks are rendered into conditioning images for ControlNet:
406
+
407
+ 1. **Tessellation wireframe** - The full 2556-edge MediaPipe face mesh drawn on a black canvas. This is the primary conditioning signal. It uses a static anatomical adjacency list (not Delaunay triangulation), so the topology is invariant to landmark displacement.
408
+
409
+ 2. **Adaptive Canny edges** - Edge detection with thresholds derived from the image median (low = 0.66 * median, high = 1.33 * median), which adapts to different skin tones without hardcoded thresholds. Morphological skeletonization then thins the result to the 1-pixel edges that ControlNet expects.
410
+
411
+ 3. **Surgical mask** - A feathered mask indicating where the procedure affects the face. Built from the convex hull of procedure-specific landmarks, dilated, Gaussian-feathered, then perturbed with Perlin-style boundary noise (2-4px) to prevent visible seam lines.
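The median-based thresholds from item 2 can be sketched as follows. The function name `adaptive_canny_thresholds` is hypothetical, and clamping to the 0-255 range and truncating to integers are assumptions; only the 0.66 and 1.33 multipliers come from the description above.

```python
from statistics import median


def adaptive_canny_thresholds(gray_pixels):
    """Derive (low, high) Canny thresholds from the median gray level."""
    m = median(gray_pixels)
    low = int(max(0, 0.66 * m))     # lower hysteresis threshold
    high = int(min(255, 1.33 * m))  # upper hysteresis threshold
    return low, high


# Toy gray-level samples standing in for a darker and a brighter image.
dark = [40, 50, 60, 70, 80]
bright = [150, 160, 170, 180, 190]
print(adaptive_canny_thresholds(dark))
print(adaptive_canny_thresholds(bright))
```

In an OpenCV-based pipeline, the resulting pair would typically be passed as the two threshold arguments of `cv2.Canny`, so darker and brighter images each get thresholds proportional to their own intensity level.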
412
+
413
+ ### Stage 4: Diffusion Generation
414
+
415
+ The conditioning images are fed to CrucibleAI's pre-trained ControlNet for MediaPipe Face, which conditions Stable Diffusion 1.5 to generate a face matching the deformed mesh topology. Procedure-specific text prompts emphasize clinical photography qualities (natural appearance, sharp focus, studio lighting).
416
+
417
+ ### Stage 5: Post-Processing
418
+
419
+ Six-step refinement:
420
+ 1. **CodeFormer** neural face restoration (fidelity weight 0.7 for quality-fidelity balance)
421
+ 2. **Real-ESRGAN** background super-resolution (non-face regions only)
422
+ 3. **Histogram matching** in LAB color space for robust skin tone transfer from input to output
423
+ 4. **Frequency-aware sharpening** on the L channel only (avoids color fringing)
424
+ 5. **Laplacian pyramid blending** (6 levels) - low frequencies blend smoothly for lighting continuity, high frequencies transition sharply for texture/pore preservation
425
+ 6. **ArcFace identity verification** - flags if the output drifts too far from the input identity (cosine similarity threshold 0.6)
426
+
427
+ ---
428
+
429
+ ## Demo Outputs
430
+
431
+ ### Pipeline Visualization
432
+
433
+ <p align="center">
434
+ <img src="demos/demo_pipeline_0.png" alt="Pipeline demo -- rhinoplasty on diverse faces" width="90%">
435
+ </p>
436
+
437
+ <p align="center">
438
+ <img src="demos/demo_pipeline_1.png" alt="Pipeline demo -- rhinoplasty result" width="90%">
439
+ </p>
440
+
441
+ Each image shows five pipeline stages: **Input | Original Mesh | Manipulated Mesh | Surgical Mask | TPS-warped Result**. These are geometric-only (TPS mode, CPU) outputs; ControlNet photorealistic results will be added after training completes.
442
+
443
+ <br>
444
+
445
+ ---
446
+
447
+ ## Quick Start
448
+
449
+ ### Installation
450
+
451
+ **Prerequisites:** Python 3.10+ and PyTorch 2.1+ ([install guide](https://pytorch.org)). GPU with 6GB+ VRAM recommended for neural modes; CPU works for TPS mode.
452
+
453
+ ```bash
454
+ git clone https://github.com/dreamlessx/LandmarkDiff-public.git
455
+ cd LandmarkDiff-public
456
+
457
+ # Core (inference only)
458
+ pip install -e .
459
+
460
+ # With training dependencies
461
+ pip install -e ".[train]"
462
+
463
+ # With Gradio demo
464
+ pip install -e ".[app]"
465
+
466
+ # With evaluation metrics
467
+ pip install -e ".[eval]"
468
+
469
+ # With GPU optimizations (xformers, triton)
470
+ pip install -e ".[gpu]"
471
+
472
+ # Everything except the GPU extras
473
+ pip install -e ".[train,eval,app,dev]"
474
+ ```
475
+
476
+ ### Run a single prediction
477
+
478
+ ```bash
479
+ python scripts/run_inference.py /path/to/face.jpg \
480
+ --procedure rhinoplasty \
481
+ --intensity 60 \
482
+ --mode controlnet
483
+ ```
484
+
485
+ This will:
486
+ 1. Detect the face and extract 478 landmarks
487
+ 2. Apply rhinoplasty deformation at 60% intensity
488
+ 3. Generate the ControlNet-conditioned prediction
489
+ 4. Composite the result back onto the original
490
+ 5. Save the output to `output/result.png`
491
+
492
+ ### CPU-only mode (no GPU needed)
493
+
494
+ ```bash
495
+ python examples/tps_only.py /path/to/face.jpg \
496
+ --procedure rhinoplasty \
497
+ --intensity 60
498
+ ```
499
+
500
+ TPS mode does pure geometric warping. It runs in about half a second on CPU and produces a geometrically accurate result, but without the photorealistic texture synthesis that the diffusion modes provide.
501
+
502
+ ### Batch processing
503
+
504
+ ```bash
505
+ python examples/batch_inference.py /path/to/image_dir/ \
506
+ --procedure blepharoplasty \
507
+ --intensity 50 \
508
+ --output output/batch/
509
+ ```
510
+
511
+ ---
512
+
513
+ ## Inference Modes
514
+
515
+ LandmarkDiff supports four inference modes with different quality-speed-hardware tradeoffs:
516
+
517
+ | Mode | GPU Required | Speed | Quality | Identity Preservation |
518
+ |------|-------------|-------|---------|----------------------|
519
+ | `tps` | No | Instant (~0.5s) | Geometric only | Perfect (pixel-level) |
520
+ | `img2img` | Yes (6GB) | ~5s | Good | Good |
521
+ | `controlnet` | Yes (6GB) | ~5s | Best | Good |
522
+ | `controlnet_ip` | Yes (8GB) | ~7s | Best | Best |
523
+
524
+ **TPS mode** - Thin-plate spline warping. No diffusion, no neural network inference. Just mathematically warps the pixels according to landmark displacements. Fast and deterministic, but the output looks like a geometric morph rather than a natural photo. Good for previewing the deformation before committing to a full diffusion run.
525
+
526
+ **img2img mode** - Standard Stable Diffusion img2img with the TPS-warped image as input and a feathered mask restricting generation to the surgical region. Faster than ControlNet but less controllable.
527
+
528
+ **ControlNet mode** - The primary mode. Uses CrucibleAI's pre-trained ControlNet for MediaPipe Face mesh conditioning. The deformed wireframe directly controls the spatial layout of the generated face, producing the most anatomically accurate results.
529
+
530
+ **ControlNet + IP-Adapter mode** - Adds IP-Adapter FaceID on top of ControlNet for stronger identity preservation. Uses face embeddings from the input photo to condition generation, reducing the chance of producing a different-looking person. Slightly slower due to the additional encoder pass.
531
+
532
+ ```python
+ from landmarkdiff.inference import LandmarkDiffPipeline
+
+ pipeline = LandmarkDiffPipeline(mode="controlnet", device="cuda")
+ pipeline.load()
+
+ # `image` is the input face photo, loaded beforehand (loading code not shown here)
+ result = pipeline.generate(
+     image,
+     procedure="rhinoplasty",
+     intensity=60,
+     num_inference_steps=30,
+     guidance_scale=7.5,
+     controlnet_conditioning_scale=1.0,
+     strength=0.75,
+     seed=42,
+     postprocess=True,
+ )
+
+ # result dict contains:
+ #   result["output"]                 - final composited image
+ #   result["output_raw"]             - raw diffusion output (before compositing)
+ #   result["output_tps"]             - TPS-only geometric warp
+ #   result["conditioning"]           - wireframe fed to ControlNet
+ #   result["mask"]                   - surgical mask
+ #   result["landmarks_original"]     - input landmarks
+ #   result["landmarks_manipulated"]  - deformed landmarks
+ #   result["identity_check"]         - ArcFace similarity score
+ ```
560
+
561
+ ---
562
+
563
+ ## Gradio Web Demo
564
+
565
+ **Try it online:** [huggingface.co/spaces/dreamlessx/LandmarkDiff](https://huggingface.co/spaces/dreamlessx/LandmarkDiff) (TPS mode, runs on CPU)
566
+
567
+ Or run locally:
568
+
569
+ ```bash
570
+ python scripts/app.py
571
+ # Opens at http://localhost:7860
572
+ ```
573
+
574
+ The demo has five tabs:
575
+
576
+ ### Tab 1: Single Procedure
577
+ Upload a photo, pick a procedure, adjust intensity from 0-100%. The interface shows every intermediate step: extracted landmarks, deformed mesh, wireframe conditioning, surgical mask, TPS warp, and the final result in a side-by-side before/after view. Clinical flags (vitiligo, Bell's palsy with side selector, keloid-prone regions, Ehlers-Danlos) are available as checkboxes.
578
+
579
+ ### Tab 2: Multi-Procedure Comparison
580
+ Set independent intensity sliders for all six procedures and generate them all from the same photo. Useful for showing a patient their options side by side.
581
+
582
+ ### Tab 3: Intensity Sweep
583
+ Pick a procedure and a number of steps (3 to 10). Generates a gallery progressing from 0% to 100% intensity so you can see exactly how the result changes with the intensity parameter.
584
+
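A sweep like this boils down to a list of intensity values. The sketch below assumes the steps are evenly spaced between 0 and 100; the function name is hypothetical and the demo's actual spacing is not stated here.

```python
def sweep_intensities(num_steps):
    """Evenly spaced intensities from 0 to 100 inclusive (assumed spacing)."""
    if not 3 <= num_steps <= 10:
        raise ValueError("num_steps must be between 3 and 10")
    return [100.0 * i / (num_steps - 1) for i in range(num_steps)]


levels = sweep_intensities(5)
```

Each value in `levels` would then be used as the `intensity` for one gallery image.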
585
+ ### Tab 4: Face Analysis
586
+ Upload a photo and get back the detected Fitzpatrick skin type, face view classification (frontal, three-quarter, or profile), yaw and pitch angles in degrees, per-region landmark counts, confidence scores, and an annotated landmark visualization.
587
+
588
+ ### Tab 5: Multi-Angle Capture
589
+ Guides the user through capturing 5 standardized clinical views: frontal (0 degrees), left three-quarter (45 degrees), right three-quarter (45 degrees), left profile (90 degrees), right profile (90 degrees). Validates each photo against the expected yaw range and generates predictions for all views, producing a combined before/after gallery.
590
+
591
+ ---
592
+
593
+ ## Symmetry Analysis
594
+
595
+ LandmarkDiff includes bilateral facial symmetry measurement as part of both the demo and the evaluation pipeline. The analysis works by reflecting left-side landmarks across the facial midline (computed from the forehead apex to the chin) and measuring their Euclidean distance to the corresponding right-side landmarks.
596
+
597
+ Five anatomical regions are scored independently:
598
+
599
+ | Region | Landmark pairs | What it captures |
600
+ |--------|---------------|------------------|
601
+ | Eyes | 6 pairs | Palpebral fissure symmetry, canthal tilt |
602
+ | Brows | 5 pairs | Brow arch height and position |
603
+ | Cheeks | 4 pairs | Malar prominence, midface balance |
604
+ | Mouth | 5 pairs | Commissure position, lip symmetry |
605
+ | Jaw | 5 pairs | Mandibular contour, chin alignment |
606
+
607
+ Scores range from 0 to 100, where 90-100 indicates high symmetry, 70-89 mild asymmetry, and below 70 notable asymmetry. All distances are normalized by inter-ocular distance for scale invariance.
608
+
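The measurement can be sketched with plain geometry: reflect each left-side point across the midline, take its distance to the paired right-side point, and divide by the inter-ocular distance. The function names below are hypothetical, and the linear mapping from normalized asymmetry to a 0-100 score (including the `full_scale` constant) is an assumption for illustration only, since the actual scoring curve is not described here.

```python
import math


def reflect_across_line(p, a, b):
    """Reflect point p across the line through points a and b."""
    ax, ay = a
    dx, dy = b[0] - ax, b[1] - ay
    t = ((p[0] - ax) * dx + (p[1] - ay) * dy) / (dx * dx + dy * dy)
    foot_x, foot_y = ax + t * dx, ay + t * dy  # projection of p onto the line
    return (2 * foot_x - p[0], 2 * foot_y - p[1])


def region_asymmetry(pairs, forehead, chin, interocular):
    """Mean distance from mirrored left points to right points, in inter-ocular units."""
    total = 0.0
    for left, right in pairs:
        mirrored = reflect_across_line(left, forehead, chin)
        total += math.dist(mirrored, right)
    return total / (len(pairs) * interocular)


def symmetry_score(asymmetry, full_scale=0.25):
    """Assumed linear mapping: 0 asymmetry -> 100, full_scale or more -> 0."""
    return max(0.0, 100.0 * (1.0 - asymmetry / full_scale))


forehead, chin = (100.0, 20.0), (100.0, 220.0)   # vertical midline at x = 100
pairs = [((60.0, 80.0), (140.0, 80.0)),           # perfectly mirrored pair
         ((70.0, 150.0), (130.0, 154.0))]         # right point sits 4 px lower
asym = region_asymmetry(pairs, forehead, chin, interocular=80.0)
score = symmetry_score(asym)
```

Dividing by the inter-ocular distance is what makes the result independent of image resolution: the same face photographed at twice the size yields the same normalized asymmetry.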
609
+ The demo's **Symmetry Analysis** tab offers two modes:
610
+
611
+ - **Single photo** -- upload any face photo to get a per-region symmetry breakdown with a color-coded overlay (green/yellow/red).
612
+ - **Pre vs. post comparison** -- upload before and after photos to see how a procedure changed the symmetry profile, with per-region deltas.
613
+
614
+ Symmetry scores are also computed automatically during inference runs and reported alongside the prediction output.
615
+
616
+ ---
617
+
618
+ ## Training
619
+
620
+ Training happens in two phases.
621
+
622
+ ### Phase A: Synthetic Data (current)
623
+
624
+ Generate TPS-warped face pairs from FFHQ, then fine-tune ControlNet to reconstruct the original face from the deformed wireframe.
625
+
626
+ ```bash
627
+ # 1. Download FFHQ samples
628
+ python scripts/download_ffhq.py --num 50000 --resolution 512
629
+
630
+ # 2. Generate training pairs (original + TPS-warped + wireframe)
631
+ python scripts/generate_synthetic_data.py \
632
+ --input data/ffhq_samples/ \
633
+ --output data/synthetic_pairs/ \
634
+ --num 50000
635
+
636
+ # 3. Train ControlNet
637
+ python scripts/train_controlnet.py \
638
+ --data_dir data/synthetic_pairs/ \
639
+ --output_dir checkpoints/ \
640
+ --num_train_steps 50000
641
+ ```
642
+
643
+ Phase A uses diffusion loss only (MSE between predicted and target noise).
644
+
645
+ ### Phase B: Clinical + Combined Loss (planned)
646
+
647
+ Fine-tune further on clinical before/after pairs with the full four-term loss:
648
+
649
+ | Loss | Weight | Purpose |
650
+ |------|--------|---------|
651
+ | Diffusion (MSE) | 1.0 | Primary training signal |
652
+ | Landmark L2 | 0.1 | Anatomical accuracy (inside surgical mask only) |
653
+ | Identity (ArcFace) | 0.05 | Patient identity preservation |
654
+ | Perceptual (LPIPS) | 0.1 | Texture quality (outside mask, prevents penalizing the TPS warp) |
655
+
656
+ The landmark loss is normalized by inter-ocular distance (landmarks 33 vs 263) for scale invariance. The identity loss uses procedure-dependent face cropping - rhinoplasty crops to the upper face, blepharoplasty uses the full face, rhytidectomy crops above the jawline, and orthognathic disables identity loss entirely since jaw surgery inherently changes proportions.
657
+
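The weighting in the table amounts to a weighted sum of the four terms. The sketch below uses plain floats in place of tensors and a hypothetical `combined_loss` function; it only illustrates the arithmetic and the orthognathic exception, not the implementation in `landmarkdiff.losses`.

```python
# Weights copied from the Phase B table above.
PHASE_B_WEIGHTS = {
    "diffusion": 1.0,
    "landmark": 0.1,
    "identity": 0.05,
    "perceptual": 0.1,
}


def combined_loss(terms, procedure, weights=PHASE_B_WEIGHTS):
    """Weighted sum of loss terms; identity is skipped for orthognathic."""
    total = 0.0
    for name, value in terms.items():
        if name == "identity" and procedure == "orthognathic":
            continue  # jaw surgery changes proportions, so identity loss is disabled
        total += weights[name] * value
    return total


# Made-up per-term values, purely to show the arithmetic.
terms = {"diffusion": 0.20, "landmark": 0.50, "identity": 0.40, "perceptual": 0.30}
rhino = combined_loss(terms, "rhinoplasty")
ortho = combined_loss(terms, "orthognathic")
```

With these example values the identity term contributes 0.05 * 0.40 = 0.02 to the rhinoplasty total and nothing to the orthognathic one.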
658
+ ### Training Configuration
659
+
660
+ Default config at `configs/training.yaml`:
661
+
662
+ | Parameter | Value | Notes |
663
+ |-----------|-------|-------|
664
+ | Learning rate | 1e-5 | With cosine scheduler |
665
+ | Warmup steps | 500 | |
666
+ | Batch size | 4 | Gradient accumulation 4x, effective batch 16 |
667
+ | Mixed precision | bf16 | Not fp16: activations exceed the fp16 range |
668
+ | EMA decay | 0.9999 | |
669
+ | Checkpoint interval | 5000 steps | |
670
+ | ControlNet scale max | 1.2 | Sum > 1.2 causes saturation |
671
+
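To make the scheduler and batch settings concrete, here is a small sketch of a learning-rate curve with linear warmup followed by cosine decay. The exact shape (linear warmup, decay to zero) and the 50,000-step horizon are assumptions for illustration; the table only states a cosine scheduler with 500 warmup steps.

```python
import math


def learning_rate(step, base_lr=1e-5, warmup_steps=500, total_steps=50000):
    """Assumed schedule: linear warmup, then cosine decay to zero."""
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * min(1.0, progress)))


# Effective batch size = per-device batch size x gradient accumulation steps.
effective_batch = 4 * 4

lr_mid_warmup = learning_rate(250)
lr_peak = learning_rate(500)
lr_end = learning_rate(50000)
```

The learning rate reaches its peak of 1e-5 at the end of warmup and then falls off smoothly for the rest of the run.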
672
+ Important training safeguards:
673
+ - VAE is always frozen (gradient leak corrupts the latent space)
674
+ - GroupNorm instead of BatchNorm (batch size 4 makes BN unstable)
675
+ - TPS warps are precomputed to avoid CPU bottleneck during training
676
+ - Git LFS required for checkpoints
677
+
678
+ ### SLURM (HPC)
679
+
680
+ ```bash
681
+ sbatch scripts/train_slurm.sh
682
+ ```
683
+
684
+ See [docs/GPU_TRAINING_GUIDE.md](docs/GPU_TRAINING_GUIDE.md) for detailed HPC setup, Apptainer containers, and multi-node configurations.
685
+
686
+ ---
687
+
688
+ ## Evaluation and Metrics
689
+
690
+ ### Primary Metrics
691
+
692
+ | Metric | What it measures | Target | How it's computed |
693
+ |--------|-----------------|--------|-------------------|
694
+ | FID | Realism | < 50 | Fréchet Inception Distance via torch-fidelity (GPU-accelerated) |
695
+ | LPIPS | Perceptual similarity | < 0.15 | Learned Perceptual Image Patch Similarity (AlexNet backbone) |
696
+ | SSIM | Structural similarity | > 0.80 | Structural Similarity Index between input and output |
697
+ | NME | Landmark accuracy | < 0.05 | Normalized Mean Error - L2 distance between predicted and target landmarks, normalized by inter-ocular distance (landmarks 33 vs 263) |
698
+ | Identity Sim | Identity preservation | > 0.85 | ArcFace cosine similarity between input and output face embeddings (InsightFace buffalo_l, 512-dim) |
699
+
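As an illustration of the NME row, the sketch below computes the mean point-to-point error and divides it by the distance between landmarks 33 and 263. The function name is hypothetical, and taking the inter-ocular distance from the target landmarks (rather than the predicted ones) is an assumption.

```python
import math

LEFT_EYE, RIGHT_EYE = 33, 263  # landmark indices named in the table


def normalized_mean_error(predicted, target):
    """Mean L2 landmark error divided by the target's inter-ocular distance."""
    interocular = math.dist(target[LEFT_EYE], target[RIGHT_EYE])
    errors = [math.dist(p, t) for p, t in zip(predicted, target)]
    return sum(errors) / (len(errors) * interocular)


# Toy 478-point "mesh" laid out on a line, with every prediction off by 5 px.
target = [(float(i), 0.0) for i in range(478)]
predicted = [(x + 3.0, y + 4.0) for x, y in target]
nme = normalized_mean_error(predicted, target)
```

Here the inter-ocular distance is 230 px and every landmark is 5 px off, so the NME is 5 / 230, roughly 0.022, which would meet the < 0.05 target.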
700
+ ### Fitzpatrick Stratification
701
+
702
+ Every metric is broken down by Fitzpatrick skin type to ensure equitable performance. Skin type is classified automatically from the input photo using Individual Typology Angle (ITA):
703
+
704
+ ```
705
+ ITA = arctan((L - 50) / b) * (180 / pi)
706
+ ```
707
+
708
+ where L and b are the lightness and blue-yellow components of the skin color in the CIELAB (LAB) color space.
709
+
710
+ | ITA Range | Fitzpatrick Type | Description |
711
+ |-----------|-----------------|-------------|
712
+ | > 55 | Type I | Very light |
713
+ | 41 to 55 | Type II | Light |
714
+ | 28 to 41 | Type III | Intermediate |
715
+ | 10 to 28 | Type IV | Tan |
716
+ | -30 to 10 | Type V | Brown |
717
+ | < -30 | Type VI | Dark |
718
+
719
+ This catches cases where the model might work well on lighter skin but degrade on darker skin (or vice versa). Results are reported per-type in evaluation output.
720
+
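The formula and the lookup table can be combined into a short classifier. This sketch assumes L is on the standard 0-100 CIELAB scale and b is positive, and it treats each range as exclusive at its lower bound, which the table leaves open. The function names are hypothetical. If the values come from an 8-bit OpenCV LAB conversion, L is stored scaled to 0-255 and b is offset by 128, so they would need rescaling first.

```python
import math


def individual_typology_angle(lightness, b):
    """ITA in degrees from CIELAB L (0-100) and a nonzero b component."""
    return math.degrees(math.atan((lightness - 50.0) / b))


def fitzpatrick_type(ita):
    """Map an ITA value to a Fitzpatrick type (boundary handling is assumed)."""
    if ita > 55:
        return "I"
    if ita > 41:
        return "II"
    if ita > 28:
        return "III"
    if ita > 10:
        return "IV"
    if ita > -30:
        return "V"
    return "VI"


ita_light = individual_typology_angle(lightness=70.0, b=15.0)
ita_dark = individual_typology_angle(lightness=30.0, b=20.0)
type_light = fitzpatrick_type(ita_light)
type_dark = fitzpatrick_type(ita_dark)
```

The first example gives an ITA of about 53 degrees (Type II), the second exactly -45 degrees (Type VI).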
721
+ ### Running Evaluation
722
+
723
+ ```bash
724
+ python scripts/evaluate.py \
725
+ --pred_dir output/predictions/ \
726
+ --target_dir data/targets/ \
727
+ --output eval_results.json
728
+ ```
729
+
730
+ The evaluation harness computes all metrics, stratifies by Fitzpatrick type and by procedure, and writes a JSON report.
731
+
732
+ ---
733
+
734
+ ## Clinical Edge Cases
735
+
736
+ LandmarkDiff handles four clinical conditions that affect how deformations should be applied or how the mask should behave.
737
+
738
+ ### Vitiligo
739
+
740
+ Vitiligo causes depigmented patches on the skin that should be preserved, not blended over. LandmarkDiff detects vitiligo patches using LAB luminance thresholding (high L, low saturation), filters by minimum area (200 px²), and reduces mask intensity over detected patches by a preservation factor of 0.3. This means the surgical region is still modified, but depigmented areas are largely left alone.
741
+
742
+ ### Bell's Palsy
743
+
744
+ Bell's palsy causes unilateral facial paralysis. Deforming the paralyzed side produces unrealistic results because the tissue doesn't respond to surgery the same way. LandmarkDiff takes the affected side (left or right) as input and disables all deformation handles on that side. The bilateral landmark groups (eye, eyebrow, mouth corner, jawline) for the affected side are excluded from manipulation.
745
+
746
+ ### Keloid-Prone Skin
747
+
748
+ Keloid-prone patients develop raised scars at incision sites. LandmarkDiff identifies keloid-prone regions (specified by anatomical zone, e.g., "jawline", "nose"), creates exclusion masks with margins, and reduces mask intensity by a factor of 0.5 with additional Gaussian blur (sigma 10.0) for softer transitions. This prevents sharp compositing boundaries that would suggest incision lines.
749
+
750
+ ### Ehlers-Danlos Syndrome
751
+
752
+ Ehlers-Danlos causes tissue hypermobility - the skin stretches more than typical. LandmarkDiff multiplies the Gaussian RBF influence radius by 1.5 for Ehlers-Danlos patients, producing wider, more gradual deformations that reflect how hypermobile tissue actually responds to surgical manipulation.
753
+
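The effect of the 1.5x radius multiplier is easy to see numerically. The sketch below assumes a Gaussian kernel of the form `exp(-d² / (2r²))`; the base radius of 20 px and the function name are illustrative choices, not values taken from the library.

```python
import math

EHLERS_DANLOS_RADIUS_SCALE = 1.5  # multiplier stated in the text


def rbf_weight(distance, radius):
    """Assumed Gaussian kernel; the library's exact form may differ."""
    return math.exp(-(distance ** 2) / (2.0 * radius ** 2))


base_radius = 20.0  # illustrative radius in pixels
wide_radius = base_radius * EHLERS_DANLOS_RADIUS_SCALE

# Influence on a landmark 30 px from the handle, with and without the flag.
weight_typical = rbf_weight(30.0, base_radius)
weight_hypermobile = rbf_weight(30.0, wide_radius)
```

Under these assumptions a landmark 30 px from the handle receives about 32% of the displacement normally and about 61% with the Ehlers-Danlos flag set, so the deformation spreads over a wider area.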
754
+ ### Using Clinical Flags
755
+
756
+ ```python
+ from landmarkdiff.clinical import ClinicalFlags
+
+ flags = ClinicalFlags(
+     vitiligo=True,
+     bells_palsy=True,
+     bells_palsy_side="left",
+     keloid_prone=True,
+     keloid_regions=["jawline", "nose"],
+     ehlers_danlos=False,
+ )
+
+ # `pipeline` and `image` are set up as in the Inference Modes example
+ result = pipeline.generate(
+     image,
+     procedure="rhinoplasty",
+     intensity=60,
+     clinical_flags=flags,
+ )
+ ```
775
+
776
+ In the Gradio demo, these are checkboxes and dropdowns in Tab 1.
777
+
778
+ ---
779
+
780
+ ## Post-Processing Pipeline
781
+
782
+ The raw diffusion output needs refinement before it looks right. The post-processing pipeline runs six steps:
783
+
784
+ ### 1. CodeFormer Face Restoration
785
+ Neural face restoration that fixes small artifacts, enhances detail, and sharpens facial features. Uses a fidelity weight of 0.7 (range 0.0 to 1.0) to balance quality enhancement against faithfulness to the diffusion output. Falls back to GFPGAN if CodeFormer is unavailable.
786
+
787
+ ### 2. Real-ESRGAN Background Enhancement
788
+ Super-resolution applied only to non-face regions (background, hair, clothing). Prevents the background from looking noticeably lower quality than the restored face.
789
+
790
+ ### 3. Skin Tone Matching
791
+ CDF histogram matching in LAB color space transfers the input photo's skin tone to the generated output. LAB matching is more robust than RGB for this because it separates luminance from color, preventing brightness shifts.
792
+
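CDF histogram matching on a single channel can be sketched without any imaging library. The functions below are hypothetical and work on flat lists of 8-bit values; applying the same remapping to each LAB channel separately is an assumption about how it would be used here.

```python
def cdf(values, levels=256):
    """Cumulative distribution of integer values in [0, levels)."""
    hist = [0] * levels
    for v in values:
        hist[v] += 1
    total, running, out = len(values), 0, []
    for count in hist:
        running += count
        out.append(running / total)
    return out


def match_histogram(source, reference, levels=256):
    """Remap source values so their distribution follows the reference."""
    src_cdf, ref_cdf = cdf(source, levels), cdf(reference, levels)
    lookup, j = [], 0
    for s in src_cdf:
        # Smallest reference level whose CDF reaches the source CDF value.
        while j < levels - 1 and ref_cdf[j] < s:
            j += 1
        lookup.append(j)
    return [lookup[v] for v in source]


generated_channel = [10, 10, 20, 30]     # one channel of the generated face
input_channel = [100, 110, 110, 120]     # same channel of the input photo
matched = match_histogram(generated_channel, input_channel)
```

After matching, the generated channel takes on values drawn from the input photo's range, which is how the input's tone carries over to the output.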
793
+ ### 4. Frequency-Aware Sharpening
794
+ Unsharp masking applied to the L channel only (luminance) with a default strength of 0.25. Sharpening only luminance avoids the color fringing artifacts you get from sharpening RGB channels directly.
795
+
796
+ ### 5. Laplacian Pyramid Blending
797
+ This is the compositing step: it blends the generated face into the original photo. It uses a 6-level Laplacian pyramid where low-frequency levels blend smoothly (lighting and color continuity) while high-frequency levels transition sharply (texture and pore detail). This prevents the color halos and "pasted on" look that simple alpha blending produces.
798
+
799
+ ### 6. ArcFace Identity Verification
800
+ Final sanity check. Extracts ArcFace embeddings from the input and output, computes cosine similarity, and flags if the score drops below 0.6. This catches cases where the diffusion model drifted too far from the patient's appearance.
801
+
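The check itself is a cosine similarity followed by a threshold comparison. The sketch below uses short toy vectors in place of real 512-dimensional ArcFace embeddings, and the function names and returned dictionary layout are hypothetical rather than the format of `result["identity_check"]`.

```python
import math

IDENTITY_THRESHOLD = 0.6  # flag threshold stated in the text


def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def identity_check(input_embedding, output_embedding, threshold=IDENTITY_THRESHOLD):
    """Flag the output when similarity to the input falls below the threshold."""
    score = cosine_similarity(input_embedding, output_embedding)
    return {"similarity": score, "flagged": score < threshold}


# Toy 3-dimensional embeddings standing in for real face embeddings.
same_person = identity_check([1.0, 0.0, 0.0], [0.9, 0.1, 0.0])
drifted = identity_check([1.0, 0.0, 0.0], [0.3, 0.9, 0.3])
```

The first pair scores close to 1 and passes; the second scores around 0.3 and would be flagged.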
802
+ ---
803
+
804
+ ## Project Structure
805
+
806
+ ```
+ landmarkdiff/                   # Core library
+   landmarks.py                  # MediaPipe 478-point face mesh extraction
+                                 # FaceLandmarks dataclass, extract_landmarks(),
+                                 # render_landmark_image(), LANDMARK_REGIONS dict
+   conditioning.py               # ControlNet conditioning generation
+                                 # Tessellation wireframe (2556 edges), adaptive
+                                 # Canny edge detection, generate_conditioning()
+   manipulation.py               # Gaussian RBF landmark deformation
+                                 # DeformationHandle, PROCEDURE_LANDMARKS,
+                                 # apply_procedure_preset(), clinical modifiers
+   masking.py                    # Feathered surgical mask generation
+                                 # Convex hull + dilation + Gaussian feather +
+                                 # Perlin boundary noise, clinical adjustments
+   inference.py                  # Full pipeline (4 modes: tps/img2img/controlnet/
+                                 # controlnet_ip), LandmarkDiffPipeline class,
+                                 # face view estimation, procedure-specific prompts
+   losses.py                     # Combined loss (diffusion + landmark + identity
+                                 # + perceptual), phase A/B control, procedure-
+                                 # dependent identity cropping
+   evaluation.py                 # Metrics (FID, LPIPS, SSIM, NME, Identity Sim),
+                                 # Fitzpatrick ITA classification, per-type and
+                                 # per-procedure stratification
+   clinical.py                   # Clinical edge cases: ClinicalFlags dataclass,
+                                 # vitiligo patch detection, Bell's palsy side
+                                 # exclusion, keloid mask adjustment, Ehlers-Danlos
+   postprocess.py                # Neural + classical post-processing: CodeFormer,
+                                 # GFPGAN, Real-ESRGAN, LAB histogram matching,
+                                 # Laplacian pyramid blend, ArcFace verification
+   synthetic/
+     pair_generator.py           # Training pair generation pipeline
+     tps_warp.py                 # Thin-plate spline warping with rigid regions
+                                 # (teeth, sclera), smart control point subsampling
+                                 # (max 80 from 478), batched evaluation
+     augmentation.py             # Clinical photography augmentations
+
+ scripts/                        # CLI tools
+   app.py                        # Gradio web demo (5 tabs)
+   run_inference.py              # Single image inference
+   train_controlnet.py           # ControlNet fine-tuning
+   evaluate.py                   # Automated evaluation harness
+   demo.py                       # CLI demo with visualizations
+   download_ffhq.py              # FFHQ face image downloader
+   generate_synthetic_data.py    # Synthetic training pair generator
+   train_slurm.sh                # SLURM job script (single GPU)
+   train_slurm_v2.sh             # SLURM job script (multi-GPU)
+   gen_synthetic_slurm.sh        # SLURM job for data generation
+
+ examples/                       # Runnable example scripts
+   basic_inference.py            # Single image with GPU fallback to TPS
+   batch_inference.py            # Process a directory of images
+   tps_only.py                   # CPU-only TPS warp (no GPU)
+   compare_procedures.py         # Side-by-side all procedures grid
+   custom_procedure.py           # Define a lip augmentation procedure
+   landmark_visualization.py     # Visualize mesh with displacement arrows
+
+ benchmarks/                     # Performance benchmarks
+   benchmark_inference.py        # Inference speed across hardware
+   benchmark_landmarks.py        # Landmark extraction throughput
+   benchmark_training.py         # Training steps/hour
+
+ configs/                        # Training configuration
+   training.yaml                 # Default hyperparameters, loss weights, safeguards
+
+ paper/                          # MICCAI 2026 manuscript (Springer LNCS)
+ docs/                           # Documentation
+   tutorials/                    # quickstart, custom_procedures, training,
+                                 # evaluation, deployment
+   api/                          # Per-module API reference (landmarks,
+                                 # manipulation, conditioning, inference,
+                                 # evaluation, clinical)
+   GPU_TRAINING_GUIDE.md         # HPC setup, Apptainer, SLURM
+
+ containers/                     # Apptainer/Singularity container definitions
+ tests/                          # Unit tests (9 test modules)
+ demos/                          # Curated sample output images
+ ```
883
+
884
+ ---
885
+
886
+ ## Configuration
887
+
888
+ ### Training (configs/training.yaml)
889
+
890
+ The training config controls all hyperparameters, loss weights, and safeguards. Key sections:
891
+
892
+ ```yaml
+ model:
+   controlnet: CrucibleAI/ControlNetMediaPipeFace
+   base_model: runwayml/stable-diffusion-v1-5
+
+ training:
+   learning_rate: 1.0e-5
+   lr_scheduler: cosine
+   warmup_steps: 500
+   batch_size: 4
+   gradient_accumulation_steps: 4  # effective batch = 16
+   num_train_steps: 10000
+   mixed_precision: bf16
+   ema_decay: 0.9999
+
+ loss_weights:  # Phase B only
+   diffusion: 1.0
+   landmark: 0.1
+   identity: 0.05
+   perceptual: 0.1
+ ```
913
+
914
+ ### Inference Parameters
915
+
916
+ | Parameter | Default | Range | Effect |
917
+ |-----------|---------|-------|--------|
918
+ | `intensity` | 60 | 0 - 100 | How aggressive the deformation is (percentage) |
919
+ | `num_inference_steps` | 30 | 10 - 100 | Diffusion denoising steps (more = higher quality, slower) |
920
+ | `guidance_scale` | 7.5 | 1.0 - 20.0 | Classifier-free guidance strength |
921
+ | `controlnet_conditioning_scale` | 1.0 | 0.0 - 1.2 | How strongly the wireframe controls generation. Max 1.2 to avoid saturation |
922
+ | `strength` | 0.75 | 0.0 - 1.0 | img2img denoising strength |
923
+ | `seed` | None | any int | For reproducible results |
924
+
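If you build your own wrapper around the pipeline, the defaults and ranges in the table can be captured in a small validated container. The `InferenceParams` class below is a hypothetical helper for illustration, not part of the `landmarkdiff` package, and raising on out-of-range values (rather than clamping) is a design choice of this sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class InferenceParams:
    """Hypothetical container mirroring the defaults and ranges in the table."""
    intensity: float = 60
    num_inference_steps: int = 30
    guidance_scale: float = 7.5
    controlnet_conditioning_scale: float = 1.0
    strength: float = 0.75
    seed: Optional[int] = None

    def validate(self):
        """Raise ValueError if any parameter falls outside its documented range."""
        ranges = {
            "intensity": (0, 100),
            "num_inference_steps": (10, 100),
            "guidance_scale": (1.0, 20.0),
            "controlnet_conditioning_scale": (0.0, 1.2),
            "strength": (0.0, 1.0),
        }
        for name, (low, high) in ranges.items():
            value = getattr(self, name)
            if not low <= value <= high:
                raise ValueError(f"{name}={value} is outside [{low}, {high}]")
        return self


defaults = InferenceParams().validate()

# A conditioning scale above the documented maximum of 1.2 is rejected.
try:
    InferenceParams(controlnet_conditioning_scale=1.5).validate()
    rejected = False
except ValueError:
    rejected = True
```

Failing early on an out-of-range `controlnet_conditioning_scale` avoids spending a full diffusion run on a setting the table warns will saturate.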
925
+ ---
926
+
927
+ ## Benchmarks
928
+
929
+ ### Inference Speed
930
+
931
+ | Hardware | Mode | Time per image |
932
+ |----------|------|----------------|
933
+ | A100 80GB | ControlNet (30 steps) | ~3 sec |
934
+ | A100 40GB | ControlNet (30 steps) | ~4 sec |
935
+ | RTX 4090 | ControlNet (30 steps) | ~5 sec |
936
+ | RTX 3090 | ControlNet (30 steps) | ~7 sec |
937
+ | T4 16GB | ControlNet (30 steps) | ~15 sec |
938
+ | M3 Pro (MPS) | ControlNet (30 steps) | ~45 sec |
939
+ | Any CPU | TPS only | ~0.5 sec |
940
+
941
+ ### Training Throughput
942
+
943
+ | Hardware | Batch size | Grad accum | Effective batch | Steps/hour |
944
+ |----------|-----------|------------|-----------------|------------|
945
+ | A100 80GB | 4 | 4 | 16 | ~600 |
946
+ | A100 40GB | 2 | 8 | 16 | ~400 |
947
+ | RTX 4090 | 2 | 8 | 16 | ~350 |
948
+ | RTX 3090 | 1 | 16 | 16 | ~200 |
949
+
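The throughput figures translate directly into wall-clock estimates. The sketch below copies the table into a dictionary and estimates the time for a 50,000-step run. It assumes "Steps/hour" counts the same kind of step as `--num_train_steps`, which the table does not state explicitly.

```python
# hardware: (batch size, gradient accumulation, approximate steps per hour)
THROUGHPUT = {
    "A100 80GB": (4, 4, 600),
    "A100 40GB": (2, 8, 400),
    "RTX 4090": (2, 8, 350),
    "RTX 3090": (1, 16, 200),
}


def training_hours(hardware, num_steps):
    """Estimated wall-clock hours for num_steps on the given hardware."""
    return num_steps / THROUGHPUT[hardware][2]


# Every row keeps the same effective batch: batch size x accumulation steps.
effective_batches = {hw: b * a for hw, (b, a, _) in THROUGHPUT.items()}

hours_a100 = training_hours("A100 80GB", 50000)
hours_3090 = training_hours("RTX 3090", 50000)
```

Under that assumption a 50,000-step run takes roughly 83 hours on an A100 80GB and 250 hours on an RTX 3090.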
950
+ ### VRAM Usage
951
+
952
+ | Component | VRAM |
953
+ |-----------|------|
954
+ | SD 1.5 (FP16) | ~2.5 GB |
955
+ | ControlNet (FP16) | ~1.5 GB |
956
+ | VAE (FP32) | ~0.5 GB |
957
+ | CodeFormer | ~0.4 GB |
958
+ | ArcFace | ~0.3 GB |
959
+ | **Total inference** | **~5.2 GB** |
960
+ | **Total training** | **~25 GB** |
961
+
962
+ Run benchmarks yourself:
963
+
964
+ ```bash
965
+ python benchmarks/benchmark_inference.py --device cuda --num_images 100
966
+ python benchmarks/benchmark_landmarks.py --num_images 1000
967
+ python benchmarks/benchmark_training.py --device cuda --num_steps 100
968
+ ```
969
+
970
+ ---
971
+
972
+ ## Model Zoo
973
+
974
+ See [MODEL_ZOO.md](MODEL_ZOO.md) for the full list of required and optional models.
975
+
976
+ **Base models (auto-downloaded on first run):**
977
+ - [runwayml/stable-diffusion-v1-5](https://huggingface.co/runwayml/stable-diffusion-v1-5) - ~4 GB
978
+ - [CrucibleAI/ControlNetMediaPipeFace](https://huggingface.co/CrucibleAI/ControlNetMediaPipeFace) - ~1.4 GB
979
+ - MediaPipe Face Mesh - ~5 MB
980
+
981
+ **Post-processing models (optional, auto-downloaded):**
982
+ - CodeFormer - ~400 MB
983
+ - GFPGAN v1.4 - ~350 MB
984
+ - Real-ESRGAN x4 - ~64 MB
985
+ - ArcFace (InsightFace buffalo_l) - ~250 MB
986
+
987
+ ---
988
+
989
+ ## Requirements
990
+
991
+ - Python 3.10+
992
+ - PyTorch 2.1+ with CUDA (or MPS on Apple Silicon)
993
+ - ~6 GB VRAM for inference (SD 1.5 + ControlNet)
994
+ - ~25 GB VRAM for training (A100 40GB minimum, 80GB recommended)
995
+ - MediaPipe 0.10.9+
996
+ - diffusers 0.27.0+, transformers 4.38.0+
997
+
998
+ Full dependency list in [pyproject.toml](pyproject.toml).
999
+
1000
+ ---
1001
+
1002
+ ## Docker
1003
+
1004
+ ```bash
1005
+ # CPU-only demo (TPS mode, no GPU required)
1006
+ docker build -t landmarkdiff:cpu -f Dockerfile.cpu .
1007
+ docker run -p 7860:7860 landmarkdiff:cpu
1008
+
1009
+ # GPU-accelerated demo (ControlNet inference)
1010
+ docker build -t landmarkdiff:gpu -f Dockerfile.gpu .
1011
+ docker run --gpus all -p 7860:7860 landmarkdiff:gpu
1012
+ ```
1013
+
1014
+ Or with Docker Compose:
1015
+
1016
+ ```bash
1017
+ docker compose up app # CPU demo on :7860
1018
+ docker compose up gpu # GPU demo on :7861
1019
+ docker compose --profile training run train # training (GPU)
1020
+ ```
1021
+
1022
+ GPU passthrough requires [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html). See [docs/docker-gpu.md](docs/docker-gpu.md) for prerequisites, VRAM requirements by GPU tier, and troubleshooting.
1023
+
1024
+ For HPC environments using Apptainer/Singularity, see [containers/](containers/).
1025
+
1026
+ ---
1027
+
1028
+ ## Make Targets
1029
+
1030
+ ```bash
1031
+ make help # show all commands
1032
+ make install # install (inference only)
1033
+ make install-dev # install with dev tools
1034
+ make install-train # install with training deps
1035
+ make install-app # install with Gradio
1036
+ make install-all # install everything
1037
+ make test # run full test suite
1038
+ make test-fast # run tests excluding slow ones
1039
+ make lint # run ruff linter
1040
+ make format # auto-format code
1041
+ make type-check # run mypy
1042
+ make check # lint + format + type-check
1043
+ make demo # launch Gradio demo
1044
+ make inference # run single inference
1045
+ make train # train ControlNet
1046
+ make evaluate # run evaluation
1047
+ make docker # build Docker image
1048
+ make paper # build MICCAI paper PDF
1049
+ make clean # remove build artifacts
1050
+ ```
1051
+
1052
+ <br>
1053
+
1054
+ ---
1055
+
1056
+ ## Roadmap
1057
+
1058
+ See [docs/ROADMAP.md](docs/ROADMAP.md) for the detailed roadmap with full milestone descriptions.
1059
+
1060
+ ### Released (v0.2.x)
1061
+ - [x] Core pipeline: landmark extraction, RBF deformation, ControlNet conditioning, mask compositing
1062
+ - [x] 6 procedure presets (rhinoplasty, blepharoplasty, rhytidectomy, orthognathic, brow lift, mentoplasty)
1063
+ - [x] Synthetic training pair generation via TPS warps
1064
+ - [x] Clinical edge case handling (vitiligo, Bell's palsy, keloid, Ehlers-Danlos)
1065
+ - [x] Neural post-processing (CodeFormer, Real-ESRGAN, ArcFace identity verification)
1066
+ - [x] Gradio demo with multi-angle capture
1067
+ - [x] Fitzpatrick-stratified evaluation protocol
1068
+ - [x] Docker and Apptainer container support
1069
+ - [x] Hugging Face Spaces interactive demo ([live](https://huggingface.co/spaces/dreamlessx/LandmarkDiff))
1070
+ - [x] Data-driven displacement model fitted from real surgical pairs
1071
+ - [ ] ControlNet fine-tuning on 50K+ synthetic pairs (in progress)
1072
+ - [ ] Populate results tables in paper
1073
+
1074
+ ### Next (v0.3.0) -- Data-Driven Training
1075
+ - [ ] Anatomically constrained displacement sampling with per-procedure variance
1076
+ - [ ] ControlNet fine-tuning on 50K+ synthetic pairs (Phase A)
1077
+ - [ ] Combined loss training on clinical pairs (Phase B)
1078
+ - [ ] Additional procedure presets (otoplasty, genioplasty)
1079
+ - [ ] MICCAI 2026 workshop paper and arXiv preprint
1080
+
1081
+ ### v0.4.0 -- 3D Face Reconstruction
1082
+ - [ ] Phone video capture -- rotate head, reconstruct full 3D face from frames
1083
+ - [ ] FLAME 3D morphable model fitting from monocular video
1084
+ - [ ] FLUX.1-dev or SDXL backbone upgrade (higher quality generation at 1024x1024)
1085
+ - [ ] IP-Adapter FaceID v2 for stronger identity preservation
1086
+
1087
+ ### v0.5.0 -- Interactive 3D Surgical Preview
1088
+ - [ ] 3D surgical deformation -- procedure-specific warps in 3D space
1089
+ - [ ] Interactive 3D preview -- rotate the predicted result from any angle
1090
+ - [ ] Mobile-optimized capture and preview workflow
1091
+
1092
+ ### Future (v1.0.0) -- Clinical Validation
1093
+ - [ ] IRB-approved prospective clinical validation study
1094
+ - [ ] Multi-view consistency loss across frontal/profile predictions
1095
+ - [ ] Physics-informed tissue simulation (FEM for soft tissue response)
1096
+ - [ ] Mobile capture app with guided head-rotation scan
1097
+ - [ ] Cloud deployment with Triton inference server
1098
+
1099
+ ### Publication Targets
1100
+ - MICCAI 2026 workshop paper (July 2026 submission)
1101
+ - RSNA 2026 abstract (May 2026)
1102
+ - Full conference paper (CVPR/NeurIPS 2027)
1103
+
1104
+ <br>
1105
+
1106
+ ---
1107
+
1108
+ ## Citation
1109
+
1110
+ If you use LandmarkDiff in your research, please cite it. Machine-readable citation metadata is available in [CITATION.cff](CITATION.cff).
1111
+
1112
+ ```bibtex
1113
+ @software{landmarkdiff2026,
1114
+ title = {LandmarkDiff: Anatomically-Conditioned Facial Surgery Outcome Prediction},
1115
+ author = {dreamlessx},
1116
+ year = {2026},
1117
+ url = {https://github.com/dreamlessx/LandmarkDiff-public},
1118
+ version = {0.2.2}
1119
+ }
1120
+ ```
1121
+
1122
+ ---
1123
+
1124
+ ## Contributors
1125
+
1126
+ We track all contributions, and contributors will be acknowledged in the MICCAI 2026 paper. Significant contributions earn co-authorship.
1127
+
1128
+ | Contribution Level | Recognition |
1129
+ |---|---|
1130
+ | Bug fix or typo | Listed in [CONTRIBUTORS.md](CONTRIBUTORS.md) |
1131
+ | New procedure preset | Acknowledged in paper and README |
1132
+ | Feature module (new loss, metric, clinical handler) | Co-author on paper |
1133
+ | Clinical validation data | Co-author on paper |
1134
+ | Sustained multi-feature contributions | Co-author on paper |
1135
+
1136
+ ### Current Contributors
1137
+
1138
+ | GitHub Handle | Contribution |
1139
+ |---|---|
1140
+ | [@dreamlessx](https://github.com/dreamlessx) | Core architecture, training pipeline, paper |
1141
+ | [@Deepak8858](https://github.com/Deepak8858) | Brow lift procedure preset ([#35](https://github.com/dreamlessx/LandmarkDiff-public/pull/35)) |
1142
+ | [@P-r-e-m-i-u-m](https://github.com/P-r-e-m-i-u-m) | Mentoplasty procedure preset ([#36](https://github.com/dreamlessx/LandmarkDiff-public/pull/36)) |
1143
+
1144
+ To join this list, open a PR or contribute to an issue. See [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines.
1145
+
1146
+ ---
1147
+
1148
+ ## Contributing
1149
+
1150
+ Contributions are welcome. See [CONTRIBUTING.md](CONTRIBUTING.md) for the full guide, which covers development setup, coding style, testing requirements, and how to add new procedures.
1151
+
1152
+ For bug reports and feature requests, use the [issue templates](https://github.com/dreamlessx/LandmarkDiff-public/issues/new/choose).
1153
+
1154
+ For questions and general discussion, visit [GitHub Discussions](https://github.com/dreamlessx/LandmarkDiff-public/discussions).
1155
+
1156
+ For major changes, please open an issue first to discuss the proposed approach.
1157
+
1158
+ ---
1159
+
1160
+ ## License
1161
+
1162
+ MIT License. See [LICENSE](LICENSE) for details.
1163
+
1164
+ ---
1165
+
1166
+ ## Acknowledgments
1167
+
1168
+ - [CrucibleAI](https://huggingface.co/CrucibleAI/ControlNetMediaPipeFace) for the MediaPipe Face ControlNet
1169
+ - [MediaPipe](https://google.github.io/mediapipe/) for the 478-point face mesh
1170
+ - [Stable Diffusion](https://github.com/CompVis/stable-diffusion) and [ControlNet](https://github.com/lllyasviel/ControlNet) for the diffusion backbone
1171
+ - [CodeFormer](https://github.com/sczhou/CodeFormer) and [Real-ESRGAN](https://github.com/xinntao/Real-ESRGAN) for face restoration
1172
+ - [InsightFace](https://github.com/deepinsight/insightface) for ArcFace identity verification
1173
+ - [FFHQ](https://github.com/NVlabs/ffhq-dataset) for training data