mockcraft 0.1.1__tar.gz

@@ -0,0 +1,22 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2026 Bibiana Terres Stumpf, Joao Victor Motta da Silva,
4
+ Pedro Jann Luna, Anne Laure Mealier, Julien Zoubian
5
+
6
+ Permission is hereby granted, free of charge, to any person obtaining a copy
7
+ of this software and associated documentation files (the "Software"), to deal
8
+ in the Software without restriction, including without limitation the rights
9
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
10
+ copies of the Software, and to permit persons to whom the Software is
11
+ furnished to do so, subject to the following conditions:
12
+
13
+ The above copyright notice and this permission notice shall be included in all
14
+ copies or substantial portions of the Software.
15
+
16
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
17
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
18
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
19
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
20
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
21
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
22
+ SOFTWARE.
@@ -0,0 +1,392 @@
1
+ Metadata-Version: 2.4
2
+ Name: mockcraft
3
+ Version: 0.1.1
4
+ Summary: Reusable Python package for multimodal astronomical source generation using AION
5
+ Author: Bibiana Terres Stumpf, Joao Victor Motta da Silva, Pedro Jann Luna, Anne Laure Mealier, Julien Zoubian
6
+ License-Expression: MIT
7
+ Project-URL: Repository, https://github.com/CentraleDigitaleLab/mockcraft
8
+ Project-URL: Issues, https://github.com/CentraleDigitaleLab/mockcraft/issues
9
+ Project-URL: Documentation, https://github.com/CentraleDigitaleLab/mockcraft/tree/main/docs
10
+ Keywords: astronomy,astrophysics,machine-learning,generative-ai,multimodal,synthetic-catalogues,AION
11
+ Classifier: Development Status :: 3 - Alpha
12
+ Classifier: Intended Audience :: Science/Research
13
+ Classifier: Operating System :: OS Independent
14
+ Classifier: Programming Language :: Python :: 3
15
+ Classifier: Programming Language :: Python :: 3.10
16
+ Classifier: Programming Language :: Python :: 3.11
17
+ Classifier: Programming Language :: Python :: 3.12
18
+ Classifier: Topic :: Scientific/Engineering :: Astronomy
19
+ Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
20
+ Requires-Python: <3.13,>=3.10
21
+ Description-Content-Type: text/markdown
22
+ License-File: LICENSE
23
+ Requires-Dist: numpy<2.0,>=1.26
24
+ Requires-Dist: pandas<3.0,>=2.2
25
+ Requires-Dist: scipy<1.16,>=1.11
26
+ Requires-Dist: scikit-learn<1.8,>=1.5
27
+ Requires-Dist: astropy<7.0,>=6.1
28
+ Requires-Dist: astroquery<0.5,>=0.4.7
29
+ Requires-Dist: matplotlib<4,>=3.8
30
+ Requires-Dist: torch<3,>=2.4
31
+ Requires-Dist: einops<1,>=0.8
32
+ Requires-Dist: jaxtyping<0.4,>=0.2.28
33
+ Requires-Dist: huggingface-hub<1.0,>=0.23
34
+ Requires-Dist: tokenizers<1,>=0.19
35
+ Requires-Dist: transformers<5,>=4.40
36
+ Requires-Dist: datasets<4,>=2.18
37
+ Requires-Dist: safetensors<0.6,>=0.4.3
38
+ Requires-Dist: polymathic-aion==0.0.2
39
+ Requires-Dist: filelock<4,>=3.13
40
+ Requires-Dist: tqdm<5,>=4.66
41
+ Requires-Dist: typing-extensions<5,>=4.9
42
+ Provides-Extra: viz
43
+ Requires-Dist: matplotlib<4,>=3.8; extra == "viz"
44
+ Requires-Dist: umap-learn<0.6,>=0.5.5; extra == "viz"
45
+ Provides-Extra: notebooks
46
+ Requires-Dist: ipython<9,>=8.20; extra == "notebooks"
47
+ Requires-Dist: ipykernel<7,>=6.29; extra == "notebooks"
48
+ Requires-Dist: jupyter<2,>=1.0; extra == "notebooks"
49
+ Requires-Dist: ipywidgets<9,>=8.1; extra == "notebooks"
50
+ Requires-Dist: matplotlib<4,>=3.8; extra == "notebooks"
51
+ Requires-Dist: umap-learn<0.6,>=0.5.5; extra == "notebooks"
52
+ Requires-Dist: tqdm<5,>=4.66; extra == "notebooks"
53
+ Provides-Extra: data
54
+ Requires-Dist: pyarrow<20,>=15.0; extra == "data"
55
+ Requires-Dist: datasets<4,>=2.18; extra == "data"
56
+ Provides-Extra: docs
57
+ Requires-Dist: Sphinx<9,>=7.2; extra == "docs"
58
+ Requires-Dist: docutils<0.22,>=0.20; extra == "docs"
59
+ Provides-Extra: network
60
+ Requires-Dist: redis<8,>=5.0; extra == "network"
61
+ Requires-Dist: urllib3<3,>=2.0; extra == "network"
62
+ Requires-Dist: requests<3,>=2.31; extra == "network"
63
+ Provides-Extra: security
64
+ Requires-Dist: cryptography<47,>=42.0; extra == "security"
65
+ Requires-Dist: pyOpenSSL<26,>=24.0; extra == "security"
66
+ Dynamic: license-file
67
+
68
+ # mockcraft
69
+
70
+ `mockcraft` is a Python package for generating synthetic astronomical catalogues using the [AION](https://github.com/PolymathicAI/aion) foundation model. It provides a simple interface for cross-modal generation across astronomical data types and includes a catalogue utility for fetching real sources from Gaia DR3, DESI DR1, and Legacy Survey DR8.
71
+
72
+ ---
73
+
74
+ ## What is MockCraft?
75
+
76
+ MockCraft is a pipeline for generating synthetic astronomical mock catalogues using foundation models. Given a set of real astronomical observations as input — redshifts, photometric fluxes, or images — MockCraft can generate realistic synthetic counterparts: spectra, multi-band images, and morphological parameters.
77
+
78
+ [AION](https://github.com/PolymathicAI/aion) is a multimodal foundation model trained on one of the largest astronomical datasets ever assembled (see the [Multimodal Universe paper](https://arxiv.org/pdf/2412.02527)). It learns joint representations across spectra, images, and physical parameters, enabling cross-modal generation: given any subset of observables, it can generate any other. MockCraft uses AION as its generative backbone.
79
+
80
+ ---
81
+
82
+ ## Installation
83
+
84
+ Requires Python 3.10–3.12.
85
+
86
+ ```bash
87
+ pip install mockcraft
88
+ ```
89
+
90
+ **Optional extras:**
91
+
92
+ | Extra | When to use |
93
+ |-------|-------------|
94
+ | `viz` | Visualization helpers (`plot_xp_spectrum`, `plot_image`) and UMAP projections |
95
+ | `notebooks` | Jupyter notebook workflows |
96
+ | `data` | Local Parquet files or HuggingFace datasets |
97
+
98
+ ```bash
99
+ pip install "mockcraft[viz]"
100
+ pip install "mockcraft[notebooks]"
101
+ pip install "mockcraft[viz,notebooks]"  # both
102
+ ```
103
+
104
+ **NVIDIA GPU (CUDA 12.4, Linux) — install PyTorch before the package:**
105
+
106
+ ```bash
107
+ pip install torch --index-url https://download.pytorch.org/whl/cu124
108
+ pip install mockcraft
109
+ ```
110
+
111
+ **From source:**
112
+
113
+ ```bash
114
+ git clone https://github.com/CentraleDigitaleLab/mockcraft.git
115
+ cd mockcraft
116
+ pip install -e ".[viz]"
117
+ ```
118
+
119
+ ---
120
+
121
+ ## Quick Start
122
+
123
+ ```python
124
+ from mockcraft import SourceGenerator
125
+
126
+ gen = SourceGenerator(model="aion", seed=42)
127
+
128
+ # Redshift → spectrum + image
129
+ result = gen.generate(
130
+     inputs={"redshift": 0.3},
131
+     outputs=["spectrum", "image"],
132
+ )
133
+
134
+ print(result.spectrum.shape) # (8704,)
135
+ print(result.image.shape) # (3, 128, 128)
136
+ ```
137
+
138
+ ---
139
+
140
+ ## API Reference
141
+
142
+ ### `SourceGenerator`
143
+
144
+ ```python
145
+ from mockcraft import SourceGenerator
146
+
147
+ gen = SourceGenerator(model="aion", seed=42)
148
+ ```
149
+
150
+ | Parameter | Type | Description |
151
+ |-----------|------|-------------|
152
+ | `model` | `str` | Model identifier. Only `"aion"` is currently supported. |
153
+ | `seed` | `int` or `None` | Fixed random seed for reproducibility. |
154
+ | `device` | `str` or `None` | PyTorch device: `"cuda"`, `"mps"`, or `"cpu"`. Auto-detected if not set. |
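The auto-detection described for `device` could be sketched roughly as follows. This is only an illustration: `pick_device` is a hypothetical helper, and the order of preference (CUDA, then MPS, then CPU) is an assumption, not something the package documents.

```python
def pick_device(requested=None):
    """Return `requested` if given, else the best available PyTorch device.

    Hypothetical helper: the preference order cuda > mps > cpu is an assumption.
    """
    if requested is not None:
        return requested
    try:
        import torch
    except ImportError:
        # Without PyTorch there is nothing to detect; fall back to CPU.
        return "cpu"
    if torch.cuda.is_available():
        return "cuda"
    # torch.backends.mps only exists in newer PyTorch builds, so guard the lookup.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


print(pick_device())       # one of "cuda", "mps", "cpu", depending on the machine
print(pick_device("cpu"))  # an explicit request is passed through unchanged
```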
155
+
156
+ ---
157
+
158
+ ### `generate(inputs, outputs, cfg=None, type=None, compute_embeddings=False)`
159
+
160
+ The primary generation method. Accepts any combination of input and output modalities.
161
+
162
+ ```python
163
+ result = gen.generate(
164
+     inputs={"redshift": 0.3},
165
+     outputs=["spectrum", "image"],
166
+ )
167
+ ```
168
+
169
+ | Parameter | Type | Default | Description |
170
+ |-----------|------|---------|-------------|
171
+ | `inputs` | `dict` | — | Modality name → value. Floats for scalars, numpy arrays for spectra/images. |
172
+ | `outputs` | `list[str]` | — | List of modality names to generate. |
173
+ | `cfg` | `float` or `None` | `None` | Classifier-free guidance override. Uses per-modality defaults if not set. |
174
+ | `type` | `str` or `None` | `None` | Morphological type prior: `"elliptical"`, `"spiral"`, or `"irregular"`. |
175
+ | `compute_embeddings` | `bool` | `False` | If `True`, computes AION latent embeddings `(768,)` for generated spectra and images. |
176
+
177
+ Returns a `GeneratedSource` object (see below).
178
+
179
+ ---
180
+
181
+ ### `star(temperature, logg=None, metallicity=None)`
182
+
183
+ Generate a synthetic stellar XP spectrum conditioned on effective temperature.
184
+
185
+ ```python
186
+ result = gen.star(temperature=5778.0)
187
+
188
+ print(result.xp_bp.shape) # (55,)
189
+ print(result.xp_rp.shape) # (55,)
190
+ ```
191
+
192
+ | Parameter | Type | Description |
193
+ |-----------|------|-------------|
194
+ | `temperature` | `float` | Effective temperature in Kelvin. |
195
+ | `logg` | `float` or `None` | Surface gravity (accepted for API compatibility, not yet used as conditioning). |
196
+ | `metallicity` | `float` or `None` | Metallicity [Fe/H] (accepted for API compatibility, not yet used as conditioning). |
197
+
198
+ Temperature is converted to approximate Gaia G/BP/RP fluxes via bolometric scaling relative to a solar reference (T☉ = 5778 K). Returns a `GeneratedSource` with `xp_bp` and `xp_rp` outputs.
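A minimal sketch of what "bolometric scaling relative to a solar reference" could look like, assuming a Stefan-Boltzmann T⁴ scaling at fixed stellar radius. The function names and the solar flux values are placeholders for illustration; the package's actual conversion may differ.

```python
T_SUN = 5778.0  # solar reference temperature in Kelvin, as quoted above


def bolometric_scale(temperature, t_ref=T_SUN):
    """Flux ratio relative to the reference star, assuming L ∝ T^4 at fixed radius."""
    if temperature <= 0:
        raise ValueError("temperature must be positive (Kelvin)")
    return (temperature / t_ref) ** 4


def scaled_fluxes(temperature, solar_fluxes):
    """Scale every band of a reference flux set by the same bolometric factor."""
    scale = bolometric_scale(temperature)
    return {band: flux * scale for band, flux in solar_fluxes.items()}


# Placeholder reference fluxes in arbitrary units (NOT real Gaia values).
solar = {"gaia_flux_g": 1.0, "gaia_flux_bp": 0.5, "gaia_flux_rp": 0.25}
print(scaled_fluxes(2 * T_SUN, solar))
```

Applying one factor to all three bands ignores the colour change with temperature, which is why this can only be an approximation.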
199
+
200
+ ---
201
+
202
+ ## Supported Modality Keys
203
+
204
+ | Key | Type | Description |
205
+ |-----|------|-------------|
206
+ | `redshift` | scalar | Spectroscopic redshift |
207
+ | `parallax` | scalar | Gaia parallax (mas) |
208
+ | `flux_g`, `flux_r`, `flux_i`, `flux_z` | scalar | Legacy Survey photometric fluxes (nanomaggies) |
209
+ | `gaia_flux_g`, `gaia_flux_bp`, `gaia_flux_rp` | scalar | Gaia G / BP / RP fluxes |
210
+ | `shape_r`, `shape_e1`, `shape_e2` | scalar | Legacy Survey morphology parameters |
211
+ | `spectrum` | array `(8704,)` | DESI spectrum (flux) |
212
+ | `xp_bp`, `xp_rp` | array `(55,)` | Gaia XP coefficient arrays |
213
+ | `image` | array `(3, 128, 128)` | Legacy Survey 3-band image (g, r, z) |
214
+
215
+ Any of the above can be used as inputs or outputs in `generate()`.
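Since array modalities have fixed shapes, it can help to check inputs before calling `generate()`. The sketch below encodes the table above; `check_inputs` is a hypothetical helper and not part of the package API.

```python
import numpy as np

# Keys and shapes copied from the modality table above.
SCALAR_KEYS = {
    "redshift", "parallax",
    "flux_g", "flux_r", "flux_i", "flux_z",
    "gaia_flux_g", "gaia_flux_bp", "gaia_flux_rp",
    "shape_r", "shape_e1", "shape_e2",
}
ARRAY_SHAPES = {
    "spectrum": (8704,),
    "xp_bp": (55,),
    "xp_rp": (55,),
    "image": (3, 128, 128),
}


def check_inputs(inputs):
    """Raise if a key is unknown or a value has the wrong dimensionality/shape."""
    for key, value in inputs.items():
        if key in SCALAR_KEYS:
            if np.ndim(value) != 0:
                raise ValueError(f"{key!r} must be a scalar")
        elif key in ARRAY_SHAPES:
            if np.shape(value) != ARRAY_SHAPES[key]:
                raise ValueError(
                    f"{key!r} must have shape {ARRAY_SHAPES[key]}, got {np.shape(value)}"
                )
        else:
            raise KeyError(f"unknown modality key: {key!r}")


check_inputs({"redshift": 0.3, "xp_bp": np.zeros(55)})  # passes silently
```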
216
+
217
+ ---
218
+
219
+ ## Return Type
220
+
221
+ `generate()` and `star()` return a `GeneratedSource` object.
222
+
223
+ | Field | Description |
224
+ |-------|-------------|
225
+ | `.outputs` | Dictionary of generated modalities: key → numpy array |
226
+ | `.<key>` | Attribute-style access, e.g. `.spectrum`, `.image`, `.redshift` |
227
+ | `.embedding_spectrum` | AION latent embedding of the generated spectrum `(768,)` — `None` if `compute_embeddings=False` |
228
+ | `.embedding_image` | AION latent embedding of the generated image `(768,)` — `None` if `compute_embeddings=False` |
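Attribute-style access on top of an `outputs` dictionary can be implemented with `__getattr__`. The class below is a simplified stand-in written for illustration, not the package's actual `GeneratedSource` definition.

```python
from dataclasses import dataclass, field


@dataclass
class GeneratedSourceSketch:
    """Hypothetical stand-in for GeneratedSource."""

    outputs: dict = field(default_factory=dict)
    embedding_spectrum: object = None
    embedding_image: object = None

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails, so the dataclass
        # fields above are never shadowed by keys in `outputs`.
        try:
            return self.__dict__["outputs"][name]
        except KeyError:
            raise AttributeError(name) from None


src = GeneratedSourceSketch(outputs={"redshift": 0.3})
print(src.redshift)         # looked up in src.outputs
print(src.embedding_image)  # regular field, None by default
```

Raising `AttributeError` for missing keys keeps `hasattr()` and `getattr(obj, name, default)` working as expected.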
229
+
230
+ ---
231
+
232
+ ## Generation Examples
233
+
234
+ ### Redshift → spectrum + image
235
+
236
+ ```python
237
+ result = gen.generate(
238
+     inputs={"redshift": 0.3},
239
+     outputs=["spectrum", "image"],
240
+ )
241
+ ```
242
+
243
+ Runs a two-step chained pipeline: redshift → DESI spectrum (CFG=1.0), then spectrum → Legacy Survey image (CFG=2.0).
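For readers unfamiliar with the CFG values quoted here, the usual classifier-free guidance rule blends conditional and unconditional predictions. The sketch below shows that standard formulation on plain lists; whether AION or mockcraft applies exactly this rule internally is an assumption.

```python
def apply_cfg(cond_logits, uncond_logits, cfg):
    """Standard classifier-free guidance: uncond + cfg * (cond - uncond)."""
    return [u + cfg * (c - u) for c, u in zip(cond_logits, uncond_logits)]


cond = [2.0, -1.0]    # toy logits given the conditioning input
uncond = [0.5, 0.0]   # toy logits without conditioning

print(apply_cfg(cond, uncond, 1.0))  # [2.0, -1.0]: a scale of 1.0 is plain conditional sampling
print(apply_cfg(cond, uncond, 2.0))  # [3.5, -2.0]: a scale of 2.0 pushes further toward the condition
```

Under this reading, CFG=1.0 for the spectrum step means no extra guidance, while CFG=2.0 for the image step strengthens the dependence on the spectrum.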
244
+
245
+ ### Morphological type conditioning
246
+
247
+ ```python
248
+ result = gen.generate(
249
+     inputs={"redshift": 0.3},
250
+     outputs=["spectrum", "image"],
251
+     type="elliptical",  # or "spiral", "irregular"
252
+ )
253
+ ```
254
+
255
+ Internally injects median `shape_r`, `shape_e1`, `shape_e2` values from DESI DR1 as additional conditioning inputs.
256
+
257
+ ### Real sources from catalogue → spectrum (with embeddings)
258
+
259
+ ```python
260
+ from mockcraft.catalogue import fetch_sources
261
+
262
+ sources = fetch_sources(
263
+     surveys=["desi"],
264
+     region="cosmos",
265
+     columns=["redshift", "flux_g", "flux_r", "flux_z"],
266
+     max_sources=10,
267
+ )
268
+
269
+ for _, row in sources.iterrows():
270
+     result = gen.generate(
271
+         inputs={"redshift": float(row["redshift"]), "flux_g": float(row["flux_g"]),
272
+                 "flux_r": float(row["flux_r"]), "flux_z": float(row["flux_z"])},
273
+         outputs=["spectrum"],
274
+         compute_embeddings=True,
275
+     )
276
+     print(result.spectrum.shape)            # (8704,)
277
+     print(result.embedding_spectrum.shape)  # (768,)
278
+ ```
279
+
280
+ ---
281
+
282
+ ## Catalogue Utility
283
+
284
+ ```python
285
+ from mockcraft.catalogue import fetch_sources
286
+
287
+ sources = fetch_sources(
288
+     surveys=["gaia", "desi"],
289
+     region="cosmos",
290
+     columns=["ra", "dec", "magnitude", "redshift"],
291
+     max_sources=100,
292
+ )
293
+ ```
294
+
295
+ `region` accepts either a named field or an explicit `(RA, Dec, radius_deg)` tuple:
296
+
297
+ ```python
298
+ # Named region
299
+ fetch_sources(surveys=["desi"], region="cosmos", ...)
300
+
301
+ # Explicit coordinates
302
+ fetch_sources(surveys=["gaia"], region=(150.1, 2.18, 0.18), ...)
303
+ ```
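Geometrically, an `(RA, Dec, radius_deg)` tuple describes a cone on the sky. The sketch below shows one way to test whether a position falls inside such a cone, using the haversine formula. Both functions are hypothetical helpers; how `fetch_sources` performs the selection on the server side is not documented here.

```python
import math


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees between two sky positions (haversine)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    a = (
        math.sin((dec2 - dec1) / 2) ** 2
        + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2
    )
    # Clamp to 1.0 to guard against rounding just above the asin domain.
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(a))))


def in_region(ra, dec, region):
    """True if (ra, dec) lies within the cone described by (RA, Dec, radius_deg)."""
    center_ra, center_dec, radius_deg = region
    return angular_separation_deg(ra, dec, center_ra, center_dec) <= radius_deg


region = (150.1, 2.18, 0.18)
print(in_region(150.1, 2.30, region))  # True: 0.12 deg from the centre
print(in_region(151.0, 2.18, region))  # False: about 0.9 deg away
```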
304
+
305
+ ### Supported surveys and columns
306
+
307
+ | Survey | Supported columns |
308
+ |--------|-------------------|
309
+ | `"gaia"` | `ra`, `dec`, `magnitude`, `gaia_flux_g`, `gaia_flux_bp`, `gaia_flux_rp`, `gaia_parallax` |
310
+ | `"desi"` | `ra`, `dec`, `redshift`, `flux_g`, `flux_r`, `flux_z`, `targetid`, `otype` |
311
+ | `"legacy"` | `ra`, `dec`, `redshift`, `type` |
312
+
313
+ When combining multiple surveys, columns not available in a given survey are filled with `NaN`. DESI results are automatically filtered to `ZWARN == 0` (good redshift quality only).
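The NaN-filling and quality-filter behaviour can be reproduced with plain pandas. The frames below use made-up placeholder values, and the lowercase `zwarn` column name is an assumption; the sketch only illustrates the described behaviour, not the package internals.

```python
import pandas as pd

# Placeholder rows standing in for per-survey query results (not real data).
gaia = pd.DataFrame(
    {"ra": [150.10, 150.12], "dec": [2.18, 2.20], "magnitude": [17.2, 18.9]}
)
desi = pd.DataFrame(
    {"ra": [150.11, 150.15], "dec": [2.19, 2.21],
     "redshift": [0.31, 0.87], "zwarn": [0, 4]}
)

# Keep only DESI rows with a clean redshift flag, then drop the flag column.
desi_good = desi[desi["zwarn"] == 0].drop(columns="zwarn")

# Concatenating frames with different columns fills the gaps with NaN.
combined = pd.concat(
    [gaia.assign(survey="gaia"), desi_good.assign(survey="desi")],
    ignore_index=True,
)
print(combined)
```

Before passing `combined` rows to `generate()`, drop or skip NaN entries, for example with `combined.dropna(subset=["redshift"])`.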
314
+
315
+ ---
316
+
317
+ ## Visualization
318
+
319
+ Requires the `viz` extra (`pip install "mockcraft[viz]"`).
320
+
321
+ ```python
322
+ from mockcraft import plot_xp_spectrum, plot_image
323
+
324
+ # Plot a predicted Gaia XP spectrum
325
+ result = gen.star(temperature=5778.0)
326
+ plot_xp_spectrum(result)
327
+
328
+ # Plot a predicted Legacy Survey image
329
+ result = gen.generate(inputs={"redshift": 0.3}, outputs=["image"])
330
+ plot_image(result)
331
+ ```
332
+
333
+ ---
334
+
335
+ ## Model and Hyperparameters
336
+
337
+ The package loads `polymathic-ai/aion-base` automatically on first use.
338
+
339
+ | Parameter | Value | Role |
340
+ |-----------|-------|------|
341
+ | `CFG_SPEC` | `1.0` | Guidance scale for anything → spectrum |
342
+ | `CFG_GALAXY` | `2.0` | Guidance scale for spectrum → image |
343
+ | `MASKGIT_STEPS` | `8` | Number of iterative decoding steps |
344
+ | `TEMPERATURE` | `0.8` | Sampling temperature |
345
+ | `N_ROAR_DRAWS` | `50` | Posterior samples for redshift estimation (higher = more accurate, slower) |
346
+
347
+ ---
348
+
349
+ ## Embeddings and Validation
350
+
351
+ Setting `compute_embeddings=True` re-encodes generated spectra and images through AION's encoder to produce latent vectors of shape `(768,)`. These can be used for:
352
+
353
+ - comparing generated vs real source distributions in embedding space (cosine similarity, MMD)
354
+ - UMAP projection to inspect coverage of the latent space
355
+ - detecting mode collapse or out-of-distribution generation
356
+
357
+ Embeddings are disabled by default because they add a second forward pass per modality.
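The two comparison metrics mentioned above can be sketched with NumPy. The embeddings here are random placeholders of shape `(n, 768)`; the RBF kernel and the default bandwidth `gamma = 1 / dim` are assumptions chosen for illustration, not choices documented by the package.

```python
import numpy as np


def cosine_similarity(a, b):
    """Cosine similarity between two 1-D embedding vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def rbf_mmd2(x, y, gamma=None):
    """Biased estimate of squared MMD between sample sets x and y (RBF kernel)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if gamma is None:
        gamma = 1.0 / x.shape[1]  # assumed default bandwidth

    def kernel(p, q):
        # Pairwise squared Euclidean distances, clipped at 0 against rounding.
        sq = np.sum(p**2, axis=1)[:, None] + np.sum(q**2, axis=1)[None, :] - 2.0 * p @ q.T
        return np.exp(-gamma * np.maximum(sq, 0.0))

    return float(kernel(x, x).mean() + kernel(y, y).mean() - 2.0 * kernel(x, y).mean())


rng = np.random.default_rng(0)
real = rng.normal(size=(20, 768))             # placeholder "real" embeddings
fake = rng.normal(loc=1.0, size=(20, 768))    # placeholder shifted distribution

print(cosine_similarity(real.mean(axis=0), fake.mean(axis=0)))
print(rbf_mmd2(real, real), rbf_mmd2(real, fake))
```

A squared MMD close to zero suggests the two sets are hard to distinguish in embedding space; clearly larger values point to a distribution shift.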
358
+
359
+ ---
360
+
361
+ ## Repository Structure
362
+
363
+ ```
364
+ mockcraft/
365
+ ├── mockcraft/
366
+ │ ├── __init__.py # Public API exports
367
+ │ ├── source_generator.py # SourceGenerator, GeneratedSource, plot helpers
368
+ │ └── catalogue.py # fetch_sources — query Gaia, DESI, Legacy via VizieR
369
+ ├── pyproject.toml
370
+ ├── README.md
371
+ └── LICENSE
372
+ ```
373
+
374
+ ---
375
+
376
+ ## License
377
+
378
+ MIT — see [LICENSE](LICENSE).
379
+
380
+ ---
381
+
382
+ ## Citation
383
+
384
+ If you use MockCraft in your research, please cite the Multimodal Universe paper, which describes the dataset AION is trained on:
385
+
386
+ ```bibtex
387
+ @article{multimodal_universe_2024,
388
+ title = {Multimodal Universe: Enabling Large-Scale Machine Learning with 100TB of Astronomical Scientific Data},
389
+ year = {2024},
390
+ url = {https://arxiv.org/pdf/2412.02527}
391
+ }
392
+ ```