gengeneeval 0.1.0__py3-none-any.whl
- geneval/__init__.py +129 -0
- geneval/cli.py +333 -0
- geneval/config.py +141 -0
- geneval/core.py +41 -0
- geneval/data/__init__.py +23 -0
- geneval/data/gene_expression_datamodule.py +211 -0
- geneval/data/loader.py +437 -0
- geneval/evaluator.py +359 -0
- geneval/evaluators/__init__.py +4 -0
- geneval/evaluators/base_evaluator.py +178 -0
- geneval/evaluators/gene_expression_evaluator.py +218 -0
- geneval/metrics/__init__.py +65 -0
- geneval/metrics/base_metric.py +229 -0
- geneval/metrics/correlation.py +232 -0
- geneval/metrics/distances.py +516 -0
- geneval/metrics/metrics.py +134 -0
- geneval/models/__init__.py +1 -0
- geneval/models/base_model.py +53 -0
- geneval/results.py +334 -0
- geneval/testing.py +393 -0
- geneval/utils/__init__.py +1 -0
- geneval/utils/io.py +27 -0
- geneval/utils/preprocessing.py +82 -0
- geneval/visualization/__init__.py +38 -0
- geneval/visualization/plots.py +499 -0
- geneval/visualization/visualizer.py +1096 -0
- gengeneeval-0.1.0.dist-info/METADATA +172 -0
- gengeneeval-0.1.0.dist-info/RECORD +31 -0
- gengeneeval-0.1.0.dist-info/WHEEL +4 -0
- gengeneeval-0.1.0.dist-info/entry_points.txt +3 -0
- gengeneeval-0.1.0.dist-info/licenses/LICENSE +9 -0
@@ -0,0 +1,172 @@
+Metadata-Version: 2.4
+Name: gengeneeval
+Version: 0.1.0
+Summary: Comprehensive evaluation of generated gene expression data. Computes metrics between real and generated datasets with support for condition matching, train/test splits, and publication-quality visualizations.
+License: MIT
+License-File: LICENSE
+Keywords: gene expression,evaluation,metrics,single-cell,generative models,benchmarking
+Author: GenEval Team
+Author-email: geneval@example.com
+Requires-Python: >=3.8,<4.0
+Classifier: Development Status :: 4 - Beta
+Classifier: Intended Audience :: Science/Research
+Classifier: License :: OSI Approved :: MIT License
+Classifier: Programming Language :: Python :: 3
+Classifier: Programming Language :: Python :: 3.8
+Classifier: Programming Language :: Python :: 3.9
+Classifier: Programming Language :: Python :: 3.10
+Classifier: Programming Language :: Python :: 3.11
+Classifier: Programming Language :: Python :: 3.12
+Classifier: Programming Language :: Python :: 3.13
+Classifier: Programming Language :: Python :: 3.14
+Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
+Provides-Extra: full
+Provides-Extra: gpu
+Requires-Dist: anndata (>=0.8.0)
+Requires-Dist: geomloss (>=0.2.1) ; extra == "full" or extra == "gpu"
+Requires-Dist: matplotlib (>=3.5.0)
+Requires-Dist: numpy (>=1.21.0)
+Requires-Dist: pandas (>=1.3.0)
+Requires-Dist: pykeops (>=1.4.0) ; extra == "full" or extra == "gpu"
+Requires-Dist: scanpy (>=1.9.0)
+Requires-Dist: scipy (>=1.7.0)
+Requires-Dist: seaborn (>=0.11.0)
+Requires-Dist: torch (>=1.9.0)
+Requires-Dist: umap-learn (>=0.5.0) ; extra == "full"
+Project-URL: Homepage, https://github.com/AndreaRubbi/GenGeneEval
+Project-URL: Repository, https://github.com/AndreaRubbi/GenGeneEval
+Description-Content-Type: text/markdown
+
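The `Requires-Dist` fields above attach some dependencies to the `full` and `gpu` extras through `extra == "..."` markers. As a reading aid, here is a minimal sketch that groups those dependencies by extra. It is not part of the package: `parse_requires_dist` is a hypothetical helper, the regular expressions only cover the parenthesized form that appears in this METADATA file, and real tooling would more likely rely on `packaging.requirements.Requirement`.

```python
import re

# Matches "Requires-Dist: name (specifier) ; marker"; specifier and marker are optional.
REQUIRES_DIST = re.compile(
    r"^Requires-Dist:\s*(?P<name>[A-Za-z0-9._-]+)\s*"
    r"(?:\((?P<spec>[^)]*)\))?\s*"
    r"(?:;\s*(?P<marker>.*))?$"
)
# Picks out every extra named in a marker such as: extra == "full" or extra == "gpu"
EXTRA = re.compile(r'extra\s*==\s*"([^"]+)"')


def parse_requires_dist(line):
    """Return (name, version specifier, list of extras) for one Requires-Dist line."""
    match = REQUIRES_DIST.match(line.strip())
    if match is None:
        raise ValueError(f"not a Requires-Dist line: {line!r}")
    extras = EXTRA.findall(match.group("marker") or "")
    return match.group("name"), match.group("spec") or "", extras


# A few lines copied from the METADATA hunk above.
lines = [
    'Requires-Dist: anndata (>=0.8.0)',
    'Requires-Dist: geomloss (>=0.2.1) ; extra == "full" or extra == "gpu"',
    'Requires-Dist: pykeops (>=1.4.0) ; extra == "full" or extra == "gpu"',
    'Requires-Dist: umap-learn (>=0.5.0) ; extra == "full"',
]

# Group optional dependencies by the extra that pulls them in.
by_extra = {}
for line in lines:
    name, _spec, extras = parse_requires_dist(line)
    for extra in extras:
        by_extra.setdefault(extra, []).append(name)

print(by_extra)
```

Read this way, `gpu` pulls in `geomloss` and `pykeops`, while `full` additionally pulls in `umap-learn`; dependencies without a marker are always installed.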
+# GenEval: Gene Expression Evaluation Framework
+
+[](https://badge.fury.io/py/gengeneeval)
+[](https://www.python.org/downloads/)
+[](https://opensource.org/licenses/MIT)
+[](https://github.com/AndreaRubbi/GenGeneEval/actions)
+
+**Comprehensive evaluation of generated gene expression data against real datasets.**
+
+GenEval is a modular, object-oriented Python framework for computing metrics between real and generated gene expression datasets stored in AnnData (h5ad) format. It supports condition-based matching, train/test splits, and generates publication-quality visualizations.
+
+## Features
+
+### Metrics
+All metrics are computed **per-gene** (returning a vector) and **aggregated**:
+
+| Metric | Description | Direction |
+|--------|-------------|-----------|
+| **Pearson Correlation** | Linear correlation between expression profiles | Higher is better |
+| **Spearman Correlation** | Rank correlation (robust to outliers) | Higher is better |
+| **Wasserstein-1** | Earth Mover's Distance (L1) | Lower is better |
+| **Wasserstein-2** | Quadratic optimal transport | Lower is better |
+| **MMD** | Maximum Mean Discrepancy (kernel-based) | Lower is better |
+| **Energy Distance** | Statistical potential energy | Lower is better |
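The "per-gene, then aggregated" scheme described in the metrics table can be illustrated with a small sketch. This is not GenEval's implementation: `wasserstein_1_per_gene` is a hypothetical name, plain Python lists stand in for the samples × genes matrices, aggregation by a simple mean is an assumption, and both inputs are assumed to have the same number of samples. Under that last assumption, the empirical Wasserstein-1 distance for one gene is the mean absolute difference between the sorted real and sorted generated values.

```python
from statistics import fmean


def wasserstein_1_per_gene(real, generated):
    """Per-gene 1D Wasserstein-1 distance between two samples x genes matrices.

    Assumes equal sample counts, so each gene's distance reduces to the mean
    absolute difference between the sorted values of the two samples.
    """
    if len(real) != len(generated):
        raise ValueError("this sketch assumes equal sample counts")
    n_genes = len(real[0])
    distances = []
    for g in range(n_genes):
        real_sorted = sorted(row[g] for row in real)
        gen_sorted = sorted(row[g] for row in generated)
        distances.append(
            fmean(abs(a - b) for a, b in zip(real_sorted, gen_sorted))
        )
    return distances


# Toy data: 3 samples x 2 genes. Gene 0 is shifted by 1, gene 1 is identical.
real = [[0.0, 1.0], [2.0, 1.0], [4.0, 1.0]]
generated = [[1.0, 1.0], [3.0, 1.0], [5.0, 1.0]]

per_gene = wasserstein_1_per_gene(real, generated)  # one value per gene
aggregate = fmean(per_gene)                         # single summary value
print(per_gene, aggregate)
```

The per-gene vector keeps the information about which genes are poorly reproduced, while the aggregate gives one number per condition that is easy to tabulate or plot.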
+
+### Visualizations
+- **Boxplots & Violin plots**: Metric distributions across conditions
+- **Radar plots**: Multi-metric comparison
+- **Scatter plots**: Real vs generated expression
+- **Embedding plots**: PCA/UMAP of real vs generated data
+- **Heatmaps**: Per-gene metric values
+
+### Key Features
+- ✅ Condition-based matching (perturbation, cell type, etc.)
+- ✅ Train/test split support
+- ✅ Per-gene and aggregate metrics
+- ✅ Modular, extensible architecture
+- ✅ Command-line interface
+- ✅ Publication-quality visualizations
+
+## Installation
+
+### Using pip
+```bash
+pip install -e .
+```
+
+### With GPU support (faster distance metrics)
+```bash
+pip install -e ".[gpu]"
+```
+
+## Quick Start
+
+### Python API
+
+```python
+from geneval import evaluate
+
+# Run evaluation
+results = evaluate(
+    real_path="real_data.h5ad",
+    generated_path="generated_data.h5ad",
+    condition_columns=["perturbation", "cell_type"],
+    split_column="split",  # Optional: for train/test
+    output_dir="evaluation_output/"
+)
+
+# Access results
+print(results.summary())
+
+# Get metric for specific split
+test_results = results.get_split("test")
+for condition, cond_result in test_results.conditions.items():
+    print(f"{condition}: Pearson={cond_result.get_metric_value('pearson'):.3f}")
+```
+
+### Command Line
+
+```bash
+# Basic usage
+geneval --real real.h5ad --generated generated.h5ad \
+    --conditions perturbation cell_type \
+    --output results/
+
+# With split column
+geneval --real real.h5ad --generated generated.h5ad \
+    --conditions perturbation \
+    --split-column split \
+    --splits test \
+    --output results/
+
+# Specify metrics
+geneval --real real.h5ad --generated generated.h5ad \
+    --conditions perturbation \
+    --metrics pearson spearman wasserstein_1 mmd \
+    --output results/
+```
+
+## Expected Data Format
+
+GenEval expects AnnData (h5ad) files with:
+
+### Required
+- `adata.X`: Gene expression matrix (samples × genes)
+- `adata.var_names`: Gene identifiers (must overlap between datasets)
+- `adata.obs[condition_columns]`: Columns for matching conditions
+
+### Optional
+- `adata.obs[split_column]`: Train/test split indicator
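The two matching requirements listed above (overlapping gene identifiers and shared condition columns) can be sketched without any AnnData dependency. Everything here is illustrative rather than taken from the package: `shared_genes` and `group_by_condition` are hypothetical helpers, lists stand in for `adata.var_names`, and a list of dicts stands in for the rows of `adata.obs`.

```python
from collections import defaultdict


def shared_genes(real_genes, generated_genes):
    """Return genes present in both datasets, in the real dataset's order."""
    generated_set = set(generated_genes)
    return [g for g in real_genes if g in generated_set]


def group_by_condition(obs_rows, condition_columns):
    """Map each condition tuple to the indices of the samples carrying it."""
    groups = defaultdict(list)
    for index, row in enumerate(obs_rows):
        key = tuple(row[col] for col in condition_columns)
        groups[key].append(index)
    return dict(groups)


# Toy stand-ins for adata.obs in the real and generated datasets.
real_obs = [
    {"perturbation": "ctrl", "cell_type": "T"},
    {"perturbation": "drugA", "cell_type": "T"},
    {"perturbation": "ctrl", "cell_type": "T"},
]
generated_obs = [
    {"perturbation": "drugA", "cell_type": "T"},
    {"perturbation": "drugB", "cell_type": "T"},
]
columns = ["perturbation", "cell_type"]

real_groups = group_by_condition(real_obs, columns)
generated_groups = group_by_condition(generated_obs, columns)

# Only conditions present in both datasets can be compared.
matched = sorted(set(real_groups) & set(generated_groups))
# Only genes present in both datasets can be scored.
genes = shared_genes(["GATA1", "TP53", "MYC"], ["MYC", "GATA1", "ACTB"])
print(matched, genes)
```

In this toy case only the `("drugA", "T")` condition and the genes `GATA1` and `MYC` would be eligible for comparison; how the package itself treats unmatched conditions or genes is not stated in the README.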
+
+## Output Structure
+
+```
+output/
+├── summary.json          # Aggregate metrics and metadata
+├── results.csv           # Per-condition metrics table
+├── per_gene_*.csv        # Per-gene metric values
+└── plots/
+    ├── boxplot_metrics.png
+    ├── violin_metrics.png
+    ├── radar_split.png
+    ├── scatter_grid.png
+    └── embedding_pca.png
+```
+
+## Contributing
+
+Contributions are welcome! Please feel free to submit a pull request or open an issue.
+
+## License
+
+This project is licensed under the MIT License. See the LICENSE file for details.
@@ -0,0 +1,31 @@
+geneval/__init__.py,sha256=6aSk50coEvm8rqwxGvuCyCaN_Dqj1VfuSqvOSTLEgqY,2872
+geneval/cli.py,sha256=0ai0IGyn3SSmEnfLRJhcr0brvUxuNZHE4IXod7jvosU,9977
+geneval/config.py,sha256=gkCjs_gzPWgUZNcmSR3Y70XQCAZ1m9AKLueaM-x8bvw,3729
+geneval/core.py,sha256=No0DP8bNR6LedfCWEedY9C5r_c4M14rvSPaGZqbxc94,1155
+geneval/data/__init__.py,sha256=nD3uWostZbYD3Yj_TOE44LvPDen-Vm3gN8ZH0QptPGw,450
+geneval/data/gene_expression_datamodule.py,sha256=XiBIdf68JZ-3S-FaZsrQlBJA7qL9uUXo2C8y0r4an5M,8009
+geneval/data/loader.py,sha256=zpRmwGZ4PJkB3rpXXRCMFtvMi4qvUrPkKmvIlGjfRpY,14555
+geneval/evaluator.py,sha256=grPudMng-CcnWwkxQGWM6RZ198Q-1THkR4MCXtadCdU,11545
+geneval/evaluators/__init__.py,sha256=i11sHvhsjEAeI3Aw9zFTPmCYuqkGxzTHggAKehe3HQ0,160
+geneval/evaluators/base_evaluator.py,sha256=yJL568HdNofIcHgNOElSQMVlG9oRPTTDIZ7CmKccRqs,5967
+geneval/evaluators/gene_expression_evaluator.py,sha256=v8QL6tzOQ3QVXdPMM8tFHTTviZC3WsPRX4G0ShgeDUw,8743
+geneval/metrics/__init__.py,sha256=wk0CdFXvipfPqXWUMsRRz9CPiSVPG40Id4lyoSaLIkY,1417
+geneval/metrics/base_metric.py,sha256=prbnB-Ap-P64m-2_TUrHxO3NFQaw-obVg1Tw4pjC5EY,6961
+geneval/metrics/correlation.py,sha256=jpYmaihWK89J1E5yQinGUJeB6pTZ21xPNHJi3XYyXJE,6987
+geneval/metrics/distances.py,sha256=9mWzbMbIBY1ckOd2a0l3by3aEFMQZL9bVMSeP44xzUg,16155
+geneval/metrics/metrics.py,sha256=RPRUkgaDeL3cmJDEN7b3sUuPZdvrWXI3YRWwdsTTjL0,4171
+geneval/models/__init__.py,sha256=vJHXIhwzykjoqZ-vHQJnPwwjSUu9nnMyo7jGnWlTd94,42
+geneval/models/base_model.py,sha256=2QDtweYTgiovnksaRPBjNbIDu1l9l_WQMMFfeIX3GB8,1345
+geneval/results.py,sha256=iXSB0o0f1jQrCKjc-lbRfwBFGhspTDDJpQ2K2tM-XR4,11362
+geneval/testing.py,sha256=bD8c966LB6inNHabrFccoCRULPtPc_UYTty-uw7aSGU,11864
+geneval/utils/__init__.py,sha256=wwzI0HWMz0FUp4V66XGRfzeaK3gaQUnIjDstG8ZUpFI,40
+geneval/utils/io.py,sha256=LrRhIRlx_wlCs5Mayaq8hyVIp9uduHHohXuv8zQMwyI,888
+geneval/utils/preprocessing.py,sha256=1Cij1O2dwDR6_zh5IEgLPq3jEmV8VfIRjfQrHiKe3Mw,2612
+geneval/visualization/__init__.py,sha256=LN19jl5xV4WVJTePaOUHWvKZ_pgDFp1chhcklGkNtm8,792
+geneval/visualization/plots.py,sha256=3K94r3x5NjIUZ-hYVQIivO63VkLOvDWl-BLB_qL2pSY,15008
+geneval/visualization/visualizer.py,sha256=dwx1oc0C3dXlJguvIx1pLAOeHcGE5v85OctHOsRE2Yo,36526
+gengeneeval-0.1.0.dist-info/METADATA,sha256=KaYsfE44TMNNBVrfCEFvk4sBczDR5IYy1fzfY8C1nEI,6041
+gengeneeval-0.1.0.dist-info/WHEEL,sha256=3ny-bZhpXrU6vSQ1UPG34FoxZBp3lVcvK0LkgUz6VLk,88
+gengeneeval-0.1.0.dist-info/entry_points.txt,sha256=xTkwnNa2fP0w1uGVsafzRTaCeuBSWLlNO-1CN8uBSK0,43
+gengeneeval-0.1.0.dist-info/licenses/LICENSE,sha256=RDHgHDI4rSDq35R4CAC3npy86YUnmZ81ecO7aHfmmGA,1073
+gengeneeval-0.1.0.dist-info/RECORD,,
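Each RECORD entry above has the form `path,sha256=<digest>,<size>`, where the digest is the URL-safe base64 encoding of the file's SHA-256 hash with the trailing `=` padding removed, and the RECORD file lists itself with both fields empty. The following sketch shows how an entry could be checked against file contents. The helper names are hypothetical, and splitting on the last two commas is a simplification that assumes the hash and size fields never contain commas.

```python
import base64
import hashlib


def record_hash(data: bytes) -> str:
    """Encode a SHA-256 digest the way wheel RECORD files do."""
    digest = hashlib.sha256(data).digest()
    encoded = base64.urlsafe_b64encode(digest).rstrip(b"=")
    return "sha256=" + encoded.decode("ascii")


def parse_record_line(line: str):
    """Split 'path,hash,size' into its parts; size is None when the field is empty."""
    path, hash_spec, size = line.rsplit(",", 2)
    return path, hash_spec, int(size) if size else None


def matches_record(line: str, data: bytes) -> bool:
    """Check that file contents agree with the hash and size of a RECORD entry."""
    _path, hash_spec, size = parse_record_line(line)
    return hash_spec == record_hash(data) and size == len(data)


# Build an entry for an empty file, then verify it against matching and
# non-matching contents. The path is a made-up example.
empty_entry = "pkg/__init__.py," + record_hash(b"") + ",0"
print(empty_entry)
print(matches_record(empty_entry, b""))          # True
print(matches_record(empty_entry, b"changed"))   # False
```

To audit an unpacked wheel with this approach, each listed file would be read in binary mode and passed to `matches_record` together with its RECORD line, skipping the RECORD entry itself.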
@@ -0,0 +1,9 @@
+MIT License
+
+Copyright (c) 2023 [Your Name]
+
+Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
+
+1. The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
+
+2. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.