fastmhn-1.0.0.tar.gz

fastmhn-1.0.0/LICENSE ADDED
@@ -0,0 +1,7 @@
1
+ Copyright (c) 2025 Simon Pfahler
2
+
3
+ Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
4
+
5
+ The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
6
+
7
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
fastmhn-1.0.0/PKG-INFO ADDED
@@ -0,0 +1,218 @@
1
+ Metadata-Version: 2.4
2
+ Name: fastmhn
3
+ Version: 1.0.0
4
+ Summary: Fast inference of MHN models
5
+ Author-email: Simon Pfahler <simon.pfahler@ur.de>
6
+ License-Expression: MIT
7
+ Project-URL: Repository, https://github.com/simon-pfahler/fastmhn
8
+ Project-URL: Homepage, https://github.com/simon-pfahler/fastmhn
9
+ Project-URL: Issues, https://github.com/simon-pfahler/fastmhn/issues
10
+ Keywords: mhn,mutational hierarchical networks,cancer,evolution,probabilistic graphical models
11
+ Classifier: Development Status :: 5 - Production/Stable
12
+ Classifier: Intended Audience :: Science/Research
13
+ Classifier: Intended Audience :: Developers
14
+ Classifier: Programming Language :: Python :: 3
15
+ Classifier: Programming Language :: Python :: 3.11
16
+ Classifier: Programming Language :: Python :: 3.12
17
+ Classifier: Programming Language :: Python :: 3.13
18
+ Classifier: Programming Language :: Python :: 3.14
19
+ Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
20
+ Classifier: Topic :: Scientific/Engineering :: Mathematics
21
+ Classifier: Operating System :: OS Independent
22
+ Requires-Python: >=3.11
23
+ Description-Content-Type: text/markdown
24
+ License-File: LICENSE
25
+ Requires-Dist: numpy
26
+ Requires-Dist: mhn
27
+ Requires-Dist: joblib
28
+ Dynamic: license-file
29
+
30
+ ![Test](https://img.shields.io/github/actions/workflow/status/simon-pfahler/fastmhn/test.yml.svg?branch=main&label=test)
31
+
32
+ # FastMHN - Fast inference of MHNs
33
+
34
+ **FastMHN** is a Python package for approximate learning of Mutual Hazard Networks (MHNs) and observation MHNs (oMHNs). It enables fast inference through suitable rank-1 approximations of the time-marginalized probability distributions, making it practical to work with larger datasets where exact methods would be computationally prohibitive.
35
+
36
+ ## Overview
37
+
38
+ Mutual Hazard Networks (MHNs) are probabilistic graphical models used to model the accumulation of mutations in cancer and other evolutionary processes. They capture dependencies between binary events (e.g., mutations, copy-number alterations) through a graph structure.
39
+
40
+ This package provides:
41
+
42
+ - **Approximate learning** of MHN and oMHN models using clustering-based approximations
43
+ - **Exact learning** methods for smaller datasets (mostly for testing)
44
+ - **Cross-validation** support for hyperparameter tuning (e.g., regularization strength)
45
+
46
+ The approximation methods allow inference on datasets with higher mutational burdens where exact computation of the full state space would be infeasible.
47
+
48
+ ## Installation
49
+
50
+ The package can be installed directly from PyPI:
51
+
52
+ ```bash
53
+ pip install fastmhn
54
+ ```
55
+
56
+ Or clone the repository and install manually:
57
+
58
+ ```bash
59
+ git clone https://phygit.ur.de/physics/mhn/fastmhn.git
60
+ cd fastmhn
61
+ pip install -e .
62
+ ```
63
+
64
+ ### Dependencies
65
+
66
+ - Python >= 3.11
67
+ - NumPy
68
+ - joblib
69
+ - mhn
70
+
71
+ ## Example Usage
72
+
73
+ ### Learning an MHN model
74
+
75
+ ```python
76
+ import numpy as np
77
+ import fastmhn
78
+
79
+ # Generate synthetic data: N samples, d events
80
+ # Each row is a binary vector indicating which events occurred
81
+ d = 5
82
+ N = 100
83
+ data = np.random.randint(2, size=(N, d), dtype=np.int32)
84
+
85
+ # Learn MHN model with approximate gradient computation
86
+ theta = fastmhn.learn.learn_mhn(
87
+ data,
88
+ reg=1e-2, # L1 regularization strength
89
+ gradient_and_score_params={"max_cluster_size": 10},
90
+ adam_params={
91
+ "alpha": 0.1,
92
+ "beta1": 0.7,
93
+ "beta2": 0.9,
94
+ "eps": 1e-8,
95
+ "verbose": True,
96
+ },
97
+ )
98
+
99
+ # theta is a d x d matrix representing the learned MHN
100
+ print(f"Learned theta matrix:\n{theta}")
101
+ ```
102
+
103
+ Replace `data` with your own dataset; the random array in the snippet is only a placeholder.
104
+
105
+ ### Learning an oMHN model
106
+
107
+ The observation MHN (oMHN) extends MHN by modeling observation rates that the active events can influence:
108
+
109
+ ```python
110
+ import numpy as np
111
+ import fastmhn
112
+
113
+ # Generate data
114
+ d = 5
115
+ N = 100
116
+ data = np.random.randint(2, size=(N, d), dtype=np.int32)
117
+
118
+ # Learn oMHN model
119
+ theta = fastmhn.learn.learn_omhn(
120
+ data,
121
+ reg=1e-2,
122
+ gradient_and_score_params={"max_cluster_size": 10},
123
+ adam_params={"alpha": 0.1, "beta1": 0.7, "beta2": 0.9, "eps": 1e-8},
124
+ )
125
+
126
+ # theta is a (d+1) x d matrix
127
+ # First d rows: MHN parameters
128
+ # Last row: observation rates
129
+ print(f"Learned oMHN theta matrix:\n{theta}")
130
+ ```
131
+
132
+ ### Cross-validation for regularization strength
133
+
134
+ ```python
135
+ import numpy as np
136
+ import fastmhn
137
+
138
+ # Generate data
139
+ d = 5
140
+ N = 100
141
+ data = np.random.randint(2, size=(N, d), dtype=np.int32)
142
+
143
+ # Cross-validation parameters
144
+ k = 5 # number of folds
145
+ reg = 1e-2 # regularization strength to evaluate
146
+
147
+ # Shuffle data
148
+ rng = np.random.default_rng(42)
149
+ shuffled_indices = np.arange(N)
150
+ rng.shuffle(shuffled_indices)
151
+ data = data[shuffled_indices, :]
152
+
153
+ # Create folds
154
+ fold_sizes = (N // k) * np.ones(k, dtype=int)
155
+ fold_sizes[: N % k] += 1
156
+
157
+ # Get score offset for comparison
158
+ score_offset = fastmhn.utility.get_score_offset(data)
159
+ average_validation_score = 0
160
+
161
+ for k_index in range(k):
162
+ # Split into training and validation
163
+ val_start = np.sum(fold_sizes[:k_index])
164
+ val_end = np.sum(fold_sizes[: k_index + 1])
165
+ data_val = data[val_start:val_end]
166
+ data_train = np.concatenate((data[:val_start], data[val_end:]))
167
+
168
+ # Learn model on training data
169
+ theta = fastmhn.learn.learn_omhn(
170
+ data_train,
171
+ reg=reg,
172
+ gradient_and_score_params={"max_cluster_size": 10},
173
+ adam_params={"verbose": False},
174
+ )
175
+
176
+ # Evaluate on validation data
177
+ ctheta = fastmhn.utility.cmhn_from_omhn(theta)
178
+ _, val_score = fastmhn.approx.approx_gradient_and_score(
179
+ ctheta, data_val, max_cluster_size=10
180
+ )
181
+ average_validation_score += val_score
182
+
183
+ average_validation_score /= k
184
+ print(f"Average validation score: {average_validation_score} (offset: {score_offset})")
185
+ ```
186
+
187
+ ### Using the command-line scripts
188
+
189
+ The repository includes convenience scripts for common tasks:
190
+
191
+ - `learn_approx_mhn.py` - Learn an MHN model
192
+ - `learn_approx_omhn.py` - Learn an oMHN model
193
+ - `learn_approx_omhn_crossvalidated.py` - Learn oMHN with cross-validation
194
+
195
+ You can use these as templates or run them directly:
196
+
197
+ ```bash
198
+ python learn_approx_omhn.py
199
+ ```
200
+
201
+ ## API Reference
202
+
203
+ The main functions are accessible through the `fastmhn` package:
204
+
205
+ - `fastmhn.learn.learn_mhn()` - Learn an MHN model
206
+ - `fastmhn.learn.learn_omhn()` - Learn an oMHN model
207
+ - `fastmhn.approx.approx_gradient_and_score()` - Approximate gradient and score computation
208
+ - `fastmhn.exact.gradient_and_score()` - Exact gradient and score computation
209
+ - `fastmhn.utility.create_pD()` - Create probability distribution
210
+ - `fastmhn.utility.generate_data()` - Generate synthetic data
211
+
212
+ ## License
213
+
214
+ This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
215
+
216
+ ## Repository
217
+
218
+ - GitHub: https://github.com/simon-pfahler/fastmhn
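The header fields at the top of PKG-INFO follow an RFC 822-style `Key: value` layout, so they can be read with the standard library's email parser. The following is a minimal sketch under that assumption; `read_core_metadata` is a hypothetical helper name, and the embedded text is a shortened copy of the fields shown above.

```python
from email.parser import HeaderParser

# Shortened copy of the PKG-INFO header fields from the diff above.
PKG_INFO = """\
Metadata-Version: 2.4
Name: fastmhn
Version: 1.0.0
Requires-Python: >=3.11
Requires-Dist: numpy
Requires-Dist: mhn
Requires-Dist: joblib
"""


def read_core_metadata(text):
    """Return name, version, and declared dependencies from PKG-INFO text.

    Hypothetical helper for illustration; not part of fastmhn.
    """
    msg = HeaderParser().parsestr(text)
    return {
        "name": msg["Name"],
        "version": msg["Version"],
        "requires_python": msg["Requires-Python"],
        # Requires-Dist may repeat, so collect every occurrence.
        "requires_dist": msg.get_all("Requires-Dist") or [],
    }


metadata = read_core_metadata(PKG_INFO)
print(metadata)
```

For an installed distribution, `importlib.metadata.metadata("fastmhn")` should expose the same fields without parsing the file by hand.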
@@ -0,0 +1,189 @@
1
+ ![Test](https://img.shields.io/github/actions/workflow/status/simon-pfahler/fastmhn/test.yml.svg?branch=main&label=test)
2
+
3
+ # FastMHN - Fast inference of MHNs
4
+
5
+ **FastMHN** is a Python package for approximate learning of Mutual Hazard Networks (MHNs) and observation MHNs (oMHNs). It enables fast inference through suitable rank-1 approximations of the time-marginalized probability distributions, making it practical to work with larger datasets where exact methods would be computationally prohibitive.
6
+
7
+ ## Overview
8
+
9
+ Mutual Hazard Networks (MHNs) are probabilistic graphical models used to model the accumulation of mutations in cancer and other evolutionary processes. They capture dependencies between binary events (e.g., mutations, copy-number alterations) through a graph structure.
10
+
11
+ This package provides:
12
+
13
+ - **Approximate learning** of MHN and oMHN models using clustering-based approximations
14
+ - **Exact learning** methods for smaller datasets (mostly for testing)
15
+ - **Cross-validation** support for hyperparameter tuning (e.g., regularization strength)
16
+
17
+ The approximation methods allow inference on datasets with higher mutational burdens where exact computation of the full state space would be infeasible.
18
+
19
+ ## Installation
20
+
21
+ The package can be installed directly from PyPI:
22
+
23
+ ```bash
24
+ pip install fastmhn
25
+ ```
26
+
27
+ Or clone the repository and install manually:
28
+
29
+ ```bash
30
+ git clone https://phygit.ur.de/physics/mhn/fastmhn.git
31
+ cd fastmhn
32
+ pip install -e .
33
+ ```
34
+
35
+ ### Dependencies
36
+
37
+ - Python >= 3.11
38
+ - NumPy
39
+ - joblib
40
+ - mhn
41
+
42
+ ## Example Usage
43
+
44
+ ### Learning an MHN model
45
+
46
+ ```python
47
+ import numpy as np
48
+ import fastmhn
49
+
50
+ # Generate synthetic data: N samples, d events
51
+ # Each row is a binary vector indicating which events occurred
52
+ d = 5
53
+ N = 100
54
+ data = np.random.randint(2, size=(N, d), dtype=np.int32)
55
+
56
+ # Learn MHN model with approximate gradient computation
57
+ theta = fastmhn.learn.learn_mhn(
58
+ data,
59
+ reg=1e-2, # L1 regularization strength
60
+ gradient_and_score_params={"max_cluster_size": 10},
61
+ adam_params={
62
+ "alpha": 0.1,
63
+ "beta1": 0.7,
64
+ "beta2": 0.9,
65
+ "eps": 1e-8,
66
+ "verbose": True,
67
+ },
68
+ )
69
+
70
+ # theta is a d x d matrix representing the learned MHN
71
+ print(f"Learned theta matrix:\n{theta}")
72
+ ```
73
+
74
+ Replace `data` with your own dataset; the random array in the snippet is only a placeholder.
75
+
76
+ ### Learning an oMHN model
77
+
78
+ The observation MHN (oMHN) extends MHN by modeling observation rates that the active events can influence:
79
+
80
+ ```python
81
+ import numpy as np
82
+ import fastmhn
83
+
84
+ # Generate data
85
+ d = 5
86
+ N = 100
87
+ data = np.random.randint(2, size=(N, d), dtype=np.int32)
88
+
89
+ # Learn oMHN model
90
+ theta = fastmhn.learn.learn_omhn(
91
+ data,
92
+ reg=1e-2,
93
+ gradient_and_score_params={"max_cluster_size": 10},
94
+ adam_params={"alpha": 0.1, "beta1": 0.7, "beta2": 0.9, "eps": 1e-8},
95
+ )
96
+
97
+ # theta is a (d+1) x d matrix
98
+ # First d rows: MHN parameters
99
+ # Last row: observation rates
100
+ print(f"Learned oMHN theta matrix:\n{theta}")
101
+ ```
102
+
103
+ ### Cross-validation for regularization strength
104
+
105
+ ```python
106
+ import numpy as np
107
+ import fastmhn
108
+
109
+ # Generate data
110
+ d = 5
111
+ N = 100
112
+ data = np.random.randint(2, size=(N, d), dtype=np.int32)
113
+
114
+ # Cross-validation parameters
115
+ k = 5 # number of folds
116
+ reg = 1e-2 # regularization strength to evaluate
117
+
118
+ # Shuffle data
119
+ rng = np.random.default_rng(42)
120
+ shuffled_indices = np.arange(N)
121
+ rng.shuffle(shuffled_indices)
122
+ data = data[shuffled_indices, :]
123
+
124
+ # Create folds
125
+ fold_sizes = (N // k) * np.ones(k, dtype=int)
126
+ fold_sizes[: N % k] += 1
127
+
128
+ # Get score offset for comparison
129
+ score_offset = fastmhn.utility.get_score_offset(data)
130
+ average_validation_score = 0
131
+
132
+ for k_index in range(k):
133
+ # Split into training and validation
134
+ val_start = np.sum(fold_sizes[:k_index])
135
+ val_end = np.sum(fold_sizes[: k_index + 1])
136
+ data_val = data[val_start:val_end]
137
+ data_train = np.concatenate((data[:val_start], data[val_end:]))
138
+
139
+ # Learn model on training data
140
+ theta = fastmhn.learn.learn_omhn(
141
+ data_train,
142
+ reg=reg,
143
+ gradient_and_score_params={"max_cluster_size": 10},
144
+ adam_params={"verbose": False},
145
+ )
146
+
147
+ # Evaluate on validation data
148
+ ctheta = fastmhn.utility.cmhn_from_omhn(theta)
149
+ _, val_score = fastmhn.approx.approx_gradient_and_score(
150
+ ctheta, data_val, max_cluster_size=10
151
+ )
152
+ average_validation_score += val_score
153
+
154
+ average_validation_score /= k
155
+ print(f"Average validation score: {average_validation_score} (offset: {score_offset})")
156
+ ```
157
+
158
+ ### Using the command-line scripts
159
+
160
+ The repository includes convenience scripts for common tasks:
161
+
162
+ - `learn_approx_mhn.py` - Learn an MHN model
163
+ - `learn_approx_omhn.py` - Learn an oMHN model
164
+ - `learn_approx_omhn_crossvalidated.py` - Learn oMHN with cross-validation
165
+
166
+ You can use these as templates or run them directly:
167
+
168
+ ```bash
169
+ python learn_approx_omhn.py
170
+ ```
171
+
172
+ ## API Reference
173
+
174
+ The main functions are accessible through the `fastmhn` package:
175
+
176
+ - `fastmhn.learn.learn_mhn()` - Learn an MHN model
177
+ - `fastmhn.learn.learn_omhn()` - Learn an oMHN model
178
+ - `fastmhn.approx.approx_gradient_and_score()` - Approximate gradient and score computation
179
+ - `fastmhn.exact.gradient_and_score()` - Exact gradient and score computation
180
+ - `fastmhn.utility.create_pD()` - Create probability distribution
181
+ - `fastmhn.utility.generate_data()` - Generate synthetic data
182
+
183
+ ## License
184
+
185
+ This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
186
+
187
+ ## Repository
188
+
189
+ - GitHub: https://github.com/simon-pfahler/fastmhn
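The fold-splitting step in the README's cross-validation example gives the first `N % k` folds one extra sample. A minimal pure-Python sketch of the same arithmetic follows; `fold_bounds` is a hypothetical helper, not a fastmhn function, and the sample count of 103 is chosen only so that the folds come out uneven.

```python
def fold_bounds(n_samples, n_folds):
    """Return (start, end) index pairs for contiguous validation folds.

    The first n_samples % n_folds folds receive one extra sample, which
    mirrors the fold_sizes logic in the README example.
    Hypothetical helper for illustration; not part of fastmhn.
    """
    base, extra = divmod(n_samples, n_folds)
    bounds = []
    start = 0
    for fold in range(n_folds):
        size = base + (1 if fold < extra else 0)
        bounds.append((start, start + size))
        start += size
    return bounds


bounds = fold_bounds(103, 5)
print(bounds)
```

Each pair can be used as `data[start:end]` for the validation split, with the remaining rows concatenated into the training split as in the README loop.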
@@ -0,0 +1,44 @@
1
+ [build-system]
2
+ requires = ["setuptools>=61.0", "wheel"]
3
+ build-backend = "setuptools.build_meta"
4
+
5
+ [project]
6
+ name = "fastmhn"
7
+ authors = [
8
+ {name = "Simon Pfahler", email = "simon.pfahler@ur.de"}
9
+ ]
10
+ description = "Fast inference of MHN models"
11
+ requires-python = ">=3.11"
12
+ readme = "README.md"
13
+ license = "MIT"
14
+ classifiers = [
15
+ "Development Status :: 5 - Production/Stable",
16
+ "Intended Audience :: Science/Research",
17
+ "Intended Audience :: Developers",
18
+ "Programming Language :: Python :: 3",
19
+ "Programming Language :: Python :: 3.11",
20
+ "Programming Language :: Python :: 3.12",
21
+ "Programming Language :: Python :: 3.13",
22
+ "Programming Language :: Python :: 3.14",
23
+ "Topic :: Scientific/Engineering :: Bio-Informatics",
24
+ "Topic :: Scientific/Engineering :: Mathematics",
25
+ "Operating System :: OS Independent",
26
+ ]
27
+ keywords = [
28
+ "mhn",
29
+ "mutational hierarchical networks",
30
+ "cancer",
31
+ "evolution",
32
+ "probabilistic graphical models",
33
+ ]
34
+ dependencies = [
35
+ "numpy",
36
+ "mhn",
37
+ "joblib"
38
+ ]
39
+ version = "1.0.0"
40
+
41
+ [project.urls]
42
+ Repository = "https://github.com/simon-pfahler/fastmhn"
43
+ Homepage = "https://github.com/simon-pfahler/fastmhn"
44
+ Issues = "https://github.com/simon-pfahler/fastmhn/issues"
@@ -0,0 +1,4 @@
1
+ [egg_info]
2
+ tag_build =
3
+ tag_date = 0
4
+
@@ -0,0 +1,85 @@
1
+ """
2
+ fastmhn -- Fast inference of MHN (Mutual Hazard Networks) models.
3
+
4
+ Version: 1.0.0
5
+
6
+ Modules
7
+ -------
8
+ utility : Utility functions (data generation, pD creation, Adam/AdamW optimizers, linear solvers)
9
+ explicit : Exact calculations using full state space
10
+ exact : Exact gradient and score computation
11
+ approx : Approximate gradient and score computation using clustering
12
+ clustering : Hierarchical clustering algorithms
13
+ learn : Model fitting for MHN and oMHN models (learn_mhn, learn_omhn)
14
+ """
15
+
16
+ __version__ = "1.0.0"
17
+
18
+ from . import approx, clustering, exact, explicit, learn, utility
19
+ from .approx import approx_gradient_and_score
20
+ from .clustering import hierarchical_clustering
21
+ from .exact import gradient_and_score
22
+ from .explicit import (
23
+ apply_eye_minus_Q,
24
+ apply_eye_minus_Q_diag,
25
+ apply_eye_minus_Q_offdiag,
26
+ apply_Qdiff_ii,
27
+ calculate_pTheta,
28
+ create_full_Q,
29
+ score,
30
+ )
31
+ from .learn import learn_mhn, learn_omhn
32
+ from .utility import (
33
+ adam,
34
+ adamW,
35
+ backward_substitution,
36
+ cmhn_from_omhn,
37
+ create_indep_model,
38
+ create_pD,
39
+ forward_substitution,
40
+ generate_data,
41
+ generate_theta,
42
+ get_score_offset,
43
+ get_subdata,
44
+ jacobi,
45
+ )
46
+
47
+ __all__ = [
48
+ # utility
49
+ "adam",
50
+ "adamW",
51
+ "backward_substitution",
52
+ "cmhn_from_omhn",
53
+ "create_indep_model",
54
+ "create_pD",
55
+ "forward_substitution",
56
+ "generate_data",
57
+ "generate_theta",
58
+ "get_score_offset",
59
+ "get_subdata",
60
+ "jacobi",
61
+ # exact
62
+ "gradient_and_score",
63
+ # explicit
64
+ "apply_Qdiff_ii",
65
+ "apply_eye_minus_Q",
66
+ "apply_eye_minus_Q_diag",
67
+ "apply_eye_minus_Q_offdiag",
68
+ "calculate_pTheta",
69
+ "create_full_Q",
70
+ "score",
71
+ # approx
72
+ "approx_gradient_and_score",
73
+ # clustering
74
+ "hierarchical_clustering",
75
+ # learn
76
+ "learn_mhn",
77
+ "learn_omhn",
78
+ # submodules
79
+ "approx",
80
+ "clustering",
81
+ "exact",
82
+ "explicit",
83
+ "learn",
84
+ "utility",
85
+ ]
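The `__all__` list above is maintained by hand alongside the import statements, so the two can drift apart. One way to check this is sketched below; `missing_exports` is a hypothetical helper, and the `demo` module is a stand-in so the example runs without fastmhn installed.

```python
import types


def missing_exports(module):
    """Return names listed in module.__all__ that the module does not define.

    Hypothetical helper for illustration; not part of fastmhn.
    """
    return [
        name
        for name in getattr(module, "__all__", [])
        if not hasattr(module, name)
    ]


# Stand-in module: it declares two exports but defines only one of them.
demo = types.ModuleType("demo")
demo.learn_mhn = lambda data: None
demo.__all__ = ["learn_mhn", "learn_omhn"]

print(missing_exports(demo))
```

With the package installed, calling `missing_exports(fastmhn)` would be expected to return an empty list if the imports and `__all__` shown above agree.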