vpop-calibration 2.0.0__tar.gz

@@ -0,0 +1,21 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2024 Nova
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.
@@ -0,0 +1,78 @@
1
+ Metadata-Version: 2.4
2
+ Name: vpop-calibration
3
+ Version: 2.0.0
4
+ Summary:
5
+ License-File: LICENSE
6
+ Author: Paul Lemarre
7
+ Author-email: paul.lemarre@novainsilico.ai
8
+ Requires-Python: >=3.12,<4.0
9
+ Classifier: Programming Language :: Python :: 3
10
+ Classifier: Programming Language :: Python :: 3.12
11
+ Classifier: Programming Language :: Python :: 3.13
12
+ Classifier: Programming Language :: Python :: 3.14
13
+ Requires-Dist: gpytorch (>=1.14.2,<2.0.0)
14
+ Requires-Dist: matplotlib (>=3.10.7,<4.0.0)
15
+ Requires-Dist: numpy (>=2.3.4,<3.0.0)
16
+ Requires-Dist: pandas (>=2.3.3,<3.0.0)
17
+ Requires-Dist: plotly (>=6.4.0,<7.0.0)
18
+ Requires-Dist: scipy (>=1.16.3,<2.0.0)
19
+ Requires-Dist: torch
20
+ Requires-Dist: tqdm (>=4.67.1,<5.0.0)
21
+ Requires-Dist: uuid (>=1.30,<2.0)
22
+ Description-Content-Type: text/markdown
23
+
24
+ # Vpop calibration
25
+
26
+ ## Description
27
+
28
+ A set of Python tools to allow for virtual population calibration, using a non-linear mixed effects (NLME) model approach, combined with surrogate models in order to speed up the simulation of QSP models.
29
+
30
+ ### Currently available features
31
+
32
+ - Surrogate modeling using Gaussian processes, implemented with [GPyTorch](https://github.com/cornellius-gp/gpytorch)
33
+ - Synthetic data generation using ODE models. The current implementation uses [scipy.integrate.solve_ivp](https://docs.scipy.org/doc/scipy/reference/generated/scipy.integrate.solve_ivp.html), parallelized with [multiprocessing](https://docs.python.org/3/library/multiprocessing.html)
34
+ - Non-linear mixed effect models:
35
+ - Log-distributed parameters
36
+ - Additive or multiplicative error model
37
+ - Covariates handling
38
+ - Known individual patient descriptors (i.e. covariates with no effect on other descriptors outside of the structural model)
39
+ - SAEM: see the [dedicated doc](./docs/saem_implementation.md) for more details
40
+
41
+ ## Getting started
42
+
43
+ - [Tutorial](./examples/saem_gp_model.ipynb): this notebook demonstrates step-by-step how to create and train a surrogate model, using a reference ODE model and a GP surrogate model. It then showcases how to optimize the surrogate model on synthetic data using SAEM
44
+ - Other available examples:
45
+ - [Data generation using Sobol sequences](./examples/generate_data_ranges.ipynb)
46
+ - [Data generation using a reference NLME model](./examples/generate_data_nlme.ipynb)
47
+ - [Training and exporting a GP using synthetic data](./examples/train_gp.ipynb)
48
+ - [Running SAEM on a reference ODE model](./examples/saem_ode_model.ipynb). Note: the current implementation is notably under-optimized for running SAEM directly on an ODE structural model. It is implemented mostly for testing purposes
49
+ - [Training a GP with a deep kernel](./examples/train_deep_kernel.ipynb)
50
+
51
+ ## Support
52
+
53
+ For any issues or comments, please reach out to paul.lemarre@novainsilico.ai, or feel free to open an issue in the repo directly.
54
+
55
+ ## Authors and acknowledgment
56
+
57
+ - Paul Lemarre
58
+ - Eléonore Dravet
59
+ - Adeline Leclerq-Sampson
60
+
61
+ ## Roadmap
62
+
63
+ - NLME:
64
+ - Support additional error models (additive-multiplicative, power, etc.)
65
+ - Support additional covariate models (categorical covariates)
66
+ - Add residual diagnostic methods (weighted residuals computation and visualization)
67
+ - Structural models:
68
+ - Integrate with SBML models (Roadrunner)
69
+ - Surrogate models:
70
+ - Support additional surrogate models in PyTorch
71
+ - Optimizer:
72
+ - Add SVGP for surrogate model optimization
73
+
74
+ ## References
75
+
76
+ - [Delyon et al. 99](https://doi.org/10.1214/aos/1018031103): Bernard Delyon. Marc Lavielle. Eric Moulines. "Convergence of a stochastic approximation version of the EM algorithm." Ann. Statist. 27 (1) 94 - 128, February 1999. https://doi.org/10.1214/aos/1018031103
77
+ - [Grenier et al. 2018](https://doi.org/10.1007/s40314-016-0337-5): Grenier, E., Helbert, C., Louvet, V. et al. Population parametrization of costly black box models using iterations between SAEM algorithm and kriging. Comp. Appl. Math. 37, 161–173 (2018). https://doi.org/10.1007/s40314-016-0337-5
78
+
@@ -0,0 +1,54 @@
1
+ # Vpop calibration
2
+
3
+ ## Description
4
+
5
+ A set of Python tools to allow for virtual population calibration, using a non-linear mixed effects (NLME) model approach, combined with surrogate models in order to speed up the simulation of QSP models.
6
+
7
+ ### Currently available features
8
+
9
+ - Surrogate modeling using Gaussian processes, implemented with [GPyTorch](https://github.com/cornellius-gp/gpytorch)
10
+ - Synthetic data generation using ODE models. The current implementation uses [scipy.integrate.solve_ivp](https://docs.scipy.org/doc/scipy/reference/generated/scipy.integrate.solve_ivp.html), parallelized with [multiprocessing](https://docs.python.org/3/library/multiprocessing.html)
11
+ - Non-linear mixed effect models:
12
+ - Log-distributed parameters
13
+ - Additive or multiplicative error model
14
+ - Covariates handling
15
+ - Known individual patient descriptors (i.e. covariates with no effect on other descriptors outside of the structural model)
16
+ - SAEM: see the [dedicated doc](./docs/saem_implementation.md) for more details
17
+
18
+ ## Getting started
19
+
20
+ - [Tutorial](./examples/saem_gp_model.ipynb): this notebook demonstrates step-by-step how to create and train a surrogate model, using a reference ODE model and a GP surrogate model. It then showcases how to optimize the surrogate model on synthetic data using SAEM
21
+ - Other available examples:
22
+ - [Data generation using Sobol sequences](./examples/generate_data_ranges.ipynb)
23
+ - [Data generation using a reference NLME model](./examples/generate_data_nlme.ipynb)
24
+ - [Training and exporting a GP using synthetic data](./examples/train_gp.ipynb)
25
+ - [Running SAEM on a reference ODE model](./examples/saem_ode_model.ipynb). Note: the current implementation is notably under-optimized for running SAEM directly on an ODE structural model. It is implemented mostly for testing purposes
26
+ - [Training a GP with a deep kernel](./examples/train_deep_kernel.ipynb)
27
+
28
+ ## Support
29
+
30
+ For any issues or comments, please reach out to paul.lemarre@novainsilico.ai, or feel free to open an issue in the repo directly.
31
+
32
+ ## Authors and acknowledgment
33
+
34
+ - Paul Lemarre
35
+ - Eléonore Dravet
36
+ - Adeline Leclerq-Sampson
37
+
38
+ ## Roadmap
39
+
40
+ - NLME:
41
+ - Support additional error models (additive-multiplicative, power, etc.)
42
+ - Support additional covariate models (categorical covariates)
43
+ - Add residual diagnostic methods (weighted residuals computation and visualization)
44
+ - Structural models:
45
+ - Integrate with SBML models (Roadrunner)
46
+ - Surrogate models:
47
+ - Support additional surrogate models in PyTorch
48
+ - Optimizer:
49
+ - Add SVGP for surrogate model optimization
50
+
51
+ ## References
52
+
53
+ - [Delyon et al. 99](https://doi.org/10.1214/aos/1018031103): Bernard Delyon. Marc Lavielle. Eric Moulines. "Convergence of a stochastic approximation version of the EM algorithm." Ann. Statist. 27 (1) 94 - 128, February 1999. https://doi.org/10.1214/aos/1018031103
54
+ - [Grenier et al. 2018](https://doi.org/10.1007/s40314-016-0337-5): Grenier, E., Helbert, C., Louvet, V. et al. Population parametrization of costly black box models using iterations between SAEM algorithm and kriging. Comp. Appl. Math. 37, 161–173 (2018). https://doi.org/10.1007/s40314-016-0337-5
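The README's "Data generation using Sobol sequences" entry can be illustrated with a minimal sketch. It is not the package's own `generate_vpop_from_ranges`: the helper name `sample_ranges_sobol` is hypothetical, the range format (`low`, `high`, `log`) follows the docstring of `simulate_dataset_from_ranges`, and log-scaled ranges are assumed to be sampled uniformly in log10 space. An unscrambled sequence is used so that the result is deterministic.

```python
import numpy as np
import pandas as pd
from scipy.stats import qmc


def sample_ranges_sobol(log_nb_individuals, param_ranges):
    """Hypothetical helper: draw 2**log_nb_individuals parameter sets with a Sobol sequence."""
    names = list(param_ranges)
    # Unscrambled Sobol points in the unit hypercube, shape (2**m, d)
    sampler = qmc.Sobol(d=len(names), scramble=False)
    unit = sampler.random_base2(m=log_nb_individuals)
    columns = {}
    for j, name in enumerate(names):
        spec = param_ranges[name]
        low, high = float(spec["low"]), float(spec["high"])
        if spec.get("log", False):
            # Assumption: log-scaled ranges are explored uniformly in log10 space
            log_low, log_high = np.log10(low), np.log10(high)
            columns[name] = 10.0 ** (log_low + unit[:, j] * (log_high - log_low))
        else:
            columns[name] = low + unit[:, j] * (high - low)
    df = pd.DataFrame(columns)
    # One string identifier per simulated patient
    df.insert(0, "id", [str(i) for i in range(len(df))])
    return df


# Example: 2**3 = 8 virtual patients over two parameters (illustrative names)
vpop = sample_ranges_sobol(
    3,
    {
        "k_el": {"low": 1e-3, "high": 1.0, "log": True},
        "volume": {"low": 0.0, "high": 10.0, "log": False},
    },
)
```

Using a power of two for the number of points preserves the balance properties of the Sobol sequence, which is presumably why the package exposes `log_nb_individuals` rather than a raw patient count.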
@@ -0,0 +1,64 @@
1
+ [project]
2
+ name = "vpop-calibration"
3
+ version = "2.0.0"
4
+ description = ""
5
+ authors = [{ name = "Paul Lemarre", email = "paul.lemarre@novainsilico.ai" }]
6
+ readme = "README.md"
7
+ requires-python = ">=3.12,<4.0"
8
+ dependencies = [
9
+ "torch",
10
+ "gpytorch (>=1.14.2,<2.0.0)",
11
+ "scipy (>=1.16.3,<2.0.0)",
12
+ "uuid (>=1.30,<2.0)",
13
+ "matplotlib (>=3.10.7,<4.0.0)",
14
+ "plotly (>=6.4.0,<7.0.0)",
15
+ "pandas (>=2.3.3,<3.0.0)",
16
+ "numpy (>=2.3.4,<3.0.0)",
17
+ "tqdm (>=4.67.1,<5.0.0)",
18
+ ]
19
+
20
+ [tool.poetry.group.dev.dependencies]
21
+ torch = { url = "https://download.pytorch.org/whl/cpu/torch-2.9.0%2Bcpu-cp313-cp313-manylinux_2_28_x86_64.whl#sha256=6c9b217584400963d5b4daddb3711ec7a3778eab211e18654fba076cce3b8682" }
22
+ ipykernel = "^7.1.0"
23
+ jupyter = "^1.1.1"
24
+ ipython = "^9.7.0"
25
+ jupyterlab = "^4.4.10"
26
+ jupytext = "^1.18.1"
27
+ plotnine = "^0.15.1"
28
+ pytest = "^9.0.1"
29
+ pytest-cov = "^7.0.0"
30
+
31
+ [build-system]
32
+ requires = ["poetry-core>=2.0.0,<3.0.0"]
33
+ build-backend = "poetry.core.masonry.api"
34
+
35
+ [tool.pytest]
36
+ minversion = "9.0"
37
+ addopts = ["-ra", "-q", "--cov=vpop_calibration", "--cov-report=markdown"]
38
+ testpaths = ["vpop_calibration/test"]
39
+ filterwarnings = ["error", "ignore::UserWarning", "ignore::DeprecationWarning"]
40
+
41
+ [tool.coverage.run]
42
+ branch = true
43
+ omit = ["vpop_calibration/test/*"]
44
+
45
+ [tool.coverage.report]
46
+ # Regexes for lines to exclude from consideration
47
+ exclude_also = [
48
+ # Don't complain about missing debug-only code:
49
+ "def __repr__",
50
+ "if self\\.debug",
51
+
52
+ # Don't complain if tests don't hit defensive assertion code:
53
+ "raise AssertionError",
54
+ "raise NotImplementedError",
55
+
56
+ # Don't complain if non-runnable code isn't run:
57
+ "if 0:",
58
+ "if __name__ == .__main__.:",
59
+
60
+ # Don't complain about abstract methods, they aren't run:
61
+ "@(abc\\.)?abstractmethod",
62
+ ]
63
+ show_missing = true
64
+ ignore_errors = true
@@ -0,0 +1,19 @@
1
+ from .nlme import NlmeModel
2
+ from .saem import PySaem
3
+ from .structural_model import StructuralGp, StructuralOdeModel
4
+ from .model import *
5
+ from .ode import OdeModel
6
+ from .vpop import generate_vpop_from_ranges
7
+ from .data_generation import simulate_dataset_from_omega, simulate_dataset_from_ranges
8
+
9
+ __all__ = [
10
+ "GP",
11
+ "OdeModel",
12
+ "StructuralGp",
13
+ "StructuralOdeModel",
14
+ "NlmeModel",
15
+ "PySaem",
16
+ "simulate_dataset_from_omega",
17
+ "simulate_dataset_from_ranges",
18
+ "generate_vpop_from_ranges",
19
+ ]
@@ -0,0 +1,180 @@
1
+ import numpy as np
2
+ import pandas as pd
3
+ from typing import Optional
4
+
5
+ from .ode import OdeModel
6
+ from .vpop import generate_vpop_from_ranges
7
+ from .structural_model import StructuralOdeModel
8
+ from .nlme import NlmeModel
9
+
10
+
11
+ def simulate_dataset_from_ranges(
12
+ ode_model: OdeModel,
13
+ log_nb_individuals: int,
14
+ param_ranges: dict[str, dict[str, float | bool]],
15
+ initial_conditions: np.ndarray,
16
+ protocol_design: Optional[pd.DataFrame],
17
+ residual_error_variance: Optional[np.ndarray],
18
+ error_model: Optional[str], # "additive" or "proportional"
19
+ time_steps: np.ndarray,
20
+ ) -> pd.DataFrame:
21
+ """Generate a simulated data set with an ODE model
22
+
23
+ Simulates a dataset for training a surrogate model. Timesteps can be different for each output.
24
+ The parameter space is explored with Sobol sequences.
25
+
26
+ Args:
27
+ log_nb_individuals (int): The number of simulated patients will be 2^this parameter
28
+ param_ranges (dict[str, dict]): For each parameter in the model, a dict describing the search space: 'low' (lower bound), 'high' (upper bound), and 'log' (True if the search space is log-scaled)
29
+ initial_conditions (array): set of initial conditions, one for each variable
30
+ protocol_design (optional): a DataFrame with a `protocol_arm` column, and one column per parameter override
31
+ residual_error_variance (np.array): A 1D array of residual error variances for each output.
32
+ error_model (str): the type of error model ("additive" or "proportional").
33
+ time_steps (np.array): an array with the time points
34
+ Returns:
35
+ pd.DataFrame: A DataFrame with columns 'id', parameter names, 'time', 'output_name', and 'value'.
36
+
37
+ Notes:
38
+ If a parameter appears both in the ranges and in the protocol design, the ranges take precedence.
39
+ """
40
+
41
+ # Validate input data
42
+ params_to_explore = list(param_ranges.keys())
43
+
44
+ if protocol_design is None:
45
+ print("No protocol")
46
+ params = params_to_explore
47
+ params_in_protocol = []
48
+ protocol_design_filt = pd.DataFrame({"protocol_arm": ["identity"]})
49
+ else:
50
+ params_in_protocol = protocol_design.drop(
51
+ "protocol_arm", axis=1
52
+ ).columns.tolist()
53
+ # Find the parameters that appear both in the ranges and the protocol
54
+ overlap = set(params_to_explore) & set(params_in_protocol)
55
+ if overlap != set():
56
+ protocol_design_filt = protocol_design.drop(list(overlap), axis=1)
57
+ print(
58
+ f"Warning: ignoring entries {overlap} from the protocol design (already defined in the ranges)."
59
+ )
60
+ else:
61
+ protocol_design_filt = protocol_design
62
+
63
+ params = params_to_explore + params_in_protocol
64
+ if set(params) != set(ode_model.param_names):
65
+ raise ValueError(
66
+ f"Under-defined system: missing {set(ode_model.param_names) - set(params)}"
67
+ )
68
+ # Generate the vpop using sobol sequences
69
+ patients_df = generate_vpop_from_ranges(log_nb_individuals, param_ranges)
70
+
71
+ # Add a choice of protocol arm for each patient
72
+ protocol_arms = pd.DataFrame(protocol_design_filt["protocol_arm"].drop_duplicates())
73
+ patients_df = patients_df.merge(protocol_arms, how="cross")
74
+ # Add the outputs for each patient
75
+ outputs = pd.DataFrame({"output_name": ode_model.variable_names})
76
+ patients_df = patients_df.merge(outputs, how="cross")
77
+ # Simulate the ODE model
78
+ output_df = ode_model.run_trial(
79
+ patients_df, initial_conditions, protocol_design_filt, time_steps
80
+ )
81
+ # Pivot to wide to add noise per model output
82
+ wide_output = output_df.pivot_table(
83
+ index=["id", *ode_model.param_names, "time", "protocol_arm"],
84
+ columns="output_name",
85
+ values="predicted_value",
86
+ ).reset_index()
87
+
88
+ if error_model is None:
89
+ pass
90
+ else:
91
+ if residual_error_variance is None:
92
+ raise ValueError("Undefined residual error variance.")
93
+ else:
94
+ # Add noise to the data
95
+ noise = np.random.normal(
96
+ np.zeros_like(residual_error_variance),
97
+ np.sqrt(residual_error_variance),
98
+ (wide_output.shape[0], ode_model.nb_outputs),
99
+ )
100
+ if error_model == "additive":
101
+ wide_output[ode_model.variable_names] += noise
102
+ elif error_model == "proportional":
103
+ wide_output[ode_model.variable_names] += (
104
+ noise * wide_output[ode_model.variable_names]
105
+ )
106
+ else:
107
+ raise ValueError(f"Incorrect error_model choice: {error_model}")
108
+ # Pivot back to long format
109
+ long_output = wide_output.melt(
110
+ id_vars=[
111
+ "id",
112
+ "protocol_arm",
113
+ "time",
114
+ *ode_model.param_names,
115
+ ],
116
+ value_vars=ode_model.variable_names,
117
+ var_name="output_name",
118
+ value_name="value",
119
+ )
120
+ # Remove the protocol arm overrides from the data set; they are described by the protocol_arm column now
121
+ long_output = long_output.drop(params_in_protocol, axis=1)
122
+ return long_output
123
+
124
+
125
+ def simulate_dataset_from_omega(
126
+ ode_model: OdeModel,
127
+ protocol_design: pd.DataFrame,
128
+ time_steps: np.ndarray,
129
+ init_conditions: np.ndarray,
130
+ log_mi: dict[str, float],
131
+ log_pdu: dict[str, dict[str, float]],
132
+ error_model: str,
133
+ res_var: list[float],
134
+ covariate_map: dict[str, dict[str, dict[str, str | float]]],
135
+ patient_covariates: pd.DataFrame,
136
+ ) -> pd.DataFrame:
137
+ """Generate synthetic data set using an ODE model and population distributions of parameters
138
+
139
+ Args:
140
+ ode_model (OdeModel): The equations to be simulated
141
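+ protocol_design (pd.DataFrame): A DataFrame with a `protocol_arm` column, and one column per parameter override
142
+ time_steps (np.ndarray): An array with the time points
143
+ init_conditions (np.ndarray): Set of initial conditions, one for each variable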
144
+ log_mi (dict[str, float]): _description_
145
+ log_pdu (dict[str, dict[str, float]]): _description_
146
+ error_model (str): The type of residual error model, passed on to `NlmeModel`
147
+ res_var (list[float]): Residual error variance for each output
148
+ covariate_map (dict[str, dict[str, dict[str, str | float]]]): _description_
149
+ patient_covariates (pd.DataFrame): Patient-level table; must contain `id` and `protocol_arm` columns
150
+
151
+ Returns:
152
+ pd.DataFrame: The output of `ode_model.run_trial`, with `predicted_value` renamed to `value`
153
+ """
154
+
155
+ structural_model = StructuralOdeModel(ode_model, protocol_design, init_conditions)
156
+ nlme_model = NlmeModel(
157
+ structural_model,
158
+ patient_covariates,
159
+ log_mi,
160
+ log_pdu,
161
+ res_var,
162
+ covariate_map,
163
+ error_model,
164
+ )
165
+ etas = nlme_model.sample_individual_etas()
166
+ theta = nlme_model.individual_parameters(etas, nlme_model.patients)
167
+ vpop = pd.DataFrame(
168
+ data=theta.numpy(), columns=nlme_model.structural_model.parameter_names
169
+ )
170
+ vpop["id"] = nlme_model.patients
171
+ protocol_arms = patient_covariates[["id", "protocol_arm"]]
172
+ vpop = vpop.merge(protocol_arms, on=["id"], how="left")
173
+ vpop = vpop.merge(
174
+ pd.DataFrame(data=nlme_model.outputs_names, columns=["output_name"]),
175
+ how="cross",
176
+ )
177
+ out = ode_model.run_trial(
178
+ vpop, init_conditions, protocol_design, time_steps
179
+ ).rename({"predicted_value": "value"}, axis=1)
180
+ return out
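The noise step in `simulate_dataset_from_ranges` (pivot to wide, perturb each output column, melt back to long) can be sketched on a toy table. This is an illustration under stated assumptions, not the package code: `add_residual_error` is a hypothetical helper, the toy column values are made up, and a seeded `numpy.random.Generator` is used here in place of the global `np.random.normal` call so that the example is reproducible.

```python
import numpy as np
import pandas as pd


def add_residual_error(wide, output_names, variances, error_model, rng):
    """Hypothetical helper: add Gaussian residual error to one column per output."""
    sd = np.sqrt(np.asarray(variances, dtype=float))
    # One noise column per output; the standard deviation broadcasts across rows
    noise = rng.normal(0.0, sd, size=(len(wide), len(output_names)))
    noisy = wide.copy()
    if error_model == "additive":
        noisy[output_names] = wide[output_names] + noise
    elif error_model == "proportional":
        # Equivalent to value + noise * value
        noisy[output_names] = wide[output_names] * (1.0 + noise)
    else:
        raise ValueError(f"Incorrect error_model choice: {error_model}")
    return noisy


# Toy long-format predictions: one patient, two time points, two outputs
long_df = pd.DataFrame(
    {
        "id": ["a", "a", "a", "a"],
        "time": [0.0, 0.0, 1.0, 1.0],
        "output_name": ["x", "y", "x", "y"],
        "predicted_value": [1.0, 10.0, 2.0, 20.0],
    }
)

# Long -> wide: one column per model output
wide = long_df.pivot_table(
    index=["id", "time"], columns="output_name", values="predicted_value"
).reset_index()

rng = np.random.default_rng(0)
noisy = add_residual_error(wide, ["x", "y"], [0.01, 0.04], "proportional", rng)

# Wide -> long: back to one row per (id, time, output)
long_noisy = noisy.melt(
    id_vars=["id", "time"],
    value_vars=["x", "y"],
    var_name="output_name",
    value_name="value",
)
```

Working in wide format lets each output receive its own variance through broadcasting, instead of looking up the variance row by row in the long table.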
@@ -0,0 +1,3 @@
1
+ from .gp import GP
2
+
3
+ __all__ = ["GP"]