cccpm 0.1.0__tar.gz

cccpm-0.1.0/PKG-INFO ADDED
@@ -0,0 +1,108 @@
1
+ Metadata-Version: 2.4
2
+ Name: cccpm
3
+ Version: 0.1.0
4
+ Summary: Confound-Corrected Connectome-based Predictive Modeling Python Package
5
+ License: MIT
6
+ Author: Nils Winter
7
+ Author-email: nils.r.winter@uni-muenster.de
8
+ Requires-Python: >=3.10,<4.0
9
+ Classifier: License :: OSI Approved :: MIT License
10
+ Classifier: Programming Language :: Python :: 3
11
+ Classifier: Programming Language :: Python :: 3.10
12
+ Classifier: Programming Language :: Python :: 3.11
13
+ Classifier: Programming Language :: Python :: 3.12
14
+ Classifier: Programming Language :: Python :: 3.13
15
+ Classifier: Programming Language :: Python :: 3.14
16
+ Requires-Dist: arakawa
17
+ Requires-Dist: bleach
18
+ Requires-Dist: netplotbrain
19
+ Requires-Dist: networkx
20
+ Requires-Dist: nilearn
21
+ Requires-Dist: numpy
22
+ Requires-Dist: pandas
23
+ Requires-Dist: pingouin
24
+ Requires-Dist: plotly
25
+ Requires-Dist: pytest
26
+ Requires-Dist: pytest-cov
27
+ Requires-Dist: scikit-image
28
+ Requires-Dist: scikit-learn
29
+ Requires-Dist: tinycss2
30
+ Requires-Dist: tqdm
31
+ Requires-Dist: typer
32
+ Description-Content-Type: text/markdown
33
+
34
+ [![GitHub Workflow Status](https://github.com/wwu-mmll/confound_corrected_cpm/actions/workflows/python-test.yml/badge.svg)](https://github.com/wwu-mmll/confound_corrected_cpm/actions/workflows/python-test.yml)
35
+ [![Coverage Status](https://coveralls.io/repos/github/wwu-mmll/confound_corrected_cpm/badge.svg)](https://coveralls.io/github/wwu-mmll/confound_corrected_cpm)
36
+ [![Github Contributors](https://img.shields.io/github/contributors-anon/wwu-mmll/cpm_python?color=blue)](https://github.com/wwu-mmll/cpm_python/graphs/contributors)
37
+ [![Github Commits](https://img.shields.io/github/commit-activity/y/wwu-mmll/cpm_python)](https://github.com/wwu-mmll/cpm_python/commits/main)
38
+
39
+ # Confound-Corrected Connectome-Based Predictive Modelling in Python
40
+ **Confound-Corrected Connectome-Based Predictive Modelling** is a Python package for performing connectome-based predictive modeling (CPM). This toolbox is designed for researchers in neuroscience and psychiatry, providing robust methods for building predictive models based on structural or functional connectome data. It emphasizes replicability, interpretability, and flexibility, making it a valuable tool for analyzing brain connectivity and its relationship to behavior or clinical outcomes.
41
+
42
+ ---
43
+
44
+ ## What is Connectome-Based Predictive Modeling?
45
+
46
+ Connectome-based predictive modeling (CPM) is a machine learning framework that leverages the brain's connectivity patterns to predict individual differences in behavior, cognition, or clinical status. By identifying key edges in the connectome, CPM creates models that link connectivity metrics with target variables (e.g., clinical scores). This approach is particularly suited for studying complex relationships in neuroimaging data and developing interpretable predictive models.
47
+
48
+ ---
49
+
50
+ ## Key Features
51
+
52
+ - **Univariate Edge Selection**: Supports methods like `pearson`, `spearman`, and their partial correlation counterparts, with options for p-threshold optimization and FDR correction.
53
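+ - **Cross-Validation**: Supports nested cross-validation, with an optional inner loop (`inner_cv`) for choosing edge-selection hyperparameters.
54
+ - **Edge Stability**: Optionally keeps only edges that are selected consistently across inner folds (`select_stable_edges`, `stability_threshold`); requires an inner CV.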
55
+ - **Confound Adjustment**: Controls for covariates during edge selection and modeling.
56
+ - **Permutation Testing**: Assesses the statistical significance of models using robust permutation-based methods.
57
+
58
+ ---
59
+
60
+ ## Documentation
61
+
62
+ For detailed instructions on installation, usage, and advanced configurations, visit the [documentation website](https://wwu-mmll.github.io/confound_corrected_cpm/).
63
+
64
+ ---
65
+
66
+ ## Installation
67
+
68
+ Install the package from GitHub:
69
+
70
+ ```bash
71
+ git clone https://github.com/wwu-mmll/confound_corrected_cpm.git
72
+ cd confound_corrected_cpm
73
+ pip install .
74
+ ```
75
+
76
+ ## Quick Example
77
+ Here's a quick overview of how to run a CPM analysis:
78
+
79
+ ```python
80
+ from cccpm.cpm_analysis import CPMRegression
81
+ from cccpm.edge_selection import UnivariateEdgeSelection, PThreshold
82
+ from sklearn.model_selection import KFold
83
+
84
+ # Configure edge selection
85
+ univariate_edge_selection = UnivariateEdgeSelection(
86
+ edge_statistic=["pearson"],
87
+ edge_selection=[PThreshold(threshold=[0.05], correction=["fdr_by"])]
88
+ )
89
+
90
+ # Create the CPMRegression object
91
+ cpm = CPMRegression(
92
+ results_directory="results/",
93
+ cv=KFold(n_splits=10, shuffle=True, random_state=42),
94
+ edge_selection=univariate_edge_selection,
95
+ n_permutations=100
96
+ )
97
+
98
+ # Run the analysis
99
+ X = ... # Connectome data
100
+ y = ... # Target variable
101
+ covariates = ... # Covariates
102
+ cpm.run(X, y, covariates)
103
+ ```
104
+
105
+ ## Contributing
106
+ Contributions are welcome! If you have ideas, feedback, or feature requests, feel free to open an issue or submit a pull request on the GitHub repository.
107
+
108
+
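The Quick Example leaves `X`, `y`, and `covariates` as placeholders. The sketch below builds synthetic inputs with plausible shapes. It assumes `X` is a 2D subjects-by-edges array, which is what `n_features=X.shape[1]` in `CPMRegression._single_run` suggests. Vectorizing the upper triangle of each connectivity matrix is an assumption about preprocessing, not something the package documents here. All variable names and sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n_subjects, n_regions, n_covariates = 50, 10, 2  # illustrative sizes

# Simulate one symmetric connectivity matrix per subject.
raw = rng.normal(size=(n_subjects, n_regions, n_regions))
connectomes = (raw + raw.transpose(0, 2, 1)) / 2

# Keep each unique edge once: the strict upper triangle, flattened per subject.
rows, cols = np.triu_indices(n_regions, k=1)
X = connectomes[:, rows, cols]  # shape (n_subjects, n_edges)

# Synthetic covariates and a target that depends on one edge and one covariate.
covariates = rng.normal(size=(n_subjects, n_covariates))
y = X[:, 0] + 0.5 * covariates[:, 0] + rng.normal(scale=0.1, size=n_subjects)

print(X.shape, y.shape, covariates.shape)  # (50, 45) (50,) (50, 2)
```

With 10 regions there are 10 * 9 / 2 = 45 unique edges, so `X` has 45 columns. Arrays like these could then be passed to `cpm.run(X, y, covariates)` as in the Quick Example.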
cccpm-0.1.0/README.md ADDED
@@ -0,0 +1,74 @@
1
+ [![GitHub Workflow Status](https://github.com/wwu-mmll/confound_corrected_cpm/actions/workflows/python-test.yml/badge.svg)](https://github.com/wwu-mmll/confound_corrected_cpm/actions/workflows/python-test.yml)
2
+ [![Coverage Status](https://coveralls.io/repos/github/wwu-mmll/confound_corrected_cpm/badge.svg)](https://coveralls.io/github/wwu-mmll/confound_corrected_cpm)
3
+ [![Github Contributors](https://img.shields.io/github/contributors-anon/wwu-mmll/cpm_python?color=blue)](https://github.com/wwu-mmll/cpm_python/graphs/contributors)
4
+ [![Github Commits](https://img.shields.io/github/commit-activity/y/wwu-mmll/cpm_python)](https://github.com/wwu-mmll/cpm_python/commits/main)
5
+
6
+ # Confound-Corrected Connectome-Based Predictive Modelling in Python
7
+ **Confound-Corrected Connectome-Based Predictive Modelling** is a Python package for performing connectome-based predictive modeling (CPM). This toolbox is designed for researchers in neuroscience and psychiatry, providing robust methods for building predictive models based on structural or functional connectome data. It emphasizes replicability, interpretability, and flexibility, making it a valuable tool for analyzing brain connectivity and its relationship to behavior or clinical outcomes.
8
+
9
+ ---
10
+
11
+ ## What is Connectome-Based Predictive Modeling?
12
+
13
+ Connectome-based predictive modeling (CPM) is a machine learning framework that leverages the brain's connectivity patterns to predict individual differences in behavior, cognition, or clinical status. By identifying key edges in the connectome, CPM creates models that link connectivity metrics with target variables (e.g., clinical scores). This approach is particularly suited for studying complex relationships in neuroimaging data and developing interpretable predictive models.
14
+
15
+ ---
16
+
17
+ ## Key Features
18
+
19
+ - **Univariate Edge Selection**: Supports methods like `pearson`, `spearman`, and their partial correlation counterparts, with options for p-threshold optimization and FDR correction.
20
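+ - **Cross-Validation**: Supports nested cross-validation, with an optional inner loop (`inner_cv`) for choosing edge-selection hyperparameters.
21
+ - **Edge Stability**: Optionally keeps only edges that are selected consistently across inner folds (`select_stable_edges`, `stability_threshold`); requires an inner CV.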
22
+ - **Confound Adjustment**: Controls for covariates during edge selection and modeling.
23
+ - **Permutation Testing**: Assesses the statistical significance of models using robust permutation-based methods.
24
+
25
+ ---
26
+
27
+ ## Documentation
28
+
29
+ For detailed instructions on installation, usage, and advanced configurations, visit the [documentation website](https://wwu-mmll.github.io/confound_corrected_cpm/).
30
+
31
+ ---
32
+
33
+ ## Installation
34
+
35
+ Install the package from GitHub:
36
+
37
+ ```bash
38
+ git clone https://github.com/wwu-mmll/confound_corrected_cpm.git
39
+ cd confound_corrected_cpm
40
+ pip install .
41
+ ```
42
+
43
+ ## Quick Example
44
+ Here's a quick overview of how to run a CPM analysis:
45
+
46
+ ```python
47
+ from cccpm.cpm_analysis import CPMRegression
48
+ from cccpm.edge_selection import UnivariateEdgeSelection, PThreshold
49
+ from sklearn.model_selection import KFold
50
+
51
+ # Configure edge selection
52
+ univariate_edge_selection = UnivariateEdgeSelection(
53
+ edge_statistic=["pearson"],
54
+ edge_selection=[PThreshold(threshold=[0.05], correction=["fdr_by"])]
55
+ )
56
+
57
+ # Create the CPMRegression object
58
+ cpm = CPMRegression(
59
+ results_directory="results/",
60
+ cv=KFold(n_splits=10, shuffle=True, random_state=42),
61
+ edge_selection=univariate_edge_selection,
62
+ n_permutations=100
63
+ )
64
+
65
+ # Run the analysis
66
+ X = ... # Connectome data
67
+ y = ... # Target variable
68
+ covariates = ... # Covariates
69
+ cpm.run(X, y, covariates)
70
+ ```
71
+
72
+ ## Contributing
73
+ Contributions are welcome! If you have ideas, feedback, or feature requests, feel free to open an issue or submit a pull request on the GitHub repository.
74
+
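The README lists permutation testing as a feature but does not say how a p-value is derived from the permuted runs. The sketch below uses the common add-one estimator, which counts the observed run as one of the permutations. It is an assumption for illustration and not necessarily what `PermutationManager.calculate_permutation_results` computes. The function name and its arguments are hypothetical.

```python
import numpy as np

def permutation_p_value(observed, permuted, higher_is_better=True):
    """Add-one permutation p-value (hypothetical helper, not part of cccpm)."""
    permuted = np.asarray(permuted, dtype=float)
    if higher_is_better:
        # Count permuted scores at least as good as the observed one.
        n_extreme = np.sum(permuted >= observed)
    else:
        # For error metrics, "at least as good" means at most as large.
        n_extreme = np.sum(permuted <= observed)
    # The +1 terms count the observed run itself, so p is never exactly 0.
    return (n_extreme + 1) / (permuted.size + 1)

# One of four permuted scores (0.50) reaches the observed 0.45.
p = permutation_p_value(0.45, [0.10, -0.05, 0.50, 0.20])
print(p)  # 0.4
```

Under this estimator, the `n_permutations=100` setting from the Quick Example could yield a p-value no smaller than 1/101.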
@@ -0,0 +1 @@
1
+ from cccpm.cpm_analysis import CPMRegression
@@ -0,0 +1,272 @@
1
+ import os
2
+ import logging
3
+ import shutil
4
+
5
+ from typing import Union, Type
6
+ from tqdm import tqdm
7
+
8
+ import numpy as np
9
+ import pandas as pd
10
+ from sklearn.model_selection import BaseCrossValidator, BaseShuffleSplit, KFold, RepeatedKFold, StratifiedKFold
11
+ from sklearn.linear_model import LinearRegression
12
+
13
+ from cccpm.fold import run_inner_folds
14
+ from cccpm.logging import setup_logging
15
+ from cccpm.more_models import BaseCPMModel, LinearCPMModel
16
+ from cccpm.edge_selection import UnivariateEdgeSelection, PThreshold
17
+ from cccpm.results_manager import ResultsManager, PermutationManager
18
+ from cccpm.utils import train_test_split, check_data, impute_missing_values, select_stable_edges, generate_data_insights
19
+ from cccpm.scoring import score_regression_models
20
+ from cccpm.reporting import HTMLReporter
21
+
22
+
23
+ class CPMRegression:
24
+ """
25
+ This class handles the process of performing CPM Regression with cross-validation and permutation testing.
26
+ """
27
+ def __init__(self,
28
+ results_directory: str,
29
+ cpm_model: Type[BaseCPMModel] = LinearCPMModel,
30
+ cv: Union[BaseCrossValidator, BaseShuffleSplit, RepeatedKFold, StratifiedKFold] = KFold(n_splits=10, shuffle=True, random_state=42),
31
+ inner_cv: Union[BaseCrossValidator, BaseShuffleSplit, RepeatedKFold, StratifiedKFold] = None,
32
+ edge_selection: UnivariateEdgeSelection = UnivariateEdgeSelection(
33
+ edge_statistic='pearson',
34
+ edge_selection=[PThreshold(threshold=[0.05], correction=[None])]
35
+ ),
36
+ select_stable_edges: bool = False,
37
+ stability_threshold: float = 0.8,
38
+ impute_missing_values: bool = True,
39
+ calculate_residuals: bool = False,
40
+ n_permutations: int = 0,
41
+ atlas_labels: str = None):
42
+ """
43
+ Initialize the CPMRegression object.
44
+
45
+ Parameters
46
+ ----------
47
+ results_directory: str
48
+ Directory to save results.
49
+ cv: Union[BaseCrossValidator, BaseShuffleSplit]
50
+ Outer cross-validation strategy.
51
+ inner_cv: Union[BaseCrossValidator, BaseShuffleSplit]
52
+ Inner cross-validation strategy for edge selection.
53
+ edge_selection: UnivariateEdgeSelection
54
+ Method for edge selection.
55
+ impute_missing_values: bool
56
+ Whether to impute missing values.
57
+ n_permutations: int
58
+ Number of permutations to run for permutation testing.
59
+ atlas_labels: str
60
+ Path to a CSV file with atlas region labels. Must contain the columns 'x', 'y', 'z', and 'region'.
61
+ """
62
+ self.results_directory = results_directory
63
+ self.cpm_model = cpm_model
64
+ self.cv = cv
65
+ self.inner_cv = inner_cv
66
+ self.edge_selection = edge_selection
67
+ self.select_stable_edges = select_stable_edges
68
+ self.stability_threshold = stability_threshold
69
+ self.impute_missing_values = impute_missing_values
70
+ self.calculate_residuals = calculate_residuals
71
+ self.n_permutations = n_permutations
72
+
73
+ np.random.seed(42)
74
+ os.makedirs(self.results_directory, exist_ok=True)
75
+ os.makedirs(os.path.join(self.results_directory, "edges"), exist_ok=True)
76
+ setup_logging(os.path.join(self.results_directory, "cpm_log.txt"))
77
+ self.logger = logging.getLogger(__name__)
78
+
79
+ # Log important configuration details
80
+ self._log_analysis_details()
81
+
82
+ # check inner cv and param grid
83
+ if self.inner_cv is None:
84
+ if len(self.edge_selection.param_grid) > 1:
85
+ raise RuntimeError("Multiple hyperparameter configurations but no inner cv defined. "
86
+ "Please provide only one hyperparameter configuration or an inner cv.")
87
+ if self.select_stable_edges:
88
+ raise RuntimeError("Stable edges can only be selected when using an inner cv.")
89
+
90
+ # check and copy atlas labels file
91
+ self.atlas_labels = self._validate_and_copy_atlas_file(atlas_labels)
92
+
93
+ # results are saved to the results manager instance
94
+ self.results_manager = None
95
+
96
+ def _log_analysis_details(self):
97
+ """
98
+ Log important information about the analysis in a structured format.
99
+ """
100
+ self.logger.info("Starting CPM Regression Analysis")
101
+ self.logger.info("="*50)
102
+ self.logger.info(f"Results Directory: {self.results_directory}")
103
+ self.logger.info(f"CPM Model: {self.cpm_model.name}")
104
+ self.logger.info(f"Outer CV strategy: {self.cv}")
105
+ self.logger.info(f"Inner CV strategy: {self.inner_cv}")
106
+ self.logger.info(f"Edge selection method: {self.edge_selection}")
107
+ self.logger.info(f"Select stable edges: {'Yes' if self.select_stable_edges else 'No'}")
108
+ if self.select_stable_edges:
109
+ self.logger.info(f"Stability threshold: {self.stability_threshold}")
110
+ self.logger.info(f"Impute Missing Values: {'Yes' if self.impute_missing_values else 'No'}")
111
+ self.logger.info(f"Calculate residuals: {'Yes' if self.calculate_residuals else 'No'}")
112
+ self.logger.info(f"Number of Permutations: {self.n_permutations}")
113
+ self.logger.info("="*50)
114
+
115
+ def _validate_and_copy_atlas_file(self, csv_path):
116
+ """
117
+ Validates that a CSV file exists and contains the required columns ('x', 'y', 'z', 'region').
118
+ If valid, copies it to <self.results_directory>/edges.
119
+ """
120
+ if csv_path is None:
121
+ return None
122
+
123
+ required_columns = {"x", "y", "z", "region"}
124
+ csv_path = os.path.abspath(csv_path)
125
+
126
+ # Check if file exists
127
+ if not os.path.isfile(csv_path):
128
+ raise RuntimeError(f"CSV file does not exist: {csv_path}")
129
+
130
+ # Try to read and validate columns
131
+ try:
132
+ df = pd.read_csv(csv_path)
133
+ except Exception as e:
134
+ raise RuntimeError(f"Error reading CSV file {csv_path}: {e}") from e
135
+
136
+ missing = required_columns - set(df.columns)
137
+ if missing:
138
+ raise RuntimeError(f"CSV file is missing required columns: {', '.join(missing)}")
139
+
140
+ # File and columns valid, proceed to copy
141
+ dest_path = os.path.join(self.results_directory, "edges", os.path.basename(csv_path))
142
+
143
+ try:
144
+ shutil.copy(csv_path, dest_path)
145
+ self.logger.info(f"Copied CSV file to {dest_path}")
146
+ return dest_path
147
+ except Exception as e:
148
+ self.logger.error(f"Error copying file to {dest_path}: {e}")
149
+ return None
150
+
151
+ def run(self,
152
+ X: Union[pd.DataFrame, np.ndarray],
153
+ y: Union[pd.Series, pd.DataFrame, np.ndarray],
154
+ covariates: Union[pd.Series, pd.DataFrame, np.ndarray]):
155
+ """
156
+ Estimates a model using the provided data and conducts permutation testing. This method first fits the model to the actual data and subsequently performs estimation on permuted data for a specified number of permutations. Finally, it calculates permutation results.
157
+
158
+ Parameters
159
+ ----------
160
+ X: Feature data used for the model. Can be a pandas DataFrame or a NumPy array.
161
+ y: Target variable used in the estimation process. Can be a pandas Series, DataFrame, or a NumPy array.
162
+ covariates: Additional covariate data to include in the model. Can be a pandas Series, DataFrame, or a NumPy array.
163
+
164
+ """
165
+ self.logger.info(f"Starting estimation with {self.n_permutations} permutations.")
166
+
167
+ # check data and convert to numpy
168
+ generate_data_insights(X=X, y=y, covariates=covariates, results_directory=self.results_directory)
169
+ X, y, covariates = check_data(X, y, covariates, impute_missings=self.impute_missing_values)
170
+
171
+ # Estimate models on actual data
172
+ self._single_run(X=X, y=y, covariates=covariates, perm_run=0)
173
+ self.logger.info("=" * 50)
174
+
175
+ # Estimate models on permuted data
176
+ for perm_id in tqdm(range(1, self.n_permutations + 1), desc="Permutation runs", unit="run",
177
+ total=self.n_permutations):
178
+ y = np.random.permutation(y)
179
+ self._single_run(X=X, y=y, covariates=covariates, perm_run=perm_id)
180
+
181
+ if self.n_permutations > 0:
182
+ PermutationManager.calculate_permutation_results(self.results_directory, self.logger)
183
+ self.logger.info("Estimation completed.")
184
+ self.logger.info("Generating results file.")
185
+ reporter = HTMLReporter(results_directory=self.results_directory, atlas_labels=self.atlas_labels)
186
+ reporter.generate_html_report()
187
+
188
+ def generate_html_report(self):
189
+ self.logger.info("Generating HTML report.")
190
+ reporter = HTMLReporter(results_directory=self.results_directory, atlas_labels=self.atlas_labels)
191
+ reporter.generate_html_report()
192
+
193
+ def _single_run(self,
194
+ X: Union[pd.DataFrame, np.ndarray],
195
+ y: Union[pd.Series, pd.DataFrame, np.ndarray],
196
+ covariates: Union[pd.Series, pd.DataFrame, np.ndarray],
197
+ perm_run: int = 0):
198
+ """
199
+ Perform an estimation run (either real or permuted data). Includes outer cross-validation loop. For permutation
200
+ runs, the same strategy is used, but printing is less verbose and the results folder changes.
201
+
202
+ :param X: Features (predictors).
203
+ :param y: Labels (target variable).
204
+ :param covariates: Covariates to control for.
205
+ :param perm_run: Permutation run identifier.
206
+ """
207
+ results_manager = ResultsManager(output_dir=self.results_directory, perm_run=perm_run,
208
+ n_folds=self.cv.get_n_splits(), n_features=X.shape[1])
209
+
210
+ iterator = (
211
+ tqdm(
212
+ enumerate(self.cv.split(X, y)),
213
+ total=self.cv.get_n_splits(),
214
+ desc="Running outer folds",
215
+ unit="fold"
216
+ )
217
+ if not perm_run else
218
+ enumerate(self.cv.split(X, y))
219
+ )
220
+ for outer_fold, (train, test) in iterator:
221
+ # split according to single outer fold
222
+ X_train, X_test, y_train, y_test, cov_train, cov_test = train_test_split(train, test, X, y, covariates)
223
+
224
+ # impute missing values
225
+ if self.impute_missing_values:
226
+ X_train, X_test, cov_train, cov_test = impute_missing_values(X_train, X_test, cov_train, cov_test)
227
+
228
+ # residualize X to remove effect of covariates
229
+ if self.calculate_residuals:
230
+ residual_model = LinearRegression().fit(cov_train, X_train)
231
+ X_train = X_train - residual_model.predict(cov_train)
232
+ X_test = X_test - residual_model.predict(cov_test)
233
+
234
+ # if the user specified an inner cross-validation, estimate models within inner loop
235
+ if self.inner_cv:
236
+ best_params, stability_edges = run_inner_folds(cpm_model=self.cpm_model,
237
+ X=X_train, y=y_train, covariates=cov_train,
238
+ inner_cv=self.inner_cv,
239
+ edge_selection=self.edge_selection,
240
+ results_directory=os.path.join(results_manager.results_directory, 'folds', str(outer_fold)),
241
+ perm_run=perm_run)
242
+ else:
243
+ best_params = self.edge_selection.param_grid[0]
244
+
245
+ # Use best parameters to estimate performance on outer fold test set
246
+ if self.select_stable_edges:
247
+ edges = select_stable_edges(stability_edges, self.stability_threshold)
248
+ else:
249
+ self.edge_selection.set_params(**best_params)
250
+ edges = self.edge_selection.fit_transform(X=X_train, y=y_train, covariates=cov_train).return_selected_edges()
251
+
252
+ results_manager.store_edges(edges=edges, fold=outer_fold)
253
+
254
+ # Build model and make predictions
255
+ model = self.cpm_model(edges=edges).fit(X_train, y_train, cov_train)
256
+ y_pred = model.predict(X_test, cov_test)
257
+ network_strengths = model.get_network_strengths(X_test, cov_test)
258
+ metrics = score_regression_models(y_true=y_test, y_pred=y_pred)
259
+ results_manager.store_predictions(y_pred=y_pred, y_true=y_test, params=best_params, fold=outer_fold,
260
+ param_id=0, test_indices=test)
261
+ results_manager.store_metrics(metrics=metrics, params=best_params, fold=outer_fold, param_id=0)
262
+ results_manager.store_network_strengths(network_strengths=network_strengths, y_true=y_test, fold=outer_fold)
263
+
264
+ # once all outer folds are done, calculate final results and edge stability
265
+ results_manager.calculate_final_cv_results()
266
+ results_manager.calculate_edge_stability()
267
+
268
+ if not perm_run:
269
+ self.logger.info(results_manager.agg_results.round(4).to_string())
270
+ results_manager.save_predictions()
271
+ results_manager.save_network_strengths()
272
+ self.results_manager = results_manager
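
The `calculate_residuals` branch in `_single_run` fits the confound model on the training fold only and then applies it to both training and test data, so no test information enters the fit. The sketch below reproduces that pattern on synthetic data. To stay dependency-light it swaps scikit-learn's `LinearRegression` for `np.linalg.lstsq` with an explicit intercept column. The helper names `fit_confound_model` and `residualize` are hypothetical and not part of the package.

```python
import numpy as np

def fit_confound_model(cov_train, X_train):
    """Least-squares fit of every edge on the covariates plus an intercept."""
    design = np.column_stack([np.ones(len(cov_train)), cov_train])
    beta, *_ = np.linalg.lstsq(design, X_train, rcond=None)
    return beta

def residualize(cov, X, beta):
    """Subtract the covariate-predicted part of X using fitted coefficients."""
    design = np.column_stack([np.ones(len(cov)), cov])
    return X - design @ beta

rng = np.random.default_rng(0)
cov_train = rng.normal(size=(80, 2))
cov_test = rng.normal(size=(20, 2))
weights = rng.normal(size=(2, 5))  # synthetic confound effect on 5 edges
X_train = cov_train @ weights + rng.normal(scale=0.1, size=(80, 5))
X_test = cov_test @ weights + rng.normal(scale=0.1, size=(20, 5))

# Fit on the training fold only, then apply to both folds.
beta = fit_confound_model(cov_train, X_train)
X_train_res = residualize(cov_train, X_train, beta)
X_test_res = residualize(cov_test, X_test, beta)
```

By construction, the training residuals are orthogonal to the training covariates. The test residuals are only approximately free of confound effects, because the coefficients were estimated on other subjects.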