pi-oplsda 1.0.1__tar.gz

@@ -0,0 +1,21 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2026 KaikunXu
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.
@@ -0,0 +1,154 @@
1
+ Metadata-Version: 2.4
2
+ Name: pi-oplsda
3
+ Version: 1.0.1
4
+ Summary: A high-performance and user-friendly Python package for OPLS-DA, aligned with 'ropls'. Features parallel permutation tests with dynamic progress tracking, and publication-ready visualizations.
5
+ Author-email: KaikunXu <xukaikun.bio@qq.com>
6
+ License: MIT License
7
+
8
+ Copyright (c) 2026 KaikunXu
9
+
10
+ Permission is hereby granted, free of charge, to any person obtaining a copy
11
+ of this software and associated documentation files (the "Software"), to deal
12
+ in the Software without restriction, including without limitation the rights
13
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
14
+ copies of the Software, and to permit persons to whom the Software is
15
+ furnished to do so, subject to the following conditions:
16
+
17
+ The above copyright notice and this permission notice shall be included in all
18
+ copies or substantial portions of the Software.
19
+
20
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
21
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
22
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
23
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
24
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
25
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
26
+ SOFTWARE.
27
+
28
+ Project-URL: Homepage, https://github.com/KaikunXu/pi-oplsda
29
+ Project-URL: Bug Tracker, https://github.com/KaikunXu/pi-oplsda/issues
30
+ Classifier: Programming Language :: Python :: 3
31
+ Classifier: License :: OSI Approved :: MIT License
32
+ Classifier: Operating System :: OS Independent
33
+ Classifier: Intended Audience :: Science/Research
34
+ Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
35
+ Classifier: Topic :: Scientific/Engineering :: Chemistry
36
+ Requires-Python: >=3.9
37
+ Description-Content-Type: text/markdown
38
+ License-File: LICENSE
39
+ Requires-Dist: numpy>=1.21.0
40
+ Requires-Dist: pandas>=1.3.0
41
+ Requires-Dist: scipy>=1.7.0
42
+ Requires-Dist: scikit-learn>=1.0.2
43
+ Requires-Dist: joblib>=1.3.0
44
+ Requires-Dist: matplotlib>=3.3.0
45
+ Requires-Dist: seaborn>=0.11.0
46
+ Requires-Dist: tqdm>=4.65.0
47
+ Requires-Dist: rich>=13.0.0
48
+ Requires-Dist: patchworklib>=0.6.2
49
+ Requires-Dist: plotnine>=0.10.1
50
+ Provides-Extra: test
51
+ Requires-Dist: pytest>=7.0.0; extra == "test"
52
+ Requires-Dist: tabulate>=0.10.0; extra == "test"
53
+ Dynamic: license-file
54
+
55
+ # π-OPLS-DA (`pi-oplsda`)
56
+
57
+ [![GitHub release](https://img.shields.io/github/v/release/KaikunXu/pi-oplsda)](https://github.com/KaikunXu/pi-oplsda/releases)
58
+ [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
59
+
60
+
61
+ > A high-performance, Pythonic implementation of Orthogonal Partial Least Squares Discriminant Analysis (OPLS-DA), tailored for metabolomics and bioinformatics.
62
+
63
+ `pi-oplsda` bridges the gap between the rigorous algorithmic foundation of the gold-standard R package `ropls` and the modern Python data science ecosystem. It delivers blazing-fast parallel computing, native Pandas integration, and publication-ready visualizations—all in one lightweight package.
64
+
65
+ ## ✨ Core Capabilities
66
+
67
+ + **Standardized Rigor:** Perfectly replicates `ropls` step-wise variance increments ($R^2X$, $R^2Y$, $Q^2$) and NIPALS-based orthogonal signal correction (OSC).
68
+ + **Pandas Native:** Seamlessly feed `pandas.DataFrame` into the model. Sample IDs and feature names are automatically tracked, eliminating the need for tedious matrix index management.
69
+ + **Multi-Core Acceleration:** Powered by `joblib`, permutation tests are fully parallelized.
70
+ + **Publication-Ready Graphics:** Built on `matplotlib` and `seaborn` to generate clean, high-resolution diagnostic plots with smart legend placement.
71
+ + **Structured Export:** Extract model parameters, sample scores, and biomarker statistics (VIP, Covariance, Correlation) as instantly usable DataFrames for downstream pipelines.
72
+
73
+ > **Note:** Due to the nature of eigen-decomposition, the signs of scores and loadings may be flipped between platforms. This is mathematically equivalent and does not affect biological interpretation.
74
+
75
+ ## 📦 Installation
76
+ You can install `pi-oplsda` using either of the following methods, depending on whether you simply want to use the package or plan to modify the source code.
77
+
78
+ Option 1: Install directly from GitHub (Recommended for most users)
79
+
80
+ ```bash
81
+ pip install git+https://github.com/KaikunXu/pi-oplsda.git
82
+ ```
83
+
84
+ Option 2: Install from source (For developers)
85
+
86
+ If you want to contribute to the project, modify the algorithm, or explore the source code, you can clone the repository and install it in "editable" mode. This means any changes you make to the local code will immediately take effect without needing to reinstall the package.
87
+
88
+ ```bash
89
+ # 1. Clone the repository
90
+ git clone https://github.com/KaikunXu/pi-oplsda.git
91
+
92
+ # 2. Navigate into the project directory
93
+ cd pi-oplsda
94
+
95
+ # 3. Install in editable mode
96
+ pip install -e .
97
+ ```
98
+
99
+ ## 🚀 Quickstart & Tutorials
100
+
101
+ We provide interactive Jupyter Notebooks that walk you through the entire OPLS-DA workflow and our rigorous validation process:
102
+
103
+ * **[Quickstart Tutorial](examples/quickstart_en.ipynb)**: A comprehensive guide from data loading to visualization and prediction.
104
+ * **[R-ropls Equivalence Benchmark](examples/benchmark.ipynb)**: The complete script used to prove numerical consistency between Python and R implementations.
105
+
106
+ ## 📈 Visualization
107
+
108
+ Running the `OPLSDA_Visualizer` will automatically generate a suite of tightly integrated diagnostic subplots to evaluate your model from multiple dimensions:
109
+
110
+ + **Model Overview:** Displays the step-wise increments of $R^2Y$ and $Q^2$ for both predictive and orthogonal components, illustrating the model's global explanatory and predictive capacity.
111
+ + **X-Score Plot:** Visualizes sample clustering and separation in the predictive latent space, complete with 95% Hotelling's $T^2$ confidence ellipses.
112
+ + **Observation Diagnostics:** Evaluates the relationship between sample influence (Score Distance) and model fit (Orthogonal Distance / DModX) to robustly identify multivariate outliers.
113
+ + **Permutation Test:** Validates model robustness against overfitting by comparing the original $R^2Y$ and $Q^2$ against permuted null distributions, providing empirical p-values.
114
+ + **VIP Bar Plot:** Ranks the top features contributing to group separation. It features **automatic text wrapping** for excessively long metabolite names on the Y-axis to ensure clean, publication-ready layouts.
115
+ + **S-Plot (Optional):** Highlights potential biomarkers based on the interplay between covariance (magnitude) and correlation (reliability). *(Note: This plot is available exclusively for binary classification models, as demonstrated in the Quickstart Tutorial).*
116
+
117
+ !["pi-oplsda_visualizer"](https://github.com/KaikunXu/pi-oplsda/blob/main/assets/pi-oplsda_visualizer.png)
118
+
119
+ ## 🎯 Mathematical Equivalence & Benchmarking
120
+
121
+ `pi-oplsda` is validated against the reference R package `ropls` (Bioconductor). Our cross-platform benchmarking shows that the two implementations agree closely across all key OPLS-DA metrics, up to the sign ambiguity of scores and loadings noted above.
122
+
123
+ Using the **Sacurine** human urine dataset (183 samples, 109 metabolites), we compared the Python and R implementations:
124
+
125
+ | Metric | Description | Comparison |
126
+ | :--- | :--- | :--- |
127
+ | **Global Quality** | Cumulative $R^2X$, $R^2Y$, and $Q^2$ | **Approximately equal** |
128
+ | **Error Assessment** | Root Mean Square Error of Estimation (RMSEE) | **Approximately equal** |
129
+ | **Latent Space** | Predictive and orthogonal scores ($t_1$, $t_{on}$) and predictive loadings ($p_1$) | **Pearson's r > 0.999** |
130
+ | **Variable Importance** | Variable Importance in Projection (VIP) scores | **Pearson's r > 0.999** |
131
+
132
+ To ensure a one-to-one comparison of the underlying computational results, the testing process directly invokes the R engine via the `rpy2` interface. Model parameters were aligned between platforms: one predictive component, three orthogonal components (n=3), and 7-fold cross-validation.
133
+
134
+ In the visualizations below, the solid red scatter points map the values generated by both platforms as coordinate pairs ($x_{\text{ropls}}, y_{\text{pi-oplsda}}$), while the dashed black lines denote the ideal axis of perfect equivalence ($y=x$). The correlation coefficients ($r \approx 1.0000$), together with points lying on the $y=x$ line, indicate very close numerical agreement between the two implementations:
135
+
136
+ * **Global Model Metrics (Top-Left):** Compares the overall cumulative explained variance ($R^2X$, $R^2Y$), cross-validated predictability ($Q^2$), and Root Mean Square Error of Estimation (RMSEE). The negligible absolute differences (Abs Diff $< 10^{-2}$) confirm macroscopic equivalence.
137
+ * **Orthogonal Latent Space (Top-Right):** Displays the correlation of the orthogonal score vectors (e.g., $t_{o1}, t_{o2}$). This confirms that both models possess identical geometric capabilities in extracting and filtering intra-class structured noise.
138
+ * **Feature Importance & Predictive Space (Bottom Row):** Illustrates the three critical vectors driving the discriminatory power of OPLS-DA: **VIP Scores** (determining biomarker ranking), **Predictive Scores** $t_1$ (driving sample clustering), and **Predictive Loadings** $p_1$ (determining feature weights). The diagonal alignment confirms absolute accuracy in microscopic sample profiling and feature extraction.
139
+
140
+ !["pi_oplsda_benchmark.png"](https://github.com/KaikunXu/pi-oplsda/blob/main/assets/pi_oplsda_benchmark.png)
141
+
142
+ ## 🤝 Acknowledgements
143
+
144
+ The algorithmic foundation of `pi-oplsda` is deeply inspired by the excellent R package [`ropls`](https://bioconductor.org/packages/ropls/).
145
+
146
+ > **Note:** Portions of this codebase, including code refactoring and documentation, were refined with the assistance of Gemini 3.1 Pro. All AI-assisted contributions have been strictly reviewed by the human author to ensure scientific accuracy and code quality.
147
+
148
+ ## 🛠 Contributing
149
+
150
+ Contributions, issues, and feature requests are welcome! Feel free to check the [issues page](https://github.com/KaikunXu/pi-oplsda/issues).
151
+
152
+ ## 📄 License
153
+
154
+ This project is licensed under the **MIT License**.
@@ -0,0 +1,100 @@
1
+ # π-OPLS-DA (`pi-oplsda`)
2
+
3
+ [![GitHub release](https://img.shields.io/github/v/release/KaikunXu/pi-oplsda)](https://github.com/KaikunXu/pi-oplsda/releases)
4
+ [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
5
+
6
+
7
+ > A high-performance, Pythonic implementation of Orthogonal Partial Least Squares Discriminant Analysis (OPLS-DA), tailored for metabolomics and bioinformatics.
8
+
9
+ `pi-oplsda` bridges the gap between the rigorous algorithmic foundation of the gold-standard R package `ropls` and the modern Python data science ecosystem. It delivers blazing-fast parallel computing, native Pandas integration, and publication-ready visualizations—all in one lightweight package.
10
+
11
+ ## ✨ Core Capabilities
12
+
13
+ + **Standardized Rigor:** Perfectly replicates `ropls` step-wise variance increments ($R^2X$, $R^2Y$, $Q^2$) and NIPALS-based orthogonal signal correction (OSC).
14
+ + **Pandas Native:** Seamlessly feed `pandas.DataFrame` into the model. Sample IDs and feature names are automatically tracked, eliminating the need for tedious matrix index management.
15
+ + **Multi-Core Acceleration:** Powered by `joblib`, permutation tests are fully parallelized.
16
+ + **Publication-Ready Graphics:** Built on `matplotlib` and `seaborn` to generate clean, high-resolution diagnostic plots with smart legend placement.
17
+ + **Structured Export:** Extract model parameters, sample scores, and biomarker statistics (VIP, Covariance, Correlation) as instantly usable DataFrames for downstream pipelines.
18
+
19
+ > **Note:** Due to the nature of eigen-decomposition, the signs of scores and loadings may be flipped between platforms. This is mathematically equivalent and does not affect biological interpretation.
20
+
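The sign ambiguity in the note above can be resolved before comparing results from two platforms. A minimal sketch using NumPy only; the helper name `align_sign` is hypothetical and is not part of the `piopls` API:

```python
import numpy as np


def align_sign(reference, candidate):
    """Return `candidate`, flipped if it points opposite to `reference`."""
    reference = np.asarray(reference, dtype=float)
    candidate = np.asarray(candidate, dtype=float)
    # A negative dot product means the two vectors are anti-aligned.
    if np.dot(reference, candidate) < 0:
        return -candidate
    return candidate
```

If a score vector is flipped, the matching loading vector should be flipped as well, so that their outer product (the reconstructed part of X) is unchanged.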
21
+ ## 📦 Installation
22
+ You can install `pi-oplsda` using either of the following methods, depending on whether you simply want to use the package or plan to modify the source code.
23
+
24
+ Option 1: Install directly from GitHub (Recommended for most users)
25
+
26
+ ```bash
27
+ pip install git+https://github.com/KaikunXu/pi-oplsda.git
28
+ ```
29
+
30
+ Option 2: Install from source (For developers)
31
+
32
+ If you want to contribute to the project, modify the algorithm, or explore the source code, you can clone the repository and install it in "editable" mode. This means any changes you make to the local code will immediately take effect without needing to reinstall the package.
33
+
34
+ ```bash
35
+ # 1. Clone the repository
36
+ git clone https://github.com/KaikunXu/pi-oplsda.git
37
+
38
+ # 2. Navigate into the project directory
39
+ cd pi-oplsda
40
+
41
+ # 3. Install in editable mode
42
+ pip install -e .
43
+ ```
44
+
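Note that the distribution name (`pi-oplsda`) differs from the import name (`piopls`, as declared in the package's `__init__.py`). A minimal sketch for confirming that an install succeeded, using only the standard library; the helper name `installed_version` is hypothetical:

```python
import importlib.util
from importlib import metadata


def installed_version(dist_name="pi-oplsda", module_name="piopls"):
    """Return the installed distribution version, or None if it is absent."""
    # The module is imported as `piopls`, not `pi_oplsda`.
    if importlib.util.find_spec(module_name) is None:
        return None
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print(installed_version())
```

A return value of `None` means the module could not be found in the active environment.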
45
+ ## 🚀 Quickstart & Tutorials
46
+
47
+ We provide interactive Jupyter Notebooks that walk you through the entire OPLS-DA workflow and our rigorous validation process:
48
+
49
+ * **[Quickstart Tutorial](examples/quickstart_en.ipynb)**: A comprehensive guide from data loading to visualization and prediction.
50
+ * **[R-ropls Equivalence Benchmark](examples/benchmark.ipynb)**: The complete script used to prove numerical consistency between Python and R implementations.
51
+
52
+ ## 📈 Visualization
53
+
54
+ Running the `OPLSDA_Visualizer` will automatically generate a suite of tightly integrated diagnostic subplots to evaluate your model from multiple dimensions:
55
+
56
+ + **Model Overview:** Displays the step-wise increments of $R^2Y$ and $Q^2$ for both predictive and orthogonal components, illustrating the model's global explanatory and predictive capacity.
57
+ + **X-Score Plot:** Visualizes sample clustering and separation in the predictive latent space, complete with 95% Hotelling's $T^2$ confidence ellipses.
58
+ + **Observation Diagnostics:** Evaluates the relationship between sample influence (Score Distance) and model fit (Orthogonal Distance / DModX) to robustly identify multivariate outliers.
59
+ + **Permutation Test:** Validates model robustness against overfitting by comparing the original $R^2Y$ and $Q^2$ against permuted null distributions, providing empirical p-values.
60
+ + **VIP Bar Plot:** Ranks the top features contributing to group separation. It features **automatic text wrapping** for excessively long metabolite names on the Y-axis to ensure clean, publication-ready layouts.
61
+ + **S-Plot (Optional):** Highlights potential biomarkers based on the interplay between covariance (magnitude) and correlation (reliability). *(Note: This plot is available exclusively for binary classification models, as demonstrated in the Quickstart Tutorial).*
62
+
63
+ !["pi-oplsda_visualizer"](https://github.com/KaikunXu/pi-oplsda/blob/main/assets/pi-oplsda_visualizer.png)
64
+
65
+ ## 🎯 Mathematical Equivalence & Benchmarking
66
+
67
+ `pi-oplsda` is validated against the reference R package `ropls` (Bioconductor). Our cross-platform benchmarking shows that the two implementations agree closely across all key OPLS-DA metrics, up to the sign ambiguity of scores and loadings noted above.
68
+
69
+ Using the **Sacurine** human urine dataset (183 samples, 109 metabolites), we compared the Python and R implementations:
70
+
71
+ | Metric | Description | Comparison |
72
+ | :--- | :--- | :--- |
73
+ | **Global Quality** | Cumulative $R^2X$, $R^2Y$, and $Q^2$ | **Approximately equal** |
74
+ | **Error Assessment** | Root Mean Square Error of Estimation (RMSEE) | **Approximately equal** |
75
+ | **Latent Space** | Predictive and orthogonal scores ($t_1$, $t_{on}$) and predictive loadings ($p_1$) | **Pearson's r > 0.999** |
76
+ | **Variable Importance** | Variable Importance in Projection (VIP) scores | **Pearson's r > 0.999** |
77
+
78
+ To ensure a one-to-one comparison of the underlying computational results, the testing process directly invokes the R engine via the `rpy2` interface. Model parameters were aligned between platforms: one predictive component, three orthogonal components (n=3), and 7-fold cross-validation.
79
+
80
+ In the visualizations below, the solid red scatter points map the values generated by both platforms as coordinate pairs ($x_{\text{ropls}}, y_{\text{pi-oplsda}}$), while the dashed black lines denote the ideal axis of perfect equivalence ($y=x$). The correlation coefficients ($r \approx 1.0000$), together with points lying on the $y=x$ line, indicate very close numerical agreement between the two implementations:
81
+
82
+ * **Global Model Metrics (Top-Left):** Compares the overall cumulative explained variance ($R^2X$, $R^2Y$), cross-validated predictability ($Q^2$), and Root Mean Square Error of Estimation (RMSEE). The negligible absolute differences (Abs Diff $< 10^{-2}$) confirm macroscopic equivalence.
83
+ * **Orthogonal Latent Space (Top-Right):** Displays the correlation of the orthogonal score vectors (e.g., $t_{o1}, t_{o2}$). This confirms that both models possess identical geometric capabilities in extracting and filtering intra-class structured noise.
84
+ * **Feature Importance & Predictive Space (Bottom Row):** Illustrates the three critical vectors driving the discriminatory power of OPLS-DA: **VIP Scores** (determining biomarker ranking), **Predictive Scores** $t_1$ (driving sample clustering), and **Predictive Loadings** $p_1$ (determining feature weights). The diagonal alignment confirms absolute accuracy in microscopic sample profiling and feature extraction.
85
+
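The two kinds of agreement reported in the panels above (correlation and absolute difference) can be computed for any pair of vectors. A minimal sketch, assuming both vectors have already been sign-aligned and are in the same sample or feature order; the function name `agreement` is hypothetical and is not taken from the benchmark notebook:

```python
import numpy as np
from scipy.stats import pearsonr


def agreement(reference, candidate):
    """Return (Pearson r, maximum absolute difference) for two vectors."""
    reference = np.asarray(reference, dtype=float)
    candidate = np.asarray(candidate, dtype=float)
    # Pearson r measures linear association only; it ignores scale and offset.
    r = float(pearsonr(reference, candidate)[0])
    # The maximum absolute difference checks that values lie on y = x.
    max_abs_diff = float(np.max(np.abs(reference - candidate)))
    return r, max_abs_diff
```

A candidate that is exactly twice the reference still has r = 1 while its maximum absolute difference is large, which is why both numbers are worth reporting side by side.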
86
+ !["pi_oplsda_benchmark.png"](https://github.com/KaikunXu/pi-oplsda/blob/main/assets/pi_oplsda_benchmark.png)
87
+
88
+ ## 🤝 Acknowledgements
89
+
90
+ The algorithmic foundation of `pi-oplsda` is deeply inspired by the excellent R package [`ropls`](https://bioconductor.org/packages/ropls/).
91
+
92
+ > **Note:** Portions of this codebase, including code refactoring and documentation, were refined with the assistance of Gemini 3.1 Pro. All AI-assisted contributions have been strictly reviewed by the human author to ensure scientific accuracy and code quality.
93
+
94
+ ## 🛠 Contributing
95
+
96
+ Contributions, issues, and feature requests are welcome! Feel free to check the [issues page](https://github.com/KaikunXu/pi-oplsda/issues).
97
+
98
+ ## 📄 License
99
+
100
+ This project is licensed under the **MIT License**.
@@ -0,0 +1,55 @@
+ [build-system]
+ requires = ["setuptools>=61.0"]
+ build-backend = "setuptools.build_meta"
+
+ [project]
+ name = "pi-oplsda"
+ dynamic = ["version", "readme"]
+ authors = [
+ { name="KaikunXu", email="xukaikun.bio@qq.com" },
+ ]
+ description = "A high-performance and user-friendly Python package for OPLS-DA, aligned with 'ropls'. Features parallel permutation tests with dynamic progress tracking, and publication-ready visualizations."
+ requires-python = ">=3.9"
+ license = {file = "LICENSE"}
+ classifiers = [
+ "Programming Language :: Python :: 3",
+ "License :: OSI Approved :: MIT License",
+ "Operating System :: OS Independent",
+ "Intended Audience :: Science/Research",
+ "Topic :: Scientific/Engineering :: Bio-Informatics",
+ "Topic :: Scientific/Engineering :: Chemistry"
+ ]
+ dependencies = [
+ "numpy>=1.21.0",
+ "pandas>=1.3.0",
+ "scipy>=1.7.0",
+ "scikit-learn>=1.0.2",
+ "joblib>=1.3.0",
+ "matplotlib>=3.3.0",
+ "seaborn>=0.11.0",
+ "tqdm>=4.65.0",
+ "rich>=13.0.0",
+ "patchworklib>=0.6.2",
+ "plotnine>=0.10.1"
+ ]
+
+ [project.urls]
+ "Homepage" = "https://github.com/KaikunXu/pi-oplsda"
+ "Bug Tracker" = "https://github.com/KaikunXu/pi-oplsda/issues"
+
+ [tool.setuptools.packages.find]
+ where = ["src"]
+ include = ["piopls*"]
+
+ [tool.setuptools.package-data]
+ "piopls.data" = ["*.csv"]
+
+ [tool.setuptools.dynamic]
+ version = {attr = "piopls.__version__"}
+ readme = {file = ["README.md"], content-type = "text/markdown"}
+
+ [project.optional-dependencies]
+ test = [
+ "pytest>=7.0.0",
+ "tabulate>=0.10.0"
+ ]
@@ -0,0 +1,4 @@
1
+ [egg_info]
2
+ tag_build =
3
+ tag_date = 0
4
+
@@ -0,0 +1,154 @@
1
+ Metadata-Version: 2.4
2
+ Name: pi-oplsda
3
+ Version: 1.0.1
4
+ Summary: A high-performance and user-friendly Python package for OPLS-DA, aligned with 'ropls'. Features parallel permutation tests with dynamic progress tracking, and publication-ready visualizations.
5
+ Author-email: KaikunXu <xukaikun.bio@qq.com>
6
+ License: MIT License
7
+
8
+ Copyright (c) 2026 KaikunXu
9
+
10
+ Permission is hereby granted, free of charge, to any person obtaining a copy
11
+ of this software and associated documentation files (the "Software"), to deal
12
+ in the Software without restriction, including without limitation the rights
13
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
14
+ copies of the Software, and to permit persons to whom the Software is
15
+ furnished to do so, subject to the following conditions:
16
+
17
+ The above copyright notice and this permission notice shall be included in all
18
+ copies or substantial portions of the Software.
19
+
20
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
21
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
22
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
23
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
24
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
25
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
26
+ SOFTWARE.
27
+
28
+ Project-URL: Homepage, https://github.com/KaikunXu/pi-oplsda
29
+ Project-URL: Bug Tracker, https://github.com/KaikunXu/pi-oplsda/issues
30
+ Classifier: Programming Language :: Python :: 3
31
+ Classifier: License :: OSI Approved :: MIT License
32
+ Classifier: Operating System :: OS Independent
33
+ Classifier: Intended Audience :: Science/Research
34
+ Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
35
+ Classifier: Topic :: Scientific/Engineering :: Chemistry
36
+ Requires-Python: >=3.9
37
+ Description-Content-Type: text/markdown
38
+ License-File: LICENSE
39
+ Requires-Dist: numpy>=1.21.0
40
+ Requires-Dist: pandas>=1.3.0
41
+ Requires-Dist: scipy>=1.7.0
42
+ Requires-Dist: scikit-learn>=1.0.2
43
+ Requires-Dist: joblib>=1.3.0
44
+ Requires-Dist: matplotlib>=3.3.0
45
+ Requires-Dist: seaborn>=0.11.0
46
+ Requires-Dist: tqdm>=4.65.0
47
+ Requires-Dist: rich>=13.0.0
48
+ Requires-Dist: patchworklib>=0.6.2
49
+ Requires-Dist: plotnine>=0.10.1
50
+ Provides-Extra: test
51
+ Requires-Dist: pytest>=7.0.0; extra == "test"
52
+ Requires-Dist: tabulate>=0.10.0; extra == "test"
53
+ Dynamic: license-file
54
+
55
+ # π-OPLS-DA (`pi-oplsda`)
56
+
57
+ [![GitHub release](https://img.shields.io/github/v/release/KaikunXu/pi-oplsda)](https://github.com/KaikunXu/pi-oplsda/releases)
58
+ [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
59
+
60
+
61
+ > A high-performance, Pythonic implementation of Orthogonal Partial Least Squares Discriminant Analysis (OPLS-DA), tailored for metabolomics and bioinformatics.
62
+
63
+ `pi-oplsda` bridges the gap between the rigorous algorithmic foundation of the gold-standard R package `ropls` and the modern Python data science ecosystem. It delivers blazing-fast parallel computing, native Pandas integration, and publication-ready visualizations—all in one lightweight package.
64
+
65
+ ## ✨ Core Capabilities
66
+
67
+ + **Standardized Rigor:** Perfectly replicates `ropls` step-wise variance increments ($R^2X$, $R^2Y$, $Q^2$) and NIPALS-based orthogonal signal correction (OSC).
68
+ + **Pandas Native:** Seamlessly feed `pandas.DataFrame` into the model. Sample IDs and feature names are automatically tracked, eliminating the need for tedious matrix index management.
69
+ + **Multi-Core Acceleration:** Powered by `joblib`, permutation tests are fully parallelized.
70
+ + **Publication-Ready Graphics:** Built on `matplotlib` and `seaborn` to generate clean, high-resolution diagnostic plots with smart legend placement.
71
+ + **Structured Export:** Extract model parameters, sample scores, and biomarker statistics (VIP, Covariance, Correlation) as instantly usable DataFrames for downstream pipelines.
72
+
73
+ > **Note:** Due to the nature of eigen-decomposition, the signs of scores and loadings may be flipped between platforms. This is mathematically equivalent and does not affect biological interpretation.
74
+
75
+ ## 📦 Installation
76
+ You can install `pi-oplsda` using either of the following methods, depending on whether you simply want to use the package or plan to modify the source code.
77
+
78
+ Option 1: Install directly from GitHub (Recommended for most users)
79
+
80
+ ```bash
81
+ pip install git+https://github.com/KaikunXu/pi-oplsda.git
82
+ ```
83
+
84
+ Option 2: Install from source (For developers)
85
+
86
+ If you want to contribute to the project, modify the algorithm, or explore the source code, you can clone the repository and install it in "editable" mode. This means any changes you make to the local code will immediately take effect without needing to reinstall the package.
87
+
88
+ ```bash
89
+ # 1. Clone the repository
90
+ git clone https://github.com/KaikunXu/pi-oplsda.git
91
+
92
+ # 2. Navigate into the project directory
93
+ cd pi-oplsda
94
+
95
+ # 3. Install in editable mode
96
+ pip install -e .
97
+ ```
98
+
99
+ ## 🚀 Quickstart & Tutorials
100
+
101
+ We provide interactive Jupyter Notebooks that walk you through the entire OPLS-DA workflow and our rigorous validation process:
102
+
103
+ * **[Quickstart Tutorial](examples/quickstart_en.ipynb)**: A comprehensive guide from data loading to visualization and prediction.
104
+ * **[R-ropls Equivalence Benchmark](examples/benchmark.ipynb)**: The complete script used to prove numerical consistency between Python and R implementations.
105
+
106
+ ## 📈 Visualization
107
+
108
+ Running the `OPLSDA_Visualizer` will automatically generate a suite of tightly integrated diagnostic subplots to evaluate your model from multiple dimensions:
109
+
110
+ + **Model Overview:** Displays the step-wise increments of $R^2Y$ and $Q^2$ for both predictive and orthogonal components, illustrating the model's global explanatory and predictive capacity.
111
+ + **X-Score Plot:** Visualizes sample clustering and separation in the predictive latent space, complete with 95% Hotelling's $T^2$ confidence ellipses.
112
+ + **Observation Diagnostics:** Evaluates the relationship between sample influence (Score Distance) and model fit (Orthogonal Distance / DModX) to robustly identify multivariate outliers.
113
+ + **Permutation Test:** Validates model robustness against overfitting by comparing the original $R^2Y$ and $Q^2$ against permuted null distributions, providing empirical p-values.
114
+ + **VIP Bar Plot:** Ranks the top features contributing to group separation. It features **automatic text wrapping** for excessively long metabolite names on the Y-axis to ensure clean, publication-ready layouts.
115
+ + **S-Plot (Optional):** Highlights potential biomarkers based on the interplay between covariance (magnitude) and correlation (reliability). *(Note: This plot is available exclusively for binary classification models, as demonstrated in the Quickstart Tutorial).*
116
+
117
+ !["pi-oplsda_visualizer"](https://github.com/KaikunXu/pi-oplsda/blob/main/assets/pi-oplsda_visualizer.png)
118
+
119
+ ## 🎯 Mathematical Equivalence & Benchmarking
120
+
121
+ `pi-oplsda` is validated against the reference R package `ropls` (Bioconductor). Our cross-platform benchmarking shows that the two implementations agree closely across all key OPLS-DA metrics, up to the sign ambiguity of scores and loadings noted above.
122
+
123
+ Using the **Sacurine** human urine dataset (183 samples, 109 metabolites), we compared the Python and R implementations:
124
+
125
+ | Metric | Description | Comparison |
126
+ | :--- | :--- | :--- |
127
+ | **Global Quality** | Cumulative $R^2X$, $R^2Y$, and $Q^2$ | **Approximately equal** |
128
+ | **Error Assessment** | Root Mean Square Error of Estimation (RMSEE) | **Approximately equal** |
129
+ | **Latent Space** | Predictive and orthogonal scores ($t_1$, $t_{on}$) and predictive loadings ($p_1$) | **Pearson's r > 0.999** |
130
+ | **Variable Importance** | Variable Importance in Projection (VIP) scores | **Pearson's r > 0.999** |
131
+
132
+ To ensure a one-to-one comparison of the underlying computational results, the testing process directly invokes the R engine via the `rpy2` interface. Model parameters were aligned between platforms: one predictive component, three orthogonal components (n=3), and 7-fold cross-validation.
133
+
134
+ In the visualizations below, the solid red scatter points map the values generated by both platforms as coordinate pairs ($x_{\text{ropls}}, y_{\text{pi-oplsda}}$), while the dashed black lines denote the ideal axis of perfect equivalence ($y=x$). The correlation coefficients ($r \approx 1.0000$), together with points lying on the $y=x$ line, indicate very close numerical agreement between the two implementations:
135
+
136
+ * **Global Model Metrics (Top-Left):** Compares the overall cumulative explained variance ($R^2X$, $R^2Y$), cross-validated predictability ($Q^2$), and Root Mean Square Error of Estimation (RMSEE). The negligible absolute differences (Abs Diff $< 10^{-2}$) confirm macroscopic equivalence.
137
+ * **Orthogonal Latent Space (Top-Right):** Displays the correlation of the orthogonal score vectors (e.g., $t_{o1}, t_{o2}$). This confirms that both models possess identical geometric capabilities in extracting and filtering intra-class structured noise.
138
+ * **Feature Importance & Predictive Space (Bottom Row):** Illustrates the three critical vectors driving the discriminatory power of OPLS-DA: **VIP Scores** (determining biomarker ranking), **Predictive Scores** $t_1$ (driving sample clustering), and **Predictive Loadings** $p_1$ (determining feature weights). The diagonal alignment confirms absolute accuracy in microscopic sample profiling and feature extraction.
139
+
140
+ !["pi_oplsda_benchmark.png"](https://github.com/KaikunXu/pi-oplsda/blob/main/assets/pi_oplsda_benchmark.png)
141
+
142
+ ## 🤝 Acknowledgements
143
+
144
+ The algorithmic foundation of `pi-oplsda` is deeply inspired by the excellent R package [`ropls`](https://bioconductor.org/packages/ropls/).
145
+
146
+ > **Note:** Portions of this codebase, including code refactoring and documentation, were refined with the assistance of Gemini 3.1 Pro. All AI-assisted contributions have been strictly reviewed by the human author to ensure scientific accuracy and code quality.
147
+
148
+ ## 🛠 Contributing
149
+
150
+ Contributions, issues, and feature requests are welcome! Feel free to check the [issues page](https://github.com/KaikunXu/pi-oplsda/issues).
151
+
152
+ ## 📄 License
153
+
154
+ This project is licensed under the **MIT License**.
@@ -0,0 +1,17 @@
1
+ LICENSE
2
+ README.md
3
+ pyproject.toml
4
+ src/pi_oplsda.egg-info/PKG-INFO
5
+ src/pi_oplsda.egg-info/SOURCES.txt
6
+ src/pi_oplsda.egg-info/dependency_links.txt
7
+ src/pi_oplsda.egg-info/requires.txt
8
+ src/pi_oplsda.egg-info/top_level.txt
9
+ src/piopls/__init__.py
10
+ src/piopls/datasets.py
11
+ src/piopls/oplsda_models.py
12
+ src/piopls/oplsda_plotting.py
13
+ src/piopls/utils.py
14
+ src/piopls/data/sacurine_X.csv
15
+ src/piopls/data/sacurine_Y.csv
16
+ tests/test_oplsda.py
17
+ tests/test_ropls_equivalence.py
@@ -0,0 +1,15 @@
1
+ numpy>=1.21.0
2
+ pandas>=1.3.0
3
+ scipy>=1.7.0
4
+ scikit-learn>=1.0.2
5
+ joblib>=1.3.0
6
+ matplotlib>=3.3.0
7
+ seaborn>=0.11.0
8
+ tqdm>=4.65.0
9
+ rich>=13.0.0
10
+ patchworklib>=0.6.2
11
+ plotnine>=0.10.1
12
+
13
+ [test]
14
+ pytest>=7.0.0
15
+ tabulate>=0.10.0
@@ -0,0 +1 @@
1
+ piopls
@@ -0,0 +1,19 @@
+ # src/piopls/__init__.py
+
+ """
+ piopls: A Python package for Orthogonal Partial Least Squares Discriminant Analysis (OPLS-DA).
+ Strictly aligned with ropls algorithm definitions, providing highly efficient parallel
+ permutation tests and publication-ready visualizations.
+ """
+
+ from .oplsda_models import OPLSDA
+ from .oplsda_plotting import OPLSDA_Visualizer
+ from .datasets import load_sacurine
+
+ __all__ = [
+ "OPLSDA",
+ "OPLSDA_Visualizer",
+ "load_sacurine"
+ ]
+
+ __version__ = "1.0.1"