cagqsar 1.0.0__tar.gz

cagqsar-1.0.0/LICENSE ADDED
@@ -0,0 +1,21 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2026 Sathish Kumar M Ponnaiya
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.
cagqsar-1.0.0/PKG-INFO ADDED
@@ -0,0 +1,171 @@
1
+ Metadata-Version: 2.4
2
+ Name: cagqsar
3
+ Version: 1.0.0
4
+ Summary: A complete CLI QSAR pipeline for drug discovery and predictive toxicology
5
+ Author: Sathish Kumar M Ponnaiya
6
+ Classifier: Programming Language :: Python :: 3
7
+ Classifier: Operating System :: POSIX :: Linux
8
+ Classifier: Intended Audience :: Science/Research
9
+ Classifier: Topic :: Scientific/Engineering :: Chemistry
10
+ Requires-Python: >=3.8
11
+ Description-Content-Type: text/markdown
12
+ License-File: LICENSE
13
+ Requires-Dist: numpy
14
+ Requires-Dist: pandas
15
+ Requires-Dist: scipy
16
+ Requires-Dist: scikit-learn
17
+ Requires-Dist: xgboost
18
+ Requires-Dist: matplotlib
19
+ Requires-Dist: rdkit
20
+ Provides-Extra: torch
21
+ Requires-Dist: torch; extra == "torch"
22
+ Dynamic: license-file
23
+
24
+ # CAG-QSAR CLI Tool
25
+
26
+ A complete, production-grade command-line QSAR (Quantitative Structure-Activity Relationship) modeling pipeline. This tool automates the process of data curation, molecular descriptor calculation, feature selection, data splitting, model building, and rigorous validation.
27
+
28
+ ---
29
+
30
+ ## Installation
31
+
32
+ You can install this package on Linux/WSL using one of three methods, depending on your environment.
33
+
34
+ ### Prerequisites
35
+ Make sure you have Python (version >= 3.8) and `pip` installed:
36
+ ```bash
37
+ sudo apt update
38
+ sudo apt install python3 python3-pip python3-venv -y
39
+ ```
40
+
41
+ ### Option A: Install from PyPI (Once Published)
42
+ Once the package is available on PyPI, create a virtual environment and install the tool into it using `pip`:
43
+ ```bash
44
+ # 1. Create a virtual environment
45
+ python3 -m venv qsar_env
46
+ source qsar_env/bin/activate
47
+
48
+ # 2. Install the package from PyPI
49
+ pip install cagqsar
50
+
51
+ # 3. Run the CLI tool
52
+ cagqsar --help
53
+ ```
54
+
55
+ ### Option B: Local Source Install (No root access required)
56
+ You can build and install the package locally from the repository folder:
57
+ ```bash
58
+ # 1. Navigate to the repository directory
59
+ cd git_qsar
60
+
61
+ # 2. Create and activate a virtual environment
62
+ python3 -m venv qsar_env
63
+ source qsar_env/bin/activate
64
+
65
+ # 3. Install the package locally
66
+ pip install .
67
+
68
+ # 4. Run the CLI tool
69
+ cagqsar --help
70
+ ```
71
+
72
+ ### Option C: Editable Development Mode
73
+ If you plan to modify the pipeline's source code and want your changes to take effect without reinstalling, install in editable mode from the repository root:
74
+ ```bash
75
+ pip install -e .
76
+ ```
77
+
78
+ ---
79
+
80
+ ## Command Line Usage
81
+
82
+ After installation, run the application from any directory in your terminal:
83
+
84
+ ```bash
85
+ cagqsar --data <dataset_csv> --smiles <smiles_column> --activity <activity_column> [options]
86
+ ```
87
+
88
+ ### Core CLI Arguments:
89
+ * `--data`: Path to the CSV dataset (Required).
90
+ * `--smiles`: Column name containing SMILES strings (Required).
91
+ * `--activity`: Column name containing activities in nM (Required).
92
+ * `--model`: Regression algorithm to train: `mlr` (multiple linear regression), `pls` (partial least squares), `rf` (Random Forest), `svr` (support vector regression), `xgb` (XGBoost), or `gnn` (Graph Neural Network) (Default: `pls`).
93
+ * `--split`: Splitting method: `random` or `pca` (Kennard-Stone PCA-distance split) (Default: `pca`).
94
+ * `--test_size`: Fraction of data allocated to the test set (Default: `0.2`).
95
+ * `--var_thresh`: Variance threshold below which constant or near-constant descriptors are dropped (Default: `0.01`).
96
+ * `--corr_thresh`: Correlation threshold for collinearity filter (Default: `0.85`).
97
+ * `--y_rand_runs`: Number of Y-randomization validation loops (Default: `50`).
98
+ * `--fingerprints`: Flag to compute 2D fingerprints (Morgan/ECFP + MACCS keys) in addition to physical descriptors.
99
+ * `--out_dir`: Directory to export curated data, model reports, trained model binaries, and evaluation plots (Default: `qsar_output`).
100
+
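For readers scripting around the tool, the flags and defaults listed above can be mirrored in an `argparse` parser. This is a minimal sketch reconstructed only from the argument list; it is not the package's actual parser. The name `build_parser`, the `float`/`int` types, and the example file and column names are assumptions made for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags documented in the README."""
    p = argparse.ArgumentParser(prog="cagqsar")
    p.add_argument("--data", required=True, help="Path to the CSV dataset")
    p.add_argument("--smiles", required=True, help="SMILES column name")
    p.add_argument("--activity", required=True, help="Activity column name (nM)")
    p.add_argument("--model", default="pls",
                   choices=["mlr", "pls", "rf", "svr", "xgb", "gnn"])
    p.add_argument("--split", default="pca", choices=["random", "pca"])
    # Types below are assumed from the documented default values.
    p.add_argument("--test_size", type=float, default=0.2)
    p.add_argument("--var_thresh", type=float, default=0.01)
    p.add_argument("--corr_thresh", type=float, default=0.85)
    p.add_argument("--y_rand_runs", type=int, default=50)
    p.add_argument("--fingerprints", action="store_true")
    p.add_argument("--out_dir", default="qsar_output")
    return p


# Placeholder file and column names; only the three required flags are given.
args = build_parser().parse_args(
    ["--data", "ligands.csv", "--smiles", "SMILES", "--activity", "IC50_nM"]
)
print(args.model, args.split, args.test_size)
```

With only the required flags supplied, every optional setting falls back to the default documented in the list.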
101
+ ---
102
+
103
+ ## Programmatic Import in Python
104
+
105
+ Once the package is installed, you can import its pipeline functions (such as the structure curator or the descriptor calculator) in your own scripts:
106
+
107
+ ```python
108
+ from cagqsar import curate_molecule, get_rdkit_descriptors
109
+ from rdkit.Chem.SaltRemover import SaltRemover
110
+ # 1. Clean a SMILES structure and remove salt fragments
111
+ clean_smiles, mol = curate_molecule("CN(C)C(=O)c1ccccc1.Cl", SaltRemover())
112
+
113
+ # 2. Extract standard RDKit descriptors
114
+ descriptors = get_rdkit_descriptors(mol)
115
+ ```
116
+
117
+ ---
118
+
119
+ ## Publishing to GitHub & PyPI
120
+
121
+ Follow these instructions to publish your code for public access.
122
+
123
+ ### 1. Publishing to GitHub
124
+ Initialize the local git repository, commit the files, and push to GitHub:
125
+ ```bash
126
+ # 1. Initialize repository
127
+ git init
128
+
129
+ # 2. Add files (automatically respects .gitignore)
130
+ git add .
131
+
132
+ # 3. Create initial commit
133
+ git commit -m "feat: initial release of cagqsar v1.0.0"
134
+
135
+ # 4. Set main branch name
136
+ git branch -M main
137
+
138
+ # 5. Add remote GitHub link and push
139
+ git remote add origin https://github.com/YOUR_USERNAME/cagqsar.git
140
+ git push -u origin main
141
+ ```
142
+
143
+ ### 2. Publishing to PyPI
144
+ To make the tool installable globally via `pip install cagqsar`, build and upload the package distributions to the Python Package Index (PyPI):
145
+
146
+ ```bash
147
+ # 1. Install packaging build tools
148
+ pip install --upgrade build twine
149
+
150
+ # 2. Compile source distribution (sdist) and binary wheel (bdist_wheel)
151
+ python3 -m build
152
+
153
+ # 3. Verify build files
154
+ twine check dist/*
155
+
156
+ # 4. Upload to PyPI (requires PyPI API Token)
157
+ python3 -m twine upload dist/*
158
+ ```
159
+
160
+ ---
161
+
162
+ ## Acknowledgments & Credits
163
+
164
+ * **Concept, Idea & Planning**: Sathish Kumar M Ponnaiya (SKM Ponnaiya).
165
+ * **Infrastructure & Support**: **Ponnaiya's Code And Genome Pvt Ltd, Madurai** (System support, server resources, internet facilities, and infrastructure).
166
+ * **AI Coding Partner**: Pair-programmed and optimized using **Antigravity**, a Google DeepMind agentic coding system.
167
+ * **Large Language Model (LLM)**: Driven by Google's **Gemini 3.5 Flash**.
168
+ * **Access Provider**: Grateful to **Jio** for enabling Gemini Premium access.
169
+ * **Test Dataset**: Sourced from the public **BindingDB database**.
170
+
171
+ *This software is open-access and free for all users under the terms of the MIT License.*
@@ -0,0 +1,148 @@
1
+ # CAG-QSAR CLI Tool
2
+
3
+ A complete, production-grade command-line QSAR (Quantitative Structure-Activity Relationship) modeling pipeline. This tool automates the process of data curation, molecular descriptor calculation, feature selection, data splitting, model building, and rigorous validation.
4
+
5
+ ---
6
+
7
+ ## Installation
8
+
9
+ You can install this package on Linux/WSL using one of three methods, depending on your environment.
10
+
11
+ ### Prerequisites
12
+ Make sure you have Python (version >= 3.8) and `pip` installed:
13
+ ```bash
14
+ sudo apt update
15
+ sudo apt install python3 python3-pip python3-venv -y
16
+ ```
17
+
18
+ ### Option A: Install from PyPI (Once Published)
19
+ Once the package is available on PyPI, create a virtual environment and install the tool into it using `pip`:
20
+ ```bash
21
+ # 1. Create a virtual environment
22
+ python3 -m venv qsar_env
23
+ source qsar_env/bin/activate
24
+
25
+ # 2. Install the package from PyPI
26
+ pip install cagqsar
27
+
28
+ # 3. Run the CLI tool
29
+ cagqsar --help
30
+ ```
31
+
32
+ ### Option B: Local Source Install (No root access required)
33
+ You can build and install the package locally from the repository folder:
34
+ ```bash
35
+ # 1. Navigate to the repository directory
36
+ cd git_qsar
37
+
38
+ # 2. Create and activate a virtual environment
39
+ python3 -m venv qsar_env
40
+ source qsar_env/bin/activate
41
+
42
+ # 3. Install the package locally
43
+ pip install .
44
+
45
+ # 4. Run the CLI tool
46
+ cagqsar --help
47
+ ```
48
+
49
+ ### Option C: Editable Development Mode
50
+ If you plan to modify the pipeline's source code and want your changes to take effect without reinstalling, install in editable mode from the repository root:
51
+ ```bash
52
+ pip install -e .
53
+ ```
54
+
55
+ ---
56
+
57
+ ## Command Line Usage
58
+
59
+ After installation, run the application from any directory in your terminal:
60
+
61
+ ```bash
62
+ cagqsar --data <dataset_csv> --smiles <smiles_column> --activity <activity_column> [options]
63
+ ```
64
+
65
+ ### Core CLI Arguments:
66
+ * `--data`: Path to the CSV dataset (Required).
67
+ * `--smiles`: Column name containing SMILES strings (Required).
68
+ * `--activity`: Column name containing activities in nM (Required).
69
+ * `--model`: Regression algorithm to train: `mlr` (multiple linear regression), `pls` (partial least squares), `rf` (Random Forest), `svr` (support vector regression), `xgb` (XGBoost), or `gnn` (Graph Neural Network) (Default: `pls`).
70
+ * `--split`: Splitting method: `random` or `pca` (Kennard-Stone PCA-distance split) (Default: `pca`).
71
+ * `--test_size`: Fraction of data allocated to the test set (Default: `0.2`).
72
+ * `--var_thresh`: Variance threshold below which constant or near-constant descriptors are dropped (Default: `0.01`).
73
+ * `--corr_thresh`: Correlation threshold for collinearity filter (Default: `0.85`).
74
+ * `--y_rand_runs`: Number of Y-randomization validation loops (Default: `50`).
75
+ * `--fingerprints`: Flag to compute 2D fingerprints (Morgan/ECFP + MACCS keys) in addition to physical descriptors.
76
+ * `--out_dir`: Directory to export curated data, model reports, trained model binaries, and evaluation plots (Default: `qsar_output`).
77
+
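The `--var_thresh` and `--corr_thresh` options above describe a two-stage descriptor filter. The README does not say how the package implements it, so the following is only a sketch under stated assumptions: variance is computed on the raw values, a descriptor is dropped when its variance does not exceed the threshold, and of two descriptors whose absolute Pearson correlation exceeds the threshold the one seen first is kept. The names `pearson` and `filter_descriptors` and the toy table are hypothetical.

```python
from statistics import mean, pvariance


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def filter_descriptors(columns, var_thresh=0.01, corr_thresh=0.85):
    """Return the names of descriptors that survive both filters.

    columns: dict mapping descriptor name -> list of numeric values.
    """
    # Stage 1: drop constant and near-constant descriptors.
    varied = {k: v for k, v in columns.items() if pvariance(v) > var_thresh}
    # Stage 2: greedy collinearity filter; the first member of each
    # highly correlated group is kept (an assumed tie-break rule).
    kept = []
    for name, values in varied.items():
        if all(abs(pearson(values, varied[k])) <= corr_thresh for k in kept):
            kept.append(name)
    return kept


# Toy descriptor table (made-up values, for illustration only).
table = {
    "mol_wt":   [180.2, 151.2, 206.3, 194.2, 230.3],
    "heavy_at": [13.0, 11.0, 15.0, 14.0, 17.0],  # tracks mol_wt closely
    "n_rings":  [1.0, 1.0, 1.0, 1.0, 1.0],       # constant
    "logp":     [1.2, 0.5, 3.9, -0.1, 3.2],
}
print(filter_descriptors(table))  # ['mol_wt', 'logp']
```

Here `n_rings` is removed by the variance stage and `heavy_at` by the correlation stage, since it is almost perfectly correlated with `mol_wt`.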
78
+ ---
79
+
80
+ ## Programmatic Import in Python
81
+
82
+ Once the package is installed, you can import its pipeline functions (such as the structure curator or the descriptor calculator) in your own scripts:
83
+
84
+ ```python
85
+ from cagqsar import curate_molecule, get_rdkit_descriptors
86
+ from rdkit.Chem.SaltRemover import SaltRemover
87
+ # 1. Clean a SMILES structure and remove salt fragments
88
+ clean_smiles, mol = curate_molecule("CN(C)C(=O)c1ccccc1.Cl", SaltRemover())
89
+
90
+ # 2. Extract standard RDKit descriptors
91
+ descriptors = get_rdkit_descriptors(mol)
92
+ ```
93
+
94
+ ---
95
+
96
+ ## Publishing to GitHub & PyPI
97
+
98
+ Follow these instructions to publish your code for public access.
99
+
100
+ ### 1. Publishing to GitHub
101
+ Initialize the local git repository, commit the files, and push to GitHub:
102
+ ```bash
103
+ # 1. Initialize repository
104
+ git init
105
+
106
+ # 2. Add files (automatically respects .gitignore)
107
+ git add .
108
+
109
+ # 3. Create initial commit
110
+ git commit -m "feat: initial release of cagqsar v1.0.0"
111
+
112
+ # 4. Set main branch name
113
+ git branch -M main
114
+
115
+ # 5. Add remote GitHub link and push
116
+ git remote add origin https://github.com/YOUR_USERNAME/cagqsar.git
117
+ git push -u origin main
118
+ ```
119
+
120
+ ### 2. Publishing to PyPI
121
+ To make the tool installable globally via `pip install cagqsar`, build and upload the package distributions to the Python Package Index (PyPI):
122
+
123
+ ```bash
124
+ # 1. Install packaging build tools
125
+ pip install --upgrade build twine
126
+
127
+ # 2. Compile source distribution (sdist) and binary wheel (bdist_wheel)
128
+ python3 -m build
129
+
130
+ # 3. Verify build files
131
+ twine check dist/*
132
+
133
+ # 4. Upload to PyPI (requires PyPI API Token)
134
+ python3 -m twine upload dist/*
135
+ ```
136
+
137
+ ---
138
+
139
+ ## Acknowledgments & Credits
140
+
141
+ * **Concept, Idea & Planning**: Sathish Kumar M Ponnaiya (SKM Ponnaiya).
142
+ * **Infrastructure & Support**: **Ponnaiya's Code And Genome Pvt Ltd, Madurai** (System support, server resources, internet facilities, and infrastructure).
143
+ * **AI Coding Partner**: Pair-programmed and optimized using **Antigravity**, a Google DeepMind agentic coding system.
144
+ * **Large Language Model (LLM)**: Driven by Google's **Gemini 3.5 Flash**.
145
+ * **Access Provider**: Grateful to **Jio** for enabling Gemini Premium access.
146
+ * **Test Dataset**: Sourced from the public **BindingDB database**.
147
+
148
+ *This software is open-access and free for all users under the terms of the MIT License.*
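The `--y_rand_runs` option documented in this README refers to Y-randomization: the activity values are shuffled, the model is refit, and the scores on shuffled labels are compared with the score on the real labels. A model that scores well on real labels but poorly on shuffled ones is less likely to be a chance correlation. The sketch below illustrates the idea with a one-descriptor least-squares fit on synthetic data. It is an assumption-laden stand-in, not the package's validation code; `r_squared` and `y_randomization` are hypothetical names, and a real run would refit whichever model `--model` selects.

```python
import random
from statistics import mean


def r_squared(x, y):
    """R^2 of a one-descriptor least-squares line fit of y on x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)


def y_randomization(x, y, runs=50, seed=0):
    """Return (R^2 on real labels, list of R^2 values on shuffled labels)."""
    rng = random.Random(seed)
    shuffled_scores = []
    for _ in range(runs):
        y_perm = list(y)
        rng.shuffle(y_perm)  # break the pairing between descriptor and activity
        shuffled_scores.append(r_squared(x, y_perm))
    return r_squared(x, y), shuffled_scores


# Synthetic data: activity rises linearly with the descriptor, plus small noise.
noise = random.Random(1)
x = [float(i) for i in range(1, 21)]
y = [2.0 * xi + noise.uniform(-1.0, 1.0) for xi in x]

r2_true, r2_rand = y_randomization(x, y)
print(f"real R^2 = {r2_true:.3f}, mean shuffled R^2 = {mean(r2_rand):.3f}")
```

The real-label score should sit far above the shuffled-label scores; if it does not, the apparent fit deserves suspicion.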
@@ -0,0 +1,18 @@
1
+ # CAG-QSAR Package Initialization
2
+ # Exposes key pipeline functions for programmatic use in Python scripts
3
+
4
+ from .pipeline import (
5
+ curate_molecule,
6
+ curate_dataset,
7
+ get_rdkit_descriptors,
8
+ get_2d_fingerprints,
9
+ generate_descriptors,
10
+ select_features,
11
+ kennard_stone_split,
12
+ split_dataset,
13
+ evaluate_qsar_model,
14
+ main
15
+ )
16
+
17
+ __version__ = "1.0.0"
18
+ __author__ = "Sathish Kumar M Ponnaiya"
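The export `kennard_stone_split` above is named after the Kennard-Stone algorithm, which picks a training set that spans the descriptor space as evenly as possible. The package source is not shown in this diff, so the following is a generic sketch of the textbook procedure rather than the package's implementation. It assumes the input is a list of coordinate tuples (for example PCA scores, as the README's "PCA-distance" wording suggests) and uses Euclidean distance; `kennard_stone` and the sample points are hypothetical.

```python
from itertools import combinations
from math import dist


def kennard_stone(points, n_train):
    """Return (train_idx, test_idx) chosen by the Kennard-Stone rule."""
    n = len(points)
    if not 2 <= n_train <= n:
        raise ValueError("n_train must be between 2 and len(points)")
    # Seed the training set with the two mutually most distant points.
    i, j = max(combinations(range(n), 2),
               key=lambda p: dist(points[p[0]], points[p[1]]))
    selected = [i, j]
    remaining = [k for k in range(n) if k not in (i, j)]
    # Repeatedly add the candidate that is farthest from its nearest
    # already-selected neighbour (the max-min distance criterion).
    while len(selected) < n_train:
        nxt = max(remaining,
                  key=lambda k: min(dist(points[k], points[s]) for s in selected))
        selected.append(nxt)
        remaining.remove(nxt)
    return selected, remaining


# Five made-up points in a 2D score space.
scores = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0), (10.0, 0.0), (10.0, 1.0)]
train, test = kennard_stone(scores, n_train=3)
print(train, test)  # [0, 4, 2] [1, 3]
```

The two extremes are chosen first, then the midpoint, so the held-out points are those lying close to an already-selected training point. With a test fraction such as `0.2`, `n_train` would be roughly `round(0.8 * len(points))`.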