peptcrnet 1.0.0__tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- peptcrnet-1.0.0/CONTRIBUTING.md +213 -0
- peptcrnet-1.0.0/LICENSE +21 -0
- peptcrnet-1.0.0/MANIFEST.in +8 -0
- peptcrnet-1.0.0/PKG-INFO +363 -0
- peptcrnet-1.0.0/README.md +291 -0
- peptcrnet-1.0.0/datasets/atchley.txt +21 -0
- peptcrnet-1.0.0/figures/Pipeline.jpg +0 -0
- peptcrnet-1.0.0/peptcrnet/__init__.py +54 -0
- peptcrnet-1.0.0/peptcrnet/config.py +363 -0
- peptcrnet-1.0.0/peptcrnet/data/__init__.py +39 -0
- peptcrnet-1.0.0/peptcrnet/data/loader.py +477 -0
- peptcrnet-1.0.0/peptcrnet/data/preprocessing.py +451 -0
- peptcrnet-1.0.0/peptcrnet/data/utils.py +427 -0
- peptcrnet-1.0.0/peptcrnet/pipeline.py +680 -0
- peptcrnet-1.0.0/peptcrnet/predictor.py +939 -0
- peptcrnet-1.0.0/peptcrnet.egg-info/PKG-INFO +363 -0
- peptcrnet-1.0.0/peptcrnet.egg-info/SOURCES.txt +22 -0
- peptcrnet-1.0.0/peptcrnet.egg-info/dependency_links.txt +1 -0
- peptcrnet-1.0.0/peptcrnet.egg-info/requires.txt +36 -0
- peptcrnet-1.0.0/peptcrnet.egg-info/top_level.txt +1 -0
- peptcrnet-1.0.0/requirements.txt +35 -0
- peptcrnet-1.0.0/setup.cfg +4 -0
- peptcrnet-1.0.0/setup.py +85 -0
- peptcrnet-1.0.0/synthetic_training_data.csv +5001 -0
|
@@ -0,0 +1,213 @@
|
|
|
1
|
+
# Contributing to PepTCRNet
|
|
2
|
+
|
|
3
|
+
We welcome contributions to PepTCRNet! This document provides guidelines for contributing to the project.
|
|
4
|
+
|
|
5
|
+
## Getting Started
|
|
6
|
+
|
|
7
|
+
1. Fork the repository on GitHub
|
|
8
|
+
2. Clone your fork locally:
|
|
9
|
+
```bash
|
|
10
|
+
git clone https://github.com/yourusername/Pep-TCRNet.git
|
|
11
|
+
cd Pep-TCRNet
|
|
12
|
+
```
|
|
13
|
+
3. Create a new branch for your feature:
|
|
14
|
+
```bash
|
|
15
|
+
git checkout -b feature/amazing-feature
|
|
16
|
+
```
|
|
17
|
+
|
|
18
|
+
## Development Setup
|
|
19
|
+
|
|
20
|
+
### Install Development Dependencies
|
|
21
|
+
|
|
22
|
+
```bash
|
|
23
|
+
# Create virtual environment
|
|
24
|
+
conda create -n peptcrnet-dev python=3.8
|
|
25
|
+
conda activate peptcrnet-dev
|
|
26
|
+
|
|
27
|
+
# Install package in development mode
|
|
28
|
+
pip install -e .
|
|
29
|
+
|
|
30
|
+
# Install development dependencies
|
|
31
|
+
pip install -e ".[dev]"
|
|
32
|
+
```
|
|
33
|
+
|
|
34
|
+
### Run Tests
|
|
35
|
+
|
|
36
|
+
```bash
|
|
37
|
+
# Run package tests
|
|
38
|
+
python test_package.py
|
|
39
|
+
|
|
40
|
+
# Run network embedding tests
|
|
41
|
+
python test_network_embedding.py
|
|
42
|
+
```
|
|
43
|
+
|
|
44
|
+
## Code Style
|
|
45
|
+
|
|
46
|
+
- Follow PEP 8 Python style guidelines
|
|
47
|
+
- Use descriptive variable names
|
|
48
|
+
- Add docstrings to all functions and classes
|
|
49
|
+
- Keep line length under 88 characters
|
|
50
|
+
|
|
51
|
+
## Making Changes
|
|
52
|
+
|
|
53
|
+
### Types of Contributions
|
|
54
|
+
|
|
55
|
+
- **Bug fixes**: Fix existing functionality
|
|
56
|
+
- **New features**: Add new capabilities
|
|
57
|
+
- **Documentation**: Improve docs and examples
|
|
58
|
+
- **Performance**: Optimize existing code
|
|
59
|
+
- **Tests**: Add or improve test coverage
|
|
60
|
+
|
|
61
|
+
### Commit Messages
|
|
62
|
+
|
|
63
|
+
Use clear, descriptive commit messages:
|
|
64
|
+
```
|
|
65
|
+
Add uncertainty quantification to predictions
|
|
66
|
+
|
|
67
|
+
- Implement Monte Carlo sampling in BayesianClassifier
|
|
68
|
+
- Add confidence scores to prediction outputs
|
|
69
|
+
- Update documentation with uncertainty examples
|
|
70
|
+
```
|
|
71
|
+
|
|
72
|
+
## Submitting Changes
|
|
73
|
+
|
|
74
|
+
1. **Test your changes**: Ensure all tests pass
|
|
75
|
+
2. **Update documentation**: Add/update relevant docs
|
|
76
|
+
3. **Commit your changes**: Use clear commit messages
|
|
77
|
+
4. **Push to your fork**:
|
|
78
|
+
```bash
|
|
79
|
+
git push origin feature/amazing-feature
|
|
80
|
+
```
|
|
81
|
+
5. **Create Pull Request**: Submit PR with description of changes
|
|
82
|
+
|
|
83
|
+
## Pull Request Guidelines
|
|
84
|
+
|
|
85
|
+
### PR Description Template
|
|
86
|
+
|
|
87
|
+
```markdown
|
|
88
|
+
## Description
|
|
89
|
+
Brief description of what this PR does.
|
|
90
|
+
|
|
91
|
+
## Type of Change
|
|
92
|
+
- [ ] Bug fix
|
|
93
|
+
- [ ] New feature
|
|
94
|
+
- [ ] Documentation update
|
|
95
|
+
- [ ] Performance improvement
|
|
96
|
+
|
|
97
|
+
## Testing
|
|
98
|
+
- [ ] Tests pass locally
|
|
99
|
+
- [ ] Added new tests for changes
|
|
100
|
+
- [ ] Updated documentation
|
|
101
|
+
|
|
102
|
+
## Checklist
|
|
103
|
+
- [ ] Code follows project style
|
|
104
|
+
- [ ] Self-review completed
|
|
105
|
+
- [ ] Documentation updated
|
|
106
|
+
```
|
|
107
|
+
|
|
108
|
+
### Review Process
|
|
109
|
+
|
|
110
|
+
1. **Automated Checks**: CI/CD will run tests
|
|
111
|
+
2. **Code Review**: Maintainers will review code
|
|
112
|
+
3. **Discussion**: Address feedback and make changes
|
|
113
|
+
4. **Approval**: PR approved and merged
|
|
114
|
+
|
|
115
|
+
## Development Guidelines
|
|
116
|
+
|
|
117
|
+
### Adding New Features
|
|
118
|
+
|
|
119
|
+
1. **Design first**: Discuss major changes in issues
|
|
120
|
+
2. **Modular approach**: Keep components separate
|
|
121
|
+
3. **Backward compatibility**: Don't break existing APIs
|
|
122
|
+
4. **Documentation**: Document all public functions
|
|
123
|
+
|
|
124
|
+
### Code Organization
|
|
125
|
+
|
|
126
|
+
```
|
|
127
|
+
peptcrnet/
|
|
128
|
+
├── __init__.py # Package interface
|
|
129
|
+
├── predictor.py # Main prediction APIs
|
|
130
|
+
├── pipeline.py # Training pipeline
|
|
131
|
+
├── config.py # Configuration management
|
|
132
|
+
└── data/ # Data utilities
|
|
133
|
+
├── loader.py
|
|
134
|
+
├── preprocessing.py
|
|
135
|
+
└── utils.py
|
|
136
|
+
```
|
|
137
|
+
|
|
138
|
+
### Testing
|
|
139
|
+
|
|
140
|
+
- Add tests for new features
|
|
141
|
+
- Ensure existing tests still pass
|
|
142
|
+
- Test with different Python versions (3.8+)
|
|
143
|
+
- Include edge cases and error conditions
|
|
144
|
+
|
|
145
|
+
## Documentation
|
|
146
|
+
|
|
147
|
+
### Adding Documentation
|
|
148
|
+
|
|
149
|
+
- Use clear, concise language
|
|
150
|
+
- Include code examples
|
|
151
|
+
- Update README.md if needed
|
|
152
|
+
- Add docstrings to new functions
|
|
153
|
+
|
|
154
|
+
### Notebook Guidelines
|
|
155
|
+
|
|
156
|
+
- Clear cell organization
|
|
157
|
+
- Markdown explanations
|
|
158
|
+
- Working code examples
|
|
159
|
+
- Expected outputs shown
|
|
160
|
+
|
|
161
|
+
## Issue Reporting
|
|
162
|
+
|
|
163
|
+
### Bug Reports
|
|
164
|
+
|
|
165
|
+
Include:
|
|
166
|
+
- Python version
|
|
167
|
+
- PepTCRNet version
|
|
168
|
+
- Steps to reproduce
|
|
169
|
+
- Expected vs actual behavior
|
|
170
|
+
- Error messages (if any)
|
|
171
|
+
|
|
172
|
+
### Feature Requests
|
|
173
|
+
|
|
174
|
+
Include:
|
|
175
|
+
- Use case description
|
|
176
|
+
- Proposed solution
|
|
177
|
+
- Alternative solutions considered
|
|
178
|
+
- Implementation suggestions
|
|
179
|
+
|
|
180
|
+
## Community Guidelines
|
|
181
|
+
|
|
182
|
+
### Code of Conduct
|
|
183
|
+
|
|
184
|
+
- Be respectful and inclusive
|
|
185
|
+
- Help others learn and contribute
|
|
186
|
+
- Focus on constructive feedback
|
|
187
|
+
- Celebrate diverse perspectives
|
|
188
|
+
|
|
189
|
+
### Getting Help
|
|
190
|
+
|
|
191
|
+
- **GitHub Issues**: Bug reports and features
|
|
192
|
+
- **GitHub Discussions**: General questions
|
|
193
|
+
- **Documentation**: Check existing docs first
|
|
194
|
+
|
|
195
|
+
## Recognition
|
|
196
|
+
|
|
197
|
+
Contributors will be:
|
|
198
|
+
- Listed in CONTRIBUTORS.md
|
|
199
|
+
- Acknowledged in release notes
|
|
200
|
+
- Tagged in relevant documentation
|
|
201
|
+
|
|
202
|
+
## License
|
|
203
|
+
|
|
204
|
+
By contributing, you agree that your contributions will be licensed under the MIT License.
|
|
205
|
+
|
|
206
|
+
## Questions?
|
|
207
|
+
|
|
208
|
+
Feel free to ask questions in:
|
|
209
|
+
- GitHub Issues (for bugs/features)
|
|
210
|
+
- GitHub Discussions (for general help)
|
|
211
|
+
- Email: peptcrnet@example.com
|
|
212
|
+
|
|
213
|
+
Thank you for contributing to PepTCRNet! 🎉
|
peptcrnet-1.0.0/LICENSE
ADDED
|
@@ -0,0 +1,21 @@
|
|
|
1
|
+
MIT License
|
|
2
|
+
|
|
3
|
+
Copyright (c) 2024 PepTCRNet Team
|
|
4
|
+
|
|
5
|
+
Permission is hereby granted, free of charge, to any person obtaining a copy
|
|
6
|
+
of this software and associated documentation files (the "Software"), to deal
|
|
7
|
+
in the Software without restriction, including without limitation the rights
|
|
8
|
+
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
|
|
9
|
+
copies of the Software, and to permit persons to whom the Software is
|
|
10
|
+
furnished to do so, subject to the following conditions:
|
|
11
|
+
|
|
12
|
+
The above copyright notice and this permission notice shall be included in all
|
|
13
|
+
copies or substantial portions of the Software.
|
|
14
|
+
|
|
15
|
+
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
|
|
16
|
+
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
|
|
17
|
+
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
|
|
18
|
+
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
|
|
19
|
+
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
|
|
20
|
+
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
|
|
21
|
+
SOFTWARE.
|
peptcrnet-1.0.0/PKG-INFO
ADDED
|
@@ -0,0 +1,363 @@
|
|
|
1
|
+
Metadata-Version: 2.4
|
|
2
|
+
Name: peptcrnet
|
|
3
|
+
Version: 1.0.0
|
|
4
|
+
Summary: A Deep Learning Framework for TCR-Peptide Recognition Prediction
|
|
5
|
+
Home-page: https://github.com/mlizhangx/Pep-TCRNet
|
|
6
|
+
Author: PepTCRNet Team
|
|
7
|
+
Author-email: mlizhang@gmail.com
|
|
8
|
+
Project-URL: Bug Reports, https://github.com/mlizhangx/Pep-TCRNet/issues
|
|
9
|
+
Project-URL: Source, https://github.com/mlizhangx/Pep-TCRNet
|
|
10
|
+
Project-URL: Documentation, https://peptcrnet.readthedocs.io
|
|
11
|
+
Keywords: TCR peptide recognition deep-learning bioinformatics immunology
|
|
12
|
+
Classifier: Development Status :: 5 - Production/Stable
|
|
13
|
+
Classifier: Intended Audience :: Science/Research
|
|
14
|
+
Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
|
|
15
|
+
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
|
|
16
|
+
Classifier: License :: OSI Approved :: MIT License
|
|
17
|
+
Classifier: Programming Language :: Python :: 3
|
|
18
|
+
Classifier: Programming Language :: Python :: 3.8
|
|
19
|
+
Classifier: Programming Language :: Python :: 3.9
|
|
20
|
+
Classifier: Programming Language :: Python :: 3.10
|
|
21
|
+
Classifier: Programming Language :: Python :: 3.11
|
|
22
|
+
Classifier: Programming Language :: Python :: 3.12
|
|
23
|
+
Requires-Python: >=3.8
|
|
24
|
+
Description-Content-Type: text/markdown
|
|
25
|
+
License-File: LICENSE
|
|
26
|
+
Requires-Dist: numpy<2.0.0,>=1.24.0
|
|
27
|
+
Requires-Dist: pandas<3.0.0,>=2.0.0
|
|
28
|
+
Requires-Dist: scikit-learn<2.0.0,>=1.3.0
|
|
29
|
+
Requires-Dist: scipy<2.0.0,>=1.10.0
|
|
30
|
+
Requires-Dist: tensorflow<3.0.0,>=2.13.0
|
|
31
|
+
Requires-Dist: tf-keras>=2.13.0
|
|
32
|
+
Requires-Dist: tensorflow-probability[tf]<1.0.0,>=0.21.0
|
|
33
|
+
Requires-Dist: matplotlib<4.0.0,>=3.7.0
|
|
34
|
+
Requires-Dist: seaborn<1.0.0,>=0.12.0
|
|
35
|
+
Requires-Dist: networkx<4.0.0,>=2.8.0
|
|
36
|
+
Requires-Dist: stellargraph>=1.2.0
|
|
37
|
+
Requires-Dist: python-Levenshtein>=0.21.0
|
|
38
|
+
Requires-Dist: umap-learn>=0.5.0
|
|
39
|
+
Requires-Dist: hdbscan>=0.8.0
|
|
40
|
+
Requires-Dist: tqdm>=4.65.0
|
|
41
|
+
Requires-Dist: natsort>=8.0.0
|
|
42
|
+
Requires-Dist: joblib>=1.3.0
|
|
43
|
+
Requires-Dist: jupyter>=1.0.0
|
|
44
|
+
Requires-Dist: ipywidgets>=8.0.0
|
|
45
|
+
Requires-Dist: notebook>=6.5.0
|
|
46
|
+
Provides-Extra: dev
|
|
47
|
+
Requires-Dist: pytest>=6.0; extra == "dev"
|
|
48
|
+
Requires-Dist: pytest-cov>=2.0; extra == "dev"
|
|
49
|
+
Requires-Dist: black>=21.0; extra == "dev"
|
|
50
|
+
Requires-Dist: flake8>=3.8; extra == "dev"
|
|
51
|
+
Requires-Dist: sphinx>=4.0; extra == "dev"
|
|
52
|
+
Requires-Dist: sphinx-rtd-theme>=0.5; extra == "dev"
|
|
53
|
+
Provides-Extra: viz
|
|
54
|
+
Requires-Dist: seaborn>=0.12.0; extra == "viz"
|
|
55
|
+
Requires-Dist: matplotlib>=3.7.0; extra == "viz"
|
|
56
|
+
Requires-Dist: plotly>=5.0; extra == "viz"
|
|
57
|
+
Provides-Extra: gpu
|
|
58
|
+
Requires-Dist: tensorflow-gpu>=2.13.0; extra == "gpu"
|
|
59
|
+
Dynamic: author
|
|
60
|
+
Dynamic: author-email
|
|
61
|
+
Dynamic: classifier
|
|
62
|
+
Dynamic: description
|
|
63
|
+
Dynamic: description-content-type
|
|
64
|
+
Dynamic: home-page
|
|
65
|
+
Dynamic: keywords
|
|
66
|
+
Dynamic: license-file
|
|
67
|
+
Dynamic: project-url
|
|
68
|
+
Dynamic: provides-extra
|
|
69
|
+
Dynamic: requires-dist
|
|
70
|
+
Dynamic: requires-python
|
|
71
|
+
Dynamic: summary
|
|
72
|
+
|
|
73
|
+
# PepTCRNet: Deep Learning for TCR-Peptide Recognition Prediction
|
|
74
|
+
|
|
75
|
+
<p align="center">
|
|
76
|
+
<img src="figures/Pipeline.jpg" alt="PepTCRNet Pipeline" width="600"/>
|
|
77
|
+
</p>
|
|
78
|
+
|
|
79
|
+
[](https://www.python.org/downloads/)
|
|
80
|
+
[](https://www.tensorflow.org/)
|
|
81
|
+
[](https://opensource.org/licenses/MIT)
|
|
82
|
+
|
|
83
|
+
**PepTCRNet** is a state-of-the-art deep learning framework for predicting T-cell receptor (TCR) recognition of peptide antigens. It combines advanced neural network architectures with comprehensive feature engineering to achieve high-accuracy predictions with uncertainty quantification.
|
|
84
|
+
|
|
85
|
+
## 🌟 Key Features
|
|
86
|
+
|
|
87
|
+
- **Multi-modal Integration**: Seamlessly combines sequence, categorical, and network-based features
|
|
88
|
+
- **Advanced Embeddings**: Utilizes autoencoders, position encoding, and Atchley factors for sequence representation
|
|
89
|
+
- **Bayesian Neural Networks**: Provides uncertainty quantification for predictions
|
|
90
|
+
- **Comprehensive Pipeline**: End-to-end solution from data preprocessing to model deployment
|
|
91
|
+
- **Flexible Architecture**: Modular design allows easy customization and extension
|
|
92
|
+
- **Class Imbalance Handling**: Built-in support for imbalanced datasets
|
|
93
|
+
- **Rich Visualizations**: Extensive plotting utilities for model interpretation
|
|
94
|
+
|
|
95
|
+
## 🚀 Quick Start
|
|
96
|
+
|
|
97
|
+
### Run the Complete Demo (Easiest!)
|
|
98
|
+
```bash
|
|
99
|
+
# One-click demo launcher
|
|
100
|
+
./run_demo.sh
|
|
101
|
+
```
|
|
102
|
+
|
|
103
|
+
This launches the complete **Scenario 17** demo using all features!
|
|
104
|
+
|
|
105
|
+
### Installation
|
|
106
|
+
|
|
107
|
+
#### From Source (Current Setup)
|
|
108
|
+
```bash
|
|
109
|
+
cd /Users/lung/Documents/Projects/peptcrnet/PepTCR-Net
|
|
110
|
+
|
|
111
|
+
# Install in development mode
|
|
112
|
+
pip install -e .
|
|
113
|
+
|
|
114
|
+
# Run the demo
|
|
115
|
+
conda activate tfBNN
|
|
116
|
+
jupyter notebook DEMO_Complete_Pipeline.ipynb
|
|
117
|
+
```
|
|
118
|
+
|
|
119
|
+
#### Future: From PyPI (After Publishing)
|
|
120
|
+
```bash
|
|
121
|
+
pip install peptcrnet
|
|
122
|
+
```
|
|
123
|
+
|
|
124
|
+
### Basic Usage
|
|
125
|
+
|
|
126
|
+
```python
|
|
127
|
+
import peptcrnet
|
|
128
|
+
from peptcrnet import PepTCRNetPipeline
|
|
129
|
+
|
|
130
|
+
# Initialize pipeline
|
|
131
|
+
pipeline = PepTCRNetPipeline(data_path='your_data.csv')
|
|
132
|
+
|
|
133
|
+
# Load and prepare data
|
|
134
|
+
pipeline.load_data()
|
|
135
|
+
pipeline.split_data(test_size=0.2, val_size=0.1)
|
|
136
|
+
|
|
137
|
+
# Prepare features
|
|
138
|
+
pipeline.prepare_features(feature_types=['sequences', 'categorical'])
|
|
139
|
+
|
|
140
|
+
# Train model
|
|
141
|
+
history = pipeline.train(epochs=100, batch_size=128)
|
|
142
|
+
|
|
143
|
+
# Evaluate with uncertainty
|
|
144
|
+
results = pipeline.evaluate_with_uncertainty(n_samples=200)
|
|
145
|
+
|
|
146
|
+
# Make predictions
|
|
147
|
+
predictions = pipeline.predict(new_data)
|
|
148
|
+
```
|
|
149
|
+
|
|
150
|
+
## 📊 Data Format
|
|
151
|
+
|
|
152
|
+
PepTCRNet expects input data in CSV format with the following columns:
|
|
153
|
+
|
|
154
|
+
| Column | Description | Example |
|
|
155
|
+
|--------|-------------|---------|
|
|
156
|
+
| `CDR3` | TCR CDR3β sequence | `CASSRGQGNEQFF` |
|
|
157
|
+
| `Peptide` | Peptide sequence or class label | `GILGFVFTL` |
|
|
158
|
+
| `V` | V gene segment | `TRBV7-2` |
|
|
159
|
+
| `J` | J gene segment | `TRBJ2-1` |
|
|
160
|
+
| `HLA-A` | HLA-A allele | `A*02:01` |
|
|
161
|
+
| `HLA-B` | HLA-B allele | `B*07:02` |
|
|
162
|
+
| `HLA-C` | HLA-C allele | `C*07:01` |
|
|
163
|
+
|
|
164
|
+
## 🧪 Demo Notebook
|
|
165
|
+
|
|
166
|
+
Try our interactive demo notebook to see PepTCRNet in action:
|
|
167
|
+
|
|
168
|
+
```bash
|
|
169
|
+
jupyter notebook demo_pipeline.ipynb
|
|
170
|
+
```
|
|
171
|
+
|
|
172
|
+
The demo includes:
|
|
173
|
+
- Sample data generation
|
|
174
|
+
- Step-by-step pipeline walkthrough
|
|
175
|
+
- Model training and evaluation
|
|
176
|
+
- Uncertainty quantification
|
|
177
|
+
- Visualization examples
|
|
178
|
+
|
|
179
|
+
## 📚 Documentation
|
|
180
|
+
|
|
181
|
+
### Pipeline Components
|
|
182
|
+
|
|
183
|
+
#### 1. Data Loading and Preprocessing
|
|
184
|
+
```python
|
|
185
|
+
from peptcrnet.data import DataLoader
|
|
186
|
+
|
|
187
|
+
loader = DataLoader('data.csv', atchley_path='atchley_factors.txt')
|
|
188
|
+
stats = loader.get_summary_stats()
|
|
189
|
+
splits = loader.split_data()
|
|
190
|
+
```
|
|
191
|
+
|
|
192
|
+
#### 2. Feature Engineering
|
|
193
|
+
```python
|
|
194
|
+
from peptcrnet.embeddings import SequenceEmbedder, CategoricalEmbedder
|
|
195
|
+
|
|
196
|
+
# Sequence embeddings
|
|
197
|
+
seq_embedder = SequenceEmbedder(atchley_factors, max_length=30)
|
|
198
|
+
tcr_embeddings = seq_embedder.encode_sequences(tcr_sequences)
|
|
199
|
+
|
|
200
|
+
# Categorical embeddings
|
|
201
|
+
cat_embedder = CategoricalEmbedder()
|
|
202
|
+
cat_embeddings = cat_embedder.encode_features(categorical_data)
|
|
203
|
+
```
|
|
204
|
+
|
|
205
|
+
#### 3. Model Training
|
|
206
|
+
```python
|
|
207
|
+
from peptcrnet.models import BayesianClassifier
|
|
208
|
+
|
|
209
|
+
model = BayesianClassifier(
|
|
210
|
+
input_shapes={'sequences': (100,), 'categorical': (50,)},
|
|
211
|
+
num_classes=5,
|
|
212
|
+
hidden_dims=[512, 256, 64]
|
|
213
|
+
)
|
|
214
|
+
|
|
215
|
+
history = model.train(X_train, y_train, X_val, y_val)
|
|
216
|
+
```
|
|
217
|
+
|
|
218
|
+
#### 4. Evaluation and Visualization
|
|
219
|
+
```python
|
|
220
|
+
from peptcrnet.evaluation import ModelEvaluator
|
|
221
|
+
from peptcrnet.visualization import plot_confusion_matrix, plot_roc_curves
|
|
222
|
+
|
|
223
|
+
evaluator = ModelEvaluator()
|
|
224
|
+
metrics = evaluator.compute_metrics(y_true, y_pred, y_proba)
|
|
225
|
+
|
|
226
|
+
plot_confusion_matrix(y_true, y_pred)
|
|
227
|
+
plot_roc_curves(y_true, y_proba)
|
|
228
|
+
```
|
|
229
|
+
|
|
230
|
+
## ⚙️ Configuration
|
|
231
|
+
|
|
232
|
+
PepTCRNet uses a centralized configuration system:
|
|
233
|
+
|
|
234
|
+
```python
|
|
235
|
+
from peptcrnet import config
|
|
236
|
+
|
|
237
|
+
# Access configuration
|
|
238
|
+
print(config.ModelParams.MAX_TCR_LENGTH)
|
|
239
|
+
print(config.TrainingParams.BATCH_SIZE)
|
|
240
|
+
|
|
241
|
+
# Save configuration
|
|
242
|
+
config.save_config('my_config.json')
|
|
243
|
+
|
|
244
|
+
# Load configuration
|
|
245
|
+
config.load_config('my_config.json')
|
|
246
|
+
```
|
|
247
|
+
|
|
248
|
+
## 🔬 Advanced Features
|
|
249
|
+
|
|
250
|
+
### Uncertainty Quantification
|
|
251
|
+
|
|
252
|
+
PepTCRNet provides Bayesian uncertainty estimation:
|
|
253
|
+
|
|
254
|
+
```python
|
|
255
|
+
# Multiple forward passes for uncertainty
|
|
256
|
+
predictions, uncertainty = pipeline.predict_with_uncertainty(
|
|
257
|
+
test_data,
|
|
258
|
+
n_samples=200
|
|
259
|
+
)
|
|
260
|
+
|
|
261
|
+
# Identify high-confidence predictions
|
|
262
|
+
high_confidence_mask = uncertainty < threshold
|
|
263
|
+
```
|
|
264
|
+
|
|
265
|
+
### Custom Feature Combinations
|
|
266
|
+
|
|
267
|
+
Experiment with different feature combinations:
|
|
268
|
+
|
|
269
|
+
```python
|
|
270
|
+
# Define feature cases
|
|
271
|
+
feature_cases = {
|
|
272
|
+
1: ['TCR'],
|
|
273
|
+
2: ['TCR', 'Peptide'],
|
|
274
|
+
3: ['TCR', 'Peptide', 'HLA'],
|
|
275
|
+
4: ['TCR', 'Peptide', 'HLA', 'VJ', 'Network']
|
|
276
|
+
}
|
|
277
|
+
|
|
278
|
+
# Train with specific features
|
|
279
|
+
pipeline.prepare_features(feature_types=feature_cases[3])
|
|
280
|
+
```
|
|
281
|
+
|
|
282
|
+
### Model Persistence
|
|
283
|
+
|
|
284
|
+
Save and load trained models:
|
|
285
|
+
|
|
286
|
+
```python
|
|
287
|
+
# Save complete pipeline
|
|
288
|
+
pipeline.save_pipeline('output_dir/')
|
|
289
|
+
|
|
290
|
+
# Load saved pipeline
|
|
291
|
+
new_pipeline = PepTCRNetPipeline()
|
|
292
|
+
new_pipeline.load_pipeline('output_dir/')
|
|
293
|
+
```
|
|
294
|
+
|
|
295
|
+
## 📈 Performance
|
|
296
|
+
|
|
297
|
+
PepTCRNet achieves state-of-the-art performance on TCR-peptide binding prediction:
|
|
298
|
+
|
|
299
|
+
- **Accuracy**: Up to 95% on benchmark datasets
|
|
300
|
+
- **AUC-ROC**: >0.90 for multi-class classification
|
|
301
|
+
- **Uncertainty Calibration**: Well-calibrated confidence scores
|
|
302
|
+
|
|
303
|
+
## 🤝 Contributing
|
|
304
|
+
|
|
305
|
+
We welcome contributions! Please see our [Contributing Guidelines](CONTRIBUTING.md) for details.
|
|
306
|
+
|
|
307
|
+
```bash
|
|
308
|
+
# Fork the repository
|
|
309
|
+
# Create your feature branch
|
|
310
|
+
git checkout -b feature/amazing-feature
|
|
311
|
+
|
|
312
|
+
# Commit your changes
|
|
313
|
+
git commit -m 'Add amazing feature'
|
|
314
|
+
|
|
315
|
+
# Push to the branch
|
|
316
|
+
git push origin feature/amazing-feature
|
|
317
|
+
|
|
318
|
+
# Open a Pull Request
|
|
319
|
+
```
|
|
320
|
+
|
|
321
|
+
## 📝 Citation
|
|
322
|
+
|
|
323
|
+
If you use PepTCRNet in your research, please cite:
|
|
324
|
+
|
|
325
|
+
```bibtex
|
|
326
|
+
@article{peptcrnet2024,
|
|
327
|
+
title={PepTCRNet: A Deep Learning Framework for TCR-Peptide Recognition Prediction},
|
|
328
|
+
author={Your Name et al.},
|
|
329
|
+
journal={Journal Name},
|
|
330
|
+
year={2024}
|
|
331
|
+
}
|
|
332
|
+
```
|
|
333
|
+
|
|
334
|
+
## 📄 License
|
|
335
|
+
|
|
336
|
+
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
|
|
337
|
+
|
|
338
|
+
## 🙏 Acknowledgments
|
|
339
|
+
|
|
340
|
+
- Thanks to all contributors who have helped shape PepTCRNet
|
|
341
|
+
- Inspired by advances in deep learning for immunology
|
|
342
|
+
- Built with TensorFlow and the Python scientific computing ecosystem
|
|
343
|
+
|
|
344
|
+
## 📮 Contact
|
|
345
|
+
|
|
346
|
+
- **Issues**: [GitHub Issues](https://github.com/mlizhangx/Pep-TCRNet/issues)
|
|
347
|
+
- **Discussions**: [GitHub Discussions](https://github.com/mlizhangx/Pep-TCRNet/discussions)
|
|
348
|
+
- **Email**: peptcrnet@example.com
|
|
349
|
+
|
|
350
|
+
## 🗺️ Roadmap
|
|
351
|
+
|
|
352
|
+
- [ ] Support for TCRα chains
|
|
353
|
+
- [ ] Integration with single-cell RNA-seq data
|
|
354
|
+
- [ ] Web interface for predictions
|
|
355
|
+
- [ ] Pre-trained models for common peptides
|
|
356
|
+
- [ ] GPU optimization for large-scale predictions
|
|
357
|
+
- [ ] Docker containerization
|
|
358
|
+
|
|
359
|
+
---
|
|
360
|
+
|
|
361
|
+
<p align="center">
|
|
362
|
+
Made with ❤️ by the PepTCRNet Team
|
|
363
|
+
</p>
|