pyprotalign 0.1.0__tar.gz
- pyprotalign-0.1.0/.gitignore +10 -0
- pyprotalign-0.1.0/LICENSE.txt +21 -0
- pyprotalign-0.1.0/PKG-INFO +344 -0
- pyprotalign-0.1.0/README.md +319 -0
- pyprotalign-0.1.0/pyproject.toml +128 -0
- pyprotalign-0.1.0/src/pyprotalign/__init__.py +8 -0
- pyprotalign-0.1.0/src/pyprotalign/alignment.py +313 -0
- pyprotalign-0.1.0/src/pyprotalign/cli.py +392 -0
- pyprotalign-0.1.0/src/pyprotalign/io.py +52 -0
- pyprotalign-0.1.0/src/pyprotalign/kabsch.py +128 -0
- pyprotalign-0.1.0/src/pyprotalign/py.typed +0 -0
- pyprotalign-0.1.0/src/pyprotalign/refine.py +96 -0
- pyprotalign-0.1.0/src/pyprotalign/selection.py +128 -0
- pyprotalign-0.1.0/src/pyprotalign/transform.py +113 -0
@@ -0,0 +1,21 @@
+MIT License
+
+Copyright (c) 2026 ugSUBMARINE
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
@@ -0,0 +1,344 @@
+Metadata-Version: 2.4
+Name: pyprotalign
+Version: 0.1.0
+Summary: Protein structure superposition package and CLI tool
+Project-URL: Homepage, https://github.com/ugSUBMARINE/pyprotalign
+Project-URL: Documentation, https://github.com/ugSUBMARINE/pyprotalign#readme
+Project-URL: Repository, https://github.com/ugSUBMARINE/pyprotalign.git
+Project-URL: Bug Tracker, https://github.com/ugSUBMARINE/pyprotalign/issues
+Project-URL: Changelog, https://github.com/ugSUBMARINE/pyprotalign/releases
+Author-email: Karl Gruber <gammaturn@gmail.com>
+Maintainer-email: Karl Gruber <gammaturn@gmail.com>
+License: MIT
+License-File: LICENSE.txt
+Keywords: alignment,cli,protein structure,structural bioinformatics,superposition
+Classifier: License :: OSI Approved :: MIT License
+Classifier: Operating System :: OS Independent
+Classifier: Programming Language :: Python :: 3
+Classifier: Programming Language :: Python :: 3.12
+Classifier: Programming Language :: Python :: 3.13
+Classifier: Programming Language :: Python :: 3.14
+Requires-Python: >=3.12
+Requires-Dist: gemmi>=0.7.4
+Requires-Dist: numpy>=1.26
+Description-Content-Type: text/markdown
+
+# pyprotalign
+
+Protein structure superposition using sequence alignment and iterative refinement.
+
+## Features
+
+- **Sequence-based alignment**: Automatically identifies corresponding atoms via sequence alignment
+- **Kabsch algorithm**: Optimal least-squares superposition
+- **Iterative refinement**: Outlier rejection for improved accuracy
+- **Multi-chain support**:
+  - Single-chain alignment with specified or default chains
+  - Global alignment of all matching chains
+  - Quaternary alignment with smart chain matching by proximity
+- **Batch processing**: Align multiple mobile structures to a single reference
+
+## Installation
+
+### Using uv/pip
+```bash
+uv pip install pyprotalign
+```
+
+### From source
+```bash
+git clone https://github.com/ugSUBMARINE/pyprotalign.git
+cd pyprotalign
+uv venv
+uv sync
+```
+
+## Quick Start
+
+### CLI Tool
+
+```bash
+# Basic superposition (uses first protein chain from each structure)
+uv run protalign fixed.cif mobile.cif -o superposed.cif
+
+# Specify chains to align
+uv run protalign fixed.cif mobile.cif --fixed-chain A --mobile-chain B
+
+# Global alignment (align all matching chains: A-A, B-B, etc.)
+uv run protalign fixed.cif mobile.cif --global
+
+# Quaternary alignment (smart chain matching by proximity)
+uv run protalign fixed.cif mobile.cif --quaternary --distance-threshold 8.0
+
+# Quaternary alignment with chain renaming
+uv run protalign fixed.cif mobile.cif --quaternary --rename-chains
+
+# With iterative refinement (reject outliers)
+uv run protalign fixed.cif mobile.cif --refine --cutoff 2.0 --cycles 5
+
+# Output as PDB
+uv run protalign fixed.cif mobile.cif -o superposed.pdb
+
+# Batch alignment: multiple mobile files (outputs <stem>_superposed.cif)
+uv run protalign reference.cif mobile1.cif mobile2.cif mobile3.cif
+
+# Custom output suffix (e.g., <stem>_aligned.cif)
+uv run protalign reference.cif *.cif --output aligned
+
+# Batch with quaternary mode (e.g., for AlphaFold/Boltz multi-chain models)
+uv run protalign reference.cif *.cif --quaternary --output aligned
+```
+
+Batch mode:
+- Activated when multiple mobile files are provided
+- Outputs `<stem>_<suffix>.cif` for each mobile file
+- Reports progress and summary with RMSD values
+- Continues on errors
+
+## Usage
+
+```
+usage: protalign [-h] [--version] [-o OUTPUT] [--fixed-chain FIXED_CHAIN] [--mobile-chain MOBILE_CHAIN] [--refine] [--cycles CYCLES] [--cutoff CUTOFF] [--global] [--quaternary] [--distance-threshold DISTANCE_THRESHOLD] [--rename-chains] [--verbose]
+                 fixed mobile [mobile ...]
+
+Protein structure superposition tool
+
+positional arguments:
+  fixed                 Fixed structure file (PDB or mmCIF)
+  mobile                Mobile structure file(s) (PDB or mmCIF). If multiple files provided, batch mode is activated.
+
+options:
+  -h, --help            show this help message and exit
+  --version             show program's version number and exit
+  -o, --output OUTPUT   Output file (single mode) or suffix (batch mode) (default: superposed.cif)
+  --fixed-chain FIXED_CHAIN
+                        Chain ID for fixed structure (e.g., A). Also used as 'seed' chain in quaternary mode. If not specified, uses first protein chain.
+  --mobile-chain MOBILE_CHAIN
+                        Chain ID for mobile structure (e.g., A). Also used as 'seed' chain in quaternary mode. If not specified, uses first protein chain.
+  --refine              Use iterative refinement to reject outliers
+  --cycles CYCLES       Maximum refinement cycles (default: 5)
+  --cutoff CUTOFF       Outlier rejection cutoff (distance > cutoff * RMSD) (default: 2.0)
+  --global              Align all protein chains by matching chain IDs (A-A, B-B, etc.) and pooling coordinates
+  --quaternary          Quaternary alignment: match chains by proximity, rename to match fixed
+  --distance-threshold DISTANCE_THRESHOLD
+                        Distance threshold (Å) for chain matching in quaternary mode (default: 8.0)
+  --rename-chains       Rename mobile chains to match fixed (only with --quaternary)
+  --verbose             Enable verbose output (show refinement cycles, chain matching details)
+```
+
+### Output
+
+The tool reports:
+- Chain(s) and number of residues (single-chain mode)
+- Chains aligned and total pairs (global mode)
+- Number of aligned CA atom pairs
+- Final RMSD in Ångströms
+- If using `--refine`: number of pairs retained/rejected
+
+### Examples
+
+**Single-chain alignment:**
+```bash
+$ uv run protalign 9jn4.cif 9ebk.cif --refine
+```
+
+```
+Fixed: chain B, 213 residues
+Mobile: chain B, 219 residues
+Aligned: 207 CA atom pairs
+Refinement: 167 pairs retained, 40 rejected
+RMSD: 0.637 Å
+Superposed structure written to: superposed.cif
+```
+
+**Chain selection:**
+```bash
+$ uv run protalign 9jn4.cif 9ebk.cif --fixed-chain A --mobile-chain B
+```
+
+```
+Fixed: chain A, 213 residues
+Mobile: chain B, 219 residues
+Aligned: 207 CA atom pairs
+RMSD: 1.807 Å
+Superposed structure written to: superposed.cif
+```
+
+**Global multi-chain alignment:**
+```bash
+$ uv run protalign 9jn4.cif 9jn6.cif --global
+```
+
+```
+Chains: A, B, C, D
+Aligned: 850 CA atom pairs across 4 chains
+RMSD: 33.550 Å
+Superposed structure written to: superposed.cif
+```
+
+**Quaternary alignment (chain labels differ):**
+```bash
+$ uv run protalign 9jn4.cif 9jn6.cif --quaternary
+```
+
+```
+Quaternary alignment:
+B → B (matched)
+D → C (matched)
+A → A (matched)
+C → D (matched)
+Aligned: 850 CA pairs across 4 chain pairs
+RMSD: 0.180 Å
+Superposed structure written to: superposed.cif
+```
+
+**Verbose output (detailed progress):**
+```bash
+$ uv run protalign 9jn4.cif 9jn6.cif --quaternary --refine --verbose
+```
+
+```
+=== Quaternary Alignment ===
+Seed alignment: B → B
+Refinement cycles:
+Cycle 1: 213 pairs, RMSD = 0.110 Å
+Cycle 2: 205 pairs, RMSD = 0.101 Å
+Cycle 3: 197 pairs, RMSD = 0.093 Å
+Cycle 4: 195 pairs, RMSD = 0.092 Å
+Converged (no more outliers)
+Chain center distances after seed alignment:
+D ↔ C: 0.05 Å ✓
+D ↔ D: 33.91 Å ✗
+D ↔ A: 40.36 Å ✗
+A ↔ D: 17.00 Å ✗
+A ↔ A: 0.19 Å ✓
+C ↔ D: 0.23 Å ✓
+Quaternary alignment:
+B → B (matched)
+D → C (matched)
+A → A (matched)
+C → D (matched)
+Aligned: 850 CA pairs across 4 chain pairs
+=== Final Refinement ===
+Refinement cycles:
+Cycle 1: 850 pairs, RMSD = 0.180 Å
+Cycle 2: 831 pairs, RMSD = 0.161 Å
+Cycle 3: 813 pairs, RMSD = 0.155 Å
+Cycle 4: 803 pairs, RMSD = 0.152 Å
+Cycle 5: 801 pairs, RMSD = 0.151 Å
+Converged (no more outliers)
+RMSD: 0.151 Å
+Superposed structure written to: superposed.cif
+```
+
+**Batch alignment (multiple mobile structures):**
+```bash
+$ uv run protalign 9jn4.cif 9jn5.cif 9jn6.cif 9ebk.cif --fixed-chain D --mobile-chain A --output aligned
+```
+
+```
+Processing 1/3: 9jn5.cif
+Fixed: chain D, 212 residues
+Mobile: chain A, 211 residues
+Aligned: 211 CA atom pairs
+RMSD: 0.142 Å
+Output: 9jn5_aligned.cif
+
+Processing 2/3: 9jn6.cif
+Fixed: chain D, 212 residues
+Mobile: chain A, 214 residues
+Aligned: 212 CA atom pairs
+RMSD: 0.302 Å
+Output: 9jn6_aligned.cif
+
+Processing 3/3: 9ebk.cif
+Fixed: chain D, 212 residues
+Mobile: chain A, 219 residues
+Aligned: 207 CA atom pairs
+RMSD: 1.754 Å
+Output: 9ebk_aligned.cif
+
+================================================================================
+SUMMARY
+================================================================================
+Total: 3 | Successful: 3 | Failed: 0
+
+Successful alignments:
+9jn5.cif RMSD: 0.142 Å → 9jn5_aligned.cif
+9jn6.cif RMSD: 0.302 Å → 9jn6_aligned.cif
+9ebk.cif RMSD: 1.754 Å → 9ebk_aligned.cif
+```
+
+## Algorithm
+
+### Single-chain mode (default)
+1. **Load structures**: Reads PDB or mmCIF files
+2. **Extract chains**: Selects specified chain or first protein chain
+3. **Sequence alignment**: Aligns sequences using gemmi's implementation
+4. **Extract CA atoms**: Gets Cα coordinates from aligned residues
+5. **Superposition**: Applies Kabsch algorithm for optimal transformation
+6. **Refinement** (optional): Iteratively rejects outliers beyond `cutoff × RMSD`
+7. **Transform**: Applies transformation to entire mobile structure
+8. **Output**: Writes superposed structure in requested format
+
+### Global mode (`--global`)
+1. **Load structures**: Reads PDB or mmCIF files
+2. **Match chains**: Identifies common chain IDs (A-A, B-B, etc.)
+3. **Align per chain**: Sequence alignment for each chain pair
+4. **Pool coordinates**: Combines CA atoms from all matched chains
+5. **Single transformation**: Computes one transformation for all pooled coordinates
+6. **Refinement** (optional): Iteratively rejects outliers across all chains
+7. **Transform**: Applies transformation to entire mobile structure
+8. **Output**: Writes superposed structure in requested format
+
+### Quaternary mode (`--quaternary`)
+1. **Load structures**: Reads PDB or mmCIF files
+2. **Seed alignment**: Aligns specified or first chain pair with optional refinement
+3. **Proximity matching**: Transforms mobile copy, matches remaining chains by distance between chain centers
+4. **Pool coordinates**: Sequence aligns all matched chain pairs, pools CA atoms
+5. **Final transformation**: Computes transformation on pooled coords with optional refinement
+6. **Transform**: Applies transformation to mobile structure
+7. **Rename** (optional with `--rename-chains`): Renames mobile chains to match fixed
+8. **Output**: Writes superposed structure
+
+## Development
+
+### Setup
+```bash
+uv venv # Create virtual environment
+uv sync --group dev # Install with dev dependencies
+```
+
+### Testing
+```bash
+uv run pytest # Run all tests
+uv run pytest --cov # With coverage report
+```
+
+### Code Quality
+```bash
+uv run mypy src tests # Type checking (strict mode)
+uv run ruff check . # Linting
+uv run ruff format . # Auto-formatting
+```
+
+## Dependencies
+
+- **numpy** (≥1.26): Numerical operations
+- **gemmi** (≥0.7.4): Structure I/O and sequence alignment
+
+## Requirements
+
+- Python ≥3.12
+
+## License
+
+This project is licensed under the MIT License.
+
+## Contributing
+
+Contributions are welcome! Please open an issue or submit a pull request on GitHub.
+
+## Acknowledgements
+
+Thanks to the developers of `gemmi` for their excellent library. Coding was supported by `warp.dev`.
@@ -0,0 +1,319 @@
+# pyprotalign
+
+Protein structure superposition using sequence alignment and iterative refinement.
+
+## Features
+
+- **Sequence-based alignment**: Automatically identifies corresponding atoms via sequence alignment
+- **Kabsch algorithm**: Optimal least-squares superposition
+- **Iterative refinement**: Outlier rejection for improved accuracy
+- **Multi-chain support**:
+  - Single-chain alignment with specified or default chains
+  - Global alignment of all matching chains
+  - Quaternary alignment with smart chain matching by proximity
+- **Batch processing**: Align multiple mobile structures to a single reference
+
+## Installation
+
+### Using uv/pip
+```bash
+uv pip install pyprotalign
+```
+
+### From source
+```bash
+git clone https://github.com/ugSUBMARINE/pyprotalign.git
+cd pyprotalign
+uv venv
+uv sync
+```
+
+## Quick Start
+
+### CLI Tool
+
+```bash
+# Basic superposition (uses first protein chain from each structure)
+uv run protalign fixed.cif mobile.cif -o superposed.cif
+
+# Specify chains to align
+uv run protalign fixed.cif mobile.cif --fixed-chain A --mobile-chain B
+
+# Global alignment (align all matching chains: A-A, B-B, etc.)
+uv run protalign fixed.cif mobile.cif --global
+
+# Quaternary alignment (smart chain matching by proximity)
+uv run protalign fixed.cif mobile.cif --quaternary --distance-threshold 8.0
+
+# Quaternary alignment with chain renaming
+uv run protalign fixed.cif mobile.cif --quaternary --rename-chains
+
+# With iterative refinement (reject outliers)
+uv run protalign fixed.cif mobile.cif --refine --cutoff 2.0 --cycles 5
+
+# Output as PDB
+uv run protalign fixed.cif mobile.cif -o superposed.pdb
+
+# Batch alignment: multiple mobile files (outputs <stem>_superposed.cif)
+uv run protalign reference.cif mobile1.cif mobile2.cif mobile3.cif
+
+# Custom output suffix (e.g., <stem>_aligned.cif)
+uv run protalign reference.cif *.cif --output aligned
+
+# Batch with quaternary mode (e.g., for AlphaFold/Boltz multi-chain models)
+uv run protalign reference.cif *.cif --quaternary --output aligned
+```
+
+Batch mode:
+- Activated when multiple mobile files are provided
+- Outputs `<stem>_<suffix>.cif` for each mobile file
+- Reports progress and summary with RMSD values
+- Continues on errors
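The batch naming rule listed above can be illustrated with a short sketch. `batch_output_path` is a hypothetical helper written for this note, not a function from pyprotalign, and it assumes that only the file stem is reused and that `superposed` is the default suffix, as the Quick Start comments suggest.

```python
from pathlib import Path


def batch_output_path(mobile: str, suffix: str = "superposed") -> Path:
    """Build `<stem>_<suffix>.cif` for one mobile file (hypothetical helper)."""
    # Assumption: the input directory is not preserved, only the file stem.
    return Path(f"{Path(mobile).stem}_{suffix}.cif")


for name in ["9jn5.cif", "9jn6.cif", "9ebk.cif"]:
    print(name, "->", batch_output_path(name, "aligned"))
```

With the `aligned` suffix this yields the same file names as the batch example in the README (`9jn5_aligned.cif` and so on).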
+
+## Usage
+
+```
+usage: protalign [-h] [--version] [-o OUTPUT] [--fixed-chain FIXED_CHAIN] [--mobile-chain MOBILE_CHAIN] [--refine] [--cycles CYCLES] [--cutoff CUTOFF] [--global] [--quaternary] [--distance-threshold DISTANCE_THRESHOLD] [--rename-chains] [--verbose]
+                 fixed mobile [mobile ...]
+
+Protein structure superposition tool
+
+positional arguments:
+  fixed                 Fixed structure file (PDB or mmCIF)
+  mobile                Mobile structure file(s) (PDB or mmCIF). If multiple files provided, batch mode is activated.
+
+options:
+  -h, --help            show this help message and exit
+  --version             show program's version number and exit
+  -o, --output OUTPUT   Output file (single mode) or suffix (batch mode) (default: superposed.cif)
+  --fixed-chain FIXED_CHAIN
+                        Chain ID for fixed structure (e.g., A). Also used as 'seed' chain in quaternary mode. If not specified, uses first protein chain.
+  --mobile-chain MOBILE_CHAIN
+                        Chain ID for mobile structure (e.g., A). Also used as 'seed' chain in quaternary mode. If not specified, uses first protein chain.
+  --refine              Use iterative refinement to reject outliers
+  --cycles CYCLES       Maximum refinement cycles (default: 5)
+  --cutoff CUTOFF       Outlier rejection cutoff (distance > cutoff * RMSD) (default: 2.0)
+  --global              Align all protein chains by matching chain IDs (A-A, B-B, etc.) and pooling coordinates
+  --quaternary          Quaternary alignment: match chains by proximity, rename to match fixed
+  --distance-threshold DISTANCE_THRESHOLD
+                        Distance threshold (Å) for chain matching in quaternary mode (default: 8.0)
+  --rename-chains       Rename mobile chains to match fixed (only with --quaternary)
+  --verbose             Enable verbose output (show refinement cycles, chain matching details)
+```
+
+### Output
+
+The tool reports:
+- Chain(s) and number of residues (single-chain mode)
+- Chains aligned and total pairs (global mode)
+- Number of aligned CA atom pairs
+- Final RMSD in Ångströms
+- If using `--refine`: number of pairs retained/rejected
+
+### Examples
+
+**Single-chain alignment:**
+```bash
+$ uv run protalign 9jn4.cif 9ebk.cif --refine
+```
+
+```
+Fixed: chain B, 213 residues
+Mobile: chain B, 219 residues
+Aligned: 207 CA atom pairs
+Refinement: 167 pairs retained, 40 rejected
+RMSD: 0.637 Å
+Superposed structure written to: superposed.cif
+```
+
+**Chain selection:**
+```bash
+$ uv run protalign 9jn4.cif 9ebk.cif --fixed-chain A --mobile-chain B
+```
+
+```
+Fixed: chain A, 213 residues
+Mobile: chain B, 219 residues
+Aligned: 207 CA atom pairs
+RMSD: 1.807 Å
+Superposed structure written to: superposed.cif
+```
+
+**Global multi-chain alignment:**
+```bash
+$ uv run protalign 9jn4.cif 9jn6.cif --global
+```
+
+```
+Chains: A, B, C, D
+Aligned: 850 CA atom pairs across 4 chains
+RMSD: 33.550 Å
+Superposed structure written to: superposed.cif
+```
+
+**Quaternary alignment (chain labels differ):**
+```bash
+$ uv run protalign 9jn4.cif 9jn6.cif --quaternary
+```
+
+```
+Quaternary alignment:
+B → B (matched)
+D → C (matched)
+A → A (matched)
+C → D (matched)
+Aligned: 850 CA pairs across 4 chain pairs
+RMSD: 0.180 Å
+Superposed structure written to: superposed.cif
+```
+
+**Verbose output (detailed progress):**
+```bash
+$ uv run protalign 9jn4.cif 9jn6.cif --quaternary --refine --verbose
+```
+
+```
+=== Quaternary Alignment ===
+Seed alignment: B → B
+Refinement cycles:
+Cycle 1: 213 pairs, RMSD = 0.110 Å
+Cycle 2: 205 pairs, RMSD = 0.101 Å
+Cycle 3: 197 pairs, RMSD = 0.093 Å
+Cycle 4: 195 pairs, RMSD = 0.092 Å
+Converged (no more outliers)
+Chain center distances after seed alignment:
+D ↔ C: 0.05 Å ✓
+D ↔ D: 33.91 Å ✗
+D ↔ A: 40.36 Å ✗
+A ↔ D: 17.00 Å ✗
+A ↔ A: 0.19 Å ✓
+C ↔ D: 0.23 Å ✓
+Quaternary alignment:
+B → B (matched)
+D → C (matched)
+A → A (matched)
+C → D (matched)
+Aligned: 850 CA pairs across 4 chain pairs
+=== Final Refinement ===
+Refinement cycles:
+Cycle 1: 850 pairs, RMSD = 0.180 Å
+Cycle 2: 831 pairs, RMSD = 0.161 Å
+Cycle 3: 813 pairs, RMSD = 0.155 Å
+Cycle 4: 803 pairs, RMSD = 0.152 Å
+Cycle 5: 801 pairs, RMSD = 0.151 Å
+Converged (no more outliers)
+RMSD: 0.151 Å
+Superposed structure written to: superposed.cif
+```
+
+**Batch alignment (multiple mobile structures):**
+```bash
+$ uv run protalign 9jn4.cif 9jn5.cif 9jn6.cif 9ebk.cif --fixed-chain D --mobile-chain A --output aligned
+```
+
+```
+Processing 1/3: 9jn5.cif
+Fixed: chain D, 212 residues
+Mobile: chain A, 211 residues
+Aligned: 211 CA atom pairs
+RMSD: 0.142 Å
+Output: 9jn5_aligned.cif
+
+Processing 2/3: 9jn6.cif
+Fixed: chain D, 212 residues
+Mobile: chain A, 214 residues
+Aligned: 212 CA atom pairs
+RMSD: 0.302 Å
+Output: 9jn6_aligned.cif
+
+Processing 3/3: 9ebk.cif
+Fixed: chain D, 212 residues
+Mobile: chain A, 219 residues
+Aligned: 207 CA atom pairs
+RMSD: 1.754 Å
+Output: 9ebk_aligned.cif
+
+================================================================================
+SUMMARY
+================================================================================
+Total: 3 | Successful: 3 | Failed: 0
+
+Successful alignments:
+9jn5.cif RMSD: 0.142 Å → 9jn5_aligned.cif
+9jn6.cif RMSD: 0.302 Å → 9jn6_aligned.cif
+9ebk.cif RMSD: 1.754 Å → 9ebk_aligned.cif
+```
+
+## Algorithm
+
+### Single-chain mode (default)
+1. **Load structures**: Reads PDB or mmCIF files
+2. **Extract chains**: Selects specified chain or first protein chain
+3. **Sequence alignment**: Aligns sequences using gemmi's implementation
+4. **Extract CA atoms**: Gets Cα coordinates from aligned residues
+5. **Superposition**: Applies Kabsch algorithm for optimal transformation
+6. **Refinement** (optional): Iteratively rejects outliers beyond `cutoff × RMSD`
+7. **Transform**: Applies transformation to entire mobile structure
+8. **Output**: Writes superposed structure in requested format
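Steps 5 and 6 above can be sketched with NumPy. This is a minimal illustration of a Kabsch fit followed by `cutoff × RMSD` outlier rejection, not the code in the package's `kabsch.py` or `refine.py`. The function names, the boolean-mask bookkeeping, the three-pair minimum, and the choice to report the RMSD over retained pairs only are all assumptions made for the example.

```python
import numpy as np


def kabsch(mobile: np.ndarray, fixed: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Least-squares rotation R and translation t so that mobile @ R.T + t ~ fixed."""
    mob_center = mobile.mean(axis=0)
    fix_center = fixed.mean(axis=0)
    # 3x3 covariance matrix of the centered coordinate sets.
    cov = (mobile - mob_center).T @ (fixed - fix_center)
    u, _, vt = np.linalg.svd(cov)
    # Guard against an improper rotation (reflection): force det(R) = +1.
    sign = -1.0 if np.linalg.det(vt.T @ u.T) < 0 else 1.0
    rotation = vt.T @ np.diag([1.0, 1.0, sign]) @ u.T
    translation = fix_center - rotation @ mob_center
    return rotation, translation


def refine(mobile: np.ndarray, fixed: np.ndarray, cutoff: float = 2.0, cycles: int = 5):
    """Refit repeatedly, dropping pairs farther apart than cutoff * RMSD."""
    keep = np.ones(len(mobile), dtype=bool)
    for cycle in range(cycles):
        rotation, translation = kabsch(mobile[keep], fixed[keep])
        dist = np.linalg.norm(mobile @ rotation.T + translation - fixed, axis=1)
        rmsd = float(np.sqrt(np.mean(dist[keep] ** 2)))
        outliers = keep & (dist > cutoff * rmsd)
        # Stop when converged, on the last cycle, or if too few pairs would remain.
        if not outliers.any() or cycle == cycles - 1 or (keep & ~outliers).sum() < 3:
            break
        keep &= ~outliers
    return rotation, translation, rmsd, keep


# Synthetic demo data (not taken from any real structure).
rng = np.random.default_rng(0)
fixed = rng.normal(scale=10.0, size=(30, 3))
theta = np.deg2rad(40.0)
rot_z = np.array([
    [np.cos(theta), -np.sin(theta), 0.0],
    [np.sin(theta), np.cos(theta), 0.0],
    [0.0, 0.0, 1.0],
])
mobile = fixed @ rot_z.T + np.array([5.0, -3.0, 2.0]) + rng.normal(scale=0.05, size=(30, 3))
mobile[0] += np.array([10.0, 0.0, 0.0])  # one artificial outlier

rotation, translation, rmsd, keep = refine(mobile, fixed)
print(f"{keep.sum()} pairs retained, {(~keep).sum()} rejected, RMSD = {rmsd:.3f} Å")
```

The returned rotation and translation can then be applied to every atom of the mobile structure (step 7), since they were derived from the Cα pairs only.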
+
+### Global mode (`--global`)
+1. **Load structures**: Reads PDB or mmCIF files
+2. **Match chains**: Identifies common chain IDs (A-A, B-B, etc.)
+3. **Align per chain**: Sequence alignment for each chain pair
+4. **Pool coordinates**: Combines CA atoms from all matched chains
+5. **Single transformation**: Computes one transformation for all pooled coordinates
+6. **Refinement** (optional): Iteratively rejects outliers across all chains
+7. **Transform**: Applies transformation to entire mobile structure
+8. **Output**: Writes superposed structure in requested format
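Steps 2 and 4 of global mode (matching chain IDs and pooling coordinates) might look roughly like the following. The data layout, a dictionary from chain ID to a list of Cα coordinates, and the name `pool_common_chains` are assumptions for illustration. The sketch also assumes each chain pair has already been sequence-aligned, so paired lists have equal length and order.

```python
Coords = list[tuple[float, float, float]]


def pool_common_chains(
    fixed: dict[str, Coords], mobile: dict[str, Coords]
) -> tuple[list[str], Coords, Coords]:
    """Pool CA coordinates from chains whose IDs occur in both structures."""
    common = [chain_id for chain_id in fixed if chain_id in mobile]
    pooled_fixed: Coords = []
    pooled_mobile: Coords = []
    for chain_id in common:
        # Assumption: the pair was already sequence-aligned (equal length, same order).
        if len(fixed[chain_id]) != len(mobile[chain_id]):
            raise ValueError(f"chain {chain_id}: unequal number of aligned CA atoms")
        pooled_fixed.extend(fixed[chain_id])
        pooled_mobile.extend(mobile[chain_id])
    return common, pooled_fixed, pooled_mobile


fixed_chains = {"A": [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)], "B": [(5.0, 5.0, 5.0)]}
mobile_chains = {
    "B": [(6.0, 5.0, 5.0)],
    "A": [(0.0, 1.0, 0.0), (1.0, 1.0, 0.0)],
    "C": [(9.0, 9.0, 9.0)],  # no partner in the fixed structure, so it is ignored
}
chains, pooled_fixed, pooled_mobile = pool_common_chains(fixed_chains, mobile_chains)
print(chains, len(pooled_fixed), "pooled pairs")
```

Because pairing relies on identical chain IDs, two assemblies whose chains are labeled differently produce mismatched pairs and a large RMSD, as in the `--global` example in the README (33.550 Å), which is the case quaternary mode addresses.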
+
+### Quaternary mode (`--quaternary`)
+1. **Load structures**: Reads PDB or mmCIF files
+2. **Seed alignment**: Aligns specified or first chain pair with optional refinement
+3. **Proximity matching**: Transforms mobile copy, matches remaining chains by distance between chain centers
+4. **Pool coordinates**: Sequence aligns all matched chain pairs, pools CA atoms
+5. **Final transformation**: Computes transformation on pooled coords with optional refinement
+6. **Transform**: Applies transformation to mobile structure
+7. **Rename** (optional with `--rename-chains`): Renames mobile chains to match fixed
+8. **Output**: Writes superposed structure
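The proximity matching in step 3 can be sketched as a greedy nearest-center assignment. The README does not say how ties or competing candidates are resolved, so the greedy strategy, the mobile-to-fixed direction of the mapping, and the names `centroid` and `match_by_proximity` are assumptions. The 8.0 Å default mirrors `--distance-threshold`.

```python
import math

Point = tuple[float, float, float]


def centroid(coords: list[Point]) -> Point:
    """Geometric center of a list of 3D points."""
    n = len(coords)
    return (
        sum(c[0] for c in coords) / n,
        sum(c[1] for c in coords) / n,
        sum(c[2] for c in coords) / n,
    )


def match_by_proximity(
    fixed_centers: dict[str, Point],
    mobile_centers: dict[str, Point],
    threshold: float = 8.0,
) -> dict[str, str]:
    """Map each mobile chain to the closest unused fixed chain within threshold."""
    candidates = sorted(
        (math.dist(m_xyz, f_xyz), m_id, f_id)
        for m_id, m_xyz in mobile_centers.items()
        for f_id, f_xyz in fixed_centers.items()
    )
    matches: dict[str, str] = {}
    used_fixed: set[str] = set()
    for distance, m_id, f_id in candidates:
        if distance > threshold:
            break  # candidates are sorted, so all remaining pairs are too far apart
        if m_id in matches or f_id in used_fixed:
            continue
        matches[m_id] = f_id
        used_fixed.add(f_id)
    return matches


# Made-up chain centers, taken to be measured after the seed transformation.
fixed_centers = {"A": (0.0, 0.0, 0.0), "C": (30.0, 0.0, 0.0), "D": (0.0, 30.0, 0.0)}
mobile_centers = {
    "A": (0.2, 0.0, 0.0),
    "D": (30.05, 0.0, 0.0),
    "C": (0.0, 30.2, 0.0),
    "E": (100.0, 100.0, 100.0),  # too far from every fixed chain, stays unmatched
}
print(match_by_proximity(fixed_centers, mobile_centers))
print(centroid([(0.0, 0.0, 0.0), (2.0, 4.0, 6.0)]))
```

A mapping of this kind is also the information a rename step needs: each matched mobile chain can take the ID of its fixed partner.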
+
+## Development
+
+### Setup
+```bash
+uv venv # Create virtual environment
+uv sync --group dev # Install with dev dependencies
+```
+
+### Testing
+```bash
+uv run pytest # Run all tests
+uv run pytest --cov # With coverage report
+```
+
+### Code Quality
+```bash
+uv run mypy src tests # Type checking (strict mode)
+uv run ruff check . # Linting
+uv run ruff format . # Auto-formatting
+```
+
+## Dependencies
+
+- **numpy** (≥1.26): Numerical operations
+- **gemmi** (≥0.7.4): Structure I/O and sequence alignment
+
+## Requirements
+
+- Python ≥3.12
+
+## License
+
+This project is licensed under the MIT License.
+
+## Contributing
+
+Contributions are welcome! Please open an issue or submit a pull request on GitHub.
+
+## Acknowledgements
+
+Thanks to the developers of `gemmi` for their excellent library. Coding was supported by `warp.dev`.