pyprotalign 0.1.0__tar.gz

@@ -0,0 +1,10 @@
1
+ # Python-generated files
2
+ __pycache__/
3
+ *.py[oc]
4
+ build/
5
+ dist/
6
+ wheels/
7
+ *.egg-info
8
+
9
+ # Virtual environments
10
+ .venv
@@ -0,0 +1,21 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2026 ugSUBMARINE
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.
@@ -0,0 +1,344 @@
1
+ Metadata-Version: 2.4
2
+ Name: pyprotalign
3
+ Version: 0.1.0
4
+ Summary: Protein structure superposition package and CLI tool
5
+ Project-URL: Homepage, https://github.com/ugSUBMARINE/pyprotalign
6
+ Project-URL: Documentation, https://github.com/ugSUBMARINE/pyprotalign#readme
7
+ Project-URL: Repository, https://github.com/ugSUBMARINE/pyprotalign.git
8
+ Project-URL: Bug Tracker, https://github.com/ugSUBMARINE/pyprotalign/issues
9
+ Project-URL: Changelog, https://github.com/ugSUBMARINE/pyprotalign/releases
10
+ Author-email: Karl Gruber <gammaturn@gmail.com>
11
+ Maintainer-email: Karl Gruber <gammaturn@gmail.com>
12
+ License: MIT
13
+ License-File: LICENSE.txt
14
+ Keywords: alignment,cli,protein structure,structural bioinformatics,superposition
15
+ Classifier: License :: OSI Approved :: MIT License
16
+ Classifier: Operating System :: OS Independent
17
+ Classifier: Programming Language :: Python :: 3
18
+ Classifier: Programming Language :: Python :: 3.12
19
+ Classifier: Programming Language :: Python :: 3.13
20
+ Classifier: Programming Language :: Python :: 3.14
21
+ Requires-Python: >=3.12
22
+ Requires-Dist: gemmi>=0.7.4
23
+ Requires-Dist: numpy>=1.26
24
+ Description-Content-Type: text/markdown
25
+
26
+ # pyprotalign
27
+
28
+ Protein structure superposition using sequence alignment and iterative refinement.
29
+
30
+ ## Features
31
+
32
+ - **Sequence-based alignment**: Automatically identifies corresponding atoms via sequence alignment
33
+ - **Kabsch algorithm**: Optimal least-squares superposition
34
+ - **Iterative refinement**: Outlier rejection for improved accuracy
35
+ - **Multi-chain support**:
36
+ - Single-chain alignment with specified or default chains
37
+ - Global alignment of all matching chains
38
+ - Quaternary alignment with smart chain matching by proximity
39
+ - **Batch processing**: Align multiple mobile structures to a single reference
40
+
41
+ ## Installation
42
+
43
+ ### Using uv/pip
44
+ ```bash
45
+ uv pip install pyprotalign
46
+ ```
47
+
48
+ ### From source
49
+ ```bash
50
+ git clone https://github.com/ugSUBMARINE/pyprotalign.git
51
+ cd pyprotalign
52
+ uv venv
53
+ uv sync
54
+ ```
55
+
56
+ ## Quick Start
57
+
58
+ ### CLI Tool
59
+
60
+ ```bash
61
+ # Basic superposition (uses first protein chain from each structure)
62
+ uv run protalign fixed.cif mobile.cif -o superposed.cif
63
+
64
+ # Specify chains to align
65
+ uv run protalign fixed.cif mobile.cif --fixed-chain A --mobile-chain B
66
+
67
+ # Global alignment (align all matching chains: A-A, B-B, etc.)
68
+ uv run protalign fixed.cif mobile.cif --global
69
+
70
+ # Quaternary alignment (smart chain matching by proximity)
71
+ uv run protalign fixed.cif mobile.cif --quaternary --distance-threshold 8.0
72
+
73
+ # Quaternary alignment with chain renaming
74
+ uv run protalign fixed.cif mobile.cif --quaternary --rename-chains
75
+
76
+ # With iterative refinement (reject outliers)
77
+ uv run protalign fixed.cif mobile.cif --refine --cutoff 2.0 --cycles 5
78
+
79
+ # Output as PDB
80
+ uv run protalign fixed.cif mobile.cif -o superposed.pdb
81
+
82
+ # Batch alignment: multiple mobile files (outputs <stem>_superposed.cif)
83
+ uv run protalign reference.cif mobile1.cif mobile2.cif mobile3.cif
84
+
85
+ # Custom output suffix (e.g., <stem>_aligned.cif)
86
+ uv run protalign reference.cif *.cif --output aligned
87
+
88
+ # Batch with quaternary mode (e.g., for AlphaFold/Boltz multi-chain models)
89
+ uv run protalign reference.cif *.cif --quaternary --output aligned
90
+ ```
91
+
92
+ Batch mode:
93
+ - Activated when multiple mobile files are provided
94
+ - Outputs `<stem>_<suffix>.cif` for each mobile file
95
+ - Reports progress and summary with RMSD values
96
+ - Continues on errors
97
+
98
+ ## Usage
99
+
100
+ ```
101
+ usage: protalign [-h] [--version] [-o OUTPUT] [--fixed-chain FIXED_CHAIN] [--mobile-chain MOBILE_CHAIN] [--refine] [--cycles CYCLES] [--cutoff CUTOFF] [--global] [--quaternary] [--distance-threshold DISTANCE_THRESHOLD] [--rename-chains] [--verbose]
102
+ fixed mobile [mobile ...]
103
+
104
+ Protein structure superposition tool
105
+
106
+ positional arguments:
107
+ fixed Fixed structure file (PDB or mmCIF)
108
+ mobile Mobile structure file(s) (PDB or mmCIF). If multiple files provided, batch mode is activated.
109
+
110
+ options:
111
+ -h, --help show this help message and exit
112
+ --version show program's version number and exit
113
+ -o, --output OUTPUT Output file (single mode) or suffix (batch mode) (default: superposed.cif)
114
+ --fixed-chain FIXED_CHAIN
115
+ Chain ID for fixed structure (e.g., A). Also used as 'seed' chain in quaternary mode. If not specified, uses first protein chain.
116
+ --mobile-chain MOBILE_CHAIN
117
+ Chain ID for mobile structure (e.g., A). Also used as 'seed' chain in quaternary mode. If not specified, uses first protein chain.
118
+ --refine Use iterative refinement to reject outliers
119
+ --cycles CYCLES Maximum refinement cycles (default: 5)
120
+ --cutoff CUTOFF Outlier rejection cutoff (distance > cutoff * RMSD) (default: 2.0)
121
+ --global Align all protein chains by matching chain IDs (A-A, B-B, etc.) and pooling coordinates
122
+ --quaternary Quaternary alignment: match chains by proximity, rename to match fixed
123
+ --distance-threshold DISTANCE_THRESHOLD
124
+ Distance threshold (Å) for chain matching in quaternary mode (default: 8.0)
125
+ --rename-chains Rename mobile chains to match fixed (only with --quaternary)
126
+ --verbose Enable verbose output (show refinement cycles, chain matching details)
127
+ ```
128
+
129
+ ### Output
130
+
131
+ The tool reports:
132
+ - Chain(s) and number of residues (single-chain mode)
133
+ - Chains aligned and total pairs (global mode)
134
+ - Number of aligned CA atom pairs
135
+ - Final RMSD in Ångströms
136
+ - If using `--refine`: number of pairs retained/rejected
137
+
138
+ ### Examples
139
+
140
+ **Single-chain alignment:**
141
+ ```bash
142
+ $ uv run protalign 9jn4.cif 9ebk.cif --refine
143
+ ```
144
+
145
+ ```
146
+ Fixed: chain B, 213 residues
147
+ Mobile: chain B, 219 residues
148
+ Aligned: 207 CA atom pairs
149
+ Refinement: 167 pairs retained, 40 rejected
150
+ RMSD: 0.637 Å
151
+ Superposed structure written to: superposed.cif
152
+ ```
153
+
154
+ **Chain selection:**
155
+ ```bash
156
+ $ uv run protalign 9jn4.cif 9ebk.cif --fixed-chain A --mobile-chain B
157
+ ```
158
+
159
+ ```
160
+ Fixed: chain A, 213 residues
161
+ Mobile: chain B, 219 residues
162
+ Aligned: 207 CA atom pairs
163
+ RMSD: 1.807 Å
164
+ Superposed structure written to: superposed.cif
165
+ ```
166
+
167
+ **Global multi-chain alignment:**
168
+ ```bash
169
+ $ uv run protalign 9jn4.cif 9jn6.cif --global
170
+ ```
171
+
172
+ ```
173
+ Chains: A, B, C, D
174
+ Aligned: 850 CA atom pairs across 4 chains
175
+ RMSD: 33.550 Å
176
+ Superposed structure written to: superposed.cif
177
+ ```
178
+
179
+ **Quaternary alignment (chain labels differ):**
180
+ ```bash
181
+ $ uv run protalign 9jn4.cif 9jn6.cif --quaternary
182
+ ```
183
+
184
+ ```
185
+ Quaternary alignment:
186
+ B → B (matched)
187
+ D → C (matched)
188
+ A → A (matched)
189
+ C → D (matched)
190
+ Aligned: 850 CA pairs across 4 chain pairs
191
+ RMSD: 0.180 Å
192
+ Superposed structure written to: superposed.cif
193
+ ```
194
+
195
+ **Verbose output (detailed progress):**
196
+ ```bash
197
+ $ uv run protalign 9jn4.cif 9jn6.cif --quaternary --refine --verbose
198
+ ```
199
+
200
+ ```
201
+ === Quaternary Alignment ===
202
+ Seed alignment: B → B
203
+ Refinement cycles:
204
+ Cycle 1: 213 pairs, RMSD = 0.110 Å
205
+ Cycle 2: 205 pairs, RMSD = 0.101 Å
206
+ Cycle 3: 197 pairs, RMSD = 0.093 Å
207
+ Cycle 4: 195 pairs, RMSD = 0.092 Å
208
+ Converged (no more outliers)
209
+ Chain center distances after seed alignment:
210
+ D ↔ C: 0.05 Å ✓
211
+ D ↔ D: 33.91 Å ✗
212
+ D ↔ A: 40.36 Å ✗
213
+ A ↔ D: 17.00 Å ✗
214
+ A ↔ A: 0.19 Å ✓
215
+ C ↔ D: 0.23 Å ✓
216
+ Quaternary alignment:
217
+ B → B (matched)
218
+ D → C (matched)
219
+ A → A (matched)
220
+ C → D (matched)
221
+ Aligned: 850 CA pairs across 4 chain pairs
222
+ === Final Refinement ===
223
+ Refinement cycles:
224
+ Cycle 1: 850 pairs, RMSD = 0.180 Å
225
+ Cycle 2: 831 pairs, RMSD = 0.161 Å
226
+ Cycle 3: 813 pairs, RMSD = 0.155 Å
227
+ Cycle 4: 803 pairs, RMSD = 0.152 Å
228
+ Cycle 5: 801 pairs, RMSD = 0.151 Å
229
+ Converged (no more outliers)
230
+ RMSD: 0.151 Å
231
+ Superposed structure written to: superposed.cif
232
+ ```
233
+
234
+ **Batch alignment (multiple mobile structures):**
235
+ ```bash
236
+ $ uv run protalign 9jn4.cif 9jn5.cif 9jn6.cif 9ebk.cif --fixed-chain D --mobile-chain A --output aligned
237
+ ```
238
+
239
+ ```
240
+ Processing 1/3: 9jn5.cif
241
+ Fixed: chain D, 212 residues
242
+ Mobile: chain A, 211 residues
243
+ Aligned: 211 CA atom pairs
244
+ RMSD: 0.142 Å
245
+ Output: 9jn5_aligned.cif
246
+
247
+ Processing 2/3: 9jn6.cif
248
+ Fixed: chain D, 212 residues
249
+ Mobile: chain A, 214 residues
250
+ Aligned: 212 CA atom pairs
251
+ RMSD: 0.302 Å
252
+ Output: 9jn6_aligned.cif
253
+
254
+ Processing 3/3: 9ebk.cif
255
+ Fixed: chain D, 212 residues
256
+ Mobile: chain A, 219 residues
257
+ Aligned: 207 CA atom pairs
258
+ RMSD: 1.754 Å
259
+ Output: 9ebk_aligned.cif
260
+
261
+ ================================================================================
262
+ SUMMARY
263
+ ================================================================================
264
+ Total: 3 | Successful: 3 | Failed: 0
265
+
266
+ Successful alignments:
267
+ 9jn5.cif RMSD: 0.142 Å → 9jn5_aligned.cif
268
+ 9jn6.cif RMSD: 0.302 Å → 9jn6_aligned.cif
269
+ 9ebk.cif RMSD: 1.754 Å → 9ebk_aligned.cif
270
+ ```
271
+
272
+ ## Algorithm
273
+
274
+ ### Single-chain mode (default)
275
+ 1. **Load structures**: Reads PDB or mmCIF files
276
+ 2. **Extract chains**: Selects specified chain or first protein chain
277
+ 3. **Sequence alignment**: Aligns sequences using gemmi's implementation
278
+ 4. **Extract CA atoms**: Gets Cα coordinates from aligned residues
279
+ 5. **Superposition**: Applies Kabsch algorithm for optimal transformation
280
+ 6. **Refinement** (optional): Iteratively rejects outliers beyond `cutoff × RMSD`
281
+ 7. **Transform**: Applies transformation to entire mobile structure
282
+ 8. **Output**: Writes superposed structure in requested format
283
+
284
+ ### Global mode (`--global`)
285
+ 1. **Load structures**: Reads PDB or mmCIF files
286
+ 2. **Match chains**: Identifies common chain IDs (A-A, B-B, etc.)
287
+ 3. **Align per chain**: Sequence alignment for each chain pair
288
+ 4. **Pool coordinates**: Combines CA atoms from all matched chains
289
+ 5. **Single transformation**: Computes one transformation for all pooled coordinates
290
+ 6. **Refinement** (optional): Iteratively rejects outliers across all chains
291
+ 7. **Transform**: Applies transformation to entire mobile structure
292
+ 8. **Output**: Writes superposed structure in requested format
293
+
294
+ ### Quaternary mode (`--quaternary`)
295
+ 1. **Load structures**: Reads PDB or mmCIF files
296
+ 2. **Seed alignment**: Aligns specified or first chain pair with optional refinement
297
+ 3. **Proximity matching**: Transforms mobile copy, matches remaining chains by distance between chain centers
298
+ 4. **Pool coordinates**: Sequence aligns all matched chain pairs, pools CA atoms
299
+ 5. **Final transformation**: Computes transformation on pooled coords with optional refinement
300
+ 6. **Transform**: Applies transformation to mobile structure
301
+ 7. **Rename** (optional with `--rename-chains`): Renames mobile chains to match fixed
302
+ 8. **Output**: Writes superposed structure
303
+
304
+ ## Development
305
+
306
+ ### Setup
307
+ ```bash
308
+ uv venv # Create virtual environment
309
+ uv sync --group dev # Install with dev dependencies
310
+ ```
311
+
312
+ ### Testing
313
+ ```bash
314
+ uv run pytest # Run all tests
315
+ uv run pytest --cov # With coverage report
316
+ ```
317
+
318
+ ### Code Quality
319
+ ```bash
320
+ uv run mypy src tests # Type checking (strict mode)
321
+ uv run ruff check . # Linting
322
+ uv run ruff format . # Auto-formatting
323
+ ```
324
+
325
+ ## Dependencies
326
+
327
+ - **numpy** (≥1.26): Numerical operations
328
+ - **gemmi** (≥0.7.4): Structure I/O and sequence alignment
329
+
330
+ ## Requirements
331
+
332
+ - Python ≥3.12
333
+
334
+ ## License
335
+
336
+ This project is licensed under the MIT License.
337
+
338
+ ## Contributing
339
+
340
+ Contributions are welcome! Please open an issue or submit a pull request on GitHub.
341
+
342
+ ## Acknowledgements
343
+
344
+ Thanks to the developers of `gemmi` for their excellent library. Coding was supported by `warp.dev`.
@@ -0,0 +1,319 @@
1
+ # pyprotalign
2
+
3
+ Protein structure superposition using sequence alignment and iterative refinement.
4
+
5
+ ## Features
6
+
7
+ - **Sequence-based alignment**: Automatically identifies corresponding atoms via sequence alignment
8
+ - **Kabsch algorithm**: Optimal least-squares superposition
9
+ - **Iterative refinement**: Outlier rejection for improved accuracy
10
+ - **Multi-chain support**:
11
+ - Single-chain alignment with specified or default chains
12
+ - Global alignment of all matching chains
13
+ - Quaternary alignment with smart chain matching by proximity
14
+ - **Batch processing**: Align multiple mobile structures to a single reference
15
+
16
+ ## Installation
17
+
18
+ ### Using uv/pip
19
+ ```bash
20
+ uv pip install pyprotalign
21
+ ```
22
+
23
+ ### From source
24
+ ```bash
25
+ git clone https://github.com/ugSUBMARINE/pyprotalign.git
26
+ cd pyprotalign
27
+ uv venv
28
+ uv sync
29
+ ```
30
+
31
+ ## Quick Start
32
+
33
+ ### CLI Tool
34
+
35
+ ```bash
36
+ # Basic superposition (uses first protein chain from each structure)
37
+ uv run protalign fixed.cif mobile.cif -o superposed.cif
38
+
39
+ # Specify chains to align
40
+ uv run protalign fixed.cif mobile.cif --fixed-chain A --mobile-chain B
41
+
42
+ # Global alignment (align all matching chains: A-A, B-B, etc.)
43
+ uv run protalign fixed.cif mobile.cif --global
44
+
45
+ # Quaternary alignment (smart chain matching by proximity)
46
+ uv run protalign fixed.cif mobile.cif --quaternary --distance-threshold 8.0
47
+
48
+ # Quaternary alignment with chain renaming
49
+ uv run protalign fixed.cif mobile.cif --quaternary --rename-chains
50
+
51
+ # With iterative refinement (reject outliers)
52
+ uv run protalign fixed.cif mobile.cif --refine --cutoff 2.0 --cycles 5
53
+
54
+ # Output as PDB
55
+ uv run protalign fixed.cif mobile.cif -o superposed.pdb
56
+
57
+ # Batch alignment: multiple mobile files (outputs <stem>_superposed.cif)
58
+ uv run protalign reference.cif mobile1.cif mobile2.cif mobile3.cif
59
+
60
+ # Custom output suffix (e.g., <stem>_aligned.cif)
61
+ uv run protalign reference.cif *.cif --output aligned
62
+
63
+ # Batch with quaternary mode (e.g., for AlphaFold/Boltz multi-chain models)
64
+ uv run protalign reference.cif *.cif --quaternary --output aligned
65
+ ```
66
+
67
+ Batch mode:
68
+ - Activated when multiple mobile files are provided
69
+ - Outputs `<stem>_<suffix>.cif` for each mobile file
70
+ - Reports progress and summary with RMSD values
71
+ - Continues on errors
72
+
73
+ ## Usage
74
+
75
+ ```
76
+ usage: protalign [-h] [--version] [-o OUTPUT] [--fixed-chain FIXED_CHAIN] [--mobile-chain MOBILE_CHAIN] [--refine] [--cycles CYCLES] [--cutoff CUTOFF] [--global] [--quaternary] [--distance-threshold DISTANCE_THRESHOLD] [--rename-chains] [--verbose]
77
+ fixed mobile [mobile ...]
78
+
79
+ Protein structure superposition tool
80
+
81
+ positional arguments:
82
+ fixed Fixed structure file (PDB or mmCIF)
83
+ mobile Mobile structure file(s) (PDB or mmCIF). If multiple files provided, batch mode is activated.
84
+
85
+ options:
86
+ -h, --help show this help message and exit
87
+ --version show program's version number and exit
88
+ -o, --output OUTPUT Output file (single mode) or suffix (batch mode) (default: superposed.cif)
89
+ --fixed-chain FIXED_CHAIN
90
+ Chain ID for fixed structure (e.g., A). Also used as 'seed' chain in quaternary mode. If not specified, uses first protein chain.
91
+ --mobile-chain MOBILE_CHAIN
92
+ Chain ID for mobile structure (e.g., A). Also used as 'seed' chain in quaternary mode. If not specified, uses first protein chain.
93
+ --refine Use iterative refinement to reject outliers
94
+ --cycles CYCLES Maximum refinement cycles (default: 5)
95
+ --cutoff CUTOFF Outlier rejection cutoff (distance > cutoff * RMSD) (default: 2.0)
96
+ --global Align all protein chains by matching chain IDs (A-A, B-B, etc.) and pooling coordinates
97
+ --quaternary Quaternary alignment: match chains by proximity, rename to match fixed
98
+ --distance-threshold DISTANCE_THRESHOLD
99
+ Distance threshold (Å) for chain matching in quaternary mode (default: 8.0)
100
+ --rename-chains Rename mobile chains to match fixed (only with --quaternary)
101
+ --verbose Enable verbose output (show refinement cycles, chain matching details)
102
+ ```
103
+
104
+ ### Output
105
+
106
+ The tool reports:
107
+ - Chain(s) and number of residues (single-chain mode)
108
+ - Chains aligned and total pairs (global mode)
109
+ - Number of aligned CA atom pairs
110
+ - Final RMSD in Ångströms
111
+ - If using `--refine`: number of pairs retained/rejected
112
+
113
+ ### Examples
114
+
115
+ **Single-chain alignment:**
116
+ ```bash
117
+ $ uv run protalign 9jn4.cif 9ebk.cif --refine
118
+ ```
119
+
120
+ ```
121
+ Fixed: chain B, 213 residues
122
+ Mobile: chain B, 219 residues
123
+ Aligned: 207 CA atom pairs
124
+ Refinement: 167 pairs retained, 40 rejected
125
+ RMSD: 0.637 Å
126
+ Superposed structure written to: superposed.cif
127
+ ```
128
+
129
+ **Chain selection:**
130
+ ```bash
131
+ $ uv run protalign 9jn4.cif 9ebk.cif --fixed-chain A --mobile-chain B
132
+ ```
133
+
134
+ ```
135
+ Fixed: chain A, 213 residues
136
+ Mobile: chain B, 219 residues
137
+ Aligned: 207 CA atom pairs
138
+ RMSD: 1.807 Å
139
+ Superposed structure written to: superposed.cif
140
+ ```
141
+
142
+ **Global multi-chain alignment:**
143
+ ```bash
144
+ $ uv run protalign 9jn4.cif 9jn6.cif --global
145
+ ```
146
+
147
+ ```
148
+ Chains: A, B, C, D
149
+ Aligned: 850 CA atom pairs across 4 chains
150
+ RMSD: 33.550 Å
151
+ Superposed structure written to: superposed.cif
152
+ ```
153
+
154
+ **Quaternary alignment (chain labels differ):**
155
+ ```bash
156
+ $ uv run protalign 9jn4.cif 9jn6.cif --quaternary
157
+ ```
158
+
159
+ ```
160
+ Quaternary alignment:
161
+ B → B (matched)
162
+ D → C (matched)
163
+ A → A (matched)
164
+ C → D (matched)
165
+ Aligned: 850 CA pairs across 4 chain pairs
166
+ RMSD: 0.180 Å
167
+ Superposed structure written to: superposed.cif
168
+ ```
169
+
170
+ **Verbose output (detailed progress):**
171
+ ```bash
172
+ $ uv run protalign 9jn4.cif 9jn6.cif --quaternary --refine --verbose
173
+ ```
174
+
175
+ ```
176
+ === Quaternary Alignment ===
177
+ Seed alignment: B → B
178
+ Refinement cycles:
179
+ Cycle 1: 213 pairs, RMSD = 0.110 Å
180
+ Cycle 2: 205 pairs, RMSD = 0.101 Å
181
+ Cycle 3: 197 pairs, RMSD = 0.093 Å
182
+ Cycle 4: 195 pairs, RMSD = 0.092 Å
183
+ Converged (no more outliers)
184
+ Chain center distances after seed alignment:
185
+ D ↔ C: 0.05 Å ✓
186
+ D ↔ D: 33.91 Å ✗
187
+ D ↔ A: 40.36 Å ✗
188
+ A ↔ D: 17.00 Å ✗
189
+ A ↔ A: 0.19 Å ✓
190
+ C ↔ D: 0.23 Å ✓
191
+ Quaternary alignment:
192
+ B → B (matched)
193
+ D → C (matched)
194
+ A → A (matched)
195
+ C → D (matched)
196
+ Aligned: 850 CA pairs across 4 chain pairs
197
+ === Final Refinement ===
198
+ Refinement cycles:
199
+ Cycle 1: 850 pairs, RMSD = 0.180 Å
200
+ Cycle 2: 831 pairs, RMSD = 0.161 Å
201
+ Cycle 3: 813 pairs, RMSD = 0.155 Å
202
+ Cycle 4: 803 pairs, RMSD = 0.152 Å
203
+ Cycle 5: 801 pairs, RMSD = 0.151 Å
204
+ Converged (no more outliers)
205
+ RMSD: 0.151 Å
206
+ Superposed structure written to: superposed.cif
207
+ ```
208
+
209
+ **Batch alignment (multiple mobile structures):**
210
+ ```bash
211
+ $ uv run protalign 9jn4.cif 9jn5.cif 9jn6.cif 9ebk.cif --fixed-chain D --mobile-chain A --output aligned
212
+ ```
213
+
214
+ ```
215
+ Processing 1/3: 9jn5.cif
216
+ Fixed: chain D, 212 residues
217
+ Mobile: chain A, 211 residues
218
+ Aligned: 211 CA atom pairs
219
+ RMSD: 0.142 Å
220
+ Output: 9jn5_aligned.cif
221
+
222
+ Processing 2/3: 9jn6.cif
223
+ Fixed: chain D, 212 residues
224
+ Mobile: chain A, 214 residues
225
+ Aligned: 212 CA atom pairs
226
+ RMSD: 0.302 Å
227
+ Output: 9jn6_aligned.cif
228
+
229
+ Processing 3/3: 9ebk.cif
230
+ Fixed: chain D, 212 residues
231
+ Mobile: chain A, 219 residues
232
+ Aligned: 207 CA atom pairs
233
+ RMSD: 1.754 Å
234
+ Output: 9ebk_aligned.cif
235
+
236
+ ================================================================================
237
+ SUMMARY
238
+ ================================================================================
239
+ Total: 3 | Successful: 3 | Failed: 0
240
+
241
+ Successful alignments:
242
+ 9jn5.cif RMSD: 0.142 Å → 9jn5_aligned.cif
243
+ 9jn6.cif RMSD: 0.302 Å → 9jn6_aligned.cif
244
+ 9ebk.cif RMSD: 1.754 Å → 9ebk_aligned.cif
245
+ ```
246
+
247
+ ## Algorithm
248
+
249
+ ### Single-chain mode (default)
250
+ 1. **Load structures**: Reads PDB or mmCIF files
251
+ 2. **Extract chains**: Selects specified chain or first protein chain
252
+ 3. **Sequence alignment**: Aligns sequences using gemmi's implementation
253
+ 4. **Extract CA atoms**: Gets Cα coordinates from aligned residues
254
+ 5. **Superposition**: Applies Kabsch algorithm for optimal transformation
255
+ 6. **Refinement** (optional): Iteratively rejects outliers beyond `cutoff × RMSD`
256
+ 7. **Transform**: Applies transformation to entire mobile structure
257
+ 8. **Output**: Writes superposed structure in requested format
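Steps 5 and 6 above can be sketched with NumPy alone. This is an illustrative sketch under stated assumptions, not pyprotalign's actual implementation: the function names `kabsch` and `refine_superposition` are hypothetical, and the inputs are assumed to be already-paired `(N, 3)` arrays of Cα coordinates. The defaults mirror the documented `--cutoff 2.0` and `--cycles 5`.

```python
import numpy as np


def kabsch(fixed: np.ndarray, mobile: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Return (rotation, translation) that best maps mobile onto fixed.

    Both inputs are (N, 3) arrays of paired coordinates (hypothetical helper).
    """
    fixed_center = fixed.mean(axis=0)
    mobile_center = mobile.mean(axis=0)
    # Covariance matrix of the centred coordinate sets.
    h = (mobile - mobile_center).T @ (fixed - fixed_center)
    u, _, vt = np.linalg.svd(h)
    # Flip the last axis if needed so the result is a proper rotation.
    d = 1.0 if np.linalg.det(vt.T @ u.T) >= 0 else -1.0
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    translation = fixed_center - rotation @ mobile_center
    return rotation, translation


def _distances(fixed, mobile, rotation, translation):
    """Per-pair distances after applying the transformation to mobile."""
    return np.linalg.norm(mobile @ rotation.T + translation - fixed, axis=1)


def refine_superposition(fixed, mobile, cutoff=2.0, cycles=5):
    """Iteratively drop pairs farther apart than cutoff * RMSD, then refit."""
    mask = np.ones(len(fixed), dtype=bool)
    rotation, translation = kabsch(fixed, mobile)
    for _ in range(cycles):
        dist = _distances(fixed, mobile, rotation, translation)
        rmsd = np.sqrt(np.mean(dist[mask] ** 2))
        keep = mask & (dist <= cutoff * rmsd)
        # Stop when nothing is rejected or too few pairs would remain.
        if keep.sum() == mask.sum() or keep.sum() < 3:
            break
        mask = keep
        rotation, translation = kabsch(fixed[mask], mobile[mask])
    dist = _distances(fixed, mobile, rotation, translation)
    rmsd = float(np.sqrt(np.mean(dist[mask] ** 2)))
    return rotation, translation, rmsd, mask
```

The returned transformation is applied to row-vector coordinates as `coords @ rotation.T + translation`, which is how step 7 would move every atom of the mobile structure, not only the Cα atoms used for the fit.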
258
+
259
+ ### Global mode (`--global`)
260
+ 1. **Load structures**: Reads PDB or mmCIF files
261
+ 2. **Match chains**: Identifies common chain IDs (A-A, B-B, etc.)
262
+ 3. **Align per chain**: Sequence alignment for each chain pair
263
+ 4. **Pool coordinates**: Combines CA atoms from all matched chains
264
+ 5. **Single transformation**: Computes one transformation for all pooled coordinates
265
+ 6. **Refinement** (optional): Iteratively rejects outliers across all chains
266
+ 7. **Transform**: Applies transformation to entire mobile structure
267
+ 8. **Output**: Writes superposed structure in requested format
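The chain-matching and pooling steps of global mode (steps 2 and 4) can be illustrated as follows. This is a minimal sketch, assuming the per-chain sequence alignment has already produced paired Cα arrays; `pool_matched_chains` and its dictionary inputs are hypothetical names, not part of the pyprotalign API.

```python
import numpy as np


def pool_matched_chains(
    fixed: dict[str, np.ndarray],
    mobile: dict[str, np.ndarray],
) -> tuple[list[str], np.ndarray, np.ndarray]:
    """Pool paired coordinates of all chains whose IDs occur in both structures.

    Each dictionary maps a chain ID to an (N, 3) array; arrays for the same
    chain ID are assumed to be already paired residue by residue.
    """
    common = sorted(fixed.keys() & mobile.keys())
    if not common:
        raise ValueError("no chain IDs in common")
    for chain_id in common:
        if fixed[chain_id].shape != mobile[chain_id].shape:
            raise ValueError(f"chain {chain_id}: coordinate arrays are not paired")
    pooled_fixed = np.concatenate([fixed[c] for c in common])
    pooled_mobile = np.concatenate([mobile[c] for c in common])
    return common, pooled_fixed, pooled_mobile
```

Because the pooled arrays feed a single rigid transformation, pairing by chain ID only works well when equivalent chains carry the same label in both files. Otherwise quaternary mode is the better fit.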
268
+
269
+ ### Quaternary mode (`--quaternary`)
270
+ 1. **Load structures**: Reads PDB or mmCIF files
271
+ 2. **Seed alignment**: Aligns specified or first chain pair with optional refinement
272
+ 3. **Proximity matching**: Transforms mobile copy, matches remaining chains by distance between chain centers
273
+ 4. **Pool coordinates**: Sequence aligns all matched chain pairs, pools CA atoms
274
+ 5. **Final transformation**: Computes transformation on pooled coords with optional refinement
275
+ 6. **Transform**: Applies transformation to mobile structure
276
+ 7. **Rename** (optional with `--rename-chains`): Renames mobile chains to match fixed
277
+ 8. **Output**: Writes superposed structure
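The proximity matching in step 3 can be sketched as a greedy pairing of chain centers. The code below is an assumption about how such matching could work, not the package's own logic: `match_chains_by_proximity` is a hypothetical name, the mobile centers are assumed to be already transformed by the seed alignment, and the default threshold follows the documented `--distance-threshold 8.0`.

```python
import numpy as np


def match_chains_by_proximity(
    fixed_centers: dict[str, np.ndarray],
    mobile_centers: dict[str, np.ndarray],
    threshold: float = 8.0,
) -> dict[str, str]:
    """Greedily pair mobile chains with fixed chains by center distance.

    Returns a mapping {mobile_chain_id: fixed_chain_id}. Chains with no
    partner within `threshold` (in Å) stay unmatched.
    """
    # All candidate pairs, closest first.
    candidates = sorted(
        (float(np.linalg.norm(f_center - m_center)), fixed_id, mobile_id)
        for fixed_id, f_center in fixed_centers.items()
        for mobile_id, m_center in mobile_centers.items()
    )
    matches: dict[str, str] = {}
    used_fixed: set[str] = set()
    for distance, fixed_id, mobile_id in candidates:
        if distance > threshold:
            break  # every remaining pair is even farther apart
        if mobile_id in matches or fixed_id in used_fixed:
            continue  # each chain may be used only once
        matches[mobile_id] = fixed_id
        used_fixed.add(fixed_id)
    return matches
```

The resulting mapping could then drive both the pooled final fit (steps 4 and 5) and the optional renaming in step 7.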
278
+
279
+ ## Development
280
+
281
+ ### Setup
282
+ ```bash
283
+ uv venv # Create virtual environment
284
+ uv sync --group dev # Install with dev dependencies
285
+ ```
286
+
287
+ ### Testing
288
+ ```bash
289
+ uv run pytest # Run all tests
290
+ uv run pytest --cov # With coverage report
291
+ ```
292
+
293
+ ### Code Quality
294
+ ```bash
295
+ uv run mypy src tests # Type checking (strict mode)
296
+ uv run ruff check . # Linting
297
+ uv run ruff format . # Auto-formatting
298
+ ```
299
+
300
+ ## Dependencies
301
+
302
+ - **numpy** (≥1.26): Numerical operations
303
+ - **gemmi** (≥0.7.4): Structure I/O and sequence alignment
304
+
305
+ ## Requirements
306
+
307
+ - Python ≥3.12
308
+
309
+ ## License
310
+
311
+ This project is licensed under the MIT License.
312
+
313
+ ## Contributing
314
+
315
+ Contributions are welcome! Please open an issue or submit a pull request on GitHub.
316
+
317
+ ## Acknowledgements
318
+
319
+ Thanks to the developers of `gemmi` for their excellent library. Coding was supported by `warp.dev`.