molscope-0.6.0.tar.gz
- molscope-0.6.0/LICENSE +21 -0
- molscope-0.6.0/PKG-INFO +335 -0
- molscope-0.6.0/README.md +308 -0
- molscope-0.6.0/molscope/__init__.py +53 -0
- molscope-0.6.0/molscope/__main__.py +3 -0
- molscope-0.6.0/molscope/cli.py +74 -0
- molscope-0.6.0/molscope/coarsegrain.py +411 -0
- molscope-0.6.0/molscope/contactmap.py +116 -0
- molscope-0.6.0/molscope/descriptors.py +232 -0
- molscope-0.6.0/molscope/elements.py +75 -0
- molscope-0.6.0/molscope/ensemble.py +143 -0
- molscope-0.6.0/molscope/graph.py +151 -0
- molscope-0.6.0/molscope/io.py +342 -0
- molscope-0.6.0/molscope/molecule.py +502 -0
- molscope-0.6.0/molscope/plotting.py +191 -0
- molscope-0.6.0/molscope.egg-info/PKG-INFO +335 -0
- molscope-0.6.0/molscope.egg-info/SOURCES.txt +29 -0
- molscope-0.6.0/molscope.egg-info/dependency_links.txt +1 -0
- molscope-0.6.0/molscope.egg-info/entry_points.txt +2 -0
- molscope-0.6.0/molscope.egg-info/requires.txt +14 -0
- molscope-0.6.0/molscope.egg-info/top_level.txt +1 -0
- molscope-0.6.0/pyproject.toml +65 -0
- molscope-0.6.0/setup.cfg +4 -0
- molscope-0.6.0/tests/test_clustering.py +94 -0
- molscope-0.6.0/tests/test_coarsegrain.py +214 -0
- molscope-0.6.0/tests/test_contactmap.py +87 -0
- molscope-0.6.0/tests/test_descriptors.py +73 -0
- molscope-0.6.0/tests/test_features.py +227 -0
- molscope-0.6.0/tests/test_graph.py +99 -0
- molscope-0.6.0/tests/test_io.py +130 -0
- molscope-0.6.0/tests/test_molecule.py +108 -0
molscope-0.6.0/LICENSE
ADDED
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2026 Roshan Shrestha

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
molscope-0.6.0/PKG-INFO
ADDED
@@ -0,0 +1,335 @@
Metadata-Version: 2.4
Name: molscope
Version: 0.6.0
Summary: Lightweight molecular structure analysis, visualisation, graph export, and coarse-graining in Python.
Author-email: Roshan Shrestha <roshanpra@gmail.com>
License-Expression: MIT
Project-URL: Homepage, https://github.com/roshan2004/molscope
Project-URL: Repository, https://github.com/roshan2004/molscope
Keywords: chemistry,molecular-structure,pdb,xyz,mmcif,coarse-graining,molecular-graphs,machine-learning,visualization
Classifier: Programming Language :: Python :: 3
Classifier: Topic :: Scientific/Engineering :: Chemistry
Classifier: Topic :: Scientific/Engineering :: Visualization
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: numpy>=1.21
Requires-Dist: matplotlib>=3.5
Provides-Extra: test
Requires-Dist: pytest>=7; extra == "test"
Provides-Extra: fast
Requires-Dist: scipy>=1.7; extra == "fast"
Provides-Extra: viz
Requires-Dist: py3Dmol>=2.0; extra == "viz"
Provides-Extra: graph
Requires-Dist: networkx>=2.6; extra == "graph"
Dynamic: license-file

# MolScope

[](https://github.com/roshan2004/molscope/actions/workflows/ci.yml)
[](pyproject.toml)
[](LICENSE)
[](https://github.com/astral-sh/ruff)

Lightweight molecular structure analysis, visualisation, graph export, and
coarse-graining in Python. Read `.xyz`, `.pdb`, `.cif` and `.sdf` files
(optionally gzip-compressed), select and analyse atoms, and visualise them in
3D. The `.cif` reader is a basic mmCIF parser for standard `_atom_site`
coordinate loops, not a full mmCIF syntax implementation.

| 3D structure rendering | Residue contact map | Coarse-grained beads |
| --- | --- | --- |
|  |  |  |

## What it does

- **Read and write** XYZ, PDB, mmCIF and SDF (gzip-aware), fetch structures by
  id from RCSB, and load multi-model NMR ensembles.
- **Select and measure** by chain, element or residue; compute distances,
  angles, dihedrals and Kabsch-aligned RMSD.
- **Analyse** centroids, radius of gyration, the inertia tensor, inferred bonds
  and contacts.
- **Contact maps** at atom or residue level, with heatmap plots.
- **Ensembles**: pairwise RMSD, RMSF, averaging, and conformer clustering.
- **Export for ML**: flat structural descriptors and molecular graphs for
  NetworkX, PyTorch Geometric and DGL.
- **Coarse-grain** onto residue, Martini-style or custom bead mappings.
- **Visualise** with 3D matplotlib plots, an interactive py3Dmol viewer, spin
  GIFs, and a command-line interface.

## Install

With [uv](https://docs.astral.sh/uv/) (recommended):

```bash
uv sync  # creates .venv, installs deps + dev tools from the lockfile
uv run molscope 1fqy.pdb  # run the CLI
uv run pytest  # run the tests
```

`uv sync` pins the interpreter from `.python-version` and resolves against
`uv.lock` for reproducible installs. Use `uv sync --no-dev` to skip the test tools.

With plain pip:

```bash
python -m venv .venv && source .venv/bin/activate
pip install -e ".[test]"  # or: pip install -r requirements.txt
```

## Documentation

The documentation website is built with MkDocs Material:

```bash
uv sync --group docs
uv run mkdocs serve
python scripts/build_user_guide_pdf.py
```

Docs source lives in `docs/`; the site configuration is `mkdocs.yml`. The PDF
builder requires Pandoc and a LaTeX engine such as `xelatex`.

## Quickstart

A runnable end-to-end tour over the bundled sample structures lives in
[`example.py`](example.py):

```bash
uv run python example.py  # opens 3D plot windows
MPLBACKEND=Agg uv run python example.py  # headless: saves PNGs instead
```

It reads an `.xyz` and a `.pdb`, prints derived properties, compares the NMR
models of `1aml`, writes a transformed structure back out, and renders a plot.

## Library

```python
import molscope as ms

mol = ms.read("1fqy.pdb")  # parser chosen from the extension
mol = ms.fetch("1fqy")  # ...or download straight from RCSB by id
print(mol.summary())  # atoms, formula, chains, bounding box

mol = mol.centered().rotate("z", 90).translate((1, 2, -1))
mol.plot()  # CPK colours, inferred bonds, equal aspect
```

`Molecule` is immutable: `translate`, `centered` and `rotate` each return a new
molecule, so transformations chain cleanly without aliasing. Equality is by
value (`np.array_equal` on coordinates).

### Selections

PDB files, and standard mmCIF atom-site loops, carry per-atom metadata (atom
name, residue, chain), so you can slice a structure:

```python
mol.select(chain="A")  # one chain
mol.select(element="C")  # all carbons
mol.select(resname="HOH")  # waters
mol.select(resid=(10, 20))  # an inclusive residue range
mol.alpha_carbons()  # CA atoms (the usual basis for protein RMSD)
mol.backbone()  # N, CA, C, O
mol[mask_or_indices]  # subset by numpy mask / index array
```

### Analysis and measurements

```python
mol.centroid, mol.center_of_mass  # geometric / mass-weighted centre
mol.radius_of_gyration  # compactness (angstrom)
mol.dimensions, mol.formula  # bounding box, Hill-order formula
mol.bonds()  # inferred bond index pairs (KD-tree if scipy)
mol.contacts(cutoff=5.0)  # atom pairs within a distance

mol.distance(i, j)  # bond length
mol.angle(i, j, k)  # bond angle (degrees)
mol.dihedral(a, b, c, d)  # torsion angle (degrees)

a.alpha_carbons().rmsd(b.alpha_carbons(), align=True)  # CA-RMSD after Kabsch fit
```

### Structural descriptors for ML

```python
features = mol.descriptors()  # flat dict of scalar/vector descriptors
features["radius_of_gyration"]
features["principal_moments"]  # 3 values
features["distance_histogram"]  # fixed-size histogram

X, names = ms.featurize_many(
    ["a.pdb", "b.pdb", "c.xyz"],
    return_names=True,
)  # numeric matrix + column names
```

Descriptors include atom/residue counts, element counts, molecular mass,
centres, radius of gyration, bounding-box dimensions, inertia tensor, principal
moments/axes, shape anisotropy, compactness, distance histograms, bond-length
summary statistics, and atom/residue contact summaries. Full contact maps remain
available through `mol.contact_map(...)`.

### Contact maps

```python
cmap = mol.contact_map(cutoff=8.0, level="residue")  # CA-CA contacts -> ContactMap
cmap.matrix  # (R, R) array
mol.plot_contact_map(cutoff=8.0)  # heatmap

mol.contact_map(level="atom")  # atom-level map
mol.contact_map(level="residue", method="min")  # closest inter-residue atom
mol.contact_map(level="residue", method="com")  # residue centre of mass
```

### NMR ensembles

```python
from molscope import ensemble

models = ms.read_pdb_models("1aml.pdb")  # all 20 models
ensemble.rmsd_matrix(models)  # pairwise RMSD matrix
ensemble.rmsf(models)  # per-atom fluctuation
ensemble.average(models)  # mean structure
ensemble.align_all(models)  # superpose every model onto the first

# Per-residue-pair contact probability across the ensemble (NMR variability)
freq = ms.ensemble_contact_frequency(models, cutoff=8.0)
freq.plot()  # heatmap of contact frequencies in [0, 1]
```

### Comparing and clustering conformers

Cluster an ensemble (NMR models, conformer sets, docking poses, MD snapshots) by
pairwise RMSD:

```python
matrix = ms.rmsd_matrix(models, align=True)  # (M, M) RMSD matrix
ms.plot_rmsd_heatmap(matrix)  # heatmap

clusters = ms.cluster(models, method="hierarchical")  # data-driven cutoff
clusters = ms.cluster(models, n_clusters=3)  # ...or a fixed count
clusters.n_clusters  # how many clusters
clusters.groups()  # {cluster_id: [model indices]}
clusters.representatives()  # {cluster_id: medoid model index}

ms.plot_rmsd_heatmap(matrix, order=clusters.order)  # reorder into diagonal blocks
```

### Writing and viewing

```python
ms.write_xyz(mol.centered(), "out.xyz")  # write transformed coordinates back
ms.write_pdb(mol, "out.pdb")

mol.plot(color_by="chain")  # colour by element / chain / residue
mol.view(style="cartoon")  # interactive py3Dmol viewer (notebooks)
from molscope.plotting import spin_gif
spin_gif(mol, "spin.gif")  # rotating animation
```

### Molecular graphs (for machine learning)

Turn 3D coordinates plus inferred bonds into a graph, then export to the common
ML frameworks. The base `to_graph()` needs no extra dependencies; each exporter
imports its backend lazily.

```python
mol = ms.read("1fqy.pdb")

g = mol.to_graph()  # MolecularGraph: nodes + edges, no deps
g.n_atoms, g.n_bonds  # counts
g.atomic_numbers, g.masses  # per-node arrays
g.node_features()  # (N, 2) default features [atomic_number, mass]

G = mol.to_networkx()  # networkx.Graph with node/edge attributes
data = mol.to_pyg_data()  # torch_geometric.data.Data (x, pos, edge_index, edge_attr, z)
dglg = mol.to_dgl_graph()  # dgl.DGLGraph with ndata/edata tensors
```

Nodes carry element, atomic number, mass, coordinates and (from PDB/mmCIF) atom
name, residue and chain. Edges carry the bonded pair, interatomic distance, and
bond order (`1.0` for geometrically inferred bonds). Install backends as needed:
`pip install "molscope[graph]"` installs only NetworkX. PyTorch Geometric and
DGL are optional manual installs: `pip install torch torch_geometric` or
`pip install dgl` after choosing the right PyTorch build for your platform.

### Coarse-graining

Map an atomistic structure onto a smaller set of beads. The result is an
ordinary `Molecule` (beads as "atoms") with explicit CG bonds attached, so it
plots, transforms and graphs like anything else.

```python
mol = ms.read("1fqy.pdb")

cg = mol.coarse_grain("residue_com")  # one bead per residue (centre of mass)
cg = mol.coarse_grain("residue_centroid")  # ...or geometric centroid
cg = mol.coarse_grain("martini")  # simplified backbone + side-chain beads
cg.plot(scale=200)  # beads + backbone topology
print(cg.mapping_report())  # explain beads, dropped atoms, and bonds

# Custom bead definitions by residue + atom name (needs PDB/mmCIF metadata)
mapping = {"ALA": {"BB": ["N", "CA", "C", "O"], "SC": ["CB"]}}
cg = mol.coarse_grain(mapping)
cg, report = mol.coarse_grain(mapping, return_report=True)

# Custom bead definitions by atom index (works on ANY structure, even .xyz)
cg = mol.coarse_grain({"head": [0, 1, 2, 3], "tail": [4, 5, 6, 7]},
                      bonds=[("head", "tail")])  # define the bead network too

cg.to_graph()  # CG bead network, ready for ML
```

Bead positions are mass-weighted (or centroids). For residue mappings bonds are
generated automatically (within a residue, plus a backbone chain between
residues); pass `bonds=` to define them yourself. Name-based bonds are intended
for unique bead names such as `head`/`tail`; repeated names such as `BB`/`SC`
are ambiguous, so use bead indices for those. Atoms you leave unassigned are
dropped with a warning. This is meant
for teaching and prototyping CG mappings, not as a replacement for production
Martini parameters.

## Command line

```bash
molscope helix_201.xyz --translate 1 2 -1
molscope 1fqy.pdb --select atom_name=CA --color-by residue --save ca.png
molscope --fetch 1aml --center --gif amyloid.gif
python -m molscope 1fqy.pdb  # equivalent if not pip-installed
```

## Sample structures

| File | Contents |
|------|----------|
| `helix_201.xyz` | a helix (bare coordinates) |
| `1fqy.pdb` | Aquaporin-1, single model (1661 atoms) |
| `1aml.pdb` | Alzheimer amyloid A4 peptide, 20-model NMR ensemble |

## Notes

- PDB files are parsed by **fixed columns**, not whitespace splitting, so atoms
  with touching coordinate fields (large or negative values) read correctly.
- Alternate conformations (altLoc) other than the primary one are skipped.
- `read_pdb` returns a single model (`model=1` by default); use `read_pdb_models`
  for the whole ensemble.
- Bond inference uses a `scipy.spatial.cKDTree` when available; without scipy it
  falls back to a dense `O(n^2)` search that is refused above ~8000 atoms.
- Optional extras: `pip install "molscope[fast]"` (scipy, faster bonds/contacts)
  and `"molscope[viz]"` (py3Dmol, for `Molecule.view`).

## Tests and linting

```bash
uv run pytest  # full test suite
uv run ruff check .  # lint
```

CI (GitHub Actions) runs both across Python 3.9 / 3.11 / 3.13 on every push and PR.

## License

[MIT](LICENSE)
molscope-0.6.0/README.md
ADDED
@@ -0,0 +1,308 @@
# MolScope

[](https://github.com/roshan2004/molscope/actions/workflows/ci.yml)
[](pyproject.toml)
[](LICENSE)
[](https://github.com/astral-sh/ruff)

Lightweight molecular structure analysis, visualisation, graph export, and
coarse-graining in Python. Read `.xyz`, `.pdb`, `.cif` and `.sdf` files
(optionally gzip-compressed), select and analyse atoms, and visualise them in
3D. The `.cif` reader is a basic mmCIF parser for standard `_atom_site`
coordinate loops, not a full mmCIF syntax implementation.

| 3D structure rendering | Residue contact map | Coarse-grained beads |
| --- | --- | --- |
|  |  |  |

## What it does

- **Read and write** XYZ, PDB, mmCIF and SDF (gzip-aware), fetch structures by
  id from RCSB, and load multi-model NMR ensembles.
- **Select and measure** by chain, element or residue; compute distances,
  angles, dihedrals and Kabsch-aligned RMSD.
- **Analyse** centroids, radius of gyration, the inertia tensor, inferred bonds
  and contacts.
- **Contact maps** at atom or residue level, with heatmap plots.
- **Ensembles**: pairwise RMSD, RMSF, averaging, and conformer clustering.
- **Export for ML**: flat structural descriptors and molecular graphs for
  NetworkX, PyTorch Geometric and DGL.
- **Coarse-grain** onto residue, Martini-style or custom bead mappings.
- **Visualise** with 3D matplotlib plots, an interactive py3Dmol viewer, spin
  GIFs, and a command-line interface.

## Install

With [uv](https://docs.astral.sh/uv/) (recommended):

```bash
uv sync  # creates .venv, installs deps + dev tools from the lockfile
uv run molscope 1fqy.pdb  # run the CLI
uv run pytest  # run the tests
```

`uv sync` pins the interpreter from `.python-version` and resolves against
`uv.lock` for reproducible installs. Use `uv sync --no-dev` to skip the test tools.

With plain pip:

```bash
python -m venv .venv && source .venv/bin/activate
pip install -e ".[test]"  # or: pip install -r requirements.txt
```

## Documentation

The documentation website is built with MkDocs Material:

```bash
uv sync --group docs
uv run mkdocs serve
python scripts/build_user_guide_pdf.py
```

Docs source lives in `docs/`; the site configuration is `mkdocs.yml`. The PDF
builder requires Pandoc and a LaTeX engine such as `xelatex`.

## Quickstart

A runnable end-to-end tour over the bundled sample structures lives in
[`example.py`](example.py):

```bash
uv run python example.py  # opens 3D plot windows
MPLBACKEND=Agg uv run python example.py  # headless: saves PNGs instead
```

It reads an `.xyz` and a `.pdb`, prints derived properties, compares the NMR
models of `1aml`, writes a transformed structure back out, and renders a plot.

## Library

```python
import molscope as ms

mol = ms.read("1fqy.pdb")  # parser chosen from the extension
mol = ms.fetch("1fqy")  # ...or download straight from RCSB by id
print(mol.summary())  # atoms, formula, chains, bounding box

mol = mol.centered().rotate("z", 90).translate((1, 2, -1))
mol.plot()  # CPK colours, inferred bonds, equal aspect
```

`Molecule` is immutable: `translate`, `centered` and `rotate` each return a new
molecule, so transformations chain cleanly without aliasing. Equality is by
value (`np.array_equal` on coordinates).

### Selections

PDB files, and standard mmCIF atom-site loops, carry per-atom metadata (atom
name, residue, chain), so you can slice a structure:

```python
mol.select(chain="A")  # one chain
mol.select(element="C")  # all carbons
mol.select(resname="HOH")  # waters
mol.select(resid=(10, 20))  # an inclusive residue range
mol.alpha_carbons()  # CA atoms (the usual basis for protein RMSD)
mol.backbone()  # N, CA, C, O
mol[mask_or_indices]  # subset by numpy mask / index array
```

### Analysis and measurements

```python
mol.centroid, mol.center_of_mass  # geometric / mass-weighted centre
mol.radius_of_gyration  # compactness (angstrom)
mol.dimensions, mol.formula  # bounding box, Hill-order formula
mol.bonds()  # inferred bond index pairs (KD-tree if scipy)
mol.contacts(cutoff=5.0)  # atom pairs within a distance

mol.distance(i, j)  # bond length
mol.angle(i, j, k)  # bond angle (degrees)
mol.dihedral(a, b, c, d)  # torsion angle (degrees)

a.alpha_carbons().rmsd(b.alpha_carbons(), align=True)  # CA-RMSD after Kabsch fit
```

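To make the "RMSD after Kabsch fit" idea concrete, here is a minimal NumPy sketch of the general technique. It is not taken from the molscope source: `kabsch_rmsd` is a hypothetical helper, and it assumes both inputs are `(N, 3)` arrays listing the same atoms in the same order.

```python
import numpy as np


def kabsch_rmsd(p, q):
    """RMSD between two (N, 3) coordinate sets after optimal superposition.

    Hypothetical helper for illustration; not part of the molscope API.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    # Remove translation by centring both sets on their centroids.
    p0 = p - p.mean(axis=0)
    q0 = q - q.mean(axis=0)
    # The SVD of the 3x3 covariance matrix yields the optimal rotation.
    u, _, vt = np.linalg.svd(p0.T @ q0)
    # Flip the weakest axis if needed so the result is a proper rotation
    # (determinant +1) rather than a reflection.
    d = 1.0 if np.linalg.det(u @ vt) >= 0 else -1.0
    rot = u @ np.diag([1.0, 1.0, d]) @ vt
    diff = p0 @ rot - q0
    return float(np.sqrt((diff ** 2).sum() / len(p)))
```

Restricting the fit to proper rotations matters for chiral molecules: a mirror image should not superpose perfectly onto its original.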
### Structural descriptors for ML

```python
features = mol.descriptors()  # flat dict of scalar/vector descriptors
features["radius_of_gyration"]
features["principal_moments"]  # 3 values
features["distance_histogram"]  # fixed-size histogram

X, names = ms.featurize_many(
    ["a.pdb", "b.pdb", "c.xyz"],
    return_names=True,
)  # numeric matrix + column names
```

Descriptors include atom/residue counts, element counts, molecular mass,
centres, radius of gyration, bounding-box dimensions, inertia tensor, principal
moments/axes, shape anisotropy, compactness, distance histograms, bond-length
summary statistics, and atom/residue contact summaries. Full contact maps remain
available through `mol.contact_map(...)`.

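For readers unfamiliar with two of these descriptors, the sketch below computes a mass-weighted radius of gyration and the principal moments of inertia with plain NumPy. The function names are hypothetical, and whether molscope weights by mass or uses the same normalisation is an assumption, not something this README states.

```python
import numpy as np


def radius_of_gyration(coords, masses):
    """Mass-weighted radius of gyration (illustrative helper, not molscope API)."""
    coords = np.asarray(coords, dtype=float)
    masses = np.asarray(masses, dtype=float)
    com = (masses[:, None] * coords).sum(axis=0) / masses.sum()
    sq_dist = ((coords - com) ** 2).sum(axis=1)
    return float(np.sqrt((masses * sq_dist).sum() / masses.sum()))


def principal_moments(coords, masses):
    """Eigenvalues of the inertia tensor about the centre of mass, ascending."""
    coords = np.asarray(coords, dtype=float)
    masses = np.asarray(masses, dtype=float)
    com = (masses[:, None] * coords).sum(axis=0) / masses.sum()
    r = coords - com
    # I = sum_i m_i * (|r_i|^2 * Id - r_i r_i^T)
    r2 = (r ** 2).sum(axis=1)
    inertia = (masses * r2).sum() * np.eye(3) - (masses[:, None] * r).T @ r
    return np.linalg.eigvalsh(inertia)
```

Two unit masses at `(1, 0, 0)` and `(-1, 0, 0)` make a quick sanity check: the radius of gyration is 1 and the moment about the x axis vanishes.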
### Contact maps

```python
cmap = mol.contact_map(cutoff=8.0, level="residue")  # CA-CA contacts -> ContactMap
cmap.matrix  # (R, R) array
mol.plot_contact_map(cutoff=8.0)  # heatmap

mol.contact_map(level="atom")  # atom-level map
mol.contact_map(level="residue", method="min")  # closest inter-residue atom
mol.contact_map(level="residue", method="com")  # residue centre of mass
```

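A distance-cutoff contact map reduces to a pairwise distance matrix and a threshold. The following is a small NumPy sketch of that idea, assuming one representative point per residue (for example the CA atom). `contact_matrix` is a hypothetical name, and excluding self-contacts on the diagonal is a choice made here, not necessarily what `ContactMap` does.

```python
import numpy as np


def contact_matrix(coords, cutoff=8.0):
    """Boolean (R, R) matrix: True where two points lie within `cutoff`."""
    coords = np.asarray(coords, dtype=float)
    # Broadcasting builds every pairwise difference vector: shape (R, R, 3).
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    contacts = dist <= cutoff
    # Assumption: a residue is not counted as being in contact with itself.
    np.fill_diagonal(contacts, False)
    return contacts
```

The dense `(R, R, 3)` intermediate is fine for a few thousand residues; larger systems would call for a spatial index instead.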
### NMR ensembles

```python
from molscope import ensemble

models = ms.read_pdb_models("1aml.pdb")  # all 20 models
ensemble.rmsd_matrix(models)  # pairwise RMSD matrix
ensemble.rmsf(models)  # per-atom fluctuation
ensemble.average(models)  # mean structure
ensemble.align_all(models)  # superpose every model onto the first

# Per-residue-pair contact probability across the ensemble (NMR variability)
freq = ms.ensemble_contact_frequency(models, cutoff=8.0)
freq.plot()  # heatmap of contact frequencies in [0, 1]
```

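Per-atom fluctuation (RMSF) is the root-mean-square deviation of each atom from its mean position across models. Here is a minimal sketch under the assumption that the models are already superposed and stacked into one `(M, N, 3)` array; the helper name is hypothetical and the code is not the `ensemble.rmsf` implementation.

```python
import numpy as np


def rmsf(stack):
    """Per-atom RMSF for pre-aligned models stacked as an (M, N, 3) array."""
    stack = np.asarray(stack, dtype=float)
    mean = stack.mean(axis=0)  # (N, 3) average structure
    sq_dev = ((stack - mean) ** 2).sum(axis=-1)  # (M, N) squared deviations
    return np.sqrt(sq_dev.mean(axis=0))  # (N,) one value per atom
```

Skipping the alignment step would fold rigid-body motion into the result, so superposition normally comes first.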
### Comparing and clustering conformers

Cluster an ensemble (NMR models, conformer sets, docking poses, MD snapshots) by
pairwise RMSD:

```python
matrix = ms.rmsd_matrix(models, align=True)  # (M, M) RMSD matrix
ms.plot_rmsd_heatmap(matrix)  # heatmap

clusters = ms.cluster(models, method="hierarchical")  # data-driven cutoff
clusters = ms.cluster(models, n_clusters=3)  # ...or a fixed count
clusters.n_clusters  # how many clusters
clusters.groups()  # {cluster_id: [model indices]}
clusters.representatives()  # {cluster_id: medoid model index}

ms.plot_rmsd_heatmap(matrix, order=clusters.order)  # reorder into diagonal blocks
```

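A medoid is the cluster member with the smallest total distance to the other members. The sketch below picks one per cluster from a precomputed RMSD matrix and a label array. Both the `medoids` helper and the integer-label input format are assumptions made for illustration; molscope's own cluster object may be organised differently.

```python
import numpy as np


def medoids(rmsd, labels):
    """Map each cluster id to the index of its medoid model."""
    rmsd = np.asarray(rmsd, dtype=float)
    labels = np.asarray(labels)
    result = {}
    for cid in np.unique(labels):
        members = np.flatnonzero(labels == cid)
        # Distances among this cluster's members only.
        sub = rmsd[np.ix_(members, members)]
        # The medoid minimises the summed distance to its cluster mates.
        result[int(cid)] = int(members[sub.sum(axis=1).argmin()])
    return result
```

Unlike an averaged structure, a medoid is always a real member of the ensemble, which keeps its geometry physically sensible.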
### Writing and viewing

```python
ms.write_xyz(mol.centered(), "out.xyz")  # write transformed coordinates back
ms.write_pdb(mol, "out.pdb")

mol.plot(color_by="chain")  # colour by element / chain / residue
mol.view(style="cartoon")  # interactive py3Dmol viewer (notebooks)
from molscope.plotting import spin_gif
spin_gif(mol, "spin.gif")  # rotating animation
```

### Molecular graphs (for machine learning)

Turn 3D coordinates plus inferred bonds into a graph, then export to the common
ML frameworks. The base `to_graph()` needs no extra dependencies; each exporter
imports its backend lazily.

```python
mol = ms.read("1fqy.pdb")

g = mol.to_graph()  # MolecularGraph: nodes + edges, no deps
g.n_atoms, g.n_bonds  # counts
g.atomic_numbers, g.masses  # per-node arrays
g.node_features()  # (N, 2) default features [atomic_number, mass]

G = mol.to_networkx()  # networkx.Graph with node/edge attributes
data = mol.to_pyg_data()  # torch_geometric.data.Data (x, pos, edge_index, edge_attr, z)
dglg = mol.to_dgl_graph()  # dgl.DGLGraph with ndata/edata tensors
```

Nodes carry element, atomic number, mass, coordinates and (from PDB/mmCIF) atom
name, residue and chain. Edges carry the bonded pair, interatomic distance, and
bond order (`1.0` for geometrically inferred bonds). Install backends as needed:
`pip install "molscope[graph]"` installs only NetworkX. PyTorch Geometric and
DGL are optional manual installs: `pip install torch torch_geometric` or
`pip install dgl` after choosing the right PyTorch build for your platform.

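Graph frameworks such as PyTorch Geometric conventionally store an undirected graph as a `(2, E)` index array with each bond listed in both directions. The NumPy-only sketch below shows that conversion and the matching per-edge distances. `to_edge_index` and `edge_lengths` are hypothetical helpers, and whether molscope's exporters emit both directions is an assumption.

```python
import numpy as np


def to_edge_index(bonds):
    """Turn (B, 2) undirected bond pairs into a (2, 2B) directed edge index."""
    bonds = np.asarray(bonds, dtype=np.int64).reshape(-1, 2)
    # Append the reversed pairs so every bond appears as i->j and j->i.
    both = np.concatenate([bonds, bonds[:, ::-1]], axis=0)
    return both.T


def edge_lengths(coords, edge_index):
    """Interatomic distance for every directed edge."""
    coords = np.asarray(coords, dtype=float)
    src, dst = edge_index
    return np.linalg.norm(coords[src] - coords[dst], axis=1)
```

Listing both directions lets message-passing layers send information along each bond both ways without special-casing undirected graphs.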
### Coarse-graining

Map an atomistic structure onto a smaller set of beads. The result is an
ordinary `Molecule` (beads as "atoms") with explicit CG bonds attached, so it
plots, transforms and graphs like anything else.

```python
mol = ms.read("1fqy.pdb")

cg = mol.coarse_grain("residue_com")  # one bead per residue (centre of mass)
cg = mol.coarse_grain("residue_centroid")  # ...or geometric centroid
cg = mol.coarse_grain("martini")  # simplified backbone + side-chain beads
cg.plot(scale=200)  # beads + backbone topology
print(cg.mapping_report())  # explain beads, dropped atoms, and bonds

# Custom bead definitions by residue + atom name (needs PDB/mmCIF metadata)
mapping = {"ALA": {"BB": ["N", "CA", "C", "O"], "SC": ["CB"]}}
cg = mol.coarse_grain(mapping)
cg, report = mol.coarse_grain(mapping, return_report=True)

# Custom bead definitions by atom index (works on ANY structure, even .xyz)
cg = mol.coarse_grain({"head": [0, 1, 2, 3], "tail": [4, 5, 6, 7]},
                      bonds=[("head", "tail")])  # define the bead network too

cg.to_graph()  # CG bead network, ready for ML
```

Bead positions are mass-weighted (or centroids). For residue mappings bonds are
generated automatically (within a residue, plus a backbone chain between
residues); pass `bonds=` to define them yourself. Name-based bonds are intended
for unique bead names such as `head`/`tail`; repeated names such as `BB`/`SC`
are ambiguous, so use bead indices for those. Atoms you leave unassigned are
dropped with a warning. This is meant
for teaching and prototyping CG mappings, not as a replacement for production
Martini parameters.

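The index-based mapping above boils down to a mass-weighted average per bead. This is a rough sketch of that step, assuming a mapping from bead name to atom indices. `bead_positions` is a hypothetical helper rather than the `coarse_grain` implementation, and it returns the unassigned atoms instead of issuing a warning.

```python
import numpy as np


def bead_positions(coords, masses, mapping):
    """Centre-of-mass position for each bead in an index-based mapping.

    Returns (names, beads, dropped), where `dropped` lists atom indices
    that no bead claims.
    """
    coords = np.asarray(coords, dtype=float)
    masses = np.asarray(masses, dtype=float)
    names = list(mapping)
    beads = np.empty((len(names), 3))
    for row, name in enumerate(names):
        idx = np.asarray(mapping[name], dtype=int)
        w = masses[idx]
        # Mass-weighted mean of the member atoms' coordinates.
        beads[row] = (w[:, None] * coords[idx]).sum(axis=0) / w.sum()
    assigned = {int(i) for idx in mapping.values() for i in idx}
    dropped = sorted(set(range(len(coords))) - assigned)
    return names, beads, dropped
```

Replacing the masses with ones gives the geometric-centroid variant of the same mapping.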
## Command line

```bash
molscope helix_201.xyz --translate 1 2 -1
molscope 1fqy.pdb --select atom_name=CA --color-by residue --save ca.png
molscope --fetch 1aml --center --gif amyloid.gif
python -m molscope 1fqy.pdb  # equivalent if not pip-installed
```

## Sample structures

| File | Contents |
|------|----------|
| `helix_201.xyz` | a helix (bare coordinates) |
| `1fqy.pdb` | Aquaporin-1, single model (1661 atoms) |
| `1aml.pdb` | Alzheimer amyloid A4 peptide, 20-model NMR ensemble |

## Notes

- PDB files are parsed by **fixed columns**, not whitespace splitting, so atoms
  with touching coordinate fields (large or negative values) read correctly.
- Alternate conformations (altLoc) other than the primary one are skipped.
- `read_pdb` returns a single model (`model=1` by default); use `read_pdb_models`
  for the whole ensemble.
- Bond inference uses a `scipy.spatial.cKDTree` when available; without scipy it
  falls back to a dense `O(n^2)` search that is refused above ~8000 atoms.
- Optional extras: `pip install "molscope[fast]"` (scipy, faster bonds/contacts)
  and `"molscope[viz]"` (py3Dmol, for `Molecule.view`).

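The first note is easiest to see with a concrete line. The sketch below slices an `ATOM` record at the column positions defined by the PDB format (for example, x, y and z occupy columns 31-38, 39-46 and 47-54). `parse_atom_line` is a hypothetical helper written for this illustration, not the molscope reader.

```python
def parse_atom_line(line):
    """Parse one PDB ATOM/HETATM record by fixed columns (illustrative only)."""
    if not line.startswith(("ATOM", "HETATM")):
        return None
    return {
        "name": line[12:16].strip(),     # columns 13-16
        "altloc": line[16].strip(),      # column 17
        "resname": line[17:20].strip(),  # columns 18-20
        "chain": line[21].strip(),       # column 22
        "resid": int(line[22:26]),       # columns 23-26
        "xyz": (float(line[30:38]), float(line[38:46]), float(line[46:54])),
        "element": line[76:78].strip(),  # columns 77-78
    }


# Build a record whose coordinate fields touch, as happens for large negatives.
line = (
    "ATOM  {:5d} {:<4s}{:1s}{:3s} {:1s}{:4d}{:1s}   "
    "{:8.3f}{:8.3f}{:8.3f}{:6.2f}{:6.2f}          {:>2s}"
).format(1, " CA ", " ", "ALA", "A", 10, " ",
         -100.123, -200.456, -300.789, 1.0, 20.0, "C")
rec = parse_atom_line(line)
```

Splitting the same line on whitespace would fuse the three coordinates into a single token, which is the failure mode that column slicing avoids.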
## Tests and linting

```bash
uv run pytest  # full test suite
uv run ruff check .  # lint
```

CI (GitHub Actions) runs both across Python 3.9 / 3.11 / 3.13 on every push and PR.

## License

[MIT](LICENSE)