molscope 0.6.0__tar.gz

molscope-0.6.0/LICENSE ADDED
@@ -0,0 +1,21 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2026 Roshan Shrestha
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.
@@ -0,0 +1,335 @@
1
+ Metadata-Version: 2.4
2
+ Name: molscope
3
+ Version: 0.6.0
4
+ Summary: Lightweight molecular structure analysis, visualisation, graph export, and coarse-graining in Python.
5
+ Author-email: Roshan Shrestha <roshanpra@gmail.com>
6
+ License-Expression: MIT
7
+ Project-URL: Homepage, https://github.com/roshan2004/molscope
8
+ Project-URL: Repository, https://github.com/roshan2004/molscope
9
+ Keywords: chemistry,molecular-structure,pdb,xyz,mmcif,coarse-graining,molecular-graphs,machine-learning,visualization
10
+ Classifier: Programming Language :: Python :: 3
11
+ Classifier: Topic :: Scientific/Engineering :: Chemistry
12
+ Classifier: Topic :: Scientific/Engineering :: Visualization
13
+ Requires-Python: >=3.9
14
+ Description-Content-Type: text/markdown
15
+ License-File: LICENSE
16
+ Requires-Dist: numpy>=1.21
17
+ Requires-Dist: matplotlib>=3.5
18
+ Provides-Extra: test
19
+ Requires-Dist: pytest>=7; extra == "test"
20
+ Provides-Extra: fast
21
+ Requires-Dist: scipy>=1.7; extra == "fast"
22
+ Provides-Extra: viz
23
+ Requires-Dist: py3Dmol>=2.0; extra == "viz"
24
+ Provides-Extra: graph
25
+ Requires-Dist: networkx>=2.6; extra == "graph"
26
+ Dynamic: license-file
27
+
28
+ # MolScope
29
+
30
+ [![CI](https://github.com/roshan2004/molscope/actions/workflows/ci.yml/badge.svg)](https://github.com/roshan2004/molscope/actions/workflows/ci.yml)
31
+ [![Python](https://img.shields.io/badge/python-3.9%20%7C%203.11%20%7C%203.13-blue)](pyproject.toml)
32
+ [![License: MIT](https://img.shields.io/badge/license-MIT-yellow.svg)](LICENSE)
33
+ [![Code style: Ruff](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/astral-sh/ruff/main/assets/badge/v2.json)](https://github.com/astral-sh/ruff)
34
+
35
+ Lightweight molecular structure analysis, visualisation, graph export, and
36
+ coarse-graining in Python. Read `.xyz`, `.pdb`, `.cif` and `.sdf` files
37
+ (optionally gzip-compressed), select and analyse atoms, and visualise them in
38
+ 3D. The `.cif` reader is a basic mmCIF parser for standard `_atom_site`
39
+ coordinate loops, not a full mmCIF syntax implementation.
40
+
41
+ | 3D structure rendering | Residue contact map | Coarse-grained beads |
42
+ | --- | --- | --- |
43
+ | ![Aquaporin-1 rendered as a 3D element-coloured molecular structure](docs/assets/readme/aquaporin-structure-v2.png) | ![Residue-level contact map heatmap for Aquaporin-1](docs/assets/readme/residue-contact-map.png) | ![Coarse-grained bead model of Aquaporin-1](docs/assets/readme/coarse-grained-beads-v2.png) |
44
+
45
+ ## What it does
46
+
47
+ - **Read and write** XYZ, PDB, mmCIF and SDF (gzip-aware), fetch structures by
48
+ id from RCSB, and load multi-model NMR ensembles.
49
+ - **Select and measure** by chain, element or residue; compute distances,
50
+ angles, dihedrals and Kabsch-aligned RMSD.
51
+ - **Analyse** centroids, radius of gyration, the inertia tensor, inferred bonds
52
+ and contacts.
53
+ - **Contact maps** at atom or residue level, with heatmap plots.
54
+ - **Ensembles**: pairwise RMSD, RMSF, averaging, and conformer clustering.
55
+ - **Export for ML**: flat structural descriptors and molecular graphs for
56
+ NetworkX, PyTorch Geometric and DGL.
57
+ - **Coarse-grain** onto residue, Martini-style or custom bead mappings.
58
+ - **Visualise** with 3D matplotlib plots, an interactive py3Dmol viewer, spin
59
+ GIFs, and a command-line interface.
60
+
61
+ ## Install
62
+
63
+ With [uv](https://docs.astral.sh/uv/) (recommended):
64
+
65
+ ```bash
66
+ uv sync # creates .venv, installs deps + dev tools from the lockfile
67
+ uv run molscope 1fqy.pdb # run the CLI
68
+ uv run pytest # run the tests
69
+ ```
70
+
71
+ `uv sync` pins the interpreter from `.python-version` and resolves against
72
+ `uv.lock` for reproducible installs. Use `uv sync --no-dev` to skip the test tools.
73
+
74
+ With plain pip:
75
+
76
+ ```bash
77
+ python -m venv .venv && source .venv/bin/activate
78
+ pip install -e ".[test]" # or: pip install -r requirements.txt
79
+ ```
80
+
81
+ ## Documentation
82
+
83
+ The documentation website is built with MkDocs Material:
84
+
85
+ ```bash
86
+ uv sync --group docs
87
+ uv run mkdocs serve
88
+ python scripts/build_user_guide_pdf.py
89
+ ```
90
+
91
+ Docs source lives in `docs/`; the site configuration is `mkdocs.yml`. The PDF
92
+ builder requires Pandoc and a LaTeX engine such as `xelatex`.
93
+
94
+ ## Quickstart
95
+
96
+ A runnable end-to-end tour over the bundled sample structures lives in
97
+ [`example.py`](example.py):
98
+
99
+ ```bash
100
+ uv run python example.py # opens 3D plot windows
101
+ MPLBACKEND=Agg uv run python example.py # headless: saves PNGs instead
102
+ ```
103
+
104
+ It reads an `.xyz` and a `.pdb`, prints derived properties, compares the NMR
105
+ models of `1aml`, writes a transformed structure back out, and renders a plot.
106
+
107
+ ## Library
108
+
109
+ ```python
110
+ import molscope as ms
111
+
112
+ mol = ms.read("1fqy.pdb") # parser chosen from the extension
113
+ mol = ms.fetch("1fqy") # ...or download straight from RCSB by id
114
+ print(mol.summary()) # atoms, formula, chains, bounding box
115
+
116
+ mol = mol.centered().rotate("z", 90).translate((1, 2, -1))
117
+ mol.plot() # CPK colours, inferred bonds, equal aspect
118
+ ```
119
+
120
+ `Molecule` is immutable: `translate`, `centered` and `rotate` each return a new
121
+ molecule, so transformations chain cleanly without aliasing. Equality is by
122
+ value (`np.array_equal` on coordinates).
123
+
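The immutability described above can be illustrated with a small standard-library sketch. `TinyCloud` is a hypothetical stand-in, not MolScope's `Molecule` class; it only shows the general pattern in which each transformation builds a new object and leaves the original untouched.

```python
from dataclasses import dataclass
from typing import Tuple

Vec3 = Tuple[float, float, float]


@dataclass(frozen=True)
class TinyCloud:
    """Hypothetical immutable point set; not MolScope's Molecule class."""

    coords: Tuple[Vec3, ...]

    def translate(self, shift: Vec3) -> "TinyCloud":
        # Build a new object instead of modifying self in place.
        moved = tuple(
            (x + shift[0], y + shift[1], z + shift[2]) for x, y, z in self.coords
        )
        return TinyCloud(moved)

    def centered(self) -> "TinyCloud":
        # Shift so that the geometric centroid sits at the origin.
        n = len(self.coords)
        cx = sum(p[0] for p in self.coords) / n
        cy = sum(p[1] for p in self.coords) / n
        cz = sum(p[2] for p in self.coords) / n
        return self.translate((-cx, -cy, -cz))


original = TinyCloud(((0.0, 0.0, 0.0), (2.0, 0.0, 0.0)))
moved = original.centered().translate((1.0, 2.0, -1.0))
```

After the chained call, `original.coords` is unchanged, and the frozen dataclass compares by value in the same spirit as the equality rule described above.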
124
+ ### Selections
125
+
126
+ PDB files, and standard mmCIF atom-site loops, carry per-atom metadata (atom
127
+ name, residue, chain), so you can slice a structure:
128
+
129
+ ```python
130
+ mol.select(chain="A") # one chain
131
+ mol.select(element="C") # all carbons
132
+ mol.select(resname="HOH") # waters
133
+ mol.select(resid=(10, 20)) # an inclusive residue range
134
+ mol.alpha_carbons() # CA atoms (the usual basis for protein RMSD)
135
+ mol.backbone() # N, CA, C, O
136
+ mol[mask_or_indices] # subset by numpy mask / index array
137
+ ```
138
+
139
+ ### Analysis and measurements
140
+
141
+ ```python
142
+ mol.centroid, mol.center_of_mass # geometric / mass-weighted centre
143
+ mol.radius_of_gyration # compactness (angstrom)
144
+ mol.dimensions, mol.formula # bounding box, Hill-order formula
145
+ mol.bonds() # inferred bond index pairs (KD-tree if scipy)
146
+ mol.contacts(cutoff=5.0) # atom pairs within a distance
147
+
148
+ mol.distance(i, j) # bond length
149
+ mol.angle(i, j, k) # bond angle (degrees)
150
+ mol.dihedral(a, b, c, d) # torsion angle (degrees)
151
+
152
+ a.alpha_carbons().rmsd(b.alpha_carbons(), align=True) # CA-RMSD after Kabsch fit
153
+ ```
154
+
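For readers who want to see what a torsion angle measures, here is a minimal standard-library sketch of a dihedral computed from four points. `dihedral_degrees` is a hypothetical helper, not MolScope's implementation, and its sign convention is an assumption that may differ from `mol.dihedral`.

```python
import math


def dihedral_degrees(p0, p1, p2, p3):
    """Signed torsion angle (degrees) about the p1-p2 axis; hypothetical helper."""

    def sub(a, b):
        return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

    def dot(a, b):
        return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

    def cross(a, b):
        return (
            a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0],
        )

    b0 = sub(p0, p1)
    b1 = sub(p2, p1)
    b2 = sub(p3, p2)

    # Unit vector along the central bond.
    norm = math.sqrt(dot(b1, b1))
    axis = (b1[0] / norm, b1[1] / norm, b1[2] / norm)

    # Remove the components along the axis, leaving the parts perpendicular to it.
    v = sub(b0, tuple(dot(b0, axis) * c for c in axis))
    w = sub(b2, tuple(dot(b2, axis) * c for c in axis))

    # atan2 keeps the sign and stays stable near 0 and 180 degrees.
    x = dot(v, w)
    y = dot(cross(axis, v), w)
    return math.degrees(math.atan2(y, x))


cis = dihedral_degrees((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1))
trans = dihedral_degrees((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1))
```

The two sample geometries are a cis arrangement (angle 0) and a trans arrangement (magnitude 180).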
155
+ ### Structural descriptors for ML
156
+
157
+ ```python
158
+ features = mol.descriptors() # flat dict of scalar/vector descriptors
159
+ features["radius_of_gyration"]
160
+ features["principal_moments"] # 3 values
161
+ features["distance_histogram"] # fixed-size histogram
162
+
163
+ X, names = ms.featurize_many(
164
+ ["a.pdb", "b.pdb", "c.xyz"],
165
+ return_names=True,
166
+ ) # numeric matrix + column names
167
+ ```
168
+
169
+ Descriptors include atom/residue counts, element counts, molecular mass,
170
+ centres, radius of gyration, bounding-box dimensions, inertia tensor, principal
171
+ moments/axes, shape anisotropy, compactness, distance histograms, bond-length
172
+ summary statistics, and atom/residue contact summaries. Full contact maps remain
173
+ available through `mol.contact_map(...)`.
174
+
175
+ ### Contact maps
176
+
177
+ ```python
178
+ cmap = mol.contact_map(cutoff=8.0, level="residue") # CA-CA contacts -> ContactMap
179
+ cmap.matrix # (R, R) array
180
+ mol.plot_contact_map(cutoff=8.0) # heatmap
181
+
182
+ mol.contact_map(level="atom") # atom-level map
183
+ mol.contact_map(level="residue", method="min") # closest inter-residue atom
184
+ mol.contact_map(level="residue", method="com") # residue centre of mass
185
+ ```
186
+
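As a rough illustration of what a distance-cutoff contact map contains, the sketch below builds one from a list of points using only the standard library. `contact_matrix` is a hypothetical helper; the inclusive cutoff and the zero diagonal are assumptions, not documented MolScope behaviour.

```python
import math


def contact_matrix(points, cutoff):
    """Symmetric 0/1 matrix; entry [i][j] is 1 when points i and j are within cutoff."""
    n = len(points)
    matrix = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(points[i], points[j]) <= cutoff:
                matrix[i][j] = matrix[j][i] = 1
    return matrix


# Three hypothetical CA positions (angstrom) standing in for three residues.
ca_positions = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (12.0, 0.0, 0.0)]
cmap = contact_matrix(ca_positions, cutoff=8.0)
```

With an 8.0 angstrom cutoff only the first two positions are in contact; the third is 8.2 angstrom from its nearest neighbour.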
187
+ ### NMR ensembles
188
+
189
+ ```python
190
+ from molscope import ensemble
191
+
192
+ models = ms.read_pdb_models("1aml.pdb") # all 20 models
193
+ ensemble.rmsd_matrix(models) # pairwise RMSD matrix
194
+ ensemble.rmsf(models) # per-atom fluctuation
195
+ ensemble.average(models) # mean structure
196
+ ensemble.align_all(models) # superpose every model onto the first
197
+
198
+ # Per-residue-pair contact probability across the ensemble (NMR variability)
199
+ freq = ms.ensemble_contact_frequency(models, cutoff=8.0)
200
+ freq.plot() # heatmap of contact frequencies in [0, 1]
201
+ ```
202
+
203
+ ### Comparing and clustering conformers
204
+
205
+ Cluster an ensemble (NMR models, conformer sets, docking poses, MD snapshots) by
206
+ pairwise RMSD:
207
+
208
+ ```python
209
+ matrix = ms.rmsd_matrix(models, align=True) # (M, M) RMSD matrix
210
+ ms.plot_rmsd_heatmap(matrix) # heatmap
211
+
212
+ clusters = ms.cluster(models, method="hierarchical") # data-driven cutoff
213
+ clusters = ms.cluster(models, n_clusters=3) # ...or a fixed count
214
+ clusters.n_clusters # how many clusters
215
+ clusters.groups() # {cluster_id: [model indices]}
216
+ clusters.representatives() # {cluster_id: medoid model index}
217
+
218
+ ms.plot_rmsd_heatmap(matrix, order=clusters.order) # reorder into diagonal blocks
219
+ ```
220
+
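The idea of a medoid representative can be sketched in a few lines: within each cluster, pick the member whose summed RMSD to the other members is smallest. The `medoid` helper, the matrix values and the cluster assignment below are all hypothetical; how MolScope breaks ties or defines its medoid is not stated in this README.

```python
def medoid(indices, rmsd):
    """Return the member of `indices` with the smallest summed RMSD to the others."""
    return min(indices, key=lambda i: sum(rmsd[i][j] for j in indices))


# Hypothetical 4-model pairwise RMSD matrix (angstrom), symmetric with a zero diagonal.
rmsd = [
    [0.0, 1.0, 1.2, 6.0],
    [1.0, 0.0, 0.9, 6.1],
    [1.2, 0.9, 0.0, 5.8],
    [6.0, 6.1, 5.8, 0.0],
]
groups = {0: [0, 1, 2], 1: [3]}
representatives = {cid: medoid(members, rmsd) for cid, members in groups.items()}
```

Here model 1 represents the first cluster because it sits closest to both of its neighbours, and the singleton cluster represents itself.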
221
+ ### Writing and viewing
222
+
223
+ ```python
224
+ ms.write_xyz(mol.centered(), "out.xyz") # write transformed coordinates back
225
+ ms.write_pdb(mol, "out.pdb")
226
+
227
+ mol.plot(color_by="chain") # colour by element / chain / residue
228
+ mol.view(style="cartoon") # interactive py3Dmol viewer (notebooks)
229
+ from molscope.plotting import spin_gif
230
+ spin_gif(mol, "spin.gif") # rotating animation
231
+ ```
232
+
233
+ ### Molecular graphs (for machine learning)
234
+
235
+ Turn 3D coordinates plus inferred bonds into a graph, then export to the common
236
+ ML frameworks. The base `to_graph()` needs no extra dependencies; each exporter
237
+ imports its backend lazily.
238
+
239
+ ```python
240
+ mol = ms.read("1fqy.pdb")
241
+
242
+ g = mol.to_graph() # MolecularGraph: nodes + edges, no deps
243
+ g.n_atoms, g.n_bonds # counts
244
+ g.atomic_numbers, g.masses # per-node arrays
245
+ g.node_features() # (N, 2) default features [atomic_number, mass]
246
+
247
+ G = mol.to_networkx() # networkx.Graph with node/edge attributes
248
+ data = mol.to_pyg_data() # torch_geometric.data.Data (x, pos, edge_index, edge_attr, z)
249
+ dglg = mol.to_dgl_graph() # dgl.DGLGraph with ndata/edata tensors
250
+ ```
251
+
252
+ Nodes carry element, atomic number, mass, coordinates and (from PDB/mmCIF) atom
253
+ name, residue and chain. Edges carry the bonded pair, interatomic distance, and
254
+ bond order (`1.0` for geometrically inferred bonds). Install backends as needed:
255
+ `pip install "molscope[graph]"` installs only NetworkX. PyTorch Geometric and
256
+ DGL are optional manual installs: `pip install torch torch_geometric` or
257
+ `pip install dgl` after choosing the right PyTorch build for your platform.
258
+
259
+ ### Coarse-graining
260
+
261
+ Map an atomistic structure onto a smaller set of beads. The result is an
262
+ ordinary `Molecule` (beads as "atoms") with explicit CG bonds attached, so it
263
+ plots, transforms and graphs like anything else.
264
+
265
+ ```python
266
+ mol = ms.read("1fqy.pdb")
267
+
268
+ cg = mol.coarse_grain("residue_com") # one bead per residue (centre of mass)
269
+ cg = mol.coarse_grain("residue_centroid") # ...or geometric centroid
270
+ cg = mol.coarse_grain("martini") # simplified backbone + side-chain beads
271
+ cg.plot(scale=200) # beads + backbone topology
272
+ print(cg.mapping_report()) # explain beads, dropped atoms, and bonds
273
+
274
+ # Custom bead definitions by residue + atom name (needs PDB/mmCIF metadata)
275
+ mapping = {"ALA": {"BB": ["N", "CA", "C", "O"], "SC": ["CB"]}}
276
+ cg = mol.coarse_grain(mapping)
277
+ cg, report = mol.coarse_grain(mapping, return_report=True)
278
+
279
+ # Custom bead definitions by atom index (works on ANY structure, even .xyz)
280
+ cg = mol.coarse_grain({"head": [0, 1, 2, 3], "tail": [4, 5, 6, 7]},
281
+ bonds=[("head", "tail")]) # define the bead network too
282
+
283
+ cg.to_graph() # CG bead network, ready for ML
284
+ ```
285
+
286
+ Bead positions are mass-weighted (or centroids). For residue mappings bonds are
287
+ generated automatically (within a residue, plus a backbone chain between
288
+ residues); pass `bonds=` to define them yourself. Name-based bonds are intended
289
+ for unique bead names such as `head`/`tail`; repeated names such as `BB`/`SC`
290
+ are ambiguous, so use bead indices for those. Atoms you leave unassigned are
291
+ dropped with a warning. The coarse-graining here is meant for teaching and
292
+ prototyping CG mappings, not as a replacement for production Martini
293
+ parameters.
294
+
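To make "mass-weighted bead positions" concrete, here is a minimal sketch of an index-based mapping, using only the standard library. `bead_position`, the coordinates and the rounded masses are hypothetical and chosen for easy arithmetic; this is not MolScope's `coarse_grain` implementation.

```python
def bead_position(coords, masses, atom_indices):
    """Mass-weighted centre of the listed atoms; hypothetical helper."""
    total = sum(masses[i] for i in atom_indices)
    return tuple(
        sum(masses[i] * coords[i][axis] for i in atom_indices) / total
        for axis in range(3)
    )


# Hypothetical four-atom structure: two heavier atoms followed by two lighter ones.
coords = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (10.0, 0.0, 0.0), (10.0, 2.0, 0.0)]
masses = [12.0, 12.0, 1.0, 1.0]  # rounded masses, for illustration only
mapping = {"head": [0, 1], "tail": [2, 3]}

beads = {name: bead_position(coords, masses, idx) for name, idx in mapping.items()}
```

Because the masses within each bead are equal in this example, each bead lands at the midpoint of its two atoms; unequal masses would pull the bead toward the heavier atom.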
295
+ ## Command line
296
+
297
+ ```bash
298
+ molscope helix_201.xyz --translate 1 2 -1
299
+ molscope 1fqy.pdb --select atom_name=CA --color-by residue --save ca.png
300
+ molscope --fetch 1aml --center --gif amyloid.gif
301
+ python -m molscope 1fqy.pdb # equivalent if not pip-installed
302
+ ```
303
+
304
+ ## Sample structures
305
+
306
+ | File | Contents |
307
+ |------|----------|
308
+ | `helix_201.xyz` | a helix (bare coordinates) |
309
+ | `1fqy.pdb` | Aquaporin-1, single model (1661 atoms) |
310
+ | `1aml.pdb` | Alzheimer amyloid A4 peptide, 20-model NMR ensemble |
311
+
312
+ ## Notes
313
+
314
+ - PDB files are parsed by **fixed columns**, not whitespace splitting, so atoms
315
+ with touching coordinate fields (large or negative values) read correctly.
316
+ - Alternate conformations (altLoc) other than the primary one are skipped.
317
+ - `read_pdb` returns a single model (`model=1` by default); use `read_pdb_models`
318
+ for the whole ensemble.
319
+ - Bond inference uses a `scipy.spatial.cKDTree` when available; without scipy it
320
+ falls back to a dense `O(n^2)` search that is refused above ~8000 atoms.
321
+ - Optional extras: `pip install "molscope[fast]"` (scipy, faster bonds/contacts)
322
+ and `"molscope[viz]"` (py3Dmol, for `Molecule.view`).
323
+
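The first note, on fixed-column parsing, can be demonstrated with a short standard-library sketch. `format_atom` and `parse_atom` are hypothetical helpers that follow the published PDB column layout for `ATOM` records in simplified form; they are not MolScope's reader.

```python
def format_atom(serial, name, resname, chain, resseq, x, y, z, element):
    """Build a fixed-width ATOM record (simplified; hypothetical helper)."""
    return (
        f"ATOM  {serial:>5} {name:<4} {resname:>3} {chain}{resseq:>4}    "
        f"{x:8.3f}{y:8.3f}{z:8.3f}{1.0:6.2f}{0.0:6.2f}          {element:>2}"
    )


def parse_atom(line):
    """Read fields by column position, as the PDB format defines them."""
    return {
        "name": line[12:16].strip(),
        "resname": line[17:20].strip(),
        "chain": line[21],
        "resseq": int(line[22:26]),
        # x, y, z occupy columns 31-38, 39-46 and 47-54 (1-based).
        "xyz": (float(line[30:38]), float(line[38:46]), float(line[46:54])),
    }


line = format_atom(1, "CA", "MET", "A", 1, -100.123, -200.456, -300.789, "C")
atom = parse_atom(line)
tokens = line.split()  # whitespace splitting fuses the three coordinates
```

In this record the coordinate fields touch, so `line.split()` yields a single token `-100.123-200.456-300.789` that cannot be converted to a float, while the column slices recover all three values.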
324
+ ## Tests and linting
325
+
326
+ ```bash
327
+ uv run pytest # full test suite
328
+ uv run ruff check . # lint
329
+ ```
330
+
331
+ CI (GitHub Actions) runs both across Python 3.9 / 3.11 / 3.13 on every push and PR.
332
+
333
+ ## License
334
+
335
+ [MIT](LICENSE)
@@ -0,0 +1,308 @@
1
+ # MolScope
2
+
3
+ [![CI](https://github.com/roshan2004/molscope/actions/workflows/ci.yml/badge.svg)](https://github.com/roshan2004/molscope/actions/workflows/ci.yml)
4
+ [![Python](https://img.shields.io/badge/python-3.9%20%7C%203.11%20%7C%203.13-blue)](pyproject.toml)
5
+ [![License: MIT](https://img.shields.io/badge/license-MIT-yellow.svg)](LICENSE)
6
+ [![Code style: Ruff](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/astral-sh/ruff/main/assets/badge/v2.json)](https://github.com/astral-sh/ruff)
7
+
8
+ Lightweight molecular structure analysis, visualisation, graph export, and
9
+ coarse-graining in Python. Read `.xyz`, `.pdb`, `.cif` and `.sdf` files
10
+ (optionally gzip-compressed), select and analyse atoms, and visualise them in
11
+ 3D. The `.cif` reader is a basic mmCIF parser for standard `_atom_site`
12
+ coordinate loops, not a full mmCIF syntax implementation.
13
+
14
+ | 3D structure rendering | Residue contact map | Coarse-grained beads |
15
+ | --- | --- | --- |
16
+ | ![Aquaporin-1 rendered as a 3D element-coloured molecular structure](docs/assets/readme/aquaporin-structure-v2.png) | ![Residue-level contact map heatmap for Aquaporin-1](docs/assets/readme/residue-contact-map.png) | ![Coarse-grained bead model of Aquaporin-1](docs/assets/readme/coarse-grained-beads-v2.png) |
17
+
18
+ ## What it does
19
+
20
+ - **Read and write** XYZ, PDB, mmCIF and SDF (gzip-aware), fetch structures by
21
+ id from RCSB, and load multi-model NMR ensembles.
22
+ - **Select and measure** by chain, element or residue; compute distances,
23
+ angles, dihedrals and Kabsch-aligned RMSD.
24
+ - **Analyse** centroids, radius of gyration, the inertia tensor, inferred bonds
25
+ and contacts.
26
+ - **Contact maps** at atom or residue level, with heatmap plots.
27
+ - **Ensembles**: pairwise RMSD, RMSF, averaging, and conformer clustering.
28
+ - **Export for ML**: flat structural descriptors and molecular graphs for
29
+ NetworkX, PyTorch Geometric and DGL.
30
+ - **Coarse-grain** onto residue, Martini-style or custom bead mappings.
31
+ - **Visualise** with 3D matplotlib plots, an interactive py3Dmol viewer, spin
32
+ GIFs, and a command-line interface.
33
+
34
+ ## Install
35
+
36
+ With [uv](https://docs.astral.sh/uv/) (recommended):
37
+
38
+ ```bash
39
+ uv sync # creates .venv, installs deps + dev tools from the lockfile
40
+ uv run molscope 1fqy.pdb # run the CLI
41
+ uv run pytest # run the tests
42
+ ```
43
+
44
+ `uv sync` pins the interpreter from `.python-version` and resolves against
45
+ `uv.lock` for reproducible installs. Use `uv sync --no-dev` to skip the test tools.
46
+
47
+ With plain pip:
48
+
49
+ ```bash
50
+ python -m venv .venv && source .venv/bin/activate
51
+ pip install -e ".[test]" # or: pip install -r requirements.txt
52
+ ```
53
+
54
+ ## Documentation
55
+
56
+ The documentation website is built with MkDocs Material:
57
+
58
+ ```bash
59
+ uv sync --group docs
60
+ uv run mkdocs serve
61
+ python scripts/build_user_guide_pdf.py
62
+ ```
63
+
64
+ Docs source lives in `docs/`; the site configuration is `mkdocs.yml`. The PDF
65
+ builder requires Pandoc and a LaTeX engine such as `xelatex`.
66
+
67
+ ## Quickstart
68
+
69
+ A runnable end-to-end tour over the bundled sample structures lives in
70
+ [`example.py`](example.py):
71
+
72
+ ```bash
73
+ uv run python example.py # opens 3D plot windows
74
+ MPLBACKEND=Agg uv run python example.py # headless: saves PNGs instead
75
+ ```
76
+
77
+ It reads an `.xyz` and a `.pdb`, prints derived properties, compares the NMR
78
+ models of `1aml`, writes a transformed structure back out, and renders a plot.
79
+
80
+ ## Library
81
+
82
+ ```python
83
+ import molscope as ms
84
+
85
+ mol = ms.read("1fqy.pdb") # parser chosen from the extension
86
+ mol = ms.fetch("1fqy") # ...or download straight from RCSB by id
87
+ print(mol.summary()) # atoms, formula, chains, bounding box
88
+
89
+ mol = mol.centered().rotate("z", 90).translate((1, 2, -1))
90
+ mol.plot() # CPK colours, inferred bonds, equal aspect
91
+ ```
92
+
93
+ `Molecule` is immutable: `translate`, `centered` and `rotate` each return a new
94
+ molecule, so transformations chain cleanly without aliasing. Equality is by
95
+ value (`np.array_equal` on coordinates).
96
+
97
+ ### Selections
98
+
99
+ PDB files, and standard mmCIF atom-site loops, carry per-atom metadata (atom
100
+ name, residue, chain), so you can slice a structure:
101
+
102
+ ```python
103
+ mol.select(chain="A") # one chain
104
+ mol.select(element="C") # all carbons
105
+ mol.select(resname="HOH") # waters
106
+ mol.select(resid=(10, 20)) # an inclusive residue range
107
+ mol.alpha_carbons() # CA atoms (the usual basis for protein RMSD)
108
+ mol.backbone() # N, CA, C, O
109
+ mol[mask_or_indices] # subset by numpy mask / index array
110
+ ```
111
+
112
+ ### Analysis and measurements
113
+
114
+ ```python
115
+ mol.centroid, mol.center_of_mass # geometric / mass-weighted centre
116
+ mol.radius_of_gyration # compactness (angstrom)
117
+ mol.dimensions, mol.formula # bounding box, Hill-order formula
118
+ mol.bonds() # inferred bond index pairs (KD-tree if scipy)
119
+ mol.contacts(cutoff=5.0) # atom pairs within a distance
120
+
121
+ mol.distance(i, j) # bond length
122
+ mol.angle(i, j, k) # bond angle (degrees)
123
+ mol.dihedral(a, b, c, d) # torsion angle (degrees)
124
+
125
+ a.alpha_carbons().rmsd(b.alpha_carbons(), align=True) # CA-RMSD after Kabsch fit
126
+ ```
127
+
128
+ ### Structural descriptors for ML
129
+
130
+ ```python
131
+ features = mol.descriptors() # flat dict of scalar/vector descriptors
132
+ features["radius_of_gyration"]
133
+ features["principal_moments"] # 3 values
134
+ features["distance_histogram"] # fixed-size histogram
135
+
136
+ X, names = ms.featurize_many(
137
+ ["a.pdb", "b.pdb", "c.xyz"],
138
+ return_names=True,
139
+ ) # numeric matrix + column names
140
+ ```
141
+
142
+ Descriptors include atom/residue counts, element counts, molecular mass,
143
+ centres, radius of gyration, bounding-box dimensions, inertia tensor, principal
144
+ moments/axes, shape anisotropy, compactness, distance histograms, bond-length
145
+ summary statistics, and atom/residue contact summaries. Full contact maps remain
146
+ available through `mol.contact_map(...)`.
147
+
148
+ ### Contact maps
149
+
150
+ ```python
151
+ cmap = mol.contact_map(cutoff=8.0, level="residue") # CA-CA contacts -> ContactMap
152
+ cmap.matrix # (R, R) array
153
+ mol.plot_contact_map(cutoff=8.0) # heatmap
154
+
155
+ mol.contact_map(level="atom") # atom-level map
156
+ mol.contact_map(level="residue", method="min") # closest inter-residue atom
157
+ mol.contact_map(level="residue", method="com") # residue centre of mass
158
+ ```
159
+
160
+ ### NMR ensembles
161
+
162
+ ```python
163
+ from molscope import ensemble
164
+
165
+ models = ms.read_pdb_models("1aml.pdb") # all 20 models
166
+ ensemble.rmsd_matrix(models) # pairwise RMSD matrix
167
+ ensemble.rmsf(models) # per-atom fluctuation
168
+ ensemble.average(models) # mean structure
169
+ ensemble.align_all(models) # superpose every model onto the first
170
+
171
+ # Per-residue-pair contact probability across the ensemble (NMR variability)
172
+ freq = ms.ensemble_contact_frequency(models, cutoff=8.0)
173
+ freq.plot() # heatmap of contact frequencies in [0, 1]
174
+ ```
175
+
176
+ ### Comparing and clustering conformers
177
+
178
+ Cluster an ensemble (NMR models, conformer sets, docking poses, MD snapshots) by
179
+ pairwise RMSD:
180
+
181
+ ```python
182
+ matrix = ms.rmsd_matrix(models, align=True) # (M, M) RMSD matrix
183
+ ms.plot_rmsd_heatmap(matrix) # heatmap
184
+
185
+ clusters = ms.cluster(models, method="hierarchical") # data-driven cutoff
186
+ clusters = ms.cluster(models, n_clusters=3) # ...or a fixed count
187
+ clusters.n_clusters # how many clusters
188
+ clusters.groups() # {cluster_id: [model indices]}
189
+ clusters.representatives() # {cluster_id: medoid model index}
190
+
191
+ ms.plot_rmsd_heatmap(matrix, order=clusters.order) # reorder into diagonal blocks
192
+ ```
193
+
194
+ ### Writing and viewing
195
+
196
+ ```python
197
+ ms.write_xyz(mol.centered(), "out.xyz") # write transformed coordinates back
198
+ ms.write_pdb(mol, "out.pdb")
199
+
200
+ mol.plot(color_by="chain") # colour by element / chain / residue
201
+ mol.view(style="cartoon") # interactive py3Dmol viewer (notebooks)
202
+ from molscope.plotting import spin_gif
203
+ spin_gif(mol, "spin.gif") # rotating animation
204
+ ```
205
+
206
+ ### Molecular graphs (for machine learning)
207
+
208
+ Turn 3D coordinates plus inferred bonds into a graph, then export to the common
209
+ ML frameworks. The base `to_graph()` needs no extra dependencies; each exporter
210
+ imports its backend lazily.
211
+
212
+ ```python
213
+ mol = ms.read("1fqy.pdb")
214
+
215
+ g = mol.to_graph() # MolecularGraph: nodes + edges, no deps
216
+ g.n_atoms, g.n_bonds # counts
217
+ g.atomic_numbers, g.masses # per-node arrays
218
+ g.node_features() # (N, 2) default features [atomic_number, mass]
219
+
220
+ G = mol.to_networkx() # networkx.Graph with node/edge attributes
221
+ data = mol.to_pyg_data() # torch_geometric.data.Data (x, pos, edge_index, edge_attr, z)
222
+ dglg = mol.to_dgl_graph() # dgl.DGLGraph with ndata/edata tensors
223
+ ```
224
+
225
+ Nodes carry element, atomic number, mass, coordinates and (from PDB/mmCIF) atom
226
+ name, residue and chain. Edges carry the bonded pair, interatomic distance, and
227
+ bond order (`1.0` for geometrically inferred bonds). Install backends as needed:
228
+ `pip install "molscope[graph]"` installs only NetworkX. PyTorch Geometric and
229
+ DGL are optional manual installs: `pip install torch torch_geometric` or
230
+ `pip install dgl` after choosing the right PyTorch build for your platform.
231
+
232
+ ### Coarse-graining
233
+
234
+ Map an atomistic structure onto a smaller set of beads. The result is an
235
+ ordinary `Molecule` (beads as "atoms") with explicit CG bonds attached, so it
236
+ plots, transforms and graphs like anything else.
237
+
238
+ ```python
239
+ mol = ms.read("1fqy.pdb")
240
+
241
+ cg = mol.coarse_grain("residue_com") # one bead per residue (centre of mass)
242
+ cg = mol.coarse_grain("residue_centroid") # ...or geometric centroid
243
+ cg = mol.coarse_grain("martini") # simplified backbone + side-chain beads
244
+ cg.plot(scale=200) # beads + backbone topology
245
+ print(cg.mapping_report()) # explain beads, dropped atoms, and bonds
246
+
247
+ # Custom bead definitions by residue + atom name (needs PDB/mmCIF metadata)
248
+ mapping = {"ALA": {"BB": ["N", "CA", "C", "O"], "SC": ["CB"]}}
249
+ cg = mol.coarse_grain(mapping)
250
+ cg, report = mol.coarse_grain(mapping, return_report=True)
251
+
252
+ # Custom bead definitions by atom index (works on ANY structure, even .xyz)
253
+ cg = mol.coarse_grain({"head": [0, 1, 2, 3], "tail": [4, 5, 6, 7]},
254
+ bonds=[("head", "tail")]) # define the bead network too
255
+
256
+ cg.to_graph() # CG bead network, ready for ML
257
+ ```
258
+
259
+ Bead positions are mass-weighted (or centroids). For residue mappings bonds are
260
+ generated automatically (within a residue, plus a backbone chain between
261
+ residues); pass `bonds=` to define them yourself. Name-based bonds are intended
262
+ for unique bead names such as `head`/`tail`; repeated names such as `BB`/`SC`
263
+ are ambiguous, so use bead indices for those. Atoms you leave unassigned are
264
+ dropped with a warning. The coarse-graining here is meant for teaching and
265
+ prototyping CG mappings, not as a replacement for production Martini
266
+ parameters.
267
+
268
+ ## Command line
269
+
270
+ ```bash
271
+ molscope helix_201.xyz --translate 1 2 -1
272
+ molscope 1fqy.pdb --select atom_name=CA --color-by residue --save ca.png
273
+ molscope --fetch 1aml --center --gif amyloid.gif
274
+ python -m molscope 1fqy.pdb # equivalent if not pip-installed
275
+ ```
276
+
277
+ ## Sample structures
278
+
279
+ | File | Contents |
280
+ |------|----------|
281
+ | `helix_201.xyz` | a helix (bare coordinates) |
282
+ | `1fqy.pdb` | Aquaporin-1, single model (1661 atoms) |
283
+ | `1aml.pdb` | Alzheimer amyloid A4 peptide, 20-model NMR ensemble |
284
+
285
+ ## Notes
286
+
287
+ - PDB files are parsed by **fixed columns**, not whitespace splitting, so atoms
288
+ with touching coordinate fields (large or negative values) read correctly.
289
+ - Alternate conformations (altLoc) other than the primary one are skipped.
290
+ - `read_pdb` returns a single model (`model=1` by default); use `read_pdb_models`
291
+ for the whole ensemble.
292
+ - Bond inference uses a `scipy.spatial.cKDTree` when available; without scipy it
293
+ falls back to a dense `O(n^2)` search that is refused above ~8000 atoms.
294
+ - Optional extras: `pip install "molscope[fast]"` (scipy, faster bonds/contacts)
295
+ and `"molscope[viz]"` (py3Dmol, for `Molecule.view`).
296
+
297
+ ## Tests and linting
298
+
299
+ ```bash
300
+ uv run pytest # full test suite
301
+ uv run ruff check . # lint
302
+ ```
303
+
304
+ CI (GitHub Actions) runs both across Python 3.9 / 3.11 / 3.13 on every push and PR.
305
+
306
+ ## License
307
+
308
+ [MIT](LICENSE)