PyPI - molscope - Versions diffs - 0.6.2__tar.gz → 0.7.0__tar.gz - Mend

molscope 0.6.2.tar.gz → 0.7.0.tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (33)

{molscope-0.6.2 → molscope-0.7.0}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: molscope
-Version: 0.6.2
+Version: 0.7.0
 Summary: Lightweight molecular structure analysis, visualisation, graph export, and coarse-graining in Python.
 Author-email: Roshan Shrestha <roshanpra@gmail.com>
 License-Expression: MIT
@@ -38,9 +38,9 @@ coarse-graining in Python. Read `.xyz`, `.pdb`, `.cif` and `.sdf` files
 3D. The `.cif` reader is a basic mmCIF parser for standard `_atom_site`
 coordinate loops, not a full mmCIF syntax implementation.
-| 3D structure rendering | Residue contact map | Coarse-grained beads |
-| --- | --- | --- |
-| ![Aquaporin-1 rendered as a 3D element-coloured molecular structure](https://raw.githubusercontent.com/roshan2004/molscope/main/docs/assets/readme/aquaporin-structure-v2.png) | ![Residue-level contact map heatmap for Aquaporin-1](https://raw.githubusercontent.com/roshan2004/molscope/main/docs/assets/readme/residue-contact-map.png) | ![Coarse-grained bead model of Aquaporin-1](https://raw.githubusercontent.com/roshan2004/molscope/main/docs/assets/readme/coarse-grained-beads-v2.png) |
+| 3D structure (element) | Secondary structure (DSSP) | Residue contact map | Coarse-grained beads |
+| --- | --- | --- | --- |
+| ![Aquaporin-1 rendered as a 3D element-coloured molecular structure](https://raw.githubusercontent.com/roshan2004/molscope/main/docs/assets/readme/aquaporin-structure-v2.png) | ![Aquaporin-1 coloured by DSSP secondary structure: helices red, turns cyan, coil grey](https://raw.githubusercontent.com/roshan2004/molscope/main/docs/assets/readme/secondary-structure.png) | ![Residue-level contact map heatmap for Aquaporin-1](https://raw.githubusercontent.com/roshan2004/molscope/main/docs/assets/readme/residue-contact-map.png) | ![Coarse-grained bead model of Aquaporin-1](https://raw.githubusercontent.com/roshan2004/molscope/main/docs/assets/readme/coarse-grained-beads-v2.png) |
 ## What it does
@@ -51,6 +51,8 @@ coordinate loops, not a full mmCIF syntax implementation.
 - **Analyse** centroids, radius of gyration, the inertia tensor, inferred bonds
   and contacts.
 - **Contact maps** at atom or residue level, with heatmap plots.
+- **Secondary structure** via a self-contained, dependency-free DSSP, with
+  `plot(color_by="ss")`.
 - **Ensembles**: pairwise RMSD, RMSF, averaging, and conformer clustering.
 - **Export for ML**: flat structural descriptors and molecular graphs for
   NetworkX, PyTorch Geometric and DGL.
@@ -58,6 +60,33 @@ coordinate loops, not a full mmCIF syntax implementation.
 - **Visualise** with 3D matplotlib plots, an interactive py3Dmol viewer, spin
   GIFs, and a command-line interface.
+## Why MolScope?
+MolScope is **not** intended to replace full molecular-simulation or
+cheminformatics frameworks. It is a lightweight **educational and prototyping**
+toolkit for reading common molecular structure files, performing simple
+structural analysis, exporting graph representations for ML workflows, and
+experimenting with coarse-grained mappings. Its core depends only on NumPy and
+Matplotlib, and the API is Python-first and scriptable.
+In particular, the coarse-graining tools are for **educational CG mapping and
+bead-graph prototyping**: useful for exploring mappings before moving to a
+production Martini workflow. They are not a validated Martini force-field
+generator.
+| Tool | Main focus | How MolScope differs |
+| --- | --- | --- |
+| RDKit | Cheminformatics | MolScope leans toward structure visualisation, protein/PDB-style metadata, and CG prototyping |
+| MDAnalysis | MD trajectories | MolScope is lighter and easier for static structures and teaching |
+| MDTraj | Trajectory analysis | MolScope is simpler and graph/CG oriented |
+| Biopython | Structure parsing / bioinformatics | MolScope adds 3D analysis, ML-graph export, and coarse-graining |
+| PyMOL / VMD | Interactive visualisation | MolScope is Python-first, scriptable, and ML-export friendly |
+| nglview | Notebook structure viewer | MolScope also does analysis, descriptors, graphs and CG, not just viewing |
+Reach for those tools when you need their depth and validation. Reach for
+MolScope when you want something small, readable, and quick to teach or
+prototype with.
 ## Install
 With [uv](https://docs.astral.sh/uv/) (recommended):
@@ -184,6 +213,29 @@ mol.contact_map(level="residue", method="min")        # closest inter-residue at
 mol.contact_map(level="residue", method="com")        # residue centre of mass
 ```
+### Secondary structure (DSSP)
+Assign protein secondary structure from backbone hydrogen-bond patterns with a
+self-contained, pure-NumPy DSSP (no external `mkdssp` binary needed):
+```python
+mol = ms.read("1fqy.pdb")
+ss = mol.secondary_structure()      # SecondaryStructure, one code per residue
+ss.string                           # e.g. '--HHHHHHHH--SS--EEEE--'
+ss.codes                            # per-residue array
+ss.summary()                        # helix/strand/coil counts and fractions
+mol.plot(color_by="ss")             # colour the 3D view by secondary structure
+```
+Codes follow DSSP: `H`/`G`/`I` helices, `E`/`B` strands, `T` turn, `S` bend,
+`-` coil. This is a simplified **educational** implementation: it reproduces the
+main classes from the Kabsch-Sander hydrogen-bond model but is not bit-identical
+to the reference `mkdssp` on every edge case. It needs backbone N/CA/C/O atoms,
+so use PDB/mmCIF input (not a bare `.xyz`). The secondary-structure render in the
+showcase above (helices red, turns cyan, coil grey) is produced this way.
 ### NMR ensembles
 ```python

{molscope-0.6.2 → molscope-0.7.0}/README.md RENAMED

@@ -11,9 +11,9 @@ coarse-graining in Python. Read `.xyz`, `.pdb`, `.cif` and `.sdf` files
 3D. The `.cif` reader is a basic mmCIF parser for standard `_atom_site`
 coordinate loops, not a full mmCIF syntax implementation.
-| 3D structure rendering | Residue contact map | Coarse-grained beads |
-| --- | --- | --- |
-| ![Aquaporin-1 rendered as a 3D element-coloured molecular structure](https://raw.githubusercontent.com/roshan2004/molscope/main/docs/assets/readme/aquaporin-structure-v2.png) | ![Residue-level contact map heatmap for Aquaporin-1](https://raw.githubusercontent.com/roshan2004/molscope/main/docs/assets/readme/residue-contact-map.png) | ![Coarse-grained bead model of Aquaporin-1](https://raw.githubusercontent.com/roshan2004/molscope/main/docs/assets/readme/coarse-grained-beads-v2.png) |
+| 3D structure (element) | Secondary structure (DSSP) | Residue contact map | Coarse-grained beads |
+| --- | --- | --- | --- |
+| ![Aquaporin-1 rendered as a 3D element-coloured molecular structure](https://raw.githubusercontent.com/roshan2004/molscope/main/docs/assets/readme/aquaporin-structure-v2.png) | ![Aquaporin-1 coloured by DSSP secondary structure: helices red, turns cyan, coil grey](https://raw.githubusercontent.com/roshan2004/molscope/main/docs/assets/readme/secondary-structure.png) | ![Residue-level contact map heatmap for Aquaporin-1](https://raw.githubusercontent.com/roshan2004/molscope/main/docs/assets/readme/residue-contact-map.png) | ![Coarse-grained bead model of Aquaporin-1](https://raw.githubusercontent.com/roshan2004/molscope/main/docs/assets/readme/coarse-grained-beads-v2.png) |
 ## What it does
@@ -24,6 +24,8 @@ coordinate loops, not a full mmCIF syntax implementation.
 - **Analyse** centroids, radius of gyration, the inertia tensor, inferred bonds
   and contacts.
 - **Contact maps** at atom or residue level, with heatmap plots.
+- **Secondary structure** via a self-contained, dependency-free DSSP, with
+  `plot(color_by="ss")`.
 - **Ensembles**: pairwise RMSD, RMSF, averaging, and conformer clustering.
 - **Export for ML**: flat structural descriptors and molecular graphs for
   NetworkX, PyTorch Geometric and DGL.
@@ -31,6 +33,33 @@ coordinate loops, not a full mmCIF syntax implementation.
 - **Visualise** with 3D matplotlib plots, an interactive py3Dmol viewer, spin
   GIFs, and a command-line interface.
+## Why MolScope?
+MolScope is **not** intended to replace full molecular-simulation or
+cheminformatics frameworks. It is a lightweight **educational and prototyping**
+toolkit for reading common molecular structure files, performing simple
+structural analysis, exporting graph representations for ML workflows, and
+experimenting with coarse-grained mappings. Its core depends only on NumPy and
+Matplotlib, and the API is Python-first and scriptable.
+In particular, the coarse-graining tools are for **educational CG mapping and
+bead-graph prototyping**: useful for exploring mappings before moving to a
+production Martini workflow. They are not a validated Martini force-field
+generator.
+| Tool | Main focus | How MolScope differs |
+| --- | --- | --- |
+| RDKit | Cheminformatics | MolScope leans toward structure visualisation, protein/PDB-style metadata, and CG prototyping |
+| MDAnalysis | MD trajectories | MolScope is lighter and easier for static structures and teaching |
+| MDTraj | Trajectory analysis | MolScope is simpler and graph/CG oriented |
+| Biopython | Structure parsing / bioinformatics | MolScope adds 3D analysis, ML-graph export, and coarse-graining |
+| PyMOL / VMD | Interactive visualisation | MolScope is Python-first, scriptable, and ML-export friendly |
+| nglview | Notebook structure viewer | MolScope also does analysis, descriptors, graphs and CG, not just viewing |
+Reach for those tools when you need their depth and validation. Reach for
+MolScope when you want something small, readable, and quick to teach or
+prototype with.
 ## Install
 With [uv](https://docs.astral.sh/uv/) (recommended):
@@ -157,6 +186,29 @@ mol.contact_map(level="residue", method="min")        # closest inter-residue at
 mol.contact_map(level="residue", method="com")        # residue centre of mass
 ```
+### Secondary structure (DSSP)
+Assign protein secondary structure from backbone hydrogen-bond patterns with a
+self-contained, pure-NumPy DSSP (no external `mkdssp` binary needed):
+```python
+mol = ms.read("1fqy.pdb")
+ss = mol.secondary_structure()      # SecondaryStructure, one code per residue
+ss.string                           # e.g. '--HHHHHHHH--SS--EEEE--'
+ss.codes                            # per-residue array
+ss.summary()                        # helix/strand/coil counts and fractions
+mol.plot(color_by="ss")             # colour the 3D view by secondary structure
+```
+Codes follow DSSP: `H`/`G`/`I` helices, `E`/`B` strands, `T` turn, `S` bend,
+`-` coil. This is a simplified **educational** implementation: it reproduces the
+main classes from the Kabsch-Sander hydrogen-bond model but is not bit-identical
+to the reference `mkdssp` on every edge case. It needs backbone N/CA/C/O atoms,
+so use PDB/mmCIF input (not a bare `.xyz`). The secondary-structure render in the
+showcase above (helices red, turns cyan, coil grey) is produced this way.
 ### NMR ensembles
 ```python

{molscope-0.6.2 → molscope-0.7.0}/molscope/__init__.py RENAMED

@@ -37,10 +37,11 @@ Examples
 See https://github.com/roshan2004/molscope for the full documentation.
 """
-from . import coarsegrain, ensemble
+from . import coarsegrain, dssp, ensemble
 from .coarsegrain import BeadMapping, BondMapping, CoarseGrainReport, DroppedAtom
 from .contactmap import ContactMap
 from .descriptors import descriptors, featurize_many
+from .dssp import SecondaryStructure
 from .ensemble import Clustering, cluster, rmsd_matrix
 from .ensemble import contact_frequency as ensemble_contact_frequency
 from .graph import MolecularGraph
@@ -68,9 +69,11 @@ __all__ = [
     "DroppedAtom",
     "Molecule",
     "MolecularGraph",
+    "SecondaryStructure",
     "cluster",
     "coarsegrain",
     "descriptors",
+    "dssp",
     "ensemble",
     "ensemble_contact_frequency",
     "featurize_many",
@@ -87,4 +90,4 @@ __all__ = [
     "write_pdb",
     "write_xyz",
 ]
-__version__ = "0.6.2"
+__version__ = "0.7.0"

molscope-0.7.0/molscope/dssp.py ADDED

@@ -0,0 +1,232 @@
+"""Simplified DSSP secondary-structure assignment, in pure NumPy.
+This implements the Kabsch & Sander (1983) approach: backbone amide hydrogens
+are placed geometrically, an electrostatic hydrogen-bond energy is computed
+between every backbone C=O and N-H pair, and secondary structure is assigned
+from the resulting hydrogen-bond pattern (helices from n-turns, strands from
+bridges/ladders, turns and bends).
+It is an **educational/prototyping** implementation: it covers the main DSSP
+classes (H/G/I helices, E/B strands, T turns, S bends) but is not bit-identical
+to the reference ``mkdssp`` program on every edge case. It needs backbone N, CA,
+C and O atoms, so it works on proteins read from PDB/mmCIF, not on bare ``.xyz``.
+Codes: ``H`` alpha-helix, ``G`` 3-10 helix, ``I`` pi-helix, ``E`` beta-strand,
+``B`` beta-bridge, ``T`` turn, ``S`` bend, ``-`` coil.
+"""
+from __future__ import annotations
+from dataclasses import dataclass
+from typing import TYPE_CHECKING
+import numpy as np
+if TYPE_CHECKING:
+    from .molecule import Molecule
+# Kabsch-Sander hydrogen-bond energy constants (gives kcal/mol with r in angstrom).
+_Q1 = 0.42
+_Q2 = 0.20
+_F = 332.0
+_HBOND_CUTOFF = -0.5      # energy below this counts as a hydrogen bond
+_CA_CUTOFF = 9.0          # no H-bond if CA-CA further than this (angstrom)
+_CHAIN_BREAK = 2.5        # C(i)-N(i+1) above this is a chain break (angstrom)
+# Single-letter codes, highest assignment priority first.
+_PRIORITY = ["H", "E", "B", "G", "I", "T", "S"]
+#: Colours for ``Molecule.plot(color_by="ss")``.
+SS_COLORS = {
+    "H": "#e6194b",   # alpha-helix
+    "G": "#f58231",   # 3-10 helix
+    "I": "#911eb4",   # pi-helix
+    "E": "#ffe119",   # beta-strand
+    "B": "#bfef45",   # beta-bridge
+    "T": "#42d4f4",   # turn
+    "S": "#aaffc3",   # bend
+    "-": "#d9d9d9",   # coil
+}
+@dataclass(frozen=True)
+class SecondaryStructure:
+    """Per-residue DSSP assignment for a structure.
+    ``codes`` holds one single-character code per residue, aligned with
+    ``resids``/``chains``/``resnames`` (in chain/residue order).
+    """
+    codes: np.ndarray          # (R,) '<U1' DSSP codes
+    resids: np.ndarray         # (R,) residue ids
+    chains: list               # (R,) chain ids
+    resnames: list             # (R,) residue names
+    def __len__(self) -> int:
+        return len(self.codes)
+    @property
+    def string(self) -> str:
+        """The assignment as a single string, e.g. ``'--HHHHH--EEEE--'``."""
+        return "".join(self.codes.tolist())
+    def summary(self) -> dict:
+        """Counts and fractions for helix, strand, and coil/other."""
+        helix = int(np.isin(self.codes, ["H", "G", "I"]).sum())
+        strand = int(np.isin(self.codes, ["E", "B"]).sum())
+        total = len(self.codes)
+        coil = total - helix - strand
+        denom = total or 1
+        return {
+            "residues": total,
+            "helix": helix, "strand": strand, "coil": coil,
+            "helix_fraction": helix / denom,
+            "strand_fraction": strand / denom,
+            "coil_fraction": coil / denom,
+        }
+def _backbone_residues(molecule: Molecule):
+    """Extract per-residue N/CA/C/O coordinates for residues with a full backbone."""
+    if not molecule.atom_names or len(molecule.resids) == 0:
+        raise ValueError(
+            "secondary-structure assignment needs per-atom names and residue ids "
+            "(read the structure from PDB or mmCIF)"
+        )
+    names = molecule.atom_names
+    coords = molecule.coords
+    N, CA, C, O = [], [], [], []
+    resids, chains, resnames = [], [], []
+    for idx, resname, resid, chain in molecule.residue_groups():
+        atoms = {names[i].upper(): i for i in idx}
+        if all(a in atoms for a in ("N", "CA", "C", "O")):
+            N.append(coords[atoms["N"]])
+            CA.append(coords[atoms["CA"]])
+            C.append(coords[atoms["C"]])
+            O.append(coords[atoms["O"]])
+            resids.append(resid)
+            chains.append(chain)
+            resnames.append(resname)
+    if not resids:
+        raise ValueError("no residues with a complete N/CA/C/O backbone were found")
+    return (
+        np.array(N, float), np.array(CA, float), np.array(C, float), np.array(O, float),
+        np.array(resids, int), chains, resnames,
+    )
+def _amide_hydrogens(N, C, O, chains, connected):
+    """Place backbone amide H atoms from the previous residue's C=O geometry.
+    Returns coordinates with NaN where no hydrogen exists (first residue of a
+    chain, or after a chain break) so those residues cannot act as H-bond donors.
+    """
+    H = np.full_like(N, np.nan)
+    co = C - O                                   # reverse of the C=O bond
+    norm = np.linalg.norm(co, axis=1, keepdims=True)
+    with np.errstate(invalid="ignore"):
+        co_unit = co / norm
+    for i in range(1, len(N)):
+        if connected[i]:
+            H[i] = N[i] + co_unit[i - 1]         # 1.0 A along O(i-1)->C(i-1)
+    return H
+def _hbond_matrix(N, CA, C, O, H):
+    """Boolean (R, R) matrix; entry [i, j] true if C=O of i bonds N-H of j."""
+    def pdist(a, b):
+        return np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
+    r_on = pdist(O, N)        # acceptor O(i) - donor N(j)
+    r_ch = pdist(C, H)
+    r_oh = pdist(O, H)
+    r_cn = pdist(C, N)
+    with np.errstate(divide="ignore", invalid="ignore"):
+        energy = _Q1 * _Q2 * _F * (1.0 / r_on + 1.0 / r_ch - 1.0 / r_oh - 1.0 / r_cn)
+    ca = pdist(CA, CA)
+    bonded = (energy < _HBOND_CUTOFF) & (ca < _CA_CUTOFF)
+    bonded &= ~np.isnan(energy)                  # donors without an H are excluded
+    np.fill_diagonal(bonded, False)
+    return bonded
+def assign(molecule: Molecule) -> SecondaryStructure:
+    """Assign secondary structure to a protein with a simplified DSSP.
+    Returns a :class:`SecondaryStructure` with one code per backbone residue.
+    Raises ``ValueError`` if the molecule lacks the metadata or backbone atoms
+    needed (e.g. a bare ``.xyz`` file).
+    """
+    N, CA, C, O, resids, chains, resnames = _backbone_residues(molecule)
+    R = len(resids)
+    chain_arr = np.array(chains)
+    # Chain connectivity: same chain and a real peptide bond to the previous residue.
+    connected = np.zeros(R, dtype=bool)
+    if R > 1:
+        cn = np.linalg.norm(C[:-1] - N[1:], axis=1)
+        connected[1:] = (chain_arr[1:] == chain_arr[:-1]) & (cn < _CHAIN_BREAK)
+    H = _amide_hydrogens(N, C, O, chains, connected)
+    hb = _hbond_matrix(N, CA, C, O, H)
+    def same_chain_turn(i, n):
+        return i + n < R and chain_arr[i] == chain_arr[i + n]
+    # n-turns: an H-bond from residue i to i+n within one chain.
+    turn = {n: np.zeros(R, dtype=bool) for n in (3, 4, 5)}
+    for n in (3, 4, 5):
+        for i in range(R - n):
+            if same_chain_turn(i, n) and hb[i, i + n]:
+                turn[n][i] = True
+    masks = {code: np.zeros(R, dtype=bool) for code in _PRIORITY}
+    # Helices: two consecutive n-turns. 4 -> H (alpha), 3 -> G, 5 -> I.
+    for n, code, span in ((4, "H", 4), (3, "G", 3), (5, "I", 5)):
+        for i in range(1, R):
+            if turn[n][i] and turn[n][i - 1]:
+                masks[code][i:i + span] = True
+    # Bridges -> strands. Needs i-1, i+1, j-1, j+1 in range and |i-j| > 2.
+    bridged = np.zeros(R, dtype=bool)
+    for i in range(1, R - 1):
+        for j in range(i + 3, R - 1):
+            parallel = (hb[i - 1, j] and hb[j, i + 1]) or (hb[j - 1, i] and hb[i, j + 1])
+            anti = (hb[i, j] and hb[j, i]) or (hb[i - 1, j + 1] and hb[j - 1, i + 1])
+            if parallel or anti:
+                bridged[i] = bridged[j] = True
+    for i in range(R):
+        if bridged[i]:
+            neighbour = (i > 0 and bridged[i - 1]) or (i < R - 1 and bridged[i + 1])
+            masks["E" if neighbour else "B"][i] = True
+    # Turns: residues spanned by an n-turn.
+    for n in (3, 4, 5):
+        for i in np.nonzero(turn[n])[0]:
+            masks["T"][i + 1:i + n] = True
+    # Bends: sharp kink in the CA trace (virtual angle > 70 degrees).
+    if R > 4:
+        v1 = CA[2:-2] - CA[:-4]
+        v2 = CA[4:] - CA[2:-2]
+        cos = np.sum(v1 * v2, axis=1) / (
+            np.linalg.norm(v1, axis=1) * np.linalg.norm(v2, axis=1) + 1e-9
+        )
+        masks["S"][2:-2] = np.degrees(np.arccos(np.clip(cos, -1, 1))) > 70.0
+    # Resolve by priority (lowest first so higher-priority codes overwrite).
+    codes = np.full(R, "-", dtype="<U1")
+    for code in reversed(_PRIORITY):
+        codes[masks[code]] = code
+    return SecondaryStructure(codes, resids, chains, resnames)
+def per_atom_ss(molecule: Molecule) -> list:
+    """SS code for every atom (its residue's code; ``'-'`` for non-protein atoms)."""
+    ss = assign(molecule)
+    by_residue = {(c, int(r)): code for c, r, code in zip(ss.chains, ss.resids, ss.codes)}
+    chains = molecule.chains or [""] * len(molecule)
+    return [by_residue.get((chains[i], int(molecule.resids[i])), "-")
+            for i in range(len(molecule))]
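
The hydrogen-bond test in `_hbond_matrix` can be checked by hand for a single C=O and N-H pair. The following is a minimal sketch, not part of the package: the constants are copied from the diff above, while `ks_energy`, `collinear_pair`, and the straight-line toy geometry (including the roughly 1.23 Å C=O bond length) are assumptions made only for illustration.

```python
import math

# Constants copied from the dssp.py diff (kcal/mol with distances in angstrom).
Q1, Q2, F = 0.42, 0.20, 332.0
HBOND_CUTOFF = -0.5


def ks_energy(c, o, n, h):
    """Kabsch-Sander electrostatic energy between one C=O and one N-H group."""
    r_on = math.dist(o, n)
    r_ch = math.dist(c, h)
    r_oh = math.dist(o, h)
    r_cn = math.dist(c, n)
    return Q1 * Q2 * F * (1.0 / r_on + 1.0 / r_ch - 1.0 / r_oh - 1.0 / r_cn)


def collinear_pair(oh_distance):
    """Hypothetical toy geometry: C=O...H-N laid out along the x axis."""
    c = (0.0, 0.0, 0.0)
    o = (1.23, 0.0, 0.0)                  # assumed C=O bond length, ~1.23 A
    h = (1.23 + oh_distance, 0.0, 0.0)
    n = (h[0] + 1.0, 0.0, 0.0)            # N-H bond of 1.0 A, as in the diff
    return c, o, n, h


close = ks_energy(*collinear_pair(1.9))   # typical hydrogen-bond distance
far = ks_energy(*collinear_pair(8.0))     # well separated groups
print(f"O...H = 1.9 A: {close:.2f} kcal/mol, bonded={close < HBOND_CUTOFF}")
print(f"O...H = 8.0 A: {far:.2f} kcal/mol, bonded={far < HBOND_CUTOFF}")
```

In this toy layout only the close pair falls below the -0.5 kcal/mol cutoff. The package version evaluates the same expression for all residue pairs at once and additionally applies the CA-CA distance filter.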

{molscope-0.6.2 → molscope-0.7.0}/molscope/molecule.py RENAMED

@@ -339,6 +339,19 @@ class Molecule:
         """Shortcut for ``self.contact_map(...).plot()``."""
         return self.contact_map(cutoff, level, method).plot(**kwargs)
+    def secondary_structure(self):
+        """Assign protein secondary structure with a simplified DSSP.
+        Returns a :class:`molscope.dssp.SecondaryStructure` with one code per
+        backbone residue (``H``/``G``/``I`` helices, ``E``/``B`` strands, ``T``
+        turn, ``S`` bend, ``-`` coil). Needs N/CA/C/O backbone atoms and residue
+        metadata, so it works on proteins read from PDB/mmCIF. Colour a 3D plot
+        by the assignment with ``mol.plot(color_by="ss")``.
+        """
+        from .dssp import assign
+        return assign(self)
     def rmsd(self, other: Molecule, align: bool = False) -> float:
         """Root-mean-square deviation from ``other`` (matched by index).

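The helix and turn codes listed in the `secondary_structure` docstring come from the n-turn rule in `dssp.py`: two consecutive n-turns produce a helix, and an isolated n-turn produces only a turn. A rough pure-Python sketch of that rule follows; `assign_helix_and_turns` and the toy turn sets are hypothetical and stand in for the NumPy masks used by the package.

```python
def assign_helix_and_turns(turns, length, n=4):
    """Toy n-turn rule: `turns` holds residue indices i with an H-bond i -> i+n."""
    codes = ["-"] * length
    # Turn: residues strictly inside an n-turn (i+1 .. i+n-1).
    for i in turns:
        for k in range(i + 1, min(i + n, length)):
            codes[k] = "T"
    # Helix: two consecutive n-turns at i-1 and i mark residues i .. i+n-1.
    # Written second so that helix overrides turn, matching the priority order.
    for i in turns:
        if i - 1 in turns:
            for k in range(i, min(i + n, length)):
                codes[k] = "H"
    return "".join(codes)


# Hypothetical pattern: 4-turns starting at residues 2, 3, 4 and 5 of a 12-mer.
helical = assign_helix_and_turns({2, 3, 4, 5}, 12)
# A single 4-turn on its own yields a turn, not a helix.
isolated = assign_helix_and_turns({2}, 12)
print(helical)
print(isolated)
```

Writing the turn pass before the helix pass plays the same role as the `reversed(_PRIORITY)` loop in the diff: lower-priority codes are laid down first and higher-priority ones overwrite them.
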
{molscope-0.6.2 → molscope-0.7.0}/molscope/plotting.py RENAMED

@@ -22,8 +22,9 @@ def plot(
 ):
     """Scatter-plot atoms in 3D with an equal aspect ratio.
-    ``color_by`` selects the colouring: ``"element"`` (CPK), ``"chain"`` or
-    ``"residue"`` (categorical palette). Atom sizes scale with covalent radius.
+    ``color_by`` selects the colouring: ``"element"`` (CPK), ``"chain"``,
+    ``"residue"`` (categorical palette), or ``"ss"`` (secondary structure, via
+    a simplified DSSP). Atom sizes scale with covalent radius.
     Bonds are drawn when ``show_bonds`` is true, or, when ``None``, automatically
     for molecules small enough to infer bonds cheaply. Returns the ``Axes3D``;
     pass ``show=False`` to suppress ``plt.show()``.
@@ -161,6 +162,10 @@ def plot_rmsd_heatmap(matrix, order=None, ax=None, cmap="viridis", show: bool =
 def _colors(molecule, color_by: str):
     if color_by == "element":
         return [elements.color(e) for e in molecule.elements]
+    if color_by == "ss":
+        from . import dssp
+        return [dssp.SS_COLORS[c] for c in dssp.per_atom_ss(molecule)]
     if color_by == "chain":
         keys = molecule.chains
     elif color_by == "residue":
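
The new `"ss"` branch in `_colors` amounts to spreading per-residue codes over atoms and looking each code up in a colour table. The sketch below illustrates that idea without importing the package: the colour values are copied from `SS_COLORS` in the `dssp.py` diff, while `atom_colors` and the toy residue keys are hypothetical.

```python
# Colour table copied from SS_COLORS in the dssp.py diff.
SS_COLORS = {
    "H": "#e6194b", "G": "#f58231", "I": "#911eb4", "E": "#ffe119",
    "B": "#bfef45", "T": "#42d4f4", "S": "#aaffc3", "-": "#d9d9d9",
}


def atom_colors(residue_codes, atom_residues):
    """Map per-residue codes onto atoms; unassigned residues fall back to coil.

    residue_codes: {(chain, resid): code}, one entry per assigned residue.
    atom_residues: [(chain, resid), ...], one entry per atom.
    """
    return [SS_COLORS[residue_codes.get(key, "-")] for key in atom_residues]


# Hypothetical toy input: two protein residues plus one water (resid 900).
codes = {("A", 1): "H", ("A", 2): "T"}
atoms = [("A", 1), ("A", 1), ("A", 2), ("A", 900)]
colors = atom_colors(codes, atoms)
print(colors)
```

The `"-"` fallback mirrors `per_atom_ss`, which gives non-protein atoms the coil code so that every atom still receives a colour.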

{molscope-0.6.2 → molscope-0.7.0}/molscope.egg-info/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: molscope
-Version: 0.6.2
+Version: 0.7.0
 Summary: Lightweight molecular structure analysis, visualisation, graph export, and coarse-graining in Python.
 Author-email: Roshan Shrestha <roshanpra@gmail.com>
 License-Expression: MIT
@@ -38,9 +38,9 @@ coarse-graining in Python. Read `.xyz`, `.pdb`, `.cif` and `.sdf` files
 3D. The `.cif` reader is a basic mmCIF parser for standard `_atom_site`
 coordinate loops, not a full mmCIF syntax implementation.
-| 3D structure rendering | Residue contact map | Coarse-grained beads |
-| --- | --- | --- |
-| ![Aquaporin-1 rendered as a 3D element-coloured molecular structure](https://raw.githubusercontent.com/roshan2004/molscope/main/docs/assets/readme/aquaporin-structure-v2.png) | ![Residue-level contact map heatmap for Aquaporin-1](https://raw.githubusercontent.com/roshan2004/molscope/main/docs/assets/readme/residue-contact-map.png) | ![Coarse-grained bead model of Aquaporin-1](https://raw.githubusercontent.com/roshan2004/molscope/main/docs/assets/readme/coarse-grained-beads-v2.png) |
+| 3D structure (element) | Secondary structure (DSSP) | Residue contact map | Coarse-grained beads |
+| --- | --- | --- | --- |
+| ![Aquaporin-1 rendered as a 3D element-coloured molecular structure](https://raw.githubusercontent.com/roshan2004/molscope/main/docs/assets/readme/aquaporin-structure-v2.png) | ![Aquaporin-1 coloured by DSSP secondary structure: helices red, turns cyan, coil grey](https://raw.githubusercontent.com/roshan2004/molscope/main/docs/assets/readme/secondary-structure.png) | ![Residue-level contact map heatmap for Aquaporin-1](https://raw.githubusercontent.com/roshan2004/molscope/main/docs/assets/readme/residue-contact-map.png) | ![Coarse-grained bead model of Aquaporin-1](https://raw.githubusercontent.com/roshan2004/molscope/main/docs/assets/readme/coarse-grained-beads-v2.png) |
 ## What it does
@@ -51,6 +51,8 @@ coordinate loops, not a full mmCIF syntax implementation.
 - **Analyse** centroids, radius of gyration, the inertia tensor, inferred bonds
   and contacts.
 - **Contact maps** at atom or residue level, with heatmap plots.
+- **Secondary structure** via a self-contained, dependency-free DSSP, with
+  `plot(color_by="ss")`.
 - **Ensembles**: pairwise RMSD, RMSF, averaging, and conformer clustering.
 - **Export for ML**: flat structural descriptors and molecular graphs for
   NetworkX, PyTorch Geometric and DGL.
@@ -58,6 +60,33 @@ coordinate loops, not a full mmCIF syntax implementation.
 - **Visualise** with 3D matplotlib plots, an interactive py3Dmol viewer, spin
   GIFs, and a command-line interface.
+## Why MolScope?
+MolScope is **not** intended to replace full molecular-simulation or
+cheminformatics frameworks. It is a lightweight **educational and prototyping**
+toolkit for reading common molecular structure files, performing simple
+structural analysis, exporting graph representations for ML workflows, and
+experimenting with coarse-grained mappings. Its core depends only on NumPy and
+Matplotlib, and the API is Python-first and scriptable.
+In particular, the coarse-graining tools are for **educational CG mapping and
+bead-graph prototyping**: useful for exploring mappings before moving to a
+production Martini workflow. They are not a validated Martini force-field
+generator.
+| Tool | Main focus | How MolScope differs |
+| --- | --- | --- |
+| RDKit | Cheminformatics | MolScope leans toward structure visualisation, protein/PDB-style metadata, and CG prototyping |
+| MDAnalysis | MD trajectories | MolScope is lighter and easier for static structures and teaching |
+| MDTraj | Trajectory analysis | MolScope is simpler and graph/CG oriented |
+| Biopython | Structure parsing / bioinformatics | MolScope adds 3D analysis, ML-graph export, and coarse-graining |
+| PyMOL / VMD | Interactive visualisation | MolScope is Python-first, scriptable, and ML-export friendly |
+| nglview | Notebook structure viewer | MolScope also does analysis, descriptors, graphs and CG, not just viewing |
+Reach for those tools when you need their depth and validation. Reach for
+MolScope when you want something small, readable, and quick to teach or
+prototype with.
 ## Install
 With [uv](https://docs.astral.sh/uv/) (recommended):
@@ -184,6 +213,29 @@ mol.contact_map(level="residue", method="min")        # closest inter-residue at
 mol.contact_map(level="residue", method="com")        # residue centre of mass
 ```
+### Secondary structure (DSSP)
+Assign protein secondary structure from backbone hydrogen-bond patterns with a
+self-contained, pure-NumPy DSSP (no external `mkdssp` binary needed):
+```python
+mol = ms.read("1fqy.pdb")
+ss = mol.secondary_structure()      # SecondaryStructure, one code per residue
+ss.string                           # e.g. '--HHHHHHHH--SS--EEEE--'
+ss.codes                            # per-residue array
+ss.summary()                        # helix/strand/coil counts and fractions
+mol.plot(color_by="ss")             # colour the 3D view by secondary structure
+```
+Codes follow DSSP: `H`/`G`/`I` helices, `E`/`B` strands, `T` turn, `S` bend,
+`-` coil. This is a simplified **educational** implementation: it reproduces the
+main classes from the Kabsch-Sander hydrogen-bond model but is not bit-identical
+to the reference `mkdssp` on every edge case. It needs backbone N/CA/C/O atoms,
+so use PDB/mmCIF input (not a bare `.xyz`). The secondary-structure render in the
+showcase above (helices red, turns cyan, coil grey) is produced this way.
 ### NMR ensembles
 ```python

{molscope-0.6.2 → molscope-0.7.0}/molscope.egg-info/SOURCES.txt RENAMED

@@ -7,6 +7,7 @@ molscope/cli.py
 molscope/coarsegrain.py
 molscope/contactmap.py
 molscope/descriptors.py
+molscope/dssp.py
 molscope/elements.py
 molscope/ensemble.py
 molscope/graph.py
@@ -23,6 +24,7 @@ tests/test_clustering.py
 tests/test_coarsegrain.py
 tests/test_contactmap.py
 tests/test_descriptors.py
+tests/test_dssp.py
 tests/test_features.py
 tests/test_graph.py
 tests/test_io.py

{molscope-0.6.2 → molscope-0.7.0}/pyproject.toml RENAMED

@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
 [project]
 name = "molscope"
-version = "0.6.2"
+version = "0.7.0"
 description = "Lightweight molecular structure analysis, visualisation, graph export, and coarse-graining in Python."
 readme = "README.md"
 requires-python = ">=3.9"
@@ -63,3 +63,5 @@ ignore = ["UP045"]
 [tool.ruff.lint.per-file-ignores]
 # Notebook cells legitimately re-import and import mid-file across cells.
 "*.ipynb" = ["E402", "F811", "I001"]
+# N, CA, C, O are the standard backbone atom names; "O" reads clearly here.
+"molscope/dssp.py" = ["E741"]

molscope-0.7.0/tests/test_dssp.py ADDED

@@ -0,0 +1,67 @@
+"""Tests for the simplified DSSP secondary-structure assignment."""
+import os
+import numpy as np
+import pytest
+import molscope as ms
+from molscope import SecondaryStructure, dssp
+from molscope.molecule import Molecule
+DATA = os.path.dirname(os.path.dirname(__file__))
+def aquaporin():
+    return ms.read(os.path.join(DATA, "1fqy.pdb"))
+def test_assign_returns_one_code_per_residue():
+    mol = aquaporin()
+    ss = mol.secondary_structure()
+    assert isinstance(ss, SecondaryStructure)
+    n_residues = sum(1 for _ in mol.residue_groups())
+    assert len(ss) == n_residues
+    assert len(ss.string) == n_residues
+    assert set(ss.string) <= set("HGIEBTS-")
+def test_aquaporin_is_helix_rich():
+    # Aquaporin-1 is an all-alpha membrane protein: helix-dominated, no sheets.
+    summary = aquaporin().secondary_structure().summary()
+    assert summary["helix_fraction"] > 0.4
+    assert summary["strand"] == 0
+    counts = summary["helix"] + summary["strand"] + summary["coil"]
+    assert counts == summary["residues"]
+def test_summary_fractions_sum_to_one():
+    summary = aquaporin().secondary_structure().summary()
+    total = (
+        summary["helix_fraction"]
+        + summary["strand_fraction"]
+        + summary["coil_fraction"]
+    )
+    assert total == pytest.approx(1.0)
+def test_per_atom_ss_aligns_with_atoms():
+    mol = aquaporin()
+    per_atom = dssp.per_atom_ss(mol)
+    assert len(per_atom) == len(mol)
+    assert set(per_atom) <= set("HGIEBTS-")
+def test_plot_color_by_ss(tmp_path):
+    import matplotlib
+    matplotlib.use("Agg")
+    mol = aquaporin()
+    ax = mol.plot(color_by="ss", show=False)
+    assert ax is not None
+def test_requires_backbone_metadata():
+    # A bare coordinate molecule (no atom names / resids) cannot be assigned.
+    bare = Molecule(np.zeros((3, 3)), ["C", "C", "C"])
+    with pytest.raises(ValueError):
+        bare.secondary_structure()
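
The invariants these tests rely on (counts add up to the residue total, fractions add up to one) can be reproduced on a plain code string without a structure file. This is a simplified sketch that follows the same bookkeeping as `SecondaryStructure.summary()`; `summarise` is a hypothetical stand-alone helper, and the input is the example string from the README diff.

```python
def summarise(ss_string):
    """Helix/strand/coil counts and fractions for a DSSP code string."""
    helix = sum(ss_string.count(c) for c in "HGI")
    strand = sum(ss_string.count(c) for c in "EB")
    total = len(ss_string)
    coil = total - helix - strand          # T, S and '-' all land here
    denom = total or 1                     # avoid dividing by zero on empty input
    return {
        "residues": total,
        "helix": helix, "strand": strand, "coil": coil,
        "helix_fraction": helix / denom,
        "strand_fraction": strand / denom,
        "coil_fraction": coil / denom,
    }


s = summarise("--HHHHHHHH--SS--EEEE--")     # example string from the README diff
print(s)
```

Note that turns and bends are counted as "coil" in this summary, so the coil figure is really "everything that is neither helix nor strand".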