molscope 0.8.2__tar.gz → 0.9.0__tar.gz
- molscope-0.9.0/CHANGELOG.md +174 -0
- {molscope-0.8.2 → molscope-0.9.0}/CITATION.cff +12 -2
- {molscope-0.8.2 → molscope-0.9.0}/MANIFEST.in +1 -0
- {molscope-0.8.2/molscope.egg-info → molscope-0.9.0}/PKG-INFO +89 -6
- {molscope-0.8.2 → molscope-0.9.0}/README.md +84 -5
- {molscope-0.8.2 → molscope-0.9.0}/docs/api-reference.md +10 -0
- {molscope-0.8.2 → molscope-0.9.0}/docs/examples/binding-site.md +27 -1
- {molscope-0.8.2 → molscope-0.9.0}/docs/limitations.md +9 -5
- {molscope-0.8.2 → molscope-0.9.0}/docs/user-guide/descriptors.md +14 -0
- molscope-0.9.0/docs/user-guide/library-selection.md +82 -0
- molscope-0.9.0/docs/user-guide/mcp-server.md +88 -0
- {molscope-0.8.2 → molscope-0.9.0}/docs/user-guide/protein-analysis.md +18 -0
- {molscope-0.8.2 → molscope-0.9.0}/docs/user-guide/reading-files.md +6 -2
- {molscope-0.8.2 → molscope-0.9.0}/docs/user-guide/selections.md +13 -0
- {molscope-0.8.2 → molscope-0.9.0}/docs/validation.md +15 -5
- {molscope-0.8.2 → molscope-0.9.0}/examples/binding_site.py +20 -1
- molscope-0.9.0/examples/data/1shg.pdb +761 -0
- molscope-0.9.0/examples/data/1ubq.pdb +970 -0
- {molscope-0.8.2 → molscope-0.9.0}/mkdocs.yml +2 -0
- {molscope-0.8.2 → molscope-0.9.0}/molscope/__init__.py +6 -2
- {molscope-0.8.2 → molscope-0.9.0}/molscope/cli.py +209 -1
- {molscope-0.8.2 → molscope-0.9.0}/molscope/coarsegrain.py +58 -11
- {molscope-0.8.2 → molscope-0.9.0}/molscope/contactmap.py +19 -8
- molscope-0.9.0/molscope/contacts.py +591 -0
- {molscope-0.8.2 → molscope-0.9.0}/molscope/dssp.py +34 -13
- {molscope-0.8.2 → molscope-0.9.0}/molscope/ensemble.py +9 -2
- {molscope-0.8.2 → molscope-0.9.0}/molscope/graph.py +27 -12
- {molscope-0.8.2 → molscope-0.9.0}/molscope/io.py +38 -21
- molscope-0.9.0/molscope/library.py +279 -0
- molscope-0.9.0/molscope/mcp_server.py +327 -0
- {molscope-0.8.2 → molscope-0.9.0}/molscope/molecule.py +211 -13
- {molscope-0.8.2 → molscope-0.9.0}/molscope/plotting.py +1 -1
- {molscope-0.8.2 → molscope-0.9.0/molscope.egg-info}/PKG-INFO +89 -6
- {molscope-0.8.2 → molscope-0.9.0}/molscope.egg-info/SOURCES.txt +12 -0
- {molscope-0.8.2 → molscope-0.9.0}/molscope.egg-info/entry_points.txt +1 -0
- {molscope-0.8.2 → molscope-0.9.0}/molscope.egg-info/requires.txt +8 -0
- {molscope-0.8.2 → molscope-0.9.0}/pyproject.toml +7 -1
- molscope-0.9.0/tests/fixtures/insertion_codes.cif +20 -0
- molscope-0.9.0/tests/fixtures/ugly_residue_ids.pdb +11 -0
- molscope-0.9.0/tests/test_cli.py +140 -0
- molscope-0.9.0/tests/test_cli_batch.py +192 -0
- {molscope-0.8.2 → molscope-0.9.0}/tests/test_contactmap.py +17 -0
- molscope-0.9.0/tests/test_contacts.py +289 -0
- {molscope-0.8.2 → molscope-0.9.0}/tests/test_extras.py +1 -0
- {molscope-0.8.2 → molscope-0.9.0}/tests/test_features.py +1 -0
- {molscope-0.8.2 → molscope-0.9.0}/tests/test_graph.py +11 -0
- {molscope-0.8.2 → molscope-0.9.0}/tests/test_io.py +45 -0
- molscope-0.9.0/tests/test_library.py +278 -0
- molscope-0.9.0/tests/test_mcp_server.py +181 -0
- molscope-0.9.0/tests/test_molecule.py +241 -0
- {molscope-0.8.2 → molscope-0.9.0}/tests/test_protein_workflows.py +2 -1
- molscope-0.9.0/tests/validation/test_binding_sites_ref.py +49 -0
- {molscope-0.8.2 → molscope-0.9.0}/tests/validation/test_dssp_ref.py +56 -14
- molscope-0.8.2/molscope/contacts.py +0 -292
- molscope-0.8.2/tests/test_cli.py +0 -48
- molscope-0.8.2/tests/test_cli_batch.py +0 -54
- molscope-0.8.2/tests/test_contacts.py +0 -122
- molscope-0.8.2/tests/test_molecule.py +0 -131
- {molscope-0.8.2 → molscope-0.9.0}/LICENSE +0 -0
- {molscope-0.8.2 → molscope-0.9.0}/docs/assets/coarsegrain/1fqy-cg-mapping-comparison.png +0 -0
- {molscope-0.8.2 → molscope-0.9.0}/docs/assets/coarsegrain/1fqy-martini-mapping.png +0 -0
- {molscope-0.8.2 → molscope-0.9.0}/docs/assets/contactmaps/1aml-contact-frequency.png +0 -0
- {molscope-0.8.2 → molscope-0.9.0}/docs/assets/contactmaps/1fqy-ca-distance-matrix.png +0 -0
- {molscope-0.8.2 → molscope-0.9.0}/docs/assets/contactmaps/1fqy-residue-contact-map.png +0 -0
- {molscope-0.8.2 → molscope-0.9.0}/docs/assets/geometry/1aml-rmsf-profile.png +0 -0
- {molscope-0.8.2 → molscope-0.9.0}/docs/assets/geometry/1fqy-principal-axes.png +0 -0
- {molscope-0.8.2 → molscope-0.9.0}/docs/assets/graphs/1fqy-residue-contact-graph.png +0 -0
- {molscope-0.8.2 → molscope-0.9.0}/docs/assets/readme/aquaporin-structure-v2.png +0 -0
- {molscope-0.8.2 → molscope-0.9.0}/docs/assets/readme/coarse-grained-beads-v2.png +0 -0
- {molscope-0.8.2 → molscope-0.9.0}/docs/assets/readme/residue-contact-map.png +0 -0
- {molscope-0.8.2 → molscope-0.9.0}/docs/assets/readme/secondary-structure.png +0 -0
- {molscope-0.8.2 → molscope-0.9.0}/docs/benchmarks.md +0 -0
- {molscope-0.8.2 → molscope-0.9.0}/docs/contributing.md +0 -0
- {molscope-0.8.2 → molscope-0.9.0}/docs/examples/analyze-contacts.md +0 -0
- {molscope-0.8.2 → molscope-0.9.0}/docs/examples/build-molecular-graph.md +0 -0
- {molscope-0.8.2 → molscope-0.9.0}/docs/examples/coarse-grain-protein.md +0 -0
- {molscope-0.8.2 → molscope-0.9.0}/docs/examples/compare-nmr-models.md +0 -0
- {molscope-0.8.2 → molscope-0.9.0}/docs/examples/export-pyg.md +0 -0
- {molscope-0.8.2 → molscope-0.9.0}/docs/examples/geometry-tour.md +0 -0
- {molscope-0.8.2 → molscope-0.9.0}/docs/examples/index.md +0 -0
- {molscope-0.8.2 → molscope-0.9.0}/docs/examples/pdb-to-graph-cg.md +0 -0
- {molscope-0.8.2 → molscope-0.9.0}/docs/examples/pdb-to-pyg-ml.md +0 -0
- {molscope-0.8.2 → molscope-0.9.0}/docs/examples/protein-analysis-from-scratch.md +0 -0
- {molscope-0.8.2 → molscope-0.9.0}/docs/examples/residue-contact-graphs.md +0 -0
- {molscope-0.8.2 → molscope-0.9.0}/docs/index.md +0 -0
- {molscope-0.8.2 → molscope-0.9.0}/docs/installation.md +0 -0
- {molscope-0.8.2 → molscope-0.9.0}/docs/quickstart.md +0 -0
- {molscope-0.8.2 → molscope-0.9.0}/docs/roadmap.md +0 -0
- {molscope-0.8.2 → molscope-0.9.0}/docs/tutorials/index.md +0 -0
- {molscope-0.8.2 → molscope-0.9.0}/docs/tutorials/pdb-to-coarse-grained-beads.md +0 -0
- {molscope-0.8.2 → molscope-0.9.0}/docs/tutorials/pdb-to-descriptors.md +0 -0
- {molscope-0.8.2 → molscope-0.9.0}/docs/tutorials/pdb-to-graph-gnn.md +0 -0
- {molscope-0.8.2 → molscope-0.9.0}/docs/user-guide/chemical-perception.md +0 -0
- {molscope-0.8.2 → molscope-0.9.0}/docs/user-guide/coarse-graining.md +0 -0
- {molscope-0.8.2 → molscope-0.9.0}/docs/user-guide/contact-maps.md +0 -0
- {molscope-0.8.2 → molscope-0.9.0}/docs/user-guide/coordinate-formats.md +0 -0
- {molscope-0.8.2 → molscope-0.9.0}/docs/user-guide/ensembles.md +0 -0
- {molscope-0.8.2 → molscope-0.9.0}/docs/user-guide/geometry.md +0 -0
- {molscope-0.8.2 → molscope-0.9.0}/docs/user-guide/molecular-graphs.md +0 -0
- {molscope-0.8.2 → molscope-0.9.0}/docs/user-guide/plotting.md +0 -0
- {molscope-0.8.2 → molscope-0.9.0}/examples/coarse_graining.py +0 -0
- {molscope-0.8.2 → molscope-0.9.0}/examples/data/1aml.pdb +0 -0
- {molscope-0.8.2 → molscope-0.9.0}/examples/data/1fqy.pdb +0 -0
- {molscope-0.8.2 → molscope-0.9.0}/examples/data/3ptb.pdb +0 -0
- {molscope-0.8.2 → molscope-0.9.0}/examples/data/helix_201.xyz +0 -0
- {molscope-0.8.2 → molscope-0.9.0}/examples/geometry.py +0 -0
- {molscope-0.8.2 → molscope-0.9.0}/examples/graph_to_gnn.py +0 -0
- {molscope-0.8.2 → molscope-0.9.0}/examples/legacy_utils.py +0 -0
- {molscope-0.8.2 → molscope-0.9.0}/examples/pdb_to_pyg_ml.py +0 -0
- {molscope-0.8.2 → molscope-0.9.0}/examples/protein_analysis.py +0 -0
- {molscope-0.8.2 → molscope-0.9.0}/examples/residue_contact_graph.py +0 -0
- {molscope-0.8.2 → molscope-0.9.0}/examples/tour.py +0 -0
- {molscope-0.8.2 → molscope-0.9.0}/molscope/__main__.py +0 -0
- {molscope-0.8.2 → molscope-0.9.0}/molscope/chem.py +0 -0
- {molscope-0.8.2 → molscope-0.9.0}/molscope/cif.py +0 -0
- {molscope-0.8.2 → molscope-0.9.0}/molscope/descriptors.py +0 -0
- {molscope-0.8.2 → molscope-0.9.0}/molscope/distance.py +0 -0
- {molscope-0.8.2 → molscope-0.9.0}/molscope/elements.py +0 -0
- {molscope-0.8.2 → molscope-0.9.0}/molscope.egg-info/dependency_links.txt +0 -0
- {molscope-0.8.2 → molscope-0.9.0}/molscope.egg-info/top_level.txt +0 -0
- {molscope-0.8.2 → molscope-0.9.0}/notebooks/molscope_tour.ipynb +0 -0
- {molscope-0.8.2 → molscope-0.9.0}/notebooks/pdb_to_gnn.ipynb +0 -0
- {molscope-0.8.2 → molscope-0.9.0}/notebooks/protein_analysis_from_scratch.ipynb +0 -0
- {molscope-0.8.2 → molscope-0.9.0}/requirements.txt +0 -0
- {molscope-0.8.2 → molscope-0.9.0}/scripts/benchmark_core.py +0 -0
- {molscope-0.8.2 → molscope-0.9.0}/scripts/build_gnn_notebook.py +0 -0
- {molscope-0.8.2 → molscope-0.9.0}/scripts/build_protein_analysis_notebook.py +0 -0
- {molscope-0.8.2 → molscope-0.9.0}/scripts/build_user_guide_pdf.py +0 -0
- {molscope-0.8.2 → molscope-0.9.0}/scripts/render_coarsegrain_images.py +0 -0
- {molscope-0.8.2 → molscope-0.9.0}/scripts/render_contact_analysis_images.py +0 -0
- {molscope-0.8.2 → molscope-0.9.0}/scripts/render_geometry_images.py +0 -0
- {molscope-0.8.2 → molscope-0.9.0}/setup.cfg +0 -0
- {molscope-0.8.2 → molscope-0.9.0}/tests/fixtures/bad_coord.pdb +0 -0
- {molscope-0.8.2 → molscope-0.9.0}/tests/fixtures/bad_coord.xyz +0 -0
- {molscope-0.8.2 → molscope-0.9.0}/tests/fixtures/bad_counts.sdf +0 -0
- {molscope-0.8.2 → molscope-0.9.0}/tests/fixtures/missing_coord_col.cif +0 -0
- {molscope-0.8.2 → molscope-0.9.0}/tests/fixtures/no_atom_site.cif +0 -0
- {molscope-0.8.2 → molscope-0.9.0}/tests/fixtures/no_atoms.pdb +0 -0
- {molscope-0.8.2 → molscope-0.9.0}/tests/fixtures/short_atom.pdb +0 -0
- {molscope-0.8.2 → molscope-0.9.0}/tests/fixtures/truncated.sdf +0 -0
- {molscope-0.8.2 → molscope-0.9.0}/tests/fixtures/truncated.xyz +0 -0
- {molscope-0.8.2 → molscope-0.9.0}/tests/fixtures/v3000.sdf +0 -0
- {molscope-0.8.2 → molscope-0.9.0}/tests/fixtures/water.sdf +0 -0
- {molscope-0.8.2 → molscope-0.9.0}/tests/test_cg_mapping.py +0 -0
- {molscope-0.8.2 → molscope-0.9.0}/tests/test_chem.py +0 -0
- {molscope-0.8.2 → molscope-0.9.0}/tests/test_cif_validation.py +0 -0
- {molscope-0.8.2 → molscope-0.9.0}/tests/test_clustering.py +0 -0
- {molscope-0.8.2 → molscope-0.9.0}/tests/test_coarsegrain.py +0 -0
- {molscope-0.8.2 → molscope-0.9.0}/tests/test_descriptors.py +0 -0
- {molscope-0.8.2 → molscope-0.9.0}/tests/test_dssp.py +0 -0
- {molscope-0.8.2 → molscope-0.9.0}/tests/validation/test_bonds_ref.py +0 -0
- {molscope-0.8.2 → molscope-0.9.0}/tests/validation/test_chem_ref.py +0 -0
- {molscope-0.8.2 → molscope-0.9.0}/tests/validation/test_geometry_ref.py +0 -0
- {molscope-0.8.2 → molscope-0.9.0}/tests/validation/test_invariants.py +0 -0
@@ -0,0 +1,174 @@
+# Changelog
+
+All notable changes to MolScope are documented here.
+
+The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
+and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
+While the package is pre-1.0, minor versions may include backwards-incompatible
+API changes; these are called out under **Changed** where they occur.
+
+## [Unreleased]
+
+## [0.9.0] - 2026-05-29
+
+### Added
+
+- Molecule-table workflow: `molscope select` and the `molscope.library` module
+  read a CSV/XLSX of molecules and pick a diverse subset by MaxMin
+  (farthest-first) selection over descriptors. Select on existing numeric columns
+  (e.g. `MW`, `ALogP`) or compute RDKit descriptors from a SMILES column with
+  `--compute-descriptors --smiles-col`. Adds an `xlsx` extra (`openpyxl`) for
+  spreadsheet I/O; CSV input and selection need no optional backend.
+- Optional MCP (Model Context Protocol) server, `molscope.mcp_server`, exposing
+  MolScope's analyses as tools for AI assistants such as Claude Code and Claude
+  Desktop. Adds a `molscope-mcp` console script, an `mcp` extra
+  (`pip install "molscope[mcp]"`, Python >= 3.10), and nine read-only tools that
+  wrap the existing API: summarise, descriptors, secondary structure, contact
+  map, binding site, molecular graph, coarse-grain, and two PNG render tools.
+- Broadened the DSSP reference cross-check to three fold classes instead of one:
+  helix-dominated Aquaporin-1 (`1fqy`), mixed alpha/beta ubiquitin (`1ubq`), and
+  the all-beta SH3 domain (`1shg`). The test is parametrised and prints
+  per-fold 3-state agreement so results read as a range. Measured against
+  `mkdssp` 4.5.8: 99.1% (`1fqy`), 100% (`1ubq`), 98.2% (`1shg`).
+- Bundled `1ubq.pdb` and `1shg.pdb` in `examples/data` as the new validation
+  structures.
+
+### Changed
+
+- Documentation and the JOSS paper now report DSSP agreement as a measured
+  range across fold classes rather than a single helical figure, and no longer
+  imply that strand-rich folds agree markedly less well.
+
+## [0.8.3] - 2026-05-28
+
+### Added
+
+- JOSS paper draft under `paper/` and Zenodo deposition metadata
+  (`.zenodo.json`) for an archival DOI.
+
+### Changed
+
+- Bumped GitHub Actions to Node 24-ready versions.
+
+## [0.8.2] - 2026-05-28
+
+### Added
+
+- Read the Docs hosting for the documentation site.
+- Coverage reporting via `pytest-cov` and Codecov, measured with the optional
+  backends installed so the RDKit, gemmi, NetworkX, SciPy and Torch paths are
+  exercised rather than skipped.
+- Coarse-grained virtual-site support, preserved as derived coordinate metadata.
+- PDB workflow tutorials and a protein-analysis-from-scratch walkthrough.
+- CLI batch analysis and graph-export subcommands, with improved selection
+  handling.
+- `O(n)` cell-list neighbour search and opt-in periodic-boundary support for the
+  distance and contact methods.
+- Residue contact graphs for ML and an educational coarse-graining workflow.
+- Scientific validation tables, references, and a reproducible benchmarks page.
+
+### Changed
+
+- Reorganised the limitations page by workflow and added a Graph ML section.
+- Refocused the documentation around the three core workflows.
+- Completed the geometry API and added a visual geometry guide validated against
+  MDAnalysis.
+- Polished the coarse-graining workflow, contact maps, and dense distance
+  backends; improved coordinate-format parser errors and added edge-case
+  fixtures.
+- Stopped tracking generated graph exports and added `graphs/` to `.gitignore`.
+
+### Fixed
+
+- Batch CLI crash with `--jobs > 1` on spawn-based platforms.
+- gemmi-backed mmCIF unit-cell read.
+- Lint errors, and made periodic boundaries opt-in for distance and contact
+  methods.
+
+## [0.8.1] - 2026-05-26
+
+### Added
+
+- Tier-2 validation suite cross-checking against reference scientific tools,
+  covering DSSP (`mkdssp`), geometry and RMSD (MDAnalysis), bond perception and
+  chemical features (RDKit), and contact maps.
+- PyTorch Geometric ML tutorial and `CITATION.cff` citation metadata.
+
+### Changed
+
+- DSSP validation now invokes `mkdssp` directly instead of going through
+  Biopython, and the 3-state agreement floor was tightened to 0.95 after
+  observing 99.1% on CI.
+- CI runs the validation job with the required extras and fails loudly if the
+  reference tools cannot be imported.
+- Polished repository layout and documentation.
+
+### Fixed
+
+- README Mermaid diagram syntax.
+
+## [0.8.0] - 2026-05-26
+
+### Added
+
+- Expanded molecular parsing and machine-learning feature support.
+
+## [0.7.0] - 2026-05-25
+
+### Added
+
+- Simplified, dependency-free DSSP-style secondary-structure assignment based on
+  the Kabsch-Sander hydrogen-bond model.
+
+### Changed
+
+- README: added a "Why MolScope" section, a tool comparison, and a CLI output
+  example; the secondary-structure render replaced the earlier hero animation.
+
+## [0.6.2] - 2026-05-25
+
+### Fixed
+
+- README images now render on PyPI, and the publish workflow was hardened.
+
+## [0.6.1] - 2026-05-25
+
+### Added
+
+- PyPI trusted-publishing workflow.
+
+### Changed
+
+- Expanded the package docstring to cover the full feature scope.
+
+## [0.6.0] - 2026-05-25
+
+Initial public release under the **MolScope** name, renamed from the earlier
+`molecule3d` prototype. This release consolidated the core toolkit:
+
+### Added
+
+- `Molecule` object on a NumPy core, with fixed-column PDB parsing and readers
+  and writers for XYZ, PDB, mmCIF and SDF.
+- Per-atom metadata, metadata-based selections, geometry measurements, and RMSD.
+- Molecular graph construction with NetworkX, PyTorch Geometric and DGL
+  exporters.
+- Coarse-graining tools: residue and custom mappings, explicit-bond support,
+  index-based mappings, and a dropped-atom warning.
+- Contact maps and ensemble contact-frequency analysis, plus ensemble RMSD
+  clustering and an RMSD heatmap.
+- Native structural descriptors and a MkDocs documentation site, including a
+  user-guide PDF builder.
+- `uv` support (lockfile, dev dependency group, `.python-version`), continuous
+  integration, and README visual examples.
+
+[Unreleased]: https://github.com/roshan2004/molscope/compare/v0.9.0...HEAD
+[0.9.0]: https://github.com/roshan2004/molscope/compare/v0.8.3...v0.9.0
+[0.8.3]: https://github.com/roshan2004/molscope/compare/v0.8.2...v0.8.3
+[0.8.2]: https://github.com/roshan2004/molscope/compare/v0.8.1...v0.8.2
+[0.8.1]: https://github.com/roshan2004/molscope/compare/v0.8.0...v0.8.1
+[0.8.0]: https://github.com/roshan2004/molscope/compare/v0.7.0...v0.8.0
+[0.7.0]: https://github.com/roshan2004/molscope/compare/v0.6.2...v0.7.0
+[0.6.2]: https://github.com/roshan2004/molscope/compare/v0.6.1...v0.6.2
+[0.6.1]: https://github.com/roshan2004/molscope/compare/v0.6.0...v0.6.1
+[0.6.0]: https://github.com/roshan2004/molscope/releases/tag/v0.6.0
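Reviewer note on the 0.9.0 changelog entry: it names MaxMin (farthest-first) selection over descriptor columns but the diff shown here does not include `molscope/library.py` itself. The sketch below is a minimal NumPy illustration of the general technique, not the package's implementation. The function name `maxmin_select`, the column standardisation, and the choice of the first row as the seed are all assumptions made for this example.

```python
import numpy as np


def maxmin_select(features, n_pick, seed_index=0):
    """Greedy farthest-first (MaxMin) pick of row indices (hypothetical helper)."""
    x = np.asarray(features, dtype=float)
    if n_pick <= 0 or len(x) == 0:
        return []
    # Assumption: standardise each column so large-valued descriptors
    # (e.g. molecular weight) do not dominate the Euclidean distance.
    std = x.std(axis=0)
    std[std == 0] = 1.0
    z = (x - x.mean(axis=0)) / std

    n_pick = min(n_pick, len(z))
    picked = [seed_index]
    # Distance from every row to its nearest already-picked row.
    nearest = np.linalg.norm(z - z[seed_index], axis=1)
    while len(picked) < n_pick:
        nxt = int(np.argmax(nearest))  # the row farthest from the current picks
        picked.append(nxt)
        nearest = np.minimum(nearest, np.linalg.norm(z - z[nxt], axis=1))
    return picked


# Toy table with two descriptor columns standing in for MW and ALogP.
table = [[100.0, 1.0], [102.0, 1.1], [500.0, 5.0], [300.0, -2.0]]
picks = maxmin_select(table, 3)
print(picks)
```

With a fixed seed row the greedy loop has no random component, which is consistent with the README's statement that selection is deterministic; how the package itself chooses the first molecule is not visible in this diff.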
@@ -5,9 +5,19 @@ title: "MolScope: lightweight molecular structure analysis, visualisation, graph
 authors:
   - family-names: Shrestha
     given-names: Roshan
-
-
+    orcid: "https://orcid.org/0000-0002-9356-5136"
+    affiliation: "Independent Researcher"
+version: 0.9.0
+date-released: "2026-05-29"
 license: MIT
+doi: 10.5281/zenodo.20433850
+identifiers:
+  - type: doi
+    value: 10.5281/zenodo.20433850
+    description: Concept DOI (always resolves to the latest archived version)
+  - type: doi
+    value: 10.5281/zenodo.20433851
+    description: DOI for v0.8.3
 repository-code: "https://github.com/roshan2004/molscope"
 url: "https://github.com/roshan2004/molscope"
 abstract: >
@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: molscope
-Version: 0.
+Version: 0.9.0
 Summary: Lightweight molecular coordinate workflows for descriptors, graph ML, and coarse-grained beads.
 Author-email: Roshan Shrestha <roshanpra@gmail.com>
 License-Expression: MIT
@@ -30,8 +30,12 @@ Provides-Extra: chem
 Requires-Dist: rdkit>=2023.9; extra == "chem"
 Provides-Extra: cif
 Requires-Dist: gemmi>=0.7; extra == "cif"
+Provides-Extra: xlsx
+Requires-Dist: openpyxl>=3.1; extra == "xlsx"
 Provides-Extra: gpu
 Requires-Dist: torch>=2.0; extra == "gpu"
+Provides-Extra: mcp
+Requires-Dist: mcp>=1.2; python_version >= "3.10" and extra == "mcp"
 Provides-Extra: validation
 Requires-Dist: mdanalysis>=2.7; extra == "validation"
 Requires-Dist: rdkit>=2023.9; extra == "validation"
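Reviewer note on the two new extras: `xlsx` pulls in `openpyxl`, and `mcp` pulls in `mcp` only when the interpreter satisfies the `python_version >= "3.10"` marker. A caller who wants to know up front whether those optional paths are usable could probe the environment roughly as follows. This is a sketch using only the standard library; the helper names are hypothetical and nothing here is taken from MolScope's own code.

```python
import importlib.util
import sys


def mcp_marker_satisfied(version_info=sys.version_info):
    """Mirror the `python_version >= "3.10"` environment marker on the mcp extra."""
    return tuple(version_info[:2]) >= (3, 10)


def backend_available(module_name):
    """Return True when a top-level optional module can be found for import."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # Import names assumed to match the distribution names in Requires-Dist.
    for extra, module in (("xlsx", "openpyxl"), ("mcp", "mcp")):
        state = "available" if backend_available(module) else "missing"
        print(f"{extra}: {module} is {state}")
    print("mcp marker satisfied:", mcp_marker_satisfied())
```

On Python 3.9, `pip install "molscope[mcp]"` would skip the `mcp` dependency because the marker evaluates to false, so a version check of this kind explains a missing backend better than an `ImportError` does.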
@@ -57,6 +61,7 @@ Dynamic: license-file
 [](pyproject.toml)
 [](LICENSE)
 [](https://github.com/astral-sh/ruff)
+[](https://doi.org/10.5281/zenodo.20433850)
 
 Lightweight molecular structure analysis, graph export, and coarse-graining in
 Python. MolScope is built around three polished workflows: turn coordinate
@@ -264,11 +269,17 @@ mol.select(chain="A") # one chain
 mol.select(element="C") # all carbons
 mol.select(resname="HOH") # waters
 mol.select(resid=(10, 20)) # an inclusive residue range
+mol.select(resid=100, icode="A") # PDB/mmCIF insertion code
+mol.residue_ids # full ResidueId(chain, resid, icode, resname)
 mol.alpha_carbons() # CA atoms (the usual basis for protein RMSD)
 mol.backbone() # N, CA, C, O
 mol[mask_or_indices] # subset by numpy mask / index array
 ```
 
+Residue grouping, contact maps, binding sites and residue graphs use full
+residue identity internally, so residues like `100A` and `100B` stay separate
+while integer `resids` remain available for older code.
+
 ### Analysis and measurements
 
 ```python
@@ -353,10 +364,11 @@ Codes follow DSSP notation: `H`/`G`/`I` helices, `E`/`B` strands, `T` turn, `S`
 bend, `-` coil. This is a simplified **educational** implementation of the
 Kabsch-Sander hydrogen-bond model: not bit-identical to the reference `mkdssp`
 on every edge case, but validated against it. A CI cross-check
-(`tests/validation`)
-(helix/strand/coil) with `mkdssp
-
-
+(`tests/validation`) spans three fold classes and reports **~98 to 99%
+per-residue 3-state agreement** (helix/strand/coil) with `mkdssp`: 99.1% on the
+helical aquaporin (`1fqy`), 100% on mixed alpha/beta ubiquitin (`1ubq`), and
+98.2% on the all-beta SH3 domain (`1shg`). It needs backbone N/CA/C/O atoms, so
+use PDB/mmCIF input (not a bare
 `.xyz`). The secondary-structure render in the showcase above (helices red,
 turns cyan, coil grey) is produced this way.
 
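Reviewer note on "per-residue 3-state agreement": the hunk above groups DSSP codes as `H`/`G`/`I` helices and `E`/`B` strands, with everything else treated as coil. Under that grouping, an agreement figure can be computed as the fraction of residues whose reduced state matches the reference. The sketch below assumes both assignments are already aligned residue by residue and given as equal-length strings; the function names are hypothetical, and the actual reduction used in `tests/validation/test_dssp_ref.py` is not visible in this part of the diff.

```python
HELIX_CODES = set("HGI")   # alpha, 3-10 and pi helices
STRAND_CODES = set("EB")   # extended strand and isolated bridge


def to_three_state(codes):
    """Reduce 8-state DSSP codes to H (helix), E (strand) or C (coil)."""
    return "".join(
        "H" if c in HELIX_CODES else "E" if c in STRAND_CODES else "C"
        for c in codes
    )


def three_state_agreement(predicted, reference):
    """Fraction of residues whose reduced state matches the reference."""
    if len(predicted) != len(reference):
        raise ValueError("assignments must cover the same residues")
    if not predicted:
        raise ValueError("assignments are empty")
    reduced_pred = to_three_state(predicted)
    reduced_ref = to_three_state(reference)
    matches = sum(p == r for p, r in zip(reduced_pred, reduced_ref))
    return matches / len(reduced_pred)


# Made-up strings for illustration only; these are not real mkdssp output.
print(three_state_agreement("HHGGEE-TS", "HHHHEB--T"))
print(three_state_agreement("HHHH", "HHEE"))
```

Because turn, bend and coil all collapse to the same state, two assignments that disagree only on `T` versus `S` still count as matching, which is one reason 3-state agreement reads higher than 8-state agreement would.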
@@ -513,7 +525,8 @@ inspection and reuse, not a validated simulation topology.
 
 ## Command-line interface
 
-MolScope provides a
+MolScope provides a CLI for visualization, binding-site tables, batch analysis,
+and ML graph export.
 
 ### View (default)
 Visualize a structure, apply transformations, and save images or animations.
@@ -529,6 +542,14 @@ Batch compute molecular descriptors for many files and save to a CSV table.
 molscope analyze examples/data/*.pdb --out results.csv --preset native-3d --jobs 4
 ```
 
+### Binding sites
+Write protein-ligand binding-site residue contacts and optional pocket
+descriptors to CSV.
+```bash
+molscope binding-site examples/data/3ptb.pdb --out site.csv --cutoff 4.5
+molscope binding-site examples/data/3ptb.pdb --out site.csv --descriptors-out pocket.csv
+```
+
 ### Export
 Batch export molecular graphs to PyTorch Geometric, DGL, or NetworkX formats.
 ```bash
@@ -536,6 +557,53 @@ molscope export "data/*.cif" --to pyg --out-dir pyg_graphs/ --pe laplacian --job
 ```
 Supports advanced features like `--self-loops`, `--global-node`, and `--pe` (positional encodings).
 
+### Select
+Pick a diverse subset from a molecule table (`.csv` or `.xlsx`) by MaxMin
+selection over descriptors. Select on existing numeric columns, or compute
+descriptors from a SMILES column with RDKit.
+```bash
+# diverse pick on existing property columns (no extras needed for CSV)
+molscope select molecules.csv --descriptor-cols MW ALogP -n 100 --out picked.csv
+
+# or compute RDKit descriptors from SMILES first (needs the chem extra;
+# .xlsx input needs the xlsx extra)
+molscope select molecules.xlsx --smiles-col SMILES --compute-descriptors -n 100 --out picked.csv
+```
+`MolLogP` (RDKit's Crippen logP) is the ALogP equivalent. Rows with missing
+descriptors are skipped; selection is deterministic.
+
+## Use from an AI assistant (MCP)
+
+MolScope ships an optional [Model Context Protocol](https://modelcontextprotocol.io)
+server, so an AI assistant such as Claude Code or Claude Desktop can drive its
+analyses in natural language. The server exposes MolScope's existing features as
+MCP tools; it adds no new science, just an adapter layer over the public API.
+
+```bash
+pip install "molscope[mcp]" # needs Python >= 3.10
+claude mcp add molscope -- molscope-mcp # register with Claude Code
+```
+
+The server runs over stdio (`molscope-mcp`, or `python -m molscope.mcp_server`)
+and provides these tools:
+
+| Tool | What it does |
+| --- | --- |
+| `summarize_structure` | Load a file or PDB id and report atoms, formula, chains, size. |
+| `compute_descriptors` | Descriptor table across one or more structures (the batch tool). |
+| `secondary_structure` | Per-residue DSSP codes and helix/strand/coil composition. |
+| `contact_map` | Contact count, contact order, and labelled contacting pairs. |
+| `binding_site` | Protein residues around a ligand, closest first. |
+| `molecular_graph` | Node/edge counts and feature names for the ML graph. |
+| `coarse_grain` | Bead assignment statistics for a coarse-grained mapping. |
+| `render_structure`, `render_contact_map` | Return PNG figures. |
+
+Each tool takes a `source` that is either a local coordinate-file path or a
+4-character PDB id (fetched from RCSB). For example, you can ask the assistant to
+*"fetch trypsin (3ptb), find the benzamidine binding-site residues, and render a
+contact map"*, and it will call `binding_site` then `render_contact_map`. See
+[`docs/user-guide/mcp-server.md`](docs/user-guide/mcp-server.md) for the full
+guide.
 
 ## Sample structures
 
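Reviewer note on the `source` argument: the README text in this hunk says each MCP tool accepts either a local coordinate-file path or a 4-character PDB id. One plausible way to tell the two apart is sketched below. This is an assumption-laden illustration, not the logic in `molscope/mcp_server.py`: the helper name `classify_source`, the rule that an existing file wins over an id-shaped string, and the "digit plus three alphanumerics" pattern for classic PDB ids are all choices made for this example.

```python
import re
from pathlib import Path

# Classic PDB ids are four characters: one digit followed by three alphanumerics.
_PDB_ID = re.compile(r"[0-9][A-Za-z0-9]{3}")


def classify_source(source):
    """Label a source string as a local 'path' or a remote 'pdb_id' (hypothetical)."""
    # Assumption: a file that exists on disk takes precedence, so a local file
    # that happens to be named like an id is never fetched remotely.
    if Path(source).is_file():
        return "path"
    if _PDB_ID.fullmatch(source):
        return "pdb_id"
    raise ValueError(f"{source!r} is neither an existing file nor a 4-character PDB id")


for candidate in ("3ptb", "1UBQ"):
    print(candidate, "->", classify_source(candidate))
```

Checking the filesystem first keeps the network fetch as the fallback, which matters for a server that an assistant may call repeatedly with the same bundled example files.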
@@ -544,6 +612,7 @@ Supports advanced features like `--self-loops`, `--global-node`, and `--pe` (pos
 | `examples/data/helix_201.xyz` | a helix (bare coordinates) |
 | `examples/data/1fqy.pdb` | Aquaporin-1, single model (1661 atoms) |
 | `examples/data/1aml.pdb` | Alzheimer amyloid A4 peptide, 20-model NMR ensemble |
+| `examples/data/3ptb.pdb` | Trypsin-benzamidine complex, ligand-binding-site example |
 
 ## Notes
 
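Reviewer note on the new `3ptb` binding-site example: the README describes the output as protein residues around a ligand, closest first, with a `--cutoff 4.5` flag, and says residues are keyed by full identity so `100A` and `100B` stay separate. The sketch below shows that idea with a dense NumPy distance matrix. It is not the code in `molscope/contacts.py`; the function name, the tuple layout `(chain, resid, icode, resname)` as a plain tuple, the coordinates, and the assumption that the cutoff is in angstroms are all illustrative.

```python
import numpy as np


def binding_site_residues(protein_xyz, protein_residues, ligand_xyz, cutoff=4.5):
    """Residues with any atom within `cutoff` of the ligand, closest first."""
    protein_xyz = np.asarray(protein_xyz, dtype=float)
    ligand_xyz = np.asarray(ligand_xyz, dtype=float)
    # Pairwise distances, shape (n_protein_atoms, n_ligand_atoms).
    diff = protein_xyz[:, None, :] - ligand_xyz[None, :, :]
    min_dist = np.linalg.norm(diff, axis=2).min(axis=1)

    closest = {}
    for residue, dist in zip(protein_residues, min_dist):
        # Keying on the full identity tuple keeps insertion-code variants apart.
        if dist <= cutoff and dist < closest.get(residue, np.inf):
            closest[residue] = float(dist)
    return sorted(closest.items(), key=lambda item: item[1])


# Invented coordinates: two atoms of residue 100A, one of 100B, one far away.
atoms = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [3.0, 0.0, 0.0], [10.0, 0.0, 0.0]]
residues = [
    ("A", 100, "A", "GLY"),
    ("A", 100, "A", "GLY"),
    ("A", 100, "B", "SER"),
    ("A", 189, "", "ASP"),
]
ligand = [[0.0, 0.0, 2.0]]
site = binding_site_residues(atoms, residues, ligand)
for residue, dist in site:
    print(residue, round(dist, 2))
```

A dense matrix is fine for one ligand against one protein; the changelog's `O(n)` cell-list neighbour search is the kind of structure that would replace it for larger systems.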
@@ -582,6 +651,20 @@ that run everywhere, plus cross-checks against reference scientific tools (the
 simplified DSSP vs `mkdssp`) that turn "the tests pass" into a measured
 agreement number.
 
+## Changelog
+
+Notable changes for each release are recorded in [`CHANGELOG.md`](CHANGELOG.md),
+following the [Keep a Changelog](https://keepachangelog.com/) format.
+
+## How to cite
+
+Each release of MolScope is archived on Zenodo with a citable DOI. The concept
+DOI [10.5281/zenodo.20433850](https://doi.org/10.5281/zenodo.20433850) always
+resolves to the latest version; each release also has its own version DOI.
+Machine-readable citation metadata lives in [`CITATION.cff`](CITATION.cff), so
+GitHub's "Cite this repository" button on the sidebar produces BibTeX and APA
+entries automatically.
+
 ## License
 
 [MIT](LICENSE)
@@ -7,6 +7,7 @@
 [](pyproject.toml)
 [](LICENSE)
 [](https://github.com/astral-sh/ruff)
+[](https://doi.org/10.5281/zenodo.20433850)
 
 Lightweight molecular structure analysis, graph export, and coarse-graining in
 Python. MolScope is built around three polished workflows: turn coordinate
@@ -214,11 +215,17 @@ mol.select(chain="A") # one chain
 mol.select(element="C") # all carbons
 mol.select(resname="HOH") # waters
 mol.select(resid=(10, 20)) # an inclusive residue range
+mol.select(resid=100, icode="A") # PDB/mmCIF insertion code
+mol.residue_ids # full ResidueId(chain, resid, icode, resname)
 mol.alpha_carbons() # CA atoms (the usual basis for protein RMSD)
 mol.backbone() # N, CA, C, O
 mol[mask_or_indices] # subset by numpy mask / index array
 ```
 
+Residue grouping, contact maps, binding sites and residue graphs use full
+residue identity internally, so residues like `100A` and `100B` stay separate
+while integer `resids` remain available for older code.
+
 ### Analysis and measurements
 
 ```python
@@ -303,10 +310,11 @@ Codes follow DSSP notation: `H`/`G`/`I` helices, `E`/`B` strands, `T` turn, `S`
|
|
|
303
310
|
bend, `-` coil. This is a simplified **educational** implementation of the
|
|
304
311
|
Kabsch-Sander hydrogen-bond model: not bit-identical to the reference `mkdssp`
|
|
305
312
|
on every edge case, but validated against it. A CI cross-check
|
|
306
|
-
(`tests/validation`)
|
|
307
|
-
(helix/strand/coil) with `mkdssp
|
|
308
|
-
|
|
309
|
-
|
|
313
|
+
(`tests/validation`) spans three fold classes and reports **~98 to 99%
|
|
314
|
+
per-residue 3-state agreement** (helix/strand/coil) with `mkdssp`: 99.1% on the
|
|
315
|
+
helical aquaporin (`1fqy`), 100% on mixed alpha/beta ubiquitin (`1ubq`), and
|
|
316
|
+
98.2% on the all-beta SH3 domain (`1shg`). It needs backbone N/CA/C/O atoms, so
|
|
317
|
+
use PDB/mmCIF input (not a bare
|
|
310
318
|
`.xyz`). The secondary-structure render in the showcase above (helices red,
|
|
311
319
|
turns cyan, coil grey) is produced this way.
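
For readers unfamiliar with 3-state agreement, here is a minimal sketch of how such a number can be computed from two per-residue DSSP strings. The reduction used (`H`/`G`/`I` to helix, `E`/`B` to strand, everything else to coil) is a common convention and an assumption here; the validation suite's exact mapping may differ, and the two strings are invented.

```python
# Assumed 8-state -> 3-state reduction; not necessarily the one used in tests/validation.
HELIX = set("HGI")
STRAND = set("EB")


def to_three_state(codes: str) -> str:
    """Collapse DSSP codes to H (helix), E (strand) or C (coil)."""
    return "".join(
        "H" if c in HELIX else "E" if c in STRAND else "C" for c in codes
    )


def three_state_agreement(predicted: str, reference: str) -> float:
    """Fraction of residues whose collapsed states match."""
    if not predicted or len(predicted) != len(reference):
        raise ValueError("inputs must be non-empty and of equal length")
    a, b = to_three_state(predicted), to_three_state(reference)
    return sum(x == y for x, y in zip(a, b)) / len(a)


# Invented example strings: they differ only at a helix boundary residue.
pred = "-HHHHHHT-EEEE-"
ref = "-HHHHHHH-EEEES"
print(f"{three_state_agreement(pred, ref):.3f}")
```

Note that `T`, `S` and `-` all collapse to coil, so differences among those codes do not lower the score.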
|
|
312
320
|
|
|
@@ -463,7 +471,8 @@ inspection and reuse, not a validated simulation topology.
|
|
|
463
471
|
|
|
464
472
|
## Command-line interface
|
|
465
473
|
|
|
466
|
-
MolScope provides a
|
|
474
|
+
MolScope provides a CLI for visualization, binding-site tables, batch analysis,
|
|
475
|
+
and ML graph export.
|
|
467
476
|
|
|
468
477
|
### View (default)
|
|
469
478
|
Visualize a structure, apply transformations, and save images or animations.
|
|
@@ -479,6 +488,14 @@ Batch compute molecular descriptors for many files and save to a CSV table.
|
|
|
479
488
|
molscope analyze examples/data/*.pdb --out results.csv --preset native-3d --jobs 4
|
|
480
489
|
```
|
|
481
490
|
|
|
491
|
+
### Binding sites
|
|
492
|
+
Write protein-ligand binding-site residue contacts and optional pocket
|
|
493
|
+
descriptors to CSV.
|
|
494
|
+
```bash
|
|
495
|
+
molscope binding-site examples/data/3ptb.pdb --out site.csv --cutoff 4.5
|
|
496
|
+
molscope binding-site examples/data/3ptb.pdb --out site.csv --descriptors-out pocket.csv
|
|
497
|
+
```
|
|
498
|
+
|
|
482
499
|
### Export
|
|
483
500
|
Batch export molecular graphs to PyTorch Geometric, DGL, or NetworkX formats.
|
|
484
501
|
```bash
|
|
@@ -486,6 +503,53 @@ molscope export "data/*.cif" --to pyg --out-dir pyg_graphs/ --pe laplacian --job
|
|
|
486
503
|
```
|
|
487
504
|
Supports advanced features like `--self-loops`, `--global-node`, and `--pe` (positional encodings).
|
|
488
505
|
|
|
506
|
+
### Select
|
|
507
|
+
Pick a diverse subset from a molecule table (`.csv` or `.xlsx`) by MaxMin
|
|
508
|
+
selection over descriptors. Select on existing numeric columns, or compute
|
|
509
|
+
descriptors from a SMILES column with RDKit.
|
|
510
|
+
```bash
|
|
511
|
+
# diverse pick on existing property columns (no extras needed for CSV)
|
|
512
|
+
molscope select molecules.csv --descriptor-cols MW ALogP -n 100 --out picked.csv
|
|
513
|
+
|
|
514
|
+
# or compute RDKit descriptors from SMILES first (needs the chem extra;
|
|
515
|
+
# .xlsx input needs the xlsx extra)
|
|
516
|
+
molscope select molecules.xlsx --smiles-col SMILES --compute-descriptors -n 100 --out picked.csv
|
|
517
|
+
```
|
|
518
|
+
`MolLogP` (RDKit's Crippen logP) is the ALogP equivalent. Rows with missing
|
|
519
|
+
descriptors are skipped; selection is deterministic.
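
The idea behind MaxMin selection can be sketched in a few lines of standard-library Python. This is an illustrative greedy variant, not MolScope's code: the z-score scaling, the fixed starting row, and the function names are all assumptions made for the example.

```python
import math
from statistics import fmean, pstdev


def zscore_columns(rows):
    """Scale each descriptor column to zero mean and unit variance."""
    cols = list(zip(*rows))
    means = [fmean(c) for c in cols]
    stds = [pstdev(c) or 1.0 for c in cols]  # constant columns: avoid divide-by-zero
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


def maxmin_select(rows, n, first=0):
    """Greedy MaxMin: repeatedly add the row farthest from the picked set."""
    if not 0 < n <= len(rows):
        raise ValueError("n must be between 1 and the number of rows")
    z = zscore_columns(rows)
    picked = [first]
    min_dist = [math.dist(p, z[first]) for p in z]
    min_dist[first] = -1.0  # never pick the same row twice
    while len(picked) < n:
        # max() returns the lowest index on ties, so the result is deterministic
        nxt = max(range(len(z)), key=min_dist.__getitem__)
        picked.append(nxt)
        min_dist = [min(d, math.dist(p, z[nxt])) for d, p in zip(min_dist, z)]
        min_dist[nxt] = -1.0
    return picked


# Made-up (MW, ALogP) rows: two tight pairs plus one row in between.
table = [(100, 1.0), (101, 1.0), (300, 5.0), (302, 5.0), (200, 3.0)]
print(maxmin_select(table, 3))  # [0, 3, 4]
```

With three picks the sketch takes one row from each cluster and then the in-between row, rather than a near-duplicate of a row it already has.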
|
|
520
|
+
|
|
521
|
+
## Use from an AI assistant (MCP)
|
|
522
|
+
|
|
523
|
+
MolScope ships an optional [Model Context Protocol](https://modelcontextprotocol.io)
|
|
524
|
+
server, so an AI assistant such as Claude Code or Claude Desktop can drive its
|
|
525
|
+
analyses in natural language. The server exposes MolScope's existing features as
|
|
526
|
+
MCP tools; it adds no new science, just an adapter layer over the public API.
|
|
527
|
+
|
|
528
|
+
```bash
|
|
529
|
+
pip install "molscope[mcp]" # needs Python >= 3.10
|
|
530
|
+
claude mcp add molscope -- molscope-mcp # register with Claude Code
|
|
531
|
+
```
|
|
532
|
+
|
|
533
|
+
The server runs over stdio (`molscope-mcp`, or `python -m molscope.mcp_server`)
|
|
534
|
+
and provides these tools:
|
|
535
|
+
|
|
536
|
+
| Tool | What it does |
|
|
537
|
+
| --- | --- |
|
|
538
|
+
| `summarize_structure` | Load a file or PDB id and report atoms, formula, chains, size. |
|
|
539
|
+
| `compute_descriptors` | Descriptor table across one or more structures (the batch tool). |
|
|
540
|
+
| `secondary_structure` | Per-residue DSSP codes and helix/strand/coil composition. |
|
|
541
|
+
| `contact_map` | Contact count, contact order, and labelled contacting pairs. |
|
|
542
|
+
| `binding_site` | Protein residues around a ligand, closest first. |
|
|
543
|
+
| `molecular_graph` | Node/edge counts and feature names for the ML graph. |
|
|
544
|
+
| `coarse_grain` | Bead assignment statistics for a coarse-grained mapping. |
|
|
545
|
+
| `render_structure`, `render_contact_map` | Return PNG figures. |
|
|
546
|
+
|
|
547
|
+
Each tool takes a `source` that is either a local coordinate-file path or a
|
|
548
|
+
4-character PDB id (fetched from RCSB). For example, you can ask the assistant to
|
|
549
|
+
*"fetch trypsin (3ptb), find the benzamidine binding-site residues, and render a
|
|
550
|
+
contact map"*, and it will call `binding_site` then `render_contact_map`. See
|
|
551
|
+
[`docs/user-guide/mcp-server.md`](docs/user-guide/mcp-server.md) for the full
|
|
552
|
+
guide.
|
|
489
553
|
|
|
490
554
|
## Sample structures
|
|
491
555
|
|
|
@@ -494,6 +558,7 @@ Supports advanced features like `--self-loops`, `--global-node`, and `--pe` (pos
|
|
|
494
558
|
| `examples/data/helix_201.xyz` | a helix (bare coordinates) |
|
|
495
559
|
| `examples/data/1fqy.pdb` | Aquaporin-1, single model (1661 atoms) |
|
|
496
560
|
| `examples/data/1aml.pdb` | Alzheimer amyloid A4 peptide, 20-model NMR ensemble |
|
|
561
|
+
| `examples/data/3ptb.pdb` | Trypsin-benzamidine complex, ligand-binding-site example |
|
|
497
562
|
|
|
498
563
|
## Notes
|
|
499
564
|
|
|
@@ -532,6 +597,20 @@ that run everywhere, plus cross-checks against reference scientific tools (the
|
|
|
532
597
|
simplified DSSP vs `mkdssp`) that turn "the tests pass" into a measured
|
|
533
598
|
agreement number.
|
|
534
599
|
|
|
600
|
+
## Changelog
|
|
601
|
+
|
|
602
|
+
Notable changes for each release are recorded in [`CHANGELOG.md`](CHANGELOG.md),
|
|
603
|
+
following the [Keep a Changelog](https://keepachangelog.com/) format.
|
|
604
|
+
|
|
605
|
+
## How to cite
|
|
606
|
+
|
|
607
|
+
Each release of MolScope is archived on Zenodo with a citable DOI. The concept
|
|
608
|
+
DOI [10.5281/zenodo.20433850](https://doi.org/10.5281/zenodo.20433850) always
|
|
609
|
+
resolves to the latest version; each release also has its own version DOI.
|
|
610
|
+
Machine-readable citation metadata lives in [`CITATION.cff`](CITATION.cff), so
|
|
611
|
+
GitHub's "Cite this repository" button on the sidebar produces BibTeX and APA
|
|
612
|
+
entries automatically.
|
|
613
|
+
|
|
535
614
|
## License
|
|
536
615
|
|
|
537
616
|
[MIT](LICENSE)
|
|
@@ -9,12 +9,22 @@
|
|
|
9
9
|
- `molscope.write_pdb(molecule, path)`, `write_xyz(molecule, path)`.
|
|
10
10
|
- `molscope.featurize_many(paths, return_names=False)`: build an ML feature matrix.
|
|
11
11
|
- `molscope.descriptor_feature_names(preset)`: stable flattened descriptor columns.
|
|
12
|
+
- `molscope.pocket_descriptor_feature_names("pocket-basic")`: stable binding-pocket descriptor columns.
|
|
12
13
|
- `molscope.node_feature_names(preset)`, `edge_feature_names(preset)`: atom/bond graph preset columns.
|
|
13
14
|
- `molscope.residue_node_feature_names(preset)`, `residue_edge_feature_names(preset)`: residue contact graph preset columns.
|
|
14
15
|
- `molscope.interface_residues(mol, chain_a, chain_b, cutoff=5.0)`, `chain_contact_matrix(mol, cutoff=5.0)`: chain interfaces.
|
|
15
16
|
- `molscope.ligands(mol, ...)`, `binding_site(mol, ligand=None, cutoff=4.5)`: ligand detection and binding-site residues.
|
|
16
17
|
- `molscope.backbone_torsions(mol)`: per-residue phi/psi/omega.
|
|
17
18
|
|
|
19
|
+
Residue identity helpers:
|
|
20
|
+
|
|
21
|
+
- `molscope.ResidueId(chain, resid, insertion_code="", resname="")`: full residue identity used by PDB/mmCIF-aware APIs.
|
|
22
|
+
- `molscope.ResidueGroup`: yielded by `Molecule.residue_groups()`; has `.residue_id` and still unpacks as `(atom_indices, resname, resid, chain)`.
|
|
23
|
+
|
|
24
|
+
`BindingSite` results expose `to_records()`, `to_molecule(mol)`,
|
|
25
|
+
`descriptors(mol, preset="pocket-basic")`, and `plot(mol)` for residue tables,
|
|
26
|
+
pocket descriptor extraction, and quick figures.
|
|
27
|
+
|
|
18
28
|
## Molecule
|
|
19
29
|
|
|
20
30
|
Construction:
|
|
@@ -12,7 +12,7 @@ mol = ms.read("examples/data/3ptb.pdb")
|
|
|
12
12
|
print(mol.ligands()) # [LigandResidue(A:BEN1, 9 atoms)]
|
|
13
13
|
|
|
14
14
|
site = mol.binding_site(cutoff=4.5) # single ligand auto-detected
|
|
15
|
-
print(site) # BindingSite(BEN1: 13 residues < 4.5 A)
|
|
15
|
+
print(site) # BindingSite(A:BEN1: 13 residues < 4.5 A)
|
|
16
16
|
|
|
17
17
|
for res, dist in zip(site.residues, site.min_distances):
|
|
18
18
|
print(f"{res!s:<10} {dist:.2f} A")
|
|
@@ -23,13 +23,39 @@ for res, dist in zip(site.residues, site.min_distances):
|
|
|
23
23
|
# A:SER195 3.65 <- catalytic serine
|
|
24
24
|
```
|
|
25
25
|
|
|
26
|
+
For quick figures or reports, convert the site to table-friendly residue
|
|
27
|
+
records and extract descriptors for only the site residues:
|
|
28
|
+
|
|
29
|
+
```python
|
|
30
|
+
site.to_records()[0]
|
|
31
|
+
# {'residue_id': 'A:GLY219', 'chain': 'A', 'resid': 219,
|
|
32
|
+
# 'insertion_code': '', 'resname': 'GLY',
|
|
33
|
+
# 'min_distance': 2.815..., 'n_atom_contacts': 5}
|
|
34
|
+
|
|
35
|
+
site.descriptors(mol, preset="pocket-basic")
|
|
36
|
+
site.plot(mol, show=False) # pocket residues plus ligand
|
|
37
|
+
```
|
|
38
|
+
|
|
39
|
+
The same residue table is available from the command line:
|
|
40
|
+
|
|
41
|
+
```bash
|
|
42
|
+
molscope binding-site examples/data/3ptb.pdb --out site.csv --cutoff 4.5
|
|
43
|
+
```
|
|
44
|
+
|
|
45
|
+
Add `--descriptors-out pocket.csv` to also write the one-row
|
|
46
|
+
`pocket-basic` descriptor table.
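
Conceptually, the residue table is a minimum-distance query between protein atoms and ligand atoms. The sketch below shows that idea in plain Python under stated assumptions: the function name, the input layout, and the coordinates are hypothetical, and it is not how MolScope implements `binding_site`.

```python
import math


def binding_site_residues(protein_atoms, ligand_coords, cutoff=4.5):
    """Return (residue_label, min_distance) pairs within cutoff, closest first.

    protein_atoms: iterable of (residue_label, (x, y, z)) tuples.
    ligand_coords: list of (x, y, z) tuples for the ligand atoms.
    """
    best = {}
    for label, xyz in protein_atoms:
        d = min(math.dist(xyz, lig) for lig in ligand_coords)
        if d < cutoff and d < best.get(label, math.inf):
            best[label] = d  # keep the closest atom per residue
    return sorted(best.items(), key=lambda item: item[1])


# Invented coordinates (in angstroms) for three residues and a two-atom ligand.
protein = [
    ("A:RES1", (0.0, 0.0, 3.0)),
    ("A:RES1", (0.0, 0.0, 5.0)),
    ("A:RES2", (2.0, 0.0, 0.0)),
    ("A:RES3", (10.0, 0.0, 0.0)),
]
ligand = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]

for label, dist in binding_site_residues(protein, ligand):
    print(f"{label:<10} {dist:.2f} A")
```

A residue is kept when any one of its atoms falls under the cutoff, and the reported distance is the smallest atom-to-atom distance found.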
|
|
47
|
+
|
|
26
48
|
When a structure has several ligands, select one by residue name or location:
|
|
27
49
|
|
|
28
50
|
```python
|
|
29
51
|
mol.binding_site(ligand="BEN")
|
|
30
52
|
mol.binding_site(ligand=("A", 1))
|
|
53
|
+
mol.binding_site(ligand=("A", 100, "A")) # with insertion code
|
|
31
54
|
```
|
|
32
55
|
|
|
56
|
+
The CLI accepts the same choices as `--ligand BEN`, `--ligand A:1`, or
|
|
57
|
+
`--ligand A:100:A`.
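
To make the correspondence between the CLI strings and the Python selectors concrete, here is a hypothetical parser that maps one form to the other. It only illustrates the three spellings shown above; the function name and error handling are assumptions, and MolScope's own argument parsing may work differently.

```python
def parse_ligand_spec(spec: str):
    """Map 'BEN', 'A:1' or 'A:100:A' to the matching Python-side selector."""
    parts = spec.split(":")
    if len(parts) == 1:
        return parts[0]                      # residue name, e.g. "BEN"
    if len(parts) == 2:
        chain, resid = parts
        return (chain, int(resid))           # (chain, resid)
    if len(parts) == 3:
        chain, resid, icode = parts
        return (chain, int(resid), icode)    # (chain, resid, insertion code)
    raise ValueError(f"unrecognised ligand spec: {spec!r}")


print(parse_ligand_spec("BEN"))      # BEN
print(parse_ligand_spec("A:1"))      # ('A', 1)
print(parse_ligand_spec("A:100:A"))  # ('A', 100, 'A')
```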
|
|
58
|
+
|
|
33
59
|
A runnable version lives in
|
|
34
60
|
[`examples/binding_site.py`](https://github.com/roshan2004/molscope/blob/main/examples/binding_site.py).
|
|
35
61
|
See the full guide:
|
|
@@ -122,17 +122,21 @@ style assignment based on backbone hydrogen-bond patterns.
|
|
|
122
122
|
What is validated:
|
|
123
123
|
|
|
124
124
|
- The validation suite compares MolScope's simplified DSSP against `mkdssp`
|
|
125
|
-
where the reference binary is available
|
|
126
|
-
|
|
127
|
-
|
|
125
|
+
across three fold classes where the reference binary is available:
|
|
126
|
+
Aquaporin-1 (`1fqy`, helix-dominated), ubiquitin (`1ubq`, mixed alpha/beta),
|
|
127
|
+
and the SH3 domain (`1shg`, all-beta).
|
|
128
|
+
- After reducing DSSP states to helix/strand/coil, agreement is high across
|
|
129
|
+
all three folds: about 99% on the helical and mixed structures and about
|
|
130
|
+
98% on the all-beta one.
|
|
128
131
|
|
|
129
132
|
Limits:
|
|
130
133
|
|
|
131
134
|
- Not bit-identical to reference `mkdssp`. Treat output as the
|
|
132
135
|
educational/prototyping equivalent of DSSP, not as a substitute for it in
|
|
133
136
|
production pipelines.
|
|
134
|
-
-
|
|
135
|
-
|
|
137
|
+
- Disagreements concentrate at the boundary residues of helices and strands,
|
|
138
|
+
and on irregular or low-quality structures, rather than on any one fold
|
|
139
|
+
class.
|
|
136
140
|
- Needs backbone N/CA/C/O atoms, so bare XYZ input is insufficient.
|
|
137
141
|
- Only the standard collapsed states (H/E/C) are validated against the
|
|
138
142
|
reference; finer DSSP categories are not.
|
|
@@ -57,6 +57,20 @@ Preset options:
|
|
|
57
57
|
- `native-3d`: `native-basic` plus centres, inertia, principal axes/moments, and distance histograms.
|
|
58
58
|
- `rdkit-basic`: `native-basic` plus a stable subset of RDKit scalar descriptors.
|
|
59
59
|
|
|
60
|
+
Ligand binding sites have their own fixed-size preset because they need a
|
|
61
|
+
ligand context:
|
|
62
|
+
|
|
63
|
+
```python
|
|
64
|
+
mol = ms.read("examples/data/3ptb.pdb")
|
|
65
|
+
site = mol.binding_site(cutoff=4.5)
|
|
66
|
+
pocket = site.descriptors(mol, preset="pocket-basic")
|
|
67
|
+
names = ms.pocket_descriptor_feature_names("pocket-basic")
|
|
68
|
+
```
|
|
69
|
+
|
|
70
|
+
`pocket-basic` includes pocket atom and residue counts, amino-acid composition,
|
|
71
|
+
protein-ligand contact counts, radius of gyration, bounding-box dimensions, and
|
|
72
|
+
ligand-distance summaries.
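
Two of the geometric quantities named above are easy to state precisely. The sketch below computes an unweighted radius of gyration and axis-aligned bounding-box dimensions for a set of coordinates. Whether `pocket-basic` mass-weights the radius of gyration or aligns the box differently is not stated in this guide, so treat both choices, and the function names, as assumptions.

```python
import math
from statistics import fmean


def radius_of_gyration(coords):
    """Unweighted root-mean-square distance of points from their centroid."""
    center = [fmean(axis) for axis in zip(*coords)]
    return math.sqrt(fmean([math.dist(p, center) ** 2 for p in coords]))


def bounding_box(coords):
    """Extent of the points along x, y and z (axis-aligned)."""
    return tuple(max(axis) - min(axis) for axis in zip(*coords))


# Four made-up points on the corners of a 2 x 2 square in the z = 0 plane.
square = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (0.0, 2.0, 0.0), (2.0, 2.0, 0.0)]
print(round(radius_of_gyration(square), 4))  # 1.4142
print(bounding_box(square))                  # (2.0, 2.0, 0.0)
```

Both values depend only on the selected atoms' coordinates, which is consistent with the descriptor vector being fixed-size regardless of pocket composition.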
|
|
73
|
+
|
|
60
74
|
## RDKit descriptors
|
|
61
75
|
|
|
62
76
|
Install the optional chemical backend to access RDKit's scalar descriptor set:
|