molscope 0.8.2__tar.gz → 0.9.0__tar.gz

Files changed (154)
  1. molscope-0.9.0/CHANGELOG.md +174 -0
  2. {molscope-0.8.2 → molscope-0.9.0}/CITATION.cff +12 -2
  3. {molscope-0.8.2 → molscope-0.9.0}/MANIFEST.in +1 -0
  4. {molscope-0.8.2/molscope.egg-info → molscope-0.9.0}/PKG-INFO +89 -6
  5. {molscope-0.8.2 → molscope-0.9.0}/README.md +84 -5
  6. {molscope-0.8.2 → molscope-0.9.0}/docs/api-reference.md +10 -0
  7. {molscope-0.8.2 → molscope-0.9.0}/docs/examples/binding-site.md +27 -1
  8. {molscope-0.8.2 → molscope-0.9.0}/docs/limitations.md +9 -5
  9. {molscope-0.8.2 → molscope-0.9.0}/docs/user-guide/descriptors.md +14 -0
  10. molscope-0.9.0/docs/user-guide/library-selection.md +82 -0
  11. molscope-0.9.0/docs/user-guide/mcp-server.md +88 -0
  12. {molscope-0.8.2 → molscope-0.9.0}/docs/user-guide/protein-analysis.md +18 -0
  13. {molscope-0.8.2 → molscope-0.9.0}/docs/user-guide/reading-files.md +6 -2
  14. {molscope-0.8.2 → molscope-0.9.0}/docs/user-guide/selections.md +13 -0
  15. {molscope-0.8.2 → molscope-0.9.0}/docs/validation.md +15 -5
  16. {molscope-0.8.2 → molscope-0.9.0}/examples/binding_site.py +20 -1
  17. molscope-0.9.0/examples/data/1shg.pdb +761 -0
  18. molscope-0.9.0/examples/data/1ubq.pdb +970 -0
  19. {molscope-0.8.2 → molscope-0.9.0}/mkdocs.yml +2 -0
  20. {molscope-0.8.2 → molscope-0.9.0}/molscope/__init__.py +6 -2
  21. {molscope-0.8.2 → molscope-0.9.0}/molscope/cli.py +209 -1
  22. {molscope-0.8.2 → molscope-0.9.0}/molscope/coarsegrain.py +58 -11
  23. {molscope-0.8.2 → molscope-0.9.0}/molscope/contactmap.py +19 -8
  24. molscope-0.9.0/molscope/contacts.py +591 -0
  25. {molscope-0.8.2 → molscope-0.9.0}/molscope/dssp.py +34 -13
  26. {molscope-0.8.2 → molscope-0.9.0}/molscope/ensemble.py +9 -2
  27. {molscope-0.8.2 → molscope-0.9.0}/molscope/graph.py +27 -12
  28. {molscope-0.8.2 → molscope-0.9.0}/molscope/io.py +38 -21
  29. molscope-0.9.0/molscope/library.py +279 -0
  30. molscope-0.9.0/molscope/mcp_server.py +327 -0
  31. {molscope-0.8.2 → molscope-0.9.0}/molscope/molecule.py +211 -13
  32. {molscope-0.8.2 → molscope-0.9.0}/molscope/plotting.py +1 -1
  33. {molscope-0.8.2 → molscope-0.9.0/molscope.egg-info}/PKG-INFO +89 -6
  34. {molscope-0.8.2 → molscope-0.9.0}/molscope.egg-info/SOURCES.txt +12 -0
  35. {molscope-0.8.2 → molscope-0.9.0}/molscope.egg-info/entry_points.txt +1 -0
  36. {molscope-0.8.2 → molscope-0.9.0}/molscope.egg-info/requires.txt +8 -0
  37. {molscope-0.8.2 → molscope-0.9.0}/pyproject.toml +7 -1
  38. molscope-0.9.0/tests/fixtures/insertion_codes.cif +20 -0
  39. molscope-0.9.0/tests/fixtures/ugly_residue_ids.pdb +11 -0
  40. molscope-0.9.0/tests/test_cli.py +140 -0
  41. molscope-0.9.0/tests/test_cli_batch.py +192 -0
  42. {molscope-0.8.2 → molscope-0.9.0}/tests/test_contactmap.py +17 -0
  43. molscope-0.9.0/tests/test_contacts.py +289 -0
  44. {molscope-0.8.2 → molscope-0.9.0}/tests/test_extras.py +1 -0
  45. {molscope-0.8.2 → molscope-0.9.0}/tests/test_features.py +1 -0
  46. {molscope-0.8.2 → molscope-0.9.0}/tests/test_graph.py +11 -0
  47. {molscope-0.8.2 → molscope-0.9.0}/tests/test_io.py +45 -0
  48. molscope-0.9.0/tests/test_library.py +278 -0
  49. molscope-0.9.0/tests/test_mcp_server.py +181 -0
  50. molscope-0.9.0/tests/test_molecule.py +241 -0
  51. {molscope-0.8.2 → molscope-0.9.0}/tests/test_protein_workflows.py +2 -1
  52. molscope-0.9.0/tests/validation/test_binding_sites_ref.py +49 -0
  53. {molscope-0.8.2 → molscope-0.9.0}/tests/validation/test_dssp_ref.py +56 -14
  54. molscope-0.8.2/molscope/contacts.py +0 -292
  55. molscope-0.8.2/tests/test_cli.py +0 -48
  56. molscope-0.8.2/tests/test_cli_batch.py +0 -54
  57. molscope-0.8.2/tests/test_contacts.py +0 -122
  58. molscope-0.8.2/tests/test_molecule.py +0 -131
  59. {molscope-0.8.2 → molscope-0.9.0}/LICENSE +0 -0
  60. {molscope-0.8.2 → molscope-0.9.0}/docs/assets/coarsegrain/1fqy-cg-mapping-comparison.png +0 -0
  61. {molscope-0.8.2 → molscope-0.9.0}/docs/assets/coarsegrain/1fqy-martini-mapping.png +0 -0
  62. {molscope-0.8.2 → molscope-0.9.0}/docs/assets/contactmaps/1aml-contact-frequency.png +0 -0
  63. {molscope-0.8.2 → molscope-0.9.0}/docs/assets/contactmaps/1fqy-ca-distance-matrix.png +0 -0
  64. {molscope-0.8.2 → molscope-0.9.0}/docs/assets/contactmaps/1fqy-residue-contact-map.png +0 -0
  65. {molscope-0.8.2 → molscope-0.9.0}/docs/assets/geometry/1aml-rmsf-profile.png +0 -0
  66. {molscope-0.8.2 → molscope-0.9.0}/docs/assets/geometry/1fqy-principal-axes.png +0 -0
  67. {molscope-0.8.2 → molscope-0.9.0}/docs/assets/graphs/1fqy-residue-contact-graph.png +0 -0
  68. {molscope-0.8.2 → molscope-0.9.0}/docs/assets/readme/aquaporin-structure-v2.png +0 -0
  69. {molscope-0.8.2 → molscope-0.9.0}/docs/assets/readme/coarse-grained-beads-v2.png +0 -0
  70. {molscope-0.8.2 → molscope-0.9.0}/docs/assets/readme/residue-contact-map.png +0 -0
  71. {molscope-0.8.2 → molscope-0.9.0}/docs/assets/readme/secondary-structure.png +0 -0
  72. {molscope-0.8.2 → molscope-0.9.0}/docs/benchmarks.md +0 -0
  73. {molscope-0.8.2 → molscope-0.9.0}/docs/contributing.md +0 -0
  74. {molscope-0.8.2 → molscope-0.9.0}/docs/examples/analyze-contacts.md +0 -0
  75. {molscope-0.8.2 → molscope-0.9.0}/docs/examples/build-molecular-graph.md +0 -0
  76. {molscope-0.8.2 → molscope-0.9.0}/docs/examples/coarse-grain-protein.md +0 -0
  77. {molscope-0.8.2 → molscope-0.9.0}/docs/examples/compare-nmr-models.md +0 -0
  78. {molscope-0.8.2 → molscope-0.9.0}/docs/examples/export-pyg.md +0 -0
  79. {molscope-0.8.2 → molscope-0.9.0}/docs/examples/geometry-tour.md +0 -0
  80. {molscope-0.8.2 → molscope-0.9.0}/docs/examples/index.md +0 -0
  81. {molscope-0.8.2 → molscope-0.9.0}/docs/examples/pdb-to-graph-cg.md +0 -0
  82. {molscope-0.8.2 → molscope-0.9.0}/docs/examples/pdb-to-pyg-ml.md +0 -0
  83. {molscope-0.8.2 → molscope-0.9.0}/docs/examples/protein-analysis-from-scratch.md +0 -0
  84. {molscope-0.8.2 → molscope-0.9.0}/docs/examples/residue-contact-graphs.md +0 -0
  85. {molscope-0.8.2 → molscope-0.9.0}/docs/index.md +0 -0
  86. {molscope-0.8.2 → molscope-0.9.0}/docs/installation.md +0 -0
  87. {molscope-0.8.2 → molscope-0.9.0}/docs/quickstart.md +0 -0
  88. {molscope-0.8.2 → molscope-0.9.0}/docs/roadmap.md +0 -0
  89. {molscope-0.8.2 → molscope-0.9.0}/docs/tutorials/index.md +0 -0
  90. {molscope-0.8.2 → molscope-0.9.0}/docs/tutorials/pdb-to-coarse-grained-beads.md +0 -0
  91. {molscope-0.8.2 → molscope-0.9.0}/docs/tutorials/pdb-to-descriptors.md +0 -0
  92. {molscope-0.8.2 → molscope-0.9.0}/docs/tutorials/pdb-to-graph-gnn.md +0 -0
  93. {molscope-0.8.2 → molscope-0.9.0}/docs/user-guide/chemical-perception.md +0 -0
  94. {molscope-0.8.2 → molscope-0.9.0}/docs/user-guide/coarse-graining.md +0 -0
  95. {molscope-0.8.2 → molscope-0.9.0}/docs/user-guide/contact-maps.md +0 -0
  96. {molscope-0.8.2 → molscope-0.9.0}/docs/user-guide/coordinate-formats.md +0 -0
  97. {molscope-0.8.2 → molscope-0.9.0}/docs/user-guide/ensembles.md +0 -0
  98. {molscope-0.8.2 → molscope-0.9.0}/docs/user-guide/geometry.md +0 -0
  99. {molscope-0.8.2 → molscope-0.9.0}/docs/user-guide/molecular-graphs.md +0 -0
  100. {molscope-0.8.2 → molscope-0.9.0}/docs/user-guide/plotting.md +0 -0
  101. {molscope-0.8.2 → molscope-0.9.0}/examples/coarse_graining.py +0 -0
  102. {molscope-0.8.2 → molscope-0.9.0}/examples/data/1aml.pdb +0 -0
  103. {molscope-0.8.2 → molscope-0.9.0}/examples/data/1fqy.pdb +0 -0
  104. {molscope-0.8.2 → molscope-0.9.0}/examples/data/3ptb.pdb +0 -0
  105. {molscope-0.8.2 → molscope-0.9.0}/examples/data/helix_201.xyz +0 -0
  106. {molscope-0.8.2 → molscope-0.9.0}/examples/geometry.py +0 -0
  107. {molscope-0.8.2 → molscope-0.9.0}/examples/graph_to_gnn.py +0 -0
  108. {molscope-0.8.2 → molscope-0.9.0}/examples/legacy_utils.py +0 -0
  109. {molscope-0.8.2 → molscope-0.9.0}/examples/pdb_to_pyg_ml.py +0 -0
  110. {molscope-0.8.2 → molscope-0.9.0}/examples/protein_analysis.py +0 -0
  111. {molscope-0.8.2 → molscope-0.9.0}/examples/residue_contact_graph.py +0 -0
  112. {molscope-0.8.2 → molscope-0.9.0}/examples/tour.py +0 -0
  113. {molscope-0.8.2 → molscope-0.9.0}/molscope/__main__.py +0 -0
  114. {molscope-0.8.2 → molscope-0.9.0}/molscope/chem.py +0 -0
  115. {molscope-0.8.2 → molscope-0.9.0}/molscope/cif.py +0 -0
  116. {molscope-0.8.2 → molscope-0.9.0}/molscope/descriptors.py +0 -0
  117. {molscope-0.8.2 → molscope-0.9.0}/molscope/distance.py +0 -0
  118. {molscope-0.8.2 → molscope-0.9.0}/molscope/elements.py +0 -0
  119. {molscope-0.8.2 → molscope-0.9.0}/molscope.egg-info/dependency_links.txt +0 -0
  120. {molscope-0.8.2 → molscope-0.9.0}/molscope.egg-info/top_level.txt +0 -0
  121. {molscope-0.8.2 → molscope-0.9.0}/notebooks/molscope_tour.ipynb +0 -0
  122. {molscope-0.8.2 → molscope-0.9.0}/notebooks/pdb_to_gnn.ipynb +0 -0
  123. {molscope-0.8.2 → molscope-0.9.0}/notebooks/protein_analysis_from_scratch.ipynb +0 -0
  124. {molscope-0.8.2 → molscope-0.9.0}/requirements.txt +0 -0
  125. {molscope-0.8.2 → molscope-0.9.0}/scripts/benchmark_core.py +0 -0
  126. {molscope-0.8.2 → molscope-0.9.0}/scripts/build_gnn_notebook.py +0 -0
  127. {molscope-0.8.2 → molscope-0.9.0}/scripts/build_protein_analysis_notebook.py +0 -0
  128. {molscope-0.8.2 → molscope-0.9.0}/scripts/build_user_guide_pdf.py +0 -0
  129. {molscope-0.8.2 → molscope-0.9.0}/scripts/render_coarsegrain_images.py +0 -0
  130. {molscope-0.8.2 → molscope-0.9.0}/scripts/render_contact_analysis_images.py +0 -0
  131. {molscope-0.8.2 → molscope-0.9.0}/scripts/render_geometry_images.py +0 -0
  132. {molscope-0.8.2 → molscope-0.9.0}/setup.cfg +0 -0
  133. {molscope-0.8.2 → molscope-0.9.0}/tests/fixtures/bad_coord.pdb +0 -0
  134. {molscope-0.8.2 → molscope-0.9.0}/tests/fixtures/bad_coord.xyz +0 -0
  135. {molscope-0.8.2 → molscope-0.9.0}/tests/fixtures/bad_counts.sdf +0 -0
  136. {molscope-0.8.2 → molscope-0.9.0}/tests/fixtures/missing_coord_col.cif +0 -0
  137. {molscope-0.8.2 → molscope-0.9.0}/tests/fixtures/no_atom_site.cif +0 -0
  138. {molscope-0.8.2 → molscope-0.9.0}/tests/fixtures/no_atoms.pdb +0 -0
  139. {molscope-0.8.2 → molscope-0.9.0}/tests/fixtures/short_atom.pdb +0 -0
  140. {molscope-0.8.2 → molscope-0.9.0}/tests/fixtures/truncated.sdf +0 -0
  141. {molscope-0.8.2 → molscope-0.9.0}/tests/fixtures/truncated.xyz +0 -0
  142. {molscope-0.8.2 → molscope-0.9.0}/tests/fixtures/v3000.sdf +0 -0
  143. {molscope-0.8.2 → molscope-0.9.0}/tests/fixtures/water.sdf +0 -0
  144. {molscope-0.8.2 → molscope-0.9.0}/tests/test_cg_mapping.py +0 -0
  145. {molscope-0.8.2 → molscope-0.9.0}/tests/test_chem.py +0 -0
  146. {molscope-0.8.2 → molscope-0.9.0}/tests/test_cif_validation.py +0 -0
  147. {molscope-0.8.2 → molscope-0.9.0}/tests/test_clustering.py +0 -0
  148. {molscope-0.8.2 → molscope-0.9.0}/tests/test_coarsegrain.py +0 -0
  149. {molscope-0.8.2 → molscope-0.9.0}/tests/test_descriptors.py +0 -0
  150. {molscope-0.8.2 → molscope-0.9.0}/tests/test_dssp.py +0 -0
  151. {molscope-0.8.2 → molscope-0.9.0}/tests/validation/test_bonds_ref.py +0 -0
  152. {molscope-0.8.2 → molscope-0.9.0}/tests/validation/test_chem_ref.py +0 -0
  153. {molscope-0.8.2 → molscope-0.9.0}/tests/validation/test_geometry_ref.py +0 -0
  154. {molscope-0.8.2 → molscope-0.9.0}/tests/validation/test_invariants.py +0 -0
@@ -0,0 +1,174 @@
1
+ # Changelog
2
+
3
+ All notable changes to MolScope are documented here.
4
+
5
+ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
6
+ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
7
+ While the package is pre-1.0, minor versions may include backwards-incompatible
8
+ API changes; these are called out under **Changed** where they occur.
9
+
10
+ ## [Unreleased]
11
+
12
+ ## [0.9.0] - 2026-05-29
13
+
14
+ ### Added
15
+
16
+ - Molecule-table workflow: `molscope select` and the `molscope.library` module
17
+ read a CSV/XLSX of molecules and pick a diverse subset by MaxMin
18
+ (farthest-first) selection over descriptors. Select on existing numeric columns
19
+ (e.g. `MW`, `ALogP`) or compute RDKit descriptors from a SMILES column with
20
+ `--compute-descriptors --smiles-col`. Adds an `xlsx` extra (`openpyxl`) for
21
+ spreadsheet I/O; CSV input and selection need no optional backend.
22
+ - Optional MCP (Model Context Protocol) server, `molscope.mcp_server`, exposing
23
+ MolScope's analyses as tools for AI assistants such as Claude Code and Claude
24
+ Desktop. Adds a `molscope-mcp` console script, an `mcp` extra
25
+ (`pip install "molscope[mcp]"`, Python >= 3.10), and nine read-only tools that
26
+ wrap the existing API: summarise, descriptors, secondary structure, contact
27
+ map, binding site, molecular graph, coarse-grain, and two PNG render tools.
28
+ - Broadened the DSSP reference cross-check to three fold classes instead of one:
29
+ helix-dominated Aquaporin-1 (`1fqy`), mixed alpha/beta ubiquitin (`1ubq`), and
30
+ the all-beta SH3 domain (`1shg`). The test is parametrised and prints
31
+ per-fold 3-state agreement so results read as a range. Measured against
32
+ `mkdssp` 4.5.8: 99.1% (`1fqy`), 100% (`1ubq`), 98.2% (`1shg`).
33
+ - Bundled `1ubq.pdb` and `1shg.pdb` in `examples/data` as the new validation
34
+ structures.
35
+
36
+ ### Changed
37
+
38
+ - Documentation and the JOSS paper now report DSSP agreement as a measured
39
+ range across fold classes rather than a single helical figure, and no longer
40
+ imply that strand-rich folds agree markedly less well.
41
+
42
+ ## [0.8.3] - 2026-05-28
43
+
44
+ ### Added
45
+
46
+ - JOSS paper draft under `paper/` and Zenodo deposition metadata
47
+ (`.zenodo.json`) for an archival DOI.
48
+
49
+ ### Changed
50
+
51
+ - Bumped GitHub Actions to Node 24-ready versions.
52
+
53
+ ## [0.8.2] - 2026-05-28
54
+
55
+ ### Added
56
+
57
+ - Read the Docs hosting for the documentation site.
58
+ - Coverage reporting via `pytest-cov` and Codecov, measured with the optional
59
+ backends installed so the RDKit, gemmi, NetworkX, SciPy and Torch paths are
60
+ exercised rather than skipped.
61
+ - Coarse-grained virtual-site support, preserved as derived coordinate metadata.
62
+ - PDB workflow tutorials and a protein-analysis-from-scratch walkthrough.
63
+ - CLI batch analysis and graph-export subcommands, with improved selection
64
+ handling.
65
+ - `O(n)` cell-list neighbour search and opt-in periodic-boundary support for the
66
+ distance and contact methods.
67
+ - Residue contact graphs for ML and an educational coarse-graining workflow.
68
+ - Scientific validation tables, references, and a reproducible benchmarks page.
69
+
70
+ ### Changed
71
+
72
+ - Reorganised the limitations page by workflow and added a Graph ML section.
73
+ - Refocused the documentation around the three core workflows.
74
+ - Completed the geometry API and added a visual geometry guide validated against
75
+ MDAnalysis.
76
+ - Polished the coarse-graining workflow, contact maps, and dense distance
77
+ backends; improved coordinate-format parser errors and added edge-case
78
+ fixtures.
79
+ - Stopped tracking generated graph exports and added `graphs/` to `.gitignore`.
80
+
81
+ ### Fixed
82
+
83
+ - Batch CLI crash with `--jobs > 1` on spawn-based platforms.
84
+ - gemmi-backed mmCIF unit-cell read.
85
+ - Lint errors, and made periodic boundaries opt-in for distance and contact
86
+ methods.
87
+
88
+ ## [0.8.1] - 2026-05-26
89
+
90
+ ### Added
91
+
92
+ - Tier-2 validation suite cross-checking against reference scientific tools,
93
+ covering DSSP (`mkdssp`), geometry and RMSD (MDAnalysis), bond perception and
94
+ chemical features (RDKit), and contact maps.
95
+ - PyTorch Geometric ML tutorial and `CITATION.cff` citation metadata.
96
+
97
+ ### Changed
98
+
99
+ - DSSP validation now invokes `mkdssp` directly instead of going through
100
+ Biopython, and the 3-state agreement floor was tightened to 0.95 after
101
+ observing 99.1% on CI.
102
+ - CI runs the validation job with the required extras and fails loudly if the
103
+ reference tools cannot be imported.
104
+ - Polished repository layout and documentation.
105
+
106
+ ### Fixed
107
+
108
+ - README Mermaid diagram syntax.
109
+
110
+ ## [0.8.0] - 2026-05-26
111
+
112
+ ### Added
113
+
114
+ - Expanded molecular parsing and machine-learning feature support.
115
+
116
+ ## [0.7.0] - 2026-05-25
117
+
118
+ ### Added
119
+
120
+ - Simplified, dependency-free DSSP-style secondary-structure assignment based on
121
+ the Kabsch-Sander hydrogen-bond model.
122
+
123
+ ### Changed
124
+
125
+ - README: added a "Why MolScope" section, a tool comparison, and a CLI output
126
+ example; the secondary-structure render replaced the earlier hero animation.
127
+
128
+ ## [0.6.2] - 2026-05-25
129
+
130
+ ### Fixed
131
+
132
+ - README images now render on PyPI, and the publish workflow was hardened.
133
+
134
+ ## [0.6.1] - 2026-05-25
135
+
136
+ ### Added
137
+
138
+ - PyPI trusted-publishing workflow.
139
+
140
+ ### Changed
141
+
142
+ - Expanded the package docstring to cover the full feature scope.
143
+
144
+ ## [0.6.0] - 2026-05-25
145
+
146
+ Initial public release under the **MolScope** name, renamed from the earlier
147
+ `molecule3d` prototype. This release consolidated the core toolkit:
148
+
149
+ ### Added
150
+
151
+ - `Molecule` object on a NumPy core, with fixed-column PDB parsing and readers
152
+ and writers for XYZ, PDB, mmCIF and SDF.
153
+ - Per-atom metadata, metadata-based selections, geometry measurements, and RMSD.
154
+ - Molecular graph construction with NetworkX, PyTorch Geometric and DGL
155
+ exporters.
156
+ - Coarse-graining tools: residue and custom mappings, explicit-bond support,
157
+ index-based mappings, and a dropped-atom warning.
158
+ - Contact maps and ensemble contact-frequency analysis, plus ensemble RMSD
159
+ clustering and an RMSD heatmap.
160
+ - Native structural descriptors and a MkDocs documentation site, including a
161
+ user-guide PDF builder.
162
+ - `uv` support (lockfile, dev dependency group, `.python-version`), continuous
163
+ integration, and README visual examples.
164
+
165
+ [Unreleased]: https://github.com/roshan2004/molscope/compare/v0.9.0...HEAD
166
+ [0.9.0]: https://github.com/roshan2004/molscope/compare/v0.8.3...v0.9.0
167
+ [0.8.3]: https://github.com/roshan2004/molscope/compare/v0.8.2...v0.8.3
168
+ [0.8.2]: https://github.com/roshan2004/molscope/compare/v0.8.1...v0.8.2
169
+ [0.8.1]: https://github.com/roshan2004/molscope/compare/v0.8.0...v0.8.1
170
+ [0.8.0]: https://github.com/roshan2004/molscope/compare/v0.7.0...v0.8.0
171
+ [0.7.0]: https://github.com/roshan2004/molscope/compare/v0.6.2...v0.7.0
172
+ [0.6.2]: https://github.com/roshan2004/molscope/compare/v0.6.1...v0.6.2
173
+ [0.6.1]: https://github.com/roshan2004/molscope/compare/v0.6.0...v0.6.1
174
+ [0.6.0]: https://github.com/roshan2004/molscope/releases/tag/v0.6.0
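Reviewer note on the 0.9.0 entry above: the changelog names MaxMin (farthest-first) selection but the diff shown here does not include the body of `molscope/library.py`. The sketch below illustrates the general technique only. The function names `standardize` and `maxmin_select`, the z-score scaling, and the choice of the first row as the seed are assumptions made for illustration and may not match what MolScope does.

```python
import math


def standardize(rows):
    """Z-score each column so descriptors on different scales are comparable.

    Assumption: MolScope's own scaling (if any) is not shown in this diff.
    """
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # A constant column has zero spread; fall back to 1.0 to avoid dividing by zero.
    stds = [
        math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
        for c, m in zip(cols, means)
    ]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


def maxmin_select(rows, n, seed_index=0):
    """Greedy farthest-first pick of up to n row indices (hypothetical helper)."""
    if not rows or n <= 0:
        return []
    points = standardize(rows)
    n = min(n, len(points))
    picked = [seed_index]
    # min_dist[i] is the distance from row i to its nearest picked row.
    min_dist = [math.dist(p, points[seed_index]) for p in points]
    min_dist[seed_index] = -1.0  # never pick the same row twice
    while len(picked) < n:
        # max() returns the first maximal index, so ties resolve deterministically.
        nxt = max(range(len(points)), key=lambda i: min_dist[i])
        picked.append(nxt)
        for i, p in enumerate(points):
            min_dist[i] = min(min_dist[i], math.dist(p, points[nxt]))
        min_dist[nxt] = -1.0
    return picked


# Columns stand in for MW and ALogP; the values are made up for the example.
table = [[100.0, 1.0], [110.0, 1.1], [500.0, 5.0], [300.0, 3.0]]
print(maxmin_select(table, 3))  # [0, 2, 3]
```

Each step adds the row whose nearest already-picked neighbour is farthest away, which is why the near-duplicate second row is chosen last. Keeping a running `min_dist` list makes each step linear in the number of rows.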
@@ -5,9 +5,19 @@ title: "MolScope: lightweight molecular structure analysis, visualisation, graph
5
5
  authors:
6
6
  - family-names: Shrestha
7
7
  given-names: Roshan
8
- version: 0.8.2
9
- date-released: "2026-05-28"
8
+ orcid: "https://orcid.org/0000-0002-9356-5136"
9
+ affiliation: "Independent Researcher"
10
+ version: 0.9.0
11
+ date-released: "2026-05-29"
10
12
  license: MIT
13
+ doi: 10.5281/zenodo.20433850
14
+ identifiers:
15
+ - type: doi
16
+ value: 10.5281/zenodo.20433850
17
+ description: Concept DOI (always resolves to the latest archived version)
18
+ - type: doi
19
+ value: 10.5281/zenodo.20433851
20
+ description: DOI for v0.8.3
11
21
  repository-code: "https://github.com/roshan2004/molscope"
12
22
  url: "https://github.com/roshan2004/molscope"
13
23
  abstract: >
@@ -1,4 +1,5 @@
1
1
  include CITATION.cff
2
+ include CHANGELOG.md
2
3
  include LICENSE
3
4
  include README.md
4
5
  include mkdocs.yml
@@ -1,6 +1,6 @@
1
1
  Metadata-Version: 2.4
2
2
  Name: molscope
3
- Version: 0.8.2
3
+ Version: 0.9.0
4
4
  Summary: Lightweight molecular coordinate workflows for descriptors, graph ML, and coarse-grained beads.
5
5
  Author-email: Roshan Shrestha <roshanpra@gmail.com>
6
6
  License-Expression: MIT
@@ -30,8 +30,12 @@ Provides-Extra: chem
30
30
  Requires-Dist: rdkit>=2023.9; extra == "chem"
31
31
  Provides-Extra: cif
32
32
  Requires-Dist: gemmi>=0.7; extra == "cif"
33
+ Provides-Extra: xlsx
34
+ Requires-Dist: openpyxl>=3.1; extra == "xlsx"
33
35
  Provides-Extra: gpu
34
36
  Requires-Dist: torch>=2.0; extra == "gpu"
37
+ Provides-Extra: mcp
38
+ Requires-Dist: mcp>=1.2; python_version >= "3.10" and extra == "mcp"
35
39
  Provides-Extra: validation
36
40
  Requires-Dist: mdanalysis>=2.7; extra == "validation"
37
41
  Requires-Dist: rdkit>=2023.9; extra == "validation"
@@ -57,6 +61,7 @@ Dynamic: license-file
57
61
  [![Python](https://img.shields.io/badge/python-3.9%20%7C%203.11%20%7C%203.13-blue)](pyproject.toml)
58
62
  [![License: MIT](https://img.shields.io/badge/license-MIT-yellow.svg)](LICENSE)
59
63
  [![Code style: Ruff](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/astral-sh/ruff/main/assets/badge/v2.json)](https://github.com/astral-sh/ruff)
64
+ [![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.20433850.svg)](https://doi.org/10.5281/zenodo.20433850)
60
65
 
61
66
  Lightweight molecular structure analysis, graph export, and coarse-graining in
62
67
  Python. MolScope is built around three polished workflows: turn coordinate
@@ -264,11 +269,17 @@ mol.select(chain="A") # one chain
264
269
  mol.select(element="C") # all carbons
265
270
  mol.select(resname="HOH") # waters
266
271
  mol.select(resid=(10, 20)) # an inclusive residue range
272
+ mol.select(resid=100, icode="A") # PDB/mmCIF insertion code
273
+ mol.residue_ids # full ResidueId(chain, resid, icode, resname)
267
274
  mol.alpha_carbons() # CA atoms (the usual basis for protein RMSD)
268
275
  mol.backbone() # N, CA, C, O
269
276
  mol[mask_or_indices] # subset by numpy mask / index array
270
277
  ```
271
278
 
279
+ Residue grouping, contact maps, binding sites and residue graphs use full
280
+ residue identity internally, so residues like `100A` and `100B` stay separate
281
+ while integer `resids` remain available for older code.
282
+
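Reviewer note: the added README lines describe a full residue identity of chain, residue number, insertion code and residue name. A minimal sketch of where those fields sit in a fixed-column PDB `ATOM` record follows. The `ResidueId` field names are taken from the comment in the hunk; the `residue_id_from_pdb_line` helper is hypothetical and is not MolScope's parser.

```python
from typing import NamedTuple


class ResidueId(NamedTuple):
    # Field order follows the README comment: (chain, resid, icode, resname).
    chain: str
    resid: int
    icode: str
    resname: str


def residue_id_from_pdb_line(line: str) -> ResidueId:
    """Read residue identity from one ATOM/HETATM record (hypothetical helper).

    PDB columns are 1-based: resName 18-20, chainID 22, resSeq 23-26, iCode 27.
    """
    return ResidueId(
        chain=line[21],
        resid=int(line[22:26]),
        icode=line[26].strip(),  # empty string when there is no insertion code
        resname=line[17:20].strip(),
    )


# Build the record field by field so the column positions stay explicit.
record = (
    "ATOM  " + "    1" + " " + " CA " + " " + "GLY" + " " + "A" + " 100" + "A"
    + "   " + "  11.104  13.207   2.100" + "  1.00 20.00"
)
print(residue_id_from_pdb_line(record))
```

Comparing whole `ResidueId` tuples rather than the integer `resid` alone is what keeps `100A` and `100B` distinct when atoms are grouped by residue.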
272
283
  ### Analysis and measurements
273
284
 
274
285
  ```python
@@ -353,10 +364,11 @@ Codes follow DSSP notation: `H`/`G`/`I` helices, `E`/`B` strands, `T` turn, `S`
353
364
  bend, `-` coil. This is a simplified **educational** implementation of the
354
365
  Kabsch-Sander hydrogen-bond model: not bit-identical to the reference `mkdssp`
355
366
  on every edge case, but validated against it. A CI cross-check
356
- (`tests/validation`) puts it at **~99% per-residue 3-state agreement**
357
- (helix/strand/coil) with `mkdssp` 4.2.2 on the bundled aquaporin (`1fqy`);
358
- strand-rich folds, where reference DSSP is hardest to match, will agree less
359
- closely. It needs backbone N/CA/C/O atoms, so use PDB/mmCIF input (not a bare
367
+ (`tests/validation`) spans three fold classes and reports **~98 to 99%
368
+ per-residue 3-state agreement** (helix/strand/coil) with `mkdssp`: 99.1% on the
369
+ helical aquaporin (`1fqy`), 100% on mixed alpha/beta ubiquitin (`1ubq`), and
370
+ 98.2% on the all-beta SH3 domain (`1shg`). It needs backbone N/CA/C/O atoms, so
371
+ use PDB/mmCIF input (not a bare
360
372
  `.xyz`). The secondary-structure render in the showcase above (helices red,
361
373
  turns cyan, coil grey) is produced this way.
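Reviewer note: the agreement figures quoted in this hunk are per-residue 3-state numbers. The sketch below shows one common way such a figure is computed, reducing the DSSP codes listed above (`H`/`G`/`I` to helix, `E`/`B` to strand, everything else to coil) before comparing. The function names are hypothetical, and the exact reduction used in `tests/validation` is not visible in this diff, so treat the mapping as an assumption.

```python
HELIX_CODES = set("HGI")
STRAND_CODES = set("EB")


def to_three_state(codes: str) -> str:
    """Collapse DSSP 8-state codes to H (helix), E (strand) or C (coil)."""
    return "".join(
        "H" if c in HELIX_CODES else "E" if c in STRAND_CODES else "C"
        for c in codes
    )


def three_state_agreement(predicted: str, reference: str) -> float:
    """Fraction of residues whose reduced states match (hypothetical helper)."""
    if len(predicted) != len(reference):
        raise ValueError("assignments must cover the same residues")
    if not predicted:
        raise ValueError("no residues to compare")
    a, b = to_three_state(predicted), to_three_state(reference)
    return sum(x == y for x, y in zip(a, b)) / len(a)


# Toy strings, not real output for any bundled structure.
print(three_state_agreement("HHEE", "HHE-"))  # 0.75
```

A validation test can then compare the returned fraction with a floor; the changelog in this release mentions a 3-state floor of 0.95.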
362
374
 
@@ -513,7 +525,8 @@ inspection and reuse, not a validated simulation topology.
513
525
 
514
526
  ## Command-line interface
515
527
 
516
- MolScope provides a powerful CLI for visualization, batch analysis, and ML graph export.
528
+ MolScope provides a CLI for visualization, binding-site tables, batch analysis,
529
+ and ML graph export.
517
530
 
518
531
  ### View (default)
519
532
  Visualize a structure, apply transformations, and save images or animations.
@@ -529,6 +542,14 @@ Batch compute molecular descriptors for many files and save to a CSV table.
529
542
  molscope analyze examples/data/*.pdb --out results.csv --preset native-3d --jobs 4
530
543
  ```
531
544
 
545
+ ### Binding sites
546
+ Write protein-ligand binding-site residue contacts and optional pocket
547
+ descriptors to CSV.
548
+ ```bash
549
+ molscope binding-site examples/data/3ptb.pdb --out site.csv --cutoff 4.5
550
+ molscope binding-site examples/data/3ptb.pdb --out site.csv --descriptors-out pocket.csv
551
+ ```
552
+
532
553
  ### Export
533
554
  Batch export molecular graphs to PyTorch Geometric, DGL, or NetworkX formats.
534
555
  ```bash
@@ -536,6 +557,53 @@ molscope export "data/*.cif" --to pyg --out-dir pyg_graphs/ --pe laplacian --job
536
557
  ```
537
558
  Supports advanced features like `--self-loops`, `--global-node`, and `--pe` (positional encodings).
538
559
 
560
+ ### Select
561
+ Pick a diverse subset from a molecule table (`.csv` or `.xlsx`) by MaxMin
562
+ selection over descriptors. Select on existing numeric columns, or compute
563
+ descriptors from a SMILES column with RDKit.
564
+ ```bash
565
+ # diverse pick on existing property columns (no extras needed for CSV)
566
+ molscope select molecules.csv --descriptor-cols MW ALogP -n 100 --out picked.csv
567
+
568
+ # or compute RDKit descriptors from SMILES first (needs the chem extra;
569
+ # .xlsx input needs the xlsx extra)
570
+ molscope select molecules.xlsx --smiles-col SMILES --compute-descriptors -n 100 --out picked.csv
571
+ ```
572
+ `MolLogP` (RDKit's Crippen logP) is the ALogP equivalent. Rows with missing
573
+ descriptors are skipped; selection is deterministic.
574
+
575
+ ## Use from an AI assistant (MCP)
576
+
577
+ MolScope ships an optional [Model Context Protocol](https://modelcontextprotocol.io)
578
+ server, so an AI assistant such as Claude Code or Claude Desktop can drive its
579
+ analyses in natural language. The server exposes MolScope's existing features as
580
+ MCP tools; it adds no new science, just an adapter layer over the public API.
581
+
582
+ ```bash
583
+ pip install "molscope[mcp]" # needs Python >= 3.10
584
+ claude mcp add molscope -- molscope-mcp # register with Claude Code
585
+ ```
586
+
587
+ The server runs over stdio (`molscope-mcp`, or `python -m molscope.mcp_server`)
588
+ and provides these tools:
589
+
590
+ | Tool | What it does |
591
+ | --- | --- |
592
+ | `summarize_structure` | Load a file or PDB id and report atoms, formula, chains, size. |
593
+ | `compute_descriptors` | Descriptor table across one or more structures (the batch tool). |
594
+ | `secondary_structure` | Per-residue DSSP codes and helix/strand/coil composition. |
595
+ | `contact_map` | Contact count, contact order, and labelled contacting pairs. |
596
+ | `binding_site` | Protein residues around a ligand, closest first. |
597
+ | `molecular_graph` | Node/edge counts and feature names for the ML graph. |
598
+ | `coarse_grain` | Bead assignment statistics for a coarse-grained mapping. |
599
+ | `render_structure`, `render_contact_map` | Return PNG figures. |
600
+
601
+ Each tool takes a `source` that is either a local coordinate-file path or a
602
+ 4-character PDB id (fetched from RCSB). For example, you can ask the assistant to
603
+ *"fetch trypsin (3ptb), find the benzamidine binding-site residues, and render a
604
+ contact map"*, and it will call `binding_site` then `render_contact_map`. See
605
+ [`docs/user-guide/mcp-server.md`](docs/user-guide/mcp-server.md) for the full
606
+ guide.
539
607
 
540
608
  ## Sample structures
541
609
 
@@ -544,6 +612,7 @@ Supports advanced features like `--self-loops`, `--global-node`, and `--pe` (pos
544
612
  | `examples/data/helix_201.xyz` | a helix (bare coordinates) |
545
613
  | `examples/data/1fqy.pdb` | Aquaporin-1, single model (1661 atoms) |
546
614
  | `examples/data/1aml.pdb` | Alzheimer amyloid A4 peptide, 20-model NMR ensemble |
615
+ | `examples/data/3ptb.pdb` | Trypsin-benzamidine complex, ligand-binding-site example |
547
616
 
548
617
  ## Notes
549
618
 
@@ -582,6 +651,20 @@ that run everywhere, plus cross-checks against reference scientific tools (the
582
651
  simplified DSSP vs `mkdssp`) that turn "the tests pass" into a measured
583
652
  agreement number.
584
653
 
654
+ ## Changelog
655
+
656
+ Notable changes for each release are recorded in [`CHANGELOG.md`](CHANGELOG.md),
657
+ following the [Keep a Changelog](https://keepachangelog.com/) format.
658
+
659
+ ## How to cite
660
+
661
+ Each release of MolScope is archived on Zenodo with a citable DOI. The concept
662
+ DOI [10.5281/zenodo.20433850](https://doi.org/10.5281/zenodo.20433850) always
663
+ resolves to the latest version; each release also has its own version DOI.
664
+ Machine-readable citation metadata lives in [`CITATION.cff`](CITATION.cff), so
665
+ GitHub's "Cite this repository" button on the sidebar produces BibTeX and APA
666
+ entries automatically.
667
+
585
668
  ## License
586
669
 
587
670
  [MIT](LICENSE)
@@ -7,6 +7,7 @@
7
7
  [![Python](https://img.shields.io/badge/python-3.9%20%7C%203.11%20%7C%203.13-blue)](pyproject.toml)
8
8
  [![License: MIT](https://img.shields.io/badge/license-MIT-yellow.svg)](LICENSE)
9
9
  [![Code style: Ruff](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/astral-sh/ruff/main/assets/badge/v2.json)](https://github.com/astral-sh/ruff)
10
+ [![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.20433850.svg)](https://doi.org/10.5281/zenodo.20433850)
10
11
 
11
12
  Lightweight molecular structure analysis, graph export, and coarse-graining in
12
13
  Python. MolScope is built around three polished workflows: turn coordinate
@@ -214,11 +215,17 @@ mol.select(chain="A") # one chain
214
215
  mol.select(element="C") # all carbons
215
216
  mol.select(resname="HOH") # waters
216
217
  mol.select(resid=(10, 20)) # an inclusive residue range
218
+ mol.select(resid=100, icode="A") # PDB/mmCIF insertion code
219
+ mol.residue_ids # full ResidueId(chain, resid, icode, resname)
217
220
  mol.alpha_carbons() # CA atoms (the usual basis for protein RMSD)
218
221
  mol.backbone() # N, CA, C, O
219
222
  mol[mask_or_indices] # subset by numpy mask / index array
220
223
  ```
221
224
 
225
+ Residue grouping, contact maps, binding sites and residue graphs use full
226
+ residue identity internally, so residues like `100A` and `100B` stay separate
227
+ while integer `resids` remain available for older code.
228
+
222
229
  ### Analysis and measurements
223
230
 
224
231
  ```python
@@ -303,10 +310,11 @@ Codes follow DSSP notation: `H`/`G`/`I` helices, `E`/`B` strands, `T` turn, `S`
303
310
  bend, `-` coil. This is a simplified **educational** implementation of the
304
311
  Kabsch-Sander hydrogen-bond model: not bit-identical to the reference `mkdssp`
305
312
  on every edge case, but validated against it. A CI cross-check
306
- (`tests/validation`) puts it at **~99% per-residue 3-state agreement**
307
- (helix/strand/coil) with `mkdssp` 4.2.2 on the bundled aquaporin (`1fqy`);
308
- strand-rich folds, where reference DSSP is hardest to match, will agree less
309
- closely. It needs backbone N/CA/C/O atoms, so use PDB/mmCIF input (not a bare
313
+ (`tests/validation`) spans three fold classes and reports **~98 to 99%
314
+ per-residue 3-state agreement** (helix/strand/coil) with `mkdssp`: 99.1% on the
315
+ helical aquaporin (`1fqy`), 100% on mixed alpha/beta ubiquitin (`1ubq`), and
316
+ 98.2% on the all-beta SH3 domain (`1shg`). It needs backbone N/CA/C/O atoms, so
317
+ use PDB/mmCIF input (not a bare
310
318
  `.xyz`). The secondary-structure render in the showcase above (helices red,
311
319
  turns cyan, coil grey) is produced this way.
312
320
 
@@ -463,7 +471,8 @@ inspection and reuse, not a validated simulation topology.
463
471
 
464
472
  ## Command-line interface
465
473
 
466
- MolScope provides a powerful CLI for visualization, batch analysis, and ML graph export.
474
+ MolScope provides a CLI for visualization, binding-site tables, batch analysis,
475
+ and ML graph export.
467
476
 
468
477
  ### View (default)
469
478
  Visualize a structure, apply transformations, and save images or animations.
@@ -479,6 +488,14 @@ Batch compute molecular descriptors for many files and save to a CSV table.
479
488
  molscope analyze examples/data/*.pdb --out results.csv --preset native-3d --jobs 4
480
489
  ```
481
490
 
491
+ ### Binding sites
492
+ Write protein-ligand binding-site residue contacts and optional pocket
493
+ descriptors to CSV.
494
+ ```bash
495
+ molscope binding-site examples/data/3ptb.pdb --out site.csv --cutoff 4.5
496
+ molscope binding-site examples/data/3ptb.pdb --out site.csv --descriptors-out pocket.csv
497
+ ```
498
+
482
499
  ### Export
483
500
  Batch export molecular graphs to PyTorch Geometric, DGL, or NetworkX formats.
484
501
  ```bash
@@ -486,6 +503,53 @@ molscope export "data/*.cif" --to pyg --out-dir pyg_graphs/ --pe laplacian --job
486
503
  ```
487
504
  Supports advanced features like `--self-loops`, `--global-node`, and `--pe` (positional encodings).
488
505
 
506
+ ### Select
507
+ Pick a diverse subset from a molecule table (`.csv` or `.xlsx`) by MaxMin
508
+ selection over descriptors. Select on existing numeric columns, or compute
509
+ descriptors from a SMILES column with RDKit.
510
+ ```bash
511
+ # diverse pick on existing property columns (no extras needed for CSV)
512
+ molscope select molecules.csv --descriptor-cols MW ALogP -n 100 --out picked.csv
513
+
514
+ # or compute RDKit descriptors from SMILES first (needs the chem extra;
515
+ # .xlsx input needs the xlsx extra)
516
+ molscope select molecules.xlsx --smiles-col SMILES --compute-descriptors -n 100 --out picked.csv
517
+ ```
518
+ `MolLogP` (RDKit's Crippen logP) is the ALogP equivalent. Rows with missing
519
+ descriptors are skipped; selection is deterministic.
520
+
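For readers unfamiliar with MaxMin selection, the following is a minimal pure-Python sketch of the greedy idea. It is an illustration under stated assumptions, not MolScope's implementation: the function name `maxmin_pick`, the Euclidean metric, the choice of the first row as the seed, and the sample rows are all hypothetical.

```python
import math


def maxmin_pick(points, n, first=0):
    """Greedy MaxMin: repeatedly add the point farthest from the picked set."""
    if not points or n <= 0:
        return []
    n = min(n, len(points))
    picked = [first]
    remaining = [i for i in range(len(points)) if i != first]
    # Distance from every point to its nearest already-picked point.
    nearest = [math.dist(p, points[first]) for p in points]
    while len(picked) < n:
        # max() returns the first maximum, so ties go to the lowest index
        # and the result is deterministic for a given input order.
        best = max(remaining, key=lambda i: nearest[i])
        picked.append(best)
        remaining.remove(best)
        for i in remaining:
            nearest[i] = min(nearest[i], math.dist(points[i], points[best]))
    return picked


# Made-up two-descriptor rows; real descriptors usually need scaling first.
rows = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (0.0, 10.0), (5.0, 5.0)]
print(maxmin_pick(rows, 3))
```

Each step costs one pass over the remaining rows, so picking `n` items from `N` rows is roughly `O(n * N)` distance evaluations. Descriptors on very different scales (for example molecular weight versus logP) would normally be standardized before computing distances.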
521
+ ## Use from an AI assistant (MCP)
522
+
523
+ MolScope ships an optional [Model Context Protocol](https://modelcontextprotocol.io)
524
+ server, so an AI assistant such as Claude Code or Claude Desktop can drive its
525
+ analyses in natural language. The server exposes MolScope's existing features as
526
+ MCP tools; it adds no new science, just an adapter layer over the public API.
527
+
528
+ ```bash
529
+ pip install "molscope[mcp]" # needs Python >= 3.10
530
+ claude mcp add molscope -- molscope-mcp # register with Claude Code
531
+ ```
532
+
533
+ The server runs over stdio (`molscope-mcp`, or `python -m molscope.mcp_server`)
534
+ and provides these tools:
535
+
536
+ | Tool | What it does |
537
+ | --- | --- |
538
+ | `summarize_structure` | Load a file or PDB id and report atoms, formula, chains, size. |
539
+ | `compute_descriptors` | Descriptor table across one or more structures (the batch tool). |
540
+ | `secondary_structure` | Per-residue DSSP codes and helix/strand/coil composition. |
541
+ | `contact_map` | Contact count, contact order, and labelled contacting pairs. |
542
+ | `binding_site` | Protein residues around a ligand, closest first. |
543
+ | `molecular_graph` | Node/edge counts and feature names for the ML graph. |
544
+ | `coarse_grain` | Bead assignment statistics for a coarse-grained mapping. |
545
+ | `render_structure`, `render_contact_map` | Return PNG figures. |
546
+
547
+ Each tool takes a `source` that is either a local coordinate-file path or a
548
+ 4-character PDB id (fetched from RCSB). For example, you can ask the assistant to
549
+ *"fetch trypsin (3ptb), find the benzamidine binding-site residues, and render a
550
+ contact map"*, and it will call `binding_site` then `render_contact_map`. See
551
+ [`docs/user-guide/mcp-server.md`](docs/user-guide/mcp-server.md) for the full
552
+ guide.
489
553
 
490
554
  ## Sample structures
491
555
 
@@ -494,6 +558,7 @@ Supports advanced features like `--self-loops`, `--global-node`, and `--pe` (pos
494
558
  | `examples/data/helix_201.xyz` | a helix (bare coordinates) |
495
559
  | `examples/data/1fqy.pdb` | Aquaporin-1, single model (1661 atoms) |
496
560
  | `examples/data/1aml.pdb` | Alzheimer amyloid A4 peptide, 20-model NMR ensemble |
561
+ | `examples/data/3ptb.pdb` | Trypsin-benzamidine complex, ligand-binding-site example |
497
562
 
498
563
  ## Notes
499
564
 
@@ -532,6 +597,20 @@ that run everywhere, plus cross-checks against reference scientific tools (the
532
597
  simplified DSSP vs `mkdssp`) that turn "the tests pass" into a measured
533
598
  agreement number.
534
599
 
600
+ ## Changelog
601
+
602
+ Notable changes for each release are recorded in [`CHANGELOG.md`](CHANGELOG.md),
603
+ following the [Keep a Changelog](https://keepachangelog.com/) format.
604
+
605
+ ## How to cite
606
+
607
+ Each release of MolScope is archived on Zenodo with a citable DOI. The concept
608
+ DOI [10.5281/zenodo.20433850](https://doi.org/10.5281/zenodo.20433850) always
609
+ resolves to the latest version; each release also has its own version DOI.
610
+ Machine-readable citation metadata lives in [`CITATION.cff`](CITATION.cff), so
611
+ GitHub's "Cite this repository" button on the sidebar produces BibTeX and APA
612
+ entries automatically.
613
+
535
614
  ## License
536
615
 
537
616
  [MIT](LICENSE)
@@ -9,12 +9,22 @@
9
9
  - `molscope.write_pdb(molecule, path)`, `write_xyz(molecule, path)`.
10
10
  - `molscope.featurize_many(paths, return_names=False)`: build an ML feature matrix.
11
11
  - `molscope.descriptor_feature_names(preset)`: stable flattened descriptor columns.
12
+ - `molscope.pocket_descriptor_feature_names("pocket-basic")`: stable binding-pocket descriptor columns.
12
13
  - `molscope.node_feature_names(preset)`, `edge_feature_names(preset)`: atom/bond graph preset columns.
13
14
  - `molscope.residue_node_feature_names(preset)`, `residue_edge_feature_names(preset)`: residue contact graph preset columns.
14
15
  - `molscope.interface_residues(mol, chain_a, chain_b, cutoff=5.0)`, `chain_contact_matrix(mol, cutoff=5.0)`: chain interfaces.
15
16
  - `molscope.ligands(mol, ...)`, `binding_site(mol, ligand=None, cutoff=4.5)`: ligand detection and binding-site residues.
16
17
  - `molscope.backbone_torsions(mol)`: per-residue phi/psi/omega.
17
18
 
19
+ Residue identity helpers:
20
+
21
+ - `molscope.ResidueId(chain, resid, insertion_code="", resname="")`: full residue identity used by PDB/mmCIF-aware APIs.
22
+ - `molscope.ResidueGroup`: yielded by `Molecule.residue_groups()`; has `.residue_id` and still unpacks as `(atom_indices, resname, resid, chain)`.
23
+
24
+ `BindingSite` results expose `to_records()`, `to_molecule(mol)`,
25
+ `descriptors(mol, preset="pocket-basic")`, and `plot(mol)` for residue tables,
26
+ pocket descriptor extraction, and quick figures.
27
+
18
28
  ## Molecule
19
29
 
20
30
  Construction:
@@ -12,7 +12,7 @@ mol = ms.read("examples/data/3ptb.pdb")
12
12
  print(mol.ligands()) # [LigandResidue(A:BEN1, 9 atoms)]
13
13
 
14
14
  site = mol.binding_site(cutoff=4.5) # single ligand auto-detected
15
- print(site) # BindingSite(BEN1: 13 residues < 4.5 A)
15
+ print(site) # BindingSite(A:BEN1: 13 residues < 4.5 A)
16
16
 
17
17
  for res, dist in zip(site.residues, site.min_distances):
18
18
  print(f"{res!s:<10} {dist:.2f} A")
@@ -23,13 +23,39 @@ for res, dist in zip(site.residues, site.min_distances):
23
23
  # A:SER195 3.65 <- catalytic serine
24
24
  ```
25
25
 
26
+ For quick figures or reports, convert the site to table-friendly residue
27
+ records and extract descriptors for only the site residues:
28
+
29
+ ```python
30
+ site.to_records()[0]
31
+ # {'residue_id': 'A:GLY219', 'chain': 'A', 'resid': 219,
32
+ # 'insertion_code': '', 'resname': 'GLY',
33
+ # 'min_distance': 2.815..., 'n_atom_contacts': 5}
34
+
35
+ site.descriptors(mol, preset="pocket-basic")
36
+ site.plot(mol, show=False) # pocket residues plus ligand
37
+ ```
38
+
39
+ The same residue table is available from the command line:
40
+
41
+ ```bash
42
+ molscope binding-site examples/data/3ptb.pdb --out site.csv --cutoff 4.5
43
+ ```
44
+
45
+ Add `--descriptors-out pocket.csv` to also write the one-row
46
+ `pocket-basic` descriptor table.
47
+
26
48
  When a structure has several ligands, select one by residue name or location:
27
49
 
28
50
  ```python
29
51
  mol.binding_site(ligand="BEN")
30
52
  mol.binding_site(ligand=("A", 1))
53
+ mol.binding_site(ligand=("A", 100, "A")) # with insertion code
31
54
  ```
32
55
 
56
+ The CLI accepts the same choices as `--ligand BEN`, `--ligand A:1`, or
57
+ `--ligand A:100:A`.
58
+
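The three `--ligand` forms map naturally onto the Python arguments shown above. The sketch below illustrates that mapping; `parse_ligand_spec` is a hypothetical helper written for this example, not MolScope's actual CLI parser.

```python
def parse_ligand_spec(spec):
    """Turn 'BEN', 'A:1' or 'A:100:A' into a resname or a location tuple."""
    parts = spec.split(":")
    if len(parts) == 1:
        return parts[0]                      # residue name, e.g. "BEN"
    if len(parts) == 2:
        chain, resid = parts
        return (chain, int(resid))           # chain and residue number
    if len(parts) == 3:
        chain, resid, icode = parts
        return (chain, int(resid), icode)    # plus insertion code
    raise ValueError(f"unrecognized ligand spec: {spec!r}")


for spec in ("BEN", "A:1", "A:100:A"):
    print(spec, "->", parse_ligand_spec(spec))
```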
33
59
  A runnable version lives in
34
60
  [`examples/binding_site.py`](https://github.com/roshan2004/molscope/blob/main/examples/binding_site.py).
35
61
  See the full guide:
@@ -122,17 +122,21 @@ style assignment based on backbone hydrogen-bond patterns.
122
122
  What is validated:
123
123
 
124
124
  - The validation suite compares MolScope's simplified DSSP against `mkdssp`
125
- where the reference binary is available.
126
- - On the bundled Aquaporin-1 structure (`1fqy`), CI records about 99%
127
- agreement after reducing DSSP states to helix/strand/coil.
125
+ across three fold classes where the reference binary is available:
126
+ Aquaporin-1 (`1fqy`, helix-dominated), ubiquitin (`1ubq`, mixed alpha/beta),
127
+ and the SH3 domain (`1shg`, all-beta).
128
+ - After reducing DSSP states to helix/strand/coil, agreement is high across
129
+ all three folds: about 99% on the helical structure, 100% on the mixed one, and about
130
+ 98% on the all-beta one.
128
131
 
129
132
  Limits:
130
133
 
131
134
  - Not bit-identical to reference `mkdssp`. Treat output as the
132
135
  educational/prototyping equivalent of DSSP, not as a substitute for it in
133
136
  production pipelines.
134
- - Strand-rich and edge-case folds can disagree more strongly than helical
135
- test structures.
137
+ - Disagreements concentrate at the boundary residues of helices and strands,
138
+ and on irregular or low-quality structures, rather than on any one fold
139
+ class.
136
140
  - Needs backbone N/CA/C/O atoms, so bare XYZ input is insufficient.
137
141
  - Only the standard collapsed states (H/E/C) are validated against the
138
142
  reference; finer DSSP categories are not.
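As an illustration of what "per-residue 3-state agreement" measures, here is a minimal sketch that does not call MolScope or `mkdssp`. The reduction table follows the code groups given in the README (`H`/`G`/`I` helix, `E`/`B` strand, everything else coil); the function names and the two example strings are hypothetical, and the validation suite's exact reduction may differ.

```python
# Assumed reduction: H/G/I -> helix, E/B -> strand, anything else -> coil.
THREE_STATE = {"H": "H", "G": "H", "I": "H", "E": "E", "B": "E"}


def reduce_to_three_state(codes):
    """Collapse per-residue DSSP codes to H (helix), E (strand), C (coil)."""
    return "".join(THREE_STATE.get(code, "C") for code in codes)


def three_state_agreement(predicted, reference):
    """Fraction of residues whose collapsed state matches the reference."""
    if len(predicted) != len(reference):
        raise ValueError("both assignments must cover the same residues")
    if not predicted:
        raise ValueError("no residues to compare")
    a = reduce_to_three_state(predicted)
    b = reduce_to_three_state(reference)
    return sum(x == y for x, y in zip(a, b)) / len(a)


# Made-up 14-residue assignments that differ at one strand boundary.
predicted = "-HHHHHTT-EEEE-"
reference = "-HHHHGTTSEEB--"
print(f"{three_state_agreement(predicted, reference):.3f}")
```

Because `G` collapses to helix and `T`/`S` collapse to coil, the only disagreement in this example is the last strand residue, which is the kind of boundary difference described in the limits above.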
@@ -57,6 +57,20 @@ Preset options:
57
57
  - `native-3d`: `native-basic` plus centres, inertia, principal axes/moments, and distance histograms.
58
58
  - `rdkit-basic`: `native-basic` plus a stable subset of RDKit scalar descriptors.
59
59
 
60
+ Ligand binding sites have their own fixed-size preset because they need a
61
+ ligand context:
62
+
63
+ ```python
64
+ mol = ms.read("examples/data/3ptb.pdb")
65
+ site = mol.binding_site(cutoff=4.5)
66
+ pocket = site.descriptors(mol, preset="pocket-basic")
67
+ names = ms.pocket_descriptor_feature_names("pocket-basic")
68
+ ```
69
+
70
+ `pocket-basic` includes pocket atom and residue counts, amino-acid composition,
71
+ protein-ligand contact counts, radius of gyration, bounding-box dimensions, and
72
+ ligand-distance summaries.
73
+
60
74
  ## RDKit descriptors
61
75
 
62
76
  Install the optional chemical backend to access RDKit's scalar descriptor set: