sawnergy-1.0.6-py3-none-any.whl → sawnergy-1.0.7-py3-none-any.whl

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Potentially problematic release.


This version of sawnergy might be problematic.

@@ -1,6 +1,6 @@
1
1
  Metadata-Version: 2.4
2
2
  Name: sawnergy
3
- Version: 1.0.6
3
+ Version: 1.0.7
4
4
  Summary: Toolkit for transforming molecular dynamics (MD) trajectories into rich graph representations
5
5
  Home-page: https://github.com/Yehor-Mishchyriak/SAWNERGY
6
6
  Author: Yehor Mishchyriak
@@ -39,18 +39,44 @@ Dynamic: summary
39
39
  ![Python](https://img.shields.io/badge/python-3.11%2B-blue)
40
40
 
41
41
  A toolkit for transforming molecular dynamics (MD) trajectories into rich graph representations, sampling
42
- random and self-avoiding walks, learning node embeddings, and visualising residue interaction networks (RINs). SAWNERGY
42
+ random and self-avoiding walks, learning node embeddings, and visualizing residue interaction networks (RINs). SAWNERGY
43
43
  keeps the full workflow — from `cpptraj` output to skip-gram embeddings (node2vec approach) — inside Python, backed by efficient Zarr-based archives and optional GPU acceleration.
44
44
 
45
45
  ---
46
46
 
47
+ ## Installation
48
+
49
+ ```bash
50
+ pip install sawnergy
51
+ ```
52
+
53
+ > **Optional:** For GPU training, install PyTorch separately (e.g., `pip install torch`).
54
+ > **Note:** RIN building requires `cpptraj` (AmberTools). Ensure it is discoverable via `$PATH` or the `CPPTRAJ`
55
+ > environment variable. The simplest route is usually to install AmberTools via conda and activate that environment; SAWNERGY will then locate the `cpptraj` executable on its own.
56
+
57
+ ---
58
+
59
+ # UPDATES:
60
+
61
+ ## v1.0.7 — What’s new:
62
+ - **Added plain SkipGram model**
63
+ - Users can now choose between the negative sampling technique (two binary classifiers) and a single classifier over the vocabulary (full softmax). For more detail, see: [node2vec](https://arxiv.org/pdf/1607.00653), [word2vec](https://arxiv.org/pdf/1301.3781), and [negative_sampling](https://arxiv.org/pdf/1402.3722).
64
+ - **Stricter default for pruning low interaction energies during RIN construction**
65
+ - The default now zeroes out the lowest 85% of interaction energies (previously 30%), which should lead to more meaningful embeddings.
66
+ - **BUG FIX: Visualizer**
67
+ - Previously, the visualizer silently drew zero-magnitude edges: they were rendered but invisible because of full transparency and zero width, which made the displayed image or animation very laggy. This is now fixed, and with the higher pruning default the displayed interaction networks stay clean and smooth under rotation, dragging, etc.
68
+ - **New Embedding Visualizer (3D)**
69
+ - New lightweight viewer for per-frame embeddings that projects embeddings with PCA to a **3D** scatter. Supports the same node coloring semantics, optional node labels, and the same antialiasing/depthshade controls. Works in headless setups using the same backend guard and uses a blocking `show=True` for scripts.
70
+
71
+ ---
72
+
47
73
  ## Why SAWNERGY?
48
74
 
49
75
  - **Bridge simulations and graph ML**: Convert raw MD trajectories into residue interaction networks ready for graph
50
76
  algorithms and downstream machine learning tasks.
51
- - **Deterministic, shareable artefacts**: Every stage produces compressed Zarr archives that contain both data and metadata so runs can be reproduced, shared, or inspected later.
52
- - **High-performance data handling**: Heavy arrays live in shared memory during walk sampling to allow parallel processing without serealization overhead; archives are written in chunked, compressed form for fast read/write.
53
- - **Flexible embedding backends**: Train skip-gram with negative sampling (SGNS) models using either PureML or PyTorch.
77
+ - **Deterministic, shareable artifacts**: Every stage produces compressed Zarr archives that contain both data and metadata so runs can be reproduced, shared, or inspected later.
78
+ - **High-performance data handling**: Heavy arrays live in shared memory during walk sampling to allow parallel processing without serialization overhead; archives are written in chunked, compressed form for fast read/write.
79
+ - **Flexible objectives & backends**: Train Skip-Gram with **negative sampling** (`objective="sgns"`) or **plain Skip-Gram** (`objective="sg"`), using either **PureML** (default) or **PyTorch**.
54
80
  - **Visualization out of the box**: Plot and animate residue networks without leaving Python, using the data produced by RINBuilder
55
81
 
56
82
  ---
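
The two objectives named in this hunk (`objective="sgns"` and `objective="sg"`) can be compared with a minimal NumPy sketch of their per-pair losses. Plain Skip-Gram normalizes over the whole vocabulary, while SGNS scores one positive pair against a few sampled negatives. The function names, array shapes, and toy values below are illustrative assumptions and do not mirror the code in `SGNS_pml.py` or `SGNS_torch.py`.

```python
import numpy as np

def sg_loss(W_in, W_out, center, context):
    """Plain Skip-Gram: cross-entropy of a softmax over the whole vocabulary."""
    logits = W_out @ W_in[center]            # (V,) one score per candidate context node
    logits = logits - logits.max()           # shift for numerical stability
    log_probs = logits - np.log(np.exp(logits).sum())
    return -log_probs[context]

def sgns_loss(W_in, W_out, center, context, negatives):
    """SGNS: logistic loss on one positive pair plus k sampled negative pairs."""
    def log_sigmoid(x):
        return -np.logaddexp(0.0, -x)        # stable log(sigmoid(x))
    v = W_in[center]
    pos = log_sigmoid(W_out[context] @ v)
    neg = log_sigmoid(-(W_out[negatives] @ v)).sum()
    return -(pos + neg)

# Toy setup (assumed sizes): 6 nodes, 4-dimensional embeddings.
rng = np.random.default_rng(0)
V, D = 6, 4
W_in = rng.normal(scale=0.1, size=(V, D))
W_out = rng.normal(scale=0.1, size=(V, D))
print(sg_loss(W_in, W_out, center=0, context=3))
print(sgns_loss(W_in, W_out, center=0, context=3, negatives=np.array([1, 4])))
```

The softmax version touches every row of `W_out` per pair, whereas the sampled version touches only `1 + k` rows, which is the usual reason for preferring SGNS on large vocabularies.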
@@ -91,9 +117,9 @@ node indexing, and RNG seeds stay consistent across the toolchain.
91
117
  * Wraps the AmberTools `cpptraj` executable to:
92
118
  - compute per-frame electrostatic (EMAP) and van der Waals (VMAP) energy matrices at the atomic level,
93
119
  - project atom–atom interactions to residue–residue interactions using compositional masks,
94
- - prune, symmetrise, remove self-interactions, and L1-normalise the matrices,
95
- - compute per-residue centres of mass (COM) over the same frames.
96
- * Outputs a compressed Zarr archive with transition matrices, optional prenormalised energies, COM snapshots, and rich
120
+ - prune, symmetrize, remove self-interactions, and L1-normalise the matrices,
121
+ - compute per-residue centers of mass (COM) over the same frames.
122
+ * Outputs a compressed Zarr archive with transition matrices, optional pre-normalized energies, COM snapshots, and rich
97
123
  metadata (frame range, pruning quantile, molecule ID, etc.).
98
124
  * Supports parallel `cpptraj` execution, batch processing, and keeps temporary stores tidy via
99
125
  `ArrayStorage.compress_and_cleanup`.
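
The post-processing steps listed above (prune, symmetrize, remove self-interactions, L1-normalise) can be sketched on a single residue-level matrix as follows. This is a toy illustration under stated assumptions: it prunes with one global quantile over absolute values, and `energies_to_transitions` is a hypothetical helper name. The package's `rin_builder` may order or implement these steps differently.

```python
import numpy as np

def energies_to_transitions(E, prune_frac=0.85):
    """Toy post-processing of one (N, N) interaction-energy matrix."""
    A = np.abs(np.asarray(E, dtype=np.float64))
    # 1) prune: zero every entry below the prune_frac quantile of magnitudes
    threshold = np.quantile(A, prune_frac)
    A = np.where(A >= threshold, A, 0.0)
    # 2) symmetrize
    A = 0.5 * (A + A.T)
    # 3) remove self-interactions
    np.fill_diagonal(A, 0.0)
    # 4) row-wise L1 normalisation; rows that were pruned to all zeros stay zero
    row_sums = A.sum(axis=1, keepdims=True)
    return np.divide(A, row_sums, out=np.zeros_like(A), where=row_sums > 0)

rng = np.random.default_rng(1)
P = energies_to_transitions(rng.normal(size=(8, 8)), prune_frac=0.85)
print(P.sum(axis=1))  # each row sums to 1, or to 0 if nothing survived pruning
```

Each non-empty row of the result can be read as a transition distribution for walk sampling.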
@@ -103,7 +129,7 @@ node indexing, and RNG seeds stay consistent across the toolchain.
103
129
  * Opens RIN archives, resolves dataset names from attributes, and renders nodes plus attractive/repulsive edge bundles
104
130
  in 3D using Matplotlib.
105
131
  * Allows both static frame visualization and trajectory animation.
106
- * Handles backend selection (`Agg` fallback in headless environments) and offers convenient colour palettes via
132
+ * Handles backend selection (`Agg` fallback in headless environments) and offers convenient color palettes via
107
133
  `visualizer_util`.
108
134
 
109
135
  ### `sawnergy.walks.Walker`
@@ -140,23 +166,13 @@ node indexing, and RNG seeds stay consistent across the toolchain.
140
166
  |---|---|---|
141
167
  | **RIN** | `ATTRACTIVE_transitions` → **(T, N, N)**, float32 • `REPULSIVE_transitions` → **(T, N, N)**, float32 (optional) • `ATTRACTIVE_energies` → **(T, N, N)**, float32 (optional) • `REPULSIVE_energies` → **(T, N, N)**, float32 (optional) • `COM` → **(T, N, 3)**, float32 | `time_created` (ISO) • `com_name` = `"COM"` • `molecule_of_interest` (int) • `frame_range` = `(start, end)` inclusive • `frame_batch_size` (int) • `prune_low_energies_frac` (float in [0,1]) • `attractive_transitions_name` / `repulsive_transitions_name` (dataset names or `None`) • `attractive_energies_name` / `repulsive_energies_name` (dataset names or `None`) |
142
168
  | **Walks** | `ATTRACTIVE_RWs` → **(T, N·num_RWs, L+1)**, int32 (optional) • `REPULSIVE_RWs` → **(T, N·num_RWs, L+1)**, int32 (optional) • `ATTRACTIVE_SAWs` → **(T, N·num_SAWs, L+1)**, int32 (optional) • `REPULSIVE_SAWs` → **(T, N·num_SAWs, L+1)**, int32 (optional) <br/>_Note:_ node IDs are **1-based**.| `time_created` (ISO) • `seed` (int) • `rng_scheme` = `"SeedSequence.spawn_per_batch_v1"` • `num_workers` (int) • `in_parallel` (bool) • `batch_size_nodes` (int) • `num_RWs` / `num_SAWs` (ints) • `node_count` (N) • `time_stamp_count` (T) • `walk_length` (L) • `walks_per_node` (int) • `attractive_RWs_name` / `repulsive_RWs_name` / `attractive_SAWs_name` / `repulsive_SAWs_name` (dataset names or `None`) • `walks_layout` = `"time_leading_3d"` |
143
- | **Embeddings** | `FRAME_EMBEDDINGS` → **(frames_written, vocab_size, D)**, typically float32 | `time_created` (ISO) • `seed` (int) • `rng_scheme` = `"SeedSequence.spawn_per_frame_v1"` • `source_walks_path` (str) • `model_base` = `"torch"` or `"pureml"` • `rin_type` = `"attr"` or `"repuls"` • `using_mode` = `"RW"|"SAW"|"merged"` • `window_size` (int) • `alpha` (float; noise exponent) • `dimensionality` = D • `num_negative_samples` (int) • `num_epochs` (int) • `batch_size` (int) • `shuffle_data` (bool) • `frames_written` (int) • `vocab_size` (int) • `frame_count` (int) • `embedding_dtype` (str) • `frame_embeddings_name` = `"FRAME_EMBEDDINGS"` • `arrays_per_chunk` (int) • `compression_level` (int) |
169
+ | **Embeddings** | `FRAME_EMBEDDINGS` → **(frames_written, vocab_size, D)**, typically float32 | `time_created` (ISO) • `seed` (int) • `rng_scheme` = `"SeedSequence.spawn_per_frame_v1"` • `source_walks_path` (str) • `model_base` = `"torch"` or `"pureml"` • `rin_type` = `"attr"` or `"repuls"` • `using_mode` = `"RW"|"SAW"|"merged"` • `window_size` (int) • `alpha` (float; noise exponent) • `dimensionality` = D • `num_negative_samples` (int) • `num_epochs` (int) • `batch_size` (int) • `shuffle_data` (bool) • `frames_written` (int) • `vocab_size` (int) • `frame_count` (int) • `embedding_dtype` (str) • `frame_embeddings_name` = `"FRAME_EMBEDDINGS"` • `arrays_per_chunk` (int) • `compression_level` (int) • `objective` = `"sgns"` or `"sg"` |
144
170
 
145
171
  **Notes**
146
172
 
147
173
  - In **RIN**, `T` equals the number of frame **batches** written (i.e., `frame_range` swept in steps of `frame_batch_size`). `ATTRACTIVE/REPULSIVE_energies` are **pre-normalised** absolute energies (written only when `keep_prenormalized_energies=True`), whereas `ATTRACTIVE/REPULSIVE_transitions` are the **row-wise L1-normalised** versions used for sampling.
148
174
  - All archives are Zarr v3 groups. ArrayStorage also maintains per-block metadata in root attrs: `array_chunk_size_in_block`, `array_shape_in_block`, and `array_dtype_in_block` (dicts keyed by dataset name). You’ll see these in every archive.
149
-
150
- ---
151
-
152
- ## Installation
153
-
154
- ```bash
155
- pip install sawnergy
156
- ```
157
-
158
- > **Note:** RIN building requires `cpptraj` (AmberTools). Ensure it is discoverable via `$PATH` or the `CPPTRAJ`
159
- > environment variable.
175
+ - In **Embeddings**, `alpha` and `num_negative_samples` apply to **SGNS** only and are ignored for `objective="sg"`.
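
For context on why `alpha` only matters for SGNS: in word2vec-style negative sampling, negatives are typically drawn from node frequencies raised to a noise exponent. The sketch below assumes frequencies are counted from the walks themselves and that node IDs are 1-based, as the Walks row states. `noise_distribution` is a hypothetical helper, not a SAWNERGY function.

```python
import numpy as np

def noise_distribution(walks, num_nodes, alpha=0.75):
    """Probability of drawing each node as a negative sample."""
    # Walks store 1-based node IDs, so slot 0 of the bincount is dropped.
    counts = np.bincount(walks.ravel(), minlength=num_nodes + 1)[1:]
    weights = counts.astype(np.float64) ** alpha
    return weights / weights.sum()

walks = np.array([[1, 2, 3, 2],
                  [2, 3, 2, 1],
                  [3, 2, 1, 2]])
p = noise_distribution(walks, num_nodes=3, alpha=0.75)

rng = np.random.default_rng(0)
negatives = rng.choice(np.arange(1, 4), size=5, p=p)  # sampled 1-based node IDs
print(p, negatives)
```

With `alpha=1.0` the distribution follows raw frequencies, and with `alpha=0.0` it becomes uniform; intermediate values flatten the distribution so rare nodes are sampled more often. Plain Skip-Gram draws no negatives, so neither parameter has any effect there.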
160
176
 
161
177
  ---
162
178
 
@@ -181,7 +197,7 @@ rin_builder.build_rin(
181
197
  molecule_of_interest=1,
182
198
  frame_range=(1, 100),
183
199
  frame_batch_size=10,
184
- prune_low_energies_frac=0.3,
200
+ prune_low_energies_frac=0.85,
185
201
  output_path=rin_path,
186
202
  include_attractive=True,
187
203
  include_repulsive=False,
@@ -210,6 +226,7 @@ embeddings_path = embedder.embed_all(
210
226
  RIN_type="attr",
211
227
  using="merged",
212
228
  window_size=4,
229
+ objective="sgns",
213
230
  num_negative_samples=5,
214
231
  num_epochs=5,
215
232
  batch_size=1024,
@@ -232,12 +249,12 @@ print("Embeddings written to", embeddings_path)
232
249
 
233
250
  ---
234
251
 
235
- ## Visualisation
252
+ ## Visualization
236
253
 
237
254
  ```python
238
255
  from sawnergy.visual import Visualizer
239
256
 
240
- v = sawnergy.visual.Visualizer("./RIN_demo.zip")
257
+ v = Visualizer("./RIN_demo.zip")
241
258
  v.build_frame(1,
242
259
  node_colors="rainbow",
243
260
  displayed_nodes="ALL",
@@ -250,6 +267,13 @@ v.build_frame(1,
250
267
 
251
268
  `Visualizer` lazily loads datasets and works even in headless environments (falls back to the `Agg` backend).
252
269
 
270
+ ```python
271
+ from sawnergy.embedding import Visualizer
272
+
273
+ viz = Visualizer("./EMBEDDINGS_demo.zip")
274
+ viz.build_frame(1, show=True)
275
+ ```
276
+
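
The release notes say the embedding viewer projects each frame with PCA to a 3D scatter. A minimal sketch of that projection, assuming an SVD-based PCA on a single `(vocab_size, D)` frame, is shown below. The diff does not say how the viewer computes PCA, and `pca_3d` is a hypothetical helper name.

```python
import numpy as np

def pca_3d(X):
    """Project (N, D) embeddings onto their top three principal components."""
    Xc = X - X.mean(axis=0, keepdims=True)       # center each dimension
    # Rows of Vt are principal directions, sorted by decreasing singular value.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:3].T                         # (N, 3) coordinates

rng = np.random.default_rng(0)
frame_embedding = rng.normal(size=(50, 16))      # stand-in for one frame of FRAME_EMBEDDINGS
coords = pca_3d(frame_embedding)
print(coords.shape)  # (50, 3)
```

The three output columns are ordered by explained variance, so the first axis of the scatter carries the most spread.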
253
277
  ---
254
278
 
255
279
  ## Advanced Notes
@@ -0,0 +1,23 @@
1
+ sawnergy/__init__.py,sha256=Dq1U38ah6nPRFEDKN41mYphcTynKfnItca6QkYkpSbs,248
2
+ sawnergy/logging_util.py,sha256=mfYw8IsYtOfCXayjkd4g9jHuupluxRNbqyFegRkiAhQ,1476
3
+ sawnergy/sawnergy_util.py,sha256=Htx9wr0S8TXt5aHT2mtEdYf1TCo_BC1IUwNNuZdIR-4,49432
4
+ sawnergy/embedding/SGNS_pml.py,sha256=LfZDlIF3-KnWUAjhwOT5ggGl2OoReM8_L0TCVYs6GJ0,14299
5
+ sawnergy/embedding/SGNS_torch.py,sha256=NIr-RlOmXlEPe3m8Z6XvuG0b8MGTidETfugigcQTwFs,11232
6
+ sawnergy/embedding/__init__.py,sha256=T1YXb7S5Zyy_kIqlarDSX3imd_FGFH6nDuvLQ3hMKsE,1764
7
+ sawnergy/embedding/embedder.py,sha256=K9I6HYYQFH7SHpgxeTCf8_MMvyLxVaAaltoMwJbgyqo,28749
8
+ sawnergy/embedding/visualizer.py,sha256=bweituYNj5dOzFhvU4n_E-RbzZiUKw6bfJchFLfjFD4,8625
9
+ sawnergy/rin/__init__.py,sha256=z19hLfEIp3bwzY-eCHQBQf0NRTCJzVz_FLIpVV5q0W4,162
10
+ sawnergy/rin/rin_builder.py,sha256=d1cC4KKY9zzNlqhxHWTFM-QyXRXubd2zlCrSM-dV5pc,44624
11
+ sawnergy/rin/rin_util.py,sha256=5TKywA5qfm76Gl4Cyz7oBPasmE5chclR7UM4hawwQOg,14939
12
+ sawnergy/visual/__init__.py,sha256=p_ByFtfrP19b5_qiJlkAnYesZN3M1LjIo421LUgVVbw,502
13
+ sawnergy/visual/visualizer.py,sha256=GVD_rFavDXFz9-h28eFf5nPBujUvRncn_zYoHcFHZ3Q,33155
14
+ sawnergy/visual/visualizer_util.py,sha256=7y3kWjHxDQMoG0dmimceHKTC5veVChoyvW7d0qXH23k,15100
15
+ sawnergy/walks/__init__.py,sha256=Z_Kaffhn3oUX13z9jbY0V5Ncdwj9Cnr--n9D-s7gh5k,250
16
+ sawnergy/walks/walker.py,sha256=scvfZFrSL4AwpmspD0Jb0uhnrVIRRwE_hPCE3bG6zpg,37729
17
+ sawnergy/walks/walker_util.py,sha256=ETdyPNIDwDQCA8Z5t38keBhYBJ56_ksT_0NhOCY-tHE,15361
18
+ sawnergy-1.0.7.dist-info/licenses/LICENSE,sha256=cElK4bCsDhyAEON3H05s35bQZvxBcXBiCOrOdiUhDCY,11346
19
+ sawnergy-1.0.7.dist-info/licenses/NOTICE,sha256=eVTbuSasZrmMJVtKoWOzsKyu4ZNm7Ks7dzI3Tx5tEHc,109
20
+ sawnergy-1.0.7.dist-info/METADATA,sha256=x0PQa0JilbayBcgywmnCL8IZZwTylzz8gOGnvwJHeDc,15433
21
+ sawnergy-1.0.7.dist-info/WHEEL,sha256=_zCd3N1l69ArxyTb8rzEoP9TpbYXkqRFSNOD5OuxnTs,91
22
+ sawnergy-1.0.7.dist-info/top_level.txt,sha256=-67FQD6FD9Gjt74WTmO9hNYA3MLB4HaSxci0sEKC5Lo,9
23
+ sawnergy-1.0.7.dist-info/RECORD,,
@@ -1,22 +0,0 @@
1
- sawnergy/__init__.py,sha256=Dq1U38ah6nPRFEDKN41mYphcTynKfnItca6QkYkpSbs,248
2
- sawnergy/logging_util.py,sha256=tnhToHchnWaORHU73dxzBuL1e_C-AXFdPExDZTEI6tE,1474
3
- sawnergy/sawnergy_util.py,sha256=Htx9wr0S8TXt5aHT2mtEdYf1TCo_BC1IUwNNuZdIR-4,49432
4
- sawnergy/embedding/SGNS_pml.py,sha256=xF_0DksJTUH5DxchTwkg-Ol975lwH1O259Wa0ZSbmDA,6298
5
- sawnergy/embedding/SGNS_torch.py,sha256=3Pa_mk5mzsl27M87q4tNmitOouxDdG5ZzxpdaOSyGt8,6411
6
- sawnergy/embedding/__init__.py,sha256=sxUh2RcZyPs8aCdvec8x843Bm3DBaYQNrBF8VyvLQ-k,965
7
- sawnergy/embedding/embedder.py,sha256=0DRkEfjWqnKCHdr0AxN3wjqclezMOOw6THZE7GlxihE,26266
8
- sawnergy/rin/__init__.py,sha256=z19hLfEIp3bwzY-eCHQBQf0NRTCJzVz_FLIpVV5q0W4,162
9
- sawnergy/rin/rin_builder.py,sha256=z5hCvW-jHnnv7ZgHlQlruRAMKa-TnKFdvkMcoHBhX78,44623
10
- sawnergy/rin/rin_util.py,sha256=5TKywA5qfm76Gl4Cyz7oBPasmE5chclR7UM4hawwQOg,14939
11
- sawnergy/visual/__init__.py,sha256=p_ByFtfrP19b5_qiJlkAnYesZN3M1LjIo421LUgVVbw,502
12
- sawnergy/visual/visualizer.py,sha256=qqggoLRNi6t0awXEt-Hy2ut9S0Y8_uKznyozlGLR1Q8,33131
13
- sawnergy/visual/visualizer_util.py,sha256=C9W22CJmfJuTV5_uYsEnG8YChR4nH7OHKbNz26hAyB0,15028
14
- sawnergy/walks/__init__.py,sha256=Z_Kaffhn3oUX13z9jbY0V5Ncdwj9Cnr--n9D-s7gh5k,250
15
- sawnergy/walks/walker.py,sha256=scvfZFrSL4AwpmspD0Jb0uhnrVIRRwE_hPCE3bG6zpg,37729
16
- sawnergy/walks/walker_util.py,sha256=ETdyPNIDwDQCA8Z5t38keBhYBJ56_ksT_0NhOCY-tHE,15361
17
- sawnergy-1.0.6.dist-info/licenses/LICENSE,sha256=cElK4bCsDhyAEON3H05s35bQZvxBcXBiCOrOdiUhDCY,11346
18
- sawnergy-1.0.6.dist-info/licenses/NOTICE,sha256=eVTbuSasZrmMJVtKoWOzsKyu4ZNm7Ks7dzI3Tx5tEHc,109
19
- sawnergy-1.0.6.dist-info/METADATA,sha256=9_ocluBr8baUZfTcZdBkdNx_AIu3VOtKADEyMuTc3CY,13367
20
- sawnergy-1.0.6.dist-info/WHEEL,sha256=_zCd3N1l69ArxyTb8rzEoP9TpbYXkqRFSNOD5OuxnTs,91
21
- sawnergy-1.0.6.dist-info/top_level.txt,sha256=-67FQD6FD9Gjt74WTmO9hNYA3MLB4HaSxci0sEKC5Lo,9
22
- sawnergy-1.0.6.dist-info/RECORD,,
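
Each RECORD line above has the form `path,sha256=<digest>,<size>`, where the digest is the URL-safe base64 encoding of the SHA-256 hash with `=` padding removed. The sketch below reproduces that encoding and compares two RECORD listings to find changed files. The helper names are illustrative, and the sample rows are copied from the hunks above.

```python
import base64
import csv
import hashlib

def record_digest(data: bytes) -> str:
    """Hash bytes the way RECORD does: sha256, urlsafe base64, '=' padding stripped."""
    digest = hashlib.sha256(data).digest()
    return "sha256=" + base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")

def parse_record(text: str) -> dict:
    """Map each path in a RECORD listing to its (hash, size) pair."""
    rows = csv.reader(text.strip().splitlines())
    return {path: (digest, size) for path, digest, size in rows}

def changed_paths(old: dict, new: dict) -> list:
    """Paths whose entry differs, plus paths present on only one side."""
    return sorted(p for p in old.keys() | new.keys() if old.get(p) != new.get(p))

old = parse_record("""
sawnergy/logging_util.py,sha256=tnhToHchnWaORHU73dxzBuL1e_C-AXFdPExDZTEI6tE,1474
sawnergy/walks/walker.py,sha256=scvfZFrSL4AwpmspD0Jb0uhnrVIRRwE_hPCE3bG6zpg,37729
""")
new = parse_record("""
sawnergy/logging_util.py,sha256=mfYw8IsYtOfCXayjkd4g9jHuupluxRNbqyFegRkiAhQ,1476
sawnergy/walks/walker.py,sha256=scvfZFrSL4AwpmspD0Jb0uhnrVIRRwE_hPCE3bG6zpg,37729
sawnergy/embedding/visualizer.py,sha256=bweituYNj5dOzFhvU4n_E-RbzZiUKw6bfJchFLfjFD4,8625
""")
print(changed_paths(old, new))
```

To verify an installed file, pass its bytes to `record_digest` and compare the result with the second field of its RECORD row.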