risk-network 0.0.9b45__tar.gz → 0.0.10__tar.gz
- {risk_network-0.0.9b45 → risk_network-0.0.10}/PKG-INFO +28 -47
- risk_network-0.0.10/README.md +83 -0
- {risk_network-0.0.9b45 → risk_network-0.0.10}/risk/__init__.py +1 -1
- {risk_network-0.0.9b45 → risk_network-0.0.10}/risk/annotations/annotations.py +6 -1
- {risk_network-0.0.9b45 → risk_network-0.0.10}/risk/neighborhoods/domains.py +27 -20
- {risk_network-0.0.9b45 → risk_network-0.0.10}/risk/network/geometry.py +2 -19
- {risk_network-0.0.9b45 → risk_network-0.0.10}/risk_network.egg-info/PKG-INFO +28 -47
- {risk_network-0.0.9b45 → risk_network-0.0.10}/setup.py +1 -1
- risk_network-0.0.9b45/README.md +0 -102
- {risk_network-0.0.9b45 → risk_network-0.0.10}/LICENSE +0 -0
- {risk_network-0.0.9b45 → risk_network-0.0.10}/MANIFEST.in +0 -0
- {risk_network-0.0.9b45 → risk_network-0.0.10}/pyproject.toml +0 -0
- {risk_network-0.0.9b45 → risk_network-0.0.10}/risk/annotations/__init__.py +0 -0
- {risk_network-0.0.9b45 → risk_network-0.0.10}/risk/annotations/io.py +0 -0
- {risk_network-0.0.9b45 → risk_network-0.0.10}/risk/log/__init__.py +0 -0
- {risk_network-0.0.9b45 → risk_network-0.0.10}/risk/log/console.py +0 -0
- {risk_network-0.0.9b45 → risk_network-0.0.10}/risk/log/parameters.py +0 -0
- {risk_network-0.0.9b45 → risk_network-0.0.10}/risk/neighborhoods/__init__.py +0 -0
- {risk_network-0.0.9b45 → risk_network-0.0.10}/risk/neighborhoods/api.py +0 -0
- {risk_network-0.0.9b45 → risk_network-0.0.10}/risk/neighborhoods/community.py +0 -0
- {risk_network-0.0.9b45 → risk_network-0.0.10}/risk/neighborhoods/neighborhoods.py +0 -0
- {risk_network-0.0.9b45 → risk_network-0.0.10}/risk/network/__init__.py +0 -0
- {risk_network-0.0.9b45 → risk_network-0.0.10}/risk/network/graph/__init__.py +0 -0
- {risk_network-0.0.9b45 → risk_network-0.0.10}/risk/network/graph/api.py +0 -0
- {risk_network-0.0.9b45 → risk_network-0.0.10}/risk/network/graph/graph.py +0 -0
- {risk_network-0.0.9b45 → risk_network-0.0.10}/risk/network/graph/summary.py +0 -0
- {risk_network-0.0.9b45 → risk_network-0.0.10}/risk/network/io.py +0 -0
- {risk_network-0.0.9b45 → risk_network-0.0.10}/risk/network/plotter/__init__.py +0 -0
- {risk_network-0.0.9b45 → risk_network-0.0.10}/risk/network/plotter/api.py +0 -0
- {risk_network-0.0.9b45 → risk_network-0.0.10}/risk/network/plotter/canvas.py +0 -0
- {risk_network-0.0.9b45 → risk_network-0.0.10}/risk/network/plotter/contour.py +0 -0
- {risk_network-0.0.9b45 → risk_network-0.0.10}/risk/network/plotter/labels.py +0 -0
- {risk_network-0.0.9b45 → risk_network-0.0.10}/risk/network/plotter/network.py +0 -0
- {risk_network-0.0.9b45 → risk_network-0.0.10}/risk/network/plotter/plotter.py +0 -0
- {risk_network-0.0.9b45 → risk_network-0.0.10}/risk/network/plotter/utils/colors.py +0 -0
- {risk_network-0.0.9b45 → risk_network-0.0.10}/risk/network/plotter/utils/layout.py +0 -0
- {risk_network-0.0.9b45 → risk_network-0.0.10}/risk/risk.py +0 -0
- {risk_network-0.0.9b45 → risk_network-0.0.10}/risk/stats/__init__.py +0 -0
- {risk_network-0.0.9b45 → risk_network-0.0.10}/risk/stats/permutation/__init__.py +0 -0
- {risk_network-0.0.9b45 → risk_network-0.0.10}/risk/stats/permutation/permutation.py +0 -0
- {risk_network-0.0.9b45 → risk_network-0.0.10}/risk/stats/permutation/test_functions.py +0 -0
- {risk_network-0.0.9b45 → risk_network-0.0.10}/risk/stats/significance.py +0 -0
- {risk_network-0.0.9b45 → risk_network-0.0.10}/risk/stats/stat_tests.py +0 -0
- {risk_network-0.0.9b45 → risk_network-0.0.10}/risk_network.egg-info/SOURCES.txt +0 -0
- {risk_network-0.0.9b45 → risk_network-0.0.10}/risk_network.egg-info/dependency_links.txt +0 -0
- {risk_network-0.0.9b45 → risk_network-0.0.10}/risk_network.egg-info/requires.txt +0 -0
- {risk_network-0.0.9b45 → risk_network-0.0.10}/risk_network.egg-info/top_level.txt +0 -0
- {risk_network-0.0.9b45 → risk_network-0.0.10}/setup.cfg +0 -0
@@ -1,6 +1,6 @@
 Metadata-Version: 2.2
 Name: risk-network
-Version: 0.0.
+Version: 0.0.10
 Summary: A Python package for biological network analysis
 Author: Ira Horecka
 Author-email: Ira Horecka <ira89@icloud.com>
@@ -726,92 +726,73 @@ Dynamic: requires-python
 
 [](https://doi.org/10.5281/zenodo.xxxxxxx)
 
-
 
-**RISK** (Regional Inference of Significant Kinships) is a next-generation tool
+**RISK** (Regional Inference of Significant Kinships) is a next-generation tool for biological network annotation and visualization. RISK integrates community detection-based clustering, rigorous statistical enrichment analysis, and a modular framework to uncover biologically meaningful relationships and generate high-resolution visualizations. RISK supports diverse data formats and is optimized for large-scale network analysis, making it a valuable resource for researchers in systems biology and beyond.
 
 ## Documentation and Tutorial
 
-
-- **Tutorial**: An interactive Jupyter notebook tutorial can be found [here](https://github.com/riskportal/network-tutorial).
-We highly recommend new users to consult the documentation and tutorial early on to fully leverage RISK's capabilities.
+An interactive Jupyter notebook tutorial can be found [here](https://github.com/riskportal/network-tutorial). We highly recommend new users to consult the documentation and tutorial early on to fully utilize RISK's capabilities.
 
 ## Installation
 
-RISK is compatible with Python 3.8
+RISK is compatible with Python 3.8 or later and runs on all major operating systems. To install the latest version of RISK, run:
 
 ```bash
-pip install risk-network
+pip install risk-network --upgrade
 ```
 
 ## Features
 
-- **Comprehensive Network Analysis**: Analyze biological networks
-- **Advanced Clustering Algorithms**:
-- **Flexible Visualization**:
-- **Efficient Data Handling**:
-- **Statistical Analysis**:
+- **Comprehensive Network Analysis**: Analyze biological networks (e.g., protein–protein interaction and genetic interaction networks) as well as non-biological networks.
+- **Advanced Clustering Algorithms**: Supports Louvain, Leiden, Markov Clustering, Greedy Modularity, Label Propagation, Spinglass, and Walktrap for identifying structured network regions.
+- **Flexible Visualization**: Produce customizable, high-resolution network visualizations with kernel density estimate overlays, adjustable node and edge attributes, and export options in SVG, PNG, and PDF formats.
+- **Efficient Data Handling**: Supports multiple input/output formats, including JSON, CSV, TSV, Excel, Cytoscape, and GPickle.
+- **Statistical Analysis**: Assess functional enrichment using hypergeometric, permutation, binomial, chi-squared, Poisson, and z-score tests, ensuring statistical adaptability across datasets.
 - **Cross-Domain Applicability**: Suitable for network analysis across biological and non-biological domains, including social and communication networks.
 
 ## Example Usage
 
-We applied RISK to a *Saccharomyces cerevisiae* protein–protein interaction network,
+We applied RISK to a *Saccharomyces cerevisiae* protein–protein interaction network from Michaelis et al. (2023), filtering for proteins with six or more interactions to emphasize core functional relationships. RISK identified compact, statistically enriched clusters corresponding to biological processes such as ribosomal assembly and mitochondrial organization.
 
-](https://i.imgur.com/lJHJrJr.jpeg)
 
-RISK
+This figure highlights RISK’s capability to detect both established and novel functional modules within the yeast interactome.
 
 ## Citation
 
-If you use RISK in your research, please cite
+If you use RISK in your research, please cite:
 
-**Horecka
+**Horecka et al.**, "RISK: a next-generation tool for biological network annotation and visualization", **Bioinformatics**, 2025. DOI: [10.1234/zenodo.xxxxxxx](https://doi.org/10.1234/zenodo.xxxxxxx)
 
 ## Software Architecture and Implementation
 
-RISK features a streamlined, modular architecture designed to meet diverse research needs.
+RISK features a streamlined, modular architecture designed to meet diverse research needs. It includes dedicated modules for:
 
-
-
-- **
-- **Visualization
-
-### Clustering Algorithms
-
-- **Available Algorithms**:
-- Greedy Modularity
-- Label Propagation
-- Louvain
-- Markov Clustering
-- Spinglass
-- Walktrap
-- **Distance Metrics**: Supports both spherical and Euclidean distance metrics.
-
-### Statistical Tests
-
-- **Hypergeometric Test**
-- **Permutation Test** (single- or multi-process modes)
-- **Poisson Test**
+- **Data I/O**: Supports JSON, CSV, TSV, Excel, Cytoscape, and GPickle formats.
+- **Clustering**: Supports multiple clustering methods, including Louvain, Leiden, Markov Clustering, Greedy Modularity, Label Propagation, Spinglass, and Walktrap. Provides flexible distance metrics tailored to network structure.
+- **Statistical Analysis**: Provides a suite of tests for overrepresentation analysis of annotations.
+- **Visualization**: Offers customizable, high-resolution output in multiple formats, including SVG, PNG, and PDF.
 
 ## Performance and Efficiency
 
-
+Benchmarking results demonstrate that RISK efficiently scales to networks exceeding hundreds of thousands of edges, maintaining low execution times and optimal memory usage across statistical tests.
 
 ## Contributing
 
-We welcome contributions from the community
+We welcome contributions from the community:
 
-- [Issues Tracker](https://github.com/
-- [Source Code](https://github.com/
+- [Issues Tracker](https://github.com/riskportal/network/issues)
+- [Source Code](https://github.com/riskportal/network/tree/main/risk)
 
 ## Support
 
-If you encounter issues or have suggestions for new features, please use the [Issues Tracker](https://github.com/
+If you encounter issues or have suggestions for new features, please use the [Issues Tracker](https://github.com/riskportal/network/issues) on GitHub.
 
 ## License
 
-RISK is
+RISK is open source under the [GNU General Public License v3.0](https://www.gnu.org/licenses/gpl-3.0.en.html).
 
 ---
 
-**Note**: For detailed documentation and to access the interactive tutorial, please visit the links
+**Note**: For detailed documentation and to access the interactive tutorial, please visit the links above.
@@ -0,0 +1,83 @@
+# RISK Network
+
+<p align="center">
+<img src="https://i.imgur.com/8TleEJs.png" width="50%" />
+</p>
+
+<br>
+
+
+[](https://pypi.python.org/pypi/risk-network)
+
+[](https://doi.org/10.5281/zenodo.xxxxxxx)
+
+
+
+**RISK** (Regional Inference of Significant Kinships) is a next-generation tool for biological network annotation and visualization. RISK integrates community detection-based clustering, rigorous statistical enrichment analysis, and a modular framework to uncover biologically meaningful relationships and generate high-resolution visualizations. RISK supports diverse data formats and is optimized for large-scale network analysis, making it a valuable resource for researchers in systems biology and beyond.
+
+## Documentation and Tutorial
+
+An interactive Jupyter notebook tutorial can be found [here](https://github.com/riskportal/network-tutorial). We highly recommend new users to consult the documentation and tutorial early on to fully utilize RISK's capabilities.
+
+## Installation
+
+RISK is compatible with Python 3.8 or later and runs on all major operating systems. To install the latest version of RISK, run:
+
+```bash
+pip install risk-network --upgrade
+```
+
+## Features
+
+- **Comprehensive Network Analysis**: Analyze biological networks (e.g., protein–protein interaction and genetic interaction networks) as well as non-biological networks.
+- **Advanced Clustering Algorithms**: Supports Louvain, Leiden, Markov Clustering, Greedy Modularity, Label Propagation, Spinglass, and Walktrap for identifying structured network regions.
+- **Flexible Visualization**: Produce customizable, high-resolution network visualizations with kernel density estimate overlays, adjustable node and edge attributes, and export options in SVG, PNG, and PDF formats.
+- **Efficient Data Handling**: Supports multiple input/output formats, including JSON, CSV, TSV, Excel, Cytoscape, and GPickle.
+- **Statistical Analysis**: Assess functional enrichment using hypergeometric, permutation, binomial, chi-squared, Poisson, and z-score tests, ensuring statistical adaptability across datasets.
+- **Cross-Domain Applicability**: Suitable for network analysis across biological and non-biological domains, including social and communication networks.
+
+## Example Usage
+
+We applied RISK to a *Saccharomyces cerevisiae* protein–protein interaction network from Michaelis et al. (2023), filtering for proteins with six or more interactions to emphasize core functional relationships. RISK identified compact, statistically enriched clusters corresponding to biological processes such as ribosomal assembly and mitochondrial organization.
+
+[](https://i.imgur.com/lJHJrJr.jpeg)
+
+This figure highlights RISK’s capability to detect both established and novel functional modules within the yeast interactome.
+
+## Citation
+
+If you use RISK in your research, please cite:
+
+**Horecka et al.**, "RISK: a next-generation tool for biological network annotation and visualization", **Bioinformatics**, 2025. DOI: [10.1234/zenodo.xxxxxxx](https://doi.org/10.1234/zenodo.xxxxxxx)
+
+## Software Architecture and Implementation
+
+RISK features a streamlined, modular architecture designed to meet diverse research needs. It includes dedicated modules for:
+
+- **Data I/O**: Supports JSON, CSV, TSV, Excel, Cytoscape, and GPickle formats.
+- **Clustering**: Supports multiple clustering methods, including Louvain, Leiden, Markov Clustering, Greedy Modularity, Label Propagation, Spinglass, and Walktrap. Provides flexible distance metrics tailored to network structure.
+- **Statistical Analysis**: Provides a suite of tests for overrepresentation analysis of annotations.
+- **Visualization**: Offers customizable, high-resolution output in multiple formats, including SVG, PNG, and PDF.
+
+## Performance and Efficiency
+
+Benchmarking results demonstrate that RISK efficiently scales to networks exceeding hundreds of thousands of edges, maintaining low execution times and optimal memory usage across statistical tests.
+
+## Contributing
+
+We welcome contributions from the community:
+
+- [Issues Tracker](https://github.com/riskportal/network/issues)
+- [Source Code](https://github.com/riskportal/network/tree/main/risk)
+
+## Support
+
+If you encounter issues or have suggestions for new features, please use the [Issues Tracker](https://github.com/riskportal/network/issues) on GitHub.
+
+## License
+
+RISK is open source under the [GNU General Public License v3.0](https://www.gnu.org/licenses/gpl-3.0.en.html).
+
+---
+
+**Note**: For detailed documentation and to access the interactive tutorial, please visit the links above.
@@ -53,7 +53,7 @@ def ensure_nltk_resource(resource: str) -> None:
                 print(f"Unzipped '{resource}' successfully.")
                 break # Stop after unzipping the first found ZIP.
 
-    # Final check: Try to
+    # Final check: Try to check resource one last time. If it fails, rai
     try:
         nltk.data.find(resource_path)
         print(f"Resource '{resource}' is now available.")
@@ -62,6 +62,11 @@ def ensure_nltk_resource(resource: str) -> None:
 
 
 # Ensure the NLTK stopwords and WordNet resources are available
+# punkt is known to have issues with the default download method, so we use a custom function if it fails
+try:
+    ensure_nltk_resource("punkt")
+except LookupError:
+    nltk.download("punkt")
 ensure_nltk_resource("stopwords")
 ensure_nltk_resource("wordnet")
 # Use NLTK's stopwords - load all languages
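The fallback added in the hunk above follows a common pattern: run the custom lookup first, and only call a plain download when that lookup raises `LookupError`. The sketch below isolates that control flow. It deliberately avoids NLTK so it runs offline; `ensure_with_fallback`, `find_resource`, and `download_resource` are hypothetical names introduced here for illustration and are not part of RISK or NLTK.

```python
from typing import Callable, List, Set


def ensure_with_fallback(
    name: str,
    primary: Callable[[str], None],
    fallback: Callable[[str], None],
) -> str:
    """Try `primary`; if it raises LookupError, run `fallback` instead.

    Returns which path was taken: "primary" or "fallback".
    """
    try:
        primary(name)
        return "primary"
    except LookupError:
        fallback(name)
        return "fallback"


# Hypothetical stand-ins for a resource lookup and a downloader.
available: Set[str] = {"stopwords"}
downloaded: List[str] = []


def find_resource(name: str) -> None:
    """Raise LookupError when the resource is not available locally."""
    if name not in available:
        raise LookupError(f"Resource '{name}' not found")


def download_resource(name: str) -> None:
    """Record the download and mark the resource as available."""
    downloaded.append(name)
    available.add(name)


print(ensure_with_fallback("stopwords", find_resource, download_resource))
print(ensure_with_fallback("punkt", find_resource, download_resource))
```

Catching only `LookupError`, as the diff does, keeps unrelated failures visible instead of silently triggering a download.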
@@ -3,12 +3,12 @@ risk/neighborhoods/domains
 ~~~~~~~~~~~~~~~~~~~~~~~~~~
 """
 
-from contextlib import suppress
 from itertools import product
 from typing import Tuple, Union
 
 import numpy as np
 import pandas as pd
+from numpy.linalg import LinAlgError
 from scipy.cluster.hierarchy import linkage, fcluster
 from sklearn.metrics import silhouette_score
 from tqdm import tqdm
@@ -42,7 +42,7 @@ def define_domains(
     Args:
         top_annotations (pd.DataFrame): DataFrame of top annotations data for the network nodes.
         significant_neighborhoods_significance (np.ndarray): The binary significance matrix below alpha.
-        linkage_criterion (str): The clustering criterion for defining groups.
+        linkage_criterion (str): The clustering criterion for defining groups. Choose "off" to disable clustering.
         linkage_method (str): The linkage method for clustering. Choose "auto" to optimize.
         linkage_metric (str): The linkage metric for clustering. Choose "auto" to optimize.
         linkage_threshold (float, str): The threshold for clustering. Choose "auto" to optimize.
@@ -73,7 +73,7 @@ def define_domains(
         domains = fcluster(Z, max_d_optimal, criterion=linkage_criterion)
         top_annotations["domain"] = 0
         top_annotations.loc[top_annotations["significant_annotations"], "domain"] = domains
-    except ValueError:
+    except (ValueError, LinAlgError):
         # If a ValueError is encountered, handle it by assigning unique domains
         n_rows = len(top_annotations)
         if linkage_criterion == "off":
@@ -247,28 +247,36 @@ def _optimize_silhouette_across_linkage_and_metrics(
         bar_format="{l_bar}{bar}| {n_fmt}/{total_fmt} [{elapsed}<{remaining}]",
     ):
         # Some linkage methods and metrics may not work with certain data
-
+        try:
             Z = linkage(m, method=method, metric=metric)
-            # Only optimize silhouette score if the threshold is "auto"
             if linkage_threshold == "auto":
-
-
-
-
-
-                    best_overall_metric = metric
+                try:
+                    threshold, score = _find_best_silhouette_score(Z, m, metric, linkage_criterion)
+                except (ValueError, LinAlgError):
+                    continue # Skip to the next combination
+                current_threshold = threshold
             else:
-                # Use the provided threshold without optimization
                 score = silhouette_score(
                     m,
                     fcluster(Z, linkage_threshold * np.max(Z[:, 2]), criterion=linkage_criterion),
                     metric=metric,
                 )
-
-
-
-
-
+                current_threshold = linkage_threshold
+        except (ValueError, LinAlgError):
+            continue # Skip to the next combination
+
+        if score > best_overall_score:
+            best_overall_score = score
+            best_overall_threshold = float(current_threshold) # Ensure it's a float
+            best_overall_method = method
+            best_overall_metric = metric
+
+    # Ensure that we always return a valid tuple:
+    if best_overall_score == -np.inf:
+        # No valid linkage was found; return default values.
+        best_overall_threshold = float(linkage_threshold) if linkage_threshold != "auto" else 0.0
+        best_overall_method = linkage_method
+        best_overall_metric = linkage_metric
 
     return best_overall_method, best_overall_metric, best_overall_threshold
 
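The rewritten loop above scores every method and metric pair, skips pairs that raise, and falls back to defaults when nothing could be scored. Below is a minimal, dependency-free sketch of that selection logic. It does not call SciPy or scikit-learn: `pick_best_combination`, `toy_score`, and the score table are hypothetical, and only `ValueError` is caught here, whereas the diff also catches `numpy.linalg.LinAlgError`.

```python
from itertools import product
from typing import Callable, Dict, Optional, Sequence, Tuple


def pick_best_combination(
    methods: Sequence[str],
    metrics: Sequence[str],
    score_fn: Callable[[str, str], float],
    default: Tuple[str, str],
) -> Tuple[str, str, Optional[float]]:
    """Return the best-scoring (method, metric) pair, or `default` if none could be scored."""
    best_score = float("-inf")
    best_method, best_metric = default
    for method, metric in product(methods, metrics):
        try:
            score = score_fn(method, metric)
        except ValueError:
            continue  # Skip combinations that cannot be scored
        if score > best_score:
            best_score, best_method, best_metric = score, method, metric
    if best_score == float("-inf"):
        # No valid combination was found; keep the defaults and report no score.
        return default[0], default[1], None
    return best_method, best_metric, best_score


# Hypothetical scores; pairs missing from the table stand for invalid combinations.
TOY_SCORES: Dict[Tuple[str, str], float] = {
    ("average", "cosine"): 0.42,
    ("average", "euclidean"): 0.35,
    ("ward", "euclidean"): 0.40,
}


def toy_score(method: str, metric: str) -> float:
    """Look up a score, raising ValueError for unsupported pairs."""
    if (method, metric) not in TOY_SCORES:
        raise ValueError(f"Unsupported combination: {method}/{metric}")
    return TOY_SCORES[(method, metric)]


print(pick_best_combination(["average", "ward"], ["cosine", "euclidean"], toy_score, ("average", "euclidean")))
print(pick_best_combination(["ward"], ["cosine"], toy_score, ("average", "euclidean")))
```

Moving the comparison outside the `try` block, as the new code does, means a failed combination can never overwrite the running best.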
@@ -280,7 +288,6 @@ def _find_best_silhouette_score(
     linkage_criterion: str,
     lower_bound: float = 0.001,
     upper_bound: float = 1.0,
-    resolution: float = 0.001,
 ) -> Tuple[float, float]:
     """Find the best silhouette score using binary search.
 
@@ -291,7 +298,6 @@ def _find_best_silhouette_score(
         linkage_criterion (str): Clustering criterion.
         lower_bound (float, optional): Lower bound for search. Defaults to 0.001.
         upper_bound (float, optional): Upper bound for search. Defaults to 1.0.
-        resolution (float, optional): Desired resolution for the best threshold. Defaults to 0.001.
 
     Returns:
         Tuple[float, float]:
@@ -300,6 +306,7 @@ def _find_best_silhouette_score(
     """
     best_score = -np.inf
     best_threshold = None
+    minimum_linkage_threshold = 1e-6
 
     # Test lower bound
     max_d_lower = np.max(Z[:, 2]) * lower_bound
@@ -328,7 +335,7 @@ def _find_best_silhouette_score(
         lower_bound = (lower_bound + upper_bound) / 2
 
     # Binary search loop
-    while upper_bound - lower_bound >
+    while upper_bound - lower_bound > minimum_linkage_threshold:
         mid_threshold = (upper_bound + lower_bound) / 2
         max_d_mid = np.max(Z[:, 2]) * mid_threshold
         clusters_mid = fcluster(Z, max_d_mid, criterion=linkage_criterion)
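The hunks above replace the `resolution` parameter with a fixed stopping width of `1e-6`: the search keeps shrinking the `[lower_bound, upper_bound]` interval until it is narrower than that value. The diff does not show how the package moves the bounds on each iteration, so the sketch below is only an illustration of the stopping rule. It assumes a score with a single peak and uses a conservative quarter-point narrowing step; `narrow_search` and `toy_score` are hypothetical names, not RISK functions.

```python
from typing import Callable, Tuple


def narrow_search(
    score: Callable[[float], float],
    lower: float = 0.001,
    upper: float = 1.0,
    tolerance: float = 1e-6,
) -> Tuple[float, float]:
    """Shrink [lower, upper] around the peak of a single-peaked score.

    Returns the best threshold evaluated and its score.
    """
    best_threshold, best_score = lower, score(lower)
    while upper - lower > tolerance:
        quarter = (upper - lower) / 4
        left, right = lower + quarter, upper - quarter
        score_left, score_right = score(left), score(right)
        if score_left >= score_right:
            # The peak cannot lie to the right of `right`; drop that quarter.
            upper = right
            candidate, candidate_score = left, score_left
        else:
            # The peak cannot lie to the left of `left`; drop that quarter.
            lower = left
            candidate, candidate_score = right, score_right
        if candidate_score > best_score:
            best_threshold, best_score = candidate, candidate_score
    return best_threshold, best_score


def toy_score(threshold: float) -> float:
    """Hypothetical score with a single peak at threshold = 0.3."""
    return -((threshold - 0.3) ** 2)


threshold, value = narrow_search(toy_score)
print(round(threshold, 4))
```

Each pass keeps three quarters of the interval, so the number of score evaluations grows only logarithmically as the tolerance tightens.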
@@ -12,7 +12,7 @@ def assign_edge_lengths(
     compute_sphere: bool = True,
     surface_depth: float = 0.0,
 ) -> nx.Graph:
-    """Assign edge lengths in the graph, optionally mapping nodes to a sphere
+    """Assign edge lengths in the graph, optionally mapping nodes to a sphere.
 
     Args:
         G (nx.Graph): The input graph.
@@ -33,9 +33,8 @@ def assign_edge_lengths(
             return np.arccos(np.clip(dot_products, -1.0, 1.0))
         return np.linalg.norm(u_coords - v_coords, axis=1)
 
-    # Normalize graph coordinates
+    # Normalize graph coordinates
     _normalize_graph_coordinates(G)
-    _normalize_weights(G)
 
     # Map nodes to sphere and adjust depth if required
     if compute_sphere:
@@ -110,22 +109,6 @@ def _normalize_graph_coordinates(G: nx.Graph) -> None:
         G.nodes[node]["x"], G.nodes[node]["y"] = normalized_xy[i]
 
 
-def _normalize_weights(G: nx.Graph) -> None:
-    """Normalize the weights of the edges in the graph.
-
-    Args:
-        G (nx.Graph): The input graph with weighted edges.
-    """
-    # "weight" is present for all edges - weights are 1.0 if weight was not specified by the user
-    weights = [data["weight"] for _, _, data in G.edges(data=True)]
-    if weights: # Ensure there are weighted edges
-        min_weight = min(weights)
-        max_weight = max(weights)
-        range_weight = max_weight - min_weight if max_weight > min_weight else 1
-        for _, _, data in G.edges(data=True):
-            data["normalized_weight"] = (data["weight"] - min_weight) / range_weight
-
-
 def _create_depth(G: nx.Graph, surface_depth: float = 0.0) -> nx.Graph:
     """Adjust the 'z' attribute of each node based on the subcluster strengths and normalized surface depth.
 
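The removed `_normalize_weights` helper applied min-max scaling, mapping each weight `w` to `(w - min) / (max - min)` and substituting a range of 1 when all weights are equal. For readers checking what downstream code loses with its removal, here is a small sketch of the same arithmetic on a plain dictionary instead of a NetworkX graph; `min_max_normalize` and the sample edges are hypothetical.

```python
from typing import Dict, Hashable, Tuple

Edge = Tuple[Hashable, Hashable]


def min_max_normalize(weights: Dict[Edge, float]) -> Dict[Edge, float]:
    """Scale edge weights to [0, 1]; equal weights all map to 0.0."""
    if not weights:
        return {}
    lowest, highest = min(weights.values()), max(weights.values())
    # Fall back to a range of 1 to avoid dividing by zero when all weights match.
    weight_range = highest - lowest if highest > lowest else 1
    return {edge: (w - lowest) / weight_range for edge, w in weights.items()}


sample = {("a", "b"): 2.0, ("b", "c"): 4.0, ("c", "d"): 6.0}
print(min_max_normalize(sample))
print(min_max_normalize({("x", "y"): 1.0, ("y", "z"): 1.0}))
```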
@@ -10,8 +10,8 @@ from setuptools import setup, find_packages
 import numpy
 
 
-# Function to extract version from __init__.py
 def find_version():
+    """Function to find the version string in the __init__.py file."""
     with open("risk/__init__.py", "r", encoding="utf-8") as f:
         version_match = re.search(r"^__version__ = ['\"]([^'\"]*)['\"]", f.read(), re.M)
     if version_match:
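Note: the `setup.py` hunk ends at `if version_match:`, so the rest of `find_version` is not visible in this diff. The following is a minimal sketch of how this kind of lookup commonly completes, under the assumption that the captured group is returned and a missing match raises an error. The name `find_version_in_text`, the error message, and the choice to take the file contents as an argument (rather than opening `risk/__init__.py`) are hypothetical and used only so the example runs on its own.

```python
import re


def find_version_in_text(text):
    """Return the __version__ string found in module source text.

    Hypothetical helper: uses the same regular expression as the hunk above,
    but operates on a string instead of reading risk/__init__.py.
    """
    # re.M lets ^ match at the start of any line, not only the start of the text.
    version_match = re.search(r"^__version__ = ['\"]([^'\"]*)['\"]", text, re.M)
    if version_match:
        # Group 1 is the text between the quotes.
        return version_match.group(1)
    # Assumed failure behavior; the real function's handling is not shown.
    raise RuntimeError("Unable to find version string.")


sample_source = '"""RISK package."""\n__version__ = "0.0.10"\n'
print(find_version_in_text(sample_source))
```

Reading the version with a regular expression avoids importing the package during installation, which can fail if its dependencies are not yet installed.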
risk_network-0.0.9b45/README.md
DELETED
@@ -1,102 +0,0 @@
-# RISK Network
-
-<p align="center">
-<img src="https://i.imgur.com/8TleEJs.png" width="50%" />
-</p>
-
-<br>
-
-
-[](https://pypi.python.org/pypi/risk-network)
-
-[](https://doi.org/10.5281/zenodo.xxxxxxx)
-
-
-
-**RISK** (Regional Inference of Significant Kinships) is a next-generation tool designed to streamline the analysis of biological and non-biological networks. RISK enhances network analysis with its modular architecture, extensive file format support, and advanced clustering algorithms. It simplifies the creation of publication-quality figures, making it an important tool for researchers across disciplines.
-
-## Documentation and Tutorial
-
-- **Documentation**: Comprehensive documentation is available [here](Documentation link).
-- **Tutorial**: An interactive Jupyter notebook tutorial can be found [here](https://github.com/riskportal/network-tutorial).
-We highly recommend new users to consult the documentation and tutorial early on to fully leverage RISK's capabilities.
-
-## Installation
-
-RISK is compatible with Python 3.8 and later versions and operates on all major operating systems. Install RISK via pip:
-
-```bash
-pip install risk-network
-```
-
-## Features
-
-- **Comprehensive Network Analysis**: Analyze biological networks such as protein–protein interaction (PPI) and gene regulatory networks, as well as non-biological networks.
-- **Advanced Clustering Algorithms**: Utilize algorithms like Louvain, Markov Clustering, Spinglass, and more to identify key functional modules.
-- **Flexible Visualization**: Generate clear, publication-quality figures with customizable node and edge attributes, including colors, shapes, sizes, and labels.
-- **Efficient Data Handling**: Optimized for large datasets, supporting multiple file formats such as JSON, CSV, TSV, Excel, Cytoscape, and GPickle.
-- **Statistical Analysis**: Integrated statistical tests, including hypergeometric, permutation, and Poisson tests, to assess the significance of enriched regions.
-- **Cross-Domain Applicability**: Suitable for network analysis across biological and non-biological domains, including social and communication networks.
-
-## Example Usage
-
-We applied RISK to a *Saccharomyces cerevisiae* protein–protein interaction network, revealing both established and novel functional relationships. The visualization below highlights key biological processes such as ribosomal assembly and mitochondrial organization.
-
-
-
-RISK successfully detected both known and novel functional clusters within the yeast interactome. Clusters related to Golgi transport and actin nucleation were clearly defined and closely located, showcasing RISK's ability to map well-characterized interactions. Additionally, RISK identified links between mRNA processing pathways and vesicle trafficking proteins, consistent with recent studies demonstrating the role of vesicles in mRNA localization and stability.
-
-## Citation
-
-If you use RISK in your research, please cite the following:
-
-**Horecka**, *et al.*, "RISK: a next-generation tool for biological network annotation and visualization", **[Journal Name]**, 2024. DOI: [10.1234/zenodo.xxxxxxx](https://doi.org/10.1234/zenodo.xxxxxxx)
-
-## Software Architecture and Implementation
-
-RISK features a streamlined, modular architecture designed to meet diverse research needs. Each module focuses on a specific task—such as network input/output, statistical analysis, or visualization—ensuring ease of adaptation and extension. This design enhances flexibility and reduces development overhead for users integrating RISK into their workflows.
-
-### Supported Data Formats
-
-- **Input/Output**: JSON, CSV, TSV, Excel, Cytoscape, GPickle.
-- **Visualization Outputs**: SVG, PNG, PDF.
-
-### Clustering Algorithms
-
-- **Available Algorithms**:
-- Greedy Modularity
-- Label Propagation
-- Louvain
-- Markov Clustering
-- Spinglass
-- Walktrap
-- **Distance Metrics**: Supports both spherical and Euclidean distance metrics.
-
-### Statistical Tests
-
-- **Hypergeometric Test**
-- **Permutation Test** (single- or multi-process modes)
-- **Poisson Test**
-
-## Performance and Efficiency
-
-In benchmarking tests using the yeast interactome network, RISK demonstrated substantial improvements over previous tools in both computational performance and memory efficiency. RISK processed the dataset approximately **3.25 times faster**, reducing CPU time by **69%**, and required **25% less peak memory usage**, underscoring its efficient utilization of computational resources.
-
-## Contributing
-
-We welcome contributions from the community. Please use the following resources:
-
-- [Issues Tracker](https://github.com/irahorecka/risk/issues)
-- [Source Code](https://github.com/irahorecka/risk/tree/main/risk)
-
-## Support
-
-If you encounter issues or have suggestions for new features, please use the [Issues Tracker](https://github.com/irahorecka/risk/issues) on GitHub.
-
-## License
-
-RISK is freely available as open-source software under the [GNU General Public License v3.0](https://www.gnu.org/licenses/gpl-3.0.en.html).
-
----
-
-**Note**: For detailed documentation and to access the interactive tutorial, please visit the links provided in the [Documentation and Tutorial](#documentation-and-tutorial) section.
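Note: the removed README names a hypergeometric test among the statistical tests used to assess enrichment. As a rough illustration of what such an overrepresentation test computes, here is a generic textbook sketch of the upper-tail hypergeometric probability. It is not taken from the RISK source; the function name `hypergeom_upper_tail` and its parameter names are assumptions made for this example.

```python
from math import comb


def hypergeom_upper_tail(k, population, annotated, drawn):
    """Return P(X >= k) for a hypergeometric variable X.

    Hypothetical parameters: `population` nodes in the network, `annotated`
    of them carrying a given annotation, and `drawn` nodes in the
    neighborhood being tested. X counts annotated nodes among those drawn.
    """
    total = comb(population, drawn)  # number of ways to draw the neighborhood
    upper = min(annotated, drawn)    # X cannot exceed either count
    favorable = sum(
        comb(annotated, i) * comb(population - annotated, drawn - i)
        for i in range(k, upper + 1)
    )
    return favorable / total


# Example: 3 of 10 nodes are annotated and a 3-node neighborhood contains all of them.
p_value = hypergeom_upper_tail(3, population=10, annotated=3, drawn=3)
print(p_value)
```

A small tail probability indicates that the annotation appears in the neighborhood more often than random sampling would suggest.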
File without changes