risk-network 0.0.8b1__tar.gz → 0.0.8b2__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (38)
  1. {risk_network-0.0.8b1 → risk_network-0.0.8b2}/PKG-INFO +84 -21
  2. risk_network-0.0.8b2/README.md +102 -0
  3. {risk_network-0.0.8b1 → risk_network-0.0.8b2}/risk/__init__.py +1 -1
  4. {risk_network-0.0.8b1 → risk_network-0.0.8b2}/risk/network/graph.py +21 -22
  5. {risk_network-0.0.8b1 → risk_network-0.0.8b2}/risk/network/plot.py +248 -75
  6. {risk_network-0.0.8b1 → risk_network-0.0.8b2}/risk_network.egg-info/PKG-INFO +84 -21
  7. risk_network-0.0.8b1/README.md +0 -39
  8. {risk_network-0.0.8b1 → risk_network-0.0.8b2}/LICENSE +0 -0
  9. {risk_network-0.0.8b1 → risk_network-0.0.8b2}/MANIFEST.in +0 -0
  10. {risk_network-0.0.8b1 → risk_network-0.0.8b2}/pyproject.toml +0 -0
  11. {risk_network-0.0.8b1 → risk_network-0.0.8b2}/risk/annotations/__init__.py +0 -0
  12. {risk_network-0.0.8b1 → risk_network-0.0.8b2}/risk/annotations/annotations.py +0 -0
  13. {risk_network-0.0.8b1 → risk_network-0.0.8b2}/risk/annotations/io.py +0 -0
  14. {risk_network-0.0.8b1 → risk_network-0.0.8b2}/risk/constants.py +0 -0
  15. {risk_network-0.0.8b1 → risk_network-0.0.8b2}/risk/log/__init__.py +0 -0
  16. {risk_network-0.0.8b1 → risk_network-0.0.8b2}/risk/log/config.py +0 -0
  17. {risk_network-0.0.8b1 → risk_network-0.0.8b2}/risk/log/params.py +0 -0
  18. {risk_network-0.0.8b1 → risk_network-0.0.8b2}/risk/neighborhoods/__init__.py +0 -0
  19. {risk_network-0.0.8b1 → risk_network-0.0.8b2}/risk/neighborhoods/community.py +0 -0
  20. {risk_network-0.0.8b1 → risk_network-0.0.8b2}/risk/neighborhoods/domains.py +0 -0
  21. {risk_network-0.0.8b1 → risk_network-0.0.8b2}/risk/neighborhoods/neighborhoods.py +0 -0
  22. {risk_network-0.0.8b1 → risk_network-0.0.8b2}/risk/network/__init__.py +0 -0
  23. {risk_network-0.0.8b1 → risk_network-0.0.8b2}/risk/network/geometry.py +0 -0
  24. {risk_network-0.0.8b1 → risk_network-0.0.8b2}/risk/network/io.py +0 -0
  25. {risk_network-0.0.8b1 → risk_network-0.0.8b2}/risk/risk.py +0 -0
  26. {risk_network-0.0.8b1 → risk_network-0.0.8b2}/risk/stats/__init__.py +0 -0
  27. {risk_network-0.0.8b1 → risk_network-0.0.8b2}/risk/stats/hypergeom.py +0 -0
  28. {risk_network-0.0.8b1 → risk_network-0.0.8b2}/risk/stats/permutation/__init__.py +0 -0
  29. {risk_network-0.0.8b1 → risk_network-0.0.8b2}/risk/stats/permutation/permutation.py +0 -0
  30. {risk_network-0.0.8b1 → risk_network-0.0.8b2}/risk/stats/permutation/test_functions.py +0 -0
  31. {risk_network-0.0.8b1 → risk_network-0.0.8b2}/risk/stats/poisson.py +0 -0
  32. {risk_network-0.0.8b1 → risk_network-0.0.8b2}/risk/stats/stats.py +0 -0
  33. {risk_network-0.0.8b1 → risk_network-0.0.8b2}/risk_network.egg-info/SOURCES.txt +0 -0
  34. {risk_network-0.0.8b1 → risk_network-0.0.8b2}/risk_network.egg-info/dependency_links.txt +0 -0
  35. {risk_network-0.0.8b1 → risk_network-0.0.8b2}/risk_network.egg-info/requires.txt +0 -0
  36. {risk_network-0.0.8b1 → risk_network-0.0.8b2}/risk_network.egg-info/top_level.txt +0 -0
  37. {risk_network-0.0.8b1 → risk_network-0.0.8b2}/setup.cfg +0 -0
  38. {risk_network-0.0.8b1 → risk_network-0.0.8b2}/setup.py +0 -0
@@ -1,6 +1,6 @@
  Metadata-Version: 2.1
  Name: risk-network
- Version: 0.0.8b1
+ Version: 0.0.8b2
  Summary: A Python package for biological network analysis
  Author: Ira Horecka
  Author-email: Ira Horecka <ira89@icloud.com>
@@ -709,42 +709,105 @@ Requires-Dist: statsmodels
  Requires-Dist: threadpoolctl
  Requires-Dist: tqdm

- <p align="center">
- <img src="https://i.imgur.com/Fo9EmnK.png" width="400" />
- </p>
+ # RISK

  <p align="center">
- <a href="https://pypi.python.org/pypi/risk-network"><img src="https://img.shields.io/pypi/v/risk-network.svg" alt="pypiv"></a>
- <a href="https://www.python.org/downloads/"><img src="https://img.shields.io/badge/python-3.8+-blue.svg" alt="Python 3.8+"></a>
- <a href="https://raw.githubusercontent.com/irahorecka/chrono24/main/LICENSE"><img src="https://img.shields.io/badge/License-GPLv3-blue.svg" alt="License: GPL v3"></a>
+ <img src="https://i.imgur.com/8TleEJs.png" width="50%" />
  </p>

- ## RISK
+ <br>
+
+ ![Python](https://img.shields.io/badge/python-3.8%2B-yellow)
+ [![pypiv](https://img.shields.io/pypi/v/risk-network.svg)](https://pypi.python.org/pypi/risk-network)
+ ![License](https://img.shields.io/badge/license-GPLv3-purple)
+ [![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.xxxxxxx.svg)](https://doi.org/10.5281/zenodo.xxxxxxx)
+ ![Downloads](https://img.shields.io/pypi/dm/risk-network)
+ ![Platforms](https://img.shields.io/badge/platform-linux%20%7C%20macos%20%7C%20windows-lightgrey)
+
+ **RISK (RISK Infers Spatial Kinships)** is a next-generation tool designed to streamline the analysis of biological and non-biological networks. RISK enhances network analysis with its modular architecture, extensive file format support, and advanced clustering algorithms. It simplifies the creation of publication-quality figures, making it an important tool for researchers across disciplines.

- #### RISK Infers Spatial Kinships
+ ## Documentation and Tutorial
+
+ - **Documentation**: Comprehensive documentation is available at [Documentation link].
+ - **Tutorial**: An interactive Jupyter notebook tutorial can be found at [Tutorial link].
+ We highly recommend new users to consult the documentation and tutorial early on to fully leverage RISK's capabilities.
+
+ ## Installation

- RISK is a software tool for visualizing spatial relationships in networks. It aims to enhance network analysis by integrating advanced network annotation algorithms, such as Louvain and Markov Clustering, to identify key functional modules and pathways.
+ RISK is compatible with Python 3.8 and later versions and operates on all major operating systems. Install RISK via pip:
+
+ ```bash
+ pip install risk-network
+ ```

  ## Features

- - Spatial analysis of biological networks
- - Functional enrichment detection
- - Optimized performance
+ - **Comprehensive Network Analysis**: Analyze biological networks such as protein–protein interaction (PPI) and gene regulatory networks, as well as non-biological networks.
+ - **Advanced Clustering Algorithms**: Utilize algorithms like Louvain, Markov Clustering, Spinglass, and more to identify key functional modules.
+ - **Flexible Visualization**: Generate clear, publication-quality figures with customizable node and edge attributes, including colors, shapes, sizes, and labels.
+ - **Efficient Data Handling**: Optimized for large datasets, supporting multiple file formats such as JSON, CSV, TSV, Excel, Cytoscape, and GPickle.
+ - **Statistical Analysis**: Integrated statistical tests, including hypergeometric, permutation, and Poisson tests, to assess the significance of enriched regions.
+ - **Cross-Domain Applicability**: Suitable for network analysis across biological and non-biological domains, including social and communication networks.

- ## Example
+ ## Example Usage

- *Saccharomyces cerevisiae* proteins oriented by physical interactions discovered through affinity enrichment and mass spectrometry (Michaelis et al., 2023).
+ We applied RISK to a *Saccharomyces cerevisiae* protein–protein interaction network, revealing both established and novel functional relationships. The visualization below highlights key biological processes such as ribosomal assembly and mitochondrial organization.

- ![PPI Network Demo](https://i.imgur.com/NnyK6nO.png)
+ ![RISK Main Figure](https://i.imgur.com/TUVfvfH.jpeg)

- ## Installation
+ RISK successfully detected both known and novel functional clusters within the yeast interactome. Clusters related to Golgi transport and actin nucleation were clearly defined and closely located, showcasing RISK's ability to map well-characterized interactions. Additionally, RISK identified links between mRNA processing pathways and vesicle trafficking proteins, consistent with recent studies demonstrating the role of vesicles in mRNA localization and stability.
+
+ ## Citation
+
+ If you use RISK in your research, please cite the following:
+
+ **Horecka**, *et al.*, "RISK: a next-generation tool for biological network annotation and visualization", **[Journal Name]**, 2024. DOI: [10.1234/zenodo.xxxxxxx](https://doi.org/10.1234/zenodo.xxxxxxx)
+
+ ## Software Architecture and Implementation

- Coming soon...
+ RISK features a streamlined, modular architecture designed to meet diverse research needs. Each module focuses on a specific task—such as network input/output, statistical analysis, or visualization—ensuring ease of adaptation and extension. This design enhances flexibility and reduces development overhead for users integrating RISK into their workflows.

- ## Usage
+ ### Supported Data Formats

- Coming soon...
+ - **Input/Output**: JSON, CSV, TSV, Excel, Cytoscape, GPickle.
+ - **Visualization Outputs**: SVG, PNG, PDF.
+
+ ### Clustering Algorithms
+
+ - **Available Algorithms**:
+   - Greedy Modularity
+   - Label Propagation
+   - Louvain
+   - Markov Clustering
+   - Spinglass
+   - Walktrap
+ - **Distance Metrics**: Supports both spherical and Euclidean distance metrics.
+
+ ### Statistical Tests
+
+ - **Hypergeometric Test**
+ - **Permutation Test** (single- or multi-process modes)
+ - **Poisson Test**
+
+ ## Performance and Efficiency
+
+ In benchmarking tests using the yeast interactome network, RISK demonstrated substantial improvements over previous tools in both computational performance and memory efficiency. RISK processed the dataset approximately **3.25 times faster**, reducing CPU time by **69%**, and required **25% less peak memory usage**, underscoring its efficient utilization of computational resources.
+
+ ## Contributing
+
+ We welcome contributions from the community. Please use the following resources:
+
+ - [Issues Tracker](https://github.com/irahorecka/risk/issues)
+ - [Source Code](https://github.com/irahorecka/risk/tree/main/risk)
+
+ ## Support
+
+ If you encounter issues or have suggestions for new features, please use the [Issues Tracker](https://github.com/irahorecka/risk/issues) on GitHub.

  ## License

- This project is licensed under the GPL-3.0 license.
+ RISK is freely available as open-source software under the [GNU General Public License v3.0](https://www.gnu.org/licenses/gpl-3.0.en.html).
+
+ ---
+
+ **Note**: For detailed documentation and to access the interactive tutorial, please visit the links provided in the [Documentation and Tutorial](#documentation-and-tutorial) section.
@@ -0,0 +1,102 @@
+ # RISK
+
+ <p align="center">
+ <img src="https://i.imgur.com/8TleEJs.png" width="50%" />
+ </p>
+
+ <br>
+
+ ![Python](https://img.shields.io/badge/python-3.8%2B-yellow)
+ [![pypiv](https://img.shields.io/pypi/v/risk-network.svg)](https://pypi.python.org/pypi/risk-network)
+ ![License](https://img.shields.io/badge/license-GPLv3-purple)
+ [![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.xxxxxxx.svg)](https://doi.org/10.5281/zenodo.xxxxxxx)
+ ![Downloads](https://img.shields.io/pypi/dm/risk-network)
+ ![Platforms](https://img.shields.io/badge/platform-linux%20%7C%20macos%20%7C%20windows-lightgrey)
+
+ **RISK (RISK Infers Spatial Kinships)** is a next-generation tool designed to streamline the analysis of biological and non-biological networks. RISK enhances network analysis with its modular architecture, extensive file format support, and advanced clustering algorithms. It simplifies the creation of publication-quality figures, making it an important tool for researchers across disciplines.
+
+ ## Documentation and Tutorial
+
+ - **Documentation**: Comprehensive documentation is available at [Documentation link].
+ - **Tutorial**: An interactive Jupyter notebook tutorial can be found at [Tutorial link].
+ We highly recommend new users to consult the documentation and tutorial early on to fully leverage RISK's capabilities.
+
+ ## Installation
+
+ RISK is compatible with Python 3.8 and later versions and operates on all major operating systems. Install RISK via pip:
+
+ ```bash
+ pip install risk-network
+ ```
+
+ ## Features
+
+ - **Comprehensive Network Analysis**: Analyze biological networks such as protein–protein interaction (PPI) and gene regulatory networks, as well as non-biological networks.
+ - **Advanced Clustering Algorithms**: Utilize algorithms like Louvain, Markov Clustering, Spinglass, and more to identify key functional modules.
+ - **Flexible Visualization**: Generate clear, publication-quality figures with customizable node and edge attributes, including colors, shapes, sizes, and labels.
+ - **Efficient Data Handling**: Optimized for large datasets, supporting multiple file formats such as JSON, CSV, TSV, Excel, Cytoscape, and GPickle.
+ - **Statistical Analysis**: Integrated statistical tests, including hypergeometric, permutation, and Poisson tests, to assess the significance of enriched regions.
+ - **Cross-Domain Applicability**: Suitable for network analysis across biological and non-biological domains, including social and communication networks.
+
+ ## Example Usage
+
+ We applied RISK to a *Saccharomyces cerevisiae* protein–protein interaction network, revealing both established and novel functional relationships. The visualization below highlights key biological processes such as ribosomal assembly and mitochondrial organization.
+
+ ![RISK Main Figure](https://i.imgur.com/TUVfvfH.jpeg)
+
+ RISK successfully detected both known and novel functional clusters within the yeast interactome. Clusters related to Golgi transport and actin nucleation were clearly defined and closely located, showcasing RISK's ability to map well-characterized interactions. Additionally, RISK identified links between mRNA processing pathways and vesicle trafficking proteins, consistent with recent studies demonstrating the role of vesicles in mRNA localization and stability.
+
+ ## Citation
+
+ If you use RISK in your research, please cite the following:
+
+ **Horecka**, *et al.*, "RISK: a next-generation tool for biological network annotation and visualization", **[Journal Name]**, 2024. DOI: [10.1234/zenodo.xxxxxxx](https://doi.org/10.1234/zenodo.xxxxxxx)
+
+ ## Software Architecture and Implementation
+
+ RISK features a streamlined, modular architecture designed to meet diverse research needs. Each module focuses on a specific task—such as network input/output, statistical analysis, or visualization—ensuring ease of adaptation and extension. This design enhances flexibility and reduces development overhead for users integrating RISK into their workflows.
+
+ ### Supported Data Formats
+
+ - **Input/Output**: JSON, CSV, TSV, Excel, Cytoscape, GPickle.
+ - **Visualization Outputs**: SVG, PNG, PDF.
+
+ ### Clustering Algorithms
+
+ - **Available Algorithms**:
+   - Greedy Modularity
+   - Label Propagation
+   - Louvain
+   - Markov Clustering
+   - Spinglass
+   - Walktrap
+ - **Distance Metrics**: Supports both spherical and Euclidean distance metrics.
+
+ ### Statistical Tests
+
+ - **Hypergeometric Test**
+ - **Permutation Test** (single- or multi-process modes)
+ - **Poisson Test**
+
+ ## Performance and Efficiency
+
+ In benchmarking tests using the yeast interactome network, RISK demonstrated substantial improvements over previous tools in both computational performance and memory efficiency. RISK processed the dataset approximately **3.25 times faster**, reducing CPU time by **69%**, and required **25% less peak memory usage**, underscoring its efficient utilization of computational resources.
+
+ ## Contributing
+
+ We welcome contributions from the community. Please use the following resources:
+
+ - [Issues Tracker](https://github.com/irahorecka/risk/issues)
+ - [Source Code](https://github.com/irahorecka/risk/tree/main/risk)
+
+ ## Support
+
+ If you encounter issues or have suggestions for new features, please use the [Issues Tracker](https://github.com/irahorecka/risk/issues) on GitHub.
+
+ ## License
+
+ RISK is freely available as open-source software under the [GNU General Public License v3.0](https://www.gnu.org/licenses/gpl-3.0.en.html).
+
+ ---
+
+ **Note**: For detailed documentation and to access the interactive tutorial, please visit the links provided in the [Documentation and Tutorial](#documentation-and-tutorial) section.
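The new README's Statistical Tests section lists hypergeometric, permutation, and Poisson tests for scoring enriched neighborhoods. As a neutral illustration of the hypergeometric statistic only (this is not RISK's API, and the counts below are invented), the enrichment p-value for one annotated neighborhood can be computed with SciPy:

```python
from scipy.stats import hypergeom

# Hypothetical counts: a 6,000-node network with 80 annotated nodes;
# one neighborhood of 50 nodes contains 12 of the annotated ones.
M, n, N, k = 6000, 80, 50, 12

# P(X >= k): the survival function evaluated at k - 1 gives the enrichment p-value.
p_value = hypergeom.sf(k - 1, M, n, N)
print(f"Enrichment p-value: {p_value:.3g}")
```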
@@ -7,4 +7,4 @@ RISK: RISK Infers Spatial Kinships

  from risk.risk import RISK

- __version__ = "0.0.8-beta.1"
+ __version__ = "0.0.8-beta.2"
@@ -148,27 +148,6 @@ class NetworkGraph:

          return transformed_colors

-     def _get_composite_node_colors(self, domain_colors: np.ndarray) -> np.ndarray:
-         """Generate composite colors for nodes based on domain colors and counts.
-
-         Args:
-             domain_colors (np.ndarray): Array of colors corresponding to each domain.
-
-         Returns:
-             np.ndarray: Array of composite colors for each node.
-         """
-         # Determine the number of nodes
-         num_nodes = len(self.node_coordinates)
-         # Initialize composite colors array with shape (number of nodes, 4) for RGBA
-         composite_colors = np.zeros((num_nodes, 4))
-         # Assign colors to nodes based on domain_colors
-         for domain_id, nodes in self.domain_id_to_node_ids_map.items():
-             color = domain_colors[domain_id]
-             for node in nodes:
-                 composite_colors[node] = color
-
-         return composite_colors
-
      def _get_domain_colors(
          self,
          cmap: str = "gist_rainbow",
@@ -193,9 +172,29 @@ class NetworkGraph:
              color=color,
              random_seed=random_seed,
          )
-         self.network, self.domain_id_to_node_ids_map
          return dict(zip(self.domain_id_to_node_ids_map.keys(), domain_colors))

+     def _get_composite_node_colors(self, domain_colors: np.ndarray) -> np.ndarray:
+         """Generate composite colors for nodes based on domain colors and counts.
+
+         Args:
+             domain_colors (np.ndarray): Array of colors corresponding to each domain.
+
+         Returns:
+             np.ndarray: Array of composite colors for each node.
+         """
+         # Determine the number of nodes
+         num_nodes = len(self.node_coordinates)
+         # Initialize composite colors array with shape (number of nodes, 4) for RGBA
+         composite_colors = np.zeros((num_nodes, 4))
+         # Assign colors to nodes based on domain_colors
+         for domain_id, nodes in self.domain_id_to_node_ids_map.items():
+             color = domain_colors[domain_id]
+             for node in nodes:
+                 composite_colors[node] = color
+
+         return composite_colors
+

  def _transform_colors(
      colors: np.ndarray,
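To make the relocated `_get_composite_node_colors` logic concrete, here is a standalone sketch of the same assignment pattern on toy data (the domain-to-node map and the colors are invented for illustration):

```python
import numpy as np

# Toy stand-ins for self.domain_id_to_node_ids_map and the per-domain RGBA colors.
domain_id_to_node_ids_map = {0: [0, 1, 2], 1: [3, 4]}
domain_colors = np.array(
    [
        [1.0, 0.0, 0.0, 1.0],  # domain 0 -> red
        [0.0, 0.0, 1.0, 1.0],  # domain 1 -> blue
    ]
)

num_nodes = 5
composite_colors = np.zeros((num_nodes, 4))  # one RGBA row per node
for domain_id, nodes in domain_id_to_node_ids_map.items():
    color = domain_colors[domain_id]
    for node in nodes:
        composite_colors[node] = color  # every node inherits its domain's color

print(composite_colors)
```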
@@ -3,6 +3,7 @@ risk/network/plot
  ~~~~~~~~~~~~~~~~~
  """

+ from functools import lru_cache
  from typing import Any, Dict, List, Tuple, Union

  import matplotlib.colors as mcolors
@@ -748,15 +749,6 @@ class NetworkPlotter:
          # Set max_labels to the total number of domains if not provided (None)
          if max_labels is None:
              max_labels = len(self.graph.domain_id_to_node_ids_map)
-
-         # Convert colors to RGBA using the _to_rgba helper function
-         fontcolor = _to_rgba(
-             fontcolor, fontalpha, num_repeats=len(self.graph.domain_id_to_node_ids_map)
-         )
-         arrow_color = _to_rgba(
-             arrow_color, arrow_alpha, num_repeats=len(self.graph.domain_id_to_node_ids_map)
-         )
-
          # Normalize words_to_omit to lowercase
          if words_to_omit:
              words_to_omit = set(word.lower() for word in words_to_omit)
@@ -773,76 +765,42 @@ class NetworkPlotter:
          filtered_domain_terms = {}
          # Handle the ids_to_keep logic
          if ids_to_keep:
-             # Convert ids_to_keep to remove accidental duplicates
-             ids_to_keep = set(ids_to_keep)
-             # Check if the number of provided ids_to_keep exceeds max_labels
-             if max_labels is not None and len(ids_to_keep) > max_labels:
-                 raise ValueError(
-                     f"Number of provided IDs ({len(ids_to_keep)}) exceeds max_labels ({max_labels})."
-                 )
-
-             # Process the specified IDs first
-             for domain in ids_to_keep:
-                 if (
-                     domain in self.graph.domain_id_to_domain_terms_map
-                     and domain in domain_centroids
-                 ):
-                     # Handle ids_to_replace logic here for ids_to_keep
-                     if ids_to_replace and domain in ids_to_replace:
-                         terms = ids_to_replace[domain].split(" ")
-                     else:
-                         terms = self.graph.domain_id_to_domain_terms_map[domain].split(" ")
-
-                     # Apply words_to_omit, word length constraints, and max_words
-                     if words_to_omit:
-                         terms = [term for term in terms if term.lower() not in words_to_omit]
-                     terms = [
-                         term for term in terms if min_word_length <= len(term) <= max_word_length
-                     ]
-                     terms = terms[:max_words]
-
-                     # Check if the domain passes the word count condition
-                     if len(terms) >= min_words:
-                         filtered_domain_centroids[domain] = domain_centroids[domain]
-                         filtered_domain_terms[domain] = " ".join(terms)
-                         valid_indices.append(
-                             list(domain_centroids.keys()).index(domain)
-                         )  # Track the valid index
+             # Process the ids_to_keep first INPLACE
+             self._process_ids_to_keep(
+                 ids_to_keep,
+                 max_labels,
+                 domain_centroids,
+                 ids_to_replace,
+                 words_to_omit,
+                 min_word_length,
+                 max_word_length,
+                 max_words,
+                 min_words,
+                 filtered_domain_centroids,
+                 filtered_domain_terms,
+                 valid_indices,
+             )

          # Calculate remaining labels to plot after processing ids_to_keep
          remaining_labels = (
              max_labels - len(ids_to_keep) if ids_to_keep and max_labels else max_labels
          )
-         # Process remaining domains to fill in additional labels, if there are slots left
+         # Process remaining domains INPLACE to fill in additional labels, if there are slots left
          if remaining_labels and remaining_labels > 0:
-             for idx, (domain, centroid) in enumerate(domain_centroids.items()):
-                 # Check if the domain is NaN and continue if true
-                 if pd.isna(domain) or (isinstance(domain, float) and np.isnan(domain)):
-                     continue  # Skip NaN domains
-                 if ids_to_keep and domain in ids_to_keep:
-                     continue  # Skip domains already handled by ids_to_keep
-
-                 # Handle ids_to_replace logic first
-                 if ids_to_replace and domain in ids_to_replace:
-                     terms = ids_to_replace[domain].split(" ")
-                 else:
-                     terms = self.graph.domain_id_to_domain_terms_map[domain].split(" ")
-
-                 # Apply words_to_omit, word length constraints, and max_words
-                 if words_to_omit:
-                     terms = [term for term in terms if term.lower() not in words_to_omit]
-
-                 terms = [term for term in terms if min_word_length <= len(term) <= max_word_length]
-                 terms = terms[:max_words]
-                 # Check if the domain passes the word count condition
-                 if len(terms) >= min_words:
-                     filtered_domain_centroids[domain] = centroid
-                     filtered_domain_terms[domain] = " ".join(terms)
-                     valid_indices.append(idx)  # Track the valid index
-
-                 # Stop once we've reached the max_labels limit
-                 if len(filtered_domain_centroids) >= max_labels:
-                     break
+             self._process_remaining_domains(
+                 domain_centroids,
+                 ids_to_keep,
+                 ids_to_replace,
+                 words_to_omit,
+                 min_word_length,
+                 max_word_length,
+                 max_words,
+                 min_words,
+                 max_labels,
+                 filtered_domain_centroids,
+                 filtered_domain_terms,
+                 valid_indices,
+             )

          # Calculate the bounding box around the network
          center, radius = _calculate_bounding_box(self.graph.node_coordinates, radius_margin=scale)
@@ -850,11 +808,19 @@
          best_label_positions = _calculate_best_label_positions(
              filtered_domain_centroids, center, radius, offset
          )
+         # Convert colors to RGBA using the _to_rgba helper function
+         fontcolor = _to_rgba(
+             fontcolor, fontalpha, num_repeats=len(self.graph.domain_id_to_node_ids_map)
+         )
+         arrow_color = _to_rgba(
+             arrow_color, arrow_alpha, num_repeats=len(self.graph.domain_id_to_node_ids_map)
+         )

          # Annotate the network with labels
          for idx, (domain, pos) in zip(valid_indices, best_label_positions.items()):
              centroid = filtered_domain_centroids[domain]
-             annotations = filtered_domain_terms[domain].split(" ")[:max_words]
+             # Split by special key to split annotation into multiple lines
+             annotations = filtered_domain_terms[domain].split("::::")
              self.ax.annotate(
                  "\n".join(annotations),
                  xy=centroid,
@@ -1001,6 +967,158 @@ class NetworkPlotter:
              domain_central_node = node_positions[central_node_idx]
          return domain_central_node

+     def _process_ids_to_keep(
+         self,
+         ids_to_keep: Union[List[str], Tuple[str], np.ndarray, None],
+         max_labels: Union[int, None],
+         domain_centroids: Dict[str, np.ndarray],
+         ids_to_replace: Union[Dict[str, str], None],
+         words_to_omit: Union[List[str], None],
+         min_word_length: int,
+         max_word_length: int,
+         max_words: int,
+         min_words: int,
+         filtered_domain_centroids: Dict[str, np.ndarray],
+         filtered_domain_terms: Dict[str, str],
+         valid_indices: List[int],
+     ) -> None:
+         """Process the ids_to_keep, apply filtering, and store valid domain centroids and terms.
+
+         Args:
+             ids_to_keep (list, tuple, np.ndarray, or None, optional): IDs of domains that must be labeled.
+             max_labels (int, optional): Maximum number of labels allowed.
+             domain_centroids (dict): Mapping of domains to their centroids.
+             ids_to_replace (dict, optional): A dictionary mapping domain IDs to custom labels. Defaults to None.
+             words_to_omit (list, optional): List of words to omit from the labels. Defaults to None.
+             min_word_length (int): Minimum allowed word length.
+             max_word_length (int): Maximum allowed word length.
+             max_words (int): Maximum number of words allowed.
+             min_words (int): Minimum number of words required for a domain.
+             filtered_domain_centroids (dict): Dictionary to store filtered domain centroids (output).
+             filtered_domain_terms (dict): Dictionary to store filtered domain terms (output).
+             valid_indices (list): List to store valid indices (output).
+
+         Note:
+             The `filtered_domain_centroids`, `filtered_domain_terms`, and `valid_indices` are modified in-place.
+
+         Raises:
+             ValueError: If the number of provided `ids_to_keep` exceeds `max_labels`.
+         """
+         # Convert ids_to_keep to a set for faster, unique lookups
+         ids_to_keep = set(ids_to_keep) if ids_to_keep else set()
+         # Check if the number of provided ids_to_keep exceeds max_labels
+         if max_labels is not None and len(ids_to_keep) > max_labels:
+             raise ValueError(
+                 f"Number of provided IDs ({len(ids_to_keep)}) exceeds max_labels ({max_labels})."
+             )
+
+         # Process each domain in ids_to_keep
+         for domain in ids_to_keep:
+             if domain in self.graph.domain_id_to_domain_terms_map and domain in domain_centroids:
+                 filtered_domain_terms[domain] = self._process_terms(
+                     domain=domain,
+                     ids_to_replace=ids_to_replace,
+                     words_to_omit=words_to_omit,
+                     min_word_length=min_word_length,
+                     max_word_length=max_word_length,
+                     max_words=max_words,
+                 )
+                 filtered_domain_centroids[domain] = domain_centroids[domain]
+                 valid_indices.append(list(domain_centroids.keys()).index(domain))
+
+     def _process_remaining_domains(
+         self,
+         domain_centroids: Dict[str, np.ndarray],
+         ids_to_keep: Union[List[str], Tuple[str], np.ndarray, None],
+         ids_to_replace: Union[Dict[str, str], None],
+         words_to_omit: Union[List[str], None],
+         min_word_length: int,
+         max_word_length: int,
+         max_words: int,
+         min_words: int,
+         max_labels: Union[int, None],
+         filtered_domain_centroids: Dict[str, np.ndarray],
+         filtered_domain_terms: Dict[str, str],
+         valid_indices: List[int],
+     ) -> None:
+         """Process remaining domains to fill in additional labels, if there are slots left.
+
+         Args:
+             domain_centroids (dict): Mapping of domains to their centroids.
+             ids_to_keep (list, tuple, np.ndarray, or None, optional): IDs of domains that must be labeled. Defaults to None.
+             ids_to_replace (dict, optional): A dictionary mapping domain IDs to custom labels. Defaults to None.
+             words_to_omit (list, optional): List of words to omit from the labels. Defaults to None.
+             min_word_length (int): Minimum allowed word length.
+             max_word_length (int): Maximum allowed word length.
+             max_words (int): Maximum number of words allowed.
+             min_words (int): Minimum number of words required for a domain.
+             max_labels (int, optional): Maximum number of labels allowed. Defaults to None.
+             filtered_domain_centroids (dict): Dictionary to store filtered domain centroids (output).
+             filtered_domain_terms (dict): Dictionary to store filtered domain terms (output).
+             valid_indices (list): List to store valid indices (output).
+
+         Note:
+             The `filtered_domain_centroids`, `filtered_domain_terms`, and `valid_indices` are modified in-place.
+         """
+         for idx, (domain, centroid) in enumerate(domain_centroids.items()):
+             # Check if the domain is NaN and continue if true
+             if pd.isna(domain) or (isinstance(domain, float) and np.isnan(domain)):
+                 continue  # Skip NaN domains
+             if ids_to_keep and domain in ids_to_keep:
+                 continue  # Skip domains already handled by ids_to_keep
+
+             filtered_domain_terms[domain] = self._process_terms(
+                 domain=domain,
+                 ids_to_replace=ids_to_replace,
+                 words_to_omit=words_to_omit,
+                 min_word_length=min_word_length,
+                 max_word_length=max_word_length,
+                 max_words=max_words,
+             )
+             filtered_domain_centroids[domain] = centroid
+             valid_indices.append(idx)
+
+     def _process_terms(
+         self,
+         domain: str,
+         ids_to_replace: Union[Dict[str, str], None],
+         words_to_omit: Union[List[str], None],
+         min_word_length: int,
+         max_word_length: int,
+         max_words: int,
+     ) -> List[str]:
+         """Process terms for a domain, applying word length constraints and combining words where appropriate.
+
+         Args:
+             domain (str): The domain being processed.
+             ids_to_replace (dict, optional): Dictionary mapping domain IDs to custom labels.
+             words_to_omit (list, optional): List of words to omit from the labels.
+             min_word_length (int): Minimum allowed word length.
+             max_word_length (int): Maximum allowed word length.
+             max_words (int): Maximum number of words allowed.
+
+         Returns:
+             list: Processed terms, with words combined if necessary to fit within constraints.
+         """
+         # Handle ids_to_replace logic
+         if ids_to_replace and domain in ids_to_replace:
+             terms = ids_to_replace[domain].split(" ")
+         else:
+             terms = self.graph.domain_id_to_domain_terms_map[domain].split(" ")
+
+         # Apply words_to_omit and word length constraints
+         if words_to_omit:
+             terms = [
+                 term
+                 for term in terms
+                 if term.lower() not in words_to_omit and len(term) >= min_word_length
+             ]
+
+         # Use the combine_words function directly to handle word combinations and length constraints
+         compressed_terms = _combine_words(tuple(terms), max_word_length, max_words)
+
+         return compressed_terms
+
      def get_annotated_node_colors(
          self,
          cmap: str = "gist_rainbow",
@@ -1254,7 +1372,9 @@ def _to_rgba(
      # Handle array of colors case (including strings, RGB, and RGBA)
      elif isinstance(color, (list, tuple, np.ndarray)):
          rgba_colors = []
-         for c in color:
+         for i in range(num_repeats):
+             # Reiterate over the colors if the number of repeats exceeds the number of colors
+             c = color[i % len(color)]
              # Ensure each element is either a valid string or a list/tuple of length 3 (RGB) or 4 (RGBA)
              if isinstance(c, str) or (
                  isinstance(c, (list, tuple, np.ndarray)) and len(c) in [3, 4]
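The `_to_rgba` change above replaces direct iteration over the supplied colors with modular indexing, so a short color list is recycled rather than truncated when `num_repeats` exceeds its length. A tiny sketch of that indexing pattern:

```python
# Three colors stretched over seven slots by cycling with the modulo operator.
colors = ["red", "green", "blue"]
num_repeats = 7

cycled = [colors[i % len(colors)] for i in range(num_repeats)]
print(cycled)  # ['red', 'green', 'blue', 'red', 'green', 'blue', 'red']
```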
@@ -1313,6 +1433,59 @@ def _calculate_bounding_box(
      return center, radius


+ def _combine_words(words: List[str], max_length: int, max_words: int) -> str:
+     """Combine words to fit within the max_length and max_words constraints,
+     and separate the final output by ':' for plotting.
+
+     Args:
+         words (List[str]): List of words to combine.
+         max_length (int): Maximum allowed length for a combined line.
+         max_words (int): Maximum number of lines (words) allowed.
+
+     Returns:
+         str: String of combined words separated by ':' for line breaks.
+     """
+
+     def try_combinations(words_batch: List[str]) -> List[str]:
+         """Try to combine words within a batch and return them with combined words separated by ':'."""
+         combined_lines = []
+         i = 0
+         while i < len(words_batch):
+             current_word = words_batch[i]
+             combined_word = current_word  # Start with the current word
+             # Try to combine more words if possible, and ensure the combination fits within max_length
+             for j in range(i + 1, len(words_batch)):
+                 next_word = words_batch[j]
+                 if len(combined_word) + len(next_word) + 2 <= max_length:  # +2 for ', '
+                     combined_word = f"{combined_word} {next_word}"
+                     i += 1  # Move past the combined word
+                 else:
+                     break  # Stop combining if the length is exceeded
+
+             combined_lines.append(combined_word)  # Add the combined word or single word
+             i += 1  # Move to the next word
+
+             # Stop if we've reached the max_words limit
+             if len(combined_lines) >= max_words:
+                 break
+
+         return combined_lines
+
+     # Main logic: start with max_words number of words
+     combined_lines = try_combinations(words[:max_words])
+     remaining_words = words[max_words:]  # Remaining words after the initial batch
+
+     # Continue pulling more words until we fill the lines
+     while remaining_words and len(combined_lines) < max_words:
+         available_slots = max_words - len(combined_lines)
+         words_to_add = remaining_words[:available_slots]
+         remaining_words = remaining_words[available_slots:]
+         combined_lines += try_combinations(words_to_add)
+
+     # Join the final combined lines with '::::', a special separator for line breaks
+     return "::::".join(combined_lines[:max_words])
+
+
  def _calculate_best_label_positions(
      filtered_domain_centroids: Dict[str, Any], center: np.ndarray, radius: float, offset: float
  ) -> Dict[str, Any]:
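`_combine_words` packs label terms into at most `max_words` lines joined by the `'::::'` separator, and the annotation loop shown earlier splits on that separator and rejoins the pieces with newlines. A small end-to-end sketch with invented terms:

```python
# Invented example; real terms come from domain_id_to_domain_terms_map.
combined = "golgi transport::::vesicle docking::::membrane fusion"

# Mirrors the annotation step: one '::::'-delimited chunk per label line.
label_text = "\n".join(combined.split("::::"))
print(label_text)
# golgi transport
# vesicle docking
# membrane fusion
```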
The changes to risk_network.egg-info/PKG-INFO are identical to the PKG-INFO diff shown above: the version bump from 0.0.8b1 to 0.0.8b2 and the same replacement of the packaged README text.
@@ -1,39 +0,0 @@
- <p align="center">
- <img src="https://i.imgur.com/Fo9EmnK.png" width="400" />
- </p>
-
- <p align="center">
- <a href="https://pypi.python.org/pypi/risk-network"><img src="https://img.shields.io/pypi/v/risk-network.svg" alt="pypiv"></a>
- <a href="https://www.python.org/downloads/"><img src="https://img.shields.io/badge/python-3.8+-blue.svg" alt="Python 3.8+"></a>
- <a href="https://raw.githubusercontent.com/irahorecka/chrono24/main/LICENSE"><img src="https://img.shields.io/badge/License-GPLv3-blue.svg" alt="License: GPL v3"></a>
- </p>
-
- ## RISK
-
- #### RISK Infers Spatial Kinships
-
- RISK is a software tool for visualizing spatial relationships in networks. It aims to enhance network analysis by integrating advanced network annotation algorithms, such as Louvain and Markov Clustering, to identify key functional modules and pathways.
-
- ## Features
-
- - Spatial analysis of biological networks
- - Functional enrichment detection
- - Optimized performance
-
- ## Example
-
- *Saccharomyces cerevisiae* proteins oriented by physical interactions discovered through affinity enrichment and mass spectrometry (Michaelis et al., 2023).
-
- ![PPI Network Demo](https://i.imgur.com/NnyK6nO.png)
-
- ## Installation
-
- Coming soon...
-
- ## Usage
-
- Coming soon...
-
- ## License
-
- This project is licensed under the GPL-3.0 license.