PyPI - risk-network - Versions diffs - 0.0.14b2__tar.gz → 0.0.15__tar.gz - Mend

risk-network 0.0.14b2tar.gz → 0.0.15tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (56)

risk_network-0.0.15/PKG-INFO ADDED

@@ -0,0 +1,109 @@
+Metadata-Version: 2.4
+Name: risk-network
+Version: 0.0.15
+Summary: A Python package for scalable network analysis and high-quality visualization.
+Author-email: Ira Horecka <ira89@icloud.com>
+License: GPL-3.0-or-later
+Project-URL: Homepage, https://github.com/riskportal/risk
+Project-URL: Issues, https://github.com/riskportal/risk/issues
+Classifier: Development Status :: 4 - Beta
+Classifier: Intended Audience :: Developers
+Classifier: Intended Audience :: Science/Research
+Classifier: License :: OSI Approved :: GNU General Public License v3 or later (GPLv3+)
+Classifier: Operating System :: OS Independent
+Classifier: Programming Language :: Python :: 3
+Classifier: Programming Language :: Python :: 3.8
+Classifier: Programming Language :: Python :: 3 :: Only
+Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
+Classifier: Topic :: Scientific/Engineering :: Information Analysis
+Classifier: Topic :: Scientific/Engineering :: Visualization
+Classifier: Topic :: Software Development :: Libraries :: Python Modules
+Requires-Python: >=3.8
+Description-Content-Type: text/markdown
+License-File: LICENSE
+Requires-Dist: ipywidgets
+Requires-Dist: leidenalg
+Requires-Dist: markov_clustering
+Requires-Dist: matplotlib
+Requires-Dist: networkx
+Requires-Dist: nltk
+Requires-Dist: numpy
+Requires-Dist: openpyxl
+Requires-Dist: pandas
+Requires-Dist: python-igraph
+Requires-Dist: python-louvain
+Requires-Dist: scikit-learn
+Requires-Dist: scipy
+Requires-Dist: statsmodels
+Requires-Dist: threadpoolctl
+Requires-Dist: tqdm
+Dynamic: license-file
+# RISK
+![Python](https://img.shields.io/badge/python-3.8%2B-yellow)
+[![pypiv](https://img.shields.io/pypi/v/risk-network.svg)](https://pypi.python.org/pypi/risk-network)
+![License](https://img.shields.io/badge/license-GPLv3-purple)
+[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.xxxxxxx.svg)](https://doi.org/10.5281/zenodo.xxxxxxx)
+![Downloads](https://img.shields.io/pypi/dm/risk-network)
+![Tests](https://github.com/riskportal/risk/actions/workflows/ci.yml/badge.svg)
+**RISK** (Regional Inference of Significant Kinships) is a next-generation tool for biological network annotation and visualization. It integrates community detection algorithms, rigorous overrepresentation analysis, and a modular framework for diverse network types. RISK identifies biologically coherent relationships within networks and generates publication-ready visualizations, making it a useful tool for biological and interdisciplinary network analysis.
+For a full description of RISK and its applications, see:
+<br>
+**Horecka and Röst (2025)**, _"RISK: a next-generation tool for biological network annotation and visualization"_.
+<br>
+DOI: [10.5281/zenodo.xxxxxxx](https://doi.org/10.5281/zenodo.xxxxxxx)
+## Documentation and Tutorial
+Full documentation is available at:
+- **Docs:** [https://riskportal.github.io/risk-docs](https://riskportal.github.io/risk-docs)
+- **Tutorial Jupyter Notebook Repository:** [https://github.com/riskportal/risk-docs](https://github.com/riskportal/risk-docs)
+## Installation
+RISK is compatible with Python 3.8 or later and runs on all major operating systems. To install the latest version of RISK, run:
+```bash
+pip install risk-network --upgrade
+```
+## Key Features of RISK
+- **Broad Data Compatibility**: Accepts multiple network formats (Cytoscape, Cytoscape JSON, GPickle, NetworkX) and user-provided annotations formatted as term–to–gene membership tables (JSON, CSV, TSV, Excel, Python dictionaries).
+- **Flexible Clustering**: Offers Louvain, Leiden, Markov Clustering, Greedy Modularity, Label Propagation, Spinglass, and Walktrap, with user-defined resolution parameters to detect both coarse and fine-grained modules.
+- **Statistical Testing**: Provides permutation, hypergeometric, chi-squared, and binomial tests, balancing statistical rigor with speed.
+- **High-Resolution Visualization**: Generates publication-ready figures with customizable node/edge properties, contour overlays, and export to SVG, PNG, or PDF.
+## Example Usage
+We applied RISK to a _Saccharomyces cerevisiae_ protein–protein interaction (PPI) network (Michaelis _et al_., 2023; 3,839 proteins, 30,955 interactions). RISK identified compact, functional modules overrepresented in Gene Ontology Biological Process (GO BP) terms (Ashburner _et al_., 2000), revealing biological organization including ribosomal assembly, mitochondrial organization, and RNA polymerase activity (P < 0.0001).
+[![RISK analysis of the yeast PPI network](https://i.imgur.com/fSNf5Ad.jpeg)](https://i.imgur.com/fSNf5Ad.jpeg)
+**RISK workflow overview and analysis of the yeast PPI network**. GO BP terms are color-coded to represent key cellular processes—including ribosomal assembly, mitochondrial organization, and RNA polymerase activity (P < 0.0001).
+## Citation
+If you use RISK in your research, please cite the following:
+**Horecka and Röst (2025)**, _"RISK: a next-generation tool for biological network annotation and visualization"_.
+<br>
+DOI: [10.5281/zenodo.xxxxxxx](https://doi.org/10.5281/zenodo.xxxxxxx)
+## Contributing
+We welcome contributions from the community:
+- [Issues Tracker](https://github.com/riskportal/risk/issues)
+- [Source Code](https://github.com/riskportal/risk/tree/main/risk)
+## Support
+If you encounter issues or have suggestions for new features, please use the [Issues Tracker](https://github.com/riskportal/risk/issues) on GitHub.
+## License
+RISK is open source under the [GNU General Public License v3.0](https://www.gnu.org/licenses/gpl-3.0.en.html).
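
PKG-INFO stores its fields as RFC 822-style headers, so the metadata added above can be read with Python's standard `email` parser. The following is a minimal sketch, assuming a shortened excerpt of the headers shown in this diff; the `sample` string and the `read_metadata` helper are illustrative and not part of the package.

```python
from email.parser import Parser

# Shortened excerpt of the PKG-INFO headers shown in this diff (not the full file).
sample = """\
Metadata-Version: 2.4
Name: risk-network
Version: 0.0.15
Requires-Python: >=3.8
Requires-Dist: networkx
Requires-Dist: numpy
Requires-Dist: scipy
"""


def read_metadata(text):
    """Return (name, version, sorted dependency list) from core-metadata text."""
    msg = Parser().parsestr(text)
    # Requires-Dist may appear many times, so get_all collects every occurrence.
    deps = sorted(msg.get_all("Requires-Dist") or [])
    return msg["Name"], msg["Version"], deps


name, version, deps = read_metadata(sample)
print(name, version, deps)
```

The same helper could be pointed at the full PKG-INFO text to list all sixteen `Requires-Dist` entries.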

risk_network-0.0.15/README.md ADDED

@@ -0,0 +1,68 @@
+# RISK
+![Python](https://img.shields.io/badge/python-3.8%2B-yellow)
+[![pypiv](https://img.shields.io/pypi/v/risk-network.svg)](https://pypi.python.org/pypi/risk-network)
+![License](https://img.shields.io/badge/license-GPLv3-purple)
+[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.xxxxxxx.svg)](https://doi.org/10.5281/zenodo.xxxxxxx)
+![Downloads](https://img.shields.io/pypi/dm/risk-network)
+![Tests](https://github.com/riskportal/risk/actions/workflows/ci.yml/badge.svg)
+**RISK** (Regional Inference of Significant Kinships) is a next-generation tool for biological network annotation and visualization. It integrates community detection algorithms, rigorous overrepresentation analysis, and a modular framework for diverse network types. RISK identifies biologically coherent relationships within networks and generates publication-ready visualizations, making it a useful tool for biological and interdisciplinary network analysis.
+For a full description of RISK and its applications, see:
+<br>
+**Horecka and Röst (2025)**, _"RISK: a next-generation tool for biological network annotation and visualization"_.
+<br>
+DOI: [10.5281/zenodo.xxxxxxx](https://doi.org/10.5281/zenodo.xxxxxxx)
+## Documentation and Tutorial
+Full documentation is available at:
+- **Docs:** [https://riskportal.github.io/risk-docs](https://riskportal.github.io/risk-docs)
+- **Tutorial Jupyter Notebook Repository:** [https://github.com/riskportal/risk-docs](https://github.com/riskportal/risk-docs)
+## Installation
+RISK is compatible with Python 3.8 or later and runs on all major operating systems. To install the latest version of RISK, run:
+```bash
+pip install risk-network --upgrade
+```
+## Key Features of RISK
+- **Broad Data Compatibility**: Accepts multiple network formats (Cytoscape, Cytoscape JSON, GPickle, NetworkX) and user-provided annotations formatted as term–to–gene membership tables (JSON, CSV, TSV, Excel, Python dictionaries).
+- **Flexible Clustering**: Offers Louvain, Leiden, Markov Clustering, Greedy Modularity, Label Propagation, Spinglass, and Walktrap, with user-defined resolution parameters to detect both coarse and fine-grained modules.
+- **Statistical Testing**: Provides permutation, hypergeometric, chi-squared, and binomial tests, balancing statistical rigor with speed.
+- **High-Resolution Visualization**: Generates publication-ready figures with customizable node/edge properties, contour overlays, and export to SVG, PNG, or PDF.
+## Example Usage
+We applied RISK to a _Saccharomyces cerevisiae_ protein–protein interaction (PPI) network (Michaelis _et al_., 2023; 3,839 proteins, 30,955 interactions). RISK identified compact, functional modules overrepresented in Gene Ontology Biological Process (GO BP) terms (Ashburner _et al_., 2000), revealing biological organization including ribosomal assembly, mitochondrial organization, and RNA polymerase activity (P < 0.0001).
+[![RISK analysis of the yeast PPI network](https://i.imgur.com/fSNf5Ad.jpeg)](https://i.imgur.com/fSNf5Ad.jpeg)
+**RISK workflow overview and analysis of the yeast PPI network**. GO BP terms are color-coded to represent key cellular processes—including ribosomal assembly, mitochondrial organization, and RNA polymerase activity (P < 0.0001).
+## Citation
+If you use RISK in your research, please cite the following:
+**Horecka and Röst (2025)**, _"RISK: a next-generation tool for biological network annotation and visualization"_.
+<br>
+DOI: [10.5281/zenodo.xxxxxxx](https://doi.org/10.5281/zenodo.xxxxxxx)
+## Contributing
+We welcome contributions from the community:
+- [Issues Tracker](https://github.com/riskportal/risk/issues)
+- [Source Code](https://github.com/riskportal/risk/tree/main/risk)
+## Support
+If you encounter issues or have suggestions for new features, please use the [Issues Tracker](https://github.com/riskportal/risk/issues) on GitHub.
+## License
+RISK is open source under the [GNU General Public License v3.0](https://www.gnu.org/licenses/gpl-3.0.en.html).

{risk_network-0.0.14b2 → risk_network-0.0.15}/pyproject.toml RENAMED

@@ -48,8 +48,8 @@ dependencies = [
 text = "GPL-3.0-or-later"
 [project.urls]
-Homepage = "https://github.com/riskportal/network"
-Issues = "https://github.com/riskportal/network/issues"
+Homepage = "https://github.com/riskportal/risk"
+Issues = "https://github.com/riskportal/risk/issues"
 [tool.setuptools]
 package-dir = {"" = "src"}

{risk_network-0.0.14b2 → risk_network-0.0.15}/src/risk/__init__.py RENAMED

@@ -8,4 +8,4 @@ RISK: Regional Inference of Significant Kinships
 from ._risk import RISK
 __all__ = ["RISK"]
-__version__ = "0.0.14-beta.2"
+__version__ = "0.0.15"

{risk_network-0.0.14b2 → risk_network-0.0.15}/src/risk/_neighborhoods/_api.py RENAMED

@@ -17,8 +17,6 @@ from ._stats import (
     compute_chi2_test,
     compute_hypergeom_test,
     compute_permutation_test,
-    compute_poisson_test,
-    compute_zscore_test,
 )
@@ -226,98 +224,6 @@ class NeighborhoodsAPI:
             max_workers=max_workers,
         )
-    def load_neighborhoods_poisson(
-        self,
-        network: nx.Graph,
-        annotation: Dict[str, Any],
-        distance_metric: Union[str, List, Tuple, np.ndarray] = "louvain",
-        louvain_resolution: float = 0.1,
-        leiden_resolution: float = 1.0,
-        fraction_shortest_edges: Union[float, List, Tuple, np.ndarray] = 0.5,
-        null_distribution: str = "network",
-        random_seed: int = 888,
-    ) -> Dict[str, Any]:
-        """
-        Load significant neighborhoods for the network using the Poisson test.
-        Args:
-            network (nx.Graph): The network graph.
-            annotation (Dict[str, Any]): The annotation associated with the network.
-            distance_metric (str, List, Tuple, or np.ndarray, optional): The distance metric(s) to use. Can be a string for one
-                metric or a list/tuple/ndarray of metrics ('greedy_modularity', 'louvain', 'leiden', 'label_propagation',
-                'markov_clustering', 'walktrap', 'spinglass'). Defaults to 'louvain'.
-            louvain_resolution (float, optional): Resolution parameter for Louvain clustering. Defaults to 0.1.
-            leiden_resolution (float, optional): Resolution parameter for Leiden clustering. Defaults to 1.0.
-            fraction_shortest_edges (float, List, Tuple, or np.ndarray, optional): Shortest edge rank fraction threshold(s) for creating subgraphs.
-                Can be a single float for one threshold or a list/tuple of floats corresponding to multiple thresholds.
-                Defaults to 0.5.
-            null_distribution (str, optional): Type of null distribution ('network' or 'annotation'). Defaults to "network".
-            random_seed (int, optional): Seed for random number generation. Defaults to 888.
-        Returns:
-            Dict[str, Any]: Computed significance of neighborhoods.
-        """
-        log_header("Running Poisson test")
-        # Compute neighborhood significance using the Poisson test
-        return self._load_neighborhoods_by_statistical_test(
-            network=network,
-            annotation=annotation,
-            distance_metric=distance_metric,
-            louvain_resolution=louvain_resolution,
-            leiden_resolution=leiden_resolution,
-            fraction_shortest_edges=fraction_shortest_edges,
-            null_distribution=null_distribution,
-            random_seed=random_seed,
-            statistical_test_key="poisson",
-            statistical_test_function=compute_poisson_test,
-        )
-    def load_neighborhoods_zscore(
-        self,
-        network: nx.Graph,
-        annotation: Dict[str, Any],
-        distance_metric: Union[str, List, Tuple, np.ndarray] = "louvain",
-        louvain_resolution: float = 0.1,
-        leiden_resolution: float = 1.0,
-        fraction_shortest_edges: Union[float, List, Tuple, np.ndarray] = 0.5,
-        null_distribution: str = "network",
-        random_seed: int = 888,
-    ) -> Dict[str, Any]:
-        """
-        Load significant neighborhoods for the network using the z-score test.
-        Args:
-            network (nx.Graph): The network graph.
-            annotation (Dict[str, Any]): The annotation associated with the network.
-            distance_metric (str, List, Tuple, or np.ndarray, optional): The distance metric(s) to use. Can be a string for one
-                metric or a list/tuple/ndarray of metrics ('greedy_modularity', 'louvain', 'leiden', 'label_propagation',
-                'markov_clustering', 'walktrap', 'spinglass'). Defaults to 'louvain'.
-            louvain_resolution (float, optional): Resolution parameter for Louvain clustering. Defaults to 0.1.
-            leiden_resolution (float, optional): Resolution parameter for Leiden clustering. Defaults to 1.0.
-            fraction_shortest_edges (float, List, Tuple, or np.ndarray, optional): Shortest edge rank fraction threshold(s) for creating subgraphs.
-                Can be a single float for one threshold or a list/tuple of floats corresponding to multiple thresholds.
-                Defaults to 0.5.
-            null_distribution (str, optional): Type of null distribution ('network' or 'annotation'). Defaults to "network".
-            random_seed (int, optional): Seed for random number generation. Defaults to 888.
-        Returns:
-            Dict[str, Any]: Computed significance of neighborhoods.
-        """
-        log_header("Running z-score test")
-        # Compute neighborhood significance using the z-score test
-        return self._load_neighborhoods_by_statistical_test(
-            network=network,
-            annotation=annotation,
-            distance_metric=distance_metric,
-            louvain_resolution=louvain_resolution,
-            leiden_resolution=leiden_resolution,
-            fraction_shortest_edges=fraction_shortest_edges,
-            null_distribution=null_distribution,
-            random_seed=random_seed,
-            statistical_test_key="zscore",
-            statistical_test_function=compute_zscore_test,
-        )
     def _load_neighborhoods_by_statistical_test(
         self,
         network: nx.Graph,
@@ -348,7 +254,7 @@ class NeighborhoodsAPI:
             null_distribution (str, optional): The type of null distribution to use ('network' or 'annotation').
                 Defaults to "network".
             random_seed (int, optional): Seed for random number generation to ensure reproducibility. Defaults to 888.
-            statistical_test_key (str, optional): Key or name of the statistical test to be applied (e.g., "hypergeom", "poisson").
+            statistical_test_key (str, optional): Key or name of the statistical test to be applied (e.g., "hypergeom", "binom").
                 Used for logging and debugging. Defaults to "hypergeom".
             statistical_test_function (Any, optional): The function implementing the statistical test.
                 It should accept neighborhoods, annotation, null distribution, and additional kwargs.
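
The hunks above remove `load_neighborhoods_poisson` and `load_neighborhoods_zscore` from `NeighborhoodsAPI`, so code written against 0.0.14b2 that calls either method would raise `AttributeError` on 0.0.15. Below is a rough sketch of a guard that reports which of the removed names an object still exposes. The two stub classes and the `some_remaining_loader` method are hypothetical stand-ins so the example runs without the package installed.

```python
# Loader names removed between 0.0.14b2 and 0.0.15, according to this diff.
REMOVED_LOADERS = ("load_neighborhoods_poisson", "load_neighborhoods_zscore")


def find_removed_loaders(api_obj):
    """Return the removed loader names that api_obj still exposes."""
    return [name for name in REMOVED_LOADERS if hasattr(api_obj, name)]


class OldStyleAPI:
    """Hypothetical stand-in for a 0.0.14b2-era object; not the real class."""

    def load_neighborhoods_poisson(self):
        pass

    def load_neighborhoods_zscore(self):
        pass


class NewStyleAPI:
    """Hypothetical stand-in for a 0.0.15-era object; not the real class."""

    def some_remaining_loader(self):
        pass


print(find_removed_loaders(OldStyleAPI()))
print(find_removed_loaders(NewStyleAPI()))
```

An empty list means the object no longer offers the Poisson or z-score loaders, and callers would need to switch to one of the tests that remain in the import list above (binomial, chi-squared, hypergeometric, or permutation).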

{risk_network-0.0.14b2 → risk_network-0.0.15}/src/risk/_neighborhoods/_domains.py RENAMED

@@ -54,37 +54,48 @@ def define_domains(
     Raises:
         ValueError: If the clustering criterion is set to "off" or if an error occurs during clustering.
     """
-    try:
-        if linkage_criterion == "off":
-            raise ValueError("Clustering is turned off.")
+    # Validate args first; let user mistakes raise immediately
+    clustering_off = _validate_clustering_args(
+        linkage_criterion, linkage_method, linkage_metric, linkage_threshold
+    )
+    # If clustering is turned off, assign unique domains and skip
+    if clustering_off:
+        n_rows = len(top_annotation)
+        logger.warning("Clustering is turned off. Skipping clustering.")
+        top_annotation["domain"] = range(1, n_rows + 1)
+    else:
         # Transpose the matrix to cluster annotations
         m = significant_neighborhoods_significance[:, top_annotation["significant_annotation"]].T
         # Safeguard the matrix by replacing NaN, Inf, and -Inf values
         m = _safeguard_matrix(m)
-        # Optimize silhouette score across different linkage methods and distance metrics
-        best_linkage, best_metric, best_threshold = _optimize_silhouette_across_linkage_and_metrics(
-            m, linkage_criterion, linkage_method, linkage_metric, linkage_threshold
-        )
-        # Perform hierarchical clustering
-        Z = linkage(m, method=best_linkage, metric=best_metric)
-        logger.warning(
-            f"Linkage criterion: '{linkage_criterion}'\nLinkage method: '{best_linkage}'\nLinkage metric: '{best_metric}'\nLinkage threshold: {round(best_threshold, 3)}"
-        )
-        # Calculate the optimal threshold for clustering
-        max_d_optimal = np.max(Z[:, 2]) * best_threshold
-        # Assign domains to the annotation matrix
-        domains = fcluster(Z, max_d_optimal, criterion=linkage_criterion)
-        top_annotation["domain"] = 0
-        top_annotation.loc[top_annotation["significant_annotation"], "domain"] = domains
-    except (ValueError, LinAlgError):
-        # If a ValueError is encountered, handle it by assigning unique domains
-        n_rows = len(top_annotation)
-        if linkage_criterion == "off":
-            logger.warning("Clustering is turned off. Skipping clustering.")
-        else:
-            logger.error("Error encountered. Skipping clustering.")
-        top_annotation["domain"] = range(1, n_rows + 1)  # Assign unique domains
+        try:
+            # Optimize silhouette score across different linkage methods and distance metrics
+            (
+                best_linkage,
+                best_metric,
+                best_threshold,
+            ) = _optimize_silhouette_across_linkage_and_metrics(
+                m, linkage_criterion, linkage_method, linkage_metric, linkage_threshold
+            )
+            # Perform hierarchical clustering
+            Z = linkage(m, method=best_linkage, metric=best_metric)
+            logger.warning(
+                f"Linkage criterion: '{linkage_criterion}'\nLinkage method: '{best_linkage}'\nLinkage metric: '{best_metric}'\nLinkage threshold: {round(best_threshold, 3)}"
+            )
+            # Calculate the optimal threshold for clustering
+            max_d_optimal = np.max(Z[:, 2]) * best_threshold
+            # Assign domains to the annotation matrix
+            domains = fcluster(Z, max_d_optimal, criterion=linkage_criterion)
+            top_annotation["domain"] = 0
+            top_annotation.loc[top_annotation["significant_annotation"], "domain"] = domains
+        except (LinAlgError, ValueError):
+            # Numerical errors or degenerate input are handled gracefully (not user error)
+            n_rows = len(top_annotation)
+            logger.error(
+                "Clustering failed due to numerical or data degeneracy. Assigning unique domains."
+            )
+            top_annotation["domain"] = range(1, n_rows + 1)
     # Create DataFrames to store domain information
     node_to_significance = pd.DataFrame(
@@ -184,6 +195,46 @@ def trim_domains(
     return valid_domains, valid_trimmed_domains_matrix
+def _validate_clustering_args(
+    linkage_criterion: str,
+    linkage_method: str,
+    linkage_metric: str,
+    linkage_threshold: Union[float, str],
+) -> bool:
+    """
+    Validate user-provided clustering arguments.
+    Returns:
+        bool: True if clustering is turned off (criterion == 'off'); False otherwise.
+    Raises:
+        ValueError: If any argument is invalid (user error).
+    """
+    # Allow opting out of clustering without raising
+    if linkage_criterion == "off":
+        return True
+    # Validate linkage method (allow "auto")
+    if linkage_method != "auto" and linkage_method not in LINKAGE_METHODS:
+        raise ValueError(
+            f"Invalid linkage_method '{linkage_method}'. Allowed values are 'auto' or one of: {sorted(LINKAGE_METHODS)}"
+        )
+    # Validate linkage metric (allow "auto")
+    if linkage_metric != "auto" and linkage_metric not in LINKAGE_METRICS:
+        raise ValueError(
+            f"Invalid linkage_metric '{linkage_metric}'. Allowed values are 'auto' or one of: {sorted(LINKAGE_METRICS)}"
+        )
+    # Validate linkage threshold (allow "auto"; otherwise must be float in (0, 1])
+    if linkage_threshold != "auto":
+        try:
+            lt = float(linkage_threshold)
+        except (TypeError, ValueError):
+            raise ValueError("linkage_threshold must be 'auto' or a float in the interval (0, 1].")
+        if not (0.0 < lt <= 1.0):
+            raise ValueError(f"linkage_threshold must be within (0, 1]. Received: {lt}")
+    return False
 def _safeguard_matrix(matrix: np.ndarray) -> np.ndarray:
     """
     Safeguard the matrix by replacing NaN, Inf, and -Inf values.
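
The new `_validate_clustering_args` rejects bad user input before any clustering runs, instead of letting it fall into the old catch-all `except`. To show which threshold values pass, here is a standalone copy of just the `linkage_threshold` branch. It is a sketch rather than the package's code: the name `check_linkage_threshold`, the return value, and the `from None` on the re-raise are additions made for illustration.

```python
def check_linkage_threshold(value):
    """Standalone copy of the threshold branch of _validate_clustering_args.

    Returns "auto" or the parsed float; raises ValueError for anything else.
    """
    if value == "auto":
        return "auto"
    try:
        lt = float(value)
    except (TypeError, ValueError):
        # "from None" hides the original conversion error (illustrative choice).
        raise ValueError(
            "linkage_threshold must be 'auto' or a float in the interval (0, 1]."
        ) from None
    # The interval is open at 0 and closed at 1, as in the diff.
    if not (0.0 < lt <= 1.0):
        raise ValueError(f"linkage_threshold must be within (0, 1]. Received: {lt}")
    return lt


for candidate in ("auto", 0.2, "0.5", 1.0, 0, 1.5, "tight", None):
    try:
        print(repr(candidate), "->", check_linkage_threshold(candidate))
    except ValueError as exc:
        print(repr(candidate), "-> ValueError:", exc)
```

Because the value goes through `float()`, numeric strings such as `"0.5"` are accepted, while `0`, values above 1, non-numeric strings, and `None` are rejected.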

{risk_network-0.0.14b2 → risk_network-0.0.15}/src/risk/_neighborhoods/_neighborhoods.py RENAMED

@@ -394,34 +394,33 @@ def _prune_neighbors(
     # Identify indices with non-zero rows in the binary significance matrix
     non_zero_indices = np.where(significant_binary_significance_matrix.sum(axis=1) != 0)[0]
     median_distances = []
+    distance_lookup = {}
     for node in non_zero_indices:
-        neighbors = [
-            n
-            for n in network.neighbors(node)
-            if significant_binary_significance_matrix[n].sum() != 0
-        ]
-        if neighbors:
-            median_distance = np.median(
-                [_get_euclidean_distance(node, n, network) for n in neighbors]
-            )
-            median_distances.append(median_distance)
+        dist = _median_distance_to_significant_neighbors(
+            node, network, significant_binary_significance_matrix
+        )
+        if dist is not None:
+            median_distances.append(dist)
+            distance_lookup[node] = dist
+    if not median_distances:
+        logger.warning("No significant neighbors found for pruning.")
+        significant_significance_matrix = np.where(
+            significant_binary_significance_matrix == 1, significance_matrix, 0
+        )
+        return (
+            significance_matrix,
+            significant_binary_significance_matrix,
+            significant_significance_matrix,
+        )
     # Calculate the distance threshold value based on rank
     distance_threshold_value = _calculate_threshold(median_distances, 1 - distance_threshold)
     # Prune nodes that are outliers based on the distance threshold
-    for row_index in non_zero_indices:
-        neighbors = [
-            n
-            for n in network.neighbors(row_index)
-            if significant_binary_significance_matrix[n].sum() != 0
-        ]
-        if neighbors:
-            median_distance = np.median(
-                [_get_euclidean_distance(row_index, n, network) for n in neighbors]
-            )
-            if median_distance >= distance_threshold_value:
-                significance_matrix[row_index] = 0
-                significant_binary_significance_matrix[row_index] = 0
+    for node, dist in distance_lookup.items():
+        if dist >= distance_threshold_value:
+            significance_matrix[node] = 0
+            significant_binary_significance_matrix[node] = 0
     # Create a matrix where non-significant entries are set to zero
     significant_significance_matrix = np.where(
@@ -435,6 +434,29 @@ def _prune_neighbors(
     )
+def _median_distance_to_significant_neighbors(
+    node, network, significance_mask
+) -> Union[float, None]:
+    """
+    Calculate the median distance from a node to its significant neighbors.
+    Args:
+        node (Any): The node for which the median distance is being calculated.
+        network (nx.Graph): The network graph containing the nodes.
+        significance_mask (np.ndarray): Binary matrix indicating significant nodes.
+    Returns:
+        Union[float, None]: The median distance to significant neighbors, or None if no significant neighbors exist.
+    """
+    neighbors = [n for n in network.neighbors(node) if significance_mask[n].sum() != 0]
+    if not neighbors:
+        return None
+    # Calculate distances to significant neighbors
+    distances = [_get_euclidean_distance(node, n, network) for n in neighbors]
+    return np.median(distances)
 def _get_euclidean_distance(node1: Any, node2: Any, network: nx.Graph) -> float:
     """
     Calculate the Euclidean distance between two nodes in the network.
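
The refactor of `_prune_neighbors` computes each node's median distance once, stores it in `distance_lookup`, and reuses it in the pruning loop. The sketch below shows that compute-once-then-prune pattern using only the standard library. The adjacency dictionary, the coordinate dictionary, and the fixed `cutoff` are assumptions made for illustration: the package works on a `networkx` graph and derives its cutoff with `_calculate_threshold`, which this diff does not show.

```python
import math
from statistics import median


def median_neighbor_distance(node, adjacency, positions, significant):
    """Median Euclidean distance from node to its significant neighbors, or None."""
    neighbors = [n for n in adjacency[node] if n in significant]
    if not neighbors:
        return None
    return median(math.dist(positions[node], positions[n]) for n in neighbors)


def prune(adjacency, positions, significant, cutoff):
    """Return the significant nodes kept after pruning at a fixed distance cutoff."""
    lookup = {}
    for node in significant:
        dist = median_neighbor_distance(node, adjacency, positions, significant)
        if dist is not None:
            lookup[node] = dist  # computed once, reused below
    if not lookup:
        # Mirrors the early return in the diff: nothing to prune.
        return set(significant)
    pruned = {node for node, dist in lookup.items() if dist >= cutoff}
    return set(significant) - pruned


# Hypothetical toy graph: "d" has no neighbors, so it is never a pruning candidate.
adjacency = {"a": ["b", "c"], "b": ["a"], "c": ["a"], "d": []}
positions = {"a": (0, 0), "b": (1, 0), "c": (0, 3), "d": (5, 5)}
kept = prune(adjacency, positions, {"a", "b", "c", "d"}, cutoff=2.5)
print(sorted(kept))
```

One consequence visible in the hunk: the old second loop re-evaluated neighbors against a mask it was zeroing as it went, whereas the new loop reuses distances computed before any row is zeroed.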

{risk_network-0.0.14b2 → risk_network-0.0.15}/src/risk/_neighborhoods/_stats/__init__.py RENAMED

@@ -8,6 +8,4 @@ from ._tests import (
     compute_binom_test,
     compute_chi2_test,
     compute_hypergeom_test,
-    compute_poisson_test,
-    compute_zscore_test,
 )

{risk_network-0.0.14b2 → risk_network-0.0.15}/src/risk/_neighborhoods/_stats/_tests.py RENAMED

@@ -7,7 +7,7 @@ from typing import Any, Dict
 import numpy as np
 from scipy.sparse import csr_matrix
-from scipy.stats import binom, chi2, hypergeom, norm, poisson
+from scipy.stats import binom, chi2, hypergeom, norm
 def compute_binom_test(
@@ -174,107 +174,3 @@ def compute_hypergeom_test(
     )
     return {"depletion_pvals": depletion_pvals, "enrichment_pvals": enrichment_pvals}
-def compute_poisson_test(
-    neighborhoods: csr_matrix,
-    annotation: csr_matrix,
-    null_distribution: str = "network",
-) -> Dict[str, Any]:
-    """
-    Compute Poisson test for enrichment and depletion in neighborhoods with selectable null distribution.
-    Args:
-        neighborhoods (csr_matrix): Sparse binary matrix representing neighborhoods.
-        annotation (csr_matrix): Sparse binary matrix representing annotation.
-        null_distribution (str, optional): Type of null distribution ('network' or 'annotation'). Defaults to "network".
-    Returns:
-        Dict[str, Any]: Dictionary containing depletion and enrichment p-values.
-    Raises:
-        ValueError: If an invalid null_distribution value is provided.
-    """
-    # Matrix multiplication to get the number of annotated nodes in each neighborhood
-    annotated_in_neighborhood = neighborhoods @ annotation  # Sparse result
-    # Convert annotated counts to dense for downstream calculations
-    annotated_in_neighborhood_dense = annotated_in_neighborhood.toarray()
-    # Compute lambda_expected based on the chosen null distribution
-    if null_distribution == "network":
-        # Use the mean across neighborhoods (axis=1)
-        lambda_expected = np.mean(annotated_in_neighborhood_dense, axis=1, keepdims=True)
-    elif null_distribution == "annotation":
-        # Use the mean across annotations (axis=0)
-        lambda_expected = np.mean(annotated_in_neighborhood_dense, axis=0, keepdims=True)
-    else:
-        raise ValueError(
-            "Invalid null_distribution value. Choose either 'network' or 'annotation'."
-        )
-    # Compute p-values for enrichment and depletion using Poisson distribution
-    enrichment_pvals = 1 - poisson.cdf(annotated_in_neighborhood_dense - 1, lambda_expected)
-    depletion_pvals = poisson.cdf(annotated_in_neighborhood_dense, lambda_expected)
-    return {"enrichment_pvals": enrichment_pvals, "depletion_pvals": depletion_pvals}
-def compute_zscore_test(
-    neighborhoods: csr_matrix,
-    annotation: csr_matrix,
-    null_distribution: str = "network",
-) -> Dict[str, Any]:
-    """
-    Compute z-score test for enrichment and depletion in neighborhoods with selectable null distribution.
-    Args:
-        neighborhoods (csr_matrix): Sparse binary matrix representing neighborhoods.
-        annotation (csr_matrix): Sparse binary matrix representing annotation.
-        null_distribution (str, optional): Type of null distribution ('network' or 'annotation'). Defaults to "network".
-    Returns:
-        Dict[str, Any]: Dictionary containing depletion and enrichment p-values.
-    Raises:
-        ValueError: If an invalid null_distribution value is provided.
-    """
-    # Total number of nodes in the network
-    total_node_count = neighborhoods.shape[1]
-    # Compute sums
-    if null_distribution == "network":
-        background_population = total_node_count
-        neighborhood_sums = neighborhoods.sum(axis=0).A.flatten()  # Dense column sums
-        annotation_sums = annotation.sum(axis=0).A.flatten()  # Dense row sums
-    elif null_distribution == "annotation":
-        annotated_nodes = annotation.sum(axis=1).A.flatten() > 0  # Dense boolean mask
-        background_population = annotated_nodes.sum()
-        neighborhood_sums = neighborhoods[annotated_nodes].sum(axis=0).A.flatten()
-        annotation_sums = annotation[annotated_nodes].sum(axis=0).A.flatten()
-    else:
-        raise ValueError(
-            "Invalid null_distribution value. Choose either 'network' or 'annotation'."
-        )
-    # Observed values
-    observed = (neighborhoods.T @ annotation).toarray()  # Convert sparse result to dense
-    # Expected values under the null
-    neighborhood_sums = neighborhood_sums.reshape(-1, 1)  # Ensure correct shape
-    annotation_sums = annotation_sums.reshape(1, -1)  # Ensure correct shape
-    expected = (neighborhood_sums @ annotation_sums) / background_population
-    # Standard deviation under the null
-    std_dev = np.sqrt(
-        expected
-        * (1 - annotation_sums / background_population)
-        * (1 - neighborhood_sums / background_population)
-    )
-    std_dev[std_dev == 0] = np.nan  # Avoid division by zero
-    # Compute z-scores
-    z_scores = (observed - expected) / std_dev
-    # Convert z-scores to depletion and enrichment p-values
-    enrichment_pvals = norm.sf(z_scores)  # Upper tail
-    depletion_pvals = norm.cdf(z_scores)  # Lower tail
-    return {"depletion_pvals": depletion_pvals, "enrichment_pvals": enrichment_pvals}
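
For readers who relied on the removed Poisson test, its two p-values were the upper tail `1 - cdf(k - 1, lambda)` and the lower tail `cdf(k, lambda)`. The sketch below reproduces that arithmetic for a single count using only the standard library. The function names are illustrative; the removed code applied `scipy.stats.poisson` to whole matrices instead.

```python
import math


def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Poisson(lam); returns 0.0 when k is negative."""
    if k < 0:
        return 0.0
    return sum(math.exp(-lam) * lam**i / math.factorial(i) for i in range(int(k) + 1))


def poisson_tails(observed, lam):
    """Return (enrichment, depletion) p-values for one observed count."""
    enrichment = 1.0 - poisson_cdf(observed - 1, lam)  # P(X >= observed)
    depletion = poisson_cdf(observed, lam)  # P(X <= observed)
    return enrichment, depletion


enrichment, depletion = poisson_tails(observed=3, lam=2.0)
print(round(enrichment, 4), round(depletion, 4))
```

Both tails include the observed count itself, so the two p-values sum to one plus the probability of exactly `observed` events.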
