PyPI - tdaphantom - Versions diffs - 1.0.0__tar.gz - Mend

tdaphantom 1.0.0__tar.gz


Files changed (19)

tdaphantom-1.0.0/LICENSE ADDED

@@ -0,0 +1,21 @@
+MIT License
+Copyright (c) 2026 Moriarty
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.

tdaphantom-1.0.0/PKG-INFO ADDED

@@ -0,0 +1,140 @@
+Metadata-Version: 2.4
+Name: tdaphantom
+Version: 1.0.0
+Summary: Statistical hypothesis testing for persistence diagrams and barcodes
+Author: W. Moriarty
+License: MIT
+Classifier: Programming Language :: Python :: 3
+Classifier: License :: OSI Approved :: MIT License
+Classifier: Operating System :: OS Independent
+Classifier: Topic :: Scientific/Engineering :: Mathematics
+Requires-Python: >=3.10
+Description-Content-Type: text/markdown
+License-File: LICENSE
+Requires-Dist: numpy>=1.26
+Requires-Dist: matplotlib>=3.7
+Requires-Dist: gudhi>=3.11.0
+Requires-Dist: ripser>=0.6.14
+Dynamic: author
+Dynamic: classifier
+Dynamic: description
+Dynamic: description-content-type
+Dynamic: license
+Dynamic: license-file
+Dynamic: requires-dist
+Dynamic: requires-python
+Dynamic: summary
+# TDA-PHANTOM
+Topological Data Analysis: Persistent Homology Analysis via Null Testing On Manifolds (TDA-PHANTOM)
+is a tool for analysing the statistical significance of persistence diagrams and barcodes.
+This project implements hypothesis tests from:
+- *Confidence Sets for Persistence Diagrams*
+  Fasy et al. (2014)
+  DOI: https://doi.org/10.1214/14-AOS1252
+- *A Universal Null-Distribution for Topological Data Analysis*
+  Bobrowski and Skraba (2023)
+  DOI: https://doi.org/10.1038/s41598-023-37842-2
+## Installation
+Via [PyPI](https://pypi.org/project/tdaphantom/):
+```bash
+pip install tdaphantom
+```
+Or you can clone this repository and install it manually:
+```bash
+python setup.py install
+```
+## Overview
+This tool can build a Vietoris-Rips complex from either a point cloud or distance matrix.
+It can then be used to visualise the persistence diagram for that complex, and run various hypothesis tests for it.
+The results of these hypothesis tests can be inspected through the returned results array or visualised in a significance persistence diagram.
+## Example Usage
+### Generate data
+```python
+import numpy as np
+def _make_circle(n=2000, noise=0.03, seed=1):
+    rng   = np.random.default_rng(seed)
+    theta = rng.uniform(0, 2 * np.pi, n)
+    pts   = np.stack([np.cos(theta), np.sin(theta)], axis=1)
+    return pts + rng.normal(0, noise, pts.shape)
+pc_circle = _make_circle()
+```
+### Init Phantom class
+```python
+from tdaphantom import Phantom
+phantom_circle = Phantom(pc_circle)
+```
+### Calculate persistence diagram
+Here we compute homology up to dimension 1:
+```python
+phantom_circle.calculate_dgms_from_point_cloud_ripser(k=1)
+```
+### Display persistence diagram
+```python
+phantom_circle.display_dgms()
+```
+![Persistence diagram for a circle using Phantom](./README_SRC/circle_dgms.png)
+### Run hypothesis test
+```python
+alpha      = 0.01
+correction = None
+methods    = ["universal_null"]
+phantom_circle.hypothesis_test(alpha, correction_method=correction, methods=methods, k=1)
+```
+### Display significance persistence diagram
+```python
+phantom_circle.display_results()
+```
+![Significance persistence diagram for a circle using Phantom](./README_SRC/circle_sig.png)
+## Basic usage
+## Available methods
+## Universal null median
+### Usage
+### Theory
+## Universal null mean
+### Usage
+### Theory
+## Bottleneck subsampling
+### Usage
+### Theory
+## TODO
+* Add bottleneck shells
+* Add bottleneck density
+* Add bottleneck concentration
+* Add more integration tests
+* Add more unit tests

tdaphantom-1.0.0/README.md ADDED

@@ -0,0 +1,113 @@
+# TDA-PHANTOM
+Topological Data Analysis: Persistent Homology Analysis via Null Testing On Manifolds (TDA-PHANTOM)
+is a tool for analysing the statistical significance of persistence diagrams and barcodes.
+This project implements hypothesis tests from:
+- *Confidence Sets for Persistence Diagrams*
+  Fasy et al. (2014)
+  DOI: https://doi.org/10.1214/14-AOS1252
+- *A Universal Null-Distribution for Topological Data Analysis*
+  Bobrowski and Skraba (2023)
+  DOI: https://doi.org/10.1038/s41598-023-37842-2
+## Installation
+Via [PyPI](https://pypi.org/project/tdaphantom/):
+```bash
+pip install tdaphantom
+```
+Or you can clone this repository and install it manually:
+```bash
+python setup.py install
+```
+## Overview
+This tool can build a Vietoris-Rips complex from either a point cloud or distance matrix.
+It can then be used to visualise the persistence diagram for that complex, and run various hypothesis tests for it.
+The results of these hypothesis tests can be inspected through the returned results array or visualised in a significance persistence diagram.
+## Example Usage
+### Generate data
+```python
+import numpy as np
+def _make_circle(n=2000, noise=0.03, seed=1):
+    rng   = np.random.default_rng(seed)
+    theta = rng.uniform(0, 2 * np.pi, n)
+    pts   = np.stack([np.cos(theta), np.sin(theta)], axis=1)
+    return pts + rng.normal(0, noise, pts.shape)
+pc_circle = _make_circle()
+```
+### Init Phantom class
+```python
+from tdaphantom import Phantom
+phantom_circle = Phantom(pc_circle)
+```
+### Calculate persistence diagram
+Here we compute homology up to dimension 1:
+```python
+phantom_circle.calculate_dgms_from_point_cloud_ripser(k=1)
+```
+### Display persistence diagram
+```python
+phantom_circle.display_dgms()
+```
+![Persistence diagram for a circle using Phantom](./README_SRC/circle_dgms.png)
+### Run hypothesis test
+```python
+alpha      = 0.01
+correction = None
+methods    = ["universal_null"]
+phantom_circle.hypothesis_test(alpha, correction_method=correction, methods=methods, k=1)
+```
+### Display significance persistence diagram
+```python
+phantom_circle.display_results()
+```
+![Significance persistence diagram for a circle using Phantom](./README_SRC/circle_sig.png)
+## Basic usage
+## Available methods
+## Universal null median
+### Usage
+### Theory
+## Universal null mean
+### Usage
+### Theory
+## Bottleneck subsampling
+### Usage
+### Theory
+## TODO
+* Add bottleneck shells
+* Add bottleneck density
+* Add bottleneck concentration
+* Add more integration tests
+* Add more unit tests
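
The Overview above says the complex can be built from either a point cloud or a distance matrix. As a minimal sketch of how the two inputs relate, the snippet below turns a point cloud into the symmetric Euclidean distance matrix that a Vietoris-Rips construction consumes. The helper name `pairwise_distances` is hypothetical and written in pure Python so it runs without dependencies; the package itself calls `scipy.spatial.distance.cdist` for this step in `bottleneck_distance_test.py`. How `Phantom` is told that its input is already a distance matrix is not shown in this diff, so that call is left out.

```python
import math


def pairwise_distances(points):
    """Return the symmetric Euclidean distance matrix of a point cloud as nested lists."""
    n = len(points)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(points[i], points[j])
            # Fill both triangles so the matrix is symmetric with a zero diagonal.
            dist[i][j] = d
            dist[j][i] = d
    return dist


# Four corners of the unit square: sides have length 1, diagonals sqrt(2).
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
D = pairwise_distances(square)
```

For the README's 2000-point circle a vectorised routine such as `cdist` is the practical choice; the nested loop here is only meant to make the structure of the matrix explicit.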

tdaphantom-1.0.0/setup.cfg ADDED

@@ -0,0 +1,4 @@
+[egg_info]
+tag_build =
+tag_date = 0

tdaphantom-1.0.0/setup.py ADDED

@@ -0,0 +1,29 @@
+from setuptools import setup, find_packages
+with open("README.md", "r", encoding="utf-8") as f:
+    long_description = f.read()
+setup(
+    name="tdaphantom",
+    version="1.0.0",
+    description="Statistical hypothesis testing for persistence diagrams and barcodes",
+    long_description=long_description,
+    long_description_content_type="text/markdown",
+    author="W. Moriarty",
+    license="MIT",
+    packages=find_packages(),
+    include_package_data=True,
+    python_requires=">=3.10",
+    install_requires=[
+        "numpy>=1.26",
+        "matplotlib>=3.7",
+        "gudhi>=3.11.0",
+        "ripser>=0.6.14",
+    ],
+    classifiers=[
+        "Programming Language :: Python :: 3",
+        "License :: OSI Approved :: MIT License",
+        "Operating System :: OS Independent",
+        "Topic :: Scientific/Engineering :: Mathematics",
+    ],
+)

tdaphantom-1.0.0/tdaphantom/__init__.py ADDED

@@ -0,0 +1,2 @@
+
+from .tdaphantom import Phantom

tdaphantom-1.0.0/tdaphantom/hypothesis_tests/__init__.py ADDED

File without changes

tdaphantom-1.0.0/tdaphantom/hypothesis_tests/bottleneck_distance_tests/__init__.py ADDED

File without changes

tdaphantom-1.0.0/tdaphantom/hypothesis_tests/bottleneck_distance_tests/bottleneck_distance_test.py ADDED

@@ -0,0 +1,190 @@
+import numpy as np
+from typing import List, Optional
+import math
+import random
+import gudhi
+from scipy.spatial.distance import cdist
+from tdaphantom.metrics.metrics import w_infinity, hausdorff_dist_matrix, hausdorff
+class BNTest:
+    def __init__(
+        self,
+        point_cloud:         np.ndarray,
+        is_distance_matrix:  bool = False,
+        dgm:                 Optional[np.ndarray] = None,
+        k:                   int = 1,
+        alpha:               float = 0.05,
+        complex:             str = "VR",
+        max_depth:           int = 50,  # low and slow
+        method:              str = "bottleneck:subsample",
+    ):
+        """
+        Bottleneck hypothesis tests from
+        'Confidence Sets for Persistence Diagrams'
+        by Fasy et al. (2014).
+        """
+        self.method = method
+        self.method_calls = {
+            "bottleneck:subsample": self.subsample,
+            "bottleneck:concentration": self.concentration,
+            "bottleneck:shells": self.shells,
+            "bottleneck:density": self.density
+        }
+        self.pc = point_cloud
+        self.dgm = dgm
+        self.is_distance_matrix = is_distance_matrix
+        self.k = k
+        self.complex = complex  # currently only VR is supported
+        self.max_depth = max_depth
+        self.alpha = alpha
+    def _subsampling_method_via_persistence(self, subsample_percentage: float = 0.3) -> np.ndarray:
+        """
+        DEPRECATED - DO NOT USE
+        E[W_infinity(hat(P), P)] != E[W_infinity(hat(P), subsample_hat(P))]
+        """
+        n = len(self.dgm)
+        b = int(0.4*n)
+        try:
+            N = min(int(subsample_percentage * math.comb(n, b)), self.max_depth)
+        except OverflowError:
+            N = self.max_depth
+        T_j_array = np.zeros(N)
+        for i in range(N):
+            idx = np.random.choice(n, size=b, replace=False)
+            subsample = self.dgm[idx]
+            # w_infinity is the module-level function imported above, not a method
+            T_j_array[i] = w_infinity(subsample, self.dgm)
+        return T_j_array
+    def _subsampling_method(self, subsample_percentage: float = 0.8) -> np.ndarray:
+        """
+        Fasy et al. 4.1 subsampling
+        b   = subsample size = O(n / log(n))
+        N   = number of subsamples (theory uses n choose b, but we will use a subset)
+        A bar with persistence > 2 * c_n is significant at level alpha,
+        where c_n is the (1 - alpha) quantile of the subsample statistics.
+        By the bottleneck stability theorem, W_inf(PH(S_n), PH(P)) <= c_n
+        with probability >= 1 - alpha.
+        From the paper:
+        P(H(S_n, M) > c_n) <= alpha + O((b/n)^(1/4))
+        The bias term O((b/n)^(1/4)) -> 0 as b/n -> 0, so the theory requires b << n.
+        The paper uses b = O(n / log(n)) for the theoretical guarantee.
+        In practice, larger b gives smaller c_n and more power but looser theoretical guarantees.
+        """
+        n = len(self.pc)
+        b = min(int(3.5*(n / np.log(n))), int(0.8*n))
+        try:
+            N = min(int(subsample_percentage * math.comb(n, b)), self.max_depth)
+        except OverflowError:
+            N = self.max_depth
+        all_idx = np.arange(n)
+        if not self.is_distance_matrix:
+            D = cdist(self.pc, self.pc)
+        else:
+            D = self.pc
+        T_j_array = np.zeros(N)
+        for i in range(N):
+            idx = np.random.choice(n, size=b, replace=False)
+            # h(S_n, S_b*) = max_{i in S_n} min_{j in S_b*} D[i,j]
+            T_j_array[i] = float(D[:, idx].min(axis=1).max())
+        # bias_order = (b/n)**(0.25)
+        return T_j_array
+    def subsample(self):
+        """
+        Calls subsampling method to calculate c_n and p_values
+        """
+        births = self.dgm[:, 0]
+        deaths = self.dgm[:, 1]
+        pers = deaths - births
+        T_j_array = self._subsampling_method()
+        c_n = float(np.quantile(T_j_array, 1.0 - self.alpha))
+        p_values = np.array([
+            float(np.mean(T_j_array >= p / 2)) for p in pers
+        ])
+        return c_n, p_values
+    def concentration_of_measure_method(self):
+        r"""
+        Fasy et al. 4.2 concentration of measure
+        From the paper:
+        P(H(S_n, M) > \hat{t}_n) <= alpha + O((log(n)/n)^(1/(2+d)))
+        """
+        ...
+    def concentration(self):
+        """
+        Calls concentration method to calculate c_n
+        """
+        c_n = self.concentration_of_measure_method()
+        return c_n, None
+    def shells_method(self):
+        r"""
+        Fasy et al. 4.3 method of shells
+        From the paper:
+        P(H(S_{2,n}, M) > \hat{t}_{1,n}) <= alpha + O(r_n)
+        """
+        ...
+    def shells(self):
+        """
+        Calls shells method to calculate c_n
+        """
+        c_n = self.shells_method()
+        return c_n, None
+    def density_method(self):
+        r"""
+        Fasy et al. 4.4 density estimation
+        From the paper:
+        P(||\hat{p}_h - p_h||_infinity > Z_alpha / sqrt(nh^D)) <= alpha + O(log(n)/nh^D)^((4+D)/(4+2D))
+        """
+        ...
+    def density(self):
+        """
+        Calls density method to calculate c_n
+        """
+        c_n = self.density_method()
+        return c_n, None
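+    def results(self) -> dict:
+        """
+        Returns a dict with two entries.
+        "results_array": one row per bar with
+        cols: birth, death, pers, p_value, significant
+        "threshold": the persistence threshold 2 * c_n
+        """
+        c_n = -np.inf
+        p_values = None
+        if self.method in self.method_calls:
+            c_n, p_values = self.method_calls[self.method]()
+        if c_n is None:
+            # placeholder methods (concentration, shells, density) return None
+            raise NotImplementedError(f"{self.method} is not implemented yet")
+        births = self.dgm[:, 0]
+        deaths = self.dgm[:, 1]
+        pers = deaths - births
+        if p_values is None:
+            # keep the p_value column numeric when a method yields no p-values
+            p_values = np.full(len(pers), np.nan)
+        rejected = pers > 2*c_n
+        return {
+            "results_array": np.column_stack([
+                births,
+                deaths,
+                pers,
+                p_values,
+                rejected.astype(float),
+            ]),
+            "threshold": 2*c_n  # used for diagram
+        }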
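
The subsampling test in `BNTest` above reduces to three steps: compute the directed Hausdorff distance from the full sample to each random subsample, take the (1 - alpha) quantile of those distances as c_n, and flag bars whose persistence exceeds 2 * c_n. The following is a minimal dependency-free sketch of that logic, not the package's implementation. The function names (`directed_hausdorff`, `subsample_statistics`, `upper_quantile`, `p_value`) are hypothetical, the subsample size `b=60` is an arbitrary illustrative choice rather than the `n / log(n)` rule in the source, and the quantile is a plain order statistic where the source uses the interpolating `np.quantile`.

```python
import math
import random


def directed_hausdorff(points, subset_idx):
    """h(S_n, S_b*): the largest distance from any point to its nearest subsampled point."""
    subset = [points[j] for j in subset_idx]
    return max(min(math.dist(p, q) for q in subset) for p in points)


def subsample_statistics(points, b, n_draws=50, seed=0):
    """Draw n_draws subsamples of size b without replacement and return each T_j."""
    rng = random.Random(seed)
    n = len(points)
    return [directed_hausdorff(points, rng.sample(range(n), b)) for _ in range(n_draws)]


def upper_quantile(draws, alpha):
    """Order-statistic (1 - alpha) quantile; np.quantile would interpolate instead."""
    rank = max(math.ceil((1.0 - alpha) * len(draws)) - 1, 0)
    return sorted(draws)[rank]


def p_value(draws, persistence):
    """Share of T_j values that are at least half the bar's persistence."""
    return sum(t >= persistence / 2.0 for t in draws) / len(draws)


# Noisy unit circle, far smaller than the README's n=2000 so the pure-Python loops stay fast.
rng = random.Random(1)
circle = []
for _ in range(200):
    theta = rng.uniform(0.0, 2.0 * math.pi)
    circle.append((math.cos(theta) + rng.gauss(0.0, 0.03),
                   math.sin(theta) + rng.gauss(0.0, 0.03)))

draws = subsample_statistics(circle, b=60)
c_n = upper_quantile(draws, alpha=0.05)
threshold = 2.0 * c_n  # a bar is flagged when death - birth exceeds this
```

Given a diagram row `(birth, death)`, the bar would be reported as significant when `death - birth > threshold`, and `p_value(draws, death - birth)` gives the matching empirical p-value, mirroring `subsample()` and `results()` in the source.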

tdaphantom-1.0.0/tdaphantom/hypothesis_tests/universal_null_tests/__init__.py ADDED

File without changes