PyPI - cluster-affinity - Versions diffs - 0.0.6__tar.gz - Mend

cluster-affinity 0.0.6__tar.gz

Files changed (22)

cluster_affinity-0.0.6/LICENSE ADDED

@@ -0,0 +1,21 @@
+MIT License
+Copyright (c) 2022 Sanket Wagle
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.

cluster_affinity-0.0.6/PKG-INFO ADDED

@@ -0,0 +1,27 @@
+Metadata-Version: 2.2
+Name: cluster_affinity
+Version: 0.0.6
+Summary: A tool to calculate the cluster affinity distance between two trees
+Author-email: Sanket Wagle <swagle@iastate.edu>
+Project-URL: Homepage, https://github.com/swagle8987/cluster_affinity
+Project-URL: Issues, https://github.com/swagle8987/cluster_affinity/issues
+Description-Content-Type: text/markdown
+License-File: LICENSE
+Requires-Dist: dendropy
+Requires-Dist: numpy
+Requires-Dist: pytest
+The Asymmetric Cluster Affinity cost is a phylogenetic cost based on calculating the symmetric difference between the cluster representations of trees. Currently the CLI tool supports calculating the cluster affinity distance from the source tree to the target tree.
+### Installation
+Cluster Affinity is available on PyPI and can be installed with `pip install cluster_affinity`. Note that the package is built for Python 3.10 or higher. Cluster Affinity depends on dendropy, numpy and pytest.
+### Tutorial
+---
+Currently the CLI tool supports comparing two trees and outputting the cluster affinity cost. The command is
+``
+cluster_affinity t1 t2
+``
+where t1 and t2 are paths to newick representations of the trees.

cluster_affinity-0.0.6/README.md ADDED

@@ -0,0 +1,14 @@
+The Asymmetric Cluster Affinity cost is a phylogenetic cost based on calculating the symmetric difference between the cluster representations of trees. Currently the CLI tool supports calculating the cluster affinity distance from the source tree to the target tree.
+### Installation
+Cluster Affinity is available on PyPI and can be installed with `pip install cluster_affinity`. Note that the package is built for Python 3.10 or higher. Cluster Affinity depends on dendropy, numpy and pytest.
+### Tutorial
+---
+Currently the CLI tool supports comparing two trees and outputting the cluster affinity cost. The command is
+``
+cluster_affinity t1 t2
+``
+where t1 and t2 are paths to newick representations of the trees.
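
The cost described in the README can be illustrated without dendropy. The following is a minimal sketch, assuming trees are written as nested tuples of taxon labels; the helper names `clusters` and `affinity_cost` are hypothetical and are not part of the package. For every cluster of the source tree it takes the smallest symmetric difference to any cluster of the target tree and sums those minima.

```python
def clusters(tree):
    """Return the leaf set (cluster) below every node of a nested-tuple tree."""
    found = []

    def walk(node):
        if isinstance(node, str):
            # A leaf is written as its taxon label.
            cluster = frozenset([node])
        else:
            # An internal node is a tuple of child subtrees.
            cluster = frozenset().union(*(walk(child) for child in node))
        found.append(cluster)
        return cluster

    walk(tree)
    return found


def affinity_cost(source, target):
    """Sum, over source clusters, of the minimum symmetric difference to a target cluster."""
    target_clusters = clusters(target)
    return sum(
        min(len(c ^ other) for other in target_clusters)
        for c in clusters(source)
    )


t1 = (("A", "B"), ("C", "D"))
t2 = (("A", "C"), ("B", "D"))
print(affinity_cost(t1, t1))  # identical trees cost 0
print(affinity_cost(t1, t2))  # 2, the value asserted in src/test/test_clustertree.py
```

This sketch compares explicit sets, so it is quadratic in the number of clusters times the cluster size; it is meant only to make the definition concrete.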

cluster_affinity-0.0.6/pyproject.toml ADDED

@@ -0,0 +1,20 @@
+[build-system]
+requires = ["setuptools>=61.0"]
+build-backend = "setuptools.build_meta"
+[project]
+dependencies = ["dendropy","numpy","pytest"]
+name="cluster_affinity"
+version="0.0.6"
+authors = [
+	{name="Sanket Wagle", email="swagle@iastate.edu"}
+]
+description="A tool to calculate the cluster affinity distance between two trees"
+readme="README.md"
+[project.scripts]
+cluster_affinity = "main:cluster_affinity_script"
+[project.urls]
+Homepage = "https://github.com/swagle8987/cluster_affinity"
+Issues = "https://github.com/swagle8987/cluster_affinity/issues"

cluster_affinity-0.0.6/setup.cfg ADDED

@@ -0,0 +1,4 @@
+[egg_info]
+tag_build =
+tag_date = 0

cluster_affinity-0.0.6/setup.py ADDED

@@ -0,0 +1,6 @@
+#!/usr/bin/env python
+from setuptools import setup
+if __name__ == "__main__":
+    setup()

cluster_affinity-0.0.6/src/__init__.py ADDED

@@ -0,0 +1,2 @@
+from .cluster_computation import rooted_cluster_affinity
+from .main import cluster_affinity_script

cluster_affinity-0.0.6/src/cluster_affinity.egg-info/PKG-INFO ADDED

@@ -0,0 +1,27 @@
+Metadata-Version: 2.2
+Name: cluster_affinity
+Version: 0.0.6
+Summary: A tool to calculate the cluster affinity distance between two trees
+Author-email: Sanket Wagle <swagle@iastate.edu>
+Project-URL: Homepage, https://github.com/swagle8987/cluster_affinity
+Project-URL: Issues, https://github.com/swagle8987/cluster_affinity/issues
+Description-Content-Type: text/markdown
+License-File: LICENSE
+Requires-Dist: dendropy
+Requires-Dist: numpy
+Requires-Dist: pytest
+The Asymmetric Cluster Affinity cost is a phylogenetic cost based on calculating the symmetric difference between the cluster representations of trees. Currently the CLI tool supports calculating the cluster affinity distance from the source tree to the target tree.
+### Installation
+Cluster Affinity is available on PyPI and can be installed with `pip install cluster_affinity`. Note that the package is built for Python 3.10 or higher. Cluster Affinity depends on dendropy, numpy and pytest.
+### Tutorial
+---
+Currently the CLI tool supports comparing two trees and outputting the cluster affinity cost. The command is
+``
+cluster_affinity t1 t2
+``
+where t1 and t2 are paths to newick representations of the trees.

cluster_affinity-0.0.6/src/cluster_affinity.egg-info/SOURCES.txt ADDED

@@ -0,0 +1,20 @@
+LICENSE
+README.md
+pyproject.toml
+setup.py
+src/__init__.py
+src/main.py
+src/cluster_affinity.egg-info/PKG-INFO
+src/cluster_affinity.egg-info/SOURCES.txt
+src/cluster_affinity.egg-info/dependency_links.txt
+src/cluster_affinity.egg-info/entry_points.txt
+src/cluster_affinity.egg-info/requires.txt
+src/cluster_affinity.egg-info/top_level.txt
+src/cluster_computation/__init__.py
+src/cluster_computation/cluster_affinity.py
+src/cluster_computation/extendedtree.py
+src/cluster_computation/heavy_path.py
+src/cluster_computation/tau.py
+src/cluster_computation/try.py
+src/test/__init__.py
+src/test/test_clustertree.py

cluster_affinity-0.0.6/src/cluster_affinity.egg-info/dependency_links.txt ADDED

@@ -0,0 +1 @@
+

cluster_affinity-0.0.6/src/cluster_affinity.egg-info/entry_points.txt ADDED

@@ -0,0 +1,2 @@
+[console_scripts]
+cluster_affinity = main:cluster_affinity_script

cluster_affinity-0.0.6/src/cluster_affinity.egg-info/requires.txt ADDED

@@ -0,0 +1,3 @@
+dendropy
+numpy
+pytest

cluster_affinity-0.0.6/src/cluster_affinity.egg-info/top_level.txt ADDED

@@ -0,0 +1,4 @@
+__init__
+cluster_computation
+main
+test

cluster_affinity-0.0.6/src/cluster_computation/__init__.py ADDED

@@ -0,0 +1 @@
+from .cluster_affinity import rooted_cluster_affinity

cluster_affinity-0.0.6/src/cluster_computation/cluster_affinity.py ADDED

@@ -0,0 +1,53 @@
+import math
+import heapq
+import numpy as np
+'''
+    rooted_cluster_affinity: Tree -> Tree -> int
+'''
+def rooted_cluster_affinity(t1,t2):
+    t1_cmap = convert_tree_to_cmap(t1)
+    tree_dist = 0
+    for i in t1_cmap.values():
+        tree_dist += cluster_tree_dist(i,t2)
+    return tree_dist
+'''
+    cluster_tree_dist: Cluster -> Tree -> Int
+'''
+def cluster_tree_dist(c,t2):
+    mindist = math.inf
+    intersection_lookup = dict()
+    size_lookup = dict()
+    for i in t2.postorder_node_iter():
+        intersection = 0
+        size = 0
+        if i.is_leaf():
+            # A leaf is a cluster of size 1; it meets c only if its label is in c.
+            size = 1
+            if i.taxon.label in c:
+                intersection = 1
+        else:
+            # Post-order traversal guarantees the children were visited first.
+            for ch in i.child_node_iter():
+                intersection += intersection_lookup[ch]
+                size += size_lookup[ch]
+        intersection_lookup[i] = intersection
+        size_lookup[i] = size
+        # Size of the symmetric difference between c and the cluster below i.
+        newdist = len(c) + size - 2*intersection
+        if mindist > newdist:
+            mindist = newdist
+    return mindist
+def convert_tree_to_cmap(t):
+    cluster_map = dict()
+    for i in t.postorder_node_iter():
+        if i.is_leaf():
+            c = {i.taxon.label}
+        else:
+            c = set()
+            for ch in i.child_node_iter():
+                c = c | cluster_map[ch]
+        cluster_map[i] = c
+    return cluster_map
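
`cluster_tree_dist` never builds the target clusters as sets; it only tracks each node's size and its intersection count with `c`, then applies `len(c) + size - 2*intersection`. A small sketch of why those counts are enough, using the hypothetical helper `symdiff_size_from_counts` (not part of the package) and two made-up label sets:

```python
def symdiff_size_from_counts(c_size, s_size, intersection):
    """Size of the symmetric difference, computed from three counts only."""
    # Elements only in c: c_size - intersection.
    # Elements only in s: s_size - intersection.
    return c_size + s_size - 2 * intersection


c = {"A", "B", "C"}
s = {"B", "C", "D", "E"}

by_sets = len(c ^ s)  # explicit symmetric difference: {"A", "D", "E"}
by_counts = symdiff_size_from_counts(len(c), len(s), len(c & s))
print(by_sets, by_counts)  # both are 3
```

Because sizes and intersection counts of an internal node are sums over its children, one post-order pass per source cluster suffices.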

cluster_affinity-0.0.6/src/cluster_computation/extendedtree.py ADDED

@@ -0,0 +1,46 @@
+import dendropy
+class ExtendedTree(dendropy.Tree):
+    def __init__(self,*args,**kwargs) -> None:
+        super().__init__(*args,**kwargs)
+        self.taxon_mapping = dict()
+        self.compute_sizes()
+    @classmethod
+    def node_factory(cls, **kwargs):
+        return NewNode(**kwargs)
+    def compute_sizes(self):
+        ind = 0
+        def recursive_compute(node):
+            nonlocal ind
+            if node.is_leaf() and node.parent_node: ## Because the seed node is also a leaf??
+                self.taxon_mapping[node.taxon.label] = node
+                node.size = 1
+            else:
+                node.size = sum([recursive_compute(i) for i in node.child_nodes()])
+            node.index = ind
+            ind += 1
+            return node.size
+        recursive_compute(self.seed_node)
+    def get_leaf_from_taxonlabel(self,l):
+        return self.taxon_mapping[l]
+class NewNode(dendropy.Node):
+    def __init__(self,*args,**kwargs) -> None:
+        super().__init__(*args,**kwargs)
+        self.size = -1
+        self.index = -1
+        self.heavy = False
+    def is_heavy(self):
+        if not self._parent_node:
+            return True
+        elif self.parent_node.size <= 2*self.size:
+            return True
+        else:
+            return False
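
The rule in `NewNode.is_heavy` can be tried on plain integers. The sketch below assumes, as `compute_sizes` does, that a parent's size is the sum of its children's sizes; `is_heavy` and `heavy_children` here are stand-alone illustrative functions, not the package's methods.

```python
def is_heavy(parent_size, size):
    """A non-root node is heavy when it holds at least half of its parent's leaves."""
    return parent_size <= 2 * size


def heavy_children(child_sizes):
    """Return the child sizes that satisfy the heavy rule under their common parent."""
    parent_size = sum(child_sizes)
    return [s for s in child_sizes if is_heavy(parent_size, s)]


print(heavy_children([4, 2]))     # [4]
print(heavy_children([3, 3]))     # [3, 3]: a tie makes both children heavy
print(heavy_children([1, 1, 1]))  # []: no child reaches half of the parent
```

The tie case shows that the `<=` comparison can mark two siblings as heavy, and a node with three or more children may have no heavy child at all.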

cluster_affinity-0.0.6/src/cluster_computation/heavy_path.py ADDED

@@ -0,0 +1,120 @@
+from extendedtree import ExtendedTree, NewNode
+from math import floor
+class IntervalNode:
+    def __init__(self, a, b) -> None:
+       self.start = a
+       self.end = b
+       self.left_child = None
+       self.right_child = None
+       self.parent = None
+    def set_left_child(self,lnode):
+        self.left_child = lnode
+        lnode.set_parent(self)
+    def set_right_child(self,rnode):
+        self.right_child = rnode
+        rnode.set_parent(self)
+    def set_parent(self, parent):
+        self.parent = parent
+    def get_interval(self):
+        return (self.start,self.end)
+class PathSearchTree:
+    def __init__(self,path):
+        self.nodes = set(path)
+        self.path = path
+        self.root = IntervalNode(0,len(path)-1)
+        self.D = dict()
+        self.minval = dict()
+        self.maxval = dict()
+        self.interval_lookup = {(self.root.start,self.root.end):self.root}
+        nnodes = [self.root]
+        while nnodes:
+            n = nnodes.pop()
+            self.minval[n.start,n.end] = self.path[n.end].size
+            self.maxval[n.start,n.end] = self.path[n.start].size
+            if n.start != n.end:
+                self.D[n.start,n.end] = 0
+                l = n.end - n.start
+                lnode = IntervalNode(n.start,n.start + floor(l/2))
+                rnode = IntervalNode(n.start+floor(l/2)+1,n.end)
+                n.set_left_child(lnode)
+                n.set_right_child(rnode)
+                self.interval_lookup[(lnode.start,lnode.end)] = lnode
+                self.interval_lookup[(rnode.start,rnode.end)] = rnode
+                nnodes.extend([lnode,rnode])
+            else:
+                self.D[n.start,n.end] = self.path[n.start].size
+    def update_path(self,l,d):
+        x = self.path.index(l)
+        self.D[x,x] = self.D[x,x] + d
+        self.minval[x,x] = self.minval[x,x] + d
+        self.maxval[x,x] = self.maxval[x,x] +  d
+        a,b = x,x
+        while (a,b) != (self.root.start,self.root.end):
+            a_p,b_p = self.get_parent_interval((a,b)).get_interval()
+            a_s,b_s = self.get_sibling_interval((a,b)).get_interval()
+            if b == b_p:
+                self.D[a_s,b_s] = self.D[a_s,b_s] + d
+                self.minval[a_s,b_s] = self.minval[a_s,b_s] + d
+                self.maxval[a_s,b_s] = self.maxval[a_s,b_s] + d
+            self.minval[a_p,b_p] = min(self.minval[a_s,b_s],self.minval[a,b])+self.D[a_p,b_p]
+            self.maxval[a_p,b_p] = max(self.maxval[a_s,b_s],self.maxval[a,b])+self.D[a_p,b_p]
+            a,b = a_p,b_p
+    def get_parent_interval(self,interval):
+        return self.interval_lookup[interval].parent
+    def get_sibling_interval(self, interval):
+        n = self.interval_lookup[interval]
+        if n.parent is None:
+            raise ValueError("Root node has no parent")
+        elif n.start == n.parent.start:
+            return n.parent.right_child
+        else:
+            return n.parent.left_child
+    def contains(self, x):
+        if x in self.nodes:
+            return True
+        else:
+            return False
+    def __str__(self) -> str:
+        return ",".join([str(i) for i in self.path])
+class HeavyPathDecomposition:
+    def __init__(self,tree: ExtendedTree):
+        self.paths = []
+        self.tree = tree
+        visited = set()
+        for l in tree.leaf_node_iter():
+            path = []
+            next_node = l
+            while next_node not in visited:
+                path.append(next_node)
+                visited.add(next_node)
+                if next_node.parent_node and next_node.is_heavy():
+                    next_node = next_node._parent_node
+                else:
+                    break
+            self.paths.append(PathSearchTree(path[::-1]))
+    def get_path(self,x):
+        for i in self.paths:
+            if i.contains(x):
+                return i
+    def __str__(self) -> str:
+        return "\n".join([str(i) for i in self.paths])
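
The interval splitting in `PathSearchTree.__init__` can be reproduced on its own. This is a minimal sketch that mirrors only the halving step (same midpoint arithmetic, same stack order) and leaves out `D`, `minval` and `maxval`; `split_intervals` is a hypothetical name.

```python
def split_intervals(length):
    """List every (start, end) interval created for a path of the given length."""
    root = (0, length - 1)
    intervals = [root]
    stack = [root]
    while stack:
        start, end = stack.pop()
        if start != end:
            # Same midpoint as n.start + floor(l/2) with l = n.end - n.start.
            mid = start + (end - start) // 2
            left, right = (start, mid), (mid + 1, end)
            intervals.extend([left, right])
            stack.extend([left, right])
    return intervals


print(sorted(split_intervals(4)))
# [(0, 0), (0, 1), (0, 3), (1, 1), (2, 2), (2, 3), (3, 3)]
```

A path of length n yields 2n - 1 intervals, one per node of the resulting binary tree, with the single-position intervals as its leaves.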

cluster_affinity-0.0.6/src/cluster_computation/tau.py ADDED

@@ -0,0 +1 @@
+import dendropy

cluster_affinity-0.0.6/src/cluster_computation/try.py ADDED

@@ -0,0 +1,9 @@
+from heavy_path import *
+import dendropy
+from extendedtree import ExtendedTree
+from cluster_affinity import rooted_cluster_affinity
+t = ExtendedTree.get(data="((A,B),((C,D),(F,E)));", schema="newick",rooting="default-rooted")
+t2 = ExtendedTree.get(data="((A,D),((C,B),(F,E)));", schema="newick",rooting="default-rooted")
+print(rooted_cluster_affinity(t,t))
+print(rooted_cluster_affinity(t,t2)) ## d((A,B)) + d((C,D)) + d((C,D,E,F)) = 1 + 1 + 2 = 4

cluster_affinity-0.0.6/src/main.py ADDED

@@ -0,0 +1,28 @@
+import argparse
+from cluster_computation import rooted_cluster_affinity
+import dendropy
+def cluster_affinity_script():
+    parser = argparse.ArgumentParser(
+            prog='Cluster Affinity',
+            description='Calculates the Asymmetric Cluster Affinity cost from t1 to t2',
+    )
+    parser.add_argument('t1', help='The source tree from which the cost is to be calculated')
+    parser.add_argument('t2', help='The target tree to which the cost is to be calculated')
+    args = parser.parse_args()
+    tns = dendropy.TaxonNamespace(label="taxa")
+    t1 = dendropy.Tree.get(path=args.t1,taxon_namespace=tns,schema="newick",rooting="default-rooted")
+    t2 = dendropy.Tree.get(path=args.t2,taxon_namespace=tns,schema="newick",rooting="default-rooted")
+    if len(tns)>len(t1.poll_taxa()) or len(tns)>len(t2.poll_taxa()):
+        raise RuntimeWarning("The trees do not have the same taxon set")
+    print(rooted_cluster_affinity(t1,t2))
+if __name__=="__main__":
+    cluster_affinity_script()
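
The taxon check in `cluster_affinity_script` relies on both trees being read into one shared namespace, so the namespace ends up holding the union of their labels. A sketch of the same comparison on plain label collections, assuming a Python set stands in for the shared `TaxonNamespace`; `share_taxon_set` is a hypothetical helper:

```python
def share_taxon_set(labels1, labels2):
    """True when both trees carry exactly the same leaf labels."""
    first, second = set(labels1), set(labels2)
    namespace = first | second  # stands in for the shared namespace
    # A tree that lacks any label has fewer taxa than the namespace.
    return len(namespace) == len(first) and len(namespace) == len(second)


print(share_taxon_set("ABCD", "DCBA"))  # True
print(share_taxon_set("ABC", "ABD"))    # False: C and D each appear in one tree only
```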

cluster_affinity-0.0.6/src/test/__init__.py ADDED

File without changes

cluster_affinity-0.0.6/src/test/test_clustertree.py ADDED

@@ -0,0 +1,31 @@
+import unittest
+from ..cluster_computation import *
+from dendropy import Tree,TaxonNamespace
+from dendropy.simulate import treesim
+import math
+import numpy as np
+class TestClusterComputation:
+    t1 = Tree.get(data="((A,B),(C,D));",schema="newick",rooting="default-rooted")
+    t2 = Tree.get(data="((A,C),(B,D));",schema="newick",rooting="default-rooted")
+    def test_cluster_affinity_zero(self):
+        dist = cluster_affinity.rooted_cluster_affinity(self.t1,self.t1)
+        assert dist == 0
+    def test_cluster_affinity(self):
+        dist = cluster_affinity.rooted_cluster_affinity(self.t1,self.t2)
+        assert dist == 2
+    def test_cluster_affinity_tau(self):
+        ntax = 100
+        taxon_ns = TaxonNamespace(["l{}".format(i) for i in range(ntax)])
+        for i in range(1000):
+            t1 = treesim.birth_death_tree(birth_rate=1.0,death_rate=0,num_extant_tips=len(taxon_ns),taxon_namespace=taxon_ns)
+            t2 = treesim.birth_death_tree(birth_rate=1.0,death_rate=0,num_extant_tips=len(taxon_ns),taxon_namespace=taxon_ns)
+            dist = cluster_affinity.rooted_cluster_affinity(t1,t2)
+            assert dist >= 0,"{} {} {}".format(t1.as_string(schema="newick"),
+                                               t2.as_string(schema="newick"),
+                                               dist)
+            assert dist <= math.ceil((ntax*ntax - 2*ntax)/4)