PyPI - biocypher - Versions diffs - 0.5.34__tar.gz → 0.5.36__tar.gz - Mend

biocypher 0.5.34.tar.gz → 0.5.36.tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Potentially problematic release.

This version of biocypher might be problematic.

Files changed (24)

{biocypher-0.5.34 → biocypher-0.5.36}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.1
 Name: biocypher
-Version: 0.5.34
+Version: 0.5.36
 Summary: A unifying framework for biomedical research knowledge graphs
 Home-page: https://github.com/biocypher/biocypher
 License: MIT
@@ -47,6 +47,7 @@ Description-Content-Type: text/markdown
 ## ❓ Description
 Knowledge graphs (KGs) are an [approach to knowledge
 representation](https://en.wikipedia.org/wiki/Knowledge_graph) that uses graph
 structure to facilitate exploration and analysis of complex data, often
@@ -67,6 +68,7 @@ the docs [here](https://biocypher.org).
 </img>
 ## 📖 Documentation
 Tutorial and developer docs at https://biocypher.org. For a quickstart into your
 own pipeline, you can refer to our [project
 template](https://github.com/biocypher/project-template), and for an overview of
@@ -75,6 +77,7 @@ features, visit our [GitHub Project
 Board](https://github.com/orgs/biocypher/projects/3/views/2).
 ## ⚙️ Installation / Usage
 Install the package from PyPI using `pip install biocypher`. More comprehensive
 installation and configuration instructions can be found
 [here](https://biocypher.org/installation.html).
@@ -84,6 +87,7 @@ and the various pipelines we have created. You can find these on the [Components
 Project Board](https://github.com/orgs/biocypher/projects/3/views/2).
 ## 🤝 Getting involved
 We are very happy about contributions from the community, large and small!
 If you would like to contribute to BioCypher development, please refer to
 our [contribution guidelines](CONTRIBUTING.md). :)
@@ -96,11 +100,15 @@ please join our community at https://biocypher.zulipchat.com!
 > This disclaimer was adapted from the [Pooch](https://github.com/fatiando/pooch) project.
 ## ✍️ Citation
-The BioCypher paper has been peer-reviewed in
-[Nature Biotechnology](https://www.nature.com/articles/s41587-023-01848-y).
-Before, it was available as a preprint at https://arxiv.org/abs/2212.13543.
+The BioCypher paper has been peer-reviewed in [Nature
+Biotechnology](https://www.nature.com/articles/s41587-023-01848-y). It is
+available as a self-archived version on
+[Zenodo](https://zenodo.org/records/10320714).  Before, it was available as a
+preprint at https://arxiv.org/abs/2212.13543.
 ## Acknowledgements
 This project has received funding from the European Union’s Horizon 2020
 research and innovation programme under grant agreement No 965193 for DECIDER
 and No 116030 for TransQST.

{biocypher-0.5.34 → biocypher-0.5.36}/README.md RENAMED

@@ -11,6 +11,7 @@
 ## ❓ Description
 Knowledge graphs (KGs) are an [approach to knowledge
 representation](https://en.wikipedia.org/wiki/Knowledge_graph) that uses graph
 structure to facilitate exploration and analysis of complex data, often
@@ -31,6 +32,7 @@ the docs [here](https://biocypher.org).
 </img>
 ## 📖 Documentation
 Tutorial and developer docs at https://biocypher.org. For a quickstart into your
 own pipeline, you can refer to our [project
 template](https://github.com/biocypher/project-template), and for an overview of
@@ -39,6 +41,7 @@ features, visit our [GitHub Project
 Board](https://github.com/orgs/biocypher/projects/3/views/2).
 ## ⚙️ Installation / Usage
 Install the package from PyPI using `pip install biocypher`. More comprehensive
 installation and configuration instructions can be found
 [here](https://biocypher.org/installation.html).
@@ -48,6 +51,7 @@ and the various pipelines we have created. You can find these on the [Components
 Project Board](https://github.com/orgs/biocypher/projects/3/views/2).
 ## 🤝 Getting involved
 We are very happy about contributions from the community, large and small!
 If you would like to contribute to BioCypher development, please refer to
 our [contribution guidelines](CONTRIBUTING.md). :)
@@ -60,11 +64,15 @@ please join our community at https://biocypher.zulipchat.com!
 > This disclaimer was adapted from the [Pooch](https://github.com/fatiando/pooch) project.
 ## ✍️ Citation
-The BioCypher paper has been peer-reviewed in
-[Nature Biotechnology](https://www.nature.com/articles/s41587-023-01848-y).
-Before, it was available as a preprint at https://arxiv.org/abs/2212.13543.
+The BioCypher paper has been peer-reviewed in [Nature
+Biotechnology](https://www.nature.com/articles/s41587-023-01848-y). It is
+available as a self-archived version on
+[Zenodo](https://zenodo.org/records/10320714).  Before, it was available as a
+preprint at https://arxiv.org/abs/2212.13543.
 ## Acknowledgements
 This project has received funding from the European Union’s Horizon 2020
 research and innovation programme under grant agreement No 965193 for DECIDER
 and No 116030 for TransQST.

{biocypher-0.5.34 → biocypher-0.5.36}/biocypher/_get.py RENAMED

@@ -15,6 +15,7 @@ BioCypher get module. Used to download and cache data from external sources.
 from __future__ import annotations
 from typing import Optional
+import shutil
 from ._logger import logger
@@ -109,55 +110,98 @@ class Downloader:
         Returns:
             str or list: The path or paths to the downloaded resource(s).
         """
-        # check if resource is cached
-        cache_record = self._get_cache_record(resource)
+        expired = self._is_cache_expired(resource)
+        if expired or not cache:
+            self._delete_expired_resource_cache(resource)
+            logger.info(f"Asking for download of {resource.name}.")
+            paths = self._download_resource(cache, resource)
+        else:
+            paths = self.get_cached_version(resource)
+        self._update_cache_record(resource)
+        return paths
+    def _is_cache_expired(self, resource: Resource) -> bool:
+        """
+        Check if resource cache is expired.
+        Args:
+            resource (Resource): The resource to download.
+        Returns:
+            bool: cache is expired or not.
+        """
+        cache_record = self._get_cache_record(resource)
         if cache_record:
-            # check if resource is expired (formatted in days)
-            dl = cache_record.get("date_downloaded")
-            # convert string to datetime
-            dl = datetime.strptime(dl, "%Y-%m-%d %H:%M:%S.%f")
-            lt = timedelta(days=resource.lifetime)
-            expired = dl + lt < datetime.now()
+            download_time = datetime.strptime(
+                cache_record.get("date_downloaded"), "%Y-%m-%d %H:%M:%S.%f"
+            )
+            lifetime = timedelta(days=resource.lifetime)
+            expired = download_time + lifetime < datetime.now()
         else:
             expired = True
+        return expired
-        # download resource
-        if expired or not cache:
-            logger.info(f"Asking for download of {resource.name}.")
+    def _delete_expired_resource_cache(self, resource: Resource):
+        resource_cache_path = self.cache_dir + "/" + resource.name
+        if os.path.exists(resource_cache_path) and os.path.isdir(
+            resource_cache_path
+        ):
+            shutil.rmtree(resource_cache_path)
-            if resource.is_dir:
-                files = self._get_files(resource)
-                resource.url_s = [resource.url_s + "/" + file for file in files]
-                resource.is_dir = False
-                paths = self._download_or_cache(resource, cache)
-            elif isinstance(resource.url_s, list):
-                paths = []
-                for url in resource.url_s:
-                    fname = url[url.rfind("/") + 1 :]
-                    paths.append(
-                        self._retrieve(
-                            url=url,
-                            fname=fname,
-                            path=os.path.join(self.cache_dir, resource.name),
-                        )
+    def _download_resource(self, cache, resource):
+        """Download a resource.
+        Args:
+            cache (bool): Whether to cache the resource or not.
+            resource (Resource): The resource to download.
+        Returns:
+            str or list: The path or paths to the downloaded resource(s).
+        """
+        if resource.is_dir:
+            files = self._get_files(resource)
+            resource.url_s = [resource.url_s + "/" + file for file in files]
+            resource.is_dir = False
+            paths = self._download_or_cache(resource, cache)
+        elif isinstance(resource.url_s, list):
+            paths = []
+            for url in resource.url_s:
+                fname = url[url.rfind("/") + 1 :]
+                paths.append(
+                    self._retrieve(
+                        url=url,
+                        fname=fname,
+                        path=os.path.join(self.cache_dir, resource.name),
                     )
-            else:
-                fname = resource.url_s[resource.url_s.rfind("/") + 1 :]
-                paths = self._retrieve(
-                    url=resource.url_s,
-                    fname=fname,
-                    path=os.path.join(self.cache_dir, resource.name),
                 )
+        else:
+            fname = resource.url_s[resource.url_s.rfind("/") + 1 :]
+            paths = self._retrieve(
+                url=resource.url_s,
+                fname=fname,
+                path=os.path.join(self.cache_dir, resource.name),
+            )
+        # sometimes a compressed file contains multiple files
+        # TODO ask for a list of files in the archive to be used from the
+        # adapter
+        return paths
-            # sometimes a compressed file contains multiple files
-            # TODO ask for a list of files in the archive to be used from the
-            # adapter
+    def get_cached_version(self, resource) -> list[str]:
+        """Get the cached version of a resource.
-            # update cache record
-            self._update_cache_record(resource)
+        Args:
+            resource (Resource): The resource to get the cached version of.
-            return paths
+        Returns:
+            list[str]: The paths to the cached resource(s).
+        """
+        cached_resource_location = os.path.join(self.cache_dir, resource.name)
+        logger.info(f"Use cached version from {cached_resource_location}.")
+        paths = []
+        for file in os.listdir(cached_resource_location):
+            paths.append(os.path.join(cached_resource_location, file))
+        return paths
     def _retrieve(
         self,
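
The expiry check that this hunk extracts into `_is_cache_expired` can be sketched in isolation. This is a minimal standalone version, not the package's own code: the function name `is_cache_expired`, the `now` parameter, and the sample records are assumptions made for illustration. Only the `date_downloaded` key, the timestamp format, and the lifetime in days come from the diff.

```python
from datetime import datetime, timedelta
from typing import Optional

# Timestamp format used for "date_downloaded" in the diff above.
DATE_FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def is_cache_expired(
    cache_record: Optional[dict],
    lifetime_days: int,
    now: Optional[datetime] = None,
) -> bool:
    """Return True if there is no cache record or its lifetime has passed.

    `now` is a hypothetical parameter added here so the check can be
    exercised with fixed timestamps.
    """
    if not cache_record:
        # No record means the resource was never downloaded.
        return True
    now = now or datetime.now()
    download_time = datetime.strptime(
        cache_record["date_downloaded"], DATE_FORMAT
    )
    return download_time + timedelta(days=lifetime_days) < now


# Illustrative records (made up for this sketch).
now = datetime(2024, 1, 10, 12, 0, 0)
fresh = {"date_downloaded": "2024-01-09 12:00:00.000000"}
stale = {"date_downloaded": "2024-01-01 12:00:00.000000"}

print(is_cache_expired(fresh, 7, now))
print(is_cache_expired(stale, 7, now))
```

With a seven-day lifetime, the record from one day earlier counts as still valid and the record from nine days earlier counts as expired. A missing record is always treated as expired, as in the diff.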

{biocypher-0.5.34 → biocypher-0.5.36}/biocypher/_logger.py RENAMED

@@ -48,7 +48,7 @@ def get_logger(name: str = "biocypher") -> logging.Logger:
         # create logger
         logger = logging.getLogger(name)
         logger.setLevel(logging.DEBUG)
-        logger.propagate = False
+        logger.propagate = True
         # formatting
         file_formatter = logging.Formatter(

{biocypher-0.5.34 → biocypher-0.5.36}/biocypher/_metadata.py RENAMED

@@ -19,7 +19,7 @@ import importlib.metadata
 import toml
-_VERSION = "0.5.34"
+_VERSION = "0.5.36"
 def get_metadata():

{biocypher-0.5.34 → biocypher-0.5.36}/biocypher/_misc.py RENAMED

@@ -76,56 +76,80 @@ def ensure_iterable(value: Any) -> Iterable:
     return value if isinstance(value, LIST_LIKE) else (value,)
-def create_tree_visualisation(inheritance_tree: Union[dict, nx.Graph]) -> str:
+def create_tree_visualisation(inheritance_graph: Union[dict, nx.Graph]) -> Tree:
     """
     Creates a visualisation of the inheritance tree using treelib.
     """
+    inheritance_tree = _get_inheritance_tree(inheritance_graph)
+    classes, root = _find_root_node(inheritance_tree)
+    tree = Tree()
+    tree.create_node(root, root)
+    while classes:
+        for child in classes:
+            parent = inheritance_tree[child]
+            if parent in tree.nodes.keys() or parent == root:
+                tree.create_node(child, child, parent=parent)
+        for node in tree.nodes.keys():
+            if node in classes:
+                classes.remove(node)
+    return tree
+def _get_inheritance_tree(inheritance_graph: Union[dict, nx.Graph]) -> dict:
+    """Transforms an inheritance_graph into an inheritance_tree.
+    Args:
+        inheritance_graph: A dict or nx.Graph representing the inheritance graph.
+    Returns:
+        A dict representing the inheritance tree.
+    """
+    if isinstance(inheritance_graph, nx.Graph):
+        inheritance_tree = nx.to_dict_of_lists(inheritance_graph)
+        multiple_parents_present = _multiple_inheritance_present(
+            inheritance_tree
+        )
+        if multiple_parents_present:
+            logger.warning(
+                "The ontology contains multiple inheritance (one child node has multiple parent nodes). This is not visualized in the following hierarchy tree (the child node is only added once). If you want to browse all relationships of the parsed ontology write a graphml file to disk and view this file."
+            )
-    if isinstance(inheritance_tree, nx.Graph):
-        inheritance_tree = nx.to_dict_of_lists(inheritance_tree)
         # unlist values
         inheritance_tree = {k: v[0] for k, v in inheritance_tree.items() if v}
+        return inheritance_tree
+    elif not _multiple_inheritance_present(inheritance_graph):
+        return inheritance_graph
-    # find root node
+def _multiple_inheritance_present(inheritance_tree: dict) -> bool:
+    """Checks if multiple inheritance is present in the inheritance_tree."""
+    return any(len(value) > 1 for value in inheritance_tree.values())
+def _find_root_node(inheritance_tree: dict) -> tuple[set, str]:
     classes = set(inheritance_tree.keys())
     parents = set(inheritance_tree.values())
     root = list(parents - classes)
     if len(root) > 1:
         if "entity" in root:
-            root = "entity"  # default: good standard? TODO
+            root = "entity"  # TODO: default: good standard?
         else:
             raise ValueError(
                 "Inheritance tree cannot have more than one root node. "
                 f"Found {len(root)}: {root}."
             )
     else:
         root = root[0]
     if not root:
         # find key whose value is None
         root = list(inheritance_tree.keys())[
             list(inheritance_tree.values()).index(None)
         ]
-    tree = Tree()
-    tree.create_node(root, root)
-    while classes:
-        for child in classes:
-            parent = inheritance_tree[child]
-            if parent in tree.nodes.keys() or parent == root:
-                tree.create_node(child, child, parent=parent)
-        for node in tree.nodes.keys():
-            if node in classes:
-                classes.remove(node)
-    return tree
+    return classes, root
 # string conversion, adapted from Biolink Model Toolkit
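
The refactored helpers in this file can be sketched without networkx or treelib. The sketch assumes a child-to-parents mapping of the kind a dict-of-lists conversion yields for a graph with child → parent edges. The function names and the sample ontology are made up for illustration. It mirrors three steps from the diff: detecting multiple inheritance, keeping only the first parent, and finding the root.

```python
def multiple_inheritance_present(child_to_parents: dict) -> bool:
    """True if any child lists more than one parent."""
    return any(len(parents) > 1 for parents in child_to_parents.values())


def to_single_parent_tree(child_to_parents: dict) -> dict:
    """Keep only the first parent of each child; drop parentless nodes."""
    return {
        child: parents[0]
        for child, parents in child_to_parents.items()
        if parents
    }


def find_root(tree: dict) -> str:
    """The root is the one parent that never appears as a child."""
    roots = set(tree.values()) - set(tree.keys())
    if len(roots) != 1:
        raise ValueError(f"Expected exactly one root, found {sorted(roots)}.")
    return roots.pop()


# Illustrative ontology: "enzyme" has two parents.
graph = {
    "entity": [],
    "protein": ["entity"],
    "catalyst": ["entity"],
    "enzyme": ["protein", "catalyst"],
}

tree = to_single_parent_tree(graph)
print(multiple_inheritance_present(graph))
print(tree)
print(find_root(tree))
```

Because only the first parent is kept, `enzyme` appears once, under `protein`, and its link to `catalyst` is dropped from the visualised tree. That is the limitation the new warning message describes.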

{biocypher-0.5.34 → biocypher-0.5.36}/biocypher/_ontology.py RENAMED

@@ -93,7 +93,7 @@ class OntologyAdapter:
         self._reverse_labels = reverse_labels
         self._remove_prefixes = remove_prefixes
-        # Load the ontology into an rdflib Graph according to the file extension
+        # Load the ontology into a rdflib Graph according to the file extension
         self._rdf_graph = self._load_rdf_graph(ontology_file)
         self._nx_graph = self._rdf_to_nx(
@@ -107,56 +107,77 @@ class OntologyAdapter:
         G = nx.DiGraph()
         # Define a recursive function to add subclasses to the graph
-        def add_subclasses(node):
-            # Only add nodes that have a label
-            if (node, rdflib.RDFS.label, None) not in g:
+        def add_subclasses(parent_node):
+            if not has_label(parent_node, g):
                 return
-            nx_id, nx_label = _get_nx_id_and_label(node)
-            if nx_id not in G:
-                G.add_node(nx_id)
-                G.nodes[nx_id]["label"] = nx_label
-            # Recursively add all subclasses of the node to the graph
-            for s, _, o in g.triples((None, rdflib.RDFS.subClassOf, node)):
-                # Only add nodes that have a label
-                if (s, rdflib.RDFS.label, None) not in g:
-                    continue
-                s_id, s_label = _get_nx_id_and_label(s)
-                G.add_node(s_id)
-                G.nodes[s_id]["label"] = s_label
+            nx_parent_node_id, nx_parent_node_label = _get_nx_id_and_label(
+                parent_node
+            )
-                G.add_edge(s_id, nx_id)
-                add_subclasses(s)
-                add_parents(s)
+            if nx_parent_node_id not in G:
+                add_node(nx_parent_node_id, nx_parent_node_label)
+            child_nodes = get_child_nodes(parent_node, g)
+            if child_nodes:
+                for child_node in child_nodes:
+                    if not has_label(child_node, g):
+                        continue
+                    (
+                        nx_child_node_id,
+                        nx_child_node_label,
+                    ) = _get_nx_id_and_label(child_node)
+                    add_node(nx_child_node_id, nx_child_node_label)
+                    G.add_edge(nx_child_node_id, nx_parent_node_id)
+                for child_node in child_nodes:
+                    add_subclasses(child_node)
+                    add_parents(child_node)
         def add_parents(node):
-            # Only add nodes that have a label
-            if (node, rdflib.RDFS.label, None) not in g:
+            if not has_label(node, g):
                 return
             nx_id, nx_label = _get_nx_id_and_label(node)
             # Recursively add all parents of the node to the graph
             for s, _, o in g.triples((node, rdflib.RDFS.subClassOf, None)):
-                # Only add nodes that have a label
-                if (o, rdflib.RDFS.label, None) not in g:
+                if not has_label(o, g):
                     continue
                 o_id, o_label = _get_nx_id_and_label(o)
-                # Skip nodes already in the graph
+                # Skip if node already in the graph
                 if o_id in G:
                     continue
-                G.add_node(o_id)
-                G.nodes[o_id]["label"] = o_label
+                add_node(o_id, o_label)
                 G.add_edge(nx_id, o_id)
                 add_parents(o)
+        def has_label(node: rdflib.URIRef, g: rdflib.Graph) -> bool:
+            """Does the node have a label in g?
+            Args:
+                node (rdflib.URIRef): The node to check
+                g (rdflib.Graph): The graph to check in
+            Returns:
+                bool: True if the node has a label, False otherwise
+            """
+            return (node, rdflib.RDFS.label, None) in g
+        def add_node(nx_node_id: str, nx_node_label: str):
+            """Add a node to the graph.
+            Args:
+                nx_node_id (str): The ID of the node
+                nx_node_label (str): The label of the node
+            """
+            G.add_node(nx_node_id)
+            G.nodes[nx_node_id]["label"] = nx_node_label
         def _get_nx_id_and_label(node):
             node_id_str = self._remove_prefix(str(node))
             node_label_str = str(g.value(node, rdflib.RDFS.label)).replace(
@@ -168,6 +189,79 @@ class OntologyAdapter:
             nx_label = node_id_str if switch_id_and_label else node_label_str
             return nx_id, nx_label
+        def get_child_nodes(
+            parent_node: rdflib.URIRef, g: rdflib.Graph
+        ) -> list:
+            """Get the child nodes of a node in the ontology.
+            Accounts for the case of multiple parents defined in intersectionOf.
+            Args:
+                parent_node (rdflib.URIRef): The parent node to get the children of
+                g (rdflib.Graph): The graph to get the children from
+            Returns:
+                list: A list of the child nodes
+            """
+            child_nodes = []
+            for s, p, o in g.triples((None, rdflib.RDFS.subClassOf, None)):
+                if (o, rdflib.RDF.type, rdflib.OWL.Class) in g and (
+                    o,
+                    rdflib.OWL.intersectionOf,
+                    None,
+                ) in g:
+                    # Check if node has multiple parent nodes defined in intersectionOf (one of them = parent_node)
+                    parent_nodes = get_nodes_in_intersectionof(o)
+                    if parent_node in parent_nodes:
+                        child_nodes.append(s)
+                        for node in parent_nodes:
+                            add_parents(node)
+                elif o == parent_node:
+                    # only one parent node
+                    child_nodes.append(s)
+            return child_nodes
+        def get_nodes_in_intersectionof(o: rdflib.URIRef) -> list:
+            """Get the nodes in an intersectionOf node.
+            Args:
+                o (rdflib.URIRef): The intersectionOf node
+            Returns:
+                list: A list of the nodes in the intersectionOf node
+            """
+            anonymous_intersection_nodes = []
+            for _, _, anonymous_object in g.triples(
+                (o, rdflib.OWL.intersectionOf, None)
+            ):
+                anonymous_intersection_nodes.append(anonymous_object)
+            anonymous_intersection_node = anonymous_intersection_nodes[0]
+            nodes_in_intersection = retrieve_rdf_linked_list(
+                anonymous_intersection_node
+            )
+            return nodes_in_intersection
+        def retrieve_rdf_linked_list(subject: rdflib.URIRef) -> list:
+            """Recursively retrieves a linked list from RDF.
+            Example RDF list with the items [item1, item2]:
+            list_node - first -> item1
+            list_node - rest -> list_node2
+            list_node2 - first -> item2
+            list_node2 - rest -> nil
+            Args:
+                subject (rdflib.URIRef): One list_node of the RDF list
+            Returns:
+                list: The items of the RDF list
+            """
+            rdf_list = []
+            for s, p, o in g.triples((subject, rdflib.RDF.first, None)):
+                rdf_list.append(o)
+            for s, p, o in g.triples((subject, rdflib.RDF.rest, None)):
+                if o != rdflib.RDF.nil:
+                    rdf_list.extend(retrieve_rdf_linked_list(o))
+            return rdf_list
         # Add all subclasses of the root node to the graph
         add_subclasses(root)
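
The linked-list traversal described in the `retrieve_rdf_linked_list` docstring can be sketched on plain tuples, without rdflib. The string constants standing in for `rdf:first`, `rdf:rest` and `rdf:nil`, the function name, and the sample triples are assumptions made for illustration. The structure follows the docstring's example with the items [item1, item2].

```python
# Plain-string stand-ins for the RDF list vocabulary (assumption of this
# sketch; the package uses rdflib.RDF.first, .rest and .nil instead).
FIRST, REST, NIL = "rdf:first", "rdf:rest", "rdf:nil"


def retrieve_linked_list(triples: list, subject: str) -> list:
    """Recursively collect the items of an RDF-style linked list."""
    # The item stored at this list node.
    items = [o for s, p, o in triples if s == subject and p == FIRST]
    # Follow the "rest" pointer until the terminating nil node.
    for s, p, o in triples:
        if s == subject and p == REST and o != NIL:
            items.extend(retrieve_linked_list(triples, o))
    return items


# The example list from the docstring, written as (subject, predicate,
# object) tuples.
triples = [
    ("list_node", FIRST, "item1"),
    ("list_node", REST, "list_node2"),
    ("list_node2", FIRST, "item2"),
    ("list_node2", REST, NIL),
]

print(retrieve_linked_list(triples, "list_node"))
```

Starting from a later list node returns only the remaining items. This is why the diff's `get_nodes_in_intersectionof` passes the head node of the `intersectionOf` list.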

{biocypher-0.5.34 → biocypher-0.5.36}/pyproject.toml RENAMED

@@ -1,6 +1,6 @@
 [tool.poetry]
 name = "biocypher"
-version = "0.5.34"
+version = "0.5.36"
 description = "A unifying framework for biomedical research knowledge graphs"
 authors = [
     "Sebastian Lobentanzer <sebastian.lobentanzer@gmail.com>",