biolink-helper-pkg 1.0.0__tar.gz

@@ -0,0 +1,114 @@
1
+ Metadata-Version: 2.4
2
+ Name: biolink-helper-pkg
3
+ Version: 1.0.0
4
+ Summary: Standalone BiolinkHelper module extracted from RTX
5
+ Author-email: Your Name <you@example.com>
6
+ Requires-Python: >=3.8
7
+ Description-Content-Type: text/markdown
8
+ Requires-Dist: networkx>=2.8
9
+ Requires-Dist: PyYAML>=6.0
10
+ Requires-Dist: requests>=2.25
11
+
12
+ # BiolinkHelper
13
+
14
+ `BiolinkHelper` is a lightweight Python utility class for working with a specific **Biolink Model version**.
15
+ Its data source is the [Biolink Model YAML](https://github.com/biolink/biolink-model/blob/master/biolink-model.yaml) file.
16
+
+ ---
17
+
18
+ ## Overview
19
+
20
+ The `BiolinkHelper` class requires two mandatory inputs at initialization time:
21
+
22
+ * A **Biolink Model version**
23
+ * A **local cache path** where Biolink resources are stored (or will be stored)
24
+
25
+ No defaults are assumed.
26
+
27
+ ---
28
+
29
+ ## Installation
30
+
31
+
32
+ ```bash
33
+ pip install biolink-helper-pkg
34
+ ```
35
+
36
+ ---
37
+
38
+ ## Usage
39
+
40
+ ### Basic Example
41
+
42
+ ```python
43
+ from biolink_helper_pkg import BiolinkHelper
44
+
45
+ helper = BiolinkHelper(
46
+     biolink_version="4.2.6",
+     cached_path="/data/biolink_cache"
48
+ )
49
+ ```
50
+
51
+ ---
52
+
53
+ ## Class API
54
+
55
+ ### `BiolinkHelper`
56
+
57
+ ```python
58
+ class BiolinkHelper:
59
+
60
+     def __init__(self, biolink_version, cached_path):
+         ...
62
+ ```
63
+
64
+ #### Parameters
65
+
66
+ | Name | Type | Required | Description |
67
+ | ----------------- | ----- | -------- |-------------------------------------------------------------------------------------------------------------|
68
+ | `biolink_version` | `str` | Yes | The Biolink Model version to use (e.g. `"4.2.6"`). This value is mandatory and must be provided explicitly. |
69
+ | `cached_path`     | `str` | Yes      | Path to an existing directory used for caching the Biolink YAML file and lookup maps. Must be readable and writable. |
70
+
71
+ > Both parameters are **mandatory**.
72
+ > Initialization will fail if either is missing or invalid.
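Since the class writes the downloaded YAML file and the lookup map directly into `cached_path`, it can help to make sure the directory exists and is usable before constructing the helper. The following is a minimal sketch using only the standard library; `prepare_cache_dir` is a hypothetical helper for illustration and is not part of the package.

```python
import os
import pathlib
import tempfile


def prepare_cache_dir(path: str) -> str:
    """Create the cache directory if needed and check that it is usable.

    Hypothetical helper for illustration; not part of biolink-helper-pkg.
    """
    cache_dir = pathlib.Path(path)
    # parents=True creates intermediate directories; exist_ok=True makes this idempotent
    cache_dir.mkdir(parents=True, exist_ok=True)
    if not os.access(cache_dir, os.R_OK | os.W_OK):
        raise PermissionError(f"Cache directory is not readable and writable: {cache_dir}")
    return str(cache_dir)


if __name__ == "__main__":
    demo_root = tempfile.mkdtemp()
    cached_path = prepare_cache_dir(os.path.join(demo_root, "biolink_cache"))
    print(cached_path)
    # The result could then be passed on, for example:
    # helper = BiolinkHelper(biolink_version="4.2.6", cached_path=cached_path)
```

Calling the helper twice with the same path is harmless, so it can run unconditionally at startup.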
73
+
74
+ ---
75
+
76
+ #### How to use
77
+
78
+ Examples of ways to get **ancestors** (`biolink_helper` is a `BiolinkHelper` instance):
79
+ ```python
80
+ biolink_helper.get_ancestors("biolink:Drug")
81
+ biolink_helper.get_ancestors(["biolink:Drug", "biolink:Protein"])
82
+ biolink_helper.get_ancestors("biolink:Drug", include_mixins=False)
83
+ biolink_helper.get_ancestors("biolink:Protein", include_conflations=False)
84
+ biolink_helper.get_ancestors("biolink:treats")
85
+ ```
86
+
87
+ Examples of ways to get **descendants**:
88
+ ```python
89
+ biolink_helper.get_descendants("biolink:ChemicalEntity")
90
+ biolink_helper.get_descendants(["biolink:ChemicalEntity", "biolink:Protein"])
91
+ biolink_helper.get_descendants("biolink:ChemicalEntity", include_mixins=False)
92
+ biolink_helper.get_descendants("biolink:Protein", include_conflations=False)
93
+ biolink_helper.get_descendants("biolink:related_to")
94
+ ```
95
+
96
+ Ancestors and descendants are always returned as a list, and the input items themselves are included in it. Relevant mixins are included by default; pass `include_mixins=False` to exclude them, as in the examples above. ARAX-defined conflations (for example, treating gene and protein as equivalent) are also applied by default and can be disabled with `include_conflations=False`.
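To make the effect of the two flags concrete, here is a small pure-Python sketch over a toy hierarchy. The parent links below are illustrative assumptions, not the real Biolink Model, and `toy_ancestors` is a hypothetical stand-in rather than the package's implementation (which precomputes its lookups with networkx).

```python
from typing import Dict, List, Set

# Toy parent links (child -> parents); NOT the real Biolink hierarchy.
PARENTS: Dict[str, List[str]] = {
    "biolink:Protein": ["biolink:Polypeptide", "biolink:GeneProductMixin"],
    "biolink:Polypeptide": ["biolink:BiologicalEntity"],
    "biolink:Gene": ["biolink:BiologicalEntity"],
    "biolink:BiologicalEntity": ["biolink:NamedThing"],
}
MIXINS: Set[str] = {"biolink:GeneProductMixin"}
# Conflation group in the spirit of the package source (gene == protein).
CONFLATIONS: Dict[str, Set[str]] = {
    "biolink:Protein": {"biolink:Protein", "biolink:Gene"},
    "biolink:Gene": {"biolink:Protein", "biolink:Gene"},
}


def toy_ancestors(item: str, include_mixins: bool = True,
                  include_conflations: bool = True) -> List[str]:
    """Return the item, its conflated equivalents, and all of their ancestors."""
    start = CONFLATIONS.get(item, {item}) if include_conflations else {item}
    found = {item} | set(start)
    stack = list(start)
    while stack:
        current = stack.pop()
        for parent in PARENTS.get(current, []):
            # Skipping a mixin also cuts off anything reachable only through it
            if parent in MIXINS and not include_mixins:
                continue
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return sorted(found)


if __name__ == "__main__":
    print(toy_ancestors("biolink:Protein"))
    print(toy_ancestors("biolink:Protein", include_mixins=False, include_conflations=False))
```

With both flags on, the toy result for `biolink:Protein` contains the mixin and `biolink:Gene`; with both off it contains only the strict `is_a` chain.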
97
+
98
+ Other available methods include getting **canonical predicates**:
99
+
100
+ ```python
101
+ biolink_helper.get_canonical_predicates("biolink:treated_by")
102
+ biolink_helper.get_canonical_predicates(["biolink:treated_by", "biolink:related_to"])
103
+ ```
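The rule used to pick a canonical predicate can be sketched as follows: a slot is its own canonical form if it is annotated as canonical or has no inverse; otherwise its inverse is used. The slot entries below are illustrative assumptions shaped like the `slots` section of the Biolink YAML, and `canonical_form` and `to_curie` are hypothetical names.

```python
from typing import Dict

# Toy slot definitions shaped like the Biolink YAML 'slots' section (illustrative values).
SLOTS: Dict[str, dict] = {
    "treats": {"inverse": "treated by", "annotations": {"canonical_predicate": True}},
    "treated by": {"inverse": "treats"},
    "related to": {},
}


def to_curie(english_name: str) -> str:
    """Convert an English slot name to a snake_case 'biolink:' identifier."""
    return "biolink:" + english_name.replace(" ", "_")


def canonical_form(slot_name: str) -> str:
    """Pick the slot itself if it is canonical or has no inverse, else its inverse."""
    info = SLOTS[slot_name]
    inverse = info.get("inverse")
    is_canonical = info.get("annotations", {}).get("canonical_predicate")
    chosen = slot_name if is_canonical or not inverse else inverse
    return to_curie(chosen)


if __name__ == "__main__":
    for name in SLOTS:
        print(name, "->", canonical_form(name))
```

Predicates that the model does not recognize are handled differently by the real method: it returns them unchanged and, by default, prints a warning.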
104
+
105
+ And **filtering out mixins**:
106
+
107
+ ```python
108
+ biolink_helper.filter_out_mixins(["biolink:ChemicalEntity", "biolink:PhysicalEssence"])
109
+ ```
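As a rough illustration of what mixin filtering does, the sketch below drops every item flagged `is_mixin` in either the predicate or the category table and keeps everything else, including unrecognized items. `TOY_LOOKUP` is made-up data and `toy_filter_out_mixins` is a hypothetical function; unlike the real method, it preserves input order.

```python
from typing import Dict, Iterable, List

# Made-up lookup data in the same general shape as the cached lookup map.
TOY_LOOKUP: Dict[str, Dict[str, dict]] = {
    "categories": {
        "biolink:ChemicalEntity": {"is_mixin": False},
        "biolink:PhysicalEssence": {"is_mixin": True},
    },
    "predicates": {},
}


def toy_filter_out_mixins(items: Iterable[str]) -> List[str]:
    """Keep items that are not flagged as mixins; unknown items pass through."""
    return [
        item for item in items
        if not (TOY_LOOKUP["predicates"].get(item, {}).get("is_mixin")
                or TOY_LOOKUP["categories"].get(item, {}).get("is_mixin"))
    ]


if __name__ == "__main__":
    print(toy_filter_out_mixins(["biolink:ChemicalEntity", "biolink:PhysicalEssence"]))
```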
110
+
111
+
112
+ #### Debugging
113
+
114
+ A JSON copy of the lookup map is saved next to the pickle file in `cached_path`. It exists only for easier debugging/viewing; the helper itself loads the pickle file.
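Writing such a debug copy requires converting any sets to lists, since JSON has no set type. Below is a minimal sketch of that technique using the `default` hook of `json.dumps`; the map contents are toy data, and `serialize_with_sets` is a simplified stand-in for the package's private helper (this version sorts the values and rejects unsupported types, which is an assumption of the sketch).

```python
import json


def serialize_with_sets(obj):
    """Fallback for json.dumps: emit sets as sorted lists, reject anything else."""
    if isinstance(obj, set):
        return sorted(obj)
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")


# Toy data only; the real lookup map is built from the Biolink Model YAML.
toy_lookup_map = {
    "categories": {
        "biolink:Drug": {
            "ancestors": {"biolink:Drug", "biolink:NamedThing"},
            "is_mixin": False,
        }
    }
}

# 'default' is only called for objects the encoder cannot handle natively
text = json.dumps(toy_lookup_map, default=serialize_with_sets, indent=4)

if __name__ == "__main__":
    print(text)
```

Judging by the module source, the JSON file has the same name as the pickle file with the `.pickle` suffix replaced by `.json`.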
@@ -0,0 +1,103 @@
1
+ # BiolinkHelper
2
+
3
+ `BiolinkHelper` is a lightweight Python utility class for working with a specific **Biolink Model version**.
4
+ Its data source is the [Biolink Model YAML](https://github.com/biolink/biolink-model/blob/master/biolink-model.yaml) file.
5
+
+ ---
6
+
7
+ ## Overview
8
+
9
+ The `BiolinkHelper` class requires two mandatory inputs at initialization time:
10
+
11
+ * A **Biolink Model version**
12
+ * A **local cache path** where Biolink resources are stored (or will be stored)
13
+
14
+ No defaults are assumed.
15
+
16
+ ---
17
+
18
+ ## Installation
19
+
20
+
21
+ ```bash
22
+ pip install biolink-helper-pkg
23
+ ```
24
+
25
+ ---
26
+
27
+ ## Usage
28
+
29
+ ### Basic Example
30
+
31
+ ```python
32
+ from biolink_helper_pkg import BiolinkHelper
33
+
34
+ helper = BiolinkHelper(
35
+     biolink_version="4.2.6",
+     cached_path="/data/biolink_cache"
37
+ )
38
+ ```
39
+
40
+ ---
41
+
42
+ ## Class API
43
+
44
+ ### `BiolinkHelper`
45
+
46
+ ```python
47
+ class BiolinkHelper:
48
+
49
+     def __init__(self, biolink_version, cached_path):
+         ...
51
+ ```
52
+
53
+ #### Parameters
54
+
55
+ | Name | Type | Required | Description |
56
+ | ----------------- | ----- | -------- |-------------------------------------------------------------------------------------------------------------|
57
+ | `biolink_version` | `str` | Yes | The Biolink Model version to use (e.g. `"4.2.6"`). This value is mandatory and must be provided explicitly. |
58
+ | `cached_path`     | `str` | Yes      | Path to an existing directory used for caching the Biolink YAML file and lookup maps. Must be readable and writable. |
59
+
60
+ > Both parameters are **mandatory**.
61
+ > Initialization will fail if either is missing or invalid.
62
+
63
+ ---
64
+
65
+ #### How to use
66
+
67
+ Examples of ways to get **ancestors** (`biolink_helper` is a `BiolinkHelper` instance):
68
+ ```python
69
+ biolink_helper.get_ancestors("biolink:Drug")
70
+ biolink_helper.get_ancestors(["biolink:Drug", "biolink:Protein"])
71
+ biolink_helper.get_ancestors("biolink:Drug", include_mixins=False)
72
+ biolink_helper.get_ancestors("biolink:Protein", include_conflations=False)
73
+ biolink_helper.get_ancestors("biolink:treats")
74
+ ```
75
+
76
+ Examples of ways to get **descendants**:
77
+ ```python
78
+ biolink_helper.get_descendants("biolink:ChemicalEntity")
79
+ biolink_helper.get_descendants(["biolink:ChemicalEntity", "biolink:Protein"])
80
+ biolink_helper.get_descendants("biolink:ChemicalEntity", include_mixins=False)
81
+ biolink_helper.get_descendants("biolink:Protein", include_conflations=False)
82
+ biolink_helper.get_descendants("biolink:related_to")
83
+ ```
84
+
85
+ Ancestors and descendants are always returned as a list, and the input items themselves are included in it. Relevant mixins are included by default; pass `include_mixins=False` to exclude them, as in the examples above. ARAX-defined conflations (for example, treating gene and protein as equivalent) are also applied by default and can be disabled with `include_conflations=False`.
86
+
87
+ Other available methods include getting **canonical predicates**:
88
+
89
+ ```python
90
+ biolink_helper.get_canonical_predicates("biolink:treated_by")
91
+ biolink_helper.get_canonical_predicates(["biolink:treated_by", "biolink:related_to"])
92
+ ```
93
+
94
+ And **filtering out mixins**:
95
+
96
+ ```python
97
+ biolink_helper.filter_out_mixins(["biolink:ChemicalEntity", "biolink:PhysicalEssence"])
98
+ ```
99
+
100
+
101
+ #### Debugging
102
+
103
+ A JSON copy of the lookup map is saved next to the pickle file in `cached_path`. It exists only for easier debugging/viewing; the helper itself loads the pickle file.
@@ -0,0 +1,4 @@
1
+ from .biolink_helper import BiolinkHelper
2
+
3
+ __all__ = ["BiolinkHelper"]
4
+
@@ -0,0 +1,530 @@
1
+ #!/usr/bin/env python3
2
+ """
3
+ Usage: python biolink_helper.py [biolink version number, e.g. 3.0.3] [cache directory path]
4
+ """
5
+
6
+ import argparse
7
+ import datetime
8
+ import json
9
+ import pathlib
10
+ import pickle
11
+ import sys
12
+ from collections import defaultdict
13
+ from typing import Optional, List, Set, Dict, Union
14
+
15
+ import networkx as nx
16
+ import requests
17
+ import yaml
18
+
19
+
20
+ def eprint(*args, **kwargs): print(*args, file=sys.stderr, **kwargs)
21
+
22
+ class BiolinkHelper:
23
+
24
+ def __init__(self, biolink_version, cached_path):
25
+ self.biolink_version = biolink_version
26
+ self.root_category = "biolink:NamedThing"
27
+ self.root_predicate = "biolink:related_to"
28
+ self.biolink_model_file_path = f"{cached_path}/biolink_model_{biolink_version}.yaml"
29
+ self.biolink_lookup_map_path = f"{cached_path}/biolink_lookup_map_{biolink_version}_v5.pickle"
30
+ self.biolink_lookup_map = self._load_biolink_lookup_map()
31
+ protein_like_categories = {"biolink:Protein", "biolink:Gene"}
32
+ disease_like_categories = {"biolink:Disease", "biolink:PhenotypicFeature", "biolink:DiseaseOrPhenotypicFeature"}
33
+ self.arax_conflations = {
34
+ "biolink:Protein": protein_like_categories,
35
+ "biolink:Gene": protein_like_categories,
36
+ "biolink:Disease": disease_like_categories,
37
+ "biolink:PhenotypicFeature": disease_like_categories,
38
+ "biolink:DiseaseOrPhenotypicFeature": disease_like_categories
39
+ }
40
+
41
+     def _load_biolink_lookup_map(self):
+         timestamp = str(datetime.datetime.now().isoformat())
+
+         if not pathlib.Path(self.biolink_lookup_map_path).exists():
+             eprint(f"{timestamp}: DEBUG: lookup map not here! {self.biolink_lookup_map_path}")
+             # Parse the relevant Biolink yaml file and create/save local indexes
+             self.download_biolink_model()
+             with open(self.biolink_model_file_path) as f:
+                 biolink_model = yaml.safe_load(f)
+             return self._create_biolink_lookup_map(biolink_model)
+         else:
+             # A local file already exists for this Biolink version, so just load it
+             eprint(f"{timestamp}: DEBUG: Loading pickle file: {self.biolink_lookup_map_path}")
+             with open(self.biolink_lookup_map_path, "rb") as biolink_map_file:
+                 biolink_lookup_map = pickle.load(biolink_map_file)
+             return biolink_lookup_map
57
+
58
+ def download_biolink_model(self):
59
+ if not pathlib.Path(self.biolink_model_file_path).exists():
60
+ response = requests.get(
61
+ f"https://raw.githubusercontent.com/biolink/biolink-model/{self.biolink_version}/biolink-model.yaml",
62
+ timeout=10)
63
+ if response.status_code != 200: # Sometimes Biolink's tags start with 'v', so try that
64
+ response = requests.get(
65
+ f"https://raw.githubusercontent.com/biolink/biolink-model/v{self.biolink_version}/biolink-model.yaml",
66
+ timeout=10)
67
+
68
+ if response.status_code == 200:
69
+ biolink_model = yaml.safe_load(response.text)
70
+ with open(self.biolink_model_file_path, "w") as f:
71
+ yaml.safe_dump(
72
+ biolink_model,
73
+ f,
74
+ sort_keys=False,
75
+ allow_unicode=True
76
+ )
77
+ else:
78
+ raise RuntimeError(f"ERROR: Request to get Biolink {self.biolink_version} YAML file returned "
79
+ f"{response.status_code} response. Cannot load BiolinkHelper.")
80
+
81
+ def get_ancestors(self, biolink_items: Union[str, List[str]], include_mixins: bool = True, include_conflations: bool = True) -> List[str]:
82
+ """
83
+ Returns the ancestors of Biolink categories, predicates, category mixins, or predicate mixins. Input
84
+ categories/predicates/mixins are themselves included in the returned ancestor list. For proper
85
+ categories/predicates, inclusion of mixin ancestors can be turned on or off via the include_mixins flag.
86
+ Note that currently the 'include_mixins' flag is only relevant when inputting *proper* predicates/categories;
87
+ if only predicate/category *mixins* are input, then the 'include_mixins' flag does nothing (mixins will always
88
+ be included in that case). Inclusion of ARAX-defined conflations (e.g., gene == protein) can be controlled via
89
+ the include_conflations parameter.
90
+ """
91
+ input_item_set = self._convert_to_set(biolink_items)
92
+ categories = input_item_set.intersection(set(self.biolink_lookup_map["categories"]))
93
+ predicates = input_item_set.intersection(set(self.biolink_lookup_map["predicates"]))
94
+ aspects = input_item_set.intersection(set(self.biolink_lookup_map["aspects"]))
95
+ directions = input_item_set.intersection(set(self.biolink_lookup_map["directions"]))
96
+ ancestors = input_item_set.copy()
97
+ if include_conflations:
98
+ categories = set(self.add_conflations(categories))
99
+ for category in categories:
100
+ ancestor_property = "ancestors" if not include_mixins and "ancestors" in self.biolink_lookup_map["categories"][category] else "ancestors_with_mixins"
101
+ ancestors.update(self.biolink_lookup_map["categories"][category][ancestor_property])
102
+ for predicate in predicates:
103
+ ancestor_property = "ancestors" if not include_mixins and "ancestors" in self.biolink_lookup_map["predicates"][predicate] else "ancestors_with_mixins"
104
+ ancestors.update(self.biolink_lookup_map["predicates"][predicate][ancestor_property])
105
+ for aspect in aspects:
106
+ ancestors.update(self.biolink_lookup_map["aspects"][aspect]["ancestors"])
107
+ for direction in directions:
108
+ ancestors.update(self.biolink_lookup_map["directions"][direction]["ancestors"])
109
+ return list(ancestors)
110
+
111
+ def get_descendants(self, biolink_items: Union[str, List[str], Set[str]], include_mixins: bool = True, include_conflations: bool = True) -> List[str]:
112
+ """
113
+ Returns the descendants of Biolink categories, predicates, category mixins, or predicate mixins. Input
114
+ categories/predicates/mixins are themselves included in the returned descendant list. For proper
115
+ categories/predicates, inclusion of mixin descendants can be turned on or off via the include_mixins flag.
116
+ Note that currently the 'include_mixins' flag is only relevant when inputting *proper* predicates/categories;
117
+ if only predicate/category mixins are input, then the 'include_mixins' flag does nothing (mixins will always
118
+ be included in that case). Inclusion of ARAX-defined conflations (e.g., gene == protein) can be controlled
119
+ via the include_conflations parameter.
120
+ """
121
+ input_item_set = self._convert_to_set(biolink_items)
122
+ categories = input_item_set.intersection(set(self.biolink_lookup_map["categories"]))
123
+ predicates = input_item_set.intersection(set(self.biolink_lookup_map["predicates"]))
124
+ aspects = input_item_set.intersection(set(self.biolink_lookup_map["aspects"]))
125
+ directions = input_item_set.intersection(set(self.biolink_lookup_map["directions"]))
126
+ descendants = input_item_set.copy()
127
+ if include_conflations:
128
+ categories = set(self.add_conflations(categories))
129
+ for category in categories:
130
+ descendant_property = "descendants" if not include_mixins and "descendants" in self.biolink_lookup_map["categories"][category] else "descendants_with_mixins"
131
+ descendants.update(self.biolink_lookup_map["categories"][category][descendant_property])
132
+ for predicate in predicates:
133
+ descendant_property = "descendants" if not include_mixins and "descendants" in self.biolink_lookup_map["predicates"][predicate] else "descendants_with_mixins"
134
+ descendants.update(self.biolink_lookup_map["predicates"][predicate][descendant_property])
135
+ for aspect in aspects:
136
+ descendants.update(self.biolink_lookup_map["aspects"][aspect]["descendants"])
137
+ for direction in directions:
138
+ descendants.update(self.biolink_lookup_map["directions"][direction]["descendants"])
139
+ return list(descendants)
140
+
141
+ def get_canonical_predicates(self, predicates: Union[str, List[str], Set[str]], print_warnings: bool = True) -> List[str]:
142
+ """
143
+ Returns the canonical version of the input predicate(s). Accepts a single predicate or multiple predicates as
144
+ input and always returns the canonical predicate(s) in a list. Works with both proper and mixin predicates.
145
+ """
146
+ input_predicate_set = self._convert_to_set(predicates)
147
+ valid_predicates = input_predicate_set.intersection(self.biolink_lookup_map["predicates"])
148
+ invalid_predicates = input_predicate_set.difference(valid_predicates)
149
+ if invalid_predicates and print_warnings:
150
+ eprint(f"WARNING: Provided predicate(s) {invalid_predicates} do not exist in Biolink {self.biolink_version}")
151
+ canonical_predicates = {self.biolink_lookup_map["predicates"][predicate]["canonical_predicate"]
152
+ for predicate in valid_predicates}
153
+ canonical_predicates.update(invalid_predicates) # Go ahead and include those we don't have canonical info for
154
+ return list(canonical_predicates)
155
+
156
+ def get_predicate_depth_map(self) -> Dict[str, int]:
157
+ with open(self.biolink_model_file_path) as f:
158
+ biolink_model = yaml.safe_load(f)
159
+ predicate_dag = self._build_predicate_dag(biolink_model)
160
+ return self._get_depths_from_root(predicate_dag)
161
+
162
+ def is_symmetric(self, predicate: str) -> Optional[bool]:
163
+ if predicate in self.biolink_lookup_map["predicates"]:
164
+ return self.biolink_lookup_map["predicates"][predicate]["is_symmetric"]
165
+ else:
166
+ return True # Consider unrecognized predicates symmetric (rather than throw error)
167
+
168
+ def replace_mixins_with_direct_mappings(self, biolink_items: Union[str, List[str], Set[str]]) -> List[str]:
169
+ # TODO: After Plover is revised to not use this method, delete it.. (nothing else uses it)
170
+ input_item_set = self._convert_to_set(biolink_items)
171
+ return list(input_item_set)
172
+
173
+     def filter_out_mixins(self, biolink_items: Union[List[str], Set[str]]) -> List[str]:
+         """
+         Removes any predicate or category mixins in the input list.
+         """
+         input_item_set = self._convert_to_set(biolink_items)
+         # Iterate over the normalized set (not the raw input) so a bare string counts as one item
+         non_mixin_items = set(item for item in input_item_set
+                               if not (self.biolink_lookup_map["predicates"].get(item, dict()).get("is_mixin") or
+                                       self.biolink_lookup_map["categories"].get(item, dict()).get("is_mixin")))
+         return list(non_mixin_items)
181
+
182
+ def add_conflations(self, categories: Union[str, List[str], Set[str]]) -> List[str]:
183
+ """
184
+ Adds any "equivalent" categories (according to ARAX) to the input categories.
185
+ """
186
+ category_set = self._convert_to_set(categories)
187
+ return list({conflated_category for category in category_set
188
+ for conflated_category in self.arax_conflations.get(category, {category})})
189
+
190
+ def get_root_category(self) -> str:
191
+ return self.root_category
192
+
193
+ def get_root_predicate(self) -> str:
194
+ return self.root_predicate
195
+
196
+ def _create_biolink_lookup_map(self, biolink_model) -> Dict[str, Dict[str, Dict[str, Union[str, List[str], bool]]]]:
197
+ timestamp = str(datetime.datetime.now().isoformat())
198
+ eprint(f"{timestamp}: INFO: Building local Biolink {self.biolink_version} ancestor/descendant lookup map "
199
+ f"because one doesn't yet exist")
200
+ biolink_lookup_map = {"predicates": dict(), "categories": dict(),
201
+ "aspects": dict(), "directions": dict()}
202
+
203
+ # -------------------------------- PREDICATES --------------------------------- #
204
+ predicate_dag = self._build_predicate_dag(biolink_model)
205
+ # Build our map of predicate ancestors/descendants for easy lookup, first WITH mixins
206
+ for node_id in list(predicate_dag.nodes):
207
+ node_info = predicate_dag.nodes[node_id]
208
+ biolink_lookup_map["predicates"][node_id] = {
209
+ "ancestors_with_mixins": self._get_ancestors_nx(predicate_dag, node_id),
210
+ "descendants_with_mixins": self._get_descendants_nx(predicate_dag, node_id),
211
+ "is_symmetric": node_info.get("is_symmetric", False),
212
+ "canonical_predicate": node_info.get("canonical_predicate"),
213
+ "is_mixin": node_info.get("is_mixin", False)
214
+ }
215
+ # Now build our predicate ancestor/descendant lookup maps WITHOUT mixins
216
+ mixin_node_ids = [node_id for node_id, data in predicate_dag.nodes(data=True) if data.get("is_mixin")]
217
+ for mixin_node_id in mixin_node_ids:
218
+ predicate_dag.remove_node(mixin_node_id)
219
+ for node_id in list(predicate_dag.nodes):
220
+ biolink_lookup_map["predicates"][node_id]["ancestors"] = self._get_ancestors_nx(predicate_dag, node_id)
221
+ biolink_lookup_map["predicates"][node_id]["descendants"] = self._get_descendants_nx(predicate_dag, node_id)
222
+
223
+ # -------------------------------- CATEGORIES --------------------------------- #
224
+ category_dag = self._build_category_dag(biolink_model)
225
+ # Build our map of category ancestors/descendants for easy lookup, first WITH mixins
226
+ for node_id in list(category_dag.nodes):
227
+ node_info = category_dag.nodes[node_id]
228
+ biolink_lookup_map["categories"][node_id] = {
229
+ "ancestors_with_mixins": self._get_ancestors_nx(category_dag, node_id),
230
+ "descendants_with_mixins": self._get_descendants_nx(category_dag, node_id),
231
+ "is_mixin": node_info.get("is_mixin", False)
232
+ }
233
+ # Now build our category ancestor/descendant lookup maps WITHOUT mixins
234
+ mixin_node_ids = [node_id for node_id, data in category_dag.nodes(data=True) if data.get("is_mixin")]
235
+ for mixin_node_id in mixin_node_ids:
236
+ category_dag.remove_node(mixin_node_id)
237
+ for node_id in list(category_dag.nodes):
238
+ biolink_lookup_map["categories"][node_id]["ancestors"] = self._get_ancestors_nx(category_dag, node_id)
239
+ biolink_lookup_map["categories"][node_id]["descendants"] = self._get_descendants_nx(category_dag, node_id)
240
+
241
+ # -------------------------------- ASPECTS --------------------------------- #
242
+ aspect_dag = self._build_aspect_dag(biolink_model)
243
+ for node_id in list(aspect_dag.nodes):
244
+ biolink_lookup_map["aspects"][node_id] = {
245
+ "ancestors": self._get_ancestors_nx(aspect_dag, node_id),
246
+ "descendants": self._get_descendants_nx(aspect_dag, node_id)
247
+ }
248
+
249
+ # -------------------------------- DIRECTIONS --------------------------------- #
250
+ direction_dag = self._build_direction_dag(biolink_model)
251
+ for node_id in list(direction_dag.nodes):
252
+ biolink_lookup_map["directions"][node_id] = {
253
+ "ancestors": self._get_ancestors_nx(direction_dag, node_id),
254
+ "descendants": self._get_descendants_nx(direction_dag, node_id)
255
+ }
256
+
257
+ # Cache our Biolink lookup map (never needs to be refreshed for the given Biolink version)
258
+ with open(self.biolink_lookup_map_path, "wb") as output_file:
259
+ pickle.dump(biolink_lookup_map, output_file) # Use pickle so we can save Sets
260
+ # Also save a JSON version to help with debugging
261
+ json_file_path = self.biolink_lookup_map_path.replace(".pickle", ".json")
262
+ with open(json_file_path, "w+") as output_json_file:
263
+ json.dump(biolink_lookup_map, output_json_file, default=self._serialize_with_sets, indent=4)
264
+
265
+ return biolink_lookup_map
266
+
267
+ def _build_predicate_dag(self, biolink_model: dict) -> nx.DiGraph:
268
+ predicate_dag = nx.DiGraph()
269
+
270
+ # NOTE: 'slots' includes some things that aren't predicates, but we don't care; doesn't hurt to include them
271
+ for slot_name_english, info in biolink_model["slots"].items():
272
+ slot_name = self._convert_to_biolink_snakecase(slot_name_english)
273
+ # Record relationship between this node and its parent, if provided
274
+ parent_name_english = info.get("is_a")
275
+ if parent_name_english:
276
+ parent_name = self._convert_to_biolink_snakecase(parent_name_english)
277
+ predicate_dag.add_edge(parent_name, slot_name)
278
+ # Record relationship between this node and any direct 'mixins', if provided (treat same as is_a)
279
+ direct_mappings_english = info.get("mixins", [])
280
+ direct_mappings = {self._convert_to_biolink_snakecase(mapping_english)
281
+ for mapping_english in direct_mappings_english}
282
+ for direct_mapping in direct_mappings:
283
+ predicate_dag.add_edge(direct_mapping, slot_name)
284
+
285
+ # Record node metadata
286
+ self._add_node_if_doesnt_exist(predicate_dag, slot_name)
287
+ if info.get("mixin"):
288
+ predicate_dag.nodes[slot_name]["is_mixin"] = True
289
+ if info.get("symmetric"):
290
+ predicate_dag.nodes[slot_name]["is_symmetric"] = True
291
+ # Record the canonical form of this predicate
292
+ inverse_predicate_english = info.get("inverse")
293
+ is_canonical_predicate = info.get("annotations", dict()).get("canonical_predicate")
294
+ # A couple 'inverse' pairs of predicates in Biolink 3.0.3 seem to be missing a 'canonical_predicate' label,
295
+ # so we work around that below (see https://github.com/biolink/biolink-model/issues/1112)
296
+ canonical_predicate_english = slot_name_english if is_canonical_predicate or not inverse_predicate_english else inverse_predicate_english
297
+ canonical_predicate = self._convert_to_biolink_snakecase(canonical_predicate_english)
298
+ predicate_dag.nodes[slot_name]["canonical_predicate"] = canonical_predicate
299
+
300
+ # Last, filter out things that are not predicates (Biolink 'slots' includes other things too..)
301
+ non_predicate_node_ids = [node_id for node_id, data in predicate_dag.nodes(data=True)
302
+ if not (self.root_predicate in self._get_ancestors_nx(predicate_dag, node_id)
303
+ or data.get("is_mixin"))]
304
+ for non_predicate_node_id in non_predicate_node_ids:
305
+ predicate_dag.remove_node(non_predicate_node_id)
306
+
307
+ return predicate_dag
308
+
309
+ def _build_category_dag(self, biolink_model: dict) -> nx.DiGraph:
310
+ category_dag = nx.DiGraph()
311
+
312
+ for class_name_english, info in biolink_model["classes"].items():
313
+ class_name = self._convert_to_biolink_camelcase(class_name_english)
314
+ # Record relationship between this node and its parent, if provided
315
+ parent_name_english = info.get("is_a")
316
+ if parent_name_english:
317
+ parent_name = self._convert_to_biolink_camelcase(parent_name_english)
318
+ category_dag.add_edge(parent_name, class_name)
319
+ # Record relationship between this node and any direct 'mixins', if provided (treat same as is_a)
320
+ direct_mappings_english = info.get("mixins", [])
321
+ direct_mappings = {self._convert_to_biolink_camelcase(mapping_english)
322
+ for mapping_english in direct_mappings_english}
323
+ for direct_mapping in direct_mappings:
324
+ category_dag.add_edge(direct_mapping, class_name)
325
+
326
+ # Record node metadata
327
+ self._add_node_if_doesnt_exist(category_dag, class_name)
328
+ if info.get("mixin"):
329
+ category_dag.nodes[class_name]["is_mixin"] = True
330
+
331
+ # Last, filter out things that are not categories (Biolink 'classes' includes other things too..)
332
+ non_category_node_ids = [node_id for node_id, data in category_dag.nodes(data=True)
333
+ if not (self.root_category in self._get_ancestors_nx(category_dag, node_id)
334
+ or data.get("is_mixin"))]
335
+ for non_category_node_id in non_category_node_ids:
336
+ category_dag.remove_node(non_category_node_id)
337
+
338
+ return category_dag
339
+
340
+ def _build_aspect_dag(self, biolink_model: dict) -> nx.DiGraph:
341
+ aspect_dag = nx.DiGraph()
342
+
343
+ aspect_enum_field_name = "gene_or_gene_product_or_chemical_entity_aspect_enum" if self.biolink_version.startswith("3.0") else "GeneOrGeneProductOrChemicalEntityAspectEnum"
344
+ parent_to_child_dict = defaultdict(set)
345
+ for aspect_name_english, info in biolink_model["enums"][aspect_enum_field_name]["permissible_values"].items():
346
+ aspect_name_trapi = self._convert_to_biolink_snakecase(aspect_name_english, add_biolink_prefix=False)
347
+ parent_name_english = info.get("is_a") if info else None
348
+ if parent_name_english:
349
+ parent_name_trapi = self._convert_to_biolink_snakecase(parent_name_english, add_biolink_prefix=False)
350
+ aspect_dag.add_edge(parent_name_trapi, aspect_name_trapi)
351
+
352
+ return aspect_dag
353
+
354
+ def _build_direction_dag(self, biolink_model: dict) -> nx.DiGraph:
355
+ direction_dag = nx.DiGraph()
356
+
357
+ direction_enum_field_name = "direction_qualifier_enum" if self.biolink_version.startswith("3.0") else "DirectionQualifierEnum"
358
+ parent_to_child_dict = defaultdict(set)
359
+ for direction_name_english, info in biolink_model["enums"][direction_enum_field_name]["permissible_values"].items():
360
+ direction_name_trapi = self._convert_to_biolink_snakecase(direction_name_english, add_biolink_prefix=False)
361
+ parent_name_english = info.get("is_a") if info else None
362
+ if parent_name_english:
363
+ parent_name_trapi = self._convert_to_biolink_snakecase(parent_name_english, add_biolink_prefix=False)
364
+ direction_dag.add_edge(parent_name_trapi, direction_name_trapi)
365
+
366
+ return direction_dag
367
+
368
+     def _get_depths_from_root(self, dag) -> Dict[str, int]:
+         node_depths = {}
+         for node in nx.topological_sort(dag):
+             # Get all predecessors of the current node
+             predecessors = list(dag.predecessors(node))
+
+             # If the node has predecessors, calculate its depth as max(depth of predecessors) + 1
+             if predecessors:
+                 node_depths[node] = max(node_depths[pred] for pred in predecessors) + 1
+             else:
+                 node_depths[node] = 0  # Nodes with no predecessors (roots) have depth 0
+
+         return node_depths
383
+
384
+ @staticmethod
385
+ def _get_ancestors_nx(nx_graph: nx.DiGraph, node_id: str) -> List[str]:
386
+ return list(nx.ancestors(nx_graph, node_id).union({node_id}))
387
+
388
+ @staticmethod
389
+ def _get_descendants_nx(nx_graph: nx.DiGraph, node_id: str) -> List[str]:
390
+ return list(nx.descendants(nx_graph, node_id).union({node_id}))
391
+
392
+ @staticmethod
393
+ def _add_node_if_doesnt_exist(nx_graph: nx.DiGraph, node_id: str):
394
+ if not nx_graph.has_node(node_id):
395
+ nx_graph.add_node(node_id)
396
+
397
+ @staticmethod
398
+ def _convert_to_biolink_snakecase(english_term: str, add_biolink_prefix: bool = True):
399
+ # NOTE: aspects/directions should *not* have 'biolink' prefix, that's why that is controlled via a flag..
400
+ snakecase_term = english_term.replace(' ', '_')
401
+ if add_biolink_prefix and not snakecase_term.startswith("biolink:"):
402
+ return f"biolink:{snakecase_term}"
403
+ else:
404
+ return snakecase_term
405
+
406
+ @staticmethod
407
+ def _convert_to_biolink_camelcase(english_term: str):
408
+ camel_case_class_name = "".join([f"{word[0].upper()}{word[1:]}" for word in english_term.split(" ")])
409
+ if camel_case_class_name.startswith("biolink:"):
410
+ return camel_case_class_name
411
+ else:
412
+ return f"biolink:{camel_case_class_name}"
413
+
414
+ @staticmethod
415
+ def _convert_to_set(items: Union[str, List[str], Set[str]]) -> Set[str]:
416
+ if isinstance(items, str):
417
+ return {items}
418
+ elif isinstance(items, list):
419
+ return set(items)
420
+ elif isinstance(items, set):
421
+ return items
422
+ else:
423
+ return set()
424
+
425
+ @staticmethod
426
+ def _serialize_with_sets(obj: any) -> any:
427
+ return list(obj) if isinstance(obj, set) else obj
428
+
429
+
430
+ def main():
431
+ arg_parser = argparse.ArgumentParser()
432
+ arg_parser.add_argument('version', help="The Biolink Model version number to use")
433
+ arg_parser.add_argument('cached_path', help="A directory for caching downloaded biolink models")
434
+ args = arg_parser.parse_args()
435
+
436
+ bh = BiolinkHelper(biolink_version=args.version, cached_path=args.cached_path)
437
+
438
+ assert bh.root_predicate in bh.biolink_lookup_map["predicates"]
439
+ assert bh.root_category in bh.biolink_lookup_map["categories"]
440
+
441
+ # Test descendants
442
+ chemical_entity_descendants = bh.get_descendants("biolink:ChemicalEntity", include_mixins=True)
443
+ assert "biolink:Drug" in chemical_entity_descendants
444
+ assert "biolink:ChemicalEntity" in chemical_entity_descendants
445
+ assert "biolink:SmallMolecule" in chemical_entity_descendants
446
+ assert "biolink:NamedThing" not in chemical_entity_descendants
447
+ chemical_entity_descendants_no_mixins = bh.get_descendants("biolink:ChemicalEntity", include_mixins=False)
448
+ assert "biolink:Drug" in chemical_entity_descendants_no_mixins
449
+ assert "biolink:NamedThing" not in chemical_entity_descendants_no_mixins
450
+
451
+ # Test ancestors
452
+ protein_ancestors = bh.get_ancestors("biolink:Protein", include_mixins=True)
453
+ assert "biolink:NamedThing" in protein_ancestors
454
+ assert "biolink:ProteinIsoform" not in protein_ancestors
455
+ assert "biolink:GeneProductMixin" in protein_ancestors
456
+ protein_ancestors_no_mixins = bh.get_ancestors("biolink:Protein", include_mixins=False)
457
+ assert "biolink:NamedThing" in protein_ancestors_no_mixins
458
+ assert "biolink:ProteinIsoform" not in protein_ancestors_no_mixins
459
+ assert "biolink:GeneProductMixin" not in protein_ancestors_no_mixins
460
+ assert len(protein_ancestors_no_mixins) < len(protein_ancestors)
461
+
462
+ # Test predicates
463
+ treats_ancestors = bh.get_ancestors("biolink:treats")
464
+ assert "biolink:treats_or_applied_or_studied_to_treat" in treats_ancestors
465
+ related_to_descendants = bh.get_descendants("biolink:related_to", include_mixins=True)
466
+ assert "biolink:treats" in related_to_descendants
467
+ assert "biolink:same_as" in bh.get_descendants("biolink:close_match")
468
+ assert "biolink:close_match" in bh.get_ancestors("biolink:same_as")
469
+
470
+ # Test lists
471
+ combined_ancestors = bh.get_ancestors(["biolink:Gene", "biolink:Drug"])
472
+ assert "biolink:Drug" in combined_ancestors
473
+ assert "biolink:Gene" in combined_ancestors
474
+ assert "biolink:BiologicalEntity" in combined_ancestors
475
+
476
+ # Test conflations
477
+ protein_ancestors = bh.get_ancestors("biolink:Protein", include_conflations=True)
478
+ assert "biolink:Gene" in protein_ancestors
479
+ gene_descendants = bh.get_descendants("biolink:Gene", include_conflations=True)
480
+ assert "biolink:Protein" in gene_descendants
481
+ gene_conflations = bh.add_conflations("biolink:Gene")
482
+ assert set(gene_conflations) == {"biolink:Gene", "biolink:Protein"}
483
+
484
+ # Test canonical predicates
485
+ canonical_treated_by = bh.get_canonical_predicates("biolink:treated_by")
486
+ canonical_treats = bh.get_canonical_predicates("biolink:treats")
487
+ assert canonical_treated_by == ["biolink:treats"]
488
+ assert canonical_treats == ["biolink:treats"]
489
+ canonical_related_to = bh.get_canonical_predicates("biolink:related_to")
490
+ assert canonical_related_to == ["biolink:related_to"]
491
+ assert bh.get_canonical_predicates("biolink:superclass_of") == ["biolink:subclass_of"]
492
+
493
+ # Test filtering out mixins
494
+ mixin_less_list = bh.filter_out_mixins(["biolink:Protein", "biolink:Drug", "biolink:PhysicalEssence"])
495
+ assert set(mixin_less_list) == {"biolink:Protein", "biolink:Drug"}
496
+
497
+ # Test treats predicates
498
+ treats_or_descendants = bh.get_descendants("biolink:treats_or_applied_or_studied_to_treat",
499
+ include_mixins=True)
500
+ print(f"Descendants of 'biolink:treats_or_applied_or_studied_to_treat' are: \n{treats_or_descendants}")
501
+ assert "biolink:treats" in treats_or_descendants
502
+ assert "biolink:applied_to_treat" in treats_or_descendants
503
+ assert "biolink:ameliorates_condition" in treats_or_descendants
504
+ assert "biolink:treats" in bh.get_descendants("biolink:related_to",
505
+ include_mixins=True)
506
+
507
+ # Test predicate symmetry
508
+ assert bh.is_symmetric("biolink:related_to")
509
+ assert bh.is_symmetric("biolink:close_match")
510
+ assert not bh.is_symmetric("biolink:subclass_of")
511
+
512
+ # Test aspects
513
+ assert "molecular_modification" in bh.get_ancestors("ribosylation")
514
+ assert "activity" in bh.get_descendants("activity_or_abundance")
515
+
516
+ # Test directions
517
+ assert "increased" in bh.get_ancestors("upregulated")
518
+ assert "downregulated" in bh.get_descendants("decreased")
519
+
520
+ # Test excluding mixins
521
+ assert "biolink:treats" not in bh.get_descendants("biolink:related_to", include_mixins=False)
522
+
523
+ # Test replacing mixins with direct mappings (TODO: remove once Plover no longer uses this method)
524
+ assert ["biolink:treats"] == bh.replace_mixins_with_direct_mappings(["biolink:treats"])
525
+
526
+ print("All BiolinkHelper tests passed!")
527
+
528
+
529
+ if __name__ == "__main__":
530
+ main()
@@ -0,0 +1,114 @@
1
+ Metadata-Version: 2.4
2
+ Name: biolink-helper-pkg
3
+ Version: 1.0.0
4
+ Summary: Standalone BiolinkHelper module extracted from RTX
5
+ Author-email: Your Name <you@example.com>
6
+ Requires-Python: >=3.8
7
+ Description-Content-Type: text/markdown
8
+ Requires-Dist: networkx>=2.8
9
+ Requires-Dist: PyYAML>=6.0
10
+ Requires-Dist: requests>=2.25
11
+
12
+ # BiolinkHelper
13
+
14
+ `BiolinkHelper` is a lightweight Python utility class for working with a specific **Biolink Model version**.
15
+ Its data source is the [Biolink Model YAML](https://github.com/biolink/biolink-model/blob/master/biolink-model.yaml) file.
16
+ ---
17
+
18
+ ## Overview
19
+
20
+ The `BiolinkHelper` class requires two mandatory inputs at initialization time:
21
+
22
+ * A **Biolink Model version**
23
+ * A **local cache path** where Biolink resources are stored (or will be stored)
24
+
25
+ No defaults are assumed.
26
+
27
+ ---
28
+
29
+ ## Installation
30
+
31
+
32
+ ```bash
33
+ pip install biolink-helper-pkg
34
+ ```
35
+
36
+ ---
37
+
38
+ ## Usage
39
+
40
+ ### Basic Example
41
+
42
+ ```python
43
+ from biolink_helper_pkg import BiolinkHelper
44
+
45
+ helper = BiolinkHelper(
46
+     biolink_version="4.2.6",
47
+     cached_path="/data/biolink_cache"
48
+ )
49
+ ```
50
+
51
+ ---
52
+
53
+ ## Class API
54
+
55
+ ### `BiolinkHelper`
56
+
57
+ ```python
58
+ class BiolinkHelper:
59
+
60
+     def __init__(self, biolink_version, cached_path):
61
+         ...
62
+ ```
63
+
64
+ #### Parameters
65
+
66
+ | Name | Type | Required | Description |
67
+ | ----------------- | ----- | -------- |-------------------------------------------------------------------------------------------------------------|
68
+ | `biolink_version` | `str` | Yes | The Biolink Model version to use (e.g. `"4.2.6"`). This value is mandatory and must be provided explicitly. |
69
+ | `cached_path` | `str` | Yes | Path to a directory used for caching Biolink YAML files and lookup maps. Must be readable and writable. |
70
+
71
+ > Both parameters are **mandatory**.
72
+ > Initialization will fail if either is missing or invalid.
73
+
74
+ ---
75
+
76
+ #### How to use
77
+
78
+ Examples of ways to get **ancestors** (here `biolink_helper` is a `BiolinkHelper` instance, created as in the Basic Example):
79
+ ```python
80
+ biolink_helper.get_ancestors("biolink:Drug")
81
+ biolink_helper.get_ancestors(["biolink:Drug", "biolink:Protein"])
82
+ biolink_helper.get_ancestors("biolink:Drug", include_mixins=False)
83
+ biolink_helper.get_ancestors("biolink:Protein", include_conflations=False)
84
+ biolink_helper.get_ancestors("biolink:treats")
85
+ ```
86
+
87
+ Examples of ways to get **descendants**:
88
+ ```python
89
+ biolink_helper.get_descendants("biolink:ChemicalEntity")
90
+ biolink_helper.get_descendants(["biolink:ChemicalEntity", "biolink:Protein"])
91
+ biolink_helper.get_descendants("biolink:ChemicalEntity", include_mixins=False)
92
+ biolink_helper.get_descendants("biolink:Protein", include_conflations=False)
93
+ biolink_helper.get_descendants("biolink:related_to")
94
+ ```
95
+
96
+ Ancestors and descendants are always returned as a list, and the list includes the queried term(s). Relevant mixins are included by default; pass `include_mixins=False` to exclude them, as in the examples above. ARAX-defined conflations (for example, `biolink:Gene` with `biolink:Protein`) are also included by default and can be turned off with `include_conflations=False`.
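
To make the effect of `include_mixins` concrete, here is a minimal, self-contained sketch. It does not use the package: the hierarchy, the mixin set, and the function `toy_get_ancestors` are all hypothetical stand-ins, and the real `BiolinkHelper` builds its lookup from the Biolink Model YAML instead.

```python
from typing import Dict, List, Set

# Hypothetical toy hierarchy (NOT the real Biolink Model): child -> direct parents.
TOY_PARENTS: Dict[str, List[str]] = {
    "biolink:Protein": ["biolink:Polypeptide", "biolink:GeneProductMixin"],
    "biolink:Polypeptide": ["biolink:BiologicalEntity"],
    "biolink:BiologicalEntity": ["biolink:NamedThing"],
    "biolink:GeneProductMixin": [],
    "biolink:NamedThing": [],
}
# Hypothetical set of terms that are mixins in the toy hierarchy.
TOY_MIXINS: Set[str] = {"biolink:GeneProductMixin"}


def toy_get_ancestors(term: str, include_mixins: bool = True) -> List[str]:
    """Return the term itself plus all of its ancestors in the toy hierarchy."""
    seen: Set[str] = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        stack.extend(TOY_PARENTS.get(current, []))
    if not include_mixins:
        # Drop mixin terms from the result, mirroring include_mixins=False.
        seen -= TOY_MIXINS
    return sorted(seen)


with_mixins = toy_get_ancestors("biolink:Protein")
without_mixins = toy_get_ancestors("biolink:Protein", include_mixins=False)
print(with_mixins)
print(without_mixins)
```

In this sketch the result without mixins is strictly shorter, which is the same relationship the package's own self-test checks for `biolink:Protein`.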
97
+
98
+ Other available methods include getting **canonical predicates**:
99
+
100
+ ```python
101
+ biolink_helper.get_canonical_predicates("biolink:treated_by")
102
+ biolink_helper.get_canonical_predicates(["biolink:treated_by", "biolink:related_to"])
103
+ ```
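
As a rough illustration of what canonicalization means, the sketch below maps inverse predicates to their canonical form and leaves already-canonical predicates unchanged. The mapping table and the function `toy_get_canonical_predicates` are hypothetical; the two inverse pairs shown are the ones exercised by the package's self-test, and sorting the result is an assumption made here for deterministic output.

```python
from typing import Dict, List, Union

# Hypothetical inverse -> canonical table; the package derives its own from the Biolink Model.
TOY_CANONICAL: Dict[str, str] = {
    "biolink:treated_by": "biolink:treats",
    "biolink:superclass_of": "biolink:subclass_of",
}


def toy_get_canonical_predicates(predicates: Union[str, List[str]]) -> List[str]:
    """Accept one predicate or a list, and return the de-duplicated canonical forms."""
    items = [predicates] if isinstance(predicates, str) else list(predicates)
    # Predicates without an entry in the table are treated as already canonical.
    canonical = {TOY_CANONICAL.get(predicate, predicate) for predicate in items}
    return sorted(canonical)


single = toy_get_canonical_predicates("biolink:treated_by")
combined = toy_get_canonical_predicates(["biolink:treated_by", "biolink:treats"])
unchanged = toy_get_canonical_predicates("biolink:related_to")
print(single, combined, unchanged)
```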
104
+
105
+ And **filtering out mixins**:
106
+
107
+ ```python
108
+ biolink_helper.filter_out_mixins(["biolink:ChemicalEntity", "biolink:PhysicalEssence"])
109
+ ```
110
+
111
+
112
+ #### Debugging
113
+
114
+ The JSON file saved in `cached_path` is there only to make debugging and viewing easier.
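
Python sets are not JSON-serializable, which is presumably why the module carries a small `_serialize_with_sets` helper. The sketch below shows how such a function can be passed to `json.dumps` through its `default` parameter. The shape of `lookup_map` is a made-up example, and how the package actually calls its helper is an assumption.

```python
import json


def serialize_with_sets(obj):
    """Same logic as the module's `_serialize_with_sets`: sets become lists, anything else passes through."""
    return list(obj) if isinstance(obj, set) else obj


# Hypothetical lookup-map fragment containing a set, for illustration only.
lookup_map = {"categories": {"biolink:Drug": {"ancestors": {"biolink:NamedThing"}}}}

# json.dumps calls `default` for every object it cannot serialize on its own.
as_json = json.dumps(lookup_map, default=serialize_with_sets, indent=2)
print(as_json)
```

Loading the resulting file back with `json.loads` yields lists where the in-memory structure had sets.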
@@ -0,0 +1,9 @@
1
+ README.md
2
+ pyproject.toml
3
+ biolink_helper_pkg/__init__.py
4
+ biolink_helper_pkg/biolink_helper.py
5
+ biolink_helper_pkg.egg-info/PKG-INFO
6
+ biolink_helper_pkg.egg-info/SOURCES.txt
7
+ biolink_helper_pkg.egg-info/dependency_links.txt
8
+ biolink_helper_pkg.egg-info/requires.txt
9
+ biolink_helper_pkg.egg-info/top_level.txt
@@ -0,0 +1,3 @@
1
+ networkx>=2.8
2
+ PyYAML>=6.0
3
+ requests>=2.25
@@ -0,0 +1 @@
1
+ biolink_helper_pkg
@@ -0,0 +1,20 @@
1
+ [build-system]
2
+ requires = ["setuptools>=61.0", "wheel"]
3
+ build-backend = "setuptools.build_meta"
4
+
5
+ [project]
6
+ name = "biolink-helper-pkg"
7
+ version = "1.0.0"
8
+ description = "Standalone BiolinkHelper module extracted from RTX"
9
+ readme = "README.md"
10
+ authors = [
11
+ { name="Your Name", email="you@example.com" }
12
+ ]
13
+ license = { file="LICENSE" }
14
+ requires-python = ">=3.8"
15
+ dependencies = [
16
+ "networkx>=2.8",
17
+ "PyYAML>=6.0",
18
+ "requests>=2.25"
19
+ ]
20
+
@@ -0,0 +1,4 @@
1
+ [egg_info]
2
+ tag_build =
3
+ tag_date = 0
4
+