PyPI - scibite-toolkit - Versions diffs - 1.0.0__tar.gz - Mend

scibite-toolkit 1.0.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (19)

scibite_toolkit-1.0.0/LICENSE.txt +1 -0
scibite_toolkit-1.0.0/PKG-INFO +241 -0
scibite_toolkit-1.0.0/README.md +221 -0
scibite_toolkit-1.0.0/scibite_toolkit/__init__.py +4 -0
scibite_toolkit-1.0.0/scibite_toolkit/centree.py +158 -0
scibite_toolkit-1.0.0/scibite_toolkit/docstore.py +324 -0
scibite_toolkit-1.0.0/scibite_toolkit/scibite_search.py +990 -0
scibite_toolkit-1.0.0/scibite_toolkit/termite.py +1119 -0
scibite_toolkit-1.0.0/scibite_toolkit/texpress.py +592 -0
scibite_toolkit-1.0.0/scibite_toolkit/utilities.py +108 -0
scibite_toolkit-1.0.0/scibite_toolkit/workbench.py +780 -0
scibite_toolkit-1.0.0/scibite_toolkit.egg-info/PKG-INFO +241 -0
scibite_toolkit-1.0.0/scibite_toolkit.egg-info/SOURCES.txt +17 -0
scibite_toolkit-1.0.0/scibite_toolkit.egg-info/dependency_links.txt +1 -0
scibite_toolkit-1.0.0/scibite_toolkit.egg-info/not-zip-safe +1 -0
scibite_toolkit-1.0.0/scibite_toolkit.egg-info/requires.txt +7 -0
scibite_toolkit-1.0.0/scibite_toolkit.egg-info/top_level.txt +1 -0
scibite_toolkit-1.0.0/setup.cfg +4 -0
scibite_toolkit-1.0.0/setup.py +37 -0

scibite_toolkit-1.0.0/LICENSE.txt ADDED

@@ -0,0 +1 @@
+ The toolkit is dual licensed, if you are a customer it is covered under your existing conditions, if not currently a customer the toolkit is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-sa/4.0/.

scibite_toolkit-1.0.0/PKG-INFO ADDED

@@ -0,0 +1,241 @@
+Metadata-Version: 2.1
+Name: scibite_toolkit
+Version: 1.0.0
+Summary: scibite-toolkit - python library for calling SciBite applications: TERMite, TExpress, SciBite Search, CENtree and Workbench. The library also enables processing of the JSON results from such requests
+Home-page: https://github.com/elsevier-health/scibite-toolkit
+Author: SciBite
+Author-email: help@scibite.com
+License: Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License
+Classifier: Programming Language :: Python :: 3
+Classifier: Operating System :: OS Independent
+Description-Content-Type: text/markdown
+License-File: LICENSE.txt
+Requires-Dist: bs4
+Requires-Dist: pandas
+Requires-Dist: openpyxl
+Requires-Dist: requests
+Requires-Dist: sphinx
+Requires-Dist: sphinx-js
+Requires-Dist: rst2pdf
+### Project Description
+scibite-toolkit: a Python library for making calls to [SciBite](https://www.scibite.com/)'s TERMite, TExpress, CENtree, Workbench and SciBite Search.
+The library also enables post-processing of the JSON returned from such requests.
+## Install
+```
+$ pip3 install scibite_toolkit
+```
+Versions are listed on [PyPI](https://pypi.org/project/scibite-toolkit/).
+## Example call to TERMite
+In this example call to TERMite, we will annotate one compressed MEDLINE file and then process the output into a dataframe with the toolkit's built-in functions.
+We will use the first file from PubMed's Annual Baseline files.
+Two example scripts are shown: one that authenticates with a SciBite-hosted instance of TERMite and one that authenticates with a local instance of TERMite (hosted by the customer).
+Please note the following:
+- You can test with any file, as long as `set_input_format` matches its format.
+- If you would like to test with plain text (rather than a file), use `t.set_text('your text')` and do not call `t.set_binary_content`.
+### Example 1 - SciBite Hosted Instance of TERMite
+```python
+import pandas as pd
+from scibite_toolkit import termite
+# Initialize your TERMite Request
+t = termite.TermiteRequestBuilder()
+# Specify your TERMite API Endpoint and login URL
+t.set_url('url_endpoint')
+t.set_saas_login_url('login_url')
+# Authenticate with the instance
+username = 'username'
+password = 'password'
+t.set_auth_saas(username, password)
+# Set your runtime options
+t.set_entities('INDICATION')  # comma-separated list of VOCabs to run over your data
+t.set_input_format('medline.xml')  # the input format of the data sent to TERMite
+t.set_output_format('json')  # the output format of the response from TERMite
+t.set_binary_content('path/to/file')  # the file path of the file you want to annotate
+t.set_subsume(True)  # set subsume run time option (RTO) to true
+# Execute the request and convert response to dataframe for easy analysis
+termite_response = t.execute()
+resp_df = termite.get_termite_dataframe(termite_response)
+print(resp_df.head(3))
+```
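Example 1 turns on `set_subsume(True)`. We assume this means that when entity hits overlap, a longer hit subsumes (replaces) any shorter hit it fully contains, e.g. "lung cancer" inside "non-small cell lung cancer". The sketch below illustrates that general longest-match idea on hand-made hits; `Hit` and `subsume_hits` are hypothetical names, and this is not TERMite's implementation or its exact subsumption rules.

```python
from typing import List, NamedTuple


class Hit(NamedTuple):
    """A single entity match, with character offsets into the source text."""
    start: int   # offset of the first character of the match
    end: int     # offset one past the last character of the match
    label: str   # matched surface text


def subsume_hits(hits: List[Hit]) -> List[Hit]:
    """Keep only hits that are not strictly contained in a longer hit
    (illustrative longest-match rule, assumed rather than taken from TERMite)."""
    kept = []
    for hit in hits:
        hit_len = hit.end - hit.start
        # A hit is subsumed if some strictly longer hit fully covers its span
        subsumed = any(
            other.start <= hit.start
            and hit.end <= other.end
            and (other.end - other.start) > hit_len
            for other in hits
        )
        if not subsumed:
            kept.append(hit)
    return kept


text = "non-small cell lung cancer and asthma"
hits = [
    Hit(0, 26, "non-small cell lung cancer"),
    Hit(15, 26, "lung cancer"),
    Hit(20, 26, "cancer"),
    Hit(31, 37, "asthma"),
]
print([h.label for h in subsume_hits(hits)])
# ['non-small cell lung cancer', 'asthma']
```

Without subsumption, all four hits would be reported and "cancer" would be counted three times for the same stretch of text.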
+### Example 2 - Local Instance of TERMite (Hosted by Customer)
+```python
+import pandas as pd
+from scibite_toolkit import termite
+# Initialize your TERMite Request
+t = termite.TermiteRequestBuilder()
+# Specify your TERMite API endpoint
+t.set_url('url_endpoint')
+# Authenticate with the instance
+username = 'username'
+password = 'password'
+t.set_basic_auth(username, password)
+# Set your runtime options
+t.set_entities('INDICATION')  # comma-separated list of VOCabs to run over your data
+t.set_input_format('medline.xml')  # the input format of the data sent to TERMite
+t.set_output_format('json')  # the output format of the response from TERMite
+t.set_binary_content('path/to/file')  # the file path of the file you want to annotate
+t.set_subsume(True)  # set subsume run time option (RTO) to true
+# Execute the request and convert response to dataframe for easy analysis
+termite_response = t.execute()
+resp_df = termite.get_termite_dataframe(termite_response)
+print(resp_df.head(3))
+```
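Example 2 authenticates with `set_basic_auth` instead of the SaaS login flow. Assuming that method sends standard HTTP Basic credentials (the name suggests so, but the toolkit source is not shown here), each request carries an `Authorization` header like the one built below. `basic_auth_header` is a hypothetical helper for illustration only.

```python
import base64
from typing import Dict


def basic_auth_header(username: str, password: str) -> Dict[str, str]:
    """Build an HTTP Basic Authorization header (RFC 7617): base64("user:pass")."""
    credentials = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return {"Authorization": f"Basic {token}"}


print(basic_auth_header("username", "password"))
# {'Authorization': 'Basic dXNlcm5hbWU6cGFzc3dvcmQ='}
```

Base64 is an encoding, not encryption, so a local TERMite instance using this scheme should only be reached over HTTPS.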
+## Example call to TExpress
+In this example call to TExpress, we will annotate one compressed MEDLINE file and then process the output into a dataframe with the toolkit's built-in functions.
+We will use the first file from PubMed's Annual Baseline files.
+Two example scripts are shown: one that authenticates with a SciBite-hosted instance of TExpress and one that authenticates with a local instance of TExpress (hosted by the customer).
+Please note the following:
+- You can test with any file, as long as `set_input_format` matches its format.
+- If you would like to test with plain text (rather than a file), use `t.set_text('your text')` and do not call `t.set_binary_content`.
+### Example 1 - SciBite Hosted Instance of TExpress
+```python
+import pandas as pd
+from scibite_toolkit import texpress
+# Initialize your TExpress request
+t = texpress.TexpressRequestBuilder()
+# Specify your TExpress API endpoint and login URL
+t.set_url('url_endpoint')
+t.set_saas_login_url('login_url')
+# Authenticate with the instance
+username = 'username'
+password = 'password'
+t.set_auth_saas(username, password)
+# Set your runtime options
+t.set_entities('INDICATION')  # comma-separated list of VOCabs to run over your data
+t.set_input_format('medline.xml')  # the input format of the data sent to TExpress
+t.set_output_format('json')  # the output format of the response from TExpress
+t.set_binary_content('path/to/file')  # the file path of the file you want to annotate
+t.set_subsume(True)  # set subsume run time option (RTO) to true
+t.set_pattern(':(INDICATION):{0,5}:(INDICATION)')  # pattern to tell TExpress what to look for within data
+# Execute the request and convert response to dataframe for easy analysis
+texpress_resp = t.execute()
+resp_df = texpress.get_texpress_dataframe(texpress_resp)
+print(resp_df.head(3))
+```
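The pattern `:(INDICATION):{0,5}:(INDICATION)` asks TExpress for two INDICATION hits separated by a short stretch of text. The toy matcher below illustrates one reading of it over a pre-tagged token list, assuming `:(TYPE)` matches a single entity of that type and `{0,5}` matches zero to five intervening words. It is not TExpress's implementation; `match_gap_pattern` and the `(text, entity_type)` token format are hypothetical.

```python
from typing import List, Optional, Tuple

# Each token is (text, entity_type); entity_type is None for ordinary words
Token = Tuple[str, Optional[str]]


def match_gap_pattern(tokens: List[Token], entity_type: str,
                      min_gap: int, max_gap: int) -> List[Tuple[int, int]]:
    """Return (i, j) token indices where two entity_type hits are separated
    by between min_gap and max_gap intervening tokens (inclusive)."""
    matches = []
    for i, (_, tag) in enumerate(tokens):
        if tag != entity_type:
            continue
        for gap in range(min_gap, max_gap + 1):
            j = i + gap + 1  # index of the token right after `gap` intervening tokens
            if j >= len(tokens):
                break
            if tokens[j][1] == entity_type:
                matches.append((i, j))
    return matches


sentence = [("asthma", "INDICATION"), ("is", None), ("often", None),
            ("confused", None), ("with", None), ("COPD", "INDICATION")]
# Roughly analogous to ':(INDICATION):{0,5}:(INDICATION)'
print(match_gap_pattern(sentence, "INDICATION", 0, 5))  # [(0, 5)]
```

Here the two indications are four words apart, so they fall inside the `{0,5}` window; with six words between them, the same call would return no match.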
+### Example 2 - Local Instance of TExpress (Hosted by Customer)
+```python
+import pandas as pd
+from scibite_toolkit import texpress
+# Initialize your TExpress request
+t = texpress.TexpressRequestBuilder()
+# Specify your TExpress API endpoint
+t.set_url('url_endpoint')
+# Authenticate with the instance
+username = 'username'
+password = 'password'
+t.set_basic_auth(username, password)
+# Set your runtime options
+t.set_entities('INDICATION')  # comma-separated list of VOCabs to run over your data
+t.set_input_format('medline.xml')  # the input format of the data sent to TExpress
+t.set_output_format('json')  # the output format of the response from TExpress (JSON is needed for the dataframe conversion)
+t.set_binary_content('/path/to/file')  # the file path of the file you want to annotate
+t.set_subsume(True)  # set subsume run time option (RTO) to true
+t.set_pattern(':(INDICATION):{0,5}:(INDICATION)')  # pattern to tell TExpress what to look for within data
+# Execute the request and convert response to dataframe for easy analysis
+texpress_resp = t.execute()
+resp_df = texpress.get_texpress_dataframe(texpress_resp)
+print(resp_df.head(3))
+```
+## Example call to SciBite Search
+```python
+from scibite_toolkit import scibite_search
+# First authenticate. These values assume a SciBite SaaS-hosted instance; adapt them for your deployment
+ss_home = 'https://yourdomain-search.saas.scibite.com/'
+sbs_auth_url = "https://yourdomain.saas.scibite.com/"
+client_id = "yourclientid"
+client_secret = "yourclientsecret"
+s = scibite_search.SBSRequestBuilder()
+s.set_url(ss_home)
+s.set_auth_url(sbs_auth_url)
+s.set_oauth2(client_id, client_secret)  # the token lasts as long as was configured when the client was generated
+# Now you can use the request object
+# Search over documents
+sample_query = 'schema_id="clinical_trial" AND (title~INDICATION$D011565 AND DRUG$*)'
+# Note that the endpoint returns at most 100 results per call; use the offset parameter to paginate
+response = s.get_docs(query=sample_query, markup=True, limit=100)
+# Co-occurrence search across sentences
+# Get the top 50 co-occurrence sentence aggregates for the psoriasis indication and any gene
+response = s.get_aggregates(query='INDICATION$D011565', vocabs=['HGNCGENE'], limit=50)
+```
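The SciBite Search example notes that `get_docs` returns at most 100 results per call and that further results can be reached with the offset parameter. A minimal sketch of that offset/limit loop follows. `paginate` and `fake_fetch` are hypothetical; to use the loop against SciBite Search you would wrap `s.get_docs(...)` in a function that passes `offset` and `limit` (assuming `get_docs` accepts an `offset` keyword, as the comment implies) and returns the list of documents from each response, whose exact shape is not shown here.

```python
from typing import Any, Callable, Iterator, List


def paginate(fetch: Callable[[int, int], List[Any]], page_size: int = 100) -> Iterator[Any]:
    """Yield every item from an offset/limit endpoint, one page at a time.

    fetch(offset, limit) must return the list of items for that page.
    Iteration stops at the first page shorter than page_size.
    """
    offset = 0
    while True:
        page = fetch(offset, page_size)
        yield from page
        if len(page) < page_size:
            break
        offset += page_size


# Stand-in for the real endpoint: 250 fake documents served in slices
fake_docs = [f"doc-{n}" for n in range(250)]
calls = []


def fake_fetch(offset: int, limit: int) -> List[str]:
    calls.append(offset)  # record which offsets were requested
    return fake_docs[offset:offset + limit]


all_docs = list(paginate(fake_fetch))
print(len(all_docs), calls)  # 250 [0, 100, 200]
```

Stopping on a short page avoids needing a total count from the server; the trade-off is one extra, empty request when the total is an exact multiple of the page size.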
+## Example call to Workbench
+```python
+from scibite_toolkit import workbench
+# First, authenticate with the instance
+username = 'username'
+password = 'password'
+client_id = 'client_id'
+wb = workbench.WorkbenchRequestBuilder()
+url = 'https://workbench-url.com'
+wb.set_url(url)  # point the builder at your Workbench instance before authenticating
+wb.set_oauth2(client_id, username, password)
+# Then set up your call: here we create a WB dataset, upload a file to it and annotate it
+wb.set_dataset_name('My Test Dataset')
+wb.set_dataset_desc('My Test Description')
+wb.create_dataset()
+wb.set_file_input('path/to/file.xlsx')
+wb.upload_file_to_dataset()
+# In this example, we annotate only two columns, each with pre-selected VOCabs.
+# To let WB annotate the dataset without setting a TERMite config, just call auto_annotate_dataset
+vocabs = [[5,6],[8,9]]
+attrs = [200,201]
+wb.set_termite_config('',vocabs,attrs)
+wb.auto_annotate_dataset()
+```
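In the Workbench example, `vocabs` and `attrs` read as parallel lists: two columns (attribute IDs 200 and 201), each with its own list of VOCab IDs. Assuming `set_termite_config` pairs them by position (the example implies this but does not state it), the sketch below makes that pairing explicit and catches a length mismatch before any request is sent. `pair_columns_with_vocabs` is a hypothetical helper, not part of the toolkit.

```python
from typing import Dict, List


def pair_columns_with_vocabs(vocabs: List[List[int]], attrs: List[int]) -> Dict[int, List[int]]:
    """Map each attribute (column) ID to the VOCab IDs used to annotate it.

    Assumes vocabs[i] belongs to attrs[i]; raises if the lists differ in length.
    """
    if len(vocabs) != len(attrs):
        raise ValueError(
            f"got {len(vocabs)} VOCab lists for {len(attrs)} attributes; they must match"
        )
    # Copy each inner list so later edits to the input do not leak into the mapping
    return {attr: list(vocab_ids) for attr, vocab_ids in zip(attrs, vocabs)}


print(pair_columns_with_vocabs([[5, 6], [8, 9]], [200, 201]))
# {200: [5, 6], 201: [8, 9]}
```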
+## License
+Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

scibite_toolkit-1.0.0/scibite_toolkit/__init__.py ADDED

@@ -0,0 +1,4 @@
+from .termite import *
+from .texpress import *
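Because `__init__.py` star-imports only `termite` and `texpress`, the public names of those two modules (those without a leading underscore, unless a module defines `__all__`) are available directly on `scibite_toolkit`, while `workbench`, `scibite_search` and `centree` have to be imported explicitly and their classes referenced through the module, e.g. `workbench.WorkbenchRequestBuilder()`. The self-contained sketch below demonstrates this standard Python import behaviour with a throwaway package; `demo_toolkit` and its contents are hypothetical stand-ins, not the real package.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Build a throwaway package that mirrors the layout: __init__.py star-imports
# one submodule, while a second submodule is left alone.
with tempfile.TemporaryDirectory() as tmp:
    pkg = Path(tmp) / "demo_toolkit"
    pkg.mkdir()
    (pkg / "__init__.py").write_text("from .termite import *\n")
    (pkg / "termite.py").write_text("class TermiteRequestBuilder:\n    pass\n")
    (pkg / "workbench.py").write_text("class WorkbenchRequestBuilder:\n    pass\n")

    sys.path.insert(0, tmp)
    importlib.invalidate_caches()  # make sure the new directory is picked up
    try:
        demo = importlib.import_module("demo_toolkit")
        # The star import re-exports termite's public names at package level
        top_level_termite = hasattr(demo, "TermiteRequestBuilder")
        # workbench has not been imported yet, so it is not an attribute
        workbench_before = hasattr(demo, "workbench")
        # Importing the submodule explicitly binds it on the package...
        importlib.import_module("demo_toolkit.workbench")
        workbench_after = hasattr(demo, "workbench")
        # ...but its classes are still only reachable through the submodule
        top_level_workbench = hasattr(demo, "WorkbenchRequestBuilder")
    finally:
        sys.path.remove(tmp)

print(top_level_termite, workbench_before, workbench_after, top_level_workbench)
# True False True False
```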

scibite_toolkit-1.0.0/scibite_toolkit/centree.py ADDED

@@ -0,0 +1,158 @@
+import requests
+import logging
+# Get the logger for this module
+logger = logging.getLogger(__name__)
+class CentreeRequestBuilder:
+    """
+    Class for creating CENtree Requests.
+    """
+    def __init__(self, timeout: int = 10):
+        """
+        Initialize the CentreeRequestBuilder.
+        Parameters
+        ----------
+        timeout : int, optional
+            The timeout for HTTP requests in seconds (default is 10 seconds).
+        """
+        self.centree_url = ''
+        self.headers = {}
+        self.session = requests.Session()
+        self.timeout = timeout
+        self.logger: logging.Logger = logger
+    def set_url(self, centree_url: str):
+        """
+        Set the URL of the CENtree instance.
+        Parameters
+        ----------
+        centree_url : str
+            The URL of the CENtree instance to be hit.
+        Examples
+        --------
+        >>> crb.set_url("http://example.com")
+        """
+        self.centree_url = centree_url.rstrip('/')
+        self.logger.info(f"Set CENtree URL to {self.centree_url}")
+    def set_authentication(self, username: str, password: str, remember_me: bool = True, verification: bool = True):
+        """
+        Authenticates with the CENtree token API using username and password, generates an access token,
+        and sets the request header.
+        Parameters
+        ----------
+        username : str
+            The username for authentication.
+        password : str
+            The password for authentication.
+        remember_me : bool, optional
+            Whether to remember the user (default is True).
+        verification : bool, optional
+            Whether to verify SSL certificates (default is True).
+        Examples
+        --------
+        >>> crb.set_authentication("user", "pass")
+        """
+        authenticate_url = f"{self.centree_url}/api/authenticate"
+        try:
+            token_response = self.session.post(
+                authenticate_url,
+                json={
+                    "rememberMe": remember_me,
+                    "username": username,
+                    "password": password,
+                },
+                headers={"Content-Type": "application/json"},
+                verify=verification,
+                timeout=self.timeout
+            )
+            token_response.raise_for_status()
+            access_token = token_response.json().get("id_token")
+            if not access_token:
+                raise ValueError("Access token not found in the response.")
+            self.headers = {"Authorization": f"Bearer {access_token}"}
+            self.logger.info("Authentication successful")
+        except requests.exceptions.HTTPError as http_err:
+            self.logger.error(f"HTTP error occurred: {http_err.response.status_code} - {http_err.response.reason}")
+            raise http_err  # Re-raise the HTTPError for the test to catch
+        except requests.exceptions.RequestException as req_err:
+            self.logger.error(f"Request error: {req_err}")
+            raise req_err  # Re-raise the RequestException for the test to catch
+        except ValueError as val_err:
+            self.logger.error(f"Value error: {val_err}")
+            raise val_err  # Re-raise the ValueError for the test to catch
+        except Exception as err:
+            self.logger.error(f"An error occurred: {err}")
+            raise err  # Re-raise the generic exception for the test to catch
+    def search_classes(self, query: str, ontology_id: str = None, exact: bool = False, obsolete: bool = False,
+                       page_from: int = 0, page_size: int = 10) -> dict:
+        """
+        Search classes in the CENtree ontology.
+        Parameters
+        ----------
+        query : str
+            The search query.
+        ontology_id : str, optional
+            The ontology ID to search within.
+        exact : bool, optional
+            Whether to perform an exact search (default is False).
+        obsolete : bool, optional
+            Whether to include obsolete classes (default is False).
+        page_from : int, optional
+            The starting page number (default is 0).
+        page_size : int, optional
+            The number of results per page (default is 10).
+        Returns
+        -------
+        dict
+            The JSON response from the search endpoint.
+        Examples
+        --------
+        >>> result = crb.search_classes("diabetes")
+        """
+        params = {
+            "q": query,
+            "ontology": ontology_id,
+            "from": page_from,
+            "size": page_size
+        }
+        # Clean up params dictionary to remove None values
+        params = {k: v for k, v in params.items() if v is not None}
+        # Construct the endpoint URL
+        endpoint_suffix = ''
+        if obsolete:
+            endpoint_suffix += '/obsolete'
+        if exact:
+            endpoint_suffix += '/exact'
+        search_endpoint = f"{self.centree_url}/api/search{endpoint_suffix}"
+        try:
+            response = self.session.get(search_endpoint, params=params, headers=self.headers, timeout=self.timeout)
+            response.raise_for_status()
+            self.logger.info("Search request successful")
+            return response.json()
+        except requests.exceptions.HTTPError as http_err:
+            self.logger.error(f"HTTP error occurred: {http_err}")
+        except requests.exceptions.RequestException as req_err:
+            self.logger.error(f"Request error occurred: {req_err}")
+        except Exception as err:
+            self.logger.error(f"An error occurred: {err}")
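`search_classes` builds its request in two steps: it drops `None` parameters and appends `/obsolete` and/or `/exact` to the search path. The pure function below mirrors those lines so the resulting URL and query parameters can be checked without a live CENtree instance. `build_search_request` is a hypothetical helper, and whether the server accepts the combined `/obsolete/exact` path is not confirmed by the code shown.

```python
from typing import Dict, Optional, Tuple, Union


def build_search_request(
    base_url: str,
    query: str,
    ontology_id: Optional[str] = None,
    exact: bool = False,
    obsolete: bool = False,
    page_from: int = 0,
    page_size: int = 10,
) -> Tuple[str, Dict[str, Union[str, int]]]:
    """Return the (url, params) pair that search_classes would send."""
    params = {"q": query, "ontology": ontology_id, "from": page_from, "size": page_size}
    # None values are dropped, so an omitted ontology_id never reaches the query string
    params = {k: v for k, v in params.items() if v is not None}
    suffix = ""
    if obsolete:
        suffix += "/obsolete"
    if exact:
        suffix += "/exact"
    # set_url strips trailing slashes, so mirror that here
    return f"{base_url.rstrip('/')}/api/search{suffix}", params


print(build_search_request("http://example.com/", "diabetes", exact=True))
# ('http://example.com/api/search/exact', {'q': 'diabetes', 'from': 0, 'size': 10})
```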