genal-python 0.9tar.gz → 1.0tar.gz

Files changed (106)

{genal_python-0.9 → genal_python-1.0}/.DS_Store RENAMED

Binary file

genal_python-1.0/Genal_flowchart.png ADDED

Binary file

{genal_python-0.9 → genal_python-1.0}/PKG-INFO RENAMED

@@ -1,9 +1,9 @@
-Metadata-Version: 2.1
+Metadata-Version: 2.3
 Name: genal-python
-Version: 0.9
+Version: 1.0
 Summary: A python toolkit for polygenic risk scoring and mendelian randomization.
 Author-email: Cyprien Rivier <riviercyprien@gmail.com>
-Requires-Python: >=3.7
+Requires-Python: >=3.8
 Description-Content-Type: text/markdown
 Classifier: Programming Language :: Python :: 3
 Classifier: License :: OSI Approved :: GNU General Public License v3 or later (GPLv3+)
@@ -16,12 +16,16 @@ Requires-Dist: plotnine==0.12.3
 Requires-Dist: psutil==5.9.1
 Requires-Dist: pyliftover==0.4
 Requires-Dist: scikit_learn>=1.3.0
-Requires-Dist: scipy>=1.11.4
+Requires-Dist: scipy>=1.10.1, <1.11
 Requires-Dist: statsmodels==0.14.0
 Requires-Dist: tqdm==4.66.1
 Requires-Dist: wget==3.2
 Project-URL: Home, https://github.com/CypRiv/genal
+[![Python 3.8](https://img.shields.io/badge/python-3.8%20%7C%203.9%20%7C%203.10-blue)](https://www.python.org/downloads/release/python-3100/)
+<img src="/genal_logo.png" data-canonical-src="/genal_logo.png" height="80" />
 <center><h1> genal: A Python Toolkit for Genetic Risk Scoring and Mendelian Randomization </h1></center>
@@ -54,12 +58,15 @@ The module prioritizes user-friendliness and intuitive operation, aiming to redu
 Genal draws on concepts from well-established R packages such as TwoSampleMR, MR-Presso, MendelianRandomization, and gwasvcf, adapting their proven methodologies to the Python environment. This approach ensures that users have access to tried and tested techniques with the versatility of Python's data science tools.
+<img src="/Genal_flowchart.png" data-canonical-src="/Genal_flowchart.png" style="max-width:100%;" />
+Genal flowchart. Created in https://www.BioRender.com
 ## Citation <a name="citation"></a>
 If you're using genal, please cite the following paper:
 **Genal: A Python Toolkit for Genetic Risk Scoring and Mendelian Randomization.** Cyprien A. Rivier, Santiago Clocchiatti-Tuozzo, Shufan Huo, Victor Torres-Lopez, Daniela Renedo, Kevin N. Sheth, Guido J. Falcone, Julian N. Acosta. medRxiv 2024.05.23.24307776; doi: https://doi.org/10.1101/2024.05.23.24307776
 ## Requirements for the genal module <a name="paragraph1"></a>
-***Python 3.11 or later***. https://www.python.org/ <br>
+***Python 3.8 or later***. https://www.python.org/ <br>
 ## Installation and How to use the genal module <a name="paragraph2"></a>
@@ -70,7 +77,7 @@ If you're using genal, please cite the following paper:
 >
 > **Optional**: It is recommended to create a new environment to avoid dependencies conflicts. Here, we create a new conda environment called 'genal_env'.
 > ```
-> conda create --name genal_env python=3.11
+> conda create --name genal_env python=3.8
 > conda activate genal_env
 > ```
@@ -84,12 +91,19 @@ And import it in a python environment with:
 import genal
 ```
-The main genal functionalities require a working installation of PLINK v1.9 that can be downloaded here: https://www.cog-genomics.org/plink/
-Once downloaded, the path to the plink executable can be set with:
+The main genal functionalities require a working installation of PLINK v1.9 (and not 2.0 as certain functionalities have not been updated yet).
+If you have already installed plink v1.9, you can set the path to its executable with:
 ```
 genal.set_plink(path="/path/to/plink/executable/file")
 ```
+If plink is not installed, genal can install the correct version for your system with the following line:
+```
+genal.install_plink()
+```
 ### Documentation <a name="paragraph2.2"></a>
 For detailed information on how to use the functionalities of Genal, please refer to the documentation: https://genal.rtfd.io
@@ -124,7 +138,7 @@ For this tutorial, we will obtain genetic instruments for systolic blood pressur
 ### Data loading <a name="paragraph3.1"></a>
-We start this tutorial with publicly available summary statistics from a large GWAS study of systolic blood pressure. [Link to study](https://www.nature.com/articles/s41588-018-0205-x). After downloading and unzipping the summary statistics, we load them into a pandas DataFrame:
+We start this tutorial with publicly available summary statistics from a large GWAS study of systolic blood pressure. [Link to study](https://www.nature.com/articles/s41588-018-0205-x). [Download link](http://ftp.ebi.ac.uk/pub/databases/gwas/summary_statistics/GCST006001-GCST007000/GCST006624/Evangelou_30224653_SBP.txt.gz). After downloading and unzipping the summary statistics, we load them into a pandas DataFrame:
 ```python
 import pandas as pd
@@ -378,7 +392,7 @@ You can customize how the proxies are chosen with the following arguments:
 To run MR, we need to load both our exposure and outcome SNP-level data in `genal.Geno` instances. In our case, the genetic instruments of the MR are the SNPs associated with blood pressure at genome-wide significant levels resulting from the clumping of the blood pressure GWAS. They are stored in our `SBP_clumped` `genal.Geno` instance which also include their association with the exposure trait (instrument-SBP estimates in the `BETA` column).
-To get their association with the outcome trait (instrument-stroke estimates), we are going to use SNP-level data from a large GWAS of stroke performed by the GIGASTROKE consortium ([https://www.nature.com/articles/s41586-022-05165-3](https://www.nature.com/articles/s41586-022-05165-3)):
+To get their association with the outcome trait (instrument-stroke estimates), we are going to use SNP-level data from a large GWAS of stroke performed by the GIGASTROKE consortium: [Link to study](https://www.nature.com/articles/s41586-022-05165-3). [Link to download](http://ftp.ebi.ac.uk/pub/databases/gwas/summary_statistics/GCST90104001-GCST90105000/GCST90104539/GCST90104539_buildGRCh37.tsv.gz):
 ```python
 stroke_gwas = pd.read_csv("GCST90104539_buildGRCh37.tsv",sep="\t")

{genal_python-0.9 → genal_python-1.0}/README.md RENAMED

@@ -1,3 +1,7 @@
+[![Python 3.8](https://img.shields.io/badge/python-3.8%20%7C%203.9%20%7C%203.10-blue)](https://www.python.org/downloads/release/python-3100/)
+<img src="/genal_logo.png" data-canonical-src="/genal_logo.png" height="80" />
 <center><h1> genal: A Python Toolkit for Genetic Risk Scoring and Mendelian Randomization </h1></center>
@@ -30,12 +34,15 @@ The module prioritizes user-friendliness and intuitive operation, aiming to redu
 Genal draws on concepts from well-established R packages such as TwoSampleMR, MR-Presso, MendelianRandomization, and gwasvcf, adapting their proven methodologies to the Python environment. This approach ensures that users have access to tried and tested techniques with the versatility of Python's data science tools.
+<img src="/Genal_flowchart.png" data-canonical-src="/Genal_flowchart.png" style="max-width:100%;" />
+Genal flowchart. Created in https://www.BioRender.com
 ## Citation <a name="citation"></a>
 If you're using genal, please cite the following paper:
 **Genal: A Python Toolkit for Genetic Risk Scoring and Mendelian Randomization.** Cyprien A. Rivier, Santiago Clocchiatti-Tuozzo, Shufan Huo, Victor Torres-Lopez, Daniela Renedo, Kevin N. Sheth, Guido J. Falcone, Julian N. Acosta. medRxiv 2024.05.23.24307776; doi: https://doi.org/10.1101/2024.05.23.24307776
 ## Requirements for the genal module <a name="paragraph1"></a>
-***Python 3.11 or later***. https://www.python.org/ <br>
+***Python 3.8 or later***. https://www.python.org/ <br>
 ## Installation and How to use the genal module <a name="paragraph2"></a>
@@ -46,7 +53,7 @@ If you're using genal, please cite the following paper:
 >
 > **Optional**: It is recommended to create a new environment to avoid dependencies conflicts. Here, we create a new conda environment called 'genal_env'.
 > ```
-> conda create --name genal_env python=3.11
+> conda create --name genal_env python=3.8
 > conda activate genal_env
 > ```
@@ -60,12 +67,19 @@ And import it in a python environment with:
 import genal
 ```
-The main genal functionalities require a working installation of PLINK v1.9 that can be downloaded here: https://www.cog-genomics.org/plink/
-Once downloaded, the path to the plink executable can be set with:
+The main genal functionalities require a working installation of PLINK v1.9 (and not 2.0 as certain functionalities have not been updated yet).
+If you have already installed plink v1.9, you can set the path to its executable with:
 ```
 genal.set_plink(path="/path/to/plink/executable/file")
 ```
+If plink is not installed, genal can install the correct version for your system with the following line:
+```
+genal.install_plink()
+```
 ### Documentation <a name="paragraph2.2"></a>
 For detailed information on how to use the functionalities of Genal, please refer to the documentation: https://genal.rtfd.io
@@ -100,7 +114,7 @@ For this tutorial, we will obtain genetic instruments for systolic blood pressur
 ### Data loading <a name="paragraph3.1"></a>
-We start this tutorial with publicly available summary statistics from a large GWAS study of systolic blood pressure. [Link to study](https://www.nature.com/articles/s41588-018-0205-x). After downloading and unzipping the summary statistics, we load them into a pandas DataFrame:
+We start this tutorial with publicly available summary statistics from a large GWAS study of systolic blood pressure. [Link to study](https://www.nature.com/articles/s41588-018-0205-x). [Download link](http://ftp.ebi.ac.uk/pub/databases/gwas/summary_statistics/GCST006001-GCST007000/GCST006624/Evangelou_30224653_SBP.txt.gz). After downloading and unzipping the summary statistics, we load them into a pandas DataFrame:
 ```python
 import pandas as pd
@@ -354,7 +368,7 @@ You can customize how the proxies are chosen with the following arguments:
 To run MR, we need to load both our exposure and outcome SNP-level data in `genal.Geno` instances. In our case, the genetic instruments of the MR are the SNPs associated with blood pressure at genome-wide significant levels resulting from the clumping of the blood pressure GWAS. They are stored in our `SBP_clumped` `genal.Geno` instance which also include their association with the exposure trait (instrument-SBP estimates in the `BETA` column).
-To get their association with the outcome trait (instrument-stroke estimates), we are going to use SNP-level data from a large GWAS of stroke performed by the GIGASTROKE consortium ([https://www.nature.com/articles/s41586-022-05165-3](https://www.nature.com/articles/s41586-022-05165-3)):
+To get their association with the outcome trait (instrument-stroke estimates), we are going to use SNP-level data from a large GWAS of stroke performed by the GIGASTROKE consortium: [Link to study](https://www.nature.com/articles/s41586-022-05165-3). [Link to download](http://ftp.ebi.ac.uk/pub/databases/gwas/summary_statistics/GCST90104001-GCST90105000/GCST90104539/GCST90104539_buildGRCh37.tsv.gz):
 ```python
 stroke_gwas = pd.read_csv("GCST90104539_buildGRCh37.tsv",sep="\t")

{genal_python-0.9 → genal_python-1.0}/docs/.DS_Store RENAMED

Binary file

{genal_python-0.9 → genal_python-1.0/docs}/requirements.txt RENAMED

@@ -1,13 +1,14 @@
+sphinx
+sphinx_rtd_theme
 aiohttp==3.9.5
 nest_asyncio==1.5.5
-numpy>=1.24.4, <2.0
+numpy>=1.24.4,<2.0
 pandas>=2.0.3
 plotnine==0.12.3
 psutil==5.9.1
 pyliftover==0.4
 scikit_learn>=1.3.0
 scipy>=1.11.4
-sphinx_rtd_theme==1.3.0
 statsmodels==0.14.0
 tqdm==4.66.1
-wget==3.2
+wget==3.2

{genal_python-0.9 → genal_python-1.0}/docs/source/conf.py RENAMED

@@ -13,7 +13,7 @@ sys.path.insert(0, os.path.abspath('../../'))
 project = 'genal'
 copyright = '2023, Cyprien A. Rivier'
 author = 'Cyprien A. Rivier'
-release = 'v0.9'
+release = 'v1.0'
 # -- General configuration ---------------------------------------------------

{genal_python-0.9 → genal_python-1.0}/docs/source/index.rst RENAMED

@@ -6,9 +6,9 @@
 genal: A Python Toolkit for Genetic Risk Scoring and Mendelian Randomization
 ============================================================================
-:Author: Cyprien Rivier
+:Author: Cyprien A. Rivier
 :Date: |today|
-:Version: "0.8"
+:Version: "1.0"
 Genal is a python module designed to make it easy to run genetic risk scores and mendelian randomization analyses. It integrates a collection of tools that facilitate the cleaning of single nucleotide polymorphism data (usually derived from Genome-Wide Association Studies) and enable the execution of key clinical population genetic workflows. The functionalities provided by genal include clumping, lifting, association testing, polygenic risk scoring, and Mendelian randomization analyses, all within a single Python module.
@@ -46,7 +46,7 @@ Citation
 If you use genal in your work, please cite the following paper:
 .. [Rivier.2024] *Genal: A Python Toolkit for Genetic Risk Scoring and Mendelian Randomization*
-   Cyprien Rivier, Cyprien A. Rivier, Santiago Clocchiatti-Tuozzo, Shufan Huo, Victor Torres-Lopez, Daniela Renedo, Kevin N. Sheth, Guido J. Falcone, Julian N. Acosta.
+   Cyprien A. Rivier, Santiago Clocchiatti-Tuozzo, Shufan Huo, Victor Torres-Lopez, Daniela Renedo, Kevin N. Sheth, Guido J. Falcone, Julian N. Acosta.
    medRxiv. 2024 May `10.1101/2024.05.23.24307776 <https://doi.org/10.1101/2024.05.23.24307776>`_.
 References

{genal_python-0.9 → genal_python-1.0}/docs/source/introduction.rst RENAMED

@@ -7,10 +7,10 @@ Installation
     .. code-block:: bash
-        conda create --name genal_env python=3.11
+        conda create --name genal_env python=3.8
         conda activate genal_env
-The genal package requires Python 3.11. Download and install it with pip:
+The genal package requires Python 3.8 or later. Download and install it with pip:
 .. code-block:: bash
@@ -22,13 +22,19 @@ And import it in a python environment with:
     import genal
-The main genal functionalities require a working installation of PLINK v1.9 (and not 2.0 as certain functionalities have not been updated yet) that can be downloaded here: https://www.cog-genomics.org/plink/
-Once downloaded, the path to the plink 1.9 executable should be set with:
+The main genal functionalities require a working installation of PLINK v1.9 (and not 2.0 as certain functionalities have not been updated yet).
+If you have already installed plink v1.9, you can set the path to its executable with:
 .. code-block:: python
     genal.set_plink(path="/path/to/plink/executable/file")
+If plink is not installed, genal can install the correct version for your system with the :meth:`~genal.tools.install_plink` function:
+.. code-block:: python
+    genal.install_plink()
 ========
 Tutorial
 ========
@@ -51,7 +57,7 @@ h. `GWAS Catalog`_
 Data loading
 ============
-We start this tutorial with publicly available summary statistics data from a large GWAS of systolic blood pressure (https://www.nature.com/articles/s41588-018-0205-x). After downloading and unzipping the summary statistics, we load the data into a pandas dataframe:
+We start this tutorial with publicly available summary statistics data from a large GWAS of systolic blood pressure `Link to study <https://www.nature.com/articles/s41588-018-0205-x>`_. `Download link <http://ftp.ebi.ac.uk/pub/databases/gwas/summary_statistics/GCST006001-GCST007000/GCST006624/Evangelou_30224653_SBP.txt.gz>`_. After downloading and unzipping the summary statistics, we load the data into a pandas dataframe:
 .. code-block:: python
@@ -318,7 +324,8 @@ Mendelian Randomization
 To run MR, we need to load both our exposure and outcome SNP-level data in :class:`~genal.Geno` instances. In our case, the genetic instruments of the MR are the SNPs associated with blood pressure at genome-wide significant levels resulting from the clumping of the blood pressure GWAS. They are stored in our ``SBP_clumped`` :class:`~genal.Geno` instance which also include their association with the exposure trait (instrument-SBP estimates in the ``BETA`` column).
-To get their association with the outcome trait (instrument-stroke estimates), we are going to use SNP-level data from a large GWAS of stroke performed by the GIGASTROKE consortium (`Nature article <https://www.nature.com/articles/s41586-022-05165-3>`_):
+To get their association with the outcome trait (instrument-stroke estimates), we are going to use SNP-level data from a large GWAS of stroke performed by the GIGASTROKE consortium:
+`Link to study <https://www.nature.com/articles/s41586-022-05165-3>`_. `Download link <http://ftp.ebi.ac.uk/pub/databases/gwas/summary_statistics/GCST90104001-GCST90105000/GCST90104539/GCST90104539_buildGRCh37.tsv.gz>`_.
 .. code-block:: python

{genal_python-0.9 → genal_python-1.0}/genal/Geno.py RENAMED

@@ -223,6 +223,8 @@ class Geno:
             and "EA" in data.columns
         )
         if missing_nea_condition and preprocessing in ['Fill', 'Fill_delete']:
+            check_allele_column(data, "EA", keep_multi)
+            self.checks["EA"] = True
             data = fill_nea(data, self.get_reference_panel(reference_panel))
         # Fill missing EA and NEA columns from reference data if necessary and preprocessing is enabled
@@ -254,7 +256,7 @@ class Geno:
             check_allele_condition = (allele_col in data.columns) and (
                 (preprocessing in ['Fill', 'Fill_delete']) or (not keep_multi)
             )
-            if check_allele_condition:
+            if check_allele_condition and not self.checks[allele_col]:
                 check_allele_column(data, allele_col, keep_multi)
                 self.checks[allele_col] = True
@@ -687,7 +689,9 @@ class Geno:
         snp_list = data["SNP"]
         # Extract SNPs using the provided path and SNP list
-        _ = extract_snps_func(snp_list, self.name, path)
+        path = extract_snps_func(snp_list, self.name, path)
+        if path == "FAILED":
+            raise ValueError("No SNPs were extracted from the genetic data and the association test can't be run.")
         # Perform the association test
         updated_data = association_test_func(
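
Note on the hunk above: version 1.0 stops discarding the return value of `extract_snps_func` and raises when it equals the `"FAILED"` sentinel. The sketch below illustrates that pattern in isolation. `extract_stub`, `run_association`, and the `tmp/extracted` prefix are hypothetical stand-ins introduced for illustration and are not part of genal.

```python
def extract_stub(snp_list, available_snps):
    """Hypothetical stand-in for an extraction step.

    Returns an output prefix on success, or the "FAILED" sentinel
    when none of the requested SNPs are available.
    """
    found = [snp for snp in snp_list if snp in available_snps]
    if not found:
        return "FAILED"
    return "tmp/extracted"  # hypothetical output prefix


def run_association(snp_list, available_snps):
    """Stop early with a clear error instead of testing an empty fileset."""
    path = extract_stub(snp_list, available_snps)
    if path == "FAILED":
        raise ValueError(
            "No SNPs were extracted from the genetic data "
            "and the association test can't be run."
        )
    return path  # a real caller would pass this path to the test step
```

Checking the sentinel right after extraction means the failure is reported where it happens, rather than surfacing later as a missing-file error in the association step.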

{genal_python-0.9 → genal_python-1.0}/genal/MR.py RENAMED

@@ -196,8 +196,6 @@ def mr_egger_regression(BETA_e, SE_e, BETA_o, SE_o):
         SE_e (numpy array): Standard errors corresponding to `BETA_e`.
         BETA_o (numpy array): Effect sizes of the same genetic variants on the outcome.
         SE_o (numpy array): Standard errors corresponding to `BETA_o`.
-        nboot (int): Number of boostrap iterations to obtain the standard error and p-value
-        cpus (int): Number of cpu cores to use in parallel for the boostrapping iterations.
     Returns:
         list of dict: A list containing two dictionaries with the results for the egger regression estimate and the egger regression intercept (horizontal pleiotropy estimate):

{genal_python-0.9 → genal_python-1.0}/genal/MR_tools.py RENAMED

@@ -2,7 +2,6 @@ import pandas as pd
 import numpy as np
 import datetime
 import os, subprocess
-import scipy.stats as st
 from pandas.api.types import is_numeric_dtype
 from .proxy import find_proxies, query_outcome_proxy

{genal_python-0.9 → genal_python-1.0}/genal/MRpresso.py RENAMED

@@ -2,8 +2,6 @@ import numpy as np
 import pandas as pd
 import statsmodels.api as sm
 import statsmodels.formula.api as smf
-from scipy import stats
-from scipy.stats import norm, chi2
 from concurrent.futures import ProcessPoolExecutor, as_completed
 from sklearn.linear_model import LinearRegression
 from tqdm import tqdm
@@ -11,7 +9,6 @@ from numpy.random import default_rng
 from functools import partial
 ##todo: implement the multivariable option, for the moment we assume only 1 BETA_e column
-# Also: check if we can replace the LinearRegression of sklearn with one from statsmodels to avoid using sklearn just for that
 # MR-PRESSO main function

{genal_python-0.9 → genal_python-1.0}/genal/__init__.py RENAMED

@@ -1,9 +1,9 @@
 import os
 import json
-from .tools import default_config, write_config, set_plink, delete_tmp, get_reference_panel_path
+from .tools import default_config, write_config, set_plink, install_plink, delete_tmp, get_reference_panel_path
 from .geno_tools import Combine_Geno
-__version__ = "0.9"
+__version__ = "1.0"
 config_dir = os.path.expanduser(
     "~/.genal/"

{genal_python-0.9 → genal_python-1.0}/genal/extract_prs.py RENAMED

@@ -121,14 +121,17 @@ def extract_snps_func(snp_list, name, path=None):
     else:
         extract_snps_from_combined_data(name, path, output_path, snp_list_path)
-    bim_path = output_path + ".bim"
-    if not os.path.exists(bim_path):
-        print(f"None of the provided SNPs were found in the genetic data.")
-        return "FAILED"
-    else:
-        print(f"Created bed/bim/fam fileset with extracted SNPs: {output_path}")
-        # Report SNPs not found
-        report_snps_not_found(nrow, name)
+    #Check that at least 1 variant has been extracted. If not, return "FAILED" to warn downstream functions (prs, association_test)
+    log_path = output_path + ".log"
+    with open(log_path, 'r') as log_file:
+        if "No variants remaining" in log_file.read():
+            print("None of the provided SNPs were found in the genetic data.")
+            return "FAILED"
+        else:
+            print(f"Created bed/bim/fam fileset with extracted SNPs: {output_path}")
+            # Report SNPs not found
+            report_snps_not_found(nrow, name)
     return output_path
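
Note on the hunk above: success is now detected by scanning the extraction log for the string "No variants remaining" instead of testing whether a `.bim` file exists. A minimal sketch of that check follows. The helper name `extraction_succeeded` is hypothetical, and the two log files are made-up samples written only for this example; they are not real PLINK output.

```python
import os
import tempfile


def extraction_succeeded(log_path):
    """Return False when the log reports that no variants remain."""
    with open(log_path, "r") as log_file:
        return "No variants remaining" not in log_file.read()


with tempfile.TemporaryDirectory() as tmp_dir:
    ok_log = os.path.join(tmp_dir, "ok.log")
    empty_log = os.path.join(tmp_dir, "empty.log")

    # Made-up log contents for illustration only
    with open(ok_log, "w") as handle:
        handle.write("12 variants kept after extraction.\n")
    with open(empty_log, "w") as handle:
        handle.write("Error: No variants remaining after extraction.\n")

    ok_result = extraction_succeeded(ok_log)
    empty_result = extraction_succeeded(empty_log)
```

One consequence of this design is that the check raises `FileNotFoundError` if the log file itself is missing, so a caller may want to handle that case separately.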

{genal_python-0.9 → genal_python-1.0}/genal/geno_tools.py RENAMED

@@ -121,7 +121,7 @@ def check_beta_column(data, effect_column, preprocessing):
             "The argument effect_column accepts only 'BETA' or 'OR' as values."
         )
     if effect_column == "OR":
-        data["BETA"] = np.log(data["BETA"])
+        data["BETA"] = np.log(data["BETA"].clip(lower=0.01))
         data.drop(columns="SE", errors="ignore", inplace=True)
         print("The BETA column has been log-transformed to obtain Beta estimates.")
     return
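
Note on the hunk above: odds ratios are now floored at 0.01 before the log transform, so an odds ratio of 0 no longer produces negative infinity. The sketch below shows the effect using plain Python rather than the pandas `clip` call in the diff. The helper name `or_to_beta` and the sample values are assumptions made for illustration.

```python
import math

OR_FLOOR = 0.01  # same lower bound as clip(lower=0.01) in the diff


def or_to_beta(odds_ratio, floor=OR_FLOOR):
    """Log-transform an odds ratio after flooring it at `floor`."""
    return math.log(max(odds_ratio, floor))


# Illustrative odds ratios, including values below the floor
sample_ors = [1.0, 2.0, 0.005, 0.0]
sample_betas = [or_to_beta(value) for value in sample_ors]
```

Every value at or below 0.01 maps to the same beta, `log(0.01)`, so the transform stays finite at the cost of no longer distinguishing very small odds ratios from each other.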

{genal_python-0.9 → genal_python-1.0}/genal/snp_query.py RENAMED

@@ -2,7 +2,7 @@ import aiohttp
 import asyncio
 import numpy as np
 import nest_asyncio
-from tqdm.asyncio import tqdm_asyncio
+from tqdm.auto import tqdm
 # Using nest_asyncio to allow execution in notebooks
 nest_asyncio.apply()
@@ -10,8 +10,16 @@ nest_asyncio.apply()
 # Main function to start the event loop and run the asynchronous query
 def async_query_gwas_catalog(snps, p_threshold=5e-8, return_p=False, return_study=False,
                              max_associations=None, timeout=100):
-    loop = asyncio.get_event_loop()
-    results_global, errors, timeouts = loop.run_until_complete(query_gwas_catalog_coroutine(snps, p_threshold, return_p, return_study, max_associations, timeout))
+    try:
+        loop = asyncio.get_event_loop()
+    except RuntimeError:
+        loop = asyncio.new_event_loop()
+        asyncio.set_event_loop(loop)
+    results_global, errors, timeouts = loop.run_until_complete(
+        query_gwas_catalog_coroutine(
+            snps, p_threshold, return_p, return_study, max_associations, timeout
+        )
+    )
     return results_global, errors, timeouts
@@ -36,18 +44,21 @@ async def query_gwas_catalog_coroutine(snps, p_threshold=5e-8, return_p=False, r
     """
     results_global = {}  # Dictionary storing the SNP (keys) and results for each SNP: a list of single strings or tuples
-    errors = []  # List storing SNP for which the GWAS Catalog could not be queried
-    timeouts = [] # List storing SNP for which the timeout was reached
+    errors = []          # List storing SNP for which the GWAS Catalog could not be queried
+    timeouts = []        # List storing SNP for which the timeout was reached
-    async def fetch(session, url, timeout=timeout):
+    async def fetch(session, url, timeout_duration=timeout):
         try:
-            async with asyncio.timeout(timeout):
-                async with session.get(url) as response:
-                    if response.status == 200:
-                        return await response.json()
-                    return None
+            # Wrap the entire fetch operation with asyncio.wait_for for timeout
+            response = await asyncio.wait_for(session.get(url), timeout=timeout_duration)
+            async with response:
+                if response.status == 200:
+                    return await response.json()
+                return None
         except asyncio.TimeoutError:
             return "TIMEOUT"
+        except aiohttp.ClientError:
+            return "ERROR"
     async def process_snp(session, snp):
         #print(f"Processing SNP {snp}")
@@ -55,11 +66,13 @@ async def query_gwas_catalog_coroutine(snps, p_threshold=5e-8, return_p=False, r
         results_snp = []  # List storing the results for each association found for this SNP
         base_url = f"https://www.ebi.ac.uk/gwas/rest/api/singleNucleotidePolymorphisms/{snp}/associations?projection=associationBySnp"
-        base_data = await fetch(session, base_url, timeout=timeout)
+        base_data = await fetch(session, base_url, timeout_duration=timeout)
         if base_data == "TIMEOUT":
             timeouts.append(snp)
-        elif base_data:
+        elif base_data == "ERROR" or base_data is None:
+            errors.append(snp)
+        else:
             i = 0
             # Process each association found for this SNP
             for assoc in base_data.get('_embedded', {}).get('associations', []):
@@ -72,13 +85,25 @@ async def query_gwas_catalog_coroutine(snps, p_threshold=5e-8, return_p=False, r
                 pvalue = assoc.get("pvalue", np.nan)
                 # If the pvalue of the association does not pass the threshold, the association is not processed further nor reported
                 if pvalue < p_threshold:
-                    trait = assoc.get("efoTraits", [])[0].get("trait", "")
+                    efo_traits = assoc.get("efoTraits", [])
+                    if efo_traits:
+                        trait = efo_traits[0].get("trait", "")
+                    else:
+                        trait = ""
                     # If the return_study flag is active: query the page containing the GWAS Catalog study ID
                     if return_study:
-                        study_url = assoc.get("_links", {}).get("study", {}).get("href", {})
-                        study_data = await fetch(session, study_url, timeout=timeout)
-                        study_id = "TIMEOUT" if study_data == "TIMEOUT" else study_data.get("accessionId", "") if study_data else "Not found"
+                        study_url = assoc.get("_links", {}).get("study", {}).get("href", "")
+                        if study_url:
+                            study_data = await fetch(session, study_url, timeout_duration=timeout)
+                            if study_data == "TIMEOUT":
+                                study_id = "TIMEOUT"
+                            elif study_data == "ERROR" or study_data is None:
+                                study_id = "Error"
+                            else:
+                                study_id = study_data.get("accessionId", "Not found")
+                        else:
+                            study_id = "Not available"
                     else:
                         study_id = None
@@ -109,14 +134,13 @@ async def query_gwas_catalog_coroutine(snps, p_threshold=5e-8, return_p=False, r
                 results_snp = [(trait, min_trait[trait]) for trait in min_trait]
             results_global[snp] = results_snp
-        else:
-            errors.append(snp)
     async with aiohttp.ClientSession() as session:
         tasks = [process_snp(session, snp) for snp in snps]
-        await tqdm_asyncio.gather(*tasks)
+        # Initialize tqdm progress bar
+        with tqdm(total=len(tasks), desc="Processing SNPs") as pbar:
+            for coro in asyncio.as_completed(tasks):
+                await coro
+                pbar.update(1)
-    # Exclude timeouts from errors
-    #errors = [error for error in errors if error not in timeouts]
-    return results_global, errors, timeouts
+    return results_global, errors, timeouts
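
Note on the hunks above: `asyncio.timeout`, which only exists from Python 3.11, is replaced by `asyncio.wait_for`, and `tqdm_asyncio.gather` is replaced by a loop over `asyncio.as_completed`. The sketch below reproduces both patterns with the standard library only. The coroutines `fast_request` and `slow_request` are hypothetical stand-ins for HTTP calls, and a plain counter takes the place of the tqdm progress bar.

```python
import asyncio


async def fetch_with_timeout(make_request, timeout_duration):
    """Await a request, mapping a timeout to the "TIMEOUT" sentinel."""
    try:
        return await asyncio.wait_for(make_request(), timeout=timeout_duration)
    except asyncio.TimeoutError:
        return "TIMEOUT"


async def fast_request():
    """Hypothetical request that answers immediately."""
    await asyncio.sleep(0)
    return {"status": "ok"}


async def slow_request():
    """Hypothetical request that exceeds the timeout."""
    await asyncio.sleep(1)
    return {"status": "late"}


async def run_all():
    tasks = [
        fetch_with_timeout(fast_request, 0.05),
        fetch_with_timeout(slow_request, 0.05),
    ]
    results = []
    completed = 0
    # as_completed yields awaitables in completion order
    for coro in asyncio.as_completed(tasks):
        results.append(await coro)
        completed += 1  # where pbar.update(1) sits in the diff
    return results, completed


results, completed = asyncio.run(run_all())
```

Because `asyncio.wait_for` cancels the wrapped awaitable when the timeout expires, the slow request does not keep running in the background after its sentinel is returned.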

{genal_python-0.9 → genal_python-1.0}/genal/tools.py RENAMED

@@ -1,14 +1,16 @@
-import os, subprocess
+import os, subprocess, sys
 import pandas as pd
 import json
 import wget
 import shutil
 import tarfile
+import platform
+import requests
+import zipfile
 from .constants import REF_PANELS, REF_PANELS_URL
 config_path = os.path.join(os.path.expanduser("~/.genal/"), "config.json")
-# default_ref_path = os.path.join(os.getcwd(), "tmp_GENAL", "Reference_files")
 default_ref_path = os.path.join(os.path.expanduser("~/.genal/"), "Reference_files")
@@ -79,6 +81,25 @@ def create_tmp():
                 "Unable to create the 'tmp_GENAL' directory. Check permissions."
             )
+def check_bfiles(filepath):
+    """Check if the path specified leads to a bed/bim/fam triple."""
+    if (
+        os.path.exists("{}.bed".format(filepath))
+        and os.path.exists("{}.bim".format(filepath))
+        and os.path.exists("{}.fam".format(filepath))
+    ):
+        return True
+    return False
+def delete_tmp():
+    """Delete the tmp folder."""
+    if os.path.isdir("tmp_GENAL"):
+        shutil.rmtree("tmp_GENAL")
+        print("The tmp_GENAL folder has been successfully deleted.")
+    else:
+        print("There is no tmp_GENAL folder to delete in the current directory.")
+    return
 def set_reference_folder(path=""):
     """
@@ -234,6 +255,7 @@ def load_reference_panel(reference_panel="eur"):
         reference_panel_df["CHR"] = reference_panel_df["CHR"].astype(str).str.replace("^chr", "", regex=True).astype(int)
     return reference_panel_df
 def set_plink(path=""):
     """Set the plink 1.9 path and verify that it is the correct version."""
     if not path:
@@ -279,22 +301,125 @@ def get_plink19_path():
         return config["paths"]["plink19_path"]
-def check_bfiles(filepath):
-    """Check if the path specified leads to a bed/bim/fam triple."""
-    if (
-        os.path.exists("{}.bed".format(filepath))
-        and os.path.exists("{}.bim".format(filepath))
-        and os.path.exists("{}.fam".format(filepath))
-    ):
-        return True
-    return False
+def is_plink_installed(plink_path):
+    try:
+        result = subprocess.run([plink_path, '--version'],
+                                stdout=subprocess.PIPE,
+                                stderr=subprocess.PIPE,
+                                text=True,
+                                check=True)
+        # Parse version from output
+        if 'plink v1.9' in result.stdout.lower():
+            return True
+        else:
+            return False
+    except (subprocess.CalledProcessError, FileNotFoundError):
+        return False
-def delete_tmp():
-    """Delete the tmp folder."""
-    if os.path.isdir("tmp_GENAL"):
-        shutil.rmtree("tmp_GENAL")
-        print("The tmp_GENAL folder has been successfully deleted.")
+def install_plink(path=None):
+    """Install plink 1.9 for the current operating system.
+    Args:
+        path (str, optional): Path to the folder to install plink in. If not provided, install it in the .genal folder at root.
+    """
+    # Determine operating system and architecture
+    system = platform.system()
+    system_arch = platform.architecture()[0][:2]
+    # Handle path variable
+    if not path:
+        path = os.path.join(os.path.expanduser("~/.genal/"), "plink")
+        print(f"You have not specified a path for the installation of plink. The following directory will be used: {path}")
+    # Determine the path of the plink binary
+    if system == 'Windows':
+        plink_path = os.path.join(path, 'plink.exe')
     else:
-        print("There is no tmp_GENAL folder to delete in the current directory.")
+        plink_path = os.path.join(path, 'plink')
+    # Check that plink is not already installed
+    if is_plink_installed(plink_path):
+        print(f"Plink1.9 is already installed at {plink_path}. Installation is skipped.")
+        return
+    # If the directory doesn't exist, attempt to create it
+    if not os.path.isdir(path):
+        try:
+            os.makedirs(path, exist_ok=True)
+        except OSError:
+            raise OSError(
+                f"Unable to create the '{path}' directory. Check permissions."
+            )
+    # Determine appropriate download link
+    if system == "Linux":
+        if system_arch == "64":
+            download_url = "https://s3.amazonaws.com/plink1-assets/plink_linux_x86_64_20241022.zip"
+        else:
+            download_url = "https://s3.amazonaws.com/plink1-assets/plink_linux_i686_20241022.zip"
+    elif system == "Windows":
+        if system_arch == "64":
+            download_url = "https://s3.amazonaws.com/plink1-assets/plink_win64_20241022.zip"
+        else:
+            download_url = "https://s3.amazonaws.com/plink1-assets/plink_win32_20241022.zip"
+    elif system == "Darwin":
+        download_url = "https://s3.amazonaws.com/plink1-assets/plink_mac_20241022.zip"
+    else:
+        raise ValueError(
+            "Your operating system is not Linux, Windows, or Mac OS and plink1.9 can't be installed automatically. "
+            "Please install plink1.9 manually from https://www.cog-genomics.org/plink/1.9/"
+        )
+    # Create tmp folder if it does not exist and zip file path
+    create_tmp()
+    zip_path = os.path.join("tmp_GENAL", 'plink1.9.zip')
+    # Download plink
+    print(f"Downloading plink1.9 for {system} ({system_arch}-bit) from {download_url}...")
+    try:
+        response = requests.get(download_url, stream=True)
+        response.raise_for_status()
+        with open(zip_path, 'wb') as f:
+            for chunk in response.iter_content(chunk_size=8192):
+                f.write(chunk)
+        print("Download completed.")
+    except requests.RequestException as e:
+        print(f"Failed to download plink1.9: {e}")
+        sys.exit(1)
+    # Extract the zip file
+    print("Extracting plink1.9...")
+    try:
+        with zipfile.ZipFile(zip_path, 'r') as zip_ref:
+            zip_ref.extractall(path)
+        print("Extraction completed.")
+    except zipfile.BadZipFile as e:
+        print(f"Failed to extract plink1.9: {e}")
+        sys.exit(1)
+    # Clean zip file
+    os.remove(zip_path)
+    # Make the file executable
+    try:
+        os.chmod(plink_path, 0o755)  # Set permissions to rwxr-xr-x
+    except PermissionError:
+        print("Permission denied: cannot change file permissions.")
+    except FileNotFoundError:
+        print("File not found: cannot change permissions on a non-existent file.")
+    except OSError as e:
+        print(f"OS error occurred: {e}")
+    # Test the installation
+    if is_plink_installed(plink_path):
+        print("plink1.9 has been successfully installed and is accessible.")
+    else:
+        print(
+            "plink1.9 installation may have failed. "
+            "Please install manually from https://www.cog-genomics.org/plink/1.9/ and set the path using set_plink(path)."
+        )
+    # Change config file
+    config = read_config()
+    config["paths"]["plink19_path"] = plink_path
+    write_config(config)
+    print(f"Path to plink 1.9 successfully set: '{plink_path}'")
     return
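
The platform-dependent URL selection and the version check in `install_plink` and `is_plink_installed` can be isolated as pure functions, which makes them testable without downloading anything or running a binary. The following is a minimal sketch, not code from the package: the names `plink_download_url` and `looks_like_plink19` are hypothetical, while the base URL, archive names, and build date are copied from the hunk above. As in the diff, any architecture other than "64" falls back to the 32-bit archive.

```python
# Hypothetical helpers (not part of genal); URL pieces are copied from the diff above.
_BASE_URL = "https://s3.amazonaws.com/plink1-assets"
_BUILD_DATE = "20241022"


def plink_download_url(system: str, bits: str) -> str:
    """Return the plink 1.9 archive URL for a platform.

    `system` is the value of platform.system(); `bits` is "64" or "32",
    as produced by platform.architecture()[0][:2].
    """
    is_64 = bits == "64"
    stems = {
        "Linux": "plink_linux_x86_64" if is_64 else "plink_linux_i686",
        "Windows": "plink_win64" if is_64 else "plink_win32",
        "Darwin": "plink_mac",  # a single macOS archive, regardless of bits
    }
    stem = stems.get(system)
    if stem is None:
        raise ValueError(f"Unsupported operating system: {system!r}")
    return f"{_BASE_URL}/{stem}_{_BUILD_DATE}.zip"


def looks_like_plink19(version_output: str) -> bool:
    """Apply the same substring test the diff uses on `plink --version` output."""
    return "plink v1.9" in version_output.lower()
```

Raising `ValueError` for an unknown system mirrors the behavior in the diff; a caller could catch it and point the user to the manual installation page instead.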

genal_python-1.0/genal_logo.png ADDED

Binary file

{genal_python-0.9 → genal_python-1.0}/pyproject.toml RENAMED

@@ -4,11 +4,11 @@ build-backend = "flit_core.buildapi"
 [project]
 name = "genal-python"  # Updated name for PyPI
-version = "0.9"
+version = "1.0"
 authors = [{name = "Cyprien Rivier", email = "riviercyprien@gmail.com"}]
 description = "A python toolkit for polygenic risk scoring and mendelian randomization."
 readme = "README.md"
-requires-python = ">=3.7"
+requires-python = ">=3.8"
 license = {file = "LICENSE"}
 classifiers = [
     "Programming Language :: Python :: 3",
@@ -26,7 +26,7 @@ dependencies = [
 "psutil==5.9.1",
 "pyliftover==0.4",
 "scikit_learn>=1.3.0",
-"scipy>=1.11.4",
+"scipy>=1.10.1, <1.11",
 "statsmodels==0.14.0",
 "tqdm==4.66.1",
 "wget==3.2"