PyPI - genal-python - Versions diffs - 0.8.tar.gz → 1.0.tar.gz - Mend

genal-python 0.8.tar.gz → 1.0.tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (173)

genal_python-1.0/.DS_Store ADDED

Binary file

{genal_python-0.8 → genal_python-1.0}/.gitignore RENAMED

@@ -2,4 +2,5 @@ __pycache__/
 dist/
 .ipynb_checkpoints/
 ipynb_checkpoints/
-genal/.ipynb_checkpoints/
+genal/.ipynb_checkpoints/
+test_data/

genal_python-1.0/Genal_flowchart.png ADDED

Binary file

{genal_python-0.8 → genal_python-1.0}/PKG-INFO RENAMED

@@ -1,9 +1,9 @@
-Metadata-Version: 2.1
+Metadata-Version: 2.3
 Name: genal-python
-Version: 0.8
+Version: 1.0
 Summary: A python toolkit for polygenic risk scoring and mendelian randomization.
 Author-email: Cyprien Rivier <riviercyprien@gmail.com>
-Requires-Python: >=3.7
+Requires-Python: >=3.8
 Description-Content-Type: text/markdown
 Classifier: Programming Language :: Python :: 3
 Classifier: License :: OSI Approved :: GNU General Public License v3 or later (GPLv3+)
@@ -16,12 +16,16 @@ Requires-Dist: plotnine==0.12.3
 Requires-Dist: psutil==5.9.1
 Requires-Dist: pyliftover==0.4
 Requires-Dist: scikit_learn>=1.3.0
-Requires-Dist: scipy>=1.11.4
+Requires-Dist: scipy>=1.10.1, <1.11
 Requires-Dist: statsmodels==0.14.0
 Requires-Dist: tqdm==4.66.1
 Requires-Dist: wget==3.2
 Project-URL: Home, https://github.com/CypRiv/genal
+[![Python 3.8](https://img.shields.io/badge/python-3.8%20%7C%203.9%20%7C%203.10-blue)](https://www.python.org/downloads/release/python-3100/)
+<img src="/genal_logo.png" data-canonical-src="/genal_logo.png" height="80" />
 <center><h1> genal: A Python Toolkit for Genetic Risk Scoring and Mendelian Randomization </h1></center>
@@ -54,33 +58,52 @@ The module prioritizes user-friendliness and intuitive operation, aiming to redu
 Genal draws on concepts from well-established R packages such as TwoSampleMR, MR-Presso, MendelianRandomization, and gwasvcf, adapting their proven methodologies to the Python environment. This approach ensures that users have access to tried and tested techniques with the versatility of Python's data science tools.
+<img src="/Genal_flowchart.png" data-canonical-src="/Genal_flowchart.png" style="max-width:100%;" />
+Genal flowchart. Created in https://www.BioRender.com
 ## Citation <a name="citation"></a>
 If you're using genal, please cite the following paper:
 **Genal: A Python Toolkit for Genetic Risk Scoring and Mendelian Randomization.** Cyprien A. Rivier, Santiago Clocchiatti-Tuozzo, Shufan Huo, Victor Torres-Lopez, Daniela Renedo, Kevin N. Sheth, Guido J. Falcone, Julian N. Acosta. medRxiv 2024.05.23.24307776; doi: https://doi.org/10.1101/2024.05.23.24307776
 ## Requirements for the genal module <a name="paragraph1"></a>
-***Python 3.9 or later***. https://www.python.org/ <br>
+***Python 3.8 or later***. https://www.python.org/ <br>
 ## Installation and How to use the genal module <a name="paragraph2"></a>
 ### Installation <a name="paragraph2.1"></a>
+> **Note:**
+>
+> **Optional**: It is recommended to create a new environment to avoid dependencies conflicts. Here, we create a new conda environment called 'genal_env'.
+> ```
+> conda create --name genal_env python=3.8
+> conda activate genal_env
+> ```
 Download and install the package with pip:
 ```
 pip install genal-python
 ```
-And it can be imported in a python environment with:
+And import it in a python environment with:
 ```python
 import genal
 ```
-The main genal functionalities require a working installation of PLINK v1.9 that can be downloaded here: https://www.cog-genomics.org/plink/
-Once downloaded, the path to the plink executable can be set with:
+The main genal functionalities require a working installation of PLINK v1.9 (and not 2.0 as certain functionalities have not been updated yet).
+If you have already installed plink v1.9, you can set the path to its executable with:
 ```
 genal.set_plink(path="/path/to/plink/executable/file")
 ```
+If plink is not installed, genal can install the correct version for your system with the following line:
+```
+genal.install_plink()
+```
 ### Documentation <a name="paragraph2.2"></a>
 For detailed information on how to use the functionalities of Genal, please refer to the documentation: https://genal.rtfd.io
@@ -115,7 +138,7 @@ For this tutorial, we will obtain genetic instruments for systolic blood pressur
 ### Data loading <a name="paragraph3.1"></a>
-We start this tutorial with publicly available summary statistics from a large GWAS study of systolic blood pressure. [Link to study](https://www.nature.com/articles/s41588-018-0205-x). After downloading and unzipping the summary statistics, we load them into a pandas DataFrame:
+We start this tutorial with publicly available summary statistics from a large GWAS study of systolic blood pressure. [Link to study](https://www.nature.com/articles/s41588-018-0205-x). [Download link](http://ftp.ebi.ac.uk/pub/databases/gwas/summary_statistics/GCST006001-GCST007000/GCST006624/Evangelou_30224653_SBP.txt.gz). After downloading and unzipping the summary statistics, we load them into a pandas DataFrame:
 ```python
 import pandas as pd
@@ -237,7 +260,7 @@ You can also use a custom reference panel by specifying to the reference_panel a
 ### Clumping <a name="paragraph3.3"></a>
-Clumping is the step at which we select the SNPs that will be used as our genetic instruments in future Polygenic Risk Scores and Mendelian Randomization analyses. The process involves identifying the SNPs that are strongly associated with our trait of interest (systolic blood pressure in this tutorial) and are independent from each other. This second step ensures that selected SNPs are not highly correlated, (i.e., they are not in high linkage disequilibrium). For this step, we again need to use a reference panel.
+Clumping, or C+T: Clumping + Thresholding, is the step at which we select the SNPs that will be used as our genetic instruments in future Polygenic Risk Scores and Mendelian Randomization analyses. The process involves identifying the SNPs that are strongly associated with our trait of interest (systolic blood pressure in this tutorial) and are independent from each other. This second step ensures that selected SNPs are not highly correlated, (i.e., they are not in high linkage disequilibrium). For this step, we again need to use a reference panel.
 The SNP-data loaded in a `genal.Geno` instance can be clumped using the `genal.Geno.clump` method. It will return another `genal.Geno` instance containing only the clumped data:
@@ -309,7 +332,7 @@ The output of the `genal.Geno.prs` method will include how many SNPs were used t
 Here, we see that about half of the SNPs were not extracted from the data. In such cases, we may want to try and salvage some of these SNPs by looking for proxies (SNPs in high linkage disequilibrium, i.e. highly correlated SNPs). This can be done by specifying the `proxy = True`. argument:
 ```python
-SBP_clumped.prs(name = "SBP_prs" ,path = "Pop_chr$", proxy = True, reference_panel = "eur", r2=0.8, kb=5000, window_snps=5000)
+SBP_clumped.prs(name = "SBP_prs_proxy" ,path = "Pop_chr$", proxy = True, reference_panel = "eur", r2=0.8, kb=5000, window_snps=5000)
 ```
 and the output is:
@@ -369,7 +392,7 @@ You can customize how the proxies are chosen with the following arguments:
 To run MR, we need to load both our exposure and outcome SNP-level data in `genal.Geno` instances. In our case, the genetic instruments of the MR are the SNPs associated with blood pressure at genome-wide significant levels resulting from the clumping of the blood pressure GWAS. They are stored in our `SBP_clumped` `genal.Geno` instance which also include their association with the exposure trait (instrument-SBP estimates in the `BETA` column).
-To get their association with the outcome trait (instrument-stroke estimates), we are going to use SNP-level data from a large GWAS of stroke performed by the GIGASTROKE consortium ([https://www.nature.com/articles/s41586-022-05165-3](https://www.nature.com/articles/s41586-022-05165-3)):
+To get their association with the outcome trait (instrument-stroke estimates), we are going to use SNP-level data from a large GWAS of stroke performed by the GIGASTROKE consortium: [Link to study](https://www.nature.com/articles/s41586-022-05165-3). [Link to download](http://ftp.ebi.ac.uk/pub/databases/gwas/summary_statistics/GCST90104001-GCST90105000/GCST90104539/GCST90104539_buildGRCh37.tsv.gz):
 ```python
 stroke_gwas = pd.read_csv("GCST90104539_buildGRCh37.tsv",sep="\t")
@@ -410,21 +433,25 @@ Genal will print how many SNPs were successfully found and extracted from the ou
     1541 SNPs out of 1545 are present in the outcome data.
     (Exposure data, Outcome data, Outcome name) stored in the .MR_data attribute.
-Here as well you have the option to use proxies for the instruments that are not present in the outcome data:
-```python
-SBP_clumped.query_outcome(Stroke_geno, proxy = True, reference_panel = "eur", kb = 5000, r2 = 0.6, window_snps = 5000)
-```
-And genal will print the number of missing instruments which have been proxied:
-    Outcome data successfully loaded from 'b352e412' geno instance.
-    Identifying the exposure SNPs present in the outcome data...
-    1541 SNPs out of 1545 are present in the outcome data.
-    Searching proxies for 4 SNPs...
-    Using the EUR reference panel.
-    Found proxies for 4 SNPs.
-    (Exposure data, Outcome data, Outcome name) stored in the .MR_data attribute.
+> **Note:**
+>Here as well you have the option to use proxies for the instruments that are not present in the outcome data:
+>
+> Here as well you have the option to use proxies for the instruments that are not present in the outcome data:
+>
+> ```python
+> SBP_clumped.query_outcome(Stroke_geno, proxy = True, reference_panel = "eur", kb = 5000, r2 = 0.6, window_snps = 5000)
+> ```
+>
+> And genal will print the number of missing instruments that have been proxied:
+>
+>     Outcome data successfully loaded from 'b352e412' geno instance.
+>     Identifying the exposure SNPs present in the outcome data...
+>     1541 SNPs out of 1545 are present in the outcome data.
+>     Searching proxies for 4 SNPs...
+>     Using the EUR reference panel.
+>     Found proxies for 4 SNPs.
+>     (Exposure data, Outcome data, Outcome name) stored in the .MR_data attribute.
 After extracting the instruments from the outcome data, the `SBP_clumped` `genal.Geno` instance contains an `MR_data` attribute containing the instruments-exposure and instruments-outcome associations necessary to run MR. Running MR is now as simple as calling the `genal.Geno.MR` method of the SBP_clumped `genal.Geno` instance:
@@ -469,7 +496,7 @@ By default, only some MR methods (inverse-variance weighted, weighted median, Si
 - `Weighted-mode` for the Weighted mode method
 - `all` to run all the above methods
-For more fine-tuning, such as settings for the number of boostrapping iterations, please refer to the API.
+For more fine-tuning, such as settings for the number of boostrapping iterations, please refer to the API: [https://genal.readthedocs.io/en/latest/modules.html#id4](MR method).
 If you want to visualize the obtained MR results, you can use the `genal.Geno.MR_plot` method that will plot each SNP in an `effect_on_exposure x effect_on_outcome` plane as well as lines corresponding to different MR methods:
@@ -522,7 +549,7 @@ df_pheno = pd.read_csv("path/to/trait/data")
 > **Note:**
 >
->    One important detail is to make sure that the IDs of the participants are identical in the phenotypic data and in the genetic data.
+>    One important point is to make sure that the IDs of the participants are identical in the phenotypic data and in the genetic data.
 Then, it is advised to make a copy of the `genal.Geno` instance containing our instruments as we are going to update their coefficients and to avoid any confusion:
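
The PKG-INFO hunks above change two install constraints: `Requires-Python` moves from `>=3.7` to `>=3.8`, and the SciPy requirement moves from `scipy>=1.11.4` to `scipy>=1.10.1, <1.11`. The two SciPy ranges do not overlap, so an environment that satisfied 0.8 cannot keep its SciPy version when upgrading to 1.0. The sketch below illustrates this with plain tuple comparison. The helper names are hypothetical and not part of genal, and the parser assumes purely numeric release strings (no pre-release or local version suffixes).

```python
import sys


def parse_version(text):
    """Turn a dotted release string such as '1.10.1' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def scipy_allowed_in_0_8(version):
    # genal-python 0.8 metadata: scipy>=1.11.4
    return parse_version(version) >= (1, 11, 4)


def scipy_allowed_in_1_0(version):
    # genal-python 1.0 metadata: scipy>=1.10.1, <1.11
    return (1, 10, 1) <= parse_version(version) < (1, 11)


def python_allowed_in_1_0(version_info=sys.version_info):
    # genal-python 1.0 metadata: Requires-Python >=3.8
    return tuple(version_info[:2]) >= (3, 8)


if __name__ == "__main__":
    for candidate in ("1.10.1", "1.11.4"):
        print(candidate, scipy_allowed_in_0_8(candidate), scipy_allowed_in_1_0(candidate))
```

In practice pip resolves these constraints itself; the sketch only makes the disjoint ranges explicit.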

{genal_python-0.8 → genal_python-1.0}/README.md RENAMED

@@ -1,3 +1,7 @@
+[![Python 3.8](https://img.shields.io/badge/python-3.8%20%7C%203.9%20%7C%203.10-blue)](https://www.python.org/downloads/release/python-3100/)
+<img src="/genal_logo.png" data-canonical-src="/genal_logo.png" height="80" />
 <center><h1> genal: A Python Toolkit for Genetic Risk Scoring and Mendelian Randomization </h1></center>
@@ -30,33 +34,52 @@ The module prioritizes user-friendliness and intuitive operation, aiming to redu
 Genal draws on concepts from well-established R packages such as TwoSampleMR, MR-Presso, MendelianRandomization, and gwasvcf, adapting their proven methodologies to the Python environment. This approach ensures that users have access to tried and tested techniques with the versatility of Python's data science tools.
+<img src="/Genal_flowchart.png" data-canonical-src="/Genal_flowchart.png" style="max-width:100%;" />
+Genal flowchart. Created in https://www.BioRender.com
 ## Citation <a name="citation"></a>
 If you're using genal, please cite the following paper:
 **Genal: A Python Toolkit for Genetic Risk Scoring and Mendelian Randomization.** Cyprien A. Rivier, Santiago Clocchiatti-Tuozzo, Shufan Huo, Victor Torres-Lopez, Daniela Renedo, Kevin N. Sheth, Guido J. Falcone, Julian N. Acosta. medRxiv 2024.05.23.24307776; doi: https://doi.org/10.1101/2024.05.23.24307776
 ## Requirements for the genal module <a name="paragraph1"></a>
-***Python 3.9 or later***. https://www.python.org/ <br>
+***Python 3.8 or later***. https://www.python.org/ <br>
 ## Installation and How to use the genal module <a name="paragraph2"></a>
 ### Installation <a name="paragraph2.1"></a>
+> **Note:**
+>
+> **Optional**: It is recommended to create a new environment to avoid dependencies conflicts. Here, we create a new conda environment called 'genal_env'.
+> ```
+> conda create --name genal_env python=3.8
+> conda activate genal_env
+> ```
 Download and install the package with pip:
 ```
 pip install genal-python
 ```
-And it can be imported in a python environment with:
+And import it in a python environment with:
 ```python
 import genal
 ```
-The main genal functionalities require a working installation of PLINK v1.9 that can be downloaded here: https://www.cog-genomics.org/plink/
-Once downloaded, the path to the plink executable can be set with:
+The main genal functionalities require a working installation of PLINK v1.9 (and not 2.0 as certain functionalities have not been updated yet).
+If you have already installed plink v1.9, you can set the path to its executable with:
 ```
 genal.set_plink(path="/path/to/plink/executable/file")
 ```
+If plink is not installed, genal can install the correct version for your system with the following line:
+```
+genal.install_plink()
+```
 ### Documentation <a name="paragraph2.2"></a>
 For detailed information on how to use the functionalities of Genal, please refer to the documentation: https://genal.rtfd.io
@@ -91,7 +114,7 @@ For this tutorial, we will obtain genetic instruments for systolic blood pressur
 ### Data loading <a name="paragraph3.1"></a>
-We start this tutorial with publicly available summary statistics from a large GWAS study of systolic blood pressure. [Link to study](https://www.nature.com/articles/s41588-018-0205-x). After downloading and unzipping the summary statistics, we load them into a pandas DataFrame:
+We start this tutorial with publicly available summary statistics from a large GWAS study of systolic blood pressure. [Link to study](https://www.nature.com/articles/s41588-018-0205-x). [Download link](http://ftp.ebi.ac.uk/pub/databases/gwas/summary_statistics/GCST006001-GCST007000/GCST006624/Evangelou_30224653_SBP.txt.gz). After downloading and unzipping the summary statistics, we load them into a pandas DataFrame:
 ```python
 import pandas as pd
@@ -213,7 +236,7 @@ You can also use a custom reference panel by specifying to the reference_panel a
 ### Clumping <a name="paragraph3.3"></a>
-Clumping is the step at which we select the SNPs that will be used as our genetic instruments in future Polygenic Risk Scores and Mendelian Randomization analyses. The process involves identifying the SNPs that are strongly associated with our trait of interest (systolic blood pressure in this tutorial) and are independent from each other. This second step ensures that selected SNPs are not highly correlated, (i.e., they are not in high linkage disequilibrium). For this step, we again need to use a reference panel.
+Clumping, or C+T: Clumping + Thresholding, is the step at which we select the SNPs that will be used as our genetic instruments in future Polygenic Risk Scores and Mendelian Randomization analyses. The process involves identifying the SNPs that are strongly associated with our trait of interest (systolic blood pressure in this tutorial) and are independent from each other. This second step ensures that selected SNPs are not highly correlated, (i.e., they are not in high linkage disequilibrium). For this step, we again need to use a reference panel.
 The SNP-data loaded in a `genal.Geno` instance can be clumped using the `genal.Geno.clump` method. It will return another `genal.Geno` instance containing only the clumped data:
@@ -285,7 +308,7 @@ The output of the `genal.Geno.prs` method will include how many SNPs were used t
 Here, we see that about half of the SNPs were not extracted from the data. In such cases, we may want to try and salvage some of these SNPs by looking for proxies (SNPs in high linkage disequilibrium, i.e. highly correlated SNPs). This can be done by specifying the `proxy = True`. argument:
 ```python
-SBP_clumped.prs(name = "SBP_prs" ,path = "Pop_chr$", proxy = True, reference_panel = "eur", r2=0.8, kb=5000, window_snps=5000)
+SBP_clumped.prs(name = "SBP_prs_proxy" ,path = "Pop_chr$", proxy = True, reference_panel = "eur", r2=0.8, kb=5000, window_snps=5000)
 ```
 and the output is:
@@ -345,7 +368,7 @@ You can customize how the proxies are chosen with the following arguments:
 To run MR, we need to load both our exposure and outcome SNP-level data in `genal.Geno` instances. In our case, the genetic instruments of the MR are the SNPs associated with blood pressure at genome-wide significant levels resulting from the clumping of the blood pressure GWAS. They are stored in our `SBP_clumped` `genal.Geno` instance which also include their association with the exposure trait (instrument-SBP estimates in the `BETA` column).
-To get their association with the outcome trait (instrument-stroke estimates), we are going to use SNP-level data from a large GWAS of stroke performed by the GIGASTROKE consortium ([https://www.nature.com/articles/s41586-022-05165-3](https://www.nature.com/articles/s41586-022-05165-3)):
+To get their association with the outcome trait (instrument-stroke estimates), we are going to use SNP-level data from a large GWAS of stroke performed by the GIGASTROKE consortium: [Link to study](https://www.nature.com/articles/s41586-022-05165-3). [Link to download](http://ftp.ebi.ac.uk/pub/databases/gwas/summary_statistics/GCST90104001-GCST90105000/GCST90104539/GCST90104539_buildGRCh37.tsv.gz):
 ```python
 stroke_gwas = pd.read_csv("GCST90104539_buildGRCh37.tsv",sep="\t")
@@ -386,21 +409,25 @@ Genal will print how many SNPs were successfully found and extracted from the ou
     1541 SNPs out of 1545 are present in the outcome data.
     (Exposure data, Outcome data, Outcome name) stored in the .MR_data attribute.
-Here as well you have the option to use proxies for the instruments that are not present in the outcome data:
-```python
-SBP_clumped.query_outcome(Stroke_geno, proxy = True, reference_panel = "eur", kb = 5000, r2 = 0.6, window_snps = 5000)
-```
-And genal will print the number of missing instruments which have been proxied:
-    Outcome data successfully loaded from 'b352e412' geno instance.
-    Identifying the exposure SNPs present in the outcome data...
-    1541 SNPs out of 1545 are present in the outcome data.
-    Searching proxies for 4 SNPs...
-    Using the EUR reference panel.
-    Found proxies for 4 SNPs.
-    (Exposure data, Outcome data, Outcome name) stored in the .MR_data attribute.
+> **Note:**
+>Here as well you have the option to use proxies for the instruments that are not present in the outcome data:
+>
+> Here as well you have the option to use proxies for the instruments that are not present in the outcome data:
+>
+> ```python
+> SBP_clumped.query_outcome(Stroke_geno, proxy = True, reference_panel = "eur", kb = 5000, r2 = 0.6, window_snps = 5000)
+> ```
+>
+> And genal will print the number of missing instruments that have been proxied:
+>
+>     Outcome data successfully loaded from 'b352e412' geno instance.
+>     Identifying the exposure SNPs present in the outcome data...
+>     1541 SNPs out of 1545 are present in the outcome data.
+>     Searching proxies for 4 SNPs...
+>     Using the EUR reference panel.
+>     Found proxies for 4 SNPs.
+>     (Exposure data, Outcome data, Outcome name) stored in the .MR_data attribute.
 After extracting the instruments from the outcome data, the `SBP_clumped` `genal.Geno` instance contains an `MR_data` attribute containing the instruments-exposure and instruments-outcome associations necessary to run MR. Running MR is now as simple as calling the `genal.Geno.MR` method of the SBP_clumped `genal.Geno` instance:
@@ -445,7 +472,7 @@ By default, only some MR methods (inverse-variance weighted, weighted median, Si
 - `Weighted-mode` for the Weighted mode method
 - `all` to run all the above methods
-For more fine-tuning, such as settings for the number of boostrapping iterations, please refer to the API.
+For more fine-tuning, such as settings for the number of boostrapping iterations, please refer to the API: [https://genal.readthedocs.io/en/latest/modules.html#id4](MR method).
 If you want to visualize the obtained MR results, you can use the `genal.Geno.MR_plot` method that will plot each SNP in an `effect_on_exposure x effect_on_outcome` plane as well as lines corresponding to different MR methods:
@@ -498,7 +525,7 @@ df_pheno = pd.read_csv("path/to/trait/data")
 > **Note:**
 >
->    One important detail is to make sure that the IDs of the participants are identical in the phenotypic data and in the genetic data.
+>    One important point is to make sure that the IDs of the participants are identical in the phenotypic data and in the genetic data.
 Then, it is advised to make a copy of the `genal.Geno` instance containing our instruments as we are going to update their coefficients and to avoid any confusion:
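
The README hunks above introduce `genal.install_plink()` alongside the existing `genal.set_plink(path=...)`. A minimal sketch of how the two calls might be combined is shown below: it looks for an existing PLINK binary on `PATH` and falls back to the installer otherwise. The helper names and the candidate executable names are assumptions made for illustration; only `genal.set_plink` and `genal.install_plink` come from the README text. The sketch does not check the PLINK version, so a v2.0 binary found on `PATH` would still need to be ruled out by hand.

```python
import shutil


def find_plink_executable(candidates=("plink", "plink1.9", "plink.exe")):
    """Return the full path of the first candidate found on PATH, else None.

    The candidate names are guesses; adjust them to your installation.
    """
    for name in candidates:
        path = shutil.which(name)
        if path is not None:
            return path
    return None


def configure_plink():
    """Point genal at an existing PLINK binary, or let genal install one."""
    import genal  # imported here so the helper above works without genal

    path = find_plink_executable()
    if path is not None:
        # Reuse the binary already on PATH (assumed to be PLINK v1.9).
        genal.set_plink(path=path)
    else:
        # No binary found: use the installer added in 1.0.
        genal.install_plink()
```

Calling `configure_plink()` once after `import genal` would then cover both situations described in the README.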

{genal_python-0.8 → genal_python-1.0}/docs/.DS_Store RENAMED

Binary file

{genal_python-0.8 → genal_python-1.0/docs/build}/.DS_Store RENAMED

Binary file

{genal_python-0.8 → genal_python-1.0}/docs/build/.doctrees/api.doctree RENAMED

Binary file

genal_python-1.0/docs/build/.doctrees/environment.pickle ADDED

Binary file

{genal_python-0.8 → genal_python-1.0}/docs/build/.doctrees/index.doctree RENAMED

Binary file

genal_python-1.0/docs/build/.doctrees/introduction.doctree ADDED

Binary file

{genal_python-0.8 → genal_python-1.0}/docs/build/_modules/genal/Geno.html RENAMED

@@ -3,14 +3,14 @@
 <head>
   <meta charset="utf-8" />
   <meta name="viewport" content="width=device-width, initial-scale=1.0" />
-  <title>genal.Geno &mdash; genal v0.0 documentation</title>
+  <title>genal.Geno &mdash; genal v0.8 documentation</title>
       <link rel="stylesheet" href="../../_static/pygments.css" type="text/css" />
       <link rel="stylesheet" href="../../_static/css/theme.css" type="text/css" />
   <!--[if lt IE 9]>
     <script src="../../_static/js/html5shiv.min.js"></script>
   <![endif]-->
-        <script src="../../_static/documentation_options.js?v=90b5f367"></script>
+        <script src="../../_static/documentation_options.js?v=b326c068"></script>
         <script src="../../_static/doctools.js?v=9a2dae69"></script>
         <script src="../../_static/sphinx_highlight.js?v=dc90522c"></script>
     <script src="../../_static/js/theme.js"></script>
@@ -940,7 +940,7 @@
         <span class="n">methods</span><span class="o">=</span><span class="p">[</span>
             <span class="s2">&quot;IVW&quot;</span><span class="p">,</span>
             <span class="s2">&quot;WM&quot;</span><span class="p">,</span>
-            <span class="s2">&quot;Simple-median&quot;</span><span class="p">,</span>
+            <span class="s2">&quot;Simple-mode&quot;</span><span class="p">,</span>
             <span class="s2">&quot;Egger&quot;</span><span class="p">,</span>
         <span class="p">],</span>
         <span class="n">exposure_name</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span>

{genal_python-0.8 → genal_python-1.0}/docs/build/_sources/index.rst.txt RENAMED

@@ -18,7 +18,7 @@ Genal draws on concepts from well-established R packages such as TwoSampleMR, MR
 To install the latest release, type::
-    pip install genal
+    pip install genal-python
 Contents
 --------

{genal_python-0.8 → genal_python-1.0}/docs/build/_sources/introduction.rst.txt RENAMED

@@ -2,13 +2,21 @@
 Installation
 ============
-The genal package can be easily installed with pip:
+.. note::
+    **Optional**: It is recommended to create a new environment to avoid dependencies conflicts. Here, we create a new conda environment called 'genal'.
+    .. code-block:: bash
+        conda create --name genal python=3.11
+        conda activate genal
+The genal package requires Python 3.11. Download and install it with pip:
 .. code-block:: bash
-    pip install genal
+    pip install genal-python
-And it can be imported in a python environment with:
+And import it in a python environment with:
 .. code-block:: python
@@ -145,7 +153,7 @@ By default, and depending on the global preprocessing level (``'None'``, ``'Fill
 If you do not wish to run certain steps, or wish to run only certain steps, you can use additional arguments. For more information, please refer to the :meth:`~genal.Geno.preprocess_data` method in the API documentation.
-In our case, the ``SNP`` column (for SNP identifier - rsid) was missing from our dataframe and has been added based on a 1000 genome reference panel:
+In our case, the ``SNP`` column (for SNP identifier - rsid) was missing from our dataframe and has been added based on a 1000 genome reference panel::
     Using the EUR reference panel.
     The SNP column (rsID) has been created. 197511 (2.787%) SNPs were not found in the reference data and their ID set to CHR:POS:EA.
@@ -176,7 +184,7 @@ You can also use a custom reference panel by specifying the path to bed/bim/fam
 Clumping
 --------
-Clumping is the step at which we select the SNPs that will be used as our genetic instruments in future Polygenic Risk Scores and Mendelian Randomization analyses. The process involves identifying the SNPs that are strongly associated with our trait of interest (systolic blood pressure in this tutorial) and are independent from each other. This second step ensures that selected SNPs are not highly correlated, (i.e., they are not in high linkage disequilibrium). For this step, we again need to use a reference panel.
+Clumping, or C+T: Clumping + Thresholding, is the step at which we select the SNPs that will be used as our genetic instruments in future Polygenic Risk Scores and Mendelian Randomization analyses. The process involves identifying the SNPs that are strongly associated with our trait of interest (systolic blood pressure in this tutorial) and are independent from each other. This second step ensures that selected SNPs are not highly correlated, (i.e., they are not in high linkage disequilibrium). For this step, we again need to use a reference panel.
 The SNP-data loaded in a :class:`~genal.Geno` instance can be clumped using the :meth:`~genal.Geno.clump` method. It will return another :class:`~genal.Geno` instance containing only the clumped data:
@@ -250,7 +258,7 @@ Here, we see that about half of the SNPs were not extracted from the data. In su
 .. code-block:: python
-    SBP_clumped.prs(name="SBP_prs", path="Pop_chr$", proxy=True, reference_panel="eur", r2=0.8, kb=5000, window_snps=5000)
+    SBP_clumped.prs(name="SBP_prs_proxy", path="Pop_chr$", proxy=True, reference_panel="eur", r2=0.8, kb=5000, window_snps=5000)
 and the output is::
@@ -489,7 +497,7 @@ Let's start by loading phenotypic data:
     df_pheno = pd.read_csv("path/to/trait/data")
 .. note::
-   One important detail is to make sure that the IDs of the participants are identical in the phenotypic data and in the genetic data.
+   One important point is to make sure that the IDs of the participants are identical in the phenotypic data and in the genetic data.
 Then, it is advised to make a copy of the :class:`~genal.Geno` instance containing our instruments as we are going to update their coefficients and to avoid any confusion:
@@ -589,7 +597,7 @@ Which will output::
         701 (45.37%) SNPs failed to query (not found in GWAS Catalog) and 7 (0.5%) SNPs timed out after 34.33 seconds. You can increase the timeout value with the timeout argument.
 And the :attr:`~genal.Geno.data` attribute now contains an `ASSOC` column::
         EA NEA    EAF    BETA     SE  CHR        POS         SNP                                               ASSOC
         0  A   G  0.1784  0.2330  0.0402   10  102075479    rs603424  [eicosanoids measurement, decadienedioic acid (...]
         1  A   G  0.0706 -0.3873  0.0626   10  102403682   rs2996303                                       FAILED_QUERY

{genal_python-0.8 → genal_python-1.0}/docs/build/api.html RENAMED

@@ -473,7 +473,7 @@ flipping palindromic SNPs (relevant if action=2). Default is 0.42.</p></li>
 <dl class="py method">
 <dt class="sig sig-object py" id="id1">
-<span class="sig-name descname"><span class="pre">MR_plot</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">methods</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">['IVW',</span> <span class="pre">'WM',</span> <span class="pre">'Simple-median',</span> <span class="pre">'Egger']</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">exposure_name</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">outcome_name</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">filename</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em><span class="sig-paren">)</span><a class="reference internal" href="_modules/genal/Geno.html#Geno.MR_plot"><span class="viewcode-link"><span class="pre">[source]</span></span></a><a class="headerlink" href="#id1" title="Link to this definition">¶</a></dt>
+<span class="sig-name descname"><span class="pre">MR_plot</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">methods</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">['IVW',</span> <span class="pre">'WM',</span> <span class="pre">'Simple-mode',</span> <span class="pre">'Egger']</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">exposure_name</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">outcome_name</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">filename</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em><span class="sig-paren">)</span><a class="reference internal" href="_modules/genal/Geno.html#Geno.MR_plot"><span class="viewcode-link"><span class="pre">[source]</span></span></a><a class="headerlink" href="#id1" title="Link to this definition">¶</a></dt>
 <dd><p>Creates and returns a scatter plot of individual SNP effects with lines representing different Mendelian Randomization (MR) methods. Each MR method specified in the ‘methods’ argument is represented as a line in the plot.</p>
 <dl class="field-list simple">
 <dt class="field-odd">Parameters<span class="colon">:</span></dt>

{genal_python-0.8 → genal_python-1.0}/docs/build/index.html RENAMED

@@ -89,7 +89,7 @@
 <p>The module prioritizes user-friendliness and intuitive operation, aiming to reduce the complexity of data analysis for researchers. Despite its focus on simplicity, Genal does not sacrifice the depth of customization or the precision of analysis. Researchers can expect to maintain analytical rigour while benefiting from the streamlined experience.</p>
 <p>Genal draws on concepts from well-established R packages such as TwoSampleMR, MR-Presso, MendelianRandomization, and gwasvcf, adapting their proven methodologies to the Python environment. This approach ensures that users have access to tried and tested techniques with the versatility of Python’s data science tools.</p>
 <p>To install the latest release, type:</p>
-<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">pip</span> <span class="n">install</span> <span class="n">genal</span>
+<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">pip</span> <span class="n">install</span> <span class="n">genal</span><span class="o">-</span><span class="n">python</span>
 </pre></div>
 </div>
 <section id="contents">

{genal_python-0.8 → genal_python-1.0}/docs/build/introduction.html RENAMED

@@ -88,11 +88,19 @@
   <section id="installation">
 <h1>Installation<a class="headerlink" href="#installation" title="Link to this heading">¶</a></h1>
-<p>The genal package can be easily installed with pip:</p>
-<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>pip<span class="w"> </span>install<span class="w"> </span>genal
+<div class="admonition note">
+<p class="admonition-title">Note</p>
+<p><strong>Optional</strong>: It is recommended to create a new environment to avoid dependencies conflicts. Here, we create a new conda environment called ‘genal’.</p>
+<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>conda<span class="w"> </span>create<span class="w"> </span>--name<span class="w"> </span>genal<span class="w"> </span><span class="nv">python</span><span class="o">=</span><span class="m">3</span>.11
+conda<span class="w"> </span>activate<span class="w"> </span>genal
+</pre></div>
+</div>
+</div>
+<p>The genal package requires Python 3.11. Download and install it with pip:</p>
+<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>pip<span class="w"> </span>install<span class="w"> </span>genal-python
 </pre></div>
 </div>
-<p>And it can be imported in a python environment with:</p>
+<p>And import it in a python environment with:</p>
 <div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">genal</span>
 </pre></div>
 </div>
@@ -205,11 +213,11 @@ Once downloaded, the path to the plink 1.9 executable should be set with:</p>
 </ul>
 <p>If you do not wish to run certain steps, or wish to run only certain steps, you can use additional arguments. For more information, please refer to the <a class="reference internal" href="modules.html#id0" title="genal.Geno.preprocess_data"><code class="xref py py-meth docutils literal notranslate"><span class="pre">preprocess_data()</span></code></a> method in the API documentation.</p>
 <p>In our case, the <code class="docutils literal notranslate"><span class="pre">SNP</span></code> column (for SNP identifier - rsid) was missing from our dataframe and has been added based on a 1000 genome reference panel:</p>
-<blockquote>
-<div><p>Using the EUR reference panel.
-The SNP column (rsID) has been created. 197511 (2.787%) SNPs were not found in the reference data and their ID set to CHR:POS:EA.
-The BETA column looks like Beta estimates. Use effect_column=’OR’ if it is a column of Odds Ratios.</p>
-</div></blockquote>
+<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">Using</span> <span class="n">the</span> <span class="n">EUR</span> <span class="n">reference</span> <span class="n">panel</span><span class="o">.</span>
+<span class="n">The</span> <span class="n">SNP</span> <span class="n">column</span> <span class="p">(</span><span class="n">rsID</span><span class="p">)</span> <span class="n">has</span> <span class="n">been</span> <span class="n">created</span><span class="o">.</span> <span class="mi">197511</span> <span class="p">(</span><span class="mf">2.787</span><span class="o">%</span><span class="p">)</span> <span class="n">SNPs</span> <span class="n">were</span> <span class="ow">not</span> <span class="n">found</span> <span class="ow">in</span> <span class="n">the</span> <span class="n">reference</span> <span class="n">data</span> <span class="ow">and</span> <span class="n">their</span> <span class="n">ID</span> <span class="nb">set</span> <span class="n">to</span> <span class="n">CHR</span><span class="p">:</span><span class="n">POS</span><span class="p">:</span><span class="n">EA</span><span class="o">.</span>
+<span class="n">The</span> <span class="n">BETA</span> <span class="n">column</span> <span class="n">looks</span> <span class="n">like</span> <span class="n">Beta</span> <span class="n">estimates</span><span class="o">.</span> <span class="n">Use</span> <span class="n">effect_column</span><span class="o">=</span><span class="s1">&#39;OR&#39;</span> <span class="k">if</span> <span class="n">it</span> <span class="ow">is</span> <span class="n">a</span> <span class="n">column</span> <span class="n">of</span> <span class="n">Odds</span> <span class="n">Ratios</span><span class="o">.</span>
+</pre></div>
+</div>
 <p>You can always check the data of a <code class="docutils literal notranslate"><span class="pre">genal.Geno</span></code> instance by accessing the <code class="docutils literal notranslate"><span class="pre">data</span></code> attribute:</p>
 <div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="gp">&gt;&gt;&gt; </span><span class="n">SBP_Geno</span><span class="o">.</span><span class="n">data</span><span class="o">.</span><span class="n">head</span><span class="p">(</span><span class="mi">5</span><span class="p">)</span>
 <span class="go">    EA NEA     EAF   BETA     SE        P  CHR       POS        SNP</span>
@@ -229,7 +237,7 @@ The BETA column looks like Beta estimates. Use effect_column=’OR’ if it is a
 </section>
 <section id="clumping">
 <h2>Clumping<a class="headerlink" href="#clumping" title="Link to this heading">¶</a></h2>
-<p>Clumping is the step at which we select the SNPs that will be used as our genetic instruments in future Polygenic Risk Scores and Mendelian Randomization analyses. The process involves identifying the SNPs that are strongly associated with our trait of interest (systolic blood pressure in this tutorial) and are independent from each other. This second step ensures that selected SNPs are not highly correlated, (i.e., they are not in high linkage disequilibrium). For this step, we again need to use a reference panel.</p>
+<p>Clumping, or C+T: Clumping + Thresholding, is the step at which we select the SNPs that will be used as our genetic instruments in future Polygenic Risk Scores and Mendelian Randomization analyses. The process involves identifying the SNPs that are strongly associated with our trait of interest (systolic blood pressure in this tutorial) and are independent from each other. This second step ensures that selected SNPs are not highly correlated, (i.e., they are not in high linkage disequilibrium). For this step, we again need to use a reference panel.</p>
 <p>The SNP-data loaded in a <a class="reference internal" href="modules.html#genal.Geno" title="genal.Geno"><code class="xref py py-class docutils literal notranslate"><span class="pre">Geno</span></code></a> instance can be clumped using the <a class="reference internal" href="modules.html#id1" title="genal.Geno.clump"><code class="xref py py-meth docutils literal notranslate"><span class="pre">clump()</span></code></a> method. It will return another <a class="reference internal" href="modules.html#genal.Geno" title="genal.Geno"><code class="xref py py-class docutils literal notranslate"><span class="pre">Geno</span></code></a> instance containing only the clumped data:</p>
 <div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="n">SBP_clumped</span> <span class="o">=</span> <span class="n">SBP_Geno</span><span class="o">.</span><span class="n">clump</span><span class="p">(</span><span class="n">p1</span><span class="o">=</span><span class="mf">5e-8</span><span class="p">,</span> <span class="n">r2</span><span class="o">=</span><span class="mf">0.1</span><span class="p">,</span> <span class="n">kb</span><span class="o">=</span><span class="mi">250</span><span class="p">,</span> <span class="n">reference_panel</span><span class="o">=</span><span class="s2">&quot;eur&quot;</span><span class="p">)</span>
 </pre></div>
@@ -293,7 +301,7 @@ The output of the <a class="reference internal" href="modules.html#id2" title="g
 </pre></div>
 </div>
 <p>Here, we see that about half of the SNPs were not extracted from the data. In such cases, we may want to try and salvage some of these SNPs by looking for proxies (SNPs in high linkage disequilibrium, i.e. highly correlated SNPs). This can be done by specifying the <code class="docutils literal notranslate"><span class="pre">proxy</span> <span class="pre">=</span> <span class="pre">True</span></code> argument:</p>
-<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="n">SBP_clumped</span><span class="o">.</span><span class="n">prs</span><span class="p">(</span><span class="n">name</span><span class="o">=</span><span class="s2">&quot;SBP_prs&quot;</span><span class="p">,</span> <span class="n">path</span><span class="o">=</span><span class="s2">&quot;Pop_chr$&quot;</span><span class="p">,</span> <span class="n">proxy</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span> <span class="n">reference_panel</span><span class="o">=</span><span class="s2">&quot;eur&quot;</span><span class="p">,</span> <span class="n">r2</span><span class="o">=</span><span class="mf">0.8</span><span class="p">,</span> <span class="n">kb</span><span class="o">=</span><span class="mi">5000</span><span class="p">,</span> <span class="n">window_snps</span><span class="o">=</span><span class="mi">5000</span><span class="p">)</span>
+<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="n">SBP_clumped</span><span class="o">.</span><span class="n">prs</span><span class="p">(</span><span class="n">name</span><span class="o">=</span><span class="s2">&quot;SBP_prs_proxy&quot;</span><span class="p">,</span> <span class="n">path</span><span class="o">=</span><span class="s2">&quot;Pop_chr$&quot;</span><span class="p">,</span> <span class="n">proxy</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span> <span class="n">reference_panel</span><span class="o">=</span><span class="s2">&quot;eur&quot;</span><span class="p">,</span> <span class="n">r2</span><span class="o">=</span><span class="mf">0.8</span><span class="p">,</span> <span class="n">kb</span><span class="o">=</span><span class="mi">5000</span><span class="p">,</span> <span class="n">window_snps</span><span class="o">=</span><span class="mi">5000</span><span class="p">)</span>
 </pre></div>
 </div>
 <p>and the output is:</p>
@@ -551,7 +559,7 @@ It can be run using the <a class="reference internal" href="modules.html#id5" ti
 </div>
 <div class="admonition note">
 <p class="admonition-title">Note</p>
-<p>One important detail is to make sure that the IDs of the participants are identical in the phenotypic data and in the genetic data.</p>
+<p>One important point is to make sure that the IDs of the participants are identical in the phenotypic data and in the genetic data.</p>
 </div>
 <p>Then, it is advised to make a copy of the <a class="reference internal" href="modules.html#genal.Geno" title="genal.Geno"><code class="xref py py-class docutils literal notranslate"><span class="pre">Geno</span></code></a> instance containing our instruments as we are going to update their coefficients and to avoid any confusion:</p>
 <div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="n">SBP_adjusted</span> <span class="o">=</span> <span class="n">SBP_clumped</span><span class="o">.</span><span class="n">copy</span><span class="p">()</span>
