PyPI - genal-python - version diff: 1.0.tar.gz → 1.2.tar.gz


Files changed (111)

genal_python-1.2/Genal_flowchart.png ADDED

Binary file

{genal_python-1.0 → genal_python-1.2}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.3
 Name: genal-python
-Version: 1.0
+Version: 1.2
 Summary: A python toolkit for polygenic risk scoring and mendelian randomization.
 Author-email: Cyprien Rivier <riviercyprien@gmail.com>
 Requires-Python: >=3.8
@@ -29,10 +29,6 @@ Project-URL: Home, https://github.com/CypRiv/genal
 <center><h1> genal: A Python Toolkit for Genetic Risk Scoring and Mendelian Randomization </h1></center>
-**This project was developed by Cyprien A. Rivier**
 # Table of contents
 1. [Introduction](#introduction)
 2. [Citation](#citation)
@@ -62,8 +58,12 @@ Genal draws on concepts from well-established R packages such as TwoSampleMR, MR
 Genal flowchart. Created in https://www.BioRender.com
 ## Citation <a name="citation"></a>
-If you're using genal, please cite the following paper:
-**Genal: A Python Toolkit for Genetic Risk Scoring and Mendelian Randomization.** Cyprien A. Rivier, Santiago Clocchiatti-Tuozzo, Shufan Huo, Victor Torres-Lopez, Daniela Renedo, Kevin N. Sheth, Guido J. Falcone, Julian N. Acosta. medRxiv 2024.05.23.24307776; doi: https://doi.org/10.1101/2024.05.23.24307776
+This project was developed by Cyprien A. Rivier.
+If you're using genal, please cite the following paper:
+**Genal: A Python Toolkit for Genetic Risk Scoring and Mendelian Randomization.**
+Cyprien A. Rivier, Santiago Clocchiatti-Tuozzo, Shufan Huo, Victor Torres-Lopez, Daniela Renedo, Kevin N. Sheth, Guido J. Falcone, Julian N. Acosta.
+Bioinformatics Advances 2024.
+doi: https://doi.org/10.1093/bioadv/vbae207
 ## Requirements for the genal module <a name="paragraph1"></a>
 ***Python 3.8 or later***. https://www.python.org/ <br>
@@ -91,8 +91,8 @@ And import it in a python environment with:
 import genal
 ```
-The main genal functionalities require a working installation of PLINK v1.9 (and not 2.0 as certain functionalities have not been updated yet).
-If you have already installed plink v1.9, you can set the path to its executable with:
+The main genal functionalities require a working installation of PLINK v2.0.
+If you have already installed plink v2.0, you can set the path to its executable with:
 ```
 genal.set_plink(path="/path/to/plink/executable/file")
@@ -256,7 +256,7 @@ You do not need to obtain the 1000 genome reference panel yourself, genal will d
 SBP_Geno.preprocess_data(preprocessing = 'Fill_delete', reference_panel = "afr")
 ```
-You can also use a custom reference panel by specifying to the reference_panel argument a path to bed/bim/fam files (without the extension).
+You can also use a custom reference panel by specifying to the reference_panel argument a path to bed/bim/fam (plink v1.9 format) or pgen/pvar/psam files (plink v2.0 format), without the extension.
 ### Clumping <a name="paragraph3.3"></a>
@@ -289,7 +289,7 @@ Computing a Polygenic Risk Score (PRS) can be done in one line with the `genal.G
 SBP_clumped.prs(name = "SBP_prs", path = "path/to/genetic/files")
 ```
-The genetic files of the target population can be either contained in one triple of bed/bim/fam files with information for all SNPs, or divided by chromosome (one bed/bim/fam triple for chr 1, another for chr 2, etc...). In the latter case, provide the path by replacing the chromosome number by `$` and genal will extract the necessary SNPs from each chromosome and merge them before running the PRS. For instance, if the genetic files are named `Pop_chr1.bed`, `Pop_chr1.bim`, `Pop_chr1.fam`, `Pop_chr2.bed`, ..., you can use:
+The genetic files of the target population can be either contained in one triple of bed/bim/fam or pgen/pvar/psam files with information for all SNPs, or divided by chromosome (one bed/bim/fam or pgen/pvar/psam triple for chr 1, another for chr 2, etc...). In the latter case, provide the path by replacing the chromosome number by `$` and genal will extract the necessary SNPs from each chromosome and merge them before running the PRS. For instance, if the genetic files are named `Pop_chr1.bed`, `Pop_chr1.bim`, `Pop_chr1.fam`, `Pop_chr2.bed`, ..., you can use:
 ```python
 SBP_clumped.prs(name = "SBP_prs", path = "Pop_chr$")
@@ -347,26 +347,7 @@ and the output is:
     7(0.455%) duplicated SNPs have been removed. Use keep_dups=True to keep them.
     Extracting SNPs for each chromosome...
     SNPs extracted for chr1.
-    SNPs extracted for chr2.
-    SNPs extracted for chr3.
-    SNPs extracted for chr4.
-    SNPs extracted for chr5.
-    SNPs extracted for chr6.
-    SNPs extracted for chr7.
-    SNPs extracted for chr8.
-    SNPs extracted for chr9.
-    SNPs extracted for chr10.
-    SNPs extracted for chr11.
-    SNPs extracted for chr12.
-    SNPs extracted for chr13.
-    SNPs extracted for chr14.
-    SNPs extracted for chr15.
-    SNPs extracted for chr16.
-    SNPs extracted for chr17.
-    SNPs extracted for chr18.
-    SNPs extracted for chr19.
-    SNPs extracted for chr20.
-    SNPs extracted for chr21.
+    ...
     SNPs extracted for chr22.
     Merging SNPs extracted from each chromosome...
     Created bed/bim/fam fileset with extracted SNPs: tmp_GENAL/4f4ce6a7_allchr
@@ -435,7 +416,6 @@ Genal will print how many SNPs were successfully found and extracted from the ou
 > **Note:**
->Here as well you have the option to use proxies for the instruments that are not present in the outcome data:
 >
 > Here as well you have the option to use proxies for the instruments that are not present in the outcome data:
 >
@@ -479,7 +459,7 @@ You can specify several arguments. We refer to the API for a full list, but the
 - `action = 2`: Uses effect allele frequencies to attempt to flip them (conservative, default)
 - `action = 3`: Removes all palindromic SNPs (very conservative)
-If you choose the option 2 or 3 (recommended), genal will print the list of palindromic SNPs that have been removed from the analysis.
+When choosing the option 2 or 3 (recommended), genal will print the list of palindromic SNPs that have been removed from the analysis.
 By default, only some MR methods (inverse-variance weighted, weighted median, Simple mode, MR-Egger) are going to be run. But if you wish to run a different set of MR methods, you can pass a list of strings to the `methods` argument. The possible strings are:
 - `IVW` for the classical Inverse-Variance Weighted method with random effects
@@ -496,7 +476,7 @@ By default, only some MR methods (inverse-variance weighted, weighted median, Si
 - `Weighted-mode` for the Weighted mode method
 - `all` to run all the above methods
-For more fine-tuning, such as settings for the number of boostrapping iterations, please refer to the API: [https://genal.readthedocs.io/en/latest/modules.html#id4](MR method).
+For more fine-tuning, such as settings for the number of boostrapping iterations, please refer to the API: [https://genal.readthedocs.io/en/latest/modules.html#id4] (MR method).
 If you want to visualize the obtained MR results, you can use the `genal.Geno.MR_plot` method that will plot each SNP in an `effect_on_exposure x effect_on_outcome` plane as well as lines corresponding to different MR methods:
@@ -505,9 +485,9 @@ SBP_clumped.MR_plot(filename="MR_plot_SBP_AS")
 ```
 ![MR plot](docs/build/_images/MR_plot_SBP_AS.png)
-You can select which MR methods you wish to plot with the `methods` argument. Note that for an MR method to be plotted, they must be included in the latest `genal.Geno.MR` call of this `genal.Geno` instance.
+You can select which MR methods to plot with the `methods` argument. Note that for an MR method to be plotted, they must be included in the latest `genal.Geno.MR` call of this `genal.Geno` instance.
-If you wish to include the heterogeneity values (Cochran's Q) in the results, you can use the heterogeneity argument in the `genal.Geno.MR` call. Here, the heterogeneity for the inverse-variance weighted method:
+To include the heterogeneity values (Cochran's Q) in the results, you can use the heterogeneity argument in the `genal.Geno.MR` call. Here, the heterogeneity for the inverse-variance weighted method:
 ```python
 SBP_clumped.MR(action = 2, methods = ["Egger","IVW"], exposure_name = "SBP", outcome_name = "Stroke_eur", heterogeneity = True)
@@ -520,6 +500,12 @@ And that will give:
 | SBP      | Stroke_eur | Egger Intercept          | 1499 | -0.001381 | 0.000813  | 8.935529e-02  | 2959.965136 | 1497 | 1.253763e-98 |
 | SBP      | Stroke_eur | Inverse-Variance Weighted| 1499 | 0.023049  | 0.001061  | 1.382645e-104 | 2965.678836 | 1498 | 4.280737e-99 |
+To display the coefficients as odds ratios with confidence intervals for a binary outcome trait, you can use the `odds = True` argument:
+```python
+SBP_clumped.MR(action = 2, methods = ["Egger","IVW"], exposure_name = "SBP", outcome_name = "Stroke_eur", heterogeneity = True, odds = True)
+```
 As expected, many MR methods indicate that SBP is strongly associated with stroke, but there could be concerns for horizontal pleiotropy (instruments influencing the outcome through a different pathway than the one used as exposure) given the almost significant MR-Egger intercept p-value.
 To investigate horizontal pleiotropy in more details, a very useful method is Mendelian Randomization Pleiotropy RESidual Sum and Outlier (MR-PRESSO). MR-PRESSO is a method designed to detect and correct for horizontal pleiotropy. It will identify which instruments are likely to be pleiotropic on their effect on the outcome, and it will rerun an inverse-variance weighted MR after excluding them. It can be run using the `genal.Geno.MRpresso` method:
@@ -549,7 +535,7 @@ df_pheno = pd.read_csv("path/to/trait/data")
 > **Note:**
 >
->    One important point is to make sure that the IDs of the participants are identical in the phenotypic data and in the genetic data.
+>    One important point is to make sure that both the Family IDs (FID) and Individual IDs (IID) of the participants are identical in the phenotypic data and in the genetic data.
 Then, it is advised to make a copy of the `genal.Geno` instance containing our instruments as we are going to update their coefficients and to avoid any confusion:
@@ -560,7 +546,7 @@ SBP_adjusted = SBP_clumped.copy()
 We can then call the `genal.Geno.set_phenotype` method, specifying which column contains our trait of interest (for the association testing) and which column contains the individual IDs:
 ```python
-SBP_adjusted.set_phenotype(df_pheno, PHENO = "htn", IID = "IID")
+SBP_adjusted.set_phenotype(df_pheno, PHENO = "htn", IID = "IID", FID = "FID")
 ```
 At this point, genal will identify if the phenotype is binary or quantitative in order to choose the appropriate regression model. If the phenotype is binary, it will assume that the most frequent value is coding for control (and the other value for case), this can be changed with `alternate_control = True`:
@@ -580,26 +566,7 @@ Genal will print information regarding the number of individuals used in the tes
     CHR/POS columns present: SNPs searched based on genomic positions.
     Extracting SNPs for each chromosome...
     SNPs extracted for chr1.
-    SNPs extracted for chr2.
-    SNPs extracted for chr3.
-    SNPs extracted for chr4.
-    SNPs extracted for chr5.
-    SNPs extracted for chr6.
-    SNPs extracted for chr7.
-    SNPs extracted for chr8.
-    SNPs extracted for chr9.
-    SNPs extracted for chr10.
-    SNPs extracted for chr11.
-    SNPs extracted for chr12.
-    SNPs extracted for chr13.
-    SNPs extracted for chr14.
-    SNPs extracted for chr15.
-    SNPs extracted for chr16.
-    SNPs extracted for chr17.
-    SNPs extracted for chr18.
-    SNPs extracted for chr19.
-    SNPs extracted for chr20.
-    SNPs extracted for chr21.
+    ...
     SNPs extracted for chr22.
     Merging SNPs extracted from each chromosome...
     Created bed/bim/fam fileset with extracted SNPs: tmp_GENAL/e415aab3_allchr
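
The `$` placeholder convention described in the PRS hunk above can be illustrated with a short sketch. This is not genal's implementation: `expand_chromosome_paths` and `detect_fileset` are hypothetical helper names, and the format check simply tests which complete file triple exists on disk, under the assumption that autosomes 1 to 22 are the chromosomes of interest.

```python
from pathlib import Path

# File extensions of the two fileset formats named in the hunk above.
PLINK1_EXT = (".bed", ".bim", ".fam")
PLINK2_EXT = (".pgen", ".pvar", ".psam")


def expand_chromosome_paths(template, chromosomes=range(1, 23)):
    """Return (chromosome, prefix) pairs with `$` replaced by the chromosome number.

    A template without `$` is treated as a single fileset covering all SNPs.
    """
    if "$" not in template:
        return [(None, template)]
    return [(chrom, template.replace("$", str(chrom))) for chrom in chromosomes]


def detect_fileset(prefix):
    """Return 'pgen', 'bed', or None depending on which complete triple exists."""
    if all(Path(prefix + ext).is_file() for ext in PLINK2_EXT):
        return "pgen"
    if all(Path(prefix + ext).is_file() for ext in PLINK1_EXT):
        return "bed"
    return None
```

Calling `expand_chromosome_paths("Pop_chr$")` yields the prefixes `Pop_chr1` through `Pop_chr22`, each of which can then be passed to `detect_fileset` before any extraction is attempted.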

{genal_python-1.0 → genal_python-1.2}/README.md RENAMED

@@ -5,10 +5,6 @@
 <center><h1> genal: A Python Toolkit for Genetic Risk Scoring and Mendelian Randomization </h1></center>
-**This project was developed by Cyprien A. Rivier**
 # Table of contents
 1. [Introduction](#introduction)
 2. [Citation](#citation)
@@ -38,8 +34,12 @@ Genal draws on concepts from well-established R packages such as TwoSampleMR, MR
 Genal flowchart. Created in https://www.BioRender.com
 ## Citation <a name="citation"></a>
-If you're using genal, please cite the following paper:
-**Genal: A Python Toolkit for Genetic Risk Scoring and Mendelian Randomization.** Cyprien A. Rivier, Santiago Clocchiatti-Tuozzo, Shufan Huo, Victor Torres-Lopez, Daniela Renedo, Kevin N. Sheth, Guido J. Falcone, Julian N. Acosta. medRxiv 2024.05.23.24307776; doi: https://doi.org/10.1101/2024.05.23.24307776
+This project was developed by Cyprien A. Rivier.
+If you're using genal, please cite the following paper:
+**Genal: A Python Toolkit for Genetic Risk Scoring and Mendelian Randomization.**
+Cyprien A. Rivier, Santiago Clocchiatti-Tuozzo, Shufan Huo, Victor Torres-Lopez, Daniela Renedo, Kevin N. Sheth, Guido J. Falcone, Julian N. Acosta.
+Bioinformatics Advances 2024.
+doi: https://doi.org/10.1093/bioadv/vbae207
 ## Requirements for the genal module <a name="paragraph1"></a>
 ***Python 3.8 or later***. https://www.python.org/ <br>
@@ -67,8 +67,8 @@ And import it in a python environment with:
 import genal
 ```
-The main genal functionalities require a working installation of PLINK v1.9 (and not 2.0 as certain functionalities have not been updated yet).
-If you have already installed plink v1.9, you can set the path to its executable with:
+The main genal functionalities require a working installation of PLINK v2.0.
+If you have already installed plink v2.0, you can set the path to its executable with:
 ```
 genal.set_plink(path="/path/to/plink/executable/file")
@@ -232,7 +232,7 @@ You do not need to obtain the 1000 genome reference panel yourself, genal will d
 SBP_Geno.preprocess_data(preprocessing = 'Fill_delete', reference_panel = "afr")
 ```
-You can also use a custom reference panel by specifying to the reference_panel argument a path to bed/bim/fam files (without the extension).
+You can also use a custom reference panel by specifying to the reference_panel argument a path to bed/bim/fam (plink v1.9 format) or pgen/pvar/psam files (plink v2.0 format), without the extension.
 ### Clumping <a name="paragraph3.3"></a>
@@ -265,7 +265,7 @@ Computing a Polygenic Risk Score (PRS) can be done in one line with the `genal.G
 SBP_clumped.prs(name = "SBP_prs", path = "path/to/genetic/files")
 ```
-The genetic files of the target population can be either contained in one triple of bed/bim/fam files with information for all SNPs, or divided by chromosome (one bed/bim/fam triple for chr 1, another for chr 2, etc...). In the latter case, provide the path by replacing the chromosome number by `$` and genal will extract the necessary SNPs from each chromosome and merge them before running the PRS. For instance, if the genetic files are named `Pop_chr1.bed`, `Pop_chr1.bim`, `Pop_chr1.fam`, `Pop_chr2.bed`, ..., you can use:
+The genetic files of the target population can be either contained in one triple of bed/bim/fam or pgen/pvar/psam files with information for all SNPs, or divided by chromosome (one bed/bim/fam or pgen/pvar/psam triple for chr 1, another for chr 2, etc...). In the latter case, provide the path by replacing the chromosome number by `$` and genal will extract the necessary SNPs from each chromosome and merge them before running the PRS. For instance, if the genetic files are named `Pop_chr1.bed`, `Pop_chr1.bim`, `Pop_chr1.fam`, `Pop_chr2.bed`, ..., you can use:
 ```python
 SBP_clumped.prs(name = "SBP_prs", path = "Pop_chr$")
@@ -323,26 +323,7 @@ and the output is:
     7(0.455%) duplicated SNPs have been removed. Use keep_dups=True to keep them.
     Extracting SNPs for each chromosome...
     SNPs extracted for chr1.
-    SNPs extracted for chr2.
-    SNPs extracted for chr3.
-    SNPs extracted for chr4.
-    SNPs extracted for chr5.
-    SNPs extracted for chr6.
-    SNPs extracted for chr7.
-    SNPs extracted for chr8.
-    SNPs extracted for chr9.
-    SNPs extracted for chr10.
-    SNPs extracted for chr11.
-    SNPs extracted for chr12.
-    SNPs extracted for chr13.
-    SNPs extracted for chr14.
-    SNPs extracted for chr15.
-    SNPs extracted for chr16.
-    SNPs extracted for chr17.
-    SNPs extracted for chr18.
-    SNPs extracted for chr19.
-    SNPs extracted for chr20.
-    SNPs extracted for chr21.
+    ...
     SNPs extracted for chr22.
     Merging SNPs extracted from each chromosome...
     Created bed/bim/fam fileset with extracted SNPs: tmp_GENAL/4f4ce6a7_allchr
@@ -411,7 +392,6 @@ Genal will print how many SNPs were successfully found and extracted from the ou
 > **Note:**
->Here as well you have the option to use proxies for the instruments that are not present in the outcome data:
 >
 > Here as well you have the option to use proxies for the instruments that are not present in the outcome data:
 >
@@ -455,7 +435,7 @@ You can specify several arguments. We refer to the API for a full list, but the
 - `action = 2`: Uses effect allele frequencies to attempt to flip them (conservative, default)
 - `action = 3`: Removes all palindromic SNPs (very conservative)
-If you choose the option 2 or 3 (recommended), genal will print the list of palindromic SNPs that have been removed from the analysis.
+When choosing the option 2 or 3 (recommended), genal will print the list of palindromic SNPs that have been removed from the analysis.
 By default, only some MR methods (inverse-variance weighted, weighted median, Simple mode, MR-Egger) are going to be run. But if you wish to run a different set of MR methods, you can pass a list of strings to the `methods` argument. The possible strings are:
 - `IVW` for the classical Inverse-Variance Weighted method with random effects
@@ -472,7 +452,7 @@ By default, only some MR methods (inverse-variance weighted, weighted median, Si
 - `Weighted-mode` for the Weighted mode method
 - `all` to run all the above methods
-For more fine-tuning, such as settings for the number of boostrapping iterations, please refer to the API: [https://genal.readthedocs.io/en/latest/modules.html#id4](MR method).
+For more fine-tuning, such as settings for the number of boostrapping iterations, please refer to the API: [https://genal.readthedocs.io/en/latest/modules.html#id4] (MR method).
 If you want to visualize the obtained MR results, you can use the `genal.Geno.MR_plot` method that will plot each SNP in an `effect_on_exposure x effect_on_outcome` plane as well as lines corresponding to different MR methods:
@@ -481,9 +461,9 @@ SBP_clumped.MR_plot(filename="MR_plot_SBP_AS")
 ```
 ![MR plot](docs/build/_images/MR_plot_SBP_AS.png)
-You can select which MR methods you wish to plot with the `methods` argument. Note that for an MR method to be plotted, they must be included in the latest `genal.Geno.MR` call of this `genal.Geno` instance.
+You can select which MR methods to plot with the `methods` argument. Note that for an MR method to be plotted, they must be included in the latest `genal.Geno.MR` call of this `genal.Geno` instance.
-If you wish to include the heterogeneity values (Cochran's Q) in the results, you can use the heterogeneity argument in the `genal.Geno.MR` call. Here, the heterogeneity for the inverse-variance weighted method:
+To include the heterogeneity values (Cochran's Q) in the results, you can use the heterogeneity argument in the `genal.Geno.MR` call. Here, the heterogeneity for the inverse-variance weighted method:
 ```python
 SBP_clumped.MR(action = 2, methods = ["Egger","IVW"], exposure_name = "SBP", outcome_name = "Stroke_eur", heterogeneity = True)
@@ -496,6 +476,12 @@ And that will give:
 | SBP      | Stroke_eur | Egger Intercept          | 1499 | -0.001381 | 0.000813  | 8.935529e-02  | 2959.965136 | 1497 | 1.253763e-98 |
 | SBP      | Stroke_eur | Inverse-Variance Weighted| 1499 | 0.023049  | 0.001061  | 1.382645e-104 | 2965.678836 | 1498 | 4.280737e-99 |
+To display the coefficients as odds ratios with confidence intervals for a binary outcome trait, you can use the `odds = True` argument:
+```python
+SBP_clumped.MR(action = 2, methods = ["Egger","IVW"], exposure_name = "SBP", outcome_name = "Stroke_eur", heterogeneity = True, odds = True)
+```
 As expected, many MR methods indicate that SBP is strongly associated with stroke, but there could be concerns for horizontal pleiotropy (instruments influencing the outcome through a different pathway than the one used as exposure) given the almost significant MR-Egger intercept p-value.
 To investigate horizontal pleiotropy in more details, a very useful method is Mendelian Randomization Pleiotropy RESidual Sum and Outlier (MR-PRESSO). MR-PRESSO is a method designed to detect and correct for horizontal pleiotropy. It will identify which instruments are likely to be pleiotropic on their effect on the outcome, and it will rerun an inverse-variance weighted MR after excluding them. It can be run using the `genal.Geno.MRpresso` method:
@@ -525,7 +511,7 @@ df_pheno = pd.read_csv("path/to/trait/data")
 > **Note:**
 >
->    One important point is to make sure that the IDs of the participants are identical in the phenotypic data and in the genetic data.
+>    One important point is to make sure that both the Family IDs (FID) and Individual IDs (IID) of the participants are identical in the phenotypic data and in the genetic data.
 Then, it is advised to make a copy of the `genal.Geno` instance containing our instruments as we are going to update their coefficients and to avoid any confusion:
@@ -536,7 +522,7 @@ SBP_adjusted = SBP_clumped.copy()
 We can then call the `genal.Geno.set_phenotype` method, specifying which column contains our trait of interest (for the association testing) and which column contains the individual IDs:
 ```python
-SBP_adjusted.set_phenotype(df_pheno, PHENO = "htn", IID = "IID")
+SBP_adjusted.set_phenotype(df_pheno, PHENO = "htn", IID = "IID", FID = "FID")
 ```
 At this point, genal will identify if the phenotype is binary or quantitative in order to choose the appropriate regression model. If the phenotype is binary, it will assume that the most frequent value is coding for control (and the other value for case), this can be changed with `alternate_control = True`:
@@ -556,26 +542,7 @@ Genal will print information regarding the number of individuals used in the tes
     CHR/POS columns present: SNPs searched based on genomic positions.
     Extracting SNPs for each chromosome...
     SNPs extracted for chr1.
-    SNPs extracted for chr2.
-    SNPs extracted for chr3.
-    SNPs extracted for chr4.
-    SNPs extracted for chr5.
-    SNPs extracted for chr6.
-    SNPs extracted for chr7.
-    SNPs extracted for chr8.
-    SNPs extracted for chr9.
-    SNPs extracted for chr10.
-    SNPs extracted for chr11.
-    SNPs extracted for chr12.
-    SNPs extracted for chr13.
-    SNPs extracted for chr14.
-    SNPs extracted for chr15.
-    SNPs extracted for chr16.
-    SNPs extracted for chr17.
-    SNPs extracted for chr18.
-    SNPs extracted for chr19.
-    SNPs extracted for chr20.
-    SNPs extracted for chr21.
+    ...
     SNPs extracted for chr22.
     Merging SNPs extracted from each chromosome...
     Created bed/bim/fam fileset with extracted SNPs: tmp_GENAL/e415aab3_allchr
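
The `odds = True` hunk above reports MR coefficients as odds ratios with confidence intervals. The diff does not show how genal computes them; the sketch below uses the standard conversion for a log-odds coefficient, OR = exp(b) with a Wald interval exp(b ± z·se), and should be read as an assumption about the arithmetic rather than the library's code. `beta_to_odds_ratio` is a hypothetical helper name.

```python
import math
from statistics import NormalDist


def beta_to_odds_ratio(beta, se, level=0.95):
    """Convert a log-odds coefficient and its standard error to (OR, lower, upper)."""
    # Two-sided normal quantile, about 1.96 for a 95% interval.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return (
        math.exp(beta),
        math.exp(beta - z * se),
        math.exp(beta + z * se),
    )


# Inverse-Variance Weighted row from the results table in the hunk above.
odds_ratio, ci_lower, ci_upper = beta_to_odds_ratio(0.023049, 0.001061)
```

The interval is symmetric on the log scale and therefore slightly asymmetric around the odds ratio itself.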

genal_python-1.2/docs/source/Images/Genal_flowchart.png ADDED

Binary file

genal_python-1.2/docs/source/Images/genal_logo.png ADDED

Binary file

{genal_python-1.0 → genal_python-1.2}/docs/source/conf.py RENAMED

@@ -13,7 +13,7 @@ sys.path.insert(0, os.path.abspath('../../'))
 project = 'genal'
 copyright = '2023, Cyprien A. Rivier'
 author = 'Cyprien A. Rivier'
-release = 'v1.0'
+release = 'v1.1'
 # -- General configuration ---------------------------------------------------

{genal_python-1.0 → genal_python-1.2}/docs/source/index.rst RENAMED

@@ -3,12 +3,16 @@
    You can adapt this file completely to your liking, but it should at least
    contain the root `toctree` directive.
+.. image:: Images/genal_logo.png
+   :alt: genal_logo
+   :width: 400px
 genal: A Python Toolkit for Genetic Risk Scoring and Mendelian Randomization
 ============================================================================
 :Author: Cyprien A. Rivier
 :Date: |today|
-:Version: "1.0"
+:Version: "1.2"
 Genal is a python module designed to make it easy to run genetic risk scores and mendelian randomization analyses. It integrates a collection of tools that facilitate the cleaning of single nucleotide polymorphism data (usually derived from Genome-Wide Association Studies) and enable the execution of key clinical population genetic workflows. The functionalities provided by genal include clumping, lifting, association testing, polygenic risk scoring, and Mendelian randomization analyses, all within a single Python module.
@@ -47,7 +51,7 @@ If you use genal in your work, please cite the following paper:
 .. [Rivier.2024] *Genal: A Python Toolkit for Genetic Risk Scoring and Mendelian Randomization*
    Cyprien A. Rivier, Santiago Clocchiatti-Tuozzo, Shufan Huo, Victor Torres-Lopez, Daniela Renedo, Kevin N. Sheth, Guido J. Falcone, Julian N. Acosta.
-   medRxiv. 2024 May `10.1101/2024.05.23.24307776 <https://doi.org/10.1101/2024.05.23.24307776>`_.
+   Bioinformatics Advances. 2024 December; `10.1093/bioadv/vbae207 <https://doi.org/10.1093/bioadv/vbae207>`_.
 References
 ----------

{genal_python-1.0 → genal_python-1.2}/docs/source/introduction.rst RENAMED

@@ -22,8 +22,8 @@ And import it in a python environment with:
     import genal
-The main genal functionalities require a working installation of PLINK v1.9 (and not 2.0 as certain functionalities have not been updated yet).
-If you have already installed plink v1.9, you can set the path to its executable with:
+The main genal functionalities require a working installation of PLINK v2.0.
+If you have already installed plink v2.0, you can set the path to its executable with:
 .. code-block:: python
@@ -39,6 +39,9 @@ If plink is not installed, genal can install the correct version for your system
 Tutorial
 ========
+.. image:: Images/Genal_flowchart.png
+   :alt: Genal_flowchart
 For the purpose of this tutorial, we are going to build a PRS of systolic blood pressure (SBP) and investigate the genetically-determined effect of SBP on the risk of stroke. We will use both summary statistics from Genome-Wide Association Studies (GWAS) and individual-level data from the UK Biobank as our test population. We are going to go through the following steps:
 Table of contents
@@ -185,8 +188,7 @@ By default, the reference panel used is the European (EUR) one. You can specify
     SBP_Geno.preprocess_data(preprocessing='Fill_delete', reference_panel="afr")
-You can also use a custom reference panel by specifying the path to bed/bim/fam files (without the extension) in the ``reference_panel`` argument.
+You can also use a custom reference panel by specifying to the reference_panel argument a path to bed/bim/fam (plink v1.9 format) or pgen/pvar/psam files (plink v2.0 format), without the extension.
 Clumping
 --------
@@ -221,7 +223,7 @@ Computing a Polygenic Risk Score (PRS) can be done in one line with the :meth:`~
     SBP_clumped.prs(name="SBP_prs", path="path/to/genetic/files")
-The genetic files of the target population can be either contained in one triple of bed/bim/fam files with information for all SNPs, or divided by chromosome (one bed/bim/fam triple for chr 1, another for chr 2, etc...). In the latter case, provide the path by replacing the chromosome number by ``$`` and genal will extract the necessary SNPs from each chromosome and merge them before running the PRS. For instance, if the genetic files are named ``Pop_chr1.bed``, ``Pop_chr1.bim``, ``Pop_chr1.fam``, ``Pop_chr2.bed``, ..., you can use:
+The genetic files of the target population can be either contained in one triple of bed/bim/fam or pgen/pvar/psam files with information for all SNPs, or divided by chromosome (one bed/bim/fam or pgen/pvar/psam triple for chr 1, another for chr 2, etc...). In the latter case, provide the path by replacing the chromosome number by `$` and genal will extract the necessary SNPs from each chromosome and merge them before running the PRS. For instance, if the genetic files are named `Pop_chr1.bed`, `Pop_chr1.bim`, `Pop_chr1.fam`, `Pop_chr2.bed`, ..., you can use:
 .. code-block:: python
@@ -279,26 +281,7 @@ and the output is::
     7(0.455%) duplicated SNPs have been removed. Use keep_dups=True to keep them.
     Extracting SNPs for each chromosome...
     SNPs extracted for chr1.
-    SNPs extracted for chr2.
-    SNPs extracted for chr3.
-    SNPs extracted for chr4.
-    SNPs extracted for chr5.
-    SNPs extracted for chr6.
-    SNPs extracted for chr7.
-    SNPs extracted for chr8.
-    SNPs extracted for chr9.
-    SNPs extracted for chr10.
-    SNPs extracted for chr11.
-    SNPs extracted for chr12.
-    SNPs extracted for chr13.
-    SNPs extracted for chr14.
-    SNPs extracted for chr15.
-    SNPs extracted for chr16.
-    SNPs extracted for chr17.
-    SNPs extracted for chr18.
-    SNPs extracted for chr19.
-    SNPs extracted for chr20.
-    SNPs extracted for chr21.
+    ...
     SNPs extracted for chr22.
     Merging SNPs extracted from each chromosome...
     Created bed/bim/fam fileset with extracted SNPs: tmp_GENAL/4f4ce6a7_allchr
@@ -451,9 +434,9 @@ If you want to visualize the obtained MR results, you can use the :meth:`~genal.
 .. image:: Images/MR_plot_SBP_AS.png
    :alt: MR plot
-You can select which MR methods you wish to plot with the ``methods`` argument. Note that for an MR method to be plotted, they must be included in the latest :meth:`~genal.Geno.MR` call of this :class:`~genal.Geno` instance.
+You can select which MR methods to plot with the ``methods`` argument. Note that for an MR method to be plotted, they must be included in the latest :meth:`~genal.Geno.MR` call of this :class:`~genal.Geno` instance.
-If you wish to include the heterogeneity values (Cochran's Q) in the results, you can use the heterogeneity argument in the :meth:`~genal.Geno.MR` call. Here, the heterogeneity for the inverse-variance weighted method:
+To include the heterogeneity values (Cochran's Q) in the results, you can use the heterogeneity argument in the :meth:`~genal.Geno.MR` call. Here, the heterogeneity for the inverse-variance weighted method:
 .. code-block:: python
@@ -468,8 +451,12 @@ And that will give:
     1      SBP  Stroke_eur            Egger Intercept  1499 -0.001381  0.000813  8.935529e-02  2959.965136  1497  1.253763e-98
     2      SBP  Stroke_eur  Inverse-Variance Weighted  1499  0.023049  0.001061  1.382645e-104 2965.678836  1498  4.280737e-99
+To display the coefficients as odds ratios with confidence intervals for a binary outcome trait, you can use the `odds = True` argument:
+.. code-block:: python
+    SBP_clumped.MR(action=2, methods=["Egger","IVW"], exposure_name="SBP", outcome_name="Stroke_eur", heterogeneity=True, odds=True)
 As expected, many MR methods indicate that SBP is strongly associated with stroke, but there could be concerns for horizontal pleiotropy (instruments influencing the outcome through a different pathway than the one used as exposure) given the almost significant MR-Egger intercept p-value.
 To investigate horizontal pleiotropy in more detail, a very useful method is Mendelian Randomization Pleiotropy RESidual Sum and Outlier (MR-PRESSO).
@@ -506,7 +493,7 @@ Let's start by loading phenotypic data:
     df_pheno = pd.read_csv("path/to/trait/data")
 .. note::
-   One important point is to make sure that the IDs of the participants are identical in the phenotypic data and in the genetic data.
+   One important point is to make sure that both the Family IDs (FID) and Individual IDs (IID) of the participants are identical in the phenotypic data and in the genetic data.
 Then, it is advised to make a copy of the :class:`~genal.Geno` instance containing our instruments as we are going to update their coefficients and to avoid any confusion:
@@ -518,7 +505,7 @@ We can then call the :meth:`~genal.Geno.set_phenotype` method, specifying which
 .. code-block:: python
-    SBP_adjusted.set_phenotype(df_pheno, PHENO="htn", IID="IID")
+    SBP_adjusted.set_phenotype(df_pheno, PHENO="htn", IID="IID", FID="FID")
 At this point, genal will identify if the phenotype is binary or quantitative in order to choose the appropriate regression model. If the phenotype is binary, it will assume that the most frequent value is coding for control (and the other value for case), this can be changed with ``alternate_control=True``::
@@ -537,26 +524,7 @@ Genal will print information regarding the number of individuals used in the tes
     CHR/POS columns present: SNPs searched based on genomic positions.
     Extracting SNPs for each chromosome...
     SNPs extracted for chr1.
-    SNPs extracted for chr2.
-    SNPs extracted for chr3.
-    SNPs extracted for chr4.
-    SNPs extracted for chr5.
-    SNPs extracted for chr6.
-    SNPs extracted for chr7.
-    SNPs extracted for chr8.
-    SNPs extracted for chr9.
-    SNPs extracted for chr10.
-    SNPs extracted for chr11.
-    SNPs extracted for chr12.
-    SNPs extracted for chr13.
-    SNPs extracted for chr14.
-    SNPs extracted for chr15.
-    SNPs extracted for chr16.
-    SNPs extracted for chr17.
-    SNPs extracted for chr18.
-    SNPs extracted for chr19.
-    SNPs extracted for chr20.
-    SNPs extracted for chr21.
+    ...
     SNPs extracted for chr22.
     Merging SNPs extracted from each chromosome...
     Created bed/bim/fam fileset with extracted SNPs: tmp_GENAL/e415aab3_allchr
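
The note changed in the hunk above asks that both Family IDs (FID) and Individual IDs (IID) agree between the phenotypic and genetic data. A quick pre-check along those lines could look like the sketch below. It is not part of genal: `compare_sample_ids` is a hypothetical helper, the phenotype rows are assumed to be plain dictionaries, and IDs are compared as strings so that `1` and `"1"` count as the same sample.

```python
def compare_sample_ids(pheno_rows, genetic_ids, fid_col="FID", iid_col="IID"):
    """Compare (FID, IID) pairs between phenotype rows and genetic samples.

    pheno_rows: iterable of dict-like rows holding the FID and IID columns.
    genetic_ids: iterable of (FID, IID) pairs, e.g. read from a .fam or .psam file.
    """
    pheno_ids = {(str(row[fid_col]), str(row[iid_col])) for row in pheno_rows}
    geno_ids = {(str(fid), str(iid)) for fid, iid in genetic_ids}
    return {
        "matched": sorted(pheno_ids & geno_ids),
        "pheno_only": sorted(pheno_ids - geno_ids),
        "genetic_only": sorted(geno_ids - pheno_ids),
    }
```

A sample whose IID matches but whose FID differs ends up in both `pheno_only` and `genetic_only`, which is the case the updated note warns about.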
