genal-python 1.0__tar.gz → 1.2__tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- genal_python-1.2/Genal_flowchart.png +0 -0
- {genal_python-1.0 → genal_python-1.2}/PKG-INFO +25 -58
- {genal_python-1.0 → genal_python-1.2}/README.md +24 -57
- genal_python-1.2/docs/source/Images/Genal_flowchart.png +0 -0
- genal_python-1.2/docs/source/Images/genal_logo.png +0 -0
- {genal_python-1.0 → genal_python-1.2}/docs/source/conf.py +1 -1
- {genal_python-1.0 → genal_python-1.2}/docs/source/index.rst +6 -2
- {genal_python-1.0 → genal_python-1.2}/docs/source/introduction.rst +18 -50
- {genal_python-1.0 → genal_python-1.2}/genal/Geno.py +309 -215
- {genal_python-1.0 → genal_python-1.2}/genal/MR.py +1 -5
- {genal_python-1.0 → genal_python-1.2}/genal/MR_tools.py +170 -155
- {genal_python-1.0 → genal_python-1.2}/genal/MRpresso.py +20 -18
- genal_python-1.2/genal/__init__.py +19 -0
- {genal_python-1.0 → genal_python-1.2}/genal/association.py +173 -115
- {genal_python-1.0 → genal_python-1.2}/genal/clump.py +42 -21
- {genal_python-1.0 → genal_python-1.2}/genal/constants.py +3 -0
- {genal_python-1.0 → genal_python-1.2}/genal/extract_prs.py +157 -136
- {genal_python-1.0 → genal_python-1.2}/genal/geno_tools.py +21 -11
- {genal_python-1.0 → genal_python-1.2}/genal/proxy.py +123 -54
- {genal_python-1.0 → genal_python-1.2}/genal/tools.py +211 -137
- genal_python-1.2/genal_logo.png +0 -0
- {genal_python-1.0 → genal_python-1.2}/pyproject.toml +1 -1
- genal_python-1.0/Genal_flowchart.png +0 -0
- genal_python-1.0/genal/__init__.py +0 -20
- genal_python-1.0/genal_logo.png +0 -0
- {genal_python-1.0 → genal_python-1.2}/.DS_Store +0 -0
- {genal_python-1.0 → genal_python-1.2}/.gitignore +0 -0
- {genal_python-1.0 → genal_python-1.2}/.readthedocs.yaml +0 -0
- {genal_python-1.0 → genal_python-1.2}/LICENSE +0 -0
- {genal_python-1.0 → genal_python-1.2}/docs/.DS_Store +0 -0
- {genal_python-1.0 → genal_python-1.2}/docs/Makefile +0 -0
- {genal_python-1.0 → genal_python-1.2}/docs/build/.DS_Store +0 -0
- {genal_python-1.0 → genal_python-1.2}/docs/build/.buildinfo +0 -0
- {genal_python-1.0 → genal_python-1.2}/docs/build/.doctrees/api.doctree +0 -0
- {genal_python-1.0 → genal_python-1.2}/docs/build/.doctrees/environment.pickle +0 -0
- {genal_python-1.0 → genal_python-1.2}/docs/build/.doctrees/genal.doctree +0 -0
- {genal_python-1.0 → genal_python-1.2}/docs/build/.doctrees/index.doctree +0 -0
- {genal_python-1.0 → genal_python-1.2}/docs/build/.doctrees/introduction.doctree +0 -0
- {genal_python-1.0 → genal_python-1.2}/docs/build/.doctrees/modules.doctree +0 -0
- {genal_python-1.0 → genal_python-1.2}/docs/build/_images/MR_plot_SBP_AS.png +0 -0
- {genal_python-1.0 → genal_python-1.2}/docs/build/_modules/genal/Geno.html +0 -0
- {genal_python-1.0 → genal_python-1.2}/docs/build/_modules/genal/MR.html +0 -0
- {genal_python-1.0 → genal_python-1.2}/docs/build/_modules/genal/MR_tools.html +0 -0
- {genal_python-1.0 → genal_python-1.2}/docs/build/_modules/genal/MRpresso.html +0 -0
- {genal_python-1.0 → genal_python-1.2}/docs/build/_modules/genal/association.html +0 -0
- {genal_python-1.0 → genal_python-1.2}/docs/build/_modules/genal/clump.html +0 -0
- {genal_python-1.0 → genal_python-1.2}/docs/build/_modules/genal/extract_prs.html +0 -0
- {genal_python-1.0 → genal_python-1.2}/docs/build/_modules/genal/geno_tools.html +0 -0
- {genal_python-1.0 → genal_python-1.2}/docs/build/_modules/genal/lift.html +0 -0
- {genal_python-1.0 → genal_python-1.2}/docs/build/_modules/genal/proxy.html +0 -0
- {genal_python-1.0 → genal_python-1.2}/docs/build/_modules/genal/snp_query.html +0 -0
- {genal_python-1.0 → genal_python-1.2}/docs/build/_modules/genal/tools.html +0 -0
- {genal_python-1.0 → genal_python-1.2}/docs/build/_modules/index.html +0 -0
- {genal_python-1.0 → genal_python-1.2}/docs/build/_sources/api.rst.txt +0 -0
- {genal_python-1.0 → genal_python-1.2}/docs/build/_sources/genal.rst.txt +0 -0
- {genal_python-1.0 → genal_python-1.2}/docs/build/_sources/index.rst.txt +0 -0
- {genal_python-1.0 → genal_python-1.2}/docs/build/_sources/introduction.rst.txt +0 -0
- {genal_python-1.0 → genal_python-1.2}/docs/build/_sources/modules.rst.txt +0 -0
- {genal_python-1.0 → genal_python-1.2}/docs/build/_static/basic.css +0 -0
- {genal_python-1.0 → genal_python-1.2}/docs/build/_static/css/badge_only.css +0 -0
- {genal_python-1.0 → genal_python-1.2}/docs/build/_static/css/fonts/Roboto-Slab-Bold.woff +0 -0
- {genal_python-1.0 → genal_python-1.2}/docs/build/_static/css/fonts/Roboto-Slab-Bold.woff2 +0 -0
- {genal_python-1.0 → genal_python-1.2}/docs/build/_static/css/fonts/Roboto-Slab-Regular.woff +0 -0
- {genal_python-1.0 → genal_python-1.2}/docs/build/_static/css/fonts/Roboto-Slab-Regular.woff2 +0 -0
- {genal_python-1.0 → genal_python-1.2}/docs/build/_static/css/fonts/fontawesome-webfont.eot +0 -0
- {genal_python-1.0 → genal_python-1.2}/docs/build/_static/css/fonts/fontawesome-webfont.svg +0 -0
- {genal_python-1.0 → genal_python-1.2}/docs/build/_static/css/fonts/fontawesome-webfont.ttf +0 -0
- {genal_python-1.0 → genal_python-1.2}/docs/build/_static/css/fonts/fontawesome-webfont.woff +0 -0
- {genal_python-1.0 → genal_python-1.2}/docs/build/_static/css/fonts/fontawesome-webfont.woff2 +0 -0
- {genal_python-1.0 → genal_python-1.2}/docs/build/_static/css/fonts/lato-bold-italic.woff +0 -0
- {genal_python-1.0 → genal_python-1.2}/docs/build/_static/css/fonts/lato-bold-italic.woff2 +0 -0
- {genal_python-1.0 → genal_python-1.2}/docs/build/_static/css/fonts/lato-bold.woff +0 -0
- {genal_python-1.0 → genal_python-1.2}/docs/build/_static/css/fonts/lato-bold.woff2 +0 -0
- {genal_python-1.0 → genal_python-1.2}/docs/build/_static/css/fonts/lato-normal-italic.woff +0 -0
- {genal_python-1.0 → genal_python-1.2}/docs/build/_static/css/fonts/lato-normal-italic.woff2 +0 -0
- {genal_python-1.0 → genal_python-1.2}/docs/build/_static/css/fonts/lato-normal.woff +0 -0
- {genal_python-1.0 → genal_python-1.2}/docs/build/_static/css/fonts/lato-normal.woff2 +0 -0
- {genal_python-1.0 → genal_python-1.2}/docs/build/_static/css/theme.css +0 -0
- {genal_python-1.0 → genal_python-1.2}/docs/build/_static/doctools.js +0 -0
- {genal_python-1.0 → genal_python-1.2}/docs/build/_static/documentation_options.js +0 -0
- {genal_python-1.0 → genal_python-1.2}/docs/build/_static/file.png +0 -0
- {genal_python-1.0 → genal_python-1.2}/docs/build/_static/js/badge_only.js +0 -0
- {genal_python-1.0 → genal_python-1.2}/docs/build/_static/js/html5shiv-printshiv.min.js +0 -0
- {genal_python-1.0 → genal_python-1.2}/docs/build/_static/js/html5shiv.min.js +0 -0
- {genal_python-1.0 → genal_python-1.2}/docs/build/_static/js/theme.js +0 -0
- {genal_python-1.0 → genal_python-1.2}/docs/build/_static/language_data.js +0 -0
- {genal_python-1.0 → genal_python-1.2}/docs/build/_static/minus.png +0 -0
- {genal_python-1.0 → genal_python-1.2}/docs/build/_static/plus.png +0 -0
- {genal_python-1.0 → genal_python-1.2}/docs/build/_static/pygments.css +0 -0
- {genal_python-1.0 → genal_python-1.2}/docs/build/_static/searchtools.js +0 -0
- {genal_python-1.0 → genal_python-1.2}/docs/build/_static/sphinx_highlight.js +0 -0
- {genal_python-1.0 → genal_python-1.2}/docs/build/api.html +0 -0
- {genal_python-1.0 → genal_python-1.2}/docs/build/genal.html +0 -0
- {genal_python-1.0 → genal_python-1.2}/docs/build/genindex.html +0 -0
- {genal_python-1.0 → genal_python-1.2}/docs/build/index.html +0 -0
- {genal_python-1.0 → genal_python-1.2}/docs/build/introduction.html +0 -0
- {genal_python-1.0 → genal_python-1.2}/docs/build/modules.html +0 -0
- {genal_python-1.0 → genal_python-1.2}/docs/build/objects.inv +0 -0
- {genal_python-1.0 → genal_python-1.2}/docs/build/py-modindex.html +0 -0
- {genal_python-1.0 → genal_python-1.2}/docs/build/search.html +0 -0
- {genal_python-1.0 → genal_python-1.2}/docs/build/searchindex.js +0 -0
- {genal_python-1.0 → genal_python-1.2}/docs/make.bat +0 -0
- {genal_python-1.0 → genal_python-1.2}/docs/requirements.txt +0 -0
- {genal_python-1.0 → genal_python-1.2}/docs/source/.DS_Store +0 -0
- {genal_python-1.0 → genal_python-1.2}/docs/source/Images/MR_plot_SBP_AS.png +0 -0
- {genal_python-1.0 → genal_python-1.2}/docs/source/api.rst +0 -0
- {genal_python-1.0 → genal_python-1.2}/docs/source/modules.rst +0 -0
- {genal_python-1.0 → genal_python-1.2}/genal/lift.py +0 -0
- {genal_python-1.0 → genal_python-1.2}/genal/snp_query.py +0 -0
- {genal_python-1.0 → genal_python-1.2}/gitignore +0 -0
- {genal_python-1.0 → genal_python-1.2}/readthedocs.yaml +0 -0
Binary file
@@ -1,6 +1,6 @@
 Metadata-Version: 2.3
 Name: genal-python
-Version: 1.
+Version: 1.2
 Summary: A python toolkit for polygenic risk scoring and mendelian randomization.
 Author-email: Cyprien Rivier <riviercyprien@gmail.com>
 Requires-Python: >=3.8
@@ -29,10 +29,6 @@ Project-URL: Home, https://github.com/CypRiv/genal
 <center><h1> genal: A Python Toolkit for Genetic Risk Scoring and Mendelian Randomization </h1></center>
 
 
-
-**This project was developed by Cyprien A. Rivier**
-
-
 # Table of contents
 1. [Introduction](#introduction)
 2. [Citation](#citation)
@@ -62,8 +58,12 @@ Genal draws on concepts from well-established R packages such as TwoSampleMR, MR
 
 Genal flowchart. Created in https://www.BioRender.com
 ## Citation <a name="citation"></a>
-
-
+This project was developed by Cyprien A. Rivier.
+If you're using genal, please cite the following paper:
+**Genal: A Python Toolkit for Genetic Risk Scoring and Mendelian Randomization.**
+Cyprien A. Rivier, Santiago Clocchiatti-Tuozzo, Shufan Huo, Victor Torres-Lopez, Daniela Renedo, Kevin N. Sheth, Guido J. Falcone, Julian N. Acosta.
+Bioinformatics Advances 2024.
+doi: https://doi.org/10.1093/bioadv/vbae207
 
 ## Requirements for the genal module <a name="paragraph1"></a>
 ***Python 3.8 or later***. https://www.python.org/ <br>
@@ -91,8 +91,8 @@ And import it in a python environment with:
 import genal
 ```
 
-The main genal functionalities require a working installation of PLINK
-If you have already installed plink
+The main genal functionalities require a working installation of PLINK v2.0.
+If you have already installed plink v2.0, you can set the path to its executable with:
 
 ```
 genal.set_plink(path="/path/to/plink/executable/file")
@@ -256,7 +256,7 @@ You do not need to obtain the 1000 genome reference panel yourself, genal will d
 SBP_Geno.preprocess_data(preprocessing = 'Fill_delete', reference_panel = "afr")
 ```
 
-You can also use a custom reference panel by specifying to the reference_panel argument a path to bed/bim/fam files (without the extension
+You can also use a custom reference panel by specifying to the reference_panel argument a path to bed/bim/fam (plink v1.9 format) or pgen/pvar/psam files (plink v2.0 format), without the extension.
 
 ### Clumping <a name="paragraph3.3"></a>
 
@@ -289,7 +289,7 @@ Computing a Polygenic Risk Score (PRS) can be done in one line with the `genal.G
 SBP_clumped.prs(name = "SBP_prs", path = "path/to/genetic/files")
 ```
 
-The genetic files of the target population can be either contained in one triple of bed/bim/fam files with information for all SNPs, or divided by chromosome (one bed/bim/fam triple for chr 1, another for chr 2, etc...). In the latter case, provide the path by replacing the chromosome number by `$` and genal will extract the necessary SNPs from each chromosome and merge them before running the PRS. For instance, if the genetic files are named `Pop_chr1.bed`, `Pop_chr1.bim`, `Pop_chr1.fam`, `Pop_chr2.bed`, ..., you can use:
+The genetic files of the target population can be either contained in one triple of bed/bim/fam or pgen/pvar/psam files with information for all SNPs, or divided by chromosome (one bed/bim/fam or pgen/pvar/psam triple for chr 1, another for chr 2, etc...). In the latter case, provide the path by replacing the chromosome number by `$` and genal will extract the necessary SNPs from each chromosome and merge them before running the PRS. For instance, if the genetic files are named `Pop_chr1.bed`, `Pop_chr1.bim`, `Pop_chr1.fam`, `Pop_chr2.bed`, ..., you can use:
 
 ```python
 SBP_clumped.prs(name = "SBP_prs", path = "Pop_chr$")
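As an illustration of the `$` convention described in the hunk above, the following sketch expands a path template into per-chromosome prefixes and reports which fileset format is present for each prefix. It is not genal's implementation: the helper names `expand_chromosome_paths` and `detect_fileset_format` are hypothetical, and scanning chromosomes 1 to 22 is an assumption based on the log output quoted in this README.

```python
from pathlib import Path

# File extensions for the two PLINK fileset formats named in the README.
BED_EXTENSIONS = ("bed", "bim", "fam")       # plink v1.9 format
PGEN_EXTENSIONS = ("pgen", "pvar", "psam")   # plink v2.0 format


def expand_chromosome_paths(template, chromosomes=range(1, 23)):
    """Replace the `$` placeholder with each chromosome number.

    A template without `$` is treated as a single fileset prefix.
    """
    if "$" not in template:
        return [template]
    return [template.replace("$", str(chrom)) for chrom in chromosomes]


def detect_fileset_format(prefix):
    """Return "bed", "pgen", or None depending on which complete triple exists."""
    if all(Path(f"{prefix}.{ext}").exists() for ext in BED_EXTENSIONS):
        return "bed"
    if all(Path(f"{prefix}.{ext}").exists() for ext in PGEN_EXTENSIONS):
        return "pgen"
    return None


# Hypothetical usage with the file naming from the README example.
prefixes = expand_chromosome_paths("Pop_chr$")
for prefix in prefixes:
    print(prefix, detect_fileset_format(prefix))
```

Checking for a complete triple per prefix, rather than a single file, makes a partially copied fileset show up as `None` instead of failing later inside PLINK.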
@@ -347,26 +347,7 @@ and the output is:
 7(0.455%) duplicated SNPs have been removed. Use keep_dups=True to keep them.
 Extracting SNPs for each chromosome...
 SNPs extracted for chr1.
-
-SNPs extracted for chr3.
-SNPs extracted for chr4.
-SNPs extracted for chr5.
-SNPs extracted for chr6.
-SNPs extracted for chr7.
-SNPs extracted for chr8.
-SNPs extracted for chr9.
-SNPs extracted for chr10.
-SNPs extracted for chr11.
-SNPs extracted for chr12.
-SNPs extracted for chr13.
-SNPs extracted for chr14.
-SNPs extracted for chr15.
-SNPs extracted for chr16.
-SNPs extracted for chr17.
-SNPs extracted for chr18.
-SNPs extracted for chr19.
-SNPs extracted for chr20.
-SNPs extracted for chr21.
+...
 SNPs extracted for chr22.
 Merging SNPs extracted from each chromosome...
 Created bed/bim/fam fileset with extracted SNPs: tmp_GENAL/4f4ce6a7_allchr
@@ -435,7 +416,6 @@ Genal will print how many SNPs were successfully found and extracted from the ou
 
 
 > **Note:**
->Here as well you have the option to use proxies for the instruments that are not present in the outcome data:
 >
 > Here as well you have the option to use proxies for the instruments that are not present in the outcome data:
 >
@@ -479,7 +459,7 @@ You can specify several arguments. We refer to the API for a full list, but the
 - `action = 2`: Uses effect allele frequencies to attempt to flip them (conservative, default)
 - `action = 3`: Removes all palindromic SNPs (very conservative)
 
-
+When choosing the option 2 or 3 (recommended), genal will print the list of palindromic SNPs that have been removed from the analysis.
 
 By default, only some MR methods (inverse-variance weighted, weighted median, Simple mode, MR-Egger) are going to be run. But if you wish to run a different set of MR methods, you can pass a list of strings to the `methods` argument. The possible strings are:
 - `IVW` for the classical Inverse-Variance Weighted method with random effects
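For context on the palindromic SNPs mentioned in the hunk above: a SNP is palindromic when its two alleles are complementary bases (A/T or C/G), so the strand cannot be inferred from the alleles alone. The sketch below shows that check in isolation. `is_palindromic` is a hypothetical helper, not a genal function, and the allele-frequency logic genal applies under `action = 2` is not reproduced here.

```python
# Watson-Crick complements for single-nucleotide alleles.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def is_palindromic(effect_allele, other_allele):
    """Return True when the two alleles are complementary bases (A/T or C/G)."""
    effect = effect_allele.upper()
    other = other_allele.upper()
    # Multi-base alleles (indels) are not in COMPLEMENT and return False.
    return COMPLEMENT.get(effect) == other


# Made-up allele pairs for illustration.
for pair in [("A", "T"), ("C", "G"), ("A", "G")]:
    print(pair, is_palindromic(*pair))
```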
@@ -496,7 +476,7 @@ By default, only some MR methods (inverse-variance weighted, weighted median, Si
 - `Weighted-mode` for the Weighted mode method
 - `all` to run all the above methods
 
-For more fine-tuning, such as settings for the number of boostrapping iterations, please refer to the API: [https://genal.readthedocs.io/en/latest/modules.html#id4](MR method).
+For more fine-tuning, such as settings for the number of boostrapping iterations, please refer to the API: [https://genal.readthedocs.io/en/latest/modules.html#id4] (MR method).
 
 If you want to visualize the obtained MR results, you can use the `genal.Geno.MR_plot` method that will plot each SNP in an `effect_on_exposure x effect_on_outcome` plane as well as lines corresponding to different MR methods:
 
@@ -505,9 +485,9 @@ SBP_clumped.MR_plot(filename="MR_plot_SBP_AS")
 ```
 
 
-You can select which MR methods
+You can select which MR methods to plot with the `methods` argument. Note that for an MR method to be plotted, they must be included in the latest `genal.Geno.MR` call of this `genal.Geno` instance.
 
-
+To include the heterogeneity values (Cochran's Q) in the results, you can use the heterogeneity argument in the `genal.Geno.MR` call. Here, the heterogeneity for the inverse-variance weighted method:
 
 ```python
 SBP_clumped.MR(action = 2, methods = ["Egger","IVW"], exposure_name = "SBP", outcome_name = "Stroke_eur", heterogeneity = True)
@@ -520,6 +500,12 @@ And that will give:
 | SBP | Stroke_eur | Egger Intercept | 1499 | -0.001381 | 0.000813 | 8.935529e-02 | 2959.965136 | 1497 | 1.253763e-98 |
 | SBP | Stroke_eur | Inverse-Variance Weighted| 1499 | 0.023049 | 0.001061 | 1.382645e-104 | 2965.678836 | 1498 | 4.280737e-99 |
 
+To display the coefficients as odds ratios with confidence intervals for a binary outcome trait, you can use the `odds = True` argument:
+
+```python
+SBP_clumped.MR(action = 2, methods = ["Egger","IVW"], exposure_name = "SBP", outcome_name = "Stroke_eur", heterogeneity = True, odds = True)
+```
+
 As expected, many MR methods indicate that SBP is strongly associated with stroke, but there could be concerns for horizontal pleiotropy (instruments influencing the outcome through a different pathway than the one used as exposure) given the almost significant MR-Egger intercept p-value.
 To investigate horizontal pleiotropy in more details, a very useful method is Mendelian Randomization Pleiotropy RESidual Sum and Outlier (MR-PRESSO). MR-PRESSO is a method designed to detect and correct for horizontal pleiotropy. It will identify which instruments are likely to be pleiotropic on their effect on the outcome, and it will rerun an inverse-variance weighted MR after excluding them. It can be run using the `genal.Geno.MRpresso` method:
 
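The `odds = True` option added in the hunk above reports coefficients as odds ratios with confidence intervals. For a log-odds coefficient, the usual conversion is plain exponentiation, sketched below on the inverse-variance weighted row of the table (b = 0.023049, se = 0.001061). This assumes a 95% Wald interval with z = 1.96; the function name `to_odds_ratio` is hypothetical, and genal's own computation and output columns may differ.

```python
import math


def to_odds_ratio(beta, se, z=1.96):
    """Convert a log-odds coefficient and its standard error to (OR, lower, upper)."""
    odds_ratio = math.exp(beta)
    lower = math.exp(beta - z * se)   # lower bound of the Wald interval
    upper = math.exp(beta + z * se)   # upper bound of the Wald interval
    return odds_ratio, lower, upper


# Values taken from the inverse-variance weighted row shown above.
or_, lower, upper = to_odds_ratio(0.023049, 0.001061)
print(f"OR = {or_:.4f} (95% CI {lower:.4f} to {upper:.4f})")
```

The interval is computed on the log scale and then exponentiated, which is why it is not symmetric around the odds ratio.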
@@ -549,7 +535,7 @@ df_pheno = pd.read_csv("path/to/trait/data")
 
 > **Note:**
 >
-> One important point is to make sure that the IDs of the participants are identical in the phenotypic data and in the genetic data.
+> One important point is to make sure that both the Family IDs (FID) and Individual IDs (IID) of the participants are identical in the phenotypic data and in the genetic data.
 
 Then, it is advised to make a copy of the `genal.Geno` instance containing our instruments as we are going to update their coefficients and to avoid any confusion:
 
@@ -560,7 +546,7 @@ SBP_adjusted = SBP_clumped.copy()
 We can then call the `genal.Geno.set_phenotype` method, specifying which column contains our trait of interest (for the association testing) and which column contains the individual IDs:
 
 ```python
-SBP_adjusted.set_phenotype(df_pheno, PHENO = "htn", IID = "IID")
+SBP_adjusted.set_phenotype(df_pheno, PHENO = "htn", IID = "IID", FID = "FID")
 ```
 
 At this point, genal will identify if the phenotype is binary or quantitative in order to choose the appropriate regression model. If the phenotype is binary, it will assume that the most frequent value is coding for control (and the other value for case), this can be changed with `alternate_control = True`:
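Because the updated note requires both FID and IID to match between the phenotype table and the genetic data, a quick overlap check before calling `set_phenotype` can catch mismatches early. The sketch below compares (FID, IID) pairs as strings, since IDs parsed as integers on one side and as strings on the other are a common cause of silent mismatches. `id_overlap` and the sample IDs are hypothetical and purely illustrative; this is not how genal performs the matching.

```python
def id_overlap(pheno_ids, genetic_ids):
    """Compare (FID, IID) pairs from two sources after casting both to strings.

    Returns (shared, only_in_pheno, only_in_genetic) as sets of string pairs.
    """
    pheno = {(str(fid), str(iid)) for fid, iid in pheno_ids}
    genetic = {(str(fid), str(iid)) for fid, iid in genetic_ids}
    return pheno & genetic, pheno - genetic, genetic - pheno


# Made-up IDs: integers on the phenotype side, strings on the genetic side.
pheno_ids = [(1, 101), (2, 102), (3, 103)]
genetic_ids = [("1", "101"), ("2", "102"), ("4", "104")]

shared, only_pheno, only_genetic = id_overlap(pheno_ids, genetic_ids)
print(f"{len(shared)} shared, {len(only_pheno)} phenotype-only, {len(only_genetic)} genetic-only")
```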
@@ -580,26 +566,7 @@ Genal will print information regarding the number of individuals used in the tes
 CHR/POS columns present: SNPs searched based on genomic positions.
 Extracting SNPs for each chromosome...
 SNPs extracted for chr1.
-
-SNPs extracted for chr3.
-SNPs extracted for chr4.
-SNPs extracted for chr5.
-SNPs extracted for chr6.
-SNPs extracted for chr7.
-SNPs extracted for chr8.
-SNPs extracted for chr9.
-SNPs extracted for chr10.
-SNPs extracted for chr11.
-SNPs extracted for chr12.
-SNPs extracted for chr13.
-SNPs extracted for chr14.
-SNPs extracted for chr15.
-SNPs extracted for chr16.
-SNPs extracted for chr17.
-SNPs extracted for chr18.
-SNPs extracted for chr19.
-SNPs extracted for chr20.
-SNPs extracted for chr21.
+...
 SNPs extracted for chr22.
 Merging SNPs extracted from each chromosome...
 Created bed/bim/fam fileset with extracted SNPs: tmp_GENAL/e415aab3_allchr
@@ -5,10 +5,6 @@
 <center><h1> genal: A Python Toolkit for Genetic Risk Scoring and Mendelian Randomization </h1></center>
 
 
-
-**This project was developed by Cyprien A. Rivier**
-
-
 # Table of contents
 1. [Introduction](#introduction)
 2. [Citation](#citation)
@@ -38,8 +34,12 @@ Genal draws on concepts from well-established R packages such as TwoSampleMR, MR
 
 Genal flowchart. Created in https://www.BioRender.com
 ## Citation <a name="citation"></a>
-
-
+This project was developed by Cyprien A. Rivier.
+If you're using genal, please cite the following paper:
+**Genal: A Python Toolkit for Genetic Risk Scoring and Mendelian Randomization.**
+Cyprien A. Rivier, Santiago Clocchiatti-Tuozzo, Shufan Huo, Victor Torres-Lopez, Daniela Renedo, Kevin N. Sheth, Guido J. Falcone, Julian N. Acosta.
+Bioinformatics Advances 2024.
+doi: https://doi.org/10.1093/bioadv/vbae207
 
 ## Requirements for the genal module <a name="paragraph1"></a>
 ***Python 3.8 or later***. https://www.python.org/ <br>
@@ -67,8 +67,8 @@ And import it in a python environment with:
 import genal
 ```
 
-The main genal functionalities require a working installation of PLINK
-If you have already installed plink
+The main genal functionalities require a working installation of PLINK v2.0.
+If you have already installed plink v2.0, you can set the path to its executable with:
 
 ```
 genal.set_plink(path="/path/to/plink/executable/file")
@@ -232,7 +232,7 @@ You do not need to obtain the 1000 genome reference panel yourself, genal will d
 SBP_Geno.preprocess_data(preprocessing = 'Fill_delete', reference_panel = "afr")
 ```
 
-You can also use a custom reference panel by specifying to the reference_panel argument a path to bed/bim/fam files (without the extension
+You can also use a custom reference panel by specifying to the reference_panel argument a path to bed/bim/fam (plink v1.9 format) or pgen/pvar/psam files (plink v2.0 format), without the extension.
 
 ### Clumping <a name="paragraph3.3"></a>
 
@@ -265,7 +265,7 @@ Computing a Polygenic Risk Score (PRS) can be done in one line with the `genal.G
 SBP_clumped.prs(name = "SBP_prs", path = "path/to/genetic/files")
 ```
 
-The genetic files of the target population can be either contained in one triple of bed/bim/fam files with information for all SNPs, or divided by chromosome (one bed/bim/fam triple for chr 1, another for chr 2, etc...). In the latter case, provide the path by replacing the chromosome number by `$` and genal will extract the necessary SNPs from each chromosome and merge them before running the PRS. For instance, if the genetic files are named `Pop_chr1.bed`, `Pop_chr1.bim`, `Pop_chr1.fam`, `Pop_chr2.bed`, ..., you can use:
+The genetic files of the target population can be either contained in one triple of bed/bim/fam or pgen/pvar/psam files with information for all SNPs, or divided by chromosome (one bed/bim/fam or pgen/pvar/psam triple for chr 1, another for chr 2, etc...). In the latter case, provide the path by replacing the chromosome number by `$` and genal will extract the necessary SNPs from each chromosome and merge them before running the PRS. For instance, if the genetic files are named `Pop_chr1.bed`, `Pop_chr1.bim`, `Pop_chr1.fam`, `Pop_chr2.bed`, ..., you can use:
 
 ```python
 SBP_clumped.prs(name = "SBP_prs", path = "Pop_chr$")
@@ -323,26 +323,7 @@ and the output is:
 7(0.455%) duplicated SNPs have been removed. Use keep_dups=True to keep them.
 Extracting SNPs for each chromosome...
 SNPs extracted for chr1.
-
-SNPs extracted for chr3.
-SNPs extracted for chr4.
-SNPs extracted for chr5.
-SNPs extracted for chr6.
-SNPs extracted for chr7.
-SNPs extracted for chr8.
-SNPs extracted for chr9.
-SNPs extracted for chr10.
-SNPs extracted for chr11.
-SNPs extracted for chr12.
-SNPs extracted for chr13.
-SNPs extracted for chr14.
-SNPs extracted for chr15.
-SNPs extracted for chr16.
-SNPs extracted for chr17.
-SNPs extracted for chr18.
-SNPs extracted for chr19.
-SNPs extracted for chr20.
-SNPs extracted for chr21.
+...
 SNPs extracted for chr22.
 Merging SNPs extracted from each chromosome...
 Created bed/bim/fam fileset with extracted SNPs: tmp_GENAL/4f4ce6a7_allchr
@@ -411,7 +392,6 @@ Genal will print how many SNPs were successfully found and extracted from the ou
 
 
 > **Note:**
->Here as well you have the option to use proxies for the instruments that are not present in the outcome data:
 >
 > Here as well you have the option to use proxies for the instruments that are not present in the outcome data:
 >
@@ -455,7 +435,7 @@ You can specify several arguments. We refer to the API for a full list, but the
 - `action = 2`: Uses effect allele frequencies to attempt to flip them (conservative, default)
 - `action = 3`: Removes all palindromic SNPs (very conservative)
 
-
+When choosing the option 2 or 3 (recommended), genal will print the list of palindromic SNPs that have been removed from the analysis.
 
 By default, only some MR methods (inverse-variance weighted, weighted median, Simple mode, MR-Egger) are going to be run. But if you wish to run a different set of MR methods, you can pass a list of strings to the `methods` argument. The possible strings are:
 - `IVW` for the classical Inverse-Variance Weighted method with random effects
@@ -472,7 +452,7 @@ By default, only some MR methods (inverse-variance weighted, weighted median, Si
 - `Weighted-mode` for the Weighted mode method
 - `all` to run all the above methods
 
-For more fine-tuning, such as settings for the number of boostrapping iterations, please refer to the API: [https://genal.readthedocs.io/en/latest/modules.html#id4](MR method).
+For more fine-tuning, such as settings for the number of boostrapping iterations, please refer to the API: [https://genal.readthedocs.io/en/latest/modules.html#id4] (MR method).
 
 If you want to visualize the obtained MR results, you can use the `genal.Geno.MR_plot` method that will plot each SNP in an `effect_on_exposure x effect_on_outcome` plane as well as lines corresponding to different MR methods:
 
@@ -481,9 +461,9 @@ SBP_clumped.MR_plot(filename="MR_plot_SBP_AS")
 ```
 
 
-You can select which MR methods
+You can select which MR methods to plot with the `methods` argument. Note that for an MR method to be plotted, they must be included in the latest `genal.Geno.MR` call of this `genal.Geno` instance.
 
-
+To include the heterogeneity values (Cochran's Q) in the results, you can use the heterogeneity argument in the `genal.Geno.MR` call. Here, the heterogeneity for the inverse-variance weighted method:
 
 ```python
 SBP_clumped.MR(action = 2, methods = ["Egger","IVW"], exposure_name = "SBP", outcome_name = "Stroke_eur", heterogeneity = True)
@@ -496,6 +476,12 @@ And that will give:
 | SBP | Stroke_eur | Egger Intercept | 1499 | -0.001381 | 0.000813 | 8.935529e-02 | 2959.965136 | 1497 | 1.253763e-98 |
 | SBP | Stroke_eur | Inverse-Variance Weighted| 1499 | 0.023049 | 0.001061 | 1.382645e-104 | 2965.678836 | 1498 | 4.280737e-99 |
 
+To display the coefficients as odds ratios with confidence intervals for a binary outcome trait, you can use the `odds = True` argument:
+
+```python
+SBP_clumped.MR(action = 2, methods = ["Egger","IVW"], exposure_name = "SBP", outcome_name = "Stroke_eur", heterogeneity = True, odds = True)
+```
+
 As expected, many MR methods indicate that SBP is strongly associated with stroke, but there could be concerns for horizontal pleiotropy (instruments influencing the outcome through a different pathway than the one used as exposure) given the almost significant MR-Egger intercept p-value.
 To investigate horizontal pleiotropy in more details, a very useful method is Mendelian Randomization Pleiotropy RESidual Sum and Outlier (MR-PRESSO). MR-PRESSO is a method designed to detect and correct for horizontal pleiotropy. It will identify which instruments are likely to be pleiotropic on their effect on the outcome, and it will rerun an inverse-variance weighted MR after excluding them. It can be run using the `genal.Geno.MRpresso` method:
 
@@ -525,7 +511,7 @@ df_pheno = pd.read_csv("path/to/trait/data")
 
 > **Note:**
 >
-> One important point is to make sure that the IDs of the participants are identical in the phenotypic data and in the genetic data.
+> One important point is to make sure that both the Family IDs (FID) and Individual IDs (IID) of the participants are identical in the phenotypic data and in the genetic data.
 
 Then, it is advised to make a copy of the `genal.Geno` instance containing our instruments as we are going to update their coefficients and to avoid any confusion:
 
@@ -536,7 +522,7 @@ SBP_adjusted = SBP_clumped.copy()
|
|
|
536
522
|
We can then call the `genal.Geno.set_phenotype` method, specifying which column contains our trait of interest (for the association testing) and which column contains the individual IDs:
|
|
537
523
|
|
|
538
524
|
```python
|
|
539
|
-
SBP_adjusted.set_phenotype(df_pheno, PHENO = "htn", IID = "IID")
|
|
525
|
+
SBP_adjusted.set_phenotype(df_pheno, PHENO = "htn", IID = "IID", FID = "FID")
|
|
540
526
|
```
|
|
541
527
|
|
|
542
528
|
At this point, genal will identify if the phenotype is binary or quantitative in order to choose the appropriate regression model. If the phenotype is binary, it will assume that the most frequent value is coding for control (and the other value for case), this can be changed with `alternate_control = True`:
@@ -556,26 +542,7 @@ Genal will print information regarding the number of individuals used in the tes
 CHR/POS columns present: SNPs searched based on genomic positions.
 Extracting SNPs for each chromosome...
 SNPs extracted for chr1.
-
-SNPs extracted for chr3.
-SNPs extracted for chr4.
-SNPs extracted for chr5.
-SNPs extracted for chr6.
-SNPs extracted for chr7.
-SNPs extracted for chr8.
-SNPs extracted for chr9.
-SNPs extracted for chr10.
-SNPs extracted for chr11.
-SNPs extracted for chr12.
-SNPs extracted for chr13.
-SNPs extracted for chr14.
-SNPs extracted for chr15.
-SNPs extracted for chr16.
-SNPs extracted for chr17.
-SNPs extracted for chr18.
-SNPs extracted for chr19.
-SNPs extracted for chr20.
-SNPs extracted for chr21.
+...
 SNPs extracted for chr22.
 Merging SNPs extracted from each chromosome...
 Created bed/bim/fam fileset with extracted SNPs: tmp_GENAL/e415aab3_allchr
Binary file
Binary file
@@ -13,7 +13,7 @@ sys.path.insert(0, os.path.abspath('../../'))
 project = 'genal'
 copyright = '2023, Cyprien A. Rivier'
 author = 'Cyprien A. Rivier'
-release = 'v1.
+release = 'v1.1'
 
 
 # -- General configuration ---------------------------------------------------
@@ -3,12 +3,16 @@
 You can adapt this file completely to your liking, but it should at least
 contain the root `toctree` directive.
 
+.. image:: Images/genal_logo.png
+:alt: genal_logo
+:width: 400px
+
 genal: A Python Toolkit for Genetic Risk Scoring and Mendelian Randomization
 ============================================================================
 
 :Author: Cyprien A. Rivier
 :Date: |today|
-:Version: "1.
+:Version: "1.2"
 
 Genal is a python module designed to make it easy to run genetic risk scores and mendelian randomization analyses. It integrates a collection of tools that facilitate the cleaning of single nucleotide polymorphism data (usually derived from Genome-Wide Association Studies) and enable the execution of key clinical population genetic workflows. The functionalities provided by genal include clumping, lifting, association testing, polygenic risk scoring, and Mendelian randomization analyses, all within a single Python module.
 
@@ -47,7 +51,7 @@ If you use genal in your work, please cite the following paper:
 
 .. [Rivier.2024] *Genal: A Python Toolkit for Genetic Risk Scoring and Mendelian Randomization*
 Cyprien A. Rivier, Santiago Clocchiatti-Tuozzo, Shufan Huo, Victor Torres-Lopez, Daniela Renedo, Kevin N. Sheth, Guido J. Falcone, Julian N. Acosta.
-
+Bioinformatics Advances. 2024 December; `10.1093/bioadv/vbae207 <https://doi.org/10.1093/bioadv/vbae207>`_.
 
 References
 ----------
@@ -22,8 +22,8 @@ And import it in a python environment with:
 
 import genal
 
-The main genal functionalities require a working installation of PLINK
-If you have already installed plink
+The main genal functionalities require a working installation of PLINK v2.0.
+If you have already installed plink v2.0, you can set the path to its executable with:
 
 .. code-block:: python
 
@@ -39,6 +39,9 @@ If plink is not installed, genal can install the correct version for your system
 Tutorial
 ========
 
+.. image:: Images/Genal_flowchart.png
+:alt: Genal_flowchart
+
 For the purpose of this tutorial, we are going to build a PRS of systolic blood pressure (SBP) and investigate the genetically-determined effect of SBP on the risk of stroke. We will use both summary statistics from Genome-Wide Association Studies (GWAS) and individual-level data from the UK Biobank as our test population. We are going to go through the following steps:
 
 Table of contents
@@ -185,8 +188,7 @@ By default, the reference panel used is the European (EUR) one. You can specify
 
 SBP_Geno.preprocess_data(preprocessing='Fill_delete', reference_panel="afr")
 
-You can also use a custom reference panel by specifying the path to bed/bim/fam files (
-
+You can also use a custom reference panel by specifying to the reference_panel argument a path to bed/bim/fam (plink v1.9 format) or pgen/pvar/psam files (plink v2.0 format), without the extension.
 
 Clumping
 --------
@@ -221,7 +223,7 @@ Computing a Polygenic Risk Score (PRS) can be done in one line with the :meth:`~
 
 SBP_clumped.prs(name="SBP_prs", path="path/to/genetic/files")
 
-The genetic files of the target population can be either contained in one triple of bed/bim/fam files with information for all SNPs, or divided by chromosome (one bed/bim/fam triple for chr 1, another for chr 2, etc...). In the latter case, provide the path by replacing the chromosome number by
+The genetic files of the target population can be either contained in one triple of bed/bim/fam or pgen/pvar/psam files with information for all SNPs, or divided by chromosome (one bed/bim/fam or pgen/pvar/psam triple for chr 1, another for chr 2, etc...). In the latter case, provide the path by replacing the chromosome number by `$` and genal will extract the necessary SNPs from each chromosome and merge them before running the PRS. For instance, if the genetic files are named `Pop_chr1.bed`, `Pop_chr1.bim`, `Pop_chr1.fam`, `Pop_chr2.bed`, ..., you can use:
 
 .. code-block:: python
 
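The `$` placeholder convention described in the added line of this hunk can be illustrated independently of genal. The sketch below is not genal's implementation; `expand_chromosome_paths` is a hypothetical helper that only shows which file names a template such as `Pop_chr$` stands for, assuming autosomes 1 to 22 and the bed/bim/fam extensions named in the text.

```python
def expand_chromosome_paths(template, chromosomes=range(1, 23),
                            extensions=("bed", "bim", "fam")):
    """Map each chromosome number to the file paths obtained by
    substituting it for the `$` placeholder in `template`."""
    if "$" not in template:
        raise ValueError("template must contain '$' as the chromosome placeholder")
    return {
        chrom: [f"{template.replace('$', str(chrom))}.{ext}" for ext in extensions]
        for chrom in chromosomes
    }


paths = expand_chromosome_paths("Pop_chr$")
print(paths[1])   # file names expected for chromosome 1
print(paths[22])  # file names expected for chromosome 22
```

For pgen/pvar/psam filesets the same helper can be called with `extensions=("pgen", "pvar", "psam")`, which is a convenient way to verify that every expected file exists before launching a PRS.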
@@ -279,26 +281,7 @@ and the output is::
 7(0.455%) duplicated SNPs have been removed. Use keep_dups=True to keep them.
 Extracting SNPs for each chromosome...
 SNPs extracted for chr1.
-
-SNPs extracted for chr3.
-SNPs extracted for chr4.
-SNPs extracted for chr5.
-SNPs extracted for chr6.
-SNPs extracted for chr7.
-SNPs extracted for chr8.
-SNPs extracted for chr9.
-SNPs extracted for chr10.
-SNPs extracted for chr11.
-SNPs extracted for chr12.
-SNPs extracted for chr13.
-SNPs extracted for chr14.
-SNPs extracted for chr15.
-SNPs extracted for chr16.
-SNPs extracted for chr17.
-SNPs extracted for chr18.
-SNPs extracted for chr19.
-SNPs extracted for chr20.
-SNPs extracted for chr21.
+...
 SNPs extracted for chr22.
 Merging SNPs extracted from each chromosome...
 Created bed/bim/fam fileset with extracted SNPs: tmp_GENAL/4f4ce6a7_allchr
@@ -451,9 +434,9 @@ If you want to visualize the obtained MR results, you can use the :meth:`~genal.
 .. image:: Images/MR_plot_SBP_AS.png
 :alt: MR plot
 
-You can select which MR methods
+You can select which MR methods to plot with the ``methods`` argument. Note that for an MR method to be plotted, they must be included in the latest :meth:`~genal.Geno.MR` call of this :class:`~genal.Geno` instance.
 
-
+To include the heterogeneity values (Cochran's Q) in the results, you can use the heterogeneity argument in the :meth:`~genal.Geno.MR` call. Here, the heterogeneity for the inverse-variance weighted method:
 
 .. code-block:: python
 
@@ -468,8 +451,12 @@ And that will give:
 1 SBP Stroke_eur Egger Intercept 1499 -0.001381 0.000813 8.935529e-02 2959.965136 1497 1.253763e-98
 2 SBP Stroke_eur Inverse-Variance Weighted 1499 0.023049 0.001061 1.382645e-104 2965.678836 1498 4.280737e-99
 
+To display the coefficients as odds ratios with confidence intervals for a binary outcome trait, you can use the `odds = True` argument:
+
+.. code-block:: python
+
+SBP_clumped.MR(action=2, methods=["Egger","IVW"], exposure_name="SBP", outcome_name="Stroke_eur", heterogeneity=True, odds=True)
 
-
 As expected, many MR methods indicate that SBP is strongly associated with stroke, but there could be concerns for horizontal pleiotropy (instruments influencing the outcome through a different pathway than the one used as exposure) given the almost significant MR-Egger intercept p-value.
 
 To investigate horizontal pleiotropy in more detail, a very useful method is Mendelian Randomization Pleiotropy RESidual Sum and Outlier (MR-PRESSO).
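To make the `odds = True` option added in this hunk concrete, the conversion from a log-odds coefficient to an odds ratio can be worked through by hand. The sketch below is an assumption about what such a conversion usually looks like (exponentiating the estimate and a normal-approximation 95% interval with z = 1.96); the diff does not show how genal computes it, and `odds_ratio_with_ci` is a hypothetical helper. The input values are the coefficient and standard error of the Inverse-Variance Weighted row in the results table of this hunk.

```python
import math


def odds_ratio_with_ci(beta, se, z=1.96):
    """Convert a log-odds estimate and its standard error into an
    odds ratio with a normal-approximation confidence interval."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


# Inverse-Variance Weighted row: b = 0.023049, se = 0.001061
or_, lower, upper = odds_ratio_with_ci(0.023049, 0.001061)
print(f"OR = {or_:.4f} (95% CI {lower:.4f} to {upper:.4f})")
```

Under these assumptions the odds ratio is slightly above 1 per unit of the exposure, which matches the positive coefficient reported in the table.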
@@ -506,7 +493,7 @@ Let's start by loading phenotypic data:
 df_pheno = pd.read_csv("path/to/trait/data")
 
 .. note::
-One important point is to make sure that the IDs of the participants are identical in the phenotypic data and in the genetic data.
+One important point is to make sure that both the Family IDs (FID) and Individual IDs (IID) of the participants are identical in the phenotypic data and in the genetic data.
 
 Then, it is advised to make a copy of the :class:`~genal.Geno` instance containing our instruments as we are going to update their coefficients and to avoid any confusion:
 
@@ -518,7 +505,7 @@ We can then call the :meth:`~genal.Geno.set_phenotype` method, specifying which
 
 .. code-block:: python
 
-SBP_adjusted.set_phenotype(df_pheno, PHENO="htn", IID="IID")
+SBP_adjusted.set_phenotype(df_pheno, PHENO="htn", IID="IID", FID="FID")
 
 At this point, genal will identify if the phenotype is binary or quantitative in order to choose the appropriate regression model. If the phenotype is binary, it will assume that the most frequent value is coding for control (and the other value for case), this can be changed with ``alternate_control=True``::
 
@@ -537,26 +524,7 @@ Genal will print information regarding the number of individuals used in the tes
 CHR/POS columns present: SNPs searched based on genomic positions.
 Extracting SNPs for each chromosome...
 SNPs extracted for chr1.
-
-SNPs extracted for chr3.
-SNPs extracted for chr4.
-SNPs extracted for chr5.
-SNPs extracted for chr6.
-SNPs extracted for chr7.
-SNPs extracted for chr8.
-SNPs extracted for chr9.
-SNPs extracted for chr10.
-SNPs extracted for chr11.
-SNPs extracted for chr12.
-SNPs extracted for chr13.
-SNPs extracted for chr14.
-SNPs extracted for chr15.
-SNPs extracted for chr16.
-SNPs extracted for chr17.
-SNPs extracted for chr18.
-SNPs extracted for chr19.
-SNPs extracted for chr20.
-SNPs extracted for chr21.
+...
 SNPs extracted for chr22.
 Merging SNPs extracted from each chromosome...
 Created bed/bim/fam fileset with extracted SNPs: tmp_GENAL/e415aab3_allchr
|