genal-python 0.7__tar.gz → 0.9__tar.gz
- genal_python-0.9/.DS_Store +0 -0
- {genal_python-0.7 → genal_python-0.9}/.gitignore +1 -1
- genal_python-0.9/.readthedocs.yaml +22 -0
- {genal_python-0.7 → genal_python-0.9}/PKG-INFO +105 -29
- {genal_python-0.7 → genal_python-0.9}/README.md +105 -27
- genal_python-0.9/docs/.DS_Store +0 -0
- genal_python-0.9/docs/build/.DS_Store +0 -0
- {genal_python-0.7/docs/_build/html → genal_python-0.9/docs/build}/.buildinfo +1 -1
- genal_python-0.9/docs/build/.doctrees/api.doctree +0 -0
- genal_python-0.9/docs/build/.doctrees/environment.pickle +0 -0
- genal_python-0.9/docs/build/.doctrees/genal.doctree +0 -0
- genal_python-0.9/docs/build/.doctrees/index.doctree +0 -0
- genal_python-0.9/docs/build/.doctrees/introduction.doctree +0 -0
- genal_python-0.9/docs/build/.doctrees/modules.doctree +0 -0
- genal_python-0.9/docs/build/_modules/genal/Geno.html +1480 -0
- genal_python-0.9/docs/build/_modules/genal/MR.html +1065 -0
- genal_python-0.9/docs/build/_modules/genal/MR_tools.html +671 -0
- genal_python-0.9/docs/build/_modules/genal/MRpresso.html +409 -0
- genal_python-0.9/docs/build/_modules/genal/association.html +445 -0
- genal_python-0.9/docs/build/_modules/genal/clump.html +183 -0
- genal_python-0.9/docs/build/_modules/genal/extract_prs.html +426 -0
- genal_python-0.9/docs/build/_modules/genal/geno_tools.html +567 -0
- genal_python-0.9/docs/build/_modules/genal/lift.html +371 -0
- genal_python-0.9/docs/build/_modules/genal/proxy.html +359 -0
- genal_python-0.9/docs/build/_modules/genal/snp_query.html +231 -0
- genal_python-0.9/docs/build/_modules/genal/tools.html +440 -0
- genal_python-0.9/docs/build/_modules/index.html +114 -0
- genal_python-0.7/docs/_build/html/_sources/source/genal.rst.txt → genal_python-0.9/docs/build/_sources/api.rst.txt +39 -40
- {genal_python-0.7/docs/_build/html → genal_python-0.9/docs/build}/_sources/index.rst.txt +15 -5
- genal_python-0.9/docs/build/_sources/introduction.rst.txt +674 -0
- genal_python-0.9/docs/build/_sources/modules.rst.txt +82 -0
- {genal_python-0.7/docs/_build/html → genal_python-0.9/docs/build}/_static/basic.css +1 -1
- {genal_python-0.7/docs/_build/html → genal_python-0.9/docs/build}/_static/doctools.js +1 -1
- {genal_python-0.7/docs/_build/html → genal_python-0.9/docs/build}/_static/documentation_options.js +1 -1
- {genal_python-0.7/docs/_build/html → genal_python-0.9/docs/build}/_static/language_data.js +2 -2
- {genal_python-0.7/docs/_build/html → genal_python-0.9/docs/build}/_static/searchtools.js +105 -60
- genal_python-0.7/docs/_build/html/source/genal.html → genal_python-0.9/docs/build/api.html +1310 -1119
- {genal_python-0.7/docs/_build/html → genal_python-0.9/docs/build}/genal.html +191 -170
- genal_python-0.9/docs/build/genindex.html +584 -0
- {genal_python-0.7/docs/_build/html → genal_python-0.9/docs/build}/index.html +36 -42
- genal_python-0.9/docs/build/introduction.html +714 -0
- genal_python-0.7/docs/_build/html/api.html → genal_python-0.9/docs/build/modules.html +299 -421
- genal_python-0.9/docs/build/objects.inv +0 -0
- {genal_python-0.7/docs/_build/html → genal_python-0.9/docs/build}/py-modindex.html +20 -20
- {genal_python-0.7/docs/_build/html → genal_python-0.9/docs/build}/search.html +8 -8
- genal_python-0.9/docs/build/searchindex.js +1 -0
- genal_python-0.9/docs/source/.DS_Store +0 -0
- genal_python-0.9/docs/source/Images/MR_plot_SBP_AS.png +0 -0
- genal_python-0.7/docs/source/genal.rst → genal_python-0.9/docs/source/api.rst +39 -40
- {genal_python-0.7 → genal_python-0.9}/docs/source/conf.py +3 -2
- {genal_python-0.7 → genal_python-0.9}/docs/source/index.rst +15 -5
- genal_python-0.9/docs/source/introduction.rst +677 -0
- genal_python-0.9/docs/source/modules.rst +82 -0
- {genal_python-0.7 → genal_python-0.9}/genal/Geno.py +73 -50
- {genal_python-0.7 → genal_python-0.9}/genal/MR.py +16 -16
- {genal_python-0.7 → genal_python-0.9}/genal/MR_tools.py +11 -0
- {genal_python-0.7 → genal_python-0.9}/genal/__init__.py +1 -1
- {genal_python-0.7 → genal_python-0.9}/genal/constants.py +1 -0
- {genal_python-0.7 → genal_python-0.9}/genal/extract_prs.py +34 -12
- {genal_python-0.7 → genal_python-0.9}/genal/geno_tools.py +2 -2
- {genal_python-0.7 → genal_python-0.9}/genal/snp_query.py +53 -17
- {genal_python-0.7 → genal_python-0.9}/genal/tools.py +16 -6
- {genal_python-0.7 → genal_python-0.9}/pyproject.toml +2 -3
- genal_python-0.7/docs/_build/doctrees/api.doctree +0 -0
- genal_python-0.7/docs/_build/doctrees/environment.pickle +0 -0
- genal_python-0.7/docs/_build/doctrees/genal.doctree +0 -0
- genal_python-0.7/docs/_build/doctrees/index.doctree +0 -0
- genal_python-0.7/docs/_build/doctrees/introduction.doctree +0 -0
- genal_python-0.7/docs/_build/doctrees/modules.doctree +0 -0
- genal_python-0.7/docs/_build/doctrees/source/genal.doctree +0 -0
- genal_python-0.7/docs/_build/doctrees/source/modules.doctree +0 -0
- genal_python-0.7/docs/_build/html/_sources/api.rst.txt +0 -24
- genal_python-0.7/docs/_build/html/_sources/introduction.rst.txt +0 -505
- genal_python-0.7/docs/_build/html/_sources/modules.rst.txt +0 -7
- genal_python-0.7/docs/_build/html/_sources/source/modules.rst.txt +0 -7
- genal_python-0.7/docs/_build/html/_static/_sphinx_javascript_frameworks_compat.js +0 -123
- genal_python-0.7/docs/_build/html/_static/jquery.js +0 -2
- genal_python-0.7/docs/_build/html/genindex.html +0 -701
- genal_python-0.7/docs/_build/html/introduction.html +0 -584
- genal_python-0.7/docs/_build/html/modules.html +0 -269
- genal_python-0.7/docs/_build/html/objects.inv +0 -0
- genal_python-0.7/docs/_build/html/searchindex.js +0 -1
- genal_python-0.7/docs/_build/html/source/modules.html +0 -259
- genal_python-0.7/docs/requirements.txt +0 -2
- genal_python-0.7/docs/source/api.rst +0 -24
- genal_python-0.7/docs/source/introduction.rst +0 -505
- genal_python-0.7/docs/source/modules.rst +0 -7
- {genal_python-0.7 → genal_python-0.9}/LICENSE +0 -0
- {genal_python-0.7 → genal_python-0.9}/docs/Makefile +0 -0
- {genal_python-0.7/docs/Images → genal_python-0.9/docs/build/_images}/MR_plot_SBP_AS.png +0 -0
- {genal_python-0.7/docs/_build/html → genal_python-0.9/docs/build}/_sources/genal.rst.txt +0 -0
- {genal_python-0.7/docs/_build/html → genal_python-0.9/docs/build}/_static/css/badge_only.css +0 -0
- {genal_python-0.7/docs/_build/html → genal_python-0.9/docs/build}/_static/css/fonts/Roboto-Slab-Bold.woff +0 -0
- {genal_python-0.7/docs/_build/html → genal_python-0.9/docs/build}/_static/css/fonts/Roboto-Slab-Bold.woff2 +0 -0
- {genal_python-0.7/docs/_build/html → genal_python-0.9/docs/build}/_static/css/fonts/Roboto-Slab-Regular.woff +0 -0
- {genal_python-0.7/docs/_build/html → genal_python-0.9/docs/build}/_static/css/fonts/Roboto-Slab-Regular.woff2 +0 -0
- {genal_python-0.7/docs/_build/html → genal_python-0.9/docs/build}/_static/css/fonts/fontawesome-webfont.eot +0 -0
- {genal_python-0.7/docs/_build/html → genal_python-0.9/docs/build}/_static/css/fonts/fontawesome-webfont.svg +0 -0
- {genal_python-0.7/docs/_build/html → genal_python-0.9/docs/build}/_static/css/fonts/fontawesome-webfont.ttf +0 -0
- {genal_python-0.7/docs/_build/html → genal_python-0.9/docs/build}/_static/css/fonts/fontawesome-webfont.woff +0 -0
- {genal_python-0.7/docs/_build/html → genal_python-0.9/docs/build}/_static/css/fonts/fontawesome-webfont.woff2 +0 -0
- {genal_python-0.7/docs/_build/html → genal_python-0.9/docs/build}/_static/css/fonts/lato-bold-italic.woff +0 -0
- {genal_python-0.7/docs/_build/html → genal_python-0.9/docs/build}/_static/css/fonts/lato-bold-italic.woff2 +0 -0
- {genal_python-0.7/docs/_build/html → genal_python-0.9/docs/build}/_static/css/fonts/lato-bold.woff +0 -0
- {genal_python-0.7/docs/_build/html → genal_python-0.9/docs/build}/_static/css/fonts/lato-bold.woff2 +0 -0
- {genal_python-0.7/docs/_build/html → genal_python-0.9/docs/build}/_static/css/fonts/lato-normal-italic.woff +0 -0
- {genal_python-0.7/docs/_build/html → genal_python-0.9/docs/build}/_static/css/fonts/lato-normal-italic.woff2 +0 -0
- {genal_python-0.7/docs/_build/html → genal_python-0.9/docs/build}/_static/css/fonts/lato-normal.woff +0 -0
- {genal_python-0.7/docs/_build/html → genal_python-0.9/docs/build}/_static/css/fonts/lato-normal.woff2 +0 -0
- {genal_python-0.7/docs/_build/html → genal_python-0.9/docs/build}/_static/css/theme.css +0 -0
- {genal_python-0.7/docs/_build/html → genal_python-0.9/docs/build}/_static/file.png +0 -0
- {genal_python-0.7/docs/_build/html → genal_python-0.9/docs/build}/_static/js/badge_only.js +0 -0
- {genal_python-0.7/docs/_build/html → genal_python-0.9/docs/build}/_static/js/html5shiv-printshiv.min.js +0 -0
- {genal_python-0.7/docs/_build/html → genal_python-0.9/docs/build}/_static/js/html5shiv.min.js +0 -0
- {genal_python-0.7/docs/_build/html → genal_python-0.9/docs/build}/_static/js/theme.js +0 -0
- {genal_python-0.7/docs/_build/html → genal_python-0.9/docs/build}/_static/minus.png +0 -0
- {genal_python-0.7/docs/_build/html → genal_python-0.9/docs/build}/_static/plus.png +0 -0
- {genal_python-0.7/docs/_build/html → genal_python-0.9/docs/build}/_static/pygments.css +0 -0
- {genal_python-0.7/docs/_build/html → genal_python-0.9/docs/build}/_static/sphinx_highlight.js +0 -0
- {genal_python-0.7 → genal_python-0.9}/docs/make.bat +0 -0
- {genal_python-0.7 → genal_python-0.9}/genal/MRpresso.py +0 -0
- {genal_python-0.7 → genal_python-0.9}/genal/association.py +0 -0
- {genal_python-0.7 → genal_python-0.9}/genal/clump.py +0 -0
- {genal_python-0.7 → genal_python-0.9}/genal/lift.py +0 -0
- {genal_python-0.7 → genal_python-0.9}/genal/proxy.py +0 -0
- {genal_python-0.7 → genal_python-0.9}/gitignore +0 -0
- {genal_python-0.7 → genal_python-0.9}/readthedocs.yaml +0 -0
- {genal_python-0.7 → genal_python-0.9}/requirements.txt +0 -0
|
Binary file
|
|
@@ -0,0 +1,22 @@
|
|
|
1
|
+
# .readthedocs.yaml
|
|
2
|
+
# Read the Docs configuration file
|
|
3
|
+
# See https://docs.readthedocs.io/en/stable/config-file/v2.html for details
|
|
4
|
+
|
|
5
|
+
# Required
|
|
6
|
+
version: 2
|
|
7
|
+
|
|
8
|
+
# Set the version of Python and other tools you might need
|
|
9
|
+
build:
|
|
10
|
+
os: ubuntu-22.04
|
|
11
|
+
tools:
|
|
12
|
+
python: "3.11"
|
|
13
|
+
|
|
14
|
+
# Build documentation in the docs/ directory with Sphinx
|
|
15
|
+
sphinx:
|
|
16
|
+
configuration: docs/source/conf.py
|
|
17
|
+
|
|
18
|
+
# We recommend specifying your dependencies to enable reproducible builds:
|
|
19
|
+
# https://docs.readthedocs.io/en/stable/guides/reproducible-builds.html
|
|
20
|
+
python:
|
|
21
|
+
install:
|
|
22
|
+
- requirements: docs/requirements.txt
|
|
@@ -1,6 +1,6 @@
|
|
|
1
1
|
Metadata-Version: 2.1
|
|
2
2
|
Name: genal-python
|
|
3
|
-
Version: 0.
|
|
3
|
+
Version: 0.9
|
|
4
4
|
Summary: A python toolkit for polygenic risk scoring and mendelian randomization.
|
|
5
5
|
Author-email: Cyprien Rivier <riviercyprien@gmail.com>
|
|
6
6
|
Requires-Python: >=3.7
|
|
@@ -17,7 +17,6 @@ Requires-Dist: psutil==5.9.1
|
|
|
17
17
|
Requires-Dist: pyliftover==0.4
|
|
18
18
|
Requires-Dist: scikit_learn>=1.3.0
|
|
19
19
|
Requires-Dist: scipy>=1.11.4
|
|
20
|
-
Requires-Dist: sphinx_rtd_theme==1.3.0
|
|
21
20
|
Requires-Dist: statsmodels==0.14.0
|
|
22
21
|
Requires-Dist: tqdm==4.66.1
|
|
23
22
|
Requires-Dist: wget==3.2
|
|
@@ -32,10 +31,11 @@ Project-URL: Home, https://github.com/CypRiv/genal
|
|
|
32
31
|
|
|
33
32
|
# Table of contents
|
|
34
33
|
1. [Introduction](#introduction)
|
|
35
|
-
2. [Citation]
|
|
34
|
+
2. [Citation](#citation)
|
|
36
35
|
3. [Requirements for the genal module](#paragraph1)
|
|
37
36
|
4. [Installation and how to use genal](#paragraph2)
|
|
38
37
|
1. [Installation](#paragraph2.1)
|
|
38
|
+
2. [Documentation](#paragraph2.2)
|
|
39
39
|
5. [Tutorial and presentation of the main tools](#paragraph3)
|
|
40
40
|
1. [Data loading](#paragraph3.1)
|
|
41
41
|
2. [Data preprocessing](#paragraph3.2)
|
|
@@ -44,6 +44,7 @@ Project-URL: Home, https://github.com/CypRiv/genal
|
|
|
44
44
|
5. [Mendelian Randomization](#paragraph3.5)
|
|
45
45
|
6. [SNP-association testing](#paragraph3.6)
|
|
46
46
|
7. [Lifting](#paragraph3.7)
|
|
47
|
+
8. [GWAS Catalog](#paragraph3.8)
|
|
47
48
|
|
|
48
49
|
|
|
49
50
|
## Introduction <a name="introduction"></a>
|
|
@@ -58,18 +59,27 @@ If you're using genal, please cite the following paper:
|
|
|
58
59
|
**Genal: A Python Toolkit for Genetic Risk Scoring and Mendelian Randomization.** Cyprien A. Rivier, Santiago Clocchiatti-Tuozzo, Shufan Huo, Victor Torres-Lopez, Daniela Renedo, Kevin N. Sheth, Guido J. Falcone, Julian N. Acosta. medRxiv 2024.05.23.24307776; doi: https://doi.org/10.1101/2024.05.23.24307776
|
|
59
60
|
|
|
60
61
|
## Requirements for the genal module <a name="paragraph1"></a>
|
|
61
|
-
***Python 3.
|
|
62
|
+
***Python 3.11 or later***. https://www.python.org/ <br>
|
|
62
63
|
|
|
63
64
|
|
|
64
65
|
## Installation and How to use the genal module <a name="paragraph2"></a>
|
|
65
66
|
|
|
66
67
|
### Installation <a name="paragraph2.1"></a>
|
|
67
68
|
|
|
69
|
+
> **Note:**
|
|
70
|
+
>
|
|
71
|
+
> **Optional**: It is recommended to create a new environment to avoid dependencies conflicts. Here, we create a new conda environment called 'genal_env'.
|
|
72
|
+
> ```
|
|
73
|
+
> conda create --name genal_env python=3.11
|
|
74
|
+
> conda activate genal_env
|
|
75
|
+
> ```
|
|
76
|
+
|
|
77
|
+
|
|
68
78
|
Download and install the package with pip:
|
|
69
79
|
```
|
|
70
80
|
pip install genal-python
|
|
71
81
|
```
|
|
72
|
-
And it
|
|
82
|
+
And import it in a python environment with:
|
|
73
83
|
```python
|
|
74
84
|
import genal
|
|
75
85
|
```
|
|
@@ -80,6 +90,16 @@ Once downloaded, the path to the plink executable can be set with:
|
|
|
80
90
|
```
|
|
81
91
|
genal.set_plink(path="/path/to/plink/executable/file")
|
|
82
92
|
```
|
|
93
|
+
### Documentation <a name="paragraph2.2"></a>
|
|
94
|
+
|
|
95
|
+
For detailed information on how to use the functionalities of Genal, please refer to the documentation: https://genal.rtfd.io
|
|
96
|
+
|
|
97
|
+
The documentation covers:
|
|
98
|
+
- Installation
|
|
99
|
+
- This tutorial
|
|
100
|
+
- The list of the main functions with complete description of their arguments
|
|
101
|
+
- An exhaustive API reference
|
|
102
|
+
|
|
83
103
|
|
|
84
104
|
## Tutorial <a name="paragraph3"></a>
|
|
85
105
|
For this tutorial, we will obtain genetic instruments for systolic blood pressure (SBP), compute a Polygenic Risk Score (PRS), and run a Mendelian Randomization analysis to investigate the genetically-determined effect of SBP on the risk of stroke. We will utilize summary statistics from Genome-Wide Association Studies (GWAS) and individual-level data from the UK Biobank. The steps include:
|
|
@@ -100,11 +120,11 @@ For this tutorial, we will obtain genetic instruments for systolic blood pressur
|
|
|
100
120
|
- Data lifting to another genomic build
|
|
101
121
|
- In pure Python
|
|
102
122
|
- Using LiftOver
|
|
103
|
-
-
|
|
123
|
+
- Querying the GWAS Catalog
|
|
104
124
|
|
|
105
125
|
### Data loading <a name="paragraph3.1"></a>
|
|
106
126
|
|
|
107
|
-
We
|
|
127
|
+
We start this tutorial with publicly available summary statistics from a large GWAS study of systolic blood pressure. [Link to study](https://www.nature.com/articles/s41588-018-0205-x). After downloading and unzipping the summary statistics, we load them into a pandas DataFrame:
|
|
108
128
|
|
|
109
129
|
```python
|
|
110
130
|
import pandas as pd
|
|
@@ -133,6 +153,10 @@ The `genal.Geno` takes as input a pandas dataframe where each row corresponds to
|
|
|
133
153
|
- **P**: Column name for effect p-value. Defaults to `'P'`.
|
|
134
154
|
- **EAF**: Column name for effect allele frequency. Defaults to `'EAF'`.
|
|
135
155
|
|
|
156
|
+
> **Note:**
|
|
157
|
+
>
|
|
158
|
+
> You do not need all columns to move forward, as not all columns are required by every function. Additionally, some columns can be imputed as we will see in the next paragraph.
|
|
159
|
+
|
|
136
160
|
After inspecting the dataframe, we first need to extract the chromosome and position information from the `MarkerName` column into two new columns `CHR` and `POS`:
|
|
137
161
|
|
|
138
162
|
```python
|
|
@@ -158,7 +182,7 @@ The last argument (`keep_columns = False`) indicates that we do not wish to keep
|
|
|
158
182
|
|
|
159
183
|
> **Note:**
|
|
160
184
|
>
|
|
161
|
-
> Make sure to read the readme file usually provided with the summary statistics to identify the correct columns. It is particularly important to correctly identify the allele that represents the effect allele.
|
|
185
|
+
> Make sure to read the readme file usually provided with the summary statistics to identify the correct columns. It is particularly important to correctly identify the allele that represents the effect allele.
|
|
162
186
|
|
|
163
187
|
### Data preprocessing <a name="paragraph3.2"></a>
|
|
164
188
|
|
|
@@ -222,7 +246,7 @@ You can also use a custom reference panel by specifying to the reference_panel a
|
|
|
222
246
|
|
|
223
247
|
### Clumping <a name="paragraph3.3"></a>
|
|
224
248
|
|
|
225
|
-
Clumping is the step at which we select the SNPs that will be used as our genetic instruments in future Polygenic Risk Scores and Mendelian Randomization analyses. The process involves identifying the SNPs that are strongly associated with our trait of interest (systolic blood pressure in this tutorial) and are independent from each other. This second step ensures that selected SNPs are not highly correlated, (i.e., they are not in high linkage disequilibrium). For this step, we again need to use a reference panel.
|
|
249
|
+
Clumping, or C+T: Clumping + Thresholding, is the step at which we select the SNPs that will be used as our genetic instruments in future Polygenic Risk Scores and Mendelian Randomization analyses. The process involves identifying the SNPs that are strongly associated with our trait of interest (systolic blood pressure in this tutorial) and are independent from each other. This second step ensures that selected SNPs are not highly correlated, (i.e., they are not in high linkage disequilibrium). For this step, we again need to use a reference panel.
|
|
226
250
|
|
|
227
251
|
The SNP-data loaded in a `genal.Geno` instance can be clumped using the `genal.Geno.clump` method. It will return another `genal.Geno` instance containing only the clumped data:
|
|
228
252
|
|
|
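The clumping paragraph in the hunk above describes two criteria: a SNP must be strongly associated with the trait, and it must be independent of the SNPs already selected. The following is a minimal sketch of that greedy idea on toy data. It is not part of either package version and is not genal's implementation: the source only says that clumping uses a reference panel, so the hand-written `r2` dictionary, the `Snp` type, the function name `toy_clump`, and the parameter names and default values (`p1`, `r2_max`, `kb`) are all assumptions made for illustration.

```python
from typing import Dict, FrozenSet, List, NamedTuple


class Snp(NamedTuple):
    rsid: str
    chrom: int
    pos: int
    p: float


def toy_clump(
    snps: List[Snp],
    r2: Dict[FrozenSet[str], float],
    p1: float = 5e-8,     # hypothetical significance threshold
    r2_max: float = 0.1,  # hypothetical LD threshold
    kb: int = 250,        # hypothetical window size in kilobases
) -> List[Snp]:
    """Greedy clump-and-threshold on toy data (illustration only)."""
    kept: List[Snp] = []
    # Thresholding: discard SNPs above p1, then visit the rest
    # from most to least significant.
    candidates = sorted((s for s in snps if s.p <= p1), key=lambda s: s.p)
    for snp in candidates:
        # Clumping: skip a SNP that is close to, and in LD with,
        # a SNP that has already been kept.
        in_ld = any(
            snp.chrom == lead.chrom
            and abs(snp.pos - lead.pos) <= kb * 1000
            and r2.get(frozenset((snp.rsid, lead.rsid)), 0.0) > r2_max
            for lead in kept
        )
        if not in_ld:
            kept.append(snp)
    return kept


# Made-up SNPs and LD values, not taken from any real panel.
toy_snps = [
    Snp("rsA", 1, 1_000, 1e-20),
    Snp("rsB", 1, 5_000, 1e-12),    # near rsA and in high LD with it
    Snp("rsC", 1, 900_000, 1e-9),   # same chromosome, outside the window
    Snp("rsD", 2, 1_000, 1e-3),     # not significant
]
toy_r2 = {frozenset(("rsA", "rsB")): 0.8}

clumped_ids = [s.rsid for s in toy_clump(toy_snps, toy_r2)]
print(clumped_ids)
```

In this toy run, `rsB` is dropped because it sits within the window of the more significant `rsA` and is correlated with it, and `rsD` is dropped by the p-value threshold alone.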
@@ -294,7 +318,7 @@ The output of the `genal.Geno.prs` method will include how many SNPs were used t
|
|
|
294
318
|
Here, we see that about half of the SNPs were not extracted from the data. In such cases, we may want to try and salvage some of these SNPs by looking for proxies (SNPs in high linkage disequilibrium, i.e. highly correlated SNPs). This can be done by specifying the `proxy = True`. argument:
|
|
295
319
|
|
|
296
320
|
```python
|
|
297
|
-
SBP_clumped.prs(name = "
|
|
321
|
+
SBP_clumped.prs(name = "SBP_prs_proxy" ,path = "Pop_chr$", proxy = True, reference_panel = "eur", r2=0.8, kb=5000, window_snps=5000)
|
|
298
322
|
```
|
|
299
323
|
|
|
300
324
|
and the output is:
|
|
@@ -337,7 +361,7 @@ and the output is:
|
|
|
337
361
|
The PRS computation was successful and used 1330/1538 (86.476%) SNPs.
|
|
338
362
|
PRS data saved to SBP_prs.csv
|
|
339
363
|
|
|
340
|
-
In our case, we have been able to find proxies for
|
|
364
|
+
In our case, we have been able to find proxies for 578 of the 786 SNPs that were missing in the population genetic data (7 potential proxies have been removed because they were identical to SNPs already present in our data).
|
|
341
365
|
|
|
342
366
|
You can customize how the proxies are chosen with the following arguments:
|
|
343
367
|
- `reference_panel`: The reference population used to derive linkage disequilibrium values and find proxies. Defaults to `eur`.
|
|
@@ -347,7 +371,7 @@ You can customize how the proxies are chosen with the following arguments:
|
|
|
347
371
|
|
|
348
372
|
> **Note:**
|
|
349
373
|
>
|
|
350
|
-
> You can call the `genal.Geno.prs` method on any `Geno` instance (containing at least the EA, BETA, and either SNP or CHR/POS columns). The data does not need to be clumped, and there is no limit to the number of
|
|
374
|
+
> You can call the `genal.Geno.prs` method on any `Geno` instance (containing at least the EA, BETA, and either SNP or CHR/POS columns). The data does not need to be clumped, and there is no limit to the number of SNPs used to compute the scores.
|
|
351
375
|
|
|
352
376
|
|
|
353
377
|
### Mendelian Randomization <a name="paragraph3.5"></a>
|
|
@@ -395,21 +419,25 @@ Genal will print how many SNPs were successfully found and extracted from the ou
|
|
|
395
419
|
1541 SNPs out of 1545 are present in the outcome data.
|
|
396
420
|
(Exposure data, Outcome data, Outcome name) stored in the .MR_data attribute.
|
|
397
421
|
|
|
398
|
-
Here as well you have the option to use proxies for the instruments that are not present in the outcome data:
|
|
399
|
-
|
|
400
|
-
```python
|
|
401
|
-
SBP_clumped.query_outcome(Stroke_geno, proxy = True, reference_panel = "eur", kb = 5000, r2 = 0.6, window_snps = 5000)
|
|
402
|
-
```
|
|
403
|
-
|
|
404
|
-
And genal will print the number of missing instruments which have been proxied:
|
|
405
422
|
|
|
406
|
-
|
|
407
|
-
|
|
408
|
-
|
|
409
|
-
|
|
410
|
-
|
|
411
|
-
|
|
412
|
-
|
|
423
|
+
> **Note:**
|
|
424
|
+
>Here as well you have the option to use proxies for the instruments that are not present in the outcome data:
|
|
425
|
+
>
|
|
426
|
+
> Here as well you have the option to use proxies for the instruments that are not present in the outcome data:
|
|
427
|
+
>
|
|
428
|
+
> ```python
|
|
429
|
+
> SBP_clumped.query_outcome(Stroke_geno, proxy = True, reference_panel = "eur", kb = 5000, r2 = 0.6, window_snps = 5000)
|
|
430
|
+
> ```
|
|
431
|
+
>
|
|
432
|
+
> And genal will print the number of missing instruments that have been proxied:
|
|
433
|
+
>
|
|
434
|
+
> Outcome data successfully loaded from 'b352e412' geno instance.
|
|
435
|
+
> Identifying the exposure SNPs present in the outcome data...
|
|
436
|
+
> 1541 SNPs out of 1545 are present in the outcome data.
|
|
437
|
+
> Searching proxies for 4 SNPs...
|
|
438
|
+
> Using the EUR reference panel.
|
|
439
|
+
> Found proxies for 4 SNPs.
|
|
440
|
+
> (Exposure data, Outcome data, Outcome name) stored in the .MR_data attribute.
|
|
413
441
|
|
|
414
442
|
After extracting the instruments from the outcome data, the `SBP_clumped` `genal.Geno` instance contains an `MR_data` attribute containing the instruments-exposure and instruments-outcome associations necessary to run MR. Running MR is now as simple as calling the `genal.Geno.MR` method of the SBP_clumped `genal.Geno` instance:
|
|
415
443
|
|
|
@@ -454,7 +482,7 @@ By default, only some MR methods (inverse-variance weighted, weighted median, Si
|
|
|
454
482
|
- `Weighted-mode` for the Weighted mode method
|
|
455
483
|
- `all` to run all the above methods
|
|
456
484
|
|
|
457
|
-
For more fine-tuning, such as settings for the number of boostrapping iterations, please refer to the API.
|
|
485
|
+
For more fine-tuning, such as settings for the number of boostrapping iterations, please refer to the API: [https://genal.readthedocs.io/en/latest/modules.html#id4](MR method).
|
|
458
486
|
|
|
459
487
|
If you want to visualize the obtained MR results, you can use the `genal.Geno.MR_plot` method that will plot each SNP in an `effect_on_exposure x effect_on_outcome` plane as well as lines corresponding to different MR methods:
|
|
460
488
|
|
|
@@ -462,7 +490,7 @@ If you want to visualize the obtained MR results, you can use the `genal.Geno.MR
|
|
|
462
490
|
SBP_clumped.MR_plot(filename="MR_plot_SBP_AS")
|
|
463
491
|
```
|
|
464
492
|
|
|
465
|
-

|
|
466
494
|
You can select which MR methods you wish to plot with the `methods` argument. Note that for an MR method to be plotted, they must be included in the latest `genal.Geno.MR` call of this `genal.Geno` instance.
|
|
467
495
|
|
|
468
496
|
If you wish to include the heterogeneity values (Cochran's Q) in the results, you can use the heterogeneity argument in the `genal.Geno.MR` call. Here, the heterogeneity for the inverse-variance weighted method:
|
|
@@ -507,7 +535,7 @@ df_pheno = pd.read_csv("path/to/trait/data")
|
|
|
507
535
|
|
|
508
536
|
> **Note:**
|
|
509
537
|
>
|
|
510
|
-
>
|
|
538
|
+
> One important point is to make sure that the IDs of the participants are identical in the phenotypic data and in the genetic data.
|
|
511
539
|
|
|
512
540
|
Then, it is advised to make a copy of the `genal.Geno` instance containing our instruments as we are going to update their coefficients and to avoid any confusion:
|
|
513
541
|
|
|
@@ -585,6 +613,54 @@ You can specify the path of the LiftOver executable to the `liftover_path` argum
|
|
|
585
613
|
SBP_Geno.lift(start = "hg19", end = "hg38", replace = False, liftover_path = "path/to/liftover/exec")
|
|
586
614
|
```
|
|
587
615
|
|
|
616
|
+
### GWAS Catalog <a name="paragraph3.8"></a>
|
|
588
617
|
|
|
618
|
+
It is sometimes interesting to determine the traits associated with our SNPs. In Mendelian Randomization, for instance, we may want to exclude instruments that are associated with traits likely causing horizontal pleiotropy. For this purpose, we can use the `genal.Geno.query_gwas_catalog` method. This method will query the GWAS Catalog API to determine the list of traits associated with each of our SNPs and store the results in a list in the `ASSOC` column of the `.data` attribute:
|
|
589
619
|
|
|
620
|
+
```python
|
|
621
|
+
SBP_clumped.query_gwas_catalog(p_threshold=5e-8)
|
|
622
|
+
```
|
|
623
|
+
Which will output:
|
|
624
|
+
|
|
625
|
+
Querying the GWAS Catalog and creating the ASSOC column.
|
|
626
|
+
Only associations with a p-value <= 5e-08 are reported. Use the p_threshold argument to change the threshold.
|
|
627
|
+
To report the p-value for each association, use return_p=True.
|
|
628
|
+
To report the study ID for each association, use return_study=True.
|
|
629
|
+
The .data attribute will be modified. Use replace=False to leave it as is.
|
|
630
|
+
100%|██████████| 1545/1545 [00:34<00:00, 44.86it/s]
|
|
631
|
+
The ASSOC column has been successfully created.
|
|
632
|
+
701 (45.37%) SNPs failed to query (not found in GWAS Catalog) and 7 (0.5%) SNPs timed out after 34.33 seconds. You can increase the timeout value with the timeout argument.
|
|
633
|
+
| EA | NEA | EAF | BETA | SE | CHR | POS | SNP | ASSOC |
|
|
634
|
+
|-----|-----|-------|--------|--------|-----|------------|------------|------------------------------------------------------------------------|
|
|
635
|
+
| A | G | 0.1784| 0.2330 | 0.0402 | 10 | 102075479 | rs603424 | [eicosanoids measurement, decadienedioic acid (...] |
|
|
636
|
+
| A | G | 0.0706| -0.3873| 0.0626 | 10 | 102403682 | rs2996303 | FAILED_QUERY |
|
|
637
|
+
| T | G | 0.8872| 0.6846 | 0.0480 | 10 | 102553647 | rs1006545 | [diastolic blood pressure, systolic blood pressure...] |
|
|
638
|
+
| T | G | 0.6652| -0.2098| 0.0340 | 10 | 102558506 | rs12570050 | FAILED_QUERY |
|
|
639
|
+
| T | C | 0.3057| -0.2448| 0.0334 | 10 | 102603924 | rs4919502 | FAILED_QUERY |
|
|
640
|
+
| ... | ... | ... | ... | ... | ... | ... | ... | ... | |
|
|
641
|
+
| T | C | 0.3514| 0.2203 | 0.0314 | 9 | 9350706 | rs1332813 | [diastolic blood pressure, systolic blood pressure...] |
|
|
642
|
+
| T | C | 0.6880| -0.1897| 0.0332 | 9 | 94201341 | rs10820855 | FAILED_QUERY |
|
|
643
|
+
| A | T | 0.3669| -0.1862| 0.0313 | 9 | 95201540 | rs7045409 | [protein measurement, pulse pressure measurement...] |
|
|
644
|
+
|
|
645
|
+
If you are also interested in the p-values of each SNP-trait association, or the ID of the study from which the association was reported, you can use the `return_p = True` and `return_study = True` arguments. Then, the `ASSOC` column will contain a list of tuples, where each tuple contains the trait name, the p-value, and the study ID:
|
|
590
646
|
|
|
647
|
+
```python
|
|
648
|
+
SBP_clumped.query_gwas_catalog(p_threshold=5e-8, return_p=True, return_study=True)
|
|
649
|
+
```
|
|
650
|
+
|
|
651
|
+
| EA | NEA | EAF | BETA | SE | CHR | POS | SNP | ASSOC |
|
|
652
|
+
|-----|-----|-------|--------|--------|-----|------------|------------|------------------------------------------------------------------------|
|
|
653
|
+
| A | G | 0.1784| 0.2330 | 0.0402 | 10 | 102075479 | rs603424 | TIMEOUT |
|
|
654
|
+
| A | G | 0.0706| -0.3873| 0.0626 | 10 | 102403682 | rs2996303 | FAILED_QUERY |
|
|
655
|
+
| T | G | 0.8872| 0.6846 | 0.0480 | 10 | 102553647 | rs1006545 | [(heart rate response to exercise, 6e-12, GCST... |
|
|
656
|
+
| T | G | 0.6652| -0.2098| 0.0340 | 10 | 102558506 | rs12570050 | FAILED_QUERY |
|
|
657
|
+
| T | C | 0.3057| -0.2448| 0.0334 | 10 | 102603924 | rs4919502 | FAILED_QUERY |
|
|
658
|
+
| ... | ... | ... | ... | ... | ... | ... | ... | ... | |
|
|
659
|
+
| T | C | 0.3514| 0.2203 | 0.0314 | 9 | 9350706 | rs1332813 | [(diastolic blood pressure, 1e-12, GCST9031029... |
|
|
660
|
+
| T | C | 0.6880| -0.1897| 0.0332 | 9 | 94201341 | rs10820855 | FAILED_QUERY |
|
|
661
|
+
| A | T | 0.3669| -0.1862| 0.0313 | 9 | 95201540 | rs7045409 | [(systolic blood pressure, 9e-13, GCST006624),... |
|
|
662
|
+
|
|
663
|
+
|
|
664
|
+
> **Note:**
|
|
665
|
+
>
|
|
666
|
+
> As you can see, many SNPs failed to be queried. This is normal as the GWAS Catalog is not exhaustive.
|
|
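The GWAS Catalog section added in the hunk above motivates the `ASSOC` column by the need to exclude instruments associated with traits likely to cause horizontal pleiotropy. The following is a minimal sketch of that filtering step, written in plain Python so it runs without genal or pandas. It is not part of either package version. The cell shapes follow the tables shown in the hunk (a list of trait names, a list of tuples, or the strings `FAILED_QUERY` / `TIMEOUT`); the helper names `traits_of` and `flag_pleiotropic`, the choice of unwanted trait, and the assumption that the trait name is always the first tuple element are illustrative.

```python
from typing import Iterable, List, Optional, Set, Union

AssocCell = Union[str, list]


def traits_of(assoc: AssocCell) -> Optional[List[str]]:
    """Return the trait names in one ASSOC cell, or None if the query did not succeed."""
    if isinstance(assoc, str):
        # "FAILED_QUERY" or "TIMEOUT": no trait information available.
        return None
    # Entries are either plain trait names or tuples whose first
    # element is assumed to be the trait name.
    return [a[0] if isinstance(a, tuple) else a for a in assoc]


def flag_pleiotropic(rows: Iterable[dict], unwanted: Set[str]) -> List[str]:
    """List the SNP IDs associated with at least one unwanted trait."""
    flagged = []
    for row in rows:
        traits = traits_of(row["ASSOC"])
        if traits is not None and unwanted.intersection(traits):
            flagged.append(row["SNP"])
    return flagged


# Toy rows modeled on the tables above (trait lists shortened).
toy_rows = [
    {"SNP": "rs603424", "ASSOC": ["eicosanoids measurement"]},
    {"SNP": "rs2996303", "ASSOC": "FAILED_QUERY"},
    {"SNP": "rs7045409", "ASSOC": [("systolic blood pressure", 9e-13, "GCST006624")]},
]

flagged = flag_pleiotropic(toy_rows, unwanted={"eicosanoids measurement"})
unresolved = [r["SNP"] for r in toy_rows if traits_of(r["ASSOC"]) is None]
print(flagged, unresolved)
```

SNPs whose query failed or timed out are reported separately rather than silently kept or dropped, since the absence of a GWAS Catalog entry says nothing about pleiotropy either way.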
@@ -7,10 +7,11 @@
|
|
|
7
7
|
|
|
8
8
|
# Table of contents
|
|
9
9
|
1. [Introduction](#introduction)
|
|
10
|
-
2. [Citation]
|
|
10
|
+
2. [Citation](#citation)
|
|
11
11
|
3. [Requirements for the genal module](#paragraph1)
|
|
12
12
|
4. [Installation and how to use genal](#paragraph2)
|
|
13
13
|
1. [Installation](#paragraph2.1)
|
|
14
|
+
2. [Documentation](#paragraph2.2)
|
|
14
15
|
5. [Tutorial and presentation of the main tools](#paragraph3)
|
|
15
16
|
1. [Data loading](#paragraph3.1)
|
|
16
17
|
2. [Data preprocessing](#paragraph3.2)
|
|
@@ -19,6 +20,7 @@
|
|
|
19
20
|
5. [Mendelian Randomization](#paragraph3.5)
|
|
20
21
|
6. [SNP-association testing](#paragraph3.6)
|
|
21
22
|
7. [Lifting](#paragraph3.7)
|
|
23
|
+
8. [GWAS Catalog](#paragraph3.8)
|
|
22
24
|
|
|
23
25
|
|
|
24
26
|
## Introduction <a name="introduction"></a>
|
|
@@ -33,18 +35,27 @@ If you're using genal, please cite the following paper:
|
|
|
33
35
|
**Genal: A Python Toolkit for Genetic Risk Scoring and Mendelian Randomization.** Cyprien A. Rivier, Santiago Clocchiatti-Tuozzo, Shufan Huo, Victor Torres-Lopez, Daniela Renedo, Kevin N. Sheth, Guido J. Falcone, Julian N. Acosta. medRxiv 2024.05.23.24307776; doi: https://doi.org/10.1101/2024.05.23.24307776
|
|
34
36
|
|
|
35
37
|
## Requirements for the genal module <a name="paragraph1"></a>
|
|
36
|
-
***Python 3.
|
|
38
|
+
***Python 3.11 or later***. https://www.python.org/ <br>
|
|
37
39
|
|
|
38
40
|
|
|
39
41
|
## Installation and How to use the genal module <a name="paragraph2"></a>
|
|
40
42
|
|
|
41
43
|
### Installation <a name="paragraph2.1"></a>
|
|
42
44
|
|
|
45
|
+
> **Note:**
|
|
46
|
+
>
|
|
47
|
+
> **Optional**: It is recommended to create a new environment to avoid dependencies conflicts. Here, we create a new conda environment called 'genal_env'.
|
|
48
|
+
> ```
|
|
49
|
+
> conda create --name genal_env python=3.11
|
|
50
|
+
> conda activate genal_env
|
|
51
|
+
> ```
|
|
52
|
+
|
|
53
|
+
|
|
43
54
|
Download and install the package with pip:
|
|
44
55
|
```
|
|
45
56
|
pip install genal-python
|
|
46
57
|
```
|
|
47
|
-
And it
|
|
58
|
+
And import it in a python environment with:
|
|
48
59
|
```python
|
|
49
60
|
import genal
|
|
50
61
|
```
|
|
@@ -55,6 +66,16 @@ Once downloaded, the path to the plink executable can be set with:
|
|
|
55
66
|
```
|
|
56
67
|
genal.set_plink(path="/path/to/plink/executable/file")
|
|
57
68
|
```
|
|
69
|
+
### Documentation <a name="paragraph2.2"></a>
|
|
70
|
+
|
|
71
|
+
For detailed information on how to use the functionalities of Genal, please refer to the documentation: https://genal.rtfd.io
|
|
72
|
+
|
|
73
|
+
The documentation covers:
|
|
74
|
+
- Installation
|
|
75
|
+
- This tutorial
|
|
76
|
+
- The list of the main functions with complete description of their arguments
|
|
77
|
+
- An exhaustive API reference
|
|
78
|
+
|
|
58
79
|
|
|
59
80
|
## Tutorial <a name="paragraph3"></a>
|
|
60
81
|
For this tutorial, we will obtain genetic instruments for systolic blood pressure (SBP), compute a Polygenic Risk Score (PRS), and run a Mendelian Randomization analysis to investigate the genetically-determined effect of SBP on the risk of stroke. We will utilize summary statistics from Genome-Wide Association Studies (GWAS) and individual-level data from the UK Biobank. The steps include:
|
|
@@ -75,11 +96,11 @@ For this tutorial, we will obtain genetic instruments for systolic blood pressur
|
|
|
75
96
|
- Data lifting to another genomic build
|
|
76
97
|
- In pure Python
|
|
77
98
|
- Using LiftOver
|
|
78
|
-
-
|
|
99
|
+
- Querying the GWAS Catalog
|
|
79
100
|
|
|
80
101
|
### Data loading <a name="paragraph3.1"></a>
|
|
81
102
|
|
|
82
|
-
We
|
|
103
|
+
We start this tutorial with publicly available summary statistics from a large GWAS study of systolic blood pressure. [Link to study](https://www.nature.com/articles/s41588-018-0205-x). After downloading and unzipping the summary statistics, we load them into a pandas DataFrame:
|
|
83
104
|
|
|
84
105
|
```python
|
|
85
106
|
import pandas as pd
|
|
@@ -108,6 +129,10 @@ The `genal.Geno` takes as input a pandas dataframe where each row corresponds to
|
|
|
108
129
|
- **P**: Column name for effect p-value. Defaults to `'P'`.
|
|
109
130
|
- **EAF**: Column name for effect allele frequency. Defaults to `'EAF'`.
|
|
110
131
|
|
|
132
|
+
> **Note:**
|
|
133
|
+
>
|
|
134
|
+
> You do not need all columns to move forward, as not all columns are required by every function. Additionally, some columns can be imputed as we will see in the next paragraph.
|
|
135
|
+
|
|
111
136
|
After inspecting the dataframe, we first need to extract the chromosome and position information from the `MarkerName` column into two new columns `CHR` and `POS`:
|
|
112
137
|
|
|
113
138
|
```python
|
|
@@ -133,7 +158,7 @@ The last argument (`keep_columns = False`) indicates that we do not wish to keep
|
|
|
133
158
|
|
|
134
159
|
> **Note:**
|
|
135
160
|
>
|
|
136
|
-
> Make sure to read the readme file usually provided with the summary statistics to identify the correct columns. It is particularly important to correctly identify the allele that represents the effect allele.
|
|
161
|
+
> Make sure to read the readme file usually provided with the summary statistics to identify the correct columns. It is particularly important to correctly identify the allele that represents the effect allele.
|
|
137
162
|
|
|
138
163
|
### Data preprocessing <a name="paragraph3.2"></a>
|
|
139
164
|
|
|
@@ -197,7 +222,7 @@ You can also use a custom reference panel by specifying to the reference_panel a
|
|
|
197
222
|
|
|
198
223
|
### Clumping <a name="paragraph3.3"></a>
|
|
199
224
|
|
|
200
|
-
Clumping is the step at which we select the SNPs that will be used as our genetic instruments in future Polygenic Risk Scores and Mendelian Randomization analyses. The process involves identifying the SNPs that are strongly associated with our trait of interest (systolic blood pressure in this tutorial) and are independent from each other. This second step ensures that selected SNPs are not highly correlated, (i.e., they are not in high linkage disequilibrium). For this step, we again need to use a reference panel.
|
|
225
|
+
Clumping, or C+T (Clumping + Thresholding), is the step at which we select the SNPs that will be used as our genetic instruments in future Polygenic Risk Scores and Mendelian Randomization analyses. The process involves identifying the SNPs that are strongly associated with our trait of interest (systolic blood pressure in this tutorial) and are independent from each other. This second condition ensures that the selected SNPs are not highly correlated (i.e., they are not in high linkage disequilibrium). For this step, we again need to use a reference panel.
|
|
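To make the selection logic concrete, here is a minimal pure-Python sketch of greedy clumping. It is an illustration only and not the code genal or plink runs: the function name `greedy_clump`, the SNP identifiers, the default thresholds, and the `r2_lookup` dictionary (standing in for linkage disequilibrium values that would normally come from the reference panel) are all assumptions made for this example.

```python
def greedy_clump(snps, r2_lookup, p1=5e-8, r2=0.1, kb=250):
    """Toy greedy clumping (illustrative, not genal's implementation).

    snps: list of dicts with keys SNP, CHR, POS, P.
    r2_lookup: dict mapping frozenset({snp_a, snp_b}) to their LD r2
               (hypothetical stand-in for a reference panel).
    """
    # Thresholding: keep only SNPs passing the p-value cutoff, strongest first.
    candidates = sorted((s for s in snps if s["P"] <= p1), key=lambda s: s["P"])
    kept = []
    for snp in candidates:
        clumped = False
        for lead in kept:
            # A SNP is absorbed by a lead SNP that is close and correlated.
            close = (snp["CHR"] == lead["CHR"]
                     and abs(snp["POS"] - lead["POS"]) <= kb * 1000)
            pair = frozenset((snp["SNP"], lead["SNP"]))
            if close and r2_lookup.get(pair, 0.0) >= r2:
                clumped = True
                break
        if not clumped:
            kept.append(snp)
    return [s["SNP"] for s in kept]


# Made-up example data: rsB sits 50 kb from rsA and is in high LD with it.
example_snps = [
    {"SNP": "rsA", "CHR": 1, "POS": 1_000_000, "P": 1e-12},
    {"SNP": "rsB", "CHR": 1, "POS": 1_050_000, "P": 1e-9},
    {"SNP": "rsC", "CHR": 1, "POS": 5_000_000, "P": 1e-10},
    {"SNP": "rsD", "CHR": 2, "POS": 1_000_000, "P": 1e-3},
]
example_ld = {frozenset(("rsA", "rsB")): 0.8}
selected = greedy_clump(example_snps, example_ld)
print(selected)
```

In this toy data, `rsD` is dropped by the p-value threshold and `rsB` is dropped because it is close to, and correlated with, the stronger `rsA`.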
201
226
|
|
|
202
227
|
The SNP-data loaded in a `genal.Geno` instance can be clumped using the `genal.Geno.clump` method. It will return another `genal.Geno` instance containing only the clumped data:
|
|
203
228
|
|
|
@@ -269,7 +294,7 @@ The output of the `genal.Geno.prs` method will include how many SNPs were used t
|
|
|
269
294
|
Here, we see that about half of the SNPs were not extracted from the data. In such cases, we may want to try and salvage some of these SNPs by looking for proxies (SNPs in high linkage disequilibrium, i.e. highly correlated SNPs). This can be done by specifying the `proxy = True` argument:
|
|
270
295
|
|
|
271
296
|
```python
|
|
272
|
-
SBP_clumped.prs(name = "
|
|
297
|
+
SBP_clumped.prs(name = "SBP_prs_proxy" ,path = "Pop_chr$", proxy = True, reference_panel = "eur", r2=0.8, kb=5000, window_snps=5000)
|
|
273
298
|
```
|
|
274
299
|
|
|
275
300
|
and the output is:
|
|
@@ -312,7 +337,7 @@ and the output is:
|
|
|
312
337
|
The PRS computation was successful and used 1330/1538 (86.476%) SNPs.
|
|
313
338
|
PRS data saved to SBP_prs.csv
|
|
314
339
|
|
|
315
|
-
In our case, we have been able to find proxies for
|
|
340
|
+
In our case, we have been able to find proxies for 578 of the 786 SNPs that were missing in the population genetic data (7 potential proxies have been removed because they were identical to SNPs already present in our data).
|
|
316
341
|
|
|
317
342
|
You can customize how the proxies are chosen with the following arguments:
|
|
318
343
|
- `reference_panel`: The reference population used to derive linkage disequilibrium values and find proxies. Defaults to `eur`.
|
|
@@ -322,7 +347,7 @@ You can customize how the proxies are chosen with the following arguments:
|
|
|
322
347
|
|
|
323
348
|
> **Note:**
|
|
324
349
|
>
|
|
325
|
-
> You can call the `genal.Geno.prs` method on any `Geno` instance (containing at least the EA, BETA, and either SNP or CHR/POS columns). The data does not need to be clumped, and there is no limit to the number of
|
|
350
|
+
> You can call the `genal.Geno.prs` method on any `Geno` instance (containing at least the EA, BETA, and either SNP or CHR/POS columns). The data does not need to be clumped, and there is no limit to the number of SNPs used to compute the scores.
|
|
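As a rough illustration of what a weighted score is, the sketch below computes a score for one individual as the sum of effect-allele counts multiplied by the corresponding `BETA` values. It is a conceptual example under stated assumptions, not genal's implementation: the function name `polygenic_score`, the SNP names, and the choice to skip missing genotypes rather than impute them are all hypothetical.

```python
def polygenic_score(dosages, weights):
    """Weighted sum of effect-allele counts for one individual (toy example).

    dosages: dict SNP -> number of effect alleles (0, 1 or 2); SNPs that are
             absent or None are treated as missing and skipped (an assumption).
    weights: dict SNP -> BETA taken from the summary statistics.
    Returns the score and the number of SNPs that contributed to it.
    """
    score = 0.0
    used = 0
    for snp, beta in weights.items():
        dosage = dosages.get(snp)
        if dosage is None:
            continue  # genotype not available for this SNP
        score += beta * dosage
        used += 1
    return score, used


# Made-up weights and genotypes for illustration.
example_weights = {"rs1": 0.2, "rs2": -0.1, "rs3": 0.5}
example_person = {"rs1": 2, "rs2": 1}  # rs3 is missing for this individual
score, n_used = polygenic_score(example_person, example_weights)
print(f"score={score:.3f} using {n_used}/{len(example_weights)} SNPs")
```

The count of contributing SNPs mirrors the kind of summary that the `prs` output reports, and it shows why recovering missing SNPs through proxies can matter.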
326
351
|
|
|
327
352
|
|
|
328
353
|
### Mendelian Randomization <a name="paragraph3.5"></a>
|
|
@@ -370,21 +395,25 @@ Genal will print how many SNPs were successfully found and extracted from the ou
|
|
|
370
395
|
1541 SNPs out of 1545 are present in the outcome data.
|
|
371
396
|
(Exposure data, Outcome data, Outcome name) stored in the .MR_data attribute.
|
|
372
397
|
|
|
373
|
-
Here as well you have the option to use proxies for the instruments that are not present in the outcome data:
|
|
374
|
-
|
|
375
|
-
```python
|
|
376
|
-
SBP_clumped.query_outcome(Stroke_geno, proxy = True, reference_panel = "eur", kb = 5000, r2 = 0.6, window_snps = 5000)
|
|
377
|
-
```
|
|
378
|
-
|
|
379
|
-
And genal will print the number of missing instruments which have been proxied:
|
|
380
398
|
|
|
381
|
-
|
|
382
|
-
|
|
383
|
-
|
|
384
|
-
|
|
385
|
-
|
|
386
|
-
|
|
387
|
-
|
|
399
|
+
> **Note:**
|
|
401
|
+
>
|
|
402
|
+
> Here as well you have the option to use proxies for the instruments that are not present in the outcome data:
|
|
403
|
+
>
|
|
404
|
+
> ```python
|
|
405
|
+
> SBP_clumped.query_outcome(Stroke_geno, proxy = True, reference_panel = "eur", kb = 5000, r2 = 0.6, window_snps = 5000)
|
|
406
|
+
> ```
|
|
407
|
+
>
|
|
408
|
+
> And genal will print the number of missing instruments that have been proxied:
|
|
409
|
+
>
|
|
410
|
+
> Outcome data successfully loaded from 'b352e412' geno instance.
|
|
411
|
+
> Identifying the exposure SNPs present in the outcome data...
|
|
412
|
+
> 1541 SNPs out of 1545 are present in the outcome data.
|
|
413
|
+
> Searching proxies for 4 SNPs...
|
|
414
|
+
> Using the EUR reference panel.
|
|
415
|
+
> Found proxies for 4 SNPs.
|
|
416
|
+
> (Exposure data, Outcome data, Outcome name) stored in the .MR_data attribute.
|
|
388
417
|
|
|
389
418
|
After extracting the instruments from the outcome data, the `SBP_clumped` `genal.Geno` instance contains an `MR_data` attribute containing the instruments-exposure and instruments-outcome associations necessary to run MR. Running MR is now as simple as calling the `genal.Geno.MR` method of the SBP_clumped `genal.Geno` instance:
|
|
390
419
|
|
|
@@ -429,7 +458,7 @@ By default, only some MR methods (inverse-variance weighted, weighted median, Si
|
|
|
429
458
|
- `Weighted-mode` for the Weighted mode method
|
|
430
459
|
- `all` to run all the above methods
|
|
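For intuition about what the inverse-variance weighted method estimates, here is a minimal sketch of the textbook fixed-effect IVW estimator: each instrument gives a ratio estimate (outcome effect divided by exposure effect), and the ratios are averaged with weights proportional to their precision. The function name `ivw_estimate` and the numbers are assumptions for illustration; genal's own implementation may differ, for instance in how standard errors are computed.

```python
import math


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Textbook fixed-effect IVW estimate (illustrative sketch).

    Each SNP j contributes the ratio by_j / bx_j, weighted by
    bx_j**2 / se_j**2 (first-order inverse variance of the ratio).
    """
    weights = [bx ** 2 / se ** 2 for bx, se in zip(beta_exposure, se_outcome)]
    ratios = [by / bx for bx, by in zip(beta_exposure, beta_outcome)]
    total = sum(weights)
    estimate = sum(w * r for w, r in zip(weights, ratios)) / total
    standard_error = math.sqrt(1.0 / total)
    return estimate, standard_error


# Made-up SNP effects where every instrument implies the same ratio of 0.5.
bx = [0.1, 0.2, 0.4]
by = [0.05, 0.1, 0.2]
se_y = [0.01, 0.01, 0.02]
ivw_beta, ivw_se = ivw_estimate(bx, by, se_y)
print(f"IVW estimate: {ivw_beta:.3f} (SE {ivw_se:.4f})")
```

When the per-SNP ratios disagree more than their standard errors would suggest, the instruments are heterogeneous, which is what Cochran's Q is designed to quantify.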
431
460
|
|
|
432
|
-
For more fine-tuning, such as settings for the number of bootstrapping iterations, please refer to the API.
|
|
461
|
+
For more fine-tuning, such as settings for the number of bootstrapping iterations, please refer to the API: [MR method](https://genal.readthedocs.io/en/latest/modules.html#id4).
|
|
433
462
|
|
|
434
463
|
If you want to visualize the obtained MR results, you can use the `genal.Geno.MR_plot` method that will plot each SNP in an `effect_on_exposure x effect_on_outcome` plane as well as lines corresponding to different MR methods:
|
|
435
464
|
|
|
@@ -437,7 +466,7 @@ If you want to visualize the obtained MR results, you can use the `genal.Geno.MR
|
|
|
437
466
|
SBP_clumped.MR_plot(filename="MR_plot_SBP_AS")
|
|
438
467
|
```
|
|
439
468
|
|
|
440
|
-

|
|
441
470
|
You can select which MR methods you wish to plot with the `methods` argument. Note that for an MR method to be plotted, it must be included in the latest `genal.Geno.MR` call of this `genal.Geno` instance.
|
|
442
471
|
|
|
443
472
|
If you wish to include the heterogeneity values (Cochran's Q) in the results, you can use the `heterogeneity` argument in the `genal.Geno.MR` call. Here, we compute the heterogeneity for the inverse-variance weighted method:
|
|
@@ -482,7 +511,7 @@ df_pheno = pd.read_csv("path/to/trait/data")
|
|
|
482
511
|
|
|
483
512
|
> **Note:**
|
|
484
513
|
>
|
|
485
|
-
>
|
|
514
|
+
> One important point is to make sure that the IDs of the participants are identical in the phenotypic data and in the genetic data.
|
|
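A quick sanity check along these lines can be run before any association testing. The sketch below compares two collections of participant IDs after converting them to stripped strings, since a frequent source of mismatch is one side being read as integers and the other as text. The helper name `compare_ids` and the example IDs are hypothetical.

```python
def compare_ids(pheno_ids, geno_ids):
    """Compare participant IDs from phenotype and genetic data (toy helper).

    IDs are normalized to stripped strings so that 1001 and "1001 " match.
    """
    pheno = {str(i).strip() for i in pheno_ids}
    geno = {str(i).strip() for i in geno_ids}
    return {
        "shared": pheno & geno,
        "pheno_only": pheno - geno,
        "geno_only": geno - pheno,
    }


# Made-up IDs: integers on one side, strings on the other.
overlap = compare_ids([1001, 1002, "1003 "], ["1001", "1003", "1004"])
print({key: sorted(values) for key, values in overlap.items()})
```

A small `shared` set relative to either input usually points to a formatting difference in the IDs and not to a genuine lack of overlap.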
486
515
|
|
|
487
516
|
Then, it is advised to make a copy of the `genal.Geno` instance containing our instruments, since we are going to update their coefficients and want to avoid any confusion:
|
|
488
517
|
|
|
@@ -560,5 +589,54 @@ You can specify the path of the LiftOver executable to the `liftover_path` argum
|
|
|
560
589
|
SBP_Geno.lift(start = "hg19", end = "hg38", replace = False, liftover_path = "path/to/liftover/exec")
|
|
561
590
|
```
|
|
562
591
|
|
|
592
|
+
### GWAS Catalog <a name="paragraph3.8"></a>
|
|
593
|
+
|
|
594
|
+
It is sometimes interesting to determine the traits associated with our SNPs. In Mendelian Randomization, for instance, we may want to exclude instruments that are associated with traits likely causing horizontal pleiotropy. For this purpose, we can use the `genal.Geno.query_gwas_catalog` method. This method will query the GWAS Catalog API to determine the list of traits associated with each of our SNPs and store the results in a list in the `ASSOC` column of the `.data` attribute:
|
|
595
|
+
|
|
596
|
+
```python
|
|
597
|
+
SBP_clumped.query_gwas_catalog(p_threshold=5e-8)
|
|
598
|
+
```
|
|
599
|
+
Which will output:
|
|
600
|
+
|
|
601
|
+
Querying the GWAS Catalog and creating the ASSOC column.
|
|
602
|
+
Only associations with a p-value <= 5e-08 are reported. Use the p_threshold argument to change the threshold.
|
|
603
|
+
To report the p-value for each association, use return_p=True.
|
|
604
|
+
To report the study ID for each association, use return_study=True.
|
|
605
|
+
The .data attribute will be modified. Use replace=False to leave it as is.
|
|
606
|
+
100%|██████████| 1545/1545 [00:34<00:00, 44.86it/s]
|
|
607
|
+
The ASSOC column has been successfully created.
|
|
608
|
+
701 (45.37%) SNPs failed to query (not found in GWAS Catalog) and 7 (0.5%) SNPs timed out after 34.33 seconds. You can increase the timeout value with the timeout argument.
|
|
609
|
+
| EA | NEA | EAF | BETA | SE | CHR | POS | SNP | ASSOC |
|
|
610
|
+
|-----|-----|-------|--------|--------|-----|------------|------------|------------------------------------------------------------------------|
|
|
611
|
+
| A | G | 0.1784| 0.2330 | 0.0402 | 10 | 102075479 | rs603424 | [eicosanoids measurement, decadienedioic acid (...] |
|
|
612
|
+
| A | G | 0.0706| -0.3873| 0.0626 | 10 | 102403682 | rs2996303 | FAILED_QUERY |
|
|
613
|
+
| T | G | 0.8872| 0.6846 | 0.0480 | 10 | 102553647 | rs1006545 | [diastolic blood pressure, systolic blood pressure...] |
|
|
614
|
+
| T | G | 0.6652| -0.2098| 0.0340 | 10 | 102558506 | rs12570050 | FAILED_QUERY |
|
|
615
|
+
| T | C | 0.3057| -0.2448| 0.0334 | 10 | 102603924 | rs4919502 | FAILED_QUERY |
|
|
616
|
+
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
|
|
617
|
+
| T | C | 0.3514| 0.2203 | 0.0314 | 9 | 9350706 | rs1332813 | [diastolic blood pressure, systolic blood pressure...] |
|
|
618
|
+
| T | C | 0.6880| -0.1897| 0.0332 | 9 | 94201341 | rs10820855 | FAILED_QUERY |
|
|
619
|
+
| A | T | 0.3669| -0.1862| 0.0313 | 9 | 95201540 | rs7045409 | [protein measurement, pulse pressure measurement...] |
|
|
620
|
+
|
|
621
|
+
If you are also interested in the p-values of each SNP-trait association, or the ID of the study from which the association was reported, you can use the `return_p = True` and `return_study = True` arguments. Then, the `ASSOC` column will contain a list of tuples, where each tuple contains the trait name, the p-value, and the study ID:
|
|
622
|
+
|
|
623
|
+
```python
|
|
624
|
+
SBP_clumped.query_gwas_catalog(p_threshold=5e-8, return_p=True, return_study=True)
|
|
625
|
+
```
|
|
626
|
+
|
|
627
|
+
| EA | NEA | EAF | BETA | SE | CHR | POS | SNP | ASSOC |
|
|
628
|
+
|-----|-----|-------|--------|--------|-----|------------|------------|------------------------------------------------------------------------|
|
|
629
|
+
| A | G | 0.1784| 0.2330 | 0.0402 | 10 | 102075479 | rs603424 | TIMEOUT |
|
|
630
|
+
| A | G | 0.0706| -0.3873| 0.0626 | 10 | 102403682 | rs2996303 | FAILED_QUERY |
|
|
631
|
+
| T | G | 0.8872| 0.6846 | 0.0480 | 10 | 102553647 | rs1006545 | [(heart rate response to exercise, 6e-12, GCST... |
|
|
632
|
+
| T | G | 0.6652| -0.2098| 0.0340 | 10 | 102558506 | rs12570050 | FAILED_QUERY |
|
|
633
|
+
| T | C | 0.3057| -0.2448| 0.0334 | 10 | 102603924 | rs4919502 | FAILED_QUERY |
|
|
634
|
+
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
|
|
635
|
+
| T | C | 0.3514| 0.2203 | 0.0314 | 9 | 9350706 | rs1332813 | [(diastolic blood pressure, 1e-12, GCST9031029... |
|
|
636
|
+
| T | C | 0.6880| -0.1897| 0.0332 | 9 | 94201341 | rs10820855 | FAILED_QUERY |
|
|
637
|
+
| A | T | 0.3669| -0.1862| 0.0313 | 9 | 95201540 | rs7045409 | [(systolic blood pressure, 9e-13, GCST006624),... |
|
|
563
638
|
|
|
564
639
|
|
|
640
|
+
> **Note:**
|
|
641
|
+
>
|
|
642
|
+
> As you can see, many SNPs failed to be queried. This is normal as the GWAS Catalog is not exhaustive.
|
|
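Once the `ASSOC` column exists, one possible way to use it is to flag instruments whose reported traits match a list of unwanted keywords. The sketch below works on a single `ASSOC` value and handles the three shapes shown above: a list of trait names, a list of tuples, and a plain string such as `FAILED_QUERY` or `TIMEOUT`. The function name `flag_pleiotropic`, the SNP labels, the study ID placeholder, and the keyword list are assumptions for illustration, not part of genal.

```python
def flag_pleiotropic(assoc, unwanted_keywords):
    """Return True if any reported trait contains an unwanted keyword.

    assoc: one value of the ASSOC column, i.e. a list of trait names, a list
           of tuples whose first element is the trait name, or a string such
           as "FAILED_QUERY" / "TIMEOUT" (treated as no information).
    """
    if not isinstance(assoc, list):
        return False
    for entry in assoc:
        trait = entry[0] if isinstance(entry, tuple) else entry
        if any(key.lower() in trait.lower() for key in unwanted_keywords):
            return True
    return False


# Made-up ASSOC values keyed by hypothetical SNP labels.
example_assoc = {
    "snp_a": ["diastolic blood pressure", "systolic blood pressure"],
    "snp_b": "FAILED_QUERY",
    "snp_c": [("heart rate response to exercise", 6e-12, "STUDY_ID_PLACEHOLDER")],
}
unwanted = ["heart rate"]  # hypothetical trait considered pleiotropic
flagged = [snp for snp, assoc in example_assoc.items()
           if flag_pleiotropic(assoc, unwanted)]
print(flagged)
```

With a pandas DataFrame, the same function could be applied to the column with `Series.apply`. SNPs that failed to query are left unflagged here because the catalog gives no information about them; whether to keep them is a judgment call.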
Binary file
|
|
Binary file
|
|
@@ -1,4 +1,4 @@
|
|
|
1
1
|
# Sphinx build info version 1
|
|
2
2
|
# This file hashes the configuration used when building these files. When it is not found, a full rebuild will be done.
|
|
3
|
-
config:
|
|
3
|
+
config: 1a3c03fa317dbf0f46b6f7567774d6c5
|
|
4
4
|
tags: 645f666f9bcd5a90fca523b33c5a78b7
|
|
Binary file
|
|
Binary file
|
|
Binary file
|
|
Binary file
|
|
Binary file
|
|
Binary file
|